Monday 29 October 2012

Expected Points Graph For Chelsea v Manchester United.

Nothing really to add to the acres of posts that are currently being prepared on this explosive game which will be remembered for much more than the thrilling five goal shootout. United scored the opening goal in the first match so far this season where they were the underdogs to do so. They then emulated Liverpool an hour or two earlier in perpetuating the myth that 2-0 is the most dangerous lead in football and Chelsea then seemed poised to add to the debate around momentum in sport. The remainder of the match was football determined to show its dark, ugly underbelly with red cards, poor decision making and potentially much worse taking centre stage.


4', Luiz (og), 0-1
12', v Persie, 0-2
44', Mata, 1-2
53', Ramires, 2-2
63', Red Card, Ivanovic (Chelsea).
69', Red Card, Torres (Chelsea).
75', Hernandez, 2-3.

Friday 26 October 2012

Why Are Manchester United Conceding So Many Opening Goals ?

November beckons and Manchester United find themselves in the leading two in the Premiership with 18 points from a possible 24 and top of their UEFA Champions League group with a perfect start from their three matches so far. Neither set of results should be surprising: United will surely challenge for the Premiership title and seeding ensures that they are the major force in their UCL group. However, one aspect of their season so far has attracted comment, and SAF was suitably irked by another concession of an opening goal to pause during his denouncement of a "poor" refereeing decision to refer to his side's tardy starts. In 11 UCL and EPL games so far this year, United have conceded the opening goal in eight matches and The Boss wasn't pleased.

The rate at which a team scores or concedes the first goal in a contest, should there be one, is strongly related to the proportion of goals they would be expected to score in such match ups. Unsurprisingly United have been pre game favourites in every game they have played so far this year and as such would have also been favourites to claim any opening goal. Their favouritism in the first goal scoring stakes has ranged from slim (a shade over 50%) versus Liverpool to very strong (over 85%) in their home encounter with Wigan.
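
A minimal sketch of that relationship, assuming each side scores as an independent Poisson process at a constant rate over the match. Under that simplification the chance of scoring first, given that there is a goal at all, is simply a team's share of the total goal expectancy. The function name and the goal expectancies below are hypothetical, picked only to span a slim and a strong favourite.

```python
def first_goal_share(goals_for: float, goals_against: float) -> float:
    """Chance of scoring first, given that the game has at least one goal.

    Assumes both sides score as independent Poisson processes at constant
    rates, which is a simplification of real match dynamics.
    """
    return goals_for / (goals_for + goals_against)


# Hypothetical goal expectancies, not taken from the post.
slim = first_goal_share(1.35, 1.25)
strong = first_goal_share(2.6, 0.4)

print(f"slim favourite:   {slim:.0%}")
print(f"strong favourite: {strong:.0%}")
```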

Overall United's pre game supremacy suggests a total of eight opening goals for The Red Devils as the most likely outcome from their 11 "league" matches played this season. So by scoring just three they have perfectly emulated the role expected of their inferior opponents. Surely this deviation from expectations is so large as to indicate that something is badly amiss at the heart of their defence?

A team's recorded match result will always be a combination of their overall performance. It's easy to blame the defence for frequently conceding the opening strike, but we also need to include the failure of the attacking side of the ball to score before their opponents. United's forwards were given almost an hour against Everton on opening day to strike before Fellaini's opener, half a game before Gerrard's opener at Anfield and in a similar vein the attack's task was made much easier when the defence kept Wigan, Galatasaray and Newcastle goalless for the entire game. Whether a team gives up the first goal is really a joint team effort.

The same is also true when it comes to retrieving losing positions. It's obviously essential that the team scores in response to falling behind, but the value of those goals is also dependent upon the defence remaining firm. Below I've listed the likely range of outcomes faced by Manchester United in the eight matches where they fell behind.

United's Record When Conceding First In 2012/13.

Games Where United Fell Behind. United's % Chance of Winning from 0-1. United's % Chance of Drawing from 0-1.
v Everton. 12 27
v Fulham. 49 25
v Southampton. 34 27
v Liverpool. 26 32
v Tottenham. 37 26
v Stoke. 47 26
v Cluj. 30 27
v Braga. 47 26
Cumulative Expected Success Rate.
Actual Success Rate.

On average a team such as United would emerge from these losing positions with a success rate that combines wins and draws of around 50%. The reality was that United managed a success rate of 75% made up of 6 wins and no draws from the 8 matches. The strikeforce naturally grabbed the headlines, but the impressive record also owed something to the defence preventing further decisive goals, most notably away at Anfield and against Cluj. So just as the attack must share part of the fault for United's tendency to trail in 2012/13, the defence should also share some of the credit for effectively rectifying such situations.
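
The post does not spell out how wins and draws are combined into a single success rate. A sketch under the assumption that a draw counts as half a success lands close to the "around 50%" quoted, using the percentages from the table above:

```python
# (opponent, win % from 0-1, draw % from 0-1), copied from the table above.
fell_behind = [
    ("Everton", 12, 27), ("Fulham", 49, 25), ("Southampton", 34, 27),
    ("Liverpool", 26, 32), ("Tottenham", 37, 26), ("Stoke", 47, 26),
    ("Cluj", 30, 27), ("Braga", 47, 26),
]

expected_wins = sum(win for _, win, _ in fell_behind) / 100
expected_draws = sum(draw for _, _, draw in fell_behind) / 100

# Assumption: a draw is worth half a win when building the success rate.
expected_rate = (expected_wins + 0.5 * expected_draws) / len(fell_behind)
actual_rate = (6 + 0.5 * 0) / len(fell_behind)  # 6 wins, no draws

print(f"expected wins {expected_wins:.2f}, expected draws {expected_draws:.2f}")
print(f"expected success rate {expected_rate:.1%}, actual {actual_rate:.1%}")
```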

Manchester United's defensive performance may have been judged on high profile incidents rather than overall competence, but this is only part of the reason why their defence has perhaps been too harshly classed as disappointing so far.

The major culprit is sample size.

Eleven games contains a fair amount of data, but the identity of the scorer of the first goal only uses one data point. Therefore, we have a very small sample size which can lead to extreme results and often these extreme results can be plausibly explained by a convincing, but spurious narrative. "United are missing Vidic", for example (conveniently forgetting that Vidic also missed a large proportion of last season, when United's first goal concession rate was as impressive as you would expect, 7/38 games).

The reality is that we are simply seeing media attention being directed onto a luck driven artifact of our small sample size. Few headlines are made if teams such as United go on short term runs where they outscore their first goal expectations (excellence and improvement appear hardly surprising when displayed by the very best), but when the reverse happens (through chance) then people take notice and begin to try to rationally dissect the merely random.

Two opening goals conceded by United have come about through own goals and however you choose to define randomness or luck, few would expect Rooney to continue to hone his scoring skills by warming up with an initial strike at the wrong end as he did last Saturday against Stoke. A team's talent defines how impressive their long term performances will be, but short term streaks are partly out of their control.

United's Defence. More Concentration or Just More Of The Same ?

We can't know exactly how Manchester United's scoring and conceding patterns will change over the remainder of the season, but long term they will perform close to their historical average. We can guess that the press will speculate that SAF has addressed his defence's "problem" of allowing the first goal. And in all likelihood the "improvement" will be attributed to skilled management rather than the more likely absence of bad luck.

For a snapshot of what may await United, we just have to look at their cross City rivals. When I wrote this last season, Manchester City had conceded the first goal in their last five Premiership matches, starting with Swansea and ending with Arsenal. If we also include their two Europa League matches against Sporting Lisbon, the run stretched to seven consecutive contests. A mighty fall from earlier in the season when the Champions elect had been impeccable.

As nothing substantial had occurred that obviously indicated that City were a markedly worse defensive or indeed attacking team than they had been earlier in the year, I suggested that simple luck had been predominantly responsible for the unsatisfactory outcomes and they would probably soon produce results closer to their earlier, more extended run. Happily for their fans, City then opened the scoring in all of their remaining matches.

City's lumpy sequence of partly random, partly skill based results demonstrates the type of forces that are currently shaping United's scoring pattern. Just as importantly, City's run of seven first goal "failures" followed immediately by six consecutive "successes" appears far from random... but random is just what it is. Randomness always contains runs of consecutive, identical outcomes, even if we think it shouldn't.
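
How often such streaks turn up by chance alone can be checked with a quick simulation. The sketch below treats each match as an independent coin flip with a fixed chance of scoring first, which ignores opponent strength and venue. The 70% figure, the streak length and the function names are all hypothetical choices for illustration.

```python
import random


def longest_run(outcomes):
    """Length of the longest streak of identical consecutive outcomes."""
    if not outcomes:
        return 0
    best = current = 1
    for prev, nxt in zip(outcomes, outcomes[1:]):
        current = current + 1 if nxt == prev else 1
        best = max(best, current)
    return best


def share_with_long_run(n_games=38, p_first=0.7, min_run=6, trials=20_000, seed=1):
    """Fraction of simulated seasons containing a streak of at least min_run
    identical first-goal outcomes (all scored or all conceded)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        season = [rng.random() < p_first for _ in range(n_games)]
        if longest_run(season) >= min_run:
            hits += 1
    return hits / trials


print(f"{share_with_long_run():.0%} of simulated seasons contain such a streak")
```

Changing `p_first` or `min_run` shows how sensitive the answer is to those assumed inputs.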

Roy Keane suggested at halftime on Tuesday that United needed to concentrate more and SAF indicated that their problem of conceding first would be dealt with. No doubt steps will be taken in training, but given the likely cause of the run (luck), doing nothing will probably bring about the same improvement, starting at Chelsea on Saturday when, for the first time this season, United won't be the favoured team to score first.

Wednesday 24 October 2012

Is Shot Blocking A Talent ?

The gradual availability of data relating to events that occur during a match has allowed for a more detailed examination of the ways in which goals are both scored and prevented. Save percentages for keepers are now commonplace and increasingly shots, the precursors to goals, can be broken down into sub categories such as on target and off target efforts and also blocked shots. Distance from goal and pitch co-ordinates are still a relatively scarce resource, so at the moment the most productive approach relies on the use of accumulated shot data.

Blocked shots are probably the most neglected of the readily available data, but they account for the fate of almost a quarter of football's goal attempts and can shed light on the choices being made by strikers and the skill sets that may be possessed by defenders.

On average an EPL team will face or attempt over 500 goal attempts over the course of a Premiership season, but even these numbers are mere samples of each team's likely true ability. The actual percentage of shots that either a defence blocks or an attack has blocked over a season will be made up of a combination of randomly blocked efforts and possibly occasions where the skill of the shooter or the blocker has influenced the outcome.

So the first question we need to answer concerns whether blocking shots from a defensive perspective or avoiding blocks from an attacking one is likely to be a repeatable team talent. We can do this by assuming that the league average rate for blocks is shared by every team and then construct a typical spread of rates for blocked shots under these conditions of parity of "talent". If the spread we actually see deviates from that expected by pure chance we can deduce that other factors are present, either individual player talent within teams or a tactical approach that encourages blocks.

Based on the team records for blocked shots from the MCFC lite dataset, the raw attacking and defensive rates appear to indicate the spread isn't consistent with blocking of shots being a purely random process. Some teams possess qualities or setups that see them block or avoid blocks at rates that suggest a skill is involved, especially on the defensive side of the ball.
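
A rough sketch of that comparison, using the raw percentages from the two tables further down this post. The per-team shot totals are not given, so a flat 500 shots per team is assumed here (the post only says "over 500"), and `spread_ratio` is a hypothetical helper name. A ratio near 1 means the spread between teams is about what binomial chance alone would produce. A ratio well above 1 hints at something more.

```python
from statistics import mean, pvariance

# Raw blocked-shot percentages for 2011/12, copied from the tables in this post.
attack_pct = [22.8, 23.6, 22.9, 23.5, 24.2, 25.9, 26.7, 26.6, 27.1, 27.3,
              27.2, 27.4, 27.6, 27.5, 27.8, 27.7, 28.6, 29.0, 28.7, 30.7]
defence_pct = [31.3, 31.5, 30.1, 29.9, 29.5, 27.7, 27.5, 27.5, 26.7, 25.9,
               25.4, 25.5, 25.4, 25.0, 24.9, 24.2, 24.1, 24.1, 23.5, 23.2]

SHOTS_PER_TEAM = 500  # assumption: real totals differ from team to team


def spread_ratio(pcts, shots=SHOTS_PER_TEAM):
    """Observed variance of team rates divided by the variance expected if
    every team shared the league-average rate (binomial sampling only)."""
    rates = [p / 100 for p in pcts]
    league = mean(rates)
    chance_var = league * (1 - league) / shots
    return pvariance(rates) / chance_var


print(f"attack:  {spread_ratio(attack_pct):.2f}")
print(f"defence: {spread_ratio(defence_pct):.2f}")
```

With only 20 teams the observed variance is itself noisy, so a ratio slightly above 1 is weak evidence at best.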

The Percentage Of Blocked Shots For Attacking Sides In The EPL 2011/12.

Team. Raw Blocking %. Regressed %.
Liverpool. 22.8 24.6
Chelsea. 23.6 25.0
Stoke. 22.9 25.1
Norwich. 23.5 25.2
Aston Villa. 24.2 25.6
Arsenal. 25.9 26.3
Everton. 26.7 26.7
Wigan. 26.6 26.6
Wolves. 27.1 26.7
Sunderland. 27.3 26.9
Man Utd. 27.2 27.0
Newcastle. 27.4 27.0
WBA. 27.6 27.1
Spurs. 27.5 27.1
Blackburn. 27.8 27.2
Fulham. 27.7 27.2
QPR. 28.6 27.6
Swansea. 29.0 27.7
Man City. 28.7 27.8
Bolton. 30.7 28.5

We can also use these results to regress the extreme performers, both good and bad, towards the league average based on the number of shots they faced or attempted. If the skill is repeatable the season on season correlation is much more likely to be seen in these regressed figures. The first table shows how good teams were at avoiding seeing their shots blocked and there appears to be little correlation between this talent and final league position.
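
The exact regression used for the tables isn't described, so the following is only a sketch of one common form of shrinkage, in which the weight given to a team's raw rate grows with its shot count. The shot total and the constant `K` are hypothetical, and the output is not expected to reproduce the regressed column above.

```python
def regress_to_mean(raw_pct, league_pct, shots, k):
    """Shrink a raw rate toward the league average.

    The weight shots / (shots + k) grows with sample size, so teams with
    more shots keep more of their raw figure. k is a tuning constant.
    """
    weight = shots / (shots + k)
    return league_pct + weight * (raw_pct - league_pct)


LEAGUE_PCT = 26.6  # approximate mean of the raw column in the table above
SHOTS = 500        # assumption: per-team shot totals are not given
K = 200            # hypothetical shrinkage constant

for team, raw in [("Liverpool", 22.8), ("Everton", 26.7), ("Bolton", 30.7)]:
    print(team, round(regress_to_mean(raw, LEAGUE_PCT, SHOTS, K), 1))
```

Extreme teams move furthest, while a side already near the league average barely moves at all.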

Liverpool top the chart although their striking woes were well documented and successful sides such as Chelsea and Arsenal follow them home along with lesser lights such as Norwich and Villa. That's not of course to imply that a marginally improved ability to avoid seeing your shots blocked is unimportant. Villa for example may owe their Premiership survival to the couple of extra shots that made their way through a forest of legs to possibly find the target. A team's overall record, good or bad, is a product of a wide range of footballing talents of which shot blocking is just one.

The Percentage Of Shots Blocked By The Defence In The EPL 2011/12.

Team. Raw Blocking %. Regressed %.
Sunderland. 31.3 30.0
Everton. 31.5 29.9
Stoke. 30.1 29.0
Aston Villa. 29.9 29.0
QPR. 29.5 28.7
WBA. 27.7 27.4
Man Utd. 27.5 27.2
Man City. 27.5 27.1
Spurs. 26.7 26.7
Liverpool. 25.9 26.2
Newcastle. 25.4 25.8
Bolton. 25.5 25.8
Norwich. 25.4 25.8
Swansea. 25.0 25.5
Fulham. 24.9 25.4
Arsenal. 24.2 25.1
Blackburn. 24.1 24.8
Wolves. 24.1 24.8
Wigan. 23.5 24.5
Chelsea. 23.2 24.4

Defensive blocking ability appears to be more unevenly spread between teams. This may be because some teams have exceptional blockers or they adopt a packed defensive style or frustrate opponents into attempting more long range efforts that are more likely to be blocked. Blocking rates for individual players are currently impossible to calculate because although numbers of blocks made by players are available we can only guess at how many attempted blocks they were involved in. So rating individuals at the moment will have to rely on raw counting stats.

Stoke's three best blockers of a shot line up for duty in a defensive wall, while Ric & Crouchie seem less keen.

Stoke are once again towards the top of the table in terms of effectively blocking an opponent's efforts and their style of play in 2011/12 certainly involved getting bodies behind the ball. Their three most prolific shot blockers, based on raw numbers, Whelan, Shawcross and Huth, invariably made up part of a defensive wall. Of course they may have combined to block many efforts precisely because they were often used in the wall, or they may have been used in the wall because they had shown great aptitude at anticipating and blocking shots...

Monday 22 October 2012

Red Cards, Opening Goals & Wins For Both Manchester Clubs.

Last year's title race was fought out almost exclusively between the two Manchester clubs and their matches following the International break saw the Red and Blue half each sent off as strong favourites to beat two Midlands sides, Stoke and WBA, respectively.

United had the more straightforward task, taking on a Stoke side who are now permanently shorn of their extraordinarily long throw weapon, and mid table will remain the height of their ambitions as they gradually try to wean themselves off an over reliance on set play goals.

A cursory glance at the EPL table indicated that City's game could be far from straightforward. Their hosts, WBA, had started the season strongly with four wins and a solitary defeat and at the start of the day they lay a point behind City. However, just as the very best can occasionally produce an atypically poor run of short term results, a team that is destined to end the season no better than mid table can also produce Champions League qualifying results before drifting back to their expected level of attainment. The game was certainly not priced up as a meeting of near equals.

It's not uncommon for traditionally mid table teams to start a season by showing markedly improved form, but as this guest post demonstrates, few teams manage to maintain that improvement in the manner achieved by Newcastle last season.

Expected Points Graphs For Both Manchester Teams, 20 October, 2012.

11', Rooney(og), 0-1
27', Rooney, 1-1
44', van Persie, 2-1
46', Welbeck, 3-1
58', Kightly, 3-2
65', Rooney, 4-2


23', Red Card, Milner (Manchester City).
67', Long, 1-0
81', Dzeko, 1-1
92', Dzeko, 1-2

So pregame, despite the hot start to the season from WBA, we had two genuine title contenders taking on two mid table teams with United enjoying the added bonus of home advantage and two hours later both Manchester clubs had each claimed maximum points from their matches.

United were the first Manchester club to suffer a setback after 11 minutes, when Rooney lost Shawcross while defending an Adam free kick and tamely walked the ball into his own net as he tried to recover his position. The first goal is often a significant event in a game; however, as the post at 5 Added Minutes shows, the opening strike must also be put into a proper context.

United started the game as strong favourites and the game was barely ten minutes old, so they had the lion's share of the match to turn things around. At the start of the match the most likely scoreline was a 2-0 victory for the hosts and in the short time that had elapsed, the relative team strengths and the likely shift in game dynamics towards a more attacking United approach made that outcome still most likely. So despite enjoying the ideal start, Stoke were still very much the junior partners in this matchup. And 80 playing minutes later they found themselves on the wrong end of a 4-2 defeat.

If United suffered a minor setback, their rivals City had two punishing obstacles to overcome. Not only did they also concede the opening goal, and much later in the game than United had, but they had already compounded their plight with Milner's red card after 23 minutes.

A red card on average is bad news. Individual teams can occasionally overcome the disadvantage to secure a positive result, but the general trend is for the carded side to score less than they would have done with a full complement of players and concede more. The severity of the effect is obviously time dependent as well and the graph below illustrates the average depletion in goal difference seen by a team playing with just ten men for varying lengths of time.

The Effect Of A Red Card In The EPL.

Milner's red card, coming as it did after only 23 minutes, cost City just over a whole goal. Unlike United, City found the equivalent of allowing the first goal, albeit by way of a red card, and the away venue too much of a setback to maintain favouritism in the game, and when they further conceded the first goal in reality after 67 minutes, their chances of taking anything from the match were remote. They had to overcome two of the most damaging occurrences in football.

Anyone merely looking at the final score would have seen a not unexpected result, a City win by one goal was the most likely prematch outcome, but as the expected points graph for the game demonstrates, Dzeko provided City with a victory of epic proportions.

Monday 15 October 2012

Using Scoring Records To Predict Future Performance.

In a conversation last week with Simon Gleave of "Scoreboard Journalism" fame I was reminded of the pivotal role that the Poisson Distribution has played in explaining a team's win, loss and draw tally from their goal scoring record and also in predicting their future results from their presumed goal expectancy in those subsequent games. As a retrospective tool the Poisson is unspectacular, but in a predictive role, its simplicity of use and the ability it gives us to shed light onto the nature of a team's talent and ability is unsurpassed.

There is no shortage of football based posts which describe the actual use and limitations of the Poisson, so I will direct readers to such posts as this one. But for brevity, the Poisson allows you to model the likelihood of any number of discrete events occurring given that we know the average rate at which these events are likely to occur.

So if we think a team is going to average 1.4 goals per 90 minutes against a particular defence we can estimate the probabilities of that team scoring exactly 0, 1, 2, 3 and so on goals. If we repeat the process for their matchday rivals, this allows us to move onto the prediction of assorted scorelines and ultimately game results.
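
As a minimal sketch of both steps, the snippet below prints the goal probabilities for a 1.4 goal expectancy and then combines two rates into home win, draw and away win chances. It assumes the two sides' goals are independent, which is the usual simplification. The 0.9 rate for the opposition is a hypothetical value.

```python
from math import exp, factorial


def poisson_pmf(k, rate):
    """Probability of exactly k goals when the average is `rate`."""
    return rate ** k * exp(-rate) / factorial(k)


def match_probabilities(home_rate, away_rate, max_goals=10):
    """Home win, draw and away win chances from two independent Poisson rates.

    Scorelines above max_goals per side are ignored. Their combined
    probability is tiny for typical football scoring rates.
    """
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away


for goals in range(4):
    print(f"P({goals} goals) = {poisson_pmf(goals, 1.4):.3f}")

# 0.9 is a hypothetical expectancy for the opposition.
home_win, drawn, away_win = match_probabilities(1.4, 0.9)
print(f"home {home_win:.1%}, draw {drawn:.1%}, away {away_win:.1%}")
```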

The mathematical steps involved in producing goal probabilities are fairly straightforward, but it is the calculation of each team's future expected goal scoring and conceding rates where much of the hard work lies. Manchester United, as next Saturday will probably confirm, score goals at a much higher rate than Stoke City. Before we can confidently begin to estimate the rate at which United will score against Stoke, we need to incorporate such obvious variables as United's scoring rate over a period of games and Stoke's rate of conceding, together with less obvious figures such as the general rate at which home teams outscore away sides and the level of goalscoring typically seen within the Premiership.

By combining such rates for each team we can begin to estimate the likelihood of such match outcomes as a United victory and other, much less likely occurrences, such as a Stoke City away win. How we decide which rates to use for each team will depend on how we implement adjustments such as the weight we give to more recent matches, how severely we regress our final figures towards the mean of our choice and, in the case of the Newcastle Sunderland game whether or not the game is a derby match. (Such matches tend to throw up lower scoring, more evenly matched games).
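
One common way of combining those ingredients, though not necessarily the one used for this post, is a multiplicative blend: attack and defence are expressed relative to the league average and then scaled for venue. Every number below is a made-up placeholder and `expected_goals` is a hypothetical helper.

```python
def expected_goals(team_scored_pg, opp_conceded_pg, league_avg_pg, venue_factor):
    """Multiplicative blend of attack and defence rates, scaled for venue.

    Rates are taken relative to the league average, so an average attack
    facing an average defence at a neutral venue scores at the league rate.
    """
    attack = team_scored_pg / league_avg_pg
    defence = opp_conceded_pg / league_avg_pg
    return league_avg_pg * attack * defence * venue_factor


# Placeholder inputs, not taken from the post.
LEAGUE_AVG = 1.4                   # goals per team per game
HOME_BOOST, AWAY_DRAG = 1.15, 0.85

home_rate = expected_goals(2.2, 1.5, LEAGUE_AVG, HOME_BOOST)
away_rate = expected_goals(1.0, 0.9, LEAGUE_AVG, AWAY_DRAG)

print(f"home expectancy {home_rate:.2f}, away expectancy {away_rate:.2f}")
```

Recency weighting, regression to the mean and derby adjustments would each change the inputs fed into a function like this.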

But the most influential choice we will need to make will be how many matches we use to derive our average team scoring rates. In the table below I've calculated the chance of each EPL team winning their games over the coming weekend using a Poisson based approach incorporating the scoring averages of teams calculated over the previous 32 home and away games, the previous 20 and just using the 7 games played so far this season.

The Win% Chances For EPL Sides On Saturday Using A Poisson Calculation & Expected Scoring Rates Over Differing Timescales.

Team. Using Last 32 Games. Using Last 20 Games. Using Last 7 Games. "True" Odds.
Man Utd. 75 70 54 76
Stoke. 8 9 18 8
Fulham. 60 71 80 53
A Villa. 16 9 7 21
Norwich. 15 9 4 15
Arsenal. 65 75 86 62
QPR. 19 15 8 29
Everton. 57 66 79 43
Sunderland. 45 26 36 38
Newcastle. 28 44 30 32
Swansea. 54 42 72 48
Wigan. 21 32 12 24
Spurs. 44 32 21 37
Chelsea. 29 41 52 34
WBA. 22 23 40 19
Man City. 52 52 34 57

The team specific inputs used for each team in Saturday's matches are merely the averages of the goals that were scored or conceded by teams over the three different timescales, and the purpose of the exercise is to see which timescale produces the most reliable estimation for pre game win and loss probabilities. In short, is it better to use much more recent, but smaller samples of a team's attacking and defensive ability, or is more information contained in older, but more numerous samples?

I've used the bookmakers' win odds for each team as the "true" odds comparison. These figures are a great untapped source of information. Whatever your views on gambling, the odds presented by bookmakers, once the overround has been stripped from the prices, give you an incredibly accurate estimation of the true odds of an event occurring. The odds compilers have access to large amounts of data, long experience at setting prices and a huge vested interest in producing accurate odds. Also we are not at a stage of the season when prices are particularly skewed by expected weight of money. Last season, Bolton's final "must win" visit to Stoke saw the relegation threatened side priced up as having around a 33% chance of winning, even though the evidence suggested that 22% was a more realistic estimate.
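
Stripping the overround can be done in several ways. The simplest, sketched here, scales the implied probabilities proportionally so that they sum to one. The prices are hypothetical decimal odds, and this is not necessarily the method used to produce the "true" odds column.

```python
def strip_overround(decimal_odds):
    """Turn decimal odds for every outcome of one event into probabilities
    that sum to 1, by scaling the implied probabilities proportionally."""
    implied = [1 / price for price in decimal_odds]
    book = sum(implied)  # exceeds 1 by the bookmaker's margin
    return [p / book for p in implied]


# Hypothetical home / draw / away prices, for illustration only.
prices = [1.30, 5.50, 11.00]
fair = strip_overround(prices)

for label, p in zip(("home", "draw", "away"), fair):
    print(f"{label}: {p:.1%}")
```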

Using readily available bookmaking odds as a benchmark does away with the need to produce the copious amounts of estimations that are needed to evaluate a model's effectiveness by comparing predictions against actual outcomes. A fifth of the way into the year and if your win estimate greatly varies from the general bookmaking consensus, then it is your model of events that is almost certainly the one with the flaws.

Swansea and Stoke are likely to experience very different results on Saturday.

Our naive model has no whistles and bells, but the results are overwhelmingly in favour of taking into account as much goalscoring data as possible, even at the expense of recency. Win predictions produced via the Poisson process using data going back 32 games were closest or joint closest to the bookmakers' estimate in 13 of the 16 cases. They are shown in blue. (I've omitted the matches involving promoted sides, because they require separate adjustments based around the promoted sides' scoring and conceding records from the Championship).

20 game estimates were top or joint top for three of the 16 teams and the most recent data from the seven games so far this season won out for just Newcastle and Sunderland. This is a local derby where depressed scoring and home advantage are always factored into prices, so even this meagre victory for recent form alone was a hollow one.

So of the three choices, the scoring records over the previous 32 games trounce the two alternatives as the best predictor of performance in the future. Eight matches comprising 16 teams isn't a huge test sample, but the findings are confirmed over many more matches and multiple leagues and seasons.
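
The tallies can be rechecked directly from the table above. The sketch below counts, for each timescale, how often it was closest or joint closest to the bookmaker figure, and adds a mean absolute error per timescale as a second, assumed yardstick that the post itself does not use.

```python
# team: (last 32, last 20, last 7, bookmaker "true" win %), from the table above.
win_pct = {
    "Man Utd": (75, 70, 54, 76), "Stoke": (8, 9, 18, 8),
    "Fulham": (60, 71, 80, 53), "A Villa": (16, 9, 7, 21),
    "Norwich": (15, 9, 4, 15), "Arsenal": (65, 75, 86, 62),
    "QPR": (19, 15, 8, 29), "Everton": (57, 66, 79, 43),
    "Sunderland": (45, 26, 36, 38), "Newcastle": (28, 44, 30, 32),
    "Swansea": (54, 42, 72, 48), "Wigan": (21, 32, 12, 24),
    "Spurs": (44, 32, 21, 37), "Chelsea": (29, 41, 52, 34),
    "WBA": (22, 23, 40, 19), "Man City": (52, 52, 34, 57),
}
LABELS = ("32 games", "20 games", "7 games")

closest = {label: 0 for label in LABELS}
abs_error = {label: 0 for label in LABELS}
for *estimates, book in win_pct.values():
    errors = [abs(estimate - book) for estimate in estimates]
    for label, err in zip(LABELS, errors):
        abs_error[label] += err
        if err == min(errors):  # ties credit every joint-closest timescale
            closest[label] += 1

mean_abs_error = {label: total / len(win_pct) for label, total in abs_error.items()}

print(closest)
print(mean_abs_error)
```

The counts come out at 13, 3 and 2, matching the figures quoted in the text, and the average error grows as the sample of matches shrinks.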

Teams can often produce short term bursts of atypical results in a sport where team scoring very rarely averages much more than two goals a game, and more is certainly better if you choose to evaluate teams by goals scored and allowed, even if that means you are going back to matches played early in a previous season.

Tuesday 9 October 2012

Worrying Times For Liverpool At Both Ends Of The Pitch.

The recent explosion of available English Premiership data has provided a route for the curious to begin to examine in much more detail the core actions that go towards deciding the outcome of a football match. The Laws of the game award the victory to the team scoring the most goals, so it's sensible to roll the goalscoring process back one stage to try to see which teams are making the most of their scoring opportunities and equally importantly which defences are making scoring difficult for opposing strikers.

How efficiently teams turn goal attempts into shots on target and ultimately into goals often defines where a team ultimately finishes in the table. With access to x,y data points for shooting attempts it is now possible to compare team performance in this area against an average basket of hopefully representative teams from the EPL. At the very least we should be able to highlight a side's strengths and weaknesses compared to par when dealing with goal attempts or dishing out shots of their own. This is the so called descriptive use of data; projecting suitably regressed figures for use in predicting future performance may come later.

The rolling TV soap opera that is Liverpool's seemingly constant battle with insufficient reward from copious amounts of toil in front of goal again provides an ideal test case for this type of analysis. The Reds currently languish on the fringes of the relegation battle, with a negative goal difference (even after a five goal rush at Norwich) and question marks regarding their inability to adapt to a new style of play and also over the form of Pepe Reina.

Shot analysis has made great strides in recent seasons. Analysis no longer has to rely on weight of numbers to even out the quality of the shooting opportunities under investigation, but such unknowns as the position of defenders and the pressure that is being applied to the shooter are still largely absent. So with these caveats, let's look at how Liverpool have fared so far in the EPL.

Using the Opta powered Fourfourtwo app I've recorded the co-ordinates of all 123 goal attempts made by Liverpool so far this season and designated each attempt as either "on target", "off target", "blocked" or "a goal" (a goal is also recorded as an on target effort). By comparing the expected outcome for Liverpool's 123 shots, had they been attempted by an "average" EPL team, with The Reds' actual outcomes, we can try to gain a better understanding of their season to date.

Shot Expectancy And Actual Outcome For Liverpool's Attack and Defence. 2012/13 To Date.

Team Expected No. of Shots On Target. Expected No. of Shots Blocked. Expected No. of Goals.
"Average EPL Team" 39 31 10
Actual No. Made By L'pool Attack. 29 33 9
"Average EPL Team" 24 21 6
Actual No. Allowed By L'pool Defence. 24 21 12

Looked at over the season so far, there initially appears to be a clear pattern of average performance from the attack because their 123 shots have produced nine goals, just one less than our baseline model predicts for an average team playing similarly average opponents. However, there should be accuracy concerns because Liverpool's shooters have hit the target 29 times compared to an expected 39 and whereas it is possible that the score of 29 is the luck driven outlier, it may be that it will be the goals that begin to fall into line with the wayward shooting.

Based on goals scored Liverpool are about an average attacking side, but shots on target may suggest that currently they aren't quite that good.

Moving on to the defensive qualities of the side, all conclusions should be tentative after even 100+ shots, so Liverpool's defensive record of dealing with opponents' shots, based as it is on 77 attempts, should also be interpreted warily. But from a purely descriptive "this is how we got here" approach the expected goals column stands out.

How accurate a team can be and how quickly their attempts are closed down and blocked is down to a combination of their individual talent and the actions of the defenders. So if we accept that Liverpool have faced a fairly typical cast of opponents so far, defensively, as a unit they are around average at closing down shooters. However, they have conceded double the expected number of goals.

Video analysis would help to decide the cause of this shortfall in goal prevention. It's easy to solely implicate the keeper. But individual mistakes from players in front of him leaving opponents with clearer cut goal attempts or an unusual, but not prohibited run of (poor) outcomes occurring randomly with little actual change in Reina's ability should never be ruled out.
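
To put a rough number on how unusual the run is, the sketch below treats goals conceded as a Poisson count whose mean is the expected figure from the season table. That is an assumption made here for illustration, not part of the shot model, and `poisson_tail` is a hypothetical helper.

```python
from math import exp, factorial


def poisson_tail(k, mean):
    """P(X >= k) for a Poisson variable with the given mean."""
    below = sum(mean ** i * exp(-mean) / factorial(i) for i in range(k))
    return 1 - below


# 12 goals conceded against roughly 6 expected, from the season table above.
p_defence = poisson_tail(12, 6)
print(f"chance of conceding 12 or more when 6 are expected: {p_defence:.1%}")
```

Under those assumptions the chance is roughly 2%: uncommon, but well short of impossible, and a figure picked out after the fact from many possible oddities will always look rarer than it really is.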

Liverpool's Attacking Record By Game Compared To An Average EPL Baseline.  

Team Expected No. of Shots On Target. Expected No. of Shots Blocked. Expected No. of Goals.
Average Team. 5.1 4.0 1.3
Actual No. v WBA. 2 7 0
Average Team. 5.0 4.8 1.2
Actual No. v Man City. 3 5 2
Average Team. 6.0 4.8 1.5
Actual No. v Arsenal. 4 4 0
Average Team. 7.2 6.0 1.9
Actual No. v Sun'land 6 4 1
Average Team 4.8 3.3 1.3
Actual No. v Man U. 6 2 1
Average Team. 5.4 3.7 1.7
Actual No. v Norwich. 6 6 5
Average Team. 5.5 4.8 1.2
Actual No. v Stoke. 2 5 0

Once we start to break down seasons into game by game slices, the numbers begin to become as much a product of Liverpool's opponents as they are of Liverpool. Liverpool may have seen a lot of their shots blocked against WBA simply because The Baggies are extremely adept at blocking shots.

Few teams are currently well known for their ability to block shots, but we can make a more educated estimation around the goal based expectancy. It's likely that Liverpool under performed in the 90 minute stretches that made up the Stoke, Sunderland, Arsenal, WBA games and possibly during the ManU one, while they over performed in the Man City game and massively over performed during the Norwich match.

This illustrates the partly random nature of scoring. Teams can hope for an ideal distribution to provide a maximum points haul, but they have to deal with the distribution of talent based rewards that actually come their way, especially in a short sequence of games. Scoring five at Norwich when an "average on average" matchup would have produced just under two, but then under performing in the majority of other matches,  coupled with accuracy issues from the season long study, may suggest again that goal scoring will be the measurement that will fall more noticeably, rather than accuracy being the one to dramatically increase.

The reality is likely to be a combination of the two, with shots becoming more accurate, goals becoming slightly less efficient and the "new" Liverpool team gravitating to a lowly position within the top ten.

Liverpool's Defensive Record By Game Compared To An Average EPL Baseline. 

Team.                    Expected No. of Shots On Target.  Expected No. of Shots Blocked.  Expected No. of Goals.
Average Team.            6.1                               4.5                             1.9
Actual No. v WBA.        6                                 6                               3
Average Team.            3.4                               3.0                             0.8
Actual No. v Man City.   3                                 3                               2
Average Team.            3.5                               2.8                             0.8
Actual No. v Arsenal.    5                                 2                               2
Average Team.            1.9                               2.1                             0.4
Actual No. v Sun'land.   1                                 3                               1
Average Team.            3.0                               2.0                             1.2
Actual No. v Man U.      3                                 2                               2
Average Team.            5.1                               4.1                             1.1
Actual No. v Norwich.    4                                 4                               2
Average Team.            1.5                               1.8                             0.2
Actual No. v Stoke.      2                                 1                               0

The same strength of schedule issues apply in a game by game look at the Reds' defence, but the worrying fact is that six of their seven opponents have scored more actual goals than the expectation derived from a wider, hopefully average basket of matchups. The two Manchester clubs and Arsenal are clearly above average, but the balance is restored by the presence of Norwich, Sunderland, Stoke and even the currently high flying Baggies. Once again on a game by game basis, collective or individual failings in defence appear to be a big factor in their current plight and may prove to be a vulnerability over the course of the year. Reina is certainly a keeper in the spotlight at present, along with a defence which appears uncomfortable playing the ball from the back.
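The "six of seven" count can be checked directly against the goal column of the defensive table. Again this is just a sketch using the figures printed above; `DEFENCE` and the other names are illustrative.

```python
# (opponent, goals an average team would be expected to concede,
#  goals Liverpool actually conceded), copied from the defensive table above.
DEFENCE = [
    ("WBA", 1.9, 3),
    ("Man City", 0.8, 2),
    ("Arsenal", 0.8, 2),
    ("Sunderland", 0.4, 1),
    ("Man U", 1.2, 2),
    ("Norwich", 1.1, 2),
    ("Stoke", 0.2, 0),
]

# Opponents who scored more than the average expectation.
beat_expectation = [opp for opp, expected, actual in DEFENCE if actual > expected]

expected_conceded = round(sum(expected for _, expected, _ in DEFENCE), 1)
actual_conceded = sum(actual for _, _, actual in DEFENCE)

print(len(beat_expectation), "of", len(DEFENCE), "opponents beat the expectation")
print("Expected conceded:", expected_conceded, "Actual conceded:", actual_conceded)
```

Summed over the seven games, Liverpool conceded close to double the goals the average baseline would suggest.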

Friday 5 October 2012

The Case For Data Analysis In Football.

One persistent criticism that has been aimed at football analytics is that it hasn't overturned any existing notions that have formed around the modern game in the same way that the sabermetrics movement challenged the status quo within baseball.

I do not agree with this assertion.

Before we can address this important point, it would be helpful to give a quick (and almost certainly flawed) overview of the statistical revolution that occurred in baseball. Much has been written concerning the major differences between baseball and football. The former has discrete well defined events whereas football is a true "team on team" event where interactions are both complex and numerous. So the challenges in football are different to those found in baseball.

Secondly, the timescale and resources available to each sport have been vastly different. Advanced baseball analysis was probably kickstarted in the late 70's with the formal self publication of the thoughts of Bill James, who sought to reinterpret statistical measures that had been around since what is now the last century. Ideas were then developed with the extensive self collection of data, a massive project that required huge amounts of cooperation from enthusiasts on a scale that is all but impossible to replicate within football. The gap between the birth of this fledgling movement and any acknowledged impact on the sport itself was then upwards of twenty years, when "Brad Pitt" introduced these "new" ideas to MLB.

By contrast football analytics has had neither the luxury of accumulating large amounts of data, which makes the MCFC and Opta initiative so welcome, nor data laden decades in which to mature, nor so many ancient and flawed targets to demolish.

Many will have their own idea of when football analytics began to evolve, but much of the context setting work started to appear in print and on the internet in the late 90's, mainly based simply on goals scored and allowed and congregating around the many fledgling gambling sites that were and still are so prevalent. Everything from team specific win expectancy, likely final scores and expected times for a first goal to be scored was modeled using two meager team statistics. The importance of different goal scoring environments was recognised and acted upon.

So just as baseball had used models and fundamental data to describe the run and win expectancy of any game in any game state, football amateurs have already done the same for their sport, albeit on sites that are an internet backwater to many.

Football has also used this route to overturn many (journalistic) cliches that persist around the sport. The cup isn't a great leveller, it's the preserve of the Premiership, particularly the Big Four or Five. It isn't harder to play against ten men, it's considerably easier. 2-0 isn't the most dangerous lead in football, it's preferable to 1-0 but not as good as 3-0. A team shooting first in a penalty shootout doesn't automatically inherit a 60% chance of winning. And more recently, raw possession isn't as important as what you actually do with it, Swansea aren't Barcelona, Britton isn't Xavi and, only this week, West Ham aren't Real Madrid.

To overturn a nonsense, you first need that nonsense to exist.

Progress then stalled through lack of meaningful data, until the very recent introduction of various pay sites, resulting in a rapid familiarity with such areas of the field as "the final third". If the MCFC data dump, which in its advanced form comprises less than 0.3% of one season's worth of games and therefore contains an even smaller amount of one year's total data, has merely confirmed perceived wisdom as of 2012, isn't that something to celebrate rather than lament?

Sabermetrics, in the view of its supporters, overturned perceived wisdom because the old time scouts got it wrong. It is hugely encouraging, but not totally unexpected, to realize that the present day "traditional" football analysts, armed with superior tools and a generation or three removed from analysts in another sport, have largely interpreted on field events correctly. And if number crunching can add value and quantify those conclusions, then that's surely even better for everyone involved or even mildly interested in the subject. Collaboration is always preferable to wars and perhaps there isn't a baseball like war to be fought in football. (Rather appropriately baseball is currently embroiled over which flavour of WAR to use).

The fledgling analytics movement within the NFL is probably a much more appropriate field with which to compare football's attempted leap forward. Less developed than baseball, it still has advantages over football (soccer) in terms of simplicity of on field events and access to copious amounts of data. But its success stories are largely the same as those enjoyed by soccer. NFL number crunchers have helped to sort out the correlation and causation conflict between running the football and winning, they exposed tactical inefficiencies in fourth down decision making and they cleaned up their own self inflicted nonsenses such as the "curse" of running back overuse. They've suggested ways to project college quarterback statistics into the NFL and quantified on field events that are predictive of future wins. It apparently helps if you have a quarterback who can throw the ball........but how much does it help?

In terms of analytical progress, the two sports are neck and neck (although soccer is better placed because of its global appeal) and with much more data input from interested parties and more than a few false starts, both should progress rapidly in the future. Although those caveats shouldn't really need to be constantly repeated.

Football analytics is in a great place at the moment.

Monday 1 October 2012

How Passing Sequences Create Chances.

In previous posts we've looked at individual passing attempts and how the pitch position of both the passer and the intended receiver impacts on the difficulty of the pass and therefore the likelihood that the pass will be successfully completed. However, the true worth of a team's passing prowess only really becomes apparent when these individual passes are reconstructed into their original passing sequences. 

All of the data that has been used to construct these passing expectancies has been taken from the Bolton v Manchester City game at the Reebok, so the usual caveats apply. The conclusions may not be entirely representative of the wider Premiership, although the contrasting styles of each team may reduce this potential problem and the relative closeness of the scoreline throughout the game probably means that both teams had a relatively attacking outlook throughout.

Below I've merely listed the number of passing moves in each passing sequence before the passing chain is terminated through loss of possession via such events as a shot, a foul, a misplaced pass or a tackle. Every pass is intended to be completed, so I've included "chains" comprising one misplaced or uncompleted pass and I've also initially terminated chains where a player successfully beats an opponent in a "take on" situation.
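The counting rule just described can be sketched in code. This is a hedged illustration rather than the method actually used on the MCFC data: the event labels, the `(team, event_type)` layout and the choice to count a final misplaced pass as part of the chain's length are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical event labels; a real event feed will use its own names.
CHAIN_ENDERS = {"shot", "foul", "tackle", "take_on"}


def chain_lengths(events):
    """Split (team, event_type) events, in match order, into passing chains.

    A chain's length is the number of passes ATTEMPTED in it, so a single
    misplaced pass counts as a chain of length one (an assumption).
    """
    lengths = defaultdict(list)
    team, count = None, 0
    for ev_team, ev_type in events:
        if ev_team != team and count:
            # Possession changed hands without an explicit terminating event.
            lengths[team].append(count)
            count = 0
        team = ev_team
        if ev_type == "pass_complete":
            count += 1
        elif ev_type == "pass_incomplete":
            count += 1
            lengths[team].append(count)
            count = 0
        elif ev_type in CHAIN_ENDERS:
            if count:
                lengths[team].append(count)
            count = 0
    if count:
        lengths[team].append(count)  # chain still open at the final whistle
    return dict(lengths)


def share_at_most(chains, n):
    """Fraction of chains lasting n passes or fewer."""
    return sum(1 for c in chains if c <= n) / len(chains)


# Made-up events, purely to show the mechanics.
sample = [
    ("City", "pass_complete"), ("City", "pass_complete"), ("City", "shot"),
    ("Bolton", "pass_incomplete"),
    ("City", "pass_complete"), ("City", "take_on"),
    ("City", "pass_complete"), ("City", "pass_incomplete"),
]
result = chain_lengths(sample)
print(result)
```

With real match data, `share_at_most(chains, 4)` would give the proportion of short sequences for each side.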

Both Bolton and Manchester City attempted a very similar number of passing sequences. This is virtually inevitable because a new sequence will tend to start once possession is lost by the opponent, and by including single pass chains we can get a clearer picture of the use each side made of its possession.

The differing passing styles on that particular match day are clearly visible in the figures. Both teams were equally careless when conceding the ball with their first attempted pass, but the visitors had more long passing chains than Bolton. City produced nine passing sequences that lasted into double figures compared to just two for the hosts. Equally striking is the concentration of passing sequences for both teams that terminate after just a handful of passes: upwards of 84% of passing moves lasted for four passes or fewer.

Lengths of Passing Chains from The Bolton v Manchester City Game.

(Table: No. Of Passes In The Chain. Manchester City. Bolton Wanderers.)

Much of the previous detail could probably have been guessed at without looking at the actual data. Manchester City are an expensively assembled side who have quality players throughout with the ability to play a possession based style, while Bolton are traditionally a much more direct team who have recognized the profit to be reaped from delivering the ball rapidly into dangerous areas of the pitch.

It's only when we begin to look at the combined difficulty of each pass that goes to make up each chain, the input from individual players and the likelihood of chains of differing lengths and from differing starting points producing an attempt on goal that we can begin to place passing as a skill into a game context.

We've seen that most passes attempted in a match have an extremely high likelihood of being completed, with the difficulty increasing, predictably, with increasing pass length and in areas in and around the opponent's goal.

Take, as an example, Bolton's longest passing sequence. It was started by Chris Eagles midway inside his own half in the 50th minute and comprised 18 completed passes before the same player dragged a chance wide of City's right hand post. Over half of the 18 passes had completion expectations in excess of 80%, the most difficult pass was attempted by full back Robinson on the 12th pass of the move (58% completion expectation), and Eagles contributed six passes to the move in addition to producing the goal attempt. Based on the data and the individual pass expectations, Bolton had about a 1% chance of completing such a move, and the last four passes were among the sequence's most difficult attempts. Eagles had around a 10% chance of scoring with his effort. Once we put all these numbers together it quickly becomes apparent why football is a low scoring sport!
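The arithmetic behind that 1% figure is simply the product of the individual pass expectations, if we assume each pass succeeds or fails independently of the others. The 18 values below are hypothetical, chosen only to match the description above (more than half over 80%, a 58% low point on the 12th pass, a harder final four); they are not the real figures from the match.

```python
import math

# HYPOTHETICAL completion expectations for an 18 pass move.
PASS_EXPECTATIONS = (
    [0.85] * 8      # passes 1-8
    + [0.70] * 3    # passes 9-11
    + [0.58]        # pass 12, the hardest of the move
    + [0.85] * 2    # passes 13-14
    + [0.70] * 4    # passes 15-18, the difficult final four
)


def chain_completion_probability(expectations):
    """Chance that every pass in the chain is completed (independence assumed)."""
    return math.prod(expectations)


p_move = chain_completion_probability(PASS_EXPECTATIONS)

# Eagles' finishing chance is the roughly 10% quoted in the text.
p_goal = p_move * 0.10

print(f"Chance of completing the move: {p_move:.2%}")
print(f"Chance of move plus goal: {p_goal:.3%}")
```

Even with mostly comfortable passes, eighteen in a row followed by a one in ten finish leaves the whole sequence ending in a goal only around once in a thousand attempts on these made-up inputs.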

This method allows us to pick out not only players who allow longer passing interplay to continue, but also those who may be playing the more difficult passes. Silva and Toure were frequently the most involved players in such Manchester City moves and Silva also linked together small passing sequences with successful "take ons" of opponents. Barry and Milner shared the honours for attempting the most difficult passes of these extended interchanges.

However, given the apparent difficulty in prolonging passing movements and the cash outlay required in player recruitment to build a team capable of producing such sequences, what is the payoff in chance creation? Again sample size is a big caveat, but two factors appear important when turning passing plays into goal attempts. These are, again logically, the area on the pitch where the move originates and the number of completed passes.

Below I've charted how likely it is that a chance will arise from varying lengths of passing sequences, originating from different pitch areas. Winning possession in the final third appears as significant a factor as passing ability. Based on the MCFC data, a three pass sequence originating in the final third is likely to produce a goal attempt 10% of the time. To have a similar likelihood of threatening the goal from your own penalty area, over four times the number of uninterrupted completions is required. Predictably, the combination of last 3rd possession and potent, prolonged passing ability gives a team a better than even money chance of a scoring opportunity arising.

Pass Sequence & Field Position & Their influence On Chance Creation.

Starting Area For Sequence.  Number Of Passes.  Chances of Producing A Goal Attempt.
Edge Of Own Box.             13                 10%
Edge Of Own Box.             3                  0.5%
Last 3rd.                    13                 65%
Last 3rd.                    3                  10%
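A few lines of arithmetic make the size of the field position effect explicit. The four percentages are the ones in the table; the names `CHANCE` and `area_multiplier` are invented for this sketch, and with only four data points nothing here should be read as a fitted model.

```python
# Chance of a goal attempt by (starting area, number of passes), from the table.
CHANCE = {
    ("Edge Of Own Box", 13): 0.10,
    ("Edge Of Own Box", 3): 0.005,
    ("Last 3rd", 13): 0.65,
    ("Last 3rd", 3): 0.10,
}


def area_multiplier(passes):
    """How many times likelier a chance is from the last 3rd than from the own box."""
    return CHANCE[("Last 3rd", passes)] / CHANCE[("Edge Of Own Box", passes)]


# Passes needed from the own box to match a three pass move in the last 3rd.
extra_passes_ratio = 13 / 3

print(f"3 pass moves: {area_multiplier(3):.1f}x likelier from the last 3rd")
print(f"13 pass moves: {area_multiplier(13):.1f}x likelier from the last 3rd")
print(f"Own box needs {extra_passes_ratio:.1f}x the passes for a 10% chance")
```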

By pooling such factors as passing sequence, final 3rd possession and the ability to take on and beat opponents we can highlight where and why scoring opportunities arise, prioritize attainable recruitment of new players and formulate tactics that play to a team's strength and ability.

Swansea make a rare passing excursion into the final third.

We can also add context to such pass orientated teams as Swansea, who invariably begin their sequences deep in their own territory and therefore probably require much longer sequences to create comparable chances, especially with their exaggerated tendency to pass sideways or backwards. On Saturday, the Swans out passed and out possessed Stoke, but were out shot and comfortably beaten on the scoreboard.