Thursday 30 August 2012

A Keeper's Short Passing Ability. An Insignificant Stat That Tells You So Much.

One of the challenges of the recently released Premiership data is trying to marry the figures to on field incidents, while at the same time avoiding misleading conclusions drawn through lack of context or lack of extra detail. Passes and pass completions are one of the most comprehensively covered areas of the data set released by MCFC. Individual passing events are broken down by length, general pitch position and direction, enabling anyone to build an impressive database relatively quickly.

Below I've summarised the completion rates for all short passes attempted by players during the last campaign. As you would expect, passing is a core footballing skill and the majority of players excel at it. Overall completion rates for all players attempting short passes in all areas of the field exceed 80%.

Completion Rates For Short Passes In The Premiership, 2011/12 Season.

Player Position. Completion Rate %.
Striker. 75.8
Midfielder. 84.7
Defender. 83.9
Keeper. 92.1
All Players. 83.3

However, once we break the completion rates down by position we immediately notice a problem. Seemingly, the most accurate practitioners of the short pass as a group are goalkeepers, and the most accurate short passer of a football in the EPL last term was MCFC's Joe Hart. With all due respect to Joe, when data throws up broad results that are so obviously at odds with reality, then the analysis of the figures is plainly flawed.

Thomas Sorensen....Stoke's most accurate short passer last term ?

In this case the cause of the anomaly is extremely easy to spot. If a keeper misplaces a short pass he may quickly go from being the last line of defence to the only line of defence. Therefore, keepers are only likely to attempt short passes if they are extremely confident that the pass will be completed. Short passes rolled out to an unmarked defensive colleague are much easier to complete than an attempt to thread a 3 yard pass to a fellow striker in a crowded area.

So a goalkeeper's superior passing completion rate comes about because of the type of short passes they choose to attempt. We can further confirm our suspicions that keepers are making predominantly easy short passes by looking at the spread of short passing talent that appears to exist between last year's batch of Premiership keepers.

The easier a task is to complete, the more difficult it becomes to quantify the different levels of skill of those undertaking it. For example, asking a group of individuals to work out 2+2 won't really help you if you are trying to sort the mathematically gifted from the merely numerate. If we calculate the spread of short passing talent within 2011/12 keepers, we discover that the apparent difference in talent is indeed tighter for keepers than the values we see for defenders, midfielders and strikers. In short the keepers are, probably wisely, asking themselves easy questions.
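One common way to put a number on "spread of talent" is to take the observed variance of the completion rates within a group and subtract the variance that pure binomial luck would produce on its own. The sketch below assumes that approach; it isn't necessarily the calculation used for this post, and the function name and record layout are hypothetical.

```python
from statistics import mean, pvariance

def talent_spread(records):
    """Estimate how much of the spread in completion rates is real.

    records: list of (completed, attempted) tuples for one position group.
    Returns (observed variance, variance expected from luck alone,
    estimated talent variance floored at zero).
    """
    rates = [completed / attempted for completed, attempted in records]
    total_completed = sum(completed for completed, _ in records)
    total_attempted = sum(attempted for _, attempted in records)
    group_rate = total_completed / total_attempted

    observed = pvariance(rates)
    # Binomial noise for each player at the group rate, averaged over players.
    noise = mean(group_rate * (1 - group_rate) / attempted
                 for _, attempted in records)
    return observed, noise, max(observed - noise, 0.0)
```

Feed it the short pass records for keepers and then for strikers, and a tighter talent variance for the keepers is what the argument above would predict.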

A final additional conclusion can be drawn from this trawl through the passing ability of goalies. After correcting for such variables as number of short passes attempted, we find that teammates who shared duties last year appear to have similar levels of passing ability. Sorensen and Begovic, Jaaskelainen and Bogdan, De Gea and Lindegaard, Given and Guzan, Cerny and Bunn, Reina and Doni: each pair's true short passing ability is within just a few percentage points of each other. This may be coincidental, but it's more likely to also reflect the ability of the relatively constant group of players to whom they are passing. A pass is an interaction between two players and if the receiver is constantly allowing the pass to roll under his boot, his mistake will be reflected in the conversion rates of the passer.

A keeper's passing ability isn't near the top of a scout's must have list, so listing the regressed conversion rates for each keeper based on last year's data isn't likely to interest too many people. However, in piecing together the reasons behind Joe Hart's apparent excellence we have uncovered factors that almost certainly contribute to pass conversion rates for other players in other positions within the team.

Explicit mention of the difficulty of the pass choice is absent from the aggregated list, but it appears to be such a large contributor to conversion rates that it can even elevate goalkeepers as a group ahead of presumably more gifted passing midfielders. It is therefore a hugely important factor. A keeper's choice of the easy short passing option compared to his outfield teammates was easy to spot, but less noticeable is the discrepancy in pass difficulty choices made by players who occupy similar positions in a team. The player with the higher completion rate may also be choosing an easier option and that has implications for player evaluation and balancing the risk reward of on field actions.

By looking at an obscure goalkeeping statistic, we've highlighted three needs: to try to quantify the level of difficulty involved in an individual player's on field attempts, to acknowledge the level of talent surrounding him, and to analyse using only data derived from players plying their craft in similar playing positions, where the risk reward balance for passes will be similar.

The next post will look at how we can begin to address these issues and contextualize passing for outfield players using the available MCFC data.

Wednesday 29 August 2012

Luck In Sport.

I've never tried my hand at archery except on the Wii and I strongly suspect that competence on the video version doesn't adequately equip you for the real thing. Therefore, should I be lucky enough to be allowed to challenge an Olympic archer it would become rapidly obvious where the talent lay in the one sided competition.

Let's imagine I scored zero bulls from ten attempts and my opponent scored a perfect 10. Our knowledge of the task being undertaken is enough for us to decide that hitting the bull is a talent, and the respective scores should give us a strong indication that the gulf in talent between myself and my opponent is huge. From this limited information we could probably conclude that the Olympian could hit around 99% of such shots because they are yet to miss, although we have only seen 10 trials. I could be generously given a likely future success rate of say 1% purely on the basis that I have a bow in my hand and am competing.

If we repeat the process and the returns are 9 from 10 for the archer and 1 from 10 for me, we can begin to form another informed opinion. The archer is still clearly better than me, but no longer perfect, and I have shown enough ability to hit one bull from ten attempts. A reasonable new assessment of the gulf in talent would put the archer at just below his actual 90% strike rate and me just above my figure of 10%. Natural random variation has perhaps favoured the professional and deserted me, so I'm actually a little bit better than I've shown and my opponent is really slightly worse than their fine return implies.

We can continue this balancing of random variation and true ability as our respective scores converge, but at some point, say 6 out of 10 and 4 out of 10, we cease to be sure that the discrepancy in scores is still down to a combination of the two factors. It may now be just down to the same forces that allow a fair coin to yield 6 heads from 10 tosses. In short I may now be the equal of my opponent based on talent, but one of us has had a lucky day and the other hasn't.
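To see how ordinary a 6 from 10 return is when no talent gap exists, here is a quick binomial check for a fair coin. It is plain arithmetic rather than anything specific to archery or football, and the helper names are my own.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def prob_at_least(k, n, p):
    """Probability of k or more successes in n independent trials."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

exactly_six = binomial_pmf(6, 10, 0.5)    # 210 / 1024
six_or_more = prob_at_least(6, 10, 0.5)   # 386 / 1024

print(f"exactly 6 heads: {exactly_six:.3f}")   # 0.205
print(f"6 or more heads: {six_or_more:.3f}")   # 0.377
```

A fair coin lands on exactly 6 heads about one time in five, and on 6 or more well over a third of the time, so a 6 v 4 split on its own says very little about who is better.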

And if two "coins" can demonstrate variation in "talent" where none exists in limited trials, then so can sports players. It's a point worth remembering the next time you look at raw conversion rates for both teams and individuals and try to pick out "the best".

Tuesday 28 August 2012

How Fouls Turn Into Cards.

A largely uneventful Premiership clash between Arsene Wenger's Arsenal and Tony Pulis' Stoke City at the Britannia Stadium on Sunday was partly enlivened by the pregame war of words between the two managers. Their antipathy towards each other isn't well disguised and the current spat predictably centred around the disciplinary record of both sides. Pulis mused that Arsenal were hardly strangers to red or yellow cards during Wenger's tenure at the club, while Arsene wondered aloud as to where Stoke had finished in last season's Barclays Fair Play League.

At first glance Wenger's point was the stronger, as Stoke finished 20th and bottom of the Fair Play League and Arsenal finished 7th. However, Barclays and the Premier League appear to have opted for a wide definition of Fair Play, and red or yellow cards only comprise a minor portion of the total points used to decide the table order. An identical proportion of points is also awarded for "positive play". Points are awarded for adopting an attacking outlook and continuing to press for goals even when a team is already in the lead.

It is rather apt that a competition designed to reward risk taking, even when this approach may endanger wider objectives, such as Premiership survival, should be sponsored by a global bank and unsurprisingly Stoke and other less successful sides suffer particularly in this category. Taking only cards accrued, Stoke would have finished 8th compared to 17th for The Gunners. Few people would immediately recognize attacking intent as an obvious component of fair play, so perhaps we need a more focused method based on fouls and cards.

The recent data dump of player actions from the 2011/12 season has enabled a much more granular approach to be undertaken and, while there are obviously flaws in much of the analysis, we can begin to explore cards and discipline in much more detail. The data contains every match appearance for every player over last season, and fouls and cards are recorded on a game by game basis, although the reason for the caution is omitted. Around 80% of the cards issued in the EPL were for foul play, with offences such as dissent accounting for the rest. We can therefore delete players from the list who were cautioned without committing a foul, leaving a fairly homogeneous sample that relates cards issued largely to fouls.

I first set a baseline probability for the number of fouls a player needs to give away before a booking becomes the most likely outcome. Pitch position and recklessness of the contact aren't recorded, but we can use player position as a reasonable proxy for the former. Defenders will be making the majority of their challenges and potential fouls in areas in and around their own box and they will also be more likely to be illegally preventing chances from being created or taken. Therefore it seems reasonable to assume that they will be given less leeway than an habitually "clumsy" striker. Regression analysis was used on last year's individual player data to calculate the chances of each set of defenders, midfielders or strikers ending a match with at least a yellow card to their name given various levels of fouling.

The chart demonstrates the handicap under which defenders have to operate. They are over twice as likely to receive a card for committing the same number of foul challenges as are strikers, the players they are more often challenging for the ball. Strikers only become more likely than not to leave the pitch with a caution when they have conceded 8 or more fouls, compared to just 4 for defenders. Midfielders are allowed 5 challenges before a caution becomes odds on and are also treated much less leniently than out and out attackers. A striker who infringes frequently near to the opponent's goal rarely seems to be cautioned for persistent fouling, whereas a defender risks this penalty much earlier in the cycle.

To take an obvious example, Bolton's Kevin Davies in 2011/12 committed 1 foul in each of seven games, 2 fouls three times, 3 fouls four times, 4 fouls twice and 5 fouls three times. Using the regression lines for strikers he would have expected to receive three yellow cards last term, which is precisely what happened. Had he been judged as a defender, he could have expected an average of just over seven cautions.
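The Davies sum can be sketched as follows: take a curve that gives the chance of a booking for a given number of fouls in a match, then add that chance up over every game he played. The foul counts are the ones quoted above. The logistic form and both sets of coefficients are my own placeholders, picked only so that the curves cross 50% at roughly 8 fouls for strikers and 4 for defenders; they are not the fitted regression lines, so the totals will not exactly match the figures in the post.

```python
from math import exp

# Games grouped by fouls committed, as quoted for Kevin Davies in 2011/12.
DAVIES_GAMES_BY_FOULS = {1: 7, 2: 3, 3: 4, 4: 2, 5: 3}

# PLACEHOLDER (intercept, slope) pairs, NOT the fitted values from the post.
STRIKER = (-3.0, 0.4)
DEFENDER = (-1.9, 0.5)

def card_probability(fouls, intercept, slope):
    """Assumed logistic curve: chance of a booking given fouls in one match."""
    return 1.0 / (1.0 + exp(-(intercept + slope * fouls)))

def expected_cards(games_by_fouls, intercept, slope):
    """Sum the per match booking chances over a season of games."""
    return sum(games * card_probability(fouls, intercept, slope)
               for fouls, games in games_by_fouls.items())

as_striker = expected_cards(DAVIES_GAMES_BY_FOULS, *STRIKER)
as_defender = expected_cards(DAVIES_GAMES_BY_FOULS, *DEFENDER)
print(f"judged as a striker:  {as_striker:.1f} expected cards")
print(f"judged as a defender: {as_defender:.1f} expected cards")
```

Swapping in real fitted coefficients for each position is all that would be needed to repeat the exercise for any player in the data.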

Davies pulls Huth's shirt, Huth kicks Davies, the ball is just a passing bystander.

Few will be surprised that defenders have to be especially careful about testing a referee's patience through persistent fouling. Another preconceived notion that we can test involves the general leniency allowed to home players compared to travelling guests. Defenders require fewer indiscretions before they enter carding territory, so I've similarly calculated the likelihood of a booking arising for an increasing number of fouls committed by defenders both at home and on the road. The effect is much less pronounced in this case, but it appears that two typical fouls away from home give a defender a 1 in 3 chance of seeing yellow, compared to the slightly more lenient 2 in 7 chance if the offences come in front of his own fans.

How Defenders Who Foul Fare At Home and On The Road.

Every step along the road to transgression, visitors are more at risk. However, we are only dealing with raw counted numbers here. Pitch position for the foul may be more advanced away from home due to a more adventurous approach from the hosts, and the referee may quite rightly judge a foul on the edge of the box more worthy of a caution than two in more benign areas of the pitch. That would effectively elevate the overall likelihood of the same number of fouls resulting in a caution for a visiting defender compared to a hometown player. We certainly have evidence for visitors being more harshly treated in terms of foul numbers leading to cards, but that harshness may be fully justified.

These two examples begin to show the depth of analysis that is possible with more granular data. Bolton's apparently lenient treatment from referees last season, where they committed, on average, a near league high number of fouls per booking, is fully explained by the likely area of the pitch in which the fouls were made. Bolton's forwards were responsible for 34% of their team's fouls compared to a league average of just 22%.

Similar broad conclusions can be teased from this extensive data set that may hint at the slight variations in refereeing stance that exist over different match ups and how players of differing styles are dealt with. By isolating games between the Top Four and the rest of the league, there appears to be good evidence that referees are partly protective towards the bigger teams, especially when facing inferior opposition. Players from inferior teams are more likely to be booked after just one foul than their more illustrious opponents in the same game. However, the referees appear to realize that they have taken this stance, because the balance then switches with increasing fouls and the players from the Big Four are treated slightly more harshly as they become multiple offenders.

How A Player's Chances Of Being Carded Changes With The Matchup.

[Chart: card probability plotted against number of fouls. Series shown: All Matchups, All Players; Big 4 (vs Rest); Rest (vs Big 4); Rest (vs Rest); Big 4 (vs Big 4).]

There also appears to be good news for combative players who make many tackles and are involved in lots of one on one duels. Refs appear to appreciate that the risk of fouling increases with increased involvement and players who make a large number of legal challenges are given slightly more leeway when they do foul compared to teammates who make far fewer challenges, but are quickly prone to illegality.

Examples Of How Previous Good Behaviour During A Match Can Help.

Number of Challenges by Player During The Game. Number of Fouls. Chance Of Being Booked.
4 1 0.214
34 1 0.107
6 2 0.329
21 2 0.245
4 3 0.495
18 3 0.400
9 4 0.618
21 4 0.539

Bookings, it would seem, are more complex than merely taking cards to fouls ratios. Even without field positional data or even considering possible conflicts of cause and effect, we can start to scratch below the surface and begin to see if teams use fouls as a tactical ploy, if they might be aware of which areas of the field and players are less likely to draw a card and if their card count is merited by their foul count.

Where Teams Do Their Fouling.

Team. Proportion of Fouls By Defenders. Midfielders. Strikers.
Arsenal. 34% 56% 10%
Stoke. 42% 30% 27%

We can begin to use all of the above observations to look at the yellow card records of Stoke and Arsenal to get a better comparison than the one provided by the Fair Play League. The table above highlights who committed the fouls for each team and by extension we can conclude where on the pitch they were likely to have occurred and furthermore what their cumulative expected card total would be.

Arsenal appear more likely to disrupt opponents in the midfield region, whereas Stoke operate in the more risky final third of the pitch. Obviously there isn't a clear demarcation line past which a defender cannot make a challenge, but from the available data it makes sense to assume that midfielders make their challenges, on average, further up the field than do defenders. If we now isolate, for both Stoke and Arsenal, the number of fouls committed by each player, sorted by position and allowing for such factors as type of opponent and venue, we can calculate the expected number of cards each side would receive under the current refereeing climate.

Did Stoke and Arsenal Receive The Cards They Deserved In 2011/12.

Team. Expected Cards From Fouls. Actual Cards From Fouls.
Stoke. 54 51
Arsenal. 52 51

In light of the necessary approximations the agreement between expectation and reality is good. Stoke committed 450 fouls compared to 400 for Arsenal, but both teams ended the season with 51 yellows by way of foul. Stoke benefited from a larger proportion of fouls by strikers. These tend to occur much higher up the pitch and officials have become accustomed to treating such offenders more leniently. The Potters also reduced their card count compared to raw foul numbers by virtue of the larger number of entirely fair challenges made, especially by their defenders.
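For anyone who wants the raw ratio that the Fair Play argument usually rests on, the sums are below. The foul totals, the actual cards and the expected cards all come from the paragraph and table above; the dictionary and function names are just my own labels.

```python
FOULS = {"Stoke": 450, "Arsenal": 400}
ACTUAL_YELLOWS_FROM_FOULS = {"Stoke": 51, "Arsenal": 51}
EXPECTED_YELLOWS_FROM_FOULS = {"Stoke": 54, "Arsenal": 52}

def fouls_per_card(team):
    """Raw fouls committed for every yellow card received for a foul."""
    return FOULS[team] / ACTUAL_YELLOWS_FROM_FOULS[team]

def cards_versus_expectation(team):
    """Actual cards minus expected cards; negative means fewer than expected."""
    return ACTUAL_YELLOWS_FROM_FOULS[team] - EXPECTED_YELLOWS_FROM_FOULS[team]

for team in FOULS:
    print(f"{team}: {fouls_per_card(team):.1f} fouls per yellow, "
          f"{cards_versus_expectation(team):+d} cards against expectation")
# Stoke: 8.8 fouls per yellow, -3 cards against expectation
# Arsenal: 7.8 fouls per yellow, -1 cards against expectation
```

On the raw ratio alone Stoke look leniently treated, but once position, opponent and venue are folded into the expected figure, both sides sit within a few cards of where they should be.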

In short the respective records of each team, from raw foul numbers to yellow cards last season were, like Sunday's result, stalemated.

Thursday 23 August 2012

Why Sample Size Matters.

Easily the biggest improvement that can be made in the analysis of player related football data revolves around how issues of sample size are incorporated into the process. It is universally recognized that the outcome of just one trial or event alone can add little to our knowledge of a player's real ability and if playing talent is being judged solely on data collected over a single season, it has become customary to omit data that has originated from a small number of games.

If the only information we have about a player is that he has an extraordinary strike rate from a limited number of shots, many will balk at placing him at the top of the scoring charts, choosing instead to omit his figures until more become available.

However, this approach, as well as being unfair to an unknown striker off to a hot streak, doesn't entirely eliminate the problem of a player's recorded stats being a mere sample of his true ability. Even if we limit our study to players who have recorded a minimum, but arbitrarily chosen, number of attempts, we haven't removed the problem of unrepresentative, randomly driven outcomes over or underrating a player's real long term ability. We are at best simply reducing the effect.

As many are aware, Manchester City in conjunction with Opta have released a huge csv file of individual player data for the English Premiership during 2011/12. The range and detail of the data has been discussed over various blogs and the reaction has been, quite rightly, predominantly positive. Data collection for the hobbyist can be extremely time consuming, even assuming that the data is available to collect. So Opta and Gavin Fleig's groundbreaking and bold decision to turn over to the analytical community such a large amount of data for free, with the promise of more to come, is to be welcomed.

This outpouring of data provides an opportunity to develop new, useful and predictive metrics. But it is also very likely that we will see extravagant claims being made in regard to what these new metrics tell us about the perceived talent of individuals, mostly through neglecting to address sample issues.

The ultimate strength of advanced sporting analysis comes about through predicting and explaining sporting contests and the raw data is always going to be the building block for this aim. However, presenting newly sourced raw data, even with the benefit of visualization software and corrected by appearance or opportunity is merely a different way of describing what occurred on the field of play over a series of sample size restricted events.

As a descriptive archive it is valid and valuable, but once such statistics alone are used to claim knowledge of a player's real talent and potential, their usefulness becomes over stretched.

Advanced analytics only comes of age when practitioners fully acknowledge the differences between descriptive numbers and the predictive metrics that are subsequently derived from such raw data, hopefully stripped of as much random baggage as is possible. Editorial over :-).

In recent posts concerning crossing and goalkeeping stats, I've illustrated how the overall predictive properties of such numbers improve once sample size is addressed and particularly extreme outliers are reined in towards the group average. So for this post I will outline how I've collected and used aggregated data to evaluate how Premiership defences dealt with both passes made into the final third and crosses made into the box during 2011/12. I will also look at how many sample repetitions we may reasonably need to see before we can begin to form an opinion about talent levels across different teams, and outline some of the assumptions that are required when using aggregated data.

Aggregated data for successes and failures from final third passes and crosses is available at such sites as Opta reseller EPLIndex. However, in common with many sports, it is only provided from the viewpoint of the attack. If you want to know how Stoke deal with crosses, you need to manually record the data from each game at such Opta driven apps as Fourfourtwo's Stats Zone. Recording 380 games, each comprising two sides, takes around seven hours for one season.

At the end you are left with summarized data for over 100,000 passes that were made into the final third of the pitch by the attacking sides over a season, of which around 70,000 were deemed successful and resulting in over 400 occasions when the recipient of the pass went on to ultimately score. Similarly for crosses, over 16,000 were attempted, just under 4,000 reached a teammate and over 200 led directly to goals.

Our first assumption is therefore that sheer weight of numbers makes the quality of crosses or passes comparable for all sides. A full and complete calendar of games ensures that strength of schedule is almost identical for all 20 teams, the only difference being that they obviously don't play themselves. But we also need to believe that each team is defending a similar proportion of hopeful long balls as it is delicate, defence splitting passes in and around the area.

Last season, Chelsea faced 4300 passes into its final third of the pitch and 2737 of those attempts successfully reached an opponent, so they allowed a 63.5% success rate. Relegated Blackburn faced 6020 such passes and allowed a success rate of 71.3% when 4290 found their intended target. Overall the raw efficiency range between best and worst runs from just over 59% to Blackburn's 71.3%. Remember we are looking at this from a defensive perspective, so the lower their opponent's completion rate, the better the job the defence and midfield are doing of limiting final third completions.

Surely with trials running into the thousands we can take the efficiency rates at face value? The large spread in efficiency ratings over such a number of repetitions indicates that causes other than random variation are certainly behind the numbers. These are likely to be partly skill driven and partly tactical. But even in such a large data collection, an improved efficiency figure results from pulling extreme values towards the mean. The adjustments for allowing pass completions in the final third are small, but even with very large samples there is a case for making them.
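A simple way to pull a raw rate towards the mean is to add a fixed number of imaginary extra trials at the group average before dividing. The sketch below shows that general technique using Blackburn's figures from above. Both the size of the "ballast" (200 passes) and the group average (0.657, the unweighted mean of the raw completion column in the table further down) are assumptions made for illustration, not the values used to build the regressed columns in this post.

```python
def regress_rate(successes, trials, group_mean, ballast):
    """Shrink a raw rate towards group_mean.

    `ballast` is the number of pseudo trials added at the group rate:
    the bigger it is relative to `trials`, the harder the pull.
    """
    return (successes + ballast * group_mean) / (trials + ballast)

# Blackburn allowed 4290 completions from 6020 final third passes.
raw = 4290 / 6020

# ASSUMED values, for illustration only.
GROUP_MEAN = 0.657
BALLAST = 200

regressed = regress_rate(4290, 6020, GROUP_MEAN, BALLAST)
print(f"raw {raw:.3f} -> regressed {regressed:.3f}")
```

With over 6000 trials the ballast barely moves the figure, which is why the adjustment to completion rates is so small. Apply the same ballast to a player with 20 passes and the group average would dominate.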

To further illustrate the need to regress raw efficiency rates, we can take a different criterion for success and look at a defence's ability to prevent final third passes being converted into a goal by the player who received the pass. Successes for the opposition are naturally much less frequent in this case. Wolves for example allowed 38 goals from 5156 final third passes for an (in)efficiency rate of 0.74% or a goal every 135 such passes, compared to Manchester City who succumbed once every 500 passes.

On this occasion there is still team input into the differing efficiency rates recorded by different defences, but that input is less pronounced and random luck is more of a factor than when mere pass completion is used as the defining factor. You could begin to evaluate a team's ability to prevent final third pass completion after a game or two, but accurately evaluating ability based on goals allowed from the same type of pass would require almost a third of a season.
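One rough way to see why rare outcomes need far more trials is to compare the binomial standard error with the size of the rate itself. This is only a sampling noise calculation, not the method behind the "game or two" and "third of a season" estimates above, and the two rates and the 5000 pass sample are round numbers chosen to resemble the data rather than figures taken from it.

```python
from math import sqrt

def relative_noise(rate, trials):
    """Binomial standard error expressed as a fraction of the rate."""
    return sqrt(rate * (1 - rate) / trials) / rate

TRIALS = 5000            # roughly a season of final third passes faced
COMPLETION_RATE = 0.66   # illustrative rate for completions allowed
GOAL_RATE = 0.004        # illustrative rate for goals allowed

print(f"completions: noise is {relative_noise(COMPLETION_RATE, TRIALS):.1%} of the rate")
print(f"goals:       noise is {relative_noise(GOAL_RATE, TRIALS):.1%} of the rate")
```

Over the same 5000 passes the luck component is about 1% of the completion rate but over 20% of the goal rate, so the rare event needs a much bigger sample before differences between teams can be trusted.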

Below I've listed the regressed success rates for both categories of outcome for final third passes allowed by Premiership teams during last season. The numbers are the best guess of how teams will perform in 2012/13 based on knowledge from only 2011/12 and an undoubted improvement on raw efficiency figures. Again, low efficiency figures are preferred because defences don't want their opponents scoring from or maintaining possession of passes played into the final third.

Regressed & Raw Rates At Which Defences Allowed Pass Completions or Goals From Final Third Passes In The EPL 2011/12.(Blue are the Top Five Defences, Red are the Bottom Five).

Team. Regressed Efficiency Based on Pass Completion. Raw Rate. Regressed Efficiency Based on Goals Allowed. Raw Rate.
Man City. 0.635 0.634 0.00270 0.00205
Stoke. 0.651 0.651 0.00301 0.00260
A Villa. 0.665 0.665 0.00312 0.00275
Man Utd. 0.651 0.651 0.00313 0.00267
Sunderland. 0.660 0.660 0.00314 0.00280
Liverpool. 0.614 0.612 0.00342 0.00310
Everton. 0.639 0.638 0.00381 0.00367
Spurs. 0.630 0.629 0.00382 0.00368
WBA. 0.680 0.680 0.00395 0.00387
Newcastle. 0.650 0.650 0.00396 0.00388
Fulham. 0.687 0.688 0.00429 0.00434
Chelsea. 0.636 0.635 0.00433 0.00441
Swansea. 0.677 0.678 0.00452 0.00465
Wigan. 0.672 0.673 0.00452 0.00466
Norwich. 0.685 0.686 0.00454 0.00466
QPR. 0.674 0.675 0.00464 0.00480
Arsenal. 0.596 0.593 0.00500 0.00536
Blackburn. 0.711 0.713 0.00516 0.00548
Bolton. 0.642 0.641 0.00548 0.00600
Wolves. 0.680 0.681 0.00649 0.00737

Briefly digesting the figures, the regressed first and third columns are much more likely to be the kind of rates enjoyed by each team this season and the second and fourth columns are the rates that each team actually recorded during 2011/12.

There's mixed news for Arsenal, who were the most impressive team at denying passes reaching their intended target, but better only than the three relegated sides at preventing received passes turning almost instantly into a goal. They are likely to post similar figures to last year in the first category, but should show natural improvement when trying to deny teams a goal. Stoke, the weekend opponents of The Gunners, share the honours with Manchester City in goal prevention efficiency terms, reinforcing their commitment to denying opponents opportunities in the face of little desire for ball retention.

Regressed & Raw Rates At Which Defences Allowed Completions or Goals From Crosses In The EPL 2011/12. (Blue are the Top Five Defences, Red are the Bottom Five).

Team. Regressed Efficiency Based on Cross Completion. Raw Rate. Regressed Efficiency Based on Goals Allowed. Raw Rate.
Man City. 0.218 0.200 0.0132 0.0110
Stoke. 0.220 0.207 0.0131 0.0109
Arsenal. 0.223 0.210 0.0128 0.0087
Everton. 0.225 0.215 0.0132 0.0115
Chelsea. 0.227 0.218 0.0136 0.0134
WBA. 0.229 0.224 0.0121 0.0073
Sunderland. 0.231 0.228 0.0136 0.0136
Norwich. 0.233 0.228 0.0132 0.0118
Liverpool. 0.233 0.231 0.0141 0.0165
Newcastle. 0.233 0.232 0.0133 0.0118
Fulham. 0.235 0.233 0.0126 0.0088
A Villa. 0.237 0.236 0.0139 0.0146
Spurs. 0.240 0.239 0.0142 0.0163
Swansea. 0.241 0.246 0.0130 0.0107
Man Utd. 0.242 0.250 0.0138 0.0142
QPR. 0.242 0.249 0.0146 0.0177
Blackburn. 0.242 0.248 0.0141 0.0153
Wolves. 0.243 0.250 0.0156 0.0225
Bolton. 0.247 0.260 0.0155 0.0221
Wigan. 0.247 0.260 0.0138 0.0143

The same methodology can be used in relation to the rate at which defences allow crosses to find an attacking player and how often they are converted. Once again raw rates describe exactly what happened over a series of matches, but regressed figures will be more predictive in future seasons. Manchester City only allowed 1 in 5 successful crosses in 2011/12, but a slightly less impressive 2 in 9 wouldn't surprise this term. Similarly, WBA's defensive set up weathered an average of 137 crosses before giving up a goal directly from the cross ball, but under more neutrally lucky conditions opponents can expect to average slightly more than 80 crosses to score during the current season.
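The "1 in 5" and "137 crosses" style figures are simply the reciprocals of the rates in the table, as this short check shows. The four rates are Manchester City's raw and regressed cross completion figures and WBA's raw and regressed goals allowed figures; the helper name is my own.

```python
def one_in(rate):
    """Average number of attempts per success for a given rate."""
    return 1.0 / rate

# Manchester City, cross completions allowed: raw then regressed.
print(round(one_in(0.200), 1))   # 5.0, i.e. 1 in 5
print(round(one_in(0.218), 1))   # 4.6, close to 2 in 9

# WBA, goals allowed direct from crosses: raw then regressed.
print(round(one_in(0.0073)))     # 137 crosses per goal
print(round(one_in(0.0121)))     # 83 crosses per goal
```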

Stoke's Defence Prevent Yet Another Cross From Reaching its Intended Target.

These numbers indicate that even large team attempt totals do not guarantee that simple raw rates can be taken at face value. Therefore individual player statistics, built on far fewer attempts, are bound to necessitate even larger amounts of group average rates being added to their numbers. Gradually football is realising that it must follow other sports and regress its raw stats to add mightily to their value. While interactive, state of the art presentation of newly released data is to be welcomed, we must not complacently allow it to become the orthodoxy for evaluating actual team or player talent.

To register to download the EPL data from Manchester City, Opta and Gavin Fleig click on this link

Monday 20 August 2012

Tracking YaYa Toure's Acute Accent.

The best game of the opening week of the Premiership season, presupposing that Manchester United and Everton don't reenact last year's 4-4 draw, took place at the Etihad Stadium, where City continued their love affair with 3-2, come from behind victories against inferior opposition.

How Success Rates Fluctuated For Both Teams at The Etihad Stadium.

0-0,Penalty awarded to Man City, 17'
0-0,Penalty missed by Silva,17'

The descriptive nuts and bolts of the match will be well described throughout the blogging community, so I'll merely add a few observations. City were unsurprisingly huge pre game favourites, with around an 80% chance of winning and an expected combined success rate, which counts a draw as half a win, of just under 0.9.

The penalty award moved their expected success rate to over 0.93, but Silva's tame effort dropped it back again. The highest expected success rate that new boys Southampton attained was 0.66, when they led 2-1 after 71 minutes, immediately prior to Dzeko's equaliser. That position was comparable to the one enjoyed by City ten minutes later, when they were merely level at 2-2, indicating the long term potency of superior ability combined with home advantage even in time limiting situations.

In this post I introduce a way to track each team's in game satisfaction with the current scoreline and how their position is likely to alter the balance between attacking intent and defensive responsibilities.

Game States For Manchester City v Southampton.

Each team's pre game expected success rate is used as their benchmark figure and as the game progresses the difference between what they, on average initially hoped to take from the game and what the current score/time remaining combination indicates they may take from the game is charted. Plots below the zero line indicate an under performance and this is quantified by the length of the bar.

For example, 30 minutes in and with the game still 0-0, Manchester City are underperforming because they are strong favourites to win. However, the relatively short green bar, reassuringly for their fans, indicates that there is still a considerable amount of time remaining during which City are very likely to be able to turn their superior skills into a win. Contrast this green City bar at 30 minutes with the much larger negative one at 80, just prior to Nasri's winner. The game is again tied, but now time constraints mean that City have much less time to impose their talent on Southampton and claim all three points.

It's probable that Southampton are happy at this point, 10 minutes left and currently in possession of a point, so we can guess that Southampton will be concentrating on defence, City more so on attack. Once City retake the lead their current expected position then becomes stronger than their pre game expectations, denoted by their green line flipping to above the axis. Time to adopt a more defensive approach, perhaps ?

Overall City's match satisfaction plots for the Southampton game can be broken down into four distinct periods. The first was one of slight under expectation, but with ample time for class to tell, prior to Tevez's opening goal. (The positive blip at 17' was transient and a result of the award of the subsequently missed spot kick.) The most substantial period of City under expectation ran from Lambert's equaliser up until Nasri's winner. Either side of that spell were two periods where City's expectation overshot their pre match expectation.

Fourfourtwo's stats app provides an excellent way to track a team's tactical approach over time by collecting both the average position and the extent to which players are involved in game events and denoting this involvement by the size and position of each player's name on a 12 by 6 cm pitch grid. Yaya Toure is a hugely influential City midfielder who has managed to retain my complete admiration for his abilities despite scoring the winning FA Cup Final goal against Stoke. So he must be very good indeed.

If we use 442's app to trace Toure's involvement and position during the four different periods of expectation "enjoyed" by City and their supporters on Sunday, we can use the results as a reasonable proxy for City's overall approach. I've settled on the position and size of the acute accent at the end of Yaya's name as my reference point.

Influence & Pitch Position of Yaya Toure During the Four Phases of Manchester City's Game with Southampton.

Game Period. Average Distance from Southampton Goal. Average Distance Left of Centre Spot. Average Length of Acute Accent.
Kick Off to 40'. 31.6 4.9 1.5
41' to 59'. 25.9 1.2 1.5
60' to 80'. 16.3 4.9 2
81' to Full Time. 54.6 14.8 0.05

Game periods when City's expected success rate was below the pre game figure are marked in red and in blue when they were on target for all three points. All distances are in yards.

Yaya is significantly more involved in the areas around Southampton's goal during the 20 minutes when the visitors were either level or ahead and City's potential average returns were well below expectations. The size of his influence on the game is also at its greatest during this period. Had his name been etched on the pitch in relation to his involvement his acute accent would have measured almost 2 yards! He was extremely influential and attack oriented when City needed those qualities more than ever.

Once City regain the upper hand following Nasri's goal, Toure's accent drops back to a position very nearly on the halfway line and adopts a much wider position. His influence also declines dramatically, illustrating that the final ten minutes became very much more a team effort.

In short, Manchester City's approach to the game ebbed and flowed over the 90+ minutes, but how they prioritized attacking intent was very much dependent upon the state of the game. That is reflected in Toure's sphere of influence and, by inference, in City's granular attacking stats such as goal attempts and final third passes. Context again is hugely influential when analysing the statistics.

Sunday 19 August 2012

Quantifying Which Team Is Happier With The Current Scoreline.

One vital ingredient you need to add to your analysis of a football match is the current score context. Teams adjust their playing style depending upon whether they are comfortably ahead or well behind, generally adopting a less attacking and more defensive stance in the former case and vice versa in the latter. These micro shifts in emphasis can in turn impact upon in game match events such as shots and saves. For example, a team chasing a game may produce more goal attempts, but they are often from further out, instigated by a wider pool of players and subject to more defensive pressure. This kind of subtle difference in shot or save quality is often lost in aggregated data, but visible in more granular batches.

The general game state for each team in a match is fairly easy to qualify if one team has an advantage on the scoreboard, but actually quantifying these positions, as well as the numerous occasions when the match is stalemated, requires more effort. Arsenal and Sunderland were level after an hour at the Emirates on Saturday and the visitors from the North East would have been much more comfortable with the scoreline than were their hosts. The real question is how much happier were Sunderland ?

One way to add context to such games is to calculate the combined win and draw expectancies for each team in running and track the change in these values compared to where they stood at kick off.

Full Time Score. 0-0.

Arsenal were unsurprisingly large pregame favourites to beat Sunderland. They were in the region of 70% likely to win the game and a shade under 20% to draw it. Combined, these two figures suggest that Arsenal's long term success rate (wins + half draws divided by games played) from such a match up would average just under 0.8, making Sunderland's long term success rate just over 0.2.

As the game progresses, goal expectancies for each team also decline and at the still stalemated hour mark Arsenal's win probability would have been in the region of 0.5 and their predicted long term success rate for this game position would have declined from just under 0.8 at kick off to just under 0.7 now. In raw terms The Gunners had lost 0.1 of their pre match predicted success rate by their failure to score. In running success rates of rivals are intimately entwined, they must always total one, so Sunderland had seen their pre game success rate climb by the same amount.
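The arithmetic behind the Arsenal example can be sketched in a few lines. This is only an illustration: the function names are my own, the 0.19 pre game draw figure stands in for "a shade under 20%", and the hour mark draw probability of 0.38 is an assumed value chosen so that the success rate lands just under 0.7 as described.

```python
def success_rate(p_win, p_draw):
    """Expected success rate: a win counts as 1, a draw as half a win."""
    return p_win + 0.5 * p_draw


def game_state(pregame, in_running):
    """Positive when a team is ahead of its pre game expectation."""
    return in_running - pregame


# Approximate figures from the text; the 0.38 draw probability is assumed.
arsenal_pre = success_rate(0.70, 0.19)
arsenal_60 = success_rate(0.50, 0.38)

# The two sides' success rates always total one.
sunderland_pre = 1 - arsenal_pre
sunderland_60 = 1 - arsenal_60

print(round(game_state(arsenal_pre, arsenal_60), 3))
print(round(game_state(sunderland_pre, sunderland_60), 3))
```

Because the two success rates sum to one, whatever Arsenal lose by failing to score is exactly what Sunderland gain.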

In this situation using game win and draw probabilities allows the 60th minute to be contextualized for both Arsenal and Sunderland. In non numerical terms, Sunderland are very happy and Arsenal aren't and this will partly dictate how each team approaches the final third of the match.

A further example from yesterday illustrates the effect of  goals in a more evenly matched game such as Newcastle's entertaining of Tottenham.

1-0, D Ba, 55'
1-1, Defoe, 76'
2-1, H B Arfa (pen), 81'

Newcastle are probably inferior to Tottenham at the moment, but home advantage gave them a very slight match day edge. In contrast to the Arsenal/Sunderland game, while the game remained or became level, both teams were probably fairly happy with the scoreline. This is denoted by the closeness of each team's plots to the neutral zero line and tactical approaches from both sides are likely to mirror those used league wide in evenly matched contests. Tottenham found themselves twice in losing positions, one of which they managed to salvage and league wide, teams attempt this rescue operation by committing more to attack than defence. 

Incorporating time and score specific success rate movements such as these into more granular shot data will prevent erroneous conclusions regarding team shot conversion rates being formed and incorporated into aggregated totals. Teams, even good ones, may appear to have declining shot conversion rates from previous years or previous months, but often this is because they have found themselves trailing or drawing more often and have therefore more frequently faced overtly defensively minded opponents. This variation in game position is to be expected for all teams across and during seasons. One year Arsenal will find themselves trailing or drawing more often than previously through random variation rather than a sea change in team quality.

Aggregated data can be very useful, but it can also mislead. 

Thursday 16 August 2012

The Case For Crosses.

The recently departed Euro Finals provided a paradox for advocates of different styles of play. Spain largely did away with the conventional centre forward, choosing instead to play intricate, short passes in the final third while patiently waiting for an opening to appear. Meanwhile many of the remaining teams, England among them, threw crosses into the box and reaped a fairly substantial reward. The Barcelona/Spain approach is certainly widely admired and Andy Carroll's towering header against Sweden, rather than being seen as a magnificent feat of precision crossing by Gerrard and athletic finishing by the forward, was regarded as the act of a team playing in a tactical backwater.

Crossing, it has been suggested, is not only outdated, it is also a waste of precious possession, an inefficient mode of scoring and a hostage to luck rather than skill. Teams throw the ball into the mix and hope for the best.

Fortunately, data now exists that can help to quantify and compare each approach. I therefore collected data for every cross attempted during the 2011/12 EPL season and recorded the frequency with which the ball reached its intended target and the number of times the cross resulted in an immediate goal.

The first charge often leveled at crosses is that they require large amounts of luck to result in a goal. Even if luck were the overriding factor in scoring from crosses, we would still see a variation in conversion rates between teams. Toss a coin 800 times (the average number of crosses attempted by a team in a season) in each of 20 batches and some coins would appear to be more adept at scoring heads than would others simply as a result of natural, random variation. So just because Chelsea require fewer crosses to score one goal than do Wolves, we cannot automatically assume that Chelsea are more skilled at crosses. Instead we need to see if the conversion rates from crosses vary between teams in such an extreme way that we can conclude that the spread is due not only to natural expected variation, but also to player input that we may attribute to differing levels of skill. This may be the crosser's ability to more accurately deliver the ball or the striker's ability to lose his marker and direct the ball goalwards.
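The coin tossing thought experiment is easy to try out. The sketch below is only an illustration of the idea, not the calculation used for this post: it gives 20 imaginary teams exactly the same chance of scoring from each of 800 crosses (the league wide rate of one in 73 quoted later in the post) and shows how different their records look through luck alone. The function name and the seed are arbitrary choices of mine.

```python
import random


def simulate_season(n_teams=20, crosses=800, p_goal=1 / 73, seed=1):
    """Goals from crosses for each team when every team has identical skill."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < p_goal for _ in range(crosses))
        for _ in range(n_teams)
    ]


goals = simulate_season()

# Crosses needed per goal for every team that scored at least once.
crosses_per_goal = [800 / g for g in goals if g > 0]
print(f"best: one goal every {min(crosses_per_goal):.0f} crosses")
print(f"worst: one goal every {max(crosses_per_goal):.0f} crosses")
```

If the real league table were no more spread out than repeated runs of this simulation, luck would be a sufficient explanation. A wider real spread points to skill.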

One down...72 To Go!
In the table below I've listed the average number of crosses each EPL side would require to register a goal by the first player to receive the cross based on analysis of each side's crossing conversion rates during last season. As you may guess and calculations confirm, a task that requires 45 repetitions per success for the best and reaches into the 100's for the worst is not just the product of random variation, there is a fairly considerable element of skill involved as well. One way of describing the spread of talent within the task is to calculate how many games we would need to watch before our impression of a team's talent at performing the task was more influenced by skill than random chance. In the case of scoring from crosses that happens after about 300 attempts or about 14 games for an average side.
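One common way of turning a figure like "about 300 attempts" into regressed numbers is to pad each team's record with that many league average attempts. The sketch below assumes that approach, using the 300 cross figure above and the league wide rate of one goal in 73 crosses quoted later in the post. It is not necessarily the exact calculation behind the table, and the example team is hypothetical.

```python
LEAGUE_RATE = 1 / 73   # league wide goals per cross, as quoted in the post
PADDING = 300          # attempts at which skill and luck weigh equally


def regressed_rate(goals, attempts, league_rate=LEAGUE_RATE, padding=PADDING):
    """Shrink a raw conversion rate towards the league average."""
    return (goals + padding * league_rate) / (attempts + padding)


def crosses_per_goal(goals, attempts):
    """Regressed number of crosses needed to score once."""
    return 1 / regressed_rate(goals, attempts)


# Hypothetical side: 20 goals from 800 crosses, a raw rate of one in 40.
print(round(crosses_per_goal(20, 800), 1))
```

A team with no crosses at all is simply assigned the league rate, and the more crosses a side attempts, the more its own record outweighs the padding.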

The next assessment we need to make is how efficiently crosses use possession. My calculated average conversion rate for the first receiver to convert a cross from either open play or a set piece into a goal is once every 73 crosses. This doesn't look good, especially when compared to shots that are converted at a much higher rate. However, the cross is the precursor to the attempt on goal. If we are going to include crosses that sail harmlessly into the keeper's grasp in quantifying the potency of a cross, then we must also include the misplaced final third pass when quantifying the merits of a different approach that relies upon passes played towards the goal from areas other than the wings.

I therefore recorded the fate of all passes that were made into the final attacking third other than crosses for the 2011/12 EPL season. The comparison with crosses  is necessarily broad, but hopefully retains enough validity to compare an attack culminating in a cross and one based more around possession, passing and carving out an opening.

Regressed Number of Attempts Required To Score Once From A Cross and Once From A Final 3rd Pass.

Team.  Number of Crosses Needed To Score. No. of Final 3rd Passes Needed To Score.
Chelsea. 45 249
Norwich. 48 304
Man Utd. 49 167
Man City. 50 157
Blackburn. 65 260
Arsenal. 66 171
Everton. 74 267
Newcastle. 74 202
Stoke. 75 326
Aston Villa. 75 312
QPR. 78 331
Sunderland. 83 273
WBA. 90 252
Swansea. 92 246
Fulham. 94 227
Wigan. 94 331
Bolton. 96 253
Wolves. 99 363
Tottenham. 114 171
Liverpool. 139 341

If we repeat the previous analysis this time for chances made from passes into and inside the final third, we find that the average team attempts 135 such passes during a game. Again scoring from such passes is a skill and this should become apparent over the slightly shorter timescale of just ten games instead of the 14 for crosses. In terms of raw efficiency teams need to execute between 170 and 360 such passes to score one goal, with the average 240.

If we lastly include accuracy rates for each category of chance creation, crosses reached their intended target  23% of the time in 2011/12 compared to 66% of the time for final third passes.

We can therefore build up an identikit for our two classes of chance precursor. Scoring from either a cross or a final third pass are skills that vary between teams. A single cross has on average a greater goal threat than does a single final third pass, but ball retention is much more likely for the latter than the former. In essence we have for crosses a high risk, high reward skill compared to a predominately ground based passing strategy that is low risk, but also low reward for each individual pass.

Ballpark Figures For Goals From Both Crosses and From Final Third Passes, EPL 2011/12.

Crosses. Final 3rd
Average Number Per Game. 22 135
Number Needed To Score. 73 240
Accuracy. 23% 66%
% of Games Where Such An Assist Led To A Goal. 26% 39%
Total Goals From Such Passes. 2011/12. 230 429.

These are by necessity of the data just ball park figures, but if a team wanted to move from a balanced model of crosses and final third passes for chance creation towards a full on Barcelona approach, they might need to replace their 22 crosses per game with 72 extra final third passes.....just to stand still in goal expectancy terms. Of the 760 team appearances in last year's EPL, 200+ final third passes was out of reach on 84% of occasions.
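The "72 extra passes" figure follows directly from the ball park numbers in the table, as this short check shows. The variable names are mine; the inputs are the table's averages.

```python
crosses_per_game = 22
crosses_per_goal = 73
passes_per_game = 135
passes_per_goal = 240

# Goals per game currently expected from crosses.
goals_from_crosses = crosses_per_game / crosses_per_goal

# Final third passes needed to produce the same expected goals.
extra_passes = goals_from_crosses * passes_per_goal
total_passes = passes_per_game + extra_passes

print(round(extra_passes), round(total_passes))  # prints 72 207
```

So an average side giving up crosses altogether would need a little over 200 final third passes per game to stand still, which is why the 200+ threshold is the relevant one.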

A much more important question is whether both approaches are not needed to enhance a team's attacking capability in the complex interactions that comprise a football game. Crosses require taller, possibly less agile defenders to defend the ball. They may be more vulnerable if a team that possesses a crossing threat then also employs Barcelona type passing based attacks as well. 

NFL defenses must respect the less efficient pass (in possession retention terms) because teams do not commit totally to the more efficient run (in possession retention terms). They recognize the benefits of a mixed strategy, even if one of the ways of advancing the scoreboard is less efficient than another. Passing the football  and running the football both contribute significantly to scoring and the presence of one mode of attack may enhance the effectiveness of the other. Even the most potent of passing attacks in the NFL struggle when the scoreboard and time constraints force them to continually resort to a one dimensional passing attack.

Arsenal scored goals from final 3rd passes in half their games, but they also scored from crosses in a third of their matches. If they become one dimensional and dispense with crosses all together, will they become even more potent or even more predictable and easy to deal with ? Barca have wonderful, exciting and very expensive players, but even they failed to overcome Real Madrid in La Liga, Chelsea in the Champions League and Spain were held by both Italy and Portugal in regulation at the Euros. 

The very best exponents of passing based football fell short twice last season and Spain required a shootout along the way to the Final. Maybe crossing isn't dead and it's all in the game theory.

Monday 13 August 2012

Defence. It's All About The Marking.

Imagine a goal scoring opportunity 12 yards from goal and dead centre, in other words on the penalty spot. What's the likelihood that the attempt will be converted ? Naturally the answer will depend on a variety of factors. An actual penalty will carry a greater chance of being converted than will a header from a corner in a congested area. In the former situation defenders are totally absent and the striker has an unimpeded shot at goal, success rates will be in excess of 70%. In the latter the presence of a large number of defenders, with one probably assigned marking duties on the chance taker will make converting the opportunity much more difficult. Shooting is likely to be more hurried, less accurate and potent, there's also the added possibility that the subsequent attempt will be blocked before it even reaches the keeper and success rates will drop considerably towards 1 in 20 levels.

In previous posts I've looked at how the position on the pitch from where a goal attempt originates has a significant bearing on the likely success of that attempt. Moving wider and further out quickly reduces conversion rates and headers quickly lose out to shots as we retreat from the six yard box. However, we haven't yet attempted to incorporate the impact of defensive bodies around the striker and between the ball and the goal.

Collecting information detailing the number of bodies in close proximity to a goal scoring attempt is a hugely labour intensive exercise requiring the use of video analysis. However, we can use a reasonable proxy by looking at the situation from which a chance originates. Set plays such as corners and free kicks where the ball is played into the box rather than being fired directly at goal are likely to see players most tightly marked. Chances created from open play will afford attacking players the opportunity to find more space and separation because of the more fluid nature of the move and a much less compressed area of play compared to a corner. Finally, counter attacks where opponents have committed players to the attacking third of the field give the countering team the best opening to provide a final pass to a colleague enjoying time and space to shoot at goal.

Corner Kick.........All Link Arms.
To measure the impact that tight or loose marking has on the conversion rates of shots taken from various distances and angles, I've used the average overall goal expectancy derived from the full data set and compared the predicted figures to goals scored from actual attempts made in each of the three categories.

Scoring Rates In Various Defensive Situations Compared To Overall Scoring Rate.

Likely Defensive Formation. Actual Goals/Expected Goals.
Tight. 0.65
Normal. 0.96
Stretched. 2.12

Chances created from open play that are likely to be characterized by "normal" levels of marking predominate in the data, so the conversion rates for "tight" and "stretched" defences shouldn't be taken at face value, but the results appear to be as expected.

Close, man to man marking prevalent at corner and free kick situations depresses the rate of scoring to only 65% of the overall conversion rates, but stretched defences subject to quick counter attacks positively leak scores at over double the usual rate.

Average Scoring Probabilities Against Various Defensive Formations Sorted By Attempt Distance from The Byeline.

An even starker example of the importance of an organised and fully manned defence is seen if we run a regression on the goal concession rates of our three individual types of defences and plot the average predicted conversion rates for attempts made throughout the penalty area at varying distances from the dead ball line.

Undermanned and possibly disorganised defences on the receiving end of counter attacks are overwhelmingly more likely to give up a goal than are "normal" or "tight" defences. Sample sizes are again small for attempts made on the counter, but the likelihood of each type of defence conceding goals only begins to converge when shots are taken from around the edge of the box. At all other distances within the area, stretched defences are much more fallible than normal ones, who in turn trail defences set up to defend corners and free kicks. In my dataset, an opportunity created inside the six yard box from a counter attack is over twice as likely to succeed as a similar chance created in normal open play or from a set piece.

The difficulty of scoring from corners and set pieces where the ball is played into the box is again illustrated using this approach. There is clear indication that corners are relatively unproductive, not because of the angle of delivery, but because of the time allowed for the defence to organise and populate itself. Defenders work extremely hard, both inside and outside the laws to stay close to attackers at such set pieces and with good reason. Similarly attackers who can free themselves, perhaps with the help of a blocking run by a colleague can reap rich rewards by moving the chance into an environment more akin to a counter attack.

It should come as no surprise to discover that a well organised defence can degrade the potency of an attack, but the extent to which this appears to happen may do. At the very least this kind of analysis further highlights the misleading nature of possession statistics. Teams that are content to soak up pressure and occasionally hit opponents quickly on the break are likely to create fewer, but better quality opportunities.

Labeling teams that dominate possession and chances as undeserved losers may soon be a thing of the past as the narrative is replaced by one where well organised teams forego possession to create premium opportunities on the counter and start to be described as the deserved winners. The more we delve into the statistics, the more apparent it becomes that there is more than one way to win a football match.

Friday 10 August 2012

The Anatomy Of A Spanish Goal.

One of the most exciting and fast moving developments at the present involves goal expectancy calculations and how they can be applied to individual player contributions. I've laid down some initial thoughts and calculations with data kindly provided by OptaPro here. The basic idea involves the increased likelihood of a team scoring based on where on the pitch they have possession and how the credit for that situation could be divided. The use of probabilities instead of actual real life records is attractive because randomness can result in for example goal gluts or famines leading to players receiving unrealistically extreme praise or criticism.

It's inevitable that numbers will be a central part of this type of analysis, but at the moment they should be considered secondary to the methodology. It's fairly easy to create a goal expectancy grid for the attacking portion of the pitch using general accumulated data and observational skills. In general the further away from goal and the wider out you are, then the less likely you are to score. So any pass that moves the ball to a closer and more central position is going to improve the scoring chances of a team.

Below I've taken a more detailed look at the assisted goals scored by Spain in their successful Euro 2012 campaign, other opportunities of course were created but not converted. At the moment I've solely evaluated the position of the ball in terms of how likely a team is to score should they attempt a shot from their current position. This approach isn't realistic for general play, as teams also use possession as a defensive tactic, but it is appropriate for overtly attacking situations.

How Spain Increased Their Goal Threat From Passer to Scorer.

Goal Probability at Point of Pass. Goal Prob. at Scorer's 1st Touch. Goal Prob. at Point of Shot. Change in Prob. From Pass to 1st Touch. Change From 1st Touch to Shot. Scorer. Passer.
0.103 0.341 0.480 0.238 0.139 Navas. Iniesta.
0.035 0.209 0.209 0.174 0 Fabregas. Silva.
0.004 0.023 0.149 0.019 0.126 Torres. Silva.
0.030 0.062 0.169 0.032 0.107 Fabregas. Silva.
0.023 0.117 0.117 0.094 0 Torres. Xavi.
0.006 0.030 0.080 0.024 0.051 Jordi Alba. Xavi.
0.085 0.155 0.155 0.070 0 Mata. Torres.
0.214 0.277 0.277 0.063 0 Silva. Fabregas.
0.098 0.137 0.137 0.039 0 Alonso. Jordi Alba.

The process is best described by an example and I've chosen Jordi Alba's forward burst onto Xavi's pass for Spain's second goal in the final, highlighted in red in the grid. In the first column Xavi has possession of the ball over thirty yards away and to the left of the Italian goal. Had he tried a speculative attempt from that position, typical shot data from Euro 2012 and the EPL suggests that he would score less than once in every 160 tries. So the direct goal threat from this position is negligible. With a forward through ball Xavi finds Jordi Alba, who has continued his run, and the soon to be scorer makes first contact with the ball outside the box and still to the left of goal. Had Jordi Alba attempted a shot with his first touch his generic chances of scoring would have been about 1 in 30. So Xavi's pass has advanced Spain's likelihood of scoring from 1 in 160 to 1 in 30. The ball is now firmly at the feet of Jordi Alba and he advances it from the position of his first touch into the Italian box, where he has increased Spain's chance of scoring to 1 in 12. And he duly adds to Silva's opening strike.

Broken down in stages we can begin to tease apart the individual contributions made by each player. Italy saw the threat level rise from negligible to considerable through a combination of Jordi Alba's run, Xavi's pass, Jordi Alba's running with the ball and finally his shot and Goal Expectancy can be used to quantify the likelihood of success every step of the way. Similarly, Silva's quick thinking turned a routine corner into a dangerous situation against Ireland, Fabregas then elevated the threat by quickly heading towards goal to complete the 4-0 rout. It may also be noteworthy that over half of the nine chances created by Spain required no extra input from the scorer other than the execution of the shot or header. Silva, Xavi, Torres, Jordi Alba and Fabregas each providing pin point passes that required no adjustment by the scorer.
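The breakdown can be reproduced from the first three columns of the grid. The sketch below is one simple way to do it rather than a finished method: the function and variable names are mine, and the gain up to the first touch is credited to the pass even though, as in Jordi Alba's case, the receiver's run contributes to it as well.

```python
# (prob. at pass, prob. at first touch, prob. at shot, scorer, passer)
GOALS = [
    (0.103, 0.341, 0.480, "Navas", "Iniesta"),
    (0.035, 0.209, 0.209, "Fabregas", "Silva"),
    (0.004, 0.023, 0.149, "Torres", "Silva"),
    (0.030, 0.062, 0.169, "Fabregas", "Silva"),
    (0.023, 0.117, 0.117, "Torres", "Xavi"),
    (0.006, 0.030, 0.080, "Jordi Alba", "Xavi"),
    (0.085, 0.155, 0.155, "Mata", "Torres"),
    (0.214, 0.277, 0.277, "Silva", "Fabregas"),
    (0.098, 0.137, 0.137, "Alonso", "Jordi Alba"),
]


def split_credit(at_pass, at_touch, at_shot):
    """Return (gain from the pass, gain from the scorer's own work)."""
    return round(at_touch - at_pass, 3), round(at_shot - at_touch, 3)


for at_pass, at_touch, at_shot, scorer, passer in GOALS:
    pass_gain, carry_gain = split_credit(at_pass, at_touch, at_shot)
    print(f"{passer} -> {scorer}: pass +{pass_gain}, carry +{carry_gain}")

# Goals where the scorer shot from the position of his first touch.
no_extra_input = sum(1 for g in GOALS if split_credit(*g[:3])[1] == 0)
print(no_extra_input, "of", len(GOALS), "goals needed no extra work from the scorer")
```

Small differences from the grid, such as 0.050 rather than 0.051 for Jordi Alba's carry, arise because the sketch works from the rounded probabilities.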

One small step.............