Friday, 15 February 2013

Shot Conversion Rates, Time Spent Leading and Red Cards.

Parity is a largely alien concept for the very best teams in Europe's major football leagues. American sporting structures such as the NFL embrace the concept of equality of opportunity, even if a few teams occasionally manage to remain successful for longer periods of time, through fair means or foul. A quick head count shows that in this century virtually every NFL side has made the post season and the largely unremarkable NFC West has seen all four of its teams reach the Super Bowl. Any Given Sunday translates quite nicely to Any Given Season.

By contrast, the Premiership has been the preserve of a largely unchanging group of four teams, headed with consistent predictability by Manchester United. Their season long success rate is currently over two standard deviations above league average and this indicator of supremacy over their rivals rarely drops below one and a half standard deviations above par. Similarly, Real Madrid won La Liga last term with a win or draw success rate which was almost 2.5 standard deviations greater than league average, with Barcelona following a respectful 2 standard deviations above average as runners up. Spain has increasingly become a two horse race since 2007/08 and the best of Spain are becoming more dominant in their sphere than are the best of England in theirs.

Talent is the obvious defining factor which separates the top teams from the rest. If luck were a major contributor between reasonably matched sides, we would expect to see greater churn among the top finishers. Since 2000, nine different NFL sides have won the Super Bowl, with only the Patriots, who combined astute coaching with Spygate, having triumphed more than twice. Over the same time span, the Premiership has been won by just four teams, including seven times by United, and that sequence of exclusivity will remain intact this season.

Identifying and grading talent requires copious amounts of data if we are to attempt to separate the output due to randomness from that due to skill. The goal scoring exploits of Ronaldo and Messi in Spain and van Persie in England, combined with the large monetary value placed on their services, appear to indicate that scoring ability is an area where skill proliferates. Scoring efficiency, and hence goals, go a long way to producing a successful team.

We can try to separate the great scorers from the merely good by various means. Quantifying the expected number of goals scored by a striker, and by extension a team, based on shot location can begin to tell us much about team or individual quality. However, the approach is very data intensive.

Accumulating large numbers of shots without including positional data can also provide excellent information. The hope is that sheer weight of numbers leads to a similar overall quality of opportunity, especially at a team level. Each attempt on goal can then be treated as a trial which is either successful, because a goal is scored, or not. By reference to both sample size and average shot conversion rates across the league, we can then see if the different team conversion rates differ by more than would be expected purely by random chance. We may choose to assign any difference to non-random factors, such as skill or the lack of it. Tentative initial studies appear to show that increased shooting efficiency is present within successful teams and sought after strikers.
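As a minimal sketch of that comparison, each shot can be modelled as a binomial trial converted at the league average rate. The function names and the figures used below (500 shots, 70 goals, a 10% league rate) are illustrative assumptions rather than numbers from this post, and the model deliberately ignores shot location.

```python
import math


def conversion_z_score(goals, shots, league_rate):
    """Standard deviations between a team's goal tally and league-average finishing."""
    if shots <= 0 or not 0.0 < league_rate < 1.0:
        raise ValueError("need shots > 0 and 0 < league_rate < 1")
    expected_goals = shots * league_rate
    # Spread of the goal tally if every shot were converted at the league rate.
    binomial_sd = math.sqrt(shots * league_rate * (1.0 - league_rate))
    return (goals - expected_goals) / binomial_sd


def upper_tail_probability(goals, shots, league_rate):
    """Exact binomial chance of scoring at least `goals` from `shots` by luck alone."""
    return sum(
        math.comb(shots, k) * league_rate**k * (1.0 - league_rate) ** (shots - k)
        for k in range(goals, shots + 1)
    )


# Hypothetical team: 70 goals from 500 shots in a league converting 10% of shots.
z = conversion_z_score(goals=70, shots=500, league_rate=0.10)
p = upper_tail_probability(goals=70, shots=500, league_rate=0.10)
print(f"z = {z:.2f}, chance of 70+ goals at the league rate = {p:.4f}")
```

A large z-score paired with a small tail probability says the gap is hard to explain by sample size alone. It does not say whether the cause is skill, shot quality or game state.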

Broad, season long trends are of course useful, but we can try to gain a more intimate understanding of the dynamics of a football match by looking at data at a game level to see how teams cope with the inevitable changes in game state that occur from match to match. If clinical finishing is a skill, a player may demonstrate that talent more effectively in different game states. A team may have taken the lead because they have more efficient scorers, but they then become even more efficient as their skill players are able to operate in a scoring environment where the opposition are prioritizing attack over defence.

EPL teams play a near identical schedule (obviously they can't play themselves, so United have an easier schedule than QPR), but within this relatively unbiased fixture list, a side will experience a much more varied in-game state. Even the most committed of defensive, bus parking exercises will eventually have to give way to more adventure if the scoreline dictates, and that opens up play at their defensive end of the pitch.

We can try to demonstrate the effect of game state on likely conversion rate in a single game by plotting game state against shot conversion rate. On a match by match basis, game state accounts for 24% of the total variance in conversion rate. To express this in a way that is more applicable to a real life match situation: should a team increase its game state by one standard deviation of the league average, its conversion rate would, on average, increase by around 49% of the standard deviation of all game by game shot conversion rates for that particular league. In short, and irrespective of the possibly competing correlation directions, the longer you lead, the better your single match game state will tend to be, and your shot conversion rate should follow this improvement.
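The 24% and 49% figures are two views of the same single-predictor fit: the correlation is the square root of the variance explained, and it is also the slope once both variables are measured in standard deviations. The sketch below checks this on a handful of made-up matches. The helper names and the sample values are assumptions for illustration, and the numeric game state measure is hypothetical since the exact definition is not given here.

```python
import math
from statistics import mean, pstdev


def standardised_slope_from_r2(r_squared):
    """Size of the slope in SD units for a single-predictor linear fit."""
    if not 0.0 <= r_squared <= 1.0:
        raise ValueError("r_squared must lie between 0 and 1")
    return math.sqrt(r_squared)


def z_scores(values):
    """Rescale values to mean 0 and standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# 24% of variance explained implies a slope of about 0.49 SDs per SD.
print(round(standardised_slope_from_r2(0.24), 2))

# Made-up matches: a hypothetical game state score and a shot conversion rate.
game_state = [-1.0, -0.5, 0.0, 0.5, 1.0, 0.2]
conversion = [0.05, 0.12, 0.08, 0.15, 0.11, 0.20]
slope = ols_slope(z_scores(game_state), z_scores(conversion))
print(abs(slope - pearson_r(game_state, conversion)) < 1e-9)
```

The square root only gives the size of the effect. The sign comes from the direction of the relationship, which is positive here.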

We can now follow the chain of evidence as to why the very best may be extremely efficient at certain important on field actions, such as shot conversion. They initially purchase talented strikers (different levels of shooting talent appear to exist among strikers). They then find themselves in strong in running positions, which further opens up their attacking options as their opponents become less defensive. There is, however, another significant, minority factor which contributes to enhancing the strike rate of the very best, and that is red cards.

30% of the red cards shown last year were shown to opponents of the big four, and when the best sides are given the added benefit of a numerical advantage, their conversion rate increases again. The shot conversion rate for Arsenal, Chelsea and the two Manchester clubs where 11 played 11 was around 14%, but nearly 20% in red card games.
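One rough way to ask whether a jump from around 14% to nearly 20% is more than sampling noise is a pooled two-proportion z-test. Only the rates are quoted above, so the shot counts in this sketch are invented purely to show the arithmetic, and the function name is likewise an assumption.

```python
import math


def two_proportion_z(goals_a, shots_a, goals_b, shots_b):
    """Pooled z-statistic for the difference between two conversion rates (b minus a)."""
    if shots_a <= 0 or shots_b <= 0:
        raise ValueError("both samples need at least one shot")
    rate_a = goals_a / shots_a
    rate_b = goals_b / shots_b
    # Pool both samples to estimate a common rate under the "no difference" hypothesis.
    pooled = (goals_a + goals_b) / (shots_a + shots_b)
    standard_error = math.sqrt(pooled * (1.0 - pooled) * (1.0 / shots_a + 1.0 / shots_b))
    return (rate_b - rate_a) / standard_error


# Hypothetical counts chosen only to reproduce 14% and 20% conversion rates.
z = two_proportion_z(goals_a=280, shots_a=2000, goals_b=40, shots_b=200)
print(f"z = {z:.2f}")
```

Red card games supply far fewer shots than 11 against 11 games, so the smaller sample dominates the standard error. The real counts would be needed before reading much into the gap.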

The best appear to have high conversion rates because of great players, favourable match environments and a disciplinary system which rewards them by more frequently reducing the numbers of the rest. While their season long rates will fluctuate, it is to be expected that on average United, especially, will maintain a healthy gap between themselves on the summit and the mere also-rans.


  1. Mark,

    What are your thoughts on Arsenal 2011-2012 vs 2012-2013? I think their conversion rates are very similar to last season even without RVP.

  2. When players move teams it gives us a chance to try to see how much of the talent is with the player and how much with the team. I'm just starting a couple of posts relating to vP. They should be online later next week.


  3. If you are interested in separating team skill from player skill, you may want to check out my blog

    I rate players by comparing the team's performance with and without the player in a minute by minute approach. So far I have blogged in German, but I could translate if the English speaking football analyst community is interested.

  4. Mark,

    Thank you for this fantastic post.
    I really support the idea that context specific analysis is needed to take football analysis to a higher level.

    My question here regards the direction of the relation between conversion and game state.
    While I personally think that there is more and more evidence that game state influences conversion more than most people think ('leading teams convert better'), how would you counter the argument that teams would lead because they convert better?

    Wouldn't that still require a higher level of data, such as you used in this excellent post elsewhere?


  5. Hi Sander,
    I definitely agree that the causation/correlation arrow goes in both directions and the best way to separate the size of the two effects is to use the higher level of data. Collecting such data isn't really that difficult, but it is tedious and time consuming. One season for one team takes about 4 hours.

    I've been trying to come up with a use for the time spent leading/drawing stat for ages. It is strongly related to the pregame strengths of each side, so you can work out fairly easily how long a team *should* lead/draw for and compare it to how long they did lead/draw for. You still don't know for sure when their shooting inefficiencies started. But you can use it to build up evidence to support the conclusions from higher level data such as the Arsenal study.

    Match state is largely neglected at the moment, but it is a significant, minor contributor. When I started building "in running" models for football, they always improved if you told them what the current score was.