As well as providing the analytics community with a rich vein of raw data, Manchester City are also themselves the subject of many of the sound bite stats that appear on a daily basis. Unsurprisingly for a highly successful side, many of these statistical nuggets revolve around goal scoring exploits.
Last month Dzeko was a super sub. But a single strike, accounting for only 20% of the goals scored while he has been on the pitch as a replacement since early November, has seen the "super sub" tag largely disappear from print, and Dzeko's scoring rate as a late entrant has fallen closer to earth.
Gareth Barry's winning goal against Reading in the 93rd minute of the match appeared merely to reinforce City's apparent ability to score late goals at will. None were more important than the two stoppage time strikes in the final match of last season, when victory over ten man QPR secured the title. From the beginning of the 2011/12 campaign up to and including Saturday's last gasp win over the bottom club, City have scored 24 goals in the 85th minute or later, twice the number of their nearest rivals, United.
Reported in isolation the sound bite stat appears impressive. The narrative is clearly intended to portray City as an incredibly dangerous attacking side late in matches, certainly much more potent than their neighbours. With the backing of data going back over 56 previous matches, we appear to be looking at a cast iron case.
Since 2011/12, City have arguably been the best side in the Premiership, scoring 127 goals over those 56 games. So the first legitimate question is how many goals such a team would expect to score after the 85th minute over that run of matches.
We can describe each of those games in terms of the initial goal expectancy that City would expect to record over numerous repetitions of the 56 games. Goal times appear to be rounded up, as two of City's goals, recorded as 85 minute strikes, were scored after 84 minutes and 30 seconds but before the 85th minute was reached. If we allow for the actual amount of "normal" time played, the actual stoppage time played, Manchester City's initial goal expectation in each match and the gradual increase in scoring rate which occurs for all teams as the contest progresses, we can calculate the goal expectation for City for every minute described in the recently circulated "85 minute" stat.
City are a top side and their average goal expectancy per game since August 2011 from the 85th minute and beyond is just over three tenths of a goal. Therefore, they would have expected to score 17 goals over that time span and their actual total of 24 is an impressive 40% higher. By delving deeper into the numbers, we appear to be confirming the validity of the sound bite.
However, there are more obstacles to overcome. 56 games appears impressive, but we have looked at, on average, just the final ten minutes of playing time for each game. In reality our sample size is only 567 minutes of actual playing time, or the near equivalent of only six completed games. Over the last two seasons, Manchester City have once produced an equivalent of 24 goals in a run of six actual matches. So have United. If City (and United) can score at such impressive rates over a six match run selected from a season and a half of games, it shouldn't be a great surprise that they can do likewise in a non random selection of 56 end games.
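The arithmetic behind the last two paragraphs can be checked in a few lines. This is only a back-of-envelope sketch using the rounded figures quoted in the post; the variable names are mine and the 90 minute "full match" is a simplifying assumption that ignores stoppage time.

```python
# Rounded figures quoted in the post; variable names are illustrative only.
games = 56
expected_late_goals = 17     # model total from the 85th minute onwards
actual_late_goals = 24
late_minutes_played = 567    # total playing time covered by the stat

# Average late goal expectancy per game ("just over three tenths").
per_game_expectancy = expected_late_goals / games

# How far the actual total sits above the expected total.
overperformance = actual_late_goals / expected_late_goals - 1

# Size of the sample, per game and in full-match terms
# (assumes a nominal 90 minute match).
minutes_per_game = late_minutes_played / games
full_match_equivalent = late_minutes_played / 90

print(f"late goal expectancy per game: {per_game_expectancy:.3f}")
print(f"actual vs expected: {overperformance:+.0%}")
print(f"late minutes per game: {minutes_per_game:.1f}")
print(f"equivalent full matches: {full_match_equivalent:.1f}")
```

Seen this way, the 56 game headline shrinks to a little over six matches' worth of football.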
It's understandable that stats like this appear once such an event as a late goal has occurred, but as with super subs, this virtually guarantees a biased sample. The first of City's games in the 56 match run was a 4-0 win over Swansea on the opening day of the 2011/12 season, which saw Aguero scoring in injury time. The last game in the sequence was Reading on Saturday, when Barry did likewise. Probably inadvertently, we again have selective cut off points which start and finish with the attribute we are trying to measure in our sample.
If we extend the sample to include 2010/11, when City were still good enough to win the Cup and finish third with the same manager and many of the same players, we find that the gap between City and their nearest rivals in the late goal stakes shrinks from 12 goals to just 3. If we insist on comparing United and City over just the last two seasons, we can also close the gap to three by choosing the 81st minute instead of the by no means special 85th minute. The more you try to be fair and unbiased, the more biased cutoff points appear in the data.
City score lots of late goals because they are one of the two best teams currently in the Premiership, and we can manipulate the apparent size of this advantage by taking results from different, but similar, samples. If you dissect games into bite sized chunks of time, patterns will emerge that are neither representative of a team's real ability, nor repeatable in future contests. Quotes based on such inadvertently manipulated data are enticing, but have the power to mislead. As part of a wider picture such dicing of data may be useful, but in isolation the resulting figures are prone to large corrections.
Poisson simulations using City's actual goal expectations from the 85th minute onwards in their last 56 games saw 24 or more late goals arriving in around 7% of the trials. City's 24 actual goals is an impressive achievement, but as larger samples and different timescales appear to indicate, a lower long-term figure should be expected, with little to choose between the Manchester clubs in this particular talent... whatever the misleading sound bites might say.
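As a rough cross-check on that simulated figure, the tail probability can also be computed directly. This sketch is not the simulation described above: it assumes late goals in each match are independent Poisson counts, in which case the 56 match total is itself Poisson with a mean equal to the sum of the individual expectancies, and it uses the rounded total of 17 quoted earlier rather than the actual per-match figures. The function name is mine.

```python
import math

def poisson_tail(mean, k_min):
    """P(X >= k_min) for X ~ Poisson(mean), via the complement of the lower sum."""
    below = sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(k_min))
    return 1.0 - below

# Assumption: total late goal expectation of 17 (the rounded figure in the post).
p_24_or_more = poisson_tail(17.0, 24)
print(f"P(24 or more late goals) = {p_24_or_more:.3f}")
```

With the rounded mean of 17 this comes out a little lower than the simulated 7%, which is what you would expect given that the true total expectation was slightly above 17.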
"Poisson simulations using City's actual goal expectations from the 85th minute onwards in their last 56 games saw 24 or more late goals arriving in around 7% of the trials. "
Mark, can you expand more on the Poisson simulations?
A quick way to sim City's goals from the 85th minute onwards.
Take the Wigan away game this season as an example.
City had around a 63% chance of winning that game, so they'd expect to score an average of about 1.9 goals and concede 0.8 goals. If you stick those average goal expectancies into a Poisson, calculate the probability of each score combination which leads to a City win and total the individual probabilities, they will add up to 63%.
The match actually had 4 minutes of injury time. From 84 minutes and 31 seconds onwards, City's goal expectancy had declined from the 1.9 they started with, and a goal expectancy of about 0.281 goals remained.
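A minimal sketch of that win probability calculation, assuming the two teams' goal counts are independent Poisson variables (the "raw, unadjusted" approach). The function names and the cap of 15 goals per side are my own choices.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k goals when the goal expectancy is `mean`."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def win_probability(goals_for, goals_against, max_goals=15):
    """Total the probability of every scoreline in which the first team scores more."""
    total = 0.0
    for scored in range(max_goals + 1):
        for conceded in range(scored):  # only scorelines with scored > conceded
            total += poisson_pmf(scored, goals_for) * poisson_pmf(conceded, goals_against)
    return total

city_win = win_probability(1.9, 0.8)
print(f"City win probability: {city_win:.2f}")
```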
Stick 0.281 into a raw, unadjusted Poisson and you will see that the chances of City scoring exactly one goal in the short time remaining will be about 0.212. The probability of exactly 2 goals being scored by City was 0.0298, three goals 0.00279 etc.
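Those three probabilities are easy to reproduce. A short sketch, using nothing beyond the standard Poisson formula; the names are mine.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k goals when the goal expectancy is `mean`."""
    return math.exp(-mean) * mean**k / math.factorial(k)

remaining_expectancy = 0.281  # City's goal expectancy left from 84:31 at Wigan

# Probability of exactly 0, 1, 2 and 3 late goals.
late_goal_probs = [poisson_pmf(k, remaining_expectancy) for k in range(4)]
for goals, prob in enumerate(late_goal_probs):
    print(f"exactly {goals} goals: {prob:.5f}")
```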
Set up an Excel spreadsheet for all City's games, not just the Wigan one, and calculate the probability of exactly one, exactly two, exactly three etc. goals being scored in the final stages of each match, based on the actual amount of injury time played and how far City's goal expectancy had declined by the 84:31 mark.
Set up a random number cell (=rand()). This will generate a random number between 0 and 1. For the Wigan game, assign a slice of that range, say between 0 and 0.212, to simulate the scoring of exactly one goal by City over that time span. Do the same for two goals, three goals etc., giving each outcome its own non-overlapping slice as wide as its probability.
Run as many sims as you feel necessary and sum the number of simulated seasons where City score x amount of goals after the 84:31 minute.
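For anyone who would rather not build the spreadsheet, the same steps can be sketched in Python. Big caveat: I don't have the per-match late goal expectancies here, so this assumes a flat 17/56 for each of the 56 games, which is a placeholder and not the real input. The function names, the seed and the number of trials are also my own choices. Each uniform draw plays the role of the =rand() cell, and the cumulative bands are the slices of the 0 to 1 range (here zero goals takes the first band, but any non-overlapping layout works).

```python
import math
import random

def poisson_pmf(k, mean):
    """Probability of exactly k goals when the goal expectancy is `mean`."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def simulate_goals(mean, rng, max_goals=10):
    """Map one uniform draw onto a goal count using cumulative Poisson bands."""
    draw = rng.random()
    cumulative = 0.0
    for goals in range(max_goals + 1):
        cumulative += poisson_pmf(goals, mean)
        if draw < cumulative:
            return goals
    return max_goals  # vanishingly rare for expectancies this small

def share_of_runs_reaching(target, late_means, trials, seed=1):
    """Fraction of simulated runs in which total late goals reach `target`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total = sum(simulate_goals(mean, rng) for mean in late_means)
        if total >= target:
            hits += 1
    return hits / trials

# Placeholder input: the same late expectancy in every match (an assumption).
late_means = [17 / 56] * 56
share = share_of_runs_reaching(24, late_means, trials=10000)
print(f"runs with 24+ late goals: {share:.1%}")
```

Swap the flat list for the real match by match expectancies and the output becomes the figure quoted in the post.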