One of the more glaring omissions in attempts to make sense of the huge increase in available football data is a lack of context. A priority during one stage of a match may become less so as the game progresses, and often the driving force for change will be the game state. The balance between defence and attack will shift with changing scorelines, time remaining and the relative abilities of the competing sides.
In an earlier post, I looked at how shooting efficiency, shot frequency, the identity of the shooter and the type of goal attempt changed with game state. When Arsenal held a comfortable match position, their shooting was more efficient, less frequent and more confined to recognised goalscorers. When they were trying to recover from a losing or drawing position, it was less efficient, more frequent and more evenly spread among defenders as well as strikers.
Analysing a single team for one season was relatively data intensive, requiring time-stamped goal attempts as well as regular in running calculations of the individual game state positions for the team. Arsenal are of course a successful side, so with a few exceptions, if they are trailing or even simply drawing during a game, their current game state will be below their expectations for the game result as a whole. Therefore they will have the desire, but much more importantly the ability, to try to alter their current situation for the better. How they attempt to recover should be reflected in the change in simple in running stats, such as goal attempts or corners won.
Deducing game states for the very best teams is fairly easy. There is no need to calculate in running goal expectancy for both sides, relate that to time remaining and current score, and then compare their current match position with their hopes before kickoff. In short, if they are trailing or drawing, the very best are probably underperforming and will be dissatisfied with their current game position.
However, it is less clear from the current score alone whether, say, Wigan are in an agreeable position or capable of improving their lot. In an earlier post I showed that Wigan are more likely than usual to score if they trail, but more likely than usual to concede if they lead. Losing is obviously bad and therefore encourages sides to try to level the game, partly through increased effort and partly by taking more risks. The same applies to their opponents when a side such as Wigan lead. But when the game is level and involves sides outside the big four, it is much less clear where the incentive to attack or defend currently lies. To estimate which team may be driving for a win and which will be happy with a point, we need to go back to calculating regular game states for both sides.
Short cuts are always welcome, as long as they preserve the essential ingredients of the more labour intensive study. In an earlier post I showed how pregame supremacy estimates are strongly related to the time a side can expect to spend leading, drawing and trailing in a match. So we can take the in running success rate, described in another earlier post, as a proxy for how the game actually went for a particular team, and compare it to the pregame supremacy prediction expressed in a similar format. The result is an informed guess as to how the game panned out for each team, seen through the lens of actual game states compared to pregame aspirations.
For example, last season Blackburn visited Old Trafford in a game that Ferguson would dearly want back. Unsurprisingly, United were strong pregame favourites and were given around an 83% chance of winning and 12% for the draw. In the format of success rate, where a team is given half credit for a projected draw and full credit for a projected win, that equates to a pregame projected success rate of 0.89. The reality was very different. Blackburn led for almost an hour, the game was level for just over half an hour and United never led, giving an in running success rate from United's perspective of 0.19. A comparison of these two figures immediately tells us that United spent much of the time chasing the game, and a return of two goals from 27 shots appears to confirm this view.
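The arithmetic behind those two figures can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the in running figure is treated as a simple time-weighted average (full credit for minutes leading, half credit for minutes level), and the 34/56 minute split is chosen to match "just over half an hour" and "almost an hour" rather than taken from the match record.

```python
def pregame_success_rate(p_win, p_draw):
    """Full credit for a projected win, half credit for a projected draw."""
    return p_win + 0.5 * p_draw


def in_running_success_rate(mins_leading, mins_level, mins_trailing):
    """Time-weighted version of the same idea (an assumed definition):
    full credit for minutes spent leading, half credit for minutes level."""
    total = mins_leading + mins_level + mins_trailing
    if total <= 0:
        raise ValueError("total minutes must be positive")
    return (mins_leading + 0.5 * mins_level) / total


# Manchester United v Blackburn, using the pregame odds quoted above.
expected = pregame_success_rate(0.83, 0.12)

# Assumed minute split: never led, level for 34 minutes, trailed for 56.
actual = in_running_success_rate(0, 34, 56)

# Negative values mean the side spent the game below its pregame hopes.
deviation = actual - expected

print(round(expected, 2))   # 0.89
print(round(actual, 2))     # 0.19
print(round(deviation, 2))  # -0.7
```

A deviation this far below zero marks a side that was almost certainly chasing the game, which is the context the raw shot count on its own cannot supply.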
As with Arsenal, this case is self evident, but the method allows us to tease apart the likely flow of attacking and defensive contests in much closer match ups. Comparing expectation with reality in this way may provide a quick, but reasonably representative, way to add game state context to a multitude of stats, ranging from shot and save percentage to proportion of corners, without sacrificing the merits of the more detailed method involving repeated, team specific calculations.
To test this model, I looked to see if the league as a whole follows the Arsenal trait of having more frequent, but less accurate, attempts in matches where a side is likely playing catch up relative to its pregame expectations. I plotted shooting efficiency against the deviation of actual in running success rate from pregame hopes, and the trend appears to be present league wide. When a side is likely trailing against expectation, shots are in general less efficient, presumably because attempts become more speculative, are made against more concentrated defences and come from less able striking talent. R^2 is 0.17, which is huge for data points comprising individual games. R^2 is a hostage to sample size: when each data point is a single game, random variation predominates, so R^2 doesn't always need to be large to be meaningful. It too must be given context. Which is where we started this post.
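For anyone wanting to repeat this kind of test on their own data, here is a minimal sketch of fitting a straight line and reading off R^2. It assumes an ordinary least squares fit, which the post does not specify, and the function name `fit_line` is hypothetical. The three sample points are made up purely to exercise the function and are not taken from the league data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x.

    xs: deviation of in running success rate from pregame success rate
    ys: shooting efficiency (e.g. goals per shot) in the same matches
    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squares and cross products about the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("xs and ys must each contain some variation")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a one-variable linear fit, R^2 is the squared correlation.
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up illustrative values only: one (deviation, efficiency) pair per match.
deviations = [-0.5, 0.0, 0.5]
efficiencies = [0.05, 0.10, 0.15]
slope, intercept, r_squared = fit_line(deviations, efficiencies)
```

A positive slope would correspond to the trend described above, with efficiency falling as a side drops further below its pregame expectation. With real match data the points scatter widely around the line, which is why a modest R^2 can still be informative.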