'Tis the season for small-sample-sized hyperbole to be liberally launched on an expectant audience, and the latest recipient of the "If he continues at this rate" award for unrealistic dreamland is none other than Lionel Messi.
While Ronaldo has been kicking his heels and the occasional Real Betis player, Messi has single-handedly (with the help of 10 teammates) launched Barcelona seven points clear of their perennial rivals from Madrid.
Messi turned 30 in the close season, is playing in his 14th La Liga season and is undoubtedly one of the two best players of the last decade.
But he is still human and bound by the natural athletic decline that eventually sets in for every footballer.
Players improve with maturity and experience, peak, usually in their late twenties, and then begin an inexorable decline, albeit from differing peaks.
Messi's post-birthday, six-game return in the UCL and La Liga, discounting a two-legged Spanish Super Cup defeat at the hands of Ronaldo's Madrid, has been spectacular, even by his standards.
It has spawned at least one article, liberally salted with stats to enhance credibility, eagerly anticipating the untold riches to come.
Unfortunately, five or six games is so small a sample that you will inevitably get extremes of performance, either very good or very bad.
This is particularly true if you selectively top and tail the games to eliminate a comprehensive defeat at the hands of your nearest rivals, devoid of any Messi goals from open play, but conclude with a performance in which the Argentine scored three times from open play.
Small samples are noisy, unbalanced and rarely definitively indicative of what will happen in the longer term or even just a single season.
Barcelona has played Alaves, Eibar, Getafe, Espanyol and Betis, and only the last of these is currently higher than 13th.
As a data point it is all but useless to project Messi's 2017/18 season.
Individual careers are statistically noisy. Injury, shifted positional play and team mate churn are just some of the factors that can make for an atypical seasonal return, even before we try to decide which metric is sufficiently robust to reflect individual performance.
If we use goals and assists to judge Messi up to his 30th birthday, the trend in his delta (the change in non-penalty goals and assists per 90 from one season to the next) turns negative when Messi was 27, which gives a guesstimate of when he peaked.
If we include 2017/18's small sample sized explosion as a fully developed rate for this upcoming season, the trendline still becomes negative this year.
If we regress this current hot rate towards Messi's most recent deltas, as we should, Messi's peak stretches to his 28th birthday.
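As a rough sketch of the arithmetic involved, the per-90 rate, the season-on-season delta and a simple regression step might be written as below. The function names, the weighted-average form of the regression and the `prior_weight_minutes` parameter are assumptions made for illustration, not necessarily the method behind the figures quoted here.

```python
def per90(goals_plus_assists, minutes):
    """Non-penalty goals plus assists, standardised to a per-90-minute rate."""
    return 90.0 * goals_plus_assists / minutes


def season_deltas(rates):
    """Change in the per-90 rate from each season to the following one."""
    return [later - earlier for earlier, later in zip(rates, rates[1:])]


def regress(observed_rate, prior_rate, sample_minutes, prior_weight_minutes):
    """Shrink a small-sample rate towards a prior expectation.

    Assumption: a plain weighted average, where prior_weight_minutes is a
    hypothetical tuning value saying how many minutes the prior is "worth".
    The fewer minutes in the sample, the closer the result sits to the prior.
    """
    weight = sample_minutes / (sample_minutes + prior_weight_minutes)
    return weight * observed_rate + (1.0 - weight) * prior_rate
```

With made-up numbers, `regress(2.0, 1.0, 500, 1500)` returns 1.25: a hot rate of 2.0 per 90 over 500 minutes is pulled three quarters of the way back towards a prior of 1.0.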
But by his own standards he has likely peaked.
Open play goals and expected goals for the last three seasons and the first five games of 2017/18 tell a similar story of gentle decline, even allowing for Messi's recent spurt of scoring.
Actual, non penalty, open play goals/90 are trending downwards, as are Messi's xG per 90 on a 10 game rolling average.
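A 10 game rolling average and a straight trendline are simple enough to sketch. The version below is a minimal illustration that assumes a plain list of per-game values in date order; `rolling_mean` and `trend_slope` are hypothetical helper names, and the ordinary least squares slope is an assumed choice of trendline.

```python
def rolling_mean(values, window=10):
    """Average of each run of `window` consecutive games, in order."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the game that left the window
        if i >= window - 1:
            averages.append(running / window)
    return averages


def trend_slope(values):
    """Ordinary least squares slope of values against game index 0, 1, 2, ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points for a trend")
    mean_x = (n - 1) / 2.0
    mean_y = sum(values) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator
```

A negative `trend_slope` over the rolling averages corresponds to a downward trend. Where the series is chosen to start and stop changes the slope, which is the doorstop problem in miniature.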
The actual trendline is also probably shallower because of the narrative driven choice of his three open play goal spree against Eibar providing the doorstop.
That Messi consistently over performs the average player's xG isn't surprising, but the peaks, like the one he's currently enjoying, are often driven by a glut of relegation threatened sides turning up in Barcelona's lumpy quality of schedule.
Enjoy the blips, but don't draw conclusions based on so little evidence.
Data from Infogolapp.
Saturday, 23 September 2017
Sunday, 10 September 2017
Messi and Ronaldo. Expected Goals Makers, Takers or a Bit of Both.
With the increased availability of granular data, there has been a similar influx of advanced metrics, both for players and sides across a wider range of domestic leagues.
And while performance based numbers, often to a couple of decimal places, are the raw material for much of the analytically based content, their attractiveness and clarity of meaning rarely extend beyond the spreadsheet.
It therefore falls to visualisations to convey some of the rich seams of information available in such manipulated data sets in a clear and easily digestible format, such as Ted Knutson's hugely popular radars.
Expected goals remain the flavour of the month, although BBC pundits are still immune, imploring players to "do better" with opportunities that are scored fewer than one time in 10.
A team or individual's attacking contribution can be neatly summarised by their expected goals and assists, standardised at least to a per 90 figure, with respect given to those who have achieved their numbers over a larger sample size compared to noisy small sample interlopers, ripe for regression.
Here's the xG/90 and xA/90 for the 70 largest cumulative, goal involvement achievers from La Liga's 2016/17 season.
Data is from @InfogolApp and has been restricted to open play chances and assists.
Messi and Ronaldo are among a clutch of players who have broken away from the main body of the plot, although they are also quite a distance removed from each other.
Messi was involved in around 0.85 xG+xA per 90 and Ronaldo around 0.65.
However, the former, while slightly under-performing against the latter in getting on the end of xG scoring chances, more than compensated by creating over double the amount of expected assists per 90.
So a simple scatter plot can begin to reveal fundamental differences between even the most high profile of players.
More information can be extracted by simply running a straight line between a particular player's point on a scatter graph and the origin.
Moving down such a line, you'll encounter players who, in the season under scrutiny, achieved ratios for xG and xA that closely resemble those of the line owning player.
The magnitude of their cumulative performance is less than those players that are further away from the origin, but their shot/assist characteristics will be consistent with any near neighbours.
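The line through the origin amounts to describing each player by an angle (the balance between taking and making) and a distance (the size of the contribution). Here is a minimal sketch, assuming xG/90 on the horizontal axis and xA/90 on the vertical. The function names and the five degree tolerance are arbitrary choices for illustration.

```python
import math


def profile(xg90, xa90):
    """Return (distance from origin, angle in degrees) for one player.

    An angle near 0 is a pure taker (all xG), near 90 a pure maker (all xA).
    """
    distance = math.hypot(xg90, xa90)
    angle = math.degrees(math.atan2(xa90, xg90))
    return distance, angle


def similar_style(target, players, tolerance_deg=5.0):
    """Players lying close to the line from the origin through `target`.

    `target` is an (xg90, xa90) pair and `players` maps a name to such a pair.
    Results are sorted with the largest overall contribution first.
    """
    _, target_angle = profile(*target)
    matches = []
    for name, (xg90, xa90) in players.items():
        distance, angle = profile(xg90, xa90)
        if abs(angle - target_angle) <= tolerance_deg:
            matches.append((name, round(distance, 3)))
    return sorted(matches, key=lambda match: match[1], reverse=True)
```

Players with the same angle share a shot/assist mix, and the distance then ranks them by volume within that style.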
Messi was a more sharing team mate in open play in 2016/17, whereas Ronaldo headed the line of takers, rather than makers.
Friday, 8 September 2017
Shot Blocking and the State of the Game.
It has long been appreciated that the dynamics of a game subtly alter as time elapses, scorelines change or remain the same and pre match expectations are met, exceeded or undershot.
This shifting environment has traditionally been investigated using the simple measure of the current score.
This has been unfortunately labelled as game state, when simply "score differential" would have both succinctly described the underlying benchmark being applied and avoided hinting at a more nuanced approach than just subtracting one score from another.
As I blogged here, the problem is most acute when lumping the not uncommon, stalemated matches together.
Consider a game between a strong favourite and an outsider that finishes goalless.
Whereas the latter more than matches their pregame expectation, the former falls disappointingly short of theirs.
The average expectation at any point in a game can be represented in a number of ways, but perhaps the most intuitive is an estimation of the average number of points a team will pick up based on the relative strengths of themselves and their opponent, at the current scoreline and with the time that remains.
The plot above shows the relative movement of the expected points for a strong favourite playing weaker opposition to a 0-0 conclusion.
The favourite would expect to average around 2.5 points per match up at kick off, decaying exponentially to one actual point at full time.
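One way to sketch such an expected points curve is with independent Poisson goal counts for each side, scaled by the fraction of the match still to play. This is an assumed model for illustration rather than the one behind the plot, and the full-match goal expectations used in the usage note (2.2 for the favourite, 0.5 for the outsider) are invented so that the kick off figure lands near 2.5 points.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals when the expected number is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def expected_points(lead, mean_for, mean_against, fraction_remaining, max_goals=15):
    """Expected league points from the current scoreline and time remaining.

    `lead` is the team's current goal difference (0 for a level game).
    Assumptions: goals arrive at a constant rate, so the goals still to come
    are Poisson with the full-match mean times `fraction_remaining`, and the
    two sides score independently of each other.
    """
    lam_for = mean_for * fraction_remaining
    lam_against = mean_against * fraction_remaining
    p_win = 0.0
    p_draw = 0.0
    for scored in range(max_goals + 1):
        for conceded in range(max_goals + 1):
            p = poisson_pmf(scored, lam_for) * poisson_pmf(conceded, lam_against)
            final_margin = lead + scored - conceded
            if final_margin > 0:
                p_win += p
            elif final_margin == 0:
                p_draw += p
    return 3.0 * p_win + p_draw
```

Calling `expected_points(0, 2.2, 0.5, f)` for `f` running from 1.0 down to 0.0 traces the favourite's slide towards a single point in a goalless game, while swapping the two means gives the outsider's climb towards the same point.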
So at any point in the match we can measure the favourite's current expectation compared to their pregame benchmark and use this to describe their own level of satisfaction with the state of the game.
Game state would be preferable, but that's already taken.
The same is true for the outsider. Their state of the game gradually increases compared to their much reduced pregame expectation.
Although the game is scoreless throughout for each side, things are getting progressively worse for the favourite and better for their opponents.
We can use these shifting state of the game environments to see if they have an effect on in game actions.
Intuitively you would expect the team doing less well compared to their expectations to gradually commit more resources to attack, in turn forcing their opponents onto the defensive.
This may increase shot volume for the former, but it is also likely that these attempts, particularly from open play, will fall victim to more defensive actions, such as blocks.
The reverse would seem likely to be true for the weaker team. Although their shot count may fall, with fewer defensive duties being carried out by their opponents, their sparser shot count may evade more defensive interventions, again such as blocks.
Here's what the modelled fate of a shot from regular play from just outside the penalty area in a fairly central position looks like between two unequal teams as the match progresses.
Data is from a Premier League season via @infogolApp.
In building the model, the decay in initial expectation has been used to describe the state of the game for the attacking team when each individual shot was attempted, rather than simply using score differential.
Initially the weaker team is less likely to have their shot blocked, although it is probably more accurate to say that the favoured side is more likely to suffer this fate.
As the game progresses, the better team sees a slight increase in the likelihood that a shot from just outside the box is blocked, perhaps suggesting that their opponents are initially heavily committed to a defensive structure.
The weaker side has a lower initial likelihood that such a shot is blocked, again implying a more normal amount of defensive pressure early in the game. But as the match progresses, the likelihood that their shots are blocked falls even more.
This nuanced model appears to be illustrating the classic potential for a prolonged rearguard action from an underdog, followed by a late smash and grab opening goal, mitigated by the relative shot counts from each team.
Tuesday, 5 September 2017
Premier League Defensive Profiles.
Heat maps and the like have been around for ages as a way of visualising the sphere of a particular player's influence.
However, it's always nice to have some numerical input to work with, so I've used the Opta event data that powers InfoGol's xG and in-running app to develop metrics that describe how teams and individuals contribute over a season.
Defensive metrics have lagged well behind goals and assists, so I looked at that neglected side of the ball.
Unlike goal attempts, counting defensive stats tends to be a fairly futile exercise. No one willingly wants to keep making last ditch tackles and racking up ever higher defensive events is more often the sign of a team in trouble.
There's also the disparity in possession time which gives the possession poor team more chances to accrue defensive events.
Therefore, pitch position, rather than bulk events seems an obvious alternative.
Allowing a side lots of touches deep in your territory is intuitively a bad idea and the higher up the field a side is willing or able to engage their opponent would appear preferable.
Measurements have been calculated from the Opta X, Y point of an event to the centre of a team's own goal line.
Thus a tackle or clearance made on the half way line will be further from this point of reference if it is made near the touchline than if it is completed on the centre spot.
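The measurement itself is just Pythagoras once the event coordinates are in yards. The sketch below assumes the events use a 0 to 100 scale on both axes, with x running from a team's own goal line to the opponent's and y running across the pitch, and it assumes a 115 by 74 yard pitch. Both the convention and the dimensions are stated here as assumptions rather than taken from the data feed.

```python
import math

# Assumed pitch size in yards (roughly 105m x 68m); real pitches vary.
PITCH_LENGTH_YARDS = 115.0
PITCH_WIDTH_YARDS = 74.0


def distance_from_own_goal(x, y,
                           length=PITCH_LENGTH_YARDS,
                           width=PITCH_WIDTH_YARDS):
    """Distance in yards from an event to the centre of the team's own goal line.

    Assumes x and y are percentages of pitch length and width, with
    (0, 50) being the centre of the team's own goal line.
    """
    yards_upfield = x / 100.0 * length
    yards_off_centre = (y - 50.0) / 100.0 * width
    return math.hypot(yards_upfield, yards_off_centre)
```

Under these assumptions an event on the centre spot, `distance_from_own_goal(50, 50)`, comes out at 57.5 yards, while one on the half way line at the touchline, `distance_from_own_goal(50, 0)`, is a little over 68 yards.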
This allows for defensive event profiles for both a team and also their opponents.
A quick eye test appears to show that the more successful Premier League teams do their defending further away from their own goal than the lesser sides are either willing or able to do.
The idea that doing defensive stuff higher up the pitch is the product of a good team is further developed by plotting where a side defends on average and where they allow their opponents to defend, again on average.
The relegated teams from 2016/17 mostly suffered a double whammy. They chose or had to defend an average of around 34 yards from the centre of their own goal line, compared to nearly 40 yards for some of the top six, and they also allowed their opponents the luxury of making defensive actions around 38 yards from their own goal line.
Notably Pulis again muscles into an area apparently reserved for relegation fodder with his defensive voodoo.
At a player level it's a trivial problem to find the average pitch position where he makes a defensive action and then find how close or far-flung each individual action is from this average point.
These numbers can then be used as the average position for a player's defensive contribution, measured from the centre of his own goal, and also how widely this area extends.
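A minimal sketch of those two numbers, assuming each defensive action has already been converted to yards with the centre of the player's own goal line at (0, 0). The function name and the reading of "average position" as the centroid of the actions are assumptions for illustration.

```python
import math


def defensive_profile(events):
    """Return (depth, spread) in yards for a list of (x, y) defensive actions.

    depth  is the distance from the centre of the own goal line to the
           average position of the actions.
    spread is the average distance of each action from that average position.
    """
    if not events:
        raise ValueError("need at least one defensive action")
    n = len(events)
    centre_x = sum(x for x, _ in events) / n
    centre_y = sum(y for _, y in events) / n
    depth = math.hypot(centre_x, centre_y)
    spread = sum(math.hypot(x - centre_x, y - centre_y) for x, y in events) / n
    return depth, spread
```

For the toy input `[(40, -10), (40, 10), (50, 0), (30, 0)]` the average position is 40 yards straight out from goal and every action lies 10 yards from it, so the function returns a depth of 40 and a spread of 10.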
N'Golo Kante's an obvious candidate to see if this simple exercise again passes the eye test.
In 2016/17 the average pitch position for Kante's defensive actions was 45 yards from his own goal.
The average distance between this average position and all the defensive actions he made was 23 yards.
The latter was greater than the average for all defensive midfielders as a group.
We could perhaps say that Kante was relatively advanced in his defensive actions (he was seven yards further up field than his former team mate Nemanja Matic) and his field of influence was also more expansive compared again to Matic and his peers.
Charlie Adam, by contrast, appears more constrained by the role required from him. In 2016/17 he tackled deeper than both Kante and Matic and strayed less far afield.
He more resembled a disciplined central defender in his defensive foraging and in doing so remained roughly where his energy bar lands on the pitch around the 70th minute.