Excitement at a sporting event is a subjective measurement.
It doesn't quite equate to brilliance: a 7-2 thrashing has to be appreciated for the excellence of one team's performance, but as the score differential climbs, morbid fascination takes over, at least for the uncommitted.
Nor does it tally with technical expertise. A delicately crafted passing movement doesn't quite set the pulse racing like a half-scuffed, close-range shot that deflects off the keeper's knee and loops agonisingly over the bar with the game on the line.
You can attempt to quantify excitement using a couple of benchmark requirements.
The game should contain a fair number of dramatic moments that might have changed the course of the outcome, or actually did lead to a significant alteration in the score.
It's easy to measure the change in win probability associated with an actual goal.
A goal that breaks a tied game in the final minutes will advance the chances of the scoring team by a significant amount, whilst the seventh goal in a 7-2 win merely rubs salt into the goal difference of the defeated side.
Spurned chances at significant junctures are only slightly more difficult to quantify.
You can take a probabilistic view, weighting the effect an actual goal would have had on each side's winning chances by the likelihood that the chance was taken, based on the chance's expected goals figure.
Summing the actual and probabilistic changes in win probability for each goal attempt in each match played in the 2016/17 Premier League season gives the five most "in the balance", chance laden matches from that season.
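As a minimal sketch of that tally, here's how the summation might look in code. The attempt data and the win probability swings below are invented for illustration, not taken from a real match or model.

```python
# Sketch of an "excitement" tally for a single match, assuming each goal
# attempt is recorded with the shooter's xG and the change in win
# probability that a goal at that moment would have produced.
# The attempt data below is illustrative, not from a real match.

def excitement_score(attempts):
    """Sum actual and probabilistic win-probability swings.

    attempts: list of dicts with keys
      'xg'       - expected goals value of the attempt
      'wp_swing' - absolute change in win probability a goal would cause
      'scored'   - whether the attempt was converted
    """
    total = 0.0
    for shot in attempts:
        if shot['scored']:
            # an actual goal contributes its full win probability swing
            total += shot['wp_swing']
        else:
            # a spurned chance contributes probabilistically,
            # weighted by how likely it was to be converted
            total += shot['xg'] * shot['wp_swing']
    return total

# Invented data: a late missed chance in a tied game is worth far more
# than a converted consolation goal in a rout.
attempts = [
    {'xg': 0.45, 'wp_swing': 0.40, 'scored': False},  # big chance, game tied late
    {'xg': 0.08, 'wp_swing': 0.02, 'scored': True},   # seventh goal of a rout
]
print(round(excitement_score(attempts), 3))  # 0.2
```

Summing this score over every attempt in every match, then ranking, gives the "most exciting games" list.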
Top Five Games for Excitement 2016/17 Premier League
No surprise to see the Swansea/Palace game as the season's most exciting encounter, with Palace staging a late comeback, before an even later Swansea response claimed all three points in a nine goal thriller.
Overall I've ranked each of the 380 matches from 2016/17 in order of excitement, as measured by the actual and potential outcomes of the chances created by each team in the game.
Bournemouth's games had the biggest share of late, game swinging goals, along with the most unconverted endeavour when the match was still in the balance.
Tottenham, meanwhile, despite playing in the season's second most exciting game, a very late 3-2 win over West Ham, more typically romped away with games, leaving the thrill seekers looking for a match with more competitive balance to tune into.
Middlesbrough fans not only saw their side relegated, but saw it happen in rather bland encounters as well.
Sunday, 22 October 2017
Saturday, 14 October 2017
Player Projections. It's All About The Distribution Part 15
A couple of football analytics' little obsessions are correlations and extrapolations.
Many player metrics have been deemed flawed because they fail to correlate from one season to the next, but there are probably good reasons why the diminished sample sizes available for individuals lead to poor season on season correlation.
Simple random variation, injuries, a change of team mates or of role within a club, and atypically small sample sizes often lead to see-sawing rate measurements; inevitably, players also age, and so can be on a very different career trajectory to others within the sample.
The problem of neglecting the age profile of a group of players when attempting to identify trends for use in future projections is easily demonstrated by looking at the playing time (as a proxy for ability) enjoyed by players who were predominantly aged 20 and 30 when members of a Premier League squad, and how that time altered in their 21st and 31st years.
The 30 year olds played Premier League minutes equivalent to 15 full matches, falling to 12 matches in their 31st year. So they were still valued enough to play fairly regularly, but perhaps due to the onset of decline in their abilities they featured, on average, less than they had done.
The reverse, as you may have expected, was true for the younger players. They won the equivalent of seven full games in their 20th year and nine the following season.
It seems clear that if you want to project a player's abilities from one season to the next and playing time provides a decent talent proxy, you should expect improvement from the youngster and decline from the older pro.
However, as with many such problems, we might be guilty of attempting to impose a linear relationship onto a population that is much better defined by a distribution of possible outcomes.
The table above shows the range of minutes played by 21 and 31 year olds who had played 450 minutes or fewer in the previous season as 20 or 30 year old players.
As before, we may describe the change in playing time as an average. In this subset, the older players play very slightly more than they did as 30 year olds, improving from the equivalent of two games to 2.2.
The younger players jump from 1.8 games to 3.6.
However, just as cumulative xG figures can hide very different distributions, particularly of big chances which subtly alter our expectation for different teams, the distribution of playing minutes that comprise the average change of playing time can be both heavily skewed and vary between the two groups.
Over three quarters of the 30 year olds didn't get onto the field at all during the next Premier League season, and likewise two thirds of the younger ones.
21% of young players played a similar amount of time to the previous season, between one and 450 minutes, compared to just 14% of the older ones. And 17% of youngsters exceeded the total from the previous season, as did just 10% of the veterans.
So if you use the baseline rate of increased playing time as a flat rate across all players that fall into these two categories in the future, you might be slightly disappointed, because overwhelmingly the experience of such players is one where they fail to play even a minute in the following season.
Knowing that there is an upside, on average, for these two groups of players, based on historical precedent, is a start; but knowing that 3 out of 4 of the oldies and 2 out of 3 of the youngsters you are considering didn't merit one minute's worth of play in an historical sample is also a fairly important, if not overriding, input.
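The gap between a healthy-looking average and a heavily skewed distribution is easy to demonstrate with invented minutes data of the same general shape as the sample described above (the figures below are made up, not the actual study data):

```python
# Sketch: a mean increase in playing time can coexist with most players
# not featuring at all. Minutes are invented to mirror the shape of the
# sample described in the text.
from statistics import mean

# next-season minutes for ten hypothetical 31 year olds who played
# 450 minutes or fewer the season before
next_season = [0, 0, 0, 0, 0, 0, 0, 0, 400, 1400]

print(mean(next_season))  # a healthy-looking average: 180 minutes
print(sum(m == 0 for m in next_season) / len(next_season))  # 0.8 played nothing
```

A flat-rate projection built from the 180-minute average would misrepresent the typical outcome, which here is zero minutes.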
Wednesday, 11 October 2017
World Cup Qualification So Far.
To save my Twitter feed from viz overload, here's a couple of plots from the completed World Cup qualifiers.
FIFA ratings usually get a good kicking, but if you know their limitations they do a decent job, and they have done so in predicting the qualifying teams for 2018 so far.
Some higher rated teams will miss out; it's only 10 games in some cases, after all.
But if you want a benchmark FIFA rating at the time qualifying began in 2015, the definite qualifiers had a median rating of 891.
Those still waiting on a playoff were rated 676 and those rooting for other countries were 464.
Check your country and see if they ended up roughly in the position they deserved based on 2015 FIFA rankings.
FIFA don't seem to want you to find historical ratings, but to the best of my knowledge these were the ratings each side had in October 2015, apart from the three I couldn't find & made up.
Sunday, 8 October 2017
Premier League Age Profiles Through the Ages
I found some data I collected but never got round to analysing for the joint OptaProForum presentation with Simon Gleave a few years ago.
It simply consists of minutes played by each age group in the four highest tiers of English domestic football.
There are a variety of methods to describe the ageing curve in football, where players initially show improvement, peak and then decline with age. I prefer the delta approach, which charts the change of a variety of performance related indicators or their proxies.
We may condense the age profile of a team or league down into three main groups: young players, under 24, who are still improving; peak age performers, from around 24 to 29; and ageing players of 30 or more, who may still be good enough to command some playing time, but are diminishing compared to their own peak levels.
Using the amount of playing time allowed to each of the three groups as a performance proxy, the peak age group of Premier League players have been increasing their share at the expense of both the younger and older groups since 2004/05. Peak share has risen from 48% of the available playing time at the start of the period to 60% by 2014/15.
The wealth of the Premier League and the limited alternative destinations for the best, prime aged talent would appear to be a reasonable cause for this increase. Perhaps only Spain's Barcelona and Real Madrid (Suarez and Bale) account for the few realistic destinations for peak age, Premier League talent.
By contrast, League Two, the fourth tier of English football, appears to have a very different age profile.
Here, youth and peak aged players share playing time, with 30 & over players lagging well below these levels, implying a different market further down the pyramid.
Players are not being recruited from the extreme right hand tail of the talent pool, so more options of similar ability are available and there is also an extensive pool of buyers in the two or three divisions immediately above League Two, ready to take on the cream of the peak age performers.
Finally, here are the plots for the best Premier League teams compared to the remainder of the clubs.
Peak shares are similar for both groups, but the top teams have played a larger share of (talented) younger players, while the remainder of the Premier League have swayed slightly more towards experience (perhaps ageing players from the top teams dropping in grade, but remaining in the Premier League).
Crouch at Stoke, for example.
Liverpool's individual profile appears to illustrate how their age profile has remained similar to the average for top Premier League teams across the 11 seasons.
Over 30's make up the lowest proportion of playing time, followed by younger players and topped off by peak age talent.
30+ contribution falls away, to be replaced by ageing peak age talent, which in turn is refreshed by maturing younger players. Replacement buys can then be made in the 22-24 range to continue the cycle.
By contrast, Everton has chosen to largely swap around the over 30 group and the under 24 group, leading to seasons where older players dominate.
Wednesday, 4 October 2017
Quick & Dirty Strength of Schedule.
I've recently posted some xG, strength of schedule adjusted figures for the Premier League and justin_mcguirk has asked for a method.
The sos values are intended to be purely descriptive, rather than an attempt to more accurately portray underlying team quality.
But intuitively you can look at WBA's start where they've not faced one genuine title contender, in Bournemouth, Burnley, Stoke, Brighton, WHU, Arsenal and Watford and compare it to Everton's lucky seven of Stoke, Man City, Chelsea, Spurs, ManUtd, Bournemouth and Burnley and immediately think that Everton's start has been more difficult than that of the Baggies.
Strength of schedule can be calculated using an approach borrowed from the NFL, particularly the so-called least squares Massey ratings.
In the case of the Premier League, each team's schedule is laid out, followed by a performance parameter, such as goal or expected goal difference. A rating for each of the twenty teams is then calculated such that the errors arising when trying to solve the twenty simultaneous equations (one per team, each involving the seven opponents faced so far) are reduced to a minimum.
The maths is doable using matrices, although the raw 20x20 system is singular and needs an extra constraint (such as forcing the ratings to sum to zero) before it will solve; many packages will undertake the heavy lifting for you as well.
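For the curious, here's a minimal sketch of the least squares approach on an invented four-team mini-league, using numpy's `lstsq`. The results and margins are made up purely to show the mechanics; a real run would use all twenty Premier League teams and their goal or xG margins.

```python
# Sketch of least squares Massey-style ratings on an invented
# four-team mini-league.
import numpy as np

teams = ['A', 'B', 'C', 'D']
idx = {t: i for i, t in enumerate(teams)}

# (home, away, goal or xG margin from the home side's view) - invented
games = [('A', 'B', 2), ('A', 'C', 1), ('B', 'D', 0),
         ('C', 'D', -1), ('B', 'C', 1), ('A', 'D', 3)]

rows, margins = [], []
for home, away, margin in games:
    row = np.zeros(len(teams))
    row[idx[home]], row[idx[away]] = 1, -1  # rating difference ~ margin
    rows.append(row)
    margins.append(margin)

# extra row pinning the ratings to sum to zero; without it the
# system has no unique solution
rows.append(np.ones(len(teams)))
margins.append(0)

# least squares solution minimises the squared errors across all games
ratings, *_ = np.linalg.lstsq(np.array(rows), np.array(margins), rcond=None)
for t, r in sorted(zip(teams, ratings), key=lambda x: -x[1]):
    print(f'{t}: {r:+.2f}')
```

Each team's rating is then its schedule-adjusted strength, and an opponent-averaged version of those ratings gives the strength of schedule.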
For those who would like a simpler and probably equally informative approach you can average the goal or expected goal difference of the seven teams a side has played.
These seven teams will have played 49 matches between them. Admittedly, seven of those will have been against the side whose strength of schedule you are attempting to estimate, but their 49 games will have been against a broadly representative cross-section of the league.
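The averaging shortcut is just a lookup and a mean. A sketch, with invented opponent xG differences standing in for the real figures:

```python
# Quick & dirty strength of schedule: average the xG difference of the
# opponents a side has faced so far. All figures here are invented.
xg_diff = {'Stoke': -4.0, 'Man City': 8.1, 'Chelsea': 5.5, 'Spurs': 6.0,
           'Man Utd': 5.0, 'Bournemouth': -1.5, 'Burnley': -0.5}

opponents = list(xg_diff)  # the seven sides faced to date

# a positive value indicates a tough schedule, negative an easy one
sos = sum(xg_diff[t] for t in opponents) / len(opponents)
print(round(sos, 2))  # 2.66
```

Repeating that for each of the twenty teams produces the descriptive sos table.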
Here's the sos table using this method after six games. It is broadly similar to the one I posted on Twitter after 7 and using a least squares approach.
Everton still had the toughest start & WBA the easiest. Chelsea moved up towards a more taxing unbalanced schedule by hosting Man City as did Palace visiting Manchester United.
Also more information about each team and their opponents has become available after seven games.
Finally, here are the individual calculations for WBA and Everton. Stoke, for example, had created 5.2 xG after six games and allowed 9.2.
Data from @InfogolApp
Tuesday, 3 October 2017
Crystal Palace... The Only Way Is Up.
A quick post to try to put Crystal Palace's current predicament into some kind of historical context.
In terms of points, they've (obviously) had the worst start through seven matches in the lifetime of the 20 team Premier League.
Zero points, zero goals and not one iota of friendly randomness to break their duck in either category, despite bad, but not completely hopeless xG figures.
Particularly in chances created.
Points won are just one factor in determining how badly a side has started their campaign. The aim of the majority of teams in the Premier League is simply to stay in it for next season, so your proximity to your nearest rivals is just as important as your own points total.
On this basis, there are arguably a few teams ahead of Palace in claiming the worst initial seven game record.
Southampton in 1998/99, Portsmouth in 2009/10, their administration year and Sunderland in 2013/14 could be considered to have been worse off than Palace are now. Each may have won more points than Palace has, but Palace are marginally closer to both their immediate rivals and even mid table than were this trio.
Also, poor starts aren't an automatic ticket to the Championship.
50% of the 20 worst placed teams, relative to their 19 rivals after seven matches, managed to stay up, although, conversely, the better the start, the more likely survival becomes.
27 teams have been comfortably placed, roughly equidistant from the leaders and the 20th placed side after seven matches, and four of those ultimately fell through the trapdoor. Above that mark it became plain sailing and survival has been universal.
If we use Palace's proximity to their rivals as a measure of their start and compare the fate and the ranking of all teams in the 20 team Premier League era after seven games, there is more than a glimmer of hope.
Based on historical precedent and that alone, Palace have around a 28% chance of escaping relegation.
Of course a side is not relegated just on a single statistic. Injuries, the January window and their underlying stats all contribute to the reckoning in May.
Palace have had around the fourth toughest start in terms of opposition faced. It gets much less arduous after they play Chelsea in game eight, but they haven't enjoyed good luck with injuries to key attackers.
Their 10 game rolling xGD and actual GD since 2014 has been trending downwards over time, but the precipitous disconnect between process and outcome in recent matches is unlikely to persist.
They are far from the worst team in the current Premier League when measured over a more prolonged time frame. And although they have given inferior sides a start, it is a start that has been run down in the past.
Supporters will be correct to be pessimistic: Palace are probably more likely to be relegated than not, but the bookies' price of 1.53, with an implied probability of 65%, still leaves their survival chances somewhere in the mid to low 30%'s.
A similar level of success to that enjoyed by the seemingly lost-cause predecessors mentioned earlier in this post.
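For reference, the implied probability quoted above comes straight from the reciprocal of the decimal price:

```python
# Converting a decimal bookmaker price to an implied probability.
# A price of 1.53 to be relegated implies roughly a 65% chance of the
# drop, before accounting for the bookmaker's margin.
def implied_prob(decimal_price):
    return 1 / decimal_price

print(round(implied_prob(1.53), 3))      # 0.654 chance of relegation
print(round(1 - implied_prob(1.53), 3))  # 0.346 naive survival chance
```

The quoted survival figure in the low-to-mid 30%'s follows once the overround shared across the market is stripped out.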