Saturday 30 December 2017

Jeff Stelling was Right about xG....For the Wrong Reasons.

Love it or loathe it, totally get it or pack it away with opinions such as "foreign managers who don't know the Premier League are rubbish", or simply use it as one component in your predictive market of choice; there's no denying that expected goals made a mark in 2017.

Expected goals is most effective in the long term and in the aggregate, but there's an understandable desire to also parade it for individual games and individual chances.

Jeff Stelling, who only appears to think probabilistically, when lying, fully clothed in bed with a million pounds and a teddy wearing a Hartlepool shirt, may merely have been expressing the well documented caveats of using xG for a single game when he derided the xG thoughts of the Premier League's senior statesman, Arsene Wenger.

Betting on probabilistic outcomes, what are the odds of that!

Using xG rather than actual goals in a single game is simply a more nuanced look at the team process that went into the 90 minutes.

It approaches the difficult question of who "deserved" to win from a larger sample size than goals, albeit one often twisted by game effects, and provides an answer in terms of likelihood, rather than the more palatable, but unattainable, level of certainty that has long been expected from TV experts.

1-0 wins can be subject to large amounts of random variation, and there's probably even more if you have treated your fans to a 4-3 victory, whereas 7-0 leaves much less room for doubt as to who got their just rewards.

If you adopt a Pythagorean wins approach to the goals scored and allowed in these three single game scenarios, you would give a larger proportion of "Pythagorean wins" to the team that won 7-0 than you would the team that won 1-0 and by far the least to the side that triumphed 4-3.
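As a rough sketch of the idea, a football-flavoured Pythagorean expectation can be applied to each scoreline. The exponent and the smoothing constant below are illustrative assumptions, not the parameters behind any published model; the smoothing stops a clean-sheet scoreline from claiming a full 100% of the "wins".

```python
# Hedged sketch of a single-game Pythagorean wins share. Exponent (1.35) and
# the +1 smoothing on both goal tallies are illustrative assumptions.

def pythagorean_wins(goals_for, goals_against, exponent=1.35, smooth=1.0):
    """Share of a single game's 'wins' credited to the scoring team."""
    gf = (goals_for + smooth) ** exponent
    ga = (goals_against + smooth) ** exponent
    return gf / (gf + ga)

for score in [(7, 0), (1, 0), (4, 3)]:
    print(score, round(pythagorean_wins(*score), 3))
```

Under these assumed parameters the 7-0 winner banks the largest share, the 1-0 winner rather less and the 4-3 winner by far the least, mirroring the ranking above.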

So there is information to be extracted from even basic scorelines that goes beyond wins, draws and losses.

Individual xG chances take this approach a step further, indicating whether a team that won 1-0 was fortunate to win or unlucky not to have scored a hatful, in competition with the efforts of their defeated opponent.

The most visible flaw of xG can be in individual chances, because although the amount of information available to define an opportunity is large, it is still far from complete.

The broad sweep of xG probabilities, drawn from a large body of historical precedents, often trumps an eye-test opinion, particularly where probability is an unfamiliar concept to those using years of footballing knowledge, rather than mathematical models, to estimate whether or not a chance should have been converted.

There are also relatively easy to spot examples where a lack of collected data has, in a largely automated xG process, generated values that are at odds with reality.

Joe Allen

The above and below examples from Stoke's recent game with WBA illustrate the problems inherent in calculations made without either a visual check or a more complete set of parameters.

Ramadan Sobhi

Looked at from the perspective of the WBA keeper, Ben Foster, the post-shot xG for Allen's goal is likely higher than that for Sobhi's strike, based on placement, power, location and deflection (or lack thereof).

But it is fairly obvious that the absence of Ben Foster himself for the latter shot has in reality elevated Sobhi's effort to a near 100% chance.

It is the equivalent of an unfieldable ball in baseball or an uncatchable pass in football, NFL style, simply because of the field position of the designated catcher or saver.

I don't have our xG2 values for each attempt (it's Christmas), but I suspect Foster will be expected to save Sobhi's effort more often than Allen's, in a model that is ignorant of his wayward positioning for the former attempt.

That would be harsh on Foster, acting out his role as auxiliary attacker, chasing an injury time equaliser.

Keeper metrics are based on the savability of attempts on target and once Sobhi got his attempt on target, the true chance of a goal being scored is around 99.9% (to allow for the possibility of the ball bursting prior to crossing the line).

Using Sobhi's goal to evaluate Foster's xG over- or under-performance would immediately put the keeper at an unfair disadvantage.

If we assume that the chance of finding the net with a weakly hit shot along the ground, aimed around the centre of the frame, with no deflection (which effectively changes the shot location), taken from wide of the post and level with the penalty spot, is relatively modest by historical precedent, then Foster will already be nearly a goal worse off when comparing his expected goals allowed with his actual goals allowed.

The reality was a shot that, through little fault of his own, Foster was entirely unable to save, whereas the majority of similar attempts upon which models are built would have featured a more advantageously positioned keeper.

Numerous unrecorded aspects of a goal attempt can greatly change individual xG estimates while still retaining a usefulness when aggregated.

Body shape when shooting, from the striker's perspective, or a bizarre trajectory in the flight of the ball, for example, can change actual conversion rates, transforming seemingly identical chances into near unsavable certainties or comfortable claims for the keeper.

It's likely that many post shot xG probabilities that are grouped in similar bins actually have a much wider range of true probabilities. They may not be as wrongly classified as the Foster example, but the implied accuracy inherent in multiple decimal places is bound to be an illusion.

There are a couple of ways to attempt to improve this conundrum.

Scrutinising each attempt is one labour intensive option, hoping that events largely even out in the aggregate is another (although randomness isn't always inherently fair).

A third option is to take indicators from the data we do have, that may help to highlight occasions where a chance may have been wrongly classified within a group of similarly computed xG values.

(This is unfortunately where I invoke a rare non disclosure clause).

So what happens to our xG2 keeper ratings if we try to account for factors that we haven't recorded and therefore are absent in our model?

Generally, under-performing keepers improve whilst remaining below par, and over-performers are similarly dragged partway towards a less extreme level.

De Gea and Bravo have been respectively among the best and worst shot stoppers of the last three seasons.

Using models that incorporate much of the post-shot information available, such as shot type, power, placement, rudimentary trajectory, deflections etc., de Gea has conceded 84 goals from non-penalty attempts against a model's average prediction of 95.

For Bravo the numbers are 25 allowed against 15 predicted.

If we concede that some of the attempts aggregated to make up the baseline for each keeper may have been misclassified, we can apply a correction, based on hints within the data we do have, that may reclassify the attempts more accurately.

De Gea's average expected number of goals allowed falls to 92 (still making him above average, but slightly less super human) and Bravo's is given a slightly more forgiving 19 expected goals, rather than 15.
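I can't show the actual correction, but the general shape of the idea can be sketched as pulling each keeper's model baseline part-way towards his actual goals allowed. The reliability weight k below is a made-up parameter for illustration; the real adjustment is driven by indicators in the data, not a single fixed blend.

```python
# Minimal sketch of regressing keeper xG2 baselines towards actual goals
# allowed, to acknowledge unrecorded shot features. The weight k is an
# invented illustration, not the undisclosed correction from the post.

def corrected_expected(model_expected, actual, k=0.3):
    """Pull the model's expected-goals baseline part-way towards actuals."""
    return model_expected + k * (actual - model_expected)

# Figures quoted above: De Gea 84 allowed vs 95 expected,
# Bravo 25 allowed vs 15 expected.
for name, expected, actual in [("De Gea", 95, 84), ("Bravo", 15, 25)]:
    print(name, round(corrected_expected(expected, actual), 1))
```

Whatever the exact mechanism, the corrected baseline always lands between the raw model prediction and the actual tally, so both extremes become a little less extreme.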

Acknowledging that a model is incomplete has led to extremes being regressed towards the mean, and that's probably no bad thing if these models are to be used to evaluate and project player talent.

Expected Goals is a work in progress tool, not the strawman, full of cast iron claims, that opponents invariably make on the metric's behalf. If you accept the inevitable and often insurmountable limitations, xG can still add much value to any analysis.

Don't be like Jeff, approach xG with an open mind....and also don't go to bed in a suit.

Saturday 23 December 2017

Influential xG & xA Team Players

Expected goals and expected assists are now becoming an established part of player performance stats, and rather than post column after column of boring and indigestible numbers, often to two decimal places, I've been presenting the data as a visual.

It seemed logical to plot the xG/90 against the xA/90 with a minimum cutoff for minutes played to mitigate the intrusion of outlandish outliers.

Here's one of my plots for the Premier League, earlier in the season.

Players appearing towards the top left are the league's more prolific providers of an opportunity, while bottom right is populated with players who more often latch onto the final decisive pass.

Players who had been doing quite a bit of both turn up in the top right region of the plot.

It's immediately obvious that Manchester City dominate the plot, as you might expect from their near perfect start to the season, and as a record of the league's most prolific attacking contributors, the plot does its job.

However, while the multiple prominent players from the same teams, notably City and Arsenal, are undoubtedly fine players, their individual performance indicators are perhaps made slightly easier to achieve by the quality of their teammates.

De Bruyne's precise passing feeds into the xG of his teammates, just as their intelligent runs create opportunities for him to bolster his xA.

So as a tweak to my original plots, I've now factored in the overall xG and xA/90 of the team for which each player plies his trade. This results in many of the Manchester City players falling back into the pack.

The individual players are still creating and attempting to convert chances at similar rates to the original plot, but such is City's commitment to attacking play (they are averaging upwards of 2.6 NP xG per game) and such is their depth of creative and attacking talent, that they don't have one particularly stand out performer.

Conversely, Peter Crouch, who would be unlikely to feature prominently in a plot that merely quantified his raw xG and xA/90 contribution, shows up as a hugely influential contributor to Stoke's offensively tepid attack once we factor in the Potters' overall team NPxG/90 of barely 1.0.

As an example, Aguero's combined xG/90 & xA/90 of 1.3 from an overall Manchester City combined rate of nearly 5 xG & xA/90, is arguably less influential and more readily replaced from within than is Crouch's combined 0.6 against Stoke's puny overall 1.8 xG+xA/90.
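The tweak can be sketched as a simple ratio: a player's combined (xG+xA)/90 as a share of his team's combined rate. This is my illustration of the adjustment rather than the exact formula behind the viz, using the figures quoted above.

```python
# Sketch of the 'influence' adjustment: a player's combined (xG+xA)/90 as a
# share of his team's overall combined rate. A simple ratio is assumed here;
# the plotted adjustment may differ.

def influence(player_xg_xa_90, team_xg_xa_90):
    return player_xg_xa_90 / team_xg_xa_90

aguero = influence(1.3, 5.0)  # Aguero vs City's combined rate of nearly 5
crouch = influence(0.6, 1.8)  # Crouch vs Stoke's combined rate of 1.8
print(round(aguero, 2), round(crouch, 2))
```

On this measure Crouch accounts for around a third of Stoke's attacking output, against roughly a quarter for Aguero at City.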

Whilst the heavyweights from Manchester City will undoubtedly take the plaudits in May, it is perhaps the likes of Crouch, Murray, Austin, Carroll and Gross, who contribute the most towards keeping their lesser sides afloat, who also deserve a mention and their own viz.

Friday 22 December 2017

Tackling Success Rate & the Influence of Luck

About four years ago I wrote a post that speculated on the transfer price associated with a group of equally talented players whose success rate in a particular skill had actually been randomly generated.

Each was given a 10% chance of succeeding, each was given 100 opportunities to succeed, and the "best" performers were ranked accordingly.

Of course, the difference in success rate was entirely down to randomness.

If you bought the "best" at a premium, you were paying for unsustainable luck. If you bought the "worst", you were getting a potential bargain, if the price reflected the imaginary under performing ability that would likely regress towards 10%.
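That thought experiment takes a few lines to reproduce. The player count and seed below are arbitrary choices; every player has exactly the same true 10% ability, so any spread in the observed rates is pure noise.

```python
import random

# Reproduction sketch of the thought experiment: identically skilled players,
# each with a true 10% success rate over 100 attempts. Player count (20) and
# the seed are arbitrary; any ranking of observed rates is randomness alone.
random.seed(42)

TRUE_RATE, TRIALS = 0.10, 100
observed = sorted(
    sum(random.random() < TRUE_RATE for _ in range(TRIALS)) / TRIALS
    for _ in range(20)
)
print(f"'worst' performer: {observed[0]:.0%}, 'best' performer: {observed[-1]:.0%}")
```

Run it a few times with different seeds and the "best" and "worst" change identity, despite identical underlying talent.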

It's less straightforward when looking at real players.

Players play on different teams, with different tactical setups and different teammates. They probably have varied skill differentials across a variety of skill sets, and they have differing numbers of opportunities to demonstrate their talent or lack thereof.

Attempting to partly account for the randomness in sampling is most applicable in on field events where there is a simple definition of success or failure.

In such areas as tackles made, raw counting numbers are much more a product of overall team talent and setup, so there has been a tendency to move onto percentage of tackles won, as an outward sign of competence.

Unlike the revolution in scoring and chance creation, where pre-shot parameters are modeled on historical precedent to create expected goals or chances, there is little prospect, given the available data, of similarly modeling expected tackles, dribbles or aerial duels, for example.

But we should at least try to account for the ever-present randomness, even in large samples; doing so partly transforms purely descriptive percentage counts into a more informed predictive metric capable of projecting future success rates.

It's easy to be impressed by the eye test that sees four successful tackles made by a player in a single half of football. But aside from draining the tension from the final minutes of a game by declaring said player "man of the match", as a projection of future performance it is riddled with "luck" and largely unrepresentative of future, larger-scale output.

To attempt to overcome this, we can work out what a distribution of outcomes would look like if there is no differential in a measured skill within a group of players. We can then compare this distribution to an actual distribution of outcomes where we suspect a differential exists.

For example, in the tackling ability of Premier League defenders.

We can then try to allow for the randomness that may exist in the observed success rate of players who have had differing opportunities to prove their tackling prowess, to produce a more meaningful projection.

The more tackles a player has been involved in, the more signal and less noise his raw rate will contain. Whereas in smaller samples, noise will proliferate and perhaps give extremes that will not be representative of any future output.

Here's the raw tackle success rate from the MCFC/Opta data dump from the 2011/12 season.

It lists the 140 defenders involved in the most tackles during the whole of that season. The left hand side of the plot has players with the most tackles, moving to the fewest at the right hand side, where more extreme rates, both apparently good and bad, begin to appear.

The second, identically scaled plot has attempted to regress the observed rate towards the mean for the group, based on the differing number of tackle attempts each defender has been involved in.

All of the small sample size extremes, either good or bad, are dragged closer to the group average, while the larger samples group slightly more tightly, but were clustered more closely to the group mean to begin with.
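A standard way to perform this kind of regression towards the mean is to pad each defender's record with imaginary attempts made at the group's average rate. The pseudo-attempt count and the assumed 75% group rate below are illustrative choices, not the values behind the plots.

```python
# Sketch of shrinking observed tackle success rates towards the group mean by
# adding pseudo-attempts at the average rate. The pseudo count (50) and the
# assumed 75% group rate are illustrative, not the plot's actual parameters.

def regressed_rate(successes, attempts, group_rate, pseudo=50):
    return (successes + pseudo * group_rate) / (attempts + pseudo)

GROUP_RATE = 0.75  # assumed group-wide tackle success rate

# A 9-from-10 small-sample 'star' is dragged well back towards the mean...
print(round(regressed_rate(9, 10, GROUP_RATE), 3))
# ...while a 150-from-200 regular, already near the mean, barely moves.
print(round(regressed_rate(150, 200, GROUP_RATE), 3))
```

The more attempts behind a rate, the less it moves: exactly the behaviour described in the second plot.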

The first plot illustrates the interplay between randomness and skill. It is at its most deceptive in smaller sample sizes. It is perfectly adequate as a descriptive stat for defenders, but deeply flawed as a projection of a defender's likely true tackling talent. And the two are often conflated.

The second plot tries to strip out the differing influence of randomness over different sample sizes, showing that there is probably a skill differential among tackling defenders, but one nowhere near as wide as raw stats imply, even after a season's worth of tackles.

And if you're rating or buying some of the 90%+ success rate tacklers based on just 30 or 40 interactions, you're probably staking your reputation on a hefty dose of unsustainable good fortune, as they fall back into the pack with greater exposure.

Friday 15 December 2017

How High Might Manchester City Go?

Despite an inglorious 0/1 record, (Stoke to be relegated after one game of their return to the Premier League in 2008) Paddy Power has already paid out on the crowning in 2018 of Manchester City as the Premier League winners.

They are on slightly firmer ground this time around, as not only are City 11 points clear of United, 14 from Chelsea and 18 ahead of the three top six also-rans, Liverpool, Spurs and Arsenal Burnley, they are also one of the best teams in Premier League history.

With "City to win the League" drifting into "Putin to be re-elected as Russian Leader" territory, focus has shifted to secondary betting markets, based around City's likely points total, goals scored or margin by which they will lift the domestic crown.

Whether or not you're interested in the betting dimension, estimating City's degree of dominance can provide a useful exercise in prediction over the long term.

The quick, and usually flawed way to predict a side's end of season statistics is to blindly scale up from those recorded in the season to date.

This approach is rarely useful, as it takes no account of the remaining schedule, implies that the 17 matches played by City and each of their 19 rivals are a near perfect indication of what will follow, and disregards variance.

Even after the fact of 16 wins and one draw, there was a finite possibility that more teams than just Everton might have taken something from a daunting meeting with Manchester City.

Future projections should embrace the possibility that their record to date belongs to an excellent team, but one who may have been slightly fortunate to extract a near 100% points haul and allow for the often admittedly small chance that Pep's City may be defeated.

Even a cursory glance at City's remaining fixtures, which include a game against each of the top 6 7 and two meetings with Spurs, should indicate that a single draw interspersed with wins in their remaining games would seem an unlikely scenario upon which to base a projection.

Simulations of the remaining games in the 2017/18 season, give a less rose tinted prediction, while still confirming City's near certainty to lift the title.

Simulations based on expected goals rolling over both this and last season, expect City to gain 98 Premier League points by May. This is completely in line with the current estimates at which their final points total may be bought or sold at a variety of spread betting companies.
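A toy version of such a simulation can be sketched with Poisson scoring. The per-game goal rates, the 49-point starting tally and the flat treatment of the remaining 21 fixtures below are all illustrative assumptions, not the rolling xG model's actual inputs, which vary by opponent.

```python
import math
import random

# Toy Poisson simulation of a 21-game run-in, alongside the naive 'scale up
# 17 games' projection. The goal rates (2.4 for, 0.7 against per game) and
# the 49-point start are assumptions, not the actual model's inputs.
random.seed(7)

def poisson(lam):
    # Knuth's multiplication method; fine for small lambdas.
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= limit:
            return k
        k += 1

POINTS_SO_FAR, GAMES_LEFT = 49, 21
CITY_XG, OPP_XG = 2.4, 0.7  # assumed average goal rates per game

totals = []
for _ in range(10_000):
    pts = POINTS_SO_FAR
    for _ in range(GAMES_LEFT):
        gf, ga = poisson(CITY_XG), poisson(OPP_XG)
        pts += 3 if gf > ga else (1 if gf == ga else 0)
    totals.append(pts)

print("simulated mean points:", round(sum(totals) / len(totals), 1))
print("naive scale-up:", round(POINTS_SO_FAR / 17 * 38, 1))
```

Even this crude version shows the gap: the naive scale-up of a 16-win, one-draw start lands near 110 points, a figure the simulations only reach as a high-end outlier.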

Similar ranges are shown for 10,000 simulated outcomes for City's total goals scored and total wins over the 38 game season.

I've also added the scaled up totals based on their record over 17 games being repeated over the remaining 21 and while these blockbusting values do occasionally appear in the simulations, they are relatively high end outliers and inadequate as a most likely projection in mid December.

Saturday 9 December 2017

Know Your Limits

All predictions come with the caveat that there is a spread of uncertainty either side of the most likely outcome.

A side may be odds on to win almost all of their matches over a season, as Manchester City have very nearly shown in 2017/18, but there is a finite, if extremely small chance that they will actually lose all 38 matches.

Similarly, there is a bigger chance that they will win all 38, but the most likely scenario sits between these two extremes and for the current best team in the Premier League, winning the title with around 96 points is the most expected final outcome in May.

While single, definitive predictions are more newsworthy, they imply a precision that is never available for longer-term futures, especially about a sporting contest such as a Premier League season, which comprises low-scoring matches spread over 380 games.

It's therefore useful to attach the degree of confidence we have in our predictions to any statements we make about a future outcome, particularly as new information about teams feeds into the system and the competition progresses, turning probabilistic encounters into 0,1 or 3 point actual outcomes.

Here's the range of points which a simulated model of the 2016/17 Premier League came up with, using xG-based ratings for each team, and particularly Swansea, before a ball was kicked.

Swansea had been in relative decline since their impressive introduction to the top tier, playing much admired possession football, mainly as a defensive tactic, that had seen them finish as high as 8th in 2014/15, 21 points clear of the drop zone.

2015/16 had seen them fall to 12th, just ten points from the drop zone and much of their xG rating for 2016/17 was based around this less impressive performance.

The top end of their points totals over 10,000 simulations resulted in a top 10 finish with 52 points, but the lower end left them relegated with 27 points and their mode of 36 final points suggested a season of struggle.

And this is illustrated by the dial plot showing well into the red zone signifying relegation.

After ten games, we now have more information, both about Swansea and the other 19 Premier league teams and the most likely survival cut off points in the 2016/17 league.

At the time, Swansea were 19th with five points from ten games and while the grey portion of mid table is still achievable, it has shrunk and the Swans' low point has fallen deeper into the red.

After thirty games, so with just eight left, the upper and lower limits for Swansea after the full 38 games have narrowed. They are still more likely than not to be relegated, according to the updated xG model, but there is still some chance that they will survive.

In reality, Swansea were in the bottom three with three games left, but a win for them and a defeat for Hull in game week 36 was instrumental in retaining their top flight status; it was as close as the final plot suggested it might be.

Adding indications of confidence in your model enhances any information you may wish to convey.

It's also essential when using xG simulations to "predict" the past, such as drawing conclusions about a player's individual xG and his actual scoring record.

Adding high and low limits will highlight if any over or under performance against an average model based simulation is noteworthy or not.

One final point. The upper and lower limits can be chosen to illustrate different levels of confidence, typically 95%. But this does not mean that a side's final points total and thus finishing position has a 95% chance of lying within these two limits.

It is more your model that is on trial.

There is a 95% chance that any new prediction made for a team by your model will lie within these upper and lower limits.

Hopefully, your model will have done a decent job of evaluating a side, in this case Swansea from 2016/17. But if it hasn't, Swansea's actual finishing position may lie elsewhere.

Wednesday 29 November 2017

Over Performers Aren't Always Just Lucky.

Firstly, this isn't another post about whether Burnley are good at blocking shots because "yes they are".

Instead it's about applying some kind of context to a side's levels of over or under performance in their data, and attempting to attribute how much is the result of the ever present random variation in inevitably small samples and how much is perhaps due to a tactical wrinkle and/or differing levels of skill.

Random variation termed as "luck" is probably the reddest of rags to a casual fan or pundit uninterested in, or outwardly hostile to, the use of stats to help describe their beautiful game.

It's the equivalent for anyone with a passing interest in football analytics of "clinical" being used ad nauseam, all the way to the mute button by Owen Hargreaves.

Neither of these two catch-all, polar opposite terms used in isolation are particularly helpful. Most footballing events are an ever shifting, complex mixture of the two.

I first started writing about football analytics through being more than mildly annoyed that TSR (or Total Shot Ratio, look it up) and its supporters constantly branded Stoke as being that offensive mix of "rubbish at Premier League football" and constantly lucky enough to survive season after season.

And then choosing the Potters as the trendy stats pick for relegation in the next campaign as their "luck" came deservedly tumbling down.

It never did.

Anyone bothered enough to actually watch some of their games could fairly quickly see that through the necessity of accidentally getting promoted with a rump of Championship quality players, Stoke or more correctly Tony Pulis, were using defensive shapes and long ball football to subvert both the beautiful game and the conclusions of the helpful, but deeply flawed and data poor, TSR stat.

There weren't any public xG models around in 2008. To build one meant sacrificing most of Monday collecting the data by hand and Thursday as well when midweek games were played.

But, shot data was readily available, hence TSR.

At its most pernicious, TSR assumed an equality of chance quality.

So getting out-shot, as Stoke's setup virtually guaranteed they would be every single season, was a cast iron guarantee of relegation once your luck ran out, in this narrow definition of "advanced stats".

Quantifying chance quality in public was a few years down the road, but even with simple shot numbers, luck could be readily assigned another constant bedfellow in something we'll call "skill".

There comes a time when a side's conversion rate on both sides of the ball is so far removed from the league average rates that TSR relied upon that you had to conclude that something (your model) was badly broken when applied to a small number of teams.

We don't need to build an xB model to see Burnley as being quite good at blocking shots, just as we didn't need a laboriously constructed expected goals model to show that Stoke's conversion disconnects were down to them taking fewer, good quality chances and allowing many more, poorer quality ones back in 2008.

Last season, the league average rate at which open play attempts were blocked was 28%. Burnley faced 482 such attempts and blocked 162, or 34%.

A league average team would have only blocked 137 attempts under a naive, know nothing but the league average, model.

Liverpool had the lowest success rate under this assumption that every team has the same in built blocking intent/ability. They successfully blocked just 21% of the 197 opportunities they had to put their bodies on the line.

You're going to get variation in blocking rate, even if each team has the same inbuilt blocking ability and the likelihood of a chance being blocked evens out over the season.

But you're unlikely to get the extremes of success rates epitomized by Burnley and Liverpool last season.
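How unlikely? Under the know-nothing model, each team's blocks are a coin-weighted binomial draw at the 28% league rate, so the tail probabilities can be computed directly. The figures below are those quoted above (Liverpool's 21% of 197 works out at about 41 blocks); the independence of attempts is the model's assumption, not a claim about reality.

```python
from math import comb

# Tail probabilities for Burnley's and Liverpool's blocking extremes under a
# 'know-nothing' model where every team blocks at the 28% league rate.
# Independence of attempts is assumed; figures are those quoted in the post.

def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

LEAGUE_RATE = 0.28
p_burnley = binom_tail_ge(162, 482, LEAGUE_RATE)       # blocked 162 of 482
p_liverpool = 1 - binom_tail_ge(42, 197, LEAGUE_RATE)  # blocked 41 or fewer of 197
print(f"Burnley tail: {p_burnley:.4f}, Liverpool tail: {p_liverpool:.4f}")
```

Both tails come out small, Burnley's especially so, which is why equal inbuilt blocking ability struggles as an explanation.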

You'll improve this cheap and cheerful, TSR type blocking model for predictive purposes by regressing the observed blocking rates of both Liverpool and Burnley towards the mean.

You'll need to regress Liverpool's more because they faced many fewer attempts, but the Reds will still register as below average and the Claret and Blues above.

In short, you can just use counts and success rates to analyse blocking in the same way as TSR looked at goals, but you can also surmise that the range and difference in blocking ability that you observe may be down to a bit of tactical tinkering/skillsets as well as randomness in limited trials.

In the real world, teams will face widely differing volumes, the "blockability" of attempts will vary and perhaps not even out for all sides and some managers will commit more potential blockers, rather than sending attack minded players to create havoc at the other end of the field.

With more data, and I'm lucky to have access to it in my job, you can easily construct an xB model. And some teams will out perform it (Burnley). But rather than playing the "luck" card you can stress test your model against these outliers.

There's around a 4% chance that a model populated with basic location/shot type/attack type parameters adequately describes Burnley's blocking returns since 2014.

That's perhaps a clue that Burnley are a bit different and not just "Stoke" lucky.

The biggest over-performing disconnect is among opponent attempts that Burnley faced that were quite likely to be blocked in the first place. So that's the place to begin looking.

And as blocking ability above and beyond the norm inevitably feeds through into Burnley's likelihood of conceding actual goals, you've got a piece of evidence that may implicate Burnley as a more acceptable face of over-performance in the wider realms of xG for the enlightened analytical crowd to stomach than Stoke were a decade ago.

Wednesday 22 November 2017

An xG Timeline for Sevilla 3 Liverpool 3.

Expected goals is the most visible public manifestation of a data driven approach to analyzing a variety of footballing scenarios.

As with any metric (or subjective assessment, so beloved of Soccer Saturday) it is certainly flawed, but useful. It can be applied at a player or team level and can be used as the building block to both explain past performance or track and predict future levels of attainment.

Expected goals is at its most helpful when aggregated over a longer period of time to identify the quality of a side's process, and may more accurately predict the course of future outcomes, rather than relying on the more statistically noisy conclusions that arise from simply taking scorelines at face value.

However, it is understandable that xG is also frequently used to give a more nuanced view of a single game, despite the intrusion of heaps of randomness and the frequent tactical revisions that occur because of the state of the game.

Simple addition of the xG values for each goal attempt readily provides a process driven comparison against a final score, but this too has obvious, if easily mitigated flaws.

Two high quality chances, within seconds of each other can hardly be seen as independent events, although a simple summation of xG values will fail to make the distinction.

There were two prime examples from Liverpool's entertaining 3-3 draw in Sevilla, last night.

Both Firmino goals followed on within seconds of another relatively high quality chance, the first falling to Wijnaldum, the second to Mane.

Liverpool may have been overwhelming their hosts in the first half hour, and they were alert enough to have Firmino on hand to pick up the pieces from two high quality failed chances, but a simple summation of these highly related chances must overstate Liverpool's dominance to a degree.

The easy way around this problem is to simulate highly dependent scoring events as such, to prevent two goals occurring from two chances separated by one or two seconds.

It's also become commonplace to expand on the information provided by the cumulative xG "scoreline" by simulating all attempts in a game, with due allowance for connected events, to quote how frequently each team wins an iteration of this shooting contest and how often the game ends stalemated.
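Treating a linked pair as a single event is straightforward in a simulation: within the pair, the first conversion ends the event, so at most one goal can result. The xG values below are invented for illustration; they are not the figures for any attempt in the Sevilla game.

```python
import random

# Sketch of simulating connected chances as one event: within a linked pair,
# at most one goal can result. The xG values are invented for illustration.
random.seed(1)

def expected_goals(attempts, n=10_000):
    """attempts: list of xG values, or tuples of linked xG values."""
    total = 0
    for _ in range(n):
        for event in attempts:
            linked = event if isinstance(event, tuple) else (event,)
            # the first conversion in a linked sequence ends the event
            for xg in linked:
                if random.random() < xg:
                    total += 1
                    break
    return total / n

# A 0.3 chance followed seconds later by a 0.5 rebound, capped at one goal...
linked = expected_goals([(0.3, 0.5)])
# ...versus naively treating the two attempts as independent.
naive = expected_goals([0.3, 0.5])
print(round(linked, 3), round(naive, 3))
```

The linked treatment yields around 0.65 expected goals for the pair against 0.8 for the naive summation, illustrating how unadjusted totals overstate dominance.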

Here's the xG shot map and cumulative totals from last night's match from the InfoGolApp.

There's a lot of useful information in the graphic. Liverpool outscored Sevilla in xG, they had over half a dozen high quality chances, some connected, compared to a single penalty and other, lower quality efforts for the hosts.

Once each attempt is simulated and the possible outcomes summed, Liverpool win just under 60% of these shooting contests, Sevilla 18%, with the remainder drawn.

Simulation is an alternative way of presenting xG outputs, rather than as totals; it accounts for connected events and the variance inherent in lots of lower quality attempts compared to fewer, better chances, and it also describes the most likely match outcomes in a probabilistic way that some may be more comfortable with.

Liverpool "winning" 2.95-1.82 xG may be a more intuitive piece of information for some (although as we've seen it may be flawed by failing to adequately describe distributions and multiple, common events), compared to Liverpool "winning" nearly 6 out of ten such contests.

None of this is ground breaking, I've been blogging about this type of application for xG figures for years, but there's no real reason why we need to wait until the final whistle to run such simulations of the attempts created in a game.

xG timelines have been used to show the accumulation of xG by each team as the game progresses, but suffer particularly from a failure to highlight connected chances.

In a simulation based alternative, I've run 10,000 attempt simulations of all attempts that had been taken up to a particular stage in last night's game.

I've then plotted the likelihood that either Liverpool or Sevilla would be leading, or that the game would be level, based on the outcome of those attempt simulations.
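A minimal version of that rolling calculation might look like this, with connected chances first merged into a single event (all xG values you pass in are hypothetical):

```python
import random

def current_state_probs(home_xgs, away_xgs, n=10000):
    """P(home lead), P(level), P(away lead) from the attempts so far.

    home_xgs / away_xgs are the xG values of each side's attempts to
    date; a pair of connected chances should first be merged into one
    event, e.g. p = x1 + (1 - x1) * x2 for two near-simultaneous shots.
    Re-run after every new attempt to build the in-game timeline.
    """
    lead = level = trail = 0
    for _ in range(n):
        h = sum(random.random() < x for x in home_xgs)
        a = sum(random.random() < x for x in away_xgs)
        if h > a:
            lead += 1
        elif h == a:
            level += 1
        else:
            trail += 1
    return lead / n, level / n, trail / n
```

Plotting the three probabilities against match time gives the simulation based alternative to a cumulative xG timeline.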

Liverpool's first dual attempt event came in the first minute: Wijnaldum's misplaced near post header, immediately followed by Firmino's far post shot.

Simulated as a single event, there's around a 45% chance Liverpool lead, 55% chance the game is still level and (not having had an attempt yet) a 0% chance Sevilla are ahead.

If you re-run the now four attempt simulation following Nolito's & Ben Yedder's efforts after 19 minutes, a draw is marginally the most likely current state of the game, followed by a lead for either team.

A flurry of high quality chances then made the Reds near 90% favourites to reach half time with a lead, enabling the half time question as to whether Liverpool deservedly led to be answered with a near emphatic yes.

Sevilla's spirited, if generally low quality, second half comeback did eat into Liverpool's likelihood of leading throughout the second half, but it was still a match from which the visitors should have returned with an average of around two UCL points.

Sunday 22 October 2017

Excitement Quotas in the Premier League.

Excitement at a sporting event is a subjective measurement.

It doesn't quite equate to brilliance, as a 7-2 thrashing has to be appreciated for the excellence of the performance of one of the teams, but as the score differential climbs, morbid fascination takes over, at least for the uncommitted.

Nor does it tally with technical expertise. A delicately crafted passing movement doesn't quite set the pulse racing like a half scuffed close range shot that deflects off the keeper's knee and loops agonisingly over the bar with the game on the line.

You can attempt to quantify excitement using a couple of benchmark requirements.

The game should contain a fair number of dramatic moments that might have changed the course of the outcome, or that actually did lead to a significant alteration to the score.

It's easy to measure the change in win probability associated with an actual goal.

A goal that breaks a tied game in the final minutes will advance the chances of the scoring team by a significant amount, whilst the seventh goal in a 7-2 win merely rubs salt into the goal difference of the defeated side.

Spurned chances at significant junctures are only slightly more difficult to quantify.

You can take a probabilistic view and attach the likelihood that a chance was taken, based on the chance's expected goals figure, to the effect that an actual goal would have had on the winning chances of each side.

Summing the actual and probabilistic changes in win probability for each goal attempt in each match played in the 2016/17 Premier League season gives the five most "in the balance", chance-laden matches from that season.
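As a hedged sketch of the bookkeeping (the field names are my own invention, not from any particular data feed): a converted chance contributes the full win probability swing it produced, while a miss contributes the swing it would have produced, weighted by its xG:

```python
def excitement_index(attempts):
    """Sum actual and probabilistic win probability swings for a match.

    attempts: list of dicts with keys
      'xg'       - expected goals value of the chance
      'wp_swing' - the change in the attacking side's win probability
                   that a goal here produced (or would have produced)
      'scored'   - whether the attempt actually went in
    """
    total = 0.0
    for a in attempts:
        swing = abs(a['wp_swing'])
        total += swing if a['scored'] else a['xg'] * swing
    return total
```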

Top Five Games for Excitement 2016/17 Premier League

No surprise to see the Swansea/Palace game as the season's most exciting encounter, with Palace staging a late comeback, before an even later Swansea response claimed all three points in a nine goal thriller.

Overall I've ranked each of the 380 matches from 2016/17 in order of excitement, as measured by the actual and potential outcomes of the chances created by each team in the game.

Bournemouth's games had the biggest share of late, game swinging goals, along with the most unconverted endeavour when the match was still in the balance.

Tottenham, meanwhile, despite playing in the season's second most exciting game, a very late 3-2 win over West Ham, more typically romped away with games, leaving the thrill seekers looking for a match with more competitive balance to tune into.

Middlesbrough fans not only saw their side relegated, but they did so in rather bland encounters, as well.

Saturday 14 October 2017

Player Projections. It's All About The Distribution Part 15

A couple of football analytics' little obsessions are correlations and extrapolations.

Many player metrics have been deemed flawed because they fail to correlate from one season to the next, but there are probably good reasons why the diminished sample sizes available for individuals lead to poor season on season correlation.

Simple random variation, injury, a change of team mates or of role within a club, and atypically small sample sizes often lead to see-sawing rate measurements, and inevitably players age, so they can be on a very different career trajectory to others within the sample.

The problem associated with neglecting the age profile of a group of players when attempting to identify trends for use in future projections is easily demonstrated by looking at the playing time (as a proxy for ability) enjoyed by players who were aged 20 or 30 when members of a Premier League squad, and how that time altered in their 21st and 31st years.

The 30 year oldies played Premier League minutes equivalent to 15 full matches, falling to 12 matches in their 31st year. So they were still valued enough to play fairly regularly, but perhaps due to the onset of decline in their abilities they featured, on average, less than they had done.

The reverse, as you may have expected, was true for the younger players. They won the equivalent of seven full games in their 20th year and nine the following season.

It seems clear that if you want to project a player's abilities from one season to the next and playing time provides a decent talent proxy, you should expect improvement from the youngster and decline from the older pro.

However, as with many such problems, we might be guilty of attempting to impose a linear relationship onto a population that is much better defined by a distribution of possible outcomes.

The table above shows the range of minutes played by 21 and 31 year olds who had played 450 minutes or fewer in the previous season as 20 or 30 year old players.

As before, we may describe the change in playing time as an average. In this subset, the older players play very slightly more than they did as 30 year olds, improving from the equivalent of two games to 2.2.

The younger players jump from 1.8 games to 3.6.

However, just as cumulative xG figures can hide very different distributions, particularly of big chances which subtly alter our expectation for different teams, the distribution of playing minutes that comprise the average change of playing time can be both heavily skewed and vary between the two groups.

Over three quarters of the 30 year olds didn't get on the field at all during the next Premier League season; likewise two thirds of the younger ones.

21% of young players played a similar amount of time to the previous season, between one and 450 minutes, compared to just 14% of the older ones. And 17% of youngsters exceeded the total from the previous season, as did just 10% of the veterans.
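The point is easy to make numerically: the same data yields a healthy looking average change and, at the same time, a distribution dominated by players who never appear again. A toy sketch, with invented minutes:

```python
def playing_time_summary(prev_minutes, next_minutes):
    """Average change in playing time and the distribution it conceals.

    prev_minutes / next_minutes: paired lists of each player's league
    minutes in consecutive seasons.
    Returns (average change in minutes,
             share with zero minutes next season,
             share who increased their minutes).
    """
    n = len(prev_minutes)
    avg_change = sum(b - a for a, b in zip(prev_minutes, next_minutes)) / n
    none = sum(1 for m in next_minutes if m == 0) / n
    more = sum(1 for a, b in zip(prev_minutes, next_minutes) if b > a) / n
    return avg_change, none, more
```

A flat projection would apply the average to everyone; the other two numbers say how misleading that can be for any individual.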

So if you use the baseline rate of increased playing time as a flat rate across all players that fall into these two categories in the future, you might be slightly disappointed, because overwhelmingly the experience of such players is one where they fail to play even a minute in the following season.

Knowing that there is, on average, an upside for these two groups of players, based on historical precedent, is a start. But knowing that 3 out of 4 of the oldies and 2 out of 3 of the youngsters you are considering didn't merit one minute's worth of play in the historical sample is also a fairly important, if not overriding, input.

Wednesday 11 October 2017

World Cup Qualification So Far.

To save my Twitter feed from viz overload, here's a couple of plots from the completed World Cup qualifiers.

FIFA ratings usually get a good kicking, but if you know their limitations they do a decent job, and they have done so in predicting the qualifying teams so far for 2018.

Some higher rated teams will miss out, it's only 10 games in some cases, after all.

But if you want a benchmark FIFA rating at the time qualifying began in 2015, the definite qualifiers had a median rating of 891.

Those still waiting on a playoff were rated 676 and those rooting for other countries were 464.

Check your country and see if they ended up roughly in the position they deserved based on 2015 FIFA rankings.

FIFA don't seem to want you to find historical ratings, but to the best of my knowledge these were the ratings each side had in October 2015, apart from the three I couldn't find & made up.

Sunday 8 October 2017

Premier League Age Profiles Through the Ages

I found some data I collected but never got round to analysing for the joint OptaProForum presentation with Simon Gleave a few years ago.

It simply consists of minutes played by each age group in the four highest tiers of English domestic football.

There are a variety of methods to describe the ageing curve in football, where players initially show improvement, peak and then decline with age. I prefer the delta approach, which charts the change of a variety of performance related indicators or their proxies.

We may condense the age profile of a team or league into three main groups: young players, under 24, who are still improving; peak age performers, from around 24 to 29; and ageing players of 30 or more, who may still be good enough to command some playing time, but are diminishing compared to their own peak levels.

Using the amount of playing time allowed to each of the three groups as a performance proxy, the peak age group of Premier League players have been increasing their share at the expense of both the younger and older groups since 2004/05. Peak share has risen from 48% of the available playing time at the start of the period to 60% by 2014/15.

The wealth of the Premier League and the limited alternative destinations for the best, prime aged talent would appear to be a reasonable cause for this increase. Perhaps only Spain's Barcelona and Real Madrid (Suarez and Bale) account for the few realistic destinations for peak age, Premier League talent.

By contrast, League Two, the fourth tier of English football, appears to have a very different age profile.

Here, youth and peak aged players share playing time, with 30 & over players lagging well below these levels, implying a different market further down the pyramid.

Players are not being recruited from the extreme right hand tail of the talent pool, so more options of similar ability are available and there is also an extensive pool of buyers in the two or three divisions immediately above League Two, ready to take on the cream of the peak age performers.

Finally here's the plots for the best Premier League teams compared to the remainder of the clubs.


Peak shares are similar for both groups, but the top teams have played a larger share of (talented) younger players, while the remainder of the Premier League have swayed slightly more towards experience (perhaps ageing players from the top teams dropping in grade, but remaining in the Premier League).

Crouch at Stoke, for example.

Liverpool's individual profile appears to illustrate how their age mix has remained similar to the average for top Premier League teams across the 11 seasons.

Over 30's make up the lowest proportion of playing time, followed by younger players and topped off by peak age talent.

30+ contribution falls away, to be replaced by ageing peak age talent, which in turn is refreshed by maturing younger players. Replacement buys can then be made in the 22-24 range to continue the cycle.

By contrast, Everton has chosen to largely swap around the over 30 group and the under 24 group, leading to seasons where older players dominate.

Wednesday 4 October 2017

Quick & Dirty Strength of Schedule.

I've recently posted some xG, strength of schedule adjusted figures for the Premier League and justin_mcguirk has asked for a method.

The sos values have been intended to be purely descriptive, rather than attempting to more accurately portray underlying team quality.

But intuitively you can look at WBA's start where they've not faced one genuine title contender, in Bournemouth, Burnley, Stoke, Brighton, WHU, Arsenal and Watford and compare it to Everton's lucky seven of Stoke, Man City, Chelsea, Spurs, ManUtd, Bournemouth and Burnley and immediately think that Everton's start has been more difficult than that of the Baggies.

Strength of schedule can be calculated using a steal from the NFL, particularly the so-called least squares Massey ratings.

In the case of the Premier League, each team's schedule is laid out, followed by a performance parameter, such as goal or expected goal difference. Ratings are then calculated from the seven inputs (the teams each side has played, out of a possible twenty), such that the errors arising when trying to solve each of the twenty simultaneous equations are reduced to a minimum.

The maths is doable using matrices, although 20x20 matrices can sometimes resist inversion and I'm sure many packages will undertake the heavy lifting as well.

For those who would like a simpler and probably equally informative approach you can average the goal or expected goal difference of the seven teams a side has played.

These seven teams will have played 49 matches between them. Admittedly, seven of those will be against the side whose strength of schedule you are attempting to estimate, but their 49 games will have been played against a broadly representative cross-section of the league.
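The quick and dirty version is a one-liner per team. A minimal sketch, with club names and figures purely illustrative:

```python
def quick_sos(schedule, xg_diff_per_game):
    """Quick & dirty strength of schedule.

    schedule: dict mapping each team to the list of opponents faced
    xg_diff_per_game: dict mapping each team to its per-game expected
    goal difference
    A side's SoS is the mean per-game xG difference of its opponents;
    the higher the figure, the tougher the fixtures faced so far.
    """
    return {team: sum(xg_diff_per_game[opp] for opp in opps) / len(opps)
            for team, opps in schedule.items()}
```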

Here's the sos table using this method after six games. It is broadly similar to the one I posted on Twitter after 7 and using a least squares approach.

Everton still had the toughest start & WBA the easiest. Chelsea moved up towards a more taxing unbalanced schedule by hosting Man City as did Palace visiting Manchester United.

Also more information about each team and their opponents has become available after seven games.

Finally, here are the individual calculations for WBA & Everton. Stoke's xG for after 6 games was 5.2 and they'd allowed 9.2 xG.

Data from @InfogolApp

Tuesday 3 October 2017

Crystal Palace.....The Only Way Is Up.

A quick post to try to put Crystal Palace's current predicament into some kind of historical context.

In terms of points, they've (obviously) had the worst start through seven matches in the lifetime of the 20 team Premier League.

Zero points, zero goals and not one iota of friendly randomness to break their duck in either category, despite bad, but not completely hopeless xG figures.

Particularly in chances created.

Points won are just one factor in determining how bad a side has started their campaign. The aim of the majority of teams in the Premier League is to simply stay in it for next season and your proximity to your nearest rivals is therefore just as important as merely your own points total.

On this basis, there's arguably a few teams ahead of Palace in claiming the worst initial seven game record.

Southampton in 1998/99, Portsmouth in 2009/10, their administration year and Sunderland in 2013/14 could be considered to have been worse off than Palace are now. Each may have won more points than Palace has, but Palace are marginally closer to both their immediate rivals and even mid table than were this trio.

Also, poor starts aren't an automatic ticket to the Championship.

50% of the 20 worst placed teams (relative to their 19 rivals after seven matches) managed to stay up, although conversely the better the start, the more likely survival becomes.

27 teams have been placed comfortably equidistant from the leaders and the 20th placed side after seven matches, and four of those ultimately fell through the trapdoor. But for starts better than that, it became plain sailing and survival has been universal.

If we use Palace's proximity to their rivals as a measure of their start and compare the fate and the ranking of all teams in the 20 team Premier League era after seven games, there is more than a glimmer of hope.

Based on historical precedent and that alone, Palace have around a 28% chance of escaping relegation.

Of course a side is not relegated just on a single statistic. Injuries, the January window and their underlying stats all contribute to the reckoning in May.

Palace have had around the fourth toughest start in terms of opposition faced. It gets much less arduous after they play Chelsea in game eight, but they haven't enjoyed good luck with injuries to key attackers.

Their 10 game rolling xGD and actual GD since 2014 has been trending downwards over time, but the precipitous disconnect between process and outcome in recent matches is unlikely to persist.

They are far from the worst team in the current Premier League when measured over a more prolonged time frame. And although they have given inferior sides a start, it is a start that has been run down in the past.

Supporters will be correct to be pessimistic, Palace are probably more likely to be relegated than not, but the bookies' price of 1.53, with an implied probability of 65%, still leaves their survival chances somewhere around the mid to low 30%'s.

A similar level of success enjoyed by their single cause predecessors mentioned earlier in this post.

Saturday 23 September 2017

30 Year old Messi is Likely in Decline.

'Tis the season for small sample sized hyperbole to be liberally launched on an expectant audience, and the latest recipient of the "If he continues at this rate" award for unrealistic dreamland is none other than Lionel Messi.

While Ronaldo has been kicking his heels and the occasional Real Betis player, Messi has single-handedly (with the help of 10 teammates) launched Barcelona seven points clear of their perennial rivals from Madrid.

Messi turned 30 in the close season, he's playing in his 14th La Liga season and is undoubtedly one of the two best players of the last decade.

But he is still human and bound by the natural athletic decline that eventually sets in for every footballer.

Players improve with maturity and experience, peak, usually in their late twenties and then begin an inexorable decline, albeit from differing peaks.

Messi's post birthday, six game return in the UCL and La Liga, but discounting a two legged Spanish Super Cup defeat at the hands of Ronaldo's Madrid, has been spectacular, even by his standards.

It has spawned at least one article, liberally salted with stats to enhance credibility, eagerly anticipating the untold riches to come.

Unfortunately, five or six games is such a small sample that you will inevitably get extremes of performance, either very good or very bad.

Particularly if you selectively top and tail the games, eliminating a comprehensive defeat at the hands of your nearest rivals, devoid of any Messi goals from open play, but concluding with a three goal open play performance from the Argentine.

Small samples are noisy, unbalanced and rarely definitively indicative of what will happen in the longer term or even just a single season.

Barcelona has played Alaves, Eibar, Getafe, Espanyol and Betis, only the last of whom is currently higher than 13th.

As a data point it is all but useless to project Messi's 2017/18 season.

Individual careers are statistically noisy. Injury, shifted positional play and team mate churn are just some of the factors that can make for an atypical seasonal return, even before we try to decide which metric is sufficiently robust to reflect individual performance.

If we use goals and assists to judge Messi up to his 30th birthday, his delta, the change in non-penalty goals and assists per 90 from one season to the next, trends negative from when Messi was 27, suggesting this was when he peaked.

If we include 2017/18's small sample sized explosion as a fully developed rate for this upcoming season, the trendline still becomes negative this year.

If we regress this current hot rate towards Messi's most recent deltas, as we should, Messi's peak stretches to his 28th birthday.
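One simple way to do that regression is to blend the hot rate with a prior in proportion to sample size. This is a generic shrinkage sketch, and the constant k below is an arbitrary working assumption, not a fitted value:

```python
def regressed_rate(current_rate, prior_rate, n_90s, k=10.0):
    """Shrink a small-sample per-90 rate towards a prior.

    current_rate: the hot (or cold) rate achieved over n_90s appearances
    prior_rate:   a baseline, e.g. the player's recent seasonal deltas
    k:            how many '90s' of weight to give the prior - an
                  assumption here, not an estimated quantity
    The fewer 90s behind the current rate, the harder it is pulled
    back towards the prior.
    """
    return (current_rate * n_90s + prior_rate * k) / (n_90s + k)
```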

But by his own standards he has likely peaked.

Open play goals and expected goals for the last three seasons and the first 5 games of 2017/18 tell a similar story of gentle decline, even allowing for Messi's recent spurt of scoring.

Actual, non penalty, open play goals/90 are trending downwards, as are Messi's xG per 90 on a 10 game rolling average.

The actual trendline is also probably shallower than it should be, because of the narrative driven choice of his three open play goal spree against Eibar providing the doorstop.

That Messi consistently over-performs the average player's xG isn't surprising, but peaks like the one he's currently enjoying are often driven by a glut of relegation threatened sides turning up in Barcelona's lumpy quality of schedule.

Enjoy the blips, but don't draw conclusions based on so little evidence.

Data from Infogolapp.

Sunday 10 September 2017

Messi and Ronaldo. Expected Goals Makers, Takers or a Bit of Both.

With the increased availability of granular data, there has been a similar influx of advanced metrics, both for players and sides across a wider range of domestic leagues.

And while performance based numbers, often to a couple of decimal places, are the raw material for much of the analytically based content, their attractiveness and clarity of meaning rarely extend beyond the spreadsheet.

It therefore falls to visualisations to convey some of the rich seams of information available in such manipulated data sets in a clear and easily digestible format, such as Ted Knutson's hugely popular radars.

Expected goals remain the flavour of the month, although BBC pundits are still immune, imploring players to "do better" with opportunities that are scored fewer than one time in 10.

A team or individual's attacking contribution can be neatly summarised by their expected goals and assists, standardised at least to a per 90 figure, with respect given to those who have achieved their numbers over a larger sample size compared to noisy small sample interlopers, ripe for regression.

Here's the xG/90 and xA/90 for the 70 largest cumulative, goal involvement achievers from La Liga's 2016/17 season.

Data is from @InfogolApp and has been restricted to open play chances and assists.

Messi and Ronaldo are among a clutch of players who have broken away from the main body of the plot, although they are also quite a distance removed from each other.

Messi was involved in around 0.85 xG+xA per 90 and Ronaldo around 0.65.

However, the former, while slightly under-performing against the latter in getting on the end of xG scoring chances, more than compensated by creating over double the amount of expected assists per 90.

So a simple scatter plot can begin to reveal fundamental differences between even the most high profile of players.

More information can be extracted by simply running a straight line between a particular player's point on a scatter graph and the origin.

Moving down such a line, you'll encounter players who, in the season under scrutiny, achieved ratios of xG and xA that closely resemble those of the line owning player.

The magnitude of their cumulative performance is less than those players that are further away from the origin, but their shot/assist characteristics will be consistent with any near neighbours.
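The two numbers that define such a line are simple to pull out for any player. The figures in the test are illustrative, not anyone's actual 2016/17 outputs:

```python
def maker_taker_profile(xg90, xa90):
    """Place a player on the maker-taker spectrum.

    Returns total open play involvement per 90 (xG + xA) and the share
    of that involvement that comes from creating for others; players
    sitting on the same line through the origin of the scatter plot
    share the same maker share, differing only in magnitude.
    """
    involvement = xg90 + xa90
    maker_share = xa90 / involvement if involvement else 0.0
    return involvement, maker_share
```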

Messi was a more sharing team mate in open play in 2016/17, whereas Ronaldo headed the line of takers, rather than makers.

Friday 8 September 2017

Shot Blocking and the State of the Game.

It has long been appreciated that the dynamics of a game subtly alter as time elapses, scorelines change or remain the same, and pre match expectations are met, exceeded or undershot.

This shifting environment has traditionally been investigated using the simple measure of the current score.

This has unfortunately been labelled as "game state", when simply "score differential" would have succinctly described the underlying benchmark being applied, without hinting at a more nuanced approach than just subtracting one score from another.

As I blogged here, the problem is most acute when lumping the not uncommon, stalemated matches together.

Consider a game between a strong favourite and an outsider that finishes goalless.

Whereas the latter more than matches their pregame expectation, the former falls disappointingly short of theirs.

The average expectation at any point in a game can be represented in a number of ways, but perhaps the most intuitive is an estimation of the average number of points a team will pick up based on the relative strengths of themselves and their opponent, at the current scoreline and with the time that remains.

The plot above shows the relative movement of the expected points for a strong favourite playing weaker opposition to a 0-0 conclusion.

The favourite would expect to average around 2.5 points at kick off, decaying exponentially to the single actual point they collect at full time.

So at any point in the match we can measure the favourite's current expectation compared to their pregame benchmark and use this to describe their own level of satisfaction with the state of the game.
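A crude way to generate those in-game expected points is to assume each side's remaining goals arrive at a constant rate. This is a generic independent-Poisson sketch, not the model behind the plot above:

```python
import math

def expected_points(lead, for_rate, against_rate, mins_left, max_goals=10):
    """Expected points from the current position for the side that is
    'lead' goals ahead (negative if behind).

    for_rate / against_rate are per-90 scoring rates for the side and
    its opponent; remaining goals are treated as independent Poisson
    counts, which is a simplification.
    """
    lam_f = for_rate * mins_left / 90.0
    lam_a = against_rate * mins_left / 90.0

    def pois(lam, k):
        return math.exp(-lam) * lam ** k / math.factorial(k)

    p_win = p_draw = 0.0
    for gf in range(max_goals):
        for ga in range(max_goals):
            p = pois(lam_f, gf) * pois(lam_a, ga)
            margin = lead + gf - ga
            if margin > 0:
                p_win += p
            elif margin == 0:
                p_draw += p
    return 3.0 * p_win + p_draw
```

Run at every minute of a goalless game between unequal sides, this traces exactly the decay described above: the favourite's figure sinks from well above two points towards the single point they actually bank.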

Game state would be preferable, but that's already taken.

The same is true for the outsider. Their state of the game gradually increases compared to their much reduced pregame expectation.

Although the game is scoreless throughout for each side, things are getting progressively worse for the favourite and better for their opponents.

We can use these shifting state of the game environments to see if they have an effect on in game actions.

Intuitively you would expect the team doing less well compared to their expectations to gradually commit more resources to attack, in turn forcing their opponents onto the defensive.

This may increase shot volume for the former, but it is also likely that these attempts, particularly from open play will fall victim to more defensive actions, such as blocks.

The reverse would seem likely to be true for the weaker team. Although their shot count may fall, with less defensive duties being carried out by their opponents, their sparser shot count may evade more defensive interventions, again such as blocks.

Here's what the modelled fate of a shot from regular play from just outside the penalty area in a fairly central position looks like between two unequal teams as the match progresses.

Data is from a Premier League season via @infogolApp

In building the model, the decay in initial expectation has been used to describe the state of the game for the attacking team when each individual shot was attempted, rather than simply using score differential.

Initially the weaker team is less likely to have their shot blocked, although it is probably more accurate to say that the favoured side is more likely to suffer this fate.

As the game progresses, the better team sees a slight increase in the likelihood that a shot from just outside the box is blocked, perhaps suggesting that their opponents are initially heavily committed to a defensive structure.

The weaker side has a lower initial likelihood that such a shot is blocked, again implying a more normal amount of defensive pressure early in the game. But as the match progresses, the likelihood that their shots are blocked falls even more.

This nuanced model appears to be illustrating the classic potential for a prolonged rearguard action from an underdog, followed by a late smash and grab opening goal, mitigated by the relative shot counts from each team.

Tuesday 5 September 2017

Premier League Defensive Profiles.

Heat maps and the like have been around for ages as a way of visualising the sphere of a particular player's influence.

However, it's always nice to have some numerical input to work with, so I've used the Opta event data that powers InfoGol's xG and in-running app to develop metrics that describe how teams and individuals contribute over a season.

Defensive metrics have lagged well behind goals and assists, so I looked at that neglected side of the ball.

Unlike goal attempts, counting defensive stats tends to be a fairly futile exercise. No one willingly wants to keep making last ditch tackles and racking up ever higher defensive events is more often the sign of a team in trouble.

There's also the disparity in possession time which gives the possession poor team more chances to accrue defensive events.

Therefore, pitch position, rather than bulk events seems an obvious alternative.

Allowing a side lots of touches deep in your territory is intuitively a bad idea, and the higher up the field a side is willing or able to engage their opponent, the better that would appear to be.

Measurements have been calculated from the Opta X, Y point of an event to the centre of a team's own goal line.

Thus a tackle or clearance made on the halfway line will be further from this point of reference if it is made near the touchline than if it is completed on the centre spot.
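In code, with the usual 0-100 Opta coordinate system and an assumed pitch size (115 x 74 yards here, purely a working assumption):

```python
import math

def distance_to_own_goal(x, y, pitch_length=115.0, pitch_width=74.0):
    """Yards from an Opta event to the centre of the defending side's
    own goal line.

    Opta x and y run 0-100, with (0, 50) the centre of a team's own
    goal; the pitch dimensions are an assumption and should be set to
    the actual playing surface where known.
    """
    dx = x / 100.0 * pitch_length
    dy = (y - 50.0) / 100.0 * pitch_width
    return math.hypot(dx, dy)
```

A halfway line tackle on the touchline, `distance_to_own_goal(50, 0)`, duly comes out further from goal than one on the centre spot, `distance_to_own_goal(50, 50)`.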

This allows for defensive event profiles for both a team and also their opponents.

A quick eye test appears to show that the more successful Premier League teams do their defending further away from their own goal than the lesser sides are either willing or able to do.

The idea that doing defensive work higher up the pitch is the hallmark of a good team is further developed by plotting where a side defends on average and where they allow their opponents to defend, again on average.

The relegated teams from 2016/17 mostly suffered the double whammy of choosing or having to defend an average of around 34 yards from the centre of their own goal line, compared to nearly 40 yards for some of the top 6, while also allowing their opponents the luxury of making defensive actions around 38 yards from their own goal line.

Notably Pulis again muscles into an area apparently reserved for relegation fodder with his defensive voodoo.

At a player level it's a trivial problem to find the average pitch position at which he makes defensive actions and then to measure how near or far flung each individual action is from this average point.

These numbers then give the average position of a player's defensive contribution, measured from the centre of his own goal, and also how widely that area of influence extends.
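As a sketch, with each action already expressed as a position in yards relative to the centre of the player's own goal line:

```python
import math

def defensive_profile(actions):
    """Mean position of a player's defensive actions and the average
    scatter of those actions around that mean - a crude radius for his
    field of influence.

    actions: list of (x, y) positions in yards, with the origin at the
    centre of the player's own goal line.
    """
    n = len(actions)
    mean_x = sum(x for x, _ in actions) / n
    mean_y = sum(y for _, y in actions) / n
    spread = sum(math.hypot(x - mean_x, y - mean_y)
                 for x, y in actions) / n
    return (mean_x, mean_y), spread
```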

N'Golo Kante's an obvious candidate to see if this simple exercise again passes the eye test.

In 2016/17 the average pitch position for Kante's defensive actions was 45 yards from his own goal.

The average distance between this average position and all the defensive actions he made was 23 yards.

The latter was greater than the average for all defensive midfielders as a group.

We could perhaps say that Kante was relatively advanced in his defensive actions (he was seven yards further up field than his former team mate Nemanja Matic) and his field of influence was also more expansive compared again to Matic and his peers.

Charlie Adam, by contrast, appears more constrained by the role required of him. In 2016/17 he tackled deeper than both Kante and Matic and strayed less far afield.

He more resembled a disciplined central defender in his defensive foraging and in doing so remained roughly where his energy bar lands on the pitch around the 70th minute.

Wednesday 23 August 2017

Chance Quality From 1999.

Back in the late 90's, when Gazza's career was on the wane and what might become football analytics was mainly done in public on gambling newsgroups, shot numbers were the new big thing.

"Goal expectation", calculated from a weighted and smoothed average from a side's actual number of goals from their last x number of matches, was often the raw material to use to work out the chances of Premier League high flyers, Leeds beating mid table Tottenham.

Shot numbers (which included headers) then became the new ingredient to throw into the mix and a team's shooting efficiency quickly became a go to stat.

Multi stage precursors to goal expectation models were further developed when shot data became available that was broken down into blocks, misses and on target attempts.

To score, a side had to avoid having their shots blocked, then get them on target and finally beat David James.

This new data allowed you to attach team specific probabilities to each stage of progression towards a goal and arrive at a probabilistic estimate of a team's conversion rate per attempt.
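A minimal sketch of this three-stage estimate, using entirely hypothetical counts rather than any real team's numbers, might look like this:

```python
# Hypothetical season totals for one team's goal attempts.
attempts = 500        # all attempts (shots plus headers)
blocked = 130         # attempts that were blocked
on_target = 180       # unblocked attempts that hit the target
goals = 55            # on-target attempts that beat the keeper

# Probability of surviving each stage on the way to a goal.
p_unblocked = (attempts - blocked) / attempts    # avoid the block
p_on_target = on_target / (attempts - blocked)   # then hit the target
p_score = goals / on_target                      # then beat the keeper

# The product is the estimated conversion rate per attempt.
conversion = p_unblocked * p_on_target * p_score
print(round(conversion, 3))
```

Note that, computed from a single team's own counts, the chained probabilities telescope back to simply goals divided by attempts; the value of the staged breakdown is diagnostic, showing at which stage a side excels or struggles, and allowing each stage to be smoothed or regressed separately.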

Unlike today's xG number, the figure told you nothing specific about a single shot, nor was it particularly useful in helping to describe the outcome of a single game, even with double digit attempts.

Aggregated over a larger series of matches by necessity, this nuanced conversion rate, that included information about a side's ability to avoid blocks, get their efforts on target and thereafter into the goal, allowed you to deduce something about a side's preferred attacking and defensive style.

Also, if that preference persisted over seasons, this team specific conversion rate could be used alongside each team's raw shot count in the recent past to create a novel, up-to-date and hopefully predictive set of defensive and attacking performance ratings.

Paper and pencil only lasts slightly longer than today's hard drive, so unfortunately I don't have any "goal expectation" figures for Liverpool circa 2002.

However, with the additional, detailed data from 2017, I decided to re-run these turn of the century, slightly flawed goal expectation models to see if these old school, team specific conversion rates offer anything in today's more data rich climate.

To distinguish them from today's xG I've renamed the output "chance quality".

Chance quality is an averaged likelihood that a side would negotiate the three stages needed to score.

Arsenal had the highest average chance quality per attempt in 2015/16.

The Gunners were amongst the most likely to avoid having their attempts blocked, those that weren't blocked were most likely to be on target and those that were on target were most likely to result in a goal.

Leicester, in their title winning season, also created high quality chances per attempt, but Tottenham appeared to opt for quantity versus quality. They were mid-table for avoiding blocks and finding their target, but their on-target attempts were, on average, among the least likely to result in a goal.

Of the surviving sides, only Palace were less likely to score with an on-target attempt than Spurs.


Here's the same chance quality per attempt, but for attempts allowed, rather than created, by the non-relegated teams from the 2015/16 season.

The final two columns compare the estimated goal totals for each team, using their shot count in that season and their chance quality (conversion rate) from the previous year, to their actual values.

The thinking back in 2000 was that conversion rate from a previous season remained fairly consistent into the next season and so multiplying a side's chance quality by the number of shots they subsequently took or allowed would give a less statistically noisy estimate of their true scoring abilities.
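The projection itself is a one-line multiplication. A sketch, with an invented chance quality figure and attempt count rather than any team's real numbers:

```python
# Hypothetical 2000-era goal projection: last season's chance quality
# per attempt multiplied by this season's attempt count.
chance_quality = 0.105       # estimated conversion per attempt, previous season
attempts_this_season = 520   # raw shot count, current season

estimated_goals = chance_quality * attempts_this_season
print(round(estimated_goals, 1))  # 54.6
```

The same calculation works on the defensive side of the ball, using chance quality allowed and attempts conceded.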

Here's the correlation between the estimated and actual totals using chance quality from 2015/16 and shot numbers from 2016/17 to predict actual goals from 2016/17.


There does appear to be a correlation between average chance quality in a previous year, attempts made the next season and actual goals scored or allowed.
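The strength of that relationship can be checked with a simple Pearson correlation between estimated and actual totals. A sketch with invented figures for a handful of teams (not the real 2016/17 numbers):

```python
from statistics import mean

# Hypothetical estimated vs actual goal totals for five teams.
estimated = [54.6, 48.0, 62.3, 39.5, 71.0]
actual = [50, 52, 65, 35, 78]

# Pearson correlation coefficient, computed from first principles.
mx, my = mean(estimated), mean(actual)
cov = sum((x - mx) * (y - my) for x, y in zip(estimated, actual))
sx = sum((x - mx) ** 2 for x in estimated) ** 0.5
sy = sum((y - my) ** 2 for y in actual) ** 0.5
r = cov / (sx * sy)
print(round(r, 2))
```

A coefficient near 1 would suggest the previous season's chance quality carries genuine predictive signal; values near 0 would suggest the discrepancies are mostly noise.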

The correlation is stronger on the defensive side of the ball, perhaps suggesting less tinkering with the back 3, 4 or 5.

With full match video extremely rare in 2000, it might have been tempting to assume chance quality had remained relatively similar for most sides and any discrepancy between actual and predicted was largely a product of randomness.

Fortunately, greater access to granular data, the availability of extensive match highlights and Pulisball, as a primitive benchmark for tactical extremes, have made it easier to recognise that tactical approaches and chance quality often vary, particularly if there is managerial change.

In this post I compared the distribution of xG for Stoke under Pulis' iron grip (fewer, but high chance quality attempts) and his successor Mark Hughes (higher attempt volumes, but lower quality attempts).

Subsequently, under Hughes, Stoke have tended to morph towards the Hughes ideal and away from Pulis' more occasional six yard box offensive free for all.

So a change of manager could lead to a genuine increase or decrease in average chance quality, which in turn might well alter a side's number of attempts. Any use of an updated version of chance quality should come with this important caveat.

For anyone who wants to party like it's 1999, here's the average chance quality per attempt from the 2016/17 season using this pre-Twitter methodology allied to present day location and shot type information.

Use them as a decent multiplier along with shot counts to produce a proxy for the more detailed cumulative xG now available during the upcoming season or as a new data point to assist in describing a side's tactical evolution across seasons.

In 2016/17, Crystal Palace improved their chance quality compared to 2015/16 with half a season of Allardyce and Arsenal maintained their reputation for trying to walk the ball into the net.

All data is from infogolApp, where 2017 expected goals are used to predict and rate the performance of teams in a variety of leagues and competitions.