Saturday 30 December 2017

Jeff Stelling was Right about xG....For the Wrong Reasons.

Love it or loathe it, totally get it or pack it away with opinions such as "foreign managers who don't know the Premier League are rubbish", or simply use it as one component in your predictive market of choice, there's no denying that expected goals made a mark in 2017.

Expected goals is most effective in the long term and in the aggregate, but there's an understandable desire to also parade it for individual games and individual chances.

Jeff Stelling, who only appears to think probabilistically when lying fully clothed in bed with a million pounds and a teddy wearing a Hartlepool shirt, may merely have been expressing the well-documented caveats of using xG for a single game when he derided the xG thoughts of the Premier League's senior statesman, Arsene Wenger.

Betting on probabilistic outcomes, what are the odds of that!

Using xG rather than actual goals in a single game is simply a more nuanced look at the team process that went into the 90 minutes.

It approaches the difficult question of who "deserved" to win using a larger sample than goals alone (albeit one often twisted by game effects), and it provides an answer in terms of likelihood, rather than the more palatable but unattainable level of certainty that has long been expected from TV experts.

1-0 wins can be subject to large amounts of random variation, and there's probably even more if you have treated your fans to a 4-3 victory. A 7-0 win, by contrast, leaves much less room for doubt as to who got their just rewards.

If you apply a Pythagorean wins approach to the goals scored and allowed in these three single-game scenarios, you would give a larger proportion of "Pythagorean wins" to the team that won 7-0 than to the team that won 1-0, and by far the least to the side that triumphed 4-3.
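The post doesn't give a formula, and it's worth noting that the textbook Pythagorean expectation, GF^k / (GF^k + GA^k), credits every shutout with exactly 1.0, so on its own it cannot separate a 7-0 from a 1-0. A minimal sketch, assuming a hypothetical smoothing constant (`prior_goals`) added to both sides and a baseball-style exponent of 2 (also an assumption, not a fitted football value), reproduces the ordering described above:

```python
def pythagorean_share(goals_for, goals_against, exponent=2.0, prior_goals=1.0):
    """Share of a "Pythagorean win" credited to the side scoring goals_for.

    prior_goals is a hypothetical smoothing constant added to both sides so
    that every shutout does not collapse to exactly 1.0.
    """
    gf = (goals_for + prior_goals) ** exponent
    ga = (goals_against + prior_goals) ** exponent
    return gf / (gf + ga)

for score in [(7, 0), (1, 0), (4, 3)]:
    print(score, round(pythagorean_share(*score), 3))
# (7, 0) 0.985
# (1, 0) 0.8
# (4, 3) 0.61
```

Larger smoothing constants or smaller exponents compress all three values towards 0.5, but the ranking of 7-0, then 1-0, then 4-3 stays the same.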

So there is information to be extracted from even basic scorelines that goes beyond wins, draws and losses.

Individual chance xG takes this approach a step further, indicating whether a team that won 1-0 was fortunate to win or unlucky not to have scored a hatful, when set against the efforts of their defeated opponent.
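One common way to turn a list of individual chances into match outcome probabilities is to treat every chance as an independent attempt that succeeds with probability equal to its xG, then simulate the game many times. The sketch below assumes that independence (rebounds and game state break it in practice), and the chance values are hypothetical, not taken from any real match:

```python
import random

def simulate_match(home_xg, away_xg, n_sims=10_000, seed=1):
    """Estimate home win / draw / away win probabilities from chance-level xG.

    Each chance is an independent Bernoulli trial with probability = its xG.
    """
    rng = random.Random(seed)
    home_win = draw = away_win = 0
    for _ in range(n_sims):
        home_goals = sum(rng.random() < p for p in home_xg)
        away_goals = sum(rng.random() < p for p in away_xg)
        if home_goals > away_goals:
            home_win += 1
        elif home_goals == away_goals:
            draw += 1
        else:
            away_win += 1
    return home_win / n_sims, draw / n_sims, away_win / n_sims

# Hypothetical chances for a game the home side won 1-0 from one big opening.
home_chances = [0.76, 0.05, 0.04, 0.03]
away_chances = [0.35, 0.30, 0.22, 0.12, 0.08, 0.06]
print(simulate_match(home_chances, away_chances))
```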

The most visible flaw of xG can be in individual chances, because although the amount of information available to define an opportunity is large, it is still far from complete.

The broad sweep of xG probabilities, drawn from a large body of historical precedent, often trumps an eye-test opinion, particularly where probability is an unfamiliar concept to those who use years of footballing knowledge, rather than mathematical models, to estimate whether or not a chance should have been converted.

There are also relatively easy to spot examples where lack of collected data has, in a largely automated xG process, generated values that are at odds with reality.

Joe Allen

The examples above and below, from Stoke's recent game with WBA, illustrate the problems inherent in calculations made either without a visual check or without a more complete set of parameters.

Ramadan Sobhi

Looked at from the perspective of the WBA keeper, Ben Foster, the post-shot xG for Allen's goal is likely higher than the xG for Sobhi's strike, based on placement, power, location and the presence or absence of a deflection.

But it is fairly obvious that the absence of Ben Foster himself in the latter shot has in reality elevated Sobhi's effort to a near 100%.

It is the equivalent of an unfieldable ball in baseball or an uncatchable pass in football, NFL style, simply because of the field position of the designated catcher or saver.

I don't have our xG2 values for each attempt to hand (it's Christmas), but I suspect Foster will be expected to save Sobhi's effort more often than Allen's by a model that is ignorant of his wayward positioning for the former attempt.

That would be harsh on Foster, acting out his role as auxiliary attacker, chasing an injury time equaliser.

Keeper metrics are based on the savability of attempts on target, and once Sobhi got his attempt on target, the true chance of a goal being scored was around 99.9% (to allow for the possibility of the ball bursting before crossing the line).

Using Sobhi's goal to evaluate Foster's xG over- or under-performance would immediately put the keeper at an unfair disadvantage.

If we assume that the historical chance of finding the net with a weakly hit shot along the ground, aimed around the centre of the goal, with no deflection (which would effectively change the shot location) and taken from wide of the post and level with the penalty spot, is relatively modest, then Foster will already be nearly a goal worse off when comparing his xG goals allowed with his actual goals allowed.
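The arithmetic behind "nearly a goal worse off" is simple: a keeper's over- or under-performance is the sum of post-shot xG faced minus the goals actually allowed. In this sketch the shot values are hypothetical, and the stranded-keeper shot is given an assumed post-shot xG of 0.10, standing in for a model that cannot see that the keeper was upfield:

```python
def goals_prevented(post_shot_xg, goals_allowed):
    """Positive = keeper conceded fewer goals than the model expected."""
    return sum(post_shot_xg) - goals_allowed

# Hypothetical post-shot xG values for on-target shots faced.
shots = [0.12, 0.30, 0.08, 0.55, 0.20]
print(round(goals_prevented(shots, 1), 2))  # 0.25

# Add an empty-net goal the model rates as a routine save (assumed 0.10),
# because it has no idea the keeper was stranded in the opposition box.
print(round(goals_prevented(shots + [0.10], 2), 2))  # -0.65
```

The single misread shot swings the keeper by 0.9 goals, from slightly above par to clearly below it.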

The reality was a shot that, through little fault of his own, Foster was entirely unable to save, whereas the majority of similar attempts on which models are built would have featured a more advantageously positioned keeper.

Numerous unrecorded aspects of a goal attempt can greatly change individual xG estimates while still retaining a usefulness when aggregated.

The striker's body shape when shooting, or a bizarre trajectory in the flight of the ball, for example, can change actual expected conversion rates, transforming seemingly identical chances into near-unsavable certainties or comfortable claims for the keeper.

It's likely that many post-shot xG probabilities that are grouped in similar bins actually cover a much wider range of true probabilities. They may not be as wrongly classified as the Foster example, but the implied accuracy of multiple decimal places is bound to be an illusion.

There are a couple of ways to attempt to resolve this conundrum.

Scrutinising each attempt is one labour-intensive option; hoping that events largely even out in the aggregate is another (although randomness isn't always inherently fair).

A third option is to take indicators from the data we do have, that may help to highlight occasions where a chance may have been wrongly classified within a group of similarly computed xG values.

(This is unfortunately where I invoke a rare non disclosure clause).

So what happens to our xG2 keeper ratings if we try to account for factors that we haven't recorded and therefore are absent in our model?

Generally, underperforming keepers improve while remaining below par, and overperformers are similarly dragged partway towards a less extreme level.

De Gea and Bravo have been respectively among the best and worst shot stoppers of the last three seasons.

Using models that incorporate much of the available post-shot information, such as shot type, power, placement, rudimentary trajectory and deflections, de Gea concedes 84 non-penalty goals against a model's average prediction of 95.

For Bravo the numbers are 25 allowed against 15 predicted.

If we concede that some of the attempts aggregated to make up the baseline for each keeper may have been misclassified, we can apply a correction, based on hints in the data we do have, that may reclassify those attempts more accurately.

De Gea's average expected number of goals allowed falls to 92 (still making him above average, but slightly less superhuman), and Bravo is given a slightly more forgiving 19 expected goals, rather than 15.
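Expressed as goals prevented (expected goals allowed minus actual goals allowed), the correction quoted above shrinks both extremes. This only redoes the arithmetic on the figures in the text; the undisclosed correction itself isn't reproduced here:

```python
def goals_prevented(goals_allowed, expected_allowed):
    """Positive values mean fewer goals conceded than the model expected."""
    return expected_allowed - goals_allowed

# Figures quoted above: (actual allowed, raw model xG, corrected model xG).
keepers = {"de Gea": (84, 95, 92), "Bravo": (25, 15, 19)}

for name, (allowed, raw, corrected) in keepers.items():
    print(f"{name}: {goals_prevented(allowed, raw):+d} raw, "
          f"{goals_prevented(allowed, corrected):+d} corrected")
# de Gea: +11 raw, +8 corrected
# Bravo: -10 raw, -6 corrected
```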

Acknowledging that a model is incomplete has led to extremes being regressed towards the mean, and that's probably no bad thing if these models are to be used to evaluate and project player talent.

Expected goals is a work-in-progress tool, not the strawman, full of cast-iron claims, that opponents invariably construct on the metric's behalf. If you accept its inevitable and often insurmountable limitations, xG can still add much value to any analysis.

Don't be like Jeff, approach xG with an open mind....and also don't go to bed in a suit.

Saturday 23 December 2017

Influential xG & xA Team Players

Expected goals and expected assists are now becoming an established part of player performance stats, and rather than post column after column of boring and indigestible numbers, often to two decimal places, I've been presenting the data as a visual.

It seemed logical to plot the xG/90 against the xA/90 with a minimum cutoff for minutes played to mitigate the intrusion of outlandish outliers.

Here's one of my plots for the Premier League, earlier in the season.

Players appearing towards the top left are the league's more prolific providers of an opportunity, while bottom right is populated with players who more often latch onto the final decisive pass.

Players who had been doing quite a bit of both turn up in the top right region of the plot.

It's immediately obvious that Manchester City dominate the plot, as you might expect from their near-perfect start to the season, and as a record of the league's most prolific attacking contributors, the plot does its job.

However, while the multiple players from the same teams who feature prominently, notably from City and Arsenal, are undoubtedly fine players, their individual performance indicators are perhaps made slightly easier to achieve by the quality of their teammates.

De Bruyne's precise passing is also feeding into the xG of his teammates, as are their intelligent runs creating opportunities for him to bolster his xA.

So as a tweak to my original plots, I've now factored in the overall xG and xA/90 of the team for which each player plies his trade. This results in many of the Manchester City players falling back into the pack.

The individual players are still creating and attempting to convert chances at similar rates to the original plot, but such is City's commitment to attacking play (they are averaging upwards of 2.6 NP xG per game) and such is their depth of creative and attacking talent that they don't have one particular standout performer.

Conversely, Peter Crouch, who would be unlikely to feature prominently in a plot that merely quantified his raw xG and xA/90 contribution, shows up as a hugely influential contributor to Stoke's offensively tepid attack once we factor in the Potters' overall team NPxG/90 of barely 1.0.

As an example, Aguero's combined xG/90 and xA/90 of 1.3, from an overall Manchester City combined rate of nearly 5 xG + xA per 90, is arguably less influential, and more readily replaced from within, than Crouch's combined 0.6 against Stoke's puny overall 1.8 xG + xA per 90.
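The post doesn't spell out exactly how the team totals were factored into the plot. One plausible version, used here purely as an assumption, is a player's share of his team's combined xG + xA per 90; applied to the figures above (taking "nearly 5" as 5.0), it puts Crouch ahead of Aguero:

```python
def attacking_share(player_xg_xa90, team_xg_xa90):
    """Fraction of a team's combined xG + xA per 90 accounted for by one player."""
    return player_xg_xa90 / team_xg_xa90

print("Aguero:", round(attacking_share(1.3, 5.0), 2))  # 0.26
print("Crouch:", round(attacking_share(0.6, 1.8), 2))  # 0.33
```

A ratio like this rewards players who carry a weak attack, which is the point of the tweak, but it can also flatter a player whose team simply rarely attacks.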

Whilst the heavyweights from Manchester City will undoubtedly take the plaudits in May, the likes of Crouch, Murray, Austin, Carroll and Gross, who are striving and contributing greatly to keep their lesser sides afloat, perhaps also deserve a mention and their own viz.

Friday 22 December 2017

Tackling Success Rate & the Influence of Luck

About four years ago I wrote a post that speculated on the transfer price associated with a group of equally talented players whose success rate in a particular skill had actually been randomly generated.

Each was given a 10% chance of succeeding and 100 opportunities to succeed, and the "best" performers were ranked accordingly.

Of course, the difference in success rate was entirely down to randomness.

If you bought the "best" at a premium, you were paying for unsustainable luck. If you bought the "worst", you were getting a potential bargain, provided the price reflected the imaginary underperforming ability that would likely regress towards 10%.
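The original experiment is easy to recreate. This sketch assumes 50 players (a hypothetical number; the original group size isn't given here), each with an identical true 10% success rate over 100 attempts, and ranks them by observed rate:

```python
import random

def simulate_equal_players(n_players=50, attempts=100, true_rate=0.10, seed=42):
    """Observed success rates for players who all share the same true rate, best first."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_players):
        successes = sum(rng.random() < true_rate for _ in range(attempts))
        rates.append(successes / attempts)
    return sorted(rates, reverse=True)

rates = simulate_equal_players()
print("'best' three:", rates[:3])
print("'worst' three:", rates[-3:])
```

Every gap between the top and bottom of the resulting list is pure sampling noise, because by construction nobody is more talented than anyone else.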

It's less straightforward when looking at real players.

Players play on different teams, with different tactical setups and different teammates. They probably have varying skill differentials across a range of skill sets, and they have differing numbers of opportunities to demonstrate their talent, or lack thereof.

Attempting to partly account for the randomness in sampling is most applicable in on field events where there is a simple definition of success or failure.

In such areas as tackles made, raw counting numbers are much more a product of overall team talent and setup, so there has been a tendency to move onto percentage of tackles won, as an outward sign of competence.

Unlike the revolution in scoring and chance creation, where pre-shot parameters are modeled on historical precedent to create expected goals or expected chances, there is little prospect, given the available data, of similarly modeling expected tackles, dribbles or aerial duels, for example.

But we should at least try to account for the ever-present randomness, even in large samples, and in doing so partly transform purely descriptive percentage counts into a more informed predictive metric capable of projecting future success rates.

It's easy to be impressed by the eye test that sees four successful tackles made by a player in a single half of football. But aside from draining the tension from the final minutes of a game by declaring said player "man of the match", as a projection of future performance it is riddled with "luck" and largely unrepresentative of future, larger-scale output.

To attempt to overcome this, we can work out what a distribution of outcomes would look like if there is no differential in a measured skill within a group of players. We can then compare this distribution to an actual distribution of outcomes where we suspect a differential exists.

One example is the tackling ability of Premier League defenders.

We can then try to allow for the randomness that may exist in the observed success rates of players who have had differing opportunities to prove their tackling prowess, and so produce a more meaningful projection.
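A common way to carry out this comparison, sketched here as one option rather than the exact method behind the plots in this post, is to subtract the variance you would expect from binomial sampling noise alone (if every defender shared the group rate) from the observed variance of success rates. What remains is a rough estimate of the spread in true tackling skill. The tackle counts are hypothetical:

```python
from statistics import mean, pvariance

def skill_spread(successes, attempts):
    """Split the observed variance in success rates into noise and skill.

    observed variance ~ true-skill variance + binomial sampling noise, so the
    skill part is what is left after removing the noise expected if every
    player had the group's overall success rate.
    """
    rates = [s / n for s, n in zip(successes, attempts)]
    group_rate = sum(successes) / sum(attempts)
    observed_var = pvariance(rates)
    noise_var = mean([group_rate * (1 - group_rate) / n for n in attempts])
    true_var = max(observed_var - noise_var, 0.0)  # cannot be negative
    return group_rate, observed_var, noise_var, true_var

# Hypothetical tackles won / tackles attempted for six defenders.
won = [62, 45, 30, 27, 19, 12]
attempted = [80, 60, 45, 40, 25, 14]
group_rate, observed_var, noise_var, true_var = skill_spread(won, attempted)
print(f"group rate {group_rate:.3f}, observed sd {observed_var ** 0.5:.3f}, "
      f"noise sd {noise_var ** 0.5:.3f}, implied skill sd {true_var ** 0.5:.3f}")
```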

The more tackles a player has been involved in, the more signal and less noise his raw rate will contain. Whereas in smaller samples, noise will proliferate and perhaps give extremes that will not be representative of any future output.

Here's the raw tackle success rate from the MCFC/Opta data dump from the 2011/12 season.

It lists the 140 defenders involved in the most tackles during the whole of that season. The left-hand side of the plot has the players with the most tackles, moving to the fewest at the right-hand side, where more extreme rates, both apparently good and bad, begin to appear.

The second, identically scaled plot has attempted to regress the observed rate towards the mean for the group, based on the differing number of tackle attempts each defender has been involved in.

All of the small-sample extremes, either good or bad, are dragged closer to the group average, while the larger samples group slightly more tightly, but were clustered more closely to the group mean to begin with.
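Regressing towards the mean can be sketched with beta-binomial (empirical Bayes) shrinkage: the group rate acts like k extra "prior" tackle attempts, with k set by the estimated spread of true skill, so small samples are pulled further towards the average than large ones. The group rate, skill spread and tackle counts below are hypothetical, and this is a stand-in technique, not necessarily the one used for the second plot:

```python
def regress_to_mean(successes, attempts, group_rate, true_var):
    """Shrink a raw success rate towards the group rate.

    A beta prior with mean m = group_rate and variance true_var is worth
    k = m(1 - m) / true_var - 1 pseudo-attempts at the group rate.
    """
    m = group_rate
    if not 0 < true_var < m * (1 - m):
        raise ValueError("true_var must lie strictly between 0 and m(1 - m)")
    k = m * (1 - m) / true_var - 1
    return (successes + k * m) / (attempts + k)

# Hypothetical group: 70% average tackle success, true-skill sd of 5 points.
for won, attempted in [(36, 40), (135, 180)]:
    raw = won / attempted
    regressed = regress_to_mean(won, attempted, 0.70, 0.05 ** 2)
    print(f"{won}/{attempted}: raw {raw:.3f} -> regressed {regressed:.3f}")
```

The 90% tackler on 40 attempts gives up far more of his edge than the 75% tackler on 180, mirroring how the right-hand side of the first plot collapses in the second.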

The first plot illustrates the interplay between randomness and skill. It is at its most deceptive in smaller sample sizes. It is perfectly adequate as a descriptive stat for defenders, but deeply flawed as a projection of a defender's likely true tackling talent. And the two are often conflated.

The second plot, by contrast, tries to strip out the differing influence of randomness across sample sizes. It shows that there is probably a skill differential among tackling defenders, but that it is nowhere near as wide as raw stats imply, even after a season's worth of tackles.

And if you're rating or buying defenders with 90%+ tackle success rates based on just 30 or 40 interactions, you're probably staking your reputation on a hefty dose of unsustainable good fortune, as they are likely to fall back into the pack with greater exposure.

Friday 15 December 2017

How High Might Manchester City Go?

Despite an inglorious 0/1 record (Stoke to be relegated after one game of their return to the Premier League in 2008), Paddy Power has already paid out on Manchester City being crowned Premier League winners in 2018.

They are on slightly firmer ground this time around, as not only are City 11 points clear of United, 14 from Chelsea and 18 ahead of the three top six also-rans, Liverpool, Spurs and Arsenal Burnley, they are also one of the best teams in Premier League history.

With "City to win the League" drifting into "Putin to be re-elected as Russian Leader" territory, focus has shifted to secondary betting markets, based around City's likely points total, goals scored or margin by which they will lift the domestic crown.

Whether or not you're interested in the betting dimension, estimating City's degree of dominance can provide a useful exercise in prediction over the long term.

The quick, and usually flawed, way to predict a side's end-of-season statistics is to blindly scale up from those recorded in the season to date.

This approach is rarely useful, as it takes no account of the remaining schedule, implies that the 17 matches played by City and each of their 19 rivals are a near-perfect indication of what will follow, and disregards variance.

Even looking back at those 16 wins and one draw, there was a finite possibility that teams other than Everton may have taken something from a daunting meeting with Manchester City.

Future projections should embrace the possibility that their record to date belongs to an excellent team, but one that may have been slightly fortunate to extract a near-100% points haul, and should allow for the often admittedly small chance that Pep's City may be defeated.

Even a cursory glance at City's remaining fixtures, which include a game against each of the rest of the top seven and two meetings with Spurs, should indicate that a single draw interspersed with wins in their remaining games would be an unlikely scenario upon which to base a projection.

Simulations of the remaining games in the 2017/18 season, give a less rose tinted prediction, while still confirming City's near certainty to lift the title.

Simulations based on expected goals rolling over both this and last season, expect City to gain 98 Premier League points by May. This is completely in line with the current estimates at which their final points total may be bought or sold at a variety of spread betting companies.

Similar ranges are shown for 10,000 simulated outcomes for City's total goals scored and total wins over the 38 game season.

I've also added the scaled-up totals, based on their record over 17 games being repeated over the remaining 21. While these blockbusting values do occasionally appear in the simulations, they are relatively high-end outliers and inadequate as a most likely projection in mid-December.
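A stripped-down version of this kind of simulation draws each side's goals in every remaining fixture from a Poisson distribution whose mean is that side's expected goals, awards points, and repeats. Everything here is an assumption for illustration: the per-fixture xG values are hypothetical, the two sides' goals are treated as independent, and these aren't the rolling two-season ratings referred to above. Only the starting 49 points (16 wins, one draw) and the 17/21 game split come from the text:

```python
import math
import random
from statistics import median

def poisson(rng, lam):
    """Draw a Poisson-distributed goal count (Knuth's multiplication method)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate_points(current_points, fixtures, n_sims=10_000, seed=7):
    """Return sorted final points totals from n_sims simulated run-ins."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        points = current_points
        for xg_for, xg_against in fixtures:
            scored = poisson(rng, xg_for)
            conceded = poisson(rng, xg_against)
            if scored > conceded:
                points += 3
            elif scored == conceded:
                points += 1
        totals.append(points)
    return sorted(totals)

# Hypothetical per-fixture expected goals for the 21 remaining games:
# 15 "routine" fixtures and 6 tougher ones.
remaining = [(2.4, 0.7)] * 15 + [(1.8, 1.1)] * 6
totals = simulate_points(49, remaining)

naive = 49 / 17 * 38  # blindly scaling up the first 17 games
print("simulated median:", median(totals))
print("approx. 2.5th to 97.5th percentile:", totals[249], "to", totals[9749])
print("naive scale-up:", round(naive, 1))
print("share of sims reaching it:", sum(t >= naive for t in totals) / len(totals))
```

With these inputs the naive scale-up sits far out in the upper tail of the simulated totals, which is the point made above.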

Saturday 9 December 2017

Know Your Limits

All predictions come with the caveat that there is a spread of uncertainty either side of the most likely outcome.

A side may be odds on to win almost all of their matches over a season, as Manchester City have very nearly shown in 2017/18, but there is a finite, if extremely small chance that they will actually lose all 38 matches.

Similarly, there is a bigger chance that they will win all 38, but the most likely scenario sits between these two extremes and for the current best team in the Premier League, winning the title with around 96 points is the most expected final outcome in May.

While single, definitive predictions are more newsworthy, they imply a precision that is never available for longer-term futures, especially for a sporting contest such as a Premier League season, which comprises 380 low-scoring matches.

It's therefore useful to attach the degree of confidence we have in our predictions to any statements we make about a future outcome, particularly as new information about teams feeds into the system and the competition progresses, turning probabilistic encounters into actual outcomes worth 0, 1 or 3 points.

Here's the range of points that a simulation of the 2016/17 Premier League, using xG-based ratings for each team, came up with for Swansea before a ball was kicked.

Swansea had been in relative decline since their impressive introduction to the top tier, playing much-admired possession football, mainly as a defensive tactic, that had seen them finish as high as 8th in 2014/15, 21 points clear of the drop zone.

2015/16 had seen them fall to 12th, just ten points from the drop zone and much of their xG rating for 2016/17 was based around this less impressive performance.

The top end of their points totals over 10,000 simulations resulted in a top-10 finish with 52 points, but the lower end left them relegated with 27 points, and their modal total of 36 points suggested a season of struggle.

And this is illustrated by the dial plot showing well into the red zone signifying relegation.
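Reading a mode and upper and lower limits off a set of simulations takes only a few lines. To keep the sketch self-contained, every match uses the same hypothetical win and draw probabilities, whereas a real model would vary them fixture by fixture; the 95% limits are taken as the 2.5th and 97.5th percentiles of the simulated totals:

```python
import random
from statistics import mode, quantiles

def season_points(p_win, p_draw, games=38, n_sims=10_000, seed=3):
    """Simulated season points totals, with fixed (hypothetical) match probabilities."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        points = 0
        for _ in range(games):
            u = rng.random()
            if u < p_win:
                points += 3
            elif u < p_win + p_draw:
                points += 1
        totals.append(points)
    return totals

totals = season_points(0.24, 0.25)  # a hypothetical struggling side
cuts = quantiles(totals, n=40)      # 39 cut points in steps of 2.5%
low, high = cuts[0], cuts[-1]
print("mode:", mode(totals), "95% limits:", low, "to", high)
```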

After ten games, we now have more information, both about Swansea and the other 19 Premier League teams, and about the most likely survival cut-off in the 2016/17 league.

At the time, Swansea were 19th with five points from ten games and while the grey portion of mid table is still achievable, it has shrunk and the Swans' low point has fallen deeper into the red.

After thirty games, with just eight left, the upper and lower limits for Swansea's total over the full 38 games have narrowed. They are still more likely than not to be relegated, according to the updated xG model, but there is still some chance that they will survive.

In reality, Swansea were in the bottom three with three games left, but a win for them and a defeat for Hull in game week 36 was instrumental in retaining their top-flight status, and it was as close as the final plot suggested it might be.

Adding indications of confidence in your model enhances any information you may wish to convey.

It's also essential when using xG simulations to "predict" the past, such as when drawing conclusions about a player's individual xG and his actual scoring record.

Adding high and low limits will highlight if any over or under performance against an average model based simulation is noteworthy or not.
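For a player, the same idea can be sketched by simulating his season's goals from the xG of each chance he took and checking whether his actual tally falls outside the simulated 95% limits. The chance values and the actual total are hypothetical, and chances are again treated as independent:

```python
import random
from statistics import quantiles

def goal_limits(chance_xg, n_sims=10_000, seed=11):
    """95% limits for total goals, treating each chance as an independent Bernoulli trial."""
    rng = random.Random(seed)
    sims = [sum(rng.random() < p for p in chance_xg) for _ in range(n_sims)]
    cuts = quantiles(sims, n=40)  # 39 cut points in steps of 2.5%
    return cuts[0], cuts[-1]

# Hypothetical season: 40 speculative efforts, 10 decent chances, 4 sitters.
shots = [0.08] * 40 + [0.35] * 10 + [0.76] * 4
low, high = goal_limits(shots)
actual = 13  # hypothetical actual goals

print(f"xG {sum(shots):.1f}, 95% limits {low:.0f} to {high:.0f}, actual {actual}")
print("noteworthy" if not low <= actual <= high else "within normal variation")
```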

One final point. The upper and lower limits can be chosen to illustrate different levels of confidence, typically 95%. But this does not mean that a side's final points total and thus finishing position has a 95% chance of lying within these two limits.

It is more your model that is on trial.

There is a 95% chance that any new prediction made for a team by your model will lie within these upper and lower limits.

Hopefully, your model will have done a decent job of evaluating a side, in this case Swansea from 2016/17. But if it hasn't, Swansea's actual finishing position may lie elsewhere.