Saturday, 30 December 2017

Jeff Stelling was Right about xG... For the Wrong Reasons.

Love it or loathe it, totally get it or pack it away with opinions such as "foreign managers who don't know the Premier League are rubbish", or simply use it as one component in your predictive market of choice, there's no denying that expected goals made a mark in 2017.

Expected goals is most effective in the long term and in the aggregate, but there's an understandable desire to also parade it for individual games and individual chances.

Jeff Stelling, who only appears to think probabilistically when lying fully clothed in bed with a million pounds and a teddy wearing a Hartlepool shirt, may merely have been expressing the well-documented caveats of using xG for a single game when he derided the xG thoughts of the Premier League's senior statesman, Arsene Wenger.

Betting on probabilistic outcomes, what are the odds of that!

Using xG rather than actual goals in a single game is simply a more nuanced look at the team process that went into the 90 minutes.

It approaches the difficult question of who "deserved" to win from a larger sample size than goals alone, albeit one often twisted by game effects, and provides an answer in terms of likelihood, rather than the more palatable, but unattainable, level of certainty that has long been expected from TV experts.

1-0 wins can be subject to large amounts of random variation, and there's probably even more if you have treated your fans to a 4-3 victory, whereas 7-0 leaves much less room for doubt as to who got their just rewards.

If you adopt a Pythagorean wins approach to the goals scored and allowed in these three single-game scenarios, you would give a larger proportion of "Pythagorean wins" to the team that won 7-0 than you would to the team that won 1-0, and by far the least to the side that triumphed 4-3.
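
As a rough sketch of that idea, a single-game Pythagorean calculation might look like the snippet below. The exponent and smoothing constant are my illustrative assumptions rather than definitive parameters: football Pythagorean exponents are commonly quoted around 1.3, and a small constant stops a clean sheet collapsing the formula to a degenerate 100%.

```python
# Minimal sketch of a single-game Pythagorean expectation.
# EXPONENT and SMOOTH are illustrative assumptions: football
# Pythagorean exponents are commonly quoted around 1.3, and a small
# smoothing constant stops a clean sheet collapsing to exactly 100%.
EXPONENT = 1.3
SMOOTH = 0.5

def pythagorean_wins(goals_for: int, goals_against: int) -> float:
    """Share of "Pythagorean wins" implied by a single scoreline."""
    gf = (goals_for + SMOOTH) ** EXPONENT
    ga = (goals_against + SMOOTH) ** EXPONENT
    return gf / (gf + ga)

for score in [(1, 0), (4, 3), (7, 0)]:
    print(score, round(pythagorean_wins(*score), 2))
# (1, 0) 0.81,  (4, 3) 0.58,  (7, 0) 0.97
# 7-0 earns the largest share, 1-0 less, and 4-3 by far the least.
```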

So there is information to be extracted from even basic scorelines that goes beyond wins, draws and losses.

Individual xG chances take this approach a step further, giving indications of whether a team that won 1-0 was fortunate to win or unlucky not to have scored a hatful in competition with the efforts of their defeated opponent.
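
As a sketch of how that works in practice, each chance can be treated as an independent coin flip weighted by its xG and the game replayed thousands of times. The chance values below are invented purely for illustration, and the independence assumption glosses over the game effects mentioned above.

```python
import random

# Hypothetical xG values for each chance: invented for illustration.
winners_xg = [0.35, 0.10, 0.08]          # the side that won 1-0
losers_xg = [0.45, 0.30, 0.12, 0.07]     # their defeated opponent

def goals_scored(chances):
    """Treat each chance as an independent Bernoulli trial."""
    return sum(random.random() < xg for xg in chances)

trials = 100_000
wins = draws = 0
for _ in range(trials):
    w, l = goals_scored(winners_xg), goals_scored(losers_xg)
    wins += w > l
    draws += w == l

print(f"'winners' win {wins / trials:.0%}, draw {draws / trials:.0%}, "
      f"lose {(trials - wins - draws) / trials:.0%}")
```

Run with these made-up chances, the side that actually won 1-0 comes out as the clearly fortunate party, which is precisely the kind of nuance a bare scoreline hides.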

The most visible flaw of xG can be in individual chances, because although the amount of information available to define an opportunity is large, it is still far from complete.

The broad sweep of xG probabilities, drawn from a large body of historical precedents, often trumps an eye-test opinion, particularly where probability is an unfamiliar concept to those using years of footballing knowledge, rather than mathematical models, to estimate whether or not a chance should have been converted.

There are also relatively easy-to-spot examples where a lack of collected data has, in a largely automated xG process, generated values that are at odds with reality.

[Image: Joe Allen's goal]

The above and below examples, from Stoke's recent game with WBA, illustrate the problems inherent in calculations made either without a visual check or without a more complete set of parameters.


[Image: Ramadan Sobhi's goal]

Looked at from the perspective of the WBA keeper, Ben Foster, the post-shot xG for Allen's goal is likely higher than that for Sobhi's strike, based on placement, power, location, and deflection or lack thereof.

But it is fairly obvious that the absence of Ben Foster himself for the latter shot has in reality elevated Sobhi's effort to a near 100% chance.

It is the equivalent of an unfieldable ball in baseball or an uncatchable pass in the NFL, simply because of the field position of the designated catcher or saver.

I don't have our xG2 values for each attempt (it's Christmas), but I suspect Foster will be expected to save Sobhi's effort more often than Allen's, in a model that is ignorant of his wayward positioning for the former attempt.

That would be harsh on Foster, acting out his role as auxiliary attacker, chasing an injury-time equaliser.

Keeper metrics are based on the savability of attempts on target, and once Sobhi got his attempt on target, the true chance of a goal being scored was around 99.9% (to allow for the possibility of the ball bursting prior to crossing the line).

Using Sobhi's goal to evaluate Foster's xG over- or under-performance would immediately put the keeper at an unfair disadvantage.
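
For concreteness, the basic bookkeeping behind such a shot-stopping rating might look like the sketch below. The post-shot values are invented; the final entry mimics the Sobhi case, where a modest modelled xG hides a near-certain goal because the model never knew the keeper was upfield.

```python
# Each on-target attempt faced: (modelled post-shot xG, goal conceded?).
# All values are invented for illustration.
attempts_faced = [
    (0.76, True),   # Allen-style: well struck and placed, hard to save
    (0.30, False),
    (0.15, False),
    (0.12, True),   # Sobhi-style: weak shot, but no keeper present
]

expected = sum(xg for xg, _ in attempts_faced)
conceded = sum(goal for _, goal in attempts_faced)

# Positive = saving more than the model expects. The misclassified
# final attempt alone drags the keeper nearly 0.9 goals into the red.
print(f"expected {expected:.2f}, conceded {conceded}, "
      f"rating {expected - conceded:+.2f}")
```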

If we assume that the chance of finding the net with a weakly hit shot along the ground, aimed around the centre of the frame, with no deflection (which effectively changes the shot location), and taken from wide of the post and level with the penalty spot, is, by historical precedent, relatively modest, then Foster will already be nearly a goal worse off when comparing his xG goals allowed with his actual goals allowed.

The reality was a shot that, through little fault of his own, Foster was entirely unable to save, whereas the majority of similar attempts upon which models are built would have featured a more advantageously positioned keeper.

Numerous unrecorded aspects of a goal attempt can greatly change individual xG estimates while still retaining a usefulness when aggregated.

From the striker's perspective, body shape when shooting or a bizarre trajectory in the flight of the ball, for example, can change actual conversion rates, transforming seemingly identical chances into near-unsavable certainties or comfortable claims for the keeper.

It's likely that many post shot xG probabilities that are grouped in similar bins actually have a much wider range of true probabilities. They may not be as wrongly classified as the Foster example, but the implied accuracy inherent in multiple decimal places is bound to be an illusion.
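
A toy demonstration of that point, with every number invented: give 10,000 shots in the same modelled bin a scattered set of true probabilities, and the aggregate stays serviceable even though many individual shots are badly described.

```python
import random

random.seed(7)

# Toy model: every shot in this bin carries the same modelled value,
# but each shot's true (unobserved) probability is scattered around it.
MODELLED_XG = 0.12
true_probs = [min(1.0, max(0.0, random.gauss(MODELLED_XG, 0.08)))
              for _ in range(10_000)]

goals = sum(random.random() < p for p in true_probs)

# In aggregate the bin is roughly calibrated...
print(f"modelled total {MODELLED_XG * len(true_probs):.0f}, "
      f"actual goals {goals}")
# ...even though individual true probabilities range widely.
print(f"true probabilities span {min(true_probs):.2f} "
      f"to {max(true_probs):.2f}")
```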

There are a couple of ways to attempt to address this conundrum.

Scrutinising each attempt is one labour intensive option, hoping that events largely even out in the aggregate is another (although randomness isn't always inherently fair).

A third option is to take indicators from the data we do have that may help to highlight occasions where a chance has been wrongly classified within a group of similarly computed xG values.

(This is unfortunately where I invoke a rare non disclosure clause).

So what happens to our xG2 keeper ratings if we try to account for factors that we haven't recorded and therefore are absent in our model?

Generally, under-performing keepers improve whilst remaining below par, and over-performers are similarly dragged partway towards a less extreme level.

De Gea and Bravo have been respectively among the best and worst shot stoppers of the last three seasons.

Using models that incorporate much of the post-shot information available, such as shot type, power, placement, rudimentary trajectory, deflections etc., de Gea has conceded 84 goals from non-penalty attempts against a model's average prediction of 95.

For Bravo the numbers are 25 allowed against 15 predicted.

If we concede that some of the attempts aggregated to make up the baseline for each keeper may have been misclassified, we can apply a correction, based on hints in the data we do have, that may reclassify those attempts more accurately.

De Gea's average expected number of goals allowed falls to 92 (still making him above average, but slightly less superhuman) and Bravo's is given a slightly more forgiving 19 expected goals, rather than 15.

Acknowledging that a model is incomplete has led to extremes being regressed towards the mean, and that's probably no bad thing if these models are to be used to evaluate and project player talent.
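
The correction itself stays behind the non-disclosure curtain, but the general shape of the adjustment, pulling the model's baseline partway towards what actually happened, can be sketched generically. The weight below is an assumed figure chosen to roughly reproduce the quoted numbers; it is emphatically not the model's actual method.

```python
def adjusted_expected(model_expected: float, actual: float,
                      weight: float = 0.3) -> float:
    """Pull the model's expected-goals baseline partway towards the
    actual total, shrinking extreme over/under performance.
    `weight` is an assumed figure, NOT the undisclosed correction."""
    return model_expected - weight * (model_expected - actual)

# De Gea: 95 expected vs 84 conceded; Bravo: 15 expected vs 25 conceded.
for name, expected, actual in [("De Gea", 95, 84), ("Bravo", 15, 25)]:
    print(name, round(adjusted_expected(expected, actual), 1))
# De Gea 91.7 (quoted figure: 92), Bravo 18.0 (quoted figure: 19)
```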



Expected goals is a work-in-progress tool, not the strawman full of cast-iron claims that opponents invariably construct on the metric's behalf. If you accept the inevitable and often insurmountable limitations, xG can still add much value to any analysis.

Don't be like Jeff; approach xG with an open mind... and also don't go to bed in a suit.
