Friday, 5 January 2018

Making xG More Accessible

When the outputs of probabilistically modeled, expected goals met mainstream media it was very unlikely to have resulted in a soft landing.

With a few exceptions, notably Sean Ingle , Michael Cox and John Burn-Murdoch, the reaction to the higher media profile of expected goals has ranged from the misguided to the downright hostile and dismissive.

Jeff Stelling's pub worthy rant on Sky was entirely in keeping with how high the Soccer Saturday bar is set, (Stelling can't really think that, though. Can he?).

While the Telegraph's " expected goals went through the roof" critique of Arsenal's back foot point at home to Chelsea, wildly overstated the likelihood of each attempt ending up in the net.

Despite the understandable irritation, much of the blame for the negative reception for xG must lie with our own enclosed community, which created the monster in the first place.

Parading not one, but sometimes two decimal places is often enough to lose an entire audience of arithmophobic football fans, who would otherwise be receptive to the information that xG can be used to portray.

Presenting Chelsea as 3.18 xG "winners" against a 1.33 xG Arsenal team in a game that actually finished 2-2 is an equally clunky and far from intuitive way of presenting a more nuanced evaluation of the balance of scoring opportunities created by each side.

Quoting the raw xG inputs may be fine in peer groupings, such as the OptaPro Forum but if wider acceptance is craved for the concept of process verses outcome, a less number based approach must be sought.

When Paul Merson says that "Arsenal deserve to be in front" he's simply giving a valued opinion based on decades of watching and participating in top class football.

And, ironically when xG quotes Team A as having accumulated more xG than Team B in the first half of a match, it is similarly drawing upon a large, historical data pool of similar opportunities to quantify the balance of play, devoid of any cognitive bias or team allegiance.

Just as a detailed breakdown of Merson's neuron activity required to arrive at his conclusion would be both unnecessary and of very limited interest, merely quoting xG to a wider audience focuses entirely on the "clever" modelling, whilst completely ignoring any wider conclusion that could easily be expressed in football friendly terms.

I've been simulating the accumulated chance of a game being drawn or either team leading based on the individual xG of all goal attempts made up to the latest attempt, as a way of converting mere accumulated xG into a more palatable summary of a game.

 Here's the simulated attempt based xG timeline for Arsenal verses Chelsea.

It plots how likely it is that say Chelsea lead after 45 minutes given the xG of each team.

In this game, it's around a 50% chance that the attempts taken in the first half would have led to Chelsea scoring more goals than Arsenal.

It's around a 40% chance that the game is level (not necessarily scoreless) and around 10% that Arsenal lead.

So rather than quoting xG numbers to a largely unwilling audience, the game can be neatly summarised, from an xG perspective in a manner that isn't far removed from the eye test and partly subjective opinion of a watching ex professional.

"Chelsea leading is marginally the most likely current outcome, with Arsenal leading the least likely, based on goal attempts".

The value of xG is to accumulate process driven information to hopefully make projections that are solidly based, rather than reliant upon possibly poorly processed and inevitably biased, raw opinion based evaluations.

But that shouldn't mean we can't/won't use our data to present equally digestible, but number based opinion as to who's more likely to be leading in a single match....and express it in varying degrees of certainty, but in plainer English and without recourse to any decimal points.

No comments:

Post a Comment