
Monday, 22 January 2018

After the Shot: xG2

Expected goals has been variously defined by advocates and opponents respectively as a more accurate summary of what "should" have happened on the pitch, or as an appendage to the final scoreline that is neither useful nor enlightening.

The first description is perhaps too overtly optimistic for a "work in progress" that is evolving into a useful tool for player projection and team prediction.

The second, less flattering description may also stand up to some scrutiny, particularly if supporters of the stat ignore the uncertainty intrinsic to its calculation, while detractors remain blithely ignorant of such limitations.

Both camps are genuinely attempting to quantify the true talent levels of players and teams in a format that allows for more insightful debate and, in the case of the nerds, one that is less prone to cognitive bias.

The strength of model-based opinion is that it can examine the processes that are necessary for success (or failure), drawing on a huge array of similar scenarios from past competitions.

And it can do so without straying too far down the route from chance creation to chance conversion (or not), so that the model avoids becoming too anchored in the specifics of the past, which would render any projections about the future flawed.

Overfitting past events is a model's version of eye-test bias, but that shouldn't mean we throw out everything that happens post chance creation for fear of producing an overconfident model that sticks immutably to past events and fails to flexibly project the future.

It's no great stretch to model the various stages from final pass to the ball crossing the goal line (or not).

Invariably, the process of chance creation alone has been prioritised as the better predictor of future output, and post-shot modelling has remained either a neglected sidetrack or merely the niche basis for xG2 keeper shot-stopping models.

But if used in a less dogmatic way, mindful of the dangers of overfitting, the "full set" of hurdles that a decisive pass must overcome to create a goal (or not) may become a useful component in an integrated approach that uses both numeric and visual clues to decipher the beautiful game.

Let's look at chances and goals created from set pieces and corners.


Here's the output from two expected goals models for chances and on-target attempts conceded from set plays by the current Premier League teams since early 2014.

The xG column is a pre-shot model, typically used to project a side's attacking or defensive process; it uses information accumulated up to the moment of contact with the ball, but is ignorant of what happened afterwards.

The xG2 column is based entirely upon shots or headers that required a save and uses a variety of post-shot information, such as placement, power, trajectory and deflections. Typically this model would be the basis for measuring a keeper's shot-stopping abilities.
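For readers who prefer to see the distinction in code, here's a minimal sketch of the two model types, assuming a pair of scikit-learn logistic regressions; the feature names and data below are invented placeholders, as the actual models and their feature sets are not public.

```python
# Minimal sketch of pre-shot xG vs post-shot xG2, with toy data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

# Pre-shot features: everything known up to the moment of contact,
# e.g. distance, angle, assist type (random placeholders here).
pre_shot = rng.normal(size=(n, 3))
# Post-shot features: only knowable after contact,
# e.g. placement, power, deflection (again placeholders).
post_shot = rng.normal(size=(n, 2))
goal = rng.integers(0, 2, size=n)  # toy goal/no-goal outcomes

# xG: fitted on all attempts, blind to what happened after contact.
xg_model = LogisticRegression().fit(pre_shot, goal)

# xG2: fitted on on-target attempts only, with post-shot information
# added, estimating the chance an average keeper concedes.
xg2_model = LogisticRegression().fit(np.hstack([pre_shot, post_shot]), goal)
```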

A superficial overview of the difference between the xG allowed from set pieces and actual goals allowed leads to the by-now-familiar "over- or under-performing" tag.

Stoke had been transformed into a spineless travesty of their former defensive core at set plays, conceding chunks of xG and also underperforming wantonly by allowing 42 actual goals against 37 expected.

There's little disconnect between the Potters' xG2, which examines those attempts that needed a save, and their actual goals allowed, but the cases of Spurs and Manchester United perhaps show that deeper descriptive digging may provide more insight, or at least add nuance.

Tottenham allowed a cumulative 29.6 xG while conceding just 23 actual goals.

We know from keeper models that Lloris is generally an excellent shot stopper, and the xG2 model confirms that, along with ever-present randomness, the keeper's reactions are likely to have played a significant role in defending set play chances.

In allowing 23 goals, Lloris faced on-target attempts that were worth just over 31 goals to an average keeper.

So 29.6 xG-worth of chances was conceded, but looked at in terms of xG2 this value rises to 31.3. Still mindful of randomness, Spurs' defenders might have been a little below par in suppressing the on-target attempts that came about from the chances they allowed, but Lloris performed outstandingly to reduce the actual goals conceded to just 23.

Superficially, Manchester United appears identical.

As a side they allowed 37.6 xG, but just 32 actual goals. We know that De Gea is an excellent shot stopper, therefore in the absence of xG2 figures we might assume he performed a similar service for his defence as Lloris did for his.

However, United's xG2 is just 33.1 and the difference between this and the actual 32 goals allowed is positive, but relatively small compared to Lloris at Spurs.
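One simple way to read these figures is to split the overall over/under performance into a pre-save component (how the danger of the chances changed by the time they became on-target attempts) and a keeper component (goals prevented relative to an average stopper). A sketch using the numbers quoted above, with that split as one reading among several:

```python
# Decompose set play over/under performance using the figures in the text.
def decompose(team, xg, xg2, goals):
    pre_save = xg - xg2     # +ve: attempts diluted before reaching the keeper
    stopping = xg2 - goals  # +ve: keeper beat the average on shots faced
    print(f"{team}: overall {xg - goals:+.1f} "
          f"(pre-save {pre_save:+.1f}, keeper {stopping:+.1f})")

decompose("Tottenham", 29.6, 31.3, 23)          # keeper-driven overperformance
decompose("Manchester United", 37.6, 33.1, 32)  # largely pre-save overperformance
```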

By extending the range of modelling away from a simple over/under xG performance, we can begin to examine credible explanations for the outputs we've arrived at.

Are United's defenders exerting so much pressure, even when allowing attempts consistent with an xG of 37.6, that the power, placement etc. of those on-target efforts are diluted by the time they reach De Gea?

Are the attackers themselves underperforming despite decent xG locations? (Every xG model is a two-way interaction between attackers and defenders.)

Is it just randomness or is it a combination of all three?

Using the under- and over-performing shorthand is fine, but we do have the data to delve more into the why, and taking this xG and xG2 data-driven reasoning over to the video analysis side is the logical, integrated next step.

Monday, 15 January 2018

Arsenal Letting in Penalties Doesn't Defy the Odds.

Arsenal fans have been getting hot under the collar about penalties.

Penalty kicks have either been awarded against Arsenal when they shouldn't have been, not awarded to Arsenal when they should have been, or, when they have been conceded, they've gone in. A lot.

The latter has spawned the inevitable trivia titbit.


There's nothing wrong with such trivia as fuel for the banter engine between fans, but almost inevitably it quickly becomes evidence for an underlying problem that exclusively afflicts Arsenal.

Cue the Daily Mail: "Why is Arsenal's penalty saving record so poor?"

So let's add some context.

We're into familiar selective cutoff territory, where you pick a starting point in a sequence to make a trend appear much more extreme than it actually is.

As you'd probably guess, Arsenal saved a penalty just prior to the start of the run.

They also saved one Premier League penalty in each of the two preceding seasons, and two more per season if you go back a further two campaigns; obligingly, opponents' penalty takers also missed the target completely on a handful of other occasions.

If you shun the exclusivity of the Premier League, Arsenal keepers made penalty saves in FA Cup shootouts and induced two misses in Community Shield shootouts, the latter as recently as 2017.

Over the history of the Premier League, 14% of penalties have been saved by the keeper. The remainder have gone wide, hit the post, been scored, or an attempt has been made to pass the ball to a team mate (Arsenal, again).

Arsenal's overall Premier League penalty save rate is also 14%.

So you should ask if we're simply seeing a random streak that was likely to happen to someone, not necessarily Arsenal, over the course of Premier League history.

Arsenal have conceded nearly 100 Premier League penalties not because they have had dirty defenders, but because they have been ever-present, respected members of the top flight.

Of the current Premier League sides, 17 have faced enough penalties to have had the opportunity to concede a run of 23 consecutive penalty goals.

If we simulate all the penalties faced by each of these teams using a generic penalty success rate, we find that at least one side during the current history of the Premier League will have conceded a run of 23 penalty goals or more in just over half of the simulations.
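A Monte Carlo sketch of that claim is below. The per-team penalty counts are invented placeholders (only three of the 17 sides are shown) and 0.78 is an assumed generic scoring rate, so the printed figure won't reproduce the "just over half" result; the real simulation would use each side's actual Premier League record.

```python
import random

P_GOAL = 0.78  # assumed generic chance a Premier League penalty is scored

# Hypothetical penalties faced per qualifying side; the real run would
# use the actual totals for all 17 ever-qualifying teams.
PENS_FACED = {"Arsenal": 98, "Everton": 95, "Chelsea": 90}

def longest_run(n_pens):
    """Longest streak of consecutive penalty goals conceded in one simulated history."""
    best = run = 0
    for _ in range(n_pens):
        run = run + 1 if random.random() < P_GOAL else 0
        best = max(best, run)
    return best

trials = 10_000
hits = sum(
    any(longest_run(n) >= 23 for n in PENS_FACED.values())
    for _ in range(trials)
)
print(f"P(at least one side concedes 23+ in a row) ~ {hits / trials:.2f}")
```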

Letting in penalty after penalty, sometimes up to and beyond 23 in a row, is something that was going to happen to somebody in the top flight slightly more often than not, based on save rates.

Arsenal just happen to have had both the opportunity and the luck to have been the Premier League's slightly odds-on reality star winner.

Friday, 5 January 2018

Making xG More Accessible

When the outputs of probabilistically modelled expected goals met the mainstream media, a soft landing was always unlikely.

With a few exceptions, notably Sean Ingle, Michael Cox and John Burn-Murdoch, the reaction to the higher media profile of expected goals has ranged from the misguided to the downright hostile and dismissive.

Jeff Stelling's pub-worthy rant on Sky was entirely in keeping with how high the Soccer Saturday bar is set (Stelling can't really think that, though. Can he?).

While the Telegraph's "expected goals went through the roof" critique of Arsenal's back-foot point at home to Chelsea wildly overstated the likelihood of each attempt ending up in the net.

Despite the understandable irritation, much of the blame for the negative reception for xG must lie with our own enclosed community, which created the monster in the first place.

Parading not one, but sometimes two decimal places is often enough to lose an entire audience of arithmophobic football fans, who would otherwise be receptive to the information that xG can be used to portray.

Presenting Chelsea as 3.18 xG "winners" against a 1.33 xG Arsenal team in a game that actually finished 2-2 is an equally clunky and far from intuitive way of presenting a more nuanced evaluation of the balance of scoring opportunities created by each side.

Quoting the raw xG inputs may be fine in peer groupings, such as the OptaPro Forum, but if wider acceptance is craved for the concept of process versus outcome, a less number-based approach must be sought.

When Paul Merson says that "Arsenal deserve to be in front" he's simply giving a valued opinion based on decades of watching and participating in top class football.

And, ironically, when an xG model rates Team A as having accumulated more xG than Team B in the first half of a match, it is similarly drawing upon a large, historical data pool of similar opportunities to quantify the balance of play, devoid of any cognitive bias or team allegiance.

Just as a detailed breakdown of the neuron activity required for Merson to arrive at his conclusion would be both unnecessary and of very limited interest, merely quoting xG to a wider audience focuses entirely on the "clever" modelling, whilst completely ignoring any wider conclusion that could easily be expressed in football-friendly terms.

I've been simulating the accumulated chance of a game being drawn or of either team leading, based on the individual xG of all goal attempts made up to the latest attempt, as a way of converting mere accumulated xG into a more palatable summary of a game.
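In code, the idea is straightforward: treat each attempt as an independent Bernoulli trial with its xG as the scoring probability, then count how often each game state arises. A sketch with illustrative attempt lists (not the actual Arsenal v Chelsea data) follows.

```python
import random

def game_state_probs(team_a_xg, team_b_xg, trials=100_000):
    """Return (P team A leads, P level, P team B leads) from per-attempt xG lists."""
    a_leads = level = 0
    for _ in range(trials):
        # Each attempt scores with probability equal to its xG.
        a = sum(random.random() < p for p in team_a_xg)
        b = sum(random.random() < p for p in team_b_xg)
        if a > b:
            a_leads += 1
        elif a == b:
            level += 1
    return a_leads / trials, level / trials, 1 - (a_leads + level) / trials

# Hypothetical first-half attempts, one xG value per attempt.
chelsea = [0.31, 0.08, 0.44, 0.10, 0.06]
arsenal = [0.05, 0.12, 0.07]
print(game_state_probs(chelsea, arsenal))
```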


Here's the simulated attempt-based xG timeline for Arsenal versus Chelsea.

It plots how likely it is that, say, Chelsea lead after 45 minutes, given the xG of each team's attempts.

In this game, there's around a 50% chance that the attempts taken in the first half would have led to Chelsea scoring more goals than Arsenal.

There's around a 40% chance that the game is level (not necessarily scoreless) and around a 10% chance that Arsenal lead.

So rather than quoting xG numbers to a largely unwilling audience, the game can be neatly summarised, from an xG perspective, in a manner that isn't far removed from the eye test and the partly subjective opinion of a watching ex-professional.

"Chelsea leading is marginally the most likely current outcome, with Arsenal leading the least likely, based on goal attempts".

The value of xG is to accumulate process-driven information to hopefully make projections that are solidly based, rather than reliant upon possibly poorly processed and inevitably biased, raw opinion-based evaluations.

But that shouldn't mean we can't or won't use our data to present equally digestible, but number-based, opinion as to who's more likely to be leading in a single match... and express it with varying degrees of certainty, in plainer English and without recourse to any decimal points.