Sunday 14 November 2021

Football Analytics' Big Own Goal

Just a small vent about what a poor job the early analytics community did, and continues to do, when naming metrics. I know it's been widely pointed out, but "expected goals" is an awful name for the premier metric, and nothing screams elitism and jargon more than almost always resorting to acronyms. xG, PSxG, xA and NSxG may be easily understood by anyone who has immersed themselves in the topic, but having tried to get these ideas across to a wider, more general football-obsessed audience, I've found they are an immediate barrier.

It's almost certainly too late to begin using titles that use everyday language *and* are self-explanatory: "chance quality", for example, rather than expected goals, or "on target chance quality" rather than post-shot expected goals (which isn't even accurate!).

If you need a glossary to write an article about a team or player, you've failed. If you need to write "expected goals, which is...", the same applies. Numerical values and decimal places are usually enough to disengage otherwise passionate fans of a sport. Chuck in jargon and you're almost inviting a negative reaction, regardless of the points you are trying to highlight.

Three rules of naming metrics. 1) Don't use acronyms. 2) Use familiar language, ideally associated with the sport. 3) DUA.

Friday 28 May 2021

What is Goal Expectation?

Let's say you want to make an informed estimation about the upcoming England vs Scotland game at Wembley Stadium in Euro 2020 (2021).

One route would involve estimating the average number of goals England are likely to score against Scotland at Wembley and the average number of goals Scotland would score against England at the same venue.

You could then take a mathematical route to calculate the probability that two sides with these average goal expectation estimates would produce a home win, an away win or a draw.

Typically, this is done with a Poisson approach.
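As a rough sketch of that Poisson route, assuming each side's goals follow independent Poisson distributions (a common simplification that ignores any correlation between the two scores), the snippet below turns two goal expectations into home win, draw and away win probabilities. The function names are illustrative, and the inputs reuse the approximate England and Scotland goal expectations of 2.12 and 0.48 quoted in this post. Scorelines above 10 goals per side are ignored, which discards only a negligible sliver of probability.

```python
import math


def poisson_pmf(goals: int, expectation: float) -> float:
    """Probability of a side scoring exactly `goals` given its goal expectation."""
    return math.exp(-expectation) * expectation ** goals / math.factorial(goals)


def match_outcome_probs(home_exp: float, away_exp: float, max_goals: int = 10):
    """Return (home win, draw, away win) probabilities.

    Assumes the two sides' goal counts are independent Poisson variables and
    truncates each at `max_goals`, so the three values sum to just under 1.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_exp) * poisson_pmf(a, away_exp)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


# Goal expectations of roughly 2.12 (England) and 0.48 (Scotland) at Wembley.
england, draw, scotland = match_outcome_probs(2.12, 0.48)
print(f"England win {england:.1%}, draw {draw:.1%}, Scotland win {scotland:.1%}")
```

Summing over a grid of scorelines keeps the logic transparent; a library routine for the Poisson distribution would do the same job.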

For over 30 years, the average number of goals a side is expected to score or allow in a future game has been referred to as its goal expectation.

Unfortunately, a more recent and widely discussed metric based on the chance quality of a scoring opportunity has arrived on the scene and taken the very similar name of expected goals.

They are not the same.

The former, GOAL EXPECTATION, is a measure of a side's likelihood of success before kick-off, based on historical data that is used to quantify the difference in quality between the sides. (It may even use historical expected goals data.)

The latter, EXPECTED GOALS, is a value ascribed to the quality of attempts on goal after the fact, based on the characteristics of each attempt, such as shot type and location.
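To make the contrast concrete: once a match has been played, a side's expected goals figure is typically just the sum of the values a chance-quality model assigned to each of its attempts. The shot values below are invented purely for illustration, not taken from any real model or game.

```python
# Invented per-attempt values from a hypothetical chance-quality (xG) model,
# assigned after the match from each attempt's characteristics.
shot_values = [0.76, 0.12, 0.05, 0.31, 0.03]

# The team's expected goals for the match is the sum over its attempts.
match_xg = sum(shot_values)
print(f"Attempts: {len(shot_values)}, expected goals: {match_xg:.2f}")
```

Goal expectation, by contrast, is a forecast made before any of these attempts exist.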

The goal expectations of England and Scotland in the upcoming game are around 2.12 goals and 0.48 goals, respectively.

The expected goals figure for the game hasn't yet materialised.

Friday 12 March 2021

XG as Easy as 1,2,3

One of the more interesting variants in the expected goals evolutionary backwater broke the scoring process down into stages. Most models go directly from shot location to a goal/no goal output, but it is possible to model each of the intermediate outcomes as well.

A goal needs to jump through a variety of hoops to register (VAR excluded).

Shots can be blocked, they can miss the target, they can hit the woodwork or they can be saved before they ever enter the record books, and each of these possibilities can be modelled separately.
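A minimal sketch of how separately modelled stages could combine, assuming three hypothetical fitted models that each output a probability for one hoop. Hitting the woodwork is lumped in with missing the target here, which is a simplifying assumption. Because a goal must clear every stage, the overall chance is the product of the conditional stage probabilities.

```python
from dataclasses import dataclass


@dataclass
class StagedShot:
    """Hypothetical outputs of three separately fitted stage models for one attempt."""
    p_unblocked: float             # P(not blocked)
    p_on_target: float             # P(on target | not blocked); woodwork counts as a miss
    p_goal_given_on_target: float  # P(beats the keeper | on target)

    def p_goal(self) -> float:
        # A goal has to survive every stage, so the conditional probabilities multiply.
        return self.p_unblocked * self.p_on_target * self.p_goal_given_on_target


shot = StagedShot(p_unblocked=0.8, p_on_target=0.5, p_goal_given_on_target=0.3)
print(f"Multi-stage chance of a goal: {shot.p_goal():.3f}")
```

The single number at the end can look just like a one-step xG value; the extra information lives in the three intermediate probabilities.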

This route isn’t inherently better than a single-stage model, but it does help to throw a more descriptive, if not necessarily more predictive, light onto why and how a player is excelling or failing at converting location-based chance quality into outcome-based success.

It has been useful in trying to unpick the Brighton conundrum.

Amid a plethora of underperformance, shots taken by Brighton players have been blocked more often than an “expected blocks” model predicts. This is compounded by the distance between blocker and Brighton shooter being the shortest in the league; they are being closed down more extensively than any other team.
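Comparing actual blocks with an “expected blocks” figure works the same way as comparing goals with xG: sum each shot’s modelled block probability and set the total against the blocks that actually happened. The probabilities and outcomes below are hypothetical, and how a real model might use features such as shooter-to-blocker distance is not shown.

```python
# Hypothetical block probabilities for a run of one team's shots, from an
# assumed "expected blocks" model, and whether each shot was actually blocked.
block_probs = [0.35, 0.20, 0.50, 0.10, 0.25, 0.40]
was_blocked = [True, False, True, False, True, True]

expected_blocks = sum(block_probs)
actual_blocks = sum(was_blocked)  # True counts as 1
excess = actual_blocks - expected_blocks

print(f"Blocked {actual_blocks} vs {expected_blocks:.2f} expected ({excess:+.2f})")
```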

This may suggest that a slow and labored build-up is degrading Brighton’s xG chances beyond what a one-stop, rather than multi-layered, xG model can pick up. Attacking tweaks, rather than patiently waiting for regression to kick in, may be needed.

The next stage in the progression from shot to potential goal involves getting the ball on target.

One of the first xG think pieces I wrote for the now-defunct OptaPro blog suggested that getting the ball on target wasn’t quite as straightforward a metric as it first appeared. In short, getting lots of shots on target wasn’t always the sign of an above-average striker.

Robin van Persie, then of Manchester United, was the guinea pig, and his rather less than impressive rate of working the keeper with on-target attempts didn’t seem to hurt his scoring performance.

The solution I suggested was that some players who aimed for more difficult-to-save areas of the goal (the top corner, for example) might miss more frequently than players who prioritized hitting the target at the expense of save difficulty.

In short, strikers shouldn’t be afraid to miss the goal.
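The trade-off can be illustrated with simple arithmetic on two hypothetical strikers shooting unblocked from identical locations. The rates are invented: one player prioritizes hitting the target, the other aims for harder-to-save areas and misses more often, yet the second still scores more per shot because his on-target attempts beat the keeper far more frequently.

```python
# Invented stage-two and stage-three rates for two hypothetical strikers.
profiles = {
    "target hitter": {"p_on_target": 0.50, "p_goal_given_on_target": 0.40},
    "corner seeker": {"p_on_target": 0.35, "p_goal_given_on_target": 0.65},
}

goals_per_shot = {}
for name, rates in profiles.items():
    # Goals per unblocked shot = P(on target) x P(goal | on target).
    goals_per_shot[name] = rates["p_on_target"] * rates["p_goal_given_on_target"]
    print(f"{name}: {rates['p_on_target']:.0%} on target, "
          f"{goals_per_shot[name]:.3f} goals per shot")
```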

So, we’ve run through two of the three xG stages.

Don’t get your shot blocked (that seems a universal aim; there seems limited benefit in taking the ball so close to a blocking defender that the chances of the shot being blocked increase greatly).

Hit the target. A more ambiguous ambition. Most strikers could hit the target most of the time, but might compromise how difficult their goal-bound attempts are to save.

The final stage is more akin to the traditional, one-step model, but only attempts that successfully negotiate the first two stages are modelled against out-of-sample goal/no goal outcomes.
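As a rough illustration of the final stage’s training data, the sketch below fits P(goal | on target) only on attempts that cleared the first two stages. A plain conversion rate per distance band stands in for a proper model here, the shot records are made up, and the out-of-sample testing mentioned above is not shown.

```python
from collections import defaultdict

# Made-up historical attempts: (distance band, blocked, on_target, goal).
history = [
    ("inside box", False, True, True),
    ("inside box", False, True, False),
    ("inside box", True, False, False),
    ("inside box", False, False, False),
    ("inside box", False, True, True),
    ("outside box", False, True, False),
    ("outside box", False, True, True),
    ("outside box", False, False, False),
    ("outside box", True, False, False),
    ("outside box", False, True, False),
]


def fit_final_stage(shots):
    """P(goal | on target) per band, using only shots that were neither
    blocked nor off target, i.e. those that cleared stages one and two."""
    counts = defaultdict(lambda: [0, 0])  # band -> [goals, on-target attempts]
    for band, blocked, on_target, goal in shots:
        if blocked or not on_target:
            continue  # filtered out by an earlier stage
        counts[band][0] += goal
        counts[band][1] += 1
    return {band: goals / attempts for band, (goals, attempts) in counts.items()}


rates = fit_final_stage(history)
print(rates)
```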

We’ve now got a multi-step xG model (a 2014 idea that didn’t catch on) that adds tons of missing context. That context can help explain how and why a player is returning the outcomes he does from a location-based process, even if it still falls to good old random variation to explain away much of future performance.

Some factors affecting xG output may be systematic to teams or players (although randomness is still the major player?), and by breaking the process down stage by stage, you can perhaps shine a light onto these additional factors.
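One way to shine that light is to compare actual and expected counts at each stage for a player’s shots. The sketch assumes hypothetical per-shot stage probabilities alongside recorded outcomes, and the shots themselves are invented. Each stage is only assessed on the shots that reached it, so a surplus of blocks, a surplus or shortfall of misses and a goal over-performance can be told apart.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    # Hypothetical stage-model probabilities for the attempt...
    p_unblocked: float
    p_on_target: float
    p_goal_given_on_target: float
    # ...and what actually happened.
    blocked: bool
    on_target: bool
    goal: bool


def stage_breakdown(shots):
    """Actual minus expected count at each stage (positive means more than expected)."""
    blocks = sum(s.blocked for s in shots) - sum(1 - s.p_unblocked for s in shots)

    unblocked = [s for s in shots if not s.blocked]
    misses = (sum(not s.on_target for s in unblocked)
              - sum(1 - s.p_on_target for s in unblocked))

    on_target = [s for s in unblocked if s.on_target]
    goals = (sum(s.goal for s in on_target)
             - sum(s.p_goal_given_on_target for s in on_target))
    return {"blocks": blocks, "misses": misses, "goals": goals}


player_shots = [
    Shot(0.8, 0.5, 0.3, blocked=True, on_target=False, goal=False),
    Shot(0.9, 0.6, 0.4, blocked=False, on_target=True, goal=True),
    Shot(0.7, 0.4, 0.2, blocked=False, on_target=False, goal=False),
    Shot(0.8, 0.5, 0.5, blocked=False, on_target=True, goal=False),
]
breakdown = stage_breakdown(player_shots)
for stage, diff in breakdown.items():
    print(f"{stage}: {diff:+.2f}")
```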

Finally, here’s how over- and under-performers with at least 10 regular-play goals from shots only have maneuvered their way through the three stages of xG since 2016/17.

The table above includes diverse shooting profiles, which may be useful as a descriptor, or potentially as a coaching aid if the multi-stage xG model can pick up systematic flaws or talents that persist.

Jimenez avoids blocks at a league-average rate, but then misses the target wantonly, and his overall scoring from regular play with his boot falls way below the average expectation.

Grealish has more shots blocked than expected and misses the target more frequently, but runs a large over-performance for goals scored. Placement is the likely explanation here.

Wood, by contrast, avoids blocks and hits the target, but tamely refuses to accumulate above-average goal tallies.

It’s time to take data to the video booth.