
Friday, 12 March 2021

XG as Easy as 1,2,3

One of the more interesting variants in the expected goals evolutionary backwater broke the scoring process down into stages. Most models go directly from shot location to goal/no goal output, but it is possible to include each of the possible outcomes.

A goal needs to jump through a variety of hoops to register (VAR excluded).

Shots can be blocked, they can miss the target, they can hit the woodwork or they can be saved before they enter the record books, and each of these possibilities can be modelled separately.

This route isn’t inherently better than a single stage model, but it does help to throw a more descriptive, if not necessarily predictive light onto why and how a player is excelling or failing to convert location based chance quality into outcome based success.

It has been useful in trying to unpick the Brighton conundrum.

Brighton’s underperformance includes more blocked shots than an “expected blocks” model predicts. This is compounded by the distance between blocker and Brighton shooter being the lowest in the league: they are getting closed down more tightly than any other team.

This may suggest that a slow and labored build up is degrading Brighton’s chances beyond what a one stop, rather than multi-layered, xG model can pick up. Attacking tweaks, rather than patiently waiting for regression to kick in, may be needed.

The next stage in the progression from shot to potential goal involves getting the ball on target.

One of the first xG think pieces I wrote for the now defunct OptaPro blog suggested that getting the ball on target wasn’t quite as straightforward a metric as it first appeared. In short, getting lots of shots on target wasn’t always the sign of an above average striker.

Robin van Persie, then of Manchester United, was the guinea pig, and his rather less than impressive rate of working the keeper with on target attempts didn’t seem to hurt his scoring performance.

The solution I suggested was that some players who aimed for more difficult to save areas of the goal, top corner, for example, might miss more frequently than players who prioritized target hitting at the expense of save difficulty.

In short, strikers shouldn’t be afraid to miss the goal.

So, we’ve run through two of the three xG stages.

Don’t get your shot blocked (that seems a universal aim, there seems a limited benefit in taking the ball so close to a blocking defender that the chances of having the shot blocked increase greatly).

Hit the target. A more ambiguous ambition. Most strikers could hit the target most of the time, but doing so might compromise how difficult their goal bound attempt is to save.

The final stage is more akin to the traditional, one step model, but instead attempts that successfully negotiate the initial two stages are modelled against out of sample goal/no goal outcomes.

We’ve now got a multi-step xG model (that didn’t catch on from 2014), that adds tons of missing context that can be used to explain the “how” of why a player is returning the outcome from a location based process, even if it still falls to good old random variation to explain away much of the future performance levels.
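As a rough illustration of how the three stages chain together, here's a minimal sketch. It assumes each stage has already been modelled separately and simply multiplies the conditional probabilities. The class, the function names and the numbers are all made up for illustration; this is not the model behind the figures in this post.

```python
from dataclasses import dataclass


@dataclass
class ShotStages:
    """Stage probabilities for one shot (illustrative values, not real model output)."""
    p_unblocked: float  # P(shot is not blocked)
    p_on_target: float  # P(shot is on target | not blocked)
    p_goal: float       # P(goal | on target)


def multi_stage_xg(shot: ShotStages) -> float:
    """Chain the three conditional probabilities into a single xG value."""
    return shot.p_unblocked * shot.p_on_target * shot.p_goal


def stage_report(shots, unblocked, on_target, goals):
    """Pair the actual count at each stage with the expected count."""
    exp_unblocked = sum(s.p_unblocked for s in shots)
    exp_on_target = sum(s.p_unblocked * s.p_on_target for s in shots)
    exp_goals = sum(multi_stage_xg(s) for s in shots)
    return {
        "unblocked": (unblocked, exp_unblocked),
        "on_target": (on_target, exp_on_target),
        "goals": (goals, exp_goals),
    }


# Ten identical hypothetical shots, compared with made-up actual counts.
shots = [ShotStages(0.8, 0.5, 0.3) for _ in range(10)]
report = stage_report(shots, unblocked=6, on_target=4, goals=2)
for stage, (actual, expected) in report.items():
    print(f"{stage}: actual {actual}, expected {expected:.1f}")
```

Reading actual against expected stage by stage is what lets you say where a player gains or loses ground, rather than only that he ended up above or below his overall xG.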

Some factors affecting xG output may be systematic to teams or players (randomness is still the major player?) and by breaking the process down stage by stage, you can perhaps shine a light onto these additional factors.

Finally, here’s how over and under performers, with at least 10 regular play goals from shots only, have maneuvered their way through the three stages of xG since 2016/17.




The table above includes diverse shooting profiles, which may be useful as a descriptor or potentially as a coaching aid if the multi-stage xG model can pick up systematic flaws or talents that persist.

Jimenez avoids blocks at a league average, but then misses the target wantonly and his overall scoring from regular play with his boot falls way below the average expectation.

Grealish has more shots blocked than expected, misses the target more frequently, but runs a large over performance for goals scored. Placement is the likely culprit, here.

Whereas, Wood avoids blocks, hits the target, but tamely refuses to accumulate above average goal tallies.

It’s time to take data to the video booth.


Thursday, 24 December 2020

Stoke and the Art of Crossing

Stoke Highlight the Art of Crossing.

Two Stoke City games, two headers, two goals and a duo of 1-0 wins not only demonstrate the fine lines that can separate six points from two in a low scoring sport, such as football, but also the important role still played by crosses in the modern game.

Lavishly assembled squads may partly spurn crossing as a primary route to goal in favour of more intricate, possession based passing sequences to create space before the final delivery, but even the likes of Arsenal when faced with the need for a goal do fall back on the traditional cross.

33 crosses yielded a single goal in a recent 2-1 home defeat for Arteta’s side against Wolves and infamously, Manchester United attempted over 80 crosses in a drawn game with Fulham in the last days of David Moyes’ reign.

Crossing, as a primary strategy reached a low point with Liverpool’s 2011/12 team consisting of a big target man, Andy Carroll and a host of players ready to deliver a cross, led by Stewart Downing.

Unfortunately, such a predictable game plan, and a tendency to cross the ball early from less advanced field positions, resulted in a failed experiment. An average of 21 Liverpool crosses per game was rewarded with just four Premier League goals.

Present day Liverpool lead the analytics revolution, but their failed, decade old legacy helped to kick start that revolution, as data was used to explain why their cross heavy approach failed and where the lesson lay for teams to maximize the returns from a wide player’s staple delivery.

Crosses in general are inefficient.

Leagues vary, but as a baseline number, it takes upwards of 90 crosses to score a goal directly from the delivery. Secondary chances created after the initial header or shot, but during the same phase of play, improves the strike rate to around one goal every 50 crossed balls.
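To put those baseline rates next to the Liverpool numbers, here's a quick back of the envelope sketch. It assumes a 38 game Premier League season, which the post doesn't state, and it treats "upwards of 90" as exactly 90, so the direct figure is an upper bound. The function name is made up for illustration.

```python
def expected_goals_from_crosses(crosses: float, crosses_per_goal: float) -> float:
    """Goals you'd expect from a number of crosses at a given strike rate."""
    return crosses / crosses_per_goal


DIRECT_RATE = 90      # crosses per goal scored directly from the delivery
SAME_PHASE_RATE = 50  # crosses per goal including secondary chances

# Assumption: 21 crosses per game over a 38 game league season.
liverpool_crosses = 21 * 38  # 798 crosses
liverpool_goals = 4

direct = expected_goals_from_crosses(liverpool_crosses, DIRECT_RATE)
same_phase = expected_goals_from_crosses(liverpool_crosses, SAME_PHASE_RATE)

print(f"Baseline, direct only: {direct:.1f} goals")       # about 8.9
print(f"Baseline, same phase:  {same_phase:.1f} goals")   # about 16.0
print(f"Actual: one goal every {liverpool_crosses / liverpool_goals:.0f} crosses")
```

Whichever baseline you pick, four goals from roughly 800 crosses sits well below it.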

However, not all crosses are equal. The danger is more apparent if a side works a delivery from the byline compared to a last-minute desperation hoof from deep into the mixer.

Fortunately, data can differentiate between types of crosses. Whether the ball was chipped or driven on the ground, for example. But where crosses originate and where they are aimed provides the biggest insight into how to turn a cross into a winning formula.

You can divide the origin and intended destination of a cross into two broad categories depending on how effective they are at producing goals.

In the graphic below, prime areas are shown in red and the least effective in blue.



Blue wasteful target areas are intuitive.

If the ball is aimed too close to the goal line, it becomes prey to a dominant keeper. But place the cross too close to the edge of the box and any shot or header will be taken from distance, and for every yard a striker moves away from the goal, the likelihood of a goal falls by ten percent.

The red sweet spot is between these two areas.

The touchline hugging, wasteful blue delivery areas give both the keeper and defenders time to defend the box, whereas moving infield to deliver the cross reduces defensive reaction time and greatly improves conversion rates.

Hitting a ball from a wide and deep wing position to the wasteful area of the six-yard box, going from one blue zone to another, only produces a goal every 500 attempts. Whereas a delivery from a red, prime infield area to a red, prime area of the box increases conversion rates to around one goal every 20 crosses.
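The gap between those two routes is easier to feel with a worked example. The sketch below uses only the two conversion rates quoted above (blue to blue and red to red); the rates for mixed routes aren't given here, so they're left out. The split of crosses between the routes is invented purely for illustration.

```python
# Crosses needed per goal for the two routes quoted in the text.
CROSSES_PER_GOAL = {
    ("blue", "blue"): 500,  # wide and deep delivery into a wasteful target area
    ("red", "red"): 20,     # prime infield delivery into the sweet spot
}


def expected_goals(cross_counts: dict) -> float:
    """Sum expected goals over (origin zone, target zone) routes."""
    return sum(n / CROSSES_PER_GOAL[route] for route, n in cross_counts.items())


# Hypothetical game: 44 crosses, nearly all of them hopeful ones.
hopeful = {("blue", "blue"): 40, ("red", "red"): 4}
# Same volume, but every cross goes prime to prime.
selective = {("red", "red"): 44}

print(f"Hopeful mix:   {expected_goals(hopeful):.2f} expected goals")
print(f"Selective mix: {expected_goals(selective):.2f} expected goals")
```

On those quoted rates a single prime to prime cross is worth 25 of the blue to blue variety, so where you cross from and to matters far more than how often you cross.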

Stoke City’s two winning goals against Wycombe and Middlesbrough have been added to the graphic and hit the sweet spot for both Fox & McClean’s delivery and Collins & Powell’s headed goals. They were assists that were drawn from the most productive area of the crossing playbook.

Of course, there’s much more than “crossing by the numbers” to a successful outcome.

Powell is an accomplished header of the ball. During his Championship career over 20% of his goal attempts have been from headers and he is adept at getting on the end of higher quality attempts than the league average. Whilst Collins’ physical attributes are obvious.

Campbell then crossed from one prime area to another for Cardiff to obligingly smack the ball into their own net, before he departed on a season long, injury induced hiatus. Fox hit the prime red zone with a pacy cross to defeat Blackburn, and Brown repeated the prime to prime connection to set up Thompson to briefly draw level with Spurs in the Carabao Cup quarter-final.

Clever off the ball running also contributes, as seen by Vokes drawing away Wycombe defenders with his near post run and Stoke creating an overload of far post attackers for the goal against Middlesbrough.

Over recent games, Stoke City had the crossing basics in place and good things followed.

On the weekend when Stoke climbed into the playoff spots on the back of two smartly executed crosses, Arsenal in the North London derby were again trusting more to luck by throwing in another 44 crosses in the vain pursuit of a goal.

Monday, 20 April 2020

Scatter Plots

There's been a huge increase in football related scatter plots recently. So as the guy who produced the first such plots, I thought I'd quickly run through why I thought this simple plot was useful and then try to expand the idea to provide additional usefulness.

The initial plots were designed to both inform and characterise playing style.

I still think the most successful plots use related metrics, for example expected assists and expected goals per 90 for individual players.

These "makers and takers" plots easily split players into those whose predominant talent is to create chances, those who get onto the end of opportunities and those rare players who excel at both disciplines.

Here's one for Arsenal 2019/20.

It's got sample size issues, but it's fairly evident that the creative players are towards the top left and the goal poachers are to be found in the bottom right.

Another quite neat aspect of this type of plot is that you can run a line through a player to the origin and any one with a similar ratio of xG and xA will lie close to that line.
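The line through the origin idea can be sketched in a few lines. Players on the same line share the same angle from the x axis, so comparing angles finds similar makers and takers profiles regardless of volume. The sketch assumes xG per 90 is on the x axis and xA per 90 on the y axis; the player names, numbers and the five degree tolerance are all invented for illustration.

```python
import math


def profile_angle(xg_per90: float, xa_per90: float) -> float:
    """Angle in degrees of the line from the origin through a player's point."""
    return math.degrees(math.atan2(xa_per90, xg_per90))


def similar_profiles(players: dict, target: str, tolerance_deg: float = 5.0) -> list:
    """Names of players whose xG/xA mix lies close to the target's line."""
    target_angle = profile_angle(*players[target])
    return sorted(
        name
        for name, (xg, xa) in players.items()
        if name != target
        and abs(profile_angle(xg, xa) - target_angle) <= tolerance_deg
    )


# Hypothetical (xG per 90, xA per 90) pairs.
players = {
    "Player A": (0.40, 0.10),  # established poacher
    "Player B": (0.20, 0.05),  # same mix at half the volume
    "Player C": (0.10, 0.30),  # creator
    "Player D": (0.30, 0.09),  # slightly more creative poacher
}

print(similar_profiles(players, "Player A"))
```

Using the angle rather than the raw xG to xA ratio avoids dividing by zero for players with no xA at all.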

In league wide samples, therefore, you can find emerging players with similar qualities to the established stars.

There's a lot of data swilling around today. These plots are simple to make, three minutes tops, and with some thought about what you're trying to illustrate, they inform pretty well.

Over the weekend I came back to the idea, to see if I could add information that tells you a little bit more than just the raw connection between two metrics.

Here's what I came up with. It's again just a simple scatter plot, but I've used bubble size to introduce a third variable (metric volume per 90).

In addition I've used a single performance metric (NS xG added from ball carries) along the x axis and instead of plotting a complementary metric on the vertical axis, I've used a number to denote how diverse the x axis metrics are for each player.



This just plots the top 20 NS xG added by players through their ability to successfully carry the ball forward and move their team into a more dangerous pitch position.

It's a good one to choose because you know that Adama Traore will top the list (and he does).

Rather than a sterile scatter, you've now got a chart that not only tells you about a performance metric, it also instantly adds another layer (success volume) from which you can draw additional information about the characteristics of a player.

In short, those towards the right of the plot add more NS xG per 90 than others.
Larger bubble size indicates more successful progressive carries per 90.
And higher up the chart indicates more disorder and unpredictability in what a player will positively achieve for his team when on the ball.
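The post doesn't say how the vertical axis number is calculated. Purely as an assumption, one way to put a number on "disorder and unpredictability" would be the Shannon entropy of a player's per carry values after grouping them into bins, sketched below. The function name, the bin width and the sample values are all hypothetical.

```python
import math
from collections import Counter


def diversity(values, bin_width: float = 0.01) -> float:
    """Shannon entropy (in bits) of values grouped into fixed-width bins.

    Assumption: this stands in for the chart's y axis number, which the
    text does not define. Zero means every carry lands in the same bin;
    higher means the outcomes are spread more evenly across many bins.
    """
    bins = Counter(math.floor(v / bin_width) for v in values)
    total = sum(bins.values())
    return -sum((n / total) * math.log2(n / total) for n in bins.values())


# Hypothetical xG added per successful carry for two players.
steady = [0.022, 0.024, 0.026, 0.028]   # always much the same gain
varied = [0.005, 0.015, 0.025, 0.035]   # a different gain every time

print(f"steady: {diversity(steady):.2f} bits")
print(f"varied: {diversity(varied):.2f} bits")
```

The bin width is a free choice and changes the numbers, so it only makes sense to compare players measured with the same bins.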

I've annotated players with the additional information you can draw from these plots.