Wednesday, 2 October 2019

Passing Risk Reward in the Premier League

The availability of richer data sources has naturally led to an interest in passing and ball progression.

The generally quoted passing metrics still gravitate towards event data such as goal attempts and actual scores as the major framework.

Passes that lead to a potential goal scoring attempt predominate in most current passing metrics and little has been done to differentiate between the contribution made by individual players involved in these possession chains.

In contrast, we've broken down the value of each pass attempted by referencing how often a possession at any given point on the pitch has historically led to a goal, whether or not the possession ultimately results in an attempt on goal.

This so-called non shot xG (NS xG) metric not only allows a route to value every ball progression, be it a pass or a carry, but also quantifies individual involvement, rather than sharing the credit equally between all those participating in the possession.

However, as often is the case in football metrics, only one side of the ball has been investigated.

Each pass attempt comes with a risk and reward.

The player attempting the pass has custody of a valuable team resource, namely the non shot xG value for possession of the ball at that precise position on the field.

The potential reward in making a progressive pass is to advance the ball to a more dangerous area of the field.

And the ever present risk is the cost of a turnover. The passing team lose the NS xG value they had by owning the ball and the opponents gain their own NS xG by taking possession of the ball.

Weighing a player's NS xG ledger is problematic, but one way to express the risk reward balance of a player's passing performance is to add up the NS xG value of every progressive pass he completes and compare this to the sum of the NS xG he loses through incomplete passes, along with the NS xG gained by the opponent taking possession of his errant attempts.

For example, in the nascent Premier League season, Matteo Guendouzi's completed open play progressive passes have been received at areas on the field that total 6.69 NS xG.

On the minus side, his picked off pass attempts have "lost" Arsenal 1.67 NS xG. This is made up of loss of pitch position for Arsenal and the combined NS xG value for the opponent based on where possession is won.

Overall, and without regard for pass volume or minutes played, Guendouzi has a net positive 5.02 NS xG for Arsenal in 2019/20.
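The arithmetic behind that ledger can be sketched in a few lines. This is only a minimal illustration, not the model itself: the function names are made up for the example, and the only real figures are the two Guendouzi totals quoted above, entered as single lump sums rather than pass by pass.

```python
def turnover_cost(own_ns_xg_lost, opponent_ns_xg_gained):
    """Cost of one picked off pass: the NS xG the passing side gives up
    plus the NS xG the opponent gains where they win the ball."""
    return own_ns_xg_lost + opponent_ns_xg_gained


def net_ns_xg(completed_values, turnover_costs):
    """Sum of NS xG at the reception point of completed progressive passes,
    minus the summed cost of turnovers."""
    return sum(completed_values) - sum(turnover_costs)


# Totals quoted in the post, used here as one entry each.
guendouzi_net = net_ns_xg([6.69], [1.67])
print(round(guendouzi_net, 2))  # 5.02
```

With pass level data you would feed in one value per completed pass and one `turnover_cost` per failed attempt instead of the season totals.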

This puts him top of the Arsenal "risk/reward" passing charts and we feel it is a much better single figure metric to describe a player's involvement in progressing his side towards the opponent's goal.

Not only does it quantify individual involvement and utilise every pass attempted, it also penalises reckless or sloppy execution that leads to a change of possession.

Here are the current pass risk/reward numbers for all 20 Premier League players with a minimum number of attempts.

Saturday, 14 September 2019

Game State and Blocked Shots.

I've written a fair bit about game state and how it impacts on how a side approaches a match as the time elapses and occasionally the score line changes.

I don't use score differential to define "game state", instead I use a measure of how well each team is faring based on their pre game expectation.

This can be defined as the expected points based on the current score and time elapsed or the expected success rate of a team, again when measured against a pre kick off baseline. The choice is entirely up to you.

The advantage of this approach is primarily when the game is tied (which it is for a fairly significant portion of most matches). Instead of counting offensive production for both sides at this score differential, there's usually a clear indication of which of the two teams is happier with the stalemate and which is not.

You also get a gradual movement of game state that incorporates the often omitted variable of time elapsed.
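To make the idea concrete, here's one possible way to put a number on it. This is a sketch under my own assumptions rather than the exact measure used here: it assumes goals arrive as independent Poisson processes at constant rates, ignores stoppage time, and takes hypothetical pre game goal expectations (`rate_for`, `rate_against`) as inputs. Game state is then current expected points minus expected points at kickoff.

```python
import math


def poisson_pmf(k, mean):
    """Probability of exactly k goals when the expected number is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def expected_points(goals_for, goals_against, rate_for, rate_against,
                    minute, max_goals=15):
    """Expected league points (3 for a win, 1 for a draw) given the current
    score and minutes elapsed. Rates are full match goal expectations."""
    remaining = max(0.0, (90 - minute) / 90)
    mu_for = rate_for * remaining
    mu_against = rate_against * remaining
    win = draw = 0.0
    # Sum over every plausible combination of goals still to come.
    for f in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(f, mu_for) * poisson_pmf(a, mu_against)
            diff = (goals_for + f) - (goals_against + a)
            if diff > 0:
                win += p
            elif diff == 0:
                draw += p
    return 3 * win + draw


def game_state(goals_for, goals_against, rate_for, rate_against, minute):
    """Current expected points minus the pre kickoff baseline.
    Negative means the side is doing worse than it expected to."""
    baseline = expected_points(0, 0, rate_for, rate_against, 0)
    current = expected_points(goals_for, goals_against,
                              rate_for, rate_against, minute)
    return current - baseline


# A favourite (hypothetical rates 1.6 v 1.0) still level after 70 minutes.
print(game_state(0, 0, 1.6, 1.0, 70))
```

Under these assumptions a favourite who is still level late on gets a negative game state, while the underdog in the same match gets a positive one, even though the score differential is zero for both.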

It's intuitive as to what might happen as game state ebbs and flows over the course of a match, as unhappy teams perhaps become more risk taking in order to change the current status quo, while pregame underdogs are forced or choose to attempt to bank their above expectation gains by becoming more defensive.

One slight problem with this approach is that it assumes a relatively balanced competitive edge between competing teams and further assumes that those needing to change the current scoreline are capable of attempting to do so.

Not to be harsh, but it's difficult to envisage a situation where Manchester City felt the need to protect a lead against say Newcastle or where Newcastle were technically able to up their attacking intent against the champions.

So often the presence of clearly superior teams can skew conclusions. "Possession leads to wins" arose largely because better sides also had high levels of possession, but the possession was a byproduct of other things they did, rather than the primary driver of their results.

Remove Barca etc from the data and the relationship between possession and wins tended to disappear.

Therefore, firstly here's why "zero goal differential" (the game is level) shouldn't be regarded as a single game state.

Here's a sample of matches from the 2018/19 Premier League, involving games where one of the Big 6 wasn't playing. Thus the games weren't particularly one-sided from the outset.

Initially, I've simply counted the shot volume from regular play for teams when the score differential is zero (the game is level). The vertical axis records my version of changing game state, where a larger negative value indicates that a team is doing badly compared to the expectation at kickoff.

Typically, this may be when a home favourite is level a fair way into the game and a points expectation that may have been 1.75 expected points at 3 o'clock has fallen back towards one point as the clock ticks on towards 5.

Those above the blue score differential line of zero are doing better than they hoped for. They might have expected to average less than a point from such a game, but they are edging closer and closer to a point, with a possibility of nicking all three.

Each point represents a goal attempt and it's clear that the lion's share are being taken by the disgruntled favourites.

If we re-examine our intuition, it's likely that if the beneficiaries of the stalemate aren't taking that many shots in the match, they're doing things to prevent the ones at the other end going in.

Learning from the likes of Pulis and Dyche, that will likely include blocking shots.

Next I built a simple xG model (just location & type), but also included the game state factor, not just at zero goal differential, but at all score differentials, to see if it told us anything about the likelihood a shot would be blocked or not.

I eliminated games where a red card had been shown, for obvious reasons.

The bottom line was that game state was a significant factor in correlating with whether an attempt was blocked or not, along with location and shot type. And the larger the decrease in a side's pre-match expectation when the attempt was taken, the more likely it became that the shot was blocked.

In short, without the superstar teams, run of the mill games appear to follow the "hold what we have" and "this is disappointing, let's crack on" mentality.

This is one route to improve the much criticised problem of single xG races, where one team scores early and then drops anchor, but whether it is a universal improvement to a predictive model is a question of overfitting the past and potentially screwing up the future.

Wednesday, 11 September 2019

Rugby World Cup Simulation

World Cups have been like London buses this year and the rugby union version kicks off in a week or so.

It's live and complete on terrestrial TV in the UK, with plenty of huge mismatches in the opening group games, before eight teams (who could be fairly accurately predicted beforehand) hold the really interesting knockout run to the Webb Ellis Trophy on November 2nd.

However, that's not to say that the group matches don't hold any intrigue. There are at least two tier one teams in each of the four groups and while they'll be expected to steamroller the lower grade group opponents, the outcomes of these elite matchups will have a huge bearing on how the pairings for the knockout phase pan out.

Therefore, if you want to chart the likelihood of a team's route to the final being paved with Southern hemisphere behemoths, a tournament simulation is the easiest method out there.

You'll need a ratings system to kick off with, assuming you're shunning the merry-go-round that has been the world rankings. Ireland are the current leaders, having recently displaced Wales, who had just displaced New Zealand, who themselves had displaced South Africa....ten years ago.

So the world rankings, following a decade of stagnation, have suddenly become volatile.

Let's make our own, instead.

I took the last 20 matches for all participants, and produced an attacking and defensive rating, based around match scores and opponent quality.

New Zealand are the tournament's most potent attack. They'll score around 14 more points against an average team than another average team would manage. Wales, courtesy of rugby league knowhow, have the best defence.

Next you need a way to simulate game outcomes.

The big clash of the group stages sees favourites New Zealand take on South Africa. After matching up the respective attacking and defensive ratings for each team, the model expects the All Blacks to average around 28.5 points and S Africa 23.5.

New Zealand are favoured by five points and there's likely to be 52 total points.
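As a rough sketch of how ratings like these can be matched up, here's a simple additive version. It's an assumption on my part that the ratings combine additively, the `baseline` figure of 22 points for an average side is a made up placeholder, and the function names are purely illustrative. Only the 14 point attacking edge and the 28.5 v 23.5 expectation come from the post.

```python
def expected_score(baseline, attack, opponent_defence):
    """Points a side is expected to score.

    baseline: points an average team scores against an average team
              (placeholder value in the example below).
    attack: points the side scores above an average attack.
    opponent_defence: points the opponent concedes below an average defence.
    """
    return baseline + attack - opponent_defence


def margin_and_total(score_a, score_b):
    """Handicap (A minus B) and total points line for a matchup."""
    return score_a - score_b, score_a + score_b


# New Zealand's +14 attack against an average defence, placeholder baseline.
print(expected_score(22.0, 14.0, 0.0))  # 36.0

# The expectations quoted for New Zealand v South Africa.
print(margin_and_total(28.5, 23.5))  # (5.0, 52.0)
```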

If we look at the spread of points scored and allowed by each side over the last year or so, we can produce a distribution of points that describes each team's likely scoring pattern in this game. We'll then draw a value randomly from this distribution for each team to simulate a single match scoreline and then repeat the process thousands of times.

After adding a few tweaks to mimic the largely redundant bonus points system rugby insists on employing and ensuring that each drawn score from the distributions is a "rugby score" (no scoring a grand total of four points etc), we just repeat for every group game, add up the total points won in the group, follow the draw format and find the winner.
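Here's a stripped down sketch of the single match step. It isn't the exact simulation described above: I've swapped in a normal distribution with an assumed standard deviation of 10 points in place of the distribution built from each side's recent scores, left out bonus points, and handled the "rugby score" rule by simply redrawing any total of 1, 2 or 4 (the only non negative totals you can't make from 3, 5 and 7 point scores). All names are illustrative.

```python
import random

# Totals that cannot be built from penalties/drop goals (3), tries (5)
# and converted tries (7).
IMPOSSIBLE_SCORES = {1, 2, 4}


def is_rugby_score(points):
    """True if `points` is a total a rugby union side can actually finish on."""
    return points >= 0 and points not in IMPOSSIBLE_SCORES


def draw_score(rng, mean, sd):
    """Draw one side's score, redrawing until it is a valid rugby total."""
    while True:
        points = round(rng.gauss(mean, sd))
        if is_rugby_score(points):
            return points


def simulate_match(mean_a, mean_b, sd=10.0, n_sims=20000, seed=1):
    """Simulate a match many times and return outcome frequencies."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    wins_a = wins_b = draws = 0
    for _ in range(n_sims):
        a = draw_score(rng, mean_a, sd)
        b = draw_score(rng, mean_b, sd)
        if a > b:
            wins_a += 1
        elif b > a:
            wins_b += 1
        else:
            draws += 1
    return {"a_win": wins_a / n_sims,
            "draw": draws / n_sims,
            "b_win": wins_b / n_sims}


# New Zealand (28.5 expected points) v South Africa (23.5).
print(simulate_match(28.5, 23.5))
```

For the full tournament you would run this for every group game inside each simulation, award match and bonus points, rank the groups, then follow the knockout draw through to the final.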

This is how the simulations shake out.

Four sides have a double figure percentage chance of lifting the trophy: New Zealand and S Africa for the south and England and Wales for the north, with the former looking a vulnerable favourite.