Saturday 14 September 2019

Game State and Blocked Shots.

I've written a fair bit about game state and how it affects the way a side approaches a match as the time elapses and, occasionally, the scoreline changes.

I don't use score differential to define "game state". Instead I use a measure of how well each team is faring based on their pre-game expectation.

This can be defined as the expected points based on the current score and time elapsed, or as the expected success rate of a team, in each case measured against a pre-kickoff baseline. The choice is entirely up to you.
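As a rough illustration of the expected points version, here is a minimal sketch. It assumes goals arrive as independent Poisson processes at a constant rate per 90 minutes, which is my simplification for the example rather than a description of the model used for the plots. The function names and the scoring rates are hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when lam goals are expected."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def expected_points(goal_diff, rate_for, rate_against, minutes_left, max_goals=10):
    """Expected league points (3 win, 1 draw) from the current position.

    rate_for and rate_against are assumed goals per 90 minutes, scaled
    down to the time remaining.
    """
    frac = minutes_left / 90.0
    lam_for = rate_for * frac
    lam_against = rate_against * frac
    xpts = 0.0
    for scored in range(max_goals + 1):
        for conceded in range(max_goals + 1):
            p = poisson_pmf(scored, lam_for) * poisson_pmf(conceded, lam_against)
            final_diff = goal_diff + scored - conceded
            if final_diff > 0:
                xpts += 3 * p
            elif final_diff == 0:
                xpts += p
    return xpts


def game_state(goal_diff, rate_for, rate_against, minutes_elapsed):
    """Current expected points minus the pre-kickoff expected points."""
    baseline = expected_points(0, rate_for, rate_against, 90)
    current = expected_points(goal_diff, rate_for, rate_against, 90 - minutes_elapsed)
    return current - baseline


# Hypothetical favourite (1.6 goals per game against 1.0) still level after 80 minutes.
print(round(game_state(0, 1.6, 1.0, 80), 2))
```

A negative value means the side is doing worse than it expected at kickoff, a positive value means it is ahead of expectation, and the number drifts with the clock even when the score doesn't change.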

The advantage of this approach is primarily when the game is tied (which it is for a fairly significant portion of most matches). Instead of counting offensive production for both sides at this score differential, there's usually a clear indication of which of the two teams is happier with the stalemate and which is not.

You also get a gradual movement of game state that incorporates the often omitted variable of time elapsed.

It's intuitive as to what might happen as game state ebbs and flows over the course of a match. Unhappy teams perhaps take more risks in order to change the status quo, while pregame underdogs are forced, or choose, to bank their above expectation gains by becoming more defensive.

One slight problem with this approach is that it assumes a relatively balanced competitive edge between competing teams and further assumes that those needing to change the current scoreline are capable of attempting to do so.

Not to be harsh, but it's difficult to envisage a situation where Manchester City felt the need to protect a lead against say Newcastle or where Newcastle were technically able to up their attacking intent against the champions.

So often the presence of clearly superior teams can skew conclusions. "Possession leads to wins" arose largely because better sides also had high levels of possession, but the possession was a byproduct of other things they did, rather than the primary driver of their results.

Remove Barca etc from the data and the relationship between possession and wins tended to disappear.

Therefore, firstly here's why "zero goal differential" (the game is level) shouldn't be regarded as a single game state.

Here's a sample of matches from the 2018/19 Premier League, involving games where one of the Big 6 wasn't playing. Thus the games weren't particularly one-sided from the outset.

Initially, I've simply counted the shot volume from regular play for teams when the score differential is zero (the game is level). The vertical axis records my version of changing game state. A larger negative value indicates a team that is doing badly compared to its expectation at kickoff.

Typically, this may be when a home favourite is level a fair way into the game, and an expectation that may have been 1.75 points at 3 o'clock has fallen back towards one point as the clock ticks on towards 5.

Those above the blue score differential line of zero are doing better than they hoped for. They might have expected to average less than a point from such a game, but they are edging closer and closer to a point, with a possibility of nicking all three.

Each point represents a goal attempt and it's clear that the lion's share are being taken by the disgruntled favs.

If we re-examine our intuition, it's likely that if the beneficiaries of the stalemate aren't taking that many shots in the match, they're doing things to prevent the ones at the other end going in.

Learning from the likes of Pulis and Dyche, that will likely include blocking shots.

Next I built a simple xG model (just location & type), but also included the game state factor, not just at zero goal differential but at all score differentials, to see if it told us anything about the likelihood that a shot would be blocked.

I eliminated games where a red card had been shown, for obvious reasons.

The bottom line was that game state was significantly associated with whether an attempt was blocked or not, along with location and shot type. And the larger the decrease in a side's pre-match expectation when the attempt was taken, the more likely it became that the shot was blocked.
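One way to write down a model of this kind is as a logistic regression. That functional form is an assumption for illustration, as are the symbols: $d$ for a location term such as distance to goal, $t$ for an indicator of shot type, and $g$ for the shooting side's game state, negative when it is doing worse than its kickoff expectation.

```latex
\log\frac{P(\text{blocked})}{1 - P(\text{blocked})}
  = \beta_0 + \beta_d\, d + \beta_t\, t + \beta_g\, g
```

In this notation, the finding above corresponds to a negative $\beta_g$: the further a side has fallen below its pre-match expectation, the higher the fitted chance that its shot is blocked.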

In short, without the superstar teams, run of the mill games appear to follow the "hold what we have" and "this is disappointing, let's crack on" mentality.

This is one route to improve the much criticised problem of single xG races, where one team scores early and then drops anchor, but whether it is a universal improvement to a predictive model is a question of overfitting the past and potentially screwing up the future.

Wednesday 11 September 2019

Rugby World Cup Simulation

World Cups have been like London buses this year and the rugby union version kicks off in a week or so.

It's live and complete on terrestrial TV in the UK, with plenty of huge mismatches in the opening group games, before eight teams (who could be fairly accurately predicted beforehand) hold the really interesting knockout run to the Webb Ellis Trophy on November 2nd.

However, that's not to say that the group matches don't hold any intrigue. There are at least two tier one teams in each of the four groups and while they'll be expected to steamroller the lower grade group opponents, the outcomes of these elite matchups will have a huge bearing on how the pairings for the knockout phase pan out.

Therefore, if you want to chart the likelihood of a team's route to the final being paved with Southern hemisphere behemoths, a tournament simulation is the easiest method out there.

You'll need a ratings system to kick off with, assuming you're shunning the merry-go-round that has been the world rankings. Ireland are the current leaders, having recently displaced Wales, who had just displaced New Zealand, who themselves had displaced South Africa... ten years ago.

So the world rankings, following a decade of stagnation, have suddenly become volatile.

Let's make our own, instead.

I took the last 20 matches for all participants, and produced an attacking and defensive rating, based around match scores and opponent quality.

New Zealand are the tournament's most potent attack. They'll score around 14 more points against an average team than another average team would manage, and Wales, courtesy of rugby league knowhow, have the best defence.

Next you need a way to simulate game outcomes.

The big clash of the group stages sees favourites New Zealand take on South Africa. After matching up the respective attacking and defensive ratings for each team, the model expects the All Blacks to average around 28.5 points and S Africa 23.5.

New Zealand are favoured by five points and there's likely to be 52 total points.

If we look at the spread of points scored and allowed by each side over the last year or so, we can produce a distribution of points that describes each team's likely scoring pattern in this game. We'll then draw a value randomly from this distribution for each team to simulate a single match scoreline and then repeat the process thousands of times.
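Here is a minimal sketch of that draw-and-repeat step for the New Zealand v South Africa game. It uses the 28.5 and 23.5 point expectations from above, but the normal distribution and the spread of 10 points are stand-in assumptions of mine, not the distributions fitted from each team's recent scores. The function names are hypothetical.

```python
import random


def draw_score(mean, sd, rng):
    """Draw one whole-number score from a normal distribution, floored at zero."""
    return max(0, round(rng.gauss(mean, sd)))


def simulate_match(mean_a, mean_b, sd=10.0, n_sims=10000, seed=1):
    """Simulate one fixture n_sims times and summarise the outcomes."""
    rng = random.Random(seed)  # fixed seed so a run can be reproduced
    wins_a = draws = wins_b = 0
    total_a = total_b = 0
    for _ in range(n_sims):
        a = draw_score(mean_a, sd, rng)
        b = draw_score(mean_b, sd, rng)
        total_a += a
        total_b += b
        if a > b:
            wins_a += 1
        elif a == b:
            draws += 1
        else:
            wins_b += 1
    return {
        "win_a": wins_a / n_sims,
        "draw": draws / n_sims,
        "win_b": wins_b / n_sims,
        "avg_a": total_a / n_sims,
        "avg_b": total_b / n_sims,
    }


result = simulate_match(28.5, 23.5)
print(result)
```

Swapping in a distribution built from each side's actual points scored and allowed only changes `draw_score`. The loop around it stays the same.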

After adding a few tweaks to mimic the largely redundant bonus points system rugby insists on employing and ensuring that each drawn score from the distributions is a "rugby score" (no scoring a grand total of four points etc), we just repeat for every group game, add up the total points won in the group, follow the draw format and find the winner.
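The two tweaks can be sketched as below. The score check relies on the fact that totals of 1, 2 and 4 can't be built from penalties, drop goals and tries. The table points follow the pool stage format as I understand it (four for a win, two for a draw, a bonus point for four or more tries and another for losing by seven or fewer). How a try count is derived from a simulated score isn't covered here, so `tries_for` is simply passed in, and both function names are hypothetical.

```python
def is_rugby_score(points):
    """True if a team total can be made from 3, 5 and 7 point scores."""
    return points == 0 or points == 3 or points >= 5


def table_points(points_for, points_against, tries_for):
    """Pool table points for one team from one match, including bonus points."""
    if points_for > points_against:
        pts = 4
    elif points_for == points_against:
        pts = 2
    else:
        pts = 0
        # Losing bonus point for finishing within seven points.
        if points_against - points_for <= 7:
            pts += 1
    # Try bonus point, available whatever the result.
    if tries_for >= 4:
        pts += 1
    return pts


print(is_rugby_score(4))
print(table_points(30, 10, 4))
```

A drawn score that fails `is_rugby_score` can be redrawn or nudged to the nearest valid total. Either choice is a judgment call rather than something the method dictates.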

This is how the simulations shake out.

Four sides have a double figure percentage chance of lifting the trophy: New Zealand and S Africa for the south, and England and Wales for the north, with the former looking a vulnerable favourite.