Thursday 31 January 2019

A Non Shot Addition to the xG Family

Shot based expected goals models can tell us a lot about a match by extending the sample size from around three for actual goals to well into double figures for goal attempts.

But they are event based descriptions and don't always tell the whole story of a match.

The weakness of event based models, be they attempts, final third entries or touches in the box, is, rather obviously, that these events have to occur for them to be registered, often in the most competitively contested region of the field.

Non shot xG models can fill the void that sometimes exists by examining such things as possession chains and the probabilistic outcome that may occur between two teams of known quality.

Last night Liverpool drew 1-1 at home to Leicester.

The hosts, depending on your viewpoint, were unlucky not to win because "Leicester defended well", "Atko reffed the game poorly" or "Liverpool weren't themselves".

Shot based xG universally gave the match to Leicester. They created better chances and had a larger total shot based xG than the title contending Reds.

Here's Infogol's shot map from last night. Leicester created a couple of decent chances. Liverpool were restricted to attempts from distance.

However, if we look at the potential return for each team based on where and how frequently they began attacks against each other, combined with the typical outcome of such possession in expected goals terms and the talent based differential at completing or suppressing passes or dribbles, the balance of "probabilistic" power shifts.

Liverpool shaded the non shot xG assessment by 2.4 to 1.1.

They had the ball frequently enough, beginning in sufficiently advanced areas to have scored a likely two or three goals, with a penalty thrown in for good measure.

Leicester would have typically replied once.

So why was it just 1-1?

Just plain randomness? An early goal that caused Liverpool to cruise somewhat, in a similar way to the return game earlier in the season? A clever Leicester game plan that frustrated Liverpool with a packed defence and a bit of luck from the officials?

There's no correct answer, but there are tools, both event and possession based, that can add clarity and suggest areas of investigation.

Tuesday 29 January 2019

Simulating Post Game Outcomes with a Non Shot xG Model.

First there was xG, ExpG, expected goals, chance quality or whatever you wished to call it.

Then we simulated the shooting contest to create a likelihood and range of possible scores.

Next we added the different scoreline probabilities to arrive at a post game chance of the shooting contest ending as a win or a draw.

Undeniably these approaches help to illuminate the story of a single game, but there are occasions when a shot based approach can mislead.

Game state (the combination of time remaining, scoreline and the talent differential of the two teams) can sometimes lead to a side prioritising winning the game as opposed to maximising the number of goals they may score.

The obvious example of these game state effects might be a side leading by a single goal deep into stoppage time heading for the corner flag rather than the opposition penalty area, or the reverse, where a trailing team attempts a speculative long range effort instead of choosing to progress the ball and perhaps losing it before they can shoot.

Therefore, a simple xG tally can sometimes become distorted by attempts that aren't taken and attempts that perhaps shouldn't have been.

Non shot xG models may provide a partial solution to this occasional disconnect between xG totals and an eye witness account of a game.

Instead of using goal attempts when assessing the performance of each team, possessions are the chosen currency in a non shot chance quality model.

Non shot xG models aren't too concerned with how a team chooses to use its possession.

Instead they take a weighted midline between the situations where scoring a goal is the main aim and those where preserving a lead is paramount.

A side that isn't being overwhelmed by a trailing opponent can therefore still build up non shot xG credit by claiming a fair share of possessions in varying areas of the pitch, even if they don't choose to convert them into actual goal attempts that would register in a shot based xG framework.

In short, a side may go shot-less for the final half hour in a game they lead, but still be largely in control of managing the advantageous scoreline.

Earlier this season, Liverpool went to Huddersfield and won 1-0 with a Salah goal in the 24th minute.

Huddersfield "won" the shot based xG contest 0.9 to 0.6 and whether you want to simulate every chance (some of Liverpool's were related opportunities) or simply run the relative xG totals through a Poisson model, you'll find that shot based xG thinks that Huddersfield were more likely to win the actual game than Liverpool.

The 1X2 splits are around 40/35/25.
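The "run the totals through a Poisson" route can be sketched in a few lines. This is only an illustration, assuming each side's goals follow an independent Poisson distribution with the quoted shot based xG total as its mean. The function names are mine, not part of any published model.

```python
from math import exp, factorial


def poisson_pmf(goals: int, mean: float) -> float:
    """Probability of exactly `goals` goals when the average is `mean`."""
    return exp(-mean) * mean ** goals / factorial(goals)


def outcome_probs(xg_home: float, xg_away: float, max_goals: int = 10):
    """Return (home win, draw, away win) from two independent Poissons."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, xg_home) * poisson_pmf(a, xg_away)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    # Renormalise for the tiny probability mass beyond max_goals.
    total = home_win + draw + away_win
    return home_win / total, draw / total, away_win / total


# Shot based xG totals quoted in the post: Huddersfield 0.9, Liverpool 0.6.
home, draw, away = outcome_probs(0.9, 0.6)
print(f"Huddersfield {home:.0%}, draw {draw:.0%}, Liverpool {away:.0%}")
```

On those totals the split lands at roughly 41/36/23, in the same region as the rounded figures above. Simulating every chance individually would give a slightly different answer, particularly where chances are related.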

So this is one of those occasions when shot based xG thinks the wrong team won, although it is blind to the superior team holding an early lead.

However, a possession based, non shot model, which values every possession and doesn't need a goal attempt to trigger a plus for either team, sees things rather differently.

Liverpool's possessions were, on average, around 15% more valuable than Huddersfield's.

I only vaguely remember watching the match, but I didn't get the impression that Liverpool were very lucky to win, nor that, if needed, they wouldn't have turned their superior possession chains into more chances.

If we now simulate the likelihood of each side turning their possessions into goals (with no regard for tactical, game state related nuances), Liverpool now win a non shot simulation 44% of the time compared to just 26% for Huddersfield.
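A non shot simulation of this kind might look something like the sketch below. It assumes every possession carries its own small chance of ending in a goal and that possessions are independent, which ignores the game state nuances mentioned above. The possession values are hypothetical placeholders, not the model's actual figures for this match.

```python
import random


def simulate_possession_match(values_a, values_b, n_sims=5_000, seed=0):
    """Each value is the (assumed) chance that one possession ends in a goal.

    Returns the share of simulations won by side A, drawn, and won by side B.
    """
    rng = random.Random(seed)
    wins_a = draws = wins_b = 0
    for _ in range(n_sims):
        # One Bernoulli trial per possession; True counts as a goal.
        goals_a = sum(rng.random() < p for p in values_a)
        goals_b = sum(rng.random() < p for p in values_b)
        if goals_a > goals_b:
            wins_a += 1
        elif goals_a == goals_b:
            draws += 1
        else:
            wins_b += 1
    return wins_a / n_sims, draws / n_sims, wins_b / n_sims


# Hypothetical inputs: 100 possessions each, side A's worth 15% more on average.
side_a = [0.0115] * 100
side_b = [0.0100] * 100
print(simulate_possession_match(side_a, side_b))
```

In practice each possession would have its own value, driven by where it started and the skill differential between the sides, rather than a flat figure.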

There is no right answer when looking at who deserved a win or a loss, and while shot based xG offers one probabilistic opinion, as they say, others are available and sometimes they will disagree.

Friday 25 January 2019

Putting Together a Possession Based Non Shot Model

I've previously written about non shot based models as an alternative to purely shot based xG, as well as a way of incorporating the 90+% of onfield actions that are omitted in the former.

A valid criticism of shot based models is that a goal attempt needs to be registered before expected goals tallies can be increased.

However, it is intuitively realised that continued incursions deep into an opponent's half are dangerous, even if a shot isn't forthcoming, and a dangerous ball that is played across the face of goal also carries a non recorded level of threat.

Similarly, a penalty kick gives a disproportionately large xG figure, particularly when compared to numerous other passes into the box that don't result in a reckless lunge and a favourable ref.

An alternative approach might be to count attack based events, such as final third passes or progressive runs and relate these to a likelihood of scoring. But this seems rather arbitrary and lacking a framework.

Our approach is to select a consistent unit to describe the model that is analogous to a goal attempt and we've chosen a possession.

We then need an equivalent figure to the expected goal figure for an attempt made on goal. And just as a shot based xG model is driven by the probability of scoring with a shot/header given a variety of identifiable parameters, we have used the likelihood that a possession will result in a goal.

Shot or header location is the primary factor in a shot based xG model, but modellers have shied away from such things as finishing skill and/or goalkeeping prowess, as the proliferation of statistical noise often swamps any signal.

However, in the more event rich environment of passes and ball progressions we may be more confident in including such skill differentials into a non shot model, without straying too far into xG2 shot based territory.

Anyone who watched Burton's second leg game with Manchester City couldn't fail to be swayed by the obvious individual and technical ability on show from City compared to their hosts. And the implied level of goal threat was much higher when City gained possession compared to the Brewers in a similar pitch location.

Therefore, in constructing a non shot based model, as well as such familiar universals as location, we also incorporate factors which identify both above average proficiency in passing as well as in disrupting passes or carries.

Here's a table I posted at the end of last season, showing the level of over or under performance for Premier League teams in pass completion and pass disruption.

It's notable that Man City were the best at completing passing sequences and suppressing opponents' attempts.

We now have an assembly of ingredients to produce a non shot equivalent to the purely shot based model.

Above is a game by game summary of the non shot xG differential for Manchester City in 2017/18.

Unsurprisingly, a team committed to possession and passing excellence, with high quality players, almost always creates a possession environment that gives them a superior non shot xG differential.

And here's a game by game tally for Liverpool in 2017/18.

Together with a shot based approach, a non shot model can perhaps add nuance to the balance of power between two sides, based on the frequency and location of possessions and the pre game skill differentials of the sides, as well as exploring, via a shot based xG model, the now familiar occasions where a goal attempt was generated.

Thursday 17 January 2019

A Non Shot Expected Goals Look at the UCL Group Stages.

The last post looked at quantifying the increased contribution made by players attempting progressive passes based on the improvement in non shot expected goals via completing a pass and the likelihood that an average passer is able to successfully make such a pass.

We've been building non shot xG models for a few years, so let's take a look at how possession & passing ability can be redefined in terms of non shot xG from this season's UCL group games.

Once you have a NS xG framework you can look at the risk/reward of every attempted pass by quantifying the improvement in NS xG should the pass be completed.

This can be further combined with the likelihood a pass is completed against the risk of losing the initial NS xG you owned and handing NS xG to the opposition should they take possession.
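As a rough illustration of that risk/reward trade off, the net expected swing from a single pass can be written as the completion probability times the NS xG gained, minus the failure probability times what is lost (your own NS xG plus whatever is handed to the opponent). The field names and numbers below are hypothetical, and this is one plausible reading of the framework rather than the model itself.

```python
from dataclasses import dataclass


@dataclass
class PassAttempt:
    p_complete: float      # chance an average passer completes the pass
    nsxg_current: float    # NS xG of the possession before the pass
    nsxg_target: float     # NS xG of the possession if the pass is completed
    nsxg_conceded: float   # NS xG handed to the opponent if the pass fails


def expected_nsxg_change(attempt: PassAttempt) -> float:
    """Net expected swing in NS xG from attempting the pass."""
    reward = attempt.nsxg_target - attempt.nsxg_current
    risk = attempt.nsxg_current + attempt.nsxg_conceded
    return attempt.p_complete * reward - (1 - attempt.p_complete) * risk


# Hypothetical pass into the box: hard to complete, valuable if it comes off.
through_ball = PassAttempt(0.4, 0.02, 0.12, 0.01)
print(round(expected_nsxg_change(through_ball), 4))
```

A positive figure suggests the attempt is worth the risk on average, a negative one that the passer is giving away more than they stand to gain.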

To simplify the post, I'll just look at the reward side of the bargain and aggregate the expected value of a completion in NSxG units for all progressive passes attempted by the 32 UCL group teams and compare that value to the actual value of the completions they made.

This will quantify how often a side had possession in a dangerous area of the field and whether, through better passers and/or receivers, they outperformed an average passing team.

We'll also take a look at the value of passes allowed into dangerous areas and whether a side managed to reduce that value by making it difficult for opponents to complete passes compared to an average defence.

The defensive side of the ball is often ignored or described entirely in terms of completed actions, such as tackles or interceptions, with little context.

The "Attacking Reward from Progressive Passes NSxG" column is the model's average expectation that a progressive pass results in a possession somewhere on the field.

Playing a forward pass out of defence to the centre circle is very likely to be completed, but the value of the possession in the centre circle won't be that large.

Playing the ball into the opponent's penalty area, dependent upon the origin of the pass, won't be as easy to complete, but will result in a relatively large NS xG value if it is.

Overall, if an average team was willing and able to attempt the passes Real Madrid attempted in the group phase, they would expect to accrue a cumulative 74.2 NSxG over the six games.

Real actually gained 77.9 NSxG.

So they made lots of dangerous pass attempts (although they did also recycle the ball backwards) and over performed the average model by around 5% based on actual completions (77.9 against 74.2).

Porto was one of the better defences. They allowed sides to make progressive passes worth a model value of 39.4 NS xG and restricted the completions to further depress the actual value to 36 NS xG over the six games.

The best offensive and defensive performers, in terms of NS xG accrued or allowed, along with above average efficiencies are shown in blue, underperformers in red.

Attacking and defensive numbers are correlated, particularly from a possession standpoint. As Swansea showed, possession can be a purely defensive strategy. So it makes sense to look at the attacking and defensive differentials, along with the performance of the 32 teams in the group phase.

Real Madrid had a net positive NSxG differential of +44.2 in topping group G and Crvena Zvezda a whopping -57.8 in propping up group C.

Real got the ball often into dangerous positions with above average efficiency and restricted the ability of opponents to do the same at league average efficiency.

This is a step towards quantifying progressive passes, rather than simply counting final third completions etc. It unsurprisingly tallies with actual performance and provides a framework to produce possession chain based evaluations of past and future games that isn't entirely reliant upon a shot based approach.

Tuesday 15 January 2019

Quantifying Passing Contribution.

Passing completion models have seeped into the public arena over the last couple of months, mimicking the methodology used in expected goals models.

Historical data is used to estimate the likelihood that a goal is scored by an average finisher based primarily on the shot type and location in the case of expected goals models. And a similar approach is used for passing models.

Historical completion rates, based on the origin and type of pass, are combined with the assumed target to model the likelihood that a pass is completed. Actual completion rates for players are then compared with the expected completion rate to discern over and under performing passers.

However, this approach omits a huge amount of context when applied to passes.

A goal attempt has one preferred outcome, namely a goal. But the unit of success that is often used in passing models is a completion of the pass and that in itself leaves a lot of information off the table.

How much a completed pass advances a side should also be an integral ingredient of any passing model. Completion alone shouldn't be the preferred unit of success, because it isn't directly comparable to scoring in an expected goals model.

A player can attempt extremely difficult passes that barely advance the team's non shot expected goals tally. For example, a 40 yard square ball across their own crowded penalty area is difficult to complete consistently, and the balance of risk and reward for success or failure is greatly skewed towards recklessness.

Completing such passes above the league average would mark that player as an above average passer, but if we include the expected outcome of such reckless passes, we would soon highlight the flawed judgement.

The premier passer of his generation is of course Lionel Messi. It isn't surprising that he would complete more passes than an average player would expect to based on the difficulty of each attempted pass.

But we can add much more context if we include the risk/reward element of Messi's attempted passes.

A full blown assessment of every pass Messi attempted in the Champions League group stages becomes slightly messy for this initial post. Instead I'll just look at the positive expected outcomes of his progressive passes.

150 sampled progressive passes made by Messi during the Champions League group stage have both an expected completion probability and an attached improvement in non shot expected goals should the pass be completed. (NS xG is the likelihood that a goal results from that location on the field; it isn't the xG from a shot from that location.)

If we simulate each attempt made by Messi thousands of times based on these average probabilities and the NS gain should the pass be completed, we get a range and likelihood of possible cumulative NS xG values.

The most likely outcome for an average player attempting Messi's passes is that they would add between 2.4 and 2.6 non shot expected goals to Barcelona's cause.

The reality for Messi was that he added 3.1 non shot expected goals.

There's around a 10% chance that an average player equals or betters Messi's actual tally in this small sample trial. But it is quantified evidence that Messi may well be a better than average passer of the football.
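The simulation described above can be sketched as follows. Each pass is represented by an average completion probability and the NS xG gained if it is completed, and only the reward side is counted, as in the post. The pass list is a hypothetical stand in, not Messi's actual 150 passes.

```python
import random


def simulate_totals(passes, n_sims=10_000, seed=0):
    """Simulate an average passer attempting every pass in `passes`.

    `passes` is a list of (completion probability, NS xG gain) pairs.
    Returns one cumulative NS xG total per simulation.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        # A pass only adds its gain when the simulated attempt is completed.
        total = sum(gain for p, gain in passes if rng.random() < p)
        totals.append(total)
    return totals


def share_at_least(totals, actual):
    """Share of simulations that equal or better the actual tally."""
    return sum(t >= actual for t in totals) / len(totals)


# Hypothetical sample: 100 safe, low value passes and 50 risky, valuable ones.
hypothetical_passes = [(0.9, 0.005)] * 100 + [(0.5, 0.05)] * 50
totals = simulate_totals(hypothetical_passes)
print(share_at_least(totals, actual=2.0))
```

The smaller the share returned by `share_at_least`, the harder it is to explain the player's actual tally as an average passer on a good run.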

Monday 7 January 2019

Are Teams More Vulnerable After Scoring?

One of the joys from the "pencil and paper" age of football analytics was spending days collecting data to disprove a well known bedrock fact from football's rich traditional history.

2-0 = dangerous lead has been a "laugh out loud" moment for those who went on more than gut instinct for decades.

Nowadays, you can crunch a million passes to build a "risk/reward" model and the only limitation is whether or not your laptop catches fire.

Myth busting (or not) perceived wisdom is now a less time consuming, but still enjoyable pastime.

Teams being more vulnerable immediately following a goal turned up on Twitter this week, although I've lost the link, so does it hold water?

Here's what I did.

Whether a team scores in the next 60 seconds depends on a couple of major parameters.

Firstly, a side's goal expectation.

Again not to be confused with expected goals, goal expectation is a term from the pre internet age of football analytics which is the average number of goals a side is expected to score based on venue, their scoring prowess and the defensive abilities of their opponent on the day.

Secondly, how much time has elapsed.

Scoring tends to increase as the game progresses.

45% of goals on average arrive in the first half and 55% in the second. So if you want to predict how likely a side is to score based on their initial goal expectation, it will be smaller if you're looking at the 60 seconds between the 12th and 13th minutes, compared to between the 78th and 79th.

Therefore, you take the pre game goal expectation for each team and when one team scores you work out the goal expectation per minute from this general decay rate for the other team over the next ten minutes.

Then you work out the likelihood that the "scored on" team scores in each 60 second segment via Poisson etc.
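Here is a minimal sketch of those steps. It assumes, purely for simplicity, that the 45%/55% split is spread evenly within each half and it ignores stoppage time, so it is a cruder time profile than the one used for the post. The function names are illustrative.

```python
from math import exp

FIRST_HALF_SHARE = 0.45   # share of goals arriving in the first half
SECOND_HALF_SHARE = 0.55  # share of goals arriving in the second half


def minute_rate(goal_expectation: float, minute: int) -> float:
    """Goal expectation allotted to one minute of play (minute runs 1 to 90)."""
    share = FIRST_HALF_SHARE if minute <= 45 else SECOND_HALF_SHARE
    return goal_expectation * share / 45


def prob_score_in_minute(goal_expectation: float, minute: int) -> float:
    """Poisson chance of at least one goal in that 60 second segment."""
    return 1 - exp(-minute_rate(goal_expectation, minute))


def expected_goals_in_window(goal_expectation: float, conceded_minute: int,
                             length: int = 10) -> float:
    """Expected goals in the `length` minutes after conceding, cut off at 90."""
    last = min(conceded_minute + length, 90)
    return sum(minute_rate(goal_expectation, m)
               for m in range(conceded_minute + 1, last + 1))


# Hypothetical side with a pre game goal expectation of 1.35.
print(prob_score_in_minute(1.35, 13), prob_score_in_minute(1.35, 79))
print(expected_goals_in_window(1.35, conceded_minute=30))
```

Summing `expected_goals_in_window` over every conceded goal in the data gives the model's total, which is the figure to set against the number of replies that actually happened.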

And then you compare that to reality.

The model doesn't "know" one team has just conceded, so if their opponents are really more likely to concede following their goal, the model's prediction will significantly underestimate the expected number of goals compared to reality.

There are a few wrinkles to iron out.

The first minute after conceding is going to be taken up with one team doing a fair bit of badge kissing and knee sliding, so it won't last for 60 seconds.

It's also going to be difficult to reply in the sixth minute after conceding if your opponent scores in the 94th minute and the ref has already blown for full time.

There's also the question of halftime crossover, where the 6th minute might actually be 21 minutes after the goal is conceded.

You can deal with these fairly easily.

I took time stamped Premier League data, ran the methodology and found 91 occasions where a side scored within ten minutes of conceding.

(I also split the ten minutes into 60 second segments, but I want to keep this short & more general).

From the model, in that timeframe, you would have expected those teams to score, wait for it... 91 goals, based on when the goal was conceded, how well their attacking potential matched up to the opponent's defensive abilities and allowing for truncated opportunity at the end of the game & through celebration.

There's no need to invoke scoring team complacency or a conceding team's wrath to end up with the scoring feats achieved, at least in the sample of Premier League games I used.

Are Teams Vulnerable After Scoring?

Probably not.

Saturday 5 January 2019

xG Tables

There's been a lot of interest on Twitter in deriving tables from expected goals generated in matches that have already been played out.

Average expected points/goals are a useful, but inevitably flawed, way to express over or under performance in reality compared to a host of simulated alternative outcomes.

Averages of course are themselves flawed, because you can drown in 3 inches... blah, blah.

Here's one way I try to take useful information from a simulation based approach using "after the fact" xG figures from matches already played, that may not be as Twitter friendly, but does add some context that averages omit.

If you have the xG that each side generated in a match, you can simulate the likely outcomes and score lines from that match by your method of choice.

A side who out xG'ed the opponent is usually also going to be the most likely winner, in reality and in cyberspace.

But sometimes Diouf will run 60 yards, stick your only chance through Joe Hart's legs, nick three points and everyone's happy.

It just won't happen very often, but it does sometimes and then the xG poor team get three points and the others get none.

Simulate each game played, add up the goals and points and you now have two tables.

One from this dimension and one that "might" have happened in the absence of game state and free will.

It's easy and most readily understood to then compare the points Stoke got in reality to the points the multiple Premier League winners got in this alternative reality.

But it might be better if instead we compared the relative positions and points of each team in this simulation to the reality of the table.

I do that and repeat the process for every one of the thousands of simulations, using each side's actual points haul in relation to each of their 19 rivals as the over/under performing benchmark.
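A stripped down version of that process is sketched below. It is a simplification: it draws each side's goals from a Poisson with the match xG as the mean, and it only counts how often a team's simulated points total beats its actual total, rather than comparing against each of the 19 rivals in turn. The fixtures, xG values and points are hypothetical.

```python
import random
from math import exp


def sample_poisson(mean, rng):
    """Draw one Poisson value using Knuth's multiplication method."""
    threshold = exp(-mean)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def simulate_season(matches, rng):
    """Play every (home, away, home xG, away xG) fixture once; return points."""
    points = {}
    for home, away, xg_home, xg_away in matches:
        goals_home = sample_poisson(xg_home, rng)
        goals_away = sample_poisson(xg_away, rng)
        points.setdefault(home, 0)
        points.setdefault(away, 0)
        if goals_home > goals_away:
            points[home] += 3
        elif goals_home < goals_away:
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
    return points


def share_beating_actual(matches, actual_points, n_sims=5_000, seed=0):
    """Share of simulated seasons in which each team tops its real points."""
    rng = random.Random(seed)
    better = {team: 0 for team in actual_points}
    for _ in range(n_sims):
        simulated = simulate_season(matches, rng)
        for team, pts in actual_points.items():
            if simulated.get(team, 0) > pts:
                better[team] += 1
    return {team: count / n_sims for team, count in better.items()}


# Hypothetical three team mini league: (home, away, home xG, away xG).
matches = [
    ("A", "B", 1.8, 0.7), ("B", "A", 0.9, 1.5),
    ("A", "C", 2.1, 0.5), ("C", "A", 0.6, 1.7),
    ("B", "C", 1.2, 1.0), ("C", "B", 1.1, 1.1),
]
actual_points = {"A": 12, "B": 4, "C": 1}
print(share_beating_actual(matches, actual_points))
```

A low share for a team suggests the real season was about as good as the xG could reasonably have delivered, a high share that they were short changed.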

This is what the 2017/18 season looked like in May based on counting the number of times a side's actual position and points in the table relative to all others was better than an xG simulation.

Top two overperformed, 3rd and 5th did what was expected, 4th and 6th under performed in reality.

Only 15% of the time did the xG simulation throw up a Manchester City season long performance that out did their actual 2017/18 season.

The model might have under valued City's ability to take chances and prevent goals, or they might have been lucky, for instance scoring late winners and conceding late penalties to teams who can't take penalties.

So when you come to evaluate City's 2018/19 chances, you may take away that they were flattered by their position, but conclude that the likely challengers were so far behind that they are still by far the most likely winners.

Man United, De Gea, obviously.

Liverpool, 4th but perhaps deserved better. Too far behind City to be a genuine title threat, unless they sort out the keeper & defence.

Burnley, score first, pack the defence and play a hot keeper, bound to work again.

Huddersfield, 16th was a buoyant bonus they didn't merit.

Relegated trio: Swansea and Stoke pretty much got what they deserved, while WBA, without actually watching them much last season, looked really hard done by. If you're going for the most likely bounce straight back team, it was the Baggies.

All of this comment was made in our pre season podcast.

You can use this approach for goals scored/allowed to see where the problems/regression/hot/cold might be running riot, plus simulations and xG are just one tool of many.