Tuesday, 15 January 2019

Quantifying Passing Contribution

Passing completion models have seeped into the public arena over the last couple of months, mimicking the methodology used in expected goals models.

In expected goals models, historical data is used to estimate the likelihood that an average finisher scores, based primarily on the shot type and location. A similar approach is used for passing models.

Historical completion rates based on the origin and type of pass are combined with the assumed target to model the likelihood that a pass is completed. Players' actual completion rates are then compared with their expected completion rates to identify over- and under-performing passers.
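As a rough illustration of the over/under comparison, here is a minimal Python sketch. It assumes each pass already carries a completion probability from some model; the `PassAttempt` class and its fields are hypothetical names, and the three attempts are made-up numbers, not real data.

```python
from dataclasses import dataclass


@dataclass
class PassAttempt:
    # Hypothetical fields: the model's completion probability for this pass
    # (from historical rates for its origin, type and target) and what happened.
    expected_completion: float
    completed: bool


def completion_over_expected(passes):
    """Return (actual completions, expected completions, actual minus expected)."""
    actual = sum(1 for p in passes if p.completed)
    expected = sum(p.expected_completion for p in passes)
    return actual, expected, actual - expected


# Made-up attempts for illustration: an easy pass, a 50/50 ball and a long shot.
attempts = [PassAttempt(0.9, True), PassAttempt(0.5, True), PassAttempt(0.2, False)]
actual, expected, diff = completion_over_expected(attempts)
print(actual, round(expected, 2), round(diff, 2))  # 2 1.6 0.4
```

A positive difference flags an apparent over-performer, which is exactly the completion-only view the rest of this post argues is incomplete.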

However, this approach omits a huge amount of context when applied to passes.

A goal attempt has one preferred outcome, namely a goal. But the unit of success often used in passing models is the completion of the pass, and that in itself leaves a lot of information off the table.

How much a completed pass advances a side should also be an integral ingredient of any passing model. Completion alone shouldn't be the preferred unit of success, because it isn't directly comparable to scoring in an expected goals model.

A player can attempt extremely difficult passes that barely advance the team's non-shot expected goals tally. For example, a 40-yard square ball across their own crowded penalty area is difficult to complete consistently, and the balance of risk and reward for success or failure is heavily skewed towards recklessness.

Completing such passes at an above league average rate would mark that player as an above average passer, but if we include the expected outcome of such reckless passes, we would soon expose the flawed judgement.
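One way to fold the risk/reward balance into a single number is an expected value: the completion probability times the NS xG gained, minus the failure probability times what a turnover costs. The cost-on-failure term is an assumption of this sketch (the post only details the gain side), and the inputs below are made-up numbers chosen to mirror the 40-yard square ball example, not real data.

```python
def expected_pass_value(p_complete, gain_if_completed, cost_if_lost):
    """Expected change in non-shot xG (NS xG) from attempting a pass.

    p_complete: probability that an average player completes the pass.
    gain_if_completed: NS xG added if the pass is completed.
    cost_if_lost: hypothetical NS xG handed to the opponent (a positive number)
        if the pass is intercepted; this term is an assumption of the sketch.
    """
    return p_complete * gain_if_completed - (1 - p_complete) * cost_if_lost


# Made-up numbers, not real data.
safe_forward_pass = expected_pass_value(0.85, 0.010, 0.005)
reckless_square_ball = expected_pass_value(0.55, 0.002, 0.080)  # across own box
print(round(safe_forward_pass, 4), round(reckless_square_ball, 4))
```

Under these assumptions the square ball has a negative expected value even if a player completes it more often than average, which is the flawed judgement a completion-only model would miss.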

The premier passer of his generation is of course Lionel Messi. It isn't surprising that he would complete more passes than an average player would expect to based on the difficulty of each attempted pass.

But we can add much more context if we include the risk/reward element of Messi's attempted passes.

A full-blown assessment of every pass Messi attempted in the Champions League group stages would get slightly messy for this initial post, so instead I'll just look at the positive expected outcomes of his progressive passes.

Each of 150 sampled progressive passes made by Messi during the Champions League group stage has both an expected completion probability and an attached improvement in non-shot expected goals should the pass be completed. (NS xG is the likelihood that a goal results from that location on the field; it isn't the xG of a shot taken from that location.)

If we simulate each attempt made by Messi thousands of times, based on these average completion probabilities and the NS xG gain should the pass be completed, we get a range and likelihood of possible cumulative NS xG values.
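A minimal Monte Carlo sketch of that simulation, assuming each pass is an independent completion trial with its own probability and a fixed NS xG gain when completed. The function names and the example passes are hypothetical and made up for illustration; they are not Messi's real data. `share_at_least` gives the kind of "chance an average player equals or betters the actual tally" figure quoted below.

```python
import random


def simulate_ns_xg(passes, n_sims=10_000, seed=1):
    """passes: list of (p_complete, ns_xg_gain) pairs.

    In each simulation every pass is completed with probability p_complete,
    and a completed pass adds its NS xG gain. Returns the cumulative totals.
    """
    rng = random.Random(seed)
    totals = []
    for _ in range(n_sims):
        total = 0.0
        for p, gain in passes:
            if rng.random() < p:
                total += gain
        totals.append(total)
    return totals


def share_at_least(totals, actual):
    """Fraction of simulations in which an average player matches or beats `actual`."""
    return sum(t >= actual for t in totals) / len(totals)


def percentile(totals, q):
    """Simple empirical percentile (q between 0 and 1) of the simulated totals."""
    ordered = sorted(totals)
    return ordered[min(int(q * len(ordered)), len(ordered) - 1)]


# Made-up inputs for illustration only: 150 passes built from three pass types.
example = [(0.7, 0.015), (0.4, 0.04), (0.55, 0.025)] * 50
totals = simulate_ns_xg(example)
mean_total = sum(totals) / len(totals)
print(round(mean_total, 2), percentile(totals, 0.05), percentile(totals, 0.95))
print(share_at_least(totals, 3.1))
```

Treating passes as independent is a simplification; in a real match one pass's completion affects which passes come next.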

The most likely outcome for an average player attempting Messi's passes is that they would add between 2.4 and 2.6 non-shot expected goals to Barcelona's cause.

In reality, Messi added 3.1 non-shot expected goals.

There's around a 10% chance that an average player equals or betters Messi's actual tally in this small sample trial. But it is quantified evidence that Messi may well be a better than average passer of the football.

Monday, 7 January 2019

Are Teams More Vulnerable After Scoring?

One of the joys of the "pencil and paper" age of football analytics was spending days collecting data to disprove a well-known bedrock fact from football's rich traditional history.

"2-0 is a dangerous lead" has been a laugh-out-loud moment for decades for those who relied on more than gut instinct.

Nowadays, you can crunch a million passes to build a "risk/reward" model and the only limitation is whether or not your laptop catches fire.

Myth busting (or not) received wisdom is now a less time-consuming, but still enjoyable, pastime.

The idea that teams are more vulnerable immediately after scoring turned up on Twitter this week. I've lost the link, but does it hold water?

Here's what I did.

Whether a team scores in the next 60 seconds depends on a couple of major parameters.

Firstly, a side's goal expectation.

Again, not to be confused with expected goals: goal expectation is a term from the pre-internet age of football analytics for the average number of goals a side is expected to score, based on venue, their scoring prowess and the defensive abilities of their opponent on the day.
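The post doesn't say exactly how goal expectation is built, so here is one common multiplicative sketch, offered as an assumption rather than the method used here: the league's average goals per team per game, scaled by the side's attacking rate, the opponent's conceding rate (both relative to the league) and a venue factor. All names and numbers are hypothetical.

```python
def relative_rate(team_rate, league_rate):
    """Team scoring (or conceding) rate per game relative to the league average."""
    return team_rate / league_rate


def goal_expectation(league_avg, attack, opp_defence, venue=1.0):
    """Pre-match goal expectation for one side.

    league_avg: average goals per team per game in the league.
    attack: side's scoring rate relative to the league (1.0 = average).
    opp_defence: opponent's conceding rate relative to the league (1.0 = average).
    venue: hypothetical home/away multiplier (>1 at home, <1 away).
    """
    return league_avg * attack * opp_defence * venue


# Made-up inputs: a side scoring 1.62 a game, away to a side conceding 1.215 a game.
league = 1.35
attack = relative_rate(1.62, league)      # roughly 1.2
defence = relative_rate(1.215, league)    # roughly 0.9
print(round(goal_expectation(league, attack, defence, venue=0.85), 2))  # 1.24
```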

Secondly, how much time has elapsed.

Scoring tends to increase as the game progresses.

On average, 45% of goals arrive in the first half and 55% in the second. So if you want to predict how likely a side is to score based on their initial goal expectation, the figure will be smaller for the 60 seconds between the 12th and 13th minute than for those between the 78th and 79th.

Therefore, you take the pre-game goal expectation for each team, and when one team scores you use this general in-game scoring pattern to work out the other team's goal expectation for each minute over the next ten minutes.

Then you work out the likelihood that the "scored on" team scores in each 60-second segment, using the Poisson distribution or similar.
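A minimal sketch of those steps, under stated assumptions: the per-minute scoring rate is taken to rise linearly through 90 minutes (stoppage time ignored), with the slope solved so that the first half holds 45% of a side's goal expectation; the post doesn't specify the shape, so the linear profile and all function names are hypothetical. The chance of scoring in a given minute is then the Poisson probability of at least one goal.

```python
import math


def minute_shares(first_half_share=0.45, minutes=90):
    """Share of a side's goal expectation falling in each minute 1..minutes.

    Assumption: weight(m) = a + m, a linear rise, with `a` solved so the
    first half holds `first_half_share` of the total. Stoppage time is ignored.
    """
    half = minutes // 2
    s1 = sum(range(1, half + 1))
    s2 = sum(range(half + 1, minutes + 1))
    f = first_half_share
    denom = minutes * f - half
    if abs(denom) < 1e-12:  # an even split means a flat profile
        return [1 / minutes] * minutes
    a = (s1 - f * (s1 + s2)) / denom
    weights = [a + m for m in range(1, minutes + 1)]
    if weights[0] <= 0:
        raise ValueError("first_half_share too small for a linear profile")
    total = sum(weights)
    return [w / total for w in weights]


def p_score_in_minute(goal_expectation, shares, minute):
    """Poisson probability of scoring at least once in a given (1-indexed) minute."""
    rate = goal_expectation * shares[minute - 1]
    return 1 - math.exp(-rate)


def expected_goals_in_window(goal_expectation, shares, conceded_minute, length=10):
    """Poisson mean of goals over the `length` minutes after conceding,
    cut off at the final minute of the profile."""
    start = conceded_minute + 1
    end = min(conceded_minute + length, len(shares))
    return sum(goal_expectation * shares[m - 1] for m in range(start, end + 1))


shares = minute_shares()
print(round(sum(shares[:45]), 2))  # 0.45
print(shares[12] < shares[78])     # True: the 13th minute carries less than the 79th
print(p_score_in_minute(1.5, shares, 79))
```

Summing `expected_goals_in_window` over every occasion a side conceded gives the model's total to set against the goals actually scored.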

And then you compare that to reality.

The model doesn't "know" that one team has just conceded, so if the scoring side really is more likely to concede straight after their goal, the model will significantly underestimate the number of goals actually scored.

There are a few wrinkles to iron out.

The first minute after conceding is going to be partly taken up with the scoring team doing a fair bit of badge kissing and knee sliding, so it won't offer a full 60 seconds of play.

It's also going to be difficult to reply in the sixth minute after conceding if your opponent scores in the 94th minute and the ref has already blown for full time.

There's also the question of halftime crossover, where the 6th minute might actually be 21 minutes after the goal is conceded.

You can deal with these fairly easily.
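One way those three wrinkles might be handled, as a hedged sketch rather than the exact adjustments used: count playing minutes so a window straddling half time simply continues from minute 46, drop minutes beyond full time, and give the first minute only a fraction of its usual weight for celebrations. The 0.5 fraction, the `Concession` record and the other names are all hypothetical. Any per-minute share function can be plugged in; a flat 1/90 is used in the example.

```python
from dataclasses import dataclass


@dataclass
class Concession:
    minute: int              # match minute (1-90) in which the side conceded
    goal_expectation: float  # the conceding side's pre-match goal expectation
    goals_in_window: int     # goals it actually scored in the following window


def window_weights(conceded_minute, length=10, first_minute_fraction=0.5, full_time=90):
    """(minute, usable fraction) pairs for the reply window.

    - Counts playing minutes: the interval is not part of the clock, so the
      6th minute after a 40th-minute goal is minute 46.
    - The first minute is only partly playable because of celebrations
      (first_minute_fraction is a hypothetical value, not a measured one).
    - Minutes beyond full_time are dropped; stoppage time is ignored.
    """
    weights = []
    for offset in range(1, length + 1):
        minute = conceded_minute + offset
        if minute > full_time:
            break
        weights.append((minute, first_minute_fraction if offset == 1 else 1.0))
    return weights


def expected_vs_actual(concessions, per_minute_share):
    """per_minute_share(minute) -> share of a side's goal expectation in that minute."""
    expected = sum(
        c.goal_expectation * per_minute_share(m) * frac
        for c in concessions
        for m, frac in window_weights(c.minute)
    )
    actual = sum(c.goals_in_window for c in concessions)
    return expected, actual


# Made-up occasions for illustration only.
sample = [Concession(30, 1.8, 1), Concession(88, 0.9, 0)]
print(expected_vs_actual(sample, lambda minute: 1 / 90))
```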

I took time-stamped Premier League data, ran the methodology and found 91 occasions where a side scored within ten minutes of conceding.

(I also split the ten minutes into 60 second segments, but I want to keep this short & more general).

From the model, you would have expected those teams to score, wait for it... 91 goals in that timeframe, based on when the goal was conceded, how well their attacking potential matched up against the opponent's defensive abilities, and allowing for truncated opportunity at the end of the game & through celebration.

There's no need to invoke scoring-team complacency or a conceding team's wrath to explain the scoring feats achieved, at least in the sample of Premier League games I used.

Are Teams Vulnerable After Scoring?

Probably not.

Saturday, 5 January 2019

xG Tables

There's been a lot of interest on Twitter in deriving tables from expected goals generated in matches that have already been played out.

Average expected points and goals are a useful, but inevitably flawed, way to express over- or under-performance in reality compared with a host of simulated alternative outcomes.

Averages, of course, are themselves flawed, because you can drown in 3 inches... blah, blah.

Here's one way I try to take useful information from a simulation-based approach using "after the fact" xG figures from matches already played. It may not be as Twitter friendly, but it does add some context that averages omit.

If you have the xG that each side generated in a match, you can simulate the likely outcomes and score lines from that match by your method of choice.

A side who out-xG'd their opponent is usually also going to be the most likely winner, in reality and in cyberspace.

But sometimes Diouf will run 60 yards, stick your only chance through Joe Hart's legs, nick three points and everyone's happy.

It just won't happen very often, but it does happen sometimes, and then the xG-poor team gets three points and the other team gets none.

Simulate each game played, add up the goals and points and you now have two tables.

One from this dimension and one that "might" have happened in the absence of game state and free will.
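A minimal sketch of building those simulated tables, assuming each side's goals in a match are drawn from a Poisson distribution with mean equal to its match xG. That is one "method of choice"; simulating each shot individually from its own xG is a common alternative. The function names and the two example fixtures are hypothetical.

```python
import math
import random
from collections import defaultdict


def poisson_sample(rng, lam):
    """Knuth's method; fine for the small means typical of match xG."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_table(matches, rng):
    """matches: list of (home, away, home_xg, away_xg) for games already played.

    Returns {team: [points, goals_for, goals_against]} for one simulated season.
    """
    table = defaultdict(lambda: [0, 0, 0])
    for home, away, home_xg, away_xg in matches:
        hg = poisson_sample(rng, home_xg)
        ag = poisson_sample(rng, away_xg)
        table[home][1] += hg
        table[home][2] += ag
        table[away][1] += ag
        table[away][2] += hg
        if hg > ag:
            table[home][0] += 3
        elif ag > hg:
            table[away][0] += 3
        else:
            table[home][0] += 1
            table[away][0] += 1
    return dict(table)


def simulate_tables(matches, n_sims=1000, seed=0):
    """One simulated table per run, all from the same seeded generator."""
    rng = random.Random(seed)
    return [simulate_table(matches, rng) for _ in range(n_sims)]


# Made-up fixtures for illustration only.
matches = [("A", "B", 1.8, 0.6), ("B", "A", 1.2, 1.1)]
tables = simulate_tables(matches, n_sims=200)
print(tables[0])
```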

It's easy and most readily understood to then compare the points Stoke got in reality to the points the multiple Premier League winners got in this alternative reality.

But it might be better if instead we compared the relative positions and points of each team in this simulation to the reality of the table.

I do that, and repeat the process for every one of the thousands of simulations, using each side's actual points haul in relation to each of their 19 rivals as the over/under-performing benchmark.
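The post doesn't give the exact comparison rule, so this is one possible reading, sketched under that assumption: score a side's standing as +1 for each rival it finished above on points, -1 for each rival above it and 0 for each it finished level with, then count how often its real standing beats, or is beaten by, its standing in each simulated table. The function names are hypothetical; simulated tables would first need reducing to `{team: points}` dictionaries.

```python
def relative_standing(points, team):
    """+1 for each rival below `team` on points, -1 for each above, 0 if level."""
    mine = points[team]
    return sum((mine > p) - (mine < p) for t, p in points.items() if t != team)


def reality_vs_simulations(actual_points, simulated_points):
    """actual_points: {team: points}; simulated_points: list of {team: points}.

    Returns {team: (share of sims that reality beat, share of sims that beat reality)}.
    """
    n = len(simulated_points)
    result = {}
    for team in actual_points:
        real = relative_standing(actual_points, team)
        sims = [relative_standing(sim, team) for sim in simulated_points]
        better = sum(real > s for s in sims)
        worse = sum(real < s for s in sims)
        result[team] = (better / n, worse / n)
    return result


# Made-up three-team league and two simulated tables, for illustration only.
actual = {"A": 80, "B": 70, "C": 50}
sims = [{"A": 70, "B": 75, "C": 50}, {"A": 85, "B": 60, "C": 55}]
print(reality_vs_simulations(actual, sims))
```

On this reading, a figure like City's 15% would be the second number in the pair: the share of simulations that produced a better relative season than the real one.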

This is what the 2017/18 season looked like in May, based on counting the number of times a side's actual position and points in the table, relative to all others, were better than in an xG simulation.

The top two overperformed, 3rd and 5th did what was expected, and 4th and 6th underperformed in reality.

Only 15% of the time did the xG simulation throw up a Manchester City season long performance that out did their actual 2017/18 season.

The model might have undervalued City's ability to take chances and prevent goals, or they might have been lucky, for instance scoring late winners and conceding late penalties to teams who can't take penalties.

So when you come to evaluate City's 2018/19 chances, you may take away that they were flattered by their position, but conclude that the likely challengers were so far behind that City are still by far the most likely winners.

Man United, De Gea, obviously.

Liverpool, 4th but perhaps deserved better. Too far behind City to be a genuine title threat, unless they sort out the keeper & defence.

Burnley, score first, pack the defence and play a hot keeper, bound to work again.

Huddersfield, 16th was a buoyant bonus they didn't merit.

Relegated trio: Swansea and Stoke pretty much got what they deserved, while WBA (without my actually watching them much last season) looked really hard done by. If you were going for the most likely bounce-straight-back team, it was the Baggies.

All of this comment was made in our pre season podcast.

You can use this approach for goals scored and allowed to see where problems, regression, or hot and cold streaks might be running riot. And simulations and xG are just one tool among many.