Wednesday, 6 March 2019

Title Winners Aren't Becoming More Dominant Over Time.

Are the title-winning teams in the Premier League getting more dominant because they're getting so much richer?

It seems a logical conclusion to draw given that Manchester City won the league with an unprecedented 100 points in 2017/18.

That obviously makes them the highest points-per-game side in 20-team Premier League history, but without context such figures are largely meaningless.

Using the season with the highest points per game as a selective cut-off is always going to produce any number of apparently upward trendlines. But without a deeper look at how the league as a whole has evolved over time, those trendlines are context-free trivia too.

The first 20-team Premier League season, 1995/96, had 98 draws; by 2017/18 the number was 99. But single seasons can hide an upward or downward trend, and this appears to be the case with drawn matches and, by extension, the total points won across a whole season.

The 1990s averaged 104 draws per season, compared with just 92 over the same number of the most recent Premier League campaigns.

Here's what this means for the average number of points won by sides in each Premier League season since 1995/96.

There has been a steady upward trend in the average number of points won by Premier League teams since the start of the 20-team era. Draws have tended to decrease, reducing the number of matches in which just two points are shared out relative to those in which three are won.
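The arithmetic behind this is simple: each of the 380 matches in a 20-team season hands out three points if someone wins and two if it's drawn, so the league-wide total is 3 × 380 minus the number of draws. The minimal Python sketch below plugs in the draw counts quoted above; it assumes every match produces a result and ignores points deductions.

```python
# Minimal sketch: how the number of draws sets the size of a season's points pot.
# Assumes a 20-team, 380-match season and ignores points deductions.

TEAMS = 20
MATCHES = TEAMS * (TEAMS - 1)  # every side plays the other 19 home and away: 380


def season_points(draws: int, matches: int = MATCHES) -> int:
    """Total league points: 3 per decisive match, 2 (1 each) per draw."""
    decisive = matches - draws
    return 3 * decisive + 2 * draws  # same as 3 * matches - draws


def average_points_per_team(draws: int) -> float:
    """Mean points won by a side in a season with this many draws."""
    return season_points(draws) / TEAMS


if __name__ == "__main__":
    # Draw counts quoted in the text above.
    for label, draws in [("1995/96", 98), ("2017/18", 99),
                         ("1990s average", 104), ("recent average", 92)]:
        print(f"{label}: {season_points(draws)} points, "
              f"{average_points_per_team(draws):.2f} per team")
```

Going from 104 draws a season to 92 adds 12 points to the pot, or 0.6 points per team on average.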

So are the top teams taking a bigger share of this expanded points pot, which might indicate that they are more dominant than their predecessors were?

One way to get this context-corrected view is to measure how far the team in each finishing position sits from the average points won by a side in that particular season.

Manchester City in 2017/18 were 2.5 standard deviations above the league's average points total that season. But that level of dominance was very similar to the one attained by Chelsea in 2004/05, Arsenal in 2003/04 and Manchester United in 1999/2000.
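A season's table can be turned into this "standard deviations from the average" form with a few lines of Python. This is a sketch under assumptions: the post doesn't say whether a sample or population standard deviation was used, so the code takes the sample version (`statistics.stdev`), and the points totals in the example are invented for illustration.

```python
import statistics


def position_z_scores(points: list[float]) -> list[float]:
    """How many standard deviations each finishing position sits from the
    league's mean points total for one season.

    `points` holds every team's final total in any order; the result is ordered
    by finishing position, so index 0 is the champion.
    Assumption: sample standard deviation (statistics.stdev).
    """
    ranked = sorted(points, reverse=True)
    mean = statistics.mean(ranked)
    sd = statistics.stdev(ranked)
    return [(p - mean) / sd for p in ranked]


if __name__ == "__main__":
    # Hypothetical four-team league, purely to show the shape of the output.
    print(position_z_scores([55, 85, 40, 60]))
```

Measuring in standard deviations rather than raw points is what strips out the league-wide drift in the size of the points pot, so a champion from a high-scoring season isn't automatically flattered.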

Here's the plot of how far from the average points all 20 finishing positions have been since 1995/96.

OK, it's messy. But it's fairly easy to see that the title winners aren't powering upwards in an ever-improving arc. In fact the line pretty much flatlines, and could even be coaxed downwards if we wanted to be "creative".

Here's an easier-on-the-eye trendline for each final position.

Once you add the context of the points-gathering environment over time, Man City 2017/18 are just a bump in the road rather than part of a general trend. None of the top three finishing positions has shown any improvement in its dominance over the rest of the league.

There's been a slight uptick for sides finishing 4th to 7th and a slight downtick for 7th to 12th. Then everyone holds station until the bottom two, who have become slightly more competitive over time but still go down.
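The post doesn't say how its per-position trendlines were fitted. Assuming a straight-line least-squares fit of each position's z-score against season, a sketch looks like this: a positive slope means that position is pulling further clear of the league average over time, a negative one that it's drifting back towards it. The input layout (a dict keyed by season start year) is a hypothetical choice.

```python
def ols_slope(xs: list[float], ys: list[float]) -> float:
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def trend_by_position(z_by_season: dict[int, list[float]]) -> list[float]:
    """Slope of z-score per season for each finishing position.

    `z_by_season` maps a season's starting year (e.g. 1995 for 1995/96) to that
    season's z-scores ordered 1st, 2nd, ... last.
    """
    seasons = sorted(z_by_season)
    n_positions = len(z_by_season[seasons[0]])
    return [
        ols_slope([float(s) for s in seasons], [z_by_season[s][pos] for s in seasons])
        for pos in range(n_positions)
    ]


if __name__ == "__main__":
    # Invented two-position example: 1st drifts up, 2nd drifts down.
    example = {2000: [2.0, 1.0], 2001: [2.1, 0.9], 2002: [2.2, 0.8]}
    print(trend_by_position(example))
```

A flat line (slope near zero) for 1st place is what "title winners aren't becoming more dominant" looks like in these terms.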

Thursday, 21 February 2019

The Name Game.

Sports analytics, not just football (or soccer), has always had a problem naming its metrics (see what I mean).

Corsi, TSR, Pythagorean and expected goals may work fine in a closed environment, but try sticking those terms into the mainstream and you're immediately on the back foot.

Jeff Stelling's rant wouldn't have been half as effective if he'd had to say "Chance quality? What's that!"

Anyway, we've already embarked on a second phase of attaching names to a brand new raft of models and performance indicators, except this time everyone's going to be scratching their heads about what it is that we're actually talking about.

Anyone who's ever posted an xG figure will be familiar with the "X got Y for their xG, why the difference?" replies, but the rise of the NS (non-shot) xG model will take that to new heights.

Shot-based xG models (which in practice cover shots, headers and attempts with other body parts) all share a core set of inputs (location, type), and any additions simply move the dial slightly. But the steady arrival of so-called "Non Shot xG" models may lead to comparisons between models that bear very little relationship to one another.

538 has an NS xG model, defined thus:

Non-shot expected goals is an estimate of how many goals a team could have scored given their nonshooting actions in and around their opponent’s penalty area.

Infogol has an NS xG model, but ours is based on the expected outcome of possession chains.

They currently share a name, but nothing else.

In an increasingly monetized situation it is understandable that some are reluctant or unable to share detailed descriptions of each model's makeup.

But even if we can't avoid falling into the trap of using less-than-intuitive language to name commonly used metrics (as happened with xG), we perhaps should steer clear of catch-all terms, such as NSxG, to describe future modelling efforts.

538's model appears to be event-based and ours is possession-based, so it's probably best to include this additional piece of information when presenting any NSxG model in future.

Thursday, 31 January 2019

A Non Shot Addition to the xG Family

Shot-based expected goals models can tell us a lot about a match by extending the sample size from around three actual goals to well into double figures of goal attempts.

But they are event-based descriptions and don't always tell the whole story of a match.

The weakness of event-based models, be they attempts, final-third entries or touches in the box, is, rather obviously, that these events have to occur to be registered, often in the most competitively contested region of the field.

Non-shot xG models can fill the void that sometimes exists by examining things such as possession chains and the probable outcomes between two teams of known quality.

Last night Liverpool drew 1-1 at home to Leicester.

The hosts, depending on your viewpoint, were unlucky not to win because "Leicester defended well", "Atko reffed the game poorly" or "Liverpool weren't themselves".

Shot-based xG universally gave the match to Leicester. They created better chances and had a larger shot-based xG total than the title-contending Reds.

Here's Infogol's shot map from last night. Leicester created a couple of decent chances. Liverpool were restricted to attempts from distance.
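Shot-based xG for a match is simply the sum of the scoring probabilities attached to each attempt. The sketch below uses invented per-shot values, not Infogol's figures, to show how a pile of long-range efforts can total less than a couple of decent chances. It also computes the chance of scoring at least once, assuming each shot is an independent event.

```python
import math


def total_xg(shot_xg: list[float]) -> float:
    """Match xG: the sum of each attempt's probability of being scored."""
    return sum(shot_xg)


def p_at_least_one_goal(shot_xg: list[float]) -> float:
    """Probability of scoring at least once, treating shots as independent."""
    return 1.0 - math.prod(1.0 - p for p in shot_xg)


if __name__ == "__main__":
    # Hypothetical shot values for illustration only.
    long_range_side = [0.03] * 12            # lots of efforts from distance
    fewer_better_chances = [0.35, 0.25, 0.05, 0.04]
    for name, shots in [("long range", long_range_side),
                        ("better chances", fewer_better_chances)]:
        print(f"{name}: {len(shots)} shots, xG {total_xg(shots):.2f}, "
              f"P(score) {p_at_least_one_goal(shots):.2f}")
```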

However, if we look at the potential return for each team based on where and how frequently they began attacks against each other, combined with the typical outcome of such possessions in expected goals terms and the talent-based differential at completing or suppressing passes and dribbles, the balance of "probabilistic" power shifts.
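Infogol's model itself isn't published, so the following is only a toy illustration of the general idea described above, with every name and number a hypothetical placeholder: give each possession a baseline xG according to the zone where it started, sum them, then scale by a single multiplier standing in for the talent differential between the two sides. The real model's handling of passes and dribbles will be considerably richer than one multiplier.

```python
# Toy possession-based ("non-shot") xG sketch. All values are hypothetical.

# Hypothetical average xG eventually generated by a possession starting in each zone.
BASELINE_XG_BY_START_ZONE = {
    "defensive_third": 0.01,
    "middle_third": 0.03,
    "attacking_third": 0.08,
    "penalty_area": 0.20,
}


def possession_xg(start_zones: list[str], talent_multiplier: float = 1.0) -> float:
    """Sum baseline xG over a team's possessions, scaled by a talent factor.

    `talent_multiplier` is a hypothetical stand-in for how much better (>1) or
    worse (<1) this side is at completing passes and dribbles than the
    opponent is at suppressing them.
    """
    baseline = sum(BASELINE_XG_BY_START_ZONE[zone] for zone in start_zones)
    return baseline * talent_multiplier


if __name__ == "__main__":
    # Invented possession counts for one side.
    possessions = (["defensive_third"] * 40 + ["middle_third"] * 30
                   + ["attacking_third"] * 10)
    print(round(possession_xg(possessions, talent_multiplier=1.2), 2))
```

The key contrast with a shot model is that every possession contributes something, whether or not it ended in an attempt.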

Liverpool shaded the non-shot xG assessment by 2.4 to 1.1.

They had the ball frequently enough, beginning in sufficiently advanced areas to have scored a likely two or three goals, with a penalty thrown in for good measure.

Leicester would have typically replied once.
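To get a feel for how surprising a 1-1 is against a 2.4 to 1.1 split, one common simplification (not necessarily the one used here) treats each side's goals as independent Poisson counts with those means. The sketch below computes win/draw/loss probabilities and the chance of any exact scoreline under that assumption.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k goals for a Poisson count with this mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def scoreline_probability(home_goals: int, away_goals: int,
                          home_xg: float, away_xg: float) -> float:
    """Chance of an exact scoreline, assuming independent Poisson goal counts."""
    return poisson_pmf(home_goals, home_xg) * poisson_pmf(away_goals, away_xg)


def outcome_probabilities(home_xg: float, away_xg: float,
                          max_goals: int = 10) -> tuple[float, float, float]:
    """(home win, draw, away win), summing scorelines up to max_goals each.

    Truncating at max_goals leaves out a negligible sliver of probability
    when expected goals are in the low single figures.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = scoreline_probability(h, a, home_xg, away_xg)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


if __name__ == "__main__":
    win, draw, loss = outcome_probabilities(2.4, 1.1)  # non-shot xG figures above
    print(f"Liverpool win {win:.2f}, draw {draw:.2f}, Leicester win {loss:.2f}")
    print(f"Exactly 1-1: {scoreline_probability(1, 1, 2.4, 1.1):.3f}")
```

Under these assumptions an exact 1-1 comes out at roughly 8%: unlikely, but hardly freakish.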

So why was it just 1-1?

Just plain randomness? An early goal that caused Liverpool to cruise somewhat, in a similar way to the return game earlier in the season? A clever Leicester game plan that frustrated Liverpool with a packed defense, plus a bit of luck from the officials?

There's no correct answer, but there are tools, both event- and possession-based, that can add clarity and suggest areas of investigation.