Friday, 15 December 2017

How High Might Manchester City Go?

Despite an inglorious 0/1 record, (Stoke to be relegated after one game of their return to the Premier League in 2008) Paddy Power has already paid out on the crowning in 2018 of Manchester City as the Premier League winners.

They are on slightly firmer ground this time around, as not only are City 11 points clear of United, 14 from Chelsea and 18 ahead of the three top six also-rans, Liverpool, Spurs and Arsenal Burnley, they are also one of the best teams in Premier League history.

With "City to win the League" drifting into "Putin to be re-elected as Russian Leader" territory, focus has shifted to secondary betting markets, based around City's likely points total, goals scored or margin by which they will lift the domestic crown.

Whether or not you're interested in the betting dimension, estimating City's degree of dominance can provide a useful exercise in prediction over the long term.

The quick, and usually flawed way to predict a side's end of season statistics is to blindly scale up from those recorded in the season to date.

This approach is rarely useful, as it takes no account of remaining schedule, implies that the 17 matches played by City and each of their 19 rivals is a near perfect indication of what will follow and disregards variance.

Even after the fact of 16 wins and one draw, there was a finite possibility that more than just Everton may have taken something from a daunting meeting with Manchester City.

Future projections should embrace the possibility that their record to date belongs to an excellent team, but one who may have been slightly fortunate to extract a near 100% points haul and allow for the often admittedly small chance that Pep's City may be defeated.

Even a cursory glance at City's remaining fixtures that includes a game against each of the top 6 7 and two meetings with Spurs, should indicate that a single draw interspersed with wins in their remaining games would seem an unlikely scenario up on which to base a projection.

Simulations of the remaining games in the 2017/18 season, give a less rose tinted prediction, while still confirming City's near certainty to lift the title.

Simulations based on expected goals rolling over both this and last season, expect City to gain 98 Premier League points by May. This is completely in line with the current estimates at which their final points total may be bought or sold at a variety of spread betting companies.

Similar ranges are shown for 10,000 simulated outcomes for City's total goals scored and total wins over the 38 game season.

I've also added the scaled up totals based on their record over 17 games being repeated over the remaining 21 and while these blockbusting values do occasionally appear in the simulations, they are relatively high end outliers and inadequate as a most likely projection in mid December.

Saturday, 9 December 2017

Know Your Limits

All predictions come with the caveat that there is a spread of uncertainty either side of the most likely outcome.

A side may be odds on to win almost all of their matches over a season, as Manchester City have very nearly shown in 2017/18, but there is a finite, if extremely small chance that they will actually lose all 38 matches.

Similarly, there is a bigger chance that they will win all 38, but the most likely scenario sits between these two extremes and for the current best team in the Premier League, winning the title with around 96 points is the most expected final outcome in May.

While single, definitive predictions are more newsworthy, they imply a precision that is never available about the longer term futures, especially about a sporting contest, such as a Premier League season that comprises low scoring matches spread over 380 games.

It's therefore useful to attach the degree of confidence we have in our predictions to any statements we make about a future outcome, particularly as new information about teams feeds into the system and the competition progresses, turning probabilistic encounters into 0,1 or 3 point actual outcomes.

Here's the range of points which a simulated model of the 2016/17 Premier League came up with using xG based ratings for each team and particularly Swansea before a ball was kicked.

Swansea had been in relative decline since their impressive introduction into the top tier, playing much admired possession football, mainly as a defensive tactic, that had seen then finish as high as 8th in 2014/15, 21 points clear of the drop zone.

2015/16 had seen them fall to 12th, just ten points from the drop zone and much of their xG rating for 2016/17 was based around this less impressive performance.

The top end of their points totals over 10,000 simulations resulted in a top 10 finish with 52 points, but the lower end left them relegated with 27 points and their mode of 36 final points suggested a season of struggle.

And this is illustrated by the dial plot showing well into the red zone signifying relegation.

After ten games, we now have more information, both about Swansea and the other 19 Premier league teams and the most likely survival cut off points in the 2016/17 league.

At the time, Swansea were 19th with five points from ten games and while the grey portion of mid table is still achievable, it has shrunk and the Swans' low point has fallen deeper into the red.

After thirty games, so just eight left, the upper and lower limits for Swansea after the full 38 games has narrowed. They are still more likely than not to be relegated, according to the updated xG model, but there is still some chance that they will survive.

In reality, Swansea were in the bottom three with three games left, but a win for them and a defeat for Hull in game week 36 was instrumental in retaining their top flight status, but it was as close as the final plot suggested it might be.

Adding indications of confidence in your model enhances any information you may wish to convey.

It's also essential when using xG simulations to "predict" the past, such as drawing conclusion about a player's individual xG and his actual scoring record.

Adding high and low limits will highlight if any over or under performance against an average model based simulation is noteworthy or not.

One final point. The upper and lower limits can be chosen to illustrate different levels of confidence, typically 95%. But this does not mean that a side's final points total and thus finishing position has a 95% chance of lying within these two limits.

It is more your model that is on trial.

There is a 95% chance that any new prediction made for a team by your model will lie within these upper and lower limits.

Hopefully, your model will have done a decent job of evaluating a side, in this case Swansea from 2016/17. But if it hasn't, Swansea's actual finishing position may lie elsewhere.

Wednesday, 29 November 2017

Over Performers Aren't Always Just Lucky.

Firstly, this isn't another post about whether Burnley are good at blocking shots because "yes they are".

Instead it's about applying some kind of context to levels of over or under performance to a side's performance data. And attempting to attribute how much is the result of the ever present random variation in inevitably small samples and how much is perhaps due to a tactical wrinkle and/or differing levels of skill.

Random variation termed as "luck" is probably the reddest of rags to a casual fan or pundit, disinterested or outwardly hostile to the use of stats to help to describe their beautiful game.

It's the equivalent for anyone with a passing interest in football analytics of "clinical" being used ad nauseam, all the way to the mute button by Owen Hargreaves.

Neither of these two catch-all, polar opposite terms used in isolation are particularly helpful. Most footballing events are an ever shifting, complex mixture of the two.

I first started writing about football analytics through being more than mildly annoyed that TSR (or Total Shot Ratio, look it up) and its supporters constantly branded Stoke as being that offensive mix of "rubbish at Premier League football" and constantly lucky enough to survive season after season.

And then choosing the Potters as the trendy stats pick for relegation in the next campaign as their "luck" came deservedly tumbling down.

It never did.

Anyone bothered enough to actually watch some of their games could fairly quickly see that through the necessity of accidentally getting promoted with a rump of Championship quality players, Stoke or more correctly Tony Pulis, were using defensive shapes and long ball football to subvert both the beautiful game and the conclusions of the helpful, but deeply flawed and data poor, TSR stat.

There weren't any public xG models around in 2008. To build one meant sacrificing most of Monday collecting the data by hand and Thursday as well when midweek games were played.

But, shot data was readily available, hence TSR.

At its most pernicious, TSR assumed an equality of chance quality.

So getting out-shot, as Stoke's setup virtually guaranteed they would be every single season, was a cast iron guarantee of relegation once your luck ran out in this narrow definition of "advanced stats",

Quantifying chance quality in public was a few years down the road, but even with simple shot numbers, luck could be readily assigned another constant bedfellow in something we'll call "skill".

There comes a time when a side's conversion rate on both sides of the ball is so far removed from the league average rates that TSR relied upon that you had to conclude that something (your model) was badly broken when applied to a small number of teams.

We don't need to build an xB model to see Burnley as being quite good at blocking shots, just as we didn't need a labouriously constructed expected goals model to show that Stoke's conversion disconnects were down to them taking fewer, good quality chances and allowing many more, poorer quality ones back in 2008.

Last season, the league average rate at which open play attempts were blocked was 28%. Burnley faced 482 such attempts and blocked 162 or 34%

A league average team would have only blocked 137 attempts under a naive, know nothing but the league average, model.

Liverpool had the lowest success rate under this assumption that every team has the same in built blocking intent/ability. They successfully blocked just 21% of the 197 opportunities they had to put their bodies on the line.

You're going to get variation in blocking rate, even if each team has the same inbuilt blocking ability and the likelihood of a chance being blocked evens out over the season.

But you're unlikely to get the extremes of success rates epitomized by Burnley and Liverpool last season.

You'll improve this cheap and cheerful, TSR type blocking model for predictive purposes by regressing towards the mean both the observed blocking rates of Liverpool and Burnley.

You'll need to regress Liverpool's more because they faced many fewer attempts, but the Reds will still register as below average and the Claret and Blues above.

In short, you can just use counts and success rates to analysis blocking in the same way as TSR looked at goals, but you can also surmise that the range and difference in blocking ability that you observe may be down to a bit of tactical tinkering/skillsets as well as randomness in limited trials.

In the real world, teams will face widely differing volumes, the "blockability" of attempts will vary and perhaps not even out for all sides and some managers will commit more potential blockers, rather than sending attack minded players to create havoc at the other end of the field.

With more data, and I'm lucky to have access to it in my job, you can easily construct an xB model. And some teams will out perform it (Burnley). But rather than playing the "luck" card you can stress test your model against these outliers.

There's around a 4% chance that a model populated with basic location/shot type/attack type parameters adequately describes Burnly's blocking returns since 2014.

That's perhaps a clue that Burnley are a bit different and not just "Stoke" lucky.

The biggest over-performing disconnect is among opponent attempts that Burnley faced that were quite likely to be blocked in the first place. So that's the place to begin looking.

And as blocking ability above and beyond inevitably feeds through into Burnley's likelihood of conceding actual goals, you've got a piece of evidence that may implicate Burnley as being a more acceptable face of over-performance in the wider realms of xG for the enlightened analytical  crowd to stomach than Stoke were a decade ago.