Tuesday 30 August 2016

The 2016 NFL Regular Season Done & Dusted in Excel.

The NFL’s back and so are the LA Rams, so here’s how I'd go about modelling the 2016 season.

Firstly, you need a rating for each team in the new season.

Previous season’s data is always a good starting point, but the NFL regular season is a relatively short 16 games, so wins and losses from 2015 can be heavily influenced by luck, or random variation, to give it a less provocative name.

Winning lots of games by narrow margins, winning more or fewer games than your points scored and conceded merit, and feeding off lots of turnover ball are usually big indicators that your win/loss record may regress towards the league average of 8 wins in the upcoming season.

These traits are often described as "knowing how to win", but they are almost always just random... or possibly cheating.

Based on how teams with these flags against their record did in the subsequent season over the last decade, the Carolina Panthers (15-1) would expect to regress to around 10 wins in 2016. 

The four win Chargers should be dragged upwards to just over 7 wins.

These projections are around where the Panthers and the Chargers are quoted in the season total wins markets. So we’ve got a decent starting point that’s potentially stripped out some of the unsustainable luck that went into the 2015 regular season.

Here are the predicted wins for all 32 teams based on last year's luck-based indicators.

Next we need a way to predict individual match results.

The NFL is blessed because we have Bill James’ Pythagorean Log 5 method, which takes winning percentage and home field advantage and spits out the odds of a home and away win. 

(It looks like this =((E2)*(1-G2)*$B$1)/((E2)*(1-G2)*$B$1+(1-E2)*(G2)*(1-$B$1)), where E2 is the home team win%, G2 is the away team win% and B1 is the home winning %, currently around 0.57).

The Panthers are projected as a 10-6 team, so 10/16 or a 0.625 team.

You put estimated win% for each team into each match up and get the estimated home win% for the match as the output.

Do this for every regular season game. 

The projected 0.39 Rams go to the 0.27 SF 49ers on Monday night, week one. 

The Rams were in St Louis last season, but if we stick their regressed, new season win% into Bill’s formula, we get LA at 1.8 (a shade of odds on) to win on the money line in SF, or a spread of around 3 points in LA’s favour.

LA are currently quoted as -2.5 point favourites, so again, we’ve got a decent model.
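The spreadsheet formula translates directly into a few lines of Python. This is a minimal sketch: the function name `log5_home_win` is hypothetical, and 0.57 is simply the home win% quoted above, passed in where the sheet uses cell B1. The example reproduces the week one Rams at 49ers calculation.

```python
def log5_home_win(home_pct: float, away_pct: float, home_adv: float = 0.57) -> float:
    """Home win probability from the Log5 formula used in the spreadsheet.

    home_pct, away_pct: projected win% of the home and away teams (E2 and G2).
    home_adv: league-wide home win% (cell B1), assumed to be 0.57 here.
    """
    num = home_pct * (1 - away_pct) * home_adv
    den = num + (1 - home_pct) * away_pct * (1 - home_adv)
    return num / den


# Week one: the projected 0.27 49ers host the projected 0.39 Rams.
p_sf = log5_home_win(0.27, 0.39)
p_la = 1 - p_sf
print(f"SF {p_sf:.3f}, LA {p_la:.3f}, LA decimal odds {1 / p_la:.2f}")
```

This prints `SF 0.434, LA 0.566, LA decimal odds 1.77`, in line with the 1.8 money line quoted above. Two evenly matched teams return exactly the home win%, which is a handy sanity check on any spreadsheet version.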

Next we need to simulate all 256 games from week one to week seventeen.

Again, it’s easy in Excel.

We judge that LA will win in SF with a probability of 0.56, based on our money line odds of 1.8. If we generate a random number between 0 and 1 alongside this estimated win probability and it falls below 0.56, we grant LA the win; otherwise it’s an SF win.

Again we can do this for every regular season game and we’ve simulated one NFL season for 2016.

Ideally we’d like to repeat this a couple of thousand times, and once again even Excel obliges in less than a minute on a decent computer.

Check out this soccer post for the basic method.
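A minimal Python sketch of the simulation step for a single team, assuming its 16 game win probabilities have already come out of the Log5 step. The probabilities below are made up for illustration and `simulate_team_seasons` is a hypothetical helper. A full league version would draw each of the 256 games once and credit the result to both teams, which is what division standings and tie breakers need; this simplification only tracks one team and ignores ties.

```python
import random


def simulate_team_seasons(win_probs, n_sims=10_000, seed=1):
    """Monte Carlo of one team's 16 game regular season.

    win_probs: the team's estimated win probability in each game.
    Each game is settled by drawing a uniform random number in [0, 1) and
    granting the win if it falls below that probability. Ties are ignored.
    Returns the number of wins in each simulated season.
    """
    rng = random.Random(seed)
    return [sum(rng.random() < p for p in win_probs) for _ in range(n_sims)]


# Hypothetical win probabilities for a strong team (illustration only).
probs = [0.75, 0.62, 0.70, 0.68, 0.80, 0.66, 0.72, 0.64,
         0.78, 0.69, 0.71, 0.74, 0.60, 0.77, 0.67, 0.73]
wins = simulate_team_seasons(probs)
n = len(wins)
print("mean wins:", sum(wins) / n)
print("P(16-0):", sum(w == 16 for w in wins) / n)
print("P(8 or fewer wins, no winning season):", sum(w <= 8 for w in wins) / n)
```

The average number of simulated wins should sit close to the sum of the game probabilities, while the tails give the kind of 16-0 and non-winning season figures discussed below.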

We now have a range of season wins for all 32 teams, based on a reasonably robust new season rating and their actual intertwined schedule.

Seattle has the best projection for 2016; they are expected to average just over 11 wins.

However, the simulations illustrate the range of outcomes that are possible even for the likely best team in the NFL just due to the randomness in a short 16 game season.

From the plot of the outcomes from 10,000 iterations of the 2016 season, there is an 8% chance that Seattle will not have a winning season. Small, but certainly not insignificant.

More positively, there’s around a 1 in 1,000 chance they go 16-0. Bookies will currently give you around 499/1.

There’s around a 200/1 chance that someone goes 16-0 in 2016; again, you might only get 66/1 on that.
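Comparing a model probability with a bookmaker's price comes down to converting fractional odds into an implied probability. A rough sketch with hypothetical helper names, which ignores the bookmaker's overround:

```python
def implied_prob(odds: str) -> float:
    """Implied probability of fractional odds such as '499/1' (ignores the bookmaker's margin)."""
    num, den = (int(x) for x in odds.split("/"))
    return den / (num + den)


def expected_return(model_prob: float, odds: str) -> float:
    """Expected profit per unit staked at fractional odds if model_prob is the true chance."""
    num, den = (int(x) for x in odds.split("/"))
    return model_prob * (num + den) / den - 1


# Seattle 16-0: model roughly 1 in 1,000, bookmaker 499/1.
print(implied_prob("499/1"), expected_return(1 / 1000, "499/1"))
# Anyone 16-0: model roughly 200/1, bookmaker 66/1.
print(implied_prob("66/1"), expected_return(implied_prob("200/1"), "66/1"))
```

Both prices imply a bigger chance than the simulations give (0.002 against 0.001, and about 0.015 against 0.005), so at the model's probabilities each bet would lose around half to two thirds of the stake on average.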

Ratings can be updated as the season progresses or you can add your own tweaks at any time, such as subjective adjustments to account for current rosters.

Finally, here's the finishing probabilities for the NFC West, based on 10,000 league simulations and appropriate tie breakers.

The probabilities are fairly close to the quoted odds for the top two, with the Rams and 49ers priced shorter by the bookies in case either "does a Leicester".

Seattle and Arizona are also the most likely to nab the top seedings in the NFC, along with Cincinnati, New England and Pittsburgh in the AFC. But you could probably have guessed that.

The Jets have about a 14% chance of winning 12 or more games, which might be an AFC outsider worth running with. 

Dallas and Detroit, meanwhile, each have a 5% chance of doing likewise in the NFC and earning a high postseason seeding.

Monday 29 August 2016

On The Rebound.

Expected goals are designed to look at the process of scoring, rather than the singular outcome on the day.

I've previously written about how a few big chances aren't equivalent to the same expected goals total spread over more shots. Concentrating the value in fewer, bigger chances trades the possibility of scoring a larger number of goals for a greater likelihood of scoring at least once.
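To see the trade-off, the exact distribution of goals from a set of chances can be built up one shot at a time (a Poisson binomial distribution). A minimal sketch, assuming the shots are independent of one another, comparing 1.0 expected goals from two 0.5 chances with the same total from ten 0.1 chances:

```python
def goal_distribution(chances):
    """Exact probability of scoring 0, 1, 2, ... goals from independent chances.

    chances: per-shot goal probabilities (expected goals values).
    """
    dist = [1.0]  # before any shot: P(0 goals) = 1
    for p in chances:
        new = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            new[goals] += prob * (1 - p)   # shot missed
            new[goals + 1] += prob * p     # shot scored
        dist = new
    return dist


big = goal_distribution([0.5, 0.5])    # 1.0 xG from two big chances
small = goal_distribution([0.1] * 10)  # 1.0 xG from ten small chances
for name, d in (("two big", big), ("ten small", small)):
    print(f"{name}: P(score at least once) = {1 - d[0]:.3f}, "
          f"P(3+ goals) = {sum(d[3:]):.3f}")
```

The two big chances score at least once 75% of the time but can never produce three goals; the ten small chances score at least once about 65% of the time, with roughly a 7% chance of three or more.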

Another acknowledged problem when using cumulative expected goals to represent a side's achievements is the treatment of quickfire attempts from rebounding shots.

Often the chances are created well inside the box, sometimes leading to cumulative expected goals totals that exceed 1 for a connected opportunity that could at best only result in a single score.

Choosing cutoff points is always subjective; after all, every action in a real match is connected to some degree, from the first kick to the last. But in the table below I've charted the percentage of shots for each team in last year's La Liga that came within 10 seconds of their previous attempt.

Over 90% of attempts were made at least 30 seconds after the preceding effort, so the majority of attempts are preceded by a lengthy phase of general play.

The average in 2015/16 for La Liga as a whole was 6%, but Eibar, Espanyol and Real Betis heeded the call to "follow up" with greater regularity.

The problem of overestimating a side's attacking potential by inflating rebounds can be reduced by simulating chances, but limiting each such sequence to a maximum of one actual goal.

For example, two chances in quick succession, each with an individual probability of being scored of 0.5, don't guarantee a goal on average, as their cumulative total of 1 might suggest. Instead, you score from three quarters of such related opportunities, because the sequence fails only when both attempts miss (0.5 × 0.5 = 0.25).

Real Betis hone their blocking skills against a top Premier League team.
We can demonstrate the difference between the two sets of circumstances by first ignoring and then accounting for the connected events in a match simulation.

In 2015/16 Eibar drew a late season fixture with Betis, 1-1.

The visitors, Betis, had six shots, one of which was the game's biggest chance, from which they scored, nicely illustrating the value of creating the odd gilt-edged opportunity.

Eibar had 21 attempts, 16 of which had a goal probability of less than 10%.

Five Eibar attempts came within 10 seconds of an initial attempt. Four combined low value attempts with more valuable ones, but the final salvo united two attempts that were individually marginally odds on to be scored.

Cumulatively, Eibar's expected goals almost reached three compared to just over one for Real Betis.

Although score effects also played a part, the hosts would appear to have been unlucky not to take all three points.

Simulations based on attempts confirm this impression.

A straight simulation of all 27 attempts in the game gives Eibar more goals in 83% of the iterations, with each side's average number of goals matching its cumulative expected goals tally.

However, once you treat rebounds as connected events, Eibar's share of victories falls to 77%, and their average goals scored also drops, from the 2.9 implied by their cumulative expected goals to just 2.6.
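The connected-event adjustment can be sketched as a Monte Carlo in Python. The shot values below are hypothetical, not the actual Eibar v Betis attempts, and grouping rebounds into sequences (for example with the 10 second rule) is assumed to have been done beforehand. With `connected=True` each sequence yields at most one goal, so two 0.5 chances contribute 0.75 goals on average rather than 1.0.

```python
import random


def simulate_match(home_seqs, away_seqs, connected, n_sims=20_000, seed=7):
    """Monte Carlo a match from its shots.

    home_seqs / away_seqs: lists of shot sequences; each sequence is a list of
    goal probabilities, with quickfire rebounds grouped together.
    connected=False treats every shot independently (cumulative xG);
    connected=True caps each sequence at one goal.
    Returns (home win share, mean home goals, mean away goals).
    """
    rng = random.Random(seed)

    def goals(seqs):
        total = 0
        for seq in seqs:
            scored = sum(rng.random() < p for p in seq)
            # A connected sequence can yield at most one actual goal.
            total += min(scored, 1) if connected else scored
        return total

    home_wins = home_goals = away_goals = 0
    for _ in range(n_sims):
        h, a = goals(home_seqs), goals(away_seqs)
        home_wins += h > a
        home_goals += h
        away_goals += a
    return home_wins / n_sims, home_goals / n_sims, away_goals / n_sims


# Hypothetical shots: the home side includes a rebound pair of two 0.5 chances.
home = [[0.5, 0.5], [0.08, 0.35], [0.05], [0.04], [0.1], [0.06]]
away = [[0.6], [0.05], [0.1]]
for flag in (False, True):
    win, hg, ag = simulate_match(home, away, connected=flag)
    print(f"connected={flag}: home win {win:.2f}, goals {hg:.2f}-{ag:.2f}")
```

Capping with `min(scored, 1)` has the same distribution as stopping the sequence at its first goal, since both score exactly when at least one attempt in the sequence goes in.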

Expected goals do provide insight into a side's ability or achievement in a single match, but occasionally they overrate or underrate the teams at the extremes.

Friday 26 August 2016

48 Games into the Championship.

The Championship may be only four match days old, but granular data on the state of the teams is beginning to pile up.

Over 1,000 goal attempts have been made, 300-plus of which required the keeper at least to attempt a save, and Shane Duffy has already scored three league goals, although none for his actual employers.

Prediction is a constant balancing act between using recent data and larger samples that inevitably contain information from previous seasons, when a side may have had a very different lineup.

Huddersfield currently sit top of the Championship, while Newcastle, the short-priced preseason favourites, are closer to the relegation zone, from a points perspective, than they are to the top of the pile.

The betting markets do not expect this situation to last: Newcastle still head the market, and the current leaders are given around a 4% chance of remaining in their elevated position.

Fans of Huddersfield will no doubt relish their current position and perhaps dream that they are deserved pacesetters at this early stage, much as Crystal Palace, Swansea and Leicester supporters did early in the 2015/16 Premier League.

So is there any useful information to be gained from a sample size of just four games?

Many will be familiar with the idea that individual matches are rife with luck and looking at the process of chance creation, rather than just the relatively infrequent outcomes can be more predictive.

Huddersfield currently have a goal difference of +3, the smallest possible differential when acquiring 10 points from four matches, and each of their three victories came by a single goal.

They've taken just slightly more attempts than they've faced, and expected goals, based on shot type and position, suggest that they might score 4.5 goals and allow 5.5 on average from such chances.

They have a negative expected goal difference after four matches, which is only the 16th best record this season.

Small numbers of matches can also have very different strengths of schedules for different sides and Huddersfield has played a reasonably taxing first four games against relegated teams, Villa and Newcastle, along with Barnsley and Brentford.

Using the interlocking collateral form of all 24 sides and their expected goal differentials from Opta-sourced data, the solution that best describes the events of the 48 games to date places Huddersfield as the 12th best team in terms of strength-of-schedule-corrected expected goals.

Newcastle are second under this approach, behind only Brighton.

All three promoted teams are comfortable inside the top 10, along with the likes of Wolves, Fulham, Derby, QPR and perhaps surprisingly, Reading.

Blackburn prop up the table whichever approach you use, with Nottingham Forest and Birmingham enjoying more elevated league positions than their shooting and schedule perhaps merit.
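The post doesn't spell out how the collateral form is solved. One common approach, shown here purely as an assumed stand-in, is a least-squares (Massey-style) rating: each game's expected goal difference is modelled as the home side's rating minus the away side's, plus a home advantage term. The helper name and the home advantage term are assumptions, and the four-team data are hypothetical.

```python
import numpy as np


def schedule_adjusted_ratings(games, teams):
    """Least-squares (Massey-style) ratings from per-game expected goal differences.

    games: list of (home, away, xg_diff), where xg_diff = home xG minus away xG.
    Fits rating[home] - rating[away] + home_adv ~= xg_diff over all games,
    with an extra equation forcing the ratings to sum to zero.
    Returns ({team: rating}, home_adv).
    """
    idx = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    rows, targets = [], []
    for home, away, diff in games:
        row = np.zeros(n + 1)
        row[idx[home]] = 1.0
        row[idx[away]] = -1.0
        row[n] = 1.0  # home advantage term
        rows.append(row)
        targets.append(diff)
    zero_sum = np.zeros(n + 1)
    zero_sum[:n] = 1.0
    rows.append(zero_sum)
    targets.append(0.0)
    solution, *_ = np.linalg.lstsq(np.array(rows), np.array(targets), rcond=None)
    return dict(zip(teams, solution[:n])), solution[n]


# Hypothetical four-team example (not real Championship data).
teams = ["A", "B", "C", "D"]
games = [("A", "B", 0.8), ("C", "D", 0.5), ("B", "C", 0.6),
         ("D", "A", -0.9), ("A", "C", 1.2), ("B", "D", 0.4)]
ratings, home_adv = schedule_adjusted_ratings(games, teams)
for team, rating in sorted(ratings.items(), key=lambda kv: -kv[1]):
    print(f"{team} {rating:+.2f}")
print(f"home advantage {home_adv:+.2f}")
```

Because every team's rating is tied to every opponent it has met, and those opponents' ratings to theirs, a side that has faced strong teams is credited for it even after only four games.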

It's early days for the 24 team league, and Huddersfield fans should perhaps take a screenshot of this early incarnation for posterity.

Wednesday 10 August 2016

The Premier League Goalkeeping Class of 2016/17.

Goalscorers, followed by goalkeepers, have been the most widely analysed positions in football.

The reasons are obvious: their main duties are closely connected. Strikers try to get the ball on target and into the net, whilst keepers do their best to prevent the latter.

In short, there is a readily identifiable series of actions and possible outcomes that can be used to attempt to define the abilities of the two positions.

Initially keepers were ranked simply by save percentage, the proportion of on target attempts that they prevented from turning into goals.

We should expect some variation in save percentages, even if every attempt carried exactly the same difficulty tariff and each keeper had the same talent for making saves.

If you toss a series of fair coins in a varied number of trials, the success rates of heads will largely fall in and around 50%, but some coins will appear more talented than others, just through chance.

Once you allow for this natural variation and the unequal number of attempts faced by individual keepers, the save percentages of Premier League keepers are still more widely dispersed than mere chance would produce.

We can conclude that either some keepers are better at saving shots than others, or not all shots are savable to the same degree, or, much more likely, a combination of at least these two factors is at work.
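A sketch of the dispersion check, using simulated keepers rather than real data: compare the spread of save rates with what binomial (coin toss) variation alone would produce, via a Pearson chi-square statistic divided by its degrees of freedom. Values near 1 are consistent with equal talent facing equally difficult shots; values well above 1 point to extra dispersion. The helper name and the simulated talent levels are hypothetical, and 500 keepers are used only to make the contrast clear; with 20 or so real keepers the statistic is much noisier.

```python
import random


def dispersion_ratio(saves, faced):
    """Pearson chi-square statistic per degree of freedom for keepers' save rates.

    saves[i], faced[i]: saves made and on-target attempts faced by keeper i.
    Near 1 if every keeper shares one save probability (pure coin-toss
    variation); well above 1 if save rates are more spread out than that.
    """
    p_bar = sum(saves) / sum(faced)
    chi2 = sum((s - n * p_bar) ** 2 / (n * p_bar * (1 - p_bar))
               for s, n in zip(saves, faced))
    return chi2 / (len(saves) - 1)


# Simulated keepers facing unequal numbers of attempts (illustration only).
rng = random.Random(3)
faced = [rng.randint(40, 160) for _ in range(500)]
same_talent = [sum(rng.random() < 0.7 for _ in range(n)) for n in faced]
mixed_talent = [sum(rng.random() < (0.6 if i % 2 else 0.8) for _ in range(n))
                for i, n in enumerate(faced)]
print(f"equal talent: {dispersion_ratio(same_talent, faced):.2f}")
print(f"mixed talent: {dispersion_ratio(mixed_talent, faced):.2f}")
```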

Expected goals models, which use shot location, type, style and placement may be used in an attempt to quantify the task faced by keepers for each individual on target attempt.

A weakly struck chip that drifts gently towards the centre of the goal at midriff height is far more savable than a powerfully hit shot that deflects towards the top corner, and historical precedent can be used to assign such efforts differing likelihoods of being saved.

This type of analysis quickly identifies keepers who have allowed fewer goals than the average keeper described by such models would concede from the attempts faced.

For example, in 2015/16 Lukasz Fabianski allowed 44 non penalty and non own goals from 157 goal bound attempts compared to an expected number conceded of nearly 50. Similarly, Kasper Schmeichel allowed 32 from 128 attempts against a par score of just over 35.

Both are over performers, but if we look more deeply at each attempt by simulating the range of outcomes using an average stand in keeper, the Swansea stopper appears to have put in a more solidly impressive performance.

An average keeper would emulate or better Fabianski's 2015/16 shot stopping performance around 9% of the time, whereas he would replicate or better Schmeichel's above par season in over a quarter of the simulated trials.
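The "average stand-in keeper" comparison can be sketched as a Monte Carlo: for each on-target attempt, draw a goal with the probability the expected goals model assigns, and count how often the simulated keeper concedes no more than the real one did. The helper name is hypothetical, and the shot values in the example are placeholders; in practice they would be the model's value for each of the 157 attempts Fabianski faced, so this toy version will not reproduce the 9% figure.

```python
import random


def prob_average_keeper_matches(goal_probs, goals_allowed, n_sims=10_000, seed=11):
    """Share of simulated seasons in which an average keeper concedes
    goals_allowed or fewer from the same on-target attempts.

    goal_probs: model probability that each attempt beats an average keeper.
    A small result means the real keeper's record is hard to match by luck alone.
    """
    rng = random.Random(seed)
    matched = 0
    for _ in range(n_sims):
        conceded = sum(rng.random() < p for p in goal_probs)
        matched += conceded <= goals_allowed
    return matched / n_sims


# Hypothetical shot values summing to about 50 expected goals from 157 attempts.
shots = [0.1] * 60 + [0.4] * 60 + [0.55] * 37
print(prob_average_keeper_matches(shots, goals_allowed=44))
```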

Over-performing at shot stopping is therefore an encouraging sign, but by no means a clear indication of consistent, above average talent that may persist. It may just be par ability boosted by luck.

The table above shows the keepers who played in both of the last two seasons and how many times they saved more shots than predicted by an average ability expected goals model.

Ten keepers were above average in both seasons, whereas eight were below par in consecutive campaigns.

Stoke and England's Jack Butland combines youth with two seasons of over-performance in terms of goals allowed compared to the goal-bound efforts he has been required to save.

However, just as Schmeichel's 2015/16 season has a 25% chance of being replicated by a lucky, average keeper, the same is true, only more so, for both of Butland's seasons.

His impressive over-performance in 2014/15 from relatively few attempts faced, and his smaller over-performance in 2015/16 from a much larger sample, were each replicated or bettered in just under 50% of trials.

Depending upon where we draw this probabilistic line, many of the keepers who have had above par seasons in each of the last two campaigns, measured against an expected goals model, begin to fall away.

de Gea's most recent season is reproduced in nearly 30% of average trials. Similar reservations apply to Pantilimon, Robles, Forster, Ospina and more marginally Cech.

Only three keepers have over performed in the previous two seasons, with location/style/placement corrected expected goals campaigns that each have a likelihood of 10% or less of being replicated by our average keeper.  

Fabianski has the most impressive combination of dual over performance that is least likely to be emulated by an average performer, followed by Lloris and Adrian.

Conclusions should always be couched in probabilistic terms.

Butland and Forster, the two pretenders to Hart's England shirt may well be above average shot stoppers, but current evidence allows for the not insignificant possibility that both are reasonably capable keepers, who have enjoyed a run of good fortune.

And both may be a step or two behind possibly the best combination of likely longevity and current ability shown in the Premier League by Spurs' Hugo Lloris.

Friday 5 August 2016

Ross McCormack, a £12 Million Gamble.

Aston Villa's descent into the Championship was one of the few certainties of the 2015/16 Premier League season.

Four points from a final possible 45, each gained against fellow relegated teams, mirrored Derby's meek, second half of the season collapse nine years earlier.

Villa's problems were extensive and wide ranging, but you had to return to Derby's debacle before you came across a relegated side who scored fewer goals than Villa did last term.

Relegated teams on average improve their goal scoring in the Championship, but even the most optimistic of projections would still likely leave Villa as one of the weakest Premiership attacks undertaking Championship duties.

Therefore, on a superficial level, their acquisition of Ross McCormack from Fulham, for whom he scored 19 non-penalty goals in just over 4,000 minutes of play, appears a sensible move.

However, McCormack turns 30 two weeks into the new season, typically an age when outfield attacking players have begun an age-related decline in output. A £12 million price tag also appears excessive.

Unless Villa are rewarded with an immediate or near immediate return to the top flight, they will be left with a rapidly depreciating asset.

McCormack has spent the majority of his time in England playing at Championship level, initially with Cardiff, then Leeds and latterly Fulham, without tempting a Premier League suitor, even in his prime.

Further alarm bells may ring when we look at the expected goals, based on shot location and type.

During McCormack's two most recent seasons at Fulham he has maintained an impressive volume of goal attempts.

He took slightly fewer shots per 90 in 2015/16, but from slightly better positions, and once time played is factored in, he was trying to finish chances worth 0.25 expected goals per 90 in 2014/15 and 0.28 a season later.

However, his actual total non penalty goals scored rose from 13 to 19 a year later.

An over-performance whereby 13 goals are scored from a cumulative expected goals total of 9.9 shouldn't surprise; an average player would achieve this through random chance around 20% of the time.

But 2015/16's return of 19 goals from an expected 12, which no doubt contributed greatly to his price tag and sparked a bidding war between the relegated Premier League sides, is more difficult to dismiss as mere random fluctuation.

An average player, given McCormack's 2015/16 opportunities would score 19 or more goals just 3% of the time. So have Villa bought that rare commodity, a lethal finisher?
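As a quick cross-check on these figures, a Poisson distribution with the same mean approximates the chance of an average finisher reaching a given total. This is a swapped-in shortcut, not the shot-by-shot simulation behind the numbers above: because every real shot has a goal probability below 1, the true spread is a little narrower, so the Poisson tail slightly overstates the odds.

```python
import math


def poisson_tail(k: int, mean: float) -> float:
    """P(X >= k) for X ~ Poisson(mean), used as an approximation to shot-by-shot simulation."""
    return 1 - sum(math.exp(-mean) * mean ** i / math.factorial(i) for i in range(k))


print(f"13+ from 9.9 xG: {poisson_tail(13, 9.9):.3f}")
print(f"19+ from 12 xG:  {poisson_tail(19, 12):.3f}")
```

This gives about 0.199 and 0.037, close to the 20% and 3% quoted above, with the second a touch higher, as the approximation predicts.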

First, imagine that each of the 24 Championship teams has a striker with a small chance of over-performing to the levels seen in McCormack's figures during a season.

Such an event that may have just a 3% chance of occurring for an individual will be more likely if we examine a larger group of players.

In short, if you had 24 players attempting McCormack's chances each season, you would expect at least one to produce his inflated return of 19 compared to the likely average of 12 goals around every other season, simply through chance.
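The "every other season" figure follows from treating the 24 strikers as independent, each with the same 3% chance of such a season:

```python
p_single = 0.03   # chance one average striker matches McCormack's 2015/16 haul
n_strikers = 24   # one per Championship side (simplifying assumption)
p_at_least_one = 1 - (1 - p_single) ** n_strikers
print(f"{p_at_least_one:.2f}")  # 0.52
```

A chance of about 0.52 per season means a McCormack-style outlier turns up somewhere in the division roughly every other year through luck alone.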

Villa may hope they have bought a player capable of repeating the 19 non-penalty goals he scored for Fulham last season, but, erring on the side of caution, it may be better to assume they have bought a striker more likely to manage around 12 non-penalty goals a season.

The good news for Villa fans is that McCormack was also a frequent creative influence, supplying chances for his teammates. He set up nine goals in each of his seasons at Fulham.

On these occasions there were no major disconnects between expectation and reality. In both seasons you would expect an average player to score around 11 goals from the chances McCormack created.

Notwithstanding the possible difference in the quality of his teammates in 2016/17, Villa's new buy is unlikely to over-perform to such heights in converting his West Midlands chances, but fans will hope he provides an all round contribution that goes some way to justifying the risk and reward of a £12 million outlay.

Wednesday 3 August 2016

What to Expect from the Relegated Premier League Strikers.

The Championship helps to kick off the English domestic club season on Friday.

For the lost souls of Norwich and Newcastle, second tier football will not quite be the culture shock awaiting Aston Villa, a Premier League side since Day One.

A reunion with cross-city rivals Birmingham will be scant compensation for missing out on trips to the Emirates or Old Trafford for the first time in 24 years.

At least the presence of the big three will ensure that upwardly mobile Burton Albion finally fill their 7,000 capacity stadium for the first time since their non-league days.

Just as promotion to the Premier League heralds a season of reduced matchday expectations, fewer goals scored and more conceded, a drop into the Championship, on average, improves a side's attacking and defensive bottom lines.

The arrival of unfamiliar opponents promises a more satisfactory conclusion come fulltime.

Taking a lesson from recent history, Burnley, Hull and QPR failed to retain their Premier League status in 2014/15.

The latter improved the rate at which they scored and conceded goals subsequently in the Championship, while Burnley, as champions and Hull via the playoffs returned to the top flight at the first time of asking.

34 players who had at least one attempt on goal in the Premier League of 2014/15 for the three relegated teams also tried their luck with the same sides in the Championship three months later.

As a group they found the Championship a more rewarding environment to showcase their goal scoring skills.

An attempt made in the Championship, once shot location and type were accounted for, was more likely to result in a goal and less likely to be blocked than in their experience in the top flight.

As a group, these loyal retainers increased their non penalty shot volume per 90 from 1.27 in the Premier League to 1.39 in the Championship and their expected goals per shot also rose from 0.07 to 0.1.
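Multiplying shot volume by shot quality shows how the two increases compound, assuming both figures are pooled group totals, so that their product is expected goals per 90:

```python
pl = 1.27 * 0.07   # Premier League: shots per 90 x xG per shot
ch = 1.39 * 0.10   # Championship: shots per 90 x xG per shot
print(f"{pl:.3f} -> {ch:.3f} xG per 90 ({ch / pl - 1:+.0%})")
```

This prints `0.089 -> 0.139 xG per 90 (+56%)`: a modest rise in volume and a larger rise in quality combine into a substantial jump in expected output.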

22 of the 34 players took more attempts per 90 in the Championship than they had in the Premier League, 24 had a higher expected goals per 90 and 20 ticked both boxes.

26 of the 34 turned probabilistic improvement into actual gains by scoring at non-penalty goal rates per 90 that were as good as or better than their returns in the Premier League a season before.

And the productivity leap was even more pronounced if we look at only attackers and midfielders.

So supporters of the three relegated Premier League teams may rue upcoming life in the second tier, but they should be compensated with more excitement at the attacking end of the field. And this increased firepower from players who struggled in the Premier League is one reason why it is a shade of odds on that one of the three returns immediately as champions.