Friday 30 September 2016

The Biggest Liar in Football.

In a week during which many fresh contenders emerged, the league table remains one of the least trustworthy sources in football.

2-0 leads being the "most dangerous in football" made a welcome reappearance courtesy of James Richardson on the BT European Goals Show and was warmly greeted by the assembled hacks, but "the table never lies" still reigns supreme.

Small sample size, luck laden outcomes, random variation, strength of schedule, red cards, injury counts, new improved/useless players and managers, dodgy, but well intentioned interpretation of the laws, patchily applied, all conspire to produce a transient ranking that broadly sifts the very best from the very worst, but rarely manages to fully reward the bulk of closely matched sides with their just deserts.

Reinterpreting the mass of shots, saves and passes into a better reflection of the past and a less knee jerk projection of the future can be done by simulation of past and future games to generate the now familiar heat maps. These show the range of points a side might have/may well accumulate and the range of potential positions occupied.

This approach admirably illustrates the probabilistic breadth of outcomes that can befall a side given their core achievements, but nothing beats the implied certainty of a singular league position, with as much of the unsustainable luck stripped away.

The backbone of the table above, produced by Tom @UTVilla, is the current position occupied by each team in the Premier League.

The expected position to the left is the most likely position occupied by each side based on an expected goals simulation of each match played in the season to date. So Hull are flattered somewhat by their current position, while Stoke should perhaps be a couple of places higher.
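As a rough sketch of what an expected goals simulation of a single match can look like, the snippet below treats every shot as an independent attempt that is scored with a probability equal to its expected goals value. That independence is an assumption, the shot values are invented placeholders rather than data from any real game, and `simulate_match` and `expected_points` are hypothetical names of my own.

```python
import random

def simulate_match(home_shots, away_shots, trials=10000, seed=0):
    """Replay a match many times from shot-level xG values.

    home_shots / away_shots: lists of per-shot scoring probabilities.
    Returns the share of trials ending in a home win, draw and away win.
    """
    rng = random.Random(seed)
    home_wins = draws = away_wins = 0
    for _ in range(trials):
        # Each shot is scored with probability equal to its xG (assumed independent).
        home_goals = sum(rng.random() < p for p in home_shots)
        away_goals = sum(rng.random() < p for p in away_shots)
        if home_goals > away_goals:
            home_wins += 1
        elif home_goals < away_goals:
            away_wins += 1
        else:
            draws += 1
    return home_wins / trials, draws / trials, away_wins / trials

def expected_points(p_win, p_draw):
    """Three points for a win, one for a draw."""
    return 3 * p_win + p_draw

# Placeholder shot values, purely for illustration.
p_home, p_draw, p_away = simulate_match([0.35, 0.10, 0.05, 0.20], [0.08, 0.45])
home_xpts = expected_points(p_home, p_draw)
```

Summing the expected points from every match played gives a luck-stripped points total for each side, and ranking those totals is one plausible route to an expected position.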

The right hand axis uses the actual number of points, ill gotten or otherwise and adds the simulated outcome of each team's remaining fixtures based on their core statistical achievements over the recent past. It includes the season to date, but not exclusively so.

This forecast position grants teams the luck they have enjoyed or endured to date, but denies them the extremes in the upcoming months.

Tom's tube map to each side's ultimate potential May destination brilliantly illustrates the likely upwardly mobile or downward spiralling trajectories which may await...except possibly for Pep's Manchester City revolution.

The more mature Championship table, with four more sides compared to the Premier League and six new entries each season perhaps offers a more interesting chart and La Liga completes this initial trio of leagues.

Thursday 22 September 2016

Expected Goals and Game State.

The aim in competitive team sports is to score more goals or points than you allow your opponents.

However, the route taken is often subtly compromised by the ultimate intention of simply winning the contest, so scores are neither maximised, nor scores allowed minimised.

More so in higher scoring sports, such as American Football, a side will react to the efforts of a trailing team by allowing yardage and possibly points to be scored against themselves in exchange for the trade of another valuable commodity, namely time run off the clock.

In short, teams react to the current score or game state, and the multitude of statistics generated in this phase of a match may not be a true representation of the gulf in quality between the teams in a more equally balanced phase.

"Garbage time" touchdowns when already trailing by four scores may alter our assessment of the abilities of two teams in a more favourable way for the defeated side.

Therefore, it is commonplace to assess an NFL team based on the numbers they record when within one score either side of a tied game and further restrict collection to include pass/run neutral downs and distance, such as 1st and 10.

Football has fewer scoring events than its Stateside cousin, but the assessment of performance, particularly if measured in expected goals, may similarly benefit from dicing the data to include only events that occur within one score of a tied scoreline.

Stoke have made an uncomfortable start to the season where conventional wisdom saw them as capable of taking the next step as a regular top ten and potential top six side.

Instead a run of actual results that more typically reflect their core expected goals figures from the latter half of 2015/16 finds them bottom with a single point.

Last Sunday's match at Crystal Palace highlights how game state may produce expected goals figures that might not fully reflect the relative team merits.

Stoke shaded Palace in expected goals, but Palace scored four without reply until injury time.

In the admittedly brief period during which the game was level or within a single score, the home team had four attempts to none from City.

Just under half of Stoke's goal attempts and 65% of their total accumulated expected goals came in the final 15 minutes, when the hosts already led 4-0.

"Give Me Hope, Joe Allen"
Perhaps Palace considered conceding four in the final 15 minutes to a team who had failed to score at all in the first 75 was so unlikely they could coast to full time with little risk, save for a narrowing of the virtual divide and the odd real life goal conceded.

Maybe they slipped into the Premier League equivalent of a prevent defence, but few would argue that Stoke's expected goals "victory" over the 90 minutes hid a miscarriage of justice.

Palace out "expected goaled" Stoke with the scores level, when up by one, when up by two and when up by three, whereas Stoke only dominated once the match was over as a recognisable contest.
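Splitting expected goals by game state along these lines can be sketched as below. The event format (a time-ordered list of shots with a team, an xG value and a goal flag) and the function names are assumptions of mine, and the events are invented rather than taken from the Palace game.

```python
from collections import defaultdict

def xg_by_game_state(events):
    """Total xG per (team, margin), where margin is the shooting team's
    lead (positive) or deficit (negative) at the moment of the shot.

    events: chronologically ordered dicts with keys
            'team' ('home' or 'away'), 'xg' (float) and 'goal' (bool).
    """
    score = {"home": 0, "away": 0}
    totals = defaultdict(float)
    for event in events:
        team = event["team"]
        opponent = "away" if team == "home" else "home"
        margin = score[team] - score[opponent]
        totals[(team, margin)] += event["xg"]
        # Update the score only after the shot has been assigned a state.
        if event["goal"]:
            score[team] += 1
    return dict(totals)

def close_game_xg(totals, team, max_margin=1):
    """xG a team created while the game was within max_margin goals."""
    return sum(xg for (side, margin), xg in totals.items()
               if side == team and abs(margin) <= max_margin)

# Invented events, purely for illustration.
events = [
    {"team": "home", "xg": 0.3, "goal": True},
    {"team": "home", "xg": 0.2, "goal": True},
    {"team": "away", "xg": 0.1, "goal": False},
    {"team": "home", "xg": 0.4, "goal": False},
]
totals = xg_by_game_state(events)
home_close = close_game_xg(totals, "home")
away_close = close_game_xg(totals, "away")
```

Keying on the margin before each shot means a goal counts towards the state in which it was scored, not the one it creates.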

With this in mind, here's the ranking of the expected goal difference for all 20 sides in the Premier League for all 2016/17 games, corrected for strength of schedule. It is shown both for periods when the game was close and for whole game data.

Ranked Expected Goal Difference in All Matches and while Games are Close, 2016/17.

Liverpool have the best expected goal difference counting every minute played this season, but current leaders, Manchester City have been the most dominant in terms of expected goals created and allowed whilst their matches have been at their most contestable.

Friday 16 September 2016

Goalkeeping Talent and/or Luck

Whether you are making a subjective or data based assessment of the skill sets of footballers the approach is similar in both cases.

Has the player out performed a nominally chosen benchmark figure for the attribute you are measuring?

For example, a keeper might make saves that an experienced observer considers exceptional, or he might save more attempts than expected by a statistical model based on the average performance of his peers.

The limitations of such observation based evaluation lie in sample sizes. Keepers may produce hot performances that inevitably cool and aren't indicative of their general level of performance, and levels of good or bad fortune may be present in any data set.

I've 67 on target attempts faced by Fraser Forster in 2015/16, from which he conceded 17 goals. Simulating these shots can show how often an average keeper, represented by an expected goals model, would have conceded as many or fewer than Forster did that season.

Forster's 17 goals conceded is equalled or bettered by an average keeper in around 24% of trials and is represented by the orange part of the distribution.
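A minimal sketch of this kind of simulation follows, assuming each on target attempt is an independent trial that an average keeper concedes with a probability given by the expected goals model. The per-shot probabilities used here are flat placeholders, not Forster's actual shot data, and `share_equal_or_better` is a hypothetical name.

```python
import random

def share_equal_or_better(shot_xg, goals_conceded, trials=20000, seed=1):
    """Share of trials in which an average keeper concedes no more
    than goals_conceded from the given on target attempts.

    shot_xg: per-shot probabilities that an average keeper is beaten.
    """
    rng = random.Random(seed)
    equal_or_better = 0
    for _ in range(trials):
        # The shot beats the average keeper with probability p (assumed independent).
        conceded = sum(rng.random() < p for p in shot_xg)
        if conceded <= goals_conceded:
            equal_or_better += 1
    return equal_or_better / trials

# Placeholder values only: 67 attempts with an identical, made-up probability.
placeholder_shots = [0.3] * 67
share = share_equal_or_better(placeholder_shots, goals_conceded=17)
```

A low share suggests the keeper's record would be hard for an average keeper to match, while a high one suggests he was beaten more often than the attempts warranted.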

While an over achievement is obviously desirable, simulating all attempts also adds information about how likely it was this over achievement occurred by chance. Forster may not maintain these levels, but he may be a consistently above average shot stopper.

The table below uses shot data from 2015/16 to see how often an "average keeper" simulation of the actual attempts faced by Premier League keepers resulted in the par keeper equalling or bettering the actual number of goals conceded in reality by each keeper.

Only 11% of simulations managed to equal or best Fabianski, whereas at the opposite end of the table Stoke's keeper crisis in the continued injury absence of Jack Butland is starkly revealed.

Their current first choice keeper is 40 year old Shay Given, who is unsurprisingly injury prone, and his performance in limited appearances last season was equalled or bettered by the average Premier League keeper in 80% of simulations.

Jakob Haugaard, his younger and initially preferred counterpart, raises that under performance to include every average keeping simulation.

Haugaard conceded 9 goals from 18 on target attempts against a cumulative expectation of just over three, and I've yet to find an iteration of those 18 attempts that does worse than the young Dane's unimpressive introduction to Premier League duty.
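When the tail is this thin, a finite number of trials may never land in it, so an exact calculation is a useful cross-check. The sketch below builds the full distribution of goals conceded by convolving the attempts one at a time, again assuming independent shots. The flat per-shot probability is a loud assumption chosen only so that the 18 values sum to a little over three; the real attempts would each carry their own value.

```python
def goals_distribution(shot_xg):
    """Exact distribution of goals conceded from independent shots.

    Returns a list where index k holds the probability of exactly k goals.
    """
    dist = [1.0]  # before any shot: zero goals with certainty
    for p in shot_xg:
        updated = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            updated[goals] += prob * (1 - p)   # shot saved
            updated[goals + 1] += prob * p     # shot scored
        dist = updated
    return dist

def prob_at_least(shot_xg, goals):
    """Probability of conceding goals or more from these shots."""
    return sum(goals_distribution(shot_xg)[goals:])

# Assumption: 18 identical attempts summing to 3.2 expected goals.
flat_shots = [3.2 / 18] * 18
as_bad_or_worse = prob_at_least(flat_shots, 9)
```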

Shot saving is of course one of many abilities demanded of goalkeepers by modern day managers, but it remains a substantial contributing factor to their valuation and age is also a factor. Even keepers have ageing curves, albeit right shifted compared to the overall footballing sample.

These two simple inputs of age and likelihood of over performance from last season broadly correlate to each keeper's current valuation on Transfermarkt.

Courtois and de Gea's valuations, rightly or wrongly, have made the jump to vie with the inflated valuations of mainstream attacking players, but the input of the two variables listed above generally identifies those who command a high price amongst their peers (even if their over performance is likely to regress) and those who risk being replaced by a loan signing from the Championship (even if their parlous under performance is also likely to become less extreme).

Thursday 15 September 2016

The Championship After 7 Games.

The English second tier makes good use of the early season midweek to cram in as many matches as possible. Already teams have played nearly twice as many league matches as their Premier League counterparts and around 1/8 of the season is already in the record books.

The division is often more chaotic than the Premier League, boasting four additional sides and six new members each year by way of relegation from the top tier and promotion from the third.

So there is potentially a big gulf in class between the strongest teams in the division, represented by Premier League regulars on a brief excursion to pastures new, such as Newcastle, and under resourced over achievers with a recent history of non league football, such as Burton.

However, these financial mismatches often mask a bulk of the division where resources and abilities are broadly similar and the difference between a push for the playoffs or an anxious spring spent looking downwards can be as much down to the vagaries of luck as it is to the careful assembly of talent.

In the tables below, I've simulated the remainder of the season, based largely on a weighted combination of each team's expected goals created and allowed so far in 2016/17 and their expected goals performances from the previous season.

The points won in each iteration are then added to the actual points they've won so far to chart the % likelihood that each team will finish in a particular position in May.
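The general shape of such a simulation can be sketched as follows. This is not the model used for the tables: the 50/50 blend weight, the Poisson scoring, the crude averaging of attack and defence rates, the lack of home advantage and the random tie break are all simplifying assumptions, and every name and number is hypothetical.

```python
import math
import random
from collections import Counter

def blended_rate(current, previous, weight=0.5):
    """Weighted mix of this season's and last season's xG per game."""
    return weight * current + (1 - weight) * previous

def poisson(rng, lam):
    """Draw a Poisson variate by multiplying uniforms (Knuth's method)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_positions(points, ratings, fixtures, trials=2000, seed=0):
    """points: {team: points won so far}
    ratings: {team: (xG created, xG allowed)} per game
    fixtures: remaining (home, away) pairs
    Returns {team: {position: share of trials}}."""
    rng = random.Random(seed)
    counts = {team: Counter() for team in points}
    for _ in range(trials):
        table = dict(points)  # start each iteration from the actual points
        for home, away in fixtures:
            home_goals = poisson(rng, (ratings[home][0] + ratings[away][1]) / 2)
            away_goals = poisson(rng, (ratings[away][0] + ratings[home][1]) / 2)
            if home_goals > away_goals:
                table[home] += 3
            elif home_goals < away_goals:
                table[away] += 3
            else:
                table[home] += 1
                table[away] += 1
        # Sides level on points are separated at random (goal difference is ignored).
        ranked = sorted(table, key=lambda t: (table[t], rng.random()), reverse=True)
        for position, team in enumerate(ranked, start=1):
            counts[team][position] += 1
    return {team: {pos: n / trials for pos, n in sorted(c.items())}
            for team, c in counts.items()}

# Invented three team league, purely for illustration.
ratings = {"A": (blended_rate(1.6, 1.4), 1.0), "B": (1.2, 1.2), "C": (1.0, 1.5)}
grid = simulate_positions({"A": 30, "B": 0, "C": 0}, ratings, [("B", "C")])
```

Each team's row of the resulting grid is the kind of position spread that the heat maps colour in.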

Simulated Final League Positions After Seven Games.

The preseason bullishness about the chances of Newcastle returning instantly to the Premier League, not just as a promoted team but as champions, is even more apparent after seven games. They spent the first couple of matches at the foot of the early table, but now have top spot firmly in their sights.

A slow drift in the early market has been reversed and they are a shade of odds on to be crowned champions in May, a sentiment endorsed by the simulations.

Early pacesetters Huddersfield are most likely to finish in the final playoff spot, but in common with many sides their range of possible finishing positions spans much of the table. Their accumulated chances make them more likely than not to miss any kind of post season shot at promotion.

Wolves are as likely to finish top as they are to finish bottom, although neither outcome is very likely.

Around a third of the sides are probably looking at a bottom half finish as their most likely final resting place, amongst them newly promoted Burton and Premier League and European giants of the past, Leeds and Nottingham Forest.

Of the other relegated teams, Norwich are strongly expected to join Newcastle with an immediate return either automatically or via a playoff attempt, while Villa are posting the kind of position spreads that would have been acceptable to their fans in the Premier League, but not in the lower tier.

Tinges of green hint at the prospect of decent seasons for Brighton, Sheffield Wednesday and Bristol City and less so for Rotherham.