Sunday 27 December 2015

Express Yourself.

I occasionally write about the efficiency of rugby union kickers using a model that has many things in common with the expected goals models in use in soccer.

The rugby edition uses fewer variables than its cousin, primarily kick location and footedness of the kicker, but it does differ in that many of the simpler attempts have expectations that approach 100%.

Therefore, many kickers have near perfect conversion rates from a particular distance and angle.

One choice that is common to each model is how to express a player's over or under performance. Typically the percentage above or below the expectation of an average kicker is used, or occasionally a +/- differential expressed in goals, or points in the case of rugby.

Repeatability is always a desirable quality of a metric that professes to capture aspects of player talent. Performance levels may fluctuate for a variety of reasons, injury, aging or simply random variation, but if a model is to be useful we would expect to see some season on season correlation between the metrics we are recording.

Expected goals models profess to show those who are performing above the general average expectation and often this is used to illustrate above average ability, although inevitably buoyed also by luck.

However, often these levels of over performance are not repeated in future seasons, inevitably calling into question the validity and usefulness of a particular model.

Inevitably these models lack all of the inputs to adequately quantify the abilities we are trying to measure, but part of the problem may be down to how the outputs are expressed, especially if some chances are highly likely to be successfully taken.

Imagine an idealised example from soccer.

A player takes five shots at goal, each has a 20% chance of resulting in a goal, so the average expectation is that he scores once. In the field he scores twice, so he's doubled his expected goals, scoring 200% of an average player.

In terms of goal differential, he's +1 goal.

His next five attempts are much better chances with an 80% chance of scoring, akin to the relatively automatic conversions I see in rugby.

He should score on average four goals, but buoyed by being dubbed a hot striker on BT Sport he converts all five. He's perfect, he couldn't have scored any more.

In terms of differentials, he's now plus 2. The average player would expect to score 5 from his last 10 attempts, but our player has scored 7. He has been rewarded for his perfection by seeing his differential above average increase from +1 after 5 attempts to +2 after 10.

How does an index approach reward his recent spree?

After five 20% attempts, his two goals meant he was scoring at twice the expected average rate (2 goals instead of just 1). But once we include his arguably more impressive perfect five from five 80% chances, his rate of over performance actually falls from twice the average to 1.4 times the average rate (7 goals instead of an expected 5).

Two ways of expressing the output from an expected goals model. Differentials reward a run of perfection by improving the rating from 1 to 2, while a rate approach decreases the rating from 2 to 1.4.
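The arithmetic behind the two ratings can be laid out in a few lines. This is just a sketch of the idealised example above; the function name `summarise` is my own label and nothing here comes from a real model.

```python
# Chance qualities from the idealised example: five 20% shots, then five 80% shots.
chances = [0.2] * 5 + [0.8] * 5
goals_by_stage = {5: 2, 10: 7}  # goals scored after 5 and after 10 attempts


def summarise(chance_probs, goals):
    """Return (expected goals, differential, rate) for a set of attempts."""
    expected = sum(chance_probs)
    return expected, goals - expected, goals / expected


for n_attempts, goals in goals_by_stage.items():
    expected, differential, rate = summarise(chances[:n_attempts], goals)
    print(f"after {n_attempts} attempts: expected {expected:.1f}, "
          f"differential {differential:+.1f}, rate {rate:.1f}x")
```

The differential climbs from +1 to +2 while the rate falls from 2.0 to 1.4, even though the second batch of attempts could not have gone any better.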

Intuitively the latter would appear flawed when applied to a player who attempts high value chances and, the quality of your model notwithstanding, how you choose to express the output may impact on your chances of finding year on year correlations.

A third alternative is to run a simulation of the individual expectation for each attempt and see how many trials are as good or better than the result achieved by the player under scrutiny.

25% of true average players would score two or more from five 20% attempts just by luck, but only 11% would manage 7 or more from the ten attempts described above.
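For a handful of attempts the simulation can be swapped for an exact calculation. The sketch below builds the full distribution of goals from independent attempts of differing quality (independence is an assumption) and then sums the tail. The function names are mine, and the exact figures land close to, rather than exactly on, the simulated percentages quoted above.

```python
def goal_distribution(chance_probs):
    """Exact distribution of total goals from independent attempts.

    Returns a list where entry k is the probability of exactly k goals.
    """
    dist = [1.0]  # before any attempt, zero goals is certain
    for p in chance_probs:
        new = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            new[goals] += prob * (1 - p)   # attempt missed
            new[goals + 1] += prob * p     # attempt scored
        dist = new
    return dist


def prob_at_least(chance_probs, goals):
    """Probability that an average player scores `goals` or more."""
    return sum(goal_distribution(chance_probs)[goals:])


first_five = [0.2] * 5
all_ten = first_five + [0.8] * 5

print(f"2+ from five 20% attempts: {prob_at_least(first_five, 2):.3f}")
print(f"7+ from all ten attempts:  {prob_at_least(all_ten, 7):.3f}")
```

Because each attempt keeps its own probability, a player who feasts on easy chances is only compared against what an average player would do with those same easy chances.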

A player who continually finds himself in this lucky subset may simply be better than the average striker, and using an approach that accounts for the distribution of the quality of his chances may not see his rate bounce around if he has a bout of successful goal hanging.

Tuesday 22 December 2015

Home Field Advantage in the 2015/16 Premier League.

The 2015/16 Premier League season has been portrayed as a remarkable one in which the natural order has been upturned. Although as Simon Gleave points out, the only major dislocation from previous years is that Leicester and Chelsea have switched shirts.

Otherwise, the expected strugglers are struggling, the usual title contenders are heading the betting market, if not the actual table and a handful of unfancied mid table fodder has leapt into the top half of the table buoyed by good play, small sample size and a bit of good fortune.

A minor sub plot has been the near equality of home and away results. The raft of away successes early in the season highlighted the apparent supremacy of travelling teams and I suggested that a quarter of the season was insufficient to declare a sea change.

After 170 matches home wins are now back ahead of away victories, but the lead is a narrow one.

Expressed as a success rate, where draws are treated as half a win, away teams are running at 0.49 and home teams, unsurprisingly, at 0.51.

Historically, the trend is for decreasing levels of home field advantage, although there are inevitably peaks and troughs within the general descent. It therefore makes sense to see if a run of 170 matches where home and away teams came close to parity is unusual in the recent past.

HFA, on the wane.
Success rate for away teams in groups of 170 consecutive matches has ranged from lows of 0.34 to highs of 0.48 since 2002, excluding this season.

2008 began with away teams achieving a 0.46 success rate across the opening 170 matches with home teams outscoring their visitors by just over 0.1 of a goal per match.

But over the season as a whole, home teams were on average superior by 0.32 of a goal per game and away sides had a success rate of 0.42.

So evidence for a closing of the gap between host and visitor, but not for parity.

Expected goals for 2015/16 confirm a period of matches where home and away teams have been closely matched, with the former outscoring the latter by just over one tenth of a goal per game.

Simulating all 170 matches results in away sides having an above 0.5 success rate in 16% of the seasons and a success rate as good or better than their actual record in 30% of simulations.

That leaves around 80% of simulated seasons where home teams have the higher success rate and 10% of seasons where that success rate is a healthy 56% or higher.
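A rough sketch of this kind of simulation is shown below. It assumes the same hypothetical home win, draw and away win probabilities for every one of the 170 matches, whereas a fuller version would use match specific probabilities taken from an expected goals model. The probabilities, the seed and the function names are all illustrative assumptions, so the output will not reproduce the percentages quoted above.

```python
import random


def away_success_rate(home_wins, draws, away_wins):
    """Away success rate with draws counted as half a win."""
    matches = home_wins + draws + away_wins
    return (away_wins + 0.5 * draws) / matches


def simulate_away_rates(match_probs, n_sims=2000, seed=1):
    """Simulate a block of matches n_sims times; return the away success rates."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_sims):
        counts = [0, 0, 0]  # home wins, draws, away wins
        for probs in match_probs:
            outcome = rng.choices((0, 1, 2), weights=probs)[0]
            counts[outcome] += 1
        rates.append(away_success_rate(*counts))
    return rates


# Hypothetical (home win, draw, away win) probabilities, identical for all 170 games.
match_probs = [(0.40, 0.27, 0.33)] * 170
rates = simulate_away_rates(match_probs)

share_above_parity = sum(r > 0.5 for r in rates) / len(rates)
share_matching_actual = sum(r >= 0.49 for r in rates) / len(rates)
print(f"away rate above 0.5: {share_above_parity:.1%}")
print(f"away rate of 0.49 or better: {share_matching_actual:.1%}")
```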

Again, a continued closing perhaps, rather than an elimination of home field advantage.

The causes of home field advantage, not just in soccer, are neither well understood nor universally accepted. Even in the cossetted environment of modern soccer, travel may play a small role, as may crowd support.

And these factors may subtly change from season to season.

However, an important contributor to individual match outcomes is red cards. Eleven versus ten, or even nine, is on average a big advantage to the numerically superior team.

Historically away sides suffer more red cards, not particularly because of referee bias, but simply because they are forced into making more tackles.

In 2014/15, Premier League home sides lost 600 playing minutes to red cards compared to 1000 for their visitors. The previous season it was broadly similar, 520 minutes lost by the hosts and 1070 for the away team.

So far in 2015/16 this potent, but relatively rare event is favouring the visitors. Home teams have lost 460 minutes to red cards spread across 11 matches, seven of which have been lost and two drawn.

Away teams have lost 420 minutes.

Home teams may have been unlucky so far based on expected goals in just 170 matches. Refs may not continue to find fault with the home players in a way that is unusual in recent seasons and home field advantage may continue to be a depreciating, but real feature of the current Premier League.

Thursday 17 December 2015

Chelsea Win In Cyberspace.

Chelsea's defeat at Leicester on Monday night was hardly made more bearable by their moral victory over the title leaders on a myriad of spreadsheets. It certainly failed to impress Roman Abramovich.

From the context-less reaches of hyperspace to the sports pages of the Guardian, the floundering champions gathered three virtual league points in a hard fought, but decisive, expected goals victory that barely required more than one decimal place to confirm their eventual superiority.

In a straight summation of expected goals it is difficult to find a model that didn't rate Chelsea above Leicester on the night despite the Foxes' 2-1 win.

For those who watched the match (and anyone who quotes expected goals or some such, is automatically assumed not to have bothered), the expected goal figures do not pass the eye test.

Part of the problem may arise from incomplete models.

The game was level until the 34th minute, whereupon Leicester took a lead that they increased, saw reduced, but subsequently kept.

Leicester had five shots on target spread across the 2nd minute to the 48th and none thereafter.

Chelsea had no shots or headers on target until the 62nd minute and four in total, ending with Remy's 77th minute headed goal.

So a "game of two halves".

Game state, score effects, or however you wish to describe them, eventually alter a side's approach to the match. Risk and reward subtly change based on score line, abilities and time remaining.

A side chasing a deficit appears to see their chances of scoring reduced by around 15% compared to the same opportunity from a side that has the lead, possibly due to different levels of defensive pressure throughout the chance creation process.

So Chelsea's chances may not have been as gilt edged as they appeared merely from shot locations.

Also, closely related events are not additive and Chelsea's two opportunities around the 62nd minute were close enough to have only reasonably been able to deliver a single goal.

If you include these factors on your spreadsheet, the game remains with Chelsea, but they only win around 25% of simulations, with Leicester taking 20% and the remaining 55% ending level.
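The two adjustments can be sketched as follows. The chance values below are hypothetical and are not the shot data from the match, the 15% reduction is applied as a flat multiplier purely for illustration, and the outcome probabilities are computed exactly rather than by simulation. All function names are my own.

```python
def adjust_for_deficit(p, reduction=0.15):
    """Scale down a chance taken by a side that is chasing the game."""
    return p * (1 - reduction)


def merge_linked_chances(probs):
    """Chance that a cluster of closely linked attempts yields a goal.

    The cluster can produce at most one goal, so the attempts are combined
    into a single chance rather than summed.
    """
    miss_all = 1.0
    for p in probs:
        miss_all *= 1 - p
    return 1 - miss_all


def goal_distribution(chance_probs):
    """Exact distribution of goals from independent chances."""
    dist = [1.0]
    for p in chance_probs:
        new = [0.0] * (len(dist) + 1)
        for goals, prob in enumerate(dist):
            new[goals] += prob * (1 - p)
            new[goals + 1] += prob * p
        dist = new
    return dist


def match_outcome_probs(home_chances, away_chances):
    """Return (home win, draw, away win) probabilities."""
    home_win = draw = away_win = 0.0
    for h, p_h in enumerate(goal_distribution(home_chances)):
        for a, p_a in enumerate(goal_distribution(away_chances)):
            if h > a:
                home_win += p_h * p_a
            elif h == a:
                draw += p_h * p_a
            else:
                away_win += p_h * p_a
    return home_win, draw, away_win


# Hypothetical chances: the away side trails, and its first two attempts are linked.
home_chances = [0.30, 0.10, 0.25, 0.05, 0.08]
trailing = [adjust_for_deficit(p) for p in (0.35, 0.30, 0.20, 0.15)]
away_chances = [merge_linked_chances(trailing[:2])] + trailing[2:]

print(match_outcome_probs(home_chances, away_chances))
```

Summing the raw values instead would credit the away side with more expected goals than the sequence of play could realistically have delivered.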

In Cyberspace no one can hear Mourinho scream. (credit @lubomerkov)
Expected goals is a flexible tool, rather than a true reflection of what the score should have been in the context driven environment of a single 90+ minutes.

It can be used to illuminate the effects of last throws of the tactical dice, such as when Chelsea sent caution to the wind.

For example, once Leicester had a two goal lead we can hazard a guess as to the likelihood that Chelsea's mini barrage of chances could engineer a comeback. Leicester hold on for a win around 40% of the time and draw a similar percentage of simulations, despite not troubling Courtois in their second half display.

We can equally ask how likely Leicester were to score two goals without reply when they were on the offensive front foot during the first 48 minutes and whether that outcome was typical for such a first half performance.

In reviewing a single game, expected goals just adds another layer of information. It's as useful or useless as news about the dressing room psyche or attempts to second guess a manager's in game intentions, however eloquently and subjectively they are presented.

Thursday 10 December 2015

Monk Loses the Winning Habit.

At the end of August 2014, two sides, Swansea and Chelsea were vying for the lead in the Premier League table with a 100% record from three matches. Aston Villa were third.

So it represented business as usual for Mourinho and a validation of the soon to be written raft of complimentary articles about the bright new manager in charge at Swansea, Garry Monk.

As of yesterday two of those three teams have seen managerial change and Mourinho's tenure hangs by a Champions League thread.

Monk's hot start to the season was prolonged enough to earn him a place on the Daily Telegraph's shortlist of six for manager of the season, along with the beleaguered Mourinho and the subsequently dispensed with Sam Allardyce.

A near 50% attrition rate in the blink of an eye.

It is becoming increasingly commonplace to acknowledge the role that luck plays in shaping a relatively short, skill based competition, such as a Premier League season.

More data hungry models in late 2014 were already suggesting that Swansea had been relatively shot shy and fortunate even as they remained buoyant, only slightly removed from their August heights.

An abundance of 1-0 wins, seven in total by May, further hinted at a solid mid table side inflated upwards by random, most likely non repeatable events.

Premier League managers, always looking over their shoulders.
Outsiders are never privy to the inner workings of the professional relationships within a football club that may drive change, but it was particularly unfortunate for Monk that an immediate see-sawing of narrowly contested 1-0 games fell so badly for him in 2015/16.

Five such league defeats and a cup exit since August 2015.

Extremes, such as Monk may have benefited from in 2014/15, tend to be less extreme in the future, but fueled by euphoria and congratulatory broadsheets, they tend to become the normal expectation from both the fan base and employer.

Swansea's 14 actual points through 15 games are around a win shy of their most likely total based on shot model simulations, and they have created enough to have had a one in four chance of reaching the 20 or more points that would have invited a more prosperous New Year.

The reality is probably that, in part at least, a straight comparison has been made between the 0.9 points per game this term and the near 1.5 points per match in 2014/15 and knees have been jerked.

Random variation gives and it sometimes cruelly takes away.

Wednesday 9 December 2015

Rebranding Stoke.

The view that you're as good or bad as your last performance usually flourishes in the online club fora and the soundbite world of football punditry.

So it was hardly surprising that Stoke briefly rose to the dizzy heights of everyone's favourite second team following their comprehensive and visually pleasing defeat of second placed Manchester City on Saturday lunchtime.

Stokealona or my own favourite, Inter City, briefly trended.

The tendency to stereotype teams and players based often on stale evidence from seasons long gone, is a trait that continues to surprise.

On Saturday the realisation gradually dawned on the BT commentary team that even players who remained from the rump of Tony Pulis' ingeniously devised, but widely despised system, could actually participate in a passing based evolution.

Quotes from opposing managers who should really have known better, suggesting "We know what to expect from Stoke" while packing the team bus with six foot plus defenders, were amusingly familiar even while the Hughes revolution stumbled uncertainly from possession poor to possession normal.

Pass completion rates for the likes of Cameron, Whelan and Shawcross were poor under Pulis not because those players couldn't pass the ball, but because they were required to implement an approach that at its most extreme coveted distance over retention.

Following the fairly amicable parting of the ways, Pulis' brand of survival at all costs football swept through the lower reaches of the Premier League, first at Palace, later in a delicious irony at WBA, sending the passing stats of competent players plummeting in the process.

Raw shooting differentials failed to spot the trade off between shot quantity and shot location, as Stoke under Pulis invited the opposition to shoot frequently from distance, while they bundled in sufficient goals at the other end from just inside the six yard box.

Shaqiri, along with Afellay, Arnautovic, Bojan, Joselu and Muniesa, "He plays for City!".
Hughes' Stoke has partly borrowed from the Pulis blueprint, recruiting flawed jewels from a wider market. Careers marked by injury, under achievement or a temperament that prefers to invite a post game red card, rather than celebrate a brace of match winning goals, have allowed the assembly of unprecedented talent in the Potteries.

But while plaudits are a welcome change, Stoke's long-term prospects should perhaps be viewed in the context of their accumulated stats. Just as Pulis' Stoke were legitimately better than a swift glance at their shot differentials implied, Hughes' infinitely more entertaining version may be better judged on their statistical achievements this term.

Individual match performance will invariably fluctuate. One or two perceived improved results do not make a trend and Stoke are more usually to be found in the lower half of current league simulations, a handful of expected points below their actual current total of 22.

The Hughes revolution hasn't taken Stoke, puns apart, into the higher echelons of European football. They've merely entered the Premier League mid table tactical mainstream.

Thursday 3 December 2015

Diego Costa, Head & Shoulders Above the Rest?

There have been some great stats on potential finishing ability posted on Dan Kennett's Twitter feed. Naturally the focus was on Liverpool players, particularly Daniel Sturridge, and the post proved timely following Wednesday night's 6-1 away victory at Southampton.

Identifying different levels of finishing ability is always going to be challenging in a sport where scoring opportunities are relatively rare.

Squad rotation, substitution and injuries often deprive strikers of playing time and few manage more than five attempts per 90 minutes.

Even in Dan's comprehensive list of the highest achievers, only a handful of players have exceeded 10,000 minutes since 2011, the time it allegedly takes to master a skill.

Topping the list of currently active Premier League strikers is the recently rested Diego Costa. A 23% conversion rate has been achieved in fewer than 100 attempts, a small sample size compared to the remaining players in Dan's list, who average nearly 300 attempts each.

Therefore, although Costa's conversion rate is well in excess of Aguero and Sturridge, his nearest EPL challengers, there must be a suspicion that his 23% rate is unsustainable long term, compared to just under 15% for the other two.

Who Needs a Fit 16% Striker Every Week!
There are a variety of approaches currently available to try to identify finishing skill.

Expected goal models add an extra level of insight. But they are data hungry and potentially susceptible to rare events, such as deflected shots. Ultimately they only measure a player's deviation from the norm expected by that particular shot based model, which itself is almost certainly incomplete.

It is also rare to see such model based analysis address the likelihood that any over or under performance occurred merely by chance.

If instead we assume the chances presented to these out and out strikers are broadly similar, we can see if the spread of conversion rates is wide enough to imply differing levels of finishing skill within the chosen group.

This approach focuses more on the role of random chance and incorporates sample size, while assuming chance quality is similar for each player.

In short, it is a flawed, polar opposite approach to that of an equally flawed shot based model.
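One common way to regress a raw conversion rate towards a group average is to blend each player's record with a block of "prior" attempts taken at the group rate. The sketch below shows that general idea only; it is an assumption on my part rather than the exact method behind the table that follows, and the group rate of 0.145 and the 1,000 prior attempts are hypothetical values chosen for illustration.

```python
def regressed_rate(goals, attempts, group_rate, prior_attempts):
    """Shrink a raw conversion rate towards the group rate.

    prior_attempts acts as the weight given to the group average: the fewer
    attempts a player has, the more his rate is pulled towards the group.
    """
    return (goals + group_rate * prior_attempts) / (attempts + prior_attempts)


def regression_fraction(attempts, prior_attempts):
    """Share of the final estimate that comes from the group average."""
    return prior_attempts / (attempts + prior_attempts)


# Hypothetical inputs: 23 goals from 100 attempts, group rate 0.145,
# and a prior worth 1,000 attempts.
print(f"regressed rate: {regressed_rate(23, 100, 0.145, 1000):.3f}")
print(f"regressed by:   {regression_fraction(100, 1000):.0%}")
```

A player with 300 attempts would be pulled less strongly towards the group under the same prior, which is why sample size matters as much as the raw rate.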

Regressed Conversion Rate for EPL Strikers 2011/12-2015/16.

Data Credit - Dan Kennett.

Player        Regressed Conversion Rate
Costa         0.149
Hernandez     0.148
van Persie    0.147
Aguero        0.147
Sturridge     0.147
...           ...
Defoe         0.142
Lukaku        0.141
Suarez        0.141
Ba            0.139
Bony          0.139

The spread seen in Dan's numbers is just extreme enough to conclude that there is some evidence that finishing skill may exist.

Costa remains the highest rated finisher, but his numbers are regressed by over 90% towards the group average because of his relatively low number of attempts. We have to go to the third decimal place to elevate him above the next four highest players, including Sturridge.

Similarly, the gap between the most and least efficient finisher is now just 1 percentage point, rather than the 12 percentage points seen in the raw data.

It would be unusual to see a wide range of true finishing abilities at the elite level of a professional sport. 

There may be tentative evidence to suggest that a narrow gap does exist (perhaps traditional scouting could contribute the eye test) and Daniel Sturridge is towards the top of such a pecking order...but can he do it on a bitterly cold night in a minor cup competition in January at the Britannia Stadium!

Monday 30 November 2015

The Alternative EPL Table, Iteration Number One.

In A Galaxy Far, Far Away.........................

Week 14 of the current season ended in a 2-1 defeat for the league leaders, Arsenal, in a game of few chances at Norwich. They spurned the opportunity to extend their lead over Manchester City who went down by the odd goal in a five goal thriller at home to Southampton on Saturday.

Spurs heaped more pain on defending champions, Chelsea, winning comfortably, 2-0. Chelsea are now beginning to be cut adrift at the foot of the table along with Newcastle, who rode their luck in a goalless draw with Palace.

Former manager of the year, Tony Pulis further enhanced his reputation by keeping his WBA team in the top four after a 1-1 draw with near rivals WHU.

Fallen giants, Liverpool and Manchester United, each recorded narrow 1-0 wins against Swansea and Leicester, respectively, but still remain detached from the title race.

Swansea's league position continues to spotlight the talents of their exciting young manager, Garry Monk, while Leicester, as predicted, are struggling to emulate their strong finish last season and just stay clear of the relegation positions under Claudio Ranieri.

Stoke played half a game with ten men.

The Premier League, sponsored by Random Shot Based Simulations Inc.

Sunday 29 November 2015

The Table Often Lies.

The group phase of the Champions League is cleverly designed to maintain interest to the end of match day six.

The winners of the group are seeded to meet the runners up from the other groups, subject to a restriction that prevents a team playing another from the same country.

The third placed teams become the favourites to lift the Europa League and the fourth placed teams get to spend more quality time with their families.

Table: Group E pre tournament estimates, with columns % To Win Group, % To Qualify, % To Drop into Europa League and Median Points (team rows include Bayer Leverkusen and BATE Borisov).

So even if one team is strongly favoured to win the group, there is likely much to play for over the full course of the twelve games.

Group E was the easiest group in which to predict the most likely winner. Top ranked Spanish Champions League representatives are generally superior to second ranked Serie A teams, who in turn are superior to below average German qualifiers, each of which is superior to a champion from Belarus.

My pre tournament estimation of the chances of each team filling a particular spot for Pinnacle can be seen in the table above.

Others took a slightly more optimistic view of the chances of Bayer Leverkusen, and a typical points expectation based on reported odds was an average of 13.5 points for Barcelona, 9 for Bayer Leverkusen, 8 for Roma and 3 for BATE Borisov.

Already we have a dilemma in deciding if finishing position fully reflects talent. Namely, opinions differ as to who is the best to start with and the argument begins to become circular.

If we again treat the bookmaking odds as all seeing and able to perfectly capture the true chances of each side in each individual group match, it would be useful to know how often the teams finish in the order predicted by the bookmakers.

With each team playing six matches, 36% of Group E simulations finish with Barca as top seed, Leverkusen as runner up, Roma as Europa League bound and BATE going home.

If we double the tournament length so that each team plays twelve rather than six games, a finishing order that reflects the odds we've used in the simulations is still less than a 50% chance.

Quadrupling the length of the group stage to 24 matches per team produces the desired order to reflect talent just over 60% of the time.

Should you wish to trade increasing certainty for viewer indifference, a 96 game per team schedule pushes to beyond 77% the percentage of iterations with Barca as seeded knockout stage entrants, followed by Leverkusen and Roma Europa League bound.
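The effect of stretching the group can be sketched with a toy simulation. Everything in it is an assumption for illustration: the four strength ratings, the flat 25% draw probability, the lack of any home advantage and the random tie break are all hypothetical and are not derived from the odds used above, so the percentages it prints will differ from those quoted.

```python
import random
from itertools import permutations

# Hypothetical strength ratings for a four team group.
STRENGTH = {"Team 1": 2.0, "Team 2": 1.2, "Team 3": 1.0, "Team 4": 0.5}
DRAW_PROB = 0.25  # hypothetical flat draw probability for every match


def match_probs(home, away):
    """Return (home win, draw, away win) from the hypothetical ratings."""
    p_home = (1 - DRAW_PROB) * STRENGTH[home] / (STRENGTH[home] + STRENGTH[away])
    return p_home, DRAW_PROB, 1 - DRAW_PROB - p_home


def simulate_group(rounds, rng):
    """Play `rounds` home and away round robins; return teams in finishing order."""
    points = dict.fromkeys(STRENGTH, 0)
    for _ in range(rounds):
        for home, away in permutations(STRENGTH, 2):
            outcome = rng.choices(("home", "draw", "away"),
                                  weights=match_probs(home, away))[0]
            if outcome == "home":
                points[home] += 3
            elif outcome == "away":
                points[away] += 3
            else:
                points[home] += 1
                points[away] += 1
    # Teams level on points are separated at random.
    return sorted(points, key=lambda team: (points[team], rng.random()), reverse=True)


def prob_expected_order(rounds, n_sims=2000, seed=1):
    """Share of simulations where the table matches the order of the ratings."""
    rng = random.Random(seed)
    expected = sorted(STRENGTH, key=STRENGTH.get, reverse=True)
    hits = sum(simulate_group(rounds, rng) == expected for _ in range(n_sims))
    return hits / n_sims


for rounds in (1, 2, 4, 16):  # 6, 12, 24 and 96 games per team
    print(f"{rounds * 6} games per team: {prob_expected_order(rounds):.0%}")
```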

Competitions must strike a balance to reward talent (to please the participants) and maintain a degree of uncertainty (to enthrall the audience), while sticking to a manageable time frame.

So in the real world, especially where teams are closely matched, the table often lies.

Saturday 28 November 2015

Who is the Best Team in Group D?

The UEFA Champions League Groups are concluded in early December and the list of already qualified teams is a familiar mix of Spanish, German, French and English teams.

Seeding and a six game, home and away league format attempt to ensure the best sides progress and aren't caught out in a single knockout match.

Group D appeared one of the most competitive prior to match day one, containing a representative from each of the four major European countries, in Juventus, Manchester City, Sevilla and Borussia Moenchengladbach.

Juve and City had already qualified after match day four, with the former holding a two point lead going into match day six. Bookmaking odds had favoured Manchester City to top the group, although in a preview written for Pinnacle, I suggested that Juventus should be considered the more likely group topping side.

So it is tempting to claim that my assessment of the two sides was well founded.

However, although last season's runners up currently head the group, they are merely odds on to top it after the final reckoning, so City may still justify pre tournament, group favouritism.

There is also a significant possibility that City may be superior to Juve, but still finish below them after just six home and away, round robin games.

The bookmaker's assessment of each of the four teams can be broadly demonstrated by converting the match odds into expected group points. City were expected to gain an average of 10.3 points compared to 9.6 for Juve, with Sevilla and Moenchengladbach trailing with 8.4 and 4.9 points, respectively.
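The conversion from odds to expected points can be sketched in two steps: strip the bookmaker's margin by normalising the implied probabilities, then weight a win as three points and a draw as one. Proportional normalisation is one simple way of removing the margin and is an assumption here, and the decimal odds in the example are hypothetical rather than the prices used for Group D.

```python
def implied_probs(home_odds, draw_odds, away_odds):
    """Convert decimal odds to probabilities that sum to one."""
    raw = [1 / home_odds, 1 / draw_odds, 1 / away_odds]
    total = sum(raw)  # exceeds 1 by the bookmaker's margin
    return tuple(p / total for p in raw)


def expected_points(p_win, p_draw):
    """Expected points from a single match: 3 for a win, 1 for a draw."""
    return 3 * p_win + p_draw


# Hypothetical decimal odds for one group match.
p_home, p_draw, p_away = implied_probs(1.90, 3.80, 3.80)
print(f"home side: {expected_points(p_home, p_draw):.2f} points")
print(f"away side: {expected_points(p_away, p_draw):.2f} points")
```

Summing a team's expected points over its six group matches gives the kind of group total quoted above.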

More usefully, simulating the group using the same bookmaking odds that favoured City results in the English team topping the table 45% of the time compared to 35%, 15% and 5% for the respective remaining three teams.

So City may be the best side in the group, but it is more likely that someone else would top the table if they play just six group stage matches.

Therefore, the books may claim to have been correct, but randomness may still overcome their best predictions after match day six.

If we extend the group stage fourfold so that each team plays 24 matches, the separation between the current top two and the others increases.

Juve now win the group 33% of the time and City become slightly more likely than not to top the group. Sevilla has a 9% chance of winning and Moenchengladbach is relegated to the realms of improbable, but not quite impossible.

If we become even more extravagant and condemn each team to play 96 group games based on bookmakers' odds, City are now a 74% chance to top the group, Juve 25% with the other two teams a virtual probabilistic irrelevance.

Even after turning a four team group into a weekly event spread over nearly two years, there is still a one in four chance that the best team doesn't win the group when two of the teams are closely matched.

Therefore, declaring one team superior to another, even based on a 38 game Premier League season, almost certainly fails to account for a set of trials that are awash with randomness. Similarly, comparing players based on a paltry amount of data, often without context, is likely to be equally misleading.

League position = talent + randomness.

Monday 23 November 2015

Has Luck Made the Title Race Highly Competitive?

The 2015/16 Premier League has thrown up a multitude of talking points, ranging from Chelsea and Leicester swapping identities to the log jam of teams that are in competition at the top of the table after 13 matches.

Leicester entered the Premier League as above average Championship winners, so survival in 2014/15 was more likely than not, but with Premier League points sometimes coming in uneven bursts, a title contending run was required to secure safety over the final months of the season.

Despite the encouraging finish to 2014/15, optimism was hardly raised by the appointment of Claudio Ranieri, a manager better known in England for his extreme squad rotation policy than his previous tenure at the head of a title contending side.

Leicester's current position is a pleasing contrast to 2014/15, when they were bottom with 10 points after 13 games, compared to topping the table in 2015/16 having dropped just 11 points.

Shooting stats don't mark Leicester down as the most likely leaders of the Premier League. They've been lucky in converting chances in matches where they've not dominated in terms of volume, but they are currently a legitimately improved side from the previous season.

Leicester's 28 points after 13 matches ties with the 2010 Chelsea side as an historical low points total for a table topping team after a baker's dozen since 2000. However, points alone are a poor measure of a team's standing compared to the remainder of the league.

Currently, just four points separate the top five, compared to a nine point gap when Chelsea led the title race with 28 points after 13 games in 2010.

When measured in terms of how many standard deviations a team leading the table is above the current average points total for the league, the 2010 Chelsea team were more dominant than Leicester are despite both sides having identical records.
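The standard score of a leader is simply their points total minus the league average, divided by the standard deviation of points across the league. The sketch below uses the population standard deviation, which is an assumption as the choice between population and sample versions isn't stated, and the eight team points list is a hypothetical toy league rather than any real table.

```python
from statistics import mean, pstdev


def leader_standard_score(points):
    """Standard deviations by which the leader sits above the league average."""
    return (max(points) - mean(points)) / pstdev(points)


# Hypothetical points totals for a small league.
toy_table = [9, 7, 5, 5, 4, 4, 4, 2]
print(f"leader's standard score: {leader_standard_score(toy_table):.2f}")
```

Two leaders on identical points can therefore earn different scores, depending on how tightly the chasing pack is bunched behind them.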

How Dominant were the Leaders after 13 Games.

Table Topping Team    Year Start    Standard Score
Manchester U *        2000          2.02
Liverpool             2001          2.03
Liverpool             2002          2.07
Arsenal *             2003          2.12
Chelsea *             2004          2.23
Chelsea *             2005          2.22
Manchester U *        2006          2.38
Arsenal               2007          2.96
Chelsea               2008          2.30
Chelsea *             2009          2.31
Chelsea               2010          2.14
Manchester C *        2011          2.12
Manchester U *        2012          1.94
Arsenal               2013          1.96
Chelsea *             2014          2.51
Leicester             2015          1.51

(Teams marked * won the title).

In 2015 to date, Leicester are just 1.5 standard deviations above league average, the least dominant achievement for a leader after 13 games this century by some distance. 

There are three challengers each within a win of overhauling them, including Manchester United whom they play next and it is tempting to cite the closeness of the title race and Leicester's position at the head of it, as proof that standards may be falling in the Premier League.

However, just as random variation in the matches so far may have been kind to the Foxes, it may also have compressed the higher reaches of the table compared to more recent seasons.

Instead of tracking simply points won by the leaders in shot based simulations of the current table, we can partly estimate how competitive each iteration was by calculating the distribution of standard scores for each table topping team.

Slightly more than 10% of season simulations for 2015/16 give a leader who is less dominant than Leicester are to date. The most likely outcome produces a leader that is between 1.8 and 1.9 standard deviations above average and 22% of simulated leaders have standard scores of 2.0 or above.

Random variation may possibly have propelled Leicester to the head of the Premier League. It may also be responsible for making the current table appear more competitive than it may actually be.

A single anomalous batch of 130 matches is far too small a sample to call time on the title regulars, induct new members or declare a new found equality in the higher reaches of the table.

Saturday 21 November 2015

Everyone Loves Ricky.

Everyone loves an individual goal, from Peter Beagrie beating six players (or the same player six times) in the late 80's to John Barnes in the Maracana and Ricky Villa lighting up the old Wembley Stadium.

It may be a trick of the mind, but such unassisted goals also seem to have an air of inevitability. Once the last line of defence is reached the keeper rarely spoils the party.

There may be legitimate reason for this impression. A player who has largely created his own chance has often disrupted any defensive organisation that had previously existed, while being fully in control of the ball, rather than stretching to master an over hit assist.

While imperfect, the absence of an assist in an attempt description might serve to identify goals or shots that were a result of individual skills, rather than a chance created through a series of team based passes.

Using a season's worth of shot data from open play, it does appear that unassisted goal attempts result in scores at a higher rate than attempts originating from an assist. This may have occurred by chance, but the analysis strongly suggests otherwise.

A Spurs legend dreams of historical deeds.
As a baseline figure using single season data, once location is accounted for, an unassisted on goal attempt is around 10% more likely to be scored than an attempt that came about by a teammate setting up the chance.

This has implications for both teams and players who may be adept at creating potentially better quality chances through individual effort compared to relying more on a teamwork based approach, where the opposition may be able to defend more cohesively.

In 2012/13 only 15% of Arsenal's on goal attempts from open play were lone wolf attempts compared to 25% for Sunderland. However, variation of percentages should be expected, even if all teams have broadly the same propensity to create individually crafted chances.

Premier League attacks, based on the limited data I have, do create widely different numbers of chances from open play (Arsenal created almost twice as many as Sunderland) and within these chances are varying proportions of individually created chances.

However, the spread in 2012/13 was insufficient to conclude that Sunderland's higher proportion of individually created chances compared to say Arsenal, is a real trait that may persist. It could be, but more data is needed.

The same could not be said for Premier League defences in 2012/13. The league average was for 20% of open play chances faced to be predominantly the product of an individual's efforts. But this ranged from a low of 9% for Reading's defence to a high of 29% for Arsenal.

This time the spread could not be explained away as merely random variation. At the very least for that season, opponents seemed to be attacking certain sides in a variety of biased approaches from open play.
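One simple way to get a feel for whether a 9% or 29% share sits outside random variation is to compare it with the binomial spread expected around the 20% league average. This is only a sketch of that idea and not necessarily the test used for the post. The figure of 300 open play chances faced is hypothetical.

```python
from math import sqrt


def proportion_z_score(observed_share, league_share, n_attempts):
    """Standard errors between a team's share and the league share.

    Uses the binomial standard error sqrt(p * (1 - p) / n), assuming every
    attempt independently has the league-wide chance of being unassisted.
    """
    standard_error = sqrt(league_share * (1 - league_share) / n_attempts)
    return (observed_share - league_share) / standard_error


LEAGUE_SHARE = 0.20       # league average quoted in the post
HYPOTHETICAL_FACED = 300  # assumed number of chances faced, for illustration only

for team, share in (("Reading", 0.09), ("Arsenal", 0.29)):
    z = proportion_z_score(share, LEAGUE_SHARE, HYPOTHETICAL_FACED)
    print(team, round(z, 2))
```

The fewer chances a side faces, the wider the spread that chance alone can produce, so the same percentages from a smaller sample would be much less convincing.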

The quantity of chances faced will always overwhelm any persistent bias in the type of chance allowed, but identifying if and perhaps why a side is allowing a larger percentage of individually created chances, that may carry a greater sting in the tail, may make for marginal gains.

Also individuals, who aren't called Messi or Ronaldo, may be unfairly defined as having a lucky season, when they are actually rather good at persistently emulating Ricardo Julio Villa.

Thursday 19 November 2015

Bad Mood.

On Tuesday night I won the lottery. OK not really, but I was involved in an extremely low probability occurrence.

To briefly describe the event, we were going to Wolves Civic Hall to see The Vaccines, an indie rock band from West London with a fan base that is almost exclusively young. So our mere presence as a very small right hand tail of the audience age distribution was unlikely in itself.

The route, car used and departure time were dependent on a multitude of random decisions, chosen arbitrarily on the night.

How long it took to bribe the cat with extra food, which routes to take to avoid the traffic chaos that is currently Stafford town centre, how many cars to overtake, when safe and legal to do so and how many cars to let out from side roads.

Not my fault.
An hour into a journey that usually takes an hour in total, we were still crawling through Stafford when the remnants of hurricane Barney deposited a six foot long tree branch onto the roof of my 2003 Mini, bought new on the same day England won the Rugby World Cup.

There were no injuries, the car was driveable, but a total write off.

No other car I saw that night was driving around with recent storm damage and none of the drivers appeared particularly adept at dodging unseen projectiles. So I concluded that I'd been unlucky in the extreme (although when you emerge unscathed from such incidents, you're invariably told how lucky you've been).

On Monday, prior to the event, the chance that I was going to be on the wrong end of a tree branch was very, very tiny.

Improbable, but not impossible and the same was true for everyone else on the road. But so many drivers were around that night across the path of Barney, each with a very small chance of getting written off that the chance that someone was going to suffer that fate was significant.

I was just the "lucky" one.

1 chance in 20 is often taken as the arbitrary measure of when something unseen is thought to be at play, such as talent to avoid flying trees or to score more goals than expected.

But unless there is corroborative evidence to back up the claim that an event is the product of additional skill and luck, it is worth seeing how many players are "on the road" in case you are simply seeing a notable and unlikely event that was almost bound to happen by chance to one of a numerous group of similarly talented individuals.
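The arithmetic behind "almost bound to happen to someone" is short enough to sketch. If each of n individuals independently has the same small chance p of the unlikely event, the chance that at least one of them experiences it is 1 - (1 - p)^n. The driver numbers below are invented for illustration. The second example applies the same sum to twenty equally talented players and the 1 in 20 threshold.

```python
def chance_someone_is_hit(p_single, n_individuals):
    """Probability that at least one of n independent individuals sees the event."""
    return 1.0 - (1.0 - p_single) ** n_individuals


# Hypothetical: 50,000 drivers, each with a 1 in 100,000 chance of meeting a branch.
print(round(chance_someone_is_hit(1 / 100_000, 50_000), 3))

# Twenty equally talented players, each with a 1 in 20 chance of a "notable" season.
print(round(chance_someone_is_hit(0.05, 20), 3))
```

Independence between individuals is an assumption. Drivers caught in the same storm are unlikely to be truly independent, but the broad point survives.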

Wednesday 18 November 2015

Goal Expectancy Per Minute.

Top of most managers' or fans' wishlists in the January window is a 15 goal a season striker. But what actually constitutes such a potential purchase?

In terms of goals scored, a penalty or free kick taking striker in a moderately successful side could easily reach half way to this benchmark from dead ball goals alone.

If they are also the intended target for other set piece plays, have bolstered their recent tallies with a couple of deflected chances (that are extremely difficult to save, but may not be repeatable in the long term) and benefited from the occasional goal keeping error, then the bulk of their recent record may be deceptive.

We therefore may choose to just look at goals that are scored from open play as a more revealing statistic, sieved of wrong footed keepers as well as the added advantage of regularly striking a dead ball.

However, this approach merely invites the other inherent uncertainty of random variation.

Expected goals models hope to illustrate how likely an average player is to score with a shot or header. Over or under performance against this cumulative expectation is often taken to be a sign of above or below average finishing talent, but is far more likely to be mostly down to simple variance within relatively small numbers of probabilistic events.

A fair coin exhibits no skill by falling heads up six times out of ten.

By looking at a player's expected goals from open play, we may eliminate the inbuilt advantage of being the chosen one for set pieces, as well as move to a probabilistic, rather than outcome based, assessment. But we still need to adjust for time on the field.

Christian Benteke's 2012/13 season at Aston Villa was rewarded with nine actual goals from open play in 2820 minutes of playing time from chances that had a cumulative goal expectancy of 5.5 goals.

An impressive over performance.

He had required the keeper to save on target attempts that would yield an average player 0.00195 expected goals from open play per minute, but he was actually scoring at a rate of 0.0032 open play goals per minute.
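The per minute figures are simply the season totals divided by minutes played. A minimal sketch using the numbers quoted for 2012/13 (the function name is my own):

```python
def per_minute(total, minutes_played):
    """Convert a season total (goals or expected goals) into a per-minute rate."""
    return total / minutes_played


MINUTES_2012_13 = 2820

expected_rate = per_minute(5.5, MINUTES_2012_13)  # open play goal expectancy
actual_rate = per_minute(9, MINUTES_2012_13)      # open play goals scored

print(round(expected_rate, 5))
print(round(actual_rate, 4))
```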

His over performance in converting his attempts in 2012/13 placed him statistically alongside potential superstars, such as Bale, solid Premier League performers, such as Walcott and Cazorla and someone called Michu.

However, it is a simple exercise to simulate the likelihood that an average finisher scores at least nine goals from the chances Benteke put on target in 2012/13 (it's around 8%). So it was eminently possible that his actual over performance in converting chances was simply due to good luck.
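A minimal sketch of that kind of simulation follows. I don't have the individual attempt probabilities, so the list below is a hypothetical stand-in: 44 on target attempts, each given a 0.125 chance, chosen only so that they sum to the 5.5 goal expectancy quoted above. The real attempts would vary in quality and the resulting probability would differ, so the output shouldn't be read as reproducing the 8% figure. An exact calculation is included as a cross-check on the simulation.

```python
import random


def simulate_at_least(shot_probs, target, trials=20000, seed=1):
    """Monte Carlo estimate of the chance of scoring at least `target` goals."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        goals = sum(rng.random() < p for p in shot_probs)
        if goals >= target:
            successes += 1
    return successes / trials


def exact_at_least(shot_probs, target):
    """Exact chance of at least `target` goals, built up one attempt at a time."""
    dist = [1.0]  # dist[k] = probability of exactly k goals so far
    for p in shot_probs:
        updated = [0.0] * (len(dist) + 1)
        for goals, mass in enumerate(dist):
            updated[goals] += mass * (1 - p)   # attempt missed or saved
            updated[goals + 1] += mass * p     # attempt scored
        dist = updated
    return sum(dist[target:])


# Hypothetical attempt list (an assumption, not Benteke's real shots).
hypothetical_shots = [0.125] * 44

print(round(simulate_at_least(hypothetical_shots, 9), 3))
print(round(exact_at_least(hypothetical_shots, 9), 3))
```

Each attempt is treated as an independent trial for an average finisher, which is the assumption the simulation approach relies on.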

Benteke laments the influence of random variation on his actual goal tally.
Anyone who might have considered Benteke as an acquisition capable of 15 goals a season (or perhaps 9 open play goals) may have waited until the 2013/14 January window to pounce, rather than pay an inflated fee for a player who may have visibly, if misleadingly, demonstrated his potential.

Up to January 1st of the following season, Benteke had fared less well in open play, scoring just once in 1145 minutes of play (0.0009 actual goals per minute) compared to a goal expectancy of 1.8, or 0.0016 expected goals per minute.

This time he under performed rather than over performed against his goal expectation based on where and how he took his shots.

But again it may have just been a less extreme dose of random variation. There was a 47% chance that an average finisher would score one or zero goals from Benteke's on target attempts in the first half of 2013/14.

Benteke's open play goal expectation per minute is relatively consistent from 2012/13 to 2013/14, more so than his actual goals per minute. And the latter could reasonably have occurred as a random draw from the former.

This correlation for goal expectation per minute from open play is also stronger across seasons for attacking players as a group than is their actual scoring rate per minute.

Improving the estimate of a player's goal expectation is relatively easy, if data hungry.

Allowances can be made for goal scoring environment, which particularly impacts frequent substitutes, and for ageing (goal expectation appears to follow the typical ageing curve with a peak in a player's late 20's).

Rather than looking at a player's actual scoring record, which has less connection with his future scoring feats, it may be wiser to look at his goal expectation per minute in his more recent seasons.

And what fans really want as a late Christmas present might just be a 0.005 goal expectancy per minute striker, preferably in his early to mid 20's.

Jon Walters, International Football's Best Penalty Taker.

Judging players on what they do rather than applying a more probabilistic approach often leads to misleading conclusions.

On Monday night, Jon Walters' brace of goals sent the Republic of Ireland to Euro 2016.

The opening goal was from the penalty spot and came just over a year after Walters was branded by The Daily Telegraph as one of the worst penalty takers in the Premier League.

So if you disregard the random variation that exists in a small number of penalties taken by any perfectly capable professional footballer and instead go with the Telegraph's confident headline, it appears a brave call by Ireland to entrust such an important kick to such a wretched performer from 12 yards.

Perhaps Martin O'Neill includes probabilistic thinking in his list of skills and knew that even at his lowest conversion rate, Walters was still likely to be a perfectly adequate 78% penalty taker who had merely been unlucky in a small sample of spot kicks.

Or perhaps he'd taken into account Walters' perfect spot kick record since the Telegraph branded him useless at penalties and concluded that Walters had been putting in hours on the training pitch.

With the Euros just around the corner perhaps the Telegraph would care to run a piece on the best penalty takers currently in international football.

If they assume all kickers have broadly a 78% chance of converting a spot kick and look at the expected variation in actual spot kick successes from the inevitably small sample they will have collected, they will have great difficulty in drawing up a list.

But if they just look at international conversion rates, as they did when selecting the worst penalty takers in the Premier League, they'll have no trouble coming up with a list of the best penalty takers who are going to Euro 2016.

Jonathan Walters. It was the best of times and the worst of times.

And that list is probably going to include Jonathan Walters.
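To put rough numbers on how easily a small sample misleads, here is a sketch that treats every kicker as the same 78% penalty taker and asks how often a handful of kicks gives a perfect or a wretched looking record. The sample of five kicks is hypothetical and isn't anyone's actual record.

```python
from math import comb


def binom_pmf(successes, kicks, p):
    """Chance of exactly `successes` conversions from `kicks` independent penalties."""
    return comb(kicks, successes) * p ** successes * (1 - p) ** (kicks - successes)


def prob_at_most(successes, kicks, p):
    """Chance of `successes` or fewer conversions from `kicks` penalties."""
    return sum(binom_pmf(k, kicks, p) for k in range(successes + 1))


CONVERSION_RATE = 0.78   # broad rate assumed for all kickers, as in the post
HYPOTHETICAL_KICKS = 5   # illustrative sample size only

perfect_record = binom_pmf(HYPOTHETICAL_KICKS, HYPOTHETICAL_KICKS, CONVERSION_RATE)
poor_record = prob_at_most(2, HYPOTHETICAL_KICKS, CONVERSION_RATE)

print(round(perfect_record, 3))  # five from five
print(round(poor_record, 3))     # two or fewer from five
```

With plenty of kickers each taking only a few penalties, both kinds of record will turn up regularly among players of identical ability.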


Monday 16 November 2015

Will Leicester Emulate WBA's Hot Start from 2012/13?

Goal expectation models have become increasingly popular in evaluating past performance and predicting future achievements, both for teams and individual players.

More widespread data availability has allowed models to grow in complexity.

Simply taking the cumulative goal expectation for a team's attacking and defensive units does sift the wheat from the chaff, but the methodology can be easily improved by looking at the goal expectation of individual attempts and allowing for multiple saves in the same attack.

How cumulative goal expectation is distributed over a game in terms of the number of chances and likelihood of success for each individual attempt can have a subtle influence on match outcome between two teams with a similar goal expectation in that match.

One use of granular models is to simulate the variation in possible match outcomes compared to the actual result on the day.

Such an approach is of course limited, matches aren't merely heading and shooting contests, but in a young discipline it may highlight which team is benefiting from variation in outcomes and which is not.
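A bare bones version of such a simulation might look like the sketch below. The two attempt lists are hypothetical: both sides have the same cumulative goal expectation of 1.5, one from fifteen low probability attempts and the other from three big chances. Every attempt is treated as independent, so this simplified version makes no allowance for multiple saves in the same attack.

```python
import random


def simulate_match(home_shots, away_shots, trials=20000, seed=1):
    """Return (home win, draw, away win) shares from repeated shot-by-shot replays."""
    rng = random.Random(seed)
    home_wins = draws = away_wins = 0
    for _ in range(trials):
        home_goals = sum(rng.random() < p for p in home_shots)
        away_goals = sum(rng.random() < p for p in away_shots)
        if home_goals > away_goals:
            home_wins += 1
        elif home_goals < away_goals:
            away_wins += 1
        else:
            draws += 1
    return home_wins / trials, draws / trials, away_wins / trials


# Hypothetical attempt profiles with equal cumulative goal expectation.
many_small_chances = [0.1] * 15
few_big_chances = [0.5] * 3

home, draw, away = simulate_match(many_small_chances, few_big_chances)
print(round(home, 3), round(draw, 3), round(away, 3))
```

Comparing the three shares shows how the way a total is distributed across attempts, and not just its size, feeds through to the likely result.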

In short, teams are dubbed deserving or not of their current league position based on how closely their expected goals profile tallies with the league table.

WBA spent 2009/10 in the Championship and 2010/11 and 2011/12 as comfortable mid table Premier League finishers.

After 12 games of the 2012/13 season they were 4th, a point behind Chelsea, 4 adrift of Manchester City, who'd narrowly beaten them in injury time in October and 5 behind leaders, Manchester United.

Despite a recent history of Premier League mediocrity, their position on November 18th 2012 seemed legitimate. Their goal expectancy was just over 18 goals and defensively it was 13, very close to their actual goals record of 19 for and 13 against and they had just beaten Chelsea at home.

So 23 points from WBA's 12 games, based on individual on goal attempts in those matches, seemed to be a fair return.

However, if we split into groups the goal expectancy for all on target attempts in WBA's 12 games they were particularly dominant in non penalty chances that had at least a 0.5 probability of being converted.

And while such a distribution of attempts had served them well in reaching 4th spot, samples were small and it wasn't necessarily going to be typical of how the season might continue for WBA.

The two plots above show the typical distribution of non penalty kick, on target attempts in a season for teams who finished in the top 4, (where WBA stood after 12 games) and for sides who finished from 9th to 12th, broadly the position WBA occupied in their previous two Premier League seasons.

The attempt profile for the first 12 WBA games of 2012/13 appears to be more typical of a mid table side when chances have a relatively low individual expectation, but that of a top four team when chances had a higher individual goal expectation.

So in projecting the Baggies' performance in the remaining 26 games, do we broadly take their 12 games to date as an indication of improved form, even if their dominance in big chances only amounts to a handful of attempts at both ends?

Or do we treat these mere 100 combined attempts as a possible aberration and assume WBA will perform more in keeping with their previous two seasons in the remaining 26 games and weight any projected ratings accordingly?

The goal expectancy profile for the remaining 26 games is above and WBA were largely out-shot during the remainder of the season in all but the lowest probability attempts, rather than maintaining their hot start.

They finished with 49 points in 8th spot, just two more points than they'd won in each of their previous two campaigns.

Two seasons of mediocre performance proved a better indicator of future success than did a 12 game sequence where WBA out-shot their opponents in high probability attempts.

Early season WBA in 2012/13 weren't lucky in converting their chances, but they may have been lucky in the distribution and number of chances they were creating.

Perhaps their Midlands rivals Leicester City, with a similar early season goal expectancy profile and lofty current league position, will fare better in 2016 than the Baggies did in 2013.

Or more probably, not.

Sunday 15 November 2015

Where in the Table? (S to W)

Teams A to N can be found here or in various dedicated posts for Manchester City, Chelsea, Leicester and Arsenal.

Actual Table.

Saturday 14 November 2015

Where in the Table? (A to N)

If everyone creates the chances they have to date in the Premier League, where might random variation cast your side compared to where they are lucky/unlucky, skillful/rubbish enough to actually be after 12 games?

Each iteration of each simulation is done on every goal attempt in each of the 120 games to date. Missing teams can be found in earlier dedicated posts.

Actual Table.

Southampton to West Ham to follow.

Points Simulations after 12 Games (S to W)

Concluding this earlier post