Monday, 30 November 2015

The Alternative EPL Table, Iteration Number One.

In A Galaxy Far, Far Away.........................

Week 14 of the current season ended in a 2-1 defeat for the league leaders, Arsenal, in a game of few chances at Norwich. They spurned the opportunity to extend their lead over Manchester City, who went down by the odd goal in a five-goal thriller at home to Southampton on Saturday.

Spurs heaped more pain on defending champions Chelsea, winning comfortably, 2-0. Chelsea are now beginning to be cut adrift at the foot of the table, along with Newcastle, who rode their luck in a goalless draw with Palace.

Former manager of the year Tony Pulis further enhanced his reputation by keeping his WBA team in the top four after a 1-1 draw with near rivals WHU.

Fallen giants, Liverpool and Manchester United, each recorded narrow 1-0 wins against Swansea and Leicester, respectively, but still remain detached from the title race.

Swansea's league position continues to spotlight the talents of their exciting young manager, Gary Monk, while Leicester, as predicted, are struggling to emulate their strong finish to last season and are only just clear of the relegation positions under Claudio Ranieri.

Stoke played half a game with ten men.

The Premier League, sponsored by Random Shot Based Simulations Inc.

Sunday, 29 November 2015

The Table Often Lies.

The group phase of the Champions League is cleverly designed to maintain interest to the end of match day six.

The winners of each group are seeded to meet the runners-up from the other groups, subject to a restriction that prevents a team from playing another from the same country.

The third-placed teams become the favourites to lift the Europa League and the fourth-placed teams get to spend more quality time with their families.

[Table: pre-tournament Group E estimates. Columns: % To Win Group, % To Qualify, % To Drop into Europa League, Median Points. Rows include Bayer Leverkusen and BATE Borisov.]

So even if one team is strongly favoured to win the group, there is likely much to play for over the full course of the twelve games.

Group E was the group whose most likely winner was easiest to predict. Top-ranked Spanish Champions League representatives are generally superior to second-ranked Serie A teams, who in turn are superior to below-average German qualifiers, and all of them are superior to a champion from Belarus.

My pre-tournament estimates for Pinnacle of each team's chances of filling a particular spot can be seen in the table above.

Others took a slightly more optimistic view of Bayer Leverkusen's chances; a typical points expectation based on reported odds was 13.5 points for Barcelona, 9 for Bayer Leverkusen, 8 for Roma and 3 for BATE Borisov.
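Once odds have been turned into win, draw and loss probabilities, an average points expectation is simple to build: each match is worth three times the win probability plus the draw probability, summed over the six fixtures. A minimal Python sketch, using six sets of hypothetical probabilities (not the real Group E prices):

```python
# Hypothetical (win, draw, loss) probabilities for one team's six group matches.
# These numbers are illustrative only, not actual Group E prices.
fixtures = [
    (0.70, 0.20, 0.10),
    (0.55, 0.25, 0.20),
    (0.85, 0.10, 0.05),
    (0.60, 0.25, 0.15),
    (0.50, 0.25, 0.25),
    (0.80, 0.15, 0.05),
]


def expected_points(match_probs):
    """Sum 3 * P(win) + 1 * P(draw) over a list of (win, draw, loss) tuples."""
    total = 0.0
    for p_win, p_draw, p_loss in match_probs:
        # Each match's three outcome probabilities should sum to one.
        assert abs(p_win + p_draw + p_loss - 1.0) < 1e-9
        total += 3 * p_win + 1 * p_draw
    return total


print(round(expected_points(fixtures), 2))  # 13.2 for these illustrative numbers
```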

Already we face a dilemma in deciding whether finishing position fully reflects talent: opinions differ as to which team is best in the first place, and the argument begins to become circular.

If we again treat the bookmaking odds as all-seeing and able to capture perfectly the true chances of each side in each individual group match, it would be useful to know how often the teams finish in the order predicted by the bookmakers.

When each team plays six matches, 36% of Group E simulations finish with Barca as top seed, Leverkusen as runners-up, Roma bound for the Europa League and BATE going home.

If we double the tournament length so that each team plays twelve rather than six games, the chance of a finishing order that reflects the odds used in the simulations is still less than 50%.

Quadrupling the length of the group stage to 24 matches per team produces the order that reflects talent just over 60% of the time.

Should you wish to trade increasing certainty for viewer indifference, a 96-game-per-team schedule pushes beyond 77% the percentage of iterations with Barca as seeded knockout stage entrants, followed by Leverkusen, with Roma Europa League bound.
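The effect of schedule length can be reproduced in outline with a Monte Carlo sketch. It assumes hypothetical team ratings, a flat 25% draw rate, no home advantage and random tie-breaks, none of which come from the post, so the percentages it prints will not match those quoted above; the point is only that the share of groups finishing in the "talent" order grows as the number of games per team rises.

```python
import random

# Hypothetical strength ratings, used only to generate illustrative match
# probabilities; they are NOT the bookmaker prices used in the simulations above.
RATINGS = {"Barcelona": 4.0, "Leverkusen": 2.0, "Roma": 1.8, "BATE": 0.6}
EXPECTED_ORDER = ["Barcelona", "Leverkusen", "Roma", "BATE"]
DRAW_PROB = 0.25  # assumed constant draw probability for every match


def match_probs(home, away):
    """Return (P(home win), P(draw), P(away win)); no home advantage assumed."""
    share = RATINGS[home] / (RATINGS[home] + RATINGS[away])
    decisive = 1.0 - DRAW_PROB
    return share * decisive, DRAW_PROB, (1.0 - share) * decisive


def simulate_group(legs, rng):
    """Play every pairing home and away `legs` times; return the finishing order."""
    teams = list(RATINGS)
    points = {team: 0 for team in teams}
    for _ in range(legs):
        for home in teams:
            for away in teams:
                if home == away:
                    continue
                p_home, p_draw, _ = match_probs(home, away)
                r = rng.random()
                if r < p_home:
                    points[home] += 3
                elif r < p_home + p_draw:
                    points[home] += 1
                    points[away] += 1
                else:
                    points[away] += 3
    # Ties on points are broken at random, a simplification of UEFA's rules.
    tiebreak = {team: rng.random() for team in teams}
    return sorted(teams, key=lambda t: (points[t], tiebreak[t]), reverse=True)


def prob_expected_order(legs, n_sims=5_000, seed=1):
    """Share of simulated groups that finish in EXPECTED_ORDER."""
    rng = random.Random(seed)
    hits = sum(simulate_group(legs, rng) == EXPECTED_ORDER for _ in range(n_sims))
    return hits / n_sims


for legs in (1, 2, 4, 16):  # 6, 12, 24 and 96 games per team
    print(legs * 6, "games per team:", prob_expected_order(legs))
```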

Competitions must strike a balance between rewarding talent (to please the participants) and maintaining a degree of uncertainty (to enthral the audience), while sticking to a manageable time frame.

So in the real world, especially where teams are closely matched, the table often lies.

Saturday, 28 November 2015

Who is the Best Team in Group D?

The UEFA Champions League group stage concludes in early December, and the list of already qualified teams is a familiar mix of Spanish, German, French and English sides.

Seeding and a six-game, home-and-away league format attempt to ensure that the best sides progress and aren't caught out in a single knockout match.

Group D appeared one of the most competitive prior to match day one, containing a representative from each of the four major European countries: Juventus, Manchester City, Sevilla and Borussia Moenchengladbach.

Juve and City had already qualified after match day four, with the former holding a two-point lead going into match day six. Bookmaking odds had favoured Manchester City to top the group, although in a preview written for Pinnacle, I suggested that Juventus should be considered the more likely group-topping side.

So it is tempting to claim that my assessment of the two sides was well founded.

However, although last season's runners-up currently head the group, they are merely odds-on to top it after the final reckoning, so City may still justify their pre-tournament group favouritism.

There is also a significant possibility that City are superior to Juve but still finish below them after just six home-and-away, round-robin games.

The bookmaker's assessment of each of the four teams can be broadly demonstrated by converting the match odds into expected group points. City were expected to gain an average of 10.3 points compared to 9.6 for Juve, with Sevilla and Moenchengladbach trailing with 8.4 and 4.9 points, respectively.
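Turning match prices into the probabilities behind such expected points involves removing the bookmaker's margin. A minimal sketch, assuming decimal odds and simple proportional normalisation (one of several methods in use, and not necessarily the one behind the figures above), with hypothetical prices:

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds for (home, draw, away) into probabilities.

    Raw implied probabilities (1 / odds) sum to more than one because of the
    bookmaker's margin (overround); here that margin is removed by simple
    proportional normalisation.
    """
    raw = [1.0 / o for o in decimal_odds]
    overround = sum(raw)
    return [p / overround for p in raw]


def expected_match_points(decimal_odds):
    """Expected points for the home side: 3 * P(win) + 1 * P(draw)."""
    p_home, p_draw, _ = implied_probabilities(decimal_odds)
    return 3 * p_home + p_draw


# Hypothetical prices for a single group match (home, draw, away).
odds = (2.00, 3.50, 4.00)
print([round(p, 3) for p in implied_probabilities(odds)])
print(round(expected_match_points(odds), 3))
```

Repeating this for every fixture, from each side's perspective, and summing gives group totals of the kind quoted for City and Juve.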

More usefully, simulating the group using the same bookmaking odds that favoured City results in the English team topping the table 45% of the time, compared with 35%, 15% and 5% for the remaining three teams, respectively.

So City may be the best side in the group, but it is more likely that someone else would top the table if they play just six group stage matches.

Therefore, the books may claim to have been correct, but randomness may still overcome their best predictions after match day six.

If we extend the group stage fourfold so that each team plays 24 matches, the separation between the current top two and the others increases.

Juve now win the group 33% of the time and City become slightly more likely than not to top it. Sevilla have a 9% chance of winning and Moenchengladbach are relegated to the realms of the improbable, but not quite impossible.

If we become even more extravagant and condemn each team to play 96 group games based on bookmakers' odds, City are now a 74% chance to top the group and Juve 25%, with the other two teams a virtual probabilistic irrelevance.

Even turning a four-team group into a weekly event spread over nearly two years, there is still a one in four chance that the best team doesn't win the group when two of the teams are closely matched.

Therefore, declaring one team superior to another, even on the basis of a 38-game Premier League season, almost certainly fails to account for a set of trials that are awash with randomness. Similarly, comparing players on a paltry amount of data, often without context, is likely to be equally misleading.

League position = talent + randomness.

Monday, 23 November 2015

Has Luck Made the Title Race Highly Competitive?

The 2015/16 Premier League has thrown up a multitude of talking points, ranging from Chelsea and Leicester swapping identities to the logjam of teams competing at the top of the table after 13 matches.

Leicester entered the Premier League as above-average Championship winners, so survival in 2014/15 was more likely than not, but with Premier League points sometimes arriving in uneven bursts, it took a run of title-contending form over the final months of the season to secure safety.

Despite the encouraging finish to 2014/15, optimism was hardly raised by the appointment of Claudio Ranieri, a manager better known in England for his extreme squad rotation policy than for his previous tenure at the head of a title-contending side.

Leicester's current position is a pleasing contrast with 2014/15, when they were bottom with 10 points after 13 games; in 2015/16 they top the table, having dropped just 11 points.

Shooting stats don't mark Leicester down as the most likely leaders of the Premier League; they've been lucky in converting chances in matches where they haven't dominated in terms of volume. But they are legitimately an improved side compared with the previous season.

Leicester's 28 points after 13 matches equals the 2010 Chelsea side's total as the lowest for a table-topping team after a baker's dozen of games since 2000. However, points alone are a poor measure of a team's standing relative to the rest of the league.

Currently, just four points separate the top five, compared to a nine point gap when Chelsea led the title race with 28 points after 13 games in 2010.

When measured by how many standard deviations the leading team sits above the league's current average points total, the 2010 Chelsea team were more dominant than Leicester are, despite both sides having identical points totals.
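The standard score here is the leader's points minus the league's mean points, divided by the standard deviation of all the teams' points. A sketch with two hypothetical 20-team tables, both led by a 28-point side; the post does not say whether the population or sample standard deviation was used, so the population version is an assumption:

```python
from statistics import mean, pstdev


def leader_standard_score(points):
    """How many standard deviations the top points total sits above the mean.

    Uses the population standard deviation of all teams' points (an assumption).
    """
    mu = mean(points)
    sigma = pstdev(points)
    return (max(points) - mu) / sigma


# Two hypothetical 20-team tables after 13 games, each led by a 28-point side.
compressed_top = [28, 27, 26, 25, 24, 22, 21, 20, 19, 18,
                  18, 17, 16, 15, 14, 13, 12, 11, 10, 9]
runaway_leader = [28, 19, 19, 18, 18, 18, 17, 17, 17, 16,
                  16, 16, 15, 15, 15, 14, 14, 13, 12, 11]
print(round(leader_standard_score(compressed_top), 2))
print(round(leader_standard_score(runaway_leader), 2))
```

The same 28 points look far more dominant when the chasing pack is bunched well below the leader.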

How Dominant were the Leaders after 13 Games.

Table Topping Team    Season Start    Standard Score
Manchester U*         2000            2.02
Liverpool             2001            2.03
Liverpool             2002            2.07
Arsenal*              2003            2.12
Chelsea*              2004            2.23
Chelsea*              2005            2.22
Manchester U*         2006            2.38
Arsenal               2007            2.96
Chelsea               2008            2.30
Chelsea*              2009            2.31
Chelsea               2010            2.14
Manchester C*         2011            2.12
Manchester U*         2012            1.94
Arsenal               2013            1.96
Chelsea*              2014            2.51
Leicester             2015            1.51

(* = team went on to win the title.)

In 2015 to date, Leicester are just 1.5 standard deviations above the league average, by some distance the least dominant position for a leader after 13 games this century.

There are three challengers, each within a win of overhauling them, including Manchester United, whom they play next. It is tempting to cite the closeness of the title race, and Leicester's position at the head of it, as proof that standards may be falling in the Premier League.

However, just as random variation in the matches so far may have been kind to the Foxes, it may also have compressed the higher reaches of the table compared to more recent seasons.

Instead of simply tracking the points won by the leaders in shot-based simulations of the current table, we can partly estimate how competitive each iteration was by calculating the standard score of each simulated table-topping team and examining their distribution.

Slightly more than 10% of season simulations for 2015/16 give a leader who is less dominant than Leicester are to date. The most likely outcome produces a leader that is between 1.8 and 1.9 standard deviations above average and 22% of simulated leaders have standard scores of 2.0 or above.
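Given a large set of simulated tables, that distribution reduces to two shares: leaders less dominant than the actual leader, and leaders at 2.0 or above. A sketch, assuming each simulated table is just a list of points totals; the two toy tables below are hypothetical and far smaller than a real 20-team table:

```python
from statistics import mean, pstdev


def leader_z(points):
    """Standard score of the top points total (population standard deviation)."""
    return (max(points) - mean(points)) / pstdev(points)


def summarise_leaders(simulated_tables, actual_leader_z, high=2.0):
    """Return (share of simulated leaders less dominant than the actual leader,
    share of simulated leaders with a standard score of at least `high`)."""
    scores = [leader_z(table) for table in simulated_tables]
    n = len(scores)
    below = sum(z < actual_leader_z for z in scores) / n
    at_or_above_high = sum(z >= high for z in scores) / n
    return below, at_or_above_high


# Two tiny hypothetical "simulated tables"; real ones would hold 20 points
# totals each and come from thousands of shot-based season simulations.
toy_tables = [[1, 3], [0, 0, 0, 4]]
print(summarise_leaders(toy_tables, actual_leader_z=1.51))  # (0.5, 0.0)
```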

Random variation may have propelled Leicester to the head of the Premier League. It may also be responsible for making the current table appear more competitive than it actually is.

It is far too early, after a single anomalous batch of 130 matches, to call time on the title regulars, induct new members or declare a new-found equality in the higher reaches of the table.

Saturday, 21 November 2015

Everyone Loves Ricky.

Everyone loves an individual goal, from Peter Beagrie beating six players (or the same player six times) in the late 80s to John Barnes in the Maracana and Ricky Villa lighting up the old Wembley Stadium.

It may be a trick of the mind, but such unassisted goals also seem to have an air of inevitability. Once the last line of defence is reached the keeper rarely spoils the party.

There may be a legitimate reason for this impression. A player who has largely created his own chance has often disrupted whatever defensive organisation previously existed, and is fully in control of the ball, rather than stretching to master an overhit assist.

While imperfect, the absence of an assist in an attempt description might serve to identify goals or shots that were a result of individual skills, rather than a chance created through a series of team based passes.

Using a season's worth of shot data from open play, it does appear that unassisted goal attempts result in scores at a higher rate than attempts originating from an assist. This may have occurred by chance, but the analysis strongly suggests otherwise.

[Image: A Spurs legend dreams of historical deeds.]

As a baseline figure from single-season data, once location is accounted for, an unassisted on-goal attempt is around 10% more likely to be scored than an attempt set up by a teammate.
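One way to make "once location is accounted for" concrete is to compare actual goals with the goals a location-based model expects, separately for assisted and unassisted attempts. This sketch assumes each attempt already carries a location-based scoring probability from some separate model; the tiny data set is invented, and on a real season the post's estimate would correspond to an uplift of about 0.10:

```python
def conversion_ratios(shots):
    """Actual goals divided by location-based expected goals, by shot type.

    `shots` holds (assisted, xg, scored) tuples: `assisted` is True when the
    attempt followed an assist, `xg` is the chance an average on-target attempt
    from that location is scored (from a separate location model, assumed
    here) and `scored` is 1 for a goal, 0 otherwise.
    """
    goals = {True: 0, False: 0}
    expected = {True: 0.0, False: 0.0}
    for assisted, xg, scored in shots:
        goals[assisted] += scored
        expected[assisted] += xg
    return {kind: goals[kind] / expected[kind] for kind in (True, False)}


# Tiny hypothetical sample, far too small to mean anything on its own.
shots = [
    (False, 0.30, 1), (False, 0.30, 0), (False, 0.20, 1), (False, 0.20, 0),
    (True, 0.30, 1), (True, 0.30, 0), (True, 0.20, 0), (True, 0.20, 0),
]
ratios = conversion_ratios(shots)
# Relative uplift of unassisted over assisted attempts, with location allowed for.
uplift = ratios[False] / ratios[True] - 1
print(round(uplift, 2))
```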

This has implications for teams and players who may be adept at creating potentially better-quality chances through individual effort, compared with relying more on a teamwork-based approach, against which the opposition may be able to defend more cohesively.

In 2012/13 only 15% of Arsenal's on-goal attempts from open play were lone wolf attempts, compared with 25% for Sunderland. However, some variation in these percentages should be expected, even if all teams have broadly the same propensity to create individually crafted chances.

Premier League attacks, based on the limited data I have, do create widely different numbers of chances from open play (Arsenal created almost twice as many as Sunderland), and within these are varying proportions of individually created chances.

However, the spread in 2012/13 was insufficient to conclude that Sunderland's higher proportion of individually created chances compared with, say, Arsenal's is a real trait that may persist. It could be, but more data is needed.

The same could not be said for Premier League defences in 2012/13. On average, 20% of the open-play chances a defence faced were predominantly the product of an individual's efforts, but this ranged from 9% for Reading's defence to a high of 29% for Arsenal's.

This time the spread could not be explained away as merely random variation. At the very least for that season, opponents seemed to be attacking certain sides in systematically different ways from open play.
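Whether a spread in percentages is more than random variation can be checked by simulation: give every side the league-wide share, replay their chances many times, and see how often the simulated spread is as large as the real one. A sketch with invented counts for six defences; the chi-square style statistic and the simulation approach are assumptions, not necessarily the test behind the post:

```python
import random


def dispersion_statistic(counts, totals, pooled):
    """Pearson chi-square style statistic: how far each side's share of
    individually created chances strays from a common (pooled) share.
    Assumes 0 < pooled < 1."""
    stat = 0.0
    for x, n in zip(counts, totals):
        expected = n * pooled
        stat += (x - expected) ** 2 / (expected * (1 - pooled))
    return stat


def spread_p_value(counts, totals, n_sims=1_000, seed=7):
    """Share of simulated leagues, in which every side truly has the pooled
    share, that show at least as much spread as the observed counts."""
    rng = random.Random(seed)
    pooled = sum(counts) / sum(totals)
    observed = dispersion_statistic(counts, totals, pooled)
    as_extreme = 0
    for _ in range(n_sims):
        fake = [sum(rng.random() < pooled for _ in range(n)) for n in totals]
        if dispersion_statistic(fake, totals, pooled) >= observed:
            as_extreme += 1
    return as_extreme / n_sims


# Hypothetical defences: open-play chances faced and how many were individually created.
chances_faced = [120, 150, 180, 200, 210, 240]
individual = [11, 30, 36, 40, 45, 70]
print(spread_p_value(individual, chances_faced))
```

A small value says that leagues where every side concedes the same type of chance rarely look this uneven.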

The quantity of chances faced will always overwhelm any persistent bias in the type of chance allowed, but identifying if and perhaps why a side is allowing a larger percentage of individually created chances, that may carry a greater sting in the tail, may make for marginal gains.

Also, individuals who aren't called Messi or Ronaldo may be unfairly defined as having a lucky season when they are actually rather good at persistently emulating Ricardo Julio Villa.

Thursday, 19 November 2015

Bad Mood.

On Tuesday night I won the lottery. OK, not really, but I was involved in an extremely low-probability occurrence.

To describe the event briefly: we were going to see The Vaccines, an indie rock band from West London with an almost exclusively young fan base, at Wolves Civic Hall. So our mere presence, in the far right-hand tail of the audience age distribution, was unlikely in itself.

The route, the car used and the departure time were dependent on a multitude of random decisions, chosen arbitrarily on the night.

These included how long it took to bribe the cat with extra food, which routes to take to avoid the traffic chaos that is currently Stafford town centre, how many cars to overtake (when safe and legal to do so) and how many cars to let out from side roads.

[Image: Not my fault.]

An hour into a journey that usually takes an hour in total, we were still crawling through Stafford when Storm Barney deposited a six-foot-long tree branch onto the roof of my 2003 Mini, bought new on the day England won the Rugby World Cup.

There were no injuries and the car was driveable, but it was a total write-off.

No other car I saw that night was driving around with recent storm damage, and none of the drivers appeared particularly adept at dodging unseen projectiles. So I concluded that I'd been unlucky in the extreme (although when you emerge unscathed from such incidents, you're invariably told how lucky you've been).

On the Monday before the event, the chance that I was going to be on the wrong end of a tree branch was very, very tiny.

Improbable, but not impossible, and the same was true for everyone else on the road. But so many drivers were out that night in the path of Barney, each with a very small chance of getting written off, that the chance that someone would suffer that fate was significant.
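The arithmetic behind "someone was going to suffer that fate" is the complement of nobody being hit. A sketch with hypothetical numbers, assuming every driver faced the same small, independent risk:

```python
def prob_someone_hit(p_each, n_drivers):
    """Chance that at least one of n drivers is hit, assuming each faces the
    same small, independent probability p_each."""
    return 1.0 - (1.0 - p_each) ** n_drivers


# Hypothetical: a 1 in 100,000 chance each, across 50,000 drivers.
print(round(prob_someone_hit(1 / 100_000, 50_000), 3))
```

With these made-up numbers the chance that somebody is hit comes out at roughly 39%, despite each individual's one in 100,000 chance.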

I was just the "lucky" one.

A 1 in 20 chance is often taken as the arbitrary threshold beyond which something unseen is thought to be at play, such as a talent for avoiding flying trees or for scoring more goals than expected.

But unless there is corroborative evidence to back up the claim that an event is the product of additional skill rather than luck, it is worth seeing how many players are "on the road", in case you are simply seeing a notable and unlikely event that was almost bound to happen by chance to one of a large group of similarly talented individuals.
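The same logic applies to finishing. A sketch, assuming 200 hypothetical average finishers, each with 30 attempts at a 0.3 chance: a tally of 14 or more goals happens only about 4% of the time for any one player, which would pass a 1 in 20 test, yet a season simulated for the whole group typically throws up several such "talented" finishers by chance alone.

```python
import random
from math import comb


def tail_prob(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p): the chance an average player gets k+ goals."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


def count_flagged(n_players, shots_each, p_goal, threshold, seed=3):
    """Simulate equally talented finishers and count how many reach
    `threshold` goals or more purely by chance."""
    rng = random.Random(seed)
    flagged = 0
    for _ in range(n_players):
        goals = sum(rng.random() < p_goal for _ in range(shots_each))
        if goals >= threshold:
            flagged += 1
    return flagged


# Hypothetical: 200 average finishers, 30 attempts each at a 0.3 chance apiece.
print("chance any one player looks 'talented':", round(tail_prob(30, 0.3, 14), 3))
print("players flagged in one simulated season:", count_flagged(200, 30, 0.3, 14))
```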

Wednesday, 18 November 2015

Goal Expectancy Per Minute.

Top of most managers' and fans' wish lists in the January window is a 15-goal-a-season striker. But what actually constitutes such a potential purchase?

In terms of goals scored, a striker who takes penalties or free kicks in a moderately successful side could easily reach halfway to this benchmark from dead-ball goals alone.

If they are also the intended target for other set-piece plays, have bolstered their recent tallies with a couple of deflected chances (which are extremely difficult to save, but may not be repeatable in the long term) and have benefited from the occasional goalkeeping error, then the bulk of their recent record may be deceptive.

We may therefore choose to look only at goals scored from open play as a more revealing statistic, sieved of wrong-footed keepers and of the added advantage of regularly striking a dead ball.

However, this approach merely invites the other inherent uncertainty of random variation.

Expected goals models hope to illustrate how likely an average player is to score with a shot or header. Over or under performance against this cumulative expectation is often taken to be a sign of above or below average finishing talent, but is far more likely to be mostly down to simple variance within relatively small numbers of probabilistic events.

A fair coin exhibits no skill by falling heads up six times out of ten.
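The coin example can be checked exactly with the binomial distribution:

```python
from math import comb

# Probability that a fair coin lands heads at least 6 times in 10 tosses.
p_at_least_six = sum(comb(10, k) for k in range(6, 11)) / 2 ** 10
print(round(p_at_least_six, 3))  # 0.377
```

So a fair coin lands heads at least six times in ten tosses around 38% of the time.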

By looking at a player's expected goals from open play, we may eliminate the inbuilt advantage of being the chosen one for set pieces, as well as moving to a probabilistic, rather than outcome-based, assessment. But we still need to adjust for time on the field.

Christian Benteke's 2012/13 season at Aston Villa was rewarded with nine actual goals from open play in 2820 minutes of playing time from chances that had a cumulative goal expectancy of 5.5 goals.

An impressive over performance.

He had forced the keeper to deal with on-target attempts that would have yielded an average player 0.00195 expected open-play goals per minute, but he was actually scoring at a rate of 0.0032 open-play goals per minute.
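The per-minute figures are simple divisions of the season totals quoted above; the per-90-minute line is an extra conversion added for convenience:

```python
minutes = 2820          # Benteke's 2012/13 playing time, as quoted above
open_play_goals = 9
open_play_xg = 5.5

xg_per_min = open_play_xg / minutes        # about 0.00195
goals_per_min = open_play_goals / minutes  # about 0.0032

print(f"{xg_per_min:.5f} expected goals per minute")
print(f"{goals_per_min:.4f} actual goals per minute")
print(f"{90 * xg_per_min:.3f} expected goals per 90 minutes")
```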

His over performance in converting his attempts in 2012/13 placed him statistically alongside potential superstars, such as Bale, solid Premier League performers, such as Walcott and Cazorla and someone called Michu.

However, it is a simple exercise to simulate the likelihood that an average finisher scores at least nine goals from the chances Benteke put on target in 2012/13 (it's around 8%). So it was eminently possible that his actual over performance in converting chances was simply due to good luck.
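The post simulates this; the same probability can also be calculated exactly by building up the distribution of goals one attempt at a time (a Poisson-binomial distribution). Benteke's attempt-by-attempt probabilities are not given here, so the 25 equal chances below are hypothetical and the printed value will not reproduce the ~8% figure. The same `prob_at_most` function answers the later question of an average finisher scoring one or zero goals from a set of attempts.

```python
def goal_count_distribution(shot_probs):
    """Exact distribution of goals from independent attempts, each with its own
    chance of being scored by an average finisher (Poisson-binomial)."""
    dist = [1.0]  # dist[k] = probability of exactly k goals so far
    for p in shot_probs:
        new = [0.0] * (len(dist) + 1)
        for k, prob in enumerate(dist):
            new[k] += prob * (1 - p)  # this attempt is not scored
            new[k + 1] += prob * p    # this attempt is scored
        dist = new
    return dist


def prob_at_least(shot_probs, goals):
    return sum(goal_count_distribution(shot_probs)[goals:])


def prob_at_most(shot_probs, goals):
    return sum(goal_count_distribution(shot_probs)[: goals + 1])


# Hypothetical on-target attempts: 25 chances of 0.22 each (5.5 expected goals).
hypothetical_shots = [0.22] * 25
print(round(prob_at_least(hypothetical_shots, 9), 3))
```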

[Image: Benteke laments the influence of random variation on his actual goal tally.]

Anyone who might have considered Benteke as an acquisition capable of 15 goals a season (or perhaps 9 open-play goals) might have waited until the 2013/14 January window to pounce, rather than pay an inflated fee for a player who may have visibly, if misleadingly, demonstrated his potential.

Up to January 1st of the following season, Benteke had fared less well in open play, scoring just once in 1145 minutes (0.0009 actual goals per minute) compared with a goal expectancy of 1.8, or 0.0016 expected goals per minute.

This time he under performed rather than over performed against his goal expectation based on where and how he took his shots.

But again it may have just been a less extreme dose of random variation. It was a 47% chance that an average finisher would score one or zero goals from Benteke's on target attempts in the first half of 2013/14.

Benteke's open play goal expectation per minute is relatively consistent from 2012/13 to 2013/14, more so than his actual goals per minute. And the latter could reasonably have occurred as a random draw from the former.

For attacking players as a group, the season-to-season correlation of open-play goal expectation per minute is also stronger than that of their actual scoring rate per minute.
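Season-to-season stability of this kind is usually measured with a correlation coefficient across players. A sketch using the Pearson correlation on invented per-minute rates for five players (Python 3.10+ also offers `statistics.correlation`; it is written out here for clarity):

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical per-minute rates for the same five attackers in two seasons.
xg_season1 = [0.0020, 0.0035, 0.0041, 0.0016, 0.0028]
xg_season2 = [0.0022, 0.0031, 0.0044, 0.0015, 0.0030]
goals_season1 = [0.0032, 0.0030, 0.0050, 0.0010, 0.0021]
goals_season2 = [0.0009, 0.0040, 0.0035, 0.0020, 0.0036]

print("xG/min year-to-year r:", round(pearson(xg_season1, xg_season2), 2))
print("goals/min year-to-year r:", round(pearson(goals_season1, goals_season2), 2))
```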

Improving estimates of a player's goal expectation is relatively easy, if data hungry.

Refinements include allowing for the goal-scoring environment, which particularly affects frequent substitutes, and for ageing (goal expectation appears to follow the typical ageing curve, with a peak in a player's late 20s).

Rather than looking at a player's actual scoring record, which has less connection with his future scoring feats, it may be wiser to look at his goal expectation per minute in his more recent seasons.

And what fans really want as a late Christmas present might just be a striker with a goal expectancy of 0.005 per minute, preferably in his early to mid 20s.
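To see why 0.005 is a sensible target, multiply the per-minute rate by minutes played. 2820 minutes is Benteke's 2012/13 total from above; 3420 minutes (38 full matches) is an upper bound that assumes the striker never misses a minute:

```python
def projected_goals(rate_per_minute, minutes):
    """Expected goals over a span of minutes at a constant per-minute rate."""
    return rate_per_minute * minutes


full_season_minutes = 38 * 90  # every minute of every league game
print(round(projected_goals(0.005, full_season_minutes), 1))  # 17.1
print(round(projected_goals(0.005, 2820), 1))                 # 14.1
```

Either way, the figure lands close to the 15-goal benchmark.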

Jon Walters, International Football's Best Penalty Taker.

Judging players on what they do rather than applying a more probabilistic approach often leads to misleading conclusions.

On Monday night, Jon Walters' brace of goals sent the Republic of Ireland to Euro 2016.

The opening goal was from the penalty spot and came just over a year after Walters was branded by The Daily Telegraph as one of the worst penalty takers in the Premier League.

So if you disregard the random variation that exists in a small number of penalties taken by any perfectly capable professional footballer and instead go with the Telegraph's confident headline, it appears a brave call by Ireland to entrust such an important kick to such a wretched performer from 12 yards.

Perhaps Martin O'Neill includes probabilistic thinking in his list of skills and knew that even at his lowest conversion rate, Walters was still likely to be a perfectly adequate 78% penalty taker who had merely been unlucky in a small sample of spot kicks.
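A sketch of the probabilistic view, using the binomial distribution and assuming every kick is an independent 78% chance. Walters' actual record is not given in the post, so the 3 from 6 below is hypothetical; even a genuine 78% taker returns a record that poor roughly one time in eight.

```python
from math import comb


def binom_cdf(successes, attempts, p):
    """P(X <= successes) for X ~ Binomial(attempts, p)."""
    return sum(
        comb(attempts, k) * p**k * (1 - p) ** (attempts - k)
        for k in range(successes + 1)
    )


# Hypothetical record: 3 conversions from 6 kicks by a genuine 78% taker.
print(round(binom_cdf(3, 6, 0.78), 3))
```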

Or perhaps he'd taken into account Walters' perfect spot kick record since the Telegraph branded him useless at penalties and concluded that Walters had been putting in hours on the training pitch.

With the Euros just around the corner, perhaps the Telegraph would care to run a piece on the best penalty takers currently in international football.

If they assume all kickers have broadly a 78% chance of converting a spot kick and look at the expected variation in actual spot kick successes from the inevitably small sample they will have collected, they will have great difficulty in drawing up a list.
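A sketch of the Telegraph's problem, assuming 50 hypothetical kickers who are all genuine 78% takers with five international penalties each. By chance alone, a sizeable group will show perfect records and several will look poor:

```python
import random


def simulate_conversion_rates(n_kickers, kicks_each, p=0.78, seed=11):
    """Conversion rates for kickers who all share the same true success rate."""
    rng = random.Random(seed)
    rates = []
    for _ in range(n_kickers):
        scored = sum(rng.random() < p for _ in range(kicks_each))
        rates.append(scored / kicks_each)
    return rates


# Hypothetical: 50 international penalty takers with 5 kicks each.
rates = simulate_conversion_rates(50, 5)
print("perfect records:", sum(r == 1.0 for r in rates))
print("at or below 60%:", sum(r <= 0.6 for r in rates))
```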

But if they just look at international conversion rates, as they did when selecting the worst penalty takers in the Premier League, they'll have no trouble coming up with a list of the best penalty takers who are going to Euro 2016.

[Image: Jonathan Walters. It was the best of times and the worst of times.]

And that list is probably going to include Jonathan Walters.


Monday, 16 November 2015

Will Leicester Emulate WBA's Hot Start from 2012/13?

Goal expectation models have become increasingly popular in evaluating past performance and predicting future achievements, both for teams and individual players.

More widespread data availability has allowed models to grow in complexity.

Simply taking the cumulative goal expectation for a team's attacking and defensive units does sift the wheat from the chaff, but the methodology can easily be improved by looking at the goal expectation of individual attempts and allowing for multiple saves in the same attack.
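One common way to allow for several attempts in the same attack (a shot, a save, a rebound) is to treat the attack as producing at most one goal, which happens unless every attempt fails. A sketch under that assumption, with the attempts treated as independent; the post does not spell out its exact method:

```python
from math import prod


def attack_goal_probability(attempt_probs):
    """Chance that a single attack produces a goal when it contains several
    attempts: the attack only fails if every attempt fails."""
    return 1.0 - prod(1.0 - p for p in attempt_probs)


# Hypothetical attack: a 0.30 shot is saved and the rebound is a 0.40 chance.
naive_sum = 0.30 + 0.40                           # 0.70, overstates the threat
combined = attack_goal_probability([0.30, 0.40])  # 1 - 0.7 * 0.6 = 0.58
print(naive_sum, round(combined, 2))
```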

How cumulative goal expectation is distributed over a game in terms of the number of chances and likelihood of success for each individual attempt can have a subtle influence on match outcome between two teams with a similar goal expectation in that match.

One use of granular models is to simulate the variation in possible match outcomes compared to the actual result on the day.
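A sketch of such a simulation for a single match, assuming each attempt is an independent event with a known scoring probability. Both hypothetical sides below have exactly 1.0 expected goals, but the side with two 0.5 chances wins noticeably more often than the side with ten 0.1 chances (roughly 36% against 31% when worked out exactly), which is the subtle influence described above.

```python
import random


def simulate_match(home_shots, away_shots, n_sims=50_000, seed=5):
    """Replay each side's attempts many times.

    Returns (P(home win), P(draw), P(away win)) estimated by simulation.
    """
    rng = random.Random(seed)
    home_wins = draws = 0
    for _ in range(n_sims):
        home_goals = sum(rng.random() < p for p in home_shots)
        away_goals = sum(rng.random() < p for p in away_shots)
        if home_goals > away_goals:
            home_wins += 1
        elif home_goals == away_goals:
            draws += 1
    return home_wins / n_sims, draws / n_sims, 1 - (home_wins + draws) / n_sims


# Hypothetical: both sides total 1.0 expected goals, distributed differently.
few_big_chances = [0.5, 0.5]
many_small_chances = [0.1] * 10
print(simulate_match(few_big_chances, many_small_chances))
```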

Such an approach is, of course, limited (matches aren't merely heading and shooting contests), but in a young discipline it may highlight which teams are benefiting from variation in outcomes and which are not.

In short, teams are dubbed deserving or not of their current league position based on how closely their expected goals profile tallies with the league table.

WBA spent 2009/10 in the Championship and 2010/11 and 2011/12 as comfortable mid table Premier League finishers.

After 12 games of the 2012/13 season they were 4th, a point behind Chelsea, 4 adrift of Manchester City, who'd narrowly beaten them in injury time in October, and 5 behind leaders Manchester United.

Despite a recent history of Premier League mediocrity, their position on November 18th 2012 seemed legitimate. Their goal expectancy was just over 18 goals in attack and 13 in defence, very close to their actual record of 19 scored and 13 conceded, and they had just beaten Chelsea at home.

So, based on the individual goal attempts in those matches, 23 points from WBA's 12 games seemed a fair return.

However, if we group all on-target attempts in WBA's 12 games by goal expectancy, WBA were particularly dominant in non-penalty chances that had at least a 0.5 probability of being converted.
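Splitting attempts by goal expectancy is a simple binning exercise. A sketch with hypothetical attempt values and band edges chosen purely for illustration:

```python
from collections import Counter

# Goal-expectancy bands; the cut points are an illustrative assumption.
EDGES = (0.0, 0.1, 0.2, 0.3, 0.5, 1.0)
BANDS = list(zip(EDGES, EDGES[1:]))


def bin_attempts(xg_values):
    """Count attempts in each band [low, high); an xG of exactly 1.0 goes in the top band."""
    counts = Counter()
    for xg in xg_values:
        for low, high in BANDS:
            if low <= xg < high or xg == high == EDGES[-1]:
                counts[(low, high)] += 1
                break
    return counts


# Hypothetical non-penalty, on-target attempts (goal expectancy each) for and against.
attempts_for = [0.05, 0.08, 0.12, 0.25, 0.55, 0.60, 0.70]
attempts_against = [0.04, 0.06, 0.09, 0.15, 0.22, 0.52]

counts_for, counts_against = bin_attempts(attempts_for), bin_attempts(attempts_against)
for band in BANDS:
    print(band, "for:", counts_for[band], "against:", counts_against[band])
```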

And while such a distribution of attempts had served them well in reaching 4th spot, samples were small and it wasn't necessarily going to be typical of how the season might continue for WBA.

The two plots above show the typical distribution of non-penalty, on-target attempts in a season for teams who finished in the top 4 (where WBA stood after 12 games) and for sides who finished from 9th to 12th, broadly the position WBA occupied in their previous two Premier League seasons.

The attempt profile for the first 12 WBA games of 2012/13 appears to be more typical of a mid table side when chances have a relatively low individual expectation, but that of a top four team when chances had a higher individual goal expectation.

So in projecting the Baggies' performance in the remaining 26 games, do we broadly take their 12 games to date as an indication of improved form, even if their dominance in big chances only amounts to a handful of attempts at both ends?

Or do we treat these mere 100 combined attempts as a possible aberration and assume WBA will perform more in keeping with their previous two seasons in the remaining 26 games and weight any projected ratings accordingly?
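One simple way to "weight any projected ratings accordingly" is to blend the long-run and short-run ratings in proportion to the games behind each. This is a hypothetical sketch: the rating scale, the values and the weighting by games alone are assumptions, not the method used here. With 76 games of mid-table evidence against 12 games of a hot start, the blend stays much closer to the long-run rating.

```python
def blended_rating(prior_rating, prior_games, current_rating, current_games):
    """Weight a long-run rating and a short-run rating by games played.

    Weighting purely by the number of games is a deliberately simple
    assumption; older seasons could also be discounted.
    """
    total = prior_games + current_games
    return (prior_rating * prior_games + current_rating * current_games) / total


# Hypothetical ratings (e.g. shot-based goal difference per game):
# two mid-table seasons (76 games) versus a strong 12-game start.
print(round(blended_rating(-0.10, 76, 0.40, 12), 3))
```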

The goal expectancy profile for the remaining 26 games is above and WBA were largely out-shot during the remainder of the season in all but the lowest probability attempts, rather than maintaining their hot start.

They finished with 49 points in 8th spot, just two more points than they'd won in each of their previous two campaigns.

Two seasons of mediocre performance proved a better indicator of future success than did a 12 game sequence where WBA out-shot their opponents in high probability attempts.

Early season WBA in 2012/13 weren't lucky in converting their chances, but they may have been lucky in the distribution and number of chances they were creating.

Perhaps their Midlands rivals Leicester City, with a similar early season goal expectancy profile and lofty current league position, will fare better in 2016 than the Baggies did in 2013.

Or more probably, not.

Sunday, 15 November 2015

Where in the Table? (S to W)

Teams A to N can be found here or in various dedicated posts for Manchester City, Chelsea, Leicester and Arsenal.

[Image: Actual Table.]

Saturday, 14 November 2015

Where in the Table? (A to N)

If every side keeps creating the chances it has to date in the Premier League, where might random variation cast your team, compared with where they are lucky/unlucky, skilful/rubbish enough to actually be after 12 games?

Each iteration of each simulation replays every goal attempt in each of the 120 games to date. Missing teams can be found in earlier dedicated posts.

[Image: Actual Table.]

Southampton to West Ham to follow.

Points Simulations after 12 Games (S to W)

Concluding this earlier post

Points Simulations after 12 Games (A to S)

The current table is simply a single iteration of a multitude of interconnected possible outcomes, spread over the 120 individual matches that have been contested in the Premier League to date.

Each of those games has been decided by a variety of significant events, most notably goal attempts that have required the keeper to attempt a save. While some have resulted in goals, others have been saved. The outcomes are not set in stone: on another day, goals might have been saves and saves a cause for wild celebration.

Replaying these goal bound attempts, mindful of how likely such an attempt is to result in a goal based on such variables as location and shot type can convey the variation in the range of possible outcomes and ultimately match results.
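Extending the single-match idea across a fixture list gives the points distributions described here. A sketch, assuming each goal-bound attempt is replayed independently with a location- and type-based scoring probability; the three-team league and its probabilities are invented:

```python
import random
from collections import defaultdict


def simulate_points(matches, n_sims=2_000, seed=13):
    """Replay every goal-bound attempt in every match `n_sims` times.

    `matches` holds (home, away, home_probs, away_probs) tuples, where each
    probability is the chance an attempt of that location and type is scored.
    Returns {team: {points_total: number_of_simulations}}.
    """
    rng = random.Random(seed)
    teams = {team for match in matches for team in match[:2]}
    dist = {team: defaultdict(int) for team in teams}
    for _ in range(n_sims):
        points = dict.fromkeys(teams, 0)
        for home, away, home_probs, away_probs in matches:
            home_goals = sum(rng.random() < p for p in home_probs)
            away_goals = sum(rng.random() < p for p in away_probs)
            if home_goals > away_goals:
                points[home] += 3
            elif home_goals < away_goals:
                points[away] += 3
            else:
                points[home] += 1
                points[away] += 1
        for team, total in points.items():
            dist[team][total] += 1
    return {team: dict(counts) for team, counts in dist.items()}


# Hypothetical three-team "league" with invented attempt probabilities.
fixtures = [
    ("Team A", "Team B", [0.30, 0.10, 0.45], [0.20, 0.25]),
    ("Team B", "Team C", [0.15, 0.35], [0.30, 0.30, 0.10]),
    ("Team C", "Team A", [0.40, 0.20], [0.10]),
]
for team, counts in sorted(simulate_points(fixtures).items()):
    print(team, sorted(counts.items()))
```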

In one particular parallel universe, Everton top the table, while in quite a few others Chelsea occupy 20th and bottom position.

The plots below give an idea of the distribution of points a side might have won if the league schedule to date were tipped into a probabilistic soup of shot-based simulations.

The remaining teams will follow later, along with similar plots charting the range of current possible league positions after 12 matches.

[Image: Actual Table.]

Friday, 13 November 2015

Arsenal's Solid Title Aspirations.

Arsenal spurned the chance to go into the international break as early season leaders of the Premier League when they had to come from behind to claim a point against traditional rivals Tottenham on Sunday evening.

Nevertheless a perfect Premier League record in October was enough to win the "Manager of the Month" award for Arsene Wenger and Arsenal head a short list of legitimate contenders to Chelsea's crown.

The Gunners have out-shot opponents in seven of their 12 matches to date and been out-shot just twice. Their most likely Premier League points tally in individual match simulations, based on the location and type of their shots, their ability to hit the target and their ability to restrict opponents' shooting, is 26.

Second spot is their most likely current placing, which also tallies with Arsenal's actual record. So their title aspirations are confirmed: they appear neither over- nor underrated compared with their underlying shooting statistics.

Shot profiles will inevitably bounce around after just 12 matches, but once you accept that Arsenal are hardly renowned for attempting low-goal-expectation efforts from distance, their current shot profile compares favourably with those typically seen in top-four sides.