Monday 22 February 2016

How Roberto Martinez Could Have Been Everton's Most Irreplaceable Player.

Midtjylland's boffin driven win over Manchester United last week along with Rasmus Ankersen peerlessly owning BT's Jake Humphrey in the post match interviews has inevitably put football analytics back in the media spotlight.

Intelligently written prose combined with rigorously derived statistics has a relatively sparse, but nonetheless impressive history in the mainstream British media.

The Guardian's Sean Ingle regularly delivers well written narrative, backed up with accessible use of numbers, often with a slice of contemporary culture thrown in and Daniel Finkelstein's eponymous Fink Tank has long provided an outlet for some of football's best, lab coated number crunchers.

Nerdy clichés aside, it's encouraging to see analytics becoming an accepted part of football reporting. It may not appeal to every taste, but it certainly doesn't seek to wholly replace more subjective approaches. It has mostly been intended only to complement and enhance.

But just as analytics has had to learn to walk before it could break into its current brisk canter, some of the emerging mainstream efforts fall well short of the bar set by The Times and The Guardian.

Analytics occasionally overturns "established" knowledge. But this mostly occurs when simple common sense could have eyeballed the flaws in the narrative anyway.

2-0 is a "dangerous" lead that few would turn down given the choice and it's rare to hear a fan base inwardly groan when an opponent is reduced to ten men and they instantly become "harder" to play against.

So analytics does have a record of going against the grain of irrational popular sentiment. But this should not mean that the more outlandish the numbers driven claim, the more likely it is to be true.

Saturday's article in the Telegraph, "The Player Your Premier League Team Cannot Do Without" caught the eye on Twitter, as well as being shared over 2k times.

The premise was simple. The unnamed journalist found the player in each Premier League team, subject to six or more appearances, who was that side's "winning-est" player. In plain English, the player(s) who had the best win% in matches they had taken part in at each Premier League club.

The list contained a motley crew of rarely used fringe players mixed with the occasional legitimate star.

Typifying the former and with all due respect was Everton's Leon Osman, although Vardy and Mahrez will no doubt be surprised to learn that Nathan Dyer is the player that Leicester can ill afford to lose this season.

34 year old Osman's six appearances have yielded four wins for a winning-est percentage of 67% compared to Everton's baseline figure of 31% in the Premier League this season.

Player contribution to team success is a well trodden and problematic path in sports analytics, more so in football where the individual, repetitive elements present in other sports, such as baseball and basketball are partly absent.

Nevertheless, the likes of Jorg Seidel and Dan Altman have produced measures that aim to untangle individual player contribution to team success. Neither, I would hazard, would consider Osman indispensable to Everton's cause.

We could pick holes in the Telegraph's approach by pointing out that wins, especially over a small number of games in a low scoring sport, are a very blunt measuring tool. We've already discarded draws, for example, which account for around 25% of results.

Similarly, if we are looking at very small sample sizes in a statistically noisy environment, then we are often going to get extremes of good or bad percentage based outcomes through chance.

Manchester United's win% without Smalling is 0%. He's missed one game all season, the 2-1 loss at Bournemouth.
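To put a rough number on the small sample point, here is a minimal sketch of how often a 31% side would win four or more of any six games through luck alone. It assumes every match is an independent coin flip at Everton's baseline win rate, which is a simplification of mine rather than anything in the Telegraph piece; the function name is purely illustrative.

```python
from math import comb

def prob_at_least(wins, games, win_prob):
    """Chance of `wins` or more successes in `games` independent matches."""
    return sum(
        comb(games, k) * win_prob**k * (1 - win_prob) ** (games - k)
        for k in range(wins, games + 1)
    )

# Figures quoted above: Everton's 31% baseline win rate, Osman's 4 wins from 6.
# ASSUMPTION: matches are independent and all share the same win probability.
osman_by_luck = prob_at_least(4, 6, 0.31)
print(f"{osman_by_luck:.1%}")
```

Under those assumptions the answer is a little under 8%, so with a couple of dozen squad members per club a "winning-est" fringe player is almost bound to turn up somewhere.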

Even more damning, we could point out that three of Everton's Osman inspired wins, 6-2 v Sunderland, 4-0 v Villa and 3-0 v Stoke had already been largely completed when he entered the field of play.

He played for one minute at Stoke, 16 against Villa and 12 against Sunderland, during which no further goals were scored or, to be fair, conceded. But it is a performance that manager Roberto Martinez could almost certainly have replicated had he chosen to sub himself on rather than turn to Osman. Thus making himself a prime candidate for Everton's most irreplaceable player under The Telegraph's deeply flawed methodology.

Familiarity with your data can help to avoid such faintly ridiculous conclusions, and pushing Osman into the six games or more requirement on the basis of sixty seconds plus injury time at Stoke suggests that this particular "data cruncher" wasn't familiar with theirs.

But another way to avoid publishing nonsense under the guise of stats is to spell out your conclusion without recourse to any of your data driven evidence.

"Osman's win% is 67%" sounds good, but "Based on Leon Osman's six Premier League appearances, five as a sub and one as a subbed off starter, I conclude that he is the player Everton cannot do without" should have been sufficient to spike this depressing addition to the analytics portfolio before it saw the light of day.

Tuesday 16 February 2016

Pulis' Road Block.

A Tony Pulis side was at it again on Saturday afternoon.

WBA traveled to Everton, points-wise becalmed in the high 20's following a win-less run since the first game of the New Year. It's a run that inevitably befalls lower table teams at some stage of the season and drags them gradually, in the minds of the supporters at least, into the relegation mire.

Based purely on shot data Everton routed their visitors 33 attempts to 5, 6 to 1 on target, but Pulis has never been respectful of raw counts and the Baggies won three points with a 1-0 win.

No one has played the payoff between creating few Ben Woolcock "big chances" while allowing lots of long range attempts in return more than Pulis' Premier League teams, most notably Stoke.

WBA's winner, bundled in from a couple of inches out following a near post flick on from a corner proved enough on the day to outscore 33 attempts of which around half were from outside the area and just one from inside the six yard box.

Gradually accrued superior expected goals defeated by the relative certainty of one virtually unmissable opportunity.

Back in his northern hinterland, Pulis' "stats busting" win failed to garner the press accolades that he won when lifting the manager of the season award for similar performances that rescued a slightly fallen southern giant from an imminent return to the second tier.

Stats that are beloved of the advocates of the beautiful game, possession, pass completion, open play as opposed to putting it into the mixer set play deliveries have been the natural habitat for Pulis' contrarian approach.

If there was a coveted stat associated with "playing the game the right way", a Pulis team were inevitably the ugly outlier, cutting their cloth accordingly.

Saturday's win exhibited another Pulis staple that rarely nudges the dial in one direction or another for the vast majority of Premier League teams.

15 of the 33 Everton attempts were blocked, saving Foster the trouble of making additional saves, without the inconvenience of having to deal with any wicked deflections.

Few teams stray far from the league average in terms of blocking shots, even when shot type and location are incorporated into the model. And while single seasons in which Pulis demanded his Stoke team make around 10% more blocks than expected from a location based model may not provide compelling evidence for an atypical, coach driven tactical quirk, the season on season blocking over performance for his teams perhaps does.

Even when faced with Everton's 33 attempt barrage, 15 blocks is excessive. The average Premier League team would most likely block around half that number and succeed in blocking 15 or more in only around 1 in 100 such games.
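As a back of the envelope check on that kind of claim, the sketch below treats each of the 33 attempts as having the same chance of being blocked. That flat rate (7.5 blocks per 33 shots, taken from "around half that number") is my simplification; the figure quoted above comes from a location based model, so this will not reproduce it exactly.

```python
from math import comb

def binomial_tail(successes, trials, rate):
    """Probability of at least `successes` events in `trials` independent tries."""
    return sum(
        comb(trials, k) * rate**k * (1 - rate) ** (trials - k)
        for k in range(successes, trials + 1)
    )

shots = 33
blocks = 15
# ASSUMPTION: every shot shares one league-average block rate, so that an
# average side would block about half of 15, i.e. 7.5, of these 33 attempts.
flat_block_rate = 7.5 / shots

chance = binomial_tail(blocks, shots, flat_block_rate)
print(f"P(15+ blocks from 33 shots) = {chance:.4f}")
```

A shot by shot model, with a different block probability for each attempt, would give a wider spread of outcomes than this single rate version, which is one reason the two answers need not agree.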

So we perhaps have more evidence to add blocking to Pulis' statistical reign of terror. And judging by the enthusiastic way in which Robert Huth, a graduate of the Pulis school of defensive arts, threw himself in the way of some of Arsenal's shots on Sunday, some aspects of Pulisball are gradually seeping into the mainstream Premier League playbook.

Friday 12 February 2016

Welsh Hiraeth versus English Passion.

No sport likes to invoke the seemingly unquantifiable measure of heart, passion or sheer will to "want it more" than the opposition than rugby union.

Whether it be the sight of grown locks crying at the first strains of "Mae hen wlad fy nhadau" or the emotive interplay between team and crowd in "Flower of Scotland", the importance of beginning and staying in a positive mental state is a constant aim.

It is perceived that the more concentrated the mental state of the players and team as a whole, the more positive the results achieved and following Wales' brave, but ultimately slightly disappointing comeback in drawing 16-16 after taking a late lead in Ireland, coach, Warren Gatland has demanded the side "be better emotionally".

Dr Roberts & Captain Sam gear up for pre Six Nations singing practice.

Already there's a sneaking suspicion among those more comfortable with statistical rather than emotional indicators that such demands are the product of already knowing the actual result rather than any readily identifiable stage in last week's match when Welsh passion waned and Irish fire rekindled.

Measuring emotional levels among the players will require much more sophisticated monitoring than the current GPS and a trained, if biased, seasoned eye.

However, the single rugby event when amateur psychologists and stats followers get to stare into the soul or the kicking percentages of the player is during a kick to the posts.

The slightly robotic routines of a Wilkinson or a Farrell have evolved into the free form improvisation of the "Biggarena", but the aims of all are the same: to put the kicker into a state, both emotionally and physically, to give himself the best chance of converting the kick.

Although a kick is a kick they arise through two distinct events. Either they are a conversion of a try or a penalty kick as the result of foul play.

Currently the rewards in international and major club rugby give the side an additional two points to the five already won from the try in the case of a conversion and a stand alone three points for a penalty kick from an opponent's transgression.

So already there is a perhaps major disconnect between the pressurized situations of a penalty and a conversion.

In the case of the latter, hard earned points are already on the board because of the try and numerically the value of the kick is just two rather than the three points on offer for a penalty.

Additionally, in the case of the penalty, the side has yet to add to the scoreboard despite playing well enough within the previous open play phases to force the opposition into a major infringement.

Simplistically the pressure may be conceived as being greater on the kicker when taking a penalty rather than when he is merely adding the "extras" from a conversion. Even with a choreographed comfort blanket, his actual level of performance may be lower in the higher stress levels of a penalty kick.

Using OptaPro data for kicks of both types, incorporating other variables such as kick distance, angle and pitch location, we can in addition add a variable to account for the type of kick to see if there is a statistically significant difference in success rate that is dependent upon the nature of the kick.

The data runs into five figures, ranging from club, under 20 and full international kickers. In the initial run through, a conversion is converted statistically significantly less often than a penalty once the position and angle of the attempt are accounted for.

So even though a penalty kick is more valuable than a conversion and represents the only chance in that phase of play to add to the score and therefore you would assume places more pressure on the kicker, it is these higher pressure kicks that are successful more often, even when location is accounted for.
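For anyone wanting to try this kind of test themselves, here is a minimal sketch of a kick model with an extra flag for the type of kick. It assumes a logistic regression (the model type isn't spelled out above) and it runs on SYNTHETIC kicks invented for illustration, not the OptaPro data; the "true" weights are made up, so the output says nothing about real kickers.

```python
import math
import random

random.seed(1)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# SYNTHETIC data: (rescaled distance, is_conversion flag, kick made flag).
# HYPOTHETICAL weights: intercept, distance effect, conversion effect.
TRUE_WEIGHTS = (1.2, -1.0, -0.5)
kicks = []
for _ in range(2000):
    distance = random.uniform(-1.0, 1.0)  # distance rescaled to [-1, 1]
    is_conversion = 1.0 if random.random() < 0.5 else 0.0
    p = sigmoid(TRUE_WEIGHTS[0] + TRUE_WEIGHTS[1] * distance
                + TRUE_WEIGHTS[2] * is_conversion)
    kicks.append((distance, is_conversion, 1.0 if random.random() < p else 0.0))

def fit_logistic(rows, steps=200, learning_rate=1.0):
    """Fit intercept, distance and kick-type weights by gradient ascent."""
    weights = [0.0, 0.0, 0.0]
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for distance, is_conversion, made in rows:
            features = (1.0, distance, is_conversion)
            error = made - sigmoid(sum(w * x for w, x in zip(weights, features)))
            for i, x in enumerate(features):
                grad[i] += error * x
        # Step along the average gradient of the log-likelihood.
        weights = [w + learning_rate * g / len(rows) for w, g in zip(weights, grad)]
    return weights

intercept, distance_coef, conversion_coef = fit_logistic(kicks)
print(f"distance: {distance_coef:+.2f}, conversion flag: {conversion_coef:+.2f}")
```

A negative weight on the conversion flag means conversions go over less often than penalties from the same spot. A proper stats package would also report a standard error for that weight, which is what the significance claim rests on; this sketch only recovers the size and direction.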

Time for the sports psychologists to step in if this analysis persists after this initial data dump. Does a kicker involuntarily relax into a lower performance level when he's trying to simply add points to an already advanced scoreboard?

Maybe Gatland was correct in principle and in the case of kickers particularly. They may need a Biggarena mark 2 to get deeper into the required zone following a George North try, a mass eruption of celebratory Welsh Hiraeth and the comfort zone of knowing points are already in the bag as a reward for a period of dominant play.

Welsh proof read by Rachel Taylor and Dr Ian Graham!

Monday 8 February 2016

The Shawcross Redemption.

Stoke fans are restless again.

A couple of weeks after eagerly anticipating a Wembley trip and making a push for a domestic cup double, settling for 4th place in the Premier League and regular trips to the Nou Camp, while fighting off the advances of Chelsea and Manchester United for the services of Mark Hughes, they've seen all this fade to grey.

They're now wondering how Hughes could be so inept as to neglect Stoke's defensive and attacking frailties in the January window and when he should go to avoid regular trips to the Pirelli Stadium, Burton.

Knee jerk punditry has nothing on those with an emotional and financial investment.

Top of Hughes' current rap sheet in the eyes of his previously greatest supporters is his neglect of adequate cover for the central defence.

It probably doesn't help that on the same weekend that Robert Huth (allowed to leave on a free) was visibly winning the title for Leicester, Stoke captain Ryan Shawcross was again sitting out a tame home defeat to Everton through another injury.

With Huth & Shawcross in tandem, Stoke often dispensed with the services of a keeper.

"With or without" stats are horribly blunt devices, where small win/loss samples can "prove" bit players essential to a side's well being, when they are merely coincidence with little causation.

But just as action or heat maps of the more numerous in game stats from a single match may shed some light on where a side won or lost that game, aggregating such things as the quantity and quality of chances allowed with or without a particular player may also tell us something about their impact.

Shawcross has obligingly missed about half the season through injury and the odd suspension. Here's the expected goals per 90 that Stoke have allowed when he's played and when he hasn't in the Premier League. (excuse the familiarity in the table, it was done for a Stoke fan site).

So there is tentative evidence that a player who has had his fair share of media scrutiny has become an important, if not irreplaceable part of Stoke's defence.

His most obvious attribute is his strength in the air and his ability to prevent attackers winning the aerial challenges (often, and in keeping with many defenders, by anchoring the attacker to the ground by his shirt tails).

If you look at the number and proportion of headed chances conceded when Shawcross is and isn't on the field again the contrast is marked. 20% of the total attempts come from headers in his absence of which five have resulted in goals compared to nearer 10% when he plays, with no goals conceded.

It has been suggested that the skill differential when players use their head is greater between players than when they use their best foot, so for once the fans may actually have a point when they debate that Stoke have one of the Premier League's best defensive headers of the ball, but nothing in the way of cover.

Friday 5 February 2016

Putting Your Best Foot Forward.

Finishing skill has been an acknowledged fact of football virtually forever. Strikers are never more dangerous than when they are being "clinical", "ruthless" or, for those of a certain comic strip vintage, "Dead-Shot".

Unfortunately this almost mystical ability has constantly eluded every effort to pin it down even as the data generally available becomes more extensive and plentiful.

It is relatively easy to find strikers who are under or over performing their expected goals model based on any number of shot location variables, but persistence of this trait is less obvious.

Often the "cold" player from one month/week/match/half is the same "hot" scorer from a similarly recent time frame.

The Magical Finishing Skill Aura of "Dead Shot" Keen's boots worked for Billy Dane.

Shot volume and location can usually be relied upon to produce an expected goals figure that tracks fairly well a player's actual goal tally. But expecting even a season-long over performance to extend to a subsequent season (at least with a rudimentary model) is often a forlorn hope.

Random variation or rare or unlogged events, such as deflections and defensive pressure, appear to overwhelm any attempt to observe a quality that is currently worth around 2 billion Chinese yuan.

A player may differ in finding space, receiving passes and anticipating where to be inside the box, but it is likely that the difference in finishing ability once the chance presents itself is going to be small between the elite.

Marginal gains, but also expensive mistakes if luck is purchased masquerading as a repeatable talent.

The biggest talent gap in finishing skill at the top level should lie between strikers and the rest of the outfielders.

So I looked at every shot (headers excluded) taken by every outfield player in a chance created solely from open play, which wasn't deflected, and created an expected goals model based simply on the location of the shot. Sample size well into five figures.

Unsurprisingly, the location of the attempt in this sanitized shooting competition was a significant indicator as to the likelihood of a goal being scored.

I then told the model which shots were taken by "Dead-Shot" strikers and which came from the boot of non-strikers. The expectation being that this additional variable would prove significant and improve the likelihood of the strikers scoring at the expense of their team mates who were less talented at finishing (or they would presumably be strikers themselves).

It didn't.

In this dataset, knowing that a striker had taken the shot slightly decreased the likelihood of a goal, but this effect had almost certainly arisen entirely by chance. The model couldn't see a difference in the likely outcome regardless of whether the shot came from a defender or a striker.

If there is a difference in finishing ability between Premier League outfield players in different positions, as opposed to other desirable attributes possessed by a striker, a naive shot location model can't cut through the missing variables and noise to find it.

So instead I looked for a set of Premier League shots that should/might be (much?) less likely to be scored than others and could be picked up by a simple shot location model.

Scorcher's Billy Dane aside, most players don't have magical football boots, but they do have a preference for one foot over the other. I've yet to find a penalty taker who hasn't taken all his kicks exclusively with a particular foot.

Regular penalty takers used their penalty taking foot for nearly 80% of their shots from opportunities created in open play. So you also have to think they know something about the "finishing ability" of their standing leg.

I redid the model.

Again in the model shot location was a significant variable in the outcome of the shot. But this time when I added a variable for whether the shot originated from the player's penalty or non-penalty taking foot, that too was (almost) significant.

Benchmark figure, a shot with a player's "weaker" foot reduces the chances of a goal by around 10% of the value it would have had if it had been taken with his penalty kick foot.
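As a tiny worked example of that benchmark, the sketch below applies the 10% discount to a single chance. The 0.20 expected goals starting value is a hypothetical chance picked for illustration, and the function name is mine.

```python
WEAK_FOOT_DISCOUNT = 0.10  # the "around 10%" benchmark quoted above

def weak_foot_xg(strong_foot_xg, discount=WEAK_FOOT_DISCOUNT):
    """Scale a location-based chance down for a shot with the non-penalty foot."""
    return strong_foot_xg * (1.0 - discount)

# HYPOTHETICAL chance: rated at 0.20 expected goals on the penalty taking foot.
print(round(weak_foot_xg(0.20), 3))  # 0.18
```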

Every player demonstrates finishing ability and that difference might show itself on the 20% of occasions he uses his "swinger" and hits and hopes.

Thursday 4 February 2016

"...And Then We Went To The Etihad".

Manchester City entertain surprise package Leicester in the mid day televised Premier League game on Saturday in the first of five, potentially high leverage head to head matches involving the current top four teams between now and May.

It is unusual to have four teams in genuine contention for the title with just 140 matches remaining, so although the outcome of the early kick off will move the dial it won't be as dramatic as if there were fewer title hopefuls.

The current market odds favour Manchester City followed by Arsenal, the respective second and third favourites in the preseason. So August liabilities may be still skewing the market's February estimation of either lifting the title.

By contrast, Tottenham and Leicester were available respectively at triple and quadruple digit odds.

Numbers are oblivious to any monetary balancing of the books and even the fluctuating levels of future performance that a high profile manager in waiting may inspire. They simply rise or fall as the matches are played out.

Not so very long ago, Leicester were just Championship FA Cup cannon fodder for the Premier League Big Boys.

Manchester City has averaged 1.83 expected goals per game and allowed 1.09 in the season so far compared to Leicester's 1.58 and 1.21 respectively, which gives the hosts a 53% chance of winning, 23% the draw and 24% the visiting Foxes.

The market is more bullish about the hosts (five Premier League losses so far) beating the twice defeated upstarts. It puts Manchester City's chances at nearer 60%.

There will be around 20 minutes to digest the result from the Etihad before the probabilistic projections of Spurs entertaining Watford and Sunday's trip to Bournemouth by Arsenal begin to turn into real points.

There'll also be ample time for the North London fan base to root for the best case scenario for their respective sides in the early game.

So how will the three possible outcomes alter, not only the title chances of the two Citys, but also those of Arsenal and Spurs?

How a Manchester City win might change the title odds at 3 o'clock on Saturday Feb. 6th.

How a draw might change the title odds.

How a Leicester win might change the title odds.

Obviously a win is the best possible outcome for either Manchester City or Leicester.

The host would draw level with their visitors with a win, the most likely outcome. Viewed purely in terms of the relative strengths and remaining schedule of the four challengers, Manchester City's likelihood of winning the title would remain below 50%, although in a potentially skewed market they are likely to move to odds on.

A Manchester City win is also marginally the worst outcome for Arsenal.

Spurs can root for a Man City win or a draw. Although the latter would turn their Valentine's Day game at the Etihad into a high leverage game.

A Leicester win would eat into the chances of each of their three competitors, particularly Manchester City's.

Even so, their inferior underlying defensive and attacking expected goals mean that a six point lead would still be insufficient: a title win by someone other than the Foxes would remain the most likely outcome come 3 o'clock on Saturday.

Monday 1 February 2016

Using Excel To Simulate Villa's Demise.

In the previous post, I described a simple method to use expected or real goals to estimate the average number of goals each team might score and allow in a single game at a certain venue and hence derive the win/draw/loss percentages for the game via a Poisson.

It's a handy trick, particularly if you want a method to frame your own match odds and compare them to the market. But the goal ratings can also be used to create passable odds for games that are due to be played over the remainder of the season.

The table above shows the home/draw/away odds for the final weekend of the season using team ratings from the first 230 matches of the season, expressed in expected goals.

It is likely that the abilities of the 20 Premier League teams will change over the remaining 150 matches, but often the change is gradual. Regression towards the mean may be used along with season to date trends to extrapolate each side's future ratings. But on this occasion the ratings from week 23 have simply been used throughout.

To download the estimated home win/draw/away win probabilities for the remainder of the 2015/16 Premier League season just click on the download icon above.

There are two worksheets. One with match odds, both home and away and a second which lists win/ draw (and loss) odds for each team's final 15 games.

We've now got the available ammunition to simulate the range of points that might be won by each of the 20 sides and eventually join up all the interconnected results in each iteration of a season to project final league positions.

But first we'll just use Excel to simulate the range of final points a side might expect to get based on these match probabilities.

Here's Villa's final 15 games with their predicted win% in column D. In column G subtract their predicted draw probability from 1 and drag this formula down to G16.

Insert a random number in column H and again drag down to H16.

We need two columns. One for three points should Villa win and one for a single point should they draw. A win is assumed if the random number is less than the corresponding win probability in column D.

We've subtracted the draw probability from one in column G. So a draw is assumed in proportion to its likelihood if the random number is greater than 1 minus the draw probability. We've also ensured that we don't get a win and a draw in the same game.

Now add up all the points won from wins and draws in Villa's final 15 games: =SUM(I2:J16)

Now we need the data table/What if to run the simulation, in this case 1,000 times. Count column L up from 1 to 1,000 and paste K16, the total points won by Villa from our projected odds, into M1.

Select M1000 to L1. Click "What if", then Data Table, then Column input cell, then select an empty cell, K1 in this case. Click "OK" and the simulated points for Villa will auto fill into column M.

For a step by step screen grab for this stage refer back to this post.

Add the points Villa currently have to each iteration. With 15 games left it was 13. I've done this in column N. And then use =Countif($N$1:$N$1000,Q14) to sum the number of iterations from the 1,000 (or more) you've run to see Villa's most likely final points total.

It's 26, which is also around the mid point of the current quote on the various spread betting sites.
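For those who would rather script it than drag formulas, the spreadsheet steps above can be sketched in a few lines of Python. The win and draw probabilities below are HYPOTHETICAL placeholders (one flat pair repeated 15 times), not the figures from the downloadable worksheet, so the output is not meant to reproduce the total above; only the 13 points already banked comes from the post.

```python
import random
from collections import Counter

random.seed(2016)

# HYPOTHETICAL (win, draw) probabilities for the 15 remaining games.
# Swap in the real per-match figures from the worksheet.
fixtures = [(0.20, 0.25)] * 15
CURRENT_POINTS = 13  # Villa's total with 15 games left, as stated above

def simulate_points(fixtures):
    """One pass through the fixtures, mirroring the spreadsheet logic."""
    points = 0
    for win_p, draw_p in fixtures:
        r = random.random()
        if r < win_p:            # random number below the win probability
            points += 3
        elif r > 1.0 - draw_p:   # random number above 1 minus the draw probability
            points += 1
    return points

# 1,000 iterations, like the data table, tallied like the COUNTIF column.
totals = Counter(CURRENT_POINTS + simulate_points(fixtures) for _ in range(1000))
most_likely, count = totals.most_common(1)[0]
print(f"most common final total: {most_likely} ({count} of 1000 runs)")
```

Using one random number per game for both the win and the draw test is what guarantees a match can't be counted as both, the same trick as in the spreadsheet.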

Next time I might get around to simulating league positions in Excel, GD tie breakers and all that.

How To Frame An Individual Match Outcome.

A simple method to frame your own match odds using historical goal or expected goal data. We'll look at Sunderland's upcoming home game with Manchester City. City unsurprisingly are strongly favoured.

Here's what you need.

1) The average number of goals or expected goals scored by the home and away teams in the competition.

So you can take data from this season or last season or a weighted average of a number of seasons. Your choice, you can validate your model against out of sample games later to see what works best.

2) The average number of goals or expected goals scored and allowed by Man City and Sunderland. Again time frame is up to you. I don't differentiate between home and away goals, that comes later. Why would you want to chuck half your data away or risk over fitting a "home or away specialist"?

Also the team figures haven't been regressed by adding a proportion of league average. We're just looking at the basic process here.

That's it.

Here are some representative figures. Home teams are scoring 0.25 goals per game more than visitors, 1.49 compared to 1.24. The average game has 1.37 expected goals per team. (Basically just the mean of the first two figures).

Sunderland are scoring few and allowing lots. Vice versa for City.

We want to find Sunderland's average expected goals at home against Man C. So these figures are more usefully expressed as rates.

Sunderland score 1.09/1.37 or 0.79 times the rate of scoring in the competition.

Man C allow 1.16/1.37 or 0.85 times the rate of conceding in the competition.

Sunderland are at home and home teams score 1.49/1.37 or 1.09 times the average rate for this competition.

Multiply these three rates together 0.79*0.85*1.09 = 0.73

Sunderland are likely to score at 0.73 times the league average number of goals at home to City. The league average expected goals for the competition is 1.37 goals.

So in terms of expected goals Sunderland might average 0.73*1.37 = 1.00 expected goals.

Do the same for City.

City score 1.92/1.37 = 1.40 times league average.

Sunderland allow 1.91/1.37 = 1.39 times league average.

Away teams score 1.24/1.37 = 0.91 times league average.

Man C are likely to score 1.40*1.39*0.91*1.37 expected goals = 2.43 expected goals.
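The arithmetic above can be wrapped up in one small function. This is just a sketch of the same three rates multiplied together, using the figures quoted in the text; the function and variable names are mine.

```python
LEAGUE_HOME = 1.49  # average expected goals for home teams
LEAGUE_AWAY = 1.24  # average expected goals for away teams
LEAGUE_AVG = 1.37   # average expected goals per team per game, as quoted

def expected_goals(attack, opp_defence, venue_avg, league_avg=LEAGUE_AVG):
    """Multiply attack, opposition defence and venue rates, then rescale."""
    attack_rate = attack / league_avg        # how much the side scores vs average
    defence_rate = opp_defence / league_avg  # how much the opponent allows vs average
    venue_rate = venue_avg / league_avg      # home or away adjustment
    return attack_rate * defence_rate * venue_rate * league_avg

sunderland_xg = expected_goals(1.09, 1.16, LEAGUE_HOME)  # Sunderland at home
city_xg = expected_goals(1.92, 1.91, LEAGUE_AWAY)        # Man C away
print(round(sunderland_xg, 2), round(city_xg, 2))  # 1.0 2.42
```

City come out a shade under the 2.43 in the text because the text rounds each rate to two decimal places before multiplying, whereas the function carries full precision through.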

So Sunderland have an expected goals average of 1.00 goals and Man C has 2.43 expected goals. We're in Poisson territory now and a plain, non-tweaked Poisson gives the following match predictions.

Compared to the current Oddschecker % of 13% Sunderland, 21% the draw and 67% Man C.
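The plain Poisson step can be sketched as below. It assumes the two sides' goal counts are independent Poisson variables and cuts the score grid off at 10 goals each, both of which are simplifications of mine; the means are the 1.00 and 2.43 derived above.

```python
from math import exp, factorial

def poisson_pmf(goals, mean):
    """Probability of exactly `goals` goals for a side averaging `mean`."""
    return exp(-mean) * mean**goals / factorial(goals)

def match_probabilities(home_mean, away_mean, max_goals=10):
    """Home win, draw and away win chances from two independent Poissons."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_mean) * poisson_pmf(a, away_mean)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    total = home_win + draw + away_win  # renormalise the truncated grid
    return home_win / total, draw / total, away_win / total

home, draw, away = match_probabilities(1.00, 2.43)
print(f"Sunderland {home:.0%}, draw {draw:.0%}, Man C {away:.0%}")
```

A non-tweaked Poisson like this tends to be a little light on draws, which is one of the usual reasons for adjusting it before comparing with the market.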