Friday, 5 February 2016

Putting Your Best Foot Forward.

Finishing skill has been an acknowledged fact of football virtually forever. Strikers are never more dangerous than when they are being "clinical", "ruthless" or, for those of a certain comic strip vintage, "Dead-Shot".

Unfortunately this almost mystical ability has constantly eluded every effort to pin it down even as the data generally available becomes more extensive and plentiful.

It is relatively easy to find strikers who are under or over performing their expected goals model based on any number of shot location variables, but persistence of this trait is less obvious.

Often the "cold" player from one month/week/match/half is the same "hot" scorer from a similarly recent time frame.

   The Magical Finishing Skill Aura of "Dead Shot" Keen's boots worked for Billy Dane. 

Shot volume and location can usually be relied upon to produce an expected goals figure that tracks fairly well a player's actual goal tally. But expecting even a season-long over performance to extend to a subsequent season (at least with a rudimentary model) is often a forlorn hope.

Random variation, or rare or unlogged events such as deflections and defensive pressure, appear to overwhelm any attempt to observe a quality that is currently worth around 2 billion Chinese yuan.

A player may differ in finding space, receiving passes and anticipating where to be inside the box, but it is likely that the difference in finishing ability once the chance presents itself is going to be small between the elite.

Marginal gains, but also expensive mistakes if luck is purchased masquerading as a repeatable talent.

The biggest talent gap in finishing skill at the top level should lie between strikers and the rest of the outfielders.

So I looked at every shot (headers excluded) taken by every outfield player from a chance that was created solely in open play and wasn't deflected, and built an expected goals model based simply on the location of the shot. The sample size was well into five figures.

Unsurprisingly, the location of the attempt in this sanitized shooting competition was a significant indicator as to the likelihood of a goal being scored.

I then told the model which shots were taken by "Dead-Shot" strikers and which came from the boot of non-strikers. The expectation being that this additional variable would prove significant and improve the likelihood of the strikers scoring at the expense of their team mates who were less talented at finishing (or they would presumably be strikers themselves).

It didn't.

In this dataset, knowing that a striker had taken the shot slightly decreased the likelihood of a goal, but this effect had almost certainly arisen entirely by chance. The model couldn't see a difference in the likely outcome regardless of whether the shot came from a defender or a striker.

If there is a difference in finishing ability between Premier League outfield players in different positions, as opposed to other desirable attributes possessed by a striker, a naive shot location model can't cut through the missing variables and noise to find it.
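The test described above can be pictured as a comparison of two nested models: one that knows only where the shot was taken from, and one that also knows whether a striker took it. The sketch below is purely illustrative. It assumes a logistic regression (the post does not say which model form was used), uses distance from goal as a stand-in for "location", and runs on synthetic shots in which, by construction, the striker flag has no effect. All names and numbers in it are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def goal_probability(weights, row):
    """Logistic link: turn a weighted sum of features into a probability."""
    z = sum(w * v for w, v in zip(weights, row))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, goals, iterations=25):
    """Newton-Raphson fit. Each row starts with 1.0 for the intercept.
    Returns (coefficients, maximised log-likelihood)."""
    k = len(rows[0])
    weights = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, y in zip(rows, goals):
            p = goal_probability(weights, row)
            for i in range(k):
                grad[i] += (y - p) * row[i]
                for j in range(k):
                    hess[i][j] += p * (1.0 - p) * row[i] * row[j]
        step = solve(hess, grad)
        weights = [w + s for w, s in zip(weights, step)]
    loglik = 0.0
    for row, y in zip(rows, goals):
        p = goal_probability(weights, row)
        loglik += math.log(p if y else 1.0 - p)
    return weights, loglik


def compare_models(n_shots=2000, seed=1):
    """Fit location-only and location+striker models to SYNTHETIC shots."""
    rng = random.Random(seed)
    rows, goals = [], []
    for _ in range(n_shots):
        distance = rng.uniform(0.6, 3.0)  # tens of yards, made up
        striker = 1.0 if rng.random() < 0.4 else 0.0
        # Assumption: only distance drives the outcome in this toy data.
        p_goal = 1.0 / (1.0 + math.exp(-(0.5 - 1.5 * distance)))
        rows.append([1.0, distance, striker])
        goals.append(1 if rng.random() < p_goal else 0)
    base_w, base_ll = fit_logistic([r[:2] for r in rows], goals)
    full_w, full_ll = fit_logistic(rows, goals)
    # Likelihood-ratio test with one extra parameter (chi-square, 1 df).
    lr_stat = max(0.0, 2.0 * (full_ll - base_ll))
    p_value = math.erfc(math.sqrt(lr_stat / 2.0))
    return {
        "distance_coef": base_w[1],
        "striker_coef": full_w[2],
        "base_loglik": base_ll,
        "full_loglik": full_ll,
        "p_value": p_value,
    }


result = compare_models()
print(result)
```

A large p-value here would correspond to the situation in the post: adding the striker flag does not improve the fit by more than chance would allow.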

So instead I looked for a set of Premier League shots that should/might be (much?) less likely to be scored than others and could be picked up by a simple shot location model.

Scorcher's Billy Dane aside, most players don't have magical football boots, but they do have a preference for one foot over the other. I've yet to find a penalty taker who hasn't taken all his kicks exclusively with a particular foot.

Regular penalty takers used their penalty taking foot for nearly 80% of their shots from opportunities created in open play. So you also have to think they know something about the "finishing ability" of their standing leg.

I redid the model.

Again in the model shot location was a significant variable in the outcome of the shot. But this time when I added a variable for whether the shot originated from the player's penalty or non-penalty taking foot, that too was (almost) significant.

As a benchmark figure, a shot with a player's "weaker" foot reduces the chance of a goal by around 10% of the value it would have had if it had been taken with his penalty kick foot.
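To make the size of that benchmark concrete, here is a tiny sketch of the arithmetic. It treats the 10% as a relative reduction rather than ten percentage points, which is how the sentence above reads. The function name and the example chance value are hypothetical.

```python
def weaker_foot_xg(penalty_foot_xg, relative_reduction=0.10):
    """Scale a chance's goal probability down for a weaker-foot attempt.

    relative_reduction is the rough benchmark from the text, applied as a
    proportion of the penalty-foot value, not as percentage points.
    """
    return penalty_foot_xg * (1.0 - relative_reduction)


# Hypothetical chance: worth 0.20 on the favoured foot.
print(weaker_foot_xg(0.20))
```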

Every player demonstrates finishing ability and that difference might show itself on the 20% of occasions he uses his "swinger" and hits and hopes.

Thursday, 4 February 2016

"...And Then We Went To The Etihad".

Manchester City entertain surprise package Leicester in the mid day televised Premier League game on Saturday in the first of five, potentially high leverage head to head matches involving the current top four teams between now and May.

It is unusual to have four teams in genuine contention for the title with just 140 matches remaining, so although the outcome of the early kick off will move the dial it won't be as dramatic as if there were fewer title hopefuls.

The current market odds favour Manchester City followed by Arsenal, the respective second and third favourites in the preseason. So August liabilities may be still skewing the market's February estimation of either lifting the title.

By contrast, Tottenham and Leicester were available at triple and quadruple digit odds respectively.

Numbers are oblivious to any monetary balancing of the books and even the fluctuating levels of future performance that a high profile manager in waiting may inspire. They simply rise or fall as the matches are played out.

Not so very long ago, Leicester were just Championship FA Cup cannon fodder for the Premier League Big Boys.

Manchester City has averaged 1.83 expected goals per game and allowed 1.09 in the season so far compared to Leicester's 1.58 and 1.21 respectively, which gives the hosts a 53% chance of winning, 23% the draw and 24% the visiting Foxes.
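For readers who want to see roughly how expected goals figures turn into match odds, here is a minimal sketch using two independent Poisson distributions. It is not the exact method behind the percentages above: the way the attack and defence figures are combined, the league average of 1.35 goals per team per game, and the absence of any venue adjustment are all assumptions made for illustration, so the output should not be expected to match the quoted 53/23/24 split.

```python
import math


def poisson_pmf(goals, mean):
    """Probability of scoring exactly `goals` when the average is `mean`."""
    return math.exp(-mean) * mean ** goals / math.factorial(goals)


def match_probabilities(home_mean, away_mean, max_goals=10):
    """Return (home win, draw, away win) from two independent Poissons."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_mean) * poisson_pmf(a, away_mean)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    # Renormalise for the tiny probability mass beyond max_goals.
    total = home_win + draw + away_win
    return home_win / total, draw / total, away_win / total


def scoring_rate(attack, opponent_defence, league_average):
    """Hypothetical way to blend one side's attack with the other's defence."""
    return attack * opponent_defence / league_average


LEAGUE_AVERAGE = 1.35  # assumed figure, not taken from the post

city_mean = scoring_rate(1.83, 1.21, LEAGUE_AVERAGE)
leicester_mean = scoring_rate(1.58, 1.09, LEAGUE_AVERAGE)
print(match_probabilities(city_mean, leicester_mean))
```

Adding a home advantage factor to the host's scoring rate would be the natural next refinement.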

The market is more bullish about the hosts (five Premier League losses so far) beating the twice defeated upstarts. It puts Manchester City's chances at nearer 60%.

There will be around 20 minutes to digest the result from the Etihad before the probabilistic projections of Spurs entertaining Watford and Sunday's trip to Bournemouth by Arsenal begin to turn into real points.

There'll also be ample time for the North London fan base to root for the best case scenario for their respective sides in the early game.

So how will the three possible outcomes alter, not only the title chances of the two Citys, but also those of Arsenal and Spurs?

How a Manchester City win might change the title odds at 3 o'clock on Saturday Feb. 6th.

How a draw might change the title odds.

How a Leicester win might change the title odds.

Obviously a win is the best possible outcome for either Manchester City or Leicester.

The hosts would draw level with their visitors with a win, the most likely outcome. Viewed purely in terms of the relative strengths and remaining schedule of the four challengers, Manchester City's likelihood of winning the title would remain below 50%, although in a potentially skewed market they are likely to move to odds on.

A Manchester City win is also marginally the worst outcome for Arsenal.

Spurs can root for a Man City win or a draw, although the latter would turn their Valentine's Day game at the Etihad into a high leverage game.

A Leicester win would eat into the chances of each of their three competitors, particularly Manchester City's.

Even then, Leicester's inferior underlying attacking and defensive expected goals mean that a six point lead would not be enough to make them favourites. A title win by someone other than the Foxes would still be the most likely outcome come 3 o'clock on Saturday.

Monday, 1 February 2016

Using Excel To Simulate Villa's Demise.

In the previous post, I described a simple method to use expected or real goals to estimate the average number of goals each team might score and allow in a single game at a certain venue and hence derive the win/draw/loss percentages for the game via a Poisson distribution.

It's a handy trick, particularly if you want a method to frame your own match odds and compare them to the market. But the goal ratings can also be used to create passable odds for games that are due to be played over the remainder of the season.

The table above shows the home/draw/away odds for the final weekend of the season using team ratings from the first 230 matches of the season, expressed in expected goals.

It is likely that the abilities of the 20 Premier League teams will change over the remaining 150 matches, but often the change is gradual. Regression towards the mean may be used along with season to date trends to extrapolate each side's future ratings. But on this occasion the ratings from week 23 have simply been used throughout.

To download the estimated home win/draw/away win probabilities for the remainder of the 2015/16 Premier League season just click on the download icon above.

There are two worksheets: one with match odds, both home and away, and a second which lists win/draw (and loss) odds for each team's final 15 games.

We've now got the available ammunition to simulate the range of points that might be won by each of the 20 sides and eventually join up all the interconnected results in each iteration of a season to project final league positions.

But first we'll just use Excel to simulate the range of final points a side might expect to get based on these match probabilities.

Here's Villa's final 15 games with their predicted win% in column D. In column G, subtract their predicted draw probability from 1 and drag this formula down to G16.

Insert a random number in column H and again drag down to H16.

We need two columns. One for three points should Villa win and one for a single point should they draw. A win is assumed if the random number is less than the corresponding win probability in column D.

We've subtracted the draw probability from one in column G. So a draw is assumed in proportion to its likelihood if the random number is greater than 1 minus the draw probability. Because the win and draw probabilities never add up to more than one, this also ensures that we don't get a win and a draw in the same game.

Now add up all the points won from wins and draws in Villa's final 15 games: =SUM(I2:J16)

Now we need the data table/What if to run the simulation, in this case 1,000 times. Count column L up from 1 to 1,000 and paste K16, the total points won by Villa from our projected odds, into M1.
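For anyone who would rather see the spreadsheet logic written out, the sketch below mirrors one pass through the columns: a single random number per game, a win if it falls below the win probability, a draw if it falls above one minus the draw probability, and the points summed at the end. The fixture probabilities are placeholders, not the figures from the downloadable worksheet, and the function name is hypothetical.

```python
import random


def simulate_remaining_points(fixtures, rng=random):
    """Play out each remaining game once and return the points won.

    fixtures is a list of (win_probability, draw_probability) pairs.
    """
    points = 0
    for win_p, draw_p in fixtures:
        r = rng.random()  # the column H random number
        if r < win_p:
            points += 3  # win column
        elif r > 1.0 - draw_p:
            points += 1  # draw column
        # anything in between is a defeat and earns nothing
    return points


# Placeholder odds for 15 games; swap in the real win/draw percentages.
placeholder_fixtures = [(0.25, 0.27)] * 15
print(simulate_remaining_points(placeholder_fixtures))
```

Using one random number for both checks is what keeps a win and a draw from being awarded in the same game.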

Select M1000 to L1. Click "What if", then Data Table, then Column input cell, then select an empty cell, K1 in this case. Click "OK" and the simulated points for Villa will auto fill into column M.

For a step by step screen grab for this stage refer back to this post.

Add the points Villa currently have to each iteration. With 15 games left it was 13. I've done this in column N. Then use =COUNTIF($N$1:$N$1000,Q14) to count how many of the 1,000 (or more) iterations you've run end on each points total, which shows Villa's most likely final points total.
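The data table and COUNTIF steps amount to repeating the 15 game simulation many times and tallying the totals. A rough Python equivalent is sketched below. The 13 points already banked comes from the text, but the match probabilities are placeholders rather than the worksheet values, so the most common total it prints is not a reproduction of the spreadsheet result. Function and variable names are hypothetical.

```python
import random
from collections import Counter


def final_points_distribution(fixtures, current_points,
                              iterations=1000, seed=None):
    """Tally simulated final points totals over many replayed run-ins.

    fixtures is a list of (win_probability, draw_probability) pairs.
    """
    rng = random.Random(seed)
    tally = Counter()
    for _ in range(iterations):
        points = current_points
        for win_p, draw_p in fixtures:
            r = rng.random()
            if r < win_p:
                points += 3
            elif r > 1.0 - draw_p:
                points += 1
        tally[points] += 1  # the COUNTIF step, done as we go
    return tally


# Placeholder odds for the final 15 games; replace with the real ones.
placeholder_fixtures = [(0.25, 0.27)] * 15
distribution = final_points_distribution(placeholder_fixtures, 13, seed=2016)
most_likely_total, count = distribution.most_common(1)[0]
print(most_likely_total, count)
```

Raising `iterations` smooths the tally, in the same way that extending the data table beyond 1,000 rows would.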

It's 26, which is also around the mid point of the current quote on the various spread betting sites.

Next time I might get around to simulating league positions in Excel, GD tie breakers and all that.