Pages

Thursday, 31 March 2016

Adding Up Expected Goals Models, One Bit At A Time.

So we'll start off with a really basic shot location model that just uses the shot's distance in yards from the centre of the goal as the single independent variable, with goal/no goal as the dependent variable. We don't include headers, just shots, so there's no need for a distinguishing shot type variable.

It's going to contain some useful information about the likelihood of scoring (wider = worse), but you'd be surprised if it were the definitive example of the art of expected goals modelling.

You run your regression and horizontal distance is a statistically significant term. This gives you an equation to use on out of sample data and, as in the previous post, we can compare cumulative expected goals with reality.

Breaking down the out of sample shots into groups of 364 by increasing scoring likelihood from the perspective of your regression, we get some good reality checks and some not so good.
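A minimal sketch of this one-variable setup, using simulated shots in place of the post's Premier League data (the coefficients and sample size here are illustrative assumptions, not the post's actual figures):

```python
# Single-variable expected goals model: horizontal distance in yards from the
# centre of the goal as the lone predictor, goal/no goal as the outcome.
# The data is synthetic -- only the model structure mirrors the post.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

n = 5000
wide = rng.uniform(0, 20, n)                        # yards wide of the goal's centre
p_true = 1 / (1 + np.exp(-(-1.0 - 0.15 * wide)))    # wider = worse, as in the post
goal = rng.binomial(1, p_true)

model = LogisticRegression()
model.fit(wide.reshape(-1, 1), goal)

# Expected goals for a fresh, out-of-sample shot taken 5 yards wide of centre
xg = model.predict_proba(np.array([[5.0]]))[0, 1]
print(round(xg, 3))
```

The fitted coefficient on `wide` comes out negative, matching the "wider = worse" intuition.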

Actual Goals v Predicted Goals in Bins of Increasing Likelihood of Scoring, 364 Shots per Bin.


A curate's egg. Good in parts, starts well, then goes awry, has a bull's eye mid table, then peters off again.


The plot of expected versus actual goals looks half decent, especially considering that we've only used one, slightly obscure independent variable. R^2, another favourite, is also impressively high.

But if we run a few tests on the observed number of goals compared to the predicted number, along with the respective number of non goals for each bin, we find that there's only a 6% chance that the differences between the groups of observations and predictions have arisen by chance alone.

This is just above the generally used threshold for statistical significance, but for the model to be a good fit we ideally want a large probability that there is no difference between the actual results and the model's predictions.

It'd be great if we had say a 50% chance that the differences in the results had arisen by chance, but for this model we just have a 6% level.
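The post doesn't name its exact test, but a chi-square goodness-of-fit over the goals and non-goals in each bin (the Hosmer-Lemeshow approach works the same way) produces this kind of p value. The counts below are made up for illustration:

```python
# Bin-by-bin check: compare observed vs predicted goals AND non-goals in each
# of ten equal-sized bins with a Pearson chi-square statistic.
# The counts here are illustrative, not the post's actual data.
import numpy as np
from scipy.stats import chi2

observed_goals  = np.array([ 6, 10, 20, 25, 33, 40, 52, 60, 75, 90])
predicted_goals = np.array([ 5, 12, 14, 27, 34, 43, 50, 63, 70, 92])
shots_per_bin = 364

obs_non = shots_per_bin - observed_goals
exp_non = shots_per_bin - predicted_goals

# Sum the (O - E)^2 / E terms over both outcomes in every bin
stat = (((observed_goals - predicted_goals) ** 2 / predicted_goals)
        + ((obs_non - exp_non) ** 2 / exp_non)).sum()

# Hosmer-Lemeshow convention: degrees of freedom = number of bins - 2
p_value = chi2.sf(stat, df=len(observed_goals) - 2)
print(round(p_value, 3))
```

A large p value (the post's ideal "say 50%") means the bin-level discrepancies are comfortably explainable by chance; a small one, like the 6% here, is a warning sign.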

So let's add the y co-ordinate's missing twin, the distance from goal.


Scores look a little better, a couple more near misses and fewer bins that are way wide of the mark. R^2 on the plot's up to 0.99.

If we compare the observed to the predictions, it's now nearly a 10% chance that the differences are just down to chance. In other words, the likelihood that the model is a good predictor of reality, and that the differences are mere chance, has increased by adding some more information to the model.

If we just used the x co-ordinate rather than the y, the 6% crept up to 7%. So we can perhaps conclude that horizontal distance builds you a model, vertical distance alone improves slightly on it and both inputs together improve it still further.

Finally, let's throw even more independent variables into the mix. We'll include x and y, as well as an interaction term to see the effect of taking shots from wider and further out or closer and more centrally. This model also has information on the strength of the shot and, as deflections play such a major role in confounding keepers, I've used that as an input also.
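That fuller specification can be sketched with a regression formula. The variable names and simulated data below are mine; only the structure (x, y, an x:y interaction, shot power and a deflection flag) mirrors the post:

```python
# Fuller expected goals specification: x, y, their interaction, shot power
# and a deflection flag. Data is simulated; coefficients are assumptions.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 4000
df = pd.DataFrame({
    "x": rng.uniform(0, 35, n),            # yards out from the goal line
    "y": rng.uniform(0, 25, n),            # yards from the centre of the goal
    "power": rng.uniform(0, 1, n),         # shot strength, scaled 0-1
    "deflected": rng.binomial(1, 0.1, n),  # 1 if the shot took a deflection
})
logit = (-0.5 - 0.08 * df.x - 0.07 * df.y + 0.001 * df.x * df.y
         + 0.8 * df.power + 0.7 * df.deflected)
df["goal"] = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# "x * y" expands to x + y + x:y, so the interaction term captures shots
# that are both wide AND distant
fit = smf.logit("goal ~ x * y + power + deflected", data=df).fit(disp=0)
print(fit.params.round(3))
```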


This looks the best of the bunch so far. Observed and predicted increase more or less hand in hand and over half of the bins could be considered as virtual twins. Cumulative observed and predicted totals match exactly and the R^2 is again 0.99.

Perhaps most tellingly, paired comparisons of the differences between the actual and expected goals and non-goals for each bin are highly likely to have arisen just by chance. The p value is around 0.8. So it is highly likely that the differences we see are just chance rather than a poorly fitting model, especially when compared to the 10% levels and below for the less populated models.

At the very least, binning your predictions from an expected goals model, comparing them in an out of sample exam and eyeballing the results in the type of tables above might tell you if you've inadvertently "smoothed out" and hidden your model's flaws in more usually quoted certificates of calibration.

Wednesday, 30 March 2016

Good Model, Bad Model.

This may be my last expected goals post for a while, so I thought I'd look at a couple of slightly flawed ways to try to estimate how good your exp gls model might be.

I prefer to measure the usefulness of a model by how it performs on data which it hasn't previously seen, rather than attempting to see how well it describes outcomes that have been used in the construction of the model.

Overfitting to inputs that are unique to the training set may give the illusion of usefulness, but the model may subsequently perform poorly when applied to new observations, and while description is a handy attribute for a model, prediction is arguably much more useful.

We'll need a model to play with.

So I've constructed the simplest of exp goals models using just x, y co-ordinates and shot type (foot or header), and I've also restricted attempts to chances from open play. The dependent variable is whether or not a goal was scored.

We have three independent variables, x, y and type, from which we can produce a prediction for the probability that a particular attempt will result in a goal.

The training data uses Premier League attempts from one season and the regression is then let loose on four months' worth of data from the following season.

Initially the model seems promising. 369 open play goals were scored in the new batch of 4,000+ shots, and applying the previous season's regression to the shot type and position of these attempts gave a cumulative prediction of 368 goals.

Splitting the data into ten equal batches of shots, ranging from the attempts that were predicted to have the lowest chance of resulting in a goal up to the highest gives an equally comforting plot.
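The binning step can be sketched like this: sort the out-of-sample shots by predicted probability, cut them into ten equal batches, then total predicted and actual goals in each. The probabilities below are simulated stand-ins for a real model's output:

```python
# Calibration table: ten equal-sized bins of shots, ordered from the lowest
# predicted chance of a goal to the highest. Probabilities are simulated.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
n_shots = 4010                       # roughly the post's "4,000+ shots"
xg = rng.beta(1.2, 10, n_shots)      # each attempt's predicted P(goal)
goal = rng.binomial(1, xg)           # out-of-sample outcomes

shots = pd.DataFrame({"xg": xg, "goal": goal}).sort_values("xg")
shots["bin"] = np.repeat(np.arange(10), n_shots // 10)  # 401 shots per bin

table = shots.groupby("bin").agg(expected=("xg", "sum"),
                                 actual=("goal", "sum"))
print(table.round(1))
```

Because the shots are sorted before binning, the expected column rises steadily from bin to bin; the check is whether the actual column keeps pace with it.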

The relationship between the cumulative expected number of goals in each bin and the reality on the pitch is strong.


Everything looks fine, big R^2, near perfect agreement with expected and actual totals. So we have a great model?

However, if we look at the individual bins, it's not quite as clear cut.

The bin our model rated third least likely to contain goals had around 400 attempts and around 12 predicted goals, compared to the 20 that were actually scored. The sixth bin had 26 expected and 21 actual.

Nothing too alarming perhaps. We shouldn't expect prediction to perfectly match reality, but when there is divergence we should see if it is extreme enough to suggest that the groups are substantially different or simply the product of chance.

To test this I took the expected and actual figures for goals and non-goals for all ten groups, comprising 401 attempts per bin, and saw if the differences were statistically significant or just down to chance.

Unfortunately for this model there was only a 2% chance that the range of discrepancies between the actual numbers of goals and non goals and the predicted numbers in each of the ten bins from the model had arisen by chance.

Bottom line, the model was probably a poor fit when used for future data despite it ticking a few initial boxes.

Next, I divided the data into shots and headers and ran a regression on the training set to produce two separate models. One for shots that simply used x, y co-ordinates as the independent variables and similarly for headers.

Both of these new models passed the first two tests by predicting goal totals in the out of sample data that agreed very well with the actual number of goals scored and plotted well when binned into steadily rising scoring probabilities.

However, unlike the composite model that used shot type as an independent variable, both of these models produced binned expected and actual differences that could, statistically, be attributed to just chance.

The model for headers in particular had differences between the model's prediction and reality that were highly likely to have been simply down to random chance rather than a poorly fitting model.

There are problems with this approach, not least sample size, but it may be an additional check when exp gls models are released on new data.

Sunday, 6 March 2016

"Martinez Blows Most Dangerous of Leads"!

Human interpretation of sporting events is often awash with cognitive biases.

It is virtually impossible to become totally free of these irrational deviations in matches where you have a vested emotional interest, but even numbers based assessments are sometimes likely to fall foul of such traps.

Many biases of this kind exist, ranging notably, but not exclusively, from outcome bias, which overvalues the result compared to the thought process behind the original decision, to recency bias, which overvalues newly acquired, often memorable outcomes, to selection bias, which limits the data to support a preconceived viewpoint.

The BBC's @philmcnulty would appear an ideal account to follow for those who prefer their information liberally dosed with irrationality.

Fresh from providing a live score service from North London, Phil was tweeting his 364k followers about Everton's soul crushing capitulation from 2-0 up at home to analytics' bête noire, West Ham.



The tweet certainly struck a chord with Everton fans who were licking fresh wounds and enthusiastically retweeted the sentiment, along with a few dozen gloating Reds.

As if to reiterate that this was a measured appraisal of Martinez's regular inability to steer the side to their just rewards from 2-0 up, Phil later tweeted this.


So a late playing of the rationality card, "facts" were behind the initial tweet.

Since Roberto Martinez took over at Everton they have raced to 33 2-0 leads in all competitions. Of those 33 matches, Martinez has, if each game were treated as a league match, amassed a total of 90 points from a possible 99 or 91% of the possible maximum.

The only defeat I can find was yesterday against West Ham. However, prior to that they let a 2-0 lead slip when only drawing 3-3 away at Chelsea in January and had done the same at Bournemouth in late November.

So we've potentially got recency bias and selection bias lurking in the subconscious, especially for the emotionally attached supporter base.

If "facts" or if you prefer, a probabilistic assessment of the likely range of outcomes for a side taking a 2-0 lead in 33 matches, is more your preference, we can throw some ball park numbers at the problem to counterbalance our inbuilt biases.

Everton reached a 2-0 lead in these games by, on average, the 45th minute. If we make average assumptions about the quality and venue in these matches, there's around a ~90% chance they win from 2-0, ~8% they draw and ~2% they lose, as they did on Saturday.

Spread over 33 games, the "facts" show that losing at least one such game from the "most dangerous of 2-0 leads" is almost as likely as not for a fairly typical mid to upper table side.
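The arithmetic behind that claim is the complement rule applied to the ~2% single-game loss figure:

```python
# Chance of blowing at least one of 33 2-0 leads, given a ~2% chance of
# losing any single game from that position (the post's ballpark figure).
p_lose_one = 0.02
p_at_least_one = 1 - (1 - p_lose_one) ** 33
print(round(p_at_least_one, 2))   # ~0.49 -- "almost as likely as not"
```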

But of course, Everton weren't a fairly typical mid/upper team yesterday; they were a ten-man team from the 34th minute onwards.

A red card "costs" a side ~ 1.45 goals per game or about a whole goal from the 34th minute.

When Everton took their 2-0 lead in the 56th minute, but before Adrian, possibly one of the league's best shot stoppers, successfully narrowed the angle for Lukaku's penalty kick five minutes later, they had around a 6% chance of losing that single game alone.

Of all their 2-0 leads, yesterday's was probably the one they were most likely to blow big time, in part due to Mirallas' theatrics. It was the least likely outcome yesterday after an hour, hence the howls of anguish, but if you accumulate enough low, but non-zero, probabilities over your managerial reign, sooner or later one rare event is going to bite you.

And there's no shortage of talking heads to irrationally mould possible chance into concrete character flaws.

Recency bias, outcome bias, selection bias, anchoring bias (concentrating on one factor and ignoring all others) and opportunity bias (ignoring the number of times an outcome may have transpired)... Pretty good going for one tweet.