Thursday, 4 December 2014

The Weighting Game.

Shot counts versus goal counts as a predictor of future performance is a debate that is being fought out not only in football, but also in hockey. Sample size is at the heart of the issue. Goals are obviously more important in terms of who wins the match, but they are relatively rare events, whereas shots accumulate at a faster rate, building up sample size, but play only an intermediate role in deciding the outcome.

It is perhaps unfortunate that a distinction has arisen between shots and goals because they are merely classifications of a single larger group. Namely, they are all goal attempts, but with different actual outcomes.

Goals are shots (or headers) that result in a goal, saves are on-target shots that are saved, and misses are shots that go high or wide of the target.

The most recent rumblings from hockey arise from renowned sabermetrician Tom Tango's use of different types of shots (he also includes blocked efforts) from the first half of a season to predict goals, or specifically goal differential, in the second half.

He uses the different types of shot differential, with appropriate weightings in the first half of a season to predict goal differential in the second. The post can be found here and the application to football is obvious.

I have therefore updated a similar approach using data from Joe B's football data site (so no blocked shot data as a separate category). The aim was slightly different. I set out to determine the final goal difference for EPL teams, based on their goal difference and shooting differential at various times during the season.

Final goal difference is strongly correlated to finishing position and with the odd exception finishing position is also related to team strength. And knowing where a side is likely to finish well before they actually arrive at that position is an obvious advantage if we wish to know how they are likely to perform during that 38 game journey.

I therefore split goal attempts into shots that went into the net (or goals), attempts that were saved and off target attempts and totaled the cumulative differential between each Premiership team and their opponents from the second match of the season until the penultimate game.

For example, after 14 games, Arsenal currently have a +7 goal difference, a +91 differential in shots that went wide and +34 differential in shots that were saved.

I then regressed these differentials against the final goal difference after the 38th game to get the changing relationship between the three variables and the side's ultimate goal difference after two games all the way up to 37 games played.
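The regression step can be sketched as an ordinary least-squares fit. This is only an illustration, assuming NumPy and a hypothetical helper name `fit_final_gd`; the post does not say which software was used. The input arrays are synthetic, made-up numbers generated from known coefficients purely to show that the fit recovers them, and are not real EPL data.

```python
import numpy as np

def fit_final_gd(goal_diff, wide_diff, saved_diff, final_gd):
    """Least-squares fit of final goal difference on the three
    differentials measured after a fixed number of games.
    Returns [intercept, goals coef, wide coef, saved coef]."""
    X = np.column_stack([
        np.ones(len(goal_diff)),  # intercept column
        goal_diff,
        wide_diff,
        saved_diff,
    ])
    y = np.asarray(final_gd, dtype=float)
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

# Synthetic team-seasons (NOT real data), one row per team-season.
rng = np.random.default_rng(0)
n_rows = 200
goal_diff = rng.integers(-15, 16, n_rows).astype(float)
wide_diff = rng.integers(-60, 61, n_rows).astype(float)
saved_diff = rng.integers(-40, 41, n_rows).astype(float)

# Coefficients used to build the toy target (chosen for illustration only).
toy_coefs = np.array([0.0, 1.9, 0.13, 0.14])
final_gd = (toy_coefs[0]
            + toy_coefs[1] * goal_diff
            + toy_coefs[2] * wide_diff
            + toy_coefs[3] * saved_diff)

coefs = fit_final_gd(goal_diff, wide_diff, saved_diff, final_gd)
print(coefs)
```

In practice the fit would be repeated once for each number of games played, from 2 to 37, with each row holding one team's differentials at that point of a past season and its eventual 38-game goal difference.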

All three types of shots are important in predicting the final goal difference of teams. But as the number of games played increases, the importance of shots that are saved or go wide declines relative to the importance of shots that result in a goal (or goals for short).

In addition, the values of the coefficients are also dependent upon how many matches are in the sample. The respective goals, wide shots/headers and saved shots/headers coefficients are 3.91, 0.43 and 1.24 when calculated after just two matches and, as you would probably expect, 1.02, 0 and 0 after 37.

So far this season each team has played 14 matches and the coefficients for current goals, wide shots and save differentials when used to predict future final goal difference are respectively 1.91, 0.13 and 0.14. If we use these figures for each team, the final projected league goal difference for each side is as shown below.

Projected Final Goal Difference Using Shot Differentials After 14 Games.

Projected GD.
Man City
Man Utd
West Ham
West Brom
C Palace
Aston Villa

At the moment these figures are merely another rating system, albeit one that appears to reasonably predict the likely quality of the current side. Villa, for example, appear to have been fortunate in the way in which they have won numerous single-goal victories, and a wider appraisal incorporating extra shot information reduces their rating compared to their current league position.

To illustrate how the projections have fluctuated for a single team, here's how the projected final goal difference has varied for Arsenal using the updated coefficients after each game week of the 2014/15 campaign to date.

Projected Final Goal Difference For Arsenal from Shot Differentials Updated Weekly.

Games Played by Arsenal    Final GD Projection
 2                         +13
 3                         +23
 4                         +16
 5                         +24
 6                         +25
 7                         +17
 8                         +22
 9                         +24
10                         +31
11                         +27
12                         +26
13                         +27
14                         +28

We can demonstrate the use of such ratings, and perhaps their predictive potential, by converting these weighted, shot-derived ratings into match odds and comparing them with a reliable benchmark, such as the current bookmakers' odds.

Stoke entertain Arsenal on Saturday. Arsenal's projected final goal difference is a rounded up +28, Stoke's is an also rounded up -4. Spread over a 38 game season, that is +0.74 and -0.1 per game, conveniently in the currency of goals.

Home field is running at 0.38 of a goal. The gap between the two sides is 0.74 - (-0.1) = 0.84 of a goal per game, so Arsenal are 0.84 - 0.38 of a goal superior away to Stoke.

Arsenal should be, based on our projections, 0.46 of a goal superior, on average, at the Britannia. If we run this figure through a Poisson model, we might get a 47% chance of an Arsenal win, 26% for Stoke and 27% for the draw. Best odds, as of Thursday night, are 50%, 23% and 27%.
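The arithmetic above, and one possible way of running it "through a Poisson", can be sketched as follows. The post does not describe the exact Poisson set-up, so this assumes two independent Poisson goal counts and an average of 2.7 total goals per game, split so that the away side's expected goals exceed the home side's by the projected superiority. The 2.7 figure and the function names are assumptions for illustration, so the output need not match the quoted percentages exactly.

```python
from math import exp, factorial

def poisson_pmf(k, mean):
    """Probability of exactly k goals for a Poisson with the given mean."""
    return exp(-mean) * mean ** k / factorial(k)

def match_probs(home_mean, away_mean, max_goals=15):
    """Home win, draw and away win probabilities from two independent
    Poisson goal counts, summed over scorelines up to max_goals each."""
    home = draw = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_mean) * poisson_pmf(a, away_mean)
            if h > a:
                home += p
            elif h == a:
                draw += p
            else:
                away += p
    return home, draw, away

GAMES = 38
HOME_ADVANTAGE = 0.38   # from the post
TOTAL_GOALS = 2.7       # ASSUMPTION: average total goals per game

arsenal_per_game = 28 / GAMES   # projected +28 over the season
stoke_per_game = -4 / GAMES     # projected -4 over the season

# Arsenal's expected superiority when playing away at Stoke.
away_supremacy = arsenal_per_game - stoke_per_game - HOME_ADVANTAGE

# Split the assumed total so that the means differ by the supremacy.
home_mean = (TOTAL_GOALS - away_supremacy) / 2
away_mean = (TOTAL_GOALS + away_supremacy) / 2

p_home, p_draw, p_away = match_probs(home_mean, away_mean)
print(f"Stoke {p_home:.0%}, draw {p_draw:.0%}, Arsenal {p_away:.0%}")
```

A different assumed goal total, or a model that adjusts for the excess of draws seen in real football, would shift the three probabilities slightly.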

So the projections broadly agree with a robust business model, at least for the Potters, and below I've applied the method to the remaining games this weekend.

Odds Derived from Shot Differentials and Final Goal Difference Projections for Week 15.

All figures are Predicted/Best Price.

Game (Home Team First!)    Home Win %    Away Win %    Draw %
Man City v Everton         67/63         14/16         19/21
Liverpool v Sunderland     62/64         15/14         23/22
Newcastle v Chelsea        19/13         57/63         24/24
Spurs v C Palace           53/63         21/14         26/23
Stoke v Arsenal            26/23         47/50         27/27
QPR v Burnley              47/47         26/25         27/28
WHU v Swansea              49/41         24/30         27/29
A Villa v Leicester        45/43         28/28         27/29
Southampton v Man U        48/36         25/37         27/27
Hull v WBA                 39/40         33/31         28/29

The majority of the odds fall within touching distance of those available to bet on, and the few that don't differ for rational reasons, such as Manchester United's chaotic "getting to know you" phase, combined with Southampton's recent injuries.

By weighting shot types and applying coefficients appropriate to the number of matches played, it appears possible to project team strength with sufficient accuracy to mimic the bookmakers' appraisal of Premiership teams.


  1. Very nice job, really! One quick question. In order to calculate the final GD projection you also need to know the intercept, right?

  2. Cheers, Raffo,

    Yes, but it is always within a tenth or so of being zero.


  3. Thanks. I'm trying to apply it to the Italian Serie A but the coefficients I get are pretty different (and sometimes negative).

    For instance, after regressing the last 2 seasons, here's what I get after 14 games: Intercept -3.58; GD 2.022; Shots Off 0.103; Shots Saved -0.032. I'm using FootballData as well.

    Just out of curiosity, how many EPL seasons did you regress?

    1. Raffo, I've taken a quick look at the last two Serie A seasons and get GD coefficient= 2.1, wide shots differential coefficient= 0.09 and saved shots differential coefficient = 0.005, zero for the intercept after 14 games.

    2. Thanks, if you do it for Serie A it will be great.

  4. Just done a quick check and I get very similar results using the last 10 seasons and just the 2. I've used 10 for the post. I haven't tried it for other leagues, but I'll try and run a couple over the weekend.


  5. Because you are correlating to the seasonal differential, you are, as you stated, not doing exactly the same thing. As a result, as you approach the end of season, you are in effect correlating x to itself.

    Could you repeat the same exercise I did, but correlating current data to future data? So, first two games, to next 36 games. First three games to next 35 games, and so on? Presumably, you'll reach a peak when you correlate 18 to 18 games. And it would be interesting to see the pattern for the various shot results.

    1. I'll get it done over the weekend. I have already run first half of the season correlations to second half of the season.

      Splitting the season in half, the goal difference in the first half of the season has an r of 0.72 to goal difference in the second half.

      If you include shot differential for shots that were saved and shots that missed the target, r between these inputs and future goal differential increases to 0.78.

      Goal Difference in 2nd half of season=0.46*GD first half+0.021*wide shots differential first half +0.137*saved shots differential first half.

  6. Hi Mark, great read, works a dream with the EPL. I'm using it alongside other ratings to better gauge H-D-A probabilities and xGoals.

    I've tried applying this to the Bundesliga, but the figure to get total points (51) is far too high. Over the seasons I have looked at, it over-predicts by about 16 points (while under-predicting Bayern and Dortmund by around the same). Not sure if this is due to there being fewer teams and therefore fewer games. What would be your thoughts on this? Not sure if I should just go with +35 instead of +51, or if this is even an adjustable figure.