Wednesday 21 November 2012

Poisson, Predictions and a Tense Last Ten Minutes.

Often, the bigger the incentive to get things right, the better the final results, and the potential to lose money in an enterprise inflates that incentive. One "hidden" resource in the field of football analytics is provided by the gambling industry, which regularly produces sporting predictions that are occasionally skewed by weight of money, but more often provide a readily decipherable estimate of an event's true likelihood of occurring. Therefore, this post is written with a betting slant, but it is centrally applicable to the field of football analytics.

Everyone takes for granted the opportunity to bet on a sporting event "in running". However, it is worth remembering that it is a relatively new concept and as such the betting tools developed to describe such events are similarly underdeveloped. Rewind a decade and once the first whistle was blown or the stalls opened, the betting shutters were slammed tightly shut. Nowadays the betting carries on unabated.

Football is an obvious vehicle for in running wagers and that has created a need to predict match probabilities under many different combinations of scoreline and time elapsed. Aggregating many seasons' worth of historical data does a reasonable job of describing the general case, but is clearly lacking when applied to specific team matchups.

The major problem with this type of model is biased sampling. Poorer teams playing superior teams are more likely to find themselves trailing, say 2-0 after 45 minutes. So the sample used to predict the likely game outcome from this position will contain an over-representation of poor sides, and they will go on to perform in accordance with the wider gap in quality over the remainder of the game. Using this biased, general case to predict how Manchester City may perform should they trail 2-0 at halftime to the likes of Stoke will greatly underestimate the possibility of a Blue comeback. The chances of Manchester City storming back for a win in such circumstances are over twice those seen generally.

So if aggregated models have one major flaw too many, what are we to use? The data revolution has enabled predictions to be made using vast amounts of different inputs, but this approach has produced a counter movement, where simplicity of design and input is thought to produce results of equal merit. A simple goal based model, which applies a Poisson calculation to a team's average goal expectancy to calculate the probability of each side scoring an exact number of goals in a match, has been well described on numerous websites since the late 90s.

This approach too has flaws, such as under prediction of draws and failure to account for a lack of independence between the expected scoring rates of both sides. However, these flaws are both well understood and because Poisson has long been used in football prediction, these problems have been extensively addressed.

I'll assume everyone has a passing knowledge of the Poisson approach to modelling football matches, but for the casual reader, the distribution allows an estimation of the likelihood of a team scoring exactly 0, 1, 2, 3 goals and so on, given we expect that team to score an average of, say, 1.6 goals in such a game. To fully appreciate how we can use the Poisson approach to begin to build an in running calculator we first need to grasp the concept of goal expectation.

When we say that a side has a goal expectation of 1.6 goals, we are saying that if today's game were to be repeated over and over again, the average number of goals we would expect our team to score would be 1.6. Sometimes they wouldn't score at all, sometimes they would score 6. The most likely outcome would be a score of exactly one, followed by two. But over a long period of repeats, the average would trend towards our best estimate of 1.6.
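For readers who prefer to see the numbers, here is a minimal sketch of the Poisson calculation for a goal expectation of 1.6. The function name `poisson_pmf` is my own label, not something from a particular library, and the cut-off at six goals is only for display.

```python
from math import exp, factorial


def poisson_pmf(goals: int, goal_expectation: float) -> float:
    """Probability of scoring exactly `goals` when the average is `goal_expectation`."""
    return exp(-goal_expectation) * goal_expectation ** goals / factorial(goals)


goal_expectation = 1.6
probabilities = [poisson_pmf(k, goal_expectation) for k in range(7)]

for goals, probability in enumerate(probabilities):
    print(f"{goals} goals: {probability:.3f}")

# Averaging over a long run of "repeats" recovers the goal expectation.
long_run_average = sum(k * poisson_pmf(k, goal_expectation) for k in range(40))
print(f"long run average: {long_run_average:.2f}")
```

Exactly one goal comes out as the single most likely outcome, followed by two, while the long run average returns to the 1.6 we started with.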

The most important thing we need to appreciate is how this goal expectancy decays over the 90+ minutes of a match. The average 1.6 goals per game figure decays because of time elapsed. Goals already scored or conceded may tweak the average slightly in one direction or another as a result of competing, scoreline dependent, tactical rearrangements, but a glut of early goals doesn't significantly alter our pregame goal expectancy. Only the passage of time can do that.

The rate at which a team's goal expectancy declines isn't constant. More goals are scored on average in the second half than the first as teams become more urgent in their efforts to score and fatigue leads to more space. Around 44% of goals arrive in the first half and 56% in the second, and the decay can be adequately described by a power equation of the following form.

Remaining Goal Expectancy = Initial Goal Expectancy x (Proportion of Time Remaining) ^0.84

Imagine Stoke is expected to score an average of 1 goal in a particular match, West Ham away on Monday night, perhaps. By halftime when the proportion of time remaining is very close to 0.5, the remaining goal expectancy can be calculated by inserting these values into the previous formula to give

Remaining Goal Expectancy = 1 x (0.5)^0.84 = 0.559 of a goal.

0.559 of a goal, you may notice, equates to 56% of the initial goal expectation of 1 goal, which nicely fits the observed data. We can repeat this calculation for any minute of the match and also for the opposition. Armed with this information we are just a few repetitive, but simple steps away from being able to describe the likely scoring combinations that will occur in the remainder of the contest.
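The decay formula is short enough to sketch directly. The function name `remaining_goal_expectancy` is hypothetical, and the 0.84 exponent is simply the fitted value quoted above.

```python
def remaining_goal_expectancy(initial_expectancy: float,
                              proportion_remaining: float,
                              exponent: float = 0.84) -> float:
    """Initial goal expectancy scaled by (proportion of time remaining) ^ exponent."""
    return initial_expectancy * proportion_remaining ** exponent


# Stoke at halftime, starting from a pre-match expectation of 1 goal.
half_time = remaining_goal_expectancy(1.0, 0.5)
second_half_share = half_time / 1.0
first_half_share = 1.0 - second_half_share

print(f"remaining at halftime: {half_time:.3f}")
print(f"first half share: {first_half_share:.0%}, second half share: {second_half_share:.0%}")
```

Because the exponent is below 1, the expectancy falls more slowly than the clock early on, which is what produces the roughly 44%/56% split between the halves.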

We'll fast forward to the 80th minute to use this accumulated knowledge to begin to construct a flexible and realistic "in running" prediction model. The West Ham/Stoke game was a fairly common type of Premiership contest, where two reasonably well matched sides were separated on the night by little more than home field advantage. An average expectation at kickoff for Stoke would be that they'd score close to one goal and concede just over 1.4 of a goal to the Hammers. If we insert those numbers into our equation and allow for the likely 4 minutes of added time we could expect Stoke to average 0.22 of a goal to West Ham's 0.30 in the remainder of the match.

If we now fire up the Poisson calculator we can produce probabilities that Stoke and WHU will score exactly 0,1,2,3 goals and so on, in the last 10+ minutes of Monday's game. Those probabilities are listed below.

The Likelihood of Stoke or WHU Scoring an Exact Number of Goals after the 79th Minute.

Team          0 Goals   1 Goal   2 Goals   3 Goals   4 Goals
WHU           0.737     0.225    0.034     0.003     0.000
Stoke City    0.806     0.174    0.019     0.001     0.000
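A table like this can be approximately rebuilt by chaining the decay formula and the Poisson calculation. The inputs below are my assumptions, not figures confirmed by the post: a 94 minute match (90 plus 4 added), 79 minutes played, and pre-match expectations of 1.0 for Stoke and 1.42 for WHU ("just over 1.4"). A different treatment of added time will shift the third decimal place.

```python
from math import exp, factorial

TOTAL_MINUTES = 94      # assumption: 90 minutes plus 4 of added time
MINUTES_PLAYED = 79     # assumption: the table starts after the 79th minute
PRE_MATCH = {"WHU": 1.42, "Stoke City": 1.0}  # assumed pre-match goal expectations


def remaining_goal_expectancy(initial: float, minutes_played: float) -> float:
    proportion_remaining = (TOTAL_MINUTES - minutes_played) / TOTAL_MINUTES
    return initial * proportion_remaining ** 0.84


def poisson_pmf(goals: int, goal_expectation: float) -> float:
    return exp(-goal_expectation) * goal_expectation ** goals / factorial(goals)


table = {}
for team, initial in PRE_MATCH.items():
    remaining = remaining_goal_expectancy(initial, MINUTES_PLAYED)
    table[team] = [poisson_pmf(k, remaining) for k in range(5)]
    row = "  ".join(f"{p:.3f}" for p in table[team])
    print(f"{team:<12}{remaining:.2f} goals left   {row}")
```

Under these assumptions the rows land within a couple of thousandths of the published ones.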

We can now begin to accumulate the score combinations that will lead to a final match outcome, bearing in mind that O'Brien had equalised Walters' opening goal for the visitors and the match was currently stalemated. If, as actually happened, neither side scores, the match ends as a draw and the probability of a 0-0 is given by multiplying 0.737 by 0.806 or the individual probabilities of each side failing to score. That outcome has a probability of 0.594 or around 3 times in every 5. A 1-1 in the final "mini" match will also ultimately lead to a draw, as would a 2-2, 3-3 or 4-4 for the optimistic thrill seekers. If we finally total each of these individual, correct score probabilities, we have the likelihood of the currently tied game ending so at the final whistle.

A similar process generates cumulative probability totals for each correct score that leads to either a City win or a happy Hammers victory.

The Likelihood of Stoke or WHU Gaining any Result from 1-1 after the 79th Minute.

Team          Win     Draw    Loss
WHU           0.22    0.63    0.15
Stoke City    0.15    0.63    0.22
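The totting up for the tied game can be sketched using nothing more than the exact-goal probabilities listed in the earlier table. Every pairing of a WHU goal count and a Stoke goal count is multiplied together (this assumes the two sides score independently, which the post has already flagged as a simplification) and the product is added to the win, draw or loss pile.

```python
# Exact-goal probabilities for the remainder of the match, copied from the table above.
whu = [0.737, 0.225, 0.034, 0.003, 0.000]
stoke = [0.806, 0.174, 0.019, 0.001, 0.000]

whu_win = draw = stoke_win = 0.0
for whu_goals, p_whu in enumerate(whu):
    for stoke_goals, p_stoke in enumerate(stoke):
        probability = p_whu * p_stoke  # independence assumed
        if whu_goals > stoke_goals:
            whu_win += probability
        elif whu_goals == stoke_goals:
            draw += probability
        else:
            stoke_win += probability

print(f"WHU win {whu_win:.2f}, draw {draw:.2f}, Stoke win {stoke_win:.2f}")
```

The three totals fall just short of 1 in sum only because the table values are rounded to three decimal places.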

The above example is conveniently simplified by a current scoreline of 1-1, but teams can also trail or lead, as WHU and Stoke respectively did in this match. However, the process merely becomes slightly more tedious rather than more complex. Stoke's set piece prowess finally reached ground level in the 13th minute when Walters found space in front of decoy runners to crisply dispatch a precisely delivered Whelan corner, an inventive deviation from the Delap assists of old. So if we want to examine the likely match result from, say, the 34th minute we also have to account for the 1-0 lead held by Stoke.

In this game situation, should Stoke go on to "win" the mini match from the 34th minute onwards, they will bolster their lead and comfortably win the game. In addition, if they merely "draw" the remainder of the match they will also win the entire game because of the 1-0 lead given to them by their mustachioed striker. An actual draw requires WHU to "win" the next 60 minutes by a single goal, and they must win them by two or more to claim all three points.

The Likelihood of Stoke or WHU Gaining any Result from 0-1 after the 33rd Minute.

Team          Win     Draw    Loss
WHU           0.16    0.25    0.59
Stoke City    0.59    0.25    0.16

As the scoreline becomes more lopsided, the combinations that ultimately lead to wins, losses or draws also become more diverse. A team which holds a 2 goal cushion can afford to "lose" the remainder of the contest by a single goal and still claim victory. So the totting up procedure becomes more tiresome, although a spreadsheet helps greatly, but the Poisson process on a suitably decayed goal expectation remains constant.
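In place of a spreadsheet, the whole procedure for any scoreline and minute can be sketched as one function. This is an illustration under stated assumptions rather than the exact calculator behind the tables: a 94 minute match, pre-match expectations of 1.42 (WHU) and 1.0 (Stoke), independent scoring, and a cap of 10 further goals per side. All function and parameter names are hypothetical.

```python
from math import exp, factorial


def poisson_pmf(goals: int, goal_expectation: float) -> float:
    return exp(-goal_expectation) * goal_expectation ** goals / factorial(goals)


def remaining_goal_expectancy(initial: float, minutes_played: float,
                              total_minutes: float = 94) -> float:
    # total_minutes=94 is an assumption: 90 minutes plus 4 of added time.
    return initial * ((total_minutes - minutes_played) / total_minutes) ** 0.84


def result_probabilities(home_score: int, away_score: int,
                         home_remaining: float, away_remaining: float,
                         max_goals: int = 10) -> tuple[float, float, float]:
    """Return (home win, draw, away win) given the current score and remaining expectancies."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            probability = poisson_pmf(h, home_remaining) * poisson_pmf(a, away_remaining)
            margin = (home_score + h) - (away_score + a)  # final margin, not mini-match margin
            if margin > 0:
                home_win += probability
            elif margin == 0:
                draw += probability
            else:
                away_win += probability
    return home_win, draw, away_win


# WHU 0-1 Stoke after 33 minutes, with the assumed pre-match expectations.
whu_win, draw, stoke_win = result_probabilities(
    0, 1,
    remaining_goal_expectancy(1.42, 33),
    remaining_goal_expectancy(1.0, 33),
)
print(f"WHU win {whu_win:.2f}, draw {draw:.2f}, Stoke win {stoke_win:.2f}")
```

Adding the current score to the mini match score before comparing is the only change needed to handle a lead, whether of one goal or a two goal cushion.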

As has already been stated, this quick run through does not account for well recognised deficiencies in using Poisson to describe football goal scoring, nor does it allow for the small, but real emphasis shifts that occur as the scoreline changes, but we can test the model's validity by comparing its predictions to the efficient Betfair betting markets.

Below I've plotted the near 100% book prices that were available on Betfair in two minute intervals, along with the predictions from a pure Poisson during Monday night's Stoke West Ham match.

[Figure: Price & Probability Movements During Stoke's 1-1 Draw At WHU. Goals marked: 0-1, Walters, 13'; 1-1, O'Brien, 48'.]

The Betfair prices and the pure Poisson track each other's progress fairly accurately. The under prediction of the draw inherent in the Poisson is clearly visible up until the WHU equaliser, and my allegiance to one of the two sides may also be represented throughout by my choice of initial goal expectations. Also, the slightly increased optimism towards WHU during the half hour where they trailed isn't captured by the "blind" Poisson, but is by the Betfair traders and is also present in actual data.
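To put market prices and model probabilities on the same scale, decimal odds can be inverted and rescaled so the three outcomes sum to 1. The sketch below assumes decimal odds and a simple proportional rescaling. The prices are made-up illustrative numbers, not the actual Betfair quotes from this match.

```python
def implied_probabilities(decimal_odds: dict[str, float]) -> tuple[dict[str, float], float]:
    """Convert decimal odds to probabilities that sum to 1, and return the raw book total."""
    raw = {outcome: 1.0 / price for outcome, price in decimal_odds.items()}
    book = sum(raw.values())  # close to 1.0 for a near 100% book
    normalised = {outcome: probability / book for outcome, probability in raw.items()}
    return normalised, book


# Hypothetical prices for illustration only.
hypothetical_prices = {"WHU": 4.5, "Draw": 1.6, "Stoke City": 6.6}
market, book = implied_probabilities(hypothetical_prices)

print(f"book: {book:.1%}")
for outcome, probability in market.items():
    print(f"{outcome}: {probability:.2f}")
```

The normalised figures can then be set alongside the Poisson win, draw and loss probabilities for the same minute.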


  1. Excuse me for being completely unrelated to this, but how long did it take you for mcfc analytics to send you the opta data set? I registered nearly a week ago and haven't heard anything.

  2. I got the lite data in the middle of August and the play by play for Bolton Man City a week or so later. So pretty quickly after registering.

    Have you tried asking at the opta pro micro site

    It's been set up to discuss and share the project.

  3. Interesting stuff. Do you not think one of the reasons that Poisson and Betfair prices are so similar is that the market makers on Betfair are simply using Poisson within the majority of bots operating?

  4. That's a fair assumption. Large sample analysis of the Betfair in running soccer prices does indicate that they do a very good job of efficiently tracking real life outcomes. This suggests that you can produce a decent model with just a few inputs (goals, time elapsed and red cards) because a suitably tweaked Poisson mimics Betfair, although, as you suggest, Betfair prices may largely equal a Poisson approach.

  5. Of course if you delve a little deeper it becomes obvious that this is pure bunkum. A team's goal expectancy is seriously dependent on the availability and mindset of each and every player utilised, together with the availability and mindset of each member of the opposition and how they all interact with or against each other. Then there are extraneous influences such as pre-match fatigue, transfer requests, subconscious bias of the officials, over-use or under-use of cautions, timing of cautions, pitch conditions, adverse weather, precariousness of the manager's tenure, blisters, distracted/pre-occupied officials and so on, and so on. When you add in the relative ability of players facing each other it becomes obvious that there are too many imponderables to accurately predict anything. When Portsmouth entertained Reading in 2007, they had scored 1 or fewer goals in 5 of the preceding 6 games, while Reading had scored a grand total of 5 goals in 8 games. The expectations then would not have got anywhere close to the final score of 7-4. The sensible route is to treat aggregated analysis as an aid to serious analysis, not THE analysis. Have a look at the live odds in the last few minutes of a match. They appear to go the wrong way, then stall until the last few seconds of the match. The reason for this is that it only takes a moment to score a goal. Players are human and humans are unpredictable, unlike Poisson analysis. How many people had Man Utd v West Ham down for 1-0? Seconds after the goal a ManU win was traded at 1.09 but this drifted to 1.12 or 1.13 and stayed there right into injury time. Did Poisson predict this? I doubt it. Poisson predicts/instructs prices, but it doesn't predict outcomes.

  6. Goals based Poisson predictions tally extremely well with long term reality, both pregame and in running.

    "The expectations then would not have got anywhere close to the final score of 7-4." Why would they? The goal expectations alone don't predict a score, just the average number of goals those specific teams would score or concede in a large number of repetitions. The Poisson then gives an estimation of the individual scores. I'd imagine a 7-4 would turn up around once every 120,000 games.

    What teams have done in their last 6 or 8 games gives you very little indication of what they will do in their next match whatever you do with the numbers. I suggest a minimum of 30 matches, suitably smoothed are used to calculate goal expectations.

    The relative abilities of the teams is one of the easiest factors to deal with, blisters probably less so, although their influence on match outcome is probably overrated. The biggest extraneous influences are current score. Trailing teams become slightly more likely to score than previously, leading teams slightly less so. Again relatively easy to incorporate, although how to improve a raw Poisson wasn't ever part of this post.

    "Seconds after the goal a ManU win was traded at 1.09 but this drifted to 1.12 or 1.13 and stayed there right into injury time. Did Poisson predict this?"
    Within a few % points, yes. It tracked United's price as the game unfolded and could, if needed have produced an estimation of the chances that United would lead after the first minute, which may have been superior to an alternative opinion.

    Models, even ones full of bunkum try to attach a likelihood to a particular outcome, they don't say this game will finish 7-4. Life would be very boring if they did.

    thanks for your comments.

  7. Therein lies the problem - if 6 matches are unreliable how can Poisson give you the final score on the single selected match. We don't bet in batches of 30. I'm only guessing, but if you followed Poisson predictions for the Man Utd game you would probably be going for 3-1 at the start and after the goal. Then it would be reducing from that score right up to the final whistle? Well that's obvious and you don't need Poisson for that. As far as I can see all Poisson does is tell you when the price is slightly wrong, which will give you the opportunity to make a value bet, but you'd still have lost your money in those two matches.

  8. "if you followed Poisson predictions for the Man Utd game you would probably be going for 3-1 at the start and after the goal"

    Why would you ? I've never suggested you pick the most likely outcome, assuming your point is that 3-1 is the most likely correct score outcome, (I'd disagree, btw). You're measuring your opinion against the opinion of others, one model against another.

    "As far as I can see all Poisson does is tell you when the price is slightly wrong, which will give you the opportunity to make a value bet.."

    So you've identified an outcome that occurs more frequently than implied by the odds in the long term. You'll have to explain the downside of this.

  9. An outcome is not what's been identified - only a variation in the predicted price. The hoped for outcome may not come to fruition. You love it, I don't. Nuff said.

  10. Rich,
    I don't "love" anything I've posted.

    Feel free to outline the failings of this approach, host it and I'll gladly add the link here, but a continuous procession of strawmen is hardly helpful.

    thanks for your comments,

  11. Why is it that you do the ^0.84 ?

  12. Hi Knud,
    it gives the best fit to how the actual scoring rate changes over time.


  13. OK, but it only fits when the expected team score is set to 1.0. When the team's expected score is 2.0, the outcome for the second half would be 1.0, which is only 50% and not 56%.

  14. Hi Knud, depending on how you treat added time when team expectation is 2 the expectation for the second half works out at around 1.13, just over 56%.


    1. I am not sure that I understand. Can you give an example of how you calculate that when the teams' expected scores are 2.0 and 1.5, for instance, please?

  15. Hi Knud,
    let's say you've got half the match remaining and team A had an initial goal expectancy of exactly 2 goals. The proportion of the match remaining is 0.5.

    Proportion of time remaining raised to the power of 0.84 = 0.5^0.84= 0.559.

    0.559 * 2(the initial goal expectancy) = 1.12 = goal expectancy for the remainder of the match.

    1.12 goals is 56% of the original goal expectancy of 2 goals. This helps to account for the gradual increase in scoring, on average as matches progress.

    An original goal expectancy of 1.5 has decayed to around 0.838 goals with half the match remaining. Again around 56% of the original figure.


  16. Hi Mark, See www dot sortmyfootball dot com, you can see the goal expectations for each team including recent moves. I suspect you may find this resource useful.

  17. Where did you get 0.84 from?

  18. Hi Rory,
    it gives the best fit for describing real life data over the long term.

  19. Hi Mark

    That's a really great article and very informative thank you.

    Could I ask a quick question please, if you don't mind.

    I understand the whole principle of the poisson calculations for pre match and then as you mentioned using your calculation of (^0.84) to get an in-running calculation for the remaining goal expectancy, however what I don't understand is when one team goes 1-0 up, how do you then calculate the goal rates? (for example on your Stoke/WHU game from the 33rd minute onwards).



  20. Hi there, Mark!

    What a beautiful job you did here, mate. Congratulations! I'm not sure if I'll get a reply, since this article was written a few years ago, but let's try.

    I would like to know why the equation uses ^0.84. I know it's because it fits well, but where did you get it? Did you invent it? Or is there another article that explains this equation?

    And the most important thing to me is: when you compared your prediction with Betfair, it fitted fine. But are you sure Betfair just uses the average goals scored in a Poisson model? What did you do to make this fit so well, my friend?

    I hope that I was clear enough, Mark.

    And again, congratulations for your post. =)

    Best Regards

  21. Many thanks for a nice article