Everyone takes for granted the opportunity to bet on a sporting event "in running". However, it is worth remembering that it is a relatively new concept, and the betting tools developed to describe such events are correspondingly underdeveloped. Rewind a decade and, once the first whistle was blown or the stalls opened, the betting shutters were slammed tightly shut. Nowadays the betting carries on unabated.
Football is an obvious vehicle for in running wagers, and that has created a need to predict match probabilities under many different combinations of scoreline and time elapsed. Aggregating many seasons' worth of historical data does a reasonable job of describing the general case, but is clearly lacking when applied to specific team matchups.
The major problem with this type of model is biased sampling. Poorer teams playing superior teams are more likely to find themselves trailing, say, 2-0 after 45 minutes. So the sample used to predict the likely game outcome from this position will contain an over-representation of poor sides, and they will go on to perform in accordance with the wider gap in quality over the remainder of the game. Using this biased, general case to predict how Manchester City may perform should they trail 2-0 at halftime to the likes of Stoke will greatly underestimate the possibility of a Blue comeback. The chances of Manchester City storming back for a win in such circumstances are over twice those seen generally.
So if aggregated models have such a major flaw, what are we to use? The data revolution has enabled predictions to be made using vast amounts of different inputs, but this approach has produced a counter movement, where simplicity of design and input is thought to produce results of equal merit. A simple goal based model, which applies a Poisson calculation to a team's average goal expectancy to give the probability of each side scoring an exact number of goals in a match, has been well described on numerous websites since the late 90s.
This approach too has flaws, such as under prediction of draws and failure to account for a lack of independence between the expected scoring rates of both sides. However, because Poisson has long been used in football prediction, these flaws are well understood and have been extensively addressed.
I'll assume everyone has a passing knowledge of the Poisson approach to modelling football matches, but for the casual reader, the distribution allows an estimation of the likelihood of a team scoring exactly 0, 1, 2, 3 goals and so on, given that we expect that team to score an average of, say, 1.6 goals in such a game. To fully appreciate how we can use the Poisson approach to begin to build an in running calculator, we first need to grasp the concept of goal expectation.
When we say that a side has a goal expectation of 1.6 goals, we are saying that if today's game were to be repeated over and over again, the average number of goals we would expect our team to score would be 1.6. Sometimes they wouldn't score at all, sometimes they would score 6. The most likely outcome would be a score of exactly one, followed by two. But over a long period of repeats, the average would trend towards our best estimate of 1.6.
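For readers who would like to see the numbers behind that claim, here is a minimal Python sketch of the standard Poisson formula applied to a goal expectation of 1.6. The function name `poisson_pmf` and the variable names are my own choices for illustration, not part of any particular betting tool.

```python
from math import exp, factorial


def poisson_pmf(goals: int, expectation: float) -> float:
    """Probability of scoring exactly `goals` when the average is `expectation`."""
    return expectation ** goals * exp(-expectation) / factorial(goals)


goal_expectation = 1.6  # the example figure used in the text

# Probability of each exact score from 0 to 6 goals
probabilities = {k: poisson_pmf(k, goal_expectation) for k in range(7)}

for goals, p in probabilities.items():
    print(f"{goals} goals: {p:.3f}")
```

Running this shows one goal as the single most likely outcome, followed by two, with a blank and three goals some way behind.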
The most important thing we need to appreciate is how this goal expectancy decays over the 90+ minutes of a match. The average 1.6 goals per game figure decays because of time elapsed. Goals already scored or conceded may tweak the average slightly in one direction or another as a result of competing, scoreline dependent, tactical rearrangements, but a glut of early goals doesn't significantly alter our pregame goal expectancy. Only the passage of time can do that.
The rate at which a team's goal expectancy declines isn't constant. More goals are scored on average in the second half than the first, as teams become more urgent in their efforts to score and fatigue leads to more space. Around 44% of goals arrive in the first half and 56% in the second, and the decay can be adequately described by raising the proportion of time remaining to a power, in an equation of the following form.
Remaining Goal Expectancy = Initial Goal Expectancy x (Proportion of Time Remaining) ^0.84
Imagine Stoke is expected to score an average of 1 goal in a particular match, West Ham away on Monday night, perhaps. By halftime when the proportion of time remaining is very close to 0.5, the remaining goal expectancy can be calculated by inserting these values into the previous formula to give
Remaining Goal Expectancy = 1 x (0.5)^0.84 = approximately 0.56 of a goal.
0.56 of a goal, you may notice, equates to 56% of the initial goal expectation of 1 goal, which nicely fits the observed data. We can repeat this calculation for any minute of the match and also for the opposition. Armed with this information we are just a few repetitive, but simple steps away from being able to describe the likely scoring combinations that will occur in the remainder of the contest.
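The decay formula is easy to wrap in a small function. The sketch below is only an illustration: the name `remaining_goal_expectancy` is hypothetical, and the proportion of time remaining is passed in directly, so how you count added time is left to you.

```python
def remaining_goal_expectancy(initial: float,
                              proportion_remaining: float,
                              exponent: float = 0.84) -> float:
    """Initial goal expectancy scaled by (proportion of time remaining) ^ exponent."""
    if not 0.0 <= proportion_remaining <= 1.0:
        raise ValueError("proportion_remaining must lie between 0 and 1")
    return initial * proportion_remaining ** exponent


# Stoke at halftime: initial expectancy of 1 goal, half the match left
half_time = remaining_goal_expectancy(1.0, 0.5)
print(f"Remaining at halftime: {half_time:.2f}")

# Share of the initial expectancy left at a few illustrative points
for proportion in (1.0, 0.75, 0.5, 0.25, 0.0):
    left = remaining_goal_expectancy(1.0, proportion)
    print(f"{proportion:.2f} of the match remaining -> {left:.2f} of a goal")
```

The same function can be called with the opposition's initial goal expectancy to get their remaining figure at the same point in the match.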
We'll fast forward to the 80th minute to use this accumulated knowledge to begin to construct a flexible and realistic "in running" prediction model. The West Ham/Stoke game was a fairly common type of Premiership contest, where two reasonably well matched sides were separated on the night by little more than home field advantage. An average expectation at kickoff for Stoke would be that they'd score close to one goal and concede just over 1.4 goals to the Hammers. If we insert those numbers into our equation and allow for the likely 4 minutes of added time, we could expect Stoke to average 0.22 of a goal to West Ham's 0.30 in the remainder of the match.
If we now fire up the Poisson calculator we can produce probabilities that Stoke and WHU will score exactly 0,1,2,3 goals and so on, in the last 10+ minutes of Monday's game. Those probabilities are listed below.
The Likelihood of Stoke or WHU Scoring an Exact Number of Goals after the 79th Minute.
| Team | 0 Goals | 1 Goal | 2 Goals | 3 Goals | 4 Goals |
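As a rough way of generating figures of this kind yourself, the sketch below feeds the rounded remaining goal expectancies quoted above (0.22 for Stoke, 0.30 for WHU) into the Poisson formula. Because those inputs are rounded, the output can differ in the third decimal place from probabilities calculated on unrounded expectancies. All names in the code are illustrative.

```python
from math import exp, factorial


def poisson_pmf(goals: int, expectation: float) -> float:
    """Probability of scoring exactly `goals` when the average is `expectation`."""
    return expectation ** goals * exp(-expectation) / factorial(goals)


# Rounded remaining goal expectancies quoted in the text for the last 10+ minutes
remaining_expectancy = {"Stoke": 0.22, "WHU": 0.30}

# One row per team: probability of exactly 0, 1, 2, 3 and 4 further goals
table = {
    team: [poisson_pmf(k, expectancy) for k in range(5)]
    for team, expectancy in remaining_expectancy.items()
}

print("Team   " + "  ".join(f"{k} goals" for k in range(5)))
for team, row in table.items():
    print(f"{team:<6} " + "  ".join(f"{p:7.3f}" for p in row))
```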
We can now begin to accumulate the score combinations that will lead to a final match outcome, bearing in mind that O'Brien had equalised Walters' opening goal for the visitors and the match was currently stalemated. If, as actually happened, neither side scores, the match ends as a draw. The probability of this 0-0 "mini" match is given by multiplying together the individual probabilities of each side failing to score, 0.737 for WHU and 0.806 for Stoke. That outcome has a probability of 0.594, or around 3 times in every 5. A 1-1 in the final "mini" match will also ultimately lead to a draw, as would a 2-2, 3-3 or 4-4 for the optimistic thrill seekers. If we finally total each of these individual, correct score probabilities, we have the likelihood of the currently tied game ending so at the final whistle.
A similar process generates cumulative probability totals for each correct score that leads to either a Stoke City win or a happy Hammers victory.
The Likelihood of Stoke or WHU Gaining any Result from 1-1 after the 79th Minute.
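The totting up described above can be sketched in a few lines of Python. This is an illustration under the same simplifying assumption as the rest of the walk through, namely that the two sides score independently, and it uses the rounded remaining expectancies of 0.30 for WHU and 0.22 for Stoke. The function name `mini_match_outcomes` and the cut off of 10 goals per side are my own choices.

```python
from math import exp, factorial


def poisson_pmf(goals: int, expectation: float) -> float:
    """Probability of scoring exactly `goals` when the average is `expectation`."""
    return expectation ** goals * exp(-expectation) / factorial(goals)


def mini_match_outcomes(home_exp: float, away_exp: float, max_goals: int = 10):
    """Sum correct score probabilities into (home win, draw, away win).

    Assumes independent Poisson scoring and ignores scores above `max_goals`.
    """
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_exp) * poisson_pmf(a, away_exp)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    return home_win, draw, away_win


# Scores are level at 1-1, so the result of the mini match is the match result
whu_win, draw, stoke_win = mini_match_outcomes(0.30, 0.22)
print(f"WHU win {whu_win:.3f}  Draw {draw:.3f}  Stoke win {stoke_win:.3f}")
```

Most of the draw probability comes from the goalless mini match, with the 1-1, 2-2 and higher combinations adding a little on top.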
The above example is conveniently simplified by a current scoreline of 1-1, but teams can also trail or lead, as WHU and Stoke respectively did in this match. However, the process merely becomes slightly more tedious rather than more complex. Stoke's set piece prowess finally reached ground level in the 13th minute when Walters found space in front of decoy runners to crisply dispatch a precisely delivered Whelan corner, an inventive deviation from the Delap assists of old. So if we want to examine the likely match result from, say, the 34th minute, we also have to account for the 1-0 lead held by Stoke.
In this game situation, should Stoke go on to "win" the mini match from the 34th minute onwards, they will bolster their lead and comfortably win the game. In addition, if they merely "draw" the remainder of the match they will also win the entire game because of the 1-0 lead given to them by their mustachioed striker. An actual draw requires WHU to "win" the remaining 60 or so minutes by exactly one goal, and to claim all three points they must "win" it by two or more.
The Likelihood of Stoke or WHU Gaining any Result from 0-1 after the 33rd Minute.
As the scoreline becomes more lopsided, the combinations that ultimately lead to wins, losses or draws also become more diverse. A team which holds a 2 goal cushion can afford to "lose" the remainder of the contest by a single goal and still claim victory. So the totting up procedure becomes more tiresome, although a spreadsheet helps greatly, but the method itself, a Poisson calculation on a suitably decayed goal expectation, stays the same.
As has already been stated, this quick run through does not account for well recognised deficiencies in using Poisson to describe football goal scoring, nor does it allow for the small, but real shifts in emphasis that occur as the scoreline changes. We can, however, test the model's validity by comparing its predictions to the efficient Betfair betting markets.
Below I've plotted the near 100% book prices that were available on Betfair at two minute intervals, along with the predictions from a pure Poisson, during Monday night's Stoke v West Ham match.
Price & Probability Movements During Stoke's 1-1 Draw At WHU (plot annotation: 1-1, O'Brien, 48').
The Betfair prices and the pure Poisson track each other's progress fairly accurately. The under prediction of the draw, inherent in the Poisson, is clearly visible up until the WHU equaliser, and my allegiance to one of the two sides may also be reflected throughout in my choice of initial goal expectations. Also, the slightly increased optimism towards WHU during the half hour in which they trailed isn't captured by the "blind" Poisson, but it is captured by the Betfair traders and is also present in actual data.