I prefer to measure the usefulness of a model by how it performs on data it hasn't previously seen, rather than by how well it describes outcomes that were used in the construction of the model.
Overfitting to quirks that are unique to the training set may give the illusion of usefulness, but the model may then perform poorly when applied to new observations. And while description is a handy attribute for a model, prediction is arguably much more useful.
We'll need a model to play with.
So I've constructed the simplest of expected goals models, using just x, y co-ordinates and shot type (foot or header), and I've restricted attempts to chances from open play. The dependent variable is whether or not a goal was scored.
That gives three independent variables, x, y and shot type, from which we can produce a predicted probability that a particular attempt will result in a goal.
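As a sketch of the kind of regression involved, here's a logistic model with x, y and shot type as inputs. The data and coefficients are simulated for illustration, not the article's, and the Newton-method fitting helper is just one way of estimating such a model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated attempts -- purely illustrative, not the article's data.
n = 5000
x = rng.uniform(0, 40, n)           # distance from the goal line (metres)
y = rng.uniform(-25, 25, n)         # lateral offset from goal centre
header = rng.integers(0, 2, n)      # 0 = foot, 1 = header

# Simulate outcomes: scoring probability falls with distance and is
# lower for headers (made-up coefficients).
true_p = 1 / (1 + np.exp(-(1.0 - 0.12 * np.hypot(x, y) - 0.8 * header)))
goal = (rng.random(n) < true_p).astype(float)

def fit_logistic(X, t, iters=25):
    """Fit intercept + coefficients by Newton's method (plain IRLS)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-Xb @ w))
        w += np.linalg.solve(Xb.T @ (Xb * (p * (1 - p))[:, None]),
                             Xb.T @ (t - p))
    return w

X = np.column_stack([x, y, header])
w = fit_logistic(X, goal)

# Per-attempt scoring probabilities; their sum is the model's
# expected goals total for the sample.
xg = 1 / (1 + np.exp(-np.column_stack([np.ones(n), X]) @ w))
```

A useful property of a logistic regression fitted this way is that, on its own training data, the expected goals total equals the actual goal count, which is why out-of-sample agreement is the more interesting test.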
The training data uses Premier League attempts from one season, and the regression is then let loose on four months' worth of data from the following season.
Initially the model seems promising. 369 open play goals were scored in the new batch of 4,000+ shots, and the cumulative expected goals total for all attempts, using the previous season's regression applied to the shot type and position of these attempts, predicted that 368 goals would be scored.
Splitting the data into ten equal batches of shots, ranging from the attempts with the lowest predicted chance of resulting in a goal up to the highest, gives an equally comforting plot.
The relationship between the cumulative expected number of goals in each bin and the reality on the pitch is strong.
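The binning step above can be sketched like this, with simulated probabilities and outcomes standing in for the real attempts:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated out-of-sample attempts: a model probability per shot and an
# outcome drawn to be consistent with it -- illustrative only.
n = 4010
xg = rng.beta(2, 12, n)             # per-shot scoring probabilities
goal = rng.random(n) < xg           # True where a goal was "scored"

# Sort attempts by predicted probability and cut into ten equal bins,
# from least to most likely to produce a goal.
order = np.argsort(xg)
bins = np.array_split(order, 10)

expected = np.array([xg[b].sum() for b in bins])
actual = np.array([goal[b].sum() for b in bins])
for i in range(10):
    print(f"bin {i + 1}: {expected[i]:6.1f} expected, {int(actual[i]):3d} actual")
```

Plotting `expected` against `actual` per bin gives the kind of calibration picture described here; because the outcomes were simulated from the probabilities themselves, this toy version will agree closely by construction.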
Everything looks fine, big R^2, near perfect agreement with expected and actual totals. So we have a great model?
However, if we look at the individual bins, it's not quite as clear cut.
The bin with the third lowest predicted scoring probability had around 400 attempts and around 12 predicted goals, compared to the 20 that were actually scored. The sixth bin had 26 expected goals and 21 actual.
Nothing too alarming, perhaps. We shouldn't expect prediction to match reality perfectly, but where there is divergence we should check whether it is extreme enough to suggest that the groups are substantially different, or simply the product of chance.
To test this I took the expected and actual figures for goals and non-goals in all ten groups, each comprising 401 attempts, and checked whether the differences were statistically significant or just down to chance.
Unfortunately for this model, there was only a 2% probability that the discrepancies between the actual numbers of goals and non-goals and the model's predictions across the ten bins had arisen by chance.
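One way to run that kind of check is a Hosmer-Lemeshow style chi-squared statistic over the ten bins, counting goals and non-goals in each. The per-bin figures below are invented for illustration (only the third and sixth bins echo the numbers quoted above), so the resulting statistic won't reproduce the 2% finding:

```python
import numpy as np

# Expected and actual goals per bin of 401 attempts. Illustrative
# figures -- only bins 3 (12 vs 20) and 6 (26 vs 21) match the article.
n_per_bin = 401
expected_goals = np.array([4, 8, 12, 17, 21, 26, 32, 41, 60, 147.0])
actual_goals = np.array([6, 9, 20, 15, 24, 21, 35, 38, 55, 146.0])

# Each bin contributes two cells to the statistic: goals and non-goals.
exp_cells = np.concatenate([expected_goals, n_per_bin - expected_goals])
act_cells = np.concatenate([actual_goals, n_per_bin - actual_goals])
chi2 = ((act_cells - exp_cells) ** 2 / exp_cells).sum()

# On out-of-sample data the statistic is conventionally compared to a
# chi-squared distribution with one degree of freedom per bin; the 5%
# critical value for 10 degrees of freedom is about 18.31.
print(f"chi-squared statistic: {chi2:.2f}")
```

A statistic beyond the critical value says the gap between prediction and reality across the bins is unlikely to be chance alone, which is the form of evidence described above.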
Bottom line: the model was probably a poor fit when used on future data, despite ticking a few initial boxes.
Next, I divided the data into shots and headers and ran separate regressions on the training set to produce two models: one for shots that simply used the x, y co-ordinates as independent variables, and likewise one for headers.
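The split can be sketched as follows, again on simulated data, with a Newton-method helper standing in for whichever regression routine is actually used:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated attempts -- illustrative only, not the article's data.
n = 5000
x = rng.uniform(0, 40, n)           # distance from the goal line (metres)
y = rng.uniform(-25, 25, n)         # lateral offset from goal centre
header = rng.integers(0, 2, n)      # 0 = foot, 1 = header
true_p = 1 / (1 + np.exp(-(1.0 - 0.12 * np.hypot(x, y) - 0.8 * header)))
goal = (rng.random(n) < true_p).astype(float)

def fit_logistic(X, t, iters=25):
    """Fit intercept + coefficients by Newton's method (plain IRLS)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-Xb @ w))
        w += np.linalg.solve(Xb.T @ (Xb * (p * (1 - p))[:, None]),
                             Xb.T @ (t - p))
    return w

# One model per shot type, each using only the x, y co-ordinates.
XY = np.column_stack([x, y])
feet = header == 0
w_foot = fit_logistic(XY[feet], goal[feet])
w_head = fit_logistic(XY[~feet], goal[~feet])

# Scoring probabilities from each model for its own attempts.
xg_foot = 1 / (1 + np.exp(-np.column_stack([np.ones(feet.sum()),
                                            XY[feet]]) @ w_foot))
xg_head = 1 / (1 + np.exp(-np.column_stack([np.ones((~feet).sum()),
                                            XY[~feet]]) @ w_head))
```

Splitting like this lets the co-ordinate coefficients differ between feet and headers, rather than forcing a single shared shape with only an intercept shift for shot type.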
Both of these new models passed the first two tests: they predicted goal totals in the out of sample data that agreed very well with the actual number of goals scored, and they plotted well when binned into steadily rising scoring probabilities.
And unlike the composite model that used shot type as an independent variable, both of these models produced binned expected and actual differences that could, statistically, be attributed to chance alone.
The headers model in particular had differences between prediction and reality that were highly likely to be simply down to random chance rather than a poorly fitting model.
There are problems with this approach, not least sample size, but it may serve as an additional check when expected goals models are released on new data.