Thursday, 15 June 2017

Early Season Strength of Schedule

With the major European leagues currently enjoying their summer holidays, it is left to a handful of competitions to provide club-based action until early August.

One such league is Brazil's Serie A, a fascinating mix of player and managerial churn, exciting, skillful youngsters paired with former internationals slowly winding down their illustrious careers, and lots of shooting from distance.

Tonight sees the completion of week seven of the twenty team league, so while we have accumulated some new information about the 2017/18 version of teams such as Santos, Sao Paulo and Corinthians, and lesser known sides such as Gremio and Bahia, that information comes courtesy of an unbalanced schedule.

Prior to week seven, Flamengo had played three of the current bottom four and no side from the top half of the table, whereas Vasco da Gama had faced the current top two and only two sides outside the top ten.

The challenges faced by these two sides were likely to vary in their degree of difficulty.

Delving deeper into each side's most recent games, including matches from 2016/17, may be a more reliable indicator of their respective future prospects, but it is understandable that a six game season to date also invites comment in isolation.

Predicting the future arc of a team's season is always welcome, but celebrating achievement over a shorter time frame, even if some of it has come from a sprinkling of unsustainable randomness, also deserves attention.

How can advanced stats and strength of schedule adjustments assist?

It's natural to look firstly at the record of the side in question, but it is their opponents that possess the richest seam of data from 2017/18's fledgling season.

Vasco has played Palmeiras, Bahia, Sport, Fluminense, Corinthians and Gremio prior to last night, and in turn each of their opponents has also played five other opponents in addition to Vasco.

Combined, Vasco's opponents have played 36 games, nearly a full season, and have played every side in Serie A at least once, bar Corinthians.

We have a ton of accumulated data, from goals to expected goals, for Vasco's opponents, but only six games of data for Vasco themselves, and the same is true for the remaining 19 teams.

It's natural to expect that even this limited, if recent, achievement contains some signal relating to future performance. Ben Cronin over at Pinnacle has written this article about the correlations between Premier League position after six games and final position, and the FT's John Burn-Murdoch also tweeted this excellent visualisation correlating league position during the 2013/14 season with finishing position in May.

To adjust for strength of schedule, we might take expected goal differential, rather than league position, as the performance related output for each team and utilise the interrelated collateral form lines that are created after a few weeks of the season.

Team A may not have played team B yet, but they may have played team C, who have played team B.

We are left with 20 simultaneous equations, with a side's opponents on one side and their actual expected goal differential output on the other. Solve these and we have new expected goal differentials that more fully represent the difficulty of each team's schedule.

In short, it is the basis for so-called power ratings.
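For anyone wanting to try this at home, here is a minimal sketch of one common way to set up and solve such a system. It assumes the simplest form of power rating, where a side's adjusted figure equals its own average expected goal differential plus the average adjusted figure of the opponents it has faced. That formulation, the function name and the four team fixture list are all illustrative assumptions rather than the exact method or data behind the rankings in this post.

```python
import numpy as np


def schedule_adjusted_ratings(teams, games):
    """Return (raw, adjusted) per-game differentials keyed by team.

    games: list of (team_a, team_b, xg_a, xg_b) tuples.
    Assumes every team has played at least one game and that the
    fixture list links all teams together, directly or indirectly.
    """
    idx = {team: i for i, team in enumerate(teams)}
    n = len(teams)
    total_diff = np.zeros(n)     # summed xG differential per team
    played = np.zeros(n)         # games played per team
    meetings = np.zeros((n, n))  # meetings[i, j] = games between i and j

    for a, b, xg_a, xg_b in games:
        i, j = idx[a], idx[b]
        total_diff[i] += xg_a - xg_b
        total_diff[j] += xg_b - xg_a
        played[i] += 1
        played[j] += 1
        meetings[i, j] += 1
        meetings[j, i] += 1

    raw = total_diff / played
    opp_share = meetings / played[:, None]

    # One equation per team:
    #   rating_i = raw_i + average rating of i's opponents
    # which rearranges to (I - opp_share) @ ratings = raw.
    lhs = np.eye(n) - opp_share

    # The system only fixes ratings up to an additive constant, so take
    # the least squares solution and centre it on a league average of zero.
    ratings, *_ = np.linalg.lstsq(lhs, raw, rcond=None)
    ratings -= ratings.mean()

    return dict(zip(teams, raw)), dict(zip(teams, ratings))


# Made-up expected goals figures for a four team mini league.
toy_teams = ["A", "B", "C", "D"]
toy_games = [
    ("A", "B", 1.8, 0.9),
    ("A", "C", 1.2, 1.0),
    ("B", "D", 1.1, 1.4),
    ("C", "D", 0.7, 1.6),
]

raw, adjusted = schedule_adjusted_ratings(toy_teams, toy_games)
for team in sorted(toy_teams, key=adjusted.get, reverse=True):
    print(f"{team}: raw {raw[team]:+.2f}, adjusted {adjusted[team]:+.2f}")
```

Team A may not have played team D in this toy schedule, but the two are linked through B and C, which is exactly the collateral form the equations exploit.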

Here's how Serie A teams were ranked by expected goals differential prior to week seven and how that ranking changed when we allowed for the sometimes heavily unbalanced schedules played.

Vasco were ranked 13th on expected goal differential, but jumped to 9th, inside the top 10, when their harsh early schedule was taken into account.

Ponte Preta dropped four places to 15th in view of an apparently benign group of initial opponents.

In theory this seems fine, but does schedule strength add anything to our knowledge of a side going forward if we choose to limit ourselves to data from just this single season?

As Ben and John have admirably demonstrated, there is a correlation between league position at various stages of the season and finishing position.

Here's a limited (due to workload) example from a previous Premier League season using simply goal differential rather than expected goals.

13 games into the 2013/14 season, Spurs were ranked 13th by goal difference, 10th when strength of previous schedule was applied and 9th in the actual table. They finished 6th.

Their position in the table after 13 games best predicted their finishing spot, followed by strength of schedule adjusted goal difference and lastly actual goal difference.

Across the league as a whole, though, strength of schedule adjusted goal difference from week 13 did best of the three. Ranked correlations with final position were 0.77 for league position and for actual goal difference after 13 games, rising to 0.80 when strength of schedule corrections were applied and the teams re-ranked after 13 matches each.
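As a rough illustration of how such a ranked correlation can be calculated, the sketch below assumes the figure in question is Spearman's rank correlation and that there are no tied positions. The function name and the five team rankings are invented for the example and are not the 2013/14 numbers.

```python
def spearman_rank_correlation(ranks_a, ranks_b):
    """Spearman's rho for two rankings of the same teams, assuming no ties."""
    if len(ranks_a) != len(ranks_b):
        raise ValueError("both rankings must cover the same teams")
    n = len(ranks_a)
    # Sum of squared differences between each team's two rank positions.
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks_a, ranks_b))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Invented rankings: position after 13 games versus final position.
early_rank = [1, 2, 3, 4, 5]
final_rank = [2, 1, 3, 5, 4]

print(round(spearman_rank_correlation(early_rank, final_rank), 2))
```

Running the same function once with league position, once with raw goal difference rank and once with the schedule adjusted rank gives the three figures being compared.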

In short, there is signal in limited early season data and as a means of predicting final finishing position there may be some improvement if we rank by a schedule adjusted performance indicator.

All Brazilian data from InfAppoGol
