Prediction of soccer outcome by combining rating and ML methods (2009/10 to 2017/18 EPL)

In this example, we assess the predictive performance of each rating system with two different prediction methods. The target class is the final outcome of soccer matches in the English Premier League (seasons 2009/10 to 2017/18). The prediction methods are: 1. RANK (based on rankings) and 2. MLE (based on probabilities). For the predictions we apply the walk-forward procedure.

Load packages

[1]:
import ratingslib.ratings as rl
from ratingslib.app_sports.methods import (Predictions, prepare_sports_seasons,
                                        rating_norm_features)
from ratingslib.application import SoccerOutcome
from ratingslib.datasets.filenames import get_seasons_dict_footballdata_online
from ratingslib.datasets.parameters import championships, stats
from ratingslib.utils.enums import ratings

Set target outcome

[2]:
outcome = SoccerOutcome()

Get filenames from football-data.co.uk for seasons 2009-2018 (English Premier League).

[3]:
filenames_dict = get_seasons_dict_footballdata_online(
    season_start=2009, season_end=2018,
    championship=championships.PREMIERLEAGUE)

We create a list of rating methods and then convert it to a dictionary. The parameters are chosen as follows:

* For Massey, a minimum limit of 20 games has been set before the rating of teams starts. This number was selected to provide enough games, and it ensures that the games graph is connected.
* For Markov, the damping factor b is set to 0.85.
* For Elo, the parameters are those suggested by FIFA, K=40 and ks=400, without taking the home-field advantage into account (HA=0).
* For WinLoss and Keener, normalization is employed to produce fairer ratings, since teams may have played a different number of games (due to postponed or rescheduled matches).
* For OffenseDefense, the tolerance is set to 0.0001.

[4]:
ratings_list = [
    rl.Winloss(normalization=True),
    rl.Colley(),
    rl.Massey(data_limit=20),
    rl.Elo(version=ratings.ELOWIN, K=40, ks=400, HA=0,
           starting_point=0),
    rl.Elo(version=ratings.ELOPOINT, K=40, ks=400, HA=0,
           starting_point=0),
    rl.Keener(normalization=True),
    rl.OffenseDefense(tol=0.0001),
    rl.Markov(b=0.85, stats_markov_dict=stats.STATS_MARKOV_DICT),
    rl.AccuRate()
]
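
To make the Elo parameters above more concrete, here is a minimal sketch of the textbook Elo update with K=40, ks=400 and HA=0. It is an illustration only: the function names `elo_expected` and `elo_update` are hypothetical, and the sketch is not taken from the ratingslib source, whose EloWin and EloPoint versions may differ in detail.

```python
def elo_expected(r_home, r_away, ks=400.0, HA=0.0):
    """Textbook logistic expectation of the home team's result (between 0 and 1)."""
    return 1.0 / (1.0 + 10.0 ** (-(r_home + HA - r_away) / ks))


def elo_update(r_home, r_away, result_home, K=40.0, ks=400.0, HA=0.0):
    """Return the updated (home, away) ratings after one match.

    result_home is 1 for a home win, 0.5 for a draw and 0 for a home loss.
    """
    exp_home = elo_expected(r_home, r_away, ks=ks, HA=HA)
    delta = K * (result_home - exp_home)
    # The update is zero-sum: what the home team gains, the away team loses.
    return r_home + delta, r_away - delta


# Both teams start at 0 (starting_point=0) and the home team wins.
home, away = elo_update(0.0, 0.0, result_home=1.0)
print(home, away)  # 20.0 -20.0
```

With equal ratings and no home advantage the expected result is 0.5, so a win moves each team by K/2 = 20 points in opposite directions.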

The ratings in the dataset start from the second match week.

[5]:
data = prepare_sports_seasons(filenames_dict,
                              outcome,
                              rating_systems=ratings_list,
                              start_week=2)
Load season: 2009 - 2010
Load season: 2010 - 2011
Load season: 2011 - 2012
Load season: 2012 - 2013
Load season: 2013 - 2014
Load season: 2014 - 2015
Load season: 2015 - 2016
Load season: 2016 - 2017
Load season: 2017 - 2018

We will use the normalized rating values as ML features, so we create the feature list.

[6]:
features_names_list = rating_norm_features(ratings_list)
features_names_list
[6]:
[['HratingnormWinloss[normalization=True]',
  'AratingnormWinloss[normalization=True]'],
 ['HratingnormColley', 'AratingnormColley'],
 ['HratingnormMassey[data_limit=20]', 'AratingnormMassey[data_limit=20]'],
 ['HratingnormEloWin[HA=0_K=40_ks=400]',
  'AratingnormEloWin[HA=0_K=40_ks=400]'],
 ['HratingnormEloPoint[HA=0_K=40_ks=400]',
  'AratingnormEloPoint[HA=0_K=40_ks=400]'],
 ['HratingnormKeener[normalization=True]',
  'AratingnormKeener[normalization=True]'],
 ['HratingnormOffenseDefense[tol=0.0001]',
  'AratingnormOffenseDefense[tol=0.0001]'],
 ['HratingnormMarkov[b=0.85]', 'AratingnormMarkov[b=0.85]'],
 ['HratingnormAccuRATE', 'AratingnormAccuRATE']]

We test two different methods, MLE and RANK, and we start making predictions from the 4th week. We apply the anchored walk-forward procedure with window size = 1, which means that every week we make predictions using the data of all previous weeks as the training set. For example, for the 4th week the training set consists of the 1st, 2nd and 3rd weeks. Note that the walk-forward procedure is restarted in every season.
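
The anchored walk-forward splits described above can be sketched in plain Python. This is only an illustration of the idea under the stated settings (start from week 4, window size 1); the helper `anchored_walk_forward` is a hypothetical name and is not part of ratingslib, which handles the splitting internally.

```python
def anchored_walk_forward(weeks, start_from_week=4, window_size=1):
    """Yield (train_weeks, test_weeks) pairs for a single season.

    The training set is anchored at the first week and grows over time;
    the test set is the next `window_size` week(s).
    """
    weeks = sorted(weeks)
    test_candidates = [w for w in weeks if w >= start_from_week]
    for i in range(0, len(test_candidates), window_size):
        test = test_candidates[i:i + window_size]
        # All weeks strictly before the first test week form the training set.
        train = [w for w in weeks if w < test[0]]
        yield train, test


# A short season of six match weeks, for illustration.
splits = list(anchored_walk_forward(range(1, 7)))
for train, test in splits:
    print(train, "->", test)
# [1, 2, 3] -> [4]
# [1, 2, 3, 4] -> [5]
# [1, 2, 3, 4, 5] -> [6]
```

Because the procedure restarts in every season, a generator like this would be called once per season rather than on the concatenated data.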

[7]:
results = Predictions(data, outcome, start_from_week=4).rs_pred_parallel(
    rating_systems=ratings_list,
    pred_methods_list=['MLE', 'RANK'])


=====MLE=====


=====Accuracy results=====

                             Accuracy  Correct Games  Wrong Games  Total Games
Winloss[normalization=True]  0.506675           1594         1552         3146
Colley                       0.510490           1606         1540         3146
Massey[data_limit=20]        0.517880           1593         1483         3076
EloWin[HA=0_K=40_ks=400]     0.513032           1614         1532         3146
EloPoint[HA=0_K=40_ks=400]   0.513032           1614         1532         3146
Keener[normalization=True]   0.515257           1621         1525         3146
OffenseDefense[tol=0.0001]   0.504768           1588         1558         3146
Markov[b=0.85]               0.507947           1598         1548         3146
AccuRATE                     0.514304           1618         1528         3146


=====RANK=====


=====Accuracy results=====

                             Accuracy  Correct Games  Wrong Games  Total Games
Winloss[normalization=True]  0.487921           1535         1611         3146
Colley                       0.479339           1508         1638         3146
Massey[data_limit=20]        0.488557           1537         1609         3146
EloWin[HA=0_K=40_ks=400]     0.480292           1511         1635         3146
EloPoint[HA=0_K=40_ks=400]   0.487921           1535         1611         3146
Keener[normalization=True]   0.485378           1527         1619         3146
OffenseDefense[tol=0.0001]   0.486014           1529         1617         3146
Markov[b=0.85]               0.494278           1555         1591         3146
AccuRATE                     0.489828           1541         1605         3146
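
As a quick sanity check, the Accuracy column can be recomputed as Correct Games divided by Total Games. The sketch below copies a few (correct, total) pairs from the tables above by hand; the dictionary names are our own and are not produced by ratingslib. Note that Massey under MLE is evaluated on 3076 games rather than 3146, which appears consistent with the data_limit=20 setting, so its accuracy is not computed over exactly the same set of games as the other systems.

```python
# (correct games, total games) copied from the tables above.
mle_counts = {
    "Massey[data_limit=20]": (1593, 3076),
    "Keener[normalization=True]": (1621, 3146),
    "AccuRATE": (1618, 3146),
}
rank_counts = {
    "Markov[b=0.85]": (1555, 3146),
    "AccuRATE": (1541, 3146),
}


def accuracy(correct, total):
    """Share of correctly predicted games."""
    return correct / total


mle_acc = {name: accuracy(c, t) for name, (c, t) in mle_counts.items()}
rank_acc = {name: accuracy(c, t) for name, (c, t) in rank_counts.items()}

for name, acc in mle_acc.items():
    print(f"MLE  {name}: {acc:.6f}")
for name, acc in rank_acc.items():
    print(f"RANK {name}: {acc:.6f}")
```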