Tuning Elo parameters (2009-2010 EPL)
In this example we tune the Elo parameters to improve prediction accuracy. Tuning is performed by grid search over a search space defined for each parameter.
Load packages
[1]:
from ratingslib.app_sports.methods import Predictions, prepare_sports_seasons
from ratingslib.application import SoccerOutcome
from ratingslib.datasets.filenames import get_seasons_dict_footballdata_online
from ratingslib.datasets.parameters import championships
from ratingslib.ratings.elo import Elo
from ratingslib.utils.enums import ratings
from sklearn.naive_bayes import GaussianNB
Set target outcome
[2]:
outcome = SoccerOutcome()
Get the filename from football-data.co.uk for season 2009-2010 (English Premier League).
[3]:
filename = get_seasons_dict_footballdata_online(
    season_start=2009, season_end=2010, championship=championships.PREMIERLEAGUE)
We set the version list, which contains the Elo-Win and Elo-Point versions. Then we create a dictionary with one entry for every combination of the values in the ranges we set for each parameter.
[4]:
version_list = [ratings.ELOWIN, ratings.ELOPOINT]
ratings_dict = Elo.prepare_for_gridsearch_tuning(version_list=version_list,
                                                 k_range=[10, 20],
                                                 ks_range=[100, 200],
                                                 HA_range=[70, 80])
ratings_dict.keys()
[4]:
dict_keys(['EloWin[HA=70_K=10_ks=100]', 'EloPoint[HA=70_K=10_ks=100]', 'EloWin[HA=80_K=10_ks=100]', 'EloPoint[HA=80_K=10_ks=100]', 'EloWin[HA=70_K=10_ks=200]', 'EloPoint[HA=70_K=10_ks=200]', 'EloWin[HA=80_K=10_ks=200]', 'EloPoint[HA=80_K=10_ks=200]', 'EloWin[HA=70_K=20_ks=100]', 'EloPoint[HA=70_K=20_ks=100]', 'EloWin[HA=80_K=20_ks=100]', 'EloPoint[HA=80_K=20_ks=100]', 'EloWin[HA=70_K=20_ks=200]', 'EloPoint[HA=70_K=20_ks=200]', 'EloWin[HA=80_K=20_ks=200]', 'EloPoint[HA=80_K=20_ks=200]'])
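The 16 keys above are the Cartesian product of the two versions and the three two-value ranges (2 x 2 x 2 x 2). As a rough illustration of how such a grid can be enumerated, the sketch below rebuilds the same key names with `itertools.product`. The helper `grid_keys` is hypothetical and is not how ratingslib builds its dictionary; it only mirrors the naming pattern shown in the output.

```python
from itertools import product


def grid_keys(versions, k_range, ks_range, ha_range):
    """Hypothetical helper: list one key string per parameter combination."""
    keys = []
    # The loop order (K, ks, HA, version) is chosen to mirror the key order
    # printed above; it is an assumption about presentation only.
    for k, ks, ha, version in product(k_range, ks_range, ha_range, versions):
        keys.append(f"{version}[HA={ha}_K={k}_ks={ks}]")
    return keys


keys = grid_keys(["EloWin", "EloPoint"], [10, 20], [100, 200], [70, 80])
print(len(keys))
print(keys[0])
```

Adding a third value to any range would grow the grid multiplicatively (for example 2 x 3 x 2 x 2 = 24 configurations), so the search space is worth keeping small when every configuration has to be rated over a full season.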
The ratings in the dataset start from the second match week.
[5]:
data = prepare_sports_seasons(filename,
                              outcome,
                              rating_systems=ratings_dict,
                              start_week=2)
Load season: 2009 - 2010
We test three different methods (RANK, MLE, and the Naive Bayes classifier) and start making predictions from the 4th week. We apply the anchored walk-forward procedure with window size = 1, which means that every week we make predictions using the data from all previous weeks as the training set. For example, for the 4th week the training set consists of the 1st, 2nd and 3rd weeks. The best parameters for each method and for each version are printed to the console.
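The anchored walk-forward scheme described above can be sketched in a few lines of plain Python. The function `anchored_walk_forward` and its arguments are hypothetical names used only for illustration; ratingslib performs the splitting internally, and this sketch assumes weeks are numbered from 1 as in the example in the text.

```python
def anchored_walk_forward(weeks, start_from_week, window_size=1):
    """Yield (train_weeks, test_weeks) pairs.

    "Anchored" means the training set always begins at the first week
    and grows by `window_size` weeks after every prediction step.
    """
    weeks = sorted(weeks)
    first_test = weeks.index(start_from_week)
    for start in range(first_test, len(weeks), window_size):
        train = weeks[:start]                      # all weeks seen so far
        test = weeks[start:start + window_size]    # the next window to predict
        yield train, test


splits = list(anchored_walk_forward(range(1, 7), start_from_week=4))
for train, test in splits:
    print(train, "->", test)
```

With six weeks and predictions starting at week 4, the first split trains on weeks 1 to 3 and predicts week 4, the next trains on weeks 1 to 4 and predicts week 5, and so on.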
[6]:
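prediction_methods = [GaussianNB(), 'MLE', 'RANK']
print()
for predict_with in prediction_methods:
    # Evaluate every configuration in ratings_dict with the current prediction
    # method; the best one per Elo version (by accuracy) is printed.
    best = Predictions(data, outcome, start_from_week=4, print_accuracy_report=False).rs_tuning_params(
        ratings_dict=ratings_dict, predict_with=predict_with,
        metric_name='accuracy')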
=====Prediction method: GaussianNB=====
EloWin[HA=70_K=20_ks=100] 0.5203488372093024
=====Prediction method: GaussianNB=====
EloPoint[HA=70_K=10_ks=200] 0.5290697674418605
=====Prediction method: MLE=====
EloWin[HA=80_K=20_ks=100] 0.5377906976744186
=====Prediction method: MLE=====
EloPoint[HA=80_K=20_ks=100] 0.5465116279069767
=====Prediction method: RANK=====
EloWin[HA=80_K=10_ks=100] 0.49127906976744184
=====Prediction method: RANK=====
EloPoint[HA=70_K=20_ks=100] 0.5087209302325582
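To compare the tuned configurations at a glance, the printed scores can be collected and ranked. The snippet below is a small sketch that copies the six accuracies from the output above by hand (rounded to four decimals); the `results` dictionary is an assumption made for illustration, not an object returned by ratingslib.

```python
# Accuracies copied manually from the console output above (rounded).
results = {
    ("GaussianNB", "EloWin[HA=70_K=20_ks=100]"): 0.5203,
    ("GaussianNB", "EloPoint[HA=70_K=10_ks=200]"): 0.5291,
    ("MLE", "EloWin[HA=80_K=20_ks=100]"): 0.5378,
    ("MLE", "EloPoint[HA=80_K=20_ks=100]"): 0.5465,
    ("RANK", "EloWin[HA=80_K=10_ks=100]"): 0.4913,
    ("RANK", "EloPoint[HA=70_K=20_ks=100]"): 0.5087,
}

# Rank all (method, configuration) pairs from highest to lowest accuracy.
ranked = sorted(results.items(), key=lambda item: item[1], reverse=True)
best_key, best_acc = ranked[0]

for (method, config), acc in ranked:
    print(f"{method:<10} {config:<30} {acc:.4f}")
```

In this run the highest accuracy comes from MLE with the Elo-Point version, and for each of the three prediction methods the best Elo-Point configuration scores above the best Elo-Win configuration.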