Tuning KNN hyper-parameters (2009-2010 EPL)

In this example we tune the hyper-parameters of a KNN classifier to improve the F1 score. Tuning is performed by grid search over a predefined search space for each hyper-parameter.
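As a rough illustration of what grid search means here, the sketch below tries every combination in a search space and keeps the best-scoring one. The names `grid_search`, `search_space` and `score_fn` are hypothetical and not part of ratingslib; the toy scoring function only stands in for a real evaluation such as the F1 score.

```python
from itertools import product


def grid_search(search_space, score_fn):
    """Return (best_params, best_score) over all combinations in search_space.

    search_space: dict mapping a parameter name to a list of candidate values.
    score_fn: callable taking a params dict and returning a score (higher is better).
    """
    names = list(search_space)
    best_params, best_score = None, float("-inf")
    # Enumerate the Cartesian product of all candidate values.
    for values in product(*(search_space[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy scorer (an assumption for illustration): pretends 9 neighbors is ideal.
def toy_score(params):
    return -abs(params["n_neighbors"] - 9)


best_params, best_score = grid_search({"n_neighbors": [7, 9, 11]}, toy_score)
print(best_params, best_score)
```

In the notebook itself the search space is a single hyper-parameter (the number of neighbors), so the grid reduces to a list of candidate classifiers.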

Load packages

[1]:

from ratingslib.app_sports.methods import (Predictions, prepare_sports_seasons,
                                        rating_norm_features)
from ratingslib.application import SoccerOutcome
from ratingslib.datasets.filenames import get_seasons_dict_footballdata_online
from ratingslib.datasets.parameters import championships
from ratingslib.ratings.colley import Colley
from sklearn.neighbors import KNeighborsClassifier

Set target outcome

[2]:

outcome = SoccerOutcome()

Get the filename from football-data.co.uk for season 2009-2010 (English Premier League).

[3]:

filename = get_seasons_dict_footballdata_online(
    season_start=2009, season_end=2010, championship=championships.PREMIERLEAGUE)

We create a rating system instance. In this example, we have chosen the Colley method.

[4]:

colley = Colley()
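For readers unfamiliar with the Colley method, the following is a minimal standalone sketch of the idea, not the ratingslib implementation. It builds the Colley system C r = b, where C has 2 + (games played) on the diagonal and minus the number of meetings off the diagonal, and b_i = 1 + (wins_i - losses_i) / 2. The function names and the match tuple layout are hypothetical, and treating a draw as a game played that leaves wins minus losses unchanged is an assumption of this sketch.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def colley_ratings(teams, matches):
    """matches: iterable of (home, away, home_goals, away_goals) tuples."""
    idx = {t: i for i, t in enumerate(teams)}
    n = len(teams)
    c = [[2.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    b = [1.0] * n
    for home, away, hg, ag in matches:
        h, a = idx[home], idx[away]
        c[h][h] += 1  # one more game played by each side
        c[a][a] += 1
        c[h][a] -= 1  # one more meeting between the two teams
        c[a][h] -= 1
        if hg > ag:
            b[h] += 0.5
            b[a] -= 0.5
        elif ag > hg:
            b[a] += 0.5
            b[h] -= 0.5
        # Draw: assumed to leave wins minus losses unchanged.
    return dict(zip(teams, solve_linear(c, b)))


ratings = colley_ratings(
    ["A", "B", "C"],
    [("A", "B", 2, 0), ("B", "C", 1, 0), ("A", "C", 3, 1)],
)
print({team: round(r, 3) for team, r in ratings.items()})
```

Colley ratings average 0.5 across teams, which is one reason they are convenient as normalized input features for a classifier.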

The ratings in the dataset start from the second match week.

[5]:

data = prepare_sports_seasons(filename,
                              outcome,
                              rating_systems=colley,
                              start_week=2)

Load season: 2009 - 2010
100.0%

We test the KNN classifier with 7, 9, 11, 13, 15 and 17 neighbors, and we start making predictions from the 4th week. We apply the anchored walk-forward procedure with window size = 1, which means that each week's predictions use the data of all previous weeks as the training set. For example, for the 4th week the training set consists of the 1st, 2nd and 3rd weeks. The best parameters for each method and for each version are printed in the console.
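The anchored walk-forward scheme described here can be sketched independently of ratingslib. The generator below is a hypothetical helper, not the library's API: the training set always starts at the first week (the anchor) and grows by one window per step. Interpreting the window size as the number of weeks predicted per step is an assumption.

```python
def anchored_walk_forward(weeks, start_from_week, window_size=1):
    """Yield (train_weeks, test_weeks) pairs with an expanding training set."""
    weeks = sorted(weeks)
    test_start = start_from_week
    last_week = weeks[-1]
    while test_start <= last_week:
        # Train on every week before the current test window.
        train = [w for w in weeks if w < test_start]
        test = [w for w in weeks if test_start <= w < test_start + window_size]
        if train and test:
            yield train, test
        test_start += window_size


splits = list(anchored_walk_forward(range(1, 7), start_from_week=4))
for train, test in splits:
    print("train:", train, "test:", test)
# train: [1, 2, 3] test: [4]
# train: [1, 2, 3, 4] test: [5]
# train: [1, 2, 3, 4, 5] test: [6]
```

Because the training set only ever contains weeks that precede the test week, no future match results leak into the model.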

[6]:

features_names = rating_norm_features(colley)
clf_list = [KNeighborsClassifier(n_neighbors=n) for n in range(7, 19, 2)]
best = Predictions(data, outcome, start_from_week=4,
                   print_accuracy_report=False).ml_tuning_params(
    clf_list=clf_list,
    features_names=features_names,
    metric_name='f1',
    average='weighted')



=====Classifier: KNeighborsClassifier=====
KNeighborsClassifier(n_neighbors=7)-[features: HratingnormColley AratingnormColley] 0.4555210471051302
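The tuning call above requests `metric_name='f1'` with `average='weighted'`, so the reported value is presumably a weighted F1 score. As a minimal sketch of that metric in plain Python (the function name `weighted_f1` and the toy labels are illustrative only), the per-class F1 scores are averaged with weights proportional to each class's number of true instances:

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Support-weighted average of per-class F1 scores."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        # Weight each class by its share of the true labels.
        score += f1 * count / total
    return score


# Toy labels: H = home win, D = draw, A = away win.
y_true = ["H", "H", "D", "A"]
y_pred = ["H", "D", "D", "A"]
score = weighted_f1(y_true, y_pred)
print(round(score, 4))  # 0.75
```

Weighting by support matters for soccer outcomes because the classes are imbalanced: home wins are typically more frequent than draws or away wins.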