{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Prediction of soccer outcome by combining rating and ML methods (2009/10 to 2017/18 EPL)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example, we assess the predictive performance of each rating system with two different prediction methods. The target class is the final outcome of soccer matches in the English Premier League (2009-2018 seasons). The prediction methods are: 1. RANK (based on rankings) and 2. MLE (based on probabilities). For the predictions we apply the walk-forward procedure." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load packages" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import ratingslib.ratings as rl\n", "from ratingslib.app_sports.methods import (Predictions, prepare_sports_seasons,\n", " rating_norm_features)\n", "from ratingslib.application import SoccerOutcome\n", "from ratingslib.datasets.filenames import get_seasons_dict_footballdata_online\n", "from ratingslib.datasets.parameters import championships, stats\n", "from ratingslib.utils.enums import ratings" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set target outcome" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "outcome = SoccerOutcome()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Get filenames from football-data.co.uk for seasons 2009-2018 (English Premier League)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "filenames_dict = get_seasons_dict_footballdata_online(\n", " season_start=2009, season_end=2018,\n", " championship=championships.PREMIERLEAGUE)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We create a list of rating methods and then we convert it to a dictionary.\n", "* For Massey a minimum limit of 20 games has been set to start the rating of teams. 
This number has been selected to provide enough games, and it ensures that the games graph is connected.\n", "* For Markov, the damping factor b is set to 0.85.\n", "* For Elo, the parameters are those suggested by FIFA (K=40, ks=400), without taking the home-field advantage into account (HA=0).\n", "* For WinLoss and Keener, normalization is employed to produce fairer ratings, since teams may have played a different number of games (due to postponed or rescheduled matches).\n", "* For OffenseDefense, the tolerance is set to 0.0001." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "ratings_list = [\n", " rl.Winloss(normalization=True),\n", " rl.Colley(),\n", " rl.Massey(data_limit=20),\n", " rl.Elo(version=ratings.ELOWIN, K=40, ks=400, HA=0,\n", " starting_point=0),\n", " rl.Elo(version=ratings.ELOPOINT, K=40, ks=400, HA=0,\n", " starting_point=0),\n", " rl.Keener(normalization=True),\n", " rl.OffenseDefense(tol=0.0001),\n", " rl.Markov(b=0.85, stats_markov_dict=stats.STATS_MARKOV_DICT),\n", " rl.AccuRate()\n", "]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ratings in the dataset start from the second match week." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Load season: 2009 - 2010\n", "2.9%5.7%8.6%11.4%14.3%17.1%20.0%22.9%25.7%28.6%31.4%34.3%37.1%40.0%42.9%45.7%48.6%51.4%54.3%57.1%60.0%62.9%65.7%68.6%71.4%74.3%77.1%80.0%82.9%85.7%88.6%91.4%94.3%97.1%100.0%\n", "Load season: 2010 - 2011\n", "2.8%5.6%8.3%11.1%13.9%16.7%19.4%22.2%25.0%27.8%30.6%33.3%36.1%38.9%41.7%44.4%47.2%50.0%52.8%55.6%58.3%61.1%63.9%66.7%69.4%72.2%75.0%77.8%80.6%83.3%86.1%88.9%91.7%94.4%97.2%100.0%\n", "Load season: 2011 - 2012\n", "2.9%5.7%8.6%11.4%14.3%17.1%20.0%22.9%25.7%28.6%31.4%34.3%37.1%40.0%42.9%45.7%48.6%51.4%54.3%57.1%60.0%62.9%65.7%68.6%71.4%74.3%77.1%80.0%82.9%85.7%88.6%91.4%94.3%97.1%100.0%\n", "Load season: 2012 - 2013\n", "2.9%5.7%8.6%11.4%14.3%17.1%20.0%22.9%25.7%28.6%31.4%34.3%37.1%40.0%42.9%45.7%48.6%51.4%54.3%57.1%60.0%62.9%65.7%68.6%71.4%74.3%77.1%80.0%82.9%85.7%88.6%91.4%94.3%97.1%100.0%\n", "Load season: 2013 - 2014\n", "3.0%6.1%9.1%12.1%15.2%18.2%21.2%24.2%27.3%30.3%33.3%36.4%39.4%42.4%45.5%48.5%51.5%54.5%57.6%60.6%63.6%66.7%69.7%72.7%75.8%78.8%81.8%84.8%87.9%90.9%93.9%97.0%100.0%\n", "Load season: 2014 - 2015\n", "3.0%6.1%9.1%12.1%15.2%18.2%21.2%24.2%27.3%30.3%33.3%36.4%39.4%42.4%45.5%48.5%51.5%54.5%57.6%60.6%63.6%66.7%69.7%72.7%75.8%78.8%81.8%84.8%87.9%90.9%93.9%97.0%100.0%\n", "Load season: 2015 - 2016\n", "2.9%5.7%8.6%11.4%14.3%17.1%20.0%22.9%25.7%28.6%31.4%34.3%37.1%40.0%42.9%45.7%48.6%51.4%54.3%57.1%60.0%62.9%65.7%68.6%71.4%74.3%77.1%80.0%82.9%85.7%88.6%91.4%94.3%97.1%100.0%\n", "Load season: 2016 - 2017\n", "2.9%5.9%8.8%11.8%14.7%17.6%20.6%23.5%26.5%29.4%32.4%35.3%38.2%41.2%44.1%47.1%50.0%52.9%55.9%58.8%61.8%64.7%67.6%70.6%73.5%76.5%79.4%82.4%85.3%88.2%91.2%94.1%97.1%100.0%\n", "Load season: 2017 - 2018\n", "3.0%6.1%9.1%12.1%15.2%18.2%21.2%24.2%27.3%30.3%33.3%36.4%39.4%42.4%45.5%48.5%51.5%54.5%57.6%60.6%63.6%66.7%69.7%72.7%75.8%78.8%81.8%84.8%87.9%90.9%93.9%97.0%100.0%\n" ] } ], 
"source": [ "data = prepare_sports_seasons(filenames_dict,\n", " outcome,\n", " rating_systems=ratings_list,\n", " start_week=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We use the normalized rating values as ML features, so we create the feature list." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[['HratingnormWinloss[normalization=True]',\n", " 'AratingnormWinloss[normalization=True]'],\n", " ['HratingnormColley', 'AratingnormColley'],\n", " ['HratingnormMassey[data_limit=20]', 'AratingnormMassey[data_limit=20]'],\n", " ['HratingnormEloWin[HA=0_K=40_ks=400]',\n", " 'AratingnormEloWin[HA=0_K=40_ks=400]'],\n", " ['HratingnormEloPoint[HA=0_K=40_ks=400]',\n", " 'AratingnormEloPoint[HA=0_K=40_ks=400]'],\n", " ['HratingnormKeener[normalization=True]',\n", " 'AratingnormKeener[normalization=True]'],\n", " ['HratingnormOffenseDefense[tol=0.0001]',\n", " 'AratingnormOffenseDefense[tol=0.0001]'],\n", " ['HratingnormMarkov[b=0.85]', 'AratingnormMarkov[b=0.85]'],\n", " ['HratingnormAccuRATE', 'AratingnormAccuRATE']]" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "features_names_list = rating_norm_features(ratings_list)\n", "features_names_list" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We test two different methods, MLE and RANK, and we start making predictions from the 4th week.\n", "We apply the anchored walk-forward procedure with window size = 1, which means that every week we make predictions\n", "using the data of the previous weeks as the training set. For example, for the 4th week, the training set consists of the 1st, 2nd and 3rd weeks.\n", "Note that in every season we restart the walk-forward procedure. 
" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "\n", "=====MLE=====\n", "\n", "\n", "=====Accuracy results=====\n", "\n", " Accuracy Correct Games Wrong Games Total Games\n", "Winloss[normalization=True] 0.506675 1594 1552 3146\n", "Colley 0.510490 1606 1540 3146\n", "Massey[data_limit=20] 0.517880 1593 1483 3076\n", "EloWin[HA=0_K=40_ks=400] 0.513032 1614 1532 3146\n", "EloPoint[HA=0_K=40_ks=400] 0.513032 1614 1532 3146\n", "Keener[normalization=True] 0.515257 1621 1525 3146\n", "OffenseDefense[tol=0.0001] 0.504768 1588 1558 3146\n", "Markov[b=0.85] 0.507947 1598 1548 3146\n", "AccuRATE 0.514304 1618 1528 3146\n", "\n", "\n", "=====RANK=====\n", "\n", "\n", "=====Accuracy results=====\n", "\n", " Accuracy Correct Games Wrong Games Total Games\n", "Winloss[normalization=True] 0.487921 1535 1611 3146\n", "Colley 0.479339 1508 1638 3146\n", "Massey[data_limit=20] 0.488557 1537 1609 3146\n", "EloWin[HA=0_K=40_ks=400] 0.480292 1511 1635 3146\n", "EloPoint[HA=0_K=40_ks=400] 0.487921 1535 1611 3146\n", "Keener[normalization=True] 0.485378 1527 1619 3146\n", "OffenseDefense[tol=0.0001] 0.486014 1529 1617 3146\n", "Markov[b=0.85] 0.494278 1555 1591 3146\n", "AccuRATE 0.489828 1541 1605 3146\n" ] } ], "source": [ "results = Predictions(data, outcome,start_from_week=4).rs_pred_parallel(\n", " rating_systems=ratings_list,\n", " pred_methods_list=['MLE', 'RANK'])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3.8.13 ('py38')", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.13" }, "orig_nbformat": 4, "vscode": { "interpreter": { "hash": "23ceb7112fbf9d0e38ecbf60d6e6d5e2dcebcc82200eeb1e5a5d5f9ffb9e27ca" } } }, "nbformat": 4, "nbformat_minor": 2 }