{
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "# Prediction of soccer outcome by combining rating and ML methods (2009/10 to 2017/18 EPL)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "In this example, we assess the predictive performance of each rating system with two different prediciton methods. The target class is the final outcome of soccer matches in the English Premier League (2009-2018 seasons). Prediction methods are: 1. RANK (based on rankings) and 2. MLE (based on probabilities). For the predictions we apply the walk-forward procedure."
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Load packages"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "outputs": [],
            "source": [
                "import ratingslib.ratings as rl\n",
                "from ratingslib.app_sports.methods import (Predictions, prepare_sports_seasons,\n",
                "                                        rating_norm_features)\n",
                "from ratingslib.application import SoccerOutcome\n",
                "from ratingslib.datasets.filenames import get_seasons_dict_footballdata_online\n",
                "from ratingslib.datasets.parameters import championships, stats\n",
                "from ratingslib.utils.enums import ratings"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Set target outcome"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 2,
            "metadata": {},
            "outputs": [],
            "source": [
                "outcome = SoccerOutcome()"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "Get filenames from football-data.co.uk for seasons 2009-2018 (English Premier League)."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "outputs": [],
            "source": [
                "filenames_dict = get_seasons_dict_footballdata_online(\n",
                "    season_start=2009, season_end=2018,\n",
                "    championship=championships.PREMIERLEAGUE)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "We create a list of rating methods and then we convert it to dictionary.\n",
                "* For Massey a minimum limit of 20 games has been set to start the rating of teams. This number has been selected to provide enough games, and it ensures that the games graph is connected.\n",
                "* For Markov the damping factor b was set to 0.85\n",
                "* For ELO The choice of parameters is those suggested by FIFA, K=40, ks=400 without taking into account the home field advantage (HA=0)\n",
                "* For WinLoss and Keener normalization is employed to produce fairer ratings since the teams may have a different number of games played (due to postponed or rescheduled matches).\n",
                "* For OffenseDefense the tolerance number we have selected to be 0.0001"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 4,
            "metadata": {},
            "outputs": [],
            "source": [
                "ratings_list = [\n",
                "    rl.Winloss(normalization=True),\n",
                "    rl.Colley(),\n",
                "    rl.Massey(data_limit=20),\n",
                "    rl.Elo(version=ratings.ELOWIN, K=40, ks=400, HA=0,\n",
                "           starting_point=0),\n",
                "    rl.Elo(version=ratings.ELOPOINT, K=40, ks=400, HA=0,\n",
                "           starting_point=0),\n",
                "    rl.Keener(normalization=True),\n",
                "    rl.OffenseDefense(tol=0.0001),\n",
                "    rl.Markov(b=0.85, stats_markov_dict=stats.STATS_MARKOV_DICT),\n",
                "    rl.AccuRate()\n",
                "]"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "The ratings in the dataset start from the second match week."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 5,
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "Load season: 2009 - 2010\n",
                        "2.9%5.7%8.6%11.4%14.3%17.1%20.0%22.9%25.7%28.6%31.4%34.3%37.1%40.0%42.9%45.7%48.6%51.4%54.3%57.1%60.0%62.9%65.7%68.6%71.4%74.3%77.1%80.0%82.9%85.7%88.6%91.4%94.3%97.1%100.0%\n",
                        "Load season: 2010 - 2011\n",
                        "2.8%5.6%8.3%11.1%13.9%16.7%19.4%22.2%25.0%27.8%30.6%33.3%36.1%38.9%41.7%44.4%47.2%50.0%52.8%55.6%58.3%61.1%63.9%66.7%69.4%72.2%75.0%77.8%80.6%83.3%86.1%88.9%91.7%94.4%97.2%100.0%\n",
                        "Load season: 2011 - 2012\n",
                        "2.9%5.7%8.6%11.4%14.3%17.1%20.0%22.9%25.7%28.6%31.4%34.3%37.1%40.0%42.9%45.7%48.6%51.4%54.3%57.1%60.0%62.9%65.7%68.6%71.4%74.3%77.1%80.0%82.9%85.7%88.6%91.4%94.3%97.1%100.0%\n",
                        "Load season: 2012 - 2013\n",
                        "2.9%5.7%8.6%11.4%14.3%17.1%20.0%22.9%25.7%28.6%31.4%34.3%37.1%40.0%42.9%45.7%48.6%51.4%54.3%57.1%60.0%62.9%65.7%68.6%71.4%74.3%77.1%80.0%82.9%85.7%88.6%91.4%94.3%97.1%100.0%\n",
                        "Load season: 2013 - 2014\n",
                        "3.0%6.1%9.1%12.1%15.2%18.2%21.2%24.2%27.3%30.3%33.3%36.4%39.4%42.4%45.5%48.5%51.5%54.5%57.6%60.6%63.6%66.7%69.7%72.7%75.8%78.8%81.8%84.8%87.9%90.9%93.9%97.0%100.0%\n",
                        "Load season: 2014 - 2015\n",
                        "3.0%6.1%9.1%12.1%15.2%18.2%21.2%24.2%27.3%30.3%33.3%36.4%39.4%42.4%45.5%48.5%51.5%54.5%57.6%60.6%63.6%66.7%69.7%72.7%75.8%78.8%81.8%84.8%87.9%90.9%93.9%97.0%100.0%\n",
                        "Load season: 2015 - 2016\n",
                        "2.9%5.7%8.6%11.4%14.3%17.1%20.0%22.9%25.7%28.6%31.4%34.3%37.1%40.0%42.9%45.7%48.6%51.4%54.3%57.1%60.0%62.9%65.7%68.6%71.4%74.3%77.1%80.0%82.9%85.7%88.6%91.4%94.3%97.1%100.0%\n",
                        "Load season: 2016 - 2017\n",
                        "2.9%5.9%8.8%11.8%14.7%17.6%20.6%23.5%26.5%29.4%32.4%35.3%38.2%41.2%44.1%47.1%50.0%52.9%55.9%58.8%61.8%64.7%67.6%70.6%73.5%76.5%79.4%82.4%85.3%88.2%91.2%94.1%97.1%100.0%\n",
                        "Load season: 2017 - 2018\n",
                        "3.0%6.1%9.1%12.1%15.2%18.2%21.2%24.2%27.3%30.3%33.3%36.4%39.4%42.4%45.5%48.5%51.5%54.5%57.6%60.6%63.6%66.7%69.7%72.7%75.8%78.8%81.8%84.8%87.9%90.9%93.9%97.0%100.0%\n"
                    ]
                }
            ],
            "source": [
                "data = prepare_sports_seasons(filenames_dict,\n",
                "                              outcome,\n",
                "                              rating_systems=ratings_list,\n",
                "                              start_week=2)"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "We will use the normalized ratings values as ml features, thus we create the feature list."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 6,
            "metadata": {},
            "outputs": [
                {
                    "data": {
                        "text/plain": [
                            "[['HratingnormWinloss[normalization=True]',\n",
                            "  'AratingnormWinloss[normalization=True]'],\n",
                            " ['HratingnormColley', 'AratingnormColley'],\n",
                            " ['HratingnormMassey[data_limit=20]', 'AratingnormMassey[data_limit=20]'],\n",
                            " ['HratingnormEloWin[HA=0_K=40_ks=400]',\n",
                            "  'AratingnormEloWin[HA=0_K=40_ks=400]'],\n",
                            " ['HratingnormEloPoint[HA=0_K=40_ks=400]',\n",
                            "  'AratingnormEloPoint[HA=0_K=40_ks=400]'],\n",
                            " ['HratingnormKeener[normalization=True]',\n",
                            "  'AratingnormKeener[normalization=True]'],\n",
                            " ['HratingnormOffenseDefense[tol=0.0001]',\n",
                            "  'AratingnormOffenseDefense[tol=0.0001]'],\n",
                            " ['HratingnormMarkov[b=0.85]', 'AratingnormMarkov[b=0.85]'],\n",
                            " ['HratingnormAccuRATE', 'AratingnormAccuRATE']]"
                        ]
                    },
                    "execution_count": 6,
                    "metadata": {},
                    "output_type": "execute_result"
                }
            ],
            "source": [
                "features_names_list = rating_norm_features(ratings_list)\n",
                "features_names_list"
            ]
        },
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": [
                "We test two different methods: MLE and RANK and we start making predictions from the 4th week.\n",
                "We apply the anchored walk-farward procedure with window size = 1 which means that every week we make predictions\n",
                "by using previous weeks data for training set. For example for the 4th week, the training set is consisted of the 1st, 2nd and 3rd week.\n",
                "Note that in every season we restart the walk-forward procedure. "
            ]
        },
        {
            "cell_type": "code",
            "execution_count": 7,
            "metadata": {},
            "outputs": [
                {
                    "name": "stdout",
                    "output_type": "stream",
                    "text": [
                        "\n",
                        "\n",
                        "=====MLE=====\n",
                        "\n",
                        "\n",
                        "=====Accuracy results=====\n",
                        "\n",
                        "                             Accuracy  Correct Games  Wrong Games  Total Games\n",
                        "Winloss[normalization=True]  0.506675           1594         1552         3146\n",
                        "Colley                       0.510490           1606         1540         3146\n",
                        "Massey[data_limit=20]        0.517880           1593         1483         3076\n",
                        "EloWin[HA=0_K=40_ks=400]     0.513032           1614         1532         3146\n",
                        "EloPoint[HA=0_K=40_ks=400]   0.513032           1614         1532         3146\n",
                        "Keener[normalization=True]   0.515257           1621         1525         3146\n",
                        "OffenseDefense[tol=0.0001]   0.504768           1588         1558         3146\n",
                        "Markov[b=0.85]               0.507947           1598         1548         3146\n",
                        "AccuRATE                     0.514304           1618         1528         3146\n",
                        "\n",
                        "\n",
                        "=====RANK=====\n",
                        "\n",
                        "\n",
                        "=====Accuracy results=====\n",
                        "\n",
                        "                             Accuracy  Correct Games  Wrong Games  Total Games\n",
                        "Winloss[normalization=True]  0.487921           1535         1611         3146\n",
                        "Colley                       0.479339           1508         1638         3146\n",
                        "Massey[data_limit=20]        0.488557           1537         1609         3146\n",
                        "EloWin[HA=0_K=40_ks=400]     0.480292           1511         1635         3146\n",
                        "EloPoint[HA=0_K=40_ks=400]   0.487921           1535         1611         3146\n",
                        "Keener[normalization=True]   0.485378           1527         1619         3146\n",
                        "OffenseDefense[tol=0.0001]   0.486014           1529         1617         3146\n",
                        "Markov[b=0.85]               0.494278           1555         1591         3146\n",
                        "AccuRATE                     0.489828           1541         1605         3146\n"
                    ]
                }
            ],
            "source": [
                "results = Predictions(data, outcome,start_from_week=4).rs_pred_parallel(\n",
                "                                      rating_systems=ratings_list,\n",
                "                                      pred_methods_list=['MLE', 'RANK'])"
            ]
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": "Python 3.8.13 ('py38')",
            "language": "python",
            "name": "python3"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.8.13"
        },
        "orig_nbformat": 4,
        "vscode": {
            "interpreter": {
                "hash": "23ceb7112fbf9d0e38ecbf60d6e6d5e2dcebcc82200eeb1e5a5d5f9ffb9e27ca"
            }
        }
    },
    "nbformat": 4,
    "nbformat_minor": 2
}