{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Prediction of soccer outcomes (2009-2010 EPL)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example, we use the rating values from the \"AccuRATE\" method as machine learning features to predict soccer match outcomes.\n", "The dataset consists of the English Premier League matches of the 2009-2010 season.\n", "The predictions are made with the Gaussian Naive Bayes classifier (`GaussianNB`) from scikit-learn, and we apply the walk-forward procedure." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Load packages" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from ratingslib.app_sports.methods import Predictions, prepare_sports_seasons\n", "from ratingslib.application import SoccerOutcome\n", "from ratingslib.datasets.filenames import get_seasons_dict_footballdata_online\n", "from ratingslib.datasets.soccer import championships\n", "from ratingslib.ratings.accurate import AccuRate\n", "from sklearn.naive_bayes import GaussianNB" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set target outcome" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "outcome = SoccerOutcome()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "First, we get the filename from football-data.co.uk for the 2009-2010 season (English Premier League).\n", "Then we create the rating system, add it to a dictionary, and finally prepare the dataset.\n", "The ratings in the dataset start from the second match week." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Load season: 2009 - 2010\n", "2.9%5.7%8.6%11.4%14.3%17.1%20.0%22.9%25.7%28.6%31.4%34.3%37.1%40.0%42.9%45.7%48.6%51.4%54.3%57.1%60.0%62.9%65.7%68.6%71.4%74.3%77.1%80.0%82.9%85.7%88.6%91.4%94.3%97.1%100.0%\n" ] } ], "source": [ "filenames_dict = get_seasons_dict_footballdata_online(\n", "    season_start=2009, season_end=2010, championship=championships.PREMIERLEAGUE)\n", "ratings_dict = {'AccuRATE': AccuRate()}\n", "data_ml = prepare_sports_seasons(filenames_dict,\n", "                                 outcome,\n", "                                 rating_systems=ratings_dict,\n", "                                 start_week=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We show the columns of the 2009 season dataframe." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Index(['Div', 'Date', 'HomeTeam', 'AwayTeam', 'FTHG', 'FTAG', 'FTR', 'HTHG',\n", " 'HTAG', 'HTR', 'Referee', 'HS', 'AS', 'HST', 'AST', 'HF', 'AF', 'HC',\n", " 'AC', 'HY', 'AY', 'HR', 'AR', 'B365H', 'B365D', 'B365A', 'BWH', 'BWD',\n", " 'BWA', 'GBH', 'GBD', 'GBA', 'IWH', 'IWD', 'IWA', 'LBH', 'LBD', 'LBA',\n", " 'SBH', 'SBD', 'SBA', 'WHH', 'WHD', 'WHA', 'SJH', 'SJD', 'SJA', 'VCH',\n", " 'VCD', 'VCA', 'BSH', 'BSD', 'BSA', 'Bb1X2', 'BbMxH', 'BbAvH', 'BbMxD',\n", " 'BbAvD', 'BbMxA', 'BbAvA', 'BbOU', 'BbMx>2.5', 'BbAv>2.5', 'BbMx<2.5',\n", " 'BbAv<2.5', 'BbAH', 'BbAHh', 'BbMxAHH', 'BbAvAHH', 'BbMxAHA', 'BbAvAHA',\n", " 'Period', 'Week_Number', 'FT', 'HAccuRATE', 'AAccuRATE',\n", " 'HratingnormAccuRATE', 'AratingnormAccuRATE'],\n", " dtype='object')" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data_ml[2009].columns" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will use only the normalized ratings from AccuRATE as features for the ML classifier." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
| | HratingnormAccuRATE | AratingnormAccuRATE |\n",
"|---|---|---|\n",
"| 0 | 0.848092 | 0.061352 |\n",
"| 1 | 0.401452 | 0.288341 |\n",
"| 2 | 0.000000 | 0.231402 |\n",
"| 3 | 0.663704 | 0.343821 |\n",
"| 4 | 0.369277 | 0.165818 |\n",
"