ratingslib.app_sports.methods module

Predictions of sport outcome without backtester

predict_hindsight(data: DataFrame, teams_rating_df: DataFrame, outcome: SportOutcome, pred_method: Literal['MLE', 'RANK'] = 'RANK', columns_dict: Optional[Dict[str, Any]] = None) → Tuple[list, list]

Hindsight prediction means predicting past games using ratings computed from all games.

Parameters
  • data (pd.DataFrame) – Data of games

  • teams_rating_df (pd.DataFrame) – Rating values of teams. The DataFrame must contain a ‘rating’ column.

  • outcome (SportOutcome) – The outcome parameter is associated with application type e.g. for soccer the type of outcome is ratingslib.application.SoccerOutcome. For more details see ratingslib.application module.

  • pred_method (Literal['RANK', 'MLE'], default='RANK') – Two available methods for predictions: ‘RANK’ or ‘MLE’. More details at ratingslib.application.SoccerOutcome.

  • columns_dict (Optional[Dict[str, str]], default=None) – A dictionary mapping the column names of the dataset. See the module ratingslib.datasets.parameters for more details

Returns

  • pred (list) – List of predictions

  • Y (list) – Correct outcome values

Return type

Tuple[list, list]
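The idea of hindsight prediction can be sketched in plain Python. This is only an illustration, assuming a rank-style rule in which the higher-rated team is predicted to win; the actual ‘RANK’ and ‘MLE’ rules are defined in ratingslib.application.SoccerOutcome and may differ. The function name, the outcome labels and the tuple layout of the games are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical outcome labels; the real ones come from the SportOutcome instance.
HOME_WIN, AWAY_WIN, DRAW = 1, 2, 3


def predict_hindsight_by_rating(
    games: List[Tuple[str, str, int]],
    ratings: Dict[str, float],
) -> Tuple[List[int], List[int]]:
    """Predict every past game from ratings that were computed on all games."""
    pred, correct = [], []
    for home, away, outcome in games:
        # Assumed rank-style rule: the higher-rated team is predicted to win.
        if ratings[home] > ratings[away]:
            pred.append(HOME_WIN)
        elif ratings[home] < ratings[away]:
            pred.append(AWAY_WIN)
        else:
            pred.append(DRAW)
        correct.append(outcome)
    return pred, correct


games = [("Arsenal", "Blackburn", HOME_WIN), ("Birmingham", "Arsenal", DRAW)]
ratings = {"Arsenal": 1.4, "Birmingham": 0.2, "Blackburn": -0.6}
pred, Y = predict_hindsight_by_rating(games, ratings)
```

Because the ratings already reflect the games being predicted, results obtained this way describe how well the ratings fit the past rather than how well they forecast unseen games.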

accuracy_results(test_Y: list, predictions: list) → Tuple[float, int]

Returns the accuracy as a percentage together with the number of correctly classified samples.

Parameters
  • test_Y (list) – Ground truth (correct) labels.

  • predictions (list) – Predicted labels

Returns

  • accuracy (float) – Accuracy metric

  • correct (int) – Correctly classified samples
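A minimal sketch of the computation described above, assuming the accuracy is expressed on a 0 to 100 scale. The function name accuracy_sketch is hypothetical and is not part of the library.

```python
from typing import List, Tuple


def accuracy_sketch(test_Y: List[int], predictions: List[int]) -> Tuple[float, int]:
    """Return (accuracy in percent, number of correctly classified samples)."""
    if len(test_Y) != len(predictions):
        raise ValueError("test_Y and predictions must have the same length")
    # Count positions where the predicted label equals the true label.
    correct = sum(1 for y, p in zip(test_Y, predictions) if y == p)
    accuracy = 100.0 * correct / len(test_Y) if test_Y else 0.0
    return accuracy, correct


accuracy, correct = accuracy_sketch([1, 3, 2, 1], [1, 2, 2, 1])
```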

show_list_of_accuracy_results(names_list: List[str], test_Y: list, predictions_list: list, print_predictions: bool)

Show accuracy results for a list of models

Parameters
  • names_list (List[str]) – Model name list

  • test_Y (list) – List of correct labels

  • predictions_list (list) – List that contains lists of predicted labels

  • print_predictions (bool) – If True then predictions are printed

classification_details(name: str, test_Y: list, pred: list) → str

Return classification details for a prediction model based on ground-truth labels and predictions

Parameters
  • name (str) – Name of prediction model

  • test_Y (list) – List of correct labels

  • pred (list) – List of predictions

Returns

Classification details as string

Return type

str

class Predictions(data: Union[Dict[int, DataFrame], DataFrame], outcome: SportOutcome, data_test: Optional[DataFrame] = None, split: Optional[Union[float, int]] = None, start_from_week: Optional[int] = None, walk_forward_window_size: int = -1, columns_dict: Optional[Dict[str, Any]] = None, print_accuracy_report: bool = True, print_classification_report: bool = False, print_predictions: bool = False)

Bases: object

Class for predicting soccer match results

Parameters
  • data (Union[Dict[int, pd.DataFrame], pd.DataFrame]) – Data of games in a dictionary or in a DataFrame. If a dictionary is passed, each key is a season and each value is the data of that season.

  • outcome (SportOutcome) – The outcome parameter is related to the application type. For sports applications it must be an instance of a subclass of the SportOutcome class, e.g. for soccer the type of outcome is ratingslib.application.SoccerOutcome. For more details see the ratingslib.application module.

  • pred_method (Union[Literal['MLE', 'RANK'], sklearn.base.BaseEstimator]) – Three available methods for predictions: ‘RANK’ or ‘MLE’ or a scikit classifier. More details for ‘RANK’ or ‘MLE’ at ratingslib.application.SoccerOutcome

  • features_names (List[str]) – List of feature names (each name refers to a column of the data)

  • data_test (Optional[pd.DataFrame], default=None) – The test set. If data_test is passed then split and start_from_week parameters are ignored

  • split (Optional[Union[float, int]], default=None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples.

  • start_from_week (Optional[int], default=None) – The match week from which the walk-forward procedure starts making predictions

  • walk_forward_window_size (int, default=-1) – Only valid if start_from_week is not None. If -1, the walk-forward procedure will not run. For example, if walk_forward_window_size is 1, the walk-forward window is one week.

  • columns_dict (Optional[Dict[str, Any]], default=None) – The column names of the data file. See ratingslib.datasets.parameters.COLUMNS_DICT for more details.

  • print_accuracy_report (bool, default=True) – If True accuracy report will be printed

  • print_classification_report (bool, default=False) – If True, the classification report will be printed

  • print_predictions (bool, default=False) – If True, the predictions will be printed
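The interaction of start_from_week and walk_forward_window_size can be illustrated with a short sketch. It assumes one common reading of walk-forward evaluation: train on every week before the current window, test on the weeks inside the window, then advance by the window size. The function walk_forward_weeks is hypothetical and the library's own procedure may differ in detail.

```python
from typing import Iterator, List, Tuple


def walk_forward_weeks(
    weeks: List[int], start_from_week: int, window_size: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_weeks, test_weeks) pairs for a walk-forward evaluation."""
    if window_size < 1:
        return  # mirrors the documented "-1 means walk-forward does not run"
    ordered = sorted(weeks)
    last = ordered[-1] if ordered else start_from_week - 1
    start = start_from_week
    while start <= last:
        # Test on the weeks inside the window, train on everything before it.
        test = [w for w in ordered if start <= w < start + window_size]
        train = [w for w in ordered if w < start]
        if test:
            yield train, test
        start += window_size


folds = list(walk_forward_weeks(list(range(1, 7)), start_from_week=4, window_size=2))
```

With six match weeks, a start week of 4 and a window of 2, this yields two folds: weeks 1 to 3 predict weeks 4 and 5, then weeks 1 to 5 predict week 6.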

_select_X_Y(data: DataFrame, features: List[str], col_names: SimpleNamespace) → Tuple[DataFrame, DataFrame, DataFrame]

Selects the given features from data and removes the non-rated weeks. A non-rated week is one where all instances have the same value. For example, the Massey rating system sometimes requires more data and starts rating teams from the 4th week, so the second and third weeks have rating 0 for all teams. The first week is not included if it was removed during preprocessing.

Parameters
  • data (pd.DataFrame) – Games data

  • features (List[str]) – List of feature names (each name refers to a column of the data)

  • col_names (SimpleNamespace) – A simple object subclass that provides attribute access to its namespace. The attributes are the keys of columns_dict.

Returns

  • data_X (pandas.DataFrame) – Dataset that includes only the features, after removing non-rated weeks if any are found.

  • data_Y (pandas.DataFrame) – Dataset that contains only the outcomes, after removing non-rated weeks if any are found.

  • data (pandas.DataFrame) – Dataset after removing non-rated weeks. If no non-rated weeks are found, the dataset is returned unchanged.
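The removal of non-rated weeks can be sketched with plain dictionaries instead of a DataFrame. This is an illustration under the assumption that a week counts as non-rated when every selected feature of every game in that week holds the same single value; the function name and the "Week" column name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def drop_non_rated_weeks(
    rows: List[Dict], features: List[str], week_col: str = "Week"
) -> List[Dict]:
    """Drop the games of weeks whose feature values are all identical."""
    by_week = defaultdict(list)
    for row in rows:
        by_week[row[week_col]].append(row)

    # Assumption: one distinct value across all features means "not rated yet".
    non_rated = {
        week
        for week, week_rows in by_week.items()
        if len({r[f] for r in week_rows for f in features}) == 1
    }
    return [row for row in rows if row[week_col] not in non_rated]


rows = [
    {"Week": 2, "Hrating": 0.0, "Arating": 0.0},
    {"Week": 2, "Hrating": 0.0, "Arating": 0.0},
    {"Week": 4, "Hrating": 1.2, "Arating": -0.3},
    {"Week": 4, "Hrating": 0.1, "Arating": 0.4},
]
rated_rows = drop_non_rated_weeks(rows, ["Hrating", "Arating"])
```

Note that this simple rule would also drop a week with a single game between two equally rated teams, so a real implementation may need a stricter test.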

_predict(pred_method: Union[Literal['MLE', 'RANK'], BaseEstimator], train_X: DataFrame, train_Y: Series, test_X: DataFrame) → tuple

Train first according to the given method and then predict

Parameters
  • pred_method (Union[Literal['MLE', 'RANK'], sklearn.base.BaseEstimator]) – Three available methods for predictions: ‘RANK’ or ‘MLE’ or a scikit classifier. More details for ‘RANK’ or ‘MLE’ at ratingslib.application.SoccerOutcome

  • train_X (pd.DataFrame) – The training set that includes only the features

  • train_Y (pd.Series) – The outcome labels of training set

  • test_X (pd.DataFrame) – The test set that includes only the features

Returns

The predictions for the target outcome and the predictions distribution

Return type

tuple

_train_and_test(*, pred_method: Union[Literal['MLE', 'RANK'], BaseEstimator], features_names: List[str]) → tuple

Training and testing based on the given method

Parameters
  • pred_method (Union[Literal['MLE', 'RANK'], sklearn.base.BaseEstimator]) – Three available methods for predictions: ‘RANK’ or ‘MLE’ or a scikit classifier. More details for ‘RANK’ or ‘MLE’ at ratingslib.application.SoccerOutcome

  • features_names (List[str]) – List of feature names (each name refers to a column of the data)

Returns

test_Y (the outcome labels of the test set) and predictions

Return type

tuple

classifier_features_repr(clf, feature_names)
ml_pred(*, clf: BaseEstimator, features_names: List[str], to_dict=False) → Union[Tuple[List, List], dict]

Predict with ML classifiers

Parameters
  • clf (sklearn.base.BaseEstimator) – A scikit classifier instance

  • features_names (List[str]) – List of feature names (each name refers to a column of the data)

  • to_dict (bool, default = False) – If True then results will be returned as a dictionary where the key is the name of clf

Returns

Prediction results as tuple (test_Y, predictions) or dictionary {clf_repr: (test_Y, predictions)}

Return type

Union[Tuple[List, List], dict]

ml_pred_parallel(*, clf_list: List[BaseEstimator], features_names_list: List[List[str]], n_jobs: int = -1) → dict

Runs the ML predictions for each of the classifiers in the given list

Parameters
  • clf_list (List[sklearn.base.BaseEstimator]) – List of scikit estimators

  • features_names_list (List[List[str]]) – List that contains lists of feature names (each name refers to a column of the data)

  • n_jobs (int, default=-1) – Number of jobs to run in parallel. -1 means using all processors. None means 1

Returns

Dictionary that maps classifier representations to their test_Y (the outcome labels of the test set) and predictions

Return type

dict
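The shape of the parallel run and of the returned dictionary can be sketched with the standard library alone. This is a stand-in, not the library's implementation: it uses a thread pool, the helper names are hypothetical, and only the two documented n_jobs cases (-1 for all processors, None for 1) are modelled.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional, Tuple

Result = Tuple[List[int], List[int]]  # (test_Y, predictions)


def resolve_n_jobs(n_jobs: Optional[int]) -> int:
    """Translate the documented n_jobs convention into a worker count."""
    if n_jobs is None:
        return 1
    if n_jobs == -1:
        return os.cpu_count() or 1
    if n_jobs < 1:
        raise ValueError("this sketch only supports -1, None or a positive int")
    return n_jobs


def run_parallel(
    tasks: Dict[str, Callable[[], Result]], n_jobs: Optional[int] = -1
) -> Dict[str, Result]:
    """Run each prediction task and map its name to (test_Y, predictions)."""
    with ThreadPoolExecutor(max_workers=resolve_n_jobs(n_jobs)) as pool:
        futures = {name: pool.submit(task) for name, task in tasks.items()}
        return {name: future.result() for name, future in futures.items()}


tasks = {
    "always_home": lambda: ([1, 2], [1, 1]),
    "always_away": lambda: ([1, 2], [2, 2]),
}
results = run_parallel(tasks, n_jobs=None)
```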

rs_pred(*, pred_method: Literal['MLE', 'RANK'], ratings: RatingSystem, to_dict: bool = False) → Union[Tuple[List, List], dict]

Prediction with one of two available methods: MLE or RANK

Parameters
  • pred_method (Literal['MLE', 'RANK']) – Two available methods for predictions: ‘RANK’ or ‘MLE’. More details at ratingslib.application.SoccerOutcome

  • ratings (RatingSystem) – Rating system instance

  • to_dict (bool, default = False) – If True then results will be returned as a dictionary where the key is the name of pred_method

Returns

Prediction results as tuple (test_Y, predictions) or dictionary {pred_name: (test_Y, predictions)}

Return type

Union[Tuple[List, List], dict]

rs_pred_parallel(*, pred_methods_list: List[Literal['MLE', 'RANK']], rating_systems: Union[Dict[str, RatingSystem], List[RatingSystem], RatingSystem], n_jobs: int = -1) → dict

Runs the rating prediction for each of the methods in the given list

Parameters
  • pred_methods_list (List[Literal['MLE', 'RANK']]) – List of prediction methods

  • rating_systems (Dict[str, RatingSystem] or List[RatingSystem] or RatingSystem) – If a dictionary, it maps names (or rating keys) to rating systems. If a list of rating system instances or a single RatingSystem instance, it is first converted to a dictionary.

  • n_jobs (int, default=-1) – Number of jobs to run in parallel. -1 means using all processors. None means 1

Returns

Dictionary that maps each prediction method name to the results per rating system

Return type

dict

rs_tuning_params(*, ratings_dict: Dict[str, RatingSystem], predict_with: Union[Literal['MLE', 'RANK'], BaseEstimator], use_norm_ratings: bool = True, metric_name: str = 'accuracy', maximize: bool = True, print_out: bool = True, **kwargs) → dict

Tunes the parameters of rating systems for the given metric using grid search.

Parameters
  • ratings_dict (Dict[str, RatingSystem]) – Dictionary that maps names to ratings. Note that ratings are stored in a pandas.DataFrame.

  • predict_with (Union[Literal['MLE', 'RANK'], sklearn.base.BaseEstimator]) – Three available methods for predictions: ‘RANK’ or ‘MLE’ or a scikit classifier. More details for ‘RANK’ or ‘MLE’ at ratingslib.application.SoccerOutcome

  • use_norm_ratings (bool, default=True) – If True, normalized rating values are used

  • metric_name (str, default='accuracy') – The optimization metric. Available metric names are listed at https://scikit-learn.org/stable/modules/model_evaluation.html

  • maximize (bool, default=True) – If True, the metric is maximized; otherwise it is minimized

  • print_out (bool, default=True) – Print results if True

  • **kwargs (dict) – All keyword arguments are passed to _score_func of scikit

Returns

best – Dictionary that maps rating system versions to their best values

Return type

dict
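The role of the maximize flag in a grid search can be shown with a small generic sketch. It is not the library's tuning code: the function grid_search, the parameter names k and home_advantage, and the toy scoring function are all hypothetical and only illustrate how the best candidate is selected.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Optional, Tuple


def grid_search(
    param_grid: Dict[str, List[Any]],
    score: Callable[..., float],
    maximize: bool = True,
) -> Tuple[Optional[Dict[str, Any]], Optional[float]]:
    """Evaluate every parameter combination and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, None
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(**params)
        if best_score is None:
            better = True
        elif maximize:
            better = current > best_score
        else:
            better = current < best_score
        if better:
            best_params, best_score = params, current
    return best_params, best_score


def toy_accuracy(k: float, home_advantage: float) -> float:
    # Made-up score surface that peaks at k=20, home_advantage=0.5.
    return 0.6 - abs(k - 20) / 100 - abs(home_advantage - 0.5) / 10


grid = {"k": [10, 20, 40], "home_advantage": [0.0, 0.5]}
best_params, best_score = grid_search(grid, toy_accuracy, maximize=True)
worst_params, _ = grid_search(grid, toy_accuracy, maximize=False)
```

Setting maximize to False suits metrics where lower is better, such as an error or loss measure.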

ml_tuning_params(*, clf_list: List[BaseEstimator], features_names: List[str], metric_name: str = 'accuracy', maximize: bool = True, print_out: bool = True, n_jobs: int = -1, **kwargs)

Tunes the hyper-parameters of the classifiers for the given metric using grid search.

Parameters
  • clf_list (List[sklearn.base.BaseEstimator]) – List of scikit estimators

  • features_names (List[str]) – List of feature names (each name refers to a column of the data)

  • metric_name (str, default='accuracy') – The optimization metric. Available metric names are listed at https://scikit-learn.org/stable/modules/model_evaluation.html

  • maximize (bool, default=True) – If True, the metric is maximized; otherwise it is minimized

  • print_out (bool, default=True) – Print results if True

  • n_jobs (int, default=-1) – Number of jobs to run in parallel. -1 means using all processors. None means 1

  • **kwargs (dict) – All keyword arguments are passed to _score_func of scikit

Returns

best – Dictionary that maps classifier representations to their best values

Return type

dict

rating_norm_features(ratings) → List[str]

Function to use normalized ratings as ML features. For example, for AccuRATE:

  • Home: H + ratingnorm + key = HratingnormAccuRATE

  • Away: A + ratingnorm + key = AratingnormAccuRATE

Parameters

rating_systems (Dict[str, RatingSystem] or List[RatingSystem] or RatingSystem or None, default=None) – If a dictionary, it maps names (or rating keys) to rating systems. If a list of rating system instances or a single RatingSystem instance, it is first converted to a dictionary. If None, rating values are not included in the data attributes for preparation.

Returns

features – List of normalized features (each name refers to a column of the data)

Return type

List[str]
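The naming scheme described for this function can be reproduced in a few lines. This is a sketch only: the helper name is hypothetical, the home-before-away ordering is an assumption, and "Massey" is used as a second illustrative key alongside the documented AccuRATE example.

```python
from typing import Iterable, List


def rating_norm_feature_names(rating_keys: Iterable[str]) -> List[str]:
    """Build home and away normalized-rating column names for each rating key."""
    features = []
    for key in rating_keys:
        features.append("H" + "ratingnorm" + key)  # home team column
        features.append("A" + "ratingnorm" + key)  # away team column
    return features


features = rating_norm_feature_names(["AccuRATE", "Massey"])
```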

enter_values(data: DataFrame, teams_df: DataFrame, teams_dict: Dict[Any, int], rating_systems: Optional[Union[Dict[str, RatingSystem], List[RatingSystem], RatingSystem]] = None, stats_attributes: Optional[Dict[str, Dict[Any, Any]]] = None, columns_dict: Optional[Dict[str, Any]] = None) → DataFrame

Enter the calculated values (from rating and statistic attributes) for each data-instance and return the data. Also, truncation is applied.

Parameters
  • data (pd.DataFrame) – Games data with statistics and rating values for the teams

  • teams_df (pd.DataFrame) – Set of teams.

  • teams_dict (Dict[Any, int]) –

    Dictionary that maps teams’ names to integer value. For instance

    teams_dict = {'Arsenal': 0,
                  'Aston Villa': 1,
                  'Birmingham': 2,
                  'Blackburn': 3
                  }
    

  • rating_systems (Dict[str, RatingSystem] or List[RatingSystem] or RatingSystem or None, default=None) – If a dictionary, it maps names (or rating keys) to rating systems. If a list of rating system instances or a single RatingSystem instance, it is first converted to a dictionary. If None, rating values are not included in the data attributes for preparation.

  • stats_attributes (Optional[Dict[str, Dict[Any, Any]]], default=None) – The statistic attributes, e.g. for soccer: TW (Total Wins), TG (Total Goals), TS (Total Shots), TST (Total Shots on Target).

  • columns_dict (Optional[Dict[str, str]], default=None) – A dictionary mapping the column names of the dataset. See the module ratingslib.datasets.parameters for more details.

Returns

data_truncate_df – Completed data-instances

Return type

pd.DataFrame
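A teams_dict of the form shown in the parameter description can be built from a list of team names. This sketch assumes alphabetical ordering of the indices, which happens to match the documented example; the library may assign the integers differently, and the helper name is hypothetical.

```python
from typing import Dict, Iterable


def build_teams_dict(team_names: Iterable[str]) -> Dict[str, int]:
    """Map each distinct team name to a consecutive integer index."""
    # Duplicates are removed first, then names are indexed in sorted order.
    return {name: index for index, name in enumerate(sorted(set(team_names)))}


teams_dict = build_teams_dict(
    ["Blackburn", "Arsenal", "Aston Villa", "Birmingham", "Arsenal"]
)
```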

_create_rating_data(rs_name: str, rs: RatingSystem, data_train: DataFrame, teams_df: DataFrame)

Rates teams and creates a column for normalized rating values

Parameters
  • rs_name (str) – Name of rating system (from the key of dictionary)

  • rs (RatingSystem) – RatingSystem instance

  • data_train (pd.DataFrame) – Games data for training

  • teams_df (pd.DataFrame) – Set of teams.

Returns

teams_df – Teams DataFrame with rating values, and normalized rating values.

Return type

pd.DataFrame

prepare_sport_dataset(data_season: DataFrame, teams_df: DataFrame, rating_systems: Optional[Union[Dict[str, RatingSystem], List[RatingSystem], RatingSystem]] = None, stats_attributes=None, start_week: int = 4, preprocess: Optional[Preprocess] = BasicPreprocess(), columns_dict: Optional[Dict[str, Any]] = None) → DataFrame

Prepares the sport dataset by entering the rating values and the computed game statistics of the teams for every match week.

Parameters
  • data_season (pd.DataFrame) – Games data of season

  • teams_df (pd.DataFrame) – Set of teams

  • rating_systems (Dict[str, RatingSystem] or List[RatingSystem] or RatingSystem or None, default=None) – If a dictionary, it maps names (or rating keys) to rating systems. If a list of rating system instances or a single RatingSystem instance, it is first converted to a dictionary. If None, rating values are not included in the data attributes for preparation.

  • stats_attributes (Optional[Dict[str, Dict[Any, Any]]], default=None) – The statistic attributes, e.g. for soccer: TW (Total Wins), TG (Total Goals), TS (Total Shots), TST (Total Shots on Target).

  • start_week (int, default=4) – The match week at which the rating procedure starts. For example, if start_week is 4 then the rating of teams starts from the 4th week. The ratings of each week are computed from the previous weeks, e.g. the ratings for the 7th week use weeks 1, 2, 3, 4, 5 and 6.

  • preprocess (Preprocess) – The preprocess procedure for the dataset. It must be an instance of a subclass of ratingslib.datasets.preprocess.Preprocess

  • columns_dict (Optional[Dict[str, str]], default=None) – A dictionary mapping the column names of the dataset. See the module ratingslib.datasets.parameters for more details

Returns

DataFrame of prepared data

Return type

pd.DataFrame
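The start_week rule can be made concrete with a small sketch that lists, for each rated week, the earlier weeks whose games feed its ratings. The function rating_schedule is hypothetical and assumes weeks are numbered consecutively from 1.

```python
from typing import Dict, List


def rating_schedule(last_week: int, start_week: int = 4) -> Dict[int, List[int]]:
    """Map each rated week to the previous weeks used to compute its ratings."""
    return {
        week: list(range(1, week))  # every week strictly before the rated week
        for week in range(start_week, last_week + 1)
    }


schedule = rating_schedule(last_week=7)
```

Under this reading, weeks before start_week receive no ratings at all, which is why they can show up later as non-rated weeks.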

prepare_sports_seasons(filenames: Union[str, Dict[int, str]], outcome: SportOutcome, rating_systems: Optional[Union[Dict[str, RatingSystem], List[RatingSystem], RatingSystem]] = None, stats_attributes=None, start_week: int = 4, preprocess: Optional[Preprocess] = BasicPreprocess(), columns_dict: Optional[Dict[str, Any]] = None) → Dict[int, DataFrame]

Prepares the datasets for one or more season files, passed as a single filename or as a dictionary.

Parameters
  • filenames (Union[str, Dict[int, str]]) – Filename or dictionary that maps seasons to filename paths. e.g. {2009: ‘sports/pl2009.csv’}

  • outcome (SportOutcome) – The outcome parameter is associated with application type e.g. for soccer the type of outcome is ratingslib.application.SoccerOutcome. For more details see ratingslib.application module.

  • rating_systems (Dict[str, RatingSystem] or List[RatingSystem] or RatingSystem or None, default=None) – If a dictionary, it maps names (or rating keys) to rating systems. If a list of rating system instances or a single RatingSystem instance, it is first converted to a dictionary. If None, rating values are not included in the data attributes for preparation.

  • stats_attributes (Optional[Dict[str, Dict[Any, Any]]], default=None) – The statistic attributes, e.g. for soccer: TW (Total Wins), TG (Total Goals), TS (Total Shots), TST (Total Shots on Target).

  • start_week (int, default=4) – The match week at which the rating procedure starts. For example, if start_week is 4 then the rating of teams starts from the 4th week. The ratings of each week are computed from the previous weeks, e.g. the ratings for the 7th week use weeks 1, 2, 3, 4, 5 and 6.

  • preprocess (Preprocess) – The preprocess procedure for the dataset. It must be an instance of a subclass of ratingslib.datasets.preprocess.Preprocess

  • columns_dict (Optional[Dict[str, str]], default=None) – A dictionary mapping the column names of the dataset. See the module ratingslib.datasets.parameters for more details

Returns

data_seasons_dict – Dictionary that maps each season to its prepared DataFrame. Note that if only one filename is passed, the returned dictionary has the structure {1: data}

Return type

Dict[int, pd.DataFrame]
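The {1: data} convention for a single filename suggests that the input is normalized to a season dictionary before processing. The sketch below shows one way this could work; the helper normalize_filenames is hypothetical, and the second path is a made-up illustration next to the documented ‘sports/pl2009.csv’ example.

```python
from typing import Dict, Union


def normalize_filenames(filenames: Union[str, Dict[int, str]]) -> Dict[int, str]:
    """Return a season-to-path dictionary for either accepted input form."""
    if isinstance(filenames, str):
        # A single file has no season key, so key 1 is used as a placeholder.
        return {1: filenames}
    return dict(filenames)


single = normalize_filenames("sports/pl2009.csv")
several = normalize_filenames({2009: "sports/pl2009.csv", 2010: "sports/pl2010.csv"})
```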