ratingslib.app_sports.methods module

Predictions of sport outcome without backtester

predict_hindsight(data: DataFrame, teams_rating_df: DataFrame, outcome: SportOutcome, pred_method: Literal['MLE', 'RANK'] = 'RANK', columns_dict: Optional[Dict[str, Any]] = None) → Tuple[list, list]

Hindsight prediction means predicting past games using ratings computed from the entire set of games.

Parameters

data (pd.DataFrame) – Data of games
teams_rating_df (pd.DataFrame) – Rating values of teams. Note that ‘rating’ column must be in the DataFrame columns
outcome (SportOutcome) – The outcome parameter is associated with application type e.g. for soccer the type of outcome is ratingslib.application.SoccerOutcome. For more details see ratingslib.application module.
pred_method (Literal['RANK', 'MLE'], default='RANK') – Two available methods for predictions: ‘RANK’ or ‘MLE’. More details at ratingslib.application.SoccerOutcome.
columns_dict (Optional[Dict[str, str]], default=None) – A dictionary mapping the column names of the dataset. See the module ratingslib.datasets.parameters for more details

Returns

pred (list) – List of predictions
Y (list) – Correct outcome values

Return type

Tuple[list, list]
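
The idea of hindsight prediction can be illustrated with a minimal sketch. It assumes that the ‘RANK’ method predicts a win for the higher-rated team and a draw on equal ratings; the actual rule is defined by the SportOutcome instance and may differ. The function name, the outcome labels and the tuple layout of the games are hypothetical and not part of ratingslib.

```python
from typing import Dict, List, Tuple

# Hypothetical labels; in ratingslib they come from the SportOutcome instance.
HOME_WIN, AWAY_WIN, DRAW = "H", "A", "D"


def rank_hindsight_sketch(
    games: List[Tuple[str, str, int, int]],
    ratings: Dict[str, float],
) -> Tuple[List[str], List[str]]:
    """Predict every past game from ratings computed over all games."""
    pred: List[str] = []
    Y: List[str] = []
    for home, away, home_goals, away_goals in games:
        # Assumed rule: the higher-rated team is predicted to win.
        if ratings[home] > ratings[away]:
            pred.append(HOME_WIN)
        elif ratings[home] < ratings[away]:
            pred.append(AWAY_WIN)
        else:
            pred.append(DRAW)
        # Ground truth derived from the final score.
        if home_goals > away_goals:
            Y.append(HOME_WIN)
        elif home_goals < away_goals:
            Y.append(AWAY_WIN)
        else:
            Y.append(DRAW)
    return pred, Y


games = [("Arsenal", "Blackburn", 2, 0), ("Birmingham", "Arsenal", 1, 1)]
ratings = {"Arsenal": 1.4, "Birmingham": 0.9, "Blackburn": 0.7}
pred, Y = rank_hindsight_sketch(games, ratings)
```

The ratings here already reflect all games, including the ones being predicted, which is why this setting is only suitable for evaluating how well ratings describe past results.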

accuracy_results(test_Y: list, predictions: list) → Tuple[float, int]

Returns the accuracy both as a percentage and as the number of correctly classified samples.

Parameters

test_Y (list) – Ground truth (correct) labels.
predictions (list) – Predicted labels

Returns

accuracy (float) – Accuracy metric
correct (int) – Correctly classified samples
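
A minimal sketch of the computation described above, written without ratingslib. The helper name accuracy_sketch is hypothetical, and this version returns the accuracy as a fraction between 0 and 1; multiply by 100 if a percentage is needed.

```python
from typing import List, Tuple


def accuracy_sketch(test_Y: List, predictions: List) -> Tuple[float, int]:
    """Return (accuracy as a fraction, number of correct predictions)."""
    if len(test_Y) != len(predictions):
        raise ValueError("test_Y and predictions must have the same length")
    # Count positions where the predicted label equals the true label.
    correct = sum(1 for y, p in zip(test_Y, predictions) if y == p)
    accuracy = correct / len(test_Y) if test_Y else 0.0
    return accuracy, correct


accuracy, correct = accuracy_sketch(["H", "D", "A", "H"], ["H", "A", "A", "H"])
```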

show_list_of_accuracy_results(names_list: List[str], test_Y: list, predictions_list: list, print_predictions: bool)

Show accuracy results for a list of models

Parameters

names_list (List[str]) – Model name list
test_Y (list) – List of correct labels
predictions_list (list) – List that contains lists of predicted labels
print_predictions (bool) – If True then predictions are printed

classification_details(name: str, test_Y: list, pred: list) → str

Return classification details for a prediction model based on truth labels and predictions

Parameters

name (str) – Name of prediction model
test_Y (list) – List of correct labels
pred (list) – List of predictions

Returns

Classification details as string

Return type

str

class Predictions(data: Union[Dict[int, DataFrame], DataFrame], outcome: SportOutcome, data_test: Optional[DataFrame] = None, split: Optional[Union[float, int]] = None, start_from_week: Optional[int] = None, walk_forward_window_size: int = -1, columns_dict: Optional[Dict[str, Any]] = None, print_accuracy_report: bool = True, print_classification_report: bool = False, print_predictions: bool = False)

Bases: object

Class for predicting soccer match results

Parameters

data (Union[Dict[int, pd.DataFrame], pd.DataFrame]) – Data of games in a dictionary or in a DataFrame. If a dictionary is passed, the key is the season and the value is the data.
outcome (SportOutcome) – The outcome parameter is related to the application type. For sports applications it must be an instance of a subclass of the SportOutcome class, e.g. for soccer the type of outcome is ratingslib.application.SoccerOutcome. For more details see the ratingslib.application module.
pred_method (Union[Literal['MLE', 'RANK'], sklearn.base.BaseEstimator]) – Three available methods for predictions: ‘RANK’ or ‘MLE’ or a scikit classifier. More details for ‘RANK’ or ‘MLE’ at ratingslib.application.SoccerOutcome
features_names (List[str]) – List of feature names (each name refers to a column of the data)
data_test (Optional[pd.DataFrame], default=None) – The test set. If data_test is passed then split and start_from_week parameters are ignored
split (Optional[Union[float, int]], default=None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples.
start_from_week (Optional[int], default=None) – The match week that the walk-forward procedure starts predictions
walk_forward_window_size (int, default=-1) – Only valid if start_from_week is not None. If -1, the walk-forward procedure will not run. For example, if walk_forward_window_size is 1, the window size of the walk-forward procedure is one week.
columns_dict (Optional[Dict[str, Any]], default=None) – The column names of the data file. See ratingslib.datasets.parameters.COLUMNS_DICT for more details.
print_accuracy_report (bool, default=True) – If True accuracy report will be printed
print_classification_report (bool, default=False) – If True, the classification report will be printed
print_predictions (bool, default=False) – If True, the predictions will be printed
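
The walk-forward parameters can be pictured with a small sketch. It assumes an expanding training window (all weeks before the test window) and a test window of walk_forward_window_size weeks; the actual procedure inside Predictions may differ. walk_forward_windows is a hypothetical helper, not part of ratingslib.

```python
from typing import Iterator, List, Tuple


def walk_forward_windows(
    weeks: List[int], start_from_week: int, window_size: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_weeks, test_weeks) pairs for a walk-forward run."""
    ordered = sorted(set(weeks))
    # Mirrors the documented -1 value: no walk-forward procedure.
    if window_size < 1 or not ordered:
        return
    test_start = start_from_week
    while test_start <= ordered[-1]:
        # Assumption: train on every week that precedes the test window.
        train = [w for w in ordered if w < test_start]
        test = [w for w in ordered if test_start <= w < test_start + window_size]
        if train and test:
            yield train, test
        test_start += window_size


windows = list(walk_forward_windows(list(range(1, 8)), start_from_week=5, window_size=1))
disabled = list(walk_forward_windows([1, 2, 3], start_from_week=2, window_size=-1))
```

With seven match weeks, start_from_week=5 and a window of one week, the sketch yields three train/test pairs, one for each of weeks 5, 6 and 7.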

_select_X_Y(data: DataFrame, features: List[str], col_names: SimpleNamespace) → Tuple[DataFrame, DataFrame, DataFrame]

Selects the given features from data and removes the non-rated weeks. A non-rated week is one where all instances have the same value. For example, the Massey rating system sometimes requires more data and therefore starts rating teams from the 4th week, which means that the second and third weeks have rating 0 for all teams. The first week is not included if it was removed during preprocessing.

Parameters

data (pd.DataFrame) – Games data
features (List[str]) – List of feature names (each name refers to a column of the data)
col_names (SimpleNamespace) – A simple object subclass that provides attribute access to its namespace. The attributes are the keys of columns_dict.

Returns

data_X (pandas.DataFrame) – Dataset that includes only the features after removing non-rated weeks, if any are found.
data_Y (pandas.DataFrame) – Dataset that contains only the outcomes after removing non-rated weeks, if any are found.
data (pandas.DataFrame) – Dataset after removing non-rated weeks. If no non-rated weeks are found, the dataset is returned unchanged.
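
The removal of non-rated weeks can be sketched as follows. This is a plain-Python illustration that uses a list of dictionaries in place of a DataFrame; the column names Week, Hrating and Arating and the helper drop_non_rated_weeks are assumptions made for the example, not the library's implementation.

```python
from collections import defaultdict
from typing import Dict, List


def drop_non_rated_weeks(
    rows: List[Dict], features: List[str], week_col: str = "Week"
) -> List[Dict]:
    """Keep only the weeks whose feature values are not all identical."""
    by_week = defaultdict(list)
    for row in rows:
        by_week[row[week_col]].append(row)
    kept: List[Dict] = []
    for week in sorted(by_week):
        # Collect every feature value that appears in this week.
        values = {row[f] for row in by_week[week] for f in features}
        # A single distinct value (e.g. rating 0 everywhere) marks a non-rated week.
        if len(values) > 1:
            kept.extend(by_week[week])
    return kept


rows = [
    {"Week": 2, "Hrating": 0.0, "Arating": 0.0},
    {"Week": 2, "Hrating": 0.0, "Arating": 0.0},
    {"Week": 4, "Hrating": 1.2, "Arating": -0.4},
    {"Week": 4, "Hrating": 0.3, "Arating": 0.8},
]
kept = drop_non_rated_weeks(rows, ["Hrating", "Arating"])
```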

_predict(pred_method: Union[Literal['MLE', 'RANK'], BaseEstimator], train_X: DataFrame, train_Y: Series, test_X: DataFrame) → tuple

Train first according to the given method and then predict

Parameters

pred_method (Union[Literal['MLE', 'RANK'], sklearn.base.BaseEstimator]) – Three available methods for predictions: ‘RANK’ or ‘MLE’ or a scikit classifier. More details for ‘RANK’ or ‘MLE’ at ratingslib.application.SoccerOutcome
train_X (pd.DataFrame) – The training set that includes only the features
train_Y (pd.Series) – The outcome labels of training set
test_X (pd.DataFrame) – The test set that includes only the features

Returns

The predictions for the target outcome and the predictions distribution

Return type

tuple

_train_and_test(*, pred_method: Union[Literal['MLE', 'RANK'], BaseEstimator], features_names: List[str]) → tuple

Training and testing based on the given method

Parameters

pred_method (Union[Literal['MLE', 'RANK'], sklearn.base.BaseEstimator]) – Three available methods for predictions: ‘RANK’ or ‘MLE’ or a scikit classifier. More details for ‘RANK’ or ‘MLE’ at ratingslib.application.SoccerOutcome
features_names (List[str]) – List of feature names (each name refers to a column of the data)

Returns

test_Y (The outcome labels of test set) and predictions

Return type

tuple

classifier_features_repr(clf, feature_names)

ml_pred(*, clf: BaseEstimator, features_names: List[str], to_dict=False) → Union[Tuple[List, List], dict]

Predict with ml classifiers

Parameters

clf (sklearn.base.BaseEstimator) – A scikit classifier instance
features_names (List[str]) – List of feature names (each name refers to a column of the data)
to_dict (bool, default = False) – If True then results will be returned as a dictionary where the key is the name of clf

Returns

Prediction results as tuple (test_Y, predictions) or dictionary {clf_repr: (test_Y, predictions)}

Return type

Union[Tuple[List, List], dict]

ml_pred_parallel(*, clf_list: List[BaseEstimator], features_names_list: List[List[str]], n_jobs: int = -1) → dict

Runs the ML predictions to test each of the classifiers in the given list

Parameters

clf_list (List[sklearn.base.BaseEstimator]) – List of scikit estimators
features_names_list (List[List[str]]) – List that contains lists of feature names (each name refers to a column of the data)
n_jobs (int, default=-1) – Number of jobs to run in parallel. -1 means using all processors. None means 1

Returns

Dictionary that maps classifier representations to their test_Y (the outcome labels of the test set) and predictions

Return type

dict

rs_pred(*, pred_method: Literal['MLE', 'RANK'], ratings: RatingSystem, to_dict: bool = False) → Union[Tuple[List, List], dict]

Prediction with one of two available methods: MLE or RANK

Parameters

pred_method (Literal['MLE', 'RANK']) – Two available methods for predictions: ‘RANK’ or ‘MLE’ More details at ratingslib.application.SoccerOutcome
ratings (RatingSystem) – Rating system instance
to_dict (bool, default = False) – If True then results will be returned as a dictionary where the key is the name of pred_method

Returns

prediction results as tuple (test_Y, predictions) or dictionary {pred_name: (test_Y, predictions)}

Return type

Union[Tuple[List, List], dict]

rs_pred_parallel(*, pred_methods_list: List[Literal['MLE', 'RANK']], rating_systems: Union[Dict[str, RatingSystem], List[RatingSystem], RatingSystem], n_jobs: int = -1) → dict

Runs the rating prediction for each of the methods in the given list

Parameters

pred_methods_list (List[Literal['MLE', 'RANK']]) – List of prediction methods
rating_systems (Dict[str, RatingSystem] or List[RatingSystem] or RatingSystem or None, default=None) – If a dictionary, it maps names (or rating keys) to rating systems. If a list of rating system instances or a single RatingSystem instance, it is first converted to a dictionary. If None, rating values are not included in the data attributes for preparation.
n_jobs (int, default=-1) – Number of jobs to run in parallel. -1 means using all processors. None means 1

Returns

Dictionary that maps prediction name method to results per rating system

Return type

dict
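
The conversion of the rating_systems argument to a dictionary, as described in the parameter list above, might look like the sketch below. DummyRatingSystem, its name attribute and the to_rating_dict helper are hypothetical stand-ins; how ratingslib actually derives the dictionary keys is not stated in this reference.

```python
from typing import Dict, List, Optional, Union


class DummyRatingSystem:
    """Hypothetical stand-in for a RatingSystem; it only carries a name."""

    def __init__(self, name: str) -> None:
        self.name = name


RatingArg = Optional[
    Union[Dict[str, DummyRatingSystem], List[DummyRatingSystem], DummyRatingSystem]
]


def to_rating_dict(rating_systems: RatingArg) -> Optional[Dict[str, DummyRatingSystem]]:
    """Normalize the three accepted input shapes to a name -> system mapping."""
    if rating_systems is None:
        return None  # no ratings are used
    if isinstance(rating_systems, dict):
        return dict(rating_systems)
    if isinstance(rating_systems, list):
        # Assumption: each system is keyed by its (hypothetical) name attribute.
        return {rs.name: rs for rs in rating_systems}
    return {rating_systems.name: rating_systems}


single = to_rating_dict(DummyRatingSystem("Massey"))
several = to_rating_dict([DummyRatingSystem("Massey"), DummyRatingSystem("Colley")])
```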

rs_tuning_params(*, ratings_dict: Dict[str, RatingSystem], predict_with: Union[Literal['MLE', 'RANK'], BaseEstimator], use_norm_ratings: bool = True, metric_name: str = 'accuracy', maximize: bool = True, print_out: bool = True, **kwargs) → dict

Tuning of rating systems parameters for the given metric with grid-search.

Parameters

ratings_dict (Dict[str, RatingSystem]) – Dictionary that maps names to ratings. Note that ratings are stored in a pandas.DataFrame.
predict_with (Union[Literal['MLE', 'RANK'], sklearn.base.BaseEstimator]) – Three available methods for predictions: ‘RANK’ or ‘MLE’ or a scikit classifier. More details for ‘RANK’ or ‘MLE’ at ratingslib.application.SoccerOutcome
use_norm_ratings (bool, default=True) – If True, normalized rating values are used
metric_name (str, default='accuracy') – The optimization metric, available metrics name at https://scikit-learn.org/stable/modules/model_evaluation.html
maximize (bool, default=True) – If True, the metric is maximized; otherwise it is minimized
print_out (bool, default=True) – Print results if True
**kwargs (dict) – All keyword arguments are passed to _score_func of scikit

Returns

best – Dictionary that maps rating system versions to their best values

Return type

dict
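
Grid-search tuning with a maximize flag can be sketched in a few lines. This is a generic illustration of the technique, not the code of rs_tuning_params: grid_search_sketch, the parameter names a and b, and toy_score are all hypothetical, and a real run would score each candidate by predicting games and evaluating the chosen metric.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Tuple


def grid_search_sketch(
    param_grid: Dict[str, List[Any]],
    score: Callable[[Dict[str, Any]], float],
    maximize: bool = True,
) -> Tuple[Dict[str, Any], float]:
    """Evaluate every parameter combination and return the best one."""
    names = list(param_grid)
    candidates = [
        dict(zip(names, values))
        for values in product(*(param_grid[n] for n in names))
    ]
    scored = [(score(params), params) for params in candidates]
    # maximize=True picks the highest score, maximize=False the lowest.
    pick = max if maximize else min
    best_score, best_params = pick(scored, key=lambda item: item[0])
    return best_params, best_score


def toy_score(params: Dict[str, Any]) -> float:
    """Toy metric that peaks at a=2, b=1 (stands in for accuracy)."""
    return -((params["a"] - 2) ** 2) - (params["b"] - 1) ** 2


grid = {"a": [1, 2, 3], "b": [0, 1]}
best_params, best_score = grid_search_sketch(grid, toy_score, maximize=True)
_, worst_score = grid_search_sketch(grid, toy_score, maximize=False)
```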

ml_tuning_params(*, clf_list: List[BaseEstimator], features_names: List[str], metric_name: str = 'accuracy', maximize: bool = True, print_out: bool = True, n_jobs: int = -1, **kwargs)

Tuning the classifiers hyper-parameters for the given metric with grid-search.

Parameters

clf_list (List[sklearn.base.BaseEstimator]) – List of scikit estimators
features_names (List[str]) – List of feature names (each name refers to a column of the data)
metric_name (str, default='accuracy') – The optimization metric, available metrics name at https://scikit-learn.org/stable/modules/model_evaluation.html
maximize (bool, default=True) – If True, the metric is maximized; otherwise it is minimized
print_out (bool, default=True) – Print results if True
n_jobs (int, default=-1) – Number of jobs to run in parallel. -1 means using all processors. None means 1
**kwargs (dict) – All keyword arguments are passed to _score_func of scikit

Returns

best – Dictionary that maps classifier representations to their best values

Return type

dict

rating_norm_features(ratings) → List[str]

Function to use normalized ratings as ML features. For example, for AccuRATE:

Home = H + ratingnorm + key = HratingnormAccuRATE
Away = A + ratingnorm + key = AratingnormAccuRATE

Parameters

ratings (Dict[str, RatingSystem] or List[RatingSystem] or RatingSystem or None, default=None) – If a dictionary, it maps names (or rating keys) to rating systems. If a list of rating system instances or a single RatingSystem instance, it is first converted to a dictionary. If None, rating values are not included in the data attributes for preparation.

Returns

features – List of normalized features (each name refers to a column of the data)

Return type

List[str]
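
The naming scheme for normalized rating columns can be reproduced with a short sketch. It follows the H + ratingnorm + key and A + ratingnorm + key pattern given in the description; the helper name, its argument (a plain list of rating keys) and the home-before-away ordering of the result are assumptions.

```python
from typing import Iterable, List


def rating_norm_features_sketch(rating_keys: Iterable[str]) -> List[str]:
    """Build home and away normalized-rating column names for each key."""
    features: List[str] = []
    for key in rating_keys:
        features.append("H" + "ratingnorm" + key)  # home team column
        features.append("A" + "ratingnorm" + key)  # away team column
    return features


features = rating_norm_features_sketch(["AccuRATE", "Massey"])
```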

enter_values(data: DataFrame, teams_df: DataFrame, teams_dict: Dict[Any, int], rating_systems: Optional[Union[Dict[str, RatingSystem], List[RatingSystem], RatingSystem]] = None, stats_attributes: Optional[Dict[str, Dict[Any, Any]]] = None, columns_dict: Optional[Dict[str, Any]] = None) → DataFrame

Enter the calculated values (from rating and statistic attributes) for each data-instance and return the data. Also, truncation is applied.

Parameters

data (pd.DataFrame) – Games data with statistics and rating values for the teams
teams_df (pd.DataFrame) – Set of teams.

teams_dict (Dict[Any, int]) – Dictionary that maps team names to integer values. For instance:

teams_dict = {'Arsenal': 0,
              'Aston Villa': 1,
              'Birmingham': 2,
              'Blackburn': 3
              }

rating_systems (Dict[str, RatingSystem] or List[RatingSystem] or RatingSystem or None, default=None) – If a dictionary, it maps names (or rating keys) to rating systems. If a list of rating system instances or a single RatingSystem instance, it is first converted to a dictionary. If None, rating values are not included in the data attributes for preparation.
stats_attributes (Optional[Dict[str, Dict[Any, Any]]], default=None) – The statistic attributes, e.g. for soccer: TW (Total Wins), TG (Total Goals), TS (Total Shots), TST (Total Shots on Target).
columns_dict (Optional[Dict[str, str]], default=None) – A dictionary mapping the column names of the dataset. See the module ratingslib.datasets.parameters for more details.

Returns

data_truncate_df – Completed data-instances

Return type

pd.DataFrame
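
A teams_dict of the shape shown in the parameter list can be built from a list of team names. The sketch below assumes alphabetical ordering, which matches the example values but is not confirmed as the library's behavior; build_teams_dict is a hypothetical helper.

```python
from typing import Dict, Iterable


def build_teams_dict(team_names: Iterable[str]) -> Dict[str, int]:
    """Map each distinct team name to a consecutive integer index."""
    # Duplicates are removed and names are sorted so the mapping is stable.
    return {name: idx for idx, name in enumerate(sorted(set(team_names)))}


teams_dict = build_teams_dict(
    ["Blackburn", "Arsenal", "Birmingham", "Aston Villa", "Arsenal"]
)
```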

_create_rating_data(rs_name: str, rs: RatingSystem, data_train: DataFrame, teams_df: DataFrame)

Rate teams and also create column for normalized rating values

Parameters

rs_name (str) – Name of rating system (from the key of dictionary)
rs (RatingSystem) – RatingSystem instance
data_train (pd.DataFrame) – Games data for training
teams_df (pd.DataFrame) – Set of teams.

Returns

teams_df – Teams DataFrame with rating values, and normalized rating values.

Return type

pd.DataFrame

prepare_sport_dataset(data_season: DataFrame, teams_df: DataFrame, rating_systems: Optional[Union[Dict[str, RatingSystem], List[RatingSystem], RatingSystem]] = None, stats_attributes=None, start_week: int = 4, preprocess: Optional[Preprocess] = BasicPreprocess(), columns_dict: Optional[Dict[str, Any]] = None) → DataFrame

Prepares the sport dataset by entering, for every match-week, the rating values and the calculated game statistics of the teams.

Parameters

data_season (pd.DataFrame) – Games data of season
teams_df (pd.DataFrame) – Set of teams
rating_systems (Dict[str, RatingSystem] or List[RatingSystem] or RatingSystem or None, default=None) – If a dictionary, it maps names (or rating keys) to rating systems. If a list of rating system instances or a single RatingSystem instance, it is first converted to a dictionary. If None, rating values are not included in the data attributes for preparation.
stats_attributes (Optional[Dict[str, Dict[Any, Any]]], default=None) – The statistic attributes, e.g. for soccer: TW (Total Wins), TG (Total Goals), TS (Total Shots), TST (Total Shots on Target).
start_week (int, default=4) – The match-week at which the rating procedure starts. For example, if start_week is 4, the rating of teams starts from the 4th week. Each week's ratings are computed from the previous weeks, e.g. the ratings for the 7th week use weeks 1, 2, 3, 4, 5 and 6.
preprocess (Preprocess) – The preprocessing procedure for the dataset. It must be an instance of a subclass of ratingslib.datasets.preprocess.Preprocess
columns_dict (Optional[Dict[str, str]], default=None) – A dictionary mapping the column names of the dataset. See the module ratingslib.datasets.parameters for more details

Returns

DataFrame of prepared data

Return type

pd.DataFrame
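
The start_week rule, where each week is rated from all earlier weeks, can be written out as a small schedule. This sketch only lists which weeks feed each rating step; rating_weeks_schedule and its last_week argument are hypothetical and no rating is computed.

```python
from typing import Dict, List


def rating_weeks_schedule(last_week: int, start_week: int = 4) -> Dict[int, List[int]]:
    """Map each rated week to the earlier weeks its ratings are based on."""
    return {
        week: list(range(1, week))  # weeks 1 .. week-1
        for week in range(start_week, last_week + 1)
    }


schedule = rating_weeks_schedule(last_week=7)
```

With the default start_week of 4, week 4 is rated from weeks 1 to 3 and week 7 from weeks 1 to 6, which is the case given in the parameter description.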

prepare_sports_seasons(filenames: Union[str, Dict[int, str]], outcome: SportOutcome, rating_systems: Optional[Union[Dict[str, RatingSystem], List[RatingSystem], RatingSystem]] = None, stats_attributes=None, start_week: int = 4, preprocess: Optional[Preprocess] = BasicPreprocess(), columns_dict: Optional[Dict[str, Any]] = None) → Dict[int, DataFrame]

Prepares datasets for multiple files that are passed as a dictionary.

Parameters

filenames (Union[str, Dict[int, str]]) – Filename or dictionary that maps seasons to filename paths. e.g. {2009: ‘sports/pl2009.csv’}
outcome (SportOutcome) – The outcome parameter is associated with application type e.g. for soccer the type of outcome is ratingslib.application.SoccerOutcome. For more details see ratingslib.application module.
rating_systems (Dict[str, RatingSystem] or List[RatingSystem] or RatingSystem or None, default=None) – If a dictionary, it maps names (or rating keys) to rating systems. If a list of rating system instances or a single RatingSystem instance, it is first converted to a dictionary. If None, rating values are not included in the data attributes for preparation.
stats_attributes (Optional[Dict[str, Dict[Any, Any]]], default=None) – The statistic attributes, e.g. for soccer: TW (Total Wins), TG (Total Goals), TS (Total Shots), TST (Total Shots on Target).
start_week (int, default=4) – The match-week at which the rating procedure starts. For example, if start_week is 4, the rating of teams starts from the 4th week. Each week's ratings are computed from the previous weeks, e.g. the ratings for the 7th week use weeks 1, 2, 3, 4, 5 and 6.
preprocess (Preprocess) – The preprocessing procedure for the dataset. It must be an instance of a subclass of ratingslib.datasets.preprocess.Preprocess
columns_dict (Optional[Dict[str, str]], default=None) – A dictionary mapping the column names of the dataset. See the module ratingslib.datasets.parameters for more details

Returns

data_seasons_dict – Dictionary that maps each season to its prepared DataFrame. Note that if only one filename is passed, the dictionary is returned with the structure {1: data}

Return type

Dict[int, pd.DataFrame]
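
The handling of the filenames argument can be sketched as a normalization step. It assumes that a single filename is stored under the key 1, consistent with the {1: data} structure noted in the Returns section; normalize_filenames is a hypothetical helper and no file is read here.

```python
from typing import Dict, Union


def normalize_filenames(filenames: Union[str, Dict[int, str]]) -> Dict[int, str]:
    """Return a season -> path mapping for either accepted input shape."""
    if isinstance(filenames, str):
        # A single file has no season label, so key 1 is used.
        return {1: filenames}
    return dict(filenames)


one_season = normalize_filenames("sports/pl2009.csv")
many_seasons = normalize_filenames({2009: "sports/pl2009.csv"})
```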