ratingslib.app_sports.methods module
Predictions of sport outcome without backtester
- predict_hindsight(data: DataFrame, teams_rating_df: DataFrame, outcome: SportOutcome, pred_method: Literal['MLE', 'RANK'] = 'RANK', columns_dict: Optional[Dict[str, Any]] = None) Tuple[list, list]
Hindsight prediction refers to predicting past games using ratings computed from the entire set of games.
- Parameters
data (pd.DataFrame) – Data of games
teams_rating_df (pd.DataFrame) – Rating values of teams. Note that ‘rating’ column must be in the DataFrame columns
outcome (SportOutcome) – The outcome parameter is associated with the application type, e.g. for soccer the type of outcome is ratingslib.application.SoccerOutcome. For more details see the ratingslib.application module.
pred_method (Literal['RANK', 'MLE'], default='RANK') – Two available methods for predictions: ‘RANK’ or ‘MLE’. More details at ratingslib.application.SoccerOutcome.
columns_dict (Optional[Dict[str, str]], default=None) – A dictionary mapping the column names of the dataset. See the module ratingslib.datasets.parameters for more details.
- Returns
pred (list) – List of predictions
Y (list) – Correct outcome values
- Return type
Tuple[list, list]
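As a rough illustration of the hindsight idea (not the library's implementation), the sketch below takes a final ratings table and predicts every game already played by comparing the two teams' ratings. The column names home, away, result and team, the labels 'H', 'D', 'A', and the rule that the higher-rated team wins are assumptions made for this example; only the 'rating' column comes from the description above.

```python
import pandas as pd


def hindsight_rank_sketch(games: pd.DataFrame, teams_rating_df: pd.DataFrame):
    """Predict past games from ratings computed over the whole set of games."""
    # Look-up table: team name -> final rating (hypothetical 'team' column).
    rating_of = teams_rating_df.set_index("team")["rating"]
    home = games["home"].map(rating_of)
    away = games["away"].map(rating_of)
    # Assumed rule: the higher-rated side wins, equal ratings give a draw.
    pred = ["H" if h > a else "A" if a > h else "D" for h, a in zip(home, away)]
    # Return order mirrors the documented (pred, Y) tuple.
    return pred, games["result"].tolist()


games = pd.DataFrame({
    "home": ["Arsenal", "Aston Villa", "Birmingham"],
    "away": ["Aston Villa", "Birmingham", "Arsenal"],
    "result": ["H", "D", "A"],
})
teams_rating_df = pd.DataFrame({
    "team": ["Arsenal", "Aston Villa", "Birmingham"],
    "rating": [1.8, 1.1, 1.1],
})
pred, Y = hindsight_rank_sketch(games, teams_rating_df)
```

Because the ratings already contain information from the games being predicted, accuracy measured this way describes how well a rating explains past results, not how well it forecasts unseen ones.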
- accuracy_results(test_Y: list, predictions: list) Tuple[float, int]
Returns the accuracy as a percentage together with the number of correctly classified samples.
- Parameters
test_Y (list) – Ground truth (correct) labels.
predictions (list) – Predicted labels
- Returns
accuracy (float) – Accuracy metric
correct (int) – Correctly classified samples
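The two returned values can be computed along the following lines. This is a minimal sketch and not the library's code: the function name accuracy_sketch is hypothetical, and the 0 to 100 scale and the handling of empty input are assumptions.

```python
from typing import List, Tuple


def accuracy_sketch(test_Y: List, predictions: List) -> Tuple[float, int]:
    """Return (accuracy in percent, number of correctly classified samples)."""
    if len(test_Y) != len(predictions):
        raise ValueError("test_Y and predictions must have the same length")
    # Count positions where the predicted label equals the true label.
    correct = sum(1 for y, p in zip(test_Y, predictions) if y == p)
    # Assumption: an empty test set yields 0.0 instead of raising an error.
    accuracy = 100.0 * correct / len(test_Y) if test_Y else 0.0
    return accuracy, correct


accuracy, correct = accuracy_sketch([1, 2, 3, 1], [1, 2, 1, 1])
```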
- show_list_of_accuracy_results(names_list: List[str], test_Y: list, predictions_list: list, print_predictions: bool)
Show accuracy results for a list of models
- Parameters
names_list (List[str]) – Model name list
test_Y (list) – List of correct labels
predictions_list (list) – List that contains lists of predicted labels
print_predictions (bool) – If True then predictions are printed
- classification_details(name: str, test_Y: list, pred: list) str
Return classification details for a prediction model based on truth labels and predictions
- Parameters
name (str) – Name of prediction model
test_Y (list) – List of correct labels
pred (list) – List of predictions
- Returns
Classification details as string
- Return type
str
- class Predictions(data: Union[Dict[int, DataFrame], DataFrame], outcome: SportOutcome, data_test: Optional[DataFrame] = None, split: Optional[Union[float, int]] = None, start_from_week: Optional[int] = None, walk_forward_window_size: int = - 1, columns_dict: Optional[Dict[str, Any]] = None, print_accuracy_report: bool = True, print_classification_report: bool = False, print_predictions: bool = False)
Bases: object
Class for predicting soccer match results
- Parameters
data (Union[Dict[int, pd.DataFrame], pd.DataFrame]) – Data of games in a dictionary or in a DataFrame. If a dictionary is passed, each key is a season and each value is the data of that season.
outcome (SportOutcome) – The outcome parameter is related to the application type. For sports applications it must be an instance of a subclass of the SportOutcome class, e.g. for soccer the type of outcome is ratingslib.application.SoccerOutcome. For more details see the ratingslib.application module.
pred_method (Union[Literal['MLE', 'RANK'], sklearn.base.BaseEstimator]) – Three available methods for predictions: ‘RANK’, ‘MLE’ or a scikit classifier. More details for ‘RANK’ or ‘MLE’ at ratingslib.application.SoccerOutcome.
features_names (List[str]) – List of feature names (each name refers to a column of the data)
data_test (Optional[pd.DataFrame], default=None) – The test set. If data_test is passed then split and start_from_week parameters are ignored
split (Optional[Union[float, int]], default=None) – If float, should be between 0.0 and 1.0 and represent the proportion of the dataset to include in the test split. If int, represents the absolute number of test samples.
start_from_week (Optional[int], default=None) – The match week that the walk-forward procedure starts predictions
walk_forward_window_size (int, default=-1) – Only valid if start_from_week is not None. If -1 then the walk-forward procedure will not run. For example, if walk_forward_window_size is 1 then the window size of walk-forward is one week.
columns_dict (Optional[Dict[str, Any]], default=None) – The column names of the data file. See ratingslib.datasets.parameters.COLUMNS_DICT for more details.
print_accuracy_report (bool, default=True) – If True, the accuracy report will be printed
print_classification_report (bool, default=False) – If True, the classification report will be printed
print_predictions (bool, default=False) – If True, the predictions will be printed
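The interplay of start_from_week and walk_forward_window_size can be pictured with the sketch below. It is an assumption-laden illustration, not the class's actual logic: the helper name walk_forward_windows is hypothetical, and the choice to train on all weeks strictly before each test window is an assumed expanding-window scheme.

```python
from typing import Iterable, Iterator, List, Tuple


def walk_forward_windows(
    weeks: Iterable[int], start_from_week: int, window_size: int
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_weeks, test_weeks) pairs for a walk-forward procedure."""
    ordered = sorted(set(weeks))
    # -1 (or any size below 1) means the walk-forward procedure does not run.
    if window_size < 1 or not ordered:
        return
    current = start_from_week
    while current <= ordered[-1]:
        # Assumed scheme: train on everything before the window, test on the window.
        train = [w for w in ordered if w < current]
        test = [w for w in ordered if current <= w < current + window_size]
        if train and test:
            yield train, test
        current += window_size


windows = list(walk_forward_windows(range(1, 7), start_from_week=4, window_size=1))
disabled = list(walk_forward_windows(range(1, 7), start_from_week=4, window_size=-1))
```

With a window of one week, each match week from start_from_week onward is predicted once, using only games played before it.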
- _select_X_Y(data: DataFrame, features: List[str], col_names: SimpleNamespace) Tuple[DataFrame, DataFrame, DataFrame]
Selects the given features from data and removes the non-rated weeks. A non-rated week is one in which all instances have the same value (e.g. the Massey case: the Massey rating system sometimes requires more data, so it starts rating teams from the 4th week, which means that the second and third weeks have rating 0 for all teams). The first week is not included if it was removed during preprocessing.
- Parameters
data (pd.DataFrame) – Games data
features (List[str]) – List of feature names (each name refers to a column of the data)
col_names (SimpleNamespace) – A simple object subclass that provides attribute access to its namespace. The attributes are the keys of columns_dict.
- Returns
data_X (pandas.DataFrame) – Dataset that includes only the features, after removing non-rated weeks if any are found.
data_Y (pandas.DataFrame) – Dataset that contains only the outcomes, after removing non-rated weeks if any are found.
data (pandas.DataFrame) – Dataset after removing non-rated weeks. If no non-rated weeks are found, the dataset is returned without any changes.
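One way to detect and drop non-rated weeks is sketched below with pandas. It is an illustration only, not the method's actual code: the helper name, the Week column and the HratingMassey / AratingMassey feature names are assumptions made for the example.

```python
import pandas as pd


def drop_non_rated_weeks(data: pd.DataFrame, features, week_col: str = "Week") -> pd.DataFrame:
    """Keep only the weeks in which the features vary between games."""
    # Count distinct values per week for every feature column.
    distinct = data.groupby(week_col)[features].nunique()
    # A week is rated if at least one feature takes more than one value.
    is_rated = distinct.gt(1).any(axis=1)
    rated_weeks = is_rated[is_rated].index
    return data[data[week_col].isin(rated_weeks)]


data = pd.DataFrame({
    "Week": [2, 2, 3, 3, 4, 4],
    "HratingMassey": [0.0, 0.0, 0.0, 0.0, 0.7, -0.2],
    "AratingMassey": [0.0, 0.0, 0.0, 0.0, -0.5, 0.4],
})
rated = drop_non_rated_weeks(data, ["HratingMassey", "AratingMassey"])
```

Here weeks 2 and 3 carry the constant rating 0 for every team and are dropped, while week 4 is kept.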
- _predict(pred_method: Union[Literal['MLE', 'RANK'], BaseEstimator], train_X: DataFrame, train_Y: Series, test_X: DataFrame) tuple
Train first according to the given method and then predict
- Parameters
pred_method (Union[Literal['MLE', 'RANK'], sklearn.base.BaseEstimator]) – Three available methods for predictions: ‘RANK’, ‘MLE’ or a scikit classifier. More details for ‘RANK’ or ‘MLE’ at ratingslib.application.SoccerOutcome.
train_X (pd.DataFrame) – The training set that includes only the features
train_Y (pd.Series) – The outcome labels of the training set
test_X (pd.DataFrame) – The test set that includes only the features
- Returns
The predictions for the target outcome and the predictions distribution
- Return type
tuple
- _train_and_test(*, pred_method: Union[Literal['MLE', 'RANK'], BaseEstimator], features_names: List[str]) tuple
Training and testing based on the given method
- Parameters
pred_method (Union[Literal['MLE', 'RANK'], sklearn.base.BaseEstimator]) – Three available methods for predictions: ‘RANK’, ‘MLE’ or a scikit classifier. More details for ‘RANK’ or ‘MLE’ at ratingslib.application.SoccerOutcome.
features_names (List[str]) – List of feature names (each name refers to a column of the data)
- Returns
test_Y (The outcome labels of test set) and predictions
- Return type
tuple
- classifier_features_repr(clf, feature_names)
- ml_pred(*, clf: BaseEstimator, features_names: List[str], to_dict=False) Union[Tuple[List, List], dict]
Predict with ml classifiers
- Parameters
clf (sklearn.base.BaseEstimator) – A scikit classifier instance
features_names (List[str]) – List of feature names (each name refers to a column of the data)
to_dict (bool, default = False) – If True then results will be returned as a dictionary where the key is the name of clf
- Returns
Prediction results as tuple (test_Y, predictions) or dictionary {clf_repr: (test_Y, predictions)}
- Return type
Union[Tuple[List, List], dict]
- ml_pred_parallel(*, clf_list: List[BaseEstimator], features_names_list: List[List[str]], n_jobs: int = - 1) dict
Runs the ml predictions to test each one of the classifiers from the given list
- Parameters
clf_list (List[sklearn.base.BaseEstimator]) – List of scikit estimators
features_names_list (List[List[str]]) – List that contains lists of feature names (each name refers to a column of the data)
n_jobs (int, default=-1) – Number of jobs to run in parallel. -1 means using all processors. None means 1.
- Returns
Dictionary that maps classifier representations to their test_Y (the outcome labels of the test set) and predictions
- Return type
dict
- rs_pred(*, pred_method: Literal['MLE', 'RANK'], ratings: RatingSystem, to_dict: bool = False) Union[Tuple[List, List], dict]
Prediction with one of two available methods: MLE or RANK
- Parameters
pred_method (Literal['MLE', 'RANK']) – Two available methods for predictions: ‘RANK’ or ‘MLE’. More details at ratingslib.application.SoccerOutcome.
ratings (RatingSystem) – Rating system instance
to_dict (bool, default = False) – If True then results will be returned as a dictionary where the key is the name of pred_method
- Returns
prediction results as tuple (test_Y, predictions) or dictionary {pred_name: (test_Y, predictions)}
- Return type
Union[Tuple[List, List], dict]
- rs_pred_parallel(*, pred_methods_list: List[Literal['MLE', 'RANK']], rating_systems: Union[Dict[str, RatingSystem], List[RatingSystem], RatingSystem], n_jobs: int = - 1) dict
Runs the rating prediction for each of the methods in the given list
- Parameters
pred_methods_list (List[Literal['MLE', 'RANK']]) – List of prediction methods
rating_systems (Dict[str, RatingSystem] or List[RatingSystem] or RatingSystem or None, default=None) – If a dictionary, it maps names (or rating keys) to rating systems. If a list of rating system instances or a single RatingSystem instance, it is first converted to a dictionary. If set to None, rating values are not included in the data attributes for preparation.
n_jobs (int, default=-1) – Number of jobs to run in parallel. -1 means using all processors. None means 1.
- Returns
Dictionary that maps each prediction method name to the results per rating system
- Return type
dict
- rs_tuning_params(*, ratings_dict: Dict[str, RatingSystem], predict_with: Union[Literal['MLE', 'RANK'], BaseEstimator], use_norm_ratings: bool = True, metric_name: str = 'accuracy', maximize: bool = True, print_out: bool = True, **kwargs) dict
Tuning of rating systems parameters for the given metric with grid-search.
- Parameters
ratings_dict (Dict[str, RatingSystem]) – Dictionary that maps names to ratings. Note that ratings are stored in a pandas.DataFrame.
predict_with (Union[Literal['MLE', 'RANK'], sklearn.base.BaseEstimator]) – Three available methods for predictions: ‘RANK’, ‘MLE’ or a scikit classifier. More details for ‘RANK’ or ‘MLE’ at ratingslib.application.SoccerOutcome.
use_norm_ratings (bool, default=True) – If True, normalized rating values are used
metric_name (str, default='accuracy') – The optimization metric. Available metric names are listed at https://scikit-learn.org/stable/modules/model_evaluation.html
maximize (bool, default=True) – If True then maximize the metric, else minimize it
print_out (bool, default=True) – Print results if True
**kwargs (dict) – All keyword arguments are passed to _score_func of scikit
- Returns
best – Dictionary that maps rating system versions to their best values
- Return type
dict
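The general shape of a grid search with a maximize switch is sketched below. It is a generic illustration under stated assumptions, not the library's tuning code: grid_search_sketch, the parameter names w and d, and the toy scoring function are all hypothetical.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Optional, Tuple


def grid_search_sketch(
    param_grid: Dict[str, List[Any]],
    score: Callable[[Dict[str, Any]], float],
    maximize: bool = True,
) -> Tuple[Optional[Dict[str, Any]], Optional[float]]:
    """Evaluate every parameter combination and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, None
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        # Strict comparison: the first combination wins in case of a tie.
        if best_score is None or (current > best_score if maximize else current < best_score):
            best_params, best_score = params, current
    return best_params, best_score


def toy_accuracy(params: Dict[str, Any]) -> float:
    # Hypothetical stand-in for "rate teams, predict, compute the metric".
    return 1.0 - abs(params["w"] - 2) * 0.1 - abs(params["d"] - 0.5)


best_params, best_score = grid_search_sketch(
    {"w": [1, 2, 3], "d": [0.0, 0.5, 1.0]}, toy_accuracy, maximize=True
)
```

For an error-type metric, passing maximize=False makes the same loop keep the lowest score instead.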
- ml_tuning_params(*, clf_list: List[BaseEstimator], features_names: List[str], metric_name: str = 'accuracy', maximize: bool = True, print_out: bool = True, n_jobs: int = - 1, **kwargs)
Tuning of the classifiers' hyper-parameters for the given metric with grid-search.
- Parameters
clf_list (List[sklearn.base.BaseEstimator]) – List of scikit estimators
features_names (List[str]) – List of feature names (each name refers to a column of the data)
metric_name (str, default='accuracy') – The optimization metric. Available metric names are listed at https://scikit-learn.org/stable/modules/model_evaluation.html
maximize (bool, default=True) – If True then maximize the metric, else minimize it
print_out (bool, default=True) – Print results if True
n_jobs (int, default=-1) – Number of jobs to run in parallel. -1 means using all processors. None means 1.
**kwargs (dict) – All keyword arguments are passed to _score_func of scikit
- Returns
best – Dictionary that maps classifier representations to their best values
- Return type
dict
- rating_norm_features(ratings) List[str]
Function to use normalized ratings as ML features. For example, for AccuRATE:
Home: H + ratingnorm + key = HratingnormAccuRATE
Away: A + ratingnorm + key = AratingnormAccuRATE
- Parameters
rating_systems (Dict[str, RatingSystem] or List[RatingSystem] or RatingSystem or None, default=None) – If a dictionary, it maps names (or rating keys) to rating systems. If a list of rating system instances or a single RatingSystem instance, it is first converted to a dictionary. If set to None, rating values are not included in the data attributes for preparation.
- Returns
features – List of normalized features (each name refers to a column of the data)
- Return type
List[str]
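The naming scheme described above (H or A, then ratingnorm, then the rating key) can be reproduced with a few lines. This is a sketch: the helper name is hypothetical, the key 'Massey' is used only as a second illustrative key, and the home-then-away ordering of the output is an assumption.

```python
from typing import Iterable, List


def rating_norm_feature_names(rating_keys: Iterable[str]) -> List[str]:
    """Build home/away normalized-rating column names for each rating key."""
    features: List[str] = []
    for key in rating_keys:
        features.append("H" + "ratingnorm" + key)  # home team column
        features.append("A" + "ratingnorm" + key)  # away team column
    return features


features = rating_norm_feature_names(["AccuRATE", "Massey"])
```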
- enter_values(data: DataFrame, teams_df: DataFrame, teams_dict: Dict[Any, int], rating_systems: Optional[Union[Dict[str, RatingSystem], List[RatingSystem], RatingSystem]] = None, stats_attributes: Optional[Dict[str, Dict[Any, Any]]] = None, columns_dict: Optional[Dict[str, Any]] = None) DataFrame
Enter the calculated values (from rating and statistic attributes) for each data-instance and return the data. Also, truncation is applied.
- Parameters
data (pd.DataFrame) – Games data with statistics and rating values for the teams
teams_df (pd.DataFrame) – Set of teams.
teams_dict (Dict[Any, int]) – Dictionary that maps teams’ names to integer values. For instance:
teams_dict = {'Arsenal': 0, 'Aston Villa': 1, 'Birmingham': 2, 'Blackburn': 3 }
rating_systems (Dict[str, RatingSystem] or List[RatingSystem] or RatingSystem or None, default=None) – If a dictionary, it maps names (or rating keys) to rating systems. If a list of rating system instances or a single RatingSystem instance, it is first converted to a dictionary. If set to None, rating values are not included in the data attributes for preparation.
stats_attributes (Optional[Dict[str, Dict[Any, Any]]], default=None) – The statistic attributes, e.g. for soccer: TW (Total Wins), TG (Total Goals), TS (Total Shots), TST (Total Shots on Target).
columns_dict (Optional[Dict[str, str]], default=None) – A dictionary mapping the column names of the dataset. See the module ratingslib.datasets.parameters for more details.
- Returns
data_truncate_df – Completed data-instances
- Return type
pd.DataFrame
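A teams_dict of the form shown in the parameter description can be built from a list of team names as sketched below. The helper name is hypothetical, and assigning indices in alphabetical order is an assumption that happens to reproduce the example mapping; the library may assign indices differently.

```python
from typing import Dict, Iterable


def build_teams_dict(team_names: Iterable[str]) -> Dict[str, int]:
    """Map each distinct team name to a consecutive integer index."""
    # Duplicates are removed first; sorting makes the indices deterministic.
    return {name: idx for idx, name in enumerate(sorted(set(team_names)))}


teams_dict = build_teams_dict(
    ["Blackburn", "Arsenal", "Birmingham", "Aston Villa", "Arsenal"]
)
```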
- _create_rating_data(rs_name: str, rs: RatingSystem, data_train: DataFrame, teams_df: DataFrame)
Rate teams and also create column for normalized rating values
- Parameters
rs_name (str) – Name of rating system (from the key of dictionary)
rs (RatingSystem) – RatingSystem instance
data_train (pd.DataFrame) – Games data for training
teams_df (pd.DataFrame) – Set of teams.
- Returns
teams_df – Teams DataFrame with rating values, and normalized rating values.
- Return type
pd.DataFrame
- prepare_sport_dataset(data_season: DataFrame, teams_df: DataFrame, rating_systems: Optional[Union[Dict[str, RatingSystem], List[RatingSystem], RatingSystem]] = None, stats_attributes=None, start_week: int = 4, preprocess: Optional[Preprocess] = BasicPreprocess(), columns_dict: Optional[Dict[str, Any]] = None) DataFrame
Prepares the sport dataset in order to enter values of ratings and calculated games statistics to the teams every match-week.
- Parameters
data_season (pd.DataFrame) – Games data of season
teams_df (pd.DataFrame) – Set of teams
rating_systems (Dict[str, RatingSystem] or List[RatingSystem] or RatingSystem or None, default=None) – If a dictionary, it maps names (or rating keys) to rating systems. If a list of rating system instances or a single RatingSystem instance, it is first converted to a dictionary. If set to None, rating values are not included in the data attributes for preparation.
stats_attributes (Optional[Dict[str, Dict[Any, Any]]], default=None) – The statistic attributes, e.g. for soccer: TW (Total Wins), TG (Total Goals), TS (Total Shots), TST (Total Shots on Target).
start_week (int, default=4) – The match-week in which the rating procedure starts. For example, if start_week is 4 then the rating of teams starts from the 4th week. Each week, ratings are computed from the previous weeks, e.g. 7th week -> 1,2,3,4,5,6
preprocess (Preprocess) – The preprocess procedure for the dataset. It must be an instance of a subclass of ratingslib.datasets.preprocess.Preprocess
columns_dict (Optional[Dict[str, str]], default=None) – A dictionary mapping the column names of the dataset. See the module ratingslib.datasets.parameters for more details.
- Returns
DataFrame of prepared data
- Return type
pd.DataFrame
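The rule that each week is rated from the previous weeks only can be sketched with pandas as follows. This is an illustration under assumptions, not the function's code: the helper name and the Week and game_id columns are hypothetical.

```python
import pandas as pd


def training_games_per_week(
    data_season: pd.DataFrame, start_week: int = 4, week_col: str = "Week"
) -> dict:
    """Map each match-week (from start_week on) to the games played before it."""
    weeks = sorted(data_season[week_col].unique())
    return {
        int(week): data_season[data_season[week_col] < week]
        for week in weeks
        if week >= start_week
    }


data_season = pd.DataFrame({"Week": [1, 1, 2, 3, 4, 5], "game_id": range(6)})
history = training_games_per_week(data_season, start_week=4)
```

Ratings for week 4 would then be computed from history[4] (weeks 1 to 3), ratings for week 5 from history[5] (weeks 1 to 4), and so on.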
- prepare_sports_seasons(filenames: Union[str, Dict[int, str]], outcome: SportOutcome, rating_systems: Optional[Union[Dict[str, RatingSystem], List[RatingSystem], RatingSystem]] = None, stats_attributes=None, start_week: int = 4, preprocess: Optional[Preprocess] = BasicPreprocess(), columns_dict: Optional[Dict[str, Any]] = None) Dict[int, DataFrame]
Prepares datasets for multiple files that are passed as a dictionary.
- Parameters
filenames (Union[str, Dict[int, str]]) – Filename or dictionary that maps seasons to filename paths. e.g. {2009: ‘sports/pl2009.csv’}
outcome (SportOutcome) – The outcome parameter is associated with the application type, e.g. for soccer the type of outcome is ratingslib.application.SoccerOutcome. For more details see the ratingslib.application module.
rating_systems (Dict[str, RatingSystem] or List[RatingSystem] or RatingSystem or None, default=None) – If a dictionary, it maps names (or rating keys) to rating systems. If a list of rating system instances or a single RatingSystem instance, it is first converted to a dictionary. If set to None, rating values are not included in the data attributes for preparation.
stats_attributes (Optional[Dict[str, Dict[Any, Any]]], default=None) – The statistic attributes, e.g. for soccer: TW (Total Wins), TG (Total Goals), TS (Total Shots), TST (Total Shots on Target).
start_week (int, default=4) – The match-week in which the rating procedure starts. For example, if start_week is 4 then the rating of teams starts from the 4th week. Each week, ratings are computed from the previous weeks, e.g. 7th week -> 1,2,3,4,5,6
preprocess (Preprocess) – The preprocess procedure for the dataset. It must be an instance of a subclass of ratingslib.datasets.preprocess.Preprocess
columns_dict (Optional[Dict[str, str]], default=None) – A dictionary mapping the column names of the dataset. See the module ratingslib.datasets.parameters for more details.
- Returns
data_seasons_dict – Dictionary that maps each season to its prepared DataFrame. Note that if only one filename is passed, the dictionary is returned with the structure {1: data}
- Return type
Dict[int, pd.DataFrame]
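The two accepted forms of filenames, and the {1: data} convention for a single file, suggest a normalization step like the one sketched here. The helper name is hypothetical and the code only illustrates the documented convention; it does not read any file.

```python
from typing import Dict, Union


def normalize_filenames(filenames: Union[str, Dict[int, str]]) -> Dict[int, str]:
    """Return a season -> path mapping for either accepted input form."""
    if isinstance(filenames, str):
        # A single filename is stored under the key 1, as in {1: data}.
        return {1: filenames}
    # Copy the mapping so the caller's dictionary is left untouched.
    return dict(filenames)


single = normalize_filenames("sports/pl2009.csv")
several = normalize_filenames({2009: "sports/pl2009.csv"})
```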