ratingslib.ratings package
Python package for rating methods. This package includes implementations of the rating methods documented below.
Several evaluation metrics related to rating methods are included in the module metrics.py.
All rating systems in this package are implemented with functions from the NumPy and SciPy libraries for Python, which are intended for algebraic and scientific computation. In particular, NumPy was used for:
handling matrices and vectors
solving linear systems
finding eigenvalues and eigenvectors
other linear algebra problems required for the implementation of rating methods
SciPy was used for the statistical tests, such as Kendall’s tau for the correlation of ranking lists.
- class Winloss(version=ratings.WINLOSS, normalization=True)
Bases:
RatingSystem
The traditional rating method which is popular in the field of sports. In the case of sports teams the method takes into account the total wins of each team. The first-ranked team is the team with the most wins. Note that for any kind of items, there are many ways to define the notion of a hypothetical matchup and then to determine scores and winners.
- Parameters
version (str, default=ratings.WINLOSS) – A string that shows the version of the rating system. The available versions can be found in the ratingslib.utils.enums.ratings class.
normalization (bool, default=True) – If True, the result is normalized according to the total number of times each item occurs in the dataset. For example, for sports teams set normalization=True if the teams have not played the same number of games. This means that each element of the W vector is divided by the total number of games played by the respective team.
- W
The WinLoss vector for items of shape (n,) where n = the total number of items. Each element of vector represents the total wins of the respective item.
- Type
numpy.ndarray
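The effect of the normalization flag can be illustrated with a minimal, library-independent sketch. The function `win_loss_vector` and the tuple layout of `games` are hypothetical names chosen for this illustration, and treating a draw as a game played with no win is an assumption, not necessarily what `Winloss` does.

```python
from collections import Counter


def win_loss_vector(games, items, normalization=True):
    """Count wins per item; optionally divide by games played.

    games: iterable of (home, away, home_score, away_score) tuples
    (a hypothetical layout used only in this sketch).
    """
    wins = Counter()
    played = Counter()
    for home, away, home_score, away_score in games:
        played[home] += 1
        played[away] += 1
        if home_score > away_score:
            wins[home] += 1
        elif away_score > home_score:
            wins[away] += 1
        # Assumption: a draw counts as a game played but adds no win.
    if normalization:
        return [wins[i] / played[i] if played[i] else 0.0 for i in items]
    return [float(wins[i]) for i in items]


games = [("A", "B", 2, 0), ("A", "C", 1, 1), ("B", "C", 0, 3)]
items = ["A", "B", "C"]
raw = win_loss_vector(games, items, normalization=False)
norm = win_loss_vector(games, items, normalization=True)
```

Here every item plays two games, so the normalized vector is simply the raw win counts divided by two.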
Examples
The following example demonstrates Winloss rating system, for the 20 first soccer matches that took place during the 2018-2019 season of English Premier League.
>>> from ratingslib.datasets.filenames import dataset_path, FILENAME_EPL_2018_2019_20_GAMES
>>> from ratingslib.ratings.winloss import Winloss
>>> filename = dataset_path(FILENAME_EPL_2018_2019_20_GAMES)
>>> Winloss(normalization=False).rate_from_file(filename)
              Item  rating  ranking
0          Arsenal     0.0        3
1      Bournemouth     2.0        1
2         Brighton     1.0        2
3          Burnley     0.0        3
4          Cardiff     0.0        3
5          Chelsea     2.0        1
6   Crystal Palace     1.0        2
7          Everton     1.0        2
8           Fulham     0.0        3
9     Huddersfield     0.0        3
10       Leicester     1.0        2
11       Liverpool     2.0        1
12        Man City     2.0        1
13      Man United     1.0        2
14       Newcastle     0.0        3
15     Southampton     0.0        3
16       Tottenham     2.0        1
17         Watford     2.0        1
18        West Ham     0.0        3
19          Wolves     0.0        3
- computation_phase()
All the calculations are made in the ratingslib.ratings.winloss.Winloss.create_win_loss_vector() method. The Winloss vector is the rating vector.
- create_win_loss_vector(data_df: DataFrame, items_df: DataFrame, columns_dict: Optional[Dict[str, Any]] = None) ndarray
Construction of WinLoss vector.
- preparation_phase(data_df: DataFrame, items_df: DataFrame, columns_dict: Optional[Dict[str, Any]] = None)
To be overridden in subclasses.
- rate(data_df: DataFrame, items_df: DataFrame, sort: bool = False, columns_dict: Optional[Dict[str, Any]] = None) DataFrame
This method computes ratings for pairwise data (e.g. games between soccer teams). To be overridden in subclasses.
- Parameters
data_df (pandas.DataFrame) – The pairwise data.
items_df (pandas.DataFrame) – Set of items (e.g. teams) to be rated.
sort (bool, default=False) – If True, the output is sorted by rating value.
columns_dict (Optional[Dict[str, str]]) – The column names of the data file. See ratingslib.datasets.parameters.COLUMNS_DICT for more details.
- Returns
items_df – The set of items with their rating and ranking.
- Return type
pandas.DataFrame
- class Keener(version=ratings.KEENER, normalization: bool = True)
Bases:
RatingSystem
This method was proposed by James P. Keener in 1993 for ranking football teams in uneven paired competition [1]. Keener’s method is based on the theory of nonnegative matrices and forms a smoothed matrix of scores generated by Laplace’s rule of succession.
- Parameters
version (str, default=ratings.KEENER) – A string that shows the version of the rating system. The available versions can be found in the ratingslib.utils.enums.ratings class.
normalization (bool, default=True) – If True, the result is normalized according to the total number of times each item occurs in the dataset. For example, for sports teams set normalization=True if the teams have not played the same number of games.
- A
The Keener matrix. It has shape (n, n) where n = the total number of items.
- Type
numpy.ndarray
- S
The matrix containing the cumulative number of points scored by each item against every other item. It has shape (n, n) where n = the total number of items.
- Type
numpy.ndarray
References
- 1
Keener, J. P., 1993. The Perron-Frobenius theorem and the ranking of football teams. SIAM Review, 35(1), pp. 80-93
Examples
The following example demonstrates Keener rating system, for the 20 first soccer matches that took place during the 2018-2019 season of English Premier League.
>>> from ratingslib.datasets.filenames import dataset_path, FILENAME_EPL_2018_2019_20_GAMES
>>> from ratingslib.ratings.keener import Keener
>>> filename = dataset_path(FILENAME_EPL_2018_2019_20_GAMES)
>>> Keener(normalization=False).rate_from_file(filename)
              Item    rating  ranking
0          Arsenal  0.047220       17
1      Bournemouth  0.052874        4
2         Brighton  0.049183       11
3          Burnley  0.048576       14
4          Cardiff  0.048243       16
5          Chelsea  0.052800        5
6   Crystal Palace  0.049870       10
7          Everton  0.051214        7
8           Fulham  0.046829       18
9     Huddersfield  0.046068       20
10       Leicester  0.050701        8
11       Liverpool  0.053796        1
12        Man City  0.053511        2
13      Man United  0.050322        9
14       Newcastle  0.048939       13
15     Southampton  0.048969       12
16       Tottenham  0.052569        6
17         Watford  0.053265        3
18        West Ham  0.046730       19
19          Wolves  0.048320       15
- static compute(A: ndarray)
- computation_phase()
To be overridden in subclasses.
- create_keener_matrix(data_df: DataFrame, items_df: DataFrame, columns_dict: Optional[Dict[str, Any]] = None) Tuple[ndarray, ndarray]
Construction of Keener matrix and points matrix S
- h_skew(x)
Skewing function
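As a rough illustration of the two ingredients named above, the sketch below writes out Laplace's rule of succession and the skewing function in the form given in Keener's paper [1]. The function names are hypothetical, and whether `h_skew` in this package uses exactly this formula is an assumption.

```python
import math


def laplace_smoothing(s_ij, s_ji):
    """Laplace's rule of succession applied to the points of i against j."""
    return (s_ij + 1) / (s_ij + s_ji + 2)


def skew(x):
    """Keener's skewing function h(x) = 1/2 + sgn(x - 1/2) * sqrt(|2x - 1|) / 2.

    It pushes values away from 1/2 and flattens them near 0 and 1, which
    reduces the reward for running up the score.
    """
    return 0.5 + 0.5 * math.copysign(1.0, x - 0.5) * math.sqrt(abs(2 * x - 1))


# Item i scored 3 points against j and conceded 1.
a_ij = skew(laplace_smoothing(3, 1))
```

With no games played between two items, `laplace_smoothing(0, 0)` gives 0.5, and `skew` leaves 0.5 unchanged, so both items receive equal strength.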
- preparation_phase(data_df: DataFrame, items_df: DataFrame, columns_dict: Optional[Dict[str, Any]] = None)
To be overridden in subclasses.
- rate(data_df: DataFrame, items_df: DataFrame, sort: bool = False, columns_dict: Optional[Dict[str, Any]] = None) DataFrame
This method computes ratings for pairwise data (e.g. games between soccer teams). To be overridden in subclasses.
- Parameters
data_df (pandas.DataFrame) – The pairwise data.
items_df (pandas.DataFrame) – Set of items (e.g. teams) to be rated.
sort (bool, default=False) – If True, the output is sorted by rating value.
columns_dict (Optional[Dict[str, str]]) – The column names of the data file. See ratingslib.datasets.parameters.COLUMNS_DICT for more details.
- Returns
items_df – The set of items with their rating and ranking.
- Return type
pandas.DataFrame
- class Massey(version=ratings.MASSEY, data_limit=0)
Bases:
RatingSystem
This method was proposed by Kenneth Massey in 1997 for ranking college football teams [1]. Apart from the numbers of wins and losses, the Massey method also considers point score data to rate items. It uses linear least squares regression to solve a system of linear equations. Note that point score data depends on the application; for instance, for soccer teams the points are the number of goals scored by each team.
- Parameters
version (str, default=ratings.MASSEY) – A string that shows the version of the rating system. The available versions can be found in the ratingslib.utils.enums.ratings class.
data_limit (int, default=0) – The minimum number of observations in the dataset. The default value 0 indicates no limit.
- Madj
The adjusted Massey matrix. The last row of this matrix is replaced with a vector of all ones.
- Type
numpy.ndarray
- d_adj
The adjusted point differentials vector. The last item of this vector is replaced with zero.
- Type
numpy.ndarray
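The adjustment described for Madj and d_adj can be sketched with plain NumPy as follows. The function `massey_ratings` and the tuple layout of `games` are hypothetical and only illustrate the idea; the library's own construction may differ in detail.

```python
import numpy as np


def massey_ratings(games, n):
    """Solve the adjusted Massey system for n items.

    games: list of (i, j, points_i, points_j) with integer item indices
    (a hypothetical layout used only in this sketch).
    """
    M = np.zeros((n, n))
    d = np.zeros(n)
    for i, j, points_i, points_j in games:
        M[i, i] += 1          # games played by i
        M[j, j] += 1          # games played by j
        M[i, j] -= 1          # minus the number of games between i and j
        M[j, i] -= 1
        d[i] += points_i - points_j
        d[j] += points_j - points_i
    # M is singular, so the last row is replaced with ones and the last
    # entry of d with zero, which forces the ratings to sum to zero.
    M_adj = M.copy()
    d_adj = d.copy()
    M_adj[-1, :] = 1.0
    d_adj[-1] = 0.0
    return np.linalg.solve(M_adj, d_adj)


games = [(0, 1, 3, 1), (1, 2, 2, 0), (0, 2, 1, 0)]
r = massey_ratings(games, 3)
```

For these three games the point differentials are 3, 0 and -3, and the resulting ratings are 1, 0 and -1.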
References
- 1
Massey, K. (1997). Statistical models applied to the rating of sports teams.
Examples
The following example demonstrates Massey rating system, for the 20 first soccer matches that took place during the 2018-2019 season of English Premier League.
>>> from ratingslib.datasets.filenames import dataset_path, FILENAME_EPL_2018_2019_20_GAMES
>>> from ratingslib.ratings.massey import Massey
>>> filename = dataset_path(FILENAME_EPL_2018_2019_20_GAMES)
>>> Massey().rate_from_file(filename)
              Item         rating  ranking
0          Arsenal   2.500000e+00       11
1      Bournemouth   4.781250e+00        3
2         Brighton  -4.781250e+00       14
3          Burnley  -6.031250e+00       17
4          Cardiff   3.031250e+00        9
5          Chelsea   3.250000e+00        8
6   Crystal Palace   5.031250e+00        2
7          Everton  -6.281250e+00       18
8           Fulham   2.781250e+00       10
9     Huddersfield   2.220446e-15       12
10       Leicester  -5.531250e+00       16
11       Liverpool   7.281250e+00        1
12        Man City   4.750000e+00        4
13      Man United  -5.156250e+00       15
14       Newcastle   3.281250e+00        7
15     Southampton  -6.656250e+00       19
16       Tottenham   4.531250e+00        5
17         Watford  -3.406250e+00       13
18        West Ham   3.531250e+00        6
19          Wolves  -6.906250e+00       20
- computation_phase()
To be overridden in subclasses.
- create_massey_matrix(data_df: DataFrame, items_df: DataFrame, columns_dict: Optional[Dict[str, Any]] = None) Tuple[ndarray, ndarray]
Construction of the adjusted Massey matrix (M_adj) and the adjusted point differential vector (d_adj).
- preparation_phase(data_df: DataFrame, items_df: DataFrame, columns_dict: Optional[Dict[str, Any]] = None)
To be overridden in subclasses.
- rate(data_df: DataFrame, items_df: DataFrame, sort: bool = False, columns_dict: Optional[Dict[str, Any]] = None) DataFrame
This method computes ratings for pairwise data (e.g. games between soccer teams). To be overridden in subclasses.
- Parameters
data_df (pandas.DataFrame) – The pairwise data.
items_df (pandas.DataFrame) – Set of items (e.g. teams) to be rated.
sort (bool, default=False) – If True, the output is sorted by rating value.
columns_dict (Optional[Dict[str, str]]) – The column names of the data file. See ratingslib.datasets.parameters.COLUMNS_DICT for more details.
- Returns
items_df – The set of items with their rating and ranking.
- Return type
pandas.DataFrame
- class OffenseDefense(version=ratings.OD, tol=0.0001)
Bases:
RatingSystem
Offense-Defense is a modified version of the HITS ranking algorithm used in the Ask search engine. This rating system was developed by Anjela Govan during her PhD [1] [2] for rating sports teams. The main idea of this method is to separate the offensive and defensive strength of each team; the final rating vector is generated by combining the offensive and defensive lists.
- Parameters
version (str, default=ratings.OD) – A string that shows the version of the rating system. The available versions can be found in the ratingslib.utils.enums.ratings class.
tol (float, default=0.0001) – Tolerance level.
- A
Adjacency matrix of items scores.
- Type
numpy.ndarray
- P
P matrix of OD method.
- Type
numpy.ndarray
- defense
Defense rating vector.
- Type
numpy.ndarray
- offense
Offense rating vector.
- Type
numpy.ndarray
- error
Error until convergence.
- Type
float
- iter
Number of iterations to produce convergence of both of the Offense and Defense vectors.
- Type
int
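The alternating update behind the offense and defense vectors can be sketched as below, following the scheme in [1]. This is a simplified illustration: the function name, the `max_iter` safeguard, the convention that `A[i][j]` holds the points item j scored against item i, and the omission of the P matrix are all assumptions of this sketch rather than a description of the class internals.

```python
def offense_defense(A, tol=1e-4, max_iter=1000):
    """Alternate offense and defense updates until both vectors settle.

    A[i][j]: points item j scored against item i (assumed convention).
    A high offense value and a low defense value both indicate strength.
    """
    n = len(A)
    offense = [1.0] * n
    defense = [1.0] * n
    error = float("inf")
    iterations = 0
    while error >= tol and iterations < max_iter:
        # Offense of j: points scored, weighted by how weak each defense is.
        new_offense = [sum(A[i][j] / defense[i] for i in range(n))
                       for j in range(n)]
        # Defense of i: points conceded, discounted by opponent offense.
        new_defense = [sum(A[i][j] / new_offense[j] for j in range(n))
                       for i in range(n)]
        error = max(abs(new - old) for new, old in
                    zip(new_offense + new_defense, offense + defense))
        offense, defense = new_offense, new_defense
        iterations += 1
    ratings = [o / d for o, d in zip(offense, defense)]
    return offense, defense, ratings, iterations


# Item 1 scored 3 points against item 0; item 0 scored 1 against item 1.
A = [[0, 3],
     [1, 0]]
offense, defense, ratings, iterations = offense_defense(A)
```

Combining the two vectors as offense divided by defense gives one overall rating per item; in this two-item example item 1 ends up rated higher.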
References
- 1
Govan, A. Y., Langville, A. N., & Meyer, C. D. (2009). Offense-defense approach to ranking team sports. Journal of Quantitative Analysis in Sports, 5(1)
- 2
Govan, A. Y. (2008). Ranking Theory with Application to Popular Sports. Ph.D. dissertation, North Carolina State University.
Examples
The following example demonstrates Offense-Defense rating system, for the 20 first soccer matches that took place during the 2018-2019 season of English Premier League.
>>> from ratingslib.datasets.filenames import dataset_path, FILENAME_EPL_2018_2019_20_GAMES
>>> from ratingslib.ratings.od import OffenseDefense
>>> filename = dataset_path(FILENAME_EPL_2018_2019_20_GAMES)
>>> OffenseDefense(tol=0.0001).rate_from_file(filename)
              Item        rating  ranking
0          Arsenal  1.934298e+00       12
1      Bournemouth  1.312759e+06        3
2         Brighton  3.712401e+00       11
3          Burnley  4.103263e+05        4
4          Cardiff  4.757391e-04       14
5          Chelsea  4.576599e+00       10
6   Crystal Palace  1.049852e+03        7
7          Everton  2.080318e-07       16
8           Fulham  1.532621e-12       20
9     Huddersfield  5.583941e-01       13
10       Leicester  1.876215e+04        6
11       Liverpool  3.403605e+12        1
12        Man City  5.284698e+00        9
13      Man United  6.223487e+00        8
14       Newcastle  9.071493e-08       18
15     Southampton  2.510896e-07       15
16       Tottenham  9.071797e-08       17
17         Watford  1.864760e+06        2
18        West Ham  2.311787e+05        5
19          Wolves  3.793810e-11       19
- computation_phase()
Compute offense, defense vectors and overall ratings.
- create_score_matrices(data_df: DataFrame, items_df: DataFrame, columns_dict: Optional[Dict[str, Any]] = None) Tuple[ndarray, ndarray]
Construct score matrix A and P
- preparation_phase(data_df: DataFrame, items_df: DataFrame, columns_dict: Optional[Dict[str, Any]] = None)
Create A and P matrices
- rate(data_df: DataFrame, items_df: DataFrame, sort: bool = False, columns_dict: Optional[Dict[str, Any]] = None) DataFrame
This method computes ratings for pairwise data (e.g. games between soccer teams). To be overridden in subclasses.
- Parameters
data_df (pandas.DataFrame) – The pairwise data.
items_df (pandas.DataFrame) – Set of items (e.g. teams) to be rated.
sort (bool, default=False) – If True, the output is sorted by rating value.
columns_dict (Optional[Dict[str, str]]) – The column names of the data file. See ratingslib.datasets.parameters.COLUMNS_DICT for more details.
- Returns
items_df – The set of items with their rating and ranking.
- Return type
pandas.DataFrame
- class Markov(*, version=ratings.MARKOV, b: float = 1, stats_markov_dict: Optional[Union[Dict[str, dict], Set[str]]] = None)
Bases:
RatingSystem
This class implements the Markov (GeM - Generalized Markov Method) rating system. GeM was first used by graduate students Anjela Govan [1] and Luke Ingram [2] to successfully rank NFL football and NCAA basketball teams respectively. The Markov (GeM) method is related to the famous PageRank method [3] and uses parts of finite Markov chain theory and graph theory to generate ratings of n objects in a finite set. Beyond sports, any problem that can be represented as a weighted directed graph can be solved using the GeM model.
- Parameters
version (str, default=ratings.MARKOV) – A string that shows the version of the rating system. The available versions can be found in the ratingslib.utils.enums.ratings class.
b (float, default=1) – The damping factor. Valid values are in the range [0, 1].
stats_markov_dict (Optional[Dict[str, Dict[Any, Any]]], default=None) – A dictionary containing statistics details for the method. For instance, for rating soccer teams, the following dictionary:
stats_markov_dict = {
    'TotalWins': {'VOTE': 10, 'ITEM_I': 'FTHG', 'ITEM_J': 'FTAG', 'METHOD': 'VotingWithLosses'},
    'TotalGoals': {'VOTE': 10, 'ITEM_I': 'FTHG', 'ITEM_J': 'FTAG', 'METHOD': 'WinnersAndLosersVotePoint'}
}
specifies the following details:
TotalGoals and TotalWins are the names of two statistics.
'VOTE': 10 means that the vote is 10. The votes are converted to weights. The statistics in this example are equally weighted.
'ITEM_I': 'FTHG' and 'ITEM_J': 'FTAG' are the column names for the home and away team respectively.
The key 'METHOD' specifies which method constructs the voting matrix. The available methods are:
'VotingWithLosses' when the losing team casts a number of votes equal to the margin of victory in its matchup with a stronger opponent.
'WinnersAndLosersVotePoint' when both the winning and losing teams vote with the number of points given up.
'LosersVotePointDiff' when the losing team casts a number of votes
- See also the implementation of the method
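The statement that votes are converted to weights can be illustrated with a small sketch. Dividing each vote by the sum of all votes is an assumption about how the conversion works, and `votes_to_weights` is a hypothetical helper, not a function of this package.

```python
stats_markov_dict = {
    'TotalWins': {'VOTE': 10, 'ITEM_I': 'FTHG', 'ITEM_J': 'FTAG',
                  'METHOD': 'VotingWithLosses'},
    'TotalGoals': {'VOTE': 10, 'ITEM_I': 'FTHG', 'ITEM_J': 'FTAG',
                   'METHOD': 'WinnersAndLosersVotePoint'},
}


def votes_to_weights(stats):
    """Normalize the 'VOTE' entries so that the weights sum to 1 (assumed rule)."""
    total = sum(details['VOTE'] for details in stats.values())
    return {name: details['VOTE'] / total for name, details in stats.items()}


weights = votes_to_weights(stats_markov_dict)
```

With equal votes, both statistics receive a weight of 0.5, which matches the "equally weighted" remark above.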
- stats
Dictionary that maps names to voting and stochastic arrays. Keys that start with V map to the voting matrices and keys that start with S map to the stochastic matrices.
- Type
Dict[str, np.ndarray]
- params
Dictionary that maps parameters to their values.
- Type
Dict[str, Optional[Dict[str, Dict[Any, Any]]]]
- stochastic_matrix
A Stochastic Markov matrix is a square matrix where each entry describes the probability that the item will vote for the respective item.
- Type
np.ndarray
- stochastic_matrix_asch
A Stochastic Markov matrix that is irreducible
- Type
np.ndarray
- pi_steady
The stationary vector or dominant eigenvector of the stochastic_matrix.
- Type
np.ndarray
- group
Set of statistics names
- Type
Set[str]
- Raises
ValueError – If the value of b is not in the range [0, 1].
Examples
The following example demonstrates the GeM rating system, for the 20 first soccer matches that took place during the 2018-2019 season of English Premier League.
>>> from ratingslib.datasets.filenames import dataset_path, FILENAME_EPL_2018_2019_20_GAMES
>>> from ratingslib.ratings.markov import Markov
>>> filename = dataset_path(FILENAME_EPL_2018_2019_20_GAMES)
>>> votes = {
...     'TW': {'VOTE': 10, 'ITEM_I': 'FTHG', 'ITEM_J': 'FTAG', 'METHOD': 'VotingWithLosses'},
...     'TG': {'VOTE': 10, 'ITEM_I': 'FTHG', 'ITEM_J': 'FTAG', 'METHOD': 'WinnersAndLosersVotePoint'},
...     'TST': {'VOTE': 10, 'ITEM_I': 'HST', 'ITEM_J': 'AST', 'METHOD': 'WinnersAndLosersVotePoint'},
...     'TS': {'VOTE': 10, 'ITEM_I': 'HS', 'ITEM_J': 'AS', 'METHOD': 'WinnersAndLosersVotePoint'},
... }
>>> Markov(b=0.85, stats_markov_dict=votes).rate_from_file(filename)
              Item    rating  ranking
0          Arsenal  0.050470       11
1      Bournemouth  0.039076       15
2         Brighton  0.051460       10
3          Burnley  0.071596        2
4          Cardiff  0.024085       20
5          Chelsea  0.045033       13
6   Crystal Palace  0.037678       16
7          Everton  0.066307        3
8           Fulham  0.036356       17
9     Huddersfield  0.032164       19
10       Leicester  0.055491        7
11       Liverpool  0.056879        6
12        Man City  0.048325       12
13      Man United  0.061052        4
14       Newcastle  0.035814       18
15     Southampton  0.051716        9
16       Tottenham  0.053079        8
17         Watford  0.082788        1
18        West Ham  0.041824       14
19          Wolves  0.058807        5
References
- 1
Govan, A. Y. (2008). Ranking Theory with Application to Popular Sports. Ph.D. dissertation, North Carolina State University.
- 2
Ingram, L. C. (2007). Ranking NCAA sports teams with Linear algebra. Charleston
- 3
Sergey Brin and Lawrence Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Computer Networks and ISDN Systems, 33:107-17, 1998.
- set_group()
Set the group of statistics.
- static do_stochastic(voting_matrix: ndarray)
Normalize the rows of the voting matrix to develop a stochastic transition probability matrix.
- Parameters
voting_matrix (List[list]) –
- Returns
stochastic_matrix – Stochastic matrix built from the corresponding voting matrix.
- Return type
numpy.ndarray
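Row normalization of a voting matrix can be sketched in NumPy as follows. The function name `row_normalize` is hypothetical, and replacing an all-zero row (an item that casts no votes) with a uniform row is an assumption made so that every row sums to 1; the library may treat that case differently.

```python
import numpy as np


def row_normalize(voting_matrix):
    """Turn a nonnegative voting matrix into a row-stochastic matrix."""
    V = np.asarray(voting_matrix, dtype=float)
    n = V.shape[0]
    row_sums = V.sum(axis=1, keepdims=True)
    # Start from uniform rows; this is what zero rows keep (assumed choice).
    S = np.full_like(V, 1.0 / n)
    has_votes = row_sums[:, 0] > 0
    S[has_votes] = V[has_votes] / row_sums[has_votes]
    return S


V = [[0, 2, 2],
     [1, 0, 3],
     [0, 0, 0]]   # the third item never lost, so it casts no votes
S = row_normalize(V)
```

Each entry of `S` can then be read as the probability that the row item votes for the column item.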
- static compute(stochastic_matrix, b)
- computation_phase()
Compute the stationary vector or dominant eigenvector of the transpose of the irreducible matrix. The stationary vector is the rating vector. Note: the irreducible matrix is stochastic_matrix_asch and the stationary vector is pi_steady.
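A minimal sketch of how a stationary vector can be obtained from a stochastic matrix and a damping factor b is shown below. It uses the PageRank-style uniform adjustment and power iteration; the function name, the `tol` and `max_iter` arguments, and the choice of power iteration instead of an eigen-decomposition are assumptions of this sketch.

```python
import numpy as np


def stationary_vector(S, b=0.85, tol=1e-12, max_iter=10_000):
    """Power iteration on G = b * S + (1 - b) / n * ones((n, n))."""
    S = np.asarray(S, dtype=float)
    n = S.shape[0]
    # Mixing with a uniform matrix makes the chain irreducible when b < 1.
    G = b * S + (1.0 - b) / n * np.ones((n, n))
    pi = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_pi = pi @ G
        converged = np.abs(new_pi - pi).sum() < tol
        pi = new_pi
        if converged:
            break
    return pi / pi.sum()


S = np.array([[0.5, 0.5],
              [1.0, 0.0]])
pi = stationary_vector(S, b=1.0)
```

For this two-state chain with b = 1 the stationary vector is (2/3, 1/3): the first item receives twice the long-run share of votes.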
- create_voting_matrix(*, voting_method: Literal['VotingWithLosses', 'WinnersAndLosersVotePoint', 'LosersVotePointDiff'], data_df: DataFrame, items_df: DataFrame, col_name_home: str, col_name_away: str, columns_dict: Optional[Dict[str, Any]] = None) ndarray
Selection of the method for developing the voting matrix. The available methods are:
'VotingWithLosses' when the losing team casts a number of votes equal to the margin of victory in its matchup with a stronger opponent.
'WinnersAndLosersVotePoint' when both the winning and losing teams vote with the number of points given up.
'LosersVotePointDiff' when the losing team casts a number of votes.
- preparation_phase(data_df: DataFrame, items_df: DataFrame, columns_dict: Optional[Dict[str, Any]] = None)
During the preparation phase, voting and stochastic matrices are constructed for each statistic according to the method specified in the stats_markov_dict dictionary.
- rate(data_df: DataFrame, items_df: DataFrame, sort: bool = False, columns_dict: Optional[Dict[str, Any]] = None) DataFrame
This method computes ratings for pairwise data (e.g. games between soccer teams). To be overridden in subclasses.
- Parameters
data_df (pandas.DataFrame) – The pairwise data.
items_df (pandas.DataFrame) – Set of items (e.g. teams) to be rated.
sort (bool, default=False) – If True, the output is sorted by rating value.
columns_dict (Optional[Dict[str, str]]) – The column names of the data file. See ratingslib.datasets.parameters.COLUMNS_DICT for more details.
- Returns
items_df – The set of items with their rating and ranking.
- Return type
pandas.DataFrame
- static validate_stats_markov_dict(stats_markov_dict: dict)
- class AccuRate(version: str = ratings.ACCURATE, starting_point: float = 0)
Bases:
RatingSystem
This class implements the ratingslib.ratings.RatingSystem abstract class using an approach called AccuRate for the computation of rating values, as described in the paper [1].
- Parameters
version (str, default=ratings.ACCURATE) – A string that shows the version of the rating system. The available versions can be found in the ratingslib.utils.enums.ratings class.
starting_point (int) – The value where the initial rating starts.
References
- 1
Kyriakides, G., Talattinis, K., & Stephanides, G. (2017). A Hybrid Approach to Predicting Sports Results and an AccuRATE Rating System. International Journal of Applied and Computational Mathematics, 3(1), 239–254.
Examples
The following example demonstrates Accurate rating system for a simple soccer competition where only two teams participate, team “Good” and team “Better”.
>>> from ratingslib.datasets.filenames import dataset_path, FILENAME_ACCURATE_PAPER_EXAMPLE
>>> from ratingslib.ratings.accurate import AccuRate
>>> filename = dataset_path(FILENAME_ACCURATE_PAPER_EXAMPLE)
>>> AccuRate().rate_from_file(filename)
     Team    rating  ranking
0  Better  1.681793        1
1    Good -1.587401        2
- create_rating_vector(data: DataFrame, items_df: DataFrame, columns_dict: Optional[Dict[str, Any]] = None) ndarray
Calculates ratings according to pairs of items data.
- computation_phase()
Nothing to compute; all the calculations are made in the ratingslib.ratings.accurate.AccuRate.create_rating_vector() method.
- preparation_phase(data_df: DataFrame, items_df: DataFrame, columns_dict: Optional[Dict[str, Any]] = None)
To be overridden in subclasses.
- rate(data_df: DataFrame, items_df: DataFrame, sort: bool = False, columns_dict: Optional[Dict[str, Any]] = None) DataFrame
This method computes ratings for pairwise data (e.g. games between soccer teams). To be overridden in subclasses.
- Parameters
data_df (pandas.DataFrame) – The pairwise data.
items_df (pandas.DataFrame) – Set of items (e.g. teams) to be rated.
sort (bool, default=False) – If True, the output is sorted by rating value.
columns_dict (Optional[Dict[str, str]]) – The column names of the data file. See ratingslib.datasets.parameters.COLUMNS_DICT for more details.
- Returns
items_df – The set of items with their rating and ranking.
- Return type
pandas.DataFrame
- class Elo(version=ratings.ELOWIN, K: int = 40, HA=0, ks=400, starting_point: float = 1500)
Bases:
RatingSystem
The Elo rating system was developed by Arpad Elo [1] to rank chess players and has since been adopted by many sports and organizations.
This implementation includes two basic versions of Elo:
The first is called EloWin and takes into account the total wins of items. For soccer teams, the final outcome determines the winner.
The second is called EloPoint and takes into account item scores. In soccer, the points are the goals scored by each team.
Note that for any kind of items, there are many ways to define the notion of a hypothetical matchup and then to determine scores and winners.
- Parameters
version (str, default=ratings.ELOWIN) – a string that shows version of rating system
K (int, default=40) – K-factor is the maximum possible adjustment per pair of items. For soccer, K–factor plays an important role because it balances the deviation for the goal difference in the game against prior ratings.
HA (int, default=0) – The home advantage factor is an adjustment used because home teams tend to score more goals. The Elo system applies the home-field advantage factor by adding it to the rating of the home team. Many implementations of the Elo model for soccer set the home-field advantage to 100. The default value 0 means that the method does not take the home advantage factor into account.
ks (float, default=400) – Parameter ξ (ks) affects the spread of ratings and comes from the logistic function. For chess and soccer games, ξ is usually set to 400.
starting_point (float, default=1500) – The value where the initial rating starts.
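To make the roles of K, HA and ks concrete, here is a minimal sketch of a single win-based Elo update using the standard logistic expectation. The function `elo_update` is hypothetical and shows the textbook formula; it is not taken from this package's source.

```python
def elo_update(r_home, r_away, outcome_home, K=40, HA=0, ks=400):
    """Return the updated (home, away) ratings after one game.

    outcome_home: 1 for a home win, 0.5 for a draw, 0 for an away win.
    HA is added to the home rating before computing the expectation.
    """
    expected_home = 1.0 / (1.0 + 10 ** (-(r_home + HA - r_away) / ks))
    delta = K * (outcome_home - expected_home)
    return r_home + delta, r_away - delta


# Two unrated items (starting point 0), home win, no home advantage.
home, away = elo_update(0, 0, 1)
```

With equal ratings the expected score is 0.5, so the winner gains K/2 = 20 points and the loser drops by the same amount. A larger ks flattens the logistic curve, so the same rating gap changes the expected score less.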
Notes
Soccer application and Elo: Depending on the type of soccer tournament, the following K-factor values are suggested by several internet sites [2]:
World Cup Finals = 60
Continental Championship Finals and Major Intercontinental tournaments = 50
World Cup Qualifiers and Major Tournaments = 40
All other tournaments = 30
Friendly matches = 20
References
- 1
Elo, A. E. (1978). The rating of chessplayers, past and present. Arco Pub.
- 2
Examples
The following examples demonstrate the EloWin and the EloPoint versions, for the 20 first soccer matches that took place during the 2018-2019 season of English Premier League.
>>> from ratingslib.datasets.filenames import dataset_path, FILENAME_EPL_2018_2019_20_GAMES
>>> from ratingslib.ratings.elo import Elo
>>> from ratingslib.utils.enums import ratings
>>> filename = dataset_path(FILENAME_EPL_2018_2019_20_GAMES)
>>> Elo(version=ratings.ELOWIN, starting_point=0).rate_from_file(filename)
              Item      rating  ranking
0          Arsenal  -37.707535       12
1      Bournemouth   37.707535        3
2         Brighton    2.292465        5
3          Burnley  -18.849977        9
4          Cardiff  -20.000000       10
5          Chelsea   37.707535        3
6   Crystal Palace    0.000000        7
7          Everton   20.000000        4
8           Fulham  -37.707535       12
9     Huddersfield  -37.707535       12
10       Leicester    1.150023        6
11       Liverpool   40.000000        1
12        Man City   37.707535        3
13      Man United   -2.292465        8
14       Newcastle  -20.000000       10
15     Southampton  -20.000000       10
16       Tottenham   37.707535        3
17         Watford   38.849977        2
18        West Ham  -37.707535       12
19          Wolves  -21.150023       11
>>> Elo(version=ratings.ELOPOINT, starting_point=0).rate_from_file(filename)
              Item      rating  ranking
0          Arsenal  -11.592411       17
1      Bournemouth   12.658841        5
2         Brighton   -6.337388       14
3          Burnley   -6.091179       13
4          Cardiff   -9.654647       15
5          Chelsea   13.592411        4
6   Crystal Palace    0.191876       10
7          Everton    4.000000        8
8           Fulham  -15.861198       18
9     Huddersfield  -21.846379       20
10       Leicester    6.230248        7
11       Liverpool   23.141457        1
12        Man City   19.846379        2
13      Man United    0.337388        9
14       Newcastle   -4.345353       12
15     Southampton   -4.000000       11
16       Tottenham    9.861198        6
17         Watford   16.091179        3
18        West Ham  -15.992174       19
19          Wolves  -10.230248       16
- create_rating_vector(data_df: DataFrame, items_df: DataFrame, columns_dict: Optional[Dict[str, Any]] = None) ndarray
Calculates Elo ratings according to pairs of items data.
- computation_phase()
Nothing to compute; all computations are made in the ratingslib.ratings.elo.Elo.create_rating_vector() method.
- preparation_phase(data_df: DataFrame, items_df: DataFrame, columns_dict: Optional[Dict[str, Any]] = None)
To be overridden in subclasses.
- rate(data_df: DataFrame, items_df: DataFrame, sort: bool = False, columns_dict: Optional[Dict[str, Any]] = None) DataFrame
This method computes ratings for pairwise data (e.g. games between soccer teams). To be overridden in subclasses.
- Parameters
data_df (pandas.DataFrame) – The pairwise data.
items_df (pandas.DataFrame) – Set of items (e.g. teams) to be rated.
sort (bool, default=False) – If True, the output is sorted by rating value.
columns_dict (Optional[Dict[str, str]]) – The column names of the data file. See ratingslib.datasets.parameters.COLUMNS_DICT for more details.
- Returns
items_df – The set of items with their rating and ranking.
- Return type
pandas.DataFrame
- static prepare_for_gridsearch_tuning(*, version_list: Optional[List[str]] = None, k_range: Optional[List[float]] = None, ks_range: Optional[List[float]] = None, HA_range: Optional[List[float]] = None) Dict[str, RatingSystem]
Create instances that are intended for tuning parameters.
- Parameters
version_list (List[str]) – List of Elo versions
k_range (Optional[List[float]], default=None) – List of k values. If None, the parameter is not intended for tuning.
ks_range (Optional[List[float]], default=None) – List of ks values. If None, the parameter is not intended for tuning.
HA_range (Optional[List[float]], default=None) – List of HA values. If None, the parameter is not intended for tuning.
- Returns
rating_systems_dict – Dictionary that contains Elo instances with the parameters we want for tuning.
- Return type
dict
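The idea of building one candidate per parameter combination can be sketched with `itertools.product`. Everything in this sketch is hypothetical: the function `parameter_grid`, the key format, the version strings, and the use of plain dictionaries in place of Elo instances are illustrative choices, not the behaviour of `prepare_for_gridsearch_tuning`.

```python
from itertools import product


def parameter_grid(version_list, k_range=None, ks_range=None, HA_range=None):
    """Map a descriptive key to one parameter combination per candidate."""
    ranges = {"K": k_range, "ks": ks_range, "HA": HA_range}
    # Parameters left as None are not tuned and are skipped.
    tuned = {name: values for name, values in ranges.items()
             if values is not None}
    grid = {}
    for version in version_list:
        for combo in product(*tuned.values()):
            params = dict(zip(tuned.keys(), combo))
            key = version + "".join(f"[{n}={v}]" for n, v in params.items())
            grid[key] = {"version": version, **params}
    return grid


# "EloWin" and "EloPoint" are placeholder version strings for this sketch.
grid = parameter_grid(["EloWin", "EloPoint"],
                      k_range=[20, 40], HA_range=[0, 100])
```

Two versions, two K values and two HA values give eight candidates. Each dictionary could then be unpacked into a constructor call such as `Elo(**params)`, assuming the keys match the constructor arguments.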
- class Colley(version=ratings.COLLEY)
Bases:
RatingSystem
This class implements the Colley rating system. This system was proposed by astrophysicist Dr. Wesley Colley in 2001 for ranking sports teams. Colley’s method [1] makes use of an idea from probability theory known as Laplace’s “rule of succession”. In fact, it is a modified form of the win-loss method, which uses the percentage of wins of each team.
- Parameters
version (str, default=ratings.COLLEY) – A string that shows the version of the rating system. The available versions can be found in the ratingslib.utils.enums.ratings class.
- C
The Colley matrix of shape (n,n) where n = the total number of items.
- Type
numpy.ndarray
- b
The right-hand side vector b of shape (n,) where n = the total number of items.
- Type
numpy.ndarray
References
- 1
Colley, W. (2002). Colley’s bias free college football ranking method: The Colley Matrix Explained.
Examples
The following example demonstrates Colley rating system, for the 20 first soccer matches that took place during the 2018-2019 season of English Premier League.
>>> from ratingslib.datasets.filenames import dataset_path, FILENAME_EPL_2018_2019_20_GAMES >>> from ratingslib.ratings.colley import Colley >>> filename = dataset_path(FILENAME_EPL_2018_2019_20_GAMES) >>> Colley().rate_from_file(filename) Item rating ranking 0 Arsenal 0.333333 16 1 Bournemouth 0.686012 3 2 Brighton 0.562500 6 3 Burnley 0.401786 10 4 Cardiff 0.394345 11 5 Chelsea 0.666667 5 6 Crystal Palace 0.501488 8 7 Everton 0.562500 6 8 Fulham 0.293155 17 9 Huddersfield 0.333333 16 10 Leicester 0.473214 9 11 Liverpool 0.712798 2 12 Man City 0.666667 5 13 Man United 0.508929 7 14 Newcastle 0.391369 12 15 Southampton 0.366071 14 16 Tottenham 0.671131 4 17 Watford 0.741071 1 18 West Ham 0.349702 15 19 Wolves 0.383929 13
- computation_phase()
Solve the system Cr=b to obtain the Colley rating vector r.
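The system Cr=b can be illustrated with a short NumPy sketch based on the standard definition of the Colley matrix from [1]: the diagonal holds 2 plus the number of games played, off-diagonal entries hold minus the number of games between two items, and b holds 1 + (wins - losses) / 2. The function `colley_ratings` and its input layout are hypothetical, and draws are left out of this sketch.

```python
import numpy as np


def colley_ratings(games, n):
    """Solve C r = b for n items.

    games: list of (winner, loser) index pairs (hypothetical layout;
    draws are not handled in this sketch).
    """
    C = 2.0 * np.eye(n)
    b = np.ones(n)
    for winner, loser in games:
        C[winner, winner] += 1
        C[loser, loser] += 1
        C[winner, loser] -= 1
        C[loser, winner] -= 1
        b[winner] += 0.5
        b[loser] -= 0.5
    return np.linalg.solve(C, b)


# Item 0 beats items 1 and 2; item 1 beats item 2.
r = colley_ratings([(0, 1), (0, 2), (1, 2)], 3)
```

The resulting ratings are 0.7, 0.5 and 0.3, and their mean is 0.5, as is always the case for Colley ratings.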
- create_colley_matrix(data_df: DataFrame, items_df: DataFrame, columns_dict: Optional[Dict[str, Any]] = None) Tuple[ndarray, ndarray]
Construction of the Colley coefficient matrix C and the right-hand side vector b.
- preparation_phase(data_df: DataFrame, items_df: DataFrame, columns_dict: Optional[Dict[str, Any]] = None)
To be overridden in subclasses.
- rate(data_df: DataFrame, items_df: DataFrame, sort: bool = False, columns_dict: Optional[Dict[str, Any]] = None) DataFrame
This method computes ratings for pairwise data (e.g. games between soccer teams). To be overridden in subclasses.
- Parameters
data_df (pandas.DataFrame) – The pairwise data.
items_df (pandas.DataFrame) – Set of items (e.g. teams) to be rated.
sort (bool, default=False) – If True, the output is sorted by rating value.
columns_dict (Optional[Dict[str, str]]) – The column names of the data file. See ratingslib.datasets.parameters.COLUMNS_DICT for more details.
- Returns
items_df – The set of items with their rating and ranking.
- Return type
pandas.DataFrame
- kendall_tau_table(ratings_dict: Dict[str, DataFrame], print_out: bool = True) List[List[float]]
Kendall Tau comparison of ranking lists.
- Parameters
ratings_dict (Dict[str, pd.DataFrame]) – Dictionary that maps names to ratings. Note that ratings are stored in a pandas.DataFrame.
print_out (bool) – If
True
then print results table.
- Returns
kendall_results – Table of Kendall tau results. The lower diagonal elements represent Kendall’s tau values of each pair, while the upper diagonal elements the p-values of each pair from the two-sided hypothesis test, whose null hypothesis is an absence of association
- Return type
List[List[float]]
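The table layout described above can be sketched with scipy.stats.kendalltau. This is an illustrative sketch, not the library's implementation: the helper name kendall_table is hypothetical, the input is assumed to be a plain dictionary of equal-length rating lists, and filling the diagonal with 1.0 is an assumption.

```python
from scipy.stats import kendalltau

def kendall_table(ratings):
    """Return (names, table): tau below the diagonal, p-values above it."""
    names = list(ratings)
    n = len(names)
    # Diagonal placeholder of 1.0 is an assumption of this sketch
    table = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for j in range(i):
            tau, p_value = kendalltau(ratings[names[i]], ratings[names[j]])
            table[i][j] = float(tau)       # lower triangle: Kendall's tau
            table[j][i] = float(p_value)   # upper triangle: two-sided p-value
    return names, table

ratings = {
    "method_a": [1.0, 2.0, 3.0, 4.0],
    "method_b": [1.0, 2.0, 4.0, 3.0],
}
names, table = kendall_table(ratings)
for name, row in zip(names, table):
    print(name, row)
```

In this toy input the two lists disagree on one of six pairs, so tau is (5 - 1) / 6.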
- class RatingAggregation(version=ratings.AGGREGATIONMARKOV, votes_or_weights: Optional[Dict[str, float]] = None, b: float = 0.9)
Bases: RatingSystem
Class for rating aggregation.
- Parameters
version (str, default=ratings.AGGREGATIONMARKOV) – A string that shows the version of the rating system. The available versions can be found in the ratingslib.utils.enums.ratings class.
votes_or_weights (Optional[Dict[str, float]]) – Votes or weights for the matrices.
b (float, optional) – Used only when the aggregation method is ratings.AGGREGATIONMARKOV; by default 0.9.
- calc_rating_distances(data_df: DataFrame, rating_column_name: str) → ndarray
Create a pairwise matrix from the rating differences, which are treated as distances.
- Parameters
data_df (pd.DataFrame) – Dataset of ratings.
rating_column_name (str) – Name of the rating column in the dataset.
- Returns
matrix – Rating distances matrix.
- Return type
np.ndarray
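The exact definition used by calc_rating_distances is not given here. As a sketch under a stated assumption, one common convention sets entry (i, j) to the amount by which item i's rating exceeds item j's, and to zero otherwise. The helper name rating_distance_matrix is hypothetical.

```python
import numpy as np

def rating_distance_matrix(ratings):
    """Entry (i, j) = max(r_i - r_j, 0). Assumed convention, not the library's."""
    r = np.asarray(ratings, dtype=float)
    diff = r[:, None] - r[None, :]     # all pairwise differences r_i - r_j
    return np.clip(diff, 0.0, None)    # keep only positive differences

D = rating_distance_matrix([0.7, 0.5, 0.3])
print(D)
```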
- calc_dict_rating_distances(data_df: DataFrame, rating_columns: List[str]) → Dict[str, ndarray]
Create a dictionary of pairwise matrices from the rating differences, which are treated as distances. Each column name identifies a rating method.
- Parameters
data_df (pd.DataFrame) – Dataset of ratings.
rating_columns (List[str]) – List of columns that contain ratings.
- Returns
matrices_dict – Dictionary that maps each column to its rating distance matrix.
- Return type
Dict[str, np.ndarray]
- static rating_aggregation(matrices_dict: Dict[str, ndarray], votes_or_weights: Optional[Dict[str, float]] = None, aggregation_method: str = ratings.AGGREGATIONMARKOV, b: float = 0.9) → ndarray
Rating aggregation from rating lists.
- Parameters
matrices_dict (Dict[str, np.ndarray]) – Dictionary that maps each name to its rating distance matrix.
votes_or_weights (Optional[Dict[str, float]]) – Votes or weights for the matrices.
aggregation_method (str, default=ratings.AGGREGATIONMARKOV) – Name of the aggregation method.
b (float, optional) – Used only when the aggregation method is ratings.AGGREGATIONMARKOV; by default 0.9.
- Returns
rating – Aggregated rating vector
- Return type
numpy.ndarray
- Raises
ValueError – If matrices_dict and votes_or_weights parameters don’t have the same size
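The size check and the use of weights can be illustrated as follows. This is only a sketch: it assumes the matrices are combined as a weighted sum, which may differ from what rating_aggregation does internally, and it does not model the Markov step or the parameter b. The helper name combine_matrices is hypothetical.

```python
import numpy as np

def combine_matrices(matrices, weights=None):
    """Weighted sum of same-shaped matrices (assumed combination rule)."""
    if weights is None:
        weights = {name: 1.0 for name in matrices}   # equal weights by default
    if len(weights) != len(matrices):
        raise ValueError("matrices and weights must have the same size")
    return sum(weights[name] * matrices[name] for name in matrices)

matrices = {"colley": np.ones((2, 2)), "massey": np.eye(2)}
combined = combine_matrices(matrices, {"colley": 2.0, "massey": 1.0})
print(combined)
```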
- computation_phase()
To be overridden in subclasses.
- preparation_phase(data_df: DataFrame, items_df: DataFrame, columns_dict: Optional[Dict[str, Any]] = None)
To be overridden in subclasses.
- rate(data_df: DataFrame, items_df: DataFrame, sort: bool = False, columns_dict: Optional[Dict[str, Any]] = None) → DataFrame
Compute ratings from pairwise data (e.g. games between soccer teams). To be overridden in subclasses.
- Parameters
data_df (pandas.DataFrame) – The pairwise data.
items_df (pandas.DataFrame) – Set of items (e.g. teams) to be rated.
sort (bool, default=False) – If True, the output is sorted by rating value.
columns_dict (Optional[Dict[str, str]]) – The column names of the data file. See ratingslib.datasets.parameters.COLUMNS_DICT for more details.
- Returns
items_df – The set of items with their rating and ranking.
- Return type
pandas.DataFrame
- class RankingAggregation(version=ratings.RANKINGAVG)
Bases: RatingSystem
Class for ranking aggregation.
- Parameters
version (str, default=ratings.RANKINGAVG) – A string that shows the version of the rating system. The available versions can be found in the ratingslib.utils.enums.ratings class.
- static ranking_aggregation(data_df: DataFrame, rating_columns: List[str], aggregation_method: str = ratings.RANKINGAVG) → ndarray
Ranking aggregation from ranking lists.
- Parameters
data_df (pd.DataFrame) – DataFrame whose columns hold the rating values.
aggregation_method (str, default=ratings.RANKINGAVG) – Name of the aggregation method.
- Returns
rating – Aggregated rating vector
- Return type
numpy.ndarray
- Raises
ValueError – If matrices_dict and votes_or_weights parameters don’t have the same size
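For intuition about the default ratings.RANKINGAVG method, average-rank aggregation can be sketched with pandas. This is an assumed reading of the method name, not the library's code: each rating column is converted to ranks (highest rating gets rank 1) and the ranks are averaged per item. The helper name average_rank and the column names are hypothetical.

```python
import pandas as pd

def average_rank(data_df, rating_columns):
    """Average of per-column ranks; a lower value means a better aggregated position."""
    ranks = data_df[rating_columns].rank(ascending=False, method="min")
    return ranks.mean(axis=1).to_numpy()

df = pd.DataFrame({
    "Item": ["A", "B", "C"],
    "colley": [0.7, 0.5, 0.3],
    "massey": [0.2, 0.9, -1.1],
})
agg = average_rank(df, ["colley", "massey"])
print(agg)
```

Here A and B each hold one first and one second place, so they tie, while C is last in both lists.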
- computation_phase()
All the calculations are made in the ratingslib.ratings.aggregations.RankingAggregation.ranking_aggregation() method.
- preparation_phase(data_df: DataFrame, items_df: DataFrame, columns_dict: Optional[Dict[str, Any]] = None)
To be overridden in subclasses.
- rate(data_df: DataFrame, items_df: DataFrame, sort: bool = False, columns_dict: Optional[Dict[str, Any]] = None) → DataFrame
Compute ratings from pairwise data (e.g. games between soccer teams). To be overridden in subclasses.
- Parameters
data_df (pandas.DataFrame) – The pairwise data.
items_df (pandas.DataFrame) – Set of items (e.g. teams) to be rated.
sort (bool, default=False) – If True, the output is sorted by rating value.
columns_dict (Optional[Dict[str, str]]) – The column names of the data file. See ratingslib.datasets.parameters.COLUMNS_DICT for more details.
- Returns
items_df – The set of items with their rating and ranking.
- Return type
pandas.DataFrame
Subpackages
Submodules
- ratingslib.ratings.accurate module
- ratingslib.ratings.aggregation module
- ratingslib.ratings.colley module
- ratingslib.ratings.elo module
- ratingslib.ratings.keener module
- ratingslib.ratings.markov module
- ratingslib.ratings.massey module
- ratingslib.ratings.methods module
- ratingslib.ratings.metrics module
- ratingslib.ratings.od module
- ratingslib.ratings.rating module
- ratingslib.ratings.winloss module