ratingslib.ratings.markov module
Markov (GeM) Rating System
- class Markov(*, version=ratings.MARKOV, b: float = 1, stats_markov_dict: Optional[Union[Dict[str, dict], Set[str]]] = None)
Bases:
RatingSystem
This class implements the Markov (GeM, Generalized Markov Method) rating system. GeM was first used by two graduate students, Angela Govan [1] and Luke Ingram [2], to rank NFL football and NCAA basketball teams respectively. The Markov (GeM) method is related to the well-known PageRank method [3] and uses elements of finite Markov chain theory and graph theory to generate ratings of n objects in a finite set. It is not limited to sports: any problem that can be represented as a weighted directed graph can be solved using the GeM model.
- Parameters
version (str, default=ratings.MARKOV) – A string that shows the version of the rating system. The available versions can be found in the ratingslib.utils.enums.ratings class.
b (float, default=1) – The damping factor. Valid numbers are in the range [0, 1].
stats_markov_dict (Optional[Dict[str, Dict[Any, Any]]], default=None) – A dictionary containing statistics details for the method. For instance, for rating soccer teams, the following dictionary stats_markov_dict:

    stats_markov_dict = {
        'TotalWins': {'VOTE': 10, 'ITEM_I': 'FTHG', 'ITEM_J': 'FTAG', 'METHOD': 'VotingWithLosses'},
        'TotalGoals': {'VOTE': 10, 'ITEM_I': 'FTHG', 'ITEM_J': 'FTAG', 'METHOD': 'WinnersAndLosersVotePoint'}
    }

specifies the following details:
  - TotalGoals and TotalWins are the names of two statistics.
  - 'VOTE': 10 means that the vote is 10. The votes are converted to weights; the statistics in this example are equally weighted.
  - 'ITEM_I': 'FTHG' and 'ITEM_J': 'FTAG' are the column names for the home and away team respectively.
  - The key 'METHOD' specifies which method constructs the voting matrix. The available methods are:
    - 'VotingWithLosses' when the losing team casts a number of votes equal to the margin of victory in its matchup with a stronger opponent.
    - 'WinnersAndLosersVotePoint' when both the winning and losing teams vote with the number of points given up.
    - 'LosersVotePointDiff' when the losing team casts a number of votes
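The conversion of votes to weights described for stats_markov_dict can be sketched as follows. This is only an illustration, assuming that each weight is the statistic's vote divided by the sum of all votes and that the per-statistic stochastic matrices are combined as a weighted sum. The helpers votes_to_weights and combine are hypothetical names, not part of ratingslib, and the two small matrices are made-up data.

```python
def votes_to_weights(stats_markov_dict):
    """Normalize the 'VOTE' entries so the weights sum to 1 (assumed rule)."""
    total = sum(spec['VOTE'] for spec in stats_markov_dict.values())
    return {name: spec['VOTE'] / total
            for name, spec in stats_markov_dict.items()}


def combine(matrices, weights):
    """Weighted sum of equally sized square matrices given as nested lists."""
    names = list(matrices)
    n = len(matrices[names[0]])
    return [[sum(weights[k] * matrices[k][i][j] for k in names)
             for j in range(n)]
            for i in range(n)]


stats_markov_dict = {
    'TotalWins': {'VOTE': 10, 'ITEM_I': 'FTHG', 'ITEM_J': 'FTAG',
                  'METHOD': 'VotingWithLosses'},
    'TotalGoals': {'VOTE': 10, 'ITEM_I': 'FTHG', 'ITEM_J': 'FTAG',
                   'METHOD': 'WinnersAndLosersVotePoint'},
}
weights = votes_to_weights(stats_markov_dict)

# Made-up 2x2 row-stochastic matrices, one per statistic (illustration only).
per_stat = {
    'TotalWins': [[0.0, 1.0], [1.0, 0.0]],
    'TotalGoals': [[0.5, 0.5], [0.25, 0.75]],
}
combined = combine(per_stat, weights)
```

Because the weights sum to 1, a weighted sum of row-stochastic matrices is again row-stochastic, which is why equal votes give each statistic the same influence.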
- See also the implementation of the method
- stats
Dictionary that maps names to voting and stochastic arrays. Keys that start with V map to the voting matrices and keys that start with S map to the stochastic matrices.
- Type
Dict[str, np.ndarray]
- params
Dictionary that maps parameters to their values.
- Type
Dict[str, Optional[Dict[str, Dict[Any, Any]]]]
- stochastic_matrix
A stochastic Markov matrix is a square matrix in which entry (i, j) describes the probability that item i votes for item j.
- Type
np.ndarray
- stochastic_matrix_asch
A Stochastic Markov matrix that is irreducible
- Type
np.ndarray
- pi_steady
The stationary vector or dominant eigenvector of the stochastic_matrix.
- Type
np.ndarray
- group
Set of statistics names
- Type
Set[str]
- Raises
ValueError – If the value of b is not in the range [0, 1].
Examples
The following example demonstrates the GeM rating system for the first 20 soccer matches of the 2018-2019 season of the English Premier League.
>>> from ratingslib.datasets.filenames import dataset_path, FILENAME_EPL_2018_2019_20_GAMES
>>> from ratingslib.ratings.markov import Markov
>>> filename = dataset_path(FILENAME_EPL_2018_2019_20_GAMES)
>>> votes = {
...     'TW': {'VOTE': 10, 'ITEM_I': 'FTHG', 'ITEM_J': 'FTAG', 'METHOD': 'VotingWithLosses'},
...     'TG': {'VOTE': 10, 'ITEM_I': 'FTHG', 'ITEM_J': 'FTAG', 'METHOD': 'WinnersAndLosersVotePoint'},
...     'TST': {'VOTE': 10, 'ITEM_I': 'HST', 'ITEM_J': 'AST', 'METHOD': 'WinnersAndLosersVotePoint'},
...     'TS': {'VOTE': 10, 'ITEM_I': 'HS', 'ITEM_J': 'AS', 'METHOD': 'WinnersAndLosersVotePoint'},
... }
>>> Markov(b=0.85, stats_markov_dict=votes).rate_from_file(filename)
              Item    rating  ranking
0          Arsenal  0.050470       11
1      Bournemouth  0.039076       15
2         Brighton  0.051460       10
3          Burnley  0.071596        2
4          Cardiff  0.024085       20
5          Chelsea  0.045033       13
6   Crystal Palace  0.037678       16
7          Everton  0.066307        3
8           Fulham  0.036356       17
9     Huddersfield  0.032164       19
10       Leicester  0.055491        7
11       Liverpool  0.056879        6
12        Man City  0.048325       12
13      Man United  0.061052        4
14       Newcastle  0.035814       18
15     Southampton  0.051716        9
16       Tottenham  0.053079        8
17         Watford  0.082788        1
18        West Ham  0.041824       14
19          Wolves  0.058807        5
References
- 1
Govan, A. Y. (2008). Ranking Theory with Application to Popular Sports. Ph.D. dissertation, North Carolina State University.
- 2
Ingram, L. C. (2007). Ranking NCAA sports teams with Linear algebra. Charleston.
- 3
Sergey Brin and Lawrence Page. The Anatomy of a Large-Scale Hypertextual Web Search Engine. Computer Networks and ISDN Systems, 33:107-17, 1998.
- set_group()
Set the group of statistics.
- static do_stochastic(voting_matrix: ndarray)
Normalize the rows of the voting matrix to develop a stochastic transition probability matrix.
- Parameters
voting_matrix (List[list]) – The voting matrix whose rows are normalized.
- Returns
stochastic_matrix – Stochastic matrix built from the corresponding voting matrix.
- Return type
numpy.ndarray
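Row normalization as described for do_stochastic can be illustrated with the sketch below. It uses plain Python lists and a hypothetical helper row_normalize; it is not the ratingslib implementation. How the library treats a row with no votes is not stated here, so replacing such a row with a uniform distribution is an assumption of this sketch.

```python
def row_normalize(voting_matrix):
    """Divide each row by its sum so that every row sums to 1.

    Assumption of this sketch: a row with no votes becomes uniform (1/n).
    """
    n = len(voting_matrix)
    stochastic = []
    for row in voting_matrix:
        total = sum(row)
        if total == 0:
            stochastic.append([1.0 / n] * n)
        else:
            stochastic.append([value / total for value in row])
    return stochastic


# Made-up voting matrix: entry (i, j) is the number of votes item i casts for item j.
V = [[0, 3, 1],
     [2, 0, 2],
     [0, 0, 0]]
S = row_normalize(V)
```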
- static compute(stochastic_matrix, b)
- computation_phase()
Compute the stationary vector or dominant eigenvector of the transpose of the irreducible matrix. The stationary vector is the rating vector. Note: the irreducible matrix is stochastic_matrix_asch and the stationary vector is pi_steady.
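The computation phase can be sketched in two steps. The blending formula below is the PageRank-style adjustment, b * S + (1 - b) / n, and its use here is an assumption, since this page does not state how stochastic_matrix_asch is built. Power iteration is used in place of an eigenvector routine to keep the sketch dependency-free. The helpers make_irreducible and stationary_vector are hypothetical names.

```python
def make_irreducible(stochastic_matrix, b):
    """Blend a row-stochastic matrix with the uniform matrix (PageRank-style)."""
    if not 0 <= b <= 1:
        raise ValueError("b must be in the range [0, 1]")
    n = len(stochastic_matrix)
    return [[b * stochastic_matrix[i][j] + (1.0 - b) / n for j in range(n)]
            for i in range(n)]


def stationary_vector(matrix, tol=1e-12, max_iter=10_000):
    """Power iteration for pi = pi * matrix, starting from the uniform vector."""
    n = len(matrix)
    pi = [1.0 / n] * n
    for _ in range(max_iter):
        # Left multiplication: new_pi[j] = sum_i pi[i] * matrix[i][j]
        nxt = [sum(pi[i] * matrix[i][j] for i in range(n)) for j in range(n)]
        converged = sum(abs(a - c) for a, c in zip(nxt, pi)) < tol
        pi = nxt
        if converged:
            break
    return pi


# Made-up 2x2 row-stochastic matrix (illustration only).
S = [[0.0, 1.0],
     [0.5, 0.5]]
G = make_irreducible(S, b=0.85)
pi_steady = stationary_vector(G)
```

With b < 1 every entry of the blended matrix is positive, so the iteration converges to a unique stationary vector. With b = 1 the matrix is left unchanged and convergence depends on the structure of the original matrix.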
- create_voting_matrix(*, voting_method: Literal['VotingWithLosses', 'WinnersAndLosersVotePoint', 'LosersVotePointDiff'], data_df: DataFrame, items_df: DataFrame, col_name_home: str, col_name_away: str, columns_dict: Optional[Dict[str, Any]] = None) ndarray
Selection of the method for developing the voting matrix. The available methods are:
  - 'VotingWithLosses' when the losing team casts a number of votes equal to the margin of victory in its matchup with a stronger opponent.
  - 'WinnersAndLosersVotePoint' when both the winning and losing teams vote with the number of points given up.
  - 'LosersVotePointDiff' when the losing team casts a number of votes.
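The two fully described methods can be illustrated with the sketch below, which follows only the wording of the descriptions above and may differ from the library's actual behaviour. The helper build_voting_matrix, the tuple layout of games, and the convention that entry (i, j) counts the votes team i casts for team j are all assumptions. 'LosersVotePointDiff' is left out because its description on this page is incomplete.

```python
def build_voting_matrix(games, teams, method):
    """Build a voting matrix from (home, away, home_points, away_points) tuples."""
    index = {team: k for k, team in enumerate(teams)}
    n = len(teams)
    votes = [[0] * n for _ in range(n)]
    for home, away, home_pts, away_pts in games:
        i, j = index[home], index[away]
        if method == 'WinnersAndLosersVotePoint':
            # Each team votes for its opponent with the points it gave up.
            votes[i][j] += away_pts
            votes[j][i] += home_pts
        elif method == 'VotingWithLosses':
            # As worded above: the loser votes for the winner with the margin.
            if home_pts > away_pts:
                votes[j][i] += home_pts - away_pts
            elif away_pts > home_pts:
                votes[i][j] += away_pts - home_pts
        else:
            raise ValueError(f"method not covered by this sketch: {method}")
    return votes


# Made-up results (illustration only).
teams = ['A', 'B', 'C']
games = [('A', 'B', 2, 1), ('B', 'C', 0, 3), ('C', 'A', 1, 1)]
points_votes = build_voting_matrix(games, teams, 'WinnersAndLosersVotePoint')
margin_votes = build_voting_matrix(games, teams, 'VotingWithLosses')
```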
- preparation_phase(data_df: DataFrame, items_df: DataFrame, columns_dict: Optional[Dict[str, Any]] = None)
During the preparation phase, voting and stochastic matrices are constructed for each statistic according to the method specified in the stats_markov_dict dictionary.
- rate(data_df: DataFrame, items_df: DataFrame, sort: bool = False, columns_dict: Optional[Dict[str, Any]] = None) DataFrame
This method computes ratings for pairwise data (e.g. soccer team games). To be overridden in subclasses.
- Parameters
data_df (pandas.DataFrame) – The pairwise data.
items_df (pandas.DataFrame) – Set of items (e.g. teams) to be rated
sort (bool, default=False) – If true, the output is sorted by rating value
columns_dict (Optional[Dict[str, str]]) – The column names of the data file. See ratingslib.datasets.parameters.COLUMNS_DICT for more details.
- Returns
items_df – The set of items with their rating and ranking.
- Return type
pandas.DataFrame
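The relation between the rating and ranking columns of the returned table can be sketched as follows, assuming rank 1 goes to the highest rating. The helper to_ranking is a hypothetical name, it returns plain tuples rather than a pandas.DataFrame, and ties are broken by input order, which may differ from the library.

```python
def to_ranking(items, ratings, sort=False):
    """Return (item, rating, ranking) rows; rank 1 is the highest rating."""
    order = sorted(range(len(items)), key=lambda k: ratings[k], reverse=True)
    rank = [0] * len(items)
    for position, k in enumerate(order, start=1):
        rank[k] = position
    rows = [(items[k], ratings[k], rank[k]) for k in range(len(items))]
    if sort:
        # Sorting by ranking is the same as sorting by rating, best first.
        rows.sort(key=lambda row: row[2])
    return rows


# Made-up ratings (illustration only).
unsorted_rows = to_ranking(['A', 'B', 'C'], [0.2, 0.5, 0.3])
sorted_rows = to_ranking(['A', 'B', 'C'], [0.2, 0.5, 0.3], sort=True)
```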
- static validate_stats_markov_dict(stats_markov_dict: dict)