ratingslib.ratings.markov module

Markov (GeM) Rating System

class Markov(*, version=ratings.MARKOV, b: float = 1, stats_markov_dict: Optional[Union[Dict[str, dict], Set[str]]] = None)

Bases: RatingSystem

This class implements the Markov (GeM, Generalized Markov Method) rating system. GeM was first used by graduate students Angela Govan [1] and Luke Ingram [2] to successfully rank NFL football and NCAA basketball teams, respectively. The Markov (GeM) method is related to the PageRank method [3]: it draws on finite Markov chains and graph theory to generate ratings for n objects in a finite set. Beyond sports, GeM can be applied to any problem that can be represented as a weighted directed graph.

Parameters

version (str, default=ratings.MARKOV) – A string that identifies the version of the rating system. The available versions are listed in the ratingslib.utils.enums.ratings class.
b (float, default=1) – The damping factor. Valid values are in the range [0, 1].
stats_markov_dict (Optional[Dict[str, Dict[Any, Any]]], default=None) –
A dictionary that specifies the statistics used by the method. For instance, for rating soccer teams, the following stats_markov_dict:
```
stats_markov_dict = {
    'TotalWins': {'VOTE': 10, 'ITEM_I': 'FTHG', 'ITEM_J': 'FTAG',
                  'METHOD': 'VotingWithLosses'},
    'TotalGoals': {'VOTE': 10, 'ITEM_I': 'FTHG', 'ITEM_J': 'FTAG',
                   'METHOD': 'WinnersAndLosersVotePoint'}
}
```
specifies the following details:
- TotalWins and TotalGoals are the names of the two statistics.
- 'VOTE': 10 sets the vote of each statistic to 10. The votes are converted into weights, so in this example the two statistics are equally weighted.
- 'ITEM_I': 'FTHG' and 'ITEM_J': 'FTAG' are the names of the columns holding the statistic for the home and away team, respectively.
- The key 'METHOD' specifies which method constructs the voting matrix. The available methods are:
  1. 'VotingWithLosses': the losing team casts a number of votes equal to the margin of victory in its matchup with a stronger opponent.
  2. 'WinnersAndLosersVotePoint': both the winning and the losing team vote with the number of points given up.
  3. 'LosersVotePointDiff': the losing team casts a number of votes
See also the implementation of create_voting_matrix().
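The votes in stats_markov_dict are converted into weights. The sketch below illustrates one way this can work, under two assumptions that are not confirmed by this page: each weight is the statistic's vote divided by the sum of all votes, and the per-statistic stochastic matrices are then combined as a weighted sum. The helper name combine_stochastic and the toy matrices are hypothetical, not part of ratingslib.

```python
import numpy as np


def combine_stochastic(stats_markov_dict, stochastic_matrices):
    """Hypothetical helper: weight per-statistic stochastic matrices by their votes.

    Assumption: weight_k = VOTE_k / sum of all VOTEs, so the weights sum to 1 and
    the weighted sum of row-stochastic matrices is itself row-stochastic.
    """
    total = sum(details["VOTE"] for details in stats_markov_dict.values())
    weights = {name: details["VOTE"] / total
               for name, details in stats_markov_dict.items()}
    combined = sum(weights[name] * stochastic_matrices[name]
                   for name in stats_markov_dict)
    return weights, combined


stats_markov_dict = {
    "TotalWins": {"VOTE": 10, "ITEM_I": "FTHG", "ITEM_J": "FTAG",
                  "METHOD": "VotingWithLosses"},
    "TotalGoals": {"VOTE": 10, "ITEM_I": "FTHG", "ITEM_J": "FTAG",
                   "METHOD": "WinnersAndLosersVotePoint"},
}
# Toy 2x2 row-stochastic matrices standing in for the per-statistic matrices.
stochastic_matrices = {
    "TotalWins": np.array([[0.0, 1.0], [1.0, 0.0]]),
    "TotalGoals": np.array([[0.5, 0.5], [0.25, 0.75]]),
}
weights, S = combine_stochastic(stats_markov_dict, stochastic_matrices)
print(weights)        # {'TotalWins': 0.5, 'TotalGoals': 0.5}
print(S.sum(axis=1))  # [1. 1.]
```

Because equal votes give equal weights, doubling both votes to 20 would leave the combined matrix unchanged; only the ratio between votes matters under this assumption.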

stats

Dictionary that stores the voting and stochastic matrices. Keys starting with V map to the voting matrices and keys starting with S map to the stochastic matrices.

Type: Dict[str, np.ndarray]

params

Dictionary that maps parameters to their values.

Type: Dict[str, Optional[Dict[str, Dict[Any, Any]]]]

stochastic_matrix

The stochastic Markov matrix: a square matrix in which entry (i, j) is the probability that item i votes for item j.

Type: np.ndarray

stochastic_matrix_asch

The irreducible version of the stochastic Markov matrix, used in the computation phase.

Type: np.ndarray

pi_steady

The stationary vector of the irreducible matrix stochastic_matrix_asch, i.e., the dominant eigenvector of its transpose. This vector is the rating vector.

Type: np.ndarray

group

Set of statistic names.

Type: Set[str]

Raises: ValueError – If b is not in the range [0, 1].

Examples

The following example demonstrates the GeM rating system on the first 20 soccer matches of the 2018-2019 English Premier League season.

>>> from ratingslib.datasets.filenames import dataset_path, FILENAME_EPL_2018_2019_20_GAMES
>>> from ratingslib.ratings.markov import Markov
>>> filename = dataset_path(FILENAME_EPL_2018_2019_20_GAMES)
>>> votes = {
...     'TW': {
...         'VOTE': 10,
...         'ITEM_I': 'FTHG',
...         'ITEM_J': 'FTAG',
...         'METHOD': 'VotingWithLosses'},
...     'TG': {
...         'VOTE': 10,
...         'ITEM_I': 'FTHG',
...         'ITEM_J': 'FTAG',
...         'METHOD': 'WinnersAndLosersVotePoint'},
...     'TST': {
...         'VOTE': 10,
...         'ITEM_I': 'HST',
...         'ITEM_J': 'AST',
...         'METHOD': 'WinnersAndLosersVotePoint'},
...     'TS': {
...         'VOTE': 10,
...         'ITEM_I': 'HS',
...         'ITEM_J': 'AS',
...         'METHOD': 'WinnersAndLosersVotePoint'},
... }
>>> Markov(b=0.85, stats_markov_dict=votes).rate_from_file(filename)
                Item    rating  ranking
    0          Arsenal  0.050470       11
    1      Bournemouth  0.039076       15
    2         Brighton  0.051460       10
    3          Burnley  0.071596        2
    4          Cardiff  0.024085       20
    5          Chelsea  0.045033       13
    6   Crystal Palace  0.037678       16
    7          Everton  0.066307        3
    8           Fulham  0.036356       17
    9     Huddersfield  0.032164       19
    10       Leicester  0.055491        7
    11       Liverpool  0.056879        6
    12        Man City  0.048325       12
    13      Man United  0.061052        4
    14       Newcastle  0.035814       18
    15     Southampton  0.051716        9
    16       Tottenham  0.053079        8
    17         Watford  0.082788        1
    18        West Ham  0.041824       14
    19          Wolves  0.058807        5

References

[1] Govan, A. Y. (2008). Ranking Theory with Application to Popular Sports. Ph.D. dissertation, North Carolina State University.
[2] Ingram, L. C. (2007). Ranking NCAA sports teams with linear algebra. Charleston.
[3] Brin, S., and Page, L. (1998). The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1-7), 107-117.

set_group(): Set the group of statistics.

static do_stochastic(voting_matrix: ndarray)

Normalize the rows of the voting matrix to develop a stochastic transition probability matrix.

Parameters: voting_matrix (numpy.ndarray) – The voting matrix whose rows are normalized.
Returns: stochastic_matrix – Stochastic matrix built from the corresponding voting matrix.
Return type: numpy.ndarray
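Row normalization divides each row of the voting matrix by its row sum, so that every row becomes a probability distribution over the items it votes for. The sketch below is a hypothetical stand-in for do_stochastic, not its actual code. One detail is an assumption: a row with no votes at all (for example, an item that never lost) is replaced by the uniform row 1/n so that the result is still row-stochastic. The library may handle empty rows differently.

```python
import numpy as np


def row_normalize(voting_matrix):
    """Hypothetical sketch: turn a voting matrix into a row-stochastic matrix.

    Assumption: rows that sum to zero are replaced by the uniform row 1/n.
    """
    V = np.asarray(voting_matrix, dtype=float)
    n = V.shape[0]
    row_sums = V.sum(axis=1, keepdims=True)
    S = np.full_like(V, 1.0 / n)          # default: uniform rows
    nonzero = row_sums[:, 0] > 0          # rows that actually cast votes
    S[nonzero] = V[nonzero] / row_sums[nonzero]
    return S


V = [[0, 3, 1],
     [0, 0, 0],   # this item cast no votes
     [2, 2, 0]]
S = row_normalize(V)
print(S)
```

Row 0 becomes [0, 0.75, 0.25], row 2 becomes [0.5, 0.5, 0], and the empty row 1 becomes uniform.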

static compute(stochastic_matrix, b)

computation_phase(): Compute the stationary vector or dominant eigenvector of the transpose of irreducible matrix. Stationary vector is the rating vector. Note: irreducible matrix is the stochastic_matrix_asch and stationary vector is the pi_steady.

create_voting_matrix(*, voting_method: Literal['VotingWithLosses', 'WinnersAndLosersVotePoint', 'LosersVotePointDiff'], data_df: DataFrame, items_df: DataFrame, col_name_home: str, col_name_away: str, columns_dict: Optional[Dict[str, Any]] = None) → ndarray

Selects the method used to build the voting matrix. The available methods are:

'VotingWithLosses': the losing team casts a number of votes equal to the margin of victory in its matchup with a stronger opponent.

'WinnersAndLosersVotePoint': both the winning and the losing team vote with the number of points given up.

'LosersVotePointDiff': the losing team casts a number of votes.
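For 'WinnersAndLosersVotePoint', each team casts as many votes for its opponent as the points it gave up. The sketch below builds such a voting matrix from a list of game results; it is a hypothetical illustration, not the library's create_voting_matrix. The tuple layout (home, away, home_score, away_score) is an assumption, with the scores playing the role of the ITEM_I and ITEM_J columns (e.g. FTHG and FTAG).

```python
import numpy as np


def winners_and_losers_vote_point(games, items):
    """Hypothetical sketch of a 'WinnersAndLosersVotePoint' voting matrix.

    games: iterable of (home, away, home_score, away_score) tuples.
    V[i, j] accumulates the points item i gave up to item j, i.e. the number
    of votes item i casts for item j.
    """
    index = {item: k for k, item in enumerate(items)}
    V = np.zeros((len(items), len(items)))
    for home, away, home_score, away_score in games:
        h, a = index[home], index[away]
        V[h, a] += away_score   # home conceded away_score -> votes for away
        V[a, h] += home_score   # away conceded home_score -> votes for home
    return V


items = ["A", "B", "C"]
games = [("A", "B", 2, 1),   # A beats B 2-1
         ("B", "C", 0, 0),   # goalless draw: no votes cast
         ("C", "A", 1, 3)]   # A beats C 3-1 away
V = winners_and_losers_vote_point(games, items)
print(V)
```

Both teams vote in every game, so the loser sends more votes to the winner than it receives; a goalless draw adds nothing.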

preparation_phase(data_df: DataFrame, items_df: DataFrame, columns_dict: Optional[Dict[str, Any]] = None): During the preparation phase, a voting matrix and a stochastic matrix are constructed for each statistic, using the method specified for that statistic in stats_markov_dict.

rate(data_df: DataFrame, items_df: DataFrame, sort: bool = False, columns_dict: Optional[Dict[str, Any]] = None) → DataFrame

Computes ratings from pairwise data (e.g., games between soccer teams). To be overridden in subclasses.

Parameters

data_df (pandas.DataFrame) – The pairwise data.
items_df (pandas.DataFrame) – The set of items (e.g., teams) to be rated.
sort (bool, default=False) – If True, the output is sorted by rating value.
columns_dict (Optional[Dict[str, str]]) – The column names of the data file. See ratingslib.datasets.parameters.COLUMNS_DICT for more details.

Returns

items_df – The set of items with their rating and ranking.

Return type

pandas.DataFrame
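The returned DataFrame pairs each rating with a ranking, where ranking 1 goes to the highest rating (as in the example output above, where Watford has the highest rating and ranking 1). The sketch below reproduces that relationship with pandas on four ratings copied from the example output. How rate() handles ties is not stated here, so method="min" (tied items share the best rank) is an assumption.

```python
import pandas as pd

# Four (item, rating) pairs copied from the example output above.
items_df = pd.DataFrame({
    "Item": ["Burnley", "Cardiff", "Watford", "Everton"],
    "rating": [0.071596, 0.024085, 0.082788, 0.066307],
})
# Assumption: rank 1 = highest rating; ties share the best (lowest) rank.
items_df["ranking"] = (
    items_df["rating"].rank(ascending=False, method="min").astype(int)
)
print(items_df)
# Roughly what sort=True would return: rows ordered by descending rating.
print(items_df.sort_values("rating", ascending=False))
```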

static validate_stats_markov_dict(stats_markov_dict: dict)