Movie Rankings

This example illustrates the use of rating systems for movie rankings. The dataset used [1] is obtained from https://grouplens.org/datasets/movielens/ and has the following details: Small: 100,000 ratings and 3,600 tag applications applied to 9,000 movies by 600 users. Last updated 9/2018. More details about the dataset can be found at: https://files.grouplens.org/datasets/movielens/ml-latest-small-README.html

[1] F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. <https://doi.org/10.1145/2827872>
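In the script, `create_movie_users` receives `min_votes=MIN_VOTES` before any rating method runs. We assume it keeps only movies with at least that many user ratings and arranges them as a user-by-movie table, but the library's exact behaviour may differ. The sketch below is a hypothetical stand-in, not ratingslib code: it assumes the MovieLens `ratings.csv` columns `userId`, `movieId` and `rating`, and the helper name `filter_and_pivot` and the sample data are illustrative only.

```python
import pandas as pd


def filter_and_pivot(ratings_df: pd.DataFrame, min_votes: int) -> pd.DataFrame:
    """Hypothetical helper: keep movies rated by at least `min_votes` users and
    return a user-by-movie matrix (rows = userId, columns = movieId).
    Movies a user did not rate are left as NaN."""
    votes = ratings_df.groupby("movieId")["rating"].count()
    popular = votes[votes >= min_votes].index
    kept = ratings_df[ratings_df["movieId"].isin(popular)]
    return kept.pivot_table(index="userId", columns="movieId", values="rating")


# Tiny synthetic sample with the same columns as ml-latest-small/ratings.csv
sample = pd.DataFrame({
    "userId":    [1, 1, 1, 2, 2, 3, 3, 3],
    "movieId":   [10, 20, 30, 10, 20, 10, 20, 30],
    "rating":    [4.0, 3.5, 5.0, 2.0, 4.5, 3.0, 3.0, 1.0],
    "timestamp": [0] * 8,
})
matrix = filter_and_pivot(sample, min_votes=3)
print(matrix)
```

With `min_votes=3`, movie 30 (rated by only two users) is dropped, leaving a 3 × 2 matrix of users by movies.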

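The following code demonstrates how to use the library to rate and rank movies with four methods (Colley, Massey, Keener and Offense-Defense) and to compare the resulting rankings with Kendall's tau. Finally, it aggregates the rating lists into a single list with the Perron method.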
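The script treats every pair of movies as a hypothetical matchup. To make the rating step concrete, here is a self-contained sketch (NumPy only, independent of ratingslib) of two of the four methods on a tiny user-by-movie matrix. It assumes one matchup per user for every pair of movies that user rated, with the two ratings playing the role of the points scored (`points_i`, `points_j`); how `create_pairs_data` actually builds the pairs may differ. `colley` solves the Colley system C r = b, where C is 2I plus the games-played diagonal minus the pairwise game counts and b_k = 1 + (w_k - l_k)/2; `massey` solves the Massey system M r = p on rating differentials, replacing its last equation with sum(r) = 0 because M is singular. The function names are illustrative.

```python
import itertools

import numpy as np


def matchups(R):
    """Yield (i, j, points_i, points_j): one hypothetical game per user and per
    pair of movies that the user rated (NaN means "not rated")."""
    n_users, n_items = R.shape
    for u in range(n_users):
        rated = [k for k in range(n_items) if not np.isnan(R[u, k])]
        for i, j in itertools.combinations(rated, 2):
            yield i, j, R[u, i], R[u, j]


def colley(R):
    """Colley ratings: solve C r = b. Tied ratings count as a game but
    add neither a win nor a loss."""
    n = R.shape[1]
    C = 2.0 * np.eye(n)
    b = np.ones(n)
    for i, j, p_i, p_j in matchups(R):
        C[i, i] += 1
        C[j, j] += 1
        C[i, j] -= 1
        C[j, i] -= 1
        if p_i != p_j:
            winner, loser = (i, j) if p_i > p_j else (j, i)
            b[winner] += 0.5
            b[loser] -= 0.5
    return np.linalg.solve(C, b)


def massey(R):
    """Massey ratings: solve M r = p, where p_k is movie k's total rating
    differential over all of its matchups."""
    n = R.shape[1]
    M = np.zeros((n, n))
    p = np.zeros(n)
    for i, j, p_i, p_j in matchups(R):
        M[i, i] += 1
        M[j, j] += 1
        M[i, j] -= 1
        M[j, i] -= 1
        p[i] += p_i - p_j
        p[j] += p_j - p_i
    M[-1, :] = 1.0   # M is singular: replace the last equation with sum(r) = 0
    p[-1] = 0.0
    return np.linalg.solve(M, p)


# rows = users, columns = movies 0, 1, 2
R = np.array([[4.0, 3.0, 5.0],
              [2.0, 4.0, 5.0],
              [3.0, 2.0, np.nan]])
print("Colley:", colley(R))   # movie 2 first, then 0, then 1
print("Massey:", massey(R))   # movie 2 first, then 1, then 0
```

On this toy data both methods put movie 2 first but disagree on movies 0 and 1: Colley looks only at wins and losses, while Massey also weighs the size of the rating differences. Disagreements of this kind are what the Kendall tau comparison in the script quantifies.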

Python code

"""
This example illustrates the use of rating systems for movie rankings.
The dataset used [1]_ is obtained from: https://grouplens.org/datasets/movielens/ and it has the following details:
Small: 100,000 ratings and 3,600 tag applications applied to 9,000 movies by 600 users. Last updated 9/2018.
Link: https://files.grouplens.org/datasets/movielens/ml-latest-small.zip
Readme Link: https://files.grouplens.org/datasets/movielens/ml-latest-small-README.html

.. [1] Harper, F. M., & Konstan, J. A. (2015, December). The MovieLens Datasets: History and Context. ACM Trans. Interact. Intell. Syst., 5(4), 19:1-19:19. doi:10.1145/2827872

"""

import pandas as pd
from ratingslib.app_movies.preparedata import create_movie_users
from ratingslib.datasets.filenames import MOVIES_DATA_PATH, datasets_paths
from ratingslib.datasets.parse import create_pairs_data, parse_pairs_data
import ratingslib.ratings as rl
from ratingslib.utils.enums import ratings
from ratingslib.utils.methods import parse_columns, print_info, print_pandas

# set the minimum number of votes from users
MIN_VOTES = 200
# get dataset paths
filename_ratings, filename_movies = datasets_paths(
    MOVIES_DATA_PATH + "ratings.csv", MOVIES_DATA_PATH + "movies.csv")

# load data
ratings_df = pd.read_csv(filename_ratings)
movies_df = pd.read_csv(filename_movies)

# prepare data
user_movie_df, mr_df, movies_dict, id_titles_dict = create_movie_users(
    ratings_df, movies_df, min_votes=MIN_VOTES)

COLUMNS_MOVIE_DICT = {
    'item_i': 'MovieI',
    'item_j': 'MovieJ',
    'points_i': 'RatingI',
    'points_j': 'RatingJ'
}
# Create pairs: build a movie-movie dataframe in which every pair of movies
# is treated as a hypothetical matchup. The column names of movie_movie_df
# are set in COLUMNS_MOVIE_DICT.
movie_movie_df = create_pairs_data(
    user_movie_df, columns_dict=COLUMNS_MOVIE_DICT)
# replace movie ids with titles
col_names = parse_columns(COLUMNS_MOVIE_DICT)
movie_movie_df.replace({col_names.item_i: id_titles_dict,
                        col_names.item_j: id_titles_dict}, inplace=True)

# parse the movie-movie dataframe as pairs data
data_df, items_df = parse_pairs_data(
    movie_movie_df, columns_dict=COLUMNS_MOVIE_DICT)

# RATE:

# Colley method
colley = rl.Colley().rate(data_df, items_df,
                          columns_dict=COLUMNS_MOVIE_DICT, sort=True)

# Massey method
massey = rl.Massey().rate(data_df, items_df,
                          columns_dict=COLUMNS_MOVIE_DICT, sort=True)

# Keener method
keener = rl.Keener(normalization=True).rate(data_df, items_df,
                                            columns_dict=COLUMNS_MOVIE_DICT,
                                            sort=True)

# Offense-Defense method
od = rl.OffenseDefense(tol=0.0001).rate(data_df, items_df,
                                        columns_dict=COLUMNS_MOVIE_DICT,
                                        sort=True)

# print rating and ranking lists
print_pandas(colley)
print_pandas(massey)
print_pandas(keener)
print_pandas(od)

# Sort every rating list by item so that the lists are aligned, then
# compare the ranking lists pairwise with Kendall's tau
ratings_dict = {
    'colley': colley.sort_values(by='Item'),
    'massey': massey.sort_values(by='Item'),
    'keener': keener.sort_values(by='Item'),
    'od': od.sort_values(by='Item')
}
kendall_tau_results = rl.metrics.kendall_tau_table(
    ratings_dict=ratings_dict, print_out=True)

# Aggregate the rating values into one rating list with the Perron method
print_info("RATING AGGREGATION: PERRON")
ra = rl.RatingAggregation(ratings.AGGREGATIONPERRON)

ratings_aggr_dict = {
    # sort the item names the same way as the rating lists so rows line up
    'Item': items_df['Item'].sort_values().values,
    'colley': ratings_dict['colley'].rating.values,
    'massey': ratings_dict['massey'].rating.values,
    'keener': ratings_dict['keener'].rating.values,
    'od': ratings_dict['od'].rating.values
}

# keep the rating columns in a fixed order (a set difference would not)
columns_dict = {'item': 'Item',
                'ratings': [key for key in ratings_aggr_dict if key != 'Item']}
aggr_df = pd.DataFrame.from_dict(ratings_aggr_dict)
movies_aggr_df = ra.rate(
    aggr_df, items_df, columns_dict=columns_dict, sort=True)
print_pandas(movies_aggr_df)
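The last part of the script compares the four rating lists with Kendall's tau and then merges them with `ratings.AGGREGATIONPERRON`. The sketch below illustrates both ideas without ratingslib, under stated assumptions. `kendall_tau` computes tau-a, (concordant pairs minus discordant pairs) divided by n(n-1)/2; `kendall_tau_table` may handle ties with a different variant. `perron_aggregate` is one possible Perron-style aggregation assumed here for illustration, not necessarily the construction ratingslib uses: each list is min-max scaled to [0, 1], A[i, j] accumulates how far movie i is rated above movie j across the lists, a small `eps` is added to every entry, and the aggregated rating is the Perron vector of A, computed by power iteration. All function names and parameters are hypothetical.

```python
import numpy as np


def kendall_tau(a, b):
    """Kendall's tau-a between two rating vectors over the same items.
    Tied pairs count as neither concordant nor discordant."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n = len(a)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += np.sign(a[i] - a[j]) * np.sign(b[i] - b[j])
    return total / (n * (n - 1) / 2)


def perron_aggregate(rating_lists, eps=0.01, tol=1e-10, max_iter=1000):
    """Hypothetical Perron-style aggregation of several rating vectors."""
    R = np.asarray(rating_lists, dtype=float)        # shape: (lists, items)
    low = R.min(axis=1, keepdims=True)
    span = R.max(axis=1, keepdims=True) - low
    span[span == 0] = 1.0                            # constant list: avoid 0/0
    S = (R - low) / span                             # each list scaled to [0, 1]
    n = S.shape[1]
    A = np.full((n, n), eps)                         # eps > 0: A strictly positive
    for s in S:
        A += np.maximum(s[:, None] - s[None, :], 0.0)  # A[i, j] += max(s_i - s_j, 0)
    x = np.full(n, 1.0 / n)
    for _ in range(max_iter):                        # power iteration
        y = A @ x
        y /= y.sum()
        converged = np.abs(y - x).sum() < tol
        x = y
        if converged:
            break
    return x


# Two rating lists for the same three movies (Colley-like and Massey-like values)
lists = [[0.425, 0.325, 0.75], [-31 / 48, -25 / 48, 7 / 6]]
print("tau:", kendall_tau(*lists))   # 1/3: the lists disagree only on movies 0 and 1
agg = perron_aggregate(lists)
print("aggregated:", agg, "ranking:", np.argsort(-agg))
```

The `eps` shift is a design choice: by the Perron-Frobenius theorem a strictly positive matrix has a unique positive eigenvector for its largest eigenvalue, so the aggregated ratings are well defined, and a movie scores high when it is rated above other high-scoring movies. Scaling each list first keeps a method with large rating values, such as Massey's rating differentials, from dominating the result.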