Movie Rankings
This example illustrates the use of rating systems for movie rankings. The dataset used [1] is the MovieLens "Small" dataset, obtained from https://grouplens.org/datasets/movielens/. It contains 100,000 ratings and 3,600 tag applications applied to 9,000 movies by 600 users, and was last updated 9/2018. More details about the dataset can be found at: https://files.grouplens.org/datasets/movielens/ml-latest-small-README.html
[1] F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. <https://doi.org/10.1145/2827872>
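The following code demonstrates how to use the library to rate and rank movies with four methods (Colley, Massey, Keener, and Offense-Defense). It then compares the resulting ranking lists with Kendall's tau and finally aggregates the rating lists into one using the Perron method.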
Python code
"""
This example illustrates the use of rating systems for movie rankings.
The dataset used [1]_ is obtained from: https://grouplens.org/datasets/movielens/ and it has the following details:
Small: 100,000 ratings and 3,600 tag applications applied to 9,000 movies by 600 users. Last updated 9/2018.
Link: https://files.grouplens.org/datasets/movielens/ml-latest-small.zip
Readme Link: https://files.grouplens.org/datasets/movielens/ml-latest-small-README.html

.. [1] Harper, F. M., & Konstan, J. A. (2015, December). The MovieLens Datasets: History and Context. ACM Trans. Interact. Intell. Syst., 5. doi:10.1145/2827872

"""

import pandas as pd
from ratingslib.app_movies.preparedata import create_movie_users
from ratingslib.datasets.filenames import MOVIES_DATA_PATH, datasets_paths
from ratingslib.datasets.parse import create_pairs_data, parse_pairs_data
import ratingslib.ratings as rl
from ratingslib.utils.enums import ratings
from ratingslib.utils.methods import parse_columns, print_info, print_pandas

# set minimum votes from users
MIN_VOTES = 200
# get dataset paths
filename_ratings, filename_movies = datasets_paths(
    MOVIES_DATA_PATH+"ratings.csv", MOVIES_DATA_PATH+"movies.csv")

# load data
ratings_df = pd.read_csv(filename_ratings)
movies_df = pd.read_csv(filename_movies)

# prepare data
user_movie_df, mr_df, movies_dict, id_titles_dict = create_movie_users(
    ratings_df, movies_df, min_votes=MIN_VOTES)

COLUMNS_MOVIE_DICT = {
    'item_i': 'MovieI',
    'item_j': 'MovieJ',
    'points_i': 'RatingI',
    'points_j': 'RatingJ'
}
# create pairs. Create movie-movie dataframe which means that every pair is a
# hypothetical matchup. The columns of movie_movie dataframe are
# set in COLUMNS_MOVIE_DICT.
movie_movie_df = create_pairs_data(
    user_movie_df, columns_dict=COLUMNS_MOVIE_DICT)
# replace ids to titles
col_names = parse_columns(COLUMNS_MOVIE_DICT)
movie_movie_df.replace({col_names.item_i: id_titles_dict,
                        col_names.item_j: id_titles_dict}, inplace=True)

# parse movie-movie dataframe as pairs data.
data_df, items_df = parse_pairs_data(
    movie_movie_df, columns_dict=COLUMNS_MOVIE_DICT)

# RATE:

# Colley method
colley = rl.Colley().rate(data_df, items_df,
                          columns_dict=COLUMNS_MOVIE_DICT, sort=True)

# Massey method
massey = rl.Massey().rate(data_df, items_df,
                          columns_dict=COLUMNS_MOVIE_DICT, sort=True)
# Keener method
keener = rl.Keener(normalization=True).rate(data_df, items_df,
                                            columns_dict=COLUMNS_MOVIE_DICT,
                                            sort=True)
# Offense Defense method
od = rl.OffenseDefense(tol=0.0001).rate(data_df, items_df,
                                        columns_dict=COLUMNS_MOVIE_DICT, sort=True)

# print rating and ranking lists
print_pandas(colley)
print_pandas(massey)
print_pandas(keener)
print_pandas(od)

# We create a dictionary in order to compare ranking lists with Kendall Tau
ratings_dict = {
    'colley': colley.sort_values(by='Item'),
    'massey': massey.sort_values(by='Item'),
    'keener': keener.sort_values(by='Item'),
    'od': od.sort_values(by='Item')
}
kendall_tau_results = rl.metrics.kendall_tau_table(
    ratings_dict=ratings_dict, print_out=True)

# We aggregate rating values into one rating list by applying Perron method
print_info("RATING AGGREGATION: PERRON")
ra = rl.RatingAggregation(ratings.AGGREGATIONPERRON)

ratings_aggr_dict = {
    'Item': items_df['Item'].values,
    'colley': colley.sort_values(by='Item').rating.values,
    'massey': massey.sort_values(by='Item').rating.values,
    'keener': keener.sort_values(by='Item').rating.values,
    'od': od.sort_values(by='Item').rating.values
}

columns_dict = {'item': 'Item',
                'ratings': list(ratings_aggr_dict.keys()-{'Item'})}
data_df = pd.DataFrame.from_dict(ratings_aggr_dict)
movies_aggr_df = ra.rate(
    data_df, items_df, columns_dict=columns_dict, sort=True)
print_pandas(movies_aggr_df)
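The pair-creation step in the script above turns user ratings into hypothetical matchups between movies. The following is a minimal, library-free sketch of that idea. It assumes that one matchup is generated for every pair of movies rated by the same user, with the two ratings acting as the scores. This is an assumption about what create_pairs_data does, and the library's actual behavior may differ in details. The toy data and the build_matchups helper are hypothetical and are not part of ratingslib or MovieLens.

```python
from itertools import combinations

# Hypothetical toy data: user -> {movie title: rating}.
# These values are made up for illustration, not taken from MovieLens.
toy_user_ratings = {
    "user_a": {"Movie X": 5.0, "Movie Y": 3.0, "Movie Z": 4.0},
    "user_b": {"Movie X": 2.5, "Movie Z": 4.5},
    "user_c": {"Movie Y": 4.0},  # only one rating, so no matchup
}


def build_matchups(user_ratings):
    """Return one (movie_i, movie_j, rating_i, rating_j) tuple for every
    pair of movies that the same user has rated (assumed pairing rule)."""
    matchups = []
    for rated in user_ratings.values():
        # sorted() keeps the order of each pair deterministic
        for movie_i, movie_j in combinations(sorted(rated), 2):
            matchups.append(
                (movie_i, movie_j, rated[movie_i], rated[movie_j]))
    return matchups


matchups = build_matchups(toy_user_ratings)
for row in matchups:
    print(row)
```

Each tuple mirrors the four columns named in COLUMNS_MOVIE_DICT (MovieI, MovieJ, RatingI, RatingJ). In this sketch, a user who rated only one movie contributes no matchup, and a user who rated n movies contributes n(n-1)/2 of them.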