ratingslib.app_movies.preparedata module

Module for data preparation for movies. The dataset used [1] is obtained from https://grouplens.org/datasets/movielens/ and has the following details:

Small: 100,000 ratings and 3,600 tag applications applied to 9,000 movies by 600 users. Last updated 9/2018.

Link: https://files.grouplens.org/datasets/movielens/ml-latest-small.zip

More details on the dataset can be found at: https://files.grouplens.org/datasets/movielens/ml-latest-small-README.html

create_movie_users(ratings: DataFrame, movies: DataFrame, min_votes: int = 1) → Tuple[DataFrame, DataFrame, dict, dict]

Prepares the movie data: creates the user-movie matrix and maps movie ids to movie details. This function is based on the structure of the MovieLens dataset [1].

Parameters

ratings (pd.DataFrame) – DataFrame of user ratings
movies (pd.DataFrame) – DataFrame of movies
min_votes (int) – Minimum number of total votes per user

Returns

user_movie_df (pd.DataFrame) – User-movie matrix
movie_ratings_df (pd.DataFrame) – User-movie matrix with movie details
movies_dict (dict) – Dictionary that maps movie ids to movie attributes
id_titles_dict (dict) – Dictionary that maps movie ids to titles

Return type

Tuple[pd.DataFrame, pd.DataFrame, dict, dict]

References

[1] F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. <https://doi.org/10.1145/2827872>

Examples

>>> import pandas as pd
>>> from ratingslib.app_movies.preparedata import create_movie_users
>>> from ratingslib.datasets.filenames import MOVIES_SMALL_PATH, datasets_paths
>>> MIN_VOTES = 200
>>> filename_ratings, filename_movies = datasets_paths(MOVIES_SMALL_PATH+"ratings.csv", MOVIES_SMALL_PATH+"movies.csv")
>>> ratings_df = pd.read_csv(filename_ratings)
>>> movies_df = pd.read_csv(filename_movies)
>>> user_movie_df, mr_df, movies_dict, id_titles_dict = create_movie_users(
...     ratings_df, movies_df, min_votes=MIN_VOTES)
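
To make the shape of the four return values more concrete, the following is a minimal sketch of one way such a preparation step could be written with plain pandas. It is not the implementation of create_movie_users: the helper name sketch_movie_users, the toy data, and the exact filtering and merging steps are assumptions made for illustration. Only the column names (userId, movieId, rating, title, genres) come from the MovieLens CSV files.

```python
import pandas as pd

# Toy stand-ins for ratings.csv and movies.csv (MovieLens column names).
ratings = pd.DataFrame({
    "userId":  [1, 1, 1, 2, 2, 3],
    "movieId": [10, 20, 30, 10, 20, 10],
    "rating":  [4.0, 3.5, 5.0, 2.0, 4.5, 3.0],
})
movies = pd.DataFrame({
    "movieId": [10, 20, 30],
    "title":   ["A (1995)", "B (1999)", "C (2001)"],
    "genres":  ["Drama", "Comedy", "Action"],
})


def sketch_movie_users(ratings, movies, min_votes=1):
    """Hypothetical approximation; NOT the library's actual code."""
    # Assumption: min_votes drops users who rated fewer than min_votes movies.
    votes_per_user = ratings.groupby("userId")["rating"].transform("count")
    kept = ratings[votes_per_user >= min_votes]

    # User-movie matrix: one row per user, one column per movie id,
    # NaN where the user did not rate the movie.
    user_movie = kept.pivot(index="userId", columns="movieId", values="rating")

    # Assumption: "with movie details" means the ratings joined with movies.
    ratings_with_details = kept.merge(movies, on="movieId")

    # Lookup dictionaries keyed by movie id.
    movies_dict = movies.set_index("movieId").to_dict(orient="index")
    id_titles = movies.set_index("movieId")["title"].to_dict()
    return user_movie, ratings_with_details, movies_dict, id_titles


user_movie, merged, movies_dict, id_titles = sketch_movie_users(
    ratings, movies, min_votes=2)
print(user_movie)
print(id_titles[10])
```

With min_votes=2 on this toy data, user 3 (a single rating) is dropped, so the matrix has two rows and three movie columns. Raising min_votes, as the example above does with MIN_VOTES = 200, keeps only the most active users and yields a denser matrix.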