ratingslib.app_movies.preparedata module
Module for data preparation for movies.
The dataset used [1] is obtained from https://grouplens.org/datasets/movielens/ and has the following details:
Small: 100,000 ratings and 3,600 tag applications applied to 9,000 movies by 600 users. Last updated 9/2018.
Link: https://files.grouplens.org/datasets/movielens/ml-latest-small.zip
More details about the dataset can be found at: https://files.grouplens.org/datasets/movielens/ml-latest-small-README.html
- create_movie_users(ratings: DataFrame, movies: DataFrame, min_votes: int = 1) → Tuple[DataFrame, DataFrame, dict, dict]
Data preparation for movie data. Creates a user-movie matrix and maps movie ids to movie details. This function is based on the structure of the MovieLens dataset [1].
- Parameters
ratings (pd.DataFrame) – DataFrame of user ratings
movies (pd.DataFrame) – DataFrame of movies
min_votes (int) – Minimum number of total votes per user
- Returns
- user_movie_df : pd.DataFrame
User-movie matrix
- movie_ratings_df : pd.DataFrame
User-movie matrix with movie details
- movies_dict : dict
Dictionary that maps movie ids to movie attributes
- id_titles_dict : dict
Dictionary that maps movie ids to titles
- Return type
Tuple[pd.DataFrame, pd.DataFrame, dict, dict]
References
- [1]
F. Maxwell Harper and Joseph A. Konstan. 2015. The MovieLens Datasets: History and Context. ACM Transactions on Interactive Intelligent Systems (TiiS) 5, 4: 19:1–19:19. https://doi.org/10.1145/2827872
Examples
>>> import pandas as pd
>>> from ratingslib.app_movies.preparedata import create_movie_users
>>> from ratingslib.datasets.filenames import MOVIES_SMALL_PATH, datasets_paths
>>> MIN_VOTES = 200
>>> filename_ratings, filename_movies = datasets_paths(
...     MOVIES_SMALL_PATH+"ratings.csv", MOVIES_SMALL_PATH+"movies.csv")
>>> ratings_df = pd.read_csv(filename_ratings)
>>> movies_df = pd.read_csv(filename_movies)
>>> user_movie_df, mr_df, movies_dict, id_titles_dict = create_movie_users(
...     ratings_df, movies_df, min_votes=MIN_VOTES)
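To make the four return values more concrete, the sketch below shows one way such outputs could be built with plain pandas on a tiny hand-made dataset. It is not the implementation of create_movie_users, which is not shown on this page. The helper name sketch_user_movie_matrix is hypothetical, the column names (userId, movieId, rating, title, genres) follow the MovieLens CSV layout, and treating min_votes as an inclusive threshold is an assumption.

```python
import pandas as pd


def sketch_user_movie_matrix(ratings, movies, min_votes=1):
    """Hypothetical helper; NOT the ratingslib implementation."""
    # Number of ratings each user has given, aligned with the rating rows.
    votes_per_user = ratings.groupby("userId")["rating"].transform("count")
    # Assumption: users with at least min_votes ratings are kept (inclusive).
    kept = ratings[votes_per_user >= min_votes]
    # Attach movie details (title, genres) to every remaining rating row.
    movie_ratings = kept.merge(movies, on="movieId", how="left")
    # Users as rows, movies as columns; unrated pairs become NaN.
    user_movie = kept.pivot(index="userId", columns="movieId", values="rating")
    # movie id -> {attribute: value} and movie id -> title
    movies_dict = movies.set_index("movieId").to_dict(orient="index")
    id_titles = movies.set_index("movieId")["title"].to_dict()
    return user_movie, movie_ratings, movies_dict, id_titles


# Toy data in the MovieLens column layout (values are made up).
ratings = pd.DataFrame({
    "userId": [1, 1, 2, 3, 3],
    "movieId": [10, 20, 10, 10, 20],
    "rating": [4.0, 3.5, 5.0, 2.0, 4.5],
})
movies = pd.DataFrame({
    "movieId": [10, 20],
    "title": ["Movie A", "Movie B"],
    "genres": ["Drama", "Comedy"],
})

user_movie, movie_ratings, movies_dict, id_titles = sketch_user_movie_matrix(
    ratings, movies, min_votes=2)
print(user_movie)
```

With min_votes=2, user 2 (a single rating) is filtered out, so the matrix in this toy case has rows for users 1 and 3 only. The sketch uses pivot, which requires at most one rating per (user, movie) pair.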