monil37/Movie-Recommender-System

🎬 Content-Based Movie Recommendation System

Project Overview

This project is an end-to-end machine learning web application that recommends the 10 movies most similar to a user's selection. Its primary focus is implementing the complete data science lifecycle, from data collection and preprocessing to model building and web deployment.

Using Content-Based Filtering, the recommendation engine analyzes movie metadata to find contextual similarities. The final model is integrated into a clean, interactive user interface built with Streamlit.

🛠️ Tech Stack & Tools

  • Language: Python
  • Data Manipulation & Analysis: Pandas, NumPy
  • Machine Learning & NLP: Scikit-Learn (TfidfVectorizer, sigmoid_kernel)
  • Model Serialization: Joblib
  • Web Framework: Streamlit

⚙️ How It Works (The ML Process)

  1. Data Collection & Cleaning: Merged and cleaned the movies.csv and credits.csv datasets, handling missing values and extracting relevant features (genres, keywords, cast, crew, and overviews).
  2. Text Vectorization (NLP): Utilized TfidfVectorizer (Term Frequency-Inverse Document Frequency) to convert raw movie overviews and metadata into a matrix of TF-IDF features.
  3. Similarity Computation: Applied a Sigmoid Kernel to compute pairwise similarity scores between movies based on their feature vectors.
  4. Model Deployment: Exported the vectorized data and similarity models into .pkl files and built a Streamlit application (model_deployment.py) to serve real-time predictions.
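
Steps 2 and 3 can be illustrated with a minimal sketch. It is not the notebook's actual code: the toy `movies` table, the `recommend` helper, and the choice of `stop_words="english"` are assumptions made for illustration, and only `TfidfVectorizer` and `sigmoid_kernel` come from the project description.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import sigmoid_kernel

# Toy stand-in for the cleaned movie table (hypothetical rows, not the real dataset).
movies = pd.DataFrame({
    "title": ["Space Quest", "Star Voyage", "Love in Paris", "Paris Romance"],
    "overview": [
        "astronauts travel through space to explore a distant planet",
        "a crew of astronauts explore space and a strange planet",
        "two strangers fall in love during a summer in paris",
        "a summer romance in paris between two strangers in love",
    ],
})

# Step 2: convert the text into a TF-IDF matrix (one row per movie).
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_matrix = vectorizer.fit_transform(movies["overview"])

# Step 3: pairwise sigmoid-kernel similarity, tanh(gamma * <x, y> + coef0).
similarity = sigmoid_kernel(tfidf_matrix, tfidf_matrix)

# Map each title to its row position so a selection can be looked up quickly.
title_to_index = pd.Series(movies.index, index=movies["title"])


def recommend(title, top_n=10):
    """Return up to top_n titles most similar to `title`, best match first."""
    idx = title_to_index[title]
    scores = sorted(enumerate(similarity[idx]), key=lambda pair: pair[1], reverse=True)
    # Skip the selected movie itself, then keep the top_n best matches.
    neighbours = [i for i, _ in scores if i != idx][:top_n]
    return movies["title"].iloc[neighbours].tolist()


print(recommend("Space Quest", top_n=2))
```

Because the kernel is a monotonic function of the dot product between TF-IDF rows, movies that share more distinctive terms with the selection rank higher.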

🖥️ Movie Recommender UI

[Screenshot: Movie Recommender System]

🚀 Quick Start / Installation

  1. Clone this GitHub repository.

  2. Install the required packages using pip
    pip install -r requirements.txt

  3. The required dataframes and pre-trained models are already saved in the dumped_obj directory. The full exploratory and training code can be found in the Jupyter notebook Movie_Recommendation_System.ipynb.

    • Option 1 (Train from scratch): Re-execute the Jupyter notebook file. It will process the datasets and save the dataframes and models again in the dumped_obj directory.

    • Option 2 (Run Web App directly): Use the saved models and run model_deployment.py with Streamlit from your terminal:
      streamlit run model_deployment.py
      This command starts a local Streamlit server and opens the app's UI in your default browser.
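
Both options depend on objects being written to and read back from .pkl files. The round trip might look roughly like the sketch below, which uses Joblib as listed in the tech stack. The file names, the toy objects, and the temporary directory standing in for dumped_obj are all assumptions, not the repository's actual layout.

```python
import os
import tempfile

import joblib
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the dataframe and similarity matrix built in the notebook.
movies = pd.DataFrame({"title": ["Movie A", "Movie B"]})
similarity = np.array([[1.0, 0.2], [0.2, 1.0]])

# A temporary directory stands in for dumped_obj; the file names are assumed.
with tempfile.TemporaryDirectory() as dump_dir:
    movies_path = os.path.join(dump_dir, "movies.pkl")
    similarity_path = os.path.join(dump_dir, "similarity.pkl")

    # Training side (notebook): persist the objects once.
    joblib.dump(movies, movies_path)
    joblib.dump(similarity, similarity_path)

    # Serving side (web app): load the objects instead of retraining.
    loaded_movies = joblib.load(movies_path)
    loaded_similarity = joblib.load(similarity_path)

print(loaded_movies["title"].tolist())
```

Loading precomputed objects keeps app startup fast, since the similarity matrix does not have to be rebuilt on every launch.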

Note

Before running the Jupyter notebook or the streamlit command, make sure your terminal's working directory is the root of the cloned repository.


Dataset Information

The project utilizes two primary datasets (from TMDB 5000 Movies Dataset) containing comprehensive movie metadata.


1. credits.csv

This dataset contains information regarding the cast and crew of the movies.

| Feature | Description |
| --- | --- |
| movie_id | A unique identifier for each movie. |
| cast | The names of lead and supporting actors. |
| crew | The names of the Director, Editor, Composer, Writer, etc. |

2. movies.csv

This dataset contains metadata and performance metrics for the movies.

| Feature | Description |
| --- | --- |
| budget | The budget with which the movie was made. |
| genres | The genres of the movie (Action, Comedy, Thriller, etc.). |
| homepage | A link to the homepage of the movie. |
| id | The unique identifier (matches movie_id in the credits dataset). |
| keywords | Keywords or tags related to the movie. |
| original_language | The language in which the movie was made. |
| original_title | The title of the movie before translation or adaptation. |
| overview | A brief description of the movie. |
| popularity | A numeric score indicating the movie's popularity. |
| production_companies | The production companies behind the movie. |
| production_countries | The countries in which the movie was produced. |
| release_date | The date on which the movie was released. |
| revenue | The worldwide revenue generated by the movie. |
| runtime | The running time of the movie in minutes. |
| status | "Released" or "Rumored". |
| tagline | The movie's tagline. |
| title | The title of the movie. |
| vote_average | The average rating the movie received. |
| vote_count | The number of votes received. |
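
Since `id` in movies.csv matches `movie_id` in credits.csv, the two tables can be joined on those keys. The sketch below uses small hypothetical frames rather than the real files. Both the inner join and filling missing overviews with an empty string are assumptions about how the cleaning step could be done, not a description of the notebook.

```python
import pandas as pd

# Hypothetical miniature versions of the two datasets.
credits = pd.DataFrame({
    "movie_id": [1, 2],
    "cast": ["Actor A, Actor B", "Actor C"],
    "crew": ["Director X", "Director Y"],
})
movies = pd.DataFrame({
    "id": [1, 2, 3],
    "title": ["Movie One", "Movie Two", "Movie Three"],
    "overview": ["A heist goes wrong.", None, "A road trip."],
})

# Join on the shared identifier; an inner join keeps only movies present in both tables.
merged = movies.merge(credits, left_on="id", right_on="movie_id", how="inner")
merged = merged.drop(columns="movie_id")  # redundant after the join

# One simple way to handle a missing overview: treat it as empty text.
merged["overview"] = merged["overview"].fillna("")

print(merged[["id", "title", "cast"]])
```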

Thank you and happy learning! 😄
