This project is an end-to-end Machine Learning web application that recommends the top 10 movies similar to a user's selection. The primary focus of this project was to implement a complete Data Science Lifecycle—from data collection and preprocessing to model building and web deployment.
Using Content-Based Filtering, the recommendation engine analyzes movie metadata to find contextual similarities. The final model is integrated into a clean, interactive user interface built with Streamlit.
- Language: Python
- Data Manipulation & Analysis: Pandas, NumPy
- Machine Learning & NLP: Scikit-Learn (`TfidfVectorizer`, `sigmoid_kernel`)
- Model Serialization: Joblib
- Web Framework: Streamlit
- Data Collection & Cleaning: Merged and cleaned the `movies.csv` and `credits.csv` datasets, handling missing values and extracting relevant features (genres, keywords, cast, crew, and overviews).
- Text Vectorization (NLP): Used `TfidfVectorizer` (Term Frequency-Inverse Document Frequency) to convert raw movie overviews and metadata into a matrix of TF-IDF features.
- Similarity Computation: Applied a sigmoid kernel to compute pairwise similarity scores between movies based on their feature vectors.
- Model Deployment: Exported the vectorized data and similarity models to `.pkl` files and built a Streamlit application (`model_deployment.py`) to serve real-time predictions.
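The vectorization, similarity, and serialization steps above can be sketched roughly as follows. This is a minimal, self-contained example: the toy dataframe, the `recommend` helper, and the output file names are illustrative, not the notebook's exact code.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import sigmoid_kernel

# Toy stand-in for the merged movies/credits dataframe.
movies = pd.DataFrame({
    "title": ["Spectre", "Skyfall", "Toy Story"],
    "overview": [
        "A cryptic message sends James Bond on a rogue mission.",
        "James Bond's loyalty to M is tested as her past haunts her.",
        "A cowboy doll is threatened by a new spaceman action figure.",
    ],
})

# Handle missing overviews, then build the TF-IDF feature matrix.
movies["overview"] = movies["overview"].fillna("")
tfv = TfidfVectorizer(stop_words="english")
tfidf_matrix = tfv.fit_transform(movies["overview"])

# Pairwise similarity between all movies via the sigmoid kernel.
sim = sigmoid_kernel(tfidf_matrix, tfidf_matrix)

# Map titles to row indices for lookup.
indices = pd.Series(movies.index, index=movies["title"])

def recommend(title, n=10):
    """Return up to n titles most similar to the given movie."""
    idx = indices[title]
    scores = sorted(enumerate(sim[idx]), key=lambda x: x[1], reverse=True)
    scores = [s for s in scores if s[0] != idx][:n]
    return movies["title"].iloc[[i for i, _ in scores]].tolist()

# Persist the objects the Streamlit app loads later (the real project
# saves them into the dumped_obj directory).
outdir = tempfile.mkdtemp()
joblib.dump(movies, os.path.join(outdir, "movies.pkl"))
joblib.dump(sim, os.path.join(outdir, "similarity.pkl"))
```

Here `recommend("Spectre")` ranks "Skyfall" above "Toy Story", since the two Bond overviews share terms and therefore score higher under the sigmoid kernel.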
- Clone this GitHub repository.
- Install the required packages using pip:

  ```
  pip install -r requirements.txt
  ```

- The required dataframes and pre-trained models are already saved in the `dumped_obj` directory. The full exploratory and training code can be found in the Jupyter notebook `Movie_Recommendation_System.ipynb`.
- Option 1 (Train from scratch): Re-execute the Jupyter notebook. It will process the datasets and save the dataframes and models again in the `dumped_obj` directory.
- Option 2 (Run the web app directly): Use the saved models and run the script `model_deployment.py` from your terminal:

  ```
  streamlit run model_deployment.py
  ```
This command starts the Streamlit server on localhost and opens the simple UI in your default browser.
- Note: Before executing the Jupyter notebook or the `streamlit` command, make sure your terminal is pointing to the project's working directory.
The project uses two primary datasets (from the TMDB 5000 Movies Dataset) containing comprehensive movie metadata.
The first dataset, `credits.csv`, contains information about the cast and crew of the movies.
| Feature | Description |
|---|---|
| movie_id | A unique identifier for each movie. |
| cast | The names of lead and supporting actors. |
| crew | The names of the Director, Editor, Composer, Writer, etc. |
The second dataset, `movies.csv`, contains metadata and performance metrics for the movies.
| Feature | Description |
|---|---|
| budget | The budget with which the movie was made. |
| genre | The genre of the movie (Action, Comedy, Thriller, etc.). |
| homepage | A link to the homepage of the movie. |
| id | The unique identifier (matches movie_id in the credits dataset). |
| keywords | Keywords or tags related to the movie. |
| original_language | The language in which the movie was made. |
| original_title | The title of the movie before translation or adaptation. |
| overview | A brief description of the movie. |
| popularity | A numeric quantity specifying the movie's popularity. |
| production_companies | The production house of the movie. |
| production_countries | The country in which it was produced. |
| release_date | The date on which it was released. |
| revenue | The worldwide revenue generated by the movie. |
| runtime | The running time of the movie in minutes. |
| status | "Released" or "Rumored". |
| tagline | The movie's tagline. |
| title | The title of the movie. |
| vote_average | The average rating the movie received. |
| vote_count | The count of votes received. |
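For reference, columns like genres, keywords, cast, and crew in the TMDB files are stored as JSON-like strings, so the feature-extraction step described earlier has to parse them. A hedged sketch of how that might look (the toy rows and the `extract_names` helper are illustrative, not the notebook's actual code):

```python
import ast

import pandas as pd

# Toy rows shaped like the TMDB 5000 data: list-of-dict columns stored as strings.
movies = pd.DataFrame({
    "id": [19995],
    "title": ["Avatar"],
    "genres": ['[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}]'],
    "keywords": ['[{"id": 1463, "name": "culture clash"}]'],
})
credits = pd.DataFrame({
    "movie_id": [19995],
    "cast": ['[{"name": "Sam Worthington"}, {"name": "Zoe Saldana"}]'],
})

# The two files join on id (movies.csv) == movie_id (credits.csv).
df = movies.merge(credits, left_on="id", right_on="movie_id")

def extract_names(cell):
    """Parse a stringified list of dicts and keep only the 'name' values."""
    return [d["name"] for d in ast.literal_eval(cell)]

for col in ["genres", "keywords", "cast"]:
    df[col] = df[col].apply(extract_names)

print(df.loc[0, "genres"])  # ['Action', 'Adventure']
print(df.loc[0, "cast"])    # ['Sam Worthington', 'Zoe Saldana']
```

`ast.literal_eval` is the safe standard-library way to turn these stringified lists back into Python objects before flattening them into text features.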
Thank you and happy learning! 😄
