Cinematic AI: Content-Based Movie Recommender

Cinematic AI is a machine learning-powered web application that recommends movies based on content similarity. Rather than filtering by genre alone, it compares films using their overview, genres, keywords, cast, and crew, processed with Natural Language Processing (NLP) techniques.

🧠 How It Works

The recommendation engine uses a Content-Based Filtering approach:

Data Preprocessing: Clean and combine movie features (overview, genres, keywords, top cast, director) into a single "tags" string per movie. Apply stemming for better text matching.
Vectorization: Convert text tags into numerical vectors using CountVectorizer from scikit-learn (limited to top 5,000 features, ignoring stop words).
Similarity Calculation: Compute Cosine Similarity between movie vectors. Cosine similarity is the cosine of the angle between two vectors; a value closer to 1 means the movies are more similar.
Recommendation: For a selected movie, find the top 5 most similar movies (excluding itself) and fetch their posters via the TMDB API.
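
The first three steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy movie records, the `build_tags` helper, and the choice of `PorterStemmer` and `stop_words="english"` are assumptions made for the example.

```python
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

stemmer = PorterStemmer()


def build_tags(movie):
    """Combine a movie's features into one lowercase, stemmed "tags" string."""
    parts = [
        movie["overview"],
        *movie["genres"],
        *movie["keywords"],
        *movie["cast"],
        movie["director"],
    ]
    text = " ".join(parts).lower()
    return " ".join(stemmer.stem(word) for word in text.split())


# Hypothetical toy records; the real data comes from the Kaggle CSV.
movies = [
    {"overview": "A space crew explores a distant planet",
     "genres": ["Science Fiction", "Adventure"], "keywords": ["space", "alien"],
     "cast": ["Ana Ruiz"], "director": "Dana Park"},
    {"overview": "Astronauts travel through space to save a planet",
     "genres": ["Science Fiction"], "keywords": ["space", "astronaut"],
     "cast": ["Ben Cole"], "director": "Eli Stone"},
    {"overview": "Two friends open a bakery in Paris",
     "genres": ["Comedy", "Romance"], "keywords": ["bakery", "friendship"],
     "cast": ["Cleo Hart"], "director": "Faye Moss"},
]

tags = [build_tags(movie) for movie in movies]

# Keep at most the 5,000 most frequent terms and drop English stop words.
vectorizer = CountVectorizer(max_features=5000, stop_words="english")
vectors = vectorizer.fit_transform(tags)

# similarity[i][j] is the cosine similarity between movie i and movie j.
similarity = cosine_similarity(vectors)
```

In this toy data the two space films share several stemmed terms, so they score higher against each other than either does against the bakery film.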

Because this approach relies only on movie metadata, it can recommend films with similar content without needing any user ratings or viewing history.
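
As a rough sketch of the lookup step, the function below ranks one row of a similarity matrix and returns the closest titles, skipping the selected movie itself. The `recommend` name, its parameters, and the toy matrix are hypothetical; the app's own function may differ.

```python
def recommend(title, titles, similarity, top_n=5):
    """Return the top_n titles most similar to `title`, excluding itself."""
    idx = titles.index(title)
    scores = similarity[idx]
    # Sort movie indices by similarity score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [titles[i] for i in ranked if i != idx][:top_n]


# Hypothetical 4-movie similarity matrix for illustration.
titles = ["A", "B", "C", "D"]
similarity = [
    [1.0, 0.8, 0.1, 0.5],
    [0.8, 1.0, 0.2, 0.3],
    [0.1, 0.2, 1.0, 0.4],
    [0.5, 0.3, 0.4, 1.0],
]

top_two = recommend("A", titles, similarity, top_n=2)
```

Because the function only indexes rows and compares scores, it should work the same way with a NumPy array loaded from similarity.pkl.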

✨ Key Features

Netflix-Style UI: Dark mode theme with glassmorphism effects, hover animations, and custom CSS for a modern look.
Live Poster Fetching: Real-time movie posters and details pulled from the TMDB API.
Smart Search: Autocomplete-enabled search/select for quick lookup across ~10,000 movies.
Responsive Design: Grid layout adapts to desktop and mobile screens.
Fast Recommendations: Pre-computed similarity matrix for instant results.
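
For illustration, poster fetching could look something like the sketch below. It assumes the TMDB v3 movie details endpoint and the `w500` image size; the helper names `poster_url` and `fetch_poster` are hypothetical and not taken from app.py.

```python
import requests

TMDB_MOVIE_URL = "https://api.themoviedb.org/3/movie/{movie_id}"
TMDB_IMAGE_BASE = "https://image.tmdb.org/t/p/w500"


def poster_url(details):
    """Build a full poster URL from a TMDB movie details response."""
    path = details.get("poster_path")
    return f"{TMDB_IMAGE_BASE}{path}" if path else None


def fetch_poster(movie_id, api_key, timeout=10):
    """Request movie details from TMDB and return the poster URL, if any."""
    response = requests.get(
        TMDB_MOVIE_URL.format(movie_id=movie_id),
        params={"api_key": api_key},
        timeout=timeout,
    )
    response.raise_for_status()
    return poster_url(response.json())
```

Keeping URL construction separate from the network call makes it easy to handle movies that have no poster, where `poster_path` is null.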

🛠️ Tech Stack

Frontend: Streamlit with custom CSS injection
Data Processing: Pandas, NumPy, NLTK (for stemming)
Machine Learning: Scikit-learn (CountVectorizer, Cosine Similarity)
API Integration: Requests library for TMDB API
Environment Management: python-dotenv for secure API key handling

📊 Dataset

This project uses the TMDB Top 10,000 Movies Dataset (updated till 2025) from Kaggle:

Download: https://www.kaggle.com/datasets/pankajmaulekhi/tmdb-top-10000-movies-updated-till-2025

The dataset is a single CSV file containing metadata for ~10,000 popular movies, including overviews, genres, cast, crew, keywords, and more.

Note: The raw dataset CSV and the pre-computed movies_dict.pkl and similarity.pkl files are not included in this repository due to size limits. You will need to download the dataset and generate the pickle files yourself (see setup steps below).

🚀 Installation & Setup

Prerequisites

Python 3.8 or higher
Git

1. Clone the Repository

git clone https://github.com/YOUR_USERNAME/movie-recommender.git
cd movie-recommender

2. Create a Virtual Environment (Recommended)

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

If requirements.txt is missing, install core packages:

pip install streamlit pandas numpy scikit-learn nltk requests python-dotenv

Download NLTK data (run in Python):

import nltk
nltk.download('punkt')

4. Download the Dataset

Download the CSV file from the Kaggle link above and place it in the project root directory (or adjust the path in analysis.ipynb).

5. Generate Processed Data and Models

Because movies_dict.pkl and similarity.pkl are not included in the repository, generate them with the notebook:

Open analysis.ipynb in Jupyter Notebook or VS Code.
Update the code to load your downloaded CSV file.
Run all cells to:
- Load and process the dataset
- Preprocess tags
- Vectorize and compute similarity matrix
- Save pickle files
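
The final "save pickle files" step might look roughly like this. It is a sketch under assumptions: the `save_artifacts` and `load_artifacts` helpers are hypothetical, and the exact structure stored in movies_dict.pkl depends on the notebook.

```python
import pickle
from pathlib import Path


def save_artifacts(movies_dict, similarity, out_dir="."):
    """Write the processed movie data and similarity matrix to disk."""
    out = Path(out_dir)
    with open(out / "movies_dict.pkl", "wb") as f:
        pickle.dump(movies_dict, f)
    with open(out / "similarity.pkl", "wb") as f:
        pickle.dump(similarity, f)


def load_artifacts(out_dir="."):
    """Read both pickle files back, as the app would at startup."""
    out = Path(out_dir)
    with open(out / "movies_dict.pkl", "rb") as f:
        movies_dict = pickle.load(f)
    with open(out / "similarity.pkl", "rb") as f:
        similarity = pickle.load(f)
    return movies_dict, similarity
```

Only load pickle files you generated yourself, since unpickling untrusted data can execute arbitrary code.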

6. Set Up TMDB API Key

Sign up/log in at https://www.themoviedb.org/
Go to Settings > API > Request an API Key (free for non-commercial use).
Create a .env file in the root directory:

TMDB_API_KEY=your_api_key_here

Never commit your API key to version control!

7. Run the App Locally

streamlit run app.py

The app will open in your browser at http://localhost:8501.

📂 Project Structure

movie-recommender/
├── app.py                          # Main Streamlit application
├── analysis.ipynb                  # Data preprocessing and model generation notebook
├── style.css                       # Custom CSS for dark theme and UI enhancements
├── .env                            # API key (gitignored)
├── requirements.txt                # Project dependencies
├── movies_dict.pkl                 # Processed movie data (generated, not in repo)
├── similarity.pkl                  # Pre-computed similarity matrix (generated, not in repo)
├── tmdb_top_10000_movies.csv       # Dataset CSV (download separately, not in repo)
└── README.md                       # This file

🙏 Acknowledgments

Inspired by tutorials like CampusX's Movie Recommender System.
Dataset and posters courtesy of The Movie Database (TMDB).
Built with ❤️ using open-source tools.

Enjoy discovering your next favorite movie! 🍿
