Cinematic AI is a machine learning-powered web application that recommends movies based on content similarity. Unlike simple genre filters, this system analyzes the nuance of a film by processing its overview, genres, keywords, cast, and crew using Natural Language Processing (NLP) techniques.
The recommendation engine uses a Content-Based Filtering approach:
- Data Preprocessing: Clean and combine movie features (overview, genres, keywords, top cast, director) into a single "tags" string per movie. Apply stemming for better text matching.
- Vectorization: Convert text tags into numerical vectors using CountVectorizer from scikit-learn (limited to the top 5,000 features, ignoring stop words).
- Similarity Calculation: Compute cosine similarity between movie vectors. Cosine similarity is the cosine of the angle between two vectors; a value closer to 1 means the movies are more similar.
- Recommendation: For a selected movie, find the top 5 most similar movies (excluding itself) and fetch their posters via the TMDB API.
This approach recommends movies with similar content, even if they've been watched by different audiences.
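The vectorization, similarity, and recommendation steps can be sketched on toy data as follows. This is a minimal illustration, not the project's actual code: the titles, the tag strings, and the `recommend` helper are made up for the example, stemming is skipped, and only the `CountVectorizer` settings (5,000 features, English stop words) come from the description above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for the real "tags" column (hypothetical data).
titles = ["Avatar", "Aliens", "Titanic", "Notting Hill"]
tags = [
    "alien planet marine science fiction action james cameron",
    "alien marine space science fiction action james cameron",
    "ship romance iceberg drama james cameron",
    "bookshop romance london comedy",
]

# Bag-of-words counts, capped at 5,000 features, English stop words removed.
vectorizer = CountVectorizer(max_features=5000, stop_words="english")
vectors = vectorizer.fit_transform(tags)

# Square matrix: similarity[i][j] is the cosine similarity of movies i and j.
similarity = cosine_similarity(vectors)


def recommend(title, top_n=5):
    """Return up to top_n titles most similar to `title`, excluding itself."""
    index = titles.index(title)
    ranked = sorted(
        enumerate(similarity[index]), key=lambda pair: pair[1], reverse=True
    )
    return [titles[i] for i, _ in ranked if i != index][:top_n]


print(recommend("Avatar"))
```

Filtering by index, rather than dropping the first ranked entry, keeps the selected movie out of the results even when another movie has an identical tag string.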
- Netflix-Style UI: Dark mode theme with glassmorphism effects, hover animations, and custom CSS for a modern look.
- Live Poster Fetching: Real-time movie posters and details pulled from the TMDB API.
- Smart Search: Autocomplete-enabled search/select for quick movie lookup (~10,000 movies).
- Responsive Design: Grid layout adapts to desktop and mobile screens.
- Fast Recommendations: Pre-computed similarity matrix for instant results.
- Frontend: Streamlit with custom CSS injection
- Data Processing: Pandas, NumPy, NLTK (for stemming)
- Machine Learning: Scikit-learn (CountVectorizer, Cosine Similarity)
- API Integration: Requests library for TMDB API
- Environment Management: python-dotenv for secure API key handling
This project uses the TMDB Top 10,000 Movies Dataset (updated till 2025) from Kaggle:
The dataset is a single CSV file containing metadata for ~10,000 popular movies, including overviews, genres, cast, crew, keywords, and more.
Note: The raw dataset CSV and the pre-computed movies_dict.pkl and similarity.pkl files are not included in this repository due to size limits. You will need to download the dataset and generate the pickle files yourself (see setup steps below).
- Python 3.8 or higher
- Git
git clone https://github.com/YOUR_USERNAME/movie-recommender.git
cd movie-recommender
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
pip install -r requirements.txt
If requirements.txt is missing, install core packages:
pip install streamlit pandas numpy scikit-learn nltk requests python-dotenv
Download NLTK data (run in Python):
import nltk
nltk.download('punkt')
Download the CSV file from the Kaggle link above and place it in the project root directory (or adjust the path in analysis.ipynb).
The pre-computed files (movies_dict.pkl and similarity.pkl) are not included due to size limits.
- Open analysis.ipynb in Jupyter Notebook or VS Code.
- Update the code to load your downloaded CSV file.
- Run all cells to:
  - Load and process the dataset
  - Preprocess tags
  - Vectorize and compute similarity matrix
  - Save pickle files
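As a rough idea of what the tag preprocessing and pickle-saving cells might look like, here is a minimal sketch. It is not a copy of analysis.ipynb: the field names in the movie dict, the cut-off of three cast members, the shape of `movies_dict`, and the `build_tags` and `save_artifacts` helpers are all assumptions made for illustration. Only the Porter stemmer from NLTK and the two output filenames follow the README.

```python
import pickle
from pathlib import Path

from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()


def build_tags(movie):
    """Join one movie's text fields into a single lowercase, stemmed string."""
    parts = [
        movie["overview"],
        " ".join(movie["genres"]),
        " ".join(movie["keywords"]),
        " ".join(movie["cast"][:3]),  # "top cast": a cut-off of 3 is an assumption
        movie["director"],
    ]
    words = " ".join(parts).lower().split()
    return " ".join(stemmer.stem(word) for word in words)


def save_artifacts(movies_dict, similarity, directory="."):
    """Write the two pickle files the app expects into `directory`."""
    targets = {"movies_dict.pkl": movies_dict, "similarity.pkl": similarity}
    for filename, obj in targets.items():
        with open(Path(directory) / filename, "wb") as handle:
            pickle.dump(obj, handle)


# Hypothetical record, shaped the way this sketch expects it.
sample = {
    "overview": "Two strangers loved dancing",
    "genres": ["Romance", "Drama"],
    "keywords": ["dance", "friendship"],
    "cast": ["Actor One", "Actor Two", "Actor Three", "Actor Four"],
    "director": "Some Director",
}
print(build_tags(sample))
```

Stemming maps inflected forms such as "loved" and "loving" to the same token, so two overviews that use different forms of a word still share a feature after vectorization.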
- Sign up/log in at https://www.themoviedb.org/
- Go to Settings > API > Request an API Key (free for non-commercial use).
- Create a .env file in the root directory:
TMDB_API_KEY=your_api_key_here
Never commit your API key to version control!
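For orientation, the sketch below shows one way the key could be read from the environment and used to look up a poster. It is an illustration under assumptions, not the code in app.py: the `poster_url` and `fetch_poster` helpers are hypothetical, and the standard library's `urllib` is used so the snippet has no dependencies, whereas the app itself lists Requests. In the app, python-dotenv's `load_dotenv()` would normally be called first so that the value in .env reaches `os.getenv`.

```python
import json
import os
import urllib.parse
import urllib.request
from typing import Optional

# TMDB v3 movie-details endpoint and image CDN base (w500 is one of several sizes).
TMDB_MOVIE_URL = "https://api.themoviedb.org/3/movie/{movie_id}"
TMDB_IMAGE_BASE = "https://image.tmdb.org/t/p/w500"


def poster_url(payload: dict) -> Optional[str]:
    """Build a full poster URL from a movie-details response, if it has one."""
    poster_path = payload.get("poster_path")
    return TMDB_IMAGE_BASE + poster_path if poster_path else None


def fetch_poster(movie_id: int) -> Optional[str]:
    """Look up a movie by its TMDB id and return its poster URL (or None)."""
    api_key = os.getenv("TMDB_API_KEY")
    if not api_key:
        raise RuntimeError("TMDB_API_KEY is not set; add it to .env")
    query = urllib.parse.urlencode({"api_key": api_key})
    url = TMDB_MOVIE_URL.format(movie_id=movie_id) + "?" + query
    with urllib.request.urlopen(url, timeout=10) as response:
        return poster_url(json.load(response))


# Offline check of the URL-building step with a hand-written payload.
print(poster_url({"poster_path": "/example.jpg"}))
```

Keeping the URL construction separate from the network call makes it possible to handle movies without a poster (a missing or null `poster_path`) in one place.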
streamlit run app.py
The app will open in your browser at http://localhost:8501.
movie-recommender/
├── app.py # Main Streamlit application
├── analysis.ipynb # Data preprocessing and model generation notebook
├── style.css # Custom CSS for dark theme and UI enhancements
├── .env # API key (gitignored)
├── requirements.txt # Project dependencies
├── movies_dict.pkl # Processed movie data (generated, not in repo)
├── similarity.pkl # Pre-computed similarity matrix (generated, not in repo)
├── tmdb_top_10000_movies.csv # Dataset CSV (download separately, not in repo)
└── README.md # This file
- Inspired by tutorials like CampusX's Movie Recommender System.
- Dataset and posters courtesy of The Movie Database (TMDB).
- Built with ❤️ using open-source tools.
Enjoy discovering your next favorite movie! 🍿