thanay2007/Movie-recommender

Cinematic AI: Content-Based Movie Recommender

Python Streamlit TMDB API

Cinematic AI is a machine learning-powered web application that recommends movies based on content similarity. Unlike simple genre filters, it compares films by their overview, genres, keywords, cast, and crew, processed with Natural Language Processing (NLP) techniques.

🧠 How It Works

The recommendation engine uses a Content-Based Filtering approach:

  1. Data Preprocessing: Clean and combine movie features (overview, genres, keywords, top cast, director) into a single "tags" string per movie. Apply stemming for better text matching.
  2. Vectorization: Convert text tags into numerical vectors using CountVectorizer from scikit-learn (limited to top 5,000 features, ignoring stop words).
  3. Similarity Calculation: Compute Cosine Similarity between movie vectors. Cosine similarity is the cosine of the angle between two vectors; for non-negative count vectors it ranges from 0 to 1, and values closer to 1 mean more similar content.
  4. Recommendation: For a selected movie, find the top 5 most similar movies (excluding itself) and fetch their posters via the TMDB API.

Because this approach relies only on each movie's own metadata, it can recommend movies with similar content even when they have been watched by different audiences.
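Steps 2 to 4 can be sketched on a toy dataset as follows. This is a minimal illustration, not the notebook's actual code: the titles, tags, and the `recommend` helper are hypothetical, and only the `CountVectorizer` settings (5,000 features, English stop words) come from the description above.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for the per-movie "tags" strings (hypothetical data).
titles = ["Space Quest", "Galaxy Run", "Love in Paris", "Paris Nights"]
tags = [
    "space alien spaceship battle scifi",
    "space alien galaxy chase scifi",
    "romance paris love wedding",
    "romance paris love night",
]

# Bag-of-words counts, using the settings named in step 2.
vectorizer = CountVectorizer(max_features=5000, stop_words="english")
vectors = vectorizer.fit_transform(tags)

# Pairwise cosine similarity: an (n_movies, n_movies) matrix.
similarity = cosine_similarity(vectors)


def recommend(title, top_n=5):
    """Return up to top_n titles most similar to `title`, excluding itself."""
    idx = titles.index(title)
    # Indices sorted by descending similarity to the selected movie.
    ranked = np.argsort(similarity[idx])[::-1]
    return [titles[i] for i in ranked if i != idx][:top_n]


print(recommend("Space Quest", top_n=1))
```

Since the matrix is computed once up front, each recommendation is only a row lookup and a sort.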

✨ Key Features

  • Netflix-Style UI: Dark mode theme with glassmorphism effects, hover animations, and custom CSS for a modern look.
  • Live Poster Fetching: Real-time movie posters and details pulled from the TMDB API.
  • Smart Search: Autocomplete-enabled search/select for quick movie lookup (~10,000 movies).
  • Responsive Design: Grid layout adapts to desktop and mobile screens.
  • Fast Recommendations: Pre-computed similarity matrix for instant results.

🛠️ Tech Stack

  • Frontend: Streamlit with custom CSS injection
  • Data Processing: Pandas, NumPy, NLTK (for stemming)
  • Machine Learning: Scikit-learn (CountVectorizer, Cosine Similarity)
  • API Integration: Requests library for TMDB API
  • Environment Management: python-dotenv for secure API key handling

📊 Dataset

This project uses the TMDB Top 10,000 Movies Dataset (updated through 2025) from Kaggle.

The dataset is a single CSV file containing metadata for ~10,000 popular movies, including overviews, genres, cast, crew, keywords, and more.

Note: The raw dataset CSV and the pre-computed movies_dict.pkl and similarity.pkl files are not included in this repository due to size limits. You will need to download the dataset and generate the pickle files yourself (see setup steps below).

🚀 Installation & Setup

Prerequisites

  • Python 3.8 or higher
  • Git

1. Clone the Repository

git clone https://github.com/YOUR_USERNAME/movie-recommender.git
cd movie-recommender

2. Create a Virtual Environment (Recommended)

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

3. Install Dependencies

pip install -r requirements.txt

If requirements.txt is missing, install core packages:

pip install streamlit pandas numpy scikit-learn nltk requests python-dotenv

Download NLTK data (run in Python):

import nltk
nltk.download('punkt')
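As a quick sanity check after installing NLTK, the stemming mentioned in the preprocessing step can be tried directly. This is a sketch that assumes the Porter stemmer and a plain whitespace split; the notebook may tokenize differently. The `stem_text` helper is a hypothetical name.

```python
from nltk.stem.porter import PorterStemmer

stemmer = PorterStemmer()


def stem_text(text):
    """Lowercase, split on whitespace, and stem each token."""
    return " ".join(stemmer.stem(word) for word in text.lower().split())


# Different inflections collapse to one stem, so they match as the same token.
print(stem_text("loved loving love"))
```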

4. Download the Dataset

Download the CSV file from Kaggle (see the Dataset section above) and place it in the project root directory (or adjust the path in analysis.ipynb).

5. Generate Processed Data and Models

The pre-computed files (movies_dict.pkl and similarity.pkl) are not included due to size limits.

  1. Open analysis.ipynb in Jupyter Notebook or VS Code.
  2. Update the code to load your downloaded CSV file.
  3. Run all cells to:
    • Load and process the dataset
    • Preprocess tags
    • Vectorize and compute similarity matrix
    • Save pickle files
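The notebook steps above might look roughly like the sketch below. The column names (`id`, `title`, `overview`, `genres`, `keywords`, `cast`, `director`) and the `build_artifacts` helper are assumptions for illustration; check them against the actual CSV and analysis.ipynb. Stemming and any parsing of list-like columns are omitted for brevity.

```python
import pickle
from pathlib import Path

import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_artifacts(movies, out_dir="."):
    """Build tags, compute similarity, and write the two pickle files."""
    movies = movies.copy()
    # Assumed column names; adjust to match the downloaded dataset.
    text_columns = ["overview", "genres", "keywords", "cast", "director"]
    movies["tags"] = (
        movies[text_columns]
        .fillna("")
        .apply(lambda row: " ".join(row), axis=1)
        .str.lower()
    )

    vectorizer = CountVectorizer(max_features=5000, stop_words="english")
    vectors = vectorizer.fit_transform(movies["tags"])
    similarity = cosine_similarity(vectors)

    out_dir = Path(out_dir)
    with open(out_dir / "movies_dict.pkl", "wb") as f:
        pickle.dump(movies[["id", "title", "tags"]].to_dict(), f)
    with open(out_dir / "similarity.pkl", "wb") as f:
        pickle.dump(similarity, f)
    return similarity


# Example usage (assumes the CSV sits in the project root):
# build_artifacts(pd.read_csv("tmdb_top_10000_movies.csv"))
```

With roughly 10,000 movies the dense similarity matrix holds about 100 million floats, which would explain why the pickle is too large to commit.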

6. Set Up TMDB API Key

  1. Sign up/log in at https://www.themoviedb.org/
  2. Go to Settings > API > Request an API Key (free for non-commercial use).
  3. Create a .env file in the root directory:
TMDB_API_KEY=your_api_key_here

Never commit your API key to version control!
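For reference, reading the key and fetching a poster could look like the sketch below. It assumes the TMDB v3 movie-details endpoint and the `w500` image size; `fetch_poster` and `poster_url_from_payload` are hypothetical helper names, not necessarily what app.py uses. In the app, python-dotenv's `load_dotenv()` would typically be called first so that the value in .env is visible to `os.getenv`.

```python
import os

import requests

TMDB_MOVIE_URL = "https://api.themoviedb.org/3/movie/{movie_id}"
TMDB_IMAGE_BASE = "https://image.tmdb.org/t/p/w500"


def poster_url_from_payload(payload):
    """Build a full poster URL from a movie-details payload, or return None."""
    poster_path = payload.get("poster_path")  # e.g. "/abc.jpg", or None
    return f"{TMDB_IMAGE_BASE}{poster_path}" if poster_path else None


def fetch_poster(movie_id, api_key=None, timeout=10):
    """Fetch movie details from TMDB and return the poster URL (or None)."""
    api_key = api_key or os.getenv("TMDB_API_KEY")
    if not api_key:
        raise RuntimeError("TMDB_API_KEY is not set")
    response = requests.get(
        TMDB_MOVIE_URL.format(movie_id=movie_id),
        params={"api_key": api_key},
        timeout=timeout,
    )
    response.raise_for_status()
    return poster_url_from_payload(response.json())
```

Keeping the URL construction separate from the network call makes the missing-poster case easy to handle without hitting the API.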

7. Run the App Locally

streamlit run app.py

The app will open in your browser at http://localhost:8501.

📂 Project Structure

movie-recommender/
├── app.py                          # Main Streamlit application
├── analysis.ipynb                  # Data preprocessing and model generation notebook
├── style.css                       # Custom CSS for dark theme and UI enhancements
├── .env                            # API key (gitignored)
├── requirements.txt                # Project dependencies
├── movies_dict.pkl                 # Processed movie data (generated, not in repo)
├── similarity.pkl                  # Pre-computed similarity matrix (generated, not in repo)
├── tmdb_top_10000_movies.csv       # Dataset CSV (download separately, not in repo)
└── README.md                       # This file

🙏 Acknowledgments

  • Inspired by tutorials like CampusX's Movie Recommender System.
  • Dataset and posters courtesy of The Movie Database (TMDB).
  • Built with ❤️ using open-source tools.

Enjoy discovering your next favorite movie! 🍿
