
HyperCE/netflix-content-analysis


🎬 Netflix Content Analysis Using AI/ML


A data science project that applies machine learning to a catalog of 16,000 Netflix movies to identify regional genre preferences, model the factors behind movie success, and build an AI-based recommendation system.


🎯 Overview

This project performs end-to-end data analysis and machine learning on Netflix's movie catalog to extract actionable insights for content producers, platform managers, and viewers. The analysis covers:

  • Exploratory Data Analysis (EDA) - Genre distributions, regional preferences, temporal trends
  • Correlation Analysis - Relationships between budget, revenue, ratings, and popularity
  • Machine Learning Models - Revenue prediction and success classification
  • Recommendation System - Content-based filtering for personalized suggestions

📊 Dataset

Metric                 Value
Total movies           16,000
Time period            2010-2025
Languages              74
Countries              114
Features (columns)     18

Key columns:

  • title, director, cast - Movie information
  • country, language - Regional data
  • genres - Category classification
  • budget, revenue - Financial metrics
  • vote_average, vote_count, popularity - Audience metrics
  • release_year - Temporal information

✨ Features

1. Data Preprocessing

  • Missing value handling
  • Feature extraction (primary genre, primary country)
  • ROI calculation
  • Success labeling (rating ≥ 7.0)
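The preprocessing steps above can be sketched in pandas as follows. This is a minimal sketch, not the notebook's code: it assumes that `genres` and `country` hold comma-separated strings (e.g. "Drama, Romance"), that a budget or revenue of 0 means "unknown", and that ROI is (revenue - budget) / budget. The output column names `primary_genre`, `primary_country`, `roi` and `is_successful` are hypothetical.

```python
import numpy as np
import pandas as pd


def preprocess(df: pd.DataFrame, success_threshold: float = 7.0) -> pd.DataFrame:
    """Sketch of the preprocessing steps; column formats are assumptions."""
    out = df.copy()

    # Missing values: text fields become "Unknown", numeric fields become 0.
    for col in ["genres", "country", "language", "director", "cast"]:
        if col in out.columns:
            out[col] = out[col].fillna("Unknown")
    for col in ["budget", "revenue", "vote_average", "vote_count", "popularity"]:
        if col in out.columns:
            out[col] = out[col].fillna(0)

    # Feature extraction (assumed format): keep the first entry of the
    # comma-separated genre and country lists.
    out["primary_genre"] = out["genres"].str.split(",").str[0].str.strip()
    out["primary_country"] = out["country"].str.split(",").str[0].str.strip()

    # ROI = (revenue - budget) / budget, only where both values are known (> 0);
    # Series.where turns the unknown entries into NaN.
    budget = out["budget"].where(out["budget"] > 0)
    revenue = out["revenue"].where(out["revenue"] > 0)
    out["roi"] = (revenue - budget) / budget

    # Success label: 1 if the average rating reaches the threshold (7.0 here).
    out["is_successful"] = (out["vote_average"] >= success_threshold).astype(int)
    return out
```

Leaving ROI as NaN for unknown financials (rather than filling it) keeps films with missing budgets from showing up as total losses in per-genre ROI comparisons.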

2. Exploratory Data Analysis

  • Genre distribution visualization
  • Regional preference analysis
  • Financial performance by genre
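A minimal sketch of the regional-preference and financial-by-genre analyses, assuming a DataFrame that already has the hypothetical `primary_genre`, `primary_country` and `roi` columns from preprocessing. The function names are illustrative, not taken from the notebook.

```python
import pandas as pd


def top_genre_share_by_country(df: pd.DataFrame, countries) -> pd.DataFrame:
    """For each country, return its most common primary genre and that genre's share (%)."""
    rows = []
    for country in countries:
        genres = df.loc[df["primary_country"] == country, "primary_genre"]
        if genres.empty:
            continue  # country not present in the data
        # value_counts(normalize=True) gives shares sorted from largest to smallest.
        shares = genres.value_counts(normalize=True)
        rows.append({
            "country": country,
            "top_genre": shares.index[0],
            "share_pct": round(float(100 * shares.iloc[0]), 1),
        })
    return pd.DataFrame(rows)


def financials_by_genre(df: pd.DataFrame) -> pd.DataFrame:
    """Median budget, revenue and ROI per primary genre, sorted by median ROI."""
    return (
        df.groupby("primary_genre")[["budget", "revenue", "roi"]]
        .median()
        .sort_values("roi", ascending=False)
    )
```

Medians are used here because budget and revenue distributions are typically heavily skewed by a few blockbusters; a mean-based version would only need `.mean()` in place of `.median()`.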

3. Machine Learning Models

Model 1: Revenue Prediction (Regression)

  • Algorithm: Random Forest Regressor
  • Features: budget, ratings, popularity, genre, country
  • Metric: R² Score
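The revenue model can be sketched with scikit-learn as below. This is an illustration under assumptions, not the notebook's exact setup: "ratings" is taken to mean `vote_average` and `vote_count`, genre and country are one-hot encoded with `pd.get_dummies`, and R² is measured on a random 80/20 split. `train_revenue_model` is a hypothetical name.

```python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

NUMERIC = ["budget", "vote_average", "vote_count", "popularity"]
CATEGORICAL = ["primary_genre", "primary_country"]


def train_revenue_model(df: pd.DataFrame, random_state: int = 42):
    """Fit a random forest on budget, ratings, popularity, genre and country.

    Returns the fitted model and its R² score on a held-out 20% split.
    """
    # One-hot encode genre and country so the trees can split on them.
    X = pd.get_dummies(df[NUMERIC + CATEGORICAL], columns=CATEGORICAL, dtype=float)
    y = df["revenue"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state
    )
    model = RandomForestRegressor(n_estimators=100, random_state=random_state)
    model.fit(X_train, y_train)
    r2 = r2_score(y_test, model.predict(X_test))
    return model, r2
```

Because `vote_count` and `popularity` are only observed after release, a model like this explains revenue rather than forecasting it before a film comes out.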

Model 2: Success Classification

  • Algorithm: Random Forest Classifier
  • Target: Binary (Successful/Not Successful)
  • Metric: Accuracy, Precision, Recall, F1-Score
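A comparable sketch for the success classifier, reporting the four metrics listed above. The feature set is an assumption: `vote_average` is deliberately left out because the success label is derived from it, and including it would leak the answer into the features. `train_success_classifier` and `is_successful` are hypothetical names.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Assumed features; vote_average is excluded to avoid label leakage.
FEATURES_NUM = ["budget", "vote_count", "popularity"]
FEATURES_CAT = ["primary_genre", "primary_country"]


def train_success_classifier(df: pd.DataFrame, random_state: int = 42):
    """Fit a random forest on the binary success label and return (model, metrics)."""
    X = pd.get_dummies(df[FEATURES_NUM + FEATURES_CAT], columns=FEATURES_CAT, dtype=float)
    y = df["is_successful"]
    # Stratify so both splits keep the same share of successful films.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=random_state, stratify=y
    )
    clf = RandomForestClassifier(n_estimators=100, random_state=random_state)
    clf.fit(X_train, y_train)
    pred = clf.predict(X_test)
    metrics = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
    }
    return clf, metrics
```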

4. AI Recommendation System

  • Content-based filtering
  • Filters: genre, country, minimum rating
  • Weighted scoring: rating (40%) + popularity (30%) + vote count (30%)
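The filter-and-rank logic described above might look like the sketch below. Two details are assumptions not stated in the project: each signal is min-max scaled to [0, 1] over the whole catalog before weighting, and genre/country filters use a case-insensitive substring match on the raw `genres` and `country` columns. Unlike the Quick Start call, this hypothetical `recommend_movies` takes the DataFrame explicitly as its first argument.

```python
import pandas as pd


def _min_max(s: pd.Series) -> pd.Series:
    """Scale a Series to [0, 1]; a constant Series maps to 0."""
    span = s.max() - s.min()
    if span == 0:
        return pd.Series(0.0, index=s.index)
    return (s - s.min()) / span


def recommend_movies(df, genre=None, country=None, min_rating=0.0, top_n=10):
    """Filter by genre, country and minimum rating, then rank by a weighted score."""
    scored = df.copy()
    # Weighted score: rating 40%, popularity 30%, vote count 30%,
    # normalized over the full catalog so scores are comparable across queries.
    scored["score"] = (
        0.4 * _min_max(scored["vote_average"])
        + 0.3 * _min_max(scored["popularity"])
        + 0.3 * _min_max(scored["vote_count"])
    )
    mask = scored["vote_average"] >= min_rating
    if genre is not None:
        mask &= scored["genres"].str.contains(genre, case=False, na=False, regex=False)
    if country is not None:
        mask &= scored["country"].str.contains(country, case=False, na=False, regex=False)
    cols = ["title", "genres", "country", "vote_average", "score"]
    return scored.loc[mask, cols].sort_values("score", ascending=False).head(top_n)
```

Normalizing before weighting matters because raw vote counts run into the thousands while ratings stay on a 0-10 scale; without scaling, the vote-count term would swamp the 40% rating weight.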

🚀 Installation

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Setup

  1. Clone the repository
     git clone https://github.com/HyperCE/netflix-content-analysis.git
     cd netflix-content-analysis
  2. Install dependencies
     pip install -r requirements.txt
  3. Launch Jupyter Notebook
     jupyter notebook Netflix_Analysis_Project.ipynb

💻 Usage

  1. Open Netflix_Analysis_Project.ipynb in Jupyter Notebook
  2. Run cells sequentially using Shift + Enter
  3. Follow the analysis flow:
    • Data Loading → Preprocessing → EDA → Correlation → ML Models → Recommendations

Quick Start

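# Load the dataset
import pandas as pd
df = pd.read_csv('netflix_movies_detailed_up_to_2025.csv')

# Get recommendations (recommend_movies is defined in Netflix_Analysis_Project.ipynb,
# so run the notebook cells that define it first)
recommend_movies(genre='Animation', country='Japan', min_rating=7.5, top_n=5)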

📁 Project Structure

netflix-content-analysis/
│
├── Netflix_Analysis_Project.ipynb    # Main Jupyter notebook
├── netflix_movies_detailed_up_to_2025.csv  # Dataset
├── requirements.txt                   # Python dependencies
├── README.md                          # Project documentation
├── LICENSE                            # MIT License
└── .gitignore                         # Git ignore file

📈 Results

Key Findings

Finding                        Value
Dominant genre                 Drama (22.7% of total)
Budget-revenue correlation     r = 0.75 (strong)
Budget-rating correlation      r = 0.10 (weak)
Japan's top genre              Animation (38.7%)
South Korea's top genre        Drama (31.3%)
China's top genre              Action (36.6%)
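Correlations like the budget-revenue and budget-rating values above can be computed with `DataFrame.corr`. This sketch assumes Pearson correlation and, as in the preprocessing sketch, treats a budget or revenue of 0 as unknown and drops those rows first; the notebook may filter differently, so results from this code are not guaranteed to match the reported figures. `financial_correlations` is a hypothetical name.

```python
import pandas as pd


def financial_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation matrix of budget, revenue, rating and popularity.

    Rows with a zero budget or revenue are dropped first, on the assumption
    that 0 means "unknown" in the source data.
    """
    cols = ["budget", "revenue", "vote_average", "popularity"]
    known = df[(df["budget"] > 0) & (df["revenue"] > 0)]
    return known[cols].corr(method="pearson")
```

For example, `financial_correlations(df).loc["budget", "revenue"]` reads off the single budget-revenue coefficient.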

Model Performance

Model                     Metric      Score
Revenue prediction        R²          ~0.85
Success classification    Accuracy    ~0.73

Strategic Recommendations

  • For High ROI: Invest in Animation/Horror with controlled budgets
  • For High Revenue: Focus on Action-Adventure-SciFi blockbusters
  • For High Quality: Partner with Asian studios (Japan, Korea)

🛠 Technologies Used

  • Python 3.x - Programming language
  • Pandas - Data manipulation
  • NumPy - Numerical computing
  • Matplotlib & Seaborn - Data visualization
  • Scikit-learn - Machine learning
  • Jupyter Notebook - Interactive development

👤 Author

Ahmet Burak Güvercin

  • Student ID: 190316073
  • Project: Grand Project I
  • University: Celal Bayar University - Computer Engineering

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Netflix for inspiration
  • TMDB for movie data
  • Scikit-learn documentation
  • Kaggle community

⭐ If you found this project helpful, please give it a star!
