🎬 Netflix Content Analysis Using AI/ML

A comprehensive data science project analyzing 16,000+ Netflix movies using machine learning techniques to identify regional genre preferences, predict movie success factors, and build AI-based recommendation systems.

📋 Table of Contents

Overview
Dataset
Features
Installation
Usage
Project Structure
Results
Technologies Used
Author

🎯 Overview

This project performs end-to-end data analysis and machine learning on Netflix's movie catalog to extract actionable insights for content producers, platform managers, and viewers. The analysis covers:

Exploratory Data Analysis (EDA) - Genre distributions, regional preferences, temporal trends
Correlation Analysis - Relationships between budget, revenue, ratings, and popularity
Machine Learning Models - Revenue prediction and success classification
Recommendation System - Content-based filtering for personalized suggestions

📊 Dataset

Metric	Value
Total Movies	16,000
Time Period	2010 - 2025
Languages	74
Countries	114
Features	18

Key Features:

title, director, cast - Movie information
country, language - Regional data
genres - Category classification
budget, revenue - Financial metrics
vote_average, vote_count, popularity - Audience metrics
release_year - Temporal information

✨ Features

1. Data Preprocessing

Missing value handling
Feature extraction (primary genre, primary country)
ROI calculation
Success labeling (rating ≥ 7.0)
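The preprocessing steps listed above could be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function name `preprocess`, the derived column names, the comma-separated format of `genres` and `country`, and the use of `vote_average` as the "rating" are all assumptions.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical preprocessing sketch; column formats are assumed."""
    out = df.copy()

    # Missing values + feature extraction: take the first entry of an
    # (assumed) comma-separated string as the primary genre / country.
    out["primary_genre"] = (
        out["genres"].fillna("Unknown").str.split(",").str[0].str.strip()
    )
    out["primary_country"] = (
        out["country"].fillna("Unknown").str.split(",").str[0].str.strip()
    )

    # ROI = (revenue - budget) / budget, left as NaN when budget is not positive.
    budget = out["budget"].where(out["budget"] > 0)
    out["roi"] = (out["revenue"] - budget) / budget

    # Success label: 1 when the rating is at least 7.0, else 0.
    out["successful"] = (out["vote_average"] >= 7.0).astype(int)
    return out


# Tiny made-up sample, only to exercise the function.
sample = pd.DataFrame({
    "genres": ["Animation, Fantasy", "Drama", None],
    "country": ["Japan", "South Korea, United States", "China"],
    "budget": [10_000_000, 0, 5_000_000],
    "revenue": [30_000_000, 1_000_000, 2_500_000],
    "vote_average": [7.8, 6.5, 7.0],
})
result = preprocess(sample)
print(result[["primary_genre", "primary_country", "roi", "successful"]])
```

Leaving ROI undefined for zero or missing budgets avoids division-by-zero artifacts; whether the notebook drops, imputes, or keeps such rows is not stated here.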

2. Exploratory Data Analysis

Genre distribution visualization
Regional preference analysis
Financial performance by genre
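One way the regional preference analysis might be computed is a row-normalized cross-tabulation of country against genre. The sketch below assumes `primary_country` and `primary_genre` columns already exist (hypothetical names for the extracted features) and uses a handful of made-up rows.

```python
import pandas as pd

# Made-up rows for illustration only; not taken from the dataset.
movies = pd.DataFrame({
    "primary_country": ["Japan", "Japan", "Japan", "South Korea", "South Korea"],
    "primary_genre": ["Animation", "Animation", "Drama", "Drama", "Action"],
})

# Share (in percent) of each genre within each country.
genre_share = pd.crosstab(
    movies["primary_country"], movies["primary_genre"], normalize="index"
) * 100

# Most common genre per country.
top_genre = genre_share.idxmax(axis=1)

print(genre_share.round(1))
print(top_genre)
```

The resulting table can be passed directly to a heatmap or stacked bar chart for the visualization step.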

3. Machine Learning Models

Model 1: Revenue Prediction (Regression)

Algorithm: Random Forest Regressor
Features: budget, ratings, popularity, genre, country
Metric: R² Score
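A revenue model along these lines could be set up as in the sketch below. It runs on synthetic data so that it is self-contained; the column names `primary_genre` and `primary_country`, the one-hot encoding, the 80/20 split, and the hyperparameters are assumptions rather than the notebook's confirmed choices.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (assumption: revenue scales with budget).
rng = np.random.default_rng(42)
n = 500
movies = pd.DataFrame({
    "budget": rng.uniform(1e6, 2e8, n),
    "vote_average": rng.uniform(3, 9, n),
    "popularity": rng.uniform(0, 100, n),
    "primary_genre": rng.choice(["Action", "Drama", "Animation"], n),
    "primary_country": rng.choice(["United States", "Japan", "South Korea"], n),
})
movies["revenue"] = movies["budget"] * 2.5 + rng.normal(0, 2e7, n)

# One-hot encode the categorical features; numeric columns pass through unchanged.
X = pd.get_dummies(
    movies.drop(columns="revenue"), columns=["primary_genre", "primary_country"]
)
y = movies["revenue"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# R² on the held-out split of the synthetic data.
r2 = r2_score(y_test, model.predict(X_test))
print(f"R^2 on held-out synthetic data: {r2:.2f}")
```

The score printed here reflects only the synthetic data and says nothing about the project's reported results.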

Model 2: Success Classification

Algorithm: Random Forest Classifier
Target: Binary (Successful/Not Successful)
Metrics: Accuracy, Precision, Recall, F1-Score
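The classification setup might look like the following sketch, again on synthetic data. The feature set (`budget`, `popularity`, `vote_count`) is an assumption, since the features for this model are not listed; the rating itself is deliberately left out of the inputs because the label is derived from it, which would otherwise leak the target.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the rating is loosely driven by popularity (assumption).
rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "budget": rng.uniform(1e6, 2e8, n),
    "popularity": rng.uniform(0, 100, n),
    "vote_count": rng.integers(10, 5000, n),
})
vote_average = 5.0 + 0.03 * X["popularity"] + rng.normal(0, 0.3, n)

# Binary target: successful when the rating is at least 7.0.
y = (vote_average >= 7.0).astype(int)

# Stratify so both classes appear in train and test in similar proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)
pred = clf.predict(X_test)

metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "precision": precision_score(y_test, pred),
    "recall": recall_score(y_test, pred),
    "f1": f1_score(y_test, pred),
}
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Reporting precision, recall, and F1 alongside accuracy matters when the successful class is a minority, as accuracy alone can look good for a model that rarely predicts success.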

4. AI Recommendation System

Content-based filtering
Filters: genre, country, minimum rating
Weighted scoring: rating (40%) + popularity (30%) + vote count (30%)
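The filter-then-score approach can be sketched as below. The function name `recommend_movies_sketch`, the extra `df` parameter, the substring matching on `genres` and `country`, and the min-max normalization of each component are assumptions made so the three differently scaled quantities can be combined; the notebook's own `recommend_movies` may differ.

```python
import pandas as pd


def _min_max(s: pd.Series) -> pd.Series:
    """Scale a series to [0, 1]; constant series map to 0."""
    span = s.max() - s.min()
    if span == 0:
        return pd.Series(0.0, index=s.index)
    return (s - s.min()) / span


def recommend_movies_sketch(df, genre=None, country=None, min_rating=0.0, top_n=5):
    """Hypothetical content-based filter with a weighted score."""
    # Apply the filters: minimum rating, then optional genre and country.
    mask = df["vote_average"] >= min_rating
    if genre is not None:
        mask &= df["genres"].str.contains(genre, case=False, na=False, regex=False)
    if country is not None:
        mask &= df["country"].str.contains(country, case=False, na=False, regex=False)

    candidates = df[mask].copy()
    if candidates.empty:
        return candidates

    # Weighted score: rating 40% + popularity 30% + vote count 30%.
    candidates["score"] = (
        0.4 * _min_max(candidates["vote_average"])
        + 0.3 * _min_max(candidates["popularity"])
        + 0.3 * _min_max(candidates["vote_count"])
    )
    return candidates.sort_values("score", ascending=False).head(top_n)


# Made-up catalogue for illustration only.
catalogue = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genres": ["Animation, Fantasy", "Animation", "Drama", "Animation"],
    "country": ["Japan", "Japan", "Japan", "United States"],
    "vote_average": [8.0, 7.6, 8.5, 9.0],
    "popularity": [50.0, 90.0, 70.0, 80.0],
    "vote_count": [1000, 3000, 2000, 500],
})
picks = recommend_movies_sketch(
    catalogue, genre="Animation", country="Japan", min_rating=7.5, top_n=5
)
print(picks[["title", "score"]])
```

Normalizing within the filtered candidates, as done here, ranks movies relative to their peers; normalizing over the whole catalogue instead would keep scores comparable across different queries.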

🚀 Installation

Prerequisites

Python 3.8 or higher
pip package manager

Setup

Clone the repository

git clone https://github.com/YOUR_USERNAME/netflix-content-analysis.git
cd netflix-content-analysis

Install dependencies

pip install -r requirements.txt

Launch Jupyter Notebook

jupyter notebook Netflix_Analysis_Project.ipynb

💻 Usage

Open Netflix_Analysis_Project.ipynb in Jupyter Notebook
Run cells sequentially using Shift + Enter
Follow the analysis flow:
- Data Loading → Preprocessing → EDA → Correlation → ML Models → Recommendations

Quick Start

# Load the dataset
import pandas as pd
df = pd.read_csv('netflix_movies_detailed_up_to_2025.csv')

# Get recommendations (recommend_movies is defined in the notebook; run its cells first)
recommend_movies(genre='Animation', country='Japan', min_rating=7.5, top_n=5)

📁 Project Structure

netflix-content-analysis/
│
├── Netflix_Analysis_Project.ipynb    # Main Jupyter notebook
├── netflix_movies_detailed_up_to_2025.csv  # Dataset
├── requirements.txt                   # Python dependencies
├── README.md                          # Project documentation
├── LICENSE                            # MIT License
└── .gitignore                         # Git ignore file

📈 Results

Key Findings

Finding	Value
Dominant Genre	Drama (22.7% of total)
Budget-Revenue Correlation	r = 0.75 (Strong)
Budget-Rating Correlation	r = 0.10 (Weak)
Japan's Top Genre	Animation (38.7%)
South Korea's Top Genre	Drama (31.3%)
China's Top Genre	Action (36.6%)
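Correlation values such as the budget-revenue and budget-rating figures above are presumably Pearson coefficients. A minimal sketch of how such a matrix can be computed with pandas follows; the numbers are made up and chosen so that revenue is exactly proportional to budget.

```python
import pandas as pd

# Made-up values for illustration; revenue is exactly 2.5 x budget here.
movies = pd.DataFrame({
    "budget": [10.0, 20.0, 30.0, 40.0, 50.0],
    "revenue": [25.0, 50.0, 75.0, 100.0, 125.0],
    "vote_average": [6.1, 7.4, 5.8, 6.9, 7.0],
    "popularity": [12.0, 30.0, 25.0, 41.0, 38.0],
})

# Pairwise Pearson correlation between the numeric columns.
corr = movies.corr(method="pearson")
print(corr.round(2))
```

The same matrix can be rendered as a heatmap with Seaborn to make strong and weak relationships easier to spot.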

Model Performance

Model	Metric	Score
Revenue Prediction	R² Score	~0.85
Success Classification	Accuracy	~0.73

Strategic Recommendations

For High ROI: Invest in Animation/Horror with controlled budgets
For High Revenue: Focus on Action-Adventure-SciFi blockbusters
For High Quality: Partner with Asian studios (Japan, Korea)

🛠 Technologies Used

Python 3.x - Programming language
Pandas - Data manipulation
NumPy - Numerical computing
Matplotlib & Seaborn - Data visualization
Scikit-learn - Machine learning
Jupyter Notebook - Interactive development

👤 Author

Ahmet Burak Güvercin

Student ID: 190316073
Project: Grand Project I
University: Celal Bayar University - Computer Engineering

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Netflix for inspiration
TMDB for movie data
Scikit-learn documentation
Kaggle community

⭐ If you found this project helpful, please give it a star!
