A data science project that analyzes 16,000+ Netflix movies with machine learning to identify regional genre preferences, predict revenue and success, and build a content-based recommendation system.
This project performs end-to-end data analysis and machine learning on Netflix's movie catalog to extract actionable insights for content producers, platform managers, and viewers. The analysis covers:
- Exploratory Data Analysis (EDA) - Genre distributions, regional preferences, temporal trends
- Correlation Analysis - Relationships between budget, revenue, ratings, and popularity
- Machine Learning Models - Revenue prediction and success classification
- Recommendation System - Content-based filtering for personalized suggestions
| Metric | Value |
|---|---|
| Total Movies | 16,000+ |
| Time Period | 2010 - 2025 |
| Languages | 74 |
| Countries | 114 |
| Features | 18 |
Key Features:
- title, director, cast - Movie information
- country, language - Regional data
- genres - Category classification
- budget, revenue - Financial metrics
- vote_average, vote_count, popularity - Audience metrics
- release_year - Temporal information
- Missing value handling
- Feature extraction (primary genre, primary country)
- ROI calculation
- Success labeling (rating ≥ 7.0)
- Genre distribution visualization
- Regional preference analysis
- Financial performance by genre
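The preprocessing and EDA steps listed above could look roughly like the following in pandas. This is a minimal sketch on made-up rows, not the notebook's actual code: the comma-separated format of `genres` and `country`, the derived column names (`primary_genre`, `primary_country`, `roi`, `success`), and the ROI formula `(revenue - budget) / budget` are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Tiny illustrative frame; the values are invented, not taken from the dataset.
df = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genres": ["Animation, Family", "Drama", None, "Action, Thriller"],
    "country": ["Japan, United States of America", "South Korea", "China", None],
    "budget": [10_000_000, 0, 5_000_000, 50_000_000],
    "revenue": [40_000_000, 1_000_000, 2_500_000, 150_000_000],
    "vote_average": [7.8, 6.4, 7.0, 6.9],
})

# Missing value handling: fill absent text fields with a placeholder.
df[["genres", "country"]] = df[["genres", "country"]].fillna("Unknown")

# Feature extraction: keep the first entry of each list
# (assumes the lists are comma-separated).
df["primary_genre"] = df["genres"].str.split(",").str[0].str.strip()
df["primary_country"] = df["country"].str.split(",").str[0].str.strip()

# ROI (assumed definition): (revenue - budget) / budget.
# A zero budget is turned into NaN so the ratio is undefined, not infinite.
budget = df["budget"].replace(0, np.nan)
df["roi"] = (df["revenue"] - df["budget"]) / budget

# Success labeling: rating >= 7.0.
df["success"] = (df["vote_average"] >= 7.0).astype(int)

# Genre distribution as a share of all titles.
genre_share = df["primary_genre"].value_counts(normalize=True)
print(genre_share)
```

Treating a zero budget as missing rather than dividing by it keeps infinite ROI values out of later per-genre averages.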
Model 1: Revenue Prediction (Regression)
- Algorithm: Random Forest Regressor
- Features: budget, ratings, popularity, genre, country
- Metric: R² Score
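As an illustration of the regression setup described above, here is a hedged sketch using scikit-learn on synthetic data. The data-generating rule, the column names `primary_genre` and `primary_country`, the one-hot encoding, the 80/20 split, and the hyperparameters are assumptions; the notebook may differ on all of them.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (all values are invented).
rng = np.random.default_rng(42)
n = 400
df = pd.DataFrame({
    "budget": rng.uniform(1e6, 1e8, n),
    "vote_average": rng.uniform(3.0, 9.0, n),
    "popularity": rng.uniform(1.0, 100.0, n),
    "primary_genre": rng.choice(["Drama", "Action", "Animation"], n),
    "primary_country": rng.choice(["Japan", "South Korea", "China"], n),
})
# Revenue is made to depend on budget so the model has a signal to learn.
df["revenue"] = 2.5 * df["budget"] + rng.normal(0, 1e7, n)

# One-hot encode the categorical features (genre, country).
X = pd.get_dummies(
    df.drop(columns="revenue"),
    columns=["primary_genre", "primary_country"],
)
y = df["revenue"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# R² is computed on the held-out split, not on the training data.
r2 = r2_score(y_test, model.predict(X_test))
print(f"R² on held-out synthetic data: {r2:.3f}")
```

The score printed here reflects only the synthetic data and says nothing about the results reported for the real catalog.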
Model 2: Success Classification
- Algorithm: Random Forest Classifier
- Target: Binary (Successful/Not Successful)
- Metrics: Accuracy, Precision, Recall, F1-Score
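The classification setup might be sketched as follows. Everything here is illustrative: the data is synthetic, the feature set is an assumption, and the label is derived from the rating threshold (≥ 7.0) mentioned in the preprocessing steps. The sketch deliberately leaves the rating itself out of the features, since including it would leak the label.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real dataset (all values are invented).
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "budget": rng.uniform(1e6, 1e8, n),
    "popularity": rng.uniform(1.0, 100.0, n),
    "vote_count": rng.integers(10, 5000, n),
})
# Synthetic rating, loosely tied to popularity plus noise.
vote_average = 5.0 + 0.03 * X["popularity"] + rng.normal(0, 0.8, n)

# Binary target: successful (1) if rating >= 7.0, otherwise not (0).
y = (vote_average >= 7.0).astype(int)

# Stratify so both classes appear in train and test in similar proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="binary", zero_division=0
)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} f1={f1:.2f}")
```

Reporting precision and recall alongside accuracy matters when successful titles are the minority class, because accuracy alone can look good for a model that rarely predicts success.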
- Content-based filtering
- Filters: genre, country, minimum rating
- Weighted scoring: rating (40%) + popularity (30%) + vote count (30%)
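The filtering and weighted scoring above can be illustrated with a short sketch. The function name `recommend_movies_sketch` and the helper `min_max` are hypothetical, and so is the min-max normalization: the source gives only the weights, so some rescaling is assumed here to make rating, popularity, and vote count comparable before they are combined.

```python
import pandas as pd

def min_max(s):
    """Rescale a Series to [0, 1]; a constant Series maps to all zeros."""
    span = s.max() - s.min()
    if span == 0:
        return pd.Series(0.0, index=s.index)
    return (s - s.min()) / span

def recommend_movies_sketch(df, genre, country, min_rating=7.0, top_n=5):
    """Filter by genre, country, and minimum rating, then rank by score."""
    mask = (
        (df["primary_genre"] == genre)
        & (df["primary_country"] == country)
        & (df["vote_average"] >= min_rating)
    )
    hits = df.loc[mask].copy()
    if hits.empty:
        return hits
    # Weighted score: rating 40% + popularity 30% + vote count 30%.
    hits["score"] = (
        0.4 * min_max(hits["vote_average"])
        + 0.3 * min_max(hits["popularity"])
        + 0.3 * min_max(hits["vote_count"])
    )
    return hits.sort_values("score", ascending=False).head(top_n)

# Invented rows for demonstration only.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "primary_genre": ["Animation", "Animation", "Animation", "Drama"],
    "primary_country": ["Japan", "Japan", "Japan", "Japan"],
    "vote_average": [8.5, 7.6, 7.0, 8.0],
    "popularity": [50.0, 90.0, 10.0, 70.0],
    "vote_count": [1000, 3000, 200, 500],
})

top = recommend_movies_sketch(movies, "Animation", "Japan", min_rating=7.5)
print(top[["title", "score"]])
```

In this sketch the normalization runs over the filtered candidates only, so scores are relative to the current result set; normalizing over the whole catalog instead would make scores comparable across queries.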
- Python 3.8 or higher
- pip package manager
- Clone the repository
git clone https://github.com/YOUR_USERNAME/netflix-content-analysis.git
cd netflix-content-analysis
- Install dependencies
pip install -r requirements.txt
- Launch Jupyter Notebook
jupyter notebook Netflix_Analysis_Project.ipynb
- Open Netflix_Analysis_Project.ipynb in Jupyter Notebook
- Run cells sequentially using Shift + Enter
- Follow the analysis flow:
- Data Loading → Preprocessing → EDA → Correlation → ML Models → Recommendations
# Load and analyze data
import pandas as pd
df = pd.read_csv('netflix_movies_detailed_up_to_2025.csv')
# Get recommendations
recommend_movies(genre='Animation', country='Japan', min_rating=7.5, top_n=5)
netflix-content-analysis/
│
├── Netflix_Analysis_Project.ipynb # Main Jupyter notebook
├── netflix_movies_detailed_up_to_2025.csv # Dataset
├── requirements.txt # Python dependencies
├── README.md # Project documentation
├── LICENSE # MIT License
└── .gitignore # Git ignore file
| Finding | Value |
|---|---|
| Dominant Genre | Drama (22.7% of total) |
| Budget-Revenue Correlation | r = 0.75 (Strong) |
| Budget-Rating Correlation | r = 0.10 (Weak) |
| Japan's Top Genre | Animation (38.7%) |
| South Korea's Top Genre | Drama (31.3%) |
| China's Top Genre | Action (36.6%) |
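Findings of this kind (a Pearson correlation and a per-country top genre) can be computed along the following lines. The rows are invented and the column names `primary_country` and `primary_genre` are assumptions, so the numbers printed here are unrelated to the values in the table.

```python
import pandas as pd

# Invented rows for demonstration only.
df = pd.DataFrame({
    "primary_country": ["Japan", "Japan", "Japan",
                        "South Korea", "South Korea", "China"],
    "primary_genre": ["Animation", "Animation", "Drama",
                      "Drama", "Drama", "Action"],
    "budget": [10, 20, 30, 40, 50, 60],
    "revenue": [25, 35, 80, 90, 140, 150],
})

# Pearson correlation coefficient between budget and revenue.
r = df["budget"].corr(df["revenue"], method="pearson")
print(f"budget-revenue r = {r:.2f}")

# Share of each genre within each country (rows sum to 1).
share = pd.crosstab(
    df["primary_country"], df["primary_genre"], normalize="index"
)

# Top genre per country is the column with the largest share in each row.
top_genre = share.idxmax(axis=1)
print(top_genre)
```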
| Model | Metric | Score |
|---|---|---|
| Revenue Prediction | R² Score | ~0.85 |
| Success Classification | Accuracy | ~0.73 |
- For High ROI: Invest in Animation/Horror with controlled budgets
- For High Revenue: Focus on Action-Adventure-SciFi blockbusters
- For High Quality: Partner with Asian studios (Japan, Korea)
- Python 3.x - Programming language
- Pandas - Data manipulation
- NumPy - Numerical computing
- Matplotlib & Seaborn - Data visualization
- Scikit-learn - Machine learning
- Jupyter Notebook - Interactive development
Ahmet Burak Güvercin
- Student ID: 190316073
- Project: Grand Project I
- University: Celal Bayar University - Computer Engineering
This project is licensed under the MIT License - see the LICENSE file for details.
- Netflix for inspiration
- TMDB for movie data
- Scikit-learn documentation
- Kaggle community
⭐ If you found this project helpful, please give it a star!