Data cleaning is a critical step before performing any exploratory analysis on a dataset. It ensures that the data is accurate, complete, and consistent, which is essential for producing reliable insights.
In this project, I cleaned two datasets: the World Layoff Dataset and the Pizza Runner Dataset. The goal was to make both datasets consistent and trustworthy ahead of exploratory data analysis (EDA) by addressing missing values, duplicates, and inaccuracies. Skipping this preparation risks building flawed models or drawing incorrect conclusions during the EDA phase.
The cleaning process covered the following steps:
- Identifying and removing duplicates
- Standardizing and formatting data
- Handling null and blank values
- Removing columns or rows that are not needed
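
The four steps above can be sketched in plain Python. This is only a minimal illustration, not the code used in the project: the sample rows, the column names (`company`, `industry`, `laid_off`, `row_note`), the alias table, and the list of null markers are all assumptions made up for the example and do not reflect the real schema of either dataset.

```python
# Hypothetical sample rows; the columns and values are invented for illustration.
rows = [
    {"company": "Acme", "industry": "Crypto Currency", "laid_off": "120", "row_note": "tmp"},
    {"company": "Acme", "industry": "Crypto Currency", "laid_off": "120", "row_note": "tmp"},
    {"company": " Globex ", "industry": "crypto", "laid_off": "null", "row_note": "tmp"},
    {"company": "Initech", "industry": "", "laid_off": "", "row_note": "tmp"},
]


def remove_duplicates(rows):
    """Step 1: keep the first occurrence of each fully identical row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def standardize(rows):
    """Step 2: trim whitespace and map spelling variants to one label."""
    # Assumed alias table; a real one would come from inspecting the data.
    aliases = {"crypto currency": "Crypto", "crypto": "Crypto"}
    cleaned = []
    for row in rows:
        row = {k: v.strip() if isinstance(v, str) else v for k, v in row.items()}
        row["industry"] = aliases.get(row["industry"].lower(), row["industry"])
        cleaned.append(row)
    return cleaned


def normalize_nulls(rows, markers=("", "null", "nan")):
    """Step 3: turn blank strings and textual null markers into None."""
    return [
        {k: None if isinstance(v, str) and v.lower() in markers else v
         for k, v in row.items()}
        for row in rows
    ]


def drop_unneeded(rows, drop_columns, required):
    """Step 4: drop helper columns and rows with no usable required values."""
    kept = []
    for row in rows:
        if all(row[col] is None for col in required):
            continue  # nothing to analyze in this row
        kept.append({k: v for k, v in row.items() if k not in drop_columns})
    return kept


cleaned = drop_unneeded(
    normalize_nulls(standardize(remove_duplicates(rows))),
    drop_columns={"row_note"},
    required=("industry", "laid_off"),
)
for row in cleaned:
    print(row)
```

In this sketch the steps run in the order listed. In practice, standardizing before deduplicating may catch rows that differ only in whitespace or spelling, so the order is a judgment call that depends on the data.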