This repository documents my journey in learning data engineering concepts, tools, and best practices. It contains structured notes, hands-on exercises, and references to key topics in the field.
- Apache Airflow
- Apache Spark
- Data Engineering Best Practices
- Data Engineering Fundamental Concepts
- Data Modelling – Dimensional modelling techniques, scenario-based examples, and interview questions.
- Data Processing – Comparison of batch vs. stream processing techniques.
- Python Concepts – Key Python programming concepts useful for data engineering.
- dbt/SQL for Modularity – SQL refactoring techniques and best practices using dbt.
- dlt (data load tool) – Open-source Python library for loading data from multiple sources; it reduces boilerplate code and offers features such as automatic schema detection and unnesting of nested data.
- Completing the Data Engineering Zoomcamp 2025 coursework.
- Deep diving into workflow orchestration and modern data stack tools.
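
As a small taste of the batch vs. stream comparison listed under Data Processing, here is a minimal sketch in plain Python. The function names and the `amount` field are made up for illustration and are not taken from the notes in this repository; real pipelines would typically use a framework such as Spark for this.

```python
from typing import Iterable, Iterator


def batch_total(events: list[dict]) -> float:
    """Batch: the whole dataset is available, so compute the result in one pass."""
    return sum(e["amount"] for e in events)


def stream_totals(events: Iterable[dict]) -> Iterator[float]:
    """Stream: handle one event at a time and emit an updated result after each."""
    running = 0.0
    for e in events:
        running += e["amount"]
        yield running


# Hypothetical sample data, used only for this illustration.
events = [{"amount": 10.0}, {"amount": 5.5}, {"amount": 4.5}]

print(batch_total(events))          # 20.0
print(list(stream_totals(events)))  # [10.0, 15.5, 20.0]
```

The batch function returns one answer once all the data is in hand, while the streaming generator produces an up-to-date answer after every event, which is roughly the trade-off between latency and simplicity that the Data Processing notes compare.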