A modular Python framework for data engineering, validation, and analytics workflows.
Data Manager provides a job-based architecture and pluggable storage backends for processing tabular datasets in a structured and extensible way.
- Modular storage backends (CSV, JSON, In-Memory)
- Job-based execution architecture
- Data engineering utilities
- Data validation and schema checking
- Data analytics and profiling
- Fully tested with Pytest

Installation:

    pip install data-manager-framework

Quick start:

    from data_manager.storage.csv_backend import CSVStorage
    from data_manager.jobs.data_analytics import DataAnalytics

    storage = CSVStorage()
    storage.read("data.csv")

    analytics = DataAnalytics(storage)
    print(analytics.summary())

Provides interchangeable storage backends:
- CSVStorage
- JSONStorage
- InMemoryStorage
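
To illustrate what "interchangeable" can mean in practice, here is a minimal standalone sketch using only the standard library. It does not use or reproduce Data Manager's own classes: `RowStorage`, `ListStorage`, `CsvFileStorage`, and `copy_rows` are hypothetical names invented for this example, and the framework's real interface may look different.

```python
import csv
from typing import Protocol


class RowStorage(Protocol):
    """Hypothetical interface; not the framework's actual base class."""

    def read(self) -> list[dict[str, str]]: ...

    def write(self, rows: list[dict[str, str]]) -> None: ...


class ListStorage:
    """Keeps rows in a Python list (an in-memory stand-in)."""

    def __init__(self) -> None:
        self._rows: list[dict[str, str]] = []

    def read(self) -> list[dict[str, str]]:
        return list(self._rows)

    def write(self, rows: list[dict[str, str]]) -> None:
        self._rows = list(rows)


class CsvFileStorage:
    """Keeps rows in a CSV file on disk."""

    def __init__(self, path: str) -> None:
        self.path = path

    def read(self) -> list[dict[str, str]]:
        with open(self.path, newline="") as handle:
            return list(csv.DictReader(handle))

    def write(self, rows: list[dict[str, str]]) -> None:
        # Column names are taken from the first row.
        fieldnames = list(rows[0]) if rows else []
        with open(self.path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)


def copy_rows(source: RowStorage, target: RowStorage) -> int:
    """Copy all rows between any two backends and return the row count."""
    rows = source.read()
    target.write(rows)
    return len(rows)
```

Because `copy_rows` depends only on `read` and `write`, any object with those two methods can be passed in, which is the property that makes backends swappable.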
Available operations:
- Remove duplicate records
- Handle missing values
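
As a rough picture of what these two operations involve, the sketch below implements them on plain lists of dictionaries. It is an assumption-laden illustration, not the code behind `DataEngineer`: the function names `drop_duplicates`, `fill_missing`, and `drop_missing` are hypothetical, and treating both `None` and the empty string as "missing" is a choice made for this example.

```python
from typing import Optional

Row = dict[str, Optional[str]]

# Values treated as missing in this sketch (an assumption).
MISSING = (None, "")


def drop_duplicates(rows: list[Row]) -> list[Row]:
    """Keep the first occurrence of each row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        # Sort items by column name so key order does not affect equality.
        key = tuple(sorted(row.items(), key=lambda item: item[0]))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fill_missing(rows: list[Row], default: str = "unknown") -> list[Row]:
    """Return new rows with missing values replaced by a default."""
    return [
        {name: (default if value in MISSING else value) for name, value in row.items()}
        for row in rows
    ]


def drop_missing(rows: list[Row]) -> list[Row]:
    """Return only the rows that have no missing values."""
    return [row for row in rows if all(value not in MISSING for value in row.values())]
```

Filling and dropping are shown as separate functions because they trade off differently: filling keeps every row but introduces placeholder values, while dropping keeps only complete rows.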
Available validations:
- Schema validation
- Data type checks
- Nullability checks
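
The three checks listed above can be combined in a single pass over the data. The following is a minimal sketch under stated assumptions: the `Column` dataclass, the `validate` function, and the error message format are all hypothetical and are not taken from the framework's validation job.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Column:
    """Hypothetical column rule: an expected type and whether None is allowed."""

    dtype: type
    nullable: bool = False


def validate(rows: list[dict[str, Any]], schema: dict[str, Column]) -> list[str]:
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    for index, row in enumerate(rows):
        # Schema check: every declared column must be present.
        for name in sorted(schema.keys() - row.keys()):
            errors.append(f"row {index}: missing column '{name}'")
        for name, column in schema.items():
            if name not in row:
                continue
            value = row[name]
            if value is None:
                # Nullability check.
                if not column.nullable:
                    errors.append(f"row {index}: '{name}' must not be null")
            elif not isinstance(value, column.dtype):
                # Data type check.
                errors.append(
                    f"row {index}: '{name}' expected {column.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
    return errors
```

Collecting errors into a list rather than raising on the first failure lets a caller report every problem in a dataset at once.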
Available analytics:
- Dataset summary
- Column statistics
- Missing value analysis
- Duplicate analysis
- Dataset profiling
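
To make these analytics concrete, here is a small standard-library sketch that computes a row count, per-column missing counts, a duplicate count, and basic numeric statistics. The `profile` function and the shape of its result are assumptions made for illustration; the output of `DataAnalytics.summary()` is not documented here and may differ.

```python
from collections import Counter
from statistics import mean
from typing import Any


def profile(rows: list[dict[str, Any]]) -> dict[str, Any]:
    """Build a simple profile of a list-of-dicts dataset."""
    columns = sorted({name for row in rows for name in row})

    # Missing value analysis: count None (or absent) values per column.
    missing = {c: sum(1 for row in rows if row.get(c) is None) for c in columns}

    # Duplicate analysis: every repeat beyond the first occurrence counts.
    counts = Counter(
        tuple(sorted(row.items(), key=lambda item: item[0])) for row in rows
    )
    duplicate_rows = sum(n - 1 for n in counts.values())

    # Column statistics for numeric columns (bool is excluded on purpose).
    numeric = {}
    for c in columns:
        values = [
            row[c]
            for row in rows
            if isinstance(row.get(c), (int, float)) and not isinstance(row.get(c), bool)
        ]
        if values:
            numeric[c] = {"min": min(values), "max": max(values), "mean": mean(values)}

    return {
        "rows": len(rows),
        "columns": columns,
        "missing": missing,
        "duplicate_rows": duplicate_rows,
        "numeric": numeric,
    }
```

Returning a plain dictionary keeps the result easy to print, serialize to JSON, or compare in tests.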

Data engineering example:

    from data_manager.storage.csv_backend import CSVStorage
    from data_manager.jobs.data_engineer import DataEngineer

    storage = CSVStorage()
    storage.read("data.csv")

    engineer = DataEngineer(storage)
    engineer.removeDuplicates()
    engineer.removeNull()

    storage.write("cleaned_data.csv")

Run the complete test suite:

    pytest

Verbose mode:

    pytest -v

| Component | Features |
|---|---|
| Storage | CSV, JSON, In-Memory |
| Engineering | Remove duplicates, Handle missing values |
| Validation | Schema validation, Data type checks, Nullability checks |
| Analytics | Summary, Profiling, Missing value analysis, Column statistics |

Implemented:
- CSV Backend
- JSON Backend
- In-Memory Backend
- Data Validation
- Data Analytics

Planned:
- Excel Backend
- Parquet Backend
- SQL Backend
- Automated EDA Reports

License: MIT License

Author: Krish Kumar