Is your feature request related to a problem? Please describe.
The current data generation process is constrained by manual steps, limited compute resources, and lack of automation for scaling up data collection. This restricts the ability to efficiently generate large-scale datasets for research, simulation, or model training.
Describe the solution you'd like
Develop a system to scale data generation through:
- Infrastructure-as-Code (IaC) scripts to provision required cloud infrastructure (compute/storage/networking)
- Automated startup of data collection, including connecting MTA:SA simulation clients to the MTA server
- Orchestration logic to manage multiple concurrent data generation jobs (auto-scaling, monitoring, retries)
- Automatic consolidation of generated data into a unified, accessible location when jobs finish
- Documentation for deployment, scaling, and usage
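To make the orchestration item more concrete, here is a minimal sketch of bounded-concurrency job execution with retries. It is not a proposed implementation: `run_jobs`, `run_with_retries`, `JobResult`, and the `job` callable are all hypothetical names, and a real version would launch cloud instances and simulation clients rather than call a local Python function.

```python
import concurrent.futures
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class JobResult:
    job_id: str
    succeeded: bool
    attempts: int


def run_with_retries(
    job_id: str,
    job: Callable[[str], None],
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> JobResult:
    """Run one job, retrying on any exception up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        try:
            job(job_id)
            return JobResult(job_id, True, attempt)
        except Exception:
            # Linear backoff before the next attempt (assumption: any
            # exception is treated as retryable in this sketch).
            if attempt < max_attempts:
                time.sleep(backoff_seconds * attempt)
    return JobResult(job_id, False, max_attempts)


def run_jobs(
    job_ids: Iterable[str],
    job: Callable[[str], None],
    max_workers: int = 4,
    max_attempts: int = 3,
) -> Dict[str, JobResult]:
    """Run jobs concurrently, with at most max_workers in flight at once."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run_with_retries, job_id, job, max_attempts)
            for job_id in job_ids
        ]
        results = [future.result() for future in futures]
    return {result.job_id: result for result in results}
```

The returned mapping doubles as a simple monitoring record: any entry with `succeeded` set to `False` marks a job that exhausted its retries and may need manual attention.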
Describe alternatives you've considered
- Manual setup and scaling of infrastructure (labor-intensive, error-prone, not reproducible)
- Relying on fixed, non-cloud compute resources (limits scalability)
Acceptance Criteria
Additional context
- Consider using tools like Terraform, Ansible, or cloud provider native IaC (e.g., AWS CloudFormation, GCP Deployment Manager)
- Data consolidation could use cloud storage buckets, shared filesystems, or databases
- The solution should be extensible for future scaling or multi-cloud needs
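As a rough illustration of the consolidation step, the sketch below merges per-job output directories into one destination, namespaced by job directory name. It assumes a shared or local filesystem; `consolidate_outputs` and the directory layout are hypothetical, and a cloud storage bucket would replace the copy with an upload call.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def consolidate_outputs(job_dirs: Iterable[Path], destination: Path) -> List[Path]:
    """Copy every file from each job directory into destination/<job name>/."""
    copied: List[Path] = []
    for job_dir in job_dirs:
        files = sorted(p for p in job_dir.rglob("*") if p.is_file())
        for source in files:
            # Preserve the path relative to the job directory so files
            # from different jobs cannot overwrite each other.
            target = destination / job_dir.name / source.relative_to(job_dir)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)
            copied.append(target)
    return copied
```

Keeping each job under its own subdirectory makes it possible to trace a file back to the job that produced it.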