[FEATURE] Scale data generation with automated cloud infrastructure and orchestration #18

@ncudlenco

Description

Is your feature request related to a problem? Please describe.
The current data generation process is constrained by manual steps, limited compute resources, and a lack of automation for scaling up data collection. This makes it hard to generate large-scale datasets efficiently for research, simulation, or model training.

Describe the solution you'd like
Develop a system to scale data generation through:

  • Infrastructure-as-Code (IaC) scripts to provision required cloud infrastructure (compute/storage/networking)
  • Automated startup of data collection, including connecting the MTA:SA simulation clients to the MTA server
  • Orchestration logic to manage multiple concurrent data generation jobs (auto-scaling, monitoring, retries)
  • Automatic consolidation of generated data into a unified, accessible location when jobs finish
  • Documentation for deployment, scaling, and usage
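The orchestration item above (concurrent jobs with retries) could be sketched roughly as follows. This is only an illustration using Python's standard library thread pool. The names `run_with_retries` and `orchestrate`, and the idea that a job is a callable taking a job ID and returning an output path, are assumptions made for the example and not part of any existing code in this project.

```python
import concurrent.futures as cf
from typing import Callable, Dict, Iterable


def run_with_retries(job: Callable[[str], str], job_id: str, max_attempts: int = 3) -> str:
    """Run one job, retrying on any exception up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return job(job_id)
        except Exception as exc:  # a real system would likely narrow this
            last_error = exc
    raise RuntimeError(f"job {job_id} failed after {max_attempts} attempts") from last_error


def orchestrate(
    job: Callable[[str], str],
    job_ids: Iterable[str],
    max_workers: int = 4,
    max_attempts: int = 3,
) -> Dict[str, str]:
    """Run all jobs concurrently and map each job ID to its result or a failure note."""
    results: Dict[str, str] = {}
    with cf.ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {
            pool.submit(run_with_retries, job, job_id, max_attempts): job_id
            for job_id in job_ids
        }
        for future in cf.as_completed(futures):
            job_id = futures[future]
            try:
                results[job_id] = future.result()
            except RuntimeError as exc:
                # Record the failure instead of aborting the remaining jobs.
                results[job_id] = f"FAILED: {exc}"
    return results
```

In a cloud deployment the `job` callable would presumably start a simulation client on a provisioned instance; here it is left abstract so the retry and concurrency logic can be tested in isolation.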

Describe alternatives you've considered

  • Manual setup and scaling of infrastructure (labor-intensive, error-prone, not reproducible)
  • Relying on fixed, non-cloud compute resources (limits scalability)

Acceptance Criteria

  • IaC scripts provision necessary cloud infrastructure for data generation
  • Data collection automatically starts, including MTA:SA server-client connection
  • System supports concurrent data generation jobs and orchestrates them
  • Generated data is automatically consolidated into a single storage location
  • Documentation and usage instructions are provided
  • Tests verify scalability, automation, and data integrity
  • No existing functionality is broken
  • Performance impact is acceptable
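One possible way to test the data integrity criterion is to compare SHA-256 checksums recorded when a job finishes against the files found after consolidation. The sketch below assumes the generated data lives on a filesystem path. The function names `sha256_of`, `build_manifest`, and `find_mismatches` are hypothetical helpers introduced for illustration.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def find_mismatches(expected: Dict[str, str], root: Path) -> List[str]:
    """List relative paths that are missing, unexpected, or have a different checksum."""
    actual = build_manifest(root)
    return sorted(
        name
        for name in expected.keys() | actual.keys()
        if expected.get(name) != actual.get(name)
    )
```

A test could build the manifest on the worker before upload, then call `find_mismatches` against the consolidated copy and assert that the returned list is empty.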

Additional context

  • Consider using tools like Terraform, Ansible, or cloud provider native IaC (e.g., AWS CloudFormation, GCP Deployment Manager)
  • Data consolidation could use cloud storage buckets, shared filesystems, or databases
  • The solution should be extensible for future scaling or multi-cloud needs
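For the shared-filesystem option mentioned above, consolidation could be as simple as copying each job's output directory into one destination, namespaced by job directory name. This is a minimal sketch under that assumption. `consolidate` is a hypothetical helper, and a cloud storage bucket would need the provider's own upload tooling instead of `shutil`.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def consolidate(job_dirs: Iterable[Path], dest: Path) -> List[Path]:
    """Copy each job's output directory into dest/<job directory name>/."""
    dest.mkdir(parents=True, exist_ok=True)
    copied: List[Path] = []
    for job_dir in job_dirs:
        target = dest / job_dir.name
        # dirs_exist_ok (Python 3.8+) lets a retried job overwrite its earlier copy.
        shutil.copytree(job_dir, target, dirs_exist_ok=True)
        copied.append(target)
    return copied
```

Keeping one subdirectory per job avoids filename collisions between concurrent jobs and makes it easy to trace a file back to the job that produced it.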

Labels

enhancement (New feature or request)
