This Terraform project deploys an AWS Lambda-based scheduler that resizes EC2 instances on a schedule to optimize compute costs for business-hours workloads. It scales instances down during off-hours and scales them back up before the next business day. This allows you to reduce spend without shutting down systems or triggering alerts from monitoring tools.
- A Lambda function runs on EventBridge schedules:
  - Default schedule: Scales down at 2 PM Pacific (10 PM UTC) via `cron(0 22 ? * MON-FRI *)` and scales up at 9 AM Pacific (5 PM UTC) via `cron(0 17 ? * MON-FRI *)`. Switch to the Daylight Saving Time equivalents or update the stack parameters for other timezones. If you need a scale-down that falls after midnight UTC, convert the weekday tokens to `TUE-SAT` so the rule fires on the intended calendar days.
  - Business-hours schedule: Scales down at 6 PM Pacific (2 AM UTC) via `cron(0 2 ? * TUE-SAT *)` and scales up at 9 AM Pacific (5 PM UTC) via `cron(0 17 ? * MON-FRI *)`. Opt instances into this schedule by tagging them with `DynamicScalingSchedule=business-hours` (or with your custom schedule tag key set to the `business-hours` value).
- Instances are rebooted once per resize operation (stop → modify → start)
- This works even if you're using Compute Savings Plans
- Minimal impact to existing tools, monitoring agents, or workflows
- Each scale-down run estimates the discounted hourly savings per instance (respecting any configured Compute Savings Plan discount), automatically matches Linux, Windows, and SQL Server license-included fleets to the right on-demand price, stores a JSON report in an S3 bucket for cost tracking, and emits the totals to a CloudWatch Metrics namespace so you can build dashboards or alarms. Each run also stamps helper tags (`DynamicScalingLastScaleDownTimestamp`, `DynamicScalingLastScaleDownHourlySavings`) on the downsized instances so the next scale-up can measure real downtime.
- Scale-down summaries include the projected off-hours duration and the projected total savings before the next scale-up. The run also publishes additional CloudWatch metrics (`TotalProjectedOffHoursSavings`, `ProjectedOffHoursDurationHours`) that align with those projections.
- When the fleet scales back up, the function reads the saved metadata to calculate actual downtime per instance, writes an aggregated `actual-savings/` JSON report, publishes `TotalActualSavings`/`TotalActualDowntimeHours` metrics, and annotates each instance with `DynamicScalingLastScaleUpTimestamp` to avoid double-counting.
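
The projected and actual savings figures described above come down to simple arithmetic. The following is a minimal sketch of that arithmetic, not the Lambda's actual code: the function names are hypothetical, the prices in the usage note are made-up placeholders, and applying the Savings Plan discount to the price difference is an assumption.

```python
from datetime import datetime, timezone


def hourly_savings(preferred_price, off_hours_price, discount_percent=0.0):
    """Hourly price difference between the two instance types, after discount.

    Assumption: the discount percentage is applied to the on-demand delta.
    """
    delta = max(preferred_price - off_hours_price, 0.0)
    return delta * (1 - discount_percent / 100)


def projected_savings(rate, scale_down_at, next_scale_up_at):
    """Forecast: hourly rate multiplied by the expected off-hours duration."""
    hours = (next_scale_up_at - scale_down_at).total_seconds() / 3600
    return rate * max(hours, 0.0)


def actual_savings(rate, scale_down_tag, scale_up_at):
    """Realized savings from the timestamp tag written at scale-down.

    Returns None (skip the instance) when the tag is missing or unparseable.
    """
    try:
        down = datetime.fromisoformat(scale_down_tag)
    except (TypeError, ValueError):
        return None
    if down.tzinfo is None:
        # Assumption: naive timestamps are interpreted as UTC.
        down = down.replace(tzinfo=timezone.utc)
    hours = (scale_up_at - down).total_seconds() / 3600
    return rate * hours if hours > 0 else None
```

For example, with placeholder prices of 0.10 and 0.04 per hour and a 25% discount, the hourly rate is 0.045; a scale-down at 02:00 UTC followed by a scale-up at 17:00 UTC covers 15 hours.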
Apply these tags to any EC2 instance you want managed by this scheduler:
| Tag Key | Tag Value | Purpose |
|---|---|---|
| `DynamicInstanceScaling` | `true` | Opts the instance into scheduling |
| `PreferredInstanceType` | `m7i.large` (example) | Specifies the instance type to restore at scale-up |
⚠️ The instance type will be resized to `t3.medium` by default during off-hours.
| Tag Key | Example Value | Purpose |
|---|---|---|
| `DynamicScalingSchedule` | `default`, `business-hours`, `all` | Assigns the instance to an alternate schedule. Comma-separated values allow an instance to opt into multiple schedules; instances without this tag use the default schedule. |
| `environmentCategory` | `prod`, `nonprod` | Optional filter for manual/on-demand runs; only instances with a matching value are processed when this filter is supplied. |
For the business-hours schedule, set the schedule tag key (defaults to `DynamicScalingSchedule`) to the value `business-hours` on each instance you want on that timetable.
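
The tag rules above (opt-in tag, comma-separated schedule values, the `all` value, and the optional `environmentCategory` filter) can be pictured as a small matching function. This is an illustrative sketch under stated assumptions, not the Lambda's implementation: `schedules_for` and `is_in_scope` are hypothetical names, and exact, case-sensitive string comparison is assumed.

```python
def schedules_for(tags, schedule_tag_key="DynamicScalingSchedule"):
    """Return the set of schedule names an instance opts into."""
    raw = tags.get(schedule_tag_key, "")
    names = {part.strip() for part in raw.split(",") if part.strip()}
    # Instances without the schedule tag fall back to the default schedule.
    return names or {"default"}


def is_in_scope(tags, schedule, environment_category=None,
                schedule_tag_key="DynamicScalingSchedule"):
    """Decide whether one instance should be processed for a given run."""
    # The opt-in tag is required before anything else is considered.
    if tags.get("DynamicInstanceScaling") != "true":
        return False
    names = schedules_for(tags, schedule_tag_key)
    # A tag value of "all" opts the instance into every schedule.
    if "all" not in names and schedule not in names:
        return False
    # "all" (or no value) means no environment filter is applied.
    if environment_category and environment_category != "all":
        return tags.get("environmentCategory") == environment_category
    return True
```

With these rules, an instance tagged `DynamicScalingSchedule=default, business-hours` is picked up by both schedules, while an instance that lacks `DynamicInstanceScaling=true` is never touched.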
The Lambda function follows a least-privilege model. It can only modify EC2 instances with the `DynamicInstanceScaling=true` tag. It also has scoped access to:
- Start/stop/modify EC2 instances
- Create KMS grants (for encrypted EBS volumes)
- Write logs to CloudWatch Logs (14-day retention)
- Write savings reports to an S3 bucket created by the stack
- Query Cost Explorer Savings Plan coverage metrics when coverage-based discounts are enabled
- Configurable retries: AWS API interactions such as stop/modify/start, tag writes, and waiter checks flow through a `retry` helper with bounded attempts and backoff to absorb transient throttling before surfacing an error.
- Input validation: The Lambda enforces supported `action` values and blocks manual runs to avoid unexpected invocation paths.
- Safe metric emission: Savings metrics are skipped entirely when no namespace is configured, and each CloudWatch publication batch is wrapped with `ClientError`/exception logging so a metrics outage does not stop the run.
- S3 write guards: Savings reports log and continue when the target bucket is unset or when `put_object` raises an error, preventing upload failures from crashing the function.
- Defensive savings math: Actual savings snapshots verify required tags, parse timestamps defensively, and skip instances with invalid or missing metadata instead of raising.
- EC2 discovery fallback: If `DescribeInstances` fails, the handler returns a structured error response and halts gracefully rather than crashing mid-run.
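
As an illustration of the bounded-retry idea in the first bullet, here is a minimal sketch of a retry wrapper with exponential backoff and jitter. It is not the project's `retry` helper: the signature, defaults, and jitter strategy are assumptions chosen for the example, and the `sleep` parameter exists only so the behavior can be exercised without waiting.

```python
import random
import time


def retry(operation, attempts=4, base_delay=0.5, max_delay=8.0,
          retriable=(Exception,), sleep=time.sleep):
    """Call operation() up to `attempts` times, backing off between failures."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retriable:
            if attempt == attempts:
                # Attempts exhausted: surface the last error to the caller.
                raise
            # Exponential backoff (0.5s, 1s, 2s, ...) capped at max_delay,
            # plus random jitter so parallel workers do not retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

A call site might wrap a single AWS request, for example `retry(lambda: client.stop_instances(InstanceIds=[instance_id]))`, where `client` and `instance_id` are placeholders. In practice you would likely narrow `retriable` to throttling-type errors rather than every exception.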
You can deploy this stack using the AWS Console, AWS CLI, or SAM/CDK.
- Ensure your AWS credentials and default region are configured in your environment.
- From the repository root, run:

  ```bash
  terraform init
  terraform apply
  ```

  Override any of the stack parameters via `-var` flags (for example, `-var "lambda_schedule_up_time=cron(0 17 ? * MON-FRI *)"`).
- Apply the required tags to your EC2 instances (see the tag tables above) after the deployment completes.
To avoid naming collisions across parallel deployments (or across separate Terraform states), set `deployment_id` to a stable unique value per deployment. If omitted, Terraform generates one and keeps it in state so subsequent applies reuse the same names.
If your execution role cannot create IAM roles, set `existing_lambda_role_arn` to an already-provisioned Lambda execution role and Terraform will reuse it instead of creating `aws_iam_role` resources.
To deploy with AWS Console:
- Download the CloudFormation template: `ec2-dynamic-scheduler.yaml`
- Upload it to CloudFormation and launch the stack
- Apply the required tags to your EC2 instances
- Resize Target: Control the off-hours instance type with the `OffHoursInstanceType` stack parameter (defaults to `t3.medium`).
- Schedule: Default cron expressions target Pacific working hours by converting the desired local times into UTC because `AWS::Events::Rule` does not currently support the `ScheduleExpressionTimezone` property. Update the `LambdaScheduleUpTime`, `LambdaScheduleDownTime`, `BusinessHoursScheduleUpTime`, and `BusinessHoursScheduleDownTime` parameters to match your timezone or to account for Daylight Saving Time.
- Multiple Schedules: Use the `ScheduleTagKey` parameter (defaults to `DynamicScalingSchedule`) to choose which tag assigns instances to alternative schedules. Deploy additional EventBridge rules that invoke the Lambda with a different `schedule` payload (for example `"schedule": "team-b"`) and tag instances accordingly. A tag value of `all` opts an instance into every schedule.
- Parallel Operations: Control how many instances are processed simultaneously with the `ConcurrentInstanceOperations` parameter (defaults to 4). The Lambda uses AWS waiters and polling instead of fixed sleeps, which reduces idle time during stop/modify/start sequences.
- Logging: The CloudWatch Log Group is created with 14-day retention. Logs show success and error messages per instance.
- Savings Reports: Every scale-down event writes a JSON summary to the provisioned S3 bucket (`SavingsLogBucket`) under `savings/<date>/<timestamp>.json`, which captures projected downtime hours and projected total savings. Scale-up events complement this with measured results under `actual-savings/<date>/<timestamp>.json`, giving you both forecasted and realized savings without reprocessing the raw metrics. You can change the bucket properties or configure lifecycle rules by editing the CloudFormation template.
- Savings Plan Discount: Choose whether to provide a manual discount percentage (`SavingsPlanDiscountPercent`) or let the stack derive an effective rate from recent Cost Explorer coverage data by setting `SavingsPlanDiscountMode` to `Coverage`. Coverage mode uses the `ce:GetSavingsPlansCoverage` API (ensure Cost Explorer is enabled) and averages the last `SavingsPlanCoverageLookbackDays` (30 by default).
- CloudWatch Metrics: Use the `SavingsMetricNamespace` parameter to control where hourly savings metrics are published. These metrics expose the total run savings and per-instance estimates, enabling dashboards, anomaly detection, or cost alerts alongside the S3 JSON reports. Set the parameter to an empty string to disable metric publication.
- Pricing Detection: The Lambda maps each instance's platform to the appropriate AWS Pricing filters before calculating savings. If an instance platform can't be detected, override the fallback filters with the `DefaultPricingOperatingSystem`, `DefaultPricingLicenseModel`, and `DefaultPricingPreInstalledSoftware` parameters instead of editing the function code.
- Deployment Identity: Use `deployment_id` to control the unique suffix used for global-name resources (S3 bucket, IAM role/policy names, EventBridge rule names, log group, and Lambda name). Leaving it blank auto-generates a stable per-state suffix.
- Existing IAM Role Reuse: Set `existing_lambda_role_arn` when Terraform should attach the Lambda to a pre-existing execution role instead of creating a new role/policy.
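
Because the schedule parameters take UTC cron expressions, converting a local time by hand is easy to get wrong when the UTC time lands on the next calendar day. The sketch below shows one way to do the conversion. It is an illustration only: `local_to_utc_cron` is a hypothetical helper, it assumes whole-hour UTC offsets (for example -8 for Pacific Standard Time and -7 for Pacific Daylight Time), and it does not handle weekday ranges that wrap around the end of the week.

```python
DAYS = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]


def local_to_utc_cron(hour, minute, utc_offset_hours,
                      first_day="MON", last_day="FRI"):
    """Build a UTC cron expression from a local time and weekday range."""
    # Local time minus the UTC offset gives UTC; divmod splits the result
    # into a day shift (-1, 0, or +1) and the hour within that UTC day.
    day_shift, utc_hour = divmod(hour - utc_offset_hours, 24)

    def shift(day):
        # Move the weekday token by the same number of days as the time.
        return DAYS[(DAYS.index(day) + day_shift) % 7]

    return f"cron({minute} {utc_hour} ? * {shift(first_day)}-{shift(last_day)} *)"
```

Under Pacific Standard Time, `local_to_utc_cron(18, 0, -8)` yields `cron(0 2 ? * TUE-SAT *)` and `local_to_utc_cron(9, 0, -8)` yields `cron(0 17 ? * MON-FRI *)`, matching the business-hours defaults. Passing `-7` instead produces the Daylight Saving Time equivalents.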
The deployment creates an SSM Automation document named `<lambda_function_name>-OnDemandScaling` so you can run scale-up/scale-down on demand without changing schedules.
Parameters:
- `Action`: `scaleup` or `scaledown`
- `Schedule`: `default`, `business-hours`, or `all`
- `EnvironmentCategory`: `all` (no filter) or a specific `environmentCategory` tag value

This is the safest manual trigger path because the Lambda blocks invocations where `source` is `manual`.
To test in the Lambda console:
- Open the Lambda function created by the stack
- Create a test event using this JSON format:

  ```json
  {
    "source": "Scheduled",
    "action": "scaleup",
    "schedule": "default",
    "environmentCategory": "all"
  }
  ```

- Replace `default` with another schedule name such as `business-hours` or `all` to target the corresponding set of tagged instances.
- Set `environmentCategory` to a tag value (for example, `prod`) to scope the run to matching instances.
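
To show how an event like the one above might be checked before any instance is touched, here is a hedged sketch of an input guard. It is not the Lambda's code: `validate_event` and its return shape are hypothetical, and treating the `source` comparison as case-insensitive is an assumption.

```python
SUPPORTED_ACTIONS = {"scaleup", "scaledown"}


def validate_event(event):
    """Return (ok, message) for an incoming invocation payload."""
    # Assumption: "manual" is matched regardless of letter case.
    source = str(event.get("source", "")).lower()
    if source == "manual":
        return False, "manual invocations are blocked"
    action = event.get("action")
    if action not in SUPPORTED_ACTIONS:
        return False, f"unsupported action: {action!r}"
    return True, "ok"
```

The test event shown earlier passes this check, while a payload with `"source": "manual"` or an unknown `action` is rejected before any EC2 call is made.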
If you are looking to extend the stack further, the following ideas can help deepen the savings insights or broaden operational coverage without forcing downstream customization in the Lambda code:
- Rightsizing Recommendations: Persist the observed instance hours and savings deltas to S3/CloudWatch and surface a daily or weekly summary that highlights candidates for permanent downsizing.
- Notification Hooks: Wire optional SNS/Slack notifications into the CloudFormation parameters so operations teams are alerted when a resize or savings report fails.
- Override Schedules Per Tag: Introduce additional opt-in tags (for example `DynamicScalingSchedule=weekends`) that map to distinct EventBridge cron expressions defined in the template.
- Savings Dashboard Template: Publish an optional CloudWatch dashboard resource that visualizes the emitted savings metrics out of the box.
These enhancements keep customization declarative by flowing new knobs through CloudFormation parameters instead of edits to the Lambda source.