Remove heliumcli dependency (infra phase 3) #55
Closed
alexdlaird wants to merge 1731 commits into
Split the single avg_notes_per_user timeseries into multiple Datadog requests to show entity-level breakdowns (Total, Homework, Event, Resource, Standalone). The Total query now excludes entity tags (!entity:*) and the series include explicit style settings (palettes, line_type, line_width) and metadata alias names for clearer legend labels and visualization.
Expand inline style blocks into multi-line blocks in terraform/modules/global/datadog/main.tf for the helium_user_behavior dashboard. The change breaks out palette, line_type, and line_width into separate lines for three requests (avg_notes_per_user, avg_reminders_per_user, avg_attachments_per_user) to improve readability and maintain consistent formatting.
Create a dedicated CloudWatch log group for ECS (/ecs/helium_platform_${var.environment}) with 7-day retention and update ECS task definitions to reference it (remove awslogs-create-group). Add three CloudWatch Logs Insights saved queries (Errors, Celery Task Failures, Push Notifications) for faster troubleshooting. Emit a CloudWatch metric via a log metric filter for Celery task failures (Helium/Platform::CeleryTaskFailure with Environment dimension) so external alerting can rely on log-derived metrics. Add a Datadog query alert monitor that alerts on spikes in the CloudWatch-derived celery_task_failure metric (threshold >3 in last hour) to notify support of elevated task exceptions.
Add one-time Terraform import files for dev and prod environments to bring pre-existing CloudWatch log groups under Terraform management. Each file imports module.ecs.aws_cloudwatch_log_group.platform using the existing IDs (/ecs/helium_platform_dev and /ecs/helium_platform_prod). These import blocks are only needed for the initial import and can be removed after the first successful apply.
Update aws_cloudwatch_log_group retention for /ecs/helium_platform_${var.environment} from 7 to 30 days to retain logs longer for debugging and operational analysis.
Delete dev and prod environment import files that contained one-time import blocks for module.ecs.aws_cloudwatch_log_group.platform (ids: /ecs/helium_platform_dev and /ecs/helium_platform_prod). These were only needed for the initial resource import and are safe to remove after the first successful apply.
Remove the datadog_monitor "celery_task_failures_cloudwatch" from terraform/modules/global/datadog/monitors.tf and delete the "Helium/prod" entry from the Datadog AWS integration metrics list in terraform/modules/global/datadog/integration.tf. This cleans up an existing CloudWatch-based Celery alert and removes the Helium/prod namespace from the integration configuration.
Update CloudWatch Logs query and metric filter patterns to match current log formats. The celery_task_failures query and corresponding metric filter were changed from matching "Task raised exception" to "raised unexpected" to capture the new error phrasing. The push_notifications query was adjusted to look for the service namespace "helium.common.services.pushservice" or "push notification" to more precisely target push service logs.
Create an SNS topic and email subscription (support@heliumedu.com) and add a CloudWatch metric alarm to detect spikes in Celery task failures. The alarm watches the Helium/${var.environment} namespace metric CeleryTaskFailure (Sum) over a 1-hour period and fires when >5 failures; it sends alarm and OK notifications to the SNS topic. All new resources are conditional on var.environment == "prod". Also update a nearby comment to clarify the purpose of the CloudWatch alarm.
Move the "Feature Health (Adoption %)" widget group within terraform/modules/global/datadog/main.tf to a later position in the dashboard definition. This is a pure reordering of the existing widget group (queries and conditional formats unchanged) to adjust the dashboard layout and grouping order.
Introduce a local set of user data distribution metrics and create datadog_metric_tag_configuration resources for each metric. Each metric is configured as a distribution, tagged with [env, staff, window, entity], and has percentiles enabled to improve metric tagging and analysis.
Rename local and resource identifiers from user_data_distribution_* to user_distribution_* and update the datadog_metric_tag_configuration for_each to use the new local. Also add two new user engagement distribution metrics (platform.users.engagement.completions_per_user and platform.users.engagement.graded_homework_per_user) to the metrics set.
Refactor helium_user_behavior dashboard: rename group title to "Feature Adoption (% of Active Users)", append .fill(last) to multiple adoption metric queries to fill missing points, and add timeseries_background { type = "area" } to improve visualization. Also remove a duplicated/older group of timeseries widgets to clean up redundancy.
Expose a fixed set of time window options for the dashboard template variable by adding available_values (1d, 7d, 30d, 90d, 180d). This makes the `window` template variable selectable from predefined ranges and includes minor formatting alignment in the resource block.
Set live_span = "1mo" on multiple timeseries and gauge widgets in the helium_user_behavior Datadog dashboard to standardize the default live view to one month. Changes applied in terraform/modules/global/datadog/main.tf for various metrics and adoption percentage widgets to ensure consistent chart behavior.
Update live_span from "1mo" to "3mo" in terraform/modules/global/datadog/main.tf for the datadog_dashboard.helium_user_behavior resource. Extends the dashboard lookback window to three months across multiple user behavior and adoption metric widgets to improve trend visibility and analysis.
Replace PNG with SVG for the Patreon support badge. Update README to reference the SVG asset and adjust CloudFront rewrites to route /img/support-patreon.svg to the landing site equivalent. This ensures the vector asset is used for better scalability and potentially smaller file size.
…wn-phase-1-2026-08-01 # Conflicts: # terraform/modules/environment/cloudfront/rewrites.js
…e/legacy-shutdown-phase-2-2026-08-01
…e/legacy-shutdown-phase-3-2026-08-01
Add mapping for '/img/support-patreon.png' in CloudFront rewrites to point to 'https://landing.heliumedu.com/img/support-patreon.png', ensuring the PNG asset is served alongside the existing SVG variant.
…wn-phase-1-2026-08-01 # Conflicts: # terraform/modules/environment/cloudfront/rewrites.js
…2026-08-01' into feature/legacy-shutdown-phase-2-2026-08-01
…2026-08-01' into feature/legacy-shutdown-phase-3-2026-08-01
…e/legacy-shutdown-phase-2-2026-08-01
…e/legacy-shutdown-phase-3-2026-08-01
00d004e to c429b4d
This was referenced May 26, 2026
Final phase of the Aug 1, 2026 frontend-legacy shutdown. Removes the last heliumcli touch points so the dep can be dropped from `requirements.txt` and `HeliumEdu/heliumcli` can be archived.
Phases 1+2 already removed the bulk of the heliumcli surface (release scripts, legacy release workflow, frontend-legacy/cluster-tests from the projects list). This PR handles the remaining orchestration the Makefile invokes on `make install`.
Changes
- `bin/update-projects.sh`: bash replacement for `helium-cli update-projects` (clone if missing, fetch+pull if present, `make install -C` in each project)
- `install` target invokes the new script; `HELIUMCLI_PROJECTS` variable renamed to `PROJECTS`, JSON-list format changed to space-separated
- Drop `heliumcli==1.6.38` from `requirements.txt`
- Remove `.heliumcli.yml`
After this merges
Linked PRs — do not merge before Aug 1, 2026
Merge in order:
Test plan
- `make install` clones each project on a fresh checkout
- `make install` fetches+pulls each project on an existing checkout
- `PROJECTS="platform frontend" make install` honors the override
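
For reviewers who want a picture of the clone-or-update flow described under Changes, here is a minimal sketch. It is not the contents of `bin/update-projects.sh`. The names `PROJECTS_DIR`, `GIT_BASE_URL`, `GIT`, `MAKE`, `sync_project` and `update_projects` are hypothetical, and so are the default project list and the clone URL. Only the space-separated `PROJECTS` variable and the clone / fetch+pull / `make install -C` steps come from the description.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; NOT the actual bin/update-projects.sh.
# Hypothetical names: PROJECTS_DIR, GIT_BASE_URL, GIT, MAKE, and both functions.

# Space-separated project list (the default shown here is an assumption).
PROJECTS="${PROJECTS:-platform frontend}"
# Where checkouts live and where they are cloned from (both assumed).
PROJECTS_DIR="${PROJECTS_DIR:-projects}"
GIT_BASE_URL="${GIT_BASE_URL:-https://github.com/HeliumEdu}"
# Overridable commands, so the flow can be exercised without network access.
GIT="${GIT:-git}"
MAKE="${MAKE:-make}"

# Clone the project if it is missing, otherwise fetch and pull,
# then run its install target.
sync_project() {
  local name="$1"
  local dir="$PROJECTS_DIR/$name"
  if [ -d "$dir/.git" ]; then
    # Existing checkout: refresh it in place.
    "$GIT" -C "$dir" fetch --prune || return 1
    "$GIT" -C "$dir" pull --ff-only || return 1
  else
    # Fresh checkout: make sure the parent directory exists, then clone.
    mkdir -p "$PROJECTS_DIR" || return 1
    "$GIT" clone "$GIT_BASE_URL/$name.git" "$dir" || return 1
  fi
  "$MAKE" install -C "$dir"
}

# Walk the list, stopping at the first project that fails.
update_projects() {
  local name
  # Unquoted on purpose: split the list on whitespace.
  for name in $PROJECTS; do
    sync_project "$name" || return 1
  done
}
```

With these functions sourced, `PROJECTS="platform frontend" update_projects` would limit the run to those two projects, much like the override in the test plan. A space-separated list splits natively in a `for` loop, which may be why it suits a bash script better than the old JSON-list format, though the PR does not say so.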