Remove heliumcli dependency (infra phase 3) #55

Closed
alexdlaird wants to merge 1731 commits into feature/legacy-shutdown-phase-2-2026-08-01 from feature/legacy-shutdown-phase-3-2026-08-01

Conversation

alexdlaird (Member) commented May 25, 2026

Final phase of the Aug 1, 2026 frontend-legacy shutdown. Removes the last heliumcli touch points so the dep can be dropped from requirements.txt and HeliumEdu/heliumcli can be archived.

Phases 1+2 already removed the bulk of the heliumcli surface (release scripts, legacy release workflow, frontend-legacy/cluster-tests from the projects list). This PR handles the remaining orchestration the Makefile invokes on make install.

Changes

  • New bin/update-projects.sh: a bash replacement for helium-cli update-projects (clones each project if missing, fetches and pulls if present, then runs make install in each project via make -C)
  • Makefile install target invokes the new script; HELIUMCLI_PROJECTS variable renamed to PROJECTS, JSON-list format changed to space-separated
  • Drop heliumcli==1.6.38 from requirements.txt
  • Delete .heliumcli.yml
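
The clone-or-update flow described in the list above could look roughly like the sketch below. This is not the script from this PR: the REPO_BASE URL, the PROJECTS_DIR location, the default project names, and the DRY_RUN switch are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a bin/update-projects.sh style script.
# All defaults below are assumptions, not values taken from the PR.
set -eu

PROJECTS="${PROJECTS:-platform frontend}"               # space-separated list
REPO_BASE="${REPO_BASE:-https://github.com/HeliumEdu}"  # assumed remote base
PROJECTS_DIR="${PROJECTS_DIR:-projects}"                # assumed checkout dir

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Clone the project if it is missing, otherwise fetch and pull,
# then run its install target.
update_project() {
  local name="$1"
  local dir="${PROJECTS_DIR}/${name}"
  if [ -d "${dir}/.git" ]; then
    run git -C "$dir" fetch
    run git -C "$dir" pull
  else
    run git clone "${REPO_BASE}/${name}.git" "$dir"
  fi
  run make -C "$dir" install
}

# Iterate over the space-separated PROJECTS list (word splitting is intended).
update_all() {
  local name
  for name in $PROJECTS; do
    update_project "$name"
  done
}
```

A real script would end by calling update_all. Running it with DRY_RUN=1 first prints the git and make commands without executing them, which is a cheap way to check the project list before touching any checkout.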

After this merges

  • No active Helium code depends on heliumcli
  • HeliumEdu/heliumcli GitHub repo can be archived (signaling only; PyPI keeps the package available)

Linked PRs — do not merge before Aug 1, 2026

Merge in order:

  1. Move site to www.heliumedu.com; drop Helium Classic links www#2 — marketing site reconfigured for www (supersedes closed Enable/disable maintenance mode during deploy #1)
  2. Tear down legacy frontend hosting and Twilio (phase 1 of 2) #52 — infra phase 1 (tears down legacy hosting)
  3. Repoint marketing distribution to www.heliumedu.com (phase 2 of 2) #53 — infra phase 2 (points www at the marketing distribution)
  4. Remove heliumcli dependency (infra phase 3) #55 — this PR (infra phase 3, removes heliumcli)
  5. Remove deprecated legacy frontend code, UserProfile, and Twilio SMS platform#967 — platform deprecated-API removal

Test plan

  • make install clones each project on a fresh checkout
  • make install fetches+pulls each project on an existing checkout
  • PROJECTS="platform frontend" make install honors the override
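
The first and third checks above could be backed by a small helper like the one below, which lists any project that has no checkout yet. It is a sketch only: the function name missing_projects and the assumption that each project lives in its own subdirectory with a .git folder are illustrative, not taken from the PR.

```shell
# Hypothetical helper: print every project from the argument list that
# is not yet cloned under the given directory (one name per line).
missing_projects() {
  local projects_dir="$1"
  shift
  local name
  for name in "$@"; do
    if [ ! -d "${projects_dir}/${name}/.git" ]; then
      echo "$name"
    fi
  done
}
```

After a make install run, calling missing_projects with the checkout directory followed by the project names should print nothing. Any name it does print points at a project the install step skipped.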

alexdlaird and others added 30 commits April 5, 2026 21:46
Split the single avg_notes_per_user timeseries into multiple Datadog requests to show entity-level breakdowns (Total, Homework, Event, Resource, Standalone). The Total query now excludes entity tags (!entity:*) and the series include explicit style settings (palettes, line_type, line_width) and metadata alias names for clearer legend labels and visualization.
Expand inline style blocks into multi-line blocks in terraform/modules/global/datadog/main.tf for the helium_user_behavior dashboard. The change breaks out palette, line_type, and line_width into separate lines for three requests (avg_notes_per_user, avg_reminders_per_user, avg_attachments_per_user) to improve readability and maintain consistent formatting.
Create a dedicated CloudWatch log group for ECS (/ecs/helium_platform_${var.environment}) with 7-day retention and update ECS task definitions to reference it (remove awslogs-create-group). Add three CloudWatch Logs Insights saved queries (Errors, Celery Task Failures, Push Notifications) for faster troubleshooting. Emit a CloudWatch metric via a log metric filter for Celery task failures (Helium/Platform::CeleryTaskFailure with Environment dimension) so external alerting can rely on log-derived metrics. Add a Datadog query alert monitor that alerts on spikes in the CloudWatch-derived celery_task_failure metric (threshold >3 in last hour) to notify support of elevated task exceptions.
Add one-time Terraform import files for dev and prod environments to bring pre-existing CloudWatch log groups under Terraform management. Each file imports module.ecs.aws_cloudwatch_log_group.platform using the existing IDs (/ecs/helium_platform_dev and /ecs/helium_platform_prod). These import blocks are only needed for the initial import and can be removed after the first successful apply.
Update aws_cloudwatch_log_group retention for /ecs/helium_platform_${var.environment} from 7 to 30 days to retain logs longer for debugging and operational analysis.
Delete dev and prod environment import files that contained one-time import blocks for module.ecs.aws_cloudwatch_log_group.platform (ids: /ecs/helium_platform_dev and /ecs/helium_platform_prod). These were only needed for the initial resource import and are safe to remove after the first successful apply.
Remove the datadog_monitor "celery_task_failures_cloudwatch" from terraform/modules/global/datadog/monitors.tf and delete the "Helium/prod" entry from the Datadog AWS integration metrics list in terraform/modules/global/datadog/integration.tf. This cleans up an existing CloudWatch-based Celery alert and removes the Helium/prod namespace from the integration configuration.
Update CloudWatch Logs query and metric filter patterns to match current log formats. The celery_task_failures query and corresponding metric filter were changed from matching "Task raised exception" to "raised unexpected" to capture the new error phrasing. The push_notifications query was adjusted to look for the service namespace "helium.common.services.pushservice" or "push notification" to more precisely target push service logs.
Create an SNS topic and email subscription (support@heliumedu.com) and add a CloudWatch metric alarm to detect spikes in Celery task failures. The alarm watches the Helium/${var.environment} namespace metric CeleryTaskFailure (Sum) over a 1-hour period and fires when >5 failures; it sends alarm and OK notifications to the SNS topic. All new resources are conditional on var.environment == "prod". Also update a nearby comment to clarify the purpose of the CloudWatch alarm.
alexdlaird and others added 18 commits May 25, 2026 16:32
Move the "Feature Health (Adoption %)" widget group within terraform/modules/global/datadog/main.tf to a later position in the dashboard definition. This is a pure reordering of the existing widget group (queries and conditional formats unchanged) to adjust the dashboard layout and grouping order.
Introduce a local set of user data distribution metrics and create datadog_metric_tag_configuration resources for each metric. Each metric is configured as a distribution, tagged with [env, staff, window, entity], and has percentiles enabled to improve metric tagging and analysis.
Rename local and resource identifiers from user_data_distribution_* to user_distribution_* and update the datadog_metric_tag_configuration for_each to use the new local. Also add two new user engagement distribution metrics (platform.users.engagement.completions_per_user and platform.users.engagement.graded_homework_per_user) to the metrics set.
Refactor helium_user_behavior dashboard: rename group title to "Feature Adoption (% of Active Users)", append .fill(last) to multiple adoption metric queries to fill missing points, and add timeseries_background { type = "area" } to improve visualization. Also remove a duplicated/older group of timeseries widgets to clean up redundancy.
Expose a fixed set of time window options for the dashboard template variable by adding available_values (1d, 7d, 30d, 90d, 180d). This makes the `window` template variable selectable from predefined ranges and includes minor formatting alignment in the resource block.
Set live_span = "1mo" on multiple timeseries and gauge widgets in the helium_user_behavior Datadog dashboard to standardize the default live view to one month. Changes applied in terraform/modules/global/datadog/main.tf for various metrics and adoption percentage widgets to ensure consistent chart behavior.
Update live_span from "1mo" to "3mo" in terraform/modules/global/datadog/main.tf for the datadog_dashboard.helium_user_behavior resource. Extends the dashboard lookback window to three months across multiple user behavior and adoption metric widgets to improve trend visibility and analysis.
Replace PNG with SVG for the Patreon support badge. Update README to reference the SVG asset and adjust CloudFront rewrites to route /img/support-patreon.svg to the landing site equivalent. This ensures the vector asset is used for better scalability and potentially smaller file size.
…wn-phase-1-2026-08-01

# Conflicts:
#	terraform/modules/environment/cloudfront/rewrites.js
Add mapping for '/img/support-patreon.png' in CloudFront rewrites to point to 'https://landing.heliumedu.com/img/support-patreon.png', ensuring the PNG asset is served alongside the existing SVG variant.
…wn-phase-1-2026-08-01

# Conflicts:
#	terraform/modules/environment/cloudfront/rewrites.js
…2026-08-01' into feature/legacy-shutdown-phase-2-2026-08-01
…2026-08-01' into feature/legacy-shutdown-phase-3-2026-08-01
@alexdlaird alexdlaird marked this pull request as draft May 26, 2026 18:36