
Gcp demo #511

Merged

dylanratcliffe merged 2 commits into main from gcp-demo on Mar 26, 2026

Conversation

@dylanratcliffe
Member

No description provided.

@github-actions

Open in Overmind ↗



🔴 Change Signals

Routine 🔴 ▇▅▃▂▁ Multiple AWS resources show unusually infrequent update patterns: the SNS topic subscription recorded only 2 events/week over the last 3 months, and the EC2 instances only 1 event/week over the last 2-3 months, well below their typical cadence.

View signals ↗


🔥 Risks

Unconfirmed first SNS email subscription leaves production alerts with no active recipient during EC2 rollout ❗Medium Open Risk ↗
The production-api-alerts SNS topic currently has no confirmed or pending subscriptions, and this change adds its first notification target as an email subscription to alerts@example.com. Because the subscription sets endpoint_auto_confirms=false (and email endpoints can never auto-confirm), Amazon SNS will not deliver notifications until the recipient manually confirms the subscription. The Terraform setting confirmation_timeout_in_minutes = 1 does not change SNS behavior (it only governs how long Terraform waits on HTTP/HTTPS endpoints), so the apply cannot rely on the endpoint being active during rollout.

As a result, while the two production EC2 instances are being updated, any alarms or alerts published to production-api-alerts will have no active destination. That creates a real observability gap: incidents triggered by the instance changes can occur without any operator receiving the notification, weakening detection and response for this deployment.
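Based on the attributes cited above, the planned subscription likely resembles the following sketch; the resource name and topic reference are assumptions, and only the attribute values come from the analysis:

```hcl
# Hypothetical reconstruction of the planned change; names are illustrative.
resource "aws_sns_topic_subscription" "alerts_email" {
  topic_arn = aws_sns_topic.production_api_alerts.arn # assumed reference
  protocol  = "email"
  endpoint  = "alerts@example.com"

  # Email endpoints can never auto-confirm; SNS keeps the subscription in
  # "pending confirmation" until the recipient clicks the emailed link.
  endpoint_auto_confirms = false

  # Only affects how long Terraform waits on HTTP/HTTPS endpoints; it does
  # not change SNS delivery behavior for email subscriptions.
  confirmation_timeout_in_minutes = 1
}
```

If this matches the real plan, the apply succeeds while the subscription stays pending, which is exactly the observability gap described above.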

New GCP bucket lacks explicit retention and governance controls, making future data loss and compliance drift likely ❗Medium Open Risk ↗
The change creates a new GCP bucket, github.com/overmindtech/terraform-example.gcp-storage-bucket.test, with force_destroy=true but without explicit object versioning, retention, lifecycle, ownership labels, or customer-managed encryption. Google Cloud Storage’s current default soft-delete behavior likely prevents immediate permanent deletion on first destroy, so the specific hypothesis about instant total loss is overstated. However, the bucket is still being introduced without the explicit recovery and governance controls this environment already uses on other GCP buckets.

If this bucket is later used for logs, exports, build artifacts, or rollback data, deletes and overwrites will rely only on default platform behavior rather than on a codified retention policy, and a future Terraform destroy will not be blocked by residual data because force_destroy is enabled. After the soft-delete window, the data will be gone, and the lack of labels and explicit encryption/retention settings makes the bucket non-compliant with organizational governance requirements.
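The missing controls could be made explicit in the bucket definition. A minimal sketch, assuming the resource address google_storage_bucket.test and the bucket name cited later in this analysis; label values and lifecycle thresholds are illustrative:

```hcl
resource "google_storage_bucket" "test" {
  name                        = "overmind-terraform-example-tf-test" # assumed name
  location                    = "EUROPE-WEST2"
  uniform_bucket_level_access = true

  # Block terraform destroy while objects remain, instead of force-deleting.
  force_destroy = false

  # Explicit versioning rather than relying on the default soft-delete window.
  versioning {
    enabled = true
  }

  # Codified retention: keep a few noncurrent versions, then clean up.
  lifecycle_rule {
    condition {
      num_newer_versions = 3
    }
    action {
      type = "Delete"
    }
  }

  # Ownership labels required by the governance guidance cited below.
  labels = {
    environment = "test"
    owner       = "platform-team" # illustrative value
  }
}
```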


🧠 Reasoning · ✔ 2 · ✖ 2

SNS email subscription: data exfiltration and monitoring gap risk

Observations 2

Hypothesis

A new SNS email subscription alerts@example.com to the production-api-alerts topic in eu-west-2 introduces both data exposure and observability gaps. Email endpoints can exfiltrate sensitive alert content if the address is external, unapproved, or not a corporate mailbox, and may sit outside standard monitoring/audit flows. Additionally, the subscription uses endpoint_auto_confirms=false with a short confirmation timeout, so during rollout there may be a window where alerts are not actually delivered while EC2 instances are changing. This weakens incident response for issues triggered by the instance changes. Validate that the email endpoint is an approved corporate address, confirm the subscription end‑to‑end before or during rollout, ensure topic policy restricts who can subscribe, and confirm there is at least one already‑working notification target so monitoring coverage is not temporarily lost.

Investigation

Evidence Gathered

I first loaded the relevant organizational guidance for monitoring, IAM/access control, and security compliance. The monitoring guidance says alarms without SNS notification targets are a risk, but it does not prohibit email specifically. The IAM guidance says resource policies with broad principals are risky only when they lack restrictive conditions. The security compliance guidance is focused on encryption and network access and does not define approved mailbox domains or ban SNS email endpoints.

I then checked the current state of the SNS topic 540044833068.eu-west-2.sns-topic.arn:aws:sns:eu-west-2:540044833068:production-api-alerts and the two EC2 instances in blast radius. The topic currently has SubscriptionsConfirmed: 0 and SubscriptionsPending: 0, so there is no existing confirmed notification target on this topic today. The only notification target being added in this plan is the new aws_sns_topic_subscription with protocol: email, endpoint: alerts@example.com, endpoint_auto_confirms: false, and confirmation_timeout_in_minutes: 1. I also checked AWS/Terraform documentation: for SNS email subscriptions, the endpoint must confirm before it receives notifications, and unconfirmed subscriptions remain pending until confirmed; SNS does not deliver notifications to an unconfirmed subscription. AWS also documents that the default topic policy using AWS:SourceOwner restricts publish/subscribe administration to the owning account.

The topic policy is currently the standard default owner-only policy: Principal: {"AWS":"*"} constrained by a StringEquals condition on AWS:SourceOwner: "540044833068". That is not public anonymous subscribe access; it is the normal same-account restriction. I found no concrete evidence in the change or current state that alerts@example.com is an approved corporate mailbox, nor any evidence that alert payloads here contain sensitive data. That part of the hypothesis remains unverified speculation.

Impact Assessment

The real, evidenced risk is the monitoring gap during and after rollout. There is 1 SNS topic directly affected and 1 new subscription being created on it. That topic is the alerting path named production-api-alerts, and the change introduces its first subscription. Because the subscription is email-based and endpoint_auto_confirms=false, it will stay pending until someone clicks the confirmation link in the mailbox. Until that happens, the topic has zero working delivery targets.

The blast radius includes 2 EC2 instances: i-0464c4413cb0c54aa (api-server) and i-09d6479fb9b97d123 (api-207c90ee-api-server). Both are being updated in the same change, so any alerts associated with their rollout or resulting failures would publish to a topic that currently has no confirmed subscribers and, under this plan, may still have no confirmed subscribers during apply. Operationally, that means alarm notifications can be generated but reach nobody. The scope is limited to this production alert topic and whatever alarms publish to it, but because the topic name is production-specific and there are no alternate subscribers visible on the topic, the coverage gap affects the entire notification channel for these instance changes.

Conclusion

I conclude the risk is real. The data-exfiltration claim is not proven from the available evidence, but a closely related and more concrete failure in the same concern area is present: this change creates the first alert subscription as an unconfirmed SNS email endpoint, so production-api-alerts will have zero active notification targets until manual confirmation occurs, creating a real monitoring gap during the EC2 rollout.

✔ Hypothesis proven
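One way to close the confirmation gap, sketched under the assumption that a same-account queue is acceptable as a second subscriber (the queue name is hypothetical): same-account SQS subscriptions are confirmed automatically, so the topic would have an active delivery target that does not depend on a human clicking a link.

```hcl
# Hypothetical fallback subscriber; same-account SQS subscriptions do not
# require manual confirmation. A queue policy granting SNS sqs:SendMessage
# on this queue is also required for delivery (omitted for brevity).
resource "aws_sqs_queue" "alerts_fallback" {
  name = "production-api-alerts-fallback" # illustrative name
}

resource "aws_sns_topic_subscription" "alerts_queue" {
  topic_arn = aws_sns_topic.production_api_alerts.arn # assumed reference
  protocol  = "sqs"
  endpoint  = aws_sqs_queue.alerts_fallback.arn
}
```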


GCP bucket with force_destroy and weak governance: data retention and exfiltration risk

Observations 3

Hypothesis

Creation of the GCP bucket google_storage_bucket.test in europe-west2 with force_destroy=true and no explicit versioning/retention controls introduces multiple storage and compliance risks. force_destroy=true allows deletion of all objects, including any non‑current versions or backups, which can violate data retention, backup, and recovery requirements. Versioning state is unspecified, so there may be no protection against accidental overwrites or deletions. If this bucket is used as a shared landing zone for logs, builds, exports, or other recovery artifacts, a later destroy/recreate cycle could erase the only copy of data needed for rollback or incident investigation. In addition, because the bucket resides in GCP rather than AWS, it falls outside AWS‑native detective controls (CloudTrail, Config, GuardDuty, CloudWatch), creating a blind spot for data exfiltration and incident response if EC2 instances or tooling start writing data there. Confirm the intended data stored in this bucket, ensure appropriate versioning, lifecycle, and retention are configured, and implement equivalent logging, access review, and encryption controls in GCP aligned with organizational data residency and backup policies.

Investigation

Evidence Gathered

I first checked the relevant organizational guidance. aws-storage-data-management says buckets holding critical data without versioning should be flagged because accidental deletes and overwrites become permanent (REL09-BP01). aws-resource-governance says resources created without mandatory ownership/environment tags are a governance finding, and non-production resources should have TTL or cleanup controls. security-compliance-requirements requires customer-controlled encryption for data at rest with no exceptions, even for test resources. infrastructure-quick-reference also notes that this environment contains testing infrastructure and that test buckets are normally given explicit short-lived lifecycle controls to avoid cost and governance drift.

I then examined the planned diff for the new GCP bucket github.com/overmindtech/terraform-example.gcp-storage-bucket.test. The only explicit protections in the plan are uniform_bucket_level_access=true; the bucket is being created with force_destroy=true, labels=null, no explicit retention configuration (default_event_based_hold=null, enable_object_retention=null), and versioning left as (known after apply) rather than being configured. Because this is a create, there is no pre-existing bucket state to inherit. I also checked the existing GCP bucket in blast radius, overmind-scale-test.gcp-storage-bucket.gcf-v2-sources-332460912908-europe-west1, as a comparator: it has versioning.enabled=true, a lifecycle rule deleting noncurrent versions, and a softDeletePolicy.retentionDurationSeconds of 604800 (7 days). That shows this environment already uses explicit recovery controls on GCP buckets when they matter.

For documentation, I verified Google Cloud Storage behavior. Google’s Cloud Storage documentation says soft delete is enabled by default on newly created buckets with a seven-day retention unless changed, and deleting a bucket or object under that policy places it into a soft-deleted state that can be restored during the retention window. It also says object deletions are permanent when versioning is not enabled, aside from soft-delete behavior, and that enabling Object Versioning together with soft delete provides stronger protection against accidental or malicious deletion. Those docs materially weaken the hypothesis’s claim that force_destroy=true alone means immediate irreversible deletion of all contents, because the provider/cloud default soft-delete policy will likely give a seven-day recovery window even if Terraform destroys the bucket. But they do not remove the governance and retention problem: the plan still omits explicit versioning, retention, lifecycle, labels, and customer-managed encryption controls.

Impact Assessment

The directly affected resource count is 1: the new bucket github.com/overmindtech/terraform-example.gcp-storage-bucket.test. The two EC2 instances and SNS subscription in the same plan are not configured to reference this bucket in any diff I was shown, and blast-radius data does not establish an actual write path from those instances to the new bucket, so I did not treat the exfiltration path to those AWS resources as proven.

The real blast radius is governance and recoverability of whatever data is later placed in this bucket. Because the bucket is created with force_destroy=true, a future Terraform destroy will not be blocked by residual objects. Since the plan does not explicitly enable object versioning or object retention, overwrites and deletes in this bucket will rely only on Cloud Storage’s default soft-delete window rather than on stronger, durable recovery controls. That means any logs, exports, build artifacts, or rollback data written here would have at most the default recovery behavior, not a codified retention policy, and nothing in the plan ensures longer-term preservation for investigations or rollback. The bucket is also created with no labels and no explicit customer-managed encryption configuration, which violates the organization’s stated governance and encryption standards if any real data is stored there.

Operationally, this is not an immediate outage risk. It is a storage governance and data protection risk scoped to this one new bucket. If teams start using it as a landing zone, accidental overwrite, cleanup, or Terraform destroy can remove the only retained copy after the soft-delete window, and the lack of explicit ownership/retention/encryption settings makes that failure mode hard to govern and audit.

Conclusion

Risk is real. The hypothesis overstates the immediate irrecoverable-deletion mechanism because Google Cloud Storage now defaults new buckets to soft delete, but the change still creates a real storage governance and retention gap: a new bucket with force_destroy=true, no explicit versioning/retention policy, no labels, and no explicit customer-managed encryption is unsafe for any non-trivial data and violates organizational data protection expectations.

✔ Hypothesis proven


EC2 instance lifecycle, ENI, and public IP changes affecting stability and connectivity

Observations 3

Hypothesis

Changes to EC2 instance i-0464c4413cb0c54aa and reliance on instance‑level networking introduce availability and connectivity risks. The attached EBS volume vol-0a61278f4602fc12b has DeleteOnTermination=true, so instance replacement, termination, or lifecycle operations (including some stop/start or recreation flows) can delete or detach the volume, causing data loss or service interruption if the volume holds stateful data without adequate backups or snapshots. The associated ENI eni-069a58a392f35dce3 may have its primary IP or attachment attributes changed, which could invalidate the fixed IP 10.0.101.133 used by a DNS A record, making clients that depend on the stable IP unable to reach the service. Both EC2 instances also expose public IPs/DNS that are changing to known after apply, so any external clients, firewall rules, or scripts pinned to the old public IPs will break once addresses change. The absence of a stable front door (ALB/CloudFront) or static address leaves workloads dependent on unstable instance‑level endpoints, violating reliability and security best practices. Review instance lifecycle and AMI behavior to ensure critical data volumes are preserved or backed up, confirm ENI settings (including PreservePrivateIpAssignments) to avoid unintended IP changes, and identify any consumers pinned to existing public or private IPs so they can be migrated to stable endpoints.

Investigation

Evidence Gathered

I first checked the relevant organizational knowledge for compute, availability, network security, quick-reference notes, multi-region architecture, and security/compliance. The strongest applicable guidance says direct EC2 public endpoints are an anti-pattern and that production EC2 instances should not have public IPs, but those are pre-existing design issues rather than risks introduced by this specific change. The quick-reference file also says the scale-test environment includes EC2 instances created mainly for relationship density, but these particular instances are tagged Environment=production, so I did not rely on that note to dismiss the concern.

I then queried the current state of the affected resources in the blast radius: EC2 instances 540044833068.eu-west-2.ec2-instance.i-0464c4413cb0c54aa and 540044833068.eu-west-2.ec2-instance.i-09d6479fb9b97d123, ENI 540044833068.eu-west-2.ec2-network-interface.eni-069a58a392f35dce3, EBS volume 540044833068.eu-west-2.ec2-volume.vol-0a61278f4602fc12b, DNS name global.dns.ip-10-0-101-133.eu-west-2.compute.internal, plus the attached security group and subnet. The queried state shows i-0464c4413cb0c54aa currently has primary private IP 10.0.101.133, ENI eni-069a58a392f35dce3 attached as device index 0, and root EBS volume vol-0a61278f4602fc12b attached with DeleteOnTermination=true. The DNS object ip-10-0-101-133.eu-west-2.compute.internal is the standard EC2 private DNS name resolving to 10.0.101.133. The only planned diffs for either instance are public_dns and public_ip changing from concrete current values to (known after apply); there are no diffs for private_ip, ENI attachment, AMI, instance type, subnet, security group, block device mappings, or termination behavior.

I also queried all planned changes for both EC2 instances. That confirmed the plan contains no hidden replacement, no stop/start-triggering attribute change, and no root volume or network-interface change beyond Terraform recomputing the public address fields. To validate the semantics, I checked AWS EC2 documentation. AWS documents that a private IPv4 address remains associated with the network interface through stop/start and is only released when the instance is terminated, while auto-assigned public IPv4 addresses are released on stop/hibernate/terminate and a new one may be assigned on next start. Those docs support the general hypothesis that public IPs are unstable and that instance-level public endpoints are weak design, but they do not show that this plan will actually perform the lifecycle event needed to change them. Terraform also commonly shows provider-computed attributes like public_ip and public_dns as (known after apply) even for in-place updates where no meaningful networking change occurs.

Impact Assessment

Directly affected by the plan are 2 EC2 instances: i-0464c4413cb0c54aa (api-server) and i-09d6479fb9b97d123 (api-207c90ee-api-server). The only explicitly referenced networking dependency in the blast radius is the private DNS record ip-10-0-101-133.eu-west-2.compute.internal, which resolves to the same private IP already configured on the ENI. There is also 1 attached root EBS volume in the blast radius for i-0464c4413cb0c54aa.

However, the proposed change does not alter the private IP, does not replace or reattach the ENI, does not change the root volume attachment, and does not introduce any new instance lifecycle action in the diff. That means the concrete failures described in the hypothesis are not evidenced here. There is no support for “DNS A record to 10.0.101.133 will break” because the planned value of private_ip remains 10.0.101.133 and no ENI change is planned. There is no support for “volume deletion/data loss” because the plan is not replacing or terminating the instance or volume. There is no support for “external clients pinned to old public IP will break because this change rotates the public IP” because the plan does not show the trigger that would cause AWS to release and reassign the public IP. The real operational issue is that both instances already rely on direct public IPs and one also has a security group allowing SSH from 0.0.0.0/0, but that is existing exposure, not a change-induced outage path from this plan.

Conclusion

I conclude the risk is not real for this specific change. The hypothesis correctly identifies general fragility around instance-level public IPs and DeleteOnTermination root volumes, but the actual plan only shows computed public address fields becoming (known after apply) and provides no evidence of instance replacement, stop/start, ENI mutation, or volume lifecycle change that would cause connectivity loss or data loss.

✖ Hypothesis disproven
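Although this change itself is benign, the underlying fragility the hypothesis points at (auto-assigned public IPv4 addresses being released on stop/hibernate/terminate) can be removed by pinning an Elastic IP. A minimal sketch, using the instance ID from the analysis; the resource names are illustrative, and an ALB or other stable front door would be the stronger fix the guidance already recommends:

```hcl
# Sketch only: pins a stable public IPv4 address to the api-server instance
# so external consumers are not broken by auto-assigned IP rotation.
resource "aws_eip" "api_server" {
  domain = "vpc"
}

resource "aws_eip_association" "api_server" {
  instance_id   = "i-0464c4413cb0c54aa"
  allocation_id = aws_eip.api_server.id
}
```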


Cross‑cloud AWS→GCP data path outside AWS detective controls

Observations 1

Hypothesis

Establishing a data path from AWS workloads to the GCP bucket gcp-storage-bucket.test in europe-west2 creates a cross‑cloud channel that is outside existing AWS‑native monitoring and control frameworks. If EC2 instances or operational tooling begin writing logs, artifacts, or copied data to this bucket, those transfers may bypass CloudTrail, Config, GuardDuty, and CloudWatch coverage, reducing visibility into access patterns and potential exfiltration. This cross‑cloud pattern must be governed to ensure consistent logging, access controls, encryption, and incident response across both providers. Investigate exactly what data will traverse from AWS to this bucket and align GCP controls and monitoring with AWS standards before treating it as a production dependency.

Investigation

Evidence Gathered

I first checked the relevant organizational guidance: aws-monitoring-detection, aws-data-protection, security-compliance-requirements, and infrastructure-quick-reference. Those standards do make cross-environment monitoring and encryption important, and they specifically require controlled encryption and observability for production data paths.

I then examined all planned diffs for the four changed resources. The only GCP change is creation of a new gcp-storage-bucket.test bucket in EUROPE-WEST2 with uniform_bucket_level_access = true. There are no IAM bindings, service accounts, bucket notifications, object lifecycle rules for business data, transfer jobs, routes, DNS records, startup scripts, application config changes, or credentials that would allow AWS workloads to write to that bucket. The two EC2 instance diffs only change public_ip and public_dns to (known after apply), which is a computed-value refresh rather than a functional config change. The SNS change is an unrelated email subscription to an AWS SNS topic.

I queried the current blast-radius state for both EC2 instances, the VPC, one security group, the SNS topic, and an existing GCP bucket. The current EC2 instances are just ordinary instances in eu-west-2; one has no IAM instance profile at all, and the other has an AWS IAM instance profile but nothing in the change indicates new GCP credentials, Workload Identity Federation, or any storage client configuration. The queried security group allows outbound internet access, which means a cross-cloud path is theoretically possible, but there is no evidence in the plan that this change creates or starts using such a path. The existing GCP bucket in blast radius only shows that GCP storage exists elsewhere in the environment; it does not create a dependency from these AWS instances to the new bucket.

I also checked product documentation. Google Cloud Storage documentation confirms that uniform bucket-level access only changes authorization to bucket-level IAM and does not itself grant any principals access. Google documentation also shows that CMEK/default customer-managed encryption must be configured explicitly for a bucket; this new bucket diff does not do that. Public access prevention is also not explicitly enforced in the diff; if not inherited from organization policy, it may remain unset. Those are real bucket-hardening concerns, but they are GCP bucket governance issues, not evidence that AWS data will traverse to this bucket as part of this change. AWS documentation and the org monitoring guidance support the hypothesis that AWS-native controls would not fully observe arbitrary application-level uploads from EC2 to an external cloud service, but that only matters if the change actually introduces such uploads.

Impact Assessment

Directly affected by this change are 4 resources: 1 new GCP storage bucket, 2 EC2 instances with recomputed public IP/DNS fields, and 1 new SNS email subscription. Of the hypothesis-named resources, both EC2 instances remain in the same AWS VPC and subnet, and no downstream application, IAM, or networking resource in the plan is updated to reference the new bucket.

Operationally, I found no planned data path from AWS to gcp-storage-bucket.test. There are zero changed resources that configure object writes, log export, replication, rsync jobs, agents, service account keys, or application settings targeting overmind-terraform-example-tf-test. Because no producer, credential, or integration is being introduced, there is no concrete blast radius for data exfiltration or observability loss from this specific change. The scope of the actual change is limited to bucket creation plus unrelated AWS metadata/subscription updates.

There is still a governance gap worth noting outside this binary decision: the bucket is being created without explicit CMEK, explicit public access prevention in the diff, versioning, or logging controls. If this bucket were later used for production data, that would need review. But this investigation is about whether this change creates the hypothesized AWS→GCP monitored-data-path risk, and the evidence does not show that it does.
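The hardening gaps named above could be closed explicitly in the bucket definition. A sketch assuming a pre-existing customer-managed KMS key (the key reference is hypothetical); note that CMEK also requires the Cloud Storage service agent to hold roles/cloudkms.cryptoKeyEncrypterDecrypter on the key:

```hcl
resource "google_storage_bucket" "test" {
  name                        = "overmind-terraform-example-tf-test" # assumed name
  location                    = "EUROPE-WEST2"
  uniform_bucket_level_access = true

  # Enforce rather than inherit from org policy: no public access, ever.
  public_access_prevention = "enforced"

  # Customer-managed encryption, per the security-compliance requirement.
  encryption {
    default_kms_key_name = google_kms_crypto_key.bucket_key.id # hypothetical key
  }
}
```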

Conclusion

I conclude the risk is not real for this change. The key evidence is that the plan creates only a standalone GCP bucket; it does not add any IAM, credentials, application configuration, or transfer mechanism that would cause the affected AWS instances or tooling to send data to it.

✖ Hypothesis disproven


💥 Blast Radius

Items 10

Edges 26


@github-actions github-actions bot left a comment


Overmind

⛔ Auto-Blocked


🔴 Decision

Auto-blocked: Routine score (-5) is below minimum (-1)


📊 Signals Summary

Routine 🔴 -5


🔥 Risks Summary

High 0 · Medium 2 · Low 0


💥 Blast Radius

Items 10 · Edges 26


View full analysis in Overmind ↗

@dylanratcliffe dylanratcliffe merged commit bb25044 into main Mar 26, 2026
6 of 7 checks passed
@dylanratcliffe dylanratcliffe deleted the gcp-demo branch March 26, 2026 19:50
