fix: apply retention policy to derived indexes - search metadata #4116
Open
neuralmint wants to merge 7 commits into
…on-agent#3947)

Bounty orchestration-agent#3947: Bound retry metadata growth on repeated failures.

Changes:
- Added MAX_RETRY_METADATA hard cap (100) to prevent unbounded retry counter growth.
- Added dead_letter store for permanently failed tasks.
- fail() now enforces the repeated-failures invariant before re-enqueueing: tasks that exceed max_retries or the hard cap go to dead letter instead.
- enqueue() rejects tasks past the hard cap and returns None for the caller to handle.
- Added preserve_retries parameter to enqueue() so retry metadata is preserved during re-enqueue (idempotent retry path).
- Scheduled task promotion (dequeue) also respects the hard cap.
- Added detailed logging for all retry/rejection decisions.
- Backward compatible: default max_retries remains 3; existing callers unaffected.
- Regression tests cover: repeated-failures trigger, metadata bound, idempotent fail, dead-letter isolation, exhausted enqueue rejection.
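The retry-bounding behaviour described in this commit can be sketched roughly as follows. This is an illustrative reconstruction, not the code from the diff: the `TaskQueue` class, the task dict layout (`id`, `retries`), and the boolean return value of `fail()` are assumptions. Only `MAX_RETRY_METADATA`, `dead_letter`, `max_retries`, `preserve_retries`, and the `None` return from `enqueue()` come from the commit message.

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

MAX_RETRY_METADATA = 100  # hard cap named in the commit message


class TaskQueue:
    """Hypothetical queue; only the retry-bounding logic is sketched."""

    def __init__(self, max_retries: int = 3) -> None:
        self.max_retries = max_retries
        self.queue: list = []
        self.dead_letter: list = []

    def enqueue(self, task: dict, preserve_retries: bool = False) -> Optional[str]:
        # Fresh enqueues start at zero; the retry path keeps the counter.
        retries = task.get("retries", 0) if preserve_retries else 0
        if retries >= MAX_RETRY_METADATA:
            logger.warning("rejecting task %s: retry cap reached", task["id"])
            return None  # the caller decides what to do with the rejection
        self.queue.append({**task, "retries": retries})
        return task["id"]

    def fail(self, task: dict) -> bool:
        """Return True if the task was re-enqueued, False if dead-lettered."""
        # Clamp the counter so retry metadata can never grow past the cap.
        retries = min(task.get("retries", 0) + 1, MAX_RETRY_METADATA)
        updated = {**task, "retries": retries}
        if retries > self.max_retries or retries >= MAX_RETRY_METADATA:
            logger.warning("dead-lettering task %s after %d failures",
                           task["id"], retries)
            self.dead_letter.append(updated)
            return False
        return self.enqueue(updated, preserve_retries=True) is not None
```

With `max_retries=2`, a task in this sketch is re-enqueued after its first and second failure and moved to `dead_letter` on the third.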
Add an atomic state precondition in the scheduler dequeue path to reject tasks whose associated run has been deleted. This prevents stale, duplicate, or policy-violating transitions when a workflow is removed concurrently with run materialization.

Changes:
- Add tracking set with method
- Add precondition in - rejects both queued and scheduled tasks for deleted runs
- Bounded audit metadata via structured logging (warn-level with run_id and task_id context)
- Fix pre-existing bug: dict now stores task dicts alongside timestamps so data is not lost during promotion
- Wire up WorkflowManager in OrchestrationEngine for future mark_run_deleted integration
- Add 5 deterministic regression tests covering:
  * Dequeue rejection for deleted runs
  * Scheduled task skip for deleted runs
  * Idempotent mark_run_deleted
  * Normal unaffected workflows
  * Isolated deletion between concurrent runs

Closes orchestration-agent#3977
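A minimal sketch of the dequeue precondition, for readers who want the shape of the check without opening the diff. The `Scheduler` class, its `_deleted_runs` attribute, and the use of a `threading.Lock` to make the check-and-pop atomic are assumptions; only `mark_run_deleted` and the `run_id` / `task_id` logging context are named in the commit message.

```python
import logging
import threading
from collections import deque
from typing import Optional

logger = logging.getLogger(__name__)


class Scheduler:
    """Hypothetical scheduler showing only the deleted-run precondition."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._queue = deque()
        self._deleted_runs = set()  # assumed name for the tracking set

    def mark_run_deleted(self, run_id: str) -> None:
        with self._lock:
            self._deleted_runs.add(run_id)  # set.add makes this idempotent

    def enqueue(self, task: dict) -> None:
        with self._lock:
            self._queue.append(task)

    def dequeue(self) -> Optional[dict]:
        # The membership check and the pop share one lock, so a run deleted
        # concurrently is either seen here or the task was already handed out.
        with self._lock:
            while self._queue:
                task = self._queue.popleft()
                if task["run_id"] in self._deleted_runs:
                    logger.warning(
                        "skipping task for deleted run",
                        extra={"run_id": task["run_id"], "task_id": task["id"]},
                    )
                    continue
                return task
            return None
```

Tasks that belong to other runs are unaffected: the loop only discards entries whose `run_id` is in the tracking set.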
Adds a data lake governance module that enforces purpose limitation on ingestion writes. Every data lake write now requires purpose metadata (purpose, data class, owner, destination) and is blocked when the destination is not approved for that data class.

New components:
- DataClassificationRegistry: registers data classes with approved destinations; supports wildcard (all destinations) via empty set
- PurposeMetadata: declares purpose, data class, owner, destination
- IngestionManifest: full manifest for data lake writes
- DataLakeGovernor: validates manifests, enforces policy, records audit log with grouping by purpose and owner
- Custom errors: MissingPurposeMetadataError, DataClassNotRegisteredError, DestinationNotApprovedError

All 19 new tests pass. Existing test suite unaffected.

Closes orchestration-agent#3998
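The purpose-limitation check might look something like the sketch below. The class and error names are taken from the commit message, but every method name (`register`, `is_approved`, `validate`), the field names, and the `audit_log` attribute are assumptions, and `IngestionManifest` and the grouped audit views are left out for brevity.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


class MissingPurposeMetadataError(Exception):
    pass


class DataClassNotRegisteredError(Exception):
    pass


class DestinationNotApprovedError(Exception):
    pass


@dataclass(frozen=True)
class PurposeMetadata:
    purpose: str
    data_class: str
    owner: str
    destination: str


class DataClassificationRegistry:
    def __init__(self) -> None:
        self._approved = {}  # data class -> set of approved destinations

    def register(self, data_class: str, destinations: Iterable[str] = ()) -> None:
        # An empty set is the wildcard: every destination is approved.
        self._approved[data_class] = set(destinations)

    def is_approved(self, data_class: str, destination: str) -> bool:
        if data_class not in self._approved:
            raise DataClassNotRegisteredError(data_class)
        allowed = self._approved[data_class]
        return not allowed or destination in allowed


class DataLakeGovernor:
    def __init__(self, registry: DataClassificationRegistry) -> None:
        self.registry = registry
        self.audit_log = []  # accepted writes, in order

    def validate(self, meta: Optional[PurposeMetadata]) -> None:
        # Every field must be present and non-empty before policy is checked.
        if meta is None or not all(
            (meta.purpose, meta.data_class, meta.owner, meta.destination)
        ):
            raise MissingPurposeMetadataError("purpose metadata is incomplete")
        if not self.registry.is_approved(meta.data_class, meta.destination):
            raise DestinationNotApprovedError(
                f"{meta.destination!r} is not approved for {meta.data_class!r}"
            )
        self.audit_log.append(meta)
```

In this sketch a write is recorded in the audit log only after it passes both checks, so rejected writes leave no entry.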
- Add release workflow (release.yml) that:
  - Triggers on version tags (v*)
  - Builds packages with uv build
  - Generates build provenance attestation via actions/attest-build-provenance
  - Creates GitHub Releases with attested artifacts
  - Publishes to PyPI with attestation support
- Add artifact verification section to README with gh CLI instructions

The attestation includes source repository, commit SHA, workflow run, and artifact digest, enabling consumers to verify artifact provenance.

Closes orchestration-agent#4050
…tadata

Closes orchestration-agent#4088

Multi-stage Dockerfile isolates all build-time-only ARG declarations (BUILD_ENV, PIP_INDEX_URL, UV_VERSION) inside the builder stage. The final runtime stage inherits zero build-time ARGs, preventing leakage into image history, labels, or environment variables.

Changes:
- Dockerfile: two-stage build (builder → final), ARGs only in builder
- .dockerignore: exclude dev/CI artifacts from build context
- infra/docker-compose.yml: pass args only to builder stage
- infra/scripts/audit_image_metadata.sh: CI audit for leaked metadata
- .github/workflows/ci.yml: add docker-build-and-audit job
- Makefile: docker-audit / docker-build-slim targets
…tion-agent#4096)

When a MetricsCollector is created with max_counter=N, the increment() method clamps counter values to N, preventing unbounded growth that can cause downstream metrics systems to reject large integers.

This is opt-in behavior: existing code passing no max_counter continues to work identically.

Closes orchestration-agent#4096
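As a rough illustration of the opt-in clamp, assuming `increment()` takes a counter name and an optional amount and returns the new value (neither detail is stated in the commit message):

```python
from collections import defaultdict
from typing import Optional


class MetricsCollector:
    """Hypothetical collector; only the max_counter clamp is sketched."""

    def __init__(self, max_counter: Optional[int] = None) -> None:
        self.max_counter = max_counter
        self._counters = defaultdict(int)

    def increment(self, name: str, amount: int = 1) -> int:
        value = self._counters[name] + amount
        if self.max_counter is not None:
            # Opt-in: clamp only when a cap was configured.
            value = min(value, self.max_counter)
        self._counters[name] = value
        return value
```

A collector built without `max_counter` never enters the clamping branch, which is what keeps existing callers unaffected.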
Add retention policy hooks and index reconciliation to AgentRegistry so that when task artifacts expire under the primary retention policy the derived group-index entries are also cleaned up.

Changes
-------
- AgentRegistry.set_retention_ttl() / get_retention_ttl(): configure per-type or global retention TTL in seconds.
- AgentRegistry.register_retention_hook(): register callbacks that fire when an agent is removed by the retention policy.
- AgentRegistry.apply_retention_policy(): iterate primary artifacts and remove those whose age exceeds the configured TTL, cleaning both the primary store and the derived group index while firing registered hooks.
- AgentRegistry.reconcile_indexes(): find and remove stale derived index entries that reference agent IDs no longer present in the primary store.
- AgentRegistry.search_metadata(): expose derived index state for introspection and search-metadata consumers.

Tests
-----
- 22 new tests covering: search metadata, retention TTL configuration, policy application, hook invocation, index reconciliation, integration acceptance criteria from the issue.

Closes orchestration-agent#4114
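The cascade from the primary store to the derived group index could be sketched as below. The method names follow the commit message; the argument lists, the record layout (`type`, `created_at`), the `register()` helper, the `now` parameter, and the return values are assumptions made so the example is self-contained.

```python
import time
from collections import defaultdict
from typing import Callable, Optional


class AgentRegistry:
    """Hypothetical registry showing only the retention cascade."""

    def __init__(self) -> None:
        self._agents = {}                    # primary store: id -> record
        self._group_index = defaultdict(set)  # derived index: type -> ids
        self._ttls = {}                      # type -> seconds; None = global
        self._hooks = []

    def register(self, agent_id: str, agent_type: str,
                 created_at: Optional[float] = None) -> None:
        created = time.time() if created_at is None else created_at
        self._agents[agent_id] = {"type": agent_type, "created_at": created}
        self._group_index[agent_type].add(agent_id)

    def set_retention_ttl(self, ttl_seconds: float,
                          agent_type: Optional[str] = None) -> None:
        self._ttls[agent_type] = ttl_seconds

    def get_retention_ttl(self, agent_type: Optional[str] = None) -> Optional[float]:
        # A per-type TTL wins; otherwise fall back to the global one, if any.
        return self._ttls.get(agent_type, self._ttls.get(None))

    def register_retention_hook(self, hook: Callable[[str, dict], None]) -> None:
        self._hooks.append(hook)

    def apply_retention_policy(self, now: Optional[float] = None) -> list:
        now = time.time() if now is None else now
        removed = []
        for agent_id, record in list(self._agents.items()):
            ttl = self.get_retention_ttl(record["type"])
            if ttl is None or now - record["created_at"] <= ttl:
                continue
            # Remove from the primary store AND the derived index together.
            del self._agents[agent_id]
            self._group_index[record["type"]].discard(agent_id)
            for hook in self._hooks:
                hook(agent_id, record)
            removed.append(agent_id)
        return removed

    def search_metadata(self) -> dict:
        return {t: sorted(ids) for t, ids in self._group_index.items() if ids}
```

Passing `now` explicitly keeps the example deterministic; a real implementation may read the clock differently.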
Bug
When task artifacts expire under the primary retention policy, derived index entries (group index by agent type) can remain because cleanup only targets primary artifact records. This leaves stale operational storage metadata.
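To make the failure mode concrete, the sketch below shows a reconciliation pass over a primary store and a group index held as plain dicts. The function name mirrors `reconcile_indexes()` from this PR, but the standalone signature and the data layout are assumptions for illustration only.

```python
def reconcile_indexes(primary: dict, group_index: dict) -> list:
    """Drop index entries whose agent ID is missing from the primary store.

    `primary` maps agent ID -> record; `group_index` maps agent type -> set
    of agent IDs. Returns the stale IDs that were removed.
    """
    stale = []
    for agent_type in list(group_index):
        dead = {i for i in group_index[agent_type] if i not in primary}
        group_index[agent_type] -= dead
        stale.extend(sorted(dead))
        if not group_index[agent_type]:
            del group_index[agent_type]  # no empty groups left behind
    return stale
```

Deleting a record from `primary` alone reproduces the bug: the ID stays in `group_index` until a pass like this one removes it.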
Fix
- set_retention_ttl()
- register_retention_hook() to react to retention-driven removal
- apply_retention_policy(): cascade deletion from primary artifacts to derived indexes when TTL is exceeded
- reconcile_indexes(): periodic reconciliation to find and remove stale derived index entries
- search_metadata(): expose derived index state for introspection

Acceptance Criteria
Test Output
Closes #4114