fix: apply retention policy to derived indexes — search metadata #4116

Open
neuralmint wants to merge 7 commits into orchestration-agent:main from neuralmint:fix/retention-derived-indexes-4114

Conversation

@neuralmint

Bug

When task artifacts expire under the primary retention policy, their derived index entries (the group index keyed by agent type) can remain, because cleanup only targets the primary artifact records. The result is stale metadata left behind in operational storage.

Fix

  • Retention TTL — configure global or per-type TTLs via set_retention_ttl()
  • Retention hooks — register callbacks with register_retention_hook() to react to retention-driven removal
  • apply_retention_policy() — cascade deletion from primary artifacts to derived indexes when TTL is exceeded
  • reconcile_indexes() — periodic reconciliation to find and remove stale derived index entries
  • search_metadata() — expose derived index state for introspection
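
For reviewers, the TTL, hook, and cascade pieces listed above might fit together roughly as follows. This is a minimal sketch, not the code in this PR: the class name `RetentionSketch`, the `add()` helper, the `now` parameter, the hook signature, and the internal field names are all assumptions made for illustration. Only the three method names come from the description.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Set


class RetentionSketch:
    """Toy registry: a primary store plus a derived group index by agent type."""

    def __init__(self) -> None:
        self._primary: Dict[str, dict] = {}                    # artifact_id -> record
        self._by_type: Dict[str, Set[str]] = defaultdict(set)  # derived index
        self._ttl: Dict[Optional[str], float] = {}             # None key = global TTL
        self._hooks: List[Callable[[str, dict], None]] = []

    def add(self, artifact_id: str, agent_type: str, created_at: float) -> None:
        # Hypothetical helper: writes the primary record and its index entry.
        self._primary[artifact_id] = {"type": agent_type, "created_at": created_at}
        self._by_type[agent_type].add(artifact_id)

    def set_retention_ttl(self, seconds: float, agent_type: Optional[str] = None) -> None:
        # agent_type=None sets the global TTL; otherwise a per-type override.
        self._ttl[agent_type] = seconds

    def register_retention_hook(self, hook: Callable[[str, dict], None]) -> None:
        self._hooks.append(hook)

    def apply_retention_policy(self, now: float) -> List[str]:
        removed = []
        for artifact_id, record in list(self._primary.items()):
            # Per-type TTL wins; fall back to the global TTL if there is one.
            ttl = self._ttl.get(record["type"], self._ttl.get(None))
            if ttl is None or now - record["created_at"] <= ttl:
                continue
            # Cascade: drop the primary record and its derived index entry together.
            del self._primary[artifact_id]
            self._by_type[record["type"]].discard(artifact_id)
            for hook in self._hooks:
                hook(artifact_id, record)
            removed.append(artifact_id)
        return removed


registry = RetentionSketch()
registry.set_retention_ttl(60.0)                        # global TTL
registry.set_retention_ttl(10.0, agent_type="planner")  # per-type override
registry.add("a1", "planner", created_at=0.0)
registry.add("a2", "worker", created_at=0.0)
seen: List[str] = []
registry.register_retention_hook(lambda artifact_id, record: seen.append(artifact_id))
removed = registry.apply_retention_policy(now=30.0)
```

Passing `now` explicitly, rather than reading the clock inside the method, is a choice made here to keep the sketch deterministic; the PR itself may handle time differently.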

Acceptance Criteria

  • ✅ Deleting an artifact removes its derived index entries
  • ✅ Reconciliation reports stale derived records and removes them
  • ✅ Retention tests cover primary and derived storage together
  • ✅ All 22 new tests + 47 existing tests pass

Test Output

69 passed in 0.22s

Closes #4114

…on-agent#3947)

Bounty orchestration-agent#3947 — Bound retry metadata growth on repeated failures.

Changes:
- Added MAX_RETRY_METADATA hard cap (100) to prevent unbounded retry
  counter growth.
- Added dead_letter store for permanently failed tasks.
- fail() now enforces the repeated-failures invariant before re-enqueueing:
  tasks that exceed max_retries or the hard cap go to dead letter instead.
- enqueue() rejects tasks past the hard cap and returns None for the
  caller to handle.
- Added preserve_retries parameter to enqueue() so retry metadata is
  preserved during re-enqueue (idempotent retry path).
- Scheduled task promotion (dequeue) also respects the hard cap.
- Added detailed logging for all retry/rejection decisions.
- Backward compatible: default max_retries remains 3; existing callers
  unaffected.
- Regression tests cover: repeated-failures trigger, metadata bound,
  idempotent fail, dead-letter isolation, exhausted enqueue rejection.
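
The retry bound described in this commit could look roughly like the sketch below. It is an illustration under stated assumptions, not the committed code: `RetryQueueSketch`, the task dict layout (`id`, `retries`), and the return values of `fail()` are hypothetical. `MAX_RETRY_METADATA`, `dead_letter`, `max_retries`, and `preserve_retries` are the names used in the commit message.

```python
from typing import Dict, List, Optional

MAX_RETRY_METADATA = 100  # hard cap named in the commit message


class RetryQueueSketch:
    def __init__(self, max_retries: int = 3) -> None:
        self.max_retries = max_retries
        self.queue: List[dict] = []
        self.dead_letter: Dict[str, dict] = {}

    def enqueue(self, task: dict, preserve_retries: bool = False) -> Optional[str]:
        retries = task.get("retries", 0)
        if retries >= MAX_RETRY_METADATA:
            return None  # past the hard cap: reject and let the caller decide
        # A fresh enqueue resets the counter; a re-enqueue keeps it.
        task["retries"] = retries if preserve_retries else 0
        self.queue.append(task)
        return task["id"]

    def fail(self, task: dict) -> Optional[str]:
        if task["id"] in self.dead_letter:
            return None  # failing an already dead-lettered task is a no-op
        # The counter itself never grows past the hard cap.
        task["retries"] = min(task.get("retries", 0) + 1, MAX_RETRY_METADATA)
        if task["retries"] > self.max_retries or task["retries"] >= MAX_RETRY_METADATA:
            self.dead_letter[task["id"]] = task
            return None
        return self.enqueue(task, preserve_retries=True)


q = RetryQueueSketch(max_retries=3)
q.enqueue({"id": "t1"})
outcomes = []
while q.queue:
    task = q.queue.pop(0)          # simulate a worker picking up the task
    outcomes.append(q.fail(task))  # ...and failing it every time
rejected = q.enqueue({"id": "t2", "retries": MAX_RETRY_METADATA}, preserve_retries=True)
```

With the default `max_retries` of 3, the task is re-enqueued three times and moves to the dead-letter store on the fourth failure.
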
Add an atomic state precondition in the scheduler dequeue path to
reject tasks whose associated run has been deleted. This prevents
stale, duplicate, or policy-violating transitions when a workflow
is removed concurrently with run materialization.

Changes:
- Add  tracking set with  method
- Add  precondition in  — rejects
  both queued and scheduled tasks for deleted runs
- Bounded audit metadata via structured logging (warn-level with
  run_id and task_id context)
- Fix pre-existing bug:  dict now stores task dicts
  alongside timestamps so data is not lost during promotion
- Wire up WorkflowManager in OrchestrationEngine for future
  mark_run_deleted integration
- Add 5 deterministic regression tests covering:
  * Dequeue rejection for deleted runs
  * Scheduled task skip for deleted runs
  * Idempotent mark_run_deleted
  * Normal unaffected workflows
  * Isolated deletion between concurrent runs
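
A minimal sketch of the dequeue precondition follows, assuming a single lock guards both the queue and the deleted-run set so the check and the pop happen atomically. `SchedulerSketch`, `_deleted_runs`, the task dict layout, and the boolean return of `mark_run_deleted` are assumptions for illustration; the commit's actual names are not recoverable from the message above.

```python
import logging
import threading
from collections import deque
from typing import Deque, Optional, Set

logger = logging.getLogger("scheduler_sketch")


class SchedulerSketch:
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._queue: Deque[dict] = deque()
        self._deleted_runs: Set[str] = set()  # hypothetical name

    def enqueue(self, task: dict) -> None:
        with self._lock:
            self._queue.append(task)

    def mark_run_deleted(self, run_id: str) -> bool:
        # Idempotent: marking the same run twice changes nothing.
        with self._lock:
            if run_id in self._deleted_runs:
                return False
            self._deleted_runs.add(run_id)
            return True

    def dequeue(self) -> Optional[dict]:
        # The deleted-run check and the pop share one lock, so a concurrent
        # deletion cannot slip in between them.
        with self._lock:
            while self._queue:
                task = self._queue.popleft()
                if task["run_id"] in self._deleted_runs:
                    logger.warning(
                        "rejecting task for deleted run",
                        extra={"run_id": task["run_id"], "task_id": task["id"]},
                    )
                    continue
                return task
            return None


scheduler = SchedulerSketch()
scheduler.enqueue({"id": "t1", "run_id": "r1"})
scheduler.enqueue({"id": "t2", "run_id": "r2"})
first_mark = scheduler.mark_run_deleted("r1")
second_mark = scheduler.mark_run_deleted("r1")
next_task = scheduler.dequeue()   # t1 is skipped, t2 is returned
after_that = scheduler.dequeue()  # queue is now empty
```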

Closes orchestration-agent#3977

Adds a data lake governance module that enforces purpose limitation on
ingestion writes. Every data lake write now requires purpose metadata
(purpose, data class, owner, destination) and is blocked when the
destination is not approved for that data class.

New components:
- DataClassificationRegistry: registers data classes with approved
  destinations; supports wildcard (all destinations) via empty set
- PurposeMetadata: declares purpose, data class, owner, destination
- IngestionManifest: full manifest for data lake writes
- DataLakeGovernor: validates manifests, enforces policy, records
  audit log with grouping by purpose and owner
- Custom errors: MissingPurposeMetadataError, DataClassNotRegisteredError,
  DestinationNotApprovedError
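
To make the policy check concrete, here is a rough sketch of how the registry and governor might interact. The class and error names are taken from the list above, but every method name (`register`, `is_approved`, `validate`), the field names, and the bucket URLs are assumptions. `IngestionManifest` and the grouped audit views are left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Optional


class MissingPurposeMetadataError(Exception):
    pass


class DataClassNotRegisteredError(Exception):
    pass


class DestinationNotApprovedError(Exception):
    pass


@dataclass(frozen=True)
class PurposeMetadata:
    purpose: str
    data_class: str
    owner: str
    destination: str


class DataClassificationRegistry:
    def __init__(self) -> None:
        self._approved: Dict[str, FrozenSet[str]] = {}

    def register(self, data_class: str, destinations: Iterable[str] = ()) -> None:
        self._approved[data_class] = frozenset(destinations)

    def is_approved(self, data_class: str, destination: str) -> bool:
        if data_class not in self._approved:
            raise DataClassNotRegisteredError(data_class)
        approved = self._approved[data_class]
        # An empty set acts as the wildcard: every destination is allowed.
        return not approved or destination in approved


class DataLakeGovernor:
    def __init__(self, registry: DataClassificationRegistry) -> None:
        self.registry = registry
        self.audit_log: List[PurposeMetadata] = []

    def validate(self, metadata: Optional[PurposeMetadata]) -> None:
        if metadata is None or not all(
            (metadata.purpose, metadata.data_class, metadata.owner, metadata.destination)
        ):
            raise MissingPurposeMetadataError("all four purpose fields are required")
        if not self.registry.is_approved(metadata.data_class, metadata.destination):
            raise DestinationNotApprovedError(metadata.destination)
        self.audit_log.append(metadata)  # only approved writes are recorded here


registry = DataClassificationRegistry()
registry.register("pii", {"s3://secure-bucket"})
registry.register("public")  # empty set: wildcard
governor = DataLakeGovernor(registry)
governor.validate(PurposeMetadata("analytics", "pii", "data-team", "s3://secure-bucket"))
governor.validate(PurposeMetadata("reporting", "public", "bi-team", "s3://anywhere"))
try:
    governor.validate(PurposeMetadata("analytics", "pii", "data-team", "s3://open-bucket"))
    blocked = False
except DestinationNotApprovedError:
    blocked = True
```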

All 19 new tests pass. Existing test suite unaffected.

Closes orchestration-agent#3998

- Add release workflow (release.yml) that:
  - Triggers on version tags (v*)
  - Builds packages with uv build
  - Generates build provenance attestation via actions/attest-build-provenance
  - Creates GitHub Releases with attested artifacts
  - Publishes to PyPI with attestation support
- Add artifact verification section to README with gh CLI instructions

The attestation includes source repository, commit SHA, workflow run,
and artifact digest — enabling consumers to verify artifact provenance.

Closes orchestration-agent#4050
…tadata

Closes orchestration-agent#4088

Multi-stage Dockerfile isolates all build-time-only ARG declarations
(BUILD_ENV, PIP_INDEX_URL, UV_VERSION) inside the builder stage.
The final runtime stage inherits zero build-time ARGs, preventing
leakage into image history, labels, or environment variables.

Changes:
- Dockerfile: two-stage build (builder → final), ARGs only in builder
- .dockerignore: exclude dev/CI artifacts from build context
- infra/docker-compose.yml: pass args only to builder stage
- infra/scripts/audit_image_metadata.sh: CI audit for leaked metadata
- .github/workflows/ci.yml: add docker-build-and-audit job
- Makefile: docker-audit / docker-build-slim targets
…tion-agent#4096)

When a MetricsCollector is created with max_counter=N, the increment()
method clamps counter values to N, preventing unbounded growth that can
cause downstream metrics systems to reject large integers.

This is opt-in behavior — existing code passing no max_counter continues
to work identically.
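
The opt-in clamp can be sketched in a few lines. `MetricsCollectorSketch`, the `counters` attribute, and the `amount` parameter are illustrative assumptions; only `max_counter` and `increment()` appear in the commit message.

```python
from collections import defaultdict
from typing import Dict, Optional


class MetricsCollectorSketch:
    def __init__(self, max_counter: Optional[int] = None) -> None:
        self.max_counter = max_counter  # None keeps the old unbounded behavior
        self.counters: Dict[str, int] = defaultdict(int)

    def increment(self, name: str, amount: int = 1) -> int:
        value = self.counters[name] + amount
        if self.max_counter is not None:
            value = min(value, self.max_counter)  # clamp to the configured ceiling
        self.counters[name] = value
        return value


capped = MetricsCollectorSketch(max_counter=5)
uncapped = MetricsCollectorSketch()
for _ in range(8):
    capped.increment("retries")
    uncapped.increment("retries")
```

After eight increments the capped collector holds 5 and the uncapped one holds 8.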

Closes orchestration-agent#4096

Add retention policy hooks and index reconciliation to AgentRegistry so
that when task artifacts expire under the primary retention policy the
derived group-index entries are also cleaned up.

Changes
-------
- AgentRegistry.set_retention_ttl() / get_retention_ttl() — configure
  per-type or global retention TTL in seconds.
- AgentRegistry.register_retention_hook() — register callbacks that fire
  when an agent is removed by the retention policy.
- AgentRegistry.apply_retention_policy() — iterate primary artifacts and
  remove those whose age exceeds the configured TTL, cleaning both the
  primary store and the derived group index while firing registered hooks.
- AgentRegistry.reconcile_indexes() — find and remove stale derived index
  entries that reference agent IDs no longer present in the primary store.
- AgentRegistry.search_metadata() — expose derived index state for
  introspection and search-metadata consumers.
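
As a rough illustration of the reconciliation step, the sketch below uses free functions over plain dicts instead of AgentRegistry methods. The data layout (a primary dict keyed by agent ID, a group index mapping agent type to a set of IDs) and the return shapes are assumptions, not the PR's actual structures.

```python
from typing import Dict, List, Set, Tuple


def reconcile_indexes(
    primary: Dict[str, dict], group_index: Dict[str, Set[str]]
) -> List[Tuple[str, str]]:
    """Remove index entries whose agent ID is missing from the primary store.

    Returns the (agent_type, agent_id) pairs that were found stale.
    """
    stale: List[Tuple[str, str]] = []
    for agent_type in list(group_index):
        # Any ID in the index that is not a primary key is stale.
        for agent_id in sorted(group_index[agent_type].difference(primary)):
            stale.append((agent_type, agent_id))
            group_index[agent_type].discard(agent_id)
        if not group_index[agent_type]:
            del group_index[agent_type]  # drop groups that became empty
    return stale


def search_metadata(group_index: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    # Read-only snapshot of the derived index for introspection.
    return {agent_type: sorted(ids) for agent_type, ids in group_index.items()}


primary = {"a1": {"type": "planner"}}
group_index = {"planner": {"a1", "a2"}, "worker": {"a3"}}
stale = reconcile_indexes(primary, group_index)
snapshot = search_metadata(group_index)
```

Reporting the stale pairs before removing them mirrors the acceptance criterion that reconciliation both reports and removes stale derived records.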

Tests
-----
- 22 new tests covering search metadata, retention TTL configuration,
  policy application, hook invocation, index reconciliation, and the
  integration acceptance criteria from the issue.

Closes orchestration-agent#4114

Development

Successfully merging this pull request may close these issues.

[ Bounty $4k ] [ Storage ] Apply retention policy to derived indexes — search metadata
