Feature: Archive Retrieval #61

Merged: rosemcc merged 4 commits into main from feature/archive-retrieval on May 28, 2026

rosemcc commented on May 25, 2026

This pull request introduces the archive retrieval workflow to the project, adding support for restoring archived data from object storage back to a user-accessible location. It includes new models, API request/response schemas, and core logic for managing retrieval jobs.

Archive Retrieval Workflow Implementation:

  • Added the ArchiveRetrieval SQLModel and RetrievalJobStage enum in models/retrieval.py to track retrieval jobs and their lifecycle stages, supporting resume and failure handling.
  • Introduced CreateRetrievalRequest and CreateRetrievalResponse models for API requests and responses when initiating a retrieval job.
  • Implemented S3 utility functions in activescale.py for restoring archived objects (initiate_object_restore), polling their availability (is_object_ready_for_download), and streaming downloads to disk (download_file_to_disk).
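
For readers unfamiliar with the restore flow, here is a minimal sketch of how the restore and readiness helpers could work. It is not the code from this pull request: the signatures, the `client` parameter, and the `ARCHIVE_STORAGE_CLASSES` set are assumptions for illustration. It assumes ActiveScale follows the S3 restore semantics, where `head_object` reports a `Restore` field containing `ongoing-request="false"` once the staged copy is available, and that `client` behaves like a boto3 S3 client.

```python
"""Sketch: request a restore and check whether the object can be downloaded."""

# Assumption: storage classes that need a restore before download.
ARCHIVE_STORAGE_CLASSES = {"GLACIER", "DEEP_ARCHIVE"}


def initiate_object_restore(client, bucket: str, key: str, days: int = 7) -> None:
    """Ask the object store to stage an archived object for `days` days.

    `client` is assumed to expose boto3's S3 `restore_object` call.
    """
    client.restore_object(
        Bucket=bucket,
        Key=key,
        RestoreRequest={"Days": days},
    )


def is_object_ready_for_download(client, bucket: str, key: str) -> bool:
    """Return True when the object can be fetched right now."""
    head = client.head_object(Bucket=bucket, Key=key)
    restore = head.get("Restore")
    if restore is None:
        # No restore was requested. The object is only readable if it
        # is not sitting in an archive storage class.
        storage_class = head.get("StorageClass", "STANDARD")
        return storage_class not in ARCHIVE_STORAGE_CLASSES
    # A finished restore reports ongoing-request="false" plus an expiry date.
    return 'ongoing-request="false"' in restore
```

Passing the client in as a parameter keeps the helpers easy to exercise with a stub in tests, without touching real object storage.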

Configuration:

  • Added new settings in config.py for controlling retrieval behavior, including the ActiveScale bucket name, restore duration, polling intervals, and maximum wait time.
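
To illustrate how these settings might drive the polling step, here is a rough sketch using only the standard library. The field names, default values, and environment variable names are hypothetical, and `config.py` in this pull request may be structured differently (for example, with a settings library instead of a dataclass).

```python
"""Sketch: retrieval settings and a polling loop bounded by a maximum wait."""
import os
import time
from dataclasses import dataclass
from typing import Callable, Mapping, Optional


@dataclass(frozen=True)
class RetrievalSettings:
    # All names and defaults below are illustrative assumptions.
    bucket_name: str = "example-archive-bucket"
    restore_days: int = 7
    poll_interval_seconds: float = 300.0
    max_wait_seconds: float = 86400.0

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "RetrievalSettings":
        """Build settings from environment variables, falling back to defaults."""
        env = os.environ if env is None else env
        d = cls()
        return cls(
            bucket_name=env.get("ARCHIVE_BUCKET_NAME", d.bucket_name),
            restore_days=int(env.get("ARCHIVE_RESTORE_DAYS", d.restore_days)),
            poll_interval_seconds=float(
                env.get("ARCHIVE_POLL_INTERVAL_SECONDS", d.poll_interval_seconds)
            ),
            max_wait_seconds=float(
                env.get("ARCHIVE_MAX_WAIT_SECONDS", d.max_wait_seconds)
            ),
        )


def wait_until_ready(
    is_ready: Callable[[], bool],
    settings: RetrievalSettings,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll `is_ready` until it returns True or the maximum wait elapses."""
    deadline = clock() + settings.max_wait_seconds
    while True:
        if is_ready():
            return True
        if clock() >= deadline:
            return False  # Caller can mark the job as failed or resumable.
        sleep(settings.poll_interval_seconds)
```

The `sleep` and `clock` parameters are injectable so that the timeout path can be tested without waiting in real time.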

BagIt Validation:

  • Added a utility function validate_bag in manifests.py to verify the integrity of BagIt archives as part of the retrieval and extraction process.
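
As a rough illustration of what BagIt validation involves, the sketch below checks the payload manifest of a bag against the files on disk. It is a simplified stand-in, not the `validate_bag` from `manifests.py`: the signature and return type are assumptions, and it skips tag manifests, `fetch.txt`, and percent-encoded paths, all of which a full validator would handle.

```python
"""Sketch: verify a bag's payload files against manifest-<algorithm>.txt."""
import hashlib
from pathlib import Path


def validate_bag(bag_dir, algorithm: str = "sha256") -> list[str]:
    """Return a list of problems found; an empty list means the bag passed."""
    bag = Path(bag_dir)
    errors = []
    if not (bag / "bagit.txt").is_file():
        errors.append("missing bagit.txt")
    manifest = bag / f"manifest-{algorithm}.txt"
    if not manifest.is_file():
        errors.append(f"missing {manifest.name}")
        return errors

    listed = set()
    for line in manifest.read_text(encoding="utf-8").splitlines():
        parts = line.strip().split(maxsplit=1)
        if not parts:
            continue  # Skip blank lines.
        if len(parts) != 2:
            errors.append(f"malformed manifest line: {line!r}")
            continue
        expected, rel_path = parts
        listed.add(rel_path)
        target = bag / rel_path
        if not target.is_file():
            errors.append(f"missing payload file: {rel_path}")
            continue
        # Hash in chunks so large archives do not need to fit in memory.
        digest = hashlib.new(algorithm)
        with target.open("rb") as fh:
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected.lower():
            errors.append(f"checksum mismatch: {rel_path}")

    # Every file under data/ must appear in the manifest.
    data_dir = bag / "data"
    if data_dir.is_dir():
        for path in sorted(data_dir.rglob("*")):
            rel = path.relative_to(bag).as_posix()
            if path.is_file() and rel not in listed:
                errors.append(f"unlisted payload file: {rel}")
    return errors
```

Returning a list of problems rather than raising on the first one lets a retrieval job record everything that is wrong with a bag in a single pass.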

rosemcc marked this pull request as ready for review May 28, 2026 21:35
rosemcc merged commit 51149c9 into main May 28, 2026
1 check failed
rosemcc deleted the feature/archive-retrieval branch May 28, 2026 21:36