
Integrate AWS S3 for Pipeline Artifact Storage #39

@fuzziecoder

Description


🎯 Issue Summary

Add AWS S3 integration to store pipeline execution artifacts (logs, outputs, reports).

📋 Current Behavior

Pipeline outputs are currently kept in memory or on the local filesystem, with no persistent artifact storage.

Current Limitations:

  • No artifact persistence
  • No centralized storage
  • Limited to local filesystem

✨ Proposed Solution

Integrate AWS S3 for:

  • Storing execution logs
  • Saving pipeline outputs/artifacts
  • Archiving historical execution data
  • Generating presigned URLs for artifact access

🔧 Technical Requirements

1. AWS SDK Setup

  • Add boto3 to requirements.txt
  • Create backend/integrations/s3.py
  • Configure AWS credentials in config

2. S3 Client

  • Create S3ArtifactStore class
  • Implement upload_artifact()
  • Implement download_artifact()
  • Implement generate_presigned_url()

3. Artifact Management

  • Upload execution logs to S3 after completion
  • Store pipeline outputs by execution ID
  • Organize artifacts: s3://bucket/pipelines/{id}/executions/{exec_id}/
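The key layout above can be captured in a single helper so every caller builds keys the same way (a sketch; `artifact_key` is an assumed name, not existing code):

```python
def artifact_key(pipeline_id: str, execution_id: str, filename: str) -> str:
    # Mirrors the proposed layout:
    # s3://bucket/pipelines/{id}/executions/{exec_id}/{filename}
    return f"pipelines/{pipeline_id}/executions/{execution_id}/{filename}"
```

Keeping key construction in one place also makes listing an execution's artifacts a single prefix query (`list_objects_v2` with `Prefix="pipelines/{id}/executions/{exec_id}/"`).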

4. API Integration

  • Add GET /api/executions/{id}/artifacts endpoint
  • Return presigned URLs for artifact download
  • Support artifact listing

5. Configuration

  • Add S3_BUCKET, AWS_REGION to config
  • Support IAM role or access key authentication
  • Add artifact retention policy
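A minimal config loader covering these three points might look like the following; `S3Config`, `load_s3_config`, and the `ARTIFACT_RETENTION_DAYS` variable are illustrative names, not part of the existing codebase:

```python
import os
from dataclasses import dataclass

@dataclass
class S3Config:
    bucket: str
    region: str
    retention_days: int

def load_s3_config(env=os.environ) -> S3Config:
    # S3_BUCKET is required; the rest fall back to defaults.
    # Credentials are deliberately absent: boto3's standard credential
    # chain (env vars, ~/.aws/credentials, or an IAM role) covers both
    # authentication modes without extra config keys.
    return S3Config(
        bucket=env["S3_BUCKET"],
        region=env.get("AWS_REGION", "us-east-1"),
        retention_days=int(env.get("ARTIFACT_RETENTION_DAYS", "30")),
    )
```

Retention itself is probably best enforced with an S3 lifecycle rule (an `Expiration` of `retention_days` on the `pipelines/` prefix) rather than application-side deletion code.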

📝 Acceptance Criteria

  • ✅ Execution logs uploaded to S3 automatically
  • ✅ Artifacts accessible via presigned URLs
  • ✅ S3 bucket organized by pipeline/execution
  • ✅ API returns artifact URLs
  • ✅ Configurable retention period

💡 Implementation Example

# backend/integrations/s3.py
# Note: boto3 is synchronous; call these via asyncio.to_thread()
# if invoked from async code.
import boto3

class S3ArtifactStore:
    def __init__(self, bucket: str, region: str):
        self.s3 = boto3.client('s3', region_name=region)
        self.bucket = bucket

    def upload_artifact(self, pipeline_id: str, execution_id: str,
                        filename: str, content: bytes) -> str:
        # Store artifacts under pipelines/{id}/executions/{exec_id}/
        key = f"pipelines/{pipeline_id}/executions/{execution_id}/{filename}"
        self.s3.put_object(Bucket=self.bucket, Key=key, Body=content)
        return key

    def download_artifact(self, key: str) -> bytes:
        # Fetch an artifact's raw bytes by its S3 key
        response = self.s3.get_object(Bucket=self.bucket, Key=key)
        return response['Body'].read()

    def generate_presigned_url(self, key: str, expiration: int = 3600) -> str:
        # URL is valid for `expiration` seconds (default: 1 hour)
        return self.s3.generate_presigned_url(
            'get_object',
            Params={'Bucket': self.bucket, 'Key': key},
            ExpiresIn=expiration
        )
📚 Resources
[Boto3 Documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/index.html)
[S3 Presigned URLs](https://docs.aws.amazon.com/AmazonS3/latest/userguide/ShareObjectPreSignedURL.html)
