@andreasjansson

Replace the FUSE mount (s3fs) + rsync approach with rclone for direct S3 API access. This eliminates the root cause of slow syncs (200-400s) and the cascading bugs caused by s3fs latency.

Key changes:

  • Dockerfile: install rclone instead of rsync
  • Restore: startup script uses rclone copy from R2 (fast, parallel)
  • Backup: uses rclone sync (not copy) to propagate deletions to R2
  • Background sync loop runs inside the container: every 30s it checks for file changes and uploads them via rclone sync
  • Manual sync: uses sandbox.exec() which returns ExecResult directly, eliminating all startProcess/getStatus/polling workarounds
  • Remove cron trigger: no more Worker-side scheduled handler that could exceed time limits and reset the Durable Object
  • Remove FUSE mount: no mountBucket, no isR2Mounted, no R2_MOUNT_PATH
  • Pass R2 credentials to container env for rclone config
  • Rclone restore commands tolerate transient R2 errors (prevent set -e from killing the startup script on flaky API responses)
  • E2e tests verify data reaches R2 (rclone ls) and test the full backup-delete-restart-restore cycle via gateway restart
  • Fix lint: use deviceId as React key instead of array index fallback
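
As a rough illustration of the restore, backup, and background-loop items above, the container startup script might be shaped something like the sketch below. This is not the code from this PR. The remote name `r2`, the variables `DATA_DIR`, `MARKER`, `SYNC_INTERVAL`, `R2_BUCKET`, `R2_ACCESS_KEY_ID`, `R2_SECRET_ACCESS_KEY`, and `CF_ACCOUNT_ID`, and all function names are hypothetical, and the marker-file approach to detecting changes is only one assumed way to implement the 30s check.

```shell
#!/usr/bin/env bash
# Sketch only. The remote name "r2", the variable names, and the marker-file
# change detection are illustrative assumptions, not taken from this PR.
set -e

DATA_DIR="${DATA_DIR:-/data}"
MARKER="${MARKER:-/tmp/.last-sync}"
SYNC_INTERVAL="${SYNC_INTERVAL:-30}"
REMOTE="r2:${R2_BUCKET:-example-bucket}/backup"

# rclone can define a remote entirely from RCLONE_CONFIG_<REMOTE>_<OPTION>
# environment variables, so no config file is needed in the container.
export RCLONE_CONFIG_R2_TYPE=s3
export RCLONE_CONFIG_R2_PROVIDER=Cloudflare
export RCLONE_CONFIG_R2_ACCESS_KEY_ID="${R2_ACCESS_KEY_ID:-}"
export RCLONE_CONFIG_R2_SECRET_ACCESS_KEY="${R2_SECRET_ACCESS_KEY:-}"
export RCLONE_CONFIG_R2_ENDPOINT="https://${CF_ACCOUNT_ID:-ACCOUNT_ID}.r2.cloudflarestorage.com"

# Restore: copy from R2 into the container. The `|| echo` fallback keeps
# `set -e` from aborting startup when R2 returns a transient error.
restore_from_r2() {
  rclone copy "$REMOTE" "$DATA_DIR" --transfers 16 \
    || echo "restore failed with exit code $?; continuing startup" >&2
}

# Backup: `sync` (unlike `copy`) also deletes remote files that no longer
# exist locally, so deletions propagate to R2.
backup_to_r2() {
  rclone sync "$DATA_DIR" "$REMOTE" --transfers 16
}

# True if no sync has happened yet, or if anything under DATA_DIR is newer
# than the marker. Directories are included, so creations and deletions
# (which update the parent directory's mtime) count as changes.
# `-quit` is a GNU/BSD find extension that stops at the first match.
has_changes() {
  [ ! -e "$MARKER" ] || [ -n "$(find "$DATA_DIR" -newer "$MARKER" -print -quit)" ]
}

# Background loop: check for changes every SYNC_INTERVAL seconds. The marker
# timestamp is taken before the upload starts, so files written during the
# upload are picked up on the next pass.
sync_loop() {
  while true; do
    if has_changes; then
      touch "$MARKER.next"
      if backup_to_r2; then
        mv "$MARKER.next" "$MARKER"
      fi
    fi
    sleep "$SYNC_INTERVAL"
  done
}

# Entry point: pass "start" to restore once and then loop forever.
if [ "${1:-}" = "start" ]; then
  restore_from_r2
  sync_loop
fi
```

Invoked as `./startup.sh start` (a hypothetical filename), this would restore once and then sync in the foreground; a real startup script would more likely background the loop before launching the gateway. The manual-sync path described above could run the same `rclone sync` command through `sandbox.exec()`.
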
@github-actions

E2E Test Recording (telegram)

✅ Tests passed

@github-actions

E2E Test Recording (discord)

✅ Tests passed

@github-actions

E2E Test Recording (base)

✅ Tests passed

@github-actions

E2E Test Recording (workers-ai)

✅ Tests passed

@andreasjansson andreasjansson merged commit f688dfb into main Feb 11, 2026
8 checks passed
@andreasjansson andreasjansson deleted the rclone branch February 11, 2026 12:10
scott-edwards added a commit to scott-edwards/alfred that referenced this pull request Feb 11, 2026
Upstream PRs merged:
- cloudflare#235: Fix R2 persistence (waitForProcess, exitCode, restore race)
- cloudflare#240: Replace s3fs/rsync with rclone (removes cron trigger)

Conflict resolution:
- Keep our GATEWAY_REQUEST_TIMEOUT_MS and CONTAINER_FETCH_TIMEOUT_MS
- Drop CRON_TIMEOUT_MS and R2_MOUNT_PATH (no longer needed)
- Remove scheduled handler (sync now runs inside container)
- Keep sleepAfter: '4h' fix for keepAlive death spiral

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>