🌩️ CloudDump


Keep a copy of your cloud data somewhere you control.

The cloud is just someone else's computer. CloudDump pulls your persistent data (S3 buckets, Azure Blob Storage, PostgreSQL databases) down to on-premises storage, another cloud, or wherever you want. On a schedule, unattended, with email notifications when things succeed or fail.

Not a backup system

CloudDump is not a backup system. There is no rotation, no versioning, no retention policies. It gives you a current-state copy of your data, synced on a cron schedule. What you do with that copy (feed it into Restic, Borg, Veeam, tape, a RAID array in your basement) is up to you.

Why

You store data in S3 or Azure. Your databases run in the cloud. That's fine, until a provider has an outage, a misconfigured IAM policy deletes your bucket, or you just want to sleep better knowing there's a copy on hardware you own.

CloudDump runs as a single Docker container. Point it at your cloud resources, tell it when to sync, and forget about it. If something breaks, you get an email.

Disaster recovery

CloudDump can be a key component in your disaster recovery plan. Critically, it pulls data from the cloud; the cloud provider has no knowledge of your local copy. This means a compromised or malfunctioning cloud environment cannot delete, encrypt, or tamper with data it doesn't know exists. The dependency flows one way: your copy depends on the cloud being reachable, but the cloud has zero control over what you already have.

A typical DR setup:

  1. CloudDump syncs cloud data to local storage on a schedule.
  2. A backup tool (Restic, Borg, Veeam, etc.) snapshots the local copy with versioning and retention.
  3. A dead-man switch (e.g. Healthchecks.io) alerts you when expected emails stop arriving; a silent failure is worse than a loud one.
  4. Regular restore drills: periodically verify that you can actually restore from the local copy.
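
The host-side half of step 3 can be sketched against CloudDump's heartbeat file (the container touches one for its Docker HEALTHCHECK): a cron job on the host checks the file's age and raises an alert when syncs stop. This is an illustrative sketch; the path and threshold are assumptions, not values CloudDump guarantees.

```python
import time
from pathlib import Path

def heartbeat_is_stale(heartbeat_file: str, max_age_seconds: int = 90 * 60) -> bool:
    """Return True if the heartbeat file is missing or older than max_age_seconds."""
    path = Path(heartbeat_file)
    if not path.exists():
        return True  # no heartbeat at all counts as stale
    return (time.time() - path.stat().st_mtime) > max_age_seconds
```

Wire the result into whatever alerting you already run (an email, a Healthchecks.io ping, a Nagios check).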

Quick start

1. Create a config file

{
  "settings": {
    "HOST": "myserver",
    "SMTPSERVER": "smtp.example.com",
    "SMTPPORT": "465",
    "SMTPUSER": "alerts@example.com",
    "SMTPPASS": "smtp-password",
    "MAILFROM": "alerts@example.com",
    "MAILTO": "ops@example.com, oncall@example.com"
  },
  "jobs": [
    {
      "type": "s3bucket",
      "id": "prod-assets",
      "crontab": "0 3 * * *",
      "buckets": [
        {
          "source": "s3://my-bucket",
          "destination": "/mnt/clouddump/s3",
          "aws_access_key_id": "AKIAIOSFODNN7EXAMPLE",
          "aws_secret_access_key": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
          "aws_region": "eu-west-1"
        }
      ]
    }
  ]
}

2. Run the container

docker run -d --restart always \
  --name clouddump \
  --mount type=bind,source=$(pwd)/config.json,target=/config/config.json,readonly \
  --volume /backup:/mnt/clouddump \
  ghcr.io/ralftar/clouddump:latest

That's it. CloudDump will sync your S3 bucket to /backup/s3 every day at 03:00 and email you the result.
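
Before starting the container, it can be worth sanity-checking the config yourself, since a malformed file is the most common reason a container exits immediately. This is a minimal pre-flight sketch based on the fields documented in this README, not CloudDump's own validator.

```python
import json

# Required per-job keys and valid job types, per the configuration
# reference in this README.
REQUIRED_JOB_KEYS = {"id", "type", "crontab"}
VALID_TYPES = {"s3bucket", "azstorage", "pgsql"}

def check_config(path: str) -> list:
    """Return a list of problems found in a CloudDump-style config file."""
    try:
        with open(path) as f:
            config = json.load(f)
    except (OSError, json.JSONDecodeError) as exc:
        return [f"cannot parse {path}: {exc}"]
    problems = []
    for i, job in enumerate(config.get("jobs", [])):
        missing = REQUIRED_JOB_KEYS - job.keys()
        if missing:
            problems.append(f"job #{i}: missing keys {sorted(missing)}")
        if job.get("type") not in VALID_TYPES:
            problems.append(f"job #{i}: unknown type {job.get('type')!r}")
    return problems
```

Run it against config.json and fix anything it reports before `docker run`.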

Supported sources

Source                        Job type   Tool used        Auth
AWS S3                        s3bucket   AWS CLI          Access key + secret
S3-compatible (MinIO, etc.)   s3bucket   AWS CLI          Access key + secret + endpoint_url
Azure Blob Storage            azstorage  AzCopy           SAS token in source URL
PostgreSQL                    pgsql      pg_dump / psql   Host, port, user, password

Features

  • Cron scheduling: standard 5-field cron patterns (0 3 * * *, */15 * * * *)
  • Catch-up execution: if a scheduled run is missed because another job is still running, it fires as soon as the slot opens (within a 60-minute window)
  • Retry & timeout: configurable per job (default: 3 attempts, 1-week timeout)
  • Email reports: success/failure notifications with log file attached
  • Mount support: SSH (sshfs) and SMB (smbnetfs) destinations without elevated privileges
  • Credential redaction: passwords, keys, and tokens are scrubbed from logs and emails automatically
  • Health check: built-in Docker HEALTHCHECK via heartbeat file
  • Graceful shutdown: SIGTERM forwarded to child processes
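
Credential redaction of this kind typically boils down to a couple of regex passes over log text before it is written or mailed. The sketch below is illustrative (the key list is taken from the config fields in this README, not from CloudDump's source):

```python
import re

# Config keys whose values should never appear in logs or emails.
# Illustrative list based on this README's examples.
SECRET_KEYS = ("aws_secret_access_key", "aws_access_key_id",
               "SMTPPASS", "password", "pass")

_KEY_PATTERN = re.compile(
    r'("({})"\s*:\s*")[^"]*(")'.format("|".join(map(re.escape, SECRET_KEYS)))
)
_SAS_PATTERN = re.compile(r'(sig=)[^&\s"]+')  # SAS token signature in URLs

def redact(text: str) -> str:
    """Scrub values of secret-looking JSON keys and SAS 'sig=' parameters."""
    text = _KEY_PATTERN.sub(r"\1[REDACTED]\3", text)
    return _SAS_PATTERN.sub(r"\1[REDACTED]", text)
```

Anything matching a secret key keeps its key name but loses its value, so logs stay readable without leaking credentials.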

Configuration reference

Settings

Key         Required  Description
HOST        No        Hostname shown in email subjects
SMTPSERVER  No        SMTP server (SSL, port 465)
SMTPPORT    No        SMTP port
SMTPUSER    No        SMTP username
SMTPPASS    No        SMTP password
MAILFROM    No        Sender address
MAILTO      No        Recipient address(es), comma-separated or JSON array
DEBUG       No        Enable debug logging (true/false)
mount       No        Array of SSH/SMB mount definitions

Email is optional. If SMTP is not configured, CloudDump runs silently. MAILTO accepts multiple recipients as a comma-separated string ("ops@example.com, oncall@example.com") or a JSON array (["ops@example.com", "oncall@example.com"]).
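
Normalizing the two accepted MAILTO shapes to a single address list can be done in a few lines. A sketch of the documented behavior, not CloudDump's exact parsing code:

```python
import json

def parse_mailto(value) -> list:
    """Normalize a MAILTO setting to a list of addresses.

    Accepts a JSON array (already a list after json.load), a string
    containing a JSON array, or a comma-separated string.
    """
    if isinstance(value, list):
        return [addr.strip() for addr in value]
    value = value.strip()
    if value.startswith("["):
        return [addr.strip() for addr in json.loads(value)]
    return [addr.strip() for addr in value.split(",") if addr.strip()]
```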

Job fields

Key      Required  Default          Description
id       Yes       -                Unique job identifier
type     Yes       -                s3bucket, azstorage, or pgsql
crontab  Yes       -                5-field cron pattern
timeout  No        604800 (7 days)  Job timeout in seconds
retries  No        3                Number of attempts on failure

Plus type-specific fields (buckets, blobstorages, servers); see examples below.

S3 bucket

{
  "type": "s3bucket",
  "id": "my-s3-job",
  "crontab": "0 2 * * *",
  "buckets": [
    {
      "source": "s3://bucket-name/optional-prefix",
      "destination": "/mnt/clouddump/s3",
      "delete_destination": false,
      "aws_access_key_id": "AKIA...",
      "aws_secret_access_key": "...",
      "aws_region": "us-east-1",
      "endpoint_url": ""
    }
  ]
}

Set endpoint_url for S3-compatible storage like MinIO:

"endpoint_url": "https://minio.example.com:9000"

Azure Blob Storage

{
  "type": "azstorage",
  "id": "my-azure-job",
  "crontab": "*/5 * * * *",
  "blobstorages": [
    {
      "source": "https://account.blob.core.windows.net/container?sv=...&sig=...",
      "destination": "/mnt/clouddump/azure",
      "delete_destination": true
    }
  ]
}

The source URL includes the SAS token for authentication.

PostgreSQL

{
  "type": "pgsql",
  "id": "my-pg-job",
  "crontab": "0 4 * * *",
  "servers": [
    {
      "host": "db.example.com",
      "port": 5432,
      "user": "backup_user",
      "pass": "password",
      "databases": [
        { "mydb": { "tables_included": [], "tables_excluded": ["large_logs"] } }
      ],
      "databases_excluded": ["template0", "template1"],
      "backuppath": "/mnt/clouddump/pg",
      "filenamedate": true,
      "compress": true
    }
  ]
}
  • databases: explicit list with per-database table filters. If empty, all databases are dumped (except databases_excluded).
  • compress: bzip2 compression of dump files.
  • filenamedate: append timestamp to dump filenames.
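
The per-database table filters map naturally onto pg_dump's -t (include table) and -T (exclude table) flags. A sketch of how such a command line might be assembled; CloudDump's real invocation may differ:

```python
def build_pg_dump_command(host, port, user, database,
                          tables_included=(), tables_excluded=()):
    """Assemble a pg_dump argument list for one database."""
    cmd = ["pg_dump", "-h", host, "-p", str(port), "-U", user]
    for table in tables_included:
        cmd += ["-t", table]   # dump only these tables
    for table in tables_excluded:
        cmd += ["-T", table]   # skip these tables
    cmd.append(database)
    return cmd
```

For the "mydb" example above this would exclude large_logs while dumping everything else in the database.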

Mounts

"mount": [
  {
    "path": "user@host:/remote/path",
    "mountpoint": "/mnt/ssh-target",
    "privkey": "/config/id_rsa"
  },
  {
    "path": "//server/share",
    "mountpoint": "/mnt/smb-target",
    "username": "user",
    "password": "pass"
  }
]

Mounts are set up at startup before any jobs run. Use them as backup destinations in your job configs.

Architecture

CloudDump is a single-process Python application in a Debian 12 container.

config.json ──> [Orchestrator] ──> aws s3 sync
                     │          ──> azcopy sync
                     │          ──> pg_dump / psql
                     │
                     ├── cron scheduler (check every 60s)
                     ├── sequential job execution
                     ├── signal forwarding (SIGTERM → child)
                     └── email reports (SMTPS)

Jobs run one at a time. If job B is scheduled while job A is still running, job B fires as soon as A finishes (within a 60-minute catch-up window). This keeps resource usage predictable and avoids conflicts on shared destinations.
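
The catch-up rule reduces to a simple age test on the missed slot. An illustrative sketch of that decision, not the project's actual scheduler code:

```python
def should_catch_up(scheduled_epoch: float, now_epoch: float,
                    window_seconds: int = 60 * 60) -> bool:
    """Decide whether a missed run should still fire.

    A slot that passed while another job was executing still fires,
    but only if it is no more than window_seconds (60 minutes) old.
    """
    age = now_epoch - scheduled_epoch
    return 0 <= age <= window_seconds
```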

Bundled tools

Tool               Source
AWS CLI            Debian apt (v1)
AzCopy             Microsoft apt repo
PostgreSQL client  Debian apt (v15)

Troubleshooting

Container won't start: Verify config.json is valid JSON and mounted at /config/config.json. CloudDump validates all jobs at startup and logs errors to stdout.

Jobs not running: Check your cron syntax. Supported: *, */N, exact values. Not supported: ranges (1-5), lists (1,3,5). Check container logs for scheduling messages.
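
A matcher for exactly that supported subset fits in a few lines. This is a sketch of the documented behavior, not the scheduler's actual implementation:

```python
def cron_field_matches(field: str, value: int) -> bool:
    """Match one cron field: '*', '*/N', or an exact value.

    Ranges (1-5) and lists (1,3,5) are intentionally unsupported,
    matching the subset this README documents.
    """
    if field == "*":
        return True
    if field.startswith("*/"):
        return value % int(field[2:]) == 0
    return int(field) == value

def crontab_matches(pattern: str, minute, hour, day, month, weekday) -> bool:
    fields = pattern.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 cron fields, got {len(fields)}")
    return all(cron_field_matches(f, v)
               for f, v in zip(fields, (minute, hour, day, month, weekday)))
```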

Email not working: CloudDump uses SMTPS (SSL on port 465). Verify the container can reach your SMTP server. Check logs for "Failed to send email" messages.

Mount failures: SSH mounts need a valid key and reachable host. SMB mounts need FUSE support in the container runtime. Check logs for mount errors at startup.

Debug mode: Set "DEBUG": true in settings for verbose logging.

Contributing

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

License

MIT. Copyright (c) 2023 VENDANOR AS.
