Seedbraid is a reference-based reconstruction tool for large, similar binary artifacts.
It combines deterministic content-defined chunking (CDC), a compact binary SBD1 seed format, reusable genome storage, and optional IPFS transport so you can ship reconstruction intent instead of repeatedly shipping full blobs.
Seedbraid is designed for workflows where ordinary file distribution becomes wasteful:
- large binary artifacts change often, but stay mostly similar
- fixed-size chunking loses reuse under shifted offsets
- you want compact transport plus bit-perfect restore guarantees
- you want one CLI surface for encode, verify, decode, publish, and fetch
In short: Seedbraid helps you move less data, reuse more content, and still verify exact reconstruction.
Seedbraid works especially well for:
- large binary versioning: datasets, ML models, media assets, VM images
- distribution of many similar files across releases
- shift-heavy changes such as insertions that break fixed chunk reuse
- IPFS-based distribution and retrieval with integrity validation
- environments where transfer size, dedup reuse, and reproducibility matter
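Why fixed-size chunking loses reuse under shifted offsets can be shown with a toy sketch (illustrative only, not Seedbraid's actual `cdc_buzhash` or `cdc_rabin` code): after a one-byte insertion, every fixed boundary moves, while content-defined boundaries re-align with the unchanged data.

```python
import hashlib

def fixed_chunks(data: bytes, size: int = 64) -> list[bytes]:
    # Fixed-size chunking: boundaries at fixed offsets.
    return [data[i:i + size] for i in range(0, len(data), size)]

def cdc_chunks(data: bytes, mask: int = 0x3F) -> list[bytes]:
    # Toy content-defined chunking: cut wherever a rolling-style hash of
    # recent bytes matches a mask, so boundaries follow content, not offsets.
    chunks: list[bytes] = []
    start = h = 0
    for i, b in enumerate(data):
        h = ((h << 1) ^ b) & 0xFFFFFFFF
        if (h & mask) == mask:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks

def reuse(old: list[bytes], new: list[bytes]) -> int:
    # Count chunks of `new` already present (by digest) in `old`.
    seen = {hashlib.sha256(c).digest() for c in old}
    return sum(1 for c in new if hashlib.sha256(c).digest() in seen)

# Pseudorandom 4 KiB payload, then a one-byte insertion at the front.
base = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(128))
shifted = b"\x01" + base

fixed_reuse = reuse(fixed_chunks(base), fixed_chunks(shifted))
cdc_reuse = reuse(cdc_chunks(base), cdc_chunks(shifted))
# fixed_reuse collapses to zero; the CDC cutter typically resynchronizes
# shortly after the insertion point and reuses many chunks.
```

Real CDC implementations enforce minimum/maximum chunk sizes and use stronger rolling hashes, but the resynchronization property is the same idea.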
- Lossless encode/decode with SHA-256 verification
- Deterministic chunking with `fixed`, `cdc_buzhash`, and `cdc_rabin`
- Genome storage backed by SQLite for deduplicated chunk reuse
- `SBD1` binary seed container with manifest, recipe, optional RAW, and integrity data
- IPFS publish/fetch transport
- Optional remote pin integration
- Strict verification mode for production-grade restore checks
- Optional signing and encryption support
```
pip install seedbraid
pipx install seedbraid
```

```
seedbraid --help
uvx seedbraid --help
uvx seedbraid doctor
```

```
# pip
pip install "seedbraid[zstd]"
pip install "seedbraid[crypto]"  # encryption / signing support

# pipx
pipx install "seedbraid[zstd]"
pipx install "seedbraid[crypto]"

# uvx
uvx --from "seedbraid[zstd]" seedbraid doctor
uvx --from "seedbraid[crypto]" seedbraid doctor
```

```
seedbraid encode input.bin --genome ./genome --out seed.sbd --portable
seedbraid verify seed.sbd --genome ./genome --strict
seedbraid decode seed.sbd --genome ./genome --out recovered.bin
cmp -s input.bin recovered.bin && echo "bit-perfect roundtrip: OK"
```

Note: If you installed via `uvx`, prefix commands with `uvx` (e.g. `uvx seedbraid encode ...`). For development builds, use `uv run --no-editable seedbraid` instead.
A common Seedbraid workflow looks like this:
- Prime or learn reusable chunks into a genome
- Encode a target artifact into a compact `SBD1` seed
- Verify integrity before distribution
- Publish the seed if needed, including via IPFS
- Fetch and decode later using the genome
- Run strict verification when exact restore is required
Seedbraid v2.0.0 is production-ready.
Before deploying to your environment, validate behavior in your own runtime, storage, and network configuration.
Treat successful `verify --strict` and bit-perfect restore checks as release gates.
Before using Seedbraid in CI/CD or production pipelines, run a strict smoke workflow like this:
```
uv sync --no-editable --extra dev

workdir="$(mktemp -d)"

python3 - <<'PY' "$workdir/input.bin"
from pathlib import Path
import sys

out = Path(sys.argv[1])
payload = (b"seedbraid-beta-smoke" * 20000) + bytes(range(256)) * 200
out.write_bytes(payload)
print(f"wrote {out} bytes={len(payload)}")
PY

uv run --no-sync --no-editable seedbraid encode "$workdir/input.bin" \
  --genome "$workdir/genome" \
  --out "$workdir/seed.sbd" \
  --chunker cdc_buzhash \
  --avg 65536 --min 16384 --max 262144 \
  --learn --portable --compression zlib

uv run --no-sync --no-editable seedbraid verify "$workdir/seed.sbd" \
  --genome "$workdir/genome" \
  --strict

uv run --no-sync --no-editable seedbraid decode "$workdir/seed.sbd" \
  --genome "$workdir/genome" \
  --out "$workdir/decoded.bin"

cmp -s "$workdir/input.bin" "$workdir/decoded.bin" \
  && echo "bit-perfect roundtrip: OK"
```

All examples below use bare `seedbraid`. If you installed via `uvx`, prefix with `uvx`. For development builds, use `uv run --no-editable seedbraid`.
```
seedbraid encode input.bin --genome ./genome --out seed.sbd

seedbraid encode input.bin --genome ./genome --out seed.sbd \
  --chunker cdc_buzhash --avg 65536 --min 16384 --max 262144 \
  --learn --no-portable --compression zlib

seedbraid encode input.bin --genome ./genome --out seed.private.sbd \
  --manifest-private

export SB_ENCRYPTION_KEY='your-secret-passphrase'
seedbraid encode input.bin --genome ./genome --out seed.encrypted.sbd \
  --encrypt --manifest-private
```

```
seedbraid decode seed.sbd --genome ./genome --out recovered.bin

seedbraid decode seed.encrypted.sbd --genome ./genome --out recovered.bin \
  --encryption-key "$SB_ENCRYPTION_KEY"
```

```
seedbraid verify seed.sbd --genome ./genome
seedbraid verify seed.sbd --genome ./genome --strict
seedbraid verify seed.sbd --genome ./genome --require-signature --signature-key "$SB_SIGNING_KEY"
seedbraid verify seed.encrypted.sbd --genome ./genome --strict \
  --encryption-key "$SB_ENCRYPTION_KEY"
```

`verify` supports two modes:
- Quick mode: checks seed integrity and required chunk availability
- Strict mode: reconstructs all content and enforces source size and SHA-256 match
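The strict-mode contract can be expressed as a small check (a sketch of the idea, not Seedbraid's internal code): reconstruct every byte, then require both the recorded source size and the SHA-256 digest to match.

```python
import hashlib

def strict_check(chunks: list[bytes], expected_size: int, expected_sha256: str) -> bool:
    # Reconstruct all content, then enforce source size and SHA-256 match.
    data = b"".join(chunks)
    return len(data) == expected_size and hashlib.sha256(data).hexdigest() == expected_sha256

payload = b"hello seedbraid"
digest = hashlib.sha256(payload).hexdigest()
assert strict_check([b"hello ", b"seedbraid"], len(payload), digest)
assert not strict_check([b"hello ", b"tampered!"], len(payload), digest)
```

Quick mode skips the reconstruction step, which is why strict mode is the right release gate for production restores.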
```
seedbraid prime "./dataset/**/*" --genome ./genome --chunker cdc_buzhash
```

```
seedbraid doctor --genome ./genome
```

`doctor` checks:
- Python runtime compatibility (`>=3.12`)
- kubo API reachability (`SB_KUBO_API`)
- `IPFS_PATH` state
- genome path writability
- compression support (`zlib`, optional `zstd`)
```
seedbraid genome snapshot --genome ./genome --out genome.sgs
seedbraid genome restore genome.sgs --genome ./genome-dr --replace
```

```
seedbraid publish-chunks seed.sbd --genome ./genome

seedbraid publish-chunks seed.sbd --genome ./genome \
  --manifest-out chunks.json --workers 32

seedbraid publish-chunks seed.sbd --genome ./genome \
  --pin --remote-pin \
  --remote-endpoint https://pin.example/api/v1 \
  --remote-token "$SB_PINNING_TOKEN"
```

`publish-chunks` publishes all CDC chunks referenced by a seed to IPFS as raw blocks, generates a chunk manifest sidecar (`.sbd.chunks.json`), and optionally pins the chunk DAG locally or via a remote pinning provider.
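The general shape of a chunk manifest sidecar can be sketched as follows (the field names here are illustrative assumptions, not the documented `.sbd.chunks.json` schema, and SHA-256 hex stands in for real IPFS CIDs):

```python
import hashlib
import json

def build_chunk_manifest(chunks: list[bytes]) -> str:
    # Map each chunk to its position, size, and a content address so a
    # fetcher can retrieve and reassemble chunks in recipe order.
    entries = [
        {"index": i, "size": len(c), "id": hashlib.sha256(c).hexdigest()}
        for i, c in enumerate(chunks)
    ]
    return json.dumps({"version": 1, "chunks": entries}, indent=2)

manifest = build_chunk_manifest([b"chunk-a", b"chunk-b"])
parsed = json.loads(manifest)
assert [e["index"] for e in parsed["chunks"]] == [0, 1]
```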
```
seedbraid fetch-decode seed.sbd --out recovered.bin

seedbraid fetch-decode seed.sbd --out recovered.bin \
  --workers 64 --batch-size 200 --retries 5

seedbraid fetch-decode seed.sbd --out recovered.bin \
  --gateway https://ipfs.io/ipfs
```

`fetch-decode` reads a seed and its chunk manifest, fetches all chunks from IPFS in parallel batches, and reconstructs the original file. It requires the chunk manifest sidecar (`.sbd.chunks.json`) alongside the seed.
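The parallel-batch pattern can be sketched like this (a simplified illustration with a hypothetical `fetch_one` callable, not the tool's internals):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_chunks(cids: list[str], fetch_one, workers: int = 8, batch_size: int = 4) -> list[bytes]:
    # Fetch chunks in bounded parallel batches; pool.map preserves input
    # order, so reassembly follows the recipe's chunk sequence.
    out: list[bytes] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for i in range(0, len(cids), batch_size):
            out.extend(pool.map(fetch_one, cids[i:i + batch_size]))
    return out

# In-memory stand-in for an IPFS node.
store = {"cid-1": b"alpha", "cid-2": b"beta", "cid-3": b"gamma"}
data = b"".join(fetch_chunks(["cid-1", "cid-2", "cid-3"], store.__getitem__))
assert data == b"alphabetagamma"
```

Batching caps how many requests are in flight at once, which keeps memory bounded for files with many chunks.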
```
seedbraid decode seed.sbd --genome ipfs:// --out recovered.bin

seedbraid decode seed.sbd --genome ipfs:///path/to/cache --out recovered.bin

seedbraid decode seed.sbd --genome ipfs:// --out recovered.bin \
  --gateway https://ipfs.io/ipfs
```

Using `--genome ipfs://` activates hybrid storage: chunks are fetched from IPFS with local SQLite caching. `ipfs://` uses a temporary cache; `ipfs:///path/to/cache` persists fetched chunks for future reuse.
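The cache-through idea behind hybrid storage — serve from the local cache when possible, fetch remotely on a miss, then persist for next time — can be sketched with `sqlite3` (illustrative schema; Seedbraid's actual cache layout may differ):

```python
import sqlite3

def cached_get(db: sqlite3.Connection, key: str, fetch_remote) -> bytes:
    # Cache-through read: return the cached blob when present, otherwise
    # fetch remotely and persist the result for future reuse.
    row = db.execute("SELECT data FROM chunks WHERE id = ?", (key,)).fetchone()
    if row is not None:
        return row[0]
    data = fetch_remote(key)
    db.execute("INSERT INTO chunks (id, data) VALUES (?, ?)", (key, data))
    return data

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE chunks (id TEXT PRIMARY KEY, data BLOB)")

calls = []
def remote(k: str) -> bytes:
    calls.append(k)  # track how often the "network" is hit
    return b"payload-" + k.encode()

assert cached_get(db, "cid-x", remote) == b"payload-cid-x"
assert cached_get(db, "cid-x", remote) == b"payload-cid-x"  # cache hit
assert calls == ["cid-x"]  # remote fetched only once
```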
```
seedbraid publish seed.sbd --no-pin
seedbraid publish seed.sbd --pin
seedbraid publish seed.sbd --remote-pin \
  --remote-endpoint https://pin.example/api/v1 --remote-token "$SB_PINNING_TOKEN"
```

`publish` emits a warning when the seed is unencrypted. For sensitive data, prefer:

```
seedbraid encode --encrypt --manifest-private ...
```

When `--remote-pin` is enabled, Seedbraid also registers the CID with a configured Pinning Services API-compatible provider.
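A remote-pin registration is, at its core, an authenticated HTTP request against a Pinning Services API endpoint. A minimal sketch (assuming the common `POST {endpoint}/pins` shape with a Bearer token; the request is built but deliberately not sent, so the example runs offline):

```python
import json
import urllib.request

def build_remote_pin_request(endpoint: str, token: str, cid: str) -> urllib.request.Request:
    # Construct a pin-registration request for a Pinning Services
    # API-compatible provider. Sending it is left to the caller.
    body = json.dumps({"cid": cid}).encode()
    return urllib.request.Request(
        f"{endpoint.rstrip('/')}/pins",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_remote_pin_request("https://pin.example/api/v1", "token-123", "bafy-example-cid")
assert req.get_full_url() == "https://pin.example/api/v1/pins"
assert req.get_method() == "POST"
```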
```
seedbraid fetch <cid> --out fetched.sbd
seedbraid fetch <cid> --out fetched.sbd --retries 5 --backoff-ms 300
seedbraid fetch <cid> --out fetched.sbd --gateway https://ipfs.io/ipfs
```

`fetch` retries with exponential backoff via the kubo HTTP API and can fall back to an HTTP gateway.
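Retry-with-exponential-backoff can be sketched as follows (a generic illustration of the `--retries` / `--backoff-ms` semantics, where `fetch` is a hypothetical zero-argument callable standing in for a kubo API or gateway request):

```python
import time

def fetch_with_backoff(fetch, retries: int = 5, backoff_ms: int = 300):
    # Retry a flaky operation, doubling the delay after each failure:
    # 300 ms, 600 ms, 1200 ms, ... Re-raise after the final attempt.
    for attempt in range(retries):
        try:
            return fetch()
        except ConnectionError:
            if attempt == retries - 1:
                raise
            time.sleep(backoff_ms * (2 ** attempt) / 1000.0)

attempts = []
def flaky():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient")
    return b"seed-bytes"

assert fetch_with_backoff(flaky, retries=5, backoff_ms=1) == b"seed-bytes"
assert len(attempts) == 3  # two failures, then success
```

Production implementations usually add jitter to the delay so many clients retrying at once do not synchronize.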
```
seedbraid pin-health <cid>
```

```
export SB_PINNING_ENDPOINT='https://pin.example/api/v1'
export SB_PINNING_TOKEN='your-api-token'
seedbraid pin remote-add <cid>
```

```
export SB_SIGNING_KEY='your-shared-secret'
seedbraid sign seed.sbd --out seed.signed.sbd --key-env SB_SIGNING_KEY --key-id team-a
```

```
seedbraid export-genes seed.sbd --genome ./genome --out genes.pack
seedbraid import-genes genes.pack --genome ./another-genome
```

Generate a high-entropy key for `SB_ENCRYPTION_KEY`:

```
seedbraid gen-encryption-key
```

Print shell export format:

```
seedbraid gen-encryption-key --shell
```

Set the current shell variable directly:

```
eval "$(seedbraid gen-encryption-key --shell)"
```

Start the kubo daemon:

```
ipfs daemon
```

By default, seedbraid connects to the kubo HTTP API at `http://127.0.0.1:5001/api/v0`. Override with the `SB_KUBO_API` environment variable:

```
export SB_KUBO_API=http://127.0.0.1:5001/api/v0
```

Run `seedbraid doctor` to verify connectivity.
To use a remote pinning service, set the endpoint and token as environment variables.
Using a shell profile (`~/.bashrc`, `~/.zshrc`):

```
export SB_PINNING_ENDPOINT='https://api.pinata.cloud/psa'
export SB_PINNING_TOKEN='your-api-token'
```

Using direnv (`.envrc` in your project directory):

```
# .envrc
export SB_PINNING_ENDPOINT='https://api.pinata.cloud/psa'
export SB_PINNING_TOKEN='your-api-token'
```

With these variables set, `--remote-pin` works without passing `--remote-endpoint` and `--remote-token` each time.
After publishing with `--remote-pin`, confirm the pin is active:

```
# 1. Check local pin and block availability
seedbraid pin-health <cid>

# 2. Verify the pinned content is fetchable from the network
seedbraid fetch <cid> --out /tmp/verify.sbd
seedbraid verify /tmp/verify.sbd --genome ./genome --strict
```

If `pin-health` reports the CID is pinned and `fetch` + `verify --strict` succeed, the remote pin is working correctly.
- `kubo daemon not reachable`: install Kubo, start the daemon with `ipfs daemon`, and verify with `seedbraid doctor`
- `Missing required chunk` on decode or verify: provide the correct `--genome`, or re-encode with `--portable`
- `zstd` compression error: install the optional dependency `zstandard`, or use `--compression zlib`
Reconstructing a file requires two things: a seed (the recipe describing chunk order) and the chunks themselves (the actual data). If either is missing, recovery is impossible.
| Scenario | Why It Works |
|---|---|
| Seed on hand + local genome available | Recipe and ingredients are both local |
| Seed on hand + own IPFS node running with chunks pinned | Recipe is local; ingredients are in your node's storage |
| Seed on hand + chunks held by a pinning service (Pinata, etc.) | Recipe is local; ingredients are in a paid storage provider |
| Seed on hand + teammate's IPFS node holds the chunks | Recipe is local; ingredients are on a peer's node |
| Seed created with `--portable` (chunks embedded in seed) | Recipe and ingredients are bundled together in one file |
| Seed on hand + genome snapshot (`.sgs` backup) exists | Recipe is local; ingredients are in a backup archive |
| Scenario | Why It Fails |
|---|---|
| Seed file lost | Without the recipe, there is no way to know which chunks to fetch or how to reassemble them |
| Seed exists, but genome deleted and chunks never published to IPFS | Recipe exists, but all ingredients have been discarded |
| Seed exists, but IPFS node stopped and no other node holds the chunks | Recipe exists, but the only store that had the ingredients is offline |
| Seed exists, but IPFS pin removed and garbage collection ran | Recipe exists, but automatic cleanup deleted the ingredients |
| Seed exists, but pinning service subscription expired | Recipe exists, but the storage provider disposed of the ingredients |
| Seed exists, but even one chunk is missing from all sources | Partial recovery is not supported; every chunk is required |
| Seed is encrypted and the encryption key is lost | The recipe is unreadable without the key |
| Action | Risk Mitigated |
|---|---|
| Back up seed files | Prevents seed loss |
| Use `--pin` when publishing chunks | Prevents IPFS garbage collection |
| Use a pinning service (`--remote-pin`) | Survives local node shutdown |
| Encode with `--portable` | Self-contained seed; no external chunk source needed (seed size increases) |
| Keep encryption keys in a secret manager | Prevents key loss for encrypted seeds |
| Take genome snapshots (`genome snapshot`) | Preserves local chunk data independently of IPFS |
Safest option: `--portable` embeds all chunks in the seed, making it fully self-contained. The trade-off is that the seed grows to roughly the size of the original file, reducing the benefit of IPFS distribution.
| Symptom | Error Code | Next Action |
|---|---|---|
| Encryption requested but key missing | `SB_E_ENCRYPTION_KEY_MISSING` | Pass `--encryption-key` or set `SB_ENCRYPTION_KEY`. |
| Signing requested but key missing | `SB_E_SIGNING_KEY_MISSING` | Export the signing key env var and retry `seedbraid sign`. |
| Kubo daemon unreachable | `SB_E_IPFS_NOT_FOUND` | Install Kubo, run `ipfs daemon`, set `SB_KUBO_API` for a non-default endpoint. |
| IPFS fetch/publish failure | `SB_E_IPFS_FETCH` / `SB_E_IPFS_PUBLISH` | Check daemon/network, retry, use gateway fallback if needed. |
| Remote pin configuration missing | `SB_E_REMOTE_PIN_CONFIG` | Set endpoint/token env vars or pass options. |
| Remote pin auth failed | `SB_E_REMOTE_PIN_AUTH` | Verify provider token permissions and retry. |
| Remote pin request invalid | `SB_E_REMOTE_PIN_REQUEST` | Check CID/provider options and retry. |
| Remote pin timeout/failure | `SB_E_REMOTE_PIN_TIMEOUT` / `SB_E_REMOTE_PIN` | Increase retries/timeout or check provider health. |
| Seed parse/integrity failure | `SB_E_SEED_FORMAT` | Re-fetch or rebuild the seed and verify source integrity. |
| IPFS chunk publish failed | `SB_E_IPFS_CHUNK_PUT` | Check the IPFS daemon, retry, verify chunk availability. |
| IPFS chunk fetch failed | `SB_E_IPFS_CHUNK_GET` | Check daemon/network, retry, use `--gateway` fallback. |
| Chunk manifest invalid | `SB_E_CHUNK_MANIFEST_FORMAT` | Regenerate the manifest with `publish-chunks`. |
| IPFS MFS operation failed | `SB_E_IPFS_MFS` | Verify the daemon is running with `seedbraid doctor`. |
The sections below are for contributors and developers working on Seedbraid itself.
```
uv sync --no-editable --extra dev
```

Optional zstd support:

```
uv sync --no-editable --extra dev --extra zstd
```

Refresh the lockfile after dependency changes:

```
uv lock
```

```
UV_CACHE_DIR=.uv-cache uv run --no-editable ruff check .
PYTHONPATH=src uv run --no-editable python -m pytest
PYTHONPATH=src uv run --no-editable python -m pytest tests/test_compat_fixtures.py
```

IPFS tests auto-skip when the kubo daemon is not reachable.
Compatibility fixtures are stored in `tests/fixtures/compat/v1/` and validated by `tests/test_compat_fixtures.py`.
To regenerate them intentionally:

```
uv run --no-editable python scripts/gen_compat_fixtures.py
```

GitHub Actions workflows:
- `.github/workflows/ci.yml`
  - `ruff check .`
  - `python -m pytest`
  - compatibility fixtures validation
  - benchmark gate
- `.github/workflows/publish-seed.yml`
  - manual only, `dry_run=true` by default
  - generates a seed from `source_path`
  - runs `seedbraid verify --strict`
  - publishes to IPFS only when `dry_run=false`
  - installs Kubo when needed
  - verifies Kubo release signature status and checksum
  - supports `pin`, `portable`, `manifest_private`, and optional `encrypt`
Local parity commands:
```
uv sync --no-editable --extra dev
uv run --no-sync --no-editable ruff check .
PYTHONPATH=src uv run --no-sync --no-editable python -m pytest
PYTHONPATH=src uv run --no-sync --no-editable python -m pytest tests/test_compat_fixtures.py
uv run --no-sync --no-editable python scripts/bench_gate.py \
  --min-reuse-improvement-bps 1 \
  --max-seed-size-ratio 1.20 \
  --min-cdc-throughput-mib-s 0.10 \
  --json-out .artifacts/bench-report.json
```

```
uv run --no-editable python scripts/bench_shifted_dedup.py

uv run --no-editable python scripts/bench_gate.py \
  --min-reuse-improvement-bps 1 \
  --max-seed-size-ratio 1.20 \
  --min-cdc-throughput-mib-s 0.10 \
  --json-out .artifacts/bench-report.json
```

Expected behavior:

- `cdc_buzhash` should show better reuse than `fixed` when a single-byte insertion shifts offsets
- `bench_gate.py` exits non-zero when configured thresholds are violated
- Minimal DVC bridge lives in `examples/dvc/`
- Pipeline stages are `encode -> verify --strict -> fetch`
- The integration recipe and artifact layout are documented in `examples/dvc/README.md`
- ORAS bridge scripts and usage docs live in `examples/oci/`
- Default OCI metadata convention:
  - artifact type: `application/vnd.seedbraid.seed.v1`
  - layer media type: `application/vnd.seedbraid.seed.layer.v1+sbd`
  - annotations: source SHA-256, chunker, manifest-private flag, seed title
- Push/pull scripts:
  - `examples/oci/scripts/push_seed.sh <seed.sbd> <registry/repository:tag>`
  - `examples/oci/scripts/pull_seed.sh <registry/repository:tag> <out.sbd>`
- After pull, run strict verification: `seedbraid verify <out.sbd> --genome <genome-path> --strict`
- Scripts for MLflow metadata logging and Hugging Face upload live in `examples/ml/`
- MLflow hook logs seed metadata fields
- Hugging Face hook uploads `seed.sbd` and a metadata sidecar
- Restore workflow is documented in `examples/ml/README.md`
Current adoption priorities include:
- a faster onboarding path
- stronger benchmark evidence versus alternatives
- security and operator tooling such as signing, encryption, `doctor`, `snapshot`, and `restore`
- stable format governance and backward-compatibility policy for long-lived seed archives
- Format spec: `docs/FORMAT.md`
- Design rationale: `docs/DESIGN.md`
- Threat model: `docs/THREAT_MODEL.md`
- Error codes: `docs/ERROR_CODES.md`
- Performance gates: `docs/PERFORMANCE.md`
- DVC example: `examples/dvc/README.md`
- OCI example: `examples/oci/README.md`
- ML tooling example: `examples/ml/README.md`
Seedbraid is maintained as an open-source project.
If Seedbraid helps your workflow, please consider supporting the project through the repository Sponsor button. Support goes directly toward maintenance, documentation, and compatibility/performance validation.
- License: MIT (`LICENSE`)
- Security policy: `SECURITY.md`
- Contributing guide: `CONTRIBUTING.md`
- Code of Conduct: `CODE_OF_CONDUCT.md`