feat: IPFS distributed chunks and kubo HTTP API migration#33

Merged
aimsise merged 29 commits into main from feat/ipfs-distributed-chunks
Mar 24, 2026

aimsise (Owner) commented Mar 24, 2026

Summary

  • IPFS distributed chunks (SBD-ECO-006): CDC chunks published as IPFS raw blocks with CIDv1 addressing, parallel fetch-decode, chunk manifest sidecar, hybrid local+IPFS genome storage, MFS-based DAG pinning
  • kubo HTTP API migration: All IPFS operations migrated from subprocess CLI calls to kubo HTTP RPC API (/api/v0/) via stdlib urllib.request. Zero subprocess calls remain in IPFS-related source and test files
  • New modules: ipfs_http.py, ipfs_chunks.py, cid.py, chunk_manifest.py, hybrid_storage.py
  • CLI commands: publish-chunks, fetch-decode, ipfs:// genome URI in decode
  • CI: kubo daemon startup with health check polling, IPFS E2E tests enabled
  • Docs: FORMAT.md spec (SBD-ECO-006), DESIGN.md, THREAT_MODEL.md, ERROR_CODES.md, PERFORMANCE.md, README updated for kubo daemon prerequisites
  • Version: 2.0.0

Scope

| Area | Changes |
| --- | --- |
| New source modules | 5 (ipfs_http, ipfs_chunks, cid, chunk_manifest, hybrid_storage) |
| New test files | 5 (test_ipfs_http, test_ipfs_chunks, test_cid, test_chunk_manifest, test_hybrid_storage) |
| Modified source | cli.py, codec.py, ipfs.py, diagnostics.py, errors.py |
| Modified tests | test_ipfs_optional, test_ipfs_chunks_integration, test_doctor, test_ipfs_reliability, test_ipfs_fetch_validation |
| CI workflows | ci.yml (kubo daemon), publish-seed.yml (daemon startup) |
| Docs | FORMAT.md, DESIGN.md, THREAT_MODEL.md, ERROR_CODES.md, PERFORMANCE.md, README.md, CONTRIBUTING.md |

Test plan

  • 258 tests pass, 8 skipped (IPFS E2E skipped without daemon)
  • ruff lint: all checks passed
  • subprocess zero-check: 0 matches in all IPFS source and test files
  • CI kubo daemon health check polling (replaces blind sleep 3)

🤖 Generated with Claude Code

Helix and others added 29 commits March 20, 2026 03:12
- FORMAT.md: add Chunk Manifest Sidecar section
- DESIGN.md: add IPFS Distributed Chunk Storage section
- THREAT_MODEL.md: add chunk distribution risk analysis
- ERROR_CODES.md: add SB_E_IPFS_CHUNK_PUT/GET/UNAVAILABLE,
  SB_E_CHUNK_MANIFEST_FORMAT
- PERFORMANCE.md: add IPFS chunk fetch performance thresholds

Part of: IPFS distributed chunks feature (v1.2.0, Phase 0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement Ticket #2 (Phase 1) of IPFS distributed chunks feature (SBD-ECO-006).
Adds deterministic CIDv1 raw codec computation from SHA-256 digests.

- src/seedbraid/cid.py: sha256_to_cidv1_raw() and cidv1_raw_to_sha256()
  functions using stdlib only (base64, hashlib, binascii).
- tests/test_cid.py: 15 test cases (known-value, round-trip, format,
  determinism, error handling).

Provides foundation for Tickets #3-6 (manifest, IPFSChunkStorage,
publish-chunks, fetch-decode operations).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
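
For readers unfamiliar with the CID layout, the conversion described in the commit above can be sketched as follows. The function names come from the commit message; the bodies are an illustration written from the public multiformats layout (version byte 0x01, raw codec 0x55, sha2-256 multihash header 0x12 0x20, multibase prefix "b" for lowercase unpadded base32), not the code in src/seedbraid/cid.py.

```python
import base64
import hashlib

# Multiformats constants: CID version 1, "raw" multicodec, sha2-256 multihash.
_CID_V1 = 0x01
_CODEC_RAW = 0x55
_MH_SHA2_256 = 0x12
_DIGEST_LEN = 0x20  # 32 bytes
_PREFIX = bytes([_CID_V1, _CODEC_RAW, _MH_SHA2_256, _DIGEST_LEN])


def sha256_to_cidv1_raw(digest: bytes) -> str:
    """Encode a 32-byte SHA-256 digest as a base32 CIDv1 (raw codec)."""
    if len(digest) != _DIGEST_LEN:
        raise ValueError("expected a 32-byte SHA-256 digest")
    body = base64.b32encode(_PREFIX + digest).decode("ascii")
    # Multibase prefix "b" means lowercase base32 without padding.
    return "b" + body.lower().rstrip("=")


def cidv1_raw_to_sha256(cid: str) -> bytes:
    """Recover the SHA-256 digest from a base32 CIDv1 (raw codec)."""
    if not cid.startswith("b"):
        raise ValueError("expected multibase base32 prefix 'b'")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding b32decode expects
    raw = base64.b32decode(body)
    if raw[:4] != _PREFIX or len(raw) != 4 + _DIGEST_LEN:
        raise ValueError("not a CIDv1 raw sha2-256 identifier")
    return raw[4:]


# Usage: the CID is a pure function of the chunk bytes, no daemon involved.
example_cid = sha256_to_cidv1_raw(hashlib.sha256(b"example chunk").digest())
```

Because the four header bytes are fixed, every CID produced this way starts with "bafkrei" and is 59 characters long, which gives a cheap format check.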
Exclude Claude Code project-local memory and state from version control.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement Ticket #4 (Phase 3) of IPFS Distributed Chunks feature.
Adds IPFSChunkStorage class implementing the GenomeStorage Protocol,
backed by ipfs block put/get/stat CLI commands with retry/backoff
and HTTP gateway fallback. Includes publish_chunk() and fetch_chunk()
standalone functions. Adds ACTION_CHECK_IPFS_DAEMON and
ACTION_CHECK_IPFS_NETWORK error messaging constants. Includes 15
unit tests with monkeypatch (no IPFS daemon required).

Changes:
- src/seedbraid/errors.py: Add ACTION_CHECK_IPFS_DAEMON,
  ACTION_CHECK_IPFS_NETWORK constants
- src/seedbraid/ipfs_chunks.py: IPFSChunkStorage class, _require_ipfs(),
  _fetch_chunk_from_gateway(), publish_chunk(), fetch_chunk()
- tests/test_ipfs_chunks.py: 15 test cases with monkeypatch fixtures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
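
The retry/backoff behaviour mentioned in the commit above might look roughly like this. It is a minimal sketch: `with_retries` is a hypothetical helper, the doubling delay is an assumption (the commit only says "retry/backoff"), and treating `OSError` as the retryable class is also assumed.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    op: Callable[[], T],
    retries: int = 3,
    backoff_ms: int = 200,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call op(); on OSError retry up to `retries` times with doubling delay."""
    attempt = 0
    while True:
        try:
            return op()
        except OSError:
            if attempt >= retries:
                raise  # retries exhausted: surface the last error
            # Assumed policy: backoff_ms, 2*backoff_ms, 4*backoff_ms, ...
            sleep(backoff_ms * (2 ** attempt) / 1000)
            attempt += 1
```

Passing `sleep` as a parameter keeps the helper testable without real delays, in the same spirit as the monkeypatch-based tests the commit describes.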
- Add publish_chunks_from_genome() for parallel chunk publish
  with ThreadPoolExecutor (default 16 workers)
- Add seedbraid publish-chunks CLI command with --genome,
  --workers, --retries, --backoff-ms, --manifest-out options
- Deduplicate digests before publishing (dict.fromkeys)
- Write chunk manifest sidecar (.sbd.chunks.json)
- Display IPFS public network warning per THREAT_MODEL
- Cancel remaining futures on first publish error (fail-fast)
- Add 10 unit tests (function + CLI, monkeypatch patterns)

Part of: IPFS distributed chunk storage (SBD-ECO-006)
Ticket: #5 (Phase 4)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
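
The publish flow in the commit above (dedup via `dict.fromkeys`, a thread pool, fail-fast cancellation) can be sketched as below. `publish_all` and `publish_one` are hypothetical names for illustration; this is not the body of `publish_chunks_from_genome()`.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def publish_all(
    digests: Iterable[bytes],
    publish_one: Callable[[bytes], str],
    workers: int = 16,
) -> dict[bytes, str]:
    """Publish unique digests in parallel; abort on the first failure."""
    unique = list(dict.fromkeys(digests))  # dedup, first-seen order kept
    results: dict[bytes, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(publish_one, d): d for d in unique}
        try:
            for fut in as_completed(futures):
                # result() re-raises any exception from the worker thread.
                results[futures[fut]] = fut.result()
        except Exception:
            # Fail fast: cancel every future that has not started yet.
            for pending in futures:
                pending.cancel()
            raise
    return results
```

Futures that are already running cannot be cancelled, so the pool still waits for in-flight publishes before the error propagates.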
Add the fetch side of distributed chunk storage: parallel chunk retrieval from
IPFS with batch streaming to respect memory bounds. Implement fetch_chunks_parallel
(dedup + ThreadPoolExecutor) and fetch_decode_from_ipfs (batch loop with OP_RAW/OP_REF
support and SHA-256 verification). Reuse a shared IPFSChunkStorage across batches to
avoid repeated `which ipfs` calls. Include 9 tests covering parallel dedup, batch
boundaries, SHA-256 mismatch, unavailable chunks, OP_RAW+OP_REF mixing, and CLI
integration.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
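
The batch loop described in the commit above bounds memory by holding only one batch of chunks at a time and verifies each payload against its digest. A minimal sketch, assuming a hypothetical `fetch_one(digest) -> bytes` callable and deduplicating within each batch only:

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from itertools import islice
from typing import Callable, Iterable, Iterator


def _batched(items: Iterable[bytes], size: int) -> Iterator[list[bytes]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while batch := list(islice(it, size)):
        yield batch


def fetch_verified(
    digests: Iterable[bytes],
    fetch_one: Callable[[bytes], bytes],
    batch_size: int = 256,
    workers: int = 8,
) -> Iterator[bytes]:
    """Yield chunk payloads in recipe order, one bounded batch at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for batch in _batched(digests, batch_size):
            unique = list(dict.fromkeys(batch))  # fetch each digest once
            fetched = dict(zip(unique, pool.map(fetch_one, unique)))
            for digest, data in fetched.items():
                if hashlib.sha256(data).digest() != digest:
                    raise ValueError(f"SHA-256 mismatch for {digest.hex()}")
            for digest in batch:
                yield fetched[digest]
```

A caller can stream the generator straight into the output file, so peak memory is roughly `batch_size` chunks regardless of file size.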
Add `/memorize` slash command to update project memory with branch progress.
Auto-detects completed tickets by cross-referencing git commits with ticket
table, marks them DONE, and updates MEMORY.md. When all tickets complete,
offers to clean up (remove entry from MEMORY.md and delete project memory file).
Ensures each feature entry is independent for safe deletion. Supports optional
user-provided progress notes for manual updates.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add tests/test_ipfs_chunks_integration.py with 7 test cases:
- publish-fetch roundtrip (small file, dedup, batch_size=1)
- manifest sidecar write/read consistency
- SHA-256 verification of decoded output
- progress callback invocation
- Python CID vs ipfs block put CID match

All tests skip gracefully when ipfs CLI is unavailable.
Requires local Kubo node (no daemon needed).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extract _encode() helper to eliminate 6x encode_file boilerplate
- Add autouse _set_ipfs_env fixture to replace per-test monkeypatch
- Consolidate retries/backoff into _RETRIES/_BACKOFF_MS constants
- Move sha256_to_cidv1_raw import to module level
- Use sha256_file(src) instead of manual hashlib.sha256 for oracle
- Add publish result assertions to 3 tests that discarded the return value

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement HybridGenomeStorage for decode workflows with local-first
lookup and IPFS fallback. Add decode_file_with_genome to accept
pre-opened GenomeStorage instances, enabling callers to inject
hybrid or custom storage backends. Refactor decode_file as a thin
wrapper to maintain backward compatibility.

- HybridGenomeStorage: local SQLite priority, IPFS fallback
- Optionally cache IPFS-fetched chunks in local storage
- decode_file_with_genome: new API for custom storage injection
- decode_file: refactored as wrapper over decode_file_with_genome
- 9 unit tests (IPFS-free, dict-backed mocks)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
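
The local-first lookup with remote fallback described in the commit above can be illustrated with dict-backed stores. The method names `get_chunk`/`put_chunk` appear elsewhere in this PR, but their signatures here are assumptions, and `DictStorage`/`HybridStorage` are stand-ins, not the real `HybridGenomeStorage`.

```python
from typing import Optional, Protocol


class GenomeStorage(Protocol):
    """Assumed minimal shape of the storage protocol."""

    def get_chunk(self, digest: bytes) -> Optional[bytes]: ...
    def put_chunk(self, digest: bytes, data: bytes) -> None: ...


class DictStorage:
    """In-memory stand-in used only for this illustration."""

    def __init__(self) -> None:
        self.chunks: dict[bytes, bytes] = {}

    def get_chunk(self, digest: bytes) -> Optional[bytes]:
        return self.chunks.get(digest)

    def put_chunk(self, digest: bytes, data: bytes) -> None:
        self.chunks[digest] = data


class HybridStorage:
    """Try the local store first, then the remote; optionally cache hits."""

    def __init__(
        self,
        local: GenomeStorage,
        remote: GenomeStorage,
        cache_fetched: bool = True,
    ) -> None:
        self.local = local
        self.remote = remote
        self.cache_fetched = cache_fetched

    def get_chunk(self, digest: bytes) -> Optional[bytes]:
        data = self.local.get_chunk(digest)
        if data is not None:
            return data  # local hit: no network round trip
        data = self.remote.get_chunk(digest)
        if data is not None and self.cache_fetched:
            self.local.put_chunk(digest, data)  # warm the local cache
        return data
```

Typing both sides against the protocol rather than a concrete class is what lets tests substitute dict-backed mocks for the IPFS side.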
Add --gateway option for HTTP fallback. Support ipfs:// URI scheme:
- ipfs:// creates temporary cache (discarded after decode)
- ipfs:///path/to/cache persists cache for future reuse

Refactor _decode_with_ipfs_genome to eliminate code duplication
using contextlib.nullcontext for unified local/temp cache handling.
Define _IPFS_SCHEME constant to centralize scheme reference.

When using temporary cache, disable cache_fetched since DB is
immediately discarded. Improve clarity with GenomeStorage-typed
remote parameter (not hardcoded IPFSChunkStorage).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
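
The two `ipfs://` forms in the commit above map onto one code path by returning a context manager either way. A sketch under stated assumptions: `open_cache` and `_temp_cache` are hypothetical names, and the second return value mirrors the `cache_fetched` flag described in the commit.

```python
from __future__ import annotations

import contextlib
import tempfile
from contextlib import AbstractContextManager
from pathlib import Path
from typing import Iterator

_IPFS_SCHEME = "ipfs://"


@contextlib.contextmanager
def _temp_cache() -> Iterator[Path]:
    """Yield a throwaway directory that is deleted on exit."""
    with tempfile.TemporaryDirectory() as name:
        yield Path(name)


def open_cache(genome: str) -> tuple[AbstractContextManager[Path], bool]:
    """Return (cache-directory context, cache_fetched) for an ipfs:// URI."""
    if not genome.startswith(_IPFS_SCHEME):
        raise ValueError(f"not an {_IPFS_SCHEME} genome URI: {genome!r}")
    rest = genome[len(_IPFS_SCHEME):]
    if rest:
        # ipfs:///path/to/cache: persistent cache, caching is worthwhile.
        return contextlib.nullcontext(Path(rest)), True
    # Bare ipfs://: temporary cache, discarded after decode, so skip caching.
    return _temp_cache(), False
```

The caller then writes `with ctx as cache_dir:` once, without branching on which form was given.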
- Add create_chunk_dag() to build MFS directory from chunk manifest
  and return DAG root CID via ipfs files stat --hash
- Add pin_dag_locally() helper for ipfs pin add with error handling
- Add --pin/--remote-pin options to publish-chunks CLI command
  (mirrors existing publish command pattern)
- Add SB_E_IPFS_MFS error code for MFS operation failures
- Add 5 monkeypatch-based unit tests for DAG creation and pinning
- Consolidates chunk pinning cost: 16K individual pins → 1 DAG root pin

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
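
The DAG consolidation in the commit above amounts to copying every chunk CID into one MFS directory and reading back that directory's root hash. The sketch below expresses the steps as kubo RPC endpoints (which a later commit in this PR migrates to) through an injected `post` callable, so it runs without a daemon. The staging path, the cleanup step, and the call order are assumptions, not the actual `create_chunk_dag()` body.

```python
from typing import Callable, Iterable

# Hypothetical transport: post(path, **query_params) -> decoded JSON dict.
Post = Callable[..., dict]


def create_chunk_dag(
    cids: Iterable[str],
    post: Post,
    mfs_dir: str = "/seedbraid-chunks",  # assumed staging path
) -> str:
    """Link all chunk CIDs under one MFS directory and return its root CID."""
    post("files/mkdir", arg=mfs_dir, parents="true")
    try:
        for cid in cids:
            # files/cp takes two args: source (/ipfs/<cid>) and MFS target.
            post("files/cp", arg=[f"/ipfs/{cid}", f"{mfs_dir}/{cid}"])
        return post("files/stat", arg=mfs_dir, hash="true")["Hash"]
    finally:
        # Assumed cleanup of the staging directory after the hash is read.
        post("files/rm", arg=mfs_dir, recursive="true")
```

If the staging directory is removed as in this sketch, the returned root would need to be pinned (for example via /pin/add) to protect the DAG from garbage collection.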
…ion details

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…elog

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Version bumped to 2.0.0: IPFS distributed chunk storage as major architectural expansion
- CLAUDE.md: updated module list (16 modules), CLI commands (19 commands), crypto extra, ERROR_CODES.md ref
- README.md: added [crypto] optional extra installation instructions
- API reference: added docs for ipfs_chunks, chunk_manifest, hybrid_storage, cid modules
- mkdocs.yml: nav updated with 4 new API reference entries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove SB_E_IPFS_CHUNK_UNAVAILABLE from docs/ERROR_CODES.md (never implemented; consolidate to SB_E_IPFS_CHUNK_GET)
- Update CHANGELOG.md: remove unused error code from v1.2.0, add removal note in v2.0.0
- README.md: remove unused error code from troubleshooting, add PYTHONPATH to Local Checks commands
- DESIGN.md: clarify cache_fetched CLI behavior for ipfs:// bare URI
- docs/index.md: add IPFS distributed chunks to feature overview
- docs/PERFORMANCE.md: update deferred benchmark integration note

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- errors.py: fix genome-snapshot command name to genome snapshot (subcommand separator)
- DESIGN.md: add missing 6 modules to Architecture (pinning, oci, mlhooks, errors, perf, diagnostics)
- pyproject.toml: align description with README, add keywords for discoverability
- CHANGELOG.md: correct CLI command count (19 -> 17, excluding command groups)
- CONTRIBUTING.md: align pytest command with CLAUDE.md (remove redundant UV_CACHE_DIR)

All 16 modules now documented, 17 CLI commands properly counted, all metadata consistent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- SECURITY.md: update supported versions table for v2.0.0 release
- CODE_OF_CONDUCT.md: update enforcement contact to GitHub Security Advisories (remove stale email)
- README.md: align ruff command with CLAUDE.md/CONTRIBUTING.md (add UV_CACHE_DIR)

Verification round 5 complete. All docs-implementation inconsistencies resolved.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- LICENSE: update copyright from Helix to Seedbraid (complete rename)
- pyproject.toml: upgrade from Beta to Production/Stable classifier

Completes comprehensive docs-implementation audit (5 rounds, 30+ categories).
All 17 CLI commands, 24 error codes, 16 modules, binary formats verified.
Repository now ready for v2.0.0 release with IPFS distributed chunks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- "Beta Status" → "Stability": Reflect v2.0.0 as production-ready
- Align README with pyproject.toml classifier (5 - Production/Stable)
- Retain validation guidance for deployment

Resolves final docs-implementation inconsistency (6th audit round).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…HTTP API migration

Update specification documents to reflect subprocess → kubo HTTP RPC API migration:
- DESIGN.md: Add ipfs_http.py module, update Architecture and SBD-ECO-006 sections
  to describe HTTP RPC API (`/api/v0/`) calls, document SB_KUBO_API env var,
  and update Assumptions to require kubo daemon instead of CLI.
- THREAT_MODEL.md: Add Risks 8 and 9 (kubo API endpoint exposure,
  SB_KUBO_API override) and corresponding mitigations (localhost-only default,
  chunk SHA-256 verification).
- ERROR_CODES.md: Update SB_E_IPFS_* codes to reference HTTP endpoints,
  add new SB_E_KUBO_API_UNREACHABLE code.

Spec-first policy: docs changes precede implementation (Phase 0).
SBD1 binary format unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New module src/seedbraid/ipfs_http.py with 8 public functions:
  api_base_url, post_json, post_raw, post_multipart_json,
  post_multipart_file_json, post_void, check_daemon, daemon_version
- Zero additional dependencies (stdlib urllib.request only)
- Configurable endpoint via SB_KUBO_API env var, timeout via SB_KUBO_TIMEOUT
- Multi-arg query param support (arg=["a","b"] → arg=a&arg=b)
- Refactored HTTP ops into _execute helper, eliminated code duplication
- Added OSError guard for error-body reads (matches pinning.py pattern)
- 16 unit tests with monkeypatched urlopen, 89% coverage
- Add ACTION_CHECK_KUBO_API constant to errors.py
- Update ACTION_CHECK_IPFS_DAEMON wording for clarity
- Simplifications: _handle_error → NoReturn, removed unreachable returns,
  _multipart_body uses b"".join() instead of bytes concatenation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
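
A stdlib-only RPC helper of the kind described in the commit above can be quite small. In this sketch `api_base_url` and `post_json` reuse names from the commit, but their signatures are assumed; `build_url` is a hypothetical helper, and the default endpoint is kubo's stock API address rather than anything stated in this PR.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumed default: kubo's RPC API listens on 127.0.0.1:5001 out of the box.
_DEFAULT_API = "http://127.0.0.1:5001/api/v0"


def api_base_url() -> str:
    """Return the RPC base URL, honouring the SB_KUBO_API override."""
    return os.environ.get("SB_KUBO_API", _DEFAULT_API).rstrip("/")


def build_url(path: str, **params: object) -> str:
    """Join base URL, endpoint path, and query string.

    List values expand into repeated keys (arg=["a", "b"] -> arg=a&arg=b).
    """
    query = urllib.parse.urlencode(params, doseq=True)
    url = f"{api_base_url()}/{path.lstrip('/')}"
    return f"{url}?{query}" if query else url


def post_json(path: str, timeout: float = 30.0, **params: object) -> dict:
    """POST to a kubo RPC endpoint and decode the JSON reply."""
    req = urllib.request.Request(build_url(path, **params), method="POST")
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```

The kubo RPC API expects POST for every endpoint, which is why even read-only calls such as `post_json("version")` go through the same helper.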
Replace 5 subprocess.run calls (ipfs add, pin add, cat, pin ls,
block stat) with ipfs_http module calls. Remove _require_ipfs()
and shutil/subprocess imports.

Update test_ipfs_reliability.py and test_ipfs_fetch_validation.py
to use ipfs_http mocks instead of subprocess/shutil.which mocks.

Part of kubo HTTP API migration (Ticket #3, Phase 2).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace subprocess calls in has_chunk(), get_chunk(), put_chunk()
with ipfs_http.post_json/post_raw/post_multipart_json. Remove
_ipfs_path() method and _ipfs instance variable from
IPFSChunkStorage.

Keep _require_ipfs() and subprocess imports for MFS/DAG
operations (Phase 3 scope: create_chunk_dag, pin_dag_locally).

Update test_ipfs_chunks.py: remove _Proc dataclass and
_patch_ipfs helper from core tests, migrate all monkeypatches
to ipfs_http mocks. MFS/DAG tests retain subprocess mocks.

Update integration test skip conditions to check kubo daemon
availability via HTTP API.

Part of kubo HTTP API migration (Ticket #3, Phase 2).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use distinct error code SB_E_IPFS_CID_MISMATCH instead of fragile
  string matching ("CID mismatch" in str(exc)) for non-retryable
  integrity errors in put_chunk().
- Add threading.Lock for _published_count to ensure thread-safe
  increments under ThreadPoolExecutor.
- Add comment explaining hyphenated kwargs dict unpacking pattern.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
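
Both fixes in the commit above follow common patterns: classify errors by a code attribute instead of message text, and guard a shared counter with a lock. The classes below are hypothetical stand-ins (the real exception type and counter live in the seedbraid sources); only the error-code strings are taken from this PR.

```python
import threading


class SeedbraidError(Exception):
    """Hypothetical error type carrying a machine-readable code."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code


def is_retryable(exc: BaseException) -> bool:
    """Integrity failures are final; everything else may be retried."""
    return getattr(exc, "code", None) != "SB_E_IPFS_CID_MISMATCH"


class PublishProgress:
    """Counter that is safe to bump from ThreadPoolExecutor workers."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._published_count = 0

    def record(self) -> int:
        # += is a read-modify-write, so it must happen under the lock.
        with self._lock:
            self._published_count += 1
            return self._published_count

    @property
    def count(self) -> int:
        with self._lock:
            return self._published_count
```

Comparing `exc.code` keeps the retry decision stable even if the human-readable message is reworded later.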
Replace remaining subprocess calls in ipfs_chunks.py
(create_chunk_dag, pin_dag_locally) and diagnostics.py
(_check_ipfs_cli -> _check_kubo_api) with kubo HTTP RPC
via ipfs_http module.

- Remove _require_ipfs() and subprocess/shutil imports
  from ipfs_chunks.py
- Migrate create_chunk_dag MFS operations (/files/mkdir,
  /files/cp, /files/stat, /files/rm) to ipfs_http
- Migrate pin_dag_locally to ipfs_http.post_json(/pin/add)
- Rename _check_ipfs_cli -> _check_kubo_api using
  ipfs_http.daemon_version()
- Update tests: convert subprocess mocks to ipfs_http mocks
- Add pin_dag_locally unit tests

Phase 3 of kubo HTTP API migration (Ticket #4).
Completes subprocess elimination from IPFS operations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 4 (CLI): Update doctor docstring and --retries help text
to reference kubo HTTP API instead of ipfs CLI.

Phase 5 (Tests & CI): Migrate IPFS tests from subprocess-based
ipfs init to kubo HTTP API daemon checks. Add kubo daemon startup
to ci.yml and publish-seed.yml with health check polling (vs blind sleep).
Rewrite test_ipfs_chunks_integration.py ipfs_repo fixture as _require_kubo
and migrate test_cid_matches_ipfs_block_put to ipfs_http.post_multipart_json.

Phase 6 (Documentation): Rewrite README IPFS Setup section and
troubleshooting matrix for kubo daemon prerequisites. Add kubo daemon
requirement to CONTRIBUTING.md. Final subprocess zero-check confirms
all IPFS-related source and test files are clean.

All 258 tests pass. Health check polling replaces the blind 3-second wait,
eliminating the startup race on slow CI runners.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
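
The polling approach in the commit above replaces a fixed sleep with a bounded wait on a readiness probe. The CI workflows themselves are not shown in this PR page, so the following is only a sketch of the idea; `wait_until_ready` is a hypothetical helper and the timeout and interval defaults are assumptions.

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool],
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it returns True or the deadline passes."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True  # daemon answered: proceed immediately
        if clock() >= deadline:
            return False  # give up instead of hanging the job
        sleep(interval_s)
```

A probe such as the `check_daemon` function listed for ipfs_http.py could serve as `check`; a fast runner proceeds as soon as the daemon answers, while a slow one gets the full timeout instead of a fixed three seconds.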
Add explicit type annotations to suppress no-any-return errors
from json.loads() and urlopen().read() returns. Fix type
narrowing for optional chunk lookup and dag_root_cid usage.
Use AbstractContextManager for nullcontext/TemporaryDirectory union.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
aimsise merged commit f6989d0 into main Mar 24, 2026
6 checks passed
aimsise deleted the feat/ipfs-distributed-chunks branch March 24, 2026 18:25