feat: IPFS distributed chunks and kubo HTTP API migration #33
Merged
- FORMAT.md: add Chunk Manifest Sidecar section
- DESIGN.md: add IPFS Distributed Chunk Storage section
- THREAT_MODEL.md: add chunk distribution risk analysis
- ERROR_CODES.md: add SB_E_IPFS_CHUNK_PUT/GET/UNAVAILABLE, SB_E_CHUNK_MANIFEST_FORMAT
- PERFORMANCE.md: add IPFS chunk fetch performance thresholds

Part of: IPFS distributed chunks feature (v1.2.0, Phase 0)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement Ticket #2 (Phase 1) of IPFS distributed chunks feature (SBD-ECO-006). Adds deterministic CIDv1 raw codec computation from SHA-256 digests.

- src/seedbraid/cid.py: sha256_to_cidv1_raw() and cidv1_raw_to_sha256() functions using stdlib only (base64, hashlib, binascii).
- tests/test_cid.py: 15 test cases (known-value, round-trip, format, determinism, error handling).

Provides foundation for Tickets #3-6 (manifest, IPFSChunkStorage, publish-chunks, fetch-decode operations).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
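For readers unfamiliar with the CID layout, here is a minimal sketch of how a CIDv1 with the raw codec can be derived from a SHA-256 digest using only the standard library. It reuses the function names from the commit message, but the signatures (bytes in, string out) and the error handling are assumptions, not the contents of src/seedbraid/cid.py.

```python
import base64

# Multiformats constants: CID version 1, "raw" codec, sha2-256 multihash code,
# and the digest length in bytes.
_PREFIX = bytes([0x01, 0x55, 0x12, 0x20])
_DIGEST_LEN = 32


def sha256_to_cidv1_raw(digest: bytes) -> str:
    """Encode a 32-byte SHA-256 digest as a base32 CIDv1 (raw codec)."""
    if len(digest) != _DIGEST_LEN:
        raise ValueError("expected a 32-byte SHA-256 digest")
    body = base64.b32encode(_PREFIX + digest).decode("ascii")
    # Multibase prefix "b" means lowercase base32 without padding.
    return "b" + body.rstrip("=").lower()


def cidv1_raw_to_sha256(cid: str) -> bytes:
    """Recover the SHA-256 digest from a base32 CIDv1 (raw codec)."""
    if not cid.startswith("b"):
        raise ValueError("expected a base32 multibase CID")
    body = cid[1:].upper()
    body += "=" * (-len(body) % 8)  # restore the padding b32decode requires
    raw = base64.b32decode(body)
    if len(raw) != len(_PREFIX) + _DIGEST_LEN or raw[:4] != _PREFIX:
        raise ValueError("not a CIDv1 raw/sha2-256 identifier")
    return raw[4:]
```

Because the four prefix bytes are fixed, every CID produced this way starts with `bafkrei`, which is a quick sanity check when comparing against CIDs reported by a kubo node.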
Exclude Claude Code project-local memory and state from version control. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement Ticket #4 (Phase 3) of IPFS Distributed Chunks feature. Adds IPFSChunkStorage class implementing the GenomeStorage Protocol, backed by ipfs block put/get/stat CLI commands with retry/backoff and HTTP gateway fallback. Includes publish_chunk() and fetch_chunk() standalone functions. Adds ACTION_CHECK_IPFS_DAEMON and ACTION_CHECK_IPFS_NETWORK error messaging constants. Includes 15 unit tests with monkeypatch (no IPFS daemon required).

Changes:
- src/seedbraid/errors.py: Add ACTION_CHECK_IPFS_DAEMON, ACTION_CHECK_IPFS_NETWORK constants
- src/seedbraid/ipfs_chunks.py: IPFSChunkStorage class, _require_ipfs(), _fetch_chunk_from_gateway(), publish_chunk(), fetch_chunk()
- tests/test_ipfs_chunks.py: 15 test cases with monkeypatch fixtures

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add publish_chunks_from_genome() for parallel chunk publish with ThreadPoolExecutor (default 16 workers)
- Add seedbraid publish-chunks CLI command with --genome, --workers, --retries, --backoff-ms, --manifest-out options
- Deduplicate digests before publishing (dict.fromkeys)
- Write chunk manifest sidecar (.sbd.chunks.json)
- Display IPFS public network warning per THREAT_MODEL
- Cancel remaining futures on first publish error (fail-fast)
- Add 10 unit tests (function + CLI, monkeypatch patterns)

Part of: IPFS distributed chunk storage (SBD-ECO-006)
Ticket: #5 (Phase 4)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
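The dedup plus fail-fast pattern described in this commit can be sketched roughly as follows. `publish_all` and the `publish_one` callback are hypothetical names for illustration; the real publish_chunks_from_genome() reads digests from a genome and may differ in structure.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Iterable


def publish_all(
    digests: Iterable[bytes],
    publish_one: Callable[[bytes], str],  # hypothetical: publishes one chunk, returns its CID
    workers: int = 16,
) -> dict[bytes, str]:
    """Publish each distinct digest once, in parallel, stopping on the first error."""
    # dict.fromkeys removes duplicates while preserving first-seen order.
    unique = list(dict.fromkeys(digests))
    results: dict[bytes, str] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(publish_one, d): d for d in unique}
        try:
            for fut in as_completed(futures):
                results[futures[fut]] = fut.result()  # re-raises a worker exception
        except Exception:
            # Fail-fast: cancel everything that has not started yet.
            for f in futures:
                f.cancel()
            raise
    return results
```

Note that `Future.cancel()` only affects tasks that have not started; chunks already in flight finish before the pool shuts down.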
Add fetch side of distributed chunk storage: parallel chunk retrieval from IPFS with batch streaming to respect memory bounds. Implements fetch_chunks_parallel (dedup + ThreadPoolExecutor) and fetch_decode_from_ipfs (batch loop with OP_RAW/OP_REF support and SHA-256 verification). Reuse shared IPFSChunkStorage across batches to avoid repeated `which ipfs` calls. Includes 9 tests covering parallel dedup, batch boundaries, SHA-256 mismatch, unavailable chunks, OP_RAW+OP_REF mixing, and CLI integration. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
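To make the batch loop and the SHA-256 check concrete, here is a simplified, single-threaded sketch. `fetch_verified` and `fetch_one` are hypothetical names, the thread pool is left out for brevity, and the real fetch_decode_from_ipfs also handles OP_RAW/OP_REF records, which this sketch does not model.

```python
import hashlib
from typing import Callable, Iterator


def fetch_verified(
    digests: list[bytes],
    fetch_one: Callable[[bytes], bytes],  # hypothetical: returns chunk bytes for a digest
    batch_size: int = 256,
) -> Iterator[bytes]:
    """Yield chunks in order, holding at most one batch in memory at a time."""
    for start in range(0, len(digests), batch_size):
        batch = digests[start:start + batch_size]
        # Fetch each distinct digest in the batch only once.
        fetched = {d: fetch_one(d) for d in dict.fromkeys(batch)}
        for d in batch:
            data = fetched[d]
            if hashlib.sha256(data).digest() != d:
                raise ValueError(f"SHA-256 mismatch for chunk {d.hex()}")
            yield data
```

Bounding the dictionary to one batch is what keeps memory use independent of the total file size.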
Add `/memorize` slash command to update project memory with branch progress. Auto-detects completed tickets by cross-referencing git commits with ticket table, marks them DONE, and updates MEMORY.md. When all tickets complete, offers to clean up (remove entry from MEMORY.md and delete project memory file). Ensures each feature entry is independent for safe deletion. Supports optional user-provided progress notes for manual updates. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add tests/test_ipfs_chunks_integration.py with 7 test cases:
- publish-fetch roundtrip (small file, dedup, batch_size=1)
- manifest sidecar write/read consistency
- SHA-256 verification of decoded output
- progress callback invocation
- Python CID vs ipfs block put CID match

All tests skip gracefully when ipfs CLI is unavailable. Requires local Kubo node (no daemon needed).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Extract _encode() helper to eliminate 6x encode_file boilerplate
- Add autouse _set_ipfs_env fixture to replace per-test monkeypatch
- Consolidate retries/backoff into _RETRIES/_BACKOFF_MS constants
- Move sha256_to_cidv1_raw import to module level
- Use sha256_file(src) instead of manual hashlib.sha256 for oracle
- Add publish result assertions to 3 tests that discarded return

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implement HybridGenomeStorage for decode workflows with local-first lookup and IPFS fallback. Add decode_file_with_genome to accept pre-opened GenomeStorage instances, enabling callers to inject hybrid or custom storage backends. Refactor decode_file as a thin wrapper to maintain backward compatibility.

- HybridGenomeStorage: local SQLite priority, IPFS fallback
- Optionally cache IPFS-fetched chunks in local storage
- decode_file_with_genome: new API for custom storage injection
- decode_file: refactored as wrapper over decode_file_with_genome
- 9 unit tests (IPFS-free, dict-backed mocks)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
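The local-first lookup with optional write-back can be illustrated with dict-backed stores, similar in spirit to the mocks mentioned above. Class names and method signatures here are assumptions for illustration; only the has_chunk/get_chunk/put_chunk method names come from this PR.

```python
from __future__ import annotations


class DictStorage:
    """Hypothetical in-memory store standing in for SQLite or IPFS."""

    def __init__(self) -> None:
        self._chunks: dict[bytes, bytes] = {}

    def has_chunk(self, digest: bytes) -> bool:
        return digest in self._chunks

    def get_chunk(self, digest: bytes) -> bytes | None:
        return self._chunks.get(digest)

    def put_chunk(self, digest: bytes, data: bytes) -> None:
        self._chunks[digest] = data


class HybridStorageSketch:
    """Look in the local store first, then fall back to the remote one."""

    def __init__(self, local, remote, cache_fetched: bool = True) -> None:
        self.local = local
        self.remote = remote
        self.cache_fetched = cache_fetched

    def has_chunk(self, digest: bytes) -> bool:
        return self.local.has_chunk(digest) or self.remote.has_chunk(digest)

    def get_chunk(self, digest: bytes) -> bytes | None:
        data = self.local.get_chunk(digest)
        if data is not None:
            return data  # local hit, no network access needed
        data = self.remote.get_chunk(digest)
        if data is not None and self.cache_fetched:
            # Write back so the next lookup is served locally.
            self.local.put_chunk(digest, data)
        return data
```

Typing both backends against the same protocol is what lets a decoder accept the hybrid object wherever a plain storage instance was expected.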
Add --gateway option for HTTP fallback. Support ipfs:// URI scheme:
- ipfs:// creates temporary cache (discarded after decode)
- ipfs:///path/to/cache persists cache for future reuse

Refactor _decode_with_ipfs_genome to eliminate code duplication using contextlib.nullcontext for unified local/temp cache handling. Define _IPFS_SCHEME constant to centralize scheme reference. When using temporary cache, disable cache_fetched since DB is immediately discarded. Improve clarity with GenomeStorage-typed remote parameter (not hardcoded IPFSChunkStorage).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
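One way to unify the temporary and persistent cache cases with contextlib.nullcontext is sketched below. `cache_dir_for` is a hypothetical helper, and returning the cache_fetched flag alongside the context manager is an assumption about how the two decisions could be coupled, not a description of _decode_with_ipfs_genome.

```python
from __future__ import annotations

import contextlib
import tempfile
from contextlib import AbstractContextManager

_IPFS_SCHEME = "ipfs://"


def cache_dir_for(genome_uri: str) -> tuple[AbstractContextManager[str], bool]:
    """Return (context manager yielding a cache directory, cache_fetched flag)."""
    if not genome_uri.startswith(_IPFS_SCHEME):
        raise ValueError(f"expected an {_IPFS_SCHEME} URI")
    path = genome_uri[len(_IPFS_SCHEME):]
    if path:
        # ipfs:///path/to/cache: the caller owns the directory, so nothing is cleaned up.
        return contextlib.nullcontext(path), True
    # Bare ipfs://: throwaway directory, so caching fetched chunks is pointless.
    return tempfile.TemporaryDirectory(), False
```

Both branches can then be consumed with the same `with ctx as cache_dir:` block, which is the duplication the refactor removes.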
- Add create_chunk_dag() to build MFS directory from chunk manifest and return DAG root CID via ipfs files stat --hash
- Add pin_dag_locally() helper for ipfs pin add with error handling
- Add --pin/--remote-pin options to publish-chunks CLI command (mirrors existing publish command pattern)
- Add SB_E_IPFS_MFS error code for MFS operation failures
- Add 5 monkeypatch-based unit tests for DAG creation and pinning
- Consolidates chunk pinning cost: 16K individual pins → 1 DAG root pin

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ion details Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…elog Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Version bumped to 2.0.0: IPFS distributed chunk storage as major architectural expansion
- CLAUDE.md: updated module list (16 modules), CLI commands (19 commands), crypto extra, ERROR_CODES.md ref
- README.md: added [crypto] optional extra installation instructions
- API reference: added docs for ipfs_chunks, chunk_manifest, hybrid_storage, cid modules
- mkdocs.yml: nav updated with 4 new API reference entries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove SB_E_IPFS_CHUNK_UNAVAILABLE from docs/ERROR_CODES.md (never implemented; consolidate to SB_E_IPFS_CHUNK_GET)
- Update CHANGELOG.md: remove unused error code from v1.2.0, add removal note in v2.0.0
- README.md: remove unused error code from troubleshooting, add PYTHONPATH to Local Checks commands
- DESIGN.md: clarify cache_fetched CLI behavior for ipfs:// bare URI
- docs/index.md: add IPFS distributed chunks to feature overview
- docs/PERFORMANCE.md: update deferred benchmark integration note

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- errors.py: fix genome-snapshot command name to genome snapshot (subcommand separator)
- DESIGN.md: add missing 6 modules to Architecture (pinning, oci, mlhooks, errors, perf, diagnostics)
- pyproject.toml: align description with README, add keywords for discoverability
- CHANGELOG.md: correct CLI command count (19 -> 17, excluding command groups)
- CONTRIBUTING.md: align pytest command with CLAUDE.md (remove redundant UV_CACHE_DIR)

All 16 modules now documented, 17 CLI commands properly counted, all metadata consistent.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- SECURITY.md: update supported versions table for v2.0.0 release
- CODE_OF_CONDUCT.md: update enforcement contact to GitHub Security Advisories (remove stale email)
- README.md: align ruff command with CLAUDE.md/CONTRIBUTING.md (add UV_CACHE_DIR)

Verification round 5 complete. All docs-implementation inconsistencies resolved.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- LICENSE: update copyright from Helix to Seedbraid (complete rename)
- pyproject.toml: upgrade from Beta to Production/Stable classifier

Completes comprehensive docs-implementation audit (5 rounds, 30+ categories). All 17 CLI commands, 24 error codes, 16 modules, binary formats verified. Repository now ready for v2.0.0 release with IPFS distributed chunks.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- "Beta Status" → "Stability": Reflect v2.0.0 as production-ready
- Align README with pyproject.toml classifier (5 - Production/Stable)
- Retain validation guidance for deployment

Resolves final docs-implementation inconsistency (6th audit round).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…HTTP API migration

Update specification documents to reflect subprocess → kubo HTTP RPC API migration:
- DESIGN.md: Add ipfs_http.py module, update Architecture and SBD-ECO-006 sections to describe HTTP RPC API (`/api/v0/`) calls, document SB_KUBO_API env var, and update Assumptions to require kubo daemon instead of CLI.
- THREAT_MODEL.md: Add Risks 8 and 9 (kubo API endpoint exposure, SB_KUBO_API override) and corresponding mitigations (localhost-only default, chunk SHA-256 verification).
- ERROR_CODES.md: Update SB_E_IPFS_* codes to reference HTTP endpoints, add new SB_E_KUBO_API_UNREACHABLE code.

Spec-first policy: docs changes precede implementation (Phase 0). SBD1 binary format unchanged.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- New module src/seedbraid/ipfs_http.py with 8 public functions: api_base_url, post_json, post_raw, post_multipart_json, post_multipart_file_json, post_void, check_daemon, daemon_version
- Zero additional dependencies (stdlib urllib.request only)
- Configurable endpoint via SB_KUBO_API env var, timeout via SB_KUBO_TIMEOUT
- Multi-arg query param support (arg=["a","b"] → arg=a&arg=b)
- Refactored HTTP ops into _execute helper, eliminated code duplication
- Added OSError guard for error-body reads (matches pinning.py pattern)
- 16 unit tests with monkeypatched urlopen, 89% coverage
- Add ACTION_CHECK_KUBO_API constant to errors.py
- Update ACTION_CHECK_IPFS_DAEMON wording for clarity
- Simplifications: _handle_error → NoReturn, removed unreachable returns, _multipart_body uses b"".join() instead of bytes concatenation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
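The multi-arg query parameter behaviour can be reproduced with the standard library alone, as in this sketch. `build_url` is a hypothetical helper, the default address is an assumption (kubo's usual local RPC endpoint), and whether the real api_base_url() already includes `/api/v0` is not stated in this PR.

```python
import os
from urllib.parse import urlencode

# Assumption: kubo's RPC API normally listens on this local address.
_DEFAULT_API = "http://127.0.0.1:5001"


def api_base_url() -> str:
    """Resolve the RPC base URL, honouring the SB_KUBO_API override."""
    return os.environ.get("SB_KUBO_API", _DEFAULT_API).rstrip("/") + "/api/v0"


def build_url(path: str, **params: object) -> str:
    """Join an endpoint path and query parameters into a full URL."""
    # doseq=True expands a list value into repeated keys: arg=a&arg=b.
    query = urlencode(params, doseq=True)
    url = api_base_url() + path
    return f"{url}?{query}" if query else url
```

Parameter names containing hyphens cannot be written as keyword arguments directly, so they would be passed by unpacking a dict, for example `build_url("/block/put", **{"cid-codec": "raw"})`.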
Replace 5 subprocess.run calls (ipfs add, pin add, cat, pin ls, block stat) with ipfs_http module calls. Remove _require_ipfs() and shutil/subprocess imports. Update test_ipfs_reliability.py and test_ipfs_fetch_validation.py to use ipfs_http mocks instead of subprocess/shutil.which mocks. Part of kubo HTTP API migration (Ticket #3, Phase 2). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace subprocess calls in has_chunk(), get_chunk(), put_chunk() with ipfs_http.post_json/post_raw/post_multipart_json. Remove _ipfs_path() method and _ipfs instance variable from IPFSChunkStorage. Keep _require_ipfs() and subprocess imports for MFS/DAG operations (Phase 3 scope: create_chunk_dag, pin_dag_locally). Update test_ipfs_chunks.py: remove _Proc dataclass and _patch_ipfs helper from core tests, migrate all monkeypatches to ipfs_http mocks. MFS/DAG tests retain subprocess mocks. Update integration test skip conditions to check kubo daemon availability via HTTP API. Part of kubo HTTP API migration (Ticket #3, Phase 2). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Use distinct error code SB_E_IPFS_CID_MISMATCH instead of fragile
string matching ("CID mismatch" in str(exc)) for non-retryable
integrity errors in put_chunk().
- Add threading.Lock for _published_count to ensure thread-safe
increments under ThreadPoolExecutor.
- Add comment explaining hyphenated kwargs dict unpacking pattern.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
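A lock-guarded counter of the kind described here might look like the following sketch. `PublishCounter` is a hypothetical class; in the PR the counter is an attribute named _published_count, and where exactly the lock lives is not shown.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class PublishCounter:
    """Counter that stays correct when incremented from worker threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._published_count = 0

    def increment(self) -> None:
        # The read-modify-write is not atomic, so it is done under the lock.
        with self._lock:
            self._published_count += 1

    @property
    def value(self) -> int:
        with self._lock:
            return self._published_count


def count_in_parallel(n: int, workers: int = 8) -> int:
    """Increment a shared counter n times from a thread pool."""
    counter = PublishCounter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(n):
            pool.submit(counter.increment)
    return counter.value
```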
Replace remaining subprocess calls in ipfs_chunks.py (create_chunk_dag, pin_dag_locally) and diagnostics.py (_check_ipfs_cli -> _check_kubo_api) with kubo HTTP RPC via ipfs_http module.

- Remove _require_ipfs() and subprocess/shutil imports from ipfs_chunks.py
- Migrate create_chunk_dag MFS operations (/files/mkdir, /files/cp, /files/stat, /files/rm) to ipfs_http
- Migrate pin_dag_locally to ipfs_http.post_json(/pin/add)
- Rename _check_ipfs_cli -> _check_kubo_api using ipfs_http.daemon_version()
- Update tests: convert subprocess mocks to ipfs_http mocks
- Add pin_dag_locally unit tests

Phase 3 of kubo HTTP API migration (Ticket #4). Completes subprocess elimination from IPFS operations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 4 (CLI): Update doctor docstring and --retries help text to reference kubo HTTP API instead of ipfs CLI.

Phase 5 (Tests & CI): Migrate IPFS tests from subprocess-based ipfs init to kubo HTTP API daemon checks. Add kubo daemon startup to ci.yml and publish-seed.yml with health check polling (vs blind sleep). Rewrite test_ipfs_chunks_integration.py ipfs_repo fixture as _require_kubo and migrate test_cid_matches_ipfs_block_put to ipfs_http.post_multipart_json.

Phase 6 (Documentation): Rewrite README IPFS Setup section and troubleshooting matrix for kubo daemon prerequisites. Add kubo daemon requirement to CONTRIBUTING.md.

Final subprocess zero-check confirms all IPFS-related source and test files are clean. All 258 tests pass; kubo daemon health check polling eliminates race conditions on slow CI runners vs blind 3-second waits.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
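The difference between polling and a blind sleep can be shown with a small helper. This is a Python sketch of the idea only; the workflows in ci.yml and publish-seed.yml may implement the polling differently (for instance in shell), and `wait_until_ready` with its parameters is a hypothetical name.

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool],  # hypothetical probe, e.g. a daemon version request
    timeout_s: float = 30.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll check() until it succeeds or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        if check():
            return True  # ready: return immediately instead of waiting a fixed time
        if clock() >= deadline:
            return False  # give up; the caller decides how to report the failure
        sleep(interval_s)
```

A fast runner proceeds as soon as the daemon answers, while a slow runner gets the full timeout instead of failing after a fixed delay.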
Add explicit type annotations to suppress no-any-return errors from json.loads() and urlopen().read() returns. Fix type narrowing for optional chunk lookup and dag_root_cid usage. Use AbstractContextManager for nullcontext/TemporaryDirectory union. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- subprocess CLI calls to kubo HTTP RPC API (`/api/v0/`) via stdlib `urllib.request`. Zero subprocess calls remain in IPFS-related source and test files
- `ipfs_http.py`, `ipfs_chunks.py`, `cid.py`, `chunk_manifest.py`, `hybrid_storage.py`
- `publish-chunks`, `fetch-decode`, `ipfs://` genome URI in `decode`

Scope
- `ipfs_http`, `ipfs_chunks`, `cid`, `chunk_manifest`, `hybrid_storage`)
- `test_ipfs_http`, `test_ipfs_chunks`, `test_cid`, `test_chunk_manifest`, `test_hybrid_storage`)
- `cli.py`, `codec.py`, `ipfs.py`, `diagnostics.py`, `errors.py`
- `test_ipfs_optional`, `test_ipfs_chunks_integration`, `test_doctor`, `test_ipfs_reliability`, `test_ipfs_fetch_validation`
- `ci.yml` (kubo daemon), `publish-seed.yml` (daemon startup)

Test plan
- sleep 3)

🤖 Generated with Claude Code