IPFS Kit Python

IPFS Kit Python is a comprehensive, production-ready Python toolkit for building distributed storage applications on IPFS. It provides high-level APIs, advanced cluster management, AI/ML integration, and seamless MCP (Model Context Protocol) server support for modern decentralized applications.

🎯 What Can You Do With This?

For Developers

Build Decentralized Apps: High-level Python API for IPFS without complexity
Scale with Clusters: Multi-node cluster management with automatic replication
Integrate AI Models: Store and retrieve ML models/datasets on IPFS
Create Storage Services: Production-ready foundation for IPFS-based services

For Data Scientists

Distributed Datasets: Store and share large datasets across the IPFS network
Model Versioning: Track and distribute ML models with content addressing
Reproducible Research: Immutable data storage with cryptographic verification
Collaborative Workflows: Share data and models via IPFS with team members

For DevOps/SRE

High Availability: Multi-node clusters with leader election and failover
Observability: Built-in metrics, logging, and monitoring
Container Native: Docker- and Kubernetes-ready deployment
Auto-Healing: Automatic error detection and recovery system

✨ Key Features

Core IPFS Operations

🌐 High-Level API: Simplified Python interface wrapping IPFS complexity
📦 Content Management: Add, get, pin, and manage content with ease
🔗 IPNS Support: Mutable pointers to immutable IPFS content
📊 Directory Operations: Work with IPFS directories and file structures
🔍 Content Discovery: Find and retrieve content across the IPFS network

Advanced Cluster Management

🔄 Multi-Node Clusters: Deploy 3+ node clusters with role hierarchy
👑 Leader Election: Automatic leader selection and failover
🎭 Role-Based: Master, Worker, and Leecher role management
📈 Auto-Scaling: Automatically replicate content based on demand
🔗 Peer Management: Dynamic peer discovery and connection handling
💾 Distributed Storage: Spread content across multiple nodes

AI/ML Integration

🤖 Model Registry: Store and version ML models on IPFS
📊 Dataset Management: Manage large datasets with IPFS chunking
Framework Support: LangChain, LlamaIndex, Transformers integration
📉 Metrics Tracking: Model performance metrics and visualization
🧮 Distributed Training: Share training data across nodes
🎯 Vector Search: GraphRAG and knowledge graph integration

MCP Server

🌟 Production Ready: Full-featured MCP server implementation
🛠️ Tool Integration: Expose IPFS operations as MCP tools
🔌 Plugin System: Extensible architecture for custom tools
📡 Real-Time: WebSocket support for streaming operations
🎨 Dashboard: Web-based management and monitoring interface
🔐 Secure: Built-in authentication and authorization

Storage & Performance

📦 Tiered Storage: Multi-tier caching (memory, SSD, network)
⚡ High Performance: Async/await throughout for concurrency
🔄 Write-Ahead Log: Crash recovery and data consistency
🗜️ Compression: Automatic compression for large files
📊 Metadata Index: Fast content lookup and search
🚀 Prefetching: Predictive content loading for speed

Operations & Monitoring

🔍 Observability: Prometheus metrics, structured logging, tracing
🏥 Health Checks: Built-in health endpoints for monitoring
🔧 Auto-Healing: Detect and fix common errors automatically
📈 Performance Metrics: Real-time performance tracking
🎛️ Configuration: Flexible YAML/JSON configuration
🔔 Alerting: Integration with monitoring systems

Deployment & Integration

🐳 Docker Ready: Multi-arch Docker images (AMD64, ARM64)
☸️ Kubernetes: Helm charts and operator support
🔄 CI/CD: GitHub Actions workflows included
🌐 Cloud Native: Deploy on any cloud provider
🔌 Extensible: Plugin system for custom functionality
📚 Well Documented: Comprehensive guides and examples

🏗️ Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Applications Layer                      │
│   (Your App, CLI, Web Dashboard, API Services)              │
└───────────────────────────┬─────────────────────────────────┘
                            │
┌───────────────────────────▼─────────────────────────────────┐
│                    High-Level API                           │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────┐   │
│  │  IPFS    │  │ Cluster  │  │  AI/ML   │  │   MCP      │   │
│  │  Ops     │  │  Mgmt    │  │  Tools   │  │  Server    │   │
│  └──────────┘  └──────────┘  └──────────┘  └────────────┘   │
└───────────────────────────┬─────────────────────────────────┘
                            │
┌───────────────────────────▼─────────────────────────────────┐
│                   Core Services Layer                       │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────┐   │
│  │ Tiered   │  │  WAL &   │  │ Metadata │  │   Pin      │   │
│  │  Cache   │  │ Journal  │  │  Index   │  │  Manager   │   │
│  └──────────┘  └──────────┘  └──────────┘  └────────────┘   │
└───────────────────────────┬─────────────────────────────────┘
                            │
┌───────────────────────────▼─────────────────────────────────┐
│                    IPFS Daemon Layer                        │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌────────────┐   │
│  │   Kubo   │  │  Cluster │  │  Lotus   │  │  Lassie    │   │
│  │  (IPFS)  │  │ Service  │  │(Filecoin)│  │ (Retrieval)│   │
│  └──────────┘  └──────────┘  └──────────┘  └────────────┘   │
└─────────────────────────────────────────────────────────────┘

🗄️ Storage Architecture & Backends

Multi-Backend Storage System

IPFS Kit supports 7 integrated storage backends for maximum flexibility and redundancy:

IPFS/Kubo - Decentralized content-addressed storage
Filecoin/Lotus - Long-term archival with economic incentives
S3-Compatible - AWS S3, MinIO, and other S3-compatible services
Storacha (Web3.Storage) - Web3 storage built on IPFS + Filecoin
HuggingFace - ML model and dataset storage
Lassie - High-performance IPFS retrieval client
Walrus - fsspec-compatible blob storage with direct blob-id reads and local logical-path indexing

Multi-Tier Storage Strategy

┌─────────────────────────────────────────────────────────────┐
│  Tier 1: Memory Cache (100MB default)                       │
│  • Fastest access (microseconds)                            │
│  • Hot content, recently accessed                           │
│  • ARC algorithm (Adaptive Replacement Cache)               │
└────────────────────────┬────────────────────────────────────┘
                         │ Auto-promotion/demotion
┌────────────────────────▼────────────────────────────────────┐
│  Tier 2: Disk Cache (1GB+ default)                          │
│  • Fast persistent storage (milliseconds)                   │
│  • Warm content, frequently accessed                        │
│  • Heat-based eviction, zero-copy mmap                      │
└────────────────────────┬────────────────────────────────────┘
                         │ Overflow & long-term
┌────────────────────────▼────────────────────────────────────┐
│  Tier 3: IPFS Network                                       │
│  • Distributed content-addressed storage                    │
│  • Peer discovery, automatic replication                    │
│  • DHT-based content routing                                │
└────────────────────────┬────────────────────────────────────┘
                         │ Backup & durability
┌────────────────────────▼────────────────────────────────────┐
│  Tier 4: Cloud Backends (S3, Storacha, Filecoin)            │
│  • Long-term archival, geographical distribution            │
│  • Economic persistence, compliance storage                 │
│  • Cross-region replication                                 │
└─────────────────────────────────────────────────────────────┘

Storage Backend Configuration

from ipfs_kit_py.high_level_api import IPFSSimpleAPI

# Initialize with multiple backends
api = IPFSSimpleAPI(
    storage_backends={
        'ipfs': {'enabled': True},
        'filecoin': {
            'enabled': True,
            'lotus_path': '/path/to/lotus'
        },
        's3': {
            'enabled': True,
            'bucket': 'my-ipfs-backup',
            'region': 'us-west-2'
        },
        'storacha': {
            'enabled': True,
            'token': 'your_token',
            'space': 'your_space_did'
        }
    }
)

# Content automatically distributed across backends
result = api.add("important_data.txt", backends=['ipfs', 'filecoin', 's3'])
cid = result['cid']

See Also: Storage Backends Documentation

Walrus fsspec Usage

The Walrus backend registers the walrus:// protocol with fsspec and supports publisher writes, aggregator reads, direct blob-id reads, and index-backed logical paths. ipfs_kit_py delegates the backend implementation to the standalone walrus-fsspec package while preserving the historical ipfs_kit_py.walrus_fsspec import path and Walrus environment variable aliases:

import fsspec
import ipfs_kit_py.walrus_fsspec  # registers walrus://

fs = fsspec.filesystem("walrus")
entry = fs.pipe_file("walrus://examples/hello.txt", b"hello walrus\n")

with fsspec.open("walrus://examples/hello.txt", "rb") as handle:
    print(handle.read())

with fsspec.open(f"walrus://{entry['blob_id']}", "rb") as handle:
    print(handle.read())

Set WALRUS_PUBLISHER_URL for writes, WALRUS_AGGREGATOR_URL for reads, and WALRUS_DELETE_URL for deletes. See the Walrus fsspec integration guide for full configuration, examples, and listing/deletion limitations.
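
Before relying on the backend, it can help to check which of these variables are actually set. The following is a minimal sketch using only the standard library. The helper name `walrus_capabilities` and the operation labels are illustrative assumptions, not part of walrus-fsspec; only the three variable names come from this README.

```python
import os

# Variable names are taken from the README; the operation labels are illustrative.
WALRUS_ENV_VARS = {
    "write": "WALRUS_PUBLISHER_URL",
    "read": "WALRUS_AGGREGATOR_URL",
    "delete": "WALRUS_DELETE_URL",
}


def walrus_capabilities(environ=None):
    """Return {operation: bool} showing which Walrus endpoints are configured."""
    env = os.environ if environ is None else environ
    return {
        operation: bool(env.get(variable, "").strip())
        for operation, variable in WALRUS_ENV_VARS.items()
    }


if __name__ == "__main__":
    for operation, configured in walrus_capabilities().items():
        print(f"{operation}: {'configured' if configured else 'missing'}")
```

Passing a plain dict instead of `os.environ` makes the check easy to exercise in tests.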

🔄 Replica Management

Replication Strategies

IPFS Kit provides sophisticated replica management for high availability and data durability:

Cluster-Based Replication:

from ipfs_kit_py.high_level_api import IPFSSimpleAPI

# Initialize as cluster master
api = IPFSSimpleAPI(role="master")

# Add content with 3 replicas across cluster
result = api.cluster_add(
    "dataset.tar.gz",
    replication_factor=3,  # Distribute to 3 nodes
    replication_policy="distributed"  # Strategy: distributed, local-first, geo-aware
)

# Check replication status
status = api.cluster_status(result['cid'])
print(f"Replicas: {len(status['peers'])} nodes")
print(f"Locations: {status['peer_locations']}")

Pin Management with Replication:

# Pin with min/max replica constraints
api.pin_add(
    cid,
    replication_min=2,  # Minimum 2 copies
    replication_max=5,  # Maximum 5 copies
    replication_priority="high"  # Auto-repair if below min
)

# Monitor replica health
health = api.get_replication_health(cid)
# Returns: {'total': 3, 'healthy': 3, 'degraded': 0, 'locations': [...]}

Replication Policies:

Distributed: Spread replicas across maximum geographic/network distance
Local-First: Keep replicas in nearby nodes first, then expand
Geo-Aware: Place replicas in specific regions or datacenters
Cost-Optimized: Balance between redundancy and storage costs
Latency-Optimized: Replicate to nodes with best access patterns
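
To make the difference between two of these policies concrete, here is a simplified sketch of how "local-first" and "distributed" placement might choose nodes. It is not the library's placement logic: the `Node` fields, the `choose_nodes` helper, and the use of region and latency as the only inputs are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical node description used only in this sketch.
    name: str
    region: str
    latency_ms: float


def choose_nodes(nodes, replicas, policy="distributed"):
    """Pick `replicas` nodes according to a simplified placement policy."""
    by_latency = sorted(nodes, key=lambda node: node.latency_ms)
    if policy == "local-first":
        # Nearest nodes first, regardless of region.
        return by_latency[:replicas]
    if policy == "distributed":
        # Take the nearest node of each region first, then fill up by latency.
        chosen, seen_regions = [], set()
        for node in by_latency:
            if node.region not in seen_regions:
                chosen.append(node)
                seen_regions.add(node.region)
        for node in by_latency:
            if node not in chosen:
                chosen.append(node)
        return chosen[:replicas]
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    cluster = [
        Node("a", "us", 5), Node("b", "us", 7),
        Node("c", "eu", 80), Node("d", "ap", 150),
    ]
    print([n.name for n in choose_nodes(cluster, 2, "local-first")])
    print([n.name for n in choose_nodes(cluster, 3, "distributed")])
```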

Automatic Repair:

# Enable auto-repair for critical content
api.enable_auto_repair(
    cid,
    check_interval=3600,  # Check every hour
    repair_threshold=2,   # Repair if below 2 replicas
    target_replicas=3     # Maintain 3 replicas
)
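
The interplay of `repair_threshold` and `target_replicas` can be read as a small decision rule. The sketch below is one plausible interpretation of the comments above (repair only when the count drops below the threshold, then restore to the target). The function name `replicas_to_add` is hypothetical and the real scheduler may behave differently.

```python
def replicas_to_add(current, repair_threshold=2, target_replicas=3):
    """Return how many new replicas a repair pass should create."""
    if repair_threshold > target_replicas:
        raise ValueError("repair_threshold must not exceed target_replicas")
    if current >= repair_threshold:
        return 0  # at or above the threshold: no repair is triggered
    return target_replicas - current  # restore all the way to the target


if __name__ == "__main__":
    for count in (0, 1, 2, 3):
        print(count, "->", replicas_to_add(count))
```

Under this reading, content holding exactly 2 replicas is left alone even though the target is 3, which avoids repair churn when a single node briefly drops out.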

See Also: Cluster Management, Pin Management

💾 Multi-Tier Caching System

Advanced Caching with ARC Algorithm

IPFS Kit implements a sophisticated Adaptive Replacement Cache (ARC) with multiple tiers:

Cache Tiers:

Memory Cache (T1/T2)
- ARC algorithm balances recency vs frequency
- Configurable size (default: 100MB)
- Submillisecond access times
- Automatic size-based decisions
Disk Cache
- Persistent across restarts
- Heat-based eviction (access patterns + recency)
- Memory-mapped for zero-copy access
- Configurable size (default: 1GB+)
Network Cache
- IPFS network acts as distributed cache
- Content-addressed retrieval
- Peer caching benefits

Cache Configuration

from ipfs_kit_py.tiered_cache import TieredCacheManager

# Custom cache configuration
cache = TieredCacheManager(
    config={
        'memory_cache_size': 500 * 1024 * 1024,  # 500MB
        'disk_cache_size': 10 * 1024 * 1024 * 1024,  # 10GB
        'disk_cache_path': '/fast/ssd/cache',
        'enable_mmap': True,  # Zero-copy for large files
        'eviction_policy': 'heat',  # heat, lru, lfu
        'promotion_threshold': 3,  # Access count for promotion
    }
)

# Cache operations (automatic tier selection)
cache.put(cid, content)  # Intelligent tier placement
content = cache.get(cid)  # Fastest available tier

# Cache statistics
stats = cache.get_stats()
print(f"Hit rate: {stats['hit_rate']:.2%}")
print(f"Memory: {stats['memory_usage']}, Disk: {stats['disk_usage']}")

Cache Policies

Heat Scoring - Combines multiple factors:

Access frequency (recent access count)
Recency (time since last access)
Content size (smaller = higher priority)
Access pattern (sequential vs random)
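
One way to picture how these four factors could combine into a single number is sketched below. The formula, the one-hour half-life, and the 1.2 bonus for sequential access are assumptions chosen for illustration; the README does not specify the scoring function that IPFS Kit actually uses.

```python
import math
import time


def heat_score(access_count, last_access, size_bytes, sequential=False,
               now=None, half_life_s=3600.0):
    """Toy heat score: higher means the item is a better cache candidate."""
    now = time.time() if now is None else now
    age_s = max(0.0, now - last_access)
    frequency = math.log1p(access_count)        # diminishing returns on raw counts
    recency = 0.5 ** (age_s / half_life_s)      # halves every half_life_s seconds
    # Smaller content scores higher; size is measured in MiB on a log scale.
    size_factor = 1.0 / (1.0 + math.log2(1.0 + size_bytes / 1_048_576))
    pattern_bonus = 1.2 if sequential else 1.0  # mild preference for sequential reads
    return frequency * recency * size_factor * pattern_bonus


if __name__ == "__main__":
    t0 = 1_000_000.0
    print(heat_score(10, t0, 4096, now=t0))
    print(heat_score(10, t0, 4096, now=t0 + 7200))  # two half-lives later
```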

Automatic Optimization:

Content promoted from disk → memory on repeated access
Large files use memory-mapped I/O (no duplication)
Rarely accessed content demoted to network tier
Cache pre-warming for predictable workloads
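
The promotion rule in the first bullet can be illustrated with a toy two-tier cache. This sketch uses plain LRU ordering for the memory tier instead of ARC to stay short, and the class name `TwoTierCache` and its attributes are hypothetical. It only shows the idea of promoting after a configurable number of reads and demoting (not dropping) evicted items.

```python
from collections import OrderedDict


class TwoTierCache:
    """Toy memory+disk cache: items are promoted to memory after repeated reads."""

    def __init__(self, memory_items=2, promotion_threshold=3):
        self.memory = OrderedDict()  # stands in for the memory tier (LRU order)
        self.disk = {}               # stands in for the disk tier
        self.hits = {}               # reads served from the disk tier, per key
        self.memory_items = memory_items
        self.promotion_threshold = promotion_threshold

    def put(self, key, value):
        self.memory.pop(key, None)   # drop any stale copy in the fast tier
        self.disk[key] = value       # new content starts in the slower tier
        self.hits[key] = 0

    def get(self, key):
        if key in self.memory:
            self.memory.move_to_end(key)  # refresh recency
            return self.memory[key]
        value = self.disk[key]            # raises KeyError for unknown keys
        self.hits[key] += 1
        if self.hits[key] >= self.promotion_threshold:
            self._promote(key, value)
        return value

    def _promote(self, key, value):
        self.memory[key] = value
        del self.disk[key]
        while len(self.memory) > self.memory_items:
            old_key, old_value = self.memory.popitem(last=False)  # least recent
            self.disk[old_key] = old_value                        # demote it
            self.hits[old_key] = 0
```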

See Also: Tiered Cache Documentation

📁 VFS Buckets & Virtual Filesystem

Virtual Filesystem (VFS) Operations

IPFS Kit provides a POSIX-like virtual filesystem on top of IPFS, enabling familiar file operations:

from ipfs_kit_py.vfs_manager import get_global_vfs_manager

vfs = get_global_vfs_manager()

# File operations (like regular filesystem)
vfs.mkdir("/data/projects")
vfs.write("/data/projects/notes.txt", "Project notes...")
content = vfs.read("/data/projects/notes.txt")

# Directory operations
files = vfs.ls("/data/projects")
vfs.mv("/data/projects/old", "/data/archive/old")
vfs.rm("/data/temp/cache.db")

# Batch operations
vfs.copy_recursive("/data/input", "/data/processed")

VFS Buckets

Buckets are isolated namespaces within the VFS for organizing content:

# Create and manage buckets
vfs.create_bucket("ml-models", quota="10GB", policy="hot")
vfs.create_bucket("datasets", quota="100GB", policy="warm")
vfs.create_bucket("archive", quota="1TB", policy="cold")

# Bucket operations
vfs.write("/ml-models/resnet50.h5", model_data)
vfs.set_bucket_policy("ml-models", {
    'replication': 3,
    'cache_priority': 'high',
    'backup_schedule': 'daily'
})

# List buckets and usage
buckets = vfs.list_buckets()
for bucket in buckets:
    print(f"{bucket['name']}: {bucket['used']}/{bucket['quota']}")
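
Quota strings such as "10GB" need to be turned into byte counts before usage can be compared against them. The helper below is a small sketch of that conversion. It assumes binary units (1GB = 1024^3 bytes), which the README does not confirm, and the names `parse_size` and `usage_ratio` are illustrative.

```python
import re

# Binary multiples are an assumption; adjust if quotas use decimal units.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text):
    """Convert strings such as '10GB' or '1.5 TB' to a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)\s*", text.upper())
    if match is None:
        raise ValueError(f"unrecognized size: {text!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


def usage_ratio(used_bytes, quota):
    """Fraction of the quota that is already used."""
    return used_bytes / parse_size(quota)


if __name__ == "__main__":
    print(parse_size("10GB"))
    print(f"{usage_ratio(5 * 1024**3, '10GB'):.0%}")
```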

VFS Features

Journaling & Change Tracking:

# Filesystem journal tracks all changes
journal = vfs.get_journal(since="2024-01-01")
for entry in journal:
    print(f"{entry['timestamp']}: {entry['operation']} {entry['path']}")

# Replicate changes to other nodes
vfs.replicate_journal(target_node="node2.example.com")
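
For intuition about what a change journal with a `since` filter involves, here is a standalone sketch of an append-only JSON Lines journal. The entry fields (`timestamp`, `operation`, `path`) mirror the loop above, but the file format and the helper names `append_entry` and `read_since` are assumptions and do not describe how the VFS journal is stored.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_entry(journal_path, operation, path, timestamp=None):
    """Append one change record as a JSON line and return it."""
    entry = {
        "timestamp": (timestamp or datetime.now(timezone.utc)).isoformat(),
        "operation": operation,
        "path": path,
    }
    with Path(journal_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def read_since(journal_path, since):
    """Return entries whose timestamp is on or after `since` (an aware datetime)."""
    entries = []
    with Path(journal_path).open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue  # tolerate blank lines
            entry = json.loads(line)
            if datetime.fromisoformat(entry["timestamp"]) >= since:
                entries.append(entry)
    return entries
```

Because records are only ever appended, replaying the file from a given timestamp yields the same ordered change list on every node that receives it.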

Metadata & Indexing:

# Automatic metadata extraction and indexing
vfs.write("/docs/paper.pdf", pdf_data, 
    metadata={'author': 'Smith', 'year': 2024})

# Enhanced pin index for fast lookup
results = vfs.search(query="machine learning", content_type="pdf")

See Also: VFS Management, Filesystem Journal

🧠 GraphRAG & Knowledge Graphs

Intelligent Search with GraphRAG

IPFS Kit integrates GraphRAG (Graph-based Retrieval Augmented Generation) for semantic search and knowledge management:

VFS GraphRAG Indexing

VFS GraphRAG indexing adds a dependency-light local index for virtual filesystem metadata, text chunks, embedding metadata, graph entities, relationships, snapshots, checkpoints, and portable export bundles. JSONL storage works without live IPFS, vector database, LLM, or ipfs_datasets_py services; optional adapters can provide richer chunking, embeddings, and knowledge graph extraction.

python -m ipfs_kit_py.cli vfs index \
  --index-root /tmp/vfs-graphrag \
  --namespace research \
  --path /data/reports/policy.md \
  --backend local \
  --protocol file \
  --mime-type text/markdown \
  --metadata-json '{"classification":"public"}'

python -m ipfs_kit_py.cli vfs search "policy" \
  --index-root /tmp/vfs-graphrag \
  --namespace research \
  --type hybrid \
  --filters-json '{"classification":"public"}'

from ipfs_kit_py.vfs_manager import VFSManager

vfs = VFSManager(storage_path="/srv/ipfs-kit-state")
vfs.enable_graphrag_indexing_sync(
    index_path="/srv/ipfs-kit-state/.vfs_graphrag_index",
    namespace="research",
)
vfs.index_namespace_sync("research", root_path="/data/reports", recursive=True)
results = vfs.search_sync(
    "policy",
    namespaces=["research"],
    metadata_filters={"classification": "public"},
    search_type="hybrid",
)
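
The `metadata_filters` argument restricts results to records whose metadata matches the given keys. A minimal sketch of such a filter is shown below, assuming exact-match semantics on every key. The record layout and the helper names are hypothetical, and the actual index may support richer operators.

```python
def matches_filters(metadata, filters):
    """True when every filter key is present in metadata with an equal value."""
    return all(
        key in metadata and metadata[key] == expected
        for key, expected in filters.items()
    )


def filter_records(records, filters):
    """Keep only records whose 'metadata' dict satisfies all filters."""
    return [
        record for record in records
        if matches_filters(record.get("metadata", {}), filters)
    ]


if __name__ == "__main__":
    records = [
        {"path": "/data/reports/policy.md", "metadata": {"classification": "public"}},
        {"path": "/data/reports/draft.md", "metadata": {"classification": "internal"}},
    ]
    print(filter_records(records, {"classification": "public"}))
```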

Export a searchable VFS snapshot with:

python -m ipfs_kit_py.cli vfs export-index \
  --index-root /tmp/vfs-graphrag \
  --namespace research \
  --output /tmp/vfs-snapshot

See VFS GraphRAG Indexing for configuration, indexing workflows, metadata/vector/graph search examples, export and import bundles, privacy controls, dependency requirements, and backend limitations.

Automatic Content Indexing:

# All VFS operations auto-index content
vfs.write("/docs/research.md", markdown_content)
# → Automatic entity extraction, relationship mapping, graph building

# Search across indexed content
results = api.search_text("quantum computing applications")
results = api.search_graph("quantum computing", max_depth=2)
results = api.search_vector("semantic similarity query", threshold=0.7)

Knowledge Graph Features

Entity Recognition:

Automatic extraction of people, places, organizations, concepts
Relationship mapping between entities
RDF triple store for structured knowledge
Graph analytics (centrality, importance scoring)

Search Methods:

Text Search - Full-text with relevance scoring
Graph Search - Traverse knowledge graph connections
Vector Search - Semantic similarity using embeddings
SPARQL Queries - Structured RDF queries
Hybrid Search - Combine multiple methods
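
As a rough picture of what combining methods can mean, the sketch below fuses per-method scores with a weighted average and then applies a score floor and a result limit. This is one common fusion approach, not necessarily the one IPFS Kit uses. The function name `fuse_scores`, the input shape, and the assumption that scores are already normalized to [0, 1] are all illustrative.

```python
def fuse_scores(results_by_method, weights=None, limit=20, min_score=0.6):
    """Combine per-method scores into one ranking.

    results_by_method maps a method name to {doc_id: score}, with scores
    already normalized to the range [0, 1].
    """
    methods = list(results_by_method)
    if not methods:
        return []
    weights = weights or {method: 1.0 for method in methods}
    total_weight = sum(weights[method] for method in methods)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")

    doc_ids = set()
    for scores in results_by_method.values():
        doc_ids.update(scores)

    fused = {}
    for doc_id in doc_ids:
        # A method that did not return the document contributes a score of 0.
        weighted = sum(
            weights[method] * results_by_method[method].get(doc_id, 0.0)
            for method in methods
        )
        fused[doc_id] = weighted / total_weight

    ranked = sorted(
        ((doc_id, score) for doc_id, score in fused.items() if score >= min_score),
        key=lambda pair: (-pair[1], pair[0]),  # best score first, ties by id
    )
    return ranked[:limit]
```

A document found by only one method is penalized here, which favors results that several methods agree on.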

# Hybrid search combines all methods
results = api.search_hybrid(
    query="AI model deployment",
    search_types=["text", "graph", "vector"],
    limit=20,
    min_score=0.6
)

# SPARQL for structured queries
results = api.search_sparql("""
    SELECT ?model ?accuracy ?dataset
    WHERE {
        ?model rdf:type :MLModel .
        ?model :accuracy ?accuracy .
        ?model :trainedOn ?dataset .
        FILTER (?accuracy > 0.95)
    }
""")

Graph Analytics:

# Analyze knowledge graph
stats = api.search_stats()
print(f"Entities: {stats['entity_count']}")
print(f"Relationships: {stats['relation_count']}")
print(f"Indexed documents: {stats['document_count']}")

# Find important entities
important = api.get_top_entities(limit=10, metric="centrality")
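
"Centrality" can be measured in several ways. The simplest is degree centrality, which counts how many relationships touch each entity. The sketch below computes it over plain (subject, predicate, object) triples. The triple format and the `top_entities` helper are assumptions for illustration, and the metric used by `get_top_entities` may differ.

```python
from collections import Counter


def top_entities(relations, limit=10):
    """Rank entities by degree: the number of relationships they take part in.

    relations is an iterable of (subject, predicate, object) triples.
    """
    degree = Counter()
    for subject, _predicate, obj in relations:
        degree[subject] += 1
        degree[obj] += 1
    # Highest degree first; ties are broken alphabetically for stable output.
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:limit]


if __name__ == "__main__":
    triples = [
        ("modelA", "trainedOn", "ds1"),
        ("modelB", "trainedOn", "ds1"),
        ("modelA", "authoredBy", "Smith"),
    ]
    print(top_entities(triples, limit=2))
```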

See Also: GraphRAG Documentation, Knowledge Graph

🔐 Configuration & Secrets Management

Secure Credential Management

IPFS Kit provides a unified credential manager for securely storing API keys, tokens, and credentials:

from ipfs_kit_py.credential_manager import CredentialManager

cred_manager = CredentialManager()

# Add credentials for different services
cred_manager.add_s3_credentials(
    name="production",
    aws_access_key_id="AKIA...",
    aws_secret_access_key="secret...",
    region_name="us-west-2"
)

cred_manager.add_storacha_credentials(
    name="default",
    api_token="your_token",
    space_did="did:web:..."
)

cred_manager.add_filecoin_credentials(
    name="mainnet",
    api_key="fil_api_key"
)

# Retrieve credentials securely
s3_creds = cred_manager.get_s3_credentials("production")
storacha_token = cred_manager.get_storacha_credentials()

Configuration Management

YAML Configuration:

# ~/.ipfs_kit/config.yaml
storage:
  backends:
    ipfs:
      enabled: true
      api_addr: "/ip4/127.0.0.1/tcp/5001"
    
    filecoin:
      enabled: true
      lotus_path: "/path/to/lotus"
    
    s3:
      enabled: true
      credential_name: "production"
      bucket: "ipfs-backup"
      region: "us-west-2"
    
    storacha:
      enabled: true
      credential_name: "default"

cache:
  memory_size: 500MB
  disk_size: 10GB
  disk_path: "/fast/ssd/cache"

cluster:
  role: "master"
  replication_factor: 3
  peers:
    - "/ip4/10.0.0.2/tcp/9096"
    - "/ip4/10.0.0.3/tcp/9096"

vfs:
  buckets:
    ml-models:
      quota: 10GB
      policy: hot
      replication: 3
    datasets:
      quota: 100GB
      policy: warm
      replication: 2

Environment Variables

# Credentials
export IPFS_KIT_S3_ACCESS_KEY="AKIA..."
export IPFS_KIT_S3_SECRET_KEY="secret..."
export W3_STORE_TOKEN="storacha_token"
export FILECOIN_API_KEY="fil_api_key"

# Configuration
export IPFS_PATH="/custom/ipfs/path"
export IPFS_KIT_CONFIG="/custom/config.yaml"
export IPFS_KIT_CACHE_DIR="/fast/ssd/cache"

# Feature flags
export IPFS_KIT_ENABLE_GRAPHRAG="true"
export IPFS_KIT_ENABLE_AUTO_HEALING="true"

# Optional: auto-install external daemon binaries (IPFS/Lotus) when missing
# Note: this may download platform-specific binaries.
export IPFS_KIT_AUTO_INSTALL_BINARIES="true"

# Optional: where downloaded binaries are placed
export IPFS_KIT_BIN_DIR="$HOME/.local/share/ipfs_kit_py/bin"

Security Best Practices

Credential Storage:

Store credentials in ~/.ipfs_kit/credentials.json with chmod 600
Never commit credentials to version control
Use environment variables in CI/CD
Consider system keyring integration for production
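
The first recommendation can be applied when the file is created rather than afterwards. The sketch below writes a JSON file with owner-only permissions from the start, using only the standard library. The JSON layout and the helper names are assumptions and do not reflect the schema `CredentialManager` uses. Permission bits are only meaningful on POSIX systems.

```python
import json
import os
import stat
from pathlib import Path


def write_credentials(path, credentials):
    """Write credentials as JSON, creating the file with owner-only permissions."""
    path = Path(path).expanduser()
    path.parent.mkdir(parents=True, exist_ok=True)
    # Creating with mode 0o600 avoids a window where the file is readable by others.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        json.dump(credentials, handle, indent=2)
    os.chmod(path, 0o600)  # also tighten a file that already existed
    return path


def is_owner_only(path):
    """True when group and other have no permission bits set (POSIX only)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & 0o077 == 0
```

A check such as `is_owner_only` could run at startup to warn when the file has been made readable by other users.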

Configuration Security:

Separate configs for dev/staging/prod
Use secrets management services (AWS Secrets Manager, Vault)
Rotate credentials regularly
Audit access logs

See Also: Credential Management, Secure Credentials Guide

🚀 Quick Start

Installation

# Install core features
pip install ipfs_kit_py

# Walrus and fsspec backends are included in the core dependency set.
# The lazy loader can also install declared feature dependencies at first use
# unless IPFS_KIT_AUTO_INSTALL_LAZY_DEPS=0 is set.

# Install with AI/ML support
pip install ipfs_kit_py[ai_ml]

# Install with all features
pip install ipfs_kit_py[full]

# Development installation
git clone https://github.com/endomorphosis/ipfs_kit_py.git
cd ipfs_kit_py
pip install -e .[dev]

Implementation Progress

The supervised implementation run completed the three active VFS/fsspec task boards tracked under data/agent_supervisor/ipfs_kit_todo/state/:

| Track | Task board | Completed |
|---|---|---|
| Walrus fsspec backend | `TODO_WALRUS_FSSPEC.md` | 7 / 7 |
| fsspec backend improvements | `TODO_FSSPEC_BACKENDS.md` | 8 / 8 |
| VFS GraphRAG indexing | `TODO_VFS_GRAPHRAG_INDEXING.md` | 12 / 12 |

The state JSON files are the authoritative progress ledger. Some markdown checkboxes may still appear unchecked because the daemon could not rewrite the source boards after completing tasks, but the implementation state records all 27 tasks as completed with no ready, waiting, or blocked work remaining.

Feature Exposure

The Walrus, fsspec, and VFS GraphRAG work is available across the package, CLI, MCP server, dashboard, and browser SDK surfaces:

ipfs-kit walrus status
ipfs-kit walrus ls
ipfs-kit fsspec protocols
ipfs-kit graphrag status
ipfs-kit graphrag search "example query"

Python imports are available lazily from the package root:

from ipfs_kit_py import (
    VFSGraphRAGIndex,
    WalrusFileSystem,
    WalrusStorageClient,
    create_walrus_filesystem,
    register_fsspec_implementations,
)

MCP clients can call walrus_status, walrus_list, walrus_get, walrus_put, walrus_delete, fsspec_list_protocols, fsspec_backend_status, fsspec_read, fsspec_write, vfs_graphrag_status, vfs_graphrag_search, vfs_graphrag_metadata_search, vfs_graphrag_vector_search, vfs_graphrag_hybrid_search, vfs_graphrag_graph_search, vfs_graphrag_graph_hybrid_search, and vfs_graphrag_export. The dashboard JavaScript SDK also exposes these through MCP.Walrus, MCP.FSSpec, and MCP.VFSGraphRAG.

Basic Usage

from ipfs_kit_py.high_level_api import IPFSSimpleAPI

# Initialize
api = IPFSSimpleAPI()

# Add content
result = api.add("Hello, IPFS!")
cid = result['cid']
print(f"Content added: {cid}")

# Retrieve content
content = api.get(cid)
print(f"Retrieved: {content}")

# Pin content for persistence
api.pin(cid)

# List all pins
pins = api.list_pins()

Cluster Operations

from ipfs_kit_py.high_level_api import IPFSSimpleAPI

# Initialize as cluster master
api = IPFSSimpleAPI(role="master")

# Add content to cluster (distributed across nodes)
result = api.cluster_add("large_file.dat", replication_factor=3)

# Check replication status
status = api.cluster_status(result['cid'])
print(f"Replicated on {len(status['peers'])} nodes")

# List cluster peers
peers = api.cluster_peers()

AI/ML Integration

from ipfs_kit_py.high_level_api import IPFSSimpleAPI
import pandas as pd

api = IPFSSimpleAPI()

# Store dataset
df = pd.read_csv("training_data.csv")
result = api.ai_dataset_add(
    dataset=df,
    metadata={
        "name": "customer_data_v1",
        "version": "1.0",
        "description": "Customer behavior dataset"
    }
)

# Retrieve dataset later
dataset_cid = result['cid']
loaded_df = api.ai_dataset_get(dataset_cid)

CLI Usage

# Start MCP server with dashboard
ipfs-kit mcp start --port 8004

# Check server status
ipfs-kit mcp status

# View deprecation warnings
ipfs-kit mcp deprecations

# Start 3-node cluster
python tools/start_3_node_cluster.py

📚 Documentation

Comprehensive documentation is available in docs/:

Installation Guide - Setup and requirements
Quick Reference - Common operations
API Reference - Complete API docs
Cluster Guide - Cluster setup
AI/ML Integration - Machine learning features
MCP Server - MCP server documentation
Examples - Code examples and tutorials

🎓 Use Cases & Examples

1. Decentralized Application Storage

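import json
from ipfs_kit_py.high_level_api import IPFSSimpleAPI

# Store application data immutably
api = IPFSSimpleAPI()
user_data = {"user_id": 123, "preferences": {...}}  # replace {...} with JSON-serializable preferences
cid = api.add(json.dumps(user_data))['cid']

# Share the CID with users; the data stays retrievable as long as a node keeps it pinned
share_url = f"ipfs://{cid}"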

2. ML Model Distribution

# Publish trained model
# load_model is assumed to be your ML framework's loader (for example keras.models.load_model)
model_path = "model.h5"
result = api.ai_model_add(
    model=load_model(model_path),
    metadata={"architecture": "ResNet50", "accuracy": 0.95}
)

# Others can load your model
model = api.ai_model_get(result['cid'])

3. Content Distribution Network

# Deploy content across cluster
api = IPFSSimpleAPI(role="master")
for file in website_files:
    api.cluster_add(file, replication_factor=5)

# Each file is pinned on 5 cluster nodes; other peers can still fetch it by CID

4. Data Backup & Archival

# Backup with verification
result = api.add("important_data.zip", pin=True)
cid = result['cid']

# Later verification
assert api.exists(cid), "Backup lost!"
restored_data = api.get(cid)
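
Existence checks confirm that a CID can still be resolved. An independent digest recorded at backup time additionally lets you confirm that the restored bytes match what was stored. The sketch below uses a plain SHA-256 hex digest from the standard library. Note that this is not a CID (real CIDs wrap a multihash with codec and version information), and the helper names are illustrative.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the payload."""
    return hashlib.sha256(data).hexdigest()


def verify_restored(data: bytes, expected_digest: str) -> bool:
    """Compare a restored payload against the digest recorded at backup time."""
    return sha256_hex(data) == expected_digest.lower()


if __name__ == "__main__":
    original = b"important payload"
    recorded = sha256_hex(original)             # store this next to the CID
    print(verify_restored(original, recorded))  # unchanged bytes verify
    print(verify_restored(original + b"!", recorded))
```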

🔧 Configuration

Basic Configuration

from ipfs_kit_py.high_level_api import IPFSSimpleAPI

api = IPFSSimpleAPI(
    role="master",  # master, worker, or leecher
    resources={
        "max_memory": "2GB",
        "max_storage": "100GB"
    },
    cache={
        "memory_size": "500MB",
        "disk_size": "5GB"
    },
    timeouts={
        "api": 60,
        "gateway": 120
    }
)

Environment Variables

# IPFS configuration
export IPFS_PATH=/path/to/.ipfs
export IPFS_KIT_CLUSTER_MODE=true

# MCP server
export IPFS_KIT_MCP_PORT=8004
export IPFS_KIT_DATA_DIR=~/.ipfs_kit

# Performance tuning
export IPFS_KIT_CACHE_SIZE=1GB
export IPFS_KIT_MAX_CONNECTIONS=50

🧪 Testing

# Run all tests
pytest

# Run specific test suite
pytest tests/unit/
pytest tests/integration/

# Run with coverage
pytest --cov=ipfs_kit_py --cov-report=html

# Run cluster tests
pytest tests/test_cluster_startup.py -v

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📋 Requirements

Python: 3.12+ required
System: Linux (primary), macOS (supported), Windows (experimental)
Memory: 4GB minimum, 8GB recommended for clusters
Storage: 10GB minimum, 50GB+ recommended for production
Network: Internet access for IPFS network connectivity

🗺️ Roadmap

📜 License

This project is licensed under the AGPL-3.0 License - see the LICENSE file for details.

🙏 Acknowledgments

Built with:

IPFS/Kubo - InterPlanetary File System
IPFS Cluster - Cluster orchestration
py-libp2p - LibP2P networking
FastAPI - Modern web framework

📞 Support

Documentation: docs/
Issues: GitHub Issues
Discussions: GitHub Discussions

📊 Project Status

✅ Core IPFS operations - Production ready
✅ Cluster management - Production ready
✅ MCP server - Production ready
✅ AI/ML integration - Beta
✅ Auto-healing - Beta
🚧 GraphRAG - In development
📋 S3 Gateway - Planned

Version: 0.3.0
Status: Production Ready
Maintained by: Benjamin Barber (@endomorphosis)
