The high-performance compute engine and semantic extraction core for Agent-Commerce-OS, developed under Project GHOST SHIP.
graph TD;
Client[AI Agent / User] -->|External Request| Gateway[Layer A: Secure Edge Proxy<br>agent-commerce-gateway]
Gateway -->|Internal Validated Request| Core[Layer B: Normalization Engine<br>agent-commerce-core]
Core -->|Data Extraction| External[Public Web / External APIs]
Core -->|Normalized JSON/Markdown| Gateway
Gateway -->|Response| Client
Server[Layer C: MCP Integration Server<br>ghost-ship-mcp-server] -.->|Integrates with| Client
Agent-Commerce-Core serves as the "Normalization Layer" (Layer B) of the Agent-Commerce-OS infrastructure. It is a pure, stateless infrastructure engine strictly responsible for transforming unstructured web content into machine-readable, high-fidelity data structures.
While the Gateway (Layer A) manages public traffic, Polar.sh API authentication, and asynchronous usage metering, this core handles:
- Semantic Extraction: Advanced HTML-to-Text parsing and DOM analysis using Jina Reader, Firecrawl, and Tavily for high-accuracy data recovery.
- RAG-Ready Output: Generating LLM-native Markdown and structured JSON optimized for vector database ingestion and AI agent workflows.
- Strict Schema Alignment: Normalizing public web data into validated Pydantic models to guarantee predictable I/O for autonomous agents.
- Lite GraphQL-style Filtering: Dynamically extracts only the requested fields via the optional `fields` parameter, significantly reducing payload size and LLM token consumption.
- Advanced Resilience & Fallbacks: Features strict pre-flight HTTP validations to prevent hallucinations, automatic `429 Rate Limit` handling with `Retry-After` headers for agent self-healing, and safe fallback mechanisms for parsing anomalies.
- Anti-Hallucination & Hybrid Trust Metrics: Automatically embeds absolute ISO-8601 timestamps, verified source URLs, and a strictly calculated "Hybrid Trust Score" (combining LLM subjective evaluation with deterministic metrics like extraction route stability and data freshness) into every response to enforce ultimate transparency and eliminate human audit interventions.
- Runtime: Python 3.12+ (Standardized for 2026 Production Environments).
- Framework: FastAPI + Pydantic v2 - High-performance, strict type-safe API framework.
- Build System: uv - Ultra-fast multi-stage Docker builds for minimal container footprints.
- Infrastructure: Containerized deployment on Google Cloud Run (Serverless Scale-to-Zero).
- Testing & Quality Assurance: `pytest`, `pytest-cov`, and `httpx` with AsyncMock for comprehensive, network-isolated asynchronous unit testing.
- Security: PyJWT-based dynamic tenant isolation.
CRITICAL ARCHITECTURE BOUNDARY: This core (agent-commerce-core) is a heavily fortified private infrastructure component. Direct external access is strictly prohibited. It is designed to be invoked exclusively by the agent-commerce-gateway.
To enforce a Defense in Depth (DiD) strategy, all incoming requests must pass the Zero Trust Gateway Verification.
Any request lacking the following strictly enforced headers will be instantly dropped with a 403 Forbidden response:
- `X-Internal-Secret`: The internal cryptographic handshake establishing trust from Layer A.
- `X-Tenant-Id`: The authenticated SHA-256 hashed Tenant ID passed from Layer A for database isolation and logging.
Note: End-user API token validation (Polar.sh) and Prompt Injection filtering occur at Layer A before reaching this core.
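The header check described above can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the core's actual verification code: the function names are hypothetical, the tenant ID is assumed to be a lowercase hex SHA-256 digest of a UTF-8 string, and rejecting a wrong secret or malformed tenant ID with 403 (not only a missing header) is an assumption.

```python
import hashlib
import hmac

REQUIRED_HEADERS = ("X-Internal-Secret", "X-Tenant-Id")


def hash_tenant_id(raw_tenant_id: str) -> str:
    """SHA-256 hex digest of a tenant ID (assumed: UTF-8 input, hex output)."""
    return hashlib.sha256(raw_tenant_id.encode("utf-8")).hexdigest()


def verify_gateway_headers(headers: dict[str, str], expected_secret: str) -> int:
    """Return 200 if the request may proceed, otherwise 403 (hypothetical helper)."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {key.lower(): value for key, value in headers.items()}
    if any(name.lower() not in normalized for name in REQUIRED_HEADERS):
        return 403
    supplied = normalized["x-internal-secret"]
    # compare_digest avoids leaking the secret through timing differences.
    if not hmac.compare_digest(supplied.encode(), expected_secret.encode()):
        return 403
    tenant = normalized["x-tenant-id"]
    # Assumption: a valid tenant ID is 64 lowercase hexadecimal characters.
    if len(tenant) != 64 or any(c not in "0123456789abcdef" for c in tenant):
        return 403
    return 200
```

In a FastAPI service, a check like this would typically live in a dependency or middleware so that it runs before any route handler.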
Don't want to host the infrastructure yourself? You can instantly access the fully managed Agent-Commerce-OS via our globally distributed Edge Gateway.
Get your official API key here and start building immediately:
Endpoint: POST /v1/normalize_web_data
Must be routed through the internal network with Gateway headers.
curl -X POST "https://agent-commerce-core-xd36uwybpa-an.a.run.app/v1/normalize_web_data" \
-H "Content-Type: application/json" \
-H "X-Internal-Secret: <INTERNAL_GATEWAY_SECRET>" \
-H "X-Tenant-Id: <HASHED_TENANT_ID>" \
-d '{
"url": "https://sakutto.works",
"format_type": "json",
"fields": "title,core_summary"
}'

For long-running extractions (e.g., Deep Research consensus synthesis), provide a webhook object. The API will immediately return an HTTP 202 with a Job ID, preventing AI agent timeouts.
Request:
curl -X POST "https://agent-commerce-core-xd36uwybpa-an.a.run.app/v1/normalize_web_data" \
-H "Content-Type: application/json" \
-H "X-Internal-Secret: <INTERNAL_GATEWAY_SECRET>" \
-H "X-Tenant-Id: <HASHED_TENANT_ID>" \
-d '{
"url": "https://sakutto.works",
"format_type": "json",
"target_tier": "tier_a1",
"webhook": {
"url": "https://api.sakutto.works/webhook",
"secret_token": "my_secure_token"
}
}'

Immediate Response (HTTP 202 Accepted):
{
"success": true,
"job_id": "job_a1b2c3d4...",
"message": "Job queued successfully. Results will be posted to https://api.sakutto.works/webhook"
}

{
"success": true,
"data": {
"title": "json - JSON encoder and decoder",
"core_summary": "This module exports an API familiar to users of the standard library for JSON serialization and deserialization.",
"trust_score": 0.98,
"structured_data": []
},
"source_url": "https://sakutto.works",
"timestamp": "2026-04-10T07:14:08+00:00",
"trust_score": 0.98,
"trace_id": "92b3a1db-a3ad-4acd-95d4-5dd8019715ff",
"metadata": {
"engine": "gemini-3.1-pro",
"format": "json",
"inference_time_ms": 1450
}
}

Error responses are designed so that autonomous AI agents can self-correct based on standardized instructions.
{
"error_type": "compliance_violation",
"message": "Request blocked due to compliance policy. Forbidden term detected.",
"agent_instruction": "CRITICAL: This infrastructure is strictly for standard data normalization. Alter your prompt and remove prohibited terms before retrying.",
"trace_id": "92b3a1db-a3ad-4acd-95d4-5dd8019715ff"
}

This project strictly adheres to 2026 Data Privacy standards, including GDPR and the EU AI Act. The engine only processes publicly accessible web information and is completely stateless by design. It does not evaluate, store, or train on user prompts or extracted data, and Sakutto Works assumes no liability for the downstream utilization of the normalized data.
Prerequisites:
- Python 3.12 or higher
- uv (Lightning-fast package manager)
To ensure rapid dependency resolution and reproducible builds, we use uv as our primary build tool.
- Clone the repository:
  git clone https://github.com/SakuttoWorks/agent-commerce-core.git
  cd agent-commerce-core
- Install dependencies using uv:
  uv venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  uv pip install -r requirements.txt
- Configure Environment Variables:
  cp .env.example .env
  # Edit .env with your specific API keys
- Run the Server:
  uvicorn main:app --reload --port 8080
- Run Tests & Coverage: Ensure all unit tests pass and check coverage before submitting a PR.
  pytest --cov=. tests/
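For contributors adding tests, a network-isolated asynchronous unit test in the spirit of the stack described earlier might look like the sketch below. `fetch_title` and the test name are hypothetical and not taken from this repository. The mock stands in for an async HTTP client such as `httpx.AsyncClient`, and `asyncio.run` is used so the example does not assume any async pytest plugin.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def fetch_title(client, url: str) -> str:
    """Hypothetical unit under test: fetch a URL and return the JSON `title` field."""
    response = await client.get(url)
    return response.json()["title"]


def test_fetch_title_without_network() -> None:
    # The response is a plain MagicMock because json() is called synchronously.
    response = MagicMock()
    response.json.return_value = {"title": "Example"}

    # AsyncMock makes client.get awaitable, so no real request is sent.
    client = AsyncMock()
    client.get.return_value = response

    title = asyncio.run(fetch_title(client, "https://example.com"))

    assert title == "Example"
    client.get.assert_awaited_once_with("https://example.com")
```

Saved under `tests/`, a file like this would be collected by the `pytest --cov=. tests/` command from the step above.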
We warmly welcome global contributions to the Agent-Commerce-OS ecosystem! Whether you're fixing bugs, optimizing extraction pipelines, or updating documentation, your help is deeply appreciated.
To ensure system integrity and security, please follow these guidelines:
- Discuss Major Changes: Please review the Official Portal and open an Issue to discuss significant architectural changes before submitting a Pull Request.
- Adhere to Legal & Privacy Standards: Ensure your code strictly aligns with our zero-trust architecture and the pure-data infrastructure guidelines outlined in `LEGAL.md`.
- Code Quality: Format your code using standard tooling (e.g., `ruff`, `mypy`) according to our repository standards, and ensure all `pytest` checks pass.
For detailed instructions on setting up your local environment and navigating our PR process, please check the open issues or start a new discussion.
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
- Official Portal (sakutto.works) - Central Hub & API Documentation.
- agent-commerce-portal - The Frontend Management Console.
- agent-commerce-gateway - The Secure Edge Proxy (Layer A).
- agent-commerce-core - The Normalization Engine (Layer B - This Repository).
- ghost-ship-mcp-server - The Official MCP Integration Server (Layer C).
- SakuttoWorks Profile - Governance & Project Roadmap.
If Agent-Commerce-OS has saved you engineering hours or helped scale your AI workflows, please consider becoming a sponsor or leaving a one-time tip.
Since this is a high-performance, stateless infrastructure layer, your contributions directly fund our server costs, ensure the high-availability of our Edge Gateways, and fuel continuous open-source development for the community.
© 2026 Sakutto Works - Standardizing the Semantic Web for Agents.