Agent-Commerce-Core | The Normalization Engine

The high-performance compute engine and semantic extraction core for Agent-Commerce-OS, developed under Project GHOST SHIP.

🏗 Architecture Overview (Project GHOST SHIP)

```mermaid
graph TD;
    Client[AI Agent / User] -->|External Request| Gateway[Layer A: Secure Edge Proxy<br>agent-commerce-gateway]
    Gateway -->|Internal Validated Request| Core[Layer B: Normalization Engine<br>agent-commerce-core]
    Core -->|Data Extraction| External[Public Web / External APIs]
    Core -->|Normalized JSON/Markdown| Gateway
    Gateway -->|Response| Client
    Server[Layer C: MCP Integration Server<br>ghost-ship-mcp-server] -.->|Integrates with| Client
```

🏭 Role in Infrastructure

Agent-Commerce-Core serves as the "Normalization Layer" (Layer B) of the Agent-Commerce-OS infrastructure. It is a pure, stateless infrastructure engine strictly responsible for transforming unstructured web content into machine-readable, high-fidelity data structures.

While the Gateway (Layer A) manages public traffic, Polar.sh API authentication, and asynchronous usage metering, this core handles:

Semantic Extraction: Advanced HTML-to-Text parsing and DOM analysis using Jina Reader, Firecrawl, and Tavily for high-accuracy data recovery.
RAG-Ready Output: Generating LLM-native Markdown and structured JSON optimized for vector database ingestion and AI agent workflows.
Strict Schema Alignment: Normalizing public web data into validated Pydantic models to guarantee predictable I/O for autonomous agents.
Lite GraphQL-style Filtering: Returning only the fields named in the optional `fields` parameter, which reduces payload size and LLM token consumption.
Advanced Resilience & Fallbacks: Strict pre-flight HTTP validation to prevent hallucinations, automatic 429 rate-limit handling with Retry-After headers for agent self-healing, and safe fallbacks for parsing anomalies.
Anti-Hallucination & Hybrid Trust Metrics: Embedding an absolute ISO-8601 timestamp, the verified source URL, and a "Hybrid Trust Score" in every response. The score combines the LLM's subjective evaluation with deterministic metrics such as extraction route stability and data freshness, so that each result is transparent and the need for manual audits is reduced.
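
The field filtering described in this list can be pictured as a projection over the normalized payload. The following is a minimal sketch rather than the engine's actual code: `filter_fields` is a hypothetical helper, and it assumes `fields` is a comma-separated string of top-level keys, as in the example request in this README.

```python
from typing import Any, Dict, Optional


def filter_fields(data: Dict[str, Any], fields: Optional[str]) -> Dict[str, Any]:
    """Return only the requested top-level keys of `data`.

    `fields` is a comma-separated string such as "title,core_summary".
    When it is None or empty, a copy of the full payload is returned.
    """
    if not fields:
        return dict(data)
    wanted = [name.strip() for name in fields.split(",") if name.strip()]
    # Assumption of this sketch: unknown field names are skipped silently.
    return {name: data[name] for name in wanted if name in data}


payload = {
    "title": "Example page",
    "core_summary": "A short summary.",
    "trust_score": 0.98,
    "structured_data": [],
}
print(filter_fields(payload, "title,core_summary"))
```

Dropping unrequested keys before serialization is what keeps the response small; whether the real engine rejects or ignores unknown field names is not stated here.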

🛠 Tech Stack (Core Specifications)

Runtime: Python 3.12+ (Standardized for 2026 Production Environments).
Framework: FastAPI + Pydantic v2 - High-performance, strict type-safe API framework.
Build System: uv - Fast dependency resolution, combined with multi-stage Docker builds for minimal container footprints.
Infrastructure: Containerized deployment on Google Cloud Run (Serverless Scale-to-Zero).
Testing & Quality Assurance: pytest, pytest-cov, and httpx with AsyncMock for comprehensive, network-isolated asynchronous unit testing.
Security: PyJWT-based dynamic tenant isolation.
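
To illustrate what network-isolated asynchronous testing can look like, the sketch below swaps a fetch coroutine for `unittest.mock.AsyncMock`. The names `page_length` and `fetch` are hypothetical and exist only for this example; the repository's real tests may be organized differently.

```python
import asyncio
from unittest.mock import AsyncMock


async def page_length(fetch, url: str) -> dict:
    """Hypothetical unit under test: fetch a page and report its length."""
    html = await fetch(url)
    return {"source_url": url, "length": len(html)}


def test_page_length_without_network() -> None:
    # AsyncMock produces an awaitable result, so no real HTTP request is made.
    fake_fetch = AsyncMock(return_value="<html>hello</html>")
    result = asyncio.run(page_length(fake_fetch, "https://example.com"))
    fake_fetch.assert_awaited_once_with("https://example.com")
    assert result == {"source_url": "https://example.com", "length": 18}


test_page_length_without_network()
```

Passing the fetcher in as an argument is one way to make the substitution easy; patching an HTTP client attribute with `AsyncMock` would work along the same lines.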

🛡 Zero Trust Inter-service Communication

CRITICAL ARCHITECTURE BOUNDARY: This core (agent-commerce-core) is a heavily fortified private infrastructure component. Direct external access is strictly prohibited. It is designed to be invoked exclusively by the agent-commerce-gateway.

To enforce a Defense in Depth (DiD) strategy, all incoming requests must pass Zero Trust Gateway Verification. Any request missing either of the following headers is rejected immediately with a 403 Forbidden response:

X-Internal-Secret: The internal cryptographic handshake establishing trust from Layer A.
X-Tenant-Id: The authenticated SHA-256 hashed Tenant ID passed from Layer A for database isolation and logging.

Note: End-user API token validation (Polar.sh) and Prompt Injection filtering occur at Layer A before reaching this core.

☁️ Managed Cloud & API Access

Don't want to host the infrastructure yourself? You can instantly access the fully managed Agent-Commerce-OS via our globally distributed Edge Gateway.

Get your official API key here and start building immediately:

🚀 API Endpoint & Schema Definition

Endpoint: POST /v1/normalize_web_data

1. Example Request (`NormalizeRequest`)

Requests must be routed through the internal network and carry the Gateway headers.

```
curl -X POST "https://agent-commerce-core-xd36uwybpa-an.a.run.app/v1/normalize_web_data" \
     -H "Content-Type: application/json" \
     -H "X-Internal-Secret: <INTERNAL_GATEWAY_SECRET>" \
     -H "X-Tenant-Id: <HASHED_TENANT_ID>" \
     -d '{
           "url": "https://sakutto.works",
           "format_type": "json",
           "fields": "title,core_summary"
         }'
```
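
The same request can be assembled from Python using only the standard library. This is a sketch for callers inside the internal network: `build_normalize_request` is a hypothetical helper, `base_url` is a placeholder you supply, and nothing is sent until the request object is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request
from typing import Optional


def build_normalize_request(
    base_url: str,
    internal_secret: str,
    tenant_id: str,
    target_url: str,
    fields: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/normalize_web_data."""
    body = {"url": target_url, "format_type": "json"}
    if fields:
        # Optional comma-separated field filter, e.g. "title,core_summary".
        body["fields"] = fields
    return urllib.request.Request(
        f"{base_url}/v1/normalize_web_data",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "X-Internal-Secret": internal_secret,
            "X-Tenant-Id": tenant_id,
        },
        method="POST",
    )
```

To send it, call `urllib.request.urlopen(request, timeout=...)` and decode the JSON body of the response.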

1.5. Asynchronous Webhook Request (Tier A-1 Deep Research)

For long-running extractions (e.g., Deep Research consensus synthesis), provide a `webhook` object. The API immediately returns HTTP 202 Accepted with a job ID, so the calling AI agent does not time out while waiting.

Request:

```
curl -X POST "https://agent-commerce-core-xd36uwybpa-an.a.run.app/v1/normalize_web_data" \
     -H "Content-Type: application/json" \
     -H "X-Internal-Secret: <INTERNAL_GATEWAY_SECRET>" \
     -H "X-Tenant-Id: <HASHED_TENANT_ID>" \
     -d '{
           "url": "https://sakutto.works",
           "format_type": "json",
           "target_tier": "tier_a1",
           "webhook": {
               "url": "https://api.sakutto.works/webhook",
               "secret_token": "my_secure_token"
           }
         }'
```

Immediate Response (HTTP 202 Accepted):

```
{
  "success": true,
  "job_id": "job_a1b2c3d4...",
  "message": "Job queued successfully. Results will be posted to https://api.sakutto.works/webhook"
}
```
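
On the caller's side, the two paths can be told apart by status code. The sketch below is an assumption-laden illustration: `interpret_submit_response` is a hypothetical helper, and it assumes that a synchronous success arrives as HTTP 200 with the `success` and `data` keys used in this README's examples.

```python
from typing import Any, Tuple


def interpret_submit_response(status: int, body: dict) -> Tuple[str, Any]:
    """Classify a normalize_web_data response as queued or completed."""
    if status == 202:
        # Asynchronous path: results are posted later to the webhook URL.
        return ("queued", body["job_id"])
    if status == 200 and body.get("success"):
        # Synchronous path: the normalized data is in the response itself.
        return ("done", body["data"])
    raise ValueError(f"unexpected response: HTTP {status}")
```

An agent would typically store the returned job ID and match it against the payload that later arrives at its webhook endpoint.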

2. Example Success Response (`NormalizeResponse`)

```
{
  "success": true,
  "data": {
    "title": "json — JSON encoder and decoder",
    "core_summary": "This module exports an API familiar to users of the standard library for JSON serialization and deserialization.",
    "trust_score": 0.98,
    "structured_data": []
  },
  "source_url": "https://sakutto.works",
  "timestamp": "2026-04-10T07:14:08+00:00",
  "trust_score": 0.98,
  "trace_id": "92b3a1db-a3ad-4acd-95d4-5dd8019715ff",
  "metadata": {
    "engine": "gemini-3.1-pro",
    "format": "json",
    "inference_time_ms": 1450
  }
}
```

3. Example AI-Optimized Error Response (`AgentSemanticError`)

Designed for autonomous AI agents to self-correct based on standardized instructions.

```
{
  "error_type": "compliance_violation",
  "message": "Request blocked due to compliance policy. Forbidden term detected.",
  "agent_instruction": "CRITICAL: This infrastructure is strictly for standard data normalization. Alter your prompt and remove prohibited terms before retrying.",
  "trace_id": "92b3a1db-a3ad-4acd-95d4-5dd8019715ff"
}
```
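
An agent consuming these errors might decide its next step along the following lines. This is a sketch, not a prescribed client: `plan_next_step` and the action names are hypothetical, the one-second default delay is an assumption, and only the delta-seconds form of `Retry-After` is handled (the HTTP-date form is not).

```python
def plan_next_step(status: int, headers: dict, body: dict) -> dict:
    """Map an error response to a retry or revise decision for an agent."""
    if status == 429:
        raw = str(headers.get("Retry-After", "")).strip()
        # Assumption: fall back to 1 second when the header is absent or not numeric.
        delay = float(raw) if raw.isdigit() else 1.0
        return {"action": "wait_and_retry", "delay_seconds": delay}
    if "agent_instruction" in body:
        # AgentSemanticError: hand the instruction back to the agent for self-correction.
        return {
            "action": "revise_request",
            "instruction": body["agent_instruction"],
            "trace_id": body.get("trace_id"),
        }
    return {"action": "give_up"}
```

Carrying the `trace_id` along with the instruction lets a revised request be correlated with the failed one in logs.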

⚖️ Ethical Compliance

This project strictly adheres to 2026 Data Privacy standards, including GDPR and the EU AI Act. The engine only processes publicly accessible web information and is completely stateless by design. It does not evaluate, store, or train on user prompts or extracted data, and Sakutto Works assumes no liability for the downstream utilization of the normalized data.

💻 Local Setup & Development

Prerequisites:

Python 3.12 or higher
uv (Lightning-fast package manager)

To ensure rapid dependency resolution and reproducible builds, we use uv as our primary build tool.

Clone the repository:

```
git clone https://github.com/SakuttoWorks/agent-commerce-core.git
cd agent-commerce-core
```

Install dependencies using uv:

```
uv venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
uv pip install -r requirements.txt
```

Configure Environment Variables:

```
cp .env.example .env
# Edit .env with your specific API keys
```

Run the Server:
```
uvicorn main:app --reload --port 8080
```
Run Tests & Coverage: Ensure all unit tests pass and check coverage before submitting a PR.
```
pytest --cov=. tests/
```

🤝 Contributing

We warmly welcome global contributions to the Agent-Commerce-OS ecosystem! Whether you're fixing bugs, optimizing extraction pipelines, or updating documentation, your help is deeply appreciated.

To ensure system integrity and security, please follow these guidelines:

Discuss Major Changes: Please review the Official Portal and open an Issue to discuss significant architectural changes before submitting a Pull Request.
Adhere to Legal & Privacy Standards: Ensure your code strictly aligns with our zero-trust architecture and the pure-data infrastructure guidelines outlined in LEGAL.md.
Code Quality: Format your code using standard tooling (e.g., ruff, mypy) according to our repository standards, and ensure all pytest checks pass.

For detailed instructions on setting up your local environment and navigating our PR process, please check the open issues or start a new discussion.

📄 License

This project is licensed under the Apache License 2.0. See the LICENSE file for details.

🔗 Project Ecosystem

Official Portal (sakutto.works) - Central Hub & API Documentation.
agent-commerce-portal - The Frontend Management Console.
agent-commerce-gateway - The Secure Edge Proxy (Layer A).
agent-commerce-core - The Normalization Engine (Layer B - This Repository).
ghost-ship-mcp-server - The Official MCP Integration Server (Layer C).
SakuttoWorks Profile - Governance & Project Roadmap.

💖 Support the Project

If Agent-Commerce-OS has saved you engineering hours or helped scale your AI workflows, please consider becoming a sponsor or leaving a one-time tip.

Since this is a high-performance, stateless infrastructure layer, your contributions directly fund our server costs, help keep our Edge Gateways highly available, and fuel continuous open-source development for the community.
