A dynamic, schema-driven Python library designed to drastically reduce LLM token usage by compressing complex JSON objects into a strict, hierarchical format.
When building LLM applications (using GPT-4, Gemini, Claude, etc.), passing large JSON arrays in the system prompt consumes many tokens, because the same keys and whitespace are repeated for every object.
Standard JSON (Heavy):
[
  {"product": "Apple", "price": "1.20", "category": "Fruit"},
  {"product": "Banana", "price": "0.80", "category": "Fruit"}
]
Compact Format (Light):
|{Apple, 1.20, Fruit}|;|{Banana, 0.80, Fruit}|
Reduces token count from ~55 tokens (JSON) to ~18 tokens (Compact) for this example.
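The saving can be checked roughly with a few lines of Python. The sketch below builds both forms of the example by hand and compares character counts. Character length is only a proxy, since real token counts depend on the model's tokenizer, and the variable names here are illustrative rather than part of the library.

```python
import json

records = [
    {"product": "Apple", "price": "1.20", "category": "Fruit"},
    {"product": "Banana", "price": "0.80", "category": "Fruit"},
]

# Standard JSON, pretty-printed the way it often ends up in a prompt.
as_json = json.dumps(records, indent=2)

# Compact form: keys are dropped and values appear in a fixed field order.
field_order = ["product", "price", "category"]
as_compact = ";".join(
    "|{" + ", ".join(str(record[name]) for name in field_order) + "}|"
    for record in records
)

print(as_compact)
print(len(as_json), "characters as JSON")
print(len(as_compact), "characters as compact text")
```

For a real measurement, run both strings through the tokenizer of the model you are targeting.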
Token Efficiency: Reduces input payload size by 40-60% for repetitive data structures.
Schema Enforcement: Generates strict instructions that tell the LLM how to read the data, which is intended to reduce hallucinations.
Dynamic: Works with any Python object or dictionary; just define the schema at runtime.
Recursive: Supports deeply nested objects and lists via the |{...}| syntax.
git clone https://github.com/Ryujose/llm-compact-serializer.git
cd llm-compact-serializer
poetry install
This example demonstrates how to serialize a product list with nested destination data.
- Define Your Schema: Tell the serializer what your data looks like. Field order matters, because values are matched to fields by position.
from llm_compact_serializer.domain.schema import CompactSchema, FieldConfig
from llm_compact_serializer.core.prompt_builder import PromptBuilder
# Define a nested schema for complex objects
destination_schema = CompactSchema(
    name="Destination",
    fields=[
        FieldConfig(source_name="address"),
        FieldConfig(source_name="phones", is_list=True)  # Handles arrays [x, y]
    ]
)
# Define the root schema
product_schema = CompactSchema(
    name="Product",
    fields=[
        FieldConfig(source_name="name"),
        FieldConfig(source_name="price"),
        FieldConfig(source_name="destination", nested_schema=destination_schema)
    ]
)
- Prepare Your Data: You can use dictionaries, Pydantic models, or dataclasses.
data = [
    {
        "name": "MacBook Pro",
        "price": "1200€",
        "destination": {
            "address": "Silicon Valley, CA",
            "phones": [5550199, 5550200]
        }
    }
]
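Supporting dictionaries, Pydantic models, and dataclasses with one schema implies reading a field the same way from each of them. A minimal sketch of how that could work is shown here. `read_field` is a hypothetical helper written for illustration, not a function from the library.

```python
from dataclasses import dataclass
from typing import Any


def read_field(obj: Any, name: str) -> Any:
    """Return obj[name] for dicts, otherwise the attribute obj.name.

    Hypothetical helper: attribute access covers dataclasses and
    Pydantic models, which both expose their fields as attributes.
    Missing fields yield None.
    """
    if isinstance(obj, dict):
        return obj.get(name)
    return getattr(obj, name, None)


@dataclass
class Destination:
    address: str
    phones: list


as_dict = {"address": "Silicon Valley, CA", "phones": [5550199, 5550200]}
as_dataclass = Destination(address="Silicon Valley, CA", phones=[5550199, 5550200])

print(read_field(as_dict, "address"))
print(read_field(as_dataclass, "address"))
```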
- Generate the Prompt: The PromptBuilder automatically generates the protocol instructions and injects your compressed data.
builder = PromptBuilder(product_schema)
base_prompt = "Analyze the following orders: [INPUT]"
final_prompt = builder.build(base_prompt, data, data_marker="[INPUT]")
print(final_prompt)
- Output (What the LLM Sees)
[COMPACT_HIERARCHICAL_PROTOCOL]
[INSTRUCTIONS]
1. Interpret input strictly as a Recursive Compact Hierarchy.
2. Syntax: Complex objects enclosed in |{ }|, separated by comma.
3. Structure Mapping:
# 1 = Product (Root)
# 1a = name
# 1b = price
# 1.1 = destination
# 1.1a = address
# 1.1b* = phones
[END_PROTOCOL]
Analyze the following orders: |{MacBook Pro, 1200€, |{Silicon Valley CA, [5550199, 5550200]}|}|
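The Structure Mapping labels in the output above appear to follow a simple scheme: the root is 1, plain fields get letters (1a, 1b), nested objects get a dotted number (1.1), and list fields are marked with *. The sketch below reproduces those labels under that assumption. `Field`, `Schema`, and `mapping_lines` are hypothetical stand-ins written for this example, not the library's `FieldConfig`, `CompactSchema`, or `PromptBuilder` internals.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Field:  # hypothetical stand-in for FieldConfig
    source_name: str
    is_list: bool = False
    nested: Optional["Schema"] = None


@dataclass
class Schema:  # hypothetical stand-in for CompactSchema
    name: str
    fields: List[Field] = field(default_factory=list)


def mapping_lines(schema: Schema, prefix: str = "1",
                  label: Optional[str] = None) -> List[str]:
    """Build the '# <label> = <name>' lines for a schema, recursively."""
    title = label if label is not None else f"{schema.name} (Root)"
    lines = [f"# {prefix} = {title}"]
    letter = 0  # counter for plain fields: a, b, c, ...
    child = 0   # counter for nested objects: .1, .2, ...
    for f in schema.fields:
        if f.nested is not None:
            child += 1
            lines.extend(mapping_lines(f.nested, f"{prefix}.{child}", f.source_name))
        else:
            star = "*" if f.is_list else ""  # '*' marks a list field
            lines.append(f"# {prefix}{chr(ord('a') + letter)}{star} = {f.source_name}")
            letter += 1
    return lines


destination = Schema("Destination", [Field("address"), Field("phones", is_list=True)])
product = Schema("Product", [Field("name"), Field("price"),
                             Field("destination", nested=destination)])

print("\n".join(mapping_lines(product)))
```

Whether the real builder keeps separate counters for letters and nested objects is not stated in this README, so treat the numbering rule as a guess that happens to match the sample output.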
The project follows Clean Architecture principles to ensure modularity and ease of testing.
llm-compact-serializer/
├── .github/
│   └── workflows/
│       ├── ci.yml          # CI/CD: Tests & Linting
│       └── publish.yml     # CD: Publish to PyPI
├── src/
│   └── llm_compact_serializer/
│       ├── __init__.py
│       ├── domain/         # Schema definitions (The "Rules")
│       │   ├── __init__.py
│       │   └── schema.py
│       └── core/           # The Engine (Generic Logic)
│           ├── __init__.py
│           └── serializer.py
├── tests/
│   └──/ # more tests
├── LICENSE                 # MIT
├── README.md
├── pyproject.toml          # Poetry Config
└── poetry.lock
Object Wrapping: All objects are wrapped in |{ ... }|.
Separators: Fields are separated by ,. Objects in a list are separated by ;.
Recursion: A field can contain another object, creating a nested structure: |{ val1, |{ val2 }| }|.
Arrays: Simple lists are wrapped in [...].
Missing Data: None or empty values are automatically replaced with - to maintain positional integrity.
Sanitization: Commas found within data values are automatically replaced (e.g., Doe, John -> Doe John) to prevent parsing errors.
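Taken together, the rules above fit in a short encoder. The following is a minimal sketch under stated assumptions, not the library's implementation: a schema is modelled as an ordered list of `(field_name, nested_schema_or_None)` tuples, and the function names are invented for this example.

```python
from typing import Any

MISSING = "-"  # placeholder that keeps every value in its position


def encode_value(value: Any) -> str:
    """Encode a primitive or a simple list."""
    if value is None or value == "" or value == []:
        return MISSING
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(encode_value(v) for v in value) + "]"
    # Sanitization: drop commas so they cannot be read as separators.
    return str(value).replace(",", "")


def encode_object(obj: dict, schema: list) -> str:
    """Encode one object as |{...}|, following the schema's field order."""
    parts = []
    for name, nested in schema:
        value = obj.get(name)
        if nested is not None and isinstance(value, dict):
            parts.append(encode_object(value, nested))  # recursion
        else:
            parts.append(encode_value(value))
    return "|{" + ", ".join(parts) + "}|"


def encode_many(objs: list, schema: list) -> str:
    """Encode a list of objects, separated by ';'."""
    return ";".join(encode_object(o, schema) for o in objs)


destination_schema = [("address", None), ("phones", None)]
product_schema = [("name", None), ("price", None), ("destination", destination_schema)]

data = [
    {"name": "MacBook Pro", "price": "1200€",
     "destination": {"address": "Silicon Valley, CA", "phones": [5550199, 5550200]}},
    {"name": "Doe, John", "price": None, "destination": None},
]

print(encode_many(data, product_schema))
# |{MacBook Pro, 1200€, |{Silicon Valley CA, [5550199, 5550200]}|}|;|{Doe John, -, -}|
```

Removing commas outright matches the `Doe, John -> Doe John` example, but it would also turn `a,b` into `ab`. A real implementation may substitute a different character.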
We use pytest for comprehensive testing, covering unit logic and end-to-end integration.
# Run all tests
poetry run pytest
# Run with coverage report
poetry run pytest --cov=src
test_serializer.py: Verifies primitive handling, recursive nesting logic, and sanitization (handling commas in data).
test_integration.py: Validates the full workflow (Schema -> Data -> Prompt) using complex real-world examples.
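To give an idea of what such unit tests might look like, here is a small pytest-style sketch. It is not taken from the repository: `sanitize` is a hypothetical helper that mirrors the documented rules for commas and missing values, defined inline so the file runs on its own.

```python
def sanitize(value) -> str:
    """Hypothetical helper mirroring the documented rules, not library code."""
    if value is None or value == "":
        return "-"  # missing data keeps its position
    return str(value).replace(",", "")  # commas would break field parsing


def test_commas_are_removed_from_values():
    assert sanitize("Doe, John") == "Doe John"


def test_missing_values_become_placeholder():
    assert sanitize(None) == "-"
    assert sanitize("") == "-"


def test_numbers_pass_through_as_text():
    assert sanitize(5550199) == "5550199"
```

pytest collects any function whose name starts with `test_`, so saving this as a file under tests/ and running `poetry run pytest` would pick it up.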
- Fork the repository.
- Create a feature branch (git checkout -b feat/amazing-feature).
- Commit your changes (git commit -m 'feat: Add amazing feature').
- Push to the branch (git push origin feat/amazing-feature).
- Open a Pull Request.
Distributed under the MIT License. See LICENSE for more information.