LLM Compact Serializer

A schema-driven Python library that reduces LLM token usage by serializing complex JSON objects into a strict, positional, hierarchical text format.

🚀 Why Use This?

When you build LLM applications (with GPT-4, Gemini, Claude, etc.), passing large JSON arrays in the system prompt spends many tokens on repeated keys and whitespace.

Standard JSON (Heavy):

[
  {"product": "Apple", "price": "1.20", "category": "Fruit"},
  {"product": "Banana", "price": "0.80", "category": "Fruit"}
]

Compact Protocol (Efficient):

|{Apple, 1.20, Fruit}|;|{Banana, 0.80, Fruit}|

Impact:

Reduces token count from ~55 tokens (JSON) to ~18 tokens (Compact) for this example.
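
Token counts depend on the tokenizer, so a quick tokenizer-independent sanity check is to compare character counts for the same two records. The sketch below does only that; character count is a rough proxy for tokens, and `to_compact` is a hypothetical helper written for this illustration, not part of the library.

```python
import json

records = [
    {"product": "Apple", "price": "1.20", "category": "Fruit"},
    {"product": "Banana", "price": "0.80", "category": "Fruit"},
]


def to_compact(rows, field_order):
    """Hypothetical helper: emit values only, in a fixed field order."""
    return ";".join(
        "|{" + ", ".join(str(row[name]) for name in field_order) + "}|"
        for row in rows
    )


json_text = json.dumps(records, indent=2)
compact_text = to_compact(records, ["product", "price", "category"])

print(compact_text)
print(len(json_text), "characters as JSON")
print(len(compact_text), "characters as compact text")
```

To measure real token savings, run both strings through the tokenizer of the model you target.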

Key Benefits

Token Efficiency: Reduces input payload size by roughly 40-60% for repetitive data structures; exact savings depend on the data and the tokenizer.

Schema Enforcement: Generates strict instructions that tell the LLM which field each position holds, which is intended to reduce hallucinations.

Dynamic: Works with any Python object or dictionary; just define the schema at runtime.

Recursive: Supports deeply nested objects via the |{...}| syntax and simple lists via [...].

📦 Installation

Clone the repository

git clone https://github.com/Ryujose/llm-compact-serializer.git

Install dependencies using Poetry

poetry install

⚡ Quick Start

This example demonstrates how to serialize a product list with nested destination data.

Define Your Schema

Tell the serializer what your data looks like. Field order matters: values are emitted positionally, in the order the fields are declared.

from llm_compact_serializer.domain.schema import CompactSchema, FieldConfig
from llm_compact_serializer.core.prompt_builder import PromptBuilder

# Define a nested schema for complex objects
destination_schema = CompactSchema(
    name="Destination",
    fields=[
        FieldConfig(source_name="address"),
        FieldConfig(source_name="phones", is_list=True) # Handles arrays [x, y]
    ]
)

# Define the root schema
product_schema = CompactSchema(
    name="Product",
    fields=[
        FieldConfig(source_name="name"),
        FieldConfig(source_name="price"),
        FieldConfig(source_name="destination", nested_schema=destination_schema)
    ]
)

Prepare Your Data

You can use dictionaries, Pydantic models, or dataclasses.

data = [
    {
        "name": "MacBook Pro", 
        "price": "1200€", 
        "destination": {
            "address": "Silicon Valley, CA",
            "phones": [5550199, 5550200]
        }
    }
]
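
If your records live in dataclasses, one way to see the dictionary shape that matches the schema is to convert them with the standard library's `dataclasses.asdict`. The `Destination` and `Product` classes below are hypothetical names chosen to mirror the schema; whether the serializer needs this conversion or reads dataclass attributes directly is not shown here.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Destination:
    address: str
    phones: List[int]


@dataclass
class Product:
    name: str
    price: str
    destination: Destination


item = Product(
    name="MacBook Pro",
    price="1200€",
    destination=Destination(
        address="Silicon Valley, CA",
        phones=[5550199, 5550200],
    ),
)

# asdict() converts nested dataclasses recursively into plain dicts.
data = [asdict(item)]
print(data)
```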

Generate the Prompt

The PromptBuilder automatically generates the protocol instructions and injects your compressed data.

builder = PromptBuilder(product_schema)
base_prompt = "Analyze the following orders: [INPUT]"

final_prompt = builder.build(base_prompt, data, data_marker="[INPUT]")
print(final_prompt)
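
The marker substitution step can be pictured as a plain string replacement. The sketch below is a stand-alone illustration, not the library's code: `inject_payload` is a hypothetical function, and raising an error when the marker is missing is an assumption made for this example.

```python
def inject_payload(base_prompt: str, payload: str, data_marker: str = "[INPUT]") -> str:
    """Replace the first occurrence of data_marker with the compact payload."""
    if data_marker not in base_prompt:
        # Assumption for this sketch: fail loudly instead of dropping the data.
        raise ValueError(f"marker {data_marker!r} not found in prompt")
    return base_prompt.replace(data_marker, payload, 1)


prompt = inject_payload(
    "Analyze the following orders: [INPUT]",
    "|{Apple, 1.20, Fruit}|;|{Banana, 0.80, Fruit}|",
)
print(prompt)
```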

Output (What the LLM Sees)

[COMPACT_HIERARCHICAL_PROTOCOL]
[INSTRUCTIONS]
1. Interpret input strictly as a Recursive Compact Hierarchy.
2. Syntax: Complex objects enclosed in |{ }|, separated by comma.
3. Structure Mapping:
# 1 = Product (Root)
# 1a = name
# 1b = price
# 1.1 = destination
# 1.1a = address
# 1.1b* = phones
[END_PROTOCOL]

Analyze the following orders: |{MacBook Pro, 1200€, |{Silicon Valley, CA, [5550199, 5550200]}|}|
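
The Structure Mapping labels in the output follow a visible pattern: plain fields get a letter suffix, nested objects get a dotted number, and list fields are marked with an asterisk. The sketch below reproduces that pattern for the schema used here. `Field`, `Schema`, and `mapping_lines` are hypothetical stand-ins written for this example, and the numbering rules are inferred from this single output, so treat them as assumptions rather than the library's exact behavior.

```python
from dataclasses import dataclass
from string import ascii_lowercase
from typing import List, Optional


@dataclass
class Field:
    name: str
    is_list: bool = False
    nested: Optional["Schema"] = None


@dataclass
class Schema:
    name: str
    fields: List[Field]


def mapping_lines(schema, prefix="1", label=None):
    """Return the '# <id> = <name>' lines for a schema, depth first."""
    if label is None:
        lines = [f"# {prefix} = {schema.name} (Root)"]
    else:
        lines = [f"# {prefix} = {label}"]
    letters = iter(ascii_lowercase)  # sketch limit: at most 26 plain fields
    child_index = 0
    for f in schema.fields:
        if f.nested is not None:
            # Nested objects are numbered 1.1, 1.2, ... under their parent.
            child_index += 1
            lines.extend(mapping_lines(f.nested, f"{prefix}.{child_index}", f.name))
        else:
            star = "*" if f.is_list else ""
            lines.append(f"# {prefix}{next(letters)}{star} = {f.name}")
    return lines


destination = Schema("Destination", [Field("address"), Field("phones", is_list=True)])
product = Schema(
    "Product",
    [Field("name"), Field("price"), Field("destination", nested=destination)],
)

print("\n".join(mapping_lines(product)))
```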

🏗 Architecture

The project follows Clean Architecture principles to ensure modularity and ease of testing.

llm-compact-serializer/
├── .github/
│   └── workflows/
│       ├── ci.yml              # CI/CD: Tests & Linting
│       └── publish.yml         # CD: Publish to PyPI
├── src/
│   └── llm_compact_serializer/
│       ├── __init__.py
│       ├── domain/             # Schema definitions (The "Rules")
│       │   ├── __init__.py
│       │   └── schema.py
│       └── core/               # The Engine (Generic Logic)
│           ├── __init__.py
│           └── serializer.py
├── tests/
│   └── ...                     # more tests
├── LICENSE                     # MIT
├── README.md
├── pyproject.toml              # Poetry Config
└── poetry.lock

The Protocol Rules

Object Wrapping: All objects are wrapped in |{ ... }|.

Separators: Fields are separated by a comma (,). Objects in a list are separated by a semicolon (;).

Recursion: A field can contain another object, creating a nested structure: |{ val1, |{ val2 }| }|.

Arrays: Simple lists are wrapped in [...].

Missing Data: None or empty values are automatically replaced with - to maintain positional integrity.

Sanitization: Commas found within data values are automatically replaced (e.g., Doe, John -> Doe John) to prevent parsing errors.
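
The rules above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the library's serializer: the schema is a hypothetical list of `(field_name, kind)` tuples, where `kind` is `None` for a plain value, `"list"` for a simple list, or another list of tuples for a nested object. Emitting `-` for a missing nested object is also an assumption.

```python
def sanitize(value) -> str:
    """Render a scalar: None or empty becomes '-', commas are removed."""
    if value is None or value == "":
        return "-"
    return str(value).replace(",", "")


def serialize_object(obj, schema) -> str:
    """Serialize one dict positionally, following the schema's field order."""
    parts = []
    for name, kind in schema:
        value = obj.get(name)
        if isinstance(kind, list):
            # Nested object: recurse, or emit '-' if it is missing (assumption).
            parts.append(serialize_object(value, kind) if value else "-")
        elif kind == "list":
            if value:
                parts.append("[" + ", ".join(sanitize(v) for v in value) + "]")
            else:
                parts.append("-")
        else:
            parts.append(sanitize(value))
    return "|{" + ", ".join(parts) + "}|"


def serialize(rows, schema) -> str:
    """Join serialized objects with ';'."""
    return ";".join(serialize_object(row, schema) for row in rows)


DESTINATION = [("address", None), ("phones", "list")]
CUSTOMER = [("name", None), ("price", None), ("destination", DESTINATION)]

rows = [
    {
        "name": "Doe, John",
        "price": None,
        "destination": {"address": "Main St 1", "phones": [5550199, 5550200]},
    }
]
result = serialize(rows, CUSTOMER)
print(result)
```

Because values carry no keys, every position must be filled, which is why missing data is written as `-` instead of being skipped.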

🧪 Testing

We use pytest for comprehensive testing, covering unit logic and end-to-end integration.

# Run all tests
poetry run pytest

# Run with coverage report
poetry run pytest --cov=src

Key Test Scenarios

test_serializer.py: Verifies primitive handling, recursive nesting logic, and sanitization (handling commas in data).

test_integration.py: Validates the full workflow (Schema -> Data -> Prompt) using complex real-world examples.

🤝 Contributing

Fork the repository.
Create a feature branch (git checkout -b feat/amazing-feature).
Commit your changes (git commit -m 'feat: Add amazing feature').
Push to the branch (git push origin feat/amazing-feature).
Open a Pull Request.

📄 License

Distributed under the MIT License. See LICENSE for more information.
