raisin: dense Python for LLMs

πŸ‡ raisin

Write Python LLMs can read. ~50% fewer tokens, 100% same functionality.

Claude Code Skill · 786 tests passing · License: MIT


The metaphor

A raisin is a grape with the water removed. Same sweetness, same nutrients, same fruit, half the mass. This project does the same thing to Python: removes the water (docstrings, boilerplate, ceremonial type hints) and keeps the nutrients (logic, behavior, public API).

TL;DR

LLMs are trained to imitate human coding conventions: docstrings for Sphinx, type hints for IDE hover, verbose error handling for readable stack traces. None of these serve the LLM that's writing or reading the code.

When we let LLMs write natively for machine-reading, they produce programs that pass the same test suites in roughly half the tokens. Six independent experiments confirm the pattern:

| Experiment | Saved | Verification |
|---|---|---|
| Click 8.2.1 retrofit | 55.5% | 738/738 tests pass |
| Flask 3.1.3 retrofit | 62.7% | syntax + structure |
| Bottle 0.13.4 retrofit | 37.6% | WSGI pipeline |
| Greenfield TODO CLI (written twice) | 51.8% | 28/28 tests pass |
| Guide → agent → URL shortener (from scratch) | 47.2% | 20/20 tests pass |
| Guide → agent → click/formatting.py (real library) | 44.4% | 738/738 tests pass |
| 🏆 Single-file record: flask/helpers.py | 80.3% | syntax + structure |

Total: 786 tests verified. Zero regressions.


Install the Skill


Claude Code (one line)

/plugin marketplace add Oldrich333/raisin
/plugin install raisin

Claude Code (manual)

git clone https://github.com/Oldrich333/raisin.git /tmp/raisin
mkdir -p ~/.claude/skills
cp -r /tmp/raisin/plugins/raisin/skills/raisin ~/.claude/skills/

Codex / Gemini CLI / other agents

Copy plugins/raisin/skills/raisin/SKILL.md into your agent's skill directory. The skill is a single self-contained file with no dependencies.

Activate

Use a slash command:

/halfcode      # primary command
/dense         # alias
/raisin        # alias (brand)

Or natural language:

write this dense
minimize tokens
compress src/utils.py
no docstrings, llm-native style

The Core Experiment: Greenfield TODO CLI

To prove compression isn't "retrofit cheating," we wrote the same program twice from scratch, in two styles. Both pass the same 28-test spec.

| Style | Tokens | LOC | Tests |
|---|---|---|---|
| Normal Python (docstrings, type hints, verbose errors) | 3,022 | 437 | 28/28 ✓ |
| LLM-native (dense from line 1) | 1,458 | 104 | 28/28 ✓ |
| Ratio | 48.2% | 23.8% | - |

→ Full greenfield experiment
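
To make the contrast concrete, here is a hypothetical before/after in the two styles (illustrative only; this is not code from greenfield/normal/todo.py or greenfield/kolmogorov/todo.py). The dense version keeps the public name, behavior, and error semantics while dropping everything written for human tooling.

```python
# Illustrative only: a hypothetical task helper, not taken from the benchmark.

# Normal, human-imitating style:
from typing import Optional

def add_task(tasks: list[dict], title: str, due: Optional[str] = None) -> dict:
    """Add a task to the task list.

    Args:
        tasks: Existing list of task dicts.
        title: Human-readable task title.
        due: Optional ISO-8601 due date.

    Returns:
        The newly created task dict.

    Raises:
        ValueError: If the title is empty.
    """
    if not title:
        raise ValueError("Task title must not be empty.")
    task = {"id": len(tasks) + 1, "title": title, "due": due, "done": False}
    tasks.append(task)
    return task


# Dense, LLM-native style: same public name, same behavior, no ceremony.
def add_task(tasks, title, due=None):
    if not title: raise ValueError("Task title must not be empty.")
    t = {"id": len(tasks) + 1, "title": title, "due": due, "done": False}
    tasks.append(t); return t
```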


Guide Validation: Does the Methodology Transfer?

Sure, we can compress code. But can someone following the guide reproduce the results? We tested it twice:

Test 1 - URL shortener from scratch (20 tests): we gave an agent only the guide + test suite. Result: 775 tokens, 20/20 pass, 47.2% savings.

Test 2 - Click formatting.py, inside the real library (738 tests): we gave a different agent only the guide + the original Click file. The agent wrote a dense version that passes all 738 of Click's own tests. Result: 1,195 tokens, 738/738 pass, 44.4% savings, within 5% of our hand-tuned reference.

→ Guide validation experiment


Headline File: flask/helpers.py (80.3% savings)

| | Tokens | LOC |
|---|---|---|
| Original | 5,399 | 641 |
| Dense rewrite | 1,064 | 80 |
| Saved | 80.3% | 87.5% |

Flask's helpers.py is mostly small utility functions, each wrapped in 30 lines of docstrings and type overloads. The dense version preserves all public API and behavior in 80 lines.
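
The pattern looks roughly like this (a hypothetical helper, not copied from flask/helpers.py): overload stubs and a long docstring wrapped around a few lines of logic, versus the same behavior on two lines.

```python
from typing import overload

# Hypothetical helper in the human-imitating style (not an actual Flask function):
@overload
def to_bool(value: bool) -> bool: ...
@overload
def to_bool(value: str) -> bool: ...
def to_bool(value):
    """Coerce a configuration value to a boolean.

    Strings such as "1", "true", "yes", and "on" are treated as True;
    anything else is False. Booleans pass through unchanged.
    """
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"1", "true", "yes", "on"}


# Dense rewrite: same public name and behavior, stubs and docstring removed.
def to_bool(value):
    return value if isinstance(value, bool) else str(value).strip().lower() in {"1", "true", "yes", "on"}
```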


Reproducing the results

Prerequisites

python3 -m pip install pytest tiktoken pyyaml

Verify Click (738 tests at every level)

bash tools/run_tests.sh original          # 738 passed
bash tools/run_tests.sh L1_clean          # 738 passed
bash tools/run_tests.sh LK_kolmogorov     # 738 passed
bash tools/run_tests.sh LK2_aggressive    # 738 passed

Verify greenfield TODO (28 tests × 2 implementations)

cd greenfield
TODO_IMPL=normal python3 -m pytest tests/ -q       # 28 passed
TODO_IMPL=kolmogorov python3 -m pytest tests/ -q   # 28 passed

Verify guide validation (20 tests × 2 implementations)

cd guide_validation
URLSHORT_IMPL=normal python3 -m pytest spec/ -q       # 20 passed
URLSHORT_IMPL=kolmogorov python3 -m pytest spec/ -q   # 20 passed

Measure token counts across all levels

python3 tools/measure.py
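
tools/measure.py is not reproduced here; as a minimal sketch, token counts like the ones in this README can be taken with tiktoken. The encoding chosen below is an assumption; the actual script may use a different encoding or methodology.

```python
# Minimal token-counting sketch. Assumes a tiktoken encoding; tools/measure.py
# may differ in encoding choice and in what it measures.
import pathlib
import sys

import tiktoken

ENC = tiktoken.get_encoding("cl100k_base")  # assumed encoding, not confirmed by the repo

def count_tokens(path: str) -> int:
    # Read the file and count BPE tokens in its text.
    return len(ENC.encode(pathlib.Path(path).read_text(encoding="utf-8")))

if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(f"{p}: {count_tokens(p)} tokens")
```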

The methodology

CODE_COMPRESSION_GUIDE.md is an M2M (machine-to-machine) document: no prose, no chatty explanations. It contains:

  • FRAMING - what to optimize, what to preserve, what to ignore
  • WHAT TO REMOVE - always waste: docstrings, comments, blank lines, overload stubs, internal type hints
  • WHAT TO RESTRUCTURE - the real gains: shared error handlers, validation helpers, dict dispatch, bulk attribute assignment (illustrated in the sketch below)
  • NAMING RULES - short internal, clear public
  • FORMATTING RULES - semicolons, one-liners, comprehensions
  • VERIFICATION PROTOCOL - how to catch bugs without reverting
  • CHECKLIST - grep patterns for each optimization opportunity
  • GREENFIELD vs RETROFIT - different process for each

The guide is not a tutorial. It's a specification for LLM agents. The skill in plugins/raisin/skills/raisin/SKILL.md is the same methodology packaged as a Claude Code / Codex skill.
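
Two of the restructuring moves named in the list above, shown as a hypothetical before/after (illustrative code, not taken from the guide or the benchmarks): an if/elif chain becomes dict dispatch, and attribute-by-attribute assignment collapses into a bulk update.

```python
# Hypothetical before/after; not code from CODE_COMPRESSION_GUIDE.md.
def add(p): return ("added", p)
def remove(p): return ("removed", p)
def list_all(p): return ("listing", p)

# Before: if/elif chain and attribute-by-attribute assignment.
def handle(cmd, payload):
    if cmd == "add":
        return add(payload)
    elif cmd == "remove":
        return remove(payload)
    elif cmd == "list":
        return list_all(payload)
    raise ValueError(f"unknown command: {cmd}")

class Config:
    def __init__(self, host, port, debug, timeout):
        self.host = host
        self.port = port
        self.debug = debug
        self.timeout = timeout

# After: dict dispatch and bulk attribute assignment.
_DISPATCH = {"add": add, "remove": remove, "list": list_all}

def handle(cmd, payload):
    try: return _DISPATCH[cmd](payload)
    except KeyError: raise ValueError(f"unknown command: {cmd}")

class Config:
    def __init__(self, **kw): self.__dict__.update(kw)
```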


Repository Structure

raisin/
├── README.md                      - you are here
├── CODE_COMPRESSION_GUIDE.md      - the methodology (agent instructions)
├── RESULTS.md                     - detailed retrofit results
├── TECHNIQUES.md                  - design document
├── READABILITY_ARGUMENT.md        - pre-emptive response to "but it's unreadable"
│
├── .claude-plugin/
│   └── marketplace.json           - Claude Code marketplace definition
│
├── plugins/raisin/                - the installable skill
│   ├── .claude-plugin/plugin.json
│   ├── README.md
│   └── skills/raisin/SKILL.md     - methodology as agent instruction
│
├── assets/
│   ├── raisin-banner.png          - social preview
│   ├── raisin-logo.png            - logo with code braces
│   └── raisin-icon-simple.png     - minimalist icon
│
├── greenfield/                    - write-from-scratch experiment
│   ├── SPEC.md, RESULTS.md
│   ├── tests/test_todo.py         - 28 tests (shared)
│   ├── normal/todo.py             - human-imitating (437 LOC)
│   ├── normal_L1/todo.py          - after automated strip
│   ├── normal_L2/todo.py          - after cosmetic pass
│   └── kolmogorov/todo.py         - LLM-native (104 LOC)
│
├── guide_validation/              - methodology transfer test
│   ├── spec/SPEC.md, spec/test_url_short.py  - 20 tests
│   ├── normal/url_short.py        - reference human-style
│   └── kolmogorov/url_short.py    - agent wrote this using only the guide
│
├── original/                      - Click 8.2.1 source
├── L1_clean/                      - Click after automated strip
├── LK_kolmogorov/                 - Click after LLM-native rewrite
├── LK2_aggressive/                - Click after second rewrite pass
├── LK3_agent_click/               - Click with agent's formatting.py
│
├── flask_benchmark/               - Flask 3.1.3 + compressed versions
├── bottle_benchmark/              - Bottle 0.13.4 + compressed versions
│
├── tests/                         - Click 8.2.1's own 738-test suite
└── tools/                         - measure/strip/run_tests/full_report

Why this matters

Context window economics

A 200K context window loaded with Click + Flask + Bottle (original) consumes 170,767 tokens (85% of the window). The LLM-native versions consume 79,542 tokens (40%), leaving 120K tokens free for actual thinking.

Across a 50-library research mission, this saves ~1.5M tokens and roughly $4.50 per run at Claude's API rates.
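
A back-of-the-envelope check of those numbers; the per-token price below is an assumption (the text above only commits to "Claude's API rates").

```python
# Illustrative arithmetic only; the $3 per million input tokens rate is an assumption.
original, dense, window = 170_767, 79_542, 200_000
print(f"original: {original / window:.0%} of window, dense: {dense / window:.0%}")  # ~85% vs ~40%
saved_per_three_libs = original - dense            # ~91K tokens across the 3 libraries
mission = saved_per_three_libs / 3 * 50            # scale to a 50-library mission
print(f"~{mission / 1e6:.1f}M tokens saved")       # ~1.5M
print(f"~${mission / 1e6 * 3:.2f} per run")        # roughly $4.50 at the assumed rate
```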

LLM coding speed

When an LLM produces docstrings, type hints, and verbose error handling, it spends real wall-clock time generating tokens that nothing will ever read. Remove those constraints and the LLM writes the program faster and cleaner.

Code review

LLM-native code is not unreadable; it's densely readable. A Python programmer finishes reading the 104-line TODO faster than the 437-line version because there's less skipping. LLMs read it trivially.


Related Work

Atlas Coding Engine (ACE Protocol v15): the production methodology this benchmark validates. Atlas (the agentic AI platform that produced this benchmark) has 47,472 LOC in 163 shard files, was built LLM-native from day one, and claims ~60% LOC savings, verified here.


License

  • Repo analysis, methodology, tooling, skill: MIT
  • Click, Flask, Bottle and derivative works: original licenses (BSD-3, BSD-3, MIT)
  • See LICENSE for details

Citation

@misc{raisin-2026,
  author = {Oldrich333},
  title = {raisin: Write Python LLMs can read},
  year = {2026},
  url = {https://github.com/Oldrich333/raisin}
}
