A lightweight guided generation library for small (<1B) language models, optimized for local use.

Typed guided generation for LLMs: GuideGen forces model outputs to conform to a schema using token-level constrained decoding.
```python
import guidegen as gg

gen = gg.GuideGen("HuggingFaceTB/SmolLM2-360M-Instruct")

# Classification
sentiment = gen.generate("Great product!", ["Positive", "Negative", "Neutral"])
# => "Positive"
print(sentiment)

# Typed extraction
age = gen.generate("John is 25 years old", int)
# => 25
print(age)

# Pydantic models
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

person = gen.generate("John Doe, age 30", Person)
# => Person(name="John Doe", age=30)
print(person)
```

- Type-safe generation - `int`, `float`, `bool`, `str`, `Literal`, Pydantic models, `List[T]`
- Three-stage decoding - Forced emission, accept-reject sampling, full-mask fallback
- Schema-compiled decoding - Deterministic structure emission for small models (~75% fewer model calls)
- Adaptive strategies - Auto-detects model size and selects the optimal decoding strategy
- Classification with calibration - Full logprob scoring, label bias correction
- Uncertainty detection - The `safe()` API returns `None` when the model is uncertain
- Streaming - Token-level streaming with commit events
- Reason-then-render - Unconstrained reasoning followed by constrained output
Install directly from GitHub (PyPI coming soon):
```bash
pip install git+https://github.com/tabularis-ai/guidegen.git
```

Or clone the repository and install in editable mode:

```bash
git clone https://github.com/tabularis-ai/guidegen.git
cd guidegen
pip install -e .
```

Note: Pydantic is included by default. If you encounter import errors, try installing without the `-e` flag or use the direct GitHub install above.
```python
import guidegen as gg
from typing import Literal
from pydantic import BaseModel

# Load model
gen = gg.GuideGen("HuggingFaceTB/SmolLM2-360M-Instruct")

# Classification
label = gen.generate(
    "The movie was terrible and boring",
    Literal["Positive", "Negative", "Neutral"]
)

# Or pass a list directly
label = gen.generate("Great product!", ["Positive", "Negative", "Neutral"])

# Extraction with Pydantic
class Contact(BaseModel):
    name: str
    email: str
    age: int

contact = gen.generate(
    "Reach out to Alice Smith at alice@example.com, she's 28",
    Contact
)

# Classification with calibration and confidence scores
# (German input: "Ich mag es" = "I like it")
result = gen.generate(
    "Ich mag es",
    ["Positiv", "Negativ", "Neutral"],
    return_details=True,       # returns ClassificationResult
    calibrate=True,            # removes label prior bias
    prompt_suffix="Antwort:",  # multilingual prompt suffix ("Antwort" = "Answer")
)
# result.label => "Positiv", result.score => 0.87, result.scores => {...}

# Safe mode - abstains if uncertain
result = gen.safe(
    "What color is the sky on Mars?",
    Literal["Red", "Blue", "Green"],
    uncertainty=gg.UncertaintyConfig(min_confidence=0.6)
)
# result is None if the model is uncertain, otherwise the label
```
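The `calibrate=True` option above corrects for the model's prior preference among the label strings themselves. Below is a minimal sketch of one common recipe (contextual calibration: subtract each label's logprob under a content-free input, then renormalize); the README does not document guidegen's exact procedure, so the function and numbers here are purely illustrative:

```python
import math

def calibrated_scores(label_logprobs, prior_logprobs):
    """Subtract each label's prior logprob, then softmax-renormalize."""
    corrected = {k: label_logprobs[k] - prior_logprobs[k] for k in label_logprobs}
    z = math.log(sum(math.exp(v) for v in corrected.values()))
    return {k: math.exp(v - z) for k, v in corrected.items()}

# Logprobs of each label given the real input ...
label_lp = {"Positiv": -0.9, "Negativ": -2.1, "Neutral": -1.7}
# ... and given a content-free input (the label prior).
prior_lp = {"Positiv": -0.7, "Negativ": -1.6, "Neutral": -1.4}

print(calibrated_scores(label_lp, prior_lp))
# => roughly {"Positiv": 0.38, "Negativ": 0.28, "Neutral": 0.34}
```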
GuideGen automatically selects the best decoding strategy based on model size:

| Strategy | Best for | How it works |
|---|---|---|
| `"standard"` | Models >= 3B | Three-stage decoding with JSON grammar |
| `"compiled"` | Models < 3B | Schema-driven: emits structure, model fills values |
| `"decomposed"` | Very weak models | Generates each field independently |
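As a rough sketch of what "auto-detects model size" means, selection can be expressed as a threshold rule on parameter count; the 3B cutoff comes from the table above, while the floor for falling back to `"decomposed"` is an assumption, not a documented value:

```python
def pick_strategy(n_params: int) -> str:
    """Illustrative strategy selection by parameter count."""
    if n_params >= 3_000_000_000:  # >= 3B: full three-stage decoding
        return "standard"
    if n_params >= 100_000_000:    # hypothetical floor for "compiled"
        return "compiled"
    return "decomposed"            # very weak models: per-field generation

print(pick_strategy(360_000_000))  # SmolLM2-360M => "compiled"
```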
Override with:

```python
options = gg.GuideGenOptions(strategy="compiled")
result = gen.generate(prompt, Person, options=options)
```

- TypeSpec compilation - Your Python type is compiled into a language-independent schema
- Constraint engine - The schema drives a token-level constraint engine (JSON grammar + slot engines)
- Three-stage decoding (see the sketch after this list):
  - Stage 0: Forced emission - only one valid token, emit it directly
  - Stage 1: Accept-reject - sample from the model, validate against constraints
  - Stage 2: Full mask - compute all valid tokens, sample from the masked distribution
- Schema-compiled mode (small models): the engine emits all structural tokens deterministically; the model only generates values
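A minimal sketch of one step of that loop, assuming a generic constraint object with a `valid_token_ids()` method and raw next-token `logits`; the names are illustrative, not guidegen's internal API:

```python
import numpy as np

rng = np.random.default_rng(0)

def decode_step(logits: np.ndarray, constraint) -> int:
    """One constrained decoding step over the three stages."""
    valid = list(constraint.valid_token_ids())

    # Stage 0: forced emission - exactly one valid continuation, so emit
    # it directly (in schema-compiled mode, structural tokens like '{'
    # or '"name":' take this path with no model call at all).
    if len(valid) == 1:
        return valid[0]

    # Stage 1: accept-reject - sample from the unmasked distribution and
    # keep the token if the constraint accepts it (the cheap, common case
    # when the model already follows the schema).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    token = int(rng.choice(len(probs), p=probs))
    if token in valid:
        return token

    # Stage 2: full mask fallback - zero out every invalid token and
    # renormalize before sampling.
    masked = np.zeros_like(probs)
    masked[valid] = probs[valid]
    return int(rng.choice(len(masked), p=masked / masked.sum()))

class DemoConstraint:
    """Toy constraint over a 4-token vocabulary."""
    def valid_token_ids(self):
        return {0, 2}

print(decode_step(np.array([1.0, 3.0, 0.5, -1.0]), DemoConstraint()))
```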
Enable `return_trace` to see how much work each stage did:

```python
options = gg.GuideGenOptions(return_trace=True)
result, trace = gen.generate("John is 25", Person, options=options)

print(f"Stage 0 forced: {trace.stage0_forced_count}")
print(f"Stage 1 accepted: {trace.stage1_accept_count}")
print(f"Stage 2 fallback: {trace.stage2_fallback_count}")
print(f"Avg logprob: {trace.average_logprob:.2f}")
```

License: MIT