Releases: Stackbilt-dev/llm-providers
v1.9.0 — getRoutingInfo, onIteration abort, deprecation warnings
What's new
getRoutingInfo() — pre-flight routing snapshot
Call once at ingress to get { useCase, provider, model, estimatedInputTokens, modelLifecycle, deprecationWarning, ... } without dispatching. Pair with request.metadata.useCase to pre-classify at the gateway layer and let the catalog engine drive dispatch.
metadata.useCase passthrough
resolveUseCase() in the factory now reads request.metadata.useCase directly. Gateways classify once, set the field, and the catalog honours it.
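As a rough illustration of the "classify once at the gateway" pattern described above, the sketch below sets metadata.useCase before a request is handed on. The SketchRequest type, the classifyUseCase helper, and the use-case labels are all hypothetical stand-ins, not the package's own types or catalog values.

```typescript
// Local stand-in for the request shape; the package's real LLMRequest may differ.
interface SketchMessage { role: 'system' | 'user' | 'assistant'; content: string }
interface SketchRequest {
  messages: SketchMessage[];
  tools?: unknown[];
  images?: unknown[];
  metadata?: { useCase?: string };
}

// Hypothetical gateway-side classifier; the labels are illustrative only.
function classifyUseCase(req: SketchRequest): string {
  if (req.images && req.images.length > 0) return 'vision';
  if (req.tools && req.tools.length > 0) return 'tool-use';
  return 'chat';
}

// Classify once at ingress. A use case already set by the caller is kept.
function withUseCase(req: SketchRequest): SketchRequest {
  const useCase = req.metadata?.useCase ?? classifyUseCase(req);
  return { ...req, metadata: { ...req.metadata, useCase } };
}
```

The returned request carries the field that, per the note above, the factory reads directly, so downstream code does not need to re-classify.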
onIteration abort — ToolLoopAbortSignal
Return { abort: true, reason? } from the onIteration callback in generateResponseWithTools to stop a tool loop immediately. ToolLoopAbortedError is thrown. Void-returning callbacks are unaffected.
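The abort contract can be sketched with a small stand-in loop. Everything below is declared locally for illustration: the callback arguments (iteration number and accumulated cost), the runLoop driver, and the error class are assumptions, and the real generateResponseWithTools signature may differ.

```typescript
// Signal shape taken from the release note; declared locally for this sketch.
type ToolLoopAbortSignal = { abort: true; reason?: string };
// Assumed callback arguments; the real callback may receive something else.
type OnIteration = (iteration: number, costCents: number) => ToolLoopAbortSignal | void;

// Local stand-in for the error the package throws.
class ToolLoopAbortedError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ToolLoopAbortedError';
  }
}

// Hypothetical driver standing in for the library's tool loop.
function runLoop(maxIterations: number, centsPerIteration: number, onIteration: OnIteration): number {
  let cost = 0;
  for (let i = 1; i <= maxIterations; i++) {
    cost += centsPerIteration;
    const signal = onIteration(i, cost);
    if (signal && signal.abort) {
      throw new ToolLoopAbortedError(signal.reason ?? 'aborted by onIteration');
    }
  }
  return cost;
}

// Callback that stops the loop once a 5-cent budget is exceeded.
const budgetGuard: OnIteration = (_iteration, costCents) =>
  costCents > 5 ? { abort: true, reason: 'cost budget exceeded' } : undefined;
```

A callback that returns nothing never trips the abort branch, which mirrors the note that void-returning callbacks are unaffected.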
Response deprecation annotations
generateResponse() attaches metadata.llmProvidersDeprecationWarning to any response using a compatibility or retired lifecycle model. The two Cerebras models deprecating 2026-05-27 surface warnings starting today.
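A consumer might surface the annotation with a small helper like the one below. The response type is a local stand-in, and treating the warning as a plain string is an assumption; only the field name comes from the note above.

```typescript
// Local stand-in for a response object; the package's real type may differ.
interface SketchResponse {
  content: string;
  metadata?: Record<string, unknown>;
}

// Returns the deprecation warning if one is attached, otherwise undefined.
// Assumes the warning is a non-empty string.
function getDeprecationWarning(res: SketchResponse): string | undefined {
  const warning = res.metadata?.['llmProvidersDeprecationWarning'];
  return typeof warning === 'string' && warning.length > 0 ? warning : undefined;
}
```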
VERSION constant fix
Was hardcoded '0.1.0' since v1.0.0; now tracks the real package version.
Full changelog
See CHANGELOG.md.
v1.8.0 — NVIDIA NIM provider
Adds NvidiaProvider with 9 NVIDIA-hosted models (Llama 3.3/3.1 70B, Llama 4 Maverick, Nemotron 70B/49B/253B, Mistral Large 2, DeepSeek V4 Flash/Pro). Tool calling verified live on Meta Llama and Nemotron families. Also ships the previously uncommitted v1.7.0 Cerebras scope. See CHANGELOG.md for full details.
v1.6.5
Fixed
- Fixed ESM import resolution in the published package by switching runtime relative imports to explicit .js specifiers; this resolves Node ERR_MODULE_NOT_FOUND for installed consumers.
Added
- Tarball consumer smoke test (npm run test:package) that packs the package, installs it in a clean temp project, and verifies both the require and import entrypoints.
- CI and publish workflow gates now run npm run test:package before release publish.
v1.6.0 — SSE validation, cache hints, schema canary
What's new
Streaming schema validation (#41)
All four providers now surface malformed SSE frames as SchemaDriftError and fire onSchemaDrift instead of swallowing silently. Anthropic additionally validates content_block_delta event shape and delta.text type; future tool-streaming delta types are skipped via forward-compat discriminator.
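The "surface instead of swallow" behaviour can be sketched as a frame parser that reports and throws on a malformed data line. This is a simplified illustration, not the providers' actual parser: the SchemaDriftError class and the onSchemaDrift payload shape are local assumptions.

```typescript
// Local stand-in for the package's error class.
class SchemaDriftError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'SchemaDriftError';
  }
}

// Assumed payload shape for the drift callback.
type DriftHandler = (detail: { frame: string; reason: string }) => void;

// Parses one SSE line. Returns the decoded JSON for data frames and
// undefined for lines that carry no payload.
function parseSseLine(line: string, onSchemaDrift?: DriftHandler): unknown {
  if (!line.startsWith('data:')) return undefined; // comments, event names, blank lines
  const payload = line.slice('data:'.length).trim();
  if (payload === '[DONE]') return undefined; // OpenAI-style end-of-stream marker
  try {
    return JSON.parse(payload);
  } catch {
    const reason = 'SSE data frame is not valid JSON';
    if (onSchemaDrift) onSchemaDrift({ frame: payload, reason });
    throw new SchemaDriftError(reason); // fail loudly instead of dropping the frame
  }
}
```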
Cache-aware routing (#52)
- New CacheHints type: LLMRequest.cache is a no-op for callers that don't set it
- Anthropic: strategy: 'provider-prefix' wraps the system prompt as a content block with cache_control: { type: 'ephemeral' } and marks the last tool as a breakpoint
- OpenAI / Groq / Cerebras: automatic caching with no request-side translation needed
- Cached token counts normalized into TokenUsage: cachedInputTokens, cacheReadInputTokens, cacheCreationInputTokens
- supportsPromptCache flag added to ModelCapabilities
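One plausible way to normalize cached token counts is sketched below. The input field names (cache_read_input_tokens, cache_creation_input_tokens, prompt_tokens_details.cached_tokens) are the ones the Anthropic and OpenAI APIs report; how they map onto the three normalized fields is an assumption here, and the package may assign them differently.

```typescript
// Local stand-in for the normalized usage type named in the release note.
interface TokenUsageSketch {
  inputTokens: number;
  outputTokens: number;
  cachedInputTokens?: number;
  cacheReadInputTokens?: number;
  cacheCreationInputTokens?: number;
}

interface AnthropicUsage {
  input_tokens: number;
  output_tokens: number;
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
}

interface OpenAIStyleUsage {
  prompt_tokens: number;
  completion_tokens: number;
  prompt_tokens_details?: { cached_tokens?: number };
}

// Assumed mapping: cache reads double as the generic "cached input" count.
function fromAnthropicUsage(u: AnthropicUsage): TokenUsageSketch {
  const read = u.cache_read_input_tokens ?? 0;
  return {
    inputTokens: u.input_tokens,
    outputTokens: u.output_tokens,
    cachedInputTokens: read,
    cacheReadInputTokens: read,
    cacheCreationInputTokens: u.cache_creation_input_tokens ?? 0,
  };
}

// OpenAI-style usage reports only a single cached-token figure.
function fromOpenAIStyleUsage(u: OpenAIStyleUsage): TokenUsageSketch {
  return {
    inputTokens: u.prompt_tokens,
    outputTokens: u.completion_tokens,
    cachedInputTokens: u.prompt_tokens_details?.cached_tokens ?? 0,
  };
}
```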
Schema drift canary (#39 Part 2)
- extractShape(obj): flat path → type map from any response object
- compareShapes(golden, live): diffs two shape maps into { added, removed, changed }
- runCanaryCheck(provider, golden, liveResponse): one-shot canary returning a CanaryReport
- Golden fixtures committed for all five providers under src/__tests__/fixtures/response-shapes/
- All three utilities exported from the package root
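To make the "flat path → type map" idea concrete, here is a minimal local re-implementation of the two diffing steps. The function names carry a Sketch suffix because this is not the package's code; details such as how arrays are sampled (first element only) and how nested paths are spelled are assumptions.

```typescript
type ShapeMap = Record<string, string>;

// Flattens a value into "dotted.path" -> type name. Arrays are sampled by
// their first element only (an assumption made for brevity).
function extractShapeSketch(value: unknown, path = '', out: ShapeMap = {}): ShapeMap {
  if (value === null) {
    out[path] = 'null';
  } else if (Array.isArray(value)) {
    out[path] = 'array';
    if (value.length > 0) extractShapeSketch(value[0], `${path}[]`, out);
  } else if (typeof value === 'object') {
    if (path !== '') out[path] = 'object';
    const record = value as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      extractShapeSketch(record[key], path === '' ? key : `${path}.${key}`, out);
    }
  } else {
    out[path] = typeof value;
  }
  return out;
}

// Diffs two shape maps into added, removed, and changed paths.
function compareShapesSketch(golden: ShapeMap, live: ShapeMap) {
  const added = Object.keys(live).filter((k) => !(k in golden)).sort();
  const removed = Object.keys(golden).filter((k) => !(k in live)).sort();
  const changed = Object.keys(golden).filter((k) => k in live && golden[k] !== live[k]).sort();
  return { added, removed, changed };
}
```

Comparing a golden fixture against a live response then reduces to two extract calls and one compare call; a non-empty changed list is the drift signal.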
Previously merged, now documented
- Factory-level streaming with fallback (#26): generateResponseStream uses the same circuit-breaker and fallback chain as generateResponse
- Tool-use loop helper (#28): generateResponseWithTools with ToolLoopLimitError, ToolLoopAbortedError, iteration/cost caps, and abort-signal support
- Cloudflare AI Gateway metadata forwarding (#29): cf-aig-* headers forwarded only when baseUrl matches the Gateway pattern
- Cloudflare LoRA / fine-tune forwarding (#51): LLMRequest.lora forwarded to Workers AI binding
Bug fixes
- stop_sequence schema false positive: was typed as string; the real Anthropic API returns null when no stop sequence triggers, causing SchemaDriftError on every normal response. Fixed to string-or-null.
- AnthropicProvider.getProviderBalance(): was calling a non-existent endpoint (/v1/organizations/cost_report). Now returns unavailable with a message directing users to the Admin API, matching the Groq pattern.
Full changelog
See CHANGELOG.md for the complete entry.
v1.5.1 — fix Cloudflare llama-3.2 vision silent empty response
Fixed
- analyzeImage() silent empty response on Cloudflare: @cf/meta/llama-3.2-11b-vision-instruct via the Workers AI binding requires a raw { image: number[], prompt, max_tokens } input shape, not the OpenAI-compatible messages/image_url format. The chat path returns choices[0].message.content === null via the binding, causing extractText() to silently return "". The provider now detects this model and dispatches to the raw binding format. Other vision models are unaffected. Fixes #53.
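The raw input shape named in the fix can be sketched as follows. The helper names, the model set, and the default max_tokens value are hypothetical; only the { image: number[], prompt, max_tokens } shape and the model ID come from the note above.

```typescript
// Raw input shape described in the fix.
interface RawVisionInput {
  image: number[];
  prompt: string;
  max_tokens: number;
}

// Hypothetical registry of models that need the raw binding format.
const RAW_BINDING_VISION_MODELS = new Set<string>([
  '@cf/meta/llama-3.2-11b-vision-instruct',
]);

function needsRawBindingInput(model: string): boolean {
  return RAW_BINDING_VISION_MODELS.has(model);
}

// Converts image bytes into a plain number[] as the raw shape requires.
// The default of 512 tokens is an arbitrary choice for this sketch.
function toRawVisionInput(bytes: Uint8Array, prompt: string, maxTokens = 512): RawVisionInput {
  return { image: Array.from(bytes), prompt, max_tokens: maxTokens };
}
```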
Full changelog: https://github.com/Stackbilt-dev/llm-providers/blob/main/CHANGELOG.md
v1.5.0
Consolidates the unreleased 1.4.0 scope and undocumented features into a single minor release. 1.4.0 was tagged in package.json but never published to npm; consumers upgrading from 1.3.0 receive all of the following atomically.
Added
- Declarative model catalog (src/model-catalog.ts): semantic catalog for provider/model metadata, recommendation use cases, lifecycle status, and runtime scoring
- Runtime recommendation API: LLMProviders#getRecommendedModel(request, useCase?) exposes the same routing logic the factory uses internally
- Schema drift envelope validation: OpenAIProvider, GroqProvider, CerebrasProvider, and AnthropicProvider now validate response envelopes at the provider boundary, throwing SchemaDriftError on mismatch instead of silently corrupting downstream consumers
- LLMProviders.fromEnv() static factory: auto-discovers providers from Cloudflare Workers env bindings without manual wiring
- Model drift test: asserts every provider's models[] is symmetrically covered by its capabilities map
- Catalog tests: coverage for retired-model exclusion, health-aware ranking, request-shape use-case inference
Changed
- Factory routing selects provider/model pairs from the catalog instead of hardcoded ordering
- Health-aware dispatch considers circuit-breaker state including degraded and recovering providers, not just fully open
- Budget-aware dispatch: with a CreditLedger attached, selection demotes providers under high utilization or near projected depletion
- Provider defaults for OpenAI, Anthropic, Cloudflare, Cerebras, and Groq resolve through the shared catalog
- Cloudflare model recommendation prefers modern active baselines (Gemma 4, GPT-OSS) instead of legacy TinyLlama/Qwen heuristics
- Recommendation exports exclude retired targets (e.g. gpt-4o) while preserving deprecated constants for compatibility
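The health-aware and budget-aware changes above can be pictured as score adjustments applied before ranking. The sketch below is purely illustrative: the candidate fields, penalty weights, and thresholds (80% utilization, 24 hours to depletion) are invented for the example and are not the catalog's actual scoring rules.

```typescript
type BreakerState = 'closed' | 'degraded' | 'recovering' | 'open';

interface Candidate {
  provider: string;
  baseScore: number;          // catalog score before adjustments
  breaker: BreakerState;
  budgetUtilization: number;  // 0..1 share of the budget already spent
  hoursToDepletion: number;   // remaining credits divided by current burn rate
}

// Hypothetical penalties; open circuits are filtered out before scoring.
const HEALTH_PENALTY: Record<BreakerState, number> = {
  closed: 0,
  degraded: 20,
  recovering: 30,
  open: Number.POSITIVE_INFINITY,
};

function adjustedScore(c: Candidate): number {
  let score = c.baseScore - HEALTH_PENALTY[c.breaker];
  if (c.budgetUtilization >= 0.8) score -= 25; // high utilization
  if (c.hoursToDepletion < 24) score -= 25;    // near projected depletion
  return score;
}

// Returns dispatchable candidates, best first.
function rankCandidates(candidates: Candidate[]): Candidate[] {
  return candidates
    .filter((c) => c.breaker !== 'open')
    .map((c) => ({ c, score: adjustedScore(c) }))
    .sort((a, b) => b.score - a.score)
    .map((entry) => entry.c);
}
```

Demoting rather than excluding degraded or budget-constrained providers keeps them available as fallbacks when healthier options fail.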
Deprecated
- MODELS.CLAUDE_3_HAIKU: migrate to CLAUDE_HAIKU_4_5 or CLAUDE_3_5_HAIKU
- MODELS.GPT_4O: migrate to GPT_4O_MINI or a current GPT-4 successor
Removed
- claude-3-haiku-20240307, gpt-4o, and dead alias gpt-4-turbo-preview dropped from provider models[] and capabilities tables. Arbitrary-string passthrough on request inputs is unchanged: consumers pinning older MODELS enum values via string literals are not affected.
Full changelog: CHANGELOG.md
v1.3.0 — Cloudflare Workers AI vision support
Added
- Cloudflare Workers AI vision support: CloudflareProvider now accepts request.images and routes to vision-capable models. Previously image data was silently dropped on the CF path.
- Three new CF vision models:
  - @cf/google/gemma-4-26b-a4b-it: 256K context, vision + function calling + reasoning
  - @cf/meta/llama-4-scout-17b-16e-instruct: natively multimodal, tool calling
  - @cf/meta/llama-3.2-11b-vision-instruct: image understanding
- CloudflareProvider.supportsVision = true: factory's analyzeImage now dispatches to CF when configured.
- Factory default vision fallback: getDefaultVisionModel() falls back to @cf/google/gemma-4-26b-a4b-it when neither Anthropic nor OpenAI is configured, enabling CF-only deployments to use analyzeImage().
Changed
- Images are passed to CF using the OpenAI-compatible image_url content-part shape (base64 data URIs). HTTP image URLs throw a helpful ConfigurationError; fetch the image and pass bytes in image.data.
- Attempting request.images on a non-vision CF model throws a ConfigurationError naming the vision-capable alternatives.
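The image handling described above might look roughly like this. The { type: 'image_url', image_url: { url } } part is the standard OpenAI content-part shape; the ImageInput type (including its url field), the helper name, and the error messages are assumptions made for the sketch.

```typescript
// Local stand-in for the package's error class.
class ConfigurationError extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ConfigurationError';
  }
}

// Hypothetical image input; only data and mimeType appear in the release note.
interface ImageInput {
  data?: string; // base64-encoded bytes
  url?: string;
  mimeType: string;
}

interface ImageUrlPart {
  type: 'image_url';
  image_url: { url: string };
}

// Builds an OpenAI-compatible content part carrying a base64 data URI.
function toImageUrlPart(image: ImageInput): ImageUrlPart {
  if (image.url !== undefined && /^https?:\/\//i.test(image.url)) {
    throw new ConfigurationError('HTTP image URLs are not supported; fetch the image and pass bytes in image.data');
  }
  if (!image.data) {
    throw new ConfigurationError('image.data is required');
  }
  return { type: 'image_url', image_url: { url: `data:${image.mimeType};base64,${image.data}` } };
}
```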
Usage
```typescript
factory.analyzeImage({
  image: { data: base64, mimeType: 'image/jpeg' },
  prompt: 'Extract recipe data',
  model: '@cf/google/gemma-4-26b-a4b-it',
});
```
See #43 for details.
v1.1.0 — Multi-Modal: Image Generation
Image Generation Provider
@stackbilt/llm-providers is now multi-modal — text + image inference under one package.
New: ImageProvider
```typescript
import { ImageProvider } from '@stackbilt/llm-providers';

const img = new ImageProvider({
  cloudflareAi: env.AI,
  geminiApiKey: env.GEMINI_API_KEY,
});

const result = await img.generateImage({
  prompt: 'a mountain landscape at sunset',
  model: 'flux-dev',
});
// result.image: ArrayBuffer, result.responseTime, result.provider
```
Built-in Models
| Model | Provider | Use Case |
|---|---|---|
| sdxl-lightning | Cloudflare | Fast drafts, free tier |
| flux-klein | Cloudflare | Balanced quality/speed |
| flux-dev | Cloudflare | Highest CF quality |
| gemini-flash-image | | Text rendering capable |
| gemini-flash-image-preview | | Latest preview model |
Extracted from img-forge production codebase. Battle-tested response normalization handles all Workers AI return formats.
Full changelog: CHANGELOG.md
v1.0.0 — Production Release
First stable release. Production-tested in AEGIS cognitive kernel since v1.72.0.
Highlights
- Zero runtime dependencies: supply chain security by design
- 5 providers: OpenAI, Anthropic, Cloudflare Workers AI, Cerebras, Groq
- LLMProviders.fromEnv(): one-line multi-provider setup
- Graduated circuit breakers: automatic failover with half-open probe recovery
- CreditLedger: per-provider budget tracking with threshold alerts + burn rate projection
- npm provenance: every version cryptographically linked to its source commit
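For readers unfamiliar with half-open probe recovery, the sketch below shows the general pattern with a plain three-state breaker. It is not the package's implementation: the class name, thresholds, and method names are hypothetical, and the graduated states the library mentions elsewhere (degraded, recovering) are left out for brevity.

```typescript
type BreakerPhase = 'closed' | 'open' | 'half-open';

class CircuitBreakerSketch {
  private phase: BreakerPhase = 'closed';
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;

  constructor(failureThreshold = 3, cooldownMs = 30000) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
  }

  // After the cooldown, an open breaker moves to half-open and lets probe
  // traffic through until a success or failure is recorded.
  canRequest(now: number): boolean {
    if (this.phase === 'open' && now - this.openedAt >= this.cooldownMs) {
      this.phase = 'half-open';
    }
    return this.phase !== 'open';
  }

  // A successful call (including a probe) closes the circuit.
  recordSuccess(): void {
    this.failures = 0;
    this.phase = 'closed';
  }

  // A failed probe, or too many failures while closed, opens the circuit.
  recordFailure(now: number): void {
    this.failures += 1;
    if (this.phase === 'half-open' || this.failures >= this.failureThreshold) {
      this.phase = 'open';
      this.openedAt = now;
    }
  }

  getPhase(): BreakerPhase {
    return this.phase;
  }
}
```

Timestamps are passed in explicitly so the state machine stays deterministic and easy to test; a caller would typically pass Date.now().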
Install
```bash
npm install @stackbilt/llm-providers
```
Quick Start
```typescript
import { LLMProviders } from '@stackbilt/llm-providers';

const llm = LLMProviders.fromEnv(process.env);
const response = await llm.generateResponse({
  messages: [{ role: 'user', content: 'Hello!' }],
});
```
See README for full documentation.
See SECURITY.md for supply chain security policy.