Releases: Stackbilt-dev/llm-providers

v1.9.0 — getRoutingInfo, onIteration abort, deprecation warnings

22 May 11:22

What's new

getRoutingInfo() — pre-flight routing snapshot

Call once at ingress to get { useCase, provider, model, estimatedInputTokens, modelLifecycle, deprecationWarning, ... } without dispatching. Pair with request.metadata.useCase to pre-classify at the gateway layer and let the catalog engine drive dispatch.

metadata.useCase passthrough

resolveUseCase() in the factory now reads request.metadata.useCase directly. Gateways classify once, set the field, and the catalog honours it.
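
The classify-once flow described above can be sketched roughly as follows. This is a self-contained illustration, not the package's code: the `UseCase` labels, `SketchRequest`, `classifyAtGateway`, and `resolveUseCaseSketch` are all hypothetical names, and the only detail taken from the release note is that an explicit `request.metadata.useCase` is honoured ahead of any inference.

```typescript
// Hypothetical use-case labels; the real catalog's names may differ.
type UseCase = 'chat' | 'tool-use' | 'vision';

// Hypothetical, trimmed-down request shape for illustration only.
interface SketchRequest {
  messages: { role: string; content: string }[];
  tools?: unknown[];
  images?: unknown[];
  metadata?: { useCase?: UseCase; [key: string]: unknown };
}

// Gateway side: classify once and record the result on the request.
function classifyAtGateway(request: SketchRequest): SketchRequest {
  const useCase: UseCase =
    request.images && request.images.length > 0
      ? 'vision'
      : request.tools && request.tools.length > 0
        ? 'tool-use'
        : 'chat';
  return { ...request, metadata: { ...request.metadata, useCase } };
}

// Factory side: an explicit metadata.useCase wins; inference is only a fallback.
function resolveUseCaseSketch(
  request: SketchRequest,
  infer: (r: SketchRequest) => UseCase,
): UseCase {
  return request.metadata?.useCase ?? infer(request);
}
```

With this arrangement the downstream resolver never re-derives a label the gateway has already set, which is the behaviour the note attributes to the catalog.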

onIteration abort — ToolLoopAbortSignal

Return { abort: true, reason? } from the onIteration callback in generateResponseWithTools to stop a tool loop immediately. ToolLoopAbortedError is thrown. Void-returning callbacks are unaffected.
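
As a rough illustration of the abort convention, here is a minimal stand-alone loop that honours an `{ abort: true, reason? }` return value. It does not call the package: `runToolLoopSketch`, `LoopAbortedSketchError`, and the synchronous callback signature are assumptions made for brevity, and the real `generateResponseWithTools` may pass different arguments to `onIteration`.

```typescript
// Only the { abort, reason } return convention comes from the release note.
interface AbortDecision {
  abort: true;
  reason?: string;
}

// Hypothetical callback shape: receives the 1-based iteration number.
type OnIteration = (iteration: number) => AbortDecision | void;

// Hypothetical stand-in for the library's ToolLoopAbortedError.
class LoopAbortedSketchError extends Error {
  reason: string | undefined;
  constructor(reason?: string) {
    super(reason ? `Tool loop aborted: ${reason}` : 'Tool loop aborted');
    this.name = 'LoopAbortedSketchError';
    this.reason = reason;
  }
}

// Runs up to maxIterations; returns the number of completed iterations.
function runToolLoopSketch(maxIterations: number, onIteration?: OnIteration): number {
  for (let i = 1; i <= maxIterations; i++) {
    // A real loop would call the model and execute tool calls here.
    const decision = onIteration?.(i);
    if (typeof decision === 'object' && decision !== null && decision.abort === true) {
      throw new LoopAbortedSketchError(decision.reason);
    }
  }
  return maxIterations;
}
```

A callback that returns nothing leaves `decision` undefined, so existing void-returning callbacks pass through the check untouched.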

Response deprecation annotations

generateResponse() attaches metadata.llmProvidersDeprecationWarning to any response using a compatibility or retired lifecycle model. The two Cerebras models scheduled for deprecation on 2026-05-27 surface warnings starting today.

VERSION constant fix

Was hardcoded '0.1.0' since v1.0.0; now tracks the real package version.

Full changelog

See CHANGELOG.md.

v1.8.0 — NVIDIA NIM provider

22 May 10:39

Adds NvidiaProvider with 9 NVIDIA-hosted models (Llama 3.3/3.1 70B, Llama 4 Maverick, Nemotron 70B/49B/253B, Mistral Large 2, DeepSeek V4
Flash/Pro). Tool calling verified live on Meta Llama and Nemotron families. Also ships the previously uncommitted v1.7.0 Cerebras scope. See CHANGELOG.md
for full details.

v1.6.5

15 May 10:04

Fixed

  • Published package ESM import resolution by switching runtime relative imports to explicit .js specifiers, fixing Node ERR_MODULE_NOT_FOUND for installed consumers.

Added

  • Tarball consumer smoke test (npm run test:package) that packs, installs in a clean temp project, and verifies both require and import entrypoints.
  • CI and publish workflow gates now run npm run test:package before release publish.

v1.6.0 — SSE validation, cache hints, schema canary

27 Apr 23:23

What's new

Streaming schema validation (#41)

All four providers now surface malformed SSE frames as SchemaDriftError and fire onSchemaDrift instead of swallowing silently. Anthropic additionally validates content_block_delta event shape and delta.text type; future tool-streaming delta types are skipped via forward-compat discriminator.

Cache-aware routing (#52)

  • New CacheHints type — LLMRequest.cache is a no-op for callers that don't set it
  • Anthropic: strategy: 'provider-prefix' wraps the system prompt as a content block with cache_control: { type: 'ephemeral' } and marks the last tool as a breakpoint
  • OpenAI / Groq / Cerebras: automatic caching with no request-side translation needed
  • Cached token counts normalized into TokenUsage: cachedInputTokens, cacheReadInputTokens, cacheCreationInputTokens
  • supportsPromptCache flag added to ModelCapabilities
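
The Anthropic translation in the list above might look something like the following sketch. The block and tool shapes follow Anthropic's documented prompt-caching format (`cache_control: { type: 'ephemeral' }` on a system text block and on a tool definition); the function name `applyProviderPrefixCache` and the surrounding types are hypothetical and not taken from the package.

```typescript
interface CacheControl {
  type: 'ephemeral';
}

interface SystemBlock {
  type: 'text';
  text: string;
  cache_control?: CacheControl;
}

interface ToolDef {
  name: string;
  description?: string;
  input_schema: object;
  cache_control?: CacheControl;
}

// Hypothetical helper: wraps the system prompt as a cached content block and
// marks the last tool as a cache breakpoint, without mutating the inputs.
function applyProviderPrefixCache(
  system: string,
  tools: ToolDef[],
): { system: SystemBlock[]; tools: ToolDef[] } {
  const ephemeral: CacheControl = { type: 'ephemeral' };
  const systemBlocks: SystemBlock[] = [
    { type: 'text', text: system, cache_control: ephemeral },
  ];
  const markedTools = tools.map((tool, index) =>
    index === tools.length - 1 ? { ...tool, cache_control: ephemeral } : tool,
  );
  return { system: systemBlocks, tools: markedTools };
}
```

Marking only the last tool keeps the whole tool list inside a single cached prefix, since a breakpoint covers everything that precedes it in the request.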

Schema drift canary (#39 Part 2)

  • extractShape(obj) — flat path → type map from any response object
  • compareShapes(golden, live) — diffs two shape maps into { added, removed, changed }
  • runCanaryCheck(provider, golden, liveResponse) — one-shot canary returning a CanaryReport
  • Golden fixtures committed for all five providers under src/__tests__/fixtures/response-shapes/
  • All three utilities exported from the package root
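
To make the canary idea concrete, here is an independent sketch of a flat path-to-type map and a shape diff. It is not the package's implementation: `extractShapeSketch` and `compareShapesSketch` are hypothetical stand-ins, and details such as the `[]` suffix for array elements, sampling only the first element, and the `{ path, from, to }` entries in `changed` are assumptions.

```typescript
type ShapeMap = Record<string, string>;

interface ShapeDiff {
  added: string[];
  removed: string[];
  changed: { path: string; from: string; to: string }[];
}

const hasOwn = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Walks a value and records "dotted.path" -> type name.
function extractShapeSketch(value: unknown, prefix = '', out: ShapeMap = {}): ShapeMap {
  if (value === null) {
    out[prefix] = 'null';
  } else if (Array.isArray(value)) {
    out[prefix] = 'array';
    // Assumption: the first element is representative of the whole array.
    if (value.length > 0) extractShapeSketch(value[0], `${prefix}[]`, out);
  } else if (typeof value === 'object') {
    if (prefix) out[prefix] = 'object';
    const record = value as Record<string, unknown>;
    for (const key of Object.keys(record)) {
      extractShapeSketch(record[key], prefix ? `${prefix}.${key}` : key, out);
    }
  } else {
    out[prefix] = typeof value;
  }
  return out;
}

// Diffs a golden shape against a live one.
function compareShapesSketch(golden: ShapeMap, live: ShapeMap): ShapeDiff {
  const diff: ShapeDiff = { added: [], removed: [], changed: [] };
  for (const path of Object.keys(golden)) {
    if (!hasOwn(live, path)) {
      diff.removed.push(path);
    } else if (golden[path] !== live[path]) {
      diff.changed.push({ path, from: golden[path], to: live[path] });
    }
  }
  for (const path of Object.keys(live)) {
    if (!hasOwn(golden, path)) diff.added.push(path);
  }
  return diff;
}
```

A canary check in this style would extract the shape of a live response, diff it against a committed golden fixture, and flag any non-empty `removed` or `changed` list.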

Previously merged, now documented

  • Factory-level streaming with fallback (#26) — generateResponseStream uses the same circuit-breaker and fallback chain as generateResponse
  • Tool-use loop helper (#28) — generateResponseWithTools with ToolLoopLimitError, ToolLoopAbortedError, iteration/cost caps, and abort-signal support
  • Cloudflare AI Gateway metadata forwarding (#29) — cf-aig-* headers forwarded only when baseUrl matches the Gateway pattern
  • Cloudflare LoRA / fine-tune forwarding (#51) — LLMRequest.lora forwarded to Workers AI binding

Bug fixes

  • stop_sequence schema false positive — was typed as string; real Anthropic API returns null when no stop sequence triggers, causing SchemaDriftError on every normal response. Fixed to string-or-null.
  • AnthropicProvider.getProviderBalance() — was calling a non-existent endpoint (/v1/organizations/cost_report). Now returns unavailable with a message directing users to the Admin API, matching the Groq pattern.
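
The stop_sequence fix amounts to widening a type check from "string" to "string or null". A minimal sketch of such a guard, using a hypothetical function name rather than the package's validator:

```typescript
// Accepts a triggered stop sequence (string) as well as the normal case where
// no stop sequence fired (null). A missing field or another type is still drift.
function isValidStopSequence(value: unknown): value is string | null {
  return value === null || typeof value === 'string';
}
```

Note that `undefined` is deliberately rejected here; whether the real validator also tolerates an absent field is not stated in the release note.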

Full changelog

See CHANGELOG.md for the complete entry.

v1.5.1 — fix Cloudflare llama-3.2 vision silent empty response

27 Apr 12:05
68f6daf

Fixed

  • analyzeImage() silent empty response on Cloudflare: @cf/meta/llama-3.2-11b-vision-instruct via the Workers AI binding requires a raw { image: number[], prompt, max_tokens } input shape, not the OpenAI-compatible messages/image_url format. The chat path returns choices[0].message.content === null via the binding, causing extractText() to silently return "". The provider now detects this model and dispatches to the raw binding format. Other vision models are unaffected. Fixes #53.
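
For illustration, building the raw input shape named above could look like this. The helper names and the default `max_tokens` of 512 are assumptions; only the `{ image: number[], prompt, max_tokens }` shape and the model identifier come from the release note.

```typescript
interface RawVisionInput {
  image: number[];
  prompt: string;
  max_tokens: number;
}

// The one model the note says needs the raw binding format.
const RAW_BINDING_MODEL = '@cf/meta/llama-3.2-11b-vision-instruct';

// Hypothetical detection helper.
function needsRawVisionInput(model: string): boolean {
  return model === RAW_BINDING_MODEL;
}

// Hypothetical builder: the binding expects plain byte values, not a typed
// array and not a base64 data URI. The default of 512 is an arbitrary choice.
function toRawVisionInput(bytes: Uint8Array, prompt: string, maxTokens = 512): RawVisionInput {
  return { image: Array.from(bytes), prompt, max_tokens: maxTokens };
}
```

Callers holding base64 data would decode it to bytes first; that step is omitted here because the decoding API differs between runtimes.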

Full changelog: https://github.com/Stackbilt-dev/llm-providers/blob/main/CHANGELOG.md

v1.5.0

23 Apr 11:58

Consolidates the unreleased 1.4.0 scope and undocumented features into a single minor release. 1.4.0 was tagged in package.json but never published to npm; consumers upgrading from 1.3.0 receive all of the following atomically.

Added

  • Declarative model catalog (src/model-catalog.ts) — semantic catalog for provider/model metadata, recommendation use cases, lifecycle status, and runtime scoring
  • Runtime recommendation API: LLMProviders#getRecommendedModel(request, useCase?) exposes the same routing logic the factory uses internally
  • Schema drift envelope validation: OpenAIProvider, GroqProvider, CerebrasProvider, and AnthropicProvider now validate response envelopes at the provider boundary, throwing SchemaDriftError on mismatch instead of corrupting downstream consumers silently
  • LLMProviders.fromEnv() static factory — auto-discovers providers from Cloudflare Workers env bindings without manual wiring
  • Model drift test — asserts every provider's models[] is symmetrically covered by its capabilities map
  • Catalog tests — coverage for retired-model exclusion, health-aware ranking, request-shape use-case inference

Changed

  • Factory routing selects provider/model pairs from the catalog instead of hardcoded ordering
  • Health-aware dispatch considers circuit-breaker state including degraded and recovering providers, not just fully open
  • Budget-aware dispatch — with a CreditLedger attached, selection demotes providers under high utilization or near projected depletion
  • Provider defaults for OpenAI, Anthropic, Cloudflare, Cerebras, and Groq resolve through the shared catalog
  • Cloudflare model recommendation prefers modern active baselines (Gemma 4, GPT-OSS) instead of legacy TinyLlama/Qwen heuristics
  • Recommendation exports exclude retired targets (e.g. gpt-4o) while preserving deprecated constants for compatibility
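
The routing changes above combine lifecycle, health, and budget signals. The sketch below shows one way such a ranking could be composed. Everything in it is hypothetical: the `Candidate` fields, the health state names other than degraded and recovering, the penalty weights, and the 0.9 utilization threshold are illustrative assumptions, not the catalog's actual scoring.

```typescript
type Lifecycle = 'active' | 'compatibility' | 'retired';
type Health = 'closed' | 'recovering' | 'degraded' | 'open';

interface Candidate {
  provider: string;
  model: string;
  lifecycle: Lifecycle;
  health: Health;
  budgetUtilization: number; // 0..1, share of the provider budget already spent
  baseScore: number; // catalog fit for the use case, higher is better
}

// Illustrative penalties; 'open' candidates are filtered out before scoring.
const HEALTH_PENALTY: Record<Health, number> = {
  closed: 0,
  recovering: 0.2,
  degraded: 0.4,
  open: 1,
};

function scoreCandidate(c: Candidate): number {
  const budgetPenalty = c.budgetUtilization >= 0.9 ? 0.5 : 0;
  return c.baseScore - HEALTH_PENALTY[c.health] - budgetPenalty;
}

// Excludes retired models and fully open breakers, then sorts best-first.
function rankCandidates(candidates: Candidate[]): Candidate[] {
  return candidates
    .filter((c) => c.lifecycle !== 'retired' && c.health !== 'open')
    .sort((a, b) => scoreCandidate(b) - scoreCandidate(a));
}
```

The point of the sketch is that a degraded or nearly depleted provider is demoted rather than excluded, so it can still serve as a fallback when nothing healthier is available.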

Deprecated

  • MODELS.CLAUDE_3_HAIKU — migrate to CLAUDE_HAIKU_4_5 or CLAUDE_3_5_HAIKU
  • MODELS.GPT_4O — migrate to GPT_4O_MINI or a current GPT-4 successor

Removed

  • claude-3-haiku-20240307, gpt-4o, and dead alias gpt-4-turbo-preview dropped from provider models[] and capabilities tables. Arbitrary-string passthrough on request inputs is unchanged — consumers pinning older MODELS enum values via string literals are not affected.

Full changelog: CHANGELOG.md

v1.3.0 — Cloudflare Workers AI vision support

17 Apr 00:05
bf713cd

Added

  • Cloudflare Workers AI vision support: CloudflareProvider now accepts request.images and routes to vision-capable models. Previously image data was silently dropped on the CF path.
  • Three new CF vision models:
    • @cf/google/gemma-4-26b-a4b-it — 256K context, vision + function calling + reasoning
    • @cf/meta/llama-4-scout-17b-16e-instruct — natively multimodal, tool calling
    • @cf/meta/llama-3.2-11b-vision-instruct — image understanding
  • CloudflareProvider.supportsVision = true — factory's analyzeImage now dispatches to CF when configured.
  • Factory default vision fallback: getDefaultVisionModel() falls back to @cf/google/gemma-4-26b-a4b-it when neither Anthropic nor OpenAI is configured, enabling CF-only deployments to use analyzeImage().

Changed

  • Images are passed to CF using the OpenAI-compatible image_url content-part shape (base64 data URIs). HTTP image URLs throw a helpful ConfigurationError — fetch the image and pass bytes in image.data.
  • Attempting request.images on a non-vision CF model throws a ConfigurationError naming the vision-capable alternatives.
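
The image handling described above can be sketched as a small converter. The `image_url` content-part shape is the standard OpenAI-compatible one; the function name, the local error class, and the choice to detect HTTP URLs by inspecting `image.data` are assumptions for illustration and may not match how the provider does it.

```typescript
// Hypothetical stand-in for the package's ConfigurationError.
class ConfigurationErrorSketch extends Error {
  constructor(message: string) {
    super(message);
    this.name = 'ConfigurationErrorSketch';
  }
}

interface ImageInput {
  data: string; // base64-encoded image bytes
  mimeType: string;
}

interface ImageUrlPart {
  type: 'image_url';
  image_url: { url: string };
}

// Builds a base64 data URI content part and rejects remote URLs up front.
function toImageUrlPart(image: ImageInput): ImageUrlPart {
  if (/^https?:\/\//i.test(image.data)) {
    throw new ConfigurationErrorSketch(
      'HTTP image URLs are not supported here; fetch the image and pass its base64 bytes in image.data.',
    );
  }
  return {
    type: 'image_url',
    image_url: { url: `data:${image.mimeType};base64,${image.data}` },
  };
}
```

Failing fast on a remote URL gives the caller an actionable message instead of an opaque model error or an empty response.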

Usage

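// Assumes `factory` is a configured LLMProviders instance and `base64` holds the base64-encoded image bytes.
const result = await factory.analyzeImage({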
  image: { data: base64, mimeType: 'image/jpeg' },
  prompt: 'Extract recipe data',
  model: '@cf/google/gemma-4-26b-a4b-it',
});

See #43 for details.

v1.1.0 — Multi-Modal: Image Generation

01 Apr 15:54

Image Generation Provider

@stackbilt/llm-providers is now multi-modal — text + image inference under one package.

New: ImageProvider

import { ImageProvider } from '@stackbilt/llm-providers';

const img = new ImageProvider({
  cloudflareAi: env.AI,
  geminiApiKey: env.GEMINI_API_KEY,
});

const result = await img.generateImage({
  prompt: 'a mountain landscape at sunset',
  model: 'flux-dev',
});
// result.image: ArrayBuffer, result.responseTime, result.provider

Built-in Models

| Model                      | Provider   | Use Case               |
|----------------------------|------------|------------------------|
| sdxl-lightning             | Cloudflare | Fast drafts, free tier |
| flux-klein                 | Cloudflare | Balanced quality/speed |
| flux-dev                   | Cloudflare | Highest CF quality     |
| gemini-flash-image         | Google     | Text rendering capable |
| gemini-flash-image-preview | Google     | Latest preview model   |
Extracted from img-forge production codebase. Battle-tested response normalization handles all Workers AI return formats.

Full changelog: CHANGELOG.md

v1.0.0 — Production Release

01 Apr 14:12

First stable release. Production-tested in AEGIS cognitive kernel since v1.72.0.

Highlights

  • Zero runtime dependencies — supply chain security by design
  • 5 providers: OpenAI, Anthropic, Cloudflare Workers AI, Cerebras, Groq
  • LLMProviders.fromEnv() — one-line multi-provider setup
  • Graduated circuit breakers — automatic failover with half-open probe recovery
  • CreditLedger — per-provider budget tracking with threshold alerts + burn rate projection
  • npm provenance — every version cryptographically linked to its source commit

Install

npm install @stackbilt/llm-providers

Quick Start

import { LLMProviders } from '@stackbilt/llm-providers';

const llm = LLMProviders.fromEnv(process.env);
const response = await llm.generateResponse({
  messages: [{ role: 'user', content: 'Hello!' }],
});

See README for full documentation.
See SECURITY.md for supply chain security policy.