
feat: add Flare WASM inference engine integration #301

Merged: sauravpanda merged 1 commit into main from worktree-feat+flare-engine-integration on Apr 16, 2026

Conversation

sauravpanda (Owner) commented Apr 15, 2026

Summary

Integrates Flare LLM as a third inference engine backend for BrowserAI, alongside MLC WebLLM and Transformers.js. Flare is a pure Rust → WASM engine that runs standard GGUF files directly — no TVM compilation step needed.

Closes #295, #296, #297, #298, #300. Part of #293.

What is in this PR

New files:

  • src/engines/flare-engine-wrapper.ts — FlareEngineWrapper implementing the BrowserAI engine interface with:
    • loadModel() — fetches GGUF with download progress + OPFS caching for instant repeat loads
    • generateText() — OpenAI-compatible response, optional per-token onToken streaming callback
    • loadAdapter() — SafeTensors LoRA merging via merge_lora / merge_lora_with_alpha
    • loadModelProgressive() — API hook for future layer-streaming inference
    • OPFS helpers: isModelCached, deleteCachedModel, listCachedModels
  • src/config/models/flare-models.json — 6 GGUF model registry entries (an example entry shape is sketched after this list):
    • SmolLM2-135M (Q8_0 / Q4_K_M)
    • SmolLM2-360M (Q8_0)
    • Qwen2.5-0.5B (Q4_K_M)
    • Llama-3.2-1B (Q8_0 / Q4_K_M)
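
For orientation, a single registry entry roughly follows the FlareConfig shape (engine, url, architecture, quantization, downloadSizeMB). The sketch below uses placeholder values rather than the actual registry contents:

import type { FlareConfig } from './types'; // path as seen from src/config/models/

// Hypothetical entry for illustration; the real flare-models.json values may differ.
const exampleEntry: FlareConfig = {
  engine: 'flare',
  url: 'https://huggingface.co/.../SmolLM2-135M-Instruct-Q4_K_M.gguf', // placeholder URL
  architecture: 'llama',   // assumed value, not taken from the registry
  quantization: 'Q4_K_M',
  downloadSizeMB: 100,     // illustrative, not a measured size
};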

Modified files:

  • src/config/models/types.ts — adds FlareConfig interface, updates ModelConfig union
  • src/core/llm/index.ts — wires FlareEngineWrapper into loadModel(), adds loadAdapter(), isFlareModelCached(), clearFlareModelCache() to BrowserAI class
  • src/index.ts — exports FlareEngineWrapper, flareModels, and OPFS utility functions

Usage example

import { BrowserAI } from '@browseros/ai';

const ai = new BrowserAI();

// Load a GGUF model via Flare
await ai.loadModel('smollm2-135m-flare', {
  onProgress: (loaded, total) => console.log(Math.round(loaded / total * 100) + '%')
});

// Generate text with streaming
const result = await ai.generateText('What is Rust?', {
  max_tokens: 256,
  temperature: 0.7,
  onToken: (tok) => console.log(tok), // per-token streaming callback (process.stdout is not available in the browser)
});

// Load a LoRA adapter
await ai.loadAdapter({ url: 'https://hf.co/.../adapter.safetensors' });

// Check OPFS cache
const cached = await ai.isFlareModelCached(); // true after first load
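
The Flare-only cache methods on BrowserAI can extend the example above; a minimal sketch:

// Flare-only cache management on the BrowserAI instance
if (await ai.isFlareModelCached()) {
  await ai.clearFlareModelCache(); // drop the OPFS copy; the next load re-downloads
}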

Dependency note

@aspect/flare (issue #294) is a peer dependency loaded via dynamic import(). BrowserAI will continue to work without it — attempting to use a Flare model without the package installed produces a clear error message with install instructions.
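
A minimal sketch of how the dynamic import and fallback error can look inside the wrapper, assuming the @aspect/flare package name from #294; the helper name and exact error text are illustrative:

// Sketch only: the real wrapper's field names and error text may differ.
let flare: any = null;

async function ensureFlareLoaded(): Promise<void> {
  if (flare) return;
  try {
    // Dynamic import keeps @aspect/flare out of the bundle when it isn't installed.
    flare = await import('@aspect/flare');
  } catch {
    throw new Error(
      '[Flare] Optional peer dependency @aspect/flare is not installed. ' +
        'Run `npm install @aspect/flare` to use Flare models.',
    );
  }
}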

Test plan

  • TypeScript type-check passes (no new errors from Flare code)
  • Library build succeeds (npm run build)
  • Existing unit tests all pass (npm test — 22/22)
  • End-to-end: load smollm2-135m-flare in browser once @aspect/flare is published (Publish @aspect/flare npm package from flarellm repo #294)
  • Verify OPFS cache gives instant second load
  • Verify LoRA adapter merging works with a compatible adapter file

Summary by CodeRabbit

  • New Features
    • Added Flare text-generation engine with six new compact models.
    • Offline model caching with inspect/clear cache controls.
    • Optional GPU acceleration with automatic CPU fallback.
    • LoRA adapter support for merged model adapters.
    • Real-time download and load progress reporting and streaming token output.

coderabbitai Bot commented Apr 15, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 47e730db-65a6-4738-b6a0-d8300808d121

📥 Commits

Reviewing files that changed from the base of the PR and between 5e6f1f1 and 2b1fd5c.

📒 Files selected for processing (5)
  • src/config/models/flare-models.json
  • src/config/models/types.ts
  • src/core/llm/index.ts
  • src/engines/flare-engine-wrapper.ts
  • src/index.ts

📝 Walkthrough

Adds Flare WASM engine support as a third inference backend: new FlareEngineWrapper with OPFS caching, GGUF model registry (six models), Flare-specific types (FlareConfig), BrowserAI integration and Flare-specific public APIs (adapter loading, cache inspection/clearing), and new exports.

Changes

  • Model Registry & Types (src/config/models/flare-models.json, src/config/models/types.ts): Added flare-models.json with six GGUF model entries and extended the ModelConfig union with FlareConfig (engine: 'flare', url, architecture, quantization, downloadSizeMB).
  • Engine Implementation (src/engines/flare-engine-wrapper.ts): New comprehensive FlareEngineWrapper: dynamic Flare WASM init, GGUF load with OPFS cache + byte-progress (sketched below), GPU init with CPU fallback, chat-template prompt formatting, streaming/non-streaming generation, LoRA adapter merging, cache inspection/clearing, and helper functions.
  • BrowserAI Integration (src/core/llm/index.ts): Wired Flare into BrowserAI: imported flare models and wrapper, expanded the engine union, adjusted engine selection logic, added a Flare branch in loadModel, and three Flare-only public methods (loadAdapter, isFlareModelCached, clearFlareModelCache).
  • Public Exports (src/index.ts): Exported FlareEngineWrapper, Flare option types, Flare-prefixed cache helpers (isFlareModelCached, deleteFlareModelCache, listFlareCachedModels), and a default flareModels JSON export.
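
The OPFS-cached, progress-reporting GGUF load mentioned in the Engine Implementation cohort can be pictured roughly as follows; the helper name fetchGgufWithCache and the cache directory are assumptions for illustration, not the wrapper's actual internals:

const OPFS_CACHE_DIR = 'flare-models'; // assumed cache directory name

async function fetchGgufWithCache(
  url: string,
  onProgress?: (loaded: number, total: number) => void,
): Promise<Uint8Array> {
  const key = encodeURIComponent(url);
  const root = await navigator.storage.getDirectory();
  const dir = await root.getDirectoryHandle(OPFS_CACHE_DIR, { create: true });

  // Cache hit: return the bytes previously written to OPFS.
  try {
    const file = await (await dir.getFileHandle(key)).getFile();
    return new Uint8Array(await file.arrayBuffer());
  } catch {
    // Cache miss: fall through to a network fetch.
  }

  // Stream the download so byte-level progress can be reported.
  const res = await fetch(url);
  const total = Number(res.headers.get('content-length') ?? 0);
  const reader = res.body!.getReader();
  const chunks: Uint8Array[] = [];
  let loaded = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
    loaded += value.length;
    onProgress?.(loaded, total);
  }
  const bytes = new Uint8Array(loaded);
  let offset = 0;
  for (const chunk of chunks) {
    bytes.set(chunk, offset);
    offset += chunk.length;
  }

  // Persist to OPFS so the next load can skip the network entirely.
  const handle = await dir.getFileHandle(key, { create: true });
  const writable = await handle.createWritable();
  await writable.write(bytes);
  await writable.close();

  return bytes;
}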

Sequence Diagram

sequenceDiagram
    participant BrowserAI
    participant FlareEngineWrapper as FlareEngineWrapper
    participant Network as Network/OPFS
    participant FlareWASM as Flare WASM Module
    participant WebGPU as WebGPU/CPU

    BrowserAI->>FlareEngineWrapper: loadModel(config, options)
    activate FlareEngineWrapper
    FlareEngineWrapper->>Network: Check OPFS cache (URL key)
    alt Cache Hit
        Network-->>FlareEngineWrapper: Return cached bytes
    else Cache Miss
        FlareEngineWrapper->>Network: Fetch GGUF from URL (streaming with progress)
        Network-->>FlareEngineWrapper: Stream bytes
        FlareEngineWrapper->>Network: Write bytes to OPFS cache (async)
    end
    FlareEngineWrapper->>FlareWASM: init() & load(bytes)
    FlareEngineWrapper->>WebGPU: init_gpu()
    alt GPU Available
        WebGPU-->>FlareEngineWrapper: GPU initialized
    else GPU Unavailable
        WebGPU-->>FlareEngineWrapper: Fallback to CPU
    end
    FlareEngineWrapper-->>BrowserAI: Model loaded
    deactivate FlareEngineWrapper

    BrowserAI->>FlareEngineWrapper: generateText(prompt, options)
    activate FlareEngineWrapper
    FlareEngineWrapper->>FlareWASM: format prompt & encode
    FlareEngineWrapper->>FlareWASM: begin_stream_with_params / batch_generate
    loop token generation (stream)
        FlareWASM-->>FlareEngineWrapper: next_token()
        FlareEngineWrapper-->>BrowserAI: onToken callback
    end
    FlareEngineWrapper-->>BrowserAI: OpenAI-compatible completion + usage
    deactivate FlareEngineWrapper
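
The streaming leg of the diagram (begin_stream_with_params followed by repeated next_token calls) could be driven from the wrapper roughly as follows; the method names come from the diagram and review comments in this PR, but their signatures are assumed:

// Sketch of the per-token streaming loop; `engine` is the initialized Flare WASM object.
// begin_stream_with_params and next_token appear in the diagram above, but their exact
// signatures are assumptions, not the published @aspect/flare API.
function streamTokens(
  engine: any,
  prompt: string,
  maxTokens: number,
  onToken?: (tok: string) => void,
): string {
  let text = '';
  engine.begin_stream_with_params(prompt, maxTokens); // assumed signature
  for (let i = 0; i < maxTokens; i++) {
    const tok = engine.next_token();
    if (tok == null) break; // engine signals end of stream
    text += tok;
    onToken?.(tok); // surfaced as the wrapper's onToken callback
  }
  return text;
}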

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related issues

  • #295 — Implements the FlareEngine adapter and BrowserAI integration as requested (direct match).
  • #294 — Depends on the @aspect/flare WASM package used by the new wrapper (dependency noted).
  • #300 — Implements progressive loading (loadModelProgressive / onLayersReady) matching that issue's objective.

Possibly related PRs

  • #301 — Appears to implement the same Flare WASM integration, model registry, types, and BrowserAI wiring; likely a direct duplicate or sibling PR.

Poem

🐰 I hopped into bytes and GGUF streams,
I cached the models and chased GPU dreams,
Tokens trickle out in a streaming rhyme,
Adapters merge and the rabbit keeps time,
Flare lights the path for inference prime.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
  • Description Check (✅ Passed): Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check (✅ Passed): The title 'feat: add Flare WASM inference engine integration' clearly and concisely summarizes the main change, accurately reflecting the addition of Flare as a new inference engine backend.
  • Linked Issues Check (✅ Passed): The PR successfully implements all core coding objectives from #295: FlareEngineWrapper class with loadModel/generateText/loadAdapter methods, OpenAI-compatible responses, download progress and OPFS caching, streaming generation with onToken callbacks, config types, and full BrowserAI integration.
  • Out of Scope Changes Check (✅ Passed): All changes are directly related to Flare engine integration: new engine wrapper, config types, model registry, and public API exports. No unrelated modifications detected.
  • Docstring Coverage (✅ Passed): Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%.


coderabbitai Bot left a comment

Actionable comments posted: 3

🧹 Nitpick comments (3)
src/core/llm/index.ts (1)

267-270: Inconsistent error handling: isFlareModelCached silently returns false while peer methods throw.

loadAdapter (line 258) and clearFlareModelCache (line 276) throw errors when the engine is not FlareEngineWrapper, but isFlareModelCached silently returns false. This inconsistency may confuse consumers who expect uniform behavior.

Consider aligning the behavior — either throw for all three methods, or return a default/sentinel value for all.

Option A: Throw consistently
   async isFlareModelCached(): Promise<boolean> {
-    if (!(this.engine instanceof FlareEngineWrapper)) return false;
+    if (!(this.engine instanceof FlareEngineWrapper)) {
+      throw new Error('isFlareModelCached is only supported with the Flare engine.');
+    }
     return this.engine.isCached();
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/llm/index.ts` around lines 267 - 270, isFlareModelCached currently
returns false when this.engine is not a FlareEngineWrapper while sibling methods
loadAdapter and clearFlareModelCache throw; make behavior consistent by throwing
the same error as those methods: in isFlareModelCached, check if (this.engine
instanceof FlareEngineWrapper) and if not throw the same Error/message used by
loadAdapter/clearFlareModelCache, otherwise return this.engine.isCached();
reference isFlareModelCached, loadAdapter, clearFlareModelCache, and
FlareEngineWrapper to locate the checks and align the error handling.
src/engines/flare-engine-wrapper.ts (2)

156-167: Empty catch blocks silently swallow errors.

The OPFS helper functions (lines 164, 193, 206, 223) use empty catch blocks, making debugging difficult when OPFS operations fail unexpectedly. Consider logging at debug level or using a more specific error type check.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/engines/flare-engine-wrapper.ts` around lines 156 - 167, The OPFS helpers
(e.g., readFromOpfs) currently swallow errors in empty catch blocks; update each
catch to capture the error (e.g., catch (err)) and log contextual debug
information (include the function name like readFromOpfs, the cacheKey, and the
error object) instead of silently returning null — use console.debug or your
ambient logger to record the error and then return null (or rethrow for
unexpected error types) so failures are visible while preserving existing
fallback behavior.

422-426: Unused parameter _options flagged by linter.

The pipeline reports an unused variable warning. Since this is an intentional interface stub, prefix with underscore consistently or remove the parameter entirely.

🧹 Proposed fix
-  async embed(_input: string, _options: Record<string, unknown> = {}): Promise<unknown> {
+  async embed(_input: string): Promise<unknown> {
     throw new Error(
       '[Flare] Embedding is not supported. Use a Transformers.js feature-extraction model instead.',
     );
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/engines/flare-engine-wrapper.ts` around lines 422 - 426, The linter flags
the unused `_options` parameter in the embed method; update the embed function
signature (embed) to remove the unused `_options` parameter (or make it an
optional rest/unused parameter consistent with project lint rules) so the stub
no longer contains an unused variable, and ensure any callers or interface
declarations are adjusted to match the new signature.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2791ef6d-d861-49f0-865d-f7dd17140e2a

📥 Commits

Reviewing files that changed from the base of the PR and between d48fa13 and 5e6f1f1.

📒 Files selected for processing (5)
  • src/config/models/flare-models.json
  • src/config/models/types.ts
  • src/core/llm/index.ts
  • src/engines/flare-engine-wrapper.ts
  • src/index.ts

Comment on lines +214 to +226
export async function listCachedModels(): Promise<string[]> {
  try {
    const root = await navigator.storage.getDirectory();
    const dir = await root.getDirectoryHandle(OPFS_CACHE_DIR, { create: false });
    const keys: string[] = [];
    for await (const [name] of (dir as unknown as AsyncIterable<[string, FileSystemHandle]>)) {
      keys.push(name);
    }
    return keys;
  } catch {
    return [];
  }
}

⚠️ Potential issue | 🟡 Minor

Incorrect type for OPFS directory iteration.

FileSystemDirectoryHandle.entries() returns AsyncIterableIterator<[string, FileSystemHandle]>, but the code casts dir directly to AsyncIterable<[string, FileSystemHandle]>. The correct approach is to call dir.entries().

🛠️ Proposed fix
 export async function listCachedModels(): Promise<string[]> {
   try {
     const root = await navigator.storage.getDirectory();
     const dir = await root.getDirectoryHandle(OPFS_CACHE_DIR, { create: false });
     const keys: string[] = [];
-    for await (const [name] of (dir as unknown as AsyncIterable<[string, FileSystemHandle]>)) {
+    for await (const [name] of dir.entries()) {
       keys.push(name);
     }
     return keys;
   } catch {
     return [];
   }
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/engines/flare-engine-wrapper.ts` around lines 214 - 226, In
listCachedModels, the OPFS directory handle is being iterated incorrectly by
casting dir to AsyncIterable; call dir.entries() and iterate that
AsyncIterableIterator instead. Update the loop in listCachedModels to use for
await (const [name] of dir.entries()) (referencing the OPFS_CACHE_DIR and the
listCachedModels function) and keep the try/catch behavior the same so it
returns the collected keys or an empty array on error.

Comment on lines +400 to +416
    } else {
      // Batch path — generate_text_with_params is synchronous inside WASM
      outputText = this.engine.generate_text_with_params(
        formattedPrompt,
        maxTokens,
        temperature,
        topP,
        topK,
        repeatPenalty,
        minP,
      );
      completionTokens = outputText.length; // approximate
    }

    const stopReason = this.engine.stream_stop_reason || 'stop';

    return buildChatResponse(outputText, promptTokens.length, completionTokens, stopReason);

⚠️ Potential issue | 🟠 Major

Inaccurate completionTokens count in non-streaming path.

Line 411 sets completionTokens = outputText.length, which counts characters, not tokens. This makes the usage object in the response misleading, especially for multibyte characters or subword tokenization where token count ≠ character count.

Consider encoding the output to get accurate token count, or clearly document the approximation:

🛠️ Proposed fix for accurate token counting
     } else {
       // Batch path — generate_text_with_params is synchronous inside WASM
       outputText = this.engine.generate_text_with_params(
         formattedPrompt,
         maxTokens,
         temperature,
         topP,
         topK,
         repeatPenalty,
         minP,
       );
-      completionTokens = outputText.length; // approximate
+      // Encode output to get accurate token count
+      completionTokens = this.engine.encode_text(outputText).length;
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/engines/flare-engine-wrapper.ts` around lines 400 - 416, The code
currently sets completionTokens = outputText.length (characters), which is
incorrect; replace that with a real token count by running the model tokenizer
over the generated text (e.g., use this.engine.encode(outputText) or
this.engine.tokenize(outputText) if such a method exists) and set
completionTokens to the length of that token array; if the engine has a shared
tokenizer used earlier for prompt tokens (how promptTokens was computed), reuse
that same encode/tokenize function to compute completionTokens before calling
buildChatResponse(outputText, promptTokens.length, completionTokens,
stopReason).

      completionTokens = outputText.length; // approximate
    }

    const stopReason = this.engine.stream_stop_reason || 'stop';

⚠️ Potential issue | 🟡 Minor

stream_stop_reason may be stale in non-streaming path.

In the non-streaming branch (lines 401-412), generate_text_with_params is called, but stream_stop_reason is a property from the streaming API. This value may be undefined or hold a stale value from a previous streaming call.

Consider checking if stream_stop_reason is valid after batch generation, or default to 'stop' for the non-streaming path.

🛠️ Proposed fix
-    const stopReason = this.engine.stream_stop_reason || 'stop';
+    // stream_stop_reason is only valid after streaming; default to 'stop' for batch generation
+    const stopReason = onToken ? (this.engine.stream_stop_reason || 'stop') : 'stop';
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/engines/flare-engine-wrapper.ts` at line 414, The code reads
this.engine.stream_stop_reason (used to set stopReason) but that property is
from the streaming API and may be stale after calling generate_text_with_params
in the non-streaming branch; update the non-streaming path around
generate_text_with_params to determine stopReason from the batch response (or
explicitly default to 'stop' when the response doesn't include a
stream_stop_reason), e.g., after calling generate_text_with_params inspect the
returned result for a stream_stop_reason/stop_reason field and set stopReason
from that or fall back to 'stop' instead of always reading
this.engine.stream_stop_reason.

Integrates Flare (pure Rust to WASM, standard GGUF) as a third BrowserAI
engine backend alongside MLC WebLLM and Transformers.js.

- Add FlareConfig type and update ModelConfig union
- Add flare-models.json with 6 GGUF registry entries (SmolLM2-135M/360M, Qwen2.5-0.5B, Llama-3.2-1B)
- Add FlareEngineWrapper with OPFS caching, streaming, LoRA support, and progressive loading hook
- Wire Flare into BrowserAI class with loadAdapter, isFlareModelCached, clearFlareModelCache
- Export FlareEngineWrapper, flareModels, and OPFS helpers from main index

Closes #295, #296, #297, #298, #300. Part of #293.
sauravpanda force-pushed the worktree-feat+flare-engine-integration branch from 5e6f1f1 to 2b1fd5c on April 16, 2026 at 00:38
sauravpanda merged commit 0b573ca into main on Apr 16, 2026 (9 of 10 checks passed)
sauravpanda deleted the worktree-feat+flare-engine-integration branch on April 16, 2026 at 00:38
sauravpanda added a commit that referenced this pull request Apr 16, 2026
The Flare engine integration (#301) landed with zero tests. This adds
two test files that together bring the project from 35 → 62 passing
tests (+27):

**src/engines/flare-engine-wrapper.test.ts (11 tests)**
- Constructor initializes to a clean unloaded state
- generateText / loadAdapter before loadModel throws helpful errors
- embed() always throws (Flare is text-generation only)
- isCached / clearCache / dispose are safe on unloaded engines
- loadModel throws with a clear install instruction when
  @aspect/flare isn't installed (dynamic import fails cleanly)
- flare-models.json shape validation + guarding against cross-registry
  key collisions with Demucs/MLC

**src/core/llm/browserai.test.ts (13 tests)**
- BrowserAI construction + guard rails (generateText, embed,
  separateAudio, transcribeAudio, loadAdapter, isFlareModelCached,
  clearFlareModelCache all throw sensible errors before loadModel)
- loadModel with unknown identifier throws "not recognized"
- registerCustomModel accepts a DemucsConfig
- dispose is idempotent
- Cross-registry sanity check: Demucs and Flare keys don't collide
  with each other or with MLC/Transformers. If demucs-models.json
  ever shadows an mlc-models.json key, the engine-selection logic in
  loadModel would silently pick the wrong one, so it's worth asserting
  at test time
- Verifies every demucs/flare config has the expected `engine` field

The BrowserAI tests mock all four engine wrappers (MLC, Transformers,
Demucs, Flare) because they transitively use `import.meta.url`, which
jest's default CJS runtime can't parse. The stubs just give us an
API-compatible surface for assertions that never actually invoke
engine methods.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
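
The mocking approach described above might look roughly like this in browserai.test.ts; the module path and stub surface are assumptions based on the file names in this PR, not the actual test code:

// Hypothetical sketch: stub the engine wrapper so jest's CJS runtime never
// evaluates import.meta.url inside the real module.
jest.mock('../../engines/flare-engine-wrapper', () => ({
  FlareEngineWrapper: class {
    // API-compatible surface only; never exercised by the guard-rail tests.
    async loadModel(): Promise<void> {}
    async isCached(): Promise<boolean> { return false; }
  },
}));

import { BrowserAI } from './index';

test('loadAdapter throws a sensible error before loadModel', async () => {
  const ai = new BrowserAI();
  await expect(
    ai.loadAdapter({ url: 'https://example.com/adapter.safetensors' }),
  ).rejects.toThrow();
});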

Development

Successfully merging this pull request may close these issues: Implement FlareEngine adapter for BrowserAI engine interface.