feat: add Flare WASM inference engine integration (#301)
Conversation
⚠️ Caution: Review failed. The pull request is closed.
📝 Walkthrough
Adds Flare WASM engine support as a third inference backend: new FlareEngineWrapper with OPFS caching, GGUF model registry (six models), Flare-specific types (FlareConfig), BrowserAI integration and Flare-specific public APIs (adapter loading, cache inspection/clearing), and new exports.
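For orientation, here is a hypothetical sketch of what the new FlareConfig shape might look like, inferred from this walkthrough; the field names below are assumptions, not the actual definition in src/config/models/types.ts:

```ts
// Hypothetical FlareConfig shape: field names are guesses based on the behavior
// described in this PR (GGUF URL used as the OPFS cache key, engine selection,
// and the sampling parameters passed to the WASM generator).
interface FlareConfig {
  engine: 'flare';        // routes BrowserAI.loadModel() to FlareEngineWrapper
  modelUrl: string;       // GGUF file URL; also serves as the OPFS cache key
  contextLength?: number;
  defaultParams?: {
    temperature?: number;
    topP?: number;
    topK?: number;
    repeatPenalty?: number;
    minP?: number;
  };
}
```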
Sequence Diagram
sequenceDiagram
participant BrowserAI
participant FlareEngineWrapper as FlareEngineWrapper
participant Network as Network/OPFS
participant FlareWASM as Flare WASM Module
participant WebGPU as WebGPU/CPU
BrowserAI->>FlareEngineWrapper: loadModel(config, options)
activate FlareEngineWrapper
FlareEngineWrapper->>Network: Check OPFS cache (URL key)
alt Cache Hit
Network-->>FlareEngineWrapper: Return cached bytes
else Cache Miss
FlareEngineWrapper->>Network: Fetch GGUF from URL (streaming with progress)
Network-->>FlareEngineWrapper: Stream bytes
FlareEngineWrapper->>Network: Write bytes to OPFS cache (async)
end
FlareEngineWrapper->>FlareWASM: init() & load(bytes)
FlareEngineWrapper->>WebGPU: init_gpu()
alt GPU Available
WebGPU-->>FlareEngineWrapper: GPU initialized
else GPU Unavailable
WebGPU-->>FlareEngineWrapper: Fallback to CPU
end
deactivate FlareEngineWrapper
FlareEngineWrapper-->>BrowserAI: Model loaded
BrowserAI->>FlareEngineWrapper: generateText(prompt, options)
activate FlareEngineWrapper
FlareEngineWrapper->>FlareWASM: format prompt & encode
FlareEngineWrapper->>FlareWASM: begin_stream_with_params / batch_generate
loop token generation (stream)
FlareWASM-->>FlareEngineWrapper: next_token()
FlareEngineWrapper-->>BrowserAI: onToken callback
end
FlareEngineWrapper-->>BrowserAI: OpenAI-compatible completion + usage
deactivate FlareEngineWrapper
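In code, the load path in the diagram is essentially a cache-or-fetch step before the bytes are handed to the WASM module. A minimal sketch, assuming placeholder OPFS helpers (the wrapper's real function names and signatures may differ):

```ts
// Sketch of the "check OPFS cache, else stream-download and write back" flow above.
// readFromOpfs / writeToOpfs are placeholders, not the wrapper's actual helpers.
declare function readFromOpfs(key: string): Promise<ArrayBuffer | null>;
declare function writeToOpfs(key: string, bytes: Uint8Array): Promise<void>;

async function loadGgufBytes(
  url: string,
  onProgress?: (loaded: number, total: number) => void,
): Promise<Uint8Array> {
  const cached = await readFromOpfs(url); // cache is keyed by the model URL
  if (cached) return new Uint8Array(cached);

  const res = await fetch(url);
  const total = Number(res.headers.get('content-length') ?? 0);
  const reader = res.body!.getReader();
  const chunks: Uint8Array[] = [];
  let loaded = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
    loaded += value.length;
    onProgress?.(loaded, total);
  }

  const bytes = new Uint8Array(loaded);
  let offset = 0;
  for (const chunk of chunks) {
    bytes.set(chunk, offset);
    offset += chunk.length;
  }
  void writeToOpfs(url, bytes); // async write-back so repeat loads hit the cache
  return bytes;
}
```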
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Actionable comments posted: 3
🧹 Nitpick comments (3)
src/core/llm/index.ts (1)
267-270: Inconsistent error handling: `isFlareModelCached` silently returns `false` while peer methods throw.
`loadAdapter` (line 258) and `clearFlareModelCache` (line 276) throw errors when the engine is not `FlareEngineWrapper`, but `isFlareModelCached` silently returns `false`. This inconsistency may confuse consumers who expect uniform behavior. Consider aligning the behavior: either throw for all three methods, or return a default/sentinel value for all.
Option A: Throw consistently
  async isFlareModelCached(): Promise<boolean> {
-   if (!(this.engine instanceof FlareEngineWrapper)) return false;
+   if (!(this.engine instanceof FlareEngineWrapper)) {
+     throw new Error('isFlareModelCached is only supported with the Flare engine.');
+   }
    return this.engine.isCached();
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/llm/index.ts` around lines 267 - 270, isFlareModelCached currently returns false when this.engine is not a FlareEngineWrapper while sibling methods loadAdapter and clearFlareModelCache throw; make behavior consistent by throwing the same error as those methods: in isFlareModelCached, check if (this.engine instanceof FlareEngineWrapper) and if not throw the same Error/message used by loadAdapter/clearFlareModelCache, otherwise return this.engine.isCached(); reference isFlareModelCached, loadAdapter, clearFlareModelCache, and FlareEngineWrapper to locate the checks and align the error handling.
src/engines/flare-engine-wrapper.ts (2)
156-167: Empty catch blocks silently swallow errors.
The OPFS helper functions (lines 164, 193, 206, 223) use empty `catch` blocks, making debugging difficult when OPFS operations fail unexpectedly. Consider logging at debug level or using a more specific error type check.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/engines/flare-engine-wrapper.ts` around lines 156 - 167, The OPFS helpers (e.g., readFromOpfs) currently swallow errors in empty catch blocks; update each catch to capture the error (e.g., catch (err)) and log contextual debug information (include the function name like readFromOpfs, the cacheKey, and the error object) instead of silently returning null — use console.debug or your ambient logger to record the error and then return null (or rethrow for unexpected error types) so failures are visible while preserving existing fallback behavior.
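A minimal sketch of the debug-logging pattern this comment asks for, assuming one representative helper; the directory name and signature here are illustrative, not the wrapper's exact code:

```ts
// Illustrative only: the real helper in flare-engine-wrapper.ts may differ.
const OPFS_CACHE_DIR = 'flare-cache'; // assumed value

async function readFromOpfs(cacheKey: string): Promise<ArrayBuffer | null> {
  try {
    const root = await navigator.storage.getDirectory();
    const dir = await root.getDirectoryHandle(OPFS_CACHE_DIR, { create: false });
    const file = await (await dir.getFileHandle(cacheKey, { create: false })).getFile();
    return await file.arrayBuffer();
  } catch (err) {
    // Log instead of silently swallowing, then keep the existing fallback behavior.
    console.debug(`[Flare] readFromOpfs failed for "${cacheKey}":`, err);
    return null;
  }
}
```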
422-426: Unused parameter `_options` flagged by linter.
The pipeline reports an unused variable warning. Since this is an intentional interface stub, prefix with underscore consistently or remove the parameter entirely.
🧹 Proposed fix
- async embed(_input: string, _options: Record<string, unknown> = {}): Promise<unknown> {
+ async embed(_input: string): Promise<unknown> {
    throw new Error(
      '[Flare] Embedding is not supported. Use a Transformers.js feature-extraction model instead.',
    );
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/engines/flare-engine-wrapper.ts` around lines 422 - 426, The linter flags the unused `_options` parameter in the embed method; update the embed function signature (embed) to remove the unused `_options` parameter (or make it an optional rest/unused parameter consistent with project lint rules) so the stub no longer contains an unused variable, and ensure any callers or interface declarations are adjusted to match the new signature.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2791ef6d-d861-49f0-865d-f7dd17140e2a
📒 Files selected for processing (5)
- src/config/models/flare-models.json
- src/config/models/types.ts
- src/core/llm/index.ts
- src/engines/flare-engine-wrapper.ts
- src/index.ts
export async function listCachedModels(): Promise<string[]> {
  try {
    const root = await navigator.storage.getDirectory();
    const dir = await root.getDirectoryHandle(OPFS_CACHE_DIR, { create: false });
    const keys: string[] = [];
    for await (const [name] of (dir as unknown as AsyncIterable<[string, FileSystemHandle]>)) {
      keys.push(name);
    }
    return keys;
  } catch {
    return [];
  }
}
Incorrect type for OPFS directory iteration.
FileSystemDirectoryHandle.entries() returns AsyncIterableIterator<[string, FileSystemHandle]>, but the code casts dir directly to AsyncIterable<[string, FileSystemHandle]>. The correct approach is to call dir.entries().
🛠️ Proposed fix
export async function listCachedModels(): Promise<string[]> {
try {
const root = await navigator.storage.getDirectory();
const dir = await root.getDirectoryHandle(OPFS_CACHE_DIR, { create: false });
const keys: string[] = [];
- for await (const [name] of (dir as unknown as AsyncIterable<[string, FileSystemHandle]>)) {
+ for await (const [name] of dir.entries()) {
keys.push(name);
}
return keys;
} catch {
return [];
}
}
📝 Committable suggestion
‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
export async function listCachedModels(): Promise<string[]> {
  try {
    const root = await navigator.storage.getDirectory();
    const dir = await root.getDirectoryHandle(OPFS_CACHE_DIR, { create: false });
    const keys: string[] = [];
    for await (const [name] of dir.entries()) {
      keys.push(name);
    }
    return keys;
  } catch {
    return [];
  }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/engines/flare-engine-wrapper.ts` around lines 214 - 226, In
listCachedModels, the OPFS directory handle is being iterated incorrectly by
casting dir to AsyncIterable; call dir.entries() and iterate that
AsyncIterableIterator instead. Update the loop in listCachedModels to use for
await (const [name] of dir.entries()) (referencing the OPFS_CACHE_DIR and the
listCachedModels function) and keep the try/catch behavior the same so it
returns the collected keys or an empty array on error.
} else {
  // Batch path — generate_text_with_params is synchronous inside WASM
  outputText = this.engine.generate_text_with_params(
    formattedPrompt,
    maxTokens,
    temperature,
    topP,
    topK,
    repeatPenalty,
    minP,
  );
  completionTokens = outputText.length; // approximate
}

const stopReason = this.engine.stream_stop_reason || 'stop';

return buildChatResponse(outputText, promptTokens.length, completionTokens, stopReason);
Inaccurate completionTokens count in non-streaming path.
Line 411 sets completionTokens = outputText.length, which counts characters, not tokens. This makes the usage object in the response misleading, especially for multibyte characters or subword tokenization where token count ≠ character count.
Consider encoding the output to get accurate token count, or clearly document the approximation:
🛠️ Proposed fix for accurate token counting
} else {
// Batch path — generate_text_with_params is synchronous inside WASM
outputText = this.engine.generate_text_with_params(
formattedPrompt,
maxTokens,
temperature,
topP,
topK,
repeatPenalty,
minP,
);
- completionTokens = outputText.length; // approximate
+ // Encode output to get accurate token count
+ completionTokens = this.engine.encode_text(outputText).length;
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/engines/flare-engine-wrapper.ts` around lines 400 - 416, The code
currently sets completionTokens = outputText.length (characters), which is
incorrect; replace that with a real token count by running the model tokenizer
over the generated text (e.g., use this.engine.encode(outputText) or
this.engine.tokenize(outputText) if such a method exists) and set
completionTokens to the length of that token array; if the engine has a shared
tokenizer used earlier for prompt tokens (how promptTokens was computed), reuse
that same encode/tokenize function to compute completionTokens before calling
buildChatResponse(outputText, promptTokens.length, completionTokens,
stopReason).
  completionTokens = outputText.length; // approximate
}

const stopReason = this.engine.stream_stop_reason || 'stop';
stream_stop_reason may be stale in non-streaming path.
In the non-streaming branch (lines 401-412), generate_text_with_params is called, but stream_stop_reason is a property from the streaming API. This value may be undefined or hold a stale value from a previous streaming call.
Consider checking if stream_stop_reason is valid after batch generation, or default to 'stop' for the non-streaming path.
🛠️ Proposed fix
- const stopReason = this.engine.stream_stop_reason || 'stop';
+ // stream_stop_reason is only valid after streaming; default to 'stop' for batch generation
+ const stopReason = onToken ? (this.engine.stream_stop_reason || 'stop') : 'stop';
📝 Committable suggestion
‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// stream_stop_reason is only valid after streaming; default to 'stop' for batch generation
const stopReason = onToken ? (this.engine.stream_stop_reason || 'stop') : 'stop';
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/engines/flare-engine-wrapper.ts` at line 414, The code reads
this.engine.stream_stop_reason (used to set stopReason) but that property is
from the streaming API and may be stale after calling generate_text_with_params
in the non-streaming branch; update the non-streaming path around
generate_text_with_params to determine stopReason from the batch response (or
explicitly default to 'stop' when the response doesn't include a
stream_stop_reason), e.g., after calling generate_text_with_params inspect the
returned result for a stream_stop_reason/stop_reason field and set stopReason
from that or fall back to 'stop' instead of always reading
this.engine.stream_stop_reason.
Integrates Flare (pure Rust to WASM, standard GGUF) as a third BrowserAI engine backend alongside MLC WebLLM and Transformers.js.

- Add FlareConfig type and update ModelConfig union
- Add flare-models.json with 6 GGUF registry entries (SmolLM2-135M/360M, Qwen2.5-0.5B, Llama-3.2-1B)
- Add FlareEngineWrapper with OPFS caching, streaming, LoRA support, and progressive loading hook
- Wire Flare into BrowserAI class with loadAdapter, isFlareModelCached, clearFlareModelCache
- Export FlareEngineWrapper, flareModels, and OPFS helpers from main index

Closes #295, #296, #297, #298, #300. Part of #293.
Force-pushed 5e6f1f1 to 2b1fd5c (Compare)
The Flare engine integration (#301) landed with zero tests. This adds two test files that together bring the project from 35 → 62 passing tests (+27):

**src/engines/flare-engine-wrapper.test.ts (11 tests)**
- Constructor initializes to a clean unloaded state
- generateText / loadAdapter before loadModel throws helpful errors
- embed() always throws (Flare is text-generation only)
- isCached / clearCache / dispose are safe on unloaded engines
- loadModel throws with a clear install instruction when @aspect/flare isn't installed (dynamic import fails cleanly)
- flare-models.json shape validation + guarding against cross-registry key collisions with Demucs/MLC

**src/core/llm/browserai.test.ts (13 tests)**
- BrowserAI construction + guard rails (generateText, embed, separateAudio, transcribeAudio, loadAdapter, isFlareModelCached, clearFlareModelCache all throw sensible errors before loadModel)
- loadModel with unknown identifier throws "not recognized"
- registerCustomModel accepts a DemucsConfig
- dispose is idempotent
- Cross-registry sanity check: Demucs and Flare keys don't collide with each other or with MLC/Transformers. If demucs-models.json ever shadows an mlc-models.json key, the engine-selection logic in loadModel would silently pick the wrong one, so it's worth asserting at test time
- Verifies every demucs/flare config has the expected `engine` field

The BrowserAI tests mock all four engine wrappers (MLC, Transformers, Demucs, Flare) because they transitively use `import.meta.url`, which jest's default CJS runtime can't parse. The stubs just give us an API-compatible surface for assertions that never actually invoke engine methods.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
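A hedged sketch of the mocking pattern described in that commit message; the module path and stubbed method list are assumptions, not the test file's exact contents:

```ts
// browserai.test.ts (sketch): stub the Flare wrapper so importing BrowserAI never
// evaluates engine code that relies on import.meta.url under jest's CJS runtime.
jest.mock('../../engines/flare-engine-wrapper', () => ({
  FlareEngineWrapper: jest.fn().mockImplementation(() => ({
    loadModel: jest.fn(),
    generateText: jest.fn(),
    loadAdapter: jest.fn(),
    isCached: jest.fn().mockResolvedValue(false),
    clearCache: jest.fn(),
    dispose: jest.fn(),
  })),
}));
// The MLC, Transformers.js, and Demucs wrappers get equivalent API-compatible stubs.
```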
Summary
Integrates Flare LLM as a third inference engine backend for BrowserAI, alongside MLC WebLLM and Transformers.js. Flare is a pure Rust → WASM engine that runs standard GGUF files directly — no TVM compilation step needed.
Closes #295, #296, #297, #298, #300. Part of #293.
What is in this PR
New files:
- `src/engines/flare-engine-wrapper.ts` — `FlareEngineWrapper` implementing the BrowserAI engine interface with:
  - `loadModel()` — fetches GGUF with download progress + OPFS caching for instant repeat loads
  - `generateText()` — OpenAI-compatible response, optional per-token `onToken` streaming callback
  - `loadAdapter()` — SafeTensors LoRA merging via `merge_lora` / `merge_lora_with_alpha`
  - `loadModelProgressive()` — API hook for future layer-streaming inference
  - `isModelCached`, `deleteCachedModel`, `listCachedModels`
- `src/config/models/flare-models.json` — 6 GGUF model registry entries

Modified files:

- `src/config/models/types.ts` — adds `FlareConfig` interface, updates `ModelConfig` union
- `src/core/llm/index.ts` — wires `FlareEngineWrapper` into `loadModel()`, adds `loadAdapter()`, `isFlareModelCached()`, `clearFlareModelCache()` to the `BrowserAI` class
- `src/index.ts` — exports `FlareEngineWrapper`, `flareModels`, and OPFS utility functions

Usage example
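A sketch of intended usage, pieced together from the APIs listed above; the import path, option names, and callback shapes are assumptions rather than confirmed signatures:

```ts
import { BrowserAI } from '@browserai/browserai'; // package name assumed

const ai = new BrowserAI();

// First load streams the GGUF with progress; repeat loads come from the OPFS cache.
await ai.loadModel('smollm2-135m-flare', {
  onProgress: (p: unknown) => console.log('download progress', p),
});

// OpenAI-compatible completion with optional per-token streaming.
let streamed = '';
const response = await ai.generateText('Write a haiku about WebAssembly.', {
  maxTokens: 64,
  temperature: 0.7,
  onToken: (token: string) => { streamed += token; },
});
console.log(streamed, response);

// Flare-specific helpers added in this PR (signatures assumed):
// await ai.loadAdapter('my-adapter.safetensors');  // SafeTensors LoRA merge
console.log(await ai.isFlareModelCached());
await ai.clearFlareModelCache();
```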
Dependency note
`@aspect/flare` (issue #294) is a peer dependency loaded via dynamic `import()`. BrowserAI will continue to work without it — attempting to use a Flare model without the package installed produces a clear error message with install instructions.
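A minimal sketch of that lazy peer-dependency pattern; the error text and module handling here are illustrative, not the wrapper's exact code:

```ts
// Loads @aspect/flare only when a Flare model is requested, and fails with an
// actionable message when the peer dependency is missing.
async function loadFlareModule(): Promise<unknown> {
  try {
    return await import('@aspect/flare');
  } catch {
    throw new Error(
      '[Flare] @aspect/flare is not installed. Run `npm install @aspect/flare` to use Flare GGUF models.',
    );
  }
}
```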
Test plan

- `npm run build`
- `npm test` — 22/22
- `smollm2-135m-flare` in browser once `@aspect/flare` is published (Publish @aspect/flare npm package from flarellm repo #294)

Summary by CodeRabbit