feat: add Flare WASM inference engine integration (#301)
Conversation
⚠️ Caution: Review failed. The pull request is closed.
📝 Walkthrough
Adds Flare WASM engine support as a third inference backend: new FlareEngineWrapper with OPFS caching, GGUF model registry (six models), Flare-specific types (FlareConfig), BrowserAI integration and Flare-specific public APIs (adapter loading, cache inspection/clearing), and new exports.
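For orientation, here is a hypothetical sketch of what the new FlareConfig shape might look like, inferred from this walkthrough; the field names below are assumptions, not the actual definition in src/config/models/types.ts:

```ts
// Hypothetical FlareConfig shape: field names are guesses based on the behavior
// described in this PR (GGUF URL used as the OPFS cache key, engine selection,
// and the sampling parameters passed to the WASM generator).
interface FlareConfig {
  engine: 'flare';        // routes BrowserAI.loadModel() to FlareEngineWrapper
  modelUrl: string;       // GGUF file URL; also serves as the OPFS cache key
  contextLength?: number;
  defaultParams?: {
    temperature?: number;
    topP?: number;
    topK?: number;
    repeatPenalty?: number;
    minP?: number;
  };
}
```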
Sequence Diagram
sequenceDiagram
participant BrowserAI
participant FlareEngineWrapper as FlareEngineWrapper
participant Network as Network/OPFS
participant FlareWASM as Flare WASM Module
participant WebGPU as WebGPU/CPU
BrowserAI->>FlareEngineWrapper: loadModel(config, options)
activate FlareEngineWrapper
FlareEngineWrapper->>Network: Check OPFS cache (URL key)
alt Cache Hit
Network-->>FlareEngineWrapper: Return cached bytes
else Cache Miss
FlareEngineWrapper->>Network: Fetch GGUF from URL (streaming with progress)
Network-->>FlareEngineWrapper: Stream bytes
FlareEngineWrapper->>Network: Write bytes to OPFS cache (async)
end
FlareEngineWrapper->>FlareWASM: init() & load(bytes)
FlareEngineWrapper->>WebGPU: init_gpu()
alt GPU Available
WebGPU-->>FlareEngineWrapper: GPU initialized
else GPU Unavailable
WebGPU-->>FlareEngineWrapper: Fallback to CPU
end
deactivate FlareEngineWrapper
FlareEngineWrapper-->>BrowserAI: Model loaded
BrowserAI->>FlareEngineWrapper: generateText(prompt, options)
activate FlareEngineWrapper
FlareEngineWrapper->>FlareWASM: format prompt & encode
FlareEngineWrapper->>FlareWASM: begin_stream_with_params / batch_generate
loop token generation (stream)
FlareWASM-->>FlareEngineWrapper: next_token()
FlareEngineWrapper-->>BrowserAI: onToken callback
end
FlareEngineWrapper-->>BrowserAI: OpenAI-compatible completion + usage
deactivate FlareEngineWrapper
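In code, the load path in the diagram is essentially a cache-or-fetch step before the bytes are handed to the WASM module. A minimal sketch, assuming placeholder OPFS helpers (the wrapper's real function names and signatures may differ):

```ts
// Sketch of the "check OPFS cache, else stream-download and write back" flow above.
// readFromOpfs / writeToOpfs are placeholders, not the wrapper's actual helpers.
declare function readFromOpfs(key: string): Promise<ArrayBuffer | null>;
declare function writeToOpfs(key: string, bytes: Uint8Array): Promise<void>;

async function loadGgufBytes(
  url: string,
  onProgress?: (loaded: number, total: number) => void,
): Promise<Uint8Array> {
  const cached = await readFromOpfs(url); // cache is keyed by the model URL
  if (cached) return new Uint8Array(cached);

  const res = await fetch(url);
  const total = Number(res.headers.get('content-length') ?? 0);
  const reader = res.body!.getReader();
  const chunks: Uint8Array[] = [];
  let loaded = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks.push(value);
    loaded += value.length;
    onProgress?.(loaded, total);
  }

  const bytes = new Uint8Array(loaded);
  let offset = 0;
  for (const chunk of chunks) {
    bytes.set(chunk, offset);
    offset += chunk.length;
  }
  void writeToOpfs(url, bytes); // async write-back so repeat loads hit the cache
  return bytes;
}
```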
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~60 minutes
🚥 Pre-merge checks: ✅ Passed checks (5 passed)
Actionable comments posted: 3
🧹 Nitpick comments (3)
src/core/llm/index.ts (1)
267-270: Inconsistent error handling: `isFlareModelCached` silently returns `false` while peer methods throw.
`loadAdapter` (line 258) and `clearFlareModelCache` (line 276) throw errors when the engine is not `FlareEngineWrapper`, but `isFlareModelCached` silently returns `false`. This inconsistency may confuse consumers who expect uniform behavior. Consider aligning the behavior: either throw for all three methods, or return a default/sentinel value for all.
Option A: Throw consistently
  async isFlareModelCached(): Promise<boolean> {
-   if (!(this.engine instanceof FlareEngineWrapper)) return false;
+   if (!(this.engine instanceof FlareEngineWrapper)) {
+     throw new Error('isFlareModelCached is only supported with the Flare engine.');
+   }
    return this.engine.isCached();
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core/llm/index.ts` around lines 267 - 270, isFlareModelCached currently returns false when this.engine is not a FlareEngineWrapper while sibling methods loadAdapter and clearFlareModelCache throw; make behavior consistent by throwing the same error as those methods: in isFlareModelCached, check if (this.engine instanceof FlareEngineWrapper) and if not throw the same Error/message used by loadAdapter/clearFlareModelCache, otherwise return this.engine.isCached(); reference isFlareModelCached, loadAdapter, clearFlareModelCache, and FlareEngineWrapper to locate the checks and align the error handling.
src/engines/flare-engine-wrapper.ts (2)
156-167: Empty catch blocks silently swallow errors.
The OPFS helper functions (lines 164, 193, 206, 223) use empty `catch` blocks, making debugging difficult when OPFS operations fail unexpectedly. Consider logging at debug level or using a more specific error type check.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/engines/flare-engine-wrapper.ts` around lines 156 - 167, The OPFS helpers (e.g., readFromOpfs) currently swallow errors in empty catch blocks; update each catch to capture the error (e.g., catch (err)) and log contextual debug information (include the function name like readFromOpfs, the cacheKey, and the error object) instead of silently returning null — use console.debug or your ambient logger to record the error and then return null (or rethrow for unexpected error types) so failures are visible while preserving existing fallback behavior.
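A minimal sketch of the debug-logging pattern this comment asks for, assuming one representative helper; the directory name and signature here are illustrative, not the wrapper's exact code:

```ts
// Illustrative only: the real helper in flare-engine-wrapper.ts may differ.
const OPFS_CACHE_DIR = 'flare-cache'; // assumed value

async function readFromOpfs(cacheKey: string): Promise<ArrayBuffer | null> {
  try {
    const root = await navigator.storage.getDirectory();
    const dir = await root.getDirectoryHandle(OPFS_CACHE_DIR, { create: false });
    const file = await (await dir.getFileHandle(cacheKey, { create: false })).getFile();
    return await file.arrayBuffer();
  } catch (err) {
    // Log instead of silently swallowing, then keep the existing fallback behavior.
    console.debug(`[Flare] readFromOpfs failed for "${cacheKey}":`, err);
    return null;
  }
}
```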
422-426: Unused parameter `_options` flagged by linter.
The pipeline reports an unused variable warning. Since this is an intentional interface stub, prefix with underscore consistently or remove the parameter entirely.
🧹 Proposed fix
- async embed(_input: string, _options: Record<string, unknown> = {}): Promise<unknown> {
+ async embed(_input: string): Promise<unknown> {
    throw new Error(
      '[Flare] Embedding is not supported. Use a Transformers.js feature-extraction model instead.',
    );
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/engines/flare-engine-wrapper.ts` around lines 422 - 426, The linter flags the unused `_options` parameter in the embed method; update the embed function signature (embed) to remove the unused `_options` parameter (or make it an optional rest/unused parameter consistent with project lint rules) so the stub no longer contains an unused variable, and ensure any callers or interface declarations are adjusted to match the new signature.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 2791ef6d-d861-49f0-865d-f7dd17140e2a
📒 Files selected for processing (5)
- src/config/models/flare-models.json
- src/config/models/types.ts
- src/core/llm/index.ts
- src/engines/flare-engine-wrapper.ts
- src/index.ts
export async function listCachedModels(): Promise<string[]> {
  try {
    const root = await navigator.storage.getDirectory();
    const dir = await root.getDirectoryHandle(OPFS_CACHE_DIR, { create: false });
    const keys: string[] = [];
    for await (const [name] of (dir as unknown as AsyncIterable<[string, FileSystemHandle]>)) {
      keys.push(name);
    }
    return keys;
  } catch {
    return [];
  }
}
Incorrect type for OPFS directory iteration.
FileSystemDirectoryHandle.entries() returns AsyncIterableIterator<[string, FileSystemHandle]>, but the code casts dir directly to AsyncIterable<[string, FileSystemHandle]>. The correct approach is to call dir.entries().
🛠️ Proposed fix
export async function listCachedModels(): Promise<string[]> {
try {
const root = await navigator.storage.getDirectory();
const dir = await root.getDirectoryHandle(OPFS_CACHE_DIR, { create: false });
const keys: string[] = [];
- for await (const [name] of (dir as unknown as AsyncIterable<[string, FileSystemHandle]>)) {
+ for await (const [name] of dir.entries()) {
keys.push(name);
}
return keys;
} catch {
return [];
}
}
📝 Committable suggestion
‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
export async function listCachedModels(): Promise<string[]> {
  try {
    const root = await navigator.storage.getDirectory();
    const dir = await root.getDirectoryHandle(OPFS_CACHE_DIR, { create: false });
    const keys: string[] = [];
    for await (const [name] of dir.entries()) {
      keys.push(name);
    }
    return keys;
  } catch {
    return [];
  }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/engines/flare-engine-wrapper.ts` around lines 214 - 226, In
listCachedModels, the OPFS directory handle is being iterated incorrectly by
casting dir to AsyncIterable; call dir.entries() and iterate that
AsyncIterableIterator instead. Update the loop in listCachedModels to use for
await (const [name] of dir.entries()) (referencing the OPFS_CACHE_DIR and the
listCachedModels function) and keep the try/catch behavior the same so it
returns the collected keys or an empty array on error.
} else {
  // Batch path — generate_text_with_params is synchronous inside WASM
  outputText = this.engine.generate_text_with_params(
    formattedPrompt,
    maxTokens,
    temperature,
    topP,
    topK,
    repeatPenalty,
    minP,
  );
  completionTokens = outputText.length; // approximate
}

const stopReason = this.engine.stream_stop_reason || 'stop';

return buildChatResponse(outputText, promptTokens.length, completionTokens, stopReason);
Inaccurate completionTokens count in non-streaming path.
Line 411 sets completionTokens = outputText.length, which counts characters, not tokens. This makes the usage object in the response misleading, especially for multibyte characters or subword tokenization where token count ≠ character count.
Consider encoding the output to get accurate token count, or clearly document the approximation:
🛠️ Proposed fix for accurate token counting
} else {
// Batch path — generate_text_with_params is synchronous inside WASM
outputText = this.engine.generate_text_with_params(
formattedPrompt,
maxTokens,
temperature,
topP,
topK,
repeatPenalty,
minP,
);
- completionTokens = outputText.length; // approximate
+ // Encode output to get accurate token count
+ completionTokens = this.engine.encode_text(outputText).length;
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/engines/flare-engine-wrapper.ts` around lines 400 - 416, The code
currently sets completionTokens = outputText.length (characters), which is
incorrect; replace that with a real token count by running the model tokenizer
over the generated text (e.g., use this.engine.encode(outputText) or
this.engine.tokenize(outputText) if such a method exists) and set
completionTokens to the length of that token array; if the engine has a shared
tokenizer used earlier for prompt tokens (how promptTokens was computed), reuse
that same encode/tokenize function to compute completionTokens before calling
buildChatResponse(outputText, promptTokens.length, completionTokens,
stopReason).
  completionTokens = outputText.length; // approximate
}

const stopReason = this.engine.stream_stop_reason || 'stop';
stream_stop_reason may be stale in non-streaming path.
In the non-streaming branch (lines 401-412), generate_text_with_params is called, but stream_stop_reason is a property from the streaming API. This value may be undefined or hold a stale value from a previous streaming call.
Consider checking if stream_stop_reason is valid after batch generation, or default to 'stop' for the non-streaming path.
🛠️ Proposed fix
- const stopReason = this.engine.stream_stop_reason || 'stop';
+ // stream_stop_reason is only valid after streaming; default to 'stop' for batch generation
+ const stopReason = onToken ? (this.engine.stream_stop_reason || 'stop') : 'stop';
📝 Committable suggestion
‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
// stream_stop_reason is only valid after streaming; default to 'stop' for batch generation
const stopReason = onToken ? (this.engine.stream_stop_reason || 'stop') : 'stop';
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/engines/flare-engine-wrapper.ts` at line 414, The code reads
this.engine.stream_stop_reason (used to set stopReason) but that property is
from the streaming API and may be stale after calling generate_text_with_params
in the non-streaming branch; update the non-streaming path around
generate_text_with_params to determine stopReason from the batch response (or
explicitly default to 'stop' when the response doesn't include a
stream_stop_reason), e.g., after calling generate_text_with_params inspect the
returned result for a stream_stop_reason/stop_reason field and set stopReason
from that or fall back to 'stop' instead of always reading
this.engine.stream_stop_reason.
Integrates Flare (pure Rust to WASM, standard GGUF) as a third BrowserAI engine backend alongside MLC WebLLM and Transformers.js.

- Add FlareConfig type and update ModelConfig union
- Add flare-models.json with 6 GGUF registry entries (SmolLM2-135M/360M, Qwen2.5-0.5B, Llama-3.2-1B)
- Add FlareEngineWrapper with OPFS caching, streaming, LoRA support, and progressive loading hook
- Wire Flare into BrowserAI class with loadAdapter, isFlareModelCached, clearFlareModelCache
- Export FlareEngineWrapper, flareModels, and OPFS helpers from main index

Closes #295, #296, #297, #298, #300. Part of #293.
Force-pushed 5e6f1f1 to 2b1fd5c (Compare)
The Flare engine integration (#301) landed with zero tests. This adds two test files that together bring the project from 35 → 62 passing tests (+27):

**src/engines/flare-engine-wrapper.test.ts (11 tests)**
- Constructor initializes to a clean unloaded state
- generateText / loadAdapter before loadModel throws helpful errors
- embed() always throws (Flare is text-generation only)
- isCached / clearCache / dispose are safe on unloaded engines
- loadModel throws with a clear install instruction when @aspect/flare isn't installed (dynamic import fails cleanly)
- flare-models.json shape validation + guarding against cross-registry key collisions with Demucs/MLC

**src/core/llm/browserai.test.ts (13 tests)**
- BrowserAI construction + guard rails (generateText, embed, separateAudio, transcribeAudio, loadAdapter, isFlareModelCached, clearFlareModelCache all throw sensible errors before loadModel)
- loadModel with unknown identifier throws "not recognized"
- registerCustomModel accepts a DemucsConfig
- dispose is idempotent
- Cross-registry sanity check: Demucs and Flare keys don't collide with each other or with MLC/Transformers. If demucs-models.json ever shadows an mlc-models.json key, the engine-selection logic in loadModel would silently pick the wrong one, so it's worth asserting at test time
- Verifies every demucs/flare config has the expected `engine` field

The BrowserAI tests mock all four engine wrappers (MLC, Transformers, Demucs, Flare) because they transitively use `import.meta.url`, which jest's default CJS runtime can't parse. The stubs just give us an API-compatible surface for assertions that never actually invoke engine methods.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
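A hedged sketch of the mocking pattern described in that commit message; the module path and stubbed method list are assumptions, not the test file's exact contents:

```ts
// browserai.test.ts (sketch): stub the Flare wrapper so importing BrowserAI never
// evaluates engine code that relies on import.meta.url under jest's CJS runtime.
jest.mock('../../engines/flare-engine-wrapper', () => ({
  FlareEngineWrapper: jest.fn().mockImplementation(() => ({
    loadModel: jest.fn(),
    generateText: jest.fn(),
    loadAdapter: jest.fn(),
    isCached: jest.fn().mockResolvedValue(false),
    clearCache: jest.fn(),
    dispose: jest.fn(),
  })),
}));
// The MLC, Transformers.js, and Demucs wrappers get equivalent API-compatible stubs.
```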
Summary
Integrates Flare LLM as a third inference engine backend for BrowserAI, alongside MLC WebLLM and Transformers.js. Flare is a pure Rust → WASM engine that runs standard GGUF files directly — no TVM compilation step needed.
Closes #295, #296, #297, #298, #300. Part of #293.
What is in this PR
New files:
- `src/engines/flare-engine-wrapper.ts` — `FlareEngineWrapper` implementing the BrowserAI engine interface with:
  - `loadModel()` — fetches GGUF with download progress + OPFS caching for instant repeat loads
  - `generateText()` — OpenAI-compatible response, optional per-token `onToken` streaming callback
  - `loadAdapter()` — SafeTensors LoRA merging via `merge_lora` / `merge_lora_with_alpha`
  - `loadModelProgressive()` — API hook for future layer-streaming inference
  - `isModelCached`, `deleteCachedModel`, `listCachedModels`
- `src/config/models/flare-models.json` — 6 GGUF model registry entries

Modified files:

- `src/config/models/types.ts` — adds `FlareConfig` interface, updates `ModelConfig` union
- `src/core/llm/index.ts` — wires `FlareEngineWrapper` into `loadModel()`, adds `loadAdapter()`, `isFlareModelCached()`, `clearFlareModelCache()` to the `BrowserAI` class
- `src/index.ts` — exports `FlareEngineWrapper`, `flareModels`, and OPFS utility functions

Usage example
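A sketch of intended usage, pieced together from the APIs listed above; the import path, option names, and callback shapes are assumptions rather than confirmed signatures:

```ts
import { BrowserAI } from '@browserai/browserai'; // package name assumed

const ai = new BrowserAI();

// First load streams the GGUF with progress; repeat loads come from the OPFS cache.
await ai.loadModel('smollm2-135m-flare', {
  onProgress: (p: unknown) => console.log('download progress', p),
});

// OpenAI-compatible completion with optional per-token streaming.
let streamed = '';
const response = await ai.generateText('Write a haiku about WebAssembly.', {
  maxTokens: 64,
  temperature: 0.7,
  onToken: (token: string) => { streamed += token; },
});
console.log(streamed, response);

// Flare-specific helpers added in this PR (signatures assumed):
// await ai.loadAdapter('my-adapter.safetensors');  // SafeTensors LoRA merge
console.log(await ai.isFlareModelCached());
await ai.clearFlareModelCache();
```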
Dependency note
`@aspect/flare` (issue #294) is a peer dependency loaded via dynamic `import()`. BrowserAI will continue to work without it — attempting to use a Flare model without the package installed produces a clear error message with install instructions.
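A minimal sketch of that lazy peer-dependency pattern; the error text and module handling here are illustrative, not the wrapper's exact code:

```ts
// Loads @aspect/flare only when a Flare model is requested, and fails with an
// actionable message when the peer dependency is missing.
async function loadFlareModule(): Promise<unknown> {
  try {
    return await import('@aspect/flare');
  } catch {
    throw new Error(
      '[Flare] @aspect/flare is not installed. Run `npm install @aspect/flare` to use Flare GGUF models.',
    );
  }
}
```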
Test plan

- `npm run build`
- `npm test` — 22/22
- `smollm2-135m-flare` in browser once `@aspect/flare` is published (Publish @aspect/flare npm package from flarellm repo #294)

Summary by CodeRabbit