Merged
23 changes: 23 additions & 0 deletions src/config/models/mlc-models.json
@@ -1,4 +1,27 @@
{
"gemma-3-1b-it": {
"engine": "mlc",
"modelName": "Gemma-3-1B-Instruct",
"modelType": "text-generation",
"repo": "mlc-ai/gemma-3-1b-it-{quantization}-MLC",
"quantizations": [
"q4f16_1",
"q4f32_1"
],
"defaultQuantization": "q4f32_1",
"defaultParams": {
"temperature": 0.7,
"maxTokens": 2048
},
"pipeline": "text-generation",
"required_features": [
"shader-f16"
],
"metadata": {
"context_window_size": 32768,
"description": "Google Gemma 3 1B Instruct — smallest Gemma variant, fits in ~1 GB VRAM"
}
},
"llama-3.2-1b-instruct": {
"engine": "mlc",
"modelName": "Llama-3.2-1B-Instruct",
61 changes: 61 additions & 0 deletions src/core/agent/html-cleaner.ts
@@ -41,6 +41,62 @@ export class HTMLCleaner {
return doc.body;
}

/**
* Removes elements that are explicitly hidden or are responsive-design duplicates.
*
* Many sites include the same content twice — once for mobile and once for desktop —
* controlling visibility via CSS classes or inline styles. Since DOMParser runs without
* stylesheets we cannot resolve class-based visibility, but we can prune elements that
* use explicit DOM signals to indicate they are non-primary content:
*
* 1. `aria-hidden="true"` — explicitly hidden from assistive technologies; almost always
* a decorative duplicate (e.g. a mobile nav icon that mirrors visible text).
* 2. `style="display:none"` / `style="visibility:hidden"` — inline-hidden elements that
* are never visible regardless of which CSS is loaded.
* 3. CSS class names containing "mobile", "desktop", "tablet" combined with common
* hide/show utility words ("hidden", "only", "show", "hide", "visible") — a heuristic
Comment on lines +56 to +57
⚠️ Potential issue | 🟠 Major

Do not prune show/visible responsive classes as duplicates.

Line [67] removes elements matching show/visible responsive classes (e.g., show-on-tablet, visible-desktop). That can delete the only intended content variant and cause downstream content loss.

Proposed fix
-    const RESPONSIVE_PATTERN =
-      /\b(mobile|desktop|tablet|phone|sm|md|lg|xl)\b.{0,10}\b(only|hidden|hide|show|visible|invisible)\b|\b(hidden|hide|show|visible|invisible)\b.{0,10}\b(mobile|desktop|tablet|phone|sm|md|lg|xl)\b/i;
+    const RESPONSIVE_HIDDEN_PATTERN =
+      /\b(mobile|desktop|tablet|phone|xs|sm|md|lg|xl)\b.{0,10}\b(hidden|hide|invisible|only)\b|\b(hidden|hide|invisible|only)\b.{0,10}\b(mobile|desktop|tablet|phone|xs|sm|md|lg|xl)\b/i;
@@
-      if (typeof className === 'string' && RESPONSIVE_PATTERN.test(className)) {
+      if (typeof className === 'string' && RESPONSIVE_HIDDEN_PATTERN.test(className)) {
         toRemove.push(el);
       }

Also applies to: 67-90

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/core/agent/html-cleaner.ts` around lines 56 - 57, The cleanup currently
treats responsive classes containing utility words including "show" and
"visible" as duplicates and removes those elements; update the class-matching
logic in src/core/agent/html-cleaner.ts so that the responsive-utility
list/regex no longer includes "show" and "visible" (leave "hidden", "only",
"hide", etc.), i.e., remove "show" and "visible" from the array/regex that
detects responsive hide/show classes and adjust any logic in the function that
prunes responsive duplicate nodes to skip elements matched only by
"show"/"visible" variants; run or add tests covering cases like "show-on-tablet"
and "visible-desktop" to ensure they are preserved.
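
To illustrate the difference, here is a minimal, DOM-free sketch that runs both patterns from the proposed fix against a few class names. The sample class strings and the `compare` helper are assumptions made for illustration; only the two regular expressions are copied from the diff above.

```typescript
// Original pattern from the PR: show/visible utilities also count as removable.
const RESPONSIVE_PATTERN =
  /\b(mobile|desktop|tablet|phone|sm|md|lg|xl)\b.{0,10}\b(only|hidden|hide|show|visible|invisible)\b|\b(hidden|hide|show|visible|invisible)\b.{0,10}\b(mobile|desktop|tablet|phone|sm|md|lg|xl)\b/i;

// Pattern proposed in the review: only hide-style utilities match, and "xs" is added.
const RESPONSIVE_HIDDEN_PATTERN =
  /\b(mobile|desktop|tablet|phone|xs|sm|md|lg|xl)\b.{0,10}\b(hidden|hide|invisible|only)\b|\b(hidden|hide|invisible|only)\b.{0,10}\b(mobile|desktop|tablet|phone|xs|sm|md|lg|xl)\b/i;

// Hypothetical helper: reports whether each pattern would mark the class for removal.
function compare(className: string): { original: boolean; proposed: boolean } {
  return {
    original: RESPONSIVE_PATTERN.test(className),
    proposed: RESPONSIVE_HIDDEN_PATTERN.test(className),
  };
}

// Illustrative class names, not taken from any real page.
for (const name of ['show-on-tablet', 'visible-desktop', 'hidden-xs', 'mobile-only']) {
  console.log(name, compare(name));
}
```

With these inputs, `show-on-tablet` and `visible-desktop` are pruned only by the original pattern, while `mobile-only` is pruned by both. Note that `hidden-xs` matches only the proposed pattern, because the original breakpoint list does not contain `xs` even though the code comment cites it as an example.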

* that catches Tailwind / Bootstrap / custom responsive utilities.
*
* @private
* @param {HTMLElement} root - The parsed DOM element to prune in-place
*/
private removeHiddenAndDuplicateElements(root: HTMLElement): void {
// Responsive class patterns that signal show/hide intent.
// Match e.g. "mobile-only", "desktop-hidden", "show-on-tablet", "hidden-xs"
const RESPONSIVE_PATTERN =
/\b(mobile|desktop|tablet|phone|sm|md|lg|xl)\b.{0,10}\b(only|hidden|hide|show|visible|invisible)\b|\b(hidden|hide|show|visible|invisible)\b.{0,10}\b(mobile|desktop|tablet|phone|sm|md|lg|xl)\b/i;

const toRemove: Element[] = [];

root.querySelectorAll('*').forEach((el) => {
// 1. aria-hidden="true"
if (el.getAttribute('aria-hidden') === 'true') {
toRemove.push(el);
return;
}

// 2. Inline display:none or visibility:hidden
const style = (el as HTMLElement).style;
if (style) {
if (style.display === 'none' || style.visibility === 'hidden') {
toRemove.push(el);
return;
}
}

// 3. Responsive class heuristic
const className = el.className;
if (typeof className === 'string' && RESPONSIVE_PATTERN.test(className)) {
toRemove.push(el);
}
});

// Remove in reverse document order so parent removal doesn't invalidate children
for (let i = toRemove.length - 1; i >= 0; i--) {
toRemove[i].remove();
}
}

/**
* Cleans HTML content by removing specified tags and attributes, returning only text content.
* @param {string} html - The HTML content to clean
@@ -49,6 +105,9 @@ export class HTMLCleaner {
clean(html: string): string {
const tempElement = this.parseHTML(html);

// Remove hidden elements and responsive-design duplicates before extracting text
this.removeHiddenAndDuplicateElements(tempElement);

this.tagsToRemove.forEach((tag) => {
let elements = tempElement.querySelectorAll(tag);
elements.forEach((el) => el.remove());
@@ -71,6 +130,7 @@ export class HTMLCleaner {
*/
cleanSemantic(html: string): string {
const tempElement = this.parseHTML(html);
this.removeHiddenAndDuplicateElements(tempElement);
let importantText = '';
const importantTags = [
'article',
@@ -141,6 +201,7 @@ export class HTMLCleaner {
*/
preserveSemanticHierarchy(html: string): string {
const tempElement = this.parseHTML(html);
this.removeHiddenAndDuplicateElements(tempElement);

const headingLevels = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6'];
let structuredContent = '';
27 changes: 25 additions & 2 deletions src/engines/transformer-engine-wrapper.ts
@@ -13,6 +13,22 @@ import { ModelConfig } from '../config/models/types';
import { TTSEngine, SAMPLE_RATE as TTS_SAMPLE_RATE } from './tts-engine';
import { AutoProcessor, MultiModalityCausalLM } from '../libs/transformers/transformers';

/**
* Detect whether a usable WebGPU adapter is available.
* Falls back to CPU (ONNX WASM backend) when WebGPU is absent or fails.
*/
async function detectBestDevice(): Promise<'webgpu' | 'cpu'> {
if (typeof navigator === 'undefined' || !('gpu' in navigator)) return 'cpu';
try {
const adapter = await (
navigator as unknown as { gpu: { requestAdapter(): Promise<unknown> } }
).gpu.requestAdapter();
return adapter ? 'webgpu' : 'cpu';
} catch {
return 'cpu';
}
}
coderabbitai[bot] marked this conversation as resolved.

export class TransformersEngineWrapper {
private transformersPipeline:
| TextGenerationPipeline
@@ -48,7 +64,15 @@ export class TransformersEngineWrapper {

this.modelType = modelConfig.modelType;

options.device = 'webgpu';
// Detect the best available compute device; fall back to CPU/WASM when
// WebGPU is unavailable (e.g. Firefox, older Chromium, Node.js).
// Callers may still override by passing `options.device` explicitly.
if (!options.device) {
options.device = await detectBestDevice();
if (options.device === 'cpu') {
console.info('[Transformers] WebGPU unavailable — falling back to CPU/WASM inference');
}
}
Comment on lines +70 to +75
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify default call sites that hard-code WebGPU and bypass auto-fallback.

rg -n -C3 "textToSpeech\(" src/core/llm/index.ts
rg -n -C3 "device:\s*'webgpu'" src/core/llm/index.ts src/engines/transformer-engine-wrapper.ts
rg -n -C3 "if \(!options\.device\)" src/engines/transformer-engine-wrapper.ts

Repository: sauravpanda/BrowserAI

Length of output: 1139


Remove hardcoded device: 'webgpu' to allow the engine's auto-detection fallback to run.

At Line 68 of src/engines/transformer-engine-wrapper.ts, the auto-detection only executes when options.device is unset. However, textToSpeech() in src/core/llm/index.ts (Line 197) hardcodes device: 'webgpu', bypassing this fallback and breaking on non-WebGPU browsers.

Suggested fix
-      await this.engine.loadModel(MODEL_CONFIG['kokoro-tts'], {
-        quantized: true,
-        device: 'webgpu',
-        ...options,
-      });
+      await this.engine.loadModel(MODEL_CONFIG['kokoro-tts'], {
+        quantized: true,
+        ...options,
+      });
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/engines/transformer-engine-wrapper.ts` around lines 68 - 73, The engine
wrapper's auto-detection is bypassed because callers are hardcoding
options.device; remove the hardcoded device in the caller (textToSpeech in
src/core/llm/index.ts) so that the TransformersEngineWrapper code (the
options.device check and detectBestDevice call inside
transformer-engine-wrapper.ts) can run; specifically, stop passing
device: 'webgpu' from textToSpeech and instead pass no device (or forward
undefined) so TransformersEngineWrapper's logic using detectBestDevice() can
pick the best device at runtime.
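
The interaction described above can be sketched without loading any model. The snippet below is an illustrative assumption: `LoadOptions`, `buildHardcoded`, `buildAuto`, and `needsDetection` are hypothetical names that do not appear in the repository. Only the object-spread order and the `!options.device` check mirror the code shown in the diff.

```typescript
type Device = 'webgpu' | 'cpu';

// Hypothetical option shape, reduced to the two fields discussed in the review.
interface LoadOptions {
  quantized?: boolean;
  device?: Device;
}

// Mirrors the caller before the suggested fix: a default device is always set.
function buildHardcoded(options: LoadOptions = {}): LoadOptions {
  return { quantized: true, device: 'webgpu', ...options };
}

// Mirrors the caller after the suggested fix: device stays unset unless the caller passes one.
function buildAuto(options: LoadOptions = {}): LoadOptions {
  return { quantized: true, ...options };
}

// Same condition the wrapper uses to decide whether to run auto-detection.
function needsDetection(options: LoadOptions): boolean {
  return !options.device;
}

console.log(needsDetection(buildHardcoded())); // detection is skipped
console.log(needsDetection(buildAuto())); // detection would run
console.log(buildAuto({ device: 'cpu' }).device); // explicit override still wins
```

Because the hardcoded variant always produces a `device` value, the wrapper's fallback never gets a chance to run. Dropping the default keeps explicit overrides working while leaving the unset case to the wrapper.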


// Configure pipeline options with proper worker settings
const pipelineOptions = {
@@ -67,7 +91,6 @@ export class TransformersEngineWrapper {

// Initialize image processor for multimodal models
if (modelConfig.modelType === 'multimodal') {
options.device = 'webgpu';
// console.log('Loading multimodal model...');
this.imageProcessor = await AutoProcessor.from_pretrained(modelConfig.repo, pipelineOptions);
// console.log('Image processor loaded');