Add image handling to document indexing pipeline #1
Extract images from PDFs, preserve alt-text from HTML/DOCX, and optionally describe images via a vision LLM (Gemini, OpenAI, Claude). Falls back to [Image present] markers when no vision LLM is available. Image descriptions are appended to page text before tree building, so the entire downstream pipeline works unchanged.

Co-Authored-By: Mithun Gowda B <mithungowda.b7411@gmail.com>
- Full 60s cinematic promo video (1920x1080, 30fps) built with Remotion
- 5 acts: Hero intro, Problem (traditional RAG), Solution (TreeDex pipeline), Query demo, and Closing with provider orbit
- Pure CSS/SVG 2D motion graphics — no Three.js/WebGL dependencies
- Terminal typewriter animations, animated SVG tree, floating cards
- Sarvam AI TTS voiceover (male voice) + synthesized ambient BGM
- Film grain, scene transitions, color grading, glitch effects
- GitHub Actions workflow for automated video rendering
| GitGuardian id | GitGuardian status | Secret | Commit | Filename |
|---|---|---|---|---|
| 29076893 | Triggered | Generic High Entropy Secret | 7aa299f | video/scripts/generate-tts.sh |
🛠 Guidelines to remediate hardcoded secrets
- Understand the implications of revoking this secret by investigating where it is used in your code.
- Replace and store your secret safely, following best practices for secrets management.
- Revoke and rotate this secret.
- If possible, rewrite git history. Rewriting git history is not a trivial act. You might completely break other contributing developers' workflow and you risk accidentally deleting legitimate data.
To avoid such incidents in the future, consider:
- following best practices for managing and storing secrets, including API keys and other credentials
- installing secret detection on pre-commit to catch secrets before they leave your machine and ease remediation.
🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.
📝 Walkthrough

Adds optional image extraction to loaders and PDF parsing, appends per-image descriptions into page text (using alt text or vision-capable LLM generation), extends LLM backends with vision methods, prevents raw images from being saved to index JSON, and introduces a new Remotion-based promo video project with many scenes and components.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant TF as TreeDex.fromFile
    participant AL as autoLoader / Loader (PDF/HTML/DOCX)
    participant PP as PDF Parser (extractPages)
    participant LLM as LLM Backend (supportsVision?)
    participant TD as TreeDex.groupPages / save
    TF->>AL: load(path, { extractImages })
    AL->>PP: extractPages(path, extractImages?)
    PP-->>AL: pages[] (may include images[])
    AL-->>TF: pages[]
    TF->>TF: describeImages(pages)
    loop per page with images
        TF->>LLM: if no alt_text && supportsVision -> generateWithImage(prompt, imageBase64, mime)
        alt LLM returns description
            LLM-->>TF: description
        else use alt_text or fallback
            Note right of TF: append "[Image: ...]" or "[Image present]"
        end
    end
    TF->>TD: groupPages(pages with appended image text)
    TD->>TD: Tree structure extraction
    TF->>TD: save(path) (write pages without images field)
```
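The fallback logic in the describeImages loop shown in the diagram can be sketched as follows. This is an illustration, not the PR's actual code: the function and field names (`describe_image`, `alt_text`, `supports_vision`, `generate_with_image`) follow the diagram's labels and are assumptions about the real API.

```python
def describe_image(image: dict, llm) -> str:
    """Pick the best available description for one extracted image (sketch)."""
    alt = (image.get("alt_text") or "").strip()
    if alt:
        # Prefer author-provided alt text over generated descriptions
        return f"[Image: {alt}]"
    if llm is not None and getattr(llm, "supports_vision", False):
        # Vision-capable backend: ask the model to describe the image
        desc = llm.generate_with_image(
            prompt="Describe this image concisely.",
            image_base64=image["data"],
            mime=image["mime"],
        )
        return f"[Image: {desc}]"
    # No alt text and no vision LLM: emit the plain marker
    return "[Image present]"
```

The marker string is then appended to the page text, which is why the downstream tree-building stages need no changes.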
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
9 issues found across 40 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="video/scripts/generate-tts.sh">
<violation number="1" location="video/scripts/generate-tts.sh:3">
P0: Remove the hardcoded API key and read it from an environment variable to avoid secret leakage.</violation>
</file>
<file name="video/src/components/SceneTransition.tsx">
<violation number="1" location="video/src/components/SceneTransition.tsx:13">
P2: `wipe` is exposed as a supported transition mode, but no `wipe` behavior is implemented. This makes the public prop contract misleading and causes callers to get an unintended fallback effect.</violation>
</file>
<file name="video/patches/patch-remotion.js">
<violation number="1" location="video/patches/patch-remotion.js:27">
P1: The script reports a successful patch even when the replacement pattern is not found, which can silently leave the compatibility fix unapplied.</violation>
</file>
<file name="video/scripts/generate-bgm.py">
<violation number="1" location="video/scripts/generate-bgm.py:92">
P2: Ensure the output directory exists before writing the WAV file, otherwise the script can fail with `FileNotFoundError` on clean environments.</violation>
</file>
<file name="video/src/components/ParticleBackground.tsx">
<violation number="1" location="video/src/components/ParticleBackground.tsx:115">
P2: `shape === 2` is generated but never rendered as a line, so one-third of particles fall back to the wrong shape.</violation>
</file>
<file name=".github/workflows/render-video.yml">
<violation number="1" location=".github/workflows/render-video.yml:55">
P2: Validate `BROWSER_PATH` before exporting it; otherwise the render step can fail with an empty browser executable path.</violation>
</file>
<file name="video/src/constants/tree-data.ts">
<violation number="1" location="video/src/constants/tree-data.ts:71">
P2: Several EM spectrum nodes use inverted page ranges (`startIndex` greater than `endIndex`), creating invalid section spans.</violation>
</file>
<file name="video/src/components/AnimatedTree.tsx">
<violation number="1" location="video/src/components/AnimatedTree.tsx:334">
P2: The pulse dot does not follow the same Bézier curve as the edge because x is interpolated linearly. Compute x with the cubic Bézier formula so the pulse stays on the path.</violation>
</file>
<file name="video/src/components/GlitchEffect.tsx">
<violation number="1" location="video/src/components/GlitchEffect.tsx:70">
P2: `color` is treated as a hex string when building gradients, but the prop type allows any CSS color string. Non-hex values will generate invalid color tokens and the glitch overlay colors will fail to render.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
```diff
@@ -0,0 +1,98 @@
+#!/bin/bash
+# Generate TTS voiceover — continuous, paced to fit act durations
+API_KEY="sk_ga7g62co_c33iM5ndlzrlgmY68UYIMRMp"
```
P0: Remove the hardcoded API key and read it from an environment variable to avoid secret leakage.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At video/scripts/generate-tts.sh, line 3:
<comment>Remove the hardcoded API key and read it from an environment variable to avoid secret leakage.</comment>
<file context>
@@ -0,0 +1,98 @@
+#!/bin/bash
+# Generate TTS voiceover — continuous, paced to fit act durations
+API_KEY="sk_ga7g62co_c33iM5ndlzrlgmY68UYIMRMp"
+API_URL="https://api.sarvam.ai/text-to-speech"
+OUT_DIR="public/audio"
</file context>
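The suggested fix can be sketched as below. The environment variable name `SARVAM_API_KEY` and the helper `get_api_key` are assumptions for illustration, not taken from the PR; the point is to fail fast with a clear message instead of shipping the key in the script.

```shell
#!/bin/bash
# Sketch: read the API key from the environment instead of hardcoding it.
# SARVAM_API_KEY and get_api_key are illustrative names, not the PR's code.
get_api_key() {
  # ${VAR:?msg} aborts with an error when VAR is unset or empty
  echo "${SARVAM_API_KEY:?SARVAM_API_KEY is not set; export it before running}"
}
```

A caller would then use `API_KEY="$(get_api_key)"` at the top of the script, and CI would supply the value via a repository secret.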
```diff
+const guard = "if (!cpus || cpus.length === 0) return;";
+
+if (!content.includes(guard)) {
+  content = content.replace(
```
P1: The script reports a successful patch even when the replacement pattern is not found, which can silently leave the compatibility fix unapplied.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At video/patches/patch-remotion.js, line 27:
<comment>The script reports a successful patch even when the replacement pattern is not found, which can silently leave the compatibility fix unapplied.</comment>
<file context>
@@ -0,0 +1,35 @@
+const guard = "if (!cpus || cpus.length === 0) return;";
+
+if (!content.includes(guard)) {
+ content = content.replace(
+ "const cpus = os.cpus();",
+ `const cpus = os.cpus();\n ${guard}`
</file context>
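One way to address this, sketched with the guard and target strings from the diff (the function name `applyCpuGuard` is illustrative): throw when neither the guard nor the target pattern is present, so a Remotion upgrade cannot silently skip the patch.

```javascript
// Sketch: apply the guard only when the target line exists; fail loudly otherwise.
const TARGET = "const cpus = os.cpus();";
const GUARD = "if (!cpus || cpus.length === 0) return;";

function applyCpuGuard(content) {
  if (content.includes(GUARD)) return content; // already patched, nothing to do
  if (!content.includes(TARGET)) {
    // Surface the problem instead of reporting a successful no-op patch
    throw new Error("patch-remotion: target pattern not found; Remotion internals may have changed");
  }
  return content.replace(TARGET, `${TARGET}\n  ${GUARD}`);
}
```

The early return also makes the patch idempotent, so re-running it against an already-patched file is safe.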
```diff
+  enterColor?: string;
+  exitColor?: string;
+  /** Transition style */
+  mode?: "fade" | "zoom" | "wipe" | "blur";
```
P2: wipe is exposed as a supported transition mode, but no wipe behavior is implemented. This makes the public prop contract misleading and causes callers to get an unintended fallback effect.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At video/src/components/SceneTransition.tsx, line 13:
<comment>`wipe` is exposed as a supported transition mode, but no `wipe` behavior is implemented. This makes the public prop contract misleading and causes callers to get an unintended fallback effect.</comment>
<file context>
@@ -0,0 +1,135 @@
+ enterColor?: string;
+ exitColor?: string;
+ /** Transition style */
+ mode?: "fade" | "zoom" | "wipe" | "blur";
+ children: React.ReactNode;
+}
</file context>
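A sketch of one possible `wipe` implementation: a horizontal clip-path reveal for the entering layer. The direction and the `wipeStyle` helper are assumptions for illustration, not the component's actual code.

```typescript
// Sketch: honor the advertised "wipe" mode with a clip-path reveal.
// progress runs 0 -> 1 over the transition.
function wipeStyle(progress: number): { clipPath: string } {
  const p = Math.max(0, Math.min(1, progress)); // clamp against overshoot
  // inset(top right bottom left): shrink the right inset as progress grows
  return { clipPath: `inset(0 ${Math.round((1 - p) * 100)}% 0 0)` };
}
```

Alternatively, if `wipe` is not going to be implemented, removing it from the `mode` union keeps the prop contract honest.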
```diff
+print("Generating ambient background music...")
+samples = generate_bgm()
+outpath = os.path.join("public", "audio", "bgm_ambient.wav")
```
P2: Ensure the output directory exists before writing the WAV file, otherwise the script can fail with FileNotFoundError on clean environments.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At video/scripts/generate-bgm.py, line 92:
<comment>Ensure the output directory exists before writing the WAV file, otherwise the script can fail with `FileNotFoundError` on clean environments.</comment>
<file context>
@@ -0,0 +1,95 @@
+
+print("Generating ambient background music...")
+samples = generate_bgm()
+outpath = os.path.join("public", "audio", "bgm_ambient.wav")
+write_wav(outpath, samples, SAMPLE_RATE)
+size = os.path.getsize(outpath)
</file context>
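The fix is a one-line `os.makedirs` call before the write; a self-contained sketch (the `write_output` wrapper is illustrative):

```python
import os

def write_output(outpath: str, data: bytes) -> None:
    # Create parent directories first so a clean checkout doesn't raise FileNotFoundError.
    parent = os.path.dirname(outpath)
    if parent:  # guard against bare filenames with no directory component
        os.makedirs(parent, exist_ok=True)
    with open(outpath, "wb") as f:
        f.write(data)
```

`exist_ok=True` makes repeated runs safe, so the script behaves the same on clean and warm environments.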
```diff
+  position: "absolute",
+  left: x - size / 2,
+  top: y - size / 2,
+  width: size,
```
P2: shape === 2 is generated but never rendered as a line, so one-third of particles fall back to the wrong shape.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At video/src/components/ParticleBackground.tsx, line 115:
<comment>`shape === 2` is generated but never rendered as a line, so one-third of particles fall back to the wrong shape.</comment>
<file context>
@@ -0,0 +1,144 @@
+ position: "absolute",
+ left: x - size / 2,
+ top: y - size / 2,
+ width: size,
+ height: size,
+ borderRadius: p.shape === 1 ? 1 : "50%",
</file context>
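A sketch of the mapping the reviewer suggests, handling all three generated shape codes explicitly (the concrete style values for the line are illustrative assumptions):

```typescript
// Sketch: give shape === 2 its own line rendering instead of falling back
// to the square/circle branch.
interface ShapeStyle {
  width: number;
  height: number;
  borderRadius: number | string;
}

function particleShapeStyle(shape: number, size: number): ShapeStyle {
  if (shape === 2) {
    return { width: size * 3, height: 1, borderRadius: 0 }; // thin line
  }
  if (shape === 1) {
    return { width: size, height: size, borderRadius: 1 }; // rounded square
  }
  return { width: size, height: size, borderRadius: "50%" }; // circle (default)
}
```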
```diff
+  # libasound2 renamed to libasound2t64 on Ubuntu 24.04+
+  sudo apt-get install -y libasound2t64 || sudo apt-get install -y libasound2
+  # Find Chrome/Chromium path for Remotion
+  echo "BROWSER_PATH=$(which chromium-browser || which chromium || which google-chrome-stable || which google-chrome)" >> $GITHUB_ENV
```
P2: Validate BROWSER_PATH before exporting it; otherwise the render step can fail with an empty browser executable path.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At .github/workflows/render-video.yml, line 55:
<comment>Validate `BROWSER_PATH` before exporting it; otherwise the render step can fail with an empty browser executable path.</comment>
<file context>
@@ -0,0 +1,85 @@
+ # libasound2 renamed to libasound2t64 on Ubuntu 24.04+
+ sudo apt-get install -y libasound2t64 || sudo apt-get install -y libasound2
+ # Find Chrome/Chromium path for Remotion
+ echo "BROWSER_PATH=$(which chromium-browser || which chromium || which google-chrome-stable || which google-chrome)" >> $GITHUB_ENV
+
+ - name: Install dependencies
</file context>
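A hedged sketch of the validation. The candidate list mirrors the workflow; the `pick_browser` helper is an illustrative name, not code from the PR.

```shell
# Sketch: pick the first browser that exists on PATH, failing the step when none is found.
pick_browser() {
  for candidate in "$@"; do
    if command -v "$candidate" >/dev/null 2>&1; then
      command -v "$candidate"
      return 0
    fi
  done
  echo "::error::No Chrome/Chromium executable found" >&2
  return 1
}

# In the workflow step (illustrative usage):
#   BROWSER_PATH="$(pick_browser chromium-browser chromium google-chrome-stable google-chrome)"
#   echo "BROWSER_PATH=$BROWSER_PATH" >> "$GITHUB_ENV"
```

Because the function returns non-zero when nothing matches, the step fails at discovery time instead of exporting an empty `BROWSER_PATH` that only surfaces later during rendering.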
```diff
+  structure: "1.4.1",
+  title: "Radio waves",
+  startIndex: 8,
+  endIndex: 7,
```
P2: Several EM spectrum nodes use inverted page ranges (startIndex greater than endIndex), creating invalid section spans.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At video/src/constants/tree-data.ts, line 71:
<comment>Several EM spectrum nodes use inverted page ranges (`startIndex` greater than `endIndex`), creating invalid section spans.</comment>
<file context>
@@ -0,0 +1,150 @@
+ structure: "1.4.1",
+ title: "Radio waves",
+ startIndex: 8,
+ endIndex: 7,
+ children: [],
+ },
</file context>
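A small invariant check makes such data errors fail fast, e.g. in a unit test over the constants file. This is a sketch with the node shape reduced to the fields involved; `invalidRanges` is an illustrative helper name.

```typescript
// Sketch: flag nodes whose page range is inverted, e.g. startIndex: 8, endIndex: 7.
interface SectionNode {
  structure: string;
  startIndex: number;
  endIndex: number;
}

function invalidRanges(nodes: SectionNode[]): string[] {
  return nodes
    .filter((n) => n.startIndex > n.endIndex)
    .map((n) => n.structure);
}
```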
```diff
+const edgeT = (progress * edges.length) % 1;
+
+// Interpolate along bezier curve (simplified as linear for now)
+const x = edge.x1 + (edge.x2 - edge.x1) * edgeT;
```
P2: The pulse dot does not follow the same Bézier curve as the edge because x is interpolated linearly. Compute x with the cubic Bézier formula so the pulse stays on the path.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At video/src/components/AnimatedTree.tsx, line 334:
<comment>The pulse dot does not follow the same Bézier curve as the edge because x is interpolated linearly. Compute x with the cubic Bézier formula so the pulse stays on the path.</comment>
<file context>
@@ -0,0 +1,381 @@
+ const edgeT = (progress * edges.length) % 1;
+
+ // Interpolate along bezier curve (simplified as linear for now)
+ const x = edge.x1 + (edge.x2 - edge.x1) * edgeT;
+ const midY = edge.y1 + (edge.y2 - edge.y1) * 0.5;
+ // Approximate cubic bezier y
</file context>
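The reviewer's suggestion amounts to evaluating the same cubic Bézier in x that the SVG edge path uses. A sketch of the 1-D evaluator; the control-point names in the usage comment (`cx1`, `cx2`) are assumptions about the edge path, not the component's actual variables.

```typescript
// Sketch: evaluate a 1-D cubic Bézier so the pulse dot stays on the curved edge.
function cubicBezier1D(t: number, p0: number, c1: number, c2: number, p1: number): number {
  const u = 1 - t;
  return u * u * u * p0 + 3 * u * u * t * c1 + 3 * u * t * t * c2 + t * t * t * p1;
}

// x along the edge at parameter edgeT, using the same control x's as the path:
//   const x = cubicBezier1D(edgeT, edge.x1, cx1, cx2, edge.x2);
```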
```diff
+  right: 0,
+  top: y,
+  height: h,
+  background: `linear-gradient(90deg, transparent 10%, ${color}${Math.round(intensity * 40).toString(16).padStart(2, "0")} 30%, transparent 70%)`,
```
P2: color is treated as a hex string when building gradients, but the prop type allows any CSS color string. Non-hex values will generate invalid color tokens and the glitch overlay colors will fail to render.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At video/src/components/GlitchEffect.tsx, line 70:
<comment>`color` is treated as a hex string when building gradients, but the prop type allows any CSS color string. Non-hex values will generate invalid color tokens and the glitch overlay colors will fail to render.</comment>
<file context>
@@ -0,0 +1,91 @@
+ right: 0,
+ top: y,
+ height: h,
+ background: `linear-gradient(90deg, transparent 10%, ${color}${Math.round(intensity * 40).toString(16).padStart(2, "0")} 30%, transparent 70%)`,
+ transform: `translateX(${jitterX}px)`,
+ }}
</file context>
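One defensive sketch: append the alpha byte only when the base color is a 6-digit hex string, and otherwise fall back to the color unchanged. The `* 40` alpha scale comes from the diff above; the `withGlitchAlpha` helper name is illustrative.

```typescript
// Sketch: append a hex alpha byte only when the base color is itself 6-digit hex.
function withGlitchAlpha(color: string, intensity: number): string {
  const clamped = Math.max(0, Math.min(1, intensity));
  const alpha = Math.round(clamped * 40).toString(16).padStart(2, "0");
  // Non-hex CSS colors (named colors, rgb(), etc.) are returned untouched,
  // since "rebeccapurple28" would be an invalid color token.
  return /^#[0-9a-fA-F]{6}$/.test(color) ? `${color}${alpha}` : color;
}
```

Tightening the prop type to a hex-only template literal type would be another way to enforce the same contract at compile time.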
Actionable comments posted: 11
🧹 Nitpick comments (22)
video/scripts/generate-bgm.py (1)
90-95: Wrap the script entrypoint in a `__main__` guard. This avoids generating files if the module is imported elsewhere (tests/tooling).
Entrypoint cleanup
```diff
-print("Generating ambient background music...")
-samples = generate_bgm()
-outpath = os.path.join("public", "audio", "bgm_ambient.wav")
-os.makedirs(os.path.dirname(outpath), exist_ok=True)
-write_wav(outpath, samples, SAMPLE_RATE)
-size = os.path.getsize(outpath)
-print(f"Saved: {outpath} ({size} bytes, {DURATION}s)")
+if __name__ == "__main__":
+    print("Generating ambient background music...")
+    samples = generate_bgm()
+    outpath = os.path.join("public", "audio", "bgm_ambient.wav")
+    os.makedirs(os.path.dirname(outpath), exist_ok=True)
+    write_wav(outpath, samples, SAMPLE_RATE)
+    size = os.path.getsize(outpath)
+    print(f"Saved: {outpath} ({size} bytes, {DURATION}s)")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@video/scripts/generate-bgm.py` around lines 90 - 95: the top-level calls run whenever the module is imported; wrap the lines that call generate_bgm(), compute outpath, call write_wav(outpath, samples, SAMPLE_RATE), read the size, and print inside an `if __name__ == "__main__":` guard, preserving the references to SAMPLE_RATE and DURATION, so importing the module (for tests or tooling) won't generate files.

video/src/components/FloatingCards.tsx (1)
30-35: Clamp animation progress inputs to avoid overshoot artifacts. If upstream progress exceeds [0, 1], the opacity/scale math can produce unstable visuals. Clamp once near the props.

Suggested guard

```diff
 export const FloatingCards: React.FC<FloatingCardsProps> = ({
   floatProgress,
   shatterProgress,
   width = 700,
   height = 600,
 }) => {
   const frame = useCurrentFrame();
+  const safeFloat = Math.max(0, Math.min(1, floatProgress));
+  const safeShatter = Math.max(0, Math.min(1, shatterProgress));
@@
-  {floatProgress > 0.3 && shatterProgress < 0.5 && (
+  {safeFloat > 0.3 && safeShatter < 0.5 && (
@@
-  const opacity = (1 - shatterProgress) * 0.08;
+  const opacity = (1 - safeShatter) * 0.08;
@@
-  const wobbleX = Math.sin(t1) * 12 * floatProgress + Math.sin(t1 * 1.7) * 5 * floatProgress;
-  const wobbleY = Math.cos(t2) * 10 * floatProgress + Math.cos(t2 * 1.4) * 4 * floatProgress;
-  const wobbleRot = Math.sin(frame * 0.025 + i * 2.1) * 4 * floatProgress;
+  const wobbleX = Math.sin(t1) * 12 * safeFloat + Math.sin(t1 * 1.7) * 5 * safeFloat;
+  const wobbleY = Math.cos(t2) * 10 * safeFloat + Math.cos(t2 * 1.4) * 4 * safeFloat;
+  const wobbleRot = Math.sin(frame * 0.025 + i * 2.1) * 4 * safeFloat;
@@
-  const shatterEased = shatterProgress * shatterProgress; // Accelerating
+  const shatterEased = safeShatter * safeShatter; // Accelerating
@@
-  const shatterScale = 1 - shatterProgress * 0.5;
-  const shatterOpacity = Math.max(0, 1 - shatterProgress * 1.5);
+  const shatterScale = 1 - safeShatter * 0.5;
+  const shatterOpacity = Math.max(0, 1 - safeShatter * 1.5);
```

Also applies to: 99-106, 124-126
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@video/src/components/FloatingCards.tsx` around lines 30 - 35: clamp floatProgress and shatterProgress to [0, 1] as soon as they are received (e.g., Math.max(0, Math.min(1, value)) or a small clamp helper) and use the clamped values everywhere, so the downstream opacity/scale math always receives bounded inputs.

video/src/components/ColorGrade.tsx (1)
16-70: Clamp grade inputs to stable visual bounds. Clamping `contrast` and `vignette` to 0..1 will make this component safer against upstream overshoot.

Suggested change

```diff
 export const ColorGrade: React.FC<ColorGradeProps> = ({
   tone = "cool",
   contrast = 0.05,
   vignette = 0.35,
 }) => {
+  const safeContrast = Math.max(0, Math.min(1, contrast));
+  const safeVignette = Math.max(0, Math.min(1, vignette));
   const toneColor =
@@
-  {contrast > 0 && (
+  {safeContrast > 0 && (
     <div
       style={{
@@
-        opacity: contrast,
+        opacity: safeContrast,
@@
-  {vignette > 0 && (
+  {safeVignette > 0 && (
     <div
       style={{
@@
-        background: `radial-gradient(ellipse at 50% 50%, transparent 50%, rgba(0,0,0,${vignette}) 100%)`,
+        background: `radial-gradient(ellipse at 50% 50%, transparent 50%, rgba(0,0,0,${safeVignette}) 100%)`,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@video/src/components/ColorGrade.tsx` around lines 16 - 70: clamp contrast and vignette to the 0..1 range at the top of the component and use the clamped values in the JSX, keeping the tone/toneColor logic unchanged, so visual output stays stable if upstream values overshoot.

video/package.json (1)
9-10: Deduplicate the identical `render`/`build` commands. `build` can call `npm run render` to avoid command drift later.

Small cleanup

```diff
-  "render": "remotion render src/index.ts TreeDexVideo out/treedex.mp4",
-  "build": "remotion render src/index.ts TreeDexVideo out/treedex.mp4"
+  "render": "remotion render src/index.ts TreeDexVideo out/treedex.mp4",
+  "build": "npm run render"
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@video/package.json` around lines 9 - 10: change the "build" script to invoke the "render" script (e.g., "npm run render" or the package manager's equivalent) so the two entries stay DRY and future changes cannot drift apart.

video/src/components/Icons.tsx (1)
10-158: Consider extracting a shared base SVG wrapper. All icons repeat the same `svg` boilerplate (viewBox, stroke props, caps/joins). A tiny base component would reduce duplication and future edits.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@video/src/components/Icons.tsx` around lines 10 - 158: extract a small IconBase component that accepts size, color, and strokeWidth, forwards extra svg props, and renders children inside a single <svg> wrapper; update each icon to render its path/circle/line elements as children of IconBase, keeping the existing IconProps types so callers are unaffected.

video/src/constants/timing.ts (1)
5-14: Derive `TOTAL_FRAMES` from `ACT_DURATIONS` to avoid timeline drift. `TOTAL_FRAMES` duplicates the same source-of-truth math. Computing it from `ACT_DURATIONS` prevents accidental mismatch.

Suggested refactor

```diff
-export const TOTAL_FRAMES = 1800; // 60s
-
-// Act durations (frames) — used by <Series.Sequence durationInFrames={}>
 export const ACT_DURATIONS = {
   hero: 270, // 9s
   problem: 330, // 11s
   solution: 480, // 16s
   query: 360, // 12s
   closing: 360, // 12s
 } as const;
+
+export const TOTAL_FRAMES = Object.values(ACT_DURATIONS).reduce((sum, n) => sum + n, 0); // 60s
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@video/src/constants/timing.ts` around lines 5 - 14: compute TOTAL_FRAMES from ACT_DURATIONS (e.g., the sum of Object.values(ACT_DURATIONS)) so the timeline is derived from a single source of truth, keeping the export `as const` to preserve type behavior.

src/prompts.ts (1)
96-101: Resolve contradictory instructions in the image prompt.“Likely purpose” (Line 99) invites inference, but Line 101 forbids speculation. This conflict can reduce output consistency.
Suggested prompt adjustment
```diff
 return `Describe this image concisely in 1-2 sentences. Focus on:
 - What the image shows (diagram, chart, photo, table, etc.)
 - Key information visible (labels, data points, text)
-- Its likely purpose in a document context
+- Any explicit context visible in the image (titles, captions, legends)
 
 Be factual and specific. Do not speculate beyond what is visible.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/prompts.ts` around lines 96 - 101: the returned prompt asks for the image's "likely purpose" while also saying "Do not speculate beyond what is visible"; resolve the conflict by removing the "likely purpose" clause or constraining it to purposes explicitly evident in the image, so the instructions are consistent.

video/src/hooks/useProgress.ts (1)
56-67: Clamp Newton updates in `cubicBezier` for numerical stability. `x` can drift outside [0, 1] during iteration. Clamping prevents unstable outputs on edge control-point values.

Suggested tweak

```diff
-    x -= dx / dbx;
+    x = Math.max(0, Math.min(1, x - dx / dbx));
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@video/src/hooks/useProgress.ts` around lines 56 - 67: clamp x to [0, 1] after each Newton step in cubicBezier, keep the existing convergence checks, and ensure the returned value uses the clamped x so outputs remain stable for edge control-point values.

treedex/prompts.py (1)
83-89: Prompt guidance is internally inconsistent here as well. Line 86 asks for "likely purpose," while Line 88 disallows speculation. Harmonize these to improve response reliability.

Suggested prompt adjustment

```diff
 IMAGE_DESCRIPTION_PROMPT = """\
 Describe this image concisely in 1-2 sentences. Focus on:
 - What the image shows (diagram, chart, photo, table, etc.)
 - Key information visible (labels, data points, text)
-- Its likely purpose in a document context
+- Any explicit context visible in the image (titles, captions, legends)
 
 Be factual and specific. Do not speculate beyond what is visible.
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@treedex/prompts.py` around lines 83 - 89: the prompt asks for the image's "likely purpose" while forbidding speculation; either remove that bullet or replace it with constrained, evidence-based wording (e.g., state the purpose only when it is explicit or clearly implied by visible content) so the guidance is coherent.

treedex/pdf_parser.py (1)
61-62: Consider logging or narrowing the exception type. Silently swallowing all exceptions makes debugging difficult when image extraction fails unexpectedly. At minimum, log the error at debug level or catch a more specific exception (e.g., `RuntimeError`, `ValueError`).

♻️ Suggested improvement

```diff
-        except Exception:
-            continue
+        except (RuntimeError, ValueError, KeyError) as e:
+            # Log at debug level if logging is available
+            continue
```

Alternatively, if you have a logger available:

```python
except Exception as e:
    logger.debug(f"Failed to extract image xref={xref}: {e}")
    continue
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@treedex/pdf_parser.py` around lines 61 - 62: the bare `except Exception:` hides errors during image extraction; catch specific exceptions (e.g., RuntimeError, ValueError) or at minimum log the error at debug level, including the xref and the exception object, before continuing so failures are visible for debugging.

src/pdf-parser.ts (1)
74-76: Silent catch block hides extraction failures. Consider logging the error or catching a more specific error type to aid debugging when image extraction fails.

♻️ Suggested improvement

```diff
-      } catch {
-        // Ignore image extraction errors
-      }
+      } catch (err) {
+        // Image extraction failed for this page; continue without images
+        console.debug?.(`Image extraction failed for page ${i + 1}:`, err);
+      }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/pdf-parser.ts` around lines 74 - 76: the empty catch silently swallows image-extraction failures; catch the error object and log it with context (e.g., "image extraction failed" plus the page number and error message), or narrow the catch to the specific error types the extraction routine throws, so failures are visible for debugging.

treedex/loaders.py (1)
142-152: Consider documenting that `extract_images` only affects PDF loading. The `extract_images` parameter is only used when loading PDF files. For HTML and DOCX, image markers (`[Image: alt]` or `[Image]`) are always extracted. This asymmetry could be clarified in the docstring to avoid confusion.

📝 Suggested docstring improvement

```diff
 def auto_loader(path: str, extract_images: bool = False) -> list[dict]:
-    """Auto-detect file format and load pages."""
+    """Auto-detect file format and load pages.
+
+    Args:
+        path: Path to the document file.
+        extract_images: If True, extract images from PDFs. HTML and DOCX
+            loaders always emit [Image] markers regardless of this flag.
+    """
     ext = os.path.splitext(path)[1].lower()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@treedex/loaders.py` around lines 142 - 152: update the auto_loader docstring to state that extract_images is honored only for ".pdf" files (passed through to the PDF loader), while the HTML and DOCX loaders always emit image markers ("[Image: alt]" or "[Image]") regardless of the flag, so callers aren't confused by the asymmetry.

video/src/scenes/Act1Hero.tsx (1)
145-147: Hardcoded version string may become stale. The version badge displays `v0.1.4` as a hardcoded string. Consider importing this from a shared constant or `package.json` to keep it in sync with the actual package version.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@video/src/scenes/Act1Hero.tsx` around lines 145 - 147: replace the hardcoded "v0.1.4" in the version badge with a value sourced from package.json or a shared constants module, rendered as `v${version}`, so the UI stays in sync with the package version.

.github/workflows/render-video.yml (1)
24-27: Consider adding type validation for CRF input. The
`crf` input accepts any string value. Extremely low values (e.g., 0-5) could produce very large files or cause the render to fail, while values above ~51 are invalid for most codecs.

📝 Suggested validation step
Add a validation step before rendering:
```yaml
- name: Validate CRF
  run: |
    CRF="${{ github.event.inputs.crf || '18' }}"
    if ! [[ "$CRF" =~ ^[0-9]+$ ]] || [ "$CRF" -lt 0 ] || [ "$CRF" -gt 51 ]; then
      echo "::error::Invalid CRF value: $CRF (must be 0-51)"
      exit 1
    fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/render-video.yml around lines 24 - 27, The crf workflow input currently accepts any string and needs validation; add a pre-render step (e.g., a job step named "Validate CRF") that reads the crf input (use the same `crf` input name and default "18" if unset), verifies it is an integer and within 0–51, and fails the job with a clear error message if validation fails so invalid values (non-numeric, <0, or >51) are rejected before the renderer runs.

video/src/components/TerminalWindow.tsx (1)
259-260: Minor: Variable name `lastIsShell` is slightly misleading. The condition
`lastLine.text.startsWith("$") || lastLine.type !== "command"` means the prompt defaults to `$` for any non-command line (output, success, error), not just shell commands. The behavior is correct, but consider renaming to something like `useShellPrompt` for clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@video/src/components/TerminalWindow.tsx` around lines 259 - 260, Rename the misleading variable lastIsShell to useShellPrompt in TerminalWindow.tsx and update all its usages; keep the same condition (lastLine.text.startsWith("$") || lastLine.type !== "command") but assign prompt based on useShellPrompt (useShellPrompt ? "$ " : ">>> "). Also update any inline comments or tests referencing lastIsShell to use the new name to preserve clarity.

src/core.ts (1)
162-167: Custom loaders won't receive `extractImages` option. When a custom loader is provided,
`extractImages` is not forwarded to it. If this is intentional (custom loaders manage their own image extraction), consider documenting this behavior in the JSDoc. Otherwise, you may want to extend the `Loader` interface to accept options.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/core.ts` around lines 162 - 167, The custom loader path doesn't receive the extractImages option; update the code so when a custom loader is present you forward the same options object to loader.load (i.e., call loader.load(path, { extractImages })), and update the Loader interface/signature (and any implementations of load) to accept an options parameter (or alternatively add a JSDoc comment on the Loader type and load method clarifying that custom loaders are expected to handle image extraction themselves). Modify references to loader.load, the Loader interface, and any implementors to match the new signature or add the documentation consistently.

video/src/components/SceneTransition.tsx (2)
49-60: Scale transition may cause a visual discontinuity at midpoint. At exactly
`frame === duration / 2`, the scale abruptly switches from `enterScale` to `exitScale`. Since these scales end/start at `1`, the jump should be minimal, but consider using a unified interpolation across the full duration for smoother results.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@video/src/components/SceneTransition.tsx` around lines 49 - 60, The zoom branch in SceneTransition uses separate enterScale and exitScale and switches based on frame < duration / 2, causing a potential visual discontinuity at the midpoint; replace the two-step switch with a single unified interpolation for scale (e.g., compute a normalized progress from frame/duration or build one interpolate over [0, duration] that maps through the desired scale curve) so transform is assigned from one continuous scale value; update the code that references enterP/exitP/interpolate/frame/duration and set transform = `scale(${continuousScale})` to ensure smooth transition.
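A minimal sketch of what one continuous curve could look like — the 0.92/1.04 endpoint values and the `transitionScale` name are illustrative only; the real component would presumably route this through Remotion's `interpolate`:

```typescript
// Single continuous scale curve over the full transition duration.
// Both halves meet at exactly 1 when t = 0.5, so there is no jump.
function transitionScale(frame: number, duration: number): number {
  const t = Math.min(1, Math.max(0, frame / duration)); // normalized progress
  return t < 0.5
    ? 0.92 + (1 - 0.92) * (t / 0.5)   // enter: 0.92 -> 1
    : 1 + (1.04 - 1) * ((t - 0.5) / 0.5); // exit: 1 -> 1.04
}
```

Because the enter and exit segments share the value `1` at the midpoint, the rendered scale stays continuous for every frame.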
13-13: `"fade"` and `"wipe"` modes are declared but not implemented. The type allows
`"fade" | "zoom" | "wipe" | "blur"`, but only `zoom` and `blur` have specific implementations. `fade` falls through to base opacity (acceptable), but `wipe` has no special handling and falls through to the same behavior. Consider either implementing `wipe` or removing it from the type.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@video/src/components/SceneTransition.tsx` at line 13, The mode union in SceneTransition.tsx declares "wipe" but SceneTransition's render/animation logic only handles "zoom" and "blur" (with "fade" falling back to base opacity); either remove "wipe" from the mode type or implement its behavior: add a branch in the SceneTransition component's animation selection (where mode is switched/used) to handle mode === "wipe" and apply a wipe animation (e.g., animate a clipping/mask or width/translate to reveal the scene) using the same animation system used for "zoom"/"blur" so timing/easing and props like duration/ease are respected; update any related types/comments to keep the union and implementation in sync.

src/loaders.ts (1)
185-187: Consider documenting that `extractImages` only applies to PDFs. The
`extractImages` option is silently ignored for non-PDF files. This is correct behavior (other formats don't support binary image extraction), but a brief comment or JSDoc note would clarify this for API consumers.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/loaders.ts` around lines 185 - 187, Add a short comment/JSDoc near the code that checks ext and options (the block using ext === ".pdf" and options?.extractImages) clarifying that the extractImages option is only honored for PDFLoader and is ignored for other file types; mention PDFLoader and the options parameter by name so callers know extractImages applies exclusively to PDFs and will be silently ignored for non-PDF loaders.

treedex/core.py (1)
55-56: Consider logging suppressed exceptions for debugging. The broad
`except Exception` is acceptable given the graceful fallback, but silently swallowing all exceptions may hide unexpected issues during development. Consider logging at DEBUG level.

🔧 Optional: Add debug logging
```diff
+import logging
+
+logger = logging.getLogger(__name__)
+
 def _describe_images(pages: list[dict], llm=None, verbose: bool = False) -> None:
     ...
     elif llm is not None and getattr(llm, "supports_vision", False) and img.get("data"):
         try:
             desc = llm.generate_with_image(
                 IMAGE_DESCRIPTION_PROMPT,
                 img["data"],
                 img["mime_type"],
             )
             descriptions.append(f"[Image: {desc.strip()}]")
-        except Exception:
+        except Exception as e:
+            logger.debug("Image description failed: %s", e)
             descriptions.append("[Image present]")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@treedex/core.py` around lines 55 - 56, The except Exception: block that currently does descriptions.append("[Image present]") swallows all errors; update it to log the caught exception at DEBUG level before appending the fallback string. Replace the silent except with a logger.debug (or logger.exception) call that includes the exception info (e.g., logger.debug("Failed to extract image description", exc_info=True) or logger.exception(...)) and then append "[Image present]" as before; reference the existing except Exception and descriptions.append("[Image present]") locations when making the change.

video/src/components/FeatureHighlight.tsx (1)
12-12: Make the color prop contract explicit.
`Feature.color`, `leftColor`, and `rightColor` are plain strings, but these styles build translucent variants by appending hex alpha suffixes. That only works for hex input, so `rgb(...)`, `hsl(...)`, CSS variables, or named colors will generate invalid CSS here. Either constrain these props to hex colors or run them through a helper before applying transparency.

Also applies to: 194-195, 56-57, 231-232, 239-239, 257-258
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@video/src/components/FeatureHighlight.tsx` at line 12, The color props (Feature.color, leftColor, rightColor) are treated as hex colors but typed as plain strings; either tighten their type to only accept hex (e.g., enforce a hex string pattern on Feature.color/leftColor/rightColor) or sanitize/convert any incoming color to a translucent variant before appending alpha. Modify the Feature type and FeatureHighlight component to accept a constrained HexColor type or run colors through a helper (e.g., normalizeToHexOrRgba) that returns a valid rgba/hex with alpha, then use that normalized value wherever translucent variants are built (references: Feature.color, leftColor, rightColor, and the translucent-building sites in FeatureHighlight).

video/src/components/ParticleBackground.tsx (1)
24-25: `shape === 2` never renders as a line.
`shape` is randomized across three values, but the JSX only special-cases `shape === 1`; the documented `2=line` case currently falls through to the round particle. Either add the line styling or remove the third enum value/comment so the contract matches the visuals.

Also applies to: 56-57, 117-118
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@video/src/components/ParticleBackground.tsx` around lines 24 - 25, The component's shape prop (ParticleBackground -> shape) documents three values but only special-cases shape === 1 in the JSX, so shape === 2 falls back to the round particle; either implement the line variant or remove the third enum and randomization. Fix by updating the ParticleBackground render logic (the JSX that checks shape) to handle shape === 2 and apply line-specific markup/style (e.g., a thin rect/div with rotated transform or a CSS class like particle-line) so it visually renders as a line, or alternatively remove the "2=line" comment and change the randomization that sets shape to only produce 0 or 1 and adjust any shape consumers accordingly. Ensure you update all occurrences where shape is checked/rendered (the same conditional used in the particle JSX and any helper that randomizes shape).
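If the line variant is kept, the style branching might look something like this — `particleStyle` and the specific dimensions are hypothetical, not taken from the component:

```typescript
// Sketch: map the shape enum (0=round, 1=square, 2=line) to CSS-like styles.
function particleStyle(
  shape: number,
  size: number,
  rotation: number,
): { width: number; height: number; borderRadius?: string; transform?: string } {
  if (shape === 1) return { width: size, height: size }; // square
  if (shape === 2) {
    // Line: a thin, elongated rect rotated into place.
    return { width: size * 4, height: 1, transform: `rotate(${rotation}deg)` };
  }
  return { width: size, height: size, borderRadius: "50%" }; // default: round
}
```

With an explicit `shape === 2` branch, every randomized value produces a visually distinct particle and the enum comment matches the rendered output.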
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/llm-backends.ts`:
- Around line 89-91: The supportsVision getter is hard-coded to return true
causing wasted image requests; change it to be model-aware by making
supportsVision read from instance state (e.g., this.modelName or
this.opts.supportsVision) or derive it from a helper isVisionModel(modelName).
Update the getters at the three locations (the get supportsVision()
implementations around lines 89-91, 149-151, 225-227) to return a boolean based
on either an explicit constructor option (supportsVision) or on a small
isVisionModel(this.modelName) lookup, and add that option/lookup where the
backend classes are constructed so callers can override or the backend can
compute capability from the model name.
In `@src/pdf-parser.ts`:
- Around line 27-29: Replace the two hardcoded numeric OPS constants with the
exported OPS object from pdfjs and remove the deprecated JPEG-specific branch:
import OPS from pdfjs-dist (or import { OPS } from 'pdfjs-dist') and replace
uses of OPS_PAINT_IMAGE/OPS_PAINT_JPEG with OPS.paintImageXObject, delete any
checks against the old paintJpegXObject value, and simplify the image mime-type
assignment to use the single image path (e.g., set mime_type based on the
unified paintImageXObject handling or default to 'image/jpeg' if a fallback is
needed) in the code that handles image painting (look for the symbols
OPS_PAINT_IMAGE, OPS_PAINT_JPEG, and the mime_type assignment).
In `@treedex/llm_backends.py`:
- Around line 99-101: The supports_vision property must be model-aware: update
each backend class's supports_vision property in treedex/llm_backends.py (e.g.,
the OpenAI/Claude/Gemini backend classes that currently return True
unconditionally) so it inspects the configured model string (self.model or
equivalent) and returns True only for known vision-capable model names or
patterns (e.g., OpenAI: only "gpt-4o", "gpt-4-turbo", "gpt-4.1", "gpt-4v"
variants; Claude: only "claude-3" and above; Gemini: only general-purpose
current models), or allow an instance flag to override; implement this via model
string matching or a small provider-specific set/regex check so callers using
text-only model names (e.g., "gpt-4") get False and avoid generate_with_image
API calls that will fail.
In `@treedex/pdf_parser.py`:
- Around line 56-60: The image dict appended in extract_pages is missing the
alt_text field expected by downstream code (see core.py:44); update the
images.append call inside extract_pages to include "alt_text": f"[Embedded image
{img_index + 1} on page {page_num + 1}]" (use the same variables used to compute
img_index and page_num in that scope) so each image entry contains data,
mime_type, index_on_page, and alt_text matching the TypeScript convention.
In `@video/patches/patch-remotion.js`:
- Around line 26-33: The patch currently assumes the anchor "const cpus =
os.cpus();" exists and reports success even if replace() does nothing; add an
explicit anchor check and a post-replace verification: first define the anchor
string used for replacement (the literal "const cpus = os.cpus();") and if
content does not include that anchor, log an error and exit non-zero so the
patch fails loudly; then perform the replacement into a new variable (e.g.,
newContent) and only call fs.writeFileSync(target, newContent) and
console.log("Patched...") if newContent !== content (otherwise log/exit with an
error), using the existing symbols content, guard, target, replace,
fs.writeFileSync, and console.log to locate and implement the checks.
In `@video/scripts/generate-bgm.py`:
- Around line 92-93: The code calls write_wav(outpath, samples, SAMPLE_RATE) but
doesn't ensure the target directory exists, so create the parent directory for
outpath before calling write_wav; e.g., compute the parent directory from
outpath (using os.path.dirname or Path(outpath).parent) and call
os.makedirs(..., exist_ok=True) or Path(...).mkdir(parents=True, exist_ok=True)
right before write_wav to guarantee "public/audio" exists when write_wav is
invoked.
In `@video/scripts/generate-tts.sh`:
- Around line 3-4: Revoke/rotate the exposed Sarvam API key immediately and
remove it from the repo; then update the generate-tts.sh script to stop using
the hardcoded API_KEY and instead read the key from an environment variable
(e.g., SARVAM_API_KEY) while keeping API_URL as-is, and ensure CI/build secrets
are set accordingly; finally, purge the leaked secret from git history (use git
filter-repo/BFG or equivalent) and treat the old key as compromised.
In `@video/src/components/AnimatedTree.tsx`:
- Around line 333-342: The pulse dot is off the actual Bézier because x is
linearly interpolated while y uses the cubic formula; update the x calculation
to mirror the cubic interpolation used for y. Compute a mid control x (e.g.,
midX from edge.x1 and edge.x2 analogous to midY) and apply the same cubic Bézier
polynomial with t=edgeT and control points edge.x1, midX, edge.x2 so both x and
y follow the same curve; modify the code around the x/y computations in
AnimatedTree where edge, edgeT, x and y are calculated.
In `@video/src/components/FilmGrain.tsx`:
- Around line 14-33: FilmGrain currently computes seed as Math.floor(frame /
speed) and accepts opacity directly from props, which allows speed=0/negative
and out-of-range opacity; update the FilmGrain component to sanitize its props
at the top: clamp opacity to [0,1] (e.g., opacity = Math.min(1, Math.max(0,
opacity))) and enforce a safe minimum speed (e.g., speed = Math.max(1, speed))
before using useCurrentFrame and computing seed (seed = Math.floor(frame /
speed)); reference the FilmGrain props (opacity, speed) and the seed computation
to locate where to apply these guards.
In `@video/src/constants/tree-data.ts`:
- Around line 66-122: Several tree nodes have inverted index ranges (startIndex
> endIndex) which will break range logic; for each offending node (nodeId "0008"
title "Radio waves", "0010" "Infrared waves", "0011" "Visible rays", and "0013"
"X-rays") correct the startIndex/endIndex values so startIndex <= endIndex —
either swap the two values or set them to the intended correct numeric range
based on surrounding entries (e.g., align with adjacent nodes "0007"/"0009" and
"0012"/"0014") and ensure the final data for the functions that consume this
tree (e.g., any range-iteration in code referencing these nodeIds) uses the
fixed indices.
In `@video/src/hooks/useProgress.ts`:
- Around line 4-8: The useProgress hook divides by (end - start) without
guarding against invalid ranges; add a guard at the top of useProgress to handle
end <= start and avoid division by zero. Specifically, in the function
useProgress(start, end) (which uses useCurrentFrame()), return 0 while the frame
is before the range and 1 once at/after the range when end <= start (e.g., if
end <= start then return frame < start ? 0 : 1), otherwise proceed with the
existing clamped division (frame - start) / (end - start). Ensure the guard runs
before any division.
---
Nitpick comments:
In @.github/workflows/render-video.yml:
- Around line 24-27: The crf workflow input currently accepts any string and
needs validation; add a pre-render step (e.g., a job step named "Validate CRF")
that reads the crf input (use the same `crf` input name and default "18" if
unset), verifies it is an integer and within 0–51, and fails the job with a
clear error message if validation fails so invalid values (non-numeric, <0, or
>51) are rejected before the renderer runs.
In `@src/core.ts`:
- Around line 162-167: The custom loader path doesn't receive the extractImages
option; update the code so when a custom loader is present you forward the same
options object to loader.load (i.e., call loader.load(path, { extractImages })),
and update the Loader interface/signature (and any implementations of load) to
accept an options parameter (or alternatively add a JSDoc comment on the Loader
type and load method clarifying that custom loaders are expected to handle image
extraction themselves). Modify references to loader.load, the Loader interface,
and any implementors to match the new signature or add the documentation
consistently.
In `@src/loaders.ts`:
- Around line 185-187: Add a short comment/JSDoc near the code that checks ext
and options (the block using ext === ".pdf" and options?.extractImages)
clarifying that the extractImages option is only honored for PDFLoader and is
ignored for other file types; mention PDFLoader and the options parameter by
name so callers know extractImages applies exclusively to PDFs and will be
silently ignored for non-PDF loaders.
In `@src/pdf-parser.ts`:
- Around line 74-76: The empty catch in the image extraction block silently
swallows failures; update the try-catch around image extraction (the block that
currently reads "catch { // Ignore image extraction errors }") to catch the
error object (e.g., catch (err)) and log it (using the module's logger or
console.error) or narrow the catch to specific error types thrown by the image
extraction routine so failures are visible for debugging; ensure the log
includes context like "image extraction failed" plus the error message and any
relevant identifiers from the extraction function.
In `@src/prompts.ts`:
- Around line 96-101: The prompt string returned in src/prompts.ts contains
contradictory guidance: it asks for the image's "likely purpose" but also says
"Do not speculate beyond what is visible." Update that returned prompt string to
resolve the conflict by either removing the "likely purpose" clause or
clarifying it to only ask for purpose when it is explicitly evident in the image
(e.g., replace "Its likely purpose in a document context" with "If the image
explicitly indicates a purpose, state it; otherwise omit any inferred purpose").
Locate and edit the returned template string in src/prompts.ts to apply this
change so the instructions are consistent.
In `@treedex/core.py`:
- Around line 55-56: The except Exception: block that currently does
descriptions.append("[Image present]") swallows all errors; update it to log the
caught exception at DEBUG level before appending the fallback string. Replace
the silent except with a logger.debug (or logger.exception) call that includes
the exception info (e.g., logger.debug("Failed to extract image description",
exc_info=True) or logger.exception(...)) and then append "[Image present]" as
before; reference the existing except Exception and descriptions.append("[Image
present]") locations when making the change.
In `@treedex/loaders.py`:
- Around line 142-152: The docstring for auto_loader currently says "Auto-detect
file format and load pages." but doesn't clarify that the extract_images flag
only applies to PDF files; update the auto_loader docstring to state that
extract_images is honored only when ext == ".pdf" (used by PDFLoader with
extract_images=True), while HTML and DOCX loaders always extract image markers
(e.g., "[Image: alt]" or "[Image]"); mention the default behavior and effect of
extract_images to avoid confusion for callers of auto_loader and to reference
PDFLoader when describing the PDF-specific behavior.
In `@treedex/pdf_parser.py`:
- Around line 61-62: The current bare "except Exception:" in
treedex/pdf_parser.py hides errors during image extraction; update the handler
in the image extraction routine (e.g., the function handling xref
extraction—look for variables named xref or a function like
extract_images/parse_pdf) to either catch specific exceptions (e.g., except
(RuntimeError, ValueError) as e) or at minimum log the error before continuing;
ensure you obtain a logger (or use the module logger) and call
logger.debug/failure message including xref and the exception object so failures
are visible for debugging.
In `@treedex/prompts.py`:
- Around line 83-89: The prompt text starting with "Describe this image
concisely..." is internally inconsistent because it asks for the image's "likely
purpose" while also forbidding speculation; update that prompt string in
treedex/prompts.py so it is consistent: either remove the phrase "Its likely
purpose in a document context" (to forbid inference) or replace it with a
constrained, evidence-based wording such as "If the purpose is explicit or
clearly implied by visible content, state it" (to allow limited, non-speculative
inference). Locate and edit the prompt string that contains the bullets "What
the image shows..." and "Key information visible..." and adjust the third bullet
accordingly so the guidance is coherent.
In `@video/package.json`:
- Around line 9-10: The package.json has duplicate scripts "render" and "build"
that run the same remotion command; change the "build" script to invoke the
"render" script instead (e.g., set build to run "npm run render" or the
project's package manager equivalent) so future changes stay DRY and avoid
command drift; update the "build" script entry referencing the "render" script
name.
In `@video/scripts/generate-bgm.py`:
- Around line 90-95: The top-level execution in generate-bgm.py runs when the
module is imported; wrap the entrypoint lines that call generate_bgm(),
write_wav(), and the prints inside an if __name__ == "__main__": guard so
importing the module (for tests or tooling) won't generate files—move the lines
that call generate_bgm(), compute outpath, call write_wav(outpath, samples,
SAMPLE_RATE), getsize and the final print into that guarded block and preserve
references to SAMPLE_RATE and DURATION.
In `@video/src/components/ColorGrade.tsx`:
- Around line 16-70: The ColorGrade component accepts contrast and vignette
props that can exceed visual bounds; clamp both values to the 0..1 range at the
top of the ColorGrade function (e.g., const clampedContrast = Math.max(0,
Math.min(1, contrast)) and const clampedVignette = Math.max(0, Math.min(1,
vignette))) and then use clampedContrast and clampedVignette in the JSX where
contrast and vignette are currently used (keep tone/toneColor logic unchanged);
this ensures stable visual output if upstream values overshoot.
In `@video/src/components/FeatureHighlight.tsx`:
- Line 12: The color props (Feature.color, leftColor, rightColor) are treated as
hex colors but typed as plain strings; either tighten their type to only accept
hex (e.g., enforce a hex string pattern on Feature.color/leftColor/rightColor)
or sanitize/convert any incoming color to a translucent variant before appending
alpha. Modify the Feature type and FeatureHighlight component to accept a
constrained HexColor type or run colors through a helper (e.g.,
normalizeToHexOrRgba) that returns a valid rgba/hex with alpha, then use that
normalized value wherever translucent variants are built (references:
Feature.color, leftColor, rightColor, and the translucent-building sites in
FeatureHighlight).
In `@video/src/components/FloatingCards.tsx`:
- Around line 30-35: The component FloatingCards currently uses floatProgress
and shatterProgress raw, which can exceed [0,1] and cause unstable opacity/scale
visuals; clamp both props as soon as they are received (e.g., inside the
FloatingCards function) to the [0,1] range using a simple clamp (Math.max(0,
Math.min(1, value))) or a small clamp helper, and then use those clamped values
everywhere (including any calculations referenced around the float/shatter math
and the code paths noted near the blocks that compute opacity/scale) so
downstream math at places like the float/shatter transforms always receives
values bounded to [0,1].
In `@video/src/components/Icons.tsx`:
- Around line 10-158: The icon components (e.g., IconScissors, IconArrowRight,
IconDatabase, IconSearch, etc.) duplicate identical SVG wrapper props (viewBox,
width/height, fill, stroke, strokeWidth, strokeLinecap, strokeLinejoin); extract
a small IconBase component that accepts size, color, strokeWidth and forwards
any extra svg props and children into a single <svg> wrapper, update each icon
(IconScissors, IconArrowRight, IconDatabase, IconDocument, IconTree, IconMapPin,
IconQuestion, IconCheckCircle, IconSparkles, IconBolt, IconCpu, IconPackage,
IconHash, IconGitBranch) to render their path/circle/line elements as children
of IconBase, and keep existing IconProps types so callers are unaffected.
In `@video/src/components/ParticleBackground.tsx`:
- Around line 24-25: The component's shape prop (ParticleBackground -> shape)
documents three values but only special-cases shape === 1 in the JSX, so shape
=== 2 falls back to the round particle; either implement the line variant or
remove the third enum and randomization. Fix by updating the ParticleBackground
render logic (the JSX that checks shape) to handle shape === 2 and apply
line-specific markup/style (e.g., a thin rect/div with rotated transform or a
CSS class like particle-line) so it visually renders as a line, or alternatively
remove the "2=line" comment and change the randomization that sets shape to only
produce 0 or 1 and adjust any shape consumers accordingly. Ensure you update all
occurrences where shape is checked/rendered (the same conditional used in the
particle JSX and any helper that randomizes shape).
In `@video/src/components/SceneTransition.tsx`:
- Around line 49-60: The zoom branch in SceneTransition uses separate enterScale
and exitScale and switches based on frame < duration / 2, causing a potential
visual discontinuity at the midpoint; replace the two-step switch with a single
unified interpolation for scale (e.g., compute a normalized progress from
frame/duration or build one interpolate over [0, duration] that maps through the
desired scale curve) so transform is assigned from one continuous scale value;
update the code that references enterP/exitP/interpolate/frame/duration and set
transform = `scale(${continuousScale})` to ensure smooth transition.
- Line 13: The mode union in SceneTransition.tsx declares "wipe" but
SceneTransition's render/animation logic only handles "zoom" and "blur" (with
"fade" falling back to base opacity); either remove "wipe" from the mode type or
implement its behavior: add a branch in the SceneTransition component's
animation selection (where mode is switched/used) to handle mode === "wipe" and
apply a wipe animation (e.g., animate a clipping/mask or width/translate to
reveal the scene) using the same animation system used for "zoom"/"blur" so
timing/easing and props like duration/ease are respected; update any related
types/comments to keep the union and implementation in sync.
In `@video/src/components/TerminalWindow.tsx`:
- Around line 259-260: Rename the misleading variable lastIsShell to
useShellPrompt in TerminalWindow.tsx and update all its usages; keep the same
condition (lastLine.text.startsWith("$") || lastLine.type !== "command") but
assign prompt based on useShellPrompt (useShellPrompt ? "$ " : ">>> "). Also
update any inline comments or tests referencing lastIsShell to use the new name
to preserve clarity.
In `@video/src/constants/timing.ts`:
- Around line 5-14: TOTAL_FRAMES is hardcoded and duplicates the sum of
ACT_DURATIONS, which risks drift; change TOTAL_FRAMES to be computed from
ACT_DURATIONS (e.g., sum of Object.values(ACT_DURATIONS)) so the timeline is
derived from the single source of truth. Update the declaration that currently
exports TOTAL_FRAMES to compute the total from ACT_DURATIONS (referencing the
ACT_DURATIONS constant and TOTAL_FRAMES symbol) and keep the export as const to
preserve type behavior.
In `@video/src/hooks/useProgress.ts`:
- Around line 56-67: The Newton iteration in cubicBezier allows x to drift
outside [0,1], causing unstable outputs for edge control points; update the loop
in cubicBezier to clamp x to the [0,1] range after each Newton step (and before
using it to compute bx/dbx and the final y), e.g., compute dx/dbx, set x -=
dx/dbx, then if (x < 0) x = 0; if (x > 1) x = 1; keep the existing convergence
checks and ensure the returned value uses the clamped x so outputs remain
stable.
In `@video/src/scenes/Act1Hero.tsx`:
- Around line 145-147: Replace the hardcoded "v0.1.4" string in the Act1Hero
component's version badge div with a sourced version constant (for example
import { version } from '../../package.json' or from a shared constants module)
and render it as `v${version}`; update the JSX in the version badge element
inside the Act1Hero component to use that imported variable instead of the
literal so the UI stays in sync with the package version.
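To make one of the suggestions above concrete, here is a sketch of the guarded `useProgress` math with the degenerate range handled before any division — written as a plain function for illustration; the real hook would read the frame via Remotion's `useCurrentFrame()` instead of a parameter:

```typescript
// Progress in [0, 1] for a frame within [start, end], guarding end <= start.
function progressAt(frame: number, start: number, end: number): number {
  if (end <= start) {
    // Degenerate or inverted range: treat as an instantaneous step at `start`.
    return frame < start ? 0 : 1;
  }
  const p = (frame - start) / (end - start);
  return Math.min(1, Math.max(0, p)); // clamp to [0, 1]
}
```

The guard runs before the division, so `end === start` can never produce `NaN` or `Infinity` downstream.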
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 73b547d1-285e-4c4a-bd00-585beade22b9
⛔ Files ignored due to path filters (7)
- `video/package-lock.json` is excluded by `!**/package-lock.json`
- `video/public/audio/bgm_ambient.wav` is excluded by `!**/*.wav`
- `video/public/audio/vo_act1.wav` is excluded by `!**/*.wav`
- `video/public/audio/vo_act2.wav` is excluded by `!**/*.wav`
- `video/public/audio/vo_act3.wav` is excluded by `!**/*.wav`
- `video/public/audio/vo_act4.wav` is excluded by `!**/*.wav`
- `video/public/audio/vo_act5.wav` is excluded by `!**/*.wav`
📒 Files selected for processing (45)
- `.github/workflows/render-video.yml`
- `src/core.ts`
- `src/index.ts`
- `src/llm-backends.ts`
- `src/loaders.ts`
- `src/pdf-parser.ts`
- `src/prompts.ts`
- `src/types.ts`
- `treedex/core.py`
- `treedex/llm_backends.py`
- `treedex/loaders.py`
- `treedex/pdf_parser.py`
- `treedex/prompts.py`
- `video/package.json`
- `video/patches/patch-remotion.js`
- `video/remotion.config.ts`
- `video/scripts/generate-bgm.py`
- `video/scripts/generate-tts.sh`
- `video/src/Root.tsx`
- `video/src/TreeDexVideo.tsx`
- `video/src/components/AnimatedTree.tsx`
- `video/src/components/ColorGrade.tsx`
- `video/src/components/FeatureHighlight.tsx`
- `video/src/components/FilmGrain.tsx`
- `video/src/components/FloatingCards.tsx`
- `video/src/components/GlitchEffect.tsx`
- `video/src/components/GlowOrb.tsx`
- `video/src/components/Icons.tsx`
- `video/src/components/ParticleBackground.tsx`
- `video/src/components/SceneTransition.tsx`
- `video/src/components/StepFlow.tsx`
- `video/src/components/TerminalWindow.tsx`
- `video/src/components/TextReveal.tsx`
- `video/src/constants/colors.ts`
- `video/src/constants/timing.ts`
- `video/src/constants/tree-data.ts`
- `video/src/hooks/useProgress.ts`
- `video/src/hooks/useTreeLayout2D.ts`
- `video/src/index.ts`
- `video/src/scenes/Act1Hero.tsx`
- `video/src/scenes/Act2Problem.tsx`
- `video/src/scenes/Act3Solution.tsx`
- `video/src/scenes/Act4Query.tsx`
- `video/src/scenes/Act5Closing.tsx`
- `video/tsconfig.json`
```ts
get supportsVision(): boolean {
  return true;
}
```
🧩 Analysis chain
🏁 Script executed:

```sh
# First, let's examine the src/llm-backends.ts file to understand the structure and find the relevant lines
wc -l src/llm-backends.ts
```

Repository: mithun50/TreeDex
Length of output: 83

🏁 Script executed:

```sh
# Read the entire file to understand the implementation
cat -n src/llm-backends.ts
```

Repository: mithun50/TreeDex
Length of output: 25884

🏁 Script executed:

```sh
# Also check src/core.ts to understand how supportsVision is used
cat -n src/core.ts
```

Repository: mithun50/TreeDex
Length of output: 13819
supportsVision should be model-aware, not hard-coded by backend class.
These getters always return true, even when callers override modelName with a text-only model. While generateWithImage() errors are caught with graceful fallback in src/core.ts (line 55), this causes unnecessary API calls and errors for text-only model configurations. Please make this configurable per instance or derive it from the chosen model to avoid wasted requests.
Also applies to: 149-151, 225-227
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/llm-backends.ts` around lines 89 - 91, The supportsVision getter is
hard-coded to return true causing wasted image requests; change it to be
model-aware by making supportsVision read from instance state (e.g.,
this.modelName or this.opts.supportsVision) or derive it from a helper
isVisionModel(modelName). Update the getters at the three locations (the get
supportsVision() implementations around lines 89-91, 149-151, 225-227) to return
a boolean based on either an explicit constructor option (supportsVision) or on
a small isVisionModel(this.modelName) lookup, and add that option/lookup where
the backend classes are constructed so callers can override or the backend can
compute capability from the model name.
```ts
// pdfjs OPS constants for image painting
const OPS_PAINT_IMAGE = 85; // paintImageXObject
const OPS_PAINT_JPEG = 82; // paintJpegXObject
```
🧩 Analysis chain
🌐 Web query:
pdfjs-dist OPS constants export paintImageXObject paintJpegXObject
💡 Result:
In pdfjs-dist, OPS is an object containing constants for PDF operator names used in operator lists from page.getOperatorList. These are referenced as pdfjsLib.OPS.paintImageXObject and pdfjsLib.OPS.paintJpegXObject in code examples. OPS.paintImageXObject is used for painting general image XObjects (Do operator with image subtype). OPS.paintJpegXObject was used for JPEG XObjects but marked deprecated around 2021 (PR #14069, merged Sep 2021) as it's no longer used, now handled by paintImageXObject or paintXObject. They are exported in pdfjs-dist/build/pdf.js (and legacy builds) as part of PDFJS.OPS, accessible after importing pdfjs-dist. Source code is in mozilla/pdf.js repository (src/shared/util.js or similar defines OPS). Build generates pdfjs-dist with minified/concatenated code where OPS constants are available. Example usage for image extraction: const opList = await page.getOperatorList; for (let i = 0; i < opList.fnArray.length; i++) { if (opList.fnArray[i] === pdfjsLib.OPS.paintImageXObject || opList.fnArray[i] === pdfjsLib.OPS.paintJpegXObject) { const imgObj = opList.argsArray[i][0]; const img = await page.objs.get(imgObj); // process img } } OPS is not publicly documented with full list/values (internal), but commonly used for operator matching. Recent versions (up to v5.x as of 2026) maintain these for compatibility.
Citations:
- 1: How to get the image base64 data. mozilla/pdf.js#13538
- 2: https://npmdoc.github.io/node-npmdoc-pdfjs-dist/build/apidoc.html
- 3: https://stackoverflow.com/questions/40378990/extract-images-from-a-single-pdf-page-using-pdfjs
- 4: https://github.com/mozilla/pdfjs-dist
- 5: There is an issue with using pdfjs dist to obtain PDF coordinate data mozilla/pdf.js#18410
- 6: https://github.com/mozilla/pdf.js/blob/master/src/display/api.js
- 7: https://docs.google.com/spreadsheets/d/1LDbsAwrlC5OM5FPefD4sMDDAhls6OukoZvkWHaWyMUo/edit?usp=sharing
- 8: https://github.com/mozilla/pdf.js/
🏁 Script executed:

```sh
# Check pdfjs-dist version and how the constants are used
fd -t f "package.json" | head -5 | xargs grep -l "pdfjs-dist" 2>/dev/null | head -1
```

Repository: mithun50/TreeDex
Length of output: 72

🏁 Script executed:

```sh
# Find and read src/pdf-parser.ts to see full context
cat -n src/pdf-parser.ts | head -100
```

Repository: mithun50/TreeDex
Length of output: 3761

🏁 Script executed:

```sh
# Search for usage of OPS_PAINT_IMAGE and OPS_PAINT_JPEG
rg "OPS_PAINT_IMAGE|OPS_PAINT_JPEG" src/pdf-parser.ts -A 2 -B 2
```

Repository: mithun50/TreeDex
Length of output: 763

🏁 Script executed:

```sh
cat package.json | grep -A 2 -B 2 "pdfjs-dist"
```

Repository: mithun50/TreeDex
Length of output: 168

🏁 Script executed:

```sh
# Also check if OPS is actually exported from the legacy build
rg "OPS\." src/pdf-parser.ts
```

Repository: mithun50/TreeDex
Length of output: 42
Replace hardcoded OPS constants with exported constants from pdfjs.
The hardcoded operator constants (85, 82) should use the exported OPS object from pdfjs-dist for maintainability and correctness. Additionally, paintJpegXObject (82) has been deprecated since 2021 and is no longer used in current versions; all images now use paintImageXObject. Update the code to use the OPS object directly and remove the obsolete JPEG check.
♻️ Suggested fix

```diff
  const doc = await pdfjs.getDocument({ data }).promise;
- // pdfjs OPS constants for image painting
- const OPS_PAINT_IMAGE = 85; // paintImageXObject
- const OPS_PAINT_JPEG = 82; // paintJpegXObject
+ const { OPS } = pdfjs;
```

Then update the comparison:

```diff
- if (op === OPS_PAINT_IMAGE || op === OPS_PAINT_JPEG) {
+ if (op === OPS.paintImageXObject) {
```

And simplify the mime_type assignment:

```diff
- mime_type: op === OPS_PAINT_JPEG ? "image/jpeg" : "image/unknown",
+ mime_type: "image/jpeg",
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
const { OPS } = pdfjs;
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/pdf-parser.ts` around lines 27 - 29, Replace the two hardcoded numeric
OPS constants with the exported OPS object from pdfjs and remove the deprecated
JPEG-specific branch: import OPS from pdfjs-dist (or import { OPS } from
'pdfjs-dist') and replace uses of OPS_PAINT_IMAGE/OPS_PAINT_JPEG with
OPS.paintImageXObject, delete any checks against the old paintJpegXObject value,
and simplify the image mime-type assignment to use the single image path (e.g.,
set mime_type based on the unified paintImageXObject handling or default to
'image/jpeg' if a fallback is needed) in the code that handles image painting
(look for the symbols OPS_PAINT_IMAGE, OPS_PAINT_JPEG, and the mime_type
assignment).
```python
@property
def supports_vision(self) -> bool:
    return True
```
🧩 Analysis chain
🏁 Script executed:

```sh
# First, locate and examine the llm_backends.py file
find . -name "llm_backends.py" -type f
```

Repository: mithun50/TreeDex
Length of output: 85

🏁 Script executed:

```sh
# Check the file size to understand how to read it
wc -l treedex/llm_backends.py
```

Repository: mithun50/TreeDex
Length of output: 87

🏁 Script executed:

```sh
# Examine the specific lines mentioned in the review
sed -n '95,105p' treedex/llm_backends.py
sed -n '140,150p' treedex/llm_backends.py
sed -n '195,205p' treedex/llm_backends.py
```

Repository: mithun50/TreeDex
Length of output: 1091

🏁 Script executed:

```sh
# Look at the class definitions and model_name configuration
sed -n '1,50p' treedex/llm_backends.py
```

Repository: mithun50/TreeDex
Length of output: 1993

🏁 Script executed:

```sh
# Check how model_name is defined for each backend class
rg "class (BaseLLM|GeminiLLM|OpenAILLM|ClaudeLLM)" treedex/llm_backends.py -A 10
```

Repository: mithun50/TreeDex
Length of output: 1161

🏁 Script executed:

```sh
# Check the generate_with_image implementations for each backend
sed -n '99,125p' treedex/llm_backends.py
sed -n '143,175p' treedex/llm_backends.py
sed -n '197,230p' treedex/llm_backends.py
```

Repository: mithun50/TreeDex
Length of output: 2735

🏁 Script executed:

```sh
# Check how treedex/core.py uses supports_vision
find . -name "core.py" -type f | grep treedex
```

Repository: mithun50/TreeDex
Length of output: 77

🏁 Script executed:

```sh
# Check how core.py uses supports_vision
rg "supports_vision" treedex/core.py -B 3 -A 3
```

Repository: mithun50/TreeDex
Length of output: 404

🏁 Script executed:

```sh
# Check the context around image handling in core.py
rg "generate_with_image|Image present|fallback" treedex/core.py -B 2 -A 2
```

Repository: mithun50/TreeDex
Length of output: 579

🏁 Script executed:

```sh
# Check if there's any model validation or filtering for vision support
rg "model_name|supports_vision" treedex/llm_backends.py | head -30
```

Repository: mithun50/TreeDex
Length of output: 927

🏁 Script executed:

```sh
# Check all backends to see if any have conditional vision support
sed -n '50,700p' treedex/llm_backends.py | grep -A 2 "supports_vision"
```

Repository: mithun50/TreeDex
Length of output: 366
🌐 Web query:
OpenAI GPT-4 models vision support capabilities which models support images
💡 Result:
The OpenAI GPT-4 family models that support vision (image input analysis) include GPT-4o, GPT-4 Turbo (gpt-4-turbo), GPT-4.1 (and variants like gpt-4.1-mini), and GPT-4V (early vision version of GPT-4). Base GPT-4 (gpt-4) does not support images. Newer models like GPT-5 series and o-series also support vision, but focusing on GPT-4 lineage: GPT-4o explicitly accepts text and image inputs for analysis. GPT-4 Turbo with Vision (gpt-4-turbo) offers image-to-text capabilities via Chat Completions API. GPT-4.1 family supports vision tokenization and processing. GPT-4V enables image analysis. Vision models process images for understanding objects, text in images, shapes, colors, etc., via APIs like Chat Completions or Responses. Image generation is handled by separate models like gpt-image-1, not core vision analysis.
Citations:
- 1: https://help.openai.com/en/articles/8555496-gpt-4-vision-api
- 2: https://platform.openai.com/docs/guides/images-vision
- 3: https://developers.openai.com/api/docs/models/gpt-4o
- 4: https://platform.openai.com/docs/models/gpt-4o/
- 5: https://developers.openai.com/api/docs/models/gpt-4-turbo
- 6: https://platform.openai.com/docs/models/gpt-4
- 7: https://openai.com/index/gpt-4-1/
🌐 Web query:
Google Gemini vision support all models or specific versions
💡 Result:
Google Gemini vision (multimodal image input support) is available for all current general-purpose Gemini models, but not universally across all specialized models. The official Gemini API vision documentation explicitly states: "All Gemini model versions are multimodal and can be utilized in a wide range of image processing and computer vision tasks". Examples include Gemini 3 series (e.g., Gemini 3 Pro, 3.1 Pro, 3 Flash), Gemini 2.5 series (e.g., 2.5 Pro, 2.5 Flash), and earlier active models like 1.5 series. Deprecated models like gemini-1.0-pro-vision no longer support it (retired 2024-2025). Specialized models like Gemini Embedding (text-only legacy) or pure audio/TTS do not support vision input, though newer Gemini Embedding 2 Preview is multimodal. Image generation output is limited to specific variants (e.g., Gemini X Flash Image, Nano Banana). As of 2026-03-22, all active non-specialized generative models support vision input via the unified generateContent API.
Citations:
- 1: https://ai.google.dev/gemini-api/docs/vision
- 2: https://ai.google.dev/gemini-api/docs/models
- 3: https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/2-5-pro
- 4: https://ai.google.dev/gemini-api/docs/models/gemini-2.5-pro
- 5: https://cloud.google.com/vertex-ai/generative-ai/docs/deprecations/gemini-1.0-pro
- 6: https://docs.cloud.google.com/vertex-ai/generative-ai/docs/models
🌐 Web query:
Anthropic Claude vision support all models or specific versions
💡 Result:
Anthropic's Claude vision support is available for specific versions, not all models. Vision capabilities were introduced with the Claude 3 family and continue in all current Claude 4 models. Older models like Claude 2 do not support vision.
Citations:
- 1: https://platform.claude.com/docs/en/about-claude/models/overview
- 2: https://anthropic.mintlify.app/en/docs/build-with-claude/vision
- 3: https://www.anthropic.com/news/claude-3-family
- 4: https://docs.anthropic.com/en/docs/resources/claude-3-model-card
- 5: https://docs.cloud.google.com/vertex-ai/generative-ai/docs/partner-models/claude/haiku-3
- 6: https://cloud.google.com/vertex-ai/generative-ai/docs/partner-models/claude/haiku-3-5
Make supports_vision model-aware, not backend-class-aware.
These properties return True for any model string passed into each backend, but vision support varies by model:
- OpenAI: Only GPT-4o, GPT-4 Turbo, GPT-4.1, and GPT-4V support images; base GPT-4 does not.
- Claude: Only Claude 3+ models support vision; Claude 2 and earlier do not.
- Gemini: All current general-purpose models support vision, but specialized/deprecated models do not.
If a caller configures a text-only model (e.g., OpenAILLM(model="gpt-4")), supports_vision will still return True, causing treedex/core.py to attempt image processing via generate_with_image(), which will fail with a provider error. While the exception handler falls back to [Image present], this is an unintended failure path that wastes an API call.
Implement model-aware detection—either by checking the model string against known vision-capable models for each provider, making it instance-configurable, or detecting capability at runtime.
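As a minimal sketch of the first two options combined (an explicit constructor override plus a model-name lookup), assuming hypothetical class and prefix names; the prefix lists below are illustrative and would need to track each provider's current model lineup:

```python
from typing import Optional

# Illustrative prefixes of model names known to accept image input.
VISION_MODEL_PREFIXES = {
    "openai": ("gpt-4o", "gpt-4-turbo", "gpt-4.1", "gpt-4v"),
    "claude": ("claude-3", "claude-4"),
    "gemini": ("gemini-1.5", "gemini-2"),
}


def is_vision_model(provider: str, model_name: str) -> bool:
    """Return True only for model names known to accept image input."""
    prefixes = VISION_MODEL_PREFIXES.get(provider, ())
    return model_name.lower().startswith(prefixes)


class OpenAIBackend:  # hypothetical shape, not the repository's actual class
    def __init__(self, model: str, supports_vision: Optional[bool] = None):
        self.model = model
        # An explicit constructor override wins; otherwise derive from the model.
        self._supports_vision = supports_vision

    @property
    def supports_vision(self) -> bool:
        if self._supports_vision is not None:
            return self._supports_vision
        return is_vision_model("openai", self.model)
```

With this shape, `OpenAIBackend("gpt-4")` reports no vision support (so core skips `generate_with_image` entirely), while a caller running a model the lookup does not know about can still pass `supports_vision=True`.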
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@treedex/llm_backends.py` around lines 99 - 101, The supports_vision property
must be model-aware: update each backend class's supports_vision property in
treedex/llm_backends.py (e.g., the OpenAI/Claude/Gemini backend classes that
currently return True unconditionally) so it inspects the configured model
string (self.model or equivalent) and returns True only for known vision-capable
model names or patterns (e.g., OpenAI: only "gpt-4o", "gpt-4-turbo", "gpt-4.1",
"gpt-4v" variants; Claude: only "claude-3" and above; Gemini: only
general-purpose current models), or allow an instance flag to override;
implement this via model string matching or a small provider-specific set/regex
check so callers using text-only model names (e.g., "gpt-4") get False and avoid
generate_with_image API calls that will fail.
```python
images.append({
    "data": base64.b64encode(img_bytes).decode("ascii"),
    "mime_type": mime_type,
    "index_on_page": img_index,
})
```
🧩 Analysis chain
🏁 Script executed:

```sh
#!/bin/bash
# Check if Python code elsewhere expects alt_text on image dicts
rg -n "alt_text" --type py
```

Repository: mithun50/TreeDex
Length of output: 128

🏁 Script executed:

```sh
cat -n treedex/core.py | sed -n '35,55p'
```

Repository: mithun50/TreeDex
Length of output: 958

🏁 Script executed:

```sh
cat -n treedex/pdf_parser.py | sed -n '48,70p'
```

Repository: mithun50/TreeDex
Length of output: 1172

🏁 Script executed:

```sh
# Check the TypeScript version to see the alt_text format
git show HEAD:src/pdf-parser.ts | sed -n '55,75p'
```

Repository: mithun50/TreeDex
Length of output: 828
Add alt_text field to image entries in extract_pages.
The image dict at lines 56–60 is missing the alt_text field, which is expected by downstream code in core.py:44. The TypeScript implementation pre-generates this as [Embedded image {imgIndex + 1} on page {pageNum + 1}]. Without it, Python silently falls back to LLM-based image descriptions, creating inconsistency between implementations and reducing efficiency.
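A sketch of the corrected dict construction, pulled out into a standalone helper so the shape is easy to check (the helper name and parameters are hypothetical; the f-string mirrors the TypeScript `[Embedded image {n} on page {m}]` convention described above):

```python
import base64


def build_image_entry(img_bytes: bytes, mime_type: str,
                      img_index: int, page_num: int) -> dict:
    """Build one image entry including the alt_text field that the
    downstream description logic expects. Indices are zero-based in the
    extraction loop, so the human-readable alt text adds 1 to each."""
    return {
        "data": base64.b64encode(img_bytes).decode("ascii"),
        "mime_type": mime_type,
        "index_on_page": img_index,
        "alt_text": f"[Embedded image {img_index + 1} on page {page_num + 1}]",
    }
```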
Current code (lines 56–60)

```python
images.append({
    "data": base64.b64encode(img_bytes).decode("ascii"),
    "mime_type": mime_type,
    "index_on_page": img_index,
})
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@treedex/pdf_parser.py` around lines 56 - 60, The image dict appended in
extract_pages is missing the alt_text field expected by downstream code (see
core.py:44); update the images.append call inside extract_pages to include
"alt_text": f"[Embedded image {img_index + 1} on page {page_num + 1}]" (use the
same variables used to compute img_index and page_num in that scope) so each
image entry contains data, mime_type, index_on_page, and alt_text matching the
TypeScript convention.
```js
if (!content.includes(guard)) {
  content = content.replace(
    "const cpus = os.cpus();",
    `const cpus = os.cpus();\n  ${guard}`
  );
  fs.writeFileSync(target, content);
  console.log("Patched @remotion/renderer for Android compatibility.");
} else {
```
Avoid silent no-op patch application.
If the anchor string changes upstream, replace() won’t modify content but the script still reports success. Add an explicit anchor check and fail clearly.
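The fail-loudly pattern can be sketched as a standalone helper; this Python version is illustrative only (the JS patch script itself is shown in the hardening diff), with hypothetical names:

```python
def apply_guard_patch(content: str, anchor: str, guard: str) -> str:
    """Insert `guard` on the line after `anchor`, failing loudly instead
    of silently no-opping when the anchor is missing. Idempotent: returns
    the content unchanged if the guard is already present."""
    if guard in content:
        return content  # already patched
    if anchor not in content:
        raise RuntimeError(
            "Patch anchor not found; upstream file format may have changed.")
    patched = content.replace(anchor, f"{anchor}\n  {guard}")
    if patched == content:
        raise RuntimeError("Patch produced no change.")
    return patched
```

The two explicit failure modes (missing anchor, ineffective replace) are exactly the cases the review flags as silent in the current script.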
Suggested hardening

```diff
 let content = fs.readFileSync(target, "utf8");
 const guard = "if (!cpus || cpus.length === 0) return;";
+const anchor = "const cpus = os.cpus();";
 if (!content.includes(guard)) {
+  if (!content.includes(anchor)) {
+    console.error("Patch anchor not found. Remotion file format may have changed.");
+    process.exit(1);
+  }
   content = content.replace(
-    "const cpus = os.cpus();",
-    `const cpus = os.cpus();\n  ${guard}`
+    anchor,
+    `${anchor}\n  ${guard}`
   );
   fs.writeFileSync(target, content);
   console.log("Patched @remotion/renderer for Android compatibility.");
 } else {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@video/patches/patch-remotion.js` around lines 26 - 33, The patch currently
assumes the anchor "const cpus = os.cpus();" exists and reports success even if
replace() does nothing; add an explicit anchor check and a post-replace
verification: first define the anchor string used for replacement (the literal
"const cpus = os.cpus();") and if content does not include that anchor, log an
error and exit non-zero so the patch fails loudly; then perform the replacement
into a new variable (e.g., newContent) and only call fs.writeFileSync(target,
newContent) and console.log("Patched...") if newContent !== content (otherwise
log/exit with an error), using the existing symbols content, guard, target,
replace, fs.writeFileSync, and console.log to locate and implement the checks.
```sh
API_KEY="sk_ga7g62co_c33iM5ndlzrlgmY68UYIMRMp"
API_URL="https://api.sarvam.ai/text-to-speech"
```
Critical: Hardcoded API key must be removed and rotated.
The Sarvam API key is exposed in source code. This is a security violation — the key is now in git history and should be considered compromised.
Immediate actions required:
- Rotate/revoke this API key in the Sarvam dashboard
- Remove the key from source and use an environment variable
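The same fail-fast pattern applies to the Python scripts in this directory (e.g. `generate-bgm.py` if it ever needs credentials); a sketch with a hypothetical helper name, mirroring bash's `${VAR:?msg}` expansion:

```python
import os


def require_env(name: str) -> str:
    """Read a required secret from the environment, failing fast with a
    clear message instead of embedding the value in source."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} environment variable is required")
    return value
```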
🔒 Suggested fix

```diff
-API_KEY="sk_ga7g62co_c33iM5ndlzrlgmY68UYIMRMp"
+API_KEY="${SARVAM_API_KEY:?SARVAM_API_KEY environment variable is required}"
```

Then set the key via environment or CI secrets:

```sh
export SARVAM_API_KEY="your-key-here"
./video/scripts/generate-tts.sh
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```sh
API_KEY="${SARVAM_API_KEY:?SARVAM_API_KEY environment variable is required}"
API_URL="https://api.sarvam.ai/text-to-speech"
```
🧰 Tools
🪛 Betterleaks (1.1.1)
[high] 3-3: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@video/scripts/generate-tts.sh` around lines 3 - 4, Revoke/rotate the exposed
Sarvam API key immediately and remove it from the repo; then update the
generate-tts.sh script to stop using the hardcoded API_KEY and instead read the
key from an environment variable (e.g., SARVAM_API_KEY) while keeping API_URL
as-is, and ensure CI/build secrets are set accordingly; finally, purge the
leaked secret from git history (use git filter-repo/BFG or equivalent) and treat
the old key as compromised.
```ts
// Interpolate along bezier curve (simplified as linear for now)
const x = edge.x1 + (edge.x2 - edge.x1) * edgeT;
const midY = edge.y1 + (edge.y2 - edge.y1) * 0.5;
// Approximate cubic bezier y
const t = edgeT;
const y =
  (1 - t) * (1 - t) * (1 - t) * edge.y1 +
  3 * (1 - t) * (1 - t) * t * midY +
  3 * (1 - t) * t * t * midY +
  t * t * t * edge.y2;
```
Keep the pulse dot on the actual Bézier path.
y follows the cubic, but x is still interpolated linearly. On curved edges the dot drifts off the stroke near the ends, so the traversal highlight can visibly miss the edge.
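The underlying math is the standard cubic Bézier polynomial B(t) = (1−t)³P₀ + 3(1−t)²tP₁ + 3(1−t)t²P₂ + t³P₃, applied to each axis with its own control points. A standalone sketch of that evaluation (illustrative, not the component's actual code) with x using the endpoints as its controls and y using the vertical midpoint as both controls:

```python
def cubic_bezier(p0: float, p1: float, p2: float, p3: float, t: float) -> float:
    """Evaluate a 1-D cubic Bezier at parameter t in [0, 1]."""
    u = 1.0 - t
    return u**3 * p0 + 3 * u**2 * t * p1 + 3 * u * t**2 * p2 + t**3 * p3


def point_on_edge(x1, y1, x2, y2, t):
    """Both x controls sit at the endpoints; y uses the vertical midpoint
    as both control points, matching the edge's drawn stroke."""
    mid_y = y1 + (y2 - y1) * 0.5
    x = cubic_bezier(x1, x1, x2, x2, t)
    y = cubic_bezier(y1, mid_y, mid_y, y2, t)
    return x, y
```

Because both coordinates come from the same polynomial, the dot sits exactly on the curve at t = 0, 0.5, and 1, rather than drifting near the ends as with linear x.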
🎯 Proposed fix

```diff
-      // Interpolate along bezier curve (simplified as linear for now)
-      const x = edge.x1 + (edge.x2 - edge.x1) * edgeT;
-      const midY = edge.y1 + (edge.y2 - edge.y1) * 0.5;
-      // Approximate cubic bezier y
       const t = edgeT;
+      const midY = edge.y1 + (edge.y2 - edge.y1) * 0.5;
+      const x =
+        (1 - t) ** 3 * edge.x1 +
+        3 * (1 - t) ** 2 * t * edge.x1 +
+        3 * (1 - t) * t ** 2 * edge.x2 +
+        t ** 3 * edge.x2;
       const y =
         (1 - t) * (1 - t) * (1 - t) * edge.y1 +
         3 * (1 - t) * (1 - t) * t * midY +
         3 * (1 - t) * t * t * midY +
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@video/src/components/AnimatedTree.tsx` around lines 333 - 342, The pulse dot
is off the actual Bézier because x is linearly interpolated while y uses the
cubic formula; update the x calculation to mirror the cubic interpolation used
for y. Compute a mid control x (e.g., midX from edge.x1 and edge.x2 analogous to
midY) and apply the same cubic Bézier polynomial with t=edgeT and control points
edge.x1, midX, edge.x2 so both x and y follow the same curve; modify the code
around the x/y computations in AnimatedTree where edge, edgeT, x and y are
calculated.
```tsx
export const FilmGrain: React.FC<FilmGrainProps> = ({
  opacity = 0.035,
  speed = 3,
}) => {
  const frame = useCurrentFrame();
  // Change seed every N frames for grain flicker
  const seed = Math.floor(frame / speed);

  return (
    <svg
      style={{
        position: "absolute",
        inset: 0,
        width: "100%",
        height: "100%",
        pointerEvents: "none",
        zIndex: 999,
        mixBlendMode: "overlay",
        opacity,
      }}
```
Guard invalid speed/opacity prop values.
speed at 0 (or negative) breaks seed computation; unbounded opacity can also produce odd output. Clamp both at component boundary.
Defensive fix

```diff
 export const FilmGrain: React.FC<FilmGrainProps> = ({
   opacity = 0.035,
   speed = 3,
 }) => {
   const frame = useCurrentFrame();
+  const safeSpeed = Math.max(1, speed);
+  const safeOpacity = Math.max(0, Math.min(1, opacity));
   // Change seed every N frames for grain flicker
-  const seed = Math.floor(frame / speed);
+  const seed = Math.floor(frame / safeSpeed);
   return (
     <svg
       style={{
@@
-        opacity,
+        opacity: safeOpacity,
       }}
     >
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```tsx
export const FilmGrain: React.FC<FilmGrainProps> = ({
  opacity = 0.035,
  speed = 3,
}) => {
  const frame = useCurrentFrame();
  const safeSpeed = Math.max(1, speed);
  const safeOpacity = Math.max(0, Math.min(1, opacity));
  // Change seed every N frames for grain flicker
  const seed = Math.floor(frame / safeSpeed);

  return (
    <svg
      style={{
        position: "absolute",
        inset: 0,
        width: "100%",
        height: "100%",
        pointerEvents: "none",
        zIndex: 999,
        mixBlendMode: "overlay",
        opacity: safeOpacity,
      }}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@video/src/components/FilmGrain.tsx` around lines 14 - 33, FilmGrain currently
computes seed as Math.floor(frame / speed) and accepts opacity directly from
props, which allows speed=0/negative and out-of-range opacity; update the
FilmGrain component to sanitize its props at the top: clamp opacity to [0,1]
(e.g., opacity = Math.min(1, Math.max(0, opacity))) and enforce a safe minimum
speed (e.g., speed = Math.max(1, speed)) before using useCurrentFrame and
computing seed (seed = Math.floor(frame / speed)); reference the FilmGrain props
(opacity, speed) and the seed computation to locate where to apply these guards.
```ts
{
  nodeId: "0008",
  structure: "1.4.1",
  title: "Radio waves",
  startIndex: 8,
  endIndex: 7,
  children: [],
},
{
  nodeId: "0009",
  structure: "1.4.2",
  title: "Microwaves",
  startIndex: 8,
  endIndex: 8,
  children: [],
},
{
  nodeId: "0010",
  structure: "1.4.3",
  title: "Infrared waves",
  startIndex: 9,
  endIndex: 8,
  children: [],
},
{
  nodeId: "0011",
  structure: "1.4.4",
  title: "Visible rays",
  startIndex: 9,
  endIndex: 8,
  children: [],
},
{
  nodeId: "0012",
  structure: "1.4.5",
  title: "Ultraviolet rays",
  startIndex: 9,
  endIndex: 9,
  children: [],
},
{
  nodeId: "0013",
  structure: "1.4.6",
  title: "X-rays",
  startIndex: 10,
  endIndex: 9,
  children: [],
},
{
  nodeId: "0014",
  structure: "1.4.7",
  title: "Gamma rays",
  startIndex: 10,
  endIndex: 13,
  children: [],
},
],
```
Data error: Several nodes have startIndex > endIndex.
The following nodes have inverted ranges where startIndex > endIndex:
- Lines 70-71, "Radio waves": startIndex: 8, endIndex: 7
- Lines 86-87, "Infrared waves": startIndex: 9, endIndex: 8
- Lines 94-95, "Visible rays": startIndex: 9, endIndex: 8
- Lines 110-111, "X-rays": startIndex: 10, endIndex: 9
This will cause issues if any code computes range length (end - start) or iterates from start to end.
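A small validation pass over the tree data would catch inverted ranges before they reach any consumer; a sketch, with the node shape assumed from the data above:

```python
def find_inverted_ranges(nodes: list) -> list:
    """Recursively collect nodeIds whose startIndex exceeds endIndex."""
    bad = []
    for node in nodes:
        if node["startIndex"] > node["endIndex"]:
            bad.append(node["nodeId"])
        bad.extend(find_inverted_ranges(node.get("children", [])))
    return bad
```

Run over the full tree, this would flag exactly the four nodes listed above; wiring it into a unit test keeps future data edits honest.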
🛠️ Likely corrections (verify against source)

```diff
 {
   nodeId: "0008",
   structure: "1.4.1",
   title: "Radio waves",
-  startIndex: 8,
-  endIndex: 7,
+  startIndex: 7,
+  endIndex: 8,
   children: [],
 },
```

Similar fixes needed for "Infrared waves", "Visible rays", and "X-rays".
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```ts
{
  nodeId: "0008",
  structure: "1.4.1",
  title: "Radio waves",
  startIndex: 7,
  endIndex: 8,
  children: [],
},
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@video/src/constants/tree-data.ts` around lines 66 - 122, Several tree nodes
have inverted index ranges (startIndex > endIndex) which will break range logic;
for each offending node (nodeId "0008" title "Radio waves", "0010" "Infrared
waves", "0011" "Visible rays", and "0013" "X-rays") correct the
startIndex/endIndex values so startIndex <= endIndex — either swap the two
values or set them to the intended correct numeric range based on surrounding
entries (e.g., align with adjacent nodes "0007"/"0009" and "0012"/"0014") and
ensure the final data for the functions that consume this tree (e.g., any
range-iteration in code referencing these nodeIds) uses the fixed indices.
```ts
export function useProgress(start: number, end: number): number {
  const frame = useCurrentFrame();
  if (frame < start) return 0;
  if (frame >= end) return 1;
  return (frame - start) / (end - start);
}
```
Guard invalid ranges in useProgress.
Line 8 divides by (end - start) with no protection. If start === end (or end < start), progress becomes invalid and can break animations.
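The guarded math can be sketched as a standalone function so the degenerate cases are easy to check (a Python analogue of the hook's arithmetic, not its actual code):

```python
def progress(frame: float, start: float, end: float) -> float:
    """Clamp frame to [start, end] as a 0-1 fraction, treating a
    zero-or-negative-length range as an instantaneous step at `end`."""
    if end <= start:
        return 1.0 if frame >= end else 0.0
    if frame < start:
        return 0.0
    if frame >= end:
        return 1.0
    return (frame - start) / (end - start)
```

The `end <= start` branch short-circuits before the division, so `start == end` yields a clean 0-to-1 step instead of `NaN`.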
Suggested fix

```diff
 export function useProgress(start: number, end: number): number {
   const frame = useCurrentFrame();
+  if (end <= start) return frame >= end ? 1 : 0;
   if (frame < start) return 0;
   if (frame >= end) return 1;
   return (frame - start) / (end - start);
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@video/src/hooks/useProgress.ts` around lines 4 - 8, The useProgress hook
divides by (end - start) without guarding against invalid ranges; add a guard at
the top of useProgress to handle end <= start and avoid division by zero.
Specifically, in the function useProgress(start, end) (which uses
useCurrentFrame()), return 0 while the frame is before the range and 1 once
at/after the range when end <= start (e.g., if end <= start then return frame <
start ? 0 : 1), otherwise proceed with the existing clamped division (frame -
start) / (end - start). Ensure the guard runs before any division.
Summary
- Extract images from PDFs via PyMuPDF (`page.get_images()`) and pdfjs-dist (`getOperatorList()`)
- Preserve alt-text from HTML `<img>` tags and DOCX embedded images (via `wp:inline` XML / `convertToHtml()`)
- Optional `generateWithImage()` on Gemini, OpenAI, Claude to describe extracted images
- `[Image present]` markers when no vision LLM is available
- Image descriptions are appended to `page.text` before `group_pages()`, so the entire downstream pipeline (tree building, querying, save/load) works unchanged
- `save()` strips base64 image data from serialized JSON to prevent bloat

Files changed (12)
- `src/types.ts`: `PageImage` interface, `images?` field on `Page`
- `treedex/prompts.py` / `src/prompts.ts`: `IMAGE_DESCRIPTION_PROMPT`
- `treedex/llm_backends.py` / `src/llm-backends.ts`: `supports_vision` + `generate_with_image()` on BaseLLM, Gemini, OpenAI, Claude
- `treedex/loaders.py` / `src/loaders.ts`: `extract_images` param
- `treedex/pdf_parser.py` / `src/pdf-parser.ts`
- `treedex/core.py` / `src/core.ts`: `_describe_images()` helper, `extract_images` on `from_file()`, strip on `save()`
- `src/index.ts`: exports `PageImage` type, `imageDescriptionPrompt`

Test plan
- `<img alt="test">` produces `[Image: test]` in loaded page text
- Pages with an `images` field indexed with a non-vision LLM produce `[Image present]` markers
- `extract_images=True` populates the `images` array on pages

Summary by cubic
Adds image handling to the indexing pipeline by extracting and describing images from PDFs/HTML/DOCX. Also ships a 60s Remotion promo video with a CI workflow that renders via system Chromium and now runs faster with parallel frames, JPEG frames, and bundle caching.

New Features
- Image extraction via `pdfjs-dist` (TS) and `PyMuPDF` (Python); pages expose `images?: PageImage[]`.
- Alt-text preserved from HTML `<img alt>` and DOCX alt text; emits "[Image: alt]" or "[Image]".
- `supportsVision` and `generateWithImage()` for Gemini, OpenAI, and Claude; fallback to "[Image present]".
- Image descriptions appended to `page.text` before grouping; `save()` strips image data; TS exports `PageImage` and `imageDescriptionPrompt`.
- New `video/` with `studio`/`render` scripts, `@remotion/cli`, and a GitHub Actions workflow that auto-renders using system Chromium; CI render optimized with parallel frames, JPEG image format, and bundle cache; includes Ubuntu 24.04 `libasound2` fix.

Migration
- Opt in with `extractImages: true` (TS) or `extract_images=True` (Python) when loading PDFs.

Written for commit 17d6bf7. Summary will update on new commits.
Summary by CodeRabbit