Structured browser context capture for AI coding tools
Talk, draw, and click in your browser. PointDev captures the technical context and your intent, then compiles it into a structured prompt any AI agent can act on.
Getting Started · How It Works · Features · Roadmap · Contributing
You spot a problem in your browser. You switch to Claude Code and type "the hero font is too small." The agent has to guess which file, which component, what the current size is, and what "too small" means. The DOM path, component name, computed styles, and your spatial intent are all lost.
Browser automation gives AI agents eyes. PointDev gives humans a voice.
PointDev captures technical context (element selector, DOM subtree, computed styles, React component name, device metadata) and human context (timestamped voice narration, canvas annotations, cursor dwell behavior) simultaneously, then compiles everything into structured output any downstream tool can consume.
Key highlights:
- Talk, draw, click all at the same time during a single capture session
- Five annotation tools — circle, arrow, freehand, rectangle, and element selection
- Temporal correlation links what you're pointing at with what you're saying
- Smart screenshots auto-captured when you talk, dwell, or scroll — powered by real-time frame differencing
- Console + network capture catches errors and failed requests alongside your feedback
- Works on any web page, with opportunistic React component detection
- Copy and paste the structured output into any AI coding tool
This is actual output from PointDev, captured on a live site:
```
## Context
- URL: https://almostalab.io/
- Page title: Almost A Lab — Building the Future of AdTech
- Viewport: 1677 x 1145px
- Captured at: 2026-03-15 15:12:20
## User Intent (voice transcript)
[00:04] "I think the"
[00:06] "main hero"
[00:09] "is far too"
[00:11] "large"
[00:15] "and we need to adjust the line breaking"
[00:19] "two or three lines at Max"
[00:23] "and there is a problem at the bottom here"
[00:26] "where you scroll CTA"
[00:32] "overlapping with the"
[00:35] "subtitle of the page"
## Annotations
1. [00:40] Circle around .lg\:min-h-\[100svh\] at (42, 1019), radius 131px
## Cursor Behavior
- [00:07-00:32] Dwelled 25.5s over h1.font-display.text-hero (during: "main hero")
- [00:25-00:32] Dwelled 7.3s over div.absolute.bottom-10 (during: "scroll CTA")
- [00:29-00:36] Dwelled 6.2s over div.absolute.bottom-10 (during: "overlapping with the")
## Screenshots
1. [00:11] Multiple signals — "large"
Signals: visual change: 12%, dwell: h1.font-display.text-hero (4.2s), voice: "large" [score: 0.9]
2. [00:26] Multiple signals — "where you scroll CTA"
Signals: visual change: 31%, dwell: div.absolute.bottom-10 (7.3s), voice: "where you scroll CTA" [score: 0.8]
3. [00:40] Circle around .lg\:min-h-\[100svh\] — "subtitle of the page"
Signals: voice: "subtitle of the page" [score: 1.00]
```
We pasted PointDev output into a Claude Code session managing the live website. Without any prior context, the agent:
- Identified three UI issues from the voice transcript, annotations, and cursor dwell data
- Mapped each issue to specific elements (the `h1.font-display` hero heading, the `div.absolute.bottom-10` overlap, the padding mismatch)
- Offered to fix all three immediately, asking only "Want me to look at the Hero component and fix these alignment/spacing issues?"
The agent also described what would make the output even more actionable:
"The ideal output for an agent is: screenshot + voice intent + source file:line + computed styles on each annotation target. That's a one-shot fix with no exploration needed."
"Loom for humans, annotated screenshots + structured metadata for agents. Same capture session, two output formats."
That feedback is now driving our roadmap.
Status: Proof of Concept. Working demo, not a production release.
Requires Bun and Chrome.
```sh
git clone https://github.com/BraedenBDev/pointdev.git
cd pointdev
bun install
bun build
```

- Open `chrome://extensions/` and enable Developer Mode
- Click **Load unpacked** and select the `dist/` folder
- Open any web page, click the PointDev icon to open the sidepanel
- Click **Setup Microphone** if you want voice narration (one-time; the tab auto-closes)
- Chrome will prompt to approve host permissions on first use (needed for smart screenshots)
```mermaid
flowchart LR
subgraph Browser Tab
CS[Content Script]
MW[Main World<br/>Console/Network]
end
subgraph Extension
SP[Sidepanel<br/>React UI + Speech]
SI[Screenshot<br/>Intelligence]
SW[Service Worker<br/>State + Capture]
end
SP -- START_CAPTURE --> SW
SW -- INJECT_CAPTURE --> CS
CS -- element, annotation,<br/>cursor --> SW
MW -- CustomEvent --> CS
CS -- CONSOLE_BATCH --> SW
SI -- SNAPSHOT_REQUEST --> SW
SW -- DWELL_UPDATE --> SI
SP -. voice signal .-> SI
SI -- SMART_SCREENSHOT --> SW
SW -- SESSION_UPDATED --> SP
```
Sidepanel (React): Capture controls, live feedback, voice transcription (Web Speech API runs here directly), screenshot thumbnails with copy-to-clipboard, compiled output display.
Service Worker: Coordinates state between sidepanel and content script. Holds the CaptureSession, routes messages, captures screenshots via captureVisibleTab (both full-resolution for storage and low-quality JPEG for frame differencing), runs real-time dwell detection, injects main-world console/network capture script.
Content Script: Injected into the active page. Handles element selection (with Alt+scroll ancestry cycling), canvas annotation overlay (circle, arrow, freehand, rectangle), cursor tracking, React component detection, and CSS variable discovery.
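The Alt+scroll ancestry cycling can be sketched as a small state machine: scrolling up walks selection to the parent element, scrolling down steps back into the child you came from. This is a hypothetical illustration (the `TreeNode` shape stands in for a DOM element), not the extension's actual implementation:

```typescript
// Hypothetical sketch of Alt+scroll ancestry cycling. `TreeNode` is a
// minimal stand-in for a DOM element with a parent pointer.
interface TreeNode {
  tag: string;
  parent: TreeNode | null;
}

class AncestryCycler {
  private trail: TreeNode[] = []; // children we walked out of, for descending again

  constructor(private current: TreeNode) {}

  get selected(): TreeNode {
    return this.current;
  }

  // Alt+scroll up: move selection to the parent, remembering where we were.
  up(): TreeNode {
    if (this.current.parent) {
      this.trail.push(this.current);
      this.current = this.current.parent;
    }
    return this.current;
  }

  // Alt+scroll down: descend back into the most recently visited child.
  down(): TreeNode {
    const child = this.trail.pop();
    if (child) this.current = child;
    return this.current;
  }
}
```

Keeping a trail of visited children (rather than guessing among siblings) makes the down direction deterministic: you always return along the path you climbed.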
Screenshot Intelligence (Sidepanel): Requests low-quality JPEG snapshots from the service worker every 2 seconds and compares them at 160x90 resolution via sparse pixel differencing. Combines frame diff results with cursor dwell and voice activity signals to produce a weighted interest score. Screenshots are captured when the score exceeds a threshold, and always on annotations (with a render delay so the canvas overlay is visible). Voice context is retained for 5 seconds after speech ends to ensure nearby captures carry relevant narration.
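The scoring idea reduces to a few pure functions. A minimal sketch, where the weights, sampling stride, tolerance, and threshold are illustrative assumptions rather than PointDev's actual tuning: sample every Nth pixel of two downscaled frames, compute the fraction that changed, and blend that with dwell and voice signals into one interest score.

```typescript
// Illustrative sketch of multi-signal screenshot scoring. All constants
// (stride, tolerance, weights, threshold) are assumptions for demonstration.
type Frame = Uint8ClampedArray; // RGBA pixels of a downscaled (e.g. 160x90) frame

// Fraction of sampled pixels whose R/G/B delta exceeds a tolerance.
function sparseFrameDiff(a: Frame, b: Frame, stride = 16, tolerance = 24): number {
  let sampled = 0;
  let changed = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i += 4 * stride) {
    sampled++;
    if (
      Math.abs(a[i] - b[i]) > tolerance ||         // R
      Math.abs(a[i + 1] - b[i + 1]) > tolerance || // G
      Math.abs(a[i + 2] - b[i + 2]) > tolerance    // B
    ) {
      changed++;
    }
  }
  return sampled === 0 ? 0 : changed / sampled;
}

// Blend frame diff with normalized dwell and voice-activity signals.
function interestScore(frameDiff: number, dwellSeconds: number, voiceActive: boolean): number {
  const dwell = Math.min(dwellSeconds / 5, 1); // saturate at 5s (assumed)
  return 0.5 * frameDiff + 0.3 * dwell + 0.2 * (voiceActive ? 1 : 0);
}

const CAPTURE_THRESHOLD = 0.4; // assumed

function shouldCapture(frameDiff: number, dwellSeconds: number, voiceActive: boolean): boolean {
  return interestScore(frameDiff, dwellSeconds, voiceActive) >= CAPTURE_THRESHOLD;
}
```

Sparse sampling is the point: comparing every 16th pixel of a 160x90 frame touches ~900 pixels, cheap enough to run every couple of seconds without noticeable overhead.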
Main World Script: Injected into the page's JavaScript world via chrome.scripting.executeScript({ world: 'MAIN' }). Monkey-patches console.error/warn, fetch, and XMLHttpRequest to capture errors and failed requests. Bridges data back to the content script via CustomEvent.
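The monkey-patching pattern looks roughly like this. A simplified, hypothetical sketch wrapping `console.error` and `fetch` only (the real script also patches `console.warn` and `XMLHttpRequest`, and bridges each entry back to the content script via `CustomEvent` rather than a callback):

```typescript
// Simplified sketch of main-world capture via monkey-patching.
// `sink` stands in for the CustomEvent bridge to the content script.
interface CapturedEntry {
  kind: "console.error" | "network";
  detail: string;
}

function installCapture(sink: (entry: CapturedEntry) => void): void {
  // Wrap console.error, preserving the original behavior.
  const originalError = console.error.bind(console);
  console.error = (...args: unknown[]) => {
    sink({ kind: "console.error", detail: args.map(String).join(" ") });
    originalError(...args);
  };

  // Wrap fetch to record non-OK responses, passing the result through untouched.
  const originalFetch = globalThis.fetch?.bind(globalThis);
  if (originalFetch) {
    globalThis.fetch = async (...args: Parameters<typeof fetch>) => {
      const response = await originalFetch(...args);
      if (!response.ok) {
        sink({ kind: "network", detail: `${response.status} ${String(args[0])}` });
      }
      return response;
    };
  }
}
```

Binding and calling through the originals is what keeps the patch transparent: page code sees identical behavior, and the capture is purely additive.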
All capture data flows into a single CaptureSession object with timestamps relative to recording start. A template formatter compiles this into structured output.
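The shape of that session object and the formatter can be sketched as follows. Field names here are illustrative assumptions, not the real types in `src/shared/`:

```typescript
// Hypothetical sketch of a capture session and its template formatter.
// Timestamps are milliseconds relative to recording start.
interface VoiceSegment { t: number; text: string; }
interface Annotation { t: number; tool: "circle" | "arrow" | "freehand" | "rectangle"; selector: string; }

interface CaptureSession {
  url: string;
  title: string;
  voice: VoiceSegment[];
  annotations: Annotation[];
}

// Format milliseconds-from-start as [mm:ss].
const mmss = (ms: number): string => {
  const s = Math.floor(ms / 1000);
  return `${String(Math.floor(s / 60)).padStart(2, "0")}:${String(s % 60).padStart(2, "0")}`;
};

// Compile a session into the structured text a downstream agent consumes.
function compile(session: CaptureSession): string {
  const lines = [
    "## Context",
    `- URL: ${session.url}`,
    `- Page title: ${session.title}`,
    "## User Intent (voice transcript)",
    ...session.voice.map((v) => `[${mmss(v.t)}] "${v.text}"`),
    "## Annotations",
    ...session.annotations.map((a, i) => `${i + 1}. [${mmss(a.t)}] ${a.tool} around ${a.selector}`),
  ];
  return lines.join("\n");
}
```

Keeping all timestamps relative to a single recording-start epoch is what makes the later correlation steps trivial: every signal lives on the same timeline.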
Technical context (captured automatically):
| Feature | Description |
|---|---|
| CSS selector + DOM subtree | Click any element to capture its selector and surrounding HTML |
| Computed styles + box model | font-size, color, spacing, display, position, content/padding/border/margin dimensions |
| CSS custom properties | Discovers --variable declarations from matching stylesheet rules |
| React component detection | Resolves component name via __reactFiber$ internals |
| Console errors + network failures | Captures console.error/warn, failed fetch/XHR, uncaught exceptions, unhandled rejections |
| Page + device metadata | URL, title, viewport, browser, OS, screen size, pixel ratio, touch, color scheme |
| Cursor dwell tracking | Records which elements you hover over and for how long |
| Smart screenshots | Auto-captured by multi-signal intelligence: frame diff (CV), cursor dwell, voice activity, annotations |
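The React detection in the table above works because React DOM attaches a fiber reference to each host element under a randomized property key. A hedged sketch of the idea (the `__reactFiber$` prefix is an undocumented React internal and can change between versions):

```typescript
// Sketch of resolving a React component name from fiber internals.
// Relies on undocumented React DOM behavior: each host element carries a
// property named `__reactFiber$<random>` pointing into the fiber tree.
interface FiberLike {
  type: unknown;            // string for host elements, function/class for components
  return: FiberLike | null; // parent fiber
}

function findFiber(el: Record<string, unknown>): FiberLike | null {
  const key = Object.keys(el).find((k) => k.startsWith("__reactFiber$"));
  return key ? (el[key] as FiberLike) : null;
}

// Walk up the fiber tree until we hit a function/class component with a name.
function componentName(el: Record<string, unknown>): string | null {
  for (let fiber = findFiber(el); fiber; fiber = fiber.return) {
    const t: any = fiber.type;
    if (typeof t === "function") {
      return t.displayName ?? t.name ?? null;
    }
  }
  return null;
}
```

This is why the README calls the detection "opportunistic": if the key is absent (non-React page, or changed internals), the walk simply returns `null` and everything else still works.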
Human context (your input):
| Feature | Description |
|---|---|
| Voice narration | Speak naturally; transcription runs live with timestamps |
| Visual annotations | Circle, arrow, freehand, and rectangle tools |
| Element selection | Click to select, Alt+scroll to cycle through parent/child elements |
Everything is temporally correlated. The cursor dwell data shows which element you were pointing at when you said each phrase. Annotations are timestamped to align with your voice. Screenshots are enriched with the voice context from the moment you drew the annotation.
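The correlation itself reduces to interval overlap. A minimal sketch, assuming dwell records and voice segments both carry start/end offsets (in seconds) from recording start:

```typescript
// Sketch of temporal correlation: for each dwell interval, collect the voice
// segments whose time range overlaps it. Data shapes are illustrative.
interface Dwell { selector: string; start: number; end: number; }
interface Voice { text: string; start: number; end: number; }

// Two half-open intervals overlap iff each starts before the other ends.
function overlaps(a: { start: number; end: number }, b: { start: number; end: number }): boolean {
  return a.start < b.end && b.start < a.end;
}

// Attach to each dwell the phrases spoken while the cursor sat on that element.
function correlate(dwells: Dwell[], voice: Voice[]): Array<Dwell & { during: string[] }> {
  return dwells.map((d) => ({
    ...d,
    during: voice.filter((v) => overlaps(d, v)).map((v) => v.text),
  }));
}
```

This is how a dwell on `h1.font-display.text-hero` ends up annotated with `(during: "main hero")` in the compiled output: the phrase's timestamp range falls inside the dwell window.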
| Layer | Technology |
|---|---|
| Extension | Chrome Manifest V3 |
| UI | React 18, TypeScript |
| Build | Vite + CRXJS |
| Runtime | Bun |
| Canvas | HTML5 Canvas API (annotation overlay) |
| Voice | Web Speech API |
| Testing | Vitest (1188 tests) |
```
pointdev/
├── src/
│   ├── background/   # Service worker, message handler, session store
│   ├── content/      # Element selector, canvas overlay, cursor tracker,
│   │                 # React inspector, device metadata, console/network capture
│   ├── shared/       # Types, message definitions, template formatter,
│   │                 # dwell computation
│   └── sidepanel/    # React UI: App, hooks, components, screenshot intelligence
├── public/           # Mic-permission page, icons
├── tests/            # Vitest unit tests (mirrors src/ structure)
├── docs/
│   ├── design/       # MVP spec, implementation plan, library research
│   ├── superpowers/  # Feature specs and implementation plans
│   └── genai-disclosure/  # AI-assisted development log
├── CLAUDE.md         # AI agent guidance for this codebase
├── CONTRIBUTING.md   # Dev setup, testing, commit conventions
└── README.md
```
PointDev requests minimal Chrome permissions:
| Permission | Why |
|---|---|
| `activeTab` | Access the current tab when you start a capture |
| `scripting` | Inject content script + main-world console/network capture |
| `sidePanel` | The extension UI |
| `storage` | Persist capture session and mic permission state |
| `<all_urls>` (host) | Required for periodic screenshot capture during active sessions (`captureVisibleTab` from sidepanel context needs host permission; `activeTab` alone doesn't work from sidepanel timers) |
Screenshots are only captured during active capture sessions you explicitly start. No background access to your browsing. Voice transcription offers two modes: "Fast (Google)" uses Web Speech API (audio sent to Google), "Private (On-device)" uses Whisper via WASM (zero network traffic, all inference local).
- Element selection with CSS selector, computed styles, box model, DOM subtree
- React component detection via fiber internals
- CSS custom property discovery on selected elements
- Canvas annotation overlay (circle, arrow, freehand, rectangle) with scroll anchoring
- Element ancestry cycling (Alt+scroll to select parent/child)
- Voice transcription with timestamped segments (sidepanel-native)
- Cursor dwell tracking with temporal correlation
- Device metadata capture
- Smart screenshots via multi-signal intelligence (frame diff + dwell + voice + annotations)
- Console errors + failed network request capture (main-world injection)
- Compiled structured output with copy-to-clipboard
- Local speech-to-text via Whisper — on-device, zero cloud dependency (#7)
- Pluggable output formats: Text, JSON, Markdown (#10)
- Bridge server for AI tool delivery via WebSocket + MCP (#12)
- Source file path resolution from selectors (#20)
- Firefox WebExtensions port (#34)
- Tab video recording for session replay (#33)
- Accessibility capture (ARIA roles, names) (#23)
- Multi-element selection (#13)
- Vue and Svelte component detection (#9)
See all open issues for the full backlog.
Contributions are welcome! See CONTRIBUTING.md for development setup, testing, coding standards, and commit conventions.
Look for issues labeled good first issue and help wanted.
This project is built using AI coding tools (Claude Code, Cursor) as the primary implementation workflow. The developer architects solutions, defines acceptance criteria, and reviews all code. AI agents execute implementation tasks under human direction.
Commits from AI agents use Co-Authored-By tags so they are distinguishable from human-authored commits. All code is reviewed, tested, and validated by the maintainer before merging. Architectural decisions are made by the human lead.
This is directly relevant to PointDev's mission: we're building a tool that improves the input side of human-to-AI-coder communication, and we're building it with those same tools.
Full development log: docs/genai-disclosure/development-log.md
MIT. See LICENSE.
github.com/BraedenBDev/pointdev · Built by Braeden Bihag at Almost a Lab