fix: improve What field quality and prevent duplicate entries #31

Open
StartupBros wants to merge 1 commit into alexknowshtml:main from StartupBros:fix/what-field-quality

@StartupBros

Problem

The **What:** field in bookmark entries frequently degrades to low-quality output across three failure modes:

  1. Category labels instead of descriptions — "Tool or resource share", "Commentary/perspective", "Claude Code insights/comparison"
  2. Echo of tweet text — The model parrots the tweet verbatim instead of synthesizing ("karpathy really is the fucking goat", "Are you guys starting to catch on?")
  3. Empty or placeholder content — "Video content post", "Social media image bookmark", raw t.co URLs, or completely empty fields

This happens because the prompt gives almost no guidance on What field quality — just {1-2 sentence description of what this actually is}.

Solution

Four targeted improvements to process-bookmarks.md:

1. What Field Rules section

  • Explicit quality minimum (80+ characters)
  • NEVER-write list for common failure patterns
  • THIN: prefix convention for image/video-only tweets where context is genuinely missing
  • LINK_FAILED: prefix for unresolvable t.co links
  • Special guidance for quote tweets (synthesize BOTH the reaction AND quoted content)
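The rules above live in the prompt, but the same checks can be expressed as a small post-processing validator. The following is a minimal sketch and not part of this PR: the function name, the regex patterns (modelled on the failure examples quoted in the Problem section), and the choice to exempt `THIN:` and `LINK_FAILED:` entries from the length minimum are all assumptions.

```python
import re

MIN_WHAT_LENGTH = 80  # quality minimum stated in the What Field Rules
ALLOWED_PREFIXES = ("THIN:", "LINK_FAILED:")

# Hypothetical patterns, modelled on the failure examples quoted in this PR.
NEVER_WRITE_PATTERNS = [
    re.compile(r"^(tool or resource share|commentary/perspective)$", re.I),
    re.compile(r"^(video content post|social media image bookmark)$", re.I),
    re.compile(r"^https?://t\.co/\S+$", re.I),
]


def check_what_field(what: str, tweet_text: str = "") -> list[str]:
    """Return the list of rule violations; an empty list means the field passes."""
    text = what.strip()
    if not text:
        return ["empty"]
    if text.startswith(ALLOWED_PREFIXES):
        # Assumption: explicitly tagged entries skip the remaining checks.
        return []
    problems = []
    if any(p.match(text) for p in NEVER_WRITE_PATTERNS):
        problems.append("never-write pattern")
    if tweet_text and text.lower() == tweet_text.strip().lower():
        problems.append("echo of tweet text")
    if len(text) < MIN_WHAT_LENGTH:
        problems.append("too short")
    return problems
```

A check like this could run over the generated entries after processing to flag any that need a retry, rather than relying on the prompt alone.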

2. Subagent template quality rules

Subagents don't see the full prompt, so the What field rules are inlined in the subagent prompt template.

3. Deduplication check

Prevents duplicate entries when bookmarks are re-processed (e.g., after a --force fetch).
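As an illustration of the idea, here is a minimal sketch of a deduplication check done in code. It is not the mechanism this PR uses (the PR adds the check to the prompt), and it assumes that each existing entry links to its tweet so the numeric status ID can act as the key. The function names and the `id` field on candidates are hypothetical.

```python
import re

# Assumption: every existing entry contains a link to its tweet,
# so the numeric status ID can serve as the dedup key.
STATUS_ID = re.compile(r"(?:twitter|x)\.com/\w+/status/(\d+)")


def existing_ids(bookmarks_markdown: str) -> set[str]:
    """Collect the status IDs already present in the bookmarks file."""
    return set(STATUS_ID.findall(bookmarks_markdown))


def filter_new(bookmarks_markdown: str, candidates: list[dict]) -> list[dict]:
    """Keep only candidates whose ID is not already in the file or the batch."""
    seen = existing_ids(bookmarks_markdown)
    fresh = []
    for item in candidates:
        tweet_id = str(item["id"])  # hypothetical field name
        if tweet_id in seen:
            continue
        seen.add(tweet_id)  # also guards against duplicates within one batch
        fresh.append(item)
    return fresh
```

Keying on the status ID rather than the title or What text means a re-processed bookmark is skipped even if the model would describe it differently the second time.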

4. Better title guidance

  • Quote tweets: use the quoted content's substance, not the reaction text
  • Media-only posts: [Media] prefix instead of "Video post" or raw URLs
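The two title rules can be read as a simple precedence order. The sketch below is only an illustration under assumptions: the helper name and its parameters (`tweet_text`, `quoted_text`, `has_media`) are hypothetical and do not come from process-bookmarks.md.

```python
MEDIA_PREFIX = "[Media]"


def title_basis(tweet_text: str, quoted_text: str = "", has_media: bool = False) -> str:
    """Pick the text a title should be derived from (hypothetical helper)."""
    if quoted_text.strip():
        # Quote tweet: base the title on the quoted content, not the reaction.
        return quoted_text.strip()
    if has_media and not tweet_text.strip():
        # Media-only post: tag it instead of writing "Video post" or a raw URL.
        return MEDIA_PREFIX
    return tweet_text.strip()
```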

Testing

Tested on 419 bookmarks processed with both Haiku and Sonnet:

| Metric | Before | After |
|---|---|---|
| Empty What fields | 9 (2.1%) | 0 |
| Very short (<50 chars) | 37 (8.8%) | 0 |
| LINK_FAILED (properly tagged) | 0 | 7 |
| THIN (properly tagged) | 0 | 4 |

After the fix, no What field was empty or shorter than 50 characters.

🤖 Generated with Claude Code

The What field in bookmark entries frequently degrades to generic category
labels ("Tool or resource share"), echoes of tweet text ("karpathy really
is the fucking goat"), or empty strings — especially for quote tweets,
image-only posts, and failed link expansions.

Changes:
- Add What Field Rules section with explicit quality minimum (80+ chars)
- Add NEVER-write list for common failure patterns (category labels, echoes,
  placeholders, raw URLs)
- Add THIN: prefix convention for image/video-only tweets with no text
- Add LINK_FAILED: prefix for unresolvable t.co links
- Add inline quality rules in subagent prompt template (subagents don't
  see the full prompt, so rules must be repeated)
- Improve title guidance for quote tweets (use quoted content, not reaction)
  and media-only posts
- Add deduplication check before inserting entries to prevent duplicates
  on re-runs

Tested on 419 bookmarks: reduced garbage descriptions from 8.9% to 0%.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>