Add reliable CJK bookmark search #179

Open
willzhqiang wants to merge 1 commit into afar1:main from willzhqiang:codex/fieldtheory-cjk-search

@willzhqiang

Summary

  • route CJK-containing bookmark queries through escaped LIKE substring matching instead of FTS5 unicode61 token matching
  • apply the same CJK query path to search, list, and count operations
  • split mixed CJK queries on whitespace and match each term separately, so queries like 微信 RSS ("WeChat RSS") can match text with punctuation between terms
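
The query path summarized above could be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names (`containsCjk`, `escapeLike`, `buildCjkLikeClause`), the `text` column, and the choice of code-point ranges are all assumptions made for the example.

```typescript
// Hypothetical helpers; names and the default column are assumptions for illustration.

// Approximate CJK detection using common BMP ranges: Hiragana/Katakana,
// CJK Extension A, CJK Unified Ideographs, Hangul syllables, compatibility ideographs.
const CJK_PATTERN = /[\u3040-\u30ff\u3400-\u4dbf\u4e00-\u9fff\uac00-\ud7af\uf900-\ufaff]/;

function containsCjk(query: string): boolean {
  return CJK_PATTERN.test(query);
}

// Escape the LIKE wildcards (% and _) and the escape character itself,
// so user input is matched literally when paired with ESCAPE '\'.
function escapeLike(term: string): string {
  return term.replace(/[\\%_]/g, (ch) => `\\${ch}`);
}

interface LikeClause {
  sql: string;      // WHERE fragment with one placeholder per term
  params: string[]; // bound values, one per placeholder
}

// Split the query on whitespace and require every term to appear as a substring.
function buildCjkLikeClause(query: string, column = "text"): LikeClause | null {
  const terms = query.trim().split(/\s+/).filter((t) => t.length > 0);
  if (terms.length === 0) return null;
  const sql = terms.map(() => `${column} LIKE ? ESCAPE '\\'`).join(" AND ");
  const params = terms.map((t) => `%${escapeLike(t)}%`);
  return { sql, params };
}
```

Because each whitespace-separated term becomes its own `LIKE` condition, a query such as 微信 RSS matches text where the two terms are separated by punctuation or other characters. A caller would check `containsCjk(query)` first and fall back to the existing FTS5 path for non-CJK queries.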

Why

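The current FTS5 tokenizer is porter unicode61, which works well for English terms but is unreliable for CJK substring search. unicode61 does not segment CJK text into words, so an unbroken run of CJK characters is indexed as a single token, and a shorter query inside that run does not match it. As a result, phrases such as 提示词 ("prompt") or 严厉的老师 ("strict teacher") can exist in bookmark text while searches for them return no results.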
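
The difference between token matching and substring matching can be illustrated with a rough model. The sketch below does not call SQLite or reproduce the real unicode61 rules (which classify characters by Unicode category); it simply splits on whitespace and a few common punctuation marks, and the sample bookmark text is made up.

```typescript
// Rough model only: tokens are runs of characters between whitespace
// or common Latin/CJK punctuation. Real unicode61 uses Unicode categories.
const SEPARATORS = /[\s,.!?;:，。！？；：、]+/;

function roughTokens(text: string): string[] {
  return text.toLowerCase().split(SEPARATORS).filter((t) => t.length > 0);
}

// Token matching: every query token must equal some whole token in the text.
function tokenMatch(text: string, query: string): boolean {
  const tokens = roughTokens(text);
  return roughTokens(query).every((q) => tokens.includes(q));
}

// Substring matching: the query may appear anywhere in the text.
function substringMatch(text: string, query: string): boolean {
  return text.includes(query);
}

// Made-up sample: "He is a strict teacher, but very fair."
const cjkBookmark = "他是一位严厉的老师，但很公平。";
const englishBookmark = "Notes on machine learning.";

console.log(tokenMatch(cjkBookmark, "严厉的老师"));     // false: the whole clause is one token
console.log(substringMatch(cjkBookmark, "严厉的老师")); // true
console.log(tokenMatch(englishBookmark, "learning"));   // true: spaces delimit English tokens
```

In this model the CJK clause before the comma is a single token, so the shorter query never equals it, whereas English words are already delimited by spaces.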

Tests

  • npm run build
  • npm test -- tests/bookmarks-db.test.ts (570 passing)

Local smoke

  • node dist/cli.js search '提示词' --limit 5
  • node dist/cli.js search '严厉的老师' --limit 3
  • node dist/cli.js search '微信 RSS' --limit 3
  • node dist/cli.js search 'learning' --limit 3
