-
Notifications
You must be signed in to change notification settings - Fork 0
feat(index): add FTS5 full-text search implementation #82
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: p3-19/index/types-interfaces
Are you sure you want to change the base?
Conversation
b130dcc to
d3d9843
Compare
b50f9a7 to
aff813d
Compare
d3d9843 to
d6e91c4
Compare
aff813d to
097c3aa
Compare
d6e91c4 to
01b6488
Compare
608a423 to
b8d09c1
Compare
Greptile Summary
|
| Filename | Overview |
|---|---|
| packages/index/src/fts5.ts | New FTS5 implementation with SQLite virtual tables, BM25 ranking, and transaction safety |
| packages/index/src/tests/index.test.ts | Comprehensive test suite with 39 tests validating all search functionality and edge cases |
Confidence score: 4/5
- This PR is safe to merge with only minor risks related to delete-then-insert pattern complexity
- Score reflects solid implementation with comprehensive testing, but deducted one point due to complex transaction handling and potential race conditions in the delete-then-insert pattern for document updates
- Pay close attention to
packages/index/src/fts5.tsfor transaction handling and concurrent access patterns
Sequence Diagram
sequenceDiagram
participant User
participant Index
participant Database
participant FTS5
User->>Index: "createIndex(options)"
Index->>Database: "new Database(path)"
Index->>Database: "PRAGMA journal_mode=WAL"
Index->>Database: "CREATE VIRTUAL TABLE ... USING fts5"
Database->>FTS5: "Initialize FTS5 table"
Index-->>User: "return index instance"
User->>Index: "add(document)"
Index->>Database: "DELETE FROM table WHERE id = ?"
Index->>Database: "INSERT INTO table VALUES (id, content, metadata)"
Database->>FTS5: "Index content for search"
Index-->>User: "Result.ok()"
User->>Index: "search(query)"
Index->>Database: "SELECT ... FROM table WHERE MATCH ?"
Database->>FTS5: "Execute FTS5 search with BM25"
FTS5-->>Database: "Return ranked results"
Database-->>Index: "Search results with scores"
Index-->>User: "Result.ok(searchResults)"
User->>Index: "close()"
Index->>Database: "close()"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
3 files reviewed, 3 comments
| id UNINDEXED, | ||
| content, | ||
| metadata UNINDEXED, | ||
| tokenize='${tokenizer}' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: String interpolation in SQL could be vulnerable to injection if tokenizer comes from user input. Is the tokenizer value always controlled by the application code?
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/index/src/fts5.ts
Line: 122:122
Comment:
**style:** String interpolation in SQL could be vulnerable to injection if tokenizer comes from user input. Is the tokenizer value always controlled by the application code?
<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>
How can I resolve this? If you propose a fix, please make it concise.| it("addMany() is atomic (all or nothing)", async () => { | ||
| // First add some valid docs | ||
| await index.addMany([{ id: "valid-1", content: "Valid content" }]); | ||
|
|
||
| // Verify initial state | ||
| const db = new Database(dbPath); | ||
| const initialCount = db.query("SELECT COUNT(*) as count FROM documents").get() as { | ||
| count: number; | ||
| }; | ||
| db.close(); | ||
|
|
||
| expect(initialCount.count).toBe(1); | ||
| }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Test for addMany() atomicity is incomplete - it only verifies initial state but doesn't test the actual atomic behavior (rollback on failure)
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/index/src/__tests__/index.test.ts
Line: 230:242
Comment:
**logic:** Test for addMany() atomicity is incomplete - it only verifies initial state but doesn't test the actual atomic behavior (rollback on failure)
How can I resolve this? If you propose a fix, please make it concise.| const walPath = `${dbPath}-wal`; | ||
|
|
||
| // Write something to trigger WAL file creation | ||
| index.add({ id: "wal-test", content: "WAL test" }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
logic: Test calls index.add() without awaiting the result, which could cause race conditions in the WAL file existence check
| index.add({ id: "wal-test", content: "WAL test" }); | |
| // Write something to trigger WAL file creation | |
| await index.add({ id: "wal-test", content: "WAL test" }); |
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/index/src/__tests__/index.test.ts
Line: 600:600
Comment:
**logic:** Test calls index.add() without awaiting the result, which could cause race conditions in the WAL file existence check
```suggestion
// Write something to trigger WAL file creation
await index.add({ id: "wal-test", content: "WAL test" });
```
How can I resolve this? If you propose a fix, please make it concise.b8d09c1 to
86b969f
Compare
1659fa4 to
e52caea
Compare
|
Verified BM25 ordering: search() orders by bm25 ASC (lower scores first) and the relevance-order test asserts ascending scores. Tests passed in the pre-push run. |
86b969f to
c319824
Compare
e52caea to
3bc02a6
Compare
|
Restacked after downstack update (formatRelative test stabilization); no additional changes in this PR. |
Implement SQLite FTS5 indexing with bun:sqlite: - createIndex() factory with WAL mode for concurrency - BM25 ranking for relevance scoring - Three tokenizers: unicode61, porter, trigram - CRUD operations: add, search, update, remove - Metadata support with JSON serialization - Highlight extraction from search results - Comprehensive test coverage (39 tests) Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
3bc02a6 to
d0d9e68
Compare
c319824 to
1ed1245
Compare
|
Validated tableName/tokenizer to avoid SQL interpolation issues, strengthened addMany atomicity test (rollback on JSON stringify failure), and awaited WAL creation write. Kept BM25 ordering ASC—tests/documentation treat lower (more negative) scores as more relevant. |

Implement SQLite FTS5 indexing with bun:sqlite:
Co-Authored-By: Claude Opus 4.5 noreply@anthropic.com
Resolves #55