Skip to content

Conversation

@galligan
Copy link
Contributor

@galligan galligan commented Jan 23, 2026

Implement SQLite FTS5 indexing with bun:sqlite:

  • createIndex() factory with WAL mode for concurrency
  • BM25 ranking for relevance scoring
  • Three tokenizers: unicode61, porter, trigram
  • CRUD operations: add, search, update, remove
  • Metadata support with JSON serialization
  • Highlight extraction from search results
  • Comprehensive test coverage (39 tests)

Co-Authored-By: Claude Opus 4.5 noreply@anthropic.com

Resolves #55

Copy link
Contributor Author

galligan commented Jan 23, 2026

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more

This stack of pull requests is managed by Graphite. Learn more about stacking.

@greptile-apps
Copy link

greptile-apps bot commented Jan 23, 2026

Greptile Summary

  • Implements SQLite FTS5 full-text search functionality for packages/index with BM25 ranking, WAL mode, and CRUD operations
  • Adds comprehensive test coverage with 39 tests covering index creation, document operations, search functionality, and error handling
  • Provides clean public API through createIndex() factory function with proper TypeScript types and Result-based error handling

Important Files Changed

Filename Overview
packages/index/src/fts5.ts New FTS5 implementation with SQLite virtual tables, BM25 ranking, and transaction safety
packages/index/src/tests/index.test.ts Comprehensive test suite with 39 tests validating all search functionality and edge cases

Confidence score: 4/5

  • This PR is safe to merge with only minor risks related to delete-then-insert pattern complexity
  • Score reflects solid implementation with comprehensive testing, but deducted one point due to complex transaction handling and potential race conditions in the delete-then-insert pattern for document updates
  • Pay close attention to packages/index/src/fts5.ts for transaction handling and concurrent access patterns

Sequence Diagram

sequenceDiagram
    participant User
    participant Index
    participant Database
    participant FTS5

    User->>Index: "createIndex(options)"
    Index->>Database: "new Database(path)"
    Index->>Database: "PRAGMA journal_mode=WAL"
    Index->>Database: "CREATE VIRTUAL TABLE ... USING fts5"
    Database->>FTS5: "Initialize FTS5 table"
    Index-->>User: "return index instance"

    User->>Index: "add(document)"
    Index->>Database: "DELETE FROM table WHERE id = ?"
    Index->>Database: "INSERT INTO table VALUES (id, content, metadata)"
    Database->>FTS5: "Index content for search"
    Index-->>User: "Result.ok()"

    User->>Index: "search(query)"
    Index->>Database: "SELECT ... FROM table WHERE MATCH ?"
    Database->>FTS5: "Execute FTS5 search with BM25"
    FTS5-->>Database: "Return ranked results"
    Database-->>Index: "Search results with scores"
    Index-->>User: "Result.ok(searchResults)"

    User->>Index: "close()"
    Index->>Database: "close()"
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

id UNINDEXED,
content,
metadata UNINDEXED,
tokenize='${tokenizer}'
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: String interpolation in SQL could be vulnerable to injection if tokenizer comes from user input. Is the tokenizer value always controlled by the application code?

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/index/src/fts5.ts
Line: 122:122

Comment:
**style:** String interpolation in SQL could be vulnerable to injection if tokenizer comes from user input. Is the tokenizer value always controlled by the application code?

<sub>Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!</sub>

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines 230 to 270
it("addMany() is atomic (all or nothing)", async () => {
// First add some valid docs
await index.addMany([{ id: "valid-1", content: "Valid content" }]);

// Verify initial state
const db = new Database(dbPath);
const initialCount = db.query("SELECT COUNT(*) as count FROM documents").get() as {
count: number;
};
db.close();

expect(initialCount.count).toBe(1);
});
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Test for addMany() atomicity is incomplete - it only verifies initial state but doesn't test the actual atomic behavior (rollback on failure)

Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/index/src/__tests__/index.test.ts
Line: 230:242

Comment:
**logic:** Test for addMany() atomicity is incomplete - it only verifies initial state but doesn't test the actual atomic behavior (rollback on failure)

How can I resolve this? If you propose a fix, please make it concise.

const walPath = `${dbPath}-wal`;

// Write something to trigger WAL file creation
index.add({ id: "wal-test", content: "WAL test" });
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Test calls index.add() without awaiting the result, which could cause race conditions in the WAL file existence check

Suggested change
index.add({ id: "wal-test", content: "WAL test" });
// Write something to trigger WAL file creation
await index.add({ id: "wal-test", content: "WAL test" });
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/index/src/__tests__/index.test.ts
Line: 600:600

Comment:
**logic:** Test calls index.add() without awaiting the result, which could cause race conditions in the WAL file existence check

```suggestion
		// Write something to trigger WAL file creation
		await index.add({ id: "wal-test", content: "WAL test" });
```

How can I resolve this? If you propose a fix, please make it concise.

@galligan galligan changed the base branch from p3-19/index/types-interfaces to graphite-base/82 January 23, 2026 21:47
@galligan galligan force-pushed the p3-20/index/fts5-impl branch from b8d09c1 to 86b969f Compare January 23, 2026 21:54
@galligan galligan changed the base branch from graphite-base/82 to p3-19/index/types-interfaces January 23, 2026 21:54
@galligan
Copy link
Contributor Author

Verified BM25 ordering: search() orders by bm25 ASC (lower scores first) and the relevance-order test asserts ascending scores. Tests passed in the pre-push run.

@galligan galligan force-pushed the p3-20/index/fts5-impl branch from 86b969f to c319824 Compare January 23, 2026 23:09
@galligan galligan force-pushed the p3-19/index/types-interfaces branch from e52caea to 3bc02a6 Compare January 23, 2026 23:09
@galligan
Copy link
Contributor Author

Restacked after downstack update (formatRelative test stabilization); no additional changes in this PR.

galligan and others added 3 commits January 23, 2026 22:01
Implement SQLite FTS5 indexing with bun:sqlite:
- createIndex() factory with WAL mode for concurrency
- BM25 ranking for relevance scoring
- Three tokenizers: unicode61, porter, trigram
- CRUD operations: add, search, update, remove
- Metadata support with JSON serialization
- Highlight extraction from search results
- Comprehensive test coverage (39 tests)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@galligan galligan changed the base branch from p3-19/index/types-interfaces to graphite-base/82 January 24, 2026 03:02
@galligan galligan force-pushed the p3-20/index/fts5-impl branch from c319824 to 1ed1245 Compare January 24, 2026 03:07
@galligan galligan changed the base branch from graphite-base/82 to p3-19/index/types-interfaces January 24, 2026 03:07
@galligan
Copy link
Contributor Author

Validated tableName/tokenizer to avoid SQL interpolation issues, strengthened addMany atomicity test (rollback on JSON stringify failure), and awaited WAL creation write. Kept BM25 ordering ASC—tests/documentation treat lower (more negative) scores as more relevant.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants