
[pull] main from microsoft:main #132

Merged
pull[bot] merged 2 commits into graphrag:main from microsoft:main
Feb 21, 2026

pull[bot] commented Feb 21, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

dayesouza and others added 2 commits February 20, 2026 19:53
#2236)

* feat(graphrag-vectors): add filtering, timestamps, and CRUD operations

Implement the vector store enhancements from the graphrag-vectors-design spec:

New modules:
- filtering.py: Pydantic-based filter expression system with F builder,
  operator overloads, JSON serialization, client-side evaluate(), and
  per-backend compilation (SQL for LanceDB/CosmosDB, OData for Azure AI Search)
- timestamp.py: ISO 8601 timestamp explosion into filterable component fields
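A filter expression system of this kind (F builder, operator overloads, JSON serialization, client-side `evaluate()`) could fit together roughly as in the minimal sketch below. It is not the graphrag-vectors code: the class names (`Expr`, `Cond`, `Group`, `Not`, `FieldRef`), the `F.<field>` attribute syntax, the operator names (`eq`, `ne`, `gt`, `gte`, `lt`, `lte`), and the dict/JSON shape are all assumptions, and plain frozen dataclasses stand in for the Pydantic models. A document that lacks a filtered field is treated as a non-match.

```python
from __future__ import annotations

import json
import operator
from dataclasses import dataclass
from typing import Any

_OPS = {"eq": operator.eq, "ne": operator.ne, "gt": operator.gt,
        "gte": operator.ge, "lt": operator.lt, "lte": operator.le}

class Expr:
    """Base node; &, | and ~ are ordinary methods, not monkey-patched in later."""

    def __and__(self, other: Expr) -> Expr:
        return Group("and", (self, other))

    def __or__(self, other: Expr) -> Expr:
        return Group("or", (self, other))

    def __invert__(self) -> Expr:
        return Not(self)

    def to_json(self) -> str:
        return json.dumps(self.to_dict())  # to_dict() comes from each subclass

@dataclass(frozen=True)
class Cond(Expr):
    name: str
    op: str
    value: Any

    def evaluate(self, doc: dict[str, Any]) -> bool:
        # A document that lacks the field never matches.
        return self.name in doc and _OPS[self.op](doc[self.name], self.value)

    def to_dict(self) -> dict[str, Any]:
        return {"field": self.name, "op": self.op, "value": self.value}

@dataclass(frozen=True)
class Group(Expr):
    kind: str  # "and" or "or"
    items: tuple[Expr, ...]

    def evaluate(self, doc: dict[str, Any]) -> bool:
        results = (e.evaluate(doc) for e in self.items)
        return all(results) if self.kind == "and" else any(results)

    def to_dict(self) -> dict[str, Any]:
        return {self.kind: [e.to_dict() for e in self.items]}

@dataclass(frozen=True)
class Not(Expr):
    inner: Expr

    def evaluate(self, doc: dict[str, Any]) -> bool:
        return not self.inner.evaluate(doc)

    def to_dict(self) -> dict[str, Any]:
        return {"not": self.inner.to_dict()}

def from_dict(d: dict[str, Any]) -> Expr:
    """Rebuild an expression from its JSON-compatible dict form."""
    for kind in ("and", "or"):
        if kind in d:
            return Group(kind, tuple(from_dict(x) for x in d[kind]))
    if "not" in d:
        return Not(from_dict(d["not"]))
    return Cond(d["field"], d["op"], d["value"])

class FieldRef:
    def __init__(self, name: str) -> None:
        self.name = name

    def __eq__(self, value: object) -> Cond:  # type: ignore[override]
        return Cond(self.name, "eq", value)

    def __ne__(self, value: object) -> Cond:  # type: ignore[override]
        return Cond(self.name, "ne", value)

    def __gt__(self, value: Any) -> Cond:
        return Cond(self.name, "gt", value)

    def __ge__(self, value: Any) -> Cond:
        return Cond(self.name, "gte", value)

    def __lt__(self, value: Any) -> Cond:
        return Cond(self.name, "lt", value)

    def __le__(self, value: Any) -> Cond:
        return Cond(self.name, "lte", value)

class _FieldBuilder:
    def __getattr__(self, name: str) -> FieldRef:
        return FieldRef(name)  # F.year -> FieldRef("year")

F = _FieldBuilder()
```

With this design, `(F.year >= 2020) & ~(F.source == "draft")` builds a tree that `evaluate()` checks in Python and `to_json()` serializes. The parentheses around each comparison are required because `&` binds more tightly than `>=` and `==` in Python.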

Enhanced VectorStoreDocument:
- data: dict for user-defined metadata fields
- create_date / update_date: automatic ISO 8601 timestamps

Enhanced VectorStore base class:
- fields config for typed metadata columns
- insert / count / remove / update CRUD methods
- select, filters, include_vectors params on search methods
- Automatic timestamp explosion on insert/update
- User-defined date field explosion
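Timestamp explosion turns one ISO 8601 string into separate component fields that ordinary equality and range filters can target. The following minimal sketch assumes hypothetical `<field>_year`, `<field>_month`, and similar suffixes and an illustrative set of components; the real module's field names and components may differ. The same helper covers `create_date`/`update_date` and user-defined date fields.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any


def explode_timestamp(prefix: str, iso_value: str) -> dict[str, int]:
    """Split one ISO 8601 timestamp into flat, individually filterable fields."""
    ts = datetime.fromisoformat(iso_value)
    return {
        f"{prefix}_year": ts.year,
        f"{prefix}_month": ts.month,
        f"{prefix}_day": ts.day,
        f"{prefix}_hour": ts.hour,
        f"{prefix}_day_of_week": ts.weekday(),  # Monday == 0
        f"{prefix}_quarter": (ts.month - 1) // 3 + 1,
    }


def explode_dates(fields: dict[str, Any], date_fields: list[str]) -> dict[str, Any]:
    """Return a copy of `fields` with every listed date field exploded.

    The original ISO string is kept next to its component fields.
    """
    out = dict(fields)
    for name in date_fields:
        value = fields.get(name)
        if isinstance(value, str):
            out.update(explode_timestamp(name, value))
    return out
```

Before Python 3.11, `datetime.fromisoformat` rejects a trailing `Z`, so a sketch like this needs an explicit `+00:00` offset (or a newer interpreter) for UTC values.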

Backend implementations (LanceDB, Azure AI Search, CosmosDB):
- Full filter compilation to native query languages
- Typed schema creation with user-defined fields
- All new CRUD operations
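Per-backend compilation can be pictured as one tree walk with different keyword and operator tables. The minimal sketch below works on a hypothetical dict form of a filter (`{"and": [...]}`, `{"or": [...]}`, `{"not": ...}`, `{"field", "op", "value"}`). It emits a SQL `WHERE` fragment (as LanceDB or Cosmos DB might take) and an OData `$filter` string (as Azure AI Search takes). The operator names and the `c.` alias convention are assumptions. A real implementation should also validate field names against the configured schema and, where the backend supports it, prefer parameterized queries to inlined literals.

```python
from __future__ import annotations

from typing import Any

# Hypothetical operator names mapped onto each target query language.
_SQL_OPS = {"eq": "=", "ne": "!=", "gt": ">", "gte": ">=", "lt": "<", "lte": "<="}
_ODATA_OPS = {"eq": "eq", "ne": "ne", "gt": "gt", "gte": "ge", "lt": "lt", "lte": "le"}


def _literal(value: Any) -> str:
    """Render a scalar as a literal that SQL and OData both accept."""
    if isinstance(value, bool):  # test bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"  # escape by doubling quotes
    raise TypeError(f"unsupported literal: {value!r}")


def _compile(expr: dict[str, Any], ops: dict[str, str], words: tuple[str, str, str], prefix: str) -> str:
    """Walk the filter tree; `words` holds the AND / OR / NOT keywords."""
    and_kw, or_kw, not_kw = words
    for key, joiner in (("and", and_kw), ("or", or_kw)):
        if key in expr:
            parts = [_compile(e, ops, words, prefix) for e in expr[key]]
            return "(" + f" {joiner} ".join(parts) + ")"
    if "not" in expr:
        return f"{not_kw} ({_compile(expr['not'], ops, words, prefix)})"
    # Field names are inserted verbatim; a real backend should validate them.
    return f"{prefix}{expr['field']} {ops[expr['op']]} {_literal(expr['value'])}"


def to_sql(expr: dict[str, Any], prefix: str = "") -> str:
    """SQL WHERE fragment; e.g. prefix="c." for a Cosmos DB-style alias."""
    return _compile(expr, _SQL_OPS, ("AND", "OR", "NOT"), prefix)


def to_odata(expr: dict[str, Any]) -> str:
    """OData $filter string of the kind Azure AI Search accepts."""
    return _compile(expr, _ODATA_OPS, ("and", "or", "not"), "")
```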

Breaking changes:
- search_by_id raises IndexError when document not found
- Updated indexer_adapters.py caller to handle the new exception
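After this change, a caller that looks up ids which may be absent has to catch `IndexError`. The following hypothetical sketch shows that pattern with a toy store. `InMemoryStore` and `lookup_many` are illustrative names, and mapping a missing id to `None` is one possible policy, not necessarily what `indexer_adapters.py` does.

```python
from __future__ import annotations

from typing import Any


class InMemoryStore:
    """Toy store whose search_by_id raises IndexError for unknown ids."""

    def __init__(self) -> None:
        self._docs: dict[str, dict[str, Any]] = {}

    def insert(self, doc_id: str, doc: dict[str, Any]) -> None:
        self._docs[doc_id] = doc

    def search_by_id(self, doc_id: str) -> dict[str, Any]:
        try:
            return self._docs[doc_id]
        except KeyError:
            raise IndexError(f"document {doc_id!r} not found") from None


def lookup_many(store: InMemoryStore, ids: list[str]) -> dict[str, dict[str, Any] | None]:
    """Caller-side adaptation: record a missing id as None instead of crashing."""
    results: dict[str, dict[str, Any] | None] = {}
    for doc_id in ids:
        try:
            results[doc_id] = store.search_by_id(doc_id)
        except IndexError:
            results[doc_id] = None
    return results
```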

Tests:
- 54 unit tests for filtering and timestamp modules
- 28 LanceDB integration tests covering CRUD, filters, timestamps, select,
  include_vectors, and user-defined date field explosion

* fix: resolve CI build failures (formatting, lint, pyright, test mocks)

- Fix ruff formatting and lint errors across all changed files
- Refactor filtering.py: move operator overloads from monkey-patching to
  direct class methods for pyright visibility
- Use validation_alias/serialization_alias with populate_by_name for
  Pydantic AND/OR/NOT models (pyright + runtime compatible)
- Use Operator enum members instead of string literals in FieldRef
- Add missing abstract methods (insert, count, remove, update) to test
  mock VectorStore classes
- Update mock method signatures to match base class (select, filters,
  include_vectors params)
- Add docstrings to FieldRef magic methods (ruff D105)
- Fix noqa:S608 placement in cosmosdb.py
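The mock fixes follow from how Python's `abc` module works. A subclass that leaves any `@abstractmethod` unimplemented cannot be instantiated, so adding `insert`/`count`/`remove`/`update` to the base class makes every older test double fail at construction time with a `TypeError`. The following simplified illustration uses a hypothetical stand-in base class; its `search` method and all signatures are assumptions, not the real `VectorStore` interface.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any


class VectorStore(ABC):
    """Hypothetical, trimmed-down stand-in for the real base class."""

    @abstractmethod
    def insert(self, documents: list[dict[str, Any]]) -> None: ...

    @abstractmethod
    def count(self) -> int: ...

    @abstractmethod
    def remove(self, ids: list[str]) -> None: ...

    @abstractmethod
    def update(self, document: dict[str, Any]) -> None: ...

    @abstractmethod
    def search(
        self,
        query: list[float],
        k: int = 10,
        select: list[str] | None = None,
        filters: Any = None,
        include_vectors: bool = True,
    ) -> list[dict[str, Any]]: ...


class OutdatedMock(VectorStore):
    """Written before the CRUD methods existed: only implements search()."""

    def search(self, query, k=10, select=None, filters=None, include_vectors=True):
        return []


class UpdatedMock(VectorStore):
    """Implements every abstract method; search() ignores query and filters."""

    def __init__(self) -> None:
        self.docs: dict[str, dict[str, Any]] = {}

    def insert(self, documents):
        for doc in documents:
            self.docs[doc["id"]] = doc

    def count(self):
        return len(self.docs)

    def remove(self, ids):
        for doc_id in ids:
            self.docs.pop(doc_id, None)

    def update(self, document):
        # Merge the new fields over the stored document.
        self.docs[document["id"]] = {**self.docs[document["id"]], **document}

    def search(self, query, k=10, select=None, filters=None, include_vectors=True):
        return list(self.docs.values())[:k]
```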

* feat: add top-level vector_size to VectorStoreConfig

Add a vector_size field (default 3072) to VectorStoreConfig so users
can set it once instead of on every individual index schema. The value
is propagated to new IndexSchema entries during validation.
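The propagation rule can be sketched minimally as follows. The commit message does not define which entries count as "new"; this sketch assumes it means schemas that do not set their own size. The `index_name` field, the `index_schema` mapping, and the example index names are hypothetical. If the real config is a Pydantic model, a `model_validator(mode="after")` would play the role that `__post_init__` plays in this dataclass version.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IndexSchema:
    index_name: str
    vector_size: int | None = None  # None means "inherit from the store config"


@dataclass
class VectorStoreConfig:
    vector_size: int = 3072
    index_schema: dict[str, IndexSchema] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Validation-time propagation: schemas without their own size inherit it.
        for schema in self.index_schema.values():
            if schema.vector_size is None:
                schema.vector_size = self.vector_size
```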

* chore: add semversioner patch entry

* chore: add ismatch and ftype to spellcheck dictionary

* Add example notebooks for LanceDB, Azure AI Search, and CosmosDB vector stores

- Three notebooks demonstrating: document loading, similarity search, metadata
  filtering with F builder, timestamp filtering, document update/removal
- Sample data files (text_units.parquet, embeddings.text_unit_text.parquet)
- Add CPY001, SLF001, DTZ005 to notebook lint ignores in pyproject.toml

* refactor: extract model/tokenizer creation from generate_text_embeddings into callers
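This refactor moves construction of the embedding model and tokenizer out of `generate_text_embeddings` and into its callers, which is a form of dependency injection. The following hypothetical sketch shows the resulting shape. The signature, the `EmbeddingModel` protocol, the token-budget batching, and the `max_batch_tokens` default are all assumptions for illustration, not graphrag's actual function.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence
from typing import Protocol


class EmbeddingModel(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]: ...


def generate_text_embeddings(
    texts: list[str],
    model: EmbeddingModel,
    tokenize: Callable[[str], Sequence[object]],
    max_batch_tokens: int = 8000,  # arbitrary illustrative default
) -> list[list[float]]:
    """Embed `texts` with a model and tokenizer supplied by the caller.

    Nothing is constructed here. Texts are grouped into batches whose token
    total stays within `max_batch_tokens`; an oversized text gets its own batch.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for text in texts:
        n = len(tokenize(text))
        if current and used + n > max_batch_tokens:
            batches.append(current)
            current, used = [], 0
        current.append(text)
        used += n
    if current:
        batches.append(current)

    vectors: list[list[float]] = []
    for batch in batches:
        vectors.extend(model.embed(batch))
    return vectors
```

Because the function only receives its collaborators, a test can pass a fake model and a whitespace tokenizer instead of a real embedding client.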
pull[bot] locked and limited conversation to collaborators Feb 21, 2026
pull[bot] added the ⤵️ pull label Feb 21, 2026
pull[bot] merged commit cd0c405 into graphrag:main Feb 21, 2026
6 of 10 checks passed