Step17: fix vector db search, filter by url and contenthash #19

Merged: Parth576 merged 5 commits into main from step17 on Mar 2, 2026

Conversation

Parth576 (Owner) commented Mar 2, 2026

No description provided.

Parth576 added 5 commits March 1, 2026 20:49
Add url string parameter to VectorStore.Search method signature,
SearchCall recording struct, and MockVectorStore implementation.
This enables URL-based filtering during search to prevent
cross-contamination between analyzed websites.

Assisted by the code-assist SOP
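
For readers without the diff open, here is a minimal sketch of what the interface change in this commit might look like. Only the names `VectorStore.Search`, `SearchCall`, `MockVectorStore`, and the new `url` string parameter come from the commit message; the `Chunk` type, the `embedding` and `topK` parameters, and all field names are assumptions made for illustration.

```go
package main

import (
	"context"
	"fmt"
)

// Chunk is a hypothetical retrieved chunk; the real type is not shown in this PR.
type Chunk struct {
	URL  string
	Text string
}

// VectorStore sketches the interface after this commit: Search gains a url argument.
// The embedding and topK parameters are assumptions.
type VectorStore interface {
	Search(ctx context.Context, embedding []float32, topK int, url string) ([]Chunk, error)
}

// SearchCall records the arguments of one Search invocation so tests can assert on them.
type SearchCall struct {
	Embedding []float32
	TopK      int
	URL       string
}

// MockVectorStore returns canned results and records every call it receives.
type MockVectorStore struct {
	Results []Chunk
	Err     error
	Calls   []SearchCall
}

// Search appends the call to Calls and returns the canned results.
func (m *MockVectorStore) Search(_ context.Context, embedding []float32, topK int, url string) ([]Chunk, error) {
	m.Calls = append(m.Calls, SearchCall{Embedding: embedding, TopK: topK, URL: url})
	return m.Results, m.Err
}

// Compile-time check that the mock satisfies the interface.
var _ VectorStore = (*MockVectorStore)(nil)

func main() {
	store := &MockVectorStore{Results: []Chunk{{URL: "https://example.com", Text: "hello"}}}
	chunks, err := store.Search(context.Background(), []float32{0.1, 0.2}, 5, "https://example.com")
	fmt.Println(len(chunks), err, store.Calls[0].URL)
}
```

Recording the URL in `SearchCall` lets a test confirm that callers actually forward the URL instead of silently passing an empty string.
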
When a non-empty URL is provided, QdrantStore.Search now applies a
Qdrant payload filter (keyword match on the "url" field) to restrict
results to chunks from that specific website. Empty URL preserves the
previous unfiltered behavior. The URL is also logged for observability.

Assisted by the code-assist SOP
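
The conditional filter described in this commit can be sketched as follows. `Condition` and `Filter` are simplified stand-ins defined locally so the example runs on its own; they are not the Qdrant Go client's types, and `buildURLFilter` is a hypothetical helper name. Only the behavior (keyword match on the "url" payload field, no filter for an empty URL) comes from the commit message.

```go
package main

import "fmt"

// Condition is a stand-in for a payload keyword-match condition.
type Condition struct {
	Key     string
	Keyword string
}

// Filter is a stand-in for a filter whose Must conditions all have to hold.
type Filter struct {
	Must []Condition
}

// buildURLFilter returns nil for an empty URL, which keeps the previous
// unfiltered behavior. Otherwise it requires an exact match on the "url" field.
func buildURLFilter(url string) *Filter {
	if url == "" {
		return nil
	}
	return &Filter{Must: []Condition{{Key: "url", Keyword: url}}}
}

func main() {
	fmt.Println(buildURLFilter("") == nil)
	fmt.Printf("%+v\n", *buildURLFilter("https://example.com"))
}
```

Returning `nil` rather than an empty filter keeps the "no URL" path identical to the pre-change search request, which is presumably why the empty case is handled separately.
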
Add url parameter to Pipeline.Retrieve and forward it to
VectorStore.Search so retrieval is scoped to the correct website.
Update Analyzer.Analyze to pass req.URL through the retrieval call.
Add URL to the retrieve log line for observability.

Assisted by the code-assist SOP
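
A rough sketch of the forwarding described in this commit, assuming `Pipeline` holds a `VectorStore` and `Retrieve` takes an already-computed embedding. The struct layout, the `embedding` and `topK` parameters, the log format, and the `recordingStore` fake are all assumptions; only the idea that `Retrieve` accepts a `url`, logs it, and passes it to `Search` is taken from the commit message.

```go
package main

import (
	"context"
	"fmt"
	"log"
)

// Chunk is a hypothetical retrieved chunk.
type Chunk struct{ Text string }

// VectorStore is the assumed search interface with the url parameter.
type VectorStore interface {
	Search(ctx context.Context, embedding []float32, topK int, url string) ([]Chunk, error)
}

// Pipeline is an assumed wrapper around the store.
type Pipeline struct {
	Store VectorStore
}

// Retrieve logs the URL for observability and forwards it unchanged to Search.
func (p *Pipeline) Retrieve(ctx context.Context, embedding []float32, topK int, url string) ([]Chunk, error) {
	log.Printf("retrieve: url=%q top_k=%d", url, topK)
	chunks, err := p.Store.Search(ctx, embedding, topK, url)
	if err != nil {
		return nil, fmt.Errorf("vector search for %q: %w", url, err)
	}
	return chunks, nil
}

// recordingStore is a tiny fake that remembers the last URL it was asked for.
type recordingStore struct{ lastURL string }

func (s *recordingStore) Search(_ context.Context, _ []float32, _ int, url string) ([]Chunk, error) {
	s.lastURL = url
	return []Chunk{{Text: "chunk from " + url}}, nil
}

func main() {
	store := &recordingStore{}
	p := &Pipeline{Store: store}
	chunks, err := p.Retrieve(context.Background(), []float32{0.3}, 3, "https://example.com")
	fmt.Println(chunks, err, store.lastURL)
}
```

In this shape, `Analyzer.Analyze` would only need to pass `req.URL` as the last argument of `Retrieve`.
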
Add contentHash parameter to VectorStore.Search, Pipeline.Retrieve,
and Analyzer.Analyze so retrieval is scoped to the exact document
version (url + content_hash), preventing stale chunks from a previous
crawl from mixing into results.

Update Qdrant implementation to build a multi-condition Must filter
when both url and contentHash are provided. Add content_hash to the
retrieve log line for observability.

Assisted by the code-assist SOP
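
To illustrate why scoping by both fields keeps stale chunks out, here is a small self-contained sketch. `Condition`, `Filter`, `buildFilter`, and `matches` are local stand-ins, not the Qdrant client's API, and `matches` only imitates in memory what the server-side Must filter is expected to do. Adding one condition per non-empty argument is an assumption; the commit message only describes the case where both values are provided.

```go
package main

import "fmt"

// Condition is a stand-in for a payload keyword-match condition.
type Condition struct{ Key, Keyword string }

// Filter is a stand-in for a filter whose Must conditions are ANDed together.
type Filter struct{ Must []Condition }

// buildFilter adds one Must condition per non-empty argument.
// It returns nil when neither is set, meaning an unfiltered search.
func buildFilter(url, contentHash string) *Filter {
	var must []Condition
	if url != "" {
		must = append(must, Condition{Key: "url", Keyword: url})
	}
	if contentHash != "" {
		must = append(must, Condition{Key: "content_hash", Keyword: contentHash})
	}
	if len(must) == 0 {
		return nil
	}
	return &Filter{Must: must}
}

// matches reports whether a payload satisfies every Must condition.
func matches(f *Filter, payload map[string]string) bool {
	if f == nil {
		return true
	}
	for _, c := range f.Must {
		if payload[c.Key] != c.Keyword {
			return false
		}
	}
	return true
}

func main() {
	f := buildFilter("https://example.com", "hash-v2")
	stale := map[string]string{"url": "https://example.com", "content_hash": "hash-v1"}
	fresh := map[string]string{"url": "https://example.com", "content_hash": "hash-v2"}
	fmt.Println(matches(f, stale), matches(f, fresh)) // prints: false true
}
```

A chunk left over from an earlier crawl shares the URL but not the content hash, so it fails the second condition and is excluded.
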
Parth576 merged commit 812093b into main on Mar 2, 2026
1 check passed