Merged
IndexFlatIP + L2 normalization for cosine similarity. Lazy index init on first upsert(). save/load via .faiss + .meta files.
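The reason IndexFlatIP plus L2 normalization yields cosine similarity can be checked without FAISS: once both vectors have unit length, their inner product equals their cosine. The sketch below uses plain Python rather than FAISS itself, and the helper names are illustrative assumptions, not code from this PR.

```python
import math


def l2_normalize(vec):
    """Scale vec to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return list(vec) if norm == 0.0 else [x / norm for x in vec]


def inner_product(a, b):
    """Plain dot product, the score an inner-product index ranks by."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Reference cosine similarity computed directly from the definition."""
    denom = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / denom


query = [1.0, 2.0, 3.0]
doc = [2.0, 0.5, 1.0]

# Inner product of the normalized vectors vs. cosine of the raw vectors.
score_ip = inner_product(l2_normalize(query), l2_normalize(doc))
score_cos = cosine_similarity(query, doc)
```

Both scores agree up to floating-point rounding, which is why normalizing at upsert and at query time is enough to reuse an inner-product index for cosine ranking.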
True upsert via ON CONFLICT DO UPDATE. Automatic table creation on first upsert(). Cosine similarity derived from pgvector's <=> (cosine distance) operator.
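A minimal sketch of the ON CONFLICT DO UPDATE pattern, using the standard-library sqlite3 module as a stand-in for PostgreSQL so it runs without a database server. The table name `chunks` and its columns are hypothetical, not the schema used by PGVectorStore; the pgvector similarity query is not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Created on demand, mirroring the "automatic table creation" behaviour.
conn.execute(
    "CREATE TABLE IF NOT EXISTS chunks (chunk_id TEXT PRIMARY KEY, body TEXT)"
)


def upsert(conn, chunk_id, body):
    """Insert a row, or overwrite the existing row with the same chunk_id."""
    conn.execute(
        "INSERT INTO chunks (chunk_id, body) VALUES (?, ?) "
        "ON CONFLICT (chunk_id) DO UPDATE SET body = excluded.body",
        (chunk_id, body),
    )


upsert(conn, "orders", "v1")
upsert(conn, "orders", "v2")  # same key: updates instead of duplicating

rows = conn.execute("SELECT chunk_id, body FROM chunks").fetchall()
```

After both calls the table holds a single row for `orders` with the second body, which is the idempotent behaviour the pgvector tests describe.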
Covers upsert/search, cosine score, save/load roundtrip, and error cases. Auto-skipped if faiss-cpu is not installed.
Covers upsert/search, idempotent upsert, score range, and auto table creation. Skipped if TEST_POSTGRES_URL is not set.
…ir Protocol
All integration and component classes now declare their Port contract via explicit inheritance instead of relying on structural subtyping alone.
- AnthropicLLM, OpenAILLM → LLMPort
- SQLAlchemyDB → DBPort
- OpenAIEmbedding → EmbeddingPort
- InMemoryVectorStore, FAISSVectorStore, PGVectorStore → VectorStorePort
- MarkdownLoader, PlainTextLoader, PDFLoader → DocumentLoaderPort
- SemanticChunker → DocumentChunkerPort
…ations
- NullHook, MemoryHook → TraceHook
- RecursiveCharacterChunker → DocumentChunkerPort
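The difference between explicit inheritance and structural subtyping can be sketched as follows. The `invoke` method and the `EchoLLM` and `DuckLLM` classes are hypothetical; the real Port signatures in ports.py may differ.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class LLMPort(Protocol):
    """Illustrative port with one assumed method."""

    def invoke(self, prompt: str) -> str: ...


class EchoLLM(LLMPort):
    """Declares the contract explicitly by inheriting from the port."""

    def invoke(self, prompt: str) -> str:
        return prompt.upper()


class DuckLLM:
    """Satisfies the port only structurally (same method, no inheritance)."""

    def invoke(self, prompt: str) -> str:
        return prompt


explicit = EchoLLM()
structural = DuckLLM()

# Both pass the runtime structural check...
both_conform = isinstance(explicit, LLMPort) and isinstance(structural, LLMPort)
# ...but only the explicit subclass names the port in its class hierarchy.
explicit_declares = LLMPort in EchoLLM.__mro__
structural_declares = LLMPort in DuckLLM.__mro__
```

With explicit inheritance, a type checker verifies the class against the port at the class definition rather than only at the call sites where it is passed as a port, and the intent is visible in the class header.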
- pgvector_: return early in upsert() when vectors list is empty
- chunker: return early in _split() when separators list is empty
- directory_: catch per-file load errors with warnings.warn
so one bad file does not abort the entire directory load
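The per-file error handling described for directory_ can be sketched like this. `load_directory` and `strict_utf8_loader` are hypothetical names for illustration; the actual loader in this PR may have a different signature.

```python
import tempfile
import warnings
from pathlib import Path


def load_directory(directory, load_file):
    """Load every file in directory, warning on (not raising) per-file errors."""
    docs = []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        try:
            docs.append(load_file(path))
        except Exception as exc:
            # One bad file is reported and skipped; the loop keeps going.
            warnings.warn(f"Skipping {path.name}: {exc}")
    return docs


def strict_utf8_loader(path):
    """Toy loader that fails on bytes that are not valid UTF-8."""
    return path.read_bytes().decode("utf-8")


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("hello", encoding="utf-8")
    Path(tmp, "b.txt").write_bytes(b"\xff\xfe broken")  # invalid UTF-8

    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        loaded = load_directory(tmp, strict_utf8_loader)
```

Here the readable file is returned and the unreadable one produces a single warning instead of aborting the whole load.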
Hard-coded 1024 could truncate long SQL responses. Default raised to 4096; users can override per instance.
#️⃣ Issue Number
📝 Summary
- Adds VectorRetriever, FAISSVectorStore, and PGVectorStore to implement vector-based schema search backends.
- Adds HybridRetriever, which combines KeywordRetriever with vector search via Reciprocal Rank Fusion (RRF), and the HybridNL2SQL flow.
- Builds a Document Indexing Pipeline (PDFLoader, RecursiveCharacterChunker, SemanticChunker, and others), removes IndexBuilder, and simplifies the API to from_chunks().
💬 To Reviewers (optional)
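hybrid.py): Merges BM25 and vector results using the 1/(k + rank) formula. Please share your opinion on whether the default k=60 is also appropriate for this domain.
faiss_.py): Upserting the same chunk_id twice creates duplicate entries. Re-indexing currently works by recreating the instance; please review whether a separate delete interface is needed.
ports.py): Every Port now explicitly inherits Protocol, in the form LLMPort(Protocol). This does not affect existing runtime behavior; the change is for type-checking consistency.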
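For reviewers weighing the k=60 default, the 1/(k + rank) merge can be sketched as below. This is a generic Reciprocal Rank Fusion illustration, not the code in hybrid.py; the function name and the sample table names are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists: each item scores sum(1 / (k + rank)), rank from 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by item name for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


keyword_hits = ["orders", "customers", "payments"]  # e.g. a BM25 ranking
vector_hits = ["customers", "refunds", "orders"]    # e.g. a vector ranking

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
fused_ids = [item for item, _ in fused]
```

A larger k flattens the gap between adjacent ranks, so agreement across retrievers matters more than a top position in one of them; a smaller k gives more weight to each list's first few results.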
PR Checklist
including HybridRetriever)
pytest tests/)
Update __init__.py (export FAISSVectorStore, PGVectorStore, HybridRetriever, HybridNL2SQL)
docs/tutorials/, docs/vectorstore/)
reference) How to Code Review