KnowledgeSDK is a Go library for building and managing vector-based knowledge bases with semantic search capabilities. It provides a comprehensive set of tools for document management, chunking, embedding generation, and semantic search.
- Knowledge base management (create, read, update, delete)
- Document handling with automatic content extraction
- Text chunking for efficient storage and retrieval
- Vector embedding generation
- Semantic search with similarity scoring
- PostgreSQL-based vector storage with efficient indexing
- Support for various file formats via Apache Tika integration
    type Config struct {
        // Database configuration
        DBHost     string
        DBPort     int
        DBName     string
        DBUser     string
        DBPassword string

        // Vector embedding service configuration
        APIKey         string
        BaseURL        string // Compatible with different model services
        EmbeddingModel string // e.g. "text-embedding-ada-002"
    }

    type ChunkConfig struct {
        ChunkSize int // Maximum number of characters per chunk
        Overlap   int // Number of overlapping characters between adjacent chunks
    }

    type SearchParams struct {
        Query               string  // Query text to search for
        TopK                int     // Number of results to return
        SimilarityThreshold float64 // Minimum similarity score (0-1)
        CreatorID           string  // Creator ID for filtering results (optional)
        KBID                string  // Knowledge base ID to limit search scope (optional)
    }

    type TikaConfig struct {
        URL string // Tika server URL, e.g., "http://localhost:9998"
    }

    // DefaultTikaConfig returns default Tika configuration with URL set to "http://localhost:9998"

Creates a new SDK instance with the provided configuration.
- Parameters:
  - config Config: Configuration for database and embedding service
- Returns:
  - *KnowledgeSDK: SDK instance
  - error: Error if initialization fails
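
To show how the database fields of the configuration fit together, here is a minimal sketch that assembles a PostgreSQL connection string from them. The local `Config` type only mirrors the database fields listed above; the `dsn` helper and the `sslmode=disable` setting are assumptions for illustration and are not part of the SDK.

```go
package main

import "fmt"

// Config mirrors only the database fields of the SDK's Config struct.
type Config struct {
	DBHost     string
	DBPort     int
	DBName     string
	DBUser     string
	DBPassword string
}

// dsn is a hypothetical helper (not an SDK function) that builds a
// key/value PostgreSQL connection string from the configuration.
func dsn(c Config) string {
	return fmt.Sprintf("host=%s port=%d dbname=%s user=%s password=%s sslmode=disable",
		c.DBHost, c.DBPort, c.DBName, c.DBUser, c.DBPassword)
}

func main() {
	cfg := Config{
		DBHost:     "localhost",
		DBPort:     5432,
		DBName:     "knowledge",
		DBUser:     "postgres",
		DBPassword: "secret",
	}
	fmt.Println(dsn(cfg))
}
```

In practice the password would come from the environment or a secret store rather than a literal in source code.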

Creates a new knowledge base.
- Parameters:
  - ctx context.Context: Context for the operation
  - kb *KnowledgeBase: Knowledge base object with fields:
    - Name: Knowledge base name
    - Description: Knowledge base description
    - ModelID: Large model identifier (optional)
    - Temperature: Model temperature parameter, controls randomness (optional, default 0.7)
    - RigorousPrompt: Rigorous answer prompt template (optional)
    - EnableRigorousAnswer: Whether to enable rigorous answer mode (optional, default false)
    - ChunkSize: Document chunk size in characters (optional, default 1000)
    - Overlap: Overlap between adjacent chunks (optional, default 50)
    - TopK: Maximum number of related chunks to retrieve (optional, default 5)
    - SimilarityThreshold: Similarity threshold (optional, default 0.6)
    - SystemPromptTemplate: System prompt template (optional)
    - MaxReferenceLength: Maximum reference knowledge length (optional, default 3000)
    - CreatorID: ID of the knowledge base creator (optional)
- Returns:
  - *KnowledgeBase: Created knowledge base
  - error: Error if creation fails

Retrieves a knowledge base by ID.
- Parameters:
  - ctx context.Context: Context for the operation
  - kbID string: ID of the knowledge base
- Returns:
  - *KnowledgeBase: Retrieved knowledge base
  - error: Error if retrieval fails

Lists all knowledge bases.
- Parameters:
  - ctx context.Context: Context for the operation
- Returns:
  - []KnowledgeBase: List of knowledge bases
  - error: Error if listing fails

Lists all knowledge bases by creator ID.
- Parameters:
  - ctx context.Context: Context for the operation
  - creatorID string: ID of the creator
- Returns:
  - []KnowledgeBase: List of knowledge bases
  - error: Error if listing fails

Retrieves multiple knowledge bases by their IDs.
- Parameters:
  - ctx context.Context: Context for the operation
  - kbIDs []string: List of knowledge base IDs
- Returns:
  - []KnowledgeBase: List of knowledge bases
  - error: Error if retrieval fails

Updates all properties of a knowledge base.
- Parameters:
  - ctx context.Context: Context for the operation
  - kb *KnowledgeBase: Knowledge base object with updated fields
- Returns:
  - *KnowledgeBase: Updated knowledge base
  - error: Error if update fails

Deletes a knowledge base and all its documents.
- Parameters:
  - ctx context.Context: Context for the operation
  - kbID string: ID of the knowledge base
- Returns:
  - error: Error if deletion fails

Lists all documents in a knowledge base.
- Parameters:
  - ctx context.Context: Context for the operation
  - kbID string: ID of the knowledge base
- Returns:
  - []Document: List of documents
  - error: Error if listing fails

Lists documents in a knowledge base with pagination, sorting, and keyword filtering for document names.
- Parameters:
  - ctx context.Context: Context for the operation
  - kbID string: ID of the knowledge base
  - keyword string: Keyword for filtering document names (use empty string for no filtering)
  - page int: Page number (starting from 1)
  - pageSize int: Number of documents per page
  - orderBy string: Sorting criteria (e.g., "uploaded_at DESC")
  - creatorID string: ID of the creator (optional, for filtering)
- Returns:
  - []Document: List of documents
  - int64: Total number of documents in the knowledge base matching the filter criteria
  - error: Error if listing fails
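
Because page numbers start from 1, a caller or implementation typically converts `page` and `pageSize` into a row offset and limit. The sketch below shows that arithmetic; the `pageBounds` helper and its fallback page size of 10 are assumptions for illustration, not documented SDK behavior.

```go
package main

import "fmt"

// pageBounds converts a 1-based page number and a page size into the
// offset and limit of a paginated query. Invalid inputs fall back to
// page 1 and an assumed default page size of 10.
func pageBounds(page, pageSize int) (offset, limit int) {
	if page < 1 {
		page = 1
	}
	if pageSize < 1 {
		pageSize = 10
	}
	return (page - 1) * pageSize, pageSize
}

func main() {
	offset, limit := pageBounds(3, 20)
	fmt.Printf("OFFSET %d LIMIT %d\n", offset, limit)
}
```

The returned total (`int64`) can then be divided by `pageSize`, rounding up, to get the number of pages.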

Searches knowledge bases by name.
- Parameters:
  - ctx context.Context: Context for the operation
  - name string: Name keyword to search for
- Returns:
  - []KnowledgeBase: List of matching knowledge bases
  - error: Error if search fails

Searches knowledge bases by description.
- Parameters:
  - ctx context.Context: Context for the operation
  - description string: Description keyword to search for
- Returns:
  - []KnowledgeBase: List of matching knowledge bases
  - error: Error if search fails

Searches knowledge bases by keyword (matches both name and description).
- Parameters:
  - ctx context.Context: Context for the operation
  - keyword string: Keyword to search for
- Returns:
  - []KnowledgeBase: List of matching knowledge bases
  - error: Error if search fails

Performs an advanced search on knowledge bases using multiple criteria.
- Parameters:
  - ctx context.Context: Context for the operation
  - params KnowledgeBaseSearchParams: Search parameters including:
    - Keyword: Keyword to search in name and description (optional)
    - Name: Name keyword (optional)
    - Description: Description keyword (optional)
    - ModelID: Model ID for exact matching (optional)
    - CreatorID: Creator ID for filtering (optional)
    - Page: Page number (starting from 1)
    - PageSize: Number of items per page
    - OrderBy: Sorting criteria (e.g., "created_at DESC")
- Returns:
  - []KnowledgeBase: List of matching knowledge bases
  - int64: Total number of matching knowledge bases
  - error: Error if search fails

Adds a text document to a knowledge base and immediately chunks it.
- Parameters:
  - ctx context.Context: Context for the operation
  - kbID string: ID of the knowledge base
  - name string: Document name
  - content string: Document content
  - chunkConfig ChunkConfig: Chunking configuration
- Returns:
  - *Document: Added document
  - error: Error if addition fails
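
To make the roles of `ChunkSize` and `Overlap` concrete, the following sketch splits text into fixed-size character windows in which adjacent chunks share `Overlap` characters. It is one plausible reading of the chunking configuration, not the SDK's actual splitter; the `splitText` function and its handling of invalid overlap values are assumptions.

```go
package main

import "fmt"

// ChunkConfig mirrors the SDK's chunking configuration.
type ChunkConfig struct {
	ChunkSize int // Maximum number of characters per chunk
	Overlap   int // Number of overlapping characters between adjacent chunks
}

// splitText is a hypothetical splitter: it cuts text into chunks of at most
// ChunkSize runes, where each chunk starts Overlap runes before the end of
// the previous one.
func splitText(text string, cfg ChunkConfig) []string {
	runes := []rune(text) // work on runes so multi-byte characters stay intact
	if cfg.ChunkSize <= 0 || len(runes) == 0 {
		return nil
	}
	overlap := cfg.Overlap
	if overlap < 0 || overlap >= cfg.ChunkSize {
		overlap = 0 // an overlap this large would never advance the window
	}
	step := cfg.ChunkSize - overlap

	var chunks []string
	for start := 0; start < len(runes); start += step {
		end := start + cfg.ChunkSize
		if end > len(runes) {
			end = len(runes)
		}
		chunks = append(chunks, string(runes[start:end]))
		if end == len(runes) {
			break // the last window reached the end of the text
		}
	}
	return chunks
}

func main() {
	chunks := splitText("abcdefghij", ChunkConfig{ChunkSize: 4, Overlap: 1})
	fmt.Printf("%q\n", chunks) // prints ["abcd" "defg" "ghij"]
}
```

A larger overlap preserves more context across chunk boundaries at the cost of storing and embedding more text.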

Adds a document with metadata to a knowledge base and chunks it.
- Parameters:
  - ctx context.Context: Context for the operation
  - kbID string: ID of the knowledge base
  - name string: Document name
  - content string: Document content
  - contentType string: Content MIME type
  - metadata map[string]string: Document metadata
  - chunkConfig ChunkConfig: Chunking configuration
- Returns:
  - *Document: Added document
  - error: Error if addition fails

Retrieves a document by ID.
- Parameters:
  - ctx context.Context: Context for the operation
  - docID string: Document ID
- Returns:
  - *Document: Retrieved document
  - error: Error if retrieval fails

Retrieves a document with its chunks.
- Parameters:
  - ctx context.Context: Context for the operation
  - docID string: Document ID
- Returns:
  - *Document: Retrieved document with chunks
  - error: Error if retrieval fails

Deletes a document and its chunks.
- Parameters:
  - ctx context.Context: Context for the operation
  - docID string: Document ID
- Returns:
  - error: Error if deletion fails

Updates a document's content and re-chunks it.
- Parameters:
  - ctx context.Context: Context for the operation
  - docID string: Document ID
  - newContent string: New document content
  - chunkConfig ChunkConfig: Chunking configuration
- Returns:
  - error: Error if update fails

Retrieves a document's metadata.
- Parameters:
  - ctx context.Context: Context for the operation
  - docID string: Document ID
- Returns:
  - map[string]string: Document metadata
  - error: Error if retrieval fails

Adds a file to a knowledge base, extracts its content, and chunks it.
- Parameters:
  - ctx context.Context: Context for the operation
  - kbID string: ID of the knowledge base
  - fileName string: Name of the file
  - fileData []byte: File data
  - tikaConfig TikaConfig: Apache Tika configuration
  - chunkConfig ChunkConfig: Chunking configuration
- Returns:
  - *Document: Added document
  - error: Error if addition fails

Adds a file from an io.Reader to a knowledge base.
- Parameters:
  - ctx context.Context: Context for the operation
  - kbID string: ID of the knowledge base
  - fileName string: Name of the file
  - reader io.Reader: File data reader
  - tikaConfig TikaConfig: Apache Tika configuration
  - chunkConfig ChunkConfig: Chunking configuration
- Returns:
  - *Document: Added document
  - error: Error if addition fails

Adds a file from an HTTP multipart upload to a knowledge base.
- Parameters:
  - ctx context.Context: Context for the operation
  - kbID string: ID of the knowledge base
  - file *multipart.FileHeader: Uploaded file
  - tikaConfig TikaConfig: Apache Tika configuration
  - chunkConfig ChunkConfig: Chunking configuration
- Returns:
  - *Document: Added document
  - error: Error if addition fails

Extracts content and metadata from a file using Apache Tika.
- Parameters:
  - ctx context.Context: Context for the operation
  - fileName string: Name of the file
  - fileData []byte: File data
  - tikaConfig TikaConfig: Apache Tika configuration
- Returns:
  - *FileContent: Extracted content and metadata
  - error: Error if extraction fails

Extracts content and metadata from a file using io.Reader.
- Parameters:
  - ctx context.Context: Context for the operation
  - fileName string: Name of the file
  - reader io.Reader: File data reader
  - tikaConfig TikaConfig: Apache Tika configuration
- Returns:
  - *FileContent: Extracted content and metadata
  - error: Error if extraction fails

Extracts content and metadata from an HTTP multipart uploaded file.
- Parameters:
  - ctx context.Context: Context for the operation
  - file *multipart.FileHeader: Uploaded file
  - tikaConfig TikaConfig: Apache Tika configuration
- Returns:
  - *FileContent: Extracted content and metadata
  - error: Error if extraction fails

Extracts content and metadata from a file at a given URL.
- Parameters:
  - ctx context.Context: Context for the operation
  - fileURL string: URL of the file
  - tikaConfig TikaConfig: Apache Tika configuration
- Returns:
  - *FileContent: Extracted content and metadata
  - error: Error if extraction fails

Extracts metadata from a file without storing it.
- Parameters:
  - ctx context.Context: Context for the operation
  - fileName string: Name of the file
  - fileData []byte: File data
  - tikaConfig TikaConfig: Apache Tika configuration
- Returns:
  - map[string]string: File metadata
  - error: Error if extraction fails

Extracts metadata from a file using io.Reader without storing it.
- Parameters:
  - ctx context.Context: Context for the operation
  - fileName string: Name of the file
  - reader io.Reader: File data reader
  - tikaConfig TikaConfig: Apache Tika configuration
- Returns:
  - map[string]string: File metadata
  - error: Error if extraction fails

Extracts metadata from an HTTP multipart uploaded file without storing it.
- Parameters:
  - ctx context.Context: Context for the operation
  - file *multipart.FileHeader: Uploaded file
  - tikaConfig TikaConfig: Apache Tika configuration
- Returns:
  - map[string]string: File metadata
  - error: Error if extraction fails

Performs vector similarity search.
- Parameters:
  - ctx context.Context: Context for the operation
  - params SearchParams: Search parameters including:
    - Query: Search query text
    - TopK: Maximum number of results to return
    - SimilarityThreshold: Minimum similarity score (0-1)
    - CreatorID: Creator ID for filtering results (optional)
    - KBID: Knowledge base ID to limit search scope (optional)
- Returns:
  - []SearchResult: Search results
  - error: Error if search fails
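
The interplay of `TopK` and `SimilarityThreshold` can be sketched in memory: score every candidate against the query embedding, drop those below the threshold, and keep the best `TopK`. The SDK presumably performs this ranking inside PostgreSQL. The use of cosine similarity, the `scoredChunk` type, and the `search` function are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// scoredChunk pairs a chunk identifier with its similarity to the query.
type scoredChunk struct {
	ID    string
	Score float64
}

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 if the lengths differ or either vector has zero norm.
func cosine(a, b []float32) float64 {
	if len(a) != len(b) {
		return 0
	}
	var dot, normA, normB float64
	for i := range a {
		x, y := float64(a[i]), float64(b[i])
		dot += x * y
		normA += x * x
		normB += y * y
	}
	if normA == 0 || normB == 0 {
		return 0
	}
	return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// search keeps candidates scoring at least threshold, sorts them by
// descending score (ties broken by ID), and returns at most topK of them.
func search(query []float32, candidates map[string][]float32, topK int, threshold float64) []scoredChunk {
	if topK < 0 {
		topK = 0
	}
	var results []scoredChunk
	for id, vec := range candidates {
		if score := cosine(query, vec); score >= threshold {
			results = append(results, scoredChunk{ID: id, Score: score})
		}
	}
	sort.Slice(results, func(i, j int) bool {
		if results[i].Score != results[j].Score {
			return results[i].Score > results[j].Score
		}
		return results[i].ID < results[j].ID
	})
	if len(results) > topK {
		results = results[:topK]
	}
	return results
}

func main() {
	candidates := map[string][]float32{
		"chunk-a": {1, 0},
		"chunk-b": {0, 1},
		"chunk-c": {1, 1},
	}
	for _, r := range search([]float32{1, 0}, candidates, 5, 0.6) {
		fmt.Printf("%s %.3f\n", r.ID, r.Score)
	}
}
```

With a threshold of 0.6, `chunk-b` (orthogonal to the query) is filtered out even though `TopK` would have allowed it.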

Performs traditional full-text search.
- Parameters:
  - ctx context.Context: Context for the operation
  - query string: Search query
  - limit int: Maximum number of results
  - creatorID string: Creator ID for filtering results (optional)
  - kbID string: Knowledge base ID to limit search scope (optional)
- Returns:
  - []SearchResult: Search results
  - error: Error if search fails

Performs hybrid search (vector + full-text).
- Parameters:
  - ctx context.Context: Context for the operation
  - params SearchParams: Search parameters including:
    - Query: Search query text
    - TopK: Maximum number of results to return
    - SimilarityThreshold: Minimum similarity score (0-1)
    - CreatorID: Creator ID for filtering results (optional)
    - KBID: Knowledge base ID to limit search scope (optional)
- Returns:
  - []SearchResult: Search results
  - error: Error if search fails
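
This reference does not say how the vector and full-text result lists are combined. One common option, shown here purely as an illustration and not as the SDK's method, is reciprocal rank fusion: each list contributes `1/(k + rank)` to a result's score, so items ranked well in both lists rise to the top. The `fuseRanks` function and the constant `k = 60` are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// fuseRanks merges ranked ID lists with reciprocal rank fusion.
// Each list adds 1/(k+rank) to an ID's score, with rank starting at 1.
// The result is sorted by descending score, ties broken by ID.
func fuseRanks(k float64, lists ...[]string) []string {
	scores := make(map[string]float64)
	for _, list := range lists {
		for i, id := range list {
			scores[id] += 1 / (k + float64(i+1))
		}
	}
	ids := make([]string, 0, len(scores))
	for id := range scores {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool {
		if scores[ids[i]] != scores[ids[j]] {
			return scores[ids[i]] > scores[ids[j]]
		}
		return ids[i] < ids[j]
	})
	return ids
}

func main() {
	vectorHits := []string{"a", "b", "c"} // ranked by embedding similarity
	textHits := []string{"b", "d"}        // ranked by full-text relevance
	fmt.Println(fuseRanks(60, vectorHits, textHits)) // prints [b a d c]
}
```

Rank-based fusion avoids having to normalize similarity scores and full-text relevance scores onto a common scale.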

Generates a vector embedding for text.
- Parameters:
  - ctx context.Context: Context for the operation
  - text string: Text to embed
- Returns:
  - []float32: Vector embedding
  - error: Error if generation fails

Generates vector embeddings for multiple texts in batch.
- Parameters:
  - ctx context.Context: Context for the operation
  - texts []string: Texts to embed
- Returns:
  - [][]float32: Vector embeddings
  - error: Error if generation fails

Retrieves the status of embedding generation.
- Parameters:
  - ctx context.Context: Context for the operation
- Returns:
  - *ChunkStatus: Status information
  - error: Error if retrieval fails

Updates a chunk's vector embedding.
- Parameters:
  - ctx context.Context: Context for the operation
  - chunk *Chunk: Chunk to update
  - embedding []float32: Vector embedding
- Returns:
  - error: Error if update fails

Updates multiple chunks' vector embeddings in batch.
- Parameters:
  - ctx context.Context: Context for the operation
  - chunks []Chunk: Chunks to update
  - embeddings [][]float32: Vector embeddings
- Returns:
  - error: Error if update fails

Retrieves chunks pending embedding generation.
- Parameters:
  - ctx context.Context: Context for the operation
  - limit int: Maximum number of chunks
- Returns:
  - []Chunk: Pending chunks
  - error: Error if retrieval fails

Updates a document's status.
- Parameters:
  - ctx context.Context: Context for the operation
  - docID string: Document ID
  - status string: New status
- Returns:
  - error: Error if update fails

Marks a document as successfully uploaded.
- Parameters:
  - ctx context.Context: Context for the operation
  - docID string: Document ID
- Returns:
  - error: Error if marking fails

Marks a document as failed during upload.
- Parameters:
  - ctx context.Context: Context for the operation
  - docID string: Document ID
- Returns:
  - error: Error if marking fails

Marks a document as successfully content-extracted.
- Parameters:
  - ctx context.Context: Context for the operation
  - docID string: Document ID
- Returns:
  - error: Error if marking fails

Marks a document as failed during content extraction.
- Parameters:
  - ctx context.Context: Context for the operation
  - docID string: Document ID
- Returns:
  - error: Error if marking fails

Marks a document as successfully chunked.
- Parameters:
  - ctx context.Context: Context for the operation
  - docID string: Document ID
- Returns:
  - error: Error if marking fails

Marks a document as failed during chunking.
- Parameters:
  - ctx context.Context: Context for the operation
  - docID string: Document ID
- Returns:
  - error: Error if marking fails

Marks a document as successfully indexed.
- Parameters:
  - ctx context.Context: Context for the operation
  - docID string: Document ID
- Returns:
  - error: Error if marking fails

Marks a document as failed during indexing.
- Parameters:
  - ctx context.Context: Context for the operation
  - docID string: Document ID
- Returns:
  - error: Error if marking fails

Checks if a document is ready for content extraction.
- Parameters:
  - ctx context.Context: Context for the operation
  - docID string: Document ID
- Returns:
  - bool: True if ready, false otherwise
  - error: Error if check fails

Checks if a document is ready for chunking.
- Parameters:
  - ctx context.Context: Context for the operation
  - docID string: Document ID
- Returns:
  - bool: True if ready, false otherwise
  - error: Error if check fails

Checks if a document is ready for indexing.
- Parameters:
  - ctx context.Context: Context for the operation
  - docID string: Document ID
- Returns:
  - bool: True if ready, false otherwise
  - error: Error if check fails

Retrieves documents with a specific status.
- Parameters:
  - ctx context.Context: Context for the operation
  - status string: Status to filter by
  - limit int: Maximum number of documents
- Returns:
  - []Document: Documents with the specified status
  - error: Error if retrieval fails

Retrieves documents waiting for content extraction.
- Parameters:
  - ctx context.Context: Context for the operation
  - limit int: Maximum number of documents
- Returns:
  - []Document: Documents waiting for content extraction
  - error: Error if retrieval fails

Retrieves documents waiting for chunking.
- Parameters:
  - ctx context.Context: Context for the operation
  - limit int: Maximum number of documents
- Returns:
  - []Document: Documents waiting for chunking
  - error: Error if retrieval fails

Retrieves documents waiting for indexing.
- Parameters:
  - ctx context.Context: Context for the operation
  - limit int: Maximum number of documents
- Returns:
  - []Document: Documents waiting for indexing
  - error: Error if retrieval fails

Checks if all chunks of a document are indexed.
- Parameters:
  - ctx context.Context: Context for the operation
  - docID string: Document ID
- Returns:
  - bool: True if all chunks are indexed, false otherwise
  - error: Error if check fails

Updates a document's index status based on its chunks.
- Parameters:
  - ctx context.Context: Context for the operation
  - docID string: Document ID
- Returns:
  - error: Error if update fails

Updates multiple document chunks.
- Parameters:
  - ctx context.Context: Context for the operation
  - chunks []Chunk: Chunks to update
- Returns:
  - error: Error if update fails

Retrieves chunks that need indexing.
- Parameters:
  - ctx context.Context: Context for the operation
  - limit int: Maximum number of chunks
- Returns:
  - []Chunk: Chunks needing indexing
  - error: Error if retrieval fails

Marks a chunk as indexed.
- Parameters:
  - ctx context.Context: Context for the operation
  - docID string: Document ID
  - chunkIndex int: Chunk index
- Returns:
  - error: Error if marking fails

Compares the content of two chunks.
- Parameters:
  - chunk1 *Chunk: First chunk
  - chunk2 *Chunk: Second chunk
- Returns:
  - bool: True if content is identical, false otherwise

Retrieves the GORM database connection.
- Returns:
  - *gorm.DB: Database connection

Retrieves the OpenAI client.
- Returns:
  - *openai.Client: OpenAI client

Retrieves the current embedding model name.
- Returns:
  - string: Embedding model name

Retrieves the model vector dimension.
- Returns:
  - int: Vector dimension

Converts a vector embedding to PostgreSQL vector format.
- Parameters:
  - embedding []float32: Vector embedding
- Returns:
  - string: PostgreSQL vector format
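
Assuming the storage layer uses the pgvector extension, whose text representation of a vector is a bracketed, comma-separated list such as `[0.1,0.2,0.3]`, the conversion can be sketched as follows. The `toPGVector` name is hypothetical and the SDK's own function may format values differently.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// toPGVector renders an embedding in the bracketed, comma-separated text
// form accepted by pgvector, e.g. "[0.1,-2,3.5]".
func toPGVector(embedding []float32) string {
	parts := make([]string, len(embedding))
	for i, v := range embedding {
		// bitSize 32 yields the shortest string that round-trips as float32.
		parts[i] = strconv.FormatFloat(float64(v), 'f', -1, 32)
	}
	return "[" + strings.Join(parts, ",") + "]"
}

func main() {
	fmt.Println(toPGVector([]float32{0.1, -2, 3.5})) // prints [0.1,-2,3.5]
}
```

The resulting string can be passed as a query parameter and cast to the vector type on the PostgreSQL side.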

Returns default Tika configuration.
- Returns:
  - TikaConfig: Default Tika configuration with URL set to "http://localhost:9998"
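
- Create the test database: `make setup-test-db`
- Run all tests: `make test`
- Run the search tests: `make test-search`
- Run the knowledge base tests: `make test-kb`
- Clean up test data: `make clean-test-db`

Use the provided test script to run the complete test workflow:

    # Run the full test (including setup and cleanup)
    ./scripts/test_search.sh --cleanup
    # Set up the test environment only
    ./scripts/test_search.sh --setup-only
    # Clean up test data only
    ./scripts/test_search.sh --cleanup-only

To keep the tests reliable, all tests implement data isolation:
- Unique identifiers prevent test data from colliding
- Data created by each test is cleaned up automatically afterwards
- Using a dedicated test database is recommended
- For details, see the Testing Best Practices Guide
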
- DocStatusUploadFailed: Upload failed
- DocStatusUploadSuccess: Upload successful, waiting for content extraction
- DocStatusExtractFailed: Content extraction failed
- DocStatusExtractSuccess: Content extraction successful, waiting for chunking
- DocStatusSplitFailed: Chunking failed
- DocStatusSplitSuccess: Chunking successful, waiting for indexing
- DocStatusIndexFailed: Indexing failed
- DocStatusIndexSuccess: Indexing successful
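
The status descriptions imply a linear pipeline: upload, content extraction, chunking, indexing. The sketch below models which stage a document is waiting for given its status. The constant names match the list above, but their string values, the `DocStatus` type, and the `nextStage` helper are assumptions for illustration.

```go
package main

import "fmt"

// DocStatus is a hypothetical string type; the real SDK values may differ.
type DocStatus string

const (
	DocStatusUploadFailed   DocStatus = "upload_failed"
	DocStatusUploadSuccess  DocStatus = "upload_success"
	DocStatusExtractFailed  DocStatus = "extract_failed"
	DocStatusExtractSuccess DocStatus = "extract_success"
	DocStatusSplitFailed    DocStatus = "split_failed"
	DocStatusSplitSuccess   DocStatus = "split_success"
	DocStatusIndexFailed    DocStatus = "index_failed"
	DocStatusIndexSuccess   DocStatus = "index_success"
)

// nextStage reports which processing stage a document is waiting for.
// It returns false for failed documents and for fully indexed ones.
func nextStage(s DocStatus) (string, bool) {
	switch s {
	case DocStatusUploadSuccess:
		return "content extraction", true
	case DocStatusExtractSuccess:
		return "chunking", true
	case DocStatusSplitSuccess:
		return "indexing", true
	default:
		return "", false
	}
}

func main() {
	for _, s := range []DocStatus{DocStatusUploadSuccess, DocStatusSplitFailed, DocStatusIndexSuccess} {
		if stage, ok := nextStage(s); ok {
			fmt.Printf("%s: waiting for %s\n", s, stage)
		} else {
			fmt.Printf("%s: no further stage\n", s)
		}
	}
}
```

A background worker could use the same mapping to decide which documents to pick up at each stage.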