
feat: add Oracle AI Vector Search DocumentStore (oracle-haystack) #3096

Open
fede-kamel wants to merge 2 commits into deepset-ai:main from fede-kamel:feat/oracle-document-store


@fede-kamel

Closes #3095

What this PR adds

A new integration package oracle-haystack (integrations/oracle/) that brings Oracle Database 23ai/26ai native vector search into the Haystack ecosystem.

Components

| Class | Description |
| --- | --- |
| OracleDocumentStore | Full DocumentStore protocol: write, filter, delete, count, async, serialization |
| OracleEmbeddingRetriever | @component with filter merging, run + arun |
| OracleConnectionConfig | Dataclass with Secret-wrapped credentials, ADB-S wallet support |

Oracle features used

  • VECTOR(dim, FLOAT32) column — native to Oracle 23ai/26ai, included in DB license
  • CREATE VECTOR INDEX … ORGANIZATION INMEMORY NEIGHBOR GRAPH (HNSW)
  • FETCH APPROX FIRST N ROWS ONLY — activates HNSW for approximate search
  • JSON column + JSON_VALUE(metadata, '$.key') for metadata filtering
  • VARCHAR2(64) primary key — stores any Haystack ID natively (UUID, SHA-256, custom)
  • Thin-mode wallet connections to Oracle Autonomous Database (no Instant Client needed)
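
The features listed above can be pictured as the SQL a store like this might emit. The sketch below only builds statement strings; it is not the PR's implementation. The column names (`id`, `content`, `embedding`, `metadata`), the index name, the `COSINE` distance metric, and both helper functions are assumptions made for illustration.

```python
def build_schema_sql(table: str, dim: int) -> list[str]:
    """Return DDL for a hypothetical document table and its HNSW index.

    The table name is interpolated directly, so it is assumed to have
    been validated by the caller before reaching this function.
    """
    create_table = (
        f"CREATE TABLE {table} ("
        "id VARCHAR2(64) PRIMARY KEY, "
        "content CLOB, "
        f"embedding VECTOR({dim}, FLOAT32), "
        "metadata JSON)"
    )
    # INMEMORY NEIGHBOR GRAPH is Oracle's name for an HNSW index.
    create_index = (
        f"CREATE VECTOR INDEX {table}_hnsw_idx ON {table} (embedding) "
        "ORGANIZATION INMEMORY NEIGHBOR GRAPH "
        "DISTANCE COSINE"  # assumed metric; the PR text does not name one
    )
    return [create_table, create_index]


def build_search_sql(table: str, top_k: int) -> str:
    """Return an approximate top-k query; :query_vec is a bind placeholder."""
    if not isinstance(top_k, int) or top_k < 1:
        raise ValueError("top_k must be a positive integer")
    return (
        f"SELECT id, content FROM {table} "
        "ORDER BY VECTOR_DISTANCE(embedding, :query_vec, COSINE) "
        f"FETCH APPROX FIRST {top_k} ROWS ONLY"
    )
```

The query vector is passed as a bind variable rather than formatted into the string, and only the validated table name and an integer `top_k` are interpolated.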

Haystack filter grammar support

AND / OR / NOT, equality, comparison, in, not in — all translated to Oracle SQL WHERE fragments via _FilterTranslator.
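
To make the translation concrete, here is a minimal sketch of how a Haystack filter dictionary could be turned into a `JSON_VALUE` WHERE fragment with bind variables. It is not the PR's `_FilterTranslator`: the function name `translate`, the `metadata` column name, the bind naming scheme (`p0`, `p1`, ...), and the choice to AND together the conditions under `NOT` are all assumptions.

```python
import re

# Haystack comparison operators mapped to their SQL spelling.
_COMPARISONS = {"==": "=", "!=": "<>", ">": ">", ">=": ">=", "<": "<", "<=": "<="}
# Metadata keys end up inside a JSON path literal, so restrict them.
_SAFE_KEY = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def translate(node: dict, binds: dict) -> str:
    """Recursively build a WHERE fragment, collecting values in `binds`."""
    op = node["operator"]
    if op in ("AND", "OR"):
        parts = [translate(child, binds) for child in node["conditions"]]
        return "(" + f" {op} ".join(parts) + ")"
    if op == "NOT":
        parts = [translate(child, binds) for child in node["conditions"]]
        return "NOT (" + " AND ".join(parts) + ")"

    key = node["field"].removeprefix("meta.")
    if not _SAFE_KEY.fullmatch(key):
        raise ValueError(f"unsafe metadata key: {key!r}")
    # Note: JSON_VALUE returns a string by default; numeric comparisons
    # may need a RETURNING clause, which this sketch leaves out.
    column = f"JSON_VALUE(metadata, '$.{key}')"

    if op in _COMPARISONS:
        name = f"p{len(binds)}"
        binds[name] = node["value"]
        return f"{column} {_COMPARISONS[op]} :{name}"
    if op in ("in", "not in"):
        values = list(node["value"])
        if not values:
            raise ValueError("'in' / 'not in' need at least one value")
        names = []
        for value in values:
            name = f"p{len(binds)}"
            binds[name] = value
            names.append(f":{name}")
        keyword = "IN" if op == "in" else "NOT IN"
        return f"{column} {keyword} ({', '.join(names)})"
    raise ValueError(f"unsupported operator: {op!r}")
```

Calling `translate(filters, binds)` with an empty `binds` dict returns the fragment and fills `binds` with the values to pass alongside the statement, so user-supplied values never appear in the SQL text.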

Tests

  • 34 unit tests — no Oracle required, oracledb.create_pool mocked
  • 13 integration tests — guarded by pytest.mark.skipif(not os.getenv("ORACLE_DSN"), ...)
  • e2e smoke test (tests/e2e_real_data.py) — 200 SQuAD passages via fastembed → Oracle 26ai ADB

Validation results (Oracle AI Database 26ai, OCI Free Tier)

Writing 200 documents...
  Written: 200 docs in 1.1s (181 docs/sec)

Creating HNSW index...
  Index created in 0.7s

Query: "Who invented the telephone?"  [466ms]
  1. [0.3468] Nikola_Tesla — In 1881, Tesla moved to Budapest...
  2. [0.3624] Nikola_Tesla — Nikola Tesla was a Serbian American inventor...

Checklist

  • ruff check and ruff format — clean
  • 34 unit tests pass
  • Integration tests pass against live Oracle 26ai ADB
  • to_dict / from_dict roundtrip — passwords never serialized as plaintext
  • SQL injection prevention via _SAFE_TABLE_NAME regex on table name
  • Async variants for all public methods
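
Table names cannot be passed as bind variables, which is why a regex check on the name matters. The PR does not show the pattern behind `_SAFE_TABLE_NAME`, so the sketch below uses a hypothetical one based on Oracle's rules for unquoted identifiers (a leading letter, then letters, digits, `_`, `$`, or `#`, up to 128 characters). The names `SAFE_TABLE_NAME` and `validate_table_name` are illustrative.

```python
import re

# Hypothetical pattern; the actual _SAFE_TABLE_NAME in the PR may differ.
SAFE_TABLE_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_$#]{0,127}")


def validate_table_name(name: str) -> str:
    """Return the name unchanged if it is a plain identifier, else raise."""
    # fullmatch rejects trailing newlines and any embedded SQL.
    if not SAFE_TABLE_NAME.fullmatch(name):
        raise ValueError(f"invalid table name: {name!r}")
    return name
```

Validating once at construction time means every later statement that interpolates the table name can rely on it being a plain identifier.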

New integration: oracle-haystack

Adds OracleDocumentStore backed by Oracle Database 23ai/26ai native
VECTOR type, plus OracleEmbeddingRetriever.

Key features:
- VECTOR(dim, FLOAT32) column with HNSW approximate search index
- Supports Oracle Autonomous Database (ADB-S) wallet connections
- All three DuplicatePolicy modes via INSERT / MERGE SQL
- Haystack filter grammar translated to JSON_VALUE WHERE clauses
- Full async support (asyncio.to_thread wrappers)
- Credentials handled via Haystack Secret — never serialised as plaintext
- 34 unit tests (mocked oracledb) + 13 integration tests (live Oracle)
- e2e validated: 200 SQuAD passages at 181 docs/sec write, <700ms query
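
The `asyncio.to_thread` approach mentioned in the feature list can be sketched as follows. The class is a hypothetical in-memory stand-in, not the PR's `OracleDocumentStore`, and the `_async` method suffix is an assumed naming convention.

```python
import asyncio


class InMemorySketchStore:
    """Hypothetical stand-in for a store whose sync methods block on I/O."""

    def __init__(self) -> None:
        self._docs: dict[str, str] = {}

    def write_documents(self, docs: dict[str, str]) -> int:
        # In a real store this would be a blocking database round trip.
        self._docs.update(docs)
        return len(docs)

    def count_documents(self) -> int:
        return len(self._docs)

    async def write_documents_async(self, docs: dict[str, str]) -> int:
        # Run the blocking call in a worker thread so the event loop stays free.
        return await asyncio.to_thread(self.write_documents, docs)

    async def count_documents_async(self) -> int:
        return await asyncio.to_thread(self.count_documents)


async def main() -> int:
    store = InMemorySketchStore()
    await store.write_documents_async({"a": "first", "b": "second"})
    return await store.count_documents_async()
```

Wrapping sync methods this way keeps the event loop responsive, but the underlying calls still occupy a thread each, so concurrency is bounded by the default thread pool rather than by a natively asynchronous driver.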

Closes #<issue>
fede-kamel requested a review from a team as a code owner April 2, 2026 16:06
fede-kamel requested review from davidsbatista and removed request for a team April 2, 2026 16:06
@CLAassistant

CLAassistant commented Apr 2, 2026

CLA assistant check
All committers have signed the CLA.

github-actions bot added the "type:documentation" label (Improvements or additions to documentation) Apr 2, 2026
@fede-kamel
Author

recheck

- Add pydoc/config_docusaurus.yml required by CI_check_api_ref workflow
- Add hatch docs/fmt scripts to [tool.hatch.envs.default]
- Use tag-pattern + git_describe_command matching repo conventions
- Add fallback-version = 0.1.0 for new integration with no tags yet
- Add [tool.hatch.envs.test] scripts for unit/integration test runs

Labels

type:documentation Improvements or additions to documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: Oracle AI Vector Search DocumentStore

2 participants