feat(opensearch): Handle errors due to clause overflow caused by fuzziness #3068

Merged
bogdankostic merged 4 commits into main from opensearc_fuzziness_overflow
Apr 1, 2026
@bogdankostic
Contributor

Related Issues

Proposed Changes:

When a BM25 query with fuzziness="AUTO" exceeds OpenSearch's maxClauseCount (default 1024), the document store now catches the error and retries with fuzziness=0 (exact matching). This applies to both the sync (_bm25_retrieval) and async (_bm25_retrieval_async) methods.

The retry is intentionally skipped when:

  • fuzziness is already 0 or "0" (retry would be redundant)
  • A custom_query is provided

A warning is logged when the fallback is triggered.
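
The fallback described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `search_with_fuzziness_fallback`, the `search` callable, and the stand-in `TransportError` class are hypothetical names, and detecting the overflow by looking for `too_many_clauses` in the error text is an assumption.

```python
import logging

logger = logging.getLogger(__name__)


class TransportError(Exception):
    """Stand-in for the client's TransportError so the sketch runs without a cluster."""


def search_with_fuzziness_fallback(search, query, fuzziness="AUTO", custom_query=None):
    """Run `search`; on a clause-overflow error, retry once with exact matching.

    `search` is a hypothetical callable taking (query, fuzziness, custom_query).
    """
    try:
        return search(query, fuzziness, custom_query)
    except TransportError as exc:
        # Assumption: the overflow is recognised by its "too_many_clauses" marker.
        is_overflow = "too_many_clauses" in str(exc)
        already_exact = str(fuzziness) == "0"  # covers both 0 and "0"
        if not is_overflow or already_exact or custom_query is not None:
            raise  # unrelated error, redundant retry, or custom query: no fallback
        logger.warning(
            "Query exceeded the clause limit with fuzziness=%r; retrying with fuzziness=0.",
            fuzziness,
        )
        return search(query, 0, custom_query)


calls = []


def fake_search(query, fuzziness, custom_query):
    """Fake backend that overflows unless fuzziness is the integer 0."""
    calls.append(fuzziness)
    if fuzziness != 0:
        raise TransportError("too_many_clauses: maxClauseCount is set to 1024")
    return [f"doc matching {query!r}"]


results = search_with_fuzziness_fallback(fake_search, "hello")
```

Here the first call with `"AUTO"` fails and the second call with `0` succeeds, so `calls` records both attempts. Passing a `custom_query` or `fuzziness="0"` lets the original error propagate instead.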

How did you test it?

  • Unit tests: Added 8 unit tests (4 sync + 4 async) covering:
    • the retry behavior
    • no-retry when fuzziness is already 0
    • no-retry with custom queries
    • re-raising of unrelated errors
  • Integration tests: Added 2 integration tests (sync + async) that reproduce the too_many_clauses error by indexing similar 5-character words and querying with fuzziness="AUTO", verifying the fallback succeeds.
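
The PR description does not show the test data, so the following is only one hypothetical way to build a corpus of similar 5-character words. It relies on the documented behavior that `AUTO` allows one edit for terms of 3 to 5 characters; `one_edit_substitutions` is an illustrative helper, and whether a given corpus actually exceeds the clause limit depends on the query and the index.

```python
import string


def one_edit_substitutions(base):
    """Return every lowercase word that differs from `base` in exactly one position."""
    variants = set()
    for i, original in enumerate(base):
        for letter in string.ascii_lowercase:
            if letter != original:
                variants.add(base[:i] + letter + base[i + 1:])
    return sorted(variants)


# Each word is within one substitution of "aaaaa", so a fuzzy match on it can expand widely.
words = one_edit_substitutions("aaaaa")
```

Each generated word could then be indexed as its own document before issuing a query with `fuzziness="AUTO"`.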

Notes for the reviewer

Checklist

@bogdankostic bogdankostic requested a review from a team as a code owner March 31, 2026 13:19
@bogdankostic bogdankostic requested review from davidsbatista and removed request for a team March 31, 2026 13:19
@github-actions github-actions bot added the integration:opensearch and type:documentation (Improvements or additions to documentation) labels Mar 31, 2026
@github-actions
Contributor

github-actions bot commented Apr 1, 2026

Coverage report (opensearch)

File: integrations/opensearch/src/haystack_integrations/document_stores/opensearch/document_store.py

This report was generated by python-coverage-comment-action

-        documents = self._search_documents(search_params)
+        try:
+            documents = self._search_documents(search_params)
+        except TransportError as e:
Contributor

In the request/discussion I see a RequestError instead.

Is this the correct exception to catch?

Unhandled exception
RequestError: RequestError(400, 'search_phase_execution_exception', 'too_many_clauses: maxClauseCount is set to 1024')

/home/haystackd/.local/lib/python3.12/site-packages/haystack/core/pipeline/pipeline.py, line 70, _run_component

/home/haystackd/.local/lib/python3.12/site-packages/haystack_integrations/components/retrievers/opensearch/bm25_retriever.py, line 269, run

/home/haystackd/.local/lib/python3.12/site-packages/haystack_integrations/components/retrievers/opensearch/bm25_retriever.py, line 266, run

Contributor Author

The issue was created when our integration still used version 2 of opensearch. My guess would be that the exception changed from version 2 to version 3. TransportError is what I get on my end with the latest OpenSearch version.

In any case, RequestError is a subclass of TransportError in opensearch-py (see the exception hierarchy). So catching TransportError also catches RequestError.
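
A small check of that claim, as a sketch: it imports the real classes when opensearch-py is installed and otherwise falls back to stand-in classes that only mirror the hierarchy described above. The `classify` helper is hypothetical.

```python
try:
    from opensearchpy.exceptions import RequestError, TransportError
except ImportError:
    # Stand-ins mirroring the described hierarchy, used only when opensearch-py is absent.
    class TransportError(Exception):
        pass

    class RequestError(TransportError):
        pass


def classify(exc):
    """Raise `exc` and report which exception type an `except TransportError` clause caught."""
    try:
        raise exc
    except TransportError as caught:
        return f"handled {type(caught).__name__}"


outcome = classify(
    RequestError(
        400,
        "search_phase_execution_exception",
        "too_many_clauses: maxClauseCount is set to 1024",
    )
)
```

Because the `except` clause names the parent class, the `RequestError` from the original report is handled by the same branch.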

Contributor

perfect! thanks for the clarification 👍🏽

Contributor

@davidsbatista davidsbatista left a comment

looks good just doubtful about the correct exception to handle

Contributor

@davidsbatista davidsbatista left a comment

LGTM

@bogdankostic bogdankostic merged commit 6b39f60 into main Apr 1, 2026
9 checks passed
@bogdankostic bogdankostic deleted the opensearc_fuzziness_overflow branch April 1, 2026 13:05