feat(opensearch): Handle errors due to clause overflow caused by fuzziness #3068
bogdankostic merged 4 commits into main
- documents = self._search_documents(search_params)
+ try:
+     documents = self._search_documents(search_params)
+ except TransportError as e:
In the request/discussion I see a RequestError instead.
Is this the correct exception to catch?
Unhandled exception
RequestError: RequestError(400, 'search_phase_execution_exception', 'too_many_clauses: maxClauseCount is set to 1024')
/home/haystackd/.local/lib/python3.12/site-packages/haystack/core/pipeline/pipeline.py, line 70, _run_component
/home/haystackd/.local/lib/python3.12/site-packages/haystack_integrations/components/retrievers/opensearch/bm25_retriever.py, line 269, run
/home/haystackd/.local/lib/python3.12/site-packages/haystack_integrations/components/retrievers/opensearch/bm25_retriever.py, line 266, run
The issue was created when our integration still used version 2 of opensearch. My guess would be that the exception changed from version 2 to version 3. TransportError is what I get on my end with the latest OpenSearch version.
In any case, RequestError is a subclass of TransportError in opensearch-py (see the exception hierarchy). So catching TransportError also catches RequestError.
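A minimal sketch of why the broader `except` clause is sufficient. The two classes below are local stand-ins that only mirror the parent/child relationship described above; they are not imported from opensearch-py, and `classify` is a hypothetical helper used for illustration.

```python
class TransportError(Exception):
    """Stand-in for the base exception (assumption for this sketch)."""


class RequestError(TransportError):
    """Stand-in for the HTTP 400 subclass (assumption for this sketch)."""


def classify(exc: Exception) -> str:
    """Raise the given exception and report which handler caught it."""
    try:
        raise exc
    except TransportError as e:
        # A handler for the base class also matches every subclass.
        return f"caught {type(e).__name__} via TransportError"


print(classify(RequestError(400, "search_phase_execution_exception")))
```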
perfect! thanks for the clarification 👍🏽
davidsbatista left a comment
looks good just doubtful about the correct exception to handle
Related Issues
Proposed Changes:
When a BM25 query with `fuzziness="AUTO"` exceeds OpenSearch's `maxClauseCount` (default 1024), the document store now catches the error and retries with `fuzziness=0` (exact matching). This applies to both the sync (`_bm25_retrieval`) and async (`_bm25_retrieval_async`) methods.

The retry is intentionally skipped when:
- `fuzziness` is already `0` or `"0"` (retry would be redundant)
- `custom_query` is provided

A warning is logged when the fallback is triggered.
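The fallback described above can be sketched roughly as follows. This is not the code from this PR: `search_with_fuzziness_fallback`, `is_clause_overflow`, and the `search` callable are hypothetical names, `TransportError` is a local stand-in rather than the opensearch-py class, and detecting the overflow by looking for `too_many_clauses` in the error text is an assumption based on the traceback in the discussion.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


class TransportError(Exception):
    """Local stand-in for the client's TransportError (assumption)."""


def is_clause_overflow(error: Exception) -> bool:
    # Assumption: the overflow is recognisable by this marker in the message.
    return "too_many_clauses" in str(error)


def search_with_fuzziness_fallback(
    search: Callable[[Any], list],
    fuzziness: Any = "AUTO",
    custom_query: Optional[dict] = None,
) -> list:
    """Run `search(fuzziness)`; on clause overflow, retry once with fuzziness 0."""
    try:
        return search(fuzziness)
    except TransportError as e:
        # Skip the retry when it would be redundant or when the caller
        # supplied their own query.
        skip_retry = fuzziness in (0, "0") or custom_query is not None
        if skip_retry or not is_clause_overflow(e):
            raise
        logger.warning(
            "Query exceeded maxClauseCount with fuzziness=%r; retrying with fuzziness=0.",
            fuzziness,
        )
        return search(0)
```

Re-raising errors that are not clause overflows is a design choice of this sketch, so that unrelated failures are not silently masked by the retry.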
How did you test it?
Notes for the reviewer
Checklist