
feat: add Tavily as configurable search engine in core search framework #891

Open
Tavily-FDE-Bot wants to merge 1 commit into modelscope:main from Tavily-FDE:feat/tavily-migration/core-search-framework



@Tavily-FDE-Bot

Summary

  • Added TavilySearch as a new search engine option alongside Exa, SerpAPI, and Arxiv
  • Follows the existing engine pattern: schema dataclasses + SearchEngine subclass + registration in all registries
  • Tavily results are mapped to the shared BaseResult schema for seamless downstream consumption
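For readers unfamiliar with the mapping step, here is a minimal sketch of how a Tavily-style response could be normalized into a shared result type. It assumes the response is a dict with a results list whose items carry url, title, and content keys, and it maps url to id and content to summary. BaseResult below is a hypothetical stand-in with only three fields; the real ms_agent BaseResult may define a different field set.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class BaseResult:
    # Hypothetical stand-in for the shared result schema; the real
    # class may carry more (or differently named) fields.
    id: str
    title: str
    summary: str


def map_tavily_results(response: Dict[str, Any]) -> List[BaseResult]:
    """Map a Tavily-style response dict onto BaseResult objects.

    Assumes each item in response['results'] has 'url', 'title', and
    'content' keys (url -> id, content -> summary).
    """
    mapped: List[BaseResult] = []
    for item in response.get('results', []):
        url = item.get('url')
        if not url:
            # Skip entries without a URL, since the URL doubles as the id.
            continue
        mapped.append(
            BaseResult(
                id=url,
                title=item.get('title', ''),
                summary=item.get('content', ''),
            ))
    return mapped
```

Skipping URL-less items is a design choice of this sketch, not something the PR states; an implementation could equally fall back to another identifier.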

Files Changed

  • ms_agent/tools/search/search_base.py — Added TAVILY to SearchEngineType enum and 'tavily': 'tavily_search' to ENGINE_TOOL_NAMES
  • ms_agent/tools/search/tavily/__init__.py — New package init exporting TavilySearch, TavilySearchRequest, TavilySearchResult
  • ms_agent/tools/search/tavily/schema.py — New file with TavilySearchRequest and TavilySearchResult dataclasses
  • ms_agent/tools/search/tavily/search.py — New file with TavilySearch class using tavily-python SDK
  • ms_agent/tools/search/websearch_tool.py — Added 'tavily' to SUPPORTED_ENGINES, get_search_engine_class(), get_search_engine(), _api_keys init, and connect() engine instantiation
  • ms_agent/tools/search_engine.py — Added TAVILY branch in get_web_search_tool() factory and TAVILY_API_KEY support in env overrides
  • requirements/research.txt — Added tavily-python dependency
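The registration pattern described in the file list can be sketched roughly as follows. Only the TAVILY enum member and the 'tavily' to 'tavily_search' tool-name entry come from the PR; the other enum values, the stub classes, and the dict-based lookup are illustrative assumptions, and the PR's actual get_search_engine_class() may use an if/elif chain instead.

```python
from enum import Enum
from typing import Dict, Type


class SearchEngineType(Enum):
    # Values for the pre-existing engines are assumed spellings.
    EXA = 'exa'
    SERPAPI = 'serpapi'
    ARXIV = 'arxiv'
    TAVILY = 'tavily'


# Derived from the enum so the two lists cannot drift apart.
SUPPORTED_ENGINES = tuple(e.value for e in SearchEngineType)

# Tool name per engine; only the 'tavily' entry is taken from the PR.
ENGINE_TOOL_NAMES: Dict[str, str] = {'tavily': 'tavily_search'}


class SearchEngine:
    """Hypothetical base class standing in for the project's SearchEngine."""


class TavilySearch(SearchEngine):
    """Placeholder; the real class wraps the tavily-python client."""


# Hypothetical registry; Exa, SerpAPI, and Arxiv would be registered here too.
_ENGINE_CLASSES: Dict[SearchEngineType, Type[SearchEngine]] = {
    SearchEngineType.TAVILY: TavilySearch,
}


def get_search_engine_class(engine: str) -> Type[SearchEngine]:
    try:
        engine_type = SearchEngineType(engine.lower())
    except ValueError:
        raise ValueError(f'Unsupported search engine: {engine!r}') from None
    if engine_type not in _ENGINE_CLASSES:
        raise ValueError(f'No implementation registered for {engine!r}')
    return _ENGINE_CLASSES[engine_type]
```

A table-driven lookup keeps adding an engine to a one-line change, which is the same goal the PR's "registration in all registries" step serves.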

Dependency Changes

  • Added tavily-python to requirements/research.txt

Environment Variable Changes

  • Added TAVILY_API_KEY — required when using the Tavily engine

Notes for Reviewers

  • This is an additive change — all existing engines remain fully functional
  • Tavily can be selected via engine: tavily in agent YAML config or via the FIN_RESEARCH_SEARCH_ENGINE override
  • The TavilySearch.search() method maps Tavily API results to the shared BaseResult schema (content → summary, url → id)
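A rough sketch of engine selection, assuming (hypothetically) that a non-empty FIN_RESEARCH_SEARCH_ENGINE value takes precedence over the engine key in the parsed agent YAML, and that TAVILY_API_KEY is checked up front when Tavily is chosen. The function name resolve_engine and the config dict shape are illustrative, not the project's API.

```python
import os
from typing import Any, Dict

# Engine names as listed in the PR; exact spellings for the others are assumed.
SUPPORTED_ENGINES = ('exa', 'serpapi', 'arxiv', 'tavily')


def resolve_engine(config: Dict[str, Any]) -> str:
    """Pick the search engine; a non-empty env override beats the YAML value."""
    override = os.getenv('FIN_RESEARCH_SEARCH_ENGINE', '').strip().lower()
    engine = override or str(config.get('engine', '')).strip().lower()
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(
            f'Unsupported search engine {engine!r}; expected one of {SUPPORTED_ENGINES}')
    # Fail fast instead of erroring on the first search call.
    if engine == 'tavily' and not os.getenv('TAVILY_API_KEY'):
        raise ValueError('TAVILY_API_KEY must be set when using the Tavily engine')
    return engine
```

Usage: with config = {'engine': 'tavily'} and TAVILY_API_KEY exported, resolve_engine(config) returns 'tavily'; setting FIN_RESEARCH_SEARCH_ENGINE=arxiv would switch it without editing the YAML.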

🤖 Generated with Claude Code

Automated Review

  • Passed after 1 attempt(s)
  • Final review: The Tavily migration is well-implemented and follows all existing patterns in the codebase. The additive strategy is respected: no existing engines are removed or altered. All factory functions (get_search_engine_class, get_search_engine, get_web_search_tool, WebSearchTool.connect) are correctly updated. The TavilySearch class, schema dataclasses, __init__.py, enum entry, tool-name mapping, and SUPPORTED_ENGINES tuple are all consistent with how Exa and SerpAPI are implemented. The tavily-python dependency is added to requirements/research.txt. The Tavily SDK is used correctly (TavilyClient.search(**kwargs) returning a dict with a results list). No critical or major issues found; only three minor documentation/style gaps.
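The review notes that the SDK is called as TavilyClient.search(**kwargs) and returns a dict with a results list. The sketch below shows one way a request dataclass could be turned into those keyword arguments. The TavilySearchRequest fields here are assumptions (max_results, search_depth, and include_domains are tavily-python search parameters, but the PR's schema.py may define different fields), and the client is typed structurally so a fake can stand in during tests.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict, List, Protocol


class SupportsSearch(Protocol):
    """Structural type for the one client method this sketch relies on."""

    def search(self, query: str, **kwargs: Any) -> Dict[str, Any]:
        ...


@dataclass
class TavilySearchRequest:
    # Hypothetical field set; the PR's schema may differ.
    query: str
    max_results: int = 5
    search_depth: str = 'basic'
    include_domains: List[str] = field(default_factory=list)

    def to_kwargs(self) -> Dict[str, Any]:
        # Omit empty/None values so the client falls back to its own defaults.
        return {k: v for k, v in asdict(self).items() if v is not None and v != []}


def run_search(client: SupportsSearch, request: TavilySearchRequest) -> List[Dict[str, Any]]:
    kwargs = request.to_kwargs()
    query = kwargs.pop('query')
    response = client.search(query, **kwargs)
    # The SDK returns a dict; a missing 'results' key is treated as no hits.
    return response.get('results', [])
```

With the real SDK, the client would be tavily.TavilyClient(api_key=...), passed to run_search in place of a fake.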

@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces Tavily as a new, configurable search engine within the existing search framework, giving the system an additional web search provider alongside Exa, SerpAPI, and Arxiv. The implementation follows the current architectural patterns, so the new engine plugs into the existing configuration without affecting the other search providers.

Highlights

  • New Search Engine Integration: TavilySearch has been added as a new configurable search engine option, expanding the existing suite of Exa, SerpAPI, and Arxiv.
  • Standardized Implementation: The integration follows the established pattern of using schema dataclasses, a dedicated SearchEngine subclass, and registration across all relevant registries.
  • Unified Result Schema: Tavily search results are consistently mapped to the shared BaseResult schema, ensuring seamless consumption by downstream components.
  • Additive Change: This feature is purely additive, meaning all previously existing search engines remain fully functional and unaffected.
  • Configurability: Tavily can be selected as the search engine via agent YAML configuration or through the FIN_RESEARCH_SEARCH_ENGINE environment variable override.
  • New Dependency: The tavily-python library has been added to requirements/research.txt.
  • API Key Support: Support for TAVILY_API_KEY has been introduced, which is required when using the Tavily engine.



Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist bot left a comment


Code Review

This pull request introduces a new Tavily search engine integration into the existing search tool framework. This includes defining new data structures for Tavily search requests and results, implementing the Tavily search logic, and integrating it across various parts of the ms_agent search tool infrastructure, such as enums, mappings, and configuration. A new tavily-python dependency is also added. The review comments suggest several improvements: replacing assert with more robust error handling for API key validation, refining type hints for better type safety, and switching from print statements to the logger module for consistent warning and informational messages.

    def __init__(self, api_key: str = None):
        api_key = api_key or os.getenv('TAVILY_API_KEY')
        assert api_key, 'TAVILY_API_KEY must be set either as an argument or as an environment variable'

Priority: high

Using assert to validate the API key is not recommended in production code: assert statements are removed when Python runs with optimizations enabled (python -O), so a missing key would not be caught here and would surface later as a less helpful error. Raising a ValueError (or RuntimeError) makes the check explicit and keeps it active in every mode.

Suggested change
    - assert api_key, 'TAVILY_API_KEY must be set either as an argument or as an environment variable'
    + if not api_key:
    +     raise ValueError('TAVILY_API_KEY must be set either as an argument or as an environment variable')
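A minimal sketch of the suggested validation as a standalone helper; the name resolve_tavily_api_key is hypothetical. It also types the parameter as Optional[str], since a str parameter that defaults to None is really optional.

```python
import os
from typing import Optional


def resolve_tavily_api_key(api_key: Optional[str] = None) -> str:
    """Return the explicit key, else TAVILY_API_KEY, else raise ValueError.

    Unlike assert, this check still runs under `python -O`.
    """
    api_key = api_key or os.getenv('TAVILY_API_KEY')
    if not api_key:
        raise ValueError(
            'TAVILY_API_KEY must be set either as an argument or as an environment variable')
    return api_key
```

An explicit argument wins over the environment variable, matching the precedence of the original `api_key or os.getenv(...)` line.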

    arguments: Dict[str, Any] = field(default_factory=dict)

    # The response from the Tavily search API (dict with 'results' key)
    response: SearchResponse = None

Priority: medium

The type hint for response should specify the type parameter of SearchResponse. Since TavilySearch.search populates this field with SearchResponse[BaseResult], the hint should say so; and because the field defaults to None, it should also be wrapped in Optional. Both changes improve type safety and clarity.

Suggested change
    - response: SearchResponse = None
    + response: Optional[SearchResponse[BaseResult]] = None
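To illustrate the typing suggestion, here is a self-contained sketch of a generic response container. SearchResponse, BaseResult, and TavilySearchResult below are simplified stand-ins, not the project's classes; the point is that Optional[SearchResponse[BaseResult]] tells a type checker both that the field may be None and what type the result items have.

```python
from dataclasses import dataclass, field
from typing import Generic, List, Optional, TypeVar


@dataclass
class BaseResult:
    # Hypothetical minimal result type.
    id: str
    summary: str = ''


T = TypeVar('T', bound=BaseResult)


@dataclass
class SearchResponse(Generic[T]):
    # Hypothetical generic container; the project's SearchResponse may differ.
    results: List[T] = field(default_factory=list)


@dataclass
class TavilySearchResult:
    query: str
    # Optional because the field stays None until a search has run.
    response: Optional[SearchResponse[BaseResult]] = None

    def result_ids(self) -> List[str]:
        # The None check is exactly what Optional forces a type checker to demand.
        if self.response is None:
            return []
        return [r.id for r in self.response.results]
```

With the bare SearchResponse hint, a checker would treat items as Any and would not flag a missing None check in methods like result_ids.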

    Convert the search results to a list of dictionaries.
    """
    if not self.response or not self.response.results:
        print('***Warning: No search results found.')

Priority: medium

Using print for warnings is not ideal in library code. A logger is better for consistent logging practice: it lets callers configure output destinations and severity levels.

Suggested change
    - print('***Warning: No search results found.')
    + logger.warning('No search results found.')

        print('***Warning: No search results found.')
        return []

    if not self.query:

Priority: medium

Using print for warnings is not ideal in library code. A logger is better for consistent logging practice: it lets callers configure output destinations and severity levels.

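Suggested change (as posted, this suggestion would replace the if not self.query: line itself and drop the guard; the logger call belongs inside the block, in place of its print)
    if not self.query:
        logger.warning('No query provided for search results.')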
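The two print-to-logger suggestions can be sketched together. This assumes a module-level logger from the standard logging package and a simplified, hypothetical result class holding plain dicts. The excerpt does not show whether the original method returns early after the missing-query warning, so this sketch simply logs and continues.

```python
import logging
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

# Module-level logger; the host application decides level and output.
logger = logging.getLogger(__name__)


@dataclass
class SearchResultSketch:
    # Hypothetical simplified shape: results held as plain dicts.
    query: Optional[str] = None
    results: List[Dict[str, Any]] = field(default_factory=list)

    def to_list(self) -> List[Dict[str, Any]]:
        if not self.results:
            logger.warning('No search results found.')
            return []
        if not self.query:
            # The warning lives inside the guard, where the print() used to be.
            logger.warning('No query provided for search results.')
        # Shallow-copy each dict so callers cannot mutate stored results.
        return [dict(r) for r in self.results]
```

Because the library only emits records, an application can silence these warnings or route them to a file with ordinary logging configuration, which print() does not allow.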

    """
    with open(file_path, 'r', encoding='utf-8') as f:
        data = json.load(f)
    print(f'Search results loaded from {file_path}')

Priority: medium

Using print for informational messages is not ideal in library code. A logger is better for consistent logging practice: it lets callers configure output destinations and severity levels.

Suggested change
    - print(f'Search results loaded from {file_path}')
    + logger.info(f'Search results loaded from {file_path}')
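A hypothetical save/load pair showing the same pattern for informational messages; save_results and load_results are illustrative names, not the project's methods. It passes the path as a logging argument (logger.info('... %s', file_path)) rather than an f-string, which defers string formatting until a handler actually emits the record; the f-string form in the suggestion also works.

```python
import json
import logging
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


def save_results(results: List[Dict[str, Any]], file_path: str) -> None:
    with open(file_path, 'w', encoding='utf-8') as f:
        # ensure_ascii=False keeps non-ASCII text readable in the file.
        json.dump(results, f, ensure_ascii=False, indent=2)
    logger.info('Search results saved to %s', file_path)


def load_results(file_path: str) -> List[Dict[str, Any]]:
    with open(file_path, 'r', encoding='utf-8') as f:
        data = json.load(f)
    # %-style args: formatting happens only if the INFO record is emitted.
    logger.info('Search results loaded from %s', file_path)
    return data
```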


Labels: None yet
Projects: None yet


1 participant