feat: add MiniMax as configurable evaluation LLM provider (default M3) by octo-patch · Pull Request #351 · EvolvingLMMs-Lab/Otter

octo-patch · 2026-03-26T09:34:17Z

Summary

- Add a configurable evaluation LLM client (`pipeline/benchmarks/utils/eval_llm.py`) supporting OpenAI and MiniMax providers, with provider auto-detection, temperature clamping, and think-tag stripping
- Default MiniMax model is `MiniMax-M3` (512K context, 128K max output, image input support); `MiniMax-M2.7` and `MiniMax-M2.7-highspeed` remain available
- Update the MagnifierBench, MathVista, and MM-Vet evaluation datasets to use the configurable client instead of hardcoded OpenAI API calls, with a backward-compatible `eval_provider` parameter
- Update the Syphus data generation pipeline with MiniMax provider documentation and temperature handling
- Add 24 unit tests and 4 integration tests (all passing)
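
Think-tag stripping matters because a reasoning model can prepend its reasoning to the reply, which would otherwise leak into the judge's parsed answer. The sketch below is a minimal illustration, assuming the reasoning arrives inline wrapped in `<think>...</think>` tags; `strip_think_tags` is a hypothetical helper name, not necessarily the function in `eval_llm.py`.

```python
import re

# Assumption: reasoning is delimited by <think>...</think>. DOTALL lets the
# non-greedy pattern span newlines inside a reasoning block.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)


def strip_think_tags(text: str) -> str:
    """Remove every <think>...</think> block and trim surrounding whitespace (hypothetical helper)."""
    return _THINK_BLOCK.sub("", text).strip()


raw = "<think>\nThe image shows two cats.\n</think>\n\nYes"
print(strip_think_tags(raw))  # prints "Yes"
```

The non-greedy `.*?` keeps two separate reasoning blocks from swallowing the answer text between them.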

Motivation

The benchmark evaluation system (MagnifierBench, MathVista, MM-Vet) previously hardcoded OpenAI GPT-4 as the evaluation judge LLM. This PR makes the evaluation LLM configurable, so users can choose an alternative provider such as MiniMax M3 (the latest MiniMax model, with 512K context and image input support) as a cost-effective evaluation backend.

Configuration

```bash
# For benchmark evaluation (defaults to MiniMax-M3)
export EVAL_LLM_PROVIDER="minimax"
export MINIMAX_API_KEY="your-key"

# For Syphus data generation (via LiteLLM)
export MINIMAX_API_KEY="your-key"
export OPENAI_API_ENGINE="openai/MiniMax-M3"
export OPENAI_API_BASE="https://api.minimax.io/v1"
```
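
The client auto-detects its provider from these environment variables. A minimal sketch of one plausible resolution order follows; the precedence (explicit `EVAL_LLM_PROVIDER` first, then `OPENAI_API_KEY` so that existing setups stay on OpenAI, then `MINIMAX_API_KEY`) and the helper name `detect_provider` are assumptions, not the PR's confirmed logic.

```python
import os
from typing import Mapping, Optional

SUPPORTED_PROVIDERS = ("openai", "minimax")


def detect_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick an evaluation provider from environment variables (hypothetical helper).

    Assumed precedence:
      1. EVAL_LLM_PROVIDER, if set explicitly;
      2. "openai" if OPENAI_API_KEY is set (keeps existing setups unchanged);
      3. "minimax" if only MINIMAX_API_KEY is set;
      4. otherwise fall back to "openai".
    """
    env = os.environ if env is None else env
    explicit = env.get("EVAL_LLM_PROVIDER", "").strip().lower()
    if explicit:
        if explicit not in SUPPORTED_PROVIDERS:
            raise ValueError(f"unsupported EVAL_LLM_PROVIDER: {explicit!r}")
        return explicit
    if env.get("OPENAI_API_KEY"):
        return "openai"
    if env.get("MINIMAX_API_KEY"):
        return "minimax"
    return "openai"


print(detect_provider({"EVAL_LLM_PROVIDER": "minimax", "MINIMAX_API_KEY": "k"}))  # minimax
print(detect_provider({"MINIMAX_API_KEY": "k"}))                                  # minimax
print(detect_provider({"OPENAI_API_KEY": "k", "MINIMAX_API_KEY": "k"}))           # openai
```

Passing the environment as a mapping keeps the function testable without mutating `os.environ`.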

Or via YAML config:

```yaml
datasets:
  - name: magnifierbench
    eval_provider: minimax
    api_key: your-key
```
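
A dataset entry like the one above has to be turned into evaluation-client settings. The sketch below assumes the YAML has already been loaded into a dict (for example with PyYAML's `yaml.safe_load`) and that a missing `eval_provider` means "keep the old OpenAI behaviour", matching the backward-compatibility item in the test plan. `eval_client_kwargs` and the second (`mathvista`) entry are hypothetical illustrations.

```python
from typing import Any, Dict, Optional

# Mirrors the YAML example above after loading; the mathvista entry is a
# hypothetical pre-existing config that never set eval_provider.
CONFIG: Dict[str, Any] = {
    "datasets": [
        {"name": "magnifierbench", "eval_provider": "minimax", "api_key": "your-key"},
        {"name": "mathvista", "api_key": "sk-legacy"},
    ]
}


def eval_client_kwargs(dataset_cfg: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Extract evaluation-client settings from one dataset entry (hypothetical helper).

    A missing eval_provider maps to "openai" so configs written before this
    change keep their previous behaviour; model=None means "use the
    provider's default model".
    """
    return {
        "provider": dataset_cfg.get("eval_provider", "openai"),
        "api_key": dataset_cfg.get("api_key"),
        "model": dataset_cfg.get("eval_model"),
    }


for entry in CONFIG["datasets"]:
    print(entry["name"], eval_client_kwargs(entry))
```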

To target a specific MiniMax model instead of the default:

```python
from pipeline.benchmarks.utils.eval_llm import EvalLLMClient

client = EvalLLMClient(provider="minimax", api_key="your-key", model="MiniMax-M2.7")
```
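
The `model` argument overrides a per-provider default held in a registry (the commit log calls it `PROVIDER_CONFIGS`). The shape below is a hypothetical sketch: only the MiniMax base URL and the `MiniMax-M3` default come from this PR; the `ProviderConfig` dataclass, `resolve_model`, and the `gpt-4` OpenAI default (taken from the motivation's mention of GPT-4) are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ProviderConfig:
    base_url: Optional[str]  # None = use the OpenAI SDK's default endpoint
    default_model: str
    api_key_env: str  # environment variable holding the API key


# Hypothetical registry shape; the OpenAI default model is an assumption.
PROVIDER_CONFIGS: Dict[str, ProviderConfig] = {
    "openai": ProviderConfig(None, "gpt-4", "OPENAI_API_KEY"),
    "minimax": ProviderConfig("https://api.minimax.io/v1", "MiniMax-M3", "MINIMAX_API_KEY"),
}


def resolve_model(provider: str, model: Optional[str] = None) -> str:
    """Return the explicit model if given, else the provider's default."""
    try:
        config = PROVIDER_CONFIGS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return model or config.default_model


print(resolve_model("minimax"))                  # MiniMax-M3
print(resolve_model("minimax", "MiniMax-M2.7"))  # MiniMax-M2.7
```

Because MiniMax is reached through an OpenAI-compatible base URL (the same one the Syphus example passes as `OPENAI_API_BASE`), a client built this way could plausibly reuse `openai.OpenAI(api_key=..., base_url=...)` for both providers.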

Changes

| File | Change |
| --- | --- |
| `pipeline/benchmarks/utils/eval_llm.py` | New configurable LLM client with provider registry; default MiniMax model is M3 |
| `pipeline/benchmarks/utils/__init__.py` | Package init |
| `pipeline/benchmarks/datasets/magnifierbench.py` | Use `EvalLLMClient`; add `eval_provider`/`eval_model` params |
| `pipeline/benchmarks/datasets/mathvista.py` | Use `EvalLLMClient`; add `eval_provider`/`eval_model` params |
| `pipeline/benchmarks/datasets/mmvet.py` | Use `EvalLLMClient`, replacing the `OpenAI()` client |
| `mimic-it/syphus/file_utils.py` | Add MiniMax docs, temperature clamping, `query_llm()` alias |
| `unit_tests/test_eval_llm.py` | 24 unit tests |
| `unit_tests/test_eval_llm_integration.py` | 4 integration tests |
| `README.md` | MiniMax badge, config docs |
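
Temperature clamping (listed for both the eval client and Syphus `file_utils.py`) keeps OpenAI-style settings such as `temperature=0` from being rejected by a provider with a narrower accepted range. The sketch assumes MiniMax accepts temperatures in (0, 1]; those bounds, the `0.01` floor, and the helper names `clamp_temperature` and `syphus_temperature` are illustrative assumptions to check against MiniMax's API documentation.

```python
import os
from typing import Mapping, Optional

# Assumed MiniMax bounds: temperature must lie in (0.0, 1.0]. The floor of
# 0.01 is an illustrative stand-in for "just above zero".
MINIMAX_MIN_TEMPERATURE = 0.01
MINIMAX_MAX_TEMPERATURE = 1.0


def clamp_temperature(temperature: float, provider: str) -> float:
    """Clamp temperature into the provider's accepted range (hypothetical helper)."""
    if provider != "minimax":
        return temperature  # OpenAI values pass through unchanged
    return min(max(temperature, MINIMAX_MIN_TEMPERATURE), MINIMAX_MAX_TEMPERATURE)


def syphus_temperature(temperature: float, env: Optional[Mapping[str, str]] = None) -> float:
    """Syphus-style variant: clamp only when MINIMAX_API_KEY is set."""
    env = os.environ if env is None else env
    provider = "minimax" if env.get("MINIMAX_API_KEY") else "openai"
    return clamp_temperature(temperature, provider)


print(clamp_temperature(0.0, "minimax"))  # 0.01
print(clamp_temperature(1.5, "minimax"))  # 1.0
print(clamp_temperature(0.0, "openai"))   # 0.0
```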

MiniMax model line

| Model | Notes |
| --- | --- |
| `MiniMax-M3` | Default. 512K context, 128K max output, image input support |
| `MiniMax-M2.7` | Previous-generation model |
| `MiniMax-M2.7-highspeed` | Previous-generation low-latency variant |

Test plan

- 24 unit tests passing (provider config, init, temperature clamping, think-tag stripping, chat completion, retry logic)
- 4 integration tests passing against the live MiniMax API (basic completion, judge yes/no, scoring, auto-detect)
- Verify backward compatibility: existing OpenAI-based evaluation works unchanged when no `eval_provider` is set
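
The test plan covers retry logic and yes/no judging. A minimal sketch of both follows, assuming exponential backoff (1 s, 2 s, 4 s, ...) on any exception and a verdict read from the reply's first word; `call_with_retry`, `parse_yes_no`, and the retry policy are hypothetical and may differ from what `eval_llm.py` actually does. The injectable `sleep` lets tests run without real delays.

```python
import re
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def call_with_retry(
    fn: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponential backoff on any exception."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")


def parse_yes_no(text: str) -> Optional[bool]:
    """Map a judge reply to True/False from its first word; None if unclear."""
    match = re.match(r"\s*(yes|no)\b", text, re.IGNORECASE)
    if match is None:
        return None
    return match.group(1).lower() == "yes"


# Simulated judge that fails twice with a transient error, then answers.
attempts = []


def flaky_judge() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient failure")
    return "Yes, the answer matches."


delays = []
reply = call_with_retry(flaky_judge, sleep=delays.append)
print(parse_yes_no(reply), delays)  # True [1.0, 2.0]
```

Returning `None` for an unclear verdict lets the caller decide whether to re-ask the judge or count the sample as a failure, instead of silently treating it as "no".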

Add support for MiniMax M2.7 as an alternative LLM provider for benchmark evaluation (MagnifierBench, MathVista, MM-Vet) and the Syphus data generation pipeline. Previously, evaluation judging was hardcoded to OpenAI GPT-4.

Changes:
- Add `pipeline/benchmarks/utils/eval_llm.py`: configurable evaluation LLM client supporting OpenAI and MiniMax providers, with auto-detection via environment variables, temperature clamping, and think-tag stripping
- Update `magnifierbench.py`, `mathvista.py`, and `mmvet.py` to use the configurable eval LLM client with a backward-compatible `eval_provider` parameter
- Update Syphus `file_utils.py` with MiniMax provider documentation and temperature clamping when `MINIMAX_API_KEY` is set
- Add 24 unit tests and 4 integration tests
- Update README with MiniMax configuration docs and badge

- Update default model to MiniMax-M3 in `PROVIDER_CONFIGS`
- Document MiniMax-M2.7 and MiniMax-M2.7-highspeed as alternative models
- Update README badge and configuration examples to M3
- Update unit and integration tests to expect M3 as the default
- Update the Syphus pipeline docstring's `OPENAI_API_ENGINE` example to M3

PR Bot and others added 2 commits March 26, 2026 17:33

octo-patch changed the title from "feat: add MiniMax as configurable evaluation LLM provider" to "feat: add MiniMax as configurable evaluation LLM provider (default M3)" on Jun 7, 2026

octo-patch wants to merge 2 commits into `EvolvingLMMs-Lab:main` from `octo-patch:feature/add-minimax-provider`
