feat: add MiniMax as configurable evaluation LLM provider (default M3) #351

Open
octo-patch wants to merge 2 commits into EvolvingLMMs-Lab:main from octo-patch:feature/add-minimax-provider

@octo-patch octo-patch commented Mar 26, 2026

Summary

  • Add configurable evaluation LLM client (pipeline/benchmarks/utils/eval_llm.py) supporting OpenAI and MiniMax providers with auto-detection, temperature clamping, and think-tag stripping
  • Default MiniMax model is MiniMax-M3 (512K context, 128K max output, image input support); MiniMax-M2.7 and MiniMax-M2.7-highspeed remain available
  • Update MagnifierBench, MathVista, and MM-Vet evaluation datasets to use the configurable client instead of hardcoded OpenAI API calls, with backward-compatible eval_provider parameter
  • Update Syphus data generation pipeline with MiniMax provider documentation and temperature handling
  • Add 24 unit tests and 4 integration tests (all passing)
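The temperature clamping and think-tag stripping mentioned above could look roughly like the sketch below. It does not reproduce the code in eval_llm.py: the helper names, the `<think>...</think>` tag format, and the temperature bounds are assumptions made for illustration.

```python
import re

# Assumed bounds for illustration only; the limits actually enforced
# by the PR (or by the MiniMax API) are not stated in this description.
MIN_TEMPERATURE = 0.01
MAX_TEMPERATURE = 1.0

# Assumes reasoning is wrapped in <think>...</think>; DOTALL lets the
# block span several lines, and the non-greedy match keeps separate
# blocks from being merged into one.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL | re.IGNORECASE)


def clamp_temperature(temperature: float,
                      low: float = MIN_TEMPERATURE,
                      high: float = MAX_TEMPERATURE) -> float:
    """Force the requested temperature into the closed range [low, high]."""
    return max(low, min(high, temperature))


def strip_think_tags(text: str) -> str:
    """Drop reasoning blocks so only the final answer reaches the judge parser."""
    return _THINK_BLOCK.sub("", text).strip()
```

Stripping before parsing matters for judge-style prompts, where the evaluator expects a bare "yes"/"no" or a score and would otherwise trip over the model's reasoning text.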

Motivation

The benchmark evaluation system (MagnifierBench, MathVista, MM-Vet) previously hardcoded OpenAI GPT-4 as the evaluation judge LLM. This PR makes the evaluation LLM configurable, so users can choose an alternative provider such as MiniMax as a cost-effective evaluation backend. MiniMax-M3, the latest MiniMax model, offers 512K context and image input support.

Configuration

# For benchmark evaluation (defaults to MiniMax-M3)
export EVAL_LLM_PROVIDER="minimax"
export MINIMAX_API_KEY="your-key"

# For Syphus data generation (via liteLLM)
export MINIMAX_API_KEY="your-key"
export OPENAI_API_ENGINE="openai/MiniMax-M3"
export OPENAI_API_BASE="https://api.minimax.io/v1"

Or via YAML config:

datasets:
  - name: magnifierbench
    eval_provider: minimax
    api_key: your-key

To target a specific MiniMax model instead of the default:

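# Import path assumed from the file location listed under Changes
from pipeline.benchmarks.utils.eval_llm import EvalLLMClient

client = EvalLLMClient(provider="minimax", api_key="your-key", model="MiniMax-M2.7")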
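
For readers wondering how an explicit setting, EVAL_LLM_PROVIDER, and auto-detection might interact, here is a minimal sketch. The PR names a PROVIDER_CONFIGS registry but does not show it, so the registry shape, the `resolve_provider` helper, the "gpt-4" model id, and the precedence order (explicit argument, then EVAL_LLM_PROVIDER, then whichever API key is present, else OpenAI) are all assumptions.

```python
import os
from typing import Mapping, Optional

# Hypothetical registry layout. The MiniMax default model and base URL come
# from this PR's description; the OpenAI entry is an assumption.
PROVIDER_CONFIGS = {
    "openai": {"api_key_env": "OPENAI_API_KEY", "default_model": "gpt-4"},
    "minimax": {
        "api_key_env": "MINIMAX_API_KEY",
        "default_model": "MiniMax-M3",
        "base_url": "https://api.minimax.io/v1",
    },
}


def resolve_provider(explicit: Optional[str] = None,
                     env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a provider name; the precedence here is assumed, not confirmed."""
    env = os.environ if env is None else env
    name = explicit or env.get("EVAL_LLM_PROVIDER")
    if name:
        name = name.lower()
        if name not in PROVIDER_CONFIGS:
            raise ValueError(f"Unknown eval provider: {name!r}")
        return name
    # Auto-detect: choose MiniMax only when its key is the sole key available,
    # so setups that already export OPENAI_API_KEY keep their old behavior.
    if env.get("MINIMAX_API_KEY") and not env.get("OPENAI_API_KEY"):
        return "minimax"
    return "openai"
```

Falling back to OpenAI when nothing is configured is one way to keep the backward-compatibility promise for existing benchmark configs.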

Changes

| File | Change |
| --- | --- |
| pipeline/benchmarks/utils/eval_llm.py | New configurable LLM client with provider registry; default MiniMax model is M3 |
| pipeline/benchmarks/utils/__init__.py | Package init |
| pipeline/benchmarks/datasets/magnifierbench.py | Use EvalLLMClient, add eval_provider/eval_model params |
| pipeline/benchmarks/datasets/mathvista.py | Use EvalLLMClient, add eval_provider/eval_model params |
| pipeline/benchmarks/datasets/mmvet.py | Use EvalLLMClient, replace OpenAI() client |
| mimic-it/syphus/file_utils.py | Add MiniMax docs, temp clamping, query_llm() alias |
| unit_tests/test_eval_llm.py | 24 unit tests |
| unit_tests/test_eval_llm_integration.py | 4 integration tests |
| README.md | MiniMax badge, config docs |

MiniMax model line

| Model | Notes |
| --- | --- |
| MiniMax-M3 | Default. 512K context, 128K max output, image input support |
| MiniMax-M2.7 | Previous-generation model |
| MiniMax-M2.7-highspeed | Previous-generation low-latency variant |

Test plan

  • 24 unit tests passing (provider config, init, temp clamping, think-tag stripping, chat completion, retry logic)
  • 4 integration tests passing against live MiniMax API (basic completion, judge yes/no, scoring, auto-detect)
  • Verify backward compatibility: existing OpenAI-based evaluation works unchanged when no eval_provider is set
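
The retry logic covered by the unit tests is not shown in this description. A sketch of one common approach, exponential backoff with an injectable sleep function so tests run without real delays, is given below; the `call_with_retry` name, its parameters, and the backoff schedule are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(fn: Callable[[], T],
                    max_attempts: int = 3,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn until it succeeds or max_attempts is exhausted.

    Waits base_delay * 2**(attempt - 1) seconds after each failed attempt
    and re-raises the last exception once the attempts run out.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Usage in a test: a fake judge call that fails twice and then answers,
# with sleep replaced by a recorder so no time is actually spent waiting.
calls = {"n": 0}
delays = []


def flaky_judge() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient API error")
    return "yes"


result = call_with_retry(flaky_judge, sleep=delays.append)
```

Injecting `sleep` keeps the unit tests fast and lets them assert on the backoff schedule directly.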

PR Bot and others added 2 commits March 26, 2026 17:33
Add support for MiniMax M2.7 as an alternative LLM provider for benchmark
evaluation (MagnifierBench, MathVista, MM-Vet) and the Syphus data generation
pipeline. Previously, evaluation judging was hardcoded to OpenAI GPT-4.

Changes:
- Add pipeline/benchmarks/utils/eval_llm.py: Configurable evaluation LLM
  client supporting OpenAI and MiniMax providers with auto-detection via
  environment variables, temperature clamping, and think-tag stripping
- Update magnifierbench.py, mathvista.py, mmvet.py to use configurable
  eval LLM client with backward-compatible eval_provider parameter
- Update Syphus file_utils.py with MiniMax provider documentation and
  temperature clamping when MINIMAX_API_KEY is set
- Add 24 unit tests and 4 integration tests
- Update README with MiniMax configuration docs and badge

- Update default model to MiniMax-M3 in PROVIDER_CONFIGS
- Document MiniMax-M2.7 and MiniMax-M2.7-highspeed as alternative models
- Update README badge and configuration examples to M3
- Update unit and integration tests to expect M3 as default
- Update Syphus pipeline docstring OPENAI_API_ENGINE example to M3
@octo-patch octo-patch changed the title feat: add MiniMax as configurable evaluation LLM provider feat: add MiniMax as configurable evaluation LLM provider (default M3) Jun 7, 2026