feat: support customized multimodal preprocess configs. #1481
Code Review
This pull request introduces a new MMMessageConfig structure to allow per-message configuration for multimodal inputs, such as pixel limits, frame counts, and FPS. These configurations are propagated from the API layer through the MMInput class to the image processors for GLM-4V and Qwen2-VL. The review feedback correctly identifies critical thread-safety issues in both Glm4VImageProcessor and Qwen2VLImageProcessor, where request-specific configurations are being stored in shared member variables, leading to potential race conditions in concurrent environments. Additionally, the logic for handling multiple mm_config instances within a single request needs to be clarified to avoid unintentional overwriting.
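One way to address both concerns is to keep the processor free of request-specific state and pass each message's config into a `const` method. The sketch below is illustrative only and is not the code from this PR: `ImageProcessorSketch`, `ResolvedLimits`, `resolve`, the field names in `MMMessageConfig`, and the default pixel values are all assumptions made for the example.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the per-message config described in the review.
// The real MMMessageConfig may use different field names and types.
struct MMMessageConfig {
  std::optional<int64_t> min_pixels;
  std::optional<int64_t> max_pixels;
};

// Limits that apply to exactly one preprocess call.
struct ResolvedLimits {
  int64_t min_pixels;
  int64_t max_pixels;
};

// Hypothetical processor: it holds only immutable defaults, so a single
// instance can be shared across requests without locking.
class ImageProcessorSketch {
 public:
  ImageProcessorSketch(int64_t default_min, int64_t default_max)
      : default_min_(default_min), default_max_(default_max) {}

  // The per-message config arrives as an argument and the result is returned
  // by value. No member is written, so concurrent calls cannot observe each
  // other's settings.
  ResolvedLimits resolve(const MMMessageConfig& cfg) const {
    return ResolvedLimits{cfg.min_pixels.value_or(default_min_),
                          cfg.max_pixels.value_or(default_max_)};
  }

 private:
  const int64_t default_min_;
  const int64_t default_max_;
};
```

Because `resolve` is called once per message, two messages in the same request that carry different configs each get their own limits, and a message without a config falls back to the defaults instead of inheriting whatever an earlier message set.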
No description provided.