Bounding boxes #138
Open
Labels
enhancement (New feature or request), model (Request to add / extend support for the model)
Search before asking
Description
As far as I know, Qwen2.5-VL is the first open-source multimodal model that can extract bounding boxes.
For an example, see the spatial understanding cookbook: https://github.com/QwenLM/Qwen2.5-VL/blob/main/cookbooks/spatial_understanding.ipynb
It would be great to support bounding box extraction here, in a way that other models can use as well.
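
To make the request concrete, here is a minimal sketch of the kind of post-processing this feature would involve: turning a model's text reply into structured boxes. It assumes the reply contains a JSON array whose items have a `bbox_2d` key holding `[x1, y1, x2, y2]` and a `label` key, which is my understanding of the format shown in the cookbook and should be verified against it. `BoundingBox` and `parse_boxes` are hypothetical names for illustration, not part of any existing API.

```python
from __future__ import annotations

import json
import re
from dataclasses import dataclass


@dataclass
class BoundingBox:
    label: str
    x1: float
    y1: float
    x2: float
    y2: float


def parse_boxes(model_output: str) -> list[BoundingBox]:
    """Extract boxes from a JSON array embedded in a model's text reply."""
    # Take the outermost [...] span so surrounding prose is ignored.
    match = re.search(r"\[.*\]", model_output, flags=re.DOTALL)
    if match is None:
        return []
    try:
        items = json.loads(match.group(0))
    except json.JSONDecodeError:
        return []

    boxes = []
    for item in items:
        if not isinstance(item, dict):
            continue
        coords = item.get("bbox_2d")  # assumed key name, check the cookbook
        if not isinstance(coords, list) or len(coords) != 4:
            continue
        x1, y1, x2, y2 = coords
        boxes.append(BoundingBox(str(item.get("label", "")), x1, y1, x2, y2))
    return boxes


if __name__ == "__main__":
    # Hand-written sample reply, not real model output.
    reply = 'Found: [{"bbox_2d": [10, 20, 110, 220], "label": "button"}]'
    for box in parse_boxes(reply):
        print(box)
```

A model-agnostic interface could return a list of such objects, with each model integration responsible for mapping its own output format (and coordinate convention) onto it.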
Use case
We would use this for generative process automation in https://github.com/OpenAdaptAI/OpenAdapt
Additional
No response
Are you willing to submit a PR?