All Models

Qwen/Qwen2.5-VL-32B-Instruct

qwen Tool Calling Attachments Open Weights Structured Output

Qwen2.5-VL-32B is a multimodal vision-language model fine-tuned through reinforcement learning for enhanced mathematical reasoning, structured outputs, and visual problem-solving capabilities. It excels at visual analysis tasks, including object recognition, textual interpretation within images, and precise event localization in extended videos. Qwen2.5-VL-32B demonstrates state-of-the-art performance across multimodal benchmarks such as MMMU, MathVista, and VideoMME, while maintaining strong reasoning and clarity in text-based tasks like MMLU, mathematical problem-solving, and code generation.

Providers 3
Released Mar 24, 2025
Input Modalities text, image
Output Modalities text
Tarsk Use coding

Available Providers (3)

Provider Model ID Input Cost Output Cost Context Max Output Docs
Kilo Gateway qwen/qwen2.5-vl-32b-instruct $0.20/MTok $0.60/MTok 128K 16.4K
SiliconFlow Qwen/Qwen2.5-VL-32B-Instruct $0.27/MTok $0.27/MTok 131K 131K
SiliconFlow (China) Qwen/Qwen2.5-VL-32B-Instruct $0.27/MTok $0.27/MTok 131K 131K

Capabilities

Reasoning
Tool Calling
Attachments
Open Weights
Structured Output