Qwen/Qwen3-VL-32B-Instruct
Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text comprehension, enabling fine-grained spatial reasoning, document and scene analysis, and long-horizon video understanding. It offers robust OCR in 32 languages and enhanced multimodal fusion through the Interleaved-MRoPE and DeepStack architectures. Optimized for agentic interaction and visual tool use, Qwen3-VL-32B delivers state-of-the-art performance on complex real-world multimodal tasks.
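Providers hosting this model typically expose an OpenAI-compatible chat endpoint. The sketch below builds a multimodal request payload that pairs a text prompt with an image URL; the model ID matches the provider table, but the endpoint, image URL, and prompt are illustrative placeholders, not values documented on this page.

```python
import json

# Minimal sketch of an OpenAI-compatible vision chat payload.
# The image URL and prompt are placeholders (assumptions, not from this page).
payload = {
    "model": "qwen/qwen3-vl-32b-instruct",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the layout of this document."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/page.png"}},
            ],
        }
    ],
    "max_tokens": 512,
}

# This payload would be POSTed to the provider's /chat/completions endpoint.
print(json.dumps(payload, indent=2))
```

Note that the content field is a list of typed parts rather than a plain string, which is how mixed text-and-image turns are expressed in this request format.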
Available Providers (3)
| Provider | Model ID | Input Cost | Output Cost | Context | Max Output | Docs |
|---|---|---|---|---|---|---|
| | qwen/qwen3-vl-32b-instruct | $0.10/MTok | $0.42/MTok | 131.1K | 32.8K | |
| | Qwen/Qwen3-VL-32B-Instruct | $0.20/MTok | $0.60/MTok | 262K | 262K | |
| | Qwen/Qwen3-VL-32B-Instruct | $0.20/MTok | $0.60/MTok | 262K | 262K | |
Capabilities
- Reasoning
- Tool Calling
- Attachments
- Open Weights
- Structured Output
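Since tool calling is among the listed capabilities, a request can attach tool definitions in the widely used OpenAI function-calling schema. The sketch below defines a hypothetical `crop_region` tool for a visual-agent workflow; the tool name and parameters are assumptions for illustration, not part of any documented API for this model.

```python
import json

# Hypothetical tool definition in OpenAI function-calling format.
# `crop_region` and its parameters are illustrative assumptions.
tools = [
    {
        "type": "function",
        "function": {
            "name": "crop_region",
            "description": "Crop a rectangular region from the input image.",
            "parameters": {
                "type": "object",
                "properties": {
                    "x": {"type": "integer"},
                    "y": {"type": "integer"},
                    "width": {"type": "integer"},
                    "height": {"type": "integer"},
                },
                "required": ["x", "y", "width", "height"],
            },
        },
    }
]

request = {
    "model": "qwen/qwen3-vl-32b-instruct",
    "messages": [
        {"role": "user", "content": "Zoom into the table in the top-left."}
    ],
    "tools": tools,
    "tool_choice": "auto",
}

print(json.dumps(request, indent=2))
```

With `tool_choice` set to `auto`, the model decides whether to answer directly or emit a tool call whose arguments conform to the declared JSON schema.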