Qwen3 VL 32B Instruct

Tool Calling Attachments Open Weights Structured Output

Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...

Providers 2

Released Oct 21, 2025

Input Modalities text, image

Output Modalities text

Tarsk Use coding

Benchmarks

Available Providers (2)

Provider	Model ID	Input Cost	Output Cost	Context	Max Output	Docs
Kilo Gateway	`qwen/qwen3-vl-32b-instruct`	$0.10/MTok	$0.42/MTok	131.1K	32.8K
OpenRouter	`qwen/qwen3-vl-32b-instruct`	$0.10/MTok	$0.42/MTok	262.1K	32.8K

Capabilities

Reasoning

Tool Calling

Attachments

Open Weights

Structured Output