qwen/qwen3-vl-8b-instruct

Tool Calling Attachments Open Weights Structured Output

Qwen3-VL-8B-Instruct is a multimodal vision-language model from the Qwen3-VL series, built for high-fidelity understanding and reasoning across text, images, and video. It features improved multimodal fusion with Interleaved-MRoPE for long-horizon...

Providers 3

Released Oct 15, 2025

Input Modalities text, image, video

Output Modalities text

Tarsk Use coding

Benchmarks

Available Providers (3)

Provider	Model ID	Input Cost	Output Cost	Context	Max Output
NovitaAI	`qwen/qwen3-vl-8b-instruct`	$0.08/MTok	$0.50/MTok	131.1K	32.8K
SiliconFlow (China)	`Qwen/Qwen3-VL-8B-Instruct`	$0.18/MTok	$0.68/MTok	262K	262K
SiliconFlow	`Qwen/Qwen3-VL-8B-Instruct`	$0.18/MTok	$0.68/MTok	262K	262K

Capabilities

Reasoning

Tool Calling

Attachments

Open Weights

Structured Output