Llama 3.2 11B Vision Instruct

llama Reasoning Tool Calling Attachments Open Weights Structured Output

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...

Providers 8

Released Sep 18, 2024

Input Modalities text, image, audio

Output Modalities text

Tarsk Use coding

Available Providers (8)

Provider	Model ID	Input Cost	Output Cost	Context	Max Output
OpenRouter	`meta-llama/llama-3.2-11b-vision-instruct`	$0.00/MTok	$0.00/MTok	131.1K	8.2K
Nvidia	`meta/llama-3.2-11b-vision-instruct`	$0.00/MTok	$0.00/MTok	128K	4.1K
GitHub Models	`meta/llama-3.2-11b-vision-instruct`	$0.00/MTok	$0.00/MTok	128K	8.2K
Cloudflare AI Gateway	`workers-ai/@cf/meta/llama-3.2-11b-vision-instruct`	$0.05/MTok	$0.68/MTok	128K	16.4K
Inference	`meta/llama-3.2-11b-vision-instruct`	$0.06/MTok	$0.06/MTok	16K	4.1K
Vercel AI Gateway	`meta/llama-3.2-11b`	$0.16/MTok	$0.16/MTok	128K	8.2K
Azure Cognitive Services	`llama-3.2-11b-vision-instruct`	$0.37/MTok	$0.37/MTok	128K	8.2K
Azure	`llama-3.2-11b-vision-instruct`	$0.37/MTok	$0.37/MTok	128K	8.2K

Capabilities

Reasoning

Tool Calling

Attachments

Open Weights

Structured Output