MiMo V2 Omni

MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capabilities (visual grounding, multi-step planning, tool use, and code execution), making it well suited to complex real-world tasks that span modalities. It has a 256K context window.

Providers: 3
Released: Mar 18, 2026
Input Modalities: text, image, video, audio, pdf
Output Modalities: text
Task Use: coding

Available Providers (3)

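| Provider   | Model ID            | Input Cost | Output Cost | Context | Max Output |
|------------|---------------------|------------|-------------|---------|------------|
| ZenMux     | xiaomi/mimo-v2-omni | $0.40/MTok | $2.00/MTok  | 265K    | 265K       |
| Xiaomi     | mimo-v2-omni        | $0.40/MTok | $2.00/MTok  | 256K    | 128K       |
| OpenRouter | xiaomi/mimo-v2-omni | $0.40/MTok | $2.00/MTok  | 262.1K  | 65.5K      |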
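
As a rough illustration of what the listed rates mean per request, the sketch below estimates cost from token counts. It assumes billing is linear in tokens at $0.40 per million input tokens and $2.00 per million output tokens, as listed for all three providers; the function name and the example token counts are hypothetical, and provider-specific details such as caching discounts or how multimodal inputs are counted are not covered here.

```python
# Rates copied from the provider listing (USD per million tokens).
INPUT_USD_PER_MTOK = 0.40
OUTPUT_USD_PER_MTOK = 2.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request, assuming linear per-token billing."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 200,000 input tokens and 8,000 output tokens.
    print(f"${estimate_cost_usd(200_000, 8_000):.4f}")  # prints $0.0960
```

At these rates output tokens cost five times as much as input tokens, so long generations tend to dominate the bill even when the prompt is large.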

Capabilities

Reasoning
Tool Calling
Attachments
Open Weights
Structured Output