Models
66 models
Claude Haiku 4.5
Fast and cost-efficient Claude model for latency-sensitive assistants, summaries, and high-volume production traffic.
Claude Opus 4.6
Highly capable Claude model for complex reasoning, analysis, and extended coding tasks. Excels at nuanced understanding and creative generation, and supports 1M context.
Claude Opus 4.7
Anthropic's newest top-tier Claude for the most demanding reasoning, long-context analysis, and extended coding workloads.
Claude Sonnet 4.6
Best balance of intelligence and speed. Ideal for enterprise workloads, RAG, and agentic coding with strong instruction following.
DeepSeek Chat V3 0324
DeepSeek V3 chat snapshot for dependable coding, math, and general reasoning.
DeepSeek Chat V3.1
DeepSeek V3.1 chat model for balanced quality, cost, and coding support.
DeepSeek V3.1 Terminus
DeepSeek V3.1 Terminus model for stronger chat and technical reasoning workloads.
DeepSeek V3.2
High-performance open-weight model with exceptional coding and math abilities. Outstanding cost-efficiency for complex tasks.
DeepSeek V3.2 Exp
Experimental DeepSeek V3.2 model for previewing newer reasoning and chat behavior.
DeepSeek V4 Flash
Lower-latency DeepSeek V4 variant optimized for cost-sensitive chat and structured generation at scale.
DeepSeek V4 Pro
DeepSeek's higher-end V4 model for coding, math, and complex analysis with stronger quality than prior open-weight generations.
Gemini 3.1 Flash Lite Preview
Ultra-lightweight Google model for speed-sensitive applications with excellent cost performance and low latency.
Gemini 3.1 Pro Preview
Google's most capable model with native multimodal understanding, long context, and strong code generation abilities.
Gemma 3 12B IT
Small Gemma 3 instruction model for very low-cost chat and classification tasks.
Gemma 3 27B IT
Instruction-tuned Gemma 3 model for lightweight assistants and text generation.
Gemma 4 26B A4B IT
Compact instruction-tuned Gemma model for fast and low-cost production use.
Gemma 4 31B IT
Instruction-tuned Gemma model for economical chat, coding support, and analysis.
GLM-4.5 Air
Efficient GLM model for low-cost bilingual chat, summarization, and extraction.
GLM-4.6
Earlier GLM generation with dependable bilingual chat and instruction following.
GLM-4.7
Zhipu GLM model for balanced reasoning, coding support, and Chinese-English tasks.
GLM-4.7 Flash
Fast GLM flash model for inexpensive chat, classification, and structured output.
GLM-5
Zhipu AI foundation model with strong bilingual capabilities and advanced reasoning for enterprise applications.
GLM-5.1
Newer Zhipu flagship with stronger bilingual reasoning and enterprise-oriented instruction following.
GPT OSS 120B
Large open-weight GPT OSS model for economical reasoning, coding, and general chat.
GPT OSS 20B
Smaller open-weight GPT OSS model for lightweight chat and high-volume routing.
GPT-5.4
Frontier model from OpenAI with enhanced reasoning, tool use, and multimodal understanding across text, images, and audio.
GPT-5.4 Mini
Small and fast model optimized for lightweight tasks. Excellent cost-efficiency for simple queries and high-volume applications.
GPT-5.5
OpenAI's newer frontier model with stronger reasoning depth, tool use, and reliability for demanding agentic tasks.
Grok 4 Fast
Fast Grok model for responsive assistants and cost-sensitive reasoning workloads.
Grok 4.1 Fast
Low-latency Grok model for fast chat, tool use, and general production workloads.
Grok 4.20
High-capacity Grok model for complex prompts, research, and long-context workflows.
Kimi K2.5
Moonshot AI's powerful model with strong long-context understanding and reasoning. Excels at document analysis and complex tasks.
Kimi K2.6
Updated Moonshot model with stronger long-context reasoning and higher output quality for research and document-heavy workflows.
Llama 3.1 70B Instruct
Larger Llama instruct model for stronger open-weight reasoning and generation.
Llama 3.1 8B Instruct
Small Llama instruct model for inexpensive chat, classification, and extraction.
Llama 3.3 70B Instruct
Updated 70B Llama instruct model for general-purpose chat and reasoning.
Llama 4 Maverick
Llama 4 model for long-context open-weight chat and multimodal-adjacent workflows.
MiMo V2 Flash
Xiaomi MiMo flash model for high-throughput, latency-sensitive production traffic.
MiMo V2 Omni
Omni-capable MiMo model for balanced multimodal-adjacent text and assistant workloads.
MiMo V2 Pro
Xiaomi MiMo pro model for stronger reasoning, coding, and instruction-following tasks.
MiMo V2.5
General-purpose MiMo model with 1M context and tiered pricing for longer prompts.
MiMo V2.5 Pro
Updated MiMo pro model for broad long-context reasoning and production assistants.
MiniMax M2.5
MiniMax's versatile model with balanced performance across reasoning, creative writing, and multilingual tasks.
MiniMax M2.7
Refreshed MiniMax model with broader context and balanced performance for multilingual chat, reasoning, and creative work.
Mistral Nemo
Efficient Mistral model for broad multilingual chat and economical generation.
Mistral Small 3.2 24B Instruct
Instruction-tuned Mistral Small model for compact reasoning and production assistants.
Nemotron 3 Nano 30B A3B
Compact NVIDIA Nemotron model for efficient chat and instruction following.
Nemotron 3 Super 120B A12B
Larger NVIDIA Nemotron model for stronger reasoning and production assistants.
Qwen 3 235B A22B 2507
Large Qwen mixture model for economical reasoning and long-context generation.
Qwen 3 30B A3B Instruct 2507
Compact Qwen mixture model for fast instruction following and practical chat.
Qwen 3 32B
Mid-sized Qwen 3 model for affordable reasoning, chat, and content generation.
Qwen 3 Coder
Qwen 3 coder model for software engineering, code repair, and technical analysis.
Qwen 3 Coder Next
Qwen coding model for code generation, refactoring, and agentic developer workflows.
Qwen 3 Coder Plus
Specialized coding model from Alibaba with strong code generation, debugging, and analysis capabilities.
Qwen 3 Next 80B A3B Instruct
Qwen Next mixture model for efficient reasoning, chat, and instruction following.
Qwen 3 VL 235B A22B Instruct
Qwen VL model for multimodal-oriented prompts and strong text instruction following.
Qwen 3.5 27B
Mid-sized Qwen 3.5 model for general chat, analysis, and structured output.
Qwen 3.5 35B A3B
Efficient Qwen mixture model balancing quality, cost, and multilingual coverage.
Qwen 3.5 397B A17B
Large Qwen 3.5 model for high-quality multilingual reasoning and synthesis.
Qwen 3.5 9B
Small Qwen 3.5 model for low-latency chat, routing, and simple transformations.
Qwen 3.5 Flash
Fast and cost-effective Qwen model optimized for high-throughput tasks. Supports up to 1M context with tiered pricing.
Qwen 3.5 Flash 02-23
Qwen 3.5 Flash snapshot for fast, low-cost multilingual assistant traffic.
Qwen 3.5 Plus
Alibaba's flagship language model with strong multilingual and reasoning capabilities. Supports up to 1M context with tiered pricing.
Qwen 3.5 Plus 02-15
Qwen 3.5 Plus snapshot with 1M context and tiered long-prompt pricing.
Qwen 3.6 Plus
Latest-generation Qwen model with improved reasoning and instruction-following capabilities.
Step 3.5 Flash
StepFun flash model for quick multilingual chat, extraction, and structured generation.