Models

66 models

Claude Haiku 4.5

Fast and cost-efficient Claude model for latency-sensitive assistants, summaries, and high-volume production traffic.

Anthropic|1M context|70 Credits input/350 Credits output/7 Credits cache read/87.5 Credits cache write/ 1M tokens

Claude Opus 4.6

Most capable model for complex reasoning, analysis, and extended coding tasks. Excels at nuanced understanding and creative generation with 1M context.

Anthropic|1M context|350 Credits input/1,750 Credits output/35 Credits cache read/437.5 Credits cache write/ 1M tokens

Claude Opus 4.7

Anthropic's newest top-tier Claude for the most demanding reasoning, long-context analysis, and extended coding workloads.

Anthropic|1M context|350 Credits input/1,750 Credits output/35 Credits cache read/437.5 Credits cache write/ 1M tokens

Claude Sonnet 4.6

Best balance of intelligence and speed. Ideal for enterprise workloads, RAG, and agentic coding with strong instruction following.

Anthropic|1M context|210 Credits input/1,050 Credits output/21 Credits cache read/262.5 Credits cache write/ 1M tokens

DeepSeek Chat V3 0324

DeepSeek V3 chat snapshot for dependable coding, math, and general reasoning.

DeepSeek|164K context|20 Credits input/77 Credits output/13.5 Credits cache read/20 Credits cache write/ 1M tokens

DeepSeek Chat V3.1

DeepSeek V3.1 chat model for balanced quality, cost, and coding support.

DeepSeek|33K context|15 Credits input/75 Credits output/15 Credits cache read/15 Credits cache write/ 1M tokens

DeepSeek V3.1 Terminus

DeepSeek V3.1 Terminus model for stronger chat and technical reasoning workloads.

DeepSeek|164K context|21 Credits input/79 Credits output/13 Credits cache read/21 Credits cache write/ 1M tokens

DeepSeek V3.2

High-performance open-weight model with exceptional coding and math abilities. Outstanding cost-efficiency for complex tasks.

DeepSeek|128K context|19.6 Credits input/29.4 Credits output/1.96 Credits cache read/19.6 Credits cache write/ 1M tokens

DeepSeek V3.2 Exp

Experimental DeepSeek V3.2 model for previewing newer reasoning and chat behavior.

DeepSeek|164K context|27 Credits input/41 Credits output/27 Credits cache read/27 Credits cache write/ 1M tokens

DeepSeek V4 Flash

Lower-latency DeepSeek V4 variant optimized for cost-sensitive chat and structured generation at scale.

DeepSeek|1M context|9.8 Credits input/19.6 Credits output/1.96 Credits cache read/9.8 Credits cache write/ 1M tokens

DeepSeek V4 Pro

DeepSeek's higher-end V4 model for coding, math, and complex analysis with stronger quality than prior open-weight generations.

DeepSeek|1M context|121.8 Credits input/243.6 Credits output/10.15 Credits cache read/121.8 Credits cache write/ 1M tokens

Gemini 3.1 Flash Lite Preview

Ultra-lightweight Google model for speed-sensitive applications with excellent cost performance and low latency.

Google|1M context|17.5 Credits input/105 Credits output/1.75 Credits cache read/17.5 Credits cache write/ 1M tokens

Gemini 3.1 Pro Preview

Google's most capable model with native multimodal understanding, long context, and strong code generation abilities.

Google|1M context|Tiered pricing: 0–200K: 140 Credits input/840 Credits output/14 Credits cache read/140 Credits cache write · 200K–1M: 280 Credits input/1,260 Credits output/28 Credits cache read/280 Credits cache write/ 1M tokens

Gemma 3 12B IT

Small Gemma 3 instruction model for very low-cost chat and classification tasks.

Google|131K context|4 Credits input/13 Credits output/4 Credits cache read/4 Credits cache write/ 1M tokens

Gemma 3 27B IT

Instruction-tuned Gemma 3 model for lightweight assistants and text generation.

Google|131K context|8 Credits input/16 Credits output/8 Credits cache read/8 Credits cache write/ 1M tokens

Gemma 4 26B A4B IT

Compact instruction-tuned Gemma model for fast and low-cost production use.

Google|262K context|13 Credits input/40 Credits output/13 Credits cache read/13 Credits cache write/ 1M tokens

Gemma 4 31B IT

Instruction-tuned Gemma model for economical chat, coding support, and analysis.

Google|262K context|13 Credits input/38 Credits output/13 Credits cache read/13 Credits cache write/ 1M tokens

GLM-4.5 Air

Efficient GLM model for low-cost bilingual chat, summarization, and extraction.

Zhipu|131K context|13 Credits input/85 Credits output/2.5 Credits cache read/13 Credits cache write/ 1M tokens

GLM-4.6

Earlier GLM generation with dependable bilingual chat and instruction following.

Zhipu|205K context|39 Credits input/190 Credits output/39 Credits cache read/39 Credits cache write/ 1M tokens

GLM-4.7

Zhipu GLM model for balanced reasoning, coding support, and Chinese-English tasks.

Zhipu|203K context|40 Credits input/175 Credits output/8 Credits cache read/40 Credits cache write/ 1M tokens

GLM-4.7 Flash

Fast GLM flash model for inexpensive chat, classification, and structured output.

Zhipu|203K context|6 Credits input/40 Credits output/1 Credits cache read/6 Credits cache write/ 1M tokens

GLM-5

Zhipu AI's latest foundation model with strong bilingual capabilities and advanced reasoning for enterprise applications.

Zhipu|200K context|55 Credits input/176 Credits output/11 Credits cache read/55 Credits cache write/ 1M tokens

GLM-5.1

Newer Zhipu flagship with stronger bilingual reasoning and enterprise-oriented instruction following.

Zhipu|200K context|77 Credits input/242 Credits output/14.3 Credits cache read/77 Credits cache write/ 1M tokens

GPT OSS 120B

Large open-weight GPT OSS model for economical reasoning, coding, and general chat.

OpenAI|131K context|3.9 Credits input/19 Credits output/3.9 Credits cache read/3.9 Credits cache write/ 1M tokens

GPT OSS 20B

Smaller open-weight GPT OSS model for lightweight chat and high-volume routing.

OpenAI|131K context|3 Credits input/14 Credits output/3 Credits cache read/3 Credits cache write/ 1M tokens

GPT-5.4

Latest frontier model from OpenAI with enhanced reasoning, tool use, and multimodal understanding across text, images, and audio.

OpenAI|1M context|Tiered pricing: 0–272K: 175 Credits input/1,050 Credits output/17.5 Credits cache read/175 Credits cache write · 272K–1M: 350 Credits input/1,575 Credits output/35 Credits cache read/350 Credits cache write/ 1M tokens

GPT-5.4 Mini

Small and fast model optimized for lightweight tasks. Excellent cost-efficiency for simple queries and high-volume applications.

OpenAI|400K context|52.5 Credits input/315 Credits output/5.25 Credits cache read/52.5 Credits cache write/ 1M tokens

GPT-5.5

OpenAI's newer frontier model with stronger reasoning depth, tool use, and reliability for demanding agentic tasks.

OpenAI|1M context|Tiered pricing: 0–272K: 350 Credits input/2,100 Credits output/35 Credits cache read/350 Credits cache write · 272K–1M: 700 Credits input/3,150 Credits output/70 Credits cache read/700 Credits cache write/ 1M tokens

Grok 4 Fast

Fast Grok model for responsive assistants and cost-sensitive reasoning workloads.

xAI|2M context|20 Credits input/50 Credits output/5 Credits cache read/20 Credits cache write/ 1M tokens

Grok 4.1 Fast

Low-latency Grok model for fast chat, tool use, and general production workloads.

xAI|2M context|20 Credits input/50 Credits output/5 Credits cache read/20 Credits cache write/ 1M tokens

Grok 4.20

High-capacity Grok model for complex prompts, research, and long-context workflows.

xAI|2M context|Tiered pricing: 0–200K: 200 Credits input/600 Credits output/20 Credits cache read/200 Credits cache write · 200K–2M: 400 Credits input/1,200 Credits output/40 Credits cache read/400 Credits cache write/ 1M tokens

Kimi K2.5

Moonshot AI's powerful model with strong long-context understanding and reasoning. Excels at document analysis and complex tasks.

Moonshot|256K context|42 Credits input/210 Credits output/7 Credits cache read/42 Credits cache write/ 1M tokens

Kimi K2.6

Updated Moonshot model with stronger long-context reasoning and higher output quality for research and document-heavy workflows.

Moonshot|256K context|66.5 Credits input/280 Credits output/11.2 Credits cache read/66.5 Credits cache write/ 1M tokens

Llama 3.1 70B Instruct

Larger Llama instruct model for stronger open-weight reasoning and generation.

Meta|131K context|40 Credits input/40 Credits output/40 Credits cache read/40 Credits cache write/ 1M tokens

Llama 3.1 8B Instruct

Small Llama instruct model for inexpensive chat, classification, and extraction.

Meta|16K context|2 Credits input/5 Credits output/2 Credits cache read/2 Credits cache write/ 1M tokens

Llama 3.3 70B Instruct

Updated 70B Llama instruct model for general-purpose chat and reasoning.

Meta|66K context|10 Credits input/32 Credits output/10 Credits cache read/10 Credits cache write/ 1M tokens

Llama 4 Maverick

Llama 4 model for long-context open-weight chat and multimodal-adjacent workflows.

Meta|1M context|15 Credits input/60 Credits output/15 Credits cache read/15 Credits cache write/ 1M tokens

MiMo V2 Flash

Xiaomi MiMo flash model for high-throughput, latency-sensitive production traffic.

Xiaomi|262K context|9 Credits input/29 Credits output/4.5 Credits cache read/9 Credits cache write/ 1M tokens

MiMo V2 Omni

Omni-capable MiMo model for balanced multimodal-adjacent text and assistant workloads.

Xiaomi|262K context|40 Credits input/200 Credits output/8 Credits cache read/40 Credits cache write/ 1M tokens

MiMo V2 Pro

Xiaomi MiMo pro model for stronger reasoning, coding, and instruction-following tasks.

Xiaomi|1M context|Tiered pricing: 0–256K: 100 Credits input/300 Credits output/20 Credits cache read/100 Credits cache write · 256K–1M: 200 Credits input/600 Credits output/40 Credits cache read/200 Credits cache write/ 1M tokens

MiMo V2.5

General-purpose MiMo model with 1M context and tiered pricing for longer prompts.

Xiaomi|1M context|Tiered pricing: 0–256K: 40 Credits input/200 Credits output/8 Credits cache read/40 Credits cache write · 256K–1M: 80 Credits input/400 Credits output/16 Credits cache read/80 Credits cache write/ 1M tokens

MiMo V2.5 Pro

Updated MiMo pro model for broad long-context reasoning and production assistants.

MiniMax M2.5

MiniMax's versatile model with balanced performance across reasoning, creative writing, and multilingual tasks.

MiniMax|200K context|21 Credits input/84 Credits output/4.2 Credits cache read/26.25 Credits cache write/ 1M tokens

MiniMax M2.7

Refreshed MiniMax model with broader context and balanced performance for multilingual chat, reasoning, and creative work.

MiniMax|192K context|21 Credits input/84 Credits output/4.2 Credits cache read/26.25 Credits cache write/ 1M tokens

Mistral Nemo

Efficient Mistral model for broad multilingual chat and economical generation.

Mistral|131K context|2 Credits input/4 Credits output/2 Credits cache read/2 Credits cache write/ 1M tokens

Mistral Small 3.2 24B Instruct

Instruction-tuned Mistral Small model for compact reasoning and production assistants.

Mistral|128K context|7.5 Credits input/20 Credits output/7.5 Credits cache read/7.5 Credits cache write/ 1M tokens

Nemotron 3 Nano 30B A3B

Compact NVIDIA Nemotron model for efficient chat and instruction following.

NVIDIA|256K context|5 Credits input/20 Credits output/5 Credits cache read/5 Credits cache write/ 1M tokens

Nemotron 3 Super 120B A12B

Larger NVIDIA Nemotron model for stronger reasoning and production assistants.

NVIDIA|262K context|9 Credits input/45 Credits output/9 Credits cache read/9 Credits cache write/ 1M tokens

Qwen 3 235B A22B 2507

Large Qwen mixture model for economical reasoning and long-context generation.

Alibaba|262K context|10 Credits input/60 Credits output/10 Credits cache read/10 Credits cache write/ 1M tokens

Qwen 3 30B A3B Instruct 2507

Compact Qwen mixture model for fast instruction following and practical chat.

Alibaba|262K context|9 Credits input/30 Credits output/9 Credits cache read/9 Credits cache write/ 1M tokens

Qwen 3 32B

Mid-sized Qwen 3 model for affordable reasoning, chat, and content generation.

Alibaba|41K context|8 Credits input/24 Credits output/4 Credits cache read/8 Credits cache write/ 1M tokens

Qwen 3 Coder

Qwen 3 coder model for software engineering, code repair, and technical analysis.

Alibaba|262K context|22 Credits input/180 Credits output/22 Credits cache read/22 Credits cache write/ 1M tokens

Qwen 3 Coder Next

Qwen coding model for code generation, refactoring, and agentic developer workflows.

Alibaba|262K context|14 Credits input/80 Credits output/9 Credits cache read/14 Credits cache write/ 1M tokens

Qwen 3 Coder Plus

Specialized coding model from Alibaba with strong code generation, debugging, and analysis capabilities.

Alibaba|1M context|Tiered pricing: 0–32K: 40.18 Credits input/160.58 Credits output/4.018 Credits cache read/50.225 Credits cache write · 32K–128K: 60.27 Credits input/240.87 Credits output/6.027 Credits cache read/75.3375 Credits cache write · 128K–256K: 100.38 Credits input/401.45 Credits output/10.038 Credits cache read/125.475 Credits cache write · 256K–1M: 200.76 Credits input/2,006.97 Credits output/20.076 Credits cache read/250.95 Credits cache write/ 1M tokens

Qwen 3 Next 80B A3B Instruct

Qwen Next mixture model for efficient reasoning, chat, and instruction following.

Alibaba|262K context|9 Credits input/110 Credits output/9 Credits cache read/9 Credits cache write/ 1M tokens

Qwen 3 VL 235B A22B Instruct

Qwen VL model for multimodal-oriented prompts and strong text instruction following.

Alibaba|262K context|20 Credits input/88 Credits output/11 Credits cache read/20 Credits cache write/ 1M tokens

Qwen 3.5 27B

Mid-sized Qwen 3.5 model for general chat, analysis, and structured output.

Alibaba|262K context|30 Credits input/240 Credits output/30 Credits cache read/30 Credits cache write/ 1M tokens

Qwen 3.5 35B A3B

Efficient Qwen mixture model balancing quality, cost, and multilingual coverage.

Alibaba|262K context|25 Credits input/200 Credits output/25 Credits cache read/25 Credits cache write/ 1M tokens

Qwen 3.5 397B A17B

Large Qwen 3.5 model for high-quality multilingual reasoning and synthesis.

Alibaba|262K context|39 Credits input/234 Credits output/19.5 Credits cache read/39 Credits cache write/ 1M tokens

Qwen 3.5 9B

Small Qwen 3.5 model for low-latency chat, routing, and simple transformations.

Alibaba|262K context|10 Credits input/15 Credits output/10 Credits cache read/10 Credits cache write/ 1M tokens

Qwen 3.5 Flash

Fast and cost-effective Qwen model optimized for high-throughput tasks. Supports up to 1M context with tiered pricing.

Alibaba|1M context|Tiered pricing: 0–128K: 2.03 Credits input/20.09 Credits output/0.203 Credits cache read/2.5375 Credits cache write · 128K–256K: 8.05 Credits input/80.29 Credits output/0.805 Credits cache read/10.0625 Credits cache write · 256K–1M: 12.04 Credits input/120.4 Credits output/1.204 Credits cache read/15.05 Credits cache write/ 1M tokens

Qwen 3.5 Flash 02-23

Qwen 3.5 Flash snapshot for fast, low-cost multilingual assistant traffic.

Alibaba|1M context|10 Credits input/40 Credits output/1 Credits cache read/12.5 Credits cache write/ 1M tokens

Qwen 3.5 Plus

Alibaba's flagship language model with strong multilingual and reasoning capabilities. Supports up to 1M context with tiered pricing.

Alibaba|1M context|Tiered pricing: 0–128K: 8.05 Credits input/48.16 Credits output/0.805 Credits cache read/10.0625 Credits cache write · 128K–256K: 20.09 Credits input/120.4 Credits output/2.009 Credits cache read/25.1125 Credits cache write · 256K–1M: 40.11 Credits input/240.8 Credits output/4.011 Credits cache read/50.1375 Credits cache write/ 1M tokens

Qwen 3.5 Plus 02-15

Qwen 3.5 Plus snapshot with 1M context and tiered long-prompt pricing.

Alibaba|1M context|Tiered pricing: 0–256K: 40 Credits input/240 Credits output/4 Credits cache read/50 Credits cache write · 256K–1M: 50 Credits input/300 Credits output/5 Credits cache read/62.5 Credits cache write/ 1M tokens

Qwen 3.6 Plus

Latest-generation Qwen model with improved reasoning and instruction-following capabilities.

Alibaba|1M context|Tiered pricing: 0–256K: 19.32 Credits input/115.57 Credits output/1.932 Credits cache read/24.15 Credits cache write · 256K–1M: 77.07 Credits input/462.14 Credits output/7.707 Credits cache read/96.3375 Credits cache write/ 1M tokens

Step 3.5 Flash

StepFun flash model for quick multilingual chat, extraction, and structured generation.

StepFun|262K context|10 Credits input/30 Credits output/10 Credits cache read/10 Credits cache write/ 1M tokens
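
Every rate in the list above is quoted in Credits per 1M tokens, so the cost of a single request is each token count multiplied by its rate and divided by 1,000,000. The sketch below works through that arithmetic for one flat-rate entry (Claude Haiku 4.5) and one tiered entry (Gemini 3.1 Pro Preview), using the rates exactly as listed. The names `Rates`, `pick_tier`, and `estimate_credits` are hypothetical helpers, not part of any provider API. The tier rule is also an assumption: the listing does not say how tiers are applied, so this sketch bills the whole request at the rate of the tier the prompt size falls into, and treats each upper bound as inclusive.

```python
from dataclasses import dataclass

TOKENS_PER_UNIT = 1_000_000  # listed rates are per 1M tokens


@dataclass(frozen=True)
class Rates:
    """Credits per 1M tokens, copied from a catalog entry (hypothetical helper type)."""
    input: float
    output: float
    cache_read: float
    cache_write: float


# Flat-rate entry: Claude Haiku 4.5.
HAIKU_4_5 = Rates(input=70, output=350, cache_read=7, cache_write=87.5)

# Tiered entry: Gemini 3.1 Pro Preview, as (upper bound in tokens, rates).
GEMINI_3_1_PRO = [
    (200_000, Rates(input=140, output=840, cache_read=14, cache_write=140)),
    (1_000_000, Rates(input=280, output=1260, cache_read=28, cache_write=280)),
]


def pick_tier(tiers, prompt_tokens):
    """Return the rates for the first tier whose upper bound covers the prompt.

    Assumption: the tier is selected by prompt size and applies to the whole
    request; the catalog itself does not state this rule.
    """
    for upper_bound, rates in tiers:
        if prompt_tokens <= upper_bound:
            return rates
    raise ValueError("prompt exceeds the largest listed tier")


def estimate_credits(rates, input_tokens, output_tokens,
                     cache_read_tokens=0, cache_write_tokens=0):
    """Estimate the Credits consumed by one request at the given rates."""
    total = (
        input_tokens * rates.input
        + output_tokens * rates.output
        + cache_read_tokens * rates.cache_read
        + cache_write_tokens * rates.cache_write
    )
    return total / TOKENS_PER_UNIT


if __name__ == "__main__":
    # 10K input tokens and 2K output tokens on the flat-rate model.
    print(estimate_credits(HAIKU_4_5, 10_000, 2_000))

    # A 300K-token prompt lands in the 200K–1M tier of the tiered model.
    long_prompt_rates = pick_tier(GEMINI_3_1_PRO, 300_000)
    print(estimate_credits(long_prompt_rates, 300_000, 1_000))
```

Under these assumptions, the same function covers both pricing styles: a flat-rate model passes its single `Rates` value directly, while a tiered model first resolves the applicable tier from the prompt size.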