These are the AWS default quotas for the us-east-1 region.
An account's applied value can be higher if a limit increase was approved.
Adjustable quotas can be raised via Service Quotas. Many per-model token limits are adjustable; batch minimums are not.
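
The tables on this page abbreviate values with K, M, and B suffixes. As a minimal sketch for working with them programmatically, the helper below converts an abbreviated cell into a number. The function name `parse_quota` and the constant names are illustrative, not part of any AWS tooling, and because displayed values may be rounded (for example `122.9K`), the result should be treated as approximate.

```python
from decimal import Decimal

# Multipliers for the suffixes used in the tables on this page.
SUFFIXES = {"K": 10**3, "M": 10**6, "B": 10**9}
PLACEHOLDER = "\u2014"  # the dash the tables use where no quota is listed


def parse_quota(text):
    """Convert an abbreviated table cell such as '2.88B' into a number.

    Returns None for the placeholder dash or an empty cell. Integral
    results come back as int, fractional ones (e.g. '0.1') as float.
    """
    text = text.strip()
    if text in ("", PLACEHOLDER):
        return None
    multiplier = SUFFIXES.get(text[-1].upper())
    if multiplier is not None:
        value = Decimal(text[:-1]) * multiplier
    else:
        value = Decimal(text)
    return int(value) if value == value.to_integral_value() else float(value)
```

Decimal arithmetic is used so that a cell like `2.88B` does not pick up binary floating-point error before it is converted to an integer.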
Anthropic Claude
39 models
Inference rate limits
| Model | On-demand RPM | On-demand TPM | Cross-region RPM | Cross-region TPM | Global RPM | Global TPM | Global TPD | Tokens/day | Lat-opt RPM | Lat-opt TPM | Mantle ITPM | Mantle OTPM |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Anthropic Claude 3 Haiku | 1K | 2M | 2K | 4M | — | — | — | 2.88B | — | — | — | — |
| Anthropic Claude 3 Opus | 50 | 400K | 100 | 800K | — | — | — | — | — | — | — | — |
| Anthropic Claude 3 Sonnet | 500 | 1M | 1K | 2M | — | — | — | — | — | — | — | — |
| Anthropic Claude 3.5 Haiku | 1K | 2M | 2K | 4M | — | — | — | 2.88B | 100 | 500K | — | — |
| Anthropic Claude 3.5 Sonnet V1 | 50 | 400K | 100 | 800K | — | — | — | 2.88B | — | — | — | — |
| Anthropic Claude 3.5 Sonnet V2 | 50 | 400K | 100 | 800K | — | — | — | 2.88B | — | — | — | — |
| Anthropic Claude 3.7 Sonnet V1 | — | — | 250 | 1M | — | — | — | 720M | — | — | — | — |
| Anthropic Claude Fable 5 | — | — | — | 200K | — | 500K | 720M | 144M | — | — | — | — |
| Anthropic Claude Haiku 4.5 | — | — | 10K | 5M | 10K | 5M | 7.2B | 3.6B | — | — | — | — |
| Anthropic Claude Opus 4 V1 | — | — | 200 | 200K | — | — | — | 144M | — | — | — | — |
| Anthropic Claude Opus 4.1 | — | — | 50 | 500K | — | — | — | 360M | — | — | — | — |
| Anthropic Claude Opus 4.5 | — | — | 10K | 2M | 10K | 2M | 2.88B | 1.44B | — | — | — | — |
| Anthropic Claude Opus 4.6 V1 | — | — | 10K | 3M | 10K | 3M | 4.32B | 2.16B | — | — | — | — |
| Anthropic Claude Opus 4.7 | — | — | — | 30M | — | 30M | 43.2B | 21.6B | — | — | 20M | 4M |
| Anthropic Claude Opus 4.8 | — | — | — | 30M | — | 30M | 43.2B | 21.6B | — | — | 20M | 4M |
| Anthropic Claude Sonnet 4 V1 | — | — | 200 | 200K | 200 | 200K | 288M | 144M | — | — | — | — |
| Anthropic Claude Sonnet 4 V1 1M Context Length | — | — | 5 | 1M | — | — | — | 720M | — | — | — | — |
| Anthropic Claude Sonnet 4.5 V1 | — | — | 10K | 5M | 10K | 5M | 7.2B | 3.6B | — | — | — | — |
| Anthropic Claude Sonnet 4.5 V1 1M Context Length | — | — | 1K | 1M | 1K | 1M | 1.44B | 720M | — | — | — | — |
| Anthropic Claude Sonnet 4.6 | — | — | 10K | 6M | 10K | 6M | 8.64B | 4.32B | — | — | — | — |
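
The Global TPD figures listed above are consistent with the Global TPM figure multiplied by the 1,440 minutes in a day. This is an observation from the listed values rather than a documented rule, so the sketch below only checks it against a few rows copied from the table. The helper name `daily_from_minute` is hypothetical.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440

# (model, Global TPM, Global TPD) copied from the table above.
ROWS = [
    ("Anthropic Claude Haiku 4.5", 5_000_000, 7_200_000_000),
    ("Anthropic Claude Opus 4.5", 2_000_000, 2_880_000_000),
    ("Anthropic Claude Sonnet 4.6", 6_000_000, 8_640_000_000),
    ("Anthropic Claude Sonnet 4 V1", 200_000, 288_000_000),
]


def daily_from_minute(tokens_per_minute):
    """Tokens per day implied by sustaining a per-minute quota all day."""
    return tokens_per_minute * MINUTES_PER_DAY


# Collect any sampled row where the listed daily value differs.
mismatches = [
    model for model, tpm, tpd in ROWS if daily_from_minute(tpm) != tpd
]
print("rows that break the pattern:", mismatches)
```

In the same rows, the Tokens/day column is half the Global TPD value, so it should not be assumed to follow from the cross-region TPM column directly.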
Batch inference
| Model | Max records/job | Min records/job | Records/input file | Input file GB | Job size GB | Concurrent jobs (base) |
|---|---|---|---|---|---|---|
| Anthropic Claude 3 Haiku | 100K | 100 | 100K | 1 | 5 | 100 |
| Anthropic Claude 3 Opus | 100K | 100 | 100K | 1 | 5 | 100 |
| Anthropic Claude 3 Sonnet | 100K | 100 | 100K | 1 | 5 | 100 |
| Anthropic Claude 3.5 Haiku | 100K | 100 | 100K | 1 | 5 | 100 |
| Anthropic Claude 3.5 Sonnet V1 | 100K | 100 | 100K | 1 | 5 | 100 |
| Anthropic Claude 3.5 Sonnet V2 | 100K | 100 | 100K | 1 | 5 | 100 |
| Anthropic Claude 3.7 Sonnet V1 | 100K | 100 | 100K | 1 | 5 | 100 |
| Anthropic Claude Haiku 4.5 | 100K | 100 | 100K | 1 | 5 | 100 |
| Anthropic Claude Opus 4.5 | 100K | 100 | 100K | 1 | 5 | 100 |
| Anthropic Claude Opus 4.6 V1 | 100K | 100 | 100K | 1 | 5 | 100 |
| Anthropic Claude Sonnet 4 V1 | 100K | 100 | 100K | 1 | 5 | 100 |
| Anthropic Claude Sonnet 4.5 V1 | 100K | 100 | 100K | 1 | 5 | 100 |
| Anthropic Claude Sonnet 4.6 | 100K | 100 | 100K | 1 | 5 | 100 |
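
A planned batch job can be checked against these defaults before submission. The following is a minimal sketch, assuming the limits apply exactly as the column headers read (records per job, records per input file, input file size in GB, total job size in GB). `BatchLimits` and `check_batch_job` are hypothetical names, and the defaults should be overridden for models whose rows differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchLimits:
    """Default limits copied from the batch inference table above."""
    min_records: int = 100
    max_records: int = 100_000
    max_records_per_file: int = 100_000
    max_file_gb: float = 1.0
    max_job_gb: float = 5.0


def check_batch_job(files, limits=BatchLimits()):
    """Return a list of problems for a planned job (empty if it fits).

    `files` is a list of (record_count, size_gb) tuples, one per input file.
    """
    problems = []
    total_records = sum(count for count, _ in files)
    total_gb = sum(size for _, size in files)
    if total_records < limits.min_records:
        problems.append(
            f"{total_records} records is below the minimum of {limits.min_records}")
    if total_records > limits.max_records:
        problems.append(
            f"{total_records} records is above the maximum of {limits.max_records}")
    if total_gb > limits.max_job_gb:
        problems.append(
            f"job is {total_gb} GB, above the job limit of {limits.max_job_gb} GB")
    for index, (count, size) in enumerate(files):
        if count > limits.max_records_per_file:
            problems.append(
                f"file {index} has {count} records, above the per-file limit")
        if size > limits.max_file_gb:
            problems.append(
                f"file {index} is {size} GB, above the per-file limit of "
                f"{limits.max_file_gb} GB")
    return problems
```

For example, `check_batch_job([(50, 0.1)])` reports a single problem because 50 records is under the 100-record minimum, which the introduction notes is not adjustable.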
Provisioned throughput
| Model | MU/PT model |
|---|---|
| Anthropic Claude 3 Haiku 200K | 0 |
| Anthropic Claude 3 Haiku 48K | 0 |
| Anthropic Claude 3 Sonnet 200K | 0 |
| Anthropic Claude 3 Sonnet 28K | 0 |
| Anthropic Claude 3.5 Haiku 16K | 0 |
| Anthropic Claude 3.5 Haiku 200K | 0 |
| Anthropic Claude 3.5 Haiku 64K | 0 |
| Anthropic Claude 3.5 Sonnet 18K | 0 |
| Anthropic Claude 3.5 Sonnet 200K | 0 |
| Anthropic Claude 3.5 Sonnet 51K | 0 |
| Anthropic Claude 3.5 Sonnet V2 18K | 0 |
| Anthropic Claude 3.5 Sonnet V2 200K | 0 |
| Anthropic Claude 3.5 Sonnet V2 51K | 0 |
| Anthropic Claude Instant V1 100K | 0 |
| Anthropic Claude V2 100K | 0 |
| Anthropic Claude V2 18K | 0 |
| Anthropic Claude V2.1 18K | 0 |
| Anthropic Claude V2.1 200K | 0 |
Model customization
| Model | Train+val records |
|---|---|
| Anthropic Claude 3 Haiku | 10K |
| Anthropic Claude 3.5 Haiku | 10K |
Amazon (Nova / Titan)
36 models
Inference rate limits
| Model | On-demand RPM | On-demand TPM | Cross-region RPM | Cross-region TPM | Global RPM | Global TPM | Global TPD | Tokens/day | Lat-opt RPM | Lat-opt TPM | Lat-opt TPD | Concurrent reqs | Async concurrent |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Amazon Nova 2 Lite | — | — | 2K | 8M | 2K | 8M | 11.52B | 5.76B | — | — | — | — | — |
| Amazon Nova 2 Multimodal Embeddings V1 | 2K | — | — | — | — | — | — | — | — | — | — | — | 30 |
| Amazon Nova 2 Omni | — | — | 2K | 8M | 2K | 8M | 11.52B | 5.76B | — | — | — | — | — |
| Amazon Nova 2 Pro Preview | — | — | 100 | 1M | 100 | 1M | 1.44B | 720M | — | — | — | — | — |
| Amazon Nova 2 Sonic | — | — | — | — | — | — | — | — | — | — | — | 20 | — |
| Amazon Nova Canvas | 100 | — | — | — | — | — | — | — | — | — | — | — | — |
| Amazon Nova Lite | 2K | 4M | 4K | 8M | — | — | — | 5.76B | — | — | — | — | — |
| Amazon Nova Micro | 2K | 4M | 4K | 8M | — | — | — | 5.76B | — | — | — | — | — |
| Amazon Nova Premier V1 | — | — | 500 | 2M | — | — | — | 1.44B | — | — | — | — | — |
| Amazon Nova Pro V1 | 250 | 1M | 500 | 2M | — | — | — | 1.44B | 10 | 40K | 57.6M | — | — |
| Amazon Nova Reel 1.0 | — | — | — | — | — | — | — | — | — | — | — | 10 | — |
| Amazon Nova Reel 1.1 | — | — | — | — | — | — | — | — | — | — | — | 3 | — |
| Amazon Nova Sonic | — | — | — | — | — | — | — | — | — | — | — | 20 | — |
| Amazon Rerank 1.0 | 200 | — | — | — | — | — | — | — | — | — | — | — | — |
| Amazon Titan Image Generator G1 | 60 | — | — | — | — | — | — | — | — | — | — | — | — |
| Amazon Titan Image Generator G1 V2 | 60 | 2K | — | — | — | — | — | — | — | — | — | — | — |
| Amazon Titan Multimodal Embeddings G1 | 2K | 300K | — | — | — | — | — | — | — | — | — | — | — |
| Amazon Titan Text Embeddings | 2K | 300K | — | — | — | — | — | — | — | — | — | — | — |
| Amazon Titan Text Embeddings V2 | 6K | 300K | — | — | — | — | — | — | — | — | — | — | — |
| Amazon Titan Text Express | 400 | 300K | — | — | — | — | — | — | — | — | — | — | — |
| Amazon Titan Text Premier | 100 | 300K | — | — | — | — | — | — | — | — | — | — | — |
Batch inference
| Model | Max records/job | Min records/job | Records/input file | Input file GB | Job size GB | Concurrent jobs (base) | Concurrent jobs (custom) |
|---|---|---|---|---|---|---|---|
| Amazon Nova 2 Lite | 100K | 100 | 100K | 1 | — | 100 | — |
| Amazon Nova 2 Multimodal Embeddings V1 | 100K | 100 | 100K | 1 | 100 | 100 | — |
| Amazon Nova Lite | 100K | 100 | 100K | 1 | 100 | 100 | — |
| Amazon Nova Micro | 100K | 100 | 100K | 1 | 5 | 100 | — |
| Amazon Nova Premier V1 | 100K | 100 | 100K | 1 | 5 | 100 | — |
| Amazon Nova Pro V1 | 100K | 100 | 100K | 1 | 100 | 100 | — |
| Amazon Titan Multimodal Embeddings G1 | 100K | 100 | 100K | 1 | 5 | 100 | 3 |
| Amazon Titan Text Embeddings V2 | 100K | 100 | 100K | 1 | 5 | 100 | 3 |
Provisioned throughput
| Model | MU/PT model | MU/PT (24k ctx) | MU/PT (128k ctx) | MU/PT (300k ctx) | MU (no commit) |
|---|---|---|---|---|---|
| Amazon Nova 2 Lite V1.0 256K | 0 | — | — | — | — |
| Amazon Nova Canvas | 0 | — | — | — | — |
| Amazon Nova Lite | — | 0 | — | 0 | — |
| Amazon Nova Micro | — | 0 | 0 | — | — |
| Amazon Nova Pro V1 | — | 0 | — | 0 | — |
| Amazon Titan Embeddings G1 - Text | 0 | — | — | — | — |
| Amazon Titan Image Generator G1 | 0 | — | — | — | — |
| Amazon Titan Image Generator G2 | 0 | — | — | — | — |
| Amazon Titan Lite V1 4K | 0 | — | — | — | — |
| Amazon Titan Multimodal Embeddings G1 | 0 | — | — | — | — |
| Amazon Titan Text Embeddings V2 | 0 | — | — | — | — |
| Amazon Titan Text G1 - Express 8K | 0 | — | — | — | — |
| Amazon Titan Text Premier V1 32K | 0 | — | — | — | — |
| base model Amazon Nova 2 Lite V1.0 256K | — | — | — | — | 0 |
| custom model Amazon Nova 2 Lite V1.0 256K | — | — | — | — | 0 |
Model customization
| Model | Train+val records | Custom deploy RPM | Custom deploy TPM | Custom deploy TPD | Max FT ctx length |
|---|---|---|---|---|---|
| Amazon Nova 2 Lite | 20K | 2K | 4M | 5.76B | — |
| Amazon Nova Lite | 20K | 2K | 4M | 5.76B | — |
| Amazon Nova Micro | 20K | 2K | 4M | 5.76B | — |
| Amazon Nova Micro V1 distillation customization jobs | — | — | — | — | 32K |
| Amazon Nova Pro V1 | 20K | 200 | 800K | 1.15B | — |
| Amazon Nova V1 distillation customization jobs | — | — | — | — | 32K |
| Amazon Titan Image Generator G1 | 10K | — | — | — | — |
| Amazon Titan Multimodal Embeddings G1 | 50K | — | — | — | — |
| Titan Text G1 - Express | 10K | — | — | — | — |
| Titan Text G1 - Express v1 Continued Pre-Training job | 100K | — | — | — | — |
| Titan Text G1 - Lite | 10K | — | — | — | — |
| Titan Text G1 - Lite v1 Continued Pre-Training job | 100K | — | — | — | — |
| Titan Text G1 - Premier | 20K | — | — | — | — |
Meta Llama
18 models
Inference rate limits
| Model | On-demand RPM | On-demand TPM | Cross-region RPM | Cross-region TPM | Tokens/day | Lat-opt RPM | Lat-opt TPM |
|---|---|---|---|---|---|---|---|
| Meta Llama 3 70B Instruct | 400 | 300K | — | — | — | — | — |
| Meta Llama 3 8B Instruct | 800 | 300K | — | — | — | — | — |
| Meta Llama 3.1 70B Instruct | 400 | 300K | 800 | 600K | — | 100 | 40K |
| Meta Llama 3.1 8B Instruct | 800 | 300K | 1.6K | 600K | — | — | — |
| Meta Llama 3.2 11B Instruct | 400 | 300K | — | — | 432M | — | — |
| Meta Llama 3.2 1B Instruct | 800 | 300K | 1.6K | 600K | 432M | — | — |
| Meta Llama 3.2 3B Instruct | 800 | 300K | 1.6K | 600K | 432M | — | — |
| Meta Llama 3.2 90B Instruct | 400 | 300K | — | — | 432M | — | — |
| Meta Llama 3.3 70B Instruct | — | — | 800 | 600K | — | — | — |
| Meta Llama 4 Maverick V1 | — | — | 800 | 600K | 432M | — | — |
| Meta Llama 4 Scout V1 | — | — | 800 | 600K | 432M | — | — |
Batch inference
| Model | Max records/job | Min records/job | Records/input file | Input file GB | Job size GB | Concurrent jobs (base) |
|---|---|---|---|---|---|---|
| Llama 3.1 405B Instruct | 100K | 100 | 100K | 1 | 5 | 100 |
| Meta Llama 3.1 70B Instruct | 100K | 100 | 100K | 1 | 5 | 100 |
| Meta Llama 3.1 8B Instruct | 100K | 100 | 100K | 1 | 5 | 100 |
| Meta Llama 3.2 11B Instruct | 100K | 100 | 100K | 1 | 5 | 100 |
| Meta Llama 3.2 1B Instruct | 100K | 100 | 100K | 1 | 5 | 100 |
| Meta Llama 3.2 3B Instruct | 100K | 100 | 100K | 1 | 5 | 100 |
| Meta Llama 3.2 90B Instruct | 100K | 100 | 100K | 1 | 5 | 100 |
| Meta Llama 3.3 70B Instruct | 100K | 100 | 100K | 1 | 5 | 100 |
| Meta Llama 4 Maverick V1 | 100K | 100 | 100K | 1 | 5 | 100 |
| Meta Llama 4 Scout V1 | 100K | 100 | 100K | 1 | 5 | 100 |
Provisioned throughput
| Model | MU/PT model | MU (commitment) |
|---|---|---|
| Meta Llama 2 13B | 0 | — |
| Meta Llama 2 70B | 0 | — |
| Meta Llama 2 Chat 13B | 0 | — |
| Meta Llama 2 Chat 70B | 0 | — |
| Meta Llama 3 70B Instruct | 0 | — |
| Meta Llama 3 8B Instruct | 0 | — |
| Meta Llama 4 Scout 17B Instruct 10M | — | 0 |
| Meta Llama 4 Scout 17B Instruct 128K | — | 0 |
Model customization
| Model | Train+val records |
|---|---|
| Meta Llama 2 13B | 10K |
| Meta Llama 2 70B | 10K |
Mistral AI
23 models
Inference rate limits
| Model | On-demand RPM | On-demand TPM | Cross-region RPM | Cross-region TPM | Tokens/day |
|---|---|---|---|---|---|
| Magistral Small 1.2 | 10K | 100M | — | — | 144B |
| Ministral 14B 3.0 | 10K | 100M | — | — | 144B |
| Ministral 3B 3.0 | 10K | 100M | — | — | 144B |
| Ministral 8B 3.0 | 10K | 100M | — | — | 144B |
| Mistral AI Mistral 7B Instruct | 800 | 300K | — | — | 432M |
| Mistral AI Mistral Large | 400 | 300K | — | — | 432M |
| Mistral AI Mistral Small | 400 | 300K | — | — | 432M |
| Mistral AI Mixtral 8X7B Instruct | 400 | 300K | — | — | 432M |
| Mistral Devstral 2 123b | 10K | 100M | — | — | 144B |
| Mistral Large 3 | 10K | 100M | — | — | 144B |
| Mistral Pixtral Large 25.02 V1 | — | — | 10 | 80K | 57.6M |
| Voxtral Mini 1.0 | 10K | 100M | — | — | 144B |
| Voxtral Small 1.0 | 10K | 100M | — | — | 144B |
Batch inference
| Model | Max records/job | Min records/job | Records/input file | Input file GB | Job size GB | Concurrent jobs (base) |
|---|---|---|---|---|---|---|
| Devstral 2 123B | 100K | 100 | 100K | 1 | 5 | 100 |
| Magistral Small 2509 | 100K | 100 | 100K | 1 | 5 | 100 |
| Ministral 3 14B | 100K | 100 | 100K | 1 | 5 | 100 |
| Ministral 3 8B | 100K | 100 | 100K | 1 | 5 | 100 |
| Ministral 3B | 100K | 100 | 100K | 1 | 5 | 100 |
| Mistral AI Mistral Small | 100K | 100 | 100K | 1 | 5 | 100 |
| Mistral Large 2 (24.07) | 100K | 100 | 100K | 1 | 5 | 100 |
| Mistral Large 3 | 100K | 100 | 100K | 1 | 5 | 100 |
| Voxtral Mini 3B 2507 | 100K | 100 | 100K | 1 | 5 | 100 |
| Voxtral Small 24B 2507 | 100K | 100 | 100K | 1 | 5 | 100 |
Provisioned throughput
| Model | MU/PT model |
|---|---|
| Mistral AI Mistral Small | 0 |
Cohere
6 models
Inference rate limits
| Model | On-demand RPM | On-demand TPM | Cross-region RPM | Cross-region TPM | Global RPM | Global TPM | Global TPD | Tokens/day |
|---|---|---|---|---|---|---|---|---|
| Cohere Command R | 400 | 300K | — | — | — | — | — | — |
| Cohere Command R Plus | 400 | 300K | — | — | — | — | — | — |
| Cohere Embed English | 2K | 300K | — | — | — | — | — | — |
| Cohere Embed Multilingual | 2K | 300K | — | — | — | — | — | — |
| Cohere Embed V4 | 1K | 150K | 2K | 300K | 2K | 300K | 432M | 216M |
| Cohere Rerank 3.5 | 250 | — | — | — | — | — | — | — |
Provisioned throughput
| Model | MU/PT model |
|---|---|
| Cohere Command R | 0 |
| Cohere Command R Plus | 0 |
| Cohere Embed English | 0 |
| Cohere Embed Multilingual | 0 |
AI21 Labs
4 models
Inference rate limits
| Model | On-demand RPM | On-demand TPM | Tokens/day |
|---|---|---|---|
| AI21 Labs Jamba 1.5 Large | 100 | 300K | 432M |
| AI21 Labs Jamba 1.5 Mini | 100 | 300K | 432M |
Provisioned throughput
| Model | MU/PT model |
|---|---|
| AI21 Labs Jurassic-2 Mid | 0 |
| AI21 Labs Jurassic-2 Ultra | 0 |
DeepSeek
2 models
Inference rate limits
| Model | On-demand RPM | On-demand TPM | Cross-region RPM | Cross-region TPM | Tokens/day |
|---|---|---|---|---|---|
| DeepSeek R1 V1 | — | — | 200 | 200K | 144M |
| DeepSeek V3.2 | 10K | 100M | — | — | 144B |
Batch inference
| Model | Max records/job | Min records/job | Records/input file | Input file GB | Job size GB | Concurrent jobs (base) |
|---|---|---|---|---|---|---|
| DeepSeek V3.2 | 100K | 100 | 100K | 1 | 5 | 100 |
OpenAI
6 models
Inference rate limits
| Model | On-demand RPM | On-demand TPM | Tokens/day | Mantle ITPM | Mantle OTPM |
|---|---|---|---|---|---|
| GPT-5.4 | — | — | — | 20M | 4M |
| GPT-5.5 | — | — | — | 5M | 1M |
| OpenAI GPT OSS 120b | 10K | 100M | 144B | — | — |
| OpenAI GPT OSS 20b | 10K | 100M | 144B | — | — |
| OpenAI GPT OSS Safeguard 120b | 10K | 100M | 144B | — | — |
| OpenAI GPT OSS Safeguard 20b | 10K | 100M | 144B | — | — |
Batch inference
| Model | Max records/job | Min records/job | Records/input file | Input file GB | Job size GB | Concurrent jobs (base) |
|---|---|---|---|---|---|---|
| OpenAI GPT OSS 120b | 100K | 100 | 100K | 1 | 5 | 100 |
| OpenAI GPT OSS 20b | 100K | 100 | 100K | 1 | 5 | 100 |
| OpenAI GPT OSS Safeguard 120b | 100K | 100 | 100K | 1 | 5 | 100 |
| OpenAI GPT OSS Safeguard 20b | 100K | 100 | 100K | 1 | 5 | 100 |
Qwen
8 models
Inference rate limits
| Model | On-demand RPM | On-demand TPM | Tokens/day |
|---|---|---|---|
| Qwen3 32B V1 | 10K | 100M | 144B |
| Qwen3 Coder 30B a3b V1 | 10K | 100M | 144B |
| Qwen3 Coder Next | 10K | 100M | 144B |
| Qwen3 Next 80B A3B | 10K | 100M | 144B |
| Qwen3 VL 235B A22B | 10K | 100M | 144B |
Batch inference
| Model | Max records/job | Min records/job | Records/input file | Input file GB | Job size GB | Concurrent jobs (base) |
|---|---|---|---|---|---|---|
| Qwen3 32B V1 | 100K | 100 | 100K | 1 | 5 | 100 |
| Qwen3 Coder 30B | 100K | 100 | 100K | 1 | 5 | 100 |
| Qwen3 Coder Next | 100K | 100 | 100K | 1 | 5 | 100 |
| Qwen3 Next 80B | 100K | 100 | 100K | 1 | 5 | 100 |
| Qwen3 VL 235B | 100K | 100 | 100K | 1 | 5 | 100 |
Z.ai (GLM)
5 models
Inference rate limits
| Model | On-demand RPM | On-demand TPM | Tokens/day |
|---|---|---|---|
| Z.ai GLM 5 | 10K | 100M | 144B |
| Z.ai GLM-4.7 | 10K | 100M | 144B |
| Z.ai GLM-4.7 Flash | 10K | 100M | 144B |
Batch inference
| Model | Max records/job | Min records/job | Records/input file | Input file GB | Job size GB | Concurrent jobs (base) |
|---|---|---|---|---|---|---|
| GLM 4.7 | 100K | 100 | 100K | 1 | 5 | 100 |
| GLM 4.7 Flash | 100K | 100 | 100K | 1 | 5 | 100 |
| Z.ai GLM 5 | 100K | 100 | 100K | 1 | 5 | 100 |
Writer
3 models
Inference rate limits
| Model | On-demand RPM | On-demand TPM | Cross-region RPM | Cross-region TPM | Tokens/day |
|---|---|---|---|---|---|
| Writer AI Palmyra X4 V1 | — | — | 10 | 150K | 108M |
| Writer AI Palmyra X5 V1 | — | — | 10 | 150K | 108M |
| Writer Palmyra Vision 7B | 10K | 100M | — | — | 144B |
Batch inference
| Model | Max records/job | Min records/job | Records/input file | Input file GB | Job size GB | Concurrent jobs (base) |
|---|---|---|---|---|---|---|
| Writer Palmyra Vision 7B | 100K | 100 | 100K | 1 | 5 | 100 |
TwelveLabs
3 models
Inference rate limits
| Model | On-demand RPM | Cross-region RPM | Concurrent reqs | Async concurrent |
|---|---|---|---|---|
| Twelve Labs Marengo | 100 | 200 | 30 | — |
| Twelve Labs Pegasus | 60 | 120 | 30 | — |
| TwelveLabs Marengo Embed 3.0 | 500 | 1K | — | 10 |
Stability AI
15 models
Inference rate limits
| Model | On-demand RPM | Cross-region RPM |
|---|---|---|
| Stable Image Conservative Upscale | 2 | 4 |
| Stable Image Control Sketch | 10 | 20 |
| Stable Image Control Structure | 10 | 20 |
| Stable Image Creative Upscale | 2 | 4 |
| Stable Image Erase Object | 10 | 20 |
| Stable Image Fast Upscale | 10 | 20 |
| Stable Image Inpaint | 10 | 20 |
| Stable Image Outpaint | 2 | 4 |
| Stable Image Remove Background | 10 | 20 |
| Stable Image Search and Recolor | 10 | 20 |
| Stable Image Search and Replace | 10 | 20 |
| Stable Image Style Guide | 10 | 20 |
| Stable Image Style Transfer | 10 | 20 |
Provisioned throughput
| Model | MU/PT model |
|---|---|
| Stability.ai Stable Diffusion XL 0.8 | 0 |
| Stability.ai Stable Diffusion XL 1.0 | 0 |
Google Gemma
3 models
Inference rate limits
| Model | On-demand RPM | On-demand TPM | Tokens/day |
|---|---|---|---|
| Gemma 3 12B | 10K | 100M | 144B |
| Gemma 3 27B | 10K | 100M | 144B |
| Gemma 3 4B | 10K | 100M | 144B |
Batch inference
| Model | Max records/job | Min records/job | Records/input file | Input file GB | Job size GB | Concurrent jobs (base) |
|---|---|---|---|---|---|---|
| Gemma 3 12B | 100K | 100 | 100K | 1 | 5 | 100 |
| Gemma 3 27B | 100K | 100 | 100K | 1 | 5 | 100 |
| Gemma 3 4B | 100K | 100 | 100K | 1 | 5 | 100 |
NVIDIA
6 models
Inference rate limits
| Model | On-demand RPM | On-demand TPM | Tokens/day |
|---|---|---|---|
| NVIDIA Nemotron 3 Super 120B A12B | 10K | 100M | 144B |
| NVIDIA Nemotron Nano 2 | 10K | 100M | 144B |
| NVIDIA Nemotron Nano 2 VL | 10K | 100M | 144B |
| NVIDIA Nemotron Nano 3 30B | 10K | 100M | 144B |
Batch inference
| Model | Max records/job | Min records/job | Records/input file | Input file GB | Job size GB | Concurrent jobs (base) |
|---|---|---|---|---|---|---|
| NVIDIA Nemotron 3 Super 120B A12B | 100K | 100 | 100K | 1 | 5 | 100 |
| NVIDIA Nemotron Nano 12B | 100K | 100 | 100K | 1 | 5 | 100 |
| NVIDIA Nemotron Nano 3 30B | 100K | 100 | 100K | 1 | 5 | 100 |
| NVIDIA Nemotron Nano 9B | 100K | 100 | 100K | 1 | 5 | 100 |
MiniMax
3 models
Inference rate limits
| Model | On-demand RPM | On-demand TPM | Tokens/day |
|---|---|---|---|
| MiniMax M2 | 10K | 100M | 144B |
| MiniMax M2.1 | 10K | 100M | 144B |
| MiniMax M2.5 | 10K | 100M | 144B |
Batch inference
| Model | Max records/job | Min records/job | Records/input file | Input file GB | Job size GB | Concurrent jobs (base) |
|---|---|---|---|---|---|---|
| MiniMax M2 | 100K | 100 | 100K | 1 | 5 | 100 |
| MiniMax M2.1 | 100K | 100 | 100K | 1 | 5 | 100 |
| MiniMax M2.5 | 100K | 100 | 100K | 1 | 5 | 100 |
Other
5 models
Inference rate limits
| Model | On-demand RPM | On-demand TPM | Tokens/day |
|---|---|---|---|
| Kimi K2 Thinking | 10K | 100M | 144B |
| Moonshot AI Kimi K2.5 | 10K | 100M | 144B |
Batch inference
| Model | Max records/job | Min records/job | Records/input file | Input file GB | Job size GB | Concurrent jobs (base) |
|---|---|---|---|---|---|---|
| Kimi K2 Thinking | 100K | 100 | 100K | 1 | 5 | 100 |
| Kimi K2.5 | 100K | 100 | 100K | 1 | 5 | 100 |
Provisioned throughput
| Model | MU (commitment) |
|---|---|
| Meta Maverick 4 Scout 17B Instruct 128K | 0 |
| Meta Maverick 4 Scout 17B Instruct 1M | 0 |
Account & API quotas
288 quotas
Service-wide limits not tied to a specific model: feature capacities, control-plane API request rates, and customization account limits.
Model customization (9)
| Quota | Default value |
|---|---|
| Custom models per account | 100 |
| In-progress custom model deployments | 2 |
| Maximum input file size for distillation customization jobs | 2 |
| Maximum line length for distillation customization jobs | 16 |
| Maximum number of prompts for distillation customization jobs | 15K |
| Maximum number of training records for an Amazon Nova Canvas Fine-tuning job | 10K |
| Minimum number of prompts for distillation customization jobs | 100 |
| Scheduled customization jobs | 10 |
| Total number of custom model deployments | 10 |
Knowledge Bases (37)
| Quota | Default value |
|---|---|
| Concurrent ingestion jobs per account | 5 |
| Concurrent ingestion jobs per data source | 1 |
| Concurrent ingestion jobs per knowledge base | 1 |
| Concurrent IngestKnowledgeBaseDocuments and DeleteKnowledgeBaseDocuments requests per account | 10 |
| CreateDataSource requests per second | 2 |
| CreateKnowledgeBase requests per second | 2 |
| Data sources per knowledge base | 5 |
| DeleteDataSource requests per second | 2 |
| DeleteKnowledgeBase requests per second | 2 |
| DeleteKnowledgeBaseDocuments requests per second | 5 |
| Files to add or update per ingestion job | 5M |
| Files to delete per ingestion job | 5M |
| Files to ingest per IngestKnowledgeBaseDocuments job | 25 |
| GenerateQuery requests per second | 2 |
| GetDataSource requests per second | 10 |
| GetIngestionJob requests per second | 10 |
| GetKnowledgeBase requests per second | 10 |
| GetKnowledgeBaseDocuments requests per second | 5 |
| Ingestion job file size with text content | 50 |
| Ingestion job size | 100 |
| IngestKnowledgeBaseDocuments requests per second | 5 |
| IngestKnowledgeBaseDocuments total payload size | 6 |
| Knowledge bases per account | 100 |
| ListDataSources requests per second | 10 |
| ListIngestionJobs requests per second | 10 |
| ListKnowledgeBaseDocuments requests per second | 5 |
| ListKnowledgeBases requests per second | 10 |
| Maximum number of files for BDA parser | 1K |
| Maximum number of files for Foundation Models as a parser | 1K |
| Rerank requests per second | 10 |
| Retrieve requests per second | 20 |
| RetrieveAndGenerate requests per second | 20 |
| RetrieveAndGenerateStream requests per second | 20 |
| StartIngestionJob requests per second | 0.1 |
| UpdateDataSource requests per second | 2 |
| UpdateKnowledgeBase requests per second | 2 |
| User query size | 1K |
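
A fractional rate such as the 0.1 requests per second listed for StartIngestionJob works out to one request every 10 seconds. The sketch below spaces calls on the client side to stay under such a rate. It assumes the quota behaves like a simple average rate, which the table does not confirm, and the `Pacer` class is a hypothetical helper rather than an AWS SDK feature.

```python
import time


class Pacer:
    """Space calls so they do not exceed `rate_per_second` on average."""

    def __init__(self, rate_per_second, clock=time.monotonic, sleep=time.sleep):
        if rate_per_second <= 0:
            raise ValueError("rate_per_second must be positive")
        # Minimum spacing between calls, e.g. 0.1 req/s -> 10 s.
        self.interval = 1.0 / rate_per_second
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None  # no call has been made yet

    def wait(self):
        """Block until the next call is allowed, then reserve that slot."""
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
```

Calling `wait()` before each request returns immediately the first time and sleeps for the remaining interval on later calls. The `clock` and `sleep` parameters exist only so the pacing logic can be exercised without real delays.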
Data Automation (39)
| Quota | Default value |
|---|---|
| (Console) Maximum document file size (MB) | 200 |
| (Console) Maximum number of pages per document file | 20 |
| CreateBlueprint - Max number of blueprints per account | 350 |
| CreateBlueprintVersion - Max number of Blueprint versions per Blueprint | 10 |
| CreateDataAutomationLibrary - Max number of data automation libraries per account | 10 |
| Description length for fields (Characters) | 300 |
| InvokeBlueprintOptimizationAsync - Max number of blueprint optimization concurrent jobs | 3 |
| InvokeBlueprintOptimizationAsync - Max number of blueprint optimization jobs per day | 30 |
| InvokeDataAutomation(Sync) - Document - Max number of requests | 60 |
| InvokeDataAutomation(Sync) - Image - Max number of requests | 200 |
| InvokeDataAutomationAsync - Audio - Max number of concurrent jobs | 20 |
| InvokeDataAutomationAsync - Document - Max number of concurrent jobs | 25 |
| InvokeDataAutomationAsync - Image - Max number of concurrent jobs | 20 |
| InvokeDataAutomationAsync - Max number of open jobs | 1.8K |
| InvokeDataAutomationAsync - Video - Max number of concurrent jobs | 20 |
| Max number of vocabulary phrases per library | 500 |
| Maximum audio file size (MB) | 2K |
| Maximum audio length (Minutes) | 240 |
| Maximum Audio Sample Rate (Hz) | 48K |
| Maximum Blueprints per Project (Audios) | 1 |
| Maximum Blueprints per Project (Documents) | 40 |
| Maximum Blueprints per Project (Images) | 1 |
| Maximum Blueprints per Project (Videos) | 1 |
| Maximum document file size (MB) | 500 |
| Maximum image file size (MB) | 5 |
| Maximum instruction field length for Audio Blueprint - (Characters) | 500 |
| Maximum JSON Blueprint Size (Characters) | 100K |
| Maximum Levels of Field Hierarchy | 1 |
| Maximum number of Blueprints per Start Inference request (Audios) | 1 |
| Maximum number of Blueprints per Start Inference request (Documents) | 10 |
| Maximum number of Blueprints per Start Inference request (Images) | 1 |
| Maximum number of Blueprints per Start Inference request (Videos) | 1 |
| Maximum number of list fields per Blueprint | 15 |
| Maximum Number of pages per document | 3K |
| Maximum Resolution | 8K |
| Maximum video file size (MB) | 10.2K |
| Maximum video length (Minutes) | 240 |
| Minimum audio length (Milliseconds) | 500 |
| Minimum Audio Sample Rate (Hz) | 8K |
Automated Reasoning (36)
| Quota | Default value |
|---|---|
| Annotations in policy | 10 |
| CancelAutomatedReasoningPolicyBuildWorkflow requests per second | 5 |
| Concurrent builds per policy | 2 |
| Concurrent policy builds per account | 5 |
| CreateAutomatedReasoningPolicy requests per second | 5 |
| CreateAutomatedReasoningPolicyTestCase requests per second | 5 |
| CreateAutomatedReasoningPolicyVersion requests per second | 5 |
| DeleteAutomatedReasoningPolicy requests per second | 5 |
| DeleteAutomatedReasoningPolicyBuildWorkflow requests per second | 5 |
| DeleteAutomatedReasoningPolicyTestCase requests per second | 5 |
| ExportAutomatedReasoningPolicyVersion requests per second | 5 |
| GetAutomatedReasoningPolicy requests per second | 10 |
| GetAutomatedReasoningPolicyAnnotations requests per second | 10 |
| GetAutomatedReasoningPolicyBuildWorkflow requests per second | 10 |
| GetAutomatedReasoningPolicyBuildWorkflowResultAssets requests per second | 10 |
| GetAutomatedReasoningPolicyNextScenario requests per second | 10 |
| GetAutomatedReasoningPolicyTestCase requests per second | 10 |
| GetAutomatedReasoningPolicyTestResult requests per second | 10 |
| ListAutomatedReasoningPolicies requests per second | 5 |
| ListAutomatedReasoningPolicyBuildWorkflows requests per second | 5 |
| ListAutomatedReasoningPolicyTestCases requests per second | 5 |
| ListAutomatedReasoningPolicyTestResults requests per second | 5 |
| Policies per account | 100 |
| Rules in policy | 500 |
| Source document size (MB) | 5 |
| Source document tokens | 122.9K |
| StartAutomatedReasoningPolicyBuildWorkflow requests per second | 1 |
| StartAutomatedReasoningPolicyTestWorkflow requests per second | 1 |
| Tests per policy | 100 |
| Types per policy | 50 |
| UpdateAutomatedReasoningPolicy requests per second | 5 |
| UpdateAutomatedReasoningPolicyAnnotations requests per second | 5 |
| UpdateAutomatedReasoningPolicyTestCase requests per second | 5 |
| Values per type in policy | 50 |
| Variables in policy | 200 |
| Versions per policy | 1K |
Evaluation (12)
| Quota | Default value |
|---|---|
| Number of concurrent automatic model evaluation jobs | 20 |
| Number of concurrent model evaluation jobs that use human workers | 10 |
| Number of custom metrics | 10 |
| Number of custom prompt datasets in a human-based model evaluation job | 1 |
| Number of datasets per job | 5 |
| Number of evaluation jobs | 5K |
| Number of metrics per dataset | 3 |
| Number of models in a model evaluation job that uses human workers | 2 |
| Number of models in automated model evaluation job | 1 |
| Number of prompts in a custom prompt dataset | 1K |
| Size of prompt | 4 |
| Task time for workers | 30 |
Advanced Prompt Optimization (2)
| Quota | Default value |
|---|---|
| Active jobs per account | 20 |
| Inactive jobs per account | 5K |
Flows (35)
| Quota | Default value |
|---|---|
| Agent nodes per flow | 20 |
| Collector nodes per flow | 1 |
| Condition nodes per flow | 5 |
| Conditions per condition node | 5 |
| CreateFlow requests per second | 2 |
| CreateFlowAlias requests per second | 2 |
| CreateFlowVersion requests per second | 2 |
| DeleteFlow requests per second | 2 |
| DeleteFlowAlias requests per second | 2 |
| DeleteFlowVersion requests per second | 2 |
| Flow aliases per flow | 10 |
| Flow executions per account | 1K |
| Flow versions per flow | 10 |
| Flows per account | 100 |
| GetFlow requests per second | 10 |
| GetFlowAlias requests per second | 10 |
| GetFlowVersion requests per second | 10 |
| Inline code nodes per flow | 5 |
| Input nodes per flow | 1 |
| Iterator nodes per flow | 1 |
| Knowledge base nodes per flow | 20 |
| Lambda function nodes per flow | 20 |
| Lex nodes per flow | 5 |
| ListFlowAliases requests per second | 10 |
| ListFlows requests per second | 10 |
| ListFlowVersions requests per second | 10 |
| Output nodes per flow | 20 |
| PrepareFlow requests per second | 2 |
| Prompt nodes per flow | 20 |
| S3 retrieval nodes per flow | 10 |
| S3 storage nodes per flow | 10 |
| Total nodes per flow | 40 |
| UpdateFlow requests per second | 2 |
| UpdateFlowAlias requests per second | 2 |
| ValidateFlowDefinition requests per second | 2 |
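
The per-flow node limits above can be checked before a flow definition is submitted. This is a minimal sketch, assuming each limit is a simple count per node type plus the overall cap of 40 nodes. The dictionary keys are illustrative labels chosen for this example and do not correspond to any identifiers in the Flows API.

```python
# Per-flow node limits copied from the Flows table above.
# The keys are illustrative labels, not API node type names.
NODE_LIMITS = {
    "agent": 20,
    "collector": 1,
    "condition": 5,
    "inline_code": 5,
    "input": 1,
    "iterator": 1,
    "knowledge_base": 20,
    "lambda": 20,
    "lex": 5,
    "output": 20,
    "prompt": 20,
    "s3_retrieval": 10,
    "s3_storage": 10,
}
TOTAL_NODE_LIMIT = 40


def over_limit(node_counts):
    """Return {label: (count, limit)} for every limit the flow exceeds.

    `node_counts` maps a node label to how many nodes of that kind the
    flow contains. Labels without a listed limit count toward the total only.
    """
    violations = {
        label: (count, NODE_LIMITS[label])
        for label, count in node_counts.items()
        if label in NODE_LIMITS and count > NODE_LIMITS[label]
    }
    total = sum(node_counts.values())
    if total > TOTAL_NODE_LIMIT:
        violations["total"] = (total, TOTAL_NODE_LIMIT)
    return violations
```

Note that a flow can stay within every per-type limit and still exceed the total: 20 prompt nodes, 20 output nodes, and 1 input node add up to 41.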
General (17)
| Quota | Default value |
|---|---|
| Action groups per Agent | 20 |
| Agent Collaborators per Agent | 1K |
| Agents per account | 1K |
| APIs per Agent | 11 |
| Associated aliases per Agent | 10 |
| Associated knowledge bases per Agent | 2 |
| Characters in Agent instructions | 20K |
| Concurrent model import jobs | 1 |
| Custom models with a creating status per account | 2 |
| Enabled action groups per agent | 15 |
| Endpoints per inference profile | 5 |
| Imported models per account | 3 |
| Inference profiles per account | 1K |
| Model units no-commitment Provisioned Throughputs across base models | 0 |
| Model units no-commitment Provisioned Throughputs across custom models | 0 |
| Number of custom prompt routers per account | 500 |
| Parameters per function | 5 |
API request rates (53)
| Quota | Default value |
|---|---|
| AssociateAgentKnowledgeBase requests per second | 6 |
| CreateAgent requests per second | 6 |
| CreateAgentActionGroup requests per second | 12 |
| CreateAgentAlias requests per second | 2 |
| DeleteAgent requests per second | 2 |
| DeleteAgentActionGroup requests per second | 2 |
| DeleteAgentAlias requests per second | 2 |
| DeleteAgentVersion requests per second | 2 |
| DisassociateAgentKnowledgeBase requests per second | 4 |
| GetAgent requests per second | 15 |
| GetAgentActionGroup requests per second | 20 |
| GetAgentAlias requests per second | 10 |
| GetAgentKnowledgeBase requests per second | 15 |
| GetAgentVersion requests per second | 10 |
| ListAgentActionGroups requests per second | 10 |
| ListAgentAliases requests per second | 10 |
| ListAgentKnowledgeBases requests per second | 10 |
| ListAgents requests per second | 10 |
| ListAgentVersions requests per second | 10 |
| PrepareAgent requests per second | 2 |
| Throttle rate limit for Bedrock Data Automation Runtime: ListTagsForResource | 25 |
| Throttle rate limit for Bedrock Data Automation Runtime: TagResource | 25 |
| Throttle rate limit for Bedrock Data Automation Runtime: UntagResource | 25 |
| Throttle rate limit for Bedrock Data Automation: ListTagsForResource | 25 |
| Throttle rate limit for Bedrock Data Automation: TagResource | 25 |
| Throttle rate limit for Bedrock Data Automation: UntagResource | 25 |
| Throttle rate limit for CreateBlueprint | 5 |
| Throttle rate limit for CreateBlueprintVersion | 5 |
| Throttle rate limit for CreateDataAutomationLibrary | 3 |
| Throttle rate limit for CreateDataAutomationProject | 5 |
| Throttle rate limit for DeleteBlueprint | 5 |
| Throttle rate limit for DeleteDataAutomationLibrary | 3 |
| Throttle rate limit for DeleteDataAutomationProject | 5 |
| Throttle rate limit for GetBlueprint | 5 |
| Throttle rate limit for GetDataAutomationLibrary | 5 |
| Throttle rate limit for GetDataAutomationLibraryEntity | 5 |
| Throttle rate limit for GetDataAutomationLibraryIngestionJob | 5 |
| Throttle rate limit for GetDataAutomationProject | 5 |
| Throttle rate limit for GetDataAutomationStatus | 10 |
| Throttle rate limit for InvokeDataAutomationAsync | 10 |
| Throttle rate limit for InvokeDataAutomationLibraryIngestionJob | 5 |
| Throttle rate limit for ListBlueprints | 5 |
| Throttle rate limit for ListDataAutomationLibraries | 5 |
| Throttle rate limit for ListDataAutomationLibraryEntities | 5 |
| Throttle rate limit for ListDataAutomationLibraryIngestionJobs | 5 |
| Throttle rate limit for ListDataAutomationProjects | 5 |
| Throttle rate limit for UpdateBlueprint | 5 |
| Throttle rate limit for UpdateDataAutomationLibrary | 5 |
| Throttle rate limit for UpdateDataAutomationProject | 5 |
| UpdateAgent requests per second | 4 |
| UpdateAgentActionGroup requests per second | 6 |
| UpdateAgentAlias requests per second | 2 |
| UpdateAgentKnowledgeBase requests per second | 4 |
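
The per-second request quotas above are enforced server-side, so callers often add a client-side limiter to avoid throttling errors. The following is a minimal token-bucket sketch, not an AWS-provided utility: the class name `TokenBucket`, its parameters, and the injectable `clock` are hypothetical, and the rate of 2 is taken from the `PrepareAgent requests per second` row.

```python
import time


class TokenBucket:
    """Hypothetical client-side limiter: about `rate` calls per second,
    with bursts of up to `capacity` calls."""

    def __init__(self, rate, capacity=None, clock=time.monotonic):
        self.rate = float(rate)
        # Default burst size equals one second's worth of requests.
        self.capacity = float(rate if capacity is None else capacity)
        self.clock = clock  # injectable so the limiter can be tested
        self.tokens = self.capacity
        self.updated = clock()

    def _refill(self):
        # Add tokens for the time elapsed since the last update, capped at capacity.
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now

    def try_acquire(self):
        """Return True and consume one token if a call may be sent now."""
        self._refill()
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

    def acquire(self):
        """Block until a token is available, then consume it."""
        while not self.try_acquire():
            # Sleep just long enough for the missing fraction of a token to refill.
            time.sleep((1.0 - self.tokens) / self.rate)


# Assumed value: the default PrepareAgent quota listed in the table above.
prepare_agent_limiter = TokenBucket(rate=2)
```

Calling `prepare_agent_limiter.acquire()` before each request keeps a single process near the default rate. It does not coordinate across processes or accounts, and an account's applied quota may differ from the default.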

Guardrails (21)

| Quota | Default value |
|---|---|
| Automated Reasoning policies per guardrail | 2 |
| Contextual grounding query length in text units | 1 |
| Contextual grounding response length in text units | 5 |
| Contextual grounding source length in text units | 100 |
| Example phrases per Topic | 5 |
| Guardrails per account | 100 |
| On-demand ApplyGuardrail Content filter policy text units per second | 200 |
| On-demand ApplyGuardrail Content filter policy text units per second (standard) | 200 |
| On-demand ApplyGuardrail contextual grounding policy text units per second | 106 |
| On-demand ApplyGuardrail Denied topic policy text units per second | 50 |
| On-demand ApplyGuardrail Denied topic policy text units per second (standard) | 200 |
| On-demand ApplyGuardrail requests per second | 100 |
| On-demand ApplyGuardrail Sensitive information filter policy text units per second | 500 |
| On-demand ApplyGuardrail Word filter policy text units per second | 500 |
| On-demand InvokeGuardrailChecks requests per minute | 1.5K |
| Regex entities in Sensitive Information Filter | 30 |
| Regex length in characters | 500 |
| Topics per guardrail | 30 |
| Versions per guardrail | 20 |
| Word length in characters | 100 |
| Words per word policy | 10K |
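
Several guardrail quotas are expressed in text units, which the table does not define. The sketch below assumes one text unit covers up to 1,000 characters; treat that constant as an assumption to verify against the Guardrails documentation. The helper names `text_units` and `check_grounding_lengths` are hypothetical, and the limits are the contextual grounding defaults from the table above.

```python
import math

# Assumption (not stated in the table): one text unit is up to 1,000 characters.
CHARS_PER_TEXT_UNIT = 1000

# Default contextual grounding limits from the table above, in text units.
GROUNDING_LIMITS = {"query": 1, "response": 5, "source": 100}


def text_units(text):
    """Number of text units needed to hold `text`, rounding up."""
    return math.ceil(len(text) / CHARS_PER_TEXT_UNIT)


def check_grounding_lengths(query, response, source):
    """Return the fields whose size exceeds the default limit,
    mapped to (units_used, limit)."""
    sizes = {
        "query": text_units(query),
        "response": text_units(response),
        "source": text_units(source),
    }
    return {
        name: (used, GROUNDING_LIMITS[name])
        for name, used in sizes.items()
        if used > GROUNDING_LIMITS[name]
    }
```

An empty result means all three inputs fit within the defaults under the stated assumption. A non-empty result names the field to shorten or split before sending the request.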

Managed Knowledge Bases (19)

| Quota | Default value |
|---|---|
| AgenticRetrieveStream requests per second per account | 1 |
| AgenticRetrieveStream user query size | 10K |
| Concurrent ingestion jobs per knowledge base | 50 |
| Data sources per knowledge base | 200 |
| DeleteKnowledgeBaseDocuments requests per second | 10 |
| DeleteResourcePolicy requests per second | 5 |
| Files to ingest per IngestKnowledgeBaseDocuments request | 10 |
| GetDocumentContent requests per second per account | 100 |
| GetDocumentContent requests per second per knowledge base | 5 |
| GetResourcePolicy requests per second | 5 |
| Individual file extracted text size (MB) | 30 |
| IngestKnowledgeBaseDocuments requests per second | 20 |
| Knowledge bases per account | 1K |
| ListKnowledgeBaseDocuments requests per second | 10 |
| PutResourcePolicy requests per second | 5 |
| Retrieve requests per second per account | 100 |
| Retrieve requests per second per knowledge base | 5 |
| Retrieve user query size | 10K |
| Total storage size per knowledge base (TB) | 10 |
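
Because `IngestKnowledgeBaseDocuments` accepts at most 10 files per request by default, a larger set of files has to be split across several requests. The following sketch shows only the batching step. The function name `batch_files` and the sample file names are hypothetical, and no AWS call is made.

```python
# Default "Files to ingest per IngestKnowledgeBaseDocuments request" from the table above.
MAX_FILES_PER_REQUEST = 10


def batch_files(files, size=MAX_FILES_PER_REQUEST):
    """Split `files` into consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [files[i:i + size] for i in range(0, len(files), size)]


# Hypothetical input: 25 document names.
documents = [f"doc-{n}.txt" for n in range(25)]
for batch in batch_files(documents):
    # Each batch would be the payload of one ingestion request.
    print(len(batch), batch[0], batch[-1])
```

With 25 files this yields three batches of 10, 10, and 5. The request rate for those batches is governed separately by the `IngestKnowledgeBaseDocuments requests per second` quota.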

Prompt management (8)

| Quota | Default value |
|---|---|
| CreatePrompt requests per second | 2 |
| CreatePromptVersion requests per second | 2 |
| DeletePrompt requests per second | 2 |
| GetPrompt requests per second | 10 |
| ListPrompts requests per second | 10 |
| Prompts per account | 500 |
| UpdatePrompt requests per second | 2 |
| Versions per prompt | 10 |