See details on API, Subscription and Finetune pricing.
High-performance AI models at competitive prices
Model Name | Est. Speed | Context Length | Tool Calling | Input Price (per million tokens) | Output Price (per million tokens) |
---|---|---|---|---|---|
Meta Llama 3.1 405B Instruct (Enterprise) | ~130 tok/s | 131,072 | ✓ | $1.50 | $1.50 |
Meta Llama 3.3 70B Instruct (Professional) | ~200 tok/s | 131,072 | ✓ | $0.45 | $0.45 |
Meta Llama 3.1 8B Instruct (Starter) | ~450 tok/s | 131,072 | ✓ | $0.10 | $0.10 |
The Avian API offers competitive pricing on all models at some of the highest speeds on the market, achieved through speculative decoding and the latest Nvidia H200 SXM GPUs. We maintain production-grade capacity for every model we serve, so you can scale with no rate limits.
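To make the per-token pricing concrete, here is a minimal sketch that estimates the cost of a workload from the table above. The `PRICES_PER_MILLION` dictionary and the `estimate_cost` function are hypothetical names used for illustration and are not part of the Avian API. The prices are copied from the table and may change.

```python
from decimal import Decimal

# USD per million tokens as (input, output), copied from the pricing table.
# Illustrative only: these names are not part of any Avian SDK.
PRICES_PER_MILLION = {
    "Meta Llama 3.1 405B Instruct": (Decimal("1.50"), Decimal("1.50")),
    "Meta Llama 3.3 70B Instruct": (Decimal("0.45"), Decimal("0.45")),
    "Meta Llama 3.1 8B Instruct": (Decimal("0.10"), Decimal("0.10")),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated cost in USD for the given token counts."""
    input_price, output_price = PRICES_PER_MILLION[model]
    total = input_price * input_tokens + output_price * output_tokens
    return total / Decimal(1_000_000)


if __name__ == "__main__":
    # 2M input tokens and 500K output tokens on the 70B model.
    cost = estimate_cost("Meta Llama 3.3 70B Instruct", 2_000_000, 500_000)
    print(f"Estimated cost: ${cost}")
```

`Decimal` is used instead of floats so that small per-token amounts add up without rounding drift.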
High-performance dedicated GPU instances for your custom models
GPU Type | Starting Price (billed by the second) | Memory |
---|---|---|
H200 SXM (Latest Generation) | $0.00208 | 141GB HBM3e |
H100 SXM (Enterprise) | $0.00139 | 80GB HBM3 |
Deploy and run any Hugging Face model on dedicated GPU instances backed by our optimized infrastructure. Prices shown are for reserved instances; contact us for on-demand rates. Dedicated instances suit high-throughput production workloads that require dedicated resources.
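Because billing is by the second, it can help to convert the listed rates into hourly figures. The sketch below assumes the listed prices are USD per GPU per second, which the table suggests but does not state outright. `GPU_RATES_PER_SECOND` and `gpu_cost` are hypothetical names used for illustration.

```python
from decimal import Decimal

# Assumed to be USD per GPU per second, copied from the GPU pricing table.
# Illustrative only: these names are not part of any Avian SDK.
GPU_RATES_PER_SECOND = {
    "H200 SXM": Decimal("0.00208"),
    "H100 SXM": Decimal("0.00139"),
}

SECONDS_PER_HOUR = 3600


def gpu_cost(gpu: str, seconds: int, gpu_count: int = 1) -> Decimal:
    """Return the cost in USD of running `gpu_count` GPUs for `seconds`."""
    return GPU_RATES_PER_SECOND[gpu] * seconds * gpu_count


if __name__ == "__main__":
    for name in GPU_RATES_PER_SECOND:
        hourly = gpu_cost(name, SECONDS_PER_HOUR)
        print(f"{name}: ${hourly} per GPU-hour")
```

Under that assumption, the listed rates work out to about $7.49 per hour for an H200 SXM and about $5.00 per hour for an H100 SXM.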