Get started with Avian

See details on API, subscription, and fine-tuning pricing.

Avian API Model Pricing

High-performance AI models at competitive prices

| Model Name | Est. Speed | Context Length | Tool Calling | Input Price (per million tokens) | Output Price (per million tokens) |
| --- | --- | --- | --- | --- | --- |
| Meta Llama 3.1 405B Instruct (Enterprise) | ~130 tok/s | 131,072 | | $1.50 | $1.50 |
| Meta Llama 3.3 70B Instruct (Professional) | ~200 tok/s | 131,072 | | $0.45 | $0.45 |
| Meta Llama 3.1 8B Instruct (Starter) | ~450 tok/s | 131,072 | | $0.10 | $0.10 |
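
As a rough illustration of how per-million-token pricing turns into a bill, the sketch below multiplies token counts by the input and output prices listed in the table above. The dictionary keys and the `estimate_cost` helper are hypothetical names chosen for this example, not official Avian model identifiers or API calls, and the calculation assumes no other fees or discounts apply.

```python
from decimal import Decimal

# USD per million tokens (input, output), copied from the pricing table above.
# The keys are illustrative labels only, NOT official API model identifiers.
PRICE_PER_MILLION = {
    "llama-3.1-405b": (Decimal("1.50"), Decimal("1.50")),
    "llama-3.3-70b": (Decimal("0.45"), Decimal("0.45")),
    "llama-3.1-8b": (Decimal("0.10"), Decimal("0.10")),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of a workload from its token counts (hypothetical helper)."""
    input_price, output_price = PRICE_PER_MILLION[model]
    total = input_price * input_tokens + output_price * output_tokens
    return total / Decimal(1_000_000)


if __name__ == "__main__":
    # Example workload: 2M input tokens and 500k output tokens on the 70B model.
    cost = estimate_cost("llama-3.3-70b", 2_000_000, 500_000)
    print(f"Estimated cost: ${cost}")
```

`Decimal` is used instead of floats so that small per-token prices do not pick up binary rounding noise when summed over large workloads.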

Enterprise-Grade Performance

Avian API offers competitive pricing on every model at some of the highest speeds on the market, achieved through speculative decoding and the latest NVIDIA H200 SXM GPUs. We maintain production-grade capacity for every model we serve, so you can scale with no rate limits.

Dedicated Deployments

High-performance dedicated GPU instances for your custom models

| GPU Type | From (billed by the second) | Memory |
| --- | --- | --- |
| H200 SXM (Latest Generation) | $0.00208 | 141 GB HBM3e |
| H100 SXM (Enterprise) | $0.00139 | 80 GB HBM3 |
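
Per-second rates are hard to compare at a glance, so the sketch below converts the "from" rates in the table above into hourly and daily figures. It assumes the listed price is in USD per second for a single GPU and that nothing else is billed. `RATE_PER_SECOND` and `cost_for` are hypothetical names for this example, and because these are starting rates the results are best read as lower bounds.

```python
from decimal import Decimal

# Reserved-instance "from" rates, assumed to be USD per second per GPU,
# copied from the table above.
RATE_PER_SECOND = {
    "H200 SXM": Decimal("0.00208"),
    "H100 SXM": Decimal("0.00139"),
}

SECONDS_PER_HOUR = 3_600
SECONDS_PER_DAY = 86_400


def cost_for(gpu: str, seconds: int, gpu_count: int = 1) -> Decimal:
    """Cost in USD of running `gpu_count` GPUs for `seconds` seconds (hypothetical helper)."""
    return RATE_PER_SECOND[gpu] * seconds * gpu_count


if __name__ == "__main__":
    for gpu in RATE_PER_SECOND:
        hourly = cost_for(gpu, SECONDS_PER_HOUR)
        daily = cost_for(gpu, SECONDS_PER_DAY)
        print(f"{gpu}: ${hourly:.2f}/hour, ${daily:.2f}/day")
```

Under these assumptions the H200 SXM rate works out to about $7.49 per hour and the H100 SXM rate to about $5.00 per hour.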

Deploy Custom Models

Get dedicated GPU instances to deploy and run any Hugging Face model on our optimized infrastructure. Pricing shown reflects reserved instances; contact us for on-demand rates. Dedicated deployments are well suited to high-throughput production workloads that need guaranteed resources.