Pricing
Predictable tiers. No per-token surprises.
Inference runs on your hardware, so you never pay us per token. You pay for the platform, the deployment work, and the level of operations you want from our team.
Pilot
$25k
one-time, fixed scope
A 4-week pilot deployment in a non-production segment of your network.
- Reference architecture deployed in your environment
- Up to 2 inference nodes (your hardware)
- Up to 25 internal users
- LiteLLM + vLLM + Ollama configured (example below)
- Office-hours support during pilot
- Migration path to Production tier
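Because the pilot gateway exposes an OpenAI-compatible API, internal teams can point existing tooling at it without a vendor SDK. A minimal sketch of what pilot users see, assuming a hypothetical internal hostname, model alias, and key issued by your deployment; all inference stays on your hardware:

```python
# Minimal sketch: calling the on-prem LiteLLM gateway from an internal app.
# The hostname, port, model alias, and API key below are placeholders;
# the real values come from your pilot deployment.
from openai import OpenAI

client = OpenAI(
    base_url="http://litellm.internal.example:4000",  # assumed internal gateway address
    api_key="sk-internal-placeholder",                # key issued by the gateway, not OpenAI
)

response = client.chat.completions.create(
    model="llama3-internal",  # alias the LiteLLM config maps to a vLLM or Ollama backend
    messages=[{"role": "user", "content": "Summarize yesterday's incident report."}],
)
print(response.choices[0].message.content)
```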
Production
Most popular
From $12k
per month
Production-grade deployment with managed operations and on-call coverage.
- Unlimited internal users & applications
- HA inference cluster sized for your load
- Coder workspaces for development teams
- SAML / OIDC / SCIM identity integration
- Audit log shipping to your SIEM
- Business-hours managed operations
- Quarterly model & capacity review
Enterprise
Custom
annual
For government agencies, large hospital systems, and multi-site deployments.
- Multi-site & multi-region deployments
- Air-gapped & classified-network support
- 24/7 managed operations and on-call
- Dedicated solutions architect
- Custom model fine-tuning pipeline
- Compliance evidence packs (FedRAMP, HIPAA, etc.)
- Named SLA with penalties
FAQ
Common questions about pricing.
- Do we pay for inference compute?
- No. You provision and own the GPU hardware. Our pricing covers the platform, deployment, and operations — not the silicon underneath.
- Can we run this fully air-gapped?
- Yes. We support fully disconnected networks, with mirrored model registries and signed offline updates. The Enterprise tier is built around this.
- What hardware do we need?
- Anything from a single GPU server up to multi-node clusters; we commonly deploy on NVIDIA L40S, H100, and H200 hardware. We help you size during discovery.
- What happens if we want to take operations in-house?
- Everything is open source and deployed as code in your environment. You can drop the managed-ops portion at any renewal and keep running the platform yourselves.