Vercel Details "Fluid" Compute Powering AI Gateway with Cost-Saving Pricing Model
Executive Summary
Vercel has published a technical deep-dive into its AI Gateway, a service for connecting to AI models through a unified interface. The announcement reveals that the gateway is built on "Fluid," Vercel's next-generation compute runtime, which is optimized for the highly concurrent, network-bound workloads typical of AI applications. The core innovation is Fluid's "Active CPU Pricing," a model that charges full CPU rates only while code is actively running and a lower memory-only rate during idle time (e.g., while waiting for an AI provider's response). The aim is to substantially reduce operational costs for developers.
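To make the billing idea concrete, here is a minimal sketch of the arithmetic, under stated assumptions: memory is billed for the whole invocation and CPU only for the active fraction. The `Rates` class, both function names, and the rate values are hypothetical and chosen for illustration; they are not Vercel's published prices or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical per-second prices in arbitrary units (not Vercel's real rates)."""
    cpu_per_second: float
    memory_per_second: float


def wall_clock_cost(duration_s: float, rates: Rates) -> float:
    """Baseline for comparison: CPU and memory are billed for the whole duration."""
    return duration_s * (rates.cpu_per_second + rates.memory_per_second)


def active_cpu_cost(duration_s: float, active_fraction: float, rates: Rates) -> float:
    """Assumed model: memory is billed for the whole duration, CPU only while active."""
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    cpu_seconds = duration_s * active_fraction
    return cpu_seconds * rates.cpu_per_second + duration_s * rates.memory_per_second


if __name__ == "__main__":
    rates = Rates(cpu_per_second=1.0, memory_per_second=0.1)  # made-up numbers
    baseline = wall_clock_cost(10.0, rates)
    fluid = active_cpu_cost(10.0, 0.08, rates)  # 8% active, as in the gateway example
    print(f"wall-clock: {baseline:.2f}, active-CPU: {fluid:.2f}")
```

The size of the saving in this sketch depends entirely on the ratio between the two made-up rates, so the output illustrates the shape of the model rather than any real bill.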
Key Takeaways
* Product: The announcement focuses on the architecture of the Vercel AI Gateway, which runs on new infrastructure called "Fluid" compute.
* Active CPU Pricing: Fluid introduces a new pricing model where customers only pay full CPU rates when the CPU is active. Idle time spent waiting for network responses from AI providers is billed at a significantly lower memory-only rate.
* Cost Efficiency Example: For the AI Gateway workload, only 8% of runtime involves active CPU work. On Fluid, Vercel pays CPU rates for that 8% rather than for the full 100%; the remaining 92%, spent waiting on providers, is billed at the lower memory-only rate.
* Fluid Compute Technology: Fluid is designed for network-bound tasks. It reuses compute instances both within and across invocations, so a single instance can serve multiple concurrent requests (in-function concurrency) and keep an in-memory cache between them. This reduces latency and combines serverless elasticity with server-like efficiency.
* Enhanced Reliability: The AI Gateway leverages Vercel's global network for low-latency routing and features automatic provider fallback. If a primary AI provider (e.g., Anthropic via Bedrock) is slow or fails, it can automatically retry the request with another provider (e.g., Anthropic via Vertex AI) without application-level changes.
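The provider fallback described in the last bullet can be sketched roughly as follows. This is an illustrative pattern only, not the gateway's actual implementation: `call_with_fallback`, `AllProvidersFailed`, and the two stub providers are hypothetical names, and a slow provider is modeled simply as a callable that raises `TimeoutError`.

```python
from typing import Callable, List, Sequence, Tuple


class AllProvidersFailed(Exception):
    """Raised when every provider in the list has failed (hypothetical helper)."""


def call_with_fallback(
    prompt: str,
    providers: Sequence[Tuple[str, Callable[[str], str]]],
) -> Tuple[str, str]:
    """Try each provider in order; return (provider_name, response) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # includes TimeoutError for a slow provider
            errors.append(f"{name}: {exc}")
    raise AllProvidersFailed("; ".join(errors))


# Stub providers standing in for real network calls (assumptions for illustration).
def bedrock_stub(prompt: str) -> str:
    raise TimeoutError("primary provider timed out")


def vertex_stub(prompt: str) -> str:
    return f"response to: {prompt}"


if __name__ == "__main__":
    used, reply = call_with_fallback(
        "hello",
        [("bedrock", bedrock_stub), ("vertex", vertex_stub)],
    )
    print(used, reply)
```

Because the retry loop lives in one place, the calling application passes a prompt and receives a response without knowing which provider served it, which is the property the bullet attributes to the gateway.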
Strategic Importance
This announcement positions Vercel's infrastructure as optimized for the new wave of AI applications, which often spend much of their runtime idle while waiting for model responses. By addressing the cost of this idle time directly with "Active CPU Pricing," Vercel aims to gain a competitive advantage in attracting developers who build and scale AI-powered products on its platform.