Insights
11/05/26

Unikraft: Why the AI Era Needs a Cloud Rewrite

Today's cloud was built for predictable traffic and steady-state apps. AI demands something fundamentally different, and we're convinced Unikraft, the cloud infrastructure company backed by Vercel and First Momentum, will lead the rebuild.

The cloud we're running AI on wasn't built for AI. It was built for a world where traffic was predictable, deployments were measured in minutes, and a "spike" in usage meant Black Friday. That world is gone. Today, a single AI coding agent can spin up a thousand sandboxed environments in a few seconds and need zero of them a minute later. AI apps can go from quiet to millions of requests overnight. The infrastructure underneath, designed for steady-state web applications with gentle daily traffic curves, is buckling under workloads it was never meant to serve.

AI companies are spending millions on cloud capacity they don't use, just to keep cold starts from killing their user experience. Margin pressure is mounting across the entire AI application layer, and the root cause is the same everywhere: the current cloud stack forces a brutal trade-off between performance and cost that didn't exist at this scale before. The sooner infrastructure catches up, the sooner the next generation of AI companies can become sustainable and profitable businesses. Right now, that gap between what modern workloads need and what the hyperscalers deliver is one of the largest platform opportunities in software.

For us at First Momentum, companies tackling this area are one of the most compelling investment themes we're tracking, and we're actively looking for founders building in and around it. Below, I want to explain how we're thinking about the space and why we are convinced that our portfolio company Unikraft is in pole position to become a next-generation, AI-native hyperscaler.

How Infrastructure Needs Are Changing in the Age of AI

For the last fifteen years, cloud infrastructure has been optimized for a fairly predictable pattern: a web app with some background jobs, traffic that ebbs and flows on a daily cycle, and scaling decisions that happen on the order of minutes. The abstractions built so far (containers, Kubernetes, autoscaling groups) work reasonably well for that world. But they were designed around a set of assumptions that AI workloads violate in almost every dimension.

  • Containers share a kernel, which means they don't provide strong enough isolation for untrusted, machine-generated code, and in the AI era, that's most of the code that needs to run.
  • Kubernetes autoscaling reacts on the order of minutes, polling metrics and adjusting replica counts long after a traffic spike has already hit. That's an eternity for AI workloads that fluctuate on a millisecond timescale.
  • Traditional virtual machines (VMs) offer proper isolation but take seconds or minutes to boot, making them unusable for anything that needs to start on demand.

And all of these abstractions assume that workloads, once started, stay running. There's no native concept of scaling to zero and resuming statefully in milliseconds, which is exactly what ephemeral AI workloads like agent sandboxes require. The result is that companies are forced into a set of trade-offs that didn't exist before: choose fast startup but weak isolation, or strong isolation but slow startup; choose cost efficiency but cold-start penalties, or good UX but massive idle spend. AI workloads need all of these properties at once, and the current stack simply cannot deliver them together.

And the even bigger problem: AI companies are hitting meaningful scale in timeframes that didn't exist before. We urgently need a new wave of infrastructure companies building the backbone for these new requirements.

Why the old playbook is breaking

The current cloud stack is becoming a challenge for companies of all sizes. Every organization deploying AI workloads at scale, whether it's a bank running fraud-detection agents or an enterprise rolling out internal copilots, is running into the same infrastructure wall. The workloads are spiky, ephemeral, stateful, and security-sensitive.

The cloud they're running on was designed for none of those things. Agentic AI consumes up to 30x more tokens per user interaction than standard generative AI because each agent run involves multi-step reasoning, tool calls, and coordination with other agents, not a single prompt-response exchange.

Gartner projects that 33% of enterprise software applications will include agentic AI by 2028, up from essentially zero in 2024. Infrastructure is the top bottleneck to scaling these systems because the shape of an AI workload is fundamentally different from the shape of a web app, and the old abstractions simply don't fit. The strain this puts on infrastructure, from compute to energy, is something we've explored before.

For startups, the problem is also becoming acute. The old playbook was "move fast on whatever infra gets you shipping, fix it later when you hit PMF." That worked because "later" used to mean a couple of years, enough time to raise a Series A, hire an infra team, and refactor. In the AI era, "later" is often a couple of months.

Unikraft's Co-Founder & CEO Felipe laid out the numbers in a recent post: among the top ten AI startups, average revenue in the first twelve months is around $74M. Lovable, Cursor, and Mistral each crossed $100M in year one. And as Garry Tan at YC has pointed out, this isn't limited to the headline names: entire batches of early-stage AI companies are growing 10% week-over-week, which has never happened before in early-stage venture.

If you're growing that fast, infrastructure isn't a thing you get to clean up later. It's a day-one problem, and the wrong choice on day one can mean margin structures that are broken by the time you need to raise your Series A.

What AI workloads actually look like and why legacy infra breaks beneath them

So what exactly makes AI workloads so different that the current stack can't handle them? Felipe's framing is the clearest I've seen, and it maps directly onto the infrastructure failures described above (he goes deeper on why unikernels matter for AI workloads in this recent podcast).

Modern AI infra has to handle five things at once, and legacy platforms were built for at most one or two of them:

  • Unprecedented scale. Millions of instances, sandboxes, or agent sessions are created per week. Every user interaction with an AI coding assistant, browser agent, or copilot can spawn its own isolated environment. Legacy infra buckles under that volume, or at least becomes ruinously expensive. Traditional orchestrators like Kubernetes were designed to manage hundreds or thousands of pods, not millions of ephemeral micro-environments spinning up and down every hour.
  • Ephemerality. These workloads are idle most of the time. A sandbox runs for thirty seconds, then sits unused for hours until the next interaction. Traditional clouds punish you for this: warm pools mean you're paying for servers that are doing nothing. Legacy platforms have no concept of true scale-to-zero; the smallest billable unit is still a running container or VM, even when nobody is using it.
  • Persistence. Most AI workloads are stateful. An agent has context. A sandbox has a working directory, installed packages, and file state. Any scale-to-zero mechanism has to resume exactly where it left off, in milliseconds, not start from scratch. Otherwise, the user experience breaks. Traditional serverless platforms like AWS Lambda are stateless by design: every cold start is a blank slate, which means either you lose context or you bolt on external state management that adds latency and complexity.
  • Security. When code is generated by an LLM and executed automatically, you cannot trust it the way you trust code your engineers wrote. Container-level isolation isn't enough for multi-tenant untrusted code. Strong, hardware-level VM isolation is an absolute must. But legacy VMs that provide this level of isolation take seconds or minutes to boot, making them impractical at the scale and speed AI workloads demand.
  • Unit economics. AI scale breaks traditional infra economics. If each sandbox or agent session requires its own VM, and you can only fit a few hundred VMs on a server, the cost per user becomes prohibitive. Margins get eaten alive unless the underlying platform was built for exponentially higher density. Legacy hyperscaler pricing was built around long-running, predictable workloads; the more ephemeral and spiky your usage, the worse your unit economics get.
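To make the unit-economics point concrete, here is a back-of-envelope sketch. All numbers (server cost, instances per server, active fraction) are illustrative assumptions for this example, not vendor pricing: if each user needs their own isolated instance and a server holds only a few hundred always-on VMs, the per-user cost sits orders of magnitude above a dense, scale-to-zero design where idle instances consume no capacity.

```python
import math

def monthly_cost_per_user(users, instances_per_server, server_cost_month,
                          active_fraction=1.0):
    """Per-user cost when every user needs one isolated instance.

    active_fraction < 1.0 models scale-to-zero: suspended instances
    consume no capacity, so a server only has to hold the active share.
    """
    effective_capacity = instances_per_server / active_fraction
    servers = max(1, math.ceil(users / effective_capacity))
    return servers * server_cost_month / users

# Assumed: 1M users, $500/month per bare-metal server.
legacy = monthly_cost_per_user(1_000_000, instances_per_server=300,
                               server_cost_month=500)   # always-on VMs
dense = monthly_cost_per_user(1_000_000, instances_per_server=100_000,
                              server_cost_month=500,
                              active_fraction=0.02)     # ~2% active at once

print(f"legacy VMs:     ${legacy:.2f}/user/month")
print(f"dense microVMs: ${dense:.4f}/user/month")
```

Under these made-up numbers, the legacy design needs thousands of servers while the dense, mostly-idle design fits on one, which is the margin difference the bullet above describes.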

Why this is a massive platform opportunity

The macro picture is unmistakable. IDC's latest data projects global AI infrastructure spending to surpass $1 trillion by 2029, with a five-year CAGR of approximately 31%. In Q4 2025 alone, spending reached nearly $90 billion, and IDC projects $487 billion for the full year 2026.

That spending is happening because demand is real. But the vast majority of it is flowing into hardware: GPUs, servers, data centers, networking. What's getting less attention is the software layer that actually deploys, orchestrates, and runs AI workloads on top of that hardware. And the current stack is extraordinarily wasteful. According to Datadog's State of Cloud Costs report, 83% of container spend goes to idle resources: 54% to idle clusters and 29% to idle workloads. Only 17% is actually utilized.

Container costs breakdown. Source: Datadog's State of Cloud Costs

For every dollar companies spend on container infrastructure today, 83 cents is effectively wasted on compute that isn't doing anything. For the fastest-growing AI companies, many of which are scaling to tens of millions in revenue within their first year, this waste translates directly into margin pressure that threatens their path to profitability. A deployment layer that eliminates idle spend isn't just a nice optimization for these companies. It's a massive margin unlock that can make the difference between a sustainable business and one that burns through capital faster than it grows.

You can buy all the GPUs in the world, but if the deployment stack underneath charges you for idle capacity, the hardware investment alone won't solve the problem. New chips don't fix cold starts. New data centers don't fix the trade-off between isolation and startup speed. New server racks don't give you stateful scale-to-zero. Deploying AI at scale requires not just better hardware but a fundamentally new software layer. That deployment software is the missing piece of the AI infrastructure stack, and a meaningful share of cloud spending will shift toward it. The companies that build this layer won't just capture a slice of the existing market, they'll unlock workloads and business models that aren't economically viable on today's stack at all.

What Unikraft Is Doing to Solve This

Unikraft is exactly the kind of new deployment software layer we described above. Based in Heidelberg and San Francisco and built by a team with deep roots in a Linux Foundation OSS project, Unikraft Cloud is a next-generation cloud platform that can be deployed on bare-metal servers, inside VMs, as dedicated hosts, or on-prem/BYOC.

Anything you deploy on Unikraft runs inside its own hardware-level isolated microVM. When idle, the platform transparently scales instances to zero: no memory, no CPU, no cost. When the next request comes in, the platform wakes the instance back up in milliseconds, so end users think every service is running all the time, when in the back end most of them aren't. From the outside, it looks like always-on infrastructure. You're only paying for what's actually active. And the platform can pack hundreds of thousands of such instances onto a single server.

How it works under the hood

1. An ultra-efficient cloud stack. Every instance runs as a minimal VM built for speed and security. These VMs include only what your application actually needs: no unused packages, no general-purpose OS overhead, and no bloat. At the core sits a heavily modified version of the Firecracker VMM, custom-optimized for ultra-fast instance starts. Getting a VMM to boot a fully isolated VM in single-digit milliseconds is years of engineering work that can't be shortcut. The resulting images are typically 10x smaller than standard containers.

2. A reactive network layer built from first principles. Traditional cloud platforms react to traffic changes on the order of minutes. Unikraft built a custom proxy and controller from scratch that reacts to incoming requests in real time. When a request hits the platform, the proxy buffers it, the controller checks whether an instance is active, and if not, cold-boots one instantly. The moment the instance is ready, the proxy forwards the request. After an idle timeout, the instance scales back to zero, and its state is preserved so the next request resumes exactly where it left off. The platform can scale thousands of instances simultaneously, all within milliseconds. Building a network layer that operates at this speed and reliability takes years and world-class distributed systems expertise.
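The request path described above (buffer, check, boot if needed, forward, suspend on idle) can be sketched as a toy control loop. This is an illustrative simulation, not Unikraft's actual proxy or controller; the class names, idle timeout, and in-memory "state" are invented for the example.

```python
import time

class Instance:
    def __init__(self):
        self.state = "suspended"   # consumes no CPU/memory while suspended
        self.memory = {}           # preserved across suspend/resume
        self.last_active = 0.0

    def resume(self):
        self.state = "running"     # real system: restore snapshot in ~ms

    def suspend(self):
        self.state = "suspended"   # snapshot kept, resources freed

class Proxy:
    IDLE_TIMEOUT = 0.05  # seconds; illustrative only

    def __init__(self, instance):
        self.instance = instance

    def handle(self, request):
        # Buffer the request while the controller ensures an instance is up.
        if self.instance.state != "running":
            self.instance.resume()
        self.instance.last_active = time.monotonic()
        # Forward: the instance mutates its preserved state.
        count = self.instance.memory.get("requests", 0) + 1
        self.instance.memory["requests"] = count
        return count

    def reap_idle(self):
        # Controller-side sweep: suspend instances past the idle timeout.
        if (self.instance.state == "running"
                and time.monotonic() - self.instance.last_active > self.IDLE_TIMEOUT):
            self.instance.suspend()

proxy = Proxy(Instance())
assert proxy.handle("req-1") == 1           # cold path: resume, then forward
time.sleep(0.06)
proxy.reap_idle()
assert proxy.instance.state == "suspended"  # scaled to zero between requests
assert proxy.handle("req-2") == 2           # state survived the suspend
```

The key property the sketch captures is that suspension is invisible to the caller: the second request sees the counter at 2, not a blank slate.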

3. Lightning-fast app startup through snapshotting. The platform can pre-initialize any application, capture its fully initialized state as a memory snapshot, and then scale it to zero. When the next request arrives, the instance resumes from that snapshot as if it never stopped. This works for heavy, stateful applications, not just lightweight functions: databases, AI agents with context, and full development environments.

Beyond basic snapshots, the platform supports forking (spawning multiple new instances from a single shared initialized state), continuous snapshots (checkpointing state on an ongoing basis so nothing is ever lost), and instance templates (pre-built snapshots of complex environments that can be launched on demand).
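A toy model can make the three primitives (snapshots, forking, templates) concrete. The `MicroVM` class and its methods below are invented for illustration; a real platform captures full memory state and shares it copy-on-write, while this sketch simply deep-copies a dict.

```python
import copy

class MicroVM:
    def __init__(self, state=None):
        self.state = state or {}

    def snapshot(self):
        # Capture full state so the VM can scale to zero and resume later.
        return copy.deepcopy(self.state)

    @classmethod
    def resume(cls, snap):
        return cls(copy.deepcopy(snap))

    def fork(self, n):
        # Spawn n instances from one shared initialized state
        # (copy-on-write in a real system; deep copies in this toy model).
        snap = self.snapshot()
        return [MicroVM.resume(snap) for _ in range(n)]

# "Instance template": a pre-built snapshot of an initialized environment.
base = MicroVM()
base.state["packages"] = ["postgres", "python3"]  # expensive init, done once
template = base.snapshot()

# Launch on demand from the template: no re-initialization needed.
vm = MicroVM.resume(template)
vm.state["session"] = "user-42"

# Fork many sandboxes from one warmed-up parent; each diverges independently.
sandboxes = vm.fork(3)
sandboxes[0].state["session"] = "user-99"
assert vm.state["session"] == "user-42"
assert sandboxes[1].state["session"] == "user-42"
```

The point of the model: initialization cost is paid once (the template), and every later launch or fork starts from a fully warmed state rather than from scratch.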

The range of workloads Unikraft supports is broad: AI agent sandboxes, headless browsers for scraping and automation, serverless functions, serverless databases, build and test environments, ETL and data pipelines, remote IDEs, API gateways, and game servers. Ultimately, anything you can define in a Dockerfile, Unikraft can run. It doesn't matter how large or heavy the environment is.

Performance Benchmarks: How Unikraft Compares to the Market

Over the last few years, we evaluated several hundred software infrastructure companies, and we have never seen performance benchmarks like Unikraft's. They sound like they shouldn't be possible, but they are. And these aren't synthetic tests in controlled environments. They're production numbers from Unikraft's customers running real workloads at scale:

Cloud performance comparison. Source: Unikraft

These numbers are orders of magnitude better than what the market can currently offer. It's rare to see such a paradigm shift in one of the biggest industries in the world happening right in front of you. A few of these deserve a closer look because they don't just represent incremental improvements. They change what's technically and economically possible:

  • Sub-10ms cold starts with full VM isolation. Unikraft boots a complete, hardware-isolated microVM in under 10 milliseconds. For context, a traditional VM takes 10 to 60 seconds to boot. AWS Lambda cold starts range from hundreds of milliseconds to several seconds and even then only provide container-level isolation. Unikraft delivers stronger security in a fraction of the time it takes weaker isolation models to start. That's what lets customers eliminate warm pools entirely: if you can boot a fully isolated environment faster than a network round-trip, there's simply no reason to keep anything pre-warmed. Companies paying millions per year to keep infrastructure always-on can turn that entire line item into a variable cost.
  • 100,000+ strongly isolated instances per single server. Traditional VM infrastructure tops out at a few hundred instances per server. Container orchestrators like Kubernetes can pack more, but at the cost of weaker isolation boundaries. Unikraft achieves several-thousand-times better density than traditional VMs while maintaining hardware-level isolation for every single instance. On modern server hardware, the platform can push toward a million scaled-to-zero instances on a single machine. That density is what turns previously impossible business models into viable ones: sustainable free tiers for database products or sandbox platforms that serve a million users from a handful of servers instead of an entire data center aisle.
Server density with Scale to Zero. Source: Unikraft
  • Stateful scale-to-zero and resume in milliseconds. Most serverless platforms are stateless by design: when a function scales to zero, its state is gone, and the next invocation starts from scratch. Unikraft's snapshot subsystem captures the full memory state of a running instance, hibernates it with zero resource consumption, and restores it exactly where it left off in single-digit milliseconds. Combined with forking and instance templates, this means even heavy, stateful environments like databases, AI agent sandboxes, or multi-gigabyte build toolchains can scale to zero and resume instantly. That turns what used to be a massive fixed infrastructure cost into a variable cost that scales linearly with actual usage.
  • Full compatibility with existing tooling. Most alternative high-performance isolation approaches require teams to throw away their existing stack and rebuild from scratch. Unikraft takes the opposite approach. It plugs transparently into Dockerfiles, Kubernetes, Prometheus, Grafana, Terraform, and OpenTelemetry. Teams keep their entire toolchain and get the performance gains essentially for free. That means adoption can happen in days rather than months, removing the biggest historical barrier to next-generation infrastructure actually getting deployed.

What makes these numbers truly remarkable is that they normally exist in direct tension with each other. In cloud infrastructure, you historically had to choose: fast startup or strong isolation, high density or full VM security, developer convenience or raw performance. Every platform before Unikraft has delivered one or two of these at the expense of the others. Unikraft delivers all of them simultaneously, with virtually no trade-offs. That is the result of a fundamentally different architecture, and in our view it's the strongest technical foundation we've seen for what the next generation of cloud infrastructure needs to look like.

How This Plays Out in Production: Browser Use & Prisma

The benchmarks above tell you what Unikraft can do. But the more convincing evidence comes from watching what happens when sophisticated engineering teams actually bet their products on it. Two of the best public examples are Browser Use and Prisma: an AI agent platform and a managed Postgres service, two completely different products that ran into the same infrastructure wall and arrived at the same answer.

Browser Use: millions of AI agents on micro-VMs

Browser Use, one of the open-source companies we highlighted in Sifted, runs millions of web agents on Unikraft. Their CTO Larsen Cundric published a detailed writeup of their migration from AWS Lambda to Unikraft micro-VMs. Once their agents needed to execute arbitrary code, Lambda broke down: the agent loop shared a backend with the REST API, a redeploy killed every running agent, and code execution could theoretically access host credentials. Now each agent session runs in its own Unikraft micro-VM on dedicated bare metal inside AWS, seeing exactly three environment variables. No AWS keys, no database credentials, no API tokens. They run the exact same Docker image locally, in evals, and in production, with a single config flag switching between environments.

Two details from their architecture are worth pulling out because they're the kind of thing you only get from a platform designed for this specifically:

  • Scale-to-zero that actually resumes with state. Between user queries, an agent's VM suspends completely. It consumes zero memory and zero CPU. The moment the next request arrives, the VM resumes, not cold-starts, but resumes, exactly where it left off, with full context intact, in milliseconds. The user never notices the agent was sleeping. This is exactly the stateful scale-to-zero property that AI agent workloads require, and that traditional serverless platforms like Lambda fundamentally cannot provide because they are stateless by design.
  • Agents run globally without Browser Use having to manage it. Unikraft automatically distributes agents across multiple geographic regions, so no single location becomes a bottleneck. Browser Use doesn't need to build that infrastructure themselves.

The business result: Browser Use doubled their contract size within six months of starting on Unikraft. The platform didn't just solve a technical problem. It gave them the isolation model, the cost structure, and the operational simplicity they needed to scale from thousands of agents to millions.

Prisma: rebuilding serverless Postgres from the ground up

Prisma went even further: they rebuilt their entire serverless Postgres product on Unikraft. Their architecture post is effectively a long explanation of why traditional cloud infrastructure made their product economically impossible. Serverless database providers typically either resell AWS infrastructure or run container-based orchestration with Kubernetes. Both approaches mean each database instance consumes significant resources even when idle, and the cost of maintaining millions of free-tier databases at that overhead is unsustainable. This is exactly why providers like PlanetScale and Heroku shut down their free tiers.

Prisma's answer was to rebuild the stack from first principles on Unikraft. Rather than reselling hyperscaler infrastructure, they lease bare metal servers in data centers globally and run Postgres inside full-fledged, Linux-based microVMs:

  • Image size reduction from 280 MB to 61 MB. The Unikraft team worked with Prisma to strip the standard Postgres image of every component a managed database doesn't need, bringing it to roughly 20% of the original size. That smaller footprint is what enables millisecond boot times for a workload that historically takes many seconds to initialize.
  • Thousands of isolated Postgres instances per bare-metal machine. Each instance runs as its own microVM with full hardware-level VM isolation, not container isolation. That density is what makes a sustainable free tier economically viable. Container-based designs top out at a few hundred instances per server with much weaker isolation boundaries. Unikraft delivers orders-of-magnitude better density with stronger security guarantees.
  • Multi-tier snapshotting for stateful scale-to-zero. Databases are the hardest workload to scale to zero because they hold state that must survive hibernation. Prisma and Unikraft built a snapshotting system that captures the full memory state of each database, hibernates idle instances, and resumes them from that snapshot in single-digit milliseconds. The user experiences zero cold-start delay. Prisma pays nothing while the database sleeps.

This server density is unheard of in traditional architectures. Prisma didn't just adopt Unikraft as a vendor. They rebuilt their flagship product on it because nothing else could support the business model they wanted to build.

The Next Shift

We believe Unikraft is at an inflection point. The technology is proven. The first wave of customers has validated it in production. The expansion dynamics are strong. What comes next is the transition from early adopters to industry standard, and that's where the real scale begins.

The timing for that transition could not be better. AI workloads are exposing the limits of legacy cloud infrastructure at exactly the moment Unikraft's product is mature enough to absorb them. The market is enormous: a trillion-dollar cloud industry growing at 16% annually, with the fastest-growing segment, AI-native workloads, being precisely the one where legacy infrastructure fails most dramatically. And the deployment flexibility is already there, meeting customers on bare metal, inside hyperscaler VMs, or on-prem, which means there's no architectural reason Unikraft can't be everywhere these workloads run.

Infrastructure shifts happen roughly once a decade. Physical servers gave way to virtual machines. VMs gave way to containers. Containers gave way to serverless. The next shift, to something that combines the isolation of VMs, the speed of functions, and the density required for AI-scale workloads, is happening right now.

Christian Neumann