The cloud we're running AI on wasn't built for AI. It was built for a world where traffic was predictable, deployments were measured in minutes, and a "spike" in usage meant Black Friday. That world is gone. Today, a single AI coding agent can spin up a thousand sandboxed environments in a few seconds and need zero of them a minute later. AI apps can go from quiet to millions of requests overnight. The infrastructure underneath, designed for steady-state web applications with gentle daily traffic curves, is buckling under workloads it was never meant to serve.
AI companies are spending millions on cloud capacity they don't use, just to keep cold starts from killing their user experience. Margin pressure is mounting across the entire AI application layer, and the root cause is the same everywhere: the current cloud stack forces a brutal trade-off between performance and cost that didn't exist at this scale before. The sooner infrastructure catches up, the sooner the next generation of AI companies can become sustainable and profitable businesses. Right now, that gap between what modern workloads need and what the hyperscalers deliver is one of the largest platform opportunities in software.
For us at First Momentum, companies tackling this problem represent one of the most compelling investment themes we're tracking, and we're actively looking for founders building in and around it. Below, I want to explain how we're thinking about the space and why we are convinced that our portfolio company Unikraft is in pole position to become a next-generation, AI-native hyperscaler.
For the last fifteen years, cloud infrastructure has been optimized for a fairly predictable pattern: a web app with some background jobs, traffic that ebbs and flows on a daily cycle, and scaling decisions that happen on the order of minutes. The abstractions we have built so far (containers, Kubernetes, autoscaling groups) work relatively well for that world. But they were designed around a set of assumptions that AI workloads violate in almost every dimension.
All of these abstractions also assume that workloads, once started, stay running. There's no native concept of scaling to zero and resuming statefully in milliseconds, which is exactly what ephemeral AI workloads like agent sandboxes require. The result is that companies are forced into trade-offs that didn't exist before: fast startup but weak isolation, or strong isolation but slow startup; cost efficiency but cold-start penalties, or good UX but massive idle spend. AI workloads need all of these properties at once, and the current stack simply cannot deliver them together.
And the even bigger problem: AI companies are hitting meaningful scale in timeframes that didn't exist before. We urgently need a new wave of infrastructure companies building the backbone for these new requirements.
The current cloud stack is becoming a challenge for companies of all sizes. Every organization deploying AI workloads at scale, whether it's a bank running fraud-detection agents or an enterprise rolling out internal copilots, is running into the same infrastructure wall. The workloads are spiky, ephemeral, stateful, and security-sensitive.
The cloud they're running on was designed for none of those things. Agentic AI consumes up to 30x more tokens per user interaction than standard generative AI because each agent run involves multi-step reasoning, tool calls, and coordination with other agents, not a single prompt-response exchange.
Gartner projects that 33% of enterprise software applications will include agentic AI by 2028, up from essentially zero in 2024. Infrastructure is the top bottleneck to scaling these systems: the shape of an AI workload is fundamentally different from the shape of a web app, and the old abstractions simply don't fit. The strain this puts on infrastructure, from compute to energy, is something we've explored before.
For startups, the problem is also becoming acute. The old playbook was "move fast on whatever infra gets you shipping, fix it later when you hit PMF." That worked because "later" used to mean a couple of years, enough time to raise a Series A, hire an infra team, and refactor. In the AI era, "later" is often a couple of months.
Unikraft's Co-Founder & CEO Felipe Huici laid out the numbers in a recent post: among the top ten AI startups, average revenue in the first twelve months is around $74M. Lovable, Cursor, and Mistral each crossed $100M in year one. And as Garry Tan at YC has pointed out, this isn't limited to the headline names: entire batches of early-stage AI companies are growing 10% week-over-week, which has never happened before in early-stage venture.
If you're growing that fast, infrastructure isn't a thing you get to clean up later. It's a day-one problem, and the wrong choice on day one can mean margin structures that are broken by the time you need to raise your Series A.
So what exactly makes AI workloads so different that the current stack can't handle them? Felipe's framing is the clearest I've seen, and it maps directly onto the infrastructure failures described above (he goes deeper on why unikernels matter for AI workloads in this recent podcast).
Modern AI infra has to handle five things at once: spiky demand, ephemeral environments, stateful scale-to-zero, hardware-level isolation, and extreme density. Legacy platforms were built for at most one or two of them.
The macro picture is hard to miss. IDC's latest data projects global AI infrastructure spending to surpass $1 trillion by 2029, with a five-year CAGR of approximately 31%. In Q4 2025 alone, spending reached nearly $90 billion, and IDC projects $487 billion for the full year 2026.
That spending is happening because demand is real. But the vast majority of it is flowing into hardware: GPUs, servers, data centers, networking. What's getting less attention is the software layer that actually deploys, orchestrates, and runs AI workloads on top of that hardware. And the current stack is extraordinarily wasteful. According to Datadog's State of Cloud Costs report, 83% of container spend goes to idle resources: 54% to idle clusters and 29% to idle workloads. Only 17% is actually utilized.

For every dollar companies spend on container infrastructure today, 83 cents is effectively wasted on compute that isn't doing anything. For the fastest-growing AI companies, many of which are scaling to tens of millions in revenue within their first year, this waste translates directly into margin pressure that threatens their path to profitability. A deployment layer that eliminates idle spend isn't just a nice optimization for these companies. It's a massive margin unlock that can make the difference between a sustainable business and one that burns through capital faster than it grows.
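To make that concrete, here's a back-of-the-envelope version of the math in Go, using Datadog's utilization split. The monthly bill is an assumed example, not a figure from the report:

```go
package main

import "fmt"

func main() {
	// Datadog's State of Cloud Costs split: 54% of container spend
	// sits in idle clusters, 29% in idle workloads, 17% does real work.
	const (
		idleClusters  = 0.54
		idleWorkloads = 0.29
		utilized      = 1 - idleClusters - idleWorkloads // 0.17
	)

	// Assumed example: an AI startup spending $250k/month on containers.
	monthlyBill := 250_000.0

	fmt.Printf("useful compute: $%.0f/mo\n", monthlyBill*utilized)                     // $42,500
	fmt.Printf("idle spend:     $%.0f/mo\n", monthlyBill*(idleClusters+idleWorkloads)) // $207,500

	// Every dollar of useful compute effectively costs ~$5.88;
	// that multiple is the margin gap a scale-to-zero layer closes.
	fmt.Printf("cost per useful dollar: $%.2f\n", 1/utilized)
}
```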
You can buy all the GPUs in the world, but if the deployment stack underneath charges you for idle capacity, the hardware investment alone won't solve the problem. New chips don't fix cold starts. New data centers don't fix the trade-off between isolation and startup speed. New server racks don't give you stateful scale-to-zero. Deploying AI at scale requires not just better hardware but a fundamentally new software layer. That deployment software is the missing piece of the AI infrastructure stack, and a meaningful share of cloud spending will shift toward it. The companies that build this layer won't just capture a slice of the existing market, they'll unlock workloads and business models that aren't economically viable on today's stack at all.
Unikraft is exactly the kind of new deployment software layer we described above. Based in Heidelberg and San Francisco and built by a team with deep roots in a Linux Foundation open-source project, Unikraft Cloud is a next-generation cloud platform that can be deployed on bare-metal servers, inside VMs, as dedicated hosts, or on-prem/BYOC.
Anything you deploy on Unikraft runs inside its own hardware-level isolated microVM. When idle, the platform transparently scales instances to zero: no memory, no CPU, no cost. When the next request comes in, the platform wakes the instance back up in milliseconds, so end users think every service is running all the time, when in the back end most of them aren't. From the outside, it looks like always-on infrastructure. You're only paying for what's actually active. And the platform can pack hundreds of thousands of such instances onto a single server.
1. An ultra-efficient cloud stack. Every instance runs as a minimal VM built for speed and security. These VMs include only what your application actually needs: no unused packages, no general-purpose OS overhead, and no bloat. At the core sits a heavily modified version of the Firecracker VMM, custom-optimized for ultra-fast instance starts. Getting a VMM to boot a fully isolated VM in single-digit milliseconds is years of engineering work that can't be shortcut. The resulting images are typically 10x smaller than standard containers.
2. A reactive network layer built from first principles. Traditional cloud platforms react to traffic changes on the order of minutes. Unikraft built a custom proxy and controller from scratch that react to incoming requests in real time. When a request hits the platform, the proxy buffers it, the controller checks whether an instance is active, and if not, cold-boots one instantly. The moment the instance is ready, the proxy forwards the request. After an idle timeout, the instance scales back to zero, and its state is preserved so the next request resumes exactly where it left off. The platform can scale thousands of instances simultaneously, all within milliseconds. Building a network layer that operates at this speed and reliability takes years and world-class distributed systems expertise.
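To make the mechanics tangible, here is a minimal sketch of that control loop in Go. It's our illustration of the pattern, not Unikraft's actual code; the boot and suspend calls, the instance address, and the 30-second timeout are all stand-ins:

```go
package main

import (
	"net/http"
	"net/http/httputil"
	"net/url"
	"sync"
	"time"
)

// controller tracks a single instance and scales it to zero when idle.
type controller struct {
	mu      sync.Mutex
	running bool
	idle    *time.Timer
}

// ensure returns the address of a running instance, cold-booting
// (or resuming from snapshot) if it was scaled to zero.
func (c *controller) ensure() *url.URL {
	c.mu.Lock()
	defer c.mu.Unlock()
	if !c.running {
		bootFromSnapshot() // stand-in for the platform's VMM call
		c.running = true
	}
	// Every request resets the idle timeout.
	if c.idle != nil {
		c.idle.Stop()
	}
	c.idle = time.AfterFunc(30*time.Second, func() {
		c.mu.Lock()
		defer c.mu.Unlock()
		suspendToSnapshot() // scale to zero, preserving state
		c.running = false
	})
	addr, _ := url.Parse("http://10.0.0.2:8080") // example instance address
	return addr
}

func bootFromSnapshot()  { /* platform-specific; milliseconds on Unikraft */ }
func suspendToSnapshot() { /* platform-specific */ }

func main() {
	ctrl := &controller{}
	// The client connection simply waits while ensure() boots the
	// instance; the proxy then forwards the buffered request.
	http.ListenAndServe(":8443", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		httputil.NewSingleHostReverseProxy(ctrl.ensure()).ServeHTTP(w, r)
	}))
}
```

The hard part isn't this loop. It's making the boot path finish in single-digit milliseconds, under load, across thousands of concurrent wakes.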
3. Lightning-fast app startup through snapshotting. The platform can pre-initialize any application, capture its fully initialized state as a memory snapshot, and then scale it to zero. When the next request arrives, the instance resumes from that snapshot as if it never stopped. This works for heavy, stateful applications, not just lightweight functions: databases, AI agents with context, and full development environments.
Beyond basic snapshots, the platform supports forking (spawning multiple new instances from a single shared initialized state), continuous snapshots (checkpointing state on an ongoing basis so nothing is ever lost), and instance templates (pre-built snapshots of complex environments that can be launched on demand).
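Here's roughly how those primitives compose from a user's point of view. The API below is hypothetical, written for illustration only; the type and function names are ours, not Unikraft's SDK:

```go
package main

import "fmt"

// Hypothetical types for illustration; not Unikraft's actual API.
type Instance struct{ ID string }
type Snapshot struct{ ID string }

// Capture freezes a fully initialized instance into a snapshot,
// after which the instance can scale to zero.
func (i *Instance) Capture() Snapshot { return Snapshot{ID: i.ID + "-snap"} }

// Resume restores an instance from a snapshot as if it never stopped.
func Resume(s Snapshot) *Instance { return &Instance{ID: s.ID + "-live"} }

// Fork spawns n instances from one shared initialized state:
// pay the initialization cost once, fan out many times.
func Fork(s Snapshot, n int) []*Instance {
	out := make([]*Instance, n)
	for k := range out {
		out[k] = &Instance{ID: fmt.Sprintf("%s-fork-%d", s.ID, k)}
	}
	return out
}

func main() {
	// Template pattern: initialize a heavy environment once
	// (say, a browser sandbox with all dependencies loaded)...
	base := &Instance{ID: "agent-sandbox"}
	tmpl := base.Capture()

	// ...then launch a thousand ready-to-go copies on demand,
	// skipping initialization entirely.
	sandboxes := Fork(tmpl, 1000)
	fmt.Println("launched", len(sandboxes), "sandboxes from one snapshot")

	// A continuous-snapshot setup would checkpoint state on an
	// ongoing basis, so Resume always picks up the latest state.
	_ = Resume(tmpl)
}
```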
The range of workloads Unikraft supports is broad: AI agent sandboxes, headless browsers for scraping and automation, serverless functions, serverless databases, build and test environments, ETL and data pipelines, remote IDEs, API gateways, and game servers. Ultimately, anything you can define in a Dockerfile, Unikraft can run. It doesn't matter how large or heavy the environment is.
Over the last few years, we have evaluated several hundred software infrastructure companies, and we have never seen performance benchmarks like Unikraft's. They sound like they shouldn't be possible, but they are. And these aren't synthetic tests in controlled environments. They're production numbers from Unikraft's customers running real workloads at scale:

These numbers are orders of magnitude better than what the market can currently offer. It's rare to see such a paradigm shift in one of the biggest industries in the world happening right in front of you. A few of these deserve a closer look because they don't just represent incremental improvements. They change what's technically and economically possible:

What makes these numbers truly remarkable is that they normally exist in direct tension with each other. In cloud infrastructure, you historically had to choose: fast startup or strong isolation, high density or full VM security, developer convenience or raw performance. Every platform before Unikraft has delivered one or two of these at the expense of the others. Unikraft delivers all of them simultaneously, with virtually no trade-offs. That is the result of a fundamentally different architecture, and in our view, it's the strongest technical foundation we've seen for what the next generation of cloud infrastructure needs to look like.
The benchmarks above tell you what Unikraft can do. But the more convincing evidence comes from watching what happens when sophisticated engineering teams actually bet their products on it. Two of the best public examples are Browser Use and Prisma: an AI agent platform and a managed Postgres service, two completely different products that ran into the same infrastructure wall and arrived at the same answer.
Browser Use, one of the open-source companies we highlighted in Sifted, runs millions of web agents on Unikraft. Their CTO Larsen Cundric published a detailed writeup of their migration from AWS Lambda to Unikraft microVMs. Once their agents needed to execute arbitrary code, Lambda broke down: the agent loop shared a backend with the REST API, a redeploy killed every running agent, and code execution could theoretically access host credentials. Now each agent session runs in its own Unikraft microVM on dedicated bare metal inside AWS, seeing exactly three environment variables. No AWS keys, no database credentials, no API tokens. They run the exact same Docker image locally, in evals, and in production, with a single config flag switching between environments.
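That last detail, one image with a single flag across local, evals, and production, is a pattern worth sketching. The variable names below are hypothetical stand-ins, not the actual three from Browser Use's setup; the shape is the point:

```go
package main

import (
	"fmt"
	"os"
)

func main() {
	// Hypothetical names: the sandbox sees only what one agent
	// session strictly needs. No cloud keys, no DB credentials,
	// no API tokens ever enter the VM.
	deployEnv := os.Getenv("DEPLOY_ENV") // single flag: "local" | "eval" | "prod"
	sessionID := os.Getenv("SESSION_ID")
	gateway := os.Getenv("GATEWAY_URL") // results flow back through a proxy

	// The same Docker image runs everywhere; only the flag changes
	// behavior, so local runs and evals exercise the production path.
	switch deployEnv {
	case "local", "eval":
		fmt.Printf("session %s: verbose logging, reporting to %s\n", sessionID, gateway)
	default: // "prod"
		fmt.Printf("session %s: hardened mode, reporting to %s\n", sessionID, gateway)
	}
}
```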
Two details from their architecture are worth pulling out because they're the kind of thing you only get from a platform designed for this specifically:
The business result: Browser Use doubled their contract size within six months of starting on Unikraft. The platform didn't just solve a technical problem. It gave them the isolation model, the cost structure, and the operational simplicity they needed to scale from thousands of agents to millions.
Prisma went even further: they rebuilt their entire serverless Postgres product on Unikraft. Their architecture post is effectively a long explanation of why traditional cloud infrastructure made their product economically impossible. Serverless database providers typically either resell AWS infrastructure or run container-based orchestration with Kubernetes. Both approaches mean each database instance consumes significant resources even when idle, and the cost of maintaining millions of free-tier databases at that overhead is unsustainable. This is exactly why providers like PlanetScale and Heroku shut down their free tiers.
Prisma's answer was to rebuild the stack from first principles on Unikraft. Rather than reselling hyperscaler infrastructure, they lease bare-metal servers in data centers globally and run Postgres inside full-fledged, Linux-based microVMs.
The server density this unlocks is unheard of in traditional architectures. Prisma didn't just adopt Unikraft as a vendor. They rebuilt their flagship product on it because nothing else could support the business model they wanted to build.
We believe Unikraft is at an inflection point. The technology is proven. The first wave of customers has validated it in production. The expansion dynamics are strong. What comes next is the transition from early adopters to industry standard, and that's where the real scale begins.
The timing for that transition could not be better. AI workloads are exposing the limits of legacy cloud infrastructure at exactly the moment Unikraft's product is mature enough to absorb them. The market is enormous: a trillion-dollar cloud industry growing at 16% annually, with the fastest-growing segment, AI-native workloads, being precisely the one where legacy infrastructure fails most dramatically. And the deployment flexibility is already there, meeting customers on bare metal, inside hyperscaler VMs, or on-prem, which means there's no architectural reason Unikraft can't be everywhere these workloads run.
Infrastructure shifts happen roughly once a decade. Physical servers gave way to virtual machines. VMs gave way to containers. Containers gave way to serverless. The next shift, to something that combines the isolation of VMs, the speed of functions, and the density required for AI-scale workloads, is happening right now.