When AI Is a Commodity
Capability is commoditising. Lock-in is the next move.
This week OpenAI announced a $4 billion vehicle called the OpenAI Deployment Company, designed to put OpenAI engineers inside enterprise customers to help them, in the company’s words, “move from pilot to production.” McKinsey, Bain & Company, Capgemini, and BBVA are among the founding investors, and OpenAI is acquiring a London-based consultancy called Tomoro to staff it with roughly 150 forward deployed engineers (FDEs) from day one. Anthropic announced a similar $1.5 billion vehicle the week before. The official framing across both is straightforward. Model performance is no longer the bottleneck. Deployment is. The labs are stepping up to close the gap.
That framing is correct as far as it goes, but it’s also a tell. Frontier model capability has started to look an awful lot like a commodity, and the labs know it. Lock-in is the next move, and a multi-billion-dollar FDE army inside customer environments is the cleanest way to get it.
Capability is commoditising
There are at least half a dozen frontier-class models now, from OpenAI, Anthropic, Google, Meta, DeepSeek, Alibaba (Qwen), Moonshot (Kimi), and a few others, that swap places on benchmarks every few months. They’re all good. The marginal difference between the best model on a given task and the second-best is usually within the range of what a slightly better prompt or a slightly better retrieval pipeline would close anyway.
The economics tell the same story. Inference costs for equivalent quality are falling at something close to an order of magnitude per year. Anthropic’s gross margins came down to around 40% in 2025, from a projected 50%, and break-even isn’t expected before 2028. None of the frontier labs are profitable yet, and the unit economics on their consumer subscriptions are visibly under strain.
The Claude Code regression saga earlier this year is the cleanest community signal of where this is heading. Anthropic eventually traced the issue to three product-layer changes affecting Claude Code, the Agent SDK, and Cowork, while the underlying API was unaffected. Despite the post-mortem explanation, for me, Opus 4.6 was the peak, and the experience has felt downhill ever since Opus 4.7 shipped. The point is that we don’t actually know what we’re being served on any given day. The wrapper changed, and the people paying for it couldn’t tell. That isn’t unique to Anthropic. It’s the structural condition of paying a subscription for a service whose internals you can’t see.
When capability stops differentiating, and customers can’t fully trust that the thing they bought yesterday is the thing they’re getting today, the strategy has to shift. The play becomes embedding so deep in customer workflows that switching becomes painful even when it looks economically obvious. Microsoft did this for thirty years. The model labs are doing it on a compressed timeline.
Most enterprise AI work doesn’t need a frontier model
This is the part that doesn’t get said often enough. Most of what we actually deploy at enterprise clients isn’t frontier work. It’s classifying inbound emails into one of four categories. Extracting line items from invoices and matching them against purchase orders. Drafting first-pass replies to support tickets. Summarising call transcripts into action items. Routing tasks between systems. Reading a sales order and turning it into the right ERP transaction.
Almost none of this requires Opus 4.7 or GPT-5.5. A 7B to 70B open-weight model handles all of it well enough, and the gap between “well enough” and “best in class” usually shows up nowhere except in a benchmark sheet. The frontier earns its keep on a small slice of genuinely hard problems: novel agentic workflows, complex multi-step reasoning, edge cases that haven’t been seen in training, low-resource languages, long-context retrieval where every percentage point of accuracy compounds. In my work at Lleverage, across logistics, wholesale, manufacturing, insurance, and finance, that hard slice is maybe 5% of what actually lands in production. The other 95% is workhorse stuff that an open-weight model trained six months ago handles without anyone noticing.
I’ve been running Qwen, Gemma, and Mistral on my own hardware over the last few months. Not in production for any client yet, but enough to form a view. For the kinds of tasks I’d typically reach for a frontier API to do, they’re not quite as good, but they are good enough for the work most enterprises actually need done. Good enough is the only threshold that matters in production, and the gap to good enough closes faster every quarter.
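The classification-style tasks above can be sketched in a few lines. This is a hypothetical illustration, not production code: it assumes a local OpenAI-compatible chat endpoint (servers like Ollama and vLLM expose one), and the endpoint URL, model name, and category labels are all made up for the example. The useful pattern is constraining the model to a fixed label set and parsing defensively, so the surrounding system never depends on which small model is behind the endpoint.

```python
# Sketch: routing an inbound email to one of four categories with a small
# local open-weight model. The endpoint, model name, and categories are
# illustrative assumptions, not anything from a specific deployment.
import json
import urllib.request

CATEGORIES = ["order", "invoice_query", "complaint", "other"]

def build_prompt(email_text: str) -> str:
    # Constrain the model to a fixed label set so the output is parseable.
    return (
        "Classify the email below into exactly one of these categories: "
        + ", ".join(CATEGORIES)
        + ". Reply with the category name only.\n\nEmail:\n"
        + email_text
    )

def parse_label(raw: str) -> str:
    # Defensive parsing: small models occasionally add punctuation or casing.
    cleaned = raw.strip().lower().rstrip(".")
    return cleaned if cleaned in CATEGORIES else "other"

def classify(email_text: str, complete=None) -> str:
    """`complete` is a callable prompt -> str. The default calls a local
    OpenAI-compatible /v1/chat/completions endpoint (URL is an assumption)."""
    if complete is None:
        def complete(prompt: str) -> str:
            body = json.dumps({
                "model": "qwen2.5:7b",  # any local open-weight model
                "messages": [{"role": "user", "content": prompt}],
                "temperature": 0,
            }).encode()
            req = urllib.request.Request(
                "http://localhost:11434/v1/chat/completions",
                data=body, headers={"Content-Type": "application/json"},
            )
            with urllib.request.urlopen(req) as resp:
                return json.load(resp)["choices"][0]["message"]["content"]
    return parse_label(complete(build_prompt(email_text)))
```

Because the model call is injected, the same routing logic runs unchanged whether the label comes from a 7B model on a workstation or a frontier API during a fallback.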
A return to owning the machine
There’s a useful parallel here to the first wave of enterprise computing in the 1970s. The trailblazers were the companies that bought IBM mainframes when nobody else did, brought the machine in-house, built applications around it, and accepted the operational burden of owning the silicon, the workloads, and the upgrades. That model gave way to client-server, then to the cloud, where compute became something you rent. For two decades, owning the machine felt like an artefact of an older era.
I think AI is going to bend that line back. The companies that take this transformation seriously will, increasingly, buy ready-made machines for running LLMs and treat them as a core piece of their infrastructure, the way an earlier generation treated their first AS/400 or their first VMware cluster. Not for everything. But for the routine workloads that already touch sensitive data and don’t need a frontier model in the first place. The hardware exists in 2026 in a way it didn’t a year ago: NVIDIA’s DGX Spark, Apple’s M-series workstations, Lenovo’s ThinkStation PGX, AMD’s MI-series, and a handful of other options have made on-premises inference economical. A quad DGX Spark setup runs roughly $18,000 to $20,000 and pays for itself in three to seven months for any team whose API spend regularly exceeds $3,000 to $5,000 a month.
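The payback claim is simple arithmetic, and it's worth making the hedge explicit: only the share of API spend that actually moves on-prem counts toward the saving, since the hard 5% of workloads stays on a frontier API. A back-of-envelope sketch, using the figures from the paragraph above:

```python
# Back-of-envelope payback for on-prem inference hardware, using the
# figures above: an $18-20k quad DGX Spark setup against the $3-5k/month
# of API spend it would displace. `displaced_fraction` hedges for the
# share of spend that actually moves on-prem.
def payback_months(hardware_cost: float, monthly_api_spend: float,
                   displaced_fraction: float = 1.0) -> float:
    """Months until cumulative displaced API spend equals hardware cost."""
    monthly_saving = monthly_api_spend * displaced_fraction
    return hardware_cost / monthly_saving

# Best case: $18k hardware, $5k/month fully displaced -> 3.6 months.
best = payback_months(18_000, 5_000)
# Worst case: $20k hardware, $3k/month fully displaced -> ~6.7 months.
worst = payback_months(20_000, 3_000)
```

Note that power, cooling, and the engineer-hours to run the box aren't in this sketch; they lengthen the payback, but not by enough to change the conclusion at these spend levels.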
The interesting consequence is that once you own the machine, the model running on it becomes a choice rather than a dependency. You can swap Qwen for Mistral for Gemma without re-implementing anything around it. The vendor relationship moves from the model layer down to the silicon, and silicon competition is, historically, a much healthier kind of competition than software lock-in.
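The swap-without-re-implementation point can be made concrete. When every local model sits behind the same OpenAI-compatible chat API, the model becomes one field in a config entry; everything around it (prompting, parsing, retries) is untouched. A minimal sketch, where the model names and local endpoint are illustrative assumptions:

```python
# Sketch: model choice as configuration. Swapping Qwen for Mistral for
# Gemma changes one config entry, not the surrounding code. Model names
# and the local endpoint URL are illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class ModelConfig:
    name: str
    base_url: str = "http://localhost:11434/v1"  # assumed local server

MODELS = {
    "qwen": ModelConfig("qwen2.5:14b"),
    "mistral": ModelConfig("mistral-small"),
    "gemma": ModelConfig("gemma2:27b"),
}

def chat_request(model_key: str, prompt: str):
    """Build the URL and payload for an OpenAI-compatible chat call.
    Only the `model` field varies between backends."""
    cfg = MODELS[model_key]
    payload = {
        "model": cfg.name,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
    }
    return cfg.base_url + "/chat/completions", payload
```

The same decoupling is exactly what a lab-employed FDE has no incentive to build, which is the lock-in point in miniature.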
The questions clients have started asking line up with this shift, even when nobody at the table uses these words. Where does the inference actually happen, on what continent, under what jurisdiction? Whose audit logs can you request? Can anyone promise the model isn’t being retrained on the data you’re sending it? A year ago these questions came mostly from engineering teams curious about latency. Now they come from procurement and compliance, and they tend to surface around the IT security review, when somebody finally reads the fine print. The regulatory direction of travel reinforces the same shift. The EU AI Act’s high-risk obligations land in August 2026. France and Germany have launched a sovereign AI initiative with Mistral and SAP. The UK established a Sovereign AI Unit with up to £500 million of funding in mid-2025. The questions aren’t an early signal anymore. They’re the direction the rules are moving in.
Who actually does the wiring?
This is what explains why a $4 billion deployment company makes strategic sense even when frontier capability is commoditising. The bottleneck in enterprise AI isn’t the model. It’s everything around the model. The ERP that has no API. The operator whose tacit knowledge is the only documentation of how the process actually runs. The firewall rules that need re-negotiating with an implementation partner who left the client three years ago. The compliance team that needs to sign off on what data leaves which jurisdiction. The change management work that gets a sceptical operator to trust an agent’s output for the first time.
That’s the work. It’s what I’ve written about in almost every post on this newsletter. It’s unglamorous, slow, deeply context-dependent, and impossible to automate away. And it’s exactly what FDEs do. The reason OpenAI is willing to pay $4 billion to staff up an in-house FDE army is the same reason every AI company serious about enterprise revenue has been quietly building FDE teams for the last three years. The model is becoming the cheap part. Wiring it into a working system isn’t. The lock-in critique still applies, of course. Once an OpenAI engineer has spent six months inside a customer’s environment, mapping systems and building agents tightly coupled to OpenAI’s specific APIs, switching to a different lab stops being a model swap and becomes a re-implementation. That’s by design. It’s the same playbook every enterprise software vendor has run since SAP.
Which surfaces the bigger question. Who actually executes this transformation, across all these companies, globally?
Every mid-sized manufacturer, wholesaler, insurer, and logistics operator on the planet has to go through their version of what their bigger counterparts went through with the move from mainframes to client-server, and then again from client-server to cloud. The big consultancies took the bulk of those last two waves. McKinsey, Bain & Company, and Capgemini being on the OpenAI Deployment Company cap table suggests they intend to take this one too. They have the relationships, the brand, and the procurement preference baked into thirty years of enterprise IT decision-making.
The trouble with that answer is that the work this time is more technical, more bespoke, and more dependent on the kind of context that doesn’t transfer well from one consultant’s slide deck to another. It’s FDE work. And there aren’t anywhere near enough FDEs on the planet to do it for every company that needs it. The OpenAI Deployment Company is starting with 150 engineers. Anthropic’s vehicle will scale similarly. McKinsey can throw people at projects, but the gap between a generalist consultant who knows the AI talking points and someone who can actually wire an agent into a customised ERP is real, and it’s a gap that doesn’t close in a quarter.
So who actually does it? The labs are betting on their in-house FDE armies, the consultancies on the relationships and bench they've built over thirty years, and the market is probably bigger than both put together. The question I'm sitting with isn't who wins. It's who gets the work done for the companies neither side is structured to serve.




Great take, and I agree with your assessment. I think the longer-term play for the model owners (realizing they are absolutely going to be fully commoditized) is using these armies of FDEs that they own to build vertical-specific skills, products, and agents. Anthropic did the v1 of this with the Finance Agents they just launched.
It's not just vendor lock-in from the models themselves; customers also don't want vendor lock-in from AI-native companies who are currently trying to "own the workflow". These armies of FDEs directly from OpenAI/Anthropic are going to be purpose-built to reinforce all their learnings on how to weed through the messy enterprise noise, in order to own the workflow as well.
I still can't decide whether Anthropic/OpenAI will go out and buy vertical-specific agentic companies. Maybe for customer logos, pre-built FDE teams, and perhaps domain knowledge... but not from a technology perspective.