The Context Gap
The real bottleneck in enterprise AI isn't intelligence. It's everything around it.
I had a client in manufacturing where the entire project hinged on creating sales orders in their ERP. The agent that could read and process those orders was ready in days. But the ERP was completely custom, built by a single developer over more than a decade, with no API, no documentation, and no obvious way in.
The client was still in touch with the original builder, so they shared the contact and I got on a call with the guy. Nothing dramatic, but every piece of logic, every business rule, every edge case in that system lived in one person’s head. If he’d moved on or been unreachable, there was no project. The AI was perfectly capable. There was just no door to walk through.
The agent works. The system it needs to talk to does not. These systems were never designed to be talked to by an agent. They were designed for a person who already knows how everything works.
The distance between what an agent can reason about and what it can actually access and act on is the context gap. Right now, it’s the single biggest bottleneck in enterprise AI.
Demos are easy. Integrations are not.
Building an agent that can read a sales order and extract line items is a weekend project. Building one that can take that sales order, match it against the right customer in an ERP, check inventory across three warehouses, apply the correct pricing tier, and book it into the accounting system with proper GL codes? That’s months of work. And most of that time isn’t spent on the agent itself. It’s spent getting access to the systems around it.
Another client, a wholesaler, ran an old on-premise version of Business Central. Getting a connection required weeks of back-and-forth with their implementation partner over firewall restrictions, whitelisted IPs, and specific port configurations. Once we finally had access, we discovered the system had been customised with internal actions and triggers that nobody fully understood. We’d call a seemingly straightforward endpoint and something unexpected would fire in the background. Weeks were spent mapping undocumented behaviour that only revealed itself through trial and error.
This is the reality across almost every deployment. APIs are undocumented or don’t exist. Every customer’s setup is customised in ways that generic documentation doesn’t cover. And the people who originally configured the system have often moved on.
System archaeology
When documentation doesn’t exist, you discover the system yourself. I’ve started thinking of this as system archaeology. Software archaeology is an established concept, but what forward-deployed engineers (FDEs) do in the field is a specific version of it.
In practice, it looks something like this. You hit an endpoint that returns all available objects. You fetch a thousand of them and immediately you’re looking at field names that only make sense if you’ve worked in that specific industry for a decade. This is where an LLM becomes genuinely useful. You feed it the response and it helps you interpret the data, connect the dots between cryptic field names and domain concepts, and start mapping the structure of what you’re looking at. You filter by type until you find the ones you actually need, often going back and forth with the model to make sense of what each type represents in the context of the client’s business.
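That first pass, pulling everything and grouping it by type so the model has something digestible to reason about, can be sketched in a few lines. This is only an illustration: the field name `objectType` and the per-type sample size are assumptions, and every real system names these things differently.

```python
from collections import defaultdict

def sample_by_type(objects, type_key="objectType", per_type=3):
    """Group raw records by their type field and keep a handful of each,
    small enough to paste into an LLM prompt for interpretation."""
    samples = defaultdict(list)
    for obj in objects:
        # Cryptic or missing type fields still get a bucket to inspect.
        t = obj.get(type_key, "unknown")
        if len(samples[t]) < per_type:
            samples[t].append(obj)
    return dict(samples)
```

From there, each bucket goes to the model with a question like "what might this type represent in this client's business?", and you filter down to the types you actually need.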
Then you need to learn how to create that type. So you start with the absolute minimum number of fields and try to post an object. It fails. The LLM helps you build the next request and interpret the error in context. You add fields, try again. Eventually it succeeds, and you check what you’re missing compared to what a properly created object should look like. You iterate until the object looks right.
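That add-a-field-and-retry loop is mechanical enough to write down. A minimal sketch, assuming the endpoint is wrapped in a callable that returns success plus an error message; the field names in the usage below are invented, and in practice the LLM suggests which candidate field the error is pointing at.

```python
def probe_required_fields(post, candidate_fields, base_payload=None):
    """Start from a minimal payload and add candidate fields one at a
    time until the endpoint accepts the object. `post` is any callable
    taking a payload dict and returning (ok, error_message); in practice
    it wraps an HTTP POST to the system under investigation."""
    payload = dict(base_payload or {})
    trail = []  # keep the failure trail: it documents the system as you go
    for field, value in candidate_fields:
        ok, err = post(payload)
        if ok:
            return payload, trail
        trail.append((dict(payload), err))
        payload[field] = value  # add the next guess and retry
    ok, err = post(payload)
    if ok:
        return payload, trail
    return None, trail + [(dict(payload), err)]
```

The failure trail is worth keeping: it is often the only documentation of which fields the system actually requires.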
That gets you basic reads and writes, and through most of that process the LLM is a genuine partner. But the real complexity shows up when you need to figure out how to perform actions. Creating an object is one thing. Triggering the right downstream behaviour, posting it to a ledger, moving it through a status chain, firing the correct workflow, is another entirely. At that point the LLM can only really help you verify what happened, and verification itself is hard. You’re checking whether the system ended up in the right state, often across multiple tables and processes, with no specification to compare against. That’s where the undocumented behaviour lives. That’s where the system does things nobody told you about, because nobody remembered it did them.
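One way to make that verification tractable is to snapshot the tables an action is supposed to touch, before and after, and diff the result against what you expected. The sketch below assumes snapshots are plain lists of row dicts per table; the point is the third return value, the surprise effects nobody told you about.

```python
def diff_state(before, after, expected_changes):
    """Compare table snapshots taken before and after an action against
    the changes we expected, surfacing both missing effects and surprise
    ones -- the undocumented triggers that fire in the background."""
    observed = {}
    for table in before:
        added = [row for row in after.get(table, []) if row not in before[table]]
        if added:
            observed[table] = added
    # Expected changes that never showed up in any table.
    missing = {t: rows for t, rows in expected_changes.items() if t not in observed}
    # Changes in tables the action was not supposed to touch.
    surprises = {t: rows for t, rows in observed.items() if t not in expected_changes}
    return observed, missing, surprises
```

With no specification to compare against, `expected_changes` starts as a guess and gets corrected run by run, which is exactly the archaeology.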
Fetch. Interpret. Request. Validate. Repeat.
It’s painstaking work, but it’s the work that actually gets an agent into production, and no amount of model improvement will eliminate it. The problem isn’t the agent’s reasoning. The problem is that the context the agent needs is locked behind firewalls, buried in legacy databases, or sitting in the head of an operator who’s been doing the job for fifteen years.
Where this leaves us
Companies have access to the most capable AI models ever built, and most of them are still running on spreadsheets, disconnected tools, and manual processes. The gap between what the models can do and what the systems around them allow remains enormous, and nobody has done the work to close it.
I think the instinct for a lot of companies right now is to look at AI as a way to leapfrog the infrastructure problems they’ve been ignoring, to automate the messy process and let the agent figure it out. But AI is not a magic cure for undermaintained systems. If anything, it makes the mess more visible. The moment you try to connect an agent to a process, you discover exactly how undocumented, fragile, and human-dependent that process really is.
The organisations that get real value from AI will be the ones willing to do the unglamorous work first: mapping the processes, connecting the systems, documenting what was never documented, and building the doors before expecting agents to walk through them.
The question I keep coming back to is simple. How many companies are prepared to do that work? And what happens to the ones that aren’t?