Wrapped Too Tight

Why architectural decisions age faster than the projects that produce them

May 26, 2026

The coating project

We are working on a project in aerospace right now. The client receives metal parts that need to be coated according to a long list of industry specifications. Each part has its own attributes. The job is, given a part and the relevant specifications, to produce the exact sequence of steps to coat it correctly. A human engineer verifies the output before it lands in a system that walks factory workers through the steps on the shop floor.

I built it as a workflow.

Collection stage pulls the part attributes and the library of relevant specifications. Transformation stage does the structured extraction, the cross-referencing, the validation. Decision stage produces the ordered steps that get routed for human review. There are LLM calls at specific points where the work demands them. Everything else is deterministic.

It works. The client is happy with it. But sitting with it now, I think it probably should have been an agent.

The reasoning step is where the actual work happens. The combinations of part attributes and applicable specifications are too varied to enumerate, so the matching requires real judgement. I built deterministic plumbing around an LLM call that was making the only genuinely hard decision in the system, and then I added retrieval and validation layers because I wanted to verify each step in isolation. The result is more rigid than the underlying work needs to be.

An agent version would have given the model the tools to query specifications when it needed them and the room to iterate on partial matches. It would have been harder to verify. It would have shipped slower. And it would probably have been a better fit for the shape of the problem.

The direction of my errors

This is not an isolated case. If I am being honest about the direction of my errors, I have built workflows that should have been agents. I have not built agents that should have been workflows.

The cause is conservative bias. When I started building these systems, the models were less reliable, and every additional degree of freedom felt like risk. So I leaned toward deterministic flows with LLM calls at very specific points where I could verify each step. The systems shipped, the systems work, but some of them are more rigid than the problems they were solving.

I think this is also the bias I see most often in other practitioners. Engineers tend to over-constrain. We trust deterministic code more than probabilistic reasoning, and we reach for the workflow because it is easier to debug, easier to explain, and easier to defend in a postmortem.

That bias is becoming out of date faster than I would like to admit. The models today are reliable enough that some of the determinism I wrapped around them is doing more harm than good. It adds latency, it limits flexibility, and it papers over the fact that the genuinely hard part of the system is the reasoning step.

What changes now

If a coating project landed in my queue today, I would still draw the same shape. Collect the attributes, gather the relevant specifications, produce ordered steps, hand to a human. That part does not change.

What changes is how much determinism I would wrap around the reasoning stage. The question I would ask first is whether the work inside each stage is actually deterministic, or whether it requires reasoning over input I cannot fully anticipate. For the matching stage on coating, the answer is reasoning. That should have told me to give the model more freedom, not less.

The threshold for what counts as reasoning territory moves with the model. The systems I built when the models were less capable were workflows because I did not trust the models with more freedom. Today I would build some of them differently. The pace of model improvement now means architectural decisions in AI have a shorter shelf life than the projects that produced them. The build always has a sequel, and the sequel arrives sooner than it used to.

Whui-Mei Yeo

Jun 2

Sounds like a problem I have recently been dealing with. My build was a Slack interface to the CRM, to ease the mental hurdle for the human staff to use regularly. My first version which I YOLO-ed led to the AI coding agent assigning firstly 6 tasks, then a second version added 10 more (total 16) to the AI agent node (classify, match, decide, evaluate on multi-turns). The first version was strict and required the human to give instructions in a certain way - too robotic. The second version was an attempt to make it less deterministic and it ended up not responding reliably during some of my eval re-runs. Not good. For my own personal learning I refactored to limit the AI agent to 3 tasks and handle the rest in code but am seeing it doesn't handle ambiguous multi-turn responses as it doesn't have a brain to figure out how all the pieces from each turn are connected. Haven't figured out what to do next.

Discussion about this post

Ready for more?