Agent Harness Series — Part 1 of 5

Are Agent Harnesses an Enterprise Architecture Problem?

2025 was the year of agents. 2026 is the year of agent harnesses. But the conversation so far is missing the point: what about Enterprise Architecture?

If you've ever walked through a beautiful European city you realise the power of great urban planning. Everything is a short walk away, streets flow into communal parks, and it all just seems to work. In contrast, many modern cities show the result of uncoordinated, incoherent development, and are still untangling decisions made without an architectural frame.

That's what's happening right now with AI agent harnesses. AI agents — software systems that can independently plan, make decisions, and take actions to complete tasks — are being deployed across enterprises at pace. The developer community has discovered that wrapping these agents in structured controls — prompt management, tool orchestration, error recovery, human-in-the-loop approvals — dramatically improves reliability. They're calling it "harness engineering," and the conversation is moving fast. Anthropic, OpenAI, and Martin Fowler's team at Thoughtworks have all published frameworks in the last fortnight. Meta paid US$2 billion for Manus, a company whose core product is, technically, a wrapper around someone else's models. The formula everyone is converging on is simple: Agent = Model + Harness.

However, the entire conversation stops at the developer's laptop. And that should concern every enterprise architect, CTO, and COO reading this.

The enterprise gap

The current harness engineering discourse focuses on making individual agents more reliable: better error recovery, smarter context management, structured evaluation loops. These are genuine engineering advances. However, they address only one dimension of a much larger problem.

When agents operate inside an enterprise, they operate inside an existing web of controls, obligations, and architectural standards that the current frameworks ignore entirely. Four concerns stand out.

Risk and compliance. Every enterprise maintains a risk framework, whether that's ISO 31000, COSO, or a bespoke regime shaped by industry regulation. When an agent makes a decision — selecting a library, choosing a data model, modifying an API contract — that decision carries risk. Today's agent harnesses have no concept of organisational risk appetite, no way to classify decisions by impact tier, and no mechanism to route high-impact choices to the right approval authority. The agent is productive, but the risk classification that should accompany every significant decision simply isn't happening.

Data governance and lineage. Agents consume data, transform data, and generate new data artefacts. In any organisation with data governance obligations — and that's most of them — every transformation should be traceable: who did what, when, using what source, under what authority. Current harnesses don't capture this lineage in a form that integrates with existing data governance frameworks. The agent's work happens in a lineage blind spot.

Enterprise architecture standards. Most organisations maintain architectural principles, technology standards, and pattern libraries that guide how systems are built. Agents don't consult these. A coding agent will happily introduce a new framework, create a redundant service, or deviate from established integration patterns, not because it's making a bad decision, but because it has no access to the organisational context that would make it a good one.

Change management and audit. Risk classification tells you whether a decision needs scrutiny. Audit tells you what happened. When a human architect makes a significant decision, there's typically a record: an architecture decision record, a change request, a design review. When an agent makes an equivalent decision during a seven-hour autonomous session, the decision is buried in a conversation transcript that nobody will read. The structured, queryable audit trail that enterprise governance depends on simply doesn't exist.

Why better models won't solve this

There's an active debate in the AI community right now about whether agent harnesses are permanent infrastructure or temporary scaffolding. The argument, sometimes called "the bitter lesson," goes like this: every time engineers build clever hand-crafted systems around AI, the next generation of models simply absorbs that engineering and makes it redundant. Features that required elaborate orchestration last year are handled natively by this year's models. If this trend continues, the harness gets thinner and eventually disappears, and investing heavily in harness infrastructure is a waste.

For developer-facing concerns like retry logic, context management, and error recovery, this argument has real merit. Models will continue to get better at these things. However, it breaks down completely at the enterprise governance layer, for a straightforward reason: every organisation does governance differently.

Yes, future models will get better at understanding governance rules and even following organisational conventions. However, there's a critical distinction between a model that can understand your governance requirements and an infrastructure layer that enforces them within your organisational workflows. An agent that knows your risk framework is not the same as a harness that routes high-impact decisions to the right approval authority, captures structured audit records, and feeds lineage data into your existing data governance platform. One is knowledge, the other is process, and process is an architecture problem.

Your risk appetite is yours. Your compliance obligations are shaped by your industry, your jurisdiction, and your regulatory history. Your architectural principles reflect hard-won lessons about what works in your specific technology landscape. Your data classification policies, your change approval workflows, your audit requirements — these are organisational decisions, not technical ones. No model improvement will absorb them because they aren't general knowledge. They're your knowledge, and the systems that enforce them are your architecture.

This is why the agent harness is fundamentally an enterprise architecture concern. The harness is where standardised AI capability meets bespoke organisational reality. It's the integration point between what the agent can do and what the organisation permits, expects, and needs to know about. Getting this layer right is an architecture problem, not a model problem and not a developer tooling problem.

What enterprise-grade should look like

If we accept that the agent harness is an EA concern, what should an enterprise-grade harness actually deliver? I think four capabilities matter beyond basic developer reliability.

Provenance and audit trails. Every agent decision, every artefact generated, every tool invoked should produce a governance record that maps to existing audit frameworks. Not a conversation log, but a structured, queryable record of what was decided, why, what alternatives were considered, and what the downstream impact is. This is the foundation everything else builds on.
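To make "structured and queryable" concrete, here is a minimal sketch of what one such governance record could look like. This is an illustration, not a standard: the class name, field names, and example values are all hypothetical.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone

@dataclass
class AgentDecisionRecord:
    """One record per significant agent decision: queryable,
    unlike a conversation transcript."""
    session_id: str
    decision: str                    # what was decided
    rationale: str                   # why
    alternatives: list[str]          # what else was considered
    downstream_impact: str           # expected effect on other systems
    risk_tier: str = "unclassified"  # mapped to the organisation's risk framework
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_audit_row(self) -> dict:
        """Flatten to a dict suitable for loading into an existing audit store."""
        return asdict(self)

record = AgentDecisionRecord(
    session_id="sess-042",
    decision="Reuse the established message-bus integration pattern",
    rationale="Matches the organisation's published integration standard",
    alternatives=["Direct point-to-point REST calls"],
    downstream_impact="No new infrastructure; follows an approved pattern",
    risk_tier="low",
)
```

Because each record is a flat, typed row rather than free text, it can land in the same audit store that human-led change records already use.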

Governance gates that map to existing controls. The harness should enforce the same decision authority and review processes that apply to human-led work. High-impact architectural decisions should route through established approval workflows. Risk classification should happen at the point of decision, not after the fact. The agent should consult your architectural principles and pattern libraries before proposing a solution, not operate in a vacuum.

Signal capture that feeds existing data lifecycles. Every agent session generates signals: what data was consumed, what was produced, what transformations were applied, what dependencies were created. These signals should flow into existing data governance and lineage platforms automatically, not require manual reconstruction.
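As a sketch, a lineage signal can be a single event per transformation in a generic who/what/when/under-what-authority shape, which an adapter then translates into whatever lineage platform the organisation runs. All names and values below are illustrative.

```python
from datetime import datetime, timezone

def lineage_event(agent_id: str, inputs: list[str], outputs: list[str],
                  transformation: str, authority: str) -> dict:
    """Build one who/what/when/under-what-authority record
    for a single agent transformation."""
    return {
        "actor": agent_id,                 # who did it
        "inputs": inputs,                  # what sources were consumed
        "outputs": outputs,                # what artefacts were produced
        "transformation": transformation,  # what was done
        "authority": authority,            # under what mandate
        "at": datetime.now(timezone.utc).isoformat(),  # when
    }

event = lineage_event(
    agent_id="coding-agent-7",
    inputs=["s3://raw/customers.csv"],
    outputs=["warehouse.customers_clean"],
    transformation="deduplicate and normalise addresses",
    authority="change-request CR-1234",  # hypothetical reference
)
```

Emitting this automatically at the point of transformation is what closes the lineage blind spot: nobody has to reconstruct after the fact what a seven-hour session touched.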

Structured learning from agent activity. The governance records and provenance data generated by a well-harnessed agent shouldn't just satisfy compliance. They should feed back into the organisation's architectural knowledge base, improving future agent sessions and human decision-making alike. The organisation should get smarter over the harness lifetime, not just the agent. This kind of structured organisational memory is fundamentally a semantic problem. Agents need more than rules — they need access to the meaning behind your architecture: what your concepts represent, how your components relate, and why decisions were made. We'll explore this in depth later in this series.

The organisations that treat agent harnesses as an enterprise architecture concern now will be the ones that deploy agents at scale with confidence. The ones that leave it to the developers will spend the next decade untangling decisions made without an architectural frame — just like those cities.
