The Geographer, oil on canvas, Johannes Vermeer, 1668–1669

Hi everyone! I’m Renan, founder of OSS, a leading venture builder and investment fund specializing in operations, based in Paris and Boston.

A few months ago, one of our portfolio startups achieved something we believe is a first in its category: the full automation of a complex physical product, from bill of materials to defect-free production, orchestrated end-to-end by AI agents. We documented our learnings in a post.

Since then, our other startups have put over 10 new systems into production.

When we shared the post and these learnings, the question that kept coming back was: which model did you use?

It’s a reasonable question. It’s also the wrong one.

The model was not the differentiator. What made it work was an architectural decision that we see few people in the current AI landscape making — and that we think might define the next generation of production-grade systems.

The AI did not run the factory. The AI wrote the code that runs the factory.

That distinction, we’ve come to believe, matters more than most people think.

The Prevailing Paradigm

Here is how most AI deployments in operations work today.

A company takes a foundation model — sometimes fine-tuned, sometimes not — and places it in the operational loop. The model ingests data, reasons about it, and produces outputs that the system acts on. Sometimes those outputs go through a human. Sometimes they don’t.

In demos, this looks spectacular. The model answers questions about production schedules. It suggests planning adjustments. It flags anomalies. Executives nod. Budgets are approved.

Then it hits the shop floor.

And things get uncomfortable.

The model gives a different answer on Tuesday than it gave on Monday for the same input. A planning suggestion that was correct last week is subtly wrong this week because the model drifted. A hallucination — not a dramatic one, just a quiet misattribution of a constraint — propagates through the scheduling system and nobody catches it until a machine is idle for four hours.

For a factory running at 95% utilization, non-determinism is not a feature. It is a production incident.

And these are the failures you notice. The more insidious pattern is the one you don’t. The model produces an output that is plausible, internally consistent, and subtly wrong in a way that only someone with deep domain knowledge would catch. A material specification that looks right but violates a tolerance nobody documented. A scheduling sequence that appears optimal but ignores a constraint that exists only in the mind of the shift manager who has been running that line for fifteen years. These errors don’t trigger alarms. They accumulate silently until something downstream breaks — a quality defect, a missed delivery, a machine collision — and by then the root cause is nearly impossible to trace.

This is not unique to manufacturing. Any domain with low error tolerance (pharmaceutical production, aerospace engineering, financial compliance, energy grid management) requires auditability and repeatability as first-order properties of the system. You need to be able to explain why a decision was made, reproduce it exactly, and prove to a regulator or a customer that the same input will always produce the same output. These are not nice-to-haves. They are table stakes. And a probabilistic system deployed as the operational layer cannot, by construction, deliver them.

The root problem is architectural. When you deploy a probabilistic system as the operational layer — the thing the factory actually runs on — you inherit all the properties of that system: stochasticity, opacity, drift, and the fundamental inability to guarantee the same output given the same input. These properties are acceptable in a research context. They are disqualifying in a production context.

Nobody runs their production software on the compiler. The compiler is a tool. It produces an artifact — the executable — and the system runs on the artifact.

Most AI companies today are asking their customers to run on the compiler.

An Architecture That Seems to Work

The architecture that has survived contact with the shop floor — at least in our experience — looks different from what most teams are building.

We are not claiming this is the only way. We are claiming it is the pattern that has held across our deployments, and we have not found a better one. It works in three steps.

Step one: build the knowledge layer. This is the ontology: the structured representation of what exists (machines, materials, roles, constraints), what happens (manufacturing orders, quality events, delays), and what should happen (decision rules, trade-offs, optimization objectives). I wrote about this in detail in a previous piece. The knowledge layer is the hard part. In our experience, at least 40% of the operationally relevant data in a company lives in people’s heads, has never been transcribed, and can only be extracted through direct, trust-based interaction with domain experts.

This step is unglamorous, slow, and profoundly human. It means sitting in a factory for weeks. It means earning the trust of a production manager who has seen a dozen technology vendors come and go. It means asking the right questions — not “what does your ERP say?” but “what do you actually do on Monday morning when the ERP output is wrong?” — and being patient enough to wait for honest answers.
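To make the three parts of the knowledge layer concrete, here is a minimal Python sketch. Every class, field, and sample value in it (Entity, Event, DecisionRule, KnowledgeLayer, "press-2", the two rule statements) is a hypothetical name chosen for illustration, an assumption rather than a description of any deployed schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    """What exists: a machine, a material, a role, a constraint."""
    entity_id: str
    kind: str


@dataclass(frozen=True)
class Event:
    """What happens: a manufacturing order, a quality event, a delay."""
    event_id: str
    kind: str
    entity_id: str  # the entity this event concerns


@dataclass(frozen=True)
class DecisionRule:
    """What should happen: a rule in the domain expert's own words."""
    rule_id: str
    statement: str
    stated_by: str           # provenance: who carried this knowledge
    validated: bool = False  # set to True only after a human confirms it


@dataclass
class KnowledgeLayer:
    entities: dict = field(default_factory=dict)
    events: list = field(default_factory=list)
    rules: dict = field(default_factory=dict)

    def add_entity(self, entity: Entity) -> None:
        self.entities[entity.entity_id] = entity

    def record_event(self, event: Event) -> None:
        # Refuse events about things the ontology does not know yet.
        if event.entity_id not in self.entities:
            raise KeyError(f"unknown entity: {event.entity_id}")
        self.events.append(event)

    def add_rule(self, rule: DecisionRule) -> None:
        self.rules[rule.rule_id] = rule

    def unvalidated_rules(self) -> list:
        """Rule ids still waiting for a domain expert's sign-off."""
        return sorted(r.rule_id for r in self.rules.values() if not r.validated)


layer = KnowledgeLayer()
layer.add_entity(Entity("press-2", "machine"))
layer.record_event(Event("evt-1", "delay", "press-2"))
layer.add_rule(DecisionRule("R1", "Press 2 never runs the night shift.", "shift manager"))
layer.add_rule(DecisionRule("R2", "Rush orders go first on Mondays.", "planner", validated=True))
print(layer.unvalidated_rules())  # ['R1']
```

The point of the `validated` flag and the `stated_by` field is that each rule keeps a trace of who said it and whether anyone has confirmed it, which is the part of the work that happens on the shop floor rather than in the code.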

Step two: use AI to generate code from the knowledge layer. This is where the models do their work — not at runtime, but at build time. The AI ingests the ontology, the contextual knowledge, and the human validations, and produces deterministic code: scheduling rules, BOM generation logic, planning constraints, quality gates. This code is versioned, testable, auditable, and — crucially — it does exactly what it says it does.

The AI here acts as a translator. It takes the messy, contextual, often contradictory knowledge that humans carry and transforms it into precise, executable logic. This is not a trivial task — it requires models that are genuinely capable of reasoning about domain constraints. But the key insight is that the model’s job ends when the code is written. It does not stay in the loop.
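As an illustration of what a build-time artifact could look like, here is a small sketch of one generated rule. The rule wording, the identifier PLAN-001, the Order fields, and the 8-hour day are all assumptions invented for this example. What matters is the shape: a pure function with no model call inside it, which can be tested like any other code.

```python
from dataclasses import dataclass

# Natural-language rule this code traces back to (hypothetical wording):
# "Orders due earliest run first; ties go to the lower order id; a day holds 8 hours."
RULE_ID = "PLAN-001"


@dataclass(frozen=True)
class Order:
    order_id: str
    due_day: int
    hours: int


def schedule(orders, hours_per_day=8):
    """Return [(order_id, day)] using earliest-due-date-first packing."""
    plan = []
    day, used = 0, 0
    for order in sorted(orders, key=lambda o: (o.due_day, o.order_id)):
        if order.hours > hours_per_day:
            raise ValueError(f"{order.order_id} does not fit in one day")
        if used + order.hours > hours_per_day:
            day, used = day + 1, 0  # current day is full, open the next one
        plan.append((order.order_id, day))
        used += order.hours
    return plan


orders = [Order("A", 3, 5), Order("B", 1, 4), Order("C", 1, 4), Order("D", 2, 3)]
plan = schedule(orders)
print(plan)  # [('B', 0), ('C', 0), ('D', 1), ('A', 1)]
```

Calling `schedule` again with the same orders, or with the same orders in a different list order, returns the same plan. That property can be asserted in a test suite, which is not something a model sampled at runtime can offer.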

Step three: run a balanced system — deterministic code at the core, AI at the edges. This is important and easy to get wrong in either direction. The core operational layer — the decisions that the factory actually executes — runs on the deterministic code. Not on the AI. The AI is the craftsman, supervised by the human.

But the system does not stop there. Around that deterministic core, LLMs and agents play essential and ongoing roles. A production manager who wants to understand why a scheduling decision was made — that is a natural-language query against the knowledge layer, and an LLM is the right interface. An engineer who needs to create a new rule because a machine was just installed — that is a code-generation task where the AI translates intent into a structured rule. An agent that monitors deviation patterns across sites and surfaces anomalies no human would catch — that is a probabilistic system operating in an advisory capacity, feeding insights back to the humans who govern the deterministic layer. An operator who spots something unexpected and wants to propose a rule change — that is a conversation with an AI that understands the existing codebase well enough to suggest where the change should go and what its downstream effects might be.

The architecture is not “no AI at runtime.” It is “deterministic code for the decisions that run the operation, probabilistic AI for the questions, the creation, and the monitoring that wrap around it.” The deterministic layer is the contract. The probabilistic layer is the advisor. The key discipline is knowing which is which.
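One way to picture the split between contract and advisor is the sketch below. It is a toy under stated assumptions: DeterministicCore, Proposal, advise, and stub_advisor are hypothetical names, the single numeric limit stands in for a real rule base, and the stub stands in for an LLM or monitoring agent. The advisor can only return a proposal. Only a human-approved call changes what the core decides.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Proposal:
    rule_id: str
    new_limit: float
    rationale: str


class DeterministicCore:
    def __init__(self, limits):
        self._limits = dict(limits)
        self.history = []  # (rule_id, old_limit, new_limit, approved_by)

    def decide(self, rule_id: str, value: float) -> bool:
        """Pure lookup and comparison: same inputs, same answer."""
        return value <= self._limits[rule_id]

    def apply(self, proposal: Proposal, approved_by: str) -> None:
        if not approved_by:
            raise PermissionError("a named human must approve rule changes")
        old = self._limits[proposal.rule_id]
        self._limits[proposal.rule_id] = proposal.new_limit
        self.history.append((proposal.rule_id, old, proposal.new_limit, approved_by))


def advise(advisor: Callable[[str], Proposal], question: str) -> Proposal:
    """Advisory edge: returns a proposal and has no handle on the core."""
    return advisor(question)


def stub_advisor(question: str) -> Proposal:
    # Stand-in for an LLM or agent; a real one may answer differently on each call.
    return Proposal("max_batch_size", 35.0, f"suggested in response to: {question}")


core = DeterministicCore({"max_batch_size": 40.0})
before = core.decide("max_batch_size", 38.0)
proposal = advise(stub_advisor, "Should press 2 run smaller batches?")
unchanged = core.decide("max_batch_size", 38.0)  # advice alone changes nothing
core.apply(proposal, approved_by="production manager")
after = core.decide("max_batch_size", 38.0)
print(before, unchanged, after)  # True True False
```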

This inversion seems subtle. Its consequences are not.

Code is deterministic. It gives the same output for the same input. Every time. Code is auditable. When something breaks, you know where to look. Code is versionable. You can diff Tuesday’s scheduling logic against Monday’s and see exactly what changed. Code is governable. A production manager can read a rule in natural language and say “that’s wrong” before it ever touches the line. Also, code is cheap: the token use of our systems is laughably low. We don’t charge per token, thank God.
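The "diff Tuesday against Monday" point needs nothing exotic. As a small sketch, Python's standard difflib module can compare two stored versions of a rule. The two versions below, the function name max_batch_size, and the labels rules@monday and rules@tuesday are made up for the example.

```python
import difflib

# Two hypothetical versions of one generated rule, stored as source text.
monday = '''def max_batch_size(machine):
    if machine == "press-2":
        return 40
    return 60
'''
tuesday = '''def max_batch_size(machine):
    if machine == "press-2":
        return 35
    return 60
'''

diff = list(difflib.unified_diff(
    monday.splitlines(), tuesday.splitlines(),
    fromfile="rules@monday", tofile="rules@tuesday", lineterm="",
))

# Keep only real content changes, dropping the "---" and "+++" file headers.
changed = [line for line in diff
           if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))]
print("\n".join(diff))
print(changed)  # ['-        return 40', '+        return 35']
```

In practice a version control system does this job. The sketch only shows that the question "what changed between Monday and Tuesday?" has an exact, line-level answer when the operational logic is code.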

When AI is deployed as the core operational layer — the contract, not the advisor — these properties disappear. And in environments where a wrong decision can stop a production line, trigger a regulatory violation, or cascade into millions in lost output, their absence is not a theoretical concern. It is a business risk.

What This Looks Like in Practice

This is not theory. We have now shipped this architecture across multiple portfolio companies, in different industries, with different technical stacks. We don’t know if it generalizes to every domain. But across the ones we’ve touched, the pattern has held.

Hundreds of rules for a physical product.

One of our companies works on product design automation for manufacturing. The domain involves specialized industrial machines, factory-specific production constraints, and the design requirements of a large retailer. Before the system existed, this knowledge lived in hundreds of pages of PDFs that were never up to date and varied wildly across teams. Different factories had different versions. Different engineers had different interpretations. Nobody knew which document was current.

The AI was used to extract, structure, and encode approximately 800 design and manufacturing rules — machine constraints, production parameters, designer specifications — into deterministic code. Those rules are now managed by the production teams themselves, in natural language, with an AI assistant that translates updates into code. When a designer says “this material can’t be used with that technique at temperatures above 40 degrees,” the system encodes that constraint, checks it against the existing rule set for conflicts, and adds it to the codebase after human validation.
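Here is a hedged sketch of that encode, check, validate loop for the designer's 40-degree example. Constraint, RuleSet, and the placeholder names material-x and technique-y are hypothetical, the temperature unit is left as whatever the designer meant, and the conflict check is deliberately naive: it only flags a different limit on the same material and technique pair.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    material: str
    technique: str
    max_temp: float  # degrees, in whatever unit the designer uses


class RuleSet:
    def __init__(self):
        self._rules = {}

    def conflicts(self, new: Constraint) -> list:
        """Existing rules on the same pair that state a different limit."""
        old = self._rules.get((new.material, new.technique))
        return [old] if old is not None and old.max_temp != new.max_temp else []

    def add(self, new: Constraint, validated_by: str) -> None:
        if not validated_by:
            raise PermissionError("rule changes need human validation")
        clashes = self.conflicts(new)
        if clashes:
            raise ValueError(f"conflicts with existing rule: {clashes[0]}")
        self._rules[(new.material, new.technique)] = new

    def allowed(self, material: str, technique: str, temp: float) -> bool:
        rule = self._rules.get((material, technique))
        return rule is None or temp <= rule.max_temp


rules = RuleSet()
rules.add(Constraint("material-x", "technique-y", 40.0), validated_by="designer")
print(rules.allowed("material-x", "technique-y", 38.0))  # True
print(rules.allowed("material-x", "technique-y", 45.0))  # False
print(len(rules.conflicts(Constraint("material-x", "technique-y", 50.0))))  # 1
```

A real rule base would need richer conflict detection (overlapping ranges, rules that interact across materials), but the order of operations is the same: encode, check against what already exists, and add only after a named human has validated the change.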

The result: full product variants generated autonomously, from design to production-ready bill of materials. First pass. Zero defects.

The AI does not run in production. The 800 rules do.

Hundreds of rules for supply chain planning.

Another portfolio company encodes planning logic for factories. At one deployment, approximately 400 planning rules were captured — in natural language — from the operational teams who had been carrying this knowledge in their heads and their spreadsheets for years. The AI translates those rules into scheduling code. The factory runs on the code.

The impact at a single site: over one million euros in annual gains, from fewer planners and better planning. Weeks of recoverable inventory out of a bloated stock position. And the critical detail: the rules are managed by the factory’s operational teams, not by software engineers. When a planning constraint changes — a new machine comes online, a seasonal priority emerges, a major event reshuffles the production calendar — the teams update the rule in natural language, the AI regenerates the code, and the factory adapts.

This is not a copilot answering questions. This is a system that turns operational knowledge into running software. The humans who carry the knowledge become the governors of the code that embodies it. They don’t need to be developers. They need to be domain experts with an interface that respects what they know.

Causal graphs for commercial intelligence: 300 objects, thousands of rules.

The most recent validation of this pattern comes from an entirely different domain. We are building a new company that applies causal graph technology to commercial intelligence. The problem: companies spend millions on marketing, promotions, brand building, and pricing decisions, but have no structural understanding of what actually causes sales to move. They have correlations. They have dashboards. They do not have causation.

The architecture is the same. An AI discovers the causal structure — the graph of relationships between actions, signals, and outcomes — using structure learning algorithms and causal inference mathematics. But the output is not a probabilistic model that answers questions on the fly. The output is a deterministic scenario engine: a set of encoded causal pathways that a CMO can inspect, challenge, and use to simulate interventions before committing budget.

The AI built the graph. The company runs on the graph.
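To give a feel for what "running on the graph" can mean, here is a deliberately tiny scenario engine. It assumes linear effects, and the three edges and their weights (promo_budget, store_traffic, price, sales) are invented numbers, not results from any real analysis. Once the pathways are encoded, simulating an intervention is a plain walk over the graph, with no model call involved.

```python
from graphlib import TopologicalSorter

# Hypothetical causal pathways: (cause, effect) -> effect change per unit of cause.
EDGES = {
    ("promo_budget", "store_traffic"): 0.5,
    ("store_traffic", "sales"): 2.0,
    ("price", "sales"): -3.0,
}


def simulate(edges, interventions):
    """Propagate intervention deltas through the graph, causes before effects."""
    parents = {}
    for (cause, effect), weight in edges.items():
        parents.setdefault(cause, {})
        parents.setdefault(effect, {})[cause] = weight
    order = TopologicalSorter({n: set(p) for n, p in parents.items()}).static_order()
    delta = {}
    for node in order:
        if node in interventions:
            # An intervention fixes the node, cutting it off from its own causes.
            delta[node] = interventions[node]
        else:
            delta[node] = sum(w * delta[c] for c, w in parents[node].items())
    return delta


result = simulate(EDGES, {"promo_budget": 10.0})
print(result["store_traffic"], result["sales"])  # 5.0 10.0
```

Every number in the output can be traced to a specific edge, so someone reviewing the scenario can challenge the weight on one pathway instead of arguing with an opaque prediction.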

Same pattern. Different domain. Same result: the operational layer is deterministic, auditable, and governed by humans. The AI remains present — as the engine that discovered the structure, as the interface for asking questions about it, as the tool for proposing modifications. But the decisions flow through the graph, not through the model.

What Happens When You Get This Wrong

We also know what the alternative looks like, because we lived through it. And I share this not to point fingers — the mistake was partly ours — but because the pattern is instructive.

One of our portfolio companies deployed AI as the operational runtime for an industrial configuration and quoting system. The product had extraordinary client traction. The market pull was real. But the technical architecture placed the model in the production loop: AI reasoning at query time, generating outputs directly from probabilistic inference.

It worked in demos. It worked for the first twenty clients. Insane commercial traction, by the way.

Then ambiguity compounded. Edge cases accumulated. The model started producing subtly inconsistent outputs for similar configurations. A client would get one answer on Monday and a slightly different one on Wednesday for an identical request, with no explanation for the discrepancy. Trust eroded — not catastrophically, but steadily. The engineering team spent increasing amounts of time debugging outputs that couldn’t be reproduced because the system was non-deterministic by design.

Turns out, 0.98^20 is about 0.67, and every additional step drags it closer to zero.
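The arithmetic behind that line can be checked directly. Assuming 0.98 stands for the chance that one independent step comes out right (an interpretation, not something spelled out here), the chance that n steps all come out right is 0.98^n: roughly two in three at n = 20, and close to zero only after a few hundred steps. The function name below is hypothetical.

```python
def all_steps_correct(p_step: float, n_steps: int) -> float:
    """Chance that n independent steps all succeed, each with probability p_step."""
    return p_step ** n_steps


for n in (1, 20, 100, 500):
    print(n, round(all_steps_correct(0.98, n), 4))
```

The independence assumption is generous. If errors feed into later steps, as they did for us, the decay is faster than this simple product suggests.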

The diagnosis was architectural, and in hindsight we should have caught it earlier. The technical leadership at the time — and we include ourselves in this — did not fully appreciate the fundamental constraint: that a probabilistic system deployed as an operational runtime will degrade under domain complexity, because ambiguity is additive and the system has no structural mechanism to resolve it.

The fix was not incremental. We could not patch our way out. We had to reboot the technical architecture from the ground up — re-extract the domain knowledge, re-encode it as deterministic logic, and rebuild the product around the code-as-artifact pattern. The company survived because the market pull was strong enough to buy time. But it cost six months and a leadership change.

The lesson was expensive, and we think it generalizes: deploying AI as the core operational layer is an architectural choice that looks correct early and fails late. And by the time it fails, the switching cost is enormous. You are not fixing a bug. You are rebuilding the foundation.

What This Means

If this architecture holds — and we think after shipping more than 80 systems across industries that it does, though we remain open to being wrong — then it suggests something about what “AI company” might come to mean.

The product is not the model. Everyone has access to the same models. GPT-4, Claude, Gemini, Llama — the difference between them matters far less than most people think when the model is used as a build-time tool rather than a runtime engine.

The product is the encoded knowledge. The 800 rules. The 400 planning constraints. The causal graph. The kinetic layer — the decision logic that actually governs how the company operates. That knowledge, structured into a well-formed ontology and compiled into deterministic code, is what creates value.

The model is a tool. An extraordinarily powerful tool — one that can extract structure from hundreds of pages of tribal documentation, reconcile conflicting knowledge between teams, and generate code that would have taken a team of engineers months to write. But still a tool.

And here is the part that makes traditional software investors uncomfortable: the code itself is increasingly expendable. A few weeks ago, a single developer built a clone of an entire well-known open-source project, an application that would have required a team of ten engineers two years ago, essentially alone, using AI as the code generation layer. The codebase spread through the developer ecosystem in days, was forked thousands of times, and rewritten in multiple languages within a week. Code has never been cheaper to produce. The act of writing software is being commoditized in real time, and the speed of that commoditization is accelerating.

This makes the architecture argument even sharper. If code is cheap and models are commoditized, then the only durable asset is the knowledge that tells the AI what code to write. The ontology. The contextual layer. The kinetic rules. The things that took years of factory visits, trust-building, and domain expertise to extract — and that no model, no matter how powerful, can generate from first principles. You cannot prompt your way to understanding why a particular factory overrides its ERP every Sunday night. You have to be there. You have to ask. You have to earn the answer.

Our bet — and it is a bet, not a certainty — is that the non-model companies that will win are not the ones with the best models. They are the ones that capture the most contextual knowledge, encode it into the most reliable systems, and build governance structures that keep humans in control of the kinetic layer — the decision logic that is the living brain of the organization. Also, the cost advantage of this architecture is so high that tokenomics (the new one, not the blockchain one) will catch up with economic reality.

If this is right, the competitive moat is not technical in the way most investors assume. It is operational. It is the depth of the ontology, the trust relationships that allowed you to extract it, the forward-deployed engineers who sat in the factory long enough to understand what they were actually looking at, and the product design that turns domain experts into people who can nurture and extend the codebase without being software engineers.

This is categorically different work than fine-tuning a model. It is also more durable.

A Thought on What Comes Next

The tooling is getting better fast. AI agents are becoming meaningfully capable of extracting structure from existing documentation — we estimate roughly 60% of the initial structuring can now be automated when documentation exists. That number was closer to 20% two years ago. The trajectory is clear.

But the remaining 40% is where the real difficulty lives: reconciliation, disambiguation, and validation against the messy, contradictory, context-dependent reality of how organizations actually operate. The part where two teams in the same factory use the same word to mean different things. The part where a rule that was correct three years ago has been silently overridden by a workaround that nobody documented. The part where the most critical constraint in the entire system lives in a single person’s head, and that person is retiring next year.

That work still depends on human judgment. It still requires trust. It still requires proximity. And it still requires organizations to confront an uncomfortable truth: that much of what makes them work has never been written down.

If you are building in industrial software, or in any domain where the operational layer must be reliable, auditable, and governed — we’d encourage you to think about where the AI sits in your architecture. Not whether to use it. But where.

If it sits in the runtime as the operational contract, you are asking your customers to run on the compiler.

If it sits in the build step — and at the edges as advisor, monitor, and interface — you are giving them something they can own, inspect, and trust.

We think the second path is the right one. We know it has worked a few times with incredible economic lift. We could be wrong. But so far, it’s the one that keeps working.

What a time to be a builder.

Let’s build.