When Software Becomes Probabilistic: Borrowing Manufacturing’s Playbook for a GenAI World

At OSS Ventures, we build startups for the future of industrial operations. Over the past few years, that’s meant spending a lot of time in factories (860 to be precise), on shop floors, and with the people who keep them running — and more recently, a lot of time with AI systems in production. We’re not theorists; we’re builders. Our views come from what we’ve shipped, broken, fixed, and shipped again. This article is not a manifesto or a set of immutable laws — it’s a reflection on patterns we’ve seen, and lessons we think are worth sharing. As always, reality has the last word, and we fully expect to revisit and revise these ideas as we keep building.

1. The Day Code Stopped Being Deterministic (and music died)

For decades, the unwritten contract between software and its creators was simple: Same input → same output.

If something broke, it broke in a reproducible way. A bug was a loyal bug — you could call it at 2 a.m., and it would show up in exactly the same place, wearing exactly the same hat.

This was the software engineer’s comfort zone.

And then along came large language models (LLMs) and generative AI. Now, when you ask the same thing twice, you get two different answers. Both plausible, both confident — one correct, one… poetic. Sometimes both wrong in entirely different ways.

It’s as if your CNC machine, the one that cuts aircraft parts with micron-level precision, decided that on Thursdays it’s an impressionist sculptor and produces something “inspired by” the blueprint.

If that thought makes you sweat — welcome to the AI builder’s life.

If you’ve worked in manufacturing, though, you call this situation “another Tuesday”.

Variability is not news to you. Factories have been taming unreliable processes for over a century, delivering parts and products to tolerances tighter than a human hair, making planes fly and rockets go to the moon in the process. They do it not by wishing variability away, but by designing for it.

And that, in 2025, is exactly what GenAI enthusiasts can reflect on.

2. From Determinism to Variance Management

In the old world of deterministic software, reproducibility was the norm. The same input always produced the same output. Bugs were reproducible, test coverage climbed, defect rates fell, and reliability grew over time. The unspoken belief was: “If it works once, it will work forever (until someone changes the code).”

In the probabilistic world of GenAI, the rules have changed. The same input can produce slightly different outputs every time. Problems appear sporadically, vanish when you test, and then reappear in production. And when you chain multiple AI steps together, the small variations multiply. The result is a system where you can’t just “fix the bug” — you have to manage a rolling tide of variability.
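To make the compounding concrete, here is a back-of-the-envelope sketch. It assumes each step succeeds independently with the same probability, which is a simplification (real pipelines have correlated failures), and the 95% figure is purely illustrative.

```python
def chain_success_rate(per_step_rate: float, steps: int) -> float:
    """Probability that every step in a chain succeeds.

    Assumes steps fail independently of each other (a simplifying assumption).
    """
    return per_step_rate ** steps


# Illustrative only: a step that is right 95% of the time, chained.
for steps in (1, 3, 5, 10):
    rate = chain_success_rate(0.95, steps)
    print(f"{steps:>2} steps -> {rate:.1%} end-to-end")
```

Under these assumptions, ten chained steps at 95% each come out right end-to-end only about 60% of the time, which is why variance has to be controlled at each step rather than at the end.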

For a software engineer trained in the deterministic world, the first instinct is to squash all variance until it disappears. But manufacturing teaches us that zero variability is an illusion. Instead, the real game is to control variance so it never accumulates enough to break the system.

3. Manufacturing’s 100-Year Head Start

Manufacturing accepts a truth that software is only now discovering: no two outputs are ever perfectly identical. Cutting tools wear down. Materials expand and contract. Ambient temperature changes performance.

If you don’t control those deviations, they pile up until the product stops fitting together — or fails entirely.

Factories have had a century to perfect the art of managing this reality. They use systems built on a few core pillars.

3.1 Tolerancing — Defining “Good Enough” and bounds

Every engineering drawing comes with a tolerance: the acceptable range around the target dimension. ±0.01 mm might be fine for a consumer product, ±0.0001 mm for aerospace. The tighter the tolerance, the higher the cost.

In GenAI, you can apply the same thinking. Decide in advance what “good enough” means for your output. For example, a summarization might be acceptable if it captures at least 90% of the key points; generated code might be acceptable if it passes 95% of its unit tests on the first run; a procurement bot might need to suggest prices within a few percentage points of the target range. Then measure continuously, and act when outputs drift outside the acceptable band.
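One way to write a tolerance down is as an explicit band that outputs are measured against. The sketch below is a minimal illustration, not a prescribed implementation: the `Tolerance` class and the two bands are hypothetical names that echo the examples above, and how you actually score "key-point coverage" is left to your own evaluation setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    """An acceptable band around a target, like a dimension on a drawing."""
    name: str
    lower: float
    upper: float

    def accepts(self, value: float) -> bool:
        return self.lower <= value <= self.upper


# Hypothetical bands mirroring the examples in the text.
SUMMARY_COVERAGE = Tolerance("summary key-point coverage", 0.90, 1.00)
UNIT_TEST_PASS = Tolerance("first-run unit test pass rate", 0.95, 1.00)


def out_of_tolerance(tolerance: Tolerance, measurements: list[float]) -> list[float]:
    """Return the measurements that fall outside the acceptable band."""
    return [m for m in measurements if not tolerance.accepts(m)]


# Example: three scored summaries, one of which misses the band.
rejected = out_of_tolerance(SUMMARY_COVERAGE, [0.94, 0.88, 0.97])
print(f"{len(rejected)} output(s) outside '{SUMMARY_COVERAGE.name}'")
```

Writing the band down as data, rather than burying it in a prompt or a reviewer's head, is what makes it possible to measure against it continuously.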

3.2 Process Control — Watching the Trend, Not Just the Point

Statistical Process Control (SPC) is about tracking variation over time, not just checking whether the latest part meets spec. By spotting a trend before it produces bad parts, factories save themselves from costly recalls.

The AI version of this is continuous monitoring of key performance indicators: accuracy, latency, bias levels, hallucination rates. You don’t wait for a customer to notice something’s wrong — your monitoring should tell you long before the defect reaches them.
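As a rough sketch of what "watching the trend" can look like, the snippet below computes classic three-sigma control limits from a baseline period and also applies a common run rule (eight consecutive points on the same side of the center line). The metric, the numbers, and the function names are illustrative assumptions; a real setup would pick the chart type suited to the metric.

```python
from statistics import mean, stdev


def control_limits(baseline: list[float], sigmas: float = 3.0) -> tuple[float, float, float]:
    """Return (lower limit, center line, upper limit) from a baseline period."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - sigmas * spread, center, center + sigmas * spread


def out_of_control(points: list[float], lower: float, upper: float) -> list[int]:
    """Indexes of points that fall outside the control limits."""
    return [i for i, p in enumerate(points) if p < lower or p > upper]


def sustained_shift(points: list[float], center: float, run_length: int = 8) -> bool:
    """True if run_length consecutive points sit on the same side of center."""
    run, last_side = 0, 0
    for p in points:
        side = (p > center) - (p < center)  # +1 above, -1 below, 0 on the line
        if side != 0 and side == last_side:
            run += 1
        else:
            run = 1 if side != 0 else 0
        last_side = side
        if run >= run_length:
            return True
    return False


# Illustrative daily hallucination rates (made-up numbers).
baseline = [0.020, 0.022, 0.018, 0.021, 0.019, 0.020, 0.023, 0.017]
recent = [0.021, 0.022, 0.021, 0.023, 0.022, 0.024, 0.023, 0.025]

lower, center, upper = control_limits(baseline)
print("points outside limits:", out_of_control(recent, lower, upper))
print("sustained shift:", sustained_shift(recent, center))
```

In this made-up data no single day breaches the limits, yet the run rule flags a sustained upward shift. That is the point of tracking the trend: the warning arrives before any individual output is out of spec.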

3.3 Standard Work — The Best Known Way

In Toyota’s world, “standard work” means everyone does a task the same way, using the best method known so far, until a better one is found and documented.

In AI systems, standard work means standardizing prompt templates, preprocessing steps, and output validators. Treat them like living documents: the “best known prompt” today will be outdated tomorrow. Version control them, update them intentionally, and document deviations so they can be reversed if they fail.
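In practice, plain version control over prompt files may be all you need. The sketch below only illustrates the idea of a "best known prompt" with a documented change and a way back; `PromptRegistry` and its methods are hypothetical names, not an existing library.

```python
from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class PromptVersion:
    version: str
    text: str          # template text with $placeholders
    change_note: str   # why this version replaced the previous one


class PromptRegistry:
    """Keeps the history of one prompt; the latest entry is the standard."""

    def __init__(self) -> None:
        self._versions: list[PromptVersion] = []

    def publish(self, version: str, text: str, change_note: str) -> None:
        self._versions.append(PromptVersion(version, text, change_note))

    def current(self) -> PromptVersion:
        return self._versions[-1]

    def rollback(self) -> PromptVersion:
        """Retire the latest version and return it; the previous one becomes current."""
        if len(self._versions) < 2:
            raise ValueError("no earlier version to roll back to")
        return self._versions.pop()

    def render(self, **fields: str) -> str:
        return Template(self.current().text).substitute(**fields)


registry = PromptRegistry()
registry.publish("v1", "Summarize this document: $document", "initial standard")
registry.publish("v2", "Summarize in three bullet points: $document",
                 "reviewers asked for shorter summaries")
print(registry.render(document="Q3 maintenance report"))

# If v2 turns out to perform worse, the deviation is documented and reversible.
retired = registry.rollback()
print("rolled back", retired.version, "-> current is", registry.current().version)
```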

3.4 Continuous Improvement: Discover New Things About the Context, All the Time

Manufacturing processes and organizations expect to keep discovering new things about their context. They have standardized rituals (performance management) and deliverables (Ishikawa diagrams and standardized problem solving) to surface those discoveries, and then dedicated processes to deploy them.

In practice, “standard work” is the accumulated result of dozens, if not hundreds, of edge cases discovered through continuous improvement over time, which is how it reaches such a low level of variance.

3.5 Quality at the Source

Six Sigma preaches building quality in, not inspecting it in at the end. In manufacturing, every station checks its own work before passing it on.

For AI systems, this means adding lightweight validation after every step. After a classification, check whether the confidence is above your minimum. After a summarization, check for required terms. After code generation, run automated tests before committing. Catching defects early prevents them from cascading downstream. Use a mix of human and machine. Iterate on this.
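A minimal sketch of "every station checks its own work" follows. The two steps are stand-ins that return canned values (a real pipeline would call your models there), and the names `run_line`, `StationCheckFailed`, and `MIN_CONFIDENCE` are assumptions made for illustration.

```python
from typing import Any, Callable

Station = tuple[str, Callable[[Any], Any], Callable[[Any], bool]]


class StationCheckFailed(Exception):
    """Raised when a step's output fails its own check."""


def run_line(stations: list[Station], payload: Any) -> Any:
    """Run each step, validating its output before passing it downstream."""
    for name, step, check in stations:
        payload = step(payload)
        if not check(payload):
            # Stop the line here instead of letting the defect cascade.
            raise StationCheckFailed(f"station '{name}' produced a rejected output")
    return payload


# Stand-in steps: canned outputs in place of real model calls.
def classify(ticket: str) -> dict:
    return {"text": ticket, "label": "billing", "confidence": 0.91}


def summarize(item: dict) -> dict:
    return {**item, "summary": f"Customer reports a {item['label']} issue."}


MIN_CONFIDENCE = 0.80  # hypothetical threshold

stations: list[Station] = [
    ("classify", classify, lambda out: out["confidence"] >= MIN_CONFIDENCE),
    ("summarize", summarize, lambda out: out["label"] in out["summary"]),
]

result = run_line(stations, "I was charged twice this month")
print(result["summary"])
```

What happens on a failed check (retry, fall back to a human, or drop the item) is a design decision of its own; the sketch simply stops the line so the defect is visible where it was produced.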

3.6 Designing the right chunks of processes to minimize and control variability

Knowing how to design the right “chunks of processes” is paramount in manufacturing. Usually, standardized work will be designed to have all the context to get one thing done, and have as little interaction with the rest of the process as possible.

The parallel with GenAI is obvious: if you’ve been building in the space, you always end up asking yourself, “Should I try to one-shot this, or should I split it into subtasks? What is the right chunk?” The Lean and manufacturing answer is that the right chunk is the smallest one that captures all the inner variability of the thing at hand.

There are also tasks for which a given process (say, casting for luxury items) is simply not suitable, because the variability inherent to the process is wider than the tolerance the piece demands.

4. Translating Factory Roles into Software Teams

Manufacturing has a role that most software teams don’t: the Quality Manager. This person doesn’t run the machines or design the parts — they own the system that ensures consistent output across people, machines, and shifts. They track metrics, investigate deviations, and coordinate fixes.

A side note: big software teams usually rely on “site reliability engineers” or “quality software engineers”. The funny part is that, by the manufacturing playbook, they’re not quality people but helpers, i.e., they don’t deal with the variability of the process; they deal with the variability of the people writing the code, on the assumption that the code itself will be deterministic. I once got into a heated argument with a CTO because 15% of his payroll was “quality” (spoiler alert: I won that argument, of course).

In GenAI software, we’ll see the emergence of an “AI Quality Manager.” This role will combine prompt curation, output variance monitoring, drift detection, and guardrail design. They’ll be the keeper of prompt libraries, the analyst who notices when an output starts to wander, and the engineer who defines what to do when it does. Most probably, the end user will also be part of that quality process.

5. Prediction — The Factory Model for AI Product Orgs

If you take the analogy to its logical conclusion, the future AI product organization looks a lot like a factory.

You’ll have process engineers (ML engineers and prompt engineers) designing the “machines.” Operators (end-users) running the process in real-world conditions. Quality inspectors (validators and human reviewers) catching defects early. Quality managers (AI quality owners) maintaining the system. And maintenance crews (model retrainers and prompt optimizers) keeping everything tuned.

In this model, a software product is not just a set of features — it’s a production process. Shipping something new is less about pushing code and more about adding a new station to the line, tuning a machine, or adjusting a tolerance.

More importantly, operating the software will no longer be free: part of running it will carry a real, ongoing cost. This is new territory for software teams, and it has profound implications for the business model, the org chart, and processes.

6. A Practical Starting Point for AI Builders

We have seen this pattern emerge across the most recent companies we created. These are thoughts in progress, but here are some hard-earned truths we think you can rely on as a builder:

First, define what “good enough” looks like for your outputs.
Second, instrument your system so you can actually see when things drift.
Third, standardize your processes — prompts, validators, integrations.
Fourth, insert validation after every AI step.
Fifth, assign clear ownership of quality. If it’s “everyone’s job,” it will end up being no one’s job.

7. Closing Thoughts — This Is Not Optional, and not everything

We think the winners in the probabilistic software era will look a lot like well-run factories: process-driven, variance-aware, quality-obsessed.

At OSS Ventures, we’ve seen enough in both factories and AI deployments to know that the flashy demo is not the product. The product is the system that makes that demo reliable, at scale, under real-world conditions.

Of course, some things are different: for example, the inner variability of LLMs cannot be tamed for certain words or applications, which makes it inherently different from classic process control.

And when the CNC in your code decides to freestyle on a Friday afternoon, you’ll be glad you built like a manufacturer.

Did you like this article? Then come build with us at OSS Ventures: we’re hiring both in the team and in the portfolio. We are also always looking for incredible founders who want to build with us.