Chocolate Covered Broccoli, or why most genAI enterprise transformations fail

The Spoiled Child, 1760s, Jean-Baptiste Greuze

The first time I heard the phrase “chocolate covered broccoli” I was in San Francisco, sitting at a long table with a few founders and a partner from a fund that had just passed on a deal I cared about. Someone, I cannot remember who, said that the company we had been discussing had built “chocolate covered broccoli.” The table laughed in that particular way insiders laugh when they recognize a shape they have seen a hundred times. Being the French noob I was, with less than nine months of Bay Area experience, I had not heard the expression before. It stuck.

The idea is simple. You take the new technology and you smear it on top of the old technology, the old process, the old organization, the old assumptions. You serve it. Everyone agrees the chocolate is interesting. Nobody actually wants to eat broccoli. The whole thing tastes like a compromise between two things that did not need to be combined in the first place.

It is the dominant pattern of every technology cycle I have ever read about or lived through. It is the first thing humans reach for. And it is almost always wrong.

The Yahoo Problem

When the consumer internet arrived, the obvious thing to do was to take the most useful printed object in every household — the yellow pages — and recreate it as a website. A directory of categories, edited by humans, sorted alphabetically, with paid placements on top. That is what Yahoo was. A magnificent execution of the obvious instinct: take the existing artifact, plug it into the new substrate, and call it the future.

Yahoo, for a brief period, was the largest internet company in the world. They had the brand, the traffic, the cash, the institutional respect. They did not have the right architecture.

Two graduate students, working on a problem that did not look like a business at all, decided that the internet did not need a better directory. It needed a way to discover anything from any starting point. They built a search engine that treated links as votes and pages as authority graphs. The output was a single text field. There was no taxonomy. There were no editors. There was nothing recognizable from the old world.

Google turned out to be an all right company.

The point is not that Yahoo’s team was stupid. They were not. Most of them were better educated, better funded, and better networked than Larry Page and Sergey Brin in 1998. The point is that the obvious thing — port the artifact, keep the model — was the wrong thing. And it was the wrong thing precisely because it felt safe, legible, defensible to the board, and easy to explain to advertisers. Chocolate over broccoli looks like progress to the people serving it. It is not.

What I See in Board Meetings

I sit on the boards of about twenty industrial software and AI companies, and I spend a meaningful share of every quarter inside the rooms of their large industrial clients. The pattern repeats with painful regularity.

A factory operations director has read about generative AI. He is excited. He convenes a working group. The working group asks the existing ERP vendor and the existing MES vendor what their AI roadmap is. Both vendors present a slide deck with a copilot in the corner of the existing interface. The copilot can summarize a production order. It can draft an email to a supplier. It can answer a question about an inventory level by reading the same table that a human used to read manually.

Everyone nods. A pilot is scoped. Six months later the pilot is reviewed. The conclusion is that the technology is “promising” but the ROI is “hard to quantify.” A second pilot is scoped, usually with a different vendor, often with the same architecture.

What has happened is that AI has been applied to a process that was designed in 1995 to produce a piece of paper that nobody reads anymore. The copilot makes the wrong thing slightly faster. It does not ask whether the thing should exist. It cannot, because it was wired into a workflow that presumes the thing must exist.

The right move would be to ask a different question. If we were rebuilding this factory’s information system today, with a model that can ingest unstructured signals, infer constraints, and propose decisions in natural language, would we still build the eighty-seven screens of the MES? Would we still have a production planner spend two days a week reconciling spreadsheets? Would we still have a quality function whose primary output is a non-conformity report rather than a recommendation that prevents the non-conformity?

The answer is no. The answer is always no. But the work of saying no, and of designing what replaces the screens and the spreadsheets and the reports, is enormous. Chocolate is easier than rebuilding the recipe.

The Horse Rider and the Rocket Engineer

There is a second mistake nested inside the first one, and it is the one that explains why the chocolate keeps getting served.

In almost every company I work with, the reinvention of a function is delegated to the people who currently run that function. The head of supply chain is asked to lead the supply chain AI transformation. The head of quality is asked to design the quality function of the future. The CFO is asked to architect the finance team’s relationship with autonomous agents. This sounds reasonable. It is the way large organizations have always allocated change: those closest to the work are presumed to know best what the work should become.

In a moment of profound technological discontinuity, this allocation is exactly wrong. It is as inane as asking a champion horse rider for advice on the mechanical design of a rocket engine.

The rider is not stupid. She has spent thirty years developing tacit knowledge about momentum, balance, the moods of a living animal, the texture of a course, the cadence of acceleration through a turn. Her expertise is real and hard-won and beautiful to watch. None of it transfers. Worse, much of it actively misleads. The intuitions that make her a great rider — read the animal, listen to the breath, never force the gait — are the wrong intuitions for combustion chambers and turbopump assemblies. If you put her in charge of the propulsion team, she will produce a rocket engine that vaguely resembles a horse: organic, temperamental, responsive to the rider, and incapable of leaving the atmosphere.

The production planner who has spent twenty years reconciling spreadsheets across a six-plant network has the same kind of expertise. He knows where the bodies are buried. He knows which supplier always lies about lead times. He knows that the Tuesday meeting matters more than the Thursday one. This knowledge is genuine. But almost none of it is the knowledge you need to architect a system in which an agent reads the supplier email, infers the real lead time, replans the network in milliseconds, and never produces the spreadsheet at all. The planner’s expertise is the expertise of operating the broccoli. He is the wrong person to design a world in which the broccoli has been replaced.

This does not mean the planner should be ignored. He should be interviewed, deeply, by the people designing the replacement, because his pain points and edge cases are gold. But he should not be in charge of the design. The discontinuity is too large. His unconscious model of what the function is for will smuggle itself into every architectural choice, and the result will be a faster, cheaper version of the thing that should not exist.

The implication for leaders is uncomfortable. The reinvention of a function rarely comes from inside the function. It usually comes from outside the company entirely, brought in by people whose primary qualification is that they do not yet know how things are done here. This is why the AI cycle will reward founders and external builders disproportionately, and why most internal transformation programs will quietly fail. The CEO who understands this allocates the reinvention to a small team with an explicit mandate to ignore current practice, and protects that team from the political gravity of the existing organization. The CEO who does not understand it puts the head of operations in charge of operations AI, watches the broccoli get a thicker coat of chocolate, and wonders, two years later, where the money went.

Why Legacy PLM and ERP Will Yahoo

The financial markets have already started pricing this. The large incumbents in PLM, ERP, MES, and adjacent industrial software have spent the last eighteen months telling their analysts the same story. The story is: our foundations are the right foundations for the AI era. Our data model is the substrate. Our customer relationships are the moat. The next decade will be us, plus AI, only better.

I think this is Yahoo’s story, told one more time.

The data model of a 1990s ERP was designed to track movements of inventory across nodes in a known topology. The data model of a 1990s PLM was designed to manage revisions of CAD files inside a deterministic release process. Neither was designed to ingest semi-structured supplier communications, to reason about counterfactual production plans, to negotiate with another agent across a supply chain, or to learn from the outcome of its own recommendations. These are not extensions of the old model. They are a different category of object. Bolting an agentic layer onto a relational schema does not produce an agentic system. It produces a relational schema with an agent constantly fighting its container.

The honest version of the incumbent strategy is the one nobody says out loud at the analyst day. We will milk the install base for as long as customer switching costs remain high. We will pretend to lead on AI because we cannot afford to be perceived as following. When the new model has been proven by a startup we did not see coming, we will buy that startup at a price we will tell ourselves is a bargain. We will then spend five years integrating it badly, lose most of the original team, and watch a second startup eat the next layer.

Yahoo bought a lot of things. Most of them did not save it.

I do not say this with hostility toward those companies. Some of them have built remarkable products and shipped real value for decades. I say it because the temptation to confuse install base with architectural advantage is the single most expensive error in enterprise software, and the AI cycle will be the most ruthless test of that error in the history of the industry.

Why AI Gains Are So Hard to Capture

A common complaint, repeated in every quarterly survey, in every CIO panel, in every consulting report, is that companies are investing heavily in AI and seeing very little measurable return. The usual explanations are change management, data quality, talent gaps, governance, regulation. All of those are real. None of them is the main thing.

The main thing is that the bulk of enterprise AI deployments are chocolate on broccoli. The new tool has been pointed at the old workflow, with the old approval chain, the old performance metrics, the old job descriptions, and the old assumption about what the function exists to produce. Of course the gains are small. The gains can only ever be a few percentage points of efficiency on a process that should have been deleted.

The clearest empirical confirmation of this is the trajectory of Microsoft Copilot. Microsoft has spent the last three years embedding generative AI into Word, Excel, PowerPoint, Outlook, and Teams — the canonical artifacts of pre-AI knowledge work. The thesis was that the install base was the moat, the workflow was the substrate, and the agent in the corner of the document would be the future of productivity. The thesis was the Yahoo thesis. The results have been telling. An MIT study found that 95% of generative AI pilots fail to deliver measurable business impact. Gartner reports that only 5% of organizations have moved Copilot from pilot into broad production. Multiple enterprise CFOs have publicly questioned whether the seat license justifies the marginal time savings. None of this is a verdict on the underlying model, which is excellent. It is a verdict on what happens when you wrap the strongest language model in human history around a document-centric workflow designed for a different century. The chocolate is exquisite. The broccoli underneath has not changed.

The companies that will capture the order-of-magnitude gains from this cycle are the ones willing to ask the harder, more expensive questions. What is the function actually for? What would it look like if we designed it from scratch with these new capabilities? And what is the smallest brave step we can take from where we are today toward where we should be in five years?

The answer is rarely a copilot.

Send Engineers Into the Building

If the incumbents cannot reinvent themselves, the vendors will not do it for them, and the bolted-on copilot is the wrong shape, the obvious question is what works.

The model that works is the one Palantir invented twenty years ago and that almost nobody copied for fifteen of those years: the forward-deployed engineer. A small team of strong engineers is embedded inside the client, in the room with the operations director, the controller, the planner, the floor supervisor. They are not consultants. They are not trainers. They are not implementation specialists. They build, in real time, the software that replaces the old workflow, and they keep building it until it actually works inside the messy reality of the business. They do not have a contract with a fixed scope. They have a contract with an outcome.

The crucial part of the model is that the engineer is not delegated the question of what the new system should be. The engineer brings the question. The engineer has the architectural vocabulary that the planner does not have, and the engineer is allowed to throw away the eighty-seven screens of the MES because she has no political stake in any of them. The engineer learns from the planner what the system must do. The planner learns from the engineer what the system can now look like. The synthesis is the reinvention. Neither party could produce it alone.

This is a deeply uncomfortable model for traditional enterprise software vendors. It is high-touch, slow to scale, expensive to recruit for, and impossible to fit inside a clean per-seat license. It is also the only model that has ever consistently produced order-of-magnitude gains from technological discontinuity in a large organization. Palantir built a several-hundred-billion-dollar company around it. OpenAI and Anthropic have both, in the last two years, quietly built variations of the same model for their enterprise work, because they discovered the hard way that the software does not deploy itself. At OSS Ventures we built our entire industrial go-to-market around a version of this idea — we call it the Transformer program — because after working with hundreds of factories we concluded that nothing else closes the gap between the model’s capability and the customer’s reality.

The implication for any CEO of a large industrial or services company is that the right partner for the AI cycle is not the vendor with the most polished slide on AI strategy. It is the team that will send people into the buildings, sit beside the operators for months, throw away the parts of the process that do not survive contact with the new tools, and build what should be there instead. The contract should pay for outcomes, not seats. The team should have an explicit mandate to ignore current practice. And the relationship should be measured in years, not quarters, because reinvention at this scale is not a procurement event.

This is the part of the industry that is quietly being built right now, mostly out of sight of the analyst community, and it is where most of the durable value from this cycle will end up.

The Test

Whenever I am in a meeting and someone presents an AI initiative, I run a single test in my head. I ask whether the project, if it succeeded completely, would meaningfully change who has which job, which decisions are made by whom, and what the output of the team actually looks like. If the answer is no — if the org chart, the decision rights, and the deliverables would all look the same, only a bit faster — then I am looking at chocolate on broccoli. The project may be useful. It will not be transformative. It will not show up in the P&L two years from now in a way anyone can point to.

If the answer is yes — if the success of the project would force a redesign of the team, the process, or the product — then I lean in. That is the project that is reaching for a different architecture rather than dressing up the existing one.

This is not a guarantee of success. Most of the projects that pass the test still fail. But none of the projects that fail the test ever produce the gains people hope for, and a lot of capital is currently being spent learning that lesson the slow way.

Closing

I think about this expression often, and I have come to like the image more than the joke. Broccoli is a perfectly good vegetable. Chocolate is a perfectly good dessert. The mistake is the combination, and the mistake is born of an entirely human refusal to admit that the new thing requires us to throw away the old thing rather than improve it.

The companies that will win this cycle are the ones that look at their own broccoli with affection, and then have the discipline to set it aside.