Rebuild Legacy Products with AI: Spec-First Strategy

Most corporate modernisation projects fail before they start. Not because the technology isn't there, but because the team looks at the legacy system and tries to evolve it rather than replace it. They patch, they refactor, they wrap the old codebase in new interfaces — and eighteen months later they've spent a board-level budget to produce something almost as brittle as what they started with.

There's a better way. And it requires changing the question from "how do we modernise this?" to "how do we understand it well enough to rebuild it cleanly?"

The real problem with legacy systems isn't the code

It's the knowledge that's trapped inside it.

Decades of business rules, regulatory accommodations, and edge-case fixes live in codebases that have never been documented. The original developers are gone. The specs — if they ever existed — don't match what's actually running. And the system itself has become the only reliable source of truth about what the business actually does.

That's the problem AI can genuinely help you solve. Not by rewriting your COBOL, but by reading it.

Research consistently shows that legacy systems consume a disproportionate share of IT budgets and are frequently cited as barriers to AI adoption — though the specific proportions vary considerably by industry and organisation size. Consulting projections, including estimates from McKinsey, suggest that AI-augmented modernisation can cut timelines by 40–50% by automating code translation, dependency mapping, documentation, and QA. These are projected averages from specific engagements, not universal guarantees, but the direction of the evidence is consistent. The tools exist. The question is whether your team is using them strategically or just pointing them at the problem and hoping for output.

Step one: Use AI to read the codebase before you touch it

Before any rebuild discussion, you need a complete picture of what the existing system actually does. Not what the documentation says it does. What it does.

Modern large language models are genuinely capable here. AI-powered modernisation tools can automatically analyse your legacy code and generate comprehensive documentation at multiple levels. Tools such as GitHub Copilot Chat, Cursor, Amazon Q, and a rapidly expanding range of AI code intelligence platforms let you ask specific questions about functions and modules, receiving detailed explanations of complex business logic that may have been undocumented for decades. The space is evolving quickly and the competitive landscape has shifted substantially even in the past year, so it is worth evaluating current options rather than relying on any fixed shortlist.

More useful than raw documentation is semantic understanding. Instead of merely reporting that "module A calls module B," AI-powered code analysis adds context, explaining that "module A needs module B for insurance premium calculation." That gives you insight into the business purpose behind each dependency, not just the call graph.

For COBOL-heavy environments, which describes most of financial services and government, purpose-built tooling is more reliable than general-purpose models. Vendors including IBM (with watsonx Code Assistant for Z), Broadcom, and OpenText (formerly Micro Focus) offer tools fine-tuned for COBOL and mainframe domains. These provide code summaries, business-level intent abstraction, and question-and-answer capabilities that support legacy code comprehension, module refactoring, and variable renaming.

Modern AI-assisted code analysis tools can process large codebases — tens of thousands of lines — in a matter of hours rather than the weeks that manual archaeology typically requires. Specific throughput will vary by tooling, codebase complexity, and configuration.

The output of this phase isn't a rewrite. It's an inventory. A map. Every module annotated with its business purpose, its dependencies, and the rules it enforces. That inventory is what makes the next step possible.
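One way to hold that inventory is as structured records rather than free-form notes. The sketch below is a minimal illustration in Python; the `ModuleRecord` type, its field names, and the example modules are all hypothetical and do not reflect the output format of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class ModuleRecord:
    """One inventory entry. Field names are illustrative assumptions."""
    name: str
    business_purpose: str = ""
    depends_on: set[str] = field(default_factory=set)
    rules: list[str] = field(default_factory=list)


def unresolved(inventory: list[ModuleRecord]) -> list[str]:
    """Return names of modules that still lack a stated purpose or any rules."""
    return [m.name for m in inventory if not m.business_purpose or not m.rules]


# Hypothetical example entries, not taken from a real system.
inventory = [
    ModuleRecord(
        name="PREMCALC",
        business_purpose="Calculates insurance premiums",
        depends_on={"RATETBL"},
        rules=["Premium is never negative"],
    ),
    ModuleRecord(name="RATETBL"),  # discovered as a dependency, not yet annotated
]

print(unresolved(inventory))
```

A check like `unresolved` gives a simple exit criterion for this phase: the inventory is not finished while any module is still missing its purpose or its rules.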

Step two: Extract the value, not the code

This is the part most teams skip, and it's the part that determines whether your rebuild succeeds.

The instinct is to start modernising from the existing codebase — refactoring incrementally, translating line by line, wrapping old services in new APIs. The problem with that instinct is that you're preserving the architecture of the legacy system while changing the technology. You end up with modern code that still has the same structural problems, the same coupling, the same implicit assumptions baked in.

The smarter move is to treat the legacy codebase as a source document, not a starting point. Use AI to extract the business logic and store it as human-readable, executable specifications. Then choose a modern stack and reimplement from those specs, cleanly, with no historical baggage carried forward.

Here's what that extraction process looks like in practice:

1. Generate annotated logic documents. Run your AI analysis tooling across the full codebase and produce module-by-module summaries of what each component does in plain language. Include inputs, outputs, validation rules, exception handling, and any domain-specific calculations. Flag anywhere the logic appears contradictory or duplicated — those are your highest-risk areas.

2. Translate annotated logic into structured specifications. Move the plain-language summaries into a formal spec format. By extracting specs from legacy code, teams can verify that modernisation efforts preserve required functionality while eliminating undocumented behaviours. The distinction is that traditional specs are read by humans, while modern executable specs act as validation gates. This is where BDD (behaviour-driven development) earns its place in the process.

3. Write BDD scenarios from each specification. Behaviour-Driven Development is the most direct ancestor of modern spec-driven development. Gherkin scenarios are executable specifications that bridge business requirements and technical implementation. What AI-assisted tooling adds is the ability to generate code from those specs, accelerating the path from scenario to working software.

A BDD scenario for a legacy interest calculation module might look like this:

Feature: Fixed-rate loan interest calculation

  Scenario: Monthly interest on a standard residential mortgage
    Given a loan principal of $500,000
    And an annual interest rate of 6.5%
    And a term of 30 years
    When the monthly repayment is calculated
    Then the monthly repayment amount should be $3,160.34
    And the total interest paid over the term should be $637,722.40

  Scenario: Early repayment penalty applies within fixed-rate period
    Given a loan with 3 years remaining in a 5-year fixed period
    And an early repayment fee of 1.5% of outstanding principal
    When the borrower requests full early repayment
    Then an early repayment fee should be calculated and disclosed
    And the payoff amount should include principal, accrued interest, and the fee

Write one of these for every significant behaviour in the system. They become your specification, your test suite, and your acceptance criteria simultaneously. Some AI tools can generate BDD-style syntax directly from legacy code analysis, allowing developers to tweak the design or add functionality before generating new code.

4. Choose a modern stack based on what the specs demand, not what the legacy system used. You're no longer constrained by the original architecture. The specs define the behaviour. The stack should be chosen on current engineering merit: maintainability, performance, ecosystem maturity, and your team's capability.

5. Reimplement against the specs. Build the new system feature by feature, with your BDD scenarios as the acceptance gate for each one. Nothing ships until the scenario passes. These aren't just technical tests — they are living proof that the system is meeting the specific needs of the business. When automated scenarios run in your development pipeline, they generate living documentation that must stay current. If the system's behaviour changes, the automated scenario breaks, immediately signalling that attention is needed.

This approach means you carry forward the intent of the legacy system while leaving behind its structural debt. That's a fundamentally different outcome from a translation or a refactor.
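To make the acceptance gate concrete, the first scenario above can be checked against a small reimplementation. The sketch below assumes the standard fixed-rate amortisation formula with monthly compounding and half-up rounding to the cent; a real legacy system may round or compound differently, which is exactly what the extracted spec should pin down. The function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def monthly_repayment(principal: Decimal, annual_rate_pct: Decimal, term_years: int) -> Decimal:
    """Level monthly repayment for a fixed-rate loan, rounded half up to the cent."""
    n = term_years * 12                              # number of monthly payments
    r = annual_rate_pct / Decimal(100) / Decimal(12)  # monthly interest rate
    if r == 0:
        payment = principal / n
    else:
        payment = principal * r / (1 - (1 + r) ** -n)
    return payment.quantize(CENT, rounding=ROUND_HALF_UP)


def total_interest(principal: Decimal, payment: Decimal, term_years: int) -> Decimal:
    """Interest paid over the full term, assuming every payment equals `payment`."""
    return payment * term_years * 12 - principal


principal = Decimal("500000")
payment = monthly_repayment(principal, Decimal("6.5"), 30)

# These assertions mirror the Then/And steps of the first scenario.
assert payment == Decimal("3160.34")
assert total_interest(principal, payment, 30) == Decimal("637722.40")
```

In a real pipeline the assertions would live in step definitions bound to the Gherkin text rather than inline, but the gate is the same: the module does not ship until the scenario's expected values are reproduced.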

Step three: Replace incrementally with the strangler-fig approach

Even with a clean rebuild strategy, you can't switch everything off on a Tuesday afternoon and flip to the new system on Wednesday. Not with mission-critical products. Not in regulated industries.

The strangler-fig pattern is the answer. It's named after the tropical fig that grows around a host tree and eventually replaces it entirely, with both alive for years along the way. Applied to software, it means building the new system in parallel and progressively routing traffic from the old system to the new one, module by module.

The practical implication is that your rebuild never exposes the business to a full system cutover. Each module that goes live is individually validated before the next one is tackled.

Because the legacy system keeps operating throughout the transition, business continuity is preserved and disruption is minimised. Each change is also smaller and easier to reverse, which lowers the risk attached to any single step.
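The routing idea can be sketched in a few lines. This is a minimal in-process illustration, assuming requests can be keyed by module name; in practice the same decision usually sits in an API gateway or proxy. The `StranglerRouter` class and handler names are hypothetical.

```python
from typing import Callable

Handler = Callable[[dict], dict]


class StranglerRouter:
    """Sends each request to the new handler if its module has been migrated,
    otherwise falls back to the legacy system."""

    def __init__(self, legacy: Handler) -> None:
        self._legacy = legacy
        self._migrated: dict[str, Handler] = {}

    def migrate(self, module: str, handler: Handler) -> None:
        """Start routing this module's traffic to the new implementation."""
        self._migrated[module] = handler

    def rollback(self, module: str) -> None:
        """Send this module's traffic back to the legacy system."""
        self._migrated.pop(module, None)

    def handle(self, module: str, request: dict) -> dict:
        handler = self._migrated.get(module, self._legacy)
        return handler(request)


def legacy_system(request: dict) -> dict:
    return {"served_by": "legacy", **request}


def new_notifications(request: dict) -> dict:
    return {"served_by": "new", **request}


router = StranglerRouter(legacy_system)
router.migrate("notifications", new_notifications)

print(router.handle("notifications", {"id": 1}))  # handled by the new service
print(router.handle("ledger", {"id": 2}))         # still handled by legacy
```

Keeping `rollback` as cheap as `migrate` is the point of the design: a module that misbehaves after cutover can be returned to the legacy path without touching anything else.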

AI accelerates this pattern in a specific way. Rather than hand-mapping which modules to migrate in which order, AI-assisted dependency analysis from your Step 1 inventory tells you exactly which modules have the fewest dependencies (lowest risk to migrate first) and which are the most deeply entangled (migrate last). The strangler-fig approach is broadly considered lower-risk than big-bang rewrites precisely because of these continuous feedback loops and smaller, reversible deployments — though outcomes vary significantly by project scope and organisational context.
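As a rough illustration of that ordering, the sketch below ranks modules by a simple coupling score: the number of modules each one depends on plus the number that depend on it. The scoring rule, function name, and example graph are assumptions made for illustration; real tooling would likely weigh shared data, call volume, and transaction boundaries as well.

```python
from collections import defaultdict


def migration_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Rank modules from least to most entangled.

    `dependencies` maps each module to the set of modules it calls.
    Coupling = outbound dependencies + inbound dependents (an assumed heuristic).
    """
    inbound: dict[str, int] = defaultdict(int)
    for deps in dependencies.values():
        for dep in deps:
            inbound[dep] += 1

    modules = set(dependencies) | set(inbound)

    def coupling(module: str) -> int:
        return len(dependencies.get(module, set())) + inbound[module]

    # Ties are broken alphabetically so the result is deterministic.
    return sorted(modules, key=lambda m: (coupling(m), m))


# Hypothetical dependency graph taken from a Step 1 style inventory.
graph = {
    "notifications": {"ledger"},
    "reporting": {"ledger", "interest"},
    "interest": {"ledger"},
    "ledger": {"interest"},
}

print(migration_order(graph))
```

With this example graph the boundary modules come out first and the ledger last, which matches the outside-in sequence the pattern calls for.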

A practical sequence for a corporate product team looks like this: identify the boundary modules first — authentication, reporting, notification services — as these typically have clean interfaces and limited downstream dependencies. Migrate those, validate them against your BDD scenarios, confirm parity with the legacy system's behaviour, then decommission the old module. Repeat inward toward the core business logic. The last thing you retire is usually the most critical transaction engine, but by then you've built confidence across a dozen successful migrations.

Step four: Build a regression safety net from observed behaviour

One of the most persistent problems in legacy rebuilds is the absence of a test suite to validate against. If the legacy system has no tests — and most don't — you have no automated way to confirm that the new implementation behaves identically.

AI solves this by generating tests from what the system does, not from what it was specified to do.

The approach is to instrument the legacy system and capture real transaction inputs and outputs during normal operation. Feed those input/output pairs to an AI test generation tool. The tool produces regression tests that validate the same behaviour in the new system. Tools such as Diffblue Cover, and test automation platforms now offered by Tricentis and others, can automatically generate unit, integration, and regression tests for legacy systems — though specific capabilities and product names in this space continue to evolve, so current documentation should be verified before selection.
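The replay idea itself does not depend on any particular product. The sketch below is a hand-written minimal harness, not the output of the tools named above: it replays captured legacy input/output pairs against a candidate reimplementation and reports every difference. The fee rule, the `CapturedCall` shape, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CapturedCall:
    """One recorded legacy transaction (hypothetical shape)."""
    balance_cents: int
    legacy_fee_cents: int


def new_fee_cents(balance_cents: int) -> int:
    """Candidate reimplementation: 1.5% fee, rounded half up to the cent."""
    return (balance_cents * 15 + 500) // 1000


def replay(
    captured: list[CapturedCall], impl: Callable[[int], int]
) -> list[tuple[CapturedCall, int]]:
    """Return every captured call where the new output differs from legacy."""
    mismatches = []
    for call in captured:
        actual = impl(call.balance_cents)
        if actual != call.legacy_fee_cents:
            mismatches.append((call, actual))
    return mismatches


captured = [
    CapturedCall(balance_cents=10_000, legacy_fee_cents=150),
    CapturedCall(balance_cents=10_050, legacy_fee_cents=150),  # legacy truncated 150.75
]

for call, actual in replay(captured, new_fee_cents):
    print(f"balance={call.balance_cents}: legacy={call.legacy_fee_cents}, new={actual}")
```

Here the second record exposes a rounding difference: the legacy system truncated where the new code rounds half up. Whether to preserve or correct that behaviour is a business decision, but the harness makes sure it is a deliberate one.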

Combined with your BDD scenarios from Step 2, this gives you two complementary safety nets. The BDD scenarios validate intent — the business rules as they should work. The AI-generated regression tests validate observed behaviour — including the quirks, the edge cases, and the places where the legacy system did something unexpected that users have come to depend on.

That second category matters more than most teams expect. Legacy systems accumulate what engineers call "accidental features" — behaviours that weren't designed but that users rely on. A rebuild that ignores them will face complaints the moment it goes live, even if it's technically correct by the spec. Observed-behaviour regression tests catch these before you deploy.

Step five: Keep domain experts in the room at every milestone

This is where most technically competent teams still go wrong. They treat the domain experts — the people who actually know the business — as stakeholders to consult at the start and demo to at the end. That's not enough.

AI is a powerful tool in this rebuild process. It isn't an autonomous replacement for human judgement. Multiple developer surveys conducted in 2025 noted declining confidence in AI-generated code accuracy, with concerns centred on subtle errors in business logic rather than syntactic failures. That scepticism isn't irrational. It reflects real experience with AI-generated code that passes syntactic checks but gets the business logic wrong in subtle ways.

The mitigation is structural. Put a domain expert in the room at every spec review, every BDD scenario sign-off, and every module-level acceptance check. Their job isn't to read the code — it's to read the plain-language specifications and the BDD scenarios and confirm that they accurately describe how the business actually works.

A banking domain expert reviewing a loan calculation scenario will catch a rounding rule that the AI annotated incorrectly from the legacy COBOL. A clinical workflow expert will flag a prescription flow that the AI interpreted as sequential when it's actually conditional. These corrections happen at the spec layer, before a single line of new code is written, which is exactly where you want them.

This approach lets your existing team use AI to handle the technical heavy lifting while they focus on what humans do best: understanding the business context, making architectural decisions, and ensuring nothing breaks in production.

Milestone-gate every phase. Don't move from spec extraction to reimplementation until your domain experts have signed off the BDD scenarios for that module. Don't cut over a module in the strangler-fig until the regression tests pass and a domain expert has validated the results on a representative sample of real transactions. The gates feel slow in the moment. They prevent the kind of failures that set projects back by quarters.
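The two gates described above are simple enough to encode directly, so that a pipeline or checklist can refuse to proceed when one is unmet. The following is a minimal sketch; the `ModuleStatus` fields and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModuleStatus:
    """Sign-off state for one module (field names are illustrative)."""
    scenarios_signed_off: bool      # domain expert approved the BDD scenarios
    regression_passed: bool         # observed-behaviour regression tests pass
    expert_validated_sample: bool   # expert checked a sample of real transactions


def may_reimplement(status: ModuleStatus) -> bool:
    """Gate between spec extraction and reimplementation."""
    return status.scenarios_signed_off


def cutover_blockers(status: ModuleStatus) -> list[str]:
    """List the unmet conditions that block cutover; empty means go."""
    blockers = []
    if not status.scenarios_signed_off:
        blockers.append("BDD scenarios not signed off")
    if not status.regression_passed:
        blockers.append("regression tests failing")
    if not status.expert_validated_sample:
        blockers.append("no expert validation of sample transactions")
    return blockers


status = ModuleStatus(
    scenarios_signed_off=True,
    regression_passed=True,
    expert_validated_sample=False,
)

print(may_reimplement(status))
print(cutover_blockers(status))
```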

What this looks like on a board slide

If you're building a business case for this approach, here's the honest summary:

The full-AI-rewrite-in-a-weekend narrative is wrong. AI doesn't remove the complexity of legacy modernisation. It changes where the human effort is spent. You spend less time in manual code archaeology and more time on spec validation and domain knowledge capture. That's a better use of your team, and it produces a more reliable outcome.

McKinsey has published research suggesting that AI-augmented modernisation can accelerate timelines by 40 to 50%. These figures come from specific engagements and consulting projections rather than controlled studies, and outcomes will vary, but they are directionally consistent with what practitioners report. A separate claim of a 40% reduction in costs related to technical debt, sometimes attributed to McKinsey, circulates widely but cannot be traced to a single verifiable report, so it is left out of the case made here.

Industry analysts and practitioners widely report that full rewrites carry high failure and overrun rates — some estimates place the proportion of full rewrites that exceed budget or timeline at or above 70%, though these figures are difficult to verify precisely and originate from multiple sources with varying methodologies. The spec-extraction approach described here isn't a full rewrite in the traditional sense. It's a disciplined rebuild with clear milestones, validated specifications, and incremental delivery. Those are the conditions that improve the odds.

The spec layer you produce in this process has value beyond the rebuild itself. Your BDD scenarios become the foundation for future feature development. Your annotated business logic documents become onboarding material for new engineers. Your AI-generated regression suite becomes the ongoing safety net for every release. You're not just rebuilding a product — you're creating the documentation and test infrastructure the original build never had.

That's worth putting in front of the board.

Evotron Studio runs engagements exactly like this — senior operators, milestone-gated delivery, and a clear path from legacy codebase to live reimplementation in weeks, not years. If you're a corporate innovation leader with a legacy product and a mandate to move, start with a diagnostic.
