The Spaghetti Trap for Smart Models: Don’t Let GenAI Break the Legacy Ledger

At a glance

Technical debt remains the silent killer of modernisation programs, especially where COBOL, batch jobs, and reconciliation engines still run the show.
In banking, GenAI modernisation needs a different playbook entirely.
GenAI can untangle decades of messy legacy code only if you bake tests, telemetry, and human checks into the pipeline.

Consider this scenario: Asha, a senior payments developer at a mid‑sized bank, runs a two‑week generative artificial intelligence (GenAI) pilot to understand a 25‑year‑old COBOL batch that posts overnight settlements. The goal is to produce a dependency map, generate unit/integration test scaffolds, and surface risky code paths for refactor. Within 48 hours, the tool ingests the repository and produces a call graph, module summaries, and about 200 test scaffolds for the busiest routines. This turns opaque code into readable documentation and cuts new‑hire onboarding from days to hours.

During sandbox replays, however, a persistent reconciliation variance appears. An undocumented, account‑specific rounding rule is implemented with subtle differences across three modules. It is a classic spaghetti pattern. GenAI produces syntactically correct outputs but misses implicit business intent. So, the team pauses automated refactor work until parity is proven with telemetry, targeted tests, and SME validation.
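To make the rounding problem concrete, here is a minimal sketch in Python. It assumes three hypothetical modules that each round to the cent with a different rule; the module names, rules, and amounts are illustrative and are not taken from Asha's codebase. Each routine is individually plausible, yet the same inputs produce three different settlement totals.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP

CENT = Decimal("0.01")

# Hypothetical stand-ins for three legacy modules that each round to the
# cent in a slightly different way. Names and rules are assumptions.
ROUNDING_BY_MODULE = {
    "module_a": ROUND_HALF_UP,    # ties round away from zero
    "module_b": ROUND_HALF_EVEN,  # ties round to the nearest even cent
    "module_c": ROUND_DOWN,       # extra digits are truncated
}


def post(amount: Decimal, rounding: str) -> Decimal:
    """Round one posting to the cent using the given rule."""
    return amount.quantize(CENT, rounding=rounding)


def settle(amounts, rounding) -> Decimal:
    """Sum the rounded postings for one module."""
    return sum((post(a, rounding) for a in amounts), Decimal("0.00"))


# Illustrative amounts that all sit exactly on a half-cent boundary.
amounts = [Decimal("10.005"), Decimal("20.015"), Decimal("30.025")]

totals = {name: settle(amounts, rule) for name, rule in ROUNDING_BY_MODULE.items()}
variance = max(totals.values()) - min(totals.values())

for name, total in totals.items():
    print(name, total)
print("variance", variance)
```

A syntactically faithful translation of any one module would carry its rounding rule forward unchanged, which is why a variance like this only surfaces when outputs are replayed and reconciled side by side.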

A minimalist vector illustration on a dark purple background showing a human figure pulling a glowing gold thread out from a massive ball of tangled grey-purple yarn. This visualizes the process of extracting clarity, order, and a clear path from chaotic, legacy "spaghetti" code

Untangling legacy system complexity requires finding the clear, logical thread within the chaos. 'Visualised using AI'

This scenario shows how GenAI can deliver tangible impact in banking technology yet be severely limited by spaghetti code in legacy systems. According to Databricks, a data intelligence software company, 94% of Indian enterprises report using GenAI in at least one function. However, legacy cores and technical debt will slow delivery and reduce realised value unless programs invest in tests, telemetry, and governance.

Treat GenAI as an assistant, not an autopilot.

A GenAI assistant can turn hours of reverse engineering into actionable maps and tests, but a single batch dependency in legacy core code can stop a pilot cold. Here, human experts and telemetry remain essential. Forrester found that 21% of Indian software decision‑makers say technical debt is a top barrier to innovation, a higher share than the global average.

Untangling the spaghetti trap

A conceptual illustration showing a large bowl overflowing with tangled dark blue spaghetti lines. A small human figure holds a magnifying glass over it; inside the lens, the tangled lines become a single, straight orange path. This illustrates how GenAI acts as a diagnostic lens to instantly map and clarify complex legacy execution traces.

GenAI can act as a powerful lens, mapping execution traces to instantly reveal a straight path through complex legacy "spaghetti" code.

Spaghetti code describes systems whose control flow is tangled, responsibilities are mixed, and decades of ad‑hoc patches and global state make behaviour hard to follow. In banking, this becomes a “spaghetti bowl” of integrations and custom fixes. This is often seen in payment clearing, posting engines, ledgers, loan servicing, and fraud/AML logic where mainframes and COBOL still run mission‑critical workloads.

Because these applications are high‑volume and highly regulated, any change must preserve reconciliation parity and audit trails. This makes them hard to test, trace, or refactor safely. As a result, many modernisation efforts stall at discovery and documentation. GenAI output may pass compiler or static checks yet still fail regulatory and reconciliation requirements.

How legacy blockers show up in practice

Semantic gaps: GenAI can produce syntactically correct translations, but business intent in rounding rules, exception handling, and reconciliation tolerances often lives in comments, runbooks, or tribal knowledge, not the code itself. This causes parity failures during reconciliation.
Runtime dependencies: Batch jobs and overnight runs depend on specific sequencing, file formats, and external systems. Static outputs miss these runtime behaviours unless paired with telemetry and trace replay.
Audit and compliance friction: Banks must show provenance and explainability for changes. Black‑box GenAI outputs without clear lineage fail audit gates.
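The audit point can be illustrated with a small provenance record. The sketch below is one possible shape, not a regulatory standard: the field names, the `ProvenanceRecord` class, and the `passes_audit_gate` rule are all assumptions made for illustration. It ties each generated artefact to a hash of the exact source it was derived from, the model that produced it, and the SME who reviewed it.

```python
import hashlib
from dataclasses import dataclass


def sha256_text(text: str) -> str:
    """Return a stable fingerprint for a piece of source or generated text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ProvenanceRecord:
    # All field names are hypothetical; adapt them to your audit requirements.
    source_module: str
    source_sha256: str
    output_sha256: str
    model_id: str
    reviewer: str
    approved: bool


def build_record(source_module, source_text, output_text, model_id, reviewer, approved):
    """Capture lineage for one GenAI-generated artefact."""
    return ProvenanceRecord(
        source_module=source_module,
        source_sha256=sha256_text(source_text),
        output_sha256=sha256_text(output_text),
        model_id=model_id,
        reviewer=reviewer,
        approved=approved,
    )


def passes_audit_gate(record: ProvenanceRecord, current_source_text: str) -> bool:
    """Pass only if an SME approved it and the source has not changed since."""
    return (
        record.approved
        and bool(record.reviewer)
        and record.source_sha256 == sha256_text(current_source_text)
    )


legacy_source = "COMPUTE WS-FEE ROUNDED = WS-AMOUNT * WS-RATE."
generated = "fee = (amount * rate).quantize(CENT, rounding=ROUND_HALF_UP)"
record = build_record("POSTFEE", legacy_source, generated, "model-x", "sme.asha", True)

print(passes_audit_gate(record, legacy_source))
print(passes_audit_gate(record, legacy_source + " "))
```

Hashing the source means an approval is automatically invalidated if the legacy module changes after review, which keeps the lineage honest without manual bookkeeping.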

Industry guidance from IBM and recent technical work on GenAI‑driven mainframe modernisation note that while GenAI reduces boilerplate work, projects typically require staged, test‑first rewrites and extensive telemetry to satisfy audit and reconciliation requirements. Microsoft’s engineering team describes using AI agents to accelerate COBOL migration while relying on test harnesses and staged validation to avoid parity failures and audit friction.

How to let GenAI help without breaking the ledger

Start where value is easiest to capture

GenAI delivers the fastest, most reliable wins in developer productivity, documentation, and routine automation. Tasks like code comprehension, generating readable module summaries, and scaffolding unit and integration tests are low‑risk and high‑leverage. Document processing and customer‑facing automation (chatbots, form extraction, templated responses) also yield measurable gains quickly. These are areas where GenAI augments human work without touching core transaction logic, so you can capture value with limited governance overhead and clear KPIs.

Keep the core safe: Add checks where value is hardest to capture

The hardest gains lie in automated translation or wholesale refactoring of mission‑critical mainframe code and reconciliation‑sensitive posting engines. Languages and runtimes such as COBOL/JCL encode decades of implicit business rules, rounding quirks, and sequencing dependencies. Semantic fidelity matters more than syntactic correctness. Auditability, provenance, and runtime behaviour must be preserved. This requires exhaustive testing, SME validation, and controlled rollouts. In practice, attempts to fully automate these migrations without heavy engineering controls produce parity failures and regulatory pushback.

Rethink from the ground up

If you are targeting gains in operating cost and developer throughput that range from mid‑to‑high single‑digit to double‑digit percentages, you must invest up front. Build test harnesses, capture execution traces, and bake governance into the pipeline before scaling GenAI. Organisations that skip these steps will see pilots stall. Those that take them turn pilots into sustained, auditable modernisation programs.
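One way to combine a test harness with captured execution traces is a characterisation (golden master) test: replay recorded inputs through a candidate routine and compare against what the legacy system actually posted. The sketch below assumes a hypothetical fee routine and three invented trace records; in a real program the traces would come from production telemetry, and the legacy routine would be the mainframe job itself rather than a Python stand-in.

```python
from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

CENT = Decimal("0.01")

# Hypothetical traces: inputs and the amount the legacy system posted.
CAPTURED_TRACES = [
    {"amount": "100.00", "rate": "0.00125", "posted": "0.13"},
    {"amount": "200.00", "rate": "0.00125", "posted": "0.25"},
    {"amount": "300.00", "rate": "0.00125", "posted": "0.38"},
]


def legacy_fee(amount: Decimal, rate: Decimal) -> Decimal:
    """Stand-in for the legacy behaviour: ties round half up."""
    return (amount * rate).quantize(CENT, rounding=ROUND_HALF_UP)


def candidate_fee(amount: Decimal, rate: Decimal) -> Decimal:
    """Stand-in for a refactored routine that silently uses banker's rounding."""
    return (amount * rate).quantize(CENT, rounding=ROUND_HALF_EVEN)


def replay(traces, routine):
    """Run every captured input through `routine` and collect mismatches."""
    mismatches = []
    for index, trace in enumerate(traces):
        actual = routine(Decimal(trace["amount"]), Decimal(trace["rate"]))
        expected = Decimal(trace["posted"])
        if actual != expected:
            mismatches.append((index, expected, actual))
    return mismatches


print("legacy mismatches:", replay(CAPTURED_TRACES, legacy_fee))
print("candidate mismatches:", replay(CAPTURED_TRACES, candidate_fee))
```

The candidate here would pass a compiler and most unit tests written from its own code, but the replay flags the one trace where a tie is rounded differently. That is the kind of parity evidence worth having before any refactored routine touches the ledger.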

Pilot to production with GenAI

A sophisticated workflow diagram on a dark blue background. On the left, a human hand forming an Indian classical mudra directs shapes through preparation stages like 'Test Harness' and 'SME Sign Off'. On the right, the shapes form a perfect Kolam grid, passing through deployment stages like 'Canary' and 'Live Deploy'. This uses the cultural metaphor of a Mudra guiding a Kolam to symbolize disciplined human-in-the-loop governance and the highly structured deployment of GenAI models.

Moving GenAI from pilot to production requires rigorous human-in-the-loop governance and a highly structured deployment matrix.

Discovery first, then tests: Use GenAI to map and draft tests; lock behaviour with tests before changing code.
Combine static and dynamic analysis: Feed GenAI with execution traces and sample data to surface hidden side effects.
Human‑in‑the‑loop governance: Require SME validation, maintain change provenance, and gate deployments with reconciliation checks.
Pilot metrics: Measure time‑to‑understand, test coverage increase, and reconciliation parity to decide scale‑up.
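The pilot metrics above can be folded into a simple scale-up gate. This is a rough sketch under stated assumptions: the `PilotMetrics` fields, the zero-variance tolerance, and the rule that coverage must increase are illustrative choices, not thresholds taken from any standard or from the sources cited here.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class PilotMetrics:
    # Hypothetical fields covering the three metrics named above.
    hours_to_understand_before: float
    hours_to_understand_after: float
    test_coverage_before: float  # fraction between 0 and 1
    test_coverage_after: float   # fraction between 0 and 1
    legacy_ledger_total: Decimal
    candidate_ledger_total: Decimal


def evaluate(metrics: PilotMetrics, tolerance: Decimal = Decimal("0.00")) -> dict:
    """Summarise the pilot and decide whether it is safe to scale up."""
    variance = abs(metrics.legacy_ledger_total - metrics.candidate_ledger_total)
    coverage_gain = round(metrics.test_coverage_after - metrics.test_coverage_before, 4)
    return {
        "understanding_speedup": (
            metrics.hours_to_understand_before / metrics.hours_to_understand_after
        ),
        "coverage_gain": coverage_gain,
        "reconciliation_variance": variance,
        # Assumed rule: parity within tolerance AND coverage must have improved.
        "scale_up": variance <= tolerance and coverage_gain > 0,
    }


clean_pilot = PilotMetrics(40.0, 8.0, 0.15, 0.60, Decimal("60.06"), Decimal("60.06"))
drifting_pilot = PilotMetrics(40.0, 8.0, 0.15, 0.60, Decimal("60.06"), Decimal("60.03"))

print(evaluate(clean_pilot))
print(evaluate(drifting_pilot))
```

Making reconciliation parity a hard condition, rather than one score among several, reflects the point that productivity gains do not offset a ledger that fails to balance.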

Discovery to Deployment: A Six-Stage Controlled Rollout Framework. GenAI - BFSI

A visual breakdown of our six-stage controlled rollout framework designed to ensure data consistency and expert-validated scaling.

A disciplined approach can convert the GenAI promise into measurable modernisation wins while keeping the lights on and the ledgers balanced.

Disclaimer: Content provided by The Niche Foundry India is for informational purposes only. While we aim to provide accurate data and strategic insights, information is subject to rapid market and technological shifts. This content should not replace independent due diligence or professional consultation. The Niche Foundry India bears no responsibility for any actions taken, or financial losses incurred, in reliance on this material.