Blog

Banking IT: Why resilience is more important than flawlessness

Written by Shruthi Premkumar | Tuesday, February 10, 2026

Data centers are not sterile laboratories; they are crisis zones. While the customer sees a clean interface on their smartphone, the backend fights a constant battle against entropy, latency and interface chaos.

The truth is: behind every successful bank transfer there is no "perfect" system. There is an architecture that is just barely kept from collapsing. Functional testing in banking, as in other industries with complex IT, therefore rarely serves to prove flawlessness. Its sole purpose is to manage the inevitable failure of components in such a way that the customer experience is not affected and regulations are complied with.


Farewell to the "happy path": Chaos is the default state

The idea that banking software should "never" have errors is dangerous. Hardware fails, API partners time out, suppliers' systems falter, deployments break. This is not an accident; it is a statistic.

We therefore test not only the "happy path", but also resilience:

  • Graceful degradation: If the real-time credit check fails, the credit application must not fail with it. The system must switch to a safe waiting state instead.
  • Idempotence: If an app sends the same transfer three times due to a poor network, the core banking system must post it only once. It must recognize: "I already know this message." (Both behaviors are sketched in the example after this list.)
  • Traceability (logs): The last line of defense is the human reviewer. If automation reaches its limits or a customer complains, support must be able to reconstruct the case. We therefore test whether logs capture not just technical noise, but a human-readable "story" of the transaction, so that the back office remains able to act.
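A minimal Java sketch of what the first two properties look like in code. All names here (TransferHandler, realTimeCreditCheck, the two-second timeout, the in-memory key store) are illustrative assumptions for this example, not part of any specific core banking product:

```java
import java.util.Map;
import java.util.concurrent.*;

/** Minimal sketch: idempotent posting with graceful degradation (all names hypothetical). */
public class TransferHandler {

    enum Status { POSTED, DUPLICATE, PENDING_REVIEW }

    // In production this would be a persistent, shared store; a map is enough for the sketch.
    private final Map<String, Status> processedKeys = new ConcurrentHashMap<>();
    private final ExecutorService executor = Executors.newCachedThreadPool();

    public Status submit(String idempotencyKey, long amountCents) {
        // Idempotence: the same client message must be posted at most once,
        // even if the app retries it three times over a flaky network.
        if (processedKeys.putIfAbsent(idempotencyKey, Status.POSTED) != null) {
            return Status.DUPLICATE; // "I already know this message."
        }

        // Graceful degradation: if the real-time credit check does not answer in time,
        // park the transaction in a safe waiting state instead of failing the application.
        Future<Boolean> check = executor.submit(() -> realTimeCreditCheck(amountCents));
        try {
            boolean approved = check.get(2, TimeUnit.SECONDS);
            return approved ? Status.POSTED : Status.PENDING_REVIEW;
        } catch (TimeoutException | ExecutionException e) {
            check.cancel(true);
            processedKeys.put(idempotencyKey, Status.PENDING_REVIEW);
            return Status.PENDING_REVIEW; // safe waiting state for the back office
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Status.PENDING_REVIEW;
        }
    }

    // Placeholder for the external scoring call; assumed to exist in the real system.
    private boolean realTimeCreditCheck(long amountCents) {
        return true;
    }
}
```

What the functional test pins down is the contract, not the plumbing: a repeated idempotency key must never post twice, and a slow credit check must degrade into a reviewable waiting state rather than an error.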

Without these mechanisms, technical trivialities cause chaos.


The reality: hybrid architecture between old and new

Modern banking IT is no longer a monolithic block, but a highly complex network: COBOL mainframes form the foundation, cloud services the infrastructure and React frontends the interface to the customer.

In this hybrid architecture - often made harder still by the coordination of globally distributed teams (offshore/nearshore) and ongoing cloud migrations - quality assurance is no longer a purely technical issue. It becomes strategic risk management. Three examples of why the interplay is critical:

  • The interpreter effect (legacy integration): Modern apps communicate via JSON/REST, while the mainframes in the backend still calculate in EBCDIC-based formats. Functional tests are the only reliable interpreter here. One conversion error at this interface - without a strict plausibility check - and €100.00 quickly becomes €1.00 or €10,000.00 (see the sketch after this list).
  • The regulatory constraint (DORA): With the Digital Operational Resilience Act, system stability is a legal requirement. Banks must prove that they not only have their own software under control, but also the risks of their third-party ICT providers (cloud, SaaS). Those who fail to test here no longer risk just bugs, but severe fines.
  • Real-time stress (instant payments): The world no longer waits for the nightly batch run; customers demand transactions within seconds. This forces systems that were designed for "office hours" into 24/7 operation. Functional tests must prove that concurrent access (race conditions) does not lead to data inconsistencies - a complexity that makes purely manual testing virtually impossible.
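The scale drift from the first bullet is cheap to guard against. Below is a minimal Java sketch of such a plausibility check; the class name, the method signature and the "implied scale" convention are illustrative assumptions, since real copybook layouts vary:

```java
import java.math.BigDecimal;

/** Minimal sketch of a plausibility check at the JSON/REST <-> mainframe boundary. */
public final class AmountPlausibility {

    /**
     * The REST layer carries "100.00" as a decimal string; mainframe copybooks often
     * carry the same value as an integer with an implied decimal scale. A wrong
     * implied scale is exactly how €100.00 turns into €1.00 or €10,000.00.
     */
    static boolean matches(String jsonAmount, long hostUnscaledValue, int impliedScale) {
        BigDecimal fromJson = new BigDecimal(jsonAmount);
        BigDecimal fromHost = BigDecimal.valueOf(hostUnscaledValue, impliedScale);
        return fromJson.compareTo(fromHost) == 0;
    }

    public static void main(String[] args) {
        // Correct mapping: "100.00" corresponds to 10000 with an implied scale of 2.
        System.out.println(matches("100.00", 10_000, 2));  // true

        // Scale drift by a factor of 100 in either direction - the defect the test must catch:
        System.out.println(matches("100.00", 10_000, 4));  // false: reads back as 1.0000
        System.out.println(matches("100.00", 10_000, 0));  // false: reads back as 10000
    }
}
```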

Figure: Risk-based testing as a consequence of the quality pyramid.


Risk-based testing: the courage to leave gaps

Testing everything is impossible; anyone who promises this has no budget awareness. Resources are finite, which is why we use risk-based testing in test management - also with a view to DORA:

We prioritize mercilessly according to damage potential:

  • The super-GAU (worst-case scenario): An error in interest calculation or SWIFT transfers is unacceptable. This is a matter of liability, regulation and the very existence of the bank.
  • Image damage: A glitch in the profile picture upload or inconsistent menus may not be system-critical, but they destroy the user experience. If the app feels "cheap", trust in the brand suffers massively.
  • The gateway: Unstable functions often hide security vulnerabilities (e.g. missing input validation). At these critical points, we seamlessly integrate our security testing services to close attack surfaces immediately.

The decision is strategic: in consultation with the stakeholders, we secure the critical path (cash flow & data) with maximum rigor. For UX issues, we weigh the trade-offs and accept a documented, calculated residual risk as long as the core operation (the safety of funds) is not jeopardized.
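As a sketch of what "prioritizing by damage potential" can look like in practice, the following Java example ranks test areas by a classic impact-times-likelihood score. The areas and figures are invented for illustration, not taken from a real assessment:

```java
import java.util.Comparator;
import java.util.List;

/** Minimal sketch: ranking test areas by damage potential (figures invented). */
public class RiskBasedPrioritization {

    // A classic risk score: impact x likelihood, each rated from 1 (low) to 5 (high).
    record TestArea(String name, int impact, int likelihood) {
        int riskScore() { return impact * likelihood; }
    }

    public static void main(String[] args) {
        List<TestArea> backlog = List.of(
                new TestArea("Interest calculation", 5, 3),
                new TestArea("SWIFT transfers", 5, 2),
                new TestArea("Input validation at public endpoints", 4, 3),
                new TestArea("Profile picture upload", 1, 4),
                new TestArea("Menu consistency", 1, 2)
        );

        // Spend the test budget from the top of this ranking downwards.
        backlog.stream()
                .sorted(Comparator.comparingInt(TestArea::riskScore).reversed())
                .forEach(a -> System.out.printf("%-40s risk = %2d%n", a.name(), a.riskScore()));
    }
}
```

Everything above the budget cut-off receives maximum rigor; everything below it becomes the documented residual risk described above.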


Conventional vs. resilient approach

Characteristic | Conventional testing (compliance focus) | Resilient testing (business continuity focus)
Basic assumption | "The system works as long as no error occurs." | "The system will fail - the only question is when and how."
Test scenarios | Happy path: we test what is in the specification (target state). | Chaos engineering: we test what is not in the specification (network failure, slow third-party providers).
Test data | Synthetic: clean "lab data" that fits the logic perfectly. | Realistic: "dirty" production data, historical legacy, encoding conflicts.
Success metric | Green lights: "All 500 test cases passed." (a false sense of security) | Time-to-recovery: "How quickly does the system recover after an error?" (real stability)
Attitude towards errors | Errors must be prevented. | Errors must be mitigated and understood (traceability).


Trust through control

Ultimately, functional testing is an insurance policy against the reputational super-GAU. Customers expect transactions to work like electricity from the wall socket. When banking systems "just work", it is not magic; it is because tests were running in the background, anticipating and absorbing the chaos.

Quietly, efficiently and without applause. Just as it should be.

Get in touch with us - we can help you!