Human testers in the age of AI

Written by Florian Fieber | Thursday, 13.2.2025
At the latest with the launch of ChatGPT at the end of 2022, artificial intelligence reached a significant milestone: it is now generally available and usable. This development has caused disruption in many areas, including software development and software testing. This article looks at the impact of AI on software testing and shows that AI plays a significant role both as a test object and as a means of testing.
 
When testing AI systems, new test procedures must be applied, as AI systems behave differently from traditional systems. At the same time, AI can support or even completely take over testing activities, which increases efficiency.
 
Humans will still have to play a central role in the testing process, especially in the quality assessment of AI. The use of AI in testing leads to higher productivity in software development, as development cycles can be shortened and more software can be produced in less time.
 

Overall, it is clear that AI will change the job profile of testing, but that humans will remain indispensable. The combination of human expertise and AI support will lead to more efficient and productive software development.

AI is here - for real now!

At the latest with the launch of ChatGPT at the end of 2022, artificial intelligence in general, and generative AI in particular, arrived on everyone's radar. Even though the topic is only now being widely recognized, it is of course not new and has been researched and developed for decades.

The main innovation, however, is that AI is now generally available and, above all, usable and accessible. AI is really here now; development is exponential, and we can only guess where the journey will take us. In any case, it is what we typically refer to as disruption: software development has already changed significantly and will change even more in the coming years.

The importance of AI for software testing

This disruption also affects our discipline, software testing. After the initial euphoria about the new possibilities and opportunities that came with the launch of ChatGPT, we are now increasingly recognizing the (actual and potential) impact AI will have on our profession, and that there are also legitimate doubts and risks.

Testing will be influenced by AI in two main areas:

Testing AI: Our portfolio of test objects is expanding to include completely new representatives: in addition to "classic" systems, we are now also dealing with AI systems whose quality should be evaluated. On the one hand, these can be stand-alone AI systems, on the other hand - and this will be the case much more frequently - they can be AI-based systems or components that are integrated with classic systems or components and thus AI technology will have an influence on overall systems and end-to-end processes. Since AI systems behave differently than we are used to from classic systems (among other things, they are non-deterministic and their behavior is probabilistic), we have to adapt our test procedures or use new test procedures for this.

Testing with AI: AI is not only a test object, it is also a means of testing. Test activities can be partially supported or perhaps even completely handled by AI. Testing, previously a purely human activity, will change as a result and will in future be carried out in interaction between humans and (intelligent) machines.

What is the challenge?

If I ask a person what the most important skill of a software tester is, they will (have to) give me exactly one answer - it is a question to which there can only be one answer. If I ask them the question again, they should give me the same answer (at least if I repeat the question immediately and the world of tester skills has not fundamentally changed in that short time), e.g. "accuracy".

If I now ask ChatGPT this question, I also get an answer. If I ask the question a second time, I may get a different answer; the third time, the same one again; and so on. I asked ChatGPT the question one hundred times and received a total of 16 different answers (see Figure 1). This behavior differs from what we would expect from a human - there can only be one answer to the question "What is the most important ...".

Figure 1: ChatGPT's answers to the question about the most important skill of a software tester and frequency of answers for one hundred repetitions.
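
How easily this behavior can be reproduced is shown by the following minimal sketch, assuming the OpenAI Python SDK and an API key; the model name and the exact prompt wording are illustrative assumptions, not necessarily the setup behind Figure 1:

```python
from collections import Counter

from openai import OpenAI  # assumes the openai Python SDK and an OPENAI_API_KEY in the environment

client = OpenAI()
QUESTION = "What is the most important skill of a software tester? Answer with a single term."

answers = []
for _ in range(100):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice, not necessarily the one behind Figure 1
        messages=[{"role": "user", "content": QUESTION}],
    )
    answers.append(response.choices[0].message.content.strip().lower())

# Count how many distinct answers the model produced across 100 runs.
for answer, count in Counter(answers).most_common():
    print(f"{count:3d}x  {answer}")
```

The same question, asked one hundred times, yields a frequency distribution rather than a single answer - exactly the probabilistic behavior described above.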

If we work with AI, we have to deal with exactly this kind of behavior, either because we are testing an AI or because we are using an AI to generate test resources. If, for example, we use a language model such as ChatGPT to generate test data, we have to ask ourselves to what extent the results are valid and whether we can actually trust them. We need to be able to question the results and recognize when they are not valid.

Humans vs. AI in testing

Both areas - testing AI and testing with AI - will have a massive impact on the testing profession and require the right skills from testers. When it comes to testing with AI, the question arises as to the extent to which AI can and should take over testing and what need there will still be for humans.

I would like to explore this question further below and examine two theses in more detail:

1. AI as an assistant: The primary benefit of AI in testing lies in increasing the efficiency of testing activities and not in replacing humans.

2. Human-in-the-loop: Humans are absolutely needed for the quality assessment of AI; we cannot and must not replace them.

Greater efficiency in testing

AI can take over many testing tasks

AI can already take over many testing activities to varying degrees, especially where repeatable, easily automated and generative activities are involved, as well as text-intensive activities in which large amounts of text are analyzed and evaluated. All activities in the testing process can thus be supported by AI, e.g.

  • Analyzing requirements and identifying acceptance criteria
  • Developing test ideas and generating test cases
  • Generating and transforming test scripts
  • Generating test data (see the sketch after this list)
  • Analyzing, prioritizing and optimizing test portfolios
  • Analyzing test results and identifying patterns
  • Evaluating and classifying defects
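
To make one of these activities concrete, here is a minimal sketch of LLM-based test data generation, again assuming the OpenAI Python SDK; the prompt, model and record schema are hypothetical. The checks at the end are the human-defined part: the generated output is never taken on trust:

```python
import json

from openai import OpenAI  # assumes the openai Python SDK; any LLM client works analogously

client = OpenAI()

PROMPT = """Generate 5 test records for a customer registration form as a JSON array.
Each record has: name (string), email (string), age (integer between 18 and 99).
Include at least one boundary value for age. Return only the JSON array."""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative model choice
    messages=[{"role": "user", "content": PROMPT}],
)

# The human share of the work: question the result instead of trusting it blindly.
records = json.loads(response.choices[0].message.content)
assert len(records) == 5, "wrong number of records"
for record in records:
    assert isinstance(record["age"], int) and 18 <= record["age"] <= 99, "age out of range"
    assert "@" in record["email"], "implausible email address"
```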

With the help of AI, we don't start our tasks from scratch. AI can easily take over 80% of the preparatory work, leaving humans to "only" take care of the remaining 20%. Overall, test tasks can be carried out much more efficiently and quickly - we have to invest less human labor to solve a problem.

However, AI does not release us from being well-trained and experienced testers. For the remaining 20% of the work, we have to master our craft and need the right skills to assess, evaluate and, if necessary, revise the quality of the 80% of preliminary work.

This already shows that the greater danger for human testers is not the AI itself, but other people who are better at testing and know how to deal with AI.

Increased efficiency leads to higher resource requirements

The use of AI thus makes it possible to save resources (human labor) when carrying out test activities: we need fewer people to carry out the same tasks. If the amount of work remained the same, we would indeed need fewer people in the long term.

However, we will reinvest the resources gained through the increase in efficiency in order to be able to carry out more tasks - the increase in efficiency therefore leads to higher resource requirements. In economics, this is known as the "Jevons paradox" and, in a broader sense, the "rebound effect".

Figure 2: In 1865, William Stanley Jevons found that England's coal consumption increased after the introduction of James Watt's coal-fired steam engine, even though it was much more efficient than Thomas Newcomen's earlier version. Watt's innovations made coal a cheaper source of energy and led to an increase in the use of his steam engine in transportation and other industries. This led to an overall increase in coal consumption, even though the specific consumption of each individual application decreased at the same time. (Source: W. Jevons; drawing by Florian Arnd at de.wikipedia, Public domain, via Wikimedia Commons, [1])

This effect can be observed in many areas: a faster rail connection between city A and city B, for example, leads to increased use of that connection. We also know the effect in our own discipline, for example in test automation. It is intended to reduce the effort for repeated manual tests; the expected savings are less manual test effort, faster tests and more efficient test processes.

Although automation increases the efficiency of test execution, the increased need for human labor in the form of scripting, maintenance, training and supplementary testing can lead to more resources (especially time and personnel) being required overall.

When using AI in testing, it is therefore reasonable to assume that we will see similar effects in terms of human labor. The efficiency gain will lead to faster and/or more frequent testing, and human testers will be able to focus more on the things they do better than the machine. The need for humans will not decrease.

Increased efficiency in testing leads to higher productivity in software development

The trend in software development over the last few decades is that we are producing more and more software in ever shorter cycles. This is made possible not least by greater efficiency in testing - automation and shift-left approaches are key pillars of efficient software development. Increased efficiency in testing through AI will therefore help to further shorten development cycles and thus produce more software in less time. This will not reduce the need for people.

Figure 3: Global trends in software sales development and the need for software developers (source: [2])

In addition, the use of AI in testing enables us to do things that we would otherwise not do at all, as the benefits would be offset by a very high level of manual effort. An example: we have a very large portfolio of automated tests using Selenium, but now want to use a new test automation framework, e.g. Playwright.

The manual effort involved in migrating the test scripts from Selenium to Playwright is very high and can tie up a lot of resources that are more urgently needed elsewhere. With the help of AI, such a migration can be made much more efficient or even possible in the first place.
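
To illustrate the mechanics, here is a deliberately tiny before-and-after of the kind of one-to-one translation such a migration consists of (URL and selectors are hypothetical); an LLM can perform exactly this transformation at scale, with a human reviewing the result:

```python
# Before: a login check written with Selenium.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.com/login")
driver.find_element(By.ID, "username").send_keys("alice")
driver.find_element(By.ID, "password").send_keys("secret")
driver.find_element(By.CSS_SELECTOR, "button[type='submit']").click()
assert "Dashboard" in driver.title
driver.quit()

# After: the same check translated to Playwright's sync API.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com/login")
    page.fill("#username", "alice")
    page.fill("#password", "secret")
    page.click("button[type='submit']")
    assert "Dashboard" in page.title()
    browser.close()
```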

AI can help ensure that testing is no longer a bottleneck

Before we consider whom we could replace with AI, let's take a look at where we stand today. In testing, we often face two challenges where AI can help us fulfill our actual duty (being effective) before we even think about increasing efficiency:

  • We have had a severe skills shortage in our discipline for years, and the trends do not suggest that this will change any time soon. Tasks that go undone because people are simply not available can potentially be supported or taken over by AI, thereby alleviating the skills shortage to some extent.
  • In many organizations or projects, testing is still a bottleneck, either because the development and testing processes are immature or simply because too few resources (time, budget and personnel) are made available. It often takes a great deal of effort to actually achieve a desired level of quality. AI can help us fulfill our duty in the first place and achieve a minimum level of coverage.

 

Humans cannot and must not be replaced

Do we really want to replace testing with AI?

Before we investigate further whether human testers can be replaced, we should consider whether we want this at all. Here's a little thought experiment: the process of software development can typically be broken down into different phases or tasks: analysis, design, development and testing. As things stand today, all of these tasks are carried out by people (see Figure 4).

Figure 4: Typical activities in software development.

Assuming we were to have three of these tasks performed by AI in the future - which would we choose, or for which task would we like to keep humans?

Intuitively, following our instincts, we would probably decide not to let AI take over testing completely, or at least to hand it over last (see Figure 5). Why is that? Probably because it simply feels wrong to take humans out of the equation when evaluating quality and deciding whether software goes into production.

Figure 5: Which activities in software development do we want to replace with AI - and which not?

We develop software for people - not for machines

Most systems are developed for use by humans, not by machines. A system that is to be used by humans also requires testing by humans, as only humans can judge whether it meets their needs.

Usability and accessibility are examples of quality criteria to be tested. These are inherently aimed at people and can therefore only be conclusively assessed by people.

Testing is more than automation

Many testing activities can be usefully automated - but by no means all of them. Analyzing and evaluating text-intensive content, generating test cases or test data, and executing and evaluating tests are relevant activities that can be readily automated, but there is much more to testing.

People are a key factor in test projects; it's about culture, mindset, communication and collaboration. People bring their intuition, their instinct - in other words, their common sense. They have a good sense of "just enough" and are able to make decisions. All of these things cannot (at least not yet) be automated or mapped by an AI.

Humans have strengths where AI has weaknesses

AI is not the solution for everything. For example, while Large Language Models (LLMs) show impressive capabilities and benefits for a range of use cases, they also have significant shortcomings that affect their ability to understand and reason. They thus have limitations in terms of the breadth and depth of their isolated applicability without human supervision (see S. Williams, J. Huckle: Easy Problems That LLMs Get Wrong, arXiv:2405.19616, [5]):

  • Linguistic comprehension: LLMs often misinterpret human language or overlook important nuanced meanings. This leads to inaccuracies or misjudgments in linguistic tasks.
  • Common sense: LLMs cannot physically perceive the world, lacking visual, auditory and tactile stimuli and experiences. This disembodied state limits their ability to learn the subtleties of common sense.
  • Contextual understanding: LLMs lack the ability to think contextually. However, correct thinking is closely linked to the ability to understand the often implicit context in which something is situated.
  • Visual-spatial thinking: LLMs lack basic spatial awareness, the ability to mentally visualize objects and understand their relationships in space.
  • Mathematical thinking: LLMs have weaknesses in performing simple mathematical thought processes. Although they can often give correct answers to challenging mathematical questions, they have to outsource calculations to other tools (e.g. code or calculators).
  • Popular science knowledge: LLMs are susceptible to the spread and amplification of inaccuracies contained in their training data, including scientific misconceptions or outdated information that is often spread online.
  • Relational understanding: LLMs have difficulty understanding and interpreting temporal, causal and conceptual relationships between entities. Solving these problems often requires human understanding and intuition.
  • Logical reasoning: LLMs are trained with knowledge that does not guarantee competence in logical reasoning at the time of inference. Human thinking can only be imitated to a certain extent.
  • Overfitting: LLMs tend to adapt to the peculiarities of the training data at the expense of broader generalization. Pre-trained models are excellent at interpolating within the limits of their training data, but extrapolating outside these limits can be more difficult.

These limitations are by no means intended to deny the possibilities and potential of AI in testing, but they should make it clear that AI requires a critical approach and poses challenges in testing. Ultimately, we need people with their critical thinking, intuition and ability to think outside the box - which, incidentally, are typical and important skills of testers.

The limitations of AI create challenges in testing where humans are required, e.g.

  • Hallucinations of Generative AI lead to dubious results that need to be recognized and corrected.
  • The strongly black-box character of AI leads to a lack of transparency, explainability and trustworthiness.
  • There is a strong dependence on the quality and availability of data for training, validation and testing of AI systems.
  • AI systems are non-deterministic: we are dealing with probabilities, and testing is no longer a purely binary pass/fail decision (see the sketch after this list).
  • Non-determinism leads to a significant intensification of the test oracle problem, which must be countered with suitable test methods and procedures.
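
For the non-determinism point, a common countermeasure is to replace the binary verdict with a statistical acceptance criterion: run the test many times and require a minimum pass rate. A minimal sketch, with a stubbed system under test and hypothetical sample size and threshold:

```python
import random


def run_system_under_test() -> bool:
    """Stand-in for one invocation of a non-deterministic AI component.

    Returns True if this single run meets the expectation.
    """
    return random.random() < 0.93  # stub; the real call would go here


N = 200                    # sample size (hypothetical)
REQUIRED_PASS_RATE = 0.90  # acceptance threshold (hypothetical)

passes = sum(run_system_under_test() for _ in range(N))
pass_rate = passes / N
print(f"pass rate over {N} runs: {pass_rate:.1%}")
assert pass_rate >= REQUIRED_PASS_RATE, "statistical acceptance criterion violated"
```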

 

Testing of and with AI requires humans

Determining the quality of an AI matters not only when testing the AI itself, but also when using AI for testing. Certain quality characteristics, such as freedom from bias, ethics, fairness, transparency and explainability, can only be assessed by humans.

The testing of AI can be partially automated by machines, but humans are still largely required to assess quality. Examples include evaluating training, validation and test data and their quality, addressing the test oracle problem (e.g. through suitable test procedures such as metamorphic testing, A/B testing, back-to-back testing and exploratory testing) and evaluating usability and accessibility.
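
Metamorphic testing, for example, sidesteps the missing oracle by checking relations between outputs instead of comparing against a single expected value. A minimal sketch with a hypothetical sentiment classifier (the keyword stub merely stands in for the real model under test, e.g. an LLM call):

```python
def classify_sentiment(text: str) -> str:
    """Stand-in for the model under test; in reality this would call the AI system."""
    positive = ("fast", "pleasant", "great")
    negative = ("slow", "broken", "awful")
    text = text.lower()
    score = sum(word in text for word in positive) - sum(word in text for word in negative)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


def test_neutral_suffix_does_not_flip_sentiment() -> None:
    # Metamorphic relation: appending an irrelevant, neutral sentence
    # must not change the sentiment - we never need to know the single
    # "correct" answer for either input.
    source = "The new release is fast and pleasant to use."
    follow_up = source + " The report was written on a Tuesday."
    assert classify_sentiment(source) == classify_sentiment(follow_up)


test_neutral_suffix_does_not_flip_sentiment()
print("metamorphic relation holds")
```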

EU AI Act: we are not allowed to replace humans

With the EU AI Act, the European Union has introduced a regulation on the safe and ethical use of artificial intelligence. The aim is to promote innovation, strengthen trust and minimize risks when using AI. With regard to humans, the EU AI Act requires that "AI systems should be overseen by people, rather than by automation, to prevent harmful outcomes" [6].

Humans must not be taken out of the equation here; on the contrary, humans must play a central role in the development of AI systems.

However, this also requires the people involved to be appropriately qualified. This is required by Article 4 ("AI literacy") of the EU AI Act: "Providers and deployers of AI systems shall take measures to ensure, to their best extent, a sufficient level of AI literacy of their staff and other persons dealing with the operation and use of AI systems on their behalf [...]" [7]. From this we can derive a direct mandate for us as testers to engage with AI and undergo further training.

How will the testing profession change?

In view of the rapid developments in AI technologies and the enormous potential for the future, it should be obvious that it is not a question of whether the job profile of testing will change, but how - and we should adapt to this.

As outlined above, in my view the primary benefit of AI in testing is to increase the efficiency of testing activities, not to replace humans. Humans are absolutely essential for the quality assessment of AI; we cannot and must not replace them.

AI should support people - not people support AI

AI is a technology that was created and is being further developed by humans, and it should primarily benefit humans. The idea that development towards an AI superior to humans is inevitable is not only dystopian, it also ignores the fact that humans themselves control how they shape their future.

Figure 6: Brave new AI world? (Source: [3])

Of course, we can only have a limited influence on how AI will continue to establish itself worldwide, driven by the big tech companies. But we should at least be able to influence within our discipline whether human testers become AI assistants or vice versa.

Do we want a future in which AI does all the cool testing tasks for us and we only have to click "Start" and "Stop", or would it not be much more pleasant if AI assisted us with all the dull, boring, tedious, time-consuming and repetitive tasks, making testing even more challenging and attractive for us? I think the future of testing is in our own hands, and we should shape it accordingly.

More concept knowledge instead of product knowledge

Regardless of this very big (and admittedly rather philosophical) question, we will see a shift in the focus of our work from tactical (how do I need to test?) to strategic (why do I need to test?) - at least if we don't want AI to take over our job. We can already see how LLMs are very good at solving very specific and technical tasks.

If I am only good at writing test scripts in a specific test automation framework, AI is already a real competitor for me today. This tactical knowledge (also called product knowledge) will be the first to be replaced by AI. It is therefore advisable to move away from product knowledge towards concept knowledge and focus more on the "what" and "why" of testing.

Should the singularity actually come at some point in the distant future, then none of this will matter; the fact that software exists and needs to be tested will no longer play a role. However, this should remain a rather unlikely and - see above - dystopian outlook.

Figure 7: In our tasks as testers, we have to deal with questions and tasks at different levels of abstraction. Concrete, tactical skills (product knowledge) will be replaced by AI sooner than abstract, strategic skills (concept knowledge).

AI as junior, humans as senior?

Does this shift from tactical to strategic tasks also mean that AI will take on junior tasks and humans will take on senior tasks? This sounds quite attractive at first, but it can't work because I can't become a senior without having had the experience of a junior (see Figure 8).

Figure 8: Is AI taking independent learning and thinking away from us? (Dall-E generated image, cartoon based on [4]).

AI will never relieve us of the task of evaluating, classifying and critically analyzing. I can't do all that if I've never done it myself. In order to be able to work conceptually, delegate work and evaluate results and make well-founded decisions, I need to have gained the relevant experience myself and have worked as a junior.

If we rely too heavily or even exclusively on machine support, we run the risk of unlearning important skills (like a driver who, thanks to the many assistance and navigation systems, unlearns how to navigate and make decisions independently) or of having to trust blindly (cf. the automation bias in test automation).

AI as a sparring partner in testing

A more apt description of the collaboration between human testers and AI is that of a sparring partner. The AI is the tester's sparring partner: it challenges testers, helps them develop and improve, and prepares them for their "battle".

Another analogy is the pilot-co-pilot relationship in a cockpit: both pilot and co-pilot are in principle capable of piloting an aircraft on their own, they are appropriately qualified and trained. However, they work together, supporting and controlling each other. Following this metaphor, the human tester should be the pilot and the AI the co-pilot.

Skills for testers

A good tester will achieve better results even faster with the help of AI - but a bad tester will also achieve worse results even faster. AI will not take our tester jobs away from us; rather, someone who knows how to use AI effectively will. Or to put it positively: testers who have mastered the craft of testing and who also know how to both test AI and use it will significantly increase their market value.

The most important foundation is and remains to follow the basic principles, processes and methods of testing as we have known and successfully applied them for many years. We cannot throw AI at the test problem and be done with it. Tester skills and experience were, are and will remain the key to successful testing.

We now need to complement these skills with basic skills in AI systems and technologies, e.g.

  • Knowledge of machine learning, data quality analytics, etc.
  • Knowledge of the use of Generative AI and the interpretation of its results (e.g. prompt engineering/crafting).
  • Knowledge of relevant test strategies and methods for testing AI and testing with AI.
  • Further development from product knowledge to concept knowledge.

 

Humans will not be replaced

By increasing efficiency and productivity through the use of AI, we will produce more and more software in ever shorter cycles - and we will absolutely need people for that. To prevent human testers from becoming a bottleneck, I can well imagine that we will need even more people in testing than before.

Human-in-the-loop and machine-in-the-loop

The future we should prepare for is not an either-or of working with or without AI. We will see different levels of autonomy in different areas: in the testing process, different testing activities will be supported by AI to varying degrees.

The activities that already have a high degree of automation today will be the first and most strongly supported by AI tools and assistants (especially test execution). Other test activities, on the other hand, will retain a high proportion of manual work (especially test management).

Human testers and machines will work together in a complementary way; in some areas, the human will be in the lead - supported by the machine - and in other areas, the machine will be in the lead - with the human-in-the-loop (see Figure 9).

Figure 9: Humans and machines complement each other in testing. Sometimes the human leads, supported by the machine; sometimes the machine leads, supported and monitored by the human.





Sources

[1] https://de.wikipedia.org/wiki/Jevons-Paradoxon

[2] https://de.statista.com/outlook/tmo/software/weltweit

[3] https://indiepocalypse.social/@AuthorJMac/112178826967890119

[4] https://www.spiegel.de/fotostrecke/cartoon-des-tages-fotostrecke-142907.html

[5] https://arxiv.org/abs/2405.19616

[6] https://www.europarl.europa.eu/topics/de/article/20230601STO93804/ki-gesetz-erste-regulierung-der-kunstlichen-intelligenz

[7] https://artificialintelligenceact.eu/de/article/4/