Overall, it is clear that AI will change the job profile of testing, but that humans will remain indispensable. The combination of human expertise and AI support will lead to more efficient and productive software development, in which humans continue to play a central role.
At the latest with the launch of ChatGPT at the end of 2022, artificial intelligence in general, and generative AI in particular, has arrived on everyone's radar. Even though the topic is only now receiving broad attention, it is of course not new; it has been researched and developed for decades.
What is genuinely new, however, is that AI is now generally available and, above all, usable and accessible. AI is really here, developments are progressing exponentially, and we can only guess where the journey will take us. In any case, it is what we typically call disruption: software development has already changed significantly and will change even more in the coming years.
This disruption also affects our discipline, software testing. After the initial euphoria about the new possibilities and opportunities that came with the launch of ChatGPT, we are now increasingly recognizing the (actual and potential) impact AI will have on our profession, and that there are also legitimate doubts and risks.
Testing will be influenced by AI in two main areas:
Testing AI: Our portfolio of test objects is expanding to include completely new representatives: in addition to "classic" systems, we are now also dealing with AI systems whose quality needs to be evaluated. These can be stand-alone AI systems, but much more frequently they will be AI-based systems or components that are integrated with classic systems or components, so that AI technology influences overall systems and end-to-end processes. Since AI systems behave differently from the classic systems we are used to (among other things, they are non-deterministic and their behavior is probabilistic), we have to adapt our test procedures or use new ones.
Testing with AI: AI is not only a test object, it is also a means of testing. Test activities can be partially supported or perhaps even completely taken over by AI. Testing, previously a purely human activity, will change as a result and will in future be carried out in an interplay between humans and (intelligent) machines.
If I ask a person what the most important skill of a software tester is, they will (have to) give me exactly one answer - it is a question to which there can only be one answer. If I ask them the question again, they should give me the same answer (at least if I repeat the question directly and the world of tester skills has not fundamentally changed in this short period of time), e.g. "accuracy".
If I now ask ChatGPT this question, I also get an answer. If I ask the question a second time, I may get a different answer, the third time the same one again, and so on. I asked ChatGPT the question one hundred times and received a total of 16 different answers (see Figure 1). This behavior is different from what we would expect from a human - there can only be one answer to the question "What is the most important ...?".
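This little experiment is easy to reproduce. The following is a minimal sketch of how it could be scripted, assuming the openai Python package and an API key are available; the model name, the exact prompt wording and the number of repetitions are illustrative choices, and the number of distinct answers will differ from run to run.

```python
# Minimal sketch: ask an LLM the same question repeatedly and count
# how many distinct answers come back. Assumes the `openai` package
# and an API key in the environment; the model name is a placeholder.
from collections import Counter
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
question = ("What is the single most important skill of a software tester? "
            "Answer with one term only.")

answers = Counter()
for _ in range(100):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": question}],
    )
    answers[response.choices[0].message.content.strip().lower()] += 1

print(f"{len(answers)} distinct answers out of {sum(answers.values())} runs")
for answer, count in answers.most_common():
    print(f"{count:3d}x {answer}")
```

Even a low temperature setting only reduces this variation; it does not turn the model into a deterministic oracle.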
If we work with AI, we have to deal with exactly this kind of behavior, either because we are testing an AI or because we are using an AI to generate test resources. If, for example, we use a language model such as ChatGPT to generate test data, we have to ask ourselves to what extent the results are valid and whether we can actually trust them. We need to be able to question the results and recognize when they are not valid.
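What such questioning can look like in practice is sketched below: before LLM-generated test data enters a test, it is checked against rules that we define and control ourselves. The record structure (name, email, birth date) and the individual checks are purely illustrative assumptions; the point is that machine-generated artifacts are validated independently before we trust them.

```python
# Minimal sketch: plausibility checks for LLM-generated test data.
# The expected record structure (name, email, birth date) is an
# illustrative assumption, not a fixed schema.
import json
import re
from datetime import date

def validate_record(record: dict) -> list[str]:
    """Return a list of findings; an empty list means the record passes."""
    findings = []
    if not record.get("name"):
        findings.append("missing name")
    if not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", record.get("email", "")):
        findings.append(f"implausible email: {record.get('email')!r}")
    try:
        birth = date.fromisoformat(record.get("birth_date", ""))
        if not (1900 <= birth.year <= date.today().year):
            findings.append(f"implausible birth date: {birth}")
    except ValueError:
        findings.append(f"unparseable birth date: {record.get('birth_date')!r}")
    return findings

# `llm_output` stands for the raw text returned by the language model.
llm_output = '[{"name": "Ada Lovelace", "email": "ada@example.org", "birth_date": "1815-12-10"}]'
for record in json.loads(llm_output):
    problems = validate_record(record)
    status = "OK" if not problems else "REVIEW: " + "; ".join(problems)
    print(record.get("name"), "->", status)
```

Records flagged for review are exactly the cases where a human tester has to take a second look.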
Both areas - testing AI and testing with AI - will have a massive impact on the testing profession and require the right skills from testers. When it comes to testing with AI, the question arises as to the extent to which AI can and should take over testing and what need there will still be for humans.
I would like to explore this question further below and examine two theses in more detail:
1. AI as an assistant: The primary benefit of AI in testing lies in increasing the efficiency of testing activities, not in replacing humans.
2. Human-in-the-loop: Humans are absolutely needed for the quality assessment of AI; we cannot and must not replace them.
AI can already take over many testing activities to varying degrees, especially repeatable, easily automated and generative activities, as well as text-intensive activities in which large amounts of text are analyzed and evaluated. All activities in the testing process can thus be supported by AI, e.g. generating test cases and test data, analyzing text-intensive artifacts, or executing and evaluating tests.
With the help of AI, we don't start our tasks from scratch. AI can easily take over 80% of the preparatory work, leaving humans to "only" take care of the remaining 20%. Overall, test tasks can be carried out much more efficiently and quickly - we have to invest less human labor to solve a problem.
However, AI does not release us from being well-trained and experienced testers. For the remaining 20% of the work, we still have to master our craft and need the right skills to assess, evaluate and, if necessary, revise the quality of the 80% of preparatory work done by the AI.
This already shows that the greater danger for human testers is not the AI itself, but other people who are better at testing and know how to deal with AI.
The use of AI therefore makes it possible to save resources (human labor) when carrying out test activities: fewer people are needed to carry out the same tasks. If the amount of work remained the same, we would indeed need fewer people in the long term.
However, we will reinvest the resources gained through the increase in efficiency in order to carry out more tasks - the increase in efficiency therefore leads to higher resource requirements. In economics, this is known as the "Jevons paradox" [1] and, in a broader sense, the "rebound effect".
This effect can be observed in many areas: a faster rail connection between city A and city B, for example, leads to increased use of that connection. The effect is also well known in our discipline, for example in test automation: test automation is intended to reduce the effort for repeated manual tests, and the expected savings are typically less manual test effort, faster tests and more efficient test processes.
Although automation increases the efficiency of test execution, the increased need for human labor in the form of scripting, maintenance, training and supplementary testing can lead to more resources (especially time and personnel) being required overall.
When using AI in testing, it is therefore reasonable to assume that we will see similar effects in terms of human labor. The efficiency gain will lead to faster and/or more frequent testing, and human testers will be able to focus more on the things they can do better than the machine. The need for humans will not decrease.
The trend in software development over the last few decades is that we are producing more and more software in ever shorter cycles. This is made possible not least by greater efficiency in testing - automation and shift-left approaches are key pillars of efficient software development. Increased efficiency in testing through AI will therefore help to further shorten development cycles and thus produce more software in less time. This will not reduce the need for people.
In addition, the use of AI in testing enables us to do things that we would otherwise not do at all, because the benefit would be outweighed by a very high level of manual effort. An example: we have a very large portfolio of automated tests based on Selenium, but now want to switch to a new test automation framework, e.g. Playwright.
The manual effort involved in migrating the test scripts from Selenium to Playwright is very high and ties up resources that are urgently needed elsewhere. With the help of AI, such a migration can be made much more efficient, or feasible in the first place.
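As a rough sketch of what such a migration involves: the mechanical translation below is the kind of work an LLM can take over, with a human tester reviewing and executing the result before it replaces the original script. The login scenario, the URL and the selectors are purely illustrative assumptions.

```python
# Before (Selenium): a minimal, illustrative login test.
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example.org/login")  # placeholder URL
driver.find_element(By.ID, "username").send_keys("alice")
driver.find_element(By.ID, "password").send_keys("secret")
driver.find_element(By.CSS_SELECTOR, "button[type=submit]").click()
assert "Dashboard" in driver.title
driver.quit()

# After (Playwright): the mechanically translated equivalent that an
# LLM can produce and a human tester then reviews and executes.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.org/login")  # placeholder URL
    page.fill("#username", "alice")
    page.fill("#password", "secret")
    page.click("button[type=submit]")
    assert "Dashboard" in page.title()
    browser.close()
```

Multiplied over hundreds of scripts, this translation work is exactly the kind of repetitive, well-defined task where AI support pays off, while the human keeps the responsibility for verifying that the migrated tests still check the right things.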
Before we consider who we could replace with AI, let's take a look at where we are today. We often have two challenges in testing where AI can help us to fulfill our actual duty (to increase effectiveness) before we even think about increasing efficiency:
Before we investigate further whether human testers can be replaced, we should consider whether we want this at all. Here is a little thought experiment: the process of software development can typically be broken down into different phases or tasks: analysis, design, development and testing. As things stand today, all of these tasks are carried out by people (see Figure 4).
Assuming we were to have three of these tasks performed by AI in the future - which would we choose, or for which task would we like to keep humans?
Purely intuitively, following our gut feeling, we would probably decide not to let AI take over testing at all, or at least to hand it over last (see Figure 5). Why is that? Probably because it simply feels wrong to take humans out of the equation when evaluating quality and deciding on the productive use of software.
Most systems are developed for use by humans and not for use by machines. A system that is to be used by humans also requires testing by humans, because only humans can judge what humans want.
Usability and accessibility are examples of quality criteria to be tested. These are inherently aimed at people and can therefore only be conclusively assessed by people.
Many testing activities can be usefully automated - but by no means all of them. Analyzing and evaluating text-intensive content, generating test cases or test data, and executing and evaluating tests are relevant activities that can be automated well, but there is much more to testing.
People are a key factor in test projects; it's about culture, mindset, communication and collaboration. People bring their intuition, their instinct - in other words, their common sense. They have a good sense of "just enough" and are able to make decisions. All of these things cannot (at least not yet) be automated or mapped by an AI.
AI is not the solution for everything. While Large Language Models (LLMs), for example, show impressive capabilities and benefits for a range of use cases, they also have significant shortcomings that affect their ability to understand and reason. They thus have limitations in terms of the breadth and depth of their isolated applicability without human supervision (see S. Williams, J. Huckle: Easy Problems That LLMs Get Wrong, arXiv:2405.19616 [5]):
These limitations are by no means intended to deny the possibilities and potential of AI in testing, but they should make it clear that AI requires a critical approach and poses challenges in testing. Ultimately, we need people with their critical thinking, intuition and ability to think outside the box - which, incidentally, are typical and important skills of testers.
The limitations of AI lead to challenges in testing where humans are required, e.g.
Determining the quality of an AI plays a role in testing the AI, but is also important when using AI for testing. Certain quality characteristics, such as freedom from bias, ethics, fairness, transparency and explainability, can only be assessed by humans.
The testing of AI can be partially automated by machines, but to a large extent humans are still required to assess quality. Examples of this are evaluating training, validation and test data and their quality, addressing the test oracle problem (e.g. through suitable test procedures such as metamorphic testing, A/B testing, back-to-back testing and exploratory testing), and evaluating usability and accessibility.
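To make one of these procedures more tangible, here is a minimal sketch of a metamorphic test: instead of knowing the exact expected output (the test oracle problem), we check a relation that must hold between the outputs for two related inputs. The classify_sentiment function is a trivial stand-in for the AI component under test; its interface and the tolerance value are illustrative assumptions.

```python
# Minimal sketch of a metamorphic test: we do not know the "correct"
# sentiment score, but we do know that appending a neutral sentence
# must not flip a clearly positive review into a negative one.

def classify_sentiment(text: str) -> float:
    """Trivial keyword-based stand-in for the AI component under test.
    In practice this would call the real model; the score range [-1, 1]
    is an assumption about its interface."""
    positive = sum(word in text.lower() for word in ("fast", "stable", "joy"))
    negative = sum(word in text.lower() for word in ("slow", "crash", "broken"))
    return (positive - negative) / 3.0

def test_neutral_suffix_does_not_flip_sentiment():
    source = "The new release is fast, stable and a joy to use."
    follow_up = source + " The review was written on a Tuesday."

    score_source = classify_sentiment(source)
    score_follow_up = classify_sentiment(follow_up)

    # Metamorphic relation: the sign of the score must not flip, and the
    # score should not change drastically (the tolerance is an assumption).
    assert (score_source >= 0) == (score_follow_up >= 0)
    assert abs(score_source - score_follow_up) < 0.3
```

Which metamorphic relations are meaningful for a given AI system is, again, a judgment that only a human with domain knowledge can make.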
With the EU AI Act, the European Union has presented a regulation on the safe and ethical use of artificial intelligence. The aim is to promote innovation, strengthen trust and minimize risks when using AI. With regard to humans, the EU AI Act requires that "AI systems should be monitored by humans and not by automation in order to prevent harmful outcomes" [6].
Humans must not be taken out of the equation here; on the contrary, humans must play a central role in the development of AI systems.
However, this also requires the people involved to be appropriately qualified. This is mandated in Article 4 "AI literacy" of the EU AI Act: "Providers and operators of AI systems shall take measures to ensure, to the best of their knowledge and belief, that their personnel and other persons involved in the operation and use of AI systems on their behalf have sufficient AI skills [...]" [7]. From this we can derive a direct mandate for us as testers to engage with AI and to undergo further training accordingly.
In view of the rapid developments in AI technologies and the enormous potential for the future, it should be obvious that it is not a question of whether the job profile of testing will change, but how - and we should adapt to this.
As outlined above, in my view the primary benefit of AI in testing is to increase the efficiency of testing activities, not to replace humans. Humans are absolutely essential for the quality assessment of AI, and we must not take them out of the loop.
AI is a technology that was created and is being further developed by humans and should primarily benefit humans. The idea that the development towards an AI that is superior to humans is inevitable is not only dystopian, but also ignores the fact that humans themselves are in control of how they shape their future.
Of course, we can only have a limited influence on how AI will continue to establish itself worldwide, driven by the big tech companies. But we should at least be able to influence within our discipline whether human testers become AI assistants or vice versa.
Do we want a future in which AI does all the cool testing tasks for us and we only have to click "Start" and "Stop", or would it not be much more pleasant if AI assisted us with all the dull, boring, tedious, time-consuming and repetitive tasks, making testing even more challenging and attractive for us? I think that we have it in our own hands and should shape the future of testing in this way.
Regardless of this very big (and admittedly rather philosophical) question, we will see a shift in the focus of our work from tactical (how do I need to test?) to strategic (why do I need to test?) - at least if we don't want AI to take over our job. We can already see how LLMs are very good at solving very specific and technical tasks.
If I am only good at writing test scripts in a specific test automation framework, AI is already a real competitor for me today. This tactical knowledge (also called product knowledge) will be the first to be replaced by AI. It is therefore advisable to move away from product knowledge towards concept knowledge and focus more on the "what" and "why" of testing.
Should the singularity actually come at some point in the distant future, then none of this will matter; the fact that software exists and needs to be tested will no longer play a role. However, this should remain a rather unlikely and - see above - dystopian outlook.
Does this shift from tactical to strategic tasks also mean that AI will take on junior tasks and humans will take on senior tasks? This sounds quite attractive at first, but it can't work because I can't become a senior without having had the experience of a junior (see Figure 8).
AI will never relieve us of the task of evaluating, classifying and critically analyzing. I can't do all that if I've never done it myself. In order to be able to work conceptually, delegate work and evaluate results and make well-founded decisions, I need to have gained the relevant experience myself and have worked as a junior.
If we rely too much or even exclusively on machine support, we run the risk of unlearning important skills (like a car driver who, because of the many assistance and navigation systems, forgets how to navigate and make decisions independently) or of having to trust blindly (cf. automation bias in test automation).
A more apt description of the collaboration between human testers and AI is that of a sparring partner. The AI is the tester's sparring partner: it challenges them and helps them to develop, improve and prepare for their "battle".
Another analogy is the pilot-co-pilot relationship in a cockpit: both pilot and co-pilot are in principle capable of flying an aircraft on their own; they are appropriately qualified and trained. Nevertheless, they work together, supporting and checking each other. Following this metaphor, the human tester should be the pilot and the AI the co-pilot.
A good tester will achieve better results even faster with the help of AI - but a bad tester will also achieve worse results even faster. AI will not take our tester job away from us, but rather someone else who knows how to use AI effectively will take our job. Or to put it positively: Testers who, on the one hand, have mastered the craft of testing and, on the other, know how to both test and use AI will significantly increase their market value.
The most important foundation is and remains to follow the basic principles, processes and methods of testing as we have known and successfully applied them for many years. We cannot throw AI at the test problem and be done with it. Tester skills and experience were, are and will remain the key to successful testing.
We now need to complement these skills with basic skills in AI systems and technologies, e.g.
By increasing efficiency and productivity through the use of AI, we will produce more and more software in ever shorter cycles - and we will still absolutely need people for this. To prevent human testers from becoming a bottleneck, I can well imagine that we will need even more people in testing than before.
The future we should prepare for is not one in which we will either work with or without AI. We will see different levels of autonomy in different areas: In the testing process, different testing activities will be supported by AI to varying degrees.
The activities that already have a high degree of automation today will be the first and most strongly supported by AI tools and assistants (especially test execution). Other test activities, on the other hand, will retain a high proportion of manual work (especially test management).
Human testers and machines will work together in a complementary way; in some areas, the human will be in the lead - supported by the machine - and in other areas, the machine will be in the lead - with the human-in-the-loop (see Figure 9).
[1] https://de.wikipedia.org/wiki/Jevons-Paradoxon
[2] https://de.statista.com/outlook/tmo/software/weltweit
[3] https://indiepocalypse.social/@AuthorJMac/112178826967890119
[4] https://www.spiegel.de/fotostrecke/cartoon-des-tages-fotostrecke-142907.html