Software system testing assisted by large language models: an exploratory study

Bertolino A.;
2025-01-01

Abstract

Large language models (LLMs) based on the transformer architecture have revolutionized natural language processing (NLP), demonstrating excellent capabilities in understanding and generating human-like text. In Software Engineering, LLMs have been applied to code generation, documentation, and report-writing tasks to support developers and reduce manual work. In Software Testing, one of the cornerstones of Software Engineering, LLMs have been explored for generating test code and test inputs, automating the oracle process, and generating test scenarios. However, their application to high-level testing stages such as system testing, which requires deep knowledge of both the business domain and the technology stack, remains largely unexplored. This paper presents an exploratory study of how LLMs can support system test development. Given that LLM performance depends on the quality of the input data, the study focuses on how to query general-purpose LLMs so as to first obtain test scenarios and then derive test cases from them. The study evaluates two popular LLMs (GPT-4o and GPT-4o-mini), using a demonstrator from a European project as a benchmark. It compares two different prompt strategies and employs well-established prompt patterns, showing promising results as well as room for improvement in the application of LLMs to support system testing.
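
The abstract does not disclose the actual prompts or tooling used in the study, but the two-step querying idea it describes (first ask the model for test scenarios, then derive test cases from them) can be sketched roughly as follows with the OpenAI Python client. The prompt wording, the helper function, and the placeholder system description are illustrative assumptions, not the prompt strategies or patterns evaluated in the paper; the model identifiers gpt-4o and gpt-4o-mini are the two models the study reports evaluating.

    # Minimal sketch, assuming the OpenAI Python SDK (openai>=1.0) and an
    # OPENAI_API_KEY set in the environment. Prompts are illustrative only.
    from openai import OpenAI

    client = OpenAI()

    def ask(model: str, prompt: str) -> str:
        """Send a single-turn prompt and return the model's text reply."""
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

    # Step 1: obtain high-level test scenarios from a system description.
    system_description = "..."  # requirements / demonstrator description goes here
    scenarios = ask(
        "gpt-4o",  # the study also evaluates gpt-4o-mini
        "You are a software tester. Given the following system description, "
        "list the system-level test scenarios that should be covered:\n\n"
        + system_description,
    )

    # Step 2: derive concrete test cases (preconditions, steps, test data,
    # expected results) from the scenarios returned in step 1.
    test_cases = ask(
        "gpt-4o",
        "For each of the following test scenarios, write a detailed system "
        "test case with preconditions, steps, test data and expected "
        "results:\n\n" + scenarios,
    )

    print(test_cases)

In practice, the quality of the scenarios and test cases obtained this way depends heavily on how much of the business and technical context is packed into the first prompt, which is exactly the input-quality concern the abstract highlights.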
2025
ISBN: 9783031808883
Large Language Model
Software Testing
System Testing
Test Cases
Test Scenarios

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.12571/35847
Citations
  • PMC: n/a
  • Scopus: 1
  • Web of Science: 0