Abstract
Since the advent of Large Language Models (LLMs) a few years ago, they have not only reached the mainstream but have become a commodity. Their application areas expand steadily because of sophisticated model architectures and enormous training corpora. However, accessible chatbot user interfaces and human-like responses may lead users to overestimate their abilities. This study contributes to demonstrating the strengths and weaknesses of LLMs. In this work, we bridge methods from sub-symbolic and symbolic AI. In particular, we evaluate the capabilities of LLMs to convert textual requirements documents into their logical representation, enabling analysis and reasoning. This task demonstrates a use case close to industry, as requirements analysis is key in requirements and system engineering. Our experiments evaluate the popular model family used in OpenAI's ChatGPT: GPT-3.5 and GPT-4. The underlying goal of testing for the correct abstraction of meaning is not trivial, as the relationship between input and output semantics is not directly measurable. Thus, it is necessary to approximate translation correctness through quantifiable criteria. Most notably, we defined consistency-based metrics for the plausibility and stability of translations. Our experiments give insights into syntactic validity, semantic plausibility, stability of translations, and parameter configurations for LLM translations. We use real-world requirements and test the LLMs' performance out of the box and after pre-training. Experimentally, we demonstrated the strong relation between ChatGPT parameters and the stability of translations. Finally, we showed that even the best model configurations produced syntactically faulty (5%) or semantically implausible (7%) output and are not stable in their results.
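To illustrate the kind of stability measurement the abstract refers to, the sketch below computes a simple majority-agreement ratio over repeated translations of the same requirement. This is a hypothetical, minimal stand-in: the paper's actual consistency-based metrics, normalization steps, and example formulas are not reproduced here.

```python
# Minimal sketch of an agreement-based stability check over repeated LLM
# translations of one requirement. The majority-agreement ratio below is an
# illustrative assumption, not the metric defined in the paper.
from collections import Counter


def normalize(formula: str) -> str:
    """Crude normalization so trivially different renderings compare equal."""
    return " ".join(formula.split()).lower()


def stability(translations: list[str]) -> float:
    """Fraction of runs that agree with the most frequent (normalized) output."""
    counts = Counter(normalize(t) for t in translations)
    most_common_count = counts.most_common(1)[0][1]
    return most_common_count / len(translations)


if __name__ == "__main__":
    # Hypothetical outputs of five identical prompts for one requirement.
    runs = [
        "forall x. Request(x) -> RespondsWithin(x, 2s)",
        "forall x. Request(x) -> RespondsWithin(x, 2s)",
        "forall x. Request(x)  ->  respondswithin(x, 2s)",
        "forall x. Request(x) -> RespondsWithin(x, 5s)",
        "forall x. Request(x) -> RespondsWithin(x, 2s)",
    ]
    print(f"stability = {stability(runs):.2f}")  # 0.80 under this toy metric
```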
Original language | English |
---|---|
Title of host publication | Proceedings - 2024 IEEE 24th International Conference on Software Quality, Reliability and Security, QRS 2024 |
Publisher | IEEE |
Pages | 238-249 |
Number of pages | 12 |
ISBN (Electronic) | 9798350365634 |
DOIs | |
Publication status | Published - 26 Sept 2024 |
Event | 24th IEEE International Conference on Software Quality, Reliability and Security, QRS 2024 - Cambridge, United Kingdom. Duration: 1 Jul 2024 → 5 Jul 2024 |
Publication series
Name | IEEE International Conference on Software Quality, Reliability and Security, QRS |
---|---|
ISSN (Print) | 2693-9177 |
Conference
Conference | 24th IEEE International Conference on Software Quality, Reliability and Security, QRS 2024 |
---|---|
Country/Territory | United Kingdom |
City | Cambridge |
Period | 1/07/24 → 5/07/24 |
Keywords
- ChatGPT
- large language models
- logical abstraction
- NLP
- requirements engineering
- symbolic AI
ASJC Scopus subject areas
- Software
- Safety, Risk, Reliability and Quality
- Artificial Intelligence
Fields of Expertise
- Information, Communication & Computing
Projects
- CD-Laboratory for Quality Assurance Methodologies for Autonomous Cyber-Physical Systems
  Wotawa, F. (Co-Investigator (CoI))
  1/10/17 → 30/09/24
  Project: Research project (Finished)
Activities
- Evaluating OpenAI Large Language Models for Generating Logical Abstractions of Technical Requirements Documents
  Perko, A. (Speaker)
  3 Jul 2024
  Activity: Talk or presentation › Talk at conference or symposium › Science to science