Using Combinatorial Testing for Prompt Engineering of LLMs in Medicine

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

Large Language Models (LLMs) like GPT-4o are of growing interest. Interfaces such as ChatGPT invite an ever-growing number of people to ask questions, including requests for health advice, which brings additional risk of harm. It is well known that tools based on LLMs tend to hallucinate or to deliver different answers to the same or similar questions. In both cases, the outcome might be wrong or incomplete, possibly leading to safety issues. In this paper, we investigate the output of ChatGPT when we ask similar questions in the medical domain. In particular, we suggest using combinatorial testing to generate variants of questions aimed at identifying wrong or misleading answers. In detail, we discuss the general framework and its parts and present a proof of concept using a medical query and ChatGPT.
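The abstract proposes generating question variants via combinatorial testing. The paper's actual framework is not reproduced here; the following is a minimal illustrative sketch, assuming hypothetical phrasing dimensions for a medical query and a full cartesian product (a practical setup might instead use pairwise covering arrays to keep the variant count small).

```python
from itertools import product

# Hypothetical phrasing dimensions (illustrative only): each key lists
# interchangeable wordings for one part of the same medical question.
dimensions = {
    "politeness": ["", "Please, "],
    "symptom": ["a persistent headache", "a headache that will not go away"],
    "request": ["what could cause it?", "what are possible causes?"],
}

def generate_variants(dims):
    """Build all prompt variants as the cartesian product of the dimensions."""
    keys = list(dims)
    return [
        "{}I have {}; {}".format(*values)
        for values in product(*(dims[k] for k in keys))
    ]

variants = generate_variants(dimensions)
# 2 x 2 x 2 = 8 variants, each submitted to the LLM and the answers compared
# for contradictions or omissions.
```

Each variant would then be sent to the model under test, and disagreement among the answers flags a potentially wrong or misleading response.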
Original language: English
Title of host publication: Proceedings of the 27th International Multiconference Information Society – IS 2024
Chapter: K
Pages: 930-935
Number of pages: 6
DOIs
Publication status: Published - Oct 2024
Event: 27th International Multiconference Information Society – IS 2024 - Ljubljana, Slovenia
Duration: 7 Oct 2024 – 11 Oct 2024

Conference

Conference: 27th International Multiconference Information Society – IS 2024
Abbreviated title: IS 2024
Country/Territory: Slovenia
City: Ljubljana
Period: 7/10/24 – 11/10/24

Keywords

  • Large Language Models
  • ChatGPT
  • Prompt Engineering
  • Combinatorial Testing
  • Validation

ASJC Scopus subject areas

  • Artificial Intelligence

Fields of Expertise

  • Information, Communication & Computing
