Learning and Repair of Deep Reinforcement Learning Policies from Fuzz-Testing Data

Martin Tappler, Andrea Pferscher, Bernhard Aichernig, Bettina Könighofer

Research output: Chapter in Book/Report/Conference proceeding > Conference paper > peer-review

Abstract

Reinforcement learning from demonstrations (RLfD) is a promising approach to improve the exploration efficiency of reinforcement learning (RL) by learning from expert demonstrations in addition to interactions with the environment. In this paper, we propose a framework that combines techniques from search-based testing with RLfD to raise the dependability of RL policies and reduce human engineering effort. Within our framework, we provide methods for efficiently training, evaluating, and repairing RL policies. Instead of relying on the costly collection of demonstrations from (human) experts, we automatically compute a diverse set of demonstrations via search-based fuzzing methods and use the fuzz demonstrations for RLfD. To evaluate the safety and robustness of the trained RL agent, we search for safety-critical scenarios in the black-box environment. Finally, when unsafe behavior is detected, we compute demonstrations through fuzz testing that represent safe behavior and use them to repair the policy. Our experiments show that our framework is able to efficiently learn high-performing and safe policies without requiring any expert knowledge.
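
The workflow described in the abstract (fuzz for demonstrations, learn from them, test for unsafe behavior, repair) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy corridor environment, the single-action mutation hill climber, and the tabular majority-vote cloning that stands in for RLfD are all assumptions chosen to keep the example small, and every name in it (`run`, `fuzz_demonstrations`, `clone_policy`, `repair`, `is_safe`) is hypothetical.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy layout: landing on a trap is unsafe, reaching GOAL ends the episode.
TRAPS, GOAL, HORIZON = {3, 6}, 8, 8

def run(actions):
    """Black-box rollout. Action 0 = step (+1), action 1 = jump (+2).
    Returns (trace of (state, action) pairs, reached_goal_safely)."""
    pos, trace = 0, []
    for a in actions:
        trace.append((pos, a))
        pos += 1 + a
        if pos in TRAPS:
            return trace, False
        if pos >= GOAL:
            return trace, True
    return trace, False  # ran out of actions before the goal

def fitness(trace, safe):
    """Progress made: GOAL if safe, else the last state before the failure."""
    return GOAL if safe else trace[-1][0]

def fuzz_demonstrations(budget=500, seed=0):
    """Hill-climbing fuzzer: mutate one action at a time, keep every safe trace."""
    rng = random.Random(seed)
    best = [0] * HORIZON
    best_fit = fitness(*run(best))
    demos = []
    for _ in range(budget):
        cand = list(best)
        cand[rng.randrange(HORIZON)] ^= 1  # flip a single action
        trace, safe = run(cand)
        if safe and trace not in demos:
            demos.append(trace)
        fit = fitness(trace, safe)
        if fit >= best_fit:  # accept neutral or improving mutations
            best, best_fit = cand, fit
    return demos

def clone_policy(demos):
    """Stand-in for RLfD: pick the most frequent demonstrated action per state."""
    votes = defaultdict(Counter)
    for trace in demos:
        for state, action in trace:
            votes[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in votes.items()}

def is_safe(policy):
    """Test the policy from the start state; uncovered states count as unsafe."""
    pos = 0
    while pos < GOAL:
        if pos not in policy:
            return False
        pos += 1 + policy[pos]
        if pos in TRAPS:
            return False
    return True

def repair(policy, demos):
    """Overwrite the policy's actions wherever safe demonstrations cover a state."""
    return {**policy, **clone_policy(demos)}
```

In this sketch, a policy that always steps forward walks into the first trap, and overwriting its actions with those from the fuzzed safe traces yields a policy that reaches the goal. A real RLfD setup would instead feed the demonstrations into the training of a deep RL agent.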
Original language: English
Title of host publication: ICSE 2024 - Proceedings of the 46th IEEE/ACM International Conference on Software Engineering
Pages: 27-39
ISBN (Electronic): 9798400702174
Publication status: Published - 6 Feb 2024
Event: 46th International Conference on Software Engineering: ICSE 2024 - Lisbon, Portugal
Duration: 14 Apr 2024 to 20 Apr 2024
Conference number: 46
https://conf.researchr.org/home/icse-2024

Conference

Conference: 46th International Conference on Software Engineering
Abbreviated title: ICSE 2024
Country/Territory: Portugal
City: Lisbon
Period: 14/04/24 to 20/04/24

Keywords

  • Deep reinforcement learning
  • Reinforcement learning from demonstrations
  • Search-based software testing
  • Policy repair

ASJC Scopus subject areas

  • Software

Fields of Expertise

  • Information, Communication & Computing
