Safe Reinforcement Learning Using Probabilistic Shields

Nils Jansen; Bettina Könighofer; Sebastian Junges; Alex Serban; Roderick Bloem

doi:10.4230/LIPIcs.CONCUR.2020.3


Institute of Applied Information Processing and Communications (7050)

Publication: Contribution to book/report/conference proceedings › Conference paper › Peer-reviewed

Abstract

This paper concerns the efficient construction of a safety shield for reinforcement learning. We specifically target scenarios that incorporate uncertainty and use Markov decision processes (MDPs) as the underlying model to capture such problems. Reinforcement learning (RL) is a machine learning technique that can determine near-optimal policies in MDPs that may be unknown before exploring the model. However, during exploration, RL is prone to inducing behavior that is undesirable or not allowed in safety- or mission-critical contexts. We introduce the concept of a probabilistic shield that enables RL decision-making to adhere to safety constraints with high probability. We employ formal verification to efficiently compute the probabilities of critical decisions within a safety-relevant fragment of the MDP. These results help to realize a shield that, when applied to an RL algorithm, restricts the agent from taking unsafe actions while optimizing the performance objective. We discuss tradeoffs between sufficient progress in the exploration of the environment and ensuring safety. In our experiments on the arcade game PAC-MAN and on a case study involving service robots, we demonstrate that learning efficiency increases, as learning needs orders of magnitude fewer episodes.
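The abstract describes the shield only at a high level. The Python sketch below illustrates the general idea under stated assumptions, and is not the authors' implementation: the safety-relevant MDP fragment is known and given in a hypothetical dictionary encoding, "safe" means avoiding a set of unsafe states for a finite number of steps, plain finite-horizon value iteration stands in for the probabilistic model checker, and the shield keeps every action whose safety probability is at least a fraction `delta` of the safest action's. The paper's precise shield definition may differ. All names (`action_safety`, `shield`, `shielded_epsilon_greedy`, `example_mdp`) are hypothetical.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical encoding of the safety-relevant MDP fragment (assumption):
# mdp[state][action] = list of (probability, successor_state).
MDP = Dict[str, Dict[str, List[Tuple[float, str]]]]


def action_safety(mdp: MDP, unsafe: Set[str], horizon: int) -> Dict[Tuple[str, str], float]:
    """For each (state, action), the maximal probability of avoiding `unsafe`
    during the next `horizon` steps (horizon >= 1), by finite-horizon value
    iteration. Every state needs at least one action (e.g. a self-loop)."""
    # v[s]: best probability of staying safe for the remaining steps from s.
    v = {s: 0.0 if s in unsafe else 1.0 for s in mdp}
    q: Dict[Tuple[str, str], float] = {}
    for _ in range(horizon):
        # Safety probability of each action under the current state values.
        q = {
            (s, a): sum(p * v[t] for p, t in succ)
            for s, actions in mdp.items()
            for a, succ in actions.items()
        }
        # A state is worth its safest action; unsafe states stay at 0.
        v = {
            s: 0.0 if s in unsafe else max(q[(s, a)] for a in mdp[s])
            for s in mdp
        }
    return q


def shield(mdp: MDP, q: Dict[Tuple[str, str], float], state: str, delta: float) -> List[str]:
    """Allowed actions in `state`: those whose safety value is at least
    `delta` (0 <= delta <= 1) times that of the safest action. The safest
    action always passes, so the shield never blocks every action."""
    best = max(q[(state, a)] for a in mdp[state])
    return [a for a in mdp[state] if q[(state, a)] >= delta * best]


def shielded_epsilon_greedy(values: Dict[str, float], allowed: List[str],
                            epsilon: float, rng: random.Random) -> str:
    """Epsilon-greedy action choice restricted to the shield's allowed actions."""
    if rng.random() < epsilon:
        return rng.choice(allowed)  # explore, but only among allowed actions
    return max(allowed, key=lambda a: values.get(a, 0.0))  # exploit


def example_mdp() -> MDP:
    """Toy MDP: from s0, 'risky' and 'careful' reach the goal with different risks."""
    return {
        "s0": {
            "wait": [(1.0, "s0")],
            "risky": [(0.7, "goal"), (0.3, "crash")],
            "careful": [(0.95, "goal"), (0.05, "crash")],
        },
        "goal": {"stay": [(1.0, "goal")]},
        "crash": {"stay": [(1.0, "crash")]},
    }


if __name__ == "__main__":
    mdp = example_mdp()
    q = action_safety(mdp, unsafe={"crash"}, horizon=3)
    allowed = shield(mdp, q, "s0", delta=0.9)
    print(allowed)  # ['wait', 'careful']
    rng = random.Random(0)
    print(shielded_epsilon_greedy({"careful": 1.0, "risky": 2.0}, allowed, 0.1, rng))
```

With `delta = 0.9`, the shield removes `risky` (safety probability 0.7) and keeps `careful` and `wait`. Note that `wait` is perfectly safe but never reaches `goal`, a small instance of the tradeoff between exploration progress and safety mentioned above. Lowering `delta` toward 0 relaxes the shield.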

Original language	English
Title	31st International Conference on Concurrency Theory, CONCUR 2020
Subtitle	31st CONCUR 2020: Vienna, Austria (Virtual Conference)
Editors	Igor Konnov, Laura Kovacs
Publisher	Schloss Dagstuhl - Leibniz-Zentrum für Informatik
Pages	3:1-3:16
Number of pages	16
ISBN (electronic)	978-3-95977-160-3
DOIs	https://doi.org/10.4230/LIPIcs.CONCUR.2020.3
Publication status	Published - 2020
Event	31st International Conference on Concurrency Theory - Virtual, Austria. Duration: 1 Sept 2020 → 4 Sept 2020

Conference

Conference	31st International Conference on Concurrency Theory
Short title	CONCUR 2020
Country/Territory	Austria
City	Virtual
Period	1/09/20 → 4/09/20

ASJC Scopus subject areas

Software

Access to document

10.4230/LIPIcs.CONCUR.2020.3 (License: CC BY 4.0)

Other files and links

http://www.scopus.com/inward/record.url?scp=85091574202&partnerID=8YFLogxK

Cite this

Jansen, N., Könighofer, B., Junges, S., Serban, A., & Bloem, R. (2020). Safe Reinforcement Learning Using Probabilistic Shields. In I. Konnov & L. Kovacs (Eds.), 31st International Conference on Concurrency Theory, CONCUR 2020: 31st CONCUR 2020: Vienna, Austria (Virtual Conference) (pp. 3:1-3:16). Article 3. Schloss Dagstuhl - Leibniz-Zentrum für Informatik. https://doi.org/10.4230/LIPIcs.CONCUR.2020.3

Safe Reinforcement Learning Using Probabilistic Shields. / Jansen, Nils; Könighofer, Bettina; Junges, Sebastian et al.
31st International Conference on Concurrency Theory, CONCUR 2020: 31st CONCUR 2020: Vienna, Austria (Virtual Conference). ed. / Igor Konnov; Laura Kovacs. Schloss Dagstuhl - Leibniz-Zentrum für Informatik, 2020. pp. 3:1-3:16, Article 3.


Jansen, N, Könighofer, B, Junges, S, Serban, A & Bloem, R 2020, Safe Reinforcement Learning Using Probabilistic Shields. in I Konnov & L Kovacs (eds), 31st International Conference on Concurrency Theory, CONCUR 2020: 31st CONCUR 2020: Vienna, Austria (Virtual Conference), 3, Schloss Dagstuhl - Leibniz-Zentrum für Informatik, pp. 3:1-3:16, 31st International Conference on Concurrency Theory, Virtual, Austria, 1/09/20. https://doi.org/10.4230/LIPIcs.CONCUR.2020.3

@inproceedings{4e68b10aa48c4bdf992d415cebc9d387,

title = "Safe Reinforcement Learning Using Probabilistic Shields",

abstract = "This paper concerns the efficient construction of a safety shield for reinforcement learning. We specifically target scenarios that incorporate uncertainty and use Markov decision processes (MDPs) as the underlying model to capture such problems. Reinforcement learning (RL) is a machine learning technique that can determine near-optimal policies in MDPs that may be unknown before exploring the model. However, during exploration, RL is prone to induce behavior that is undesirable or not allowed in safety- or mission-critical contexts. We introduce the concept of a probabilistic shield that enables RL decision-making to adhere to safety constraints with high probability. We employ formal verification to efficiently compute the probabilities of critical decisions within a safety-relevant fragment of the MDP. These results help to realize a shield that, when applied to an RL algorithm, restricts the agent from taking unsafe actions, while optimizing the performance objective. We discuss tradeoffs between sufficient progress in the exploration of the environment and ensuring safety. In our experiments, we demonstrate on the arcade game PAC-MAN and on a case study involving service robots that the learning efficiency increases as the learning needs orders of magnitude fewer episodes.",

keywords = "Formal Verification, Markov Decision Process, Model Checking, Safe Exploration, Safe Reinforcement Learning",

author = "Nils Jansen and Bettina K{\"o}nighofer and Sebastian Junges and Alex Serban and Roderick Bloem",

year = "2020",

doi = "10.4230/LIPIcs.CONCUR.2020.3",

language = "English",

pages = "3:1--3:16",

editor = "Igor Konnov and Laura Kovacs",

booktitle = "31st International Conference on Concurrency Theory, CONCUR 2020",

publisher = "Schloss Dagstuhl - Leibniz-Zentrum f{\"u}r Informatik",

address = "Germany",

note = "31st International Conference on Concurrency Theory, CONCUR 2020 ; Conference date: 01-09-2020 Through 04-09-2020",

}

TY  - GEN
T1  - Safe Reinforcement Learning Using Probabilistic Shields
AU  - Jansen, Nils
AU  - Könighofer, Bettina
AU  - Junges, Sebastian
AU  - Serban, Alex
AU  - Bloem, Roderick
PY  - 2020
Y1  - 2020
N2  - This paper concerns the efficient construction of a safety shield for reinforcement learning. We specifically target scenarios that incorporate uncertainty and use Markov decision processes (MDPs) as the underlying model to capture such problems. Reinforcement learning (RL) is a machine learning technique that can determine near-optimal policies in MDPs that may be unknown before exploring the model. However, during exploration, RL is prone to induce behavior that is undesirable or not allowed in safety- or mission-critical contexts. We introduce the concept of a probabilistic shield that enables RL decision-making to adhere to safety constraints with high probability. We employ formal verification to efficiently compute the probabilities of critical decisions within a safety-relevant fragment of the MDP. These results help to realize a shield that, when applied to an RL algorithm, restricts the agent from taking unsafe actions, while optimizing the performance objective. We discuss tradeoffs between sufficient progress in the exploration of the environment and ensuring safety. In our experiments, we demonstrate on the arcade game PAC-MAN and on a case study involving service robots that the learning efficiency increases as the learning needs orders of magnitude fewer episodes.
AB  - This paper concerns the efficient construction of a safety shield for reinforcement learning. We specifically target scenarios that incorporate uncertainty and use Markov decision processes (MDPs) as the underlying model to capture such problems. Reinforcement learning (RL) is a machine learning technique that can determine near-optimal policies in MDPs that may be unknown before exploring the model. However, during exploration, RL is prone to induce behavior that is undesirable or not allowed in safety- or mission-critical contexts. We introduce the concept of a probabilistic shield that enables RL decision-making to adhere to safety constraints with high probability. We employ formal verification to efficiently compute the probabilities of critical decisions within a safety-relevant fragment of the MDP. These results help to realize a shield that, when applied to an RL algorithm, restricts the agent from taking unsafe actions, while optimizing the performance objective. We discuss tradeoffs between sufficient progress in the exploration of the environment and ensuring safety. In our experiments, we demonstrate on the arcade game PAC-MAN and on a case study involving service robots that the learning efficiency increases as the learning needs orders of magnitude fewer episodes.
KW  - Formal Verification
KW  - Markov Decision Process
KW  - Model Checking
KW  - Safe Exploration
KW  - Safe Reinforcement Learning
UR  - http://www.scopus.com/inward/record.url?scp=85091574202&partnerID=8YFLogxK
U2  - 10.4230/LIPIcs.CONCUR.2020.3
DO  - 10.4230/LIPIcs.CONCUR.2020.3
M3  - Conference paper
SP  - 3:1
EP  - 3:16
BT  - 31st International Conference on Concurrency Theory, CONCUR 2020
A2  - Konnov, Igor
A2  - Kovacs, Laura
PB  - Schloss Dagstuhl - Leibniz-Zentrum für Informatik
T2  - 31st International Conference on Concurrency Theory
Y2  - 1 September 2020 through 4 September 2020
ER  - 
