TY - GEN
T1 - Reinforcement Learning Under Partial Observability Guided by Learned Environment Models
AU - Muškardin, Edi
AU - Tappler, Martin
AU - Aichernig, Bernhard K.
AU - Pill, Ingo
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024
Y1 - 2024
N2 - Reinforcement learning and planning under partial observability are notoriously difficult. In this setting, decision-making agents need to perform a sequence of actions with incomplete information about the underlying state of the system. As such, methods that can act in the presence of incomplete state information are of special interest to the machine learning, planning, and control communities. In this paper, we consider environments that behave like a partially observable Markov decision process (POMDP) with known discrete actions, while assuming no knowledge of its structure or transition probabilities. We propose an approach for reinforcement learning (RL) in such partially observable environments. Our approach combines Q-learning with IoAlergia, an automata learning method that can learn Markov decision processes (MDPs). By learning MDP models of the environment from the experiences of the RL agent, we enable RL in partially observable domains without explicit, additional memory to track previous interactions for dealing with ambiguities stemming from partial observability. We instead provide the RL agent with additional observations in the form of abstract environment states. By simulating new experiences on a learned model, we extend the agent's internal state representation, which in turn enables better decision-making in the presence of partial observability. In our evaluation, we report on the validity of our approach and its promising performance in comparison to six state-of-the-art deep RL techniques with recurrent neural networks and fixed memory.
AB - Reinforcement learning and planning under partial observability are notoriously difficult. In this setting, decision-making agents need to perform a sequence of actions with incomplete information about the underlying state of the system. As such, methods that can act in the presence of incomplete state information are of special interest to the machine learning, planning, and control communities. In this paper, we consider environments that behave like a partially observable Markov decision process (POMDP) with known discrete actions, while assuming no knowledge of its structure or transition probabilities. We propose an approach for reinforcement learning (RL) in such partially observable environments. Our approach combines Q-learning with IoAlergia, an automata learning method that can learn Markov decision processes (MDPs). By learning MDP models of the environment from the experiences of the RL agent, we enable RL in partially observable domains without explicit, additional memory to track previous interactions for dealing with ambiguities stemming from partial observability. We instead provide the RL agent with additional observations in the form of abstract environment states. By simulating new experiences on a learned model, we extend the agent's internal state representation, which in turn enables better decision-making in the presence of partial observability. In our evaluation, we report on the validity of our approach and its promising performance in comparison to six state-of-the-art deep RL techniques with recurrent neural networks and fixed memory.
KW - Automata Learning
KW - Markov Decision Processes
KW - Partially Observable Markov Decision Processes
KW - Reinforcement Learning
UR - http://www.scopus.com/inward/record.url?scp=85177849305&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-47705-8_14
DO - 10.1007/978-3-031-47705-8_14
M3 - Conference paper
AN - SCOPUS:85177849305
SN - 9783031477041
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 257
EP - 276
BT - Integrated Formal Methods - 18th International Conference, iFM 2023, Proceedings
A2 - Herber, Paula
A2 - Wijs, Anton
PB - Springer Science and Business Media Deutschland GmbH
T2 - 18th International Conference on integrated Formal Methods
Y2 - 13 November 2023 through 15 November 2023
ER -