TY - GEN
T1 - Reinforcement Learning Under Partial Observability Guided by Learned Environment Models
AU - Muškardin, Edi
AU - Tappler, Martin
AU - Aichernig, Bernhard K.
AU - Pill, Ingo
N1 - Publisher Copyright:
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024.
PY - 2024
Y1 - 2024
N2 - Reinforcement learning and planning under partial observability are notoriously difficult. In this setting, decision-making agents need to perform a sequence of actions with incomplete information about the underlying state of the system. As such, methods that can act in the presence of incomplete state information are of special interest to the machine learning, planning, and control communities. In this paper, we consider environments that behave like a partially observable Markov decision process (POMDP) with known discrete actions, while assuming no knowledge of its structure or transition probabilities. We propose an approach for reinforcement learning (RL) in such partially observable environments. Our approach combines Q-learning with IoAlergia, an automata learning method that can learn Markov decision processes (MDPs). By learning MDP models of the environment from the experiences of the RL agent, we enable RL in partially observable domains without explicit, additional memory to track previous interactions for dealing with ambiguities stemming from partial observability. We instead provide the RL agent with additional observations in the form of abstract environment states. By simulating new experiences on a learned model, we extend the agent's internal state representation, which in turn enables better decision-making in the presence of partial observability. In our evaluation, we report on the validity of our approach and its promising performance in comparison to six state-of-the-art deep RL techniques with recurrent neural networks and fixed memory.
AB - Reinforcement learning and planning under partial observability are notoriously difficult. In this setting, decision-making agents need to perform a sequence of actions with incomplete information about the underlying state of the system. As such, methods that can act in the presence of incomplete state information are of special interest to the machine learning, planning, and control communities. In this paper, we consider environments that behave like a partially observable Markov decision process (POMDP) with known discrete actions, while assuming no knowledge of its structure or transition probabilities. We propose an approach for reinforcement learning (RL) in such partially observable environments. Our approach combines Q-learning with IoAlergia, an automata learning method that can learn Markov decision processes (MDPs). By learning MDP models of the environment from the experiences of the RL agent, we enable RL in partially observable domains without explicit, additional memory to track previous interactions for dealing with ambiguities stemming from partial observability. We instead provide the RL agent with additional observations in the form of abstract environment states. By simulating new experiences on a learned model, we extend the agent's internal state representation, which in turn enables better decision-making in the presence of partial observability. In our evaluation, we report on the validity of our approach and its promising performance in comparison to six state-of-the-art deep RL techniques with recurrent neural networks and fixed memory.
KW - Automata Learning
KW - Markov Decision Processes
KW - Partially Observable Markov Decision Processes
KW - Reinforcement Learning
UR - http://www.scopus.com/inward/record.url?scp=85177849305&partnerID=8YFLogxK
U2 - 10.1007/978-3-031-47705-8_14
DO - 10.1007/978-3-031-47705-8_14
M3 - Conference paper
AN - SCOPUS:85177849305
SN - 9783031477041
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 257
EP - 276
BT - Integrated Formal Methods - 18th International Conference, iFM 2023, Proceedings
A2 - Herber, Paula
A2 - Wijs, Anton
PB - Springer Science and Business Media Deutschland GmbH
T2 - 18th International Conference on integrated Formal Methods
Y2 - 13 November 2023 through 15 November 2023
ER -