Run-Time Optimization for Learned Controllers Through Quantitative Games

Guy Avni; Roderick Bloem; Krishnendu Chatterjee; Thomas Henzinger; Bettina Könighofer; Stefan Pranger

doi:10.1007/978-3-030-25540-4_36

Run-Time Optimization for Learned Controllers Through Quantitative Games

Guy Avni, Roderick Bloem, Krishnendu Chatterjee, Thomas Henzinger, Bettina Könighofer, Stefan Pranger

Publikation: Beitrag in Buch/Bericht/Konferenzband › Beitrag in einem Konferenzband › Begutachtung

Abstract

A controller is a device that interacts with a plant. At each time point, it reads the plant’s state and issues commands with the goal that the plant operates optimally. Constructing optimal controllers is a fundamental and challenging problem. Machine learning techniques have recently been successfully applied to train controllers, yet they have limitations. Learned controllers are monolithic and hard to reason about. In particular, it is difficult to add features without retraining, to guarantee any level of performance, and to achieve acceptable performance when encountering untrained scenarios. These limitations can be addressed by deploying quantitative run-time shields that serve as a proxy for the controller. At each time point, the shield reads the command issued by the controller and may choose to alter it before passing it on to the plant. We show how optimal shields that interfere as little as possible while guaranteeing a desired level of controller performance, can be generated systematically and automatically using reactive synthesis. First, we abstract the plant by building a stochastic model. Second, we consider the learned controller to be a black box. Third, we measure controller performance and shield interference by two quantitative run-time measures that are formally defined using weighted automata. Then, the problem of constructing a shield that guarantees maximal performance with minimal interference is the problem of finding an optimal strategy in a stochastic 2-player game “controller versus shield” played on the abstract state space of the plant with a quantitative objective obtained from combining the performance and interference measures. We illustrate the effectiveness of our approach by automatically constructing lightweight shields for learned traffic-light controllers in various road networks. The shields we generate avoid liveness bugs, improve controller performance in untrained and changing traffic situations, and add features to learned controllers, such as giving priority to emergency vehicles .

Originalsprache	englisch
Titel	Computer Aided Verification (CAV)
Redakteure/-innen	I. Dillig, S. Tasiran
Herausgeber (Verlag)	Springer
Seiten	630-649
Band	11561
Auflage	31
ISBN (Print)	978-3-030-25539-8
DOIs	https://doi.org/10.1007/978-3-030-25540-4_36
Publikationsstatus	Veröffentlicht - 15 Juli 2019
Veranstaltung	31st International Conference on Computer Aided Verification - The New School, New York City, USA / Vereinigte Staaten Dauer: 15 Juli 2019 → 18 Juli 2019

Publikationsreihe

Name	Lecture Notes in Computer Science
Band	11561

Konferenz

Konferenz	31st International Conference on Computer Aided Verification
Kurztitel	CAV 2019
Land/Gebiet	USA / Vereinigte Staaten
Ort	New York City
Zeitraum	15/07/19 → 18/07/19

Zugriff auf Dokument

10.1007/978-3-030-25540-4_36Lizenz: CC BY 4.0

Dieses zitieren

Avni, G., Bloem, R., Chatterjee, K., Henzinger, T., Könighofer, B., & Pranger, S. (2019). Run-Time Optimization for Learned Controllers Through Quantitative Games. in I. Dillig, & S. Tasiran (Hrsg.), Computer Aided Verification (CAV) (31 Aufl., Band 11561, S. 630-649). ( Lecture Notes in Computer Science; Band 11561). Springer. https://doi.org/10.1007/978-3-030-25540-4_36

Run-Time Optimization for Learned Controllers Through Quantitative Games. / Avni, Guy; Bloem, Roderick; Chatterjee, Krishnendu et al.
Computer Aided Verification (CAV). Hrsg. / I. Dillig; S. Tasiran. Band 11561 31. Aufl. Springer, 2019. S. 630-649 ( Lecture Notes in Computer Science; Band 11561).

Publikation: Beitrag in Buch/Bericht/Konferenzband › Beitrag in einem Konferenzband › Begutachtung

Avni, G, Bloem, R, Chatterjee, K, Henzinger, T, Könighofer, B & Pranger, S 2019, Run-Time Optimization for Learned Controllers Through Quantitative Games. in I Dillig & S Tasiran (Hrsg.), Computer Aided Verification (CAV). 31 Aufl., Bd. 11561, Lecture Notes in Computer Science, Bd. 11561, Springer, S. 630-649, 31st International Conference on Computer Aided Verification, New York City, New York, USA / Vereinigte Staaten, 15/07/19. https://doi.org/10.1007/978-3-030-25540-4_36

@inproceedings{dd2a24fece044f5eaeced0b196953e6c,

title = "Run-Time Optimization for Learned Controllers Through Quantitative Games",

abstract = "A controller is a device that interacts with a plant. At each time point, it reads the plant{\textquoteright}s state and issues commands with the goal that the plant operates optimally. Constructing optimal controllers is a fundamental and challenging problem. Machine learning techniques have recently been successfully applied to train controllers, yet they have limitations. Learned controllers are monolithic and hard to reason about. In particular, it is difficult to add features without retraining, to guarantee any level of performance, and to achieve acceptable performance when encountering untrained scenarios. These limitations can be addressed by deploying quantitative run-time shields that serve as a proxy for the controller. At each time point, the shield reads the command issued by the controller and may choose to alter it before passing it on to the plant. We show how optimal shields that interfere as little as possible while guaranteeing a desired level of controller performance, can be generated systematically and automatically using reactive synthesis. First, we abstract the plant by building a stochastic model. Second, we consider the learned controller to be a black box. Third, we measure controller performance and shield interference by two quantitative run-time measures that are formally defined using weighted automata. Then, the problem of constructing a shield that guarantees maximal performance with minimal interference is the problem of finding an optimal strategy in a stochastic 2-player game “controller versus shield” played on the abstract state space of the plant with a quantitative objective obtained from combining the performance and interference measures. We illustrate the effectiveness of our approach by automatically constructing lightweight shields for learned traffic-light controllers in various road networks. The shields we generate avoid liveness bugs, improve controller performance in untrained and changing traffic situations, and add features to learned controllers, such as giving priority to emergency vehicles .",

author = "Guy Avni and Roderick Bloem and Krishnendu Chatterjee and Thomas Henzinger and Bettina K{\"o}nighofer and Stefan Pranger",

year = "2019",

month = jul,

day = "15",

doi = "10.1007/978-3-030-25540-4_36",

language = "English",

isbn = "978-3-030-25539-8",

volume = "11561",

series = " Lecture Notes in Computer Science",

publisher = "Springer",

pages = "630--649",

editor = "I. Dillig and S. Tasiran",

booktitle = "Computer Aided Verification (CAV)",

edition = "31",

note = "31st International Conference on Computer Aided Verification, CAV 2019 ; Conference date: 15-07-2019 Through 18-07-2019",

}

TY - GEN

T1 - Run-Time Optimization for Learned Controllers Through Quantitative Games

AU - Avni, Guy

AU - Bloem, Roderick

AU - Chatterjee, Krishnendu

AU - Henzinger, Thomas

AU - Könighofer, Bettina

AU - Pranger, Stefan

PY - 2019/7/15

Y1 - 2019/7/15

N2 - A controller is a device that interacts with a plant. At each time point, it reads the plant’s state and issues commands with the goal that the plant operates optimally. Constructing optimal controllers is a fundamental and challenging problem. Machine learning techniques have recently been successfully applied to train controllers, yet they have limitations. Learned controllers are monolithic and hard to reason about. In particular, it is difficult to add features without retraining, to guarantee any level of performance, and to achieve acceptable performance when encountering untrained scenarios. These limitations can be addressed by deploying quantitative run-time shields that serve as a proxy for the controller. At each time point, the shield reads the command issued by the controller and may choose to alter it before passing it on to the plant. We show how optimal shields that interfere as little as possible while guaranteeing a desired level of controller performance, can be generated systematically and automatically using reactive synthesis. First, we abstract the plant by building a stochastic model. Second, we consider the learned controller to be a black box. Third, we measure controller performance and shield interference by two quantitative run-time measures that are formally defined using weighted automata. Then, the problem of constructing a shield that guarantees maximal performance with minimal interference is the problem of finding an optimal strategy in a stochastic 2-player game “controller versus shield” played on the abstract state space of the plant with a quantitative objective obtained from combining the performance and interference measures. We illustrate the effectiveness of our approach by automatically constructing lightweight shields for learned traffic-light controllers in various road networks. The shields we generate avoid liveness bugs, improve controller performance in untrained and changing traffic situations, and add features to learned controllers, such as giving priority to emergency vehicles .

AB - A controller is a device that interacts with a plant. At each time point, it reads the plant’s state and issues commands with the goal that the plant operates optimally. Constructing optimal controllers is a fundamental and challenging problem. Machine learning techniques have recently been successfully applied to train controllers, yet they have limitations. Learned controllers are monolithic and hard to reason about. In particular, it is difficult to add features without retraining, to guarantee any level of performance, and to achieve acceptable performance when encountering untrained scenarios. These limitations can be addressed by deploying quantitative run-time shields that serve as a proxy for the controller. At each time point, the shield reads the command issued by the controller and may choose to alter it before passing it on to the plant. We show how optimal shields that interfere as little as possible while guaranteeing a desired level of controller performance, can be generated systematically and automatically using reactive synthesis. First, we abstract the plant by building a stochastic model. Second, we consider the learned controller to be a black box. Third, we measure controller performance and shield interference by two quantitative run-time measures that are formally defined using weighted automata. Then, the problem of constructing a shield that guarantees maximal performance with minimal interference is the problem of finding an optimal strategy in a stochastic 2-player game “controller versus shield” played on the abstract state space of the plant with a quantitative objective obtained from combining the performance and interference measures. We illustrate the effectiveness of our approach by automatically constructing lightweight shields for learned traffic-light controllers in various road networks. The shields we generate avoid liveness bugs, improve controller performance in untrained and changing traffic situations, and add features to learned controllers, such as giving priority to emergency vehicles .

U2 - 10.1007/978-3-030-25540-4_36

DO - 10.1007/978-3-030-25540-4_36

M3 - Conference paper

SN - 978-3-030-25539-8

VL - 11561

T3 - Lecture Notes in Computer Science

SP - 630

EP - 649

BT - Computer Aided Verification (CAV)

A2 - Dillig, I.

A2 - Tasiran, S.

PB - Springer

T2 - 31st International Conference on Computer Aided Verification

Y2 - 15 July 2019 through 18 July 2019

ER -

Run-Time Optimization for Learned Controllers Through Quantitative Games

Abstract

Publikationsreihe

Konferenz

Zugriff auf Dokument

Fingerprint

Dieses zitieren