Training β-VAE by Aggregating a Learned Gaussian Posterior with a Decoupled Decoder

Jianning Li; Jana Fragemann; Seyed Ahmad Ahmadi; Jens Kleesiek; Jan Egger

doi:10.1007/978-3-031-25046-0_7

Jianning Li*, Jana Fragemann, Seyed Ahmad Ahmadi, Jens Kleesiek, Jan Egger

*Corresponding author for this work

Institute of Computer Graphics and Vision (7100)

Research output: Chapter in Book/Report/Conference proceeding › Conference paper › peer-review

Abstract

The reconstruction loss and the Kullback-Leibler divergence (KLD) loss in a variational autoencoder (VAE) often play antagonistic roles, and tuning the weight of the KLD loss in β-VAE to balance the two losses is a tricky, dataset-specific task. As a result, if the weight β is not carefully tuned, current VAE training practices often trade reconstruction fidelity against the continuity/disentanglement of the latent space. In this paper, we present intuitions and a careful analysis of the antagonistic mechanism of the two losses and, based on these insights, propose a simple yet effective two-stage method for training a VAE. Specifically, the method aggregates a learned Gaussian posterior z ∼ q_θ(z | x) with a decoder that is decoupled from the KLD loss and trained to learn a new conditional distribution p_ϕ(x | z) of the input data x. Experimentally, we show that the aggregated VAE maximally satisfies the Gaussian assumption about the latent space, while still achieving a reconstruction error comparable to that obtained when the latent space is only loosely regularized by N(0, I). Unlike common VAE training practice, the proposed approach does not require tuning the hyperparameter (i.e., the KLD weight β) for a specific dataset. We evaluate the method on a medical dataset intended for 3D skull reconstruction and shape completion, and the results indicate promising generative capabilities of the VAE trained with the proposed method. In addition, through guided manipulation of the latent variables, we establish a connection between existing autoencoder (AE)-based approaches and generative approaches, such as VAE, for the shape completion problem. Code and pre-trained weights are available at https://github.com/Jianningli/skullVAE.
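The abstract describes the two-stage scheme only in words. The sketch below is one possible reading of it, written in plain Python on a deliberately tiny, hypothetical 1-D linear VAE with hand-derived SGD gradients. Stage 1 trains encoder and decoder on the usual β-VAE objective (squared reconstruction error plus β times the closed-form KL to N(0, 1)); stage 2 freezes the learned Gaussian posterior and fits a freshly initialized decoder on the reconstruction loss alone, so the decoder never receives a KLD gradient. The model, learning rate, β values and helper names (`train`, `mean_kl`, `mean_recon_error`) are illustrative assumptions, not the authors' architecture or hyperparameters; the code released at the URL above is the authoritative reference.

```python
import math
import random


def kl_to_standard_normal(mu, logvar):
    """Closed-form KL(N(mu, exp(logvar)) || N(0, 1)) for one latent dimension."""
    return 0.5 * (math.exp(logvar) + mu * mu - 1.0 - logvar)


def train(data, params, beta, stage, steps=2000, lr=0.01, seed=0):
    """Plain SGD on a hypothetical 1-D linear VAE (not the paper's model).

    Encoder: mu = a * x, logvar = s.  Decoder: x_hat = w * z + b, where
    z = mu + exp(s / 2) * eps is the reparameterized posterior sample.
    stage 1: update a, s, w, b on (x_hat - x)**2 + beta * KL.
    stage 2: a, s stay frozen; only w, b are updated on (x_hat - x)**2.
    """
    rng = random.Random(seed)
    a, s, w, b = params["a"], params["s"], params["w"], params["b"]
    for _ in range(steps):
        x = rng.choice(data)
        eps = rng.gauss(0.0, 1.0)
        mu = a * x
        std = math.exp(0.5 * s)
        z = mu + std * eps
        r = w * z + b - x  # reconstruction residual
        # Hand-derived gradients, all computed before any parameter moves.
        g_w, g_b = 2.0 * r * z, 2.0 * r
        g_z = 2.0 * r * w
        g_a = g_z * x + beta * mu * x
        g_s = g_z * 0.5 * std * eps + beta * 0.5 * (math.exp(s) - 1.0)
        w, b = w - lr * g_w, b - lr * g_b
        if stage == 1:  # the encoder (posterior) is only trained in stage 1
            a, s = a - lr * g_a, s - lr * g_s
    return {"a": a, "s": s, "w": w, "b": b}


def mean_kl(data, p):
    """Average KL of the posterior q(z | x) to N(0, 1) over the data."""
    return sum(kl_to_standard_normal(p["a"] * x, p["s"]) for x in data) / len(data)


def mean_recon_error(data, p):
    """Squared error when decoding the posterior mean (eps = 0)."""
    return sum((p["w"] * p["a"] * x + p["b"] - x) ** 2 for x in data) / len(data)


gen = random.Random(1)
data = [gen.gauss(0.0, 1.0) for _ in range(200)]
init = {"a": 1.0, "s": 0.0, "w": 1.0, "b": 0.0}

loose = train(data, init, beta=0.0, stage=1)   # no pull toward N(0, 1)
strong = train(data, init, beta=4.0, stage=1)  # stage 1: strong pull toward N(0, 1)
# Stage 2: keep the stage-1 posterior, attach a freshly initialized decoder,
# and train it on the reconstruction loss alone (beta is unused here).
decoupled = train(data, {**strong, "w": 1.0, "b": 0.0}, beta=0.0, stage=2, seed=2)

for name, p in [("loose", loose), ("strong", strong), ("decoupled", decoupled)]:
    print(name, round(mean_kl(data, p), 4), round(mean_recon_error(data, p), 4))
```

Comparing `loose` and `strong` shows the antagonism the abstract refers to: a large β drives the mean KL toward zero while the reconstruction error of the posterior mean rises toward the data variance. In this linear toy a fresh decoder cannot recover information that the collapsed posterior no longer carries, so `decoupled` only illustrates which parameters and which loss each stage uses; it does not reproduce the reconstruction results reported in the paper.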

Original language	English
Title of host publication	Medical Applications with Disentanglements - First MICCAI Workshop, MAD 2022, Held in Conjunction with MICCAI 2022, Proceedings
Editors	Jana Fragemann, Jianning Li, Jan Egger, Xiao Liu, Sotirios A. Tsaftaris, Jens Kleesiek
Publisher	Springer Science and Business Media Deutschland GmbH
Pages	70-92
Number of pages	23
ISBN (Print)	9783031250453
DOIs	https://doi.org/10.1007/978-3-031-25046-0_7
Publication status	Published - 2023
Event	1st MICCAI Workshop on Medical Applications with Disentanglements, held in conjunction with MICCAI 2022: MAD 2022 - Singapore, Singapore. Duration: 22 Sept 2022 → 22 Sept 2022

Publication series

Name	Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume	13823 LNCS
ISSN (Print)	0302-9743
ISSN (Electronic)	1611-3349

Conference

Conference	1st MICCAI Workshop on Medical Applications with Disentanglements, held in conjunction with MICCAI 2022
Country/Territory	Singapore
City	Singapore
Period	22/09/22 → 22/09/22

Keywords

Disentanglement
Latent representation
Shape completion
Skull reconstruction
VAE

ASJC Scopus subject areas

Theoretical Computer Science
General Computer Science

Cite this

Li, J., Fragemann, J., Ahmadi, S. A., Kleesiek, J., & Egger, J. (2023). Training β-VAE by Aggregating a Learned Gaussian Posterior with a Decoupled Decoder. In J. Fragemann, J. Li, J. Egger, X. Liu, S. A. Tsaftaris, & J. Kleesiek (Eds.), Medical Applications with Disentanglements - First MICCAI Workshop, MAD 2022, Held in Conjunction with MICCAI 2022, Proceedings (pp. 70-92). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13823 LNCS). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-25046-0_7

Training β-VAE by Aggregating a Learned Gaussian Posterior with a Decoupled Decoder. / Li, Jianning; Fragemann, Jana; Ahmadi, Seyed Ahmad et al.
Medical Applications with Disentanglements - First MICCAI Workshop, MAD 2022, Held in Conjunction with MICCAI 2022, Proceedings. ed. / Jana Fragemann; Jianning Li; Jan Egger; Xiao Liu; Sotirios A. Tsaftaris; Jens Kleesiek. Springer Science and Business Media Deutschland GmbH, 2023. p. 70-92 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 13823 LNCS).

Li, J, Fragemann, J, Ahmadi, SA, Kleesiek, J & Egger, J 2023, Training β-VAE by Aggregating a Learned Gaussian Posterior with a Decoupled Decoder. in J Fragemann, J Li, J Egger, X Liu, SA Tsaftaris & J Kleesiek (eds), Medical Applications with Disentanglements - First MICCAI Workshop, MAD 2022, Held in Conjunction with MICCAI 2022, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 13823 LNCS, Springer Science and Business Media Deutschland GmbH, pp. 70-92, 1st MICCAI Workshop on Medical Applications with Disentanglements, held in conjunction with MICCAI 2022, Singapore, Singapore, 22/09/22. https://doi.org/10.1007/978-3-031-25046-0_7

Li J, Fragemann J, Ahmadi SA, Kleesiek J, Egger J. Training β-VAE by Aggregating a Learned Gaussian Posterior with a Decoupled Decoder. In Fragemann J, Li J, Egger J, Liu X, Tsaftaris SA, Kleesiek J, editors, Medical Applications with Disentanglements - First MICCAI Workshop, MAD 2022, Held in Conjunction with MICCAI 2022, Proceedings. Springer Science and Business Media Deutschland GmbH. 2023. p. 70-92. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). doi: 10.1007/978-3-031-25046-0_7

Li, Jianning ; Fragemann, Jana ; Ahmadi, Seyed Ahmad et al. / Training β-VAE by Aggregating a Learned Gaussian Posterior with a Decoupled Decoder. Medical Applications with Disentanglements - First MICCAI Workshop, MAD 2022, Held in Conjunction with MICCAI 2022, Proceedings. editor / Jana Fragemann ; Jianning Li ; Jan Egger ; Xiao Liu ; Sotirios A. Tsaftaris ; Jens Kleesiek. Springer Science and Business Media Deutschland GmbH, 2023. pp. 70-92 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).

@inproceedings{250a547b3fe0410086dd4a1448b0a2c8,

title = "Training {$\beta$-VAE} by Aggregating a Learned Gaussian Posterior with a Decoupled Decoder",

keywords = "Disentanglement, Latent representation, Shape completion, Skull reconstruction, VAE",

author = "Jianning Li and Jana Fragemann and Ahmadi, {Seyed Ahmad} and Jens Kleesiek and Jan Egger",

note = "Publisher Copyright: {\textcopyright} 2023, The Author(s), under exclusive license to Springer Nature Switzerland AG.; 1st MICCAI Workshop on Medical Applications with Disentanglements, held in conjunction with MICCAI 2022 : MAD 2022 ; Conference date: 22-09-2022 Through 22-09-2022",

year = "2023",

doi = "10.1007/978-3-031-25046-0_7",

language = "English",

isbn = "9783031250453",

series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",

publisher = "Springer Science and Business Media Deutschland GmbH",

pages = "70--92",

editor = "Jana Fragemann and Jianning Li and Jan Egger and Xiao Liu and Tsaftaris, {Sotirios A.} and Jens Kleesiek",

booktitle = "Medical Applications with Disentanglements - First MICCAI Workshop, MAD 2022, Held in Conjunction with MICCAI 2022, Proceedings",

address = "Germany",

}

TY - GEN

T1 - Training β-VAE by Aggregating a Learned Gaussian Posterior with a Decoupled Decoder

AU - Li, Jianning

AU - Fragemann, Jana

AU - Ahmadi, Seyed Ahmad

AU - Kleesiek, Jens

AU - Egger, Jan

PY - 2023

Y1 - 2023

KW - Disentanglement

KW - Latent representation

KW - Shape completion

KW - Skull reconstruction

KW - VAE

UR - http://www.scopus.com/inward/record.url?scp=85151059789&partnerID=8YFLogxK

U2 - 10.1007/978-3-031-25046-0_7

DO - 10.1007/978-3-031-25046-0_7

M3 - Conference paper

AN - SCOPUS:85151059789

SN - 9783031250453

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 70

EP - 92

BT - Medical Applications with Disentanglements - First MICCAI Workshop, MAD 2022, Held in Conjunction with MICCAI 2022, Proceedings

A2 - Fragemann, Jana

A2 - Li, Jianning

A2 - Egger, Jan

A2 - Liu, Xiao

A2 - Tsaftaris, Sotirios A.

A2 - Kleesiek, Jens

PB - Springer Science and Business Media Deutschland GmbH

T2 - 1st MICCAI Workshop on Medical Applications with Disentanglements, held in conjunction with MICCAI 2022

Y2 - 22 September 2022 through 22 September 2022

ER -