Bayesian Neural Networks with Weight Sharing Using Dirichlet Processes

Wolfgang Roth; Franz Pernkopf

doi:10.1109/TPAMI.2018.2884905

Institute of Signal Processing and Speech Communication (4420)

Research output: Contribution to journal › Article › peer-review

Abstract

We extend feed-forward neural networks with a Dirichlet process prior over the weight distribution. This enforces a sharing on the network weights, which can reduce the overall number of parameters drastically. We alternately sample from the posterior of the weights and the posterior of assignments of network connections to the weights. This results in a weight sharing that is adapted to the given data. In order to make the procedure feasible, we present several techniques to reduce the computational burden. Experiments show that our approach mostly outperforms models with random weight sharing. Our model is capable of reducing the memory footprint substantially while maintaining a good performance compared to neural networks without weight sharing.
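To make the idea of weight sharing under a Dirichlet process prior concrete, here is a minimal Python sketch. It is not the authors' method: it only draws once from the prior, using the Chinese restaurant process view of a Dirichlet process, and does not perform the alternating posterior sampling the abstract describes. The function names, the Gaussian base distribution, and the values of `alpha` and `sigma` are assumptions chosen for illustration.

```python
import numpy as np


def crp_assignments(n_connections, alpha, rng):
    """Assign each connection to a shared weight via a Chinese restaurant process.

    Hypothetical helper: at step i, connection i joins existing cluster k with
    probability counts[k] / (i + alpha) or opens a new cluster with
    probability alpha / (i + alpha).
    """
    assignments = np.zeros(n_connections, dtype=int)
    counts = [1]  # the first connection opens the first cluster
    for i in range(1, n_connections):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)  # new shared weight
        else:
            counts[k] += 1  # reuse an existing shared weight
        assignments[i] = k
    return assignments


def sample_shared_weights(n_in, n_out, alpha=1.0, sigma=1.0, seed=0):
    """Draw one layer's weight matrix from the prior (assumed Gaussian base)."""
    rng = np.random.default_rng(seed)
    z = crp_assignments(n_in * n_out, alpha, rng)
    # One value per cluster, drawn from the assumed base distribution N(0, sigma^2).
    shared = rng.normal(0.0, sigma, size=z.max() + 1)
    # Every connection looks up the value of the cluster it was assigned to.
    W = shared[z].reshape(n_in, n_out)
    return W, z, shared


W, z, shared = sample_shared_weights(n_in=20, n_out=10, alpha=2.0)
print("connections:", W.size, "distinct weights:", shared.size)
```

In this sketch only the `shared` vector and the assignment indices `z` would need to be stored, which is where a memory saving can come from when the number of distinct weights is much smaller than the number of connections.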

Original language	English
Pages (from-to)	246-252
Journal	IEEE Transactions on Pattern Analysis and Machine Intelligence
Volume	42
Issue number	1
DOIs	https://doi.org/10.1109/TPAMI.2018.2884905
Publication status	Published - 2020

Keywords

Bayesian neural networks
Dirichlet processes
Gibbs sampling
hybrid Monte-Carlo
non-conjugate models
weight sharing

ASJC Scopus subject areas

Software
Computer Vision and Pattern Recognition
Computational Theory and Mathematics
Artificial Intelligence
Applied Mathematics

Access to Document

10.1109/TPAMI.2018.2884905

Cite this

@article{e97b22a0c579484da42fd49c9514e81a,

title = "Bayesian Neural Networks with Weight Sharing Using Dirichlet Processes",

abstract = "We extend feed-forward neural networks with a Dirichlet process prior over the weight distribution. This enforces a sharing on the network weights, which can reduce the overall number of parameters drastically. We alternately sample from the posterior of the weights and the posterior of assignments of network connections to the weights. This results in a weight sharing that is adapted to the given data. In order to make the procedure feasible, we present several techniques to reduce the computational burden. Experiments show that our approach mostly outperforms models with random weight sharing. Our model is capable of reducing the memory footprint substantially while maintaining a good performance compared to neural networks without weight sharing.",

keywords = "Bayesian neural networks, Dirichlet processes, Gibbs sampling, hybrid Monte-Carlo, non-conjugate models, weight sharing",

author = "Wolfgang Roth and Franz Pernkopf",

year = "2020",

doi = "10.1109/TPAMI.2018.2884905",

language = "English",

volume = "42",

pages = "246--252",

journal = "IEEE Transactions on Pattern Analysis and Machine Intelligence",

issn = "0162-8828",

publisher = "IEEE Computer Society",

number = "1",

}

TY - JOUR

T1 - Bayesian Neural Networks with Weight Sharing Using Dirichlet Processes

AU - Roth, Wolfgang

AU - Pernkopf, Franz

PY - 2020

Y1 - 2020

N2 - We extend feed-forward neural networks with a Dirichlet process prior over the weight distribution. This enforces a sharing on the network weights, which can reduce the overall number of parameters drastically. We alternately sample from the posterior of the weights and the posterior of assignments of network connections to the weights. This results in a weight sharing that is adapted to the given data. In order to make the procedure feasible, we present several techniques to reduce the computational burden. Experiments show that our approach mostly outperforms models with random weight sharing. Our model is capable of reducing the memory footprint substantially while maintaining a good performance compared to neural networks without weight sharing.

AB - We extend feed-forward neural networks with a Dirichlet process prior over the weight distribution. This enforces a sharing on the network weights, which can reduce the overall number of parameters drastically. We alternately sample from the posterior of the weights and the posterior of assignments of network connections to the weights. This results in a weight sharing that is adapted to the given data. In order to make the procedure feasible, we present several techniques to reduce the computational burden. Experiments show that our approach mostly outperforms models with random weight sharing. Our model is capable of reducing the memory footprint substantially while maintaining a good performance compared to neural networks without weight sharing.

KW - Bayesian neural networks

KW - Dirichlet processes

KW - Gibbs sampling

KW - hybrid Monte-Carlo

KW - non-conjugate models

KW - weight sharing

UR - http://www.scopus.com/inward/record.url?scp=85058111559&partnerID=8YFLogxK

U2 - 10.1109/TPAMI.2018.2884905

DO - 10.1109/TPAMI.2018.2884905

M3 - Article

AN - SCOPUS:85058111559

SN - 0162-8828

VL - 42

SP - 246

EP - 252

JO - IEEE Transactions on Pattern Analysis and Machine Intelligence

JF - IEEE Transactions on Pattern Analysis and Machine Intelligence

IS - 1

ER -
