Sparsifying Bayesian neural networks with latent binary variables and normalizing flows

Artificial neural networks are powerful machine learning methods used in many modern applications. A common issue is that they have millions or billions of parameters and tend to overfit. Bayesian neural networks (BNN) can improve on this, since they incorporate parameter uncertainty. Latent binary Bayesian neural networks (LBBNN) further take structural uncertainty into account by allowing the weights to be turned on or off, enabling inference in the joint space of weights and structures. Mean-field variational inference is typically used for computation within such models. In this paper, we consider two extensions of variational inference for the LBBNN. First, by using the local reparametrization trick (LCRT), we improve computational efficiency. Second, and more importantly, by applying normalizing flows to the variational posterior distribution of the LBBNN parameters, we learn a more flexible variational posterior than the mean-field Gaussian. Experimental results on real data show that this improves predictive power compared to the LBBNN with mean-field variational inference, while also yielding sparser networks. We also perform a simulation study of variable selection in a logistic regression setting with highly correlated data, where the more flexible variational distribution improves results.