A Bayesian binomial regression model with latent gaussian processes for modelling DNA methylation


Epigenetic observations are represented by the total number of reads from a given pool of cells and the number of methylated reads, making it reasonable to model this data by a binomial distribution. There are numerous factors that can influence the probability of success in a particular region. Moreover, there is a strong spatial (alongside the genome) dependence of these probabilities. We incorporate dependence on the covariates and the spatial dependence of the methylation probability for observations from a pool of cells by means of a binomial regression model with a latent Gaussian field and a logit link function. We apply a Bayesian approach including prior specifications on model configurations. We run a mode jumping Markov chain Monte Carlo algorithm (MJMCMC) across different choices of covariates in order to obtain the joint posterior distribution of parameters and models. This also allows finding the best set of covariates to model methylation probability within the genomic region of interest and individual marginal inclusion probabilities of the covariates.