computational
modeling in biology

Statistical Inference

The CMB group develops statistical methods for the analysis of biological data from three application areas: "stem cells", "regulatory networks" and "metabolomics". Our modeling approaches range from differential equation models (ODE, SDE, PDE) to stochastic mixture models. Our main inference tools are Bayesian parameter estimation, machine learning techniques such as independent component analysis and principal component analysis, and functional data analysis techniques such as spline interpolation.

 

Bayesian Estimation of Latent Causes in Gene Regulatory Processes

  • We consider the dynamics of molecular systems, typically modeled through networks like the continuous time recurrent neural network (CTRNN). Such systems mainly consist of components for which the interaction structure is known apart from some constants. These components might be observed or unobserved.
  • Moreover, we assume that the dynamics is further influenced by latent causes which follow some unknown dynamics and act on the system in an unknown manner. These latent causes might be linear or non-linear.
  • The dynamics of the whole system is described in terms of ordinary differential equations and different noise models.
  • The aim of our analysis is to estimate both the unknown parameters and the latent causes in the dynamic model. This is done by a combination of penalized spline approximation, Bayesian parameter estimation, model selection methods and blind source separation.
  • External information on the dynamics of the latent causes and on the model parameters is incorporated through prior distributions.
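
To make the model class above concrete, here is a minimal sketch of a CTRNN driven by a latent cause. It assumes one common textbook form of the CTRNN, tau_i dx_i/dt = -x_i + sum_j w_ij * sigmoid(x_j + b_j) + u_i(t), and integrates it with a forward Euler scheme. The two-gene network, all parameter values and the sinusoidal latent cause u(t) are hypothetical illustrations and are not taken from the group's work; the estimation step (splines, Bayesian inference, source separation) is not shown.

```python
import math


def sigmoid(z):
    """Logistic activation function."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate_ctrnn(weights, biases, taus, latent, x0, dt=0.01, steps=1000):
    """Forward-Euler integration of an assumed CTRNN form:

        tau_i * dx_i/dt = -x_i + sum_j w_ij * sigmoid(x_j + b_j) + u_i(t)

    `latent(t)` returns the list u(t) of latent inputs, one per component.
    Returns the list of states at times 0, dt, ..., steps * dt.
    """
    n = len(x0)
    x = list(x0)
    trajectory = [list(x)]
    for k in range(steps):
        t = k * dt
        activation = [sigmoid(x[j] + biases[j]) for j in range(n)]
        u = latent(t)
        dx = [
            (-x[i] + sum(weights[i][j] * activation[j] for j in range(n)) + u[i])
            / taus[i]
            for i in range(n)
        ]
        x = [x[i] + dt * dx[i] for i in range(n)]
        trajectory.append(list(x))
    return trajectory


# Hypothetical two-gene network: gene 1 is activated by gene 2,
# gene 2 is repressed by gene 1, and a latent cause acts on gene 1 only.
weights = [[0.0, 1.5], [-2.0, 0.0]]
biases = [0.0, 0.0]
taus = [1.0, 2.0]


def latent(t):
    return [math.sin(t), 0.0]


traj = simulate_ctrnn(weights, biases, taus, latent, x0=[0.0, 0.0])
print(traj[-1])  # state of both genes at t = 10
```

In an inference setting, the weights, biases and time constants would be the unknown constants, and the function `latent` would itself be unknown and would have to be estimated from the data.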

 

Bayesian Blind Source Separation for Microarray Data

Main contacts at CMB: Katrin Illner, Christiane Fuchs, Fabian Theis

  • In many bioinformatics applications we deal with heterogeneous large-scale data such as microRNA, mRNA or protein level data, and in most cases some interactions and dependencies between these variables are already known.
  • In the proposed work we focus on this aspect and introduce a version of blind source separation, i.e. we assume that our measurements (observations) are linear mixtures of some underlying biological processes (sources), where we explicitly include the dependency structure as prior knowledge.
  • More precisely, we generalize an idea from the context of time series data. Instead of considering the time-delayed correlation, we introduce a graph-delayed correlation for networks and assume that different sources have vanishing graph-delayed correlation. We define our model in a probabilistic way and formulate an expectation maximization algorithm to obtain maximum a posteriori estimates of the parameters and hidden sources.
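
The step from time-delayed to graph-delayed correlation can be illustrated with a small sketch. The definition used here is an assumption made for illustration, not necessarily the one used in the project: the cross-covariance between signal `a` at node i and signal `b` at each predecessor j of i, averaged over all edges. The function name `graph_delayed_cov` and the toy graph are hypothetical. For a chain graph, in which every node t has the single predecessor t-1, this quantity reduces to the ordinary lag-1 cross-covariance of two time series.

```python
def graph_delayed_cov(a, b, predecessors):
    """Assumed graph-delayed cross-covariance of two node signals.

    `a` and `b` hold one value per node; `predecessors` maps a node index
    to the list of its predecessor nodes in the known interaction graph.
    Averages (a_i - mean(a)) * (b_j - mean(b)) over all edges j -> i.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    total = 0.0
    pairs = 0
    for i in range(n):
        for j in predecessors.get(i, []):
            total += (a[i] - mean_a) * (b[j] - mean_b)
            pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical regulatory graph on five genes: gene 0 regulates genes 1 and 2,
# gene 1 regulates gene 3, and genes 2 and 3 both regulate gene 4.
graph = {1: [0], 2: [0], 3: [1], 4: [2, 3]}
source_1 = [0.5, 1.0, -0.3, 2.0, 0.7]
source_2 = [1.0, 0.0, 0.4, -1.2, 0.9]

print(graph_delayed_cov(source_1, source_2, graph))
```

Under the model assumption stated above, this cross term would be close to zero for two distinct sources, while the graph-delayed autocovariance of a single source (passing the same signal twice) may be non-zero.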

 

Analysis of Zirconium Ingestion

Main contacts at CMB: Daniel Schmidl, Sabine Hug

  • Understanding the processing of radioactive substances in the human body is of high relevance for radiation sciences. For the processing of zirconium, several compartmental models exist. Their rate parameters, however, are unknown, and the competing models have to be compared.
  • Model selection is performed by computing Bayes factors through thermodynamic integration. Statistical analysis has already helped to improve a proposed model.
  • Estimation of the kinetic parameters, however, is challenging because of the high dimensionality of the joint distribution and the very low effective sample size of a standard MCMC sampler. Hence, a novel independence MCMC sampler based on a vine copula proposal function is being developed.
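
The mechanics of an independence MCMC sampler can be sketched in a few lines. Unlike a random-walk sampler, it draws every proposal from a fixed distribution q that does not depend on the current state, and accepts with probability min(1, w(y)/w(x)), where w = target/q. The sketch below is deliberately simplified: a one-dimensional Gaussian proposal stands in for the vine copula proposal, and the target is a toy Gaussian rather than the posterior of a compartmental model. All names and numbers are illustrative assumptions.

```python
import math
import random


def independence_sampler(log_target, sample_proposal, log_proposal, x0, n_samples, rng):
    """Independence Metropolis-Hastings sampler.

    Proposals come from a fixed distribution q, independent of the current
    state. A proposal y is accepted with probability min(1, w(y) / w(x)),
    where w = target / q. Densities may be unnormalized.
    Returns the chain and the observed acceptance rate.
    """
    x = x0
    log_w_x = log_target(x) - log_proposal(x)
    chain = []
    accepted = 0
    for _ in range(n_samples):
        y = sample_proposal(rng)
        log_w_y = log_target(y) - log_proposal(y)
        # 1 - random() lies in (0, 1], so the logarithm is always defined.
        if math.log(1.0 - rng.random()) < log_w_y - log_w_x:
            x, log_w_x = y, log_w_y
            accepted += 1
        chain.append(x)
    return chain, accepted / n_samples


# Toy target: unnormalized N(1, 0.5^2). Proposal: N(0, 2^2), chosen wider
# than the target so that the weights w stay bounded.
def log_target(x):
    return -0.5 * ((x - 1.0) / 0.5) ** 2


def log_proposal(x):
    return -0.5 * (x / 2.0) ** 2


def sample_proposal(rng):
    return rng.gauss(0.0, 2.0)


rng = random.Random(0)
chain, acceptance_rate = independence_sampler(
    log_target, sample_proposal, log_proposal, x0=0.0, n_samples=20000, rng=rng
)
posterior_mean = sum(chain) / len(chain)
print(posterior_mean, acceptance_rate)
```

The efficiency of such a sampler depends entirely on how well q matches the target, which is the motivation for a flexible proposal family in high dimensions.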

 

Modeling and Bayesian Inference for Diffusions

Main contact at CMB: Christiane Fuchs (née Dargatz)

  • Diffusion processes are a promising instrument for realistically modeling the time-continuous evolution of phenomena in biology, as they combine the advantages of probabilistic models and differential equation models. However, both the correct approximation of such dynamics in terms of diffusion processes and the statistical inference for diffusions prove to be challenging in practice.
  • In the CMB group we are actively involved in diffusion modeling and the development of Bayesian estimation techniques for diffusions [Dargatz 2010]. The application of diffusion processes to fluorescence microscopy data and single-cell data yields promising results and shows the potential of this approach.
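
As a minimal illustration of how a diffusion combines a differential equation with a probabilistic component, the sketch below simulates a path of the stochastic differential equation dX_t = mu(X_t) dt + sigma(X_t) dW_t with the Euler-Maruyama scheme. The mean-reverting (Ornstein-Uhlenbeck type) example and its parameter values are hypothetical and do not come from the applications mentioned above; the sketch covers simulation only, not Bayesian inference.

```python
import math
import random


def euler_maruyama(drift, diffusion, x0, dt, steps, rng):
    """Simulate one path of dX_t = drift(X_t) dt + diffusion(X_t) dW_t.

    Each Brownian increment dW is drawn as N(0, dt).
    Returns the path at times 0, dt, ..., steps * dt.
    """
    x = x0
    path = [x]
    sqrt_dt = math.sqrt(dt)
    for _ in range(steps):
        dw = rng.gauss(0.0, sqrt_dt)
        x = x + drift(x) * dt + diffusion(x) * dw
        path.append(x)
    return path


# Hypothetical mean-reverting process: pulled towards the level 2.0
# at rate 1.5, with constant noise intensity 0.3.
def drift(x):
    return 1.5 * (2.0 - x)


def diffusion(x):
    return 0.3


rng = random.Random(42)
path = euler_maruyama(drift, diffusion, x0=0.0, dt=0.01, steps=1000, rng=rng)
print(path[-1])
```

Setting the diffusion coefficient to zero recovers the forward Euler scheme for the corresponding ordinary differential equation, which shows how the diffusion model extends the deterministic one.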

Mixture Models

Main contact at CMB: Fabian Theis

  • A key aspect in the successful application of mixture models to bioinformatics data is the robustness of the algorithm with respect to missing data and outliers. We have described and analysed a blind source separation method that addresses this by applying the concept of scatter matrices [Theis et al., Proc. ICA 2010]. In a parallel project, we have studied whether Bayesian matrix factorization methods improve on frequentist approaches. The resulting methods turned out to be more robust against outliers, which allowed the analysis and interpretation of a genome-scale principal component analysis [Theis et al., Mol Biol and Evol, 2011].
  • We have applied these mixture model and factorization techniques to the efficient clustering of large-scale heterogeneous interaction networks, derived either from experimental data or from text mining in collaboration with the BIS group, IBIS. Results on clustering a manually annotated gene-complex graph show significant homogeneity between gene clusters and the corresponding complex clusters [Hartsperger et al., BMC Bioinformatics, 2010]. We expect this tool to be valuable in the preprocessing of large-scale interaction data in order to build smaller-scale models.

 

References

  • C. Dargatz. Bayesian Inference for Diffusion Processes with Applications in Life Sciences. Ph.D. Thesis, Ludwig-Maximilians-Universität Munich, 2010.
  • S. Hug and F. Theis. Bayesian Inference of Latent Causes in Gene Regulatory Dynamics. In Proc. LVA/ICA 2012, volume 7191 of LNCS, pages 520-527. Springer, Tel Aviv, Israel, 2012.
  • F. Theis, N. Latif, P. Wong and D. Frishman. Complex principal component and correlation structure of 16 yeast genomic variables. Molecular Biology and Evolution, accepted, 2011. doi:10.1093/molbev/msr077.
  • F. Theis, N. Müller, C. Plant and C. Böhm. Robust second-order source separation identifies experimental responses in biomedical imaging. In Proc. ICA 2010, volume 6365 of LNCS, pages 466-473. Springer, St. Malo, France, 2010.
  • M. Hartsperger, F. Blöchl, V. Stümpflen and F. Theis. Structuring heterogeneous biological information using fuzzy clustering of k-partite graphs. BMC Bioinformatics, 11:522, 2010.