Biostatistics

Projects

This is a selection of current and previous research projects. Code and further information are available in the GitHub repository.

Prospective COVID-19 Cohort Munich (KoCo19)

Investigating the impact of household structures on disease spread (source: Houda Yaqine, HMGU).

Main contacts: Turid Frahnow, Mercè Garí, Ronan Le Gleut, Heidi Seibold, Houda Yaqine, Christiane Fuchs

Under the leadership of the Division of Infectious Diseases and Tropical Medicine, we participate in the prospective COVID-19 Cohort Munich (KoCo19) study. We contribute statistical analyses and stochastic modelling, in particular stochastic differential equation models that account for disease transmission across regions and within households.

More information: Project webpage 

Uncertainty Quantification - From Data to Reliable Knowledge

Sources of uncertainty (brown), methods for treating uncertainty (blue) and expertise required (grey) (source: Christiane Fuchs, HMGU).

Main contacts: Heidi Seibold, Houda Yaqine, Christiane Fuchs

How will the climate develop, how secure is our energy supply, and what opportunities does molecular medicine offer? The rapidly increasing amount of data offers radically new opportunities to address today’s most pressing questions of society, science, and economy. Data, outcomes and predictions are, however, subject to uncertainties. The goal of the project Uncertainty Quantification is to understand these uncertainties through methods of probability theory, and to incorporate them into research and outreach. The project connects applied researchers from the four research fields Earth & Environment, Energy, Health, and Information with each other, with Helmholtz data science experts, and with external university partners from mathematics and econometrics.

More information: Project webpage 

Modeling and Bayesian Inference for Diffusions

Main contacts: Susanne Pieschner, Houda Yaqine, Christiane Fuchs

Diffusion processes are a promising instrument for realistically modelling the time-continuous evolution of phenomena in biology, as they combine the advantages of probabilistic models and differential equation models. However, both the correct approximation of such dynamics in terms of diffusion processes and the statistical inference for diffusions prove to be challenging in practice.

We are actively involved in diffusion modelling and in the development of Bayesian estimation techniques for diffusions. The application of diffusion processes to fluorescence microscopy data and single-cell data yields promising results and shows the potential of this approach.
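To illustrate how a diffusion adds randomness to a differential equation model, here is a minimal sketch of the Euler-Maruyama scheme, a standard way to simulate sample paths of a stochastic differential equation dX = drift(X) dt + diffusion(X) dW. The function name `euler_maruyama`, the mean-reverting example dynamics and all parameter values are assumptions chosen for illustration; they are not taken from the project.

```python
import math
import random


def euler_maruyama(drift, diffusion, x0, t_end, n_steps, seed=0):
    """Simulate one path of dX = drift(X) dt + diffusion(X) dW on [0, t_end]."""
    rng = random.Random(seed)
    dt = t_end / n_steps
    path = [x0]
    for _ in range(n_steps):
        x = path[-1]
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        path.append(x + drift(x) * dt + diffusion(x) * dw)
    return path


# Hypothetical example: mean-reverting dynamics pulled towards the level mu.
theta, mu, sigma = 1.5, 2.0, 0.3

# SDE path: deterministic drift plus random fluctuations.
sde_path = euler_maruyama(lambda x: theta * (mu - x), lambda x: sigma, 0.0, 5.0, 500)

# Setting the diffusion term to zero recovers the explicit Euler scheme for the ODE.
ode_path = euler_maruyama(lambda x: theta * (mu - x), lambda x: 0.0, 0.0, 5.0, 500)
```

With the diffusion term set to zero, the path settles at the level mu, while the SDE path keeps fluctuating around it. Inference for diffusions is harder than simulation, because the transition densities between observation times are usually not available in closed form.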

Sample paths of jump process (left), ODE (middle) and SDE (right) (source: Christiane Fuchs, HMGU).

Spatial organization and scientific collaboration at the Helmholtz Zentrum München and Bielefeld University

source: Hannah Busen, HMGU

Main contacts: Hannah Busen, Christiane Fuchs

It seems evident that spatial proximity between researchers may lead to more frequent or more intense collaboration than between scientists who work at a large distance from each other. We hypothesize that the spatial organization within a research campus, or even within a building, influences interdisciplinary work. In a collaboration network study, we investigate which distance matters, how much researchers are influenced by the people working around them, and how scientific publishing changes depending on the heterogeneity among authors.
Outcomes of the study could provide valuable information about which spatial organization fosters (interdisciplinary) research and could inform the planning of future buildings.
We plan to extend the analysis to other research centers and universities in Munich, such as the Ludwig-Maximilians University (LMU), which has institutes spread all over the city. This would make it possible to compare different campus structures and their influence on scientific collaboration.

References:

H. Busen, C. Fuchs (2020): Modelling the impact of spatial proximity on scientific collaboration networks. Proceedings of the 35th International Workshop on Statistical Modelling (IWSM). 

Estimation of Single-cell Heterogeneities from Cell Populations

Mixture of cells in different regulatory states (left) and single-cell mixture density (right) (source: Christiane Fuchs, HMGU).

Main contacts: Lisa Amrhein, Mercè Garí, Christiane Fuchs

Even when appearing perfectly homogeneous on a morphological basis, tissues can be substantially heterogeneous in single-cell molecular expression. As such heterogeneities might govern the regulation of cell fate, one is interested in quantifying the heterogeneities in a given tissue.

Gene expression measurements of single cells would be most suitable for detecting and parameterizing a heterogeneous population if the dataset were large and error-free. Unfortunately, such measurements are often expensive and subject to substantial technical noise. Instead of considering single-cell data, we randomly select small numbers of cells and measure the average expression levels of these subpopulations.

We investigate how heterogeneities can be detected from such data by applying statistical methods, and how the proportions, mean values and standard deviations of the groups of differently expressed cells can be estimated.
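The idea of measuring pooled cells can be sketched with a small simulation. The example below draws small random pools from a two-group mixture and compares the empirical mean and variance of the pool averages with the values implied by the mixture parameters. It is a sketch only, assuming normally distributed expression within each group; the function name `simulate_pool_averages` and all parameter values are hypothetical, and this is not the estimation method implemented in the StochasticProfiling package.

```python
import random
import statistics


def simulate_pool_averages(n_pools, cells_per_pool, p, means, sds, seed=1):
    """Average expression over small random pools drawn from a two-group mixture."""
    rng = random.Random(seed)
    averages = []
    for _ in range(n_pools):
        total = 0.0
        for _ in range(cells_per_pool):
            group = 0 if rng.random() < p else 1  # group 0 with probability p
            total += rng.gauss(means[group], sds[group])
        averages.append(total / cells_per_pool)
    return averages


# Hypothetical parameters: 20% of cells are in a high-expression state.
p, means, sds = 0.2, (5.0, 1.0), (0.5, 0.3)
cells_per_pool = 10
pools = simulate_pool_averages(4000, cells_per_pool, p, means, sds)

# Moments of a single cell implied by the mixture parameters.
mix_mean = p * means[0] + (1 - p) * means[1]
mix_second = p * (sds[0] ** 2 + means[0] ** 2) + (1 - p) * (sds[1] ** 2 + means[1] ** 2)
mix_var = mix_second - mix_mean ** 2

# Averaging independent cells keeps the mean and divides the variance by the pool size.
expected_pool_var = mix_var / cells_per_pool

empirical_mean = statistics.fmean(pools)
empirical_var = statistics.variance(pools)
```

Because the distribution of the pool averages still depends on the proportion, means and standard deviations of the groups, these parameters remain identifiable in principle from pooled data, which is what makes estimation from small-pool measurements feasible.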

Application to measurements from human breast epithelial cells reveals the functional relevance of the heterogeneous expression of a particular gene.

Source code, an R package and a webtool are provided on the StochasticProfiling project website.

Collaboration partner:
Prof. Dr. Kevin Janes, University of Virginia

Past Projects

Detecting Genetic and Environmental Risks for Childhood Asthma

Main contacts: Norbert Krautenbacher, Christiane Fuchs

Childhood asthma is a widespread disease. Many studies have revealed that its onset is influenced by genetic and environmental factors such as certain single nucleotide polymorphism (SNP) variants, family history or a farming environment.

Our objective is to develop an asthma risk score, especially for children between one and three years of age, with which one can assess a child’s personal risk of developing the disease. The score should be based on a few SNPs and the environmental variables. This shall allow cost-efficient, targeted treatment of exposed children.

Statistical aspects of this project are regularization and variable selection, gene-environment interactions, big data, inclusion of prior knowledge, stratification of the data, missing values, SNP imputation and validation.

Collaboration partners:
Prof. Dr. Erika von Mutius, Dr. Markus Ege, Prof. Dr. Bianca Schaub
Dr. von Hauner Children’s Hospital

ROC curves for models taking into account genetics (left), environment (middle), or both (right) (source: Norbert Krautenbacher, HMGU).

Risk Prediction for Prostate Cancer Patients

Main contacts at ICB: Ivan Kondofersky, Norbert Krautenbacher, Hagen Scherb, Christiane Fuchs

The Prostate Cancer DREAM Challenge aimed to improve survival prediction for prostate cancer patients. Participants were asked to build risk scores from a large collection of snapshot and longitudinal data tables within four months.

As "A Bavarian Dream" we participated in this challenge and finished among the winning teams in both Subchallenges 1 and 2. Our work involved data and result management, data cleaning and preprocessing in close collaboration with a clinician, and model building ranging from classical Cox regression to machine learning, ensemble methods and model averaging. Final predictions were evaluated on an independent test set that had been withheld by the challenge organizers.

From left to right: Donna Pauler Ankerst, Norbert Krautenbacher, Michael Laimighofer, Christoph Kurz, Christiane Fuchs, Hagen Scherb, Ivan Kondofersky, Julia Söllner; not pictured: Philip Dargatz (source: HMGU).

Learning Classifiers on Biased Samples

Selection process leading to biased sample (source: Christiane Fuchs, Eleni Tsalma, HMGU).

Main contacts: Norbert Krautenbacher, Christiane Fuchs

In many epidemiological applications, particular interest lies in the investigation of rare combinations of an exposure and a target variable. Representative samples from a population hence may not contain sufficiently many cases for a reliable analysis. For that reason, stratified samples are taken from the population to enrich the rare combinations. Well-known examples are case-control studies and two-phase studies. The enrichment comes at the cost of biased samples, which distort estimates. We address issues arising in prediction on such biased samples, both for training and for evaluation of a statistical model.
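The distortion caused by enrichment, and one standard way to correct it, can be sketched as follows. The example builds a hypothetical population with a rare binary outcome, keeps all cases but only a small fraction of controls, and then reweights each sampled unit by the inverse of its sampling probability. Inverse-probability weighting is used here as a generic illustration and is not necessarily the approach developed in the project; the population size, prevalence and sampling probabilities are assumptions.

```python
import random

rng = random.Random(42)

# Hypothetical population: binary outcome with roughly 5% prevalence.
population = [1 if rng.random() < 0.05 else 0 for _ in range(100_000)]
true_prev = sum(population) / len(population)

# Stratified sampling: keep every case, but only 5% of the controls.
sampling_prob = {1: 1.0, 0: 0.05}
sample = [y for y in population if rng.random() < sampling_prob[y]]

# The naive estimate treats the enriched sample as if it were representative.
naive_prev = sum(sample) / len(sample)

# Inverse-probability weights: each sampled unit stands in for 1 / P(sampled) units.
weights = [1.0 / sampling_prob[y] for y in sample]
weighted_prev = sum(w * y for w, y in zip(weights, sample)) / sum(weights)
```

The naive estimate is far too high because cases are overrepresented, while the weighted estimate is close to the population prevalence. The same weights can be passed to model fitting and to performance measures, so that both training and evaluation refer to the target population instead of the enriched sample.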

Computational Models of Neoplasmic Heterogeneities and Lineage Choice

Main contacts at Biostatistics: Lisa Amrhein, Christiane Fuchs

Acute myeloid leukemia (AML) often results from the myelodysplastic syndrome (MDS). Here, the differentiation hierarchy from hematopoietic stem cells to mature, functional cells is disturbed. AML patients, again, frequently carry a mixture of different cancer cell types, so-called subclones. This is reflected by a mixture of genomic signatures and heterogeneous transcriptome profiles. Applying statistical and dynamical models to data from our clinical and biological collaborators, we want to identify altered differentiation hierarchies of MDS subclones and characterize the development of related heterogeneities in AML while the tumor undergoes evolution.

This project is funded as Subproject A17 of the Collaborative Research Centre (CRC) 1243 "Genetic and Epigenetic Evolution of Hematopoietic Neoplasms".

Further reading:

  • Webpage of CRC 1243: http://www.sfb1243.biologie.uni-muenchen.de/index.html

Objectives of this project. Left: Inferring transcriptional and genomic heterogeneity in AML, that is, the co-existence of at least two distinct cell populations within a tumor, and how it evolves over time. Right: Estimating differentiation and self-renewal rates in healthy and clonal differentiation hierarchies from healthy and MDS patient data (source: Christiane Fuchs, Carsten Marr, HMGU).