Bioinformatics Master Practical SS19

Fabian Theis, Tel.: +49 89 2891-7961
Carsten Marr, Tel.: +49 89 3187-2158
Matthias Heinig, Tel.: +49 89 3187-4217
Benjamin Schubert, Tel.: +49 89 3187-3369


Room: HMGU, Institute of Computational Biology, Building 58a
Date & Time: Block course, 6 weeks starting July 29, with a weekly meeting with the research group
Prerequisites: Bachelor's degree in mathematics, bioinformatics, statistics, or a related field
Language: English


NOTE: Preliminary course descriptions

Project I: Model inference from protein time-courses in hematopoietic stem cells

Supervisor: Carsten Marr

Abstract: Stochasticity of gene expression becomes apparent when we study the dynamics of single cells rather than a population of cells. In fact, fluctuations in gene expression reveal more information about the underlying mechanisms of transcription and translation than one could obtain from population averages.

In this project we will develop a particle filtering algorithm to infer the parameters of stochastic models from single cell time course data. In particular, we will apply this method to time-lapse microscopy data of two transcription factors in murine blood stem cells. These transcription factors play a major role in stem cell differentiation.
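
To give a flavour of the approach, the following is a minimal sketch of a bootstrap particle filter, not the method that will be developed in the project. It assumes a single protein species following a birth-death process (production rate k_prod, degradation rate k_deg), Gaussian measurement noise, an initial protein count of zero, and a simple grid search over k_prod. All function names, parameter values, and the synthetic data are hypothetical.

```python
import math
import random


def simulate_birth_death(x, k_prod, k_deg, dt, rng):
    """Gillespie simulation of a birth-death process over a time span dt."""
    t = 0.0
    while True:
        birth, death = k_prod, k_deg * x
        total = birth + death
        if total <= 0.0:
            return x
        t += rng.expovariate(total)  # waiting time to the next reaction
        if t > dt:
            return x
        x += 1 if rng.random() * total < birth else -1


def simulate_observations(k_prod, k_deg, dt, sd, n_obs, rng):
    """Generate a synthetic, noisy time course (assumed start: no protein)."""
    x, obs = 0, []
    for _ in range(n_obs):
        x = simulate_birth_death(x, k_prod, k_deg, dt, rng)
        obs.append(rng.gauss(x, sd))
    return obs


def log_gauss(y, mean, sd):
    """Log density of a Gaussian measurement model."""
    return -0.5 * ((y - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def pf_log_likelihood(obs, k_prod, k_deg, dt, sd, n_particles, rng):
    """Bootstrap particle filter estimate of log p(obs | k_prod, k_deg)."""
    particles = [0] * n_particles
    loglik = 0.0
    for y in obs:
        # 1. Propagate every particle through the stochastic model.
        particles = [simulate_birth_death(x, k_prod, k_deg, dt, rng) for x in particles]
        # 2. Weight particles by the measurement likelihood (log-sum-exp for stability).
        logw = [log_gauss(y, x, sd) for x in particles]
        m = max(logw)
        w = [math.exp(lw - m) for lw in logw]
        loglik += m + math.log(sum(w) / n_particles)
        # 3. Resample particles in proportion to their weights.
        particles = rng.choices(particles, weights=w, k=n_particles)
    return loglik


# Toy usage: score a grid of candidate production rates on synthetic data.
rng = random.Random(0)
data = simulate_observations(k_prod=10.0, k_deg=0.1, dt=1.0, sd=5.0, n_obs=20, rng=rng)
grid = [2.0, 5.0, 10.0, 20.0, 40.0]
scores = {k: pf_log_likelihood(data, k, 0.1, 1.0, 5.0, 200, rng) for k in grid}
print(max(scores, key=scores.get))
```

The grid search is only the simplest way to use the likelihood estimate; it could equally be plugged into a sampler over the parameters.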

During the project, you will learn about stochastic dynamics, the associated inference methods, and models of stem cell fate decisions. Requirements are good programming skills and a solid background in statistics/machine learning.

Meetings: Mondays


  1. Kaern, M., Elston, T. C., Blake, W. J. & Collins, J. J. Stochasticity in gene expression: from theories to phenotypes. Nat. Rev. Genet. 6, 451–464 (2005).
  2. Zechner, C., Pelet, S., Peter, M. & Koeppl, H. Recursive Bayesian Estimation of Stochastic Rate Constants from Heterogeneous Cell Populations. 50th IEEE Conf. Decis. Control (2011).

Project II: Inference of transcriptional regulators from single cell open chromatin and gene expression data

Supervisor: Matthias Heinig

Abstract: The identity of a cell is mirrored in its global gene expression profile. Transcription is regulated by transcription factors that recognize specific sequences in cis regulatory elements residing in regions of open chromatin. Recent technological developments allow for the measurement of genome-wide transcriptional profiles as well as open chromatin regions in single cells.

In this project we will develop an approach to infer which transcription factors define cellular identity by integrating gene expression, open chromatin, and known sequence motifs of transcription factors.
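
As an illustration of how a known sequence motif can be matched against an open chromatin region, the sketch below scores a sequence with a log-odds position weight matrix. It assumes a uniform background, forward-strand scanning only, and toy base counts that favour the sequence GATA; the function names and all numbers are hypothetical and not taken from any motif database.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into a log-odds position weight matrix."""
    pwm = []
    for col in counts:
        total = sum(col[b] for b in BASES) + 4 * pseudocount
        pwm.append(
            {b: math.log2((col[b] + pseudocount) / total / background) for b in BASES}
        )
    return pwm


def best_hit(sequence, pwm):
    """Return (score, offset) of the best-scoring window on the forward strand."""
    width = len(pwm)
    scores = [
        (sum(pwm[i][sequence[start + i]] for i in range(width)), start)
        for start in range(len(sequence) - width + 1)
    ]
    return max(scores)


# Toy motif: counts strongly favouring G, A, T, A at positions 1 to 4.
toy_counts = [
    {"A": 1, "C": 1, "G": 17, "T": 1},
    {"A": 17, "C": 1, "G": 1, "T": 1},
    {"A": 1, "C": 1, "G": 1, "T": 17},
    {"A": 17, "C": 1, "G": 1, "T": 1},
]
toy_pwm = log_odds_pwm(toy_counts)
score, offset = best_hit("CCGATAGG", toy_pwm)
print(offset, round(score, 2))
```

In a real analysis, per-region motif scores of this kind would be one input to the statistical model, alongside accessibility and expression measurements.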

During this project you will learn to analyze single cell RNA-seq and single cell ATAC-seq data, to identify transcription factor binding sites from ATAC-seq footprints and to integrate these data in a statistical model.

Meetings: Mondays


  1. Schmidt, F., Gasparoni, N., Gasparoni, G., Gianmoena, K., Cadenas, C., Polansky, J. K., et al. (2017). Combining transcription factor binding affinities with open-chromatin data for accurate gene expression prediction. Nucleic Acids Research, 45(1), 54–66.
  2. Gusmao, E. G., Allhoff, M., Zenke, M., & Costa, I. G. (2016). Analysis of computational footprinting methods for DNase sequencing experiments. Nature Methods, 13(4), 303–309.
  3. Cusanovich, D. A., Hill, A. J., Aghamirzaie, D., Daza, R. M., Pliner, H. A., Berletch, J. B., et al. (2018). A Single-Cell Atlas of In Vivo Mammalian Chromatin Accessibility. Cell, 174(5), 1309–1324.e18.

Project III: Exploring the cellular heterogeneity of the mouse hippocampus under stress

Supervisor: Fabian Theis

Abstract: Droplet-based single cell RNA-sequencing has enabled us to profile the cellular heterogeneity of tissues, organs, and even whole organisms. The largest of these so-called “atlases” is a 1.3 million cell dataset of the mouse brain. The next step in single-cell profiling is to understand how these atlases change under perturbations. Together with collaborators at the Max Planck Institute of Psychiatry, we are generating a dataset of over 120,000 cells to profile how the mouse hippocampus responds to stress under various conditions.

In this project we will begin to analyse the first perturbation atlas of the mouse brain. Using our popular analysis platform, scanpy, we will investigate how hippocampal cellular diversity is affected by stress and neuronal receptor knockout. This project will involve using and adapting various machine learning methods for cellular embedding and clustering, computing gene expression signatures, comparing cellular compositions between conditions, and inferring developmental and other trajectories of brain cell types. We will further adapt machine learning methods for trajectory inference to describe continuous cellular phenotypes in the brain.
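
One of the listed steps, comparing cellular compositions between conditions, can be sketched in a few lines. The example below assumes that each cell already carries a cluster or cell type label and simply compares label proportions between two conditions; the labels, counts, and function names are made up for illustration, and no statistical test is included.

```python
from collections import Counter


def composition(labels):
    """Fraction of cells assigned to each cell type."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}


def composition_shift(control, treated):
    """Difference in cell type fractions (treated minus control)."""
    p_ctrl, p_trt = composition(control), composition(treated)
    cell_types = sorted(set(p_ctrl) | set(p_trt))
    return {ct: p_trt.get(ct, 0.0) - p_ctrl.get(ct, 0.0) for ct in cell_types}


# Toy usage with hypothetical per-cell labels for two conditions.
control_labels = ["neuron"] * 6 + ["astrocyte"] * 4
stress_labels = ["neuron"] * 5 + ["astrocyte"] * 3 + ["microglia"] * 2
for cell_type, delta in composition_shift(control_labels, stress_labels).items():
    print(f"{cell_type}: {delta:+.2f}")
```

Because fractions sum to one, an increase in one cell type necessarily lowers the others, which is one reason composition changes need careful statistical treatment.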

During this project you will learn how single-cell RNA-seq data is pre-processed, analysed, and interpreted to obtain biological insights. You will familiarize yourself with how machine learning is applied in a biological context and you will work with analysis methods that represent the current state-of-the-art in the single-cell field. Experience in Python and a basic understanding of statistics are required for this project.

Meetings: Thursdays


  1. Schaum, N., Karkanias, J., Neff, N., et al., Single-cell transcriptomics of 20 mouse organs creates a Tabula Muris, Nature (2018) 562(7727) 367-372
  2. Wolf, F.A., Angerer, P., Theis, F.J., SCANPY: large-scale single-cell gene expression data analysis, Genome Biology (2018) 19(1) 15
  3. Haghverdi, L., Buettner, M., Wolf, F.A., Buettner, F., Theis, F.J., Diffusion pseudotime robustly reconstructs lineage branching, Nature Methods (2016) 13(10) 845-848

Project IV: Alignment-free HLA Genotype Inference

Supervisor: Benjamin Schubert

Abstract: The human leukocyte antigen (HLA) cluster of genes plays a major role in the adaptive immune response and is thus important for many clinical applications. However, determining an individual's HLA genotype experimentally is costly, as sequencing data has to be produced for the sole purpose of HLA genotyping.

In this project, we will develop a fast and resource-efficient approach to infer the HLA genotype from standard NGS data using pseudo-alignment and quasi-indexing strategies combined with an integer linear programming formulation. Once the HLA genotype is inferred, we will extend the model to also quantify the relative abundance of the HLA alleles (given RNA-Seq data).
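
The kind of objective such a formulation might encode can be illustrated on a toy example: for one locus, choose the pair of alleles that is compatible with the largest number of reads. The sketch below replaces the integer linear program with exhaustive enumeration of allele pairs, which is only feasible for tiny inputs. The read-to-allele compatibility sets are invented, and the function name is hypothetical.

```python
from itertools import combinations_with_replacement


def best_allele_pair(read_hits, alleles):
    """Pick the allele pair (homozygous allowed) that explains the most reads.

    read_hits maps a read id to the set of alleles the read is compatible with.
    Ties are resolved in favour of the first pair in sorted order.
    """
    best_pair, best_count = None, -1
    for pair in combinations_with_replacement(sorted(alleles), 2):
        chosen = set(pair)
        explained = sum(1 for hits in read_hits.values() if hits & chosen)
        if explained > best_count:
            best_pair, best_count = pair, explained
    return best_pair, best_count


# Toy usage: six reads with hypothetical compatibility sets for three alleles.
toy_alleles = {"A*01:01", "A*02:01", "A*03:01"}
toy_hits = {
    "read1": {"A*01:01"},
    "read2": {"A*01:01", "A*03:01"},
    "read3": {"A*02:01"},
    "read4": {"A*02:01"},
    "read5": {"A*01:01"},
    "read6": {"A*01:01", "A*02:01"},
}
print(best_allele_pair(toy_hits, toy_alleles))
```

An integer linear program expresses the same choice with binary variables per allele and per read, which lets a solver handle realistic numbers of alleles and reads.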

During this project, you will familiarize yourself with modern pseudo-alignment and quasi-mapping techniques and will learn how to formulate and solve integer linear programs. You will require some level of C/C++ programming experience for this project.

Meetings: Thursdays (subject to change)


  1. Szolek, A., Schubert, B., Mohr, C., Sturm, M., Feldhahn, M., & Kohlbacher, O. (2014). OptiType: precision HLA typing from next-generation sequencing data. Bioinformatics, 30(23), 3310-3316.
  2. Srivastava, A., Sarkar, H., Gupta, N., & Patro, R. (2016). RapMap: a rapid, sensitive and accurate tool for mapping RNA-seq reads to transcriptomes. Bioinformatics, 32(12), i192-i200.
  3. Saltzman, M. J. (2002). COIN-OR: an open-source library for optimization. In Programming languages and systems in computational economics and finance (pp. 3-32). Springer, Boston, MA.


