Staff Details

Dr. Arthur Gilly
Head of Analytics

Phone: +49 89 3187 43390
Building/Room: 3533/215

As head of the analytics team at our institute, my role is to help build, maintain and monitor the institute's high-performance computing (HPC) infrastructure and to provide programming and statistical support for a wide range of multi-omics analyses. We identify and address challenging, data-intensive research questions to advance the translation of genetic research into actionable outcomes for precision health. My research focuses on the genetic architecture and underlying mechanisms of complex traits and diseases.

Technical focus:

  • Statistics, data analysis and machine learning
  • Data extraction, transformation and cleaning
  • Programming and scripting (Bash, R, Python, C++, ...)
  • Pipelining
  • Data visualisation and reporting

Research topics:

  • Whole-genome sequencing
  • Proteomics
  • Complex architectures: Rare and structural variants

Responsibilities:

  • Solve local IT issues
  • Build up, maintain and monitor HPC infrastructure
  • Provide bioinformatics support (writing pipelines, etc.)
  • Provide training on software and hardware resources
  • Identify and address data-intensive research questions

Finalist in the ASHG/Charles J. Epstein Trainee Awards for Excellence in Human Genetics Research in 2017. 

Selected Publications:

Png, G. ; ... ; Zeggini, E. ; Gilly, A.
Population-wide copy number variation calling using variant call format files from 6,898 individuals.
Genet. Epidemiol., accepted (2019) 

Gilly, A. ; ... ; Hatzikotoulas, K. ; Zeggini, E.
Very low-depth whole-genome sequencing in complex trait association studies.
Bioinformatics 35, 2555-2561 (2019) 

Gilly, A. ; ... ; Hatzikotoulas, K. ; Zeggini, E.
Cohort-wide deep whole genome sequencing and the allelic architecture of complex traits.
Nat. Commun. 9:5460 (2018) 

Gilly, A. ; ... ; Southam, L. ; Zeggini, E.
Very low-depth sequencing in a founder population identifies a cardioprotective APOC3 signal missed by genome-wide imputation.
Hum. Mol. Genet., 2360-2365 (2016)

Masters Projects:

All projects are fully computer-based. Students interested in applying should be familiar with R and/or Python and be proficient with the *NIX command line. A basic background in genetics is required.

Evaluation of the heritability components of circulating protein levels

A sizeable fraction of the ~20,000 proteins encoded in our genomes can be detected in the blood. Since proteins are key actors in normal and pathological processes, circulating protein levels have the potential to serve as important biomarkers of metabolic states and disease. In a previous study, we performed a genome-wide association study (GWAS) of several hundred proteins to determine which genomic variants have strong influences on blood protein levels.

Although these results provide valuable insights into the genetic architecture of these traits, they do not capture smaller, harder-to-detect effects and therefore paint an incomplete genetic picture. In this project, the student will apply so-called heritability estimation methods, which aggregate effects across the whole genome, including weak and statistically ambiguous ones. These methods estimate the proportion of phenotypic variance explained additively by all of these effects. They also make it possible to partition heritability into different variant classes and to determine what proportion of the total is explained by the strong effects usually detected by association studies. The student will benchmark several of these methods and compare their estimates. We will also study the relative contributions of strong effects and of different consequence classes, such as regulatory variants, characterise proteins with high vs. low genetic heritability, and more.
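To illustrate how such methods sum many small effects, here is a minimal Python sketch of Haseman-Elston (HE) regression, one of the simplest heritability estimators. It regresses the product of two individuals' standardised phenotypes on their genetic relationship; fitting one genetic relationship matrix (GRM) per variant class partitions heritability between the classes. Everything here is an assumption for illustration: the data are simulated, the function names and the two equal-sized variant classes (standing in for, e.g., regulatory vs. other variants) are hypothetical, and the true values 0.4 and 0.1 are arbitrary. HE regression is not necessarily one of the methods the project will benchmark, and the sketch ignores covariates, relatedness filtering and linkage disequilibrium.

```python
import numpy as np

def simulate_trait(n=1500, m=1000, h2_by_class=(0.4, 0.1), seed=1):
    """Hypothetical data: standardised genotypes and an additive trait.

    Variants are split into equal-sized classes; class k contributes
    h2_by_class[k] of the phenotypic variance through many small effects."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.05, 0.5, size=m)
    G = rng.binomial(2, freqs, size=(n, m)).astype(float)   # 0/1/2 genotypes
    Z = (G - G.mean(axis=0)) / G.std(axis=0)                 # mean 0, variance 1
    classes = np.array_split(np.arange(m), len(h2_by_class))
    g = np.zeros(n)
    for idx, h2_k in zip(classes, h2_by_class):
        beta = rng.normal(0.0, np.sqrt(h2_k / len(idx)), size=len(idx))
        g += Z[:, idx] @ beta
    noise_var = 1.0 - sum(h2_by_class)
    y = g + rng.normal(0.0, np.sqrt(noise_var), size=n)
    return Z, y, classes

def he_partitioned_h2(Z, y, classes):
    """Haseman-Elston regression with one GRM per variant class.

    Regresses y_i * y_j on the class-specific GRM entries A_k[i, j] over all
    pairs i < j; the slope for class k estimates that class's heritability."""
    n = Z.shape[0]
    y = (y - y.mean()) / y.std()                  # standardised phenotype
    iu = np.triu_indices(n, k=1)                  # off-diagonal pairs only
    cols = [np.ones(len(iu[0]))]                  # intercept column
    for idx in classes:
        A_k = Z[:, idx] @ Z[:, idx].T / len(idx)  # genetic relationship matrix
        cols.append(A_k[iu])
    X = np.column_stack(cols)
    target = np.outer(y, y)[iu]
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef[1:]                               # one h2 estimate per class

Z, y, classes = simulate_trait()
h2_parts = he_partitioned_h2(Z, y, classes)
print("per-class h2:", np.round(h2_parts, 3), "total h2:", round(float(h2_parts.sum()), 3))
```

Summing the per-class estimates gives the total additive (SNP) heritability. Comparing that total with the variance explained by genome-wide significant hits alone is one way to quantify how much of the genetic picture the association results miss.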

Disentangling the LD structure of a disease-relevant locus using whole-genome sequencing data

Genetic association studies detect statistical associations between single nucleotide DNA variants (SNVs) and human traits. This can help pinpoint genetic loci involved in the molecular mechanisms underpinning disease onset and progression. However, other classes of variation, such as structural variants (SVs), are not detected by these studies. SVs can affect large DNA regions and have been shown to play a role in the genetics of medically relevant traits.
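At its core, such an association study fits one regression per variant. The Python sketch below tests each SNV for an additive effect on a quantitative trait using simple linear regression. The genotypes, the single causal variant (index 123, allele frequency 0.3, effect 0.5 per allele) and the function name are hypothetical; real analyses add covariates such as age, sex and principal components, account for relatedness, and use the exact t distribution rather than the normal approximation used here.

```python
import math
import numpy as np

def single_variant_scan(G, y):
    """Additive single-variant association test.

    G: (n_individuals, n_variants) genotypes coded 0/1/2; y: trait values.
    Returns per-variant effect sizes (per allele) and two-sided p-values."""
    n = G.shape[0]
    Gc = G - G.mean(axis=0)
    yc = y - y.mean()
    sxx = np.sum(Gc ** 2, axis=0)
    beta = (Gc.T @ yc) / sxx                               # OLS slope per variant
    rss = np.sum((yc[:, None] - Gc * beta) ** 2, axis=0)   # residual sum of squares
    se = np.sqrt(rss / (n - 2) / sxx)
    z = beta / se
    # Normal approximation to the t distribution; acceptable for large n
    p = np.array([math.erfc(abs(zi) / math.sqrt(2)) for zi in z])
    return beta, p

# Hypothetical simulated data: 2,000 individuals, 500 variants, one causal variant
rng = np.random.default_rng(7)
n, m, causal = 2000, 500, 123
freqs = rng.uniform(0.05, 0.5, size=m)
freqs[causal] = 0.3
G = rng.binomial(2, freqs, size=(n, m)).astype(float)
y = 0.5 * G[:, causal] + rng.normal(size=n)

beta, p = single_variant_scan(G, y)
hits = np.flatnonzero(p < 5e-8)   # conventional genome-wide significance threshold
print("significant variants:", hits, "effect sizes:", np.round(beta[hits], 3))
```

A structural variant that is not in strong linkage disequilibrium with any tested SNV would simply not appear in such a scan, which is the blind spot described above.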

We have detected a large signature on human chromosome 16 in an isolated population that displays increased longevity and resistance to diabetic complications. The signature is rare (<1%) in the population, and the region harbours a strong cholesterol-altering signal. The objective of this project is to perform a "deep dive" analysis of this region and fully characterise it using computational tools. Among other analyses, the student will perform haplotype phasing and structural variant detection using the 22x whole-genome sequencing data available for this cohort. They will also use these data to reconstruct pedigrees and trace the signature's transmission through families, and will look for the signature in other populations worldwide.
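Pedigree reconstruction typically starts from pairwise relatedness estimated directly from genotypes. The Python sketch below computes a kinship coefficient using the symmetric form of the KING-robust estimator, phi = (N_het,het - 2 * N_IBS0) / (N_het(i) + N_het(j)), where N_IBS0 counts sites at which the two individuals are opposite homozygotes. It then bins phi using KING-style degree cut-offs (2^-1.5, 2^-2.5, ...). The family, the 5,000 independent SNPs, free recombination and all function names are assumptions made for illustration. This is not the pipeline used for this cohort; real analyses would rely on dedicated tools such as KING, followed by pedigree-building software.

```python
import numpy as np

def king_robust_kinship(g1, g2):
    """Kinship coefficient from two 0/1/2 genotype vectors (symmetric KING-robust form).

    Expected values: 0.5 for duplicates/MZ twins, 0.25 for first-degree
    relatives, 0.125 for second-degree, about 0 for unrelated pairs."""
    het1, het2 = g1 == 1, g2 == 1
    n_het_het = np.sum(het1 & het2)            # both heterozygous
    n_ibs0 = np.sum(np.abs(g1 - g2) == 2)      # opposite homozygotes: no shared allele
    return (n_het_het - 2 * n_ibs0) / (het1.sum() + het2.sum())

def relationship_degree(phi):
    """Bin a kinship estimate using KING-style cut-offs."""
    if phi > 2 ** -1.5:
        return "duplicate/MZ twin"
    if phi > 2 ** -2.5:
        return "first-degree"
    if phi > 2 ** -3.5:
        return "second-degree"
    if phi > 2 ** -4.5:
        return "third-degree"
    return "unrelated"

# Hypothetical family: mother, father, their child and an unrelated individual
rng = np.random.default_rng(3)
m = 5000
freqs = rng.uniform(0.05, 0.5, size=m)

def random_haplotypes():
    return rng.binomial(1, freqs, size=(2, m))   # two haplotypes of 0/1 alleles

mother, father, stranger = random_haplotypes(), random_haplotypes(), random_haplotypes()
loci = np.arange(m)
# Each parent transmits one of its two alleles at every SNP (free recombination assumed)
child = np.vstack([mother[rng.integers(0, 2, size=m), loci],
                   father[rng.integers(0, 2, size=m), loci]])
geno = {name: h.sum(axis=0) for name, h in
        [("mother", mother), ("father", father), ("child", child), ("stranger", stranger)]}

for a, b in [("mother", "child"), ("father", "child"), ("mother", "father"), ("child", "stranger")]:
    phi = king_robust_kinship(geno[a], geno[b])
    print(f"{a}-{b}: phi = {phi:.3f} ({relationship_degree(phi)})")
```

Once such pairwise relationships are established, carriers of the rare chromosome 16 signature can be placed on the inferred pedigree to follow its transmission.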

Understanding ageing markers in population isolates

Ageing is the only “disease” that affects us all. However, the molecular mechanisms underlying ageing are complex and poorly understood. This project aims to study molecular markers of ageing, their genetic component and their correlation with disease. The student will leverage deep phenotypes, proteomics and whole-genome sequencing data previously generated in two population cohorts. Both are similarly sized (n ≈ 1,500) samples from isolated Greek populations: the first with reported increased longevity and lower rates of metabolic disease complications, the second with an increased incidence of metabolic and psychiatric disorders.

The project has several aims. First, the student will study the genetic determinants of known age-dependent variables, such as telomere length and mosaic loss of chromosome Y, and compare the robustness of these measures using several computational methods. They will then link these variables with circulating protein levels to identify the ageing biomarkers that best predict age or that differentiate healthy from unhealthy ageing. Finally, the project will compare results with the existing literature to determine whether any signal is specific to either of the isolated populations under study. Previous experience with supervised regression methods is preferred.
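One common way to link protein levels with age is a penalised regression "clock" that predicts chronological age from protein measurements. The Python sketch below fits closed-form ridge regression on simulated data. The cohort size, the 200 proteins (20 of them age-associated), the penalty lam=10 and the 1,000/500 train/test split are hypothetical choices and not results from these cohorts. In practice the penalty would be tuned by cross-validation, and other supervised methods (e.g. elastic net) are equally plausible choices.

```python
import numpy as np

def fit_ridge(X, y, lam=10.0):
    """Closed-form ridge regression on standardised features."""
    mu, sd = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mu) / sd
    y_mean = y.mean()
    n_features = X.shape[1]
    # Solve (Xs'Xs + lam * I) coef = Xs'(y - mean(y))
    coef = np.linalg.solve(Xs.T @ Xs + lam * np.eye(n_features), Xs.T @ (y - y_mean))
    return coef, y_mean, mu, sd

def predict_age(model, X):
    coef, y_mean, mu, sd = model
    return ((X - mu) / sd) @ coef + y_mean

# Hypothetical simulated cohort: 1,500 people, 200 proteins, 20 of them age-associated
rng = np.random.default_rng(42)
n, n_prot, n_age = 1500, 200, 20
age = rng.uniform(20, 90, size=n)
age_z = (age - age.mean()) / age.std()
loadings = np.zeros(n_prot)
loadings[:n_age] = rng.choice([-0.5, 0.5], size=n_age)   # direction of each age effect
X = np.outer(age_z, loadings) + rng.normal(size=(n, n_prot))

train, test = np.arange(1000), np.arange(1000, n)
model = fit_ridge(X[train], age[train], lam=10.0)
pred = predict_age(model, X[test])
r = np.corrcoef(pred, age[test])[0, 1]
mae = np.mean(np.abs(pred - age[test]))
print(f"held-out r = {r:.2f}, MAE = {mae:.1f} years")
```

The difference between predicted and chronological age, often called age acceleration, is one possible quantity for contrasting healthy and unhealthy ageing between the two cohorts.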
