GPLVM for time-structured data

More and more single-cell experiments on developing stem cells yield gene expression data from several developmental stages. To better understand cell fate decisions during differentiation, it is desirable to analyse this high-dimensional gene expression data and to assess differences in gene expression patterns both between and within developmental stages. Conventional methods include univariate analyses of the expression distribution of individual genes at each stage, or multivariate linear methods such as PCA. However, these approaches often fail to resolve important differences: each lineage has a unique gene expression pattern that changes gradually over time, which produces differences in expression between stages as well as heterogeneous expression distributions within a single stage. Furthermore, to date no approach that takes the temporal structure of the data into account has been presented.
We present a novel framework based on Gaussian Process Latent Variable Models (GPLVMs) to analyse single-cell qPCR expression data from different developmental stages. We extend GPLVMs with gene relevance maps and gradient plots, which provide interpretability comparable to that of linear methods. Furthermore, we take the temporal group structure of the data into account by introducing a new factor into the GPLVM likelihood that ensures small distances are preserved between cells from the same developmental stage.
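The NumPy sketch below illustrates the two extensions under stated assumptions; it is not the authors' implementation, which builds on the MATLAB FGPLVM toolbox. It assumes a squared-exponential (RBF) kernel with fixed hyperparameters and one GP per gene sharing that kernel. `gplvm_nll` is the standard GPLVM negative log-likelihood. `stage_penalty` is a hypothetical stand-in for the stage-specific factor: written as an extra negative log term, it penalises pairs of same-stage cells that are close in expression space for lying far apart in latent space. The exact form of the factor used in this framework is not given here. `mean_gradient` returns the gradient of each gene's GP posterior mean with respect to the latent coordinates; its norm at a latent point is one plausible per-gene relevance value, and the vectors themselves could be drawn as a gradient plot. All function names, parameters and the toy data are illustrative.

```python
import numpy as np


def sq_dists(A, B):
    """Pairwise squared Euclidean distances between rows of A and rows of B."""
    d = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.maximum(d, 0.0)  # clip tiny negatives caused by round-off


def rbf(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel k(a, b) = variance * exp(-|a - b|^2 / (2 lengthscale^2))."""
    return variance * np.exp(-0.5 * sq_dists(A, B) / lengthscale**2)


def gplvm_nll(X, Y, lengthscale=1.0, variance=1.0, noise=0.1):
    """Standard GPLVM objective -log p(Y | X): one GP per gene (column of Y), shared kernel."""
    N, D = Y.shape
    K = rbf(X, X, lengthscale, variance) + noise * np.eye(N)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, Y))  # K^{-1} Y
    log_det = 2.0 * np.sum(np.log(np.diag(L)))
    return 0.5 * (D * N * np.log(2 * np.pi) + D * log_det + np.sum(Y * alpha))


def stage_penalty(X, Y, stages, sigma=1.0, weight=1.0):
    """HYPOTHETICAL extra factor, expressed as a negative log term.

    Same-stage pairs that are close in expression space (w_ij near 1) are
    penalised for being far apart in latent space, so small same-stage
    distances stay small. Not the exact factor used by the authors.
    """
    same = stages[:, None] == stages[None, :]
    W = np.exp(-0.5 * sq_dists(Y, Y) / sigma**2) * same
    np.fill_diagonal(W, 0.0)
    return 0.5 * weight * np.sum(W * sq_dists(X, X))  # 0.5: each pair is counted twice


def posterior_mean(x_star, X, Y, lengthscale=1.0, variance=1.0, noise=0.1):
    """GP posterior mean of every gene at latent point x_star, shape (D,)."""
    K = rbf(X, X, lengthscale, variance) + noise * np.eye(X.shape[0])
    k_star = rbf(x_star[None, :], X, lengthscale, variance)[0]
    return k_star @ np.linalg.solve(K, Y)


def mean_gradient(x_star, X, Y, lengthscale=1.0, variance=1.0, noise=0.1):
    """Jacobian of the posterior mean w.r.t. the latent coordinates, shape (Q, D).

    Column d is the gradient of gene d at x_star.
    """
    K = rbf(X, X, lengthscale, variance) + noise * np.eye(X.shape[0])
    alpha = np.linalg.solve(K, Y)                                   # (N, D)
    k_star = rbf(x_star[None, :], X, lengthscale, variance)[0]      # (N,)
    dk = -(x_star[None, :] - X) / lengthscale**2 * k_star[:, None]  # (N, Q), dk/dx_star
    return dk.T @ alpha


# Toy usage with random data (not real qPCR measurements).
rng = np.random.default_rng(0)
Y = rng.normal(size=(30, 5))        # 30 cells x 5 genes
stages = np.repeat([0, 1, 2], 10)   # three developmental stages
X = rng.normal(size=(30, 2))        # 2-D latent positions; normally these are optimised
objective = gplvm_nll(X, Y) + stage_penalty(X, Y, stages)
J = mean_gradient(X[0], X, Y)
relevance = np.linalg.norm(J, axis=0)  # one relevance value per gene at cell 0
```

In a full model the latent positions and kernel hyperparameters would be optimised jointly, for example by minimising `gplvm_nll + stage_penalty` with a gradient-based optimiser; evaluating `mean_gradient` on a grid over the latent space would then give one relevance map per gene.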

Source Code

The mappings were implemented based on Prof. Neil Lawrence’s FGPLVM toolbox; the extensions for relevance analysis and for incorporating the structure of the data can be downloaded here.