Multivariate Statistics

Course description

Requirements: Programming skills in R (e.g. from the course Introduction to R) and basic knowledge of statistics (e.g. from the course Introduction to Statistics). Some practice with ggplot2 is welcome but not mandatory; it can be acquired in the course Graphics with R.

Your benefit: Participants will learn when and how to apply unsupervised learning methods such as PCA, MDS, t-SNE or UMAP for dimension reduction, and k-means, hierarchical clustering or hybrid approaches for clustering. The course also covers two supervised learning methods, namely principal component regression and partial least squares regression. The content will help participants understand the theoretical basis of a multivariate analysis. All topics are accompanied by hands-on exercises using the statistical software R. Participants are invited to ask as many questions as they want about analyses of their own datasets.

Topics: This course on multivariate statistics covers two different topics:

  • Dimension reduction methods. This first chapter focuses on principal component analysis (PCA): what happens "under the hood", how many principal components to choose, and how to visualize and interpret the results. The lecture also gives a short overview of other unsupervised multivariate methods (e.g. methods for categorical variables, methods for data structured into groups, and multidimensional scaling, MDS). This chapter includes a part on dimension reduction for omics data (t-SNE, UMAP) as well. Finally, it covers two supervised learning methods: principal component regression and partial least squares (PLS) regression.
  • Cluster analysis. This second chapter focuses on the two most frequently used clustering methods: k-means and hierarchical clustering (HC). It describes the different dissimilarity and distance measures that can be used to define clusters. A short part also illustrates how to combine both algorithms (k-means and HC) into hybrid algorithms. Finally, this chapter covers the R commands used to produce heatmaps combined with the result of a clustering algorithm.
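
To give a flavour of both chapters, here is a minimal sketch in base R. It is not taken from the course material. The built-in iris data, the 90 % variance threshold, the 10 k-means centres and the final 3 clusters are all assumptions chosen purely for illustration.

```r
# Illustrative sketch only; dataset and all tuning values are assumptions.
X <- scale(iris[, 1:4])                 # centre and scale the numeric columns

# --- Dimension reduction: PCA and the number of components to keep ---
pca <- prcomp(X)                        # PCA on the standardized data
var_explained <- pca$sdev^2 / sum(pca$sdev^2)   # share of variance per PC
cum_var <- cumsum(var_explained)        # cumulative share of variance
n_pc <- which(cum_var >= 0.90)[1]       # smallest number of PCs reaching 90 %

# --- Clustering: a simple hybrid of k-means and hierarchical clustering ---
set.seed(1)                             # k-means starts from random centres
km <- kmeans(X, centers = 10, nstart = 10, iter.max = 50)  # many small clusters
hc <- hclust(dist(km$centers), method = "ward.D2")        # HC on the centroids
centre_group <- cutree(hc, k = 3)       # merge the centroids into 3 groups
hybrid_cluster <- unname(centre_group[km$cluster])  # group label per observation

print(round(cum_var, 3))
print(table(hybrid_cluster))
```

A scree plot of var_explained or a visual inspection of the dendrogram from plot(hc) are common alternatives to the fixed thresholds used in this sketch.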

Methods: Each day consists of blocks covering first the theory behind the methods and their applications in R, and then hands-on examples with best-practice solutions.
During the coronavirus pandemic, the course is held online. Please note the following constraints:

  • The course is held via the software Zoom.
  • Please check beforehand at zoom.us/test whether your computer is compatible with the tool. (No registration is necessary, but you have to download some tools.)
  • A stable internet connection is absolutely necessary, ideally a wired LAN connection.
  • You do NOT need a microphone or camera, since you can interact via written chat. However, having a camera and a microphone is advantageous.

Duration: 3 days

Language: English

Materials:

  • Material for the course can be found here* (only for HMGU staff).
  • Please install the necessary R packages prior to the course. The packages are listed in "Material_Multivariate_Statistics.html", which is part of the linked ZIP folder.
  • Please be aware that the materials will be updated shortly before the next course.

Dates and Application: You can check the current dates and whether the courses are already fully booked here*.
Please apply via the forms of the HR Development department*.

 

 * Links marked with * are only available for HMGU staff.