# Overview

Multivariate data arise in every area where statistics and machine learning are applied. Multivariate analysis is concerned with the simultaneous statistical analysis of multiple variables. This is an advanced course intended for doctoral students in statistics and related fields. It introduces students to the methodology, theoretical foundations, and computational aspects of multivariate data analysis from a modern perspective. Theoretical derivations will be presented together with practical considerations and intuition. The topics of the course include:

• Foundations
  • Random vectors and matrices
  • Summarization, display, and geometry of multivariate data
  • Multivariate Gaussian theory
• Core techniques
  • Principal components analysis (PCA)
  • Linear discriminant analysis (LDA)
  • Canonical correlation analysis (CCA)
  • Cluster analysis and Gaussian mixture models
• Kernel methods
  • Reproducing kernels
  • Kernel PCA/CCA
• High-dimensional statistics and regularized procedures
  • Sparse covariance estimation and Gaussian graphical model selection
  • Sparse PCA/LDA/CCA

This is a 3-credit-hour course in lecture format.

# Prerequisites

Stat 6802, or permission of the instructor. Students are expected to be able to read and write mathematical proofs. Preparation in multivariate calculus, linear algebra, and mathematical statistics is essential for this course. Familiarity with the statistical computing environment R, or with a language such as MATLAB, is expected. Students are assumed to be familiar with concepts including:

• Convergence in probability and convergence in distribution
• Maximum likelihood, Fisher information
• Loss function, risk of an estimator
• Bias and variance
• Trace, determinants, eigenvalues, and eigenvectors

The first homework will include a few review problems that indicate the level of preparation this course requires. If you find them too difficult, you will probably have difficulty with the rest of the course.

# Course materials

There are no required books, but the following book is recommended for supplemental reading and freely available on the OSU network:

It provides modern and broad coverage of multivariate methods. It is similar in breadth and style to The Elements of Statistical Learning by Hastie, Tibshirani, and Friedman. The instructor will occasionally recommend reading from Izenman (2008) and other references.

A free reference for matrix algebra and calculus is:

Additional reading material will be posted to the course website during the course.

# Evaluation

Evaluation will be based on the following components:

• 25% Homework
• 25% Exam
• 50% Project

## Homework

There will be occasional homework assignments. They will be posted on Carmen and collected in lecture on the due date.

## Exam

There will be one in-class exam on March 24, 2017.

## Project

Students will work on a project throughout the semester, write a short (8-page, conference-style) paper, and deliver a presentation in the later part of the course. The instructor will provide a list of potential topics. The topic a student selects must be approved in advance by the instructor.