Department of Statistics, The Ohio State University
Statistics and Biostatistics Colloquium Series
Principal components analysis for high dimensional data
Debashis Paul
Department of Statistics
Stanford University
3:30 p.m., Thursday, February 3, 2005
Room 170, Eighteenth Avenue Bldg. (EA 170)
ABSTRACT
Suppose we have i.i.d. observations from a multivariate
Gaussian distribution with mean mu and covariance matrix
Sigma. We consider the problem of estimating the leading
eigenvectors of Sigma when the dimension p of the observation vectors increases with the sample size n. We work
in the setting where the covariance matrix is a finite-rank
perturbation of the identity. We show that even though
ordinary principal components analysis (PCA) may fail to
yield consistent estimators of the eigenvectors, if the
data can be sparsely represented in some known basis, then
a scheme that first selects a set of significant
coordinates and then applies PCA to the submatrix of the
sample covariance matrix corresponding to the selected
coordinates gives better estimates. Under suitable sparsity restrictions,
we show that the risk of the proposed estimator has the
optimal rate of convergence when measured in a squared-error
type loss. We demonstrate the performance of our method
through simulation studies and discuss some potential
applications. We also state some new results about the
behavior of the eigenvalues and eigenvectors of the sample
covariance matrix when p/n converges to a positive constant.
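The two-stage scheme in the abstract (select significant coordinates, then run PCA on the corresponding submatrix of the sample covariance matrix) can be illustrated with a minimal sketch in plain Python. It rests on several assumptions that do not come from the abstract: the known sparsifying basis is taken to be the standard coordinate basis, the noise variance is assumed to be 1, coordinates are kept when their sample variance exceeds the hypothetical threshold 1 + alpha*sqrt(log(p)/n), and the leading eigenvector is computed by power iteration. The function names are illustrative, and this is not the speaker's exact procedure.

```python
import math


def sample_covariance(X):
    """Sample covariance (divisor n) of the rows of X, after centering each column."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    return [[sum(r[i] * r[j] for r in Xc) / n for j in range(p)] for i in range(p)]


def leading_eigenvector(S, iters=500, tol=1e-12):
    """Leading eigenvector of a symmetric positive semidefinite matrix, by power iteration."""
    k = len(S)
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iters):
        w = [sum(S[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        w = [x / norm for x in w]
        converged = sum((a - b) ** 2 for a, b in zip(w, v)) < tol
        v = w
        if converged:
            break
    return v


def select_then_pca(X, alpha=3.0):
    """Hypothetical two-stage estimator of the leading eigenvector.

    Stage 1: keep coordinates whose sample variance exceeds
    1 + alpha * sqrt(log(p) / n)  (assumes unit noise variance).
    Stage 2: PCA on the selected submatrix; pad the eigenvector with zeros.
    """
    n, p = len(X), len(X[0])
    S = sample_covariance(X)
    thresh = 1.0 + alpha * math.sqrt(math.log(p) / n)
    selected = [j for j in range(p) if S[j][j] > thresh]
    if not selected:
        return selected, [0.0] * p
    sub = [[S[i][j] for j in selected] for i in selected]
    v_sub = leading_eigenvector(sub)
    v = [0.0] * p
    for idx, j in enumerate(selected):
        v[j] = v_sub[idx]
    return selected, v
```

Restricting PCA to the selected coordinates means the eigen-decomposition works on a small matrix and ignores most of the pure-noise coordinates, which is the intuition behind the better estimates claimed in the abstract. In a simulated single-spike model Sigma = I + lambda*u*u^T with u supported on two coordinates, the selected set should contain those two coordinates and the returned vector should be closely aligned with u.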
Meet the speaker in Room 212 Cockins Hall at 4:30 p.m.
Refreshments will be served.