Department of Statistics, The Ohio State University
Statistics and Biostatistics Colloquium Series
Some Contributions to Infinite and High Dimensional Model Selection Problems
Arijit Chakrabarti
Purdue University
3:30PM - Thursday, April 15, 2003
Room 170, Eighteenth Avenue Bldg. (EA 170)
ABSTRACT
The problem of selecting a model in an infinite or high dimensional
setup has been of great interest in recent years. A familiar
example where the parameter space is infinite dimensional is the
problem of estimating the unknown square-integrable drift function
of a Brownian motion in the Gaussian White-Noise model. High
dimensional problems typically arise when the number of possible
parameters increases with increasing sample size.
Using a complete orthonormal basis of L_2, the unknown drift
function in the White-Noise model can be represented as an infinite
linear combination of the basis functions, the coefficients being the
(unknown) Fourier coefficients, and thus the problem reduces to one of
estimating the vector of Fourier coefficients. Using a simple
isometry, this problem can be equivalently recast as the problem of
estimating the (square-summable) mean vector in an infinite
dimensional normal mean problem. In this talk, I will show that
model selection by the Akaike Information Criterion (AIC) (where
under each model, all but the first finitely many coordinates of the
mean vector are assumed to be zero), followed by least squares
estimation, achieves the asymptotic minimax rate of convergence
(over an appropriate subset of the parameter space) for squared
error loss. I will also present a Bayesian Model Selection rule
followed by Bayes estimates which achieves the same rate of
convergence asymptotically.
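As a rough illustration of the selection step described above (a sketch
under stated assumptions, not the speaker's procedure), suppose only the
first N coordinates are observed, with y_i = theta_i + e_i / sqrt(n) and
e_i standard normal with known variance. Under the model that sets all
coordinates beyond the k-th to zero, the least squares estimate keeps
y_1, ..., y_k, and AIC reduces, up to a constant, to
n * sum_{i>k} y_i^2 + 2k. The truncation level N and the function names
are assumptions made for this example.

```python
def aic_select(y, n):
    """Return the k in 0..N that minimizes n * sum_{i>k} y_i^2 + 2k.

    y : observed coordinates y_1..y_N (truncation at N is an assumption)
    n : sample size, so each coordinate has noise variance 1/n
    """
    N = len(y)
    # tail[k] holds the sum of y_i^2 over coordinates i > k (1-based).
    tail = [0.0] * (N + 1)
    for k in range(N - 1, -1, -1):
        tail[k] = tail[k + 1] + y[k] ** 2
    aic = [n * tail[k] + 2 * k for k in range(N + 1)]
    # Ties are broken in favor of the smaller model.
    return min(range(N + 1), key=aic.__getitem__)


def aic_fit(y, n):
    """Select k by AIC, then return (k, least squares estimate)."""
    k = aic_select(y, n)
    estimate = [float(v) for v in y[:k]] + [0.0] * (len(y) - k)
    return k, estimate
```

Calling aic_fit(y, n) returns the selected dimension together with the
estimate that keeps the first k observed coordinates and zeroes the rest.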
It is known that the Bayes Information Criterion (BIC) may be an
inappropriate model selection criterion and a poor approximation to
integrated likelihoods in some high dimensional problems. I will
present a generalization GBIC of BIC, which approximates the
integrated likelihood up to O(1), and a Laplace Approximation to the
integrated likelihood which is correct up to o(1) in a high
dimensional setup when the observations come from the general
exponential family of distributions. Some simulation results will be
presented which show that GBIC performs much better than BIC and the
Laplace approximation performs wonderfully well in many examples,
including some non-exponential family examples. Finally, I will
indicate some areas of application for this approximation
method.
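For readers unfamiliar with the two classical approximations mentioned
above, the following sketch compares them in a one-parameter
exponential family example where the integrated likelihood is available
in closed form. It does not implement GBIC, which is not defined in this
abstract, and it is a fixed-dimension toy case rather than the high
dimensional setup of the talk. The Poisson likelihood, the Gamma(a, b)
prior (shape a, rate b), and all function names are assumptions chosen
for illustration.

```python
import math


def log_marginal_exact(y, a, b):
    """Exact log integrated likelihood for Poisson data, Gamma(a, b) prior."""
    n, s = len(y), sum(y)
    log_fact = sum(math.lgamma(v + 1) for v in y)
    return (a * math.log(b) - math.lgamma(a) + math.lgamma(a + s)
            - (a + s) * math.log(b + n) - log_fact)


def log_marginal_laplace(y, a, b):
    """Laplace approximation at the posterior mode (needs sum(y) + a > 1)."""
    n, s = len(y), sum(y)
    log_fact = sum(math.lgamma(v + 1) for v in y)
    mode = (s + a - 1) / (n + b)
    # h = log likelihood + log prior, evaluated at the posterior mode.
    h = ((s + a - 1) * math.log(mode) - (n + b) * mode
         + a * math.log(b) - math.lgamma(a) - log_fact)
    neg_hess = (s + a - 1) / mode ** 2  # minus the second derivative of h
    return h + 0.5 * math.log(2 * math.pi) - 0.5 * math.log(neg_hess)


def log_marginal_bic(y):
    """BIC-type approximation: maximized log likelihood - 0.5 * log(n)."""
    n, s = len(y), sum(y)
    log_fact = sum(math.lgamma(v + 1) for v in y)
    mle = s / n  # requires sum(y) > 0
    return s * math.log(mle) - n * mle - log_fact - 0.5 * math.log(n)
```

Evaluating the three functions on the same data shows how far each
approximation lies from the exact value; in this toy case the Laplace
error shrinks as the sample grows, while the BIC-type error stays of
constant order.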
This is joint work with Professor Jayanta K. Ghosh.