Department of Statistics, The Ohio State University
Statistics and Biostatistics Colloquium Series
Regularization and Variable Selection for Multiclass Support Vector Machine and Varying-Coefficient Models with Applications in Genomics
Lifeng Wang
University of Pennsylvania
3:30 p.m., Tuesday, January 8, 2008
Room 240, Cockins Hall (CH 240)
ABSTRACT
Microarray technology has been widely used in biomedical research to
study complex biological systems and disease processes. Because the resulting
data are high-dimensional, new computational and statistical methods, backed
by rigorous theory, are required to draw valid inferences from them. In this
talk, I will present two such problems and the methods and theory that we
have developed to address them. The first problem is multiclass
classification with variable selection in the presence of a very large
number of genes. We
have proposed a regularized multiclass support vector machine that
performs classification and variable selection simultaneously through an
L1-norm penalized sparse representation. We develop a statistical learning
theory to quantify its generalization error, allowing the number
of variables to grow much faster than the sample size. The
second problem is identifying the transcription factors
involved in gene regulation during a given biological process from
time-course gene expression data. To capture the dynamic behavior of
gene expression, we propose a nonparametric varying-coefficient
model for such data and a regularized estimation procedure
for variable selection that combines basis function approximations with
the smoothly clipped absolute deviation (SCAD) penalty. The proposed
procedure simultaneously selects significant variables with time-varying
effects and estimates the nonzero smooth coefficient functions. Under
suitable conditions, we have established the theoretical properties of
our procedure, including consistency in variable selection and the oracle
property in estimation. I will illustrate these methods with simulations and
real data examples.
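For readers unfamiliar with SCAD, the penalty of Fan and Li (2001) can be written down directly. The sketch below is illustrative only and is not the speaker's implementation: `scad_penalty` evaluates the standard SCAD formula with tuning parameters `lam` > 0 and `a` > 2 (a = 3.7 is the commonly used default). `group_scad` is a hypothetical helper that shows one natural way, assumed here and not stated in the abstract, to apply SCAD to the basis coefficients of a single varying-coefficient function so that the whole function is kept or dropped together.

```python
import math


def scad_penalty(theta: float, lam: float, a: float = 3.7) -> float:
    """SCAD penalty p_lam(|theta|) as defined by Fan and Li (2001).

    Linear (lasso-like) for |theta| <= lam, a quadratic spline for
    lam < |theta| <= a*lam, and constant beyond a*lam, so large
    coefficients are not shrunk any further.
    """
    if lam <= 0 or a <= 2:
        raise ValueError("need lam > 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def group_scad(gamma: list[float], lam: float, a: float = 3.7) -> float:
    """Hypothetical helper: SCAD applied to the Euclidean norm of one
    coefficient function's basis coefficients (an assumed grouping)."""
    return scad_penalty(math.sqrt(sum(g * g for g in gamma)), lam, a)


if __name__ == "__main__":
    # Evaluate the penalty across its three regions with lam = 1.
    for theta in (0.5, 1.0, 2.0, 3.7, 10.0):
        print(theta, round(scad_penalty(theta, lam=1.0), 4))
```

The flat tail beyond `a * lam` is the design choice that distinguishes SCAD from the L1 penalty: large coefficients incur a constant penalty and so are left essentially unshrunk, which is what makes an oracle-type property possible.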
Meet the speaker in Room 212, Cockins Hall, at 4:30 p.m. Refreshments will be served.