Department of Statistics, The Ohio State University
Statistics and Biostatistics Colloquium Series
Variable selection in clustering via Dirichlet process mixture models
Sinae Kim
Department of Statistics, Texas A&M University
3:30 p.m., Tuesday, January 31, 2006
Room 170, Eighteenth Avenue Bldg. (EA 170)
ABSTRACT
The growing collection of high-dimensional data in many fields
has generated strong interest in clustering algorithms and variable
selection procedures. A typical example is the analysis of DNA
microarray data, where there is interest in discovering disease
subtypes and isolating discriminating genes. The results could lead to
a better understanding of the underlying biological processes and help
develop targeted treatment strategies.
In this talk, I introduce a model-based method that addresses the two
problems simultaneously. I adopt a latent binary vector to identify
discriminating variables and use Dirichlet process mixture models to
define the cluster structure. I update the variable selection index
using a Metropolis algorithm and obtain inference on the cluster
structure via a split-merge MCMC technique. I explore the performance
of the methodology on simulated data and illustrate its application
to a leukemia DNA microarray study.
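The abstract does not say which proposal the Metropolis update of the latent binary vector uses. A common choice in Bayesian variable selection is to either add/delete one variable or swap a selected variable for an unselected one. The sketch below assumes that scheme. `log_post` is a hypothetical stand-in for the log posterior of the selection vector, which in the talk's setting would come from the Dirichlet process mixture model and is not reproduced here; the toy target with independent log-odds `weights` is purely illustrative.

```python
import math
import random


def metropolis_gamma_step(gamma, log_post, rng=random):
    """One Metropolis update of a binary variable-selection vector (sketch).

    gamma    : list of 0/1 flags; gamma[j] == 1 marks variable j as discriminating.
    log_post : hypothetical function returning the log (unnormalized) posterior of gamma.
    rng      : the random module or a random.Random instance.
    """
    p = len(gamma)
    proposal = list(gamma)
    if rng.random() < 0.5:
        # Add/delete move: flip one coordinate chosen uniformly at random.
        j = rng.randrange(p)
        proposal[j] = 1 - proposal[j]
    else:
        # Swap move: drop one selected variable and add one unselected variable.
        ones = [j for j in range(p) if gamma[j] == 1]
        zeros = [j for j in range(p) if gamma[j] == 0]
        if not ones or not zeros:
            return list(gamma)  # swap impossible: propose staying put
        proposal[rng.choice(ones)] = 0
        proposal[rng.choice(zeros)] = 1
    # Both moves are symmetric, so the acceptance ratio is just the posterior ratio.
    log_ratio = log_post(proposal) - log_post(gamma)
    if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
        return proposal
    return list(gamma)


# Toy target (assumption, for illustration only): independent inclusion
# log-odds, so variables 0 and 1 should be selected far more often.
weights = [2.0, 2.0, -2.0, -2.0, -2.0]


def toy_log_post(g):
    return sum(w * gj for w, gj in zip(weights, g))


rng = random.Random(0)
gamma = [0] * len(weights)
counts = [0] * len(weights)
n_iter = 20000
for _ in range(n_iter):
    gamma = metropolis_gamma_step(gamma, toy_log_post, rng)
    counts = [c + g for c, g in zip(counts, gamma)]
freq = [c / n_iter for c in counts]
print(freq)  # estimated posterior inclusion frequency of each variable
```

Under this toy target each variable is independently included with probability e^w / (1 + e^w), so the printed frequencies should sit near 0.88 for the first two variables and near 0.12 for the rest. Because both moves are symmetric, no proposal-ratio correction is needed in the acceptance step.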
Meet the speaker in Room 212, Cockins Hall, at 4:30 p.m.
Refreshments will be served.