Department of Statistics, The Ohio State University
Statistics and Biostatistics Colloquium Series
Conjugate Dirichlet Process Mixture Models: Gene Expression, Efficient Sampling, and Clustering
David Dahl
University of Wisconsin
3:30PM - Thursday, January 20, 2004
Room 170, Eighteenth Avenue Bldg. (EA 170)
ABSTRACT
This talk proposes a novel conjugate Dirichlet process mixture (DPM)
model for the analysis of gene expression data, introduces a new
MCMC sampling algorithm for fitting general conjugate DPM models,
and describes a quick mode-finding algorithm for clustering in a
particular class of conjugate DPM models. Since biologists are
typically interested in expression patterns over a variety of
treatment conditions, the proposed model clusters genes having
similar patterns of expression (instead of similar levels of
expression) and naturally incorporates any number of treatment
conditions. Further, hypotheses are easily tested and false
discovery rates are readily estimated. The second part of the talk
addresses formidable computational issues arising in the use of DPM
models by introducing a new MCMC sampling algorithm for any conjugate
DPM model, not just the gene expression model. Simulations
indicate that the proposed sampler can be significantly faster than
existing methods. The new algorithm is a merge-split sampler that
uses ideas similar to those in sequential importance sampling.
Finally, in the case of two treatment conditions, a very quick
clustering algorithm is introduced which is guaranteed to find the
mode of the posterior clustering distribution in a class of
conjugate DPM models. Pre-prints are available at
http://www.stat.wisc.edu/~dbdahl .