Department of Statistics, The Ohio State University
Statistics and Biostatistics Colloquium Series

Multi-level Mixture Models and Simultaneous Model Selection via Rate-Distortion Theory, with Applications to Clustering and Significance Analysis of Gene Expression Data

Rebecka Jornsten
Department of Statistics, Rutgers University

3:00PM - Friday, September 26, 2008
Room 240, Cockins Hall (CH 240)

ABSTRACT

The analysis of gene expression data presents many challenges that can be formulated as model selection problems. In model-based clustering, we group genes that exhibit similar expression profiles across experimental conditions. To allow for direct and objective inference of the clustering outcome, we need to determine a sparse representation of each cluster: between which experimental conditions does the cluster expression profile truly differ? Model selection in clustering is combinatorial in the number of clusters and the number of experimental conditions, and thus presents a computationally challenging task. We introduce a simultaneous approach to subset model selection, which draws on results from rate-distortion theory. The rate-distortion formulation allows us to turn the combinatorial model selection into a fast and simple line search. Furthermore, by considering each gene as its own cluster, the simultaneous selection framework extends to significance analysis of differential expression. We can thus determine not only whether a gene is differentially expressed, but also which experimental conditions are discriminatory.

These days, data often have a complex structure, and the clustering techniques we apply should reflect this. We introduce multi-level mixture models to address this issue. The multi-level framework can incorporate multiple distance metrics into clustering simultaneously, and can be used to analyze multi-factor experiments. Multi-level mixture models extend model selection in clustering to between-cluster comparisons, and can yield substantial savings in model parameters, allowing more clusters to be detected than with standard clustering techniques.

Meet the speaker in Room 212 Cockins Hall at 4:30 p.m. Refreshments will be served.


