Department of Statistics, The Ohio State University
Statistics and Biostatistics Colloquium Series
Recent Advances in Clustering with Applications to Image
Annotation
Jia Li
Department of Statistics, Penn State, University Park
3:30PM - Thursday, November 13, 2008
Room 170, Eighteenth Avenue Bldg. (EA 170)
ABSTRACT
Recent advances in clustering from two directions will be presented.
First, a new clustering approach based on mode identification and
kernel density estimation will be introduced. A recently developed
optimization algorithm, namely, the Modal EM (MEM), finds an ascending
path from an arbitrary point to a local maximum (mode) of a density
in the form of mixture distributions. A cluster is formed by those
sample points that ascend to the same mode of the density function.
In mode-based clustering, the role of mixture modeling is concentrated
on density estimation (rather than on capturing clusters at the same time),
and hence the result is more robust when clusters deviate substantially
from Gaussian distributions. An algorithm, namely, Ridgeline EM (REM),
is also developed to efficiently solve for the ridgeline between the density
bumps of two clusters. Theoretical properties of the ridgeline make
it valuable for diagnosing clustering results and quantifying the
separability between clusters.
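
As a rough illustration of the mode-ascent idea described above, here is a minimal sketch, assuming a one-dimensional Gaussian kernel density estimate with a single shared bandwidth. Under that assumption the EM-style update reduces to a weighted mean of the kernel centers. The function names (mem_ascend, mode_clusters) and the parameters (bandwidth, tol, merge_tol) are hypothetical choices for this example and are not taken from the speaker's implementation.

```python
import math


def mem_ascend(x, data, bandwidth, tol=1e-8, max_iter=1000):
    """Ascend from x to a local mode of a 1-D Gaussian kernel density estimate."""
    for _ in range(max_iter):
        # E-step: posterior weight of each kernel (one kernel per data point) at x.
        logs = [-((x - d) ** 2) / (2.0 * bandwidth ** 2) for d in data]
        shift = max(logs)  # subtract the max for numerical stability
        weights = [math.exp(v - shift) for v in logs]
        total = sum(weights)
        # M-step: for equal-variance Gaussian kernels (an assumption of this
        # sketch) the maximizer is the weighted mean of the kernel centers.
        x_new = sum(w * d for w, d in zip(weights, data)) / total
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


def mode_clusters(data, bandwidth, merge_tol=1e-3):
    """Group sample points by the mode they ascend to."""
    modes, labels = [], []
    for d in data:
        m = mem_ascend(d, data, bandwidth)
        for k, existing in enumerate(modes):
            if abs(m - existing) < merge_tol:
                labels.append(k)
                break
        else:
            # No known mode is close enough, so register a new one.
            modes.append(m)
            labels.append(len(modes) - 1)
    return modes, labels


if __name__ == "__main__":
    sample = [0.0, 0.1, -0.1, 0.2, 5.0, 5.1, 4.9, 5.2]
    found_modes, found_labels = mode_clusters(sample, bandwidth=0.5)
    print(found_modes, found_labels)
```

The bandwidth controls how many modes the density estimate has, and therefore how many clusters this sketch returns; a larger bandwidth smooths nearby bumps into one.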
In the second part of the talk, we consider clustering objects represented
by sets of weighted vectors rather than by single vectors. Weighted vector
sets are formulated as discrete distributions with finite but arbitrary
support. A new clustering algorithm, namely D2-clustering (D2 stands
for discrete distribution), is developed using linear programming to
minimize the sum of Mallows distances between sample points and their
corresponding cluster centroids. Combined with a generalized mixture
modeling method based on the concept of hypothetical local mapping,
D2-clustering is applied to real-time image annotation and is the core
of ALIPR, an online automatic image tagging system.
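
To make the distance between weighted vector sets concrete, the following is a minimal sketch restricted to one-dimensional support points, assuming the Mallows distance is taken as the square root of the minimum weighted sum of squared differences over all couplings of the two distributions. In one dimension the optimal coupling can be found by matching sorted support points in order, so no linear programming solver is needed here; the general multivariate case described in the abstract does require one. The function name mallows_1d is a hypothetical choice for this example.

```python
import math


def mallows_1d(support_a, weights_a, support_b, weights_b):
    """Mallows distance between two discrete distributions on the real line.

    Assumes each weight list is non-negative and sums to 1.
    """
    a = sorted(zip(support_a, weights_a))
    b = sorted(zip(support_b, weights_b))
    i = j = 0
    rem_a, rem_b = a[0][1], b[0][1]  # mass not yet matched at a[i], b[j]
    cost = 0.0
    while i < len(a) and j < len(b):
        # Move as much mass as possible between the current pair of points.
        moved = min(rem_a, rem_b)
        cost += moved * (a[i][0] - b[j][0]) ** 2
        rem_a -= moved
        rem_b -= moved
        if rem_a <= 1e-12:
            i += 1
            rem_a = a[i][1] if i < len(a) else 0.0
        if rem_b <= 1e-12:
            j += 1
            rem_b = b[j][1] if j < len(b) else 0.0
    return math.sqrt(cost)


if __name__ == "__main__":
    print(mallows_1d([0.0, 1.0], [0.5, 0.5], [0.0, 2.0], [0.5, 0.5]))
```

The two inputs may have different numbers of support points, which is the situation the phrase "finite but arbitrary support" refers to.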
Meet the speaker in Room 212 Cockins Hall at 4:30
p.m. Refreshments will be served.