Department of Statistics, The Ohio State University
Statistics and Biostatistics Colloquium Series
A Rank-Based Clustering Method for the Analysis of Social
Inequality Data
Tim F. Liao
Chair, Department of Sociology, University of Illinois
3:30PM - Thursday, October 23, 2008
Room 170, Eighteenth Avenue Bldg. (EA 170)
ABSTRACT
When studying social, economic or health inequality, the analyst must
estimate the clusters or classes contained in the data. Commonly used
methods such as latent class/cluster models or the k-means method assume
a multivariate normal distribution. Most inequality data, however,
are not normally distributed. This paper proposes a rank-based cluster
analysis, which can take the form of a latent class/finite mixture model
or a basic cluster method such as the k-means algorithm; in either case,
the multivariate normal distributional assumption is no longer crucial.
There are two theoretical foundations for the proposed method: relative
deprivation theory in sociology and the relative income concept in economics
on the one hand, and topological distance in mathematics on
the other. This method offers an alternative view of inequality and
is nonparametric in essence. A simulation analysis of three-cluster
mixtures indicated by two or three variables, using three different
data-generating mechanisms, shows that when data are normal, the (real)
value-based and rank-based methods produce similar results. When data
depart from normality, the results are more mixed: finite mixture models
do somewhat better for data of real values while the k-means method
performs much better for ranked data. Three empirical data applications
further demonstrate the usefulness of the rank-based method: an analysis
of the 1991 British Household Panel Survey data with three variables for
socioeconomic classification, a re-analysis of the classic diabetes data,
and an exploration of fertility inequality using the 2006 U.S. General
Social Survey data. All three examples suggest new substantive
insights unobtainable from parametric analysis of the original data,
and the rank-based analyses require much less computation time for estimation.
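The abstract gives no implementation details. The Python sketch below illustrates only the general idea of a rank-based k-means under stated assumptions. Each variable is replaced by its ranks (ties get the average rank), and those ranks are divided by n so every variable lies in (0, 1]; this scaling is an assumption. Ordinary Lloyd's k-means is then run on the ranked data. The function names `ranks`, `rank_transform` and `kmeans`, the deterministic initialization, and the example data are all hypothetical and are not taken from the talk.

```python
def ranks(values):
    """Average ranks 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for m in range(i, j + 1):
            r[order[m]] = avg
        i = j + 1
    return r


def rank_transform(data):
    """Replace every column by ranks / n (assumed scaling), so each variable lies in (0, 1]."""
    n = len(data)
    cols = [ranks(list(col)) for col in zip(*data)]
    return [[x / n for x in row] for row in zip(*cols)]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means. Deterministic start (an assumption of this sketch):
    sort points by coordinate sum and take the midpoint of each of k equal blocks."""
    n = len(points)
    by_sum = sorted(range(n), key=lambda i: sum(points[i]))
    centers = [list(points[by_sum[(2 * c + 1) * n // (2 * k)]]) for c in range(k)]
    labels = None
    for _ in range(iters):
        # assignment step: nearest center by squared Euclidean distance
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # update step: each center moves to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical skewed data (an income-like first column), not from the talk.
data = [[1.0, 10.0], [2.0, 12.0], [3.0, 11.0],
        [500.0, 900.0], [800.0, 950.0], [5000.0, 990.0]]
labels, _ = kmeans(rank_transform(data), k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

In this sketch only ranks enter the distance computation. Any strictly increasing transformation of a variable, such as logging income, therefore leaves the clustering unchanged. This is one way to see why the normality assumption stops mattering.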
Meet the speaker in Room 212, Cockins Hall, at 4:30 p.m.
Refreshments will be served.