Dunke Zhou and Hang J. Kim, The Ohio State University
Eighteenth Avenue Building, Room 170
Abstract: (Kim) It is well known that an ordinary MCMC sampler run on a multimodal limiting distribution can suffer from the local-trap problem: the chain is easily trapped in a local mode and rarely, if ever, escapes to visit the other modes of the distribution. The multiset sampler (MSS) proposed by Leman et al. (2009) is an MCMC algorithm designed to alleviate the local-trap problem by introducing the notion of a multiset sampling distribution. We generalize the algorithm by redefining the MSS with an explicit description of the link between the target distribution and the sampling distribution. Our work focuses on the construction of the Markov chain, the impact of tuning parameters on the mixing of the chain, inference arising from the chain, theoretical properties of the chain, and general implementation issues. Performance is illustrated on several examples, including a variance component model, Bayesian outlier detection, and a gene expression study.
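The following is a small, hedged illustration of the local-trap problem itself. It does not implement the multiset sampler, whose construction is not spelled out here. The sketch runs a plain random-walk Metropolis sampler on a bimodal target. The target, an equal mixture of N(-6, 1) and N(6, 1), is chosen purely for illustration, as are the function names `log_target` and `random_walk_metropolis` and the settings (`step=0.5`, 20000 iterations). By symmetry, half of the target's mass lies at x > 0. A chain started in the left mode with small proposals is expected to report a fraction close to zero.

```python
import math
import random

# Hypothetical bimodal target: equal mixture of N(-6, 1) and N(6, 1).
def log_target(x):
    """Log density, up to an additive constant, computed with log-sum-exp."""
    a = -0.5 * (x + 6.0) ** 2
    b = -0.5 * (x - 6.0) ** 2
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def random_walk_metropolis(n_iter, x0=-6.0, step=0.5, seed=1):
    """Plain random-walk Metropolis with Gaussian proposals; returns the chain."""
    rng = random.Random(seed)
    x, lp = x0, log_target(x0)
    chain = []
    for _ in range(n_iter):
        y = x + rng.gauss(0.0, step)
        lq = log_target(y)
        # Accept with probability min(1, pi(y) / pi(x)); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < lq - lp:
            x, lp = y, lq
        chain.append(x)
    return chain

chain = random_walk_metropolis(20000)
frac_right = sum(1 for x in chain if x > 0) / len(chain)
# By symmetry the target puts mass 0.5 on x > 0; a trapped chain reports far less.
print(f"fraction of draws with x > 0: {frac_right:.3f}")
```

Starting the chain at `x0=6.0` gives the mirror-image result. Neither run alone therefore recovers the 50/50 split between the two modes.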
Abstract: (Zhou) Variable selection in clustering of high-dimensional data has recently drawn attention in the machine learning and statistics literature. Most recently developed methods select a small number of variables by constructing, either explicitly or implicitly, an importance measure for each variable. We introduce an ensemble variable importance (VI) measure obtained by aggregating clustering results based on different randomly selected subsets of variables. The proposed VI shows better separation between relevant and irrelevant variables and is robust to the specification of the number of groups. Building on this VI, we also propose a new VI-based variable selection method that selects a set of variables by sequentially testing for the existence of group structure in the data. Its effectiveness is demonstrated through a simulation study and a real data application.
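Below is a minimal sketch of the general idea of an ensemble importance measure built from clusterings on random variable subsets. It is not the speaker's VI. Everything in it is an assumption for illustration. The clustering step is 2-means, implemented here as Lloyd's algorithm with farthest-point initialisation. The per-variable score is the share of that variable's variance explained by the cluster labels (an R² between clusters). The ensemble score is the plain average over runs. The helper names `kmeans2`, `r_squared`, and `ensemble_vi` are hypothetical, and so is the synthetic data: variables 0 and 1 separate two groups, and variables 2 to 7 are pure noise.

```python
import random

def kmeans2(rows, n_iter=20, rng=None):
    """Lloyd's k-means with k = 2 and farthest-point initialisation; returns labels."""
    if rng is None:
        rng = random.Random(0)
    def d2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    first = rows[rng.randrange(len(rows))]
    second = max(rows, key=lambda r: d2(r, first))
    centroids = [list(first), list(second)]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        labels = [0 if d2(r, centroids[0]) <= d2(r, centroids[1]) else 1 for r in rows]
        for c in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def r_squared(column, labels):
    """Assumed per-variable score: share of the variable's variance explained by labels."""
    mean = sum(column) / len(column)
    total = sum((x - mean) ** 2 for x in column)
    between = 0.0
    for c in set(labels):
        vals = [x for x, lab in zip(column, labels) if lab == c]
        between += len(vals) * (sum(vals) / len(vals) - mean) ** 2
    return between / total if total > 0 else 0.0

def ensemble_vi(data, subset_size=3, n_ensembles=100, seed=0):
    """Average every variable's score over 2-means runs on random variable subsets."""
    rng = random.Random(seed)
    p = len(data[0])
    columns = list(zip(*data))
    scores = [0.0] * p
    for _ in range(n_ensembles):
        subset = rng.sample(range(p), subset_size)
        rows = [[row[j] for j in subset] for row in data]
        labels = kmeans2(rows, rng=rng)
        for j in range(p):  # score all variables, not just those in the subset
            scores[j] += r_squared(columns[j], labels) / n_ensembles
    return scores

# Hypothetical data: 100 points, two groups that differ only in variables 0 and 1.
data_rng = random.Random(42)
data = []
for i in range(100):
    shift = 3.0 if i < 50 else -3.0
    row = [data_rng.gauss(shift, 1.0), data_rng.gauss(shift, 1.0)]
    row += [data_rng.gauss(0.0, 1.0) for _ in range(6)]
    data.append(row)

vi = ensemble_vi(data)
print([round(v, 2) for v in vi])
```

Scoring every variable against labels found from a random subset is what lets a relevant variable earn credit even in runs where it was left out of the clustering. Under this toy setup, the two relevant variables are expected to stand well above the six noise variables.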