STEM-hy: Species Tree Estimation using Maximum likelihood (with hybridization)
Version 1.0
Laura S. Kubatko and Travis Treseder
Department of Statistics
The Ohio State University
Columbus, OH 43210
lkubatko@stat.osu.edu
Copyright 2007-2012 by Laura Salter Kubatko. This software
is provided "as is" without warranty of any kind. In no event
shall the author be held responsible for any damage resulting from
the use of this software. The program package, including source
codes, executables, and documentation, is distributed free of
charge.
The basic analyses incorporated in STEM-hy are described in the following publication:
Kubatko, L. S., B. C. Carstens, and L. L. Knowles. 2009. STEM: Species
Tree Estimation using Maximum likelihood for gene trees under
coalescence. Bioinformatics (doi: 10.1093/bioinformatics/btp079).
The hybridization methods are described in the following
publication:
Kubatko, L. S. 2009. Identifying Hybridization Events in the Presence of
Coalescence via Model Selection. Systematic Biology 58(5):
478-488.
About the Program
STEM-hy is a program for inferring maximum likelihood
species trees from a collection of estimated gene trees under the
coalescent model. The program has the following functionality:
- Return the ML tree under the coalescent model with gene trees as
  input data
- Compute the likelihood of a user-specified tree
- Search for a set of high-likelihood trees using simulated
  annealing
- Carry out a bootstrap analysis to obtain bootstrap support for
  the internal nodes of the ML species tree or to obtain a bootstrap
  consensus tree
- Evaluate hypotheses of hybridization in a model-selection framework
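As a rough illustration of how a simulated annealing search can return a set of high-likelihood candidates rather than a single tree, here is a minimal Python sketch. It is generic and is not STEM-hy's implementation: the move set, cooling schedule, and number of steps that STEM-hy uses are not described here, so the geometric cooling schedule, the `anneal` function, and the toy integer state space standing in for trees are all hypothetical.

```python
import heapq
import math
import random


def anneal(initial, log_likelihood, neighbor, n_steps=5000,
           t_start=1.0, t_end=0.01, keep=5, seed=0):
    """Maximize log_likelihood by simulated annealing and return the `keep`
    highest-scoring distinct states accepted during the run, best first.

    Hypothetical sketch: `neighbor` and the geometric cooling schedule are
    illustrative choices, not STEM-hy's actual tree moves or schedule.
    """
    rng = random.Random(seed)
    current = initial
    current_ll = log_likelihood(current)
    visited = {current: current_ll}  # state -> log-likelihood
    for step in range(n_steps):
        # Geometric cooling from t_start down to t_end.
        t = t_start * (t_end / t_start) ** (step / max(1, n_steps - 1))
        proposal = neighbor(current, rng)
        proposal_ll = log_likelihood(proposal)
        delta = proposal_ll - current_ll
        # Metropolis rule: always accept uphill moves; accept downhill
        # moves with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_ll = proposal, proposal_ll
            visited[current] = current_ll
    return heapq.nlargest(keep, visited.items(), key=lambda kv: kv[1])


# Toy stand-in for a tree space (hypothetical): integer states 0..99
# with a single likelihood peak at 42.
def toy_log_likelihood(x):
    return -((x - 42) ** 2) / 50.0


def toy_neighbor(x, rng):
    return min(99, max(0, x + rng.choice((-1, 1))))


top = anneal(0, toy_log_likelihood, toy_neighbor)
for state, ll in top:
    print(state, round(ll, 3))
```

Recording every accepted state and reporting the best few is what turns a single annealing run into a set of candidates. In a real species-tree search the states would be trees and `neighbor` some tree rearrangement; the specific moves STEM-hy uses are not documented here.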
STEM-hy Version 1.0 includes all functionality of STEM 2.0.
An option for carrying out a bootstrap analysis has been
added. Alignments for individual genes must be provided in PHYLIP
format. Each gene alignment is bootstrapped a user-specified number of
times, gene trees for each bootstrap data set are estimated using the
program SSA, and the resulting bootstrap gene trees are used to
estimate the species tree for each bootstrap replicate.
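For readers unfamiliar with the procedure, the sketch below shows the standard nonparametric bootstrap applied to one gene alignment: alignment columns (sites) are drawn with replacement until the replicate has the original length. This is an assumption about the resampling scheme, since the text above does not spell it out. The parser handles only sequential, relaxed PHYLIP (name and sequence separated by whitespace), not interleaved files or strict 10-character name fields. `read_phylip`, `bootstrap_alignment`, and `write_phylip` are hypothetical helpers, and the step of estimating gene trees with SSA is not shown.

```python
import random


def read_phylip(text):
    """Parse a sequential, relaxed PHYLIP alignment: a header line with the
    number of taxa and sites, then one 'name sequence' line per taxon."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    n_taxa, n_sites = map(int, lines[0].split())
    names, seqs = [], []
    for ln in lines[1:1 + n_taxa]:
        name, *chunks = ln.split()
        seq = "".join(chunks)
        if len(seq) != n_sites:
            raise ValueError(f"{name}: expected {n_sites} sites, got {len(seq)}")
        names.append(name)
        seqs.append(seq)
    return names, seqs


def bootstrap_alignment(seqs, rng):
    """Draw alignment columns uniformly with replacement, keeping the
    original number of sites (nonparametric bootstrap)."""
    n_sites = len(seqs[0])
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return ["".join(s[c] for c in cols) for s in seqs]


def write_phylip(names, seqs):
    """Format an alignment as sequential, relaxed PHYLIP text."""
    header = f"{len(names)} {len(seqs[0])}"
    return "\n".join([header] + [f"{n}  {s}" for n, s in zip(names, seqs)]) + "\n"


# Small made-up alignment for illustration only.
example = """4 8
A ACGTACGT
B ACGTACGA
C ACGAACGA
D TCGAACGA
"""

rng = random.Random(1)
names, seqs = read_phylip(example)
replicates = [write_phylip(names, bootstrap_alignment(seqs, rng)) for _ in range(3)]
print(replicates[0])
```

Resampling whole columns keeps the sequences of all taxa aligned site by site, which is why each replicate is still a valid alignment that a gene-tree program can analyze.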
Hybridization analyses as described in Kubatko (2009) can now be
carried out. This analysis requires the user to specify a species
tree and to identify putative hybrid taxa. The program will then
estimate the hybridization parameters and compute the AIC for each
hypothesis of hybridization.
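The model-selection step can be illustrated with the usual definition AIC = 2k - 2 ln L, where k is the number of free parameters and L is the maximized likelihood; the hypothesis with the smallest AIC is preferred. The sketch below only ranks hypotheses whose log-likelihoods and parameter counts are already known. The `Hypothesis` class and `rank_by_aic` function are hypothetical, and the numbers are made up for illustration: they are not STEM-hy output and do not reflect how many parameters any particular hybridization model has.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    label: str
    log_likelihood: float  # maximized log-likelihood under this hypothesis
    n_params: int          # number of free parameters estimated


def aic(h):
    # Akaike Information Criterion: AIC = 2k - 2 ln L (smaller is better).
    return 2 * h.n_params - 2 * h.log_likelihood


def rank_by_aic(hypotheses):
    """Return (label, AIC, delta-AIC) tuples sorted from best to worst."""
    scored = sorted(((h.label, aic(h)) for h in hypotheses), key=lambda x: x[1])
    best = scored[0][1]
    return [(label, a, a - best) for label, a in scored]


# Made-up numbers for illustration only; they are not output from STEM-hy.
hypotheses = [
    Hypothesis("no hybridization", log_likelihood=-1234.5, n_params=5),
    Hypothesis("C is a hybrid of A and B", log_likelihood=-1228.0, n_params=6),
]
for label, a, d in rank_by_aic(hypotheses):
    print(f"{label:28s} AIC = {a:8.1f}  dAIC = {d:5.1f}")
```

The delta-AIC column reports each hypothesis's distance from the best one, so an extra hybridization parameter is favored only when it raises the log-likelihood by enough to offset the 2k penalty.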
Version 2.0, Released January 21, 2011.
STEM 2.0 has been completely rewritten in Clojure and is distributed as
a JAR file, which is run as a Java application.
A new option for supplying the input gene trees has been added: the
user can now supply a list of the names of the files that contain the
gene trees.
The settings file now uses the standard YAML format.
STEM 1.1 could not handle some patterns of missing data. STEM
2.0 has been tested more extensively with a variety of missing-data
patterns, and we believe it now handles missing data correctly in all
cases.
STEM 2.0 can compute the likelihood of a user-specified tree with
user-specified branch lengths, and can also estimate maximum
likelihood branch lengths for a user-specified tree.
Version 1.1, Released November 26, 2008.
Improved input and output formats.
Allows a different set of sampled taxa for each gene, so that data
sets in which some lineages are missing for some genes can be
analyzed.
Version 1.01 beta, Released February 2006.
Acknowledgements
Continued development of STEM/STEM-hy is based upon work supported by the
National Science Foundation under Grants DMS 0104290, DMS 0702277, and
DEB 0842219. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Contact Info
Questions or comments concerning the program can be e-mailed to me at lkubatko@stat.osu.edu. Please also e-mail me if you'd like to be advised when updated versions of the program become available.