table of contents
- expected learning outcome
- getting started
- exercise 1: species-tree inference under the coalescent
- exercise 2: lineage-tree inference under the coalescent
expected learning outcome
The objective of this activity is to carry out a species-level phylogenetic analysis of multi-locus or SNP data under the coalescent model with SVDquartets. We will use the implementation of the method in PAUP*. Both species-level and lineage-level inferences will be considered. We will also see how the basic SVDquartets method can be used for (non-coalescent-based) single-locus analysis.
getting started
The primary data set we will use for this tutorial is the rattlesnake data of Kubatko et al. (Syst. Biol. 60(4): 393-409, 2011). The data consist of 2 species, each divided into 3 subspecies, and an outgroup. There are 26 individuals (52 sequences) and 19 genes, for a total of 8,466 sites. The data can be downloaded in NEXUS format from www.stat.osu.edu/~lkubatko/data-snakes.nex. On the cluster, you can issue the command:
wget http://www.stat.osu.edu/~lkubatko/data-snakes.nex
exercise 1: species-tree inference under the coalescent
- Start PAUP* and execute the data-snakes.nex data file, either by selecting this file using the "Open..." option in the File menu, or by issuing the following at the command line:
exe data-snakes.nex;
- Go to the "Analysis" menu, and select "SVDquartets ...". We will discuss the possible options as a group.
For this exercise, set the number of randomly generated quartets to 20,000 (or fewer, if you have a slow computer), select the bootstrapping option, and select the species-tree analysis option. This replicates the analysis in Chifman and Kubatko (2014).
Note that the entire analysis can also be run from the command line with the "svdquartets" command. For a list of options, use PAUP*'s help by typing:
svdq ?
The command we'll use here is:
SVDQuartets nquartets=20000 speciesTree partition=snakespecies seed=1234568 bootstrap;
What is the bootstrap support for the clade containing the three S. catenatus subspecies and for the clade containing the three S. miliarius subspecies?
Answers will vary slightly due to the random selection of quartets and bootstrap samples. The catenatus clade will typically have bootstrap support above 90, while bootstrap support for miliarius is typically between 45 and 65.
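To make the meaning of a bootstrap support value concrete, here is a minimal Python sketch, not PAUP*'s own code. It assumes each bootstrap replicate tree has already been reduced to the set of clades it contains, and it reports support as the percentage of replicates containing a given clade. The function name `clade_support` and the taxon labels A to D are hypothetical placeholders, not names from the snake data.

```python
def clade_support(replicate_trees, clade):
    """Percentage of bootstrap replicate trees that contain `clade`."""
    clade = frozenset(clade)
    hits = sum(1 for tree in replicate_trees if clade in tree)
    return 100.0 * hits / len(replicate_trees)


# Hypothetical toy input: four replicate trees, each written as the set of
# clades (groups of taxon labels) found in that tree.
replicates = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABD")},
    {frozenset("AC"), frozenset("ACD")},
    {frozenset("AB"), frozenset("ABC")},
]

print(clade_support(replicates, "AB"))   # 75.0
print(clade_support(replicates, "ABC"))  # 50.0
```

Because each run draws different bootstrap samples and different random quartets, the set of replicate trees changes from run to run, which is why the support values vary slightly.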
- Now we'll run another analysis by going again to the SVDquartets dialog box in the Analysis menu. De-select the bootstrap option, and select the "Show quartet scores" option. Click "OK". The corresponding command is:
SVDQuartets nquartets=20000 speciesTree partition=snakespecies seed=1234568 showScores=yes bootstrap=no;
Examine the output -- what do the scores represent?
The output lists each sampled quartet together with three scores, one for each of the three possible unrooted trees on those four lineages. Scanning the list shows different patterns: sometimes one tree has a much lower score than the other two, and sometimes the three scores are nearly even. The numbers identifying the lineages follow their order in the input file; the species labels are applied when the species tree is built.
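One way to read a single row of quartet scores is sketched below in Python. It enumerates the three unrooted trees on four lineages and picks the one with the lowest score, on the understanding that a lower score indicates a better-supported split. The score values, the function names, and the assumption that scores are listed in the order 12|34, 13|24, 14|23 are all illustrative; check the PAUP* output header for the actual column order.

```python
def quartet_splits(a, b, c, d):
    """Return the three possible unrooted trees (splits) on four tips."""
    return [((a, b), (c, d)), ((a, c), (b, d)), ((a, d), (b, c))]


def format_split(split):
    """Render a split such as ((1, 2), (3, 4)) as the string '1,2|3,4'."""
    left, right = split
    return f"{left[0]},{left[1]}|{right[0]},{right[1]}"


def rank_splits(tips, scores):
    """Pair each split with its score and sort from lowest to highest score."""
    pairs = zip(scores, quartet_splits(*tips))
    return sorted(pairs, key=lambda pair: pair[0])


# Hypothetical scores for the quartet of lineages 1, 2, 3, 4.
ranked = rank_splits((1, 2, 3, 4), (0.012, 0.35, 0.41))
best_score, best = ranked[0]
print(format_split(best))  # 1,2|3,4

# A rough decisiveness measure (an assumption of this sketch, not a PAUP*
# statistic): the ratio of the best score to the runner-up. Values near 0
# mean one tree clearly wins; values near 1 mean the scores are nearly even.
ratio = best_score / ranked[1][0]
```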
exercise 2: lineage-tree inference under the coalescent
- Now suppose that we are interested in the relationships among the individual lineages under the coalescent model. In the "SVDquartets" dialog box, restore the defaults (make sure that the species-tree analysis option is not selected), and select the bootstrap option. Set the number of quartets sampled to be the same as in exercise 1. The command is:
SVDQuartets nquartets=20000 speciesTree=no showScores=no seed=475839 bootstrap;
- GUI version only: Once the analysis completes, the estimated tree can be displayed by selecting the "Print/View SVDquartets Bootstrap" option from the "Trees" menu. Notice that the subspecies have been color-coded to allow a nice visual assessment of the relationships among subspecies in the estimated tree. The commands to create the color-coding are in the file data-snakes.nex.
- Compare the lineage tree and bootstrap values you observed in this analysis to the species tree and bootstrap values you observed in exercise 1. Do the results make sense?
The bootstrap support at the species level should appear similar (subject to sampling variability) in the two analyses.
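To put the choice of 20,000 sampled quartets in context for the lineage-level analysis, the short Python calculation below counts how many distinct four-sequence subsets exist among the 52 sequences. It assumes any four sequences can form a quartet in a lineage-tree analysis; the variable names are illustrative.

```python
from math import comb

n_sequences = 52     # 26 individuals, 52 sequences (from the data description)
n_sampled = 20_000   # nquartets value used in the exercises

total = comb(n_sequences, 4)      # number of ways to choose 4 of the 52 sequences
fraction = n_sampled / total

print(total)                # 270725
print(f"{fraction:.1%}")    # 7.4%
```

Under that assumption, the analysis examines only a small fraction of all possible quartets, which is one reason repeated runs with different seeds can give slightly different results.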