HyDe takes DNA sequence data from all loci/SNPs as a single concatenated PHYLIP-formatted file (but please note: HyDe is NOT a "concatenation" method). Each sequence must appear on its own row: a 9-character taxon name, a space, and then the sequence data starting in the 11th column. The first line of the file must contain seven numbers, separated by spaces:
(i) the number of sequences;
(ii) the number of sites;
(iii) the position of the outgroup in the ordered list of taxa (e.g., if the outgroup is the 5th sequence in the input file, use "5");
(iv) the bound on the p-value used to determine statistical significance;
(v) the minimum number of sites that must be observed for each pattern in order to carry out the test (recommended to be at least 10);
(vi) the minimum number of non-missing sites that must be observed for a quartet in order to carry out the test (recommended to be at least 100); and
(vii) an indicator of whether "extra" information should be printed to standard output (1 = print, 0 = do not print).
The input file should be named data.phy and should be placed in the same directory as the executable.
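If you are assembling data.phy from your own alignment, the layout above can be produced with a short script. The Python sketch below is not part of HyDe: the function names (format_row, build_data_phy, write_data_phy) are hypothetical, it assumes the alignment is already in memory as an ordered name-to-sequence mapping, and it assumes that taxon names shorter than 9 characters may be padded with trailing spaces.

```python
from pathlib import Path


def format_row(name: str, seq: str) -> str:
    """One sequence row: name in columns 1-9, a space in column 10, sequence from column 11."""
    if len(name) > 9:
        raise ValueError(f"taxon name {name!r} is longer than 9 characters")
    # Assumption: names shorter than 9 characters are right-padded with spaces.
    return f"{name:<9} {seq}"


def build_data_phy(seqs: dict, outgroup: int, p_bound: str,
                   min_pattern_sites: int = 10, min_quartet_sites: int = 100,
                   extra_output: int = 0) -> str:
    """Return the full text of a data.phy file.

    seqs maps taxon name -> aligned sequence, in the order they should appear;
    outgroup is the 1-based position of the outgroup in that order;
    p_bound is a string (e.g. "0.0000032") so it is written exactly as given.
    """
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("need at least one sequence, all with the same number of sites")
    if not 1 <= outgroup <= len(seqs):
        raise ValueError("outgroup must be a 1-based position in the sequence list")
    # Header: the seven numbers (i)-(vii), separated by single spaces.
    header = [len(seqs), lengths.pop(), outgroup, p_bound,
              min_pattern_sites, min_quartet_sites, extra_output]
    lines = [" ".join(str(v) for v in header)]
    lines += [format_row(name, seq) for name, seq in seqs.items()]
    return "\n".join(lines) + "\n"


def write_data_phy(path, seqs, outgroup, p_bound, **options) -> None:
    """Write the file; HyDe expects it to be called data.phy, next to the executable."""
    Path(path).write_text(build_data_phy(seqs, outgroup, p_bound, **options))
```

Passing the p-value bound as a string avoids Python rewriting 0.0000032 as 3.2e-06 in the header.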
An example input data file is included in the .zip file. This is the
file for the rattlesnake data used as an example in the manuscript.
The first row of the file is:
52 8466 1 0.0000032 10 100 0
This means that
there are 52 sequences with 8,466 bp per sequence. The first sequence
is the outgroup, and all comparisons with p-values below
0.0000032 will be declared statistically significant (this bound is based on a Bonferroni correction; see the
manuscript for justification). At least 10 sites must be observed for
each pattern used in the test, and at least 100 sites must be observed
for the quartet in order for the test to be carried out. Extra output
is turned off (set this value to "1" to have the extra output printed).
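To make the meaning of each header field concrete, here is a minimal Python sketch that reads the first line of data.phy into named fields and applies the two rules just described. The names HydeSettings, parse_header, is_significant, and quartet_is_testable are illustrative only, not part of HyDe. The sketch does not say which site patterns HyDe actually counts; pattern_counts is simply assumed to hold the observed count for each pattern used in the test.

```python
from typing import Iterable, NamedTuple


class HydeSettings(NamedTuple):
    n_sequences: int
    n_sites: int
    outgroup: int            # 1-based position of the outgroup sequence
    p_bound: float           # p-values below this are declared significant
    min_pattern_sites: int   # sites required for each pattern used in the test
    min_quartet_sites: int   # non-missing sites required for the quartet
    extra_output: bool       # True when the header indicator is "1"


def parse_header(line: str) -> HydeSettings:
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 header fields, found {len(fields)}")
    n, sites, out, bound, min_pat, min_quartet, extra = fields
    # Assumption: any indicator value other than "1" means extra output is off.
    return HydeSettings(int(n), int(sites), int(out), float(bound),
                        int(min_pat), int(min_quartet), extra == "1")


def is_significant(p_value: float, settings: HydeSettings) -> bool:
    return p_value < settings.p_bound


def quartet_is_testable(pattern_counts: Iterable[int], n_nonmissing: int,
                        settings: HydeSettings) -> bool:
    # Hypothetical check: enough non-missing sites overall, and every pattern
    # used in the test observed at least min_pattern_sites times.
    return (n_nonmissing >= settings.min_quartet_sites
            and all(c >= settings.min_pattern_sites for c in pattern_counts))
```

For the rattlesnake header, parse_header("52 8466 1 0.0000032 10 100 0") yields 52 sequences, 8,466 sites, outgroup 1, and extra output turned off.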
Output File Format
Two output files are created. The first is called results.txt, and contains one line of output for each comparison considered. Each line lists the number of the hybrid species, the numbers of the two parents, the test statistic, and the p-value.
For example, the first line of the results.txt file for the
rattlesnake data is:
Hybrid: 3 Parents: 2 and 4 0.691612 0.489018
This means that when the hybrid is sequence number 3 in the list and
the parents are sequences 2 and 4, the test statistic is 0.691612 and
the p-value is 0.489018.
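To post-process results.txt (for example, to load it into a spreadsheet or plotting tool), each line can be split into its five values. This is a hedged Python sketch that assumes every line follows the layout shown above, with fields separated by whitespace; parse_result_line and Comparison are hypothetical names, not part of HyDe.

```python
import re
from typing import NamedTuple

# "Hybrid: <h> Parents: <p1> and <p2> <statistic> <p-value>"
RESULT_LINE = re.compile(
    r"Hybrid:\s*(\d+)\s+Parents:\s*(\d+)\s+and\s+(\d+)\s+(\S+)\s+(\S+)"
)


class Comparison(NamedTuple):
    hybrid: int       # sequence number of the putative hybrid
    parent1: int      # sequence number of the first parent
    parent2: int      # sequence number of the second parent
    statistic: float  # test statistic
    p_value: float


def parse_result_line(line: str) -> Comparison:
    match = RESULT_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized results line: {line!r}")
    h, p1, p2, stat, p = match.groups()
    return Comparison(int(h), int(p1), int(p2), float(stat), float(p))
```

Applied to the first rattlesnake line, it returns Comparison(hybrid=3, parent1=2, parent2=4, statistic=0.691612, p_value=0.489018).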
The second file written by the program is
called sig.results.txt. This file contains the same output as
results.txt, except that only those comparisons that resulted in a
p-value below the user-specified bound (the fourth number on the first line of data.phy) are listed.
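Because sig.results.txt is a filtered copy of results.txt, the same filter can be re-applied with a different bound without rerunning HyDe. A minimal Python sketch, assuming the p-value is always the last whitespace-separated field on each line (as in the example above); significant_lines is a hypothetical helper, not part of HyDe.

```python
from typing import Iterable, Iterator


def significant_lines(lines: Iterable[str], p_bound: float) -> Iterator[str]:
    """Yield the results.txt lines whose p-value (last field) is below p_bound."""
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if float(fields[-1]) < p_bound:
            yield line


# Usage (assumes results.txt is in the current directory):
# with open("results.txt") as f:
#     for line in significant_lines(f, 0.0000032):
#         print(line, end="")
```

A line whose p-value is nan is never kept, since any comparison with NaN is false in Python.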
An Example: Sistrurus Rattlesnakes
The data file for the rattlesnake example in the manuscript is
included in the .zip file. To run the example: