HyDe (Hybrid Detection)

Version 1.0
Laura Kubatko and Julia Chifman
Departments of Statistics and Evolution, Ecology, and Organismal Biology
The Ohio State University



Copyright 2015, 2016 by Laura Kubatko and Julia Chifman. This software is provided "as is" without warranty of any kind. In no event shall the authors be held responsible for any damage resulting from the use of this software. The program package, including source code, executables, and documentation, is distributed free of charge. This software is covered under the GNU General Public License (GPL).


Input File Format

HyDe takes DNA sequences from all loci/SNPs as a single concatenated PHYLIP-formatted file (but please note: HyDe is NOT a "concatenation" method). Each sequence must appear on its own line, with a 9-character taxon name (pad shorter names with spaces), a space, and then the sequence data starting in the 11th column. The first line of the file must contain seven numbers, separated by spaces:

  (i) the number of sequences;
  (ii) the number of sites;
  (iii) the position of the outgroup in the ordered list of taxa (e.g., if the outgroup is the 5th sequence in the input file, use "5");
  (iv) the bound on the p-value used to determine statistical significance;
  (v) the minimum number of sites that must be observed for each pattern in order to carry out the test (recommended to be at least 10);
  (vi) the minimum number of non-missing sites that must be observed for a quartet in order to carry out the test (recommended to be at least 100); and
  (vii) an indicator of whether "extra" information should be printed to standard output (0 = off, 1 = on).

The input file should be named data.phy and should be placed in the same directory as the executable.
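If your alignment is already loaded in a script, the layout above can be generated rather than edited by hand. The following is a minimal sketch, assuming Python 3; the function `write_hyde_input` and its parameter names are hypothetical and not part of HyDe. It checks that all sequences have the same length and that taxon names fit in 9 characters, then left-pads each name to 9 columns and adds one space so the sequence data begins in column 11.

```python
from decimal import Decimal
from pathlib import Path


def write_hyde_input(path, records, outgroup, p_bound,
                     min_pattern_sites=10, min_quartet_sites=100, extra=False):
    """Write a data.phy file in the layout described above (hypothetical helper).

    records:  list of (taxon_name, sequence) pairs, in the order HyDe should see them.
    outgroup: 1-based position of the outgroup within `records`.
    """
    if not records:
        raise ValueError("need at least one sequence")
    n_sites = len(records[0][1])
    if any(len(seq) != n_sites for _, seq in records):
        raise ValueError("all sequences must have the same number of sites")
    if not 1 <= outgroup <= len(records):
        raise ValueError("outgroup must be a 1-based position in records")
    for name, _ in records:
        if len(name) > 9:
            raise ValueError(f"taxon name {name!r} is longer than 9 characters")

    # Write the p-value bound in plain decimal notation (e.g. 0.0000032),
    # matching the example header, instead of Python's default 3.2e-06.
    p_text = format(Decimal(str(p_bound)), "f")
    header = (f"{len(records)} {n_sites} {outgroup} {p_text} "
              f"{min_pattern_sites} {min_quartet_sites} {int(extra)}")

    # Name left-justified in 9 columns + one space => data starts in column 11.
    lines = [header] + [f"{name:<9} {seq}" for name, seq in records]
    Path(path).write_text("\n".join(lines) + "\n")
```

Writing the bound in plain decimal form is only a precaution to mirror the example file; this document does not say whether HyDe also accepts exponent notation.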

An example input data file is included in the .zip file. This is the file for the rattlesnake data used as an example in the manuscript. The first line of the file is:

        52 8466 1 0.0000032 10 100 0

This means that there are 52 sequences with 8,466 bp per sequence. The first sequence is the outgroup, and all comparisons with p-values below 0.0000032 will be declared statistically significant (this bound is based on a Bonferroni correction; see the manuscript for justification). At least 10 sites must be observed for each pattern used in the test, and at least 100 non-missing sites must be observed for a quartet in order for the test to be carried out. Extra output is turned off (set this value to "1" to have the extra output printed).
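To double-check a header before running HyDe, the seven fields can be read back into named values, as in the walkthrough above. This is a hypothetical Python sketch (`HydeHeader` and `parse_header` are illustrative names, not part of HyDe); it assumes the fields are separated by whitespace and that the last field is exactly "0" or "1".

```python
from dataclasses import dataclass


@dataclass
class HydeHeader:
    n_sequences: int
    n_sites: int
    outgroup: int            # 1-based position of the outgroup sequence
    p_bound: float           # p-values below this are declared significant
    min_pattern_sites: int   # minimum sites observed per pattern
    min_quartet_sites: int   # minimum non-missing sites per quartet
    extra_output: bool       # True if "extra" output goes to standard output


def parse_header(line):
    """Split the first line of data.phy into its seven fields."""
    fields = line.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 header fields, found {len(fields)}")
    n, sites, out, p, pattern, quartet, extra = fields
    if extra not in ("0", "1"):
        raise ValueError("the extra-output indicator must be 0 or 1")
    header = HydeHeader(int(n), int(sites), int(out), float(p),
                        int(pattern), int(quartet), extra == "1")
    if not 1 <= header.outgroup <= header.n_sequences:
        raise ValueError("outgroup position is outside the list of sequences")
    return header
```

Applied to the rattlesnake header `52 8466 1 0.0000032 10 100 0`, this yields 52 sequences, 8,466 sites, outgroup 1, a bound of 3.2e-06, thresholds of 10 and 100 sites, and extra output off.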


Output File Format

Two output files are created. The first, results.txt, contains one line of output for each comparison considered. Each line lists the sequence number of the hybrid and the sequence numbers of its two parents, followed by the test statistic and the p-value.

For example, the first line of the results.txt file for the rattlesnake data is:

        Hybrid: 3 Parents: 2 and 4 0.691612 0.489018

This means that when the hybrid is sequence number 3 in the list and the parents are sequences 2 and 4, the test statistic is 0.691612 and the p-value is 0.489018.
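For downstream analysis it can help to turn each results.txt line into structured values. The sketch below is a hypothetical Python parser (`HydeResult` and `parse_result_line` are not part of HyDe); it assumes every line follows the pattern of the example line above, with the test statistic and p-value as the last two fields.

```python
import re
from typing import NamedTuple


class HydeResult(NamedTuple):
    hybrid: int       # sequence number of the hybrid
    parent1: int      # sequence number of the first parent
    parent2: int      # sequence number of the second parent
    statistic: float  # test statistic
    p_value: float


# Pattern modeled on the example line "Hybrid: 3 Parents: 2 and 4 0.691612 0.489018".
_RESULT_LINE = re.compile(
    r"Hybrid:\s*(\d+)\s+Parents:\s*(\d+)\s+and\s+(\d+)\s+(\S+)\s+(\S+)"
)


def parse_result_line(line):
    """Parse one line of results.txt into a HydeResult."""
    match = _RESULT_LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized results line: {line!r}")
    hybrid, p1, p2, stat, pval = match.groups()
    return HydeResult(int(hybrid), int(p1), int(p2), float(stat), float(pval))
```

For the example line, this returns `HydeResult(hybrid=3, parent1=2, parent2=4, statistic=0.691612, p_value=0.489018)`.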

The second file written by the program is called sig.results.txt. This file contains the same output as results.txt, except that only those comparisons with a p-value below the user-specified bound (the fourth number on the first line of data.phy) are listed.
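If you want to re-screen results.txt at a different significance bound without rerunning HyDe, the filtering described above can be reproduced outside the program. This is a minimal Python sketch, not HyDe's own code; `write_significant` is a hypothetical name, and it assumes the p-value is the last whitespace-separated field on each line and that "significant" means strictly below the bound.

```python
from pathlib import Path


def write_significant(results_path, out_path, p_bound):
    """Copy the lines of a results file whose p-value is below p_bound.

    Returns the number of lines kept. Blank lines are skipped.
    """
    kept = []
    for line in Path(results_path).read_text().splitlines():
        fields = line.split()
        # The p-value is assumed to be the final field on each line.
        if fields and float(fields[-1]) < p_bound:
            kept.append(line)
    Path(out_path).write_text("".join(line + "\n" for line in kept))
    return len(kept)
```

Writing to a separate output path (rather than overwriting sig.results.txt) keeps HyDe's own output intact for comparison.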


An Example: Sistrurus Rattlesnakes

The data file for the rattlesnake example in the manuscript is included in the .zip file. To run the example:

  1. Compile the program (see the program's main page for instructions)
  2. Type ./HyDe at the prompt
  3. Look at the output in results.txt and sig.results.txt. You will notice that sig.results.txt is empty, since no comparison has a p-value below the Bonferroni-corrected significance level specified in the input file.
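Steps 2 and 3 can also be scripted, for example when rerunning HyDe on several data sets. The sketch below is a hypothetical Python wrapper (`run_hyde` is not part of HyDe). It assumes HyDe has already been compiled into `workdir` alongside data.phy, runs without interactive prompts, and writes results.txt and sig.results.txt into that same directory.

```python
import subprocess
from pathlib import Path


def run_hyde(workdir, cmd=("./HyDe",)):
    """Run HyDe inside workdir and count the non-blank lines in each output file.

    `cmd` defaults to the executable name used above; a relative path is
    resolved against workdir on POSIX systems.
    """
    workdir = Path(workdir)
    # check=True raises CalledProcessError if HyDe exits with a nonzero status.
    subprocess.run(list(cmd), cwd=workdir, check=True)
    counts = {}
    for name in ("results.txt", "sig.results.txt"):
        text = (workdir / name).read_text()
        counts[name] = sum(1 for line in text.splitlines() if line.strip())
    return counts
```

For the rattlesnake example, the count for sig.results.txt would be 0, consistent with step 3.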

Notes


Please contact Laura Kubatko at kubatko.2@osu.edu with questions.