Manual to David Gerard's HybTree Perl Script for Estimating Hybridization
and Time Scales in the Presence of Deep Coalescence

To cite this script:
Gerard D, Kubatko L. 2011. "Estimating hybridization in the presence of coalescence using phylogenetic intraspecific sampling".

To cite the model:
Meng C, Kubatko LS. 2009. "Detecting hybrid speciation in the presence of incomplete lineage sorting using gene tree incongruence: A model". Theoretical Population Biology 75: 35-45.

To cite James Degnan's Coal:
Degnan JH, Salter LA. 2005. "Gene tree distributions under the coalescent process". Evolution 59(1): 24-37.

Make sure you have Perl installed, as well as the module File::Copy (if you are working in Unix, you probably already have both).
Note: Special instructions for the Windows environment are at the bottom of this file.

Make sure the script and the Intra executable are in the same directory.
Make sure the Intra executable is named "Intra".
(Intra is the same program as James Degnan's Coal and is freely available on his website: http://www.coaltree.net/)

Create a file titled "input_file" and place commands in this file.
Place one command followed by one parameter per line.
No semicolons at the end of lines, please.
Leave at least one space between command and parameter.
Commands must be at the very beginning of the line.

Type "perl HybTree" to run the script.

Note: You might need to check where Perl is located on your system before using this script. Type "whereis perl" at the Unix command prompt. If it does not return "/usr/bin/perl", open the HybTree script and, at the very top, replace "#!/usr/bin/perl" with "#!<path>", where <path> is what "whereis perl" returned.
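The setup requirements above can be checked before the first run. The following is only a minimal sketch, not part of the HybTree distribution: the helper name "report" is made up for illustration, and "command -v perl" is used here as a portable stand-in for "whereis perl". It assumes it is run from the directory that should hold HybTree and Intra.

```shell
#!/bin/sh
# Sketch: pre-flight checks before running HybTree.
# Assumption: run from the directory that should contain HybTree and Intra.

# Print one check result in a uniform two-column format.
# "report" is a hypothetical helper, not something HybTree provides.
report() {
    # $1 = status (ok / missing), $2 = description
    printf '%-8s %s\n' "$1" "$2"
}

# 1. Is perl on the PATH, and where?
if perl_path=$(command -v perl); then
    report ok "perl found at $perl_path"
    # 2. Can perl load the File::Copy module?
    if perl -MFile::Copy -e 1 2>/dev/null; then
        report ok "File::Copy loads"
    else
        report missing "File::Copy module"
    fi
else
    report missing "perl (not on PATH)"
fi

# 3. Are the script and the Intra executable in this directory?
for f in HybTree Intra; do
    if [ -e "$f" ]; then
        report ok "$f present"
    else
        report missing "$f in $(pwd)"
    fi
done
```

If the reported perl path differs from /usr/bin/perl, that path is the one to place after "#!" at the top of the script, as described in the note above.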
Commands:

gene_number
    The number of gene trees in the tree file (required).

ntaxa
    The number of taxa per gene tree; every tree must have the same number (required).

file
    The name of the file containing the gene trees (required).

par1
    One taxon name that belongs to parental species 1 (at least one required).

par2
    One taxon name that belongs to parental species 2 (at least one required).
    The species that you believe to be more closely related to the potential hybrid species should be parental species 2. This choice will not affect the estimates of gamma and the branch lengths, but it will affect the likelihood ratio test.

hyb
    One taxon name that belongs to the potential hybrid species (at least one required).

outg
    One taxon name that belongs to the outgroup (at least one required).

gamma_tries
    The number of gamma values to try between 0 and 1, spaced at intervals of 1/gamma_tries. Must be an integer > 0. Default = 200.

gttol
    End condition for gamma in the optimization (default 0.01).

t1ttol or t2ttol or t3ttol
    End conditions for t1, t2, or t3 (default t1ttol=t2ttol=t3ttol=0.1).

iter_max
    Maximum number of iterations for each optimization (default 10).

start_t1 or start_t2 or start_t3 or start_all
    Where you want the analysis to start looking.
    Default start_t1=start_t2=start_t3=1.
    start_all overrides the other choices.
    Choose start values that you believe to be close to the real values. The analysis will still work if you don't; it just might take longer.

bracket_t1 or bracket_t2 or bracket_t3 or bracket_all
    Where you wish to bracket the parameters during optimization.
    Default bracket_t1=bracket_t2=bracket_t3=5.
    bracket_all overrides the other choices.

fine_tune_t1 or fine_tune_t2 or fine_tune_t3 or fine_tune_all
    Since one variable is optimized at a time, this option sets the end condition of each single-variable optimization.
    Default fine_tune_t1=fine_tune_t2=fine_tune_t3=0.01.
    fine_tune_all overrides the other options.

Output Files:

gamma_file_
    Contains, in this order: your estimates of gamma (the proportion of genes from P1), t1 (the most recent branch), t2 (the middle branch), and t3 (the most ancient branch), followed by the result of the likelihood ratio test (YES means that the test detects hybridization at the 95% confidence level).

compare_likes_
    Contains the likelihood estimates for gamma at the MLE and for gamma at 0.

Key_
    Gives the altered names for your taxa.

######

To have H, P1, and P2 alternate as the potential hybrids, place the ITERATE script in the same directory as HybTree and the Intra executable.
In input_file, add the new command line "iterate yes".

Extra output files will be:

gamma_file_HybridsAsH_
    Results with the samples labeled as H in the input file treated as the putative hybrids.

gamma_file_HybridsAsP1_
    Results with the samples labeled as P1 in the input file treated as the putative hybrids.

gamma_file_HybridsAsP2_
    Results with the samples labeled as P2 in the input file treated as the putative hybrids.

original_input_file
    Your original input_file is copied here (it is overwritten otherwise).

############### For use in Windows ##############

You'll need to download a Perl interpreter. I would recommend Strawberry Perl, freely available at http://strawberryperl.com/

There's a special script called HybTree_Windows.txt which works in the Windows environment (it differs from HybTree by only two lines of functionality).
Instead of input_file, name the file input_file.txt.
Place the Intra executable as well as Degnan's cygwin1.dll file in the same folder as HybTree_Windows.txt, input_file.txt, and your gene tree file.
From the Perl command line (e.g., in Strawberry Perl), change to that folder and type "perl HybTree_Windows.txt" and voila! Enjoy!

######## Example Dataset ############

Provided is an example input_file (titled input_file_dragons) and a gene tree file (titled DragonTrees).
The researchers are interested in whether Draconi smokus resulted from a hybridization event between Draconi firus and Draconi waterus. Geckos were chosen as the outgroup.

To run the script, rename input_file_dragons to input_file (or input_file.txt if using Windows) and type "perl HybTree" (or "perl HybTree_Windows.txt" if using Windows).

You should get an estimated level of hybridization of 0.373, with t1 = 0.621, t2 = 0.729, t3 = 5.0 (or the boundary level), and a p-value of 0.000009088.
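When writing your own input_file, the formatting rules from the top of this manual (one command and one parameter per line, command at the very start of the line, no trailing semicolon) are easy to get wrong. The sketch below writes a hypothetical input file and checks it against those rules. It is not part of HybTree. The taxon names, counts, tree filename, the scratch filename "input_file.sketch", and the helper "check_input_file" are all made up for illustration, and they are not the contents of input_file_dragons.

```shell
#!/bin/sh
# Sketch: write a hypothetical input file and check its line format.
# Every value below (counts, filename, taxon names) is invented for illustration.
# It is written to input_file.sketch so that a real input_file is not overwritten.

cat > input_file.sketch <<'EOF'
gene_number 10
ntaxa 4
file my_gene_trees
par1 taxonA
par2 taxonB
hyb taxonC
outg taxonD
gamma_tries 200
EOF

# Check the formatting rules stated in the manual:
#   - the command must be at the very beginning of the line
#   - no semicolon at the end of the line
#   - exactly one command followed by one parameter
# Prints one message per offending line; exit status is 0 only if all lines pass.
check_input_file() {
    awk '
        BEGIN      { bad = 0 }
        /^[ \t]/   { printf "line %d: command must start the line\n", NR; bad = 1; next }
        /;[ \t]*$/ { printf "line %d: remove the trailing semicolon\n", NR; bad = 1; next }
        NF != 2    { printf "line %d: expected one command and one parameter\n", NR; bad = 1; next }
        END        { exit bad }
    ' "$1"
}

if check_input_file input_file.sketch; then
    echo "input_file.sketch: format looks fine"
fi
```

The check covers line format only. It does not verify that command names are ones HybTree recognizes or that the required commands are present.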