COALGF Calculator - Program for Computing Probabilities of Gene Tree Histories Given Species Trees Under the Coalescent Process with Gene Flow

This program computes the probabilities of all possible gene tree histories given a species tree under the coalescent model with gene flow between both pairs of sister taxa. It is suitable for species trees with three taxa, with one sequence sampled from each taxon. The program can compute gene tree history distributions for species trees with various effective population sizes, gene flow rates, and speciation times. It currently does not accept trees with more than three taxa.

I. Installing and Running the Program

Please note that COALGF relies on GSL (the GNU Scientific Library), which must be installed on your system prior to running COALGF. You may have to edit the Makefile to give the correct path to the GSL libraries.

To run the program in UNIX or Linux, please download the source code and compile it by typing at the prompt (>):

    > make

The current version of the code was compiled using gcc and has not yet been tested on Windows or Mac platforms.

After the code is compiled, make sure the file myinputfile is in the current directory. To run the program on myinputfile, type

    > ./COALGF

II. The Input File

Eight parameters are required in the input file, in the following format:

    Coal1:
    Coal2:
    Coal3:
    Coal12:
    Geneflow1:
    Geneflow2:
    Time1:
    Time2:

As described in Tian and Kubatko (2015), "Coal1", "Coal2", "Coal3", and "Coal12" are the scaled coalescent rates of species 1, 2, 3, and the ancestral species 12, respectively (referred to as C1 - C4 in the paper). "Geneflow1" and "Geneflow2" are the gene flow rates between species 1 and 2, and between the ancestral species 12 and species 3, respectively (referred to as M1 and M2 in the paper). The two scaled speciation times "Time1" and "Time2" are referred to as T1 and t2 in the paper.

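The eight-parameter format described above can be checked before running the program. The following is a minimal sketch, not part of COALGF: the function name read_input is hypothetical, and it assumes only that each parameter appears as a "Name: value" pair separated by whitespace.

```python
import re

# The eight parameter names required in myinputfile (from the list above).
REQUIRED_KEYS = ["Coal1", "Coal2", "Coal3", "Coal12",
                 "Geneflow1", "Geneflow2", "Time1", "Time2"]

def read_input(text):
    """Parse 'Name: value' pairs and verify all eight parameters are present.

    Hypothetical helper; assumes values are plain numbers.
    """
    values = {name: float(value)
              for name, value in re.findall(r"(\w+):\s*(\S+)", text)}
    missing = [key for key in REQUIRED_KEYS if key not in values]
    if missing:
        raise ValueError("missing parameters: " + ", ".join(missing))
    return values
```

Calling read_input(open("myinputfile").read()) would raise a ValueError if a parameter is absent or not numeric, which may help catch formatting mistakes before COALGF is run.
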
Consider an example species tree ((1,2),3) with equal effective population size N for all four species 1, 2, 3, and 12. Suppose that theta1 = theta2 = theta3 = theta12 = 4N*mu = 0.005 (here mu is the mutation rate per site), M1 = M2 = 0.5, and the two speciation times are tau1 = 0.004 and tau2 = 0.006 (in coalescent units). The values of the thetas and taus must be scaled before running COALGF. In COALGF, the parameters are scaled relative to a chosen theta0 (theta0 = theta1), so that Coal1 is always equal to 1. Thus, in the above example, theta0 = 0.005. Following the scaling method described in Tian and Kubatko (2015), we scale 2/theta by dividing by 2/theta0, and scale tau by multiplying by 2/theta0. Thus, the above example has:

    Coal1  = (2/theta1)/(2/theta0)  = theta0/theta1  = 1
    Coal2  = (2/theta2)/(2/theta0)  = theta0/theta2  = 1
    Coal3  = (2/theta3)/(2/theta0)  = theta0/theta3  = 1
    Coal12 = (2/theta12)/(2/theta0) = theta0/theta12 = 1
    Time1  = tau1*(2/theta0) = 1.6
    Time2  = tau2*(2/theta0) = 2.4

The myinputfile for the above example is:

    Coal1: 1
    Coal2: 1
    Coal3: 1
    Coal12: 1
    Geneflow1: 0.5
    Geneflow2: 0.5
    Time1: 1.6
    Time2: 2.4

Below we show another example species tree ((1,2),3), with unequal effective population sizes for the four species 1, 2, 3, and 12. Suppose that theta1 = 0.005, theta2 = 0.01, theta3 = 0.01, theta12 = 0.025, M1 = 0.5, M2 = 2, and the two speciation times are tau1 = 0.005 and tau2 = 0.01 (in coalescent units). To scale the values of the thetas and taus, we select theta0 = theta1 = 0.005, and then:

    Coal1  = (2/theta1)/(2/theta0)  = 1
    Coal2  = (2/theta2)/(2/theta0)  = 0.5
    Coal3  = (2/theta3)/(2/theta0)  = 0.5
    Coal12 = (2/theta12)/(2/theta0) = 0.2
    Time1  = tau1*(2/theta0) = 2
    Time2  = tau2*(2/theta0) = 4

The myinputfile for the second example is:

    Coal1: 1
    Coal2: 0.5
    Coal3: 0.5
    Coal12: 0.2
    Geneflow1: 0.5
    Geneflow2: 2
    Time1: 2
    Time2: 4

To compute the gene tree distributions for a different species tree, simply change the values in myinputfile.

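The scaling used in both examples can be sketched in a few lines of Python. This is an illustrative helper rather than part of COALGF: the function names scale_parameters and format_input are hypothetical, theta0 is taken to be theta1 as in the examples, and writing one "Name: value" pair per line is an assumption about the file layout.

```python
# Hypothetical helpers (not part of COALGF): convert thetas and taus into
# the eight scaled values expected in myinputfile.

def scale_parameters(theta1, theta2, theta3, theta12, m1, m2, tau1, tau2):
    """Return the eight COALGF inputs, scaled relative to theta0 = theta1."""
    theta0 = theta1
    return {
        "Coal1": theta0 / theta1,      # (2/theta1)/(2/theta0)
        "Coal2": theta0 / theta2,
        "Coal3": theta0 / theta3,
        "Coal12": theta0 / theta12,
        "Geneflow1": m1,               # gene flow rates are passed through
        "Geneflow2": m2,
        "Time1": tau1 * 2 / theta0,    # tau * (2/theta0)
        "Time2": tau2 * 2 / theta0,
    }

def format_input(params):
    """Render the parameters as 'Name: value' lines (one per line, assumed)."""
    lines = [f"{name}: {value:g}" for name, value in params.items()]
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    # Second example from the text above.
    second = scale_parameters(0.005, 0.01, 0.01, 0.025, 0.5, 2, 0.005, 0.01)
    with open("myinputfile", "w") as handle:
        handle.write(format_input(second))
```

Run as a script, this writes the second example's values to myinputfile in the current directory, overwriting any existing file of that name.
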
We suggest keeping the ratio of the coalescent rate (theta) to the speciation time (tau) below 10 to avoid numerical issues.

III. The Output File

The results are stored in the file myoutputfile. In the output, G1 denotes the probability of a genealogy with gene tree topology ((1, 2), 3), G2 the probability of a genealogy with gene tree topology ((2, 3), 1), and G3 the probability of a genealogy with gene tree topology ((1, 3), 2). G1H1 to G1H5 are the probabilities of the five histories consistent with topology ((1, 2), 3). G2H1 to G2H3 are the probabilities of the three histories consistent with topology ((2, 3), 1). G3H1 to G3H3 are the probabilities of the three histories consistent with topology ((1, 3), 2). Sum is the total probability of all eleven histories, which should equal 1 (the sum may be slightly larger or smaller than 1 due to numerical issues).

The output for the first example is:

    The coalesent rates are 1.000000, 1.000000, 1.000000, and 1.000000
    The migration rates are 0.500000 and 0.500000
    The times are 1.600000 and 2.400000
    The probability of each history:
    G1H1 = 0.058190
    G1H2 = 0.308962
    G1H3 = 0.026748
    G1H4 = 0.230032
    G1H5 = 0.082044
    G2H1 = 0.008584
    G2H2 = 0.056383
    G2H3 = 0.082044
    G3H1 = 0.008584
    G3H2 = 0.056383
    G3H3 = 0.082044
    G1 = 0.705976
    G2 = 0.147012
    G3 = 0.147012
    Sum = 1.000000

The output for the second example is:

    The coalesent rates are 1.000000, 0.500000, 0.500000, and 0.200000
    The migration rates are 0.500000 and 2.000000
    The times are 2.000000 and 4.000000
    The probability of each history:
    G1H1 = 0.337208
    G1H2 = 0.177447
    G1H3 = 0.017598
    G1H4 = 0.084323
    G1H5 = 0.062537
    G2H1 = 0.016341
    G2H2 = 0.081567
    G2H3 = 0.062537
    G3H1 = 0.016341
    G3H2 = 0.081567
    G3H3 = 0.062537
    G1 = 0.679112
    G2 = 0.160444
    G3 = 0.160444
    Sum = 1.000000

Please note that with inappropriate parameter settings, the error message "Calculation can not be completed because of numerical issues." will be shown.
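The probabilities reported in myoutputfile can be sanity-checked after a run: the history probabilities for each topology should add up to G1, G2, and G3, and those three should add up to Sum. The sketch below is one way to do this and is not part of COALGF; the function names are hypothetical, and it assumes only that each probability appears as a "Name = value" pair, as in the listings above. The tolerance of 1e-5 allows for rounding in the six printed decimal places.

```python
import re

def parse_probabilities(text):
    """Collect 'Name = value' pairs such as 'G1H1 = 0.058190' (hypothetical helper)."""
    pairs = re.findall(r"\b(G\d(?:H\d)?|Sum)\s*=\s*([0-9.]+)", text)
    return {name: float(value) for name, value in pairs}

def check_consistency(probs, tol=1e-5):
    """Return True if histories add up to each topology total and to Sum."""
    for topology in ("G1", "G2", "G3"):
        # Histories of a topology are the keys of the form G<k>H<j>.
        histories = [v for k, v in probs.items() if k.startswith(topology + "H")]
        if abs(sum(histories) - probs[topology]) > tol:
            return False
    total = probs["G1"] + probs["G2"] + probs["G3"]
    return abs(total - probs["Sum"]) <= tol
```

For example, check_consistency(parse_probabilities(open("myoutputfile").read())) applies the check to the file produced by a run.
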
If this error appears, please double-check the parameter scaling or change the parameter settings, then run the program again.

Acknowledgements

Development of COALGF was partially supported by the National Science Foundation under Grant DEB-1455399. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Contact Info

Questions or comments concerning the program can be e-mailed to Yuan Tian at tian.52@osu.edu.