Supplementary information to Isolation and gene flow in a speciation continuum in newts



Download 50.06 Kb.
Date31.07.2017
Size50.06 Kb.
#25216


Supplementary information to

Isolation and gene flow in a speciation continuum in newts

Maciej Pabijan1,2, Piotr Zieliński1, Katarzyna Dudek1, Michał Stuglik1,3, Wiesław Babik1

1Institute of Environmental Sciences, Jagiellonian University, ul. Gronostajowa 7, 30-387, Kraków, Poland

2Department of Comparative Anatomy, Institute of Zoology, Jagiellonian University, ul. Gronostajowa 9, 30-387, Kraków, Poland

3Scotland’s Rural College, Integrative Animal Sciences, Easter Bush Campus, Midlothian EH25 9RG, Scotland, UK

Corresponding authors:

Maciej Pabijan, Department of Comparative Anatomy, Institute of Zoology, Jagiellonian University, ul. Gronostajowa 9, 30-387, maciej.pabijan@uj.edu.pl

Wiesław Babik, Institute of Environmental Sciences, Jagiellonian University, ul. Gronostajowa 7, 30-387, Kraków, Poland; wieslaw.babik@uj.edu.pl



Extended Methods

Validation of STRUCTURE-determined OTUs in BPP

We validated the OTUs identified in STRUCTURE by applying joint Bayesian species delimitation and species tree estimation using the program BPP v3.1 (Yang 2015). Input alignments were constructed from dataset 2 by taking 10 sequences per marker for each of the 9 STRUCTURE-defined genetic clusters (i.e., operational taxonomic units, OTUs). The population size parameters (θs) and divergence time at the root of the species tree (τ0) were empirically determined in preliminary runs and then assigned gamma priors G(1, 35) and G(2, 500) resulting in a 95% prior intervals (0.00088; 0.12296) and (0.00034; 0.04918), respectively. Other divergence time parameters were assigned a Dirichlet prior (Yang and Rannala 2010). Because BPP may be sensitive to prior distributions of θ and τ0 (Zhang et al. 2011), we also examined the influence of a wide range of demographic and divergence scenarios by modifying these priors: large populations G(1,10) and shallow divergence G(2, 2000), small populations G(2, 2000) and shallow divergence G(2, 2000), large populations G(1,10) with deep divergence G(1,10), large populations G(1,10) with intermediate divergence G(2, 1000). We specified uniform probabilities for rooted trees (prior 1 on species delimitation and phylogeny) for all analyses, as this prior places higher probabilities on models with fewer numbers of delimited species (Yang & Rannala 2014). We also checked the effects of different starting trees on our analysis. Each analysis was run twice to check for convergence.



Phylogenetic analysis using BUCKy

We used the posterior distributions of gene trees generated in MrBayes as input for BUCKy (Ané et al. 2007; Larget et al. 2010). In preliminary runs using subsets of the tips (5 or 10 tips per OTU), for most markers all 500 trees in the post-burnin posterior had different topologies. Because BUCKy first calculates and then uses relative frequencies of tree topologies from the posterior, high proportions of distinct trees may be uninformative. We therefore pruned the gene trees in the MrBayes posterior distributions by randomly selecting a single tip from each STRUCTURE-delimited group using an R script from Weisrock et al. (2012), modified to accommodate our dataset and subsampling requirements. The same individual/haplotype combination was retained across all markers. Pruning the original gene trees to 10 OTUs (9 genetic clusters plus outgroup) resulted in much lower numbers of topologies in the posterior distributions, but retained the relationships from the original gene trees computed using information from the full dataset. The first 50% of trees from the posterior distribution of each MrBayes run were discarded as burnin, leaving 500 trees for summary in MBSUM in the BUCKy package. BUCKy was invoked for 106 generations; to improve mixing, we used four chains (one cold, 3 heated) which swapped states every 50 updates (-r 50) and lowered the alpha multiplier (-m 5). BUCKy uses a Dirichlet prior on the distribution of gene trees that depends on how many genes share the same topologies, controlled by the parameter α. We expected high levels of discordance and set this parameter to 20. However, we note that preliminary runs with α = 0, 1, 10, 20 and 100 revealed that topologies of primary concordance trees and concordance factors were negligibly affected by the a priori level of discordance (data not shown).



*BEAST analyses

The *BEAST module of BEAST v2.1.3 (Heled & Drummond 2010; Bouckaert et al. 2014) was used to infer a species tree using the STRUCTURE-defined candidate species. To increase the efficiency of the MCMCs and decrease run times, we minimized the complexity of the models starting from the *BEAST template in BEAUTI 2.0. We specified identical but unlinked site (HKY model of sequence evolution, empirical nucleotide frequencies), clock (strict with diffuse gamma distributions on clock priors) and gene tree models for each marker. A Yule pure birth process was set for the species tree (an uninformative 1/X prior was set for Yule birth rate). We implemented linear growth with a constant root on the population demographic function and used a 1/X prior on the pop.mean parameter. Preliminary runs using 5 and 10 haplotypes per taxon (from dataset 2) achieved poor effective sample sizes for key parameters after 109 generations, and poor parameter and topological convergence in Tracer and AWTY. Given the prohibitively long run times that would be needed to achieve satisfactory results, we further simplified the analyses by randomly subsampling two gene copies from each Lissotriton lineage across the 74 alignments. Because these runs used only 18 (7.6% of total number) of the gene copies available per marker per analysis, we repeated the random selection procedure 10 times, constructing 10 input files.



ABC analyses

We estimated gene flow between all pairs of taxa (36 pairwise comparisons) in an Approximate Bayesian Computation (ABC) framework. We assume that newt breeding ponds correspond to discrete demes undergoing extinction/recolonization processes and thus consider the set of regional populations (i.e. putative species) as a metapopulation. It has been shown (Wakeley & Aliacar 2001; Wakeley 2004) that if one gene copy per locus is sampled per deme in a metapopulation composed of a large number of demes, the ancestral process producing the sample is identical to the unstructured coalescent. Final datasets included 60-66 loci with between 5 and 24 sequences per taxon (Table S3and Table S9); one sequence per locality was sampled.

The following set of 9 summary statistics were used for analyses: the average number of segregating sites per marker (S), the average number of fixed differences per marker (SF) and its variance (Var SF), the average number of shared polymorphisms per marker (SS) and its variance (Var SS), the average number of polymorphisms private to each taxon per marker (SP taxon 1 and taxon 2), pairwise FST, and the number of shared haplotypes summed over all markers (HS). These statistics were not strongly correlated with each other. Summary statistics for both observed and simulated datasets were calculated using mstatspop v.0.998980beta and custom Python scripts on polymorphic biallelic sites only, positions with more than two segregating variants were excluded as departing from the infinite sites model. Using the same set of loci, Zieliński et al. (2016) recovered very similar results for untransformed statistics vs. those transformed via partial least squares regression (Boulesteix & Strimmer 2007). We therefore report untransformed statistics only.

Coalescent simulations were done in fastsimcoal2.01 (Excoffier et al. 2013). Loci were simulated as independent chromosomes and a single, fixed mutation rate of 5.7 x 10-9 per site, per generation was used following Zieliński et al. (2016). We simulated loci of the same lengths and used the same number of sequences as in the observed ABC datasets. The ABC analysis was conducted within the ABCtoolbox (Wegmann et al. 2010). Prior parameter distributions for population sizes and recombination rates (Table S9) follow Zieliński et al. (2016).



References

Ané C, Larget B, Baum DA, Smith SD, Rokas A (2007) Bayesian estimation of concordance among gene trees. Molecular Biology & Evolution, 24, 412–426.

Bouckaert R, Heled J, Kühnert D, Vaughan T, Wu CH, Xie D, Suchard MA, Rambaut A, Drummond AJ (2014) BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Computational Biology, 10, e1003537.

Boulesteix AL, Strimmer K (2007) Partial least squares: a versatile tool for the analysis of high-dimensional genomic data. Briefings in Bioinformatics, 8, 32–44.

Excoffier L, Dupanloup I, Huerta-Sánchez E, Sousa VC, Foll M (2013) Robust demographic inference from genomic and SNP data. PLoS Genetics, 9, e1003905.

Heled J, Drummond AJ (2010) Bayesian inference of species trees from multilocus data. Molecular Biology & Evolution, 27, 570–580.

Larget B, Kotha SK, Dewey CN, Ané C (2010) BUCKy: Gene tree / species tree reconciliation with the Bayesian concordance analysis. Bioinformatics, 26, 2910–2911.

Wakeley J (2004) Metapopulation models for historical inference. Molecular Ecology, 13, 865–875.

Wakeley J, Aliacar N (2001) Gene genealogies in a metapopulation. Genetics, 159, 893–905.

Wegmann D, Leuenberger C, Neuenschwander S, Excoffier L (2010) ABCtoolbox: a versatile toolkit for approximate Bayesian computations. BMC Bioinformatics, 11, 116.

Weisrock DW, Smith SD, Chan LM, Biebouw K, Kappeler PM, Yoder AD (2012) Concatenation and concordance in the reconstruction of mouse lemur phylogeny: an empirical demonstration of the effect of allele sampling in phylogenetics. Molecular Biology & Evolution, 29, 1615–1630.

Yang Z (2015) The BPP program for species tree estimation and species delimitation. Current Zoology, 61, 854–865.

Yang Z, Rannala B (2010) Bayesian species delimitation using multilocus sequence data. Proceedings of the National Academy of Sciences, 107, 9264–9269.

Yang Z, Rannala B (2014) Unguided species delimitation using DNA sequence data from multiple loci. Molecular Biology & Evolution, 31, 3125–3135.

Zhang C, Zhang DX, Zhu T, Yang Z (2011) Evaluation of a Bayesian coalescent method of species delimitation. Systematic Biology, 60, 747–761.

Zieliński P, Nadachowska-Brzyska K, Dudek K, and Babik W (2016) Divergence history of the Carpathian and smooth newts modelled in space and time. Molecular Ecology, 25, 3912–3928.



Figure S1. Relationships among major mtDNA lineages (denoted by capital letters) in Lissotriton, modified from Pabijan et al. (2015). Species acronyms represent the morphological species in which mtDNA lineages were found. Instances of putative or confirmed mtDNA introgression are indicated by red font. Red stars indicate mtDNA lineages found in L. montandoni. For details see Babik et al. (2005), Nadachowska & Babik (2009), Zieliński et al. (2013) and Pabijan et al. (2015). LvaL. v. ampelensis; LvgL. v. graecus; LvkL. v. kosswigi; LvmL. v. meridionalis; LvsL. v. schmidtleri; LvvL. v. vulgaris



Figure S2. Estimated Ln probability of the data for a given K, from STRUCTURE HARVESTER; ten replicate analyses were run for each K.

Figure S3. Substructure within South L. v. vulgaris (light blue and dark blue colors) and L .v. graecus (orange and light orange). Grey colors denote other taxa. In the former, a distinct genetic cluster was found in Serbia, Kosovo and western Bulgaria in a total of 18 STRUCTURE runs. L .v. graecus populations from the Peloponnese Peninsula formed a divergent cluster in 4 STRUCTURE runs; three individuals from extreme southeastern Greece and Korfu contained substantial proportions of this southernmost genetic cluster.


Figure S4. Consensus trees from 10 replicates of concatenated analyses in MrBayes. OTUs represent randomly selected haplotypes from 9 delimited Lissotriton taxa. Each haplotype (36,918 bp) was constructed by the concatenation of single alleles from 74 markers; thus each replicate constitutes a unique combination of individuals and alleles representative of each taxon. Trees are rooted with chimeric haplotypes from L. helveticus, L. boscai and L. italicus (not shown); OTUs are depicted as triangles proportional to haplotype diversity within taxa (all haplotypes from the same taxon invariably grouped together). Consensus topologies resulting from 10 replicates of concatenated analyses occurred with different frequencies shown under the trees (from left to right: 4, 2, 2, 1 and 1). Posterior probabilities are shown as values at nodes, branches receiving ≥0.95 are depicted by filled circles at nodes. Gray shading indicates clades with invariable positions across replicates, colored triangles indicate taxa with unstable phylogenetic positions across replicates.



Figure S5. Variation in concordance factors (expressed as the number of genes supporting a particular clade) for main clades found in 50 replicate BUCKy analyses. Only clades encompassing two taxa are shown because all clades with more than two taxa received low support (with one exception – L. montandoni vs. all L. vulgaris lineages). Each panel shows variation in concordance factors for clades consisting of one (in bold, upper right hand-side) and other taxa used in the analyses. For instance, the primary history for L. montandoni (1st panel) consists of a sister group relationship with all L. vulgaris lineages, as shown by the relatively high number of genes (mean of 20.5) supporting the L. montandoni - L. vulgaris clade. The minor history for L. montandoni is a sister group relationship with L. v. ampelensis, supported on average by 9.4 genes. Note that each pairwise comparison of terminal taxa is supported by at least low levels of genealogical concordance, e.g. on average less than 5 genes support a particular relationship (dotted line in panels) per replicate; we interpret this ‘residual’ genealogical relatedness as the result of incomplete lineage sorting or ancient admixture between taxa. LvaL. v. ampelensis; LvgL. v. graecus; LvkL. v. kosswigi; Lvl L. v. lantzi; LvmL. v. meridionalis; Lmon L. montandoni; LvsL. v. schmidtleri; NLvv – North L. v. vulgaris; SLvv – South L. v. vulgaris



Figure S6. *BEAST results. A) DensiTree diagrams representing 4000 species trees sampled from the posteriors of paired runs of the same input file (i.e., the analyses differed only in the starting value of the random seed). Posterior probabilities (>0.5) for branches of the most frequent species tree are shown. B) Plot of posterior probabilities for all splits for paired runs (compare function in AWTY; Nylander et al. 2008) of the same analysis revealing consistency in retrieving the same species tree topology (branches that have the same posterior probability in both sets of trees will fall along the diagonal). C) 95% highest posterior density intervals for three key parameters (log likelihood, posterior and species tree height) from *BEAST analyses compared for paired runs. LvaL. v. ampelensis; LvgL. v. graecus; LvkL. v. kosswigi; Lvl L. v. lantzi; LvmL. v. meridionalis; Lmon L. montandoni; LvsL. v. schmidtleri; NLvv – North L. v. vulgaris; SLvv – South L. v. vulgaris









Figure S7. Scaled residual fit from the TreeMix analysis based on the maximum likelihood tree in Fig. 2C (no migration edges). Residuals above zero (see color palette) represent populations that are more closely related to each other in the data than in the maximum likelihood tree, and are potential candidates for admixture events.



Download 50.06 Kb.

Share with your friends:




The database is protected by copyright ©ininet.org 2024
send message

    Main page