Title:
Investigating mitochondrial DNA relationships in Neolithic Western Europe through serial coalescent simulations
Authors:
Maïté Rivollat, Stéphane Rottier, Christine Couture, Marie-Hélène Pemonge, Fanny Mendisco, Mark G. Thomas, Marie-France Deguilloux, Pascale Gerbault
SUPPLEMENTARY MATERIAL
*** Descriptive statistics **
Under a Wright-Fisher model, at similar effective population size with no migration, pairwise F_{ST} measures the genetic differentiation caused by drift between two populations. We consequently used F_{ST} as a summary statistic to assess whether levels of population differentiation greater than observed could be obtained under a simple model of population panmixia (i.e. when genetic drift is the only main process at play). We thus calculated 6 pairwise F_{ST} as a measure of genetic differences between the four groups. These statistics were determined at the ‘GROUP_LEVEL’ with ARLSUMSTAT version 3.5.1.2 (ref. 1), a modified version of Arlequin for computing summary statistics on linux operating system. Observed pairwise F_{ST} values are given in Figure 3.
*** Serial Coalescent Simulations**
Computer simulations are an essential analytical tool that helps to test alternative hypotheses and explore empirical data^{2}. More specifically, the stochasticity of the genealogical process needs to be accounted for when interpreting genetic data^{3}. The coalescent is an extremely efficient mathematical construct that deals precisely with gene genealogy stochasticity. We consequently used serial coalescent simulations that allows sampling of ancient DNA at different point backward in time to test whether genetic drift alone within a single panmictic population could explain observed patterns of mtDNA differentiation between ancient hunter-gatherers (HG), central (Central-F) and southern Neolithic (South-F) farmer samples and Gurgy.
In the serial coalescent simulations model two demographic events are assumed, as in Bramanti et al. 2009. The first one follows an initial colonization of Europe 45 000 years ago of female effective population size N_{UP}, sampled from N_{A}, an ancestral non-African effective population size of 5 000 females. N_{A} is derived from the commonly used long-term effective human population size of 10 000 individuals outside Africa^{5-8} and assuming a 1:1 female to male ratio. The second demographic event follows the Neolithic transition in Western Europe 5 900 years cal. BP of female effective population size N_{N}.
The latter date (5 900 years cal. BP) determines the “t0” in our coalescent simulations, since it is the median calibrated C14 date of the youngest ancient mtDNA group we used, see Table S1. We explored 50 values for N_{UP} ranging from 1 to 5 000 and 50 values for N_{N} ranging from 10 to 100 000 (see Table S2).
Our model differs in two major aspects from the one of Bramanti et al. 2009. First, our demographic model does not integrate a modern population sample and we do not compare any of the ancient groups to modern ones. Second, we extended the ranges for N_{UP} and N_{N} used by Bramanti et al. 2009 ([10 – 5 000] and [1 000 – 100 000], respectively) towards the lower bound of these ranges in order to account for a wider range of demographic scenarios of both population growth and decline (ranges provided in Table S2).
We used the serial coalescent algorithm fastsimcoal version 2.5.1 (ref. 9) to simulate mitochondrial genealogies of ancient hunter-gatherer and farmer sequences. In each genealogy, simulated sequences were sampled according to the numbers and the median of the radiocarbon date ranges of the sequence groups presented in Table S1 and Figure 2. We used a fixed mutation rate of 5.10^{-6 }/bp/generation (ref. 10).
50 000 gene genealogies were generated under each of the 2 500 combinations of N_{UP }and N_{N} values (Table S2). We computed the six pairwise F_{ST} between the simulated population samples (Figure 2) and recorded the proportion of simulated F_{ST} values that were greater than those observed per pairwise F_{ST} per parameter combination (Figure 3). Results were analysed and plotted using the statistical analysis programming language R, version 2.15.1 (ref. 11).
The top right area delimited by the black horizontal and vertical lines on Figure 3 shows the ranges for N_{UP} and N_{N} previously used in Bramanti et al. 2009 and Rasteiro and Chikhi 2010. These prior ranges do not permit to recover higher mtDNA differentiation observed between the contemporaneous South-F and Central-F groups under the simple assumptions of a panmictic population. However, we show here that smaller values of N_{N} (between 10 and 200 females) would be enough to obtain larger levels of genetic differentiation between these two farmer groups, without invoking population structure^{12} and/or gene-flow.
*** ABC-related approach**
Although we quantified the proportion of pairwise F_{ST} that was greater than observed, it does not show that observed levels of within and between group diversity were actually recovered in the datasets simulated under a simple panmictic model. In order to assess how close the simulated datasets were to the corresponding observed values, we carried out a rejection algorithm widely used in approximate Bayesian computation (ABC) procedure. Briefly, the ABC rejection algorithm involves calculating the difference between observed summary statistics and corresponding simulated datasets (e.g. ref. 15). In this analysis, as well as the between population diversity statistics, i.e. the 6 pairwise F_{ST},_{ }we used two within population diversity statistics, i.e. the number of segregating sites (S) and the number of pairwise differences (Pi) within group. Comparison of these observed values to those simulated provides a measure of how close a simulated dataset is to the observed dataset and thereby informs on whether observed data could be recovered under the simple panmictic population model considered.
We used the rejection algorithm of the ‘abc’ package available in R^{16} to retain the 1 000 simulations that generated simulated datasets the closest to the 14 observed summary statistics for each parameter combination (tolerance 2% of 50 000 simulations per parameter combination). This left us with 2 500 000 retained simulated datasets (Figure S1, solid lines) containing an equal number of each parameter combination. We subsequently repeated the ABC-rejection selection process over these (tolerance 0.5% of 2 500 000) to retain the 12 500 closest simulated dataset (Figure S1, dashed lines) to the observed (Figure S1, solid vertical line). Figure S1 shows that the simple panmictic population model does recover both within and between population group diversity levels.
In particular, most of the observed statistics values lie within the 95% confidence interval (dashed vertical lines) of the stringiest ABC-rejection threshold (tolerance 0.5% of 2 500 000). Even though one observed statistics value, i.e. the number of pairwise differences within the Central-F farmer group, lies slightly right of the upper limit of the 95% confidence interval, it is not completely out of range.
For the F_{ST} between Central-F and South-F and Hunter-gatherers and South-F, the difference between the mode of the distribution and the observed value is negligible, while in the former the observed value still falls within the retained simulated distribution. This corroborates that this latter pairwise comparison (Central-F and South-F) is the one for which the parameter space explored for N_{N} and N_{UP} is the narrowest, and suggests that alternative models may explain the differentiation between these two contemporaneous groups better, such as for example population structure with gene-flow.
We finally estimated N_{UP} and N_{N} effective population sizes from these 12 500 retained simulations using both the simple rejection and the local linear regression algorithms provided in the ‘abc’ R package (Figure 3). The 95% credible intervals obtained from the rejection algorithm are [5 – 3 500] N_{UP} females (mode=59.8) and [200 – 7 750] N_{N} females (mode=974.2). The estimates using the local linear regression algorithm are in similar ranges, i.e. [5 – 624] females (mode=33.3) and [1 446 – 10 055] females (mode=3 804) for N_{UP} and N_{N}, respectively. We caution against over-interpretation of the N_{UP} and N_{N} estimates we provide here. They are only given as a means of comparison to the ranges shown on Figure 3 and should not be used as global European female effective population size estimates, but rather for investigating further hypotheses of temporary population decline in Europe. Our approach is not really suited to precise estimation of effective population sizes; there is likely insufficient information in the data to make such estimates.
FIGURE S1: Simulated within (S and Pi) and between (pairwise F_{ST}) population group diversity levels retained after approximate Bayesian computation procedure. S and Pi stand for the number of segregating sites and the number of pairwise differences within population group, respectively. Solid lines show 2 500 000 retained simulated datasets each containing an equal number of each parameter combination. Dashed lines show the 12 500 closest simulated set of summary statistics retained from those 2 500 000 datasets. Dashed vertical lines show the 95% confidence interval of the dashed distributions (from 12 500 closest simulated datasets). Solid vertical lines show the observed summary statistics values. The fewer the number of simulations retained, i.e. the higher the threshold value (from solid to dashed lines), the closer the simulated datasets are to the observed values. This Figure shows that the simple panmictic population model does recover the observed values.
References:
1 Excoffier, L. & Lischer, H. E. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. *Molecular ecology resources* 2010; **10**, 564-567.
2 Currat, M. & Silva, N. M. Investigating European genetic history through computer simulations. *Human heredity* 2013; **76**, 142-153.
3 Barbujani, G. Genetic evidence for prehistoric demographic changes in Europe. *Human heredity* 2013; **76**, 133-141.
4 Bramanti, B., Thomas, M. G., Haak, W.* et al.* Genetic discontinuity between local hunter-gatherers and central Europe’s first farmers. *Science* 2009; **326**, 137-140, doi:10.1126/science.1176869.
5 Takahata, N. Allelic genealogy and human evolution. *Molecular biology and evolution* 1993; **10**, 2-22.
6 DeGiorgio, M., Degnan, J. H. & Rosenberg, N. A. Coalescence-time distributions in a serial founder model of human evolutionary history. *Genetics* 2011; **189**, 579-593.
7 Consortium, T. G. P. A map of human genome variation from population-scale sequencing. *Nature* 2010; **467**, 1061-1073.
8 Li, H. & Durbin, R. Inference of human population history from individual whole-genome sequences. *Nature* 2011; **475**, 493-496, doi:http://www.nature.com/nature/journal/v475/n7357/abs/nature10231.html#supplementary-information.
9 Excoffier, L. & Foll, M. Fastsimcoal: a continuous-time coalescent simulator of genomic diversity under arbitrarily complex evolutionary scenarios. *Bioinformatics* 2011; **27**, 1332-1334.
10 Soares, P., Ermini, L., Thomson, N.* et al.* Correcting for purifying selection: an improved human mitochondrial molecular clock. *The American Journal of Human Genetics* 2009; **84**, 740-759.
11 Team, R. D. C. *R: A language and environment for statistical computing*. R Foundation for Statistical Computing, 2008;
12 Rasteiro, R. & Chikhi, L. Female and male perspectives on the neolithic transition in Europe: clues from ancient and modern genetic data. *PLoS One* 2013; **8**, e60944, doi:10.1371/journal.pone.0060944.
13 Beaumont, M. A. Approximate Bayesian computation in evolution and ecology. *Annual Review of Ecology, Evolution and Systematics* 2010; **41**, 1.
14 Beaumont, M. A., Zhang, W. & Balding, D. J. Approximate Bayesian computation in population genetics. *Genetics* 2002; **162**, 2025-2035.
15 Marjoram, P. & Tavaré, S. Modern computational approaches for analysing molecular genetic variation data. *Nature Reviews Genetics* 2006; **7**, 759-770.
16 Csilléry, K., Blum, M. G., Gaggiotti, O. E. & François, O. Approximate Bayesian computation (ABC) in practice. *Trends in Ecology & Evolution* 2010; **25**, 410-418.
**Share with your friends:** |