**Appendices for:**
__Developing dynamic, mechanistic species distribution models: predicting bird-mediated spread of invasive plants across northeastern North America__
Cory Merow, Nancy LaFleur, John A. Silander Jr., Adam M. Wilson and Margaret Rubega
## Table of Contents
**Appendix S1: Considerations for developing robust GB models**
**Appendix S2: Sensitivity Analysis**
**Table S1. Summary of Sensitivity Analysis on Parameters**
**Table S2. Summary of Sensitivity Analysis on Model Assumptions**
**Figure S1: No plant population growth.**
**Figure S2: No local bird dispersal.**
**Figure S3. No long distance dispersal.**
**Figure S4. Homogeneous landscape.**
**Figure S5. Homogeneous landscape with no LDD.**
**Figure S6: Binary Landscape.**
**Figure S7: Random Landscape.**
**Figure S8: First Introductions in South.**
**Figure S9: First Introductions in North.**
**Figure S10: First Introductions in Center.**
**Figure S11: First Introductions in West.**
**Figure S12: First Introductions in East.**
**Figure S13: Long term predictions**
**Figure S14: Random LDD Neighborhood 25 x 25**
**Figure S15: Random LDD Neighborhood 51 x 51**
**Figure S16: Start model in 1939**
**Figure S17: Start model in 1959**
**Figure S18: Shorter Bird Dispersal and More LDD**
**Appendix S3: Additional Data and Analysis**
**Table S3. Growth Rate and Habitat Use Data**
**Table S4: LULC Reclassifications**
**Figure S19. Seed dispersal kernel estimates**
**Figure S20. Christmas Bird Counts for Potential Bittersweet Dispersers**
**Figure S21. Consequences of eradication efforts in CT and RI**
**Appendix S4: Model Code**
## Appendix S1: Considerations for developing robust GB models
In order to appreciate the strengths and weaknesses in our models, it is necessary to understand the context in which they were conceived and appropriate expectations about their predictions. Most importantly, GB models provide the opportunity for mechanistic modeling, in contrast to statistical models that are typically phenomenological. Our goal was to see how a few simple mechanisms interact, then determine how tradeoffs between model parameters lead to categorically different types of predictive behavior. In contrast, detailed studies of species’ performance in different environments can generate detailed demographic predictions (e.g. Leicht 2005; Leicht-Young 2007), but such models necessarily focus on small spatial scales and often fail to capture an understanding of mechanistic linkages (e.g. dispersal and spread) or broad patterns. So what should an ecologist do when faced with understanding broad scale dynamics based on some insightful, but incomplete data collected at smaller scales? We answer this using GB models to connect mechanistic simplicity (plant population growth, local dispersal, and LDD) with empirical observations. Rule-based, stochastic models such as these are necessary when functional relationships between variables are unknown and temporal or spatial heterogeneity is important (Hogeweg and Hesper 1981; Darwen and Green 1996). We view empirically based GB models such as ours as a compromise between abstract generalizations and detailed, species-specific models at local scales.
What are the inputs for an empirically based GB model? Our goal was to use the minimum number of mechanisms. Of course, adding parameters connected to lesser mechanisms can make small improvements, but this is analogous to improving the R² value of a regression by adding more explanatory variables; it can be done, but it's not particularly informative and it only distracts from identifying the most important driving mechanisms. For example, including the effect of climate on population growth rate in our model might better explain absences in northern New England (cf. Ibáñez *et al.* 2009). But the essence of the pattern of invasion is captured with the mechanisms we included, based on the accuracy of our predictions.
GB models are very flexible and permit a wide variety of pattern formation with highly customizable rules. Flexibility permits modeling multiple processes simultaneously; however, it is essential that parameters and assumptions have reasonable empirical justification because spurious conclusions can result when parameter values are simply optimized to increase model fit. Assumptions are unavoidable when constructing ‘realistic’ GB models because the data are not always collected on the same spatial or temporal scale at which the model operates. For example, to make more accurate species spread predictions one must consider spatial heterogeneity and particular demographic rates. Ecologists often have little control over time series data that rely on opportunistic collection from varied historical sources. We make assumptions to synthesize data from multiple sources and multiple scales; in the case of invasion, the alternative is to remain with something akin to a simple diffusion model, which is typically only suited for small spatial scales (cf. Anderson *et al*. 2004), or with overly general and abstract models that are difficult to relate to real patterns. We suggest that the best way to justify any particular assumption is through sensitivity analysis. Ideally, slight deviations from the model’s parameter values produce little effect on predictions (thus we do not rely too heavily on the specific details of something for which we have few data), whereas larger deviations lead to large changes in model output (thus our assumption is reasonable and an arbitrary assumption is insufficient).
Other assumptions and simplifications can be justified by sensitivity analysis. Consider our estimate of starling velocity. We surely underestimate starling velocity by assuming straight-line movement between observation points, but by how much? When we used dispersal kernels with larger means that are consistent with the longest retention times and largest velocities, spread was only slightly faster (see Table S1, where exponential rates between 2 and 4 lead to similar predictions). Thus considering more complicated scenarios for velocity estimation does not appreciably improve our model predictions or our understanding of the system. In this way, we attempt to construct the simplest possible model that is consistent with the data to determine the minimum set of phenomena necessary to understand our ecological system.
The most sensitive inputs for our model – introduction location, plant population growth rate, mean local bird dispersal distance and the inclusion of LDD – are all empirically grounded. Thus, it is important to realize that our predictions are emergent properties specific to our system and not a consequence of the model’s flexibility. This being said, tradeoffs between parameters in our model are capable of producing similar results. For example, a larger number of random LDD events per year can be coupled with lower population growth rates to produce predictions similar to ours. In such situations, we selected the model with the greatest empirical support; in this example we have strong evidence for high growth rates and expert observation of prolific plant population growth, while it is difficult - if not impossible - to quantify the rate at which LDD events occur (Clark *et al.* 2001).
Finally, we note some important considerations regarding model evaluation. Assessing model fit provided a substantial challenge because standard methods do not exist for presence-only data. With presence-only data, one can only say whether the model correctly predicts a presence, and cannot validate predicted absences. Hence standard metrics based on binary classification such as AUC, Cohen’s Kappa or Chi-squared statistics are not applicable. Often this issue is sidestepped by either introducing pseudo-absences that play the role of true absences or, in the case of statistical models, relying on standard model comparison metrics based on likelihoods that indicate the relative performance of competing models (e.g. Gelfand *et al*. 2005). Since our model is not statistical, the latter is not applicable. Introducing pseudo-absences is not appropriate because many of the ‘holes’ in our presence-only data sets are unlikely to represent true absences (Ibáñez *et al.* 2009a, b). Based on our knowledge of the ecology of bittersweet (Leicht 2005; Leicht-Young 2007; Ibáñez *et al.* 2009a, b; Mosher *et al.* 2009; Latimer *et al.* 2009) and, for example, the similarity in landscape between sampled and unsampled sites in Connecticut, it is unlikely that any cells in Connecticut at this spatial scale represent true absences in 2010. We were thus confronted with evaluating the model while only measuring correctly predicted presences.
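As a rough illustration of evaluating a model using only correctly predicted presences, the sketch below computes the fraction of observed presence cells at which a model predicts abundance at or above a threshold. It is not the evaluation code used for this study: the function name `presence_accuracy`, the dictionary representation of the grid, the default threshold of 1, and the example values are all hypothetical.

```python
def presence_accuracy(predicted, observed_presences, threshold=1.0):
    """Fraction of observed presence cells that the model also predicts as occupied.

    predicted: dict mapping (row, col) -> predicted abundance (hypothetical layout)
    observed_presences: iterable of (row, col) cells with a recorded presence
    threshold: abundance at or above which a cell counts as a predicted presence
               (an assumed rule; the value 1.0 is illustrative only)
    """
    observed = list(observed_presences)
    if not observed:
        raise ValueError("need at least one observed presence")
    # Cells missing from `predicted` are treated as having zero abundance.
    hits = sum(1 for cell in observed if predicted.get(cell, 0.0) >= threshold)
    return hits / len(observed)


# Hypothetical toy data: four observed presences, two of them predicted.
predicted = {(0, 0): 12.0, (0, 1): 0.4, (1, 1): 3.0}
observed = [(0, 0), (0, 1), (1, 1), (2, 2)]
print(presence_accuracy(predicted, observed))  # 0.5
```

Because predicted absences cannot be checked against presence-only records, a score of this kind says nothing about over-prediction on its own.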
## Appendix S2: Sensitivity Analysis
**METHODS**
**Modifying Assumptions**
To assess the robustness of our model, we examined the consequences of different assumptions. We expected that three population expansion mechanisms would be necessary to describe the bittersweet invasion: plant population growth, local bird dispersal and random long-distance dispersal (LDD). We omitted each of these in turn and present the results here. When we omitted plant population growth, we still allowed a cell to produce emigrants. Thus the only way a cell could increase abundance was through immigration; consequently densities stay much lower than in the full model during the time scales that we studied. This provides a test for the effect of population size on spread. When we omitted local bird dispersal, we assumed that all emigrants (dispersal outside the source cell) were lost. Finally, we considered a model where LDD was omitted. We examined the sensitivity to the neighborhood size for LDD by also considering 25 x 25 and 51 x 51 neighborhoods around the source cell (instead of the entire landscape). The target cell was chosen from a uniform distribution spanning this neighborhood.
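The restricted-neighborhood LDD rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the model code: the function name `ldd_target` is hypothetical, the window is clipped at the landscape edge, and the source cell itself is allowed as a target; none of those choices is specified in the text.

```python
import random


def ldd_target(source, shape, neighborhood=None, rng=random):
    """Pick a long-distance dispersal target cell uniformly at random.

    source: (row, col) of the source cell
    shape: (n_rows, n_cols) of the landscape grid
    neighborhood: odd window width in cells (e.g. 25 or 51), or None to draw
                  from the entire landscape
    rng: the `random` module or a `random.Random` instance
    """
    n_rows, n_cols = shape
    if neighborhood is None:
        return rng.randrange(n_rows), rng.randrange(n_cols)
    half = neighborhood // 2  # 25 -> 12 cells on each side of the source
    row, col = source
    # Assumption: the window is clipped where it would extend past the grid edge.
    row_lo, row_hi = max(0, row - half), min(n_rows - 1, row + half)
    col_lo, col_hi = max(0, col - half), min(n_cols - 1, col + half)
    # randint is inclusive at both ends, so the full window is covered.
    return rng.randint(row_lo, row_hi), rng.randint(col_lo, col_hi)


rng = random.Random(42)
print(ldd_target((50, 50), (200, 200), neighborhood=25, rng=rng))
```

Passing `neighborhood=None` corresponds to the full-model case in which the target may fall anywhere in the landscape.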
We also modified assumptions about the landscape. Since we presumed that a heterogeneous landscape is important to understand spread, we considered three alternative landscapes: homogeneous (all favorable habitat), binary (favorable and unfavorable habitat) and randomly sorted heterogeneous habitats. For the favorable habitat in both the homogeneous and binary models we used the mean population growth rate of the three favorable habitats (lambda > 1) from the full model, weighted by the proportion of habitat in the landscape. For the binary model, this led to a growth rate of 1.5 in favorable habitat and 0.5 in unfavorable habitat. The value of 1.5 arose because most (>60%) of the favorable landscape was deciduous forest. For the homogeneous landscape, a weighted average of growth rates led to a value of 0.94, which is clearly unable to sustain spread. In this case, we considered two scenarios: (1) arbitrarily choosing a higher value for growth rate of 1.1, and (2) using an unweighted mean of the growth rates (1.38, which is unrealistically high across all habitats). For the binary landscape, we consolidated developed, agricultural, and deciduous into favorable habitat (unfavorable coniferous habitat was unchanged). For the randomized landscapes, we created 20 landscapes by randomly reorganizing the cells, ran the model 50 times in each landscape, and averaged the results. Finally, to simulate the results obtained from a standard, simple diffusion model, we considered a homogeneous landscape without LDD.
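One way to picture the randomized-landscape scenario is as a shuffle of cell values that keeps the proportion of each habitat class fixed while destroying its spatial clumping. The sketch below assumes the landscape is a rectangular list of lists and that every cell takes part in the shuffle; the text does not say whether some cells (for example, water) were held in place. The function name and the tiny example grid are hypothetical.

```python
import random


def randomize_landscape(grid, rng=random):
    """Return a new grid with the same cell values in random positions.

    grid: rectangular list of lists of habitat labels
    rng: the `random` module or a `random.Random` instance
    """
    n_cols = len(grid[0])
    flat = [cell for row in grid for cell in row]
    rng.shuffle(flat)  # in-place permutation; habitat proportions are unchanged
    return [flat[i:i + n_cols] for i in range(0, len(flat), n_cols)]


# Hypothetical 3 x 4 landscape with clumped habitat classes.
landscape = [
    ["developed", "developed", "deciduous", "deciduous"],
    ["agricultural", "deciduous", "deciduous", "coniferous"],
    ["agricultural", "coniferous", "coniferous", "coniferous"],
]
rng = random.Random(0)
# 20 randomized landscapes, as in the scenario described above.
randomized = [randomize_landscape(landscape, rng) for _ in range(20)]
```

Each randomized landscape would then be run repeatedly (50 times in the scenario above) and the outputs averaged.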
We considered alternate introduction scenarios to determine the sensitivity of the model to initial conditions. We demonstrate the sensitivity of the model to the location of the first three naturalized populations by considering five alternate scenarios: introducing populations only (1) in the south along the Connecticut shoreline; (2) in northern New England (Lake Champlain area, Aroostook County, Maine, and southeastern New Hampshire); (3) in the center of New England, along the northern border of Massachusetts; (4) along the western edge of New England; and (5) along the eastern (Atlantic) shore. Since we assumed that initiating the model in 1919 was a sufficient approximation of the early state of the landscape, we also examined the consequences of beginning in 1939 or 1959 with the same set of three introduction sites.
**Parameter Sensitivity**
We used sensitivity analyses to explore the parameter space of the model (see parameters in Table 1). Our objective was to see how robust our predictions were within intervals around the selected parameters, and to discover at what point our predictions lead to qualitatively different patterns. Locally robust predictions account for parameter estimation error, while globally sensitive predictions indicate that our model may be situation-specific. We modified all bittersweet growth rates by shifting them by 0.2 to 0.8 above and below the values selected for the best model. We modified starling habitat use by choosing a particular habitat, forcing the use to be 0.01, 0.10, 0.25, 0.50, 0.75, or 0.95 and then rescaling the remaining habitat use coefficients such that total habitat use summed to 1. For example, if developed habitat use was set to 0.75, the observed use for the remaining habitat types was renormalized and multiplied by 0.25 to ensure that the relative use among those habitats was preserved. We modified mean local bird dispersal distance by changing the exponential rate (= 1/mean number of cells) to values between 1 and 6. We modified the number of LDD events per year between 0 and 100 events per year. We also varied cell carrying capacity over four orders of magnitude and the threshold for assessing a correctly predicted presence. The values used for all these analyses are summarized in Table S1.
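The habitat-use rescaling described above amounts to a simple renormalization, sketched below. The function name and the baseline habitat-use values are hypothetical and are not the estimates in Table S3; only the forced value of 0.75 for developed habitat comes from the example in the text.

```python
def rescale_habitat_use(use, habitat, forced):
    """Force one habitat's use to `forced` and rescale the others to sum to 1.

    use: dict mapping habitat name -> observed proportional use
    habitat: the habitat whose use is being fixed
    forced: the value imposed on that habitat (e.g. 0.01, 0.10, ..., 0.95)
    """
    others_total = sum(v for k, v in use.items() if k != habitat)
    # Renormalize the remaining habitats, then scale them to share (1 - forced),
    # which preserves their use relative to one another.
    rescaled = {
        k: (1.0 - forced) * v / others_total
        for k, v in use.items()
        if k != habitat
    }
    rescaled[habitat] = forced
    return rescaled


# Hypothetical baseline habitat use (illustrative values only).
baseline = {"developed": 0.4, "agricultural": 0.3, "deciduous": 0.2, "coniferous": 0.1}
modified = rescale_habitat_use(baseline, "developed", 0.75)
print(modified)
```

With these illustrative numbers, agricultural use remains three times coniferous use after rescaling, and the four coefficients still sum to 1.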
**RESULTS**
All three spread mechanisms, plant population growth, local bird dispersal and random LDD, were essential to produce accurate predictions. In the absence of local population growth within cells, population spread is extremely slow (Fig. S1). When local bird dispersal is omitted, the population fails to spread at all (Fig. S2). When LDD is omitted spread is very slow and uniform (Fig. S3). Homogeneous landscape predictions were entirely driven by uniform spread around introduction points and produced little of the spatial asymmetry in observed spread patterns (Figs. S4-5). All scenarios using a homogeneous landscape – plant population growth rate at 1.1 to reflect low growth, at 1.38 to reflect high growth, and while omitting LDD to represent a diffusion model - under-predicted spread through 1960. The high growth rate scenario then over-predicted spread from 1980 onwards.
Binary landscapes predicted much less spread than the full model and were between 2% and 37% less accurate (Fig. S6). The simpler binary model could not predict the spatial asymmetry in spread and under-predicted the early surge of high growth in developed landscapes. Furthermore, the binary model lacks the ability to decompose landscape attributes to understand the differences between habitats. From the full model we learn that developed landscapes have extremely high population growth rates that provide important source populations while deciduous forests show lesser growth and act primarily as corridors between developed and agricultural patches. Also, the binary model overlooks how the geometry of different LULC classes contributes to spread, thereby ignoring potentially important corridors.
Random landscapes bore almost no resemblance to observed patterns, largely because of the lack of spatial habitat clumping compared to the full or binary models (Fig. S7). The homogeneous landscape without LDD, meant to simulate a standard diffusion model, both missed the spatial asymmetry in spread patterns and substantially under-predicted spread (Fig. S5).
The locations where independent populations of bittersweet were introduced were paramount in determining the temporal pattern of spread. For all introduction scenarios except the southern introduction, spread was substantially under-predicted (Figs. S9-12). When populations were introduced in the east, west and center of New England, early patterns were heavily biased toward introduction points but were similar to the full model by 2000 (Figs. S33-S34). However, when populations were seeded along the southern coast (in patterns very similar to the original model seeding sites; Fig. S8), early spread was comparable to the full model. With the exception of the introduction in the south, the full model predicted the time series through 1960 with between 50% and 82% more accuracy than the alternate scenarios (Table 2).
The model results were also sensitive to the year when the model was initiated. Initiating the model in either 1939 or 1959 under-predicted spread by 25-62% (Figs. S16-17). Predictions from these scenarios were comparable to the full model by 2000, although slightly lower.
Sensitivity analysis revealed the relationship between predictions and variation in parameter values (Table S1). We chose not to present the standard table of sensitivities (i.e. a 1% change in parameter leads to an x% change in fit) because of the highly nonlinear nature of the parameter sensitivity. In general, variables that define attributes of the landscape in the north, such as coniferous and agricultural landscape parameters, were less important than variables describing the southern landscape. For example, predictions were qualitatively similar to Figure 4 as long as population growth in coniferous forest was less than 1. A growth rate greater than 1 permitted faster spread over a much greater spatial extent but over-predicted spread only minimally through 2009 because populations are only beginning to reach the northern coniferous forests. Long-term predictions were vastly different if the growth rate in coniferous forest was > 1; there were no barriers to movement and spread could occur throughout New England (although it takes several hundred years to fill the landscape, Fig. S13). The only essential condition on population growth rate in agricultural landscapes was that values remain above 1. Predictions are insensitive to agricultural growth rates above 1.3 because agricultural land in the south is typically adjacent to developed landscapes, which produce enough propagules to obscure variation in agricultural parameters.
Bird habitat use had only minimal impact on prediction accuracy (all changes, both positive and negative, in accuracy < 9%). In part, this small effect results from the geometry of the landscape. Bird habitat use was most sensitive in deciduous forest. When deciduous habitat use was reduced to 1%, accuracy decreased 9% from 1940-1960. Varying starling habitat use of coniferous habitat had no effect on spread (except to the extent that it affected the relative use of deciduous landscapes) because the *Celastrus* population growth rate was less than one and emigrants were never produced.
The number of random LDD events was very important. Using even five LDD events per year quickly increased spread; the predictions for southern New England differed only minimally from the full model, but there was much more spread in the north. Increased LDD primarily enhanced spread, compared to the full model, because seeds reached favorable and isolated habitats sooner than they would if relying only on local bird dispersal (in the 7 x 7 neighborhood around the source cell). More than five LDD events per year quickly saturated the entire landscape by 1960 and substantially over-predicted spread. Changing the number of random long-distance events per year only changed the temporal dynamics and not the ultimate spatial pattern. Shrinking the LDD neighborhood from the entire landscape to a 51 x 51 neighborhood centered on a source cell had no perceivable impact on predictions (Fig. S15); most observed presences were sufficiently close to others that LDD was not limited. However, reducing the LDD neighborhood to 25 x 25 increased accuracy in 1960-1980 because LDD events could not be distributed throughout New England and instead boosted growth near existing populations. Predictions for 2000-2009, however, under-predicted spread into the north, making the larger neighborhood size seem more likely (Fig. S14). The increased success in 1960-1980 does, however, suggest that additional introductions may have taken place early on in southern New England.
The model was robust to changes in mean local bird dispersal distance (1/rate) for values between 3 and 4 (the best model used 3.5). However, higher dispersal rates (smaller means) yielded insufficient spread from 1980-2009. Lower dispersal rates (larger dispersal distances) over-predicted spread, saturating the entire landscape, and are probably ecologically unrealistic.
A number of parameters were relatively unimportant and demonstrate that our predictions are robust. Predictions were insensitive to the range of random LDD, showing little difference when varied between a 20 x 20 cell neighborhood and the entire landscape (Figs. S14-15). Varying carrying capacity between 50 and 50,000 did not affect results. Only when carrying capacity was as low as 10 were too few emigrants produced to accurately predict spread.
## Table S1. Summary of Sensitivity Analysis on Parameters