The equations above outline the two major differences between the frequentist and Bayesian schools of thought. First, the two methods produce different kinds of results. The result of Bayesian estimation is the posterior distribution p(θ|y), which is a probability distribution for θ. There is no single value for θ; rather, each possible value is assigned a probability (or probability density). That probability is informed by the likelihood function (i.e., the data) and by the prior distribution. A posterior with higher variance indicates greater uncertainty about the value of θ. This distribution can be summarized through statistics such as the mean, median, or variance, but the fundamental result is a probability distribution.
This is subtly different from the frequentist result, which is a point estimate, θ̂, of the “true” value of θ. That is, in frequentist statistics, θ has a fixed value that can be determined to some precision given the data (the estimate θ̂), and the variance around that point can be characterized as a function of the data. A wider variance around θ̂ implies less certainty about the estimate, much as increased variance in a Bayesian posterior implies increased uncertainty about each possible value. For frequentists, the fundamental result is this point estimate; for Bayesians, it is a probability distribution.
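As a minimal sketch of this contrast, assume (purely for illustration) a single probability parameter θ, hypothetical binary data with 7 successes in 20 trials, a binomial likelihood, and a flat prior. The Bayesian side approximates p(θ|y) on a grid and then summarizes it; the frequentist side returns the maximum likelihood estimate θ̂ = k/n with a Wald standard error. The helper name `grid_posterior` is hypothetical, not part of any package.

```python
import math
from statistics import NormalDist

K, N = 7, 20  # hypothetical data: 7 successes in 20 trials

# Bayesian: approximate the posterior p(theta | y) on a grid, flat prior on [0, 1].
def grid_posterior(k, n, n_grid=1001):
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    # Binomial likelihood times a constant prior; the binomial coefficient is omitted.
    unnorm = [t ** k * (1 - t) ** (n - k) for t in grid]
    total = sum(unnorm)
    return grid, [u / total for u in unnorm]

grid, post = grid_posterior(K, N)
post_mean = sum(t * p for t, p in zip(grid, post))
post_sd = math.sqrt(sum((t - post_mean) ** 2 * p for t, p in zip(grid, post)))
cumulative = 0.0
for t, p in zip(grid, post):
    cumulative += p
    if cumulative >= 0.5:  # first grid point holding half the posterior mass
        post_median = t
        break

# Frequentist: a single point estimate plus a data-based standard error.
theta_hat = K / N
se = math.sqrt(theta_hat * (1 - theta_hat) / N)
z = NormalDist().inv_cdf(0.975)
wald_ci = (theta_hat - z * se, theta_hat + z * se)

print(f"posterior: mean={post_mean:.3f}, median={post_median:.3f}, sd={post_sd:.3f}")
print(f"frequentist: theta_hat={theta_hat:.3f}, se={se:.3f}, "
      f"95% CI=({wald_ci[0]:.3f}, {wald_ci[1]:.3f})")
```

The Bayesian result is the whole list `post`; its mean, median, and standard deviation are only summaries. With a flat prior this posterior is a Beta(8, 14) distribution, so its mean (8/22, about 0.364) differs slightly from θ̂ = 0.35.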
The second difference that these equations illuminate is the inclusion of prior information, which is unique to Bayesian analysis. The prior is a way to introduce additional information, not contained in the data, into the estimation. These prior beliefs about the distribution of the parameters can be highly specific or only loosely defined. In the extreme case, the researcher can choose an uninformative prior, essentially stating that there are no prior beliefs. This is akin to specifying a prior distribution with infinite variance, which forces the estimation to rely completely on the data. When an uninformative prior is specified, the estimation results are similar to frequentist results in the sense that both rely solely on the data (i.e., the likelihood function).
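To make the "infinite variance" idea concrete, here is a minimal sketch of the textbook conjugate case, assuming a normal model with known data variance σ² and a normal prior N(m, τ²) on the mean θ; the observations are hypothetical. The posterior mean is the precision-weighted average (n·ȳ/σ² + m/τ²) / (n/σ² + 1/τ²), so as τ² grows the prior's weight falls toward zero and the posterior mean approaches the sample mean ȳ, which is the frequentist estimate.

```python
def normal_posterior(data, sigma2, prior_mean, prior_var):
    """Posterior mean and variance of a normal mean, known data variance sigma2."""
    n = len(data)
    xbar = sum(data) / n
    data_precision = n / sigma2          # information contributed by the data
    prior_precision = 1.0 / prior_var    # information contributed by the prior
    post_var = 1.0 / (data_precision + prior_precision)
    post_mean = post_var * (data_precision * xbar + prior_precision * prior_mean)
    return post_mean, post_var

data = [2.1, 1.7, 2.4, 1.9, 2.3]  # hypothetical observations, sample mean 2.08
for prior_var in (0.01, 1.0, 1e6):  # tight prior -> nearly uninformative prior
    m, v = normal_posterior(data, sigma2=1.0, prior_mean=0.0, prior_var=prior_var)
    print(f"prior variance {prior_var:>9}: posterior mean {m:.4f}, variance {v:.4f}")
```

With the tight prior the posterior mean sits near the prior mean of 0; with a prior variance of 10^6 it is essentially the sample mean, and the posterior variance is essentially σ²/n, the frequentist sampling variance of ȳ.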
A final point worth making is a similarity between Bayesian and frequentist methods. Both discussions above invoke the term “likelihood,” and in fact both methods employ the same likelihood function. The likelihood here simply characterizes the probability of observing the data given a set of parameters. The difference lies in how the likelihood is treated: for Bayesians, it forms one part of the posterior distribution, while frequentists seek the value θ̂ that maximizes it.
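The snippet below sketches the point that both schools start from one likelihood function. A single hypothetical `log_likelihood` (binomial, with illustrative data of 7 successes in 20 trials) is maximized directly to obtain the frequentist estimate, and is added to a log prior to form the unnormalized Bayesian log posterior. The Beta(2, 2)-shaped prior is an arbitrary illustrative assumption; it pulls the posterior mode slightly toward 0.5, whereas with a flat prior the posterior mode and the maximum likelihood estimate would coincide.

```python
import math

def log_likelihood(theta, successes, trials):
    """Binomial log-likelihood (constant term dropped); shared by both approaches."""
    return successes * math.log(theta) + (trials - successes) * math.log(1 - theta)

def log_prior(theta):
    # Illustrative Beta(2, 2) prior, unnormalized: theta * (1 - theta).
    return math.log(theta) + math.log(1 - theta)

grid = [i / 1000 for i in range(1, 1000)]  # open interval (0, 1) avoids log(0)
k, n = 7, 20  # hypothetical data

# Frequentist: the theta that maximizes the likelihood.
mle = max(grid, key=lambda t: log_likelihood(t, k, n))

# Bayesian: the same likelihood plus the log prior; the mode is one summary of it.
log_post = [log_likelihood(t, k, n) + log_prior(t) for t in grid]
post_mode = grid[log_post.index(max(log_post))]

print(f"MLE: {mle:.3f}, posterior mode: {post_mode:.3f}")
```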
The previous sections outlined the basic structures of the Bayesian and frequentist frameworks and how they compare with one another. Each paradigm has practical advantages and disadvantages relative to the other. Bayesian methods can be more informative on small samples, and Bayesian analysis can also provide more theoretically pleasing estimation results.
Bayesian analysis can have advantages where the data provide little information with which to estimate parameters, largely because frequentist analysis has no mechanism for including prior information. One such case is data with small sample sizes. Xie et al. address the small-sample question and compare the results from a Bayesian analysis with those from a frequentist analysis.113
The authors performed their comparison in the context of an ordered probit model.114 They find that when using the full sample of 76,994 observations, a Bayesian model with uninformative priors (i.e., p(θ) contains very little information, and the data are relied upon to provide almost all of the information about θ) is almost identical to the frequentist model. A variety of other priors were also fit to the entire sample, and all models provided results similar to the frequentist model. The authors then examined the models on a subsample of 100 records. A frequentist model and a Bayesian model with an informative prior were fit to this small sample and compared to the full-sample results. The Bayesian model with the informative prior provided results that were significantly closer to those observed on the entire sample.
This study reveals two important points regarding the applied use of Bayesian methods. First, Bayesian methods can provide real gains when examining small samples. While this may not be a relevant advantage given the current objective of modeling incursion severity across the many thousands of incursions to date, future rounds of research may wish to analyze small subsamples. Second, the advantages of Bayesian methods hinge upon the definition of the priors: given an uninformative prior, the Bayesian results mimicked the frequentist results. Thus, when examining runway incursion severity, a relatively unexplored field with few prior beliefs about the impacts of variables, Bayesian methods may not provide a substantial advantage.
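The small-sample comparison described above can be mimicked, very loosely, with a simulation. This is not the ordered probit analysis of Xie et al.: the sketch swaps in the simplest possible model (estimating a normal mean with known variance), assumes a hypothetical "true" value, and assumes an informative prior whose mean lies close to that value, as might happen if the prior were built from a larger body of data. Across repeated subsamples of 100 records, it compares the mean squared error of the sample mean (the frequentist estimate) with that of the prior-informed posterior mean.

```python
import random

def simulate(n_reps=500, n=100, true_mean=1.0, sigma=1.0,
             prior_mean=1.02, prior_sd=0.05, seed=1):
    """All parameter values are illustrative assumptions, not taken from any study."""
    rng = random.Random(seed)
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = n / sigma ** 2
    sq_err_freq = sq_err_bayes = 0.0
    for _ in range(n_reps):
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        xbar = sum(sample) / n  # frequentist estimate (the MLE)
        # Conjugate normal update: precision-weighted average of prior mean and xbar.
        post_mean = (data_prec * xbar + prior_prec * prior_mean) / (data_prec + prior_prec)
        sq_err_freq += (xbar - true_mean) ** 2
        sq_err_bayes += (post_mean - true_mean) ** 2
    return sq_err_freq / n_reps, sq_err_bayes / n_reps

mse_freq, mse_bayes = simulate()
print(f"MSE of sample mean: {mse_freq:.5f}; "
      f"MSE of informative-prior posterior mean: {mse_bayes:.5f}")
```

Under these assumptions the posterior mean has far lower error, but the gain comes entirely from the prior being close to the truth. Moving `prior_mean` well away from `true_mean` can reverse the ordering, which is the sense in which the advantages of Bayesian methods hinge upon the priors.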
In addition to its beneficial small-sample properties, Bayesian analysis can be more theoretically pleasing. As an example, consider Griffiths et al., who compare Bayesian estimation under a variety of priors with standard frequentist estimation in the context of a probit model of mortgage types.115 In this case, the researchers used a truncated uniform prior distribution; that is, the authors held the prior belief that a coefficient is positive and that all positive values are equally likely. The mean and variance of the posterior distribution were similar to the results from the frequentist estimation. However, the Bayesian results were truncated at zero, whereas the frequentist results imply a normal distribution centered on the estimate, regardless of where that distribution falls. For a variable whose coefficient must be positive, this frequentist result may be incorrect. This may be especially true for variables with small effects, that is, variables with estimated effects that are not very different from zero. The Bayesian estimates, by virtue of being truncated at zero, have a slightly different distribution: the mean and variance may be similar, but impossible values have zero probability. Figure 67 demonstrates this graphically.
Figure 67 - Bayesian versus Frequentist Parameter Estimates
The red bar (top) displays a hypothetical Bayesian estimate. The width of the bar represents the distribution of the estimated parameter. Note that the bar is truncated at zero, indicating that the distribution of θ does not extend past zero in that direction. The blue bar (bottom) represents the variance around a frequentist point estimate, θ̂. This variance can extend outside the reasonable range for the parameter, in this case into negative values. Finally, note that the point estimate θ̂ is equal to the mean of the distribution of θ (represented by a vertical line in the bar). This need not be the case in general.
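A minimal sketch of the truncation idea, using a grid approximation and a toy likelihood rather than the probit model of Griffiths et al.: assume the likelihood for a coefficient β is approximately normal, centered at a hypothetical estimate β̂ = 0.15 with a hypothetical standard error of 0.10. The frequentist 95% interval β̂ ± 1.96·se then crosses zero, while the posterior under a uniform prior restricted to β ≥ 0 assigns exactly zero probability to negative values.

```python
from statistics import NormalDist

beta_hat, se = 0.15, 0.10  # hypothetical estimate and standard error
z = NormalDist().inv_cdf(0.975)
freq_interval = (beta_hat - z * se, beta_hat + z * se)

# Grid from -0.5 to 1.0, wide enough to hold essentially all the likelihood mass.
grid = [-0.5 + i * 0.001 for i in range(1501)]
likelihood = NormalDist(beta_hat, se)  # assumed normal approximation to the likelihood
# Truncated uniform prior: constant for beta >= 0, zero for negative values.
unnorm = [likelihood.pdf(b) * (1.0 if b >= 0 else 0.0) for b in grid]
total = sum(unnorm)
posterior = [u / total for u in unnorm]

prob_negative = sum(p for b, p in zip(grid, posterior) if b < 0)
post_mean = sum(b * p for b, p in zip(grid, posterior))
print(f"frequentist 95% interval: ({freq_interval[0]:.3f}, {freq_interval[1]:.3f})")
print(f"posterior P(beta < 0) = {prob_negative}, posterior mean = {post_mean:.3f}")
```

Because this hypothetical estimate lies close to zero, truncation shifts the posterior mean slightly above β̂, one example of why the point estimate and the posterior mean need not coincide.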
This discrepancy (truncated versus unconstrained) extends to predicted probabilities as well. The use of a probit model confines the frequentist point estimate of the probability to lie between zero and one. However, the variance about that point may include invalid values (probabilities outside the zero-to-one range).116 The predicted probabilities obtained from the Bayesian estimation were bounded by zero and one, constraining results to the valid interval. Figure 68 provides a simplified graphical explanation of this phenomenon. Griffiths et al. note that this is not a result of using a truncated prior but rather of differences in how estimates are generated under Bayesian and frequentist methods.117
Figure 68 - Bayesian versus Frequentist Probability Estimates
The red bar (top) represents the probability estimate from a Bayesian estimation, while the blue bar (bottom) represents the estimate from a frequentist estimation. The frequentist point estimate of the probability, Pf, is confined to the valid range of zero to one. However, the variance around this point (representing uncertainty in the estimate) can extend into invalid ranges. This does not invalidate the frequentist estimate; it is merely an undesirable side effect of the frequentist interpretation. The Bayesian probability estimate, Pb, is again a distribution, and this distribution is truncated to remain in the valid range of zero to one.
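The same contrast can be sketched for predicted probabilities. Assume a one-covariate probit model with a hypothetical coefficient estimate and standard error. A delta-method interval around the frequentist predicted probability Φ(xβ̂) is symmetric and can spill past one, whereas transforming individual draws of β through Φ, as is done with posterior simulation output, yields probabilities that can never leave [0, 1]. For simplicity the draws here come from a normal approximation centered on β̂, an assumption standing in for real posterior draws from a sampler.

```python
import random
from statistics import NormalDist

std_normal = NormalDist()
x = 2.0                  # hypothetical covariate value
beta_hat, se = 1.2, 0.4  # hypothetical probit coefficient and standard error

# Frequentist: point estimate of P(y = 1 | x) plus a delta-method interval.
p_hat = std_normal.cdf(x * beta_hat)
se_p = std_normal.pdf(x * beta_hat) * x * se  # dP/dbeta = phi(x * beta) * x
z = std_normal.inv_cdf(0.975)
freq_interval = (p_hat - z * se_p, p_hat + z * se_p)

# Simulation-based: transform each draw of beta, so every probability is in [0, 1].
rng = random.Random(0)
draws = [rng.gauss(beta_hat, se) for _ in range(10_000)]  # stand-in for posterior draws
probs = sorted(std_normal.cdf(x * b) for b in draws)
sim_interval = (probs[int(0.025 * len(probs))], probs[int(0.975 * len(probs)) - 1])

print(f"P_f = {p_hat:.4f}, delta-method 95% interval: "
      f"({freq_interval[0]:.4f}, {freq_interval[1]:.4f})")
print(f"simulation 95% interval: ({sim_interval[0]:.4f}, {sim_interval[1]:.4f})")
```

With these illustrative numbers the point estimate is about 0.992 and the upper end of the delta-method interval exceeds one, while the simulation-based interval stays inside the valid range.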
The ability to confine predicted probabilities to the appropriate bounded interval is advantageous. Additionally, if prior beliefs about the sign, but not the magnitude, of a coefficient exist, Bayesian methods offer superior estimation results. However, as noted earlier, few if any priors exist in the runway incursion context.118 It is also unclear how useful predicted probabilities may be in this context. Regardless, Bayesian methods will likely provide results that are theoretically superior to those of frequentist methods, although the degree of superiority will vary and in some situations can be quite small.
However, Bayesian methods are more difficult to implement than frequentist methods. First, inference about the effects of individual components of θ is difficult using the posterior distribution, leading to less clear policy direction. Further complicating matters, p(θ|y) may not be expressible as a simple formula (i.e., there may be no closed form for p(θ|y)). In these cases, simulation (for example, Markov chain Monte Carlo) is required to approximate p(θ|y), requiring additional programming, computing resources, and time.
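To illustrate what "simulation is required" involves, the sketch below is a bare-bones random-walk Metropolis sampler (one Markov chain Monte Carlo method among several) for a one-coefficient probit model, which has no closed-form posterior. The N(0, 10²) prior, step size, chain length, burn-in, and the synthetic data generated inside the script are all illustrative assumptions, and a real analysis would also need convergence diagnostics. Even this toy version shows the extra programming and computation involved relative to calling a packaged maximum likelihood routine.

```python
import math
import random
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal CDF, the probit link

def log_posterior(beta, xs, ys, prior_sd=10.0):
    """Probit log-likelihood plus an assumed N(0, prior_sd^2) log prior, constants dropped."""
    lp = -0.5 * (beta / prior_sd) ** 2
    for x, y in zip(xs, ys):
        p = min(max(PHI(beta * x), 1e-12), 1.0 - 1e-12)  # guard against log(0)
        lp += math.log(p) if y == 1 else math.log(1.0 - p)
    return lp

def metropolis(xs, ys, n_iter=3000, step=0.2, seed=42):
    """Random-walk Metropolis sampler for the single probit coefficient."""
    rng = random.Random(seed)
    beta, current = 0.0, log_posterior(0.0, xs, ys)
    chain = []
    for _ in range(n_iter):
        proposal = beta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, xs, ys)
        # Accept with probability min(1, p(proposal | y) / p(beta | y)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
        chain.append(beta)
    return chain

# Synthetic data from a probit model whose true coefficient is 0.8 (an assumption).
data_rng = random.Random(7)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(200)]
ys = [1 if data_rng.random() < PHI(0.8 * x) else 0 for x in xs]

draws = metropolis(xs, ys)[1000:]  # discard the first 1,000 draws as burn-in
post_mean = sum(draws) / len(draws)
print(f"posterior mean of beta: {post_mean:.3f} from {len(draws)} draws")
```

Every iteration re-evaluates the likelihood over the full data set, so run time grows with both the number of observations and the chain length, and models with many coefficients typically need longer chains and better-tuned proposals.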
Comparative Characteristics of Frequentist Models
Frequentist methods are often at a disadvantage where Bayesian methods are advantageous, and vice versa. Frequentist estimation, by relying solely on the data to produce results, is subject to any weaknesses in those data. However, frequentist methods do not require prior distributions on the parameters, which spares the researcher from specifying a prior when no reasonable prior expectations exist. Additionally, Bayesian estimation with an uninformative prior essentially collapses to the frequentist estimate: without any information from a prior distribution, only the information in the data can be used to estimate a result, which is exactly the frequentist technique.
Frequentist methods also have advantages in terms of implementation. Many common statistical packages implement frequentist methods for the models under consideration. Though these may require significant computing power, the requirements are substantially lower than those of Bayesian methods that rely on simulation. The availability of “canned” implementations of frequentist methods also allows different model specifications to be tested quickly. Conversely, implementing Bayesian methods would consume a significant portion of available resources, restricting the focus to a single model with one or two sets of explanatory variables that, given the lack of informative priors for runway incursions, would likely return the same results as frequentist methods.