The two main outputs of the regression models presented in this report are the coefficients and standard errors. These two values are then used to compute the remaining output presented in the tables (the p-value and the confidence intervals). In general, each piece of output has the same interpretation across models, but where there are differences they will be noted.
The piece of output that receives the most attention is the estimated coefficient. The coefficient represents the impact of the independent variable on the dependent variable. For example, in Table 183, the estimated coefficient for “# of Aircraft Involved” represents how the dependent variable (the probability of a category A incursion) changes with the value of “# of Aircraft Involved.” In this particular example, the coefficient is positive, indicating that the dependent variable increases as the independent variable increases.
For ordered models, the sign of the coefficient indicates the direction of the effect. That is, positive values indicate that the probability of a category A incident (for ordered models) or a severe incident (for binary models) increases as the independent variable increases; negative values indicate a corresponding decrease in probability. This convention does not hold for the multinomial models. In those instances, it is not the absolute size or sign of a coefficient that is important; rather, it is the size and sign of that coefficient relative to the other coefficients in the model that matter.
Coefficients for the binary models are presented as odds ratios. These are direct transformations of the coefficients, but work multiplicatively with respect to the odds of a severe incursion. Thus, if the odds ratio is less than one, the odds of a severe incursion decrease as the independent variable increases. If the odds ratio is greater than one, then the odds of a severe event increase as the independent variable increases.107
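To make the transformation concrete, consider a hypothetical illustration (the values are not drawn from this report’s tables): in the standard logistic specification, the odds ratio is the exponential of the underlying coefficient, so a coefficient of 0.405 corresponds to an odds ratio of exp(0.405) ≈ 1.50, meaning each one-unit increase in the independent variable multiplies the odds of a severe incursion by about 1.5. Likewise, a coefficient of −0.405 gives an odds ratio of exp(−0.405) ≈ 0.67, shrinking the odds by about a third.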
Finally, it is important to note that the coefficients do not directly translate to changes in probability. For all models presented in this report, the coefficients must be combined and then transformed to understand the direct impact on probabilities. In many cases, this transformation is mathematically complex. Thus, for the multinomial models the relevant graphs and tables indicating the change in probability are provided. As the ordered and binary models were not of primary interest, no such calculations were done for those models. Such a calculation could be performed using the coefficients provided in the model.
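As an illustration of the kind of transformation involved, the sketch below computes predicted probabilities from a hypothetical binary logit model; the coefficient values and the variable name are assumptions chosen for illustration, not estimates from this report.

```python
import math

# Hypothetical binary logit coefficients (illustrative values only,
# not estimates from this report's tables).
intercept = -2.0
beta_aircraft = 0.4   # hypothetical coefficient on "# of Aircraft Involved"

def severe_probability(n_aircraft):
    """Combine the coefficients into the linear index, then apply the
    logistic transformation to obtain a probability."""
    linear_index = intercept + beta_aircraft * n_aircraft
    return 1.0 / (1.0 + math.exp(-linear_index))

# The implied change in probability depends on where it is evaluated:
print(severe_probability(1))  # ~0.168
print(severe_probability(2))  # ~0.231
```

Note that the same coefficient implies a different change in probability at different values of the linear index, which is why the coefficients cannot be read directly as probability effects.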
The second major category of output presented is the standard error. The standard error measures how precisely the coefficient was estimated: smaller standard errors indicate a more precisely estimated coefficient.
The p-value is calculated from the coefficient and the standard error. It measures the evidence that the estimated coefficient is different from zero (or different from one in the case of an odds ratio). A coefficient of zero indicates that there is no relationship between the given variable and the dependent variable. The p-value approximates how likely it would be to observe the estimated coefficient if the actual value of the coefficient were zero; in other words, it represents how likely it is that the estimated coefficient was the product of a random association between the dependent variable and the independent variable. In general, it is standard practice to conclude that a random process did not generate the estimated coefficient if the p-value is less than .05.
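A minimal sketch of this calculation, assuming the usual large-sample normal approximation; the coefficient and standard error are illustrative values:

```python
from scipy.stats import norm

coef = 0.40   # illustrative estimated coefficient
se = 0.15     # illustrative standard error

# Two-sided test of the hypothesis that the true coefficient is zero.
z = coef / se
p_value = 2 * (1 - norm.cdf(abs(z)))
print(round(p_value, 4))  # ~0.0077, below the conventional .05 threshold
```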
The last piece of information presented is the 95% confidence interval (CI). The confidence interval is an alternative description of the uncertainty surrounding a parameter estimate. It consists of two values, the lower bound (LB) and the upper bound (UB), which are the endpoints of an interval containing the “most likely” values for the estimated coefficient. The estimated coefficient is the midpoint of this interval, and the width of the interval is determined by the standard error. The confidence interval provides two pieces of information. First, the interval represents plausible values of the estimated coefficient, given the data on hand.108 Second, if the confidence interval contains zero, this is equivalent to a p-value greater than or equal to .05. Thus, the p-value and confidence interval both capture the uncertainty surrounding the coefficient estimate.
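The confidence interval can be sketched the same way (again with illustrative values); the 1.96 multiplier is the standard normal critical value for a 95% interval, and the equivalence with the .05 threshold follows directly:

```python
coef = 0.40   # illustrative estimated coefficient
se = 0.15     # illustrative standard error

# 95% CI: the coefficient plus or minus 1.96 standard errors.
lower_bound = coef - 1.96 * se   # ~0.106
upper_bound = coef + 1.96 * se   # ~0.694

# The interval excludes zero exactly when the p-value is below .05.
print(lower_bound > 0 or upper_bound < 0)  # True
```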
Regardless of the model implemented, there is an overarching concern about the interpretation of results, which in turn shapes how the models themselves are run. There are two major schools of thought regarding the interpretation of estimation results: Bayesian and frequentist. Discrete choice models can be implemented in either framework; the difference lies in how the results are obtained and interpreted.
Frequentist Econometrics
Most people who have some statistics or econometrics training have been taught frequentist methods, and a variety of statistical packages implement a wide array of frequentist methods for any number of models. Frequentist econometrics is by far the most common type of econometric study. Frequentist techniques in general are outlined in Section 4.1.2.
Frequentist methods treat the parameters β as fixed constants to be estimated, in direct contrast to Bayesian econometrics, as discussed in the following section.
Bayesian Econometrics
The basis of Bayesian econometrics is the use of Bayes’ Rule.109 Bayes’ Rule can be written as:

P(A|B) = P(B|A) × P(A) / P(B)
For example, if event A is having a disease and event B is a positive result from a test for that disease, P(A|B) is the probability of having the disease given a positive test result and can be calculated as above. The essence of this formula is that it combines information about the data – in this case the outcome of the test (the factor P(B|A)/P(B)) – and information about the unconditional probability of the outcome – being sick (the factor P(A)). In this example, Bayes’ Rule would be:

P(sick|positive) = P(positive|sick) × P(sick) / P(positive)
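A short numeric illustration of the mechanics (the prevalence and test-accuracy figures below are hypothetical):

```python
# Hypothetical inputs for the disease-testing example.
p_sick = 0.01                # P(A): unconditional probability of disease
p_pos_given_sick = 0.95      # P(B|A): test sensitivity
p_pos_given_healthy = 0.05   # false-positive rate among the healthy

# P(B): total probability of a positive test, by the law of total probability.
p_pos = p_pos_given_sick * p_sick + p_pos_given_healthy * (1 - p_sick)

# Bayes' Rule: P(A|B) = P(B|A) * P(A) / P(B)
p_sick_given_pos = p_pos_given_sick * p_sick / p_pos
print(round(p_sick_given_pos, 3))  # ~0.161
```

Even with an accurate test, the low unconditional probability of disease keeps the conditional probability modest, which is exactly the interplay of prior information and data that the formula captures.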
The formula above can be extended to a regression context and used to describe a wide variety of models. Suppose the regression model has data y and parameter set θ.110 The above formula can be rewritten as:

p(θ|y) = p(y|θ) × p(θ) / p(y)
This relationship can be simplified by removing extraneous information about y. Doing so gives up the exactness of the expression but maintains the most important part of the relationship defined in Bayes’ Rule (i.e., the proportional relationship between θ and y). When simplified, the relationship is expressed as:

p(θ|y) ∝ p(y|θ) × p(θ)
p(θ) is referred to as the “prior distribution” and represents the information available about θ before looking at the data. This information can come from previous research or the researcher’s informed beliefs. p(y|θ) is called the “likelihood” and represents the probability distribution of the data given a parameter set. Finally, p(θ|y) is called the “posterior distribution” and captures all available information on θ – information available from the data and from the prior distribution.111 This framework can be used to estimate parameters for a variety of models based on differing likelihood functions.
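To make the roles of the prior, likelihood, and posterior concrete, the sketch below approximates a posterior on a grid for a simple Bernoulli model; the data and the flat prior are assumptions for illustration, not part of this report’s estimation:

```python
import numpy as np

# Grid of candidate values for a single parameter theta (a probability).
theta = np.linspace(0.001, 0.999, 999)

# Prior p(theta): a flat prior, i.e., no information before seeing the data.
prior = np.ones_like(theta)

# Likelihood p(y|theta): probability of observing 7 successes in 10
# Bernoulli trials at each candidate value of theta.
successes, trials = 7, 10
likelihood = theta**successes * (1 - theta)**(trials - successes)

# Posterior p(theta|y) is proportional to likelihood * prior;
# normalizing over the grid makes it a proper distribution.
posterior = likelihood * prior
posterior /= posterior.sum()

print(theta[np.argmax(posterior)])  # posterior mode, ~0.7
```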
As Koop notes, the probability distribution p(θ|y) is “of fundamental interest for an econometrician interested in using data to learn about parameters in a model.”112 Bayesian methods focus on the interpretation and analysis of p(θ|y) to understand the relationship between θ and y.