When two variables are associated, the variables correspond with each other in predictable ways. Associations can be either positive (an increase in one variable corresponds with an increase in the other) or negative (an increase in one variable corresponds with a decrease in the other variable). For instance, the statement "people with more education tend to have higher incomes than people with less education" is a statement of positive association. "People with more education tend to be less prejudiced than people with less education" is a statement of negative association. Note that neither statement claims a causal relationship, only that changes in the variables correspond with one another.
Time Order
When a change in one variable causes a change in another variable, logically, the independent variable needs to change before the dependent variable. Inverting time order leads to irrational conclusions. For example, does it make sense to conclude that a mousetrap sprang because the mouse died, or that it rained because the ground got wet? While these conclusions seem nonsensical (because they are), consider this dynamic. Pornography is often considered to be a cause of violence against women. Let's put time into the equation. If there is a causal relationship, one would expect that a substantial rise in pornography use would subsequently be followed by a rise in violence. Has the increasing consumption of pornography over the internet contributed to this dynamic?
Thus, identifying causality requires not only establishing that two variables are associated with one another, but also that one variable leads to a change in the other. The issue of time order seems straightforward enough; yet establishing time order can be difficult because most data are cross-sectional, reflecting observations collected at one point in time. As such, asserting the existence of a causal relationship has to be done with care. Even when time order is satisfied, causality may still not be the case (see below). The point we are making here is not that pornography does not potentially contribute, directly or indirectly, to violence against women (it very well might), but that definitively establishing that it does requires putting a time dimension into the analysis. One possibility, for example, is that the consumption of pornography has impeded what could have been an even more dramatic decline in violence against women.
Categorical variables indicate typologies into which a case might fall. For example, ethnicity (e.g., White, Black, Hispanic, Asian) is a categorical variable. Categorical variables can be further distinguished as either ordinal or nominal. Categories are ordinal if they can be sequenced in a logical order. For example, a student's class year in school (freshman, sophomore, junior, or senior) is an ordinal categorical variable. Knowing that one student is a senior and another is a sophomore tells us not only that the two students are in different class years, but also that the senior has completed more classes than the sophomore.
Categorical variables are considered nominal if they cannot be sequenced in a logical order. A student’s major is an example of a nominal categorical variable. It categorizes the student and gives information about what a student is studying, but there is no reason why psychology, sociology, history, or biology majors should precede or follow one another. Ethnicity is another example of a nominal variable.
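To make the distinction concrete, here is a minimal sketch in Python (using pandas, with hypothetical student records) showing how an ordinal variable can be declared with an explicit ordering while a nominal variable is left unordered:

```python
import pandas as pd

# Hypothetical student records for illustration.
students = pd.DataFrame({
    "class_year": ["senior", "sophomore", "freshman", "junior"],
    "major": ["psychology", "sociology", "history", "biology"],
})

# Ordinal: class year has a logical sequence, so we declare an order.
year_type = pd.CategoricalDtype(
    categories=["freshman", "sophomore", "junior", "senior"], ordered=True
)
students["class_year"] = students["class_year"].astype(year_type)

# Nominal: majors have no inherent order, so no ordering is declared.
students["major"] = students["major"].astype("category")

# Ordered categories support comparisons; unordered ones do not.
print(students["class_year"].min())            # freshman
print(students["class_year"] > "sophomore")    # True for juniors and seniors
```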
Analyzing Bivariate Relationships Between Two Scale Variables
Therefore, when analyzing scale variables, keep them in their original format and create categories only when there is clear theoretical or analytic justification for doing so. In this section, we consider how to assess the relationships between two scale variables, that is, situations in which both variables are numeric.
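As a minimal illustration (using simulated data, not the text's data set), Pearson's r summarizes the direction and strength of a linear association between two scale variables, echoing the education and income example above:

```python
import numpy as np
from scipy import stats

# Simulated scale variables: years of education and income (in $1,000s).
rng = np.random.default_rng(42)
education = rng.normal(14, 3, 200)
income = 5 + 3.2 * education + rng.normal(0, 12, 200)

# Pearson's r: sign gives the direction, magnitude the strength.
r, p_value = stats.pearsonr(education, income)
print(f"r = {r:.3f}, p = {p_value:.4f}")  # positive r: higher education, higher income
```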
The Advantages of Modeling Relationships in Multiple Regression
Multiple Linear Regression
Normality of Residuals
Building Multiple Variable Models
How does a researcher decide which independent variables to include in a model? One seemingly efficient way is to load all of the variables in the data set into the model and see which produce significant findings. Although this approach seems appealing (albeit lazy), it suffers from a number of problems.
Collinearity occurs when two or more independent variables contain strongly redundant information. If variables are collinear, there is not enough distinct information in these variables for the multiple regression to operate correctly. A multiple regression with two or more independent variables that measure essentially the same thing will produce errant results. An example is the poverty rate (PVS519) and the percent of children living in poverty (PVS521). These are two different variables, but they are so strongly collinear (correlation of .983) that they are nearly indistinguishable in the regression equation.
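The sketch below (simulated data; the variable names PVS519 and PVS521 are borrowed from the example above) shows two standard ways to detect this redundancy before fitting a model: the correlation matrix and variance inflation factors (VIF):

```python
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Simulated data: PVS521 is generated to be nearly redundant with PVS519,
# mimicking the near-perfect correlation described above.
rng = np.random.default_rng(0)
poverty = rng.uniform(5, 30, 100)
X = pd.DataFrame({
    "PVS519": poverty,
    "PVS521": poverty * 1.3 + rng.normal(0, 1.0, 100),
})

print(X.corr())  # a near-1 correlation flags redundant information

# Variance inflation factors: values above roughly 10 are commonly
# taken to signal problematic collinearity.
X_const = X.assign(const=1.0)
for i, col in enumerate(X_const.columns[:-1]):
    print(col, variance_inflation_factor(X_const.values, i))
```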
Linear regressions can be greatly influenced by outliers—atypical cases. Outliers can pull the equation away from the general pattern, and unduly sway the regression output. But what is the appropriate way to deal with an outlier?
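One way to see the problem is to fit the same regression with and without a single atypical case. A minimal sketch using simulated data and statsmodels:

```python
import numpy as np
import statsmodels.api as sm

# Simulated illustration: one extreme case can visibly shift the fitted slope.
rng = np.random.default_rng(7)
x = rng.uniform(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)

def fitted_slope(x, y):
    """Slope from an ordinary least squares fit of y on x."""
    return sm.OLS(y, sm.add_constant(x)).fit().params[1]

print(f"slope without outlier: {fitted_slope(x, y):.3f}")

# Append a single case far from the general pattern and refit.
x_out = np.append(x, 10)
y_out = np.append(y, 40)
print(f"slope with outlier:    {fitted_slope(x_out, y_out):.3f}")
```

In practice, influence statistics such as Cook's distance can help identify which cases deserve this kind of closer look before deciding how to handle them.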