7.4 Bivariate analysis of ordinal data
Recall that by definition, ordinal measurement allows units to be placed into ranked categories in terms of the degree that they possess an attribute. However, the exact distance between units cannot be determined.
Examples of ordinal measurement scales:
(1) strongly agree (2) agree (3) disagree (4) strongly disagree
(1) not important (2) a little important (3) important (4) very important
(1) less than $10,000 (2) $10,001-$20,000 (3) $20,001-$40,000 (4) $40,000+
Like nominal level variables, the joint distribution of two variables measured at the ordinal level is displayed through the use of a contingency table.
Ordinal measurement provides additional information beyond nominal measurement in that it allows units to be ranked in terms of the degree they possess the attribute being measured. With nominal measurement, a unit either has the attribute, or does not have it.
This additional information allows for a more precise hypothesis in that a crude assumption about the direction of the relationship between the two indicators can now be made.
H0: A person's level of political conservatism is not associated with his/her attitude toward government welfare spending.
H1: The more conservative a person's political beliefs, the more he/she will believe that government spending on welfare is excessive.
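One ordinal measure of association that can test a directional hypothesis like H1 is Gamma (discussed later in this chapter), which compares concordant and discordant pairs of units. A minimal sketch, using hypothetical 1-4 rank scores for conservatism and for agreement that welfare spending is excessive:

```python
# Goodman-Kruskal Gamma from concordant and discordant pairs.
# The rank scores below are hypothetical, for illustration only.
conserv = [1, 2, 2, 3, 4, 4, 1, 3]  # 1 = least conservative, 4 = most
welfare = [1, 1, 2, 3, 4, 3, 2, 4]  # 1 = spending not excessive, 4 = very excessive

concordant = discordant = 0
n = len(conserv)
for i in range(n):
    for j in range(i + 1, n):
        dx = conserv[i] - conserv[j]
        dy = welfare[i] - welfare[j]
        if dx * dy > 0:
            concordant += 1   # pair ranked the same way on both variables
        elif dx * dy < 0:
            discordant += 1   # pair ranked in opposite directions
        # pairs tied on either variable are ignored by Gamma

gamma = (concordant - discordant) / (concordant + discordant)
print(round(gamma, 3))  # positive Gamma supports the direction stated in H1
```

A Gamma near +1 would indicate that more conservative respondents consistently rank higher on viewing welfare spending as excessive.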
7.5 Bivariate analysis of interval or ratio level data
With variables, or indicators, measured at the interval or ratio level, one can not only rank units but also calculate the exact difference between them in regard to the degree to which they possess the attribute being measured.
For example:

Interval Indicator: IQ Score
Person 1      120
Person 2      160
Difference     40

Ratio Indicator: Number of Crimes Committed in 1999
State 1      1200
State 2       800
Difference    400
Bivariate analysis using interval or ratio level measures enters the realm of parametric statistics.
With this level of data, bivariate analysis allows the researcher to address all the questions that could be addressed with ordinal data:
1. Are the two variables associated in the broader study population?
2. How strongly are the two variables correlated?
3. What is the direction of the relationship?
Being able to measure the exact difference between units allows the researcher to answer a number of additional questions about the nature of the relationship between two variables or indicators.
With parametric statistics it becomes possible to get a more precise estimate of the effect that one variable has on another variable. First, we can estimate the amount of change in the dependent variable that occurs with a 1 unit change in the independent variable.
For example, let's say that we wanted to examine the relationship between time spent studying and exam scores, where time spent studying is measured in hours and exam scores on a 0-100 scale.
Using a parametric statistical technique known as bivariate regression analysis, we could estimate the number of additional points on the exam that a student would gain from 1 additional hour of studying.
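A minimal sketch of this estimate, using the least-squares slope formula and hypothetical study-time and exam-score data:

```python
# Bivariate regression slope and intercept by ordinary least squares:
#   b = cov(X, Y) / var(X),  a = mean(Y) - b * mean(X)
# All data values below are hypothetical, invented for illustration.
hours  = [1, 2, 3, 4, 5, 6, 7, 8]           # hours spent studying
scores = [55, 60, 62, 68, 70, 75, 78, 84]   # exam scores (0-100)

n = len(hours)
mean_x = sum(hours) / n
mean_y = sum(scores) / n

# Slope b: the estimated gain in exam score per 1 additional hour of studying
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(hours, scores)) / \
    sum((x - mean_x) ** 2 for x in hours)
a = mean_y - b * mean_x   # intercept: predicted score at 0 hours of studying

print(f"predicted score = {a:.2f} + {b:.2f} * hours")
```

Here b is the quantity of interest: with these made-up numbers, each additional hour of studying is associated with roughly 4 more points on the exam.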
A second question that can be addressed with interval and ratio level data is determining what is known as the "functional form" of the relationship between two variables.
For example, is the relationship between time spent studying and exam score a linear relationship?
Linear Relationship -- When the amount of change in the dependent variable associated with a 1 unit change in the independent variable is constant across all values of the independent variable.
If the relationship between time spent studying and exam score is linear, then the increase in the exam score that one would get by moving from 4 to 5 hours studying would be the same as what one would get by moving from 9 to 10 hours.
With a linear relationship, the gain in exam score would be constant as number of hours studying increased.
It is possible that the relationship between time spent studying and exam score is a “nonlinear” relationship as well.
Nonlinear Relationship -- When the amount of change in the dependent variable associated with a 1 unit change in the independent variable is not constant across all values of the independent variable.
If there were a nonlinear relationship, the increase in exam score that one would get by moving from 4 to 5 hours of studying would not be the same as one would get by moving from 9 to 10 hours of studying.
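The contrast between the two functional forms can be sketched directly. Both models below are hypothetical, chosen only to illustrate a constant versus a shrinking per-hour gain:

```python
# Compare the per-hour gain in exam score under a linear model and a
# nonlinear (diminishing-returns) model. Both formulas are hypothetical.

def linear(hours):
    return 50 + 4 * hours           # constant gain: +4 points per hour

def nonlinear(hours):
    return 50 + 20 * hours ** 0.5   # gain shrinks as hours increase

for model in (linear, nonlinear):
    gain_4_to_5  = model(5) - model(4)    # moving from 4 to 5 hours
    gain_9_to_10 = model(10) - model(9)   # moving from 9 to 10 hours
    print(model.__name__, round(gain_4_to_5, 2), round(gain_9_to_10, 2))
```

Under the linear model the two gains are identical; under the nonlinear model the gain from the 10th hour is smaller than the gain from the 5th hour.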
These additional questions about the relationships between two interval or ratio level variables cannot be addressed with nominal or ordinal data. This illustrates why parametric statistics are more powerful than nonparametric statistics such as chi-square, Cramér's V, and Gamma with its associated Z test.
Parametric statistics provide more information and allow additional research questions to be addressed.
The joint distribution between 2 variables measured at the interval or ratio level is not portrayed through a contingency table, but rather what is known as a scatterplot.
With a scatterplot, each unit is plotted on the X and Y axes based on its combination of scores on the independent and dependent variables.
In generating a scatterplot to examine the joint distribution of two interval or ratio variables, we are still interested in testing hypotheses to see if we can infer a relationship among the population of units.
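The strength of the linear pattern that a scatterplot displays is commonly summarized with Pearson's correlation coefficient r. A minimal sketch, reusing hypothetical study-time and exam-score data:

```python
# Pearson's r = cov(X, Y) / (sd(X) * sd(Y)), computed from deviations.
# Data values are hypothetical (hours studied vs. exam score).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [55, 60, 62, 68, 70, 75, 78, 84]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Numerator: sum of cross-products of deviations; denominator: product
# of the square roots of the sums of squared deviations.
num = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
den = (sum((xi - mx) ** 2 for xi in x) * sum((yi - my) ** 2 for yi in y)) ** 0.5
r = num / den
print(round(r, 3))
```

An r near +1 corresponds to a scatterplot whose points fall close to an upward-sloping line, which is the visual pattern a hypothesis test on this relationship would then evaluate for the broader population.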
7.6 Multivariate Analysis
As previously stated, multivariate analysis examines the statistical relationships between a dependent variable and 2 or more independent variables at the same time.
We have also discussed how in social science research, rarely, if ever, does a single independent variable provide a complete causal explanation for a dependent variable.
Usually, a dependent variable (e.g. poverty) can be viewed as being caused by a number of independent variables.
For example, let’s say that we want to explain why some states have a higher % of people living in poverty compared to other states. Stated another way, we wish to identify independent variables that account for the pattern of variation in poverty rates across the 50 states.
It seems logical that the variation in poverty among states is the result of more than 1 independent variable.
Poverty can be viewed as being caused by: (a) wage levels in the state economy; (b) levels of unemployment in the state economy; and (c) the breakup of two parent families.
Because multiple independent variables are needed to explain a pattern of behavior, some type of multivariate analysis is virtually always used in research practice. The most common type of multivariate statistical analysis used in social science research is known as regression analysis.
Regression Analysis -- A statistical analysis that allows researchers to test for statistical relationships between a dependent variable and set of independent variables, and estimate the independent effects of each independent variable on the dependent variable.
In regression analysis, the independent causal effect of each independent variable can be sorted out and estimated by solving the following equation:
Y = a + b1X1 + b2X2 + b3X3 + ... + bkXk

Where:
a = the intercept
b = the slope, or unstandardized b coefficient
X = the observed value of the indicator measuring an independent variable
Thus, in solving this regression equation and computing a regression analysis, the researcher calculates a “b” coefficient for each independent variable.
b is also known as an “unstandardized b” coefficient
Simply put, the unstandardized b coefficient represents the effect of an independent variable on the dependent variable, net of the effects of the other independent variables.
It is interpreted as the average change in the dependent variable associated with a 1 unit change in a particular independent variable, statistically controlling for the effects of the other independent variables.
Going back to our example, we have an indicator of a dependent variable (Y) -- the poverty rate for the 50 states;
And, we have indicators for 3 independent variables:
X1. per capita income
X2. the unemployment rate
X3. % female-headed households
If we computed a regression model with these indicators, the unstandardized b coefficients represent the unique effects of each of the independent variables on the poverty rate.
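A minimal sketch of computing such a model by ordinary least squares. All state-level numbers below are hypothetical, invented purely for illustration:

```python
# Estimate a, b1, b2, b3 in Y = a + b1*X1 + b2*X2 + b3*X3 by least squares
# on a tiny hypothetical dataset of six "states".
import numpy as np

income  = np.array([40, 52, 45, 60, 48, 65])   # X1: per capita income ($000s)
unemp   = np.array([ 9,  6,  8,  4,  7,  3])   # X2: unemployment rate (%)
fhh     = np.array([14, 10, 12,  8, 11,  7])   # X3: % female-headed households
poverty = np.array([30.7, 25.6, 28.6, 22.4, 27.2, 20.6])  # Y: poverty rate (%)

# Design matrix with a leading column of 1s for the intercept a
X = np.column_stack([np.ones(len(income)), income, unemp, fhh])

# Solve for the coefficient vector (a, b1, b2, b3)
coefs, *_ = np.linalg.lstsq(X, poverty, rcond=None)
a, b1, b2, b3 = coefs
print(f"a={a:.2f}, b1={b1:.2f}, b2={b2:.2f}, b3={b3:.2f}")
```

Each unstandardized b here is read as the average change in the poverty rate for a 1-unit change in that indicator, holding the other two indicators constant.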
Thus, if a relationship between a dependent variable and an independent variable is nonspurious, the unstandardized b coefficient should still indicate a relationship after the effects of the other independent variables have been accounted for, or "partialed out."
This is known as “statistical control.” This is one benefit of regression analysis in that it allows you to test for spurious relationships by statistically controlling for the effects of the other independent variables.
Multiple R-Square -- statistic in regression analysis that measures the combined effects of all independent variables in a regression model. The closer r-square is to 1, the better the causal explanation provided by a set of independent variables.
t-test for Regression Coefficient — tests the null hypothesis that an independent variable has no effect on the dependent variable in the larger study population
Unstandardized (b) Regression Coefficient — Indicates the average change in the dependent variable associated with a 1 unit change in an independent variable, statistically controlling for the other independent variables.
Standardized Beta Coefficient — Is used to compare the strength of the effect of each independent variable on the dependent variable. The independent variable with the largest standardized Beta (independent of the sign) has the strongest effect.
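The conversion from an unstandardized b to a standardized Beta uses the ratio of the standard deviations of X and Y: Beta = b * (sd of X / sd of Y). A minimal sketch, with hypothetical values carried over from the poverty example:

```python
# Standardize an unstandardized b coefficient: Beta = b * (sd(X) / sd(Y)).
# The data and the b value below are hypothetical, for illustration only.
import statistics

income  = [40, 52, 45, 60, 48, 65]               # X1: per capita income ($000s)
poverty = [30.7, 25.6, 28.6, 22.4, 27.2, 20.6]   # Y: poverty rate (%)

b_income = -0.2  # hypothetical unstandardized b for per capita income

beta = b_income * statistics.stdev(income) / statistics.stdev(poverty)
print(round(beta, 3))
```

Because Beta is expressed in standard-deviation units rather than the raw units of each indicator, the Betas for income, unemployment, and female-headed households could be compared directly to see which has the strongest effect.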
Author: Department of Sociology