Tuesday, March 4, 2014

Correlation and Regression with SPSS

The purpose of this paper is to state the assumptions for the Pearson correlation coefficient and simple linear regression, develop null and alternative hypotheses, determine whether to reject or retain the null hypothesis, report on the SPSS analysis, and generate a scatterplot along with syntax and output files in SPSS.

Statistical Assumptions
The two statistical assumptions of the Pearson correlation are (1) that the variables are bivariately normally distributed and (2) that the cases represent a random sample from the population, with scores on the variables for one case independent of scores on these variables for other cases (Green & Salkind, 2014).

Brief Analysis

The research question is: Do age and the number of hours worked last week relate in a statistically significant linear fashion?

The null hypothesis is: H0: ρ = 0; there is no correlation between the variables in the population.

The alternative hypothesis is: H1: ρ ≠ 0; there is a nonzero correlation between the variables in the population.

The independent variable is age and the dependent variable is the number of hours worked last week. A Pearson correlation coefficient was computed between these two continuous variables. To control for Type I error, I applied the Bonferroni approach, dividing .05 by 2 so that a p value of less than .025 (.05/2 = .025) was required for significance. Table 1 shows that the correlation was statistically significant at the .01 level, r(1483) = −.32, p < .001. There is a significant negative relationship between the age of participants and the number of hours worked last week, so I reject the null hypothesis. The effect size is r² = .105, meaning that age accounts for about 10.5% of the variance in hours worked last week.
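The test of H0: ρ = 0 reported above can be checked by hand with t = r√(N − 2) / √(1 − r²). A minimal sketch, assuming r = −.325 and N = 1485 as printed in Table 1; the p value uses a normal approximation to the t distribution (an assumption that is reasonable only because df = 1483 is large), so it is a check on the decision rather than a replacement for the SPSS value.

```python
import math

# Values as printed in Table 1 (pairwise N for AGE with HRS1).
r = -0.325
n = 1485

df = n - 2                                     # degrees of freedom for testing rho = 0
t = r * math.sqrt(df) / math.sqrt(1 - r**2)    # t statistic for H0: rho = 0

# Two-tailed p value from a normal approximation (assumption: large df).
p_two_tailed = math.erfc(abs(t) / math.sqrt(2))

alpha_bonferroni = 0.05 / 2                    # Bonferroni-adjusted alpha used in the text
r_squared = r**2                               # proportion of shared variance

print(f"df = {df}, t = {t:.2f}, p < .001: {p_two_tailed < 0.001}")
print(f"reject H0: {p_two_tailed < alpha_bonferroni}, r^2 = {r_squared:.3f}")
```

The resulting t of about −13.23 is close to the t of −13.221 that SPSS reports for the slope in the Coefficients table; the small gap comes from using r rounded to three decimals.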

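A simple linear regression analysis was conducted to evaluate how well age predicts the number of hours worked last week. The scatterplot for the two variables, shown in Figure 1, indicates that the two variables are linearly related such that as age increases, the number of hours worked last week decreases. The regression equation is predicted hours = 48.162 − .458(age), and age significantly predicted hours worked, F(1, 1483) = 174.80, p < .001, R² = .105.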
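With a single predictor, the least-squares slope and intercept follow directly from the correlation and the descriptive statistics: b = r(s_y / s_x) and a = ȳ − b·x̄. A minimal sketch using the means and standard deviations from the Regression Descriptive Statistics table (listwise N = 1485) and r = −.325; small differences from the Coefficients table come from rounding in these inputs.

```python
# Listwise descriptive statistics from the Regression output.
mean_age, sd_age = 46.22, 16.697
mean_hrs, sd_hrs = 26.97, 23.572
r = -0.325  # Pearson correlation between AGE and HRS1

# Least-squares slope and intercept for predicting HRS1 from AGE.
slope = r * sd_hrs / sd_age
intercept = mean_hrs - slope * mean_age

print(f"predicted HRS1 = {intercept:.3f} + ({slope:.3f}) * AGE")
```

With these rounded inputs the sketch gives a slope of about −.459 and an intercept of about 48.18, compared with B = −.458 and a constant of 48.162 in the SPSS output.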

Syntax and Output Files
Notes
Output Created: 01-FEB-2014 09:14:02
Comments:
Input
Data: C:\Users\Deborah\Desktop\Stats\gss04student_corrrected.sav
Active Dataset: DataSet1
Filter:
Weight:
Split File:
N of Rows in Working Data File: 1500
Missing Value Handling
Definition of Missing: User-defined missing values are treated as missing.
Cases Used: Statistics are based on all cases with valid data for all variables in the model.
Syntax:
UNIANOVA INCOME BY RACE
/METHOD=SSTYPE(3)
/INTERCEPT=INCLUDE
/POSTHOC=RACE(TUKEY QREGW C)
/EMMEANS=TABLES(RACE)
/PRINT=ETASQ HOMOGENEITY DESCRIPTIVE
/CRITERIA=ALPHA(.05)
/DESIGN=RACE.
Resources
Processor Time: 00:00:00.08
Elapsed Time: 00:00:00.08

Correlations

CORRELATIONS
/VARIABLES=AGE HRS1
/PRINT=TWOTAIL NOSIG
/STATISTICS DESCRIPTIVES
/MISSING=PAIRWISE.
Descriptive Statistics
                                    Mean     Std. Deviation    N
AGE OF RESPONDENT                   46.22    16.679            1495
NUMBER OF HOURS WORKED LAST WEEK    26.94    23.570            1490

Table 1.
Correlations                          AGE OF RESPONDENT    NUMBER OF HOURS WORKED LAST WEEK
AGE OF RESPONDENT
  Pearson Correlation                 1                    -.325**
  Sig. (2-tailed)                                          .000
  N                                   1495                 1485
NUMBER OF HOURS WORKED LAST WEEK
  Pearson Correlation                 -.325**              1
  Sig. (2-tailed)                     .000
  N                                   1485                 1490
**. Correlation is significant at the 0.01 level (2-tailed).
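Table 1 reports N = 1495 for age and N = 1490 for hours, but N = 1485 for the correlation, because /MISSING=PAIRWISE uses only cases with valid values on both variables. A minimal sketch of a Pearson r computed that way; the function name pearson_pairwise and the six-case dataset are hypothetical (not from the GSS file), with None standing in for a missing value.

```python
import math

def pearson_pairwise(x, y):
    """Pearson r using only cases where both values are present (None = missing)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)  # cross-products
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)            # squared deviations of x
    syy = sum((b - mean_y) ** 2 for _, b in pairs)            # squared deviations of y
    return sxy / math.sqrt(sxx * syy), n

# Hypothetical toy data: one case is missing hours, another is missing age.
age = [25, 35, 45, 55, 65, None]
hours = [45, 40, None, 30, 20, 10]

r, n = pearson_pairwise(age, hours)
print(f"r = {r:.3f} based on n = {n} complete pairs")
```

Only four of the six toy cases enter the calculation, which mirrors why the correlation in Table 1 rests on fewer cases than either variable alone.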

GRAPH
/SCATTERPLOT(MATRIX)=AGE HRS1
/MISSING=LISTWISE.

Graph
[DataSet1] C:\Users\Deborah\Desktop\Stats\gss04student_corrrected.sav
GET
FILE='C:\Users\Deborah\Desktop\Stats\gss04student_corrrected.sav'.
DATASET NAME DataSet1 WINDOW=FRONT.
REGRESSION
/DESCRIPTIVES MEAN STDDEV CORR SIG N
/MISSING LISTWISE
/STATISTICS COEFF OUTS CI(95) R ANOVA
/CRITERIA=PIN(.05) POUT(.10)
/NOORIGIN
/DEPENDENT HRS1
/METHOD=ENTER AGE.

Regression
Descriptive Statistics
                                    Mean     Std. Deviation    N
NUMBER OF HOURS WORKED LAST WEEK    26.97    23.572            1485
AGE OF RESPONDENT                   46.22    16.697            1485

Correlations                          NUMBER OF HOURS WORKED LAST WEEK    AGE OF RESPONDENT
Pearson Correlation
  NUMBER OF HOURS WORKED LAST WEEK    1.000                               -.325
  AGE OF RESPONDENT                   -.325                               1.000
Sig. (1-tailed)
  NUMBER OF HOURS WORKED LAST WEEK    .                                   .000
  AGE OF RESPONDENT                   .000                                .
N
  NUMBER OF HOURS WORKED LAST WEEK    1485                                1485
  AGE OF RESPONDENT                   1485                                1485

Variables Entered/Removed(a)
Model    Variables Entered        Variables Removed    Method
1        AGE OF RESPONDENT(b)     .                    Enter
a. Dependent Variable: NUMBER OF HOURS WORKED LAST WEEK
b. All requested variables entered.

Model Summary
Model    R          R Square    Adjusted R Square    Std. Error of the Estimate
1        .325(a)    .105        .105                 22.302
a. Predictors: (Constant), AGE OF RESPONDENT

ANOVA(a)
Model              Sum of Squares    df      Mean Square    F          Sig.
1  Regression      86941.814         1       86941.814      174.798    .000(b)
   Residual        737619.214        1483    497.383
   Total           824561.028        1484
a. Dependent Variable: NUMBER OF HOURS WORKED LAST WEEK
b. Predictors: (Constant), AGE OF RESPONDENT
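The Model Summary and ANOVA tables are linked by simple arithmetic, which gives a quick check that the output was read correctly: F is the ratio of the mean squares, R Square is the regression share of the total sum of squares, adjusted R Square corrects for the number of predictors, and the standard error of the estimate is the square root of the residual mean square. A minimal sketch using the sums of squares and degrees of freedom reported above.

```python
import math

# Sums of squares and degrees of freedom from the ANOVA table.
ss_regression, df_regression = 86941.814, 1
ss_residual, df_residual = 737619.214, 1483
n = 1485  # listwise N
k = 1     # number of predictors (AGE)

ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual
r_squared = ss_regression / (ss_regression + ss_residual)
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
se_estimate = math.sqrt(ms_residual)

print(f"F({df_regression}, {df_residual}) = {f_stat:.2f}")
print(f"R^2 = {r_squared:.3f}, adjusted R^2 = {adj_r_squared:.3f}, SE = {se_estimate:.3f}")
```

Each recomputed value matches the SPSS output to the reported precision (F = 174.80, R Square = .105, adjusted R Square = .105, standard error = 22.302).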

Coefficients(a)
                         Unstandardized Coefficients    Standardized Coefficients                      95.0% Confidence Interval for B
Model                    B          Std. Error          Beta                         t          Sig.    Lower Bound    Upper Bound
1  (Constant)            48.162     1.704                                            28.267     .000    44.820         51.504
   AGE OF RESPONDENT     -.458      .035                -.325                        -13.221    .000    -.526          -.390
a. Dependent Variable: NUMBER OF HOURS WORKED LAST WEEK
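The unstandardized coefficients can be used to build the 95% confidence interval for the slope and to predict hours worked at a given age. A minimal sketch, assuming the rounded values B = −.458, SE = .035, and constant = 48.162 from the table above; the critical value 1.96 is a normal approximation to the t critical value (an assumption that holds closely because df = 1483), and predict_hours is a hypothetical helper name.

```python
# Unstandardized coefficients from the Coefficients table.
b0 = 48.162    # constant
b1 = -0.458    # slope for AGE
se_b1 = 0.035  # standard error of the slope

# 95% CI for the slope; 1.96 approximates the t critical value for large df.
z_crit = 1.96
ci_lower = b1 - z_crit * se_b1
ci_upper = b1 + z_crit * se_b1

def predict_hours(age):
    """Predicted hours worked last week for a given age (hypothetical helper)."""
    return b0 + b1 * age

print(f"95% CI for slope: [{ci_lower:.3f}, {ci_upper:.3f}]")
print(f"predicted hours at age 30: {predict_hours(30):.1f}")
print(f"predicted hours at age 60: {predict_hours(60):.1f}")
```

Each additional year of age is associated with roughly 0.46 fewer hours worked last week. Because the standard error is rounded to three decimals here, the interval can differ from the SPSS bounds (−.526, −.390) in the third decimal.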

Charts (Figure 1. Scatterplot of age of respondent and number of hours worked last week)