May 17, 2022

Quantitative Research, Correlation

Summary

In this article, we run a linear regression on the existing gss.sav data set to predict "a father's education level (dependent variable) when you know the mother's education level (independent variable). The variable names are paeduc and maeduc. Determine the linear regression equation for predicting the father's education level from the mother's education." (CTU, 2022) Using SPSS, we will see how much of the total variance the model explains, “what level of education we predict for the father when the mother has 16 years of education, and we create an output that shows a scatterplot with a line of best fit for your data.” (CTU, 2022)

Analysis and methodology

Our dependent variable is "paeduc", the father's education level, and our predictor (independent variable) is "maeduc", the mother's education level. Looking at the data set first, we find that the number of cases for our single predictor is well above 20, so the sample is large enough for regression analysis; as a rule of thumb, each independent variable needs more than 20 cases for an accurate regression model. In the Linear Regression dialog, we placed the dependent variable and the predictor in the appropriate boxes, checked the necessary options, and ran the analysis to produce the report. The full SPSS report is attached to the assignment.
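For readers without SPSS, the same kind of fit can be sketched in plain Python. This is only an illustration under stated assumptions: the function name fit_simple_regression and the toy values are hypothetical and are not taken from gss.sav, and the sketch assumes that cases with a missing value in either variable are dropped before fitting.

```python
from statistics import mean


def fit_simple_regression(x_values, y_values, min_cases=20):
    """Least-squares fit of y = a + b*x; returns (a, b, n_used)."""
    # Keep only cases where both values are present.
    pairs = [(x, y) for x, y in zip(x_values, y_values)
             if x is not None and y is not None]
    n = len(pairs)
    if n <= min_cases:
        # Rule of thumb from the text: more than 20 cases per predictor.
        print(f"warning: only {n} complete cases (want more than {min_cases})")

    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)

    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("the predictor has no variation")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope, n


# Hypothetical toy values, NOT taken from gss.sav (None marks a missing answer).
maeduc_demo = [8, 10, 12, 12, 14, 16, None]
paeduc_demo = [8, 9, 12, 11, 13, 16, 12]

a, b, n_used = fit_simple_regression(maeduc_demo, paeduc_demo)
print(f"paeduc = {a:.2f} + {b:.2f} * maeduc  (n = {n_used})")
```

The toy data is far below the 20-case rule of thumb, so the sketch prints a warning; with the real data set the same function would be called on the paeduc and maeduc columns.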

Results, reports, charts, and assumptions

We start with the Descriptive Statistics table, which shows that the average education levels of the two parents are very close (11.25 and 11.33 years) and that the standard deviations are fairly close too (4.163 and 3.537). Next, we check the regression assumptions.

 

Assumption One: in the Correlations table, the correlation between the mother's and the father's education is 0.637. This is close to the limit (it must be below 0.7), but it is still in the acceptable range.

Assumption Two: we need the correlation with the outcome variable to be above 0.33, which holds in our case. For the Father it is 1.000 (the outcome variable correlated with itself), and for the Mother it is .637.
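The two correlation checks above can be written down as a small helper. This is a minimal sketch: the function name check_correlation is hypothetical, and the thresholds 0.33 and 0.7 are simply the ones quoted in this article, applied to the reported r = 0.637.

```python
def check_correlation(r, min_with_outcome=0.33, max_allowed=0.7):
    """Apply the two thresholds used in this article to a Pearson r."""
    strength = abs(r)
    return {
        "above_minimum": strength > min_with_outcome,  # Assumption Two
        "below_limit": strength < max_allowed,         # Assumption One
    }


# r between maeduc and paeduc, as reported in the Correlations table.
result = check_correlation(0.637)
print(result)
```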

Descriptive Statistics

| Variable | Mean | Std. Deviation | N |
|---|---|---|---|
| HIGHEST YEAR SCHOOL COMPLETED, FATHER | 11.25 | 4.163 | 977 |
| HIGHEST YEAR SCHOOL COMPLETED, MOTHER | 11.33 | 3.537 | 1193 |

 

Correlations

| Statistic | Variable | HIGHEST YEAR SCHOOL COMPLETED, FATHER | HIGHEST YEAR SCHOOL COMPLETED, MOTHER |
|---|---|---|---|
| Pearson Correlation | HIGHEST YEAR SCHOOL COMPLETED, FATHER | 1.000 | .637 |
| Pearson Correlation | HIGHEST YEAR SCHOOL COMPLETED, MOTHER | .637 | 1.000 |
| Sig. (1-tailed) | HIGHEST YEAR SCHOOL COMPLETED, FATHER | . | .000 |
| Sig. (1-tailed) | HIGHEST YEAR SCHOOL COMPLETED, MOTHER | .000 | . |
| N | HIGHEST YEAR SCHOOL COMPLETED, FATHER | 977 | 907 |
| N | HIGHEST YEAR SCHOOL COMPLETED, MOTHER | 907 | 1193 |

Assumption Three is the Probability-Probability Plot (P-P Plot). In this plot the points lie relatively close to the line, which indicates only a small deviation.

Assumption Four is the Scatterplot: we need the points to fall within the range of -3 to +3, with no point outside it. Our scatterplot does not look good in this respect, which we attribute to the large sample size.

The following plot shows the scatterplot with its line of best fit; the Father's values lie closer to the median, with more outliers.

Assumption Five uses the Coefficients table, where we check Tolerance and the Variance Inflation Factor (VIF). Tolerance shows how much of the variability of a given independent variable is not explained by the other independent variables in the model; it should be above 0.10. In our model it is 1.000, so we are in good shape. We also need VIF to be less than 10, and in our model it is 1.000. (Dr. Todd Grande, 2014)

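Coefficients

| Model | Term | Unstd. Coefficients: B | Unstd. Coefficients: Std. Error | Std Coeff.: Beta | t | Sig. | Correlations: Zero-order | Correlations: Partial | Correlations: Part | Collinearity: Tolerance | Collinearity: VIF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | (Constant) | 2.760 | .358 | | 7.711 | .000 | | | | | |
| 1 | HIGHEST COMPLETED, MOTHER | .749 | .030 | .637 | 24.844 | .000 | .637 | .637 | .637 | 1.000 | 1.000 |

a. Dependent Variable: HIGHEST YEAR SCHOOL COMPLETED, FATHER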
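The Tolerance and VIF columns are two views of the same quantity: VIF is the reciprocal of Tolerance, and Tolerance is one minus the R² obtained when a predictor is regressed on the other predictors. The sketch below illustrates that relationship; the function names are hypothetical, and the cut-offs 0.10 and 10 are the ones used in this article. With only one predictor there are no other predictors to explain it, so that R² is 0 and both values come out as 1.000, matching the table.

```python
def tolerance_and_vif(r_squared_on_other_predictors):
    """Tolerance = 1 - R^2 of this predictor on the others; VIF = 1 / Tolerance."""
    tolerance = 1.0 - r_squared_on_other_predictors
    if tolerance <= 0.0:
        raise ValueError("predictor is fully explained by the other predictors")
    return tolerance, 1.0 / tolerance


def collinearity_ok(tolerance, vif, min_tolerance=0.10, max_vif=10.0):
    """Cut-offs quoted in this article: Tolerance above 0.10, VIF below 10."""
    return tolerance > min_tolerance and vif < max_vif


# Single-predictor model: nothing else can explain maeduc, so R^2 = 0.
tol, vif = tolerance_and_vif(0.0)
print(f"Tolerance = {tol:.3f}, VIF = {vif:.3f}, ok = {collinearity_ok(tol, vif)}")
```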

 

The Linear Regression Equation

In general, the equation for linear regression is Y = a + bX, where Y is the dependent variable, X is the independent variable, b is the slope of the line, and a is the y-intercept. We need to analyze the Coefficients table (Table 4) to find our equation's values. The standardized coefficient (Beta) is 0.637, which with a single predictor equals the correlation between our two variables. The unstandardized coefficients give the constant a = 2.760 years and the slope b = 0.749, so our regression equation is Y = 2.760 + 0.749X + e, in which we take e = ±1.

For X = 16 years of mother's education, Y = 2.760 + 0.749(16) + e = 14.744 + e, which gives a range of 13.744 to 15.744 years for the father's education level. Here Y is the father's level of education and X is the mother's, so the equation lets us predict paeduc from maeduc.
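The prediction can be reproduced from the unstandardized coefficients reported in the Coefficients table (constant 2.760, B = .749). The sketch below is illustrative only; the names are hypothetical, and the ±1 band is the error term assumed in this article, not a computed confidence interval. It also cross-checks the coefficients against the summary statistics, using the standard single-predictor identities b = r·(s_y / s_x) and a = mean_y − b·mean_x. That cross-check is approximate, because the reported means and standard deviations are based on different N than the 907 complete pairs.

```python
# Unstandardized coefficients as reported in the Coefficients table.
INTERCEPT = 2.760
SLOPE = 0.749


def predict_paeduc(maeduc_years):
    """Predicted father's education (years) for a given mother's education."""
    return INTERCEPT + SLOPE * maeduc_years


predicted = predict_paeduc(16)
# +/- 1 year is the error term assumed in the article.
low, high = predicted - 1.0, predicted + 1.0
print(f"predicted = {predicted:.3f}, range = ({low:.3f}, {high:.3f})")

# Rough cross-check from the Descriptive Statistics and Correlations tables.
R = 0.637
MEAN_FATHER, SD_FATHER = 11.25, 4.163
MEAN_MOTHER, SD_MOTHER = 11.33, 3.537

slope_check = R * SD_FATHER / SD_MOTHER
intercept_check = MEAN_FATHER - slope_check * MEAN_MOTHER
print(f"slope check = {slope_check:.3f}, intercept check = {intercept_check:.3f}")
```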

So the equation explains about 41% of the variance in the father's education level (R² = 0.637² ≈ 0.406), whereas about 59% remains unexplained.
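For a regression with a single predictor, R² equals the squared Pearson correlation, so the explained share can be checked directly from r = .637 in the Correlations table. A minimal sketch of that arithmetic:

```python
# Pearson correlation between maeduc and paeduc, from the Correlations table.
r = 0.637

r_squared = r ** 2                      # share of variance explained
explained_pct = 100 * r_squared
unexplained_pct = 100 - explained_pct   # the two shares must sum to 100%

print(f"explained = {explained_pct:.1f}%, unexplained = {unexplained_pct:.1f}%")
```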

 

References

Dr. Todd Grande. (2014, June 11). Linear Regression in SPSS. YouTube. Retrieved 2022, from https://www.youtube.com/watch?v=U2p16pCHW3c&t=10s

CTU. (2022). Colorado Technical University. Student’s restricted panel. Retrieved 2022, from Colorado Technical University restricted area of assignments.

 

 

 
