Partial Regression

library(Keng)
data(depress)

To help researchers understand the role of PRE (proportional reduction in error) in regression, this vignette presents several ways of examining the unique effect of problem-focused coping (cope_task1) on depression (depr1), controlling for emotion-focused coping (cope_emo1) and avoidance coping (cope_avo1), using the first-wave subset of the internal data set depress.

Four approaches are presented below:

  • Multiple regression with t-test.
  • Hierarchical regression with F-test.
  • The PRE of the single parameter, the partial regression coefficient of problem-focused coping.
  • One-predictor regression using the residuals.

Multiple regression with t-test

First, examine the unique effect of cope_task1 using a t-test. Model C (the compact model) regresses depr1 on cope_emo1 and cope_avo1. Model A (the augmented model) regresses depr1 on cope_task1, cope_emo1, and cope_avo1.

# multiple regression
fitC <- lm(depr1 ~ cope_emo1 + cope_avo1, depress)
fitA <- lm(depr1 ~ cope_task1 + cope_emo1 + cope_avo1, depress)
summary(fitA)
#> 
#> Call:
#> lm(formula = depr1 ~ cope_task1 + cope_emo1 + cope_avo1, data = depress)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -0.68456 -0.22996 -0.00466  0.22131  0.98298 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  2.21320    0.17230  12.845  < 2e-16 ***
#> cope_task1  -0.19433    0.03414  -5.692 5.40e-08 ***
#> cope_emo1    0.22592    0.03680   6.139 5.67e-09 ***
#> cope_avo1   -0.09672    0.04221  -2.291   0.0232 *  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.3134 on 170 degrees of freedom
#>   (11 observations deleted due to missingness)
#> Multiple R-squared:  0.3233, Adjusted R-squared:  0.3114 
#> F-statistic: 27.08 on 3 and 170 DF,  p-value: 2.293e-14

As shown, the partial regression coefficient of cope_task1 is -0.19433, t(170) = -5.692, p = 5.40e-08.

Hierarchical regression with F-test

Second, examine the unique effect of cope_task1 using hierarchical regression and its F-test. In SPSS, this F-test is reported as the F-test for the R-squared change.

anova(fitC, fitA)
#> Analysis of Variance Table
#> 
#> Model 1: depr1 ~ cope_emo1 + cope_avo1
#> Model 2: depr1 ~ cope_task1 + cope_emo1 + cope_avo1
#>   Res.Df    RSS Df Sum of Sq      F  Pr(>F)    
#> 1    171 19.883                                
#> 2    170 16.701  1    3.1827 32.397 5.4e-08 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As shown, F(1, 170) = 32.397, p = 5.4e-08. This F-test is equivalent to the t-test above, since both examine the unique effect of cope_task1. When the numerator df of F is 1, F = t^2, and the df of t equals the denominator df of F.
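The equivalence can be checked by hand. The following is a minimal sketch in base R that uses only the values printed in the summary(fitA) and anova(fitC, fitA) output above; the variable names are illustrative, and small discrepancies are expected because the printed values are rounded.

```r
# Values copied from the printed output above (rounded as printed)
t_val <- -5.692   # t value of cope_task1 in summary(fitA)
df2   <- 170      # residual df of Model A
F_val <- 32.397   # F from anova(fitC, fitA)

# Squaring t reproduces F up to rounding of the printed values
t_val^2

# The two-sided p-value of t and the upper-tail p-value of F = t^2 coincide
p_t <- 2 * pt(-abs(t_val), df = df2)
p_F <- pf(t_val^2, df1 = 1, df2 = df2, lower.tail = FALSE)
all.equal(p_t, p_F)
```

Both p-values should agree with the 5.4e-08 reported in the two tables, up to the rounding of t.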

The PRE of the single parameter

Third, examine the unique effect of cope_task1 using PRE.

print(compare_lm(fitC, fitA), digits = 3)
#>                      Baseline        C        A  A vs. C
#> SSE                      24.7 1.99e+01 1.67e+01 3.18e+00
#> n                       174.0 1.74e+02 1.74e+02 1.74e+02
#> Number of parameters      1.0 3.00e+00 4.00e+00 1.00e+00
#> df                      173.0 1.71e+02 1.70e+02 1.00e+00
#> R_squared                  NA 1.94e-01 3.23e-01 1.29e-01
#> f_squared                  NA 2.41e-01 4.78e-01 1.91e-01
#> R_squared_adj              NA 1.85e-01 3.11e-01       NA
#> PRE                        NA 1.94e-01 3.23e-01 1.60e-01
#> F(PA-PC,n-PA)              NA 2.06e+01 2.71e+01 3.24e+01
#> p                          NA 9.41e-09 2.29e-14 5.40e-08
#> PRE_adj                    NA 1.85e-01 3.11e-01 1.55e-01
#> power_post                 NA 1.00e+00 1.00e+00 1.00e+00

As shown in the A vs. C column, PRE = 0.160, F(1, 170) = 32.4, p = 5.40e-08. The F-test of PRE is equivalent to the F-test of anova() above.
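To make the link between the two tables explicit, here is a rough sketch that recomputes PRE and its F from the residual sums of squares printed by anova(fitC, fitA). It assumes the usual definition of PRE as the proportional reduction in SSE from Model C to Model A, and the F labelled F(PA-PC, n-PA) in the table; the variable names are illustrative.

```r
# Values copied from the printed anova() and compare_lm() output above
sse_C <- 19.883   # SSE (RSS) of Model C
sse_A <- 16.701   # SSE (RSS) of Model A
n     <- 174      # number of complete cases
PC    <- 3        # parameters in Model C
PA    <- 4        # parameters in Model A

# PRE: proportion of Model C's error removed by adding cope_task1
PRE <- (sse_C - sse_A) / sse_C

# F-test of PRE with PA - PC and n - PA degrees of freedom
F_pre <- (PRE / (PA - PC)) / ((1 - PRE) / (n - PA))
p_pre <- pf(F_pre, df1 = PA - PC, df2 = n - PA, lower.tail = FALSE)

round(c(PRE = PRE, F = F_pre), 3)
p_pre
```

Because the SSEs are rounded as printed, the results should match the A vs. C column to about three significant digits (PRE near 0.160, F near 32.4).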

One-predictor regression using the residuals

Fourth, examine the unique effect of cope_task1 using residuals. Regress depr1 on cope_emo1 and cope_avo1 and obtain the residuals of depr1, dm_res, which partial out the effects of cope_emo1 and cope_avo1 on depr1.

Regress cope_task1 on cope_emo1 and cope_avo1 and obtain the residuals of cope_task1, pm_res, which partial out the effects of cope_emo1 and cope_avo1 on cope_task1.

Correlating dm_res with pm_res gives the partial correlation of depr1 and cope_task1.

dm_res <- lm(depr1 ~ cope_emo1 + cope_avo1, depress)$residuals
pm_res <- lm(cope_task1 ~ cope_emo1 + cope_avo1, depress)$residuals
resDat <- data.frame(dm_res, pm_res)
cor(dm_res, pm_res)
#> [1] -0.4000835

As shown, the partial correlation of depr1 and cope_task1 is -0.4000835.
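The partial correlation ties back to the earlier results in two ways: its square is the PRE of cope_task1, and it can be recovered from the t value in the multiple regression and its residual df. A small sketch with the printed values (variable names are illustrative; agreement is only up to rounding of the printed t):

```r
# Values copied from the printed output above
r_partial <- -0.4000835   # cor(dm_res, pm_res)
t_val     <- -5.692       # t value of cope_task1 in summary(fitA)
df2       <- 170          # residual df of Model A

# Squared partial correlation; compare with PRE in the A vs. C column
r_partial^2

# Partial correlation recovered from t and its df
r_from_t <- t_val / sqrt(t_val^2 + df2)
r_from_t
```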

Regress dm_res on pm_res to obtain the unique effect of cope_task1 on depr1.

summary(lm(dm_res ~ pm_res, data.frame(dm_res, pm_res)))
#> 
#> Call:
#> lm(formula = dm_res ~ pm_res, data = data.frame(dm_res, pm_res))
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -0.68456 -0.22996 -0.00466  0.22131  0.98298 
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)    
#> (Intercept) -1.301e-17  2.362e-02   0.000        1    
#> pm_res      -1.943e-01  3.394e-02  -5.725 4.51e-08 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.3116 on 172 degrees of freedom
#> Multiple R-squared:  0.1601, Adjusted R-squared:  0.1552 
#> F-statistic: 32.78 on 1 and 172 DF,  p-value: 4.51e-08

As shown, the regression coefficient of pm_res equals the partial regression coefficient of cope_task1 in fitA (-0.1943). However, their t values (-5.725 vs. -5.692) and p values (4.51e-08 vs. 5.40e-08) differ. Why? Let’s examine the unique effect of pm_res using PRE. Note that the F-test of a single parameter’s PRE is equivalent to the t-test of that parameter. Note also that Model C and Model A are defined relative to each other: as the statistical question changes, so do the models they refer to. Here, Model C is the intercept-only model of dm_res, and Model A adds pm_res.

fitC <- lm(dm_res ~ 1, resDat)
fitA <- lm(dm_res ~ pm_res, resDat)
print(compare_lm(fitC, fitA), digits = 3)
#>                      Baseline     C        A  A vs. C
#> SSE                      19.9  19.9 1.67e+01 3.18e+00
#> n                       174.0 174.0 1.74e+02 1.74e+02
#> Number of parameters      1.0   1.0 2.00e+00 1.00e+00
#> df                      173.0 173.0 1.72e+02 1.00e+00
#> R_squared                  NA   0.0 1.60e-01 1.60e-01
#> f_squared                  NA   0.0 1.91e-01 1.91e-01
#> R_squared_adj              NA   0.0 1.55e-01       NA
#> PRE                        NA   0.0 1.60e-01 1.60e-01
#> F(PA-PC,n-PA)              NA    NA 3.28e+01 3.28e+01
#> p                          NA    NA 4.51e-08 4.51e-08
#> PRE_adj                    NA   0.0 1.55e-01 1.55e-01
#> power_post                 NA    NA 1.00e+00 1.00e+00

Compare the PRE of pm_res with the PRE of cope_task1: the two PREs are equal (0.160). However, the denominator dfs (df2) differ, 172 here versus 170 in the multiple regression, because the residual regression does not count the two parameters already used for cope_emo1 and cope_avo1. This makes the Fs, and hence the ps, differ. In other words, although the unique effect of cope_task1 is the same, the compact and augmented models used to evaluate its significance differ, which leads to different test results (i.e., F-test and t-test results). Revisiting the F-test formula of PRE, we reach the following conclusion: with PRE held equal, the significance of PRE is determined by the df of Model C and the df change from Model C to Model A (PA - PC), which together give the residual df of Model A (n - PA).
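The role of df2 can be illustrated numerically. The sketch below plugs the same PRE into the F formula with the two denominator dfs, and then rescales the t value of the residual regression to the df of the multiple regression. F_from_pre is a hypothetical helper written for this illustration, not a Keng function, and the inputs are the rounded values printed above.

```r
# Hypothetical helper: F statistic implied by a PRE and its two dfs
F_from_pre <- function(PRE, df1, df2) {
  (PRE / df1) / ((1 - PRE) / df2)
}

PRE          <- 0.1601   # R-squared of the residual regression (= partial PRE)
df1          <- 1        # one parameter added (PA - PC)
df2_multiple <- 170      # n - PA in the multiple regression (174 - 4)
df2_residual <- 172      # n - PA in the residual regression (174 - 2)

# Same PRE, different df2, hence different F
F_multiple <- F_from_pre(PRE, df1, df2_multiple)
F_residual <- F_from_pre(PRE, df1, df2_residual)
round(c(multiple = F_multiple, residual = F_residual), 2)

# Both regressions share the same SSE and the same coefficient, so their
# t values differ only by the square root of the df ratio
t_residual <- -5.725
t_rescaled <- t_residual * sqrt(df2_multiple / df2_residual)
t_rescaled
```

Up to rounding, the two Fs should be close to the 32.4 and 32.8 reported by compare_lm(), and the rescaled t should be close to the -5.692 reported for cope_task1 in summary(fitA).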

Therefore, given the PRE of a specific set of predictors, the power of the test of this set is determined by the sample size n and the number of parameters (and hence the total number of predictors) in the regression model. Similarly, given the PRE of a specific set of predictors, the desired power for the test of this set, and the number of parameters (and hence the total number of predictors) in the regression model, we can compute the required sample size n.
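As a rough illustration of this last point, the sketch below computes power from the noncentral F distribution in base R and searches for the smallest n that reaches a target power. Both functions are hypothetical helpers written for this example, not Keng functions. The noncentrality parameter is assumed to be f_squared * n, with f_squared = PRE / (1 - PRE); this is one common convention, other texts use slightly different ones, and the results may therefore differ a little from what a dedicated power routine reports.

```r
# Hypothetical helper: power of the F-test of a PRE (not part of Keng)
power_pre <- function(PRE, PC, PA, n, alpha = 0.05) {
  df1 <- PA - PC                 # parameters added by Model A
  df2 <- n - PA                  # residual df of Model A
  f2  <- PRE / (1 - PRE)         # effect size f_squared
  ncp <- f2 * n                  # ASSUMED noncentrality convention
  F_crit <- qf(1 - alpha, df1, df2)
  pf(F_crit, df1, df2, ncp = ncp, lower.tail = FALSE)
}

# Hypothetical helper: smallest n whose power reaches the target
n_for_power <- function(PRE, PC, PA, power = 0.80, alpha = 0.05, n_max = 1e5) {
  for (n in (PA + 1):n_max) {
    if (power_pre(PRE, PC, PA, n, alpha) >= power) return(n)
  }
  NA_integer_                    # target not reached within n_max
}

# Power for the unique effect of cope_task1 at the observed n
power_obs <- power_pre(PRE = 0.16, PC = 3, PA = 4, n = 174)
power_obs

# Required n for 80% power, with and without the two covariates in the model
n_with_cov    <- n_for_power(PRE = 0.16, PC = 3, PA = 4, power = 0.80)
n_without_cov <- n_for_power(PRE = 0.16, PC = 1, PA = 2, power = 0.80)
c(with_covariates = n_with_cov, without_covariates = n_without_cov)
```

With the same PRE, the model with more parameters has a smaller df2 at any given n, so it needs at least as many observations to reach the same power.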