Partial Regression

library(Keng)
data(depress)

Aiming to help researchers understand the role of PRE in regression, this vignette presents several ways of examining the unique effect of problem-focused coping (pm1) on depression (dm1), controlling for emotion-focused coping (em1) and avoidance coping (am1), using the first-wave data subset of the internal dataset depress.

Four ways are presented in the following:

  • Multiple regression with t-test.
  • Hierarchical regression with F-test.
  • The PRE of the single parameter, the partial regression coefficient of problem-focused coping.
  • One-predictor regression using the residuals.

Multiple regression with t-test

Firstly, examine the unique effect of pm1 using the t-test. Model C (the compact model) regresses dm1 on em1 and am1. Model A (the augmented model) regresses dm1 on pm1, em1, and am1.

# multiple regression
fitC <- lm(dm1 ~ em1 + am1, depress)
fitA <- lm(dm1 ~ pm1 + em1 + am1, depress)
summary(fitA)
#> 
#> Call:
#> lm(formula = dm1 ~ pm1 + em1 + am1, data = depress)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -0.63018 -0.24748 -0.00681  0.21045  1.01320 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  2.10497    0.25517   8.249 1.25e-12 ***
#> pm1         -0.16705    0.04988  -3.349  0.00119 ** 
#> em1          0.19504    0.05712   3.415  0.00096 ***
#> am1         -0.06675    0.05992  -1.114  0.26822    
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.337 on 90 degrees of freedom
#> Multiple R-squared:  0.2491, Adjusted R-squared:  0.224 
#> F-statistic: 9.949 on 3 and 90 DF,  p-value: 9.928e-06

As shown, the partial regression coefficient of pm1 is -0.16705, t(90) = -3.349, p = 0.00119.

Hierarchical regression with F-test

Secondly, examine the unique effect of pm1 using hierarchical regression and its F-test. In SPSS, this F-test is presented as the F-test for the R-squared change.

anova(fitC, fitA)
#> Analysis of Variance Table
#> 
#> Model 1: dm1 ~ em1 + am1
#> Model 2: dm1 ~ pm1 + em1 + am1
#>   Res.Df    RSS Df Sum of Sq      F   Pr(>F)   
#> 1     91 11.498                                
#> 2     90 10.224  1    1.2743 11.217 0.001185 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As shown, F(1, 90) = 11.217, p = 0.001185. This F-test is equivalent to the t-test above, since both examine the unique effect of pm1. When the df of F's numerator is 1, F = t^2, and t's df equals the df of F's denominator.
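
A quick numerical check of this equivalence, using the fitted models above:

# t of pm1 squared reproduces the F of the model comparison (both ~ 11.22)
summary(fitA)$coefficients["pm1", "t value"]^2
anova(fitC, fitA)$F[2]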

The PRE of the single parameter

Thirdly, examine the unique effect of pm1 using PRE (the Proportional Reduction in Error), i.e., the proportion of Model C's error that Model A eliminates: PRE = [SSE(C) - SSE(A)] / SSE(C).

print(compare_lm(fitC, fitA), digits = 3)
#>                      Baseline        C        A  A vs. C
#> SSE                      13.6 1.15e+01 1.02e+01  1.27427
#> df                       93.0 9.10e+01 9.00e+01  1.00000
#> Number of parameters      1.0 3.00e+00 4.00e+00  1.00000
#> R_squared                  NA 1.55e-01 2.49e-01  0.09359
#> f_squared                  NA 1.84e-01 3.32e-01  0.12464
#> R_squared_adj              NA 1.37e-01 2.24e-01       NA
#> PRE                        NA 1.55e-01 2.49e-01  0.11082
#> F(PA-PC,n-PA)              NA 8.38e+00 9.95e+00 11.21719
#> p                          NA 4.58e-04 9.93e-06  0.00119
#> PRE_adj                    NA 1.37e-01 2.24e-01  0.10094
#> power_post                 NA 9.59e-01 9.97e-01  0.91202

As shown, F(1, 90) = 11.217, p = 0.00119. This F-test of PRE is equivalent to the F-test of anova() above.
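
As a check on compare_lm()'s output, PRE can also be computed by hand from the two models' sums of squared errors:

# PRE = [SSE(C) - SSE(A)] / SSE(C)
sseC <- sum(residuals(fitC)^2)  # SSE of Model C, ~ 11.498
sseA <- sum(residuals(fitA)^2)  # SSE of Model A, ~ 10.224
(sseC - sseA) / sseC            # ~ 0.11082, the A vs. C PRE above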

One-predictor regression using the residuals

Fourthly, examine the unique effect of pm1 using residuals. Regress dm1 on em1 and am1, and obtain the residuals of dm1, dm_res, which partial out the effects of em1 and am1 on dm1.

Regress pm1 on em1 and am1, and obtain the residuals of pm1, pm_res, which partial out the effects of em1 and am1 on pm1.

Correlating dm_res with pm_res, we obtain the partial correlation of dm1 and pm1.

dm_res <- lm(dm1 ~ em1 + am1, depress)$residuals
pm_res <- lm(pm1 ~ em1 + am1, depress)$residuals
resDat <- data.frame(dm_res, pm_res)
cor(dm_res, pm_res)
#> [1] -0.3329009

As shown, the partial correlation of dm1 and pm1 is -0.3329009.
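
Squaring this partial correlation recovers the PRE of pm1 reported by compare_lm() above, since the PRE of a single parameter equals its squared partial correlation:

cor(dm_res, pm_res)^2  # ~ 0.11082, the PRE of pm1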

Regressing dm_res on pm_res, we obtain the unique effect of pm1 on dm1.

summary(lm(dm_res ~ pm_res, resDat))
#> 
#> Call:
#> lm(formula = dm_res ~ pm_res, data = resDat)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -0.63018 -0.24748 -0.00681  0.21045  1.01320 
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)   
#> (Intercept) -3.248e-17  3.438e-02   0.000  1.00000   
#> pm_res      -1.670e-01  4.933e-02  -3.386  0.00104 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 0.3334 on 92 degrees of freedom
#> Multiple R-squared:  0.1108, Adjusted R-squared:  0.1012 
#> F-statistic: 11.47 on 1 and 92 DF,  p-value: 0.001044

As shown, the regression coefficient of pm_res equals the partial regression coefficient of pm1 in fitA. However, their t values, and hence their p values, differ. Why? Let's examine the unique effect of pm_res using PRE. Note that the F-test of a single parameter's PRE is equivalent to the t-test of that parameter. In addition, Model A is defined relative to Model C: as the statistical question changes, so do the models that serve as Model C and Model A.

fitC <- lm(dm_res ~ 1, resDat)
fitA <- lm(dm_res ~ pm_res, resDat)
print(compare_lm(fitC, fitA), digits = 3)
#>                      Baseline    C        A  A vs. C
#> SSE                      11.5 11.5 10.22400  1.27427
#> df                       93.0 93.0 92.00000  1.00000
#> Number of parameters      1.0  1.0  2.00000  1.00000
#> R_squared                  NA  0.0  0.11082  0.11082
#> f_squared                  NA  0.0  0.12464  0.12464
#> R_squared_adj              NA  0.0  0.10116       NA
#> PRE                        NA  0.0  0.11082  0.11082
#> F(PA-PC,n-PA)              NA   NA 11.46646 11.46646
#> p                          NA   NA  0.00104  0.00104
#> PRE_adj                    NA  0.0  0.10116  0.10116
#> power_post                 NA   NA  0.91784  0.91784

Compare the PRE of pm_res with the PRE of pm1: the two PREs are equal. However, the denominator dfs differ (90 vs. 92), which makes the Fs, and hence the ps, different. In other words, although the unique effect of pm1 is constant, the compact and augmented models used to evaluate its significance differ, which leads to different test conclusions (i.e., different F-test and t-test results). Recall the F-test formula of PRE, F(PA - PC, n - PA) = [PRE/(PA - PC)] / [(1 - PRE)/(n - PA)]: with PRE being equal, the significance of PRE is determined by the df of Model C and the df change of Model A against Model C.
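
Plugging the shared PRE into this formula with the two different denominator dfs reproduces both Fs:

pre <- 0.11082
(pre / 1) / ((1 - pre) / 90)  # F(1, 90) ~ 11.217, the test of pm1 in fitA
(pre / 1) / ((1 - pre) / 92)  # F(1, 92) ~ 11.466, the test of pm_res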

Therefore, given the PRE of a specific set of predictors, the power of this set of predictors is determined by the sample size n and the number of parameters (and hence the total number of predictors) in the regression model. Similarly, given the PRE of a specific set of predictors, the required power for this set of predictors, and the number of parameters (and hence the total number of predictors) in the regression model, we can compute the required sample size n.
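
As a base-R illustration (not the Keng package's own routine), one could search for the smallest n reaching a target power using the noncentral F distribution. This sketch assumes alpha = .05, target power = .80, PRE = 0.11082 for one extra parameter, PA = 4 parameters in Model A, and the noncentrality convention lambda = n * PRE / (1 - PRE); conventions for lambda vary across software, so results may differ slightly from the power_post values above.

# power of the PRE F-test at sample size n (a sketch; see assumptions above)
power_at <- function(n, pre = 0.11082, df1 = 1, pa = 4, alpha = .05) {
  df2 <- n - pa
  lambda <- n * pre / (1 - pre)  # noncentrality parameter (one common convention)
  1 - pf(qf(1 - alpha, df1, df2), df1, df2, ncp = lambda)
}
ns <- 10:200
min(ns[sapply(ns, power_at) >= .80])  # smallest n achieving 80% power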