Common Sample Sizes

library(Keng)

Common sample sizes needed are computed and presented in this vignette.

Use Pearson’s r

power_r(r = 0.2)
## -- Given ---------------------------
## r = 0.2
## d = 0.4082483
## sig_level = 0.05
## -- Post-hoc power analysis --------
## n = NULL
## power_post = NULL
## -- Sample size planning -----------
## power = 0.8
## Minimum sample size = 193
plot(power_r(r = 0.2))

Gignac and Szodorai (2016) recommended to consider correlations of 0.10, 0.20, and 0.30 as relatively small, typical, and relatively large for individual differences research. Use 0.20 as the prior effect size of Pearson’s r, then the sample size needed is 193.

Mediation

Assume that we test the significance of the indirect effect using the causal steps approach (Hayes, 2009). Assume a is the effect of X on M, and b is the effect of M on Y controlling for X, then the indirect effect of X on Y through M is a*b. For statistical inference, if a and b are significant, then we could infer the product of a and b (the indirect effect) should be significant. To plan the sample size for mediation models, we should ensure the significance of a and b in regression models.

According to Gignac and Szodorai (2016), we set PRE of a and b to be the square of 0.2. That’s 0.04.

The regression with a has two terms: intercept, and X, hence PA = 2. The regression without a has one term: intercept, hence PC = 1. Then we have:

power_lm(PRE = 0.04)
## -- Given ---------------------------
## PRE = 0.04
## f_squared = 0.04166667
## PC = 1
## PA = 2
## sig_level = 0.05
## -- Post-hoc power analysis --------
## n = NULL
## power_post = NULL
## -- Sample size planning -----------
## Expected power = 0.8
## Minimum sample size = 193

Accordingly, the minimum sample size for a is 193.

The regression with b has three terms: intercept, X, and M, hence PA = 3. The regression without b has two terms: intercept, and X, hence PC = 2.

power_lm(PRE = 0.04, PC = 2, PA = 3)
## -- Given ---------------------------
## PRE = 0.04
## f_squared = 0.04166667
## PC = 2
## PA = 3
## sig_level = 0.05
## -- Post-hoc power analysis --------
## n = NULL
## power_post = NULL
## -- Sample size planning -----------
## Expected power = 0.8
## Minimum sample size = 194

Accordingly, the minimum sample size for b is 194.

Taking both a and b into account, the minimum sample size for the indirect effect is 194.

Moderation

For moderation model, we should ensure the significance of the interaction term. Assume the effect of X on Y is moderated by W, then the compact regression model has three terms: intercept, X, and W, hence the argument PC is 3. The augmented regression model has four terms: intercept, X, W, and the interaction term X*W, hence the argument PA is 4. Following Gignac and Szodorai (2016), we assume the PRE of the interaction term is 0.04, then

power_lm(PRE = 0.04, PC = 3, PA = 4)
## -- Given ---------------------------
## PRE = 0.04
## f_squared = 0.04166667
## PC = 3
## PA = 4
## sig_level = 0.05
## -- Post-hoc power analysis --------
## n = NULL
## power_post = NULL
## -- Sample size planning -----------
## Expected power = 0.8
## Minimum sample size = 195

Accordingly, the minimum sample size we need is 195.

ANOVA

When an independent variable has more-than-two levels, it’s commonly recommended to use ANOVA. Actually, we could substitute regression for ANOVA by transforming a multicategorical predictor with k levels into (k-1) predictors.

For example, we have two factors: A and B. Both A and B have three levels. In regression analysis, we should transform A into 2 predictors, A_code1, A_code2. B should be tranformed into B_code1 and B_code2. To test the interaction of A and B, we should make products of A and B, then we attain A_code1*B_code1, A_code2*B_code1, A_code2*B_code1, A_code2*B_code1. The regression for this ANOVA finally has 9 terms: intercept, A_code1, A_code2, B_code1, B_code2, A_code1*B_code1, A_code2*B_code1, A_code2*B_code1, A_code2*B_code1, hence PA = 9. To test the significance of the interaction of A and B, we should conduct hierarchical regression to test the effect of all four products, hence PC = 5. For the PRE of all four products as a whole, PRE should be larger than 0.04, and we assume the PRE is 0.13 using Cohen’s (1988) criterion for medium effect size. Then we have,

power_lm(PRE = 0.13, PC = 5, PA = 9)
## -- Given ---------------------------
## PRE = 0.13
## f_squared = 0.1494253
## PC = 5
## PA = 9
## sig_level = 0.05
## -- Post-hoc power analysis --------
## n = NULL
## power_post = NULL
## -- Sample size planning -----------
## Expected power = 0.8
## Minimum sample size = 94

Accordingly, the minimum sample size we need is 94.

Control for other covariates

Assume the PRE of the focal predictor (e.g., the mediator M in the mediation model, or the interaction term in the moderation model) is the same, the addition of other covariates only changes PC and PA. For example, if we additionally control for gender and age, then for the moderation model we have:

power_lm(PRE = 0.04, PC = 5, PA = 6)
## -- Given ---------------------------
## PRE = 0.04
## f_squared = 0.04166667
## PC = 5
## PA = 6
## sig_level = 0.05
## -- Post-hoc power analysis --------
## n = NULL
## power_post = NULL
## -- Sample size planning -----------
## Expected power = 0.8
## Minimum sample size = 197

Accordingly, the minimum sample size we need is 197.

Ideally assuming the PRE of the focal predictor(s) keeping constant, when the number of the covariate(s) increases by one, the planed sample size increases by one. In practice, the addition of covariates almost always reduces the PRE of the focal predictor(s), hence the more conservative criterion of Gignac and Szodorai (2016) is adopted here.

Conclusion

Two hundred is recommended as the minimal sample size for common psychological research.

Reference

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Routledge.

Gignac, G. E., & Szodorai, E. T. (2016). Effect size guidelines for individual differences researchers. Personality and Individual Differences, 102, 74-78.

Hayes, A. F. (2009). Beyond Baron and Kenny: Statistical mediation analysis in the new millennium. Communication monographs, 76(4), 408-420.