The significance of the unique effect of one or a set of predictors
in a regression model is determined by the PRE (Proportional Reduction
in Error, also called partial eta_squared in ANOVA or partial
R_squared in regression), the number of parameters in the
regression model, and the sample size. As a result, given PRE, the
number of parameters in the regression model, and the expected statistical
power, we can plan the sample size needed for one or a set of predictors to
reach the expected statistical power (usually 0.80) at the expected
significance level (usually 0.05). This is what power_lm() is for.
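As a reminder of how these three ingredients combine, the standard model-comparison F statistic can be written as below. This is a sketch of the general relationship, not a description of the internals of power_lm(); PA and PC are the numbers of parameters of the models with and without the focal predictors (both defined later in this document), and n is the sample size.

```latex
F = \frac{PRE / (PA - PC)}{(1 - PRE) / (n - PA)},
\qquad df_1 = PA - PC, \quad df_2 = n - PA
```

For a fixed PRE, a larger n increases F, which is why a target power translates into a required sample size.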
power_lm()
To plan the sample size for one or a set of predictors, the following
information is needed and passed to power_lm() as arguments:
PRE
PRE is partial R_squared, which is the square of the partial
correlation, so you can calculate PRE from the partial correlation.
Other statistical software or R packages often plan the sample size for
regression models through Cohen’s f_squared or its square root, Cohen’s
f. power_lm() uses PRE instead because PRE and its square root, the
partial correlation, are easier to interpret. The partial correlation is
the net correlation between the outcome of the regression (e.g.,
depression) and the predictor (e.g., problem-focused coping) or set of
predictors (e.g., the dummy codes of class) of interest. Put
differently, it is the correlation between the outcome and the predictor
or set of predictors of interest after controlling for all other
predictors, no matter how many there are. You should make a reasonable
guess about the partial correlation. For example, you may guess that,
after controlling for other predictors, the partial correlation between
the outcome depression and the predictor problem-focused coping is 0.2;
then PRE = 0.2^2 = 0.04. Alternatively, you may already have the effect
size Cohen’s f_squared or f of problem-focused coping predicting
depression, in which case you can convert it to PRE. Keng provides the
function calc_PRE() to help users convert the partial correlation r_p,
f_squared, or f to PRE.
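The conversions described above can be sketched in a few lines. This is an illustrative Python sketch of the arithmetic only, not the implementation of calc_PRE(); the function names are hypothetical. It assumes the usual definition of Cohen’s f_squared as PRE / (1 - PRE).

```python
def pre_from_partial_r(r_p):
    """PRE (partial R_squared) is the square of the partial correlation."""
    if not -1.0 < r_p < 1.0:
        raise ValueError("r_p must lie strictly between -1 and 1")
    return r_p ** 2


def pre_from_f_squared(f_squared):
    """Invert f_squared = PRE / (1 - PRE) to get PRE = f_squared / (1 + f_squared)."""
    if f_squared < 0:
        raise ValueError("f_squared must be non-negative")
    return f_squared / (1.0 + f_squared)


def pre_from_f(f):
    """Cohen's f is the square root of f_squared, so square it first."""
    return pre_from_f_squared(f ** 2)


# The example from the text: a partial correlation of 0.2.
print(round(pre_from_partial_r(0.2), 4))   # 0.04
# Two further illustrative inputs (values chosen arbitrarily).
print(round(pre_from_f_squared(0.15), 4))  # 0.1304
print(round(pre_from_f(0.25), 4))          # 0.0588
```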
PA and PC
Suppose that your regression model has m predictors in total. This
model is the augmented model (Model A), which contains both the focal
predictors (e.g., gender, the dummy codes of class) and other, less
important predictors such as covariates. The number of parameters of
this augmented model (Model A), PA, is m + 1, since the intercept is
also a parameter. PA should be at least 1.
The model without the focal predictors is the compact model (Model
C). If there are k focal predictors, the number of parameters of the
compact model (Model C), PC, is m + 1 - k. PC should be at least 0.
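The parameter counts above are simple arithmetic, sketched here in Python for illustration. The helper count_parameters() is hypothetical and is not part of Keng; it covers models with at least one predictor, so the intercept-only case (PA = 1, PC = 0) is outside its scope.

```python
def count_parameters(m, k):
    """Return (PA, PC) for a model with m predictors, k of which are focal.

    PA counts the m slopes plus the intercept; PC drops the k focal slopes.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    if not 1 <= k <= m:
        raise ValueError("k must be between 1 and m")
    pa = m + 1
    pc = pa - k
    return pa, pc


# Five predictors, one focal predictor.
print(count_parameters(5, 1))  # (6, 5)
# Five predictors, all treated as one focal set (Model C is intercept-only).
print(count_parameters(5, 5))  # (6, 1)
# Six predictors, three of them focal.
print(count_parameters(6, 3))  # (7, 4)
```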
power
power is the expected statistical power. It is usually set to 0.80.
sig.level
sig.level is the expected significance level, commonly known as the
cut-off for the p-value. It is usually set to 0.05.
n
n is an optional argument. n is the sample size, and it should be
larger than PA. If n is given, the post-hoc power is computed.
Note that power_lm() follows Aberson (2019), and the planned sample
size is more conservative than that of other statistical software such
as G*Power. However, the difference is small enough to be negligible.
Because t-tests and ANOVA can be expressed as regression analyses,
power_lm() can plan the sample size for perhaps all common research
designs.
You may be interested in the power and required sample size of the full regression model. In this case, treat all predictors as a set. Suppose your regression model has m predictors; Model C is then the intercept-only model, hence PC = 1 and PA = m + 1.
You may be interested in the power and required sample size of one continuous predictor. Suppose your regression model has m predictors; in this case PA = m + 1 and PC = (m + 1) - 1.
You may be interested in the two-way moderation model. In the two-way moderation model, the focal predictor is the two-way interaction term. Suppose your regression model has m predictors; in this case PA = m + 1 and PC = (m + 1) - 1.
You may be interested in the three-way moderation model. In the three-way moderation model, the focal predictors are two two-way interaction terms and one three-way interaction term. Suppose your regression model has m predictors; in this case PA = m + 1 and PC = (m + 1) - 3.
If you are interested in the difference between the mean of one group and 0, you may turn to the one-sample t-test. Alternatively, you could fit an intercept-only model, in which the focal parameter is the intercept. In this case PA = 1 and PC = 0.
Note that in this case you must use the correct PRE to obtain the correct power and planned sample size. Do not compute PRE from Cohen’s one-sample d; instead, compute PRE from the t value of the one-sample t-test. You could also compute the correct PRE using the compare_lm() function.
If you are interested in the difference between two groups (e.g., experimental vs. control), you may turn to the t-test. Alternatively, you could treat the group variable as a binary predictor and conduct a regression analysis, in which the focal predictor is the binary group predictor. Suppose your regression model has m predictors; in this case PA = m + 1 and PC = (m + 1) - 1.
If you are interested in the difference between multiple groups, you may turn to ANOVA. Alternatively, you could treat the group variable as a multicategorical independent variable and code it using a coding scheme such as dummy coding. No matter which coding scheme you use, a multicategorical independent variable with j levels should be coded into (j - 1) predictors, which form the set of focal predictors. Suppose your regression model has m predictors, among which there are (j - 1) codes; in this case PA = m + 1 and PC = (m + 1) - (j - 1).
If you are interested in an outcome that was repeatedly measured, you may turn to repeated-measures ANOVA. In essence, repeated-measures ANOVA computes the difference scores of interest (contrasts) and then conducts a between-subjects ANOVA on them. Similarly, you could compute the difference score of interest and then conduct a regression analysis.
A special case is when there is no between-subject factor. In this circumstance, treat the difference score as the outcome and fit an intercept-only model, as in the one-sample t-test.
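For the intercept-only cases above, PRE can be obtained from the t value of the one-sample t-test. The Python sketch below illustrates that arithmetic under the assumption that a single focal parameter is tested (PA - PC = 1), so that F = t^2 with n - PA error degrees of freedom. The function pre_from_t() is hypothetical and is not part of Keng, and the t value and sample size are made-up inputs.

```python
def pre_from_t(t, n, pa=1):
    """PRE for a single focal parameter, computed from its t statistic.

    With one focal parameter, F = t^2 and
    F = PRE / ((1 - PRE) / (n - PA)), which rearranges to
    PRE = t^2 / (t^2 + n - PA).
    """
    df_error = n - pa
    if df_error <= 0:
        raise ValueError("n must be larger than PA")
    return t ** 2 / (t ** 2 + df_error)


# Illustrative one-sample case: t = 2.5 with n = 30, so PA = 1 and df = 29.
print(round(pre_from_t(t=2.5, n=30), 4))  # 0.1773
```

The resulting PRE, together with PA = 1 and PC = 0, is the kind of input the text recommends for this design.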
Aberson, C. L. (2019). Applied power analysis for the behavioral sciences. Routledge.