Some Reservations About Power Analysis

Siu L. Chow

University of Regina, Regina, Saskatchewan, Canada

Cohen (December 1990) said that (a) a test statistic significant at the .026 level does not mean that H₀ is true with a probability of .026, and (b) the power of a statistical test is the probability of detecting an effect at the level specified by beta. The first statement is true, but the second is not. The first statement is correct because the probability of Type I error is a conditional probability, namely, the probability of rejecting H₀ contingent on the fact that H₀ is true. Likewise, the probability of Type II error is a conditional probability, namely, the probability of accepting H₀ contingent on the fact that H₀ is false. Being a complement of the probability of Type II error, a test's power is also a conditional probability. Hence, to say that the power of a test is .9 is to say that the likelihood of choosing H₁, given that H₀ is false, is .9. However, the probability of Event A contingent on Event B does not describe the unconditional likelihood of Event A. Consequently, the power of a test (a conditional probability) is uninformative as to how likely an effect may be detected (an unconditional probability).
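The distinction between the conditional and the unconditional probability can be illustrated with a minimal sketch based on the law of total probability. The function name, the alpha level of .05, and the values assumed for the probability that H₀ is false are hypothetical and appear nowhere in Cohen (1990) or in this comment; only the power of .9 is taken from the text. Treating "H₀ is false" as an event with a probability is itself an assumption made purely for illustration.

```python
def prob_reject_h0(alpha: float, power: float, p_h0_false: float) -> float:
    """Unconditional probability of rejecting H0.

    alpha      = P(reject H0 | H0 true)   (Type I error rate)
    power      = P(reject H0 | H0 false)  (1 minus Type II error rate)
    p_h0_false = assumed P(H0 false); hypothetical, not given in the text
    """
    for value in (alpha, power, p_h0_false):
        if not 0.0 <= value <= 1.0:
            raise ValueError("all arguments must lie in [0, 1]")
    # Law of total probability over the two states of H0.
    return power * p_h0_false + alpha * (1.0 - p_h0_false)


if __name__ == "__main__":
    power = 0.9   # from the text
    alpha = 0.05  # assumed for illustration
    for p_h0_false in (0.1, 0.5, 0.9):  # hypothetical values
        p_reject = prob_reject_h0(alpha, power, p_h0_false)
        print(f"P(H0 false) = {p_h0_false:.1f} -> P(reject H0) = {p_reject:.3f}")
```

With the power held at .9, the unconditional probability of rejecting H₀ varies with the assumed probability that H₀ is false, which is the sense in which power alone leaves the unconditional probability undetermined.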

Assuming that science is inevitably about magnitude, Cohen (1990) suggested that paying attention to statistical power would lead us to realize the importance of effect size (qua an index of magnitude). That is, consideration of statistical power renders it possible to make rational judgment as to the acceptability of a research hypothesis. This reason for considering statistical power may be questioned.

First, magnitude is neither the exclusive nor the ultimate concern of science. What is important is validity (Cook & Campbell, 1979; Manicas & Secord, 1983; Meehl, 1978, 1990), namely, the question: Is Theory T warranted given this set of data? Statistics has a crucial, but limited, role to play as an interim step in theory corroboration. A test of significance, not an estimate of effect size, suffices for such a purpose (Chow, 1988, 1989, in press-a, in press-b; Tukey, 1960). Cohen (1990) has not made clear what role statistical power has in determining the validity of a theory-corroboration study.

The second reason to question the importance of statistical power is that theory development is a conceptual endeavor, not a statistical one. Theory development consists of (a) identifying the to-be-studied phenomenon, (b) formulating an explanatory theory for the phenomenon, (c) making an informed comparison between the theory in question and its potential contenders, (d) deriving implications from the theory, as well as those from its contenders, (e) designing and conducting empirical research, (f) analyzing data, (g) drawing research conclusions, (h) assessing the theory vis-à-vis data collected for that purpose, and (i) repeating the entire sequence--beginning at Step b--if revision of the theory is required (Chow, 1989).

Occasions may arise when it is necessary to consider questions related to Cohen's (1990) rational judgment concerns, namely, whether a phenomenon of a certain magnitude is nontrivial, how important a phenomenon of a nontrivial size is, how important the theory is, and the like. However, these are extrastatistical considerations. They cannot be answered by appealing to an effect-size estimate or the power of a test. The important point is that using tests of significance--Steps f and g--does not preclude a researcher from considering these questions--for example, at Steps a or c.

Cohen (1990) found that using significance tests was "strange and backward" (p. 1307) because the null hypothesis is always false. Underlying this null hypothesis always-false assertion is the assumption that a null hypothesis is a categorical proposition descriptive of the world, such as P1:

H₀: There is no difference between the mean of the control and the mean of the experimental conditions. … P1

However, P1 is a misrepresentation of the true state of affairs, what is said in statistics textbooks notwithstanding. Consider an experimental study of the research hypothesis that information in the short-term store is acoustic in nature. Suppose subjects are given a well-defined experimental task, namely, to remember 10 acoustically similar (AS) or acoustically dissimilar (ADS) letters. The research hypothesis is given an empirical realization in the form of the experimental hypothesis: "If the research hypothesis is true, the mean of the AS group should be different from that of the ADS group." The statistical task is to determine whether or not AS_M is different from ADS_M.

The difference between AS_M and ADS_M is not necessarily zero even if the research hypothesis is false. In other words, doubts about the truth of the experimental hypothesis arise because of the logical complement of the research hypothesis: "If the research hypothesis is not true, then there should be no difference between AS_M and ADS_M." In other words, both H₀ and H₁ are strictly the conditional propositions, P2 and P3, respectively:

If the research hypothesis is not true, then H₀ (viz., AS_M should be equal to ADS_M). … P2

If the research hypothesis is true, then H₁ (viz., AS_M should not be equal to ADS_M). … P3

That is to say, H₀ (as described in statistics textbooks) is actually the consequent of a conditional proposition. In such a capacity, H₀ is not a categorical proposition descriptive of the world. Instead, it is a prescription; it instructs a researcher to accept the complement of a research hypothesis if there is no difference between the experimental and control conditions. Being a prescriptive statement, H₀ is neither true nor false. That is, the null hypothesis always-false assumption itself is questionable.

A research hypothesis and its logical complement are two mutually exhaustive and mutually exclusive alternatives. It is hence neither strange nor backward to determine the tenability of a research hypothesis by testing its logical complement. The null hypothesis is chosen because of the fact that its underlying sampling distribution is a well-defined one. The sampling distribution provides us with an unambiguous decision criterion whose stringency is readily understood (e.g., an alpha level of .01 is more stringent than an alpha level of .05).
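The sense in which an alpha level of .01 is more stringent than one of .05 can be made concrete with a small sketch. It assumes a two-sided z test whose statistic follows the standard normal distribution when H₀ holds; that choice of test, and the function name, are illustrative assumptions and are not specified in the text.

```python
from statistics import NormalDist


def two_sided_critical_z(alpha: float) -> float:
    """Critical |z| for a two-sided test at the given alpha level.

    Assumes the test statistic is standard normal under H0
    (an illustrative assumption, not taken from the text).
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    # Split alpha equally between the two tails of the null distribution.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        cutoff = two_sided_critical_z(alpha)
        print(f"alpha = {alpha}: reject H0 if |z| > {cutoff:.3f}")
```

Under these assumptions the cutoff is about 1.960 at the .05 level and about 2.576 at the .01 level, so a larger departure from the null expectation is required before H₀ is rejected at the more stringent level.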

In sum, two putative advantages of basing theoretical conclusions on statistical power can be questioned. A null hypothesis is not a categorical proposition descriptive of the world, but a prescriptive statement. Using tests of significance is not incompatible with making rational judgment.

REFERENCES

Chow, S. L. (1988). Significance test or effect size? Psychological Bulletin, 103, 105-110.

Chow, S. L. (1989). Significance tests and deduction: Reply to Folger (1989). Psychological Bulletin, 106, 161-165.

Chow, S. L. (in press-a). Conceptual rigor versus practical impact. Theory & Psychology.

Chow, S. L. (in press-b). Rigor and logic. Theory & Psychology.

Cohen, J. (1990). Things I have learned (so far). American Psychologist, 45, 1304-1312.

Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Chicago: Rand McNally.

Manicas, P. T., & Secord, P. F. (1983). Implications for psychology of the new philosophy of science. American Psychologist, 38, 399-413.

Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806-834.

Meehl, P. E. (1990). Appraising and amending theories: The strategy of Lakatosian defense and two principles that warrant it. Psychological Inquiry, 1, 108-141.

Tukey, J. W. (1960). Conclusions vs. decisions. Technometrics, 2, 1-11.