Rereading Meehl
Lately, I’ve been rereading a lot of Meehl’s papers on the epistemological problems with research in psychology. This passage from “The Problem Is Epistemology, Not Statistics: Replace Significance Tests by Confidence Intervals and Quantify Accuracy of Risky Numerical Predictions” strikes me as an almost perfect summary of his concerns, although it’s quite abstract and assumes familiarity with his “crud factor principle”, which asserts that in psychology everything correlates with everything else to some nonzero degree.
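To make the crud factor concrete before the quote, here is a toy simulation (my own made-up numbers, not Meehl’s data): give every variable a weak loading on a single shared latent cause, and every pairwise population correlation is nonzero; with a large enough sample, nearly every pairwise significance test rejects.

```python
# Toy illustration of the crud factor (hypothetical numbers, not Meehl's).
# Every variable gets a weak loading on one shared latent cause, so every
# pairwise population correlation is nonzero: H0 is quasi-always false.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, n_vars = 100_000, 20

latent = rng.standard_normal((n_subjects, 1))         # the shared cause
loadings = rng.uniform(0.15, 0.25, size=(1, n_vars))  # weak, assumed values
X = latent @ loadings + rng.standard_normal((n_subjects, n_vars))

rejections, pairs = 0, 0
for i in range(n_vars):
    for j in range(i + 1, n_vars):
        _, p = stats.pearsonr(X[:, i], X[:, j])
        pairs += 1
        rejections += p < 0.05

print(f"{rejections}/{pairs} pairwise nulls rejected at alpha = .05")
# At this sample size essentially every pair rejects: with enough power,
# "statistically significant" stops carrying information.
```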
I said that the crud factor principle is the concrete empirical form, realized in the sciences, of the logician’s formal point that the third figure of the implicative (mixed hypothetical) syllogism is invalid, the error in purported deductive reasoning termed affirming the consequent. Speaking methodologically, in the language of working scientists, what it comes to is that there are quite a few alternative theories \(T'\), \(T"\), \(T"'\), \(\ldots\) (in addition to the theory of interest \(T\)) that are each capable of deriving as a consequence the statistical counternull hypothesis \(H^{*}: \delta = (\mu_1 - \mu_2) > 0\), or, if we are correlating quantitative variables, that \(\rho > 0\).

We might imagine (Meehl, 1990e) a big pot of variables and another (not so big but still sizable) pot of substantive causal theories in a specified research domain (e.g., schizophrenia, social perception, maze learning in the rat). We fantasize an experimenter choosing elements from these two pots randomly in picking something to study to get a publication. (We might impose a restriction that the variables have some conceivable relation to the domain being investigated, but such a constraint should be interpreted very broadly. We cannot, e.g., take it for granted that eye color will be unrelated to liking introspective psychological novels, because there is evidence that Swedes tend to be more introverted than Irish or Italians.) Our experimenter picks a pair of variables randomly out of the first pot, and a substantive causal theory randomly out of the second pot, and then randomly assigns an algebraic sign to the variables' relation, saying, “\(H^{*}: \rho > 0\), if theory \(T\) is true.” In this crazy example there is no semantic-logical-mathematical relation deriving \(H^{*}\) from \(T\), but we pretend there is.

Because \(H_0\) is quasi-always false, the counternull hypothesis \(\sim H_0\) is quasi-always true. Assume perfect statistical power, so that when \(H_0\) is false we shall be sure of refuting it. Given the arbitrary assignment of direction, the directional counternull \(H^{*}\) will be proved half the time; that is, our experiment will “come out right” (i.e., as pseudo-predicted from theory \(T\)) half the time. This means we will be getting what purports to be a “confirmation” of \(T\) 10 times as often as the significance level \(\alpha = .05\) would suggest.

This does not mean there is anything wrong with the significance test mathematics; it merely means that the odds of getting a confirmatory result (absent our theory) cannot be equated with the odds given by the \(t\) table, because those odds are based on the assumption of a true zero difference. There is nothing mathematically complicated about this, and it is a mistake to focus one’s attention on the mathematics of \(t\), \(F\), chi-square, or whatever statistic is being employed. The population from which we are drawing is specified by variables chosen from the first pot, and one can think of that population as an element of a superpopulation of variable pairs that is gigantic in size but finite, just as the population, however large, of theories defined as those that human beings will be able to construct before the sun burns out is finite.
The methodological point is that \(T\) has not passed a severe test (speaking Popperian); the “successful” experimental outcome does not constitute what philosopher Wesley Salmon called a “strange coincidence” (Meehl, 1990a, 1990b; Nye, 1972; Salmon, 1984), because with high power \(T\) has almost an even chance of doing that, absent any logical connection whatever between the variables and the theory.
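Meehl’s two-pot thought experiment is easy to simulate. Below is a minimal sketch under assumed numbers (crud-level true correlations of \(|\rho|\) between .1 and .3, and a sample large enough for near-perfect power); the “theory” contributes nothing but a coin flip for the predicted sign.

```python
# Sketch of the two-pot experiment. Assumptions (mine, not Meehl's):
# crud-level true correlations |rho| in [.1, .3], N large enough for
# near-perfect power, and a coin flip standing in for "theory T".
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_experiments, n_subjects, alpha = 2_000, 1_000, 0.05

confirmations = 0
for _ in range(n_experiments):
    # Pot 1: a pair of variables whose true correlation is nonzero (crud)
    # but whose sign has nothing to do with any theory.
    true_rho = rng.choice([-1, 1]) * rng.uniform(0.1, 0.3)
    cov = np.array([[1.0, true_rho], [true_rho, 1.0]])
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n_subjects).T

    # Pot 2: "theory T", plus a coin flip for the predicted direction.
    predicted_sign = rng.choice([-1, 1])

    # One-tailed test of H*: rho has the predicted sign.
    r, p_two_sided = stats.pearsonr(x, y)
    p_one_sided = p_two_sided / 2 if np.sign(r) == predicted_sign else 1 - p_two_sided / 2
    confirmations += p_one_sided < alpha

print(f"T 'confirmed' in {confirmations / n_experiments:.0%} of experiments")
```

The confirmation rate hovers near 50%, ten times what \(\alpha = .05\) seems to promise, precisely because the \(t\)-table odds are computed under a true zero correlation that the crud factor rules out.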
Put another way: what’s wrong with psychology is that the theories being debated make predictions so vague as to be quasi-unfalsifiable. Null hypothesis significance testing (NHST) is bad for the field because it encourages researchers to make purely directional predictions, and under the crud factor a directional prediction has roughly even odds of “coming out right” whether or not the theory is true. Its success is, to use an old-fashioned word, vainglorious.
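The remedy in Meehl’s title is to replace significance tests with confidence intervals and to score theories on risky numerical predictions. Here is a sketch of the contrast, again with assumed numbers: a hypothetical “risky” theory predicts \(\rho\) to land in the narrow band [.20, .30], and both it and a vague directional prediction are scored against arbitrary crud-level correlations.

```python
# Vague directional prediction vs. risky numerical prediction, scored
# against arbitrary crud. All numbers are hypothetical assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_experiments, n_subjects = 2_000, 5_000
lo, hi = 0.20, 0.30  # the risky theory's predicted band (assumed)

vague_passes = risky_passes = 0
for _ in range(n_experiments):
    true_rho = rng.choice([-1, 1]) * rng.uniform(0.05, 0.5)  # arbitrary crud
    cov = np.array([[1.0, true_rho], [true_rho, 1.0]])
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n_subjects).T
    r, p_two_sided = stats.pearsonr(x, y)

    # Vague prediction "rho > 0", one-tailed at alpha = .05.
    vague_passes += (r > 0) and (p_two_sided / 2 < 0.05)

    # Risky prediction: the 95% CI for rho (via Fisher z) must land
    # entirely inside the predicted band.
    z, se = np.arctanh(r), 1 / np.sqrt(n_subjects - 3)
    ci_lo, ci_hi = np.tanh(z - 1.96 * se), np.tanh(z + 1.96 * se)
    risky_passes += (lo <= ci_lo) and (ci_hi <= hi)

print(f"vague prediction 'confirmed' by luck: {vague_passes / n_experiments:.0%}")
print(f"risky prediction confirmed by luck:   {risky_passes / n_experiments:.0%}")
```

An arbitrary theory passes the directional test about half the time but only rarely threads the narrow numerical band; passing the risky test is the kind of “strange coincidence” Salmon had in mind.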