A research design is a plan that guides researchers in the gathering, organizing, and analyzing of their data. The research design developed by a researcher depends on the research question(s) that she is asking. For example, if the researcher wishes to test a causal hypothesis — such as, does cigarette smoking cause lung cancer? — then a research design that includes an experimental study would be the most direct way to test the hypothesis. The researcher, of course, cannot perform such an experiment with human participants because this would be unethical. Thus, if her goal is to determine whether cigarette smoking causes cancer in humans, her research design would need to include an animal species that she believes has a respiratory system similar to that of humans. She also would need reasons to believe that this species’s respiratory system would respond to cigarette smoke in ways similar to that of humans. Based on the results of many prior studies and on her interpretations of them, the researcher might decide that the common Norway rat (Rattus norvegicus) meets these requirements. In her research design, therefore, she would include rats as the experimental subjects. In order to develop a good research design, the researcher also would need to consider many additional matters. In short, the development of a research design adequate for answering one’s research questions typically is very difficult because it requires a great deal of knowledge and critical thinking.
The research design developed by a researcher affects the validity of the research that is then performed. In other words, the conclusions that we can draw from a study’s results depend on its research validity: on its ability to adequately answer the research question(s) being asked. In order for a research study to be considered valid, it must do two things:
- Requirement 1: in testing a particular hypothesis, it must provide results that can rule out alternative hypotheses (other plausible explanations for the results obtained).
- Requirement 2: in making relevant observations, it must provide results that can be applied to other situations (especially to real-life situations).
The first requirement refers to a study’s internal validity and the second requirement to the study’s external validity.
Internal validity refers to the ability of a research design to provide an adequate test of an hypothesis — its ability to rule out all plausible explanations for the results but the explanation being tested. For example, let’s say that a researcher decides that a particular medication prevents the development of heart disease because he found that research participants who took the medication developed lower rates of heart disease than did those who never took the medication. This interpretation of the study’s results is likely to be correct, however, only if the study has high internal validity. In order for it to have high internal validity, the research design must have controlled for the directionality and third-variable problems (see Section 3-3), as well as for the effects of other extraneous variables (see Section 4-1). In short, the researcher would have needed to perform an experimental study (see Section 4-2 and Section 5-1) in which:
- participants were randomly assigned to the experimental and control groups;
- participants did not know whether they were taking the medication or a placebo (see Table 1).
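
| Group (randomly assigned) | Treatment (participants unaware which) | Outcome measured |
| --- | --- | --- |
| Experimental group | Takes medication | Symptoms of heart disease |
| Control group | Takes placebo | Symptoms of heart disease |
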
Table 1. Internally valid study design for testing a medication hypothesized to prevent heart disease
The most internally valid studies are experimental studies because they are better than correlational and case studies at controlling for the directionality and third-variable problems, as well as for the effects of other extraneous variables.
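As a minimal sketch of the random assignment step described above, the following Python snippet shuffles a list of participant IDs and splits it into an experimental group and a control group. The function name `randomly_assign`, the participant numbering, and the seed value are illustrative assumptions; they do not come from any particular study's procedure.

```python
import random


def randomly_assign(participant_ids, seed=None):
    """Randomly split participants into experimental and control groups.

    Hypothetical helper for illustration only. With an odd number of
    participants, the control group receives the extra person.
    """
    rng = random.Random(seed)         # seeded generator so the split can be reproduced
    shuffled = list(participant_ids)  # copy, so the caller's sequence is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}


# Twenty hypothetical participants, identified only by number.
groups = randomly_assign(range(1, 21), seed=2024)
print(len(groups["experimental"]), len(groups["control"]))  # prints "10 10"
```

Because chance alone decides who lands in which group, pre-existing differences among participants tend to be spread evenly across the two groups. Blinding (keeping participants unaware of whether they receive the medication or the placebo) is a separate step that this sketch does not cover.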
As an example of experimental research with high internal validity, let’s look at a series of experimental studies performed by a medical researcher named Joseph Goldberger during the early 1900s (Chase, 1977; Kraut, n.d.; Stanovich, 2004). Goldberger wanted to find the cause of a disease called pellagra, the symptoms of which include gastrointestinal problems, mental disturbances, and skin eruptions. Many medical doctors theorized that the disease is caused by some sort of “germ” that, when passed from one person to another, gives rise to the symptoms of pellagra. Because the disease was more likely to occur among people living in unsanitary conditions (such as in homes with no plumbing), some physicians theorized that the unknown microorganism was passed through bodily wastes and fluids. Goldberger, on the other hand, suspected that the disease was caused by an inadequate diet, and that the correlation with sanitation was due to a third variable, poverty, that caused people to have inadequate sanitation and to receive poor nutrition.
The research question Goldberger asked was this: is pellagra caused by poor nutrition? He attempted to answer this question by testing the following hypothesis: eating a low-protein diet will cause symptoms of pellagra. His experiment is outlined in Table 2.
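
| Group | Treatment | Outcome measured |
| --- | --- | --- |
| Experimental group | Eats low-protein diet | Symptoms of pellagra |
| Control group | Eats normal diet | Symptoms of pellagra |
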
Table 2. Goldberger’s experiment for testing the hypothesis that eating a low-protein diet will cause symptoms of pellagra
Goldberger’s research participants were prisoners from the Mississippi State Prison (today, it would be considered unethical to use prisoners for such a study). He split them into two groups:
- Experimental group. This group received the low-protein diet.
- Control group. This group received the normal diet.
Within five months, many of the prisoners in the Experimental Group had developed pellagra, whereas none of the prisoners in the Control Group had developed the disease. Because the study had high internal validity, Goldberger was able to conclude that pellagra is caused by poor nutrition.
Goldberger performed a second study designed to rule out the alternative explanation that a microorganism passed in bodily fluids causes pellagra. Goldberger hypothesized that, if the alternative explanation is wrong, then people who consume the bodily fluids of pellagra patients will not develop pellagra. His experiment is outlined in Table 3.

| Treatment | Outcome measured |
| --- | --- |
| Eats low-protein diet | Symptoms of pellagra |
| Eats normal diet | Symptoms of pellagra |

Table 3. Goldberger’s experiment for testing the germ theory of pellagra
For this experiment, Goldberger’s research participants were himself, his wife, and his research assistants. First, he took secretions from the nose and throat of pellagra victims and placed them into his own mouth. He did not get sick. But because the microorganisms may not have been at high concentrations in these secretions, he did the following:
he selected two patients — one with scaling sores and the other with diarrhea. He scraped the scales from the sores, mixed the scales with four cubic centimeters of urine from the same patients, added an equal amount of liquid feces, and rolled the mixture into little dough balls by the addition of four pinches of flour. The pills were taken voluntarily by him, by his assistants and by his wife. (Bronfenbrenner & Mahoney, 1975, p. 11; quoted in Stanovich, 2004, p. xx)
If a microorganism had been involved, then it seems likely that at least one or two of the participants would have become ill. But neither Goldberger nor his associates developed pellagra after consuming the fluids of people with the disease (and, therefore, the control condition did not need to be performed). By comparing the results of Goldberger’s several studies, we can feel very confident that pellagra is caused by an inadequate diet and not by a microorganism. Because of the care he took in developing his research design and then performing the research, Goldberger’s investigations into the causes of pellagra definitely had high internal validity.
Did the Milgram study provide an adequate test of the hypothesis that people will obey an authority’s commands to physically harm a stranger? In other words, did the design of Milgram’s research allow him to make observations that provided strong support for this claim? Yes, Milgram’s research had high internal validity: by performing experimental studies in which he carefully manipulated a number of variables (such as the immediacy of the Learner) while controlling for the effects of extraneous variables, Milgram was able to show that several situational factors were the primary causes of differences in obedience (see Section 6-6).
External validity refers to the ability of a research design to provide results that can be generalized to other situations, especially to natural (“real-life”) situations. For example, if the researcher in the fictional heart-medication study described above found that the medication, under controlled conditions, prevented the development of heart disease in research participants, he would want to generalize these findings, that is, to state that the medication will prevent heart disease in the general population. However, let’s say that the research design required the exclusion of many potential participants, such as people who abuse alcohol or other drugs, suffer from diabetes, weigh more than average for their height, or have ever suffered from a mood or anxiety disorder. These are common risk factors for heart disease and, because these groups were excluded, the study’s results would provide little evidence that the medication will be effective for people with these risk factors. In other words, the study would have low external validity and, hence, we could not generalize its results to the general population.
This commonly happens in tests of antidepressant medications (see Brown University, 2002). Because researchers want to make sure that the antidepressant effects of the medications being tested are not hidden by the effects of extraneous variables, they often have excluded potential participants with one or more of the following characteristics:
- people who are addicted to alcohol or illicit drugs;
- people who take various medications;
- people who have anxiety disorders (such as phobic disorders);
- people who suffer from depression with psychosis;
- people with mild depression (because they would show only a small response to the medication).
If a study excluded people with these characteristics, then most people suffering from depression would be excluded from the final pool of participants. The study’s results, therefore, would provide little information about how most depressed people will respond to the medication.
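To make the arithmetic of exclusion concrete, here is a small Python sketch that screens a pool of applicants against the five exclusion criteria listed above and reports the share who remain eligible. The `Applicant` class, its field names, and the eight applicants are invented for illustration; they are not data from any real antidepressant trial.

```python
from dataclasses import dataclass


@dataclass
class Applicant:
    """One would-be participant; each flag mirrors an exclusion criterion above."""
    label: str
    addiction: bool = False
    other_medications: bool = False
    anxiety_disorder: bool = False
    psychotic_depression: bool = False
    mild_depression: bool = False


EXCLUSION_FLAGS = (
    "addiction",
    "other_medications",
    "anxiety_disorder",
    "psychotic_depression",
    "mild_depression",
)


def is_eligible(applicant):
    """An applicant is eligible only if no exclusion flag is set."""
    return not any(getattr(applicant, flag) for flag in EXCLUSION_FLAGS)


def eligible_fraction(applicants):
    """Share of applicants who pass every exclusion criterion."""
    if not applicants:
        return 0.0
    return sum(is_eligible(a) for a in applicants) / len(applicants)


# Invented applicant pool, NOT data from any real trial.
pool = [
    Applicant("A", anxiety_disorder=True),
    Applicant("B"),
    Applicant("C", mild_depression=True),
    Applicant("D", other_medications=True, anxiety_disorder=True),
    Applicant("E", addiction=True),
    Applicant("F"),
    Applicant("G", psychotic_depression=True),
    Applicant("H", mild_depression=True),
]

print(f"{eligible_fraction(pool):.0%} of applicants remain eligible")  # prints "25% of applicants remain eligible"
```

Each criterion removes only some applicants, but because a single flag is enough to exclude someone, the criteria compound. In this made-up pool only two of the eight applicants would enter the study.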
Internal Versus External Validity
Studies with high internal validity often have low external validity because, in order to adequately control for the effects of extraneous variables, as well as for the directionality and third-variable problems, it is necessary to create an artificial situation — a situation unlike the natural situations to which we might want to apply the results. In other words, the greater the amount of control in the research design, the more unnatural the research situation becomes. High external validity depends on two factors:
- The participants included in the sample. The participants should be a representative sample of the population of people to which one wants to generalize (see Section 4-1).
- The physical realism of the research setting. The physical setting in which the research takes place should be similar with respect to relevant and important characteristics of the natural situation to which one wants to generalize.
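One rough way to examine the first factor is to compare the make-up of a sample with the make-up of the population it is meant to represent. The Python sketch below does this for a single characteristic. The function names, the occupational categories chosen, and all of the numbers are assumptions made up for illustration; they are not Milgram's data or census figures.

```python
def shares(counts):
    """Convert raw counts per group into proportions of the total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("counts must contain at least one participant")
    return {group: n / total for group, n in counts.items()}


def representation_gaps(sample_counts, population_shares):
    """Sample share minus population share for each population group.

    Positive values mean the group is over-represented in the sample;
    negative values mean it is under-represented.
    """
    sample_shares = shares(sample_counts)
    return {
        group: sample_shares.get(group, 0.0) - pop_share
        for group, pop_share in population_shares.items()
    }


# Invented figures for illustration only.
population_shares = {
    "professional": 0.20,
    "white-collar": 0.30,
    "blue-collar": 0.40,
    "unemployed": 0.10,
}
sample_counts = {
    "professional": 20,
    "white-collar": 20,
    "blue-collar": 8,
    "unemployed": 2,
}

for group, gap in representation_gaps(sample_counts, population_shares).items():
    print(f"{group}: {gap:+.2f}")
```

A comparison like this covers only the characteristics that were measured. A gap on one of them shows that the sample differs from the population, not that the difference changes the study's results.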
Milgram (1974) reported that he had been concerned about the representativeness of his sample when he began his obedience experiments. When he performed the first studies, he used undergraduates from Yale University and found that about 60% fully obeyed the Experimenter. But, according to Milgram, another Yale professor
immediately dismissed these findings as having no relevance to ‘ordinary’ people, asserting that Yale undergraduates are a highly aggressive, competitive bunch who step on each other’s necks on the slightest provocation. He assured me that when ‘ordinary’ people were tested, the results would be quite different. (p. 170)
In order to make the sample more representative of the general population, Milgram sampled both white and black men from a variety of occupational backgrounds: professionals, white-collar workers, blue-collar workers, and the unemployed. He found that these participants showed similar levels of obedience (≈65%) to that of the Yale undergraduates (≈60%) in the Standard Condition.
To further demonstrate the external validity of his studies with respect to the general population, Milgram performed studies of other groups (see Milgram, 1974). For example, he found that ≈65% of women showed full obedience in the Standard Condition. Furthermore, he reported the results of obedience studies performed by different researchers in other countries (West Germany, Italy, South Africa, Australia, etc.). In all, the majority of participants fully obeyed the Experimenter in the Standard Condition. In fact, the percentages were higher in some cases. For example, 85% of the West-German participants were fully obedient. These results show that it is an empirical question (that is, a question for which one must make the relevant observations) whether or not a particular characteristic of a study reduces its external validity. It is very important to remember that we cannot dismiss experimental findings based only on our intuition that the participants “obviously” were not representative of the population.
What about the fact that Milgram’s studies took place in laboratory settings, which “obviously” are very different from most real-life settings? Does this mean that the results have little or no relevance outside of the laboratory? If the answer to this question is “yes,” then the results of controlled laboratory studies never would have enough external validity to allow us to apply their results to natural situations. Milgram (1974), however, stated that people who make this criticism expose their serious misunderstanding of science. He argued that, by creating a research situation that strips away the complexities of the natural world, controlled laboratory studies allow researchers to discover the basic processes that must exist if a phenomenon is to occur. That is, by constructing a simplified research situation, scientists are better able to test their predictions about how the natural world operates. Creating an appropriately artificial situation is a strength, not a weakness, of laboratory research, Milgram argued.
Stanovich (2004) argued a similar point when he stated that, contrary to common belief,
the artificiality of scientific experiments is not an accidental oversight. It is intentionally sought. Scientists deliberately set up conditions that are unlike those that occur naturally because this is the only way to separate many inherently correlated variables [confounded variables; see Section 3-2] that determine events in the world. (p. 99)
When designing controlled laboratory studies, researchers attempt to recreate particular aspects of the natural situation — aspects thought to be important for causing the phenomena being studied; they do not want to recreate the entire natural situation. In the laboratory, researchers are trying to test a theory about one or more causal relationships. An artificial laboratory setting is more likely to reveal these causal relationships because confounds among variables that often occur in natural situations are excluded in the laboratory situation.
Milgram (1974) argued that some important processes underlying obedience to authority were revealed by his experimental studies, and that these discoveries are applicable to a wide range of natural situations. In fact, he concluded that the obedience shown by his participants to the cruel commands of the Experimenter was the result of “the capacity for man to abandon his humanity, indeed, the inevitability that he does so, as he merges his unique personality into larger institutional structures” (p. 188). This interpretation of his results led him to claim that this capacity to abandon our sympathy and empathy for others “is a fatal flaw that nature has designed into us, and which in the long run gives our species only a modest chance of survival” (p. 188). There seems little doubt that his laboratory experiments revealed something important about human nature, although his conclusion that we have only a “modest chance of survival” because of a “fatal flaw” in our nature seems to be highly speculative and definitely not justified by the results of his research.
In a video clip that accompanies this section, cognitive psychologist Daniel Willingham discusses why it’s so difficult to do research on teaching and learning. How are some of these difficulties related to the distinction between internal and external validity?
Study Questions for Section 6-7
- How would you define “research design” in your own words?
- What is meant by the term “research validity”?
- What are the two requirements for research validity?
- How would you define “internal validity” in your own words?
- Would random assignment to different groups increase, decrease, or have no effect on the internal validity of a study? Why?
- Are the directionality and third-variable problems related to internal validity? If so, how are they related?
- How would you define “external validity” in your own words?
- Would random assignment to different groups increase, decrease, or have no effect on the external validity of a study? Why?
- Are the directionality and third-variable problems related to external validity? If so, how are they related?
- What was Joseph Goldberger’s theory of pellagra?
- Goldberger was more concerned with maximizing which kind of validity in his research? Why?
- How did Goldberger show that pellagra was much more likely to be caused by inadequate nutrition than by an organism transmitted through bodily fluids?
- Why would we conclude that Milgram’s studies had high internal validity?
- Let’s say that we want to show that a new medication reduces high levels of cholesterol in people over 40. We perform a study of 1000 men over 40, 500 of whom receive the medication and 500 of whom receive a placebo. The study is performed in a double-blind manner (that is, neither the researchers nor the participants know who is getting the medication). Which type of validity would be lower in this study: internal or external? Why?
- Variations in which two factors determine the degree to which a study is externally valid?
- Which factor was Milgram most concerned with when he tested people from the surrounding community in addition to undergraduate students?
- Given that they reduce external validity, why are artificial research conditions created in laboratory settings?
Bronfenbrenner, U., & Mahoney, M. (Eds.). (1975). Influences on human development. Hinsdale, IL: Dryden Press.
Brown University News Service. (2002, March). Antidepressant drug trials turn away most of the depressed population. Retrieved November 24, 2011, from http://www.brown.edu/Administration/News_Bureau/2001-02/01-091.html
Chase, A. (1977). The legacy of Malthus: The social costs of the new scientific racism. New York: Alfred Knopf.
Kraut, A. (n.d.). Dr. Joseph Goldberger & the war on pellagra. Office of NIH History. Retrieved November 24, 2011, from http://history.nih.gov/exhibits/goldberger/index.html
Milgram, S. (1974). Obedience to authority: An experimental view. New York: Harper & Row Publishing.
Stanovich, K. E. (2004). How to think straight about psychology (7th ed.). Boston: Pearson.