

Limitations Of Survey Research And Problems With Interpretations

Surveys obtain information by asking people questions. Those questions are designed to measure some topic of interest. We want those measurements to be as reliable and valid as possible, so that we can have confidence in the findings and in our ability to generalize beyond the current sample and setting (i.e., external validity). Reliability refers to the extent to which questions evoke reproducible or consistent answers from the respondent (i.e., random measurement error is minimized). Validity refers to the extent to which the questions actually measure what we want them to measure (i.e., nonrandom measurement error is minimized). The relationship between reliability and validity can be seen intuitively using the metaphor of a target containing a series of concentric rings extending from the "bull's-eye" (Trochim). A reliable and valid measure would look like a tightly clustered group of shots all in the bull's-eye; a reliable but invalid measure would look like a tightly clustered group of shots at the target periphery; a valid but unreliable measure would look like a scattering of shots all over the target; and an unreliable and invalid measure would look like a scattering of shots across only one side of the target.
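The target metaphor can be sketched numerically. The following is a minimal illustration, not a method from the sources cited here: it assumes that each answer equals a true value plus a systematic offset (nonrandom error, which harms validity) plus random noise (which harms reliability). The function names, the true value of 50, and the offset and noise sizes are all hypothetical.

```python
import random
import statistics

def simulate_measure(true_value, bias, noise_sd, n=1000, seed=0):
    """Return n simulated scores: truth plus systematic bias plus random noise."""
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0, noise_sd) for _ in range(n)]

def summarize(scores, true_value):
    """Bias is the distance of the cluster from the bull's-eye; spread is its scatter."""
    return {
        "bias": statistics.fmean(scores) - true_value,
        "spread": statistics.stdev(scores),
    }

TRUE_VALUE = 50.0  # hypothetical "bull's-eye"
scenarios = {
    "reliable and valid": simulate_measure(TRUE_VALUE, bias=0.0, noise_sd=1.0),
    "reliable but invalid": simulate_measure(TRUE_VALUE, bias=10.0, noise_sd=1.0),
    "valid but unreliable": simulate_measure(TRUE_VALUE, bias=0.0, noise_sd=10.0),
    "unreliable and invalid": simulate_measure(TRUE_VALUE, bias=10.0, noise_sd=10.0),
}

for label, scores in scenarios.items():
    s = summarize(scores, TRUE_VALUE)
    print(f"{label}: bias={s['bias']:.2f}, spread={s['spread']:.2f}")
```

In this sketch a small spread corresponds to a tight cluster of shots, and a bias near zero corresponds to a cluster centered on the bull's-eye.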

Table 1. Selected features of nine recent large-scale surveys. Source: Author.


At the root of these measurement issues is how the survey questions are asked. Careful crafting of survey questions is essential, and even slight variations in wording can produce rather different results. Consider one of the most commonly studied issues in aging: activities of daily living (ADLs). ADLs are the basic tasks of everyday life, such as eating, dressing, bathing, and toileting. ADL questions are presented in stages, asking first whether the respondent has any difficulty performing the task alone and without the use of aids. If any difficulty is reported, the respondent is then asked how much difficulty he or she experiences, whether any help is provided by another person or by an assisting device, how much help is received or how often the assisting device is used, and who that person is and what that device is.

Surprisingly, estimates of the prevalence of ADL difficulties among older adults vary by as much as 60 percent from one national study to another. In addition to variations in sampling design, Wiener, Hanley, Clark, and Van Nostrand report that differences in the prevalence estimates result from which specific ADLs the respondents are asked about, how long the respondent must have had the ADL difficulty before it counts, how much difficulty the respondent must have, and whether the respondent must receive help to perform the ADL. Using results from a single study in which different versions of ADL questions were asked of the same respondents, Rodgers and Miller (1997) have shown that the prevalence rate can range from a low of 6 percent to a high of 28 percent. With those same data, Freedman has found that the prevalence of one or more ADL difficulties varies from 17 percent to nearly 30 percent depending on whether the approach reflects residual difficulty (i.e., even with help or the use of an assisting device) or underlying difficulty (i.e., without help or the use of an assisting device).
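The effect of the definition on the estimate can be made concrete with a toy calculation. This sketch uses ten made-up respondent records, not data from any of the studies cited, and the names `respondents` and `prevalence` are hypothetical. It only shows how counting underlying versus residual difficulty changes the prevalence computed from the same respondents.

```python
# Made-up records for illustration only. Each tuple is:
# (has difficulty without help or aids, has difficulty even with help or aids)
respondents = [
    (False, False), (False, False), (False, False), (False, False),
    (False, False), (False, False), (False, False),
    (True, False),  # difficulty is resolved by help or an assisting device
    (True, True),   # difficulty persists even with help
    (True, True),
]

def prevalence(records, definition):
    """Share of respondents counted as having ADL difficulty under a definition."""
    if definition == "underlying":
        flags = [without_help for without_help, _ in records]
    elif definition == "residual":
        flags = [with_help for _, with_help in records]
    else:
        raise ValueError(f"unknown definition: {definition}")
    return sum(flags) / len(flags)

underlying = prevalence(respondents, "underlying")
residual = prevalence(respondents, "residual")
print(f"underlying difficulty: {underlying:.0%}")
print(f"residual difficulty:   {residual:.0%}")
```

Because anyone with residual difficulty also has underlying difficulty, the underlying definition can never yield the lower of the two estimates.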

A related concern is the correspondence between self-reported ADL abilities and actual performance levels. Although there are obvious drawbacks to direct observation of ADLs (including privacy), performance-based assessments of lower- and upper-body physical abilities can be conducted in personal interviews. Examples for the upper body include assessing grip strength using hand-held dynamometers, the ability to hold a one-gallon water jug at arm's length, and the ability to pick up and replace pegs in a pegboard, while examples for the lower body include measured and timed walks, standing balance tests, and repeated chair stands. Simonsick and colleagues have shown that carefully crafted questions eliciting self-reports of lower- and upper-body physical abilities are generally consistent with performance-based assessments of the same respondents.

Even when reliable and valid questions are asked, there can still be serious problems due to missing data. Missing data comes in three varieties: people who refuse to participate (the issue of response rates), questions that are left unanswered (the issue of item missing values), and, in longitudinal studies, respondents who are lost to follow-up (the issue of attrition). The problem is that missing data results in (1) biased findings if the people for whom data is missing are systematically different, (2) inefficient statistical estimates due to the loss of information, and (3) increased analytic complexity because most statistical procedures require that each case have complete data (Little and Schenker). Methods to deal with missing data include naive approaches like unconditional mean imputation (i.e., substituting the overall sample mean) and sophisticated methods like expectation-maximization algorithms or multiple imputation procedures. The utility of these methods depends on whether the data is missing completely at random or whether it reflects a nonignorable pattern. The latter requires use of the more sophisticated approaches.
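A small simulation can show why unconditional mean imputation is considered naive. This is a sketch under stated assumptions rather than a procedure from Little and Schenker: the scores are simulated, the missingness rates are arbitrary, and the helper `mean_impute` is hypothetical. It compares values that are missing completely at random with values that go missing more often when the true score is high.

```python
import random
import statistics

rng = random.Random(42)
# Simulated complete data (hypothetical scores with mean 50, SD 10).
complete = [rng.gauss(50, 10) for _ in range(2000)]

# Missing completely at random: every value has the same 30% chance of being lost.
observed_mcar = [x if rng.random() > 0.3 else None for x in complete]

# Nonignorable pattern: high scores (above 55) are lost 60% of the time.
observed_nonignorable = [
    None if (x > 55 and rng.random() < 0.6) else x for x in complete
]

def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed_mean = statistics.fmean(v for v in values if v is not None)
    return [observed_mean if v is None else v for v in values]

imputed_mcar = mean_impute(observed_mcar)
imputed_nonignorable = mean_impute(observed_nonignorable)

for label, data in [
    ("complete data", complete),
    ("mean-imputed, completely at random", imputed_mcar),
    ("mean-imputed, nonignorable", imputed_nonignorable),
]:
    print(f"{label}: mean={statistics.fmean(data):.2f}, sd={statistics.stdev(data):.2f}")
```

In this setup the imputed mean stays close to the complete-data mean only when values are missing completely at random, and in both cases imputation shrinks the spread, because every imputed case sits exactly at the mean.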

The most important limitation of surveys has to do with internal validity, or the establishment of causal relationships between an independent variable (the cause, denoted by X) and a dependent variable (the effect, denoted by Y). There are three fundamental criteria for demonstrating that X is a probabilistic cause of Y (Suppes): (1) the probability of Y given that X has occurred must be greater than the probability of Y in the absence of X; (2) X must precede Y in time; and (3) the probability of X must be greater than zero. Implicit in the first criterion is the presence of a comparison group. Several threats to internal validity exist that constitute rival hypotheses for the explanation that X causes Y (Campbell and Stanley). When well designed and administered, the classic two-group experimental design eliminates these threats because assignment to either the experimental or control group is randomly determined and both groups are measured before and after the experimental group is exposed to X. Therefore, the potential threats to internal validity are equivalent for both the experimental and control groups, so that any difference between the two groups' before-and-after changes is attributable solely to the experimental group's exposure to X. Thus, experimental designs meet the criteria for probabilistic causation.
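The three criteria for probabilistic causation can be written compactly. The notation below is an assumption made for illustration and is not taken from Suppes: $\neg X$ denotes the absence of X, and $t_X$ and $t_Y$ denote the times at which X and Y occur.

```latex
\begin{align*}
&\text{(1)}\quad P(Y \mid X) > P(Y \mid \neg X) \\
&\text{(2)}\quad t_X < t_Y \\
&\text{(3)}\quad P(X) > 0
\end{align*}
```

Criterion (1) compares two conditional probabilities, which is why a comparison group that was not exposed to X is needed to evaluate it.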

In survey research, however, this is not the case because assignment to the experimental versus control group has not been randomized and the time sequence has not been manipulated. Therefore, the survey researcher must make the case that the causes are antecedent to the consequences, and that the groups being compared were otherwise equivalent. The former is often only addressable by logic, and the latter is only addressable by matching the groups being compared on known risk factors, or by statistically adjusting for known risk factors. In contrast, well-performed randomization creates equivalence on everything, whether it is known or not. That is why survey-based research traditionally includes numerous covariates in an attempt to resolve the problem of potential confounders. Basically, survey researchers must rule out all competing explanations of the observed relationship between X and Y in order to suggest (but not demonstrate) that a causal relationship exists.
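Statistical adjustment for a known risk factor can be sketched with simulated data. This is an illustrative example under stated assumptions, not an analysis from the source: a single known risk factor `z` raises the probability of both the exposure `x` and the outcome `y`, while `x` itself has no effect on `y`. All probabilities and function names are hypothetical. The adjustment shown is simple stratification, comparing exposed and unexposed respondents within each level of `z` and then averaging.

```python
import random

rng = random.Random(7)

def simulate(n=20000):
    """Simulate respondents as (z, x, y) where only z influences y."""
    rows = []
    for _ in range(n):
        z = rng.random() < 0.5                  # known risk factor
        x = rng.random() < (0.7 if z else 0.2)  # exposure is more common when z is present
        y = rng.random() < (0.4 if z else 0.1)  # outcome depends on z only, not on x
        rows.append((z, x, y))
    return rows

def risk_difference(rows):
    """Estimate P(Y | X) - P(Y | not X) from the rows given."""
    exposed = [y for _, x, y in rows if x]
    unexposed = [y for _, x, y in rows if not x]
    return sum(exposed) / len(exposed) - sum(unexposed) / len(unexposed)

def adjusted_risk_difference(rows):
    """Average the within-stratum risk differences, weighted by stratum size."""
    total = len(rows)
    result = 0.0
    for level in (False, True):
        stratum = [row for row in rows if row[0] == level]
        result += len(stratum) / total * risk_difference(stratum)
    return result

rows = simulate()
crude = risk_difference(rows)
adjusted = adjusted_risk_difference(rows)
print(f"crude risk difference:    {crude:.3f}")
print(f"adjusted risk difference: {adjusted:.3f}")
```

In this setup the crude comparison suggests that X raises the probability of Y, while the stratified comparison is close to zero. The adjustment works only because `z` was measured; an unmeasured risk factor would leave the same kind of distortion in place, which is the gap that randomization closes.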

Given the limitations of surveys that have been mentioned in this entry, one might ask why surveys are conducted at all. There are several important reasons. Surveys gather data about relationships between people, places, and things as they exist in real-world settings. Those relationships cannot all be examined in laboratory experiments. Moreover, surveys allow the collection of data about what people think and feel, and facilitate the collection of information in great breadth and depth. Surveys are also very cost-efficient. Finally, surveys are an excellent precursor for planning and designing experimental studies. Thus, despite their limitations, surveys are and will continue to be a major source of high-quality information with which to explore the aging process.
