Methodological Progress Note: Handling Missing Data in Clinical Research
© 2019 Society of Hospital Medicine
Research in hospital medicine often leverages data collected for reasons other than research. For example, electronic medical record data or patient satisfaction survey results can be used to answer questions that are relevant to the practice of hospital medicine. In these types of datasets, some data will inevitably be missing. Such missing data can compromise our ability to draw definitive conclusions from a study. This review introduces the concept of missing data, describes patterns and mechanisms of missing data, and discusses common approaches for handling missing data, including sensitivity analyses for determining how robust the results are to the assumptions made about the missing data.
CONSEQUENCES OF MISSING DATA
Missing data create a host of problems for researchers. First, missing data result in a loss of information and can diminish the power of the proposed study. Second, incomplete data complicate the analysis because many standard software procedures were developed for fully observed or “complete” data (ie, each subject has a value for all measures of interest). Finally, missing data may introduce bias when the observed and the unobserved data differ systematically. For example, if men are less likely than women to complete all questions in a patient satisfaction survey when they are not satisfied, then hospital satisfaction analyses that rely on completed surveys would tend to overestimate the satisfaction men have with their care.
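The bias described in the survey example can be illustrated with a small simulation. The sketch below is not drawn from any real survey: the satisfaction rate (70%) and the response probabilities (90% for satisfied men, 50% for dissatisfied men) are hypothetical values chosen only for illustration. Under these assumptions, the expected satisfaction rate among completed surveys is 0.63/0.78, or roughly 81%, compared with a true rate of 70%.

```python
import random


def simulate_survey(n=100_000, p_satisfied=0.70,
                    p_respond_if_satisfied=0.90,
                    p_respond_if_dissatisfied=0.50, seed=0):
    """Return (true satisfaction rate, rate among completed surveys).

    All probabilities are hypothetical illustration values.
    """
    rng = random.Random(seed)
    satisfied_total = 0      # satisfied patients among everyone surveyed
    completed = 0            # surveys that were fully completed
    satisfied_completed = 0  # satisfied patients among completed surveys
    for _ in range(n):
        satisfied = rng.random() < p_satisfied
        # Dissatisfied patients are assumed less likely to complete the survey.
        p_respond = (p_respond_if_satisfied if satisfied
                     else p_respond_if_dissatisfied)
        satisfied_total += satisfied
        if rng.random() < p_respond:
            completed += 1
            satisfied_completed += satisfied
    return satisfied_total / n, satisfied_completed / completed


true_rate, complete_case_rate = simulate_survey()
print(f"True satisfaction rate:        {true_rate:.3f}")
print(f"Rate among completed surveys:  {complete_case_rate:.3f}")
```

The gap between the two printed rates is the bias introduced by analyzing only the completed surveys; it shrinks toward zero as the two response probabilities approach each other.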
MINIMIZING MISSING DATA WITH STUDY DESIGN
The ideal approach to mitigating problems caused by missing data is to anticipate them and build strategies for minimizing missing data into the study design (ie, when planning data collection protocols for prospective studies). This plan should provide strategies for minimizing nonresponse and for estimating the magnitude of anticipated missing data, to ensure that the study achieves sufficient statistical power despite the missing data.
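One common way to account for anticipated missing data at the design stage is to inflate the required sample size by the expected proportion of incomplete records. The sketch below assumes a complete-case analysis and assumes that missingness is unrelated to the values being measured; it restores the nominal sample size but does nothing to address bias. The function name and the numbers used are hypothetical.

```python
def inflate_sample_size(n_complete, missing_percent):
    """Enrollment needed so that about n_complete records remain complete.

    n_complete      -- sample size required by the power calculation
    missing_percent -- anticipated percentage of incomplete records (0-99)

    Uses integer arithmetic to compute ceil(n_complete / (1 - missing/100)).
    """
    if not 0 <= missing_percent < 100:
        raise ValueError("missing_percent must be in the range 0-99")
    remaining_percent = 100 - missing_percent
    # Ceiling division without floating-point rounding error.
    return -(-n_complete * 100 // remaining_percent)


# Hypothetical example: 200 complete records needed, 20% expected missing.
n_to_enroll = inflate_sample_size(200, 20)
print(n_to_enroll)
```

With these inputs, 250 participants would need to be enrolled, since 80% of 250 is 200.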
Strategies for minimizing nonresponse include (1) informing potential study participants, at initial contact, about the implications of missing data for the ability to answer the research question; (2) collecting several phone numbers, addresses, the preferred method of contact and calling times, as well as an alternative contact in case the primary study contact cannot be reached; (3) specifying the number of call backs, as well as the time of contact; and (4) piloting data capture questions for phrasing, clarity, and sensitivity in order to resolve problems before initiating the study.
One approach that can be used to mitigate the impact of missing data in surveys is to contact a sample of the initial nonrespondents using a more intensive follow-up approach (eg, a nonresponse to a mailed survey is followed up by a telephone call in order to conduct the survey again over the phone). This approach is referred to as “nonresponse two-phase sampling.” The additional data captured in the second phase not only reduce the nonresponse rate but can also provide important information on the missing data mechanism.1,2 In longitudinal studies with dropouts, one can measure participants’ intent to drop out in order to evaluate how much the probability of dropping out depends on missing responses.3 One may also evaluate the study’s power and the sample size implications under different assumptions about the missing data.4
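To illustrate how second-phase data from nonresponse two-phase sampling can be used, the sketch below combines first-phase respondents with a followed-up subsample of nonrespondents by weighting each group by its share of the original sample. This is one classical estimator for this design, shown here only as an illustration and not as the method used in the cited references. It assumes that the followed-up nonrespondents are a random subsample and that all of them respond in the second phase. The function name and the data are hypothetical.

```python
def two_phase_mean(phase1_responses, n_nonrespondents, phase2_responses):
    """Estimate the overall mean from a two-phase nonresponse design.

    phase1_responses -- values from first-phase respondents
    n_nonrespondents -- number of first-phase nonrespondents
    phase2_responses -- values from the followed-up subsample of nonrespondents
    """
    n_respondents = len(phase1_responses)
    n_total = n_respondents + n_nonrespondents
    mean_phase2 = sum(phase2_responses) / len(phase2_responses)
    # The subsample mean stands in for all first-phase nonrespondents.
    estimated_total = sum(phase1_responses) + n_nonrespondents * mean_phase2
    return estimated_total / n_total


# Hypothetical data: 1 = satisfied, 0 = not satisfied.
mailed_responses = [1] * 48 + [0] * 12   # 60 of 100 mailed surveys returned
phone_followup = [1] * 5 + [0] * 5       # 10 of the 40 nonrespondents reached

naive_estimate = sum(mailed_responses) / len(mailed_responses)
adjusted_estimate = two_phase_mean(mailed_responses, 40, phone_followup)
print(f"Respondents only:   {naive_estimate:.2f}")
print(f"Two-phase estimate: {adjusted_estimate:.2f}")
```

In this made-up example, the difference between the respondent-only estimate and the two-phase estimate also hints at the missing data mechanism: the followed-up nonrespondents are less satisfied than the initial respondents, which suggests that nonresponse is related to satisfaction.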