Should Electronic Health Record-Derived Social and Behavioral Data Be Used in Precision Medicine Research?

Brittany Hollister; Vence L. Bonham

doi:10.1001/amajethics.2018.873.

Medicine and Society

Sep 2018

Brittany Hollister, PhD and Vence L. Bonham, JD

AMA J Ethics. 2018;20(9):E873-880. doi: 10.1001/amajethics.2018.873.

Abstract

Precision medicine research initiatives aim to use participants’ electronic health records (EHRs) to obtain rich longitudinal data for large-scale precision medicine studies. Although EHRs vary widely in their inclusion and formatting of social and behavioral data, these data are essential to investigating genetic and social factors in health disparities. We explore possible biases in collecting, using, and interpreting EHR-based social and behavioral data in precision medicine research and their consequences for health equity.

Social and Behavioral Data in Precision Medicine

“Precision medicine,” “individualized medicine,” and “personalized medicine” are terms used to describe the approach to health care that considers a broad range of data types to determine the unique treatment and care needs of an individual. While genomic variation has been a major focus of precision medicine, a number of programs recognize that moving toward truly personalized health care requires an understanding of biological, environmental, social, and behavioral determinants of health in addition to genomic data.^1-7 Social and behavioral data cover a large range of information but generally can be grouped into 4 categories: demographic, lifestyle and behavioral, psychosocial, and geographic.⁸ The body of scientific research shows that inequalities in social conditions are fundamental causes of population health differences.^9-11 Social and behavioral data are important in demonstrating the role of social conditions in these health differences. For example, factors such as substance use, eating habits, activity levels, and risk-taking behaviors account for approximately 40% to 50% of the risk associated with preventable premature deaths in the US.^12,13

Currently, a number of large-scale cohort initiatives are collecting social and behavioral data for use in research.^2-7 Until recently, these data have come from participant surveys and other retrospective self-report methods.² However, many precision medicine research programs utilize electronic health record (EHR) data, as EHRs contain rich longitudinal and detailed phenotype data collected through patients’ visits.^14,15 For research programs to improve health outcomes and address health disparities, social and behavioral data must be accurately collected from patients and be retrievable from EHRs.¹⁶ Currently, extraction and use of these data present challenges due to inconsistencies across EHRs and inaccuracies in recorded data. Unless these challenges are addressed, EHR-derived social and behavioral data could limit the usefulness and applicability of precision medicine research.

Thoughtful inquiry and expansive discourse on the limitations of EHR-derived social and behavioral data are necessary if precision medicine research initiatives are to avoid inadvertent harm. What data are included or excluded from EHRs that can impact the rigor of precision medicine research? How does bias occur in collecting, using, and interpreting social and behavioral data? What are the possible consequences of interpreting and using data gathered through biased collection methodologies? Grappling with these questions can help promote better understandings of the data’s limitations and help inform strategies to reduce the data’s misapplication by researchers.

Social and Behavioral Data Collection in EHRs

Recognizing the importance of formally and systematically capturing social and behavioral measures, the National Academy of Medicine (NAM) (formerly the Institute of Medicine) recommended social environment measures’ inclusion in EHRs.⁸ Specifically, the NAM recommends intentional collection of structured social environment data, which in turn would encourage standardization of such data across patients, thereby reducing the probability of undesired bias. The NAM also recommends that a plan be developed by the National Institutes of Health to expand the use of EHRs in research by including social and behavioral data.⁸ Recognition by the NAM of the need to incorporate social data in clinical care, together with the importance of these data in precision medicine research, is likely to accelerate inclusion of these data in EHRs.

Across most EHR platforms, however, patients’ social and behavioral data are not consistently collected. These data are often unstructured and highly variable,⁵ and their inclusion is at health care professionals’ discretion.⁶ An example of data from a hypothetical patient’s EHR is provided in the figure.

Figure. Example of Social and Behavioral Data That Might Be Included in a Patient’s EHR

Variation in the content and completeness of social and behavioral data in EHRs is problematic for precision medicine research because the quality of the research is limited by the quality of the data. Despite challenges of uniform collection of social and behavioral data, methods are being developed to extract these data for use in large-scale precision medicine studies.¹⁷ As precision medicine research programs begin to utilize these data, it is important to consider the potential harms of their exclusion from EHRs or their misuse by researchers.

Limitations of Research with EHR Data

Limited patient participation. While EHR-derived social and behavioral data have potential to contribute to our understanding of multifactorial causes of health outcomes, which is one goal of precision medicine, these data require special consideration because they are commonly not self-reported by participants, who perceive such data as sensitive. Rather, clinicians normally record these data in patients’ EHRs. Potential study participants’ reluctance to provide ongoing access to their EHRs, owing to privacy concerns, has been identified as a barrier to recruitment in precision medicine research programs.¹⁸

To encourage patient participation in precision medicine research programs that use EHR-derived social and behavioral data, it is important that researchers engage individuals as partners rather than only as prospective human subjects. Transparent communication and education about how participants’ information will be protected, deidentified, and used are imperative for maintaining trust so that individuals are more inclined to participate.^15,19 It is important that potential participants understand how their data might be used, the limitations of privacy protections, and other potential risks.²⁰

Because many physicians have established trust with their patients, they wield considerable influence on their patients’ decisions to participate in research. Trust between clinical professionals and their patients creates an environment wherein patients will be more likely to share their social and behavioral data for use in research. Consequently, physicians are likely to be key conduits for participant recruitment in precision medicine research programs.²¹ It is therefore up to these programs to develop relationships with physicians and to keep them informed of, and involved in, the research.

Biases in collecting and analyzing EHR-derived social and behavioral data. Bias is present throughout the research process, from the recording of data to the interpretation of results. Decisions about which information to record in EHRs can lead to bias in the type of data available and affect the accuracy and completeness of what is recorded. For example, health care professionals, who vary in the content and completeness of data they include in EHRs,⁶ could be influenced by discussions of social and behavioral health indicators with patients, possibly unconsciously biasing available social and behavioral data. Because of data recording inconsistencies, important social and behavioral data could be missing from EHRs.¹⁷

Inclusion or exclusion of data from precision medicine studies can confound results or lead to misleading research conclusions, which can be harmful in studies of diseases marked by health disparities.² For example, in Non et al’s study of blood pressure, inclusion of education in the prediction model eliminated the association between genetic ancestry and blood pressure, since education was associated with both the predictor (genetic ancestry) and the outcome (blood pressure) variables.²² Exclusion of social and behavioral data from future precision medicine studies could generate misleading observations or spurious correlations between predictor and outcome variables.
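The confounding pattern described above can be illustrated with a small simulation. This is a minimal sketch on synthetic data, not a reanalysis of Non et al's study: the variable names, effect sizes, and sample size are all assumptions chosen for illustration. Blood pressure is generated from education alone, while the ancestry score is merely correlated with education.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance; cov(x, x) is the sample variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def unadjusted_slope(x, y):
    """Slope from a simple regression of y on x."""
    return cov(x, y) / cov(x, x)


def adjusted_slope(x, z, y):
    """Coefficient on x from an ordinary least squares fit of y on x and z."""
    denom = cov(x, x) * cov(z, z) - cov(x, z) ** 2
    return (cov(x, y) * cov(z, z) - cov(z, y) * cov(x, z)) / denom


# All values below are hypothetical and standardized for illustration.
rng = random.Random(0)
n = 5000
education = [rng.gauss(0, 1) for _ in range(n)]
# The ancestry score is correlated with education (an assumption of this sketch).
ancestry = [0.6 * e + rng.gauss(0, 1) for e in education]
# Blood pressure depends on education only; ancestry has no direct effect.
blood_pressure = [120 - 4 * e + rng.gauss(0, 5) for e in education]

crude = unadjusted_slope(ancestry, blood_pressure)
adjusted = adjusted_slope(ancestry, education, blood_pressure)
print(f"ancestry slope without education: {crude:.2f}")
print(f"ancestry slope with education:    {adjusted:.2f}")
```

In this sketch the model that omits education shows a clear association between the ancestry score and blood pressure, while the model that includes education shows an ancestry coefficient near zero. A study whose EHR data lacked the education variable would have no way to make that correction.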

Beyond biases in the recording of social and behavioral data, there can be biases in the use of these data by precision medicine researchers. When social and behavioral data must be extracted from unstructured free text rather than from structured fields of EHRs, methods such as text-mining algorithms are necessary. However, biases in the algorithm training data sets (for example, overrepresentation of one population) can lead to biases in the algorithms themselves, such that the algorithms function well only for the overrepresented population.¹⁷

When social and behavioral data are missing, it can be challenging to determine how to approach large-scale analyses. Some methods for handling missing data, such as imputation, rely on creating new data from patterns in available data. But if the data used in imputation have biases, the imputed data will, too. Furthermore, most imputation methods developed for EHR data focus on clinical data. These methods are powerful but rely on assumptions of relationships between clinical variables such as hemoglobin A1c values and type 2 diabetes.²³ Imputation methods can predict missing hemoglobin A1c values from available clinical data, such as diabetes medication use or fasting glucose measurements, because these values are clearly related.²³ Given that imputation methods are prone to bias, imputed social and behavioral data might not be accurate because the relationships of social and behavioral variables to each other are less defined.
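How imputation can carry a recording bias forward can be sketched with synthetic data. Everything here is hypothetical: the two groups, the recording probabilities, and the choice of smoking status as the variable are assumptions, and filling gaps with the observed rate stands in for the more sophisticated imputation methods used in practice. Both groups have the same true smoking rate, but in group B clinicians record the status mostly when the patient smokes.

```python
import random


def simulate_group(n, p_smoker, p_record_if_smoker, p_record_if_nonsmoker, rng):
    """Return (truth, recorded); a recorded entry is None when it is missing."""
    truth, recorded = [], []
    for _ in range(n):
        smoker = rng.random() < p_smoker
        p_record = p_record_if_smoker if smoker else p_record_if_nonsmoker
        truth.append(smoker)
        recorded.append(smoker if rng.random() < p_record else None)
    return truth, recorded


def impute_with_observed_rate(recorded):
    """Fill each missing entry with the smoking rate among recorded entries."""
    observed = [v for v in recorded if v is not None]
    rate = sum(observed) / len(observed)
    return [rate if v is None else float(v) for v in recorded]


rng = random.Random(42)
n = 10_000
# Group A: status is recorded equally often for smokers and nonsmokers.
truth_a, recorded_a = simulate_group(n, 0.2, 0.8, 0.8, rng)
# Group B: status is recorded far more often when the patient smokes.
truth_b, recorded_b = simulate_group(n, 0.2, 0.9, 0.3, rng)

true_rate_a = sum(truth_a) / n
true_rate_b = sum(truth_b) / n
est_a = sum(impute_with_observed_rate(recorded_a)) / n
est_b = sum(impute_with_observed_rate(recorded_b)) / n

print(f"group A: true rate {true_rate_a:.2f}, estimate after imputation {est_a:.2f}")
print(f"group B: true rate {true_rate_b:.2f}, estimate after imputation {est_b:.2f}")
```

Under these assumptions the estimate for group A stays close to its true rate, while the estimate for group B is inflated well above it, because the imputed values are built from records that were already skewed.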

Another problematic approach is the use of EHR-derived social data without consideration of the social and historical biases inherent in the data’s collection.^24,25 One example from outside of precision medicine is the use of policing data to build models of predictive policing. Data used in these models are based on existing patterns of police activity, which are already skewed due to overpolicing in minority neighborhoods. Therefore, when these models make recommendations for areas that require police monitoring, they utilize data that reinforce patterns of overpolicing.²⁶ Within clinical settings, research has shown that EHR-derived data can be biased for several possible reasons, ranging from differences in physician delivery of care and recording of data to the methods of extracting the data from EHRs.²⁷ When researchers make use of social and behavioral data in EHRs, it is important that they consider and are conscious of potential biases not only in the reporting of data but also in the extraction of data, in analyses, and in interpretation of results.^28,29 Without addressing these considerations, models built on biased EHR-derived social and behavioral data may only reflect biases rather than useful information, as observed in the predictive policing example.

Within EHR research, frameworks for addressing bias for some types of clinical data have been developed; precision medicine researchers can utilize these methods when considering the biases of existing social and behavioral data in EHRs.^30,31 Without carefully accounting for all sources of bias, precision medicine researchers have the potential to exacerbate the existing injustices that underrepresented populations experience.

Conclusion

The inclusion of EHR-derived social and behavioral data in precision medicine research is important to gain a holistic perspective of health. However, biases in the collection and analysis of EHR-derived social and behavioral data can have ethical implications. Researchers must use these data in a manner that will not exacerbate existing injustices in health care. Going forward, the inclusion of structured social and behavioral data in EHRs will aid in the process of reducing biases in documentation.
