A recent graduate of a large internal medicine residency program, Dr. Smith is beginning her infectious disease fellowship at a top-notch program. As a person who has lived with HIV her entire life, she has a particular interest in infectious diseases, specifically HIV and HIV transmission. She chooses to focus the research project required for her fellowship on HIV transmission to neonates.
As both a physician and a patient, she appreciates the importance of privacy, and she chooses to pursue a fellowship in a program that shares her views on privacy and confidentiality. In fact, her program has recently purchased and implemented a new electronic medical record system for patient data.
Dr. Smith is most interested in the soon-to-be mothers and their unborn children. Over the course of her first few months in fellowship, she begins to collect data on HIV-positive mothers and retrieves information regarding the HIV status of their newborn children through her hospital’s electronic health records. While HIV incidence and fetal transmission are well known from a public health standpoint, no specific data exist for her particular community or the hospital’s patients. Although newborn HIV testing is mandatory at her hospital, treatment of an HIV-positive mother is not. With the data she gathers, Dr. Smith hopes to provide more accurate information to her future patients about the incidence of HIV transmission to newborns in her community.
A few months after beginning data collection, Dr. Smith is approached by one of her colleagues, who has become aware of the project and is concerned that it is unethical to collect and compile existing data without obtaining informed consent from the participants.
The hypothetical Dr. Smith has fallen into the trap that has led many past investigators to violate ethical principles in pursuit of scientific goals. Dr. Smith has a disease and she wants to protect others from getting it by collecting clinical data. She can do that without troubling the subjects because the data already exist. All she has to do is to look at patient records. The patients actually come to the clinic where she works. Although they may not be her own patients, they are hers inasmuch as she is a professional caregiver in the clinic. As a result of the project, Dr. Smith will be able to give her patients better information about HIV transmission from pregnant women to their newborns.
Given such laudable aims, why do we have ethical concerns about Dr. Smith’s project?
Scientific interest is not a justification for violating ethical principles of autonomy and nonmaleficence. The acts of Nazi doctors in the 1930s arose originally from scientific queries rather than political motives [1]. The research at Tuskegee [2] and other instances of inappropriate research on vulnerable populations [3] were largely undertaken out of scientific curiosity.
Mining electronic medical records for data might be viewed as less harmful than the egregious insults on vulnerable subjects, but Dr. Smith’s project involves a violation of privacy and, as such, of patients’ autonomy. That is why there are clear and distinct ethical, professional, and legal guidelines for the collection and use of data from medical records.
Dr. Smith might argue that her project is more along the lines of a patient-care registry. Registries are useful quality-improvement tools in clinical care, particularly for patients with chronic conditions. Registries made from electronic medical records are one of the “meaningful use” objectives of new health care reform legislation. Dr. Smith, however, is mining the medical records to complete her fellowship requirements, not principally to improve patient care.
Restarting the Project Properly
Clear guidelines exist for initiating a project in data mining. First, Dr. Smith must inquire whether her clinic or institution has procedures in place for mining electronic medical records. She should determine whether the clinic’s patient consent forms for medical care include the provision that registries for patients with particular medical conditions may be made or electronic data searches may be performed. She should scrutinize those procedures and consents to make certain that the records of patients who declined inclusion in the registry or searches are left out. If Dr. Smith does not find adequate procedures for inclusion and exclusion in electronic data mining at her institution, she should work to put them in place. That would be a superb project for a beginning investigator.
If her clinic has appropriate procedures, they will include oversight by an institutional review board (IRB) for research activities, including both data mining and interventional research. Any investigator or educational program requiring research should be well-versed in IRB policies for medical record reviews.
Collection of data from medical records for research purposes, specifically the creation of a database, is permitted under criteria established in the Code of Federal Regulations (CFR) [4]. If the data were collected solely for nonresearch purposes, such as medical treatment or diagnosis, the project will meet category 5 for expedited review by the IRB [5].
IRB approval of creation of a registry does not, however, provide approval for using data from the registry in other research projects. Each project is considered a separate research study, and each study needs IRB approval.
Dr. Smith’s project involves protected health information, which can be used to identify an individual. The Health Insurance Portability and Accountability Act of 1996 (HIPAA) lists 18 individual identifiers, including names, medical record numbers, social security numbers, license or beneficiary numbers, all dates related to identification of an individual except year of birth, and any address information more specific than state [6]. Because of the sensitive nature of the data that Dr. Smith is collecting, the IRB may require that she apply for a Certificate of Confidentiality (COC) from the Department of Health and Human Services [7]. The COC will protect the researcher and institution from being compelled to disclose information that would identify research subjects in any civil, criminal, administrative, legislative, or other proceeding, whether federal, state, or local. Researchers can apply for a COC if data collected in a study have the potential to cause adverse financial, employment, insurability, or reputational consequences for the subject if information is disclosed.
Once the registry has received IRB approval, Dr. Smith or other investigators can apply for expedited review or an exemption certification. These much simpler applications permit investigators to proceed with IRB approval without having to apply for complete reviews. The exemption certification can be granted by the IRB if the project is studying existing data (i.e., data from the registry) or if the information is recorded by the investigator in such a manner that subjects cannot be identified directly or indirectly.
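As an illustration of recording information so that subjects cannot be identified, the following is a minimal Python sketch that strips some of the identifier types named above from a single record. The field names and the sample record are hypothetical, the sketch covers only a handful of the 18 HIPAA identifiers, and it is not a substitute for an institution's approved de-identification procedure or for IRB review.

```python
from datetime import date

# Hypothetical field names; a real record system will use its own schema.
DIRECT_IDENTIFIERS = {
    "name",
    "medical_record_number",
    "social_security_number",
    "license_number",
    "beneficiary_number",
}


def deidentify(record: dict) -> dict:
    """Return a copy of `record` with the listed identifiers removed.

    Assumptions of this sketch: the birth date is reduced to a year,
    every other date is dropped, and an address is reduced to its state.
    """
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # drop direct identifiers entirely
        if field == "birth_date" and isinstance(value, date):
            cleaned["birth_year"] = value.year  # keep only the year of birth
        elif isinstance(value, date):
            continue  # drop all other dates
        elif field == "address" and isinstance(value, dict):
            cleaned["state"] = value.get("state")  # nothing finer than state
        else:
            cleaned[field] = value
    return cleaned


# Hypothetical sample record, for illustration only.
sample = {
    "name": "Jane Doe",
    "medical_record_number": "12345",
    "birth_date": date(1985, 3, 2),
    "admission_date": date(2010, 5, 1),
    "address": {"street": "1 Main St", "city": "Springfield", "state": "NY"},
    "infant_hiv_status": "negative",
}
deidentified = deidentify(sample)
```

Removing fields in this way addresses only direct identification. Whether a subject can still be identified indirectly, for example through a rare combination of remaining values, is a judgment for the IRB and the institution's privacy officer.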
Investigators eager to explore databases should be aware that preexisting de-identified data are available in the public domain, with safeguards to ensure appropriate and ethical use [8, 9]. They are not free, but selected information can be obtained at a reasonable cost. Mining of local or combined data sets is a legitimate research activity that can be accomplished with adherence to regulations, cognizance of the reasons for their development, and proper respect for subjects.
1. Braund J, Sutton DG. The case of Heinrich Wilhelm Poll (1877-1939): a German-Jewish geneticist, eugenicist, twin researcher, and victim of the Nazis. J Hist Biol. 2008;41(1):1-35.
2. Thomas SB, Quinn SC. The Tuskegee Syphilis Study, 1932 to 1972: implications for HIV education and AIDS risk education programs in the black community. Am J Public Health. 1991;81(11):1498-1505.
3. Frieden TR, Collins FS. Intentional infection of vulnerable populations in 1946-1948: another tragic history lesson. JAMA. 2010;304(18):2063-2064.
4. US Department of Health and Human Services. Protection of human subjects. 45 CFR sec 46. http://ohsr.od.nih.gov/guidelines/45cfr46.html. Accessed February 16, 2011.
5. Code of Federal Regulations. Institutional review boards. 21 CFR 56 sec 110. http://edocket.access.gpo.gov/cfr_2002/aprqtr/21cfr56.110.htm. Accessed February 16, 2011.
6. Centers for Disease Control. HIPAA privacy rule and public health: guidance from CDC and the US Department of Health and Human Services. MMWR. 2003;52 Suppl 1:1-17, 19-20. http://www.cdc.gov/mmwr/preview/mmwrhtml/m2e411a1.htm. Accessed February 16, 2011.
7. US Department of Health and Human Services. Certificates of confidentiality kiosk. http://grants.nih.gov/grants/policy/coc/. Accessed February 16, 2011.
8. Smith A, Steinman M. Datasets for research on hospitalized adults. SGIM Forum. 2010;33(12):7. http://www.sgim.org/userfiles/file/SGIM%20December%202010.pdf. Accessed February 16, 2011.
9. Society of General Internal Medicine. Dataset compendium. http://www.sgim.org/go/datasets. Accessed February 16, 2011.