Secondary Uses of De-Identified Data and the Avoidance of Harm
The Department of Health and Human Services (DHHS) Office for Human Research Protections (OHRP) has deemed that research on specimens or data that have been delinked from personally identifiable information is not subject to federal regulation related to human subjects [2, 3]. This position is consistent with guidelines for exemption under the Common Rule, which regulates the protection of human subjects in all federally funded research, and the Health Insurance Portability and Accountability Act (HIPAA), which protects against the disclosure of individually identifiable health information. Neither framework, however, provides clarity on the oversight of secondary use of genetic information which, in sufficient quantity, may in and of itself allow re-identification [18, 19]. Treating nominally de-identified DNA samples and/or derived genetic information as exempt from human subjects regulation facilitates the goal of data sharing among researchers and institutions while minimizing the potential for harm to individuals arising from public release of confidential personal information.
However, harms may emerge when group identification is retained with sample collections, leading to stigmatization or other kinds of "group harm" [21, 22]. Individual and group harm may also emerge in the form of a violation of trust when samples are used in research that the original study participants would find objectionable, a form of "dignitary harm". In 1989, for example, 200 Havasupai tribal members provided blood samples for what was described by researchers at Arizona State University as a population-based study of diabetes. Later, the Tribe discovered that the samples were used in a number of other studies involving research on schizophrenia, inbreeding, and human migration. In 2004, the Tribe filed a lawsuit against the Arizona Board of Regents claiming that the original informed consent agreement was violated by these secondary uses. Under current guidelines, the secondary distribution of individually de-identified data was not subject to research oversight, and yet Tribal research participants (both individually and as a group) experienced harm. Moreover, the harm incurred was not simply due to a "breach of contract" (i.e., uses not specified at the time of consent) but arose from the use of samples for research purposes regarded as culturally dissonant and deeply objectionable. In 2010, the Board of Regents agreed to pay $700,000 to tribe members as part of a settlement with the Tribe. In addition, the university agreed to return blood samples, provide assistance in building a health clinic on the Havasupai reservation, and provide educational scholarships for tribal members.
The HGDP Diversity Panel samples are individually de-identified but linked to population of origin and, arguably, certain of the groups represented in the collection have been harmed by findings such as those outlined in Table 2. With respect to the potential for dignitary harm to individuals, there is not enough publicly available information on the terms of informed consent to judge whether the reported research uses are consistent with participants' expectations. Nevertheless, it is not hard to imagine that some contributing participants would regard as objectionable research that attempts to correlate genetic variation with social identity or geographic location, or that implies ethnic differences in addiction, mental illness, or intelligence. Indeed, initial objections to the originally proposed Human Genome Diversity Project (which, as noted above, is largely unrelated to the current collection managed by the CEPH) were based on concerns that samples would be used in these and related ways [7–12].
Implications of HGDP Uses for Research Governance
We acknowledge that the degree to which the research uses described in Table 2 represent a tangible harm to individual research subjects and/or communities is subject to interpretation and disagreement. Our findings are interesting not because of what they say about the secondary uses of the HGDP Diversity Panel per se, but because of what they suggest about the range of research uses that are possible when samples and/or data are rendered exempt from research oversight. Investigators and institutions with primary responsibility for standing biospecimen collections and/or data repositories should recognize that potential harms cannot be altogether avoided by removing individually identifying information. While it may be perfectly legitimate, from a narrow regulatory vantage point, to waive research oversight in such cases, forgoing governance of secondary research uses could prove, in certain cases, ethically inadequate. This will remain true even if all participants have provided explicit permission for broad data sharing and open-ended research use at the time of informed consent.1
It is impossible to say whether a more systematic form of oversight on the part of the CEPH, one that addressed the potential for group and/or individual dignitary harm, would have avoided these outcomes or resulted in published research better aligned with participants' (presumed) expectations. A challenge for sample collections such as the HGDP Diversity Panel, which have been aggregated over long periods of time, is that original informed consent documents are either unavailable or fail to adequately anticipate the full range of current and potential secondary uses. Hence there is no firm basis to guide a Data Access Committee (DAC) or similar oversight body in deciding whether a proposed use is allowable or prohibited. Moreover, even when consent is available, it is unclear whether this type of front-end review sufficiently addresses the implicit expectations of individual participants or identifies when groups' interests could be significantly compromised by particular classes of investigation.
Rather than grounding decision-making solely in the specifics of the consent language, DACs and similar oversight bodies should consider alternative mechanisms for soliciting the views of individuals with salient insights regarding the interests of participants or their communities. In this way, a more beneficial, and ultimately trustworthy, form of data stewardship will be achieved.