By michellehudson | March 7, 2013
Some reflections on research data confidentiality, privacy, and curation
Limor Peer
Maintaining research subjects’ confidentiality is an essential feature of the scientific research enterprise. It also presents special challenges to the data curation process. Does the effort to open access to research data complicate these challenges?
A few reasons why I think it does: More data are discoverable and could be used to re-identify previously de-identified datasets; systems are increasingly interoperable, potentially bridging what may have been insular academic data with other data and information sources; growing pressure to open data may weaken some of the safeguards previously put in place; and some data are inherently identifiable.
But these challenges should not diminish the scientific community’s firm commitment to both principles. It is possible, and desirable, for openness and privacy co-exist. It will not be simple to do, and here’s what we need to keep in mind:
First, let’s be clear about semantics. Open data and public data are not the same thing. As Melanie Chernoff observed, “All open data is publicly available. But not all publicly available data is open.” This distinction is important because what our community means by open (standards, format) may not be what policy-makers and the public at large mean (public access). Chernoff rightly points out that “whether data should be made publicly available is where privacy concerns come into play. Once it has been determined that government data should be made public, then it should be done so in an open format.” So, yes, we want as much data as possible to be public, but we most definitely want data to be open.
Another term that could be clarified is usefulness. In the academic context, we often think of data re-use by other scholars, in the service of advancing science. But what if the individuals from whom the data were collected are the ones who want to make use of it? It’s entirely conceivable that the people formerly known as “research subjects” begin demanding access to, and control over, their own personal data as they become more accustomed to that in other contexts. This will require some fresh ideas about regulation and some rethinking of the concept of informed consent (see, for example, the work of John Wilbanks, NIH, and the National Cancer Institute on this front). The academic community is going to have to confront this issue.
Precisely because terms are confusing and often vaguely defined, we should use them carefully. It’s tempting to pit one term against the other, e.g., usefulness vs. privacy, but it may not be productive. The tension between privacy and openness or transparency does not mean that we have to choose one over the other. As Felix Wu says, “there is nothing inherently contradictory about hiding one piece of information while revealing another, so long as the information we want to hide is different from the information we want to disclose.” The complex reality is that we have to weigh them carefully and make context-based decisions.
I think the IASSIST community is in a position to lead on this front, as it is intimately familiar with issues of disclosure risk. Just last spring, the 2012 IASSIST conference included a panel on confidentiality, privacy and security. IASSIST has a special interest group on Human Subjects Review Committees and Privacy and Confidentiality in Research. Various IASSIST members have been involved with heroic efforts to create solutions (e.g., via the DDI Alliance, UKDA and ICPSR protocols) and educate about the issue (e.g., ICPSR webinar , ICPSR summer course, and MANTRA module). A recent panel at the International Data Curation Conference in Amsterdam showcased IASSIST members’ strategies for dealing with this issue (see my reflections about the panel).
It might be the case that STEM is leading the push for open data, but these disciplines are increasingly confronted with problems of re-identification, while the private sector is increasingly being scrutinized for its practices (see this on “data hops”). The social (and, of course, medical) sciences have a well-developed regulatory framework around the issue of research ethics that many of us have been steeped in. Government agencies have their own approaches and standards (see recent report by the U.S. Government Accountability office). IASSIST can provide a bridge; we have the opportunity to help define the conversation and offer some solutions.