Archiving, Preservation, Curation

Defining the nature of data archives, their functions and their operation

The International Workshop on Social Science Data Archives, held in Taiwan, sponsored by IASSIST

The International Workshop on Social Science Data Archives, sponsored by IASSIST, was held on September 15 in Conference Room II, Research Center for Humanities and Social Sciences (RCHSS), Academia Sinica, Taipei, Taiwan. The invited speakers included Prof. Dr. Christof Wolf from GESIS – Leibniz Institute for the Social Sciences, Dr. Yukio Maeda and Dr. Kaoru Sato from the Social Science Japan Data Archive (SSJDA), University of Tokyo, and Dr. Won-ho Park and Dr. Seokho Kim from the Korea Social Science Data Archive (KOSSDA), Seoul National University.

The finalized workshop agenda is listed below. Dr. Ruoh-rong Yu also introduced the Survey Research Data Archive (SRDA) of Taiwan. The presentations covered the data curation, preservation, and dissemination services provided by each data archive.

09:00~09:30

Registration

09:30~09:40

Opening Remarks

Dr. Ching-Ching Chang
Chair Professor
Department of Advertising
National Cheng-Chi University, Taiwan

Morning Session

Session Chair: Dr. Ching-Ching Chang

09:40~10:20

Curating, Preserving, and Disseminating Social Science Micro Data at Social Science Japan Data Archive

Dr. Yukio Maeda
Professor
Institute of Social Science
University of Tokyo, JAPAN

10:20~11:00

Introduction to Korea Social Science Data Archive

Dr. Won-ho Park
Associate Professor
Department of Political Science and International Relations
Seoul National University, KOREA

Dr. Seokho Kim
Associate Professor
Department of Sociology
Seoul National University, KOREA

Dr. In Chol Shin
Senior Researcher
Korea Social Science Data Archive
Seoul National University Asia Center, KOREA

11:00~11:20

Tea Break

 

11:20~12:00

Introduction to Survey Research Data Archive of Taiwan

Dr. Ruoh-Rong Yu
Research Fellow and Executive Director
Center for Survey Research
Research Center for Humanities and Social Sciences, Academia Sinica, TAIWAN

12:00~14:00

Lunch

 

Afternoon Session

Session Chair: Dr. Chyi-In Wu (Research Fellow, Institute of Sociology, Academia Sinica)

14:00~15:00

Services for Survey Data: The GESIS Perspective

Dr. Christof Wolf
President
GESIS – Leibniz Institute for the Social Sciences, GERMANY

15:00~15:20

Closing Remarks

 

Registration for the workshop opened on May 1, 2017. The registration fee was NT$200, which covered printed conference materials, lunch, and light refreshments. In total, 69 researchers attended the workshop. Most of the attendees were local scholars, while others came from Thailand, Turkey, and other countries.

In the opening remarks, Dr. Chang stressed the importance of data archives, and gave a brief introduction to the speakers of the morning sessions.

The speaker of the first session, Dr. Maeda, introduced the development and current practice of SSJDA. He also introduced some other data centers in Japan, including Leviathan Data Bank, Rikkyo University Data Archive, and the Research Centre for Information and Statistics of Social Science at Hitotsubashi University.

SSJDA was established in 1998, and its deposits amount to 2,018 datasets. Its main collections include the Japanese General Social Surveys, Japanese Life Course Panel Surveys, Japanese Election Studies, National Family Research of Japan, Working Persons Survey, and Elementary School Students Survey. Researchers affiliated with academic institutions and graduate students can access SSJDA datasets for academic purposes. Applicants must sign an agreement (pledge) and obtain permission from the PI in advance. Under the supervision of professors, undergraduate students are allowed to access certain data for paper writing. Such usage is classified as educational use rather than research use. Some datasets are for research use only and are not available for educational use.

SSJDA also offers several seminars on data usage and a one-week seminar on quantitative analysis every year. SSJDA built Easy DDI Organizer (EDO), a desktop application for managing metadata based on DDI Lifecycle. EDO can be used to edit metadata, import metadata and variable information from statistical software, and export documents. It is a useful tool for researchers, data users, and data archives. However, it is currently available only in Japanese.

The second speaker was Dr. Park from KOSSDA. KOSSDA is Korea’s leading data archive, with expertise in the collection, dissemination, and promotion of research materials through various academic events and methodology education programs. Started in 1983 as a non-profit social science library, KOSSDA began to collect survey data in 2003, and moved to Seoul National University Asia Center in 2015.

KOSSDA collects survey data, statistical tables, qualitative interviews and narrative history data, documents, observation records, and other kinds of data produced by research institutes and individuals. KOSSDA also establishes digital databases and provides access to the data. Its main collections include the Korean General Social Survey, ISSP Annual Topical Module Survey, Gallup Omnibus Survey, etc. KOSSDA has translated 250 survey datasets into English, including their questionnaires and codebooks.

KOSSDA is now rebuilding its website to enhance its data searching function and to improve web design. KOSSDA offers methodology training programs, data fairs, and a research paper competition every year.

After a 20-minute tea break, the presentation on SRDA kicked off. The speaker, Dr. Yu, is the Executive Director of the Center for Survey Research at Academia Sinica. SRDA was established in 1994. It now has eleven full-time staff members, including two IT staff members. The data archived by SRDA include survey data, census data, and in-house value-added data.

SRDA curates academic survey data such as the Taiwan Social Change Survey, Panel Study of Family Dynamics, Taiwan Social Image Survey, Taiwan Youth Project, Taiwan Education Panel Survey, and Taiwan’s Election and Democratization Study. SRDA also curates government survey data including the Manpower Survey, Manpower Utilization Survey, Women’s Marriage, Fertility and Employment Survey, Survey of Family Income and Expenditure, Digital Opportunity Survey for Individuals and Households, Survey on Workers’ Living and Employment Conditions, etc. The number of datasets disseminated by SRDA exceeds 2,800, of which 315 have English versions.

SRDA operates a membership scheme. Academia Sinica members are researchers at Academia Sinica. Regular members are faculty, researchers, students, or research assistants at colleges or research institutes. There are now about 2,302 members. A member can access most of the archived data by downloading it directly from the SRDA website.

SRDA members can also apply for data with restricted access. The restricted datasets can be used via on-site access or remote access. All services provided by SRDA are now free of charge.

SRDA offers workshops, webinars, and on-campus lectures to promote data usage. In addition, SRDA maintains several social media channels, such as a Facebook fan page, a YouTube channel, and the SRDA blog.

SRDA has been constructing a data-based bibliography for years. Since 2016, SRDA has been registering DOIs via da|ra. One task in progress is to construct a data integration platform for Taiwan Social Change Survey data from various years. Other main tasks include enlarging data storage, broadening membership, remodeling the website, developing data management plans, and constructing an evaluation scheme for data disclosure risk.

In the afternoon session, Dr. Chyi-In Wu was the chair. The presenter, Dr. Wolf, introduced the development and current progress of GESIS. Relative to the data archives of Asian countries, the budget and personnel of GESIS are very large. GESIS was founded in 1960, and the data archive for social science is one of the five research departments of GESIS. There are about 70 staff members in the data archive for social science, belonging to seven teams.

GESIS currently has about 6,000 datasets, which mainly focus on migration, elections, values and attitudes, and social behavior. ISSP, CSES, EVS, and ALLBUS are some well-known social science survey programs. It is easy for PIs to upload datasets through the Datorium system, which is a self-deposit service for sharing data.

Dr. Wolf stressed the importance of DOIs (Digital Object Identifiers), and introduced da|ra, the DOI registration service built by GESIS. Da|ra has 576,297 registered DOI names and 88 data providers worldwide, including ICPSR, SRDA, etc. In addition to hosting da|ra, GESIS is devoted to developing international standards for data documentation and data archiving, and providing training and consulting services to researchers.

In the presentation, Dr. Wolf also talked about the secure data center of GESIS. The secure data center enables researchers to access sensitive and weakly anonymized data. It is a locked room without internet access. Users have to sign contracts in advance. All inputs and outputs must be checked for disclosure risk. In the future, the secure data center will establish a remote access system, which can provide secure access to the data curated in CESSDA.

A business meeting was held the next day (September 16). Besides the guests from GESIS, KOSSDA and SSJDA, participants of the meeting included researchers at the Center for Survey Research and all the staff of SRDA. The agenda was as follows.

Development of Consortium of European Social Science Data Archives (CESSDA)

Christof Wolf (GESIS)

Connections among SSJDA, KOSSDA and SRDA in Recent Years

Ruoh-rong Yu (SRDA)

Possible Future Collaboration among Data Archives

All Participants

There have been frequent connections among KOSSDA, SSJDA and SRDA in recent years. Conferences and/or workshops were hosted in rotation in 2008, 2012, 2014, 2015, 2016, and 2017.

In 2016, KOSSDA organized an international conference with invited guests from SSJDA at the University of Tokyo (Japan), CNSDA at Renmin University (China), and SRDA at Academia Sinica (Taiwan). In this conference, a consensus was reached to develop a regional association of data archives in Asian countries, namely the Networks of Asian Social Science Data Archive (NASSDA).

The main purpose of the business meeting this year was to discuss possible future collaboration among data archives in Asian countries. The brief conclusions are listed below:

  1. To build a joint data catalogue for the archives involved.
  2. To construct web linkages and brief introductions among archives.
  3. To have a contact person for each data archive for future cooperation.

NASSDA members will hold annual workshops or conferences on a rotating basis. Further collaboration will be discussed in the near future.

IASSIST Quarterly (IQ) volume 40-2 is now on the website: Revolution in the air

Welcome to the second issue of Volume 40 of the IASSIST Quarterly (IQ 40:2, 2016). We present three papers in this issue.

http://iassistdata.org/iq/issue/40/2

First, there are two papers on the Data Documentation Initiative that have their own special introduction. I want to express my respect and gratitude to Joachim Wackerow (GESIS - Leibniz Institute for the Social Sciences). Joachim (Achim) and Mary Vardigan (University of Michigan) have for many years repeatedly informed and advised the readers of the IASSIST Quarterly on the continuing development of the DDI. The metadata of data is central for the use and reuse of data, and we have come a long way through the efforts of many people.

The IASSIST 2016 conference in Bergen was a great success - I am told. I was not able to attend but heard that the conference again was 'the best ever'. I was also told that among the many interesting talks and inputs at the conference Matthew Woollard's keynote speech on 'Data Revolution' was high on the list. Good to have well informed informers! Matthew Woollard is Director of the UK Data Archive at the University of Essex. Here in the IASSIST Quarterly we bring you a transcript of his talk. Woollard starts his talk on the data revolution with the possibility of bringing to users access to data, rather than bringing data to users. The data is in the 'cloud' - in the air - 'Revolution in the air' to quote a Nobel laureate. We are not yet in the post-revolutionary phase and many issues still need to be addressed. Woollard argues that several data skills are in demand, like an understanding of data management and of the many ethical issues. Although he is not enthusiastic about the term 'Big Data', Woollard naturally addresses the concept as these days we cannot talk about data - and surely not about data revolution - without talking about Big Data. I fully support his view that we should proceed with caution, so that we are not simply replacing surveys where we 'ask more from fewer' with big data that give us 'less from more'. The revolution gives us new possibilities, and we will see more complex forms of research that will challenge data skills and demand solutions at data service institutions.  

Papers for the IASSIST Quarterly are always very welcome. We welcome input from IASSIST conferences or other conferences and workshops, from local presentations or papers especially written for the IQ. When you are preparing a presentation, give a thought to turning your one-time presentation into a lasting contribution. We permit authors 'deep links' into the IQ as well as deposition of the paper in your local repository. Chairing a conference session with the purpose of aggregating and integrating papers for a special issue IQ is also much appreciated as the information reaches many more people than the session participants, and will be readily available on the IASSIST website at http://www.iassistdata.org

Authors are very welcome to take a look at the instructions and layout:

http://iassistdata.org/iq/instructions-authors

Authors can also contact me via e-mail: kbr@sam.sdu.dk. Should you be interested in compiling a special issue for the IQ as guest editor(s) I will also be delighted to hear from you.

Karsten Boye Rasmussen   
Editor, IASSIST Quarterly

IQ double issue 38(4)/39(1) is up, and so is vol 39(2)!

Hi folks! A lovely gift for your reading pleasure over the holidays, we present two, yes, TWO issues of the IASSIST Quarterly. The first is the double issue, 38(4)/39(1) with guest editors, Joachim Wackerow of GESIS – Leibniz Institute for the Social Sciences in Germany and Mary Vardigan of ICPSR at the University of Michigan, USA. This issue focuses on the Data Documentation Initiative (DDI) and how it makes meta-analysis possible. The second issue is 39(2), and is all about data: avoiding statistical disclosure, using data, and improving digital preservation. Although we usually post the full text of the Editor's Notes in the blog post, it seems lengthy to do that for both issues. You will find them, though, on the web site: the Editor's Notes for the double issue, and the Editor's Notes for issue 39(2).

Michele Hayslett, for the IQ Publications Committee

A decade against decay: the 10th International Digital Curation Conference

The International Digital Curation Conference (IDCC) is now ten years old. On the evidence of its most recent conference, it is in rude health and growing fast.

IDCC marks the first time IASSIST has decided to formally support another organisation's conference. I think it was a wise investment given the quality of plenaries, presentations, posters, and discussions.

DCC already has available a number of blogs covering the substance of sessions, including an excellent summary by IASSIST web editor, Robin Rice. Presentations and posters are already available, and video from plenary sessions will soon be online.

Instead I will use this opportunity to pick up on hanging issues and suggestions for future conferences.

One was apportionment of responsibility. Ultimately, researchers are responsible for management of their data, but they can only do so if supporting infrastructure is in place to help them. So, who is responsible for providing that: funders or institutions? This theme emerged in the context of the UK’s Engineering and Physical Sciences Research Council, which will soon enforce expectations identifying the institution as responsible for supporting good Research Data Management.

Related to that was a discussion on the role of libraries in this decade. Are they relevant? Can they change to meet new challenges? Starting out as a researcher who became a data archivist and is now a librarian, I wouldn’t be here if libraries weren’t meeting these challenges. There’s a “hush” of IASSIST members also ready to take issue with the suggestion that libraries aren’t relevant or aren’t engaged with data; in fact, they did so at our last conference.

Melissa Terras (UCL) did a fantastic job presenting work in the digital humanities that is innovative in not only preserving, but rescuing objects – and all done on small change research budgets. I hope a future IDCC finds space for a social sciences person to present on issues we face in preservation and reuse. Clifford Lynch (CNI) touched on the problems of data reuse and human subjects, which remained one of the few glancing references to a significant problem and one IASSIST members are addressing. Indeed, thanks must go to a former president of this association, Peter Burnhill (Edinburgh), who mentioned IASSIST and how it relates to the IDCC audience on more than one occasion.

Finally, if you were stimulated by IDCC’s talk of data, reuse, and preservation then don’t forget our own conference in Minneapolis later this year.

Version 4, Research Data Curation Bibliography & the IQ

Another reason to write for the IQ: you might get yourself into Charles Bailey's prestigious bibliography, at

http://digital-scholarship.org/rdcb/rdcb.htm

I'm pleased to see no fewer than 7 IQ articles in the latest version. I didn’t count IASSISTers who published elsewhere, but several of those were in the list as well.

Research Data Curation Bibliography

Charles W. Bailey, Jr.

Houston: Digital Scholarship

Version 4: 6/23/2014

Altman, Micah, and Mercè Crosas. "The Evolution of Data Citation: From Principles to Implementation." IASSIST Quarterly 37, no. 1-4 (2013): 62-70. http://www.iassistdata.org/iq/evolution-data-citation-principles-implementation

Bender, Stefan, and Jorg Heining. "The Research-Data-Centre in Research-Data-Centre Approach: A First Step towards Decentralised International Data Sharing." IASSIST Quarterly 35, no. 3 (2011): 10-16. http://www.iassistdata.org/iq/research-data-centre-research-data-centre-approach-first-step-towards-decentralised-international

Mooney, Hailey. "A Practical Approach to Data Citation: The Special Interest Group on Data Citation and Development of the Quick Guide to Data Citation." IASSIST Quarterly 37, no. 1-4 (2013): 71-77. http://iassistdata.org/iq/practical-approach-data-citation-special-interest-group-data-citation-and-development-quick-guide

Ribeiro, Cristina, and Maria Eugénia Matos Fernandes. "Data Curation at U. Porto: Identifying Current Practices across Disciplinary Domains." IASSIST Quarterly 35, no. 4 (2011): 14-17. http://www.iassistdata.org/iq/data-curation-uporto-identifying-current-practices-across-disciplinary-domains

Schumann, Natascha. "Tried and Trusted: Experiences with Certification Processes at the GESIS Data Archive." IASSIST Quarterly 36, no. 3/4 (2012): 24-27. http://www.iassistdata.org/iq/tried-and-trusted-experiences-certification-processes-gesis-data-archive-0

Schumann, Natascha, and Astrid Recker. "De-mystifying OAIS compliance: Benefits and challenges of mapping the OAIS reference model to the GESIS Data Archive." IASSIST Quarterly 36, no. 2 (2012): 6-11. http://www.iassistdata.org/iq/de-mystifying-oais-compliance-benefits-and-challenges-mapping-oais-reference-model-gesis-data-arc

Yoon, Ayoung, and Helen Tibbo. "Examination of Data Deposit Practices in Repositories with the OAIS Model." IASSIST Quarterly 35, no. 4 (2011): 6-13. http://www.iassistdata.org/downloads/iqvol35_tibbo.pdf

Congratulations to the authors.

Robin Rice, IASSIST Web Editor

Research Data Management Issues Across Environments

Lots of conversations are going on these days in different venues where people are asking many of the same questions: how do we teach researchers about data management with limited staff, and what data management services should we offer? How do we find sustainable ways to manage data that leverage the efforts of many different repositories: those in government, those in institutions, and disciplinary ones? How do we coalesce standard practice and reasonable but effective policies at least at the national level and preferably on a global scale? What roles should governments play? How much can we as data professionals accomplish on our own? The Data Management and Curation SIG will host a workshop to talk about these and other issues across different countries and environments next Tuesday. Our speakers will include:

  • Dan Gillman, U.S. Bureau of Labor Statistics
  • Marcel Hebing, DIW Berlin
  • Chuck Humphrey, University of Alberta
  • Steven McEachern, Australian Data Archive
  • Barry Radler, Institute on Aging, University of Wisconsin-Madison
  • Robin Rice, EDINA and Data Library at the University of Edinburgh
  • Kathleen Shearer, Confederation of Open Access Repositories and Research Data Canada

Looking forward to seeing many of you in Toronto!

Michele Hayslett, University of North Carolina at Chapel Hill & Stefan Kramer, American University

White Paper Urges New Approaches to Assure Access to Scientific Data

Press release posted on behalf of Mark Thompson-Kolar, ICPSR.

12/12/2013:  (Ann Arbor, MI)—More than two dozen data repositories serving the social, natural, and physical sciences today released a white paper recommending new approaches to funding sharing and preservation of scientific data. The document emphasizes the need for sustainable funding of domain repositories—data archives with ties to specific scientific communities.

“Sustaining Domain Repositories for Digital Data: A White Paper,” is an outcome of a meeting convened June 24-25, 2013, in Ann Arbor. The meeting, organized by the Inter-university Consortium for Political and Social Research (ICPSR) and supported by the Alfred P. Sloan Foundation, was attended by representatives of 22 data repositories from a wide spectrum of scientific disciplines.

Domain repositories accelerate intellectual discovery by facilitating data reuse and reproducibility. They leverage in-depth subject knowledge as well as expertise in data curation to make data accessible and meaningful to specific scientific communities. However, domain repositories face an uncertain financial future in the United States, as funding remains unpredictable and inadequate. Unlike our European competitors who support data archiving as necessary scientific infrastructure, the US does not assure the long-term viability of data archives.

“This white paper aims to start a conversation with funding agencies about how secure and sustainable funding can be provided for domain repositories,” said ICPSR Director George Alter. “We’re suggesting ways that modifications in US funding agencies’ policies can help domain repositories to achieve their mission.”

Five recommendations are offered to encourage data stewardship and support sustainable repositories: 

  • Commit to sustaining institutions that assure the long-term preservation and viability of research data
  • Promote cooperation among funding agencies, universities, domain repositories, journals, and other stakeholders
  • Support the human and organizational infrastructure for data stewardship as well as the hardware
  • Establish review criteria appropriate for data repositories
  • Incentivize Principal Investigators (PIs) to archive data

While a single funding model may not fit all disciplines, new approaches are urgently needed, the paper says.

“What’s really remarkable about this effort—the meeting and the resulting white paper—has been the consensus across disciplines from astronomy to archaeology to proteomics,” Alter said. “More than two dozen domain repositories from so many disciplines are saying the same thing: Data sharing can produce more science, but data stewards must know the needs of their scientific communities.”

This white paper is a must read for anyone who wants to understand the role of scientific domain repositories and their critical role in the advancement of science. It can be downloaded at http://datacommunity.icpsr.umich.edu

 

The Inter-university Consortium for Political and Social Research (ICPSR), based in Ann Arbor, MI, is the largest archive of behavioral and social science research data in the world. It advances research by acquiring, curating, preserving, and distributing original research data. www.icpsr.umich.edu

The Alfred P. Sloan Foundation is a philanthropic, not-for-profit grantmaking institution based in New York City. Established in 1934, the Foundation makes grants in support of original research and education in science, technology, engineering, mathematics, and economic performance. www.sloan.org

IASSIST Fellows 2013

 

The IASSIST Fellows Committee is glad to announce through this post the six recipients of the 2013 IASSIST Fellowship award. We are extremely excited to have such a diverse and interesting group with different backgrounds and experience and encourage IASSISTers to welcome them at our conference in Cologne, Germany.

Please find below their names, countries and brief bios:

Chifundo Kanjala (Tanzania) 

Chifundo currently works as a data manager and data documentalist for the ALPHA network, an HIV research group based at the London School of Hygiene and Tropical Medicine's Department of Population Health. He spends most of his time in Mwanza, Tanzania, but travels from time to time around Southern and Eastern Africa to work with colleagues in the ALPHA network. Before joining the London School of Hygiene and Tropical Medicine, he worked as a data analyst consultant at UNICEF, Zimbabwe. He is currently working part time on a PhD with the London School of Hygiene and Tropical Medicine. He has an MPhil in Demography from the University of Cape Town, South Africa, and a BSc Honours degree in Statistics from the University of Zimbabwe.


Judit Gárdos (Hungary) 

Judit Gárdos studied Sociology and German Language and Literature in Budapest, Vienna and Berlin. She is a PhD candidate in sociology, with a topic on the philosophy, sociology and anthropology of quantitative sociology. She is a young researcher at the Institute of Sociology of the Hungarian Academy of Sciences. Judit has been working at the digital archive and research group called "voicesofthe20century.hu", which is collecting qualitative, interview-based sociological research collections of the last 50 years. She is coordinating the work at the newly-funded Research Documentation Center of the Center for Social Sciences at the Hungarian Academy of Sciences.


Cristina Ribeiro (Portugal) 

Cristina Ribeiro is an Assistant Professor in Informatics Engineering at Universidade do Porto and a researcher at INESC TEC. She graduated in Electrical Engineering and holds a Master's degree in Electrical and Computer Engineering and a Ph.D. in Informatics. Her teaching includes undergraduate and graduate courses in information retrieval, digital libraries, knowledge representation and markup languages. She has been involved in research projects in the areas of cultural heritage, multimedia databases and information retrieval. Currently her main research interests are information retrieval, digital preservation and the management of research data.


Aleksandra Bradić-Martinović (Serbia) 

Aleksandra Bradić-Martinović, PhD, is a Research Fellow at the Institute of Economic Sciences, Belgrade, Serbia. Her field of expertise is research on the implementation of information and communication technology in the economy, especially in banking, payment system operations and stock exchange operations. Aleksandra also teaches at the Belgrade Banking Academy in the following subjects: E-banking and Payment Systems, Stock Market Dealings, and Management Information Systems. She has been engaged in several projects in the field of education. On the FP7 SERSCIDA project she coordinates the Serbian team.


Anis Miladi (Tunisia) 

Anis Miladi earned his Bachelor's degree in computer sciences and multimedia in 2007 and a Master's degree in Management of Information Systems and Organizations in 2008, and he is currently finalizing a Master's degree in project management (projected completion: summer 2013). Before joining the Social and Economic Survey Research Institute at Qatar University as a Survey Research Technology Specialist in 2009, he worked as a programmer analyst at a private IT services company in Tunisia. His areas of expertise include managing computer-assisted surveys (CAPI and CATI, using the Blaise survey system) as well as Enterprise Document Management Systems and Enterprise Portals (SharePoint).


Lejla Somun-Krupalija (Sarajevo) 

Lejla currently serves as the Senior Program and Research Officer at the Human Rights Centre of the University of Sarajevo. She has over 15 years of experience in research and policy development on social inclusion issues. She is the Project Coordinator of the SERSCIDA FP7 project, which aims to open data services/archives in the Western Balkan region in cooperation with CESSDA members. She was previously engaged in the NGO sector, particularly on issues of capacity building and policy development in the areas of gender equality, the rights of persons with disabilities, social inclusion, and forced migration. She teaches academic writing, qualitative research, and gender and nationalism at the University of Sarajevo.

Some reflections on research data confidentiality, privacy, and curation by Limor Peer

Maintaining research subjects’ confidentiality is an essential feature of the scientific research enterprise. It also presents special challenges to the data curation process. Does the effort to open access to research data complicate these challenges?

A few reasons why I think it does: more data are discoverable and could be used to re-identify previously de-identified datasets; systems are increasingly interoperable, potentially bridging what may have been insular academic data with other data and information sources; growing pressure to open data may weaken some of the safeguards previously put in place; and some data are inherently identifiable.

But these challenges should not diminish the scientific community’s firm commitment to both principles. It is possible, and desirable, for openness and privacy to co-exist. It will not be simple to do, and here’s what we need to keep in mind:

First, let’s be clear about semantics. Open data and public data are not the same thing. As Melanie Chernoff observed, “All open data is publicly available. But not all publicly available data is open.” This distinction is important because what our community means by open (standards, format) may not be what policy-makers and the public at large mean (public access). Chernoff rightly points out that “whether data should be made publicly available is where privacy concerns come into play. Once it has been determined that government data should be made public, then it should be done so in an open format.” So, yes, we want as much data as possible to be public, but we most definitely want data to be open.

Another term that could be clarified is usefulness. In the academic context, we often think of data re-use by other scholars, in the service of advancing science. But what if the individuals from whom the data were collected are the ones who want to make use of it? It’s entirely conceivable that the people formerly known as “research subjects” begin demanding access to, and control over, their own personal data as they become more accustomed to that in other contexts. This will require some fresh ideas about regulation and some rethinking of the concept of informed consent (see, for example, the work of John Wilbanks, NIH, and the National Cancer Institute on this front). The academic community is going to have to confront this issue.

Precisely because terms are confusing and often vaguely defined, we should use them carefully. It’s tempting to pit one term against the other, e.g., usefulness vs. privacy, but it may not be productive. The tension between privacy and openness or transparency does not mean that we have to choose one over the other. As Felix Wu says, “there is nothing inherently contradictory about hiding one piece of information while revealing another, so long as the information we want to hide is different from the information we want to disclose.” The complex reality is that we have to weigh them carefully and make context-based decisions.

I think the IASSIST community is in a position to lead on this front, as it is intimately familiar with issues of disclosure risk. Just last spring, the 2012 IASSIST conference included a panel on confidentiality, privacy and security. IASSIST has a special interest group on Human Subjects Review Committees and Privacy and Confidentiality in Research. Various IASSIST members have been involved with heroic efforts to create solutions (e.g., via the DDI Alliance, UKDA and ICPSR protocols) and educate about the issue (e.g., ICPSR webinar, ICPSR summer course, and MANTRA module). A recent panel at the International Data Curation Conference in Amsterdam showcased IASSIST members’ strategies for dealing with this issue (see my reflections about the panel).

It might be the case that STEM is leading the push for open data, but these disciplines are increasingly confronted with problems of re-identification, while the private sector is increasingly being scrutinized for its practices (see this on “data hops”). The social (and, of course, medical) sciences have a well-developed regulatory framework around the issue of research ethics that many of us have been steeped in. Government agencies have their own approaches and standards (see the recent report by the U.S. Government Accountability Office). IASSIST can provide a bridge; we have the opportunity to help define the conversation and offer some solutions.

IQ Special Quadruple Issue: The Book of the Bremen Workshop

Welcome to this very special IASSIST Quarterly issue. We now present volume 34 (3 & 4) of 2010 and volume 35 (1 & 2) of 2011. Normally we have about three papers in a single issue. In this super-mega-special issue we have fourteen papers from the following countries: Finland, Ireland, United Kingdom, Austria, Czech Republic, Denmark, Germany, Norway, Slovenia, Belarus, Hungary, Lithuania, Poland and Switzerland. This will be known in IASSIST as “The Book of the Bremen Workshop”.

The workshop took place in April 2009 at the University of Bremen. The workshop was hosted by the Archive for Life Course Research at Bremen and funded by the Timescapes Initiative with support from CESSDA. The background and context of the workshop as well as short introductions to the many papers are found in the Editorial Introduction by the guest editors Bren Neale and Libby Bishop. The many papers are the result of the effort of numerous authors who were instrumental in the development and fulfillment of the many outcomes of the workshop. The introduction by the guest editors shows impressive lists of short-term activities, agreed goals, and also strategies for development. There are future initiatives and the future looks bright and interesting.

The focus of the Bremen Workshop is on “qualitative (Q) and qualitative longitudinal (QL) research and resources across Europe”. I would have called that a qualitative workshop but you can see from the introduction and the papers that this subject is often referred to as “qualitative and QL data”. The “and QL” emphasizes that the longitudinal aspect is the special and important issue. In the beginning of IASSIST, data was equivalent to quantitative data. However, in the next wave digital archives found that qualitative data, too, were of great value when made available for secondary research. The “longitudinal” aspect further accentuates that value creation.

This is a growing subject area. During the processing one of the authors wanted to update her paper and asked us to replace the sentence “80 archived qualitative datasets and yearly around 30-40 datasets are ordered for re-use” with “115 archived qualitative datasets and yearly around 50-60 datasets are ordered for re-use”. Yes, we do have a somewhat long processing time but this is still a very fast growth rate. I want to thank Libby Bishop for not being annoyed when I persistently reminded her of the IQ special issues. I’m sure the guest editors contacted the authors with similar persistence. It was worth it.

As in Sherlock Holmes, we might look for what is not there, as when curiosity is raised by the fact that “the dog did not bark”. IASSIST has had and continues to have a majority of its membership in North America, so it is remarkable that we here present the initiative on “qualitative (Q) and qualitative longitudinal (QL) research” with a European angle. Hopefully the rest of the world will enjoy these papers, and there will probably be more papers both from Europe and from the other regions covered by the IASSIST members.

Articles for the IQ are always very welcome. They can be papers from IASSIST conferences or other conferences and workshops, from local presentations or papers especially written for the IQ. If you don’t have anything to offer right now, then please prepare yourself for the next IASSIST conference and start planning for participation in a session there. Chairing a conference session with the purpose of aggregating and integrating papers for a special issue of the IQ is much appreciated, as the information in the form of an IQ issue reaches many more people than the session participants and will be readily available on the IASSIST website at http://www.iassistdata.org.

Authors are very welcome to take a look at the description for layout and sending papers to the IQ:
http://iassistdata.org/iq/instructions-authors
Authors can also contact me via e-mail: kbr@sam.sdu.dk. Should you be interested in compiling a special issue for the IQ as guest editor (or editors), I will also be delighted to hear from you.

Karsten Boye Rasmussen
Editor August 2011

