

But How Do I *Do* Qualitative Research? Bridging the Gap between Qualitative Researchers and Methods Resources--PART 4

Last week Mandy shared ideas about how librarians and other data-support professionals can act as connectors, crusaders and collaborators on campus in order to better provide and develop qualitative research teaching resources. This week we want to build on Mandy’s suggestions by looking a bit more broadly at how librarians and other data-support professionals can help build community around qualitative research at their institutions.

As qualitative research methods are often not the dominant methodologies used in departments and institutions, qualitative researchers may not have a strong support network and may lack colleagues to consult with or learn from. I have regularly heard from graduate students embarking on a qualitative project that they are struggling to learn about qualitative methods on their own because their advisor doesn’t know much about qualitative research or because their department does not offer a course in qualitative research methods. I have also heard from the lone qualitative research faculty member in a department about how isolating that can be. So building a community where qualitative researchers can support, connect with and learn from each other can be very important. Libraries, often seen as neutral ground on campus, have an opportunity to play a unique role as facilitators building this community by connecting people and leveraging resources from across campus.

Some activities that librarians and data support professionals may want to consider to help build a qualitative research community on their campus include:

  • Conduct an environmental scan.  Each campus has a different research support environment, so it may help to first do an environmental scan of your campus to learn who is already doing what to support qualitative research. Document what you learn! Think about how the library can bring together existing resources and build on them.
  • Then, act as a clearinghouse around qualitative methods and support. As you investigate what is going on across your campus related to qualitative research, compile and publicize what you find.  For example, you may create:
  1. lists of campus resources, including labs and software/support tools available,
  2. lists of qualitative methods courses offered in departments across campus,
  3. a directory of faculty members with strengths in qualitative research who agree to act as a resource for others.

As Jill mentioned in our second blog post, LibGuides/research guides are often a great way to capture and share this type of information with the university community and with library colleagues. Additionally, sharing this information through other campus organizations, such as a graduate resource center, might be a good way to reach an audience with related needs. 

  • Establish/host/contribute to a campus listserv for those with questions or who need help working with qualitative methods and tools.
  • Partner with faculty/graduate students/other campus departments to establish/host a qualitative research support group on your campus. These could be informal brown bags, reading groups, or a place for attendees to meet others and ask questions of those interested in similar issues.
  • Consider helping to establish a mentor program partnering those new to qualitative research with more experienced researchers.
  • Host/organize/co-sponsor a qualitative research event/symposium - this not only fosters building a vibrant qualitative research community but can help demonstrate the library’s commitment to serving this community.
  • Offer library spaces and resources to support qualitative groups, events, etc.

I hope these suggestions get you started thinking about ways to build a community on your campus. We’d love to hear what you are already doing and we welcome comments here, emails to the IASSIST listserv, the QSSHDIG google group, or directly to the authors, and/or comments in this “Blog Conversations” doc embedded in the QSSHDIG website.

This is the last of our 4-part blog series, “How Do I *Do* Qualitative Research? Bridging the Gap between Qualitative Researchers and Methods Training Resources.” QSSHDIG would love to get more conversations going on the IASSIST blog - there's a section at the bottom of our “Blog Conversations” doc for suggesting future QSSHDIG posts - please do!

But How Do I *Do* Qualitative Research? Bridging the Gap between Qualitative Researchers and Methods Resources--PART 3

Connectors, Crusaders, and Collaborators

Jill’s post from last week detailed ways we data-support professionals can connect researchers to library collections and other information sources for bridging the qualitative methods gap. I’d like to offer more ideas for how we can act not only as connectors but also as crusaders and collaborators on our campuses specifically in the realm of providing and developing qualitative-research teaching resources. 

Because we work with researchers across campus and in various capacities, data-support professionals are often more aware of methods gaps than are the academic departments we support. As such, we are well-positioned to act as intermediaries to address these gaps by connecting researchers to existing resources and crusading for additional resources. For example:

  • We can use our cross-disciplinary knowledge to connect researchers to resources (including other people) about which they may be unaware:
    • “Did you know that the College of Education offers a qualitative methods class that’s open to any discipline?”
    • “Do you know about ResearchTalk's Qualitative Research Summer Intensive?”
    • “Professor X does narrative analysis - maybe you should contact them to see if they’d be on your dissertation committee?”
    • “Professor Y does mixed-methods research - perhaps they’d be willing to consult on the qualitative aspects of your research, and maybe even be a co-investigator?”
  • When we encounter campus researchers experiencing this qualitative methods gap, we can document these occasions and then share them with those who can effect positive change:
    • Tell chairs of academic departments and/or methods professors that graduate students attending qualitative analysis software workshops want this training integrated into their qualitative methods class.
    • Contact the Graduate School and recommend they coordinate student support groups focused on qualitative research [Liz’s post next week will have even more community-building suggestions].
    • Share information gleaned from research consultations and/or instruction sessions with your library administrators to advocate for funds for qualitative-research collections/e-resources and for your own professional development in qualitative methods training.

In addition to connecting and crusading, we data-support professionals can collaborate with other campus researchers to buttress qualitative methods training on campuses. For example:

  • Librarians can draw on their own expertise and also partner with qualitative researchers on campus to offer presentations, workshops, brown bags, etc., aimed at addressing the qualitative methods gap:
    • In my presentation with sociology professor Dr. Ralph LaRossa, “The Logics and Logistics of Qualitative Research”, Dr. LaRossa discusses the methodological steps involved in building theoretically-rich qualitative analyses, then I outline the specific features of NVivo qualitative research software that complement and facilitate these analyses.
    • When teaching qualitative research software packages such as NVivo, Atlas.ti, Quirkos, MAXQDA, or Dedoose, pointedly integrate methodological concepts along with teaching the mechanics (e.g., “Grounded Theory in the Atlas.ti Environment,” “Using NVivo Memos to Document your Methodological Process,” “Developing Variables and Rich Coding Schemas using NVivo Hierarchical Nodes,” “De-Identifying Interview Transcripts in Quirkos,” etc.)

These are just a few ideas for how we can use the three Cs (connecting, crusading, and collaborating) to address this qualitative methods gap. What ideas do you have? What are your successes in this area? Or your slip-ups that we can all learn from?

We welcome comments here, emails to the IASSIST listserv, the QSSHDIG google group, or directly to the authors, and/or comments in this “Blog Conversations” doc embedded in the QSSHDIG website. Also, there's a section at the bottom of the "Blog Conversations" doc for suggesting future QSSHDIG posts - please do!

Stay tuned for Part 4 of our blog series next week, when Liz Cooper will address how librarians and other data-support professionals can help build community at their institutions around qualitative research.

Special Issue of International Journal of Librarianship: Data Librarianship

[Posted on behalf of Kristi Thompson]

The special issue of the International Journal of Librarianship on Data Librarianship that I guest edited has been published. One feature in particular that might be of interest to IASSISTers is an article in which we interviewed a number of people who have had prominent and interesting careers in the field and asked them to share their thoughts on how data services have changed over the course of their careers, as well as advice for newbies.

Our interviewees include IASSIST President Tuomas J. Alaterä, as well as Ann Green, Guangjing Li (China), Jian Qin, Wendy Watkins, and Lynn Woolfrey. Hopefully it will help spread the IASSIST word to some audiences we don't always reach! (Readership for this journal is largely in Asia so far.)

Article (open access):

Other articles and book reviews will also be of interest. Table of contents:

Kristi Thompson

But How Do I *Do* Qualitative Research? Bridging the Gap between Qualitative Researchers and Methods Resources--PART 2

Mandy kicked off our 4-part blog series last week with an inaugural post that provided background and context for this project, which centers around a specific challenge faced by many qualitative researchers: lack of qualitative methods training.  This post offers some concrete ideas to data-support professionals on how to leverage library collections and other information sources that direct researchers to secondary and tertiary sources on qualitative methods to address this issue.

Possibly the most basic yet high-impact way to bridge the gap between researchers and qualitative methods is to create a LibGuide—or any other online research guide or pathfinder—dedicated to qualitative research resources.  It can be embedded within an existing resource, such as a library’s Data Services page or a course-specific guide, or it can exist as an independent guide unto itself.

Now, I know what you’re thinking: when faced with an instructional or research need, a librarian’s knee-jerk reaction is often, “Let’s make a LibGuide!”  While research guides aren’t necessarily appropriate for every topic or service we provide as information and data specialists (i.e., LibGuides aren’t a panacea, per se), they can be a useful didactic medium for culling and delivering information and resources specific to qualitative research methods.   

Online research guides, like LibGuides, allow researchers to benefit from the guidance and expertise of a data-support professional in an autonomous, self-paced manner.  This type of learning object is well-suited for researchers who fit the profile Mandy described in her introductory post for this series: those who aren’t getting formal qualitative methods training and need to get up to speed quickly on their own, and/or researchers with varying degrees of qualitative methods knowledge and experience who would like a set of materials, sources, and resources to which they can refer back periodically.  The portability of a LibGuide also makes it convenient for use by data-support professionals in mediated settings, such as consultations or instruction sessions.  And at institutions where demand for research methods and tools training outstrips an individual or staff’s capacity to provide one-on-one or even course-embedded support, a LibGuide can solve the scalability problem for a large and diverse set of needs (although, obviously, not all of them).  Lastly, LibGuides aren’t just for researchers; they also serve to assist colleagues and fellow information and data specialists in providing reference and research services at an institution (and beyond) and are an effective tool for cross-training.  This last point is relevant especially for fairly specialized, niche areas of expertise, like qualitative research methods.

 A number of qualitative research guides already exist and are worth checking out (e.g., Duke University Libraries’ guide on Qualitative Research by Linda Daniel, UCSF Library’s Qualitative Research Guide by Evans Whitaker).  Below are some suggestions for content to include—with a focus on resources related to learning about qualitative methods—if one were to consider creating anew or building on an existing qualitative research guide.

Select bibliography of relevant literature

A centralized bibliography, or simply citations/links dispersed throughout a guide, can be useful in directing users to relevant secondary and tertiary sources to learn more about qualitative research methods.  Citations to the following literature types may be appropriate to include:  

As a companion, it may be useful to link to relevant catalogs and social-sciences databases to search for additional literature and research.  However, not all catalogs and bibliographic databases index research methods, and even those that do often don’t index qualitative methods with appropriate granularity.  Thus, it may be necessary to provide tips on how best to search for studies that employ qualitative methods (e.g., NYU Libraries’ guide on Locating Qualitative Research by Susan Jacobs).  This way, a guide delivers select sources but also teaches users how to find additional methods-related sources themselves.

Links to subscription-based resources

There are a number of specialized library databases designed for quick reference and/or in-depth, self-guided learning, and these serve as rich fonts of information about qualitative methods.  Three possible resources that fit this bill are:   

  • Credo Reference
    While not specific to research methods, Credo is a multidisciplinary, searchable collection of digitized reference works (e.g., dictionaries, encyclopedias, and handbooks) that provide helpful background information on a topic but also act as springboards to connect users to further readings and cross-references.  Credo can serve as a good starting point for researchers looking to learn more about qualitative methods in general or about a particular methodology.   
  • Sage Research Methods Online (SRM)
    SRM is an online multimedia collection devoted to research methods (qualitative and quantitative), with a special emphasis on research skills training.  It includes e-versions of SAGE’s Little Blue Books, an instructional series on qualitative research methods, as well as many other searchable, full-text interdisciplinary SAGE reference and journal sources that allow users to deep-dive into a particular method at each stage of the research lifecycle.  SRM also provides resources for teaching qualitative methods, including case studies, sample datasets, and instructional videos.

  • Oxford Bibliographies Online (OBO)
    OBO offers access to online, peer-reviewed, annotated bibliographies that are organized by discipline and are written by experts in their fields.  One can find entire bibliographies, or portions of a bibliography, dedicated to qualitative methods in a given field of social science (e.g., sociology, political science) or within a particular discipline more broadly (e.g., anthropology, education).  This makes OBO a good source of information for discipline-specific qualitative research methods.  

In addition to researchers, these resources can be indispensable to data-support professionals who are asked to consult on research projects using qualitative methods outside their own specializations, or who are asked to consult with researchers who sit in (or between) disciplines with which they are less familiar or comfortable.

List of professional development and ongoing learning opportunities

Researchers who are new to a qualitative research method may want to learn more about it in an interactive manner beyond (or in lieu of) the classroom.  Opportunities to do so are plentiful and varied, and may take the following forms:

I hope you have enjoyed my suggestions in this post!  They are by no means exhaustive.  I would love to hear what you think, what you would add, or what you’re already using on your guides to address the qualitative methods gap discussed in this series of blog posts.  

We welcome comments here, emails to the IASSIST listserv, the QSSHDIG google group, or directly to the authors, and/or comments in this “Blog Conversations” doc embedded in the QSSHDIG website. Also, there's a section at the bottom of the "Blog Conversations" doc for suggesting future QSSHDIG posts - please do!

Stay tuned for Part 3 of our blog series next week when Mandy Swygart-Hobaugh will share ideas for developing and providing training resources in collaboration with faculty and academic departments that are mindful of the qualitative methods gap.

But How Do I *Do* Qualitative Research? Bridging the Gap between Qualitative Researchers and Methods Resources--PART 1

The IASSIST Qualitative Social Science & Humanities Data Interest Group (QSSHDIG) was created in October 2016 with a central purpose: to foster conversations regarding the needs of researchers who generate qualitative data, and the types of services librarians and other information professionals can develop to support these researchers in managing their data/source materials throughout the research lifecycle.

This four-post blog series engages in one particular conversation: challenges researchers face in terms of a lack of qualitative methods training, and strategies for how data-support professionals can address these challenges. The following QSSHDIG members are the series authors:

  • Jill Conte, Social Sciences Librarian at New York University
  • Liz Cooper, Social Sciences Librarian at the University of New Mexico
  • Mandy Swygart-Hobaugh, Social Sciences Librarian at Georgia State University

To foster conversation, we welcome comments here, emails to the IASSIST listserv, the QSSHDIG google group, or directly to the authors, and/or comments in this “Blog Conversations” doc embedded in the QSSHDIG website. Also, there's a section at the bottom of the "Blog Conversations" doc for suggesting future QSSHDIG posts - please do!

Why have this conversation?

Many social science researchers (students and faculty alike) are increasingly conducting qualitative research while lacking formal training in qualitative methods. This may be due to various factors, including but not limited to the following:

  • their particular discipline does not widely embrace qualitative research,
  • their discipline just recently began emphasizing mixed methods (using qualitative and quantitative methods) when previously it was predominantly quantitative-based,
  • they are in an interdisciplinary academic program without a strong research methods training component.

Those of us who offer training sessions on qualitative data analysis software (such as NVivo, Atlas.ti, Quirkos, or Dedoose) often experience researchers coming to these sessions without the methodological background to *do* qualitative research or understand what the software can/cannot do for them - sometimes hoping that the software will have the “magic button” to solve their lack of training. Similarly, as social science liaison librarians we often witness this qualitative methods gap during our research consultations. Although this dilemma of lack of methods training is not unique to qualitative research (i.e., researchers lacking quantitative research training are known to attend statistical software training sessions), when compared to their quantitative counterparts, qualitative researchers often have fewer resources for support and for building necessary skills.

The three posts in the remainder of this blog series will offer concrete strategies for how data-support professionals can act as bridges between social science researchers and the resources they need to strengthen their qualitative research and methodologies skills:

  • Jill Conte’s post will offer suggestions for connecting researchers with secondary and tertiary sources for qualitative research training. [to be posted Monday, July 24] 
  • Mandy Swygart-Hobaugh’s post will share ideas for developing and providing training resources in collaboration with faculty and academic departments that are mindful of this methods gap. [to be posted Monday, July 31] 
  • Liz Cooper’s post will address how librarians and other data-support professionals can help build community at their institutions around qualitative research. [to be posted Monday, August 7] 

IQ Volume 40 Issue 3 now available

Issue 40(3) is now online at

Editor’s Notes

Being international - and proud of it!

IASSIST is proud of being international. These days some of us find it important to emphasize how international collaboration has improved and made our lives more efficient. In the small but around-the-globe-reaching world of IASSIST, many national data archives have come into existence, and continued their development, through friendly international support and the spreading of knowledge and good practices among IASSISTers. So let us cherish the 'International' in IASSIST. We are proud of the lead 'I' for 'International' in the IASSIST acronym and have no intention of changing that to 'N' for 'National'. It is also my impression that data archives all over the world simply don't have the facilities for storing 'alternative facts', as they are shy of all kinds of documentation.

Welcome to the third issue of Volume 40 of the IASSIST Quarterly (IQ 40:3, 2016). Four papers with authors from three continents are presented in this issue. The paper 'Demonstrating Repository Trustworthiness through the Data Seal of Approval' is a summary of a panel session at the IASSIST 2015 conference in Minneapolis with panel members Stuart Macdonald, Ingrid Dillo, Sophia Lafferty-Hess, Lynn Woolfrey, and Mary Vardigan. The paper has an introduction from DANS in the Netherlands where the Data Seal of Approval (DSA) originated. Cases from the US and South Africa are presented and the future of the DSA including possible harmonization with other systems is discussed. DSA certifications are basically consumer guidance, clearly assisting all the involved parties. Depositors and funding bodies will be assured that data are reliably stored, researchers can reliably access the data repositories, and repositories are supported in their work of archiving and distribution of data.

The second article brings us to the actual use of data. From the UK Data Service, Rebecca Parsons and Scott Summers in 'The Role of Case Studies in Effective Data Sharing, Reuse and Impact' take us into positive narratives around secondary data. The background is that although the publishing of data is now recognised by funders, the authors find that ‘showcasing’ brings motivation for data sharing and reuse as well as improving the quality of data and documentation. The impact of case studies is all-sided and research, depositing data, and the brand recognition of the UK Data Service are among the areas investigated. The future is likely to include new case studies developed for use in teaching in schools, with easy linking to datasets, as well as for researchers being assisted to build their own portfolios. The appendix presents case studies on research and impact.

In the third article, we are situated in data creation. Muhammad F. Bhuiyan and Paula Lackie from Carleton College in Minnesota write on 'Mitigating Survey Fraud and Human Error: Lessons Learned from A Low Budget Village Census in Bangladesh'. As the 'fraud' term implies, they are looking into the problem of data creators being too creative, but more importantly they are investigating the essential area of data quality. The authors explain how selected technological assets like the use of geographic information systems (GIS) and audio-capturing smart pens improved data quality. The use of these tools is exemplified through many scenarios described in the paper. Furthermore, a procedure of daily monitoring and fast transcription lead to quick surveyor re-training and dismissal of others, thus minimising data errors. For those interested in false data and its detection, the introduction in particular has valuable references to literature.

In the last paper the difficult task of handling images is addressed in 'Image Management as a Data Service' by Berenica Vejvoda, K. Jane Burpee, and Paula Lackie. Vejvoda and Burpee work at McGill University in Montreal. You have already met Lackie from Carleton College in relation to the third paper above. The 'images' in the article are digital images, and the authors suggest that the knowledge of digital data services across the 'research data lifecycle' also benefits the management of digital images. Digital images are numerical data, and the article compares the data, metadata, and paradata of a survey respondent to the information on a digital image. Considerations from normal data concerning system formats and storage space also apply to management of images. In the last section the paper introduces copyright issues that are complicated, to say the least. Just as reuse of normal data can have ethical angles, it is even more apparent that images can have complicated issues of privacy and confidentiality.

Papers for the IASSIST Quarterly are always very welcome. We welcome input from IASSIST conferences or other conferences and workshops, from local presentations, or papers especially written for the IQ. When you are preparing a presentation, give a thought to turning your one-time presentation into a lasting contribution. We permit authors 'deep links' into the IQ as well as deposition of the paper in your local repository. Chairing a conference session with the purpose of aggregating and integrating papers for a special issue of the IQ is also much appreciated, as the information reaches many more people than the session participants and will be readily available on the IASSIST website at http://www.iassistd

Authors are very welcome to take a look at the instructions and layout:

Authors can also contact me via e-mail: Should you be interested in compiling a special issue for the IQ as guest editor(s), I will also be delighted to hear from you.

Karsten Boye Rasmussen

January 2017


IASSIST Call for Event Sponsorship Proposals 2017 Round 2: “Mini Grants”

The IASSIST Liaison and Organizational Sponsorship Task Force is seeking proposals for sponsorships of regional or local events during calendar year 2017. In this second round of sponsorships we will be awarding up to four grants of $500 USD each, but requests for any amount up to $500 USD will be considered.

The goal of these sponsorships is to support local networks of data professionals and data-related activities across the globe, helping to sustain IASSISTers' activities throughout the year and to increase awareness of the value of IASSIST membership.

Events should be gatherings of data professionals from multiple institutions and may vary in size and scope, from workshops and symposia to conferences. These may be established events or new endeavors. We are particularly looking to sponsor regional or local events that will attract data professionals who would benefit from IASSIST membership but may not always be able to travel to attend IASSIST conferences. Preference will be given to events from geographic areas outside of traditional IASSIST conference locations (North America and Western Europe), and from underrepresented membership areas such as Latin/South America, Africa, Asia/Pacific, and Eastern Europe.

Requests for sponsorships may be monetary, and may also include a request for mentorship assistance by matching the event planning committee with an experienced IASSIST member with relevant expertise (e.g., conference planning, subject/content, geographic familiarity).

Accepted events will be required to designate an active IASSIST member as the liaison. Generally, this would be an IASSIST member who will be attending the event and who, although not required, may be on the planning committee or otherwise contributing to the event. The liaison will be responsible for assisting with coordinating logistics related to the sponsorship, ensuring that the sponsorship is recognized at the event, and contributing a post to the IASSIST iBlog about the event.

Proposals should include:

  • Name of the event and event details (date, location, any other pertinent information)
  • Organizing or hosting institution
  • Description of event and how it relates to IASSIST goals and communities
  • Specific request for sponsorship: amount of money and/or mentorship assistance
  • Description of how the sponsorship will be used
  • Name and contact information of person submitting proposal and designated event liaison to IASSIST (if different)

Proposals are due on Friday, June 30, 2017 via the Application Form. Notification of sponsorship awards will be made by July 21, 2017. The number and monetary extent of awarded sponsorships will depend on the number and quality of applications received within a total budgeted limit. Individual sponsorship requests may range from $0 USD (request for mentorship only) to $500 USD.

Please direct questions to Jen Doty, IASSIST Membership Chair (

2016-2017 report for the Qualitative Social Science and Humanities Data Interest Group (QSSHDIG)

The Qualitative Social Science and Humanities Data Interest Group (QSSHDIG) was formed in fall 2016. We decided to focus our efforts on the conference this year. We have some continuing projects planned for next year. We are meeting at the conference on Tuesday, May 22 at 4pm in the Oread Lobby.
IASSIST 2017 Conference activities:
Continuing Projects:
  • We are developing a blog post series on the challenges of balancing teaching/providing resources for qualitative *software* against teaching/providing resources for qualitative methods. Mandy is currently leading that effort.
  • We have been working on developing a LibGuide compiling qualitative and humanities data resources (e.g., finding data sources, analysis tools, etc.). We are gathering some resources now and will talk more about this at our group meeting tomorrow. Lynda is leading this effort.
  • We also have an email list for everyone interested in Qualitative Social Science or Humanities Data research.
  • We would love to have any members help out with these efforts. If you are interested, please email Lynda or Mandy.

IASSIST Geospatial Interest Group 2017 Report


2016-2017 Report on the IASSIST Geospatial Interest Group

The group was founded in the Spring/Summer of 2016 and met at the 2016 Bergen conference. The central purpose of the IASSIST Geospatial Interest Group is to create a network for members focused on issues of geospatial data as related to the social sciences. The current chair is Jennifer Moore (Washington University in St. Louis).

At the meeting in Bergen (review notes) we discussed the merits of an IASSIST geospatial interest group in relation to existing groups (e.g. ALA), recommending geospatial resources and tools for institutions with limited resources available to support GIS, and whether the interest group needs to be laser-focused on the social sciences.


As a result of the discussions, we developed lists of resources and tools, which are not exhaustive.

For communication we established a Google Group, but traffic has been light. In 2017/2018 the group may explore a more effective communication tool.

The Challenge of Rescuing Data: Lessons and Thoughts

A version of this post originally appeared on the NYU Data Dispatch blog.

Data rescue efforts began in January 2017, and over the past few months many institutions hosted hack-a-thon style events to scrape data and develop strategies for preservation. The Environmental Data & Governance Initiative (EDGI) developed a data rescue toolkit, which apportioned the challenge of saving data by distinct federal agency. 

We've had a number of conversations at NYU and with other members of the library community about the implications of preserving federal data and providing access to it. The efforts, while important, call attention to a problem of organization that is very large in scope and likely cannot be solved in full by libraries.


Thus far, the divide-and-conquer model has postulated that individual institutions can "claim" a specific federal agency, do a deep dive to root around its websites, download data, and then mark the agency off a list as "preserved." The process raises many questions, for libraries and for the data refuge movement. What does it mean to "claim" a federal agency? How can one institution reasonably develop a "chain of custody" for an agency's comprehensive collection of data (and how do we define chain of custody)?

How do we avoid duplicated labor? Overlap is inevitable and isn't necessarily a bad thing, but given the scope of the challenge, it would be ideal to distribute efforts so as to benefit from the hard work of metadata remediation that all of us will inevitably do.

These questions suggest even more questions about communication. How do we know when a given institution has preserved federal data, and at what point do we feel ready as a community to acknowledge that preservation has sufficiently taken place? Further, do we expect institutions to communicate that a piece of data has been published, and if so, by what means? What does preservation mean, especially in an environment where data is changing frequently, and what is the standard for discovery? Is it sufficient for one person or institution to download a file and save it? And when an institution claims that it has “rescued” data from a government agency, what commitment does it have to keep up with data refreshes on a regular basis?

An example of an attempt to engage with these issues is Stanford University’s recent decision to preserve the Housing and Urban Development spatial datasets, since they were directly attacked by Republican lawmakers. Early in the Spring 2017 semester, Stanford downloaded all of HUD's spatial data, created metadata records for them, and loaded them into their spatial discovery environment (EarthWorks).

A HUD dataset preserved in Stanford's Spatial Data Repository and digital collections

We can see from the timestamp on their metadata record that the files were added on March 24, 2017. Stanford's collection process is very robust and implies an impressive level of curation and preservation. As colleagues, we know that by adding a file, Stanford has committed to preserving it in its institutional repository, presenting the original FGDC or ISO 19139 metadata records, and publishing its newly created records to OpenGeoMetadata, a consortium of shared geospatial metadata records. Furthermore, we know that all records are discoverable at the layer level, which implies a granularity in description and access that is often not present at many other sources.

However, if I had not had conversations with colleagues who work at Stanford, I wouldn't have realized they had preserved the files at all and likely would've tried to make records for NYU's Spatial Data Repository. Even so, it's difficult for me to know that these files were in fact saved as part of the Data Refuge effort. Furthermore, Stanford has made no public claim or long-term "chain of custody" agreement for the HUD data, simply because no standards for doing so currently exist.

Maybe it wouldn't be the worst thing for NYU to add these files to our repository, but it seems unnecessary, given the magnitude of federal data to be preserved. However, some redundancy is a part of the goals that Data Refuge imagines:

Data collected as part of the #DataRefuge initiative will be stored in multiple, trusted locations to help ensure continued accessibility. [...] DataRefuge acknowledges--and in fact draws attention to--the fact that there are no guarantees of perfectly safe information. But there are ways that we can create safe and trustworthy copies. DataRefuge is thus also a project to develop the best methods, practices, and protocols to do so.

Each institution has specific curatorial needs and responsibilities, which imply choices about providing access to materials in library collections. These practices seldom align with the data management and publishing practices of those who work with federal agencies. There has to be some flexibility among community efforts to preserve data, individual institutions, and their respective curation practices.

"That's Where the Librarians Come In"

NYU imagines a model that dovetails with the Data Refuge effort, in which individual institutions build upon their own strengths and existing infrastructure. We took as a directive some advice that Kimberly Eke at Penn circulated, including this sample protocol. We quickly began to realize that no approach is perfect, but we wanted to develop a pilot process for collecting data and bringing it into our permanent geospatial data holdings. The remainder of this post is a narrative of that experience, meant to demonstrate some of the choices we made, assumptions we started with, and strategies we deployed to preserve federal data. Our goal is to preserve a small subset of data in a way that benefits our users and also meets the standards of the Data Refuge movement.

We began by collecting the entirety of the catalog's publicly accessible metadata, using its underlying CKAN data catalog API. This provided us with approximately 150,000 metadata records, stored as individual JSON files. Anyone who has worked with metadata knows that it's messy and inconsistent but is also a good starting place for developing better records. Furthermore, the catalog serves as an effective registry or checklist (this global metadata vault could be another starting place); it's not the only source of government data, nor is it necessarily authoritative. However, it is a good point of departure, a relatively centralized list of items that exist in a form that we can work with.
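The harvesting step might be sketched as follows. The endpoint URL is a hypothetical stand-in, but CKAN's standard `package_search` action does page through results with `rows` and `start` parameters; the `fetch` callable is our own abstraction so the paging logic can be exercised without network access.

```python
from urllib.parse import urlencode

# Hypothetical CKAN endpoint; the post does not spell out the exact URL.
BASE = "https://catalog.example.gov/api/3/action/package_search"

def harvest(fetch, rows=1000):
    """Page through a CKAN package_search endpoint, collecting every record.

    `fetch` is any callable that maps a URL to the parsed JSON response,
    so the same loop works with urllib, requests, or a test double.
    """
    records, start = [], 0
    while True:
        url = BASE + "?" + urlencode({"rows": rows, "start": start})
        result = fetch(url)["result"]
        records.extend(result["results"])
        start += rows
        if start >= result["count"]:
            return records
```

Each response reports the total `count`, so the loop knows when it has seen the last page; the records can then be dumped to individual JSON files.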

Since NYU Libraries already has a robust spatial data infrastructure and has established workflows for accessioning GIS data, we began by reducing the set of records to those which are likely to represent spatial data. We did this by searching only for files that meet the following conditions:

  • Record contains at least one download resource with a 'format' field that contains any of {'shapefile', 'geojson', 'kml', 'kmz'}
  • Record contains at least one resource with a 'url' field that contains any of {'shapefile', 'geojson', 'kml', ['original' followed by '.zip']}
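The two conditions above can be expressed as a small predicate over CKAN-style records. The field names (`resources`, `format`, `url`) follow CKAN's record structure; the exact matching rules here are our reading of the bullets, not the team's actual code.

```python
import re

# Format strings from the first condition; URL hints from the second.
SPATIAL_FORMATS = ("shapefile", "geojson", "kml", "kmz")
URL_HINTS = ("shapefile", "geojson", "kml")

def looks_spatial(record):
    """Return True if any resource in a CKAN-style record looks geospatial."""
    for res in record.get("resources", []):
        fmt = (res.get("format") or "").lower()
        url = (res.get("url") or "").lower()
        if any(f in fmt for f in SPATIAL_FORMATS):
            return True
        if any(h in url for h in URL_HINTS):
            return True
        # 'original' followed (eventually) by '.zip' in the URL
        if re.search(r"original.*\.zip", url):
            return True
    return False
```

Running a predicate like this over the full record set is what reduces roughly 150,000 records to the few thousand likely to contain spatial data.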

That search generated 6,353 records that are extremely likely to contain geospatial data. We exported that subset of records and transformed it into a CSV.

The next step was to filter the set down and look for meaningful patterns. We first filtered out all records that were not from federal sources, grouped the remainder by agency, and started exploring them. Ultimately, we decided to rescue data from the Department of Agriculture's Forest Service. This agency seems to be a good test case for a number of the challenges that we've identified. We isolated 136 records and organized them here (click to view spreadsheet). However, we quickly realized that a sizable chunk of the records had already become inactive or defunct after we downloaded them (shaded in pink), perhaps because they had been superseded by other records. For example, this record is probably meant to represent the same data as this record. We can't know for sure, which means we immediately had to decide what to do with potential gaps. We forged ahead with the records that were still "live" in the catalog.

About Metadata Cleaning

There are some limitations to the catalog's metadata that required our team to make a series of subjective decisions:

  1. Not everything in the catalog points to an actual dataset. Often, records point to other portals or clearinghouses of data that are not represented within the catalog itself. We ultimately decided to omit these records from our data rescue effort, even if they point to a webpage, API, or geoservice that does contain some kind of data.
  2. The approach to establishing order in the catalog is inconsistent. Most crucially for us, there is not a one-to-one correlation between a record and an individual layer of geospatial data. This happens frequently on federal sites. For instance, the record for the U.S. Forest Service Aerial Fire Retardant Hydrographic Avoidance Areas: Aquatic actually contains eight distinct shapefile layers that correspond to the different regions of coverage. NYU's collection practice dictates that each of these layers be represented by a distinct record, but in the catalog they are condensed into a single record. 
  3. Not all data providers publish catalog records for their data consistently. Many agencies point to some element of their data, but when you leave the catalog environment and go to the source URL listed in the resources section of a record, you'll find even more data. We had to make decisions about whether (and how) we would include this kind of data.
  4. It’s very common that single metadata records remain intact, but the data that they represent changes. The Forest Service is a good example of this, as files are frequently refreshed and maintained within the USDA Forestry geodata clearinghouse. We did not make any effort in either of these cases to track down other sets of data that the metadata records gesture toward (at least not at this time).

Relatedly, we did not attempt to provide separate records for different formats of what appeared to be the same data. In the case of the Forest Service, many of the records contained both a shapefile and a geodatabase, as well as other original metadata files. Our general approach was to save the shapefile and publish it in our collection environment, then bundle up all other "data objects" associated with a discrete record and include them in the preservation environment of our Spatial Data Repository.

Finally, we realized that the quality of the metadata itself varies widely. We found that it's a good starting place for creating discovery metadata, even if we agree that a record is an arbitrary way to describe a single piece of data. However, we had to clean the records to adhere to the GeoBlacklight standard and our own internal cataloging practices. Some of the revisions to the metadata are small and reflect choices that we make at NYU (these are highlighted in red). For instance, the titles were changed to reflect a date-title-area convention that we already use. Other fields (like Publisher) are authority controlled and were easy to change, while others, like format and provenance, were easy to add. For those unfamiliar with the GeoBlacklight standard, refer to the project schema pages and related documentation. Many of the metadata enhancements are system requirements for items to be discovered within our Spatial Data Repository. Subjects presented more of a problem, as these are drawn from the catalog's informal tagging system. We used an elaborate find-and-replace process to remediate these subjects into the LCSH authority, which connects the items we collect to our larger library discovery environment.
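The subject remediation amounts to a lookup from free-form tags to controlled headings. A sketch of the pattern is below; the tag-to-LCSH pairs are hypothetical stand-ins, since the actual mapping was developed by hand against the authority file.

```python
# Hypothetical tag-to-heading pairs; the real mapping was curated manually
# against the LCSH authority.
TAG_TO_LCSH = {
    "forestry": "Forests and forestry",
    "wildfire": "Wildfires",
    "hydrography": "Hydrography",
}

def remediate_subjects(tags):
    """Map free-form catalog tags onto LCSH headings, dropping tags with
    no match and de-duplicating the result while preserving order."""
    seen, subjects = set(), []
    for tag in tags:
        heading = TAG_TO_LCSH.get(tag.strip().lower())
        if heading and heading not in seen:
            seen.add(heading)
            subjects.append(heading)
    return subjects
```

Dropping unmatched tags is one possible policy; another would be to flag them for manual review, which is closer to the "elaborate" process the post describes.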

The most significant changes are in the descriptions. We preserved the essence of the original description, yet we cleaned up the prose a little bit and added a way to trace the item that we are preserving back to its original representation in the catalog. In the aforementioned instances, in which a single record contains more than one shapefile, we generated an entirely new record and referenced it to the original UUID. 

Future Directions: Publishing Checksums

Libraries' inability to represent precisely and accurately which datasets, or components of datasets, have been preserved is a serious impediment to embarking on a distributed repository / data-rescue project. Further, libraries need to know if data objects have been preserved and where they reside. To return to the earlier example, how is New York University to know that a particular government dataset has already been "rescued" and is being preserved (whether via a publicly accessible repository interface or not)?

Moreover, even if there is a venue for institutions to discuss which government datasets fall within their collection priorities (e.g. "New York University cares about federal forestry data, and therefore will be responsible for the stewardship of that data"), it's not clear that there is a good strategy for representing the myriad ways in which the data might exist in its "rescued" form. Perhaps the institution that elects to preserve a dataset wants to make a few curatorial decisions in order to better contextualize the data with the rest of the institution's offerings (as we did with the Forest Service data). These types of decisions are not abnormal in the context of library accessioning.

The problem comes when the data processing practices of an institution, which are often idiosyncratic and filled with "local" decisions, start to inhibit the ability of individuals to recognize a copy of a dataset as a copy. There is a potential tension between preservation –– preserving the original file structure, naming conventions, and even level of dissemination of government data products –– and discovery, where libraries often make decisions about the most useful way for users to find relevant data that are in conflict with the decisions exhibited in the source files.

To mitigate the problem sketched above, we propose a data store that can be drawn upon by all members of the library / data-rescue community, in which arbitrary or locally specific mappings and organizational decisions can be related back to the checksums of the original, atomic files. File checksums would be unique identifiers in such a data store; given a checksum, the service would display "claims" about institutions that hold the corresponding file, and the context in which that file is accessible.
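A minimal in-memory sketch of such a claims service follows. The data structure and function names are our own invention for illustration; a real service would of course be a shared, persistent system rather than a Python dictionary.

```python
# Toy version of the proposed data store. Keys are checksums of atomic
# files; values are the "claims" institutions make about holding them.
claims = {}

def register(checksum, institution, package):
    """Record that `institution` holds the file, inside `package`."""
    claims.setdefault(checksum, []).append(
        {"institution": institution, "package": package}
    )

def who_holds(checksum):
    """Return every claim for this checksum; an empty list means no
    institution has yet asserted custody of the file."""
    return claims.get(checksum, [])
```

The key design point is that lookups happen at the level of individual file checksums, not packages, so a claim remains discoverable no matter how the holding institution has repackaged the data.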

Consider this as an example:

  • New York University, as part of an intentional data rescue effort, decides to focus on collecting and preserving data from the U.S. Forest Service.
  • The documents and data from Forest Service are accessible through many venues:
    • They (or some subset) are linked to from a catalog record
    • They (or some subset) are linked to directly from the FSGeodata Clearinghouse
    • They are available directly from a geoservices or FTP endpoint maintained by the Forest Service (such as here).
  • NYU wants a way to grab all of the documents from the Forest Service that it is aware of and make those documents available in an online repository. The question is, if NYU has made organizational and curatorial decisions about the presentation of documents rescued, how can it be represented (to others) that the files in the repository are indeed preserved copies of other datasets? If, for instance, Purdue University comes along and wants to verify that everything on the Forest Service's site is preserved somewhere, it now becomes more difficult to do so, particularly since those documents never possessed a canonical or authoritative ID in the first place, and even could have been downloaded originally from various source URLs.

Imagine instead that as NYU accessions documents –– restructuring them and adding metadata –– it not only creates checksum manifests (similar to, if not identical to, the ones created by default by BagIt), but also deposits those manifests to a centralized data store in such a form that the data store could now relate essential information:

The file with checksum 8a53c3c191cd27e3472b3e717e3c2d7d979084b74ace0d1e86042b11b56f2797 appears as a component of the document institution_a_9876... held by New York University.

Assuming all checksums are computed at the lowest possible level on files rescued from Federal agencies (i.e., always unzip archives, or otherwise get to an atomic file before computing a checksum), such a service could use archival manifest data as a way to signal to other institutions if a file has been preserved, regardless of whether or not it exists as a smaller component of a different intellectual entity –– and it could even communicate additional data about where to find these preserved copies. In the example of the dataset mentioned above, the original record represents 8 distinct resources, including a Shapefile, a geodatabase, an XML metadata document, an HTML file that links to an API, and more. For the sake of preservation, we could package all of these items, generate checksums for each, and then take a further step in contributing our manifest to this hypothetical datastore. Then, as other institutions look to save other data objects, they could search against this datastore and find not merely checksums of items at the package level, but actually at the package component level, allowing them to evaluate which portion or percentage of data has been preserved.
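Computing checksums "at the lowest possible level" might look like the sketch below, which treats a zip archive as a container and checksums each member file rather than the archive itself. The function name and manifest shape are assumptions for illustration.

```python
import hashlib
import io
import zipfile

def sha256(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

def atomic_checksums(name, data):
    """Checksum a rescued file at the lowest possible level: if the bytes
    are a zip archive, checksum each member file instead of the archive."""
    buf = io.BytesIO(data)
    if zipfile.is_zipfile(buf):
        with zipfile.ZipFile(buf) as zf:
            return {f"{name}/{member}": sha256(zf.read(member))
                    for member in zf.namelist()}
    return {name: sha256(data)}
```

A manifest built this way could be contributed to the hypothetical data store, letting other institutions match against package components rather than whole packages.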

A system such as the one sketched above could efficiently communicate preservation priorities to a community of practice, and even find use for more general collection-development priorities of a library. Other work in this field, particularly that regarding IPFS, could tie in nicely –– but unlike IPFS, this would provide a way to identify content that exists within file archives, and would not necessitate any new infrastructure for hosting material. All it would require is for an institution to contribute checksum manifests and a small amount of accompanying metadata to a central datastore.


Even though our rescue of the Forest Service data is still in progress, we have learned a lot about the challenges associated with this project. We're very interested in learning how other institutions are handling the process of rescuing federal data and look forward to more discussions at the event in Washington D.C. on May 8.
