
Research Data Management

IQ 40:1 Now Available!

Our World and all the Local Worlds
Welcome to the first issue of Volume 40 of the IASSIST Quarterly (IQ 40:1, 2016). We present four papers in this issue. The first paper presents data from our very own world, extracted from papers published in the IQ over four decades. What is published in the IQ is often limited in geographical scope, and in this issue the other three papers present investigations and project research carried out at New York University, Purdue University, and the Federal Reserve System. However, the subject scope of the papers and the methods employed bring great diversity. And although the papers are local in origin, they all have a strong focus on generalization in order to spread the information and experience.


We proudly present the paper that received the 'best paper award' at the IASSIST 2015 conference. Great thanks are expressed to all the reviewers who took part in the evaluation! In the paper 'Social Science Data Archives: A Historical Social Network Analysis', the authors Kristin R. Eschenfelder (University of Wisconsin-Madison), Morgaine Gilchrist Scott, Kalpana Shankar, and Greg Downey report on inter-organizational influence and collaboration among social science data archives, drawing on articles published in the IASSIST Quarterly from 1976 to 2014. The paper demonstrates social network analysis (SNA) using a web of 'nodes' (people/authors/institutions) and 'links' (relationships between nodes). Several types of relationships are identified: influencing, collaborating, funding, and international. The dynamics are shown in detail by employing five-year sections. I noticed that from a reluctant start the number of relationships has grown significantly, and archives have continuously grown better at bringing in 'influence' from other 'nodes'. The paper contributes to the history of social science data archives and the shaping of a research discipline.
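To give a flavour of the SNA approach, here is a minimal Python sketch using the networkx library; this is not the authors' actual analysis, and the nodes and relationship types below are invented for illustration:

```python
import networkx as nx

# Build a toy web of 'nodes' (archives/institutions) and 'links'
# (relationships), tagged with the relationship types the paper identifies.
G = nx.Graph()
G.add_edge("Archive A", "Archive B", kind="collaborating")
G.add_edge("Archive A", "Archive C", kind="influencing")
G.add_edge("Archive B", "Archive C", kind="funding")
G.add_edge("Archive C", "Archive D", kind="international")

# Degree centrality gives a first impression of which nodes bring in
# the most 'influence' from other nodes.
for node, score in sorted(nx.degree_centrality(G).items(),
                          key=lambda kv: -kv[1]):
    print(f"{node}: {score:.2f}")
```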


The paper 'Understanding Academic Patrons’ Data Needs through Virtual Reference Transcripts: Preliminary Findings from New York University Libraries' is authored by Margaret Smith and Jill Conte, who are both librarians at New York University, and Samantha Guss, a librarian at the University of Richmond who worked at New York University from 2009 to 2014. The goal of their paper is 'to contribute to the growing body of knowledge about how information needs are conceptualized and articulated, and how this knowledge can be used to improve data reference in an academic library setting'. This is carried out through analysis of chat transcripts of requests for census data at NYU. There is a high demand for the virtual services of the NYU Libraries, with as many as 15,000 chat transactions annually. There has not been much qualitative research into users' data needs, but here the authors exemplify the iterative nature of grounded theory, with data collection and analysis processes inextricably entwined, using a range of software tools such as FileLocator Pro, TextCrawler, and Dedoose. Three years of chat reference transcripts were filtered down to 147 transcripts related to United States and international census data. This unique data provides several insights, presented in the paper. However, the authors are also aware of the limitations of the method, as the transcripts do not record whether the patron or librarian considered the interaction successful. The conclusion is that there is a need for additional librarian training and improved research guides.
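For illustration, the kind of keyword filtering that narrows thousands of transcripts down to the census-related ones could look like the minimal Python sketch below; the file layout and search pattern are invented, and the authors worked with tools such as FileLocator Pro and TextCrawler rather than this code:

```python
import re
from pathlib import Path

# Match census-related transcripts, allowing for some common variants.
CENSUS_PATTERN = re.compile(
    r"\bcensus\b|american community survey|\bacs\b", re.IGNORECASE)

# Assume one chat transcript per text file in a transcripts/ folder.
matches = [path for path in Path("transcripts").glob("*.txt")
           if CENSUS_PATTERN.search(path.read_text(encoding="utf-8"))]

print(f"{len(matches)} transcripts mention census data")
```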


The third paper is also from a university. Amy Barton, Paul J. Bracke, and Ann Marie Clark, all from Purdue University, collaborated on the paper 'Digitization, Data Curation, and Human Rights Documents: Case Study of a Library Researcher-Practitioner Collaboration'. The project concerns the digitization of Urgent Action Bulletins of Amnesty International from 1974 to 2007. The political science research centered on changes in transnational human rights advocacy and legal instrumentation, while the Libraries’ research related to data management, metadata, the data lifecycle, etcetera. The specific research collaboration model developed was also generalized for future practitioner-librarian collaboration projects. The project is part of a recent tendency for academic libraries to improve engagement and combine activities among libraries, users, and institutions. The project attempts to integrate two different lifecycle models, thus serving both research and curatorial goals, where the central question is: 'can digitization processes be designed in a manner that feeds directly into analytical workflows of social science researchers, while still meeting the needs of the archive or library concerned with long-term stewardship of the digitized content?'. The project builds on data from the Urgent Action Bulletins produced by Amnesty International as an indication of how human rights concerns changed over time, and of the threats in different countries at different periods, and combines library standards for digitization and digital collections with researcher-driven metadata and coding strategies. The data creation started with scanning and the creation of optical character recognition (OCR) versions of full-text PDFs for text recognition and modeling in the NVivo software. The project did succeed in developing shared standards. However, a fundamental challenge was experienced in the grant-driven timelines for both library and researcher. It seems to me that the expectation of parallel work was the challenge to the project. Things take time.
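A minimal sketch of such a scanning-to-text step in Python, assuming the pdf2image and pytesseract libraries (with the Tesseract OCR engine installed); this stands in for whatever OCR tooling the project actually used, and the file name is invented:

```python
import pytesseract
from pdf2image import convert_from_path

# Render each page of a scanned bulletin as an image, then OCR it.
pages = convert_from_path("urgent_action_bulletin.pdf", dpi=300)
text = "\n".join(pytesseract.image_to_string(page) for page in pages)

# The resulting plain text can then be loaded into a tool such as
# NVivo for coding and modeling.
with open("urgent_action_bulletin.txt", "w", encoding="utf-8") as out:
    out.write(text)
```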


In the fourth paper we enter the case of the Federal Reserve System. San Cannon and Deng Pan, working at the Federal Reserve Banks of Kansas City and Chicago, created a pilot infrastructure and workflow to support making the publication of research data a regular part of the research lifecycle. This is reported in the paper 'First Forays into Research Data Dissemination: A Tale from the Kansas City Fed'. More than 750 researchers across the system produce about 1,000 journal articles, working papers, etcetera each year. The need for data to support the research has been recognized, and the institution is setting up a repository and defining a workflow to support data preservation and future dissemination. In early 2015 the internal Center for the Advancement of Research and Data in Economics (CADRE) was established with a mission to support, enhance, and advance data- or computationally-intensive research, and preservation and dissemination were identified as important support functions for CADRE. The paper presents details of and questions about the design, such as the types of collections and the kinds and sizes of data files, and demonstrates the influence of testers and curators. The pilot also had to decide on the metadata fields to be used when data is submitted to the system. The complete setup, including the incorporated fields, was enhanced through pilot testing and user feedback. The pilot is now being expanded to other Federal Reserve Banks.
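To give a feel for that design decision, here is a minimal sketch of a deposit record in Python; the fields are illustrative guesses on my part, not the actual metadata schema the CADRE pilot settled on:

```python
# Hypothetical metadata accompanying a dataset submission; the fields
# the pilot actually chose are described in the paper itself.
deposit = {
    "title": "Example dataset accompanying a working paper",
    "creators": ["Researcher, A.", "Researcher, B."],
    "publication_date": "2015-06-01",
    "description": "Data and code underlying the reported results.",
    "file_formats": ["csv", "dta"],
    "access": "internal",  # e.g. internal use vs. public release
}

for field, value in deposit.items():
    print(f"{field}: {value}")
```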


Papers for the IASSIST Quarterly are always very welcome. We welcome input from IASSIST conferences or other conferences and workshops, from local presentations, or papers especially written for the IQ. When you are preparing a presentation, give a thought to turning your one-time presentation into a lasting contribution. We permit authors 'deep links' into the IQ as well as deposit of the paper in your local repository. Chairing a conference session with the purpose of aggregating and integrating papers for a special issue of the IQ is also much appreciated, as the information reaches many more people than the session participants and will be readily available on the IASSIST website at http://www.iassistdata.org.


Authors are very welcome to take a look at the instructions
and layout: http://iassistdata.org/iq/instructions-authors.

Authors can also contact me via e-mail: kbr@sam.sdu.dk.
Should you be interested in compiling a special issue for
the IQ as guest editor(s) I will also be delighted to hear
from you.


Karsten Boye Rasmussen
June 2016
Editor

IASSIST 2016 Program At-A-Glance, Part 2: Data infrastructure, data processing and research data management

 

Here's another list of highlights from IASSIST 2016, which focuses on the data revolution. For previous highlights, see here.

Infrastructure

  • For those of you with an interest in technical infrastructure, the University of Applied Sciences HTW Chur will showcase an early prototype, MMRepo (1 June, 3F), whose function is to store qualitative and quantitative data in one big data repository.
  • The UK Data Service will present the panel "The CESSDA Technical Framework - what is it and why is it needed?", which elaborates how the CESSDA Research Infrastructure should have modern data curation techniques rooted in sophisticated IT capabilities at its core, in order to better serve its community.

  • If you have been wondering about the various operational components and the associated technology counterparts involved with running a data science repository, then the presentation by ICPSR is for you. Participants in that panel will leave with an understanding of how the Archonnex Architecture at ICPSR is strengthening the data services offered to new researchers and much more.

Data processing

Be sure to check out the aforementioned infrastructure offerings if you’re interested in data processing, but also check out a half-day workshop on 31 May, “Text Processing with Regular Expressions,” presented by Harrison Dekker, UC Berkeley, that will help you learn regular expression syntax and how to use it in R, Python, and on the command line. The workshop will be example-driven.
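For a taste of what such regular-expression work looks like, here is a minimal Python sketch (the sample text is invented):

```python
import re

# Pull ISO dates and bare four-digit years out of messy text.
text = "Survey fielded 2014-03-01; follow-up in 2016, final wave 2016-11-15."

dates = re.findall(r"\b\d{4}-\d{2}-\d{2}\b", text)  # ISO dates
years = re.findall(r"\b(?:19|20)\d{2}\b", text)     # any four-digit year

print(dates)  # ['2014-03-01', '2016-11-15']
print(years)  # ['2014', '2016', '2016']
```

Essentially the same patterns work in R (via grepl/regmatches) and in command-line tools such as grep, which is the cross-language point of the workshop.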

Data visualisation

If you are comfortable working with quantitative data and are familiar with the R tool for statistical computing and want to learn how to create a variety of visualisations, then the workshop by the University of Minnesota on 31 May is for you. It will introduce the logic behind ggplot2 and give participants hands-on experience creating data visualizations with this package. This session will also introduce participants to related tools for creating interactive graphics from this syntax.
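The workshop itself uses R's ggplot2; purely as an illustration of the same grammar-of-graphics logic, here is a minimal sketch in Python using plotnine, a port of ggplot2 (the data is invented):

```python
import pandas as pd
from plotnine import ggplot, aes, geom_point, labs

# A tiny invented dataset: survey response rates by year.
df = pd.DataFrame({
    "year": [2010, 2011, 2012, 2013, 2014],
    "response_rate": [0.71, 0.68, 0.66, 0.63, 0.60],
})

# The core ggplot2 idea: map data columns to aesthetics, then add layers.
plot = (
    ggplot(df, aes(x="year", y="response_rate"))
    + geom_point()
    + labs(x="Year", y="Response rate")
)
plot.save("response_rates.png")
```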

Programming

  • If you’re interested in programming there’s a full-day Intro to Python for Data Wrangling workshop on 31 May, led by Tim Dennis, UC San Diego, that will provide tools to use scientific notebooks in the cloud, write basic Python programs, integrate disparate CSV files and more (see the sketch after this list).

  • Also, the aforementioned Regular Expressions workshop, also on 31 May, will offer in-workshop opportunities to work with real data and perform representative data cleaning and validation operations in multiple languages.
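As a flavour of the CSV wrangling the Python workshop covers, here is a minimal pandas sketch (folder and file names are invented):

```python
from pathlib import Path
import pandas as pd

# Read a folder of disparate CSV extracts and stack them into one frame,
# keeping track of which file each row came from.
frames = []
for path in Path("extracts").glob("*.csv"):
    df = pd.read_csv(path)
    df["source_file"] = path.name
    frames.append(df)

combined = pd.concat(frames, ignore_index=True)

# A typical first cleaning pass: normalise column names, drop empty rows.
combined.columns = [c.strip().lower().replace(" ", "_")
                    for c in combined.columns]
combined = combined.dropna(how="all")
print(combined.shape)
```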

Research data management

  • To get a behind-the-scenes look at data management and see how an organization such as the Odum Institute manages its archiving workflows, head to “Automating Archive Policy Enforcement using Dataverse and iRODS” on 31 May with presenters from the UNC Odum Institute, UNC Chapel Hill. Participants will see machine-actionable rules in practice and be introduced to an environment where written policies can be expressed in ways that let an archive automate their enforcement.

  • Another good half-day workshop, targeted at people tasked with teaching good research data management practices to researchers, is “Teaching Research Data Management Skills Using Resources and Scenarios Based on Real Data,” 31 May, with presenters from ICPSR, the UK Data Archive and FORS. The organisers of this workshop will showcase recent examples of how they have developed teaching resources for hands-on training, and will talk about successes and failures in this regard.

Tools

If you’re just looking to add more resources to your data revolution toolbox, whether it’s metadata, teaching, data management, open and restricted access, or documentation, here’s a quick list of highlights:

  • At Creating GeoBlacklight Metadata: Leveraging Open Source Tools to Facilitate Metadata Genesis (31 May), presenters from New York University will provide hands-on experience in creating GeoBlacklight geospatial metadata, including demos on how to capture, export, and store GeoBlacklight metadata.

  • DDI Tools Demo (1 June). The Data Documentation Initiative (DDI) is an international standard for describing statistical and social science data.

  • DDI tools: No Tools, No Standard (3 June), where participants will be introduced to the work of the DDI Developers Community and get an overview of tools available from the community.

Open access

As mandates for better accessibility of data affect more researchers, dive into the conversation with these IASSIST offerings:

Metadata

Don’t miss IASSIST 2016’s offerings on metadata, which is the data about the data that makes finding and working with data easier. There are many offerings, with a quick list of highlights below:

  • Creating GeoBlacklight Metadata: Leveraging Open Source Tools to Facilitate Metadata Genesis (Half-day workshop, 31 May), with presenters from New York University

  • At Posters and Snacks on 2 June, Building A Metadata Portfolio For Cessda, with presenters from the Finnish Social Science Data Archive; GESIS – Leibniz-Institute for the Social Sciences; and UK Data Service

Spread the word on Twitter using #IASSIST16. 


A story by Dory Knight-Ingram (ICPSR)

Latest Issue of IQ Available! Data Documentation Initiative - Results, Tools, and Further Initiatives

Welcome to the third issue of Volume 39 of the IASSIST Quarterly (IQ 39:3, 2015). This special issue is guest edited by Joachim Wackerow of GESIS – Leibniz Institute for the Social Sciences in Germany and Mary Vardigan of ICPSR at the University of Michigan, USA. That sentence is a direct plagiarism from the editor’s notes of the recent double issue (IQ 38:4 & 39:1). We are very grateful for all the work Mary and Achim have carried out and are developing further in the continuing story of the Data Documentation Initiative (DDI), and for their efforts in presenting the work here in the IASSIST Quarterly.

As in the recent double issue on DDI this special issue also presents results, tools, and further initiatives. The DDI started 20 years ago and much has been accomplished. However, creative people are still refining and improving it, as well as developing new areas for the use of DDI.

On the next page, Mary Vardigan and Joachim Wackerow give an overview of the DDI papers in this issue.

Let me then applaud the two guest editors and also the many authors who made this possible:

  • Alerk Amin, RAND Corporation, www.rand.org, USA
  • Ingo Barkow, Associate Professor for Data Management at the University for Applied Sciences Eastern Switzerland (HTW Chur), Switzerland
  • Stefan Kramer, American University, Washington, DC, USA
  • David Schiller, Research Data Centre (FDZ) of the German Federal Employment Agency (BA) at the Institute for Employment Research (IAB)
  • Jeremy Williams, Cornell Institute for Social and Economic Research, USA
  • Larry Hoyle, senior scientist at the Institute for Policy & Social Research at the University of Kansas, USA
  • Joachim Wackerow, metadata expert at GESIS - Leibniz Institute for the Social Sciences, Germany
  • William Poynter, UCL Institute of Education, London, UK
  • Jennifer Spiegel, UCL Institute of Education, London, UK
  • Jay Greenfield, health informatics architect working with data standards, USA
  • Sam Hume, vice president of SHARE Technology and Services at CDISC, USA
  • Sanda Ionescu, user support for data and documentation, ICPSR, USA
  • Jeremy Iverson, co-founder and partner at Colectica, USA
  • John Kunze, systems architect at the California Digital Library, USA
  • Barry Radler, researcher at the University of Wisconsin Institute on Aging, USA
  • Wendy Thomas, director of the Data Access Core in the Minnesota Population Center (MPC) at the University of Minnesota, USA
  • Mary Vardigan, archivist at the Inter-university Consortium for Political and Social Research (ICPSR), USA
  • Stuart Weibel, OCLC Research, USA
  • Michael Witt, associate professor of Library Science at Purdue University, USA.

I hope you will enjoy their work in this issue, and I am certain that the contact authors will enjoy hearing from you
about new potential results, tools, and initiatives.

Articles for the IASSIST Quarterly are always very welcome. They can be papers from IASSIST conferences or other
conferences and workshops, from local presentations or papers especially written for the IQ. When you are preparing
a presentation, give a thought to turning your one-time presentation into a lasting contribution to continuing development. As an author you are permitted ‘deep links’ where you link directly to your paper published in the IQ. Chairing a conference session with the purpose of aggregating and integrating papers for a special issue IQ is also much appreciated as the information reaches many more people than the session participants, and will be readily available on the IASSIST website at http://www.iassistdata.org.

Authors are very welcome to take a look at the instructions and layout: http://iassistdata.org/iq/instructions-authors. Authors can also contact me via e-mail: kbr@sam.sdu.dk.

Should you be interested in compiling a special issue for the IQ as guest editor(s) I will also be delighted to hear from you.

Karsten Boye Rasmussen
September 2015
Editor

New Perspectives on DDI

This issue features four papers that look at leveraging the structured metadata provided by DDI in different ways. The first, “Design Considerations for DDI-Based Data Systems,” aims to help decision-makers by highlighting the approach of using relational databases for data storage, in contrast to representing DDI in its native XML format. The second paper, “DDI as a Common Format for Export and Import for Statistical Packages,” describes an experiment using the program Stat/Transfer to move datasets among five popular packages with DDI Lifecycle as an intermediary format. The paper “Protocol Development for Large-Scale Metadata Archiving Using DDI Lifecycle” discusses the use of a DDI profile to document CLOSER (Cohorts and Longitudinal Studies Enhancement Resources, www.closer.ac.uk), which brings together nine of the UK’s longitudinal cohort studies by producing a metadata discovery platform (MDP). And finally, “DDI and Enhanced Data Citation” reports on efforts to extend data citation information in DDI to include a larger set of elements and a taxonomy for the roles of research contributors.

Mary Vardigan - vardigan@umich.edu
Joachim Wackerow - Joachim.Wackerow@gesis.org

Looking Back/Moving Forward - Reflections on the First Ten Years of Open Repositories

The Open Repositories conference celebrated its first decade with four full days of exciting workshops, keynotes, sessions, 24/7 talks, and development track and repository interest group sessions in Indianapolis, USA. All the fun took place in the second week of June. The OR2015 conference was themed "Looking Back/Moving Forward: Open Repositories at the Crossroads" and it brought over 400 repository developers and managers, librarians and library IT professionals, service providers and other experts to hot and humid Indy.

As with IDCC earlier this year, IASSIST was officially a supporter of OR2015. In my opinion, it was a worthy investment given the topics covered, the depth and quality of presentations, and the attendee profile. Plus I got to do what I love - talk about IASSIST and invite people to attend or present at our own conference.

While there may not be an extremely striking overlap between the IASSIST and OR conferences, I think there are sound reasons to keep building linkages between the two. IASSISTers could certainly provide beneficial insight on various RDM questions and also, for instance, on researchers' needs, scholarly communication, reusing repository content, research data resources and access, or data archiving and preservation challenges. We could take advantage of the passion and dedication the repository community shows in making repositories and their building blocks perfect. It's quite clear that there is a lot more to be achieved when repository developers and users meet and address problems and opportunities with creativity and commitment.

 

While IASSIST 2015 had a plenary speaker from Facebook, OR had keynote speakers from Mozilla Science Lab and Google Scholar. Mozilla's Kaitlin Thaney skyped a very interesting opening keynote (that is what you resort to when thunderstorms prevent your keynote speaker from arriving!) on how to leverage the power of the web for research. A distributed and collaborative approach to research, public sharing and transparency, new models of discovery, freedom to innovate and prototype, and peer-to-peer professional development were among the powers of web-enabled open science.
 
Anurag Acharya from Google gave a stimulating talk on pitfalls and best practices in indexing repositories. His points were primarily aimed at repository managers fine-tuning their repository platforms to be as easily harvestable as possible. However, many of his remarks are worth taking into account when building data portals or data-rich web services. On the other hand, it can be asked whether it is our job (as repository or data managers) to make things easy for Google Scholar, or whether we have other obligations that put our needs and our users first. Often these two are not in conflict, though. More notable from my point of view was Acharya's statement that Google Scholar does not index research outputs other than articles (data, appendixes, abstracts, code…) from the repositories. But should it not? His answer was that it would be lovely, but it cannot be done efficiently because these resources are not comprehensive enough, and it would not be possible, for example, to properly and accurately link users to actual datasets from the index. I'd like to think this is something for the IASSIST community to contemplate.

Open Researcher and Contributor ID (ORCID) had a very strong presence in OR2015. ORCID provides an open persistent identifier that distinguishes a researcher from every other researcher, and through their API interfaces that ID can be connected to organisational and inter-organisational research information systems, helping to associate researchers and their research activities. In addition to a workshop on ORCID APIs there were many presentations about ORCID integrations. It seems that ORCID is getting close to reaching a critical mass of users and members, allowing it to take big leaps in developing its services. However, it still remains to be seen how widely it will be adopted. For research data archiving purposes having a persistent identifier provides obvious advantages as researchers are known to move from one organisation to another, work cross-nationally, and collaborate across disciplines.
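As an aside, ORCID's public API can be queried without authentication; the minimal Python sketch below is an illustration only (the API version and response layout are assumptions on my part, and the iD is ORCID's well-known documentation example):

```python
import requests

# ORCID's documentation example iD (Josiah Carberry).
ORCID_ID = "0000-0002-1825-0097"
url = f"https://pub.orcid.org/v3.0/{ORCID_ID}/record"

# Ask the public API for the record as JSON.
resp = requests.get(url, headers={"Accept": "application/json"})
resp.raise_for_status()
record = resp.json()

# Print the researcher's name from the 'person' section of the record.
name = record["person"]["name"]
print(name["given-names"]["value"], name["family-name"]["value"])
```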

Many presentations at least partly addressed familiar but ever-challenging research data service questions: handling deposits, providing data services for the researcher community, overcoming ethical, legal or institutional barriers, or providing and managing a trustworthy digital service with somewhat limited resources. Check out, for example, Andrew Gordon's terrific presentation on Databrary, a research-centered repository for video data. Metadata harmonisation, ontologies, an emphasis on high-quality metadata, and ensuring the repurposing of metadata were among the common topics as well, alongside a focus on complying with standards - both metadata and technical.

I see a good opportunity and considerable common ground for shared learning here, for example for DDI and other metadata experts to work with repository developers, and for IASSIST's data librarians and archivists to provide training and take part in projects that concentrate on repository development in libraries or archives.

Keynotes and a number of other sessions were live-streamed and recorded for later viewing. Videos of the keynotes and some other talks, and most presentation slides, are already available; the rest of the videos will follow in the coming weeks.

"Before anything else, preparation is the key to success." Notes from RDMF13: Preparing Data for Deposit

The Digital Curation Centre’s most recent Research Data Management Forum took place last week in London.

UK Data Service’s Louise Corti began the day with an overview of their acquisitions process. The Service (under various names) is almost 50 years old, which gives it experience and perspective many institutions do not have. Lessons from those years include the importance of a collections development policy that’s allowed to evolve. The Archive evaluates acquisitions on the basis of teaching and re-use, for validation and replication. They have learnt from past mistakes and now keep access licences to three options: open, safeguarded (requiring registration), and controlled (locked-down access). Common problems persist, however: poor file names, weak description of methods and contextual documentation, limited metadata, and unexplained missing data files. The UK Data Service play a number of roles as a data service, from hand-holders and evangelical preachers, to being the Economic and Social Research Council’s police officer for non-compliance on data sharing.

Suzanne Embury made a valuable point in her presentation. Of course, the one thing we know is we don’t know how other people will re-use data in the future. But we can reasonably guess what they will want to do is discover, integrate, and aggregate it. To this end, simple things can help – check spellings, aim for standardised vocabularies, avoid acronyms. Finally, apply a domain expert test to see if people in the discipline can independently understand the data. With that, echoes of Gary King’s replication standard came to mind.

A presentation on meeting the RDM challenge focused on the University of Loughborough, which has adopted a data preservation and sharing solution based on figshare and Arkivum support. Loughborough wants to make depositing data as easy as possible for researchers by taking care of as much of the back-end work as possible. But at what cost, in both finances and quality? At the last IASSIST we learnt RDM takes a village, but Loughborough acknowledged the contribution of 61 people in setting up their service, so maybe it really takes a small metropolitan statistical area.

IASSIST’s own web editor Robin Rice directed us through data deposit at the University of Edinburgh, guided by former IASSIST president Peter Burnhill’s refrain of "helping researchers to do the right thing". Edinburgh provide support throughout the data lifecycle with strong training resources (Research Data MANTRA), plus face-to-face sessions on managing data, creating DMPs, good practice, handling data in SPSS, and working with personal and sensitive research data. Like the UK Data Service, they recognise the value in keeping things simple and offering good incentives. Licence options, for example: their repository only accepts open data (CC-BY 4.0), but depositing is based on just five required metadata fields. In return, depositors get their data available quickly, with open download stats for every item.

The afternoon sessions split into three discussion groups. Emerging from them were thoughts on keeping metadata requirements as simple as possible, recognising that disciplines concentrate on different aspects; some disciplines require precision while others do not require so much. There was an acknowledgement that data discovery is often undertaken through Google. Also, while there inevitably is a range of people providing a service, there needs to be a person connecting existing resources in a university. Finally, raising awareness is a problem, as demand is related to institutional awareness.

Presentations from the event are available from the DCC, and tweets with the hashtag #rdmf13. The DCC will be blogging about the discussion group sessions.

Spring forward! The Jisc Research Data Spring programme

On 26/27 February, I attended Jisc Data Spring “Sandpit 1” in the English city of Birmingham. Data Spring is a funding programme supporting UK based projects in Research Data Management (RDM), and something of a successor to the successful Managing Research Data programmes (MRD) that did so much to get RDM training and tools underway in the UK’s education sector.

Unlike the traditional proposal-evaluation-funding model, Data Spring takes a more collaborative, interactive approach, splitting the programme into separate stages at which projects may no longer receive funding. If that sounds like the approach of modern TV entertainment shows, you would not be wrong to think so. Beginning with an open call, some 70 proposals were available online for voting and comments. These were reduced to 44 by the time of a workshop [PDF] at the recent IDCC conference. At the “Sandpit” (metaphorical, not literal, sadly), these proposals had to fit 27 available slots to proceed to the next stage. Through a process of negotiation, mergers and acquisitions, and hasty matchmaking, all 44 managed to get through in some form from the first day to the second.

The second day consisted of the now 27 projects making four-minute pitches to a panel of judges. By mid-March, successful projects will receive notice of three months of testing and prototype funding before reporting to a similar event in June. Following this event, projects may receive a further four months of funding, before a final workshop in November allows six months of funding leading to the programme’s conclusion in 2016.

Having been part of the JISCMRD Programme (Jisc has since switched to sentence case from caps), I found it notable how much the area has moved on since those days: from evidence gathering and basic training tools to RDM support focused on integration into existing workflows. That this occurred is a testament to the original MRD programme, and the support, work, and imaginations of those involved. Whatever projects make it through to the end of Data Spring, I have no doubt they will be worth the attention of people involved in Research Data Management both inside and outside the UK.

You can review projects at the Data Spring ideascale and figshare pages and tweet about them using #dataspring.

UPDATE: a storify of the event is also available.

A decade against decay: the 10th International Digital Curation Conference

The International Digital Curation Conference (IDCC) is now ten years old. On the evidence of its most recent conference, it is in rude health and growing fast.

IDCC was the first time IASSIST decided to formally support another organisation's conference. I think it was a wise investment given the quality of plenaries, presentations, posters, and discussions.

The DCC already has available a number of blog posts covering the substance of the sessions, including an excellent summary by IASSIST web editor Robin Rice. Presentations and posters are already available, and video from the plenary sessions will soon be online.

Instead, I will use this opportunity to pick up on hanging issues and suggestions for future conferences.

One was apportionment of responsibility. Ultimately, researchers are responsible for management of their data, but they can only do so if supporting infrastructure is in place to help them. So, who is responsible for providing that: funders or institutions? This theme emerged in the context of the UK’s Engineering and Physical Sciences Research Council who will soon enforce expectations identifying the institution as responsible for supporting good Research Data Management.

Related to that was a discussion on the role of libraries in this decade. Are they relevant? Can they change to meet new challenges? Having started out as a researcher, become a data archivist, and now working as a librarian, I wouldn’t be here if libraries weren’t meeting these challenges. There’s a “hush” of IASSIST members also ready to take issue with the suggestion that libraries aren’t relevant or not engaged with data; in fact, they did so at our last conference.

Melissa Terras (UCL) did a fantastic job presenting [PDF] work in the digital humanities that is innovative in not only preserving but rescuing objects – and all done on small-change research budgets. I hope a future IDCC finds space for a social sciences person to present on the issues we face in preservation and reuse. Clifford Lynch (CNI) touched on the problems of data reuse and human subjects, which remained one of the few glancing references to a significant problem, and one IASSIST members are addressing. Indeed, thanks must go to a former president of this association, Peter Burnhill (Edinburgh), who mentioned IASSIST and how it relates to the IDCC audience on more than one occasion.

Finally, if you were stimulated by IDCC’s talk of data, reuse, and preservation then don’t forget our own conference in Minneapolis later this year.

Hallelujah and praise the LARD! The first London Area Research Data group meeting

LARD is London Area Research Data and this was its inaugural meeting, informally bringing together various people from London-based institutions (and from as far away as Reading) who are charged in some way with Research Data Management (RDM) - be it research support or repository work.

These are my notes, which lack attribution partly because I couldn't remember where every person was from, and partly because it wasn't clear if the meeting was on or off the record. Nonetheless, I felt there were some interesting points that deserve sharing as an insight into how UK universities (and one research centre) are dealing with RDM less than a year away from the EPSRC deadline on expectations of compliance for research data.

The first item in what was a free-form discussion (think RDM jazz - hence my beat-style note taking, with full stops however) was policies. Some institutions have data policies, some have draft policies, and others have no policy. The mood seemed to be that a policy was more effective as a mandate for focusing university attention and resources on support services, not so much for grabbing researchers’ attention. Researchers, it was said, tend to react more to what funders want than to university policies or documents. Those universities that competed for Medical Research Council (MRC) funding felt the MRC demanded institutional data policies, and so those institutions tended to adopt them or have drafts ready for adoption. Yet most researchers are not funded by one of the RCUK councils, and these are often funders without data mandates. The group found a problem in telling researchers that they don’t own their own data (it’s often funders or institutions, through employee-created-works clauses). There was also a sense that researchers worry about data protection and are looking for practical guidance on how to keep data safe and secure. There was also a recognition that disciplines matter; those disciplines that do not have a strong culture of sharing data can be helped with the weight of institutional support providing the infrastructure to support RDM. This tackles the disciplinary focus of researchers, or localism. An example of how a bad experience can focus attention was mentioned, where a researcher lost data by plugging a malware-infected hard drive into a university network and had to have the drive and the copy of the data destroyed. Episodes like this can be used to tackle the culture of “improvisation” when it comes to researchers “backing up” their data without, or without engaging, institutional support. Aside from acting as a “wake-up” for researchers, they can push universities into providing workable, easy-to-use institutional storage - either working storage or preservation in an institutional repository.

Discussion then moved round to the EPSRC expectations for research data, with those who attended a recent DCC event on the EPSRC expectations reporting that the EPSRC are not looking to get rid of opportunities for supporting research, so are not likely to cut off funding come May 2015. However, they do expect to see evidence that institutions are working towards or trying to improve storage, support, and data discovery and access. Nonetheless, there is no doubt the EPSRC policy has focused knowledge and effort in institutions towards RDM. Then training was mentioned. When the “T” word is mentioned I often think of that line about if people don’t want to come, how are you going to stop them? To save us from preparing to teach to empty rooms, the thinking now seems to be towards providing support when people need it and building up a directory of experts to refer to when appropriate. Structured support is based on identifying four key stages in the data lifecycle: submitting a proposal (for help on data management planning), when proposals are accepted (implementing RDM), mid-project (supporting implementation), and towards the close (to talk about preservation). The key is to keep up engagement with researchers. One institution is trying to do this for all research projects, so it is working with its research office to target RCUK-funded projects. Another institution initially plans to work with a sample of projects.

By now the discussion had moved on to data management planning. One institution had a Data Management Plan (DMP) template and a DMP requirement as part of its data policy, with separate plans for staff and postgraduate students. The feeling was that template texts are not such a good thing if they are copied and pasted into DMPs. A case was mentioned of one research funder refusing to fund a project because the DMP used identical text to another DMP submitted from that institution. The DCC’s DMPOnline tool was mentioned, particularly its ability to be customised for an institution. It was also mentioned that DMPOnline has been much improved in later versions. A policy was mentioned at one institution of not offering storage until a DMP has been completed; another institution reported that there is a checkbox in the research office to signify that the DMP has been looked at by the data management officer.

The RDM equivalent of Godwin's law (or Godwin's Rule of Nazi Analogies) is that at some point cost will be mentioned. How to cost RDM is an ongoing problem. Given the problem of identifying costs that specifically relate to RDM activity, as opposed to typical research requirements that have an RDM aspect, an additional problem is that RCUK funders mostly allow budgeting for RDM, but that budgeting must not identify activity that is supported as part of general institutional funding. Auditing costs is a problem. Storage tends to have the easier-to-identify costs (storage per byte, for example), but this can be a problem if data is stored in an institutional repository when the budget for the project identified separate storage costs. For this reason, solutions like Arkivum may be advantageous as they can be specified as an auditable cost.

The coda to this discussion concerned metadata. It was said that funders were keen on ensuring that good-quality metadata accompanies research data generated by projects they support, and that they are willing to allow proposals that factor in additional time and resources for metadata. However, an obvious problem is who should be adding that metadata - is it researchers, who know the data but do not necessarily know the standard or see its importance in the way RDM support staff do; or should it be RDM staff, particularly repository staff, who know the type of information required but do not necessarily know the data or discipline that well? Finally, hitting on a standard that is applicable to all data is a problem. Social science is not the same as genetics; art history is not the same as management. It was then asked if there was a way to harvest metadata when that metadata is created elsewhere (say, the UK Data Service). Both the DCC and the UK Data Service are working on a Jisc-funded Research Data Registry and Discovery Service, and the European Union is also working on data discovery platforms that import/export catalogue record metadata.
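Harvesting of that kind typically runs over OAI-PMH, the standard protocol repositories expose for metadata exchange. A minimal Python sketch of pulling Dublin Core records from such an endpoint (the base URL is invented for illustration):

```python
import requests
import xml.etree.ElementTree as ET

# Illustrative OAI-PMH endpoint; real services publish their own base URLs.
BASE_URL = "https://repository.example.ac.uk/oai"

# Ask for records in simple Dublin Core, the metadata format every
# OAI-PMH repository is required to support.
params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
resp = requests.get(BASE_URL, params=params)
resp.raise_for_status()

# Walk the XML response and print each record's title(s).
root = ET.fromstring(resp.content)
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"
for record in root.iter(f"{OAI}record"):
    for title in record.iter(f"{DC}title"):
        print(title.text)
```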

The feeling at the end of this initial meeting was LARD provided a useful forum for sharing practice and learning from contemporaries and there was enthusiasm for follow-up meetings including those based around structured themes. If you work in a big city, and there are people doing similar things to you in that city, take advantage and get together to talk. So, thanks to Gareth Knight (LSHTM), Stephen Grace (UEL), and Veronica Howe (KCL) for organising, facilitating, and hosting LARD #1.

IASSIST SIGDMC Annual Report 2013-2014

By Carol Perry & Stefan Kramer, co-chairs
Last updated: 2014-05-29 by CP

  • The major activity of the Data Management & Curation Interest Group (SIGDMC) in the last year was the conceptualization, organization, submission, and offering of the June 2, 2014, morning workshop Data Management & Curation: Lessons from Government, Academia, and Research. It features seven invited presenters, plus session and breakout group moderators from the SIGDMC membership, which also provided input on the breakout group topics.
  • As of May 26, 2014, SIGDMC membership is at just under 70, having been fairly steady over the year in terms of Google Group membership.
  • The Data Management and Curation Resources page on the IASSIST website has been reviewed and updated. The list now contains 59 resources; 9 new resources were added since May 2013. Minglu Wang, Limor Peer and Wendy Mann are responsible for this resource.
  • Progress was made in keeping the IASSIST blog active; however, we did not quite meet our goal of one post per month.
  • The members attending the annual IASSIST conference in Toronto have been invited to participate in an in-person meeting on June 4, where the outcome of the election of Carol Perry's successor as co-chair will first be announced, and future goals for the group will be discussed.

Research Data Management Issues Across Environments

Lots of conversations are going on these days in different venues where people are asking many of the same questions: how do we teach researchers about data management with limited staff, and what data management services should we offer? How do we find sustainable ways to manage data that leverage the efforts of many different repositories - those in government, in institutions, and disciplinary ones? How do we coalesce standard practice and reasonable but effective policies, at least at the national level and preferably on a global scale? What roles should governments play? How much can we as data professionals accomplish on our own? The Data Management and Curation SIG will host a workshop to talk about these and other issues across different countries and environments next Tuesday. Our speakers will include:

  • Dan Gillman, U.S. Bureau of Labor Statistics
  • Marcel Hebing, DIW Berlin
  • Chuck Humphrey, University of Alberta
  • Steven McEachern, Australian Data Archive
  • Barry Radler, Institute on Aging, University of Wisconsin-Madison
  • Robin Rice, EDINA and Data Library at the University of Edinburgh
  • Kathleen Shearer, Confederation of Open Access Repositories and Research Data Canada

Looking forward to seeing many of you in Toronto!

Michele Hayslett, University of North Carolina at Chapel Hill & Stefan Kramer, American University
