IASSIST Fellows Program 2013-14

The IASSIST Fellows Program is pleased to announce that it is now accepting applications for financial support to attend the IASSIST 2014 conference in Toronto [http://www.library.yorku.ca/cms/iassist/] from data professionals who are developing, supporting, and managing data infrastructures at their home institutions.

Please be aware that funding is not intended to cover the entire cost of attending the conference. The applicant’s home institution must provide some level of financial support to supplement an IASSIST Fellow award. Strong preference will be given to first-time participants and to applicants from countries that are currently under-represented at IASSIST. Only fully completed applications will be considered. Applicants submitting a paper for the conference will be given priority consideration for funding.

You may apply for funding via this form. The deadline for applications is the 31st of January 2014.

For more information, to apply for funding, or to nominate a person for a Fellowship, please send an email to the Fellows Committee chairs, Luis Martínez-Uribe (lmartinez@march.es) and Stuart Macdonald (srm262@cornell.edu).

IASSIST 2014 Call for Papers

ALIGNING DATA AND RESEARCH INFRASTRUCTURE
IASSIST 2014 Annual Conference Call for Paper and Session Proposals

This year’s conference theme touches upon the international and interdisciplinary requirements of aligning data and research infrastructure. The 2013 OECD Global Science Forum report on New Data for Understanding the Human Condition identifies key challenges for international data collaboration that beg for new solutions. Among these challenges is the mounting pressure for new forms of social science data. In today’s abundance of personal data, new methods are being sought to combine traditional social science data (administrative, survey, and census data) with new forms of personal data (social networking, biomarkers or transaction data) or with data from other domains. Similarly, the need for open data, archiving, and long-term curation infrastructures has been identified for research data in the natural, physical, and life sciences. Funders in all areas are pushing to enable the replication and/or reuse of research data. What alignments are needed between data and research infrastructure to enable these possibilities?

The international research community is in the midst of building a global data ecosystem that consists of a mixture of domain data repositories, data archives, data libraries, and data services and that seeks ways to facilitate data discovery, integration, access, and preservation. Evidence of this transformation is found in the recently established ICSU World Data System and in the Research Data Alliance. Like IASSIST, these organisations are contributing to the development of a global data ecosystem. Alignment or unification of strategies must take place at many levels to achieve this. How do we proceed? What advancements are needed in research data management, research infrastructure, and the development of new expertise?

Conference Tracks

We welcome submissions on the theme outlined above and encourage conference participants to propose papers and sessions that will be of interest to a diverse audience. To facilitate the organisation and scheduling of sessions, three distinct tracks have been established. If you are unsure which track your submission belongs in, or feel that it applies to more than one track, submit your proposal anyway; if it is accepted, the Programme Committee will find an appropriate fit.

Track 1: Research Data Management

  • New data types and their management
  • Challenges in exchanging research data across disciplines
  • Using social science data with data from other domains
  • Data linkage in the creation of new social science data
  • Data management within the global research data ecosystem
  • Data archives and repositories in the global data ecosystem
  • Best practices in the global data ecosystem
  • Metadata enabling the interoperability of research data
  • Application of DDI, SDMX, other metadata schema, taxonomies or ontologies in research data management
  • Data management policies and workflow systems
  • Data attribution and citation systems

Track 2: Professional Development

  • Training challenges given the growing number of professional positions within the global data ecosystem, which includes data curators, data scientists, data librarians, data archivists, etc.
  • Teaching end-users to work with research data
  • Data and statistical literacy
  • Data collection development in libraries and other institutions
  • Explorations of data across subject areas and geographic regions
  • Copyright clearance, privacy and confidential data
  • Working with ethics review boards and research service offices
  • Interdisciplinarity – promoting the cross-use of data
  • Training researchers about research data management planning
  • Liaison librarians’ roles in research data

Track 3: Data Developers and Tools

  • New infrastructure requirements in the global data ecosystem
  • Infrastructure supporting Data Without Borders
  • Tools to develop and support new social science data
  • Crowdsourcing applications in producing new social science data
  • Data dives or hackathons
  • API development supporting research data management
  • Open data web services
  • Applications of research data visualisation in the social sciences
  • Preservation tools for research data
  • Tools for data mining
  • Data technology platforms: cloud computing and open stack storage

Conference Formats

The Programme Committee welcomes submissions employing any of the following formats:

Individual proposal
This format consists of a 15 to 20 minute talk that is typically accompanied with a written paper. If your individual proposal is accepted, you will be grouped into an appropriate session with similarly themed presentations.
Session proposal
Session proposals consist of an identified set of presenters and their topics. Such proposals can suggest a variety of formats, e.g. a set of three to four presentations, a discussion panel, a discussion with the audience, etc. If accepted, the person who proposed the session becomes the session organiser and is responsible for securing speakers/participants and a chair/moderator (if not standing in that role him/herself).
Pecha Kucha proposal
A proposal for this programme event consists of a presentation of 20 slides shown for 20 seconds each, with heavy emphasis on visual content. Presentations in this event are timed and speakers are restricted to seven minutes.
Poster or demonstrations proposal
Proposals in this category should identify the message being conveyed in a poster or the nature of the demonstration being made.
Round table discussion proposal
Round table discussions typically take place during lunch and have limited seating. Please indicate how you plan to share the output of your round table discussion with all of IASSIST.

Session formats are not limited to the ideas above and session organisers are welcome to suggest other formats.

All submissions should include the proposed title and an abstract no longer than 200 words (note: longer abstracts will be returned to be shortened before being considered). Please note that all presenters are required to register and pay the registration fee for the conference. Registration for individual days will be available.

Please use this online submission form to submit your proposal. If you are unsure which track your submission fits in, or feel it belongs in more than one track, the Programme Committee will find an appropriate place for it.

We also welcome workshop proposals around the same themes. Successful proposals will blend lecture and active learning techniques. The conference planning committee will provide the necessary classroom space and computing supplies for workshops. For previous examples of IASSIST workshops, please see the descriptions of 2011 workshops and 2013 workshops. Typically workshops are half-day with 2-hour and 3-hour options.

  • Deadline for submission: December 9, 2013 (2013.12.09)
  • Notification of acceptance: February 7, 2014 (2014.02.07).

Program Chairs
  • Johan Fihn
  • Jen Green
  • Chuck Humphrey

re3data.org and OpenAIRE sign MoU during Open Access Week; new re3data.org features

Last month, OpenAIRE (Open Access Infrastructure for Research in Europe) and re3data.org signed a Memorandum of Understanding (MoU) to “work jointly to facilitate research data registration, discovery, access and re-use” in support of open science. OpenAIRE is an open access infrastructure that works to track and measure research output (it was originally designed to monitor EU funding activities). re3data.org is an online registry of research data repositories.

re3data.org and OpenAIRE will exchange metadata in order for OpenAIRE to “integrate data repositories indexed in the re3data.org registry and in turn return information about usage statistics for datasets and inferred links between data and publications.”

For more information, see the OpenAIRE press release on the MoU.

In addition, re3data.org is now mentioned in the data deposition policy of Nature's Scientific Data, which encourages the registration of repositories with the service; re3data.org has also announced a collaboration with BioSharing.

re3data.org has also made several other recent enhancements. Users can now browse re3data.org repositories by:

  1. subject
  2. content type
  3. country

Furthermore, a redesigned repository record now groups information into four categories: general, institutions, terms, and standards. They have also added many more repositories in the past few months, so check it out!

Announcing the Release of the CRDCN Dataset Builder

The Canadian Research Data Centre Network (CRDCN) is pleased to announce the release of the CRDCN Dataset Builder.

Developed in collaboration with Statistics Canada and Metadata Technology North America, the Dataset Builder allows researchers working (or intending to work) in a Canadian RDC to browse, search for, and select variables in the Statistics Canada surveys currently housed in the RDCs.

Utilizing DDI Lifecycle metadata, the Dataset Builder lets researchers find and select variables, produce SAS, SPSS, or Stata syntax to read in and format those variables, and generate customized documentation (layout and codebooks) for the dataset they create with the app.
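
As a rough illustration of the kind of output such a tool can generate (this is not the Dataset Builder's own code, and the variable names, labels, and extract file below are invented), a few lines of Python can turn a small set of DDI-style variable descriptions into Stata syntax that reads an extract and labels the selected variables:

  # Hypothetical sketch: generate Stata syntax from a small, hand-written set of
  # DDI-style variable descriptions. Names, labels, and the extract file are
  # invented for illustration only.
  selected_variables = [
      {"name": "agegrp", "label": "Age group of respondent"},
      {"name": "hhincq", "label": "Household income quintile"},
      {"name": "prov",   "label": "Province of residence"},
  ]

  def build_stata_syntax(variables, data_file="extract.csv"):
      """Return Stata commands that import the extract and apply variable labels."""
      lines = [f'import delimited "{data_file}", clear']
      for var in variables:
          lines.append(f'label variable {var["name"]} "{var["label"]}"')
      return "\n".join(lines)

  print(build_stata_syntax(selected_variables))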

A one-page installation, setup, and use guide, with a link to more detailed documentation if needed, can be found here: https://docs.google.com/document/d/135Eq2fwVRtlMdENpQjmZe5Zjm1OFGImCxtyWeV_7sdI

The application is open-source software, so please contact Metadata Technology NA if you're interested in contracting them to customize this application for your own organization.

Please contact Dave Haans (dave.haans@utoronto.ca) for more information.

Scientific Data Repositories Issue Call for Change on Funding Models for Data Archives

For Immediate Release
September 16, 2013
Contact: Mark Thompson-Kolar, 734-615-7904
mdmtk@umich.edu

(Ann Arbor, MI) — Representatives of 25 organizations that archive scientific data today released a Call for Action urging the creation of sustainable funding streams for domain repositories — data archives with close ties to scientific communities.

The document was developed after a meeting of data repositories across the social and natural sciences held June 24-25, 2013, in Ann Arbor. The meeting was organized by the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan and supported by the Alfred P. Sloan Foundation to discuss challenges facing domain repositories, particularly in light of the February 2013 memorandum from the U.S. Government’s Office of Science and Technology Policy (OSTP) requiring public access to federally funded data.

Domain repositories in the natural and social sciences are built upon close relationships to the scientific communities that they serve. By leveraging in-depth knowledge of the subject matter, domain repositories add value to the stored data beyond merely preserving the bits. As a result, repositories contribute to scientific discovery while ensuring that data curation methods keep pace as science evolves. “However, the systems currently in place for funding repositories in the US are inadequate for these tasks,” the document states.

The Call for Action argues that “Domain repositories must be funded as the essential piece of the US research infrastructure that they are,” emphasizing the importance of:

•    Ensuring funding streams that are long-term, uninterrupted and flexible
•    Creating systems that promote good scientific practice
•    Assuring equity in participation and access

The document expresses concerns regarding current and future funding models in consideration of the OSTP rules. “The push toward open access, while creating more equity of access for the community of users, creates more of a burden for domain repositories because it narrows their funding possibilities.”

“We are memory institutions,” ICPSR Director George Alter said. “One of our missions is to ensure data will be available for a long time, yet we’re being funded by short-term grants. There is a mismatch between our mission and the way we are funded. Widening access to data is a good thing. Everyone agrees on that. But it has to be done in a way that provides sustainable funding to the organizations that preserve and distribute the data.”

Repositories may require varied funding models, based on their scientific domain, the document states. “But in every case, creating sustainable funding streams will require the coordinated response of multiple stakeholders in the scientific, archival, academic, funding, and policy communities.”

The statement is endorsed by 30 domain repository representatives. It can be viewed on the ICPSR’s website at http://tinyurl.com/dataarchives.

The Inter-university Consortium for Political and Social Research (ICPSR), based in Ann Arbor, MI, is the largest archive of behavioral and social science research data in the world. It advances research by acquiring, curating, preserving, and distributing original research data. www.icpsr.umich.edu

The Alfred P. Sloan Foundation is a philanthropic, not-for-profit grantmaking institution based in New York. Established in 1934, the Foundation makes grants in support of original research and education in science, technology, engineering, mathematics, and economic performance. www.sloan.org

Posted by request to the editor, in line with IASSIST members' interests.

I am he as you are he as you are me and we are all together

I'm just in the process of updating who we follow from our @iassistdata twitter account (we follow members who follow us - when I get round to updating things, sorry).

Given the huge* number of followers we now have (595, thank you one and all), I thought it would be interesting to see what we looked like according to our twitter bios.

No surprises: we define ourselves as data people or organisations, in terms of "research", "librarian" (and library-related terms), "social", "science", "digital", "information", and "universities". It suggests the people following us are the type of people who should be following us, given the organisation's goals, and hopefully they are getting some value from following @iassistdata.
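
For the curious, a word count like the one behind that summary takes only a few lines of code. The sketch below is a hypothetical illustration in Python, not the script actually used; the input file name is invented and it assumes one follower bio per line.

  import re
  from collections import Counter

  # Hypothetical input: a plain-text file with one follower bio per line.
  with open("follower_bios.txt", encoding="utf-8") as f:
      words = re.findall(r"[a-z']+", f.read().lower())

  # Drop a few common words so the subject terms stand out.
  stopwords = {"the", "and", "of", "in", "at", "a", "an", "for", "to", "i", "on", "with"}
  counts = Counter(w for w in words if w not in stopwords)

  # Print the twenty most frequent terms across all bios.
  for word, n in counts.most_common(20):
      print(f"{word:15} {n}")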

*Obviously a subjective assessment when Justin Bieber has 44,625,042.

 @iassistdata twitter follower bios

Finding Historical Economic Data through FRASER and ALFRED

The North Carolina Library Association's Government Resources Section had an excellent webinar yesterday on finding historical (or vintage) economic data using FRASER and ALFRED. The recording and slides are available to everyone. Enjoy!

Sharing data: good for science, good for you


DANS has published a video to promote storing and sharing data within the research community. The video is available in Dutch and English and is shown on the DANS YouTube channel. The title of the English video is 'Sharing data: good for science, good for you': http://youtu.be/HJbo-OAaJ1I

"Scientific research produces data. The lifetime of these data varies greatly. Stored on a hard disk or USB stick they are likely to be lost in the near future together with the storage medium. Luckily, there is another, more sustainable option, which benefits science.

In this video Dutch historian Martijn Kleppe (Erasmus University Rotterdam) explains why he opened up his big photo database for other researchers to use, and quantitative data analyst Manfred te Grotenhuis (Radboud University Nijmegen) speaks about the treasures in data archives that are waiting to be discovered by researchers.

Both scientists made use of the online archiving system EASY from DANS (Data Archiving and Networked Services) in the Netherlands. As an institute of KNAW and NWO, DANS promotes sustained access to digital research data."

Feedback is welcome.

Marion Wittenberg

Congratulations to Dan Tsang and Wendy Watkins!

As some of you may know, Dan Tsang and Wendy Watkins have been named the 2013 ICPSR Flanigan Award winners for distinguished service as ICPSR Official Representatives (ORs): http://www.icpsr.umich.edu/icpsrweb/ICPSR/support/announcements/2013/07/icpsr-announces-2013-warren-e-miller

UC-Irvine recognizes Dan here, http://www.lib.uci.edu//features/spotlights/dt-award.html
Perhaps a Canadian colleague has a similar link for Wendy.

Congratulations to both Dan and Wendy!

The Role of Data Repositories in Reproducible Research

Cross-posted from the ISPS Lux et Data Blog

Questions about who is responsible for the quality of the data that repositories hold, and about what that responsibility entails, were on my mind as I was preparing to present a poster at the Open Repositories 2013 conference in Charlottetown, PEI earlier this month. The annual conference brings the digital repositories community together with stakeholders such as researchers, librarians, publishers, and others to address issues pertaining to “the entire lifecycle of information.” The conference theme this year, “Use, Reuse, Reproduce,” could not have been more relevant to the ISPS Data Archive. Two plenary sessions bookended the conference, both discussing the credibility crisis in science. In the opening session, Victoria Stodden set the stage with her talk about the central role of algorithms and code in the reproducibility and credibility of science. In the closing session, Jean-Claude Guédon made a compelling case that open repositories are vital to restoring quality in science.

My poster, titled “The Repository as Data (Re)User: Hand Curating for Replication,” illustrated the various data quality checks we undertake at the ISPS Data Archive. The ISPS Data Archive is a small archive, for a small and specialized community of researchers, containing mostly small data. We made a key decision early on to make it a "replication archive," by which we mean a repository that holds data and code for the purpose of replicating and verifying published results.

The poster presents ISPS Data Archive’s answer to the questions of who is responsible for the quality of data and what that means: We think that repositories do have a responsibility to examine the data and code we receive for deposit before making the files public, and that this data review involves verifying and replicating the original research outputs. In practice, this means running the code against the data to validate published results. These steps in effect expand the role of the repository and more closely integrate it into the research process, with implications for resources, expertise, and relationships, which I will explain here.
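
To make that concrete, here is a minimal sketch of what such a validation step can look like. It is a hypothetical illustration, not the Archive's actual workflow: the file names, script, and published estimates are invented, and the only assumption is that a deposit contains an analysis script, the data it needs, and a machine-readable file of the estimates it produces.

  import json
  import subprocess

  # 1. Re-run the author's deposited analysis script against the deposited data.
  #    (Hypothetical path; a real deposit might use R, Stata, or another language.)
  subprocess.run(["python", "deposit/analysis.py"], check=True)

  # 2. Load the regenerated estimates written out by the script.
  with open("deposit/output/estimates.json", encoding="utf-8") as f:
      regenerated = json.load(f)

  # 3. Compare them with the estimates reported in the published article.
  published = {"treatment_effect": 0.042, "std_error": 0.011}  # invented values
  tolerance = 1e-3

  for name, value in published.items():
      if abs(regenerated[name] - value) > tolerance:
          print(f"MISMATCH: {name}: regenerated {regenerated[name]}, published {value}")
      else:
          print(f"OK: {name} reproduces the published value")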

First, a word about what data repositories usually do, the special obligations reproducibility imposes, and who is fulfilling them now. This ties in with a discussion of data quality, data review, and the role of repositories.

Data Curation and Data Quality

A well-curated data repository is more than a place to put data. The Digital Curation Centre (DCC) explains that data curation means ensuring data are accessible to designated users for first-time use and reuse. This involves a set of curatorial practices – maintaining, preserving and adding value to digital research data throughout its lifecycle – which reduces threats to the long-term research value of the data, minimizes the risk of its obsolescence, and enables sharing and further research. An example of a standard-setting curation process is that of the Inter-university Consortium for Political and Social Research (ICPSR). This process involves organizing, describing, cleaning, enhancing, and preserving data for public use and includes format conversions, reviewing the data for confidentiality issues, creating documentation and metadata records, and assigning digital object identifiers. Similar data curation activities take place at many data repositories and archives.

These activities are understood as essential for ensuring and enhancing data quality. Dryad, for example, states that its curatorial team “works to enforce quality control on existing content.” But there are many ways to assess the quality of data. One criterion is verity: whether the data reflect actual facts, responses, observations or events. This is often assessed by the existence and completeness of metadata. The UK’s Economic and Social Research Council (ESRC), for example, requests documentation of “the calibration of instruments, the collection of duplicate samples, data entry methods, data entry validation techniques, methods of transcription.” Another way to assess data quality is by its degree of openness. Shannon Bohle recently listed no fewer than eight different standards for assessing the quality of open data on this dimension. Others argue that data quality consists of a mix of technical and content criteria that all need to be taken into account. Wang & Strong’s 1996 article claims that “high-quality data should be intrinsically good, contextually appropriate for the task, clearly represented, and accessible to the data consumer.” More recently, Kevin Ashley observed that quality standards may be at odds with each other. For example, some users may prize the completeness of the data while others their timeliness. These standards can go a long way toward ensuring that data are accurate, complete, and timely and that they are delivered in a way that maximizes their use and reuse.

Yet these procedures are “rather formal and do not guarantee the validity of the content of the dataset” (Doorn et al.). Leaving aside the question of whether they are always adhered to, these quality standards are insufficient when viewed through the lens of “really reproducible research.” Reproducible science requires that data and code be made available alongside the results, to allow regeneration of the published results. For a replication archive, such as the ISPS Data Archive, the reproducibility standard is imperative.

Data Review

The imperative to provide data and code, however, only creates the potential for verification of published results. It remains unclear how actual replication is to occur. That’s where a comprehensive definition of the concept of “data review” can be useful: at ISPS, we understand data review to mean taking that extra step – examining the data and code received for deposit and verifying and replicating the original research outputs.

In a recent talk, Christine Borgman pointed out that most repositories and archives follow the letter, not the spirit, of the law. They take steps to share data, but they do not review the data. “Who certifies the data? Gives it some sort of imprimatur?” she asks. This theme resonated at Open Repositories. Stodden asked: “Who, if anyone, checks replication pre-publication?” Chuck Humphrey lamented the lack of an adequate data curation toolkit and best practices regarding the extent of data processing prior to ingest. And Guédon argued that repositories have a key role to play in bringing quality to the foreground in the management of science.

Stodden’s call for the provision of data and code underlying publication echoes Gary King’s 1995 definition of the “replication standard” as the provision of “sufficient information… with which to understand, evaluate, and build upon a prior work if a third party could replicate the results without any additional information from the author.” Both call on the scientific community to take up replication for the good of science as a matter of course in their scientific work. However, both are vague as to how this can be accomplished. Stodden suggested at Open Repositories that this activity is community-dependent, often done by students or by other researchers continuing a project, and that community norms can be adjusted by rewarding high-integrity, verifiable research. King, on the other hand, argues that “the replication standard does not actually require anyone to replicate the results of an article or book. It only requires sufficient information to be provided – in the article or book or in some other publicly accessible form – so that the results could in principle be replicated” (emphasis added). Yet, if we care about data quality, reproducibility, and credibility, it seems to me that this is exactly the kind of review in which we should be engaging.

A quick survey of various stakeholders in the research data lifecycle reveals that data review of this sort is not widely practiced:

  • Researchers, on the whole, do not do replication tests as part of their own work, or even as part of the peer review process. In the future, there may be incentives for researchers to do so, and post-publication crowd-sourced peer review in the mold of Wikipedia, as promoted by Edward Curry, may prove to be a successful model.
  • Academic institutions, and their libraries, are increasingly involved in the data management process, but are not involved in replication as a matter of course (note some calls for libraries to take a more active role in this regard).
  • Large or general data repositories like Dryad, FigShare, Dataverse, and ICPSR provide useful guidelines and support varying degrees of file inspection, as well as make it significantly easier to include materials alongside the data, but they do not replicate analyses for the purpose of validating published results. Efforts to encourage compliance with (some of) these standards (e.g., Data Seal of Approval) typically regard researchers as responsible for data quality, and generally leave repositories to self-regulate.
  • Innovative services, such as RunMyCode, offer a dissemination platform for the necessary pieces required to submit the research to scrutiny by fellow scientists, allowing researchers, editors, and referees to “replicate scientific results and to demonstrate their robustness.” RunMyCode is an excellent facilitator for people who wish to have their data and code validated; but it relies on crowd sourcing, and does not provide the service per se.
  • Some argue that scholarly journals should take an active role in data review, but this view is controversial. A document produced by the British Library recently recommended that, “publishers should provide simple and, where appropriate, discipline-specific data review (technical and scientific) checklists as basic guidance for reviewers.” In some disciplines, reviewers do check the data. The F1000 group identifies the “complexity of the relationship between the data/article peer review conducted by our journal and the varying levels of data curation conducted by different data repositories.” The group provides detailed guidelines for authors on what is expected of them to submit and ensures that everything is submitted and all checklists are completed. It is not clear, however, if they themselves review the data to make sure it replicates results. Alan Dafoe, a political scientist at Yale, calls for better replication practices in political science. He places responsibility on authors to provide quality replication files, but then also suggests that journals encourage high standards for replication files and that they conduct a “replication audit” which will “evaluate the replicability and robustness of a random subset of publications from the journal.”

The ISPS Data Archive and Reproducible Research

This brings us to the ISPS Data Archive. As a small, on-the-ground, specialized data repository, we are dedicated to serious data review. All data and code – as well as all accompanying files – that are made public via the Archive are closely reviewed and adhere to standards of quality that include verity, openness, and replication. In practice it means that we have developed curatorial practices that include assessing whether the files underlying a published (or soon to be published) article, and provided by the researchers, actually reproduce the published results.

This requires significant investment in staffing, relationships, and resources. The ISPS Data Archive staff has data management and archival skills, as well as domain and statistical expertise. We invest in relationships with researchers and learn about their research interests and methods to facilitate communication and trust. All this requires the right combination of domain, technical and interpersonal skills as well as more time, which translates into higher costs.

How do we justify this investment? Broadly speaking, we believe that stewardship of data in the context of “really reproducible research” dictates this type of data review. More specifically, we think this approach provides better quality, better science, and better service.

  • Better quality. By reviewing all data and code files and validating the published results, the ISPS Data Archive essentially certifies that all its research outputs are held to a high standard. Users are assured that code and data underlying publications are valid, accessible, and usable.
  • Better science. Organizing data around publications advances science because it helps root out error. “Without access to the data and computer code that underlie scientific discoveries, published findings are all but impossible to verify” (Stodden et al.) Joining the publication to the data and code combats the disaggregation of information in science associated with open access to data and to publications on the Web. In effect, the data review process is a first order data reuse case: The use of research data for research activity or purpose other than that for which it was intended. This places the Archive as an active partner in the scientific process as it performs a sort of “internal validity” check on the data and analysis (i.e., do these data and this code actually produce these results?).

    It’s important to note that the ISPS Data Archive is not reviewing or assessing the quality of the research itself. It is not engaged in questions such as, was this the right analysis for this research question? Are there better data? Did the researchers correctly interpret the results? We consider this aspect of data review to be an “external validity” check and one which the Archive staff is not in a position to assess. This we leave to the scientific community and to peer review. Our focus is on verifying the results by replicating the analysis and on making the data and code usable and useful.

  • Better service. The ISPS Data Archive provides high level, boutique service to our researchers. We can think of a continuum of data curation that progresses from a basic level where data are accepted “as is” for the purpose of storage and discovery, to a higher level of curation which includes processing for preservation, improved usability, and compliance, to an even higher level of curation which also undertakes the verification of published results.

This model may not be applicable to other contexts. A larger lab, greater volume of research, or simply more data will require greater resources and may prove this level of curation untenable. Further, the reproducibility imperative does not neatly apply to more generalized data, or to data that is not tied to publications. Such data would be handled somewhat differently, possibly with less labor-intensive processes. ISPS will need to consider accommodating such scenarios and the trade-offs a more flexible approach no doubt involves.

For those of us who care about research data sharing and preservation, the recent interest in the idea of a “data review” is a very good sign. We are a long way from having all the policies, technologies, and long-term models figured out. But a conversation about reviewing the data we put in repositories is a sign of maturity in the scholarly community – a recognition that simply sharing data is necessary, but not sufficient, when held up to the standards of reproducible research.
