
Blogs

New IQ now available!

Editor notes: 

Data, the whole Data, and nothing but the Data … and the Metadata, and the Access to Data

Welcome to the third issue of volume 38 of the IASSIST Quarterly (IQ 38:3, 2014). This issue is unquestionably about data. There are three papers on projects for improving delivery of data to users.

The first paper is ‘Distributing Access to Data, not Data’ by David Schiller from the Institute for Employment Research (IAB) at Nuremberg (Germany) and Richard Welpton at UK Data Archive, University of Essex (UK). They focus on the problem that researchers’ access to European microdata is restricted by national borders, which creates barriers to comparative analyses across the member states. The ‘Data without Boundaries’ project now has an initiative to build a ‘European Remote Access Network’ (EuRAN). The underlying tension is that protecting respondents from identification in the microdata conflicts with the need of modern research methods for access to detailed data. Some control is necessary, and the paper describes remote access, in the forms of job submission, remote execution, and remote desktop, as the appropriate answer. As an example, one version of secure remote desktop access encrypts images of the desktop screens to secure their transport over the Internet. The authors reference a set of principles for access, e.g., that it is not desirable to physically move data and that access should come through a single point that can reach multiple sources of data. Within the EuRAN project, the researchers’ need to analyse the data is supported by a ‘Virtual Research Environment’ that includes software for generating and presenting results.

The next paper presents a two-year metadata project based upon two well-known series of studies: the American National Election Study (ANES) and the US General Social Survey (GSS). The goal is to improve their metadata and build demonstration tools to illustrate the value of structured, machine-actionable metadata, as reported in ‘Creating Rich, Structured Metadata: Lessons Learned in the Metadata Portal Project’. The authors are Mary Vardigan (Inter-university Consortium for Political and Social Research (ICPSR)), Darrell Donakowski (American National Election Studies (ANES), University of Michigan), Pascal Heus (Metadata Technology North America (MTNA)), Sanda Ionescu (ICPSR), and Julia Rotondo (NORC at University of Chicago). The article reports on their experiences and also includes recommendations. The National Science Foundation funded the project under the ‘Metadata for Long-standing Large-Scale Social Science Surveys’ (META-SSS) program. ICPSR and ANES are co-distributors of most of the ANES studies, while the GSS is co-distributed by NORC, the Roper Center, and ICPSR. In the project, metadata tools revealed small differences between supposedly identical datasets, for instance in study titles and variable names. The project also decided which types of content to include. Both series are huge collections: the 58 ANES surveys contain 79,521 variables, and the cumulative GSS has 5,558 variables. Marking up this legacy documentation is laborious and time-intensive, and the future naturally lies in capturing the metadata at the source. In conclusion, the project learned a great deal about converting legacy documentation and identified several steps for documentation development, including the areas of paradata and versions of datasets. The concept of dataset versions relates to the solution described in the first paper of bringing not data but access to data to the users.

The third paper demonstrates further work in the project described above. In the paper ‘Mapping the General Social Survey to the Generic Statistical Business Process Model: NORC’s Experience’ the three authors, Scot Ausborn, Julia Rotondo, and Tim Mulcahy, all from NORC at the University of Chicago, present how they mapped the GSS workflow to the Generic Statistical Business Process Model (GSBPM). An analysis of the business processes for the production of survey data was carried out with the intention of capturing DDI-based metadata directly during the survey cycle, thus avoiding the need to generate it retroactively. The work is based upon an internal survey of GSS staff, asking them to explicate their respective roles on the survey in terms of the GSBPM. Connecting aspects of the GSS workflow to elements of the GSBPM produced a comprehensive and integrative view of the individual efforts that together produce the survey. Among the lessons learned, I noticed that they later concluded it might have been more fruitful to hold a workshop in which GSS staff could discuss the workflow processes together, rather than a survey in which each person provided his or her input in isolation. They also suggest that an expert in GSBPM could have conducted the mapping of the workflow; however, they did identify points for improvement in the workflow relating to both metadata and paradata.

Articles for the IASSIST Quarterly are always very welcome. They can be papers from IASSIST conferences or other conferences and workshops, from local presentations or papers especially written for the IQ. When you are preparing a presentation, give a thought to turning your one-time presentation into a lasting contribution to continuing development. As an author you are permitted ‘deep links’ where you link directly to your paper published in the IQ. Chairing a conference session with the purpose of aggregating and integrating papers for a special issue IQ is also much appreciated as the information reaches many more people than the session participants, and will be readily available on the IASSIST website at http://www.iassistdata.org.

Authors are very welcome to take a look at the instructions and layout: http://iassistdata.org/iq/instructions-authors.

Authors can also contact me via e-mail: kbr@sam.sdu.dk. Should you be interested in compiling a special issue for the IQ as guest editor(s) I will also be delighted to hear from you.


Karsten Boye Rasmussen
March 2015
Editor

Winner announced for first IASSIST Paper Competition

Dear IASSIST Members,

In our call for this year's conference we included a new Paper Track that would require members to submit a full paper in advance of the conference. We also created a best paper competition as an incentive to submit. I have the pleasure to announce a winner of our first IASSIST Paper Competition!

The winning paper was "Sustainability of Social Science Data Archives: A Historical Network Perspective” by Kristin R. Eschenfelder, Morgaine Gilchrist Scott, Kalpana Shankar, Ellen LeClere, Rebecca Lin, and Greg Downey. Kristin as lead author will receive a free registration for a future IASSIST conference, and the entire team will be recognized at the IASSIST Business Meeting on Wednesday, June 3 at 4:45-5:15. The paper stood out for its fit with the conference theme, relevance to IASSIST Quarterly, and research design.

We have made all submitted conference papers available on our website as a way to encourage feedback from attendees (https://sites.google.com/a/umn.edu/iassist-2015/paper-submissions). Every paper will be considered for publication in IQ.

A big IASSIST thanks to our authors for helping us kick off a potentially new tradition. Also, thank you to the sub-committee members (Karen Hogenboom, Thomas Lindsay, Sara Holder, Michelle Edwards, and Berenica Vejvoda) for the hard work to select a winner.

They are helping to make IASSIST 2015 the best conference ever!  See you all soon!

Lynda & Sam
Program Committee Co-Chairs

Lynda M. Kellam
Data Services & Government Information Librarian
Adjunct Lecturer in Political Science
University of NC at Greensboro

"Before anything else, preparation is the key to success." Notes from RDMF13: Preparing Data for Deposit

The Digital Curation Centre’s most recent Research Data Management Forum took place last week in London.

UK Data Service’s Louise Corti began the day with an overview of their acquisitions process. The Service (under various names) is almost 50 years old, which gives it experience and perspective many institutions do not have. Lessons from those years include the importance of a collections development policy that is allowed to evolve. The Archive evaluates potential acquisitions on the basis of their value for teaching and for re-use in validation and replication. They have learnt from past mistakes and now limit access licences to three options: open, safeguarded (requiring registration), and controlled (locked-down access). Common problems persist, however: poor file names, weak descriptions of methods and contextual documentation, limited metadata, and unexplained missing data files. The UK Data Service plays a number of roles as a data service, from hand-holder and evangelical preacher to the Economic and Social Research Council’s police officer for non-compliance on data sharing.

Suzanne Embury made a valuable point in her presentation. Of course, the one thing we know is that we don’t know how other people will re-use data in the future. But we can reasonably guess that they will want to discover, integrate, and aggregate it. To this end, simple things can help: check spellings, aim for standardised vocabularies, avoid acronyms. Finally, apply a domain expert test to see if people in the discipline can independently understand the data. With that, echoes of Gary King’s replication standard came to mind.

A presentation on meeting the RDM challenge focused on the University of Loughborough, which has adopted a data preservation and sharing solution based on figshare and Arkivum support. Loughborough wants to make depositing data as easy as possible for researchers by taking care of as much of the back-end work as possible. But at what cost, in both finances and quality? At the last IASSIST we learnt RDM takes a village, but Loughborough acknowledged the contribution of 61 people in setting up their service, so maybe it really takes a small metropolitan statistical area.

IASSIST’s own web editor Robin Rice directed us through data deposit at the University of Edinburgh, guided by former IASSIST president Peter Burnhill’s refrain of "helping researchers to do the right thing". Edinburgh provides support throughout the data lifecycle with strong training resources (Research Data MANTRA), plus face-to-face sessions on managing data, creating DMPs, good practice, handling data in SPSS, and working with personal and sensitive research data. Like the UK Data Service, they recognise the value in keeping things simple (licence options, for example) and in offering good incentives. Their repository accepts only open data (CC-BY 4.0), but depositing requires just five metadata fields. In return, depositors get their data available quickly, with open download statistics for every item.

The afternoon sessions split into three discussion groups. Emerging from them were thoughts on keeping metadata requirements as simple as possible while recognising the need to concentrate on different aspects depending on the discipline; some disciplines require precision while others do not require so much. There was an acknowledgement that data discovery is often undertaken through Google. Also, while there inevitably is a range of people providing a service, there needs to be a person connecting existing resources in a university. Finally, raising awareness is a problem, as demand is related to institutional awareness.

Presentations from the event are available from the DCC, and tweets can be found under the hashtag #rdmf13. The DCC will be blogging about the discussion group sessions.

IASSIST election results, 2015

Hello IASSISTers!

Here are the official results of the 2015 IASSIST elections.  There was a 61% voter turnout.  The winning candidates are:

President: Tuomas Alaterä

Vice President: Jen Green

Secretary: Ryan Womack

Africa Regional Secretary: Lynn Woolfrey

Asia-Pacific Regional Secretary: Sam Spencer

Canada Regional Secretary: Carol Perry

Europe Regional Secretary: David Schiller

USA Regional Secretary: San Cannon

AC Member, Canada: Berenica Vejvoda

AC Members, Europe: Oliver Watteler and Arne Wolters

AC Members, USA: Kate McNeill, Jen Darragh, and Ashley Jester

Many, many thanks to all candidates who agreed to stand, and congratulations to our new officers.  Newly elected officers’ terms officially begin at the end of the Annual Business Meeting of the Association at the 41st Annual IASSIST conference in Minneapolis, but they are welcome to attend the Administrative Committee meeting preceding the conference as observers if they so wish.

Melanie Wright

Chair, IASSIST Nominations and Elections Committee

“You can’t have a democratic society, without having a good data base.”

Janet L. Norwood, former US Bureau of Labor Statistics commissioner, dies

On the passing of this iconic defender of the neutrality of public data, I am struck by how important Janet Norwood was to establishing a sound path for data advocacy, and reminded of how necessary it is to have continuous education about this topic. In fact, swimming in ready access to data as we are today, it's especially important that we, as data professionals, remain alert to and defend a couple of aphorisms:

  • Stay true to the facts: zealously retain non-partisan associations in the recording of all public data, analyses and reporting.
  • “Use it for GOOD, never for EVIL.” Encourage the use of public data for the public good.

In reviewing the memorials to Janet Norwood, a couple of succinct statements seem apt (in addition to the heading of this post).

Simply put, all U.S. policy makers, businesses and families can make better decisions every day because of Janet Norwood’s work at B.L.S. ~Erica L. Groshen, the bureau’s current commissioner

“I believe strongly,” said economist Janet L. Norwood, “that an objective, scientifically created system of data is essential for a democracy to flourish.” ~ Democracy’s Statistician: Janet L. Norwood, 1923-2015, by Social Science Space.

~Paula Lackie (Carleton College & co-chair of the IASSIST Professional Development Committee)

Spring forward! The Jisc Research Data Spring programme

On 26/27 February, I attended Jisc Data Spring “Sandpit 1” in the English city of Birmingham. Data Spring is a funding programme supporting UK based projects in Research Data Management (RDM), and something of a successor to the successful Managing Research Data programmes (MRD) that did so much to get RDM training and tools underway in the UK’s education sector.

Unlike the traditional proposal-evaluation-funding model, Data Spring takes a more collaborative, interactive approach, splitting the programme into separate stages, at each of which projects may drop out of funding. If that sounds like the approach of modern entertainment TV shows, then you would not be wrong. Beginning with an open call, some 70 proposals were available online for voting and comments. These were reduced to 44 by the time of a workshop [PDF] at the recent IDCC conference. At the “Sandpit” (metaphorical, not literal, sadly), these proposals had to fit 27 available slots to proceed to the next stage. Through a process of negotiation, mergers and acquisitions, and hasty matchmaking, all 44 managed to get through in some form from the first day to the second.

The second day consisted of the now 27 projects making four-minute pitches to a panel of judges. By mid-March, successful projects will receive notice of three months of testing and prototyping funding before reporting to a similar event in June. Following this event, projects may receive a further four months of funding before a final workshop in November allows six months of funding leading to the programme’s conclusion in 2016.

Having been part of the JISCMRD Program (Jisc has since switched from capitals to sentence case), I was struck by how much the area has moved on since those days: from evidence gathering and basic training tools to RDM support focused on integration into existing workflows. That this occurred is a testament to the original MRD programme, and the support, work, and imaginations of those involved. Whatever projects make it through to the end of Data Spring, I have no doubt they will be worth the attention of people involved in Research Data Management both inside and outside the UK.

You can review projects at the Data Spring ideascale and figshare pages and tweet about them using #dataspring.

UPDATE: a storify of the event is also available.

A decade against decay: the 10th International Digital Curation Conference

The International Digital Curation Conference (IDCC) is now ten years old. On the evidence of its most recent conference, it is in rude health and growing fast.

IDCC marks the first time IASSIST has formally supported another organisation’s conference. I think it was a wise investment given the quality of plenaries, presentations, posters, and discussions.

The DCC already has a number of blog posts available covering the substance of sessions, including an excellent summary by IASSIST web editor Robin Rice. Presentations and posters are already available, and video from plenary sessions will soon be online.

Instead, I will use this opportunity to pick up on hanging issues and suggestions for future conferences.

One was apportionment of responsibility. Ultimately, researchers are responsible for management of their data, but they can only do so if supporting infrastructure is in place to help them. So, who is responsible for providing that: funders or institutions? This theme emerged in the context of the UK’s Engineering and Physical Sciences Research Council who will soon enforce expectations identifying the institution as responsible for supporting good Research Data Management.

Related to that was a discussion on the role of libraries in this decade. Are they relevant? Can they change to meet new challenges? As someone who started out as a researcher, became a data archivist, and is now a librarian, I wouldn’t be here if libraries weren’t meeting these challenges. There’s a “hush” of IASSIST members also ready to take issue with the suggestion that libraries aren’t relevant or aren’t engaged with data; in fact, they did so at our last conference.

Melissa Terras (UCL) did a fantastic job presenting [PDF] work in the digital humanities that is innovative not only in preserving but in rescuing objects, all done on small-change research budgets. I hope a future IDCC finds space for a social sciences person to present on issues we face in preservation and reuse. Clifford Lynch (CNI) touched on the problems of data reuse and human subjects, which remained one of the few glancing references to a significant problem and one IASSIST members are addressing. Indeed, thanks must go to a former president of this association, Peter Burnhill (Edinburgh), who mentioned IASSIST and how it relates to the IDCC audience on more than one occasion.

Finally, if you were stimulated by IDCC’s talk of data, reuse, and preservation then don’t forget our own conference in Minneapolis later this year.

Chronology of data libraries and data centres

A few days ago I asked on the IASSIST mailing list for some help in order to find out dates of creation of data libraries, data centres and such services. It was overwhelming to receive answers from colleagues from everywhere with dates and some other useful information about the establishment of local data support and national services.

There is a wealth of information in this community around these issues and with the increasing importance of data services we need to make sure we collect and make this information accessible. After all, our data obsession comes with the trade. ; )

Many colleagues asked for all the information to be compiled and shared, so I have prepared an initial Google Sheet titled "Chronology of data libraries and data services" with the information from all responses.

I have added a few extra fields such as country or type of service but am sure there would be many others that could be interesting. The list is by no means complete or perfect so I ask again for help from colleagues to add or edit (you will need to request edit access for this).

I also wonder whether other information on IASSIST membership could be merged to construct an even more powerful dataset. All comments, suggestions and offers to volunteer are welcome.

IASSIST Fellows Program 2014-15

The IASSIST Fellows Program is pleased to announce that it is now accepting applications for financial support to attend the IASSIST 2015 conference in Minneapolis [https://sites.google.com/a/umn.edu/iassist-2015/], from data professionals who are developing, supporting and managing data infrastructures at their home institutions.

Please be aware that funding is not intended to cover the entire cost of attending the conference. The applicant's home institution must provide some level of financial support to supplement an IASSIST Fellow award. Strong preference will be given to first time participants and applicants from those countries currently with insufficient representation at IASSIST. Only fully completed applications will be considered. Applicants submitting a paper for the conference will be given priority consideration for funding.

You may apply for funding via this form <https://docs.google.com/spreadsheet/viewform?usp=drive_web&formkey=dEhLcnNIcE4xWW9NUzBwZnViNy1sUWc6MA#gid=0>. The deadline for applications is the 31st of January 2015.

For more information, to apply for funding or to nominate a person for a Fellowship, please send an email to the Fellows Committee chairs, Florio Arguillas (foa2@cornell.edu) and Stuart Macdonald (stuart.macdonald@ed.ac.uk).

Hallelujah and praise the LARD! The first London Area Research Data group meeting

LARD is London Area Research Data and this was its inaugural meeting, informally bringing together various people from London based institutions (and as far away as Reading) who are charged in some way with Research Data Management (RDM) - be it research support or repository work.

These are my notes, which lack attribution partly because I couldn't remember where every person was from, and also it wasn't clear if the meeting was on or off the record. Nonetheless, I felt there were some interesting points that deserve sharing as an insight into how UK universities (and one research centre) are dealing with RDM less than a year away from the EPSRC deadline on expectations of compliance for research data.

The first item in what was a free-form discussion (think RDM jazz, hence my beat-style note taking, with full stops however) was policies. Some institutions have data policies, some have draft policies, and others have no policy. The mood seemed to be that a policy was more effective as a mandate for focusing university attention and resources on support services than for grabbing researchers’ attention. Researchers, it was said, tend to react more to what funders want than to university policies or documents. Those universities that competed for Medical Research Council (MRC) funding felt the MRC demanded institutional data policies, and so those institutions tended to adopt them or have drafts ready for adoption. Yet most researchers are not funded by one of the RCUK councils, and their funders often have no data mandates. The group found it a problem telling researchers that they don’t own their own data (it’s often funders, or institutions through employee-created works clauses). There was also a sense that researchers worry about data protection and are looking for practical guidance on how to keep data safe and secure. There was also a recognition that disciplines matter: those disciplines that do not have a strong culture of sharing data can be helped by the weight of institutional support providing the infrastructure for RDM. This tackles the disciplinary focus of researchers, or localism. One example of how a bad experience can focus attention was a researcher who lost data by plugging a malware-infected hard drive into a university network and had to have the drive and the copy of the data destroyed. Episodes like this can be used to tackle the culture of “improvisation” when it comes to researchers “backing up” their data without institutional support, or without engaging with it.
Aside from acting as a “wake-up” for researchers, such episodes can push universities into providing workable, easy-to-use institutional storage, either working storage or preservation in an institutional repository.

Discussion then moved round to the EPSRC expectations for research data. Those who attended a recent DCC event on the EPSRC expectations reported that the EPSRC is not looking to remove opportunities for supporting research, so is not likely to cut off funding come May 2015. However, it does expect to see evidence that institutions are working towards or trying to improve storage, support, and data discovery and access. Nonetheless, there is no doubt the EPSRC policy has focused knowledge and effort in institutions towards RDM. Then training was mentioned. When the “T” word comes up, I often think of that line: if people don't want to come, how are you going to stop them? To save us from preparing to teach to empty rooms, the thinking now seems to be towards providing support when people need it and building up a directory of experts to refer to when appropriate. Structured support is based on identifying four key stages in the data lifecycle: submitting a proposal (for help on data management planning), when proposals are accepted (implementing RDM), mid-project (supporting implementation), and towards the close (to talk about preservation). The key is to keep engagement with researchers. One institution is trying to do this for all its research projects, so is working with its research office to target RCUK-funded projects. Another institution initially plans to work with a sample of projects.

By now the discussion had moved on to data management planning. One institution had a Data Management Plan (DMP) template and DMP requirement as part of its data policy, with separate plans for staff and postgraduate students. The feeling was that template texts are not such a good thing if they are copied and pasted into DMPs. A case was mentioned of one research funder refusing to fund a project because the DMP used identical text to another DMP submitted from that institution. The DCC’s DMPOnline tool was mentioned, particularly its ability to be customised for an institution. It was also mentioned that DMPOnline has been much improved in later versions. One institution mentioned a policy of not offering storage until a DMP has been completed; another reported that there is a checkbox in the research office to signify that the DMP has been looked at by the data management officer.

The RDM equivalent of Godwin's law (or Godwin's Rule of Nazi Analogies) is that at some point cost will be mentioned. How to cost RDM is an ongoing problem. Beyond the difficulty of identifying costs that specifically relate to RDM activity, as opposed to typical research requirements that have an RDM aspect, there is an additional problem: RCUK funders mostly allow budgeting for RDM, but that budgeting must not identify activity that is supported as part of general institutional funding. Auditing costs is a problem. Storage tends to have the easiest costs to identify (storage per byte, for example), but this can be a problem if data is stored in an institutional repository when the budget for the project identified separate storage costs. For this reason, solutions like Arkivum may be advantageous, as they can be specified as an auditable cost.

The coda to this discussion concerned metadata. It was said that funders were keen on ensuring that good quality metadata accompanies research data generated by projects they support, and that they are willing to allow proposals that factor in additional time and resources for metadata. However, an obvious problem is who should be adding that metadata: is it researchers, who know the data but not necessarily the standard, and may not see its importance in the way RDM support staff do; or should it be RDM staff, particularly repository staff, who know the type of information required but do not necessarily know the data or discipline that well? Finally, hitting on a standard that is applicable to all data is a problem. Social science is not the same as genetics; art history is not the same as management. It was then asked if there was a way to harvest metadata when that metadata is created elsewhere (say, the UK Data Service). Both the DCC and UK Data Service are working on a Jisc-funded Research Data Registry and Discovery Service, and the European Union is also working on data discovery platforms that import and export catalogue record metadata.

The feeling at the end of this initial meeting was LARD provided a useful forum for sharing practice and learning from contemporaries and there was enthusiasm for follow-up meetings including those based around structured themes. If you work in a big city, and there are people doing similar things to you in that city, take advantage and get together to talk. So, thanks to Gareth Knight (LSHTM), Stephen Grace (UEL), and Veronica Howe (KCL) for organising, facilitating, and hosting LARD #1.
