Session D3: views from near a plug....

By San | May 18, 2007

Your trusty blogger attended Data Access Questions:  Open and Shut (session D3).   She assumed that, as an august member of the blogging press, she would be afforded a primo seat.  But no, she again found herself sitting on the floor in the corner, next to another fake potted palm!  Thank heavens the session was moved to a neighboring room—larger and with a microphone. Obviously, a popular session on the challenges of working with data agreements from the perspectives of acquisition and user support.

Susan Cadogan, a member of the acquisitions team at the UK Data Archive, opened with a review of the archive from 1967 to date, its expansion, and the legal framework for data deposit, maintenance, and dissemination. Access to deposits is governed by one of three access conditions: completely open access for researchers, access with the archive acting on behalf of the depositor, and use dependent on permission of the depositor (the last only on request). The remainder was a discussion of relations with depositors in general and the development of the requisite agreements governing access and use as they’ve changed over time. Although the goal is to make data available as widely as possible, access is also governed by the requirements of the end-user license. Over time, the collection has expanded in scope to include datasets with a greater degree of detail and wider geographic coverage, both of which can correlate with increased disclosure risks. Prime examples are labor force and demographic datasets, which can entail special-use licenses for users.

Looking ahead, the archive plans to acquire commercial data, especially of the financial and marketing variety, which are both rich in information and expensive in € £ $. (In the case of Datastream, content is limited to historical data due to the cost.) These acquisitions have presented new challenges in the form of negotiating rather tight and specific access contracts. Future sustainability remains a question. (Read: How affordable is this stuff in the long term?)

In response to several questions, she elaborated on the need to acquire data more broadly (looking beyond the needs of individual research teams); the priority of getting data deposited by researchers, cleaned by archive staff, then dealing with a specified embargo period as the “price” of getting data as close in time to its production as possible (lest it disappear); and how to enforce data restrictions imposed on users of more restrictive studies (site visits will be undertaken at the termination of approved projects).

Keit Bang was lead presenter of When Data Aren’t Open (with Jennifer Darrough, formerly at the Penn State Population Research Institute). He reviewed the history, mission, and organization of PRI, and specifically the role of the data archive that supports its researchers. In particular, services associated with restricted-use datasets have expanded over the past several years to include 10 unique datasets, 20 restricted data contracts, and support for the 105 researchers who are party to these contracts. Challenges unique to these services include potentially protracted discussions regarding security plans; education of users regarding their obligations to meet licensing commitments (accomplished via various workshops, seminars, and general scolding when required); controlling physical access by developing security protocols with computing staff; and negotiating paperwork. Value to the mission of PRI can be demonstrated by the increasing demand for such services and by the publishing output (150+ articles published between 2004 and 2007) of researchers using restricted data services.

Robert Downs, senior digital archivist, represented CIESIN presenters on the topic of the Creative Commons licensing movement and the combined goals of allowing people to use and redistribute their data, document use, provide appropriate attribution, and track provenance. Traditional data licensing is, of course, challenging, fraught with time-consuming paperwork, records maintenance hassles, and other problems too numerous to detail here. (Thank heavens for university counsel!) There are two parts to permissions issues for CIESIN. The first is getting permissions from data providers: identifying ownership (not easy in the case of researcher collaborations), requesting non-exclusive rights for CIESIN, avoiding use restrictions whenever possible, and requesting permission to permit third-party redistribution. The second concerns what can be called the user or distribution end: establishing use and re-distribution, and determining what parts may be copyrighted versus freely distributable (example: Natural Disaster Hotspots Datasets might be copyrightable but underlying hazard and vulnerability data may not be). How to handle such murky issues muddies application of the Creative Commons concept.

Efforts so far have included applying a CC license to maps CIESIN creates and disseminates, with such information embedded in the graphic and included in related metadata.  (For information on CC:  )

Tanvi Desai, database manager at LSE Research Laboratory, supports about 200 academic researchers. She shared her experiences regarding procedures to gain access to Eurostat products, a daunting task for many and especially for those not affiliated with an EU academic or research institution. The latter group must apply for admissibility, a process that takes – well, let’s just say it takes a really, really long time, a lot longer than the boilerplate 10 weeks. She described the differences between institutional contracts and individual contracts for data use, sharing of output with Eurostat, and a host of related procedures regarding user status at their institutions. She also covered inconsistent coverage of variables/countries across datasets, difficulties associated with collaborators at more than one institution, the benefits and challenges associated with various Eurostat data access methods (CD-ROM distribution, the secure data lab in Luxembourg, a remote system), and difficulties in getting data cleared for release by Eurostat statisticians. (Nope, not going near that last one in print.)

Submitted by Pam  Baxter (the one behind the plant….)