This workshop will provide participants with a crash course on the basics and terminology of finance across borders. It will also discuss commercial and free sources for international financial data. The differences among North American, South American, European, and Asia-Pacific markets will be detailed in terms of how reporting standards and control mechanisms translate into data availability and quality. Sources for single-market versus global data will be examined. Finally, statistical analysis tools for financial data will be compared.
SPSS, Stata and SAS: Flavors of Statistical Software
A. Michelle Edwards (University of Guelph)
So many different flavors to choose from – how will we ever choose? Are they all the same? Do they have the same functionality? Which one should I use? Which is the quickest to learn? These are questions many of us have encountered in one form or another. This workshop will take you on a quick tour of Stata, SPSS, and SAS. We will examine a data file using each package. Is one more user-friendly than the others? Are there significant differences in the codebooks created? We will also look at creating a frequency table and a cross-tabulation in each. Which output screen is easiest to read and interpret? The goal of this workshop is to give you an overview of these products and provide you with the information you need to determine which package best fits your requirements and those of your users. Please bring your experiences and/or horror stories about working with statistical software to this workshop. Together we'll try to demystify the flavors of statistical software and help you decide on a favorite flavor.
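As a package-neutral illustration of the two outputs the workshop compares – a frequency table and a cross-tabulation – here is a minimal sketch in Python with pandas. The variable names and values are hypothetical; SPSS, Stata, and SAS each produce their own equivalent of these tables (FREQUENCIES/CROSSTABS, tabulate, PROC FREQ).

```python
import pandas as pd

# Hypothetical survey extract; in the workshop this would come from a real data file.
df = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South", "North"],
    "employed": ["yes", "no", "yes", "yes", "no", "no"],
})

# Frequency table: counts of each category of a single variable.
freq = df["region"].value_counts()

# Cross-tabulation: joint counts of two categorical variables.
xtab = pd.crosstab(df["region"], df["employed"])

print(freq)
print(xtab)
```

Each package prints the same counts, but the layout and labeling of the output differ – which is exactly the comparison the workshop walks through.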
Inside Roper Center Services: Beyond Survey Questions and Answers
Lois E. Timms-Ferrara (The Roper Center for Public Opinion Research)
Marc Maynard (The Roper Center for Public Opinion Research)
This workshop will provide instruction for participants interested in supporting the discovery and use of public opinion surveys through the new release of the Roper Center's iPOLL Databank and RoperExpress services. After a brief introduction to Roper Center services, participants will receive hands-on training in the most recent release of the iPOLL Databank, highlighting new search and reporting features and the new graphical display. Attendees will be walked through the search interface and provided with tips and suggestions for more efficient and productive results. Further, the workshop will cover RoperExpress data download services and the conversion of ASCII data files into SPSS, with a particular focus on tips for using Center documentation and data. Workshop coverage will include an introduction to the Roper Center's use of social media for communicating with users about Center resources, as well as suggested practices for providing campus-wide access to iPOLL and RoperExpress. The opinion research industry is encouraging greater transparency in reporting surveys: full reporting of citations, sampling methodology, questions, and the disposition of attempts to reach respondents, so that response rates can be calculated. The workshop will cover what this new push for disclosure means for the research community and provide case studies of how to calculate response rates from the newly released information using the various accepted formulas.
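To make the response-rate discussion concrete, one of the widely used formulas is AAPOR's Response Rate 1 (RR1): complete interviews divided by all interviews, non-interviews, and cases of unknown eligibility. A minimal sketch (the disposition counts below are hypothetical, not from any actual Roper Center study):

```python
def aapor_rr1(complete, partial, refusal, noncontact, other, unknown_eligibility):
    """AAPOR Response Rate 1: complete interviews divided by the sum of
    all interviews (complete + partial), non-interviews (refusal,
    non-contact, other), and cases of unknown eligibility.
    Partial interviews do not count as responses in RR1."""
    denominator = (complete + partial) + (refusal + noncontact + other) + unknown_eligibility
    return complete / denominator

# Hypothetical final disposition counts from a survey's contact records.
rate = aapor_rr1(complete=600, partial=50, refusal=200,
                 noncontact=100, other=25, unknown_eligibility=25)
print(round(rate, 3))  # 600 / 1000 = 0.6
```

RR1 is the most conservative of the AAPOR formulas; the workshop's case studies compare it against the other accepted variants, which treat partial interviews and unknown-eligibility cases differently.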
Digital Preservation Management - Part 1: Standards and Practice
Nancy McGovern (ICPSR)
The Preserving Digital Information report, released in 1996, marks a starting point for the emergence of the digital preservation community, consolidating data archive and digital archive practice from the 1960s onward with more recent developments in digital content management across a range of domains. For more than a dozen years, standards and common practice for digital preservation have been developed and increasingly promulgated. This session demonstrates the utility of standards and practice as tools for managers of digital content in developing effective digital preservation programs that fit the needs and resources of their organizations. The session uses Trusted Digital Repositories: Attributes and Responsibilities http://www.oclc.org/programs/ourwork/past/trustedrep/repositories.pdf to frame the discussion of the organizational context for digital preservation programs and the Reference Model for an Open Archival Information System (OAIS) http://public.ccsds.org/publications/archive/650x0b1.pdf for discussing technological infrastructure requirements. The Digital Preservation Management tutorial http://www.icpsr.umich.edu/dpm/ provides useful background on concepts and foundations of digital preservation for review prior to the workshop.
Newly Available Integrated Data from The University of Michigan and The Minnesota Population Center, part 1
Sarah Flood (Minnesota Population Center)
Katie Genadek (Minnesota Population Center)
Christopher Ward (ICPSR)
Newly Available Integrated Data for Social Scientists, Demographers, and Health Researchers. In this day-long, two-part workshop, representatives from the Minnesota Population Center (MPC), the Population Studies Center (PSC), and the Inter-university Consortium for Political and Social Research (ICPSR) will team up to demonstrate the very latest harmonized resources for social, demographic, and health research. PSC and ICPSR staff and faculty will demonstrate the Integrated Fertility Survey Series (IFSS), a new data product funded by a 5-year grant from the Eunice Kennedy Shriver National Institute of Child Health and Human Development. The IFSS is designed to provide harmonized data files from 10 national surveys of women's union formation and fertility behavior. The surveys belong to 3 separate survey series: (1) Growth of American Families (1955, 1960), (2) National Fertility Survey (1965, 1970), and (3) National Survey of Family Growth (1973, 1976, 1982, 1988, 1995, 2002). The presenters will have harmonized data available to demonstrate the harmonization method and the delivery method for the data files, and will also describe their harmonized cross-sectional weights and design components, which can be applied to analyses across time. The Minnesota Population Center (MPC) is one of the world's leading developers of demographic data resources (including IPUMS-USA, IPUMS International, IPUMS-CPS, NHGIS, NAPP, and IHIS). In this workshop, attendees will learn about the content of the latest MPC data resources and receive basic information about how to get and use the data, which are available free over the internet. A special focus of this workshop will be the MPC's newest project, ATUS-X, which provides harmonized American Time Use Survey data from 2003 forward on how U.S. adults divide their time among activities. ATUS-X is an MPC project conducted in cooperation with researchers at the University of Maryland Population Research Center.
Introduction to R
Harrison Dekker (University of California Berkeley)
Ryan Womack (Rutgers University)
R is a free and open source statistics package that has become increasingly popular in recent years. It runs under all the major operating systems and has an enormous and expanding collection of add-on packages to support data analysis in almost every imaginable domain. Its downside, though, is that R is almost entirely command-line driven, giving it a steeper learning curve than commercial software options like SAS, SPSS, and Stata. This workshop will focus on R's unique command syntax. In particular, we'll explore how to load and manipulate data in R. Examples will be similar to those used in the morning SAS, Stata, and SPSS workshop.
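The workshop itself works in R, but the "load and manipulate data" tasks it covers are common to any scripting environment. Here is a hypothetical analogue sketched in Python with pandas (in R the same steps would use read.csv, bracket subsetting, and mean with na.rm = TRUE); the file contents and column names are invented for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV data; in practice this would be read from a file
# (in R: read.csv("survey.csv")). An in-memory string keeps the sketch
# self-contained. Note the missing income value for id 3.
csv_text = """id,age,income
1,34,52000
2,51,61000
3,29,
4,45,48000
"""

df = pd.read_csv(io.StringIO(csv_text))

# Subset rows by a condition (in R: df[df$age > 40, ]).
over_40 = df[df["age"] > 40]

# Summary statistic; missing values are dropped automatically
# (in R: mean(df$income, na.rm = TRUE)).
mean_income = df["income"].mean()

print(len(over_40))  # 2
print(mean_income)
```

The R versions of these commands are what the workshop drills; the point of the sketch is only the shape of the workflow: read, subset, summarize.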
DDI 3 Repository-Based Data Management with Colectica
Jeremy Iverson (Algenta Technologies)
Colectica is a platform for creating, documenting, managing, distributing, and discovering data. Colectica is built on open standards, including DDI 3. This training course covers the following topics: - Introduction to Colectica - Working with metadata repositories for collaboration and version control - Documenting concepts and general study design - Designing and documenting data collection instruments - Creating and documenting data products - Ingesting existing resources - Publishing resources - Enabling enhanced discoverability by leveraging existing social networking technologies - Hands-on: using Colectica to manage a sample study
Go screencasting go! Creating screencast tutorials to support data products
Lynda Kellam (University of North Carolina Greensboro)
Many libraries have created screencast tutorials to support instruction efforts. Typically these tutorials demonstrate specific journal article databases. More data professionals are recognizing the usefulness of screencast tutorials for teaching about data products to specific audiences. An example is ICPSR's conversion of many of its text-based tutorials to screencast versions on the Data User Help Center (http://www.icpsr.umich.edu/icpsrweb/ICPSR/help/datausers/). In this workshop, we will learn how to create a basic screencast tutorial using Jing, a free screencasting tool. We will first survey the software available for screencast creation and then discuss best practices in creating screencast tutorials, including length, audience, and scripting. In the second portion, participants will write a short script and practice using Jing to create their own screencast tutorials for a chosen data product. Finally, participants will discuss the place of screencast tutorials in the data world with the goal of creating best-practices guidelines for data-oriented screencast tutorials. Participants will leave the workshop with a finished draft of a screencast tutorial as well as guidelines and suggestions for future tutorials.
Digital Preservation Management - Part 2: Trends and Sustainability
Nancy McGovern (ICPSR)
Standards and practice define the what and the how of digital preservation. A challenge for managers is continuing effective management of digital content over time as technology evolves, bringing new content and new techniques, and available resources for digital programs fluctuate. This session reviews trends in recent research and development in digital preservation (e.g., repositories, tools, workflows), identifies current sources for continually tracking developments, and discusses the challenges of establishing sustainable programs. Note: Part 1 is a prerequisite for Part 2. Both workshops include as many exercises and examples as possible.
Newly Available Integrated Data from The University of Michigan and The Minnesota Population Center, part 2
Sarah Flood (Minnesota Population Center)
Katie Genadek (Minnesota Population Center)
Christopher Ward (ICPSR)
Newly Available Integrated Data for Social Scientists, Demographers, and Health Researchers. In this day-long, two-part workshop, representatives from the Minnesota Population Center (MPC), the Population Studies Center (PSC), and the Inter-university Consortium for Political and Social Research (ICPSR) will team up to demonstrate the very latest harmonized resources for social, demographic, and health research. PSC and ICPSR staff and faculty will demonstrate the Integrated Fertility Survey Series (IFSS), a new data product funded by a 5-year grant from the Eunice Kennedy Shriver National Institute of Child Health and Human Development. The IFSS is designed to provide harmonized data files from 10 national surveys of women's union formation and fertility behavior. The surveys belong to 3 separate survey series: (1) Growth of American Families (1955, 1960), (2) National Fertility Survey (1965, 1970), and (3) National Survey of Family Growth (1973, 1976, 1982, 1988, 1995, 2002). The presenters will have harmonized data available to demonstrate the harmonization method and the delivery method for the data files, and will also describe their harmonized cross-sectional weights and design components, which can be applied to analyses across time. The Minnesota Population Center (MPC) is one of the world's leading developers of demographic data resources (including IPUMS-USA, IPUMS International, IPUMS-CPS, NHGIS, NAPP, and IHIS). In this workshop, attendees will learn about the content of the latest MPC data resources and receive basic information about how to get and use the data, which are available free over the internet. A special focus of this workshop will be the MPC's newest project, ATUS-X, which provides harmonized American Time Use Survey data from 2003 forward on how U.S. adults divide their time among activities. ATUS-X is an MPC project conducted in cooperation with researchers at the University of Maryland Population Research Center.
2010-06-02: Plenary I
Repositories and Cloud Services for Data Cyberinfrastructure
Sandy Payette (DuraSpace)
An historical look at the emergence of infrastructure – electric grids, railways, and the Internet – reveals that a key developmental stage has been reached when formerly incompatible or stand-alone systems are interconnected via adapters and gateways. We see this in the digital repository domain, where the notion of a digital repository began as a predominantly institutional phenomenon, or in certain cases a disciplinary phenomenon, but is now evolving to a point where repositories are becoming integrated components of larger systems and distributed infrastructure. At the same time, we see the emergence of cloud technologies that offer large-scale storage and compute capabilities, but with open questions on how well the cloud will meet the requirements of digital preservation and data archiving. In this session, I will discuss how the DuraSpace not-for-profit organization is working to evolve open source repositories and cloud services to serve as core components within emerging data cyberinfrastructure. In collaboration with the Fedora and DSpace communities, we are positioning repositories and cloud services to become part of a distributed data curation and archiving fabric that encompasses both institutional systems and the Web. I will also provide highlights from the DuraSpace partnership with the Data Conservancy, an NSF-funded Datanet project.
2010-06-02: A1: Developing a Longitudinal Data Archive: Lessons Throughout the Data Lifecycle
Direct Visualization of Longitudinal Data
Kevin Pulo (Australian National University)
This session examines best practice standards associated with managing and archiving longitudinal data. The data structure of longitudinal studies is more complex than for one-dimensional study designs, and therefore new issues typically arise in the process of data management, archiving, and analysis. There are a number of different types of longitudinal studies now in existence, including but not limited to: - panel surveys following a cohort of individuals - panel surveys following a random population sample of individuals over a period of time - repeated cross-sectional surveys with a different cross-section of individuals sampled at each time point. Five major longitudinal panel studies are currently archived at the Australian Social Science Data Archive, and there is increasing demand to archive data from regional and national longitudinal surveys, as well as repeated cross-sectional data such as that from Australian election campaigns. The presentations in this session will examine archiving practices associated with the different stages of longitudinal data archiving, including: - Archiving cross-sectional time series data (Leanne den Hartog) - Archiving longitudinal panel data (Steven McEachern) - Visualisation of longitudinal panel data (Kevin Pulo)
Metadata Standards for Managing and Archiving Longitudinal Data: Achieving Best Practice
Steven McEachern (Australian National University)
Melanie Spallek (University of Queensland)
Michele Haynes (University of Queensland)
Mark Western (University of Queensland)
This session examines best practice standards associated with managing and archiving longitudinal data. The data structure of longitudinal studies is more complex than for one-dimensional study designs, and therefore new issues typically arise in the process of data management, archiving, and analysis. There are a number of different types of longitudinal studies now in existence, including but not limited to: - panel surveys following a cohort of individuals - panel surveys following a random population sample of individuals over a period of time - repeated cross-sectional surveys with a different cross-section of individuals sampled at each time point. Five major longitudinal panel studies are currently archived at the Australian Social Science Data Archive, and there is increasing demand to archive data from regional and national longitudinal surveys, as well as repeated cross-sectional data such as that from Australian election campaigns. The presentations in this session will examine archiving practices associated with the different stages of longitudinal data archiving, including: - Archiving cross-sectional time series data (Leanne den Hartog) - Archiving longitudinal panel data (Steven McEachern) - Visualisation of longitudinal panel data (Kevin Pulo)
Archiving Cross-sectional Time Series Data: Data is Only Half the Story
Leanne den Hartog (University of Western Australia)
This session examines best practice standards associated with managing and archiving longitudinal data. The data structure of longitudinal studies is more complex than for one-dimensional study designs, and therefore new issues typically arise in the process of data management, archiving, and analysis. There are a number of different types of longitudinal studies now in existence, including but not limited to: - panel surveys following a cohort of individuals - panel surveys following a random population sample of individuals over a period of time - repeated cross-sectional surveys with a different cross-section of individuals sampled at each time point. Five major longitudinal panel studies are currently archived at the Australian Social Science Data Archive, and there is increasing demand to archive data from regional and national longitudinal surveys, as well as repeated cross-sectional data such as that from Australian election campaigns. The presentations in this session will examine archiving practices associated with the different stages of longitudinal data archiving, including: - Archiving cross-sectional time series data (Leanne den Hartog) - Archiving longitudinal panel data (Steven McEachern) - Visualisation of longitudinal panel data (Kevin Pulo)
2010-06-02: A2: Downstream Curation: Researchers and Data Management
Learning by Doing: Cases of Librarians Working with Faculty Research Data for the First Time
Jake Carlson (Purdue University Libraries)
Michael Witt (Purdue University Libraries)
With few precedents to follow, libraries and librarians who are beginning to explore their potential roles in data curation are grappling with how to relate their training and experience as librarians to the development and stewardship of data collections. In 2008, a group of librarians at Purdue University conducted an exercise to learn more about what data curation might mean to them in practical terms by identifying and engaging potential data contributors on campus. Subject-specialist librarians engaged six data creators from different disciplines, each in their own way, to solicit contributions from them. The librarians were asked to report back with a collection-level description of the dataset, their rationale for its selection, and a narrative of how they engaged and interacted with the data creator to answer questions such as, "How should the dataset be presented in the repository?" and "What policies are needed for its submission, use, and preservation?" The datasets were then ingested into a prototype data repository. This session will present the structure of the exercise along with vignettes to illustrate the challenges, issues, and insights of the librarians from their perspectives.
Socialization of the Institutional Repository: IR Plus
Suzanne Bell (University of Rochester)
Nathan Sarr (University of Rochester Libraries)
IR Plus is a new, open source platform for institutional repositories. Based on several years of user research, IR Plus was written to support faculty and grad students in the creation phases of their research, rather than just being a receptacle for finished work. For social scientists, the format-agnostic personal workspace supports storing and sharing large datasets, allowing them to collaborate with colleagues all over the world. When the work is ready for "publication" into the repository, a controlled but flexible approach to metadata allows easy addition of new types while keeping the overall metadata clean. Bringing the repository closer to social networking systems, users of the system can showcase and provide access to their work through Researcher Pages, which are easy to create and maintain. Users can now also feature "personal publications" on their Researcher Pages: material from the workspace that has not been added to any of the collections in the repository. While IR Plus was not created just for data users, we feel IASSIST members will find a number of its features interesting and useful, and it provides a study of a formal, institutional system edging into the world of social networking and collaboration.
Information Behaviour of Life Science Researchers – Informing Funders and Service Providers
Stuart Macdonald (EDINA National Data Centre, University of Edinburgh)
This paper will discuss the findings of the RIN-funded Case Studies in Life Sciences project, undertaken by a team of social scientists and information specialists from the Institute for the Study of Science, Technology and Innovation (ISSTI) and from Information Services and the Digital Curation Centre (DCC) at the University of Edinburgh. The aim of the project was to improve understanding of information use in the life sciences, and to provide a broader and deeper base of evidence to inform discussions about how information policy and practice can most effectively be supported and improved. Case studies were conducted across laboratories and research groups from 7 sub-disciplines of the life sciences and deployed a range of methodologies and tools, including short-term ethnographic techniques and semi-structured instruments. Our key conclusion is that the policies and strategies of research funders and information service providers must be informed by an understanding of the constraints and practices of different research communities. Only thus will they be effective in optimising the use and exchange of information, and in ensuring that they are scientifically productive and cost-effective. The full report, 'Patterns of information use and exchange: case studies of researchers in the life sciences', is available at: http://www.rin.ac.uk/case-studies.
2010-06-02: A3: New Directions in Inter-Archival Collaboration from the Data Preservation Alliance for the Social Sciences (Data-PASS)
How collaborative preservation works
Micah Altman (Harvard University)
The Data-PASS partnership engages in collaboration at three levels: coordinated operations, development of best practices, and creation and use of open-source shared infrastructure. The first talk in the session provides an update on our search for replication and distributed storage technologies for preservation. Systems like iRODS and LOCKSS can be developed into preservation environments for social science data archives. The key when implementing these preservation environments will be the modification of existing archive policies and procedures to reflect new dependence on collaboration. The second talk discusses the collection of international public opinion data collected by the USIA, which began in 1952 and extended through 1999. Until recently, these data were difficult to access. The Roper Center and the National Archives and Records Administration have identified, rescued, and made these data available to the research community. The third talk describes a new alliance between ICPSR and Institutional Repositories (IRs) with the goal of preserving and re-using social science data. This talk focuses on the formation of these partnerships; how an archiving guide for IRs will be developed; and new services that ICPSR can offer to IRs to assist with social science data. The fourth talk summarizes the efforts of ICPSR and the Roper Center to migrate punched card data to modern preservation formats. This presentation focuses on the recovery of the Cornell Retirement Study, a longitudinal study that began in 1952. The final talk discusses the current collaborative structure of Data-PASS, our agreements, infrastructure, and the services and infrastructure available to new partners.
Retirement in the 1950s: Recovering The Cornell Study of Occupational Retirement
Amy Pienta (ICPSR)
The Data-PASS partnership engages in collaboration at three levels: coordinated operations, development of best practices, and creation and use of open-source shared infrastructure. The first talk in the session provides an update on our search for replication and distributed storage technologies for preservation. Systems like iRODS and LOCKSS can be developed into preservation environments for social science data archives. The key when implementing these preservation environments will be the modification of existing archive policies and procedures to reflect new dependence on collaboration. The second talk discusses the collection of international public opinion data collected by the USIA, which began in 1952 and extended through 1999. Until recently, these data were difficult to access. The Roper Center and the National Archives and Records Administration have identified, rescued, and made these data available to the research community. The third talk describes a new alliance between ICPSR and Institutional Repositories (IRs) with the goal of preserving and re-using social science data. This talk focuses on the formation of these partnerships; how an archiving guide for IRs will be developed; and new services that ICPSR can offer to IRs to assist with social science data. The fourth talk summarizes the efforts of ICPSR and the Roper Center to migrate punched card data to modern preservation formats. This presentation focuses on the recovery of the Cornell Retirement Study, a longitudinal study that began in 1952. The final talk discusses the current collaborative structure of Data-PASS, our agreements, infrastructure, and the services and infrastructure available to new partners.
Building Partnerships Between Social Science Data Archives and Institutional Repositories
Jared Lyle (ICPSR)
The Data-PASS partnership engages in collaboration at three levels: coordinated operations, development of best practices, and creation and use of open-source shared infrastructure. The first talk in the session provides an update on our search for replication and distributed storage technologies for preservation. Systems like iRODS and LOCKSS can be developed into preservation environments for social science data archives. The key when implementing these preservation environments will be the modification of existing archive policies and procedures to reflect new dependence on collaboration. The second talk discusses the collection of international public opinion data collected by the USIA, which began in 1952 and extended through 1999. Until recently, these data were difficult to access. The Roper Center and the National Archives and Records Administration have identified, rescued, and made these data available to the research community. The third talk describes a new alliance between ICPSR and Institutional Repositories (IRs) with the goal of preserving and re-using social science data. This talk focuses on the formation of these partnerships; how an archiving guide for IRs will be developed; and new services that ICPSR can offer to IRs to assist with social science data. The fourth talk summarizes the efforts of ICPSR and the Roper Center to migrate punched card data to modern preservation formats. This presentation focuses on the recovery of the Cornell Retirement Study, a longitudinal study that began in 1952. The final talk discusses the current collaborative structure of Data-PASS, our agreements, infrastructure, and the services and infrastructure available to new partners.
USIA Office of Research Surveys, 1952-99: NARA–Roper Center Collaboration: An Update
Lois Timms-Ferrara (The Roper Center for Public Opinion Research)
The Data-PASS partnership engages in collaboration at three levels: coordinated operations, development of best practices, and creation and use of open-source shared infrastructure. The first talk in the session provides an update on our search for replication and distributed storage technologies for preservation. Systems like iRODS and LOCKSS can be developed into preservation environments for social science data archives. The key when implementing these preservation environments will be the modification of existing archive policies and procedures to reflect new dependence on collaboration. The second talk discusses the collection of international public opinion data collected by the USIA, which began in 1952 and extended through 1999. Until recently, these data were difficult to access. The Roper Center and the National Archives and Records Administration have identified, rescued, and made these data available to the research community. The third talk describes a new alliance between ICPSR and Institutional Repositories (IRs) with the goal of preserving and re-using social science data. This talk focuses on the formation of these partnerships; how an archiving guide for IRs will be developed; and new services that ICPSR can offer to IRs to assist with social science data. The fourth talk summarizes the efforts of ICPSR and the Roper Center to migrate punched card data to modern preservation formats. This presentation focuses on the recovery of the Cornell Retirement Study, a longitudinal study that began in 1952. The final talk discusses the current collaborative structure of Data-PASS, our agreements, infrastructure, and the services and infrastructure available to new partners.
Margaret O. Adams (National Archives and Records Administration)
The Data-PASS partnership engages in collaboration at three levels: coordinated operations, development of best practices, and creation and use of open-source shared infrastructure. The first talk in the session provides an update on our search for replication and distributed storage technologies for preservation. Systems like iRODS and LOCKSS can be developed into preservation environments for social science data archives. The key when implementing these preservation environments will be the modification of existing archive policies and procedures to reflect new dependence on collaboration. The second talk discusses the collection of international public opinion data collected by the USIA, which began in 1952 and extended through 1999. Until recently, these data were difficult to access. The Roper Center and the National Archives and Records Administration have identified, rescued, and made these data available to the research community. The third talk describes a new alliance between ICPSR and Institutional Repositories (IRs) with the goal of preserving and re-using social science data. This talk focuses on the formation of these partnerships; how an archiving guide for IRs will be developed; and new services that ICPSR can offer to IRs to assist with social science data. The fourth talk summarizes the efforts of ICPSR and the Roper Center to migrate punched card data to modern preservation formats. This presentation focuses on the recovery of the Cornell Retirement Study, a longitudinal study that began in 1952. The final talk discusses the current collaborative structure of Data-PASS, our agreements, infrastructure, and the services and infrastructure available to new partners.
2010-06-02: A4: DDI 3 Tools: Possibilities for Implementers (to be cont.)
Colectica: New Technology for Social Science Research
Jeremy Iverson (Algenta Technologies)
Dan Smith (Algenta Technologies)
This demonstration will show Colectica, a set of fully supported, commercial tools specifically designed for questionnaire creation and data documentation. These tools can automatically create CAI source code, paper questionnaires, and statistical source code. They enable data and documentation to be published to the web and to paper documentation formats. An ISO 11179 based metadata repository, backed by DDI3, enables collaborative workflows for the entire research process. The entire data life cycle can be easily visualized using the free Colectica Express viewer.
The Danish Data Archive (DDA) is a national data bank for researchers and students in Denmark and abroad. DDA is dedicated to the acquisition, preservation and dissemination of machine-readable data created by researchers from the social science and health science communities. The DDA needs to convert its existing metadata format into DDI 3. Additionally, the archive wants to integrate information from various metadata sources into DDI 3. To achieve these objectives, DDA began in the fall of 2008 to produce a workbench-style DDI editor architected in such a way that layers can be swapped out and replaced, e.g. backend persistence and DDI 3 id generation. The editor has an Eclipse Rich Client Platform (RCP) as the front end and a public API for developers, both licensed as open source software.
DDI4RDC: Metadata driven framework for the Canada Research Data Centre Network
Pascal Heus (Metadata Technology)
The DDI4RDC project aims at the implementation of open source solutions for the deployment of a DDI3-driven framework for the management of data and metadata across the Canada Research Data Centre Network. The initial phase of the project focuses on metadata management and researcher tools and is expected to be completed by mid-2011. This session will provide an overview of the platform, demo the DDI editing tools and the back-end registry technology, share experiences and lessons learned during development, and outline next steps. This project is funded by the Canada Foundation for Innovation under the umbrella of the University of Manitoba and is a collaborative effort between the Canada RDC Network, Metadata Technology North America (USA), Breckenhill (Canada), Algenta Technologies (USA), and Ideas2evidence Ltd (Norway).
Documenting and Disseminating Longitudinal Data Online with DDI 3
Alerk Amin (CentERdata, Tilburg University)
Questasy is an operational web application developed to manage the dissemination of data and metadata for panel surveys. Its easy-to-use interface allows administrators to create and manage metadata, and researchers to browse and search the metadata. Questasy provides a full description of survey projects from data collection to datasets and now also publications. The flexibility of the system enables new input and output possibilities for researchers.
XSLT and DDI: Using Metadata to drive data capture and processing
Samuel Spencer (Australian Bureau of Statistics)
DDI offers users numerous ways to use and capture data and metadata. With the shift towards a statistical lifecycle approach in DDI 3, we must look at how these two facets can work together to drive capture and processing of events downstream in the lifecycle. This presentation will discuss how the metadata captured in DDI 3 can be used and transformed to assist other processes within the lifecycle. Finally, this will be demonstrated using a DDI instance of the Internet Access Survey from the Australian Bureau of Statistics, showing how an XSLT transform can convert DDI 3.1 metadata into XForms for internet data capture.
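As a rough illustration of the metadata-driven approach the abstract describes, question metadata can be mechanically transformed into a data-capture form. The talk uses XSLT on real DDI 3.1; the sketch below uses Python and invented, simplified element names (not actual DDI or ABS structures) purely to show the idea of generating form controls from question metadata:

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified DDI-style question metadata (not real DDI 3.1).
DDI_SNIPPET = """
<questionScheme>
  <question id="q1">
    <text>Do you have internet access at home?</text>
    <domain type="code">
      <category value="1">Yes</category>
      <category value="2">No</category>
    </domain>
  </question>
</questionScheme>
"""

def question_to_form(xml_text):
    """Render each question as a simple XForms-like select1 control."""
    root = ET.fromstring(xml_text)
    controls = []
    for q in root.findall("question"):
        lines = [f'<select1 ref="{q.get("id")}">',
                 f"  <label>{q.findtext('text')}</label>"]
        for cat in q.findall("domain/category"):
            lines.append(f"  <item><label>{cat.text}</label>"
                         f"<value>{cat.get('value')}</value></item>")
        lines.append("</select1>")
        controls.append("\n".join(lines))
    return "\n".join(controls)

print(question_to_form(DDI_SNIPPET))
```

In a real pipeline this mapping would live in an XSLT stylesheet, so the same DDI instance can drive paper questionnaires, CAI source code, and web forms from one source of metadata.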
2010-06-02: B1: Models of Collaboration in Data Curation
Social Networks and Networked Data: A View from the Humanities
Toby Burrows (University of Western Australia)
This paper draws on the work carried out by the Australian Research Council’s Network for Early European Research (NEER) between 2005 and 2009 to build an international social network and shared data services for Early European researchers. It will examine the means used to connect this interdisciplinary research community and to encourage collaboration among its members. It will offer an assessment of the effectiveness of the methods employed. The NEER experience has formed one of the bases for a study of data archiving in the humanities, carried out for the Australian Social Science Data Archive (ASSDA) during 2009. This paper will also present some of the findings from this study and will outline its recommended models for delivering collaborative data services over the Web. It will examine areas of overlap and similarity between the social sciences and the humanities, as well as identifying areas of difference and distinctiveness.
IPUMS International: Expanding Support for International Comparison and Data Access
Wendy Thomas (Minnesota Population Center)
Peter Clark (Minnesota Population Center)
IPUMS International contains an integrated set of census microdata samples from over 44 countries and 130 censuses. By providing detailed metadata and hierarchical harmonization at the variable level, IPUMS International supports international comparison of census-related data on a worldwide scale. A new cooperative project with the World Bank and Paris21 is completing the data/metadata loop with contributing countries by creating DDI documentation for the original data sets included in our collection and returning it to the country of origin. These DDI files can then be used with the IHSN Microdata Toolkit and locally held data files to provide new levels of access to microdata as well as to create new aggregate tables. We will discuss the challenges of integrating metadata across such varied data samples, and the techniques developed to manage these challenges.
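Variable-level harmonization of the kind described above amounts to recoding each country's source codes into one common classification. A minimal sketch, with invented codes and category labels (not actual IPUMS values), might look like this:

```python
# Sketch of variable-level harmonization in the spirit of IPUMS International.
# Country codes, variable names, and categories below are invented examples.

HARMONIZATION_MAP = {
    "country_a": {1: "single", 2: "married", 3: "widowed", 4: "divorced"},
    "country_b": {"S": "single", "M": "married", "W": "widowed", "D": "divorced"},
}

def harmonize(country, records, source_var="marst", target_var="marst_h"):
    """Recode a country-specific source variable into a harmonized variable."""
    mapping = HARMONIZATION_MAP[country]
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the caller's records are not mutated
        rec[target_var] = mapping.get(rec[source_var], "unknown")
        out.append(rec)
    return out

sample_a = [{"marst": 2}, {"marst": 4}]
sample_b = [{"marst": "S"}, {"marst": "X"}]  # "X" is an unmapped source code
print(harmonize("country_a", sample_a))
print(harmonize("country_b", sample_b))
```

The hard part in practice is not the recode itself but documenting, in metadata, how each source category maps to the harmonized one, which is what the DDI files returned to contributing countries capture.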
Emerging Trends in Data Curation: New Initiatives and collaborations in Africa
Kizito Kasozi (Uganda Bureau of Statistics)
The value of data in Africa has long been limited to the initial analysis and report publication. Data curation remained a concept with no known practice until around the 2000s. Since the introduction of the Multimedia Data Management Toolkit developed by the International Household Survey Network (IHSN), Africa has embraced the practice of data preservation. A number of countries have adopted the toolkit as the principal method of data documentation and archiving, and the number of countries using or planning to use the toolkit increases by the day. This very positive development, however, has its own challenges and thereby calls for collaborative support from other development partners. The African Development Bank and the United Nations Economic Commission for Africa are among the few who have joined hands to further this cause. The purpose of the paper will be to highlight the new initiatives and re-echo the need to develop regional capacities to support data curation through collegial networks.
2010-06-02: B2: Connecting the Dots: New Tools for Research
VIVO: Enabling National Networking of Scientists
Jon Corson-Rikert (Cornell University)
Ellen Cramer (Cornell University)
VIVO is a semantic web application developed by the Cornell University Library in 2003 to meet individual and institutional research discovery needs. Whereas initially Cornell depended solely on manual curation for the accumulation of content, much of the information is now automatically ingested from local data resources. VIVO stores data as distributed sets of Resource Description Framework (RDF) statements using concepts and properties from standard ontologies. Employing Linked Data principles (http://linkeddata.org), globally unique identifiers (URIs) for the national network’s resources are directly dereferenceable on the Web, allowing access to the RDF data using standard HTTP requests while presenting human users with a standard HTML representation through any Web browser. Resources at one distributed VIVO node can directly link to resources at any other VIVO node or to other similarly published resources on the Web, allowing automated clients to crawl, analyze, and re-represent the graph of data. Information from local VIVO systems will be aggregated into distributed indexing nodes for full-text or RDF queries using the SPARQL language, for network analysis, and for visualization. VIVO has been independently deployed at multiple universities in the U.S., and in Australia and China. The NIH VIVO project will address scalability through multiple independently administered but coordinated installations sharing a common, extensible ontology and supporting direct cross-linking as described above. VIVO can therefore provide a customized and extensible presence at the diverse participating institutions and provide convincing and varied models for propagation under full local institutional control in the national context. The ability to create additional content models and to integrate with existing data resources is also supported.
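The Linked Data pattern described above means that one URI answers both machines and people: an HTTP request with an RDF media type in its Accept header gets triples back, while a browser gets HTML. A toy sketch of that content negotiation (invented URIs and triples, not the actual VIVO implementation or ontology) could look like:

```python
# Toy in-memory "graph": subject URI -> list of (predicate, object) pairs.
TRIPLES = {
    "http://example.org/individual/n123": [
        ("rdf:type", "foaf:Person"),
        ("rdfs:label", "Jane Researcher"),
    ],
}

def dereference(uri, accept="text/html"):
    """Return an RDF or HTML representation depending on the Accept header."""
    triples = TRIPLES.get(uri, [])
    if "application/rdf+xml" in accept or "text/turtle" in accept:
        # Machine client: serialize the triples (Turtle-like, simplified).
        return "\n".join(f"<{uri}> {p} {o} ." for p, o in triples)
    # Human client: render the same statements as an HTML page.
    rows = "".join(f"<li>{p}: {o}</li>" for p, o in triples)
    return f"<html><body><ul>{rows}</ul></body></html>"

print(dereference("http://example.org/individual/n123", accept="text/turtle"))
```

Because every resource is addressed this way, a crawler at one VIVO node can follow links into another node's graph exactly as a browser follows hyperlinks.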
Discovering and Harmonizing Questionnaire Data Using Social Networks
Dan Smith (Algenta Technologies)
Question banks and metadata repositories have been a traditional resource used in the design of new questionnaires and a source for questions and variables used for secondary analysis. Meshing social networks with traditional repositories and new metadata standards presents fresh ways for researchers to collaborate. This presentation will examine how social networks can be used to find, rank, annotate, and harmonize items stored in repositories, using Colectica’s repository as an example. It will also show how researchers in similar areas can find and interact with each other through their use of common data and metadata items.
Michael Bieber (New Jersey Institute of Technology)
This research presents IntegraL, which provides a new means of connecting digital content, users’ search needs, and scientists. IntegraL adopts a “lightweight” mechanism that connects data across content service providers and makes different forms of digital content interoperable. IntegraL facilitates a virtual restructuring of public web spaces and services, together with authenticated digital libraries, into broad “federated” digital library spaces constructed from numerous interrelationships. Elements reside within a rich context of meta-information that helps users understand and work with them. This provides a ripe environment for organizations and individual people to develop small, specialized collections and services, which automatically become part of the federated space and accessible to those who can benefit from them. IntegraL extends the boundaries of how we think about and interact with digital libraries. Our study indicates that IntegraL helps to enhance search productivity: users find IntegraL useful in locating relevant information, it helps them complete tasks more effectively, and they are confident of its ability to deliver satisfactory search results. IntegraL is mostly built on open-source software, which can be reliable, auditable and cost-effective. The broader impact of this study is that it provides recommendations to users for further deep search among wide ranges of virtual resources.
2010-06-02: B3: Beyond the Traditional Data Archive
Designing Data Services for the Institutional Repository
Ryan Womack (Rutgers University Libraries)
Ron Jantz (Rutgers University Libraries)
The capacity to archive research datasets and make them accessible is a role that an increasing number of institutional repositories in universities are taking on. At Rutgers University, the capacity to handle data is being added to RUCore, the Rutgers University Community Repository. RUCore already supports scholarly papers, dissertations, images, sound, and video. This presentation discusses the work of the RUCore Data Working Group in setting format and metadata standards for datasets, designing an architecture appropriate to data in our Fedora repository, and creating a web interface that makes the datasets and related RUCore collections discoverable. Relevant comparisons to peer institutions and digital curation practices are discussed. The data service is being prototyped through our work with faculty in the School of Engineering and the School of Communication and Information. This faculty research data, along with examples of datasets acquired by the Libraries that need preservation, provide the initial testbed for the RUCore Data Service. Development will continue on a complete curation lifecycle for datasets, from initial deposit to versioning and revision.
New Dialogues with the Research Community: On the Collection, Management and Dissemination of Interviews
Marion Wittenberg (Data Archiving and Networked Services)
Increasingly DANS is involved in the curation of a specific kind of qualitative dataset, namely interviews. This paper reports on a number of activities carried out by DANS to collect, manage and disseminate interviews. The first project in which we are engaged is ‘Getuigen Verhalen’ (Telling Witnesses). In this project hundreds of Dutch eyewitnesses to the Second World War will be interviewed. The core collection of interviews will be archived and unlocked by DANS. The project addresses issues of privacy protection, sustainable storage of video and audio, documentation of interviews, and enrichment of data. The second project, ‘Veteran Tapes VP,’ encourages re-use of qualitative data and experiments with publication of research results. The project aims at the realisation of an ‘enriched publication’ based on a research corpus of transcribed interviews of Dutch veterans. In this project we work together with a group of scholars from a number of scientific disciplines and universities. The researchers implement the concept of an enriched publication using protocols and tools developed in the project. New projects will focus on advanced disclosure techniques for interviews, in particular speech processing. The paper reports on the results of these projects: the achievements, the problems, and future plans.
Economists Online – Open Access to Economics Research and Data
David Puplett (London School of Economics)
Economists Online (http://www.economistsonline.org) is a portal that contains research outputs from Europe’s top Economics institutions. One of the unique features of this portal is that datasets created as part of the research process are also included, linked directly to the publications they are associated with. Economists Online now features content from around the world, but retains a strong European focus. The NEREUS consortium (http://www.nereus4economics.info) maintains the portal and is a strong advocate of Open Access. The portal represents the type of open international infrastructure that researchers increasingly call for. By including datasets, Economists Online has made many research datasets openly available for the first time. The datasets themselves are stored in a specialist data repository system, Harvard’s Dataverse. Economists Online harvests metadata from the Dataverse and displays information about each dataset in the portal alongside its related publication. The Dataverse uses DDI metadata standards to describe each dataset, and each dataset is grouped by institution. This presentation will explain some of the technical hurdles faced, such as selecting the best infrastructure to use and agreeing on standard metadata across participating institutions, as well as the challenges faced in gaining author permission to make their research data open access as part of Economists Online.
2010-06-02: B4: DDI 3 Tools: Possibilities for Implementers (continued)
Prototype of a Metadata Editor and further DDI services developed in the German National Educational Panel
Ingo Barkow (DIPF - German Institute for International Educational Research)
The National Educational Panel Study (NEPS) - financed by the German Federal Ministry of Education and Research - is an educational longitudinal study with a planned running time of decades and a very complex design. The database programming team from DIPF, the German Institute for International Educational Research, is currently developing a Metadata Editor for this project which will in the long run feature a DDI3-compliant import and export web service connected to a SQL Server database. This session will show this editor as work in progress as well as introduce some planned extensions to the Data Warehouse structure in NEPS, such as a connector to an XML-based CAPI instrument to be released in 2011. The Metadata Editor as well as all other tools developed in NEPS will be released as open source products.
QDDS - Questionnaire Development Documentation System, current version and future plans
Andias Wira-Alam (GESIS - Leibniz Institute for the Social Sciences)
QDDS was conceived to design questionnaires independently of a survey system and to document the development process. To this end, a versioning concept was implemented with distinct comments on individual changes. The initial implementation was based on DDI 2.1 plus extensions. With DDI 3, the entire versioning data can be stored inside the standard. This talk will give an overview of the existing system and the plans for DDI 3 migration.
DDI3 Uniform Resource Names: locating and providing the related DDI3 objects
Joachim Wackerow (Leibniz Institute for the Social Sciences)
DDI3 Uniform Resource Names (URNs) are well suited as persistent identifiers for DDI3 objects. But an application needs not just a unique name (URN) but also the location (URL) of the DDI3 object identified by the URN. General thoughts on resolving URNs to URLs will be discussed. A distributed resolution system based on the Domain Name System (DNS) will be proposed. Structural details of a web service for providing DDI3 objects will be described.
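The resolution step can be pictured as a lookup from the agency part of a URN to that agency's object service. The sketch below is a deliberately simplified illustration, not the DDI URN specification or the proposed DNS mechanism: the URN layout, the resolver table, and all hostnames are invented:

```python
# Hypothetical registry: agency identifier -> base URL of its DDI object
# service. In the DNS-based proposal, this lookup would be delegated to DNS.
RESOLVERS = {
    "dk.dda": "https://dda.example.dk/ddi/objects",
    "us.icpsr": "https://icpsr.example.org/ddi/objects",
}

def resolve(urn):
    """Map a simplified 'urn:ddi:<agency>:<object-id>:<version>' to a URL."""
    scheme, ns, agency, obj_id, version = urn.split(":")
    if (scheme, ns) != ("urn", "ddi"):
        raise ValueError(f"not a DDI URN: {urn}")
    base = RESOLVERS[agency]
    return f"{base}/{obj_id}/{version}"

print(resolve("urn:ddi:dk.dda:StudyUnit-1234:2"))
```

The point of routing resolution through a registry (or DNS) rather than baking URLs into documents is that an agency can move its object service without invalidating any existing URN references.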
SDA has for years been a useful means of providing basic analytic results quickly and easily for datasets stored in data archives. Recent enhancements are designed to allow SDA Web archives to provide more complete analytic services, especially for groups to collaborate in analysis. The main items to be discussed are the following: private workspaces for analysts to save their created variables and to protect them from deletion or modification by other users of the data archive, a feature that will facilitate collaboration across sites on a project; standard errors and confidence intervals for complex samples, covering all analysis procedures; and charts for means and for regression diagnostics (in addition to charts for tables). This presentation will illustrate how these new features can enhance the range of analysis services provided by data archives for their users and especially how archives can facilitate multi-site and cross-national research.
Developing an Interactive Survey Question Bank: Early Lessons Learned
Jack Kneeshaw (UK Data Archive)
A key theme for the UK's Survey Question Bank (SQB) (www.surveynet.ac.uk/sqb) - the successor to the former Question Bank (Qb) run from the University of Surrey - is the push to build an online community of users. The SQB is delivering a range of strategies across the 'traditional-through-innovative' spectrum to get users involved in the service. Traditional features include the 'top-down' delivery of news via mailing lists and newsfeeds. The service has also used Twitter (www.twitter.com/surveynetacuk) and Methodspace (www.methodspace.com/groups/surveyresourcesnetwork) to encourage greater 'bottom-up' activity. Most innovatively, the service will soon introduce an interactive element to its forthcoming survey question database, allowing users to comment on survey questions. Later in 2010, the SQB will be investigating how it might house a 'grey literature' repository so that users might find a home for their previously unpublished resources/tools, papers, presentations etc. This presentation will describe these strategies and review early successes and failures.
2010-06-02: C2: Data Sharing: An Important Step in Scientific Method
Copyright and Facts: Issues in Licensing and Redistribution for Social Science Data Professionals
San Cannon (Federal Reserve Board)
While data and statistics have always been the backbone of empirical research, they are now important intellectual property even outside the halls of academia. Everything from the global financial system to shipping gifts for the holidays depends on data: in today’s digital world, information and data are crucial commodities. But often there are strings attached to the access, usage and reporting of data, even if the data are “free.” Researchers may compile a dataset, but what rights do they have for the use of those data? This paper will outline some of the issues and considerations of which data professionals in the social sciences need to be aware. Copyright, licensing, redistribution and intellectual property rights are now important issues that data users, and those who support them, need to understand early on in the research process.
Reproducibility of Computational Results: Opening Code and Data
Victoria Stodden (Yale Law School)
Scientific computation is emerging as absolutely central to the scientific method, but the prevalence of very relaxed practices is leading to a credibility crisis. Reproducible computational research, in which all details of computations — code and data — are made conveniently available to others, is a necessary response to this crisis. Questions emerge regarding scientists' incentives and motivations to share. This talk presents results from a survey of computational scientists to determine the factors that facilitate code and data sharing and those that create barriers. One major result finds that sharing is done for reasons other than direct personal gain, but when scientists choose not to reveal data or code this is due to perceived personal impact. A second major finding is the prominence of Intellectual Property concerns with regard to not sharing code and data. Solutions to the various barriers are discussed, including how the "Reproducible Research Standard" (Stodden 2008), which proposes a licensing structure consonant with scientific norms, can thus encourage open sharing in scientific research.
Barriers to Data Sharing: New Evidence from a US Survey
Amy Pienta (ICPSR)
George Alter (ICPSR)
Jared Lyle (ICPSR)
Recent studies demonstrate that the majority of social science data is not preserved or shared through social science data archives and other formal archival arrangements. This motivates further investigation about the various ways researchers share their data (including more “informal” data sharing) and the factors that underlie their data sharing behavior. We developed a survey to collect information from principal investigators (PIs) of federally funded research grants in the US about their experiences with data sharing (n=1,021). We also collected information about various factors that might be related to data sharing behavior including: normative data sharing practices in their discipline, perceived barriers to data sharing, rank/tenure, institutional type, gender and so on. We find that while only 12% of the PIs have archived their data, 45% have shared their data outside the immediate research team. Being in a discipline that favors data sharing is positively associated with the likelihood that a PI shares his or her research data. Perceived barriers to data sharing reduce the likelihood one shares data. Other factors associated with data sharing include rank/tenure status and duration of the grant. Implications for data archives are also discussed.
On the Lam or in Collaboration - Increasing Competence in Long-Term Preservation
Tuomas J. Alaterä (Finnish Social Science Data Archive (FSD))
Recently the Finnish Social Science Data Archive (FSD) has taken part in a pilot project by the National Digital Library which intends to develop organisations’ readiness to "make the transition into an electronic operating environment". The project conducted a series of surveys in late 2009 measuring the organisations’ data management quality and maturity. At FSD we would like to believe that we already operate electronically. This presentation summarises the findings of the report. The Digital Library project aims to create a centralised long-term preservation solution for digital cultural objects and, later, for research data too. The project participants vary from large national institutions to regional museums and private archives. Finding a single solution that would ensure long-term preservation and dissemination will be a challenge. Sharing competence, building partnerships and monitoring international development are crucial for success. This presentation seeks to answer the following questions: How does a national data archive fare when compared to libraries, traditional archives and museums ("LAMs")? What could be the mutual benefits of actively working with the LAMs in the field of long-term preservation? The metadata used in and by the LAMs differ notably from research data documentation: how should metadata formats other than DDI be approached?
What happens when an organization that has been collecting data closes shop? Has their data been archived? Has it been disseminated to the academic community? Has all the metadata been kept with its associated data? This talk illustrates a case study of a data rescue in a Data Centre. We were informed that there was a series of datasets that needed a home because the government funding for this organization was withdrawn. The talk will include: the background of the Centre for Research and Information on Canada (CRIC); how the rescue mission came about; the rescue mission itself; and the importance of data rescue. Then the question of whose responsibility it is to keep data alive will be discussed.
Since its establishment in 2005, Data Archiving and Networked Services (DANS) has been storing and making research data in the arts and humanities and social sciences permanently accessible. To this end DANS itself develops permanent archiving services, stimulates others to follow suit, and works closely with data managers to ensure as much data as possible is made freely available for use in scientific research. Economic sustainability is a key issue for DANS. For this reason DANS initiated a research project in 2007 to gain a better understanding of the costs of digital archiving. DANS adopted and adjusted two managerial models, the Balanced Scorecard (BSC) and the Activity Based Costing (ABC) model. The design of the ABC model is based on DANS activities, which are categorized in five clusters: Administration, Networked Services, ICT (R&D and maintenance), Data Acquisition and Archiving. The Balanced Scorecard of DANS builds on the following business perspectives: Impact, Enablers/Users, Processes and Supporters. Fifteen success factors are allocated to them. The success factors are further specified with over forty performance indicators. During this presentation we will explain how the data generated by the models can be combined and used as decision-making instruments regarding project-specific costs, funding-related decisions, and predicting expenses.
2010-06-02: C4: New Directions in Qualitative Data Access
The Orwellian Data Processing and Provision System of the Historical Archives of the State Security Services
Zoltán Lux (Historical Archives of the Hungarian State Security and LG Co)
The Historical Archives of the State Security Services (www.abtl.hu) has the task of preserving the pre-1989 documentary state-security archives, surveying their content, and making them available to citizens and researchers under strict conditions. The presentation provides, on the one hand, a broad outline of the Orwellian information system of the Archives, which guards all changes and access to the collection, and of the methods used to support access to this large mass of documents (a special metadata structure, large-scale use of OCR, and examination of the introduction of text-exploration tools). On the other hand, it seeks to show how, alongside strong access restrictions, the Archives can make ever more documents and services compiled from the database (e.g. the Archontology at https://www.abtl.hu/archontologia and the photographic database at https://www.abtl.hu/spyOne/anonymous) fully publicly accessible on the Internet, or accessible by special privilege (e.g. to researchers). The database of the Historical Archives is also interesting from an IT point of view, because mass digitization and the beginning of OCR use will bring an increase in volume of 2–3 TB a year.
Qualitative Data in DDA – Coping with New Formats
Anne Sofie Kjeldgaard (Danish Data Archive)
One of the areas where data archives experience emerging new and challenging data formats is qualitative research. New social technologies such as Facebook, debates on the internet, and photography on mobile phones produce data for qualitative research projects. The Danish Data Archive (DDA) has a long-standing ambition to archive qualitative data alongside quantitative data. To welcome qualitative data we need knowledge about variations in data formats which we must be able to support and archive. The article will be based on a study of data formats and the use of CAQDAS – Computer Assisted Qualitative Data Analysis Software – in Danish, empirically based social science PhD theses published in 2009. The study will be related to comparable studies carried out in national and international contexts, as well as initiatives concerning the archiving of qualitative data among our fellow data archives.
The RACcER Project: A Data Partnership Between the Irish Qualitative Data Archive (IQDA) and A Major Community Based Childhood Intervention Strategy (Tallaght West CDI)
Jane Gray (Irish Qualitative Data Archive)
Aileen O'Carroll (Irish Qualitative Data Archive)
Tara Murphy (Irish Qualitative Data Archive)
This paper will describe the ongoing work of an innovative partnership between the Irish Qualitative Data Archive and the Tallaght West Childhood Development Initiative (based in west Dublin). RACcER (Re-Use and Archiving of Complex Community-Based Evaluation Research) aims to explore and implement new approaches to meeting the ethical and practical challenges involved in archiving, and creating appropriate levels of access to the complex qualitative and contextual data generated in the rigorous evaluation of a major community-based childhood intervention strategy. The project objectives include: documenting the concerns and expectations of research funding agencies, researchers and potential users; evaluating and enhancing IQDA protocols and procedures, especially in relation to evaluation research, through a participatory process of preparing qualitative data from CDI for archiving; dissemination of the outcomes of the project through the CDI and IQDA websites. It is intended that the partnership will act as a major demonstrator project for the promotion of qualitative data archiving and re-use across the Irish social science communities. The project has been co-funded by the Irish Research Council for the Humanities and Social Sciences (IRCHSS) and Tallaght West CDI.
The Afrobarometer at Ten: Building a Network of Survey Research in Africa
Boniface Dulani (Michigan State University and University of Malawi)
For ten years, the Afrobarometer has undertaken a comparative series of national public opinion surveys that measure public attitudes toward democracy, governance, the economy and market reform, leadership, identity and other issues in Africa. From its inception, the Afrobarometer project has worked with a network of partners, based in Africa and the United States, to generate scientifically reliable data on public opinion in Africa while at the same time aiming to strengthen institutional capacity for survey research in Africa. Additionally, the Afrobarometer has sought to widely disseminate and apply its results among various stakeholders, both inside and outside the continent. This presentation will offer an overview of the Afrobarometer project over the preceding ten years, identifying some of the key achievements made in realizing its objectives as well as some of the challenges encountered in carrying out survey research in Africa.
Repositories and Cloud Services for Data Cyberinfrastructure
Sandy Payette (DuraSpace)
An historical look at the emergence of infrastructure – electric grids, railways, and the Internet – reveals that a key developmental stage has been reached when formerly incompatible or stand-alone systems are interconnected via adapters and gateways. We see this in the digital repository domain, where the notion of a digital repository began as a predominantly institutional phenomenon, or in certain cases a disciplinary phenomenon, but is now evolving to a point where repositories are becoming integrated components of larger systems and distributed infrastructure. At the same time, we see the emergence of cloud technologies that offer large-scale storage and compute capabilities, but with open questions on how well the cloud will meet the requirements of digital preservation and data archiving. In this session, I will discuss how the DuraSpace not-for-profit organization is working to evolve open source repositories and cloud services to serve as core components within emerging data cyberinfrastructure. In collaboration with the Fedora and DSpace communities, we are positioning repositories and cloud services to become part of a distributed data curation and archiving fabric that encompasses both institutional systems and the Web. I will also provide highlights from the DuraSpace partnership with the Data Conservancy, an NSF-funded Datanet project.
2010-06-03: D1: Automated Curation Tools and Services for Metadata
Designing Flexible Workflow for Upstream Participation of the Scientific Data Community
Robert Downs (Columbia University)
Robert Chen (Columbia University)
Providing sustainable access to scientific data and research-related information can spawn new opportunities for their use by current and future scientific, educational, and decision-making communities. Submitting scientific data for ingest into a digital data repository prior to the end of a research, data collection, or data creation project facilitates collaborative preparation of data and metadata while project resources are still available. Engaging data producers earlier in the scientific data lifecycle should improve preparation of scientific data and metadata for future use and dissemination. We describe here the design of a data submission and workflow system aimed at providing flexible capabilities for organizing the contributions of interdisciplinary producers of scientific data, reviewers, and archivists. The system provides a framework for iterative improvement in the range and quality of information obtained from data sources as data products are developed and finalized and for integration of this workflow with selection and appraisal processes by community reviewers and digital archivists, taking into account both discovery and preservation metadata needs.
A Data Curation Application Using DDI: The DAMES Data Curation Tool for Organising Specialist Social Science Data Resources
Simon Jones (University of Stirling)
Guy Warner (University of Stirling)
Paul Lambert (University of Stirling)
Jesse Blum (University of Stirling)
This paper will present the public access metadata tools and services developed by the UK's 'Data Management through e-Social Science' research Node (www.dames.org.uk). Requirements of the Node are to curate and distribute data resources linked with a number of specialist topics (data on occupations; educational qualifications; ethnicity and immigration; social care; and e-Health data). Heterogeneous data formats and structures can be identified. Most resources are generated through academic research, and are typically freely distributed, but lacking in firm standards of preparation. Services are required to collect, process, and subsequently distribute these data resources for the benefit of social science research. Metadata tools and services have been prepared to generate metadata on the relevant resources. This occurs principally in the form of an online 'data curation tool' which is intended to be accessible to non-specialists. Metadata collected through the tool is stored in DDI format (DDI 3), through which it can subsequently be exploited for further data analysis and organisational tasks, including facilitating the further distribution of the data. This paper will describe the currently available tools, and ongoing issues in designing and employing these metadata-oriented tools and services.
Building the Infrastructure for Enhanced Publications Using DDI 3
Alerk Amin (CentERdata)
Rob Grim (University of Tilburg)
Maarten Hoogerwerf (DANS)
There is a lot of discussion about the advantages of DDI 3, but many people wonder if it is worth all the complexity. Moving to DDI 3 involves complex structures, tools, workflows, infrastructure and expensive documentation efforts. The DatapluS project demonstrates the added value of these efforts by providing a concrete use case where publications are enriched with variable-level information about the research process. The project consists of two tools for researchers, the Enhanced Publications Editor (EPE) and the Subject Portal, and the required infrastructure to integrate these tools with the data from the archives. The EPE enables authors to link their publications to datasets and variables in the archives, along with additional metadata describing how they conducted their research. The Subject Portal publishes the resulting information and uses it to provide an enhanced search interface for researchers. The strength of DatapluS is the collaboration between data archives, libraries, and research organizations. By aligning their existing responsibilities with those needed for the project, they can now serve researchers using an efficient and sustainable infrastructure. These lessons learned show that strategic collaboration can efficiently provide advanced services and benefits for researchers using DDI 3 infrastructure.
Implementing DDI 3.0: a Case Study of the German Microcensus
Andias Wira-Alam (GESIS – Leibniz Institute for the Social Sciences)
Oliver Hopt (GESIS – Leibniz Institute for the Social Sciences)
This paper shares our experience in developing application software for the metadata of the German Microcensus at the variable level. First, we develop an editor compliant with the DDI 3.0 standard as documentation software, which improves and simplifies the process of data documentation. Second, we develop a web information system to present various views of the metadata to end users. The scope of our work covers the development cycle of application software based on the DDI 3.0 standard. Further technical details are also presented in the paper.
Richard Wiseman (University of Manchester, Mimas)
Susan Noble (University of Manchester)
Celia Russell (University of Manchester)
One of the key features of the global financial crisis was its unexpected nature. This was a paradox, as extensive data are collected from around the world with the primary aim of maintaining global financial stability. This failing of our current socioeconomic data framework, combined with a more widespread dissatisfaction with the present state of statistical information on economies and societies, prompted French president Nicolas Sarkozy to set up a commission designed to identify the limits of GDP as an indicator of economic performance. In this talk, we discuss the outcomes and likely impact of the commission, which reported in 2009. We also consider the production of more relevant indicators of social progress, assess the feasibility of alternative measurement tools and discuss the likelihood of their adoption. International data are also widely used by the research community, and we present here the results of a preliminary analysis of the ESDS International web server logs showing the countries and indicators preferentially chosen by academics. These results tell us about the kind of data researchers choose to support their work and the future directions we should consider as providers and facilitators of cross-national data.
Using Administrative Data for Social Science Research: Promise and Peril
Fredric Gey (University of California, Berkeley)
Administrative records from ongoing social and tax programs can provide a rich source for specialized research in the social sciences. Welfare, employment, income tax, support programs for pregnant and parenting teens, and scholarship applications, among many others, can be mined for insights into the social and economic behavior of specialized sub-populations of the general population. This source of information comes at a price: diligence in understanding the statistical universe you are utilizing, dealing with dirty data (duplicate records, missing data, confusing data), protecting the privacy of individuals, and understanding the peculiar structure and operations of particular data processing systems within operational social programs. This presentation will draw on UC DATA’s experience of more than a decade of social science research using administrative data. It is hoped that our experience can enlighten and prepare others for the challenges and rewards of preparing and repurposing administrative records for research uses for which they were not originally designed. Administrative data will be compared (in terms of utility, accuracy and completeness) with rigorously designed social survey data.
But it's not the same thing! Using National Labour Data in Cross-National Comparative Studies on Precarious Employment
Walter Giesbrecht (York University)
Attempting to compare labour data from different countries or regions can be fraught with danger, since definitions of concepts can vary tremendously, and gaps exist in the data collected. Apparently similar concepts such as part-time work, or permanent vs. temporary employment, are often not strictly comparable using published aggregate data. Surveys designed to gather data for national policy reasons do not automatically generate data that are comparable for use in cross-national studies. Harmonizing these concepts and definitions involves considering their deployment in national contexts in order to assure that cross-national comparisons are truly comparing likes with likes. The Comparative Perspectives Database (still under development) is a project that is attempting to generate comparative multi-dimensional data tables on aspects of precarious employment, using the microdata from a total of seven surveys spanning thirty countries (Canada, United States, European Union (EU-27) and Australia). I will discuss in detail some of the problems we encountered in producing a codebook that would allow us to produce useful cross-national data tables, as well as describe other similar projects conducted elsewhere.
Wikiprogress is a statistical wiki on progress, launched in beta at the OECD World Forum in Busan, Korea, in October 2009. Wikiprogress is a global platform for sharing, measuring and evaluating societal, economic and environmental progress. It is the main place where initiatives on the measurement of progress from around the world are shared, used to raise awareness among stakeholders, to inform them about key economic, social and environmental trends, and to allow them to discuss relevant issues based on solid evidence and statistics across an interesting and wide range of topics. It is the best place to find answers to questions such as: Who is developing initiatives on measuring progress (well-being, quality of life, freedom, etc.)? What taxonomy does each initiative use? Which indicators are being used to measure the different dimensions of progress? How are countries (or regions and communities) achieving progress over time and in comparison with other similar territories? Finally, it is the focal point where both experts and practitioners share practices on indicator design, calculation and dissemination, and where stakeholders interested in developing initiatives in this field can find reference documents and assistance on how to establish measuring-progress initiatives.
2010-06-03: D3: Virtual Research Environments: Tools for Presenting and Storing Data
Data Warehouses and Business Intelligence for Social Sciences. Aims and Possibilities of a Data Warehouse within the National Educational Panel Study (NEPS) in Germany
David Schiller (University of Bamberg, National Educational Panel Study (NEPS))
The National Educational Panel Study NEPS (which is supported by the German Federal Ministry of Education and Research) was established in 2008. Based on a multicohort sequence design, 60,000 target persons within six starting cohorts will be followed through their life course. The main aims of the NEPS are to increase knowledge about the development of competencies, the impact of learning environments, social inequality and educational decisions, the educational attainment of persons with a migration background, and the returns to education. The NEPS data will be stored in a data warehouse. A data warehouse and the tools of Business Intelligence provide a higher level of flexibility than ordinary file servers or conventional transactional databases. Some of these additional opportunities are: easier access to the data; data can be offered in different dimensions for various statistical packages (such as Stata); data from different sources can be matched; users can build their own private use files (e.g., by selecting variables depending on personal needs via a shopping basket); results and newly computed variables can easily be stored within the data warehouse for further use by other users; the statistical power of the data warehouse can be used without additional statistical software; and even new solutions for disclosure control and anonymization can be integrated.
A User-Driven and Flexible Procedure for Data Linking
Cees van der Eijk (University of Nottingham)
Eliyahu V. Sapir (University of Nottingham)
The PIREDEU program develops a large-scale pilot infrastructure for research into electoral democracy in the European Union. This program comprises, for each of the 27 EU member states, five data components: voter studies, candidate studies, content analyses of mass media, content analyses of party manifestos, and contextual data. The infrastructure builds user-friendly tools for integrating these very different kinds of data into data-sets tailored to individual researchers’ needs. After a brief description of the PIREDEU program, this paper discusses: preconditions for successful data integration (including ex-ante harmonization and pre-linking); a brief overview of the multi-dimensional conceptual space defining different forms of linking the various data components; the strategy designed for linking data without restraining users to a limited number of pre-defined possibilities, allowing analysts to specify the kind of linking that optimally suits their research needs, and the on-line implementation of this strategy; the effects of the on-line linking tools on the cohesion of networks of researchers, effective use of data, quality of data and the need for new data collection; and the conditions under which the approach to data linking developed for PIREDEU can be extended to other data holdings (without ex-ante harmonization and pre-linking).
Virtual Center for Collaborative Research (ViCtoR)
Pascal Heus (Metadata Technology)
The purpose of the Virtual Center for Collaborative Research (ViCtoR) project is to support the development of a DDI metadata driven web based platform that provides researchers with a flexible and dynamic environment for the purposes of discovering and analyzing microdata, fostering collaboration, and facilitating the preservation of research knowledge. The platform will consist of a web-based environment that will offer users tools to manage virtual research projects (derive customized data files for a particular purpose; share files and knowledge with team members; package research outputs as enhanced publications for preservation and dissemination) and to facilitate social networking. The facility will also include tools to manage research outputs and knowledge (data, document or script libraries; researcher directories; Wiki and other collaborative tools such as chat rooms, discussion forums, news, event calendar, etc.) The project is in its pilot phase and we will present the initial version of the environment focusing on metadata exploration, dataset customization and retrieval, research project management, and knowledge capture. This work is being supported by the NORC Data Enclave (http://www.norc.org/DataEnclave) and implemented by Metadata Technology North America.
InFuse: Data Feeds for the UK 2001 and 2011 Censuses and Beyond
Justin Hayes (The University of Manchester)
The InFuse Project builds on the successes of the recent CAIRD Project, which demonstrated the feasibility, and some of the potential benefits, of applying the emerging open SDMX metadata/transfer standard in combination with a web service to the large and complex aggregate outputs from the UK 2001 Census, creating a data feed that provides comprehensive and flexible access to data and metadata. The primary objective of the project is to develop a dissemination application based on a complete UK 2001 Census data feed that will provide an operational service to the Census Dissemination Unit’s user base across UK academia by September 2010. The service will then be developed to incorporate information from the UK 1971, 1981, 1991 and 2011 censuses, as well as other, non-census datasets. Further research will create new metadata on geographic and definitional comparability to enable integrated use of these datasets. This paper will describe the methods used in the InFuse Project, the outputs to date, some potential benefits and impacts, and the parallel partnership work between the Census Dissemination Unit and the UK Office for National Statistics in developing data feed dissemination from source for the aggregate outputs from the UK 2011 Census.
2010-06-03: D4: Restricted Data Access: Principles and Standards
Survey on Access to African Government Microdata for Social Science Research
Lynn Woolfrey (University of Cape Town)
Governments mandate their National Statistics Offices to collect empirical data to determine appropriate policies. Re-use of this data for research can provide input regarding the effectiveness of government action. In Western Europe and North America policies and institutions support the efficient collection and sharing of official data for research purposes. In Africa the sharing of government microdata is constrained by several obstacles. African National Statistics Offices have limited resources to curate microdata and ensure its long-term availability. Consequently many African data producers do not follow international best practice with regard to survey data management or share the microdata from the surveys they conduct. This was confirmed by a survey conducted in order to investigate the availability of survey microdata from African National Statistics Offices for research. A further obstacle to access to government microdata in Africa is inadequate producer-user communication channels. Concerns around the confidentiality of respondent information also present a barrier to data usage for research, as does the bureaucratic nature of government institutions involved in data production in African countries. Access to official microdata for research requires sound data usage policies driven by African decision-makers who appreciate the role of information utilisation in national development.
Developing a Statistical Disclosure Standard for Europe
Tanvi Desai (London School of Economics)
The European Union has long faced the challenge of sharing data across borders for effective cross-national research. One of the key issues is the harmonisation of standards and protocols. The ESSNet project, funded by Eurostat, has developed a protocol for statistical disclosure that can be implemented by all European member states. This presentation will outline some of the challenges faced and the standard developed, and will look at how the standard might be used to change the European data infrastructure in the future.
Settings, Practices and Data Access: Results of a Survey of UK Social Scientists
Jo Wathan (University of Manchester)
Where access to data is controlled by a data service or depositor, use may be restricted to individuals who are able to adhere to certain conditions. These conditions may require the applicant to store, use or limit sharing of the data in particular ways. In the UK such conditions have become more commonplace with the growth of special licences and secure settings for government microdata. In order to assess the potential impact of these conditions on the usability of data, a survey of a representative sample of UK social scientists in ten disciplines was conducted during the autumn of 2009 to better understand their working environments and practices. A 61% response rate was achieved, resulting in over six hundred completed questionnaires. The survey covered a range of questions, including access to computing facilities, data and printout storage, home working, data transportation, awareness of institutional policies on personal data, and attitudes to a range of access conditions.
International Access to Restricted Data - A Principles-Based Standards Approach
Felix Ritchie (UK Office for National Statistics)
Access to restricted microdata for research is increasingly part of the data dissemination strategy within countries, made possible by improvements in technology and changes in the risk-benefit perceptions of NSIs. For international data sharing, relatively little progress has been made. Recent developments in Germany, the Netherlands and the US are notable as exceptions. This paper argues that the situation is made more complex by the lack of a general coherent risk-assessment framework. Discussions about whether something should be done become sidetracked into discussions about how procedural issues would constrain implementation. International data sharing negotiations quickly become bilateral, often dataset-specific, and of limited general value. One way forward is to decouple implementation from principles. A principles-based risk-assessment framework could be designed to address the multiple-component data security models which are increasingly seen as best practice. Such a framework allows decisions about access to focus on legal-procedural issues; similarly, secure facilities could be developed to standards independent of dataset-specific negotiations. In an international context, proposals for classification systems are easier to agree than specific multilateral implementations. The paper concludes with examples from the UK and cross-European projects to show how such principles-based standards could work in practice.
This session focuses on a project with a remit to produce some evidence of how teachers use data resources, and the impact on student learning. There is concern about levels of data and statistical literacy skills of UK social science students, even though the cutting edge of social science is reliant on use of real world datasets, and there is a great desire to improve research-led teaching in the area. The project collates the experience of attempts to upskill students in data and its discipline-related usage, and provides an illustration of educational practice at both discipline and national level. The case studies showcase attempts to make learning and teaching about and with data a less passive experience. The UK national data centres provide access to a wealth of social science data - provided by national census agencies, and inter governmental organisations including the OECD, IMF, UN and World Bank - for undergraduate and postgraduate study. Students often avoid handling and discussing data in their study unless forced to confront it. The challenge for educators lies in promoting students' use of data, but the benefits in doing so improve both academic performance and job prospects for students.
Outreach to New Communities: The Census 2010 Project
Lisa Neidert (University of Michigan)
Data Support at the Population Studies Center receives money from the university provost to provide support for census data on the UM campus. With the advent of the 2010 Census, we are in the process of a many-pronged outreach effort related to it. The purpose is to reach communities that could, or should, be using census data to inform their stories, lives, and research. We also want to underscore the importance of participating in the 2010 Census. The first effort was a Census 2010 Boot Camp for Journalists, which provided 25 journalists with training to understand the importance of census data and how to analyze the data for local and national stories (http://mblog.lib.umich.edu/McCormick). In the second effort, we are sponsoring a YouTube-style ad contest in which students create short videos encouraging participation in the 2010 Census. The ‘hard-to-count’ areas in Ann Arbor are the student-dominated university neighborhoods. Finally, we will be teaching a one-credit course in Winter 2010 on “The United States Census.” Portions of the course will be presented in other classes throughout the academic term. The presentation will discuss the challenges and rewards of outreach to new communities.
Developing an Internet-based Data Service at SSJDA in Japan
Keiichi Sato (University of Tokyo, Institute of Social Science)
The Social Science Japan Data Archive (SSJDA) collects, maintains, and provides the academic community with access to a vast archive of social science data (quantitative data obtained from social surveys) for secondary analysis. As a unit within the Center for Social Research and Data Archives, Institute of Social Science, University of Tokyo, SSJDA aims to promote empirical research on Japan in the social sciences, and has been disseminating survey data since April 1998. The total number of available datasets was about 1,200 at the end of 2008. SSJDA plays the role of a major data provider for those who seek to analyze Japanese society using microdata. This presentation discusses recent efforts in developing an Internet-based system at SSJDA. SSJDA has an online search system powered by a full-text search engine. It has also developed an online data provision system, which began operating in April 2009. The English version of the same system will be made public early in 2010. In addition, SSJDA has just begun to consider seriously the adoption of DDI.
2010-06-03: E2: Connecting the IASSIST Community Across the Web: IASSIST Publications Committee and e-Community Infrastructure Action Group Discussion with Members
Connecting the IASSIST Community Across the Web: IASSIST Publications Committee and e-Community Infrastructure Action Group Discussion with Members
Walter Piovesan (University of Vancouver)
Bo Wandschneider (University of Guelph)
Harrison Dekker (University of California Berkeley)
Carol Perry (University of Guelph)
Amy West (University of Minnesota)
Jennifer Darragh (Johns Hopkins University)
The IASSIST website was created by a lone volunteer in 1999 and, through a massive volunteer effort by an expanded Publications Committee, was formed into the present site in 2002. Calls for modernization were heard in the conference hallways at least two years ago, and a migration to an open source web content management system was envisaged. Following some ups and downs and many Skype conference calls during 2009/10, a new ‘swat team’ emerged to build a new website to allow greater participation and interaction by members, along with a brand new look and feel. Meanwhile, the e-Community Infrastructure Action Group was formed in 2009, charged with proposing an umbrella of infrastructure for the IASSIST e-Community. IASSIST has long used virtual means to build its community. Mail lists, websites, blogs, online databases, virtual workspaces, and social networking are now a day-to-day part of the operation of the association. Clearly, new technologies afford us the possibility for synergy among our multifarious online presences in a way that can enrich our community and ease its administration. In the tradition of the Outreach and Strategy committees, we would like to share our work over the last year and open the floor for discussion.
2010-06-03: E3: Panel: Confidentiality and Access Concerns of the Social Sciences and Human Subjects Ethics Review Boards
Ethics Review in Finland
Arja Kuula (Finnish Social Science Data Archive)
In the United States and elsewhere, the need for ethics review was addressed in response to egregious ethical violations in medical and biological research. Research review in the social and behavioral sciences has slowly evolved from this medical background, but several contentious issues remain in the effort to best address the needs and risks of social science data collection, analysis, and archiving. This panel will be structured to bring together various stakeholders in the ongoing worldwide discussion of ethics review in the social sciences to discuss current needs and future directions in the United States and elsewhere. This session will be structured as a panel discussion and not as a series of presentations. Each panelist will be given a chance at the beginning to describe his or her viewpoints regarding the current state of regulation, data access and confidentiality in social science research. However, a moderated discussion centered around issues of data accessibility, confidentiality, regulatory development, and the roles of data archives, to name a few, will comprise the bulk of the session. The first panelist, Arja Kuula, will describe the standpoints and the scope of the Finnish ethics review system, which differs from the U.S. system. One of the goals of the Finnish system is to find a balance between confidentiality and the openness of science and research. In addition to informing participants about the ethical norms that relate to data archiving, Arja will describe how the Finnish Social Science Data Archive has been involved in making guidelines for ethics review at the University of Tampere. The second panelist, Robert Downs, will present on the perspective that the IRB takes on data management when reviewing research protocols for the protection of human subjects and discuss ways in which these issues can be addressed.
He will also discuss some challenges that the protection of human subjects presents to researchers and archives for the collection, management, and dissemination of data. The third panelist, Yasamin Miller, will speak on the current state of ethics review and how it should and will change in the future. Yasamin will also provide perspectives, concerns, ideas, etc. from the human subjects review board/IRB side of the table.
Scientific Data Management for the Protection of Human Subjects
Robert Downs (Columbia University)
The second panelist, Robert Downs, will present on the perspective that the IRB takes on data management when reviewing research protocols for the protection of human subjects and discuss ways in which these issues can be addressed. He will also discuss some challenges that the protection of human subjects presents to researchers and archives for the collection, management, and dissemination of data.
2010-06-03: E4: Secure Remote Access to Restricted Data
Secure Data Service: Improved Access to Disclosive Data
Reza Afkhami (UK Data Archive)
Melanie Wright (UK Data Archive)
The UKDA Secure Data Service (SDS) is a new service that provides controlled, restricted access to more detailed microdata files for approved researchers, subject to conditions of eligibility, purpose of use, security procedures, and other requirements associated with access to SDS data. Its operation is legally framed by the Statistics and Registration Service Act 2007. A key problem in SDS data confidentiality is balancing the legitimate requirements of data users against confidentiality protection. Employing security technologies used by the military and banking sectors, the SDS will allow trained researchers to remotely access data held securely on central SDS servers at the UK Data Archive. The aim of the service is to provide approved academics unprecedented access to valuable research data from their home institutions, with all of the necessary safeguards to ensure that data are held, accessed, and handled securely. The SDS follows a model which suggests that the safe use of data should cover the elements of safe project, safe people, safe setting, and safe output. To achieve this goal, data security depends on a matrix of factors: technical, legal, contractual, and educational.
The 2006 French Census: A New Collection, A New Dissemination. Which Place for a Data Archive?
Alexandre Kych (Centre Maurice Halbwachs (CMH))
The 1999 census was the last exhaustive census in France, which since 2006 has moved to a continuous population census. At that time, the French national statistical office (INSEE) profoundly renewed its website, and today it offers several collections of aggregated tables, at the municipality level, and more than 10 different microdata files. Anybody can now download these tables and files free of charge and without any commitment. Yet the online microdata files are not complete, due to confidentiality protection requirements, and the standard aggregated tables cannot answer all questions. The presentation will focus on the evolving role of the French Data Archives in this new context. The Centre Maurice Halbwachs (CMH), which within the Réseau Quetelet is responsible for providing researchers with access to government microdata, offers, in cooperation with INSEE, specific ways to obtain tabulations on request. It also provides the annual census survey microdata file. The new remote access facility built by INSEE in cooperation with the Data Archives will offer further possibilities. Finally, not surprisingly, researchers also continue to turn to the CMH for help and advice.
Imagining the Possibilities of Collaborative Spatial Learning: OCUL’s Geospatial Portal Project and Its Inspirations
Leanne Hindmarch (Ontario Council of University Libraries)
Jenny Marvin (University of Guelph)
OCUL’s Geospatial and Health Informatics Cyberinfrastructure Portal (“Geospatial Portal” for short) is a new project to create a data storage and discovery tool in Ontario, Canada, intended to improve access to geospatial and health data for Ontario researchers and students. One of the priorities of the project is to explore how new, collaborative, web-based technologies have influenced teaching and learning of spatial concepts and dissemination and use of spatial data. We then intend to incorporate these tools into the Geospatial Portal, thus encouraging its integration into fundamental classroom and research processes. In this presentation, we will provide an engaging and fun look at the latest online technologies for sharing, creating, and working with spatial data, with visual examples of projects featuring mashups, collaboration tools, and community-contributed data. We’ll offer a first look at our plans to use such technologies in the Geospatial Portal, to encourage engagement with spatial data literacy among the Ontario academic community.
The Census Dissemination Unit at the UK national data centre based at the University of Manchester has been delivering Census Aggregate Statistics over the web to academics for over 10 years. The tool used to deliver them, Casweb, was once innovative. Unfortunately, this is no longer the case. The tool 'does the job', but users (say they) are looking for more interactivity, improved ease of use, and better ways to combine data. Achieving this is not easy for a small team whose remit is to support tens of thousands of users, while also keeping up with changing user expectations and undertaking in-house development. And all this when the data delivery methods are changing too. Still, never ones to turn down a challenge, the team at Manchester has undertaken to update the data access system in line with user requirements. A separate paper discusses the underpinning technology (the InFuse project, based on SDMX and a data-feed approach). This paper/presentation shows how we are engaging directly with users - both face to face and using social software - to get them to tell us what they really really want.
Lynda Kellam (University of North Carolina at Greensboro)
Amy West (University of Minnesota)
Katharin Peter (University of North Carolina)
Social Networking ranges from the sublime to the del.icio.us. Effective social networking tools fit into one's work styles, further one's goals and support a playful approach to one's work. Join us as we share and skewer our favorite social networking toys for bookmarking, microblogging, promoting data services, promoting ourselves and keeping up to date on the data world. We will also look at how some of our favorite data producers are using and misusing social networking and web 2.0 tools.
Using Ethnography in the Library: How to Study the Students in their Native Data-Gathering Habits
Lois Stickell (University of North Carolina at Charlotte)
Most libraries have access to myriad information sources and data of all types. The librarians know how to access the data, but do the students? While webpage redesigns and other attempts to make user information more helpful often focus on usability studies, there is little research on how students make initial choices about where to go for data. The University of North Carolina at Charlotte has embarked on an ethnographically based study of student use of the library, which includes hiring an anthropologist to help librarians better understand student behavior. The study is in its early stages but is patterned after a study at the University of Rochester. My presentation will focus on how the library plans to incorporate this study, and the changes it suggests, into an overall redesign of library websites and the physical facility itself, with the goal of better serving students. While the overall study will be broad, I will focus on how students seek and find data and how they succeed and/or fail in the process.
Infrastructure for data collection, access, analysis and preservation; new data partnerships in knowledge communities; and some explication of roles and responsibilities: a slide is worth a thousand words. This Pecha Kucha explores institutional readiness preparatory to the possible establishment of a new data services center at a research university in Singapore.
The traditional approach to data security has been a 'data management' approach, in which the data provider takes full responsibility for data security, delivering a 'safe' anonymised file to users, who tend to be viewed as a risk. This paper will argue that a 'researcher management' approach, in which researchers are trained to understand data security and disclosure control and are encouraged to work cooperatively with data providers, has many benefits that a 'data management' approach cannot deliver.
2010-06-04: F1: Data Reference in Depth (Part I): Subjects, Sources and Challenges
Data Reference in Depth: Trade, Prices, Production, Consumption
Amy West (University of Minnesota)
Data services librarians are expected to be fluent in a range of sources dependent on the needs of their patrons. The data librarian needs to be prepared to move easily between often diverse subject areas. The presenters in this session will give participants a sampling of a data librarian's "day in the life" by highlighting a few major subject areas and their primary data sources. Participants will get a head start on dealing with some particular reference challenges by getting an introduction to key sources and the particular challenges that crop up in a sampling of different areas. First, Amy West will lead us through the world of international trade, commodities, prices, and production. Next, Lynda Kellam will move to the U.S. with a focus on the major changes to occur with the 2010 U.S. Census and American Community Survey. Walter Giesbrecht will cover labor, with a look at both national and international sources as well as the problems of harmonization. The developing world will be the focus of Kristi Thompson's talk on survey data on international development. She will look at some of the different groups that conduct surveys and make the microdata available: national agencies, intergovernmental organizations such as the United Nations and the World Bank, and nongovernmental sources such as the Demographic and Health Surveys. She will compare the major sources on topics covered, samples, geographic and temporal coverage, and access issues. Mary Tao will finish with a timely look at data on the credit crisis. She will discuss where to start when a researcher asks for data or statistical information on mortgages, credit cards, financial institutions, and other financial-crisis-related materials. Both fee-based and free resources will be covered in this presentation.
How the 3 Little Pigs Lost Their Houses to Subprime Mortgages and Other Big Bad Wolves
Mary Tao (Federal Reserve Bank of New York)
Lynda Kellam (University of North Carolina at Greensboro)
Data Reference in Depth: Sources of International Labour Data
Walter Giesbrecht (York University)
2010-06-04: F2: Data Management: Engaging Researchers and Crossing Disciplines
Data management: engaging researchers and crossing disciplines
Veerle Van den Eynden (UK Data Archive)
Data management is a growing area of service for data archives and university libraries alike. How do data professionals support researchers in their efforts to manage, disseminate, and preserve their research data? What are the roles of librarians and data archivists? This group of presenters will detail how they and their colleagues have been enabling this work at their institution through a variety of activities such as: identifying the needs of researchers, engaging researchers further upstream in the data life cycle, working with researchers and colleagues across disciplines, involving librarians, conducting trainings, developing learning materials and guidelines, and facilitating the implementation of data management plans.
Services, Policy, Guidance and Training: Improving Research Data Management at One Institution
Robin Rice (Edinburgh University)
Katherine McNeill (Massachusetts Institute of Technology)
2010-06-04: F3: Connecting with the Community: Stakeholder Participation in the Development and Operation of Qualitative Data Archives
Assessing models of stakeholder participation for the development and acceptance of the Australian Qualitative Archive
Lynda Cheshire (The University of Queensland)
Michael Emmison (The University of Queensland)
Alex Broom (The University of Sydney)
The successful development and operation of a data archive depends upon establishing a close relationship with its users. For qualitative data archiving, this involves engaging with researchers to address the specific issues arising from the distinct characteristics of qualitative data; to develop joint solutions to the ethical challenges of data re-use; to encourage researcher up-take of the archive; and to create opportunities for collaboration in areas such as the creation of metadata and data management planning. In the case of Indigenous archives, moreover, the user community is extended to include Indigenous communities as partners in the compilation and repatriation of knowledge and data. In this session, we reflect on the process of engaging with stakeholders and respecting researcher autonomy in the operation of three qualitative data archives: the UK's Qualidata; the Australian Qualitative Archive; and the Aboriginal and Torres Strait Islander Data Archive.
Researchers exposed: Does archiving data reveal too much?
Libby Bishop (University of Leeds / University of Essex)
No humbugging: Curating Indigenous data to promote eResearch
Elizabeth Mulhollann (University of Technology Sydney)
Alex Byrne (University of Technology Sydney)
Gabrielle Gardiner (University of Technology Sydney)
Kirsten Thorpe (University of Technology Sydney)
2010-06-04: F4: Providing Secure Access to Sensitive Data
NORC Data Enclave
Tim Mulcahy (NORC)
Pascal Heus (Metadata Technology)
Data services are now at a very exciting crossroads. The possibilities for innovative science represented by data merging, mashing, mining, and mapping are rapidly expanding, as e-science and web 2.0 worlds clamour for Open Data. Yet at the same time, the power of these very techniques raises new concerns about data protection, as new techniques make the possibility of identifying individuals ever greater. Following some high-profile data disclosure scandals, and fearing public and respondent backlash, many data providers are seeking to place even greater restrictions on research access to detailed data. How can research access to data be preserved (and indeed expanded) whilst ensuring data security? This panel will compare and contrast existing options across the globe for providing secure access to sensitive data. Participants will include Tim Mulcahy/Pascal Heus from the NORC Secure Data Enclave (representing direct secure remote access to microdata), Stefan Bender from the German IAB (representing a remote execution service), a representative TBC from the Canadian Research Data Centres (representing networked safe centres), and a representative TBC from the US Census RDCs (representing stand-alone secure centres). The panel will be chaired by Melanie Wright, Director of the UK's new ESRC Secure Data Service.
Data Access: Some German Thoughts about Past, Present and Future
Stefan Bender (Forschungsdatenzentrum (FDZ) of the IAB)
Stefan Bender (Forschungsdatenzentrum (FDZ) of the IAB)
2010-06-04: G1: Data Reference in Depth (Part II): Access, Citation, and Instruction
Data Reference In Depth: Access, Citation and Instruction
Paul H. Bern (Syracuse University)
How does providing access to data, as a unique format, affect library reference services? This session will provide some answers to that question by exploring how three data librarians have built upon traditional approaches to reference work. First, Paul Bern will look into how data librarians craft the tools they use to guide researchers and answer questions with special focus on deciding which sources to put out there as a starting point for users and the process of evolving from a list of sources to a catalog of sources. Next, Hailey Mooney will consider how the seemingly simple request of bibliographic verification can be complicated by data citation practices. This will be investigated through a citation analysis of datasets used by faculty at a major research university. Finally, Kristin Partlo will look at how the data reference interview is especially crucial in the provision of research assistance, from the viewpoint of working with undergraduates at a small liberal arts college.
The Data Archive as a Social Network: An Analysis of the Australian Social Science Data Archive
Steven McEachern (Australian Social Science Data Archive)
The study of the network structure of academic disciplines through the analysis of publication citation data has a long history in the field of social network analysis (e.g. Mullins et al, 1977). Such studies have examined network characteristics such as the centrality of researchers within collaborative networks (Bollen et al, 2005), and the characteristics distinguishing core and peripheral members of networks (Borgatti and Everett, 1999). The prevalence of such research can in part be attributed to the relatively easy collection of the relational data required for social network analysis, in a consistent and structured form, within a definable population. It is therefore somewhat surprising that the holdings of data archives have not been subject to similar analysis. The metadata describing social science data deposits represents a rich source of data for exploring the emergence of the data networks supporting social science research activity. This paper seeks to fill this gap. It presents a longitudinal social network analysis of the data holdings of the Australian Social Science Data Archive, exploring the patterns of deposit by discipline and institution over the 30-year history of the archive, as well as visualisation of the emergence of the deposit network over time.
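The kind of longitudinal deposit-network analysis described above can be sketched as follows. This is a minimal illustration only: the deposit records, institution names, and the simple degree-centrality measure are all hypothetical stand-ins, not actual ASSDA holdings or the paper's method.

```python
from collections import defaultdict

# Hypothetical deposit records: (year, depositing institution, discipline).
# All names and values are illustrative, not actual ASSDA data.
deposits = [
    (1984, "ANU", "Political Science"),
    (1990, "ANU", "Sociology"),
    (1990, "Univ. of Melbourne", "Sociology"),
    (2005, "Univ. of Melbourne", "Economics"),
    (2005, "ANU", "Economics"),
]

def deposit_network(records, up_to_year):
    """Build a two-mode institution-discipline network from all
    deposits made up to a given year, as adjacency sets."""
    edges = defaultdict(set)
    for year, inst, disc in records:
        if year <= up_to_year:
            edges[inst].add(disc)
    return edges

def degree_centrality(edges):
    """Degree of each institution: the number of distinct disciplines
    it has deposited in (one simple centrality measure)."""
    return {inst: len(discs) for inst, discs in edges.items()}

# Longitudinal view: recompute the network at successive snapshots
# to trace the emergence of the deposit network over time.
for snapshot in (1985, 1995, 2010):
    print(snapshot, degree_centrality(deposit_network(deposits, snapshot)))
```

Recomputing the network at successive cut-off years is what makes the analysis longitudinal: each snapshot shows which institutions and disciplines had entered the network by that point.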
The Diffusion of Information Technology in the United States and Its Impact on Social Science Research across Institutions and Countries
Anne Winkler (University of Missouri-St. Louis)
Sharon G. Levin (University of Missouri-St. Louis)
Paula E. Stephan (Georgia State University and NBER)
Wolfgang Glanzel (Katholieke Universiteit Leuven, Steunpunt OO)
This study examines the extent to which IT has differentially affected collaboration in the social sciences relative to the natural sciences. IT’s impact on the social sciences may be larger because much research can be conducted virtually, while working in close proximity (in labs) may be more crucial to producing research in the natural sciences. To undertake the research, the authors match an explicit measure of institutional IT adoption (domain names, e.g. www.umsl.edu) with institutional data on all published papers indexed by ISI for 1,348 four-year colleges, universities and medical schools for the years 1991-2007. The publication data cover the social sciences, humanities, and natural sciences and narrower fields such as economics and biology. Three measures of co-authorship are examined: (1) average number of coauthors by institution; (2) percent of papers from an institution with one or more co-authors at another U.S. institution; and (3) percent of papers with one or more non-U.S. coauthors. The study describes collaboration patterns and then uses regression analysis to examine the impact of IT “exposure” on co-authorship. Preliminary results suggest: 1) dramatic growth in co-authorship within and across fields; and 2) differential effects of IT by field.
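The three co-authorship measures listed above are straightforward to compute from per-paper records. The sketch below is illustrative only: the record layout and sample values are invented, not the authors' actual ISI-derived dataset.

```python
# Hypothetical per-paper records for one focal institution. Each paper
# lists the institution and country of every author. Field names and
# values are invented for illustration.
papers = [
    {"institution": "UMSL", "author_insts": ["UMSL"], "countries": ["US"]},
    {"institution": "UMSL", "author_insts": ["UMSL", "GSU"], "countries": ["US", "US"]},
    {"institution": "UMSL", "author_insts": ["UMSL", "KU Leuven"], "countries": ["US", "BE"]},
]

def coauthorship_measures(records):
    """Compute the three measures for one institution's papers:
    (1) mean number of authors per paper,
    (2) share of papers with a co-author at another U.S. institution,
    (3) share of papers with one or more non-U.S. co-authors."""
    n = len(records)
    mean_authors = sum(len(p["author_insts"]) for p in records) / n
    other_us = sum(
        any(inst != p["institution"] and ctry == "US"
            for inst, ctry in zip(p["author_insts"], p["countries"]))
        for p in records) / n
    non_us = sum(any(c != "US" for c in p["countries"]) for p in records) / n
    return mean_authors, other_us, non_us

print(coauthorship_measures(papers))
```

With these measures computed per institution and year, the IT-exposure regression the abstract describes would treat them as dependent variables.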
Applications of Social Networking in International Collaboration, Multisite-research, Knowledge Re-use and Data Configuration Management
Kartikeya Bolar (University of Toledo)
Researchers often wish to collaborate with others who are miles away but share their interest in examining a particular phenomenon or conducting a project. The phenomenon in focus, or the project parameters, will vary in intensity across contexts, so knowledge or patterns developed at one site need validation and further enrichment through examination at multiple sites. Effective synthesis of knowledge about any phenomenon therefore requires international collaboration combined with multi-site research. Further, knowledge is discovered from databases: a different perspective and a different analysis applied to the same database can lead to the discovery of a new body of knowledge. It is thus essential to preserve not only the knowledge discovered but also the underlying data, which can then be configured to the needs of new research projects. This paper offers insights on how social networking can be used effectively to identify potential research collaborators and to resolve issues in multi-site research, knowledge re-use, and data configuration management.
2010-06-04: G3: Preservation: Interoperability and Reproducibility
Replicated and Distributed Storage Technologies: Impact on Social Science Data Archive Policies
Jonathon Crabtree (University of North Carolina, Chapel Hill)
The Data-PASS partnership engages in collaboration at three levels: coordinated operations, development of best practices, and creation and use of open-source shared infrastructure. The first talk in the session provides an update on our search for replication and distributed storage technologies for preservation. Systems like iRODS and LOCKSS can be developed into preservation environments for social science data archives. The key when implementing these preservation environments will be the modification of existing archive policies and procedures to reflect new dependence on collaboration. The second talk discusses the collection of international public opinion data collected by the USIA, which began in 1952 and extended through 1999. Until recently, these data were difficult to access. The Roper Center and the National Archives and Records Administration have identified, rescued, and made these data available to the research community. The third talk describes a new alliance between ICPSR and Institutional Repositories (IRs) with the goal of preserving and re-using social science data. This talk focuses on the formation of these partnerships; how an archiving guide for IRs will be developed; and new services that ICPSR can offer to IRs to assist with social science data. The fourth talk summarizes the efforts of ICPSR and the Roper Center to migrate punched card data to modern preservation formats. This presentation focuses on the recovery of the Cornell Retirement Study, a longitudinal study that began in 1952. The final talk discusses the current collaborative structure of Data-PASS, our agreements, infrastructure, and the services and infrastructure available to new partners.
Towards a Federated Infrastructure for the Preservation and Analysis of Archival Data
Chien-Yi Hou (University of North Carolina at Chapel Hill)
Richard Marciano (University of North Carolina at Chapel Hill)
The Sustainable Archives and Leveraging Technologies group (SALT) at UNC is pursuing a number of projects that address issues of interoperability and reproducibility. This presentation will discuss three projects: (1) e-Legacy, an NHPRC-funded project which is developing preservation infrastructure and services for state government geospatial data, (2) PoDRI, an IMLS-funded project which explores policy-based interoperability between Fedora and iRODS repositories, and (3) T-RACES (Testbed for the Redlining Archives of California's Exclusionary Spaces), an IMLS-funded project which builds on these other projects and will publish an online archive of historical redlining and racial discrimination data. T-RACES documents the confidential security maps and surveys produced in the 1930s by the Home Owners' Loan Corporation, a New Deal federal agency. These surveys form the genesis of neighborhood discrimination and restricted mortgage lending, known as redlining. A digital library interface based on interactive databases and Google Maps and Google Earth interfaces will integrate data from 8 California cities, including Los Angeles, San Francisco, and Oakland. The intent of this archive is to serve as a core reference data set that can be augmented and customized through social networking mechanisms, through overlays of social science data. Across all three projects, the authors are interested in roles and responsibilities of data services and data repositories that support concepts of policies and customization.
Automated DDI Metadata Harvesting and Replication for Preservation Purposes within iRODS
Jon Crabtree (H. W. Odum Institute for Research in Social Science)
Antoine de Torcy (DICE)
Mason Chua (H. W. Odum Institute for Research in Social Science)
This prototype demonstrated that the migration of collections between digital libraries and preservation data archives is now possible using automated batch load for both data and metadata. We used this capability to enable collection interoperability between the H.W. Odum Institute for Research in Social Science (Odum) Data Archive and the integrated Rule Oriented Data System (iRODS) extension of the National Archives and Records Administration's (NARA) Transcontinental Persistent Archive Prototype (TPAP). We extracted data and metadata from a Dataverse data archive and ingested it into the iRODS server and metadata catalog using OAI-PMH, Java, XML/XSL, and iRODS rules and microservices. We validated ingest of the files and retained the required Terms and Conditions for the social science data after ingest.
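The OAI-PMH harvesting step described above can be sketched in miniature: fetch a ListRecords response and pull study identifiers and titles out of the embedded DDI metadata. This is a simplified illustration, not the Odum/DICE pipeline; the sample response, identifier, and title below are invented, and a real harvester would page through resumption tokens over HTTP.

```python
import xml.etree.ElementTree as ET

# A minimal OAI-PMH ListRecords response carrying DDI codebook metadata.
# The identifier and title are invented for illustration.
SAMPLE = """<?xml version="1.0"?>
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>hdl:1902.29/D-00001</identifier></header>
      <metadata>
        <codeBook xmlns="ddi:codebook:2_5">
          <stdyDscr><citation><titlStmt>
            <titl>Example Opinion Survey, 1968</titl>
          </titlStmt></citation></stdyDscr>
        </codeBook>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DDI = "{ddi:codebook:2_5}"  # DDI Codebook 2.5 namespace

def harvest(xml_text):
    """Extract (identifier, title) pairs from an OAI-PMH ListRecords
    response whose metadata payload is a DDI codebook."""
    root = ET.fromstring(xml_text)
    out = []
    for rec in root.iter(OAI + "record"):
        ident = rec.find(f"{OAI}header/{OAI}identifier").text
        titl = rec.find(f".//{DDI}titl").text
        out.append((ident, titl))
    return out

print(harvest(SAMPLE))
```

Once identifiers and metadata are extracted this way, the ingest side can hand each record to the preservation environment's own load mechanism (iRODS rules and microservices, in the prototype described above).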
Encoding Archival Context: An Australian Perspective on Situating Data in Frameworks of Meaning
Gavan McCarthy (Australian Social Science Data Archive)
In 2008 the University of Melbourne eScholarship Research Centre joined the Australian Social Science Data Archive, not as social science researchers nor as experienced data archivists but as a group with significant experience in pushing the boundaries of generalised archival practice. We had been studying and developing tools to systematically document the larger contexts in which archival materials are located, to understand the cultural informatics of meaning and how it is ascribed both by archivists and users of records. This paper examines the use of two of our tools (the Online Heritage Resource Manager – OHRM, and the Heritage Documentation Management System – HDMS) while working directly with social science researchers and their data. In particular it explores metadata interchange with DDI (Versions 2 and 3) and the positioning of these tools within the Open Archive Information System reference model. The systematic documentation of contexts (there is often more than one) has numerous benefits, but for social science data it is probably the management of rich and highly interconnected authority records where the most obvious benefit lies. The paper will conclude with reference to recent work on the development and utilisation of the Encoded Archival Context XML schema.
The End of Marketing As We Know it, and the Rise of Sociological Metrics
Dave Linabury (Campbell-Ewald, SocialThreat.com)
Social media has forever changed the way people interact, communicate and do business. No industry has been more affected, crippled and mutated by it than marketing. With social media growing in importance each day, in part due to the integration with mobile, advertisers are finding that their tried and true methods of measurement just aren’t cutting it anymore. This presentation proposes that sociology holds the key to understanding social media, not only for understanding people’s communicative behaviors, but also their shopping, research and buying behaviors. Case studies will be presented.