The archive provides titles, presenters, abstracts, and links to presentations in Zenodo and recordings on YouTube, where available. If the list does not include any links, it means the presentation was not obtained by IASSIST. However, it may be available online, if the authors have published it elsewhere.
All recent archives are organized in the following order: plenaries, concurrent sessions, lightning talks, posters, and workshops.
2009-05-26: Workshops
Nesstar 4.0
Ornulf Risnes (Norwegian Social Science Data Services)
A new version of Nesstar, Nesstar 4.0, with a lot of exciting new functionality will be released during spring 2009. This version includes, among other things, support for multilingual metadata, improved search functionality, subscription, and the ability to add cell notes and to embed live data into regular web pages. The workshop will introduce this new functionality and demonstrate how to publish data onto the Web, and how end-users can find, browse, visualize and analyse data online. The main focus will be on using Nesstar to: 1) publish survey data and multilingual documentation onto the Web; 2) publish aggregate data/cubes; 3) publish different combinations of micro- and aggregate data.
Data Archiving Networked Services (DANS) is active in the area of data infrastructure, with two main themes, namely (digital) archiving and making research data available. The field of activity of DANS covers both the social sciences and the humanities. DANS also manages its own data repository of research data. In 2005, the founders of DANS, the Royal Netherlands Academy of Arts and Sciences (KNAW) and the Netherlands Organization for Scientific Research (NWO), gave DANS the formulation of a data seal of approval as one of its assignments. In February 2008, 17 guidelines were presented under the name data seal of approval, nationally at a KNAW symposium and internationally at the first African Digital Curation Conference. This workshop will explain more about the background of the Seal: what it is and what it isn't, which international seals of approval exist, how this seal of approval matches them, what its unique selling point is, what the plans for the future are, and, most importantly, how the assessment works. After a 45-minute introduction we will first look at a use case and then work on filling in the assessment.
Exporting DDI 3.0 from Computer Assisted Interviewing Systems
Jeremy Iverson (Algenta)
Computer Assisted Interviewing systems are rich sources of metadata, but current practice in creating data and documentation from CAI systems does not generally capture this richness, nor does it structure the documentation content optimally for archiving. This session will focus on innovations in data production that permit the export of comprehensive DDI 3.0 documentation that provides not only an enhanced codebook with integrated question text and full labeling but also documentation of the survey instrument itself, including the universe for each question and the instrument flow logic. Data producers and archivists will learn from each other about how to produce output that is most useful. The workshop will cover mapping an instrument to DDI, using a database to store metadata, exporting to DDI, metadata reuse for surveys with multiple waves, and archival ingest of DDI-compliant metadata.
To assist researchers in developing and implementing sound data curation practices, the Digital Curation Centre (DCC) has developed the Digital Curation 101 course to provide an introduction to digital curation and the range of activities that should be considered when dealing with digital data. Using the Digital Curation Lifecycle Model as a reference point, this course employs a mix of lectures and practical exercises to equip students with both a theoretical underpinning of core digital curation issues and hands-on experience in applying the lessons learned. In this workshop, we will provide a 'lite' version of the DC 101, which normally runs over four days. During the workshop, participants will be introduced to the lifecycle model, provided with an overview of the various roles and responsibilities associated with the lifecycle stages and an idea of the tools available, and given tips on key points to be considered at each stage.
Moving to DDI 3.0: Translating Current Collections of Structured Metadata
Wendy Thomas (Minnesota Population Center)
In anticipation of new tools for creating and exploiting DDI 3.0 content, many current DDI users would like to get started converting their collections to DDI 3.0. This workshop can help you identify the issues involved in upgrading to DDI 3.0 and organize the process of converting your current collection. We will focus on moving metadata from earlier DDI versions and other common structures such as relational database systems, SAS or SPSS files. Mapping of metadata from DDI 2.1 to DDI 3.0 will be reviewed, identifying elements that require additional decision making before transferring or elements that may be problematic due to local usage. The hands-on portion of the workshop will focus on metadata in DDI 1.0 - 2.1 and statistical file formats. We will walk through the decision points needed for a clean transfer of metadata content, the current tools for transferring from standard formats to DDI 3.0, and the clean-up steps needed to take full advantage of the reuse of metadata. Attendees may bring in a sample of earlier DDI metadata to work with or use the samples provided. The presentation will also cover the process of grouping a set of study units and identifying the common metadata that can be "moved up" to the group level for inheritance by members of the group.
Data requirements and Digital Repositories
Ann Green (Yale University)
Robin Rice (University of Edinburgh)
Stuart MacDonald (University of Edinburgh)
Luis Martinez Uribe (Oxford University, Oxford e-Research Centre)
Tanvi Desai (London School of Economics)
Jane Roberts (Oxford University, Social Science Data Service)
This workshop will be based upon the DISC-UK DataShare project's "Guide to Data Requirements for Digital Repositories" (in development and to be released at the IASSIST workshop). The guide is intended to be used as a decision making and planning tool for institutions with digital repositories in existence or in development that are considering adding research datasets to their digital collections. It also can help articulate the benefits of sound data management practices as well as the goals of data sharing and long term access. The guide is largely based upon the online OpenDOAR Policy Tool, the OAIS Information model, and the TRAC checklist. It compiles information from and refers to multiple resources and to examples of how some of the requirements have been implemented. The workshop will focus upon each section of the guide covering repository content policies, metadata, file formats, access and reuse of data, submission and administration of data, preservation issues and more. Workshop participants will break up into groups, review a particular section of the guide, choose parameters for a fictional digital repository, and report back on their findings and recommendations. Although the focus is upon social science data, other data types could be considered within the range of requirements covered in the workshop. The workshop will be an opportunity for data librarians and data archivists, repository managers and developers to discuss the implementation of best practices, standards, and strategies as research data make their way into a wide range of digital repositories.
Data Librarians Represent! Integrating Data Services into the Social Science Research Process
Lynda Kellam (University of North Carolina at Greensboro)
Katharin Peter (University of Southern California)
This workshop will present a range of models for incorporating data services into the social science research process and practices for reaching new populations through data services. We will provide a framework for teaching students (and other librarians) how to incorporate data searches into the literature review process and present examples of ways to include data in shorter assignments (e.g., "fast facts" or polling statistics). We will discuss approaches for undergraduate and graduate students (including first-year students and non-data populations) as well as approaches for collaborating with teaching faculty and subject librarians. Discussion will incorporate a range of data tools, such as the ICPSR learning modules, table generators and various numeric databases. Using these tools along with hands-on activities and concrete examples, this workshop aims to help participants develop approaches suitable for their own clientele and institutions.
Digital Challenges: Bridging the Gap between Publication and Data
Adam Farquhar (The British Library)
The digital age has seen a widening fissure in our scholarly infrastructure - the gap between published research and the datasets that underlie it. While there are well established services for published research, there is only an incomplete patchwork of inconsistently supported services for datasets. For example, there is little agreement on how to identify, cite, or catalogue datasets. The implications for future use of today's research are even more concerning. This talk will outline some of the challenges we face as we try to bridge this gap and highlight some emerging practices that could change the situation if they become widely adopted.
2009-05-27: A1: Tools and Implementations of DDI 3.0
Session theme/info: This session is an overview of new tool developments since the initial set of DDI 3.0 tools was demonstrated at IASSIST 2008. The goal is to show people how DDI 3.0 is being implemented in new tools, to provide ideas for implementation, and contacts for each of the various tools. This panel will provide a brief (5 minute) overview of the entire tools picture (including references to new tools which have their own sessions at IASSIST), and will then provide short (10-15 minute) demos of:
- CentERdata's tool for documenting/disseminating longitudinal online panel surveys (LISS panel)
- The DDI Foundation Tools Editing Suite, an open-source implementation developed by a group of organizations from the DDI community
- Exanda, a tool built by GESIS for tabulating data and making visualizations, with outputs in several formats
- MISSY 2, another GESIS tool for documenting and viewing the German microcensus
- Editing and database tools developed at the IAB
This panel is very much a follow-up to last year's, to show how aggressively DDI 3.0 is being adopted, and to show the various tools and approaches available to those who are looking at DDI 3.0 implementation themselves.
2009-05-27: A2: Semi-Permeable Boundaries Among Institutions
Session theme/info: Data Services have evolved in very different institutions in different places: in some places, data have traditionally been lodged in centralized national data archives, in others they evolved in the libraries, and in others they began their life in local data archives. None of these settings is static - and they are certainly in transition - but each has different strengths and weaknesses. Increasingly, these institutional settings must confront the ways in which they complement, supplement, compete with, and coordinate with one another. In this session, the presenters will discuss both formal and informal initiatives which have attempted to redefine institutional boundaries with respect to kinds of services provided (e.g. statistical consulting, data discovery, file manipulation, preservation), the kinds of data dealt with (e.g. restricted vs. public, aggregate vs. microdata), and the kinds of expertise needed. The results at times blur or remove boundaries, and at times preserve institutional boundaries through formal or informal agreements to divide complementary services or activities. Whether strengthened, weakened, or simply shifted along different lines, the intent in each case is to increase the mobility of data and support services between the most appropriate providers and the data users.
2009-05-27: A3: Practicing What We Provide: Surveying Users of Surveys
The Data That Did Not Arrive For The Date: Talking The Non-Response Blues
Karsten Boye Rasmussen (University of Southern Denmark)
We have the exact formulas for computing sample errors when performing surveys with probability sampling. Most social scientists can recite some practical heuristics to the tune of "if you have about 1000 questionnaires your error rate will be close to +/- 3 per cent". The precision of the actual calculations has vanished from memory, but some parts of the sampling theory are sufficiently sticky. However, the true and sad story is that the precision delivered by statisticians is unnecessary, as the plausible error of nonresponse is momentous compared to the sample error, even in surveys considered to have high response rates. Arguments to diminish the effect of nonresponse are sought in methods like comparison with known values for the population and extrapolation in time for the nonresponse. The focus will be on the weakness and insufficiency of the arguments obtained by these methods. The presentation will demonstrate nonresponse through preliminary results from a data collection of companies using a mailed-out business survey, repeated phone reminder procedures with note taking, information obtained from valid registers, and special information evaluating the presence of the company. I'll be in your survey if you'll be in mine.
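The 1000-questionnaire rule of thumb quoted in this abstract can be checked with a short calculation. The sketch below is only an illustration, not the presenter's method: it assumes simple random sampling, a 95% confidence level (z = 1.96) and the worst-case proportion p = 0.5, and it contrasts the resulting sampling error with the standard expression for nonresponse bias in a proportion. The function names and the example response rate and proportions are hypothetical.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Half-width of the normal-approximation confidence interval
    for a proportion p estimated from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)


def nonresponse_bias(response_rate, p_respondents, p_nonrespondents):
    """Bias of the respondent proportion relative to the full-sample
    proportion: (1 - response rate) * (respondents - nonrespondents)."""
    return (1 - response_rate) * (p_respondents - p_nonrespondents)


if __name__ == "__main__":
    # Sampling error for about 1000 questionnaires (assumed worst case p = 0.5).
    print(f"sampling error: +/- {margin_of_error(1000) * 100:.1f} points")

    # Hypothetical survey: 70% response rate, respondents at 50%,
    # nonrespondents at 40% on the measured item.
    bias = nonresponse_bias(0.7, 0.5, 0.4)
    print(f"nonresponse bias: {bias * 100:.1f} points")
```

Under these assumed values, a ten-point difference between respondents and nonrespondents at a 70% response rate already produces a bias of about the same size as the sampling error, and unlike the sampling error it does not shrink as the sample grows.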
Data Services Awareness and Use Survey 2008: Five-year Follow-up at the University of Tennessee
Eleanor Read (University of Tennessee)
In fall 2003, the University of Tennessee Libraries conducted a survey to assess awareness of its data services among faculty and graduate students in selected departments. The information collected has been helpful in understanding how data users learn about and use Data Services, how successful various promotional and outreach methods have been, and what types of data are of interest, among other things. In fall 2008, we conducted a similar survey to assess the current state of needs for and use of secondary data, and whether there has been an increase in awareness of Data Services following several years of increased promotion and outreach activities. This session will discuss the survey process, some of the key results, and how the recent survey results compare to the 2003 survey.
Taking the Pulse of our Members: Creating a Healthy Data Environment
Wendy Watkins (Carleton University)
Michel Seguin (Statistics Canada)
Staying in touch with users is an important part of being able to provide them with the right kind of service. After 12 years in existence, Canada's Data Liberation Initiative (DLI) has recently completed a comprehensive survey of contacts, those folk who are responsible for the day-to-day delivery of data services at their institutions. With a response rate of over 90%, there are many lessons to be learned. The topics covered included a needs assessment, as well as satisfaction with the collection, services and local support for data. A major component was the self-assessed competencies in data-related tasks. This session will concentrate on some surprising results regarding these data competencies and the implications for future training of both the contacts and their audiences.
2009-05-27: A4: Public Opinion Data: Over Time and Across the Globe
Question Bank: New and Comparative Research at a Glance
Nanna Floor Clausen (Dansk Data Arkiv)
This paper will demonstrate and discuss how the Danish question bank is of great value both for the social science research community and for the data archives. The question database holds information on every question from the questionnaires deposited in the archive. The database is searchable from the Web via a dedicated search interface on two levels: simple and advanced. The paper will present the new advantages and possibilities, of which the most important is no doubt the ability to compare question wordings at a glance, with immediate access to the original questionnaires. All relevant information regarding a question, such as response categories, is presented to the user. Other advantages for users include: an additional method for identifying relevant surveys more precisely; links to all the surveys using the given question; the possibility of making comparative research of the surveys using a given question; support in the construction of new questions and questionnaires; new research projects, as the question wordings themselves reflect the time and circumstances in which they were designed; and more open access to and insight into the surveys and the data via the question database.
Margaret Adams (U.S. National Archives and Records Administration)
As a result of the DataPASS initiative in the U.S., and especially the partnership that it supported between the custodial electronic records program of the U.S. National Archives and the Roper Center for Public Opinion at the University of Connecticut, there is now greater awareness of and expanded opportunities for using the rich collection of international public opinion data collected from 1952-1999, by the United States Information Agency (USIA). The National Archives also preserves and makes available online the public portion of the electronic telegrams, 1973-1975, from the U.S. Department of State's Central Foreign Policy Files. This presentation will focus on the complementary nature of these two types of data for research on U.S. foreign relations, and the manner in which the public opinion data and the electronic telegrams offer unique perspectives on a selection of mid-1970s topics.
2009-05-27: B1: Forging Links: Context and Content in Cultural and Educational Data
Combining Statistics and Documents for a Contextual View of Irish History and Culture
Fredric Gey (University of California, Berkeley)
This paper will discuss quantitative aspects of an ongoing research project, Context and Relationships: Ireland and Irish Studies. For biographies and other cultural materials, context and relationships are central to humanities scholarship: Who were the people and institutions mentioned? How were they related? What else did they do? What other materials relate to this topic? Where and when did this happen? What else was going on around that time and place? One important feature of context is the quantitative statistics gathered for the time and place. We relate a historical dataset of Irish census data from the 19th and early 20th century to approximately half a million pages of newly digitized Irish scholarly materials (publications in history, culture, architecture, etc.) by connections in time and place. The project is a partnership between the University of California at Berkeley and the Centre for Digitisation at the Queen's University, Belfast, which is creating a digital library of core e-resources on Ireland, including back files of 100 journals on Irish culture and history. Berkeley funding is from the USA National Endowment for the Humanities and the Institute of Museum and Library Services.
National Digital Library of Finland: How to Enable Access to Digital Cultural Material to Users of Today and to Future Generations
Minna Karvonen (Ministry of Education, Finland)
The Ministry of Education has started a project called "National Digital Library" (2008 - 2011), which encompasses digitization of prioritized cultural heritage material of museums, archives and libraries, and online accessibility and long-term preservation of both digitized and born-digital cultural material. The aim of this project is to establish one national access point, through which the most essential cultural heritage collections are searchable on the item level. The service will also allow access to databases of immovable heritage and various services of museums, archives and both research and public libraries. In addition, the Finnish National Digital Library aims at creating lasting mechanisms for long-term preservation of digital cultural material and scientific information (common architecture and guidelines, common information system, legal and contractual responsibilities, processes etc.). The National Digital Library of Finland encompasses: digitisation of cultural heritage collections of museums, libraries, archives, and audiovisual archives; access to digital cultural resources; and long-term preservation of digital cultural heritage. The aims are to make the most essential Finnish collections both digitised and searchable through a common user interface (in operation 2011) and to create a sustainable solution for long-term preservation of digital cultural material (finalised plan in 2010).
Research Data Center at the German Institute for Educational Progress (IQB)
Michel Knigge (Institute for Educational Progress (IQB))
After decades of abstinence, Germany resumed participation in international large-scale empirical educational assessment studies like TIMSS, PISA and PIRLS in the 1990s. Because of the political explosiveness of the topic and a lack of experience in data dissemination in Germany, the generated data were regularly not accessible to researchers outside the national research groups. The recent development of research data centers (RDC) and the growing interest in educational research in Germany led in 2007 to the foundation of the RDC at the Institute for Educational Progress in Berlin (German: IQB) to facilitate access to the available German educational assessment data. The aim of the presentation is to introduce the RDC at IQB. We provide cross-sectional and longitudinal educational assessment and survey data on students, schools and classes. Data on individual students include comprehensive information on actual competencies and personal and family background. School and class data are linkable to the student data. We offer different access methods, which depend on the degree of data confidentiality: anonymized scientific use files are sent to the researcher, while confidential data can be analyzed via on-site use or remote execution. Access is free of charge and not restricted to German researchers.
2009-05-27: B2: Enhancing Data Sharing: Practices, Tools and Constraints
Making Sense of the Census: One Year On with the Census Aggregate Information Resource Demonstrator (CAIRD)
Justin Hayes (MIMAS, University of Manchester)
Rob Dymond (MIMAS, University of Manchester)
As previewed at IASSIST08, the CAIRD project is utilising developments in structured XML (SDMX and DDI) to create an online service combining, advertising and providing data and metadata generated from UK 2001 Census aggregate outputs in open-standards, machine-readable ways to increase their usability, and facilitate advances in application development. The primary aim of CAIRD is to encourage adoption of a similar approach by UK census agencies for the UK 2011 Census by demonstrating the potentials of this approach. As of November 2008, CAIRD is beginning to produce some exciting outputs, with more in the pipeline by IASSIST09! A prototype web service is already delivering previously impossible search and exploration capabilities based directly on dataset information content; a major improvement on browsing traditional census tables. We are pleased to report substantial interest from, and collaboration with UK census agencies. CAIRD has strong relevance to the IASSIST09 theme of Mobile Data and the Life Cycle through demonstration of the benefits that this approach can bring to data sharing, flexibility and overall usability; prerequisites for a mobile, Web 2.0 world. It also identifies a need for the approach to be incorporated and considered prospectively at all stages of the data life cycle in order to fully realise its benefits.
The area of depositing and making research data available is not a new or uncultivated one. Scientific data archives have been active in this field since the early sixties. For the past couple of years, nearly all scientific organizations in the Netherlands have aimed for a policy in which publicly funded research is made accessible as much as possible. Funders and commissioners of research attach more and more value to the permanent accessibility of research data financed by them. Governmental organizations, funders of research, universities, publishers, scientific data archives and researchers are implementing the policy of access to research data in different ways. However, sharing of research data is not a common practice for most scientists. In practice it appears that the depositing of research data is not embedded in the scholarly workflow and 'mental' system of researchers, institutions, funders and policy makers. Therefore it is important to identify which stakeholders benefit from making research data available. What kind of policies have they already developed? Which instruments play an important role in these policies and which restraints do they encounter in practice? In 2008 DANS organized a workshop with different stakeholders where recommendations were made to remove obstacles to sharing data. These recommendations will be addressed during this presentation.
Access to Governmental Microdata for Research: Recent Developments and New Challenges in Europe
Paola Tubaro (Reseau Quetelet, France)
Roxane Silberman (Reseau Quetelet, France)
Official microdata constitute a major source for research in the social and economic sciences and for public policy evaluation. The paper provides an overview of similarities and differences in conditions of access to governmental microdata for researchers in European countries, placing emphasis on recent changes and trying to identify directions for future development. Regarding anonymized datasets, focus is on limits that often remain despite a generalized improvement in availability, and on unevenness of access conditions across Europe. On confidential data, the paper examines challenges arising from the recent upsurge in researchers' data demand, and discusses some possible solutions, including safe data centres on the premises of statistical institutes, secure virtual data laboratories accessible through the Internet, and an enhanced role for data archives at the national and European levels. These solutions differ in ease, cost-effectiveness, and extent of access. Dealing with these problems, and experimenting with innovative solutions, requires an appropriate legal framework. In this light, the paper draws attention to recent changes in the law of European countries and examines whether and how they have opened new possibilities for microdata dissemination. Special emphasis is placed on the cases of France and the United Kingdom.
2009-05-27: B3: Life Cycle Considerations for Research, Users and Archives
Applying the DCC Curation Lifecycle Model
Sarah Higgins (Digital Curation Centre)
The documentary heritage and the scientific record are increasingly born digital. The UK-based Digital Curation Centre supports institutions that store, manage and preserve such data to help ensure its enhancement and continuing long-term use. The DCC (Digital Curation Centre) Curation Lifecycle Model provides a generic graphical high-level overview of the stages required for successful curation and preservation of digital material from initial conceptualisation. The model can be used to plan curation and preservation activities, to ensure sustainability of digital material, within an organisation or consortium. Its application can help ensure that all necessary stages are undertaken, each in the correct sequence. The model is used by the DCC: as a curation training tool; as an internal planning tool to ensure that information, services and advisory material cover all areas of the lifecycle; to contextualise standards within the DCC DIFFUSE Standards Project; and to structure advisory case-study work. Externally it has been adopted by a number of organisations as the framework for curation activities and to date has been used to conceptualise processes by the UK Research Data Service and some JISC initiatives. This paper will present the DCC Curation Lifecycle Model and highlight some of its current applications.
Is Mobility of Data a Special Problem for Qualitative Research Collections?
John Southall (UK Data Archive, University of Essex)
This paper builds upon the experience of archiving qualitative data to look at issues of data mobility. It will look at the interplay of such mobility with the wider concerns of data acquisition, preservation and delivery. At different times within the life cycle of data the potential for data mobility may be strong and beneficial. At other times it may be seen as problematic or as occupying a place of tension within the demands of a digital archive. This will be examined through examples of audio and textual data. The need for archives to promote data and metadata standards as part of their key objectives will also be considered. Such activity can be used to promote mobility but it is arguably important to recognise where it may present problems for an archive and as such be discouraged. The effect of restricting access or dealing with confidentiality will therefore be considered. Particular reference will be made to the role of users of data and there will be a consideration of user demands and expectations. The argument that such expectations are new and novel and that they have developed alongside digital media - in ignorance of traditional archival values - will be examined.
From Life Cycle to Continuum: Assuring Research Use of Records and Archives
Marjo Rita Valtonen (University of Tampere, Department of Information Studies)
Life cycle thinking has spread to every field in society; products, services and healthcare have life cycles. In the digital working environment the life cycle of data, information, documents, records and archives has to be carefully designed in advance. Being able to use them as long as needed in operational tasks, and later on in research, requires holistic multidisciplinary approaches. In the sense of "from the cradle to the grave", the life cycle concept misrepresents the case. Records, as evidence of social realities and building blocks of social memory, are preserved for a long time or permanently. This means continuous attention to the accessibility, usability and availability of digital records and archives. Specialised scholarly and professional knowledge is needed in the appraisal, management, selection and preservation of digital records and archives - all this even before "the cradle stage", and with future visions reaching over a hundred years. Standardised recordkeeping in an organisational context supports current and future research by creating metadata that satisfies researchers. The question is not about a linear and ending life cycle but a continuum, a kind of "information processing rhythm", which assures that information cannot die.
2009-05-27: B4: Crossing Traditional Boundaries: Mobile Data-Based Resources
Networking Outside the Networks
Vesa Korhonen (University of Jyvaskyla, Finland)
Owing, for example, to recent unfortunate events, the ability to monitor and control the messages and data transferred over the Internet has gained attention. However, there also exist methods for networking beyond any control. Smart, hand-held mobile devices with support for communication technologies such as WLAN allow spontaneous, ad-hoc networks to be created (and discontinued) without notice. Such networks may remain completely invisible and hence allow the delivery of illegal or inappropriate content with a very low risk of being detected. And even if the network traffic can be detected, it is virtually impossible to point out the originator of the data in a large crowd if, for example, he manages to shut down the equipment fast enough. The author has participated in research on short-range wireless mobile networks, where the concept of the Mobile Encounter Network has emerged. Based on this concept, the so-called "Dark Side" of mobile networking has been predicted. It is suggested that much more research effort, both in computer science and in human sciences, should be directed to this area. Getting mobile can integrate, but it may also disintegrate. We must know what we can't let loose, before it is too late.
Sari Makinen (University of Tampere, Department of Information Studies)
Mobile information and communication technology challenges the theory and practice of organization science, business processes, information systems, and especially records management. Electronic records cannot be managed without changes to the methods developed for traditional records, which are mainly in paper form. Electronic records can be found in every area of government and business activity. These records are more and more often part of mobile business processes. With the growing number of people using mobile tools, new kinds of problems are also encountered. It has been estimated that about 12 percent of organizational knowledge is held in structured knowledge bases, while the majority lies scattered about organizations in the form of paper and electronic documents. Since mobile workers have less control over the working environment, records may not be captured into records management systems or into the organizational memory. There is no regulation for producing, editing and storing records in the mobile working environment. Another important topic is access: what problems do mobile professionals have in accessing the information sources of their organizations? When using mobile devices and electronic records, we also need to be convinced of the integrity of the data.
Doing Data on YouTube: Outreach and Education Using Web 2.0
Ryan Womack (Rutgers University Libraries)
In the YouTube era, data centers need new outreach mechanisms to raise awareness of their services and instruct their clientele in the use of data resources. At Rutgers University Libraries, the Data Librarian is creating a series of video tutorials, titled "Data Snapshots", on major data resources for posting on YouTube, blogs, and university web sites. These are integrated into research guides and the newly created RutgersData blog (rutgersdata.wordpress.com), which is a news resource for the Rutgers data community. Issues in scripting, capturing, editing and manipulating the videos using various software will be discussed. The expansion of a virtual presence for data services through chat consultations and design of the blog is driven by the dispersed environment at Rutgers University, where only one dedicated data librarian serves a student population of 50,000 spread across multiple campuses. Feedback from users is also being gathered to improve design of services and guide future efforts.
2009-05-27: C1: Developing Best Practices for the DDI
Session theme/info: In November 2008, a group of 25 members of the DDI community gathered in Germany for a week-long working meeting to draft an initial set of Best Practice documents addressing technical and business matters related to the application and implementation of Data Documentation Initiative (DDI) metadata. This panel will discuss the benefits of creating best practices; how the specific best practice topics were chosen and prioritized; how the workgroup collaborated to produce the best practices; and further developments with DDI best practices since November.
2009-05-27: C2: Sharing Data: High Rewards, Formidable Barriers
Swedish National Data Service's Strategy for Sharing and Mediating Data
Iris Alfredsson (Swedish National Data Service)
Carina Carlhed (Malardalen University)
Veronika Marosi (Swedish National Data Service)
The main purposes of SND are to mediate information on databases and other digital material collections for research, to facilitate access to research databases, and to serve as a knowledge node for documenting and managing research data and related methodologies in several knowledge fields. Thus, a very important task for SND is to strengthen researchers' altruistic appreciation of the importance of data sharing and open access. We have identified two key areas which act as barriers to reaching our goals: legal barriers and possessive barriers. The legal barriers are hindrances in current Swedish laws and statutes. The possessive barriers are thresholds connected to a lack of awareness among researchers. We will present our strategy, which is a combination of "top-down" and "bottom-up" activities, and some preliminary results from a survey. An example of a "top-down" activity is to influence research funders to place higher demands on future open access to data upon completion of studies. Another example is to provide means and support to researchers through the whole research process, e.g. with interpretations of different legal aspects of open access. An example of a "bottom-up" activity is to be present in different research contexts and "missionize" the benefits of sharing data.
Changing Laws in the UK: The New Statistics and Registration Services Act
Tanvi Desai (London School of Economics)
This paper will outline the new UK Statistics and Registration Services Act, giving details of the new legislative framework, and examining how the Act affects the status of research data users. The Act is the first of its kind in the UK and has led to the creation of the independent UK Statistics Authority, and has made researcher access to data a statutory function of the Office for National Statistics. The paper will also look at the opportunities this might present for improving researcher access to data.
Richard Wiseman (ESDS International; Mimas, University of Manchester)
Celia Russell (ESDS International; Mimas, University of Manchester)
2009-05-27: C3: Mobilizing Data in the Learning Environment
Infrastructure for Statistics Education in Russia
Anna Bogomolova (Moscow State University)
Tatyana Yudina (Moscow State University)
Statistical knowledge is considered one of the main competences of next-generation administrators, managers and specialists in all fields, underpinning a country's economic and social progress, its competitive position in future decades and the successful personal career of a citizen. One by one, states declare statistical education a national priority and launch programs to promote teaching and training in data understanding and analysis. Following other countries, the Russian university community has put forward an initiative to develop statistical culture in society, as a first step towards building a modern information infrastructure. International experience shows that education may start at school, continue through all school years, and be carried on at college or university and then at post-university training centers. When started at school, statistical education may bring important social benefits if a pupil can access and exploit an information base that maintains real economic, social and demographic data and portrays the place where the school child lives. The University Information System RUSSIA (http://uisrussia.msu.ru) is intended to serve as one component of a statistical infrastructure: it provides databases that update indicators at regional and local levels. Analytical instruments and tutorials are implemented. Graphics- and map-based data representation is provided. A modified module-based teaching program in statistics is being worked out to serve all levels of education - schools, graduate and postgraduate university courses, and training for specialists at government agencies and public institutions.
Bringing Data to Undergraduate Classrooms: The Social Science Data Analysis Network (SSDAN) and ICPSR's Online Learning Center (OLC)
Lynette Hoelter (University of Michigan)
John P. DeWitt (University of Michigan)
Quantitative literacy and quantitative reasoning have become buzzwords on many campuses, and few disciplines are as well-suited to building students' skills in these areas as are the social sciences. Introducing students to data in their early courses, when the focus is mainly on substantive topics, also gives them a more realistic picture of how social scientists work, preventing some of the disconnect often felt as students move from substantive courses into research methods and statistics. Instructors face many challenges in using real data in these early classes, however. Difficulty in identifying relevant data, the need to simplify data for analyses, and the desire to avoid teaching statistical packages at these early stages are common obstacles instructors describe. This presentation will describe the SSDAN and OLC tools, which aim to make it easier for instructors to bring data into lower-division social science courses. SSDAN makes available new data from the Census Bureau's American Community Survey, and the OLC pairs substantive concepts with relevant data, both using a Web-based interface and teaching modules that are ready to use with little additional instructor preparation. These tools support quantitative literacy skills by exposing students to the world of data.
2009-05-27: C4: Data Sharing Across the Disciplines
Session theme/info: This three-part session will explore differences in data sharing among disciplines, with a specific focus on how the nascent open data movement in the sciences portends the emergence of a data sharing culture much different from that found in the social sciences. The role of data specialists in promoting unfettered sharing of research data is explored. Part One will examine the theoretical underpinnings of data sharing within several disciplines and provide empirical evidence - from an examination of dissertations (with particular focus on access to raw data that is supplemental to published dissertations) - to test our hypothesis that a culture of data sharing is less evident in life sciences and physical sciences than it is in social sciences. Part Two offers insights from two data librarians who - as front-line data-sharing intermediaries (largely in the social sciences) - will draw upon experiences and observations to offer practical tips for promoting and advancing a data-sharing culture. This will include suggestions for guiding emergent data developers in the sciences towards an ideal model of data sharing. Part Three will engage the audience in a lively discussion about advancing the role of data specialists as full participants in support of the growing data-sharing culture across disciplines.
2009-05-28: D1: Tag - You're it! DDI Applications and Experiences
Finding the Right Tags in DDI 3.0: A Beginner's Experience
Claudia Lehnert (Institute for Employment Research of the Federal Employment Agency, IAB)
The Institute for Employment Research (IAB) is associated with the Federal Employment Agency in Germany. We decided to establish a new database organized according to the DDI 3.0 standard for our documentation. One initial task was to build the DDI 3.0 structure for our most complex data example, to make sure that all needed metadata would be supported. For this purpose the Integrated Employment Biographies (IEB) were selected. The IEB is a merged dataset, containing data on employment, benefit receipt, participation in measures of active labour market policy, and jobseeker status. These data come from many different sources, which produces many difficulties. For example, harmonization of some variables is not possible, because the values depend on the sources and are collected in different ways. How to capture such things in DDI 3.0 is not obvious, and some difficulties were encountered. This presentation focuses on how these types of challenges were resolved, to allow the DDI 3.0 model to be used with our documentation systems.
EduDDI: An application of DDI 3.0 for Large-scale Assessments in Education
Martin Mechtel (German Institute of Educational Progress (IQB))
For the monitoring of educational systems, large-scale assessments of students' achievements are conducted. The institutions involved in these studies need IT systems for the following steps: development of hierarchically structured questions, generation of booklets and codebooks, coding/rating, data processing, analyses, dissemination of datasets and reporting. In order to improve the efficiency of these systems, the reuse of questions and the analysis of one question throughout the whole set of studies is a crucial demand of the researchers. Therefore, any system used in that context must offer an easy link between data sets and the question bank(s). At the German Institute for Educational Progress (IQB), a proprietary IT system for the generation of booklets has been developed and used for 4 years. This system will be expanded to cover the management of datasets as described above. The internal data structures and the tools, however, will implement DDI 3.0 to ensure sustainable and nationwide utilisation. The presentation covers the following issues: institutional arrangements to ensure broad-based support, the first sections of the DDI model developed, the architecture and the first software tools.
Slovak Archive of Social Data (SASD) and Its Experiences with Implementing DDI 3.0
Juraj Svec (Slovak Archive of Social Data, Department of Sociology (CU))
The Slovak Archive of Social Data was established in 2004 with the aim of collecting, preserving and disseminating data and metadata from sociological research for non-commercial purposes such as research and teaching. The archive is open to all kinds of data but, in its present initial stage, is actively gathering data mainly from international research programmes (e.g. ISSP, EVS, ESS). The main tool for dissemination is our website, where access to (meta)data is provided by two means: users can browse the catalogue of archived surveys or search the data using keywords, as well as an advanced search. At present SASD uses DDI 1/2 for archiving metadata, and migration to DDI 3.0 is now taking place. Part of the presentation covers experiences with using this new schema. The schema is suitable for the exchange of metadata between archives, so discussion may arise on this topic concerning the efficiency of documentation work. For example, information about a survey and the core parts of variables can be shared among national archives (e.g. basic documentation of ISSP modules). So when one archive documents this core part, other archives may use this documentation.
Continuation/update/report from the robust future of IASSIST-Outreach session from Stanford
Paula Lackie (Carleton College)
Lynn Woolfrey (Data First, University of Cape Town)
This session will serve as a platform to report on the ongoing discussions for IASSIST Outreach and provide a venue to continue the conversation among interested attendees - especially those from outside of North America and Europe.
2009-05-28: D3: Ideal IRB
Ethics Review and Data Archiving
Arja Kuula (Finnish Social Science Data Archive, FSD)
2009-05-28: D4: Just the Same, but Different: Comparison Across National Contexts
Marching to the Same Drummer: An Overview of a Proposed IHSN (International Household Survey Network) Open Source Question Bank
Mark McConaghy (Department for International Development, UK)
Survey takers must ask similar and culturally appropriate questions in order to ensure that they measure the same phenomena. In most countries, sample surveys, censuses and administrative reporting systems measure the same variables (e.g., demographics, income, labour activities, health characteristics, etc.), but often with slight differences which result in non-comparability of data across sources. In this presentation, the IHSN will provide an overview of a question bank which aims to address issues of metadata accessibility and quality by using a standard and comprehensive XML template to hold information on questions, classifications, concepts and indicators from a range of official sources. The Question Bank will serve as a central resource for data producers and other stakeholders to share information in a consistent way. It is hoped that this question bank, built using open source software, will support improvements in data comparability within countries, as users have easy access to recommended approaches and can consider modifications to new and existing instruments.
For repeated comparative surveys like the European Values Study (EVS), conducted in 1980, 1990, 1999, and 2008, metadata management becomes more and more challenging. The EVS is a longitudinal survey research program, now carried out in 46 countries under the responsibility of the European Values Study Foundation. The original language documentation was started in cooperation with the EVS countries, CEPS Luxembourg, and EVS at Tilburg University, with the aims of assisting the primary investigators with the development and translation of the questionnaire for the 2008 wave and of supporting user-friendly comparisons of the wording of questions and answers used in the different languages for comparative analyses. For this purpose a procedure was developed to document the original languages. The documentation process has been supported by two programs developed by GESIS: the Dataset Documentation Manager (DSDM), which allows language-specific documentation at the variable level and export into a DDI 2 compatible format, and the CodebookExplorer (CBE), a special tool to manage complex data and metadata. The translation process was carried out with a web-based translation system (WebTrans) provided by Gallup Europe. The original language documentation will be published in the GESIS Online Study Catalogue ZACAT (a Nesstar server).
Session theme/info: This panel seeks input from the IASSIST general membership about the 2010-2015 Strategic Plan. Since IASSIST 2008, the Strategic Planning Action group has conducted a series of surveys trying to understand members' impressions of the current state of the organization as well as future opportunities for IASSIST. This session will present information about the Strategic Planning Process, share results from a survey of the IASSIST membership, and engage the membership in a discussion about next steps in creating IASSIST's new strategic plan.
IASSIST Futures: A Discussion Panel on IASSIST Strategic Planning and Organization
Joel Herndon (Duke University/IASSIST Administrative Committee)
Thomas Lindsay (University of Minnesota)
Bill Block (CEISER, Cornell University)
Melanie Wright (UKDA)
San Cannon (Federal Reserve Board)
2009-05-28: E2: United States Information Agency's Historical International Data (withdrawn)
2009-05-28: E3: Qualitative Data: Understandings, Tools and Strategies for Sharing
It's about Relationships, It's about Ethics, It's about Respect: Qualitative Researchers' Understandings of Their Practice and the Implications for Data Archiving and Sharing
Lynda Cheshire (University of Queensland)
Alex Broom (University of Sydney)
Michael Emmison (University of Queensland)
With qualitative data archiving emerging as a distinct possibility in Australia, the practices and 'use' of qualitative research are coming under increased scrutiny and reflection. The recent development of a qualitative data archive (AQuA) by the Australian Social Science Data Archive (ASSDA) provides an opportunity for qualitative researchers to reflect, not only upon the feasibility of qualitative data archiving, but on the core assumptions of their work and the extent to which qualitative data lend themselves to sharing and secondary analysis. Drawing on a series of focus groups with qualitative researchers, we critically explore the meanings ascribed to qualitative research practice and the perceived challenges posed by contemporary technological innovations in data management, access, and analysis. As well as raising concerns over the ethical and intellectual property implications of data sharing, focus group participants frequently referred to the uniqueness of qualitative research as an artistic endeavour that is both personal and solitary, yet at the same time involves special relationships with participants, data and research partners. The accounts presented provide insight into key debates (and divergences) within the qualitative community regarding the values and meanings of qualitative practice, but also how data archiving may come to challenge these core values.
Dimitris Vonofakos (UK Data Archive, ESDS Qualidata)
Richard Deswarte (presenter) (UK Data Archive, ESDS Qualidata)
Qualidata at the UK Data Archive has recently completed two on-line teaching resources, specifically designed to assist qualitative methods teaching. The first resource distinguishes different types of interviewing, whilst highlighting and promoting some of the most important collections held in the archive. It offers summaries of seven distinct interview types: structured, unstructured, semi-structured, feminist, psycho-social, oral history and life story interviews. Each typology begins with a summary of what characterises that particular type of interview and is illustrated by selected extracts from some of the most interesting studies held in the UKDA. The second resource looks at five non-interview qualitative methods, including focus groups, the written word, ethnography, visual methods and the Internet, once again using examples from archived collections. This paper provides an introduction to Qualidata's teaching resources and demonstrates their potential uses for researchers and methods teachers. It highlights the challenges to reusing qualitative collections for teaching purposes, such as issues of distinguishing research styles, the dissemination of the collections and resources, gaining permissions from depositors, and addressing issues of confidentiality and anonymisation. It further discusses the complex interplay between qualitative methods and data collection.
Life-Cycle & Comparative Study Types: Metadata Needs of the Future CESSDA RI
Uwe Jensen (GESIS - Leibniz Institute for the Social Sciences)
Particular attention is given to researchers' expectations and demands on metadata for specific study types, and options and barriers in providing substantive context information are discussed. Both appear relevant to finding available and relevant data as a basis for using them in comparative research. Against the background of a study life-cycle perspective within the CESSDA research infrastructure (RI), the presentation will focus on related metadata needs for different complex study types. Use cases, e.g. on EB trends, ISSP, ESS and BHPS, will highlight the metadata specifics of these study types. Along with best practice in documenting the respective studies and their data, the use of the new DDI3 standard is of specific interest for a developing research infrastructure along the study life-cycle, from survey design to the publishing, dissemination and reuse of the data.
Session theme/info: Over the past five years government research and higher education funding institutions have 'discovered' many aspects of data and its uses that the IASSIST community has known for over thirty years: data is a valuable commodity, a source of new knowledge and the basis for international research efforts, and it therefore requires sound policies to preserve it and manage its use and re-use. Thus, there is considerable irony in the fact that just as high-level policy bodies at the national and international level are coming to this realisation, the comfortable and predictable environments in which many of us work are being challenged on a number of fronts. For example, the 'google' world leads to naïve assumptions about discoverability of data no matter where it is housed. But of what quality? New tools and technologies have thrown up a multiplicity of choices for storing, retrieving and analysing data. But can these multiple stores, formats and standards 'talk' to each other? This paper explores how the environment in which we work is changing; the need to take advantage of, and adapt to, the growing recognition of the importance of data; and the need for greater international efforts to maintain the strong, professional standards and practices built up within the IASSIST community.
Data Archives in the 21st Century: Evolving, Adapting or Endangered?
Deborah Mitchell (Australian Social Science Data Archive)
Ben Evans (ANU Supercomputer Facility)
2009-05-29: F1: Foundations First: Laying the Groundwork for Building Partnerships First
Cozying up to the CODATA Elephant: Some Ideas for IASSIST Outreach
Ernie Boyko (Carleton University)
One of the thrusts of the IASSIST strategic plan (2004-2009) was to encourage "collaborations and strategic alliances with related organizations". CODATA is one of the organizations with which IASSIST has started to collaborate. The mission of CODATA is to strengthen international science for the benefit of society by promoting improved scientific and technical data management and use. Sounds a lot like IASSIST! Thus far, collaboration has consisted mainly of having IASSIST members attend the annual CODATA conference and having CODATA members attend IASSIST conferences. The question is... what else can we do to strengthen this relationship? Wendy Watkins and Ernie Boyko from Canada have been officially invited to be observers on the Canadian National Committee for CODATA. This presentation will outline a possible strategy for IASSIST members to use in the context of their National and Regional CODATA committees. By working individually, IASSIST members may be able to strengthen IASSIST's overall connection with CODATA.
The Data Archive Technologies Alliance: Looking towards a Common Future
Myron Gutmann (ICPSR, University of Michigan)
For more than a decade social science data archives and other providers of social science data have used the new technologies of the World Wide Web and advanced programming systems to integrate their internal workflows and systems for delivering data and other content to their users. In this environment two trends emerged: individual, proprietary systems custom-developed for individual archives (such as that at ICPSR, the IPUMS project, and many others), and general-purpose systems that were designed to be installed and used in many archives (such as Nesstar and the DataVerse Network). The proliferation of such systems, and the increasing cost of developing and maintaining them, has led to proposals for increased open source and community development approaches that would allow archives to share a common architecture and common tools, while allowing extensive customization of workflows and data delivery systems. In October, 2008, a group met to discuss the creation of a Data Archive Technologies Alliance. This group will report on its activities at IASSIST, focusing on a survey of technology needs among data archives and a list of priority activities for the Alliance as it moves forward.
Scoping and Developing Institutional Data Services: The Data Libraries of 2020
Luis Martinez-Uribe (University of Oxford, Oxford e-Research Centre)
Research methods are experiencing a revolution due to the emergence of infrastructure and tools that empower scholars to conduct research in novel ways. This brings an increased production of digital research data that has raised alarms among research funders and academic institutions. Many research funders in the UK require data produced as part of the research process to be made available on request and expect data management plans to be included with funding applications. Although the UK is rich in domain-specific national data services, many disciplines do not have the support and infrastructure required for data collection, access and preservation. Therefore, academic institutions need to help their researchers comply with funding requirements as well as provide them with the means to participate in a new data-centric research world. Data libraries are great examples of institutional data support, and now their services need to evolve to serve the future needs of scholars. This presentation will explore these issues and describe the work carried out in Oxford to scope the requirements for services to manage and curate research data, as well as to develop some of these services.
Establishing Trust in Data Curation: OAIS and TRAC applied to a Data Staging Repository (DataStaR)
Gail Steinhart (Albert R. Mann Library, Cornell University)
DataStaR (http://datastar.mannlib.cornell.edu/), a Data Staging Repository developed and maintained by Cornell University's Mann Library, was designed as a platform and related services to support data sharing among collaborators, as well as the eventual publication of data to permanent, domain-specific repositories and institutional repositories. As a staging repository, providing temporary storage for data (whether preliminary or final), DataStaR assumes no long-term responsibility for preservation of content. However, because one of the goals of DataStaR is to facilitate the publication of data to permanent, external repositories, it is critical that DataStaR's operations are consistent with digital preservation best practices. Toward that end, we consider how DataStaR's design and function map to the OAIS reference model, and use the Trustworthy Repositories Audit & Certification (TRAC): Criteria and Checklist as a framework for specifying system, policy, and documentation requirements to ensure that DataStaR is a responsible partner in the entire chain of preservation activities. We present a description of this process and a summary of the types of elements that are most important in a staging environment.
2009-05-29: F2: Protecting Privacy While Preserving Access: Restricted Use Data and Disclosure Considerations
Strengthening the Production of Public Use Microdata Files: Better Tools for Anonymizing Census and Survey Data
Olivier Dupriez (World Bank - IHSN)
Geoffrey Greenwell (PARIS21 Secretariat, OECD)
Public use microdata files (PUMFs, PUFs or many other acronyms) are the bread and butter of many data centres and are essential tools for research and teaching. To date, the production of such files has been a challenging and time-consuming task for data producers. This makes it an expensive step which is often not undertaken by data producers. Intuitively, the anonymization of data files should lend itself to the use of computer-based tools to facilitate this process. The International Household Survey Network (IHSN) has established a task force which is investigating the development of a series of tools to aid data producers in measuring the disclosure risk associated with a file, identifying ways of reducing this risk and assessing the information loss that results from implementing the disclosure limitation procedures. Rather than an integrated software tool, the task force is working towards a modular, coherent set of tools to achieve the above goals. By building on work by others, they may even be able to develop an "intelligent" system. This presentation provides an overview of the work of this task force.
An Integrated System for Handling Restricted Use Data
Felicia LeClere (ICPSR)
The volume of restricted use data files distributed by data archives and data producers has increased dramatically in the last 10 years. As data files become more complicated with the addition of georeferencing, biomarkers, linked administrative records, and other information, disclosure risk has increased dramatically. The most popular solution for distributing highly confidential data is to issue legal restricted use contracts to users. The handling of confidential data and the distribution of restricted use contracts, however, have not kept pace with other developments in data distribution. In this paper, we will explore several linked initiatives at ICPSR designed to streamline the handling, processing, and distribution of restricted use data, and report on our progress in redesigning the entire system. Our internal data processing steps will include segregating and streamlining data handling for confidential data through a CITRIX system. We are also creating an automated contracting system to handle the distribution of all of our restricted use files through a secured download system.
In the light of heightened concern around data security, this paper will highlight some of the measures that can be used to develop and strengthen security in data archiving. We will discuss the different approaches which can be taken towards the construction of firm and resilient data and information security policies within the social science data archiving communities. While international standards can provide theoretical guidelines for the construction of such a policy, procedures need to be informed by more practical considerations. We will draw attention to the necessity of following a holistic approach to data security, which includes the education of data creators in the reduction of disclosure risk, the integration of robust and appropriate data processing, handling and management procedures, the value of emerging technological solutions, the training of data users in data security, the importance of management control as well as being informed by emerging government security and digital preservation standards.
2009-05-29: F3: Beyond and Behind the Numbers: Metadata, Codebooks and Publications
Uncovering the Pitfalls of Enhanced Publications
Maarten Hoogerwerf (Data Archiving and Networked Services, DANS)
Researchers are discovering the enormous potential of the Internet and want to use it to enhance their publications with additional resources such as research data or visualizations. There are currently many ways to construct such 'enhanced publications', but there remain many difficulties that need to be solved before these enhanced publications can be safely implemented on a large scale. DANS, in cooperation with partners from the SURFshare programme, has built a demonstrator of enhanced publications. The demonstrator gives examples of enhanced publications from different scientific disciplines and shows how OAI-ORE can aggregate the different resources of an enhanced publication and how these aggregations can be transformed into user-friendly web pages that allow researchers to view and navigate between them. The goal of this demonstrator is twofold: triggering researchers to think about their actual needs by showing them what is possible, and making repository managers aware of the difficulties that have to be dealt with before enhanced publications can become a common way of publishing. This paper will give an overview of these difficulties.
Adding to the Toolbox: Creating and Maintaining a Searchable Database of Events
Timothy Mullen (Federal Reserve Board)
Economists at the Federal Reserve Board are tasked every day with explaining the underlying causes of shifts in data, yet they do not have at their disposal all the tools necessary to efficiently perform these tasks. A glaring omission in their stable of available resources is a system which allows for researching events in a fast, organized, and succinct manner. We plan to solve that problem by creating a database of events categorized by date, type (financial, political, economic, etc), sector (labor, energy, prices, etc) and sub-sector (unemployment, crude oil, CPI, etc). Using established guidelines for event inclusion while also having strong metadata requirements we can open up a wealth of information for, and greatly reduce the search costs paid by, the user. An open-source application is being developed which will allow for the visualization of a timeline of events which can then be plotted against the time-series data in question. This paper will describe the efforts currently under way at the Federal Reserve Board to create and maintain a searchable database of events aimed primarily at assisting the work of research and forecast economists.
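The categorization described in the abstract above (date, type, sector, sub-sector) can be pictured with a small sketch. This is an illustration only, not the Federal Reserve Board's system: the `Event` record, the `find_events` helper, and the sample records are all hypothetical names and placeholder data introduced here.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """Hypothetical event record using the four categories named in the abstract."""
    day: date
    kind: str         # "financial", "political", "economic", ...
    sector: str       # "labor", "energy", "prices", ...
    sub_sector: str   # "unemployment", "crude oil", "CPI", ...
    summary: str


def find_events(events: List[Event], start: date, end: date,
                kind: Optional[str] = None,
                sector: Optional[str] = None,
                sub_sector: Optional[str] = None) -> List[Event]:
    """Return events dated within [start, end] that match every given category, oldest first."""
    def matches(e: Event) -> bool:
        return (start <= e.day <= end
                and (kind is None or e.kind == kind)
                and (sector is None or e.sector == sector)
                and (sub_sector is None or e.sub_sector == sub_sector))
    return sorted((e for e in events if matches(e)), key=lambda e: e.day)


# Placeholder records for illustration only; they are not real events.
EVENTS = [
    Event(date(2009, 3, 2), "economic", "energy", "crude oil", "placeholder event B"),
    Event(date(2009, 1, 15), "economic", "energy", "crude oil", "placeholder event A"),
    Event(date(2009, 2, 1), "political", "labor", "unemployment", "placeholder event C"),
]

# All energy-sector events in the first quarter, ready to plot against a time series.
energy_q1 = find_events(EVENTS, date(2009, 1, 1), date(2009, 3, 31), sector="energy")
```

Returning the matches in date order is a design choice made for this sketch, so that the result can be laid directly onto a timeline.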
Linda Powell (Board of Governors of the Federal Reserve System)
When the Board of Governors of the Federal Reserve System needed to create a new metadata system, they decided to start with a time-tested foundation: Dublin Core. The U.S. Central Bank consumes a variety of metadata, including metadata that defines collections of data and metadata that describes variable level data. This paper discusses the challenges and advantages of using an international standard. It follows the processes used to create collection level metadata and variable level metadata, and to retrofit existing metadata, to make it all usable by economists and financial analysts.
CoSSI - Codebook for Statistical Information or Something More?
Heikki Rouhuvirta (Statistics Finland)
The starting point of the presentation is to examine the character of statistical information, the way in which data and metadata are interconnected in statistical information and how the entity formed by them can be modelled. The focus of interest is statistical information itself - what we actually mean when we talk about statistical information. CoSSI (Common Structure of Statistical Information) is a model created for statistical information. Within its framework different parts of statistical information are combined conceptually as one complete entity. The CoSSI model defines the structures of statistical data (matrices and tables), statistical metadata, quality declarations and publications. XML DTDs have been selected as the technical means for implementing these structures. The CoSSI model also contains the language versioning necessary for statistics in international use. After a short introduction to the CoSSI model, its usability is examined in the production and dissemination of statistics, as well as in the scientific research use of statistical data. Finally the relation and compatibility of CoSSI and DDI are examined.
2009-05-29: F4: Building on Data: Resources, Tools and Applications
The Good, the Bad and the Ugly of Playing a Data Custodian
Chiu-Chuang (Lu) Chou (Data and Information Service, University of Wisconsin, Madison)
The National Survey of Families and Households (NSFH) is a prominent longitudinal study on family life. NSFH was funded by the National Institute of Child Health and Human Development (NICHD) and the National Institute on Aging (NIA). Federal grants for NSFH totaled 14.5 million dollars. Three waves of surveys were conducted in 1987-1988, 1992-1994 and 2001-2003. According to the ICPSR Related Literature database, there are 1,053 publications based on NSFH data. Researchers continue to use NSFH to study family living arrangements, marriage, cohabitation, fertility, parenting relations, kin contact and economic and psychological well-being. The Center for Demography of Health and Aging (CDHA) took over user support for NSFH after this project ended in summer of 2006. Without the expertise of the original NSFH project team, how does CDHA staff help NSFH researchers? In this paper I will share the challenges and rewards we have in providing user support for the complex NSFH studies. Our enhancements to the dissemination of NSFH data using an online analysis tool will be discussed. A plan for repurposing NSFH data in the future will also be presented.
Hidden in Plain Sight: Creating County-Based Data with Public Use Microdata Areas
Lisa Neidert (Population Studies Center, University of Michigan)
Researchers are requesting geographically-referenced data, particularly at the sub-state level. There is an array of resources to fulfill these requests. This paper describes a 'hidden' source of county level data; presents a tool that allows users to extract user-defined characteristics; and illustrates a reverse use of this tool to create reliable characteristics for sparsely populated counties. The American Community Survey has generated county-based data since 2005. However, the only counties included in the data release are counties with populations of 65,000 or more. While this population cut-off includes over 80 percent of the US population, it only incorporates 783 of 3,141 counties. Data from the census has typically included all counties but access problems remain. Sometimes a researcher needs a measure not included in the summary files. In addition, there are some sparsely populated counties. Users would benefit from more reliable statistics for these counties. The lowest unit of geography in Census/ACS microdata is the public use microdata area (PUMA). Using the mapping between PUMAs and counties, we translate PUMA-based statistics into county-based statistics. Likewise, when counties are too small to generate reliable statistics, we combine data for these counties based on their PUMA boundaries to create pseudo county-based measures.
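One way to picture the PUMA-to-county translation described above is a population-share crosswalk. The sketch below is not the authors' tool: the PUMA codes, county codes, counts, and allocation shares are invented for illustration, and the proportional allocation assumes the characteristic is spread evenly within each PUMA.

```python
from collections import defaultdict

# Hypothetical PUMA-level tabulations from microdata:
# (persons with the characteristic, total persons).
puma_counts = {
    "P1": (200.0, 1000.0),
    "P2": (300.0, 2000.0),
}

# Hypothetical crosswalk: share of each PUMA's population living in each county.
# The shares for any one PUMA sum to 1.
puma_to_county = {
    "P1": {"A": 1.0},
    "P2": {"A": 0.25, "B": 0.75},
}


def county_rates(counts, crosswalk):
    """Allocate PUMA counts to counties by population share, then form county rates."""
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for puma, (with_characteristic, total) in counts.items():
        for county, share in crosswalk[puma].items():
            numerator[county] += with_characteristic * share
            denominator[county] += total * share
    return {county: numerator[county] / denominator[county] for county in denominator}


rates = county_rates(puma_counts, puma_to_county)
```

In this made-up example county A draws on all of P1 and a quarter of P2, so its rate is 275 / 1500, while county B is covered entirely by P2 and inherits P2's rate of 0.15. The second case mirrors the pseudo county-based measures mentioned in the abstract, where a small county takes on the statistics of the PUMA that contains it.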
Nereus is a consortium of prestigious European libraries in the world of academic economics. At present the consortium concentrates its efforts on an EU-funded project called NEEO (Network of European Economists Online), which will address the lack of integration of academic output by creating a powerful new research tool called Economists Online. This tool will give access to 50,000 journal articles, working papers, book chapters, conference proceedings and primary datasets of leading European economists. One of the work packages of the project is entirely devoted to datasets. The three main objectives of this work package are: to disclose and link the research data of publications of leading economists in Europe on the Internet; to make these datasets openly accessible and freely available; and to make an inventory of the problems involved in the disclosure of primary research data. To do so, NEEO partners will store datasets in their Institutional Repositories, describe the datasets to the DDI standard and harvest the metadata for the Economists Online portal. The NEEO data workgroup will use the results of the enriched publications project "Together in Sharing", which was presented at last year's IASSIST conference at Stanford.
The Value of Public Sector Data and Information to Civil Society Organizations in South Africa: Evidence from the Fight to Alleviate Poverty
Raed Sharif (School of Information Studies, Syracuse University)
Public sector data and information (PSDI) are considered by many to be a strategic resource, potentially needed at all levels of society, by different communities. This presentation reports on the preliminary findings of an investigation of the ways in which these resources are utilized by South African Civil Society Organizations (CSOs) to increase their effectiveness and add value to their efforts to alleviate poverty. The study draws upon literature from organizational studies and information policy. The concepts of value of information (Parker & Houghton, 1994), management of external information (Sammons, 2005), absorptive capacity (Cohen & Levinthal, 1990), organizational learning (Argyris & Schön, 1978), and organizational innovation (March & Simon, 1958) are used to guide my inquiry to demonstrate the value of PSDI to South African CSOs through describing and explaining the processes to identify, acquire (including factors that facilitate or hinder access and acquisition), assimilate, and exploit this strategic resource. It is expected that the discussions and findings of this study will have theoretical and policy contributions, and will be of special importance to the government of South Africa (and hopefully governments in other developing countries), the CSOs fighting against poverty, and subsequently to the people of South Africa.
2009-05-29: G2: Making Space: Issues in Linking Data and Geographies
GeoConvert: Creating that Spatial Relationship
David Rawnsley (Mimas, University of Manchester)
GeoConvert is a web based service allowing the matching and conversion of geocoded data to other geographic area types, including those from different years. Geographic area types abound (Counties, Postcodes, Output Areas, NUTS areas, and so on) and are ever changing: every few years the boundaries get redrawn or someone creates a new type of geographical area. What is the social science researcher to do? At what level should they collect their data or do their analysis? Most people know their Postcode, but very few know what Census Output Area they live in. GeoConvert enables users to convert their data to numerous other geocoded datasets, no matter what geographic level it has been collected at. UK Census geographies can be connected to European Union Eurostat data at NUTS level, or Postcodes can be matched to Primary Care Trust health data. GeoConvert opens up social science dataset usage to new users such as Widening Participation and Further Education by making it easy to link datasets to Postcodes, a common, easy to use geographical identifier. By reducing the technical barriers to geocoded data and allowing connection to historical geographies, GeoConvert encourages the re-use of pre-existing datasets.
Standards Based Services for Dissemination and Processing of Geospatial Data: An Example Using the UK Census
James Reid (University of Edinburgh)
This paper will report the findings of the Data Integration and Dissemination (DIaD) project, which is investigating the potential of using international open standards based techniques (Open Geospatial Consortium) to perform data linkage between two of the most heavily used UK academic census outputs - the aggregate statistical data and the output geographies. The primary objective of this work is to develop a data dissemination model which demonstrates a more generic capability - that of 'geo-linking'. This provides the ability to separate census statistical data (for example, but other geospatially-linked data are equally capable of utilising this approach) and the boundary (geometry) data to which it relates. Geo-linking allows for distributed, multi-source datasets to be seamlessly linked in a fashion that facilitates data separation for management and administration purposes. In essence, the approach proposed will provide an extensible infrastructure applicable not only to the immediate needs of the UK Census Programme but also more widely to a broader range of use cases. Additionally, using the same standards based approach, the project will aim to demonstrate how further value added processing can be invoked by transforming the geo-linked outputs through a series of ancillary web processing services.
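The core idea of geo-linking, keeping statistics and boundary geometry separate and joining them on a shared area identifier, can be shown with an in-memory sketch. It does not use or imitate any Open Geospatial Consortium service interface: the `geolink` function, the `area_code` key, the GeoJSON-like structure, and the sample values are all assumptions made for illustration.

```python
import copy

# Hypothetical boundary data held by one provider (geometry only, GeoJSON-like).
boundaries = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"area_code": "X1"},
         "geometry": {"type": "Point", "coordinates": [0.0, 0.0]}},
        {"type": "Feature", "properties": {"area_code": "X2"},
         "geometry": {"type": "Point", "coordinates": [1.0, 1.0]}},
    ],
}

# Hypothetical aggregate statistics held separately, keyed by the same area codes.
statistics = {"X1": {"population": 120}}


def geolink(boundary_data, stats, key="area_code"):
    """Attach statistics to matching features; return the linked copy and unmatched codes."""
    linked = copy.deepcopy(boundary_data)  # leave the boundary source untouched
    unmatched = []
    for feature in linked["features"]:
        code = feature["properties"][key]
        if code in stats:
            feature["properties"].update(stats[code])
        else:
            unmatched.append(code)
    return linked, unmatched


linked, unmatched = geolink(boundaries, statistics)
```

Because the join happens only at request time, each dataset can be managed and updated by its own custodian, which is the separation the abstract describes.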
2009-05-29: G3: Building Data Archives and User Communities: Greece, Estonia and Ethiopia
Developing of Data Archiving and Dissemination System at the CSA
Yakob Seid (Central Statistical Agency, CSA)
The Central Statistical Agency of Ethiopia is responsible for providing accurate and timely statistical information for development planning and monitoring purposes. To fulfil this responsibility, CSA has been utilizing Information Communication Technology to facilitate its data processing, archiving and dissemination system so that the required statistical information can be generated and reach the users. CSA is considered one of the leading institutions in Ethiopia in utilizing ICT to accomplish its basic tasks. The CSA started its computerized statistical data production by utilizing the IBM System/3 with 12k CPU. Today, the agency handles its statistical data archiving and dissemination system through high capacity servers and a reliable network infrastructure. The paper based dissemination and restricted access to the CSA data has undergone very significant improvements. The DDI application has tremendously improved the metadata documentation and enables the CSA metadata to be archived and disseminated in an internationally accepted standard. Utilization of GIS in providing easy access of data to decision makers has shown a very significant improvement as well.
What Do Researchers Look for in Archives? Data and Metadata on User Requests, and After Service Tracking in the Case of an Emerging Data Sharing Culture
Chryssa Kappi (Greek Social Data Bank, National Centre for Social Research)
Collecting, Visualizing, Communicating, and Modeling Spatial Data in the Social Sciences
Dr. Michael Batty (Centre for Advanced Spatial Analysis, University College London)
Andrew Hudson-Smith (Centre for Advanced Spatial Analysis, University College London)
Andrew Crooks (Centre for Advanced Spatial Analysis, University College London)
Richard Milton (Centre for Advanced Spatial Analysis, University College London)
Duncan Smith (Centre for Advanced Spatial Analysis, University College London)
New web technologies and task specific software packages and services are fundamentally changing the way we share, collect, visualize, communicate and distribute geographic/spatial information. Coupled with these new technologies is the emergence of rich, fine scale and extensive spatial datasets of the built environment. Such technologies and data are providing opportunities for the social sciences that were unimaginable ten years ago, particularly new forms of modeling and simulation. In this paper, we discuss such changes from our own applications, which are developed in a research context that emphasizes dissemination using Web 2.0 technologies. Specifically, we illustrate how it is now possible to harness the crowd to collect people's opinions about topical events such as the current financial crisis in real time and map the results, through the use of our GMapCreator software and the MapTube website, infrastructure for social science developed under the banner of the UK's National Centre for e-Social Science. Furthermore, such tools allow for widespread dissemination and visualization of geographic data to whoever has an internet connection. We will explore how one can use new datasets to visualize the city, using as an example our 3-D GIS-CAD Virtual London model, which we can embed in web based software such as Google Earth as well as within conventional GIS and CAD software. Within the model, individual buildings can be tagged with multiple attributes, providing a lens to explore the urban structure and offering a plethora of research applications. Finally we turn to how one can visualize and communicate such data through low cost software and virtual worlds such as Crysis and Second Life, with a look into their potential for modeling. Our aim in the paper is to provide a perspective on new developments in spatial data and modeling and their dissemination within the social sciences.