Conference Presentations 2012

  • IASSIST 2012 - Data Science for a Connected World: Unlocking and Harnessing the Power of Information, Washington, DC
    Host Institution: National Opinion Research Center (NORC)

B1: Metadata Production Tools (Wed, 2012-06-06)

  • Keeping up with Colectica
    Jeremy Iverson (Colectica)


    Colectica is a suite of software for managing statistical data for a single user or a large institution. It is based on the DDI 3 metadata standard and provides tools for documenting data, importing metadata from existing sources, and publishing documentation on the Web and in other formats. In this session we will demonstrate features added to Colectica in the past year, including a doubling of the metadata content model's coverage, user-based access controls, integration with organizational user management systems such as LDAP and Active Directory, enterprise database support, streamlined synchronization and versioning for multi-user metadata editing, improved full-text search, a new Dataset Explorer view for Colectica Web, linked data publishing, and new releases of the free DDI Reader and DDI multiple-level Validation tools.

  • Metadata Editing Framework: An open source project for building metadata tools
    Jack Gager (Metadata Technology North America)


    As part of the DDI4RDC development project initiated in 2009, a set of editors and researcher tools based on the DDI-Lifecycle model was developed to meet the specific metadata documentation needs of the Canadian Research Data Center Network. As the project evolved, the core components of the platform matured into a flexible, generic framework on which many types of metadata management and user tools can be built. The core framework, as well as the RDC-specific editors, will be released as an open source project in the second quarter of 2012. Our presentation will detail the underlying principles and architecture of the framework, as well as the design of the RDC-specific editors. Particular attention will be paid to how these components can be leveraged by other organizations to develop metadata tools that meet their specific management and user needs. Examples will show how the framework is being used to develop stand-alone DDI editing tools, a data management plan documentation tool, and a researcher toolkit.

  • Tools for extracting ASCII data and DDI metadata from Stata, SPSS, and other proprietary statistical file formats
    Andrew DeCarlo (Metadata Technology North America)


    Statistical data are often stored in proprietary file formats such as SAS, Stata, and SPSS. While useful for processing and analysis, these formats make the data hard to access without the right software or utility, which often requires a commercial license. And although statistical packages are not particularly metadata aware, their files hold a significant amount of variable-level information. The ability to extract this information into DDI-friendly XML, complemented by summary statistics computed from the data, is therefore highly desirable (a sketch of such output follows below). Building on previous efforts, Metadata Technology North America has enhanced and developed new Java-based packages for reading Stata and SPSS files that can export the data in ASCII text format and extract variable-level DDI-Codebook and DDI-Lifecycle metadata (data dictionary and summary statistics). Various options are available for ASCII flavors and metadata generation, going beyond the typical export capabilities of statistical packages and utilities. This enables the conversion of data files into an open format combining ASCII and DDI, fit for long-term preservation, dissemination, or further processing by DDI-aware tools. Our presentation will provide an overview of these utilities, describe use cases, share lessons learned, and discuss future development.
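
    A minimal sketch of the kind of variable-level DDI-Codebook output described, assuming the dictionary information has already been read from the proprietary file. It uses only the JDK's StAX API; the variable name, label, and statistic are invented for illustration, and the MTNA readers themselves are not shown.

        import javax.xml.stream.XMLOutputFactory;
        import javax.xml.stream.XMLStreamWriter;
        import java.io.StringWriter;

        public class DdiVarWriter {
            public static void main(String[] args) throws Exception {
                StringWriter out = new StringWriter();
                XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
                // One variable's dictionary entry plus a summary statistic,
                // as they might be extracted from an SPSS or Stata file.
                xml.writeStartElement("var");
                xml.writeAttribute("name", "AGE");    // hypothetical variable
                xml.writeStartElement("labl");        // DDI-Codebook variable label
                xml.writeCharacters("Age of respondent in years");
                xml.writeEndElement();
                xml.writeStartElement("sumStat");     // statistic computed from the data
                xml.writeAttribute("type", "mean");
                xml.writeCharacters("42.7");
                xml.writeEndElement();
                xml.writeEndElement();                // </var>
                xml.flush();
                System.out.println(out);
            }
        }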


B2: Metadata Meets the Semantic Web (Wed, 2012-06-06)

  • How to Infer an Ontology of the Data Documentation Initiative?
    Thomas Bosch (GESIS - Leibniz Institute for the Social Sciences)


    In close collaboration with domain experts, ontology engineers have designed a first draft of an ontology describing the conceptual components of the DDI data model that are most relevant to queries and use cases. XML Schemas express the vocabulary and the syntactic structure of XML documents representing both DDI data and metadata. A huge number of XML document instances must be mapped to the RDF representation of the developed DDI domain ontology in order to reuse existing DDI data and metadata and to profit from the benefits of the RDF format. A manual mapping, however, requires a lot of time and is very error-prone, and is therefore not practical. The authors devised a generic approach that automatically converts any XML Schema to a generated ontology using XSLT transformations. Domain ontologies, such as an ontology of the DDI, can then be derived automatically from the generated ontologies using SWRL rules. As a consequence, all the information located in the underlying XML Schemas and associated XML documents is also expressed in the DDI ontology and its RDF representation. Subsequently, ontology engineers will add supplementary domain-specific semantic information, not represented in the XML Schemas, to the DDI ontology.
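
    As a minimal sketch of the XSLT step described above, the JDK's built-in transformer is enough to apply a schema-to-ontology stylesheet to an XML Schema. The file names (xsd2owl.xsl and the input and output files) are placeholders, not the project's actual artifacts.

        import javax.xml.transform.Transformer;
        import javax.xml.transform.TransformerFactory;
        import javax.xml.transform.stream.StreamResult;
        import javax.xml.transform.stream.StreamSource;
        import java.io.File;

        public class SchemaToOntology {
            public static void main(String[] args) throws Exception {
                // Apply a (hypothetical) XSD-to-OWL stylesheet to a DDI schema,
                // writing the generated ontology to a file.
                Transformer t = TransformerFactory.newInstance()
                        .newTransformer(new StreamSource(new File("xsd2owl.xsl")));
                t.transform(new StreamSource(new File("instance.xsd")),
                            new StreamResult(new File("generated-ontology.owl")));
            }
        }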

  • Linking Study Descriptions to the Linked Open Data (LOD) Cloud
    Johann Schaible (GESIS - Leibniz Institute for the Social Sciences)
    Benjamin Zapilko (GESIS - Leibniz Institute for the Social Sciences)
    Wolfgang Zenk-Moeltgen (GESIS - Leibniz Institute for the Social Sciences)


    The Data Catalogue contains the study descriptions for all studies archived at GESIS. These descriptions include information about primary researchers, research topics and objects, the methods used, and the resulting dataset. They are primarily used for archiving and retrieval. Beyond this, however, the existing metadata can be enriched with further information about the study content, investigators, involved affiliations, collection dates, and more from other sources such as DBpedia, GeoNames, or the Name Authority File (PND) of the German National Library. In this paper we present how to enrich a study description with datasets from the LOD cloud. To accomplish this, we expose selected elements of the study description in RDF (Resource Description Framework) using commonly applied vocabularies. This improves interoperability with other RDF datasets and hence the possibility of expressing links between them. For link detection we use existing algorithms and tools that are most promising in discovering relevant links to related data. Once links are detected, the study description is linked to external datasets and therefore holds additional information for the user, e.g. events that occurred before or during a study's collection dates and are relevant to its topic.
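
    A minimal sketch of the RDF exposure step, using Apache Jena and the Dublin Core terms vocabulary. The study URI, property values, and the GeoNames and DBpedia targets are invented examples, not actual Data Catalogue links.

        import org.apache.jena.rdf.model.Model;
        import org.apache.jena.rdf.model.ModelFactory;
        import org.apache.jena.rdf.model.Resource;
        import org.apache.jena.vocabulary.DCTerms;

        public class StudyToRdf {
            public static void main(String[] args) {
                Model m = ModelFactory.createDefaultModel();
                // A study description exposed with commonly used vocabularies.
                Resource study = m.createResource("http://example.org/study/ZA0000");
                study.addProperty(DCTerms.title, "Example Survey 2010");
                // Links into the LOD cloud: coverage via GeoNames, topic via DBpedia.
                study.addProperty(DCTerms.spatial,
                        m.createResource("http://sws.geonames.org/2921044/"));
                study.addProperty(DCTerms.subject,
                        m.createResource("http://dbpedia.org/resource/Unemployment"));
                m.write(System.out, "TURTLE");
            }
        }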

  • Accessing DDI 3 as Linked Data - Colectica RDF Services
    Dan Smith (Colectica)


    Linked Data represents an opportunity to publish the structured data of DDI 3 in a way that can be interlinked to become even more useful. Linked Data makes use of existing web technologies standardized by the W3C, such as HTTP and URIs, and extends them to make web information machine actionable. DDI 3 has a very robust information model containing many relationships between metadata items, as seen in DDI 3's use of references to enable reuse. This DDI model can also be represented in RDF. This paper examines why serializing DDI as RDF can be beneficial, and how DDI 3 can be represented in RDF, stored in a repository, and queried using the standard SPARQL query language. These concepts are demonstrated using Colectica RDF Services.
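
    To illustrate the querying step, here is a minimal SPARQL client using Apache Jena. The endpoint URL is a placeholder, and the query merely assumes the store labels items with skos:prefLabel; neither is a documented detail of Colectica RDF Services.

        import org.apache.jena.query.QueryExecution;
        import org.apache.jena.query.QueryExecutionFactory;
        import org.apache.jena.query.QuerySolution;
        import org.apache.jena.query.ResultSet;

        public class DdiSparqlClient {
            public static void main(String[] args) {
                String endpoint = "http://example.org/rdf/sparql";  // placeholder endpoint
                String query =
                        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#> " +
                        "SELECT ?item ?label WHERE { ?item skos:prefLabel ?label } LIMIT 10";
                try (QueryExecution qe = QueryExecutionFactory.sparqlService(endpoint, query)) {
                    ResultSet results = qe.execSelect();
                    while (results.hasNext()) {
                        QuerySolution row = results.next();
                        System.out.println(row.getResource("item") + "  " + row.getLiteral("label"));
                    }
                }
            }
        }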

  • DDI-RDF - Trouble with Triples
    Olof Olsson (Swedish National Data Service)
    Thomas Bosch (GESIS - Leibniz Institute for the Social Sciences)
    Benjamin Zapilko (GESIS - Leibniz Institute for the Social Sciences)
    Arofan Gregory (ODaF - Open Data Foundation)
    Joachim Wackerow (GESIS - Leibniz Institute for the Social Sciences)


    At Schloss Dagstuhl, and in Gothenburg before EDDI 2011, ontology engineers and experts from the social, behavioral, and economic sciences developed an ontology covering both DDI Codebook and DDI Lifecycle and implemented a rendering of DDI instances in RDF (Resource Description Framework). The main goals in designing the DDI ontology were to reuse widely adopted and accepted ontologies such as DC and SKOS, and to define meaningful relationships to the RDF Data Cube vocabulary. Organizations now have the possibility to publish their DDI data and metadata in RDF and link it with many other datasets from the Linked Open Data (LOD) cloud. As a consequence, a huge number of related DDI instances can be discovered, queried, connected, and harmonized. Only the semantic combination of DDI data and metadata from several organizations will enable the derivation of surprising, previously unimagined implicit knowledge from explicitly stated pieces of information.
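
    As one small example of the SKOS reuse mentioned above, a DDI code list can be rendered as a skos:ConceptScheme. The sketch uses Apache Jena; the URIs and labels are invented.

        import org.apache.jena.rdf.model.Model;
        import org.apache.jena.rdf.model.ModelFactory;
        import org.apache.jena.rdf.model.Resource;
        import org.apache.jena.vocabulary.SKOS;

        public class CodeListAsSkos {
            public static void main(String[] args) {
                Model m = ModelFactory.createDefaultModel();
                // A two-code "sex" code list modeled as a SKOS concept scheme.
                Resource scheme = m.createResource("http://example.org/codelist/sex",
                        SKOS.ConceptScheme);
                Resource male = m.createResource("http://example.org/codelist/sex/1",
                        SKOS.Concept);
                male.addProperty(SKOS.prefLabel, "Male");
                male.addProperty(SKOS.inScheme, scheme);
                Resource female = m.createResource("http://example.org/codelist/sex/2",
                        SKOS.Concept);
                female.addProperty(SKOS.prefLabel, "Female");
                female.addProperty(SKOS.inScheme, scheme);
                m.write(System.out, "TURTLE");
            }
        }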


B3: Planning for Preservation: TDR/OAIS in data archives (Wed, 2012-06-06)

  • Improving Operations Using Standards and Metrics: Self-Assessment of Long-Term Preservation Practices at FSD
    Mari Kleemola (Finnish Social Science Data Archive)


    Preserving digital data is a challenge. For the outcome to be successful, many organisational and practical issues need to be in place. The Finnish Social Science Data Archive (FSD) is dedicated to supporting the life cycle of digital research data in the social sciences. It is therefore critical that FSD's operations and procedures are up to date and consistent with relevant standards and best practices. The key standard for long-term preservation of digital data is the Open Archival Information System (OAIS) Reference Model (ISO 14721:2003). This presentation will explore FSD's conformance to OAIS and describe how FSD's functions map to the seven OAIS functions (ingest, archival storage, data management, administration, preservation planning, access, and common services), with special attention to ingest and access. We will also discuss the process and results of a self-assessment that analysed FSD policies, plans, and procedures within the framework of the Audit and Certification of Trustworthy Digital Repositories (TDR) checklist (CCSDS 652.0-M-1). We will summarise the weaknesses, risks, and strengths identified, discuss the use of the TDR metrics, and analyse how adopting standards and best practices could facilitate co-operation and collaborative partnerships.

  • Improving the Trustworthiness of an Interdisciplinary Scientific Data Archive
    Robert R Downs (CIESIN, Columbia University)
    Robert S. Chen (CIESIN, Columbia University)


    The opportunity to deposit scientific data with an archive or scientific data center enables researchers and scholars to focus on their intellectual pursuits while trusting that the archive will attend to the details of providing stewardship, management, and long-term access services for their valuable digital assets. Whether an archive is worthy of such trust is a question that needs to be answered so that data producers will know where to submit their data for safekeeping and continuing dissemination to current and future communities of interest. A new standard released by the International Organization for Standardization, ISO 16363, specifies the requirements for certification of a digital repository as trustworthy. With the establishment of the new standard, archives have measurable targets for attaining trustworthiness, and users of archives, including depositors, have a way to determine if a particular archive meets their need for a trustworthy archive. An initial set of test audits using the new standard offers insight into the audit process and the level and type of effort required for archives to become certified as trustworthy. The authors will describe the test audit of an interdisciplinary scientific data center for compliance with the draft standard and summarize the test results.

  • Data Seal of Approval (DSA) - The assessment procedure
    Lisa F. de Leeuw (Data Archiving and Networked Services (DANS))
    Henk Harmsen (Data Archiving and Networked Services (DANS))


    The Data Seal of Approval ensures that in the future, research data can still be processed in a high-quality and reliable manner, without this entailing new thresholds, regulations, or high costs. The Data Seal of Approval and its quality guidelines may be of interest to research institutions, organizations that archive data, and users of those data. It can be granted to any repository that applies for it via the assessment procedure. Achieving the DSA means that the data concerned have been subjected to the sixteen guidelines of which the assessment procedure consists. The data archive as an organization should take care of the overall implementation of the DSA in its own specific field. In the integrated Framework for Auditing and Certification, which is being set up at the moment, the DSA will be the first step in a three-tiered certification process. The DSA Board has developed an online self-assessment tool through which the DSA can be applied for. In this paper we will present the online application tool, focusing among other things on problems and solutions, as well as elaborate on the procedures around the DSA.
