Generating Useful Test Data for Complex Linked Employer-employee Datasets

Presenter 1
Peter Jacobebbinghaus
German Data Service Center for Business and Organizational Data (DSZ-BO)

When researchers access data via remote execution or on-site use, it can help to disseminate test datasets in advance that mimic the structure of the original data. With these test data, researchers can develop their analysis code and avoid delays caused by otherwise likely syntax errors. Test data are not meant to provide meaningful results or to preserve statistical inferences. Instead, the structure of the data must be maintained so that any code developed with the test data also runs on the original data without further modification. Achieving this goal can be challenging and costly for complex datasets such as linked employer-employee datasets (LEED), because the links between establishments and employees also need to be maintained. We illustrate how useful test data can be developed for complex datasets in a straightforward manner and at limited cost. Our approach relies mainly on traditional statistical disclosure control (SDC) techniques such as data swapping and noise addition. The structure of the data is maintained by adding constraints to the swapping procedure.
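To make the idea of constrained swapping plus noise addition concrete, here is a minimal Python sketch. It is not the presenter's implementation: the abstract does not spell out the constraints used, so the sketch assumes one plausible constraint, namely that values are swapped only among employees of the same establishment, which leaves the employer-employee link untouched. The function name `make_test_data`, the column names (`person_id`, `estab_id`, `occupation`, `wage`), and the noise level are all hypothetical.

```python
import random


def make_test_data(employees, swap_cols, noise_cols, noise_sd=0.1, seed=0):
    """Return a scrambled copy of `employees` (a list of dicts).

    Assumption (not from the abstract): swapping is constrained to rows
    of the same establishment, so the link key `estab_id` never changes.
    """
    rng = random.Random(seed)
    out = [dict(row) for row in employees]  # leave the input untouched

    # Group row positions by establishment id (the link key).
    groups = {}
    for i, row in enumerate(out):
        groups.setdefault(row["estab_id"], []).append(i)

    # Constrained swapping: permute each column within one establishment.
    for idx in groups.values():
        for col in swap_cols:
            values = [out[i][col] for i in idx]
            rng.shuffle(values)
            for i, value in zip(idx, values):
                out[i][col] = value

    # Noise addition: multiplicative Gaussian noise on numeric columns.
    for row in out:
        for col in noise_cols:
            row[col] = round(row[col] * (1 + rng.gauss(0, noise_sd)), 2)

    return out


# Hypothetical toy employee file linked to two establishments.
employees = [
    {"person_id": 1, "estab_id": "A", "occupation": "clerk", "wage": 2500.0},
    {"person_id": 2, "estab_id": "A", "occupation": "engineer", "wage": 4800.0},
    {"person_id": 3, "estab_id": "A", "occupation": "manager", "wage": 6100.0},
    {"person_id": 4, "estab_id": "B", "occupation": "driver", "wage": 2300.0},
    {"person_id": 5, "estab_id": "B", "occupation": "clerk", "wage": 2600.0},
]

test_data = make_test_data(
    employees, swap_cols=["occupation", "wage"], noise_cols=["wage"]
)
```

In this sketch the test file keeps the same rows, columns, value types, and identifiers as the original, so analysis code that merges on `estab_id` should run unchanged on both. The values themselves are scrambled and carry no inferential meaning, which matches the stated aim of test data.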
