Regional Report 2004-2005 Canada

IASSIST Regional Report 2004-2005

Bo Wandschneider
University of Guelph
May 2005

This being my second report for IASSIST, I assumed it would be easy, but I still wasn’t sure where to begin. I have come to the conclusion that it is almost impossible to keep track of what everyone is doing because there is so much great stuff happening. Electronic communication has certainly enhanced our ability to share information, but there is still nothing as productive as spending time with our colleagues, outside of our work environments, discussing what we do. In Canada we are fortunate to have strong regional organizations like ACCOLEDS, CREPUQ and DINO (see note below) that can be tied together nationally through DLI.

I have tried to solicit feedback from the community. If I failed to highlight something, it is not that I didn’t deem it significant, just that I didn’t know about it or simply missed it.

This is in no particular order:

DLI: The Data Liberation Initiative will be celebrating its 10th anniversary in December of this year. This is a great achievement. I can confidently say that without DLI we wouldn’t be where we are now, and I personally would probably not be involved with the Canadian Data Community and IASSIST. DLI was a wake-up call and an opportunity for many of us to initiate or further develop our services and skills. The sense of community it has created has been extremely important to research and teaching in Canada.

Everyone should stay tuned for a celebration that will extend to IASSIST 2006 in Ann Arbor.

CAPDU - The Canadian Association of Public Data Users recently met in Kingston. There has been turnover in the executive. Vince Gray of Western has assumed the role of President and Michelle Edwards of Guelph that of Vice-President, while Mary Luebbe of UBC is Secretary and Sandra Keys of Waterloo is Treasurer. Laine Ruus is Past President. Having the meeting tied to DINO and DLI training increased participation, but attendance was skewed regionally, and it was our loss not to have wider representation from the western and eastern parts of Canada.

Discussion at the AGM focused on changes to services and roles. More colleagues are taking on the role of data librarian or data professional, and there appears to be a trend toward moving data resources closer to GIS resources in terms of support and collections.

Notes from Laine Ruus:

The Canadian social research sector continues to be without a data-sharing ethic. There continues to be confusion between the SSHRC requirement for data deposit (very weak, and unpoliced), and the interpretation by local Ethics Review Boards of the Tri-Council Policy on research on humans, which is interpreted as saying that all data must be destroyed (as opposed to merely the personally-identifiable components). As a step in the direction of clarifying this issue, CAPDU is drafting a code of ethics, to which all CAPDU members would be de facto signatories. This is an effort to signal to the research community the seriousness with which we in this profession regard the principles of personal privacy and confidentiality. A second step might then be the promotion of training in anonymization techniques, so as to promote the destruction, not of the data themselves, but merely of the identifiable components of data.

NDA - During the last year the National Archive and the National Library have been amalgamated. Any plans for an NDA appear to be on hold, and there does not seem to be any plan for archiving or for any sort of integration.

From Chuck Humphrey:

The final report of the National Consultation on Access to Scientific Research Data has just been released (March 29, 2005) and builds upon the earlier report of the National Data Archive Consultation, which was completed in June 2002. I believe that the ultimate outcomes of these two consultations and their recommendations will have a significant impact on the future of data services.

Both reports recognize the importance of data services in our universities to the overall success of preserving research data and of ensuring access to these valuable resources. Both reports acknowledge the role that academic data services need to play in the larger infrastructure supporting research and instruction in Canada.

The complexity and scale of preserving and providing access to research data require the involvement of many institutions, including university and research libraries, granting agencies, research councils, researchers, the Library and Archives of Canada, data services, Statistics Canada and others. Together these institutions are part of the fabric constituting Canada’s knowledge society. Given the current organizational climate in Canada, no single institution can possibly carry the whole preservation-access load. This is true in other countries, too. The movement in the U.K. around digital curation is to build a matrix of institutions sharing responsibilities for the care of digital resources throughout their lifecycle. Many institutions have a role to play, and knowing and clearly articulating that role will be critical to their future.

DDI - Last year we discussed the formation of an ad hoc working group to look at DDI in relation to Canadian data. This national group had some very wide-ranging discussions at last year’s IASSIST meeting as well as at several other meetings during the year. Out of these meetings it was proposed that several potential working groups be formed around the following:

  • Logistics (coordination)
  • Defining a ‘Canadian’ tag set
  • Controlled Vocabulary/Thesaurus
  • Quality
  • Education/Training

A great deal of work has been done towards defining and understanding a subset of tags that are best suited for describing the data we work with in Canada. There have been ongoing discussions among several people working on marking up DLI (Stats Can) data. The University of Guelph and the DLI team have shared their ideas and are focusing on developing a suggested set of tags that would be considered the minimum for marking up and sharing metadata. The ultimate objective is to put some standards in place so that the community can share responsibility for marking up the backlog of data we have.

In addition, there have been a couple of training sessions that exposed data professionals to DDI and metadata in general. A workshop held in Kingston made people aware of the complexities and difficulties of marking up data, even when you know which tags you are going to use. It was suggested that the next training session for this group use the Nesstar Publisher and delve a little deeper into the creation of DDI metadata. A training session is being organized in Quebec for mid-May that will do just that. DLI staff have also been talking to authoring divisions within Statistics Canada, and some of the RDC’s have been working with DDI for some of their data. Our hope is that the creators of the data will begin to use DDI in their process of creating survey data and save us the work of marking up data. In light of this, Chuck Humphrey has been looking into the creation of metadata during the collection and creation of a survey, rather than trying to retrofit the data.

On that front, and from a more local perspective, Guelph has been working with colleagues at Waterloo to mark up several tobacco surveys using DDI and distribute the results with Nesstar. Waterloo is part of the TTURC (Transdisciplinary Tobacco Use Research Centers), which are collecting an international set of data around tobacco use. It will be interesting to see the outcome of this.

The next step for CANDDI will be to formalize the coordinating group and officially establish the subsequent working groups. It has taken some time to get people more comfortable with DDI and its possibilities, and this will now allow us to move forward.

RDC’s - A great deal of progress has been made over the last year to make RDC’s work as part of our community as opposed to working outside of it. Many more of us are being proactive in reaching out to the RDC’s, and in many cases the RDC’s are being located close to the local data services. New RDC’s have recently come online, or are about to come online, at Western, Carleton and Queen’s, and there is movement on opening satellites.

That being said, there is always a concern that Statistics Canada will more easily bow to confidentiality issues and only release data into the RDC’s. We are certainly seeing a lot of surveys for which no PUMF’s are created and which go directly to the RDC’s, or, if they do have PUMF’s, there is a great deal of suppression. This in turn makes it easier for researchers and teachers to use US or UK data instead of working with Canadian data.

We have been pushing for more synthetic files, but have had limited success in getting files that have any value. More work needs to be done in this area.

For more information contact Wendy Watkins.

Personnel - Last year we talked about some significant retirements. Fortunately, both Gaetan Drolet (see below) and Ernie Boyko have managed to keep involved in the community. We have also seen the emergence of some new people in Quebec, and this is very important for our community. In addition, the Train the Trainers added a large, young, and talented contingent to the national scene.

Citation Guide for Data: Gaetan Drolet has been working on a document entitled “Citing Statistics and Data: Where are we today?” This is a very interesting piece of work with a multitude of examples that will really help users at all levels properly cite the types of data we work with. This will be presented at IASSIST.

New Data Sources - As was mentioned last year, we are still receiving information from the 2001 census. We are still waiting on some of the PUMF’s. Although delivery dates for early products of the 2001 census were better than for 1996, the delivery dates for the later products have really slipped, and we have been assured this will be improved in 2006. At our recent meetings in Kingston, census representatives noted that work is very far along on the 2006 census. This will be an Internet-enabled survey, and based on testing they are conservatively hoping for 20% take-up. In addition, there are a few new things being asked around same-sex marriage and whether, 90 years from now, you will allow your information to be released. Early testing on the latter shows there may be problems for genealogists and historians down the road, while the former question simply seems to confuse Canadians (just testing if you are still reading :) ).

Another interesting development is around data from the addiction folk - specifically NADS and CADS. Statistics Canada will no longer be collecting the data, but the funders will be the same. When the data is finished it will be deposited into DLI. The significance is that this is not Statistics Canada data, and the funders like the model used for DLI and are willing to use it for distribution and access control. The hope is that this will be the first of many depositors to DLI.

In addition we continue to roll out more longitudinal surveys, but disclosure is becoming more of an issue.

DINO - Data in Ontario is a new sub-group of OCUL (Ontario Council of University Libraries) and follows in line with other regional groups like ACCOLEDS and CREPUQ. We mentioned last year that we were going to try to establish this group, and we have managed to have two very successful meetings. This was clearly a hole that needed to be filled, and we are keeping close links with the OCUL Map Group.

We are in the process of establishing a website where you can review the Terms of Reference and minutes.

Topics being discussed include such things as federated identities, sharing metadata (DDI), certification, new provincial data consortiums, numeracy, and an ICPSR National Federation.

Training - This continues to be a strong point for the community in Canada. DLI has really evolved into two things. One is opening opportunities for access to the data and the other is developing new and skilled data professionals. DLI training sessions seem to be getting stronger in terms of content and are being delivered by the community for the community. In addition, the DLI training sessions are invaluable for developing relationships among the data community in Canada.

Jane Fry of Carleton University and Sage Cram of DLI have been instrumental in developing a collection of all our training materials. This is going to be an invaluable resource. They have already gathered a lot of the information and are in the process of uploading it into an information repository located at the University of Toronto.

That’s all for this year ;).