
San's blog

Remembering Darrell


On Saturday, April 27, 2019, IASSIST and the broader community of data professionals lost a valued colleague and friend, Darrell Donakowski. The IASSIST organization mourns his untimely death and wishes to recognize the significant contributions he made to the data community over the course of his career.

Friends and family have created a scholarship at the University of Michigan-Dearborn in his memory. At the suggestion of several members and with the approval of the Administrative Committee, IASSIST has contributed $1000 to the scholarship to recognize the numerous contributions Darrell made to the data community right up to the time of his premature passing.

Darrell worked as a project manager in the Collection Development Unit at ICPSR and later served as the Director of Studies of the American National Election Studies (ANES). He was strongly engaged with IASSIST and, through ICPSR and ANES, with the broader data community. He greatly enriched those organizations by bringing together data professionals and archives through efforts such as Data-PASS.

Darrell made prominent scholarly contributions to IASSIST through multiple publications, web postings and presentations.  He was a long-time active member who regularly attended and readily participated in the annual conference. In addition, Darrell was an enthusiastic and dedicated volunteer in the Dearborn, Michigan political community, and was generally known as a kind-hearted and generous soul. 

The IASSIST community hopes that the scholarship will help keep Darrell’s contributions in memory and encourage others to follow his example. He will be greatly missed.


Digital Scholarship Librarian Opening at Kansas City Fed


Please see the announcement below for a new position created here in the Center for the Advancement of Data and Research in Economics at the Federal Reserve Bank of Kansas City. It will be part of the content support team that works with our technology specialists to support, enhance, and advance data-intensive or computationally intensive research in economics. This position reports to me and has supervisory responsibility for our library staff. Please let me know if I can answer any questions.

San Cannon

To apply:

Digital Scholarship Librarian

This new position will manage a team of three library staff as part of the Center for the Advancement of Data and Research in Economics (CADRE). In addition to continuing to provide reference services, strategic collection development, and other information services, this team will spearhead new research support initiatives such as curating research data, promoting Bank research and analytical output, and educating the research staff on intellectual property and on emerging measures and metrics for assessing the scholarly impact of publications and other forms of scholarly expression. Specific responsibilities for the position include:

  • Develop and implement data management, data curation, and data publication and access initiatives, working closely with the technology staff, researchers, and others. 
  • Lead the library and the Bank in developing a vision for policies, strategies, programs, and staffing that support and advance scholarly expression by researchers across the variety of business lines engaged in such work. Work with Board and System library and research staff to fully account for potential partnerships and shared services, leveraging existing relationships and collaborations where available and appropriate and suggesting new connections or associations where necessary.
  • Work closely with Legal, Public Affairs, Information Security, and others to develop mechanisms that enable the dissemination of research products such as computer code and data, and to answer specific questions about intellectual property, licensing, and related issues confidently and reliably.
  • Actively and purposefully manage current library staff to ensure full engagement and appropriate application of talents and resources while making measurable strides toward the strategic vision of a vital and vibrant organization focused on connecting users with information and data to support activities across the entire research lifecycle.

The successful candidate is expected to be active professionally and to contribute to developments in the field.


Required: ALA-accredited graduate degree or accredited graduate degree in an appropriate discipline. At least 5 years of professional experience, including at least 3 years of progressively increasing responsibility in the leadership and administration of research libraries.

Preferred: Demonstrated and broad knowledge and expertise regarding scholarly communications and scholarly publishing issues. Deep understanding of the scholarly and research enterprise at research universities or other research-oriented institutions. Comprehensive knowledge and understanding of the evolving digital context for scholarship, research, teaching, and creative expression. Wide-ranging knowledge and understanding of scholarly publishing initiatives in research libraries, including technology platforms for publishing, staffing options, and business models. Experience with new information technologies. Strong interpersonal, collaboration, and teambuilding skills. Impeccable presentation, written, and oral communication skills.


In search of: Best practice for code repositories?

I was asked by a colleague about organized efforts within the economics community to develop or support repositories of code for research. Her experience was in astrophysics, which apparently has several, and she wondered what could be learned from another academic community. So I asked a non-random sample of the technical economists with whom I work, then broadened the question to cover all of the social sciences and posed it to the IASSIST community.

In a nutshell, the answer seems to be “nope, nothing organized across the profession”, even with the profession very broadly defined. The general consensus for both the economics world and the more general social science community was that there was some chaos mixed with a little schizophrenia. I was told there are instances of such repositories, but they were described to me as “isolated attempts”, such as this one by Volker Wieland: Some folks mentioned repositories that were package- or language-based, such as R modules or SAS code from the SAS-L list or online at

Many people pointed out that more repositories are being associated with journals, so that authors can (or are required to) submit their data and code when submitting a paper for publication. Several responses touched on replication, which is the impetus for most journal requirements, including one that pointed out a “replication archive” at Yale. I was also pointed to an interesting paper that questions whether such archives promote replicable research, but that’s a discussion for another post.

By far, the most common reference I received was for the repositories associated with RePEc (Research Papers in Economics), which offers a broad range of services to the economic research community. There you’ll find the IDEAS site and the QM&RBC site with code for Dynamic General Equilibrium models, both run by the St. Louis Fed.

I also heard from support folks who had tried to build a code repository for their departments and were disappointed by the lack of enthusiasm for the project. The general consensus is that economists would love to leverage other people’s code but don’t want to give away their proprietary models.  They should know there is no such thing as a free lunch! 

I did hear that project-specific repositories were found to be useful, but I think of those as collaboration tools rather than dissemination platforms. That said, one economist did end his email to me with the following plea: “lots of authors provide code on their websites, but there is no authoritative host. Will you start one please?”


Council of Professional Associations on Federal Statistics (COPAFS) meeting notes

I was lucky enough to sit in on the most recent COPAFS meeting in place of our regular liaison, Judith Rowe. While the topics were very different from the issues I usually deal with at work, I found the presentations really interesting. Here's an abridged version of my notes.



Ed Spar will be stepping down as Executive Director at the end of 2012.  The board will be launching a search and will be engaging a search firm.

Director's update:

The budgetary situation ranges from grim to worse, and the outlook isn't any better. Every agency will wish it had last year's budget. Census numbers reflect a very bad year coming up. The meeting dates for next year are: March 16, June 1, Sept 14, and December 7.

Update on the National Center for Education Statistics (NCES) - Marilyn Seastrom
NCES is the statistical agency within the Department of Education. They have a small staff but lots of contractors, and they may be lucky enough to be level-funded next year.

Assessment: it was the busiest year in the history of the national assessment. They are ready to release the state mapping report, which compares assessment measures across states by mapping state assessments to the National Assessment of Educational Progress (NAEP). For example, there is only one state (MA) where a 4th grader who is deemed proficient on the state exam is also proficient at the national level. There are many states where students are proficient at the state level but don't even make the "basic" cut on the national assessment. They are also ready to release the Reading and Mathematics report card.

Elementary and Secondary update: They've expanded the NCES geo-mapping application, which works with the American Community Survey (ACS) to provide data by school district boundaries.

Miscellaneous: there's a new OECD adult literacy study (PIAAC, the first international assessment done on laptops in the home), and the National Household Education Survey (what goes on outside of school) no longer uses a random-digit-dial sample because of deteriorating response rates; it now uses an address-based (mail) sample.
There's new stuff on the horizon: a middle school study and a NAEP-TIMSS (Trends in International Mathematics and Science Study) linking study, an ambitious effort using 8th-grade achievement in math and science.

American Demographic History: Campbell Gibson (retired Census demographer)
Website of demographic history:
Developed over a few years with David Kennedy and Herbert Kline (Stanford): about 130 graphics through 2000, at both the state and national levels, which are freely available and can be downloaded.
Sources: all from the decennial censuses, some drawn from compendia of IPUMS files.
He showed a variety of slides, all of which are available on the website and most of which were fascinating. Can you guess how the top five languages spoken at home by foreign-born residents have changed?

Rural Statistical Areas: Mike Radcliffe, Geography Division, Census
The presentation described a three-year joint research project with 23 states. The goal was to define Rural Statistical Areas (RSAs): geographic areas that use counties, county subdivisions, and census tracts as building blocks, so that ACS 1-year estimates could be tabulated for areas of 65K+ people. These areas would have a rural focus, unlike PUMAs, which use a 100K threshold but are mostly urban areas. They started with the most rural parts and built from there; urban is really the residual.

RSA delineation process: counties with 65K+ people would be standalone RSAs if they had a rural focus. They used the USDA's urban influence codes (UIC) to measure "ruralness" and grouped counties, with some boundary tweaks made by the State Data Center Steering Committee. He showed maps of UIC ratings, then discussed how to aggregate counties: they created an aggregation net from state boundaries, interstate highways, and rivers, forming a lattice for thinking about how to group counties. They started with UIC category 12 and aggregated up county by county until they hit the 65K+ threshold. It's an imperfect process: there were some problems with differences between adjacent counties, and they sometimes had to sacrifice resolution.
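For the tech-types, the aggregation step can be sketched as a greedy procedure. This is a minimal illustration under my own assumptions, not the Census Bureau's actual method: the `County` records, the hypothetical `neighbors` map (standing in for the lattice built from state boundaries, interstates, and rivers), the tie-breaking rule, and treating every 65K+ county as standalone are all simplifications, and the states' manual boundary tweaks are left out entirely.

```python
from dataclasses import dataclass

# Population floor for ACS 1-year tabulation, as quoted in the talk (65K+).
THRESHOLD = 65_000


@dataclass(frozen=True)
class County:
    name: str
    population: int
    uic: int  # USDA urban influence code; 12 is the most rural category


def build_rsas(counties, neighbors, threshold=THRESHOLD):
    """Greedy sketch: seed with the most rural unassigned county, then add
    adjacent unassigned counties (most rural first) until the group reaches
    the threshold. `neighbors` is a hypothetical adjacency map.
    Returns (rsas, short): complete groups, and groups that ran out of
    eligible neighbors before reaching the threshold."""
    by_name = {c.name: c for c in counties}
    unassigned = set(by_name)
    rsas, short = [], []

    # Simplification: every county that already meets the threshold stands alone.
    for c in counties:
        if c.population >= threshold:
            rsas.append([c.name])
            unassigned.discard(c.name)

    # Seed groups from the most rural counties first (highest UIC, then name).
    for seed in sorted(counties, key=lambda c: (-c.uic, c.name)):
        if seed.name not in unassigned:
            continue
        group = [seed.name]
        unassigned.discard(seed.name)
        pop = seed.population
        while pop < threshold:
            # Unassigned counties adjacent to any county already in the group.
            frontier = {n for g in group for n in neighbors.get(g, ())
                        if n in unassigned}
            if not frontier:
                break  # stranded: no eligible neighbor left to absorb
            # Absorb the most rural neighbor; break ties alphabetically.
            nxt = min(frontier, key=lambda n: (-by_name[n].uic, n))
            group.append(nxt)
            unassigned.discard(nxt)
            pop += by_name[nxt].population
        if pop >= threshold:
            rsas.append(group)
        else:
            short.append(group)

    return rsas, short


# Made-up counties and adjacencies for illustration only.
counties = [
    County("A", 20_000, 12), County("B", 30_000, 11), County("C", 25_000, 9),
    County("D", 80_000, 2), County("E", 10_000, 10), County("F", 5_000, 12),
]
edges = [("A", "B"), ("A", "E"), ("B", "C"), ("C", "D")]
neighbors = {}
for a, b in edges:
    neighbors.setdefault(a, []).append(b)
    neighbors.setdefault(b, []).append(a)

rsas, short = build_rsas(counties, neighbors)
print(rsas)   # [['D'], ['A', 'B', 'E', 'C']]
print(short)  # [['F']]
```

The isolated county "F" ends up in `short`, which is one way the "sacrifice resolution" problem from the talk shows up: a greedy, adjacency-bound grouping can strand rural counties that have no eligible neighbors left.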

The resulting RSA definitions for each state were sent to that state, and states were able to move things around a bit to smooth out some of the initial classification imperfections. Some states suggested alternative definitions; for example, Vermont wanted to use its planning regions.

Questions on the table:

  • Should RSAs be contiguous? Census prefers yes, but states disagree; e.g., in Alabama, counties in the north and south might have similar demographics and make a better RSA than a purely geographic grouping.
  • Can a variety of building blocks be used to form RSAs?  Initial proposal was counties but they may not be the best units to start with.  States found that in some cases sub-county divisions or census tracts worked better.
  • Why not cross state lines? That makes sense for some questions, but state data centers need to address rural areas within their own states.
  • Should counties of 65K+ be split into multiple areas?

Next steps:
State data centers have asked Census to define these as statistical areas but Census has said that in some cases (like Los Angeles) you just can't call them rural.  What do you call them? The project needs to get wider review including public comment through a Federal Register notice.

Research on measuring same-sex couples - Nancy Bates - Census
Motivation: the definition of marriage has changed, with new terms, differing state recognition, and no federal recognition of same-sex couples. According to the 2008 ACS, there are about 150,000 self-described same-sex married couples but only around 32,000 legally married same-sex couples.

Possible causes:

  • Classification error:  maybe people think of themselves as married even if they aren't.
  • First response: on the ACS, the husband/wife category is first in the list but unmarried partner is 13th.
  • Errors elsewhere: false positives due to incorrect gender response

Research: some was based on focus groups, 18 groups in 8 different areas with different legal recognition of same-sex marriage. Participants were mostly gay couples, with some unmarried straight couples. Most people interpreted the question on the federal form as indicating "legal status"; some thought it meant "legally married anywhere". Many groups noted that categories for civil unions or domestic partnerships were missing. And there is the "functional equivalence" problem: couples had the equivalent of a marriage but nowhere to put themselves.

Research: some was based on cognitive interviews, 40 interviews with both gay and straight participants across different legal jurisdictions. Participants filled out forms, were debriefed afterwards, and were then shown an alternative form and asked which they preferred.
Results: most survey results aligned with "true" legal status. Specifically calling out same-sex or opposite-sex in the marital status question was preferred but was also flagged as potentially sensitive: would this distinction increase unit non-response? There was also some confusion about the definition of civil union/domestic partnership. Most people found it useful to have a cohabitation question.
Next steps: interagency group review, and piggybacking on an ACS test for a larger trial. That test is mail-only, so they also need to test other modes, and they would love to include a re-interview component.

Research on measuring same-sex couples - Martin O'Donnell - showing some data
He showed a comparison of ACS data and decennial census data, though comparability may not be perfect.
Changes in ACS forms and editing caused a drop in self-reported same-sex spouses from 350K+ to 150K+.

2010 Census results showed a much higher level of same-sex households than the 2010 ACS. There was a huge difference between mail forms and non-mail forms: approximately 3 times as many households reported themselves as same-sex households on mail forms as on non-mail forms for the ACS, where the non-mail forms were nonresponse follow-up (NRFU). On the pre-2008 ACS and the 2010 Census NRFU form, the matrix format didn't yield consistent results. The ACS 2008+ and 2010 Census form had a person-based column format, which produced much more consistent responses. This is truly non-sampling error for populations: because there are 60 million opposite-sex households, you only need 4 errors per 1,000 of them to generate the 250K+ error in same-sex spouses.
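A quick back-of-the-envelope check of that scale argument, using only the round numbers quoted in the talk:

```latex
\frac{4}{1000} \times 60{,}000{,}000 = 240{,}000
```

So an error rate of just 0.4% among opposite-sex households produces roughly 240,000 misclassified households, the same order of magnitude as the 250K+ discrepancy in same-sex spouses.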

Problem: the bad matrix form was approved and printed before these results were available. Wave 1 of the short-form data has now been published, including one table about same-sex couples, but they can't stop processing of the entire 2010 Census to allow for the correction of one table. So how do they fix it?

They tested the quality of the reporting on sex. They used a name index giving the probability that a person with a given name is male (John or Thomas has a very high index, Virginia or Elizabeth a very low one), with state controls for cultural differences (Jean may be more likely to be a male name in French areas). Index values of 0-50 were likely female and those of 950-1000 were likely male. Couples with a female partner whose name was at the highest index values, or a male partner whose name was at the lowest, were then considered to have incorrectly marked the sex item and were dropped from the same-sex couples category. For example, 9,000 of 31,000 male-male couples in Texas have names indicating they are probably male-female couples; nearly one third of the same-sex marriage statistics in American FactFinder may be incorrect.
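The name check amounts to a simple filter. Here is a minimal Python sketch, assuming the name index (0-1000, higher meaning more likely male, with state controls already applied) has already been looked up for each partner. The function names and the tuple record layout are hypothetical, the cutoffs are the ones quoted above, and the records are made up for illustration.

```python
# Cutoffs quoted in the presentation; the index runs 0-1000, higher = more
# likely a male name. State controls are assumed to be applied upstream.
FEMALE_MAX = 50  # 0-50: name very likely female
MALE_MIN = 950   # 950-1000: name very likely male


def sex_item_suspect(reported_sex, name_index):
    """True when the reported sex contradicts a strongly gendered name."""
    if reported_sex == "F" and name_index >= MALE_MIN:
        return True
    if reported_sex == "M" and name_index <= FEMALE_MAX:
        return True
    return False


def split_same_sex_couples(couples):
    """Split same-sex couples into (kept, dropped) using the name check.

    Each couple is a pair of (reported_sex, name_index) tuples; a couple is
    dropped if either partner's sex item looks incorrectly marked.
    """
    kept, dropped = [], []
    for couple in couples:
        if any(sex_item_suspect(sex, idx) for sex, idx in couple):
            dropped.append(couple)
        else:
            kept.append(couple)
    return kept, dropped


# Made-up records for illustration only.
couples = [
    (("M", 990), ("M", 20)),   # second "male" has a very female name: dropped
    (("F", 10), ("F", 30)),    # consistent: kept
    (("F", 400), ("F", 980)),  # second "female" has a very male name: dropped
    (("M", 600), ("M", 700)),  # ambiguous names are left alone: kept
]
kept, dropped = split_same_sex_couples(couples)
print(len(kept), len(dropped))  # 2 2
```

Only names at the extremes of the index trigger a drop, so ambiguous names like the "Jean" example stay in the same-sex count; that conservatism is why the correction removes likely errors without trying to reclassify every couple.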

Geographic distribution with inconsistent name reporting: a swath running northwest from Florida to North Dakota, which matches areas with a high rate of NRFU forms.
Summary: They reissued the numbers, which matched the 2010 ACS better once the couples with mismatched names were thrown out. The spousal household estimate is the most improved. The American FactFinder page shows people where to get the preferred estimate. The Census PUMS is based on edited data. They aren't recalculating the entire Census, but they are publishing the edited data, and affected data will carry a flag.

IASSIST Quarterly 32 available on-line

The IASSIST Quarterly (IQ) volume 32 (2008) combines issues 1, 2, 3, and 4 into a single issue for 2008.

IASSIST 2009 Tweets!


We've created a Twitter feed for conference info, updates and impressions. See it at or  there's a link on the program page.

Before the conference, it will be mostly logistics and planning information. At the conference, IASSISTers will be tweeting about the conference itself: comments, suggestions, updates, and other Twitter-friendly information. We've got a few volunteer tweeters, but more are always welcome.

And so ends "The Best IASSIST ever"

And another conference has passed. It's so sad to think that it will be another year before we get together again, but at least we can play virtually on our lists, on this blog, and maybe even in SecondLife! I'm still trying to round up more conference reports, but in the meantime, here are the official conference song lyrics, to be sung to the tune of The Band's "The Weight".

Ready, Set, Go! IASSIST08 is in two worlds!

So I had the best intentions to blog the conference but alas I was distracted - creating my SecondLife avatar to be able to participate virtually as well as in reality was far more interesting than I want to admit!  My teenagers were appalled when they found out I was on Facebook - what will they say when I tell them about SusieQue!


"Data files should contain data."

For those tech-types who do their own data munging, here's a rant from Mark Dominus, a Perl programming wizard who was briefly stymied by trying to process a large data file from Census. As we face these issues daily in my office, I thought I'd share the frustration!

Of course, he doesn't mention where he thinks metadata "should" go but I have a pretty good idea what he would suggest.... ;-)

Connecting the Real to the Representational: Historical Demographic Data in the Town of Pullman, 1880-1940

by Andrew H. Bullen

The Pullman House History Project is part of the Pullman State Historic Site’s virtual museum and web site, which links together census, city directory, and telephone directory information to describe the people who lived in the town of Pullman, Illinois between 1881 and 1940. These demographic data are linked through a database/XML record system to online maps and Perl programs that allow the data to be represented in various useful combinations.