IASSIST 2012 - Conference website

The IASSIST 2012 conference website is now live and ready to receive submissions: http://www.iassist2012.org/index.html

Call for Papers

Data Science for a Connected World: Unlocking and Harnessing the Power of Information

The theme of this year's conference is Data Science for a Connected World: Unlocking and Harnessing the Power of Information. This theme reflects the growing desire of research communities, government agencies and other organizations to build connections and benefit from the better use of data through practicing good management, dissemination and preservation techniques.

The theme is intended to stimulate discussions on building connections across all scholarly disciplines, governments, organizations, and individuals who are engaged in working with data.  IASSIST as a professional organization has a long history of bringing together those who provide information technology and data services to support research and teaching in the social sciences.  What can we as data professionals with shared interests and concerns learn from others going forward and what can they learn from us?  How can data professionals of all kinds build the connections that will be needed to address shared concerns and leverage strengths to better manage, share, curate and preserve data?

We welcome submissions on the theme outlined above, and encourage conference participants to propose papers and sessions that would be of interest to a diverse audience. Any paper related to the conference theme will be considered; a sample of possible topics appears below.

Topics:  
  • Innovative/disruptive technologies for data management and preservation
  • Infrastructures, tools and resources for data production and research
  • Linked data: opportunities and challenges
  • Metadata standards enhancing the utility of data
  • Challenges and concerns with inter-agency / intra-governmental data sharing
  • Privacy, confidentiality and regulation issues around sensitive data
  • Roles, responsibilities, and relationships in supporting data
  • Facilitating data exchange and sharing across boundaries
  • Data and statistical literacy
  • Data management plans and funding agency requirements
  • Norms and cultures of data in the sciences, social sciences and the humanities
  • Collaboration on research data infrastructure across domains and communities
  • Addressing the digital/statistical divide and the need for trans-national outreach

Papers will be selected from a wide range of subjects to ensure a broad balance of topics.

The Program Committee welcomes proposals for:
- Individual presentations (typically 15-20 minutes)
- Complete sessions, which could take a variety of formats (e.g. a set of three to four individual presentations on a theme, a discussion panel, a discussion with the audience, etc.)
- Posters/demonstrations for the poster session
- Pecha Kucha (a presentation of 20 slides shown for 20 seconds each, heavy emphasis on visual content) http://www.wired.com/techbiz/media/magazine/15-09/st_pechakucha
- Round table discussions (as these are likely to have limited spaces, an explanation of how the discussion will be shared with the wider group should form part of the proposal).
[Note: A separate call for workshops is forthcoming].

Session formats are not limited to the ideas above and session organizers are welcome to suggest other formats.

Proposals for complete sessions should list the organizer or moderator and possible participants; the session organizer will be responsible for securing both session participants and a chair.

All submissions should include the proposed title and an abstract no longer than 200 words (note: longer abstracts will be returned to be shortened before being considered).  Abstracts submitted for complete sessions should provide titles and a brief description for each of the individual presentations.  Abstracts for complete session proposals may run to 300 words if information about the individual presentations is included.

Please note that all presenters are required to register and pay the registration fee for the conference; registration for individual days will be available.

  • Deadline for submission of individual presentations and sessions: 9 December 2011.
  • Deadline for submission of posters, Pecha Kucha sessions and round table discussions: 16 January 2012.
  • Notification of acceptance for individual presentations and sessions: 10 February 2012.
  • Notification of acceptance for posters, Pecha Kucha sessions and round table discussions: 2 March 2012.

Those invited to present will be asked to confirm their acceptance within two weeks of notification.

Open Access to Federally Funded Research

Got something to say about "ensuring long-term stewardship and encouraging broad public access to unclassified digital data that result from federally funded scientific research"?

The White House Office of Science and Technology Policy (OSTP) released two public consultations today: one on open access (OA) to data and one on OA to publications arising from publicly funded research. Responses are due in early January. Please spread the word: submit your own comments and/or work with colleagues to submit comments on behalf of your institution.

(1) "[T]his Request for Information (RFI) offers the opportunity for interested individuals and organizations to provide recommendations on approaches for ensuring long-term stewardship and encouraging broad public access to unclassified digital data that result from federally funded scientific research....Response Date: January 12, 2012...."
http://goo.gl/L1jn3

(2) "[T]his Request for Information (RFI) offers the opportunity for interested individuals and organizations to provide recommendations on approaches for ensuring long-term stewardship and broad public access to the peer-reviewed scholarly publications that result from federally funded scientific research....Response Date: January 2, 2012...."
http://goo.gl/vTP18

IASSIST Latin Engagement Action Group

The Latin Engagement Action Group has come up with a number of outreach activities aimed at supporting data professionals from Spanish- and Portuguese-speaking educational institutions, namely:

1. Research Data Management Webinars (complete with IASSIST contribution) for Spanish/Portuguese data specialists (http://www.recolecta.net/buscador/webminars.jsp)

Stuart Macdonald and Luis Martínez-Uribe, in collaboration with Alicia López Medina (UNED, Spain), the Spanish Agency of Science and Technology (FECYT) and the network of Spanish repositories RECOLECTA, have organised a programme of webinars in three strands, starting in October, to discuss RDM issues:

Strand 1: Research Data Management Strategy (presentations from FECYT, RedIRIS and Simon Hodson, JISC Managing Research Data (MRD) Programme Manager)

Strand 2: RDM Tools and Models (presentations from Sarah Jones (DCC) on DAF/DMP Online, and Stuart Macdonald (EDINA) on the IASSIST Latin Engagement group, RDM at Edinburgh and Research Data MANTRA)

Strand 3: Research Data Management Experiences (presentations from Kate McNeil-Harmen (MIT), Luis Martínez-Uribe (Juan March Institute) and colleagues from the University of Porto)

Several IASSIST members have been invited, and the work of the group will be presented, in order to keep promoting the organization to colleagues in Spain, Portugal and Latin America.

2. Preparation of a Latin American session at the next IASSIST annual conference, in collaboration with the Outreach Committee

Organise another Latin American session at IASSIST 2012 (complete with NGO representation), led by Bobray Bordelon (Princeton), and liaise with the Outreach Committee to fund and invite data professional colleagues from Latin America to participate in this session.

3. Spanish and Portuguese translation of the main pages of the IASSIST site - May 2012

Working with the IASSIST web editor Robin Rice to scope and implement (voluntary) translation of the main landing pages on the IASSIST website (e.g. Home page, About page, Becoming a Member of IASSIST, FAQ, IASSIST at a Glance, About IQ, Instructions for Authors).

Image: Toledo by Pat Barker on Flickr, CC-BY-NC licence

86 helpful tools for the data professional PLUS 45 bonus tools

I have been working on this (mostly) annotated collection of tools and articles that I believe would be of help to both the data dabbler and professional. If you are a data scientist, data analyst or data dummy, chances are there is something in here for you. I included a list of tools, such as programming languages and web-based utilities, data mining resources, some prominent organizations in the field, repositories where you can play with data, events you may want to attend and important articles you should take a look at.

The second segment (BONUS!) of the list includes a number of art and design resources that infographic designers might like, including color palette generators and image searches. There are also some invisible web resources (in case you're looking for something data-related on Google and not finding it) and metadata resources so you can appropriately curate your data. This is in no way a complete list, so please contact me here with any suggestions!

Data Tools

  1. Google Refine - A power tool for working with messy data (formerly Freebase Gridworks)
  2. The Overview Project - Overview is an open-source tool to help journalists find stories in large amounts of data, by cleaning, visualizing and interactively exploring large document and data sets. Whether from government transparency initiatives, leaks or Freedom of Information requests, journalists are drowning in more documents than they can ever hope to read.
  3. Refine, reuse and request data | ScraperWiki - ScraperWiki is an online tool to make acquiring useful data simpler and more collaborative. Anyone can write a screen scraper using the online editor. In the free version, the code and data are shared with the world. Because it's a wiki, other programmers can contribute to and improve the code.
  4. Data Curation Profiles - This website is an environment where academic librarians of all kinds, special librarians at research facilities, archivists involved in the preservation of digital data, and those who support digital repositories can find help, support and camaraderie in exploring avenues to learn more about working with research data and the use of the Data Curation Profiles Tool.
  5. Google Chart Tools - Google Chart Tools provide a perfect way to visualize data on your website. From simple line charts to complex hierarchical tree maps, the chart galley provides a large number of well-designed chart types. Populating your data is easy using the provided client- and server-side tools.
  6. 22 free tools for data visualization and analysis
  7. The R Journal - The R Journal is the refereed journal of the R project for statistical computing. It features short to medium length articles covering topics that might be of interest to users or developers of R.
  8. CS 229: Machine Learning - A widely referenced course by Professor Andrew Ng, CS 229: Machine Learning provides a broad introduction to machine learning and statistical pattern recognition. Topics include supervised learning, unsupervised learning, learning theory, reinforcement learning and adaptive control. Recent applications of machine learning, such as to robotic control, data mining, autonomous navigation, bioinformatics, speech recognition, and text and web data processing are also discussed.
  9. Google Research Publication: BigTable - Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.
  10. Scientific Data Management - An introduction.
  11. Natural Language Toolkit - Open source Python modules, linguistic data and documentation for research and development in natural language processing and text analytics, with distributions for Windows, Mac OSX and Linux.
  12. Beautiful Soup - Beautiful Soup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping.
  13. Mondrian: Pentaho Analysis - An open-source OLAP analysis server from Pentaho, written in Java, enabling interactive analysis of very large datasets stored in SQL databases without writing SQL.
  14. The Comprehensive R Archive Network - R is `GNU S', a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques: linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc. Please consult the R project homepage for further information. CRAN is a network of ftp and web servers around the world that store identical, up-to-date, versions of code and documentation for R. Please use the CRAN mirror nearest to you to minimize network load.
  15. DataStax - Software, support, and training for Apache Cassandra.
  16. Machine Learning Demos
  17. Visual.ly - Infographics & Visualizations. Create, Share, Explore
  18. Google Fusion Tables - Google Fusion Tables is a modern data management and publishing web application that makes it easy to host, manage, collaborate on, visualize, and publish data tables online.
  19. Tableau Software - Fast Analytics and Rapid-fire Business Intelligence from Tableau Software.
  20. WaveMaker - WaveMaker is a rapid application development environment for building, maintaining and modernizing business-critical Web 2.0 applications.
  21. Visualization: Annotated Time Line (Google Chart Tools) - An interactive time series line chart with optional annotations. The chart is rendered within the browser using Flash.
  22. Visualization: Motion Chart (Google Chart Tools) - A dynamic chart to explore several indicators over time. The chart is rendered within the browser using Flash.
  23. PhotoStats - Create gorgeous infographics about your iPhone photos with PhotoStats.
  24. Ionz - Ionz will help you craft an infographic about yourself.
  25. Chart Builder - Powerful tools for creating a variety of charts for online display.
  26. Creately - Online diagramming and design.
  27. Pixlr Editor - A powerful online photo editor.
  28. Google Public Data Explorer - The Google Public Data Explorer makes large datasets easy to explore, visualize and communicate. As the charts and maps animate over time, the changes in the world become easier to understand. You don't have to be a data expert to navigate between different views, make your own comparisons, and share your findings.
  29. Fathom - Fathom Information Design helps clients understand and express complex data through information graphics, interactive tools, and software for installations, the web, and mobile devices. Led by Ben Fry. Enough said!
  30. healthymagination | GE Data Visualization - Visualizations that advance the conversation about issues that shape our lives; visitors are encouraged to download, post and share them.
  31. ggplot2 - ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and none of the bad parts. It takes care of many of the fiddly details that make plotting a hassle (like drawing legends) as well as providing a powerful model of graphics that makes it easy to produce complex multi-layered graphics.
  32. Protovis - Protovis composes custom views of data with simple marks such as bars and dots. Unlike low-level graphics libraries that quickly become tedious for visualization, Protovis defines marks through dynamic properties that encode data, allowing inheritance, scales and layouts to simplify construction. Protovis is free and open-source, provided under the BSD License. It uses JavaScript and SVG for web-native visualizations; no plugin required (though you will need a modern web browser). Although programming experience is helpful, Protovis is mostly declarative and designed to be learned by example.
  33. d3.js - D3.js is a small, free JavaScript library for manipulating documents based on data.
  34. MATLAB - The Language of Technical Computing - MATLAB® is a high-level language and interactive environment that enables you to perform computationally intensive tasks faster than with traditional programming languages such as C, C++, and Fortran.
  35. OpenGL - The Industry Standard for High Performance Graphics - OpenGL.org is a vendor-independent and organization-independent web site that acts as a one-stop hub for developers and consumers for all OpenGL news and development resources. It has a very large and continually expanding developer and end-user community that is very active and vested in the continued growth of OpenGL.
  36. Google Correlate - Google Correlate finds search patterns which correspond with real-world trends.
  37. Revolution Analytics - Commercial software and support for the R statistics language. Revolution Analytics delivers advanced analytics software at half the cost of existing solutions. By building on open source R—the world's most powerful statistics software—with innovations in big data analysis, integration and user experience, Revolution Analytics meets the demands and requirements of modern data-driven businesses.
  38. 22 Useful Online Chart & Graph Generators
  39. The Best Tools for Visualization - Visualization is a technique to graphically represent sets of data. When data is large or abstract, visualization can help make the data easier to read or understand. There are visualization tools for search, music, networks, online communities, and almost anything else you can think of. Whether you want a desktop application or a web-based tool, many specific tools are available on the web that let you visualize all kinds of data.
  40. Visual Understanding Environment - The Visual Understanding Environment (VUE) is an open-source project based at Tufts University. The VUE project is focused on creating flexible tools for managing and integrating digital resources in support of teaching, learning and research. VUE provides a flexible visual environment for structuring, presenting, and sharing digital information.
  41. Bime - Cloud Business Intelligence | Analytics & Dashboards - Bime is a revolutionary approach to data analysis and dashboarding. It allows you to analyze your data through interactive data visualizations and create stunning dashboards from the Web.
  42. Data Science Toolkit - A collection of data tools and open APIs curated by Pete Warden. You can use it to extract text from a document, learn the political leanings of a particular neighborhood, find all the names of people mentioned in a text and more.
  43. BuzzData - BuzzData lets you share your data in a smarter, easier way. Instead of juggling versions and overwriting files, use BuzzData and enjoy a social network designed for data.
  44. SAP Crystal Solutions - Simple, affordable, and open BI tools for everyday use.
  45. Project Voldemort
  46. ggplot (had.co.nz)
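Several of the tools above (Google Refine, ScraperWiki, Beautiful Soup) exist to pull usable data out of messy web pages. As a minimal sketch of that screen-scraping task, the snippet below extracts links from an HTML fragment using only Python's standard-library html.parser; the HTML string is invented for illustration, and libraries such as Beautiful Soup (item 12) offer a far more convenient API for the same job.

```python
# Sketch: extract every <a href="..."> from a page using only the stdlib.
# The HTML below is a made-up example, not a real page.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href attribute of every <a> tag encountered."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

page = ('<html><body><a href="/data.csv">data</a> '
        '<a href="/codebook.pdf">codebook</a></body></html>')
parser = LinkExtractor()
parser.feed(page)
print(parser.links)  # ['/data.csv', '/codebook.pdf']
```

In practice you would feed the parser HTML fetched from a real URL and add error handling; the point is only the shape of the task these tools streamline.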

Data Mining

  1. Weka - Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes. Weka is open source software issued under the GNU General Public License.
  2. PSPP - PSPP is a program for statistical analysis of sampled data. It is a free replacement for the proprietary program SPSS, and appears very similar to it, with a few exceptions. The most important of these is that there are no "time bombs": your copy of PSPP will not "expire" or deliberately stop working in the future. Neither are there any artificial limits on the number of cases or variables you can use, and there are no additional packages to purchase in order to get "advanced" functions; all functionality that PSPP currently supports is in the core package. PSPP can perform descriptive statistics, T-tests, linear regression and non-parametric tests. Its backend is designed to perform its analyses as fast as possible, regardless of the size of the input data. You can use PSPP with its graphical interface or the more traditional syntax commands.
  3. Rapid-I - Rapid-I provides software, solutions, and services in the fields of predictive analytics, data mining, and text mining. The company concentrates on automatic intelligent analyses on a large-scale basis, i.e. for large amounts of structured data such as database systems and unstructured data such as texts. The open-source data mining specialist Rapid-I enables other companies to use leading-edge technologies for data mining and business intelligence. The discovery and leverage of unused business intelligence from existing data enables better informed decisions and allows for process optimization. The main product of Rapid-I, the data analysis solution RapidMiner, is the world-leading open-source system for knowledge discovery and data mining. It is available as a stand-alone application for data analysis and as a data mining engine that can be integrated into your own products. By now, thousands of applications of RapidMiner in more than 30 countries give their users a competitive edge. Users include well-known companies such as Ford, Honda, Nokia, Miele, Philips, IBM, HP, Cisco, Merrill Lynch, BNP Paribas, Bank of America, mobilkom austria, Akzo Nobel, Aureus Pharma, PharmaDM, Cyprotex, Celera, Revere, LexisNexis and Mitre, as well as many medium-sized businesses benefitting from the open-source business model of Rapid-I.
  4. R Project - R is a language and environment for statistical computing and graphics. It is a GNU project similar to the S language and environment, which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered a different implementation of S. There are some important differences, but much code written for S runs unaltered under R. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an open-source route to participation in that activity. One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control. R is available as free software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.
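The descriptive statistics and simple linear regression these packages automate can be illustrated in a few lines. The toy sketch below (standard-library Python only, with made-up numbers) computes a mean, a sample standard deviation, and an ordinary-least-squares fit by hand; PSPP and R do the same arithmetic behind their interfaces, at scale and with diagnostics.

```python
# Toy illustration: descriptive statistics and OLS on invented data.
import statistics

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 4.0, 6.2, 7.9, 10.1]

mean_x, mean_y = statistics.mean(x), statistics.mean(y)
sd_y = statistics.stdev(y)  # sample standard deviation

# Ordinary least squares: slope = cov(x, y) / var(x)
sxx = sum((xi - mean_x) ** 2 for xi in x)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = mean_y - slope * mean_x

print(f"mean(y)={mean_y:.2f} sd(y)={sd_y:.2f} "
      f"fit: y = {slope:.2f}x + {intercept:.2f}")
```

For this data the fitted line comes out close to y = 2x, which is how the numbers were invented; a statistics package adds standard errors, p-values and residual checks on top of this core computation.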

Organizations

  1. Data.gov
  2. SDM group at LBNL
  3. Open Archives Initiative
  4. Code for America | A New Kind of Public Service
  5. The # DataViz Daily
  6. Institute for Advanced Analytics | North Carolina State University | Professor Michael Rappa · MSA Curriculum
  7. BuzzData | Blog, 25 great links for data-lovin' journalists
  8. MetaOptimize - Home - Machine learning, natural language processing, predictive analytics, business intelligence, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization
  9. had.co.nz
  10. Measuring Measures - Measuring Measures

Repositories

  1. Repositories | DataCite
  2. Data | The World Bank
  3. Infochimps Data Marketplace + Commons: Download Sell or Share Databases, statistics, datasets for free | Infochimps
  4. Factual Home - Factual
  5. Flowing Media: Your Data Has Something To Say
  6. Chartsbin
  7. Public Data Explorer
  8. StatPlanet
  9. ManyEyes
  10. 25+ more ways to bring data into R

Events

  1. Welcome | Visweek 2011
  2. O'Reilly Strata: O'Reilly Conferences
  3. IBM Information On Demand 2011 and Business Analytics Forum
  4. Data Scientist Summit 2011
  5. IBM Virtual Performance 2011
  6. Wolfram Data Summit 2011—Conference on Data Repositories and Ideas
  7. Big Data Analytics: Mobile, Social and Web

Articles

  1. Data Science: a literature review | (R news & tutorials)
  2. What is "Data Science" Anyway?
  3. Hal Varian on how the Web challenges managers - McKinsey Quarterly - Strategy - Innovation
  4. The Three Sexy Skills of Data Geeks « Dataspora
  5. Rise of the Data Scientist
  6. dataists » A Taxonomy of Data Science
  7. The Data Science Venn Diagram « Zero Intelligence Agents
  8. Revolutions: Growth in data-related jobs
  9. Building data startups: Fast, big, and focused - O'Reilly Radar

BONUS! Art Design

  1. Periodic Table of Typefaces
  2. Color Scheme Designer 3
  3. Color Palette Generator Generate A Color Palette For Any Image
  4. COLOURlovers
  5. Colorbrewer: Color Advice for Maps

Image Searches

  1. American Memory from the Library of Congress - The home page for the American Memory Historical Collections from the Library of Congress. American Memory provides free access to historical images, maps, sound recordings, and motion pictures that document the American experience. It offers primary source materials that chronicle historical events, people, places, and ideas that continue to shape America.
  2. Galaxy of Images | Smithsonian Institution Libraries
  3. Flickr Search
  4. 50 Websites For Free Vector Images Download
  5. Design weblog for designers, bloggers and tech users. Covering useful tools, tutorials, tips and inspirational photos.
  6. Google Images - The most comprehensive image search on the web.
  7. Trade Literature - a set on Flickr
  8. Compfight / A Flickr Search Tool
  9. morgueFile free photos for creatives by creatives
  10. stock.xchng - the leading free stock photography site
  11. The Ultimate Collection Of Free Vector Packs - Smashing Magazine
  12. How to Create Animated GIFs Using Photoshop CS3 - wikiHow
  13. IAN Symbol Libraries (Free Vector Symbols and Icons) - Integration and Application Network
  14. Usability.gov
  15. best icons
  16. Iconspedia
  17. IconFinder
  18. IconSeeker

Invisible Web

  1. 10 Search Engines to Explore the Invisible Web - Like the header says...
  2. Scirus - For scientific information. The most comprehensive scientific research tool on the web. With over 410 million scientific items indexed at last count, it allows researchers to search not only journal content but also scientists' homepages, courseware, pre-print server material, patents, and institutional repository and website information.
  3. TechXtra: Engineering, Mathematics, and Computing - TechXtra is a free service which can help you find articles, books, the best websites, the latest industry news, job announcements, technical reports, technical data, full-text eprints, the latest research, theses & dissertations, teaching and learning resources and more, in engineering, mathematics and computing.
  4. INFOMINE: Scholarly Internet Resource Collections - INFOMINE is a virtual library of Internet resources relevant to faculty, students, and research staff at the university level. It contains useful Internet resources such as databases, electronic journals, electronic books, bulletin boards, mailing lists, online library card catalogs, articles, directories of researchers, and many other types of information.
  5. The WWW Virtual Library - The WWW Virtual Library (VL) is the oldest catalogue of the Web, started by Tim Berners-Lee, the creator of HTML and of the Web itself, in 1991 at CERN in Geneva. Unlike commercial catalogues, it is run by a loose confederation of volunteers, who compile pages of key links for particular areas in which they are expert; even though it isn't the biggest index of the Web, the VL pages are widely recognised as being amongst the highest-quality guides to particular sections of the Web.
  6. Intute - Intute is a free online service that helps you to find web resources for your studies and research. With millions of resources available on the Internet, it can be difficult to find useful material. We have reviewed and evaluated thousands of resources to help you choose key websites in your subject. The Virtual Training Suite can also help you develop your Internet research skills through tutorials written by lecturers and librarians from universities across the UK.
  7. CompletePlanet - Discover over 70,000 databases and specialty search engines. There are hundreds of thousands of databases that contain Deep Web content. CompletePlanet is the front door to these Deep Web databases on the Web and to the thousands of regular search engines — it is the first step in trying to find highly topical information. By tracing through CompletePlanet's subject structure or searching Deep Web sites, you can go to various topic areas, such as energy or agriculture or food or medicine, and find rich content sites not accessible using conventional search engines. BrightPlanet initially developed the CompletePlanet compilation to identify and tap into many hundreds and thousands of search sources simultaneously to automatically deliver high-quality content to its corporate and enterprise customers. It then decided to make CompletePlanet available as a public service to the Internet search public.
  8. Infoplease: Encyclopedia, Almanac, Atlas, Biographies, Dictionary, Thesaurus - Information Please has been providing authoritative answers to all kinds of factual questions since 1938—first as a popular radio quiz show, then starting in 1947 as an annual almanac, and since 1998 on the Internet at www.infoplease.com. Many things have changed since 1938, but not our dedication to providing reliable information, in a way that engages and entertains.
  9. DeepPeep: Discover the Hidden Web - DeepPeep is a search engine specialized in Web forms. The current beta version tracks 45,000 forms across 7 domains. DeepPeep helps you discover the entry points to content in Deep Web (aka Hidden Web) sites, including online databases and Web services. Advanced search allows you to perform more specific queries: besides specifying keywords, you can also search for specific form element labels, i.e., the description of the form attributes.
  10. IncyWincy: The Invisible Web Search Engine - IncyWincy is a showcase of Net Research Server (NRS) 5.0, a software product that provides a complete search portal solution, developed by LoopIP LLC. LoopIP licenses the NRS engine and provides consulting expertise in building search solutions.

Metadata

  1. Description Schema: MODS (Library of Congress) and Outline of Elements and Attributes in MODS Version 3.4 - This document contains a listing of elements and their related attributes in MODS Version 3.4, with values or value sources where applicable; it is an "outline" of the schema. Items highlighted in red indicate changes made to MODS in Version 3.4. All top-level elements and all attributes are optional, but you must have at least one element. Subelements are optional, although in some cases you may not have empty containers. Attributes are not in a mandated sequence and not repeatable (per XML rules). "Ordered" below means the subelements must occur in the order given. Elements are repeatable unless otherwise noted. "Authority" attributes are either followed by codes for authority lists (e.g., iso639-2b) or "see" references that link to documents that contain codes for identifying authority lists. For additional information about any MODS elements (Version 3.4 elements will be added soon), please see the MODS User Guidelines.
  2. wiki.dbpedia.org : About DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data. We hope this will make it easier for the amazing amount of information in Wikipedia to be used in new and interesting ways, and that it might inspire new mechanisms for navigating, linking and improving the encyclopaedia itself.
  3. Semantic Web - W3C In addition to the classic “Web of documents” W3C is helping to build a technology stack to support a “Web of data,” the sort of data you find in databases. The ultimate goal of the Web of data is to enable computers to do more useful work and to develop systems that can support trusted interactions over the network. The term “Semantic Web” refers to W3C’s vision of the Web of linked data. Semantic Web technologies enable people to create data stores on the Web, build vocabularies, and write rules for handling data. Linked data are empowered by technologies such as RDF, SPARQL, OWL, and SKOS.
  4. RDA: Resource Description & Access | www.rdatoolkit.org Designed for the digital world and an expanding universe of metadata users, RDA: Resource Description and Access is the new, unified cataloging standard. The online RDA Toolkit subscription is the most effective way to interact with the new standard. More on RDA.
  5. Cataloging Cultural Objects Cataloging Cultural Objects: A Guide to Describing Cultural Works and Their Images (CCO) is a manual for describing, documenting, and cataloging cultural works and their visual surrogates. The primary focus of CCO is art and architecture, including but not limited to paintings, sculpture, prints, manuscripts, photographs, built works, installations, and other visual media. CCO also covers many other types of cultural works, including archaeological sites, artifacts, and functional objects from the realm of material culture.
  6. Library of Congress Authorities (Search for Name, Subject, Title and Name/Title) Using Library of Congress Authorities, you can browse and view authority headings for Subject, Name, Title and Name/Title combinations; and download authority records in MARC format for use in a local library system. This service is offered free of charge.
  7. Search Tools and Databases (Getty Research Institute) Use these search tools to access library materials, specialized databases, and other digital resources.
  8. Art & Architecture Thesaurus (Getty Research Institute) Learn about the purpose, scope and structure of the AAT. The AAT is an evolving vocabulary, growing and changing thanks to contributions from Getty projects and other institutions. Find out more about the AAT's contributors.
  9. Getty Thesaurus of Geographic Names (Getty Research Institute) Learn about the purpose, scope and structure of the TGN. The TGN is an evolving vocabulary, growing and changing thanks to contributions from Getty projects and other institutions. Find out more about the TGN's contributors.
  10. DCMI Metadata Terms
  11. The Digital Object Identifier System
  12. The Federal Geographic Data Committee — Federal Geographic Data Committee
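The structural rules in the MODS outline above (all top-level elements and attributes optional, but at least one element required) are easy to see in a minimal record. The sketch below is illustrative only: it uses Python's standard library and the real MODS v3 namespace, but the record content and the helper name are invented for this example.

```python
import xml.etree.ElementTree as ET

# Real MODS v3 namespace from the Library of Congress schema.
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)

def minimal_mods_record(title):
    """Build a minimal MODS 3.4 record: a single top-level element
    (here, titleInfo) satisfies the "at least one element" rule,
    since every other top-level element and attribute is optional."""
    mods = ET.Element(f"{{{MODS_NS}}}mods", {"version": "3.4"})
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    return ET.tostring(mods, encoding="unicode")

print(minimal_mods_record("IASSIST Quarterly"))
```

A real record would normally add further optional elements (name, typeOfResource, originInfo, and so on) as described in the MODS User Guidelines.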
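The "Web of data" described in the Semantic Web entry above reduces, at bottom, to subject-predicate-object triples in which subjects and predicates are URIs. The sketch below is a conceptual illustration only, not a use of any RDF library: it hand-writes two statements about a DBpedia resource and renders them in N-Triples syntax.

```python
# Linked data as (subject, predicate, object) triples. The URIs follow
# real DBpedia/W3C patterns, but the statements are written by hand
# for illustration; real RDF work would use a library such as rdflib.
triples = [
    ("http://dbpedia.org/resource/Washington,_D.C.",
     "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
     "http://dbpedia.org/ontology/City"),
    ("http://dbpedia.org/resource/Washington,_D.C.",
     "http://www.w3.org/2000/01/rdf-schema#label",
     '"Washington, D.C."'),
]

def to_ntriples(triples):
    """Serialize triples in N-Triples syntax: URIs in angle brackets,
    literals already quoted, each statement terminated by ' .'."""
    lines = []
    for s, p, o in triples:
        obj = o if o.startswith('"') else f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return "\n".join(lines)

print(to_ntriples(triples))
```

Because every subject is a dereferenceable URI, triples published by different organizations can be joined on shared identifiers, which is what makes SPARQL queries across data sets possible.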

Workshop: Building a Culture of Research Data Citation

Building a Culture of Research Data Citation

Workshop at eResearch Australasia 2011

Thursday 10 November 2011: 9:00-12:30

http://conference.eresearch.edu.au/workshops/#8

The Australian National Data Service (ANDS) is currently developing a service called "Cite My Data" [1], which uses the international DataCite infrastructure [2] to support the citation of Australian research sector datasets.  The DataCite infrastructure is built on the Digital Object Identifier (DOI) system [3]--widely used for citation and tracking of scholarly publications.  The ANDS Cite My Data service will allow Australian research data publishers and users to uniquely identify research data and cite data from publications or other datasets [4].
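A DOI-based data citation of the kind Cite My Data enables typically combines creator, year, title, publisher, and the DOI. The helper below is a hypothetical sketch of that pattern: the field order follows a common DataCite-style convention, but DataCite and ANDS define the authoritative format, and all values here are invented.

```python
def format_data_citation(creator, year, title, publisher, doi):
    """Format a dataset citation in a common DataCite-style pattern:
    Creator (Year): Title. Publisher. doi:10.xxxx/xxxx"""
    return f"{creator} ({year}): {title}. {publisher}. doi:{doi}"

# Hypothetical example dataset (all values invented for illustration).
print(format_data_citation(
    creator="Smith, J.",
    year=2011,
    title="Australian Household Survey Sample",
    publisher="Australian National Data Service",
    doi="10.0000/example.12345",
))
# → Smith, J. (2011): Australian Household Survey Sample. Australian National Data Service. doi:10.0000/example.12345
```

Because the DOI resolves to a landing page for the dataset, such a citation remains stable even if the data's hosting location changes.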

ANDS is hosting a workshop at the eResearch Australasia 2011 conference based around data citation and the Cite My Data service. The workshop is designed for data publishers and users in the research sector who need to gain a deeper understanding of the issues and technologies around data citation and the ways data citation can be supported at their organizations.

Highlights of the workshop include:

- An overview of the DataCite initiative from Jan Brase, Managing Agent of DataCite

- An overview of ANDS services related to data citation and tracking

- A practical look at the ANDS Cite My Data service

- Experience reports from institutions currently implementing data citation policies

- Opportunity for questions and answers with key ANDS and DataCite representatives

The workshop will be held on Thursday 10 November 2011 (during the conference workshop days) from 9:00-12:30. Conference registration is available at http://conference.eresearch.edu.au/registration/.

IASSIST 2012 - Call for Papers

Topic:

The Call for Papers for IASSIST 2012 is below.  The conference website is still under development, but we thought it important to disseminate the call for papers to IASSIST membership now.  The Call will be reposted and further disseminated once the conference website is up and running.  Thanks.

 

2012 Conference Program Chairs

-Pascal, Oliver and Jake   

 

===========================

Data Science for a Connected World: Unlocking and Harnessing the Power of Information

The 38th International Association for Social Science Information Services and Technology (IASSIST) annual conference will be held in Washington, DC, June 4-8, 2012.

 

The theme of this year's conference is Data Science for a Connected World: Unlocking and Harnessing the Power of Information. This theme reflects the growing desire of research communities, government agencies and other organizations to build connections and benefit from the better use of data through practicing good management, dissemination and preservation techniques.

The theme is intended to stimulate discussions on building connections across all scholarly disciplines, governments, organizations, and individuals who are engaged in working with data.  IASSIST as a professional organization has a long history of bringing together those who provide information technology and data services to support research and teaching in the social sciences.  What can we as data professionals with shared interests and concerns learn from others going forward and what can they learn from us?  How can data professionals of all kinds build the connections that will be needed to address shared concerns and leverage strengths to better manage, share, curate and preserve data? 

We welcome submissions on the theme outlined above, and encourage conference participants to propose papers and sessions that would be of interest to a diverse audience. Any paper related to the conference theme will be considered; below is a sample of possible topics.

Topics:

  • Innovative/disruptive technologies for data management and preservation
  • Infrastructures, tools and resources for data production and research
  • Linked data: opportunities and challenges
  • Metadata standards enhancing the utility of data
  • Challenges and concerns with inter-agency / intra-governmental data sharing
  • Privacy, confidentiality and regulation issues around sensitive data
  • Roles, responsibilities, and relationships in supporting data
  • Facilitating data exchange and sharing across boundaries
  • Data and statistical literacy
  • Data management plans and funding agency requirements
  • Norms and cultures of data in the sciences, social sciences and the humanities
  • Collaboration on research data infrastructure across domains and communities
  • Addressing the digital/statistical divide and the need for trans-national outreach

Papers will be selected from a wide range of subjects to ensure a broad balance of topics.

The Program Committee welcomes proposals for:

  • Individual presentations (typically 15-20 minutes)
  • Complete sessions, which could take a variety of formats (e.g. a set of three to four individual presentations on a theme, a discussion panel, a discussion with the audience, etc.)
  • Posters/demonstrations for the poster session
  • Pecha Kucha (a presentation of 20 slides shown for 20 seconds each, with a heavy emphasis on visual content) http://www.wired.com/techbiz/media/magazine/15-09/st_pechakucha
  • Round table discussions (as these are likely to have limited spaces, an explanation of how the discussion will be shared with the wider group should form part of the proposal)

Note: A separate call for workshops is forthcoming.


Session formats are not limited to the ideas above and session organizers are welcome to suggest other formats.

Proposals for complete sessions should list the organizer or moderator and possible participants; the session organizer will be responsible for securing both session participants and a chair.

All submissions should include the proposed title and an abstract no longer than 200 words (note: longer abstracts will be returned to be shortened before being considered).  Abstracts submitted for complete sessions should provide titles and a brief description for each of the individual presentations.  Abstracts for complete session proposals should be no longer than 300 words if information about individual presentations is needed.

Please note that all presenters are required to register and pay the registration fee for the conference; registration for individual days will be available.

Deadline for submission of individual presentations and sessions: 5 December 2011.

Deadline for submission of posters, Pecha Kucha sessions and round table discussions: 16 January 2012.

Notification of acceptance for individual presentations and sessions: 3 February 2012.

Notification of acceptance for posters, Pecha Kucha sessions and round table discussions: 24 February 2012.

Data can be cool

As I prepare to leave Guelph there are lots of things I will miss, but what I will perhaps miss most is the Data Resource Centre and the creative people who work there. If you follow the link to the Picasa album below you will see some awesome posters they have made to showcase services and bring people into the world of data and GIS. The images on some of the posters are really powerful.

Posters on Picasa

DDI/Java developer @ Metadata Technology

Topic:

Metadata Technology North America is looking for full-time Java developers with a strong background in XML and web service technologies. We are seeking entry-level programmers as well as a more senior individual who can potentially operate as both a Java developer and project manager. Individuals must be able to work independently and possess the ability to adapt existing knowledge to new applications and technologies.

Our company focus is on providing solutions for the management of socio-economic and health statistical data and metadata, leveraging the Data Documentation Initiative (DDI), the Statistical Data and Metadata Exchange (SDMX), and related XML specifications.

Candidates must have the following qualifications:
- Strong Java development in a client / server environment
- Creative, quick learner, independent
- Solid expertise with XML and related technologies (XSL, XPath/XQuery, XML Schema, SOAP, etc.)
- Development of web-service-based J2EE applications
- Experience with the Eclipse Integrated Development Environment, including JUnit, Subversion, and JavaDocs

Familiarity with the following is desired but not required:
- Eclipse RCP framework
- Google Web Toolkit
- Spring framework
- Relational and/or native XML databases
- Apache XMLBeans
- Statistical data and software
- DDI, SDMX and related specifications
- Pentaho or similar BI/ETL platform

Positions are local to the Knoxville, TN, or Washington, DC, metropolitan areas.

Interested candidates should submit a resume and letter of motivation to mtna@metadatatechnology.com.

Metadata Technology North America is committed to the principles of equal employment opportunity and to making employment decisions based on merit. We are committed to complying with Federal, State and local laws providing equal employment opportunities, as well as all laws related to terms and conditions of employment. The company desires to keep a work environment free of sexual harassment or discrimination based on race, religion, ethnicity, national origin, sexual orientation, physical or mental disability, marital status, age or any other status protected by Federal, State or local laws.

ANES Announcement: Deadlines for the ANES 2010-2012 EGSS Online Commons Proposals

The American National Election Studies are continuing to accept proposals for the ANES 2010-2012 Evaluations of Government and Society Study. The deadline to submit proposals for EGSS 4 is 3:00 p.m. EDT, August 30, 2011. The deadline for members of the Online Commons community to comment on proposals is September 8, 2011. The deadline for revisions to proposals is 3:00 p.m. EDT on September 14, 2011. For additional information about how to submit a proposal, please visit: http://www.electionstudies.org/

Proposals may be submitted through the ANES Online Commons. The following describes the goals of this study and proposal process.

About The 2010-2012 Evaluations of Government and Society Study

The overarching theme of the surveys is citizen attitudes about government and society. These Internet surveys represent the most cost-effective way for the ANES user community to gauge political perceptions during one of the most momentous periods in American history. Aside from the historic nature of the current administration and the almost unprecedented economic crisis facing the country, we believe it is imperative that researchers assess attitudes about politics and society in the period leading up to the 2012 national elections. Potential topics include: attitudes about the performance of the Obama administration on the major issues of the day, evaluations of Congress and the Supreme Court, identification with and attitudes about the major political parties, and levels of interest in and engagement with national politics. This is primarily because these perceptions are unmistakably correlated with both presidential vote choice and levels of political participation. We intend to measure each of these topics at multiple points throughout the two-year period preceding the 2012 elections. In addition to these subjects, we envision that each of these surveys would explore a particular aspect of these political perceptions.

This Study includes five rolling cross-section surveys that will allow us the opportunity to pilot new items for possible inclusion on the 2012 time series. Proposals for the first three surveys of the study were accepted earlier this year. The first survey of the study was conducted in October 2010; the second survey was conducted in the Spring of 2011. The third survey will be in the field later this year. We are currently accepting proposals for the final two surveys of the study. The fourth survey will be conducted in early 2012 and the final survey will be in the field in the middle of 2012. For the timelines and deadlines for the remaining surveys, please see http://electionstudies.org/studypages/2010_2012EGSS/2010_2012EGSScalendar.htm

By offering multiple opportunities for the user community to place their items on one or more surveys, we are providing the capacity to survey on a diverse set of topics that are relevant to a wide set of research communities. Lastly, the flexibility of these surveys as to both content and timing will allow the ANES to respond promptly to emerging political issues in this volatile period in our country's history.

About the Online Commons

The design of the questionnaires for The 2010-2012 Evaluations of Government and Society Study will evolve from proposals and comments submitted to the Online Commons (OC). The OC is an online system designed to promote communication among scholars and to yield innovative proposals about the most effective ways to measure electorally-relevant concepts and relationships. The goal of the OC is to improve the quality and scientific value of ANES data collections, to encourage the submission of new ideas, and to make such experiences more beneficial to and enjoyable for investigators. In the last study cycle, more than 700 scholars sent over 200 proposals through the Online Commons.

Proposals for the inclusion of questions must include clear theoretical and empirical rationales. All proposals must also clearly state how the questions will increase the value of the respective studies. In particular, proposed questions must have the potential to help scholars understand the causes and/or consequences of turnout or candidate choice.

For more information about the criteria that will be used to evaluate proposals, please see http://www.electionstudies.org/studypages/2010_2012EGSS/2010_2012EGSScriteria.htm

For additional information on how to submit a proposal, please see http://www.electionstudies.org/onlinecommons/proposalsubmit.htm

ANES Announcement: The ANES 2012 Time Series Study

On June 30, 2011, the American National Election Studies (ANES) began accepting proposals for questions to include on the ANES 2012 Time Series Study.  Proposals may be submitted through the ANES Online Commons. The following describes the goals of this study and the opportunity to include questions on it.

About The ANES 2012 Time Series Study

The ANES’s core mission is to promote cutting-edge and broadly collaborative research on American national elections. The heart of the ANES is its presidential-year time series surveys. The time series legacy is well known, serving as a model for election studies around the world and having generated thousands of publications. Every four years, a large representative sample of American adults has been interviewed on two occasions, first between Labor Day and Election Day, and again between Election Day and the onset of the winter holidays. The two face-to-face interviews will last approximately one hour each in 2012. Pre-election interviews focus on candidate preferences and anticipated vote choice; an array of possible predictors of candidate preferences, turnout, and citizen engagement; and an array of indicators of cognitive and behavioral engagement in the information flow of the campaign. Post-election interviews measure a variety of behavioral experiences people might have had throughout the campaign (e.g., turnout, mobilization efforts), plus additional posited predictors of candidate preferences, turnout, and citizen engagement.

Some of the questions asked during these interviews are categorized as standard (also known as core) items, meaning that they have been asked regularly over the years.  These questions are scheduled to appear on subsequent editions of the ANES Time Series in order to permit comparisons across elections.  The purpose of categorizing items as standard is to assure scholars who conduct longitudinal analyses that they can continue to depend on ANES to include variables that have been shown to perform well in the past.

Although recognizing the importance of continuity, ANES has also sought to develop the time series in innovative ways. The non-standard component of each questionnaire has routinely focused on matters of interest to the current election cycle. These items are often selected from an "ANES Question Inventory," which includes the standard questions and questions that have been asked in past ANES surveys but are not part of the standard battery of questions.  Researchers can access the question inventory at:

ftp://ftp.electionstudies.org/ftp/anes/OC/CoreUtility/ALT2010core.htm

The non-standard content of questionnaires has varied over the years. For example, candidate positions on issues of government policy are recognized as predictors of candidate preferences, but two one-hour interviews do not permit measuring positions on all of the many issues enjoying government attention at any one time in history. So from year to year, different choices have been made about which issues to include in the questionnaire.

As in the past, ANES will continue to emphasize best practices in sample design, respondent recruitment, and interviewing.  As always, we aim to provide top-quality service in many respects, including: (1) the careful and extensive planning that must be done before the field work begins, (2) the hard work that will be done by interviewers, supervisors, and study managers during data collection to monitor productivity and make adjustments in strategy to maximize the quality of the final product, and (3) the extensive data processing efforts (including integration of an extensive contextual data file) that will be required to assemble and document the final data set.

 

About the Online Commons

Content for the ANES 2012 Time Series Study will primarily evolve from two sources:  previous ANES Time Series questionnaires and new proposals received via the ANES Online Commons (OC).  The OC is an Internet-based system designed to promote communication among scholars and to yield innovative proposals about the most effective ways to measure electorally-relevant concepts and relationships. The goal of the OC is to improve the quality and scientific value of ANES data collections, to encourage the submission of new ideas, and to make such experiences more beneficial to and enjoyable for investigators. In the last study cycle, more than 700 scholars sent over 200 proposals through the OC.

Proposals for the inclusion of questions must include clear theoretical and empirical rationales. All proposals must also clearly state how the questions will increase the value of the respective studies. In particular, proposed questions must have the potential to help scholars understand the causes and/or consequences of turnout or candidate choice.

The ANES Online Commons will accept proposals until 3:00 p.m. Eastern Time on August 30, 2011. The deadline for members of the Online Commons community to comment on proposals is September 8, 2011. The deadline for revisions to proposals is 3:00 p.m. Eastern Time on September 14, 2011.

For additional information about how to submit a proposal, please visit:

http://www.electionstudies.org/

 

Proposal Evaluation Criteria

The following criteria will guide the PIs and the ANES Board in evaluating proposals made through the Online Commons. We strongly encourage anyone who is considering making a proposal to read the following carefully.

1. Problem-Relevant. Are the theoretical motivations, proposed concepts and survey items relevant to ongoing controversies among researchers? How will the data that the proposers expect to observe advance the debate?

What specific analyses of the data will be performed? What might these analyses reveal? How would these findings be relevant to specific questions or controversies?

2. Suitability to ANES. The primary mission of the ANES is to advance our understanding of voter choice and electoral participation. Ceteris paribus, concepts and instrumentation that are relevant to our understanding of these phenomena will be considered more favorably than items tapping other facets of politics, public opinion, American culture or society.

3. Building on Solid Theoretical Footing. Does the proposed instrumentation follow from a plausible theory of political behavior?

4. Demonstrated Validity and Reliability of Proposed Items. Proposed items should be accompanied by evidence demonstrating their validity and reliability. Validity has various facets: e.g., construct validity, concurrent validity, discriminant validity and predictive validity. Any assessment of predictive validity should keep in mind criterion 2, above.

Reliability can be demonstrated in various ways; one example is test-retest reliability. We understand that proposals for novel concepts and/or instrumentation will almost always lack empirical evidence demonstrating validity and/or reliability. Proposals for truly "novel" instrumentation might be best suited for the series of smaller, cross-sectional studies ANES will field in the period 2010 through the summer of 2012; as a general matter, we are highly unlikely to field untested instrumentation on the Fall 2012 pre-election and post-election surveys.

5. Breadth of Relevance and Generalizability. Will the research that results from the proposed instrumentation be useful to many scholars?

Given the broad usage of ANES data, we may be unable to accommodate requests to include items that are relevant for only one (or only a few) hypothesis tests. Ceteris paribus, items that are potentially relevant for a wide range of analyses will be considered more favorably than items that would seem to have less applicability.

When the 2012 questionnaires are designed, the status of the standard questions will be a central consideration. Standard questions do not have an infinite shelf life; science advances, and new insights can reveal more effective ways of asking important questions or can show that some questions no longer meet the requirements of remaining a standard question. However, proposed changes to standard questions will be scrutinized with recognition of the value of continuity over time. While we welcome proposals to change standard questions, the burden of proof required for making such changes will be high. We will take most seriously arguments that are backed by concrete evidence and strong theory.

All proposals that include a change to a particular question (standard or non-standard) should name the specific question that would be altered and provide a full explanation as to why the ANES user community will benefit by such a change.

Tools To Assist Your Proposal Development

As previously mentioned, researchers can access the ANES Question Inventory at:

ftp://ftp.electionstudies.org/ftp/anes/OC/CoreUtility/ALT2010core.htm

This Inventory provides the list of standard and non-standard questions that have been part of the Time Series, and includes frequencies for the most recent studies.

We have also created a second resource for reviewing questions that have been asked previously. The ANES Time Series Codebook Search utility searches existing codebooks from studies in the ANES Time Series. You can access the utility at http://ftp.nes.isr.umich.edu/backup/searchhelp.htm

(Please note that there are some limitations to the utility; these are documented on the search help page, which is linked at the top of the utility page.)

We hope that you will find these tools useful as you prepare your proposals.

The opportunity to submit proposals is open to anyone who wants to make a constructive contribution to the development of the ANES 2012 Time Series Study. Feel free to pass this invitation along to anyone (e.g., your colleagues and students) who you think might be interested. We hope to hear from you.

For additional resources and information on how to submit a proposal, please visit http://www.electionstudies.org/onlinecommons/

 

Darrell Donakowski

Director of Studies

American National Election Studies (ANES)
