
IASSIST Quarterly (IQ) volume 40-2 is now on the website: Revolution in the air

Welcome to the second issue of Volume 40 of the IASSIST Quarterly (IQ 40:2, 2016). We present three papers in this issue.

http://iassistdata.org/iq/issue/40/2

First, there are two papers on the Data Documentation Initiative that have their own special introduction. I want to express my respect and gratitude to Joachim Wackerow (GESIS - Leibniz Institute for the Social Sciences). Joachim (Achim) and Mary Vardigan (University of Michigan) have several times and for many years communicated to and advised the readers of the IASSIST Quarterly on the continuing development of the DDI. Metadata is central to the use and reuse of data, and we have come a long way through the efforts of many people.

The IASSIST 2016 conference in Bergen was a great success - I am told. I was not able to attend but heard that the conference again was 'the best ever'. I was also told that among the many interesting talks and inputs at the conference, Matthew Woollard's keynote speech on 'Data Revolution' was high on the list. Good to have well-informed informers! Matthew Woollard is Director of the UK Data Archive at the University of Essex. Here in the IASSIST Quarterly we bring you a transcript of his talk. Woollard starts his talk on the data revolution with the possibility of bringing users access to data, rather than bringing data to users. The data is in the 'cloud' - in the air - 'Revolution in the air' to quote a Nobel laureate. We are not yet in the post-revolutionary phase and many issues still need to be addressed. Woollard argues that several data skills are in demand, like an understanding of data management and of the many ethical issues. Although he is not enthusiastic about the term 'Big Data', Woollard naturally addresses the concept, as these days we cannot talk about data - and surely not about data revolution - without talking about Big Data. I fully support his view that we should proceed with caution, so that we are not simply replacing surveys where we 'ask more from fewer' with big data that give us 'less from more'. The revolution gives us new possibilities, and we will see more complex forms of research that will challenge data skills and demand solutions at data service institutions.

Papers for the IASSIST Quarterly are always very welcome. We welcome input from IASSIST conferences or other conferences and workshops, from local presentations, or papers especially written for the IQ. When you are preparing a presentation, give a thought to turning your one-time presentation into a lasting contribution. We permit authors 'deep links' into the IQ as well as the deposit of the paper in their local repository. Chairing a conference session with the purpose of aggregating and integrating papers for a special issue of the IQ is also much appreciated, as the information reaches many more people than the session participants and will be readily available on the IASSIST website at http://www.iassistdata.org

Authors are very welcome to take a look at the instructions and layout:

http://iassistdata.org/iq/instructions-authors

Authors can also contact me via e-mail: kbr@sam.sdu.dk. Should you be interested in compiling a special issue for the IQ as guest editor(s) I will also be delighted to hear from you.

Karsten Boye Rasmussen   
Editor, IASSIST Quarterly

IASSIST 2016 Program At-A-Glance, Part 2: Data infrastructure, data processing and research data management

 

Here's another list of highlights from IASSIST 2016, which focuses on the data revolution. For previous highlights, see Part 1.

Infrastructure

  • For those of you with an interest in technical infrastructure, the University of Applied Sciences HTW Chur will showcase an early prototype, MMRepo (1 June, 3F), which stores qualitative and quantitative data in a single big-data repository.
  • The UK Data Service will present the panel "The CESSDA Technical Framework - what is it and why is it needed?", which elaborates on how the CESSDA Research Infrastructure should have modern data curation techniques, rooted in sophisticated IT capabilities, at its core in order to better serve its community.

  • If you have been wondering about the various operational components and the associated technology counterparts involved with running a data science repository, then the presentation by ICPSR is for you. Participants in that panel will leave with an understanding of how the Archonnex Architecture at ICPSR is strengthening the data services offered to new researchers and much more.

Data processing

Be sure to check out the aforementioned infrastructure offerings if you’re interested in data processing, but also check out a half-day workshop on 31 May, “Text Processing with Regular Expressions,” presented by Harrison Dekker, UC Berkeley, that will help you learn regular expression syntax and how to use it in R, Python, and on the command line. The workshop will be example-driven.
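As a taste of what such a workshop covers, here is a minimal Python sketch of the kind of pattern-matching task involved; the log lines and pattern below are invented for illustration and are not from the workshop materials:

```python
import re

# Hypothetical survey-log lines (invented for illustration).
lines = [
    "Survey wave 1,  collected 2014-03-15 ",
    "Survey wave 2, collected 2016-11-02",
]

# Extract ISO dates with a grouped pattern, then normalise whitespace.
date_pat = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
years = [m.group(1) for line in lines for m in date_pat.finditer(line)]
cleaned = [re.sub(r"\s+", " ", line).strip() for line in lines]

print(years)       # ['2014', '2016']
print(cleaned[0])  # Survey wave 1, collected 2014-03-15
```

The same pattern syntax carries over largely unchanged to R (via `grepl`/`regmatches`) and to command-line tools such as `grep` and `sed`, which is what makes a multi-language regex workshop practical.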

Data visualisation

If you are comfortable working with quantitative data and are familiar with the R tool for statistical computing and want to learn how to create a variety of visualisations, then the workshop by the University of Minnesota on 31 May is for you. It will introduce the logic behind ggplot2 and give participants hands-on experience creating data visualizations with this package. This session will also introduce participants to related tools for creating interactive graphics from this syntax.

Programming

  • If you’re interested in programming, there’s a full-day Intro to Python for Data Wrangling workshop on 31 May, led by Tim Dennis, UC San Diego, that will provide tools to use scientific notebooks in the cloud, write basic Python programs, integrate disparate CSV files and more.

  • The aforementioned Regular Expressions workshop, also on 31 May, will offer in-workshop opportunities to work with real data and perform representative data cleaning and validation operations in multiple languages.
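The kind of CSV integration the Python workshop describes can be sketched with the standard library alone; the files, keys, and column names below are invented for illustration:

```python
import csv
import io

# Two hypothetical CSV extracts (invented columns) to be joined on 'id'.
demographics = "id,age\n1,34\n2,29\n"
income = "id,income\n1,52000\n2,61000\n"

# Index one file by its key, then merge each row of the other into it.
by_id = {row["id"]: row for row in csv.DictReader(io.StringIO(demographics))}
merged = [{**by_id.get(row["id"], {}), **row}
          for row in csv.DictReader(io.StringIO(income))]

print(merged[0])  # {'id': '1', 'age': '34', 'income': '52000'}
```

In practice one would read from files rather than strings, but the join-by-key idiom is the same.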

Research data management

  • For a behind-the-scenes look at how an organization such as the Odum Institute manages its archiving workflows, head to “Automating Archive Policy Enforcement using Dataverse and iRODS” on 31 May, with presenters from the UNC Odum Institute, UNC Chapel Hill. Participants will see machine-actionable rules in practice and be introduced to an environment where written policies can be expressed in ways that allow an archive to automate their enforcement.

  • Another good half-day workshop, targeted at people tasked with teaching good research data management practices to researchers, is “Teaching Research Data Management Skills Using Resources and Scenarios Based on Real Data,” 31 May, with presenters from ICPSR, the UK Data Archive and FORS. The organisers of this workshop will showcase recent examples of how they have developed teaching resources for hands-on training, and will talk about successes and failures in this regard.

Tools

If you’re just looking to add more resources to your data revolution toolbox, whether it’s metadata, teaching, data management, open and restricted access, or documentation, here’s a quick list of highlights:

  • At Creating GeoBlacklight Metadata: Leveraging Open Source Tools to Facilitate Metadata Genesis (31 May), presenters from New York University will provide hands-on experience in creating GeoBlacklight geospatial metadata, including demos on how to capture, export, and store GeoBlacklight metadata.

  • DDI Tools Demo (1 June). The Data Documentation Initiative (DDI) is an international standard for describing statistical and social science data.

  • DDI tools: No Tools, No Standard (3 June), where participants will be introduced to the work of the DDI Developers Community and get an overview of tools available from the community.

Open-access

As mandates for better accessibility of data affect more researchers, dive into the conversation with these IASSIST offerings:

Metadata

Don’t miss IASSIST 2016’s offerings on metadata - the data about data that makes finding and working with data easier. There are many offerings; a quick list of highlights appears below:

  • Creating GeoBlacklight Metadata: Leveraging Open Source Tools to Facilitate Metadata Genesis (Half-day workshop, 31 May), with presenters from New York University

  • At Posters and Snacks on 2 June, Building A Metadata Portfolio For Cessda, with presenters from the Finnish Social Science Data Archive; GESIS – Leibniz-Institute for the Social Sciences; and UK Data Service

Spread the word on Twitter using #IASSIST16. 


A story by Dory Knight-Ingram (ICPSR)

Latest Issue of IQ Available! Data Documentation Initiative - Results, Tools, and Further Initiatives

Welcome to the third issue of Volume 39 of the IASSIST Quarterly (IQ 39:3, 2015). This special issue is guest edited by Joachim Wackerow of GESIS – Leibniz Institute for the Social Sciences in Germany and Mary Vardigan of ICPSR at the University of Michigan, USA. That sentence is a direct plagiarism from the editor’s notes of the recent double issue (IQ 38:4 & 39:1). We are very grateful for all the work Mary and Achim have carried out and are developing further in the continuing story of the Data Documentation Initiative (DDI), and for their efforts in presenting the work here in the IASSIST Quarterly.

As in the recent double issue on DDI this special issue also presents results, tools, and further initiatives. The DDI started 20 years ago and much has been accomplished. However, creative people are still refining and improving it, as well as developing new areas for the use of DDI.

On the next page, Mary Vardigan and Joachim Wackerow give an overview of the DDI papers in this issue.

Let me then applaud the two guest editors and also the many authors who made this possible:

  • Alerk Amin, RAND Corporation, www.rand.org, USA
  • Ingo Barkow, Associate Professor for Data Management at the University for Applied Sciences Eastern Switzerland (HTW Chur), Switzerland
  • Stefan Kramer, American University, Washington, DC, USA
  • David Schiller, Research Data Centre (FDZ) of the German Federal Employment Agency (BA) at the Institute for Employment Research (IAB)
  • Jeremy Williams, Cornell Institute for Social and Economic Research, USA
  • Larry Hoyle, senior scientist at the Institute for Policy & Social Research at the University of Kansas, USA
  • Joachim Wackerow, metadata expert at GESIS - Leibniz Institute for the Social Sciences, Germany
  • William Poynter, UCL Institute of Education, London, UK
  • Jennifer Spiegel, UCL Institute of Education, London, UK
  • Jay Greenfield, health informatics architect working with data standards, USA
  • Sam Hume, vice president of SHARE Technology and Services at CDISC, USA
  • Sanda Ionescu, user support for data and documentation, ICPSR, USA
  • Jeremy Iverson, co-founder and partner at Colectica, USA
  • John Kunze, systems architect at the California Digital Library, USA
  • Barry Radler, researcher at the University of Wisconsin Institute on Aging, USA
  • Wendy Thomas, director of the Data Access Core in the Minnesota Population Center (MPC) at the University of Minnesota, USA
  • Mary Vardigan, archivist at the Inter-university Consortium for Political and Social Research (ICPSR), USA
  • Stuart Weibel, worked in OCLC Research, USA
  • Michael Witt, associate professor of Library Science at Purdue University, USA.

I hope you will enjoy their work in this issue, and I am certain that the contact authors will enjoy hearing from you
about new potential results, tools, and initiatives.

Articles for the IASSIST Quarterly are always very welcome. They can be papers from IASSIST conferences or other
conferences and workshops, from local presentations or papers especially written for the IQ. When you are preparing
a presentation, give a thought to turning your one-time presentation into a lasting contribution to continuing development. As an author you are permitted ‘deep links’ where you link directly to your paper published in the IQ. Chairing a conference session with the purpose of aggregating and integrating papers for a special issue of the IQ is also much appreciated, as the information reaches many more people than the session participants and will be readily available on the IASSIST website at http://www.iassistdata.org.

Authors are very welcome to take a look at the instructions and layout: http://iassistdata.org/iq/instructions-authors. Authors can also contact me via e-mail: kbr@sam.sdu.dk.

Should you be interested in compiling a special issue for the IQ as guest editor(s) I will also be delighted to hear from you.

Karsten Boye Rasmussen
September 2015
Editor

New Perspectives on DDI

This issue features four papers that look at leveraging the structured metadata provided by DDI in
different ways. The first, “Design Considerations for DDI-Based Data Systems,“ aims to help decisionmakers
by highlighting the approach of using relational databases for data storage in contrast to
representing DDI in its native XML format. The second paper, “DDI as a Common Format for Export
and Import for Statistical Packages,” describes an experiment using the program Stat/Transfer to
move datasets among five popular packages with DDI Lifecycle as an intermediary format. The paper
“Protocol Development for Large-Scale Metadata Archiving Using DDI Lifecycle” discusses the use
of a DDI profile to document CLOSER (Cohorts and Longitudinal Studies Enhancement Resources,
www.closer.ac.uk), which brings together nine of the UK’s longitudinal cohort studies by producing a
metadata discovery platform (MDP). And finally, “DDI and Enhanced Data Citation” reports on efforts to
extend data citation information in DDI to include a larger set of elements and a taxonomy for the role
of research contributors.

Mary Vardigan - vardigan@umich.edu
Joachim Wackerow - Joachim.Wackerow@gesis.org

IQ double issue 38(4)/39(1) is up, and so is vol 39(2)!

Hi folks! A lovely gift for your reading pleasure over the holidays: we present two, yes, TWO issues of the IASSIST Quarterly. The first is the double issue, 38(4)/39(1), with guest editors Joachim Wackerow of GESIS – Leibniz Institute for the Social Sciences in Germany and Mary Vardigan of ICPSR at the University of Michigan, USA. This issue focuses on the Data Documentation Initiative (DDI) and how it makes meta-analysis possible. The second issue is 39(2), and is all about data: avoiding statistical disclosure, using data, and improving digital preservation. Although we usually post the full text of the Editor's Notes in the blog post, it seems lengthy to do that for both issues. You will find them, though, on the web site: the Editor's Notes for the double issue, and the Editor's Notes for issue 39(2).

Michele Hayslett, for the IQ Publications Committee

North American DDI Conference April 2013

Registration is now open for NADDI 2013 (http://www.ipsr.ku.edu/naddi/). The North American Data Documentation Initiative Conference (NADDI) is an opportunity for those using DDI, and those interested in learning more about it, to come together and learn from each other. Patterned after the successful European DDI conference (EDDI), NADDI 2013 will be a two-day conference with invited and contributed presentations. This conference should be of interest to both researchers and data professionals in the social sciences and other disciplines. Training sessions will follow the conference. One focus of the first year's conference will be the use of DDI by individual research teams through the data lifecycle.

Please note that thanks to the generous support of the Alfred P. Sloan Foundation, a limited number of reduced rate registrations for graduate students are available.

 

Our keynote speaker will be Dr. Jay Greenfield of Booz Allen Hamilton where he is the semantic architect for a DDI Lifecycle based metadata system that supports the National Children's Study (NCS).

 

The conference will be held in the Kansas Union at the University of Kansas on April 2 and 3, 2013. An opening night reception will be held April 1, and workshops will be held on April 4.

 

The call for papers is also now open through January 31, 2013.

 

For more information visit the conference web site at http://www.ipsr.ku.edu/naddi/ or email naddi@ku.edu.

86 helpful tools for the data professional PLUS 45 bonus tools

I have been working on this (mostly) annotated collection of tools and articles that I believe would be of help to both the data dabbler and professional. If you are a data scientist, data analyst or data dummy, chances are there is something in here for you. I included a list of tools, such as programming languages and web-based utilities, data mining resources, some prominent organizations in the field, repositories where you can play with data, events you may want to attend and important articles you should take a look at.

The second segment (BONUS!) of the list includes a number of art and design resources the infographic designers might like including color palette generators and image searches. There are also some invisible web resources (if you're looking for something data-related on Google and not finding it) and metadata resources so you can appropriately curate your data. This is in no way a complete list so please contact me here with any suggestions!

Data Tools

  1. Google Refine - A power tool for working with messy data (formerly Freebase Gridworks)
  2. The Overview Project - Overview is an open-source tool to help journalists find stories in large amounts of data, by cleaning, visualizing and interactively exploring large document and data sets. Whether from government transparency initiatives, leaks or Freedom of Information requests, journalists are drowning in more documents than they can ever hope to read.
  3. Refine, reuse and request data | ScraperWiki - ScraperWiki is an online tool to make acquiring useful data simpler and more collaborative. Anyone can write a screen scraper using the online editor. In the free version, the code and data are shared with the world. Because it's a wiki, other programmers can contribute to and improve the code.
  4. Data Curation Profiles - This website is an environment where academic librarians of all kinds, special librarians at research facilities, archivists involved in the preservation of digital data, and those who support digital repositories can find help, support and camaraderie in exploring avenues to learn more about working with research data and the use of the Data Curation Profiles Tool.
  5. Google Chart Tools - Google Chart Tools provide a perfect way to visualize data on your website. From simple line charts to complex hierarchical tree maps, the chart galley provides a large number of well-designed chart types. Populating your data is easy using the provided client- and server-side tools.
  6. 22 free tools for data visualization and analysis
  7. The R Journal - The R Journal is the refereed journal of the R project for statistical computing. It features short to medium length articles covering topics that might be of interest to users or developers of R.
  8. CS 229: Machine Learning - A widely referenced course by Professor Andrew Ng, CS 229: Machine Learning provides a broad introduction to machine learning and statistical pattern recognition. Topics include supervised learning, unsupervised learning, learning theory, reinforcement learning and adaptive control. Recent applications of machine learning, such as to robotic control, data mining, autonomous navigation, bioinformatics, speech recognition, and text and web data processing are also discussed.
  9. Google Research Publication: BigTable - Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance. These applications place very different demands on Bigtable, both in terms of data size (from URLs to web pages to satellite imagery) and latency requirements (from backend bulk processing to real-time data serving). Despite these varied demands, Bigtable has successfully provided a flexible, high-performance solution for all of these Google products. In this paper we describe the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, and we describe the design and implementation of Bigtable.
  10. Scientific Data Management - An introduction.
  11. Natural Language Toolkit - Open source Python modules, linguistic data and documentation for research and development in natural language processing and text analytics, with distributions for Windows, Mac OSX and Linux.
  12. Beautiful Soup - Beautiful Soup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping.
  13. Mondrian: Pentaho Analysis - Pentaho Open source analysis OLAP server written in Java. Enabling interactive analysis of very large datasets stored in SQL databases without writing SQL.
  14. The Comprehensive R Archive Network - R is `GNU S', a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques: linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc. Please consult the R project homepage for further information. CRAN is a network of ftp and web servers around the world that store identical, up-to-date, versions of code and documentation for R. Please use the CRAN mirror nearest to you to minimize network load.
  15. DataStax - Software, support, and training for Apache Cassandra.
  16. Machine Learning Demos
  17. Visual.ly - Infographics & Visualizations. Create, Share, Explore
  18. Google Fusion Tables - Google Fusion Tables is a modern data management and publishing web application that makes it easy to host, manage, collaborate on, visualize, and publish data tables online.
  19. Tableau Software - Fast Analytics and Rapid-fire Business Intelligence from Tableau Software.
  20. WaveMaker - WaveMaker is a rapid application development environment for building, maintaining and modernizing business-critical Web 2.0 applications.
  21. Visualization: Annotated Time Line - Google Chart Tools (Google Code) - An interactive time series line chart with optional annotations. The chart is rendered within the browser using Flash.
  22. Visualization: Motion Chart - Google Chart Tools (Google Code) - A dynamic chart to explore several indicators over time. The chart is rendered within the browser using Flash.
  23. PhotoStats - Create gorgeous infographics about your iPhone photos, with Photostats.
  24. Ionz - Ionz will help you craft an infographic about yourself.
  25. chart builder - Powerful tools for creating a variety of charts for online display.
  26. Creately - Online diagramming and design.
  27. Pixlr Editor - A powerful online photo editor.
  28. Google Public Data Explorer - The Google Public Data Explorer makes large datasets easy to explore, visualize and communicate. As the charts and maps animate over time, the changes in the world become easier to understand. You don't have to be a data expert to navigate between different views, make your own comparisons, and share your findings.
  29. Fathom - Fathom Information Design helps clients understand and express complex data through information graphics, interactive tools, and software for installations, the web, and mobile devices. Led by Ben Fry. Enough said!
  30. healthymagination | GE Data Visualization - Visualizations that advance the conversation about issues that shape our lives, and so we encourage visitors to download, post and share these visualizations.
  31. ggplot2 - ggplot2 is a plotting system for R, based on the grammar of graphics, which tries to take the good parts of base and lattice graphics and none of the bad parts. It takes care of many of the fiddly details that make plotting a hassle (like drawing legends) as well as providing a powerful model of graphics that makes it easy to produce complex multi-layered graphics.
  32. Protovis - Protovis composes custom views of data with simple marks such as bars and dots. Unlike low-level graphics libraries that quickly become tedious for visualization, Protovis defines marks through dynamic properties that encode data, allowing inheritance, scales and layouts to simplify construction. Protovis is free and open-source, provided under the BSD License. It uses JavaScript and SVG for web-native visualizations; no plugin required (though you will need a modern web browser)! Although programming experience is helpful, Protovis is mostly declarative and designed to be learned by example.
  33. d3.js - D3.js is a small, free JavaScript library for manipulating documents based on data.
  34. MATLAB - The Language Of Technical Computing MATLAB® is a high-level language and interactive environment that enables you to perform computationally intensive tasks faster than with traditional programming languages such as C, C++, and Fortran.
  35. OpenGL - The Industry Standard for High Performance Graphics OpenGL.org is a vendor-independent and organization-independent web site that acts as one-stop hub for developers and consumers for all OpenGL news and development resources. It has a very large and continually expanding developer and end-user community that is very active and vested in the continued growth of OpenGL.
  36. Google Correlate - Google Correlate finds search patterns which correspond with real-world trends.
  37. Revolution Analytics - Commercial Software & Support for the R Statistics Language Revolution Analytics delivers advanced analytics software at half the cost of existing solutions. By building on open source R—the world’s most powerful statistics software—with innovations in big data analysis, integration and user experience, Revolution Analytics meets the demands and requirements of modern data-driven businesses.
  38. 22 Useful Online Chart & Graph Generators
  39. The Best Tools for Visualization - Visualization is a technique to graphically represent sets of data. When data is large or abstract, visualization can help make the data easier to read or understand. There are visualization tools for search, music, networks, online communities, and almost anything else you can think of. Whether you want a desktop application or a web-based tool, many specific tools are available on the web that let you visualize all kinds of data.
  40. Visual Understanding Environment - The Visual Understanding Environment (VUE) is an Open Source project based at Tufts University. The VUE project is focused on creating flexible tools for managing and integrating digital resources in support of teaching, learning and research. VUE provides a flexible visual environment for structuring, presenting, and sharing digital information.
  41. Bime - Cloud Business Intelligence | Analytics & Dashboards Bime is a revolutionary approach to data analysis and dashboarding. It allows you to analyze your data through interactive data visualizations and create stunning dashboards from the Web.
  42. Data Science Toolkit - A collection of data tools and open APIs curated by our own Pete Warden. You can use it to extract text from a document, learn the political leanings of a particular neighborhood, find all the names of people mentioned in a text and more.
  43. BuzzData - BuzzData lets you share your data in a smarter, easier way. Instead of juggling versions and overwriting files, use BuzzData and enjoy a social network designed for data.
  44. SAP - SAP Crystal Solutions: Simple, Affordable, and Open BI Tools for Everyday Use
  45. Project Voldemort
  46. ggplot - had.co.nz

Data Mining

  1. Weka - Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes. Weka is open source software issued under the GNU General Public License.
  2. PSPP - PSPP is a program for statistical analysis of sampled data. It is a Free replacement for the proprietary program SPSS, and appears very similar to it with a few exceptions. The most important of these exceptions are that there are no “time bombs”; your copy of PSPP will not “expire” or deliberately stop working in the future. Neither are there any artificial limits on the number of cases or variables which you can use. There are no additional packages to purchase in order to get “advanced” functions; all functionality that PSPP currently supports is in the core package. PSPP can perform descriptive statistics, T-tests, linear regression and non-parametric tests. Its backend is designed to perform its analyses as fast as possible, regardless of the size of the input data. You can use PSPP with its graphical interface or the more traditional syntax commands.
  3. Rapid-I - Rapid-I provides software, solutions, and services in the fields of predictive analytics, data mining, and text mining. The company concentrates on automatic intelligent analyses on a large-scale basis, i.e. for large amounts of structured data like database systems and unstructured data like texts. The open-source data mining specialist Rapid-I enables other companies to use leading-edge technologies for data mining and business intelligence. The discovery and leverage of unused business intelligence from existing data enables better informed decisions and allows for process optimization. The main product of Rapid-I, the data analysis solution RapidMiner, is the world-leading open-source system for knowledge discovery and data mining. It is available as a stand-alone application for data analysis and as a data mining engine which can be integrated into own products. By now, thousands of applications of RapidMiner in more than 30 countries give their users a competitive edge. Among the users are well-known companies such as Ford, Honda, Nokia, Miele, Philips, IBM, HP, Cisco, Merrill Lynch, BNP Paribas, Bank of America, mobilkom austria, Akzo Nobel, Aureus Pharma, PharmaDM, Cyprotex, Celera, Revere, LexisNexis, Mitre and many medium-sized businesses benefitting from the open-source business model of Rapid-I.
  4. R Project - R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R. R provides a wide variety of statistical (linear and nonlinear modelling, classical statistical tests, time-series analysis, classification, clustering, ...) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity. One of R's strengths is the ease with which well-designed publication-quality plots can be produced, including mathematical symbols and formulae where needed. Great care has been taken over the defaults for the minor design choices in graphics, but the user retains full control. R is available as Free Software under the terms of the Free Software Foundation's GNU General Public License in source code form. It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS.

Organizations

  1. Data.gov
  2. SDM group at LBNL
  3. Open Archives Initiative
  4. Code for America | A New Kind of Public Service
  5. The #DataViz Daily
  6. Institute for Advanced Analytics | North Carolina State University | Professor Michael Rappa · MSA Curriculum
  7. BuzzData | Blog, 25 great links for data-lovin' journalists
  8. MetaOptimize - Home - Machine learning, natural language processing, predictive analytics, business intelligence, artificial intelligence, text analysis, information retrieval, search, data mining, statistical modeling, and data visualization
  9. had.co.nz
  10. Measuring Measures - Measuring Measures

Repositories

  1. Repositories | DataCite
  2. Data | The World Bank
  3. Infochimps Data Marketplace + Commons: Download Sell or Share Databases, statistics, datasets for free | Infochimps
  4. Factual Home - Factual
  5. Flowing Media: Your Data Has Something To Say
  6. Chartsbin
  7. Public Data Explorer
  8. StatPlanet
  9. ManyEyes
  10. 25+ more ways to bring data into R

Events

  1. Welcome | Visweek 2011
  2. O'Reilly Strata: O'Reilly Conferences
  3. IBM Information On Demand 2011 and Business Analytics Forum
  4. Data Scientist Summit 2011
  5. IBM Virtual Performance 2011
  6. Wolfram Data Summit 2011—Conference on Data Repositories and Ideas
  7. Big Data Analytics: Mobile, Social and Web

Articles

  1. Data Science: a literature review | (R news & tutorials)
  2. What is "Data Science" Anyway?
  3. Hal Varian on how the Web challenges managers - McKinsey Quarterly - Strategy - Innovation
  4. The Three Sexy Skills of Data Geeks « Dataspora
  5. Rise of the Data Scientist
  6. dataists » A Taxonomy of Data Science
  7. The Data Science Venn Diagram « Zero Intelligence Agents
  8. Revolutions: Growth in data-related jobs
  9. Building data startups: Fast, big, and focused - O'Reilly Radar

BONUS! Art Design

  1. Periodic Table of Typefaces
  2. Color Scheme Designer 3
  3. Color Palette Generator Generate A Color Palette For Any Image
  4. COLOURlovers
  5. Colorbrewer: Color Advice for Maps

Image Searches

  1. American Memory from the Library of Congress The home page for the American Memory Historical Collections from the Library of Congress. American Memory provides free access to historical images, maps, sound recordings, and motion pictures that document the American experience. American Memory offers primary source materials that chronicle historical events, people, places, and ideas that continue to shape America.
  2. Galaxy of Images | Smithsonian Institution Libraries
  3. Flickr Search
  4. 50 Websites For Free Vector Images Download
  5. Design weblog for designers, bloggers and tech users. Covering useful tools, tutorials, tips and inspirational photos.
  6. Google Images - The most comprehensive image search on the web.
  7. Trade Literature - a set on Flickr
  8. Compfight / A Flickr Search Tool
  9. morgueFile free photos for creatives by creatives
  10. stock.xchng - the leading free stock photography site
  11. The Ultimate Collection Of Free Vector Packs - Smashing Magazine
  12. How to Create Animated GIFs Using Photoshop CS3 - wikiHow
  13. IAN Symbol Libraries (Free Vector Symbols and Icons) - Integration and Application Network
  14. Usability.gov
  15. best icons
  16. Iconspedia
  17. IconFinder
  18. IconSeeker

Invisible Web

  1. 10 Search Engines to Explore the Invisible Web - Like the header says...
  2. Scirus - for scientific information The most comprehensive scientific research tool on the web. With over 410 million scientific items indexed at last count, it allows researchers to search for not only journal content but also scientists' homepages, courseware, pre-print server material, patents and institutional repository and website information.
  3. TechXtra: Engineering, Mathematics, and Computing TechXtra is a free service which can help you find articles, books, the best websites, the latest industry news, job announcements, technical reports, technical data, full text eprints, the latest research, thesis & dissertations, teaching and learning resources and more, in engineering, mathematics and computing.
  4. Welcome to INFOMINE: Scholarly Internet Resource Collections INFOMINE is a virtual library of Internet resources relevant to faculty, students, and research staff at the university level. It contains useful Internet resources such as databases, electronic journals, electronic books, bulletin boards, mailing lists, online library card catalogs, articles, directories of researchers, and many other types of information.
  5. The WWW Virtual Library The WWW Virtual Library (VL) is the oldest catalogue of the Web, started by Tim Berners-Lee, the creator of HTML and of the Web itself, in 1991 at CERN in Geneva. Unlike commercial catalogues, it is run by a loose confederation of volunteers, who compile pages of key links for particular areas in which they are expert; even though it isn't the biggest index of the Web, the VL pages are widely recognised as being amongst the highest-quality guides to particular sections of the Web.
  6. Intute Intute is a free online service that helps you to find web resources for your studies and research. With millions of resources available on the Internet, it can be difficult to find useful material. We have reviewed and evaluated thousands of resources to help you choose key websites in your subject. The Virtual Training Suite can also help you develop your Internet research skills through tutorials written by lecturers and librarians from universities across the UK.
  7. CompletePlanet - Discover over 70,000+ databases and specialty search engines There are hundreds of thousands of databases that contain Deep Web content. CompletePlanet is the front door to these Deep Web databases on the Web and to the thousands of regular search engines — it is the first step in trying to find highly topical information. By tracing through CompletePlanet's subject structure or searching Deep Web sites, you can go to various topic areas, such as energy or agriculture or food or medicine, and find rich content sites not accessible using conventional search engines. BrightPlanet initially developed the CompletePlanet compilation to identify and tap into many hundreds and thousands of search sources simultaneously to automatically deliver high-quality content to its corporate and enterprise customers. It then decided to make CompletePlanet available as a public service to the Internet search public.
  8. Infoplease: Encyclopedia, Almanac, Atlas, Biographies, Dictionary, Thesaurus. Information Please has been providing authoritative answers to all kinds of factual questions since 1938—first as a popular radio quiz show, then starting in 1947 as an annual almanac, and since 1998 on the Internet at www.infoplease.com. Many things have changed since 1938, but not our dedication to providing reliable information, in a way that engages and entertains.
  9. DeepPeep: discover the hidden web DeepPeep is a search engine specialized in Web forms. The current beta version tracks 45,000 forms across 7 domains. DeepPeep helps you discover the entry points to content in Deep Web (aka Hidden Web) sites, including online databases and Web services. Advanced search allows you to perform more specific queries. Besides specifying keywords, you can also search for specific form element labels, i.e., the description of the form attributes.
  10. IncyWincy: The Invisible Web Search Engine IncyWincy is a showcase of Net Research Server (NRS) 5.0, a software product that provides a complete search portal solution, developed by LoopIP LLC. LoopIP licenses the NRS engine and provides consulting expertise in building search solutions.

Metadata

  1. MODS: Metadata Object Description Schema (Library of Congress) and Outline of Elements and Attributes in MODS Version 3.4 - This document lists the elements and their related attributes in MODS Version 3.4, with values or value sources where applicable; it is an "outline" of the schema. Items highlighted in red indicate changes made to MODS in Version 3.4. All top-level elements and all attributes are optional, but you must have at least one element. Subelements are optional, although in some cases you may not have empty containers. Attributes are not in a mandated sequence and are not repeatable (per XML rules). "Ordered" means the subelements must occur in the order given. Elements are repeatable unless otherwise noted. "Authority" attributes are either followed by codes for authority lists (e.g., iso639-2b) or "see" references that link to documents containing codes for identifying authority lists. For additional information about any MODS element (Version 3.4 elements will be added soon), please see the MODS User Guidelines.
  2. wiki.dbpedia.org : About DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link other data sets on the Web to Wikipedia data. We hope this will make it easier for the amazing amount of information in Wikipedia to be used in new and interesting ways, and that it might inspire new mechanisms for navigating, linking and improving the encyclopaedia itself.
  3. Semantic Web - W3C In addition to the classic “Web of documents” W3C is helping to build a technology stack to support a “Web of data,” the sort of data you find in databases. The ultimate goal of the Web of data is to enable computers to do more useful work and to develop systems that can support trusted interactions over the network. The term “Semantic Web” refers to W3C’s vision of the Web of linked data. Semantic Web technologies enable people to create data stores on the Web, build vocabularies, and write rules for handling data. Linked data are empowered by technologies such as RDF, SPARQL, OWL, and SKOS.
  4. RDA: Resource Description & Access | www.rdatoolkit.org Designed for the digital world and an expanding universe of metadata users, RDA: Resource Description and Access is the new, unified cataloging standard. The online RDA Toolkit subscription is the most effective way to interact with the new standard. More on RDA.
  5. Cataloging Cultural Objects Cataloging Cultural Objects: A Guide to Describing Cultural Works and Their Images (CCO) is a manual for describing, documenting, and cataloging cultural works and their visual surrogates. The primary focus of CCO is art and architecture, including but not limited to paintings, sculpture, prints, manuscripts, photographs, built works, installations, and other visual media. CCO also covers many other types of cultural works, including archaeological sites, artifacts, and functional objects from the realm of material culture.
  6. Library of Congress Authorities (Search for Name, Subject, Title and Name/Title) Using Library of Congress Authorities, you can browse and view authority headings for Subject, Name, Title and Name/Title combinations; and download authority records in MARC format for use in a local library system. This service is offered free of charge.
  7. Search Tools and Databases (Getty Research Institute) Use these search tools to access library materials, specialized databases, and other digital resources.
  8. Art & Architecture Thesaurus (Getty Research Institute) Learn about the purpose, scope and structure of the AAT. The AAT is an evolving vocabulary, growing and changing thanks to contributions from Getty projects and other institutions. Find out more about the AAT's contributors.
  9. Getty Thesaurus of Geographic Names (Getty Research Institute) Learn about the purpose, scope and structure of the TGN. The TGN is an evolving vocabulary, growing and changing thanks to contributions from Getty projects and other institutions. Find out more about the TGN's contributors.
  10. DCMI Metadata Terms
  11. The Digital Object Identifier System
  12. The Federal Geographic Data Committee — Federal Geographic Data Committee
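As a concrete illustration of the MODS outline described in item 1 above, a minimal MODS 3.4 record might look like the sketch below. The title, name, and date values are invented placeholders; real records typically carry far more detail.

```xml
<!-- Minimal sketch of a MODS 3.4 record; all values are invented examples -->
<mods xmlns="http://www.loc.gov/mods/v3" version="3.4">
  <!-- All top-level elements are optional, but at least one must be present -->
  <titleInfo>
    <title>Example Survey Dataset</title>
  </titleInfo>
  <name type="personal">
    <namePart>Doe, Jane</namePart>
  </name>
  <typeOfResource>software, multimedia</typeOfResource>
  <originInfo>
    <dateIssued>2011</dateIssued>
  </originInfo>
</mods>
```

Note how the outline's rules show up even in this tiny record: each top-level element is optional on its own, and the "authority"-style controlled values (here, the typeOfResource term) come from lists maintained by the Library of Congress.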
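To make the "Web of data" idea from the Semantic Web entry above concrete, here is a minimal sketch of linked data in RDF's Turtle syntax. The DCMI (dcterms) and FOAF vocabularies are real; the dataset and person resources are invented examples.

```turtle
# Invented resources described with the real DCMI and FOAF vocabularies
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .

<http://example.org/dataset/42>
    dcterms:title   "Example Survey Dataset" ;
    dcterms:creator <http://example.org/person/jdoe> .

<http://example.org/person/jdoe>
    foaf:name "Jane Doe" .
```

Because both resources have global URIs, a SPARQL engine can follow the dcterms:creator link from the dataset to the person, which is the kind of machine-usable linking the W3C description refers to.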

SBE 2020 white papers available

The Directorate for the Social, Behavioral, and Economic Sciences of the National Science Foundation (NSF/SBE) has today released a collection of white papers contributed under the "SBE 2020: Future Research in the Social, Behavioral & Economic Sciences" initiative. Authors were asked to outline grand challenge questions that are both foundational and transformative. For information, please visit:
http://www.nsf.gov/sbe/sbe_2020/index.cfm more...

Special Double Issue of IQ on Data Documentation Initiative

Guest Editors Notes - Mary Vardigan and Joachim Wackerow

Welcome to a special double issue of the IASSIST Quarterly featuring articles focused on the Data Documentation Initiative (DDI), a metadata standard for the social sciences. We are proud to present these six articles, which explore various projects related to DDI 3 and its enhanced features. more...
