Research Information Scientist

Posted to IASSIST on: 2015-10-24

Employer: New York University Center for Urban Science & Progress

Employer URL:


The research information scientist will serve as an information specialist, programmer, and ETL engineer, supporting the full CUSP data life cycle, including data curation, data ingestion, data discovery, and researcher access. The research information scientist will be responsible for collecting, developing, collating, archiving, and communicating information about research datasets in the CUSP data facility. In that role, s/he will oversee the metadata management system and design and implement new features or services as needed, which requires strong programming and database skills. S/he will provide programming support to software engineers in adapting, building, and updating in-house data profiling and discovery software. A successful candidate will also be able to develop basic ETL scripts and execute complex ones for data ingest and researcher database development. This person will lead CUSP’s metadata knowledge management, covering structural and domain information about data assets. In this role, s/he will communicate with domain experts on NYC and related open data, urban policy research data, and physical measurement data, creating a database that facilitates data discovery beyond the standard laundry-list approach.

  • Create and update metadata standards for the data facility, covering tabular and non-tabular datasets (such as images, sound, text), including geospatial data.
  • Provide development support for and maintain an internal metadata management tool (currently CKAN); provide functional specifications and development support for internal data discovery tools.
  • Work directly with dataset domain experts (generally, these are the data providers and CUSP researchers) in order to create a domain knowledge base about dataset quality and content; this includes how data were collected or derived, and known issues.
  • Communicate with data facility users about all datasets housed in the facility, providing guidance for users to identify the appropriate data for research questions; this will include documenting user activity to feed into the metadata database.
  • Serve as the primary point of contact for data facility users with data access and workspace requests (students, faculty, agency staff, etc.); this includes communication with users prior to submitting data access/workspace requests and internal routing of user access/workspace requests using an in-house workflow management system.
  • Develop and run ETL scripts for tabular data.
  • Work with the software developer and systems engineer to support development of complex ETL scripts for difficult and non-tabular datasets.
  • Develop technical specs and provision existing ETL scripts for data of all types (tabular, time series, images, GIS, streaming data) in order to create data marts for facility users.
  • Manage and track data facility information security training sessions for all users and data stewards; this includes tracking data stewards’ compliance with data facility best practices in data management, confidentiality, privacy, and governance.


  • M.S. in library and information science or a related field
  • Bachelor’s degree in programming, information technology or a related field OR an equivalent combination of education/experience in technology and operations
  • 3+ years of practical experience in research dataset curation
  • 3+ years of programming experience with Python, Perl, Ruby, or a similar language
  • 3+ years of experience managing data in XML and JSON
  • 2+ years of experience with at least basic database development using Oracle, MySQL, MSSQL, or PostgreSQL
  • Experience managing large datasets and creating databases (ETL) for social science research
  • Working knowledge of metadata standards: technical metadata, descriptive metadata (Dublin Core, MODS, DDI, CSDGM), process metadata, and preservation metadata (PREMIS); this will require an ability to learn, implement, and crosswalk metadata standards
  • 1-2 years of experience working with and communicating with domain scientists
  • Experience communicating with nontechnical audiences
  • Expertise in best practices in use, reuse, reproducibility, curation, and preservation of scientific data
  • Excellent time management and project management skills
  • Passionate about the value of responsible data management and reproducible data analysis for evidence-based policy; thrives in a fast-paced, entrepreneurial work environment

Preferred Skills:

  • Experience using APIs to access and query complex datasets
  • Experience developing APIs for dataset dissemination

Archived on: 2015-10-26