Welcome to the Earth Science Information Partners (ESIP) 2018 Summer Meeting! The 2018 theme is Realizing the Socioeconomic Value of Data. The theme is based on one of the goals in the 2015 - 2020 ESIP Strategic Plan, which provides a framework for ESIP’s activities over the next three years.

All presentations are being added temporarily to a Google folder and will then be moved to FigShare and linked to the sessions here.
Tuesday, July 17 • 11:30am - 1:00pm
Collaboration among data repositories: replication, deduplication, and interoperability

Environmental data repositories are rapidly adapting to positive changes in the culture of data publishing, as requested by funders, journals, and researchers. Repositories are increasingly designated as the principal site for depositing data and research products from specific sponsor programs (e.g., BCO-DMO for NSF Biological & Chemical Oceanography, EDI for NSF LTER and DEB programs, the Arctic Data Center for NSF Arctic programs, and NCEI for NOAA data of all stripes). The result is many highly specialized repositories that serve specific communities and act as responsible curators for targeted swaths of data. These repositories then face the challenge of replicating data to meet funder expectations while providing an integrated discovery and access system for their own communities and for the broader environmental sciences community. Repository interoperability allows federated data aggregators like DataONE and ESDIS to provide a common discovery and interoperability layer, and a searchable view, on top of this federated repository infrastructure.

In this session, we will…
  • Explore the concepts of data sharing, data replication, and data duplication among repositories and what they mean for the user community (short intro to the problem)
  • Explore some real-world data sharing/interoperability scenarios
  • Identify the common elements and requirements for data interoperability between repositories (e.g., Elements: Dataset, Funding Award, Persons, Organizations, Roles, etc., and Requirements: ‘Element’ Identification, ACLs, Attribution of sources, PROV, etc.)
  • Try to answer the question, “Are the existing science metadata standards sufficient for data interoperability and replication among repositories?” That is, can they express the relationships between data in different repositories (‘primary or original’ data, a synchronized copy, a copy of a certain version, a subset associated with a publication)?
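
As a starting point for discussion, the relationship types named in the last question could be written down as a small data model. The sketch below is illustrative only: the `CopyKind` and `Holding` names, the repository labels, and the identifiers are all hypothetical, and it does not reflect any existing metadata standard or repository API. It assumes each copy records the identifier of the primary dataset it derives from, which is the kind of information the question asks whether current standards can express.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List, Optional


class CopyKind(Enum):
    """Hypothetical labels for how a holding relates to the original data."""
    PRIMARY = "primary"                        # the primary or original deposit
    SYNCHRONIZED_COPY = "synchronized_copy"    # kept in step with the primary
    VERSION_COPY = "version_copy"              # copy of one specific version
    PUBLICATION_SUBSET = "publication_subset"  # subset tied to a publication


@dataclass(frozen=True)
class Holding:
    """One repository's copy of a dataset (assumed structure, not a standard)."""
    repository: str
    identifier: str
    kind: CopyKind
    # Identifier of the primary dataset; None when this holding is the primary.
    source_identifier: Optional[str] = None


def group_by_primary(holdings: List[Holding]) -> Dict[Optional[str], List[Holding]]:
    """Group holdings under the identifier of the primary dataset they derive from."""
    groups: Dict[Optional[str], List[Holding]] = {}
    for h in holdings:
        key = h.identifier if h.kind is CopyKind.PRIMARY else h.source_identifier
        groups.setdefault(key, []).append(h)
    return groups


# Placeholder repositories and identifiers, invented for this example.
holdings = [
    Holding("Repository A", "doi:10.0000/example.1", CopyKind.PRIMARY),
    Holding("Repository B", "doi:10.0000/example.1",
            CopyKind.SYNCHRONIZED_COPY, "doi:10.0000/example.1"),
    Holding("Repository C", "local:subset-7",
            CopyKind.PUBLICATION_SUBSET, "doi:10.0000/example.1"),
]
groups = group_by_primary(holdings)
```

Grouping on the recorded primary identifier would let an aggregator present one dataset with three holdings rather than three apparently unrelated datasets, provided the repositories actually exchange that relationship.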
Agenda

1) Repository interoperability challenges (Jones) (20 minutes)

  • technical: identifier practices, mutability, duplication, versioning and derived data variants, built infrastructure

  • socio-cultural: open source & open communities, not-invented-here (NIH) syndrome, tech leapfrogging, so many standards to choose from

  • DataONE crosswalk/integration experiences

2) Case studies in interoperability challenges

  • EDI / BCO-DMO (Gries) (10 minutes)

  • BCO-DMO / R2R / NCEI (Shepherd) (10 minutes)

  • Arctic Data Center / IARC / EDI / LTER (Jones) (10 minutes)

3) Brainstorming, Discussion and Q&A (Shepherd moderates) (40 minutes)

  • What are the easy interoperability wins?

  • What are the hard interoperability challenges?

  • What does it take to build an open community where:

    • Many repositories implement the same API, share identifier and versioning models, can replicate content without creating new identifiers, and can be searched from a common system like DataONE?


Speakers & Moderators

Matt Jones

Director of Informatics R&D, DataONE / NCEAS / UC Santa Barbara
DataONE | Arctic Data Center | Open Science | Provenance and Semantics | Scientific Synthesis

Adam Shepherd

Technical Director, BCO-DMO
Architecting adaptive and sustainable data infrastructures. Co-chair of the ESIP schema.org cluster.
Knowledge Graphs | Data Containerization | Declarative Workflows | Provenance | schema.org



Tuesday July 17, 2018 11:30am - 1:00pm PDT
Canyon B, Working Session