Environmental data repositories are adapting rapidly to positive changes in the culture of data publishing, as requested by funders, journals, and researchers. Repositories are increasingly being designated as the principal sites for depositing data and research products from specific sponsor programs (e.g., BCO-DMO for NSF Biological & Chemical Oceanography, EDI for the NSF LTER and DEB programs, the Arctic Data Center for NSF Arctic programs, and NCEI for NOAA data of all stripes). The result is many highly specialized repositories, each serving a specific community and acting as the responsible curator for a targeted swath of data. These repositories must then replicate copies of data to meet funder expectations while also providing an integrated discovery and access system, both for their own communities and for the broader environmental sciences community. Repository interoperability allows federated data aggregators such as DataONE and ESDIS to provide a common discovery and interoperability layer, with a single searchable view, on top of this federated repository infrastructure.
In this session, we will…
- Explore the concepts of data sharing, replication, and duplication among repositories, and what each means for the user community (a short introduction to the problem)
- Explore some real-world data sharing and interoperability scenarios
- Identify the common elements and requirements for data interoperability between repositories (e.g., Elements: Dataset, Funding Award, Persons, Organizations, Roles, etc.; Requirements: ‘Element’ identification, ACLs, attribution of sources, PROV, etc.)
- Try to answer the question “Are the existing science metadata standards sufficient for data interoperability and replication among repositories?” That is, can they express the relationships between data held in different repositories (‘primary’ or original data, a synchronized copy, a copy of a specific version, a subset associated with a publication)?
Agenda
1) Repository interoperability challenges (Jones) (20 minutes)
   - technical: identifier practices, mutability, duplication, versioning and derived data variants, existing built infrastructure
   - socio-cultural: open source and open communities, not-invented-here (NIH) syndrome, tech leapfrogging, so many standards to choose from
   - DataONE crosswalk/integration experiences
2) Case studies in interoperability challenges
   - EDI / BCO-DMO (Gries) (10 minutes)
   - BCO-DMO / R2R / NCEI (Shepherd) (10 minutes)
   - Arctic Data Center / IARC / EDI / LTER (Jones) (10 minutes)
3) Brainstorming, discussion, and Q&A (Shepherd moderates) (40 minutes)
   - What are the easy interoperability wins?
   - What are the hard interoperability challenges?
   - What does it take to build an open community in which many repositories implement the same API, share identifier and versioning models, replicate content without creating new identifiers, and can be searched from a common system such as DataONE?