Welcome to the Earth Science Information Partners (ESIP) 2018 Summer Meeting! The 2018 theme is Realizing the Socioeconomic Value of Data. The theme is based on one of the goals in the 2015-2020 ESIP Strategic Plan, which provides a framework for ESIP’s activities over the next three years.

Room Block Update: Our block is full. We recommend the AC Hotel Tucson Downtown, which is about 5 minutes away by car or about 15 minutes by the Tucson Streetcar.
Tuesday, July 17 • 11:30am - 1:00pm
Optimizing Data for the Cloud

Session Description: When data is shared in the cloud, anyone can analyze it without having to download it or store it themselves, which lowers the cost of new product development, reduces the time to scientific discovery, and can accelerate innovation. However, staging large-scale datasets for analysis in the cloud requires consideration of how data should be prepared and organized to allow fast, efficient, and programmatic access from distributed computing systems. This workshop will provide a forum for members of the community to share lessons learned as they explore ways to use the cloud to expand access to data. It seeks to encourage dialog between users interested in leveraging data in the AWS Cloud for research and application development.

Data Optimization for the Cloud: Data Formats (July 17, 11:30 am – 1:00 pm):

Otis Brown and Jonathan Brannock, CICS-NC (10 min)
Title: Big Data Project (BDP) Data Broker Update
Description: This talk reviews the NOAA Big Data Project Data Broker role and the datasets currently provided by CICS-NC, and describes the NOAA datasets under consideration for provision to the cloud partners. It also gives an update on GOES-16 access from AWS S3, including usage by data volume and number of users, and discusses new policy challenges associated with reformatting datasets and providing online updates.

Rich Signell, USGS (10 min)
Title: Cloud-friendly ndarray formats
Description: A tremendous amount of scientific multidimensional array (ndarray) data is stored in NetCDF or HDF files. Because the cloud uses object storage rather than conventional filesystems, there is a need for a "cloud-friendly" storage format that can support the NetCDF and HDF data models. Several solutions have been proposed, including HSDS, Zarr, TileDB, and S3-Netcdf, and they can be compared with FUSE, which provides a POSIX layer that makes object storage look like a filesystem. This talk will discuss what the Pangeo project is doing to explore these data formats and the challenges that remain for the community.
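
The sketch below is a minimal, pure-Python illustration of the idea behind chunked formats such as Zarr. It assumes a plain dict stands in for an object store like S3. Each chunk of the array is stored as its own object, so a reader fetches only the chunks that overlap the region it needs. The function names, the `.meta` key, and the JSON encoding are hypothetical choices made for this illustration. Real Zarr keeps its metadata in `.zarray` and writes compressed binary chunks. Chunk keys here follow the Zarr v2 convention of dot-joined chunk indices.

```python
import json


def chunk_key(name, idx):
    # Zarr v2-style key: chunk indices joined by "." under the array name
    return f"{name}/" + ".".join(str(i) for i in idx)


def write_chunked(store, name, data, chunk):
    """Split a 2-D list of lists into tiles, storing one object per tile."""
    rows, cols = len(data), len(data[0])
    ch_r, ch_c = chunk
    # Hypothetical metadata object recording array shape and chunk shape
    store[f"{name}/.meta"] = json.dumps({"shape": [rows, cols], "chunks": [ch_r, ch_c]})
    for ci in range(0, rows, ch_r):
        for cj in range(0, cols, ch_c):
            tile = [row[cj:cj + ch_c] for row in data[ci:ci + ch_r]]
            store[chunk_key(name, (ci // ch_r, cj // ch_c))] = json.dumps(tile)


def read_region(store, name, r0, r1, c0, c1, fetched=None):
    """Read rows [r0, r1) and columns [c0, c1), fetching only overlapping chunks."""
    meta = json.loads(store[f"{name}/.meta"])
    ch_r, ch_c = meta["chunks"]
    out = [[None] * (c1 - c0) for _ in range(r1 - r0)]
    # Chunk-grid indices covering the requested region
    for bi in range(r0 // ch_r, (r1 - 1) // ch_r + 1):
        for bj in range(c0 // ch_c, (c1 - 1) // ch_c + 1):
            key = chunk_key(name, (bi, bj))
            if fetched is not None:
                fetched.append(key)  # record each simulated GET
            tile = json.loads(store[key])
            # Copy only the cells of this tile that fall inside the region
            for i, row in enumerate(tile):
                gi = bi * ch_r + i
                if not (r0 <= gi < r1):
                    continue
                for j, value in enumerate(row):
                    gj = bj * ch_c + j
                    if c0 <= gj < c1:
                        out[gi - r0][gj - c0] = value
    return out
```

Reading rows 1-2 and columns 5-6 of a 10 x 10 array stored in 4 x 4 chunks touches a single object (`temp/0.1`). On object storage, each such fetch would be a separate request, which is why chunk shape is an important tuning choice for cloud access.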

Rob Emanuele, Azavea (10 min)
Title: Cloud Optimized GeoTiffs: enabling efficient cloud workflows
Description: Cloud Optimized GeoTIFFs (COGs) are a raster data format that is a key component in enabling cloud-native geospatial workflows. COGs enable faster reading, writing, and processing of raster data in the cloud without the need for local copies. This talk will include a brief overview of what COGs are and show examples of how they can be used to leverage cloud deployment for research and application development.
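
Because a COG is internally tiled, a client can read a pixel window by requesting only the bytes of the tiles that intersect it. The sketch below assumes the tile layout has already been parsed from the file header. That layout comes from the TIFF `TileWidth`, `TileLength`, `TileOffsets`, and `TileByteCounts` tags. The helper names are hypothetical, and the offsets used with them are made-up numbers rather than values from a real file. Libraries such as GDAL do this bookkeeping internally.

```python
def tiles_for_window(x0, y0, width, height, tile_w, tile_h, tiles_across):
    """Return linear tile indices intersecting a pixel window.

    TIFF orders tiles left-to-right, then top-to-bottom, so the index of the
    tile at (row, col) in the tile grid is row * tiles_across + col.
    """
    first_col, last_col = x0 // tile_w, (x0 + width - 1) // tile_w
    first_row, last_row = y0 // tile_h, (y0 + height - 1) // tile_h
    return [r * tiles_across + c
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


def byte_ranges(tile_ids, offsets, byte_counts):
    """Map tile indices to inclusive (start, end) byte ranges."""
    return [(offsets[t], offsets[t] + byte_counts[t] - 1) for t in tile_ids]


def coalesce(ranges):
    """Merge byte ranges that touch or overlap, so fewer requests are needed."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1] + 1:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def to_range_header(start, end):
    # HTTP Range header value; the end offset is inclusive
    return f"bytes={start}-{end}"
```

Consider a 1024 x 1024 image with 256 x 256 tiles. A 400 x 200 window starting at pixel (300, 100) touches tiles 1, 2, 5, and 6. If each tile's bytes sit right after the previous tile's, merging adjacent ranges turns four requests into two. This version merges only touching ranges, so it never downloads unneeded bytes. A real client might also merge ranges separated by small gaps to save round trips.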

John Readey, The HDF Group (10 min)
Title: HDF Data in the Cloud
Description: Amazon S3 is a great storage technology for the cloud: it is scalable, has built-in redundancy, and is cost-effective. However, HDF5 files stored on S3 have traditionally not worked well (or at all) with applications that expect data to be stored on POSIX filesystems, so files had to be copied to local storage before being accessed. To enable HDF data for cloud-based analytics over massive datasets, The HDF Group has developed new methods for storing HDF data on S3 that take full advantage of the storage platform, allow data to be accessed in place, and are compatible with existing applications. This talk will review these technologies and outline some future directions.
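
One generic way to read an object in place is to present it as a seekable file whose reads become byte-range requests. This avoids copying the object to local storage first. The sketch below is a minimal illustration of that general idea, not The HDF Group's implementation. `RangeReader` and the `fetch(start, end)` callable are hypothetical names. In the example, an in-memory `bytes` object stands in for S3.

```python
import io


class RangeReader(io.RawIOBase):
    """Read-only, seekable file-like view over a remote object.

    `fetch(start, end)` is a hypothetical callable that returns the bytes
    from `start` to `end` inclusive, like an HTTP Range request.
    """

    def __init__(self, fetch, size):
        self._fetch = fetch
        self._size = size
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            self._pos = offset
        elif whence == io.SEEK_CUR:
            self._pos += offset
        elif whence == io.SEEK_END:
            self._pos = self._size + offset
        else:
            raise ValueError("invalid whence")
        return self._pos

    def readinto(self, buf):
        # At or past end of object: signal EOF
        if self._pos >= self._size:
            return 0
        # Request only the bytes needed to fill the caller's buffer
        end = min(self._pos + len(buf), self._size) - 1
        data = self._fetch(self._pos, end)
        n = len(data)
        buf[:n] = data
        self._pos += n
        return n
```

Against S3, `fetch` could wrap boto3's `get_object(Bucket=..., Key=..., Range=f"bytes={start}-{end}")["Body"].read()`, with `size` taken from `head_object(...)["ContentLength"]`. Each raw read becomes one network request. Wrapping the reader in `io.BufferedReader` with a large `buffer_size` tends to turn many small reads into fewer, larger ones. Libraries that accept Python file-like objects, such as recent versions of h5py, could then read HDF5 content without downloading the whole file. Performance still depends heavily on the file's internal layout.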

General discussion (10 min)

Breakout groups: focus on data formats (30 min)

Report findings from breakout groups (10 min)

  • Subject: Jump In, Deep Dive
  • Remote Participation Link: https://global.gotomeeting.com/join/752150301
  • Remote Participation Access Code: 752-150-301
  • Tags: Cloud Computing, Data Analytics
