Welcome to the Earth Science Information Partners (ESIP) 2018 Summer Meeting! The 2018 theme is Realizing the Socioeconomic Value of Data. The theme is based on one of the goals in the 2015 - 2020 ESIP Strategic Plan, which provides a framework for ESIP’s activities over the next three years.

If you haven’t already, register here!

Room Block Update: Our block is full. We recommend the AC Hotel Tucson Downtown, which is about 5 minutes away by car or about 15 minutes via the Tucson Streetcar.
Tuesday, July 17 • 11:30am - 1:00pm
Optimizing Data for the Cloud

Session Description: When data is shared in the cloud, anyone can analyze it without having to download it or store it themselves, which lowers the cost of new product development, reduces the time to scientific discovery, and can accelerate innovation. However, staging large-scale datasets for analysis in the cloud requires consideration of how data should be prepared and organized to allow fast, efficient, and programmatic access from distributed computing systems. This workshop will provide a forum for members of the community to share lessons learned as they explore ways to use the cloud to expand access to data. It seeks to encourage dialog between users interested in leveraging data in the AWS Cloud for research and application development.


Data Optimization for the Cloud: Data Formats (July 17th, 11:30 am – 1:00 pm):

AGENDA


Otis Brown and Jonathan Brannock, CICS-NC (10 min)
Title: Big Data Project (BDP) Data Broker Update
Description: The NOAA Big Data Project Data Broker role and the current datasets being provided by CICS-NC are reviewed. NOAA datasets under consideration for provision to the cloud partners are described. An update on GOES-16 accession from AWS S3, including usage by volume and users, is given. New policy challenges associated with reformatting datasets and online updates are discussed.

Rich Signell, USGS (10 min)
Title: Cloud-friendly ndarray formats
Description: There is a tremendous amount of scientific multidimensional array data (ndarray) stored in NetCDF or HDF files. Since the cloud uses object storage, not conventional filesystems, there is a need for a "cloud-friendly" storage format that can support the NetCDF and HDF data models. Several solutions have been proposed, including HSDS, Zarr, TileDB, and S3-Netcdf. These can be compared with FUSE, which provides a POSIX layer to make object storage look like a filesystem. This talk will discuss what the Pangeo project is doing to explore these data formats and the challenges that remain for the community.

Rob Emanuele, Azavea (10 min)
Title: Cloud Optimized GeoTIFFs: enabling efficient cloud workflows
Description: Cloud Optimized GeoTIFFs (COGs) are a raster data format that is a key component in enabling cloud-native geospatial workflows. COGs enable faster reading, writing, and processing of raster data on the cloud without the need for local copies. This talk will include a brief overview of what COGs are and show examples of how they can be used to leverage cloud deployment for research and application development.

John Readey, The HDF Group (10 min)
Title: HDF Data in the Cloud
Description: Amazon S3 is a great storage technology for the cloud: it is scalable, has built-in redundancy, and is cost-effective. Traditionally, however, HDF5 files stored on S3 haven’t worked well (or at all) with applications that expect data to be stored on POSIX filesystems, requiring files to be copied to local storage before being accessed. In order to enable HDF data for cloud-based analytics over massive datasets, The HDF Group has developed new methods for storing HDF data on S3 that take full advantage of the storage platform, allow data to be accessed in place, and are compatible with existing applications. This talk will review these technologies and outline some future directions.

General discussion (10 min)

Breakout groups: focus on data formats (30 min)

Report findings from breakout groups (10 min)





Tuesday July 17, 2018 11:30am - 1:00pm
Pima
  • Subject Jump In, Deep Dive
  • Remote Participation Link https://global.gotomeeting.com/join/752150301
  • Remote Participation Access Code 752-150-301
  • Remote Participation Phone # (646) 749-3129
  • Additional Phone Numbers Australia: +61 2 9087 3604; Austria: +43 7 2081 5427; Belgium: +32 28 93 7018; Canada: +1 (647) 497-9391; Denmark: +45 32 72 03 82; Finland: +358 942 72 1060; France: +33 170 950 594; Germany: +49 692 5736 7317; Ireland: +353 16 572 651; Italy: +39 0 247 92 13 01; Netherlands: +31 202 251 017; New Zealand: +64 9 280 6302; Norway: +47 21 93 37 51; Spain: +34 932 75 2004; Sweden: +46 853 527 827; Switzerland: +41 435 5015 61; United Kingdom: +44 20 3713 5028
  • Tags Cloud Computing, Data Analytics