
Building a Federated Data Catalog with Client Implementations - Meeting Data Where It Is

poster
posted on 2023-07-19, 15:48 authored by Mike Johnson, Jim Coll, Angus Watters, Justin Singh, Rachel Bash

Much of the dialogue and technical advancement surrounding the use of Earth observation data centers on data creators and providers, whose official responsibilities end at data delivery. While providers such as NOAA, NASA, and the USGS have taken critical steps to collect and publicly host data, these data are spread across a range of locations that require varying types of data access protocols. Finding, accessing, and extracting subsets of data from these varied providers is a burdensome and often challenging task that could be minimized with an automatically refreshing (“Automatic Refresh”), federated data catalog (“Federated Flat Catalog”) paired with common-language software for access (“Programmatic Access”).
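To make the idea of a "flat" federated catalog concrete, here is a minimal sketch, assuming each catalog row records where one dataset lives and how to reach it. The `CatalogEntry` schema, its field names, and the placeholder URLs are hypothetical illustrations, not the poster's actual catalog format. The point is that providers using different protocols (S3, FTP, HTTPS) can sit side by side in one table that serializes to plain JSON and can be searched by variable.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class CatalogEntry:
    """One row of a flat catalog (hypothetical schema, for illustration only)."""
    id: str        # dataset identifier
    variable: str  # variable served, e.g. "elevation"
    provider: str  # hosting organization
    protocol: str  # access protocol, e.g. "s3", "ftp", "https"
    url: str       # location of the data (placeholders below, not real endpoints)


def to_json(entries):
    """Serialize the catalog as a flat JSON array of records."""
    return json.dumps([asdict(e) for e in entries], indent=2)


def from_json(text):
    """Rebuild catalog entries from their JSON form."""
    return [CatalogEntry(**row) for row in json.loads(text)]


def find(entries, variable):
    """Return every entry, across all providers, that serves `variable`."""
    return [e for e in entries if e.variable == variable]


# Hypothetical entries mixing access protocols in a single flat table.
catalog = [
    CatalogEntry("dem", "elevation", "USGS", "s3", "s3://example-bucket/dem.vrt"),
    CatalogEntry("soils", "sand", "Duke", "ftp", "ftp://ftp.example.org/polaris/sand.tif"),
    CatalogEntry("lc", "landcover", "USGS", "https", "https://example.org/lcmap.tif"),
]

restored = from_json(to_json(catalog))
print([e.provider for e in find(restored, "elevation")])
```

Keeping every row flat (no nesting per provider) is what lets the same table be written to JSON or a columnar format and read identically from R or Python clients.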

This combination of an auto-refreshing catalog and multi-language implementations (R and Python) allowed the catalog to grow its data holdings from 11 to over 2,000 data providers and to be shared as JSON and Parquet files from a github.io page. To highlight how these tools might be used, “Examples: Figure (A)” shows how one might use the catalog and the generic dap() function to extract, for the city of Fort Collins, elevation from the USGS National Map S3 account, POLARIS soils data from a Duke FTP server, land cover from the USGS LCMAP team over HTTPS, and a derived wetness index from a Lynker S3 bucket. In “Examples: Figure (B)”, 4 days of rainfall data for the state of Florida are subset using a climatePy shortcut for TerraClimate data.
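The poster names the generic dap() function without describing its internals, so the following is not its implementation. It is a minimal sketch of the general idea behind subsetting a remote grid to an area of interest: convert a bounding box into row and column index ranges on a regular north-up grid, then express that window as a DAP2-style hyperslab constraint (`var[start:stride:stop]`, stop inclusive). The function names, the 2-D (no time dimension) layout, and the 1-degree global grid in the usage example are all hypothetical simplifications.

```python
import math


def bbox_to_window(xmin, ymin, xmax, ymax, x0, y0, res, ncols, nrows):
    """Map a bounding box to inclusive (row_start, row_end, col_start, col_end)
    indices on a regular north-up grid whose upper-left corner is (x0, y0)
    and whose square cells are `res` units wide (hypothetical helper)."""
    col_start = max(0, math.floor((xmin - x0) / res))
    col_end = min(ncols - 1, math.ceil((xmax - x0) / res) - 1)
    # Rows count downward from the top edge, so the box's ymax gives the first row.
    row_start = max(0, math.floor((y0 - ymax) / res))
    row_end = min(nrows - 1, math.ceil((y0 - ymin) / res) - 1)
    if col_start > col_end or row_start > row_end:
        raise ValueError("bounding box does not overlap the grid")
    return row_start, row_end, col_start, col_end


def opendap_slice(var, window):
    """Format a window as a DAP2 hyperslab constraint, var[r0:1:r1][c0:1:c1]."""
    r0, r1, c0, c1 = window
    return f"{var}[{r0}:1:{r1}][{c0}:1:{c1}]"


# Rough Florida bounding box on a hypothetical 1-degree global grid.
window = bbox_to_window(-87.6, 24.5, -80.0, 31.0,
                        x0=-180.0, y0=90.0, res=1.0, ncols=360, nrows=180)
print(window)                        # (59, 65, 92, 99)
print(opendap_slice("ppt", window))  # ppt[59:1:65][92:1:99]
```

Requesting only this index window, rather than the whole file, is what makes server-side subsetting cheap; a real gridded product would add a leading time dimension to the constraint.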

Funding

ESIP Lab


    ESIP JULY 2023
