AMP: An Automated Metadata Pipeline
Poster posted on 2021-07-20, 18:03, authored by Beth Huffer
We are developing an automated metadata production pipeline and a data discovery and access service. We auto-generate machine-readable, semantically consistent metadata records using a combination of ontologies, inference rules, and machine learning. A data preparation pipeline parses data files and prepares them for analysis by a convolutional neural network (CNN), which generates labels indicating the scientific type of each dataset. HDF and NetCDF files are reformatted as CSV for delivery both to our internal ML pipeline and to data consumers using our discovery and access service. This poster was presented at the 2021 Earth Science Information Partners (ESIP) Summer Meeting, held virtually in July 2021.
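The reformatting of gridded HDF and NetCDF data as CSV can be illustrated with a minimal sketch. It is not the AMP implementation: the function name `grid_to_csv`, the long-format column layout (`lat`, `lon`, variable), and the sample values are all assumptions made for illustration. The sketch also assumes the variable and its coordinate arrays have already been read into plain Python sequences, for example with a library such as netCDF4 or h5py.

```python
import csv
import io


def grid_to_csv(lats, lons, values, var_name):
    """Flatten a 2-D gridded variable into long-format CSV text.

    Hypothetical helper: lats and lons are 1-D coordinate sequences,
    and values holds one row per latitude, indexed as values[lat][lon].
    """
    if len(values) != len(lats) or any(len(row) != len(lons) for row in values):
        raise ValueError("values must have shape (len(lats), len(lons))")

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["lat", "lon", var_name])
    for lat, row in zip(lats, values):
        for lon, value in zip(lons, row):
            # Missing cells (None) are written as empty CSV fields.
            writer.writerow([lat, lon, "" if value is None else value])
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up 2 x 3 grid, used only to show the output layout.
    print(grid_to_csv(
        [10.0, 20.0],
        [0.0, 5.0, 10.0],
        [[290.1, 290.4, None], [285.0, 285.2, 285.9]],
        "sst_kelvin",
    ))
```

A long format with one row per grid cell is only one possible layout. It is convenient for tabular tools, but it discards the grid shape, so a consumer that needs the original array would have to rebuild it from the coordinate columns.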