NASA GES-DISC Knowledge Base: Connecting Science Variables, Measurement, Datasets, and Publications
The Goddard Earth Sciences Data and Information Services Center (GES-DISC) is a NASA data archive center focused on topics such as atmospheric composition, water and energy cycles, and climate variability. The center follows the FAIR (Findable, Accessible, Interoperable, Reusable) data principles, which emphasize making data findable, and users locate its datasets through a search engine. Conventional data discovery relies largely on manually curated metadata, which scales poorly as the number of datasets grows and may not account for linguistic variation in how users phrase queries. Recent advances in natural language processing and knowledge graphs led to the development of NASA GES-DISC vector search, which uses these technologies to improve search results by integrating the knowledge of the research community. The citation network serves as supplementary metadata, and knowledge graphs are used to enhance the explanations the search engine provides. Here we implement our graph on a hybrid of Neo4j and Weaviate vector search. Neo4j, as a graph-native database, supports graph analytics. Weaviate, as the vector search engine, allows us to perform high-level sparse queries and supports more abstract features such as natural language question answering. This poster was presented at the January 2023 Earth Science Information Partners (ESIP) Meeting, held virtually January 23-27, 2023.
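To make the hybrid idea concrete, the following is a minimal sketch of how a vector lookup and a citation graph might be combined. It is not the poster's implementation and uses neither the Neo4j nor the Weaviate client: a plain dictionary of embeddings stands in for the vector index, and a dictionary of citation edges stands in for the graph. All dataset names, paper names, vectors, and function names are hypothetical.

```python
import math

# Hypothetical stand-in for a vector index: dataset name -> embedding.
EMBEDDINGS = {
    "dataset_a": [1.0, 0.0],
    "dataset_b": [0.8, 0.6],
    "dataset_c": [0.0, 1.0],
}

# Hypothetical stand-in for a citation graph: publication -> cited datasets.
CITATIONS = {
    "paper_1": ["dataset_a", "dataset_c"],
    "paper_2": ["dataset_b"],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def vector_search(query_vec, k=1):
    """Return the k dataset names whose embeddings are closest to the query."""
    ranked = sorted(
        EMBEDDINGS,
        key=lambda name: cosine(query_vec, EMBEDDINGS[name]),
        reverse=True,
    )
    return ranked[:k]


def co_cited(dataset):
    """Return datasets cited by the same publications as the given dataset."""
    related = set()
    for cited in CITATIONS.values():
        if dataset in cited:
            related.update(d for d in cited if d != dataset)
    return sorted(related)


def hybrid_search(query_vec, k=1):
    """Vector lookup first, then enrich each hit with citation-graph context."""
    return [
        {"dataset": name, "co_cited_with": co_cited(name)}
        for name in vector_search(query_vec, k)
    ]
```

In this sketch the co-citation list plays the role of the supplementary metadata: it gives a simple, inspectable reason why a second dataset might be suggested alongside the best vector match.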