Analyzing EOSDIS Dataset Research Outputs using Knowledge Graphs and Large Language Models
Datasets, unlike publications, can be updated over time. Each new version receives its own DOI, but that DOI is not always linked to those of previous versions, which complicates tracking citations across a dataset’s lifecycle. We address this by integrating dataset versions and citations into a knowledge graph (KG), which lets us trace citations across versions and analyze dataset usage in applied research. To categorize publications from various journals, we fine-tuned the NASA IMPACT INDUS large language model (LLM) on a labeled set of publications and used it to assign each publication to one of twenty applied research areas. By linking datasets to these research areas, we improved dataset searchability and discovery by domain. This poster was presented at the July 2024 ESIP Meeting in Asheville, NC (July 22-26, 2024).
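The idea of tracing citations across dataset versions can be illustrated with a minimal in-memory sketch. This is not the KG implementation behind the poster: the class `DatasetGraph`, its method names, the placeholder DOIs, the publication identifiers, and the research-area labels are all hypothetical, and a plain Python dictionary stands in for a real graph database.

```python
from collections import defaultdict


class DatasetGraph:
    """Toy knowledge graph: dataset versions, citations, research areas."""

    def __init__(self):
        self.version_links = defaultdict(set)  # DOI -> DOIs of adjacent versions
        self.citations = defaultdict(set)      # dataset DOI -> citing publication ids
        self.areas = {}                        # publication id -> research area label

    def link_versions(self, older_doi, newer_doi):
        # Store the link in both directions so a lineage can be walked from any version.
        self.version_links[older_doi].add(newer_doi)
        self.version_links[newer_doi].add(older_doi)

    def add_citation(self, publication_id, dataset_doi, research_area=None):
        self.citations[dataset_doi].add(publication_id)
        if research_area is not None:
            self.areas[publication_id] = research_area

    def lineage(self, doi):
        # Depth-first walk over version links; returns every DOI in the same lineage.
        seen = {doi}
        stack = [doi]
        while stack:
            current = stack.pop()
            for neighbor in self.version_links.get(current, ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        return seen

    def lineage_citations(self, doi):
        # Union of the publications citing any version of the dataset.
        publications = set()
        for version in self.lineage(doi):
            publications |= self.citations.get(version, set())
        return publications

    def research_areas(self, doi):
        # Research areas of all publications citing any version of the dataset.
        return {self.areas[p] for p in self.lineage_citations(doi) if p in self.areas}


# Hypothetical data: two versions of one dataset, each cited by a different paper.
graph = DatasetGraph()
graph.link_versions("10.0000/example.v1", "10.0000/example.v2")
graph.add_citation("pub-A", "10.0000/example.v1", "Water Resources")
graph.add_citation("pub-B", "10.0000/example.v2", "Agriculture")

print(sorted(graph.lineage_citations("10.0000/example.v2")))  # ['pub-A', 'pub-B']
```

Querying by the second version's DOI returns both publications because the version link joins the two DOIs into one lineage. Without that link, each DOI would report only its own citation, which is the tracking gap the abstract describes.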