Improving FAIRness of AI/ML in Earth Science via Reproducible Big Data Analytics in the Cloud
This talk presents our Reproducible and Portable big data Analytics in the Cloud (RPAC) Toolkit, which helps improve the FAIRness (findability, accessibility, interoperability, and reusability) of AI/ML in Earth science. The toolkit can automatically deploy, execute, analyze, and reproduce big data analytics in the cloud. This open-source toolkit supports 1) on-demand provisioning of distributed hardware and software environments, 2) automatic storage of data and configuration for each execution, 3) flexible client modes based on user preferences, 4) querying of execution history, and 5) simple reproduction of existing executions in the same or a different environment. More information about the toolkit can be found at https://bdal.umbc.edu/tools/#reproducible-data-analytics. This presentation was given at the 2022 ESIP January Meeting, held virtually.