Improving FAIRness of AI/ML in Earth Science via Reproducible Big Data Analytics in the Cloud

Posted on 2022-02-03, 17:21; authored by Jianwu Wang, Xin Wang, and Jinbo Wang

This talk presents our Reproducible and Portable big data Analytics in the Cloud (RPAC) Toolkit, which helps improve the FAIRness (findability, accessibility, interoperability, and reusability) of AI/ML in Earth science. The toolkit automatically deploys, executes, analyzes, and reproduces big data analytics in the cloud. This open-source toolkit supports 1) on-demand provisioning of distributed hardware and software environments, 2) automatic storage of the data and configuration of each execution, 3) flexible client modes based on user preferences, 4) execution history queries, and 5) simple reproduction of existing executions in the same environment or a different one. More information about the toolkit can be found at

This presentation was given at the ESIP January 2022 Meeting, which was held virtually.
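The idea behind features 2, 4, and 5 (storing each execution's configuration, querying the execution history, and reproducing a past execution in the same or a different environment) can be illustrated with a minimal sketch. It assumes a simple in-memory store. The names `ExecutionHistory`, `record`, `query`, `reproduce`, and `fake_run` are hypothetical and are not the RPAC Toolkit's actual API. The distributed cloud job is also replaced by a local stand-in function.

```python
import json
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Optional


@dataclass
class ExecutionRecord:
    """One stored execution: its analytics config plus the environment it ran in."""
    execution_id: str
    config: dict
    environment: dict
    created_at: str


class ExecutionHistory:
    """Hypothetical in-memory execution store (illustrative, not the RPAC API)."""

    def __init__(self) -> None:
        self._records: dict[str, ExecutionRecord] = {}

    def record(self, config: dict, environment: dict) -> str:
        # Deep-copy through a JSON round trip so later edits to the caller's
        # dicts cannot silently change the stored history.
        rec = ExecutionRecord(
            execution_id=uuid.uuid4().hex,
            config=json.loads(json.dumps(config)),
            environment=json.loads(json.dumps(environment)),
            created_at=datetime.now(timezone.utc).isoformat(),
        )
        self._records[rec.execution_id] = rec
        return rec.execution_id

    def query(self, **env_filters: Any) -> list[ExecutionRecord]:
        # Return past executions whose environment matches every given key/value.
        return [
            r for r in self._records.values()
            if all(r.environment.get(k) == v for k, v in env_filters.items())
        ]

    def reproduce(self, execution_id: str,
                  run: Callable[[dict, dict], Any],
                  environment: Optional[dict] = None) -> Any:
        # Re-run a stored config in its original environment, or in a new one.
        rec = self._records[execution_id]
        env = rec.environment if environment is None else environment
        return run(rec.config, env)


def fake_run(config: dict, env: dict) -> tuple:
    # Hypothetical stand-in for a distributed analytics job.
    return config["n"] * 2, env["provider"]


history = ExecutionHistory()
eid = history.record({"n": 21}, {"provider": "aws", "workers": 4})
print(len(history.query(provider="aws")))  # 1
print(history.reproduce(eid, fake_run))  # (42, 'aws')
print(history.reproduce(eid, fake_run, {"provider": "azure", "workers": 2}))  # (42, 'azure')
```

Keeping the stored configuration separate from the environment is what allows the same execution to be replayed unchanged on different infrastructure, which corresponds to the "same environment or a different environment" option described above.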