A Toolkit for Reproducible Big Data Analytics in the Cloud
Poster posted on 2022-01-10, 20:02, authored by Xin Wang and Jianwu Wang
We present RPAC, our open-source toolkit, which supports 1) on-demand provisioning of distributed hardware and software environments, 2) automatic storage of the data and configuration of each execution, 3) flexible client modes based on user preferences, 4) querying of execution history, and 5) simple reproduction of existing executions in the same environment or a different one. This presentation was given at the ESIP January 2022 meeting, which was held virtually.
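To make features 2, 4, and 5 more concrete, here is a minimal Python sketch. It is not RPAC's actual implementation or API. Each execution is stored as one record of its environment, configuration, input data location, and command. The history can then be queried, and reproducing a run records the same configuration, data, and command again in either the original environment or a new one. The class `ExecutionHistory`, its methods, the record fields, and the example values are all hypothetical. The sketch only records what a reproduction would use; it does not provision cloud resources or execute anything.

```python
import json
import sqlite3
import uuid
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ExecutionRecord:
    """Hypothetical schema: everything needed to re-run one execution."""
    run_id: str
    environment: dict  # hardware/software environment, e.g. cloud and node count
    config: dict       # application configuration, e.g. memory settings
    input_data: str    # location of the input data
    command: str       # analytics command that was executed


class ExecutionHistory:
    """Hypothetical store: one row per execution, queryable and reproducible."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS runs ("
            "run_id TEXT PRIMARY KEY, environment TEXT, config TEXT,"
            " input_data TEXT, command TEXT)"
        )

    def record(self, environment: dict, config: dict,
               input_data: str, command: str) -> ExecutionRecord:
        """Automatically store the environment, config, and data of one run."""
        rec = ExecutionRecord(str(uuid.uuid4()), environment, config,
                              input_data, command)
        self.db.execute(
            "INSERT INTO runs VALUES (?, ?, ?, ?, ?)",
            (rec.run_id, json.dumps(environment), json.dumps(config),
             input_data, command),
        )
        self.db.commit()
        return rec

    def query(self, command_contains: str = "") -> List[ExecutionRecord]:
        """Return past runs whose command contains the given text, oldest first."""
        rows = self.db.execute(
            "SELECT run_id, environment, config, input_data, command FROM runs"
            " WHERE command LIKE ? ORDER BY rowid",
            (f"%{command_contains}%",),
        ).fetchall()
        return [ExecutionRecord(r[0], json.loads(r[1]), json.loads(r[2]), r[3], r[4])
                for r in rows]

    def reproduce(self, run_id: str,
                  new_environment: Optional[dict] = None) -> ExecutionRecord:
        """Record a re-run of run_id, optionally in a different environment."""
        row = self.db.execute(
            "SELECT environment, config, input_data, command FROM runs"
            " WHERE run_id = ?",
            (run_id,),
        ).fetchone()
        if row is None:
            raise KeyError(run_id)
        env = new_environment if new_environment is not None else json.loads(row[0])
        # Only the environment may change; config, data, and command are reused.
        return self.record(env, json.loads(row[1]), row[2], row[3])


if __name__ == "__main__":
    history = ExecutionHistory()
    first = history.record({"cloud": "cloud-A", "nodes": 4},
                           {"executor_memory": "4g"},
                           "data/input.csv", "spark-submit analysis.py")
    history.reproduce(first.run_id, {"cloud": "cloud-B", "nodes": 8})
    print(len(history.query("analysis.py")))  # prints 2
```

The design choice illustrated here is to treat the environment as the only part of a record that a reproduction may replace. This mirrors the abstract's distinction between reproducing an execution "in the same environment or a different environment".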