Controlling AWS Costs with Data Carousel
Presentation posted on 11.08.2020, 16:12 by Benjamin Galewsky, Donald Petravick, Greg Daues, John Readey, Ryan Kolak
How can the costs associated with a 2.4-petabyte dataset hosted on AWS be managed? This question was posed by the EOSDIS Large, Mission Scale Data working group. Part of the answer lies in keeping the data in low-cost Glacier storage; however, unbounded data retrieval costs are incompatible with federal budget rules. We describe and demonstrate a data carousel model in which data is restored on a fixed, regular schedule and research jobs are run against the data before it is returned to cold storage. This gives NASA a bounded, fixed operating cost and allows researchers to scale their analysis as their budgets and needs permit. This presentation was given at the Earth Science Information Partners (ESIP) Summer Meeting, held online in July 2020.
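The carousel idea can be illustrated with a minimal Python sketch. It is not the presenters' implementation: the segment names, the window length, and the per-GB retrieval price are hypothetical placeholders, and no AWS calls are made. The sketch only shows why a fixed rotation yields a bounded retrieval cost: each segment is restored once per rotation, regardless of how many research jobs run against it during its window.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Sequence


@dataclass(frozen=True)
class CarouselSlot:
    segment: str      # hypothetical name for one slice of the dataset
    restore_on: date  # day the segment is restored from cold storage
    archive_on: date  # day the segment is returned to cold storage


def carousel_schedule(
    segments: Sequence[str], start: date, window_days: int, slots: int
) -> List[CarouselSlot]:
    """Rotate through the segments in a fixed order, one window at a time."""
    if not segments:
        raise ValueError("at least one segment is required")
    if window_days <= 0:
        raise ValueError("window_days must be positive")
    schedule = []
    for i in range(slots):
        restore_on = start + timedelta(days=i * window_days)
        schedule.append(
            CarouselSlot(
                segment=segments[i % len(segments)],  # wrap around the carousel
                restore_on=restore_on,
                archive_on=restore_on + timedelta(days=window_days),
            )
        )
    return schedule


def restore_cost_per_rotation(
    segment_sizes_gb: Sequence[float], price_per_gb: float
) -> float:
    """Retrieval cost of one full rotation.

    price_per_gb is an assumed input supplied by the caller, not a quoted
    AWS price. The result does not depend on the number of research jobs.
    """
    return sum(segment_sizes_gb) * price_per_gb


if __name__ == "__main__":
    for slot in carousel_schedule(["seg-a", "seg-b", "seg-c"], date(2020, 7, 1), 7, 4):
        print(slot.segment, slot.restore_on, slot.archive_on)
```

Researchers would submit jobs against whichever segment is currently restored, so their compute spending scales with their own budgets while the restore cost stays fixed by the schedule.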