Machine Learning based Ocean Eddy Detection using Cloud Services
The main objective of this study is to use cloud services, such as Amazon Web Services (AWS), to build machine learning models that identify ocean eddies in satellite images. There are two major approaches to machine learning on AWS. The first is AWS SageMaker, a machine learning platform that provides image labeling, Jupyter notebooks, and a range of algorithms for model training. The second is AWS EC2, which provides virtual instances as execution environments. Besides SageMaker and EC2, we also configured Google Cloud Platform (GCP) as an alternative service for training our models. We chose a three-layer convolutional neural network (CNN) for binary image classification, that is, deciding whether an image contains eddies, and YOLOv3 for eddy detection and localization, that is, determining where the eddies are in an image that contains them. Our CNN model achieved 60% accuracy. With YOLOv3, the intersection over union (IoU) ranged from 40% to 90%. Comparing features, we found that SageMaker offers more functionality than GCP; however, the services are not easy to combine into end-to-end pipelines. GCP offers GPU (graphics processing unit) based services by default, along with TPU (Tensor Processing Unit) and CPU (central processing unit) options, whereas on AWS the GPU-based services need to be configured beforehand. In this work, we configured scripts that use Horovod for deployment on GPU-based EC2 instances. The code is open-sourced for further reference at https://github.com/big-data-lab-umbc/AWS-automation/tree/main/gpu-example/OceanEddy. Comparing SageMaker and EC2, we consider EC2 a general-purpose computing resource that gives users considerable freedom in what they do, such as installing any software or packages, running Docker images, running Jupyter notebooks, and using multiple EC2 instances for parallel execution. 
However, most of these capabilities require manual command-line operations, which can be difficult for users who do not work on the command line regularly.
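
For reference, the intersection over union (IoU) reported for YOLOv3 compares a predicted bounding box with a ground-truth box as the area of their overlap divided by the area of their union. The snippet below is a minimal sketch of that computation for one pair of axis-aligned boxes. The function name `box_iou` and the `(x_min, y_min, x_max, y_max)` box convention are assumptions made for illustration; they are not taken from the project repository.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes.

    Each box is assumed to be (x_min, y_min, x_max, y_max);
    this convention is illustrative, not the repository's.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped to zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - intersection

    # Guard against degenerate (zero-area) boxes.
    return intersection / union if union > 0 else 0.0


if __name__ == "__main__":
    predicted = (0.0, 0.0, 2.0, 2.0)      # hypothetical predicted eddy box
    ground_truth = (1.0, 1.0, 3.0, 3.0)   # hypothetical labeled eddy box
    print(f"IoU = {box_iou(predicted, ground_truth):.3f}")
```

Under this definition, an IoU of 90% means the predicted and labeled boxes nearly coincide, while 40% indicates only partial overlap.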