Big Data in the Geosciences: Part 3

We are inundated with environmental data: Earth-observing satellites stream terabytes of data back to us daily; ground-based sensor networks track weather, water quality, and air pollution, taking readings every few minutes; and community scientists log hundreds of thousands of observations every day, recording everything from bird sightings to road closures and accidents. But this very richness of data has created a new set of problems.
This third post in our four-part series gives a brief summary of how deep learning is being used in the geosciences today, loosely based on the Earth and Space Science Informatics sessions and town halls at the AGU Fall Meeting in December 2016.

Deep learning
Artificial Neural Networks (ANNs) are already widely used in domains ranging from stock price prediction to image recognition, and from genetic sequencing to targeted marketing. Deep learning neural networks – that is, networks with multiple layers of neurons between the input and output neurons – are also beginning to be used in the geosciences to address a range of problems.
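As a concrete (if toy) illustration of the “multiple layers between input and output” idea, a minimal deep network might look like the following Keras sketch; the input size of 32 features and the hidden-layer widths are arbitrary placeholders, not taken from any of the talks below.

# Minimal sketch of a "deep" network: two hidden layers sit between the input
# and output neurons. All sizes are arbitrary placeholders.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    layers.Input(shape=(32,)),              # 32 input features (placeholder)
    layers.Dense(64, activation="relu"),    # hidden layer 1
    layers.Dense(64, activation="relu"),    # hidden layer 2
    layers.Dense(1, activation="sigmoid"),  # single output neuron
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])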

Prabhat (UC Berkeley), in his talk “Classification and Localization of Extreme Weather Patterns with Deep Learning”, described using a four-layer deep network, consisting of two convolutional layers and two fully connected layers, to classify tropical cyclones, weather fronts, and atmospheric rivers. The network was trained on 10,000 positive and 10,000 negative examples to output a binary (yes/no) decision on the presence of each phenomenon. The hyper-parameters (the learning and regularization parameters) were selected using a Bayesian optimization scheme implemented in a tool called Spearmint. The network achieved 99% classification accuracy on test data for tropical cyclones, 90% for atmospheric rivers, and 89% for weather fronts, outperforming logistic regression, k-nearest neighbors, support vector machines, random forests, and convolutional nets.
For more details, see their article on ResearchGate.
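As a rough illustration (not Prabhat’s actual code), a two-convolutional plus two-fully-connected binary classifier of this kind could be sketched in Keras as below. The input shape, filter counts, kernel sizes, and default hyper-parameter values are placeholders; the learning rate and regularization strength are exposed as the kind of knobs the talk describes tuning with Bayesian optimization via Spearmint.

# Rough sketch of a two-convolutional + two-fully-connected binary classifier
# in the spirit of the architecture described in the talk. Input shape, filter
# counts, kernel sizes, and default hyper-parameter values are illustrative
# placeholders, not the published configuration.
from tensorflow import keras
from tensorflow.keras import layers

def build_classifier(input_shape=(128, 128, 1), learning_rate=1e-3, l2_reg=1e-4):
    """learning_rate and l2_reg stand in for the learning and regularization
    hyper-parameters that the talk describes tuning with Bayesian optimization."""
    reg = keras.regularizers.l2(l2_reg)
    model = keras.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 5, activation="relu", kernel_regularizer=reg),
        layers.MaxPooling2D(2),
        layers.Conv2D(64, 5, activation="relu", kernel_regularizer=reg),
        layers.MaxPooling2D(2),
        layers.Flatten(),
        layers.Dense(128, activation="relu", kernel_regularizer=reg),
        layers.Dense(1, activation="sigmoid"),  # yes/no: is the pattern present?
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

# One such model per phenomenon (tropical cyclone, atmospheric river, weather
# front), each trained on its balanced positive/negative examples:
# model = build_classifier()
# model.fit(x_train, y_train, validation_split=0.1, epochs=20)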

Ramachandran (NASA Marshall Space Flight Center), in his talk “Riding the Hype Wave: Evaluating new AI techniques for their applicability in Earth Science”, began by cautioning scientists about the hype surrounding machine learning. He pointed to a Gartner report that places machine learning at the peak of the “hype cycle”, poised to descend into the “trough of disillusionment”, and noted the cautionary tale of Microsoft’s chatbot. However, Ramachandran averred, geoscientists can still leverage machine learning effectively: by being aware of what machine learning can and cannot do, by having an evaluation framework, and by carefully weighing the cost-benefit trade-offs when identifying problems for machine learning solutions.

He described how NASA has identified pattern recognition and classification of satellite imagery as an area where deep learning can be used effectively, and several projects at NASA are now implementing deep learning networks. One project aims to classify cyclone intensity from satellite images using an eight-layer deep network comprising five convolutional layers and three fully connected layers; the network achieves a validation accuracy of 90%. Similar deep networks are also being used to search satellite imagery for specific events (such as hurricane, dust, or smoke events), achieving an overall accuracy of 88%, and to study transverse cirrus bands.
For more details on these deep networks, see Maskey’s AGU presentation.

True Positives from Estimation of Cyclone Intensity using Deep Learning (Ramachandran)

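The five-convolutional-plus-three-fully-connected layout described for the cyclone-intensity classifier is reminiscent of the classic AlexNet layer pattern. A rough Keras sketch of that layer pattern is below; the input size, filter counts, kernel sizes, and the number of intensity categories (NUM_INTENSITY_CLASSES) are placeholders, not the values used in the NASA work.

# Sketch of an eight-layer network (five convolutional + three fully connected
# layers) for sorting satellite images into cyclone-intensity categories.
# All sizes below are illustrative placeholders.
from tensorflow import keras
from tensorflow.keras import layers

NUM_INTENSITY_CLASSES = 8  # placeholder: number of intensity categories

model = keras.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Conv2D(64, 7, strides=2, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(128, 5, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Conv2D(256, 3, activation="relu"),
    layers.Conv2D(256, 3, activation="relu"),
    layers.Conv2D(256, 3, activation="relu"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(1024, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1024, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(NUM_INTENSITY_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])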
Ganguly (NASA Ames Research Center), in his talk “DeepSAT: A Deep Learning Framework for satellite image classification”, described a probabilistic framework that combines machine learning and expert knowledge to improve the classification of tree cover in 1-m NAIP imagery. Additionally, Ganguly described a deep hierarchical learning network (DeepSAT) developed to classify land cover in satellite images. DeepSAT consists of three levels: at the first level, more than 100 features are extracted (then pruned down to ~20) and the extracted feature vectors are normalized. The normalized feature vectors feed into the second level, a Deep Belief Network (DBN) that trains on the input features. The trained DBN is then used at the third level to initialize the weights of a feed-forward backpropagation neural network (see diagram below). DeepSAT achieved 98% classification accuracy when classifying images into four land cover classes, and 93% when classifying into six land cover classes. For both the four- and six-class tasks, DeepSAT outperformed a DBN, a CNN, and a stacked de-noising autoencoder.
For more details, see Basu et al., 2015, “A Semiautomated Probabilistic Framework for Tree-Cover Delineation From 1-m NAIP Imagery Using a High-Performance Computing Architecture”, as well as Ganguly’s presentation on SAT-4 and SAT-6 at the GPU Technology Conference.

DeepSAT architecture (Ganguly)
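Structurally, the three-level pipeline described above can be sketched as follows. This is an illustration under stated assumptions, not the published DeepSAT code: the feature-extraction step is left as a placeholder, the layer widths (HIDDEN_SIZES), feature count, and class count are invented for the example, and the DBN is approximated with greedily stacked scikit-learn BernoulliRBMs whose weights then initialize a Keras feed-forward network.

# Structural sketch of a three-level DeepSAT-style pipeline:
# level 1 - hand-engineered features, pruned and normalized;
# level 2 - greedy layer-wise pre-training of a DBN (stacked RBMs);
# level 3 - a feed-forward network initialized from the DBN weights and
#           fine-tuned with backpropagation.
from sklearn.neural_network import BernoulliRBM
from tensorflow import keras
from tensorflow.keras import layers

N_FEATURES, N_CLASSES = 20, 4     # ~20 pruned features; four land cover classes
HIDDEN_SIZES = [64, 64]           # placeholder DBN layer widths

def extract_features(image_patches):
    """Level 1 (placeholder): compute 100+ statistical/spectral features per
    patch, prune to the ~20 most informative, and normalize them."""
    raise NotImplementedError

def pretrain_dbn(x, hidden_sizes):
    """Level 2: greedy layer-wise training of stacked RBMs on the features."""
    rbms, inp = [], x
    for n_hidden in hidden_sizes:
        rbm = BernoulliRBM(n_components=n_hidden, n_iter=20, learning_rate=0.05)
        rbm.fit(inp)
        inp = rbm.transform(inp)  # hidden activations feed the next RBM
        rbms.append(rbm)
    return rbms

def build_finetune_net(rbms):
    """Level 3: a feed-forward network whose hidden layers are initialized
    from the pre-trained DBN, then fine-tuned with backpropagation."""
    model = keras.Sequential(
        [layers.Input(shape=(N_FEATURES,))]
        + [layers.Dense(r.n_components, activation="sigmoid") for r in rbms]
        + [layers.Dense(N_CLASSES, activation="softmax")]
    )
    # Copy each RBM's weights and hidden biases into the matching Dense layer
    # (zip stops before the final softmax layer, which keeps its random init).
    for dense, rbm in zip(model.layers, rbms):
        dense.set_weights([rbm.components_.T, rbm.intercept_hidden_])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# x = extract_features(patches)                              # level 1
# model = build_finetune_net(pretrain_dbn(x, HIDDEN_SIZES))  # levels 2-3
# model.fit(x, labels, epochs=30)                            # supervised fine-tuning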