1 Introduction

In this module, we will discuss the following concepts:

  1. The difference between supervised and unsupervised image classification.
  2. The definitions and application of the various classification algorithms available through Google Earth Engine.
  3. How to set up and run a classification using randomForest, with aspen presence and absence as an example dataset.

2 Background

Image Classification
Humans naturally organize spatial information into groups. From above, we recognize common features such as lakes and rivers, buildings and roads, forests and deserts. We call this grouping of objects by similar traits “image classification”. Manually classifying objects and assigning values across the globe, however, would be an unending task. Thankfully, the use of remotely sensed data to delineate landscape features into categorical classes has become a staple of ecological research over the past 40 years. Classifications have been performed for everything from agricultural development and land cover change to silvicultural practices and pollution monitoring.

Unsupervised vs. Supervised
Image classification methods fall into two categories. The first, unsupervised classification, requires no training data: potential predictor variables are applied to a geographic region, and the predictive algorithm (or a set of a priori regression coefficients) is left to do the work of grouping pixels into classes. The second, supervised classification, requires the creation of independent training data: information that a probabilistic model can use to find associations between observed conditions and a suite of predictor variables.
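To make the distinction concrete, here is a minimal sketch in plain Python (not Earth Engine code). It assumes a single hypothetical predictor per pixel, for example a greenness index, and uses two simple stand-in algorithms: k-means clustering for the unsupervised case and a nearest-centroid rule for the supervised case. The pixel values, labels, and function names are all invented for illustration.

```python
# Toy contrast between unsupervised and supervised classification.
# All values and names are hypothetical; each pixel has one predictor.

def kmeans_1d(values, k=2, iterations=20):
    """Unsupervised: group values into k >= 2 clusters using no labels."""
    ordered = sorted(values)
    # Spread the initial cluster centers across the range of the data.
    centers = [ordered[i * (len(ordered) - 1) // (k - 1)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centers[c]))
            groups[nearest].append(v)
        # Move each center to the mean of its members (keep it if empty).
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]


def nearest_centroid(train_values, train_labels, values):
    """Supervised: learn one mean per labeled class, then assign new values."""
    centroids = {}
    for label in set(train_labels):
        members = [v for v, l in zip(train_values, train_labels) if l == label]
        centroids[label] = sum(members) / len(members)
    return [min(centroids, key=lambda l: abs(v - centroids[l])) for v in values]


pixels = [0.10, 0.15, 0.20, 0.70, 0.75, 0.80]

# Unsupervised: the algorithm returns anonymous cluster numbers.
clusters = kmeans_1d(pixels)

# Supervised: two labeled training points give the classes a meaning.
predicted = nearest_centroid([0.12, 0.78], ["absent", "present"], pixels)
```

Both approaches split these pixels into the same two groups, but only the supervised result carries class names. The clusters from the unsupervised run still have to be interpreted by the analyst.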

Google Earth Engine Classifiers
Of the options available under Google Earth Engine’s ee.Classifier family, several fall under the general category of “machine learning”. These algorithms “learn” from the data fed to them and make predictions based on what they have learned. They are particularly adept at building statistical models from the (often highly non-linear) relationships between large numbers of remotely sensed predictors and training data. The models can then be applied across large spatial extents to generate predictions in the form of map outputs. In recent years, classifiers such as classification and regression trees (CART) and randomForest have been adopted from the computer science and statistics communities into ecological research.

One commonly used algorithm available in Google Earth Engine for supervised classification is randomForest (Breiman, 2001). In a nutshell, a randomForest (RF) model is constructed by repeatedly taking a randomized subset of the training data (e.g. field measurements, weather station recordings) and fitting it to a random subset of predictors (e.g. remotely sensed data) in a decision tree. While no single tree perfectly captures the statistical relationships between training and predictor data, the compilation of trees, the forest, tells a more complete story. If that sounds complicated, do not worry! The classifier does all of that hard work behind the scenes. But when using powerful tools like RF, it is our responsibility to know what kind of information they are receiving. Much like the human body, RF is only as good as what you feed it: if you give it low-quality training data and predictors, your outputs will reflect that quality. So, know thy data!
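The idea of many imperfect trees voting together can be sketched in a few lines of plain Python. This is a simplified illustration, not the algorithm as Earth Engine implements it: each "tree" here is a single-split decision stump, the training plots and predictor names (NDVI, elevation, slope) are hypothetical, and the number of trees and predictors per tree are arbitrary choices.

```python
import random
from collections import Counter

# Hypothetical training plots:
# (ndvi, elevation_km, slope_deg) -> aspen present (1) or absent (0).
TRAINING = [
    ((0.72, 2.9, 12.0), 1), ((0.80, 3.0, 15.0), 1), ((0.76, 2.8, 10.0), 1),
    ((0.85, 3.1, 14.0), 1), ((0.78, 2.7, 11.0), 1),
    ((0.20, 1.6, 30.0), 0), ((0.30, 1.8, 28.0), 0), ((0.25, 1.5, 35.0), 0),
    ((0.15, 1.9, 32.0), 0), ((0.28, 1.7, 27.0), 0),
]


def fit_stump(sample, feature_ids):
    """Find the one split (feature, threshold, side labels) with fewest errors."""
    majority = Counter(y for _, y in sample).most_common(1)[0][0]
    baseline = sum(y != majority for _, y in sample)
    # Start from "always predict the majority class" and try to beat it.
    best = (baseline, None, None, majority, majority)
    for f in feature_ids:
        values = sorted({x[f] for x, _ in sample})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            for below, above in ((0, 1), (1, 0)):
                errors = sum((above if x[f] > threshold else below) != y
                             for x, y in sample)
                if errors < best[0]:
                    best = (errors, f, threshold, below, above)
    return best[1:]


def predict_stump(stump, x):
    feature, threshold, below, above = stump
    if feature is None:          # no useful split was found
        return below
    return above if x[feature] > threshold else below


def fit_forest(data, n_trees=25, n_features=2, seed=0):
    rng = random.Random(seed)
    all_features = range(len(data[0][0]))
    forest = []
    for _ in range(n_trees):
        # Random subset of training data: a bootstrap sample drawn
        # with replacement, the same size as the original data.
        sample = [rng.choice(data) for _ in data]
        # Random subset of predictors for this tree's split.
        features = rng.sample(all_features, n_features)
        forest.append(fit_stump(sample, features))
    return forest


def predict_forest(forest, x):
    """Each tree votes; the most common class wins."""
    votes = Counter(predict_stump(stump, x) for stump in forest)
    return votes.most_common(1)[0][0]


forest = fit_forest(TRAINING)
high_green_site = predict_forest(forest, (0.78, 2.9, 12.0))
low_green_site = predict_forest(forest, (0.22, 1.7, 30.0))
```

Because every tree sees a different sample and a different pair of predictors, individual stumps split on different variables and thresholds. The final class for a site comes from the majority vote, which is what makes the forest more stable than any one tree.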

3 Google Earth Engine Image Classification Workflow

In this module, we are going to walk through a sample modeling workflow in Google Earth Engine. We will see how to set up our initial dataset, build a list of potential predictors, run our RF classifier, apply the resulting model to a larger spatial extent, and assess the accuracy of our RF model. For this example, we are going to use RF to classify aspen stands in western Colorado, USA. Aspen stands provide multiple ecosystem services, from wood product stock to highly biodiverse wildlife habitat.
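The last step of the workflow, accuracy assessment, usually comes down to comparing predicted classes against observed classes at points held back from training. The plain-Python sketch below shows that comparison as a confusion matrix and an overall accuracy score. The observed and predicted values are made up for illustration and are not results from the aspen dataset.

```python
def confusion_matrix(observed, predicted, classes=(0, 1)):
    """Rows are observed classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for obs, pred in zip(observed, predicted):
        matrix[index[obs]][index[pred]] += 1
    return matrix


def overall_accuracy(matrix):
    """Share of all validation points on the matrix diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical validation points: 1 = aspen present, 0 = aspen absent.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

matrix = confusion_matrix(observed, predicted)
accuracy = overall_accuracy(matrix)
```

Here eight of the ten hypothetical points fall on the diagonal of the matrix, so the overall accuracy is 0.8. The off-diagonal cells show which kind of mistake was made: one presence predicted as absence, and one absence predicted as presence.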