1 Introduction

In this module, we will discuss the following concepts:

  1. Learn about the types of data corrections commonly applied to remote sensing imagery.
  2. How to visually compare spatial data from different pre-processing levels within the same dataset.
  3. How to perform cloud masking and cloud masking assessments in Google Earth Engine for Landsat 8 surface reflectance imagery.

2 Background

What is Pre-Processing?
Much of the data you will find in Google Earth Engine (GEE) will have some level of pre-processing. This involves several different methods of quality control to ensure the highest levels of accuracy and consistency within raster collections. Depending on the collection, there may be a variety of pre-processing levels available and it is important to understand the differences to successfully integrate remotely sensed data into ecological studies. Three common sources of error for imagery products are consistently addressed by publishers before data are made available in GEE: atmosphere (i.e. air chemistry), topography (i.e. elevation), and geometry (i.e. pixel consistency).


Atmospheric Correction
As solar energy rebounds off the surface of the Earth and back towards our sensor in space, the atmosphere does a pretty good job of getting in the way. This takes place in the form of scattering and absorption (for more information, see Module 3). Identifying and correcting for these effects is important to accurately represent and interpret true surface conditions, like tree species leaf pigments or the differences between urban and agricultural pixels.


Topographic and Terrain Correction
Illumination effects from slope, aspect, and elevation pose additional challenges in collecting and processing remotely sensed data. Multiple correction methods have been developed, including the use of digital elevation models, to develop predictions of problematic topography. If your research is conducted at high altitudes or in areas of sheer relief, you will be relieved to know that pre-processing for terrain effects has been taken care of by specialists (though, for the careful and cautious, manual methods do exist).


Geometric Correction
This process makes sure the alignment of the raster images is systematic and consistent over time and relative to each other images. For Landsat, the processes of georeferencing and orthorectification are accomplished through independent ground control points and previously created digital elevation models. For an archival dataset like Landsat, ensuring that pixels line up pass after pass and year after year is paramount. Otherwise, remote sensing scientists and ecologist would have little ability to conduct multitemporal analyses.


It is important to remember that none of these quality assurance methods is 100% foolproof! Live by the motto of ‘Know Thy Data’ and carefully examine your images qualitatively and quantitatively. We will show a couple examples of this later in the module.

3 Pre-processing in Google Earth Engine with Landsat 8

It is an incredible advantage to have the (free!) dedicated support and behind-the-scenes work that happens before data becomes available in Google Earth Engine. However, you may still find it necessary to manipulate your dataset of interest to facilitate a particular research application. In this module, we will be working with Landsat 8 data and the image below details several use-cases for different levels of processing.

A decision workflow from Young et al, 2017 showing suggested use-cases for various levels of Landsat data pre-processing.

3.1 Example of Pre-processing Levels.

To give a qualitative sense of the difference between different pre-processing levels, we can look at several true-color images of southern Oregon, USA from late Summer, 2018. In this timeframe, our standard atmospheric interference has been worsened by smoke effects from the Carr Fire in northern California. To assess our initial imagery, we will load the ‘Raw’ Landsat 8 collection. Raw data (also called ‘at-sensor radiance’) has not been corrected for any form of potential effects and is not typically utilized in ecological studies. However, it is helpful to establish a baseline for the levels to come. Run the script below to produce an image similar to the one below.

Note: In addition to the following pre-processing levels, Landsat data has been split into two quality levels, Tier 1 and Tier 2. Tier 1 is the higher quality option. Click here to learn more about the differences between the tiers.

// Center our map on southern Oregon, USA.
Map.setCenter(-122.3158, 42.4494, 12);

// Import Tier 1 Raw Landsat 8 scene.
var raw = ee.Image('LANDSAT/LC08/C01/T1/LC08_045031_20180811');

// Define the true-color vis params.
var rawvis = {bands: ['B4','B3','B2'], min: 0.0, max: 30000.0, gamma: 1};

// Add the raw image to your map.
Map.addLayer(raw, rawvis, 'raw');

Raw Landsat 8 imagery from southern Oregon. Our image is centered over Mt McLoughlin (2,893 m) with Upper Klamath Lake to the east. The cities of Medford (west) and Klamath Falls (east) are visible if you zoom out your image.

3.2 Top of Atmosphere (TOA)

The next level of pre-processing takes our ‘Raw’ data and applies corrections for the effects of solar activity, including solar irradiance, Earth-Sun distance, and solar elevation angle. For researchers, top of atmosphere (TOA) is often appropriate for the evaluation of single-date, single-scene imagery (i.e. land cover classification in a relatively small study area). This is due to varying levels of solar effects dependent upon the date, time, and latitude of collection. Append the following code to your script and the resulting image will appear similar to the one below.

// To begin your comparison, bring in an image from Tier 1 Landsat 8 Top of Atmosphere.
var toa = ee.Image('LANDSAT/LC08/C01/T1_TOA/LC08_045031_20180811');

// Define the true-color vis params.
var toavis = {bands: ['B4','B3','B2'], min: 0.0, max: 0.4};

// Add the TOA image to your map.
Map.addLayer(toa, toavis, 'toa');

Loading the TOA collection, parts of the image appear clearer due to some corrections for the influence of solar effects. However, there still appears to be work to do!

3.3 Surface Reflectance (SR)

These data have received the highest level of pre-processing in an attempt to best represent the actual conditions on the ground where a certain amount of solar energy bounces back (is reflected) toward aerial and spaceborne sensors. However, even surface reflectance products can be adversely affected by low sun angle, excessive clouds, and coverage locations over latitudes greater than 65 degrees north or south (Young et al., 2017). Despite this, it is recommended that analyses over multiple dates (e.g. change detection) or large geographic scale (e.g. algorithmic prediction) use Landsat surface reflectance data. Add this final piece of code to your script to see the image below in your map viewer pane.

// Finally, source an image from Tier 1 Landsat 8 surface reflection.
var sr = ee.Image('LANDSAT/LC08/C01/T1_SR/LC08_045031_20180811');

// Define the true-color vis params.
var srvis = {bands: ['B4','B3','B2'], min: 0, max: 3000, gamma: 1.4};

// Add the surface reflectance image to your map.
Map.addLayer(sr, srvis, 'sr');

Applying atmospheric corrections appears to have greatly improved our image clarity, especially over Upper Klamath Lake and the agricultural areas west of Mt McGloughlin.

3.4 Complete Code for Landsat Imagery Comparison

The code for these examples is from one specific Landsat image but you now have the framework to investigate any area of interest (i.e. your study area) to compare different levels of pre-processing.

// Center our map on southern Oregon, USA.
Map.setCenter(-122.3158, 42.4494, 12);

// Import Tier 1 Raw Landsat 8 scene.
var raw = ee.Image('LANDSAT/LC08/C01/T1/LC08_045031_20180811');

// Define the true-color vis params.
var rawvis = {bands: ['B4','B3','B2'], min: 0.0, max: 30000.0, gamma: 1};

// Add the raw image to your map.
Map.addLayer(raw, rawvis, 'raw');

// To begin your comparison, bring in an image from Tier 1 Landsat 8 Top of Atmosphere.
var toa = ee.Image('LANDSAT/LC08/C01/T1_TOA/LC08_045031_20180811');

// Define the true-color vis params.
var toavis = {bands: ['B4','B3','B2'], min: 0.0, max: 0.4};

// Add the TOA image to your map.
Map.addLayer(toa, toavis, 'toa');

// Finally, source an image from Tier 1 Landsat 8 surface reflection.
var sr = ee.Image('LANDSAT/LC08/C01/T1_SR/LC08_045031_20180811');

// Define the true-color vis params.
var srvis = {bands: ['B4','B3','B2'], min: 0, max: 3000, gamma: 1.4};

// Add the TOA image to your map.
Map.addLayer(sr, srvis, 'sr');

3.5 Cloud Masking

As we have discovered, the pre-processing work of atmospheric, topographic, and geometric correction has been completed by the time we arrive at surface reflectance products. One very important step that is not included in the processing of Landsat data, prior to it being available in Google Earth Engine, is the removal of near-ground weather phenomena. This most often takes the form of clouds. Clouds are especially prevalent over the tropics (i.e. dense rainforest) and over water bodies. In this section, we will take a look at the latter in the Quetico-Superior region of northeast Minnesota and southwest Ontario, where the terrain is difficult to capture in the summer months due large areas of cloud cover, generated in part by hundreds of small to medium-sized lakes.


Summer clouds gathering over Jean Lake in Quetico Provincial Park. Photo Credit: Tye Schulke.

3.5.1 Single-Image Masking: Part 1

Let us begin by loading an image that we know is cloudy. There are plenty of options to choose from in this part of the world but this image from late August shows us clouds in multiple forms. Something else to consider is that clouds have the added effect of casting a shadow on the land below, which further expands the geographic extent of what we must eventually remove. Starting a new script, run the code below to produce an image like the one in the image below.

// Load an initial image to test view a cloudy image.
var sr = ee.Image('LANDSAT/LC08/C01/T1_SR/LC08_027026_20180829');

// Define visparams for true-color image.
var srvis = {bands: ['B4', 'B3', 'B2'], min: 0, max: 3000, gamma: 1.4};

// Add your cloudy image to the map.
Map.addLayer(sr, srvis, 'single_scene');
Map.setCenter(-91.8859, 48.8936, 8.81);

An excessively cloudy image over the Quetico-Superior country.

3.5.2 Single-Image Masking: Part 2

Now to take care of our cloud issue. Landsat provides a pixel_qa band that, in a nutshell, assigns different values based on previously quantified features such as the likelihood of clouds and haze. You will find more complicated code for building a cloud mask but this is a simple, conservative way of removing those pesky white blobs from your image. Append the following code to your existing script and re-run to view an image similar to the one below. Remember to uncheck the ‘single_scene’ in your Layers control!

// Define simple cloud mask, based in values from the 'pixel_qa' band.
// Essentially, values of 322 = land and 324 = water.
var qa = sr.select('pixel_qa');
var mask = qa.eq(322).or(qa.eq(324));
var sr_cm = sr.updateMask(mask);

// Add your new map for comparison to the cloudy image.
Map.addLayer(sr_cm, srvis, 'single_scene_masked');

Successful removal of clouds and cloud shadows but the resulting image does not leave many usable pixels.

3.5.3 Masking Across Multiple Dates

We lost quite a bit of the geographic coverage in that single Landsat scene with the cloud mask. But there is another way! We can also apply masks across a range of dates. To do this, we will need to make a function, which we will cover more in Module 9. For now, use the function (and the rest of the code) below to continue your script. Append the code to your existing script.

While it is true there are pre-made NDVI image collections from Landsat in Google Earth Engine, these datasets are only available through 2017. Therefore, we are also going to calculate NDVI and add it to our image collection. This will allow us to generate a median value for each pixel across the growing season for 2018, measuring vegetation health in the study area. Re-running your code, a resulting image should appear like the one below.

// Define how your cloudMask function should work.
var cloudMask = function(image) {
  // var mask = image.select('pixel_qa').eq(322).or(image.select('pixel_qa').eq(324));
  var mask = image.select('pixel_qa').eq(322);
  return image.mask(mask);
};

// This time we'll look at a more ecologically-focused example, using NDVI.
var collection = ee.ImageCollection("LANDSAT/LC08/C01/T1_SR")
.filterDate('2018-06-01','2018-09-30')
.filter(ee.Filter.calendarRange(140, 270)) // roughly, our potential growing season
.filterMetadata('WRS_PATH', 'equals', 27)
.filterMetadata('WRS_ROW', 'equals', 26)
.map(cloudMask)
.map(function NDVI(i){
  return i.addBands(i.normalizedDifference(['B5','B4']).rename('NDVI'))  // generates NDVI band
});

var median = collection.median();

// Red = lower values and darker green = higher values
var collectionVis = {bands: 'NDVI', min: 0.5, max: 0.95, palette: ['red','yellow','green','003300']};

Map.addLayer(median, collectionVis, 'NDVI');

Visualizing the median NDVI values across the growing season in 2018. Feel free to toggle on different background layers. Your image may not look exactly the same as the one shown here.

3.5.4 Visualizing Image Counts

Our NDVI image is looking pretty nice. But how confident are we in these values? Specifically, it is important to assess how many images actually compose our median values. We can perform a quick gut check on whether our resulting NDVI median is representative across the study area by visualizing the sum of how many images were used at each pixel location. Append the following code to your script and click ‘Run’. You should see an image similar to the one below.

With count layers, if we find spatial anomalies in our NDVI values, there are several options. We could expand our seasonal date ranges or choose to include multiple years of data. Ultimately, we may accept defeat and decide that the weather is too cloudy to be useful and decide to explore a different dataset–which is totally acceptable!

// Create counts band.
var counts = collection.select('NDVI').count();

// Find our potential maximum count by getting the size of the initial image collection.
print(collection.size());

// In this palette, red = lower values and blue = higher values.
var countVis = {min: 0, max: 6, palette: ['b2182b','ef8a62','fddbc7', 
                                          'd1e5f0','67a9cf','2166ac']};

Map.addLayer(counts, countVis, 'counts');

Visualizing the number of images used to calculate the median value at each pixel. Darker red values are lower and darker blue values are higher.

3.5.5 Producing and Exporting Histograms

To understand the quantitative distribution of the count values we can build a histogram of the count data. If our study area is so cloudy that the mean is representing only one or two values, we might need to reconsider our source data or collection year. Adding this final code chunk to your script, you will be able to see a histogram in the Console tab. You will be making your own geometry shape (see Module 1 if you need a refresher on how to do this). Your histogram may vary slightly depending on your shape but it is big enough to encompass the imagery, it should resemble the distribution in the image below.

// Draw your own rectangle and try to cover most of the Landsat scene and 
// then we'll make a histogram of the distribution. Because the scene area 
// is so large  GEE requires us to perform some aggregating. But this will 
// give us a general sense of our distribution.
var histogram = ui.Chart.image.histogram(counts, geometry, 120);

// Display the histogram.
print(histogram);

Quantifying the number of distribution of values from the count layer with the histogram function.

4 Conclusion

In this module we have reviewed some of the common corrections applied to remotely sensed imagery that aid in the production of high quality products you will find in Google Earth Engine. We also covered a simple framework for visualizing these differences and saw how the changes in processing levels affected the resulting imagery during a smoky summer in southern Oregon. Finally, we built a workflow script to use Google Earth Engine to remove clouds from growing season imagery, generate a mean vegetation index value, and assess the distribution of the utilized images.

5 Complete Cloud Masking Code

// Load an initial image to test view a cloudy image.
var sr = ee.Image('LANDSAT/LC08/C01/T1_SR/LC08_027026_20180829');

// Define visparams for true-color image.
var srvis = {bands: ['B4', 'B3', 'B2'], min: 0, max: 3000, gamma: 1.4};

// Add your cloudy image to the map.
Map.addLayer(sr, srvis, 'single_scene')
Map.setCenter(-91.8859, 48.8936, 8.81);

// Define simple cloud mask, based in values from the 'pixel_qa' band.
// Essentially, values of 322 = land and 324 = water.
var qa = sr.select('pixel_qa');
var mask = qa.eq(322).or(qa.eq(324));
var sr_cm = sr.updateMask(mask);

// Add your new map for comparison to the cloudy image.
Map.addLayer(sr_cm, srvis, 'single_scene_masked');

// Define how your cloudMask function should work.
var cloudMask = function(image) {
  // var mask = image.select('pixel_qa').eq(322).or(image.select('pixel_qa').eq(324));
  var mask = image.select('pixel_qa').eq(322);
  return image.mask(mask);
};

// This time we'll look at a more ecologically-focused example, using NDVI.
var collection = ee.ImageCollection("LANDSAT/LC08/C01/T1_SR")
.filterDate('2018-06-01','2018-09-30')
.filter(ee.Filter.calendarRange(140, 270)) // roughly, our potential growing season
.filterMetadata('WRS_PATH', 'equals', 27)
.filterMetadata('WRS_ROW', 'equals', 26)
.map(cloudMask)
.map(function NDVI(i){
  return i.addBands(i.normalizedDifference(['B5','B4']).rename('NDVI'))  // generates NDVI band
});

var median = collection.median();

// Red = lower values and darker green = higher values.
var collectionVis = {bands: 'NDVI', min: 0.5, max: 0.95, palette: ['red','yellow','green','003300']};

Map.addLayer(median, collectionVis, 'NDVI');

// Create counts band.
var counts = collection.select('NDVI').count();

// Find our potential maximum count by getting the size of the initial image collection.
print(collection.size());

// In this palette, red = lower values and blue = higher values.
var countVis = {min: 0, max: 6, palette: ['b2182b','ef8a62','fddbc7', 
                                          'd1e5f0','67a9cf','2166ac']};

Map.addLayer(counts, countVis, 'counts');

// Draw your own rectangle and try to cover most of the Landsat scene and 
// then we'll make a histogram of the distribution. Because the scene area 
// is so large  GEE requires us to perform some aggregating. But this will 
// give us a general sense of our distribution.
var histogram = ui.Chart.image.histogram(counts, geometry, 120);

// Display the histogram.
print(histogram);

6 References

Young, N. E., Anderson, R. S., Chignell, S. M., Vorster, A. G., Lawrence, R., & Evangelista, P. H. (2017). A survival guide to Landsat pre-processing. Ecology, 98(4), 920-932.