Cloudiness Trends for the 2017 Solar Eclipse

Planning an excursion to see the upcoming solar eclipse? NASA can help with that! They provide two sets of data which can point you to good viewing:

A little bit of R scripting lets us combine these and put them onto a Leaflet map of the US.

Downloads

Before coding, there are two things to download:

  • An outline of the US, for clipping the path; I used one from the US census bureau at http://www2.census.gov/geo/tiger/GENZ2015/shp/cb_2015_us_nation_5m.zip.
  • A collection of satellite images, from https://ladsweb.nascom.nasa.gov/search/. For this example we use 12 years of data for the two weeks

    centered on August 21, for just the contiguous US. The MODIS satellites (“Aqua” and “Terra”) take images in swaths. We could download the raw swath data and do our cloud coverage analysis from that, but NASA provides options which simplify our task greatly. They can process the swath data into tiles, provide per-pixel “cloud fraction” data, generate geo-referenced TIFF format, and project the result into Mercator (to better match Leaflet’s display). To get the data product with these options at the LAADS website, specify the satellite (we used Terra), “6” for the data collection, and then “MOD06_L2” under the Atmosphere group. Then the choices for cloud fraction, geoTIFF, and Mercator are available under the “Review and Order” step. NASA sends email when the files are processed and ready for ftp download, typically in well under an hour.

You can confirm that your downloaded cloud images are what you expect by comparing them to the historical MODIS images at http://ge.ssec.wisc.edu/modis-today/index.php.

Code

In the code, first set up the environment.

require(leaflet)
require(rgdal)
require(raster)
require(rgeos)

modispath <- "directory_containing_MODIS_geotiff_files"
modisnames <- list.files(path=modispath, pattern="M*.tif",
                            full.names=TRUE, recursive=FALSE)

We want some flexibility in choosing what cloud data to consider. For instance, we may have downloaded images for more days than necessary, or we may want to consider a broader or narrower range of days on either side of August 21:

  • just the August 21 images,
  • August 21 plus 3 days on either side (Aug. 18-24), or
  • a full week on either side (Aug. 14-28)

We’ll do the date range selection by making an interval around the center date: pad by 0 to get just Aug. 21, by 3 for Aug. 18-24, or by 7 for Aug. 14-28.

days.padding <- 0  # or 3 or 7 or what you like.

For testing, it can be handy to do a quick run with just a few of the MODIS image files instead of the whole list. Even for complete runs one may not want to consider the full range of years available. So, one final piece of setup is an option to limit the number of input files.

useimages.all <- TRUE  # true for a full run, false to limit
useimages.limit <- 12  # only applies if useimages.all is FALSE
earliest.year <- 2000  # in case we want just a more recent subset of the years

That finishes the setup. Now, the first step of analysis is to load up one of the MODIS raster files; it will be handy to have the exact map projection that the raster files use, and any one of the files will tell us that. (Depending on details of your R installation, you may find that the raster functions give warning messages “NOTE: rgdal::checkCRSArgs: no proj_defs.dat in PROJ.4 shared files” — these can be ignored.)

anyoldmodisfile <- modisnames[1]
anymodisrast <- raster(anyoldmodisfile)

Because the MODIS images are given not dates (8/21) but day numbers (233), we need to do a little translation when we're checking to see if a given image file is one that we want. We take care of this with a function is.in.range, which will do the check based on file names. Image file names begin with a string like “MOD06_L2.A2005229”, which in this case indicates day number 229 of 2005. August 21 is day 233 (or 234 in leap years); from that August 21 center point we see if a given day is within the days.padding range around it.
Note that if we were using files from the MODIS instrument on Aqua instead of Terra, the file names would start with MYD instead of MOD. The “M.D” in the search string handles either case.

is.in.range <- function(fname) {
    year.day <- gsub(".*M.D06_L2.A(....)(...).*",
                    "\\1,\\2",
                    fname)  # to give us a string like "2005,229" from the file name
    yr <- as.integer(substr(year.day, 1, 4))  # 2005
    dy <- as.integer(substr(year.day, 6, 8))  # 229
    base.day <- 233                    # Aug. 21 in non-leap years
    if ( yr%%4 == 0 ) base.day <- 234  # Aug. 21 in leap years (handily 2000 was a leap year)
    answer <- (abs(base.day-dy) <= days.padding)
    if ( yr < earliest.year) answer <- FALSE # in case we want only the most recent few years
    answer
}

Make a list of the subset of files which will be used in the coverage analysis. For quick runs (only considering useimages.limit of the files), truncate the list down to that limit.

modisnames.filtered <- modisnames[sapply(modisnames, is.in.range)]
modisnames.final <- if (useimages.all) modisnames.filtered
                                     else modisnames.filtered[1:useimages.limit]

Next, pick up the projection used by the MODIS files; we'll adjust a US map to it.

crs.modis <- proj4string(anymodisrast)

Read in the US map, project it to match the MODIS file projections, and rasterize it for use in masking.

usa.all.shp <- readOGR("data/cb_2015_us_nation_5m",
                   "cb_2015_us_nation_5m")

conusplus <- extent(-125, -66, 24, 53)
usa.shp <- crop(usa.all.shp, conusplus)

template <- raster(ncol=1200,nrow=700)
proj4string(template) <- proj4string(usa.shp)
extent(template) <- conusplus

usa.rast <- rasterize(usa.shp, template)
usa.rast.modis <- projectRaster(usa.rast,crs=crs.modis)


#  and now we get a clipping mask to discard the over-ocean part of the eclipse path
usamask <- usa.rast.modis
usamask[ usamask>254] <- NA  # looks like 255 is the out-of-bounds value

Next, build the eclipse path polygon from the coordinates table available on the NASA eclipse web site.

pathurl <- 'https://eclipse.gsfc.nasa.gov/SEpath/SEpath2001/SE2017Aug21Tpath.html'
res <- readLines(pathurl)

The web page (do take a look at it, to get a sense of what we need to parse) has a table of coordinates for the path outline; to find the table in the file, we will look for the lines

  • from the one starting ' Limits'
  • to the one starting ' 20:00'

We'll do some parsing and reorganizing of that table text to generate a polygon for the path outline.

useful <- res[ pmatch(" Limits  3", res):pmatch(" 20:00   13",res) ]
good <- useful[ grep("0",useful)]  # get rid of the occasional blank line - nonblanks always have a 0 somewhere.
#  Now, want to change the format of the lines
#  from " 16:50   41 29.7N 164 30.3W  41 25.2N 161 24.4W ...
#  to   "         41,29.7, 164,30.3,  41,25.2, 161,24.4" 
#  so we can pass them to read.csv nicely.
commaed <- gsub(
      " [^ ]+ +(..) (....)N (...) (....)W  (..) (....)N (...) (....)W.*",
      "\\1,\\2,\\3,\\4,\\5,\\6,\\7,\\8",
      good)
joined <- paste(commaed, sep="\n")
pathtab <- read.csv(text=joined, header=FALSE)
pathtab$lat1 <-  pathtab$V1 + pathtab$V2/60  # degrees and minutes to decimal degrees
pathtab$lon1 <- -pathtab$V3 - pathtab$V4/60  # W means negative
pathtab$lat2 <-  pathtab$V5 + pathtab$V6/60
pathtab$lon2 <- -pathtab$V7 - pathtab$V8/60
#  read down the first column and back up the second to get
#  the perimeter of the region
lats <- c(pathtab$lat1, rev(pathtab$lat2))
lons <- c(pathtab$lon1, rev(pathtab$lon2))
outline <- data.frame(Lon=lons, Lat=lats)  # careful with the column names!

Next, put the outline into the map projection.

p <- Polygon(outline)
ps <- Polygons(list(p), ID=1)  # ID is required
#  get base latlong coord system by making a new raster; 40 is just an arbitrary raster size.
latlonraster <- raster(nrow=40,ncol=40)
crs.latlon <- proj4string(latlonraster)
sps <- SpatialPolygons(list(ps), proj4string=CRS(crs.latlon))
pathoutline.mercator <- spTransform(sps,CRSobj = CRS(crs.modis))

Rasterize and clip the path outline to the US boundary.

tmp <- rasterize(pathoutline.mercator, usa.rast.modis)
pathoutline.rast <- mask(tmp, usamask)

We're getting close to the code which figures out average cloud cover. The average at a pixel is total/num_samples; we’ll call it cloud.accumulator/cloud.counter.

#  accumulator for summing up cloud fraction (usa raster just for size & CRS)
cloud.accumulator <- usa.rast.modis
cloud.accumulator[] <- 0

# num. observations of cloud fraction
cloud.counter <- cloud.accumulator
cloud.counter[] <- 0

The moment we've been waiting for:
Loop through all the cloud_fraction tiff files to add up the cloud fraction and counts.

for ( fn in modisnames.final ){
    # print( paste0("working on ", fn))  # optional: progress markers

    cloudrast <- raster(fn)

    #  the resample() call quits if the extents don't overlap,
    #  and some of the MODIS input files are just outside of the
    #  conUS region.  So, carefully skip any file which doesn't
    #  overlap our accumulator rasters.
    ext.cloudrast <- extent(cloudrast)
    ext.accum <- extent(cloud.accumulator)
    if ( is.null(intersect(ext.cloudrast, ext.accum) ) ) next

    #  resample using nearest-neighbor; otherwise extrapolation into NA
    #  regions gives troublesome negative numbers
    tmpcloud <- resample(cloudrast, cloud.accumulator, method='ngb')
    tmpcloud[ tmpcloud>100 ] <- NA  # 127 is the NA flag; see
    #  http://modis-atmos.gsfc.nasa.gov/_specs_c6/MOD06_L2_CDL_fs.txt

    #  the cloudiness fractions which we'll be adding up
    mydata <- tmpcloud
    mydata[is.na(mydata[])] <- 0  # count NAs as no clouds

    #  Keep track of pixels where we had data  (so when figuring the
    #  average we divide by the correct number of observations)
    ones <- raster(tmpcloud) 
    ones[] <- 1
    counts <- mask(ones, tmpcloud)
    counts[is.na(counts[])] <- 0  # count NAs as no observation here

    cloud.accumulator <- cloud.accumulator + mydata
    cloud.counter <- cloud.counter + counts
}

Take the average.

cloud.has <- cloud.counter
cloud.has[ cloud.has == 0 ] <- NA  # NA prevents divide-by-0 problems
cloud.avg <- cloud.accumulator / cloud.has
cloud.avg.masked <- mask(cloud.avg, pathoutline.rast)

We've worked hard to make these Rasters; save them for later use. Include days.padding in the filename so that runs with different amounts of padding won't overwrite each other.

pathfn <- "out/pathmask.tif"
cloudfn <- paste0("out/cloudavg_pad",days.padding,".tif")
cloudpathfn <- paste0("out/cloudavgmasked_pad",days.padding,".tif")

writeRaster(pathoutline.rast, pathfn, overwrite=TRUE)
writeRaster(cloud.avg, cloudfn, overwrite=TRUE)  # cloudiness over the whole contiguous US
writeRaster(cloud.avg.masked, cloudpathfn, overwrite=TRUE)  # cloudiness masked to just the eclipse path

Set up colors for the final display – blue for clear, white for cloudy, and interpolated between. A note of caution here: Leaflet overlays can end up with a patchwork look if the highest values in the raster are too close to the top end of the color palette’s domain; the workaround is to extend the domain slightly.

palette.domain <- extendrange( c(0,100), f=0.001 )
palplus <- colorNumeric(
  na.color="#00000000",  #  transparent
  palette = c("blue", "white"),
  domain = palette.domain   #  not just values(cloud.avg.masked)!
    #  The domain needs to go just a little higher than the top end
    #  of the values, because otherwise the bitmap ends up with
    #  gray bands mixed in with the white.  Use c(0,100) instead
    #  of values(cloud.avg.masked) because the latter may not include
    #  the entire 0..100 range and we want the colors to be based
    #  on 0..100 
)

And the point of all of this: let's show the path in Leaflet. Here’s what the result of one of our test runs looks like; the actual Leaflet pane can be found at http://spatialreasoning.com/cloudcoverage/. Another note of caution: Including Leaflet and an image in an R Markdown script (this blog post was generated from a markdown script) turns out to make an HTML file bigger than WordPress will handle. So here, commented out, are the two lines which would make a Leaflet map pane in the markdown.

# leaflet() %>% addTiles() %>%
#   addRasterImage(cloud.avg.masked, opacity = 0.7, colors=palplus)

Leaflet map of eclipse path with partial cloud cover data

Another option is to show cloudiness over the full region, not just in the path of totality, which can be useful context:

# leaflet() %>% addTiles() %>%
#   addRasterImage(cloud.avg, opacity = 0.7, colors=palplus)

And that wraps up our cloud coverage code walk-through. What’s next? One extension of this would be to use data from the Aqua satellite rather than Terra; the timing of the Terra passes is close to the eclipse times for the western U.S., but the slightly later time of day for Aqua's photos may give more relevant cloud coverage for the eastern US eclipse times.