Accessing data from large online rasters with Cloud-Optimized GeoTIFF, GDAL, and the terra R package

Quite often we want to extract climatic or environmental data from large global raster files. The traditional workflow requires downloading the rasters (often many GB) and then performing the data extraction on your local computer. Thus, to retrieve climate data for a single site we may first need to download many GB of raster files, which takes a lot of time and disk space.

Using cloud-optimised GeoTIFFs can magically save us from such massive downloads, allowing us to efficiently collect data from really large rasters in a matter of seconds, without having to download anything!

In this post I show how to convert any raster to a cloud-optimised GeoTIFF (COG), and how to efficiently read data from an online COG using the GDAL virtual file system and the terra R package.

Converting a raster into a cloud-optimised GeoTIFF

As explained at https://www.cogeo.org, converting a raster into a COG is really easy using gdal_translate:

gdal_translate in.tif out.tif -co TILED=YES -co COPY_SRC_OVERVIEWS=YES -co COMPRESS=DEFLATE
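
Note that COPY_SRC_OVERVIEWS=YES only copies overviews that already exist in the source file. If your raster has none, you can build them first with gdaladdo (the overview levels and resampling method below are just an example):

gdaladdo -r average in.tif 2 4 8 16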

If you want to make this conversion from within R, you could use the gdalUtilities package, for example:

gdalUtilities::gdal_translate(src_dataset = "in.tif", 
                              dst_dataset = "out.tif",
                              co = matrix(c("TILED=YES", 
                                            "COPY_SRC_OVERVIEWS=YES", 
                                            "COMPRESS=DEFLATE"), 
                                          ncol = 1))

out.tif will now be a cloud-optimised GeoTIFF, which you can host online on a web server, file server, etc.
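
You can check that the output is indeed tiled and contains internal overviews by inspecting it with gdalinfo:

gdalinfo out.tif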

Accessing data from cloud-optimised GeoTIFFs with R

Accessing data from cloud-optimised GeoTIFFs is easy, fast, and efficient thanks to the GDAL virtual file system.

Here we use the terra R package, the successor of the raster package, to extract daily climate data for Grazalema, a rainy town in southern Spain. First, let’s define the site coordinates:

site.coords <- data.frame(lon = -5.36878, lat = 36.75864)

The climate raster files (in cloud-optimised GeoTIFF format) are hosted online on an FTP server (note that HTTPS works as well). There is one large (several GB) COG raster for every year, and each has one layer per day (i.e. 365 or 366 layers, depending on leap years).

If we wanted to obtain climate data from these rasters using the traditional workflow, we would have to download many GB to our computer, even if we were interested in a single site. But using COGs, we can extract the data in a matter of seconds without downloading any entire raster.

For example, this is the url of the COG raster with daily precipitation data for 2010:

ftp://palantir.boku.ac.at/Public/ClimateData/v2_cogeo/AllDataRasters/prec/DownscaledPrcp2010_cogeo.tif

We prepend “/vsicurl/” to the URL to enable the GDAL virtual file system:

cog.url <- "/vsicurl/ftp://palantir.boku.ac.at/Public/ClimateData/v2_cogeo/AllDataRasters/prec/DownscaledPrcp2010_cogeo.tif"
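
Since there is one COG raster per year, we can also build these URLs programmatically. The helper below is just a sketch, assuming every year follows the same file naming pattern as the 2010 raster above:

# Hypothetical helper: builds the /vsicurl/ URL for a given year,
# assuming the naming pattern DownscaledPrcp<year>_cogeo.tif holds for all years
cog_url_year <- function(year) {
  paste0("/vsicurl/ftp://palantir.boku.ac.at/Public/ClimateData/v2_cogeo/",
         "AllDataRasters/prec/DownscaledPrcp", year, "_cogeo.tif")
}
cog_url_year(2010)  # identical to cog.url above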

Now let’s connect to the online raster using terra::rast:

library(terra)
## terra version 1.2.13
ras <- rast(cog.url)

We can see that the connection works. These are the raster metadata:

ras
## class       : SpatRaster 
## dimensions  : 6030, 13920, 365  (nrow, ncol, nlyr)
## resolution  : 0.008333333, 0.008333333  (x, y)
## extent      : -40.5, 75.5, 25.25, 75.5  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +datum=WGS84 +no_defs 
## source      : DownscaledPrcp2010_cogeo.tif 
## names       : Dow_1, Dow_2, Dow_3, Dow_4, Dow_5, Dow_6, ...

Now let’s extract the precipitation at this site for each day of 2010. Using terra::extract it is as simple as:

cog.data <- terra::extract(ras, site.coords)

We have 365 values of precipitation, one per day:

cog.data[1:10]
##   ID Dow_1 Dow_2 Dow_3 Dow_4 Dow_5 Dow_6 Dow_7 Dow_8 Dow_9
## 1  1     0     0  1331  3311     0  1940  1671     0     0
cog.data[360:366]
##   Dow_359 Dow_360 Dow_361 Dow_362 Dow_363 Dow_364 Dow_365
## 1     142       0       0       0     222    1182     694
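
To pair each daily value with its date, here is a minimal sketch (assuming that layer i corresponds to day-of-year i of 2010, and keeping the values in the raster’s native units):

prec.2010 <- data.frame(
  date = seq(as.Date("2010-01-01"), as.Date("2010-12-31"), by = "day"),
  prec = unname(unlist(cog.data[1, -1]))  # drop the ID column
)
head(prec.2010)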

And extraction has been very fast!

system.time({
  ras <- rast(cog.url)
  terra::extract(ras, site.coords)
})
##    user  system elapsed 
##   0.378   0.085   0.462

We are obtaining data from a 4 GB online raster in less than a second, without having to download anything!

We could also obtain the data for a particular day, or a subset of days (e.g. 6 January 2010):

ras <- rast(cog.url)
ras <- subset(ras, 6)  # get 6th layer only (6 January)
terra::extract(ras, site.coords)
##   ID Dow_6
## 1  1  1940

Or for several sites:

site.coords <- data.frame(lon = c(-5.6, -5.1), 
                          lat = c(36.5, 37.2))
site.coords
##    lon  lat
## 1 -5.6 36.5
## 2 -5.1 37.2
terra::extract(ras, site.coords)
##   ID Dow_6
## 1  1  2043
## 2  2  1705

Finally, we could also get a subset of the online raster (i.e. get a raster rather than a data frame).

For example, to get a raster of the precipitation around Grazalema town on 6 January 2010:

site.coords <- data.frame(lon = c(-5.6, -5.1), 
                          lat = c(36.6, 37.2))
site.area <- vect(site.coords)  # convert to terra vector object (SpatVector)
site.area
##  class       : SpatVector 
##  geometry    : points 
##  dimensions  : 2, 2  (geometries, attributes)
##  extent      : -5.6, -5.1, 36.6, 37.2  (xmin, xmax, ymin, ymax)
##  coord. ref. :  
##  names       :   lon   lat
##  type        : <num> <num>
##  values      :  -5.6  36.6
##                 -5.1  37.2
ras <- rast(cog.url)
ras <- subset(ras, 6)  # get 6th layer only

precip <- terra::crop(ras, site.area)
plot(precip)
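
Instead of deriving the cropping area from two points, we could also define the extent explicitly with terra::ext, and write the resulting subset to disk if we want to keep it locally (the file name below is just an example):

# crop using an explicit extent (xmin, xmax, ymin, ymax)
e <- ext(-5.6, -5.1, 36.6, 37.2)
precip <- terra::crop(ras, e)

# optionally save the cropped raster locally
terra::writeRaster(precip, "grazalema_prec_2010-01-06.tif")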

Conclusion

Cloud-optimised GeoTIFFs allow us to easily, quickly, and efficiently distribute and access data from very large rasters hosted online.

We are using this approach in a forthcoming R package that aims to facilitate access to climate data. Stay tuned!
