Using R to perform basic operations on a NetCDF file
I am pleased to present my new blog. In this space you will find tips that may be useful for students or researchers who need to deal in some way with spatial data in their research.
The purpose of this first post is to show how to read, plot and perform simple algebraic operations on a relatively common climate file (in the NetCDF format) using R. The same logic can be applied not only to atmospheric data but to any data in grid point: for example satellite images or ocean data.
To illustrate this post I will use the monthly global observations of air temperature in high spatial resolution (0.5° x 0.5°) from the Climate Research Unit (CRU). The current version of this data (version 4.03 at the time of this writing) covers the months of January 1901 to December 2018 and can be downloaded from: crudata.uea.ac.uk/cru/data/hrg.
However, the complete series file is very large (~ 170Mb download and 3Gb after unpacking). For simplicity's sake, we'll use the file that covers only January 2011 through December 2018. So download and unzip the file named
cru_ts4.03.2011.2018.tmp.dat.nc.gz from the above CRU page.
Before we start
I'm assuming in these articles that you have basic R skills such as the most common data structures, installing packages, accessing directories in command line, and so on. For help on these topics, abundant resources are available on the internet. Besides asking google, very good information can be found on stackoverflow.com. In addition to installing R, I recommend using the RStudio editor (www.rstudio.com). RStudio is a graphic interface to R that makes it much simpler to use. RStudio includes code editor and visualization tools for variables and figures.
In this first tutorial we will use the
raster package. Packages in R are a combination of functions that perform specialized tasks which are not available in standard R functions. One of these packages,
raster (which I will often mention hereafter) is specialized in working with spatialized spatial data of “raster”).
Particularly for opening netCDF files, the raster package requires functions from the
ncdf4 package, which is responsible for accessing the netCDF libraries that are installed on your computer. Therefore, in order to install
ncdf4 you will also need to install the netCDF libraries on your computer. This step is not necessary if you use R in Windows or MacOS. However, if you are using linux, such as Ubuntu, you can install the netCDF libraries with the following command (on your system, not on R):
sudo apt-get install libudunits2-dev libnetcdf-dev udunits-bin libudunits2-dev
Then, proceed to the installation of both
# install packages used in this article install.packages("ncdf4") install.packages("raster")
Let's begin our R session by loading the packages we will need for today's tasks:
# loads the packages used in this guide library(ncdf4) library(raster)
Reading a NetCDF file using the raster package is quite simple. If the file had only one time layer (for example a topography file), you would just use the
raster command (you read that right: the command has the same name as the package name).
However, since most weather archives have multiple layers of time - that is, consecutive measurements of the study variable over time - we need to use the
brick function. The following command creates a variable called
b, containing all time slices of the NetCDF file:
# reads the netcdf file without specifying variable b <- brick("/Volumes/Storage/Research/data/cru_ts4.03.2011.2018.tmp.dat.nc")
** Note **: If the NetCDF file has more than one variable, you can choose which variable to open using the
varname argument. In the case of the file we are analyzing, the variable name is
tmp. Depending on the file you are reviewing, it could be
precip or something like that. Here's an example of how to specify the variable name:
# reads the netcdf file specifying variable b <- brick("/Volumes/Storage/Research/data/cru_ts4.03.2011.2018.tmp.dat.nc", varname="tmp")
Note that the brick command loads all layers of the NetCDF file, automatically reading its resolution, date, and spatial coverage details (lat x lon). Let's take a look at the summary of the variable we just created:
# summarizes file information b class : RasterBrick dimensions : 360, 720, 259200, 60 (nrow, ncol, ncell, nlayers) resolution : 0.5, 0.5 (x, y) extent : -180, 180, -90, 90 (xmin, xmax, ymin, ymax) coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 data source : /Volumes/Storage/Research/data/cru_ts3.24.2011.2015.tmp.dat.nc names : X2011.01.16, X2011.02.15, X2011.03.16, X2011.04.16, X2011.05.16, X2011.06.16, X2011.07.16, X2011.08.16, X2011.09.16, X2011.10.16, X2011.11.16, X2011.12.16, X2012.01.16, X2012.02.15, X2012.03.16, ... Date : 2011-01-16, 2018-12-16 (min, max) varname : tmp
Some useful information is provided by this summary. For example:
- dimensions: the number of rows (latitudes), columns (longitudes), total pixels (rows multiplied by columns), and time layers (dates) of the file.
- resolution: shows the resolution (or “pixel size”) of the data (in degrees in this case). A pixel of 0.5°x0.5° means that each pixel represents an area of approximately 55km x 55km of the earth's surface.
- extent: shows the spatial coverage (that is, the maximum and minimum latitude and longitude limits) of the NetCDF file.
- coord. ref: the coordinate system in which the file is projected. This file is in lat / lon coordinate system with the WGS84 datum.
- date: the temporal coverage of the data, in this case going from January 16, 2011 (the first layer) until December 16, 2018 (the last layer).
Plot the data
In addition to this basic information we extract from the file, we can also do an exploratory visual inspection of the data plotting the variable b:
# plots the variable plot(b)
Note that, by default, all 16 first layers of the raster are plotted. If you want, for example, to plot only the first layer, use the indexing system of R:
# plots the first layer of the variable plot(b[])
And you can also plot a range of layers, for example 13 to 24 - which in our file would represent the dates from 16th January 2012 to 16th December 2012.
# plots a range of variable layers plot(b[[13:24]])
Although the plot function of R provides very basic figures for raster objects, it serves well for an initial visual inspection of the data. In a future tutorial, I'll explore ways to plot raster objects with a much more “professional” look. If you want to get ahead of yourself, here are some examples of maps produced with the
rasterVis package: oscarperpinan.github.io/rastervis.
Simple algebraic operations on raster objects in R do not require any special syntax. We can treat our raster variable
b as a normal R variable. To illustrate this idea, let's super want to convert our data, which are expressed in ° C, to a new variable that will be expressed in ° F. A very simple code to do this would be:
# Converts temperature in degrees C to degrees F # The formula used is: # t_F is equal to t_C multiplied by 9/5 and added to 32 b_em_F <- (b * 9/5) + 32
And now let's check the result:
# plots the 1st temperature date in F plot(b_em_F[])
The map appears consistent, with temperatures ranging from about -50 ° F (-45 ° C) to 105 ° F (40 ° C).
More complicated operations, for example requiring more than one variable, are a little more elaborate and would need to use the calc or overlay functions of the raster package. I will approach such operations in a future tutorial.
This tutorial showed some basic functions for opening and plotting NetCDF files in R. In the next post I will show you how to make a spatial clipping on that CRU data. Stay tuned! If you find the content of this post useful, feel free to post it among other people who may be interested.
If you have any questions, comments or suggestions, please do not hesitate to contact us. If you have any specific suggestion of article topic, shoot me an email as well. If I know how to help, I write the article as soon as possible.