Raster Histograms

First, add the required depedencies

using Rasters, NCDatasets, Downloads, CairoMakie

and the RasterHistograms package.

using RasterHistograms

Using this package we can produce Histograms from data that is in a Raster, RasterStack or RasterSeries, which are N-dimensional arrays, in a similar way that xhistogram works for xarray in python. This example is structured similarly to the xhistogram tutorial.

Randomly generated toy data

First we generate some randomly distributed data and form a Raster.

x, t = range(-2π, 2π; length = 50), range(0, 4π; length = 100)
dimensions = (X(x), Ti(t))
rs = Raster(randn(length(x), length(t)), dimensions; name = :Toy_data)
50×100 Raster{Float64,2} Toy_data with dimensions: 
  X Sampled{Float64} -6.283185307179586:0.2564565431501872:6.283185307179586 ForwardOrdered Regular Points,
  Ti Sampled{Float64} 0.0:0.12693303650867852:12.566370614359172 ForwardOrdered Regular Points
extent: Extent(X = (-6.283185307179586, 6.283185307179586), Ti = (0.0, 12.566370614359172))missingval: missingparent:
            0.0         0.126933  …  12.3125    12.4394     12.5664
 -6.28319  -1.82931     0.826835      1.05965    0.0852104  -0.536963
 -6.02673   1.45437     1.44044      -0.246961   0.185595    0.0402522
 -5.77027   0.582355   -0.772286     -0.43168   -0.9027      1.87734
 -5.51382  -1.28468     1.34507      -3.24976   -0.928869   -1.19851
 -5.25736   0.375997    0.513657  …  -0.39049   -0.314387    2.10712
  ⋮                               ⋱              ⋮          
  5.0009    0.0892556  -0.70489       0.343495   2.01332    -1.48377
  5.25736   1.85664     1.07829      -0.475647   0.536484   -0.382936
  5.51382  -0.294048   -0.540655      2.05521    0.625332    0.734095
  5.77027  -1.0287     -0.516634  …  -0.522387  -0.252286   -0.508618
  6.02673  -0.044631   -0.280868      1.01865    1.71449     0.438216
  6.28319  -0.267313   -0.617277      0.675261  -1.38207    -1.8626

The we can form a RasterLayerHistogram for the :Toy_data

rs_hist = RasterLayerHistogram(rs)
RasterLayerHistogram for the variable Toy_data
┣━━ Layer dimensions: (:X, :Ti) 
┣━━━━━━━━ Layer size: (50, 100)
┗━━━━━━━━━ Histogram: 1-dimensional

We can then plot the data and the Histogram

fig = Figure(size = (1000, 600))
ax1 = Axis(fig[1, 1];
           title = "Toy data",
           xlabel = "x",
           ylabel = "time")
hm = heatmap!(ax1, x, t, rs.data)
Colorbar(fig[2, 1], hm; vertical = false, flipaxis = false)
ax2 = Axis(fig[1, 2];
          title = "Histogram of Toy data",
          xlabel = "Toy data", ylabel = "Counts")
plot!(ax2, rs_hist; color = :steelblue)
fig

By default the Histogram has the counts in each bin. We can normalise the Histogram by calling the normalize! function on rs_hist and choosing a mode of normalisation. For more information about the possible modes of normalisation see here.

normalize!(rs_hist; mode = :pdf)

Then replot with the normalised histogram

fig = Figure(size = (900, 600))
ax1 = Axis(fig[1, 1];
           title = "Toy data",
           xlabel = "x",
           ylabel = "time")
hm = heatmap!(ax1, x, t, rs.data)
Colorbar(fig[2, 1], hm; vertical = false, flipaxis = false)
ax2 = Axis(fig[1, 2];
          title = "Histogram (pdf) of Toy data",
          xlabel = "Toy data", ylabel = "density")
plot!(ax2, rs_hist; color = :steelblue)
fig
Info

Plotting using Plots.jl is also possible. See the module documentation for more info.

Real world data example

This package is mainly concerned with ocean variables, so we now look at temperature and salinity distributions using ECCO model output.

Downloads.download("https://opendap.earthdata.nasa.gov/providers/POCLOUD/collections/ECCO%2520Ocean%2520Temperature%2520and%2520Salinity%2520-%2520Daily%2520Mean%25200.5%2520Degree%2520(Version%25204%2520Release%25204)/granules/OCEAN_TEMPERATURE_SALINITY_day_mean_2007-01-01_ECCO_V4r4_latlon_0p50deg.dap.nc4", "ECCO_data.nc")

This example also shows how the module works for 2-dimensional Histograms though it can be generalised to N dimensions depending on the number of variables (i.e. layers in the RasterStack) one is looking at.

Forming the RasterStack

We form a RasterStack with only the :SALT (practical salinity) and :THETA (potential temperature) layers. This means the resulting RasterStackHistogram will be 2 dimensional. Note the order of the variables matters here for plotting purposes. The first layer, in this case :SALT will be the x-axis, and the second layer :THETA will be the y-axis.

stack_TS = RasterStack("ECCO_data.nc"; name = (:SALT, :THETA))
edges = (31:0.025:38, -2:0.1:32)
stack_hist = RasterStackHistogram(stack_TS, edges)

Now we can plot, the histogram and look at the unweighted distribtution of temperature and salinity. By default the empty bins are plotted with the value of zero. To not plot the empty bins argument we pass show_empty_bins = false to the plotting function.

fig = Figure(size = (500, 500))
ax = Axis(fig[1, 1];
          title = "Temperature and salinity joint distribution (unweighted)",
          xlabel = "Practical salinity (psu)",
          ylabel = "Potential temperature (°C)")
hm = heatmap!(ax, stack_hist; colorscale = log10)
Colorbar(fig[1, 2], hm)
fig

Weighting the Histogram

The module also exports simple functions for calculating area and volume weights from the dimensions of the grid and plot the data. Where weights are available from model data they should be used in favour of the functions.

dV = volume_weights(stack_TS)
weighted_stack_hist = RasterStackHistogram(stack_TS, dV, edges)
fig = Figure(size = (500, 500))
ax = Axis(fig[1, 1];
          title = "Temperature and salinity joint distribution (weighted)",
          xlabel = "Practical salinity (psu)",
          ylabel = "Potential temperature (°C)")
hm = heatmap!(ax, weighted_stack_hist; colorscale = log10)
Colorbar(fig[1, 2], hm)
fig

This page was generated using Literate.jl.