Skip to content

Using SDeMo with other packages

In this tutorial, we will work on the same species and location as in the excellent tutorial on SDMs by Damaris Zurell, to show how SpeciesDistributionToolkit integrates prediction to the rest of the workflow.

julia
using SpeciesDistributionToolkit
using CairoMakie
using Statistics

Note that there are several sections about the SDeMo package in the "How-to" section of the manual.

Getting the data

julia
CHE = SpeciesDistributionToolkit.openstreetmap("Switzerland");

In order to simplify the code, we will start from a list of bioclim variables that have been picked to optimize the model - SDeMo offers many functions for variable selection.

julia
bio_vars = [1, 11, 5, 8, 6]
5-element Vector{Int64}:
  1
 11
  5
  8
  6

We get the data on these variables from CHELSA2:

julia
provider = RasterData(CHELSA2, BioClim)
L = SDMLayer{Float32}[
    SDMLayer(
        provider;
        layer = x,
        SpeciesDistributionToolkit.boundingbox(CHE)...,
    ) for x in bio_vars
];

And we then clip and trim to the polygon describing Switzerland:

julia
mask!(L, CHE)
5-element Vector{SDMLayer{Float32}}:
 🗺️  A 240 × 546 layer (70065 Float32 cells)
 🗺️  A 240 × 546 layer (70065 Float32 cells)
 🗺️  A 240 × 546 layer (70065 Float32 cells)
 🗺️  A 240 × 546 layer (70065 Float32 cells)
 🗺️  A 240 × 546 layer (70065 Float32 cells)

The next step is to get the data, using the eBird dataset:

julia
ouzel = taxon("Turdus torquatus")
presences = occurrences(
    ouzel,
    first(L),
    "occurrenceStatus" => "PRESENT",
    "limit" => 300,
    "datasetKey" => "4fa7b334-ce0d-4e88-aaae-2e0c138d049e",
)
while length(presences) < count(presences)
    occurrences!(presences)
end

And after this, we prepare a layer with presence data:

julia
presencelayer = mask(first(L), Occurrences(mask(presences, CHE)))
🗺️  A 240 × 546 layer with 70065 Bool cells
   Projection: +proj=longlat +datum=WGS84 +no_defs

The next step is to generate a pseudo-absence mask. We will sample based on the distance to an observation, by also preventing pseudo-absences to be less than 4km from an observation:

julia
background = pseudoabsencemask(DistanceToEvent, presencelayer)
bgpoints = backgroundpoints(nodata(background, d -> d < 4), 2sum(presencelayer))
🗺️  A 240 × 546 layer with 45362 Bool cells
   Projection: +proj=longlat +datum=WGS84 +no_defs

We can take a minute to visualize the dataset, as well as the location of presences and pseudo-absences:

Code for the figure
julia
f = Figure(; size = (600, 300))
ax = Axis(f[1, 1]; aspect = DataAspect())
poly!(ax, CHE; color = :grey90, strokecolor = :black, strokewidth = 1)
scatter!(ax, presencelayer; color = :black)
scatter!(ax, bgpoints; color = :red, markersize = 4)
hidedecorations!(ax)
hidespines!(ax)

Setting up the model

These steps are documented as part of the SDeMo documentation. The only difference here is that rather than passing a matrix of features and a vector of labels, we give a vector of layers (features), and the two layers for presences and absences:

julia
sdm = SDM(ZScore, Logistic, L, presencelayer, bgpoints)
hyperparameters!(classifier(sdm),, 1e-4);
hyperparameters!(classifier(sdm), :interactions, :all);
hyperparameters!(classifier(sdm), :epochs, 10_000);

We will now train the model on all the training data.

julia
train!(sdm)
ZScore → Logistic → P(x) ≥ 0.318

Making the first prediction

We can predict using the model by giving a vector of layers as an input. This will return the output as a layer as well:

julia
prd = predict(sdm, L; threshold = false)
🗺️  A 240 × 546 layer with 70065 Float64 cells
   Projection: +proj=longlat +datum=WGS84 +no_defs

Because this prediction is a layer, we can plot it directly:

Code for the figure
julia
f = Figure(; size = (600, 300))
ax = Axis(f[1, 1]; aspect = DataAspect())
hm = heatmap!(ax, prd; colormap = :linear_worb_100_25_c53_n256, colorrange = (0, 1))
contour!(ax, predict(sdm, L); color = :black, linewidth = 0.5)
Colorbar(f[1, 2], hm)
lines!(ax, CHE; color = :black)
hidedecorations!(ax)
hidespines!(ax)

As a rule, most relevant function in SDeMo will, when given layers as an argument, return the output as layers:

julia
partial1 = partialresponse(sdm, L, 1; threshold=false)
🗺️  A 240 × 546 layer with 70065 Float64 cells
   Projection: +proj=longlat +datum=WGS84 +no_defs

The output can be visualised as well. Partial responses are given in the same units as the prediction.

Code for the figure
julia
f = Figure(; size = (600, 300))
ax = Axis(f[1, 1]; aspect = DataAspect())
hm = heatmap!(ax, partial1; colormap = :tempo, colorrange = (0, 1))
contour!(ax, predict(sdm, L); color = :black, linewidth = 0.5)
Colorbar(f[1, 2], hm)
lines!(ax, CHE; color = :black)
hidedecorations!(ax)
hidespines!(ax)

This, naturally, is also true for Shapley values:

julia
shapley1 = explain(sdm, L, 1; threshold=false)
🗺️  A 240 × 546 layer with 70065 Float64 cells
   Projection: +proj=longlat +datum=WGS84 +no_defs

Note that the Shapley values are expressed as a deviation from the average prediction, and so positive values correspond to the presence class being more likely.

Both the explain and partialresponse functions can accept keywords that are passed to predict - in this tutorial, we use threshold=false to provide an explanation on the score returned by the model, but we can also request an explanation of the binary response (this is usually less informative).

Code for the figure
julia
col_lims = maximum(abs.(quantile(shapley1, [0.1, 0.9]))).*(-1, 1)
f = Figure(; size = (600, 300))
ax = Axis(f[1, 1]; aspect = DataAspect())
hm = heatmap!(ax, shapley1; colormap = :roma, colorrange = col_lims)
contour!(ax, predict(sdm, L); color = :black, linewidth = 0.5)
Colorbar(f[1, 2], hm)
lines!(ax, CHE; color = :black)
hidedecorations!(ax)
hidespines!(ax)

When given a vector of layers, the explain function will return a layer with the explanation for each input variable:

julia
S = explain(sdm, L; threshold=false)
5-element Vector{SDMLayer{Float64}}:
 🗺️  A 240 × 546 layer (70065 Float64 cells)
 🗺️  A 240 × 546 layer (70065 Float64 cells)
 🗺️  A 240 × 546 layer (70065 Float64 cells)
 🗺️  A 240 × 546 layer (70065 Float64 cells)
 🗺️  A 240 × 546 layer (70065 Float64 cells)

We can then put this object into the mosaic function to get the index of which variable is the most important for each pixel:

Code for the figure
julia
f = Figure(; size = (600, 300))
mostimp = mosaic(argmax, map(x -> abs.(x), S))
colmap = [colorant"#E69F00", colorant"#56B4E9", colorant"#009E73", colorant"#D55E00", colorant"#CC79A7"]
ax = Axis(f[1, 1]; aspect = DataAspect())
heatmap!(ax, mostimp; colormap = colmap)
contour!(
    ax,
    predict(sdm, L);
    color = :black,
    linewidth = 0.5,
)
lines!(ax, CHE; color = :black)
hidedecorations!(ax)
hidespines!(ax)
Legend(
    f[2, 1],
    [PolyElement(; color = colmap[i]) for i in 1:length(bio_vars)],
    ["BIO$(b)" for b in bio_vars];
    orientation = :horizontal,
    nbanks = 1,
    framevisible = false,
    vertical = false,
)