
Interpretability

The purpose of this vignette is to show how to generate explanations from SDeMo models, using partial responses and Shapley values.

julia
using SpeciesDistributionToolkit
using CairoMakie
using PrettyTables

We will work on the demo data:

julia
X, y = SDeMo.__demodata()
sdm = SDM(ZScore, Logistic, X, y)
variables!(sdm, [1, 12])
hyperparameters!(classifier(sdm), :interactions, :self)
hyperparameters!(classifier(sdm), :η, 1e-4) # hyperparameter name missing in the source; :η (learning rate) is assumed
hyperparameters!(classifier(sdm), :epochs, 10_000)
train!(sdm)
ZScore → Logistic → P(x) ≥ 0.311

We start by generating a partial response curve:

julia
prx, pry = partialresponse(sdm, 1, LinRange(5.0, 15.0, 100); threshold = false);

Note that we use threshold=false to make sure that we look at the score that is returned by the classifier, and not the thresholded version (i.e. presence/absence).
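
As a rough illustration of what a partial response curve computes, the sketch below uses only Julia's standard library and a hypothetical toy classifier (`toy_score`, with made-up coefficients); it is not SDeMo's implementation. The variable of interest moves along a grid while every other variable is held at its mean, and the score is recorded at each step. The last line shows how thresholding would collapse the score into presence/absence, which is what `threshold = false` avoids; the cutoff of 0.311 is the one reported by the trained model above.

```julia
using Statistics

# Hypothetical toy classifier: a logistic score on two features (made-up coefficients)
toy_score(x) = 1 / (1 + exp(-(0.8 * x[1] - 0.05 * x[2] - 6.0)))

# Toy feature matrix, one row per variable and one column per observation
# (an assumed layout for this sketch)
X = [ 6.0  8.0 10.0 12.0 14.0;
     40.0 55.0 60.0 70.0 75.0]

function toy_partial_response(f, X, i, grid)
    baseline = vec(mean(X; dims = 2))   # every variable set to its mean
    return map(grid) do v
        x = copy(baseline)
        x[i] = v                        # only variable i moves along the grid
        f(x)
    end
end

grid = LinRange(5.0, 15.0, 100)
scores = toy_partial_response(toy_score, X, 1, grid)  # continuous score
presence = scores .>= 0.311                           # thresholded version
```

Plotting `scores` against `grid` gives a smooth curve, whereas `presence` only records on which side of the cutoff each grid point falls.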

Code for the figure
julia
f = Figure()
ax = Axis(f[1, 1]; xlabel = "BIO1", ylabel = "Partial response")
lines!(ax, prx, pry; color = :black)

We can also show the response surface using two variables:

julia
prx, pry, prz = partialresponse(sdm, variables(sdm)[1:2]..., (50, 50); threshold = false);

Note that the last element returned in this case is a two-dimensional array, as it makes sense to visualize the result as a heatmap. Although the idea of partial response curves generalizes to more than two dimensions, the package does not support it.

Code for the figure
julia
f = Figure()
ax = Axis(f[1, 1]; xlabel = "BIO$(variables(sdm)[1])", ylabel = "BIO$(variables(sdm)[2])")
cm = heatmap!(ax, prx, pry, prz; colormap = :Greys, colorrange = (0, 1))
Colorbar(f[1, 2], cm)

Inflated partial responses replace the average value of the other variables by values drawn from different quantiles of these variables:

Code for the figure
julia
f = Figure()
ax = Axis(f[1, 1])
prx, pry = partialresponse(sdm, 1; inflated = false, threshold = false)
for i in 1:200
    ix, iy = partialresponse(sdm, 1; inflated = true, threshold = false)
    lines!(ax, ix, iy; color = (:grey, 0.2))
end
lines!(ax, prx, pry; color = :black, linewidth = 4)
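
The idea behind the inflated curves can be sketched with the standard library alone. This is a minimal sketch, not SDeMo's code: it assumes that each of the other variables is pinned at a quantile chosen uniformly at random, and both `toy_score` and `toy_inflated_response` are hypothetical names introduced for the example.

```julia
using Random, Statistics

# Hypothetical toy classifier: a logistic score on two features (made-up coefficients)
toy_score(x) = 1 / (1 + exp(-(0.8 * x[1] - 0.05 * x[2] - 6.0)))

# Toy feature matrix, one row per variable and one column per observation
X = [ 6.0  8.0 10.0 12.0 14.0;
     40.0 55.0 60.0 70.0 75.0]

function toy_inflated_response(f, X, i, grid; rng = Random.default_rng())
    # Assumption: each variable is pinned at a randomly drawn quantile
    # instead of its mean (the entry for variable i is overwritten below)
    baseline = [quantile(X[j, :], rand(rng)) for j in axes(X, 1)]
    return map(grid) do v
        x = copy(baseline)
        x[i] = v                        # only variable i moves along the grid
        f(x)
    end
end

rng = MersenneTwister(1)
grid = LinRange(5.0, 15.0, 50)
# Each call draws a new baseline, hence a new curve
curves = [toy_inflated_response(toy_score, X, 1, grid; rng = rng) for _ in 1:200]
```

Overlaying the 200 curves gives a sense of how much the response to one variable depends on where the other variables sit in their range.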

We can measure Shapley values (using their MCMC version) with the explain method:

julia
[explain(sdm, v; observation = 3, threshold = false) for v in variables(sdm)]
2-element Vector{Float64}:
 -0.36071835679425546
  0.0003793429916604289

Each value measures how much the value of that variable, for this observation, moves the prediction away from the average prediction.
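
To make the sampling-based estimate more concrete, here is a minimal sketch of a common Monte Carlo approximation of Shapley values, written with the standard library only. Whether `explain` follows exactly this scheme is an assumption; `toy_model` and `toy_shapley` are hypothetical names. The toy model is additive, so the exact Shapley value is known and can be compared to the estimate.

```julia
using Random, Statistics

# Hypothetical additive model, chosen because its exact Shapley values are known
toy_model(x) = 2.0 * x[1] - 1.0 * x[2] + 0.5 * x[3]

function toy_shapley(f, X, x, j; samples = 2_000, rng = Random.default_rng())
    nvar, nobs = size(X)
    total = 0.0
    for _ in 1:samples
        z = X[:, rand(rng, 1:nobs)]     # random reference observation
        order = randperm(rng, nvar)     # random ordering of the variables
        pos = findfirst(==(j), order)
        with_j = copy(z)
        # Variables up to and including j in the ordering take their value from x
        with_j[order[1:pos]] .= x[order[1:pos]]
        without_j = copy(with_j)
        without_j[j] = z[j]             # same point, but j keeps the reference value
        total += f(with_j) - f(without_j)
    end
    return total / samples              # average marginal contribution of variable j
end

rng = MersenneTwister(42)
X = randn(rng, 3, 100)                  # variables as rows, observations as columns
x = X[:, 3]                             # the observation to explain
estimate = toy_shapley(toy_model, X, x, 1; rng = rng)
# For an additive model: coefficient times the deviation from the variable's mean
exact = 2.0 * (x[1] - mean(X[1, :]))
```

For this additive model, `estimate` should approach `exact` as `samples` grows; for models with interactions there is no such closed form, which is why the sampling approach is used.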

We can also produce a figure that looks like the partial response curve, by showing the effect of a variable on each training instance:

Code for the figure
julia
f = Figure()
ax = Axis(f[1, 1]; xlabel = "BIO1", ylabel = "Effect on the average prediction")
scatter!(ax, features(sdm, 1), explain(sdm, 1; threshold = false))