
Spatial cross-validation

The purpose of this vignette is to show how the tiles generated by the tessellation functions can be used to split a dataset into multiple spatial groups (folds), and how these folds can then be used to cross-validate a model.

Unstable part of the API

The functions on this page are a work in progress and should be treated as highly experimental. Their behavior is likely to change, and any changes will be documented here.

```julia
using SpeciesDistributionToolkit
const SDT = SpeciesDistributionToolkit
using PrettyTables
using CairoMakie
```

Assembling a dataset

We will work on the state of California:

```julia
pol = getpolygon(PolygonData(OpenStreetMap, Places); place = "California")
bb = SDT.boundingbox(pol)
```

```
(left = -124.48200225830078, right = -114.13078308105469, bottom = 32.52952194213867, top = 42.009498596191406)
```

To represent distances correctly, we will define an orthographic projection centered on the bounding box:

```julia
proj = "+proj=ortho +lon_0=$((bb.right + bb.left)/2) +lat_0=$((bb.top + bb.bottom)/2)"
```

```
"+proj=ortho +lon_0=-119.30639266967773 +lat_0=37.26951026916504"
```

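To see why an orthographic projection centered on the study area is a reasonable choice for distances at this scale, here is a minimal sketch in plain Julia (no packages) that implements the spherical orthographic projection and compares a projected distance with the great-circle distance between two Sasquatch sightings from the demo data. The spherical Earth, the `ortho` and `haversine` helpers, and the choice of points are illustrative assumptions; PROJ uses its own implementation.

```julia
# Spherical Earth approximation: mean radius in km (assumption for this sketch)
const R_EARTH = 6371.0

# Orthographic projection of (lon, lat), in degrees, onto the plane tangent to
# the sphere at (lon0, lat0); returns planar coordinates (x, y) in km
function ortho(lon, lat; lon0, lat0)
    λ, φ, λ0, φ0 = deg2rad.((lon, lat, lon0, lat0))
    x = R_EARTH * cos(φ) * sin(λ - λ0)
    y = R_EARTH * (cos(φ0) * sin(φ) - sin(φ0) * cos(φ) * cos(λ - λ0))
    return (x, y)
end

# Great-circle distance in km between two lon/lat points (haversine formula)
function haversine(lon1, lat1, lon2, lat2)
    φ1, φ2 = deg2rad(lat1), deg2rad(lat2)
    Δφ = φ2 - φ1
    Δλ = deg2rad(lon2 - lon1)
    a = sin(Δφ / 2)^2 + cos(φ1) * cos(φ2) * sin(Δλ / 2)^2
    return 2R_EARTH * asin(sqrt(a))
end

# Center of the California bounding box, as in the `proj` string above
lon0, lat0 = -119.30639266967773, 37.26951026916504

# Two sightings from the demo data, as (lon, lat)
p = (-119.6436, 37.35944)
q = (-119.9833, 38.93333)

xp, yp = ortho(p...; lon0 = lon0, lat0 = lat0)
xq, yq = ortho(q...; lon0 = lon0, lat0 = lat0)

planar = hypot(xq - xp, yq - yp)
spherical = haversine(p..., q...)
println("planar: ", round(planar; digits = 2), " km; great-circle: ", round(spherical; digits = 2), " km")
```

Close to the projection center the two distances agree to well within 1%, which is why the tiling is built in this projected space rather than in raw longitude/latitude.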
Finally, we will get the Sasquatch sightings from the demonstration data of the OccurrencesInterface package, keeping only those that fall within California:

```julia
records = Occurrences(mask(OccurrencesInterface.__demodata(), pol))
```
Occurrences(Occurrence[Occurrence("Sasquatch", true, (-119.6436, 37.35944), DateTime("1988-10-01T12:00:00")), Occurrence("Sasquatch", true, (-119.9833, 38.93333), DateTime("2000-06-15T12:00:00")), Occurrence("Sasquatch", true, (-120.0914, 38.9575), DateTime("2000-09-23T12:00:00")), Occurrence("Sasquatch", true, (-124.1625, 40.80222), DateTime("1957-06-01T12:00:00")), Occurrence("Sasquatch", true, (-119.836, 38.28085), DateTime("1997-11-15T12:00:00")), Occurrence("Sasquatch", true, (-121.9139, 40.33556), DateTime("1968-11-01T12:00:00")), Occurrence("Sasquatch", true, (-119.2119, 37.23389), DateTime("1997-12-18T12:00:00")), Occurrence("Sasquatch", true, (-119.9065, 38.17677), DateTime("1994-06-01T12:00:00")), Occurrence("Sasquatch", true, (-120.6781, 41.41), DateTime("1974-06-01T12:00:00")), Occurrence("Sasquatch", true, (-120.205, 38.93), DateTime("2000-07-01T12:00:00")), Occurrence("Sasquatch", true, (-119.9833, 38.93333), DateTime("1996-04-10T12:00:00")), Occurrence("Sasquatch", true, (-120.1358, 38.86), DateTime("1993-04-01T12:00:00")), Occurrence("Sasquatch", true, (-118.9975, 37.35861), DateTime("1995-10-29T12:00:00")), Occurrence("Sasquatch", true, (-120.1877, 38.06615), DateTime("1978-08-01T12:00:00")), Occurrence("Sasquatch", true, (-118.8489, 36.10222), DateTime("1955-08-01T12:00:00")), Occurrence("Sasquatch", true, (-119.9811, 38.19753), DateTime("1978-07-30T12:00:00")), Occurrence("Sasquatch", true, (-122.4306, 41.45833), DateTime("1999-03-04T12:00:00")), Occurrence("Sasquatch", true, (-122.3839, 41.42806), DateTime("1966-09-01T12:00:00")), Occurrence("Sasquatch", true, (-119.2642, 38.04806), DateTime("1977-07-01T12:00:00")), Occurrence("Sasquatch", true, (-122.7325, 40.63556), DateTime("1978-10-01T12:00:00")), Occurrence("Sasquatch", true, (-120.2339, 38.03898), DateTime("1983-10-25T12:00:00")), Occurrence("Sasquatch", true, (-118.1303, 36.05278), DateTime("1987-05-31T12:00:00")), Occurrence("Sasquatch", true, (-119.8633, 38.26661), 
DateTime("1993-06-15T12:00:00")), Occurrence("Sasquatch", true, (-122.9958, 41.82417), DateTime("1944-08-01T12:00:00")), Occurrence("Sasquatch", true, (-123.1317, 41.3657), DateTime("2000-07-20T12:00:00")), Occurrence("Sasquatch", true, (-122.5833, 38.25583), DateTime("1991-01-01T12:00:00")), Occurrence("Sasquatch", true, (-120.8908, 39.21278), DateTime("1978-06-01T12:00:00")), Occurrence("Sasquatch", true, (-119.2569, 37.91111), DateTime("1998-08-01T12:00:00")), Occurrence("Sasquatch", true, (-116.8625, 33.2675), DateTime("1993-07-04T12:00:00")), Occurrence("Sasquatch", true, (-122.2703, 41.21361), DateTime("2001-06-15T12:00:00")), Occurrence("Sasquatch", true, (-120.2653, 39.3225), DateTime("1956-10-01T12:00:00")), Occurrence("Sasquatch", true, (-121.6156, 40.92911), DateTime("1994-09-15T12:00:00")), Occurrence("Sasquatch", true, (-123.4817, 39.68833), DateTime("1987-06-01T12:00:00")), Occurrence("Sasquatch", true, (-120.0133, 38.85556), DateTime("2003-07-20T12:00:00")), Occurrence("Sasquatch", true, (-119.9833, 38.93333), DateTime("1983-01-01T12:00:00")), Occurrence("Sasquatch", true, (-123.8692, 40.26639), DateTime("1963-08-01T12:00:00")), Occurrence("Sasquatch", true, (-123.8692, 40.26639), DateTime("1965-07-01T12:00:00")), Occurrence("Sasquatch", true, (-121.495, 36.2485), DateTime("1993-04-01T12:00:00")), Occurrence("Sasquatch", true, (-122.3775, 40.8875), DateTime("1977-08-01T12:00:00")), Occurrence("Sasquatch", true, (-122.1781, 39.92778), DateTime("1973-10-15T12:00:00")), Occurrence("Sasquatch", true, (-121.0161, 39.26361), DateTime("2004-01-04T12:00:00")), Occurrence("Sasquatch", true, (-123.4817, 39.68833), DateTime("1974-09-01T12:00:00")), Occurrence("Sasquatch", true, (-123.7675, 39.22361), DateTime("1966-09-01T12:00:00")), Occurrence("Sasquatch", true, (-118.972, 37.5703), DateTime("1972-09-15T12:00:00")), Occurrence("Sasquatch", true, (-120.5836, 38.72139), DateTime("2003-09-01T12:00:00")), Occurrence("Sasquatch", true, (-121.58, 39.205), 
DateTime("1965-07-01T12:00:00")), Occurrence("Sasquatch", true, (-124.0, 40.465), DateTime("1992-07-12T12:00:00")), Occurrence("Sasquatch", true, (-119.2119, 37.23389), DateTime("2001-10-25T12:00:00")), Occurrence("Sasquatch", true, (-118.7402, 36.0438), DateTime("2000-05-01T12:00:00")), Occurrence("Sasquatch", true, (-120.3275, 39.31917), DateTime("2004-08-16T12:00:00")), Occurrence("Sasquatch", true, (-120.2434, 38.0312), DateTime("2003-12-24T12:00:00")), Occurrence("Sasquatch", true, (-121.7, 37.0), DateTime("2004-10-01T12:00:00")), Occurrence("Sasquatch", true, (-122.1667, 40.75), DateTime("1997-07-15T12:00:00")), Occurrence("Sasquatch", true, (-123.55, 40.85), DateTime("1988-10-15T12:00:00")), Occurrence("Sasquatch", true, (-120.6033, 38.90333), DateTime("2004-11-01T12:00:00")), Occurrence("Sasquatch", true, (-120.201, 36.7818), DateTime("1991-06-28T12:00:00")), Occurrence("Sasquatch", true, (-122.278, 40.7974), DateTime("2001-03-15T12:00:00")), Occurrence("Sasquatch", true, (-121.7146, 36.3779), DateTime("1986-06-15T12:00:00")), Occurrence("Sasquatch", true, (-123.35, 41.75), DateTime("1967-03-10T12:00:00")), Occurrence("Sasquatch", true, (-119.5483, 38.3068), DateTime("2005-08-21T12:00:00")), Occurrence("Sasquatch", true, (-121.0517, 39.25396), DateTime("1987-09-15T12:00:00")), Occurrence("Sasquatch", true, (-123.85, 41.65), DateTime("1979-10-25T12:00:00")), Occurrence("Sasquatch", true, (-123.2718, 40.7485), DateTime("2003-09-15T12:00:00")), Occurrence("Sasquatch", true, (-123.35, 41.75), DateTime("1993-12-15T12:00:00")), Occurrence("Sasquatch", true, (-121.4871, 37.39555), DateTime("1869-11-10T12:00:00")), Occurrence("Sasquatch", true, (-120.3784, 39.50435), DateTime("1993-09-05T12:00:00")), Occurrence("Sasquatch", true, (-122.8055, 40.6789), DateTime("1982-08-09T12:00:00")), Occurrence("Sasquatch", true, (-120.2201, 38.86165), DateTime("2005-08-20T12:00:00")), Occurrence("Sasquatch", true, (-118.9166, 37.58325), DateTime("2006-06-17T12:00:00")), 
Occurrence("Sasquatch", true, (-124.0687, 41.21385), DateTime("2006-10-21T12:00:00")), Occurrence("Sasquatch", true, (-122.6667, 40.7333), DateTime("2007-04-18T12:00:00")), Occurrence("Sasquatch", true, (-121.7138, 38.4906), DateTime("1980-10-11T12:00:00")), Occurrence("Sasquatch", true, (-123.1668, 41.8333), DateTime("2007-09-11T12:00:00")), Occurrence("Sasquatch", true, (-118.4333, 34.6167), DateTime("2007-10-10T12:00:00")), Occurrence("Sasquatch", true, (-123.056, 41.28996), DateTime("1998-08-03T12:00:00")), Occurrence("Sasquatch", true, (-123.2393, 39.83929), DateTime("2007-12-15T12:00:00")), Occurrence("Sasquatch", true, (-119.0, 37.9333), DateTime("2008-07-09T12:00:00")), Occurrence("Sasquatch", true, (-123.3899, 39.51489), DateTime("2008-07-08T12:00:00")), Occurrence("Sasquatch", true, (-119.7853, 38.33115), DateTime("1993-08-13T12:00:00")), Occurrence("Sasquatch", true, (-119.2916, 34.625), DateTime("2008-09-14T12:00:00")), Occurrence("Sasquatch", true, (-116.5333, 33.95002), DateTime("2008-12-01T12:00:00")), Occurrence("Sasquatch", true, (-118.375, 36.70835), DateTime("2008-10-27T12:00:00")), Occurrence("Sasquatch", true, (-121.07, 37.475), DateTime("1962-08-14T12:00:00")), Occurrence("Sasquatch", true, (-123.0585, 38.83996), DateTime("2007-09-08T12:00:00")), Occurrence("Sasquatch", true, (-118.125, 36.79165), DateTime("2009-01-12T12:00:00")), Occurrence("Sasquatch", true, (-119.9254, 34.73389), DateTime("1982-04-15T12:00:00")), Occurrence("Sasquatch", true, (-123.1749, 41.98645), DateTime("2009-06-03T12:00:00")), Occurrence("Sasquatch", true, (-120.2083, 38.79998), DateTime("2009-07-15T12:00:00")), Occurrence("Sasquatch", true, (-121.18, 39.705), DateTime("1993-09-30T12:00:00")), Occurrence("Sasquatch", true, (-118.5834, 34.76666), DateTime("2010-01-22T12:00:00")), Occurrence("Sasquatch", true, (-123.4498, 41.65925), DateTime("2010-04-22T12:00:00")), Occurrence("Sasquatch", true, (-120.55, 35.665), DateTime("1974-08-01T12:00:00")), Occurrence("Sasquatch", 
true, (-121.0, 38.25), DateTime("2010-10-25T12:00:00")), Occurrence("Sasquatch", true, (-122.575, 41.845), DateTime("2011-12-03T12:00:00")), Occurrence("Sasquatch", true, (-119.6234, 38.51665), DateTime("2012-09-23T12:00:00")), Occurrence("Sasquatch", true, (-117.3395, 34.28665), DateTime("2014-08-20T12:00:00")), Occurrence("Sasquatch", true, (-118.9604, 36.77763), DateTime("2015-05-16T12:00:00")), Occurrence("Sasquatch", true, (-121.7587, 37.62124), DateTime("1990-05-15T12:00:00")), Occurrence("Sasquatch", true, (-123.7879, 41.89288), DateTime("2019-03-15T12:00:00")), Occurrence("Sasquatch", true, (-121.8214, 40.64289), DateTime("2018-09-04T12:00:00")), Occurrence("Sasquatch", true, (-120.9851, 40.03728), DateTime("1992-08-15T12:00:00")), Occurrence("Sasquatch", true, (-120.0124, 38.21877), DateTime("2008-06-15T12:00:00")), Occurrence("Sasquatch", true, (-118.6271, 37.22716), DateTime("1991-07-15T12:00:00")), Occurrence("Sasquatch", true, (-124.2, 40.1), DateTime("1980-02-15T12:00:00"))])

We will also grab the 12 EarthEnv land cover variables over this area to train the model on:

```julia
L = SDMLayer{Float32}[
    SDMLayer(RasterData(EarthEnv, LandCover); bb..., layer = i) for i in 1:12
]
mask!(L, pol)
```

```
12-element Vector{SDMLayer{Float32}}:
 🗺️  A 1139 × 1243 layer (620488 Float32 cells)
 🗺️  A 1139 × 1243 layer (620488 Float32 cells)
 🗺️  A 1139 × 1243 layer (620488 Float32 cells)
 🗺️  A 1139 × 1243 layer (620488 Float32 cells)
 🗺️  A 1139 × 1243 layer (620488 Float32 cells)
 🗺️  A 1139 × 1243 layer (620488 Float32 cells)
 🗺️  A 1139 × 1243 layer (620488 Float32 cells)
 🗺️  A 1139 × 1243 layer (620488 Float32 cells)
 🗺️  A 1139 × 1243 layer (620488 Float32 cells)
 🗺️  A 1139 × 1243 layer (620488 Float32 cells)
 🗺️  A 1139 × 1243 layer (620488 Float32 cells)
 🗺️  A 1139 × 1243 layer (620488 Float32 cells)
```

Creating the tiles

At this point, we can follow the steps from the vignette on tessellation and generate a hexagonal tiling under the projection we specified, with an equivalent radius of 30 km.

```julia
T = tessellate(pol, 30.0; tile = :hexagons, pointy = true, proj = proj, densify = 5)
```

```
FeatureCollection with 193 features, each with 1 properties
```
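To make the `pointy = true` option and the size of each tile concrete, here is a minimal geometric sketch in plain Julia. The `hexagon` and `shoelace` helpers are hypothetical, and we assume that the radius is the center-to-vertex distance; if `tessellate` instead defines it as the radius of a circle of equal area, each tile would cover π·30² km² instead.

```julia
# Hypothetical helper: vertices of a regular hexagon centered on (cx, cy),
# with circumradius r (center-to-vertex distance, an assumption here)
function hexagon(cx, cy, r; pointy = true)
    # pointy-top hexagons have a vertex at 90°, flat-top ones a vertex at 0°
    offset = pointy ? 30.0 : 0.0
    return [(cx + r * cosd(offset + 60k), cy + r * sind(offset + 60k)) for k in 0:5]
end

# Area of a simple polygon given its ordered vertices (shoelace formula)
function shoelace(vertices)
    n = length(vertices)
    s = 0.0
    for i in 1:n
        x1, y1 = vertices[i]
        x2, y2 = vertices[mod1(i + 1, n)]
        s += x1 * y2 - x2 * y1
    end
    return abs(s) / 2
end

r = 30.0 # km, in projected coordinates
h = hexagon(0.0, 0.0, r; pointy = true)
area = shoelace(h)
println("area of one tile: ", round(area; digits = 1), " km²")

# The closed-form area of a regular hexagon is (3√3 / 2) r²
println("closed form:      ", round(3sqrt(3) / 2 * r^2; digits = 1), " km²")
```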

Code for the figure

```julia
f = Figure()
ax = Axis(f[1, 1]; aspect = DataAspect())
lines!(ax, pol; color = :grey30)
lines!(ax, T; color = :teal)
hidespines!(ax)
hidedecorations!(ax)
```

We need to decide on a number of folds, i.e. how many groups the data will be split into. We will use five here:

```julia
n = 5
```

```
5
```
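As a reminder of how folds are used, here is a minimal k-fold sketch in base Julia: each of the `n` folds is held out once as the testing set while the model is trained on the remaining folds. The random fold labels are placeholders; with spatial folds, each observation would instead inherit the fold of the tile it falls in.

```julia
using Random

# Placeholder fold labels: one integer in 1:n per observation
n = 5
Random.seed!(42)
folds = rand(1:n, 100)

# For each fold k: (indices used for training, indices held out for testing)
splits = [(findall(!=(k), folds), findall(==(k), folds)) for k in 1:n]

for (k, (train, test)) in enumerate(splits)
    println("fold $k: ", length(train), " training / ", length(test), " testing")
end
```

Because every observation carries exactly one fold label, every observation is used for testing exactly once across the `n` splits.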

To facilitate the visualization, we will generate a categorical color palette with one color per fold:

```julia
folds_colors = cgrad(Makie.wong_colors()[1:n], n; categorical = true);
```

Assigning the tiles to folds

We can start by splitting the landscape into horizontal bands:

```julia
SDT.assignfolds!(
    T;
    n = n,
    order = :horizontal,
)
```

```
FeatureCollection with 193 features, each with 3 properties
```

Code for the figure

```julia
f = Figure()
ax = Axis(f[1, 1]; aspect = DataAspect())
for i in 1:n
    poly!(
        ax,
        T["__fold" => i];
        alpha = 0.2,
        color = i,
        colorrange = (1, n),
        colormap = folds_colors,
    )
    lines!(
        ax,
        T["__fold" => i];
        color = i,
        colorrange = (1, n),
        colormap = folds_colors,
    )
end
hidespines!(ax)
hidedecorations!(ax)
```

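Wherever possible, every fold contains the same number of tiles; if the tiling was generated under an equal-area projection, the folds therefore cover the same surface. The orthographic projection used here is not strictly equal-area, but its area distortion (a factor equal to the cosine of the angular distance from the projection center) stays below 1% across California.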
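The idea of contiguous bands with (nearly) equal numbers of tiles can be sketched in a few lines of base Julia. This is a hypothetical illustration, not the `assignfolds!` implementation: tiles are sorted by the coordinate of their centroid along one axis, and consecutive runs of the sorted tiles are given the same fold. The evenly spaced latitudes stand in for real tile centroids.

```julia
# Hypothetical sketch of contiguous ("grouped") fold assignment
function banded_folds(coords::AbstractVector{<:Real}, n::Integer)
    folds = zeros(Int, length(coords))
    for (rank, tile) in enumerate(sortperm(coords))
        # ranks 1..N are mapped onto folds 1..n in contiguous blocks
        folds[tile] = cld(rank * n, length(coords))
    end
    return folds
end

# 193 tiles, as in the tessellation above; placeholder centroid latitudes
centroid_lat = collect(range(32.5, 42.0; length = 193))
folds = banded_folds(centroid_lat, 5)

counts = [count(==(k), folds) for k in 1:5]
println(counts)
```

With 193 tiles and five folds, the fold sizes can differ by at most one tile, which is what "wherever possible" refers to.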

We can also split the landscape into vertical bands:

```julia
SDT.assignfolds!(
    T;
    n = n,
    order = :vertical,
)
```

```
FeatureCollection with 193 features, each with 3 properties
```

Code for the figure

```julia
f = Figure()
ax = Axis(f[1, 1]; aspect = DataAspect())
for i in 1:n
    poly!(
        ax,
        T["__fold" => i];
        alpha = 0.2,
        color = i,
        colorrange = (1, n),
        colormap = folds_colors,
    )
    lines!(
        ax,
        T["__fold" => i];
        color = i,
        colorrange = (1, n),
        colormap = folds_colors,
    )
end
hidespines!(ax)
hidedecorations!(ax)
```

Grouped and alternating splits

By default, folds are contiguous in space. Setting `group = false` changes this behavior: tiles are assigned to folds sequentially (cycling through 1 to n, then starting over), which distributes the points of each fold more evenly in space:

```julia
SDT.assignfolds!(
    T;
    n = n,
    group = false,
    order = :horizontal,
)
```

```
FeatureCollection with 193 features, each with 3 properties
```

Code for the figure

```julia
f = Figure()
ax = Axis(f[1, 1]; aspect = DataAspect())
for i in 1:n
    poly!(
        ax,
        T["__fold" => i];
        alpha = 0.2,
        color = i,
        colorrange = (1, n),
        colormap = folds_colors,
    )
    lines!(
        ax,
        T["__fold" => i];
        color = i,
        colorrange = (1, n),
        colormap = folds_colors,
    )
end
hidespines!(ax)
hidedecorations!(ax)
```
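The difference between grouped and alternating folds can be illustrated with a hypothetical base-Julia sketch (again, not the `assignfolds!` implementation): tiles are still sorted along one axis, but fold labels cycle through 1, 2, ..., n, 1, 2, ... instead of forming contiguous runs. The evenly spaced latitudes are placeholders for real tile centroids.

```julia
# Hypothetical sketch of non-grouped fold assignment (the group = false idea)
function cycled_folds(coords::AbstractVector{<:Real}, n::Integer)
    folds = zeros(Int, length(coords))
    for (rank, tile) in enumerate(sortperm(coords))
        folds[tile] = mod1(rank, n)  # 1, 2, ..., n, 1, 2, ...
    end
    return folds
end

centroid_lat = collect(range(32.5, 42.0; length = 193))  # placeholder latitudes
folds = cycled_folds(centroid_lat, 5)
counts = [count(==(k), folds) for k in 1:5]

# Each fold now spans almost the full latitudinal range of the study area
for k in 1:5
    lats = centroid_lat[folds .== k]
    println("fold $k: ", length(lats), " tiles, from ", round(minimum(lats); digits = 2), " to ", round(maximum(lats); digits = 2))
end
```

The trade-off is that alternating folds place testing tiles right next to training tiles, so they reduce, but do not remove, spatial autocorrelation between the two sets.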

Using spatial splits for cross-validation

We will now generate a layer of pseudo-absences, then extract the presence and pseudo-absence layers as a series of occurrences, as in a previous vignette.
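The `BetweenRadius` rule used in the code below keeps pseudo-absences away from presences without placing them arbitrarily far away. The following base-Julia sketch illustrates the idea under stated assumptions: `closer` and `further` are taken to be distances in km to the nearest presence, the candidate cells are a coarse placeholder grid rather than raster cells, and the `between_radius` helper is hypothetical. As with `backgroundpoints(P₋, 2sum(L₊))`, it draws twice as many pseudo-absences as there are presences.

```julia
using Random

# Great-circle distance in km between two (lon, lat) points given in degrees,
# using the haversine formula on a sphere of radius R
function haversine((lon1, lat1), (lon2, lat2); R = 6371.0)
    φ1, φ2 = deg2rad(lat1), deg2rad(lat2)
    Δφ = φ2 - φ1
    Δλ = deg2rad(lon2 - lon1)
    a = sin(Δφ / 2)^2 + cos(φ1) * cos(φ2) * sin(Δλ / 2)^2
    return 2R * asin(sqrt(a))
end

# Hypothetical "between radius" rule: keep candidates whose distance to the
# nearest presence is between `closer` and `further` (assumed to be km)
function between_radius(candidates, presences; closer = 50.0, further = 140.0)
    return filter(candidates) do c
        d = minimum(haversine(c, p) for p in presences)
        closer <= d <= further
    end
end

# Three sightings from the demo data, as (lon, lat)
presences = [(-119.6436, 37.35944), (-119.9833, 38.93333), (-124.1625, 40.80222)]

# Placeholder candidate cells: a coarse 0.5° grid over the bounding box
candidates = [(lon, lat) for lon in -124.5:0.5:-114.0 for lat in 32.5:0.5:42.0]
eligible = between_radius(candidates, presences)

# Draw twice as many pseudo-absences as presences, without replacement
rng = Random.MersenneTwister(1)
pseudoabsences = shuffle(rng, eligible)[1:2length(presences)]
println(length(eligible), " eligible cells, ", length(pseudoabsences), " pseudo-absences drawn")
```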

```julia
L₊ = mask(L[1], records)
P₋ = pseudoabsencemask(BetweenRadius, L₊; closer = 50.0, further = 140.0)
L₋ = backgroundpoints(P₋, 2sum(L₊))
O = Occurrences(L₊, L₋)
```
Occurrences(Occurrence[Occurrence("", true, (-124.20416666666668, 40.095833333333346), missing), Occurrence("", true, (-124.16250000000002, 40.804166666666674), missing), Occurrence("", true, (-124.07083333333335, 41.212500000000006), missing), Occurrence("", true, (-123.99583333333335, 40.462500000000006), missing), Occurrence("", true, (-123.87083333333335, 40.26250000000001), missing), Occurrence("", true, (-123.84583333333335, 41.64583333333334), missing), Occurrence("", true, (-123.78750000000002, 41.89583333333334), missing), Occurrence("", true, (-123.77083333333336, 39.22083333333334), missing), Occurrence("", true, (-123.54583333333335, 40.845833333333346), missing), Occurrence("", true, (-123.47916666666669, 39.68750000000001), missing), Occurrence("", true, (-123.44583333333335, 41.662500000000016), missing), Occurrence("", true, (-123.38750000000002, 39.51250000000001), missing), Occurrence("", true, (-123.34583333333336, 41.745833333333344), missing), Occurrence("", true, (-123.27083333333336, 40.745833333333344), missing), Occurrence("", true, (-123.23750000000003, 39.837500000000006), missing), Occurrence("", false, (-123.22916666666669, 40.28750000000001), missing), Occurrence("", true, (-123.17083333333335, 41.82916666666667), missing), Occurrence("", true, (-123.17083333333335, 41.98750000000001), missing), Occurrence("", false, (-123.12916666666669, 37.79583333333334), missing), Occurrence("", true, (-123.12916666666669, 41.362500000000004), missing), Occurrence("", true, (-123.06250000000003, 38.837500000000006), missing), Occurrence("", true, (-123.05416666666669, 41.28750000000001), missing), Occurrence("", true, (-122.99583333333335, 41.82083333333334), missing), Occurrence("", true, (-122.80416666666667, 40.67916666666668), missing), Occurrence("", true, (-122.72916666666669, 40.63750000000001), missing), Occurrence("", false, (-122.7041666666667, 39.304166666666674), missing), Occurrence("", true, (-122.67083333333336, 40.72916666666668), 
missing), Occurrence("", true, (-122.57916666666668, 38.25416666666668), missing), Occurrence("", true, (-122.57083333333335, 41.845833333333346), missing), Occurrence("", false, (-122.56250000000003, 37.52083333333334), missing), Occurrence("", false, (-122.52083333333337, 37.495833333333344), missing), Occurrence("", false, (-122.50416666666669, 39.03750000000001), missing), Occurrence("", false, (-122.49583333333337, 39.087500000000006), missing), Occurrence("", false, (-122.48750000000001, 38.78750000000001), missing), Occurrence("", true, (-122.42916666666669, 41.45416666666667), missing), Occurrence("", false, (-122.42083333333335, 37.28750000000001), missing), Occurrence("", true, (-122.38750000000002, 41.429166666666674), missing), Occurrence("", true, (-122.37916666666668, 40.88750000000001), missing), Occurrence("", false, (-122.36250000000001, 39.345833333333346), missing), Occurrence("", false, (-122.35416666666669, 37.44583333333334), missing), Occurrence("", false, (-122.35416666666669, 38.76250000000001), missing), Occurrence("", true, (-122.27916666666668, 40.79583333333334), missing), Occurrence("", true, (-122.27083333333336, 41.212500000000006), missing), Occurrence("", false, (-122.2041666666667, 39.429166666666674), missing), Occurrence("", true, (-122.17916666666669, 39.929166666666674), missing), Occurrence("", true, (-122.17083333333335, 40.745833333333344), missing), Occurrence("", false, (-122.13750000000002, 39.37916666666668), missing), Occurrence("", true, (-121.91250000000002, 40.33750000000001), missing), Occurrence("", false, (-121.82916666666668, 41.98750000000001), missing), Occurrence("", true, (-121.82083333333335, 40.64583333333334), missing), Occurrence("", true, (-121.76250000000002, 37.620833333333344), missing), Occurrence("", true, (-121.7125, 36.37916666666668), missing), Occurrence("", true, (-121.7125, 38.48750000000001), missing), Occurrence("", false, (-121.7125, 41.63750000000001), missing), Occurrence("", true, 
(-121.69583333333335, 36.995833333333344), missing), Occurrence("", true, (-121.61250000000003, 40.92916666666668), missing), Occurrence("", true, (-121.57916666666668, 39.20416666666668), missing), Occurrence("", true, (-121.49583333333337, 36.245833333333344), missing), Occurrence("", true, (-121.48750000000004, 37.39583333333334), missing), Occurrence("", false, (-121.37916666666669, 41.34583333333334), missing), Occurrence("", false, (-121.37916666666669, 41.98750000000001), missing), Occurrence("", false, (-121.37083333333335, 41.87916666666668), missing), Occurrence("", false, (-121.30416666666667, 41.529166666666676), missing), Occurrence("", false, (-121.29583333333335, 41.73750000000001), missing), Occurrence("", true, (-121.17916666666667, 39.70416666666668), missing), Occurrence("", true, (-121.07083333333335, 37.47083333333334), missing), Occurrence("", true, (-121.05416666666669, 39.25416666666668), missing), Occurrence("", false, (-121.0291666666667, 36.69583333333334), missing), Occurrence("", true, (-121.01250000000002, 39.26250000000001), missing), Occurrence("", true, (-120.99583333333337, 38.245833333333344), missing), Occurrence("", true, (-120.98750000000003, 40.03750000000001), missing), Occurrence("", false, (-120.9541666666667, 40.88750000000001), missing), Occurrence("", false, (-120.93750000000003, 40.60416666666668), missing), Occurrence("", true, (-120.88750000000002, 39.21250000000001), missing), Occurrence("", false, (-120.88750000000002, 41.85416666666668), missing), Occurrence("", false, (-120.85416666666669, 36.82083333333334), missing), Occurrence("", false, (-120.8375, 37.00416666666668), missing), Occurrence("", false, (-120.79583333333335, 41.97916666666668), missing), Occurrence("", false, (-120.77916666666667, 40.62916666666668), missing), Occurrence("", false, (-120.76250000000002, 36.970833333333346), missing), Occurrence("", false, (-120.68750000000003, 34.89583333333334), missing), Occurrence("", true, 
(-120.67916666666667, 41.412500000000016), missing), Occurrence("", false, (-120.63750000000002, 41.88750000000001), missing), Occurrence("", true, (-120.60416666666669, 38.904166666666676), missing), Occurrence("", false, (-120.58750000000002, 36.44583333333334), missing), Occurrence("", true, (-120.58750000000002, 38.720833333333346), missing), Occurrence("", false, (-120.5791666666667, 34.77083333333334), missing), Occurrence("", false, (-120.5791666666667, 36.39583333333334), missing), Occurrence("", false, (-120.5791666666667, 40.73750000000001), missing), Occurrence("", false, (-120.55416666666669, 34.595833333333346), missing), Occurrence("", true, (-120.54583333333335, 35.66250000000001), missing), Occurrence("", false, (-120.43750000000003, 37.404166666666676), missing), Occurrence("", false, (-120.4291666666667, 35.16250000000001), missing), Occurrence("", true, (-120.37916666666669, 39.50416666666668), missing), Occurrence("", false, (-120.34583333333336, 35.087500000000006), missing), Occurrence("", true, (-120.32916666666668, 39.32083333333334), missing), Occurrence("", false, (-120.29583333333335, 36.25416666666668), missing), Occurrence("", false, (-120.2791666666667, 37.47916666666668), missing), Occurrence("", true, (-120.26250000000002, 39.32083333333334), missing), Occurrence("", true, (-120.24583333333334, 38.029166666666676), missing), Occurrence("", true, (-120.23750000000001, 38.03750000000001), missing), Occurrence("", false, (-120.22916666666669, 36.06250000000001), missing), Occurrence("", true, (-120.22083333333336, 38.86250000000001), missing), Occurrence("", false, (-120.22083333333336, 41.75416666666668), missing), Occurrence("", true, (-120.20416666666668, 36.779166666666676), missing), Occurrence("", true, (-120.20416666666668, 38.79583333333335), missing), Occurrence("", true, (-120.20416666666668, 38.929166666666674), missing), Occurrence("", true, (-120.18750000000001, 38.062500000000014), missing), Occurrence("", false, 
(-120.18750000000001, 41.054166666666674), missing), Occurrence("", true, (-120.13750000000003, 38.86250000000001), missing), Occurrence("", false, (-120.13750000000003, 40.63750000000001), missing), Occurrence("", false, (-120.12916666666669, 40.54583333333334), missing), Occurrence("", false, (-120.12083333333335, 40.154166666666676), missing), Occurrence("", false, (-120.11250000000003, 36.029166666666676), missing), Occurrence("", false, (-120.09583333333336, 40.00416666666668), missing), Occurrence("", true, (-120.08750000000002, 38.95416666666668), missing), Occurrence("", false, (-120.05416666666667, 40.18750000000001), missing), Occurrence("", false, (-120.04583333333335, 34.062500000000014), missing), Occurrence("", false, (-120.03750000000002, 41.85416666666668), missing), Occurrence("", true, (-120.01250000000002, 38.220833333333346), missing), Occurrence("", true, (-120.01250000000002, 38.85416666666668), missing), Occurrence("", true, (-119.97916666666669, 38.19583333333334), missing), Occurrence("", true, (-119.97916666666669, 38.929166666666674), missing), Occurrence("", false, (-119.9541666666667, 36.179166666666674), missing), Occurrence("", false, (-119.94583333333337, 35.90416666666668), missing), Occurrence("", true, (-119.92916666666669, 34.73750000000001), missing), Occurrence("", true, (-119.9041666666667, 38.179166666666674), missing), Occurrence("", false, (-119.87916666666669, 35.31250000000001), missing), Occurrence("", true, (-119.86250000000003, 38.26250000000001), missing), Occurrence("", true, (-119.83750000000003, 38.279166666666676), missing), Occurrence("", false, (-119.82916666666668, 34.07083333333334), missing), Occurrence("", false, (-119.80416666666667, 36.36250000000001), missing), Occurrence("", true, (-119.78750000000002, 38.32916666666667), missing), Occurrence("", false, (-119.72083333333335, 35.42083333333335), missing), Occurrence("", false, (-119.72083333333335, 35.94583333333335), missing), Occurrence("", false, 
(-119.71250000000002, 36.054166666666674), missing), Occurrence("", false, (-119.69583333333334, 35.89583333333334), missing), Occurrence("", false, (-119.65416666666667, 36.18750000000001), missing), Occurrence("", true, (-119.64583333333334, 37.36250000000001), missing), Occurrence("", false, (-119.63750000000002, 35.91250000000001), missing), Occurrence("", false, (-119.62083333333337, 36.61250000000001), missing), Occurrence("", true, (-119.62083333333337, 38.51250000000001), missing), Occurrence("", false, (-119.57083333333335, 36.345833333333346), missing), Occurrence("", false, (-119.55416666666669, 36.14583333333334), missing), Occurrence("", true, (-119.54583333333336, 38.304166666666674), missing), Occurrence("", false, (-119.48750000000003, 35.37916666666668), missing), Occurrence("", false, (-119.43750000000003, 35.47916666666668), missing), Occurrence("", false, (-119.42083333333336, 35.24583333333334), missing), Occurrence("", false, (-119.32083333333335, 35.64583333333334), missing), Occurrence("", true, (-119.28750000000002, 34.620833333333344), missing), Occurrence("", true, (-119.26250000000002, 38.04583333333334), missing), Occurrence("", true, (-119.25416666666669, 37.91250000000001), missing), Occurrence("", false, (-119.22916666666669, 35.20416666666668), missing), Occurrence("", true, (-119.21250000000003, 37.23750000000001), missing), Occurrence("", false, (-119.13750000000002, 35.462500000000006), missing), Occurrence("", false, (-119.08750000000003, 34.12916666666668), missing), Occurrence("", false, (-119.07083333333335, 33.48750000000001), missing), Occurrence("", false, (-119.06250000000003, 35.64583333333334), missing), Occurrence("", false, (-118.99583333333335, 35.37916666666668), missing), Occurrence("", true, (-118.99583333333335, 37.36250000000001), missing), Occurrence("", true, (-118.99583333333335, 37.929166666666674), missing), Occurrence("", false, (-118.98750000000001, 33.51250000000001), missing), Occurrence("", false, 
(-118.97083333333336, 35.35416666666667), missing), Occurrence("", true, (-118.97083333333336, 37.57083333333334), missing), Occurrence("", true, (-118.96250000000002, 36.779166666666676), missing), Occurrence("", true, (-118.91250000000002, 37.57916666666667), missing), Occurrence("", true, (-118.84583333333336, 36.10416666666667), missing), Occurrence("", false, (-118.82916666666668, 34.029166666666676), missing), Occurrence("", false, (-118.82083333333335, 35.54583333333334), missing), Occurrence("", false, (-118.76250000000002, 33.97083333333334), missing), Occurrence("", true, (-118.73750000000003, 36.04583333333334), missing), Occurrence("", false, (-118.66250000000002, 35.26250000000001), missing), Occurrence("", false, (-118.62916666666669, 35.33750000000001), missing), Occurrence("", true, (-118.62916666666669, 37.22916666666667), missing), Occurrence("", true, (-118.58750000000002, 34.76250000000001), missing), Occurrence("", false, (-118.5291666666667, 34.03750000000001), missing), Occurrence("", true, (-118.4291666666667, 34.620833333333344), missing), Occurrence("", true, (-118.37083333333335, 36.712500000000006), missing), Occurrence("", false, (-118.3041666666667, 35.529166666666676), missing), Occurrence("", false, (-118.1791666666667, 35.54583333333334), missing), Occurrence("", true, (-118.12916666666669, 36.054166666666674), missing), Occurrence("", true, (-118.12083333333335, 36.78750000000001), missing), Occurrence("", false, (-118.00416666666669, 33.72916666666668), missing), Occurrence("", false, (-117.98750000000001, 34.25416666666668), missing), Occurrence("", false, (-117.98750000000001, 35.57916666666668), missing), Occurrence("", false, (-117.93750000000003, 33.679166666666674), missing), Occurrence("", false, (-117.92916666666669, 33.97916666666668), missing), Occurrence("", false, (-117.91250000000002, 35.03750000000001), missing), Occurrence("", false, (-117.91250000000002, 37.27083333333334), missing), Occurrence("", false, 
(-117.85416666666669, 35.495833333333344), missing), Occurrence("", false, (-117.83750000000003, 35.054166666666674), missing), Occurrence("", false, (-117.79583333333335, 35.17083333333334), missing), Occurrence("", false, (-117.78750000000002, 33.94583333333334), missing), Occurrence("", false, (-117.76250000000002, 34.91250000000001), missing), Occurrence("", false, (-117.7041666666667, 35.61250000000001), missing), Occurrence("", false, (-117.67916666666669, 34.64583333333334), missing), Occurrence("", false, (-117.67916666666669, 34.712500000000006), missing), Occurrence("", false, (-117.65416666666668, 37.26250000000002), missing), Occurrence("", false, (-117.62083333333337, 37.27083333333334), missing), Occurrence("", false, (-117.59583333333336, 33.73750000000001), missing), Occurrence("", false, (-117.58750000000003, 35.81250000000001), missing), Occurrence("", false, (-117.57916666666668, 34.91250000000001), missing), Occurrence("", false, (-117.57916666666668, 35.01250000000001), missing), Occurrence("", false, (-117.57916666666668, 35.23750000000001), missing), Occurrence("", false, (-117.54583333333335, 36.337500000000006), missing), Occurrence("", false, (-117.53750000000002, 35.52083333333334), missing), Occurrence("", false, (-117.52916666666668, 33.63750000000001), missing), Occurrence("", false, (-117.52916666666668, 35.06250000000001), missing), Occurrence("", false, (-117.52083333333337, 36.554166666666674), missing), Occurrence("", false, (-117.47916666666669, 33.22916666666668), missing), Occurrence("", false, (-117.47916666666669, 33.370833333333344), missing), Occurrence("", false, (-117.41250000000002, 36.92083333333335), missing), Occurrence("", false, (-117.39583333333337, 36.245833333333344), missing), Occurrence("", false, (-117.38750000000003, 36.00416666666668), missing), Occurrence("", false, (-117.35416666666669, 36.462500000000006), missing), Occurrence("", true, (-117.33750000000002, 34.28750000000001), missing), Occurrence("", 
false, (-117.31250000000003, 32.82916666666667), missing), Occurrence("", false, (-117.30416666666667, 35.179166666666674), missing), Occurrence("", false, (-117.28750000000002, 35.47916666666668), missing), Occurrence("", false, (-117.24583333333337, 33.779166666666676), missing), Occurrence("", false, (-117.21250000000002, 34.76250000000001), missing), Occurrence("", false, (-117.1541666666667, 35.27083333333334), missing), Occurrence("", false, (-117.14583333333336, 32.870833333333344), missing), Occurrence("", false, (-117.12083333333337, 32.779166666666676), missing), Occurrence("", false, (-117.07083333333335, 33.845833333333346), missing), Occurrence("", false, (-117.06250000000003, 32.76250000000001), missing), Occurrence("", true, (-116.86250000000001, 33.27083333333334), missing), Occurrence("", false, (-116.86250000000001, 35.79583333333335), missing), Occurrence("", false, (-116.85416666666669, 36.50416666666668), missing), Occurrence("", false, (-116.8291666666667, 35.54583333333334), missing), Occurrence("", false, (-116.82083333333335, 35.1375), missing), Occurrence("", false, (-116.77083333333336, 36.19583333333334), missing), Occurrence("", false, (-116.77083333333336, 36.437500000000014), missing), Occurrence("", false, (-116.75416666666669, 35.28750000000001), missing), Occurrence("", false, (-116.72083333333335, 35.087500000000006), missing), Occurrence("", false, (-116.71250000000003, 36.06250000000001), missing), Occurrence("", false, (-116.7041666666667, 36.029166666666676), missing), Occurrence("", false, (-116.7041666666667, 36.220833333333346), missing), Occurrence("", false, (-116.7041666666667, 36.27083333333334), missing), Occurrence("", false, (-116.61250000000001, 34.81250000000001), missing), Occurrence("", false, (-116.56250000000003, 34.712500000000006), missing), Occurrence("", false, (-116.55416666666669, 34.82916666666668), missing), Occurrence("", false, (-116.53750000000002, 32.654166666666676), missing), Occurrence("", true, 
(-116.52916666666668, 33.95416666666668), missing), Occurrence("", false, (-116.50416666666669, 34.72083333333334), missing), Occurrence("", false, (-116.49583333333335, 35.11250000000001), missing), Occurrence("", false, (-116.4791666666667, 35.03750000000001), missing), Occurrence("", false, (-116.47083333333336, 34.82083333333334), missing), Occurrence("", false, (-116.37083333333335, 35.16250000000001), missing), Occurrence("", false, (-116.36250000000003, 32.80416666666668), missing), Occurrence("", false, (-116.32083333333335, 32.85416666666667), missing), Occurrence("", false, (-116.31250000000003, 33.14583333333334), missing), Occurrence("", false, (-116.29583333333335, 33.370833333333344), missing), Occurrence("", false, (-116.28750000000002, 33.45416666666667), missing), Occurrence("", false, (-116.28750000000002, 34.554166666666674), missing), Occurrence("", false, (-116.27916666666668, 32.66250000000001), missing), Occurrence("", false, (-116.27916666666668, 34.929166666666674), missing), Occurrence("", false, (-116.27083333333334, 34.53750000000001), missing), Occurrence("", false, (-116.27083333333334, 34.66250000000001), missing), Occurrence("", false, (-116.23750000000003, 35.054166666666674), missing), Occurrence("", false, (-116.22083333333336, 32.63750000000001), missing), Occurrence("", false, (-116.2041666666667, 32.654166666666676), missing), Occurrence("", false, (-116.14583333333334, 33.57916666666667), missing), Occurrence("", false, (-116.12916666666669, 33.60416666666667), missing), Occurrence("", false, (-116.09583333333336, 34.62916666666668), missing), Occurrence("", false, (-116.0791666666667, 33.38750000000001), missing), Occurrence("", false, (-116.04583333333335, 34.48750000000001), missing), Occurrence("", false, (-115.99583333333337, 33.20416666666668), missing), Occurrence("", false, (-115.99583333333337, 33.76250000000001), missing), Occurrence("", false, (-115.98750000000003, 33.00416666666668), missing), Occurrence("", false, 
(-115.98750000000003, 34.20416666666667), missing), Occurrence("", false, (-115.97916666666669, 34.37916666666668), missing), Occurrence("", false, (-115.94583333333335, 33.95416666666668), missing), Occurrence("", false, (-115.94583333333335, 34.21250000000001), missing), Occurrence("", false, (-115.93750000000003, 33.47916666666667), missing), Occurrence("", false, (-115.85416666666669, 33.41250000000001), missing), Occurrence("", false, (-115.82083333333335, 34.18750000000001), missing), Occurrence("", false, (-115.81250000000001, 32.995833333333344), missing), Occurrence("", false, (-115.76250000000003, 33.06250000000001), missing), Occurrence("", false, (-115.70416666666668, 34.370833333333344), missing), Occurrence("", false, (-115.68750000000001, 33.35416666666668), missing), Occurrence("", false, (-115.66250000000002, 34.07916666666667), missing), Occurrence("", false, (-115.62916666666669, 32.73750000000001), missing), Occurrence("", false, (-115.62916666666669, 32.78750000000001), missing), Occurrence("", false, (-115.62916666666669, 33.48750000000001), missing), Occurrence("", false, (-115.62083333333337, 34.929166666666674), missing), Occurrence("", false, (-115.60416666666669, 33.654166666666676), missing), Occurrence("", false, (-115.5541666666667, 32.72916666666668), missing), Occurrence("", false, (-115.46250000000002, 34.76250000000001), missing), Occurrence("", false, (-115.4291666666667, 33.245833333333344), missing), Occurrence("", false, (-115.39583333333336, 34.57083333333334), missing), Occurrence("", false, (-115.14583333333336, 33.86250000000001), missing), Occurrence("", false, (-115.02916666666668, 33.95416666666668), missing)])

We first assign the tiles to n horizontal folds, and then only keep the part of the tiling that covers at least one presence or absence point:

julia
SDT.assignfolds!(T; n = n, order = :horizontal)
S = SDT.keeprelevant(T, O)
FeatureCollection with 131 features, each with 5 properties
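The idea behind this filtering step can be sketched in plain Julia. The sketch makes some simplifying assumptions: each tile is an axis-aligned rectangle in projected coordinates, and each observation is a point with a presence/absence label. The `Tile` type and the `keep_relevant` function are hypothetical illustrations, not the SpeciesDistributionToolkit implementation, which works on polygons stored in a FeatureCollection.

```julia
# Hypothetical rectangular tile; the actual tiles are polygons in a FeatureCollection
struct Tile
    xmin::Float64
    xmax::Float64
    ymin::Float64
    ymax::Float64
end

# Half-open test, so that a point on a shared edge falls in exactly one tile
inside(t::Tile, x, y) = (t.xmin <= x < t.xmax) && (t.ymin <= y < t.ymax)

# Keep only the tiles that contain at least one observation, and count the
# presences and absences in each of them (same idea as keeprelevant, not its code)
function keep_relevant(tiles, xs, ys, labels)
    kept = Tile[]
    npres = Int[]
    nabs = Int[]
    for t in tiles
        hits = [i for i in eachindex(xs) if inside(t, xs[i], ys[i])]
        isempty(hits) && continue
        push!(kept, t)
        push!(npres, count(i -> labels[i], hits))
        push!(nabs, count(i -> !labels[i], hits))
    end
    return kept, npres, nabs
end

# A 3 × 3 grid of unit tiles, and four observations (true = presence)
tiles = [Tile(i, i + 1, j, j + 1) for i in 0:2 for j in 0:2]
xs = [0.5, 0.6, 2.5, 2.2]
ys = [0.5, 0.4, 1.5, 1.8]
labels = [true, false, true, true]

kept, npres, nabs = keep_relevant(tiles, xs, ys, labels)
println(length(kept), " tiles kept; presences = ", npres, ", absences = ", nabs)
```

Seven of the nine tiles contain no observation and are dropped. The per-tile counts play the same role as the `__presences` and `__absences` properties that the balancing option of `assignfolds!` relies on.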

Generation of tiles

It is also possible to generate the tiles directly from the model. We are using the keeprelevant approach here in order to highlight the position of species data within the entire region.

Code for the figure
julia
f = Figure()
ax = Axis(f[1, 1]; aspect = DataAspect())
lines!(ax, T; color = :grey70)
for i in 1:n
    poly!(
        ax,
        S["__fold" => i];
        alpha = 0.2,
        color = i,
        colorrange = (1, n),
        colormap = folds_colors,
    )
    lines!(
        ax,
        S["__fold" => i];
        color = i,
        colorrange = (1, n),
        colormap = folds_colors,
    )
end
scatter!(ax, presences(O); color = :black)
scatter!(ax, absences(O); color = :grey40, marker = :cross, markersize = 8)
hidedecorations!(ax)
hidespines!(ax)

Before moving on, we assemble the model that will be used in the rest of this vignette:

julia
model = SDM(RawData, NaiveBayes, L, O)
❎  RawData → NaiveBayes → P(x) ≥ 0.5 🗺️

We then generate the training and validation folds from the tiles in S:

julia
folds = spatialfold(model, S);

Spatial fold function

The spatialfold function can also be called with a single argument (a FeatureCollection of tiles). In this case, it returns a closure that can be applied directly to a model, just like the other functions that generate dataset splits.
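The sketch below shows what such a split contains. It converts a vector holding one fold identifier per observation into a vector of `(training, validation)` tuples of indices. According to its documentation, this is the output structure of `spatialfold`. The one-argument form is mimicked by returning a closure. The names `folds_from_ids` and `spatialfold_sketch` are hypothetical. Unlike the real function, which works from the model and the tiles, this closure simply ignores its model argument.

```julia
# One fold identifier per observation -> vector of (training, validation) tuples.
# Fold k is used for validation, and all the other folds are used for training.
function folds_from_ids(ids::AbstractVector{<:Integer})
    return [(findall(!=(k), ids), findall(==(k), ids)) for k in sort(unique(ids))]
end

# Hypothetical analogue of the one-argument form: capture the fold ids now,
# and return a function that produces the splits when applied to a "model"
spatialfold_sketch(ids) = model -> folds_from_ids(ids)

ids = [1, 2, 1, 3, 2, 3, 3]
splitter = spatialfold_sketch(ids)
splits = splitter(nothing)   # the model argument is unused in this sketch

for (training, validation) in splits
    println("training: ", training, "  validation: ", validation)
end
```

Each observation appears in exactly one validation set. Within every split, the training and validation indices together cover all observations.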

Because spatialfold returns a valid division of the samples into training and validation sets, its output can be used as the second argument to crossvalidate:

julia
cv = crossvalidate(model, folds)
(validation = ConfusionMatrix[(tp: 0, fp: 0; fn: 2, tn: 56), (tp: 0, fp: 0; fn: 6, tn: 65), (tp: 0, fp: 0; fn: 16, tn: 41), (tp: 0, fp: 0; fn: 43, tn: 10), (tp: 0, fp: 0; fn: 31, tn: 23)], training = ConfusionMatrix[(tp: 0, fp: 0; fn: 96, tn: 139), (tp: 0, fp: 0; fn: 92, tn: 130), (tp: 0, fp: 0; fn: 82, tn: 154), (tp: 0, fp: 0; fn: 55, tn: 185), (tp: 0, fp: 0; fn: 67, tn: 172)])

Cross-validation

A separate vignette covers cross-validation in more depth, including the main ways to work with the cross-validation outputs and non-spatial methods to split the data.
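All four measures reported below are functions of the four cells of a confusion matrix. The following sketch recomputes them for the first validation fold in the output above (tp = 0, fp = 0, fn = 2, tn = 56), using the standard textbook definitions rather than the SDeMo source. The function names are hypothetical, chosen to avoid clashing with the functions exported by SDeMo. Returning 0 for the MCC when its denominator is zero is an assumed convention; it matches the MCC of 0.000 obtained for this model.

```julia
# Standard definitions (hypothetical names, to avoid clashing with SDeMo)
tpr(tp, fp, fn, tn) = tp / (tp + fn)                 # sensitivity
tnr(tp, fp, fn, tn) = tn / (tn + fp)                 # specificity
bacc(tp, fp, fn, tn) = (tpr(tp, fp, fn, tn) + tnr(tp, fp, fn, tn)) / 2

function matthews(tp, fp, fn, tn)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: when a row or column of the matrix is empty
    # (e.g. no presence is ever predicted), report an MCC of 0
    return iszero(denominator) ? 0.0 : (tp * tn - fp * fn) / denominator
end

# First validation fold in the crossvalidate output above
cm = (tp = 0, fp = 0, fn = 2, tn = 56)
for (name, measure) in (("mcc", matthews), ("specificity", tnr),
                        ("sensitivity", tpr), ("balancedaccuracy", bacc))
    println(rpad(name, 18), measure(cm...))
end
```

A model that never predicts a presence has a sensitivity of 0 and a specificity of 1, whatever the data. Its balanced accuracy is therefore 0.5, the same as the no-skill model.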

julia
measures = [mcc, SDeMo.specificity, SDeMo.sensitivity, balancedaccuracy]
cvresult = [measure(set) for measure in measures, set in cv]
nullresult = [measure(null(model)) for measure in measures, null in [coinflip, noskill]]
pretty_table(
    hcat(string.(measures), hcat(cvresult, nullresult));
    alignment = [:l, :c, :c, :c, :c],
    backend = :markdown,
    column_labels = ["Measure", "Validation", "Training", "Coin-flip", "No-skill"],
    formatters = [fmt__printf("%5.3f", [2, 3, 4, 5])],
)
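| Measure          | Validation | Training | Coin-flip | No-skill |
|:-----------------|:----------:|:--------:|:---------:|:--------:|
| mcc              |   0.000    |  0.000   |  -0.331   |  0.000   |
| specificity      |   1.000    |  1.000   |   0.334   |  0.666   |
| sensitivity      |   0.000    |  0.000   |   0.334   |  0.334   |
| balancedaccuracy |   0.500    |  0.500   |   0.334   |  0.500   |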

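This is not a very good model. It does not predict a single presence in any fold, for either the training or the validation data (tp and fp are always 0). As a result, its sensitivity and MCC are 0, and its balanced accuracy (0.5) is no better than that of the no-skill model. There are two likely reasons for this. First, we have not done any variable selection. Second, the folds are likely to have very different class balances, which can bias the estimates of model performance.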

julia
pr_by_fold = [sum(uniqueproperties(S["__fold" => i])["__presences"]) for i in 1:n]
ab_by_fold = [sum(uniqueproperties(S["__fold" => i])["__absences"]) for i in 1:n]
extrema(pr_by_fold ./ (pr_by_fold .+ ab_by_fold))
(0.034482758620689655, 0.8571428571428571)

The prevalence (proportion of presences) within each fold ranges from about 3% to about 86%. The model is therefore trained and evaluated on class balances that can be very different from the class balance of the full dataset.
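The quantity computed above is the prevalence within each fold. The sketch below repeats the same calculation in plain Julia and compares it with the overall prevalence. It uses made-up counts, standing in for the `__presences` and `__absences` properties of the tiles, not the actual data.

```julia
# Made-up per-tile counts and fold assignments (not the Sasquatch data)
tile_pres = [3, 0, 1, 4, 2, 0]
tile_abs  = [1, 5, 4, 0, 2, 6]
tile_fold = [1, 1, 2, 2, 3, 3]

nfolds = maximum(tile_fold)
pres_in_fold = [sum(tile_pres[tile_fold .== k]) for k in 1:nfolds]
abs_in_fold  = [sum(tile_abs[tile_fold .== k]) for k in 1:nfolds]

# Prevalence (proportion of presences) in each fold, and in the whole dataset
prevalence = pres_in_fold ./ (pres_in_fold .+ abs_in_fold)
overall = sum(tile_pres) / (sum(tile_pres) + sum(tile_abs))

println("prevalence by fold: ", round.(prevalence; digits = 3))
println("overall prevalence: ", round(overall; digits = 3))
```

Even in this small example, the folds range from 20% to about 56% presences, while the whole dataset is about 36% presences.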

Creating folds with balance

We can instead assign the tiles to spatial folds that are optimized to have approximately the same class balance as the entire dataset:

julia
SDT.assignfolds!(
    S;
    n = n,
    order = :horizontal,
    balanced = true,
)
FeatureCollection with 131 features, each with 5 properties

The class balancing approach starts from an initial assignment (here, horizontal bands). It then swaps tiles between folds to reduce the difference between the class balance of each fold and the class balance of the full dataset. Internally, this is done using a fast greedy algorithm, run for at most maxiter rounds.
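To make the idea concrete, here is a minimal greedy swap heuristic in plain Julia. It illustrates the general approach described above and is not the algorithm used by assignfolds!. It assumes that the quantity to minimize is the sum, over folds, of the absolute difference between the fold prevalence and the overall prevalence. At each round, it applies the single swap of two tiles that reduces this quantity the most. It stops when no swap helps or after `maxiter` rounds. The functions `imbalance` and `balance_folds!` are hypothetical.

```julia
# Sum over folds of |fold prevalence - overall prevalence| (an assumed objective)
function imbalance(fold, npres, nabs, nfolds)
    target = sum(npres) / (sum(npres) + sum(nabs))
    total = 0.0
    for k in 1:nfolds
        infold = fold .== k
        p, a = sum(npres[infold]), sum(nabs[infold])
        if p + a > 0
            total += abs(p / (p + a) - target)
        end
    end
    return total
end

# Greedy search: apply the best improving swap of two tiles from different
# folds, until no swap improves the objective or maxiter rounds are done
function balance_folds!(fold, npres, nabs; maxiter = 2000)
    nfolds = maximum(fold)
    current = imbalance(fold, npres, nabs, nfolds)
    for _ in 1:maxiter
        best, besti, bestj = current, 0, 0
        for i in eachindex(fold), j in (i + 1):lastindex(fold)
            fold[i] == fold[j] && continue
            fold[i], fold[j] = fold[j], fold[i]    # try the swap...
            score = imbalance(fold, npres, nabs, nfolds)
            fold[i], fold[j] = fold[j], fold[i]    # ...and undo it
            if score < best
                best, besti, bestj = score, i, j
            end
        end
        besti == 0 && break                       # no improving swap is left
        fold[besti], fold[bestj] = fold[bestj], fold[besti]
        current = best
    end
    return fold
end

# Made-up presence/absence counts per tile, and an initial banded assignment
npres = [3, 0, 1, 4, 2, 0]
nabs  = [1, 5, 4, 0, 2, 6]
fold  = [1, 1, 2, 2, 3, 3]

before = imbalance(fold, npres, nabs, 3)
balance_folds!(fold, npres, nabs)
after = imbalance(fold, npres, nabs, 3)
println("imbalance: ", round(before; digits = 3), " -> ", round(after; digits = 3))
println("tiles per fold: ", [count(==(k), fold) for k in 1:3])
```

Because tiles are swapped rather than moved, every fold keeps the same number of tiles. Nothing in the objective rewards spatial contiguity, though, so the banded structure of the initial assignment can be lost.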

The spatial structure is lost

The current implementation of the class balance algorithm does not attempt to maintain the spatial structure of the blocks, nor does it ensure that the folds end up with similar numbers of instances. It does, however, keep the number of tiles in each fold unchanged.

After optimizing the folds for class balance, we obtain a new division of the landscape:

Code for the figure
julia
f = Figure()
ax = Axis(f[1, 1]; aspect = DataAspect())
lines!(ax, T; color = :grey70)
for i in 1:n
    poly!(
        ax,
        S["__fold" => i];
        alpha = 0.2,
        color = i,
        colorrange = (1, n),
        colormap = folds_colors,
    )
    lines!(
        ax,
        S["__fold" => i];
        color = i,
        colorrange = (1, n),
        colormap = folds_colors,
    )
end
scatter!(ax, presences(O); color = :black)
scatter!(ax, absences(O); color = :grey40, marker = :cross, markersize = 8)
hidedecorations!(ax)
hidespines!(ax)

Using these new folds, we can now perform variable selection (here, forward selection):

julia
folds = SDT.spatialfold(model, S)
variables!(model, ForwardSelection, folds)
layers(RasterData(EarthEnv, LandCover))[variables(model)]
2-element Vector{String}:
 "Evergreen/Deciduous Needleleaf Trees"
 "Cultivated and Managed Vegetation"

As before, this model can be cross-validated:

julia
cv = crossvalidate(model, folds);

We can measure the expected performance:

julia
measures = [mcc, SDeMo.specificity, SDeMo.sensitivity, balancedaccuracy]
cvresult = [measure(set) for measure in measures, set in cv]
nullresult = [measure(null(model)) for measure in measures, null in [coinflip, noskill]]
pretty_table(
    hcat(string.(measures), hcat(cvresult, nullresult));
    alignment = [:l, :c, :c, :c, :c],
    backend = :markdown,
    column_labels = ["Measure", "Validation", "Training", "Coin-flip", "No-skill"],
    formatters = [fmt__printf("%5.3f", [2, 3, 4, 5])],
)
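| Measure          | Validation | Training | Coin-flip | No-skill |
|:-----------------|:----------:|:--------:|:---------:|:--------:|
| mcc              |   0.600    |  0.631   |  -0.331   |  0.000   |
| specificity      |   0.895    |  0.892   |   0.334   |  0.666   |
| sensitivity      |   0.689    |  0.728   |   0.334   |  0.334   |
| balancedaccuracy |   0.792    |  0.810   |   0.334   |  0.500   |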
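The variables used in this final model were chosen with ForwardSelection on the class-balanced spatial folds. In its usual form, forward selection is itself a greedy procedure. It starts from no variables, repeatedly adds the variable that most improves a cross-validated score, and stops when no addition helps. The sketch below shows that loop. A toy scoring function stands in for cross-validating the model on the spatial folds. The functions `forward_select` and `toy_score` are hypothetical, and this is not the SDeMo implementation of ForwardSelection.

```julia
# Greedy forward selection: keep adding the best remaining variable while the score improves
function forward_select(score, candidates)
    selected = Int[]
    best = -Inf
    while true
        remaining = setdiff(candidates, selected)
        isempty(remaining) && break
        scores = [score(vcat(selected, v)) for v in remaining]
        s, idx = findmax(scores)
        s <= best && break        # no remaining variable improves the score
        push!(selected, remaining[idx])
        best = s
    end
    return selected, best
end

# Toy score: variables 2 and 5 are informative, and every variable has a small cost
toy_score(vars) = 0.4 * (2 in vars) + 0.3 * (5 in vars) - 0.05 * length(vars)

selected, best = forward_select(toy_score, 1:6)
println("selected: ", selected, ", score: ", round(best; digits = 3))
```

With a real model, each call to the score function would run a full cross-validation. This is why the choice of folds (here, the class-balanced spatial folds) also affects which variables are retained.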
SpeciesDistributionToolkit.assignfolds! Function
julia
assignfolds!(H::FeatureCollection; n::Integer=10, order::Symbol=:random, group::Bool=true, balanced::Bool=false, maxiter::Integer = 2000)

Assigns the features in a tiling (or any other FeatureCollection) to n blocks for spatial cross-validation. Note that the features in H must have a __centroid property which indicates where the center of each cell is.

The order keyword determines how the tiles are assigned. When using :horizontal or :vertical, the tiles are assigned either horizontally or vertically. In this case, the keyword group determines how the folds are assigned: when group is true (the default), folds are spatially contiguous; when group is false, folds are spatially alternating.

When order is :random, the tiles are assigned fully at random.
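The difference between contiguous (`group = true`) and alternating (`group = false`) assignment can be sketched by sorting the tiles along one coordinate of their centroid. This is a hypothetical illustration: `band_folds` is not part of the package. Which centroid coordinate corresponds to `:horizontal` or `:vertical` is an assumption left to the caller.

```julia
# Assign tiles to n folds from one centroid coordinate (e.g. the y coordinate)
function band_folds(coord::AbstractVector{<:Real}, n::Integer; group::Bool = true)
    ranking = sortperm(coord)               # tiles from lowest to highest coordinate
    fold = zeros(Int, length(coord))
    for (rank, tile) in enumerate(ranking)
        if group
            fold[tile] = cld(rank * n, length(coord))   # contiguous bands
        else
            fold[tile] = mod1(rank, n)                   # alternating: 1, 2, ..., n, 1, ...
        end
    end
    return fold
end

y = [0.5, 3.5, 1.5, 2.5, 4.5, 5.5]           # centroid coordinate of six tiles
println(band_folds(y, 3))                   # contiguous bands
println(band_folds(y, 3; group = false))    # alternating bands
```

With `group = true`, neighbouring tiles share a fold. With `group = false`, neighbouring tiles are spread across different folds.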

When balanced is true, the tiles must have both __presences and __absences properties. The tiles are then reassigned to folds using a greedy algorithm (for up to maxiter rounds). This algorithm swaps tiles across folds until the class balance of every fold is as close as possible to the class balance of the entire dataset. Note that this step is done after the previous steps (order / group) have been applied.

This method modifies the feature collection by adding a __fold property to each tile, which can then be used in conjunction with spatialfold.

SpeciesDistributionToolkit.spatialfold Function
julia
spatialfold(model::SDM, blocks::FeatureCollection)

Returns a series of (training, validation) folds, as a vector of tuples of vectors. This is the same output format as the one returned by all cross-validation functions, such as kfold and leaveoneout.

The folds are assigned by looking at the "__fold" property of the feature collection, which will typically have been set by assignfolds!.

julia
spatialfold(blocks::FeatureCollection)

Creates a closure which, when applied to a model, returns the (training, validation) folds.
