Spatial cross-validation
This vignette shows how the tiles produced by the tessellation functions can be used to split a dataset into several spatial groups (folds), and how these folds can then be used to cross-validate a model.
Unstable part of the API
The functions on this page are a work in progress, and their behavior is likely to change; any such changes will be documented here. Treat them as highly experimental.
using SpeciesDistributionToolkit
const SDT = SpeciesDistributionToolkit
using PrettyTables
using CairoMakie
Assembling a dataset
We will work on the state of California:
pol = getpolygon(PolygonData(OpenStreetMap, Places); place = "California")
bb = SDT.boundingbox(pol)
(left = -124.48200225830078, right = -114.13078308105469, bottom = 32.52952194213867, top = 42.009498596191406)
To represent distances with little distortion across the state, we will use an orthographic projection centered on the middle of this bounding box:
proj = "+proj=ortho +lon_0=$((bb.right + bb.left)/2) +lat_0=$((bb.top + bb.bottom)/2)"
"+proj=ortho +lon_0=-119.30639266967773 +lat_0=37.26951026916504"
Finally, we get the Sasquatch sightings distributed as demonstration data with the OccurrencesInterface package, keeping only those that fall within California:
records = Occurrences(mask(OccurrencesInterface.__demodata(), pol))
Occurrences(Occurrence[Occurrence("Sasquatch", true, (-119.6436, 37.35944), DateTime("1988-10-01T12:00:00")), Occurrence("Sasquatch", true, (-119.9833, 38.93333), DateTime("2000-06-15T12:00:00")), Occurrence("Sasquatch", true, (-120.0914, 38.9575), DateTime("2000-09-23T12:00:00")), Occurrence("Sasquatch", true, (-124.1625, 40.80222), DateTime("1957-06-01T12:00:00")), Occurrence("Sasquatch", true, (-119.836, 38.28085), DateTime("1997-11-15T12:00:00")), Occurrence("Sasquatch", true, (-121.9139, 40.33556), DateTime("1968-11-01T12:00:00")), Occurrence("Sasquatch", true, (-119.2119, 37.23389), DateTime("1997-12-18T12:00:00")), Occurrence("Sasquatch", true, (-119.9065, 38.17677), DateTime("1994-06-01T12:00:00")), Occurrence("Sasquatch", true, (-120.6781, 41.41), DateTime("1974-06-01T12:00:00")), Occurrence("Sasquatch", true, (-120.205, 38.93), DateTime("2000-07-01T12:00:00")), Occurrence("Sasquatch", true, (-119.9833, 38.93333), DateTime("1996-04-10T12:00:00")), Occurrence("Sasquatch", true, (-120.1358, 38.86), DateTime("1993-04-01T12:00:00")), Occurrence("Sasquatch", true, (-118.9975, 37.35861), DateTime("1995-10-29T12:00:00")), Occurrence("Sasquatch", true, (-120.1877, 38.06615), DateTime("1978-08-01T12:00:00")), Occurrence("Sasquatch", true, (-118.8489, 36.10222), DateTime("1955-08-01T12:00:00")), Occurrence("Sasquatch", true, (-119.9811, 38.19753), DateTime("1978-07-30T12:00:00")), Occurrence("Sasquatch", true, (-122.4306, 41.45833), DateTime("1999-03-04T12:00:00")), Occurrence("Sasquatch", true, (-122.3839, 41.42806), DateTime("1966-09-01T12:00:00")), Occurrence("Sasquatch", true, (-119.2642, 38.04806), DateTime("1977-07-01T12:00:00")), Occurrence("Sasquatch", true, (-122.7325, 40.63556), DateTime("1978-10-01T12:00:00")), Occurrence("Sasquatch", true, (-120.2339, 38.03898), DateTime("1983-10-25T12:00:00")), Occurrence("Sasquatch", true, (-118.1303, 36.05278), DateTime("1987-05-31T12:00:00")), 
Occurrence("Sasquatch", true, (-119.8633, 38.26661), DateTime("1993-06-15T12:00:00")), Occurrence("Sasquatch", true, (-122.9958, 41.82417), DateTime("1944-08-01T12:00:00")), Occurrence("Sasquatch", true, (-123.1317, 41.3657), DateTime("2000-07-20T12:00:00")), Occurrence("Sasquatch", true, (-122.5833, 38.25583), DateTime("1991-01-01T12:00:00")), Occurrence("Sasquatch", true, (-120.8908, 39.21278), DateTime("1978-06-01T12:00:00")), Occurrence("Sasquatch", true, (-119.2569, 37.91111), DateTime("1998-08-01T12:00:00")), Occurrence("Sasquatch", true, (-116.8625, 33.2675), DateTime("1993-07-04T12:00:00")), Occurrence("Sasquatch", true, (-122.2703, 41.21361), DateTime("2001-06-15T12:00:00")), Occurrence("Sasquatch", true, (-120.2653, 39.3225), DateTime("1956-10-01T12:00:00")), Occurrence("Sasquatch", true, (-121.6156, 40.92911), DateTime("1994-09-15T12:00:00")), Occurrence("Sasquatch", true, (-123.4817, 39.68833), DateTime("1987-06-01T12:00:00")), Occurrence("Sasquatch", true, (-120.0133, 38.85556), DateTime("2003-07-20T12:00:00")), Occurrence("Sasquatch", true, (-119.9833, 38.93333), DateTime("1983-01-01T12:00:00")), Occurrence("Sasquatch", true, (-123.8692, 40.26639), DateTime("1963-08-01T12:00:00")), Occurrence("Sasquatch", true, (-123.8692, 40.26639), DateTime("1965-07-01T12:00:00")), Occurrence("Sasquatch", true, (-121.495, 36.2485), DateTime("1993-04-01T12:00:00")), Occurrence("Sasquatch", true, (-122.3775, 40.8875), DateTime("1977-08-01T12:00:00")), Occurrence("Sasquatch", true, (-122.1781, 39.92778), DateTime("1973-10-15T12:00:00")), Occurrence("Sasquatch", true, (-121.0161, 39.26361), DateTime("2004-01-04T12:00:00")), Occurrence("Sasquatch", true, (-123.4817, 39.68833), DateTime("1974-09-01T12:00:00")), Occurrence("Sasquatch", true, (-123.7675, 39.22361), DateTime("1966-09-01T12:00:00")), Occurrence("Sasquatch", true, (-118.972, 37.5703), DateTime("1972-09-15T12:00:00")), Occurrence("Sasquatch", true, (-120.5836, 38.72139), DateTime("2003-09-01T12:00:00")), 
Occurrence("Sasquatch", true, (-121.58, 39.205), DateTime("1965-07-01T12:00:00")), Occurrence("Sasquatch", true, (-124.0, 40.465), DateTime("1992-07-12T12:00:00")), Occurrence("Sasquatch", true, (-119.2119, 37.23389), DateTime("2001-10-25T12:00:00")), Occurrence("Sasquatch", true, (-118.7402, 36.0438), DateTime("2000-05-01T12:00:00")), Occurrence("Sasquatch", true, (-120.3275, 39.31917), DateTime("2004-08-16T12:00:00")), Occurrence("Sasquatch", true, (-120.2434, 38.0312), DateTime("2003-12-24T12:00:00")), Occurrence("Sasquatch", true, (-121.7, 37.0), DateTime("2004-10-01T12:00:00")), Occurrence("Sasquatch", true, (-122.1667, 40.75), DateTime("1997-07-15T12:00:00")), Occurrence("Sasquatch", true, (-123.55, 40.85), DateTime("1988-10-15T12:00:00")), Occurrence("Sasquatch", true, (-120.6033, 38.90333), DateTime("2004-11-01T12:00:00")), Occurrence("Sasquatch", true, (-120.201, 36.7818), DateTime("1991-06-28T12:00:00")), Occurrence("Sasquatch", true, (-122.278, 40.7974), DateTime("2001-03-15T12:00:00")), Occurrence("Sasquatch", true, (-121.7146, 36.3779), DateTime("1986-06-15T12:00:00")), Occurrence("Sasquatch", true, (-123.35, 41.75), DateTime("1967-03-10T12:00:00")), Occurrence("Sasquatch", true, (-119.5483, 38.3068), DateTime("2005-08-21T12:00:00")), Occurrence("Sasquatch", true, (-121.0517, 39.25396), DateTime("1987-09-15T12:00:00")), Occurrence("Sasquatch", true, (-123.85, 41.65), DateTime("1979-10-25T12:00:00")), Occurrence("Sasquatch", true, (-123.2718, 40.7485), DateTime("2003-09-15T12:00:00")), Occurrence("Sasquatch", true, (-123.35, 41.75), DateTime("1993-12-15T12:00:00")), Occurrence("Sasquatch", true, (-121.4871, 37.39555), DateTime("1869-11-10T12:00:00")), Occurrence("Sasquatch", true, (-120.3784, 39.50435), DateTime("1993-09-05T12:00:00")), Occurrence("Sasquatch", true, (-122.8055, 40.6789), DateTime("1982-08-09T12:00:00")), Occurrence("Sasquatch", true, (-120.2201, 38.86165), DateTime("2005-08-20T12:00:00")), Occurrence("Sasquatch", true, (-118.9166, 
37.58325), DateTime("2006-06-17T12:00:00")), Occurrence("Sasquatch", true, (-124.0687, 41.21385), DateTime("2006-10-21T12:00:00")), Occurrence("Sasquatch", true, (-122.6667, 40.7333), DateTime("2007-04-18T12:00:00")), Occurrence("Sasquatch", true, (-121.7138, 38.4906), DateTime("1980-10-11T12:00:00")), Occurrence("Sasquatch", true, (-123.1668, 41.8333), DateTime("2007-09-11T12:00:00")), Occurrence("Sasquatch", true, (-118.4333, 34.6167), DateTime("2007-10-10T12:00:00")), Occurrence("Sasquatch", true, (-123.056, 41.28996), DateTime("1998-08-03T12:00:00")), Occurrence("Sasquatch", true, (-123.2393, 39.83929), DateTime("2007-12-15T12:00:00")), Occurrence("Sasquatch", true, (-119.0, 37.9333), DateTime("2008-07-09T12:00:00")), Occurrence("Sasquatch", true, (-123.3899, 39.51489), DateTime("2008-07-08T12:00:00")), Occurrence("Sasquatch", true, (-119.7853, 38.33115), DateTime("1993-08-13T12:00:00")), Occurrence("Sasquatch", true, (-119.2916, 34.625), DateTime("2008-09-14T12:00:00")), Occurrence("Sasquatch", true, (-116.5333, 33.95002), DateTime("2008-12-01T12:00:00")), Occurrence("Sasquatch", true, (-118.375, 36.70835), DateTime("2008-10-27T12:00:00")), Occurrence("Sasquatch", true, (-121.07, 37.475), DateTime("1962-08-14T12:00:00")), Occurrence("Sasquatch", true, (-123.0585, 38.83996), DateTime("2007-09-08T12:00:00")), Occurrence("Sasquatch", true, (-118.125, 36.79165), DateTime("2009-01-12T12:00:00")), Occurrence("Sasquatch", true, (-119.9254, 34.73389), DateTime("1982-04-15T12:00:00")), Occurrence("Sasquatch", true, (-123.1749, 41.98645), DateTime("2009-06-03T12:00:00")), Occurrence("Sasquatch", true, (-120.2083, 38.79998), DateTime("2009-07-15T12:00:00")), Occurrence("Sasquatch", true, (-121.18, 39.705), DateTime("1993-09-30T12:00:00")), Occurrence("Sasquatch", true, (-118.5834, 34.76666), DateTime("2010-01-22T12:00:00")), Occurrence("Sasquatch", true, (-123.4498, 41.65925), DateTime("2010-04-22T12:00:00")), Occurrence("Sasquatch", true, (-120.55, 35.665), 
DateTime("1974-08-01T12:00:00")), Occurrence("Sasquatch", true, (-121.0, 38.25), DateTime("2010-10-25T12:00:00")), Occurrence("Sasquatch", true, (-122.575, 41.845), DateTime("2011-12-03T12:00:00")), Occurrence("Sasquatch", true, (-119.6234, 38.51665), DateTime("2012-09-23T12:00:00")), Occurrence("Sasquatch", true, (-117.3395, 34.28665), DateTime("2014-08-20T12:00:00")), Occurrence("Sasquatch", true, (-118.9604, 36.77763), DateTime("2015-05-16T12:00:00")), Occurrence("Sasquatch", true, (-121.7587, 37.62124), DateTime("1990-05-15T12:00:00")), Occurrence("Sasquatch", true, (-123.7879, 41.89288), DateTime("2019-03-15T12:00:00")), Occurrence("Sasquatch", true, (-121.8214, 40.64289), DateTime("2018-09-04T12:00:00")), Occurrence("Sasquatch", true, (-120.9851, 40.03728), DateTime("1992-08-15T12:00:00")), Occurrence("Sasquatch", true, (-120.0124, 38.21877), DateTime("2008-06-15T12:00:00")), Occurrence("Sasquatch", true, (-118.6271, 37.22716), DateTime("1991-07-15T12:00:00")), Occurrence("Sasquatch", true, (-124.2, 40.1), DateTime("1980-02-15T12:00:00"))])
We will also download the twelve EarthEnv land-cover layers for this area, to use as predictors when training the model:
L = SDMLayer{Float32}[
SDMLayer(RasterData(EarthEnv, LandCover); bb..., layer = i) for i in 1:12
]
mask!(L, pol)
12-element Vector{SDMLayer{Float32}}:
🗺️ A 1139 × 1243 layer (620488 Float32 cells)
🗺️ A 1139 × 1243 layer (620488 Float32 cells)
🗺️ A 1139 × 1243 layer (620488 Float32 cells)
🗺️ A 1139 × 1243 layer (620488 Float32 cells)
🗺️ A 1139 × 1243 layer (620488 Float32 cells)
🗺️ A 1139 × 1243 layer (620488 Float32 cells)
🗺️ A 1139 × 1243 layer (620488 Float32 cells)
🗺️ A 1139 × 1243 layer (620488 Float32 cells)
🗺️ A 1139 × 1243 layer (620488 Float32 cells)
🗺️ A 1139 × 1243 layer (620488 Float32 cells)
🗺️ A 1139 × 1243 layer (620488 Float32 cells)
🗺️ A 1139 × 1243 layer (620488 Float32 cells)
Creating the tiles
We can now follow the steps from the tessellation vignette to generate a hexagonal tiling under the projection defined above, with an equivalent radius of 30 km.
T = tessellate(pol, 30.0; tile = :hexagons, pointy = true, proj = proj, densify = 5)
FeatureCollection with 193 features, each with 1 properties
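The following plain-Julia sketch shows the geometry that a pointy-top hexagonal tiling relies on. It is not the package's implementation. The helper names `pointy_hexagon`, `polygon_area` and `pointy_hex_centers` are hypothetical. The sketch also assumes that the radius is the center-to-vertex distance, which may not be how `tessellate` defines its "equivalent radius".

```julia
# Hypothetical helpers, not part of SpeciesDistributionToolkit.

# Vertices of a pointy-top hexagon (one vertex straight up) with
# circumradius R (assumed: center-to-vertex distance), centered on (cx, cy).
function pointy_hexagon(cx, cy, R)
    angles = deg2rad.(30:60:330)  # 30°, 90°, ..., 330°; 90° is the top vertex
    return [(cx + R * cos(a), cy + R * sin(a)) for a in angles]
end

# Shoelace formula for the area of a simple polygon given as (x, y) tuples
function polygon_area(vertices)
    n = length(vertices)
    s = 0.0
    for i in 1:n
        (x1, y1) = vertices[i]
        (x2, y2) = vertices[mod1(i + 1, n)]  # wrap around to the first vertex
        s += x1 * y2 - x2 * y1
    end
    return abs(s) / 2
end

# Centers of pointy-top hexagons covering a rectangle: rows are 1.5R apart,
# centers within a row are √3·R apart, and every other row is shifted by half a tile
function pointy_hex_centers(xmin, xmax, ymin, ymax, R)
    dx = sqrt(3) * R
    dy = 1.5 * R
    centers = Tuple{Float64,Float64}[]
    for (row, y) in enumerate(ymin:dy:ymax)
        offset = iseven(row) ? dx / 2 : 0.0
        for x in (xmin + offset):dx:xmax
            push!(centers, (x, y))
        end
    end
    return centers
end

R = 30.0                           # km, assuming the radius is the circumradius
hex = pointy_hexagon(0.0, 0.0, R)
area = polygon_area(hex)           # equals (3√3 / 2) R²
centers = pointy_hex_centers(0.0, 200.0, 0.0, 200.0, R)
```

With this layout, every center is √3·R from each of its six neighbors, both within a row and across rows. This is why hexagons are a convenient choice when distances between tiles matter.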
Code for the figure
f = Figure()
ax = Axis(f[1, 1]; aspect = DataAspect())
lines!(ax, pol, color=:grey30)
lines!(ax, T; color = :teal)
hidespines!(ax)
hidedecorations!(ax)
We now need to decide on the number of folds, i.e. how many groups to split the data into. We will use five here.
n = 5
5
To make the figures easier to read, we will generate a categorical color palette with one color per fold:
folds_colors = cgrad(Makie.wong_colors()[1:n], n; categorical = true);
Assigning the tiles to folds
We can start by splitting the landscape into horizontal bands:
SDT.assignfolds!(
T;
n = n,
order = :horizontal,
)
FeatureCollection with 193 features, each with 3 properties
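Contiguous (grouped) folds can be sketched in plain Julia, which makes the idea concrete. This is a hypothetical re-implementation for illustration, not the code behind `assignfolds!`. It makes three assumptions: each tile is summarized by its centroid, `:horizontal` means ordering tiles by their y coordinate (so that folds form horizontal bands), and `:vertical` means ordering them by x.

```julia
# Hypothetical sketch, not the package's implementation.

# Sizes of n folds that differ by at most one tile:
# the first rem(m, n) folds each receive one extra tile.
fold_sizes(m, n) = [div(m, n) + (k <= rem(m, n) ? 1 : 0) for k in 1:n]

# Contiguous folds: rank the tiles along one axis, then cut the ranking
# into n consecutive runs of (nearly) equal length.
function grouped_folds(centroids, n; order = :horizontal)
    key = order == :horizontal ? (c -> c[2]) : (c -> c[1])  # assumed meaning of `order`
    perm = sortperm(centroids; by = key)                    # tile indices, lowest key first
    sizes = fold_sizes(length(centroids), n)
    fold_of_rank = reduce(vcat, [fill(k, sizes[k]) for k in 1:n])
    folds = zeros(Int, length(centroids))
    folds[perm] = fold_of_rank                              # tile perm[r] gets fold_of_rank[r]
    return folds
end

# Toy centroids (x, y), deliberately not sorted by y
centroids = [(0.0, y) for y in [5.0, 1.0, 9.0, 3.0, 7.0, 2.0, 10.0, 4.0, 8.0, 6.0]]
folds = grouped_folds(centroids, 3)   # → [2, 1, 3, 1, 2, 1, 3, 1, 3, 2]
```

For the 193 tiles generated above and five folds, `fold_sizes(193, 5)` returns `[39, 39, 39, 38, 38]`. The package may distribute the remainder differently.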
Code for the figure
f = Figure()
ax = Axis(f[1, 1]; aspect = DataAspect())
for i in 1:n
poly!(
ax,
T["__fold" => i];
alpha = 0.2,
color = i,
colorrange = (1, n),
colormap = folds_colors,
)
lines!(
ax,
T["__fold" => i];
color = i,
colorrange = (1, n),
colormap = folds_colors,
)
end
hidespines!(ax)
hidedecorations!(ax)
Wherever possible, every fold contains the same number of tiles. If the tiling was generated under an equal-area projection, the folds therefore cover equivalent surfaces.
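The tiling above was generated under an orthographic projection, which is not equal-area. A quick worked check shows how much this matters over California. In the orthographic projection, the scale factor is cos c along the direction pointing away from the projection center and 1 perpendicular to it, where c is the angular distance to that center. Areas are therefore scaled by cos c. The sketch below evaluates this factor at the four corners of the bounding box computed earlier. The function names are our own.

```julia
# Hypothetical helpers for a worked check; not part of any package.

# Angular distance (radians) between two points given in degrees,
# using the spherical law of cosines.
function angular_distance(lon1, lat1, lon2, lat2)
    φ1, φ2 = deg2rad(lat1), deg2rad(lat2)
    Δλ = deg2rad(lon2 - lon1)
    c = sin(φ1) * sin(φ2) + cos(φ1) * cos(φ2) * cos(Δλ)
    return acos(clamp(c, -1.0, 1.0))  # clamp guards against rounding just above 1
end

# Orthographic projection: radial scale cos(c), tangential scale 1,
# so the local area scale factor is cos(c).
ortho_area_scale(lon0, lat0, lon, lat) = cos(angular_distance(lon0, lat0, lon, lat))

# Bounding box of California, as returned by SDT.boundingbox above
bb = (left = -124.48200225830078, right = -114.13078308105469,
      bottom = 32.52952194213867, top = 42.009498596191406)
lon0 = (bb.left + bb.right) / 2      # same center as the +proj=ortho string
lat0 = (bb.bottom + bb.top) / 2

corners = [(bb.left, bb.bottom), (bb.left, bb.top), (bb.right, bb.bottom), (bb.right, bb.top)]
scales = [ortho_area_scale(lon0, lat0, lon, lat) for (lon, lat) in corners]
```

All four factors are above 0.99. A tile of fixed size in projected coordinates therefore covers less than 1% more true area at the corners of the bounding box than at its center, so folds with the same number of tiles cover very similar surfaces.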
We can also split the landscape vertically:
SDT.assignfolds!(
T;
n = n,
order = :vertical,
)
FeatureCollection with 193 features, each with 3 properties
Code for the figure
f = Figure()
ax = Axis(f[1, 1]; aspect = DataAspect())
for i in 1:n
poly!(
ax,
T["__fold" => i];
alpha = 0.2,
color = i,
colorrange = (1, n),
colormap = folds_colors,
)
lines!(
ax,
T["__fold" => i];
color = i,
colorrange = (1, n),
colormap = folds_colors,
)
end
hidespines!(ax)
hidedecorations!(ax)
Grouped and alternating splits
By default, folds are contiguous in space. Setting group = false changes this behavior: tiles are assigned to folds sequentially (cycling from 1 to n, then restarting), so that each fold is spread more evenly across the region:
SDT.assignfolds!(
T;
n = n,
group = false,
order = :horizontal,
)
FeatureCollection with 193 features, each with 3 properties
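Alternating folds can be sketched in plain Julia as follows. This is a hypothetical function written for illustration, not the code behind `assignfolds!`. It assumes that each tile is summarized by its centroid and that `:horizontal` orders tiles by their y coordinate. Tiles are ranked along that axis, and the fold number cycles through 1 to n with `mod1`, so adjacent tiles in the ranking end up in different folds.

```julia
# Hypothetical sketch, not the package's implementation.
function alternating_folds(centroids, n; order = :horizontal)
    key = order == :horizontal ? (c -> c[2]) : (c -> c[1])  # assumed meaning of `order`
    perm = sortperm(centroids; by = key)                    # tile indices, lowest key first
    folds = zeros(Int, length(centroids))
    for (rank, tile) in enumerate(perm)
        folds[tile] = mod1(rank, n)                         # 1, 2, ..., n, 1, 2, ...
    end
    return folds
end

# Seven toy tiles stacked from south to north
centroids = [(0.0, Float64(y)) for y in 1:7]
folds = alternating_folds(centroids, 3)   # → [1, 2, 3, 1, 2, 3, 1]
```

Compared with contiguous bands, each fold now samples every part of the gradient. The trade-off is that training and testing tiles are often neighbors, which weakens the spatial independence that spatial cross-validation is meant to provide.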
Code for the figure
f = Figure()
ax = Axis(f[1, 1]; aspect = DataAspect())
for i in 1:n
poly!(
ax,
T["__fold" => i];
alpha = 0.2,
color = i,
colorrange = (1, n),
colormap = folds_colors,
)
lines!(
ax,
T["__fold" => i];
color = i,
colorrange = (1, n),
colormap = folds_colors,
)
end
hidespines!(ax)
hidedecorations!(ax)
Using spatial splits for cross-validation
As in a previous vignette, we generate a layer of pseudo-absences, and then convert the presence and pseudo-absence layers into a single collection of occurrences.
L₊ = mask(L[1], records)
P₋ = pseudoabsencemask(BetweenRadius, L₊; closer = 50.0, further = 140.0)
L₋ = backgroundpoints(P₋, 2sum(L₊))
O = Occurrences(L₊, L₋)
Occurrences(Occurrence[Occurrence("", true, (-124.20416666666668, 40.095833333333346), missing), Occurrence("", true, (-124.16250000000002, 40.804166666666674), missing), Occurrence("", true, (-124.07083333333335, 41.212500000000006), missing), Occurrence("", true, (-123.99583333333335, 40.462500000000006), missing), Occurrence("", true, (-123.87083333333335, 40.26250000000001), missing), Occurrence("", true, (-123.84583333333335, 41.64583333333334), missing), Occurrence("", true, (-123.78750000000002, 41.89583333333334), missing), Occurrence("", true, (-123.77083333333336, 39.22083333333334), missing), Occurrence("", true, (-123.54583333333335, 40.845833333333346), missing), Occurrence("", true, (-123.47916666666669, 39.68750000000001), missing), Occurrence("", true, (-123.44583333333335, 41.662500000000016), missing), Occurrence("", true, (-123.38750000000002, 39.51250000000001), missing), Occurrence("", true, (-123.34583333333336, 41.745833333333344), missing), Occurrence("", true, (-123.27083333333336, 40.745833333333344), missing), Occurrence("", true, (-123.23750000000003, 39.837500000000006), missing), Occurrence("", false, (-123.22916666666669, 40.28750000000001), missing), Occurrence("", true, (-123.17083333333335, 41.82916666666667), missing), Occurrence("", true, (-123.17083333333335, 41.98750000000001), missing), Occurrence("", false, (-123.12916666666669, 37.79583333333334), missing), Occurrence("", true, (-123.12916666666669, 41.362500000000004), missing), Occurrence("", true, (-123.06250000000003, 38.837500000000006), missing), Occurrence("", true, (-123.05416666666669, 41.28750000000001), missing), Occurrence("", true, (-122.99583333333335, 41.82083333333334), missing), Occurrence("", true, (-122.80416666666667, 40.67916666666668), missing), Occurrence("", true, (-122.72916666666669, 40.63750000000001), missing), Occurrence("", false, (-122.7041666666667, 39.304166666666674), missing), Occurrence("", true, 
(-122.67083333333336, 40.72916666666668), missing), Occurrence("", true, (-122.57916666666668, 38.25416666666668), missing), Occurrence("", true, (-122.57083333333335, 41.845833333333346), missing), Occurrence("", false, (-122.56250000000003, 37.52083333333334), missing), Occurrence("", false, (-122.52083333333337, 37.495833333333344), missing), Occurrence("", false, (-122.50416666666669, 39.03750000000001), missing), Occurrence("", false, (-122.49583333333337, 39.087500000000006), missing), Occurrence("", false, (-122.48750000000001, 38.78750000000001), missing), Occurrence("", true, (-122.42916666666669, 41.45416666666667), missing), Occurrence("", false, (-122.42083333333335, 37.28750000000001), missing), Occurrence("", true, (-122.38750000000002, 41.429166666666674), missing), Occurrence("", true, (-122.37916666666668, 40.88750000000001), missing), Occurrence("", false, (-122.36250000000001, 39.345833333333346), missing), Occurrence("", false, (-122.35416666666669, 37.44583333333334), missing), Occurrence("", false, (-122.35416666666669, 38.76250000000001), missing), Occurrence("", true, (-122.27916666666668, 40.79583333333334), missing), Occurrence("", true, (-122.27083333333336, 41.212500000000006), missing), Occurrence("", false, (-122.2041666666667, 39.429166666666674), missing), Occurrence("", true, (-122.17916666666669, 39.929166666666674), missing), Occurrence("", true, (-122.17083333333335, 40.745833333333344), missing), Occurrence("", false, (-122.13750000000002, 39.37916666666668), missing), Occurrence("", true, (-121.91250000000002, 40.33750000000001), missing), Occurrence("", false, (-121.82916666666668, 41.98750000000001), missing), Occurrence("", true, (-121.82083333333335, 40.64583333333334), missing), Occurrence("", true, (-121.76250000000002, 37.620833333333344), missing), Occurrence("", true, (-121.7125, 36.37916666666668), missing), Occurrence("", true, (-121.7125, 38.48750000000001), missing), Occurrence("", false, (-121.7125, 
41.63750000000001), missing), Occurrence("", true, (-121.69583333333335, 36.995833333333344), missing), Occurrence("", true, (-121.61250000000003, 40.92916666666668), missing), Occurrence("", true, (-121.57916666666668, 39.20416666666668), missing), Occurrence("", true, (-121.49583333333337, 36.245833333333344), missing), Occurrence("", true, (-121.48750000000004, 37.39583333333334), missing), Occurrence("", false, (-121.37916666666669, 41.34583333333334), missing), Occurrence("", false, (-121.37916666666669, 41.98750000000001), missing), Occurrence("", false, (-121.37083333333335, 41.87916666666668), missing), Occurrence("", false, (-121.30416666666667, 41.529166666666676), missing), Occurrence("", false, (-121.29583333333335, 41.73750000000001), missing), Occurrence("", true, (-121.17916666666667, 39.70416666666668), missing), Occurrence("", true, (-121.07083333333335, 37.47083333333334), missing), Occurrence("", true, (-121.05416666666669, 39.25416666666668), missing), Occurrence("", false, (-121.0291666666667, 36.69583333333334), missing), Occurrence("", true, (-121.01250000000002, 39.26250000000001), missing), Occurrence("", true, (-120.99583333333337, 38.245833333333344), missing), Occurrence("", true, (-120.98750000000003, 40.03750000000001), missing), Occurrence("", false, (-120.9541666666667, 40.88750000000001), missing), Occurrence("", false, (-120.93750000000003, 40.60416666666668), missing), Occurrence("", true, (-120.88750000000002, 39.21250000000001), missing), Occurrence("", false, (-120.88750000000002, 41.85416666666668), missing), Occurrence("", false, (-120.85416666666669, 36.82083333333334), missing), Occurrence("", false, (-120.8375, 37.00416666666668), missing), Occurrence("", false, (-120.79583333333335, 41.97916666666668), missing), Occurrence("", false, (-120.77916666666667, 40.62916666666668), missing), Occurrence("", false, (-120.76250000000002, 36.970833333333346), missing), Occurrence("", false, (-120.68750000000003, 34.89583333333334), 
missing), Occurrence("", true, (-120.67916666666667, 41.412500000000016), missing), Occurrence("", false, (-120.63750000000002, 41.88750000000001), missing), Occurrence("", true, (-120.60416666666669, 38.904166666666676), missing), Occurrence("", false, (-120.58750000000002, 36.44583333333334), missing), Occurrence("", true, (-120.58750000000002, 38.720833333333346), missing), Occurrence("", false, (-120.5791666666667, 34.77083333333334), missing), Occurrence("", false, (-120.5791666666667, 36.39583333333334), missing), Occurrence("", false, (-120.5791666666667, 40.73750000000001), missing), Occurrence("", false, (-120.55416666666669, 34.595833333333346), missing), Occurrence("", true, (-120.54583333333335, 35.66250000000001), missing), Occurrence("", false, (-120.43750000000003, 37.404166666666676), missing), Occurrence("", false, (-120.4291666666667, 35.16250000000001), missing), Occurrence("", true, (-120.37916666666669, 39.50416666666668), missing), Occurrence("", false, (-120.34583333333336, 35.087500000000006), missing), Occurrence("", true, (-120.32916666666668, 39.32083333333334), missing), Occurrence("", false, (-120.29583333333335, 36.25416666666668), missing), Occurrence("", false, (-120.2791666666667, 37.47916666666668), missing), Occurrence("", true, (-120.26250000000002, 39.32083333333334), missing), Occurrence("", true, (-120.24583333333334, 38.029166666666676), missing), Occurrence("", true, (-120.23750000000001, 38.03750000000001), missing), Occurrence("", false, (-120.22916666666669, 36.06250000000001), missing), Occurrence("", true, (-120.22083333333336, 38.86250000000001), missing), Occurrence("", false, (-120.22083333333336, 41.75416666666668), missing), Occurrence("", true, (-120.20416666666668, 36.779166666666676), missing), Occurrence("", true, (-120.20416666666668, 38.79583333333335), missing), Occurrence("", true, (-120.20416666666668, 38.929166666666674), missing), Occurrence("", true, (-120.18750000000001, 38.062500000000014), missing), 
Occurrence("", false, (-120.18750000000001, 41.054166666666674), missing), Occurrence("", true, (-120.13750000000003, 38.86250000000001), missing), Occurrence("", false, (-120.13750000000003, 40.63750000000001), missing), Occurrence("", false, (-120.12916666666669, 40.54583333333334), missing), Occurrence("", false, (-120.12083333333335, 40.154166666666676), missing), Occurrence("", false, (-120.11250000000003, 36.029166666666676), missing), Occurrence("", false, (-120.09583333333336, 40.00416666666668), missing), Occurrence("", true, (-120.08750000000002, 38.95416666666668), missing), Occurrence("", false, (-120.05416666666667, 40.18750000000001), missing), Occurrence("", false, (-120.04583333333335, 34.062500000000014), missing), Occurrence("", false, (-120.03750000000002, 41.85416666666668), missing), Occurrence("", true, (-120.01250000000002, 38.220833333333346), missing), Occurrence("", true, (-120.01250000000002, 38.85416666666668), missing), Occurrence("", true, (-119.97916666666669, 38.19583333333334), missing), Occurrence("", true, (-119.97916666666669, 38.929166666666674), missing), Occurrence("", false, (-119.9541666666667, 36.179166666666674), missing), Occurrence("", false, (-119.94583333333337, 35.90416666666668), missing), Occurrence("", true, (-119.92916666666669, 34.73750000000001), missing), Occurrence("", true, (-119.9041666666667, 38.179166666666674), missing), Occurrence("", false, (-119.87916666666669, 35.31250000000001), missing), Occurrence("", true, (-119.86250000000003, 38.26250000000001), missing), Occurrence("", true, (-119.83750000000003, 38.279166666666676), missing), Occurrence("", false, (-119.82916666666668, 34.07083333333334), missing), Occurrence("", false, (-119.80416666666667, 36.36250000000001), missing), Occurrence("", true, (-119.78750000000002, 38.32916666666667), missing), Occurrence("", false, (-119.72083333333335, 35.42083333333335), missing), Occurrence("", false, (-119.72083333333335, 35.94583333333335), missing), 
Occurrence("", false, (-119.71250000000002, 36.054166666666674), missing), Occurrence("", false, (-119.69583333333334, 35.89583333333334), missing), Occurrence("", false, (-119.65416666666667, 36.18750000000001), missing), Occurrence("", true, (-119.64583333333334, 37.36250000000001), missing), Occurrence("", false, (-119.63750000000002, 35.91250000000001), missing), Occurrence("", false, (-119.62083333333337, 36.61250000000001), missing), Occurrence("", true, (-119.62083333333337, 38.51250000000001), missing), Occurrence("", false, (-119.57083333333335, 36.345833333333346), missing), Occurrence("", false, (-119.55416666666669, 36.14583333333334), missing), Occurrence("", true, (-119.54583333333336, 38.304166666666674), missing), Occurrence("", false, (-119.48750000000003, 35.37916666666668), missing), Occurrence("", false, (-119.43750000000003, 35.47916666666668), missing), Occurrence("", false, (-119.42083333333336, 35.24583333333334), missing), Occurrence("", false, (-119.32083333333335, 35.64583333333334), missing), Occurrence("", true, (-119.28750000000002, 34.620833333333344), missing), Occurrence("", true, (-119.26250000000002, 38.04583333333334), missing), Occurrence("", true, (-119.25416666666669, 37.91250000000001), missing), Occurrence("", false, (-119.22916666666669, 35.20416666666668), missing), Occurrence("", true, (-119.21250000000003, 37.23750000000001), missing), Occurrence("", false, (-119.13750000000002, 35.462500000000006), missing), Occurrence("", false, (-119.08750000000003, 34.12916666666668), missing), Occurrence("", false, (-119.07083333333335, 33.48750000000001), missing), Occurrence("", false, (-119.06250000000003, 35.64583333333334), missing), Occurrence("", false, (-118.99583333333335, 35.37916666666668), missing), Occurrence("", true, (-118.99583333333335, 37.36250000000001), missing), Occurrence("", true, (-118.99583333333335, 37.929166666666674), missing), Occurrence("", false, (-118.98750000000001, 33.51250000000001), missing), 
Occurrence("", false, (-118.97083333333336, 35.35416666666667), missing), Occurrence("", true, (-118.97083333333336, 37.57083333333334), missing), Occurrence("", true, (-118.96250000000002, 36.779166666666676), missing), Occurrence("", true, (-118.91250000000002, 37.57916666666667), missing), Occurrence("", true, (-118.84583333333336, 36.10416666666667), missing), Occurrence("", false, (-118.82916666666668, 34.029166666666676), missing), Occurrence("", false, (-118.82083333333335, 35.54583333333334), missing), Occurrence("", false, (-118.76250000000002, 33.97083333333334), missing), Occurrence("", true, (-118.73750000000003, 36.04583333333334), missing), Occurrence("", false, (-118.66250000000002, 35.26250000000001), missing), Occurrence("", false, (-118.62916666666669, 35.33750000000001), missing), Occurrence("", true, (-118.62916666666669, 37.22916666666667), missing), Occurrence("", true, (-118.58750000000002, 34.76250000000001), missing), Occurrence("", false, (-118.5291666666667, 34.03750000000001), missing), Occurrence("", true, (-118.4291666666667, 34.620833333333344), missing), Occurrence("", true, (-118.37083333333335, 36.712500000000006), missing), Occurrence("", false, (-118.3041666666667, 35.529166666666676), missing), Occurrence("", false, (-118.1791666666667, 35.54583333333334), missing), Occurrence("", true, (-118.12916666666669, 36.054166666666674), missing), Occurrence("", true, (-118.12083333333335, 36.78750000000001), missing), Occurrence("", false, (-118.00416666666669, 33.72916666666668), missing), Occurrence("", false, (-117.98750000000001, 34.25416666666668), missing), Occurrence("", false, (-117.98750000000001, 35.57916666666668), missing), Occurrence("", false, (-117.93750000000003, 33.679166666666674), missing), Occurrence("", false, (-117.92916666666669, 33.97916666666668), missing), Occurrence("", false, (-117.91250000000002, 35.03750000000001), missing), Occurrence("", false, (-117.91250000000002, 37.27083333333334), missing), 
Occurrence("", false, (-117.85416666666669, 35.495833333333344), missing), Occurrence("", false, (-117.83750000000003, 35.054166666666674), missing), Occurrence("", false, (-117.79583333333335, 35.17083333333334), missing), Occurrence("", false, (-117.78750000000002, 33.94583333333334), missing), Occurrence("", false, (-117.76250000000002, 34.91250000000001), missing), Occurrence("", false, (-117.7041666666667, 35.61250000000001), missing), Occurrence("", false, (-117.67916666666669, 34.64583333333334), missing), Occurrence("", false, (-117.67916666666669, 34.712500000000006), missing), Occurrence("", false, (-117.65416666666668, 37.26250000000002), missing), Occurrence("", false, (-117.62083333333337, 37.27083333333334), missing), Occurrence("", false, (-117.59583333333336, 33.73750000000001), missing), Occurrence("", false, (-117.58750000000003, 35.81250000000001), missing), Occurrence("", false, (-117.57916666666668, 34.91250000000001), missing), Occurrence("", false, (-117.57916666666668, 35.01250000000001), missing), Occurrence("", false, (-117.57916666666668, 35.23750000000001), missing), Occurrence("", false, (-117.54583333333335, 36.337500000000006), missing), Occurrence("", false, (-117.53750000000002, 35.52083333333334), missing), Occurrence("", false, (-117.52916666666668, 33.63750000000001), missing), Occurrence("", false, (-117.52916666666668, 35.06250000000001), missing), Occurrence("", false, (-117.52083333333337, 36.554166666666674), missing), Occurrence("", false, (-117.47916666666669, 33.22916666666668), missing), Occurrence("", false, (-117.47916666666669, 33.370833333333344), missing), Occurrence("", false, (-117.41250000000002, 36.92083333333335), missing), Occurrence("", false, (-117.39583333333337, 36.245833333333344), missing), Occurrence("", false, (-117.38750000000003, 36.00416666666668), missing), Occurrence("", false, (-117.35416666666669, 36.462500000000006), missing), Occurrence("", true, (-117.33750000000002, 34.28750000000001), 
missing), Occurrence("", false, (-117.31250000000003, 32.82916666666667), missing), Occurrence("", false, (-117.30416666666667, 35.179166666666674), missing), Occurrence("", false, (-117.28750000000002, 35.47916666666668), missing), Occurrence("", false, (-117.24583333333337, 33.779166666666676), missing), Occurrence("", false, (-117.21250000000002, 34.76250000000001), missing), Occurrence("", false, (-117.1541666666667, 35.27083333333334), missing), Occurrence("", false, (-117.14583333333336, 32.870833333333344), missing), Occurrence("", false, (-117.12083333333337, 32.779166666666676), missing), Occurrence("", false, (-117.07083333333335, 33.845833333333346), missing), Occurrence("", false, (-117.06250000000003, 32.76250000000001), missing), Occurrence("", true, (-116.86250000000001, 33.27083333333334), missing), Occurrence("", false, (-116.86250000000001, 35.79583333333335), missing), Occurrence("", false, (-116.85416666666669, 36.50416666666668), missing), Occurrence("", false, (-116.8291666666667, 35.54583333333334), missing), Occurrence("", false, (-116.82083333333335, 35.1375), missing), Occurrence("", false, (-116.77083333333336, 36.19583333333334), missing), Occurrence("", false, (-116.77083333333336, 36.437500000000014), missing), Occurrence("", false, (-116.75416666666669, 35.28750000000001), missing), Occurrence("", false, (-116.72083333333335, 35.087500000000006), missing), Occurrence("", false, (-116.71250000000003, 36.06250000000001), missing), Occurrence("", false, (-116.7041666666667, 36.029166666666676), missing), Occurrence("", false, (-116.7041666666667, 36.220833333333346), missing), Occurrence("", false, (-116.7041666666667, 36.27083333333334), missing), Occurrence("", false, (-116.61250000000001, 34.81250000000001), missing), Occurrence("", false, (-116.56250000000003, 34.712500000000006), missing), Occurrence("", false, (-116.55416666666669, 34.82916666666668), missing), Occurrence("", false, (-116.53750000000002, 32.654166666666676), 
missing), Occurrence("", true, (-116.52916666666668, 33.95416666666668), missing), Occurrence("", false, (-116.50416666666669, 34.72083333333334), missing), Occurrence("", false, (-116.49583333333335, 35.11250000000001), missing), Occurrence("", false, (-116.4791666666667, 35.03750000000001), missing), Occurrence("", false, (-116.47083333333336, 34.82083333333334), missing), Occurrence("", false, (-116.37083333333335, 35.16250000000001), missing), Occurrence("", false, (-116.36250000000003, 32.80416666666668), missing), Occurrence("", false, (-116.32083333333335, 32.85416666666667), missing), Occurrence("", false, (-116.31250000000003, 33.14583333333334), missing), Occurrence("", false, (-116.29583333333335, 33.370833333333344), missing), Occurrence("", false, (-116.28750000000002, 33.45416666666667), missing), Occurrence("", false, (-116.28750000000002, 34.554166666666674), missing), Occurrence("", false, (-116.27916666666668, 32.66250000000001), missing), Occurrence("", false, (-116.27916666666668, 34.929166666666674), missing), Occurrence("", false, (-116.27083333333334, 34.53750000000001), missing), Occurrence("", false, (-116.27083333333334, 34.66250000000001), missing), Occurrence("", false, (-116.23750000000003, 35.054166666666674), missing), Occurrence("", false, (-116.22083333333336, 32.63750000000001), missing), Occurrence("", false, (-116.2041666666667, 32.654166666666676), missing), Occurrence("", false, (-116.14583333333334, 33.57916666666667), missing), Occurrence("", false, (-116.12916666666669, 33.60416666666667), missing), Occurrence("", false, (-116.09583333333336, 34.62916666666668), missing), Occurrence("", false, (-116.0791666666667, 33.38750000000001), missing), Occurrence("", false, (-116.04583333333335, 34.48750000000001), missing), Occurrence("", false, (-115.99583333333337, 33.20416666666668), missing), Occurrence("", false, (-115.99583333333337, 33.76250000000001), missing), Occurrence("", false, (-115.98750000000003, 33.00416666666668), 
missing), Occurrence("", false, (-115.98750000000003, 34.20416666666667), missing), Occurrence("", false, (-115.97916666666669, 34.37916666666668), missing), Occurrence("", false, (-115.94583333333335, 33.95416666666668), missing), Occurrence("", false, (-115.94583333333335, 34.21250000000001), missing), Occurrence("", false, (-115.93750000000003, 33.47916666666667), missing), Occurrence("", false, (-115.85416666666669, 33.41250000000001), missing), Occurrence("", false, (-115.82083333333335, 34.18750000000001), missing), Occurrence("", false, (-115.81250000000001, 32.995833333333344), missing), Occurrence("", false, (-115.76250000000003, 33.06250000000001), missing), Occurrence("", false, (-115.70416666666668, 34.370833333333344), missing), Occurrence("", false, (-115.68750000000001, 33.35416666666668), missing), Occurrence("", false, (-115.66250000000002, 34.07916666666667), missing), Occurrence("", false, (-115.62916666666669, 32.73750000000001), missing), Occurrence("", false, (-115.62916666666669, 32.78750000000001), missing), Occurrence("", false, (-115.62916666666669, 33.48750000000001), missing), Occurrence("", false, (-115.62083333333337, 34.929166666666674), missing), Occurrence("", false, (-115.60416666666669, 33.654166666666676), missing), Occurrence("", false, (-115.5541666666667, 32.72916666666668), missing), Occurrence("", false, (-115.46250000000002, 34.76250000000001), missing), Occurrence("", false, (-115.4291666666667, 33.245833333333344), missing), Occurrence("", false, (-115.39583333333336, 34.57083333333334), missing), Occurrence("", false, (-115.14583333333336, 33.86250000000001), missing), Occurrence("", false, (-115.02916666666668, 33.95416666666668), missing)])
We assign the tiles to folds, and then keep only the tiles that contain at least one presence or absence point:
SDT.assignfolds!(T; n = n, order = :horizontal)
S = SDT.keeprelevant(T, O)
FeatureCollection with 131 features, each with 5 properties
Generation of tiles
It is also possible to generate the tiles directly from the model. We use the keeprelevant approach here to show where the species data fall within the entire region.

Code for the figure
f = Figure()
ax = Axis(f[1, 1]; aspect = DataAspect())
lines!(ax, T; color = :grey70)
for i in 1:n
poly!(
ax,
S["__fold" => i];
alpha = 0.2,
color = i,
colorrange = (1, n),
colormap = folds_colors,
)
lines!(
ax,
S["__fold" => i];
color = i,
colorrange = (1, n),
colormap = folds_colors,
)
end
scatter!(ax, presences(O); color = :black)
scatter!(ax, absences(O); color = :grey40, marker = :cross, markersize = 8)
hidedecorations!(ax)
hidespines!(ax)
f
Before moving on, we assemble the model that we will use in the rest of this vignette:
model = SDM(RawData, NaiveBayes, L, O)
❎ RawData → NaiveBayes → P(x) ≥ 0.5 🗺️
We generate the training and validation folds from the tiles in S:
folds = spatialfold(model, S);
Spatial fold function
The spatialfold function can also be called with a single argument (a FeatureCollection of tiles). In that case it returns a closure that can be applied to a model, like any other function that generates dataset splits.
Because spatialfold returns the division of samples into training and validation sets in the format that crossvalidate expects, we can pass its output as the second argument to crossvalidate:
cv = crossvalidate(model, folds)
(validation = ConfusionMatrix[(tp: 0, fp: 0; fn: 2, tn: 56), (tp: 0, fp: 0; fn: 6, tn: 65), (tp: 0, fp: 0; fn: 16, tn: 41), (tp: 0, fp: 0; fn: 43, tn: 10), (tp: 0, fp: 0; fn: 31, tn: 23)], training = ConfusionMatrix[(tp: 0, fp: 0; fn: 96, tn: 139), (tp: 0, fp: 0; fn: 92, tn: 130), (tp: 0, fp: 0; fn: 82, tn: 154), (tp: 0, fp: 0; fn: 55, tn: 185), (tp: 0, fp: 0; fn: 67, tn: 172)])
Cross-validation
A separate vignette on cross-validation covers the main ways to work with cross-validation outputs, as well as non-spatial methods for splitting data.
measures = [mcc, SDeMo.specificity, SDeMo.sensitivity, balancedaccuracy]
cvresult = [measure(set) for measure in measures, set in cv]
nullresult = [measure(null(model)) for measure in measures, null in [coinflip, noskill]]
pretty_table(
hcat(string.(measures), hcat(cvresult, nullresult));
alignment = [:l, :c, :c, :c, :c],
backend = :markdown,
column_labels = ["Measure", "Validation", "Training", "Coin-flip", "No-skill"],
formatters = [fmt__printf("%5.3f", [2, 3, 4, 5])],
)
| Measure | Validation | Training | Coin-flip | No-skill |
|---|---|---|---|---|
| mcc | 0.000 | 0.000 | -0.331 | 0.000 |
| specificity | 1.000 | 1.000 | 0.334 | 0.666 |
| sensitivity | 0.000 | 0.000 | 0.334 | 0.334 |
| balancedaccuracy | 0.500 | 0.500 | 0.334 | 0.500 |
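You can check the validation column of this table by hand from the validation confusion matrices printed by crossvalidate. The sketch below is plain Julia and does not call SDeMo. It re-implements sensitivity, specificity, balanced accuracy, and the Matthews correlation coefficient from their textbook definitions. It makes two assumptions: each measure is averaged across folds, and MCC is set to 0 when its denominator is zero. The `cm_*` function names are hypothetical stand-ins, chosen to avoid clashing with SDeMo's exported `mcc`.

```julia
using Statistics: mean

# Validation confusion matrices from the crossvalidate output above,
# one (tp, fp, fn, tn) tuple per fold.
validation = [(0, 0, 2, 56), (0, 0, 6, 65), (0, 0, 16, 41), (0, 0, 43, 10), (0, 0, 31, 23)]

cm_sensitivity(tp, fp, fn, tn) = tp / (tp + fn)
cm_specificity(tp, fp, fn, tn) = tn / (tn + fp)
cm_balanced(tp, fp, fn, tn) = (cm_sensitivity(tp, fp, fn, tn) + cm_specificity(tp, fp, fn, tn)) / 2

# Matthews correlation coefficient; assumed to be 0 when the denominator
# vanishes (here because no presence is ever predicted, so tp + fp == 0).
function cm_mcc(tp, fp, fn, tn)
    den = sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return iszero(den) ? 0.0 : (tp * tn - fp * fn) / den
end

# Average each measure over the five validation folds and print it.
for (name, f) in [("mcc", cm_mcc), ("specificity", cm_specificity),
                  ("sensitivity", cm_sensitivity), ("balancedaccuracy", cm_balanced)]
    println(rpad(name, 18), round(mean(f(cm...) for cm in validation); digits = 3))
end
```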
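This is not a good model. In every fold, both the training and validation confusion matrices have tp = fp = 0: the model never predicts a presence. As a result, its sensitivity and MCC are zero and its balanced accuracy is 0.5, which is no better than the no-skill baseline. There are a few reasons for this. First, we have not done any variable selection. Second, the splits are likely to have very different class balances, which can bias both model training and the estimate of its performance.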
pr_by_fold = [sum(uniqueproperties(S["__fold" => i])["__presences"]) for i in 1:n]
ab_by_fold = [sum(uniqueproperties(S["__fold" => i])["__absences"]) for i in 1:n]
extrema(pr_by_fold ./ (pr_by_fold .+ ab_by_fold))
(0.034482758620689655, 0.8571428571428571)
The proportion of presences ranges from about 3% to about 86% across folds. The model is therefore trained and evaluated on class balances that are very different from that of the full dataset.
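The same spread shows up in the validation confusion matrices printed by crossvalidate. There, presences are tp + fn and absences are fp + tn. The plain-Julia sketch below recomputes the prevalence of each validation fold, and of the whole dataset. It assumes that each observation falls in exactly one validation fold, as the totals suggest. These are counts of points rather than sums of tile properties, so they need not match the extrema above exactly.

```julia
# (presences, absences) per validation fold, from the printed confusion matrices:
# presences = tp + fn and absences = fp + tn (tp = fp = 0 in every fold here).
fold_counts = [(2, 56), (6, 65), (16, 41), (43, 10), (31, 23)]

# Share of presences in each validation fold.
fold_prevalence = [p / (p + a) for (p, a) in fold_counts]

# Share of presences over all observations (each assumed to be validated once).
dataset_prevalence = sum(first, fold_counts) / sum(p + a for (p, a) in fold_counts)

println("per-fold prevalence: ", round.(fold_prevalence; digits = 3))
println("dataset prevalence:  ", round(dataset_prevalence; digits = 3))
```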
Creating folds with balance
We can instead assign the tiles to spatial folds that are optimized to have approximately the same class balance as the entire dataset.
SDT.assignfolds!(
S;
n = n,
order = :horizontal,
balanced = true,
)
FeatureCollection with 131 features, each with 5 properties
The class-balancing approach starts from an initial assignment (here, horizontally stratified bands). It then swaps tiles between folds until the distance between the class balance of each fold and that of the full dataset is as small as possible. Internally, this is done with a fast greedy algorithm.
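To make the idea concrete, here is a minimal sketch of swap-based greedy balancing in plain Julia. It is not the SpeciesDistributionToolkit implementation, and the function names are hypothetical. Three choices are assumptions: the distance (the sum over folds of the absolute gap between fold prevalence and dataset prevalence), taking the single best swap in each round, and stopping early when no swap helps. Because tiles are swapped rather than moved, every fold keeps its number of tiles.

```julia
# Hypothetical sketch of greedy, swap-based class balancing (not the SDT code).
# pr[i] and ab[i] are the presences and absences in tile i; folds[i] is its fold.

# Assumed distance: sum over folds of |fold prevalence - dataset prevalence|.
function imbalance(folds, pr, ab, n, target)
    total = 0.0
    for k in 1:n
        members = folds .== k
        p, a = sum(pr[members]), sum(ab[members])
        if p + a > 0
            total += abs(p / (p + a) - target)
        end
    end
    return total
end

function balance_folds!(folds, pr, ab; n = maximum(folds), maxiter = 2000)
    target = sum(pr) / (sum(pr) + sum(ab))
    score = imbalance(folds, pr, ab, n, target)
    for _ in 1:maxiter
        best = (score, 0, 0)
        # Try swapping every pair of tiles that sit in different folds; a swap
        # (rather than a move) keeps the number of tiles per fold unchanged.
        for i in eachindex(folds), j in (i + 1):lastindex(folds)
            folds[i] == folds[j] && continue
            folds[i], folds[j] = folds[j], folds[i]
            s = imbalance(folds, pr, ab, n, target)
            if s < best[1]
                best = (s, i, j)
            end
            folds[i], folds[j] = folds[j], folds[i]  # undo the trial swap
        end
        best[2] == 0 && break  # no swap lowers the imbalance: stop early
        bi, bj = best[2], best[3]
        folds[bi], folds[bj] = folds[bj], folds[bi]  # keep the best swap
        score = best[1]
    end
    return folds
end

# Six tiles, starting as two bands of three tiles each.
tile_presences = [5, 4, 0, 0, 1, 0]
tile_absences = [0, 1, 5, 5, 4, 5]
tile_folds = [1, 1, 1, 2, 2, 2]
balance_folds!(tile_folds, tile_presences, tile_absences)
println(tile_folds)
```

In this toy example, a single swap brings both folds to exactly one presence for every two absences, which is the dataset-wide ratio.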
The spatial structure is lost
The current implementation of the class-balancing algorithm does not attempt to preserve the spatial structure of the blocks, nor does it ensure that the folds end up with similar numbers of observations. It does, however, keep the number of tiles in each fold unchanged.
After optimizing the splits for class balance, we obtain a new division of the landscape:

Code for the figure
f = Figure()
ax = Axis(f[1, 1]; aspect = DataAspect())
lines!(ax, T; color = :grey70)
for i in 1:n
poly!(
ax,
S["__fold" => i];
alpha = 0.2,
color = i,
colorrange = (1, n),
colormap = folds_colors,
)
lines!(
ax,
S["__fold" => i];
color = i,
colorrange = (1, n),
colormap = folds_colors,
)
end
scatter!(ax, presences(O); color = :black)
scatter!(ax, absences(O); color = :grey40, marker = :cross, markersize = 8)
hidedecorations!(ax)
hidespines!(ax)
f
Using these balanced folds, we can now select the variables:
folds = SDT.spatialfold(model, S)
variables!(model, ForwardSelection, folds)
layers(RasterData(EarthEnv, LandCover))[variables(model)]
2-element Vector{String}:
 "Evergreen/Deciduous Needleleaf Trees"
 "Cultivated and Managed Vegetation"
As before, this model can be cross-validated:
cv = crossvalidate(model, folds);
We can measure the expected performance:
measures = [mcc, SDeMo.specificity, SDeMo.sensitivity, balancedaccuracy]
cvresult = [measure(set) for measure in measures, set in cv]
nullresult = [measure(null(model)) for measure in measures, null in [coinflip, noskill]]
pretty_table(
hcat(string.(measures), hcat(cvresult, nullresult));
alignment = [:l, :c, :c, :c, :c],
backend = :markdown,
column_labels = ["Measure", "Validation", "Training", "Coin-flip", "No-skill"],
formatters = [fmt__printf("%5.3f", [2, 3, 4, 5])],
)
| Measure | Validation | Training | Coin-flip | No-skill |
|---|---|---|---|---|
| mcc | 0.600 | 0.631 | -0.331 | 0.000 |
| specificity | 0.895 | 0.892 | 0.334 | 0.666 |
| sensitivity | 0.689 | 0.728 | 0.334 | 0.334 |
| balancedaccuracy | 0.792 | 0.810 | 0.334 | 0.500 |
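The variables behind these results were chosen by variables! with ForwardSelection on the balanced spatial folds. The sketch below illustrates what forward selection does in general. It starts from an empty set, repeatedly adds the variable that most improves a score, and stops when no addition helps. It is a hypothetical plain-Julia sketch, not SDeMo's implementation. In the vignette the score would presumably be a cross-validated measure computed on the folds; here a toy scoring function stands in for it.

```julia
# Hypothetical sketch of greedy forward selection (not the SDeMo implementation).
# `score` maps a vector of variable indices to a performance value (higher is better).
function forward_selection(score, candidates)
    selected = Int[]
    best = -Inf
    while true
        remaining = setdiff(candidates, selected)
        isempty(remaining) && break
        # Score every one-variable extension of the current selection.
        trials = [score(vcat(selected, v)) for v in remaining]
        top, k = findmax(trials)
        top <= best && break  # no candidate improves the score: stop
        best = top
        push!(selected, remaining[k])
    end
    return selected, best
end

# Toy score standing in for a cross-validated measure such as MCC:
# each variable carries some signal, and every extra variable costs 0.2.
signal = [0.1, 0.5, 0.0, 0.3]
toyscore(s) = sum(signal[s]) - 0.2 * length(s)

selected, best = forward_selection(toyscore, 1:4)
println("selected variables: ", selected)
```

Because the search is greedy, it can miss combinations of variables that only perform well together. This is one reason the selected set depends on the folds used to score it.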
Related documentation
SpeciesDistributionToolkit.assignfolds! Function
assignfolds!(H::FeatureCollection; n::Integer=10, order::Symbol=:random, group::Bool=true, balanced::Bool=false, maxiter::Integer = 2000)
Assigns the features in a tiling (or any other FeatureCollection) to n blocks for spatial cross-validation. Note that the features in H must have a __centroid property which indicates where the center of each cell is.
The order keyword determines how the tiles are assigned. When order is :horizontal or :vertical, the tiles are assigned in horizontal or vertical order, respectively. In this case, the group keyword determines how the folds are arranged. When group is true (the default), folds are spatially contiguous. When group is false, folds alternate spatially.
When order is :random, the tiles are assigned fully at random.
When balanced is true, the tiles must have both __presences and __absences properties. A greedy algorithm then swaps tiles across folds (for up to maxiter rounds) until all folds are as close as possible to the class balance of the entire dataset. Note that this step is applied after the previous steps (order / group).
This method modifies the feature collection by adding a __fold property to each tile, which can then be used with spatialfold.
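The difference between contiguous (group = true) and alternating (group = false) folds can be sketched in plain Julia. This hypothetical function ranks tiles along a single coordinate, which stands in for whichever centroid coordinate :horizontal or :vertical uses. How assignfolds! actually sorts tiles and breaks ties is not documented here, so treat this as an illustration only.

```julia
# Hypothetical sketch of ordered fold assignment (not the SDT implementation).
# coord[i] is the position of tile i along the ordering direction.
function ordered_folds(coord::AbstractVector{<:Real}, n::Integer; group::Bool = true)
    ranking = sortperm(coord)  # tiles from lowest to highest coordinate
    N = length(coord)
    folds = zeros(Int, N)
    for (rank, tile) in enumerate(ranking)
        # Contiguous: consecutive runs of about N / n tiles share a fold.
        # Alternating: fold labels cycle 1, 2, ..., n, 1, 2, ...
        folds[tile] = group ? cld(rank * n, N) : mod1(rank, n)
    end
    return folds
end

coord = [0.5, 3.2, 1.1, 2.7, 0.1, 1.9]
println(ordered_folds(coord, 3))                 # contiguous folds
println(ordered_folds(coord, 3; group = false))  # alternating folds
```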
SpeciesDistributionToolkit.spatialfold Function
spatialfold(model::SDM, blocks::FeatureCollection)
Returns a series of (training, validation) folds, as a vector of tuples of vectors. This is the same output returned by other cross-validation functions such as kfold and leaveoneout.
The folds are assigned by looking at the "__fold" property of the feature collection, which is typically set by assignfolds!.
spatialfold(blocks::FeatureCollection)
Creates a closure which, when applied to a model, returns the (training, validation) folds.
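The output format described above, a vector of (training, validation) tuples of vectors, can be built from per-observation fold labels as sketched below. This sketch assumes the vectors hold observation indices. It is a hypothetical plain-Julia illustration. In spatialfold, the labels would presumably come from the "__fold" property of the tile that contains each observation, a step that is not reproduced here. The closure mirrors the one-argument form of spatialfold: the labelling rule is fixed first and applied to data later.

```julia
# Hypothetical sketch: build (training, validation) index pairs from fold labels.
# obs_fold[i] is the fold of observation i; each fold is used once for validation.
function splits_from_labels(obs_fold::AbstractVector{<:Integer})
    return [(findall(!=(k), obs_fold), findall(==(k), obs_fold)) for k in sort(unique(obs_fold))]
end

# Closure form: fix how labels are obtained now, apply to data later.
make_splitter(labelfn) = data -> splits_from_labels(labelfn(data))

obs_fold = [1, 2, 1, 3, 2]
splits = splits_from_labels(obs_fold)
splitter = make_splitter(d -> d.fold)  # hypothetical: the data carry their own labels
println(splits)
```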