An introduction to BiodiversityObservationNetworks

In this vignette, we walk through the basic functionality of the package: we generate a random uncertainty matrix, then use a seeder and a refiner to decide which locations should be sampled in order to learn more about the process generating this entropy.

using BiodiversityObservationNetworks
using NeutralLandscapes
using CairoMakie

To keep things simple, we use the NeutralLandscapes package to generate a 100×100 pixel landscape, in which each cell represents the entropy (or information content) of a unit we can sample:

U = rand(MidpointDisplacement(0.5), (100, 100))
heatmap(U)

In practice, this uncertainty matrix would typically be derived from the hyper-parameter optimization step, which is detailed in other vignettes.
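To make "entropy" concrete, one common choice (an assumption on our part, not necessarily what the other vignettes use) is the Shannon entropy of a per-cell predicted probability of presence p, H(p) = -p log2(p) - (1 - p) log2(1 - p). It is 0 bits where the model is certain and peaks at 1 bit where p = 0.5. The sketch below applies it element-wise to a small, hypothetical probability matrix P; binary_entropy is a hypothetical helper, not part of the package.

```julia
# Hypothetical helper: binary Shannon entropy (in bits) of a predicted probability `p`.
# Cells where the model is certain (p = 0 or p = 1) carry no information.
function binary_entropy(p::Real)
    (p <= 0 || p >= 1) && return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)
end

# Hypothetical 3×3 matrix of predicted probabilities of presence.
P = [0.5 0.9 0.1;
     0.0 0.5 0.7;
     1.0 0.3 0.5]

# Broadcasting gives an uncertainty matrix with the same shape as P.
H = binary_entropy.(P)
```

Cells with p = 0.5 get the maximum value of 1.0, so a sampler working on H would favor exactly the places where the model is least sure.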

The first step in defining a set of locations to sample is to use a BONSeeder, which generates a number of relatively coarse proposals that cover the entire landscape and are balanced in space. Here we use the BalancedAcceptance seeder, which can be tuned to capture more (or less) uncertainty. To start, we extract 200 candidate points, i.e. 200 possible locations that will then be refined.
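In its usual formulation, balanced acceptance sampling draws points from a random-start Halton sequence, which spreads them evenly over space; an extra Halton coordinate compared with each cell's normalized inclusion weight lets high-weight cells be accepted more often. The following is a simplified sketch of that idea in plain Julia, written under our own assumptions. The helper names radical_inverse and halton_seed are hypothetical, and this is not the package's implementation of BalancedAcceptance.

```julia
using Random

# Radical inverse of the integer `i` in base `b`: the building block of a Halton sequence.
function radical_inverse(i::Integer, b::Integer)
    f, r = 1.0, 0.0
    while i > 0
        f /= b
        r += f * (i % b)
        i ÷= b
    end
    return r
end

# Hypothetical helper (not the package's API): pick `n` distinct cells of `W` using a
# random-start Halton sequence. Coordinates in bases 2 and 3 give a spatially balanced
# position; the base-5 coordinate is compared with the cell's normalized weight, so
# cells with larger values of `W` are accepted more often.
# Assumes `W` is non-negative with at least `n` strictly positive cells.
function halton_seed(W::AbstractMatrix, n::Integer; rng = Random.default_rng())
    nrow, ncol = size(W)
    wmax = maximum(W)
    cells = CartesianIndex{2}[]
    i = rand(rng, 0:10_000)                  # random start in the sequence
    while length(cells) < n
        x, y, z = radical_inverse(i, 2), radical_inverse(i, 3), radical_inverse(i, 5)
        c = CartesianIndex(floor(Int, x * nrow) + 1, floor(Int, y * ncol) + 1)
        if z < W[c] / wmax && !(c in cells)  # acceptance step, skipping duplicates
            push!(cells, c)
        end
        i += 1
    end
    return cells
end

W = rand(MersenneTwister(1), 100, 100)       # stand-in for an uncertainty matrix
pts = halton_seed(W, 200; rng = MersenneTwister(42))
```

Cells with low weights can still be chosen, only less often; if all weights are equal, every proposal is accepted and the sketch reduces to a plain spatially balanced Halton draw.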

pack = seed(BalancedAcceptance(; numpoints = 200), U);

(CartesianIndex[CartesianIndex(33, 52), CartesianIndex(83, 85), CartesianIndex(21, 30), CartesianIndex(46, 96), CartesianIndex(5, 44), CartesianIndex(67, 33), CartesianIndex(11, 2), CartesianIndex(36, 68), CartesianIndex(24, 46), CartesianIndex(74, 79)  …  CartesianIndex(28, 62), CartesianIndex(78, 96), CartesianIndex(16, 10), CartesianIndex(66, 44), CartesianIndex(91, 22), CartesianIndex(9, 55), CartesianIndex(59, 88), CartesianIndex(84, 66), CartesianIndex(72, 1), CartesianIndex(47, 34)], [0.3691706464677344 0.4429044910426994 … 0.45886884963775637 0.4904602020021396; 0.3296539323902518 0.32976222566025065 … 0.41733797835197195 0.42925871367938806; … ; 0.6825088482584936 0.6509222027336945 … 0.3847779648133036 0.3965383410501949; 0.6087713955544382 0.6480963131288376 … 0.38940970816372344 0.4037758777481499])

The output of a BONSampler (at either the seeding or the refinement step) is always a tuple: the first element is a vector of CartesianIndex values, and the second is the matrix that was given as input. We can look at the first five points:

first(pack)[1:5]

5-element Vector{CartesianIndex}:
 CartesianIndex(33, 52)
 CartesianIndex(83, 85)
 CartesianIndex(21, 30)
 CartesianIndex(46, 96)
 CartesianIndex(5, 44)
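Because the locations are CartesianIndex values, they can index the returned matrix directly or be split into row and column coordinates. The self-contained sketch below uses a small hypothetical matrix M and hand-picked indices as a stand-in for the seeder's output; only standard Julia indexing is involved.

```julia
# Hypothetical stand-in for a sampler's output: (locations, matrix).
M = reshape(collect(1.0:12.0), 3, 4)   # 3×4 matrix filled column-wise with 1.0 to 12.0
cells = [CartesianIndex(1, 1), CartesianIndex(3, 2), CartesianIndex(2, 4)]
pack_demo = (cells, M)

locs, mat = pack_demo          # destructure the tuple
vals = mat[locs]               # index the matrix with a vector of CartesianIndex
coords = Tuple.(locs)          # convert each index to a (row, column) tuple
rows = [c[1] for c in locs]    # row indices
cols = [c[2] for c in locs]    # column indices
```

Here vals is [1.0, 6.0, 11.0], i.e. the entries of the matrix at the three selected cells.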

Although returning the input matrix may seem redundant, it makes it possible to chain samplers together into pipelines that take a matrix as input and return a set of locations to sample as output; an example is given below.
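The benefit of always returning (locations, matrix) is that the output of one step is a valid input to the next. The sketch below illustrates this with two hypothetical toy functions, toy_seed and toy_refine, that only mimic the shape of the package's outputs; they are not the package's samplers.

```julia
# Hypothetical toy "seeder": every `step`-th cell in each direction, returned with the matrix.
toy_seed(M::AbstractMatrix; step = 2) =
    (vec([CartesianIndex(i, j) for i in 1:step:size(M, 1), j in 1:step:size(M, 2)]), M)

# Hypothetical toy "refiner": keep the `k` candidates with the highest values in the matrix.
function toy_refine(pack::Tuple; k = 3)
    cells, M = pack
    best = partialsortperm(M[cells], 1:k; rev = true)
    return (cells[best], M)
end

M = [0.1 0.9 0.2  0.8;
     0.3 0.4 0.5  0.6;
     0.7 0.0 0.95 0.05]

# Each step returns (locations, matrix), so the calls nest without any glue code.
locs, _ = toy_refine(toy_seed(M); k = 2)
```

The seeder proposes the cells (1, 1), (3, 1), (1, 3) and (3, 3), and the refiner keeps the two with the highest values, (3, 3) and (3, 1).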

The positions of the locations to sample are given as a vector of CartesianIndex, i.e. (row, column) coordinates in the uncertainty matrix. Once we have generated a candidate proposal, we can refine it further using a BONRefiner: here, AdaptiveSpatial, which performs adaptive spatial sampling (favoring locations with high entropy while minimizing spatial autocorrelation between them).

candidates, uncertainty = pack
locations, _ = refine(candidates, AdaptiveSpatial(; numpoints = 50), uncertainty)
locations[1:5]

5-element Vector{CartesianIndex}:
 CartesianIndex(76, 97)
 CartesianIndex(78, 96)
 CartesianIndex(16, 10)
 CartesianIndex(82, 94)
 CartesianIndex(21, 12)
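To build some intuition for the trade-off between high entropy and spatial autocorrelation, here is a deliberately simple greedy heuristic. It is a swapped-in illustration, not the AdaptiveSpatial algorithm; the function name greedy_spatial and its scale parameter are hypothetical. At each step it picks the candidate whose uncertainty, discounted by its distance to the points already chosen, is highest.

```julia
# Hypothetical greedy refiner, NOT the package's AdaptiveSpatial algorithm.
# `scale` (in cells) controls how quickly the proximity penalty fades with distance.
function greedy_spatial(cells, U::AbstractMatrix, n::Integer; scale = 10.0)
    remaining = collect(cells)
    chosen = eltype(remaining)[]
    while length(chosen) < n && !isempty(remaining)
        scores = map(remaining) do c
            # Euclidean distance (in cells) to the closest point selected so far.
            d = isempty(chosen) ? Inf : minimum(hypot(Tuple(c - s)...) for s in chosen)
            # Cells close to a selected point have their uncertainty heavily discounted.
            U[c] * (1 - exp(-d / scale))
        end
        push!(chosen, popat!(remaining, argmax(scores)))
    end
    return chosen
end

Udemo = [0.9 0.85 0.1;
         0.8 0.2  0.1;
         0.1 0.1  0.7]
picked = greedy_spatial(vec(CartesianIndices(Udemo)), Udemo, 2; scale = 1.0)
# → [CartesianIndex(1, 1), CartesianIndex(3, 3)]
```

Picking the two highest values would return (1, 1) and (1, 2), which are neighbors; the distance penalty instead moves the second point to the far corner, which is the kind of spatial spread a refiner aims for.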

We start from a candidate set of points because some algorithms struggle with full landscapes and work much better on a sub-sample of them. There is no hard rule (or even a heuristic) for how many points should be generated at the seeding step, so experimentation is a must!

The previous examples used a form of the seed and refine functions that is useful when you want to change arguments between steps or examine the candidate pool of points. In addition, both functions have a curried version that allows them to be chained using pipes (|>):

locations =
    U |>
    seed(BalancedAcceptance(; numpoints = 200)) |>
    refine(AdaptiveSpatial(; numpoints = 50)) |>
    first

50-element Vector{CartesianIndex}:
 CartesianIndex(74, 96)
 CartesianIndex(73, 84)
 CartesianIndex(72, 86)
 CartesianIndex(72, 98)
 CartesianIndex(69, 83)
 CartesianIndex(66, 80)
 CartesianIndex(64, 77)
 CartesianIndex(61, 75)
 CartesianIndex(70, 90)
 CartesianIndex(76, 92)
 ⋮
 CartesianIndex(51, 50)
 CartesianIndex(76, 27)
 CartesianIndex(13, 61)
 CartesianIndex(88, 42)
 CartesianIndex(32, 53)
 CartesianIndex(70, 64)
 CartesianIndex(95, 3)
 CartesianIndex(16, 80)
 CartesianIndex(66, 25)
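A curried method is usually just a one-argument method that returns a closure waiting for the data, in the same spirit as Base's `>(x)`. The sketch below illustrates the pattern with two hypothetical toy functions, keep_above and take_first; we have not checked it against the package source, and it is not how seed and refine are necessarily written.

```julia
# Hypothetical toy functions illustrating the curried pattern (not the package's code).
# Full forms: take the data and the settings together, return (locations, matrix).
keep_above(M::AbstractMatrix, threshold) = (findall(>(threshold), M), M)
take_first(pack::Tuple, n) = (first(pack[1], n), pack[2])

# Curried forms: given only the settings, return a function that waits for the data.
keep_above(threshold::Real) = M -> keep_above(M, threshold)
take_first(n::Integer) = pack -> take_first(pack, n)

M = [0.2 0.9;
     0.7 0.4]

locs = M |> keep_above(0.3) |> take_first(2) |> first
# → [CartesianIndex(2, 1), CartesianIndex(1, 2)]
```

Dispatch on the number of arguments is what lets the same name serve both the step-by-step and the piped style.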

Each curried call returns a function that takes the output of the previous step, and the final call to first extracts the vector of locations from the returned tuple. The proposed sampling locations can then be overlaid on the original uncertainty matrix:

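fig, ax, hm = heatmap(U)
# Overlay the proposed locations: row index on x, column index on y, matching heatmap(U)
scatter!(ax, [x[1] for x in locations], [x[2] for x in locations]; color = :white, markersize = 6)
fig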