Working with GBIF data
Justification for this use case: building species distribution models requires information of where species are. In this document, we will see how the SimpleSDMLayers
and GBIF
packages interact, as a first step to get in a usable state. Specifically, we will work on the occurences of the relationship between temperature and precipitation for a few occurrences of the fungus Hypomyces lactifluorum, which will be the taxon used for all SDM-related vignettes.
using SimpleSDMLayers
using GBIF
using Plots
using Statistics
We will focus on showing where on the temperature/precipitation space the occurrences are, so we will download these layers:
temperature, precipitation = SimpleSDMPredictor(WorldClim, BioClim, [1, 12])
2-element Vector{SimpleSDMPredictor{Float32}}:
SDM predictor → 1080×2160 grid with 808053 Float32-valued cells
SDM predictor → 1080×2160 grid with 808053 Float32-valued cells
And now, we can follow the GBIF documentation to get some occurrences
observations = occurrences(
GBIF.taxon("Hypomyces lactifluorum"; strict=true),
"hasCoordinate" => "true",
"country" => "CA",
"country" => "US",
"limit" => 300,
)
while length(observations) < size(observations)
occurrences!(observations)
end
@info observations
[ Info: GBIF records: downloaded 2921 out of 2921
We can then extract the temperature for the first occurrence:
temperature[observations[1]]
11.541479f0
Of course, it would be unwieldy to do this for every occurrence in our dataset, and so we will see a way do it much faster. But first, we do not need the entire surface of the planet to perform our analysis, and so we will instead clip the layers:
temperature = clip(temperature, observations)
precipitation = clip(precipitation, observations)
SDM predictor → 278×712 grid with 100109 Float32-valued cells
Latitudes 24.666666666666668 ⇢ 71.0
Longitudes -169.0 ⇢ -50.33333333333333
This will make the future queries a little faster. By default, the clip
function will ad a 5% margin on every side. To get the values of a layer at every occurrence in a GBIFRecord
, we simply pass the records as a position:
histogram2d(temperature, precipitation; c=:devon, frame=:zerolines, leg=false, lab="")
scatter!(
temperature[observations],
precipitation[observations];
lab="",
c=:white,
msc=:orange,
alpha=0.2,
)
xaxis!("Temperature (°C)")
yaxis!("Precipitation (mm)")
This will return a record of all data for all geo-localized occurrences (i.e. neither the latitude nor the longitude is missing
) in a GBIFRecords
collection, as an array of the eltype
of the layer. Note that the layer values can be nothing
, in which case you might need to run filter(!isnothing, temperature_clip[kf_occurrences]
for it to work with the plotting functions.
We can also plot the records over space, using the overloads of the latitudes
and longitudes
functions:
contour(temperature; c=:cork, frame=:box, fill=true, clim=(-20, 20), levels=6)
scatter!(
longitudes(observations), latitudes(observations); lab="", c=:white, msc=:orange, ms=2
)
We can finally make a layer with the number of observations per cells:
presabs = mask(temperature, observations, Float32)
plot(log1p(presabs); c=:tokyo)
Because the cells are rather small, and there are few observations, this is not necessarily going to be very informative - to get a better sense of the distribution of observations, we can get the average number of observations in a radius of 100km around each cell (we will do so for a zoomed-in part of the map to save time):
zoom = clip(presabs; left=-80.0, right=-65.0, top=50.0, bottom=40.0)
buffered = slidingwindow(zoom, Statistics.mean, 100.0)
plot(buffered; c=:tokyo, frame=:box)
scatter!(
longitudes(observations),
latitudes(observations);
lab="",
c=:white,
msc=:orange,
ms=2,
alpha=0.5,
)