
The dataset interface

This page is meant for contributors to the package, and specifically provides information on the interface, what to overload, and why.

All of the methods that form the interface have two versions: one for current data, and one for future data. The default behavior of the interface is for the version on future data to fall back to the version for current data (i.e. we assume that future data are provided with the same format as current data). This means that most of the functions will not need to be overloaded when adding a provider with support for future data.

The interface relies on Julia's multiple dispatch: when several methods match a call, Julia uses the most specific one, and falls back to the more generic ones only when no specific method applies. A good example is the BioClim dataset, which is provided by a number of sources and often has different URLs and filenames. This is handled (in e.g. CHELSA2) by writing a method for the general case of any dataset RasterData{CHELSA2,T} (using a Union type), and then a specific method on RasterData{CHELSA2,BioClim}. In the case of CHELSA2, the general method handles all datasets except BioClim, which makes the code much easier to write and maintain.
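The dispatch rule described above can be illustrated with a minimal, self-contained sketch. The types below are stand-ins defined only for this example (the package has its own definitions), and naming_scheme is a hypothetical function, not part of the interface.

```julia
# Stand-in types for illustration only; the package defines its own versions.
abstract type RasterProvider end
abstract type RasterDataset end

struct CHELSA2 <: RasterProvider end
struct BioClim <: RasterDataset end
struct AverageTemperature <: RasterDataset end

struct RasterData{P <: RasterProvider, D <: RasterDataset} end

# Hypothetical function: which naming scheme applies to a dataset.
# General case: any dataset served by CHELSA2.
naming_scheme(::RasterData{CHELSA2, D}) where {D <: RasterDataset} = :general

# Special case: BioClim has its own method, which Julia prefers
# because its signature is more specific than the general one.
naming_scheme(::RasterData{CHELSA2, BioClim}) = :bioclim

general = naming_scheme(RasterData{CHELSA2, AverageTemperature}())
special = naming_scheme(RasterData{CHELSA2, BioClim}())
```

Only the BioClim call reaches the specific method; every other CHELSA2 dataset is handled by the general one.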

Compatibility between datasets and providers

The inner constructor for RasterData involves a call to provides, which must return true for the type to be constructed. The generic method for provides returns false, so additional provider/dataset pairs must be overloaded to return true in order for the corresponding RasterData type to exist.

In practice, especially when there are multiple datasets for a single provider, the easiest way is to define a Union type and overload based on membership in this union, as touched upon earlier in this document.

SimpleSDMDatasets.provides Function
julia
provides(::Type{P}, ::Type{D}) where {P <: RasterProvider, D <: RasterDataset}

This is the core function upon which the entire interface is built. Its purpose is to specify whether a specific dataset is provided by a specific provider. Note that this function takes two arguments, as opposed to a RasterData argument, because it is called in the inner constructor of RasterData: you cannot instantiate a RasterData with an incompatible provider/dataset combination.

The default value of this function is false, and to allow the use of a dataset with a provider, it is required to overload it for this specific pair so that it returns true.

source

julia
provides(::R, ::F) where {R <: RasterData, F <: Future}

This method for provides specifies whether a RasterData combination has support for the value of the Future (a combination of a FutureScenario and a FutureModel) given as the second argument. Note that this function is not called as part of the Future constructor (because models and scenarios are messy and dataset dependent), but is still called when requesting data.

The default value of this function is false, and to allow the use of a future dataset with a given provider, it is required to overload it so that it returns true.

source
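As a rough sketch of how the provider/dataset check can work, the example below uses stand-in types, a hypothetical provider (ExampleProvider), and a simplified RasterData whose inner constructor calls provides. It is an assumption-laden illustration of the pattern, not the package's actual source.

```julia
# Stand-in types for illustration only; the package defines its own versions.
abstract type RasterProvider end
abstract type RasterDataset end

struct ExampleProvider <: RasterProvider end  # hypothetical provider
struct BioClim <: RasterDataset end
struct Elevation <: RasterDataset end
struct LandCover <: RasterDataset end

# Generic fallback: no pair is supported unless a more specific method says so.
provides(::Type{P}, ::Type{D}) where {P <: RasterProvider, D <: RasterDataset} = false

# One overload covers every dataset in the union.
const ExampleDatasets = Union{BioClim, Elevation}
provides(::Type{ExampleProvider}, ::Type{D}) where {D <: ExampleDatasets} = true

# Simplified RasterData: the inner constructor refuses unsupported pairs.
struct RasterData{P <: RasterProvider, D <: RasterDataset}
    provider::Type{P}
    dataset::Type{D}
    function RasterData(provider::Type{P}, dataset::Type{D}) where {P <: RasterProvider, D <: RasterDataset}
        provides(provider, dataset) || error("$(dataset) is not provided by $(provider)")
        return new{P, D}(provider, dataset)
    end
end

ok = RasterData(ExampleProvider, Elevation)  # constructed
# RasterData(ExampleProvider, LandCover)     # would throw an error
```

Adding a dataset to the hypothetical provider then only requires extending the union.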

Type of object downloaded

The format of the downloaded files is specified by downloadtype. By default, we assume that a request to a usable dataset returns a single file, but this can be overloaded for providers that return an archive.

SimpleSDMDatasets.downloadtype Function
julia
downloadtype(::R) where {R <: RasterData}

This method returns a RasterDownloadType that is used internally to be more explicit about the type of object that is downloaded from the raster source. The supported values are _file (the default, which is a single file in ASCII, GeoTIFF, NetCDF, etc. format), and _zip (a zip archive containing files). This is a trait because we cannot trust file extensions.

source

julia
downloadtype(data::R, ::F) where {R <: RasterData, F <: Future}

This method provides the type of the downloaded object for a combination of a raster source and a future scenario as a RasterDownloadType.

If no overload is given, this will default to downloadtype(data), as we can assume that the type of downloaded object is the same for both current and future scenarios.

source

The value returned by downloadtype must be a member of the RasterDownloadType enum, which can be extended if adding a new provider requires a new format for the download.
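The fallback from future data to current data can be sketched as follows. All types here are stand-ins (ArchiveProvider, PlainProvider, and ExampleFuture are hypothetical), and the enum is redefined locally so that the example is self-contained.

```julia
# Stand-in types for illustration only; the package defines its own versions.
abstract type RasterProvider end
abstract type RasterDataset end
abstract type Future end

struct ArchiveProvider <: RasterProvider end  # hypothetical: serves zip archives
struct PlainProvider <: RasterProvider end    # hypothetical: serves single files
struct BioClim <: RasterDataset end
struct ExampleFuture <: Future end            # hypothetical future scenario

struct RasterData{P <: RasterProvider, D <: RasterDataset} end

@enum RasterDownloadType _file _zip

# Default for current data: a single file.
downloadtype(::R) where {R <: RasterData} = _file

# Default for future data: reuse the answer for current data.
downloadtype(data::R, ::F) where {R <: RasterData, F <: Future} = downloadtype(data)

# A single overload on current data is enough for the archive provider.
downloadtype(::RasterData{ArchiveProvider, D}) where {D <: RasterDataset} = _zip

archived = downloadtype(RasterData{ArchiveProvider, BioClim}(), ExampleFuture())
plain = downloadtype(RasterData{PlainProvider, BioClim}(), ExampleFuture())
```

Because the two-argument method forwards to the one-argument method, the future version picks up the archive overload without any extra code.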

Type of object stored

The format of the information contained in the downloaded object is specified by filetype. By default, we assume that a request to a usable dataset returns a tiff, but this can be overloaded for providers that return data in another format. Note that if the download type is an archive, the file type describes the format of the files within the archive.

SimpleSDMDatasets.filetype Function
julia
filetype(::R) where {R <: RasterData}

This method returns a RasterFileType that represents the format of the raster data. RasterFileType is an enumerated type. This overload is particularly important as it will determine how the returned file path should be read.

The default value is _tiff.

source

julia
filetype(data::R, ::F) where {R <: RasterData, F <: Future}

This method provides the format of the stored raster for a combination of a raster source and a future scenario as a RasterFileType.

If no overload is given, this will default to filetype(data), as we can assume that the raster format is the same for both current and future scenarios.

source

The value returned by filetype must be a member of the RasterFileType enum, which can be extended if adding a new provider requires a new file format.

Available resolutions

SimpleSDMDatasets.resolutions Function
julia
resolutions(::R) where {R <: RasterData}

This method specifies which resolutions (i.e. grid sizes) the dataset is available at. If it returns nothing (the default), the dataset is only given at a single, fixed resolution.

An overload of this method is required when there are multiple resolutions available, and must return a Dict with numeric keys (for the resolution) and a string value giving an explanation of the resolution.

Any dataset with a return value that is not nothing must accept the resolution keyword.

source

julia
resolutions(data::R, ::F) where {R <: RasterData, F <: Future}

This method controls the resolutions for a future dataset. Unless overloaded, it will return resolutions(data).

source
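A resolutions overload might look like the sketch below. The types are stand-ins, ExampleProvider is hypothetical, and the resolution values are made up for illustration rather than taken from any real provider.

```julia
# Stand-in types for illustration only; the package defines its own versions.
abstract type RasterProvider end
abstract type RasterDataset end
abstract type Future end

struct ExampleProvider <: RasterProvider end  # hypothetical provider
struct BioClim <: RasterDataset end
struct Elevation <: RasterDataset end
struct ExampleFuture <: Future end            # hypothetical future scenario

struct RasterData{P <: RasterProvider, D <: RasterDataset} end

# Defaults: a single fixed resolution, and future data mirrors current data.
resolutions(::R) where {R <: RasterData} = nothing
resolutions(data::R, ::F) where {R <: RasterData, F <: Future} = resolutions(data)

# Numeric keys give the resolution, string values explain it (illustrative values).
resolutions(::RasterData{ExampleProvider, BioClim}) = Dict(
    0.5 => "30 arc seconds",
    2.5 => "2.5 arc minutes",
    10.0 => "10 arc minutes",
)

current = resolutions(RasterData{ExampleProvider, BioClim}())
future = resolutions(RasterData{ExampleProvider, BioClim}(), ExampleFuture())
fixed = resolutions(RasterData{ExampleProvider, Elevation}())
```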

Available layers

SimpleSDMDatasets.layers Function
julia
layers(::R) where {R <: RasterData}

This method controls whether the dataset has named layers. If this is nothing (the default), it means that the dataset will have a single layer.

An overload of this method is required when there are multiple layers available, and must return a Vector, usually of String. Note that by default, the layers can also be accessed by using an Integer, in which case layer=i will be the i-th entry in layers(data).

Any dataset with a return value that is not nothing must accept the layer keyword.

source

SimpleSDMDatasets.layerdescriptions Function
julia
layerdescriptions(data::R) where {R <: RasterData}

Human-readable names of the layers. This must be a dictionary mapping the layer names (as returned by layers) to a string explaining the contents of the layers.

source
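The relationship between layers, layerdescriptions, and access by integer can be sketched as below. The types are stand-ins, ExampleProvider is hypothetical, and resolve_layer is a helper invented for this example to show how an integer maps to an entry of layers(data); the package may resolve the keyword differently.

```julia
# Stand-in types for illustration only; the package defines its own versions.
abstract type RasterProvider end
abstract type RasterDataset end

struct ExampleProvider <: RasterProvider end  # hypothetical provider
struct BioClim <: RasterDataset end

struct RasterData{P <: RasterProvider, D <: RasterDataset} end

# Default: a single, unnamed layer.
layers(::R) where {R <: RasterData} = nothing

# Overload: a subset of the BioClim variables, for brevity.
layers(::RasterData{ExampleProvider, BioClim}) = ["BIO1", "BIO2", "BIO12"]

# Every name returned by layers gets a human-readable description.
layerdescriptions(::RasterData{ExampleProvider, BioClim}) = Dict(
    "BIO1" => "Annual mean temperature",
    "BIO2" => "Mean diurnal range",
    "BIO12" => "Annual precipitation",
)

# Hypothetical helper: accept a layer either by position or by name.
resolve_layer(data, layer::Integer) = layers(data)[layer]
function resolve_layer(data, layer::AbstractString)
    layer in layers(data) || error("unknown layer $(layer)")
    return layer
end

data = RasterData{ExampleProvider, BioClim}()
by_index = resolve_layer(data, 3)
by_name = resolve_layer(data, "BIO12")
```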

Available months

SimpleSDMDatasets.months Function
julia
months(::R) where {R <: RasterData}

This method controls whether the dataset has monthly layers. If this is nothing (the default), it means that the dataset is not accessible at a monthly resolution.

An overload of this method is required when there are multiple months available, and must return a Vector{Dates.Month}.

Any dataset with a return value that is not nothing must accept the month keyword.

source
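For a dataset available for every month of the year, an overload could be as short as the following sketch. The types are stand-ins and ExampleProvider is hypothetical.

```julia
using Dates

# Stand-in types for illustration only; the package defines its own versions.
abstract type RasterProvider end
abstract type RasterDataset end

struct ExampleProvider <: RasterProvider end  # hypothetical provider
struct AverageTemperature <: RasterDataset end
struct Elevation <: RasterDataset end

struct RasterData{P <: RasterProvider, D <: RasterDataset} end

# Default: no monthly layers.
months(::R) where {R <: RasterData} = nothing

# Overload: one layer for each month of the year.
months(::RasterData{ExampleProvider, AverageTemperature}) = Month.(1:12)

monthly = months(RasterData{ExampleProvider, AverageTemperature}())
static = months(RasterData{ExampleProvider, Elevation}())
```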

Available years

SimpleSDMDatasets.timespans Function
julia
timespans(data::R, ::F) where {R <: RasterData, F <: Future}

For datasets with a Future scenario, this method should return a Vector of Pairs, which are formatted as

Year(start) => Year(end)

There is a method working on a single RasterData argument, defaulting to returning nothing, but it should never be overloaded.

source
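The expected return value can be sketched as below. The types are stand-ins, ExampleProvider and ExampleFuture are hypothetical, and the year ranges are illustrative only.

```julia
using Dates

# Stand-in types for illustration only; the package defines its own versions.
abstract type RasterProvider end
abstract type RasterDataset end
abstract type Future end

struct ExampleProvider <: RasterProvider end  # hypothetical provider
struct BioClim <: RasterDataset end
struct ExampleFuture <: Future end            # hypothetical future scenario

struct RasterData{P <: RasterProvider, D <: RasterDataset} end

# Single-argument method: returns nothing and is left untouched.
timespans(::R) where {R <: RasterData} = nothing

# Overload for future data: a Vector of Year(start) => Year(end) pairs.
timespans(::RasterData{ExampleProvider, BioClim}, ::F) where {F <: Future} = [
    Year(2021) => Year(2040),
    Year(2041) => Year(2060),
]

spans = timespans(RasterData{ExampleProvider, BioClim}(), ExampleFuture())
```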

Additional keyword arguments

SimpleSDMDatasets.extrakeys Function
julia
extrakeys(::R) where {R <: RasterData}

This method controls whether the dataset has additional keys. If this is nothing (the default), it means that the dataset can be accessed using only the default keys specified in this interface.

An overload of this method is required when there are additional keywords needed to access the data (e.g. full=true for the EarthEnv land-cover data), and must return a Dict, with Symbol keys and Tuples of pairs as values.

The key is the keyword argument passed to downloader and the tuple lists all accepted values, in the format value => explanation.

Any dataset with a return value that is not nothing must accept the keyword arguments specified in the return value.

source
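The shape of the return value (Symbol keys, with a Tuple of value => explanation pairs) can be sketched as follows. The types are stand-ins, ExampleProvider is hypothetical, the explanations are invented, and accepts_keyword is a helper written for this example rather than a function of the package.

```julia
# Stand-in types for illustration only; the package defines its own versions.
abstract type RasterProvider end
abstract type RasterDataset end

struct ExampleProvider <: RasterProvider end  # hypothetical provider
struct LandCover <: RasterDataset end
struct Elevation <: RasterDataset end

struct RasterData{P <: RasterProvider, D <: RasterDataset} end

# Default: no keywords beyond those defined by the interface.
extrakeys(::R) where {R <: RasterData} = nothing

# Overload: one extra keyword, with every accepted value and its explanation.
extrakeys(::RasterData{ExampleProvider, LandCover}) = Dict(
    :full => (
        true => "use the full version of the data (illustrative wording)",
        false => "use the reduced version of the data (illustrative wording)",
    ),
)

# Hypothetical helper: is this keyword/value pair accepted by the dataset?
function accepts_keyword(data, key::Symbol, value)
    accepted = extrakeys(data)
    isnothing(accepted) && return false
    haskey(accepted, key) || return false
    return any(first(pair) == value for pair in accepted[key])
end

landcover = RasterData{ExampleProvider, LandCover}()
elevation = RasterData{ExampleProvider, Elevation}()
```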

URL for the data to download

SimpleSDMDatasets.source Function
julia
source(::RasterData{P, D}; kwargs...) where {P <: RasterProvider, D <: RasterDataset}

This method specifies the URL for the data. It defaults to nothing, so this method must be overloaded.

source

Path to the data locally

SimpleSDMDatasets.destination Function
julia
destination(::RasterData{P, D}; kwargs...) where {P <: RasterProvider, D <: RasterDataset}

This method specifies where the data should be stored locally. By default, it is the _LAYER_PATH, followed by the provider name, followed by the dataset name.

source
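The default layout (root path, then provider name, then dataset name) can be approximated as in the sketch below. The types are stand-ins, ExampleProvider is hypothetical, and the value given to _LAYER_PATH here is a placeholder chosen for the example; the package sets its own.

```julia
# Stand-in types for illustration only; the package defines its own versions.
abstract type RasterProvider end
abstract type RasterDataset end

struct ExampleProvider <: RasterProvider end  # hypothetical provider
struct BioClim <: RasterDataset end

struct RasterData{P <: RasterProvider, D <: RasterDataset} end

# Placeholder root folder for this example only.
const _LAYER_PATH = joinpath(tempdir(), "example_layers")

# Root path, then provider name, then dataset name.
function destination(::RasterData{P, D}; kwargs...) where {P <: RasterProvider, D <: RasterDataset}
    return joinpath(_LAYER_PATH, string(nameof(P)), string(nameof(D)))
end

path = destination(RasterData{ExampleProvider, BioClim}())
```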

URL for additional information

The url method will display one URL redirecting users to either the description of the provider, or the description of the dataset. At a minimum, the version for the RasterProvider should be specified. Note that this must return a Markdown string.

Most RasterDataset types will have a default blurb, but more specific ones (i.e. adapted to a particular provider) can be provided.

SimpleSDMDatasets.url Function
julia
url(::P) where {P <: DataProvider}

The URL for the data provider - if there is no specific URL for each dataset, it is enough to define this one.

source

Additional information about a dataset

The blurb is a short text explaining what the dataset / provider is about. At a minimum, the version for the RasterProvider should be specified. In some cases, it is acceptable to only define a version for one RasterDataset and any RasterProvider, although a more specific dispatch can be implemented.

SimpleSDMDatasets.blurb Function
julia
blurb(::Type{P}) where {P <: RasterProvider}

source