The dataset interface
This page is meant for contributors to the package, and specifically provides information on the interface, what to overload, and why.
All of the methods that form the interface have two versions: one for current data, and one for future data. The default behavior of the interface is for the version on future data to fall back to the version for current data (i.e. we assume that future data are provided with the same format as current data). This means that most of the functions will not need to be overloaded when adding a provider with support for future data.
The interface relies on Julia's multiple dispatch: when several methods match a call, Julia uses the most specific one, and falls back on the more generic methods only when no specific method applies. A good example is the BioClim dataset, which is provided by a number of sources, often with different URLs and filenames. This is handled (in e.g. CHELSA2) by writing a method for the general case of any dataset RasterData{CHELSA2,T} (using a Union type), and then a specific method on RasterData{CHELSA2,BioClim}. In the case of CHELSA2, the general method handles all datasets except BioClim, which makes the code much easier to write and maintain.
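As a minimal sketch of this dispatch pattern, the block below uses stand-in types rather than the package's own definitions. The dataset DemoTemperature, the alias CHELSA2Dataset, and the function filestem are hypothetical names introduced only for illustration.

```julia
# Stand-in types (assumptions); the real ones are defined by the package.
abstract type RasterProvider end
abstract type RasterDataset end
struct RasterData{P <: RasterProvider, D <: RasterDataset} end

struct CHELSA2 <: RasterProvider end
struct BioClim <: RasterDataset end
struct DemoTemperature <: RasterDataset end   # hypothetical dataset

# Hypothetical union of the datasets handled for this provider.
const CHELSA2Dataset = Union{BioClim, DemoTemperature}

# General method: matches any dataset in the union.
filestem(::RasterData{CHELSA2, T}) where {T <: CHELSA2Dataset} = "general rule"

# Specific method: more specific than the one above, so it wins for BioClim.
filestem(::RasterData{CHELSA2, BioClim}) = "BioClim rule"

general = filestem(RasterData{CHELSA2, DemoTemperature}())
specific = filestem(RasterData{CHELSA2, BioClim}())
```

Both methods match a RasterData{CHELSA2, BioClim}, but only the second one is called for it, because its signature is the more specific of the two.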
Compatibility between datasets and providers
The inner constructor for RasterData involves a call to provides, which must return true for the type to be constructed. The generic method for provides returns false, so additional provider/dataset pairs must be overloaded to return true in order for the corresponding RasterData type to exist.
In practice, especially when there are multiple datasets for a single provider, the easiest way is to define a Union type and overload based on membership to this union type, as touched upon earlier in this document.
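The following sketch illustrates how such a check could work, assuming a simplified inner constructor that takes the provider and dataset types as arguments. It is not the package's source: the constructor signature, the datasets DemoTemperature and DemoElevation, and the alias CHELSA2Dataset are hypothetical.

```julia
# Stand-in types (assumptions); the real ones are defined by the package.
abstract type RasterProvider end
abstract type RasterDataset end

struct CHELSA2 <: RasterProvider end
struct BioClim <: RasterDataset end
struct DemoTemperature <: RasterDataset end   # hypothetical dataset
struct DemoElevation <: RasterDataset end     # hypothetical, not provided

# Generic fallback: no pair is valid unless a more specific method says so.
provides(::Type{P}, ::Type{D}) where {P <: RasterProvider, D <: RasterDataset} = false

struct RasterData{P <: RasterProvider, D <: RasterDataset}
    # Simplified inner constructor: refuse incompatible pairs.
    function RasterData(::Type{P}, ::Type{D}) where {P <: RasterProvider, D <: RasterDataset}
        provides(P, D) || error("$(P) does not provide $(D)")
        return new{P, D}()
    end
end

# One overload covers every dataset in the union.
const CHELSA2Dataset = Union{BioClim, DemoTemperature}
provides(::Type{CHELSA2}, ::Type{D}) where {D <: CHELSA2Dataset} = true

valid = RasterData(CHELSA2, BioClim)
```

With these definitions, RasterData(CHELSA2, DemoElevation) throws an error, because only the generic method matches that pair.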
SimpleSDMDatasets.provides — Function
provides(::Type{P}, ::Type{D}) where {P <: RasterProvider, D <: RasterDataset}
This is the core function upon which the entire interface is built. Its purpose is to specify whether a specific dataset is provided by a specific provider. Note that this function takes two arguments, as opposed to a RasterData argument, because it is called in the inner constructor of RasterData: you cannot instantiate a RasterData with an incompatible provider/dataset combination.
The default value of this function is false, and to allow the use of a dataset with a provider, it is required to overload it for this specific pair so that it returns true.
provides(::R, ::F) where {R <: RasterData, F <: Future}
This method for provides specifies whether a RasterData combination has support for the value of the Future (a combination of a FutureScenario and a FutureModel) given as the second argument. Note that this function is not called as part of the Future constructor (because models and scenarios are messy and dataset dependent), but is still called when requesting data.
The default value of this function is false, and to allow the use of a future dataset with a given provider, it is required to overload it so that it returns true.
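A minimal sketch of the two-argument form on future data follows. It assumes that Future is parametric on a scenario and a model, which is a guess at its shape rather than the package's definition. DemoProvider, DemoDataset, DemoScenario, OtherScenario, and DemoModel are hypothetical.

```julia
# Stand-in types (assumptions); the real ones are defined by the package.
abstract type RasterProvider end
abstract type RasterDataset end
abstract type FutureScenario end
abstract type FutureModel end

struct RasterData{P <: RasterProvider, D <: RasterDataset} end
struct Future{S <: FutureScenario, M <: FutureModel} end   # assumed shape

struct DemoProvider <: RasterProvider end
struct DemoDataset <: RasterDataset end
struct DemoScenario <: FutureScenario end
struct OtherScenario <: FutureScenario end
struct DemoModel <: FutureModel end

# Default: no future data for any combination.
provides(::R, ::F) where {R <: RasterData, F <: Future} = false

# Overload: this pair supports DemoScenario under any model.
provides(::RasterData{DemoProvider, DemoDataset}, ::Future{DemoScenario, M}) where {M <: FutureModel} = true

data = RasterData{DemoProvider, DemoDataset}()
supported = provides(data, Future{DemoScenario, DemoModel}())
unsupported = provides(data, Future{OtherScenario, DemoModel}())
```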
Type of object downloaded
The format of the downloaded files is specified by downloadtype. By default, we assume that a request to a usable dataset returns a single file, but this can be overloaded for providers that return an archive.
SimpleSDMDatasets.downloadtype — Function
downloadtype(::R) where {R <: RasterData}
This method returns a RasterDownloadType that is used internally to be more explicit about the type of object that is downloaded from the raster source. The supported values are _file (the default, a single file in a format such as ASCII, GeoTIFF, or NetCDF), and _zip (a zip archive containing files). This is a trait because we cannot trust file extensions.
downloadtype(data::R, ::F) where {R <: RasterData, F <: Future}
This method provides the type of the downloaded object for a combination of a raster source and a future scenario as a RasterDownloadType.
If no overload is given, this will default to downloadtype(data), as we can assume that the type of downloaded object is the same for both current and future scenarios.
The value returned by downloadtype must be an instance of the RasterDownloadType enum, which can be extended if adding a new provider requires a new format for the download.
SimpleSDMDatasets.RasterDownloadType — Type
RasterDownloadType
This enum stores the possible types of downloaded files. They are listed with instances(RasterDownloadType), and are currently limited to _file (a file, can be read directly) and _zip (an archive, must be unzipped).
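The fallback from future to current data described above can be sketched as follows, using stand-in types and a locally defined enum. DemoProvider and DemoDataset are hypothetical, and Future is reduced to an empty struct for brevity.

```julia
# Stand-in types (assumptions); the real ones are defined by the package.
abstract type RasterProvider end
abstract type RasterDataset end
struct RasterData{P <: RasterProvider, D <: RasterDataset} end
struct Future end                              # simplified stand-in

@enum RasterDownloadType _file _zip

# Default trait: a single file.
downloadtype(::R) where {R <: RasterData} = _file

# Future data fall back on the method for current data.
downloadtype(data::R, ::F) where {R <: RasterData, F <: Future} = downloadtype(data)

struct DemoProvider <: RasterProvider end      # hypothetical
struct DemoDataset <: RasterDataset end        # hypothetical

# This provider ships archives, so only the one-argument method is overloaded.
downloadtype(::RasterData{DemoProvider, D}) where {D <: RasterDataset} = _zip

data = RasterData{DemoProvider, DemoDataset}()
current = downloadtype(data)
future = downloadtype(data, Future())
```

Because the two-argument method forwards to the one-argument method, a single overload is enough when current and future data are delivered in the same way.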
Type of object stored
The format of the data contained in the downloaded object is specified by filetype. By default, we assume that a request to a usable dataset returns a tiff, but this can be overloaded for providers that return data in another format. Note that if the download type is an archive, the file type describes the format of the files within the archive.
SimpleSDMDatasets.filetype — Function
filetype(::R) where {R <: RasterData}
This method returns a RasterFileType that represents the format of the raster data. RasterFileType is an enumerated type. This overload is particularly important as it will determine how the returned file path should be read.
The default value is _tiff.
filetype(data::R, ::F) where {R <: RasterData, F <: Future}
This method provides the format of the stored raster for a combination of a raster source and a future scenario as a RasterFileType.
If no overload is given, this will default to filetype(data), as we can assume that the raster format is the same for both current and future scenarios.
The value returned by filetype must be an instance of the RasterFileType enum, which can be extended if adding a new provider requires a new file format.
SimpleSDMDatasets.RasterFileType — Type
RasterFileType
This enum stores the possible types of returned files. They are listed with instances(RasterFileType).
Available resolutions
SimpleSDMDatasets.resolutions — Function
resolutions(::R) where {R <: RasterData}
This method controls whether the dataset is available at several resolutions, i.e. grid sizes. If this is nothing (the default), it means that the dataset is only given at a single, fixed resolution.
An overload of this method is required when there are multiple resolutions available, and must return a Dict with numeric keys (for the resolution) and string values (giving the textual representation of these keys, usually in the form used to build the URL).
Any dataset with a return value that is not nothing must accept the resolution keyword.
resolutions(data::R, ::F) where {R <: RasterData, F <: Future}
This method controls the resolutions for a future dataset. Unless overloaded, it will return resolutions(data).
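As an illustration, a resolutions overload could look like the sketch below. The types are stand-ins, and the resolution values, their textual forms, and the helper demourl (with its example.org address) are invented for the example.

```julia
# Stand-in types (assumptions); the real ones are defined by the package.
abstract type RasterProvider end
abstract type RasterDataset end
struct RasterData{P <: RasterProvider, D <: RasterDataset} end
struct Future end                              # simplified stand-in

struct DemoProvider <: RasterProvider end      # hypothetical
struct DemoDataset <: RasterDataset end        # hypothetical

# Defaults: a single resolution, and future data follow current data.
resolutions(::R) where {R <: RasterData} = nothing
resolutions(data::R, ::F) where {R <: RasterData, F <: Future} = resolutions(data)

# Numeric keys, with the text used when building the URL as values.
resolutions(::RasterData{DemoProvider, DemoDataset}) = Dict(2.5 => "2.5m", 5.0 => "5m", 10.0 => "10m")

# Hypothetical helper showing how the string value might end up in a URL.
demourl(data::RasterData; resolution = 10.0) =
    "https://example.org/demo_$(resolutions(data)[resolution]).tif"

data = RasterData{DemoProvider, DemoDataset}()
url = demourl(data; resolution = 2.5)
```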
Available layers
SimpleSDMDatasets.layers — Function
layers(::R) where {R <: RasterData}
This method controls whether the dataset has named layers. If this is nothing (the default), it means that the dataset will have a single layer.
An overload of this method is required when there are multiple layers available, and must return a Vector, usually of String. Note that by default, the layers can also be accessed by using an Integer, in which case layer=i will be the i-th entry in layers(data).
Any dataset with a return value that is not nothing must accept the layer keyword.
SimpleSDMDatasets.layerdescriptions — Function
layerdescriptions(data::R) where {R <: RasterData}
Human-readable names of the layers. This must be a dictionary mapping the layer names (as returned by layers) to a string explaining the contents of each layer.
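The relationship between layers, layerdescriptions, and integer access can be sketched as follows, with stand-in types and a deliberately short list of two layers. The provider DemoProvider and the helper resolvelayer are hypothetical; how the package itself resolves an integer layer is not shown here.

```julia
# Stand-in types (assumptions); the real ones are defined by the package.
abstract type RasterProvider end
abstract type RasterDataset end
struct RasterData{P <: RasterProvider, D <: RasterDataset} end

struct DemoProvider <: RasterProvider end      # hypothetical
struct BioClim <: RasterDataset end

# Default: a single, unnamed layer.
layers(::R) where {R <: RasterData} = nothing

# Overload: a vector of layer names (shortened to two entries here).
layers(::RasterData{DemoProvider, BioClim}) = ["BIO1", "BIO12"]

# One description per name returned by layers.
layerdescriptions(::RasterData{DemoProvider, BioClim}) = Dict(
    "BIO1" => "Annual mean temperature",
    "BIO12" => "Annual precipitation",
)

# Hypothetical helper: layer=i is the i-th entry in layers(data).
resolvelayer(data::RasterData, layer::Integer) = layers(data)[layer]
resolvelayer(data::RasterData, layer::AbstractString) =
    layer in layers(data) ? layer : error("unknown layer $(layer)")

data = RasterData{DemoProvider, BioClim}()
name = resolvelayer(data, 2)
description = layerdescriptions(data)[name]
```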
Available months
SimpleSDMDatasets.months — Function
months(::R) where {R <: RasterData}
This method controls whether the dataset has monthly layers. If this is nothing (the default), it means that the dataset is not accessible at a monthly resolution.
An overload of this method is required when there are multiple months available, and must return a Vector{Dates.Month}.
Any dataset with a return value that is not nothing must accept the month keyword.
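A months overload for a dataset with one layer per calendar month might look like this sketch. The types are stand-ins, and DemoProvider and DemoDataset are hypothetical.

```julia
using Dates

# Stand-in types (assumptions); the real ones are defined by the package.
abstract type RasterProvider end
abstract type RasterDataset end
struct RasterData{P <: RasterProvider, D <: RasterDataset} end

struct DemoProvider <: RasterProvider end      # hypothetical
struct DemoDataset <: RasterDataset end        # hypothetical

# Default: no monthly layers.
months(::R) where {R <: RasterData} = nothing

# Overload: one entry per calendar month, as a Vector{Dates.Month}.
months(::RasterData{DemoProvider, DemoDataset}) = Month.(1:12)

data = RasterData{DemoProvider, DemoDataset}()
available = months(data)
```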
Available years
SimpleSDMDatasets.timespans — Function
timespans(data::R, ::F) where {R <: RasterData, F <: Future}
For datasets with a Future scenario, this method should return a Vector of Pairs, each formatted as
Year(start) => Year(end)
There is a method working on a single RasterData argument, defaulting to returning nothing, but it should never be overloaded.
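A sketch of a timespans overload is given below, under the assumption that Future can be reduced to an empty stand-in struct. The year ranges are invented for illustration, and DemoProvider and DemoDataset are hypothetical.

```julia
using Dates

# Stand-in types (assumptions); the real ones are defined by the package.
abstract type RasterProvider end
abstract type RasterDataset end
struct RasterData{P <: RasterProvider, D <: RasterDataset} end
struct Future end                              # simplified stand-in

struct DemoProvider <: RasterProvider end      # hypothetical
struct DemoDataset <: RasterDataset end        # hypothetical

# Single-argument method: returns nothing and is left alone.
timespans(::R) where {R <: RasterData} = nothing

# Two-argument overload: a vector of Year(start) => Year(end) pairs.
timespans(::RasterData{DemoProvider, DemoDataset}, ::Future) =
    [Year(2021) => Year(2040), Year(2041) => Year(2060)]

data = RasterData{DemoProvider, DemoDataset}()
spans = timespans(data, Future())
```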
Additional keyword arguments
SimpleSDMDatasets.extrakeys — Function
extrakeys(::R) where {R <: RasterData}
This method controls whether the dataset has additional keys. If this is nothing (the default), it means that the dataset can be accessed using only the default keys specified in this interface.
An overload of this method is required when there are additional keywords needed to access the data (e.g. full=true for the EarthEnv land-cover data), and must return a Dict with Symbol keys and Tuple values, where the key is the keyword argument passed to downloader and the tuple lists all accepted values.
Any dataset with a return value that is not nothing must accept the keyword arguments specified in the return value.
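The shape of the returned dictionary can be sketched as below, reusing the full keyword mentioned above. The types are stand-ins (including the dataset name LandCover), and checkkeys is a hypothetical validation helper, not a function of the package.

```julia
# Stand-in types (assumptions); the real ones are defined by the package.
abstract type RasterProvider end
abstract type RasterDataset end
struct RasterData{P <: RasterProvider, D <: RasterDataset} end

struct EarthEnv <: RasterProvider end
struct LandCover <: RasterDataset end          # stand-in dataset name

# Default: no additional keyword arguments.
extrakeys(::R) where {R <: RasterData} = nothing

# Overload: keyword name => tuple of accepted values.
extrakeys(::RasterData{EarthEnv, LandCover}) = Dict(:full => (true, false))

# Hypothetical helper: check keyword arguments against extrakeys.
function checkkeys(data::RasterData; kwargs...)
    accepted = extrakeys(data)
    for (key, value) in kwargs
        isnothing(accepted) && error("this dataset accepts no extra keywords")
        haskey(accepted, key) || error("unknown keyword $(key)")
        value in accepted[key] || error("$(value) is not accepted for $(key)")
    end
    return true
end

data = RasterData{EarthEnv, LandCover}()
ok = checkkeys(data; full = true)
```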
URL for the data to download
SimpleSDMDatasets.source — Function
source(::RasterData{P, D}; kwargs...) where {P <: RasterProvider, D <: RasterDataset}
This method specifies the URL for the data. It defaults to nothing, so this method must be overloaded.
Path to the data locally
SimpleSDMDatasets.destination — Function
destination(::RasterData{P, D}; kwargs...) where {P <: RasterProvider, D <: RasterDataset}
This method specifies where the data should be stored locally. By default, it is the _LAYER_PATH, followed by the provider name, followed by the dataset name.
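The two path-related methods can be sketched together, with stand-in types. The value given to _LAYER_PATH here, the example.org URL, and the layer keyword default are assumptions made for the example, and the default destination is only an approximation of the behavior described above.

```julia
# Stand-in types (assumptions); the real ones are defined by the package.
abstract type RasterProvider end
abstract type RasterDataset end
struct RasterData{P <: RasterProvider, D <: RasterDataset} end

struct DemoProvider <: RasterProvider end      # hypothetical
struct DemoDataset <: RasterDataset end        # hypothetical

# Assumed storage root; the package defines its own _LAYER_PATH.
const _LAYER_PATH = joinpath(tempdir(), "demo-layers")

# Defaults: no URL, and a path built from the provider and dataset names.
source(::RasterData{P, D}; kwargs...) where {P <: RasterProvider, D <: RasterDataset} = nothing
destination(::RasterData{P, D}; kwargs...) where {P <: RasterProvider, D <: RasterDataset} =
    joinpath(_LAYER_PATH, String(nameof(P)), String(nameof(D)))

# Overload of source for one pair; the URL is invented.
source(::RasterData{DemoProvider, DemoDataset}; layer = "BIO1", kwargs...) =
    "https://example.org/demo/$(layer).tif"

data = RasterData{DemoProvider, DemoDataset}()
url = source(data; layer = "BIO12")
path = destination(data)
```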