Models
This page describes the models that are currently implemented. All models have three components: a transformer to prepare the data, a classifier to produce a quantitative prediction, and a threshold to turn that prediction into a presence/absence prediction. All three components are trained when the model is trained, unless the keyword arguments passed to train! specify otherwise.
Training of transformers
By default, the transformers are trained on the presences only. This is because, in most cases, the pseudo-absences are sampled from the background. When the model uses actual absences, passing absences=true to the functions that train the model will train the transformer on the absence data as well.
Transformers (univariate)
SDeMo.RawData Type
RawData
A transformer that does nothing to the data. It passes the raw data directly to the classifier, and can be a good first step for models that assume that the features are independent, or that are not sensitive to the scale of the features.
SDeMo.ZScore Type
ZScore
A transformer that scales and centers the data, using only the data that are available to the model at training time.
For all variables in the SDM features (regardless of whether they are used), this transformer will store the observed mean and standard deviation. There is no correction on the sample size, because there is no reason to expect that the sample size will be the same for the training and prediction situation.
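The behaviour described above can be illustrated with a minimal sketch that uses only the Statistics standard library. It is not the SDeMo implementation: the data, the layout (one row per feature, one column per instance), and the reading of "no correction on the sample size" as the uncorrected standard deviation are all assumptions made for illustration.

```julia
using Statistics

# Hypothetical layout: one row per feature, one column per training instance
X = [10.0 12.0 14.0 30.0 32.0;
      1.0  2.0  3.0  8.0  9.0]
y = [true, true, true, false, false]   # presence (true) / pseudo-absence (false)

# Default described in the text: learn the parameters from the presences only
Xp = X[:, y]
μ = vec(mean(Xp; dims = 2))
# corrected = false divides by n rather than n - 1 (assumed interpretation)
σ = vec(std(Xp; dims = 2, corrected = false))

# The stored parameters are then applied to any data, training or not
zscore(x) = (x .- μ) ./ σ
Z = zscore(X)
```

After this transformation the presences have a mean of zero for each feature, while the pseudo-absences are expressed in units of the standard deviation observed among presences.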
Transformers (multivariate)
The multivariate transformers use MultivariateStats to handle the training data. During projection, the features are transformed using the transformation that was learned from the training data.
SDeMo.PCATransform Type
PCATransform
The PCA transform will project the model features, which also serves as a way to decrease the dimensionality of the problem. Note that this method will only use the training instances and, unless the absences=true keyword is used, only the presences. This ensures that there is no data leakage (neither validation data nor the data from the raster are used).
This is an alias for MultivariateTransform{PCA}.
SDeMo.WhiteningTransform Type
WhiteningTransform
The whitening transformation is a linear transformation of the input variables, after which the new variables have unit variance and no correlation. The input is transformed into white noise.
Because this transform will usually keep the first variable "as is", and then apply increasingly important perturbations on the subsequent variables, it is sensitive to the order in which variables are presented, and is less useful when applying tools for interpretation.
This is an alias for MultivariateTransform{Whitening}.
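The order sensitivity of whitening can be seen in a small sketch based on a Cholesky factorization, written with the standard library only. This is one common way to build a whitening matrix; whether it matches the package internals exactly is an assumption, as are the simulated data and the layout (instances in rows, features in columns).

```julia
using LinearAlgebra, Statistics, Random

rng = MersenneTwister(42)
# Hypothetical data: 500 instances (rows) of 3 correlated features (columns)
mixing = [1.0 0.6 0.3; 0.0 1.0 0.5; 0.0 0.0 1.0]
X = randn(rng, 500, 3) * mixing

μ = mean(X; dims = 1)
Σ = cov(X)                       # 3×3 covariance matrix of the features
U = cholesky(Symmetric(Σ)).U     # Σ = U'U, with U upper triangular
W = inv(U)                       # whitening matrix, also upper triangular

Y = (X .- μ) * W                 # whitened features

# The whitened features have unit variance and no correlation
whitened_cov = cov(Y)
```

Because W is upper triangular, the first whitened variable is only a centered and rescaled copy of the first input, while each later variable mixes in all the variables that precede it. Reordering the columns of X therefore changes the result.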
SDeMo.MultivariateTransform Type
MultivariateTransform{T} <: Transformer
T is a multivariate transformation, likely offered through the MultivariateStats package. The transformations currently supported are PCA, PPCA, KernelPCA, and Whitening, and they are documented through their type aliases (e.g. PCATransform).
Classifiers
SDeMo.NaiveBayes Type
NaiveBayes
Naive Bayes Classifier
By default, upon training, the prior probability will be set to the prevalence of the training data.
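To make the role of the prior concrete, here is a minimal single-feature sketch in which the prior is set to the prevalence of the training data. The Gaussian likelihoods, the data, and the function names are assumptions made for illustration, not a description of the SDeMo internals.

```julia
using Statistics

# Hypothetical training data for a single feature
x = [2.0, 3.0, 4.0, 9.0, 10.0, 11.0, 12.0, 13.0]
y = [true, true, true, false, false, false, false, false]

# Prior probability of presence: the prevalence of the training data
prior = mean(y)

# Gaussian density (assumed form of the class-conditional likelihood)
gaussian(v, m, s) = exp(-0.5 * ((v - m) / s)^2) / (s * sqrt(2π))

# One distribution per class, estimated from the training data
m_pres, s_pres = mean(x[y]), std(x[y])
m_abs, s_abs = mean(x[.!y]), std(x[.!y])

# Posterior probability of presence for a feature value v (Bayes' rule)
function posterior(v)
    p_pres = prior * gaussian(v, m_pres, s_pres)
    p_abs = (1 - prior) * gaussian(v, m_abs, s_abs)
    return p_pres / (p_pres + p_abs)
end
```

With three presences out of eight training instances, the prior is 0.375. A lower prevalence pulls the posterior towards absence for any feature value that both classes explain about equally well.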
SDeMo.DecisionTree Type
DecisionTree
The maximum depth and the maximum number of nodes can be adjusted with maxdepth! and maxnodes!, respectively.
Adding new models
Adding a new transformer or classifier is relatively straightforward (refer to the implementation of ZScore and BIOCLIM for easily digestible examples). The only methods to implement are train! and StatsAPI.predict.
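The two-method pattern can be sketched as follows. To keep the example runnable on its own, it declares local stand-ins for the abstract type and for the train! and predict functions; in a real extension, the methods would instead be added to the functions provided by SDeMo and StatsAPI. The RangeScaler type, its fields, the argument lists, and the data layout (features in rows) are all hypothetical, so the actual signatures should be checked against the ZScore source.

```julia
# Local stand-ins so that the sketch is self-contained (assumption: the
# package provides the real abstract type and generic functions)
abstract type Transformer end
function train! end
function predict end

# Hypothetical transformer: rescales each feature to the [0, 1] range
mutable struct RangeScaler <: Transformer
    lo::Vector{Float64}
    hi::Vector{Float64}
end
RangeScaler() = RangeScaler(Float64[], Float64[])

# Training stores the per-feature parameters in the transformer
function train!(t::RangeScaler, X::AbstractMatrix; kwargs...)
    t.lo = vec(minimum(X; dims = 2))
    t.hi = vec(maximum(X; dims = 2))
    return t
end

# Prediction applies the stored parameters to new data
function predict(t::RangeScaler, X::AbstractMatrix; kwargs...)
    return (X .- t.lo) ./ (t.hi .- t.lo)
end

scaler = train!(RangeScaler(), [1.0 2.0 3.0; 10.0 20.0 30.0])
scaled = predict(scaler, [2.0 4.0; 15.0 30.0])
```

Values outside the range seen during training map outside [0, 1], which mirrors the idea that parameters learned at training time are applied unchanged at projection time.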