Models

This page describes the models that are currently implemented. All models have three components: a transformer to prepare the data, a classifier to produce a quantitative prediction, and a threshold to turn that prediction into a presence/absence prediction. All three components are trained when training a model, subject to the keyword arguments passed to train!.

Training of transformers

By default, the transformers are trained on the presences only. This is because, in most cases, the absences are pseudo-absences sampled from the background rather than actual observations. When the model uses actual absences, passing absences=true to the functions that train the model will use the absence data as well.
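The effect of this default can be sketched with plain Julia, without the package. The helper training_columns and its absences keyword are hypothetical stand-ins that mimic the behavior described above, and the layout (features as rows, instances as columns) is an assumption made for the example.

```julia
using Statistics

# Hypothetical toy data: 2 features (rows) × 6 sites (columns)
X = [10.0 12.0 14.0 2.0 4.0 24.0;
      1.0  2.0  3.0 7.0 8.0  9.0]
# true marks a presence, false marks a (pseudo-)absence
y = [true, true, true, false, false, false]

# Hypothetical helper: select the columns a transformer would be trained on.
# Presences only by default, every instance when absences=true.
function training_columns(X, y; absences=false)
    return absences ? X : X[:, y]
end

Xp = training_columns(X, y)                  # presences only
Xa = training_columns(X, y; absences=true)   # presences and absences

# The statistics learned by a transformer differ between the two cases
mean_presences = vec(mean(Xp; dims=2))
mean_all = vec(mean(Xa; dims=2))
```

With pseudo-absences, the second set of means would describe the background rather than the species, which is the reason for the default.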

Transformers (univariate)

SDeMo.RawData Type

RawData

A transformer that does nothing to the data. It passes the raw data to the classifier, and can be a good first step for models that assume that the features are independent, or that are not sensitive to the scale of the features.

SDeMo.ZScore Type

ZScore

A transformer that scales and centers the data, using only the data that are available to the model at training time.

For all variables in the SDM features (regardless of whether they are used), this transformer will store the observed mean and standard deviation. The standard deviation is not corrected for sample size, because there is no reason to expect that the sample size will be the same when training and when predicting.
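A minimal sketch of this scaling, using only the Statistics standard library. The data and the zscore helper are hypothetical, and reading "no correction on the sample size" as corrected=false (dividing by n rather than n - 1) is an assumption about the implementation.

```julia
using Statistics

# Hypothetical training values for a single feature
train = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Parameters stored at training time
μ = mean(train)
σ = std(train; corrected=false)   # divides by n, not by n - 1

# The stored parameters are reused as-is for any new value
zscore(x) = (x - μ) / σ

scaled_train = zscore.(train)
scaled_new = zscore.([1.0, 9.0])  # values not seen during training
```

Because μ and σ are fixed at training time, new values are scaled relative to the training data and can fall well outside the range of the scaled training values.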

Transformers (multivariate)

The multivariate transformers use MultivariateStats to handle the training data. When the model is applied to new data, the features are projected using the transformation that was learned from the training data.

SDeMo.PCATransform Type

PCATransform

The PCA transform will project the model features, which also serves as a way to decrease the dimensionality of the problem. Note that this method will only use the training instances and, unless the absences=true keyword is used, only the presences. This ensures that there is no data leakage (neither the validation data nor the data from the raster are used).

This is an alias for MultivariateTransform{PCA}.
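The idea of learning the projection on the training instances and reusing it elsewhere can be sketched by hand. This is an illustration with the LinearAlgebra and Statistics standard libraries, not the MultivariateStats code path used by the package; the data, the project helper, and the choice of keeping a single axis are assumptions.

```julia
using LinearAlgebra, Statistics

# Hypothetical training data: 2 features (rows) × 5 instances (columns)
Xtrain = [2.0 4.0 6.0 8.0 10.0;
          1.0 2.1 2.9 4.2  4.8]

# Everything below is learned from the training instances only
μ = mean(Xtrain; dims=2)
C = cov(Xtrain; dims=2)                  # 2×2 covariance between features
E = eigen(Symmetric(C))                  # eigenvalues in ascending order
order = sortperm(E.values; rev=true)     # largest variance first
P = E.vectors[:, order[1:1]]             # keep the first principal axis

# Center with the training mean, then project on the retained axis
project(X) = P' * (X .- μ)

Ytrain = project(Xtrain)                     # 1×5 scores
Ynew = project(reshape([5.0, 2.5], 2, 1))    # a new instance, same transformation
```

The new instance never contributes to μ or P, which is what prevents leakage from the validation or raster data.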

SDeMo.WhiteningTransform Type

WhiteningTransform

The whitening transformation is a linear transformation of the input variables, after which the new variables have unit variance and no correlation. The input is transformed into white noise.

Because this transform will usually keep the first variable "as is", and then apply increasingly important perturbations on the subsequent variables, it is sensitive to the order in which variables are presented, and is less useful when applying tools for interpretation.

This is an alias for MultivariateTransform{Whitening}.
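The order sensitivity can be seen in a hand-written sketch. It assumes a Cholesky-based whitening, which is one common construction and may differ in detail from what MultivariateStats does; the data are hypothetical.

```julia
using LinearAlgebra, Statistics

# Hypothetical training data: 2 correlated features (rows) × 6 instances (columns)
X = [1.0 2.0 3.0 4.0 5.0 6.0;
     2.0 1.0 4.0 3.0 6.0 5.0]

μ = mean(X; dims=2)
C = cov(X; dims=2)

# Cholesky factorization C = U'U, so W = inv(U) satisfies W' * C * W = I
U = cholesky(Symmetric(C)).U
W = inv(U)

# Whitened data: unit variance and no correlation between rows
Z = W' * (X .- μ)
whitened_cov = cov(Z; dims=2)

# W' is lower triangular, so the first variable is only centered and rescaled
first_rescaled = (X[1, :] .- μ[1]) ./ sqrt(C[1, 1])
```

Swapping the two rows of X would leave the other variable untouched instead, which is why the resulting features are harder to interpret.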

SDeMo.MultivariateTransform Type

MultivariateTransform{T} <: Transformer

T is a multivariate transformation, likely offered through the MultivariateStats package. The transformations currently supported are PCA, PPCA, KernelPCA, and Whitening, and they are documented through their type aliases (e.g. PCATransform).

Classifiers

SDeMo.NaiveBayes Type

NaiveBayes

Naive Bayes Classifier

By default, upon training, the prior probability will be set to the prevalence of the training data.
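The role of the prior can be sketched for a single feature. This example assumes Gaussian class-conditional densities, which the text above does not state; the data and the names normal_pdf, prevalence, and posterior are hypothetical and are not part of the package.

```julia
using Statistics

# Hypothetical training data: one feature, presence (true) / absence (false) labels
x = [8.0, 9.0, 10.0, 11.0, 12.0, 1.0, 2.0, 3.0]
y = [true, true, true, true, true, false, false, false]

# Density of a normal distribution with mean μ and standard deviation σ
normal_pdf(v, μ, σ) = exp(-0.5 * ((v - μ) / σ)^2) / (σ * sqrt(2π))

# Default prior: the prevalence of the training data
prevalence = mean(y)

# One normal distribution per class (assumption)
μp, σp = mean(x[y]), std(x[y])        # presences
μa, σa = mean(x[.!y]), std(x[.!y])    # absences

# Posterior probability of presence for a feature value v
function posterior(v; prior=prevalence)
    presence = prior * normal_pdf(v, μp, σp)
    absence = (1 - prior) * normal_pdf(v, μa, σa)
    return presence / (presence + absence)
end

score_default = posterior(5.5)            # prior set to the prevalence
score_rare = posterior(5.5; prior=0.1)    # same value, lower prior
```

For the same feature value, lowering the prior lowers the score, so a prevalence that reflects the sampling design rather than the species will shift every prediction.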

SDeMo.BIOCLIM Type

BIOCLIM

BIOCLIM

SDeMo.DecisionTree Type

DecisionTree

The depth and number of nodes can be adjusted with maxdepth! and maxnodes!, respectively.

Adding new models

Adding a new transformer or classifier is relatively straightforward (refer to the implementation of ZScore and BIOCLIM for easily digestible examples). The only methods to implement are train! and StatsAPI.predict.
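The overall shape of a new transformer can be sketched as follows. Everything here is a standalone illustration: the abstract type is a local stand-in for the one in the package, MinMax is a hypothetical transformer, and the argument order and keyword of train! are assumptions that should be checked against the source of ZScore. In the package, the prediction method would extend StatsAPI.predict rather than define a local function.

```julia
# Local stand-in for the package's abstract type (assumption)
abstract type Transformer end

# Hypothetical transformer: rescale each feature to the [0, 1] range seen in training
mutable struct MinMax <: Transformer
    lo::Vector{Float64}
    hi::Vector{Float64}
end
MinMax() = MinMax(Float64[], Float64[])

# Assumed signature: labels y, features X (rows are features, columns are instances)
function train!(t::MinMax, y::Vector{Bool}, X::Matrix{Float64}; absences=false)
    used = absences ? X : X[:, y]        # presences only by default
    t.lo = vec(minimum(used; dims=2))
    t.hi = vec(maximum(used; dims=2))
    return t
end

# Local function used so that the sketch runs without dependencies
predict(t::MinMax, X::Matrix{Float64}) = (X .- t.lo) ./ (t.hi .- t.lo)

# Usage on hypothetical data: 2 features × 4 instances, the last one an absence
X = [0.0 5.0 10.0 20.0;
     1.0 2.0  3.0  9.0]
y = [true, true, true, false]

t = train!(MinMax(), y, X)
scaled = predict(t, X)
```

The struct holds whatever train! learns, and the prediction method only reads it, which keeps training and prediction cleanly separated.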
