Tools for SDM demos and education

The prediction pipeline

SDeMo.AbstractSDM Type

```julia
AbstractSDM
```

This abstract type covers both the regular and the ensemble models.

SDeMo.AbstractEnsembleSDM Type

```julia
AbstractEnsembleSDM
```

This abstract type covers models that combine different SDMs to make a prediction; it currently includes Bagging and Ensemble.
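To illustrate how these two abstract types support dispatch, here is a minimal, self-contained sketch. The names ToyAbstractSDM, ToyAbstractEnsembleSDM, ToySDM, ToyBagging, ToyEnsemble and nmodels are hypothetical stand-ins that mirror the relationships described above; they are not the SDeMo types.

```julia
# Hypothetical stand-ins mirroring AbstractSDM and AbstractEnsembleSDM;
# these are NOT the SDeMo types.
abstract type ToyAbstractSDM end
abstract type ToyAbstractEnsembleSDM <: ToyAbstractSDM end

# A single model (stand-in for SDM).
struct ToySDM <: ToyAbstractSDM
    label::String
end

# Homogeneous and heterogeneous ensembles (stand-ins for Bagging and Ensemble).
struct ToyBagging <: ToyAbstractEnsembleSDM
    models::Vector{ToySDM}
end

struct ToyEnsemble <: ToyAbstractEnsembleSDM
    models::Vector{ToySDM}
end

# Generic method: accepts any model, single or ensemble.
nmodels(::ToyAbstractSDM) = 1

# More specific method: selected by dispatch for every ensemble type.
nmodels(e::ToyAbstractEnsembleSDM) = length(e.models)

single = ToySDM("a")
bag = ToyBagging([ToySDM("a"), ToySDM("a"), ToySDM("a")])
mix = ToyEnsemble([ToySDM("a"), ToySDM("b")])

println(nmodels(single))  # 1
println(nmodels(bag))     # 3
println(nmodels(mix))     # 2
```

Because both ensemble types are subtypes of the generic abstract type, a method written for the generic type accepts any model, and a more specific method for ensembles takes precedence whenever it is defined.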

SDeMo.SDM Type

```julia
SDM
```

This type specifies a full model, which is composed of a transformer (which applies a transformation to the data), a classifier (which returns a quantitative score), and a threshold (above which the score corresponds to the prediction of a presence).

In addition, the SDM carries with it the training features and labels, as well as a vector of indices indicating which variables are actually used by the model.
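The following sketch shows how these components could fit together at prediction time: keep the variables listed in the model, apply the transformer, score with the classifier, and compare the score to the threshold. ToyFullSDM and toy_predict are hypothetical; the variables × instances layout of X and the strict comparison (score > τ) are assumptions made for the example, not statements about SDeMo.

```julia
# Hypothetical, simplified stand-in for the SDM type; field names and the
# variables × instances layout of X are assumptions for this sketch.
mutable struct ToyFullSDM
    transformer::Function  # applied to the data first
    classifier::Function   # returns one quantitative score per instance
    τ::Float64             # scores above τ are predicted presences
    X::Matrix{Float64}     # training features (variables × instances)
    y::Vector{Bool}        # training labels
    v::Vector{Int}         # indices of the variables used by the model
end

# Keep the used variables, transform, score, then (optionally) threshold.
function toy_predict(sdm::ToyFullSDM, X::AbstractMatrix; threshold::Bool=true)
    Z = sdm.transformer(X[sdm.v, :])
    scores = sdm.classifier(Z)
    return threshold ? scores .> sdm.τ : scores
end

X = [1.0 2.0 3.0 4.0;   # variable 1
     9.0 9.0 9.0 9.0;   # variable 2 (not used by the model)
     0.0 0.0 1.0 1.0]   # variable 3
y = [false, false, true, true]

sdm = ToyFullSDM(
    Z -> Z,                          # identity transformer
    Z -> vec(sum(Z; dims=1)) ./ 10,  # toy score: scaled column sums
    0.3,
    X, y, [1, 3],
)

toy_predict(sdm, X; threshold=false)  # scores 0.1, 0.2, 0.4, 0.5
toy_predict(sdm, X)                   # false, false, true, true
```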

SDeMo.Transformer Type

```julia
Transformer
```

This abstract type covers all transformations that are applied to the data before fitting the classifier.

SDeMo.Classifier Type

```julia
Classifier
```

This abstract type covers all algorithms that convert transformed data into a prediction.
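A minimal sketch of how concrete transformers and classifiers could plug into abstract types like these through multiple dispatch is given below. The types ToyTransformer, ToyClassifier, ToyZScore and ToyCentroid, and the functions toy_train! and toy_apply, are hypothetical and only illustrate the pattern; the z-score transformation and the centroid-difference score are chosen for simplicity.

```julia
using Statistics

# Hypothetical stand-ins for the Transformer and Classifier abstract types;
# toy_train! and toy_apply are illustrative names, not SDeMo functions.
abstract type ToyTransformer end
abstract type ToyClassifier end

# Z-score transformer: learns a mean and standard deviation per variable (row).
mutable struct ToyZScore <: ToyTransformer
    μ::Vector{Float64}
    σ::Vector{Float64}
    ToyZScore() = new(Float64[], Float64[])
end

function toy_train!(t::ToyZScore, X::AbstractMatrix)
    t.μ = vec(mean(X; dims=2))
    t.σ = vec(std(X; dims=2))
    return t
end

toy_apply(t::ToyZScore, X::AbstractMatrix) = (X .- t.μ) ./ t.σ

# Classifier: scores each instance by its projection on the difference
# between the mean presence and the mean absence (a quantitative score).
mutable struct ToyCentroid <: ToyClassifier
    w::Vector{Float64}
    ToyCentroid() = new(Float64[])
end

function toy_train!(c::ToyCentroid, Z::AbstractMatrix, y::AbstractVector{Bool})
    c.w = vec(mean(Z[:, y]; dims=2) .- mean(Z[:, .!y]; dims=2))
    return c
end

toy_apply(c::ToyCentroid, Z::AbstractMatrix) = Z' * c.w

# Train the transformer first, then the classifier on the transformed data.
X = [1.0 2.0 8.0 9.0;
     5.0 4.0 6.0 7.0]
y = [false, false, true, true]

t = toy_train!(ToyZScore(), X)
c = toy_train!(ToyCentroid(), toy_apply(t, X), y)
scores = toy_apply(c, toy_apply(t, X))  # presences get the highest scores
```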

Utility functions

SDeMo.features Function

```julia
features(sdm::SDM)
```

Returns the features stored in the field X of the SDM. The features are an array, and this function does not return a copy of it: any change made to the output of this function will change the features stored in the SDM.

```julia
features(sdm::SDM, n)
```

Returns the n-th feature stored in the field X of the SDM.
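The no-copy behavior of features(sdm) is what happens when a Julia function returns a stored array directly. The sketch below uses a hypothetical ToyStore type and toy_features function to show the consequence, and how copy gives an independent array. It assumes that features are stored as rows (variables × instances); note that slicing with X[n, :] makes a copy in Julia, and whether features(sdm, n) returns a copy or a view is not stated above.

```julia
# Hypothetical container standing in for an SDM; features are assumed to be
# stored in a field X laid out as variables × instances.
struct ToyStore
    X::Matrix{Float64}
end

# Returns the stored array itself, not a copy.
toy_features(s::ToyStore) = s.X

# Returns the n-th feature, assumed here to be the n-th row (a copy).
toy_features(s::ToyStore, n) = s.X[n, :]

s = ToyStore([1.0 2.0 3.0; 4.0 5.0 6.0])

F = toy_features(s)
F[1, 1] = 100.0       # also changes the stored features
println(s.X[1, 1])    # 100.0

safe = copy(toy_features(s))
safe[1, 1] = -1.0     # the copy is independent
println(s.X[1, 1])    # still 100.0

toy_features(s, 2)    # [4.0, 5.0, 6.0]
```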

SDeMo.labels Function

```julia
labels(sdm::SDM)
```

Returns the labels stored in the field y of the SDM – note that this is not a copy of the labels, but the object itself.

SDeMo.threshold Function

```julia
threshold(sdm::SDM)
```

This returns the value above which the score returned by the SDM is considered to be a presence.

SDeMo.threshold! Function

```julia
threshold!(sdm::SDM, τ)
```

Sets the value of the threshold.

SDeMo.variables Function

```julia
variables(sdm::SDM)
```

Returns the list of variables used by the SDM – these may be ordered by importance. This does not return a copy of the variables array, but the array itself.

SDeMo.variables! Function

```julia
variables!(sdm::SDM, v)
```

Sets the list of variables.

```julia
variables!(ensemble::Bagging, v::Vector{Int})
```

Sets the variables of the top-level model, and then sets the variables of each model in the ensemble.
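A minimal sketch of this two-level update, using hypothetical ToyModel and ToyBag types and a hypothetical toy_variables! function. Each member receives its own copy of the list; this is a design choice for the sketch, so that members never share one mutable array, and not necessarily what SDeMo does.

```julia
# Hypothetical stand-ins: a single model with a list of variables, and a
# bagging ensemble with its own top-level list plus one model per bag.
mutable struct ToyModel
    v::Vector{Int}
end

mutable struct ToyBag
    v::Vector{Int}
    models::Vector{ToyModel}
end

toy_variables!(m::ToyModel, v::Vector{Int}) = (m.v = copy(v); m)

# Set the top-level list first, then propagate it to every member model.
function toy_variables!(e::ToyBag, v::Vector{Int})
    e.v = copy(v)
    for m in e.models
        toy_variables!(m, v)
    end
    return e
end

bag = ToyBag([1, 2, 3, 4], [ToyModel([1, 2]), ToyModel([3, 4]), ToyModel([2, 4])])
toy_variables!(bag, [1, 3])

all(m.v == [1, 3] for m in bag.models)  # true
```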

```julia
variables!(model::AbstractSDM, ::Type{T}, folds::Vector{Tuple{Vector{Int}, Vector{Int}}}; included=Int[], optimality=mcc, verbose::Bool=false, bagfeatures::Bool=false, kwargs...) where {T <: VariableSelectionStrategy}
```

Performs variable selection based on a selection strategy, optionally using the given folds for cross-validation. If the folds are omitted, this defaults to k-fold cross-validation.

After variable selection, the model is retrained on the optimal set of variables.

Keywords:

included (Int[]), a list of variables that must be included in the model
optimality (mcc), the measure to optimise at each round of variable selection
verbose (false), whether the performance should be reported after each round of variable selection
bagfeatures (false), whether bagfeatures! should be called on each model in a homogeneous ensemble
all other keywords are passed to train! and crossvalidate

Important notes:

When using bagfeatures with a pool of included variables, they will always be present in the overall model, but not necessarily in each model of the ensemble
When using VarianceInflationFactor, the variable selection will stop, even if the VIF is still above the threshold, if continuing would produce a model with lower performance; using variables! will always lead to a better model
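To make the roles of included, optimality and verbose more concrete, here is a sketch of one possible strategy, greedy forward selection, written as a hypothetical toy_forward_selection function. The evaluate argument is an assumption standing in for training the model on a set of variables, cross-validating it, and computing the optimality measure; the performance surface below is invented for the example.

```julia
# Hypothetical sketch of greedy forward selection; `evaluate` stands in for
# "train on these variables, cross-validate, and return the optimality
# measure" (higher is better). Not SDeMo's implementation.
function toy_forward_selection(evaluate, nvars::Int; included::Vector{Int}=Int[], verbose::Bool=false)
    selected = copy(included)  # included variables are always kept
    best = isempty(selected) ? -Inf : evaluate(selected)
    while true
        candidates = setdiff(1:nvars, selected)
        isempty(candidates) && break
        perf = [evaluate(vcat(selected, c)) for c in candidates]
        i = argmax(perf)
        # Stop as soon as adding a variable no longer improves performance.
        perf[i] <= best && break
        best = perf[i]
        push!(selected, candidates[i])
        verbose && println("added variable ", candidates[i], ", performance ", best)
    end
    return selected, best
end

# Invented performance surface: variables 2 and 4 help, the others hurt.
useful = Dict(2 => 0.30, 4 => 0.20)
toy_score(vars) = 0.4 + sum(get(useful, v, -0.05) for v in vars; init=0.0)

toy_forward_selection(toy_score, 5)                # selects [2, 4]
toy_forward_selection(toy_score, 5; included=[1])  # selects [1, 2, 4]
```

Included variables form the starting set, so they are always part of the result, and the loop stops as soon as no remaining candidate improves the measure.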

```julia
variables!(model::M, ::Type{StrictVarianceInflationFactor{N}}, args...; included::Vector{Int}=Int[], optimality=mcc, verbose::Bool=false, bagfeatures::Bool=false, kwargs...) where {M <: Union{SDM, Bagging}, N}
```

This method performs variable selection for the strict VIF case, where the VIF threshold is enforced even if the resulting model performs worse; for this reason, there is no cross-validation.
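For reference, the variance inflation factor of variable j is VIF_j = 1/(1 - R_j²), where R_j² comes from regressing variable j on all the other variables. The sketch below (hypothetical toy_vif and toy_strict_vif functions, variables stored in rows) applies a strict rule: drop the variable with the largest VIF until every VIF is at or below the threshold, without looking at model performance. Whether SDeMo fits an intercept or removes variables in exactly this order is an assumption.

```julia
using LinearAlgebra, Statistics

# Hypothetical sketch: VIF of each variable (row) of X, using the textbook
# definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from an ordinary
# least-squares regression (with intercept) of variable j on the others.
function toy_vif(X::AbstractMatrix)
    p, n = size(X)
    out = zeros(p)
    for j in 1:p
        target = X[j, :]
        A = hcat(ones(n), permutedims(X[setdiff(1:p, j), :]))
        β = A \ target
        r2 = 1 - sum(abs2, target - A * β) / sum(abs2, target .- mean(target))
        out[j] = 1 / (1 - r2)
    end
    return out
end

# Strict rule: remove the variable with the largest VIF until every VIF is at
# or below the threshold, regardless of what happens to model performance.
function toy_strict_vif(X::AbstractMatrix, threshold::Real)
    keep = collect(1:size(X, 1))
    while length(keep) > 1
        v = toy_vif(X[keep, :])
        worst = argmax(v)
        v[worst] <= threshold && break
        deleteat!(keep, worst)
    end
    return keep
end

X = [1.0  2.0 3.0  4.0 5.0  6.0;
     1.0 -1.0 1.0 -1.0 1.0 -1.0;
     2.0  1.0 4.0  3.0 6.0  5.0]  # row 3 = row 1 + row 2

toy_strict_vif(X, 5.0)  # two of the three collinear variables remain
```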

SDeMo.instance Function

```julia
instance(sdm::SDM, n; strict=true)
```

Returns the n-th instance stored in the field X of the SDM. If the keyword argument strict is true, only the variables used for prediction are returned.
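A short sketch of the strict behavior, with a hypothetical toy_instance function, assuming that X stores variables in rows and instances in columns, and that v holds the indices of the variables used by the model.

```julia
# Hypothetical sketch of instance(sdm, n; strict): with strict=true only the
# used variables (rows listed in v) of the n-th instance (column) are kept.
toy_instance(X::AbstractMatrix, v::Vector{Int}, n; strict::Bool=true) =
    strict ? X[v, n] : X[:, n]

X = [1.0 2.0 3.0;
     4.0 5.0 6.0;
     7.0 8.0 9.0]
v = [1, 3]

toy_instance(X, v, 2)                # [2.0, 8.0]
toy_instance(X, v, 2; strict=false)  # [2.0, 5.0, 8.0]
```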

Training and predicting

SDeMo.train! Function

```julia
train!(ensemble::Bagging; kwargs...)
```

Trains all the models in an ensemble model; the keyword arguments are passed to train! for each model. Note that this retrains the entire model, including the transformers.

```julia
train!(ensemble::Ensemble; kwargs...)
```

Trains all the models in a heterogeneous ensemble model; the keyword arguments are passed to train! for each model. Note that this retrains the entire model, including the transformers.

The keyword arguments passed to train! can include the training indices.
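The keyword forwarding described above can be sketched as follows. ToyLinear, ToyTree and toy_fit! are hypothetical: each member type has its own method, and the ensemble-level method passes every keyword argument, such as the training indices, unchanged to each member.

```julia
# Hypothetical member types of a heterogeneous ensemble; each one records the
# training indices it received so the forwarding can be checked.
mutable struct ToyLinear
    training::Any
end

mutable struct ToyTree
    training::Any
end

# One method per member type; Colon() (i.e. `:`) means "use all the data".
toy_fit!(m::ToyLinear; training=Colon(), kwargs...) = (m.training = training; m)
toy_fit!(m::ToyTree; training=Colon(), kwargs...) = (m.training = training; m)

# Ensemble-level method: forward every keyword argument to every member.
function toy_fit!(members::AbstractVector; kwargs...)
    for m in members
        toy_fit!(m; kwargs...)
    end
    return members
end

ens = Any[ToyLinear(nothing), ToyTree(nothing)]
toy_fit!(ens; training=1:50)
[m.training for m in ens]  # both members were trained on 1:50
```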

```julia
train!(sdm::SDM; threshold=true, training=:, optimality=mcc)
```

This is the main training function to train a SDM.

The keyword arguments are:

training: defaults to :, and is the range (or alternatively the indices) of the data that are used to train the model
threshold: defaults to true, and performs a moving-threshold optimization by evaluating 200 possible values between the minimum and maximum output of the model, and keeping the one that is optimal
optimality: defaults to mcc, and is the function applied to the confusion matrix to evaluate which value of the threshold is the best
absences: defaults to false, and indicates whether the (pseudo) absences are used to train the transformer; when using actual absences, this should be set to true

Internally, this function trains the transformer, then projects the data, then trains the classifier. If threshold is true, the threshold is then optimized.
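The moving-threshold step can be sketched as below: 200 evenly spaced candidate thresholds between the minimum and maximum score are evaluated, and the one that maximizes the optimality measure is kept. The functions toy_mcc and toy_moving_threshold are hypothetical; toy_mcc computes the Matthews correlation coefficient from the confusion matrix, and treating scores strictly above τ as presences is an assumption.

```julia
# Hypothetical helper: Matthews correlation coefficient from the confusion
# matrix of predicted vs observed presences (0.0 when undefined).
function toy_mcc(pred::AbstractVector{Bool}, obs::AbstractVector{Bool})
    tp = count(pred .& obs)
    tn = count(.!pred .& .!obs)
    fp = count(pred .& .!obs)
    fn = count(.!pred .& obs)
    denom = sqrt(float(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return denom == 0 ? 0.0 : (tp * tn - fp * fn) / denom
end

# Moving threshold: try n evenly spaced values between the smallest and the
# largest score and keep the one maximizing the optimality measure.
function toy_moving_threshold(scores, obs; optimality=toy_mcc, n::Int=200)
    candidates = range(minimum(scores), maximum(scores); length=n)
    perf = [optimality(scores .> τ, obs) for τ in candidates]
    i = argmax(perf)
    return candidates[i], perf[i]
end

scores = [0.10, 0.20, 0.30, 0.70, 0.80, 0.90]
obs = [false, false, false, true, true, true]

τ, best = toy_moving_threshold(scores, obs)
# τ falls between 0.3 and 0.7, where all six instances are classified correctly
```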

StatsAPI.predict Function

```julia
predict(model::RegressionModel, [newX])
```

Form the predicted response of model. An object with new covariate values newX can be supplied, which should have the same type and structure as that used to fit model; e.g. for a GLM it would generally be a DataFrame with the same variable names as the original predictors.

SDeMo.reset! Function

```julia
reset!(sdm::SDM, thr=0.5)
```

Resets a model, optionally with a specified value of the threshold (thr, 0.5 by default). This amounts to re-using all the variables and removing the tuned value of the threshold.
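A minimal sketch of what resetting amounts to, with a hypothetical ToyResettable type and toy_reset! function: every variable is used again, and the threshold is replaced by thr. The field names and the variables × instances layout are assumptions.

```julia
# Hypothetical stand-in for a model that can be reset; field names and the
# variables × instances layout of X are assumptions for this sketch.
mutable struct ToyResettable
    X::Matrix{Float64}
    v::Vector{Int}
    τ::Float64
end

function toy_reset!(m::ToyResettable, thr=0.5)
    m.v = collect(1:size(m.X, 1))  # re-use every variable
    m.τ = thr                      # drop the tuned threshold
    return m
end

m = ToyResettable(rand(4, 10), [2, 3], 0.37)
toy_reset!(m)
(m.v, m.τ)  # ([1, 2, 3, 4], 0.5)
```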
