
Tools for SDM demos and education

The prediction pipeline

SDeMo.AbstractSDM Type
julia
AbstractSDM

This abstract type covers both the regular and the ensemble models.

SDeMo.AbstractEnsembleSDM Type
julia
AbstractEnsembleSDM

This abstract type covers models that combine different SDMs to make a prediction; it currently covers Bagging and Ensemble.

SDeMo.SDM Type
julia
SDM

This type specifies a full model, which is composed of a transformer (which applies a transformation to the data), a classifier (which returns a quantitative score), and a threshold (above which the score corresponds to the prediction of a presence).

In addition, the SDM carries with it the training features and labels, as well as a vector of indices indicating which variables are actually used by the model.
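To make this composition concrete, here is a minimal sketch of the same structure in plain Julia. The types `ZScoreToy`, `LinearToy` and `ToySDM` and the functions `transform`, `score` and `toy_predict` are hypothetical illustrations, not SDeMo's API. The sketch also assumes that features are stored as variables × instances.

```julia
using Statistics

# Hypothetical stand-in for a Transformer: per-variable z-score
struct ZScoreToy
    μ::Vector{Float64}
    σ::Vector{Float64}
end

# Hypothetical stand-in for a Classifier: logistic function of a weighted sum
struct LinearToy
    w::Vector{Float64}
end

# Hypothetical container mirroring the components described above
struct ToySDM
    transformer::ZScoreToy
    classifier::LinearToy
    τ::Float64            # threshold
    X::Matrix{Float64}    # training features (assumed variables × instances)
    y::Vector{Bool}       # training labels
    v::Vector{Int}        # indices of the variables used by the model
end

transform(t::ZScoreToy, x::AbstractMatrix) = (x .- t.μ) ./ t.σ
score(c::LinearToy, z::AbstractMatrix) = vec(1 ./ (1 .+ exp.(-(c.w' * z))))

function toy_predict(m::ToySDM, x::AbstractMatrix; threshold::Bool=true)
    z = transform(m.transformer, x[m.v, :])   # keep the used variables, then transform
    s = score(m.classifier, z)                # quantitative score in (0, 1)
    return threshold ? s .> m.τ : s           # presence when the score is above τ
end

# Usage: two variables, only the first one is used by the model
X = [1.0 2.0 3.0 4.0; 10.0 20.0 30.0 40.0]
y = [false, false, true, true]
v = [1]
μ = vec(mean(X[v, :]; dims=2))
σ = vec(std(X[v, :]; dims=2))
m = ToySDM(ZScoreToy(μ, σ), LinearToy([2.0]), 0.5, X, y, v)
toy_predict(m, X)
```

Passing `threshold=false` returns the raw scores instead of the presence/absence predictions.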

SDeMo.Transformer Type
julia
Transformer

This abstract type covers all transformations that are applied to the data before fitting the classifier.

SDeMo.Classifier Type
julia
Classifier

This abstract type covers all algorithms that convert transformed data into a prediction.


Utility functions

SDeMo.features Function
julia
features(sdm::SDM)

Returns the features stored in the field X of the SDM. The features are an array, and this function returns the array itself rather than a copy: any change made to the output of this function will change the features stored in the SDM.
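The difference between returning the stored array and returning a copy can be illustrated with plain Julia arrays. In this sketch, `Holder` and `getfeatures` are hypothetical stand-ins for an SDM and an accessor that behaves as described above.

```julia
# Hypothetical container holding a feature matrix
struct Holder
    X::Matrix{Float64}
end

# Returns the stored array itself (no copy), like the accessor described above
getfeatures(h::Holder) = h.X

h = Holder([1.0 2.0; 3.0 4.0])

F = getfeatures(h)
F[1, 1] = 100.0            # also modifies the array stored in h
h.X[1, 1]                  # now 100.0

G = copy(getfeatures(h))   # an explicit copy can be modified safely
G[2, 2] = -1.0
h.X[2, 2]                  # still 4.0
```

If you need to experiment with the features without altering the model, calling `copy` on the output first is the simplest safeguard.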

julia
features(sdm::SDM, n)

Returns the n-th feature stored in the field X of the SDM.

SDeMo.labels Function
julia
labels(sdm::SDM)

Returns the labels stored in the field y of the SDM. This is not a copy of the labels but the object itself, so modifying the output modifies the labels of the SDM.

SDeMo.threshold Function
julia
threshold(sdm::SDM)

This returns the value above which the score returned by the SDM is considered to be a presence.

SDeMo.threshold! Function
julia
threshold!(sdm::SDM, τ)

Sets the value of the threshold.

SDeMo.variables Function
julia
variables(sdm::SDM)

Returns the list of variables used by the SDM – these may be ordered by importance. This does not return a copy of the variables array, but the array itself.

SDeMo.variables! Function
julia
variables!(sdm::SDM, v)

Sets the list of variables.

julia
variables!(ensemble::Bagging, v::Vector{Int})

Sets the variables of the top-level model, and then sets the variables of each model in the ensemble.

julia
variables!(model::AbstractSDM, ::Type{T}, folds::Vector{Tuple{Vector{Int}, Vector{Int}}}; included=Int[], optimality=mcc, verbose::Bool=false, bagfeatures::Bool=false, kwargs...) where {T <: VariableSelectionStrategy}

Performs variable selection based on a selection strategy, with optional folds for cross-validation. If the folds are omitted, k-fold cross-validation is used.

After variable selection, the model is retrained on the optimal set of variables.

Keywords:

  • included (Int[]), a list of variables that must be included in the model

  • optimality (mcc), the measure to optimise at each round of variable selection

  • verbose (false), whether the performance should be reported after each round of variable selection

  • bagfeatures (false), whether bagfeatures! should be called on each model in a homogeneous ensemble

  • all other keywords are passed to train! and crossvalidate

Important notes:

  1. When using bagfeatures with a pool of included variables, these variables will always be present in the overall model, but not necessarily in each model of the ensemble

  2. When using VarianceInflationFactor, variable selection stops, even if the VIF is still above the threshold, when removing another variable would produce a model with lower performance; using variables! will therefore never lead to a worse model
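As an illustration of what a selection strategy does, here is a sketch of one common strategy, forward selection. Starting from the included variables, each round adds the variable that most improves the performance measure, and the search stops when no addition helps. This is a generic sketch, not SDeMo's implementation. The names `forward_select` and `toy_evaluate` are hypothetical, and `toy_evaluate` stands in for the cross-validated performance (e.g. MCC) that the real function computes.

```julia
# Generic forward selection (hypothetical sketch, not SDeMo's implementation).
# `evaluate(vars)` returns the performance of a model using `vars`; higher is better.
function forward_select(evaluate, nvars::Int; included::Vector{Int}=Int[])
    selected = copy(included)
    best = isempty(selected) ? -Inf : evaluate(selected)
    while true
        candidates = setdiff(1:nvars, selected)
        isempty(candidates) && break
        # Score every model obtained by adding one candidate variable
        scores = [evaluate(vcat(selected, c)) for c in candidates]
        i = argmax(scores)
        # Stop as soon as no candidate improves the performance
        scores[i] > best || break
        best = scores[i]
        push!(selected, candidates[i])
    end
    return selected, best
end

# Toy performance: variables 2 and 4 help, every other variable costs a little
toy_evaluate(vars) = 0.3 * (2 in vars) + 0.2 * (4 in vars) -
                     0.05 * count(v -> !(v in (2, 4)), vars)

forward_select(toy_evaluate, 5)                  # ([2, 4], 0.5)
forward_select(toy_evaluate, 5; included=[1])    # variable 1 is kept: [1, 2, 4]
```

As with the included keyword above, variables passed in `included` are never removed, even when they lower the performance.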

julia
variables!(model::M, ::Type{StrictVarianceInflationFactor{N}}, args...; included::Vector{Int}=Int[], optimality=mcc, verbose::Bool=false, bagfeatures::Bool=false, kwargs...) where {M <: Union{SDM, Bagging}, N}

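Version of the variable selection for the strict VIF case: variables are removed until the VIF of every remaining variable is below the threshold, even if this lowers the performance of the model. Because performance is not used to guide the selection, there is no cross-validation, and the resulting model may be worse.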
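The variance inflation factor of variable j is 1 / (1 - R²_j), where R²_j is the coefficient of determination of a linear regression of variable j on all the other variables. A strict VIF selection can be sketched as repeatedly dropping the variable with the largest VIF until every VIF is below the threshold N, without looking at model performance. The functions `vif` and `strict_vif` are hypothetical and use ordinary least squares from the LinearAlgebra standard library. As in the other sketches on this page, the matrix is assumed to store variables × instances.

```julia
using LinearAlgebra
using Statistics

# Variance inflation factor of variable j (X assumed to be variables × instances)
function vif(X::AbstractMatrix, j::Int)
    y = X[j, :]
    others = setdiff(axes(X, 1), j)
    # Design matrix: intercept and all other variables, one row per instance
    A = hcat(ones(size(X, 2)), permutedims(X[others, :]))
    β = A \ y                                  # ordinary least squares
    ssres = sum(abs2, y - A * β)
    sstot = sum(abs2, y .- mean(y))
    return 1 / (1 - (1 - ssres / sstot))       # 1 / (1 - R²)
end

# Strict selection: drop the variable with the largest VIF until all are below N
function strict_vif(X::AbstractMatrix, N::Real)
    kept = collect(axes(X, 1))
    while length(kept) > 1
        v = [vif(X[kept, :], j) for j in eachindex(kept)]
        i = argmax(v)
        v[i] < N && break
        deleteat!(kept, i)
    end
    return kept
end

# Usage: x3 is nearly collinear with x1, x2 is not
x1 = collect(1.0:8.0)
x2 = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
x3 = x1 .+ 0.1 .* [(-1.0)^(k + 1) for k in 1:8]
X = permutedims(hcat(x1, x2, x3))   # 3 variables × 8 instances
strict_vif(X, 10)                   # drops x1 or x3 and keeps x2
```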

SDeMo.instance Function
julia
instance(sdm::SDM, n; strict=true)

Returns the n-th instance stored in the field X of the SDM. If the keyword argument strict is true (the default), only the variables used for prediction are returned.
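For this sketch, assume that the feature matrix stores variables as rows and instances as columns. Under that assumption, `features(sdm, n)` amounts to taking a row, and `instance(sdm, n; strict)` to taking a column, with `strict` restricting the column to the variables in use. The functions `feature_row` and `instance_col` are hypothetical stand-ins that operate on a plain matrix.

```julia
# Hypothetical data: 3 variables (rows) × 4 instances (columns)
X = [1.0 2.0 3.0 4.0;
     5.0 6.0 7.0 8.0;
     9.0 10.0 11.0 12.0]
used = [1, 3]   # indices of the variables used by the model

# n-th feature: one variable across all instances
feature_row(X, n) = X[n, :]

# n-th instance: only the used variables when strict is true, all of them otherwise
instance_col(X, used, n; strict::Bool=true) = strict ? X[used, n] : X[:, n]

feature_row(X, 2)                        # [5.0, 6.0, 7.0, 8.0]
instance_col(X, used, 2)                 # [2.0, 10.0]
instance_col(X, used, 2; strict=false)   # [2.0, 6.0, 10.0]
```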


Training and predicting

SDeMo.train! Function
julia
train!(ensemble::Bagging; kwargs...)

Trains all the models in an ensemble model; the keyword arguments are passed to train! for each model. Note that this retrains each entire model, including the transformers.
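Bagging (bootstrap aggregating) trains each model of the ensemble on a bootstrap sample of the training instances, drawn with replacement. The sketch below uses a hypothetical function, `bootstrap_samples`, and the Random standard library. It only shows how such samples and their out-of-bag complements can be drawn; it does not reproduce how SDeMo stores them.

```julia
using Random

# Draw `nmodels` bootstrap samples of size `n` (with replacement), plus the
# out-of-bag indices that each sample leaves out
function bootstrap_samples(rng::AbstractRNG, n::Int, nmodels::Int)
    bags = [rand(rng, 1:n, n) for _ in 1:nmodels]
    oob = [setdiff(1:n, bag) for bag in bags]
    return bags, oob
end

rng = MersenneTwister(42)
bags, oob = bootstrap_samples(rng, 10, 3)
# Assuming variables × instances, model i would be trained on X[:, bags[i]], y[bags[i]]
```

The out-of-bag indices are never seen by the corresponding model, which makes them a natural set on which to evaluate it.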

julia
train!(ensemble::Ensemble; kwargs...)

Trains all the models in a heterogeneous ensemble model; the keyword arguments are passed to train! for each model. Note that this retrains each entire model, including the transformers.

The keyword arguments are passed to train! and can include the training indices.

julia
train!(sdm::SDM; threshold=true, training=:, optimality=mcc, absences=false)

This is the main function used to train an SDM.

The keyword arguments are:

  • training: defaults to :, and is the range (or alternatively the indices) of the data that are used to train the model

  • threshold: defaults to true, and performs threshold moving by evaluating 200 candidate values between the minimum and maximum output of the model, keeping the one that is optimal according to optimality

  • optimality: defaults to mcc, and is the function applied to the confusion matrix to evaluate which value of the threshold is the best

  • absences: defaults to false, and indicates whether the (pseudo) absences are used to train the transformer; when using actual absences, this should be set to true
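The threshold search described above can be sketched in plain Julia. The sketch computes the Matthews correlation coefficient (MCC) for 200 evenly spaced candidate thresholds between the smallest and largest score, and keeps the best one. The functions `mcc_from` and `moving_threshold` are hypothetical names for this illustration. In particular, `mcc_from` takes the four confusion-matrix counts directly, which is an assumption of the sketch and not the signature of SDeMo's `mcc`.

```julia
# Matthews correlation coefficient from confusion-matrix counts (0 when undefined)
function mcc_from(tp, tn, fp, fn)
    den = sqrt(float(tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return den == 0 ? 0.0 : (tp * tn - fp * fn) / den
end

# Evaluate `n` candidate thresholds between the min and max score; keep the best
function moving_threshold(scores::AbstractVector{<:Real}, labels::AbstractVector{Bool};
                          n::Int=200, optimality=mcc_from)
    candidates = range(minimum(scores), maximum(scores); length=n)
    best_τ, best_val = first(candidates), -Inf
    for τ in candidates
        pred = scores .> τ                  # presence when the score is above τ
        tp = count(pred .& labels)
        tn = count(.!pred .& .!labels)
        fp = count(pred .& .!labels)
        fn = count(.!pred .& labels)
        val = optimality(tp, tn, fp, fn)
        if val > best_val
            best_τ, best_val = τ, val
        end
    end
    return best_τ, best_val
end

scores = [0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [false, false, false, false, true, true, true, true]
τ, val = moving_threshold(scores, labels)   # val is 1.0; τ lies between 0.4 and 0.6
```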

Internally, this function trains the transformer, then projects the data, then trains the classifier. If threshold is true, the threshold is then optimized.
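The training order (fit the transformer, project the data, fit the classifier) can be sketched with a z-score transformer. Following the description of `absences`, the transformer in this sketch is fitted on the presences only unless `absences=true`. The function `toy_train` and its nearest-centroid "classifier" are hypothetical placeholders, not SDeMo's `train!`, and features are assumed to be stored as variables × instances.

```julia
using Statistics

# Hypothetical sketch of the training order, not SDeMo's train!
function toy_train(X::AbstractMatrix, y::AbstractVector{Bool};
                   training=Colon(), absences::Bool=false)
    Xt, yt = X[:, training], y[training]
    # 1. Train the transformer: z-score parameters from the presences only,
    #    unless the (pseudo) absences should also be used
    fitcols = absences ? (1:length(yt)) : findall(yt)
    μ = vec(mean(Xt[:, fitcols]; dims=2))
    σ = vec(std(Xt[:, fitcols]; dims=2))
    # 2. Project the training data
    Z = (Xt .- μ) ./ σ
    # 3. Train the classifier: here, simply the centroid of each class
    cp = vec(mean(Z[:, yt]; dims=2))
    ca = vec(mean(Z[:, .!yt]; dims=2))
    # 4. Threshold tuning would follow here (omitted in this sketch)
    return (μ=μ, σ=σ, presence=cp, absence=ca)
end

X = [1.0 2.0 3.0 10.0 11.0 12.0]
y = [true, true, true, false, false, false]
toy_train(X, y).μ                   # [2.0]: presences only
toy_train(X, y; absences=true).μ    # [6.5]: all instances
```

With real absences, fitting the transformer on all instances uses information that pseudo-absences would only add noise to, which is why the keyword defaults to false.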

StatsAPI.predict Function
julia
predict(model::RegressionModel, [newX])

Form the predicted response of model. An object with new covariate values newX can be supplied, which should have the same type and structure as that used to fit model; e.g. for a GLM it would generally be a DataFrame with the same variable names as the original predictors.

SDeMo.reset! Function
julia
reset!(sdm::SDM, thr=0.5)

Resets a model, optionally setting the threshold to thr (0.5 by default). This amounts to re-using all the variables and discarding the tuned threshold.
