Tools for SDM demos and education
The prediction pipeline
SDeMo.AbstractSDM Type
AbstractSDM
This abstract type covers both the regular and the ensemble models.
SDeMo.AbstractEnsembleSDM Type
AbstractEnsembleSDM
This abstract type covers models that combine different SDMs to make a prediction, which currently includes Bagging and Ensemble.
SDeMo.SDM Type
SDM
This type specifies a full model, which is composed of a transformer (which applies a transformation on the data), a classifier (which returns a quantitative score), and a threshold (above which the score corresponds to the prediction of a presence).
In addition, the SDM carries with it the training features and labels, as well as a vector of indices indicating which variables are actually used by the model.
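The three stages described above (transform, score, threshold) can be sketched with a toy model. This is only an illustration: ToySDM, toy_transform, toy_score, and toy_classify are hypothetical names, the z-score transformer and logistic score are arbitrary choices, and the layout (one row per variable, one column per instance) is an assumption rather than a statement about SDeMo's internals.

```julia
using Statistics: mean, std

# Hypothetical stand-in for an SDM; this is NOT SDeMo's actual type.
# Assumed layout: one row per variable, one column per instance.
struct ToySDM
    X::Matrix{Float64}   # training features
    y::Vector{Bool}      # training labels (true = presence)
    v::Vector{Int}       # indices of the variables used by the model
    thr::Float64         # score above which a presence is predicted
end

# Transformer stage: z-score the retained variables using the training data.
function toy_transform(m::ToySDM, Xnew::Matrix{Float64})
    mu = mean(m.X[m.v, :]; dims=2)
    sigma = std(m.X[m.v, :]; dims=2)
    return (Xnew[m.v, :] .- mu) ./ sigma
end

# Classifier stage: a deliberately simple quantitative score, the logistic
# function of the mean transformed value of each instance.
function toy_score(m::ToySDM, Xnew::Matrix{Float64})
    z = mean(toy_transform(m, Xnew); dims=1)
    return vec(1 ./ (1 .+ exp.(-z)))
end

# Threshold stage: scores above the threshold are predicted presences.
toy_classify(m::ToySDM, Xnew::Matrix{Float64}) = toy_score(m, Xnew) .> m.thr

X = [1.0 2.0 3.0 4.0; 10.0 10.0 10.0 10.5; 4.0 3.0 2.0 1.0]
y = [false, false, true, true]
model = ToySDM(X, y, [1], 0.5)   # only the first variable is used
prediction = toy_classify(model, X)
```

Keeping the variable indices in the model means the same object can be re-evaluated with a different subset of variables without touching the stored features.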
SDeMo.Transformer Type
Transformer
This abstract type covers all transformations that are applied to the data before fitting the classifier.
SDeMo.Classifier Type
Classifier
This abstract type covers all algorithms that convert transformed data into a prediction.
Utility functions
SDeMo.features Function
features(sdm::SDM)
Returns the features stored in the field X of the SDM. Note that the features are an array, and this does not return a copy of it: any change made to the output of this function will change the content of the SDM features.
features(sdm::SDM, n)
Returns the n-th feature stored in the field X of the SDM.
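The consequence of returning the stored array rather than a copy can be shown with a minimal sketch. ToyModel and stored_features are hypothetical names used only to mimic the behaviour described above; they are not part of SDeMo.

```julia
# Hypothetical container mimicking the behaviour described above;
# not SDeMo's actual SDM type.
struct ToyModel
    X::Matrix{Float64}
end

# Accessor that hands back the stored array itself, not a copy.
stored_features(m::ToyModel) = m.X

m = ToyModel([1.0 2.0; 3.0 4.0])

# Writing through the accessor changes the data held by the model.
F = stored_features(m)
F[1, 1] = 100.0

# Working on an explicit copy leaves the model untouched.
G = copy(stored_features(m))
G[2, 2] = -1.0
```

When the features need to be modified for exploration only, taking a copy first avoids silently changing the training data.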
SDeMo.labels Function
labels(sdm::SDM)
Returns the labels stored in the field y of the SDM. Note that this is not a copy of the labels, but the object itself.
SDeMo.threshold Function
threshold(sdm::SDM)
This returns the value above which the score returned by the SDM is considered to be a presence.
SDeMo.variables Function
variables(sdm::SDM)
Returns the list of variables used by the SDM – these may be ordered by importance. This does not return a copy of the variables array, but the array itself.
SDeMo.instance Function
instance(sdm::SDM, n; strict=true)
Returns the n-th instance stored in the field X of the SDM. If the keyword argument strict is true, only the variables used for prediction are returned.
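The difference between asking for a feature and asking for an instance, and the effect of strict, can be sketched on a plain matrix. The helpers toy_feature and toy_instance are hypothetical, and the layout (one row per variable, one column per instance) is an assumption made for this sketch.

```julia
# Hypothetical helpers on a plain matrix; not SDeMo's implementation.
# Assumed layout: one row per variable, one column per instance.

# All values of the n-th variable, across every instance.
toy_feature(X::AbstractMatrix, n::Integer) = X[n, :]

# The n-th instance; when strict is true, only the variables listed in v
# (those used for prediction) are kept.
function toy_instance(X::AbstractMatrix, v::Vector{Int}, n::Integer; strict::Bool=true)
    return strict ? X[v, n] : X[:, n]
end

X = [1.0 2.0 3.0; 4.0 5.0 6.0; 7.0 8.0 9.0]
v = [1, 3]   # variables retained by the (imaginary) model

second_feature = toy_feature(X, 2)
second_instance = toy_instance(X, v, 2)
second_instance_all = toy_instance(X, v, 2; strict=false)
```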
Training and predicting
SDeMo.train! Function
train!(ensemble::Bagging; kwargs...)
Trains all the models in an ensemble model; the keyword arguments are passed to train! for each model. Note that this retrains the entire model, which includes the transformers.
train!(ensemble::Ensemble; kwargs...)
Trains all the models in a heterogeneous ensemble model. Note that this retrains the entire model, which includes the transformers.
The keyword arguments are passed to train! for each model, and can include the training indices.
train!(sdm::SDM; threshold=true, training=:, optimality=mcc)
This is the main training function to train an SDM.
The keyword arguments are:
training: defaults to : (all instances), and is the range (or alternatively the indices) of the data that are used to train the model
threshold: defaults to true, and tunes the threshold by evaluating 200 possible values between the minimum and maximum output of the model, and returning the one that is optimal
optimality: defaults to mcc, and is the function applied to the confusion matrix to evaluate which value of the threshold is the best
absences: defaults to false, and indicates whether the (pseudo) absences are used to train the transformer; when using actual absences, this should be set to true
Internally, this function trains the transformer, then projects the data, then trains the classifier. If threshold is true, the threshold is then optimized.
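The threshold search described above can be sketched as a scan over evenly spaced candidates. This is a simplified illustration and not SDeMo's code: toy_mcc and toy_best_threshold are hypothetical names, and the optimality function here receives predicted and observed labels directly instead of a confusion matrix object.

```julia
# Matthews correlation coefficient computed from predicted and observed labels.
function toy_mcc(pred::AbstractVector{Bool}, obs::AbstractVector{Bool})
    tp = count(pred .& obs)
    tn = count(.!pred .& .!obs)
    fp = count(pred .& .!obs)
    fn = count(.!pred .& obs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention used in this sketch: an undefined MCC is reported as 0.
    return denom == 0 ? 0.0 : (tp * tn - fp * fn) / denom
end

# Scan n evenly spaced candidate thresholds between the smallest and largest
# score, and keep the first one with the best value of the optimality function.
function toy_best_threshold(scores::AbstractVector{<:Real}, obs::AbstractVector{Bool};
                            n::Int=200, optimality=toy_mcc)
    candidates = range(minimum(scores), stop=maximum(scores), length=n)
    quality = [optimality(scores .> t, obs) for t in candidates]
    return candidates[argmax(quality)]
end

scores = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
obs = [false, false, false, true, true, true]
best = toy_best_threshold(scores, obs)
```

Because several candidates can tie for the best value, the result depends on the tie-breaking rule; this sketch keeps the lowest such candidate, which is a choice made for the example only.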
StatsAPI.predict Function
predict(model::RegressionModel, [newX])
Form the predicted response of model. An object with new covariate values newX can be supplied, which should have the same type and structure as that used to fit model; e.g. for a GLM it would generally be a DataFrame with the same variable names as the original predictors.
SDeMo.reset! Function
reset!(sdm::SDM, thr=0.5)
Resets a model, optionally to a specified value of the threshold (0.5 by default). This amounts to using all the variables again and discarding the tuned threshold.