SDeMo
SDeMo.__classsplit
— Method__classsplit(y)
Returns a tuple with the indices of the presences and the indices of the absences. This is used to maintain class balance in cross-validation and bagging.
SDeMo._explain_many_instances
— Method_explain_many_instances(f, Z, X, j, n)
Applies _explain_one_instance to each instance in the matrix Z.
SDeMo._explain_one_instance
— Method_explain_one_instance(f, instance, X, j, n)
This method returns the explanation for the instance at variable j, based on training data X. This is the most granular version of the Shapley values algorithm.
SDeMo._mcsample
— Method_mcsample(x::Vector{T}, X::Matrix{T}, j::Int64, n::Int64) where {T <:Number}
This generates a Monte-Carlo sample for Shapley values. The arguments are, in order:
x: a single instance (as a vector) to explain
X: a matrix of training data providing the samples for explanation
j: the index of the variable to explain
n: the number of samples to generate for evaluation
SDeMo._validate_one_model!
— Method_validate_one_model!(model::AbstractSDM, fold, τ, kwargs...)
Trains the model on the given fold, and returns the validation and training confusion matrices. Used internally by cross-validation.
SDeMo.accuracy
— Functionaccuracy(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of accuracy using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.accuracy
— Methodaccuracy(M::ConfusionMatrix)
Accuracy
$\frac{TP + TN}{TP + TN + FP + FN}$
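As a minimal sketch of how these measures are used together, the snippet below builds a confusion matrix from hypothetical Boolean predictions and labels (see the ConfusionMatrix methods further down), then computes the accuracy and the MCC; the vectors are made up for illustration.

```julia
using SDeMo

# Hypothetical predictions and ground-truth labels, for illustration only
pred  = [true, true, false, true, false, false, true, false]
truth = [true, false, false, true, false, true, true, false]

C = ConfusionMatrix(pred, truth)   # counts of TP, TN, FP, FN
accuracy(C)                        # (TP + TN) / (TP + TN + FP + FN)
mcc(C)                             # usually a more informative single score
```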
SDeMo.backwardselection!
— Methodbackwardselection!(model, folds, pool; verbose::Bool = false, optimality=mcc, kwargs...)
Removes variables one at a time until the optimality measure stops increasing. Variables included in pool are not removed.
All keyword arguments are passed to crossvalidate and train!.
SDeMo.backwardselection!
— Methodbackwardselection!(model, folds; verbose::Bool = false, optimality=mcc, kwargs...)
Removes variables one at a time until the optimality measure stops increasing.
All keyword arguments are passed to crossvalidate and train!.
SDeMo.balancedaccuracy
— Functionbalancedaccuracy(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of balancedaccuracy using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.balancedaccuracy
— Method balancedaccuracy(M::ConfusionMatrix)
Balanced accuracy
$\frac{1}{2} (TPR + TNR)$
SDeMo.bootstrap
— Methodbootstrap(y, X; n = 50)
SDeMo.bootstrap
— Methodbootstrap(sdm::SDM; kwargs...)
SDeMo.ci
— Methodci(C::Vector{<:ConfusionMatrix}, f)
Applies f to all confusion matrices in the vector, and returns the 95% CI.
SDeMo.ci
— Methodci(C::Vector{<:ConfusionMatrix})
Applies the MCC (mcc) to all confusion matrices in the vector, and returns the 95% CI.
SDeMo.classifier
— Methodclassifier(model::Bagging)
Returns the classifier used by the model serving as a template for the bagged model.
SDeMo.classifier
— Methodclassifier(model::SDM)
Returns the classifier used by the model
SDeMo.coinflip
— Methodcoinflip(ensemble::Bagging)
Version of coinflip using the training labels for a homogeneous ensemble.
SDeMo.coinflip
— Methodcoinflip(sdm::SDM)
Version of coinflip
using the training labels for an SDM.
SDeMo.coinflip
— Methodcoinflip(labels::Vector{Bool})
Returns the confusion matrix for the no-skill classifier given a vector of labels. Predictions are made at random, with each class being selected with a probability of one half.
SDeMo.constantnegative
— Methodconstantnegative(ensemble::Bagging)
Version of constantnegative using the training labels for a homogeneous ensemble.
SDeMo.constantnegative
— Methodconstantnegative(sdm::SDM)
Version of constantnegative
using the training labels for an SDM.
SDeMo.constantnegative
— Methodconstantnegative(labels::Vector{Bool})
Returns the confusion matrix for the constant negative classifier given a vector of labels. Predictions are assumed to always be negative.
SDeMo.constantpositive
— Methodconstantpositive(ensemble::Bagging)
Version of constantpositive using the training labels for a homogeneous ensemble.
SDeMo.constantpositive
— Methodconstantpositive(sdm::SDM)
Version of constantpositive
using the training labels for an SDM.
SDeMo.constantpositive
— Methodconstantpositive(labels::Vector{Bool})
Returns the confusion matrix for the constant positive classifier given a vector of labels. Predictions are assumed to always be positive.
SDeMo.counterfactual
— Methodcounterfactual(model::AbstractSDM, x::Vector{T}, yhat, λ; maxiter=100, minvar=5e-5, kwargs...) where {T <: Number}
Generates one counterfactual explanation given an input vector x and a target prediction yhat to reach. The learning rate is λ. The maximum number of iterations used in the Nelder-Mead algorithm is maxiter, and the variance improvement under which the search will stop is minvar. Other keywords are passed to predict.
SDeMo.crossvalidate
— Methodcrossvalidate(sdm, folds; thr = nothing, kwargs...)
Performs cross-validation on a model, given a vector of tuples representing the data splits. The threshold can be fixed through the thr keyword argument. All other keywords are passed to the train! method.
This method returns two vectors of ConfusionMatrix: the confusion matrices for the validation data first, and the confusion matrices for the training data second.
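A minimal sketch of a cross-validation run, assuming sdm is an SDM that has already been set up with training data; kfold, crossvalidate, and mcc are the functions documented in this section.

```julia
using SDeMo

# `sdm` is assumed to be an existing SDM with training features and labels
folds = kfold(sdm)                    # stratified k-fold splits (k = 10 by default)
Cv, Ct = crossvalidate(sdm, folds)    # validation and training confusion matrices
mcc(Cv)                               # mean MCC across the validation folds
mcc(Cv, true)                         # (mean, CI) tuple
```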
SDeMo.dor
— Functiondor(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of dor using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.dor
— Methoddor(M::ConfusionMatrix)
Diagnostic odds ratio, defined as plr/nlr. A useful test has a value larger than unity, and this value has no upper bound.
SDeMo.explain
— Methodexplain(model::AbstractSDM, j; observation = nothing, instances = nothing, samples = 100, kwargs..., )
Uses the Monte-Carlo approximation of Shapley values to provide explanations for specific predictions. The second argument j is the variable for which the explanation should be provided.
The observation keyword is a row in the instances dataset for which explanations must be provided. If instances is nothing, the explanations will be given on the training data.
All other keyword arguments are passed to predict.
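A short sketch of how explain might be called, assuming sdm is an already trained SDM; passing threshold = false (forwarded to predict) explains the raw scores rather than the thresholded predictions.

```julia
using SDeMo

# `sdm` is assumed to be a trained SDM
shap_v1  = explain(sdm, 1; threshold = false)                    # Shapley values for variable 1, on the training data
shap_one = explain(sdm, 1; observation = 3, samples = 200, threshold = false)  # a single observation, more MC samples
```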
SDeMo.f1
— Functionf1(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of f1 using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.f1
— Methodf1(M::ConfusionMatrix)
F₁ score, defined as the harmonic mean between precision and recall:
$2\times\frac{PPV\times TPR}{PPV + TPR}$
This uses the more general fscore
internally.
SDeMo.fdir
— Functionfdir(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of fdir using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.fdir
— Methodfdir(M::ConfusionMatrix)
False discovery rate, 1 - ppv
SDeMo.features
— Methodfeatures(sdm::SDM, n)
Returns the n-th feature stored in the field X
of the SDM.
SDeMo.features
— Methodfeatures(sdm::SDM)
Returns the features stored in the field X
of the SDM. Note that the features are an array, and this does not return a copy of it – any change made to the output of this function will change the content of the SDM features.
SDeMo.fnr
— Functionfnr(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of fnr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.fnr
— Methodfnr(M::ConfusionMatrix)
False-negative rate
$\frac{FN}{FN+TP}$
SDeMo.fomr
— Functionfomr(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of fomr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.fomr
— Methodfomr(M::ConfusionMatrix)
False omission rate, 1 - npv
SDeMo.forwardselection!
— Methodforwardselection!(model, folds, pool; verbose::Bool = false, optimality=mcc, kwargs...)
Adds variables one at a time until the optimality measure stops increasing. The variables in pool are added at the start.
All keyword arguments are passed to crossvalidate and train!.
SDeMo.forwardselection!
— Methodforwardselection!(model, folds; verbose::Bool = false, optimality=mcc, kwargs...)
Adds variables one at a time until the optimality measure stops increasing.
All keyword arguments are passed to crossvalidate and train!.
SDeMo.fpr
— Functionfpr(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of fpr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.fpr
— Methodfpr(M::ConfusionMatrix)
False-positive rate
$\frac{FP}{FP+TN}$
SDeMo.fscore
— Functionfscore(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of fscore using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.fscore
— Functionfscore(M::ConfusionMatrix, β=1.0)
Fᵦ score, defined as the harmonic mean between precision and recall, using a positive factor β indicating the relative importance of recall over precision:
$(1 + \beta^2)\times\frac{PPV\times TPR}{(\beta^2 \times PPV) + TPR}$
SDeMo.holdout
— Methodholdout(y, X; proportion = 0.2, permute = true)
Sets aside a proportion of observations (given by the proportion keyword, which defaults to 0.2) to use for validation, and the rest for training. An additional keyword permute (defaults to true) can be used to shuffle the order of observations before they are split.
This method returns a single tuple with the training data first and the validation data second. To use this with crossvalidate, it must be put in a vector ([]), as sketched below.
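A minimal sketch of that pattern, assuming sdm is an SDM that already holds training data:

```julia
using SDeMo

split = holdout(sdm)                   # one (training, validation) tuple
Cv, Ct = crossvalidate(sdm, [split])   # note the brackets: crossvalidate expects a vector of splits
```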
SDeMo.holdout
— Methodholdout(sdm::Bagging)
Version of holdout
using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.
SDeMo.holdout
— Methodholdout(sdm::SDM)
Version of holdout
using the instances and labels of an SDM.
SDeMo.hyperparameters!
— Methodhyperparameters!(tr::HasHyperParams, hp::Symbol, val)
Sets the hyper-parameters for a transformer or a classifier
SDeMo.hyperparameters
— Methodhyperparameters(::Type{<:HasHyperParams}) = nothing
Returns the hyper-parameters for a type of classifier or transformer
SDeMo.hyperparameters
— Methodhyperparameters(::HasHyperParams)
Returns the hyper-parameters for a classifier or a transformer
SDeMo.hyperparameters
— Methodhyperparameters(::HasHyperParams, ::Symbol)
Returns the value of a hyper-parameter
SDeMo.instance
— Methodinstance(sdm::SDM, n; strict=true)
Returns the n-th instance stored in the field X of the SDM. If the keyword argument strict is true, only the variables used for prediction are returned.
SDeMo.iqr
— Functioniqr(x, m=0.25, M=0.75)
Returns the inter-quantile range, by default between 25% and 75% of observations.
SDeMo.kfold
— Methodkfold(y, X; k = 10, permute = true)
Returns splits of the data in which one group is used for validation, and the k-1 other groups are used for training. All k groups have (approximately) the same size, and each instance is used exactly once for validation (and k-1 times for training). The groups are stratified, so that they have the same prevalence.
This method returns a vector of tuples, with each entry having the training data first, and the validation data second.
SDeMo.kfold
— Methodkfold(sdm::Bagging)
Version of kfold
using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.
SDeMo.kfold
— Methodkfold(sdm::SDM)
Version of kfold
using the instances and labels of an SDM.
SDeMo.labels
— Methodlabels(sdm::SDM)
Returns the labels stored in the field y
of the SDM – note that this is not a copy of the labels, but the object itself.
SDeMo.leaveoneout
— Methodleaveoneout(y, X)
Returns the splits for leave-one-out cross-validation. Each sample is used once, on its own, for validation.
This method returns a vector of tuples, with each entry having the training data first, and the validation data second.
SDeMo.leaveoneout
— Methodleaveoneout(sdm::Bagging)
Version of leaveoneout
using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.
SDeMo.leaveoneout
— Methodleaveoneout(sdm::SDM)
Version of leaveoneout
using the instances and labels of an SDM.
SDeMo.loadsdm
— Methodloadsdm(file::String; kwargs...)
Loads a model from a JSON file. The keyword arguments are passed to train!. The model is trained in full upon loading.
SDeMo.markedness
— Functionmarkedness(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of markedness using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.markedness
— Methodmarkedness(M::ConfusionMatrix)
Markedness, a measure similar to informedness (TSS) that emphasizes negative predictions
$PPV + NPV - 1$
SDeMo.mcc
— Functionmcc(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of mcc using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.mcc
— Methodmcc(M::ConfusionMatrix)
Matthews correlation coefficient. This is the default measure of model performance, and there are rarely good reasons to use anything else to decide which model to use.
SDeMo.montecarlo
— Methodmontecarlo(y, X; n = 100, kwargs...)
Returns n (default 100) samples of holdout. Other keyword arguments are passed to holdout.
This method returns a vector of tuples, with each entry having the training data first, and the validation data second.
SDeMo.montecarlo
— Methodmontecarlo(sdm::Bagging)
Version of montecarlo
using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.
SDeMo.montecarlo
— Methodmontecarlo(sdm::SDM)
Version of montecarlo
using the instances and labels of an SDM.
SDeMo.nlr
— Functionnlr(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of nlr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.nlr
— Methodnlr(M::ConfusionMatrix)
Negative likelihood ratio
$\frac{FNR}{TNR}$
SDeMo.noselection!
— Methodnoselection!(model, folds; verbose::Bool = false, kwargs...)
Returns the model to the state where all variables are used.
All keyword arguments are passed to train!.
SDeMo.noselection!
— Methodnoselection!(model; verbose::Bool = false, kwargs...)
Returns the model to the state where all variables are used.
All keyword arguments are passed to train!. For convenience, this version does not require a folds argument, as it would be unused anyway.
SDeMo.noskill
— Methodnoskill(ensemble::Bagging)
Version of noskill using the training labels for a homogeneous ensemble.
SDeMo.noskill
— Methodnoskill(sdm::SDM)
Version of noskill
using the training labels for an SDM.
SDeMo.noskill
— Methodnoskill(labels::Vector{Bool})
Returns the confusion matrix for the no-skill classifier given a vector of labels. Predictions are made at random, with each class being selected with a probability equal to its proportion in the training data.
SDeMo.npv
— Functionnpv(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of npv using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.npv
— Methodnpv(M::ConfusionMatrix)
Negative predictive value
$\frac{TN}{TN+FN}$
SDeMo.outofbag
— Methodoutofbag(ensemble::Bagging; kwargs...)
This method returns the confusion matrix associated with the out-of-bag error, wherein the success in predicting instance i is calculated on the basis of all models that have not been trained on i. The consensus of the different models is a simple majority rule.
The additional keyword arguments are passed to predict.
SDeMo.partialresponse
— Methodpartialresponse(model::T, i::Integer, j::Integer, s::Tuple=(50, 50); inflated::Bool, kwargs...)
This method returns the partial response of applying the trained model to a simulated dataset where all variables except i and j are set to their mean value.
This function will return a grid corresponding to evenly spaced values of i and j, the size of which is given by the last argument s (defaults to 50 × 50).
All keyword arguments are passed to predict.
SDeMo.partialresponse
— Methodpartialresponse(model::T, i::Integer, args...; inflated::Bool, kwargs...)
This method returns the partial response of applying the trained model to a simulated dataset where all variables except i are set to their mean value. The inflated keyword, when set to true, will instead pick a random value within the range of the observations.
The different arguments that can follow the variable position are:
- nothing, where the unique values of the i-th variable are used (sorted)
- a number, in which case that many evenly spaced points within the range of the variable are used
- an array, in which case each value of this array is evaluated
All keyword arguments are passed to predict.
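A sketch of both forms, assuming sdm is a trained SDM; inflated is passed explicitly, and threshold = false is forwarded to predict so that raw scores (rather than binary predictions) are returned.

```julia
using SDeMo

# Response along variable 1, evaluated at 100 evenly spaced values within its range
pr1 = partialresponse(sdm, 1, 100; inflated = false, threshold = false)

# Response surface for variables 1 and 2, on a 50 × 50 grid
pr12 = partialresponse(sdm, 1, 2, (50, 50); inflated = false, threshold = false)
```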
SDeMo.plr
— Functionplr(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of plr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.plr
— Methodplr(M::ConfusionMatrix)
Positive likelihood ratio
$\frac{TPR}{FPR}$
SDeMo.ppv
— Functionppv(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of ppv using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.ppv
— Methodppv(M::ConfusionMatrix)
Positive predictive value
$\frac{TP}{TP+FP}$
SDeMo.precision
— Functionprecision(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of precision using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.precision
— Methodprecision(M::ConfusionMatrix)
Alias for ppv, the positive predictive value.
SDeMo.recall
— Functionrecall(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of recall using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.recall
— Methodrecall(M::ConfusionMatrix)
Alias for tpr, the true positive rate.
SDeMo.reset!
— Functionreset!(sdm::SDM, thr=0.5)
Resets a model, optionally specifying a new value for the threshold. This amounts to using all the variables again, and discarding any previously tuned threshold.
SDeMo.sensitivity
— Functionsensitivity(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of sensitivity using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.sensitivity
— Methodsensitivity(M::ConfusionMatrix)
Alias for tpr, the true positive rate.
SDeMo.specificity
— Functionspecificity(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of specificity using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.specificity
— Methodspecificity(M::ConfusionMatrix)
Alias for tnr, the true negative rate.
SDeMo.stepwisevif!
— Functionstepwisevif!(model::SDM, limit, tr=:;kwargs...)
Drops the variables with the largest variance inflation from the model, until all VIFs are under the threshold. The last positional argument (defaults to :) is the indices to use for the VIF calculation. All keyword arguments are passed to train!.
SDeMo.threshold!
— Methodthreshold!(sdm::SDM, τ)
Sets the value of the threshold.
SDeMo.threshold
— Methodthreshold(sdm::SDM)
This returns the value above which the score returned by the SDM is considered to be a presence.
SDeMo.tnr
— Functiontnr(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of tnr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.tnr
— Methodtnr(M::ConfusionMatrix)
True-negative rate
$\frac{TN}{TN+FP}$
SDeMo.tpr
— Functiontpr(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of tpr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.tpr
— Methodtpr(M::ConfusionMatrix)
True-positive rate
$\frac{TP}{TP+FN}$
SDeMo.train!
— Methodtrain!(ensemble::Bagging; kwargs...)
Trains all the models in an ensemble model. The keyword arguments are passed to train! for each model. Note that this retrains the entire model, which includes the transformers.
SDeMo.train!
— Methodtrain!(ensemble::Ensemble; kwargs...)
Trains all the models in a heterogeneous ensemble model. The keyword arguments are passed to train! for each model, and can include the training indices. Note that this retrains the entire model, which includes the transformers.
SDeMo.train!
— Methodtrain!(sdm::SDM; threshold=true, training=:, optimality=mcc)
This is the main training function to train an SDM.
The keyword arguments are:
- training: defaults to :, and is the range (or alternatively the indices) of the data that are used to train the model
- threshold: defaults to true, and performs moving-threshold optimization by evaluating 200 possible values between the minimum and maximum output of the model, and returning the one that is optimal
- optimality: defaults to mcc, and is the function applied to the confusion matrix to evaluate which value of the threshold is the best
- absences: defaults to false, and indicates whether the (pseudo-)absences are used to train the transformer; when using actual absences, this should be set to true
Internally, this function trains the transformer, then projects the data, then trains the classifier. If threshold is true, the threshold is then optimized.
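A sketch of a typical setup and training call; the SDM(ZScore, Logistic, X, y) convenience constructor and the orientation of the feature matrix X are assumptions to be checked against the package, while the keywords are the ones documented above.

```julia
using SDeMo

# X: feature matrix, y: Boolean labels (presences and pseudo-absences)
# The constructor below is an assumption; check the package for the exact form
sdm = SDM(ZScore, Logistic, X, y)

train!(sdm)                      # trains the transformer and classifier, then tunes the threshold
train!(sdm; threshold = false)   # re-train without moving-threshold optimization
train!(sdm; training = 1:100)    # only use the first 100 instances for training
```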
SDeMo.transformer
— Methodtransformer(model::Bagging)
Returns the transformer used by the model serving as a template for the bagged model.
SDeMo.transformer
— Methodtransformer(model::SDM)
Returns the transformer used by the model
SDeMo.trueskill
— Functiontrueskill(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of trueskill using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.trueskill
— Methodtrueskill(M::ConfusionMatrix)
True skill statistic (a.k.a. Youden's J, or informedness)
$TPR + TNR - 1$
SDeMo.variableimportance
— Methodvariableimportance(model, folds; kwargs...)
Returns the importance of all variables in the model. The keywords are passed to variableimportance.
SDeMo.variableimportance
— Methodvariableimportance(model, folds, variable; reps=10, optimality=mcc, kwargs...)
Returns the importance of one variable in the model. The reps keyword fixes the number of bootstraps to run (defaults to 10, which is not enough!). The other keywords are passed to ConfusionMatrix.
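A sketch, assuming sdm is a trained SDM; per the documentation above, the keywords are forwarded to the single-variable method.

```julia
using SDeMo

folds = kfold(sdm)
imp = variableimportance(sdm, folds; reps = 100)   # importance of every variable, 100 bootstraps each
```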
SDeMo.variables!
— Methodvariables!(ensemble::Bagging, v::Vector{Int})
Sets the variables of the top-level model, and then sets the variables of each model in the ensemble.
SDeMo.variables!
— Methodvariables!(sdm::SDM, v)
Sets the list of variables.
SDeMo.variables!
— Methodvariables!(model::AbstractSDM, ::Type{T}, folds::Vector{Tuple{Vector{Int}, Vector{Int}}}; included=Int[], optimality=mcc, verbose::Bool=false, bagfeatures::Bool=false, kwargs...) where {T <: VariableSelectionStrategy}
Performs variable selection based on a selection strategy, with an optional set of folds for cross-validation. If omitted, this defaults to k-fold cross-validation.
The model is retrained on the optimal set of variables after selection.
Keywords:
- included (Int[]), a list of variables that must be included in the model
- optimality (mcc), the measure to optimise at each round of variable selection
- verbose (false), whether the performance should be reported after each round of variable selection
- bagfeatures (false), whether bagfeatures! should be called on each model in a homogeneous ensemble
- all other keywords are passed to train! and crossvalidate
Important notes:
- When using bagfeatures with a pool of included variables, they will always be present in the overall model, but not necessarily in each model of the ensemble
- When using VarianceInflationFactor, the variable selection will stop even if the VIF is above the threshold, if it means producing a model with a lower performance; using variables! will always lead to a better model
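A sketch of a forward selection run, assuming sdm is an SDM with training data; ForwardSelection and the keywords are the ones documented above.

```julia
using SDeMo

folds = kfold(sdm)
variables!(sdm, ForwardSelection, folds; optimality = mcc, verbose = true)
variables(sdm)    # the variables retained by the selection
```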
SDeMo.variables!
— Methodvariables!(model::M, ::Type{StrictVarianceInflationFactor{N}}, args...; included::Vector{Int}=Int[], optimality=mcc, verbose::Bool=false, bagfeatures::Bool=false, kwargs...) where {M <: Union{SDM, Bagging}, N}
Version of the variable selection for the strict VIF case. This may result in a worse model, and for this reason there is no cross-validation.
SDeMo.variables
— Methodvariables(sdm::SDM)
Returns the list of variables used by the SDM – these may be ordered by importance. This does not return a copy of the variables array, but the array itself.
SDeMo.vif
— Methodvif(::Matrix)
Returns the variance inflation factor for each variable in a matrix, as the diagonal of the inverse of the correlation matrix between predictors.
SDeMo.vif
— Methodvif(::AbstractSDM, tr=:)
Returns the VIF for the variables used in an SDM, optionally restricting to some training instances (defaults to :, i.e. all points). The VIF is calculated on the de-meaned predictors.
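A sketch, assuming sdm is an SDM with training data; stepwisevif! is the function documented above.

```julia
using SDeMo

vif(sdm)                 # VIF of each variable currently used by the model
stepwisevif!(sdm, 5.0)   # drop variables until all VIFs are below 5, retraining along the way
```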
SDeMo.writesdm
— Methodwritesdm(file::String, model::SDM)
Writes a model to a JSON
file. This method is very bare-bones, and only saves the structure of the model, as well as the data.
SDeMo.κ
— Functionκ(C::Vector{<:ConfusionMatrix}, full::Bool=false)
Version of κ using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.κ
— Methodκ(M::ConfusionMatrix)
Cohen's κ
StatsAPI.predict
— MethodStatsAPI.predict(ensemble::Bagging; kwargs...)
Predicts the ensemble model for all training data.
StatsAPI.predict
— MethodStatsAPI.predict(ensemble::Ensemble; kwargs...)
Predicts the heterogeneous ensemble model for all training data.
StatsAPI.predict
— MethodStatsAPI.predict(sdm::SDM; kwargs...)
This method performs the prediction on the entire set of training data available for the training of an SDM.
StatsAPI.predict
— MethodStatsAPI.predict(ensemble::Bagging, X; consensus = median, kwargs...)
Returns the prediction of the ensemble of models for a dataset X. The function used to aggregate the outputs from the different models is consensus (defaults to median). All other keyword arguments are passed to predict.
To get a direct estimate of the variability, the consensus function can be changed to iqr (inter-quantile range), or any measure of variance.
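A sketch of getting both a consensus prediction and a measure of between-model variability from a bagged model; sdm is assumed to be a trained SDM and Xnew a feature matrix with the same variables as the training data.

```julia
using SDeMo
using Statistics   # for median

ensemble = Bagging(sdm, 30)    # 30 bootstrap replicates of the template model
train!(ensemble)

consensus_score = predict(ensemble, Xnew; consensus = median, threshold = false)
variability     = predict(ensemble, Xnew; consensus = iqr, threshold = false)
```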
StatsAPI.predict
— MethodStatsAPI.predict(ensemble::Ensemble, X; consensus = median, kwargs...)
Returns the prediction of the heterogeneous ensemble of models for a dataset X. The function used to aggregate the outputs from the different models is consensus (defaults to median). All other keyword arguments are passed to predict.
To get a direct estimate of the variability, the consensus function can be changed to iqr (inter-quantile range), or any measure of variance.
StatsAPI.predict
— MethodStatsAPI.predict(sdm::SDM, X; threshold = true)
This is the main prediction function, and it takes as input an SDM and a matrix of features. The only keyword argument is threshold, which determines whether the prediction is returned as a binary value (true, the default) or as a raw score.
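A sketch, assuming sdm is a trained SDM and Xnew is a feature matrix for new locations:

```julia
using SDeMo

presence = predict(sdm, Xnew)                     # Boolean predictions, using the tuned threshold
score    = predict(sdm, Xnew; threshold = false)  # raw scores instead
```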
SDeMo.AbstractEnsembleSDM
— TypeAbstractEnsembleSDM
This abstract type covers models that combine different SDMs to make a prediction, which currently includes Bagging and Ensemble.
SDeMo.AbstractSDM
— TypeAbstractSDM
This abstract type covers both the regular and the ensemble models.
SDeMo.AllVariables
— TypeAllVariables
All variables in the training dataset are used. Note that this also crossvalidates and trains the model.
SDeMo.BIOCLIM
— TypeBIOCLIM
BIOCLIM
SDeMo.BackwardSelection
— Type BackwardSelection
Variables are removed one at a time until the performance of the models stops improving.
SDeMo.Bagging
— TypeBagging
SDeMo.Bagging
— MethodBagging(model::SDM, n::Integer)
Creates a bagged model with n bootstrap samples, using the SDM as a template.
SDeMo.Bagging
— MethodBagging(model::SDM, bags::Vector)
Creates a bagged model from an SDM and a pre-specified vector of bags.
SDeMo.ChainedTransform
— TypeChainedTransform{T1, T2}
A transformer that applies, in sequence, a pair of other transformers. This can be used to, for example, do a PCA then a z-score on the projected space. This is limited to two steps because the value of chaining more transformers is doubtful. We may add support for more complex transformations in future versions.
The first and second steps are accessible through first and last.
SDeMo.Classifier
— TypeClassifier
This abstract type covers all algorithms to convert transformed data into prediction.
SDeMo.ConfusionMatrix
— TypeConfusionMatrix{T <: Number}
A structure to store the true positives, true negatives, false positives, and false negatives counts (or proportion) during model evaluation. Empty confusion matrices can be created using the zero
method.
SDeMo.ConfusionMatrix
— MethodConfusionMatrix(ensemble::Bagging; kwargs...)
Performs the predictions for a bagged model, and compares them to the labels used for training. The keyword arguments are passed to the predict method.
SDeMo.ConfusionMatrix
— MethodConfusionMatrix(sdm::SDM; kwargs...)
Performs the predictions for an SDM, and compares them to the labels used for training. The keyword arguments are passed to the predict method.
SDeMo.ConfusionMatrix
— MethodConfusionMatrix(pred::Vector{Bool}, truth::Vector{Bool})
Given a vector of binary predictions and a vector of ground truths, returns the confusion matrix.
SDeMo.ConfusionMatrix
— MethodConfusionMatrix(pred::Vector{T}, truth::Vector{Bool}, τ::T) where {T <: Number}
Given a vector of scores and a vector of ground truths, as well as a threshold, transforms the score into binary predictions and returns the confusion matrix.
SDeMo.ConfusionMatrix
— MethodConfusionMatrix(pred::Vector{T}, truth::Vector{Bool}) where {T <: Number}
Given a vector of scores and a vector of ground truths, returns the confusion matrix under the assumption that the scores are probabilities and that the threshold is one half.
SDeMo.DecisionTree
— TypeDecisionTree
The depth and number of nodes can be adjusted with maxnodes! and maxdepth!.
SDeMo.Ensemble
— TypeEnsemble
A heterogeneous ensemble model is defined as a vector of SDMs. Bagging models can also be used.
SDeMo.ForwardSelection
— TypeForwardSelection
Variables are included one at a time until the performance of the models stops improving.
SDeMo.Logistic
— TypeLogistic
Logistic regression with a default learning rate of 0.01, an L2 penalization of 0.1, and 2000 epochs. Note that interaction terms can be turned on and off through the use of the interactions field. Possible values are :all (default), :self (only squared terms), and :none (no interactions).
The verbose field (defaults to false) can be used to show the progress of gradient descent, by reporting the loss every 100 epochs, or at the interval given by the verbosity field. Note that when doing cross-validation, the loss on the validation data will be automatically reported.
SDeMo.MultivariateTransform
— TypeMultivariateTransform{T} <: Transformer
T is a multivariate transformation, likely offered through the MultivariateStats package. The transformations currently supported are PCA, PPCA, KernelPCA, and Whitening, and they are documented through their type aliases (e.g. PCATransform).
SDeMo.NaiveBayes
— TypeNaiveBayes
Naive Bayes Classifier
By default, upon training, the prior probability will be set to the prevalence of the training data.
SDeMo.PCATransform
— TypePCATransform
The PCA transform will project the model features, which also serves as a way to decrease the dimensionality of the problem. Note that this method will only use the training instances and, unless the absences=true keyword is used, only the cases where the species is present. This ensures that there is no data leakage (neither the validation data nor the data from the raster are used).
This is an alias for MultivariateTransform{PCA}.
SDeMo.RawData
— TypeRawData
A transformer that does nothing to the data. This is passing the raw data to the classifier, and can be a good first step for models that assume that the features are independent, or are not sensitive to the scale of the features.
SDeMo.SDM
— TypeSDM
This type specifies a full model, which is composed of a transformer (which applies a transformation on the data), a classifier (which returns a quantitative score), and a threshold (above which the score corresponds to the prediction of a presence).
In addition, the SDM carries with it the training features and labels, as well as a vector of indices indicating which variables are actually used by the model.
SDeMo.StrictVarianceInflationFactor
— TypeStrictVarianceInflationFactor{N}
Removes variables one at a time until the largest VIF is lower than N (a floating point number). By contrast with VarianceInflationFactor, this approach to variable selection will not cross-validate the model, and might result in a model that is far worse than any other variable selection technique.
SDeMo.Transformer
— TypeTransformer
This abstract type covers all transformations that are applied to the data before fitting the classifier.
SDeMo.VariableSelectionStrategy
— TypeVariableSelectionStrategy
This is an abstract type to which all variable selection types belong. The variable selection methods should define a method for variables!
, whose first argument is a model, and the second argument is a selection strategy. The third and fourth positional arguments are, respectively, a list of variables to be included, and the folds to use for cross-validation. They can be omitted and would default to no default variables, and k-fold cross-validation.
SDeMo.VarianceInflationFactor
— TypeVarianceInflationFactor{N}
Removes variables one at a time until the largest VIF is lower than N (a floating point number), or until the performance of the model stops increasing. Note that the resulting set of variables may have a largest VIF larger than the threshold. See StrictVarianceInflationFactor for an alternative.
SDeMo.WhiteningTransform
— TypeWhiteningTransform
The whitening transformation is a linear transformation of the input variables, after which the new variables have unit variance and no correlation. The input is transformed into white noise.
Because this transform will usually keep the first variable "as is", and then apply increasingly important perturbations on the subsequent variables, it is sensitive to the order in which variables are presented, and is less useful when applying tools for interpretation.
This is an alias for MultivariateTransform{Whitening}.
SDeMo.ZScore
— TypeZScore
A transformer that scales and centers the data, using only the data that are available to the model at training time.
For all variables in the SDM features (regardless of whether they are used), this transformer will store the observed mean and standard deviation. There is no correction on the sample size, because there is no reason to expect that the sample size will be the same for the training and prediction situation.