SDeMo
SDeMo.__classsplit — Method__classsplit(y)Returns a tuple with the indices of the presences and the indices of the absences. This is used to maintain class balance in cross-validation and bagging
SDeMo._explain_many_instances — Method_explain_many_instances(f, Z, X, j, n)Applies _explain_one_instance to the matrix Z
SDeMo._explain_one_instance — Method_explain_one_instance(f, instance, X, j, n)This method returns the explanation for the instance at variable j, based on training data X. This is the most granular version of the Shapley values algorithm.
SDeMo._mcsample — Method_mcsample(x::Vector{T}, X::Matrix{T}, j::Int64, n::Int64) where {T <:Number}This generates a Monte-Carlo sample for Shapley values. The arguments are, in order:
x: a single instance (as a vector) to explain
X: a matrix of training data providing the samples for explanation
j: the index of the variable to explain
n: the number of samples to generate for evaluation
SDeMo._validate_one_model! — Method_validate_one_model!(model::AbstractSDM, fold, τ, kwargs...)Trains the model and returns the validation (Cv) and training (Ct) confusion matrices. Used internally by cross-validation.
SDeMo.accuracy — Functionaccuracy(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of accuracy using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.accuracy — Methodaccuracy(M::ConfusionMatrix)Accuracy
$\frac{TP + TN}{TP + TN + FP + FN}$
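A minimal sketch of how these measures are computed in practice, using the ConfusionMatrix constructor for boolean vectors documented further down this page (the prediction and truth vectors below are made up for illustration):

```julia
using SDeMo

# Hypothetical predictions and ground truths, only to illustrate the API
pred  = [true, true, false, false, true, false]
truth = [true, false, false, false, true, true]

C = ConfusionMatrix(pred, truth)  # stores TP, TN, FP, FN
accuracy(C)                       # (TP + TN) / (TP + TN + FP + FN)
mcc(C)                            # default performance measure in SDeMo
```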
SDeMo.backwardselection! — Methodbackwardselection!(model, folds, pool; verbose::Bool = false, optimality=mcc, kwargs...)Removes variables one at a time until the optimality measure stops increasing. Variables included in pool are not removed.
All keyword arguments are passed to crossvalidate and train!.
SDeMo.backwardselection! — Methodbackwardselection!(model, folds; verbose::Bool = false, optimality=mcc, kwargs...)Removes variables one at a time until the optimality measure stops increasing.
All keyword arguments are passed to crossvalidate and train!.
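A minimal sketch of how backward selection is typically wired together, assuming sdm is an SDM that has already been constructed with its training data (model construction is not covered in this section):

```julia
using SDeMo

# `sdm` is assumed to be an existing SDM holding its training data
folds = kfold(sdm)   # stratified k-fold splits on the training data
backwardselection!(sdm, folds; verbose = true, optimality = mcc)
variables(sdm)       # the variables retained after selection
```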
SDeMo.balancedaccuracy — Functionbalancedaccuracy(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of balancedaccuracy using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.balancedaccuracy — Methodbalancedaccuracy(M::ConfusionMatrix)Balanced accuracy
$\frac{1}{2} (TPR + TNR)$
SDeMo.bootstrap — Methodbootstrap(y, X; n = 50)Generates a series of n bootstrap samples for model bagging. The present and absent classes are bootstrapped separately so that the in-bag and out-of-bag samples respect (on average) the class balance.
SDeMo.bootstrap — Methodbootstrap(sdm::SDM; kwargs...)
SDeMo.calibrate — Methodcalibrate(sdm::T; kwargs...) where {T <: AbstractSDM}Returns a function for model calibration, using Platt scaling, optimized with the Newton method. The returned function can be applied to a model output.
SDeMo.ci — Methodci(C::Vector{<:ConfusionMatrix}, f)Applies f to all confusion matrices in the vector, and returns the 95% CI.
SDeMo.ci — Methodci(C::Vector{<:ConfusionMatrix})Applies the MCC (mcc) to all confusion matrices in the vector, and returns the 95% CI.
SDeMo.classifier — Methodclassifier(model::Bagging)Returns the classifier of the model that serves as a template for the bagged model
SDeMo.classifier — Methodclassifier(model::SDM)Returns the classifier used by the model
SDeMo.coinflip — Methodcoinflip(ensemble::Bagging)Version of coinflip using the training labels for an homogeneous ensemble.
SDeMo.coinflip — Methodcoinflip(sdm::SDM)Version of coinflip using the training labels for an SDM.
SDeMo.coinflip — Methodcoinflip(labels::Vector{Bool})Returns the confusion matrix for the no-skill classifier given a vector of labels. Predictions are made at random, with each class being selected with a probability of one half.
SDeMo.constantnegative — Methodconstantnegative(ensemble::Bagging)Version of constantnegative using the training labels for an homogeneous ensemble.
SDeMo.constantnegative — Methodconstantnegative(sdm::SDM)Version of constantnegative using the training labels for an SDM.
SDeMo.constantnegative — Methodconstantnegative(labels::Vector{Bool})Returns the confusion matrix for the constant negative classifier given a vector of labels. Predictions are assumed to always be negative.
SDeMo.constantpositive — Methodconstantpositive(ensemble::Bagging)Version of constantpositive using the training labels for an homogeneous ensemble.
SDeMo.constantpositive — Methodconstantpositive(sdm::SDM)Version of constantpositive using the training labels for an SDM.
SDeMo.constantpositive — Methodconstantpositive(labels::Vector{Bool})Returns the confusion matrix for the constant positive classifier given a vector of labels. Predictions are assumed to always be positive.
SDeMo.counterfactual — Methodcounterfactual(model::AbstractSDM, x::Vector{T}, yhat, λ; maxiter=100, minvar=5e-5, kwargs...) where {T <: Number}Generates one counterfactual explanation given an input vector x, and a target rule to reach yhat. The learning rate is λ. The maximum number of iterations used in the Nelder-Mead algorithm is maxiter, and the variance improvement under which the model will stop is minvar. Other keywords are passed to predict.
SDeMo.crossvalidate — Methodcrossvalidate(sdm, folds; thr = nothing, kwargs...)Performs cross-validation on a model, given a vector of tuples representing the data splits. The threshold can be fixed through the thr keyword arguments. All other keywords are passed to the train! method.
This method returns two vectors of ConfusionMatrix, with the confusion matrix for each set of validation data first, and the confusion matrix for the training data second.
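A sketch of a typical cross-validation workflow, again assuming an already constructed SDM named sdm:

```julia
using SDeMo

folds = kfold(labels(sdm), features(sdm); k = 10)
Cv, Ct = crossvalidate(sdm, folds)  # validation matrices first, training second
mcc(Cv)                             # mean MCC across the validation folds
ci(Cv, mcc)                         # 95% CI of the MCC across folds
```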
SDeMo.crossvalidate — Methodcrossvalidate(sdm::T, args...; kwargs...) where {T <: AbstractSDM}Performs cross-validation using 10-fold validation as a default. Called when crossvalidate is used without a folds second argument.
SDeMo.dor — Functiondor(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of dor using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.dor — Methoddor(M::ConfusionMatrix)Diagnostic odds ratio, defined as plr/nlr. A useful test has a value larger than unity, and this value has no upper bound.
SDeMo.explain — Methodexplain(model::AbstractSDM, j; observation = nothing, instances = nothing, samples = 100, kwargs..., )Uses the Monte-Carlo approximation of Shapley values to provide explanations for specific predictions. The second argument j is the variable for which the explanation should be provided.
The observation keyword is a row in the instances dataset for which explanations must be provided. If instances is nothing, the explanations will be given on the training data.
All other keyword arguments are passed to predict.
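A sketch of how explanations can be requested, assuming sdm is a trained SDM:

```julia
using SDeMo

# Shapley-style explanations for variable 1, on the training data
expl = explain(sdm, 1; samples = 100)

# Explanation for a single training instance (here, the third one)
explain(sdm, 1; observation = 3, samples = 100)
```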
SDeMo.f1 — Functionf1(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of f1 using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.f1 — Methodf1(M::ConfusionMatrix)F₁ score, defined as the harmonic mean between precision and recall:
$2\times\frac{PPV\times TPR}{PPV + TPR}$
This uses the more general fscore internally.
SDeMo.fdir — Functionfdir(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of fdir using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.fdir — Methodfdir(M::ConfusionMatrix)False discovery rate, 1 - ppv
SDeMo.features — Methodfeatures(sdm::SDM, n)Returns the n-th feature stored in the field X of the SDM.
SDeMo.features — Methodfeatures(sdm::SDM)Returns the features stored in the field X of the SDM. Note that the features are an array, and this does not return a copy of it – any change made to the output of this function will change the content of the SDM features.
SDeMo.fnr — Functionfnr(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of fnr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.fnr — Methodfnr(M::ConfusionMatrix)False-negative rate
$\frac{FN}{FN+TP}$
SDeMo.fomr — Functionfomr(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of fomr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.fomr — Methodfomr(M::ConfusionMatrix)False omission rate, 1 - npv
SDeMo.forwardselection! — Methodforwardselection!(model, folds, pool; verbose::Bool = false, optimality=mcc, kwargs...)Adds variables one at a time until the optimality measure stops increasing. The variables in pool are added at the start.
All keyword arguments are passed to crossvalidate and train!.
SDeMo.forwardselection! — Methodforwardselection!(model, folds; verbose::Bool = false, optimality=mcc, kwargs...)Adds variables one at a time until the optimality measure stops increasing.
All keyword arguments are passed to crossvalidate and train!.
SDeMo.fpr — Functionfpr(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of fpr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.fpr — Methodfpr(M::ConfusionMatrix)False-positive rate
$\frac{FP}{FP+TN}$
SDeMo.fscore — Functionfscore(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of fscore using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.fscore — Functionfscore(M::ConfusionMatrix, β=1.0)Fᵦ score, defined as the harmonic mean between precision and recall, using a positive factor β indicating the relative importance of recall over precision:
$(1 + \beta^2)\times\frac{PPV\times TPR}{(\beta^2 \times PPV) + TPR}$
SDeMo.fscore — Methodfscore(β::Real)Creates a function for the Fᵦ score, which takes a confusion matrix as an input.
SDeMo.gmean — Methodgmean(M::ConfusionMatrix)Geometric mean of sensitivity and specificity.
SDeMo.holdout — Methodholdout(y, X; proportion = 0.2, permute = true)Sets aside a proportion (given by the proportion keyword, defaults to 0.2) of observations to use for validation, and the rest for training. An additional argument permute (defaults to true) can be used to shuffle the order of observations before they are split.
This method returns a single tuple with the training data first and the validation data second. To use this with crossvalidate, it must be put in [].
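A sketch of how a single holdout split can be passed to crossvalidate, assuming an already constructed SDM named sdm:

```julia
using SDeMo

y, X = labels(sdm), features(sdm)
split = holdout(y, X; proportion = 0.3)  # one (training, validation) tuple
Cv, Ct = crossvalidate(sdm, [split])     # note that the split is wrapped in a vector
```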
SDeMo.holdout — Methodholdout(sdm::Bagging)Version of holdout using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.
SDeMo.holdout — Methodholdout(sdm::SDM)Version of holdout using the instances and labels of an SDM.
SDeMo.hyperparameters! — Methodhyperparameters!(tr::HasHyperParams, hp::Symbol, val)Sets the hyper-parameters for a transformer or a classifier
SDeMo.hyperparameters — Methodhyperparameters(::Type{<:HasHyperParams}) = nothingReturns the hyper-parameters for a type of classifier or transformer
SDeMo.hyperparameters — Methodhyperparameters(::HasHyperParams)Returns the hyper-parameters for a classifier or a transformer
SDeMo.hyperparameters — Methodhyperparameters(::HasHyperParams, ::Symbol)Returns the value for an hyper-parameter
SDeMo.instance — Methodinstance(sdm::SDM, n; strict=true)Returns the n-th instance stored in the field X of the SDM. If the keyword argument strict is true, only the variables used for prediction are returned.
SDeMo.iqr — Functioniqr(x, m=0.25, M=0.75)Returns the inter-quantile range, by default between 25% and 75% of observations.
SDeMo.kfold — Methodkfold(y, X; k = 10, permute = true)Returns splits of the data in which 1 group is used for validation, and k-1 groups are used for training. All k groups have (approximately) the same size, and each instance is only used once for validation (and k-1 times for training). The groups are stratified (so that they have the same prevalence).
This method returns a vector of tuples, with each entry having the training data first, and the validation data second.
SDeMo.kfold — Methodkfold(sdm::Bagging)Version of kfold using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.
SDeMo.kfold — Methodkfold(sdm::SDM)Version of kfold using the instances and labels of an SDM.
SDeMo.labels — Methodlabels(sdm::SDM)Returns the labels stored in the field y of the SDM – note that this is not a copy of the labels, but the object itself.
SDeMo.leaveoneout — Methodleaveoneout(y, X)Returns the splits for leave-one-out cross-validation. Each sample is used once, on its own, for validation.
This method returns a vector of tuples, with each entry having the training data first, and the validation data second.
SDeMo.leaveoneout — Methodleaveoneout(sdm::Bagging)Version of leaveoneout using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.
SDeMo.leaveoneout — Methodleaveoneout(sdm::SDM)Version of leaveoneout using the instances and labels of an SDM.
SDeMo.loadsdm — Methodloadsdm(file::String; kwargs...)Loads a model from a JSON file. The keyword arguments are passed to train!. The model is trained in full upon loading.
SDeMo.markedness — Functionmarkedness(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of markedness using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.markedness — Methodmarkedness(M::ConfusionMatrix)Markedness, a measure similar to informedness (TSS) that emphasizes negative predictions
$PPV + NPV -1$
SDeMo.mcc — Functionmcc(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of mcc using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.mcc — Methodmcc(M::ConfusionMatrix)Matthews correlation coefficient. This is the default measure of model performance, and there are rarely good reasons to use anything else to decide which model to use.
SDeMo.montecarlo — Methodmontecarlo(y, X; n = 100, kwargs...)Returns n (def. 100) samples of holdout. Other keyword arguments are passed to holdout.
This method returns a vector of tuples, with each entry having the training data first, and the validation data second.
SDeMo.montecarlo — Methodmontecarlo(sdm::Bagging)Version of montecarlo using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.
SDeMo.montecarlo — Methodmontecarlo(sdm::SDM)Version of montecarlo using the instances and labels of an SDM.
SDeMo.nlr — Functionnlr(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of nlr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.nlr — Methodnlr(M::ConfusionMatrix)Negative likelihood ratio
$\frac{FNR}{TNR}$
SDeMo.noselection! — Methodnoselection!(model, folds; verbose::Bool = false, kwargs...)Returns the model to the state where all variables are used.
All keyword arguments are passed to train!.
SDeMo.noselection! — Methodnoselection!(model; verbose::Bool = false, kwargs...)Returns the model to the state where all variables are used.
All keyword arguments are passed to train!. For convenience, this version does not require a folds argument, as it would be unused anyway.
SDeMo.noskill — Methodnoskill(ensemble::Bagging)Version of noskill using the training labels for an homogeneous ensemble.
SDeMo.noskill — Methodnoskill(sdm::SDM)Version of noskill using the training labels for an SDM.
SDeMo.noskill — Methodnoskill(labels::Vector{Bool})Returns the confusion matrix for the no-skill classifier given a vector of labels. Predictions are made at random, with each class being selected by its proportion in the training data.
SDeMo.npv — Functionnpv(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of npv using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.npv — Methodnpv(M::ConfusionMatrix)Negative predictive value
$\frac{TN}{TN+FN}$
SDeMo.outofbag — Methodoutofbag(ensemble::Bagging; kwargs...)This method returns the confusion matrix associated with the out-of-bag error, wherein the success in predicting instance i is calculated on the basis of all models that have not been trained on i. The consensus of the different models is a simple majority rule.
The additional keyword arguments are passed to predict.
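A sketch of an out-of-bag evaluation for a bagged model, assuming sdm is an already constructed SDM used as the template:

```julia
using SDeMo

ensemble = Bagging(sdm, 20)  # 20 bootstrap models based on the template SDM
train!(ensemble)
C_oob = outofbag(ensemble)   # confusion matrix from out-of-bag majority votes
mcc(C_oob)
```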
SDeMo.partialresponse — Methodpartialresponse(model::T, i::Integer, j::Integer, s::Tuple=(50, 50); inflated::Bool, kwargs...)This method returns the partial response of applying the trained model to a simulated dataset where all variables except i and j are set to their mean value.
This function will return a grid corresponding to evenly spaced values of i and j, the size of which is given by the last argument s (defaults to 50 × 50).
All keyword arguments are passed to predict.
SDeMo.partialresponse — Methodpartialresponse(model::T, i::Integer, args...; inflated::Bool, kwargs...)This method returns the partial response of applying the trained model to a simulated dataset where all variables except i are set to their mean value. The inflated keyword, when set to true, will instead pick a random value within the range of the observations.
The different arguments that can follow the variable position are:
- nothing, in which case the unique values for the i-th variable are used (sorted)
- a number, in which case that many evenly spaced points within the range of the variable are used
- an array, in which case each value of this array is evaluated
All keyword arguments are passed to predict.
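A sketch of both versions of partialresponse, assuming sdm is a trained SDM (the structure of the returned values is not assumed here):

```julia
using SDeMo

# Response to variable 1, at 100 evenly spaced points within its observed range
pr1 = partialresponse(sdm, 1, 100; inflated = false)

# Response surface for variables 1 and 2 on a 50 × 50 grid,
# with all other variables held at their mean value
pr12 = partialresponse(sdm, 1, 2, (50, 50); inflated = false)
```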
SDeMo.plr — Functionplr(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of plr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.plr — Methodplr(M::ConfusionMatrix)Positive likelihood ratio
$\frac{TPR}{FPR}$
SDeMo.ppv — Functionppv(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of ppv using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.ppv — Methodppv(M::ConfusionMatrix)Positive predictive value
$\frac{TP}{TP+FP}$
SDeMo.precision — Functionprecision(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of precision using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.precision — Methodprecision(M::ConfusionMatrix)Alias for ppv, the positive predictive value
SDeMo.prune! — Methodprune!(tree, X, y)This function will take each twig in a tree, and merge the one with the worst contribution to information gain.
SDeMo.recall — Functionrecall(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of recall using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.recall — Methodrecall(M::ConfusionMatrix)Alias for tpr, the true positive rate
SDeMo.reliability — Methodreliability(sdm::AbstractSDM, link::Function=identity; bins=9, kwargs...)Returns a binned reliability curve for a trained model, where the raw scores are transformed with a specified link function (which defaults to identity). Keyword arguments other than bins are passed to predict.
SDeMo.reliability — Methodreliability(yhat, y; bins=9)Returns a binned reliability curve for a series of predicted quantitative scores and a series of truth values.
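A sketch using simulated scores and outcomes (both made up for illustration); in practice the scores would come from a model prediction with the threshold turned off:

```julia
using SDeMo

scores = rand(200)              # hypothetical raw scores in [0, 1]
outcomes = rand(200) .< scores  # hypothetical outcomes consistent with the scores
rel = reliability(scores, outcomes; bins = 9)
```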
SDeMo.reset! — Functionreset!(sdm::SDM, thr=0.5)Resets a model, optionally specifying a new value for the threshold. This amounts to re-using all the variables, and discarding the tuned threshold.
SDeMo.sensitivity — Functionsensitivity(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of sensitivity using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.sensitivity — Methodsensitivity(M::ConfusionMatrix)Alias for tpr, the true positive rate
SDeMo.specificity — Functionspecificity(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of specificity using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.specificity — Methodspecificity(M::ConfusionMatrix)Alias for tnr, the true negative rate
SDeMo.stepwisevif! — Functionstepwisevif!(model::SDM, limit, tr=:;kwargs...)Drops the variables with the largest variance inflation from the model, until all VIFs are under the threshold. The last positional argument (defaults to :) is the indices to use for the VIF calculation. All keyword arguments are passed to train!.
SDeMo.threshold! — Methodthreshold!(sdm::SDM, τ)Sets the value of the threshold.
SDeMo.threshold! — Methodthreshold!(sdm::SDM, folds::Vector{Tuple{Vector{Int}, Vector{Int}}}; optimality=mcc)Optimizes the threshold for an SDM using cross-validation, as given by the folds. This is meant to be used after cross-validation, as it will cross-validate the threshold across all the training data in a way that is a little more robust than the version in train!.
The specific technique used is to train one model per fold, then aggregate all of their predictions on the validation data, and find the value of the threshold that maximizes the average performance across folds.
SDeMo.threshold! — Methodthreshold!(sdm::SDM; kwargs...)Version of threshold! without folds, for which the default of 10-fold validation will be used.
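A sketch of threshold tuning after cross-validation, assuming an already constructed SDM named sdm:

```julia
using SDeMo

folds = kfold(sdm)
threshold!(sdm, folds; optimality = mcc)
threshold(sdm)  # score above which a prediction is counted as a presence
```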
SDeMo.threshold — Methodthreshold(sdm::SDM)This returns the value above which the score returned by the SDM is considered to be a presence.
SDeMo.tnr — Functiontnr(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of tnr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.tnr — Methodtnr(M::ConfusionMatrix)True-negative rate
$\frac{TN}{TN+FP}$
SDeMo.tpr — Functiontpr(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of tpr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.tpr — Methodtpr(M::ConfusionMatrix)True-positive rate
$\frac{TP}{TP+FN}$
SDeMo.train! — Methodtrain!(b::AdaBoost; kwargs...)Trains all the models in an ensemble model - the keyword arguments are passed to train! for each model. Note that this also retrains the original model. If the original model contains transformers, they are re-trained for each learner that is added to the ensemble. This is crucial as learners are re-trained on proportionally weighted samples of the training data, and not re-training the transformers would create data leakage.
SDeMo.train! — Methodtrain!(ensemble::Bagging; kwargs...)Trains all the models in an ensemble model - the keyword arguments are passed to train! for each model. Note that this retrains the entire model, which includes the transformers.
SDeMo.train! — Methodtrain!(ensemble::Ensemble; kwargs...)Trains all the models in an heterogeneous ensemble model - the keyword arguments are passed to train! for each model. Note that this retrains the entire model, which includes the transformers.
The keyword arguments are passed to train! and can include the training indices.
SDeMo.train! — Methodtrain!(sdm::SDM; threshold=true, training=:, optimality=mcc)This is the main training function to train an SDM.
The keyword arguments are:
- training: defaults to :, and is the range (or alternatively the indices) of the data that are used to train the model
- threshold: defaults to true, and performs moving threshold by evaluating 200 possible values between the minimum and maximum output of the model, and returning the one that is optimal
- optimality: defaults to mcc, and is the function applied to the confusion matrix to evaluate which value of the threshold is the best
- absences: defaults to false, and indicates whether the (pseudo)absences are used to train the transformer; when using actual absences, this should be set to true
Internally, this function trains the transformer, then projects the data, then trains the classifier. If threshold is true, the threshold is then optimized.
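A sketch of the basic train-and-evaluate loop, assuming an already constructed SDM named sdm:

```julia
using SDeMo

train!(sdm; optimality = mcc)  # trains the transformer and classifier, tunes the threshold
predict(sdm)                   # predictions on the training data
ConfusionMatrix(sdm)           # predictions compared to the training labels
```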
SDeMo.transformer — Methodtransformer(model::Bagging)Returns the transformer of the model that serves as a template for the bagged model
SDeMo.transformer — Methodtransformer(model::SDM)Returns the transformer used by the model
SDeMo.trueskill — Functiontrueskill(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of trueskill using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.trueskill — Methodtrueskill(M::ConfusionMatrix)True skill statistic (a.k.a. Youden's J, or informedness)
$TPR + TNR - 1$
SDeMo.variableimportance — Methodvariableimportance(model, folds; kwargs...)Returns the importance of all variables in the model. The keywords are passed to variableimportance.
SDeMo.variableimportance — Methodvariableimportance(model, folds, variable; reps=10, optimality=mcc, kwargs...)Returns the importance of one variable in the model. The reps keyword fixes the number of bootstrap replicates to run (defaults to 10, which is not enough!).
The keywords are passed to ConfusionMatrix.
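A sketch of how variable importances are typically computed, assuming an already constructed SDM named sdm:

```julia
using SDeMo

folds = kfold(sdm)
variableimportance(sdm, folds)                # importance of every variable in the model
variableimportance(sdm, folds, 2; reps = 50)  # importance of variable 2, with 50 replicates
```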
SDeMo.variables! — Methodvariables!(ensemble::Bagging, v::Vector{Int})Sets the variables of the top-level model, and then sets the variables of each model in the ensemble.
SDeMo.variables! — Methodvariables!(sdm::SDM, v)Sets the list of variables.
SDeMo.variables! — Methodvariables!(model::AbstractSDM, ::Type{T}, folds::Vector{Tuple{Vector{Int}, Vector{Int}}}; included=Int[], optimality=mcc, verbose::Bool=false, bagfeatures::Bool=false, kwargs...) where {T <: VariableSelectionStrategy}Performs variable selection based on a selection strategy, with a possible folds for cross-validation. If omitted, this defaults to k-folds.
The model is retrained on the optimal set of variables after training.
Keywords:
- included (Int[]), a list of variables that must be included in the model
- optimality (mcc), the measure to optimise at each round of variable selection
- verbose (false), whether the performance should be returned after each round of variable selection
- bagfeatures (false), whether bagfeatures! should be called on each model in an homogeneous ensemble
- all other keywords are passed to train! and crossvalidate
Important notes:
- When using bagfeatures with a pool of included variables, they will always be present in the overall model, but not necessarily in each model of the ensemble
- When using VarianceInflationFactor, the variable selection will stop even if the VIF is above the threshold, when continuing would produce a model with a lower performance – using variables! will always lead to a better model
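A sketch of strategy-based variable selection, assuming an already constructed SDM named sdm:

```julia
using SDeMo

folds = kfold(sdm)
variables!(sdm, ForwardSelection, folds; included = [1], optimality = mcc, verbose = true)
variables(sdm)  # variable 1 is guaranteed to be retained
```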
SDeMo.variables! — Methodvariables!(model::M, ::Type{StrictVarianceInflationFactor{N}}, args...; included::Vector{Int}=Int[], optimality=mcc, verbose::Bool=false, bagfeatures::Bool=false, kwargs...) where {M <: Union{SDM, Bagging}, N}Version of the variable selection for the strict VIF case. This may result in a worse model, and for this reason there is no cross-validation.
SDeMo.variables — Methodvariables(sdm::SDM)Returns the list of variables used by the SDM – these may be ordered by importance. This does not return a copy of the variables array, but the array itself.
SDeMo.vif — Methodvif(::Matrix)Returns the variance inflation factor for each variable in a matrix, as the diagonal of the inverse of the correlation matrix between predictors.
SDeMo.vif — Methodvif(::AbstractSDM, tr=:)Returns the VIF for the variables used in an SDM, optionally restricting to some training instances (defaults to : for all points). The VIF is calculated on the de-meaned predictors.
SDeMo.writesdm — Methodwritesdm(file::String, model::SDM)Writes a model to a JSON file. This method is very bare-bones, and only saves the structure of the model, as well as the data.
SDeMo.κ — Functionκ(C::Vector{<:ConfusionMatrix}, full::Bool=false)Version of κ using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.κ — Methodκ(M::ConfusionMatrix)Cohen's κ
StatsAPI.predict — MethodTODO
StatsAPI.predict — MethodStatsAPI.predict(ensemble::Bagging; kwargs...)Predicts the ensemble model for all training data.
StatsAPI.predict — MethodStatsAPI.predict(ensemble::Ensemble; kwargs...)Predicts the heterogeneous ensemble model for all training data.
StatsAPI.predict — MethodStatsAPI.predict(sdm::SDM; kwargs...)This method performs the prediction on the entire set of training data available for the training of an SDM.
StatsAPI.predict — MethodTODO
StatsAPI.predict — MethodStatsAPI.predict(ensemble::Bagging, X; consensus = median, kwargs...)Returns the prediction for the ensemble of models on a dataset X. The function used to aggregate the outputs from different models is consensus (defaults to median). All other keyword arguments are passed to predict.
To get a direct estimate of the variability, the consensus function can be changed to iqr (inter-quantile range), or any measure of variance.
StatsAPI.predict — MethodStatsAPI.predict(ensemble::Ensemble, X; consensus = median, kwargs...)Returns the prediction for the heterogeneous ensemble of models on a dataset X. The function used to aggregate the outputs from different models is consensus (defaults to median). All other keyword arguments are passed to predict.
To get a direct estimate of the variability, the consensus function can be changed to iqr (inter-quantile range), or any measure of variance.
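A sketch of consensus and variability predictions, assuming ensemble is a trained Bagging or Ensemble model and Xnew is a matrix of features laid out like the training data (both are assumptions, not objects defined on this page):

```julia
using SDeMo

yhat = predict(ensemble, Xnew; threshold = false)                    # median across models
spread = predict(ensemble, Xnew; consensus = iqr, threshold = false) # between-model IQR
```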
StatsAPI.predict — MethodStatsAPI.predict(sdm::SDM, X; threshold = true)This is the main prediction function, and it takes as input an SDM and a matrix of features. The only keyword argument is threshold, which determines whether the prediction is returned raw or as a binary value (default is true).
SDeMo.AbstractBoostedSDM — TypeAbstractBoostedSDMThis type covers models that use boosting to iteratively improve on the least well predicted instances of a problem.
SDeMo.AbstractEnsembleSDM — TypeAbstractEnsembleSDMThis abstract type covers models that combine different SDMs to make a prediction, which currently covers Bagging and Ensemble.
SDeMo.AbstractSDM — TypeAbstractSDMThis abstract type covers the regular, ensemble, and boosted models.
SDeMo.AdaBoost — TypeAdaBoost <: AbstractBoostedSDMA type for AdaBoost that contains the model, a vector of learners, a vector of learner weights, a number of boosting iterations, and the weights w of each point.
Note that this type uses training by re-sampling data according to their weights, as opposed to re-training on all samples and weighting internally.
SDeMo.AllVariables — TypeAllVariablesAll variables in the training dataset are used. Note that this also crossvalidates and trains the model.
SDeMo.BIOCLIM — TypeBIOCLIMBIOCLIM
SDeMo.BackwardSelection — TypeBackwardSelectionVariables are removed one at a time until the performance of the models stops improving.
SDeMo.Bagging — TypeBagging
SDeMo.Bagging — MethodBagging(model::SDM, n::Integer)Creates a bag of n models from an SDM
SDeMo.Bagging — MethodBagging(model::SDM, bags::Vector)Creates a bagged model from a template SDM and a vector of pre-computed bags (such as the output of bootstrap)
SDeMo.ChainedTransform — TypeChainedTransform{T1, T2}A transformer that applies, in sequence, a pair of other transformers. This can be used to, for example, do a PCA then a z-score on the projected space. This is limited to two steps because the value of chaining more transformers is doubtful. We may add support for more complex transformations in future versions.
The first and second steps are accessible through first and last.
SDeMo.Classifier — TypeClassifierThis abstract type covers all algorithms to convert transformed data into prediction.
SDeMo.ConfusionMatrix — TypeConfusionMatrix{T <: Number}A structure to store the true positives, true negatives, false positives, and false negatives counts (or proportion) during model evaluation. Empty confusion matrices can be created using the zero method.
SDeMo.ConfusionMatrix — MethodConfusionMatrix(ensemble::Bagging; kwargs...)Performs the predictions for an SDM, and compares them to the labels used for training. The keyword arguments are passed to the predict method.
SDeMo.ConfusionMatrix — MethodConfusionMatrix(sdm::SDM; kwargs...)Performs the predictions for an SDM, and compares them to the labels used for training. The keyword arguments are passed to the predict method.
SDeMo.ConfusionMatrix — MethodConfusionMatrix(pred::Vector{Bool}, truth::Vector{Bool})Given a vector of binary predictions and a vector of ground truths, returns the confusion matrix.
SDeMo.ConfusionMatrix — MethodConfusionMatrix(pred::Vector{T}, truth::Vector{Bool}, τ::T) where {T <: Number}Given a vector of scores and a vector of ground truths, as well as a threshold, transforms the score into binary predictions and returns the confusion matrix.
SDeMo.ConfusionMatrix — MethodConfusionMatrix(pred::Vector{T}, truth::Vector{Bool}) where {T <: Number}Given a vector of scores and a vector of truth, returns the confusion matrix under the assumption that the score are probabilities and that the threshold is one half.
SDeMo.DecisionTree — TypeDecisionTreeThe depth and number of nodes can be adjusted with maxnodes! and maxdepth!.
SDeMo.Ensemble — TypeEnsembleAn heterogeneous ensemble model is defined as a vector of SDMs. Bagging models can also be used.
SDeMo.ForwardSelection — TypeForwardSelectionVariables are included one at a time until the performance of the models stops improving.
SDeMo.Logistic — TypeLogisticLogistic regression with default learning rate of 0.01, penalization (L2) of 0.1, and 2000 epochs. Note that interaction terms can be turned on and off through the use of the interactions field. Possible values are :all (default), :self (only squared terms), and :none (no interactions).
The verbose field (defaults to false) can be used to show the progress of gradient descent, by showing the loss every 100 epochs, or at the interval set by the verbosity field. Note that when doing cross-validation, the loss on the validation data will be automatically reported.
SDeMo.MultivariateTransform — TypeMultivariateTransform{T} <: TransformerT is a multivariate transformation, likely offered through the MultivariateStats package. The transformations currently supported are PCA, PPCA, KernelPCA, and Whitening, and they are documented through their type aliases (e.g. PCATransform).
SDeMo.NaiveBayes — TypeNaiveBayesNaive Bayes Classifier
By default, upon training, the prior probability will be set to the prevalence of the training data.
SDeMo.PCATransform — TypePCATransformThe PCA transform will project the model features, which also serves as a way to decrease the dimensionality of the problem. Note that this method will only use the training instances, and unless the absences=true keyword is used, only the present cases. This ensures that there is no data leakage (neither validation data nor the data from the raster are used).
This is an alias for MultivariateTransform{PCA}.
SDeMo.RawData — TypeRawDataA transformer that does nothing to the data. This is passing the raw data to the classifier, and can be a good first step for models that assume that the features are independent, or are not sensitive to the scale of the features.
SDeMo.SDM — TypeSDMThis type specifies a full model, which is composed of a transformer (which applies a transformation on the data), a classifier (which returns a quantitative score), and a threshold (above which the score corresponds to the prediction of a presence).
In addition, the SDM carries with it the training features and labels, as well as a vector of indices indicating which variables are actually used by the model.
SDeMo.StrictVarianceInflationFactor — TypeStrictVarianceInflationFactor{N}Removes variables one at a time until the largest VIF is lower than N (a floating point number). By contrast with VarianceInflationFactor, this approach to variable selection will not cross-validate the model, and might result in a model that is far worse than any other variable selection technique.
SDeMo.Transformer — TypeTransformerThis abstract type covers all transformations that are applied to the data before fitting the classifier.
SDeMo.VariableSelectionStrategy — TypeVariableSelectionStrategyThis is an abstract type to which all variable selection types belong. The variable selection methods should define a method for variables!, whose first argument is a model, and the second argument is a selection strategy. The third and fourth positional arguments are, respectively, a list of variables to be included, and the folds to use for cross-validation. They can be omitted and would default to no default variables, and k-fold cross-validation.
SDeMo.VarianceInflationFactor — TypeVarianceInflationFactor{N}Removes variables one at a time until the largest VIF is lower than N (a floating point number), or the performance of the model stops increasing. Note that the resulting set of variables may have a largest VIF larger than the threshold. See StrictVarianceInflationFactor for an alternative.
SDeMo.WhiteningTransform — TypeWhiteningTransformThe whitening transformation is a linear transformation of the input variables, after which the new variables have unit variance and no correlation. The input is transformed into white noise.
Because this transform will usually keep the first variable "as is", and then apply increasingly important perturbations on the subsequent variables, it is sensitive to the order in which variables are presented, and is less useful when applying tools for interpretation.
This is an alias for MultivariateTransform{Whitening}.
SDeMo.ZScore — TypeZScoreA transformer that scales and centers the data, using only the data that are available to the model at training time.
For all variables in the SDM features (regardless of whether they are used), this transformer will store the observed mean and standard deviation. There is no correction on the sample size, because there is no reason to expect that the sample size will be the same for the training and prediction situation.