Cross-validation

Confusion matrix

SDeMo.ConfusionMatrix Type
julia
ConfusionMatrix{T <: Number}

A structure to store the counts (or proportions) of true positives, true negatives, false positives, and false negatives during model evaluation. Empty confusion matrices can be created using the zero method.

source
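
As a rough illustration of what such a structure holds, the sketch below defines a stand-in type and tallies it from Boolean predictions. The name SketchMatrix, its field names, and the tally helper are assumptions made for this example; they are not taken from SDeMo.

```julia
# Hypothetical stand-in for the documented type; field names are assumptions.
struct SketchMatrix{T <: Number}
    tp::T
    tn::T
    fp::T
    fn::T
end

# An empty matrix, mirroring the documented use of `zero`.
Base.zero(::Type{SketchMatrix{T}}) where {T <: Number} =
    SketchMatrix(zero(T), zero(T), zero(T), zero(T))

# Tally the four cells from Boolean predictions and observed labels.
function tally(predicted::Vector{Bool}, observed::Vector{Bool})
    tp = count(predicted .& observed)
    tn = count(.!predicted .& .!observed)
    fp = count(predicted .& .!observed)
    fn = count(.!predicted .& observed)
    return SketchMatrix(tp, tn, fp, fn)
end

M = tally([true, true, false, false], [true, false, false, true])
E = zero(SketchMatrix{Int})
```

Storing proportions instead of counts only changes the element type (for example Float64 instead of Int).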

Folds

These methods all take a vector of labels and a matrix of features as input, and return a vector of tuples, each with the training indices in the first position and the validation indices in the second. The exception is holdout, which returns a single tuple.

SDeMo.holdout Function
julia
holdout(y, X; proportion = 0.2, permute = true)

Sets aside a proportion of the observations (given by the proportion keyword, defaults to 0.2) for validation, and uses the rest for training. The permute keyword (defaults to true) shuffles the order of observations before they are split.

This method returns a single tuple with the training data first and the validation data second. To use it with crossvalidate, wrap it in a vector: [holdout(y, X)].

source
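
The split described above can be sketched in a few lines of plain Julia. This is not SDeMo's implementation: sketch_holdout is a hypothetical name, and the assumptions that instances are stored as columns of X and that the function returns positions are made only for this example.

```julia
using Random

# Illustrative holdout split (hypothetical helper, not the library's code).
# Assumes one instance per column of X.
function sketch_holdout(y, X; proportion = 0.2, permute = true)
    n = length(y)
    @assert size(X, 2) == n
    positions = permute ? shuffle(1:n) : collect(1:n)
    n_validation = round(Int, proportion * n)
    validation = positions[1:n_validation]
    training = positions[(n_validation + 1):end]
    return (training, validation)
end

y = rand(Bool, 50)
X = rand(3, 50)
training, validation = sketch_holdout(y, X)

# A single split wrapped in a vector, as required before cross-validation.
folds = [(training, validation)]
```
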
julia
holdout(sdm::SDM)

Version of holdout using the instances and labels of an SDM.

source
julia
holdout(sdm::Bagging)

Version of holdout using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.

source
SDeMo.montecarlo Function
julia
montecarlo(y, X; n = 100, kwargs...)

Returns n (default 100) samples of holdout. Other keyword arguments are passed to holdout.

This method returns a vector of tuples, with each entry having the training data first and the validation data second.

source
julia
montecarlo(sdm::SDM)

Version of montecarlo using the instances and labels of an SDM.

source
julia
montecarlo(sdm::Bagging)

Version of montecarlo using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.

source
SDeMo.leaveoneout Function
julia
leaveoneout(y, X)

Returns the splits for leave-one-out cross-validation. Each sample is used once, on its own, for validation.

This method returns a vector of tuples, with each entry having the training data first and the validation data second.

source
julia
leaveoneout(sdm::SDM)

Version of leaveoneout using the instances and labels of an SDM.

source
julia
leaveoneout(sdm::Bagging)

Version of leaveoneout using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.

source
SDeMo.kfold Function
julia
kfold(y, X; k = 10, permute = true)

Returns splits of the data in which one group is used for validation, and k-1 groups are used for training. All k groups have (approximately) the same size, and each instance is used only once for validation (and k-1 times for training). The groups are stratified (so that they have the same prevalence).

This method returns a vector of tuples, with each entry having the training data first and the validation data second.

source
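
One way to obtain groups of similar size and prevalence is to deal the positive and negative instances round-robin across the k groups. The sketch below does this under stated assumptions: sketch_kfold is a hypothetical helper that only needs the labels, and it may differ from how SDeMo builds its folds.

```julia
using Random

# Illustrative stratified k-fold (hypothetical helper, not the library's code).
function sketch_kfold(y::Vector{Bool}; k = 10, permute = true)
    positives = findall(y)
    negatives = findall(.!y)
    if permute
        shuffle!(positives)
        shuffle!(negatives)
    end
    # Deal instances round-robin, class by class, so that every group
    # receives a similar number of positives and negatives.
    groups = [Int[] for _ in 1:k]
    for (i, idx) in enumerate(vcat(positives, negatives))
        push!(groups[mod1(i, k)], idx)
    end
    all_positions = collect(eachindex(y))
    # Each group is the validation set once; the rest is the training set.
    return [(setdiff(all_positions, g), g) for g in groups]
end

y = vcat(fill(true, 20), fill(false, 80))
folds = sketch_kfold(y; k = 5)
```

With 20 presences out of 100 labels and k = 5, every validation group holds 20 instances, 4 of which are presences.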
julia
kfold(sdm::SDM)

Version of kfold using the instances and labels of an SDM.

source
julia
kfold(sdm::Bagging)

Version of kfold using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.

source

Cross-validation

SDeMo.crossvalidate Function
julia
crossvalidate(sdm, folds; thr = nothing, kwargs...)

Performs cross-validation on a model, given a vector of tuples representing the data splits. The threshold can be fixed through the thr keyword argument. All other keywords are passed to the train! method.

This method returns two vectors of ConfusionMatrix, with the confusion matrix for each set of validation data first, and the confusion matrix for the training data second.

source
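
The general shape of this procedure (train on each training split, then tally one confusion matrix on the validation split and one on the training split) can be sketched without the package. Everything below is hypothetical: the one-feature threshold model, fit_threshold, tally, and sketch_crossvalidate are illustrative stand-ins, not SDeMo functions.

```julia
# Toy one-feature model: the threshold is the midpoint between class means.
fit_threshold(x, y) = (sum(x[y]) / count(y) + sum(x[.!y]) / count(.!y)) / 2

# Tally the four cells of a confusion matrix as a named tuple.
tally(pred, obs) = (tp = count(pred .& obs), tn = count(.!pred .& .!obs),
                    fp = count(pred .& .!obs), fn = count(.!pred .& obs))

function sketch_crossvalidate(x, y, folds)
    validation, training = [], []
    for (trn, val) in folds
        τ = fit_threshold(x[trn], y[trn])              # train on this split
        push!(validation, tally(x[val] .> τ, y[val]))  # evaluate on held-out data
        push!(training, tally(x[trn] .> τ, y[trn]))    # evaluate on training data
    end
    return validation, training
end

x = [0.1, 0.2, 0.25, 0.3, 0.7, 0.75, 0.8, 0.9]
y = [false, false, false, false, true, true, true, true]
folds = [([1, 2, 5, 6], [3, 4, 7, 8]), ([3, 4, 7, 8], [1, 2, 5, 6])]
validation, training = sketch_crossvalidate(x, y, folds)
```

Comparing the validation and training tallies fold by fold is what makes overfitting visible: a model that does much better on the training matrices than on the validation ones generalizes poorly.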

Null classifiers

SDeMo.noskill Function
julia
noskill(labels::Vector{Bool})

Returns the confusion matrix for the no-skill classifier given a vector of labels. Predictions are made at random, with each class being selected with a probability equal to its proportion in the training data.

source
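
The null classifiers in this section all predict independently of the true label, so their expected confusion matrix follows from two numbers: the prevalence p of the positive class and the probability q of predicting the positive class. The sketch below works this out as proportions; null_expectation is a hypothetical helper, and whether SDeMo returns proportions or counts is not assumed here.

```julia
# Expected proportions when predictions are drawn independently of the labels,
# with probability q of predicting the positive class (hypothetical helper).
function null_expectation(labels::Vector{Bool}, q::Float64)
    p = count(labels) / length(labels)   # prevalence of the positive class
    return (tp = p * q, fn = p * (1 - q), fp = (1 - p) * q, tn = (1 - p) * (1 - q))
end

labels = vcat(fill(true, 30), fill(false, 70))
prevalence = count(labels) / length(labels)

ns = null_expectation(labels, prevalence)  # no-skill: q equals the prevalence
cf = null_expectation(labels, 0.5)         # coin flip: q is one half
cp = null_expectation(labels, 1.0)         # constant positive
cn = null_expectation(labels, 0.0)         # constant negative
```
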
julia
noskill(sdm::SDM)

Version of noskill using the training labels for an SDM.

source
julia
noskill(ensemble::Bagging)

Version of noskill using the training labels for a homogeneous ensemble.

source
SDeMo.coinflip Function
julia
coinflip(labels::Vector{Bool})

Returns the confusion matrix for the coin-flip classifier given a vector of labels. Predictions are made at random, with each class being selected with a probability of one half.

source
julia
coinflip(sdm::SDM)

Version of coinflip using the training labels for an SDM.

source
julia
coinflip(ensemble::Bagging)

Version of coinflip using the training labels for a homogeneous ensemble.

source
SDeMo.constantnegative Function
julia
constantnegative(labels::Vector{Bool})

Returns the confusion matrix for the constant negative classifier given a vector of labels. Predictions are assumed to always be negative.

source
julia
constantnegative(sdm::SDM)

Version of constantnegative using the training labels for an SDM.

source
julia
constantnegative(ensemble::Bagging)

Version of constantnegative using the training labels for a homogeneous ensemble.

source
SDeMo.constantpositive Function
julia
constantpositive(labels::Vector{Bool})

Returns the confusion matrix for the constant positive classifier given a vector of labels. Predictions are assumed to always be positive.

source
julia
constantpositive(sdm::SDM)

Version of constantpositive using the training labels for an SDM.

source
julia
constantpositive(ensemble::Bagging)

Version of constantpositive using the training labels for a homogeneous ensemble.

source

List of performance measures

SDeMo.tpr Function
julia
tpr(M::ConfusionMatrix)

True-positive rate

$$\frac{TP}{TP+FN}$$

source
julia
tpr(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of tpr using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.tnr Function
julia
tnr(M::ConfusionMatrix)

True-negative rate

$$\frac{TN}{TN+FP}$$

source
julia
tnr(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of tnr using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.fpr Function
julia
fpr(M::ConfusionMatrix)

False-positive rate

$$\frac{FP}{FP+TN}$$

source
julia
fpr(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of fpr using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.fnr Function
julia
fnr(M::ConfusionMatrix)

False-negative rate

$$\frac{FN}{FN+TP}$$

source
julia
fnr(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of fnr using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.ppv Function
julia
ppv(M::ConfusionMatrix)

Positive predictive value

$$\frac{TP}{TP+FP}$$

source
julia
ppv(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of ppv using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.npv Function
julia
npv(M::ConfusionMatrix)

Negative predictive value

$$\frac{TN}{TN+FN}$$

source
julia
npv(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of npv using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source
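
The six ratios defined so far can be checked by hand on a small example. The counts below are made up for illustration, and the computation uses plain arithmetic rather than the package's functions.

```julia
# Counts for a hypothetical validation set (illustrative values only).
tp, tn, fp, fn = 40, 45, 5, 10

tpr = tp / (tp + fn)   # true-positive rate
tnr = tn / (tn + fp)   # true-negative rate
fpr = fp / (fp + tn)   # false-positive rate
fnr = fn / (fn + tp)   # false-negative rate
ppv = tp / (tp + fp)   # positive predictive value
npv = tn / (tn + fn)   # negative predictive value
```

The rates come in complementary pairs: tpr and fnr sum to one, as do tnr and fpr.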
SDeMo.fdir Function
julia
fdir(M::ConfusionMatrix)

False discovery rate, 1 - ppv

source
julia
fdir(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of fdir using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.fomr Function
julia
fomr(M::ConfusionMatrix)

False omission rate, 1 - npv

source
julia
fomr(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of fomr using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.plr Function
julia
plr(M::ConfusionMatrix)

Positive likelihood ratio

$$\frac{TPR}{FPR}$$

source
julia
plr(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of plr using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.nlr Function
julia
nlr(M::ConfusionMatrix)

Negative likelihood ratio

$$\frac{FNR}{TNR}$$

source
julia
nlr(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of nlr using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.accuracy Function
julia
accuracy(M::ConfusionMatrix)

Accuracy

$$\frac{TP+TN}{TP+TN+FP+FN}$$

source
julia
accuracy(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of accuracy using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.balancedaccuracy Function
julia
balancedaccuracy(M::ConfusionMatrix)

Balanced accuracy

$$\frac{1}{2}(TPR+TNR)$$

source
julia
balancedaccuracy(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of balancedaccuracy using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.f1 Function
julia
f1(M::ConfusionMatrix)

F₁ score, defined as the harmonic mean between precision and recall:

$$2\times\frac{PPV\times TPR}{PPV+TPR}$$

This uses the more general fscore internally.

source
julia
f1(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of f1 using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.fscore Function
julia
fscore(M::ConfusionMatrix, β=1.0)

Fᵦ score, defined as the weighted harmonic mean of precision and recall, using a positive factor β indicating the relative importance of recall over precision:

$$(1+\beta^2)\times\frac{PPV\times TPR}{(\beta^2\times PPV)+TPR}$$

source
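
To see how β shifts the balance between precision and recall, the formula above can be evaluated directly. sketch_fscore is a hypothetical helper that takes the two rates as plain numbers, unlike the documented method, which takes a confusion matrix.

```julia
# Direct transcription of the Fᵦ formula (hypothetical helper).
function sketch_fscore(ppv, tpr, β = 1.0)
    return (1 + β^2) * ppv * tpr / (β^2 * ppv + tpr)
end

# A classifier with perfect recall but mediocre precision.
ppv, tpr = 0.5, 1.0

f_half = sketch_fscore(ppv, tpr, 0.5)  # precision weighs more
f_one = sketch_fscore(ppv, tpr)        # β = 1 gives the F₁ score
f_two = sketch_fscore(ppv, tpr, 2.0)   # recall weighs more
```

Because recall is the stronger of the two rates in this example, the score increases with β.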
julia
fscore(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of fscore using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.trueskill Function
julia
trueskill(M::ConfusionMatrix)

True skill statistic (a.k.a Youden's J, or informedness)

$$TPR+TNR-1$$

source
julia
trueskill(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of trueskill using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.markedness Function
julia
markedness(M::ConfusionMatrix)

Markedness, a measure similar to informedness (TSS) that emphasizes negative predictions

$$PPV+NPV-1$$

source
julia
markedness(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of markedness using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.dor Function
julia
dor(M::ConfusionMatrix)

Diagnostic odds ratio, defined as plr/nlr. A useful test has a value larger than unity, and this value has no upper bound.

source
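
The ratio plr/nlr simplifies to the cross-product of the confusion matrix cells, which the following sketch verifies on made-up counts using plain arithmetic.

```julia
# Counts for a hypothetical validation set (illustrative values only).
tp, tn, fp, fn = 40, 45, 5, 10

tpr = tp / (tp + fn)
tnr = tn / (tn + fp)
fpr = fp / (fp + tn)
fnr = fn / (fn + tp)

plr = tpr / fpr        # positive likelihood ratio
nlr = fnr / tnr        # negative likelihood ratio
dor = plr / nlr        # diagnostic odds ratio

# Equivalent closed form: (TP × TN) / (FP × FN).
dor_direct = (tp * tn) / (fp * fn)
```

The closed form also shows why the value has no upper bound: it grows without limit as either FP or FN approaches zero.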
julia
dor(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of dor using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.κ Function
julia
κ(M::ConfusionMatrix)

Cohen's κ

source
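
Since no formula is given for κ here, the sketch below uses the standard definition of Cohen's κ for a binary confusion matrix: observed agreement corrected for the agreement expected by chance. The counts are made up, and the computation is plain arithmetic rather than a call to the package.

```julia
# Counts for a hypothetical validation set (illustrative values only).
tp, tn, fp, fn = 40, 45, 5, 10
n = tp + tn + fp + fn

# Observed agreement is the accuracy.
observed = (tp + tn) / n

# Chance agreement, from the marginal totals of predictions and labels.
expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n^2

kappa = (observed - expected) / (1 - expected)
```
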
julia
κ(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of κ using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.mcc Function
julia
mcc(M::ConfusionMatrix)

Matthews correlation coefficient. This is the default measure of model performance, and there are rarely good reasons to use anything else to decide which model to use.

source
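
The docstring gives no formula, so the sketch below uses the standard definition of the MCC from the four cells of the confusion matrix. It also checks a known identity: for a classifier that does better than chance, the MCC is the geometric mean of informedness (the true skill statistic) and markedness. The counts are made up for illustration.

```julia
# Counts for a hypothetical validation set (illustrative values only).
tp, tn, fp, fn = 40, 45, 5, 10

# Standard definition of the Matthews correlation coefficient.
mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

# Informedness (TPR + TNR - 1) and markedness (PPV + NPV - 1).
informedness = tp / (tp + fn) + tn / (tn + fp) - 1
markedness = tp / (tp + fp) + tn / (tn + fn) - 1

# Geometric mean of the two, equal to the MCC when both are positive.
geometric_mean = sqrt(informedness * markedness)
```

Because it combines all four cells, the MCC is only high when the classifier does well on both classes, which is one reason it is a sensible default.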
julia
mcc(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of mcc using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source

Confidence interval

SDeMo.ci Function
julia
ci(C::Vector{<:ConfusionMatrix}, f)

Applies f to all confusion matrices in the vector, and returns the 95% CI.

source
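
How the interval is computed is not stated here. One common choice, shown below purely as an assumption, is the normal approximation: 1.96 times the standard error of the per-fold values. sketch_ci is a hypothetical helper and may not match what the package does.

```julia
using Statistics

# Half-width of a 95% CI under a normal approximation (an assumption,
# not necessarily the method used by the library).
sketch_ci(values) = 1.96 * std(values) / sqrt(length(values))

# Per-fold scores for some measure (illustrative values only).
scores = [0.70, 0.74, 0.68, 0.72, 0.71]

center = mean(scores)
halfwidth = sketch_ci(scores)
interval = (center - halfwidth, center + halfwidth)
```
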
julia
ci(C::Vector{<:ConfusionMatrix})

Applies the MCC (mcc) to all confusion matrices in the vector, and returns the 95% CI.

source

Aliases

SDeMo.specificity Function
julia
specificity(M::ConfusionMatrix)

Alias for tnr, the true negative rate

source
julia
specificity(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of specificity using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.sensitivity Function
julia
sensitivity(M::ConfusionMatrix)

Alias for tpr, the true positive rate

source
julia
sensitivity(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of sensitivity using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.recall Function
julia
recall(M::ConfusionMatrix)

Alias for tpr, the true positive rate

source
julia
recall(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of recall using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source
SDeMo.precision Function
julia
precision(M::ConfusionMatrix)

Alias for ppv, the positive predictive value

source
julia
precision(C::Vector{<:ConfusionMatrix}, full::Bool=false)

Version of precision using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.

source