
Cross-validation

Confusion matrix

SDeMo.ConfusionMatrix Type
julia
ConfusionMatrix{T <: Number}

A structure to store the counts (or proportions) of true positives, true negatives, false positives, and false negatives during model evaluation. An empty confusion matrix can be created with the zero method.
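To make the four cells concrete, here is a minimal sketch that tallies them from Boolean predictions and observations. The `tally` helper and the named tuple it returns are illustrative assumptions; they are not part of SDeMo and do not reproduce the layout of ConfusionMatrix.

```julia
# Hypothetical helper, not part of SDeMo: tally the four cells of a
# confusion matrix from Boolean predictions and observations.
function tally(predicted::Vector{Bool}, observed::Vector{Bool})
    both = zip(predicted, observed)
    tp = count(p && o for (p, o) in both)    # predicted present, observed present
    tn = count(!p && !o for (p, o) in both)  # predicted absent, observed absent
    fp = count(p && !o for (p, o) in both)   # predicted present, observed absent
    fn = count(!p && o for (p, o) in both)   # predicted absent, observed present
    return (tp = tp, tn = tn, fp = fp, fn = fn)
end

predicted = [true, true, false, false, true]
observed = [true, false, false, true, true]
counts = tally(predicted, observed)
```

Dividing each count by the number of observations gives the proportion form mentioned above.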


Folds

These methods all take a vector of labels and a matrix of features as input, and return a vector of tuples, each with the training split in the first position and the validation split in the second. The exception is holdout, which returns a single tuple.

SDeMo.holdout Function
julia
holdout(y, X; proportion = 0.2, permute = true)

Sets aside a proportion of observations (given by the proportion keyword, defaults to 0.2) for validation, and uses the rest for training. The permute keyword (defaults to true) shuffles the order of observations before they are split.

This method returns a single tuple with the training data first and the validation data second. To use it with crossvalidate, wrap it in a vector, e.g. [holdout(y, X)].
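The splitting logic can be sketched as follows. This is an illustration under stated assumptions, not SDeMo's implementation: `holdout_indices` is a hypothetical name, it works on observation positions only, and which end of the shuffled order is held out is an arbitrary choice.

```julia
using Random

# Hypothetical stand-in for a holdout split over n observations.
function holdout_indices(n::Integer; proportion = 0.2, permute = true)
    # Optionally shuffle the observation positions before splitting
    order = permute ? shuffle(1:n) : collect(1:n)
    n_valid = round(Int, proportion * n)
    valid = order[1:n_valid]          # held out for validation
    train = order[(n_valid + 1):end]  # everything else is used for training
    return (train, valid)
end

train, valid = holdout_indices(50; proportion = 0.2)

# A single tuple is wrapped in a vector to match the shape of the other splits
folds = [(train, valid)]
```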


julia
holdout(sdm::SDM)

Version of holdout using the instances and labels of an SDM.


julia
holdout(sdm::Bagging)

Version of holdout using the instances and labels of a bagged SDM. In this case, the split uses the instances of the reference model from which the bagged model was built.


SDeMo.montecarlo Function
julia
montecarlo(y, X; n = 100, kwargs...)

Returns n (defaults to 100) samples of holdout. Other keyword arguments are passed to holdout.

This method returns a vector of tuples, with each entry having the training data first and the validation data second.


julia
montecarlo(sdm::SDM)

Version of montecarlo using the instances and labels of an SDM.


julia
montecarlo(sdm::Bagging)

Version of montecarlo using the instances and labels of a bagged SDM. In this case, the split uses the instances of the reference model from which the bagged model was built.


SDeMo.leaveoneout Function
julia
leaveoneout(y, X)

Returns the splits for leave-one-out cross-validation. Each sample is used once, on its own, for validation.

This method returns a vector of tuples, with each entry having the training data first and the validation data second.
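As an illustration of the shape of these splits, the sketch below builds leave-one-out splits over observation positions. `leaveoneout_indices` is a hypothetical name used for this example only; it is not the SDeMo method.

```julia
# Hypothetical stand-in: one split per observation, each holding out a
# single position for validation and training on all the others.
function leaveoneout_indices(n::Integer)
    return [(setdiff(1:n, i), [i]) for i in 1:n]
end

splits = leaveoneout_indices(4)
```

With n observations this produces n splits, so the model is trained n times during cross-validation.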


julia
leaveoneout(sdm::SDM)

Version of leaveoneout using the instances and labels of an SDM.


julia
leaveoneout(sdm::Bagging)

Version of leaveoneout using the instances and labels of a bagged SDM. In this case, the split uses the instances of the reference model from which the bagged model was built.


SDeMo.kfold Function
julia
kfold(y, X; k = 10, permute = true)

Returns splits of the data in which 1 group is used for validation, and k-1 groups are used for training. All k groups have (approximately) the same size, and each instance is used only once for validation (and k-1 times for training).

This method returns a vector of tuples, with each entry having the training data first and the validation data second.
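One way to obtain groups of near-equal size is a round-robin assignment, sketched below. This is an assumption made for illustration (`kfold_indices` is a hypothetical name, and SDeMo may form its groups differently); it requires k to be at least 2.

```julia
using Random

# Hypothetical stand-in for k-fold splitting over n observation positions.
function kfold_indices(n::Integer; k = 10, permute = true)
    order = permute ? shuffle(1:n) : collect(1:n)
    # Round-robin assignment: group sizes differ by at most one
    groups = [order[g:k:n] for g in 1:k]
    # Each group is used once for validation; the other k-1 form the training set
    return [(reduce(vcat, groups[setdiff(1:k, g)]), groups[g]) for g in 1:k]
end

folds = kfold_indices(23; k = 5)
```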


julia
kfold(sdm::SDM)

Version of kfold using the instances and labels of an SDM.


julia
kfold(sdm::Bagging)

Version of kfold using the instances and labels of a bagged SDM. In this case, the split uses the instances of the reference model from which the bagged model was built.


Cross-validation

SDeMo.crossvalidate Function
julia
crossvalidate(sdm, folds; thr = nothing, kwargs...)

Performs cross-validation on a model, given a vector of tuples representing the data splits. The threshold can be fixed through the thr keyword argument. All other keyword arguments are passed to the train! method.

This method returns two vectors of ConfusionMatrix, with the confusion matrix for each set of validation data first, and the confusion matrix for the training data second.


Null classifiers

SDeMo.noskill Function
julia
noskill(labels::Vector{Bool})

Returns the confusion matrix for the no-skill classifier given a vector of labels. Predictions are made at random, with each class selected with a probability equal to its proportion in the training data.
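The expected outcome of such a classifier follows directly from the prevalence p of the positive class: a positive prediction is drawn with probability p, independently of the true label. The sketch below computes the expected cells as proportions. `noskill_expected` is a hypothetical name, and returning proportions as a named tuple is an assumption made for this example.

```julia
# Hypothetical stand-in: expected confusion matrix (as proportions) when each
# class is predicted at random with a probability equal to its prevalence.
function noskill_expected(labels::Vector{Bool})
    p = count(labels) / length(labels)  # prevalence of the positive class
    return (
        tp = p^2,          # predicted positive and actually positive
        tn = (1 - p)^2,    # predicted negative and actually negative
        fp = p * (1 - p),  # predicted positive but actually negative
        fn = (1 - p) * p,  # predicted negative but actually positive
    )
end

expected = noskill_expected([true, false, false, false])
```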


julia
noskill(sdm::SDM)

Version of noskill using the training labels for an SDM.


SDeMo.coinflip Function
julia
coinflip(labels::Vector{Bool})

Returns the confusion matrix for the coin-flip classifier given a vector of labels. Predictions are made at random, with each class being selected with a probability of one half.


julia
coinflip(sdm::SDM)

Version of coinflip using the training labels for an SDM.


SDeMo.constantnegative Function
julia
constantnegative(labels::Vector{Bool})

Returns the confusion matrix for the constant negative classifier given a vector of labels. Predictions are assumed to always be negative.


julia
constantnegative(sdm::SDM)

Version of constantnegative using the training labels for an SDM.


SDeMo.constantpositive Function
julia
constantpositive(labels::Vector{Bool})

Returns the confusion matrix for the constant positive classifier given a vector of labels. Predictions are assumed to always be positive.


julia
constantpositive(sdm::SDM)

Version of constantpositive using the training labels for an SDM.
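The coin-flip and constant classifiers have equally simple expected outcomes, sketched here as proportions for a prevalence p. The function names are hypothetical, and the named-tuple output is an assumption for illustration; none of this is SDeMo's own code.

```julia
# Prevalence of the positive class in a vector of Boolean labels
prevalence(labels::Vector{Bool}) = count(labels) / length(labels)

# Coin flip: each observation is predicted positive with probability 1/2
function coinflip_expected(labels::Vector{Bool})
    p = prevalence(labels)
    return (tp = p / 2, tn = (1 - p) / 2, fp = (1 - p) / 2, fn = p / 2)
end

# Constant positive: every positive is a hit, every negative a false alarm
function constantpositive_expected(labels::Vector{Bool})
    p = prevalence(labels)
    return (tp = p, tn = 0.0, fp = 1 - p, fn = 0.0)
end

# Constant negative: every negative is correct, every positive is missed
function constantnegative_expected(labels::Vector{Bool})
    p = prevalence(labels)
    return (tp = 0.0, tn = 1 - p, fp = 0.0, fn = p)
end

labels = [true, false, false, false]
flip = coinflip_expected(labels)
allpos = constantpositive_expected(labels)
allneg = constantnegative_expected(labels)
```

Comparing a trained model against these baselines shows how much of its performance could be obtained without learning anything from the features.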


List of performance measures

SDeMo.tpr Function
julia
tpr(M::ConfusionMatrix)

True-positive rate

$$\frac{\text{TP}}{\text{TP} + \text{FN}}$$


julia
tpr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of tpr using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.


SDeMo.tnr Function
julia
tnr(M::ConfusionMatrix)

True-negative rate

$$\frac{\text{TN}}{\text{TN} + \text{FP}}$$


julia
tnr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of tnr using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.


SDeMo.fpr Function
julia
fpr(M::ConfusionMatrix)

False-positive rate

$$\frac{\text{FP}}{\text{FP} + \text{TN}}$$


julia
fpr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of fpr using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.


SDeMo.fnr Function
julia
fnr(M::ConfusionMatrix)

False-negative rate

$$\frac{\text{FN}}{\text{FN} + \text{TP}}$$


julia
fnr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of fnr using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.


SDeMo.ppv Function
julia
ppv(M::ConfusionMatrix)

Positive predictive value

$$\frac{\text{TP}}{\text{TP} + \text{FP}}$$


julia
ppv(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of ppv using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.


SDeMo.npv Function
julia
npv(M::ConfusionMatrix)

Negative predictive value

$$\frac{\text{TN}}{\text{TN} + \text{FN}}$$


julia
npv(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of npv using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.
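The six rates defined so far can be checked by hand on a small example. The sketch below uses a named tuple as a stand-in for a confusion matrix; the field names and the one-line functions are assumptions made for illustration, not SDeMo's methods.

```julia
# Stand-in for a confusion matrix holding counts (field names are illustrative)
M = (tp = 40, tn = 30, fp = 10, fn = 20)

# One line per measure, written directly from its definition
tpr(M) = M.tp / (M.tp + M.fn)  # true-positive rate
tnr(M) = M.tn / (M.tn + M.fp)  # true-negative rate
fpr(M) = M.fp / (M.fp + M.tn)  # false-positive rate
fnr(M) = M.fn / (M.fn + M.tp)  # false-negative rate
ppv(M) = M.tp / (M.tp + M.fp)  # positive predictive value
npv(M) = M.tn / (M.tn + M.fn)  # negative predictive value

rates = (tpr = tpr(M), tnr = tnr(M), fpr = fpr(M), fnr = fnr(M), ppv = ppv(M), npv = npv(M))
```

The rates come in complementary pairs: tpr and fnr sum to one, as do tnr and fpr.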


SDeMo.fdir Function
julia
fdir(M::ConfusionMatrix)

False discovery rate, 1 - ppv


julia
fdir(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of fdir using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.


SDeMo.fomr Function
julia
fomr(M::ConfusionMatrix)

False omission rate, 1 - npv


julia
fomr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of fomr using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.


SDeMo.plr Function
julia
plr(M::ConfusionMatrix)

Positive likelihood ratio

$$\frac{\text{TPR}}{\text{FPR}}$$


julia
plr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of plr using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.


SDeMo.nlr Function
julia
nlr(M::ConfusionMatrix)

Negative likelihood ratio

$$\frac{\text{FNR}}{\text{TNR}}$$


julia
nlr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of nlr using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.


SDeMo.accuracy Function
julia
accuracy(M::ConfusionMatrix)

Accuracy

$$\frac{\text{TP} + \text{TN}}{\text{TP} + \text{TN} + \text{FP} + \text{FN}}$$


julia
accuracy(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of accuracy using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.


SDeMo.balancedaccuracy Function
julia
balancedaccuracy(M::ConfusionMatrix)

Balanced accuracy

$$\frac{1}{2}(\text{TPR} + \text{TNR})$$


julia
balancedaccuracy(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of balancedaccuracy using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.
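The difference between accuracy and balanced accuracy is easiest to see on imbalanced data. In the sketch below, a classifier that misses nine of ten presences still reaches a high accuracy because absences dominate. The named tuple and the two functions are illustrative stand-ins, not SDeMo's methods.

```julia
# Imbalanced example: 10 presences, 90 absences (field names are illustrative)
M = (tp = 1, tn = 90, fp = 0, fn = 9)

# Accuracy: share of all predictions that are correct
accuracy(M) = (M.tp + M.tn) / (M.tp + M.tn + M.fp + M.fn)

# Balanced accuracy: mean of the true-positive and true-negative rates
balancedaccuracy(M) = (M.tp / (M.tp + M.fn) + M.tn / (M.tn + M.fp)) / 2

acc = accuracy(M)
bal = balancedaccuracy(M)
```

Here accuracy is 0.91, while balanced accuracy is only 0.55, barely above what a constant classifier would achieve.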


SDeMo.f1 Function
julia
f1(M::ConfusionMatrix)

F₁ score, defined as the harmonic mean of precision and recall:

$$2 \times \frac{\text{PPV} \times \text{TPR}}{\text{PPV} + \text{TPR}}$$

This uses the more general fscore internally.


julia
f1(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of f1 using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.


SDeMo.fscore Function
julia
fscore(M::ConfusionMatrix, β=1.0)

Fᵦ score, defined as the weighted harmonic mean of precision and recall, where the positive factor β sets the relative importance of recall over precision:

$$\frac{(1 + \beta^2) \times \text{PPV} \times \text{TPR}}{(\beta^2 \times \text{PPV}) + \text{TPR}}$$
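To see the effect of β, the sketch below evaluates the formula for β = 1 and β = 2 on a matrix where recall is lower than precision. `fscore_sketch` and the named tuple are hypothetical stand-ins used for illustration, not SDeMo's implementation.

```julia
# Stand-in for a confusion matrix holding counts (field names are illustrative)
M = (tp = 40, tn = 30, fp = 10, fn = 20)

# Hypothetical re-implementation of the formula above
function fscore_sketch(M, β = 1.0)
    ppv = M.tp / (M.tp + M.fp)  # precision, 0.8 here
    tpr = M.tp / (M.tp + M.fn)  # recall, 2/3 here
    return (1 + β^2) * ppv * tpr / (β^2 * ppv + tpr)
end

f_one = fscore_sketch(M)       # β = 1 weighs precision and recall equally
f_two = fscore_sketch(M, 2.0)  # β = 2 gives more weight to recall
```

Because recall is the weaker of the two here, giving it more weight lowers the score.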


julia
fscore(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of fscore using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.


SDeMo.trueskill Function
julia
trueskill(M::ConfusionMatrix)

True skill statistic (a.k.a. Youden's J, or informedness)

$$\text{TPR} + \text{TNR} - 1$$


julia
trueskill(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of trueskill using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.


SDeMo.markedness Function
julia
markedness(M::ConfusionMatrix)

Markedness, a measure similar to informedness (TSS) that emphasizes negative predictions

$$\text{PPV} + \text{NPV} - 1$$
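Informedness and markedness have the same form but are built from different pairs of rates, as the sketch below shows on a small example. The named tuple and function names are illustrative assumptions, not SDeMo's methods.

```julia
# Stand-in for a confusion matrix holding counts (field names are illustrative)
M = (tp = 40, tn = 30, fp = 10, fn = 20)

# Informedness (TSS) combines the two rates conditioned on the true class
trueskill_sketch(M) = M.tp / (M.tp + M.fn) + M.tn / (M.tn + M.fp) - 1

# Markedness combines the two rates conditioned on the predicted class
markedness_sketch(M) = M.tp / (M.tp + M.fp) + M.tn / (M.tn + M.fn) - 1

tss = trueskill_sketch(M)
mkd = markedness_sketch(M)
```

Both measures are 0 for a classifier with no skill and 1 for a perfect one.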


julia
markedness(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of markedness using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.


SDeMo.dor Function
julia
dor(M::ConfusionMatrix)

Diagnostic odds ratio, defined as plr/nlr. A useful test has a value larger than unity, and this value has no upper bound.
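The ratio of likelihood ratios simplifies to a ratio of products of counts, which the sketch below verifies numerically. The named tuple and function names are illustrative stand-ins, not SDeMo's methods.

```julia
# Stand-in for a confusion matrix holding counts (field names are illustrative)
M = (tp = 40, tn = 30, fp = 10, fn = 20)

# Positive likelihood ratio: TPR / FPR
plr_sketch(M) = (M.tp / (M.tp + M.fn)) / (M.fp / (M.fp + M.tn))

# Negative likelihood ratio: FNR / TNR
nlr_sketch(M) = (M.fn / (M.fn + M.tp)) / (M.tn / (M.tn + M.fp))

# Diagnostic odds ratio, as defined above
dor_sketch(M) = plr_sketch(M) / nlr_sketch(M)

ratio = dor_sketch(M)

# The same value written directly with counts: (TP × TN) / (FP × FN)
direct = (M.tp * M.tn) / (M.fp * M.fn)
```

The count form also shows why the value is unbounded: it grows without limit as either kind of error approaches zero.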


julia
dor(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of dor using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.


SDeMo.κ Function
julia
κ(M::ConfusionMatrix)

Cohen's κ
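For reference, Cohen's κ compares the observed agreement between predictions and labels to the agreement expected by chance. The sketch below computes it from counts using the standard definition; `kappa_sketch` and the named tuple are hypothetical stand-ins, and SDeMo may arrange the computation differently.

```julia
# Stand-in for a confusion matrix holding counts (field names are illustrative)
M = (tp = 40, tn = 30, fp = 10, fn = 20)

function kappa_sketch(M)
    n = M.tp + M.tn + M.fp + M.fn
    # Observed agreement: share of predictions that match the labels
    p_obs = (M.tp + M.tn) / n
    # Chance agreement: computed from the marginal totals of each class
    p_exp = ((M.tp + M.fp) * (M.tp + M.fn) + (M.tn + M.fn) * (M.tn + M.fp)) / n^2
    return (p_obs - p_exp) / (1 - p_exp)
end

kappa = kappa_sketch(M)
```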


julia
κ(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of κ using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.


SDeMo.mcc Function
julia
mcc(M::ConfusionMatrix)

Matthews correlation coefficient. This is the default measure of model performance, and there are rarely good reasons to use anything else to decide which model to use.
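The standard definition of the MCC uses all four cells of the confusion matrix, which is part of why it behaves well on imbalanced data. The sketch below computes it from counts; `mcc_sketch` and the named tuple are hypothetical stand-ins, not SDeMo's implementation.

```julia
# Stand-in for a confusion matrix holding counts (field names are illustrative)
M = (tp = 40, tn = 30, fp = 10, fn = 20)

function mcc_sketch(M)
    numerator = M.tp * M.tn - M.fp * M.fn
    denominator = sqrt((M.tp + M.fp) * (M.tp + M.fn) * (M.tn + M.fp) * (M.tn + M.fn))
    return numerator / denominator
end

value = mcc_sketch(M)

# For a classifier better than chance, the MCC is also the geometric mean
# of informedness (TPR + TNR - 1) and markedness (PPV + NPV - 1)
informedness = M.tp / (M.tp + M.fn) + M.tn / (M.tn + M.fp) - 1
markedness = M.tp / (M.tp + M.fp) + M.tn / (M.tn + M.fn) - 1
geometric = sqrt(informedness * markedness)
```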


julia
mcc(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of mcc using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.


Confidence interval

SDeMo.ci Function
julia
ci(C::Vector{ConfusionMatrix}, f)

Applies f to all confusion matrices in the vector, and returns the 95% CI.
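This page does not state how the interval is computed. As an assumption made purely for illustration, the sketch below uses the normal approximation, in which the half-width of the 95% interval is 1.96 standard errors of the mean. `ci95_sketch` is a hypothetical name, and the scores stand in for a measure already applied to each confusion matrix.

```julia
using Statistics

# Made-up scores, standing in for a measure applied to each validation fold
scores = [0.61, 0.58, 0.66, 0.63, 0.60]

# Assumed method (normal approximation); SDeMo's ci may differ
ci95_sketch(v) = 1.96 * std(v) / sqrt(length(v))

# Mean and interval half-width, mirroring the (mean, CI) tuple returned by the
# vector versions of each measure when their second argument is true
summary = (mean(scores), ci95_sketch(scores))
```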


julia
ci(C::Vector{ConfusionMatrix})

Applies the MCC (mcc) to all confusion matrices in the vector, and returns the 95% CI.


Aliases

SDeMo.specificity Function
julia
specificity(M::ConfusionMatrix)

Alias for tnr, the true negative rate


julia
specificity(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of specificity using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.


SDeMo.sensitivity Function
julia
sensitivity(M::ConfusionMatrix)

Alias for tpr, the true positive rate


julia
sensitivity(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of sensitivity using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.


SDeMo.recall Function
julia
recall(M::ConfusionMatrix)

Alias for tpr, the true positive rate


julia
recall(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of recall using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.


SDeMo.precision Function
julia
precision(M::ConfusionMatrix)

Alias for ppv, the positive predictive value


julia
precision(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of precision using a vector of confusion matrices. Returns the mean; when the second argument is true, returns a tuple whose second element is the CI.
