Cross-validation

Confusion matrix

julia

ConfusionMatrix{T <: Number}

A structure to store the counts (or proportions) of true positives, true negatives, false positives, and false negatives during model evaluation. Empty confusion matrices can be created using the zero method.

source

Folds

These methods all take a vector of labels and a matrix of features as input, and return a vector of tuples, with the training data in the first position and the validation data in the second. The exception is holdout, which returns a single tuple.

SDeMo.holdout Function

julia

holdout(y, X; proportion = 0.2, permute = true)

Sets aside a proportion (given by the proportion keyword, defaults to 0.2) of observations to use for validation, and the rest for training. An additional argument permute (defaults to true) can be used to shuffle the order of observations before they are split.

This method returns a single tuple with the training data first and the validation data second. To use this with crossvalidate, the tuple must be wrapped in a vector, using [].
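As a rough illustration of what such a split looks like, here is a minimal sketch in plain Julia. It is not the SDeMo implementation: the helper name holdout_sketch, the rng keyword, the choice to return index vectors, and which end of the shuffled order goes to validation are all assumptions made for this example.

```julia
using Random

# Hypothetical helper (not part of SDeMo): splits the indices 1:n into a
# (training, validation) tuple.
function holdout_sketch(n::Integer; proportion = 0.2, permute = true, rng = Random.default_rng())
    # Optionally shuffle the observation order before splitting
    idx = permute ? shuffle(rng, 1:n) : collect(1:n)
    n_valid = round(Int, proportion * n)
    validation = idx[1:n_valid]
    training = idx[(n_valid + 1):end]
    return (training, validation)
end

one_split = holdout_sketch(10; proportion = 0.2, rng = MersenneTwister(42))

# A single tuple wrapped in a vector, so it has the same shape as a list of folds
folds = [one_split]
```

Wrapping the tuple in a vector gives it the same shape as the output of the other fold-generating methods.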

source

julia

holdout(sdm::SDM)

Version of holdout using the instances and labels of an SDM.

source

julia

holdout(sdm::Bagging)

Version of holdout using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.

source

SDeMo.montecarlo Function

julia

montecarlo(y, X; n = 100, kwargs...)

Returns n (default: 100) samples of holdout. Other keyword arguments are passed to holdout.

This method returns a vector of tuples, with each entry having the training data first and the validation data second.

source

julia

montecarlo(sdm::SDM)

Version of montecarlo using the instances and labels of an SDM.

source

julia

montecarlo(sdm::Bagging)

Version of montecarlo using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.

source

SDeMo.leaveoneout Function

julia

leaveoneout(y, X)

Returns the splits for leave-one-out cross-validation. Each sample is used once, on its own, for validation.

This method returns a vector of tuples, with each entry having the training data first and the validation data second.
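The structure of leave-one-out splits can be sketched in one line of plain Julia. The name leaveoneout_sketch and the use of index vectors are assumptions for illustration, not the library's code.

```julia
# Hypothetical helper (not part of SDeMo): one (training, validation) tuple
# per observation, where the validation set holds exactly that observation.
leaveoneout_sketch(n::Integer) = [(setdiff(1:n, i), [i]) for i in 1:n]

loo = leaveoneout_sketch(4)
```

With n observations this produces n splits, so the cost of cross-validation grows linearly with the size of the dataset.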

source

julia

leaveoneout(sdm::SDM)

Version of leaveoneout using the instances and labels of an SDM.

source

julia

leaveoneout(sdm::Bagging)

Version of leaveoneout using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.

source

SDeMo.kfold Function

julia

kfold(y, X; k = 10, permute = true)

Returns splits of the data in which 1 group is used for validation, and k-1 groups are used for training. All k groups have (approximately) the same size, and each instance is used only once for validation (and k-1 times for training).

This method returns a vector of tuples, with each entry having the training data first and the validation data second.
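The partitioning described above can be sketched in plain Julia as follows. This is an illustration under stated assumptions, not the SDeMo implementation: the name kfold_sketch, the rng keyword, the round-robin group assignment, and the use of index vectors are all choices made for this example, and k is assumed to be at least 2.

```julia
using Random

# Hypothetical helper (not part of SDeMo): k (training, validation) index tuples.
function kfold_sketch(n::Integer; k = 10, permute = true, rng = Random.default_rng())
    idx = permute ? shuffle(rng, 1:n) : collect(1:n)
    # Round-robin assignment, so that group sizes differ by at most one
    groups = [idx[g:k:n] for g in 1:k]
    # Each group is the validation set once; the other k-1 groups form the training set
    return [(reduce(vcat, [groups[j] for j in 1:k if j != g]), groups[g]) for g in 1:k]
end

folds = kfold_sketch(23; k = 5, rng = MersenneTwister(1))
```

Every index appears in exactly one validation set and in the training set of the other k-1 splits.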

source

julia

kfold(sdm::SDM)

Version of kfold using the instances and labels of an SDM.

source

julia

kfold(sdm::Bagging)

Version of kfold using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.

source

Cross-validation

SDeMo.crossvalidate Function

julia

crossvalidate(sdm, folds; thr = nothing, kwargs...)

Performs cross-validation on a model, given a vector of tuples representing the data splits. The threshold can be fixed through the thr keyword argument. All other keywords are passed to the train! method.

This method returns two vectors of ConfusionMatrix, with the confusion matrix for each set of validation data first, and the confusion matrix for the training data second.
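To make the shape of the output concrete, here is a self-contained sketch of a cross-validation loop in plain Julia. Everything in it is a stand-in: the NamedTuple replaces ConfusionMatrix, fit_threshold is a toy one-feature classifier, and crossvalidate_sketch is a hypothetical name. It only mirrors the documented behavior of returning validation results first and training results second.

```julia
using Statistics

# Confusion counts as a NamedTuple (a stand-in for ConfusionMatrix)
function confusion(pred::AbstractVector{Bool}, truth::AbstractVector{Bool})
    return (
        tp = count(pred .& truth),
        tn = count(.!pred .& .!truth),
        fp = count(pred .& .!truth),
        fn = count(.!pred .& truth),
    )
end

# Toy "model": a threshold halfway between the two class means of one feature
fit_threshold(x, y) = (mean(x[y]) + mean(x[.!y])) / 2

function crossvalidate_sketch(x, y, folds)
    validation = NamedTuple[]
    training = NamedTuple[]
    for (trn, val) in folds
        τ = fit_threshold(x[trn], y[trn])  # train on the training split only
        push!(validation, confusion(x[val] .>= τ, y[val]))
        push!(training, confusion(x[trn] .>= τ, y[trn]))
    end
    return validation, training
end

x = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y = [false, false, false, false, true, true, true, true]
folds = [(setdiff(1:8, i), [i]) for i in 1:8]  # leave-one-out splits
cv_validation, cv_training = crossvalidate_sketch(x, y, folds)
```

Comparing the two returned vectors is a common way to spot overfitting: a model that does much better on the training splits than on the validation splits is likely memorizing its training data.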

source

Null classifiers

SDeMo.noskill Function

julia

noskill(labels::Vector{Bool})

Returns the confusion matrix for the no-skill classifier given a vector of labels. Predictions are made at random, with each class being selected with a probability equal to its proportion in the training data.
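Under this description, the expected confusion matrix follows directly from the prevalence p of the positive class. The sketch below works this out in plain Julia as proportions; the name noskill_sketch and the NamedTuple output are assumptions, and the library may store the result differently.

```julia
# Hypothetical helper (not part of SDeMo): expected proportions when the
# prediction is positive with probability p, independently of the true label.
function noskill_sketch(labels::Vector{Bool})
    p = count(labels) / length(labels)  # prevalence of the positive class
    return (tp = p^2, tn = (1 - p)^2, fp = p * (1 - p), fn = p * (1 - p))
end

ns = noskill_sketch([true, true, true, false])
```

Because predictions and labels are independent here, each cell is simply the product of the two corresponding marginal probabilities.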

source

julia

noskill(sdm::SDM)

Version of noskill using the training labels for an SDM.

source

SDeMo.coinflip Function

julia

coinflip(labels::Vector{Bool})

Returns the confusion matrix for the coin-flip classifier given a vector of labels. Predictions are made at random, with each class being selected with a probability of one half.

source

julia

coinflip(sdm::SDM)

Version of coinflip using the training labels for an SDM.

source

SDeMo.constantnegative Function

julia

constantnegative(labels::Vector{Bool})

Returns the confusion matrix for the constant negative classifier given a vector of labels. Predictions are assumed to always be negative.

source

julia

constantnegative(sdm::SDM)

Version of constantnegative using the training labels for an SDM.

source

SDeMo.constantpositive Function

julia

constantpositive(labels::Vector{Bool})

Returns the confusion matrix for the constant positive classifier given a vector of labels. Predictions are assumed to always be positive.
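The two constant classifiers are simple enough to write out by hand. The sketch below uses counts in a NamedTuple; the helper names are hypothetical, and whether the library returns counts or proportions is not stated here.

```julia
# Hypothetical helpers (not part of SDeMo).
# Always predicting "positive": every true label is a TP, every false label an FP.
constantpositive_sketch(labels) = (tp = count(labels), tn = 0, fp = count(!, labels), fn = 0)

# Always predicting "negative": every false label is a TN, every true label an FN.
constantnegative_sketch(labels) = (tp = 0, tn = count(!, labels), fp = 0, fn = count(labels))

labels = [true, true, false]
cp = constantpositive_sketch(labels)
cn = constantnegative_sketch(labels)
```

Both matrices have two empty cells, which is why several performance measures are undefined or degenerate for these baselines.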

source

julia

constantpositive(sdm::SDM)

Version of constantpositive using the training labels for an SDM.

source

List of performance measures

SDeMo.tpr Function

julia

tpr(M::ConfusionMatrix)

True-positive rate

$\frac{T P}{T P + F N}$

source

julia

tpr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of tpr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source

SDeMo.tnr Function

julia

tnr(M::ConfusionMatrix)

True-negative rate

$\frac{T N}{T N + F P}$

source

julia

tnr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of tnr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source

SDeMo.fpr Function

julia

fpr(M::ConfusionMatrix)

False-positive rate

$\frac{F P}{F P + T N}$

source

julia

fpr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of fpr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source

SDeMo.fnr Function

julia

fnr(M::ConfusionMatrix)

False-negative rate

$\frac{F N}{F N + T P}$
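The four rates defined so far can be checked on a small worked example. This sketch uses a NamedTuple with assumed field names tp, tn, fp, and fn as a stand-in for ConfusionMatrix, and the _sketch functions are hypothetical rather than the library's own.

```julia
# Hypothetical stand-ins for the SDeMo measures, written from the formulas above
tpr_sketch(M) = M.tp / (M.tp + M.fn)
tnr_sketch(M) = M.tn / (M.tn + M.fp)
fpr_sketch(M) = M.fp / (M.fp + M.tn)
fnr_sketch(M) = M.fn / (M.fn + M.tp)

# 50 true positives-or-misses (40 + 10) and 50 true negatives-or-false-alarms (45 + 5)
M = (tp = 40, tn = 45, fp = 5, fn = 10)
rates = (tpr_sketch(M), tnr_sketch(M), fpr_sketch(M), fnr_sketch(M))
```

The rates come in complementary pairs: the true-positive and false-negative rates sum to one, as do the true-negative and false-positive rates.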

source

julia

fnr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of fnr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source

SDeMo.ppv Function

julia

ppv(M::ConfusionMatrix)

Positive predictive value

$\frac{T P}{T P + F P}$

source

julia

ppv(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of ppv using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source

SDeMo.npv Function

julia

npv(M::ConfusionMatrix)

Negative predictive value

$\frac{T N}{T N + F N}$

source

julia

npv(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of npv using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source

SDeMo.fdir Function

julia

fdir(M::ConfusionMatrix)

False discovery rate, 1 - ppv

source

julia

fdir(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of fdir using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source

SDeMo.fomr Function

julia

fomr(M::ConfusionMatrix)

False omission rate, 1 - npv
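The predictive values and their complements can be illustrated the same way. As before, the NamedTuple and the _sketch function names are assumptions for this example, not the library's types or functions.

```julia
# Hypothetical stand-ins, written from the definitions above
ppv_sketch(M) = M.tp / (M.tp + M.fp)
npv_sketch(M) = M.tn / (M.tn + M.fn)
fdir_sketch(M) = 1 - ppv_sketch(M)  # false discovery rate
fomr_sketch(M) = 1 - npv_sketch(M)  # false omission rate

M = (tp = 40, tn = 45, fp = 5, fn = 10)
predictive = (ppv_sketch(M), npv_sketch(M), fdir_sketch(M), fomr_sketch(M))
```

Unlike the true-positive and true-negative rates, these measures condition on the prediction rather than on the true label, so they change with the prevalence of the positive class.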

source

julia

fomr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of fomr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source

SDeMo.plr Function

julia

plr(M::ConfusionMatrix)

Positive likelihood ratio

$\frac{T P R}{F P R}$

source

julia

plr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of plr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source

SDeMo.nlr Function

julia

nlr(M::ConfusionMatrix)

Negative likelihood ratio

$\frac{F N R}{T N R}$

source

julia

nlr(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of nlr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source

SDeMo.accuracy Function

julia

accuracy(M::ConfusionMatrix)

Accuracy

$\frac{T P + T N}{T P + T N + F P + F N}$

source

julia

accuracy(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of accuracy using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source

SDeMo.balancedaccuracy Function

julia

balancedaccuracy(M::ConfusionMatrix)

Balanced accuracy

$\frac{1}{2} (T P R + T N R)$
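The difference between accuracy and balanced accuracy is easiest to see on imbalanced data. The sketch below uses a NamedTuple as a stand-in for ConfusionMatrix and hypothetical _sketch function names. The example matrix has only 10 positive cases out of 100.

```julia
# Hypothetical stand-ins, written from the formulas above
accuracy_sketch(M) = (M.tp + M.tn) / (M.tp + M.tn + M.fp + M.fn)
balancedaccuracy_sketch(M) = (M.tp / (M.tp + M.fn) + M.tn / (M.tn + M.fp)) / 2

# 10 positives (half of them missed) and 90 negatives (all correct)
M = (tp = 5, tn = 90, fp = 0, fn = 5)
acc = accuracy_sketch(M)
bacc = balancedaccuracy_sketch(M)
```

Here accuracy is 0.95 even though half of the positive cases are missed, whereas balanced accuracy is 0.75 because it gives both classes equal weight.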

source

julia

balancedaccuracy(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of balancedaccuracy using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source

SDeMo.f1 Function

julia

f1(M::ConfusionMatrix)

F₁ score, defined as the harmonic mean of precision and recall:

$2 \times \frac{P P V \times T P R}{P P V + T P R}$

This uses the more general fscore internally.

source

julia

f1(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of f1 using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source

SDeMo.fscore Function

julia

fscore(M::ConfusionMatrix, β=1.0)

Fᵦ score, defined as the weighted harmonic mean of precision and recall, using a positive factor β indicating the relative importance of recall over precision:

$(1 + β^{2}) \times \frac{P P V \times T P R}{(β^{2} \times P P V) + T P R}$
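The formula above can be checked against the equivalent count-based form, (1 + β²)TP / ((1 + β²)TP + β²FN + FP). The sketch below is plain Julia with a NamedTuple standing in for ConfusionMatrix; fscore_sketch is a hypothetical name, not the library function.

```julia
# Hypothetical stand-in, written from the formula above
function fscore_sketch(M, β = 1.0)
    ppv = M.tp / (M.tp + M.fp)  # precision
    tpr = M.tp / (M.tp + M.fn)  # recall
    return (1 + β^2) * (ppv * tpr) / (β^2 * ppv + tpr)
end

M = (tp = 40, tn = 45, fp = 5, fn = 10)
f1_value = fscore_sketch(M)       # β = 1: precision and recall weighted equally
f2_value = fscore_sketch(M, 2.0)  # β = 2: recall weighted more heavily
```

In this example recall (0.8) is lower than precision (about 0.89), so giving more weight to recall with β = 2 lowers the score.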

source

julia

fscore(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of fscore using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source

SDeMo.trueskill Function

julia

trueskill(M::ConfusionMatrix)

True skill statistic (a.k.a. Youden's J, or informedness)

$T P R + T N R - 1$

source

julia

trueskill(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of trueskill using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source

SDeMo.markedness Function

julia

markedness(M::ConfusionMatrix)

Markedness, a measure similar to informedness (TSS) that emphasizes negative predictions

$P P V + N P V - 1$

source

julia

markedness(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of markedness using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source

SDeMo.dor Function

julia

dor(M::ConfusionMatrix)

Diagnostic odds ratio, defined as plr/nlr. A useful test has a value larger than unity, and this value has no upper bound.
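The ratio plr/nlr simplifies to (TP × TN) / (FP × FN), which the following sketch verifies on a small example. The NamedTuple and the _sketch function names are assumptions for illustration, not the library's types or functions.

```julia
# Hypothetical stand-ins for the likelihood ratios and the diagnostic odds ratio
plr_sketch(M) = (M.tp / (M.tp + M.fn)) / (M.fp / (M.fp + M.tn))  # TPR / FPR
nlr_sketch(M) = (M.fn / (M.fn + M.tp)) / (M.tn / (M.tn + M.fp))  # FNR / TNR
dor_sketch(M) = plr_sketch(M) / nlr_sketch(M)

M = (tp = 40, tn = 45, fp = 5, fn = 10)
dor_value = dor_sketch(M)
```

The count-based form shows why the value is unbounded: it grows without limit as the number of false positives or false negatives approaches zero.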

source

julia

dor(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of dor using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source

SDeMo.κ Function

julia

κ(M::ConfusionMatrix)

Cohen's κ
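For reference, the usual definition of Cohen's κ compares the observed agreement with the agreement expected by chance. The sketch below implements that standard definition in plain Julia; kappa_sketch is a hypothetical name, the NamedTuple stands in for ConfusionMatrix, and the library's own code may be organized differently.

```julia
# Hypothetical stand-in implementing the standard definition of Cohen's κ
function kappa_sketch(M)
    n = M.tp + M.tn + M.fp + M.fn
    p_observed = (M.tp + M.tn) / n
    # Chance agreement: product of the marginals, summed over both classes
    p_expected = ((M.tp + M.fp) * (M.tp + M.fn) + (M.tn + M.fn) * (M.tn + M.fp)) / n^2
    return (p_observed - p_expected) / (1 - p_expected)
end

M = (tp = 40, tn = 45, fp = 5, fn = 10)
kappa_value = kappa_sketch(M)
```

In this example the observed agreement is 0.85 and the chance agreement is 0.5, which gives κ = 0.7.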

source

julia

κ(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of κ using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source

SDeMo.mcc Function

julia

mcc(M::ConfusionMatrix)

Matthews correlation coefficient. This is the default measure of model performance, and there are rarely good reasons to use anything else to decide which model to use.
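The standard formula for this coefficient is (TP × TN - FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)). The sketch below implements it in plain Julia and checks a known identity: when informedness and markedness are both positive, the coefficient equals the square root of their product. The function names and the NamedTuple are assumptions for illustration, not the library's code.

```julia
# Hypothetical stand-ins, written from the standard definitions
function mcc_sketch(M)
    numerator = M.tp * M.tn - M.fp * M.fn
    denominator = sqrt((M.tp + M.fp) * (M.tp + M.fn) * (M.tn + M.fp) * (M.tn + M.fn))
    return numerator / denominator
end

informedness_sketch(M) = M.tp / (M.tp + M.fn) + M.tn / (M.tn + M.fp) - 1  # TPR + TNR - 1
markedness_sketch(M) = M.tp / (M.tp + M.fp) + M.tn / (M.tn + M.fn) - 1    # PPV + NPV - 1

M = (tp = 40, tn = 45, fp = 5, fn = 10)
mcc_value = mcc_sketch(M)
```

Because it combines all four cells of the confusion matrix, the coefficient is only high when the model does well on both classes.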

source

julia

mcc(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of mcc using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source

Confidence interval

SDeMo.ci Function

julia

ci(C::Vector{ConfusionMatrix}, f)

Applies f to all confusion matrices in the vector, and returns the 95% CI.
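How the interval is computed is not stated here. As one plausible reading, the sketch below uses a normal approximation, 1.96 times the standard error of the per-fold values. That choice, the helper name mean_and_ci_sketch, and the example scores are all assumptions for illustration.

```julia
using Statistics

# Hypothetical helper (not part of SDeMo): mean of per-fold scores and the
# half-width of a normal-approximation 95% interval around that mean.
function mean_and_ci_sketch(values::AbstractVector{<:Real})
    half_width = 1.96 * std(values) / sqrt(length(values))
    return (mean(values), half_width)
end

# Made-up per-fold scores, standing in for f applied to each confusion matrix
fold_scores = [0.70, 0.74, 0.68, 0.72, 0.71]
m, hw = mean_and_ci_sketch(fold_scores)
```

The half-width shrinks with the square root of the number of folds, so more folds give a tighter interval for the same variability between folds.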

source

julia

ci(C::Vector{ConfusionMatrix})

Applies the MCC (mcc) to all confusion matrices in the vector, and returns the 95% CI.

source

Aliases

SDeMo.specificity Function

julia

specificity(M::ConfusionMatrix)

Alias for tnr, the true negative rate

source

julia

specificity(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of specificity using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source

SDeMo.sensitivity Function

julia

sensitivity(M::ConfusionMatrix)

Alias for tpr, the true positive rate

source

julia

sensitivity(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of sensitivity using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source

SDeMo.recall Function

julia

recall(M::ConfusionMatrix)

Alias for tpr, the true positive rate

source

julia

recall(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of recall using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source

SDeMo.precision Function

julia

precision(M::ConfusionMatrix)

Alias for ppv, the positive predictive value

source

julia

precision(C::Vector{ConfusionMatrix}, full::Bool=false)

Version of precision using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.

source
