Cross-validation
Confusion matrix
SDeMo.ConfusionMatrix Type
ConfusionMatrix{T <: Number}
A structure to store the counts (or proportions) of true positives, true negatives, false positives, and false negatives during model evaluation. Empty confusion matrices can be created using the zero method.
Folds
These methods all take a vector of labels and a matrix of features as input, and return a vector of tuples, each with the training indices in the first position and the validation data in the second. The exception is holdout, which returns a single tuple.
SDeMo.holdout Function
holdout(y, X; proportion = 0.2, permute = true)
Sets aside a proportion of observations (given by the proportion keyword, defaults to 0.2) for validation, and uses the rest for training. The permute keyword (defaults to true) shuffles the order of observations before they are split.
This method returns a single tuple with the training data first and the validation data second. To use it with crossvalidate, the tuple must be wrapped in [].
holdout(sdm::SDM)
Version of holdout using the instances and labels of an SDM.
holdout(sdm::Bagging)
Version of holdout using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.
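The split described above can be sketched in a few lines. This is a minimal illustration and not SDeMo's implementation: holdout_indices is a hypothetical helper, it assumes the split is expressed as index vectors, and it ignores the labels entirely.

```julia
using Random

# Hypothetical helper, not part of SDeMo: splits the indices 1:n into a
# (training, validation) tuple, with `proportion` of them used for validation.
function holdout_indices(n::Integer; proportion = 0.2, permute = true, rng = Random.default_rng())
    # Optionally shuffle the observation order before splitting
    idx = permute ? shuffle(rng, 1:n) : collect(1:n)
    n_valid = round(Int, proportion * n)
    validation = idx[1:n_valid]
    training = idx[(n_valid + 1):end]
    return (training, validation)
end

train, valid = holdout_indices(50; proportion = 0.2, rng = MersenneTwister(1))

# A single tuple is wrapped in a vector before being used as a list of folds
folds = [(train, valid)]
```

Wrapping the tuple in a vector mirrors the note above about placing the output of holdout in [] before passing it to crossvalidate.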
SDeMo.montecarlo Function
montecarlo(y, X; n = 100, kwargs...)
Returns n (defaults to 100) samples of holdout. Other keyword arguments are passed to holdout.
This method returns a vector of tuples, with each entry having the training data first and the validation data second.
montecarlo(sdm::SDM)
Version of montecarlo using the instances and labels of an SDM.
montecarlo(sdm::Bagging)
Version of montecarlo using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.
SDeMo.leaveoneout Function
leaveoneout(y, X)
Returns the splits for leave-one-out cross-validation. Each sample is used once, on its own, for validation.
This method returns a vector of tuples, with each entry having the training data first and the validation data second.
leaveoneout(sdm::SDM)
Version of leaveoneout using the instances and labels of an SDM.
leaveoneout(sdm::Bagging)
Version of leaveoneout using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.
SDeMo.kfold Function
kfold(y, X; k = 10, permute = true)
Returns splits of the data in which one group is used for validation and the other k-1 groups are used for training. All k groups have approximately the same size, and each instance is used only once for validation (and k-1 times for training).
This method returns a vector of tuples, with each entry having the training data first and the validation data second.
kfold(sdm::SDM)
Version of kfold using the instances and labels of an SDM.
kfold(sdm::Bagging)
Version of kfold using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.
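To make the structure of the k-fold splits concrete, here is a minimal sketch. The helper kfold_indices is hypothetical and is not SDeMo's implementation: it assumes splits are expressed as index vectors and it does not look at the labels, so any class balancing the package may perform is not reproduced here.

```julia
using Random

# Hypothetical helper, not part of SDeMo: assigns each of n observations to
# one of k groups, then uses each group exactly once for validation.
function kfold_indices(n::Integer; k = 10, permute = true, rng = Random.default_rng())
    idx = permute ? shuffle(rng, 1:n) : collect(1:n)
    # Group labels cycle through 1..k, so group sizes differ by at most one
    group = [mod1(i, k) for i in 1:n]
    # Each entry is a (training, validation) tuple of indices
    return [(idx[group .!= g], idx[group .== g]) for g in 1:k]
end

folds = kfold_indices(25; k = 5, rng = MersenneTwister(42))
```

Setting k equal to the number of observations gives one validation sample per fold, which is the leave-one-out case.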
Cross-validation
SDeMo.crossvalidate Function
crossvalidate(sdm, folds; thr = nothing, kwargs...)
Performs cross-validation on a model, given a vector of tuples representing the data splits. The threshold can be fixed through the thr keyword argument. All other keywords are passed to the train! method.
This method returns two vectors of ConfusionMatrix: the first holds the confusion matrix for each set of validation data, and the second holds the confusion matrix for the corresponding training data.
Null classifiers
SDeMo.noskill Function
noskill(labels::Vector{Bool})
Returns the confusion matrix for the no-skill classifier given a vector of labels. Predictions are made at random, with each class being selected by its proportion in the training data.
noskill(sdm::SDM)
Version of noskill using the training labels for an SDM.
SDeMo.coinflip Function
coinflip(labels::Vector{Bool})
Returns the confusion matrix for the coin-flip classifier given a vector of labels. Predictions are made at random, with each class being selected with a probability of one half.
coinflip(sdm::SDM)
Version of coinflip using the training labels for an SDM.
SDeMo.constantnegative Function
constantnegative(labels::Vector{Bool})
Returns the confusion matrix for the constant negative classifier given a vector of labels. Predictions are assumed to always be negative.
constantnegative(sdm::SDM)
Version of constantnegative using the training labels for an SDM.
SDeMo.constantpositive Function
constantpositive(labels::Vector{Bool})
Returns the confusion matrix for the constant positive classifier given a vector of labels. Predictions are assumed to always be positive.
constantpositive(sdm::SDM)
Version of constantpositive using the training labels for an SDM.
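The four null classifiers differ only in the probability with which they predict the positive class. The sketch below computes the expected confusion matrix, as proportions, under that view. It is an illustration under stated assumptions: expected_confusion is a hypothetical helper, the result is a NamedTuple rather than a ConfusionMatrix, and whether SDeMo returns expectations or counts is not stated in this section.

```julia
# Hypothetical helper, not part of SDeMo: expected confusion matrix (as
# proportions) for a classifier that predicts "positive" with probability q,
# independently of the true label.
function expected_confusion(labels::Vector{Bool}, q::Real)
    p = sum(labels) / length(labels)  # prevalence of the positive class
    return (tp = p * q, fp = (1 - p) * q, fn = p * (1 - q), tn = (1 - p) * (1 - q))
end

labels = [true, true, true, false]  # prevalence of 0.75
p = sum(labels) / length(labels)

ns = expected_confusion(labels, p)    # no-skill: predicts by class proportion
cf = expected_confusion(labels, 0.5)  # coin flip: probability of one half
cp = expected_confusion(labels, 1.0)  # constant positive
cn = expected_confusion(labels, 0.0)  # constant negative
```

The constant classifiers leave two cells empty: the constant positive classifier has no negatives of either kind, and the constant negative classifier has no positives of either kind.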
List of performance measures
SDeMo.tpr Function
tpr(M::ConfusionMatrix)
True-positive rate
tpr(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of tpr using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.
SDeMo.tnr Function
tnr(M::ConfusionMatrix)
True-negative rate
tnr(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of tnr using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.
SDeMo.fpr Function
fpr(M::ConfusionMatrix)
False-positive rate
fpr(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of fpr using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.
SDeMo.fnr Function
fnr(M::ConfusionMatrix)
False-negative rate
fnr(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of fnr using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.
SDeMo.ppv Function
ppv(M::ConfusionMatrix)
Positive predictive value
ppv(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of ppv using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.
SDeMo.npv Function
npv(M::ConfusionMatrix)
Negative predictive value
npv(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of npv using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.
SDeMo.fdir Function
fdir(M::ConfusionMatrix)
False discovery rate, 1 - ppv
fdir(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of fdir using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.
SDeMo.fomr Function
fomr(M::ConfusionMatrix)
False omission rate, 1 - npv
fomr(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of fomr using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.
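The rates listed so far all follow from the four cells of the confusion matrix. The sketch below uses the standard definitions on a plain NamedTuple of counts; the rates function and the field names tp, tn, fp, and fn are illustrative assumptions and do not reflect the internals of ConfusionMatrix.

```julia
# Counts for a hypothetical confusion matrix, stored in a NamedTuple rather
# than in SDeMo's ConfusionMatrix type.
M = (tp = 40, tn = 30, fp = 10, fn = 20)

# Hypothetical helper returning the standard rates for a set of counts
function rates(M)
    tpr = M.tp / (M.tp + M.fn)  # true-positive rate (sensitivity, recall)
    tnr = M.tn / (M.tn + M.fp)  # true-negative rate (specificity)
    fpr = M.fp / (M.fp + M.tn)  # false-positive rate, 1 - tnr
    fnr = M.fn / (M.fn + M.tp)  # false-negative rate, 1 - tpr
    ppv = M.tp / (M.tp + M.fp)  # positive predictive value (precision)
    npv = M.tn / (M.tn + M.fn)  # negative predictive value
    return (; tpr, tnr, fpr, fnr, ppv, npv, fdir = 1 - ppv, fomr = 1 - npv)
end

r = rates(M)
```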
SDeMo.plr Function
plr(M::ConfusionMatrix)
Positive likelihood ratio
plr(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of plr using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.
SDeMo.nlr Function
nlr(M::ConfusionMatrix)
Negative likelihood ratio
nlr(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of nlr using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.
SDeMo.accuracy Function
accuracy(M::ConfusionMatrix)
Accuracy
accuracy(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of accuracy using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.
SDeMo.balancedaccuracy Function
balancedaccuracy(M::ConfusionMatrix)
Balanced accuracy
balancedaccuracy(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of balancedaccuracy using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.
SDeMo.f1 Function
f1(M::ConfusionMatrix)
F₁ score, defined as the harmonic mean between precision and recall:
$$F_1 = 2 \times \frac{PPV \times TPR}{PPV + TPR}$$
This uses the more general fscore internally.
f1(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of f1 using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.
SDeMo.fscore Function
fscore(M::ConfusionMatrix, β=1.0)
Fᵦ score, defined as the weighted harmonic mean between precision and recall, using a positive factor β indicating the relative importance of recall over precision:
$$F_\beta = (1 + \beta^2) \times \frac{PPV \times TPR}{\beta^2 \times PPV + TPR}$$
fscore(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of fscore using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.
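As a worked illustration of the Fᵦ score, the sketch below computes it from raw counts using the standard definition. The function fbeta is a hypothetical standalone helper that takes counts directly; it is not SDeMo's fscore, which takes a ConfusionMatrix.

```julia
# Hypothetical standalone helper: Fᵦ from raw counts, where precision is the
# positive predictive value and recall is the true-positive rate.
function fbeta(tp, fp, fn; β = 1.0)
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    return (1 + β^2) * prec * rec / (β^2 * prec + rec)
end

f_one = fbeta(40, 10, 20)           # β = 1 gives the F₁ score
f_two = fbeta(40, 10, 20; β = 2.0)  # β > 1 weighs recall more than precision
```

With these counts, precision (0.8) is higher than recall (about 0.67), so giving more weight to recall lowers the score.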
SDeMo.trueskill Function
trueskill(M::ConfusionMatrix)
True skill statistic (a.k.a. Youden's J, or informedness)
trueskill(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of trueskill using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.
SDeMo.markedness Function
markedness(M::ConfusionMatrix)
Markedness, a measure similar to informedness (TSS) that emphasizes negative predictions
markedness(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of markedness using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.
SDeMo.dor Function
dor(M::ConfusionMatrix)
Diagnostic odds ratio, defined as plr/nlr. A useful test has a value larger than unity, and this value has no upper bound.
dor(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of dor using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.
SDeMo.κ Function
κ(M::ConfusionMatrix)
Cohen's κ
κ(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of κ using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.
SDeMo.mcc Function
mcc(M::ConfusionMatrix)
Matthews correlation coefficient. This is the default measure of model performance, and there are rarely good reasons to use anything else to decide which model to use.
mcc(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of mcc using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.
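For reference, the sketch below computes three of these summary measures from raw counts using their standard definitions. The function names are hypothetical and chosen so as not to clash with SDeMo's exports; they take counts directly rather than a ConfusionMatrix.

```julia
# Hypothetical standalone helpers working on raw counts, not SDeMo functions.

# Matthews correlation coefficient
function matthews(tp, tn, fp, fn)
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denominator
end

# True skill statistic: tpr + tnr - 1
trueskill_stat(tp, tn, fp, fn) = tp / (tp + fn) + tn / (tn + fp) - 1

# Diagnostic odds ratio: plr / nlr, which simplifies to (tp * tn) / (fp * fn)
odds_ratio(tp, tn, fp, fn) = (tp * tn) / (fp * fn)

m = matthews(40, 30, 10, 20)
t = trueskill_stat(40, 30, 10, 20)
d = odds_ratio(40, 30, 10, 20)
```

Note that the sketch does not guard against empty rows or columns: if any margin of the confusion matrix is zero, the denominator of the Matthews correlation coefficient is zero as well.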
Confidence interval
SDeMo.ci Function
ci(C::Vector{ConfusionMatrix}, f)
Applies f to all confusion matrices in the vector, and returns the 95% CI.
ci(C::Vector{ConfusionMatrix})
Applies the MCC (mcc) to all confusion matrices in the vector, and returns the 95% CI.
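The exact formula behind ci is not given in this section. As a rough illustration only, the sketch below summarizes one score per fold with a mean and a 95% interval half-width under a normal approximation. Both the mean_and_ci helper and the choice of approximation are assumptions, and the value SDeMo returns may be computed differently.

```julia
using Statistics

# Hypothetical helper, not part of SDeMo. Assumption: normal approximation,
# with a half-width of 1.96 standard errors around the mean.
function mean_and_ci(values::AbstractVector{<:Real})
    m = mean(values)
    halfwidth = 1.96 * std(values) / sqrt(length(values))
    return (m, halfwidth)
end

# Made-up scores, e.g. one value of a performance measure per validation fold
scores = [0.61, 0.58, 0.66, 0.63, 0.60]
m, hw = mean_and_ci(scores)
```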
Aliases
SDeMo.specificity Function
specificity(M::ConfusionMatrix)
Alias for tnr, the true-negative rate
specificity(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of specificity using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.
SDeMo.sensitivity Function
sensitivity(M::ConfusionMatrix)
Alias for tpr, the true-positive rate
sensitivity(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of sensitivity using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.
SDeMo.recall Function
recall(M::ConfusionMatrix)
Alias for tpr, the true-positive rate
recall(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of recall using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.
SDeMo.precision Function
precision(M::ConfusionMatrix)
Alias for ppv, the positive predictive value
precision(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of precision using a vector of confusion matrices. Returns the mean; when the second argument (full) is true, returns a tuple whose second element is the CI.