Cross-validation
Confusion matrix
SDeMo.ConfusionMatrix Type
ConfusionMatrix{T <: Number}
A structure to store the true positive, true negative, false positive, and false negative counts (or proportions) during model evaluation. Empty confusion matrices can be created using the zero method.
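As a rough illustration of the structure described above, the sketch below defines a stand-in type holding the four entries, with a zero method for it. The name SketchConfusionMatrix and the field names tp, tn, fp, and fn are assumptions made for this example; they are not taken from SDeMo.

```julia
# Hypothetical stand-in type; the field names are assumptions for illustration
struct SketchConfusionMatrix{T <: Number}
    tp::T
    tn::T
    fp::T
    fn::T
end

# An "empty" matrix has all four entries set to zero
Base.zero(::Type{SketchConfusionMatrix{T}}) where {T <: Number} =
    SketchConfusionMatrix(zero(T), zero(T), zero(T), zero(T))

# The same structure can hold counts...
counts = SketchConfusionMatrix(40, 45, 5, 10)
total = counts.tp + counts.tn + counts.fp + counts.fn

# ...or proportions
proportions = SketchConfusionMatrix(
    counts.tp / total, counts.tn / total, counts.fp / total, counts.fn / total,
)

blank = zero(SketchConfusionMatrix{Int})
```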
Folds
These methods all take a vector of labels and a matrix of features as input, and return a vector of tuples with the training indices in the first position and the validation indices in the second. The exception is holdout, which returns a single tuple.
SDeMo.holdout Function
holdout(y, X; proportion = 0.2, permute = true)
Sets aside a proportion (given by the proportion keyword, defaults to 0.2) of observations to use for validation, and the rest for training. An additional keyword, permute (defaults to true), can be used to shuffle the order of observations before they are split.
This method returns a single tuple with the training data first and the validation data second. To use it with crossvalidate, it must be wrapped in a vector, as in [holdout(y, X)].
holdout(sdm::SDM)
Version of holdout using the instances and labels of an SDM.
holdout(sdm::Bagging)
Version of holdout using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.
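The splitting logic described for holdout can be sketched as follows. This is a minimal illustration, not SDeMo's implementation: sketch_holdout is a hypothetical helper that works on a number of observations rather than on labels and features, and taking the validation positions from the front of the (possibly shuffled) sequence is an arbitrary choice.

```julia
using Random

# Hypothetical helper, for illustration only
function sketch_holdout(n::Integer; proportion = 0.2, permute = true)
    # Optionally shuffle the positions before splitting
    positions = permute ? shuffle(1:n) : collect(1:n)
    n_validation = round(Int, proportion * n)
    validation = positions[1:n_validation]
    training = positions[(n_validation + 1):end]
    return (training, validation)
end

training, validation = sketch_holdout(50)

# A single split wrapped in a vector, so it has the same shape as the other fold types
folds = [sketch_holdout(50)]
```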
SDeMo.montecarlo Function
montecarlo(y, X; n = 100, kwargs...)
Returns n (default 100) samples of holdout. Other keyword arguments are passed to holdout.
This method returns a vector of tuples, with each entry having the training data first and the validation data second.
montecarlo(sdm::SDM)
Version of montecarlo using the instances and labels of an SDM.
montecarlo(sdm::Bagging)
Version of montecarlo using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.
SDeMo.leaveoneout Function
leaveoneout(y, X)
Returns the splits for leave-one-out cross-validation. Each sample is used once, on its own, for validation.
This method returns a vector of tuples, with each entry having the training data first and the validation data second.
leaveoneout(sdm::SDM)
Version of leaveoneout using the instances and labels of an SDM.
leaveoneout(sdm::Bagging)
Version of leaveoneout using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.
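Leave-one-out splits are simple enough to write by hand. The sketch below uses a hypothetical helper, sketch_leaveoneout, that works on a number of observations; it illustrates the shape of the output and is not SDeMo's code.

```julia
# Hypothetical helper: one split per observation, each validated on its own
sketch_leaveoneout(n::Integer) = [([j for j in 1:n if j != i], [i]) for i in 1:n]

splits = sketch_leaveoneout(4)
```

With four observations this gives four splits, each holding three training positions and one validation position.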
SDeMo.kfold Function
kfold(y, X; k = 10, permute = true)
Returns splits of the data in which 1 group is used for validation and k-1 groups are used for training. All k groups have (approximately) the same size, and each instance is used only once for validation (and k-1 times for training).
This method returns a vector of tuples, with each entry having the training data first and the validation data second.
kfold(sdm::SDM)
Version of kfold using the instances and labels of an SDM.
kfold(sdm::Bagging)
Version of kfold using the instances and labels of a bagged SDM. In this case, the instances of the model used as a reference to build the bagged model are used.
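One possible way to build k groups of near-equal size is to deal the (optionally shuffled) positions into groups in turn. The helper sketch_kfold below is hypothetical and works on a number of observations; how SDeMo assigns instances to groups is not stated here and may differ.

```julia
using Random

# Hypothetical helper, for illustration only
function sketch_kfold(n::Integer; k = 10, permute = true)
    @assert 2 <= k <= n
    positions = permute ? shuffle(1:n) : collect(1:n)
    # Deal positions into k groups in turn, so group sizes differ by at most one
    groups = [positions[g:k:n] for g in 1:k]
    # Each group is the validation set once; the other k-1 groups form the training set
    return [(reduce(vcat, groups[setdiff(1:k, [g])]), groups[g]) for g in 1:k]
end

folds = sketch_kfold(23; k = 5)
```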
Cross-validation
SDeMo.crossvalidate Function
crossvalidate(sdm, folds; thr = nothing, kwargs...)
Performs cross-validation on a model, given a vector of tuples representing the data splits. The threshold can be fixed through the thr keyword argument. All other keyword arguments are passed to the train! method.
This method returns two vectors of ConfusionMatrix: the confusion matrices for each set of validation data first, and the confusion matrices for the corresponding training data second.
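To show how a vector of splits is consumed, here is a self-contained sketch of the general loop: fit on the training positions, then tally one confusion matrix on the validation positions and one on the training positions. Everything in it is an assumption for illustration (the toy data, the fit_cutoff and tally helpers, and the use of named tuples in place of ConfusionMatrix); it does not call SDeMo.

```julia
using Statistics

# Toy data: a single feature, where presences (true) tend to have larger values
x = [0.1, 0.4, 0.35, 0.8, 0.9, 0.7, 0.2, 0.95, 0.3, 0.85]
y = [false, false, false, true, true, true, false, true, false, true]

# Hypothetical "training": cutoff halfway between the two class means
fit_cutoff(x, y) = (mean(x[y]) + mean(x[.!y])) / 2

# Hypothetical tally of predictions against observations
tally(pred, obs) = (
    tp = count(pred .& obs),
    tn = count(.!pred .& .!obs),
    fp = count(pred .& .!obs),
    fn = count(.!pred .& obs),
)

# Two hand-written splits: (training positions, validation positions)
folds = [(collect(1:5), collect(6:10)), (collect(6:10), collect(1:5))]

validation_matrices = NamedTuple[]
training_matrices = NamedTuple[]
for (trn, val) in folds
    cutoff = fit_cutoff(x[trn], y[trn])
    push!(validation_matrices, tally(x[val] .> cutoff, y[val]))
    push!(training_matrices, tally(x[trn] .> cutoff, y[trn]))
end
```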
Null classifiers
SDeMo.noskill Function
noskill(labels::Vector{Bool})
Returns the confusion matrix for the no-skill classifier given a vector of labels. Predictions are made at random, with each class selected with a probability equal to its proportion in the training data.
noskill(sdm::SDM)
Version of noskill using the training labels for an SDM.
SDeMo.coinflip Function
coinflip(labels::Vector{Bool})
Returns the confusion matrix for the coin-flip classifier given a vector of labels. Predictions are made at random, with each class being selected with a probability of one half.
coinflip(sdm::SDM)
Version of coinflip using the training labels for an SDM.
SDeMo.constantnegative Function
constantnegative(labels::Vector{Bool})
Returns the confusion matrix for the constant negative classifier given a vector of labels. Predictions are assumed to always be negative.
constantnegative(sdm::SDM)
Version of constantnegative using the training labels for an SDM.
SDeMo.constantpositive Function
constantpositive(labels::Vector{Bool})
Returns the confusion matrix for the constant positive classifier given a vector of labels. Predictions are assumed to always be positive.
constantpositive(sdm::SDM)
Version of constantpositive using the training labels for an SDM.
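The four null classifiers have simple expected confusion matrices once the proportion p of positive labels is known. The sketch below writes them out as proportions using named tuples; these closed forms follow from the descriptions above and are an illustration, not SDeMo's code (whether the package returns counts or proportions is not stated here).

```julia
labels = [true, true, true, false, false, false, false, false, false, false]
p = count(labels) / length(labels)  # proportion of positive labels

# No-skill: predicts positive with probability p
noskill_expected = (tp = p^2, tn = (1 - p)^2, fp = p * (1 - p), fn = p * (1 - p))

# Coin flip: predicts positive with probability one half
coinflip_expected = (tp = p / 2, tn = (1 - p) / 2, fp = (1 - p) / 2, fn = p / 2)

# Constant classifiers: every prediction is the same class
constantpositive_expected = (tp = p, tn = 0.0, fp = 1 - p, fn = 0.0)
constantnegative_expected = (tp = 0.0, tn = 1 - p, fp = 0.0, fn = p)
```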
List of performance measures
SDeMo.tpr Function
tpr(M::ConfusionMatrix)
True-positive rate
tpr(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of tpr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.tnr Function
tnr(M::ConfusionMatrix)
True-negative rate
tnr(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of tnr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.fpr Function
fpr(M::ConfusionMatrix)
False-positive rate
fpr(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of fpr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.fnr Function
fnr(M::ConfusionMatrix)
False-negative rate
fnr(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of fnr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
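The four rates tpr, tnr, fpr, and fnr follow the usual textbook definitions, which can be checked on a small example. The sketch uses a named tuple with assumed field names tp, tn, fp, and fn in place of a ConfusionMatrix, and the sketch_ functions are hypothetical stand-ins.

```julia
M = (tp = 40, tn = 45, fp = 5, fn = 10)

sketch_tpr(M) = M.tp / (M.tp + M.fn)  # positives correctly predicted
sketch_tnr(M) = M.tn / (M.tn + M.fp)  # negatives correctly predicted
sketch_fpr(M) = M.fp / (M.fp + M.tn)  # negatives predicted as positive
sketch_fnr(M) = M.fn / (M.fn + M.tp)  # positives predicted as negative
```

Each pair is complementary: tpr and fnr sum to one, as do tnr and fpr.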
SDeMo.ppv Function
ppv(M::ConfusionMatrix)
Positive predictive value
ppv(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of ppv using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.npv Function
npv(M::ConfusionMatrix)
Negative predictive value
npv(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of npv using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.fdir Function
fdir(M::ConfusionMatrix)
False discovery rate, 1 - ppv
fdir(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of fdir using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.fomr Function
fomr(M::ConfusionMatrix)
False omission rate, 1 - npv
fomr(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of fomr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
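The predictive values and their complements can be sketched in the same way. As before, the named tuple and the sketch_ functions are hypothetical stand-ins using the standard definitions.

```julia
M = (tp = 40, tn = 45, fp = 5, fn = 10)

sketch_ppv(M) = M.tp / (M.tp + M.fp)  # positive predictions that are correct
sketch_npv(M) = M.tn / (M.tn + M.fn)  # negative predictions that are correct
sketch_fdir(M) = 1 - sketch_ppv(M)    # false discovery rate
sketch_fomr(M) = 1 - sketch_npv(M)    # false omission rate
```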
SDeMo.plr Function
plr(M::ConfusionMatrix)
Positive likelihood ratio
plr(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of plr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.nlr Function
nlr(M::ConfusionMatrix)
Negative likelihood ratio
nlr(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of nlr using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.accuracy Function
accuracy(M::ConfusionMatrix)
Accuracy
accuracy(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of accuracy using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.balancedaccuracy Function
balancedaccuracy(M::ConfusionMatrix)
Balanced accuracy
balancedaccuracy(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of balancedaccuracy using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
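The difference between accuracy and balanced accuracy shows up when classes are imbalanced. The sketch below uses the standard definitions (balanced accuracy as the mean of the true-positive and true-negative rates) on a hypothetical named tuple with few positives; the sketch_ functions are not SDeMo's.

```julia
# Imbalanced example: 10 positives, 90 negatives
M = (tp = 8, tn = 85, fp = 5, fn = 2)

sketch_accuracy(M) = (M.tp + M.tn) / (M.tp + M.tn + M.fp + M.fn)

function sketch_balancedaccuracy(M)
    tpr = M.tp / (M.tp + M.fn)
    tnr = M.tn / (M.tn + M.fp)
    return (tpr + tnr) / 2
end
```

Here the accuracy is dominated by the many correctly predicted negatives, while the balanced accuracy gives both classes equal weight.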
SDeMo.f1 Function
f1(M::ConfusionMatrix)
F₁ score, defined as the harmonic mean between precision and recall:
$$F_1 = 2 \times \frac{PPV \times TPR}{PPV + TPR}$$
This uses the more general fscore internally.
f1(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of f1 using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.fscore Function
fscore(M::ConfusionMatrix, β=1.0)
Fᵦ score, defined as the weighted harmonic mean between precision and recall, using a positive factor β indicating the relative importance of recall over precision:
$$F_\beta = (1 + \beta^2) \times \frac{PPV \times TPR}{(\beta^2 \times PPV) + TPR}$$
fscore(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of fscore using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
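A small sketch of the Fᵦ score, assuming the standard definition as a β-weighted harmonic mean of precision (PPV) and recall (TPR). The named tuple and the functions sketch_fscore and sketch_f1 are hypothetical stand-ins.

```julia
M = (tp = 40, tn = 45, fp = 5, fn = 10)

function sketch_fscore(M, β = 1.0)
    ppv = M.tp / (M.tp + M.fp)  # precision
    tpr = M.tp / (M.tp + M.fn)  # recall
    return (1 + β^2) * (ppv * tpr) / (β^2 * ppv + tpr)
end

# The F₁ score is the special case β = 1
sketch_f1(M) = sketch_fscore(M, 1.0)
```

Values of β above one weight recall more heavily, and values below one weight precision more heavily.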
SDeMo.trueskill Function
trueskill(M::ConfusionMatrix)
True skill statistic (a.k.a. Youden's J, or informedness)
trueskill(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of trueskill using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.markedness Function
markedness(M::ConfusionMatrix)
Markedness, a measure similar to informedness (TSS) that emphasizes negative predictions
markedness(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of markedness using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
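Informedness (the true skill statistic) and markedness are built from the rates and the predictive values, respectively. The sketch below uses their standard definitions on a hypothetical named tuple; the sketch_ functions are illustrations, not SDeMo's code.

```julia
M = (tp = 40, tn = 45, fp = 5, fn = 10)

# True skill statistic: tpr + tnr - 1
sketch_trueskill(M) = M.tp / (M.tp + M.fn) + M.tn / (M.tn + M.fp) - 1

# Markedness: ppv + npv - 1
sketch_markedness(M) = M.tp / (M.tp + M.fp) + M.tn / (M.tn + M.fn) - 1
```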
SDeMo.dor Function
dor(M::ConfusionMatrix)
Diagnostic odds ratio, defined as plr/nlr. A useful test has a value larger than unity, and this value has no upper bound.
dor(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of dor using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
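The relationship between the likelihood ratios and the diagnostic odds ratio can be checked numerically. The sketch assumes the standard definitions (plr as tpr/fpr and nlr as fnr/tnr) and uses a hypothetical named tuple and sketch_ functions.

```julia
M = (tp = 40, tn = 45, fp = 5, fn = 10)

sketch_tpr(M) = M.tp / (M.tp + M.fn)
sketch_tnr(M) = M.tn / (M.tn + M.fp)

sketch_plr(M) = sketch_tpr(M) / (1 - sketch_tnr(M))  # tpr / fpr
sketch_nlr(M) = (1 - sketch_tpr(M)) / sketch_tnr(M)  # fnr / tnr
sketch_dor(M) = sketch_plr(M) / sketch_nlr(M)
```

The ratio simplifies to (tp × tn) / (fp × fn), which is undefined when either error count is zero.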
SDeMo.κ Function
κ(M::ConfusionMatrix)
Cohen's κ
κ(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of κ using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
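Cohen's κ compares the observed agreement to the agreement expected by chance. The sketch below follows the standard definition for a two-class confusion matrix; sketch_kappa and the named tuple are hypothetical stand-ins.

```julia
M = (tp = 40, tn = 45, fp = 5, fn = 10)

function sketch_kappa(M)
    n = M.tp + M.tn + M.fp + M.fn
    # Observed agreement is the accuracy
    observed = (M.tp + M.tn) / n
    # Agreement expected by chance, from the predicted and observed class totals
    expected = ((M.tp + M.fp) * (M.tp + M.fn) + (M.tn + M.fn) * (M.tn + M.fp)) / n^2
    return (observed - expected) / (1 - expected)
end
```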
SDeMo.mcc Function
mcc(M::ConfusionMatrix)
Matthews correlation coefficient. This is the default measure of model performance, and there are rarely good reasons to use anything else to decide which model to use.
mcc(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of mcc using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
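For reference, the usual formula for the Matthews correlation coefficient can be sketched as below. sketch_mcc and the named tuple are hypothetical stand-ins, and the sketch does not handle the degenerate case where a row or column total is zero.

```julia
M = (tp = 40, tn = 45, fp = 5, fn = 10)

function sketch_mcc(M)
    numerator = M.tp * M.tn - M.fp * M.fn
    denominator = sqrt((M.tp + M.fp) * (M.tp + M.fn) * (M.tn + M.fp) * (M.tn + M.fn))
    return numerator / denominator
end
```

The value ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction right).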
Confidence interval
SDeMo.ci Function
ci(C::Vector{ConfusionMatrix}, f)
Applies f to all confusion matrices in the vector, and returns the 95% CI.
ci(C::Vector{ConfusionMatrix})
Applies the MCC (mcc) to all confusion matrices in the vector, and returns the 95% CI.
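The pattern of applying a measure to every matrix and summarising the result can be sketched as follows. How SDeMo computes the interval is not stated here, so the normal-approximation half-width below (1.96 standard errors) is an assumption, as are the names sketch_ci, sketch_summary, and sketch_accuracy and the use of named tuples.

```julia
using Statistics

sketch_accuracy(M) = (M.tp + M.tn) / (M.tp + M.tn + M.fp + M.fn)

# ASSUMPTION: normal-approximation half-width; SDeMo's ci may be computed differently
sketch_ci(C, f) = 1.96 * std(f.(C)) / sqrt(length(C))

# Mean alone by default, or (mean, CI) when full is true
sketch_summary(C, f, full = false) =
    full ? (mean(f.(C)), sketch_ci(C, f)) : mean(f.(C))

C = [
    (tp = 40, tn = 45, fp = 5, fn = 10),
    (tp = 45, tn = 45, fp = 5, fn = 5),
    (tp = 35, tn = 45, fp = 5, fn = 15),
]

average = sketch_summary(C, sketch_accuracy)
average_with_ci = sketch_summary(C, sketch_accuracy, true)
```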
Aliases
SDeMo.specificity Function
specificity(M::ConfusionMatrix)
Alias for tnr, the true negative rate
specificity(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of specificity using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.sensitivity Function
sensitivity(M::ConfusionMatrix)
Alias for tpr, the true positive rate
sensitivity(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of sensitivity using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.recall Function
recall(M::ConfusionMatrix)
Alias for tpr, the true positive rate
recall(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of recall using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.
SDeMo.precision Function
precision(M::ConfusionMatrix)
Alias for ppv, the positive predictive value
precision(C::Vector{ConfusionMatrix}, full::Bool=false)
Version of precision using a vector of confusion matrices. Returns the mean, and when the second argument is true, returns a tuple where the second element is the CI.