Feature selection and importance
Feature selection
SDeMo.VariableSelectionStrategy Type
VariableSelectionStrategy
This is an abstract type to which all variable selection types belong. Variable selection methods should define a method for variables! whose first argument is a model and whose second argument is a selection strategy. The third and fourth positional arguments are, respectively, a list of variables that must be included and the folds to use for cross-validation. Both can be omitted, in which case no variables are forced into the model and k-fold cross-validation is used.
SDeMo.ForwardSelection Type
ForwardSelection
Variables are included one at a time until the performance of the models stops improving.
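The greedy loop described above can be sketched in plain Julia. This is a minimal illustration, not SDeMo's implementation: forward_selection_sketch, its forced keyword, and toy_score are hypothetical names, and toy_score stands in for whatever cross-validated performance measure the model would report.

```julia
# Hypothetical sketch of greedy forward selection (not SDeMo's own code).
# `score` maps a vector of variable indices to a performance value, and
# `forced` lists variables that are always part of the model.
function forward_selection_sketch(score::Function, candidates::Vector{Int};
                                  forced::Vector{Int}=Int[])
    selected = copy(forced)
    pool = setdiff(candidates, selected)
    best = isempty(selected) ? -Inf : score(selected)
    while !isempty(pool)
        # Score every model obtained by adding one more candidate variable
        scores = [score(vcat(selected, v)) for v in pool]
        top = argmax(scores)
        # Stop as soon as no candidate improves on the current best score
        scores[top] <= best && break
        best = scores[top]
        push!(selected, pool[top])
        deleteat!(pool, top)
    end
    return selected
end

# Toy score (an assumption of this example): variables 2 and 4 are
# informative, and every variable in the model carries a small penalty.
toy_score(vars) = 1.0 * (2 in vars) + 0.5 * (4 in vars) - 0.1 * length(vars)

chosen = forward_selection_sketch(toy_score, [1, 2, 3, 4])
with_forced = forward_selection_sketch(toy_score, [1, 2, 3, 4]; forced=[1])
```

Backward selection follows the same pattern in reverse: start from all variables and remove the one whose removal improves the score the most, stopping when no removal helps.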
SDeMo.BackwardSelection Type
BackwardSelection
Variables are removed one at a time until the performance of the models stops improving.
SDeMo.AllVariables Type
AllVariables
All variables in the training dataset are used. Note that this also cross-validates and trains the model.
SDeMo.VarianceInflationFactor Type
VarianceInflationFactor{N}
Removes variables one at a time until the largest VIF is lower than N (a floating point number), or until the performance of the model stops increasing. Note that the resulting set of variables may therefore have a largest VIF above the threshold. See StrictVarianceInflationFactor for an alternative.
SDeMo.StrictVarianceInflationFactor Type
StrictVarianceInflationFactor{N}
Removes variables one at a time until the largest VIF is lower than N (a floating point number). In contrast with VarianceInflationFactor, this approach to variable selection does not cross-validate the model, and may result in a model that performs far worse than one obtained with any other variable selection technique.
Feature importance
SDeMo.variableimportance Function
variableimportance(model, folds, variable; reps=10, optimality=mcc, kwargs...)
Returns the importance of one variable in the model. The reps keyword sets the number of bootstraps to run (defaults to 10, which is not enough!).
The keywords are passed to ConfusionMatrix.
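The text above does not spell out how importance is measured. One common approach, shown here purely as an assumption, is permutation importance: shuffle one variable, re-evaluate the model, and average the loss in performance over several repetitions. In this sketch, permutation_importance_sketch, predict, and accuracy are hypothetical stand-ins, and accuracy replaces the mcc default from the signature to keep the example short.

```julia
using Random, Statistics

# Hypothetical performance measure; the documented default is mcc.
accuracy(yhat, y) = mean(yhat .== y)

# Hypothetical sketch (not SDeMo's code): average loss in accuracy when
# the values of one column of X are shuffled, over `reps` repetitions.
function permutation_importance_sketch(predict, X, y, variable;
                                       reps=10, rng=Random.default_rng())
    baseline = accuracy(predict(X), y)
    losses = map(1:reps) do _
        Xp = copy(X)
        Xp[:, variable] = shuffle(rng, Xp[:, variable])
        baseline - accuracy(predict(Xp), y)
    end
    return mean(losses)
end

# Toy data: the label depends only on the first column
rng = MersenneTwister(42)
X = rand(rng, 200, 2)
y = X[:, 1] .> 0.5
predict(M) = M[:, 1] .> 0.5   # stand-in for a trained model

imp1 = permutation_importance_sketch(predict, X, y, 1; reps=50, rng=rng)
imp2 = permutation_importance_sketch(predict, X, y, 2; reps=50, rng=rng)
```

Because the stand-in model ignores the second column, shuffling it changes nothing, while shuffling the first column degrades the predictions. Estimates from a small number of repetitions are noisy, which is consistent with the warning that the default of 10 is not enough.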
variableimportance(model, folds; kwargs...)
Returns the importance of all variables in the model. The keywords are passed to variableimportance.
Variance Inflation Factor
SDeMo.stepwisevif! Function
stepwisevif!(model::SDM, limit, tr=:; kwargs...)
Drops the variable with the largest variance inflation factor from the model, one at a time, until all VIFs are under the threshold. The last positional argument (defaults to :) gives the indices to use for the VIF calculation. All keyword arguments are passed to train!.
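The elimination loop can be illustrated on a plain matrix of predictors. The sketch below is an assumption-laden stand-in, not SDeMo's implementation: stepwise_vif_sketch is a hypothetical helper that works on column indices rather than on a model, does not retrain anything, and computes VIFs as the diagonal of the inverse correlation matrix.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper (not SDeMo's code): returns the indices of the columns
# of X that remain once every VIF is under `limit`.
function stepwise_vif_sketch(X::AbstractMatrix, limit::Real)
    kept = collect(axes(X, 2))
    while length(kept) > 1
        # VIFs of the variables still in the set
        vifs = diag(inv(cor(X[:, kept])))
        worst = argmax(vifs)
        # Stop when even the largest VIF is under the threshold
        vifs[worst] < limit && break
        deleteat!(kept, worst)
    end
    return kept
end

# Toy predictors: the third column is close to the sum of the first two
X = [1.0 2.0  3.1;
     2.0 1.0  2.9;
     3.0 4.0  7.2;
     4.0 3.0  6.8;
     5.0 6.0 11.1;
     6.0 5.0 10.9]

remaining = stepwise_vif_sketch(X, 5.0)
```

At least one of the three near-collinear columns is dropped here, since the initial VIFs are far above the limit of 5.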
SDeMo.vif Function
vif(::Matrix)
Returns the variance inflation factor for each variable in a matrix, as the diagonal of the inverse of the correlation matrix between predictors.
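The definition above translates directly into a few lines of Julia using only the standard library. This is a minimal sketch of the formula rather than SDeMo's source: vif_sketch is a hypothetical name, and the matrix is assumed to hold observations in rows and predictors in columns.

```julia
using LinearAlgebra, Statistics

# Hypothetical helper (not SDeMo's code): VIF of each column of X, taken as
# the diagonal of the inverse of the correlation matrix between columns.
vif_sketch(X::AbstractMatrix) = diag(inv(cor(X)))

# Two uncorrelated predictors: the correlation matrix is the identity,
# so both VIFs are 1 (no inflation).
U = [ 1.0  0.0;
      0.0  1.0;
     -1.0  0.0;
      0.0 -1.0]
v_uncorrelated = vif_sketch(U)

# Two correlated predictors: both VIFs are larger than 1
C = [1.0 2.0;
     2.0 1.0;
     3.0 4.0;
     4.0 3.0;
     5.0 6.0;
     6.0 5.0]
v_correlated = vif_sketch(C)
```

For two predictors with correlation r, each VIF equals 1 / (1 - r^2), so stronger correlation between predictors means larger values.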
vif(::AbstractSDM, tr=:)
Returns the VIF for the variables used in an SDM, optionally restricting to some training instances (defaults to : for all points). The VIF is calculated on the de-meaned predictors.