
Feature selection and importance

Feature selection

SDeMo.VariableSelectionStrategy Type
julia
VariableSelectionStrategy

This is an abstract type to which all variable selection types belong. Variable selection methods should define a method for variables! whose first argument is a model and whose second argument is a selection strategy. The third and fourth positional arguments are, respectively, a list of variables that must be included and the folds to use for cross-validation. Both can be omitted, in which case no variable is forced into the model and k-fold cross-validation is used.

source
SDeMo.ForwardSelection Type
julia
ForwardSelection

Variables are included one at a time until the performance of the models stops improving.
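
The greedy procedure described above can be sketched in plain Julia. This is a minimal illustration, not SDeMo's implementation: `forward_selection_sketch`, the `score` callback, the `included` keyword, and `toy_score` are all hypothetical names, and the score stands in for whatever cross-validated performance measure the model uses.

```julia
# Greedy forward selection over candidate variable indices.
# `score` is a hypothetical callback returning a performance measure
# (higher is better) for a set of variables; it is an assumption of this
# sketch and not part of SDeMo.
function forward_selection_sketch(score::Function, candidates::Vector{Int};
                                  included::Vector{Int} = Int[])
    selected = copy(included)
    pool = setdiff(candidates, selected)
    best = isempty(selected) ? -Inf : score(selected)
    while !isempty(pool)
        # Score every model obtained by adding one more variable
        scores = [score(vcat(selected, v)) for v in pool]
        top = argmax(scores)
        # Stop as soon as no candidate improves on the current best score
        scores[top] > best || break
        best = scores[top]
        push!(selected, popat!(pool, top))
    end
    return selected
end

# Toy score: variables 2 and 4 are informative, each extra variable has a small cost
toy_score(vars) = 1.0 * (2 in vars) + 0.5 * (4 in vars) - 0.1 * length(vars)

chosen = forward_selection_sketch(toy_score, [1, 2, 3, 4, 5])
```

Backward selection follows the same pattern in reverse: start from all variables and remove the one whose removal improves the score the most, until no removal helps.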

source
SDeMo.BackwardSelection Type
julia
BackwardSelection

Variables are removed one at a time until the performance of the models stops improving.

source
SDeMo.AllVariables Type
julia
AllVariables

All variables in the training dataset are used. Note that this also cross-validates and trains the model.

source
SDeMo.VarianceInflationFactor Type
julia
VarianceInflationFactor{N}

Removes variables one at a time until the largest VIF is lower than N (a floating point number), or the performance of the model stops increasing. Because of this second stopping rule, the retained variables may still have a largest VIF above the threshold. See StrictVarianceInflationFactor for an alternative.

source
SDeMo.StrictVarianceInflationFactor Type
julia
StrictVarianceInflationFactor{N}

Removes variables one at a time until the largest VIF is lower than N (a floating point number). By contrast with VarianceInflationFactor, this approach to variable selection does not cross-validate the model, and might result in a model that is far worse than the one produced by any other variable selection technique.

source

Feature importance

SDeMo.variableimportance Function
julia
variableimportance(model, folds, variable; reps=10, optimality=mcc, kwargs...)

Returns the importance of one variable in the model. The reps keyword sets the number of bootstraps to run (defaults to 10, which is not enough!).

The keywords are passed to ConfusionMatrix.
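
The text above does not say how importance is computed. As an illustration only, the sketch below uses permutation importance, a common generic technique: shuffle one predictor, measure how much a performance metric drops, and average over `reps` repetitions. Whether SDeMo does exactly this is an assumption. Every name here (`permutation_importance_sketch`, `toy_predict`, `accuracy`) is hypothetical, and accuracy replaces the `mcc` default to keep the example short.

```julia
using Random: MersenneTwister, shuffle
using Statistics: mean

# Hypothetical helper: importance of column `j`, measured as the mean drop in
# `metric` after shuffling that column `reps` times. `predict` and `metric`
# are caller-supplied functions and are not part of SDeMo.
function permutation_importance_sketch(predict, metric, X, y, j;
                                       reps = 10, rng = MersenneTwister(1))
    baseline = metric(predict(X), y)
    drops = map(1:reps) do _
        Xp = copy(X)
        Xp[:, j] = shuffle(rng, Xp[:, j])   # break the link between column j and y
        baseline - metric(predict(Xp), y)
    end
    return mean(drops)
end

# Toy classifier that only looks at the first column
toy_predict(X) = X[:, 1] .> 0
accuracy(yhat, y) = mean(yhat .== y)

X = [1.0 5.0; -1.0 3.0; 2.0 -4.0; -2.0 1.0; 3.0 0.0; -3.0 2.0]
y = X[:, 1] .> 0

imp1 = permutation_importance_sketch(toy_predict, accuracy, X, y, 1; reps = 50)
imp2 = permutation_importance_sketch(toy_predict, accuracy, X, y, 2; reps = 50)
```

In this toy setting the second column is never used by the classifier, so shuffling it cannot change the predictions and its importance is zero, while shuffling the first column lowers accuracy.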

source
julia
variableimportance(model, folds; kwargs...)

Returns the importance of all variables in the model. The keywords are passed to variableimportance.

source

Variance Inflation Factor

SDeMo.stepwisevif! Function
julia
stepwisevif!(model::SDM, limit, tr=:; kwargs...)

Drops the variables with the largest variance inflation from the model, until all VIFs are under the threshold. The last positional argument (defaults to :) is the indices to use for the VIF calculation. All keyword arguments are passed to train!.
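
The stepwise removal can be sketched on a plain matrix, without a model or retraining. This is an assumption-laden illustration rather than the package's code: `stepwise_vif_sketch` is a hypothetical name, predictors are assumed to be stored as columns, and the VIF is taken as the diagonal of the inverse correlation matrix.

```julia
using LinearAlgebra: diag
using Statistics: cor

# Hypothetical helper: repeatedly drop the column with the largest VIF until
# every remaining VIF is below `limit`. Returns the indices of retained columns.
function stepwise_vif_sketch(X::AbstractMatrix, limit::Real)
    kept = collect(axes(X, 2))
    while length(kept) > 1
        v = diag(inv(cor(X[:, kept])))   # VIF of each retained column
        worst = argmax(v)
        v[worst] < limit && break        # all VIFs are under the threshold
        deleteat!(kept, worst)
    end
    return kept
end

# Columns 1 and 2 are nearly collinear; column 3 is only weakly related to them
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = x1 .+ [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
x3 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]

kept = stepwise_vif_sketch(hcat(x1, x2, x3), 5.0)
```

Only one of the two collinear columns is dropped; once it is gone, the remaining VIFs fall under the threshold and the loop stops.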

source
SDeMo.vif Function
julia
vif(::Matrix)

Returns the variance inflation factor for each variable in a matrix, as the diagonal of the inverse of the correlation matrix between predictors.
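
The definition above translates almost directly into code using only the standard library. This is a minimal sketch: `vif_sketch` is a hypothetical name chosen to avoid clashing with SDeMo.vif, and it assumes that rows are observations and columns are predictors.

```julia
using LinearAlgebra: diag
using Statistics: cor

# VIF of each column of X, taken as the diagonal of the inverse of the
# correlation matrix between predictors (columns are assumed to be predictors).
vif_sketch(X::AbstractMatrix) = diag(inv(cor(X)))

# Columns 1 and 2 are nearly collinear; column 3 is only weakly related to them
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = x1 .+ [0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
x3 = [1.0, -1.0, -1.0, 1.0, 1.0, -1.0]

v = vif_sketch(hcat(x1, x2, x3))
```

The two collinear columns get very large VIFs, while the third stays close to 1, the lower bound reached when a predictor is uncorrelated with all the others.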

source
julia
vif(::AbstractSDM, tr=:)

Returns the VIF for the variables used in an SDM, optionally restricting to some training instances (defaults to : for all points). The VIF is calculated on the de-meaned predictors.

source