Feature selection and importance
Feature selection
SDeMo.noselection! Function
noselection!(model, folds; verbose::Bool = false, kwargs...)
Returns the model to the state where all variables are used.
All keyword arguments are passed to train!.
SDeMo.backwardselection! Function
backwardselection!(model, folds; verbose::Bool = false, optimality=mcc, kwargs...)
Removes variables one at a time until the optimality measure stops increasing.
All keyword arguments are passed to crossvalidate!.
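The greedy loop described above can be sketched in plain Julia, without SDeMo. This is only an illustration and not the package's implementation: backward_select and toy_score are hypothetical names, and score stands in for a cross-validated optimality measure such as mcc.

```julia
# Hypothetical sketch of greedy backward elimination; not SDeMo's implementation.
# `score(vars)` is an assumed stand-in for the cross-validated optimality measure.
function backward_select(score, vars::Vector{Int})
    current = copy(vars)
    best = score(current)
    while length(current) > 1
        # Score every candidate set obtained by dropping a single variable
        candidates = [filter(!=(v), current) for v in current]
        scores = map(score, candidates)
        top, idx = findmax(scores)
        # Stop as soon as no removal improves the measure
        top > best || break
        current, best = candidates[idx], top
    end
    return current
end

# Toy score: variables 1 and 3 are informative, every variable costs 0.1
toy_score(vars) = count(in((1, 3)), vars) - 0.1 * length(vars)
selected = backward_select(toy_score, [1, 2, 3, 4])
```

With this toy score, the uninformative variables 2 and 4 are removed and the loop stops once dropping either remaining variable would lower the score.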
SDeMo.forwardselection! Function
forwardselection!(model, folds, pool; verbose::Bool = false, optimality=mcc, kwargs...)
Adds variables one at a time until the optimality measure stops increasing. The variables in pool are included in the model before the selection starts.
All keyword arguments are passed to crossvalidate!.
forwardselection!(model, folds; verbose::Bool = false, optimality=mcc, kwargs...)
Adds variables one at a time until the optimality measure stops increasing.
All keyword arguments are passed to crossvalidate!.
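Both forward selection methods follow the same greedy idea, which can be sketched in plain Julia. This is a minimal illustration under stated assumptions, not the package's code: forward_select and toy_score are hypothetical names, score stands in for the cross-validated optimality measure, and the optional pool argument mimics the variables that are included from the start.

```julia
# Hypothetical sketch of greedy forward selection; not SDeMo's implementation.
# `score(vars)` is an assumed stand-in for the cross-validated optimality measure.
function forward_select(score, candidates::Vector{Int}, pool::Vector{Int} = Int[])
    current = copy(pool)                      # variables in `pool` are included from the start
    remaining = setdiff(candidates, current)  # variables that can still be added
    best = isempty(current) ? -Inf : score(current)
    while !isempty(remaining)
        # Score every candidate set obtained by adding a single variable
        scores = [score(vcat(current, v)) for v in remaining]
        top, idx = findmax(scores)
        # Stop as soon as no addition improves the measure
        top > best || break
        push!(current, remaining[idx])
        deleteat!(remaining, idx)
        best = top
    end
    return current
end

# Toy score: variables 1 and 3 are informative, every variable costs 0.1
toy_score(vars) = count(in((1, 3)), vars) - 0.1 * length(vars)
without_pool = forward_select(toy_score, [1, 2, 3, 4])
with_pool = forward_select(toy_score, [1, 2, 3, 4], [2])
```

In this sketch a variable placed in the pool is never removed, so with_pool keeps variable 2 even though it contributes nothing to the toy score.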
Feature importance
SDeMo.variableimportance Function
variableimportance(model, folds, variable; reps=10, optimality=mcc, kwargs...)
Returns the importance of one variable in the model. The reps keyword fixes the number of bootstraps to run (defaults to 10, which is not enough!).
The other keywords are passed to ConfusionMatrix.
variableimportance(model, folds; kwargs...)
Returns the importance of all variables in the model. The keywords are passed to the single-variable method of variableimportance.
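The text above does not spell out how the importance is computed. One common scheme, shown here purely as an assumption and not as SDeMo's method, is permutation importance: shuffle one variable, re-score the predictions, and report the relative loss in the optimality measure averaged over reps repetitions. The names mcc_sketch, permutation_importance, and predict are hypothetical, and the data layout (rows are observations) is also an assumption.

```julia
using Random, Statistics

# Matthews correlation coefficient from boolean predictions and labels
function mcc_sketch(pred::AbstractVector{Bool}, truth::AbstractVector{Bool})
    tp = count(p && t for (p, t) in zip(pred, truth))
    tn = count(!p && !t for (p, t) in zip(pred, truth))
    fp = count(p && !t for (p, t) in zip(pred, truth))
    fn = count(!p && t for (p, t) in zip(pred, truth))
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return denom == 0 ? 0.0 : (tp * tn - fp * fn) / denom
end

# Hypothetical permutation importance: mean relative loss of MCC after
# shuffling one column. Assumes rows are observations and that `predict`
# maps a matrix to boolean predictions.
function permutation_importance(predict, X, y, variable; reps = 10, rng = Random.default_rng())
    baseline = mcc_sketch(predict(X), y)
    losses = Float64[]
    for _ in 1:reps
        Xs = copy(X)
        Xs[:, variable] = shuffle(rng, X[:, variable])
        push!(losses, 1 - mcc_sketch(predict(Xs), y) / baseline)
    end
    return mean(losses)
end

# Toy classifier that only looks at the first column
X = hcat(collect(1.0:20.0), repeat([0.0, 1.0], 10))
y = X[:, 1] .> 10
predict(M) = M[:, 1] .> 10
rng = MersenneTwister(42)
imp1 = permutation_importance(predict, X, y, 1; reps = 50, rng = rng)
imp2 = permutation_importance(predict, X, y, 2; reps = 50, rng = rng)
```

Because the toy classifier ignores the second column, shuffling it leaves the predictions untouched and its importance is zero, while shuffling the first column degrades the score.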
Variance Inflation Factor
SDeMo.stepwisevif! Function
stepwisevif!(model::SDM, limit, tr=:;kwargs...)
Drops the variables with the largest variance inflation factor from the model, until all VIFs are under the threshold given by limit. The last positional argument, tr (defaults to :), gives the indices to use for the VIF calculation. All keyword arguments are passed to train!.
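The stepwise procedure can be illustrated on a plain matrix. This is a sketch under stated assumptions rather than the package's code: stepwise_vif_sketch is a hypothetical name, rows are assumed to be observations and columns predictors, and the VIF is taken as the diagonal of the inverse correlation matrix.

```julia
using LinearAlgebra, Statistics

# Hypothetical sketch of stepwise VIF filtering; not SDeMo's implementation.
# Assumes rows are observations and columns are predictors.
function stepwise_vif_sketch(X::AbstractMatrix, limit::Real)
    keep = collect(axes(X, 2))
    while length(keep) > 1
        vifs = diag(inv(cor(X[:, keep])))
        worst, idx = findmax(vifs)
        # Stop once no remaining VIF exceeds the limit
        worst > limit || break
        deleteat!(keep, idx)   # drop the most inflated variable, then recompute
    end
    return keep
end

# Toy data: the third predictor is nearly the sum of the first two
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
x3 = x1 .+ x2 .+ [0.1, -0.1, 0.2, -0.2, 0.1, -0.1]
kept = stepwise_vif_sketch(hcat(x1, x2, x3), 10.0)
```

The VIFs are recomputed after each removal because dropping one collinear predictor changes the inflation of all the others.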
SDeMo.vif Function
vif(::Matrix)
Returns the variance inflation factor for each variable in a matrix, as the diagonal of the inverse of the correlation matrix between predictors.
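The definition above translates almost directly into Julia's standard libraries. The sketch below is not the package's source: vif_sketch is a hypothetical name, and the layout (rows are observations, columns are predictors) is an assumption that may differ from the one SDeMo uses.

```julia
using LinearAlgebra, Statistics

# VIFs as the diagonal of the inverse of the correlation matrix between predictors.
# Assumes rows are observations and columns are predictors.
vif_sketch(X::AbstractMatrix) = diag(inv(cor(X)))

# Toy data: the third predictor is nearly the sum of the first two
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
x3 = x1 .+ x2 .+ [0.1, -0.1, 0.2, -0.2, 0.1, -0.1]
v = vif_sketch(hcat(x1, x2, x3))
```

Each value is at least 1, and values far above 1 (as for all three columns here) signal that a predictor is almost a linear combination of the others.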
vif(::AbstractSDM, tr=:)
Returns the VIF for the variables used in an SDM, optionally restricting to some training instances (defaults to : for all points). The VIF is calculated on the de-meaned predictors.