Feature selection and importance
Feature selection
SDeMo.noselection! Function
noselection!(model, folds; verbose::Bool = false, kwargs...)
Returns the model to the state where all variables are used.
All keyword arguments are passed to train!.
SDeMo.backwardselection! Function
backwardselection!(model, folds; verbose::Bool = false, optimality=mcc, kwargs...)
Removes variables one at a time until the optimality measure stops increasing.
All keyword arguments are passed to crossvalidate!.
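The greedy loop described above can be sketched in plain Julia. This is only an illustration of the general technique, not SDeMo's implementation: `backward_selection_sketch` and `toy_score` are hypothetical names, and `score` stands in for whatever cross-validated optimality measure (such as mcc) would be computed for a given set of variables.

```julia
# Hypothetical sketch: `score` maps a vector of variable indices to a
# performance measure where higher is better.
function backward_selection_sketch(score, variables::Vector{Int})
    current = copy(variables)
    best = score(current)
    while length(current) > 1
        # Score every set obtained by removing exactly one variable
        candidates = [setdiff(current, [v]) for v in current]
        scores = [score(c) for c in candidates]
        top = argmax(scores)
        # Stop as soon as no removal improves on the current score
        scores[top] > best || break
        current = candidates[top]
        best = scores[top]
    end
    return current
end

# Toy score (an assumption for the example): variables 1 and 3 are
# informative, every other variable carries a small penalty.
toy_score(vars) = 1.0 * (1 in vars) + 1.0 * (3 in vars) -
                  0.1 * count(v -> !(v in (1, 3)), vars)

selected = backward_selection_sketch(toy_score, [1, 2, 3, 4, 5])
```

With this toy score, the three uninformative variables are removed one by one, and the loop stops when removing either remaining variable would lower the score.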
SDeMo.forwardselection! Function
forwardselection!(model, folds, pool; verbose::Bool = false, optimality=mcc, kwargs...)
Adds variables one at a time until the optimality measure stops increasing. The variables in pool are added at the start.
All keyword arguments are passed to crossvalidate!.
forwardselection!(model, folds; verbose::Bool = false, optimality=mcc, kwargs...)
Adds variables one at a time until the optimality measure stops increasing.
All keyword arguments are passed to crossvalidate!.
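The two forward selection methods differ only in whether a pool of variables is included from the start. A minimal sketch of that logic in plain Julia follows. It is not SDeMo's code: `forward_selection_sketch` and `toy_score` are hypothetical names, and `score` is a stand-in for a cross-validated optimality measure.

```julia
# Hypothetical sketch: `score` maps a vector of variable indices to a
# performance measure where higher is better. `pool` holds the variables
# that are included before the greedy search begins.
function forward_selection_sketch(score, candidates::Vector{Int},
                                  pool::Vector{Int} = Int[])
    current = copy(pool)
    remaining = setdiff(candidates, pool)
    # With an empty pool there is no baseline, so any first variable is accepted
    best = isempty(current) ? -Inf : score(current)
    while !isempty(remaining)
        # Score every set obtained by adding exactly one remaining variable
        scores = [score(vcat(current, v)) for v in remaining]
        top = argmax(scores)
        # Stop as soon as no addition improves on the current score
        scores[top] > best || break
        push!(current, remaining[top])
        best = scores[top]
        deleteat!(remaining, top)
    end
    return current
end

# Toy score (an assumption for the example): variables 1 and 3 are
# informative, every other variable carries a small penalty.
toy_score(vars) = 1.0 * (1 in vars) + 1.0 * (3 in vars) -
                  0.1 * count(v -> !(v in (1, 3)), vars)

without_pool = forward_selection_sketch(toy_score, [1, 2, 3, 4, 5])
with_pool = forward_selection_sketch(toy_score, [1, 2, 3, 4, 5], [2])
```

In this toy case the pooled variable 2 stays in the final set even though it lowers the score, because forward selection only ever adds variables.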
Feature importance
SDeMo.variableimportance Function
variableimportance(model, folds, variable; reps=10, optimality=mcc, kwargs...)
Returns the importance of one variable in the model. The reps keyword fixes the number of bootstraps to run (defaults to 10, which is not enough!).
The keywords are passed to ConfusionMatrix.
variableimportance(model, folds; kwargs...)
Returns the importance of all variables in the model. The keywords are passed to variableimportance.
Variance Inflation Factor
SDeMo.stepwisevif! Function
stepwisevif!(model::SDM, limit, tr=:; kwargs...)
Drops the variables with the largest variance inflation from the model, until all VIFs are under the threshold. The last positional argument (defaults to :) is the indices to use for the VIF calculation.
All keyword arguments are passed to train!.
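The stepwise procedure can be illustrated on a plain matrix. The sketch below is an assumption-laden illustration rather than SDeMo's implementation: `vif_sketch` and `stepwise_vif_sketch` are hypothetical helpers, predictors are assumed to be stored as columns, and the function returns the indices of the columns kept instead of modifying a model.

```julia
using LinearAlgebra
using Statistics

# Hypothetical helper: VIF of each column of X (columns are predictors)
vif_sketch(X::AbstractMatrix) = diag(inv(cor(X)))

# Drop the column with the largest VIF until every VIF is at or below `limit`
function stepwise_vif_sketch(X::AbstractMatrix, limit::Real)
    kept = collect(axes(X, 2))
    while length(kept) > 1
        v = vif_sketch(X[:, kept])
        worst = argmax(v)
        v[worst] <= limit && break
        deleteat!(kept, worst)
    end
    return kept
end

# Toy data: x1 and x2 are uncorrelated, x3 is almost exactly x1 + x2
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [3.0, 8.0, 1.0, 6.0, 4.0, 7.0, 2.0, 5.0]
x3 = x1 .+ x2 .+ [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2]
X = hcat(x1, x2, x3)

kept = stepwise_vif_sketch(X, 5.0)
```

Because the third column is nearly a linear combination of the other two, all three VIFs start out very large, and removing a single column is enough to bring the remaining ones under the limit.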
SDeMo.vif Function
vif(::Matrix)
Returns the variance inflation factor for each variable in a matrix, as the diagonal of the inverse of the correlation matrix between predictors.
vif(::AbstractSDM, tr=:)
Returns the VIF for the variables used in a SDM, optionally restricting to some training instances (defaults to : for all points). The VIF is calculated on the de-meaned predictors.
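The definition of the VIF as the diagonal of the inverse of the correlation matrix is short enough to write out with Julia's standard library. This sketch assumes that predictors are stored as columns and observations as rows, which may differ from the layout SDeMo uses internally. `vif_sketch` is a hypothetical name.

```julia
using LinearAlgebra
using Statistics

# VIF of each column of X, as the diagonal of the inverse correlation matrix.
# Pearson correlation already centers each column, so de-meaning the
# predictors beforehand does not change the result.
vif_sketch(X::AbstractMatrix) = diag(inv(cor(X)))

# Toy data: x1 and x2 are uncorrelated, x3 is almost exactly x1 + x2
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [3.0, 8.0, 1.0, 6.0, 4.0, 7.0, 2.0, 5.0]
x3 = x1 .+ x2 .+ [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2]

v_pair = vif_sketch(hcat(x1, x2))      # uncorrelated predictors
v_all = vif_sketch(hcat(x1, x2, x3))   # strongly collinear predictors
```

For the uncorrelated pair both VIFs are 1, the minimum possible value. Adding the nearly redundant third column inflates all three values well beyond commonly used thresholds.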