This function reduces the number of surrogate variables in a forest that was created by the getTreeranger, addLayer, and getSurrogates functions. It can therefore be applied to forests that were used for surrogate minimal depth (SMD) variable importance.
Arguments
- forest
a list containing the elements allvariables and trees: allvariables is a character vector of all variable names in the original data set, and trees is a list of trees that was generated by the getTreeranger, addLayer, and getSurrogates functions.
- s
number of surrogate variables in the new forest (has to be less than the number of surrogate variables in the random forest stored in trees)
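As a rough illustration of the argument layout described above, the following sketch checks the top-level shape of `forest` and the basic validity of `s` before a call to reduce.surrogates. The helper `check_reduce_input` and the object `toy_forest` are hypothetical and not part of the package. The sketch does not inspect the individual trees, so it cannot confirm that `s` is smaller than the number of surrogates already stored.

```r
# Hypothetical helper (not part of the package): sanity checks on the
# arguments described above. Only the top-level layout is inspected.
check_reduce_input <- function(forest, s) {
  stopifnot(
    is.list(forest),
    all(c("allvariables", "trees") %in% names(forest)),
    is.character(forest[["allvariables"]]),
    is.list(forest[["trees"]]),
    is.numeric(s), length(s) == 1, s >= 1, s == round(s)
  )
  invisible(TRUE)
}

# Toy object with the documented top-level layout only. The tree entries are
# empty placeholders, not real output of getTreeranger, addLayer, and
# getSurrogates.
toy_forest <- list(
  allvariables = c("X1", "X2", "X3"),
  trees = list(list(), list())
)
check_reduce_input(toy_forest, s = 1)
```

A check of this kind only catches malformed input early; whether `s` is admissible for a given forest still depends on how many surrogates were computed when the forest was built.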
Examples
# \donttest{
data("SMD_example_data")
###### use result of SMD variable importance and reduce surrogate variables to 10
# select variables with smd variable importance (usually more trees are needed)
set.seed(42)
res <- var.select.smd(
  x = as.data.frame(SMD_example_data[, 2:ncol(SMD_example_data)]),
  y = SMD_example_data[, 1],
  s = 100,
  num.trees = 10,
  num.threads = 1
)
forest.new <- reduce.surrogates(forest = res$forest, s = 10)
# execute SMD on the forest with the reduced number of surrogates
res.new <- var.select.smd(
  forest = forest.new,
  num.threads = 1
)
res.new$var
#> [1] "X2" "X3" "X4" "X5" "X6" "X8" "cp1_1"
#> [8] "cp1_4" "cp1_5" "cp1_6" "cp1_8" "cp1_9" "cp2_2" "cp2_3"
#> [15] "cp2_4" "cp2_5" "cp2_10" "cp3_3" "cp3_4" "cp3_5" "cp3_6"
#> [22] "cp7_1" "cp8_2" "cp8_5" "cgn_4" "cgn_19" "cgn_47" "cgn_49"
#> [29] "cgn_62" "cgn_75" "cgn_121"
# investigate variable relations
rel <- var.relations(
  forest = forest.new,
  variables = c("X1", "X7"),
  candidates = res$forest[["allvariables"]][1:100],
  t = 5,
  num.threads = 1
)
rel$var
#> $X1
#> [1] "cp1_1" "cp1_2" "cp1_3" "cp1_4" "cp1_5" "cp1_6" "cp1_7" "cp1_8"
#> [9] "cp1_9" "cp1_10" "cp8_1"
#>
#> $X7
#> [1] "cp7_1" "cp7_2" "cp7_3" "cp7_4" "cp7_5" "cp7_6" "cp7_7" "cp7_8"
#> [9] "cp7_9" "cp7_10"
#>
# }