StratifiedMedicine (version 0.1.3)

filter_ranger: Filter: Random Forest (ranger) Variable Importance

Description

Filters covariates using random forest (ranger) variable importance with p-values. P-values are obtained from subsampling-based t-statistics, as described in Ishwaran and Lu 2017. By default, variables with p-values >= 0.10 are removed. Can be used for continuous, binary, or survival outcomes.
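The subsampling idea in the description can be sketched roughly as follows. This is an illustration under stated assumptions, not the package's internal code: the simulated data, the use of permutation importance, the variance scaling `(m / n) * var(...)`, and the two-sided normal p-value are all assumptions made for this example.

```r
library(ranger)

set.seed(1)
n <- 300
X <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
Y <- 2 * X$x1 + rnorm(n)            # simulated: only x1 is related to Y
dat <- data.frame(Y = Y, X)

# Variable importance on the full data (permutation importance assumed)
vi_full <- ranger(Y ~ ., data = dat,
                  importance = "permutation")$variable.importance

b <- 0.66
K <- 50                              # fewer than the default 200, for speed
m <- floor(n^b)                      # subsample size n^b

# Variable importance on K subsamples drawn without replacement
vi_sub <- t(sapply(seq_len(K), function(k) {
  idx <- sample(n, size = m)
  ranger(Y ~ ., data = dat[idx, ],
         importance = "permutation")$variable.importance
}))

# Assumed form of the subsampling variance estimate, then
# t-statistics and two-sided normal p-values per covariate
vi_var <- (m / n) * apply(vi_sub, 2, var)
t_stat <- vi_full / sqrt(vi_var)
p_val  <- 2 * pnorm(-abs(t_stat))
p_val
```

Covariates whose p-value falls below the threshold would be the ones retained by a filter of this kind.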

Usage

filter_ranger(Y, A, X, b = 0.66, K = 200, DF2 = FALSE, FDR = FALSE,
  pval.thres = 0.1, family = "gaussian", ...)

Arguments

Y

The outcome variable. Must be numeric or a survival object (e.g., Surv(time, cens)).

A

Treatment variable (a = 1, ..., A).

X

Covariate space.

b

Subsample size exponent; each subsample has size n^b (default=0.66)

K

Number of subsamples (default=200)

DF2

2-DF test statistic (default=FALSE)

FDR

FDR correction for p-values (default=FALSE)

pval.thres

p-value threshold for filtering (default=0.10)

family

Outcome type ("gaussian", "binomial", "survival"), default is "gaussian"

...

Any additional parameters, not currently passed through.
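To make the roles of b, pval.thres, and FDR concrete, here is a small base-R sketch. The p-values are made up for illustration, and the use of `p.adjust(..., method = "fdr")` (Benjamini-Hochberg) is an assumption about what FDR=TRUE corresponds to, not something this page confirms.

```r
# Subsample size implied by b for a data set with n rows
n <- 800
b <- 0.66
floor(n^b)

# Hypothetical p-values for five covariates (illustrative numbers only)
pvals <- c(X1 = 0.001, X2 = 0.03, X3 = 0.08, X4 = 0.20, X5 = 0.60)
pval.thres <- 0.10

# Without correction: variables with p-values >= pval.thres are removed
keep_raw <- names(pvals)[pvals < pval.thres]

# With a Benjamini-Hochberg adjustment (assumed) applied before thresholding
padj <- p.adjust(pvals, method = "fdr")
keep_fdr <- names(padj)[padj < pval.thres]

keep_raw
keep_fdr
```

With these numbers the unadjusted filter keeps X1, X2, and X3, while the adjusted filter keeps only X1 and X2, which shows how an FDR correction can make the filter stricter.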

Value

Filter model and variables that remain after filtering.

  • mod - Filtering model

  • filter.vars - Variables that remain after filtering (could be all)

Examples

library(StratifiedMedicine)

## Continuous ##
dat_ctns = generate_subgrp_data(family="gaussian")
Y = dat_ctns$Y
X = dat_ctns$X
A = dat_ctns$A

mod1 = filter_ranger(Y, A, X, K=200) # Same as default #
mod1$filter.vars
mod1$mod # summary of variable importance outputs