MFKnockoffs (version 0.9.1)

MFKnockoffs.stat.random_forest: Random forest statistics for MFKnockoffs

Description

Computes the difference statistic $$W_j = |Z_j| - |\tilde{Z}_j|$$ where \(Z_j\) and \(\tilde{Z}_j\) are the random forest feature importances of the jth variable and its knockoff, respectively.
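
As a small worked illustration of the formula above, the sketch below applies it to made-up importance scores. The numbers are hypothetical and were not produced by any model. A positive \(W_j\) means the original variable scored higher than its knockoff, and a negative \(W_j\) means the knockoff scored higher.

```r
# Hypothetical importance scores for p = 4 variables (made-up numbers)
Z   <- c(0.75, 0.125, 0.5, 0.25)   # importances of the original variables
Z_k <- c(0.25, 0.25,  0.5, 0.125)  # importances of their knockoffs

# Difference statistic W_j = |Z_j| - |Z~_j|, computed elementwise
W <- abs(Z) - abs(Z_k)
print(W)
```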

Usage

MFKnockoffs.stat.random_forest(X, X_k, y, ...)

Arguments

X

original design matrix (size n-by-p)

X_k

knockoff matrix (size n-by-p)

y

response vector (length n). If a factor, classification is assumed, otherwise regression is assumed.

...

additional arguments specific to 'ranger' (see Details)

Value

A vector of statistics \(W\) (length p)

Details

This function uses the ranger package to compute variable importance measures. The importance of a variable is measured as the total decrease in node impurities from splitting on that variable, averaged over all trees. For regression, the node impurity is measured by residual sum of squares. For classification, it is measured by the Gini index.
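
The computation described above might look roughly like the following sketch. It is not the package's source code: the helper name `rf_importance_difference` is hypothetical, the knockoff matrix is an independent Gaussian stand-in used only to keep the example self-contained, and fitting a single forest on the combined matrix `cbind(X, X_k)` is an assumption about how the importances are obtained.

```r
# Sketch only, under the assumptions stated above; not the package source.

# Hypothetical helper: Z holds 2p importances, originals first, then knockoffs.
rf_importance_difference <- function(Z, p) {
  stopifnot(length(Z) == 2 * p)
  abs(Z[seq_len(p)]) - abs(Z[p + seq_len(p)])
}

if (requireNamespace("ranger", quietly = TRUE)) {
  set.seed(1)
  n <- 200; p <- 10
  X   <- matrix(rnorm(n * p), n)
  X_k <- matrix(rnorm(n * p), n)   # stand-in for a real knockoff matrix
  y   <- 3 * X[, 1] + rnorm(n)     # numeric response, so regression is used

  # One forest on the augmented data; importance = "impurity" requests the
  # impurity-based importance measure.
  df  <- data.frame(y = y, cbind(X, X_k))
  fit <- ranger::ranger(y ~ ., data = df, importance = "impurity")

  Z <- unname(fit$variable.importance)  # length 2p, in column order
  W <- rf_importance_difference(Z, p)
  print(W)
}
```

Separating the importance step from the differencing step makes it straightforward to check the pairing of each variable with its knockoff on a toy vector before running a forest.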

For a complete list of the available additional arguments, see ranger.

See Also

Other statistics for knockoffs: MFKnockoffs.stat.forward_selection, MFKnockoffs.stat.glmnet_coef_difference, MFKnockoffs.stat.glmnet_lambda_difference, MFKnockoffs.stat.lasso_coef_difference_bin, MFKnockoffs.stat.lasso_coef_difference, MFKnockoffs.stat.lasso_lambda_difference_bin, MFKnockoffs.stat.lasso_lambda_difference, MFKnockoffs.stat.sqrt_lasso, MFKnockoffs.stat.stability_selection

Examples

library(MFKnockoffs)

# Simulate a sparse linear model: n observations, p variables, k true signals
p = 100; n = 200; k = 15
mu = rep(0, p); Sigma = diag(p)
X = matrix(rnorm(n * p), n)
nonzero = sample(p, k)
beta = 3.5 * (1:p %in% nonzero)
y = X %*% beta + rnorm(n)

# Knockoff generator for Gaussian variables with known mean and covariance
knockoffs = function(X) MFKnockoffs.create.gaussian(X, mu, Sigma)

# Basic usage with default arguments
result = MFKnockoffs.filter(X, y, knockoffs=knockoffs,
                            statistic=MFKnockoffs.stat.random_forest)
print(result$selected)

# Advanced usage with custom arguments passed through to ranger
# (min.node.size is ranger's argument for the minimal node size)
foo = MFKnockoffs.stat.random_forest
k_stat = function(X, X_k, y) foo(X, X_k, y, min.node.size=5)
result = MFKnockoffs.filter(X, y, knockoffs=knockoffs, statistic=k_stat)
print(result$selected)
