Learn R Programming

MFSIS (version 0.3.0)

CSIS: Model-Free Feature screening Based on Concordance Index Statistic

Description

A model-free and data-adaptive feature screening method for ultrahigh-dimensional data and even survival data. The proposed method is based on the concordance index which measures concordance between random vectors even if one of the vectors is a survival object Surv. This rank correlation based method does not require specifying a regression model, and applies robustly to data in the presence of censoring and heavy tails. It enjoys both sure screening and rank consistency properties under weak assumptions.

Usage

CSIS(X, Y, nsis = (dim(X)[1])/log(dim(X)[1]))

Value

the labels of first nsis largest active set of all predictors

Arguments

X

The design matrix of dimensions n * p. Each row is an observation vector.

Y

The response vector of dimension n * 1. For survival models, Y should be an object of class Surv, as provided by the function Surv() in the package survival.

nsis

Number of predictors recruited by CSIS. The default is n/log(n).

Author

Xuewei Cheng xwcheng@hunnu.edu.cn

References

Cheng X, Li G, Wang H. The concordance filter: an adaptive model-free feature screening procedure[J]. Computational Statistics, 2023: 1-24.

Examples

Run this code

## Scenario 1  generate complete data
n <- 100
p <- 200
rho <- 0.5
data <- GendataLM(n, p, rho, error = "gaussian")
data <- cbind(data[[1]], data[[2]])
colnames(data)[1:ncol(data)] <- c(paste0("X", 1:(ncol(data) - 1)), "Y")
data <- as.matrix(data)
X <- data[, 1:(ncol(data) - 1)]
Y <- data[, ncol(data)]
A1 <- CSIS(X, Y, n / log(n))
A1

## Scenario 2  generate survival data
library(survival)
n <- 100
p <- 200
rho <- 0.5
data <- GendataCox(n, p, rho)
data <- cbind(data[[1]], data[[2]], data[[3]])
colnames(data)[ncol(data)] <- c("status")
colnames(data)[(ncol(data) - 1)] <- c("time")
colnames(data)[(1:(ncol(data) - 2))] <- c(paste0("X", 1:(ncol(data) - 2)))
data <- as.matrix(data)
X <- data[, 1:(ncol(data) - 2)]
Y <- Surv(data[, (ncol(data) - 1)], data[, ncol(data)])
A2 <- CSIS(X, Y, n / log(n))
A2

Run the code above in your browser using DataLab