xLLiM (version 2.3)

preprocess_data: A proposed function to preprocess high-dimensional data before running gllim, sllim or bllim

Description

The goal of preprocess_data() is to get relevant clusters for G-, S-, or BLLiM initialization, coupled with a feature selection for high-dimensional datasets. This function is an alternative to the default initialization implemented in gllim(), sllim() and bllim().

In this function, clusters are initialized with K-means, and variable selection is performed with a LASSO (glmnet) within each cluster. The features selected in each cluster are then merged to obtain a subset of variables before running any prediction method of xLLiM.
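
The three steps described above (K-means clustering, a LASSO within each cluster, merging the selected indexes) can be illustrated with a minimal sketch. It is not the package's implementation: sketch_preprocess is a hypothetical helper, the simulated data are made up, and the use of cv.glmnet with lambda.min to choose the penalty is an assumption. The example only runs when glmnet is installed.

```r
# Hypothetical helper, NOT part of xLLiM: a sketch of the idea only.
sketch_preprocess <- function(tapp, yapp, in_K, ...) {
  # Subjects are in columns, so transpose before clustering on (t, y) jointly.
  km <- kmeans(t(rbind(tapp, yapp)), centers = in_K)
  selected <- integer(0)
  for (k in seq_len(in_K)) {
    idx <- which(km$cluster == k)
    x_k <- t(yapp[, idx, drop = FALSE])  # N_k x D covariates
    t_k <- t(tapp[, idx, drop = FALSE])  # N_k x L responses
    # Assumption: multi-response LASSO when L > 1, plain LASSO otherwise.
    fam <- if (ncol(t_k) > 1) "mgaussian" else "gaussian"
    cv <- glmnet::cv.glmnet(x_k, t_k, family = fam, ...)
    beta <- coef(cv, s = "lambda.min")
    if (!is.list(beta)) beta <- list(beta)
    for (b in beta) {
      # Drop the intercept (first row) and keep the nonzero coefficients.
      nz <- which(as.matrix(b)[-1, 1] != 0)
      selected <- union(selected, unname(nz))
    }
  }
  list(selected.variables = sort(selected), clusters = km$cluster)
}

if (requireNamespace("glmnet", quietly = TRUE)) {
  suppressPackageStartupMessages(library(glmnet))
  set.seed(42)
  N <- 120; D <- 30
  g <- rep(1:2, each = N / 2)                # simulated groups
  yapp <- matrix(rnorm(D * N), D, N)
  yapp[, g == 2] <- yapp[, g == 2] + 3       # shift group 2 to separate it
  tapp <- rbind(
    ifelse(g == 1, 3 * yapp[1, ], 3 * yapp[3, ]) + rnorm(N, sd = 0.5),
    ifelse(g == 1, -2 * yapp[2, ], -2 * yapp[4, ]) + rnorm(N, sd = 0.5)
  )
  res <- sketch_preprocess(tapp, yapp, in_K = 2)
  print(res$selected.variables)
  print(table(res$clusters))
}
```

Taking the union over clusters keeps any covariate that is useful in at least one cluster, which matches the "selected within clusters, then merged" description.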

Usage

preprocess_data(tapp,yapp,in_K,...)

Value

selected.variables

Vector of the indexes of the selected variables. Selection is performed within each cluster, and the per-cluster selections are then merged.

clusters

Initial clusters obtained with k-means
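
As an illustration of how the returned indexes might be used, the sketch below subsets the rows of a covariate matrix. The list res is a made-up stand-in for the function's return value, not real output, and the matrix is random data.

```r
set.seed(1)
yapp <- matrix(rnorm(10 * 6), nrow = 10, ncol = 6)  # D = 10 covariates, N = 6 subjects

# Stand-in for a result; the index values are invented for illustration.
res <- list(selected.variables = c(2L, 5L, 9L))

# Variables are in rows, so the selection applies to rows.
# drop = FALSE keeps a matrix even if a single variable is selected.
yapp_sel <- yapp[res$selected.variables, , drop = FALSE]
dim(yapp_sel)
```

The reduced matrix keeps the D x N orientation, so it can stand in for the full covariate matrix in later steps.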

Arguments

tapp

An L x N matrix of training responses with variables in rows and subjects in columns

yapp

A D x N matrix of training covariates with variables in rows and subjects in columns
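
Many R data sets store subjects in rows, which is the transpose of the layout expected here. A minimal sketch of the conversion, using a made-up data frame whose column names are purely illustrative:

```r
set.seed(1)
N <- 8
# Toy data with subjects in rows (the common data-frame layout).
df <- data.frame(
  resp1 = rnorm(N), resp2 = rnorm(N),
  cov1 = rnorm(N), cov2 = rnorm(N), cov3 = rnorm(N)
)

# Transpose so that variables are in rows and subjects in columns.
tapp <- t(as.matrix(df[, c("resp1", "resp2")]))         # L x N, here 2 x 8
yapp <- t(as.matrix(df[, c("cov1", "cov2", "cov3")]))   # D x N, here 3 x 8

# Both matrices must describe the same N subjects, in the same order.
stopifnot(ncol(tapp) == ncol(yapp))
```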

in_K

Initial number of components or number of clusters

...

Additional arguments passed on to glmnet

Author

Emeline Perthame (emeline.perthame@pasteur.fr), Emilie Devijver (emilie.devijver@kuleuven.be), Melina Gallopin (melina.gallopin@u-psud.fr)

References

[1] E. Devijver, M. Gallopin, E. Perthame. Nonlinear network-based quantitative trait prediction from transcriptomic data. Submitted, 2017, available at https://arxiv.org/abs/1701.07899.

See Also

xLLiM-package, glmnet-package, kmeans

Examples

x <- 1