preprocess_data

The goal of <code>preprocess_data()</code> is to obtain relevant clusters for initializing GLLiM, SLLiM or BLLiM, coupled with feature selection for high-dimensional datasets. This function is an alternative to the default initialization implemented in <code>gllim()</code>, <code>sllim()</code> and <code>bllim()</code>.
In this function, clusters are initialized with K-means, and variable selection is performed with a LASSO (<code>glmnet</code>) within each cluster. The features selected in each cluster are then merged into a single subset of variables, which can be used before running any prediction method of xLLiM.
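The two steps described above (K-means clustering, then a LASSO fit within each cluster, then merging the selections) can be sketched as follows. This is an illustrative sketch only, not the package's source code: the function name `sketch_preprocess`, the choice to cluster on responses and covariates jointly, the use of `cv.glmnet` with `lambda.1se`, and the names of the returned list elements are all assumptions made for the example.

```r
library(glmnet)

# Hypothetical helper (not part of xLLiM): K-means on subjects, then a
# cross-validated LASSO inside each cluster, then the union of selections.
sketch_preprocess <- function(tapp, yapp, in_K, ...) {
  # Subjects are in columns, so transpose before clustering.
  # Assumption: responses and covariates are clustered jointly.
  km <- kmeans(t(rbind(tapp, yapp)), centers = in_K, nstart = 10)
  selected <- integer(0)
  for (k in seq_len(in_K)) {
    idx <- which(km$cluster == k)
    x <- t(yapp[, idx, drop = FALSE])      # N_k x D covariates
    y <- t(tapp[, idx, drop = FALSE])      # N_k x L responses
    if (ncol(y) == 1) y <- drop(y)         # single response: plain vector
    fam <- if (is.matrix(y)) "mgaussian" else "gaussian"
    fit <- cv.glmnet(x, y, family = fam, ...)
    beta <- coef(fit, s = "lambda.1se")
    if (!is.list(beta)) beta <- list(beta) # mgaussian returns one matrix per response
    # Drop the intercept row; keep covariates with a non-zero coefficient
    nz <- unlist(lapply(beta, function(b) unname(which(as.matrix(b)[-1, 1] != 0))))
    selected <- union(selected, nz)
  }
  list(selected = sort(selected), clusters = km$cluster)
}

# Simulated data: two well-separated groups of subjects, D = 30 covariates,
# and one response that depends on covariates 1 and 2 only.
set.seed(1)
N <- 200; D <- 30
yapp <- matrix(rnorm(D * N), nrow = D, ncol = N)
yapp[, 1:100] <- yapp[, 1:100] + 4
tapp <- matrix(2 * yapp[1, ] - 3 * yapp[2, ] + rnorm(N, sd = 0.5), nrow = 1)

res <- sketch_preprocess(tapp, yapp, in_K = 2)
res$selected        # indices of covariates kept in at least one cluster
table(res$clusters) # cluster sizes found by K-means
```

Taking the union across clusters keeps any covariate that is useful in at least one local regression, which matches the idea of merging the per-cluster selections into a single subset.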

Provides a tool for nonlinear mapping (nonlinear regression) using a mixture of regressions model and an inverse regression strategy. The methods include the GLLiM model (see Deleforge et al. (2015) <DOI:10.1007/s11222-014-9461-5>), based on Gaussian mixtures, and a robust version of GLLiM named SLLiM (see Perthame et al. (2016) <DOI:10.1016/j.jmva.2017.09.009>), based on a mixture of generalized Student distributions. The methods also include BLLiM (see Devijver et al. (2017) <arXiv:1701.07899>), an extension of GLLiM with a sparse block-diagonal structure for large covariance matrices (particularly useful for transcriptomic data).

Emeline Perthame

xLLiM

High Dimensional Locally-Linear Mapping

preprocess_data function

<dl>
<dt>tapp</dt>
<dd>An <code>L x N</code> matrix of training responses, with variables in rows and subjects in columns</dd>
<dt>yapp</dt>
<dd>A <code>D x N</code> matrix of training covariates, with variables in rows and subjects in columns</dd>
<dt>in_K</dt>
<dd>Initial number of components, that is, the number of clusters</dd>
<dt>...</dt>
<dd>Other arguments, which are passed on to <code>glmnet</code></dd>
</dl>
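Because both matrices store variables in rows and subjects in columns, data held in the usual subjects-by-variables layout has to be transposed first. The sketch below shows one way to build `tapp` and `yapp` from a data frame. The simulated data and the column names (`resp1`, `resp2`, `gene1`, ...) are made up for illustration, and the final call is left commented out since it requires the xLLiM package.

```r
# Simulated data frame with one row per subject (hypothetical column names)
set.seed(42)
n_subjects <- 100
dat <- data.frame(matrix(rnorm(n_subjects * 502), nrow = n_subjects))
names(dat) <- c("resp1", "resp2", paste0("gene", 1:500))

# Transpose so that variables are in rows and subjects in columns
tapp <- t(as.matrix(dat[, c("resp1", "resp2")]))  # L x N = 2 x 100
yapp <- t(as.matrix(dat[, -(1:2)]))               # D x N = 500 x 100

# Both matrices must describe the same N subjects, in the same order
stopifnot(ncol(tapp) == ncol(yapp))

# With xLLiM installed, the call would then look like:
# res <- xLLiM::preprocess_data(tapp, yapp, in_K = 2)
```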

Arguments

Emeline Perthame (emeline.perthame@pasteur.fr), Emilie Devijver (emilie.devijver@kuleuven.be), Melina Gallopin (melina.gallopin@u-psud.fr)

Author

A proposition of function to process high dimensional data before running gllim, sllim or bllim — preprocess_data


