Split

Generates a list of <code>length(tau)</code> non-overlapping sets of observation
IDs.
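The splitting described above can be illustrated with a short base-R sketch. This is not the sharp implementation. The helper `split_ids_sketch()` is hypothetical, and it assumes that the proportions in `tau` are positive and sum to 1. The observation IDs are randomly permuted, and consecutive chunks of sizes close to `tau * n` form the non-overlapping sets.

```r
# Hypothetical helper (not part of sharp): split the IDs 1..n into
# length(tau) non-overlapping sets of sizes approximately tau * n.
# Assumption: tau contains positive proportions that sum to 1.
split_ids_sketch <- function(n, tau) {
  stopifnot(all(tau > 0), abs(sum(tau) - 1) < 1e-8)
  # Random permutation of the observation IDs
  perm <- sample.int(n)
  # Cumulative cut points; the last one is forced to n so every ID is assigned
  ends <- round(cumsum(tau) * n)
  ends[length(ends)] <- n
  starts <- c(1, head(ends, -1) + 1)
  # Consecutive chunks of the permutation form the disjoint sets
  lapply(seq_along(tau), function(k) {
    if (ends[k] < starts[k]) integer(0) else sort(perm[starts[k]:ends[k]])
  })
}

set.seed(1)
ids <- split_ids_sketch(n = 20, tau = c(0.5, 0.25, 0.25))
lengths(ids)   # 10 5 5
```

With `n = 20` and `tau = c(0.5, 0.25, 0.25)`, the three sets contain 10, 5 and 5 IDs and together cover each of the 20 observations exactly once.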

In stability selection (N Meinshausen, P Bühlmann (2010) <doi:10.1111/j.1467-9868.2010.00740.x>) and consensus clustering (S Monti et al (2003) <doi:10.1023/A:1023949509487>), resampling techniques are used to enhance the reliability of the results. In this package (B Bodinier et al (2025) <doi:10.18637/jss.v112.i05>), hyper-parameters are calibrated by maximising model stability, which is measured under the null hypothesis that all selection (or co-membership) probabilities are identical (B Bodinier et al (2023a) <doi:10.1093/jrsssc/qlad058> and B Bodinier et al (2023b) <doi:10.1093/bioinformatics/btad635>). Functions are readily implemented for LASSO regression, sparse PCA, sparse (group) PLS or graphical LASSO in stability selection, and for hierarchical clustering, partitioning around medoids, K means or Gaussian mixture models in consensus clustering.

Barbara Bodinier

sharp

Stability-enHanced Approaches using Resampling Procedures

Split function

<dl><dt>data</dt>
<dd>vector or matrix of data. In regression, this should be the
outcome data.</dd>
<dt>family</dt>
<dd>type of regression model. This argument is defined as in
<code>glmnet</code>. Possible values include <code>"gaussian"</code>
(linear regression), <code>"binomial"</code> (logistic regression),
<code>"multinomial"</code> (multinomial regression), and <code>"cox"</code> (survival
analysis).</dd>
<dt>tau</dt>
<dd>vector of the proportions of observations in each of the sets.</dd></dl>
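This page does not state how `family` affects the split. A plausible use, assumed here rather than documented, is stratification: with a categorical outcome (for example `"binomial"`), each set would keep roughly the class proportions of the full sample. The hypothetical helper `split_ids_stratified()` below sketches that idea in base R by splitting the IDs of each class separately. It is not the sharp code, and it assumes that `tau` contains positive proportions summing to 1.

```r
# Hypothetical helper (not part of sharp): stratified split of the IDs
# 1..length(y) into length(tau) non-overlapping sets.
# Assumption: tau contains positive proportions that sum to 1.
split_ids_stratified <- function(y, tau) {
  stopifnot(all(tau > 0), abs(sum(tau) - 1) < 1e-8)
  sets <- rep(list(integer(0)), length(tau))
  # Split the IDs of each class separately so that every set
  # receives approximately the same share of each class
  for (members in split(seq_along(y), y)) {
    m <- length(members)
    perm <- members[sample.int(m)]
    ends <- round(cumsum(tau) * m)
    ends[length(ends)] <- m
    starts <- c(1, head(ends, -1) + 1)
    for (k in seq_along(tau)) {
      if (ends[k] >= starts[k]) {
        sets[[k]] <- c(sets[[k]], perm[starts[k]:ends[k]])
      }
    }
  }
  lapply(sets, sort)
}

set.seed(1)
y <- rep(c(0, 1), times = c(80, 20))   # 20% of observations in class 1
ids <- split_ids_stratified(y, tau = c(0.5, 0.25, 0.25))
lengths(ids)                            # 50 25 25
sapply(ids, function(s) mean(y[s]))     # 0.2 0.2 0.2
```

Permuting with `members[sample.int(m)]` rather than `sample(members)` avoids the base-R pitfall where a length-one numeric vector `x` passed to `sample(x)` is treated as `1:x`.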


Splitting observations into non-overlapping sets — Split

