split_data

Split Data into Training and Test Sets

'GeneSelectR' is an R package for comprehensive feature selection in bulk RNAseq datasets. It integrates the 'Python' 'scikit-learn' (<https://scikit-learn.org/stable/index.html>) machine learning framework with R-based bioinformatics tools. 'GeneSelectR' performs machine learning (ML)-driven feature selection and evaluates the selected features with 'Gene Ontology' (GO) enrichment analysis as described by Thomas PD et al. (2022) <doi:10.1002/pro.4218>, using 'clusterProfiler' (Wu et al., 2021) <doi:10.1016/j.xinn.2021.100141>, and with semantic similarity analysis powered by 'simplifyEnrichment' (Gu, Huebschmann, 2021) <doi:10.1016/j.gpb.2022.04.008>. Combining these methods lets feature lists from complex RNAseq datasets be assessed both computationally and biologically.

Damir Zhakparov

GeneSelectR

'GeneSelectR' - Comprehensive Feature Selection Workflow for
Bulk RNAseq Datasets

split_data function


Arguments


<dl>

<dt>X</dt>
<dd>A dataframe or matrix of predictors.</dd>


<dt>y</dt>
<dd>A vector of outcomes.</dd>


<dt>test_size</dt>
<dd>Proportion of the data to be used as the test set.</dd>


<dt>modules</dt>
<dd>A list containing the definitions for the Python modules and submodules.</dd>

</dl>
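
The arguments above can be illustrated with a minimal base-R sketch of a random train/test split. This is not the package's implementation: judging by the `modules` argument, `split_data` delegates to Python modules, whereas the helper below (`split_data_sketch`, a hypothetical name) uses only base R. The `seed` argument and the names of the returned list elements are likewise assumptions made for illustration.

```r
# Hypothetical base-R helper; NOT GeneSelectR's split_data().
# It mirrors the X, y and test_size arguments; `seed` is an added assumption.
split_data_sketch <- function(X, y, test_size = 0.2, seed = NULL) {
  stopifnot(nrow(X) == length(y), test_size > 0, test_size < 1)
  if (!is.null(seed)) set.seed(seed)

  n <- nrow(X)
  n_test <- ceiling(n * test_size)   # number of samples held out for testing
  test_idx <- sample.int(n, n_test)  # random row indices for the test set

  # Element names are assumed for illustration only
  list(
    X_train = X[-test_idx, , drop = FALSE],
    X_test  = X[test_idx, , drop = FALSE],
    y_train = y[-test_idx],
    y_test  = y[test_idx]
  )
}

# Toy data: 10 samples, 3 features
X <- matrix(seq_len(30), nrow = 10,
            dimnames = list(NULL, c("g1", "g2", "g3")))
y <- rep(c("A", "B"), each = 5)

parts <- split_data_sketch(X, y, test_size = 0.2, seed = 1)
dim(parts$X_train)
dim(parts$X_test)
```

With `test_size = 0.2` and 10 samples, two rows are held out for testing and the remaining eight are used for training. The sketch does not stratify by outcome, so small or imbalanced datasets may end up with a class missing from one of the sets.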
