calculate_permutation_feature_importance

This function calculates permutation feature importance for a Scikit-learn
pipeline with a trained classifier as the final step.
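As an illustration of the idea, here is a minimal Python sketch of permutation feature importance on a fitted Scikit-learn pipeline, using `sklearn.inspection.permutation_importance`. It is not taken from the GeneSelectR source: the synthetic data, the `gene_*` column names, the pipeline steps, and the layout of the result table are all assumptions made for the example.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for an expression matrix (samples x genes); names are hypothetical.
X, y = make_classification(n_samples=120, n_features=8, n_informative=3, random_state=0)
X_train = pd.DataFrame(X, columns=[f"gene_{i}" for i in range(X.shape[1])])
y_train = pd.Series(y, name="label")

# A pipeline whose final step is a classifier, fitted before importance is computed.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])
pipeline.fit(X_train, y_train)

# Shuffle each feature n_repeats times and record the drop in the pipeline's score.
result = permutation_importance(
    pipeline, X_train, y_train, n_repeats=10, random_state=42, n_jobs=1
)

# Collect mean and standard deviation of the score drop per feature (assumed layout).
importances = pd.DataFrame({
    "feature": X_train.columns,
    "importance_mean": result.importances_mean,
    "importance_std": result.importances_std,
}).sort_values("importance_mean", ascending=False).reset_index(drop=True)
print(importances)
```

Features whose shuffling lowers the score the most end up at the top of the table; values near zero suggest the fitted pipeline barely relies on that feature.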

'GeneSelectR' is an R package for feature selection in bulk RNAseq datasets. It integrates the 'Python' 'scikit-learn' (<https://scikit-learn.org/stable/index.html>) machine learning framework with R-based bioinformatics tools. 'GeneSelectR' performs machine learning (ML)-driven feature selection and combines it with 'Gene Ontology' (GO) enrichment analysis as described by Thomas PD et al. (2022) <doi:10.1002/pro.4218>, using 'clusterProfiler' (Wu et al., 2021) <doi:10.1016/j.xinn.2021.100141>, and with semantic similarity analysis powered by 'simplifyEnrichment' (Gu, Huebschmann, 2021) <doi:10.1016/j.gpb.2022.04.008>. Together these methods give both a computational and a biological view of complex RNAseq datasets.

Damir Zhakparov

GeneSelectR

'GeneSelectR' - Comprehensive Feature Selection Workflow for
Bulk RNAseq Datasets

calculate_permutation_feature_importance function

<dl>
<dt>pipeline</dt>
<dd>A Scikit-learn pipeline object with a trained classifier as the final step.</dd>
<dt>X_train</dt>
<dd>A DataFrame containing the training data.</dd>
<dt>y_train</dt>
<dd>A DataFrame containing the training labels.</dd>
<dt>n_repeats</dt>
<dd>An integer specifying the number of times to permute each feature.</dd>
<dt>random_state</dt>
<dd>An integer specifying the seed for the random number generator.</dd>
<dt>njobs</dt>
<dd>An integer specifying the number of cores to use. Set by the master GeneSelectR function.</dd>
<dt>pipeline_name</dt>
<dd>Strings (the names of the selected_pipelines list) identifying the pipelines that were constructed for feature selection.</dd>
<dt>iter</dt>
<dd>An integer indicating the current iteration of the train-test split.</dd>
</dl>
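To make the roles of `n_repeats` and `random_state` concrete, the following Python sketch spells out the permutation loop by hand. It is a simplified illustration, not the package's code: `manual_permutation_importance` is a hypothetical helper, the data are synthetic, and NumPy arrays are used instead of DataFrames for brevity.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def manual_permutation_importance(pipeline, X, y, n_repeats=5, random_state=0):
    """Hypothetical helper: mean and std of the score drop per shuffled column."""
    rng = np.random.default_rng(random_state)  # the seed makes the shuffles reproducible
    baseline = pipeline.score(X, y)            # score on the unpermuted data
    drops = np.zeros((X.shape[1], n_repeats))
    for j in range(X.shape[1]):                # one feature at a time
        for r in range(n_repeats):             # repeat the shuffle n_repeats times
            X_perm = X.copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])
            drops[j, r] = baseline - pipeline.score(X_perm, y)
    return drops.mean(axis=1), drops.std(axis=1)


# Synthetic data and an already fitted pipeline (assumed setup for the example).
X, y = make_classification(
    n_samples=100, n_features=5, n_informative=2, n_redundant=0, random_state=1
)
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
]).fit(X, y)

mean_a, std_a = manual_permutation_importance(pipe, X, y, n_repeats=5, random_state=7)
mean_b, _ = manual_permutation_importance(pipe, X, y, n_repeats=5, random_state=7)
```

Calling the helper twice with the same seed gives identical importances, which is why a fixed `random_state` matters when results are compared across pipelines or train-test splits. A larger `n_repeats` averages over more shuffles and yields a more stable estimate at a proportional cost in run time.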


calculate_permutation_feature_importance: Calculate Permutation Feature Importance
