idiffomix (version 1.0.0)

data_transformation: The function to filter, normalize and transform RNA-Seq and methylation data.

Description

The raw RNA-Seq and methylation data need to be filtered, normalized and transformed before the idiffomix method is applied.

Usage

data_transformation(seq_data, meth_data, gene_chr, N = 5)

Value

The function returns a list with two dataframes containing the transformed gene expression and methylation array data:

  • seq_transformed - A dataframe containing the log-fold change for gene expression data.

  • meth_transformed - A dataframe containing the differences in M-values for methylation data.

Arguments

seq_data

A dataframe of dimension \(G \times (N+1)\) containing raw RNA-Seq data for all \(G\) genes and \(N\) patients.

meth_data

A dataframe of dimension \(C \times (N+2)\) containing beta methylation values for all \(C\) CpG sites and \(N\) patients, along with the associated genes for each CpG site.

gene_chr

A dataframe containing the genes and their corresponding chromosome number.

N

Number of patients. Defaults to 5.

Details

The RNA-Seq data consist of raw counts depicting the gene expression levels. To ensure data quality, only genes whose sum of expression counts across both biological conditions is greater than 5 are retained. The data are then normalized to account for differences in library sizes. The normalized counts are used to obtain CPM values, which are log-transformed to obtain log-CPM values. Given the paired design of the motivating setting, the log-fold change between the tumour and benign samples is calculated for each gene in every patient and used in the subsequent analyses. For the methylation array data, the beta values at the CpG sites are logit-transformed to M-values. As with the RNA-Seq data, given the paired design, the difference in M-values between the tumour and benign samples is calculated for each CpG site in every patient and used in the subsequent analyses.
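The steps above can be sketched in base R on toy data. This is only an illustration and not the package's implementation. The object names, the column layout (`_B` for benign, `_T` for tumour), the plain library-size scaling, the offset of 1 before taking logs and the base-2 logit are all assumptions made for the example. The package may use a different normalization method and a different input layout.

```r
# Toy paired data (hypothetical values): 3 genes, 2 patients,
# each patient with a benign (_B) and a tumour (_T) sample.
samples <- c("P1_B", "P1_T", "P2_B", "P2_T")
counts <- matrix(c(100, 200, 80, 160,
                     0,   1,  1,   0,
                    50,  50, 40,  40),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB", "geneC"), samples))

# 1. Filter: keep genes whose total count across all samples exceeds 5.
kept <- counts[rowSums(counts) > 5, , drop = FALSE]

# 2. Normalize: counts per million, scaling by each sample's library size
#    (assumption: simple scaling, not a dedicated normalization method).
lib_size <- colSums(kept)
cpm <- t(t(kept) / lib_size) * 1e6

# 3. Transform: log2-CPM with an offset of 1 (assumption) to avoid log(0).
log_cpm <- log2(cpm + 1)

# 4. Paired log-fold change: tumour minus benign, per gene and patient.
seq_lfc <- log_cpm[, c("P1_T", "P2_T"), drop = FALSE] -
  log_cpm[, c("P1_B", "P2_B"), drop = FALSE]

# Methylation: toy beta values for 2 CpG sites in the same samples.
beta <- matrix(c(0.2, 0.8, 0.5, 0.5,
                 0.5, 0.5, 0.8, 0.2),
               nrow = 2, byrow = TRUE,
               dimnames = list(c("cpg1", "cpg2"), samples))

# 5. Logit transform to M-values (assumption: base-2 logit).
m_values <- log2(beta / (1 - beta))

# 6. Paired difference in M-values: tumour minus benign.
meth_diff <- m_values[, c("P1_T", "P2_T"), drop = FALSE] -
  m_values[, c("P1_B", "P2_B"), drop = FALSE]
```

In this sketch `seq_lfc` and `meth_diff` play the roles of `seq_transformed` and `meth_transformed`, each with one row per retained gene or CpG site and one column per patient.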

Examples

library(idiffomix)

# Number of patients in the example data
N <- 4
data_output <- data_transformation(seq_data = gene_expression_data,
                                   meth_data = methylation_data,
                                   gene_chr = gene_chromosome_data,
                                   N = N)