read_epigenomic_data:
Reading epigenomic data from epimark file
Description
Reads an epimark file from disk and returns a data.frame. The
returned data.frame holds the epigenomic data and serves as the
input for model training, prediction and the steps that follow.
The epimark file is a tab-separated file with a header. The first
four columns are "chr", "start", "end" and "id", specifying the
chromosome, start position, end position and id of each region.
Each remaining column contains the values of one epigenetic mark in
one sample (condition, cell or tissue type, etc.), and its column
name follows the "MARK_SAMPLE" format, such as "H3K4me1_mESC".
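The file layout described above can be illustrated with a small base R sketch. It does not call read_epigenomic_data itself. The region coordinates, intensity values, the "heart" sample and the helper names (epimark_path, col_info) are invented for illustration, and the name splitting assumes that mark names contain no underscore.

```r
# Toy epimark table; every value and sample name here is made up.
epimark <- data.frame(
  chr = c("chr1", "chr1"),
  start = c(1000L, 3000L),
  end = c(2000L, 4000L),
  id = c("region_1", "region_2"),
  H3K4me1_mESC = c(1.2, 0.4),
  H3K27ac_mESC = c(0.8, 0.1),
  H3K4me1_heart = c(0.3, 0.9),
  stringsAsFactors = FALSE
)

# Write it as a tab-separated file with a header, as the format requires.
epimark_path <- tempfile(fileext = ".tsv")
write.table(epimark, epimark_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Read it back with base R to inspect the layout.
loaded <- read.delim(epimark_path, stringsAsFactors = FALSE)

# Columns after the first four follow "MARK_SAMPLE"; split them into parts.
# Assumption: the mark name itself contains no underscore.
value_cols <- setdiff(names(loaded), c("chr", "start", "end", "id"))
parts <- strsplit(value_cols, "_", fixed = TRUE)
col_info <- data.frame(
  column = value_cols,
  mark = vapply(parts, `[`, character(1), 1),
  sample = vapply(parts, `[`, character(1), 2),
  stringsAsFactors = FALSE
)
print(col_info)
```

A file written this way should have the shape expected for epimark_file, although the real function may apply further checks that this sketch does not cover.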
A data.frame, generated by reading the data information file,
specifies the samples and marks used in the analysis. It includes
at least two columns named "sample" and "mark", which list the
samples and marks included.
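As a rough illustration, a data.frame with the two required columns might look like the sketch below. The variable names data_info and expected_cols and the sample names are hypothetical, and the on-disk format of the data information file is not described here, so the table is built directly in R rather than read from a file.

```r
# Hypothetical sample/mark table with the two required columns.
data_info <- data.frame(
  sample = c("mESC", "mESC", "heart", "heart"),
  mark = c("H3K4me1", "H3K27ac", "H3K4me1", "H3K27ac"),
  stringsAsFactors = FALSE
)

# Check that the required columns are present.
stopifnot(all(c("sample", "mark") %in% names(data_info)))

# Epimark column names implied by this table ("MARK_SAMPLE" format).
expected_cols <- unique(paste(data_info$mark, data_info$sample, sep = "_"))
print(expected_cols)
```

Comparing expected_cols against the header of the epimark file is one simple way to spot a missing mark or sample before running the analysis.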
Arguments
epimark_file
name of epimark file
query_sample
name of the target sample
ref_sample
a vector of names of the reference sample(s)
incl_dev
logical value indicating whether to calculate the intensity deviation
feature. Intensity deviation is defined as the intensity in the target
sample minus the mean intensity in the reference samples
(i.e. the reference epigenome); it captures the tissue-specificity of
each epigenetic mark.
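The definition of intensity deviation can be worked through by hand in base R. This is a minimal sketch of the arithmetic only, not the package's implementation. The intensity values and the "heart" and "liver" reference samples are invented for illustration.

```r
# Toy intensities of one mark in a target sample and two reference samples.
intensity <- data.frame(
  id = c("region_1", "region_2"),
  H3K4me1_mESC = c(1.2, 0.4),
  H3K4me1_heart = c(0.3, 0.9),
  H3K4me1_liver = c(0.5, 0.7),
  stringsAsFactors = FALSE
)

query_sample <- "mESC"
ref_sample <- c("heart", "liver")
mark <- "H3K4me1"

# Build the "MARK_SAMPLE" column names for the target and the references.
query_col <- paste(mark, query_sample, sep = "_")
ref_cols <- paste(mark, ref_sample, sep = "_")

# Mean intensity across the reference samples, per region.
ref_mean <- rowMeans(intensity[, ref_cols, drop = FALSE])

# Intensity deviation: target intensity minus the reference mean.
deviation <- intensity[[query_col]] - ref_mean
print(deviation)
```

A positive deviation means the mark is stronger in the target sample than in the reference epigenome at that region, and a negative one means it is weaker.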
Value
data.frame instance containing intensity and intensity deviation
values of each mark for each region