formSampleMatrixFromRawGDCData: Form sample matrix from GDC copy number data files.
Description
Reads GDC segmentation files, adds sample information, and forms a data matrix of samples by genomic bins of a specified size.
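The binning step can be pictured with the minimal Python sketch below. It is not the package's implementation. It assumes segments arrive as hypothetical (sample, chrom, start, end, value) tuples with half-open [start, end) coordinates, and that each bin takes the overlap-length-weighted mean of the segments covering it. The real function may use a different coordinate convention or aggregation rule.

```python
from collections import defaultdict


def bin_segments(segments, binsize=1_000_000):
    """Aggregate (sample, chrom, start, end, value) segments into fixed-size bins.

    Hypothetical helper. Returns {sample: {(chrom, bin_index): value}}, where each
    bin holds the overlap-weighted mean of the segments that cover it.
    Assumes half-open [start, end) coordinates.
    """
    binsize = int(binsize)  # allow 1e6 as well as 1_000_000
    sums = defaultdict(float)   # (sample, chrom, bin) -> sum of value * overlap
    weights = defaultdict(int)  # (sample, chrom, bin) -> covered base pairs
    for sample, chrom, start, end, value in segments:
        first_bin = start // binsize
        last_bin = (end - 1) // binsize
        for b in range(first_bin, last_bin + 1):
            bin_start = b * binsize
            bin_end = bin_start + binsize
            overlap = min(end, bin_end) - max(start, bin_start)
            if overlap > 0:
                key = (sample, chrom, b)
                sums[key] += value * overlap
                weights[key] += overlap
    result = defaultdict(dict)
    for (sample, chrom, b), covered in weights.items():
        result[sample][(chrom, b)] = sums[(sample, chrom, b)] / covered
    return dict(result)


# Usage: one sample, two segments spanning the first 2 Mb of chr1.
binned = bin_segments(
    [("S1", "chr1", 0, 1_500_000, 2.0), ("S1", "chr1", 1_500_000, 2_000_000, 4.0)],
    binsize=1e6,
)
print(binned)
```

With 1 Mb bins, the second bin is split evenly between the two segments, so its value is the mean of 2.0 and 4.0.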
Arguments
tcga_files
GDC files to be read
format
file format, TCGA or TARGET.
binsize
The bin size, in base pairs (default 1 Mb, i.e. 1e6). The default balances resolution against speed and memory use, which matters in memory-sensitive applications; smaller bins give finer resolution but more bins per sample.
freadskip
The number of lines to skip at the top of each GDC file, typically 14 (in NBL data, the first line is blank and the next 13 lines are metadata). Adjust as needed for other files.
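Skipping leading lines before parsing works roughly as in this Python sketch. It assumes, hypothetically, that the line right after the skipped block is a tab-separated header row. The helper name and column names are illustrative only and are not taken from the package or from GDC.

```python
import csv
import io
from itertools import islice


def read_segment_table(handle, skip=14):
    """Hypothetical helper: drop `skip` leading lines, then parse the rest
    as a tab-separated table whose first remaining line is the header."""
    rows = csv.DictReader(islice(handle, skip, None), delimiter="\t")
    return list(rows)


# Usage: one blank line plus 13 metadata lines, then the table itself.
text = (
    "\n"
    + "".join(f"# metadata line {i}\n" for i in range(13))
    + "Chromosome\tStart\tEnd\tSegment_Mean\n"
    + "chr1\t1\t1000\t0.1\n"
)
rows = read_segment_table(io.StringIO(text), skip=14)
print(rows)
```

If skip is too small, a metadata line would be parsed as the header, which is why the value may need adjusting per dataset.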
debug
Enable debug mode (allows specific breakpoints to be checked).
chromosomes
A vector of chromosomes to be used. Defaults to chr1 through chr22 plus chrX,
but others can be added, e.g. chrY for the Y chromosome or chrM for mitochondrial DNA.
The expected format is a character vector, e.g. c("chr1", "chr2", "chr3").
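A minimal Python sketch of the chromosome selection follows. It assumes that "chr1-chrX" means chr1 through chr22 plus chrX, and that segments are hypothetical tuples whose first field is the chromosome name. Segments on chromosomes outside the vector are simply dropped.

```python
# Assumed default: autosomes chr1..chr22 followed by chrX.
DEFAULT_CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX"]


def filter_chromosomes(segments, chromosomes=None):
    """Hypothetical helper: keep only segments whose chromosome is selected."""
    keep = set(DEFAULT_CHROMOSOMES if chromosomes is None else chromosomes)
    return [seg for seg in segments if seg[0] in keep]


segs = [("chr1", 0, 100, 0.2), ("chrY", 0, 100, -0.5), ("chrM", 0, 50, 0.0)]
print(filter_chromosomes(segs))                                   # chrY, chrM dropped
print(filter_chromosomes(segs, DEFAULT_CHROMOSOMES + ["chrY"]))   # chrY kept
```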
sample_pat
Pattern used to extract the sample name from the filename.
Use "" to use the filename itself as the sample name.
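The sample-name rule can be sketched in Python as below. It assumes the pattern is a regular expression searched within the file's base name, that the first capture group is used when the pattern defines one, and that "" returns the base name unchanged. The helper and the example filename are hypothetical.

```python
import os
import re


def sample_name_from_file(path, sample_pat=""):
    """Hypothetical helper: derive a sample name from a file path."""
    filename = os.path.basename(path)
    if sample_pat == "":
        return filename  # empty pattern: use the filename as-is
    match = re.search(sample_pat, filename)
    if match is None:
        raise ValueError(f"pattern {sample_pat!r} not found in {filename!r}")
    # Prefer the first capture group when the pattern has one.
    return match.group(1) if match.re.groups else match.group(0)


path = "/data/TARGET-30-PAPAAA.seg.txt"  # hypothetical filename
print(sample_name_from_file(path, r"TARGET-\d+-\w+"))
print(sample_name_from_file(path, r"(TARGET-\d+-\w+)\.seg"))
print(sample_name_from_file(path, ""))
```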
sample_col
The name of the sample column (for custom format input).
chrlabel
The name of the chromosome column (for custom format input).
startlabel
The name of the start column (for custom format input).
endlabel
The name of the end column (for custom format input).
Value
A data frame containing the copy number values aggregated by sample and bin,
according to the parameters provided.