vcf.to.sigs.input: Converts a VCF file to the correct input format
Description
Given a VCF file, outputs a data frame that counts, for each sample ID, how
often a mutation is found within each trinucleotide context. The output can
be used as input to getTriContextFraction.
Usage
vcf.to.sigs.input(vcf, bsg = NULL)
Arguments
vcf
Location of the VCF file that is to be converted
bsg
Optional. Set only if a genome build other than the default (hg19) is
required. Must be a BSgenome object.
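A minimal sketch of a call that supplies a different genome build is shown
here. It assumes that the function is exported by the deconstructSigs package
and that the BSgenome.Hsapiens.UCSC.hg38 package is installed; the file path
is a placeholder, not a file shipped with any package.

```r
# Placeholder path: replace with the location of a real VCF file.
vcf_path <- "path/to/variants.vcf"

# Assumption: the function lives in deconstructSigs and the hg38 BSgenome
# package is available. The guard makes the snippet a no-op otherwise.
have_pkgs <- requireNamespace("deconstructSigs", quietly = TRUE) &&
  requireNamespace("BSgenome.Hsapiens.UCSC.hg38", quietly = TRUE)

if (have_pkgs && file.exists(vcf_path)) {
  # Omitting bsg would look contexts up in hg19; here the VCF is assumed
  # to use hg38 coordinates, so the matching BSgenome object is passed.
  sigs_input <- deconstructSigs::vcf.to.sigs.input(
    vcf = vcf_path,
    bsg = BSgenome.Hsapiens.UCSC.hg38::Hsapiens
  )
}
```

The build passed as bsg should match the coordinates in the VCF file, since
the trinucleotide context is read from the reference at each variant position.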
Value
A data frame that contains sample IDs for the rows and trinucleotide
contexts for the columns. Each entry is the count of how many times a
mutation with that trinucleotide context is seen in the sample.
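The shape of the returned data frame can be illustrated with a small base R
sketch. The toy data and the context labels (written here as "A[C>A]A") are
assumptions made for illustration; this is not the package's own code, and it
only produces columns for contexts that occur in the toy data.

```r
# Toy input: one row per mutation, with its sample ID and a context label.
muts <- data.frame(
  sample  = c("S1", "S1", "S1", "S2"),
  context = c("A[C>A]A", "A[C>A]A", "T[C>T]G", "A[C>A]A"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: samples become rows, contexts become columns,
# and each cell holds the number of mutations observed.
counts <- as.data.frame.matrix(table(muts$sample, muts$context))
counts
```

In this toy table, sample S1 has two mutations in the first context and one
in the second, while sample S2 has a single mutation in the first context.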
Details
By default, the context sequence is taken from the
BSgenome.Hsapiens.UCSC.hg19::Hsapiens object, so the coordinates must
correspond to the human hg19 assembly (the UCSC version of the GRCh37 Homo
sapiens assembly) unless a different BSgenome object is supplied via bsg.
This method will do its best to translate chromosome names from other naming
conventions, such as NCBI or Ensembl, to the UCSC style. For instance, the
following transformations will be done: "1" -> "chr1"; "MT" -> "chrM";
"GL000245.1" -> "chrUn_gl000245"; etc.
This method relies on the VariantAnnotation package to read the VCF file.
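The chromosome renaming described above can be sketched in base R as follows.
The helper to_ucsc_names is hypothetical and is not how the package performs
the translation; it covers only the three example patterns given above and
leaves every other name unchanged (including contigs that UCSC names with a
chromosome prefix and a "_random" suffix).

```r
# Hypothetical helper for illustration only; not part of the package.
to_ucsc_names <- function(chrom) {
  chrom <- as.character(chrom)
  out <- chrom

  # Contigs such as "GL000245.1": drop the version suffix, lowercase the
  # accession, and add the "chrUn_" prefix.
  is_gl <- grepl("^GL[0-9]+\\.[0-9]+$", chrom)
  out[is_gl] <- paste0("chrUn_", tolower(sub("\\.[0-9]+$", "", chrom[is_gl])))

  # Mitochondrial genome: "MT" becomes "chrM".
  out[chrom == "MT"] <- "chrM"

  # Autosomes and sex chromosomes: add the "chr" prefix.
  is_plain <- grepl("^([0-9]+|X|Y)$", chrom)
  out[is_plain] <- paste0("chr", chrom[is_plain])

  out
}

to_ucsc_names(c("1", "MT", "GL000245.1", "chrX"))
```

Names that already follow the UCSC style, such as "chrX", pass through
untouched.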