identify_vcf_file

Input vcf file. Only one sample column allowed.

vcf_file

Path of the output file. If blank, 
autogenerated as name of input file plus '_uniquorn_ident.tab' suffix.

output_file
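The default naming rule for `output_file` can be sketched in base R. This is a minimal illustration, not the package's own code: `default_output_file` is a hypothetical helper, and it assumes the suffix is appended to the full input path without stripping the `.vcf` extension.

```r
# Hypothetical helper (not part of the package): derive the output path.
default_output_file <- function(vcf_file, output_file = "") {
  if (identical(output_file, "")) {
    # Assumption: the suffix is appended to the complete input file name.
    return(paste0(vcf_file, "_uniquorn_ident.tab"))
  }
  output_file
}

default_output_file("sample.vcf")
```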

Reference genome version. All training sets are 
associated with a reference genome version. Default: GRCH37

ref_gen

The minimum number of mutations that 
have to match between query and training sample for a positive prediction.

minimum_matching_mutations

Include only mutations 
with a weight of at least x. Range: 0.0 to 1.0. 1 = unique to one CL, 
~0 = found in many CL samples.

mutational_weight_inclusion_threshold
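The effect of the weight threshold can be illustrated with a small base R sketch. The data frame, its column names, and the weights are invented for illustration; how the package actually computes and stores weights is not stated here.

```r
# Invented example mutations with weights between 0.0 and 1.0.
mutations <- data.frame(
  mutation = c("mut_a", "mut_b", "mut_c"),
  weight = c(1.0, 0.5, 0.02),  # 1 = unique to one CL, ~0 = found in many
  stringsAsFactors = FALSE
)

mutational_weight_inclusion_threshold <- 0.5

# Keep only mutations with a weight of at least the threshold.
included <- mutations[mutations$weight >= mutational_weight_inclusion_threshold, ]
included
```

With a threshold of 0.5, the widely shared `mut_c` is dropped and the two more specific mutations remain.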

Only the CL identifier with highest 
score is predicted to be present in the sample

only_first_candidate

Create identification results additionally 
as xls file for easier reading

write_xls

Whether BED files for IGV visualization should be 
created for the cancer cell lines that pass the threshold.

output_bed_file

Manually enter a vector of CL 
name(s) whose BED files should be created, regardless of 
whether they pass the detection threshold.

manual_identifier_bed_file

verbose

p_value

q_value

Threshold above which a positive prediction occurs.
Default: 25.0

confidence_score
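How `confidence_score` and `only_first_candidate` might interact can be sketched as follows. The candidate table, its column names, and the scores are made up, and the sketch assumes that the candidate with the highest score is the one ranked by the same score the threshold applies to; the package may rank candidates differently.

```r
# Invented candidate cell lines with illustrative scores.
candidates <- data.frame(
  CL = c("CL_A", "CL_B", "CL_C"),
  score = c(80.0, 30.5, 4.2),
  stringsAsFactors = FALSE
)

confidence_score <- 25.0
only_first_candidate <- TRUE

# A positive prediction occurs above the threshold.
positive <- candidates[candidates$score > confidence_score, ]

# Optionally keep only the highest-scoring candidate.
if (only_first_candidate && nrow(positive) > 0) {
  positive <- positive[which.max(positive$score), ]
}
positive
```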


Identifies cancer cell lines contained in a vcf file based 
on the pattern (start & length) of all contained mutations/variations.


This package enables users to identify cancer cell lines. Cancer cell line
misidentification and cross-contamination represent a significant challenge for
cancer researchers. Correct identification is therefore vital; in this package it is based
on the loci of somatic and germline mutations/variations.
The input format is vcf or vcf.gz, and each file has to contain a single cancer cell line sample
(i.e. a single member/genotype/gt column in the vcf file).
The implemented method is optimized for next-generation whole-exome and whole-genome DNA sequencing
technology.

identify_vcf_file: identify_VCF_file

Description

Usage

Arguments

Value

Details

Examples
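A hedged usage sketch, assuming the function is exported by an installed package named `Uniquorn` and that the installed version accepts the argument names listed above. The input path `my_sample.vcf.gz` is hypothetical, so the call only runs when both the package and the file are available.

```r
# Hypothetical input path; replace with a vcf/vcf.gz file that has one sample column.
vcf_file <- "my_sample.vcf.gz"

# Assumption: the package is named Uniquorn and exports identify_vcf_file.
if (requireNamespace("Uniquorn", quietly = TRUE) && file.exists(vcf_file)) {
  identification <- Uniquorn::identify_vcf_file(
    vcf_file = vcf_file,
    ref_gen = "GRCH37",       # documented default reference genome version
    confidence_score = 25.0   # documented default threshold
  )
  str(identification)         # inspect whatever structure is returned
}
```

Leaving `output_file` unset relies on the autogenerated output name described in the arguments.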