BIGr (version 0.6.2)

madc2vcf_all: Converts MADC file to VCF recovering target and off-target SNPs

Description

This function converts a MADC file into a VCF file containing both target and off-target SNPs. It offers options for filtering multiallelic SNPs and can run in parallel to improve performance.

Usage

madc2vcf_all(
  madc = NULL,
  botloci_file = NULL,
  hap_seq_file = NULL,
  n.cores = 1,
  rm_multiallelic_SNP = FALSE,
  multiallelic_SNP_dp_thr = 0,
  multiallelic_SNP_sample_thr = 0,
  alignment_score_thr = 40,
  out_vcf = NULL,
  verbose = TRUE
)

Value

This function does not return an R object. It writes a VCF file (format version 4.3) to the path given by out_vcf.

Arguments

madc

A string specifying the path to the MADC file.

botloci_file

A string specifying the path to the file containing the target IDs designed in the bottom strand.

hap_seq_file

A string specifying the path to the haplotype database fasta file.

n.cores

An integer specifying the number of cores to use for parallel processing. Default is 1.
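
Choosing a value for n.cores is left to the caller. The following is a minimal sketch, assuming you want to leave one core free and cap the count at 2 (the value used in the example on this page). The cap max_cores is a hypothetical name chosen for illustration, not a BIGr setting; parallel::detectCores() comes from the parallel package shipped with R and may return NA on some platforms.

```r
# Pick a conservative core count to pass as n.cores.
# `max_cores` is a hypothetical cap chosen for illustration, not a BIGr setting.
max_cores <- 2L

available <- parallel::detectCores()
if (is.na(available)) available <- 1L  # detectCores() may return NA

# Leave one core free, but never go below 1 or above the cap.
n_cores <- as.integer(max(1L, min(max_cores, available - 1L)))
```

The resulting n_cores can then be supplied as the n.cores argument.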

rm_multiallelic_SNP

A logical value. If TRUE, SNPs with more than one alternative base are removed. If FALSE, the thresholds specified by multiallelic_SNP_dp_thr and multiallelic_SNP_sample_thr are used to filter low-frequency SNP alleles. Default is FALSE.

multiallelic_SNP_dp_thr

A numeric value specifying the minimum depth per tag used to filter low-frequency SNP alleles when rm_multiallelic_SNP is FALSE. Default is 0.

multiallelic_SNP_sample_thr

A numeric value specifying the minimum number of samples used to filter low-frequency SNP alleles when rm_multiallelic_SNP is FALSE. Default is 0.
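
The descriptions above do not spell out how the two thresholds combine. One plausible reading, shown here purely as an assumption, is that an alternative allele is kept only if at least multiallelic_SNP_sample_thr samples carry it with a depth of at least multiallelic_SNP_dp_thr. The depth matrix and the helper keep_allele are hypothetical and not part of BIGr, and whether the package's comparisons are inclusive is likewise assumed.

```r
# Hypothetical read depths: rows = alternative alleles of one SNP, columns = samples.
depth <- matrix(
  c(12, 15, 11, 20, 14, 13,   # alt allele "T": well supported
     0,  1,  0, 11,  0,  0),  # alt allele "G": supported in a single sample
  nrow = 2, byrow = TRUE,
  dimnames = list(c("T", "G"), paste0("sample", 1:6))
)

# Assumed rule (not confirmed by the BIGr documentation): keep an allele when
# at least `sample_thr` samples have depth >= `dp_thr` for it.
keep_allele <- function(depth, dp_thr, sample_thr) {
  rowSums(depth >= dp_thr) >= sample_thr
}

kept <- keep_allele(depth, dp_thr = 10, sample_thr = 5)
```

Under this reading, allele "T" is kept and allele "G" is dropped, while the default thresholds of 0 keep every allele.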

alignment_score_thr

A numeric value specifying the minimum alignment score threshold. Default is 40.

out_vcf

A string specifying the path of the output VCF file. If the path does not end in .vcf, the extension is appended automatically.
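
The extension rule can be pictured with a small helper. This is a sketch of the documented behavior, not BIGr's own code: with_vcf_ext is a hypothetical name, and treating the check as case-sensitive is an assumption.

```r
# Hypothetical helper mirroring the documented rule: append ".vcf" unless the
# path already ends in it. Case-sensitive matching is an assumption.
with_vcf_ext <- function(path) {
  ifelse(grepl("\\.vcf$", path), path, paste0(path, ".vcf"))
}

with_vcf_ext("results/run1")      # "results/run1.vcf"
with_vcf_ext("results/run1.vcf")  # unchanged
```

A practical consequence is that a path from tempfile() without a file extension will not be the exact path of the written file.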

verbose

A logical value indicating whether to print metrics and progress to the console. Default is TRUE.

Details

The function processes a MADC file to generate a VCF file containing both target and off-target SNPs. Processing can run in parallel to improve performance, and multiallelic SNPs can be filtered using user-defined thresholds. SNPs whose alignment score does not meet alignment_score_thr are discarded. The generated VCF file includes metadata about the processing parameters and the BIGr package version.
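
Because the result exists only on disk, inspecting it means reading the file back. The sketch below uses base R only. To stay self-contained it first writes a tiny stand-in VCF; its header lines and records are invented for illustration and do not reproduce the metadata fields BIGr actually writes.

```r
# Stand-in VCF written to a temporary file (contents are illustrative only).
vcf_path <- tempfile(fileext = ".vcf")
writeLines(c(
  "##fileformat=VCFv4.3",
  "##example_note=hypothetical metadata line",
  "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
  "chr1\t100\tsnp1\tA\tT\t.\tPASS\t.",
  "chr1\t250\tsnp2\tG\tC\t.\tPASS\t."
), vcf_path)

lines <- readLines(vcf_path)

# Meta-information lines start with "##"; everything else is the table.
is_meta <- startsWith(lines, "##")
meta <- lines[is_meta]

# Parse the column header plus records. comment.char = "" keeps "#CHROM";
# colClasses = "character" stops a column of "T" alleles becoming logical.
variants <- read.delim(
  text = lines[!is_meta],
  check.names = FALSE,
  comment.char = "",
  colClasses = "character"
)

unlink(vcf_path)  # delete the temporary file
```

Here meta holds the header metadata lines and variants holds one row per record. For a real run, replace vcf_path with the path written by madc2vcf_all and skip the writeLines step.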

Examples

# Example usage:

# \donttest{
# Cap the number of OpenMP threads at 2 for this example
Sys.setenv("OMP_THREAD_LIMIT" = 2)

madc_file <- system.file("example_MADC_FixedAlleleID.csv", package="BIGr")
bot_file <- system.file("example_SNPs_DArTag-probe-design_f180bp.botloci", package="BIGr")
db_file <- system.file("example_allele_db.fa", package="BIGr")

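# Temporary output location (for this example only); giving it a .vcf
# extension means the written file has exactly this path
output_file <- tempfile(fileext = ".vcf")

madc2vcf_all(
  madc = madc_file,
  botloci_file = bot_file,
  hap_seq_file = db_file,
  n.cores = 2,
  rm_multiallelic_SNP = TRUE,
  # Per the argument descriptions, the two thresholds below apply
  # only when rm_multiallelic_SNP = FALSE
  multiallelic_SNP_dp_thr = 10,
  multiallelic_SNP_sample_thr = 5,
  alignment_score_thr = 40,
  out_vcf = output_file,
  verbose = TRUE
)

# Delete the temporary VCF, then drop the variable holding its path
unlink(output_file)
rm(output_file)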
# }
