vcfR (version 1.0.0)

extract.gt: Extract elements from vcfR objects

Description

Extract elements from the 'gt' slot, convert extracted genotypes to their allelic state, extract indels from the data structure or extract elements from the INFO column of the 'fix' slot.

Usage

extract.gt(x, element = "GT", mask = FALSE, as.numeric = FALSE,
  return.alleles = FALSE, allele.sep = "/", extract = TRUE)

extract.haps(x, mask = FALSE, gt.split = "|", verbose = TRUE)

extract.indels(x, return.indels = FALSE)

extract.info(x, element, as.numeric = FALSE, mask = FALSE)

Arguments

x
An object of class chromR or vcfR
element
element to extract from vcf genotype data. Common options include "DP", "GT" and "GQ"
mask
a logical indicating whether to apply the mask (TRUE) or return all variants (FALSE). Alternatively, a vector of logicals may be provided.
as.numeric
logical, should the matrix be converted to numerics
return.alleles
logical indicating whether to return the numerically encoded genotypes (FALSE, e.g., 0/1) or their alleles (TRUE, e.g., A/T)
allele.sep
character which delimits the alleles in a genotype (/ or |). Unlike similar parameters in other functions, it is not used as a regular expression.
extract
logical indicating whether to return the extracted element (TRUE) or the remaining string with that element removed (FALSE)
gt.split
character which delimits alleles in genotypes
verbose
logical, should verbose output be generated
return.indels
logical indicating whether to return indels or not

Details

The function extract.gt isolates elements from the 'gt' portion of VCF data. Fields available for extraction are listed in the FORMAT column of the 'gt' slot. Because different VCF-producing software writes different fields, the available options vary by software.

The mask parameter applies the mask when a chromR object is used. The as.numeric option converts the results from character to numeric. The return.alleles option converts the default numerically encoded genotypes (e.g., 0/1) to their nucleic acid representation (e.g., A/T). The allele.sep parameter specifies the genotype delimiter; unlike similar parameters in other functions, it is not used as a regular expression. The extract parameter selects between returning just the specified element (TRUE) or every element except the one specified (FALSE).

Note that when as.numeric is set to TRUE but the data are not actually numeric, the result will still be numeric but may not be interpretable.
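
The logic described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the helpers extract_field and to_alleles are hypothetical names introduced here, the example strings are made up, and missing genotypes are not handled.

```r
# Hypothetical helper (not part of vcfR): pull one FORMAT field out of
# colon-delimited genotype strings.
extract_field <- function(format, gt, element = "GT") {
  keys <- strsplit(format, ":", fixed = TRUE)[[1]]
  idx <- match(element, keys)
  if (is.na(idx)) return(rep(NA_character_, length(gt)))
  vapply(strsplit(gt, ":", fixed = TRUE),
         function(fields) fields[idx], character(1))
}

# Hypothetical helper (not part of vcfR): translate one numerically encoded
# genotype into alleles using REF and ALT. The separator is matched
# literally (fixed = TRUE), not as a regular expression.
to_alleles <- function(gt, ref, alt, allele.sep = "/") {
  alleles <- c(ref, strsplit(alt, ",", fixed = TRUE)[[1]])
  codes <- strsplit(gt, allele.sep, fixed = TRUE)[[1]]
  paste(alleles[as.integer(codes) + 1L], collapse = allele.sep)
}

# Made-up example data: one FORMAT string and three sample entries.
format <- "GT:GQ:DP"
gt <- c("0|0:48:1", "1|0:48:8", "1/1:43:5")

gt_only <- extract_field(format, gt, "GT")            # character genotypes
dp_num  <- as.numeric(extract_field(format, gt, "DP")) # numeric depths

allele_gt <- to_alleles("0/1", ref = "A", alt = "T")
```

Converting gt_only with as.numeric would not give meaningful values, which is the situation the note above warns about.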

The function extract.haps uses extract.gt to isolate genotypes. It then uses the information in the REF and ALT columns, together with an allele delimiter (gt.split), to split genotypes into their allelic states. Ploidy is determined by the first non-NA genotype in the first sample.
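
As a rough illustration of this splitting step, the base R sketch below handles the genotypes of a single sample. It is an assumption-laden sketch, not the package's code: split_sample is a hypothetical helper, the data are made up, and genotypes whose allele count differs from the inferred ploidy are simply left as NA.

```r
# Hypothetical helper (not part of vcfR): split one sample's genotypes into
# a variants-by-ploidy matrix of alleles.
split_sample <- function(gt, ref, alt, gt.split = "|") {
  # Infer ploidy from the first non-missing genotype.
  first <- gt[!is.na(gt)][1]
  ploidy <- length(strsplit(first, gt.split, fixed = TRUE)[[1]])

  out <- matrix(NA_character_, nrow = length(gt), ncol = ploidy)
  for (i in seq_along(gt)) {
    if (is.na(gt[i])) next
    # Allele 0 is REF; alleles 1, 2, ... are the comma-separated ALT entries.
    alleles <- c(ref[i], strsplit(alt[i], ",", fixed = TRUE)[[1]])
    codes <- strsplit(gt[i], gt.split, fixed = TRUE)[[1]]
    if (length(codes) != ploidy) next  # leave the row as NA
    out[i, ] <- alleles[as.integer(codes) + 1L]
  }
  out
}

# Made-up example data: four variants for one phased diploid sample.
gt  <- c(NA, "0|1", "1|1", "2|0")
ref <- c("A", "G", "C", "T")
alt <- c("T", "A", "G", "C,G")

haps <- split_sample(gt, ref, alt)
```

Each column of haps then holds one haplotype of the sample, with one row per variant.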

The function extract.indels is used to separate indels from SNPs. The function queries the 'REF' and 'ALT' columns of the 'fix' slot to see if any alleles are greater than one character in length. When the parameter return.indels is FALSE, only SNPs will be returned. When the parameter return.indels is TRUE, only indels will be returned.
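
The length test described here can be sketched in base R as follows. The helper is_indel is hypothetical and the data are made up; splitting ALT on commas to check each alternate allele separately is an assumption of this sketch, not a statement about how the package does it.

```r
# Hypothetical helper (not part of vcfR): flag variants where REF or any
# ALT allele is longer than one character.
is_indel <- function(ref, alt) {
  alt_alleles <- strsplit(alt, ",", fixed = TRUE)
  long_alt <- vapply(alt_alleles,
                     function(a) any(nchar(a) > 1), logical(1))
  nchar(ref) > 1 | long_alt
}

# Made-up example data: a SNP, a deletion, an insertion, a multiallelic SNP.
ref <- c("A", "GTC", "T", "C")
alt <- c("T", "G", "TA", "G,T")

indel <- is_indel(ref, alt)
snps <- which(!indel)   # rows kept when only SNPs are wanted
indels <- which(indel)  # rows kept when only indels are wanted
```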

The function extract.info is used to isolate elements from the INFO column of vcf data.
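
For illustration, pulling a key=value element out of semicolon-delimited INFO strings might look like the base R sketch below. The helper extract_info_field is hypothetical, the INFO strings are made up, and flag entries without a value (such as DB) are not handled.

```r
# Hypothetical helper (not part of vcfR): return the value of one key from
# each INFO string, or NA when the key is absent.
extract_info_field <- function(info, element) {
  prefix <- paste0(element, "=")
  vapply(strsplit(info, ";", fixed = TRUE), function(fields) {
    hit <- fields[startsWith(fields, prefix)]
    if (length(hit) == 0) return(NA_character_)
    substring(hit[1], nchar(prefix) + 1L)
  }, character(1))
}

# Made-up example data: three INFO strings, the last one lacking DP.
info <- c("NS=3;DP=14;AF=0.5;DB;H2", "NS=3;DP=11;AF=0.017", "NS=2;AA=T")

dp_chr <- extract_info_field(info, "DP")  # character values
dp_num <- as.numeric(dp_chr)              # numeric, NA where DP is absent
```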

See Also

is.polymorphic