Extract elements from the 'gt' slot, convert extracted genotypes to their allelic state, extract indels from the data structure or extract elements from the INFO column of the 'fix' slot.
extract.gt(x, element = "GT", mask = FALSE, as.numeric = FALSE,
  return.alleles = FALSE, allele.sep = "/", extract = TRUE)
extract.haps(x, mask = FALSE, gt.split = "|", verbose = TRUE)
extract.indels(x, return.indels = FALSE)
extract.info(x, element, as.numeric = FALSE, mask = FALSE)
x: an object of class chromR or vcfR
element: element to extract from vcf genotype data. Common options include "DP", "GT" and "GQ"
mask: a logical indicating whether to apply the mask (TRUE) or return all variants (FALSE). Alternatively, a vector of logicals may be provided.
as.numeric: logical indicating whether the matrix should be converted to numeric
return.alleles: logical indicating whether to return the genotypes (0/1) or alleles (A/T)
allele.sep: character which delimits the alleles in a genotype (/ or |). Unlike similar parameters in other functions, it is not used as a regular expression.
extract: logical indicating whether to return the extracted element (TRUE) or the remaining string (FALSE)
gt.split: character which delimits alleles in genotypes
verbose: logical indicating whether verbose output should be generated
return.indels: logical indicating whether to return indels (TRUE) or SNPs (FALSE)
The function extract.gt isolates elements from the 'gt' portion of vcf data. Fields available for extraction are listed in the FORMAT column of the 'gt' slot. Because different vcf-producing software emits different fields, the available options vary by software. The mask parameter applies the mask when a chromR object is used. The as.numeric option converts the results from character to numeric. Note that if the data are not actually numeric, the conversion still returns numeric values, but they may not be interpretable. The return.alleles option converts the default numerically encoded genotypes (e.g., 0/1) to their nucleic acid representation (e.g., A/T). The allele.sep parameter specifies the genotype delimiter; unlike similar parameters in other functions, it is not used as a regular expression. The extract parameter controls whether just the specified element is returned (TRUE) or every element except the one specified (FALSE).
Note that when 'as.numeric' is set to 'TRUE' but the data are not actually numeric, unexpected results will likely occur.
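The idea behind element extraction and allele conversion can be illustrated with a small base R sketch. This is not the package's implementation: the helpers pull_element and to_alleles and the toy data are hypothetical, and the sketch assumes the usual vcf layout in which FORMAT keys and sample values are colon-delimited in the same order.

```r
# Toy stand-ins for one variant: a FORMAT string and two sample entries.
format_field <- "GT:DP:GQ"
samples <- c(sample1 = "0/1:12:99", sample2 = "1/1:7:45")

# Hypothetical helper (not part of vcfR): pull one element from each sample.
pull_element <- function(format_field, samples, element) {
  keys <- strsplit(format_field, ":", fixed = TRUE)[[1]]
  idx <- match(element, keys)
  if (is.na(idx)) {
    # Element is absent from FORMAT, so there is nothing to extract.
    return(setNames(rep(NA_character_, length(samples)), names(samples)))
  }
  vapply(strsplit(samples, ":", fixed = TRUE),
         function(parts) parts[idx], character(1))
}

gt <- pull_element(format_field, samples, "GT")
dp <- as.numeric(pull_element(format_field, samples, "DP"))

# Hypothetical helper: map numeric allele codes to bases (REF first, then ALT).
# The separator is matched literally (fixed = TRUE), not as a regular expression.
to_alleles <- function(g, alleles, sep = "/") {
  codes <- as.integer(strsplit(g, sep, fixed = TRUE)[[1]])
  paste(alleles[codes + 1L], collapse = sep)
}

gt_alleles <- vapply(gt, to_alleles, character(1), alleles = c("A", "T"))
```

With a vcfR object named vcf, the corresponding package call for read depth would be extract.gt(vcf, element = "DP", as.numeric = TRUE).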
The function extract.haps uses extract.gt to isolate genotypes. It then uses the information in the REF and ALT columns, together with the allele delimiter (gt.split), to split genotypes into their allelic state. Ploidy is determined by the first non-NA genotype in the first sample.
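The splitting step can be sketched in base R as follows. This is an illustration under stated assumptions rather than the package's code: the toy ref, alt and gt objects and the haplotype column names (sample name plus a 0-based suffix) are hypothetical, and multiple ALT alleles are assumed to be comma-separated.

```r
# Toy data: two variants, two samples, phased genotypes.
ref <- c("A", "G")
alt <- c("T", "C,A")  # the second variant has two alternate alleles
gt <- matrix(c("0|1", "2|0", NA, "1|1"), nrow = 2,
             dimnames = list(c("var1", "var2"), c("s1", "s2")))

# Ploidy comes from the first non-NA genotype in the first sample.
first_gt <- gt[!is.na(gt[, 1]), 1][1]
ploidy <- length(strsplit(first_gt, "|", fixed = TRUE)[[1]])

# One column per haplotype; the naming scheme here is an assumption.
haps <- matrix(NA_character_, nrow = nrow(gt), ncol = ncol(gt) * ploidy,
               dimnames = list(rownames(gt),
                               paste(rep(colnames(gt), each = ploidy),
                                     seq_len(ploidy) - 1, sep = "_")))

for (i in seq_len(nrow(gt))) {
  # Allele code 0 is REF; codes 1, 2, ... index into the ALT alleles.
  alleles <- c(ref[i], strsplit(alt[i], ",", fixed = TRUE)[[1]])
  for (j in seq_len(ncol(gt))) {
    if (is.na(gt[i, j])) next  # missing genotypes stay NA
    codes <- as.integer(strsplit(gt[i, j], "|", fixed = TRUE)[[1]])
    cols <- (j - 1) * ploidy + seq_len(ploidy)
    haps[i, cols] <- alleles[codes + 1L]
  }
}
```

Because ploidy is fixed from a single genotype, a data set with mixed ploidy would not fit this layout.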
The function extract.indels separates indels from SNPs. It queries the 'REF' and 'ALT' columns of the 'fix' slot to see whether any allele is longer than one character. When the parameter return.indels is FALSE, only SNPs are returned; when it is TRUE, only indels are returned.
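The length test described above can be sketched in base R. The data frame fix and the helper has_long_allele are hypothetical stand-ins, and splitting multi-allelic entries on commas before measuring allele length is an assumption of this sketch, not a statement about the package's internals.

```r
# Toy stand-in for the REF and ALT columns of the 'fix' slot.
fix <- data.frame(REF = c("A", "AT", "G", "C"),
                  ALT = c("T", "A", "GA,C", "G,T"),
                  stringsAsFactors = FALSE)

# Hypothetical helper: TRUE where any comma-separated allele exceeds one character.
has_long_allele <- function(x) {
  vapply(strsplit(x, ",", fixed = TRUE),
         function(a) any(nchar(a) > 1), logical(1))
}

is_indel <- has_long_allele(fix$REF) | has_long_allele(fix$ALT)

snps <- fix[!is_indel, ]   # analogous to return.indels = FALSE
indels <- fix[is_indel, ]  # analogous to return.indels = TRUE
```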
The function extract.info is used to isolate elements from the INFO column of vcf data.
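As a rough illustration of what isolating an INFO element involves, the base R sketch below parses semicolon-delimited key=value strings. The helper pull_info and the toy INFO strings are hypothetical, and the sketch handles only key=value entries, not flag-style keys such as DB.

```r
# Toy INFO strings; the third variant has no DP entry.
info <- c("DP=14;AF=0.5;DB", "DP=9;AF=1.0", "AF=0.25")

# Hypothetical helper (not part of vcfR): return the value for one key, or NA.
pull_info <- function(info, element) {
  vapply(strsplit(info, ";", fixed = TRUE), function(fields) {
    hit <- fields[startsWith(fields, paste0(element, "="))]
    if (length(hit) == 0) return(NA_character_)
    sub("^[^=]*=", "", hit[1])  # drop the "key=" prefix
  }, character(1))
}

dp <- as.numeric(pull_info(info, "DP"))
af <- as.numeric(pull_info(info, "AF"))
```

With a vcfR object named vcf, the corresponding package call would be extract.info(vcf, element = "DP", as.numeric = TRUE).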