read_vcf: Data Input VCF

Description

Reads an external VCF file and creates an object of class mappoly.data.

Usage

read_vcf(
  file.in,
  parent.1,
  parent.2,
  ploidy = NA,
  filter.non.conforming = TRUE,
  thresh.line = 0.05,
  min.gt.depth = 0,
  min.av.depth = 0,
  max.missing = 1,
  elim.redundant = TRUE,
  verbose = TRUE,
  read.geno.prob = FALSE,
  prob.thres = 0.95
)

Value

An object of class mappoly.data which contains a list with the following components:

ploidy: ploidy level
n.ind: number of individuals
n.mrk: total number of markers
ind.names: the names of the individuals
mrk.names: the names of the markers
dosage.p1: a vector containing the dosage in parent 1 (P) for all n.mrk markers
dosage.p2: a vector containing the dosage in parent 2 (Q) for all n.mrk markers
chrom: a vector indicating which sequence each marker belongs to. Zero indicates that the marker was not assigned to any sequence
genome.pos: physical position of each marker within its sequence
seq.ref: Reference base used for each marker (i.e. A, T, C, G)
seq.alt: Alternative base used for each marker (i.e. A, T, C, G)
prob.thres: (unused field)
geno.dose: a matrix containing the dosage for each marker (rows) and each individual (columns). Missing data are represented by ploidy_level + 1
geno: a data frame containing the genotypic probability columns for each marker and individual combination (rows). Missing data are represented by ploidy_level + 1
nphen: (unused field)
phen: (unused field)
all.mrk.depth: DP information for all markers in the VCF file
chisq.pval: a vector containing p-values related to the chi-squared test of Mendelian segregation performed for all markers
kept: if elim.redundant = TRUE, holds all non-redundant markers
elim.correspondence: if elim.redundant = TRUE, holds all non-redundant markers and their correspondence to the redundant ones
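
The ploidy_level + 1 convention for missing data in geno.dose can be handled as in the sketch below. The object here is a hand-built mock that only mimics a few of the components listed above (it is not produced by read_vcf), and the marker and individual names are invented for illustration.

```r
# Mock list mimicking a few components of a mappoly.data object
# (illustrative only; a real object comes from read_vcf()).
ploidy <- 6
mock <- list(
  ploidy = ploidy,
  geno.dose = matrix(
    c(0, 1, 7, 2,
      3, 7, 7, 3,
      6, 5, 4, 4),
    nrow = 3, byrow = TRUE,
    dimnames = list(c("M1", "M2", "M3"), c("I1", "I2", "I3", "I4"))
  )
)

# Recode the missing-data sentinel (ploidy + 1) as NA
dose <- mock$geno.dose
dose[dose == mock$ploidy + 1] <- NA

# Proportion of missing calls per marker (rows are markers)
miss.by.mrk <- rowMeans(is.na(dose))
print(miss.by.mrk)
```

Recoding to NA before computing summaries avoids treating the sentinel value as a real dosage.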

Arguments

file.in: a character string with the name of (or full path to) the input file which contains the data (VCF format)
parent.1: a character string containing the name of parent 1
parent.2: a character string containing the name of parent 2
ploidy: the species ploidy (optional, it will be automatically detected)
filter.non.conforming: if TRUE (default), converts data points with unexpected genotypes (i.e. genotypes not expected in the absence of double reduction) to NA. See function segreg_poly for information on expected classes and their respective frequencies.
thresh.line: threshold used for p-values on segregation test (default = 0.05)
min.gt.depth: minimum genotype depth to keep information. If the genotype depth is below min.gt.depth, it will be replaced with NA (default = 0)
min.av.depth: minimum average depth to keep markers (default = 0)
max.missing: maximum proportion of missing data to keep markers (range = 0-1; default = 1)
elim.redundant: logical. If TRUE (default), removes redundant markers during map construction, keeping them annotated to export to the final map.
verbose: if TRUE (default), the current progress is shown; if FALSE, no output is produced
read.geno.prob: logical. If TRUE and genotypic probabilities are available (PL field), generates a probability-based data frame (default = FALSE).
prob.thres: probability threshold to associate a marker call to a dosage. Markers with maximum genotype probability smaller than prob.thres are considered as missing data for the dosage calling purposes (default = 0.95)
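
The thresholding rule described for prob.thres can be sketched as follows. This is a minimal illustration of the stated rule, not the package's internal code: call_dosage is a hypothetical helper, and the probability matrix is made up, with one row per genotype call and one column per dosage class (0 to ploidy).

```r
# Hypothetical helper (not part of mappoly): assign a dosage to each row of
# a genotype-probability matrix, or the missing code ploidy + 1 when the
# largest probability is below prob.thres.
call_dosage <- function(probs, ploidy, prob.thres = 0.95) {
  stopifnot(ncol(probs) == ploidy + 1)
  best <- max.col(probs, ties.method = "first")        # column of the maximum
  max.prob <- probs[cbind(seq_len(nrow(probs)), best)] # the maximum itself
  dose <- best - 1                                     # column 1 is dosage 0
  dose[max.prob < prob.thres] <- ploidy + 1            # below threshold: missing
  dose
}

# Made-up probabilities for three tetraploid genotype calls
probs <- rbind(
  c(0.98, 0.02, 0.00, 0.00, 0.00),
  c(0.10, 0.50, 0.40, 0.00, 0.00),
  c(0.00, 0.00, 0.03, 0.97, 0.00)
)
calls <- call_dosage(probs, ploidy = 4)
print(calls)
```

With the default threshold, the second call is ambiguous and is coded as missing (ploidy + 1 = 5); lowering prob.thres would keep more calls at the cost of lower confidence.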

Author

Gabriel Gesteira, gdesiqu@ncsu.edu

Details

This function can handle VCF files of version 4.0 or higher. The ploidy can be detected automatically, but it is highly recommended that you specify it so that mismatches can be checked. All individual and marker names are kept as they appear in the VCF file.
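
One way to picture the mismatch check is to count alleles in the GT strings and compare the result with the declared ploidy. The sketch below is an assumption about how such a check could be done, not a description of what read_vcf does internally; guess_ploidy and the genotype strings are hypothetical. It relies only on the VCF convention that alleles in GT are separated by "/" (unphased) or "|" (phased).

```r
# Hypothetical helper (not part of mappoly): guess the ploidy as the most
# frequent number of alleles found in a vector of VCF GT strings.
guess_ploidy <- function(gt) {
  gt <- gt[!is.na(gt) & nzchar(gt)]
  n.alleles <- lengths(strsplit(gt, "[/|]"))  # split on "/" or "|"
  tab <- table(n.alleles)
  as.integer(names(tab)[which.max(tab)])
}

# Made-up GT fields for a tetraploid sample set
gt <- c("0/0/0/1", "0/1/1/1", "./././.", "0|0|1|1")
detected <- guess_ploidy(gt)
declared <- 4

# Compare the declared ploidy with the one implied by the data
if (detected != declared) {
  warning("Declared ploidy (", declared, ") differs from detected ploidy (", detected, ")")
}
print(detected)
```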

References

Mollinari, M., Olukolu, B. A., Pereira, G. da S., Khan, A., Gemenet, D., Yencho, G. C., Zeng, Z-B. (2020) Unraveling the Hexaploid Sweetpotato Inheritance Using Ultra-Dense Multilocus Mapping, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400620

Mollinari, M., and Garcia, A. A. F. (2019) Linkage analysis and haplotype phasing in experimental autopolyploid populations with high ploidy level using hidden Markov models, _G3: Genes, Genomes, Genetics_. doi:10.1534/g3.119.400378

Examples

## Hexaploid sweetpotato: subset of chromosome 3
## Download the sample VCF to a temporary file
fl <- "https://github.com/mmollina/MAPpoly_vignettes/raw/master/data/sweet_sample_ch3.vcf.gz"
tempfl <- tempfile(pattern = 'chr3_', fileext = '.vcf.gz')
download.file(fl, destfile = tempfl)
## Read the VCF; the ploidy is detected automatically when not supplied
dat.dose.vcf <- read_vcf(file.in = tempfl, parent.1 = "PARENT1", parent.2 = "PARENT2")
print(dat.dose.vcf)
plot(dat.dose.vcf)