impute.missing.geno: Impute missing genotypes in measured markers.

Description

This function imputes missing genotype data using weighted k-nearest neighbors imputation.
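The general idea of weighted k-nearest neighbors imputation can be sketched in base R. This is an illustration only, not the CAPE implementation: the function name knn_impute_sketch, the Euclidean-style distance, and the inverse-distance weights are all assumptions made for the example.

```r
# Illustrative weighted k-nearest-neighbors imputation (NOT the CAPE code).
# geno: numeric matrix, individuals in rows, markers in columns, NA = missing.
knn_impute_sketch <- function(geno, k = 2) {
  imputed <- geno
  for (i in seq_len(nrow(geno))) {
    for (j in which(is.na(geno[i, ]))) {
      # Candidate neighbors: individuals with marker j observed.
      donors <- which(!is.na(geno[, j]))
      if (length(donors) == 0) next
      # Root-mean-square distance over markers observed in both individuals.
      d <- vapply(donors, function(n) {
        shared <- !is.na(geno[i, ]) & !is.na(geno[n, ])
        if (!any(shared)) return(Inf)
        sqrt(mean((geno[i, shared] - geno[n, shared])^2))
      }, numeric(1))
      # Keep the k closest donors that share at least one marker.
      keep <- order(d)[seq_len(min(k, length(donors)))]
      keep <- keep[is.finite(d[keep])]
      if (length(keep) == 0) next
      # Closer neighbors get larger weights; the small constant avoids 1/0.
      w <- 1 / (d[keep] + 1e-6)
      imputed[i, j] <- sum(w * geno[donors[keep], j]) / sum(w)
    }
  }
  imputed
}

# Toy data: individual 2 matches individual 1 at every observed marker.
geno <- rbind(
  c(0, 0, 0, 0),
  c(0, 0, 0, NA),
  c(1, 1, 1, 1),
  c(1, 1, 1, 1)
)
imputed <- knn_impute_sketch(geno, k = 2)
```

In this toy matrix the imputed value for individual 2 at marker 4 is pulled almost entirely toward individual 1, its nearest neighbor.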

Usage

impute.missing.geno(data.obj, geno.obj = NULL, 
impute.full.genome = FALSE, k = 10, ind.missing.thresh = 0, 
marker.missing.thresh = 0, 
prioritize = c("ind", "marker", "fewer"), 
max.region.size = NULL, min.region.size = NULL, 
run.parallel = TRUE, verbose = FALSE, n.cores = 2)

Arguments

data.obj

The object in which all results are stored. See read.population.

geno.obj

The object in which genotype data are stored. See read.geno.

impute.full.genome

A logical value. In CAPE it is possible to scan a subset of the full genotype matrix. This argument indicates whether imputation should be done on the entire genotype matrix (TRUE) or only on the subset of markers being scanned (FALSE).

k

The number of neighbors to use in k-nearest neighbors imputation.

ind.missing.thresh

A percentage. After imputation, the number of data points still missing is assessed. Markers for which large numbers of individuals still have missing data will be removed. This argument defines the percentage of missing individuals above which a marker will be removed.

marker.missing.thresh

A percentage. After imputation, the number of data points still missing is assessed. Individuals for which large numbers of markers still have missing data will be removed. This argument defines the percentage of missing markers above which an individual will be removed.
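As a rough illustration of how the two thresholds could be applied, the sketch below computes the percentage of missing values per marker and per individual on a toy matrix. The matrix, the variable names, and the threshold values are hypothetical; this is not the code CAPE runs internally.

```r
# Toy genotype matrix: 4 individuals (rows) x 5 markers (columns); NA = still missing.
geno <- matrix(c(
  0,  1, NA, 0, 1,
  1,  1, NA, 0, 0,
  0, NA, NA, 1, 1,
  1,  0,  0, 1, 0
), nrow = 4, byrow = TRUE,
dimnames = list(paste0("ind", 1:4), paste0("m", 1:5)))

# Percentage of individuals missing at each marker,
# and percentage of markers missing in each individual.
pct_missing_by_marker <- colMeans(is.na(geno)) * 100
pct_missing_by_ind <- rowMeans(is.na(geno)) * 100

# Hypothetical thresholds for the example (the documented defaults are 0).
ind.missing.thresh <- 50
marker.missing.thresh <- 30

# Elements whose missing percentage is above the threshold are flagged.
markers_to_remove <- names(which(pct_missing_by_marker > ind.missing.thresh))
inds_to_remove <- names(which(pct_missing_by_ind > marker.missing.thresh))
```

Here marker m3 is missing in 75 percent of individuals and is flagged; individual ind3 is missing 40 percent of markers and is flagged.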

prioritize

Markers and individuals with excessive missing data after imputation will be removed. If there is one marker with very low genotyping coverage, it is preferable to remove that marker rather than all the individuals who are missing a genotype for that marker. This argument allows prioritization of the removal of markers or individuals. The default value is "fewer", meaning that priority goes to whichever dimension (individuals or markers) results in removal of fewer elements.
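The "fewer" rule can be pictured with a small sketch. It assumes thresholds of 0, so that any remaining missing value flags a marker or an individual. The helper name fewer_removal and the tie-breaking toward markers are assumptions for illustration and are not taken from the CAPE source.

```r
# Sketch of the "fewer" rule: drop whichever dimension loses fewer elements.
# Assumes thresholds of 0, so any missing value flags a marker or an individual.
fewer_removal <- function(geno) {
  missing <- is.na(geno)
  bad_markers <- colSums(missing) > 0  # markers with at least one missing value
  bad_inds <- rowSums(missing) > 0     # individuals with at least one missing value
  if (sum(bad_markers) <= sum(bad_inds)) {
    # Ties go to markers here; that choice is an assumption of this sketch.
    list(removed = "marker", geno = geno[, !bad_markers, drop = FALSE])
  } else {
    list(removed = "ind", geno = geno[!bad_inds, , drop = FALSE])
  }
}

# One poorly genotyped marker (column 2) is missing in three of four individuals.
geno <- rbind(
  c(0, NA, 1),
  c(1, NA, 0),
  c(0, NA, 1),
  c(1,  0, 0)
)
result <- fewer_removal(geno)
```

Removing the single marker keeps all four individuals, whereas removing individuals would discard three of them.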

max.region.size

A number indicating the maximum number of markers to consider in each imputation step. This value defaults to the maximum number of markers on one chromosome.

min.region.size

A number indicating the minimum number of markers to consider in each imputation step. This value defaults to the minimum number of markers on one chromosome.

run.parallel

A logical value indicating whether this process should be run in parallel.

verbose

A logical value indicating whether the progress of the process should be printed to the screen.

n.cores

An integer specifying the number of cores to be used in parallel processing.

Value

Because both the data.obj and the geno.obj are manipulated by this function, it returns a list of two elements in which the first is the data.obj and the second is the geno.obj. These objects need to be separated to continue with subsequent functions. If no geno.obj was provided to the function, the result is just the data.obj with the genotype matrix embedded.
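Separating the two returned objects might look like the sketch below. To keep the example runnable without CAPE data, the result is a stand-in list with placeholder contents; the commented-out call shows where the real function would be used. Positional indexing is used because this page does not state the element names.

```r
# With real data, the result would come from a call such as:
# result <- impute.missing.geno(data.obj, geno.obj = geno.obj, k = 10)

# Stand-in for the two-element list returned when a geno.obj is supplied.
# The contents are placeholders, not real CAPE objects.
result <- list(
  list(pheno = matrix(0, nrow = 2, ncol = 1)),  # placeholder data.obj
  array(0, dim = c(2, 2, 3))                    # placeholder geno.obj
)

# Separate the two objects before passing them to subsequent functions.
data.obj <- result[[1]]  # first element: the updated data.obj
geno.obj <- result[[2]]  # second element: the updated geno.obj
```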
