A function to impute SNP data
impute_snp_data(
obj,
X,
impute,
impute_method,
parallel,
outfile,
quiet,
seed = as.numeric(Sys.Date()),
...
)
Value: Nothing is returned, but obj$genotypes is overwritten with the imputed version of the data.
Arguments
obj: A bigSNP object (as created by read_plink_files()).
X: A matrix of genotype data, as returned by name_and_count_bigsnp().
impute: Logical: should data be imputed? Defaults to TRUE.
impute_method: If impute = TRUE, this argument specifies the kind of imputation desired. Options are:
  mode (default): Imputes the most frequent call. See bigsnpr::snp_fastImputeSimple() for details.
  random: Imputes by sampling according to allele frequencies.
  mean0: Imputes the mean, rounded to the nearest integer.
  mean2: Imputes the mean, rounded to 2 decimal places.
  xgboost: Imputes using an algorithm based on local XGBoost models. See bigsnpr::snp_fastImpute() for details. Note: this can take several minutes, even for a relatively small data set.
parallel: Logical: should the computations within this function be run in parallel? Defaults to TRUE. See count_cores() and ?bigparallelr::assert_cores for more details. In particular, be aware that too much parallelization can make computations slower.
outfile: Optional: the prefix (character string) of the log file to be written. Defaults to 'process_plink', which produces 'process_plink.log'.
quiet: Logical: should messages to the console be suppressed? Defaults to TRUE.
seed: Numeric value passed as the seed when impute_method = 'xgboost'. Defaults to as.numeric(Sys.Date()).
...: Optional: additional arguments passed to bigsnpr::snp_fastImpute() (relevant only if impute_method = 'xgboost').
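The simple rules listed under impute_method can be illustrated on a single SNP column coded 0/1/2, with NA marking missing calls. The base-R sketch below uses a hypothetical helper, impute_column(), which is not part of this package or of bigsnpr, and it does not reproduce bigsnpr::snp_fastImputeSimple() exactly. In particular, breaking ties in "mode" toward the smaller code and drawing "random" values from Binomial(2, p) are assumptions made for illustration.

```r
# Hypothetical helper (NOT bigsnpr's implementation): a base-R illustration
# of the simple imputation rules on one SNP column coded 0/1/2 with NAs.
impute_column <- function(g, method = c("mode", "random", "mean0", "mean2")) {
  method <- match.arg(method)
  miss <- is.na(g)
  if (!any(miss)) return(g)
  obs <- g[!miss]
  g[miss] <- switch(method,
    # most frequent observed call; which.max() breaks ties toward the smaller code
    mode = as.numeric(names(which.max(table(factor(obs, levels = 0:2))))),
    # assumption: draw genotypes from Binomial(2, p), p = estimated allele frequency
    random = rbinom(sum(miss), size = 2, prob = mean(obs) / 2),
    # mean of observed calls, rounded to the nearest integer
    mean0 = round(mean(obs)),
    # mean of observed calls, rounded to 2 decimal places
    mean2 = round(mean(obs), 2)
  )
  g
}

# Observed calls: 0, 0, 1, 2, 2, 2, 1 (mean 8/7, most frequent call 2)
g <- c(0, 0, 1, NA, 2, 2, 2, 1, NA)
impute_column(g, "mode")   # missing entries become 2
impute_column(g, "mean0")  # missing entries become 1
impute_column(g, "mean2")  # missing entries become 1.14
set.seed(1)
impute_column(g, "random") # missing entries are random draws from {0, 1, 2}
```

The example shows why the choice matters: on the same column, mode, mean0 and mean2 fill the gaps with 2, 1 and 1.14 respectively.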
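The default seed, as.numeric(Sys.Date()), counts days since 1970-01-01, so it changes every calendar day. As a result, 'xgboost' imputation may give different results when rerun on a different day unless seed is set explicitly. The base-R snippet below shows the daily change in the default and why a fixed seed reproduces a random draw. draw() is a hypothetical stand-in for any seeded stochastic step and is not part of this package.

```r
# The default seed is the date expressed as days since 1970-01-01
as.numeric(as.Date("2024-01-01"))  # 19723
as.numeric(as.Date("2024-01-02"))  # 19724: a different seed the next day

# Hypothetical seeded step: the same seed always yields the same draw
draw <- function(seed) {
  set.seed(seed)
  sample(0:2, 5, replace = TRUE)
}
identical(draw(19723), draw(19723))  # TRUE
```

To reproduce results across sessions, pass a fixed numeric value for seed rather than relying on the date-based default.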