Usage
pre5.genos2numeric(file.ped, dir.ped, file.dat, dir.dat = dir.ped, dir.out,
num.nonsnp.col = 2, num.nonsnp.last.col = 1, letter.encoding = TRUE,
ped.has.ext = TRUE, dat.has.ext = TRUE, remove.bad.genos = FALSE,
save.ids.name = "")
Arguments
file.ped
The name of the file with genotypes, after imputation.
dir.ped
The name of the directory where file.ped can be found.
file.dat
The name of the .dat file. It should be tab-separated and have no header.
dir.dat
The name of the directory where file.dat can be found. Defaults to dir.ped.
dir.out
The name of the output directory where the resulting file is saved. The file will be named "Num.".
num.nonsnp.col
The number of leading columns in the .ped file that do not contain SNP values. These first columns hold non-SNP values (such as patient ID, gender, etc). For the MaCH1 input format, num.nonsnp.col=5; for PLINK it is 6 (due to the extra disease status column).
num.nonsnp.last.col
The number of trailing columns that do not correspond to geno values. For example, if the last column is the disease status (0s and 1s), set this variable to 1. If the last 2 columns correspond to confounding variables, set it to 2.
letter.encoding
Flag whether or not the encoding used for alleles is letters (A, C, T, G). If TRUE, an additional check is made that the alleles correspond to these letters, and warning messages are printed if other symbols appear instead.
ped.has.ext
Flag whether or not the file.ped name has a filename extension (ex. ".ped", ".txt"). This is necessary for naming the output file.
dat.has.ext
Flag whether or not file.dat name has a filename extension (ex. ".dat", ".txt").
remove.bad.genos
Flag whether or not to remove a geno if at least one of its values is not valid (ex. "2" when only letters are expected, or "NA", etc). Warning: set this to TRUE only if the CASE and CONTROL data have been merged into file.ped; otherwise some SNPs could be removed from CASE but not from CONTROL, generating two different .dat files.
save.ids.name
The file name to which patient IDs should be saved. If not empty, the patient IDs are saved into a separate file with this name. Since the dataset is generally split across many files, one chromosome each, and the patient IDs should be the same across these files, it is enough to extract the patient IDs once, when running this code on the smallest chromosome. For runs on all other chromosomes, leave save.ids.name="" to save time and avoid redundant work. The output file could be named "patients.fam".
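
As a rough illustration of how num.nonsnp.col, num.nonsnp.last.col, letter.encoding and remove.bad.genos relate to the columns of an input table, the base R sketch below works on a small made-up table. It is not the package's implementation. The toy data, the helper names (geno.columns, is.valid), the simplified one-column-per-geno layout, and the file and directory names in the final call are all assumptions made for the example.

```r
# Toy table (all values made up): 6 leading non-SNP columns as in the
# PLINK layout, 4 geno columns (simplified to one letter per column),
# and 1 trailing non-geno column (disease status).
ped <- data.frame(
  fam = c("F1", "F2", "F3"),
  id = c("P1", "P2", "P3"),
  pat = 0, mat = 0,
  sex = c(1, 2, 1),
  pheno = c(1, 2, 2),
  snp1 = c("A", "A", "G"),
  snp2 = c("G", "A", "G"),
  snp3 = c("C", "2", "T"),   # "2" is not a valid letter
  snp4 = c("C", "T", "T"),
  status = c(0, 1, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: positions of the geno columns, given the number
# of leading and trailing non-geno columns.
geno.columns <- function(n.col, num.nonsnp.col, num.nonsnp.last.col) {
  first <- num.nonsnp.col + 1
  last <- n.col - num.nonsnp.last.col
  if (first > last) integer(0) else seq(first, last)
}

geno.idx <- geno.columns(ncol(ped), num.nonsnp.col = 6, num.nonsnp.last.col = 1)
genos <- ped[, geno.idx, drop = FALSE]

# With letter encoding, only A, C, T and G count as valid symbols.
is.valid <- function(x) x %in% c("A", "C", "T", "G")
all.valid <- vapply(genos, function(col) all(is.valid(col)), logical(1))
bad.cols <- names(genos)[!all.valid]

# Analogue of remove.bad.genos = TRUE: drop any geno column that
# contains at least one invalid value.
clean <- genos[, setdiff(names(genos), bad.cols), drop = FALSE]

# Hypothetical call with made-up file and directory names. It is only
# attempted if the function is available in the current session.
if (exists("pre5.genos2numeric")) {
  pre5.genos2numeric(file.ped = "chr22.ped", dir.ped = "data",
                     file.dat = "chr22.dat", dir.out = "out",
                     num.nonsnp.col = 6, letter.encoding = TRUE,
                     remove.bad.genos = FALSE,
                     save.ids.name = "patients.fam")
}
```

In this toy table, the geno columns sit between the six leading columns and the single trailing status column, and only snp3 would be flagged because of the "2" entry.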