dir.ped
, categorizes genotype data into 3 levels: 1, 2, 3. Genotypes with two different alleles are encoded as "2". Other genotypes are encoded as "1" or "3", where the most frequent genotype is "1". No missing values are allowed, so this must be done after imputation. Genotype values should use the letters A, T, C, G if letter.encoding=TRUE.
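The three-level encoding described above can be illustrated for a single SNP column with a minimal sketch. The helper name encode_snp_sketch and the two-letter genotype strings (e.g. "AG") are assumptions made for illustration only. This is not the package's implementation, and the actual file layout may differ.

```r
# Hypothetical helper, not part of the package: encodes one SNP column.
# Assumes each genotype is a two-letter string such as "AA" or "AG".
encode_snp_sketch <- function(genos) {
  # No missing values allowed (encoding must happen after imputation)
  stopifnot(is.character(genos), !anyNA(genos), all(nchar(genos) == 2))
  a1 <- substr(genos, 1, 1)
  a2 <- substr(genos, 2, 2)
  # Letter encoding: only A, T, C, G are accepted
  stopifnot(all(c(a1, a2) %in% c("A", "T", "C", "G")))

  het <- a1 != a2                 # two different alleles
  codes <- integer(length(genos))
  codes[het] <- 2L                # heterozygous genotypes become 2

  hom <- genos[!het]
  if (length(hom) > 0) {
    # Most frequent remaining genotype becomes 1, the other becomes 3
    # (ties are resolved arbitrarily in this sketch)
    counts <- sort(table(hom), decreasing = TRUE)
    most_frequent <- names(counts)[1]
    codes[!het] <- ifelse(hom == most_frequent, 1L, 3L)
  }
  codes
}

snp <- c("AA", "AG", "AA", "GG", "GA", "AA")
encode_snp_sketch(snp)
# [1] 1 2 1 3 2 1
```

Here "AA" is the most frequent of the remaining genotypes and is mapped to 1, "GG" is mapped to 3, and both "AG" and "GA" are mapped to 2.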
pre5.genos2numeric.batch(dir.ped, dir.dat = dir.ped, dir.out, prefix.ped,
prefix.dat, key.ped = "", key.dat = "", ending.ped = ".txt", ending.dat = ".dat",
num.nonsnp.col = 2, num.nonsnp.last.col = 1, letter.encoding = TRUE,
ped.has.ext = TRUE, dat.has.ext = TRUE, remove.bad.genos = FALSE,
save.ids.name = "patients.fam")
num.nonsnp.col=5, for PLINK it is 6 (due to the extra disease status column).
file.ped name has a filename extension (e.g. ".ped", ".txt"). This is necessary for naming the output file.
file.ped (otherwise we do not want to remove some SNPs from CASE but not from CONTROL and so generate two different .dat files).
The following files will be produced for each chromosome in the directory dir.ped:
- _num - in the dir.out directory, the resultant binary file: the SNP columns + last columns (but no user IDs will be recorded), named with the filename extension of file.ped.
- _num.dat - in the dir.out directory, the corresponding .dat file; it will be different from the original if remove.bad.genos=TRUE.
- the patient IDs, if save.ids.name is not empty ("").
pre4.combine.case.control, pre4.combine.case.control.batch, pre5.genos2numeric
print("See the demo 'gendemo'.")