For the files in dir.ped, does the same thing as pre5.genos2numeric.batch, except that it leaves genotypes the way they are, without categorizing them into 3 levels. Removes all SNPs that have missing or bad values. Intended to be run after imputation, to ensure consistency. Genotype values should use the letters A, T, C, G if letter.encoding = TRUE.
genos.clean.batch(dir.ped, dir.dat = dir.ped, dir.out, prefix.ped, prefix.dat,
key.ped = "", key.dat = "", ending.ped = ".txt", ending.dat = ".dat",
num.nonsnp.col = 2, num.nonsnp.last.col = 1, letter.encoding = TRUE,
save.ids.name = "patients.fam")
Calls genos.clean for all the files in the directory, so that users do not have to call that function as many times as there are chromosomes. For all the .ped files in the directory dir.ped that start with prefix.ped, contain key.ped, and end with ending.ped, and for the similarly selected .dat files, this function removes all the SNPs that were not properly imputed by MaCH, so that no missing or unexpected values remain. This is needed because MaCH results might contain unexpected symbols (for example, '2' can appear instead of A, T, C, or G). The check is only effective when letter.encoding = TRUE. The reason to call this function rather than pre5.genos2numeric is that you might wish to run other software packages on the fully imputed data, and those do not need the data categorized into 3 levels.
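The selection and cleaning steps described above can be sketched in base R. This is only an illustration, not the package's implementation: the helper names match.files and drop.bad.snps, the example file names, and the "A/G" style genotype strings are all assumptions made for the example.

```r
# Hypothetical helper (not part of the package): keep file names that
# start with `prefix`, contain `key`, and end with `ending`.
match.files <- function(files, prefix, key = "", ending = ".txt") {
  has.key <- if (nzchar(key)) {
    grepl(key, files, fixed = TRUE)
  } else {
    rep(TRUE, length(files))  # an empty key matches every file
  }
  files[startsWith(files, prefix) & has.key & endsWith(files, ending)]
}

# Hypothetical helper (not part of the package): drop every SNP column
# that contains a missing value or a symbol other than A, T, C, G.
# Assumes each genotype is stored as a string such as "A/G".
drop.bad.snps <- function(genos) {
  ok <- apply(genos, 2, function(col) {
    all(!is.na(col) & grepl("^[ATCG]/[ATCG]$", col))
  })
  genos[, ok, drop = FALSE]
}

# Made-up file names, one pair per chromosome.
files <- c("mach1.chr1.txt", "mach1.chr2.txt",
           "mach1.chr1.dat", "other.chr1.txt")
ped.files <- match.files(files, prefix = "mach1", key = "chr",
                         ending = ".txt")

# Made-up genotypes: snp2 contains the stray symbol '2', snp3 has an NA.
genos <- cbind(
  snp1 = c("A/A", "A/G", "G/G"),
  snp2 = c("C/T", "2/T", "T/T"),
  snp3 = c("C/C", NA, "C/G")
)
clean <- drop.bad.snps(genos)
print(ped.files)
print(colnames(clean))
```

In this sketch only snp1 survives, which mirrors why the cleaned .dat file can differ from the original: it has to list the same reduced set of SNPs.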
Outputs the following files:
_clean - in the dir.out directory, the resulting file: the SNP columns + last columns (no user IDs are recorded).
_clean.dat - in the dir.out directory, the corresponding .dat file; it differs from the original if any bad SNPs were removed.
- the patient IDs, if save.ids.name is not the empty string "".
pre3.call.mach, pre5.genos2numeric, pre5.genos2numeric.batch
print("See demo for pre5.genos2numeric()")