Usage
pre6.merge.genos(dir.file, dir.dat = dir.file, dir.out = dir.file,
file.out = "CGEM_Breast_complete.txt", dat.out = "CGEM_Breast_complete.dat",
prefix.file, prefix.dat, key.file = "", key.dat = "", ending.file = ".txt",
ending.dat = ".dat", num.nonsnp.col = 0, num.nonsnp.last.col = 1,
weak.check = FALSE, plan = FALSE)
Arguments
dir.file
The name of the directory containing the files with geno information. In each of these files, the last column must be the disease status.
dir.dat
The name of the directory containing the .dat files. Each .dat file should be a list of geno IDs, one ID per line, with no header. Defaults to the same directory as dir.file.
dir.out
The name of the directory where the two output files will be written. Defaults to the same directory as dir.file.
file.out
The name of the output file that will contain the combined geno information; its last column will be the disease status.
dat.out
The name of the output file that will contain the combined list of SNP IDs from all the input .dat files, in the same order as the SNP columns of file.out.
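To make the shape of the two outputs concrete, here is a minimal sketch in plain R of what the combining step could look like. It is not the package's implementation. It assumes that each geno table is a numeric matrix with the same subjects in the same row order in every file, that there are no leading non-SNP columns (num.nonsnp.col = 0), that the disease status is the single last column, and that each .dat file has already been read into a character vector of IDs. The helper name merge_geno_tables is hypothetical.

```r
# Hypothetical helper, not part of the package: merge per-chromosome geno
# matrices (subjects in rows, disease status in the last column) and the
# matching vectors of SNP IDs read from the .dat files.
merge_geno_tables <- function(genos, dats) {
  stopifnot(length(genos) == length(dats))
  status <- genos[[1]][, ncol(genos[[1]])]
  for (g in genos) {
    # Every file must carry the same disease status for the same subjects.
    stopifnot(identical(g[, ncol(g)], status))
  }
  # Drop the trailing status column from each table, then bind SNP columns.
  snp_parts <- lapply(genos, function(g) g[, -ncol(g), drop = FALSE])
  combined <- do.call(cbind, snp_parts)
  # Status goes back as the single last column; SNP IDs are concatenated
  # in the same file order as the SNP columns.
  list(geno = cbind(combined, status = status),
       dat = unlist(dats, use.names = FALSE))
}

# Toy data: two subjects; chromosome 5 has SNPs rs1 and rs2,
# chromosome 21 has SNP rs3; the last column is the disease status.
g5  <- matrix(c(1, 2,  0, 1,  1, 0), nrow = 2)
g21 <- matrix(c(2, 2,  1, 0), nrow = 2)
res <- merge_geno_tables(list(g5, g21), list(c("rs1", "rs2"), "rs3"))
```

Here res$geno is a 2 x 4 matrix (three SNP columns plus the status) and res$dat is c("rs1", "rs2", "rs3"), so the k-th entry of the ID list names the k-th SNP column.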
prefix.file
The string that appears at the beginning of all the geno input file names. Each file name must begin with prefix.file, immediately followed by the chromosome number. For example, if the dir.file directory contains files named like:
"cgem_breast.21.pure.txt"
"cgem_breast.5.pure.txt"
"cgem_breast.24_and_25.txt"
then prefix.file must be "cgem_breast."
prefix.dat
The string that appears at the beginning of all the .dat file names. As with prefix.file, it must be immediately followed by the chromosome number.
key.file
A keyword in the names of the geno files that distinguishes them from other files in the same directory.
key.dat
A keyword in the names of the .dat files that distinguishes them from other files in the same directory.
ending.file
The string with which all the geno filenames end.
ending.dat
The string with which all the .dat filenames end.
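The naming rules for prefix.file/prefix.dat, key.file/key.dat and ending.file/ending.dat can be illustrated with a small base-R sketch. The helper select_chrom_files is hypothetical, and sorting the selected files by the chromosome number that follows the prefix is an assumption about how the combination order is chosen. A file whose name does not continue with digits after the prefix would get an NA chromosome in this sketch.

```r
# Hypothetical helper: select files that start with `prefix`, contain `key`
# (if given) and end with `ending`, then order them by chromosome number.
select_chrom_files <- function(files, prefix, key = "", ending = ".txt") {
  has_key <- if (nzchar(key)) grepl(key, files, fixed = TRUE) else rep(TRUE, length(files))
  keep <- startsWith(files, prefix) & endsWith(files, ending) & has_key
  chosen <- files[keep]
  # The digits immediately after the prefix are taken as the chromosome number
  # ("24_and_25" therefore sorts as chromosome 24).
  rest <- substring(chosen, nchar(prefix) + 1)
  chrom <- as.integer(sub("^([0-9]+).*$", "\\1", rest))
  chosen[order(chrom)]
}

files <- c("cgem_breast.21.pure.txt", "cgem_breast.5.pure.txt",
           "cgem_breast.24_and_25.txt", "notes.txt", "cgem_breast.5.dat")
all_txt   <- select_chrom_files(files, prefix = "cgem_breast.")
pure_only <- select_chrom_files(files, prefix = "cgem_breast.", key = "pure")
dats      <- select_chrom_files(files, prefix = "cgem_breast.", ending = ".dat")
```

With these inputs, all_txt is c("cgem_breast.5.pure.txt", "cgem_breast.21.pure.txt", "cgem_breast.24_and_25.txt"): chromosome 5 comes before 21 because the order is numeric rather than alphabetical, and "notes.txt" is dropped because it lacks the prefix.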
num.nonsnp.col
The number of leading columns in the .ped files that do not contain SNP values, such as patient ID or gender. For the MaCH1 input format, num.nonsnp.col = 5; for PLINK it is 6 (because of the extra disease status column).
num.nonsnp.last.col
The number of trailing columns that do not contain geno values. For example, if the last column is the disease status (0s and 1s), set this argument to 1; if the last two columns hold confounding variables, set it to 2.
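As a sketch of how the two column counts partition a row, the SNP columns are whatever remains between the num.nonsnp.col leading columns and the num.nonsnp.last.col trailing ones. The helper split_columns is hypothetical, not the package's code; the example uses an illustrative 10-column file with 6 leading columns, as for PLINK, and one trailing disease status column.

```r
# Hypothetical helper: return the column indices of the leading non-SNP
# columns, the SNP columns and the trailing non-SNP columns.
split_columns <- function(n_cols, num.nonsnp.col = 0, num.nonsnp.last.col = 1) {
  n_snp <- n_cols - num.nonsnp.col - num.nonsnp.last.col
  stopifnot(n_snp >= 0)
  list(lead = seq_len(num.nonsnp.col),
       snp  = num.nonsnp.col + seq_len(n_snp),
       tail = num.nonsnp.col + n_snp + seq_len(num.nonsnp.last.col))
}

# 10 columns: 6 leading (PLINK-style), 3 SNP columns, 1 status column.
cols <- split_columns(10, num.nonsnp.col = 6, num.nonsnp.last.col = 1)
# Defaults (num.nonsnp.col = 0, num.nonsnp.last.col = 1) on a 5-column file.
cols_default <- split_columns(5)
```

Here cols$snp holds columns 7 to 9 and cols$tail is column 10; with the defaults, columns 1 to 4 are SNPs and column 5 is the disease status.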
weak.check
By default, the function checks that the number of genos in each geno file matches the corresponding .dat file, so it expects the same number of geno and .dat files. To bypass these checks, set weak.check = TRUE. In that case, only the total number of genos in the final merged geno and .dat files is checked for consistency, and a mismatch produces only a warning message.
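A minimal sketch of the two checking modes follows. It assumes that the strict mode stops with an error (the description above only states that weak.check = TRUE downgrades problems to a warning). The function check_counts and its inputs, the SNP count of each geno file and the ID count of each .dat file, are hypothetical.

```r
# Hypothetical sketch: strict mode compares counts file by file and stops;
# weak mode compares only the totals and warns.
check_counts <- function(n_snp_per_file, n_ids_per_dat, weak.check = FALSE) {
  if (!weak.check) {
    if (length(n_snp_per_file) != length(n_ids_per_dat))
      stop("different number of geno and .dat files")
    bad <- which(n_snp_per_file != n_ids_per_dat)
    if (length(bad) > 0)
      stop("SNP count mismatch in file(s): ", paste(bad, collapse = ", "))
  } else if (sum(n_snp_per_file) != sum(n_ids_per_dat)) {
    warning("total SNP count differs from total number of .dat IDs")
  }
  invisible(TRUE)
}

ok <- check_counts(c(3, 2), c(3, 2))
# Per-file counts differ but totals agree: an error in strict mode only.
strict_err <- tryCatch(check_counts(c(3, 2), c(2, 3)), error = function(e) e)
weak_ok <- check_counts(c(3, 2), c(2, 3), weak.check = TRUE)
# Totals differ: weak mode warns instead of stopping.
weak_warn <- tryCatch(check_counts(c(3, 2), c(2, 2), weak.check = TRUE),
                      warning = function(w) w)
```

The second and third calls show the trade-off: per-file mismatches that cancel out in the totals pass silently under weak.check = TRUE.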
plan
Flag. If TRUE, the function does not merge anything; it only prints which files it plans to combine and in which order. This is useful as a quick preview, because the combination step itself can take a long time for large files.
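A dry run of this kind can be sketched as follows. plan_merge is a hypothetical helper that reads no files and only reports how geno and .dat files would be paired and in which order; it assumes both file lists are already sorted by chromosome, and the .dat file names in the example are illustrative.

```r
# Hypothetical dry run: describe the merge order without reading any file.
plan_merge <- function(geno_files, dat_files) {
  if (length(geno_files) != length(dat_files))
    stop("need one .dat file per geno file")
  steps <- sprintf("%d: %s with %s", seq_along(geno_files), geno_files, dat_files)
  cat(steps, sep = "\n")
  invisible(steps)
}

steps <- plan_merge(c("cgem_breast.5.pure.txt", "cgem_breast.21.pure.txt"),
                    c("cgem_breast.5.dat", "cgem_breast.21.dat"))
```

This prints one numbered line per pair, for example "1: cgem_breast.5.pure.txt with cgem_breast.5.dat", which is enough to spot a missing or mis-ordered chromosome before starting a long merge.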