Calls MaCH1 on file.ped and file.dat to fill in missing genotypes. MaCH1 can be run in 2 different ways: 1. with Hapmap, and 2. without Hapmap. NOTE: In this implementation, do NOT run "with Hapmap"; leave ref.phase and ref.legend empty. When reference files are supplied, this program first runs MaCH1 on file.ped with Hapmap to fill in missing values for those SNPs that exist in the reference file. It then runs MaCH1 on the result without Hapmap to fill in all the remaining missing values. If the reference files ref.phase and ref.legend are not provided, the program runs MaCH1 without Hapmap only. To clean up any irregular MaCH output, use genos.clean or pre5.genos2numeric.
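The choice between the two modes depends only on whether reference files are given. The helper below is a hypothetical illustration of that decision and is not part of the package. It assumes that Hapmap is used only when BOTH ref.phase and ref.legend are non-empty, and it warns because this implementation recommends against the Hapmap pass.

```r
# Hypothetical helper (not part of the package): lists which MaCH1 passes would run.
# Assumption: the Hapmap pass requires BOTH reference files to be non-empty.
mach_passes <- function(ref.phase = "", ref.legend = "") {
  use_hapmap <- nzchar(ref.phase) && nzchar(ref.legend)
  if (use_hapmap) {
    warning("running 'with Hapmap' is not recommended in this implementation")
  }
  # Pass 1 (optional): fill SNPs that exist in the reference panel.
  # Pass 2 (always): fill all remaining missing genotypes without Hapmap.
  c(if (use_hapmap) "with Hapmap", "without Hapmap")
}

mach_passes()   # returns "without Hapmap"
```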
pre3.call.mach(file.dat, file.ped, dir.file, ref.phase = "", ref.legend = "",
dir.ref = "", dir.out, out.prefix = "result", chrom.num = "", num.iters = 2,
num.subjects = 200, step2.subjects = 50, empty = "0/0", resample = FALSE,
mach.loc = "/software/mach1")
file.dat: for example
M SNP1
M SNP2
- Space separated
- No header
- Column 1: consists of "M"
- Column 2: character SNP names
file.ped: for example
p1 p1 0 0 1 C/C N/N T/C ...
p2 p2 0 0 1 T/T A/C G/G ...
...
- Tab separated
- Alleles are separated by a slash '/' (IMPORTANT!)
- No header
- 5 non-SNP leading columns
- Col 1: sample/patient ID: some unique ID
- Col 2: family ID: can be the same as the patient ID
- Col 3 and Col 4: parents (mother/father): can all be 0
- Col 5: gender, 1 = male, 2 = female
- Col 6+: genotype information, with a slash separating the two alleles
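As an illustration of the two input layouts, the sketch below writes a toy file.dat and file.ped to a temporary directory with base R's `write.table`. The SNP names and genotype calls are made up. Using "0/0" as the missing-genotype code is an assumption based on the `empty = "0/0"` default in the usage line.

```r
# Toy data for illustration only: 2 subjects, 3 SNPs (names and calls are made up).
snps  <- c("SNP1", "SNP2", "SNP3")
genos <- rbind(c("C/C", "A/C", "T/C"),
               c("T/T", "0/0", "G/G"))   # "0/0": assumed missing code (cf. `empty`)

out_dir <- tempdir()

# file.dat: space separated, no header; column 1 is "M", column 2 the SNP name.
dat <- data.frame(type = "M", snp = snps)
write.table(dat, file.path(out_dir, "file.dat"), sep = " ",
            quote = FALSE, row.names = FALSE, col.names = FALSE)

# file.ped: tab separated, no header; 5 leading columns (sample ID, family ID,
# two parent columns, gender), then one slash-separated call per SNP.
ped <- data.frame(id = c("p1", "p2"), family = c("p1", "p2"),
                  mother = 0, father = 0, gender = c(1, 2), genos)
write.table(ped, file.path(out_dir, "file.ped"), sep = "\t",
            quote = FALSE, row.names = FALSE, col.names = FALSE)
```

`quote = FALSE` matters here. Without it, `write.table` would wrap every genotype in double quotes.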
dir.file: directory where file.ped can be found.
file.phase, obtained from the same website. The file must not be zipped.
dir.ref: directory where ref.phase and ref.legend can be found.
If num.subjects > 0, then num.subjects is appended to the prefix name.
genos.clean or pre5.genos2numeric.
=> file.ped.
If resample = TRUE, any sampled num.subjects entries produced by previous runs of this algorithm with the same file.dat, file.ped and num.subjects parameters are ignored. By default (resample = FALSE), subjects that have been sampled before are re-used.
MaCH1 is run in 2 steps. The first step is run on num.subjects individuals randomly chosen from the set. The file with the randomly chosen individuals is saved in the dir.file directory under a name based on file.ped and num.subjects. If this file already exists for the given num.subjects, the old file is re-used when resample = FALSE. When resample = TRUE, old files are ignored and the subjects are sampled anew. Step 1 of MaCH1 is run only if resample = TRUE or if the files that step 1 produces do not exist yet. Thus, if step 1 succeeds but step 2 crashes, calling this function again does not waste time re-running step 1.
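The re-use rules above amount to a simple file cache. The R sketch below shows that pattern and is not the package's own code. The function names `sample_subjects` and `need_step1` are hypothetical, and so is the cache name "<file.ped>.<num.subjects>", because the real file name is not spelled out here.

```r
# Hypothetical sketch of the sample-and-cache logic; function names and the
# cache file name "<file.ped>.<num.subjects>" are assumptions.
sample_subjects <- function(ped_path, dir_file, num_subjects,
                            resample = FALSE, seed = NULL) {
  cache <- file.path(dir_file, paste0(basename(ped_path), ".", num_subjects))
  if (file.exists(cache) && !resample) {
    return(list(path = cache, reused = TRUE))      # keep the earlier sample
  }
  ped <- read.table(ped_path, sep = "\t", header = FALSE,
                    colClasses = "character")
  if (!is.null(seed)) set.seed(seed)
  n <- min(num_subjects, nrow(ped))
  rows <- sort(sample.int(nrow(ped), n))           # random subset, original order
  write.table(ped[rows, , drop = FALSE], cache, sep = "\t",
              quote = FALSE, row.names = FALSE, col.names = FALSE)
  list(path = cache, reused = FALSE)
}

# Step 1 is (re)run only when resampling or when any step-1 output is missing,
# so a crash in step 2 does not force step 1 to be repeated.
need_step1 <- function(step1_outputs, resample = FALSE) {
  resample || !all(file.exists(step1_outputs))
}
```

Keying the cache on both the input file and num.subjects means that changing the sample size triggers a fresh draw automatically.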
The running time of the second step (without Hapmap) grows much faster than linearly with the number of subjects processed. Therefore, the second step is run on batches of step2.subjects subjects at a time.
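Batching means splitting the subjects into consecutive groups of at most step2.subjects. The hypothetical R helper below shows one way to do this. As a purely illustrative assumption, suppose cost grew with the square of the batch size. Then four batches of 50 would cost 4 × 50² = 10,000 units, compared with 200² = 40,000 for a single run of 200.

```r
# Hypothetical helper: split subject indices 1..n_subjects into consecutive
# batches of at most `step2_subjects` (mirrors the step2.subjects argument).
make_batches <- function(n_subjects, step2_subjects = 50) {
  idx <- seq_len(n_subjects)
  unname(split(idx, ceiling(idx / step2_subjects)))
}

batches <- make_batches(120, 50)
lengths(batches)   # 50 50 20
```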
For debugging, a subdirectory structure is created in dir.out, in a directory named 'working'.
Two output files are produced in dir.out: a .ped file with no missing values, named <out.prefix><chrom.num>.mlgeno, and a .dat file (the same as the input file.dat).
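To find the imputed file after a run, its path can be built from the arguments. The hypothetical helper below does this. Exactly how num.subjects is appended to the prefix is an assumption: here it is pasted onto out.prefix with no separator, before chrom.num. Check this against real output before relying on it.

```r
# Hypothetical helper: expected path of the imputed genotype file.
# Assumption: when num.subjects > 0 it is pasted onto out.prefix with no
# separator, before chrom.num.
mlgeno_path <- function(dir.out, out.prefix = "result", chrom.num = "",
                        num.subjects = 0) {
  prefix <- if (num.subjects > 0) paste0(out.prefix, num.subjects) else out.prefix
  file.path(dir.out, paste0(prefix, chrom.num, ".mlgeno"))
}

mlgeno_path("out", chrom.num = "22")   # "out/result22.mlgeno"
```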
pre2.remove.genos, pre2.remove.genos.batch, pre3.call.mach.batch, pre4.combine.case.control, pre4.combine.case.control.batch
print("See the demo 'gendemo'.")