prepPed is used to prepare a ped file for loading into GenABEL. However, GenABEL requires individual IDs that are unique across the whole file, not only within each family. Furthermore, the numeric allele coding 1,2,3,4 is not accepted. To fix this, convertPed can be run before prepPed. It creates unique IDs, performs the necessary allele recoding, and can optionally select and reorder SNPs. convertPed also updates the corresponding map file.

convertPed(ped.infile, map.infile, ped.outfile, map.outfile, create.unique.id = FALSE,
  convert, snp.select = NULL, choose.lines = NULL, col.sep = " ",
  ask = TRUE, blank.lines.skip = TRUE, verbose = TRUE)
col.sep specifies the separator that splits the columns in ped.infile. By default, col.sep = " " (space). To split at any whitespace or blank character, set col.sep = "[[:space:]]" or col.sep = "[[:blank:]]".

With blank.lines.skip = TRUE (the default), convertPed ignores blank lines in ped.infile and map.infile.

The output of convertPed is the converted ped file and the modified map file.
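
The effect of the different col.sep values can be illustrated with base R alone. The sketch below uses strsplit on a made-up line that mixes a tab with single spaces. It only shows how the separator patterns behave as regular expressions and does not claim to reproduce how convertPed splits lines internally.

```r
# A hypothetical ped line fragment that mixes a tab and single spaces
line <- "1104\t1 2 3 1 2"

# Splitting at a literal space leaves the tab-joined fields together
by_space <- strsplit(line, " ")[[1]]

# The POSIX class [[:space:]] matches spaces and tabs alike
by_any <- strsplit(line, "[[:space:]]")[[1]]

length(by_space)  # 5 fields, the first one is "1104\t1"
length(by_any)    # 6 fields
```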
convertPed assumes a standard ped file as input. The format of the ped file should look something like this:
1104 1 2 3 1 2 4 1 3 2 1 1
1104 2 0 0 1 1 4 1 2 2 4 1
1104 3 0 0 2 1 0 0 0 0 0 0
1105 1 2 3 2 2 1 1 2 2 4 1
1105 2 0 0 1 1 1 1 2 2 1 1
1105 3 0 0 2 1 1 1 3 2 4 4

The column values are: Family ID, Individual ID, Father's ID, Mother's ID, Sex (1 = male, 2 = female; alternatively 1 = male, 0 = female), and Case-control status (1 = controls, 2 = cases; alternatively 0 = controls, 1 = cases). Columns 7 and onwards contain the genotype data, with alleles in separate columns, so that two columns represent one SNP. A "0" denotes missing data.
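
The ped layout described above can be inspected with a few lines of base R. This is a minimal sketch that reads the six example rows from an in-memory character vector. The column names family, id, father, mother, sex and status are labels chosen here for illustration and are not part of the ped format. It shows why the individual IDs need to be made unique: they repeat across families even though each family/ID pair is distinct.

```r
# The six example rows from the ped file shown above
ped_lines <- c(
  "1104 1 2 3 1 2 4 1 3 2 1 1",
  "1104 2 0 0 1 1 4 1 2 2 4 1",
  "1104 3 0 0 2 1 0 0 0 0 0 0",
  "1105 1 2 3 2 2 1 1 2 2 4 1",
  "1105 2 0 0 1 1 1 1 2 2 1 1",
  "1105 3 0 0 2 1 1 1 3 2 4 4"
)
ped <- read.table(text = ped_lines, header = FALSE)

# Illustrative names for the six leading columns (not part of the format)
names(ped)[1:6] <- c("family", "id", "father", "mother", "sex", "status")

# Individual IDs repeat across families ...
ids_repeat <- any(duplicated(ped$id))

# ... but the combination of family and individual ID is unique
pairs_repeat <- any(duplicated(ped[c("family", "id")]))

# Two allele columns per SNP after the six leading columns
n_snps <- (ncol(ped) - 6) / 2
```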
The corresponding map file should look something like this:
Chromosome SNP-identifier Base-pair-position
1 RS9629043 554636
1 RS12565286 711153
1 RS12138618 740098

Alternatively, the map file can contain four columns. The column values should then be: Chromosome, SNP-identifier, Genetic-distance, Base-pair-position. A header must be added to the map file if it does not already have one.
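
If a map file lacks the header row, one way to add it is sketched below in base R. The header-less input is hypothetical (it reuses the three SNP rows from the example), the output goes to a temporary file, and the header names are simply those shown in the example map file.

```r
# Hypothetical header-less, three-column map content
map_lines <- c(
  "1 RS9629043 554636",
  "1 RS12565286 711153",
  "1 RS12138618 740098"
)
map <- read.table(text = map_lines, header = FALSE, stringsAsFactors = FALSE)

# Header names taken from the example map file
header <- c("Chromosome", "SNP-identifier", "Base-pair-position")

# Write the map back out with the header as the first line
out_file <- tempfile(fileext = ".map")
write.table(map, file = out_file, quote = FALSE, sep = " ",
            row.names = FALSE, col.names = header)

written <- readLines(out_file)
```

For a four-column map file, the header vector would need the Genetic-distance name in the third position.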
After creating unique individual IDs and recoding the SNP alleles from 1,2,3,4 to A,C,G,T (using convertPed with options create.unique.id = TRUE and convert = "1234_to_ACGT"), the ped file above should look like this:
1104 1104_1 1104_2 1104_3 1 2 T A G C A A
1104 1104_2 0 0 1 1 T A C C T A
1104 1104_3 0 0 2 1 0 0 0 0 0 0
1105 1105_1 1105_2 1105_3 2 2 A A C C T A
1105 1105_2 0 0 1 1 A A C C A A
1105 1105_3 0 0 2 1 A A G C T T
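
The transformation from the original to the converted example can be reproduced on a small scale in base R. The sketch below is not Haplin's implementation. It assumes, based only on the example output, that unique IDs are formed as family ID, underscore, individual ID, that parental IDs of 0 stay 0, and that the missing-allele code 0 is left unchanged. The helper make_id is a name invented for this illustration.

```r
# The six example rows of the original ped file
ped_lines <- c(
  "1104 1 2 3 1 2 4 1 3 2 1 1",
  "1104 2 0 0 1 1 4 1 2 2 4 1",
  "1104 3 0 0 2 1 0 0 0 0 0 0",
  "1105 1 2 3 2 2 1 1 2 2 4 1",
  "1105 2 0 0 1 1 1 1 2 2 1 1",
  "1105 3 0 0 2 1 1 1 3 2 4 4"
)
# Read everything as character so that codes are treated as labels
ped <- read.table(text = ped_lines, header = FALSE, colClasses = "character")

# Hypothetical helper: prefix an ID with its family ID, keeping "0" as is
make_id <- function(fam, id) ifelse(id == "0", "0", paste(fam, id, sep = "_"))

# Columns 2-4 hold the individual, father and mother IDs
fam <- ped[[1]]
ped[2:4] <- lapply(ped[2:4], function(id) make_id(fam, id))

# Recode alleles 1,2,3,4 to A,C,G,T; "0" (missing) is kept unchanged
allele_map <- c("0" = "0", "1" = "A", "2" = "C", "3" = "G", "4" = "T")
geno_cols <- 7:ncol(ped)
ped[geno_cols] <- lapply(ped[geno_cols], function(a) unname(allele_map[a]))

# Collapse each row back into a space-separated ped line
converted <- do.call(paste, c(unname(as.list(ped)), sep = " "))
```

On the example data, the elements of converted match the six converted rows shown above.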
Web Site: http://folk.uib.no/gjessing/genetics/software/haplin/
lineByLine, Haplin:::lineConvert, snpPos, prepPed, convert.snp.ped
## Not run:
#
# # Create unique individual IDs and recode SNP alleles from 1,2,3,4 to A,C,G,T
# convertPed(ped.infile = "mygwas.ped", map.infile = "mygwas.map",
# ped.outfile = "mygwas_modified.ped", map.outfile = "mygwas_modified.map",
# create.unique.id = TRUE, convert = "1234_to_ACGT", ask = TRUE)
#
# ## End(Not run)