GRAB (version 0.2.4)

GRAB.makePlink: Convert genotype matrix to PLINK format files

Description

Converts a numeric genotype matrix to PLINK text files (PED and MAP format) for use with PLINK software and other genetic analysis tools.

Usage

GRAB.makePlink(
  GenoMat,
  OutputPrefix,
  A1 = "G",
  A2 = "A",
  CHR = NULL,
  BP = NULL,
  Pheno = NULL,
  Sex = NULL
)

Value

Character message confirming file creation location.

Arguments

GenoMat

Numeric genotype matrix (n×m) with values 0, 1, 2, or -9. Rows = subjects, columns = markers. Row and column names are required.

OutputPrefix

Output file prefix including path (without extension).

A1

Allele 1 character, usually minor/ALT allele (default: "G").

A2

Allele 2 character, usually major/REF allele (default: "A").

CHR

Chromosome numbers for markers (default: all chromosome 1).

BP

Base positions for markers (default: 1:m).

Pheno

Phenotype values for subjects (default: all missing as -9).

Sex

Sex codes for subjects (default: all coded as 1).
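
As a rough sketch of how the optional arguments line up with the matrix dimensions: CHR and BP take one value per marker (column), while Pheno and Sex take one value per subject (row). The toy values and the "toyPLINK" prefix below are illustrative assumptions, and the call is guarded so the snippet still runs where GRAB is not installed.

```r
# Toy genotype matrix: 4 subjects (rows) x 3 markers (columns)
set.seed(1)
GenoMat <- matrix(rbinom(4 * 3, 2, 0.3), nrow = 4, ncol = 3)
rownames(GenoMat) <- paste0("Subj-", 1:4)
colnames(GenoMat) <- paste0("SNP-", 1:3)

# Per-marker arguments: one entry per column (illustrative values)
CHR <- c(1, 1, 2)
BP <- c(1000, 2000, 500)

# Per-subject arguments: one entry per row (illustrative values)
Pheno <- c(0.5, -1.2, 0.3, 2.1)
Sex <- c(1, 2, 2, 1)

# Check that each vector matches the dimension it describes
stopifnot(
  length(CHR) == ncol(GenoMat), length(BP) == ncol(GenoMat),
  length(Pheno) == nrow(GenoMat), length(Sex) == nrow(GenoMat)
)

# Only call the package function when GRAB is available
if (requireNamespace("GRAB", quietly = TRUE)) {
  GRAB::GRAB.makePlink(GenoMat, file.path(tempdir(), "toyPLINK"),
                       CHR = CHR, BP = BP, Pheno = Pheno, Sex = Sex)
}
```

Checking the lengths up front is a design choice of this sketch, not something the documentation says the function does itself.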

Details

Genotype Encoding:

  • 0, 1, 2 → number of copies of the counted allele (A2 under the mapping below)

  • -9 → missing genotype (coded as "00" in PED)

  • A1="G", A2="A": 0→"GG", 1→"AG", 2→"AA", -9→"00"
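
The mapping above can be sketched in base R as follows. `encode_genotype` is a hypothetical helper written for illustration only; it is not part of GRAB and may differ from how the package builds the PED strings internally. It follows the documented example, in which the numeric value counts copies of A2.

```r
# Hypothetical helper, not part of GRAB: maps numeric genotypes to
# two-character strings following the documented example mapping.
encode_genotype <- function(g, A1 = "G", A2 = "A") {
  codes <- c("0"  = paste0(A1, A1),  # 0 copies of A2 -> "GG"
             "1"  = paste0(A2, A1),  # 1 copy of A2   -> "AG"
             "2"  = paste0(A2, A2),  # 2 copies of A2 -> "AA"
             "-9" = "00")            # missing genotype
  out <- unname(codes[as.character(g)])
  if (anyNA(out)) stop("genotypes must be 0, 1, 2, or -9")
  out
}

encode_genotype(c(0, 1, 2, -9))
# [1] "GG" "AG" "AA" "00"
```

Any value outside 0, 1, 2, and -9 (including NA) triggers the error, which mirrors the value set required of GenoMat.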

Output Files:

  • .ped: Pedigree file with genotype data

  • .map: Marker map file with positions
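
To inspect the two files from R, a sketch along these lines may help. It assumes the standard PLINK text layout: four columns in the .map file (chromosome, marker ID, genetic distance, base-pair position) and six leading columns in the .ped file (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) followed by two allele columns per marker. The stand-in files and the "toyLayout" prefix are written by hand here so the snippet runs without GRAB; the exact contents GRAB writes may differ.

```r
# Stand-in files written by hand (2 subjects, 2 markers); replace `prefix`
# with your own OutputPrefix to inspect real output instead.
prefix <- file.path(tempdir(), "toyLayout")
writeLines(c("1 SNP-1 0 1",
             "1 SNP-2 0 2"), paste0(prefix, ".map"))
writeLines(c("Subj-1 Subj-1 0 0 1 -9 G G A G",
             "Subj-2 Subj-2 0 0 1 -9 A A 0 0"), paste0(prefix, ".ped"))

# MAP: one row per marker, four columns in the standard PLINK layout
map <- read.table(paste0(prefix, ".map"), stringsAsFactors = FALSE,
                  col.names = c("CHR", "SNP", "CM", "BP"))

# PED: one row per subject; read everything as character so that
# allele codes such as "0" are not converted to numbers
ped <- read.table(paste0(prefix, ".ped"), stringsAsFactors = FALSE,
                  colClasses = "character")

# Two allele columns per marker follow the six leading columns
n_markers <- (ncol(ped) - 6) / 2
stopifnot(n_markers == nrow(map))
```

Comparing the marker count implied by the .ped file against the number of rows in the .map file is a quick consistency check before handing the files to PLINK.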

Downstream Processing:

# Convert to binary format
plink --file prefix --make-bed --out prefix

# Convert to raw format
plink --bfile prefix --recode A --out prefix_raw

# Convert to BGEN format
plink2 --bfile prefix --export bgen-1.2 bits=8 ref-first --out prefix_bgen

# Create BGEN index
bgenix -g prefix_bgen.bgen --index

Examples

### Step 1: simulate a numeric genotype matrix
n <- 1000
m <- 20
MAF <- 0.3
set.seed(123)
GenoMat <- matrix(rbinom(n * m, 2, MAF), n, m)
rownames(GenoMat) <- paste0("Subj-", 1:n)
colnames(GenoMat) <- paste0("SNP-", 1:m)
OutputDir <- tempdir()
outputPrefix <- file.path(OutputDir, "simuPLINK")

### Step 2(a): make PLINK files without missing genotype
GRAB.makePlink(GenoMat, outputPrefix)

### Step 2(b): make PLINK files with genotype missing rate of 0.1
indexMissing <- sample(n * m, 0.1 * n * m)
GenoMat[indexMissing] <- -9
GRAB.makePlink(GenoMat, outputPrefix)

## The following commands are run in a shell, from the directory containing the output files
# plink --file simuPLINK --make-bed --out simuPLINK
# plink --bfile simuPLINK --recode A --out simuRAW
# plink2 --bfile simuPLINK --export bgen-1.2 bits=8 ref-first --out simuBGEN
# Note: UK Biobank uses 'ref-first'
# bgenix -g simuBGEN.bgen --index

