
GRAB (version 0.2.2)

GRAB.makePlink: Make PLINK files using a numeric R matrix

Description

Make PLINK text files (PED and MAP) from a numeric genotype matrix GenoMat coded as 0, 1, 2, or -9 (missing). The row names of GenoMat are subject IDs and the column names are marker IDs.

Usage

GRAB.makePlink(
  GenoMat,
  OutputPrefix,
  A1 = "G",
  A2 = "A",
  CHR = NULL,
  BP = NULL,
  Pheno = NULL,
  Sex = NULL
)

Value

PLINK text files (PED and MAP) are written with the prefix OutputPrefix, that is, to OutputPrefix.ped and OutputPrefix.map. If A1 is "G" and A2 is "A", genotypes 0, 1, 2, and -9 are coded as "GG", "AG", "AA", and "00" (missing), respectively. If PLINK binary files (BED, BIM, and FAM) are required, download the PLINK software and run it with the --make-bed option. See the Details section for downstream processing.
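The coding rule stated above can be sketched as a small R helper. `codeGenotype` is a hypothetical name used only for illustration and is not part of GRAB; it simply reproduces the documented mapping 0 to A1A1, 1 to A2A1, 2 to A2A2, and -9 to "00". In the PED file itself, the two alleles of each genotype are normally written as separate whitespace-delimited columns.

```r
# Hypothetical helper (not part of GRAB): map numeric genotypes to the
# two-letter allele strings described in the Value section.
codeGenotype <- function(g, A1 = "G", A2 = "A") {
  # Lookup table keyed by the genotype value written as a string
  lookup <- c("0"  = paste0(A1, A1),
              "1"  = paste0(A2, A1),
              "2"  = paste0(A2, A2),
              "-9" = "00")
  key <- as.character(g)
  if (any(!key %in% names(lookup))) stop("genotypes must be 0, 1, 2, or -9")
  unname(lookup[key])
}

codeGenotype(c(0, 1, 2, -9))
```

With the default alleles this returns "GG" "AG" "AA" "00", matching the example in the text.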

Arguments

GenoMat

a numeric n*m genotype matrix (0,1,2,-9). Each row is for one subject and each column is for one marker. Row names of subject IDs and column names of marker IDs are required.
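A minimal sketch of these input requirements as an R check follows. `checkGenoMat` is a hypothetical helper, not a GRAB function, and GRAB's own validation may differ; in this sketch NA values are rejected because missing genotypes are expected to be coded as -9.

```r
# Hypothetical check (not GRAB's own code) of the GenoMat requirements:
# a numeric matrix, values in {0, 1, 2, -9}, and row/column names present.
checkGenoMat <- function(GenoMat) {
  if (!is.matrix(GenoMat) || !is.numeric(GenoMat))
    stop("GenoMat must be a numeric matrix")
  # NA is not in the allowed set, so NA entries also trigger this error
  if (!all(GenoMat %in% c(0, 1, 2, -9)))
    stop("GenoMat may only contain 0, 1, 2, or -9")
  if (is.null(rownames(GenoMat)) || is.null(colnames(GenoMat)))
    stop("GenoMat needs subject IDs as row names and marker IDs as column names")
  invisible(TRUE)
}
```

Running such a check before GRAB.makePlink gives an early, specific error instead of a malformed PED file.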

OutputPrefix

a character, prefix of the PLINK files to output (including path).

A1

a character to specify allele 1 (default="G"), usually minor (ALT).

A2

a character to specify allele 2 (default="A"), usually major (REF).

CHR

a character vector of chromosome numbers for all markers. Default = NULL, which means CHR = rep(1, m).

BP

a numeric vector of base-pair positions for all markers. Default = NULL, which means BP = 1:m.

Pheno

a character vector of phenotypes for all subjects. Default = NULL, which means Pheno = rep(-9, n).

Sex

a numeric vector of sex codes for all subjects. Default = NULL, which means Sex = rep(1, n).
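The NULL defaults above can be sketched as follows, together with the per-marker (MAP) and per-subject (leading PED) columns they feed into. Both helpers are hypothetical and not GRAB code; the genetic-distance column of 0, the family ID equal to the subject ID, and the unknown parents (0) are assumptions, since GRAB.makePlink has no arguments for them.

```r
# Hypothetical sketch (not GRAB's own code): apply the documented defaults
# and build the four MAP columns (chromosome, marker ID, genetic distance,
# base-pair position). Genetic distance = 0 is an assumption.
makeMapTable <- function(GenoMat, CHR = NULL, BP = NULL) {
  m <- ncol(GenoMat)
  if (is.null(CHR)) CHR <- rep(1, m)  # documented default
  if (is.null(BP)) BP <- 1:m          # documented default
  stopifnot(length(CHR) == m, length(BP) == m)
  data.frame(CHR = as.character(CHR), SNP = colnames(GenoMat),
             CM = 0, BP = BP, stringsAsFactors = FALSE)
}

# Hypothetical sketch: the six leading PED columns for each subject.
# Assumption: family ID equals subject ID and both parents are unknown (0).
makeSubjectTable <- function(GenoMat, Pheno = NULL, Sex = NULL) {
  n <- nrow(GenoMat)
  if (is.null(Pheno)) Pheno <- rep(-9, n)  # documented default
  if (is.null(Sex)) Sex <- rep(1, n)       # documented default
  ids <- rownames(GenoMat)
  data.frame(FID = ids, IID = ids, PAT = 0, MAT = 0,
             SEX = Sex, PHENO = Pheno, stringsAsFactors = FALSE)
}
```

For a matrix with m = 20 markers and no CHR or BP supplied, makeMapTable places every marker on chromosome 1 at positions 1 to 20.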

Details

See the PLINK 2.00 alpha documentation for details on PLINK, and the bgenix documentation for details on the bgenix tool.

Run plink --bfile simuPLINK --recode A --out simuRAW to convert the PLINK binary files (BED, BIM, and FAM) to a PLINK raw file (simuRAW.raw).
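To bring the raw file back into R, a sketch like the following may help. `readRawGeno` is hypothetical and assumes the usual layout of --recode A output: six leading columns (FID, IID, PAT, MAT, SEX, PHENOTYPE), then one allele-count column per marker named after the marker ID and the counted allele, with missing genotypes written as NA.

```r
# Hypothetical helper (not part of GRAB): read a PLINK .raw file produced by
# '--recode A' into a numeric matrix with subject IDs as row names.
# Assumes six leading columns followed by one allele-count column per marker.
readRawGeno <- function(rawFile) {
  raw <- read.table(rawFile, header = TRUE, check.names = FALSE,
                    stringsAsFactors = FALSE)
  # Drop FID, IID, PAT, MAT, SEX, PHENOTYPE and keep the genotype columns
  geno <- as.matrix(raw[, -(1:6), drop = FALSE])
  rownames(geno) <- raw$IID
  geno
}
```

Usage: readRawGeno(file.path(OutputDir, "simuRAW.raw")). Because PLINK decides which allele is counted, the values may be 2 minus the original GenoMat coding; check the allele suffix in the column names before comparing.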

Run plink2 --bfile simuPLINK --export bgen-1.2 bits=8 ref-first --out simuBGEN to convert the PLINK binary files (BED, BIM, and FAM) to a BGEN file (simuBGEN.bgen).

Run bgenix -g simuBGEN.bgen --index to make the BGI index file (simuBGEN.bgen.bgi) with the bgenix tool.
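The shell steps above, preceded by the --make-bed step from the Examples, can also be driven from R. The wrapper below is a hypothetical sketch, not part of GRAB: it copies the command lines verbatim from this page, assumes plink, plink2, and bgenix are installed and on the PATH, and runs them through base R's system2(). With dryRun = TRUE it only returns the command lines.

```r
# Hypothetical wrapper (not part of GRAB): run the PLINK/bgenix steps from
# this page via system2(). Command lines are copied from the Details and
# Examples sections; 'dir' should contain the simuPLINK PED/MAP files.
runPlinkPipeline <- function(dir = ".", prefix = "simuPLINK", dryRun = TRUE) {
  steps <- list(
    list(tool = "plink",  args = c("--file", prefix, "--make-bed", "--out", prefix)),
    list(tool = "plink",  args = c("--bfile", prefix, "--recode", "A", "--out", "simuRAW")),
    list(tool = "plink2", args = c("--bfile", prefix, "--export", "bgen-1.2",
                                   "bits=8", "ref-first", "--out", "simuBGEN")),
    list(tool = "bgenix", args = c("-g", "simuBGEN.bgen", "--index"))
  )
  cmds <- vapply(steps, function(s) paste(s$tool, paste(s$args, collapse = " ")),
                 character(1))
  if (dryRun) return(cmds)

  # Fail early if any required tool is missing from the PATH
  tools <- unique(vapply(steps, `[[`, character(1), "tool"))
  missing <- tools[!nzchar(Sys.which(tools))]
  if (length(missing) > 0) stop("Not found on PATH: ", paste(missing, collapse = ", "))

  # The commands use relative file names, so run them inside 'dir'
  old <- setwd(dir)
  on.exit(setwd(old), add = TRUE)
  for (i in seq_along(steps)) {
    status <- system2(steps[[i]]$tool, steps[[i]]$args)
    if (status != 0) stop("Command failed: ", cmds[i])
  }
  invisible(cmds)
}
```

After Step 2 of the Examples, runPlinkPipeline(dir = OutputDir, dryRun = FALSE) would run the whole chain in the temporary directory where GRAB.makePlink wrote its files.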

Examples

### Step 1: simulate a numeric genotype matrix
n <- 1000
m <- 20
MAF <- 0.3
set.seed(123)
GenoMat <- matrix(rbinom(n * m, 2, MAF), n, m)
rownames(GenoMat) <- paste0("Subj-", 1:n)
colnames(GenoMat) <- paste0("SNP-", 1:m)
OutputDir <- tempdir()
outputPrefix <- file.path(OutputDir, "simuPLINK")

### Step 2(a): make PLINK files without missing genotypes
GRAB.makePlink(GenoMat, outputPrefix)

### Step 2(b): make PLINK files with genotype missing rate of 0.1
indexMissing <- sample(n * m, 0.1 * n * m)
GenoMat[indexMissing] <- -9
GRAB.makePlink(GenoMat, outputPrefix)

## The following commands are run in a shell, from the directory holding the simuPLINK files (OutputDir)
# plink --file simuPLINK --make-bed --out simuPLINK
# plink --bfile simuPLINK --recode A --out simuRAW
# plink2 --bfile simuPLINK --export bgen-1.2 bits=8 ref-first --out simuBGEN
# UK Biobank uses 'ref-first'
# bgenix -g simuBGEN.bgen --index
