BEDMatrix is an S3 class that behaves similarly to a regular matrix by implementing key methods such as [, dim, and dimnames. Subsets are extracted directly and on demand from the binary PED file through memory mapping, without loading the entire file into memory. The subsets are coded similarly to RAW files generated with the --recodeA argument in PLINK: 0 indicates homozygous for the major allele, 1 indicates heterozygous, and 2 indicates homozygous for the minor allele.
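Because each code counts the copies of the minor allele, simple summaries follow directly from the values. The base R sketch below uses a made-up genotype vector (not data from the package) and assumes that missing calls appear as NA:

```r
# Hypothetical genotype calls for one marker across eight individuals,
# using the 0/1/2 coding described above; NA is assumed to mark a missing call.
genotypes <- c(0, 1, 2, 0, 1, NA, 0, 2)

# Each value is the number of minor-allele copies and every individual
# carries two alleles, so the allele frequency is the mean dosage divided by 2.
maf <- mean(genotypes, na.rm = TRUE) / 2

# Tabulate the three genotype classes, keeping missing calls visible.
counts <- table(factor(genotypes, levels = 0:2), useNA = "ifany")
```

The same arithmetic would apply to a column extracted from a BEDMatrix object, since subsets are returned in this coding.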
BEDMatrix(path, n = NULL, p = NULL)
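path
The path to the BED file, with or without extension.
n
The number of individuals. If provided, rownames will be set to NULL and have to be provided manually.
p
The number of markers. If provided, colnames will be set to NULL and have to be provided manually.

A BEDMatrix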
instance can be created by providing the path to the BED file (with or without extension) as path, the number of individuals as n, and the number of markers as p. If a FAM file (which corresponds to the first six columns of a PED file) of the same name and in the same directory as the BED file exists, providing n is optional, and the number of individuals as well as the rownames of the BEDMatrix will be detected automatically. The rownames will be generated based on the IID and FID of each individual, concatenated by _. If a BIM file (which corresponds to the MAP file that accompanies a PED file) of the same name and in the same directory as the BED file exists, providing p is optional, and the number of markers as well as the colnames of the BEDMatrix will be detected automatically. The colnames will be generated based on the SNP name and the minor allele, concatenated by _ (similar to the colnames in a RAW file). For very large BED files, it is advised to provide n and p manually to speed up object creation. In that case, rownames and colnames will be set to NULL and have to be specified manually.

A BED file can be created from a PED file with
PLINK using plink --file myfile --make-bed. BED files are storage and query efficient, and can be transformed back into the original PED file with PLINK using plink --bfile myfile --recode.
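The naming scheme for automatically detected rownames and colnames can be illustrated in base R. This is only a sketch: the fam and bim tables and their column names are hypothetical stand-ins for the contents of a FAM and a BIM file, and the FID-then-IID order is an assumption, since the description above does not state which identifier comes first.

```r
# Hypothetical FAM-like table: family ID (FID) and within-family ID (IID).
fam <- data.frame(
  FID = c("fam1", "fam1", "fam2"),
  IID = c("ind1", "ind2", "ind1"),
  stringsAsFactors = FALSE
)

# Hypothetical BIM-like table: SNP name and minor allele per marker.
bim <- data.frame(
  snp = c("rs100", "rs200"),
  minor_allele = c("A", "G"),
  stringsAsFactors = FALSE
)

# Assumed order: FID first, then IID, joined by an underscore.
row_names <- paste(fam$FID, fam$IID, sep = "_")

# SNP name and minor allele joined by an underscore, as in a RAW file header.
col_names <- paste(bim$snp, bim$minor_allele, sep = "_")
```

Names built this way could be assigned by hand when n and p are supplied and the dimnames are therefore NULL.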
Internally, BEDMatrix inherits from list and exposes a few attributes that should not be relied upon in actual code: path, dims, dnames, and _instance. path stores the path to the BED file. dims and dnames contain the dimensions and dimnames of the BEDMatrix object. _instance points to the underlying Rcpp module. The Rcpp module exposes an S4 class called BEDMatrix_ that memory maps the BED file via Boost.Interprocess of the BH package.
# Load the package and create an example BEDMatrix object
library(BEDMatrix)
m <- BEDMatrix(system.file("extdata", "example.bed", package = "BEDMatrix"))
# Get the dimensions of the example BEDMatrix object
dim(m)
# Extract a subset of the example BEDMatrix object
m[1:3, ]
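The following sketch shows one way to supply n and p manually and then set the dimnames by hand, as described for very large BED files. It assumes that the BEDMatrix package is installed and that dimnames can be assigned with the usual replacement function. For brevity, the dimensions and names are borrowed from an automatically detected object; in practice they would come from another source.

```r
# Only run when the BEDMatrix package is available.
if (requireNamespace("BEDMatrix", quietly = TRUE)) {
  path <- system.file("extdata", "example.bed", package = "BEDMatrix")

  # Let the package detect dimensions and names from the FAM and BIM files.
  auto <- BEDMatrix::BEDMatrix(path)

  # Supply n and p directly; rownames and colnames start out as NULL.
  manual <- BEDMatrix::BEDMatrix(path, n = nrow(auto), p = ncol(auto))

  # Provide the dimnames manually (assumes replacement is supported).
  dimnames(manual) <- dimnames(auto)

  # Both objects should report the same dimensions.
  same_dims <- all(dim(manual) == dim(auto))
}
```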