SeqArray (version 1.8.0)

seqApply: Apply Functions Over Array Margins

Description

Returns a vector or list of values obtained by applying a function to margins of arrays or matrices.

Usage

seqApply(gdsfile, var.name, FUN, margin = c("by.variant"), as.is = c("list", "integer", "double", "character", "none"), var.index = c("none", "relative", "absolute"), ...)

Arguments

gdsfile
a SeqVarGDSClass object, such as the one returned by seqOpen
var.name
the variable name(s), see details
FUN
the function to be applied
margin
the dimension over which the function is applied; with the default "by.variant", FUN is called once per variant. (By analogy, for a matrix 1 indicates rows and 2 indicates columns.)
as.is
the type of the returned value: "list", "integer", "double", "character", or "none"
var.index
if "none", call FUN(x, ...) without a variable index; if "relative" or "absolute", call FUN(index, x, ...), where index is the variant index (starting from 1) when margin = "by.variant". "relative" indexes within the selection defined by seqSetFilter; "absolute" indexes with respect to all data
...
optional arguments to FUN
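
The difference between the "relative" and "absolute" settings of var.index can be sketched in base R. This is only an illustration of the indexing described above and does not call SeqArray; the filter variant.sel, the values in x, and the function FUN are all hypothetical.

```r
# Base-R illustration only; it does not call SeqArray.
# variant.sel is a hypothetical logical filter over 8 variants.
variant.sel <- c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)

# "absolute": position of each selected variant among all variants
absolute.index <- which(variant.sel)

# "relative": position of each selected variant within the selection
relative.index <- seq_along(absolute.index)

# A function with the FUN(index, x, ...) signature described above
FUN <- function(index, x) paste0(index, ":", x)

# Hypothetical per-variant values for the four selected variants
x <- c("A,G", "C,T", "G,A", "T,C")

# Call FUN once per selected variant with each kind of index
res.rel <- mapply(FUN, relative.index, x, USE.NAMES = FALSE)
res.abs <- mapply(FUN, absolute.index, x, USE.NAMES = FALSE)
```

Here res.rel is labelled with indices 1 to 4, while res.abs is labelled with the positions 2, 4, 5 and 8 in the unfiltered data.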

Value

A vector or list of values.
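
As a rough base-R analogy for how the shape of the result depends on as.is, the sketch below applies one function to hypothetical per-variant allele strings and collects the results in different forms. lapply and vapply stand in for seqApply here; the correspondence to each as.is value is an assumption made for illustration, not a description of SeqArray internals.

```r
# Base-R analogy only; lapply/vapply stand in for seqApply.
alleles <- c("A,G", "C,T,G", "T,C")   # hypothetical per-variant allele strings

# Count the alleles in one comma-separated string
n.allele <- function(x) length(unlist(strsplit(x, ",")))

# Roughly like as.is = "list": one list element per variant
res.list <- lapply(alleles, n.allele)

# Roughly like as.is = "integer": one integer per variant
res.int <- vapply(alleles, n.allele, integer(1), USE.NAMES = FALSE)

# Roughly like as.is = "double": the same values stored as doubles
res.dbl <- as.double(res.int)

# Roughly like as.is = "none": FUN is run only for its side effects
invisible(lapply(alleles, function(x) cat(n.allele(x), "\n")))
```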

Details

The variable name should be "sample.id", "variant.id", "position", "chromosome", "allele", "genotype", "phase", "annotation/id", "annotation/qual", "annotation/filter", "annotation/info/VARIABLE_NAME", or "annotation/format/VARIABLE_NAME".

The algorithm is optimized by processing the data in blocks, so that computations run on data held in fast memory rather than on repeated reads from disk.
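
The idea of blocked computation can be sketched in base R as follows. This is not SeqArray's internal implementation, which this page does not show; the helper apply.blocked, the block size, and the simulated matrix geno are assumptions made purely for illustration.

```r
# Illustration of blocked processing in base R; not SeqArray's internals.
set.seed(1)
# Hypothetical data: 2 rows (alleles) x 1000 columns (items to process)
geno <- matrix(sample(0:1, 2 * 1000, replace = TRUE), nrow = 2)
block.size <- 256   # assumed block size, chosen only for this example

# Apply FUN to every column of m, fetching the columns one block at a time
apply.blocked <- function(m, FUN, block.size) {
  n <- ncol(m)
  starts <- seq(1, n, by = block.size)
  out <- vector("list", length(starts))
  for (i in seq_along(starts)) {
    idx <- starts[i]:min(starts[i] + block.size - 1, n)
    block <- m[, idx, drop = FALSE]    # one "read" per block
    out[[i]] <- apply(block, 2, FUN)   # compute on the in-memory block
  }
  unlist(out)
}

ref.freq <- apply.blocked(geno, function(x) mean(x == 0), block.size)
```

The blocked result matches a plain apply(geno, 2, ...) over all columns; the only difference is that the data are fetched in a few large pieces instead of one column at a time.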

See Also

seqSetFilter, seqGetData, seqParallel

Examples

# the GDS file
gds.fn <- seqExampleFileName("gds")
# or gds.fn <- "C:/YourFolder/Your_GDS_File.gds"

# open the GDS file and display its contents
(f <- seqOpen(gds.fn))

# get 'sample.id'
(samp.id <- seqGetData(f, "sample.id"))
# "NA06984" "NA06985" "NA06986" ...

# get 'variant.id'
head(variant.id <- seqGetData(f, "variant.id"))


# set sample and variant filters
set.seed(100)
seqSetFilter(f, sample.id=samp.id[c(2,4,6,8,10)],
	variant.id=sample(variant.id, 10))

# read multiple variables variant by variant
seqApply(f, c(geno="genotype", phase="phase", qual="annotation/id"),
	FUN=function(x) print(x), as.is="none")

# get the numbers of alleles per variant
seqApply(f, "allele",
	FUN=function(x) length(unlist(strsplit(x,","))), as.is="integer")


################################################################
# with an index of variant

seqApply(f, c(geno="genotype", phase="phase", qual="annotation/id"),
	FUN=function(index, x) { print(index); print(x); index },
	as.is="integer", var.index="relative")
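# for comparison, the positions of the selected variants among all variants
#   (per the var.index description, these correspond to var.index="absolute")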
which(seqGetFilter(f)$variant.sel)



################################################################
# reset sample and variant filters
seqSetFilter(f)

# calculate the reference allele frequency;
#   a faster version could be written in C
af <- seqApply(f, "genotype", FUN=function(x) mean(x==0, na.rm=TRUE),
	as.is="double")
length(af)
summary(af)


# close the GDS file
seqClose(f)
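
To make the reference allele frequency calculation above concrete, here is a toy version in base R that does not need a GDS file. It assumes that the genotype block passed to FUN for one variant is a matrix with one row per allele copy and one column per sample, with 0 coding the reference allele and NA a missing call; the values below are made up.

```r
# Toy data only; the allele-copy x sample layout of x is an assumption.
# Columns are samples: (0,0), (0,1), (1,1), (NA,0)
x <- matrix(c(0L, 0L,
              0L, 1L,
              1L, 1L,
              NA, 0L), nrow = 2)

# Fraction of non-missing alleles equal to the reference allele (coded 0)
ref.freq <- mean(x == 0, na.rm = TRUE)
ref.freq   # 4 of the 7 non-missing alleles are 0
```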