Learn R Programming

GRAB (version 0.2.2)

GRAB.NullModel: Fit a null model to estimate parameters and residuals

Description

We fit a null model including response variable, covariates, and Genetic Relationship Matrix (GRM, if needed) to estimate parameters and residuals.

Usage

GRAB.NullModel(
  formula,
  data = NULL,
  subset = NULL,
  subjData,
  method = "SPACox",
  traitType = "time-to-event",
  GenoFile = NULL,
  GenoFileIndex = NULL,
  SparseGRMFile = NULL,
  control = NULL,
  ...
)

Value

an R object with a class of "XXXXX_NULL_Model" in which XXXXX is the 'method' used in analysis. The following elements are required for all methods.

  • N: Sample size in analysis

  • yVec: Phenotype data

  • beta: Coefficient parameters corresponding to covariates

  • subjData: Subject IDs in analysis

  • sessionInfo: Version information about R, the OS and attached or loaded packages.

  • Call: A call in which all of the specified arguments are specified by their full names.

  • time: The time when analysis is finished

  • control: The R list of control in null model fitting

Arguments

formula

a formula object, with the response on the left of a ~ operator and the covariates on the right. Do not add a column of intercept (i.e. a vector of ones) on the right. Missing values should be denoted by NA and the corresponding samples will be removed from analysis. Other values (e.g. -9, -999) will be treated as ordinary numeric values in analysis.

data

a data.frame, list or environment (or object coercible by as.data.frame to a data.frame), containing the variables in formula. Neither a matrix nor an array will be accepted.

subset

a specification of the rows to be used: defaults to all rows. This can be any valid indexing vector for the rows of data or if that is not supplied, a data frame made up of the variables used in formula.

subjData

a character vector of subject IDs. Its order should be the same as the subject order in the formula and data (before subset process).

method

a character: "SPACox" (check GRAB.SPACox), "POLMM" (check GRAB.POLMM), "SPAGE" (will be supported later), or "GATE" (will be supported later).

traitType

a character: "binary", "ordinal" (check GRAB.POLMM), "quantitative", or "time-to-event" (check GRAB.SPACox).

GenoFile

a character of genotype file. Currently, two types of genotype formats are supported: PLINK and BGEN. Check GRAB.ReadGeno for more details.

GenoFileIndex

additional index files corresponding to the GenoFile. If NULL (default), the same prefix as GenoFile is used. Check GRAB.ReadGeno for more details.

SparseGRMFile

a character of sparseGRM file. An example is system.file("SparseGRM","SparseGRM.txt",package="GRAB")

control

a list of parameters for controlling the model fitting process. For more details, please check Details section.

...

other arguments passed to or from other methods.

Details

GRAB package uses score testing which consists of two steps. In Step 1, function GRAB.NullModel fits a null model including response variable, covariates, and Genetic Relationship Matrix (GRM) if needed. In Step 2, functions GRAB.Marker and GRAB.Region perform genome-wide marker-level analysis and region-level analysis, respectively. Step 1 fits a null model to get an R object, which is passed to Step 2 for association testing. Functions of save and load can save and load the object.

GRAB package includes multiple methods which support a wide variety of phenotypes as follows.

  • POLMM: Support traitType = "ordinal". Check GRAB.POLMM for more details.

  • SPACox: Support traitType = "time-to-event" or "Residual". Check GRAB.SPACox for more details.

  • SPAmix: Support traitType = "time-to-event" or "Residual". Check GRAB.SPAmix for more details.

  • SPAGRM: Support traitType = "time-to-event" or "Residual". Check GRAB.SPAGRM for more details.

GRAB package supports both Dense and Sparse GRM to adjust for sample relatedness. If Dense GRM is used, then GenoFile is required to construct GRM. If Sparse GRM is used, then SparseGRMFile is required, whose details can be seen in getTempFilesFullGRM and getSparseGRM.

The following details are about argument control

Argument control includes a list of parameters for controlling the null model fitting process.

  • maxiter: Maximum number of iterations used to fit the null model. (default=100)

  • seed: An integer as a random seed. Used when random process is involved. (default=12345678)

  • tolBeta: Positive tolerance: the iterations converge when |beta - beta_old| / (|beta| + |beta_old| + tolBeta) < tolBeta. (default=0.001)

  • showInfo: Whether to show more detailed information for trouble shooting. (default=FALSE)

To adjust for sample relatedness, mixed effect model incorporates a random effect with a variance component. Argument control includes additional parameters to estimate the variance component.

  • tau: Initial value of the variance component (tau). (default=0.2).

  • tolTau: Positive tolerance: the iterations converge when |tau - tau_old| / (|tau| + |tau_old| + tolTau) < tolTau. (default=0.002)

If dense GRM is used to adjust for sample relatedness, GenoFile should be PLINK files and argument control includes additional parameters as follows.

  • maxiterPCG: Maximum number of iterations for PCG to converge. (default=100)

  • tolEps: Positive tolerance for PCG to converge. (default=1e-6)

  • minMafVarRatio: Minimal value of MAF cutoff to select markers (from PLINK files) to estimate variance ratio. (default=0.1)

  • maxMissingVarRatio: Maximal value of missing rate cutoff to select markers (from PLINK files) to estimate variance ratio. (default=0.1)

  • nSNPsVarRatio: Initial number of the selected markers to estimate variance ratio (default=20) the number will be automatically added by 10 until the coefficient of variantion (CV) of the variance ratio estimate is below CVcutoff.

  • CVcutoff: Minimal cutoff of coefficient of variantion (CV) to estimate variance ratio (default=0.0025)

  • LOCO: Whether to apply the leave-one-chromosome-out (LOCO) approach. (default=TRUE)

  • stackSize: Stack size (in bytes) to use for worker threads. For more details, check setThreadOptions. (default="auto")

  • grainSize: Grain size of a parallel algorithm sets a minimum chunk size for parallelization. In other words, at what point to stop processing input on separate threads. (default=1)

  • minMafGRM: Minimal value of MAF cutoff to select markers (from PLINK files) to construct dense GRM. (default=0.01)

  • memoryChunk: Size (Gb) for each memory chunk when reading in PLINK files. (default=2)

  • tracenrun: Number of runs for trace estimator. (default=30)

  • maxMissingGRM: Maximal value of missing rate to select markers (from PLINK files) to construct dense GRM. (default=0.1)

  • onlyCheckTime: Not fit the null model, only check the computation time of reading PLINK files and running 30 KinbVec() functions. (default=FALSE)

Examples

Run this code
# For POLMM method (ordinal categorical data analysis while adjusting for sample relatedness)
# Step 1(a): fit a null model using a dense GRM (recommand using Linux OS)
PhenoFile <- system.file("extdata", "simuPHENO.txt", package = "GRAB")
PhenoData <- read.table(PhenoFile, header = TRUE)
GenoFile <- system.file("extdata", "simuPLINK.bed", package = "GRAB")

# Limit threads for CRAN checks (optional for users).
Sys.setenv(RCPP_PARALLEL_NUM_THREADS = 2)

obj.POLMM <- GRAB.NullModel(
  factor(OrdinalPheno) ~ AGE + GENDER,
  data = PhenoData,
  subjData = IID,
  method = "POLMM",
  traitType = "ordinal",
  GenoFile = GenoFile,
  control = list(showInfo = FALSE, LOCO = FALSE, tolTau = 0.2, tolBeta = 0.1)
)

names(obj.POLMM)
obj.POLMM$tau

# Step 1(b): fit a null model using a sparse GRM (recommand using Linux OS)
# First use getSparseGRM() function to get a sparse GRM file
PhenoData <- read.table(PhenoFile, header = TRUE)
GenoFile <- system.file("extdata", "simuPLINK.bed", package = "GRAB")
SparseGRMFile <- system.file("SparseGRM", "SparseGRM.txt", package = "GRAB")

obj.POLMM <- GRAB.NullModel(
  factor(OrdinalPheno) ~ AGE + GENDER,
  data = PhenoData,
  subjData = IID,
  method = "POLMM",
  traitType = "ordinal",
  GenoFile = GenoFile,
  SparseGRMFile = SparseGRMFile,
  control = list(showInfo = FALSE, LOCO = FALSE, tolTau = 0.2, tolBeta = 0.1)
)

names(obj.POLMM)
obj.POLMM$tau

# save(obj.POLMM, "obj.POLMM.RData")  # save the object for analysis in step 2

# For SPACox method, check ?GRAB.SPACox.
# For SPAmix method, check ?GRAB.SPAmix.
# For SPAGRM method, check ?GRAB.SPAGRM
# For WtCoxG method, check ?GRAB.WtCoxG

Run the code above in your browser using DataLab