
GRAB (version 0.2.3)

GRAB.NullModel: Fit a null model to estimate parameters and residuals

Description

Fits a null model that includes response variables, covariates, and optionally a Genetic Relationship Matrix (GRM) to estimate model parameters and residuals for subsequent association testing.

Usage

GRAB.NullModel(
  formula,
  data,
  subset = NULL,
  subjData,
  method,
  traitType,
  GenoFile = NULL,
  GenoFileIndex = NULL,
  SparseGRMFile = NULL,
  control = NULL,
  ...
)

Value

A list object with class "XXXXX_NULL_Model" where XXXXX is the specified method. The returned object contains the following components:

N

Sample size in analysis

yVec

Phenotype data vector

beta

Coefficient parameters corresponding to covariates

subjData

Subject IDs included in analysis

sessionInfo

Version information about R, OS, and attached packages

Call

The function call, with all specified arguments matched to their full names

time

Timestamp when analysis was completed

control

Control parameters used for null model fitting

tau

Estimated variance components (if using mixed models)

SparseGRM

Sparse genetic relationship matrix (if specified)

This object serves as input for downstream association testing with GRAB.Marker and GRAB.Region.

Arguments

formula

A formula object with the response on the left of a ~ operator and covariates on the right. Do not include an intercept term (i.e., a vector of ones) on the right side. Missing values should be coded as NA and corresponding samples will be excluded from analysis. Other values (e.g., -9, -999) will be treated as ordinary numeric values.
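Because sentinel codes such as -9 or -999 are treated as ordinary numbers, they need to be recoded to NA before the model is fitted. The following is a minimal sketch in base R; the data frame `pheno`, its column names, and the `sentinels` vector are hypothetical and only serve to illustrate the recoding step.

```r
# Hypothetical phenotype table; names and values are illustrative only.
pheno <- data.frame(
  IID = c("s1", "s2", "s3", "s4"),
  AGE = c(45, -9, 61, 38),
  OrdinalPheno = c(1, 2, -999, 0)
)

# Recode sentinel values to NA so they are not used as real measurements.
sentinels <- c(-9, -999)
num_cols <- vapply(pheno, is.numeric, logical(1))
pheno[num_cols] <- lapply(
  pheno[num_cols],
  function(x) replace(x, x %in% sentinels, NA)
)

# Rows with no missing value in the response or covariates.
complete_rows <- complete.cases(pheno[, c("OrdinalPheno", "AGE")])
```

Here `complete_rows` flags the samples that have complete data for the variables in the formula, which gives a quick check of how many samples are likely to remain in the analysis.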

data

A data.frame, list, or environment (or object coercible by as.data.frame to a data.frame) containing the variables in the formula. Neither a matrix nor an array will be accepted.

subset

A specification of the rows to be used; defaults to all rows. This can be any valid indexing vector for the rows of data or, if data is not supplied, for the rows of a data frame built from the variables used in the formula.

subjData

A character vector of subject IDs. The order should match the subject order in the formula and data (before any subset processing).

method

A character string specifying the statistical method: "POLMM" (see GRAB.POLMM), "SPACox" (see GRAB.SPACox), "SPAmix" (see GRAB.SPAmix), or "WtCoxG" (see GRAB.WtCoxG).

traitType

A character string specifying the trait type: "binary", "ordinal", "quantitative", or "time-to-event".

GenoFile

A character string specifying the genotype file path. Currently, two genotype formats are supported: PLINK and BGEN. See GRAB.ReadGeno for details.

GenoFileIndex

Additional index files corresponding to GenoFile. If NULL (default), the same prefix as GenoFile is used. See GRAB.ReadGeno for details.

SparseGRMFile

A character string specifying the sparse GRM file path. An example is system.file("SparseGRM","SparseGRM.txt",package="GRAB").

control

A list of parameters for controlling the model fitting process. See the Details section for comprehensive information.

...

Additional arguments passed to or from other methods.

Details

The GRAB package uses score tests, which are carried out in two steps:

  1. GRAB.NullModel fits a null model that includes the response variable, the covariates, and, if needed, a Genetic Relationship Matrix (GRM).

  2. GRAB.Marker and GRAB.Region perform genome-wide marker-level and region-level analysis, respectively.

Step 1 returns an R object that is passed to Step 2 for association testing. The object can be written to disk with save() and restored in a later session with load().
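The save and load round trip can be sketched as follows. To keep the snippet runnable without fitting a model, `obj.null` is a stand-in list with a made-up class attribute, not a real GRAB.NullModel result, and the file name is arbitrary.

```r
# Stand-in for a fitted null model; a real object would come from GRAB.NullModel().
obj.null <- structure(
  list(N = 3L, beta = c(0.1, -0.2)),
  class = "POLMM_NULL_Model"
)

# Step 1 session: write the object to disk.
null_file <- file.path(tempdir(), "objNull.RData")
save(obj.null, file = null_file)

# Step 2 session: load() restores the object under its original name
# and returns the names of the restored objects.
rm(obj.null)
restored_names <- load(null_file)
```

Note that load() recreates the object under the name it had when it was saved, so the Step 2 script should refer to that name rather than to the value returned by load().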

The GRAB package includes several methods that together support a wide variety of phenotypes:

  • POLMM: Supports traitType = "ordinal". See GRAB.POLMM for more details.

  • SPACox: Supports traitType = "time-to-event" or "Residual". See GRAB.SPACox for more details.

  • SPAmix: Supports traitType = "time-to-event" or "Residual". See GRAB.SPAmix for more details.

  • WtCoxG: Supports traitType = "time-to-event". See GRAB.WtCoxG for more details.
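The method and trait type combinations listed above can be written down as a small lookup table, which is convenient for validating a configuration before starting a long-running fit. This is only a sketch: `supported` and `is_supported()` are hypothetical names that are not part of GRAB, and the table simply transcribes the list above.

```r
# Supported combinations, transcribed from the list above.
supported <- list(
  POLMM  = "ordinal",
  SPACox = c("time-to-event", "Residual"),
  SPAmix = c("time-to-event", "Residual"),
  WtCoxG = "time-to-event"
)

# Hypothetical helper (not a GRAB function): TRUE if the pair is listed above.
is_supported <- function(method, traitType) {
  method %in% names(supported) && traitType %in% supported[[method]]
}

ok_polmm  <- is_supported("POLMM", "ordinal")
bad_polmm <- is_supported("POLMM", "binary")
```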

The GRAB package supports both dense and sparse GRMs to adjust for sample relatedness. If a dense GRM is used, GenoFile is required to construct it. If a sparse GRM is used, SparseGRMFile is required. See getTempFilesFullGRM and getSparseGRM for details.

Control Parameters

The control argument is a list of parameters that control the null model fitting process:

Basic Parameters:

maxiter

Maximum number of iterations used to fit the null model (default: 100)

seed

Random seed for reproducible results (default: 12345678)

tolBeta

Tolerance for fixed effects convergence: |beta - beta_old| / (|beta| + |beta_old| + tolBeta) < tolBeta (default: 0.001)
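The relative-change criterion above can be illustrated with a small helper. This is a sketch rather than GRAB's internal code: `has_converged()` is a hypothetical function, and reducing a coefficient vector to a single decision by taking the largest element-wise relative change is an assumption made for this example.

```r
# Hypothetical helper (not a GRAB function).
# Computes |new - old| / (|new| + |old| + tol) element-wise and, as an
# assumption of this sketch, requires the largest value to be below tol.
has_converged <- function(new, old, tol) {
  rel_change <- abs(new - old) / (abs(new) + abs(old) + tol)
  max(rel_change) < tol
}

beta_old <- c(1.0, 2.0)

# A tiny update passes the default tolerance of 0.001 ...
small_step <- has_converged(c(1.0001, 2.0), beta_old, tol = 0.001)

# ... while a 10% change in the first coefficient does not.
large_step <- has_converged(c(1.1, 2.0), beta_old, tol = 0.001)
```

Adding the tolerance to the denominator keeps the ratio well defined when both the old and new estimates are close to zero.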

showInfo

Whether to show detailed information for troubleshooting (default: FALSE)

Variance Component Parameters:

tau

Initial value of the variance component (default: 0.2)

tolTau

Tolerance for variance component convergence: |tau - tau_old| / (|tau| + |tau_old| + tolTau) < tolTau (default: 0.002)

Dense GRM Parameters (when using PLINK files):

maxiterPCG

Maximum iterations for Preconditioned Conjugate Gradient (default: 100)

tolEps

Tolerance for PCG convergence (default: 1e-6)

minMafVarRatio

Minimum MAF for markers used in variance ratio estimation (default: 0.1)

maxMissingVarRatio

Maximum missing rate for markers used in variance ratio estimation (default: 0.1)

nSNPsVarRatio

Initial number of markers for variance ratio estimation (default: 20)

CVcutoff

Maximum coefficient of variation for variance ratio estimation (default: 0.0025)

LOCO

Whether to apply leave-one-chromosome-out approach (default: TRUE)

stackSize

Stack size (bytes) for worker threads (default: "auto")

grainSize

Minimum chunk size for parallelization (default: 1)

minMafGRM

Minimum MAF for markers used in dense GRM construction (default: 0.01)

memoryChunk

Memory chunk size (GB) when reading PLINK files (default: 2)

tracenrun

Number of runs for trace estimator (default: 30)

maxMissingGRM

Maximum missing rate for markers used in dense GRM construction (default: 0.1)

onlyCheckTime

Only check the computation time, without fitting the model (default: FALSE)
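Only the parameters that differ from their defaults need to be supplied in control. To see the effective settings for a run, the documented defaults can be combined with the overrides using base R's modifyList(). The sketch below copies a subset of the defaults from the lists above; how GRAB itself merges the two is not described here, so this is only a way to inspect a configuration, and `default_control` and `user_control` are hypothetical names.

```r
# A subset of the documented defaults, copied from the lists above.
default_control <- list(
  maxiter  = 100,
  seed     = 12345678,
  tolBeta  = 0.001,
  showInfo = FALSE,
  tau      = 0.2,
  tolTau   = 0.002,
  LOCO     = TRUE
)

# Overrides a user might pass as the control argument.
user_control <- list(LOCO = FALSE, tolTau = 0.2, tolBeta = 0.1)

# modifyList() replaces matching entries and keeps the remaining defaults.
effective_control <- modifyList(default_control, user_control)
```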

Examples

PhenoFile <- system.file("extdata", "simuPHENO.txt", package = "GRAB")
PhenoData <- read.table(PhenoFile, header = TRUE)
GenoFile <- system.file("extdata", "simuPLINK.bed", package = "GRAB")

# Fit a null model using POLMM with a dense GRM constructed from PLINK files.
Sys.setenv(RCPP_PARALLEL_NUM_THREADS = 2) # Limit threads for CRAN checks (optional for users).

obj.POLMM <- GRAB.NullModel(
  formula = factor(OrdinalPheno) ~ AGE + GENDER,
  data = PhenoData,
  subjData = IID,
  method = "POLMM",
  traitType = "ordinal",
  GenoFile = GenoFile,
  control = list(showInfo = FALSE, LOCO = FALSE, tolTau = 0.2, tolBeta = 0.1)
)

names(obj.POLMM)

# Fit a null model using POLMM with a sparse GRM pre-calculated by getSparseGRM()
SparseGRMFile <- system.file("extdata", "SparseGRM.txt", package = "GRAB")

obj.POLMM <- GRAB.NullModel(
  formula = factor(OrdinalPheno) ~ AGE + GENDER,
  data = PhenoData,
  subjData = IID,
  method = "POLMM",
  traitType = "ordinal",
  GenoFile = GenoFile,
  SparseGRMFile = SparseGRMFile,
  control = list(showInfo = FALSE, LOCO = FALSE, tolTau = 0.2, tolBeta = 0.1)
)

# Save the null model object for downstream analysis
OutputFile <- file.path(tempdir(), "objPOLMMnull.RData")
save(obj.POLMM, file = OutputFile)

# For SPACox method, check ?GRAB.SPACox
# For SPAmix method, check ?GRAB.SPAmix
# For SPAGRM method, check ?GRAB.SPAGRM
# For WtCoxG method, check ?GRAB.WtCoxG
