What is GBJ?

The Generalized Berk-Jones statistic was developed to perform set-based inference in genetic association studies. It is an alternative to tests such as the Sequence Kernel Association Test (SKAT), Generalized Higher Criticism (GHC), and Minimum p-value (minP).

Why use GBJ?

GBJ is a generalization of the Berk-Jones (BJ) statistic, which offers, in a certain sense, asymptotic power guarantees for detecting rare and weak signals. GBJ modifies BJ to account for correlation between the factors in a set. GBJ has been demonstrated to outperform other tests when signals are moderately sparse (more precisely, when the number of signals is between d^(1/4) and d^(1/2), where d is the number of factors in the set).
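To get a feel for the moderately sparse range described above (between the fourth root and the square root of d), the bounds can be computed for a few set sizes. This is only illustrative arithmetic in base R; the set sizes are arbitrary choices, not values from the package.

```r
# Illustrative only: signal-count range for a few hypothetical set sizes d
d <- c(16, 50, 100, 1000)

sparsity_range <- data.frame(
  d     = d,
  lower = d^(1/4),  # fourth root of d
  upper = d^(1/2)   # square root of d
)

print(round(sparsity_range, 2))
```

For a set of 50 factors, for instance, the range works out to roughly 3 to 7 signals.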

Other advantages include:

  1. Analytic p-value calculation (no need for permutation inference).
  2. Can be applied to individual-level genotype data or GWAS summary statistics.
  3. No tuning parameters. Accepts standard inputs (similar to the glm() function).

Example

We show a simple example for testing the association between a set of 50 SNPs (which could be, for example, from the same gene or pathway) and a binary outcome.

library(GBJ)
set.seed(1000)

# Case-control study, 1000 subjects
cancer_status <- c(rep(1,500), rep(0,500))

# We have 50 SNPs each with minor allele frequency of 0.3 in this example
genotype_data <- matrix(data=rbinom(n=1000*50, size=2, prob=0.3), nrow=1000)
age <- round( runif(n=1000, min=30, max=80) )
gender <- rbinom(n=1000, size=1, prob=0.5)     

# Fit the null model, calculate marginal score statistics for each SNP
# (asymptotically equivalent to those calculated by, for example, PLINK)
null_mod <- glm(cancer_status~age+gender, family=binomial(link="logit"))
log_reg_stats <- calc_score_stats(null_model=null_mod, factor_matrix=genotype_data, link_function="logit")

# Run the test
GBJ(test_stats=log_reg_stats$test_stats, cor_mat=log_reg_stats$cor_mat)
#> $GBJ
#> [1] 1.43984
#> 
#> $GBJ_pvalue
#> [1] 0.330911
#> 
#> $err_code
#> [1] 0
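
For readers curious what the marginal score statistics and their correlation matrix look like, the following base R sketch computes them by hand for the same simulated data, using the standard score test formulas for logistic regression. It is an assumption-laden illustration and not necessarily how calc_score_stats() is implemented internally; the names y, G, X, w, V, z, and cor_z are introduced here for illustration only.

```r
# Sketch: marginal score statistics for a logistic null model, base R only.
set.seed(1000)
n <- 1000
d <- 50
y <- c(rep(1, 500), rep(0, 500))
G <- matrix(rbinom(n * d, size = 2, prob = 0.3), nrow = n)
age <- round(runif(n, min = 30, max = 80))
gender <- rbinom(n, size = 1, prob = 0.5)

# Null model: covariates only, no SNPs
null_mod <- glm(y ~ age + gender, family = binomial(link = "logit"))
mu_hat <- fitted(null_mod)     # fitted probabilities under the null
w <- mu_hat * (1 - mu_hat)     # logistic variance weights
X <- model.matrix(null_mod)    # design matrix, including the intercept

# Variance of the score vector after projecting out the covariates:
# V = G'WG - G'WX (X'WX)^{-1} X'WG
WG <- w * G                    # scales row i of G by w[i]
XtWG <- crossprod(X, WG)
V <- crossprod(G, WG) - crossprod(XtWG, solve(crossprod(X, w * X), XtWG))

# Standardized score statistic for each SNP and the implied correlation matrix
z <- as.numeric(crossprod(G, y - mu_hat)) / sqrt(diag(V))
cor_z <- cov2cor(V)

summary(z)
```

Under the null, each entry of z is approximately standard normal, and cor_z plays the role of the cor_mat argument in the call above.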

What else is in here?

We may not have convinced you that GBJ is the best option for your application. If that is the case, you may still be interested in trying the Berk-Jones (BJ), Generalized Higher Criticism (GHC), Higher Criticism (HC), or Minimum p-value (minP) tests, which can be run with the same inputs, e.g. GHC(test_stats=log_reg_stats$test_stats, cor_mat=log_reg_stats$cor_mat) to run the GHC. We have also developed an omnibus test that combines information from multiple different methods. Please see the vignette for more details.
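
As a rough sketch of running several of these tests on the same inputs, the snippet below repeats the simulated setup from the example and applies each test function in turn. It assumes the GBJ package is installed and that every function accepts the test_stats and cor_mat arguments as described above; the names stats, tests, and results are illustrative, and the shape of each returned list may differ between tests.

```r
library(GBJ)
set.seed(1000)

# Same simulated case-control data as in the example
cancer_status <- c(rep(1, 500), rep(0, 500))
genotype_data <- matrix(rbinom(n = 1000 * 50, size = 2, prob = 0.3), nrow = 1000)
age <- round(runif(n = 1000, min = 30, max = 80))
gender <- rbinom(n = 1000, size = 1, prob = 0.5)

null_mod <- glm(cancer_status ~ age + gender, family = binomial(link = "logit"))
stats <- calc_score_stats(null_model = null_mod,
                          factor_matrix = genotype_data,
                          link_function = "logit")

# Apply each set-based test to the same statistics and correlation matrix
tests <- list(BJ = BJ, GBJ = GBJ, GHC = GHC, HC = HC, minP = minP)
results <- lapply(tests, function(f) {
  f(test_stats = stats$test_stats, cor_mat = stats$cor_mat)
})

str(results)
```

Comparing p-values across several tests and then reporting the smallest one inflates the type I error rate, which is the situation an omnibus test is meant to handle.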

Install

install.packages('GBJ')

Monthly Downloads: 378
Version: 0.5.4
License: GPL-3
Maintainer: Ryan Sun
Last Published: January 31st, 2024

Functions in GBJ (0.5.4)

surv: surv.R
minP: minP.R
qnorm_mu: qnorm_mu.R
score_stats_only: score_stats_only.R
parse_input: parse_input.R
OMNI_ss: omni_ss.R
GBJ: GBJ.R
BJ: BJ.R
GBJ_pvalue: GBJ_pvalue.R
GHC: GHC.R
GHC_objective: GHC_objective.R
HC: HC.R
OMNI_individual: omni_individual.R
calc_score_stats: calc_score_stats.R
calc_var_nonzero_mu: calc_var_mu_nonzero.R
FGFR2: Genotypes at FGFR2 SNPs for subjects from 'GBR' population in the 1000 Genomes Project.
gbr_pcs: Simulated Principal Components for 'GBR' population in the 1000 Genomes Project.
ebb_loglik: ebb_loglik.R
GBJ_objective: GBJ_objective.R
herm_poly_diff_t: herm_poly_diff_t.R
estimate_ss_cor: estimate_ss_cor.R