g2l.proc: Procedures for global and local inference.

Description

This function performs customized fdr analyses tailored to each individual case.

Usage

g2l.proc(X, z, X.target = NULL, z.target = NULL, m = c(4, 6), alpha = 0.1,
	nbag = NULL, nsample = length(z), lp.reg.method = "lm",
	null.scale = "QQ", approx.method = "direct", ngrid = 2000,
	centering = TRUE, coef.smooth = "BIC", fdr.method = "locfdr",
	plot = TRUE, rel.null = "custom", locfdr.df = 10,
	fdr.th.fixed = NULL, parallel = FALSE, ...)

Arguments

X

An $n$-by-$d$ matrix of covariate values.

z

A length-$n$ vector containing observations of z values.

X.target

A $k$-by-$d$ matrix providing $k$ sets of covariates for target cases to investigate. Set to NULL to investigate all cases and provide global inference results.

z.target

A vector of length $k$, providing the target $z$ values to investigate.

m

An ordered pair. The first number indicates how many LP-nonparametric basis functions to construct for each $X$; the second indicates how many to construct for $z$. Default: m=c(4,6).

alpha

Confidence level for determining signals.

nbag

Number of bags of parametric bootstrapped samples to use for each target case. For each bag, a new set of relevance samples is generated for analysis, and the resulting fdr curves are aggregated by taking their mean values. Set to NULL to disable.

nsample

Number of relevance samples generated for each case. The default is the length of the input z.

lp.reg.method

Method for estimating the relevance function and its conditional LP-Fourier coefficients. We currently support three options: lm (inbuilt with subset selection), glmnet, and knn.

null.scale

Method for estimating the null standard deviation from the LASER samples. Available options: "IQR", "QQ" and "locfdr".

approx.method

Method used to approximate the customized fdr curve; the default is "direct". When set to "indirect", the customized fdr is computed by modifying the pooled fdr using the relevant density function.

ngrid

Number of grid points to use for computing the customized fdr curve.

centering

Whether to perform regression-adjustment to center the data, default is TRUE.

coef.smooth

Specifies the method to use for LP coefficient smoothing (AIC or BIC). Uses BIC by default.

fdr.method

Method for controlling false discoveries (either "locfdr" or "BH"), default choice is "locfdr".

plot

Whether to include plots in the results, default is TRUE.

rel.null

How the relevant null changes with x: "custom" denotes we allow it to vary with x, and "th" denotes fixed.

locfdr.df

Degrees of freedom to use for locfdr().

fdr.th.fixed

Use a fixed fdr threshold for finding signals. The default is NULL, which finds different thresholds for different cases.

parallel

Whether to use parallel computing for obtaining the relevance samples. Mainly useful when nsample is very large; the default is FALSE.

...

Extra parameters to pass to other functions. Currently only supports the arguments for knn().
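To give some intuition for how null.scale, alpha and fdr.method interact, here is a minimal base-R sketch on simulated data. It is not the package's customized procedure: it assumes a pooled, mean-zero null, uses the standard IQR-based robust estimate of a normal standard deviation, and applies ordinary Benjamini-Hochberg via p.adjust(). Whether g2l.proc computes its "IQR" scale or its "BH" decisions in exactly this way is not stated in this documentation.

```r
# Illustrative sketch only: pooled analysis on simulated z-values,
# not the customized (covariate-dependent) procedure of g2l.proc.
set.seed(1)
z <- c(rnorm(95), rnorm(5, mean = 5))   # 95 null cases, 5 shifted signals
alpha <- 0.1                            # same default as in Usage

# Robust null scale: for a normal distribution, IQR = sigma * 1.349...
sigma0 <- IQR(z) / (qnorm(0.75) - qnorm(0.25))

# Two-sided p-values under the estimated N(0, sigma0^2) null
p <- 2 * pnorm(-abs(z) / sigma0)

# Benjamini-Hochberg adjustment and the resulting 0/1 discovery indicator
p_bh <- p.adjust(p, method = "BH")
signal <- as.integer(p_bh <= alpha)

sum(signal)                             # number of discoveries
```

The IQR is used here because it is insensitive to the few large signal values in the tails, which would otherwise inflate a plain sd(z).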

Value

A list containing the following items:

macro

Available when X.target is set to NULL. Contains the following items:

$result

A list of global inference results:

$X

Matrix of covariates, same as input X.

$z

Vector of observations, same as input z.

$probnull

A vector of length $n$, indicating how likely it is that each observed z belongs to the local null.

$signal

A binary vector of length $n$; discoveries are indicated by $1$.

$plots

A list of plots for global inference:

$signal_x

A plot of the discovered signals, marked in red.

$dps_xz

A scatterplot of z on x, colored based on the discovery propensity scores, only available when fdr.method = "locfdr".

$dps_x

A scatterplot of discovery propensity scores on x, only available when fdr.method = "locfdr".

micro

Available when X.target is provided. Contains the following items:

$result

Customized estimates of the null probabilities for the target $X$ and $z$.

$result$signal

A binary vector of length $k$; discoveries in the target cases are indicated by $1$.

$global

Pooled global estimates of the null probabilities for the target $X$ and $z$.

$plots

Customized fdr plots for the target cases.

m.lp

Same as input m.
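As a rough guide to navigating the returned list, the sketch below builds a hypothetical stand-in with the element names documented above (macro, result, probnull, signal, m.lp) and shows how discoveries might be pulled out. The numbers and the rule used to fill in signal are made up for illustration; a real return value would come from a call to g2l.proc.

```r
# Hypothetical stand-in for a g2l.proc() result with X.target = NULL.
# Only the element names follow the documentation; all values are invented.
probnull <- c(0.92, 0.03, 0.71, 0.08, 0.55)
res <- list(
  macro = list(
    result = list(
      probnull = probnull,
      signal   = as.integer(probnull <= 0.1)  # invented rule, illustration only
    )
  ),
  m.lp = c(4, 6)
)

# Indices of the cases flagged as discoveries
hits <- which(res$macro$result$signal == 1)
hits

# Their estimated null probabilities
res$macro$result$probnull[hits]
```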

References

Mukhopadhyay, S., and Wang, K. (2021) "On The Problem of Relevance in Statistical Inference". <arXiv:2004.09588>

Examples


## Macro-inference using locfdr and LASER
data(funnel)
X <- funnel$x
z <- funnel$z
g2l_macro <- g2l.proc(X, z)
g2l_macro$macro$plots

## Micro-inference for the DTI data: case A with x = (18, 55) and z = 3.95
data(data.dti)
X <- cbind(data.dti$coordx, data.dti$coordy)
z <- data.dti$z
g2l_x <- g2l.proc(X, z, X.target = c(18, 55), z.target = 3.95, nsample = 3000)
## Customized fdr plot for the target case, zoomed to 0 <= x <= 4
g2l_x$micro$plots$fdr.1 + ggplot2::coord_cartesian(xlim = c(0, 4))
g2l_x$micro$result[4]
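The micro-inference example uses a single target case. For several target cases, the Arguments section describes X.target as a $k$-by-$d$ matrix with a matching length-$k$ z.target. A minimal sketch of that layout follows; the second target case is hypothetical and is not taken from the DTI data.

```r
# Two target cases in the documented k-by-d layout (k = 2, d = 2).
# Row 1 is case A from the example; row 2 is a hypothetical extra case.
X.target <- rbind(c(18, 55),
                  c(40, 20))
z.target <- c(3.95, -2.10)

# One row of covariates per target z value
stopifnot(is.matrix(X.target), nrow(X.target) == length(z.target))
dim(X.target)
```

Under these assumptions the pair would be passed on as g2l.proc(X, z, X.target = X.target, z.target = z.target).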
