tmle.npvi: Targeted Minimum Loss Estimation of NPVI

Description

Carries out the targeted minimum loss estimation (TMLE) of a non-parametric variable importance measure of a continuous exposure.

Usage

tmle.npvi(obs, f = identity, nMax = 30L, flavor = c("learning", 
    "superLearning"), lib = list(), nodes = 1L, cvControl = NULL, 
    family = c("parsimonious", "gaussian"), cleverCovTheta = FALSE, 
    bound = 1, B = 1e+05, trueGMu = NULL, iter = 5L, stoppingCriteria = list(mic = 0.01, 
        div = 0.01, psi = 0.1), gmin = 0.05, gmax = 0.95, mumin = quantile(f(obs[obs[, 
        "X"] != 0, "X"]), type = 1, probs = 0.01), mumax = quantile(f(obs[obs[, 
        "X"] != 0, "X"]), type = 1, probs = 0.99), verbose = FALSE, 
    tabulate = TRUE, exact = TRUE, light = TRUE)

Arguments

obs

An n x p matrix of observations, with $p \ge 3$.

Column "X" corresponds to the continuous exposure variable (e.g. DNA copy number), or "cause" in a causal model, with a reference value $x_0$ equal to 0.
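
As a rough illustration of the expected layout, the sketch below builds a small matrix and recenters the exposure so that its reference value is 0. The column names "W" and "Y", the reference value `x0`, the simulated data, and the sanity checks are assumptions made for this example; they are not taken from the package.

```r
## Illustrative sketch only: simulated data, not package code.
set.seed(1)
n  <- 50
x0 <- 2                                    # hypothetical reference value of the raw exposure

W <- runif(n)                              # assumed baseline covariate
X <- ifelse(runif(n) < 0.3, x0,            # some observations sit exactly at the reference value
            rnorm(n, mean = 3))
Y <- X + W + rnorm(n)                      # assumed outcome

obs <- cbind(W = W, X = X, Y = Y)

## Shift the exposure so that the reference value becomes 0
obs[, "X"] <- obs[, "X"] - x0

## Hypothetical sanity checks before calling tmle.npvi()
stopifnot(is.matrix(obs),
          ncol(obs) >= 3,
          "X" %in% colnames(obs),
          any(obs[, "X"] == 0))
```

The last check reflects the default values of mumin and mumax in the usage above, which are computed from the rows where "X" differs from 0.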

Value

Returns an object of class "NPVI" summarizing the different steps of the TMLE procedure. The method getHistory outputs the "history" of the procedure (see getHistory). The object notably includes the following information:

obs

The matrix of observations used to carry out the TMLE procedure. Use the method getObs to retrieve it.

psi

The TMLE of the parameter of interest. Use the method getPsi to retrieve it.

psi.sd

The estimated standard deviation of the TMLE of the parameter of interest. Use the method getPsiSd to retrieve it.

Details

The parameter of interest is defined as $\psi=\Psi(P)$ with $$\Psi(P) = \frac{E_P[f(X) * (\theta(X,W) - \theta(0,W))]}{E_P[f(X)^2]},$$ with $P$ the distribution of the random vector $(W,X,Y)$, $\theta(X,W) = E_P[Y|X,W]$, $0$ the reference value for $X$, and $f$ a user-supplied function such that $f(0)=0$ (e.g., $f=identity$, the default value). The TMLE procedure stops when the maximal number of iterations, iter, is reached or when at least one of the following criteria is met:

The empirical mean $P_n effIC(P_n^{k+1})$ of the efficient influence curve at $P_n^{k+1}$, scaled by the estimated standard deviation of the efficient influence curve at $P_n^{k+1}$, is smaller, in absolute value, than mic.

The total variation (TV) distance between $P_n^k$ and $P_n^{k+1}$ is smaller than div.

The change between the successive values $\Psi(P_n^k)$ and $\Psi(P_n^{k+1})$ is smaller than psi.
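
The three criteria above can be sketched as a small check. This is a minimal illustration and not the package's implementation: the helper `checkStopping` and its arguments are hypothetical, the TV distance is taken as an input computed elsewhere, "scaled by" is read as division by the sample standard deviation, and the change in $\Psi$ is read as an absolute difference.

```r
## Hypothetical helper, for illustration only.
##   ic      : values of the efficient influence curve at the current update
##   tv      : TV distance between successive updates (computed elsewhere)
##   psiOld, psiNew : successive values of Psi
##   criteria: thresholds, named as in the 'stoppingCriteria' argument
checkStopping <- function(ic, tv, psiOld, psiNew,
                          criteria = list(mic = 0.01, div = 0.01, psi = 0.1)) {
  c(mic = abs(mean(ic)) / sd(ic) < criteria$mic,   # scaled empirical mean of the influence curve
    div = tv < criteria$div,                       # distance between successive updates
    psi = abs(psiNew - psiOld) < criteria$psi)     # assumed: absolute change in Psi
}

## Made-up numbers: only the first criterion is met
ic  <- c(-1, 1, -0.5, 0.5, 0.001)
res <- checkStopping(ic, tv = 0.2, psiOld = 1.0, psiNew = 1.5)

## The procedure stops as soon as at least one criterion holds
stopNow <- any(res)
```

Under this reading, the procedure would stop at the first iteration where `any(res)` is TRUE, or when iter is reached.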

References

Chambaz, A., Neuvial, P., & van der Laan, M. J. (2012). Estimation of a non-parametric variable importance measure of a continuous exposure. Electronic Journal of Statistics, 6, 1059-1099.

Examples


set.seed(12345)
##
## Simulating a data set and computing the true value of the parameter
##

## Parameters for the simulation (case 'f=identity')
O <- cbind(W=c(0.05218652, 0.01113460),
           X=c(2.722713, 9.362432),
           Y=c(-0.4569579, 1.2470822))
O <- rbind(NA, O)
lambda0 <- function(W) {-W}
p <- c(0, 1/2, 1/2)
omega <- c(0, 3, 3)
S <- matrix(c(10, 1, 1, 0.5), 2 ,2)

## Simulating a data set of 200 i.i.d. observations
sim <- getSample(2e2, O, lambda0, p=p, omega=omega, sigma2=1, Sigma3=S)
obs <- sim$obs

## Adding (dummy) baseline covariates
V <- matrix(runif(3*nrow(obs)), ncol=3)
colnames(V) <- paste("V", 1:3, sep="")
obs <- cbind(V, obs)

## Caution! MAKING '0' THE REFERENCE VALUE FOR 'X'
X0 <- O[2,2]
obsC <- obs
obsC[, "X"] <- obsC[, "X"] - X0
obs <- obsC

## True psi and confidence intervals (case 'f=identity')      
sim <- getSample(1e4, O, lambda0, p=p, omega=omega, sigma2=1, Sigma3=S)
truePsi <- sim$psi

confInt0 <- truePsi + c(-1, 1)*qnorm(.975)*sqrt(sim$varIC/nrow(sim$obs))
confInt <- truePsi + c(-1, 1)*qnorm(.975)*sqrt(sim$varIC/nrow(obs))
cat("Case f=identity:\n")
msg <- paste("\ttrue psi is: ", signif(truePsi, 3), "\n", sep="")
msg <- paste(msg, "\t95%-confidence interval for the approximation is: ",
             signif(confInt0, 3), "\n", sep="")
msg <- paste(msg, "\toptimal 95%-confidence interval is: ",
             signif(confInt, 3), "\n", sep="")
cat(msg)

##
## TMLE procedure
##

## Running the TMLE procedure
npvi <- tmle.npvi(obs, f=identity, flavor="learning", B=5e4, nMax=10)

## Summarizing its results
npvi
setConfLevel(npvi, 0.9)
npvi

history <- getHistory(npvi)
print(round(history, 4))

hp <- history[, "psi"]
hs <- history[, "sic"]
hs[1] <- NA
ics <-  c(-1,1) %*% t(qnorm(0.975)*hs/sqrt(nrow(getObs(npvi))))

pch <- 20
ylim <- range(c(confInt, hp, ics+hp), na.rm=TRUE)

xs <- (1:length(hs))-1
plot(xs, hp, ylim=ylim, pch=pch, xlab="Iteration", ylab=expression(psi[n]),
     xaxp=c(0, length(hs)-1, length(hs)-1))
dummy <- sapply(seq(along=xs), function(x) lines(c(xs[x],xs[x]), hp[x]+ics[, x]))

abline(h=confInt, col=4)
abline(h=confInt0, col=2)
