tmlenet: Estimate Average Network Effects For Arbitrary (Stochastic) Interventions

Description

Estimate the average network effect among dependent units with known network structure (in presence of interference and/or spillover) using TMLE (targeted maximum likelihood estimation), IPTW (Horvitz-Thompson or the inverse-probability-of-treatment) and GCOMP (parametric G-computation formula).

Usage

tmlenet(DatNet.ObsP0, data, Kmax, sW, sA, Anode, Ynode, f_gstar1,
  Qform = NULL, hform.g0 = NULL, hform.gstar = NULL, NETIDmat = NULL,
  IDnode = NULL, NETIDnode = NULL, verbose = getOption("tmlenet.verbose"),
  optPars = list(alpha = 0.05, lbound = 0.005, family = "binomial", n_MCsims =
  10, runTMLE = c("tmle.intercept", "tmle.covariate"), YnodeDET = NULL, f_gstar2
  = NULL, sep = " ", f_g0 = NULL, h_g0_SummariesModel = NULL,
  h_gstar_SummariesModel = NULL))

Arguments

DatNet.ObsP0

Instance of class DatNet.sWsA returned under the same name by the eval.summaries function. Stores the evaluated summary measures (sW,

data

Observed data, a data.frame with named columns, containing the baseline covariates, exposures (Anode), the outcomes (Ynode) and possibly the network column (NETIDnode), where network is specified by a ve

Kmax

Integer constant specifying the maximal number of friends for any observation in the input data data.frame.

Summary measures constructed from baseline covariates alone. This must be an object of class DefineSummariesClass that is returned by calling the function def.sW.

Summary measures constructed from exposures Anode and baseline covariates. This must be an object of class DefineSummariesClass that is returned by calling the function def.sW.

Anode

Exposure (treatment) variable name (column name in data); exposures can be either binary, categorical or continuous.

Ynode

Outcome variable name (column name in data), assumed normalized between 0 and 1. This can instead be specified on the left-side of the regression formula in argument Qform.

f_gstar1

Either a function or a vector of counterfactual exposures. If a function, must return a vector of counterfactual exposures evaluated based on the summary measures matrix (sW,sA) passed as a named argument "data", therefore, the f

Qform

Regression formula for outcome in Ynode, when omitted (Ynode=NULL), the outcome variable is regressed on all variables defined in sW and sA. See Details.

hform.g0

Regression formula for estimating the conditional density of P(sA | sW) under g0 (the observed exposure mechanism), when omitted, this regression is defined by sA~sW where sA are all summary

hform.gstar

Regression formula for estimating the conditional density P(sA | sW) under interventions f_gstar1 or f_gstar2. When omitted, the same regression formula as in hform.g0 will be used for

NETIDmat

Network specification via matrix of friend IDs (ncol=Kmax, nrow=nrow(data)), where each row i is a vector of i's friends IDs or i's friends row numbers in data if IDnode=N

IDnode

Subject identifier variable in the input data, if not supplied the network string in NETIDnode is assumed to be indexing the row numbers in the input data

NETIDnode

Network specification by a column name in input data consisting of strings that identify the unit's friends by their IDs or their row numbers (two friends are separated by space, e.g., "1 2"; unit with no friends should have an e

verbose

Set to TRUE to print messages on status and information to the console. Turn this on by default using options(tmlenet.verbose=TRUE).

optPars

A named list of additional optional parameters to be passed to tmlenet, such as alpha, lbound, family, n_MCsims, runTMLE, YnodeDET, f_gstar2, se

Value

A named list with 3 items containing the estimation results for:
- EY_gstar1- estimates of the mean counterfactual outcome under (stochastic) intervention functionf_gstar1$(E_{g^*_1}[Y])$.
- EY_gstar2- estimates of the mean counterfactual outcome under (stochastic) intervention functionf_gstar2$(E_{g^*_2}[Y])$,NULLiff_gstar2not specified.
- ATE- additive treatment effect ($E_{g^*_1}[Y]$-$E_{g^*_2}[Y]$) under interventionsf_gstar1vs. inf_gstar2,NULLiff_gstar2not specified.
Each list item above is itself a list containing the items:
- estimates- various estimates of the target parameter (network population counterfactual mean under (stochastic) intervention).
- vars- the asymptotic variance estimates, forIPTWandTMLE.
- CIs- CI estimates atalphalevel, forIPTWandTMLE.
- other.vars- Placeholder for future versions.
- h_g0_SummariesModel- The model fits for P(sA|sW) under observed exposure mechanismg0. This is an object ofSummariesModelR6class.
- h_gstar_SummariesModel- The model fits for P(sA|sW) under interventionf_gstar1orf_gstar2. This is an object ofSummariesModelR6class.
Currently implemented estimators are:
- tmle- Either weighted regression intercept-based TMLE (tmle.intercept- the default) with weights defined by the IPTW weightsh_gstar/h_gNor covariate-based unweighted TMLE (tmle.covariate) that uses the IPTW weights as a covariateh_gstar/h_gN.
- h_iptw- Efficient IPTW based on weights h_gstar/h_gN.
- gcomp- Parametric G-computation formula substitution estimator.

Details

Note that in case when both arguments NETIDnode and NETIDmat are left unspecified the input data are assumed independent, i.e., no network dependency between the observations. All inference will be performed based on the i.i.d. efficient influence curve for the target parameter $(E_{g^*_1}[Y])$.

Also note that the ordering of the friends IDs in NETIDnode or NETIDmat is unimportant.

A special non-negative-integer-valued variable nF is automatically calculated each time tmlenet or eval.summaries functions are called. nF contains the total number of friends for each observation and it is always added as an additional column to the matrix of the baseline-covariates-based summary measures sWmat. The variable nF can be used in the same ways as any of the column names in the input data frame data. In particular, the name nF can be used inside the summary measure expressions (calls to functions def.sW and def.sA) and inside any of the regression formulas (Qform, hform.g0, hform.gstar).

The regression formalas in Qform, hform.g0 and hform.gstar can include any summary measures names defined in sW and sA, referenced by their individual variable names or by their aggregate summary measure names. For example, hform.g0 = "netA ~ netW" is equivalent to hform.g0 = "A + A_netF1 + A_netF2 ~ W + W_netF1 + W_netF2" for sW,sA summary measures defined by def.sW(netW=W[[0:2]]) and def.sA(netA=A[[0:2]]).

Additional parameters

Some of the parameters that control the estimation in tmlenet can be set by calling the function tmlenet_options.

Additional parameters can be also specified as a named list optPars argument of the tmlenet function. The items that can be specified in optPars are:

alpha- alpha-level for CI calculation (0.05 for 95% CIs);
lbound- One value for symmetrical bounds on P(sW | sW).
family- Family specification for regression models, defaults to binomial (CURRENTLY ONLY BINOMIAL FAMILY IS IMPLEMENTED).
n_MCsims- Number of Monte-Carlo simulations performed, each of sample sizenrow(data), for generating new exposures underf_gstar1orf_gstar2(if specified) orf_g0(if specified). These newly generated exposures are utilized when fitting the conditional densities P(sA|sW) and when evaluating the substitution estimatorsGCOMPandTMLEunder stochastic interventionsf_gstar1orf_gstar2.
runTMLE- Choose which of the two TMLEs to run, "tmle.intercept" or "tmle.covariate". The default is "tmle.intercept".
YnodeDET- Optional column name for indicators of deterministic values of outcomeYnode, coded as (TRUE/FALSE) or (1/0); observations withYnodeDET=TRUE/1are assumed to have constant value for theirYnode.
f_gstar2- Either a function or a vector of counterfactual exposure assignments. Used for estimating contrasts (average treatment effect) for two interventions, if omitted, only the average counterfactual outcome under interventionf_gstar1is estimated. The requirements forf_gstar2are identical to those forf_gstar1.
sep- A character separating friend indices for the same observation inNETIDnode.
f_g0- A function for generating true treatment mechanism under observedAnode, if known (for example in a randomized trial). This is used for estimating P(sA|sW) underg0by sampling large vector ofAnode(of lengthnrow(data)*n_MCsims) fromf_g0function;
h_g0_SummariesModel- Previously fitted model for P(sA|sW) under observed exposure mechanismg0, returned by the previous runs of thetmlenetfunction. This has to be an object ofSummariesModelR6class. When this argument is specified, all predictions P(sA=sa|sW=sw) underg0will be based on the model fits provided by this argument.
h_gstar_SummariesModel- Previously fitted model for P(sA|sW) under (stochastic) intervention specified byf_gstar1orf_gstar2. Also an object ofSummariesModelR6class. When this argument is specified, the predictions P(sA=sa|sW=sw) underf_gstar1orf_gstar2will be based on the model fits provided by this argument.

Specifying the counterfactual intervention function (<code>f_gstar1</code> and <code>optPars$f_gstar2</code>)

The functions f_gstar1 and f_gstar2 can only depend on variables specified by the combined matrix of summary measures (sW,sA), which is passed using the argument data. The functions should return a vector of length nrow(data) of counterfactual treatments for observations in the input data.

Specifying the Network of Friends

The network of friends (connections) for observations in the input data can be specified in two alternative ways, using either NETIDnode or NETIDmat input arguments.

NETIDnode - The first (slower) method uses a vector of strings in data[, NETIDnode], where each string i must contain the space separated IDs or row numbers of all units in data thought to be connected to observation i (friends of unit i);

NETIDmat - An alternative (and faster) method is to pass a matrix with Kmax columns and nrow(data) rows, where each row NETIDmat[i,] is a vector of observation i's friends' IDs or i's friends' row numbers in data if IDnode=NULL. If observation i has fewer than Kmax friends, the remainder of NETIDmat[i,] must be filled with NAs. Note that the ordering of friend indices is irrelevant.

IPTW estimator

**********************************************************************

As described in the following section, the first step is to construct an estimator$P_{g_N}(sA | sW)$for the common (ini) conditional density$P_{g_0}(sA | sW)$for common (ini) unit-level summary measures (sA,sW).
The same fitting algorithm is applied to construct an estimator$P_{g^*_N}(sA^* | sW^*)$of the common (ini) conditional density$P_{g^*}(sA^* | sW^*)$for common (ini) unit-level summary measures (sA^*,sW^*) implied by the user-supplied stochastic interventionf_gstar1orf_gstar2and the observed distribution ofW.
These two density estimators form the basis of the IPTW estimator, which is evaluated at the observed N data points$O_i=(sW_i, sA_i, Y_i), i=1,...,N$and is given by$$\psi^{IPTW}_n = \sum_{i=1,...,N}{Y_i \frac{P_{g^*_N}(sA^*=sA_i | sW=sW_i)}{P_{g_N}(sA=sA_i | sW=sW_i)}}.$$

GCOMP estimator

**********************************************************************

TMLE estimator

**********************************************************************

Modeling <code>P(sA|sW)</code> for summary measures <code>(sA,sW)</code>

**********************************************************************

Non-parametric estimation of the common unit-level multivariate joint conditional probability model P_g0(sA|sW), for unit-level summary measures (sA,sW) generated from the observed exposures and baseline covariates $(A,W)=(A_i,W_i : i=1,...,N)$ (their joint density given by $g_0(A|W)Q(W)$), is performed by first constructing the dataset of N summary measures, $(sA_i,sW_i : i=1,...,N)$, and then fitting the usual i.i.d. MLE for the common density P_g0(sA|sW) based on the pooled N sample of these summary measures.

Note that sA can be multivariate and any of its components sA[j] can be either binary, categorical or continuous. The joint probability model for P(sA|sA) = P(sA[1],...,sA[k]|sA) can be factorized as P(sA[1]|sA) * P(sA[2]|sA, sA[1]) * ... * P(sA[k]|sA, sA[1],...,sA[k-1]), where each of these conditional probability models is fit separately, depending on the type of the outcome variable sA[j].

If sA[j] is binary, the conditional probability P(sA[j]|sW,sA[1],...,sA[j-1]) is evaluated via logistic regression model. When sA[j] is continuous (or categorical), its range will be fist partitioned into K bins and the corresponding K bin indicators (B_1,...,B_K), where each bin indicator B_j is then used as an outcome in a separate logistic regression model with predictors given by sW, sA[1],...,sA[k-1]. Thus, the joint probability P(sA|sW) is defined by such a tree of binary logistic regressions.

For simplicity, we now suppose sA is continuous and univariate and we describe here an algorithm for fitting $P_{g_0}(sA | sW)$ (the algorithm for fitting $P_{g^*}(sA^* | sW^*)$ is equivalent, except that exposure A is replaced with exposure A^* generated under f_gstar1 or f_gstar2 and the predictors sW from the regression formula hform.g0 are replaced with predictors sW^* specified by the regression formula hform.gstar).

Generate a dataset of N observed continuous summary measures (sa_i:i=1,...,N) from observed ((a_i,w_i):i=1,...,N). Letsa\\in{sa_i:i=1,...,M

Divide the range of sA values into intervals S=(i_1,...,i_M,i_{M+1}) so that any observed data point sa_i belongs to one interval in S, namely, for each possible value sa of sA there is k\in{1,...,M}, such that, i_k < sa <= i_{k+1}.="" let="" the="" mapping="" b(sa)\in{1,...,m}="" denote="" a="" unique="" interval="" in="" s="" for="" sa,="" such="" that,="" i_{b(sa)}="" <="" sa="" bw_{b(sa)}:="i_{B(sa)+1}-i_{B(sa)}" be="" length="" of="" (bandwidth)="" (i_{b(sa)},i_{b(sa)+1}).="" also="" define="" binary="" indicators="" b_1,...,b_m,="" where="" b_j:="I(B(sa)=j)," all="" j="" and="">B(sa). That is we set b_j to missing ones the indicator I(B(sa)=j) jumps from 0 to 1. Now let sA denote the random variable for the observed summary measure for one unit and denote by (B_1,...,B_M) the corresponding random indicators for sA defined as B_j := I(B(sA) = j) for all j <= b(sA) and B_j:=NA for all j>B(sA).

For each j=1,...,M, fit the logistic regression model for the conditional probability P(B_j = 1 | B_{j-1}=0, sW), i.e., at each j this is defined as the conditional probability of B_j jumping from 0 to 1 at bin j, given that B_{j-1}=0 and each of these logistic regression models is fit only among the observations that are still at risk of having B_j=1 with B_{j-1}=0.

Normalize the above conditional probability of B_j jumping from 0 to 1 by its corresponding interval length (bandwidth) bw_j to obtain the discrete conditional hazards h_j(sW):=P(B_j = 1 | (B_{j-1}=0, sW) / bw_j, for each j. For the summary measure sA, the above conditional hazard h_j(sW) is equal to P(sA \in (i_j,i_{j+1}) | sA>=i_j, sW), i.e., this is the probability that sA falls in the interval (i_j,i_{j+1}), conditional on sW and conditional on the fact that sA does not belong to any intervals before j.

Finally, for any given data-point (sa,sw), evaluate the discretized conditional density for P(sA=sa|sW=sw) by first evaluating the interval number k=B(sa)\in{1,...,M} for sa and then computing \prod{j=1,...,k-1}{1-h_j(sW))*h_k(sW)} which is equivalent to the joint conditional probability that sa belongs to the interval (i_k,i_{k+1}) and does not belong to any of the intervals 1 to k-1, conditional on sW.

The evaluation above utilizes a discretization of the fact that any continuous density f of random variable X can be written as f_X(x)=S_X(x)*h_X(x), for a continuous density f of X where S_X(x):=P(X>x) is the survival function for X, h_X=P(X>x|X>=x) is the hazard function for X; as well as the fact that the discretized survival function S_X(x) can be written as a of the hazards for s

Three methods for defining bin (interval) cuttoffs for a continuous one-dimenstional summary measure sA[j]

{

**********************************************************************

There are 3 alternative methods to defining the bin cutoffs S=(i_1,...,i_M,i_{M+1}) for a continuous summary measure sA. The choice of which method is used along with other discretization parameters (e.g., total number of bins) is controlled via the tmlenet_options() function. See ?tmlenet_options argument bin.method for additional details.

Approach 1 (equal.len): equal length, default.

*********************

The bins are defined by splitting the range of observed sA (sa_1,...,sa_n) into equal length intervals. This is the dafault discretization method, set by passing an argument bin.method="equal.len" to tmlenet_options function prior to calling tmlenet(). The intervals will be defined by splitting the range of (sa_1,...,sa_N) into nbins number of equal length intervals, where nbins is another argument of tmlenet_options() function. When nbins=NA (the default setting) the actual value of nbins is computed at run time by taking the integer value (floor) of n/maxNperBin, for n - the total observed sample size and maxNperBin=1000 - another argument of tmlenet_options() with the default value 1,000.

Approach 2 (equal.mass): data-adaptive equal mass intervals.

*********************

The intervals are defined by splitting the range of sA into non-equal length data-adaptive intervals that ensures that each interval contains around maxNperBin observations from (sa_j:j=1,...,N). This interval definition approach can be selected by passing an argument bin.method="equal.mass" to tmlenet_options() prior to calling tmlenet(). The method ensures that an approximately equal number of observations will belong to each interval, where that number of observations for each interval is controlled by setting maxNperBin. The default setting is maxNperBin=1000 observations per interval.

Approach 3 (dhist): combination of 1 & 2.

*********************

The data-adaptive approach dhist is a mix of Approaches 1 & 2. See Denby and Mallows "Variations on the Histogram" (2009)). This interval definition method is selected by passing an argument bin.method="dhist" to tmlenet_options() prior to calling tmlenet(). } #*************************************************************************************** data(df_netKmax6) # Load observed data data(NetInd_mat_Kmax6) # Load the network ID matrix Kmax <- 6 # Max number of friends in the network #***************************************************************************************

#*************************************************************************************** # EXAMPLES OF INTERVENTION FUNCTIONS: #*************************************************************************************** # Returns a function that will sample A with probability x:=P(A=1)) make_f.gstar <- function(x, ...) { eval(x) f.A_x <- function(data, ...){ rbinom(n = nrow(data), size = 1, prob = x[1]) } return(f.A_x) } # Deterministic f_gstar setting every A=0: f.A_0 <- make_f.gstar(x = 0) # Deterministic f_gstar setting every A=1: f.A_1 <- make_f.gstar(x = 1) # Stochastically sets (100*x)% of population to A=1 with probability 0.2: f.A_.2 <- make_f.gstar(x = 0.2)

#*************************************************************************************** # DEFINING SUMMARY MEASURES: #*************************************************************************************** def_sW <- def.sW(netW2 = W2[[1:Kmax]]) + def.sW(sum.netW3 = sum(W3[[1:Kmax]]), replaceNAw0=TRUE)

def_sA <- def.sA(sum.netAW2 = sum((1-A[[1:Kmax]])*W2[[1:Kmax]]), replaceNAw0=TRUE) + def.sA(netA = A[[0:Kmax]])

#*************************************************************************************** # EVALUATING SUMMARY MEASURES BASED ON INPUT DATA: #*************************************************************************************** # A helper function that can pre-evaluate the summary measures on (O)bserved data, # given one of two ways of specifying the network: res <- eval.summaries(sW = def_sW, sA = def_sA, Kmax = 6, data = df_netKmax6, NETIDmat = NetInd_mat_Kmax6, verbose = TRUE)

#*************************************************************************************** # Specifying the clever covariate regressions hform.g0 and hform.gstar: # Left side can consist of any summary names defined by def.sA (as linear terms) # Right side can consist of any summary names defined by def.sW (as linear terms) & 'nF' #*************************************************************************************** hform.g01 <- "netA ~ netW2 + sum.netW3 + nF" hform.gstar1 <- "netA ~ sum.netW3"

# alternatives: hform.g02 <- "netA + sum.netAW2 ~ netW2 + sum.netW3 + nF" hform.g03 <- "sum.netAW2 ~ netW2 + sum.netW3"

#*************************************************************************************** # Specifying the outcome regression Qform: # Left side is ignored (with a warning if not equal to Ynode) # Right side can by any summary names defined by def.sW, def.sA & 'nF' #*************************************************************************************** Qform1 <- "Y ~ sum.netW3 + sum.netAW2" Qform2 <- "Y ~ netA + netW + nF" Qform3 <- "blah ~ netA + netW + nF"

#*************************************************************************************** # Estimate mean population outcome under deterministic intervention A=0 with 6 friends: # Estimation with regression formulas. #*************************************************************************************** # Note that Ynode is optional when Qform is specified; options(tmlenet.verbose = FALSE) # set to TRUE to print status messages res_K6_1 <- tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA, Anode = "A", f_gstar1 = 0L, Qform = "Y ~ sum.netW3 + sum.netAW2", hform.g0 = "netA ~ netW2 + sum.netW3 + nF", hform.gstar = "netA ~ sum.netW3", IDnode = "IDs", NETIDnode = "Net_str", optPars = list(n_MCsims = 1))

res_K6_1$EY_gstar1$estimates res_K6_1$EY_gstar1$vars res_K6_1$EY_gstar1$CIs

#*************************************************************************************** # Run exactly the same estimators as above, # but using the output "DatNet.ObsP0" of eval.summaries as input to tmlenet: #*************************************************************************************** res_K6_1b <- tmlenet(DatNet.ObsP0 = res$DatNet.ObsP0, Anode = "A", f_gstar1 = 0L, Qform = "Y ~ sum.netW3 + sum.netAW2", hform.g0 = "netA ~ netW2 + sum.netW3 + nF", hform.gstar = "netA ~ sum.netW3", optPars = list(n_MCsims = 1))

res_K6_1b$EY_gstar1$estimates res_K6_1b$EY_gstar1$vars res_K6_1b$EY_gstar1$CIs

#*************************************************************************************** # Same as above but for covariate-based TMLE. #*************************************************************************************** res_K6_2 <- tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA, Anode = "A", f_gstar1 = 0L, Qform = "Y ~ sum.netW3 + sum.netAW2", hform.g0 = "netA ~ netW2 + sum.netW3 + nF", hform.gstar = "netA ~ sum.netW3", IDnode = "IDs", NETIDnode = "Net_str", optPars = list(runTMLE = "tmle.covariate", n_MCsims = 1))

res_K6_2$EY_gstar1$estimates res_K6_2$EY_gstar1$vars res_K6_2$EY_gstar1$CIs

#*************************************************************************************** # SPECIFYING THE NETWORK AS A MATRIX OF FRIEND ROW NUMBERS: #*************************************************************************************** net_ind_obj <- simcausal::NetIndClass$new(nobs = nrow(df_netKmax6), Kmax = Kmax) # generating the network matrix from input data: NetInd_mat <- net_ind_obj$makeNetInd.fromIDs(Net_str = df_netKmax6[, "Net_str"], IDs_str = df_netKmax6[, "IDs"])$NetInd nF <- net_ind_obj$nF # number of friends

data(NetInd_mat_Kmax6) all.equal(NetInd_mat, NetInd_mat_Kmax6) # TRUE

print(head(NetInd_mat)) print(head(nF)) print(all.equal(df_netKmax6[,"nFriends"], nF))

res_K6_net1 <- tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA, Anode = "A", f_gstar1 = f.A_0, Qform = "Y ~ sum.netW3 + sum.netAW2", hform.g0 = "netA ~ netW2 + sum.netW3 + nF", hform.gstar = "netA ~ sum.netW3", NETIDmat = NetInd_mat, optPars = list(runTMLE = "tmle.intercept", n_MCsims = 1))

all.equal(res_K6_net1$EY_gstar1$estimates, res_K6_1$EY_gstar1$estimates) all.equal(res_K6_net1$EY_gstar1$vars, res_K6_1$EY_gstar1$vars) all.equal(res_K6_net1$EY_gstar1$CIs, res_K6_1$EY_gstar1$CIs)

#*************************************************************************************** # EQUIVALENT WAYS OF SPECIFYING INTERVENTIONS f_gstar1/f_gstar2. # LOWERING THE DIMENSIONALITY OF THE SUMMARY MEASURES. #*************************************************************************************** def_sW <- def.sW(sum.netW3 = sum(W3[[1:Kmax]]), replaceNAw0=TRUE) def_sA <- def.sA(sum.netAW2 = sum((1-A[[1:Kmax]])*W2[[1:Kmax]]), replaceNAw0=TRUE)

# can define intervention by function f.A_0 that sets everyone's A to constant 0: res_K6_1 <- tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA, Anode = "A", Ynode = "Y", f_gstar1 = f.A_0, NETIDmat = NetInd_mat) res_K6_1$EY_gstar1$estimates

# equivalent way to define intervention f.A_0 is to just set f_gstar1 to 0: res_K6_1 <- tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA, Anode = "A", Ynode = "Y", f_gstar1 = 0L, NETIDmat = NetInd_mat) res_K6_1$EY_gstar1$estimates

# or set f_gstar1 to a vector of 0's of length nrow(data): res_K6_1 <- tmlenet(data = df_netKmax6, Kmax = Kmax, sW = def_sW, sA = def_sA, Anode = "A", Ynode = "Y", f_gstar1 = rep_len(0L, nrow(df_netKmax6)), NETIDmat = NetInd_mat) res_K6_1$EY_gstar1$estimates

#*************************************************************************************** # EXAMPLE WITH SIMULATED DATA FOR AT MOST 2 FRIENDS AND 1 BASELINE COVARIATE W1 #*************************************************************************************** data(df_netKmax2) head(df_netKmax2) Kmax <- 2 # Define the summary measures: def_sW <- def.sW(W1[[0:Kmax]]) def_sA <- def.sA(A[[0:Kmax]]) # Define the network matrix: net_ind_obj <- simcausal::NetIndClass$new(nobs = nrow(df_netKmax2), Kmax = Kmax) NetInd_mat <- net_ind_obj$makeNetInd.fromIDs(Net_str = df_netKmax2[, "Net_str"], IDs_str = df_netKmax2[, "IDs"])$NetInd

#*************************************************************************************** # Mean population outcome under stochastic intervention P(A=1)=0.2 #*************************************************************************************** # Evaluates the target parameter by sampling exposures A from f_gstar1; # Setting optPars$n_MCsims=100 means that A will be sampled 1000 times ( # for a total sample size of nrow(data)*n_MCsims under f_gstar1) options(tmlenet.verbose = TRUE) res_K2_1 <- tmlenet(data = df_netKmax2, Kmax = Kmax, sW = def_sW, sA = def_sA, Anode = "A", Ynode = "Y", f_gstar1 = f.A_.2, NETIDmat = NetInd_mat, optPars = list(n_MCsims = 100)) res_K2_1$EY_gstar1$estimates res_K2_1$EY_gstar1$vars res_K2_1$EY_gstar1$CIs

#*************************************************************************************** # Estimating the average treatment effect (ATE) for two static interventions: # f_gstar1 sets everyone A=1 vs f_gstar2 sets everyone A=0; #*************************************************************************************** res_K2_2 <- tmlenet(data = df_netKmax2, Kmax = Kmax, sW = def_sW, sA = def_sA, Anode = "A", Ynode = "Y", f_gstar1 = 1, NETIDmat = NetInd_mat, optPars = list(f_gstar2 = 0, n_MCsims = 1)) names(res_K2_2)

# Estimates under f_gstar1: res_K2_2$EY_gstar1$estimates res_K2_2$EY_gstar1$vars res_K2_2$EY_gstar1$CIs

# Estimates under f_gstar2: res_K2_2$EY_gstar2$estimates res_K2_2$EY_gstar2$vars res_K2_2$EY_gstar2$CIs

# ATE estimates for f_gstar1-f_gstar2: res_K2_2$ATE$estimates res_K2_2$ATE$vars res_K2_2$ATE$CIs tmlenet-package for the general overview of the package, def.sW for defining the summary measures, eval.summaries for evaluation and validation of the summary measures, and df_netKmax2/df_netKmax6/NetInd_mat_Kmax6 for examples of network datasets.