Generates data for the different simulation studies presented in the accompanying paper. This function requires the truncnorm package to be installed.
gendata(n, p, corr, E = truncnorm::rtruncnorm(n, a = -1, b = 1), betaE,
SNR, parameterIndex)
n: number of observations
p: number of main effect variables (X)
corr: correlation between predictors
E: simulated environment vector of length n. Can be continuous or integer valued. Factors must be converted to numeric. Default: truncnorm::rtruncnorm(n, a = -1, b = 1)
betaE: exposure effect size
SNR: signal to noise ratio
parameterIndex: simulation scenario index. See details for more information.
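The requirement that factors be converted to numeric before being passed as E can be illustrated with a short sketch. The two-level factor `exposure_factor` and the 0/1 coding are hypothetical choices made for illustration, not part of the documented interface, and gendata() is only called if it is available in the current session.

```r
set.seed(123)
n <- 75

# Hypothetical two-level exposure recorded as a factor (illustrative only)
exposure_factor <- factor(
  sample(c("unexposed", "exposed"), n, replace = TRUE),
  levels = c("unexposed", "exposed")
)

# as.numeric() on a factor returns the level codes (1, 2); subtract 1 for 0/1 coding
E_numeric <- as.numeric(exposure_factor) - 1

# Call gendata() only if it is available (i.e. its package is attached)
if (exists("gendata", mode = "function")) {
  DT <- gendata(n = n, p = 100, corr = 0, E = E_numeric,
                betaE = 2, SNR = 1, parameterIndex = 1)
}
```

Note that calling as.numeric() directly on a factor yields level codes rather than the original labels, so the coding should be checked before the vector is passed on.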
A list with the following elements:
matrix of dimension n x p of simulated main effects
simulated response vector of length n
simulated exposure vector of length n
linear predictor vector of length n
the function f1 evaluated at x_1 (f1(X1))
the function f2 evaluated at x_2 (f2(X2))
the function f3 evaluated at x_3 (f3(X3))
the function f4 evaluated at x_4 (f4(X4))
the value for betaE (the exposure effect size)
the function f1
the function f2
the function f3
the function f4
an n-length vector of the first predictor
an n-length vector of the second predictor
an n-length vector of the third predictor
an n-length vector of the fourth predictor
a character representing the simulation scenario identifier as described in Bhatnagar et al. (2018+)
character vector of causal variable names
character vector of noise variables
We evaluate the performance of our method on three of its defining characteristics: 1) the strong heredity property, 2) non-linearity of predictor effects and 3) interactions.
Truth obeys strong hierarchy (parameterIndex = 1)
Truth obeys weak hierarchy (parameterIndex = 2)
Truth only has interactions (parameterIndex = 3)
Truth is linear (parameterIndex = 4)
Truth only has main effects (parameterIndex = 5)
The functions are from the paper by Lin and Zhang (2006):
f2 <- function(t) 3 * (2 * t - 1)^2
f3 <- function(t) 4 * sin(2 * pi * t) / (2 - sin(2 * pi * t))
f4 <- function(t) 6 * (0.1 * sin(2 * pi * t) + 0.2 * cos(2 * pi * t) + 0.3 * sin(2 * pi * t)^2 + 0.4 * cos(2 * pi * t)^3 + 0.5 * sin(2 * pi * t)^3)
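To get a feel for the shape of these component functions, the following sketch evaluates them on a grid. The definitions are copied from the lines above; the grid over [0, 1] and the names `t_grid` and `curves` are assumptions made for illustration.

```r
# Definitions copied from the documentation above
f2 <- function(t) 3 * (2 * t - 1)^2
f3 <- function(t) 4 * sin(2 * pi * t) / (2 - sin(2 * pi * t))
f4 <- function(t) 6 * (0.1 * sin(2 * pi * t) + 0.2 * cos(2 * pi * t) +
                       0.3 * sin(2 * pi * t)^2 + 0.4 * cos(2 * pi * t)^3 +
                       0.5 * sin(2 * pi * t)^3)

# Evaluate each function on an evenly spaced grid over [0, 1] (assumed range)
t_grid <- seq(0, 1, length.out = 101)
curves <- data.frame(t = t_grid,
                     f2 = f2(t_grid),
                     f3 = f3(t_grid),
                     f4 = f4(t_grid))

# Minimum and maximum of each component over the grid
sapply(curves[, c("f2", "f3", "f4")], range)
```

Plotting the columns of `curves` against `t` (for example with matplot()) shows how strongly each effect departs from linearity.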
The response is generated as
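A minimal sketch of generating a response at a target signal-to-noise ratio is given here. It assumes, without confirmation from the text above, that standard normal noise is scaled by a constant k so that the variance of the linear predictor divided by the variance of the scaled noise equals SNR. The names `Y_star`, `error` and `k`, and the stand-in linear predictor, are hypothetical.

```r
set.seed(42)
n <- 1000
SNR <- 2

# Stand-in linear predictor (hypothetical; gendata() builds its own)
x <- runif(n)
Y_star <- 3 * (2 * x - 1)^2

# Standard normal noise, rescaled so that var(Y_star) / var(k * error) equals SNR
error <- rnorm(n)
k <- sqrt(var(Y_star) / (SNR * var(error)))
Y <- Y_star + k * error

# Empirical signal-to-noise ratio of the simulated response
var(Y_star) / var(k * error)
```

Under this construction a larger SNR shrinks k, so the noise contributes proportionally less to the variance of the response.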
The covariates are simulated as described in Huang et al. (2010). First, we generate [0,1] for
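The covariate design can be sketched as follows, assuming a construction in the style of Huang et al. (2010): independent standard normals truncated to [0, 1] are mixed with a shared component, and a parameter controls how much correlation the mixing induces. The helper `r_trunc01`, the names `w`, `u` and `t_corr`, the mixing formula, and the use of a single shared component for all columns are assumptions for illustration. Base R inverse-CDF sampling stands in for truncnorm so that the block has no dependencies.

```r
set.seed(2010)
n <- 200
p <- 10
t_corr <- 1  # hypothetical parameter controlling correlation among predictors

# Draw from N(0, 1) truncated to [0, 1] by inverting the normal CDF
r_trunc01 <- function(n) qnorm(runif(n, min = pnorm(0), max = pnorm(1)))

w <- matrix(r_trunc01(n * p), nrow = n, ncol = p)  # independent components
u <- r_trunc01(n)                                   # shared component, one per row

# Mix each column with the shared component; values stay inside [0, 1]
x <- (w + t_corr * u) / (1 + t_corr)

dim(x)
range(x)

# Average pairwise correlation induced by the shared component
mean(cor(x)[upper.tri(cor(x))])
```

Setting `t_corr <- 0` returns the independent draws unchanged, which corresponds to uncorrelated predictors in this sketch.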
Lin, Y., & Zhang, H. H. (2006). Component selection and smoothing in multivariate nonparametric regression. The Annals of Statistics, 34(5), 2272-2297.
Huang, J., Horowitz, J. L., & Wei, F. (2010). Variable selection in nonparametric additive models. The Annals of Statistics, 38(4), 2282.
Bhatnagar, S. R., Yang, Y., & Greenwood, C. M. T. (2018+). Sparse additive interaction models with the strong heredity property. Preprint.
# NOT RUN {
# Simulate 75 observations with 100 uncorrelated predictors under scenario 1
DT <- gendata(n = 75, p = 100, corr = 0, betaE = 2, SNR = 1, parameterIndex = 1)
# }