Generate simulated data consisting of responses and predictor values, where the predictors are independent and of mixed types (binary and continuous).
mixone(n, p, sigma, binary)
The number of observations.
The number of predictors.
The error variance.
A boolean argument: binary = TRUE indicates that binary responses are generated and binary = FALSE indicates that continuous responses are generated.
Return a list with the following components.
An n by p data frame of predictor values, with each row corresponding to an observation.
A vector of length n representing response values.
A vector of length n representing the values of \(f0(x)\).
The error variance, which is returned only when binary = FALSE.
A vector of length n representing the values of \(\Phi(f0(x))\), which is returned only when binary = TRUE.
Sample the predictors \(x_1, ..., x_{ceiling(p/2)}\) from Bernoulli(0.5) independently and
\(x_{ceiling(p/2)+1}, ..., x_p\) from Uniform(0, 1) independently.
If binary = FALSE, sample the continuous response \(y\) from Normal(\(f0(x), \sigma^2\)), where
$$f0(x) = 10\sin(\pi x_{ceiling(p/2)+1} x_{ceiling(p/2)+2}) + 20(x_{ceiling(p/2)+3}-0.5)^2 + 10x_1 + 5x_2.$$
If binary = TRUE, sample the binary response \(y\) from Bernoulli(\(\Phi(f0(x))\)), where \(f0\) is defined above and \(\Phi\) is the cumulative distribution function of the standard normal distribution.
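The sampling scheme described above can be sketched in base R. This is a minimal illustration, not the package source: the function name mixone_sketch and the list component names (X, y, f0, sigma, prob) are hypothetical. It also assumes that sigma is passed to rnorm as the standard deviation, as the Normal(\(f0(x), \sigma^2\)) notation suggests, and that p is at least 6 so that every predictor index used by \(f0\) exists.

```r
# Hypothetical re-implementation for illustration only; not the package source.
mixone_sketch <- function(n, p, sigma, binary) {
  k <- ceiling(p / 2)   # number of binary predictors
  stopifnot(p >= 6)     # assumption: f0 needs x_1, x_2 and x_{k+1}, ..., x_{k+3}

  # First k predictors: independent Bernoulli(0.5) draws.
  X_bin <- matrix(rbinom(n * k, size = 1, prob = 0.5), nrow = n)
  # Remaining p - k predictors: independent Uniform(0, 1) draws.
  X_con <- matrix(runif(n * (p - k)), nrow = n)
  X <- as.data.frame(cbind(X_bin, X_con))

  # Mean function f0 as defined in the text.
  f0 <- 10 * sin(pi * X[, k + 1] * X[, k + 2]) +
    20 * (X[, k + 3] - 0.5)^2 + 10 * X[, 1] + 5 * X[, 2]

  if (binary) {
    # Binary response: Bernoulli with success probability Phi(f0(x)).
    prob <- pnorm(f0)
    y <- rbinom(n, size = 1, prob = prob)
    list(X = X, y = y, f0 = f0, prob = prob)
  } else {
    # Continuous response: Normal with mean f0(x) and sd sigma (assumed).
    y <- rnorm(n, mean = f0, sd = sigma)
    list(X = X, y = y, f0 = f0, sigma = sigma)
  }
}

set.seed(1)
d <- mixone_sketch(100, 10, 1, FALSE)
```

With p = 10, columns 1 to 5 of d$X hold the binary predictors and columns 6 to 10 the uniform ones. Only columns 1, 2, 6, 7 and 8 enter \(f0\), so the remaining predictors do not affect the response.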
Luo, C. and Daniels, M. J. (2021) "Variable Selection Using Bayesian Additive Regression Trees." arXiv preprint arXiv:2112.13998.
# NOT RUN {
data = mixone(100, 10, 1, FALSE)
# }