simContinuous(simPopObj, additional = "netIncome",
method = c("multinom", "lm"), zeros = TRUE,
breaks = NULL, lower = NULL, upper = NULL,
equidist = TRUE, probs = NULL, gpd = TRUE,
threshold = NULL, est = "moments", limit = NULL,
censor = NULL, log = TRUE, const = NULL,
alpha = 0.01, residuals = TRUE, keep = TRUE,
maxit = 500, MaxNWts = 1500,
tol = .Machine$double.eps^0.5, nr_cpus = NULL, eps = NULL,
regModel = "basic", byHousehold = FALSE, imputeMissings = FALSE, seed)

simPopObj: a simPopObj holding household survey data, population data and optionally some margins.
additional: [...] dataS that should be simulated for the population data. Currently, only one additional variable can be simulated at a time.
method: [...] "multinom", for using multinomial log-linear models combined with random draws from the resulting categories, and "lm" [...]
zeros: whether additional is semi-continuous, i.e., contains a considerable amount of zeros. If TRUE and method is "multinom", a separate factor level for zeros in [...]
breaks: [...] additional. If NULL, break points are computed using weighted quantile[...]
lower, upper: if breaks is NULL, these can be used to specify lower and upper bounds other than minimum and maximum, respectively. Note that if method is "multinom" [...]
equidist: if method is "multinom" and breaks is NULL, this indicates whether the (positive) default break points should be equidistant or whether there should be refinements in the lower and upper tail [...]
probs: if method is "multinom" and breaks is NULL, this gives probabilities for quantiles to be used as (positive) break points. If supplied, this is preferred over [...]
gpd: if method is "multinom", this indicates whether the upper tail of the variable specified by additional should be simulated by random draws from a (truncated) generalized Pareto distribution rather than a uni[...]
threshold: if method is "multinom", values for categories above threshold are drawn from a (truncated) generalized Pareto distribution.
est: if method is "multinom", the estimator to be used to fit the generalized Pareto distribution.
censor: [...] data.frames; if multinomial models are computed, this can be used to account for structural zeros. The names of the list components specify the categories that should be censored. For each of these categorie[...]
log: if method is "lm", this indicates whether the linear model should be fitted to the logarithms of the variable specified by additional. The predicted values are then back-transformed with the exponential func[...]
const: if method is "lm" and log is TRUE, this gives a constant to be added before log transformation.
alpha: if method is "lm", this gives trimming parameters for the sample data. Trimming is thereby done with respect to the variable specified by additional. If a numeric vector of length two is supplied, the first [...]
residuals: if method is "lm", this indicates whether the random error terms should be obtained by draws from the residuals. If FALSE, they are drawn from a normal distribution (median and MAD of the residuals are used [...]
keep: [...] TRUE, the corresponding column name is given by additional with post[...]
tol: if method is "lm" and zeros is TRUE, a small positive numeric value or NULL. When fitting a log-linear model within a stratum, factor levels may not exist in the sample but are likely to exi[...]
eps: [...] NULL (the default). In the former case and if (multinomial) log-linear models are computed, estimated probabilities smaller than this are assumed to result from structural zeros and are set to exactly 0.

Returns a simPopObj containing survey data as well as the simulated population data including the continuous variable specified by additional and possibly simulated categories for the desired continuous variable.

If method is "lm", the behavior for two-step models is described in the following. If zeros is TRUE and log is not TRUE or the variable specified by additional does not contain negative values, a log-linear model is used to predict whether an observation is zero or not. Then a linear model is used to predict the non-zero values.
If zeros is TRUE, log is TRUE and const is specified, again a log-linear model is used to predict whether an observation is zero or not. In the linear model to predict the non-zero values, const is added to the variable specified by additional before the logarithms are taken.
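The two-step idea for this case (zeros is TRUE, log is TRUE, const is given) can be illustrated with a minimal base-R sketch. It is not simPop's implementation: the data, the single predictor age, the value of const and the use of a plain logistic regression for the zero/non-zero step are all assumptions made for illustration.

```r
# Minimal base-R sketch of a two-step model; NOT simPop's implementation.
set.seed(1)

# Hypothetical sample in which income is zero for part of the observations.
n <- 500
age <- runif(n, 18, 80)
is_pos <- rbinom(n, 1, plogis(-1 + 0.05 * age)) == 1
income <- ifelse(is_pos, exp(7 + 0.02 * age + rnorm(n, sd = 0.5)), 0)
smp <- data.frame(age = age, income = income)
smp$nonzero <- as.integer(smp$income != 0)

const <- 1  # assumed constant added before taking logarithms

# Step 1: logistic model for zero vs. non-zero observations.
fit_zero <- glm(nonzero ~ age, family = binomial, data = smp)

# Step 2: linear model for log(income + const), non-zero observations only.
fit_lm <- lm(log(income + const) ~ age, data = smp[smp$nonzero == 1, ])

# Hypothetical population for which income is simulated.
pop <- data.frame(age = runif(2000, 18, 80))
p_nonzero <- predict(fit_zero, newdata = pop, type = "response")
nonzero <- runif(nrow(pop)) < p_nonzero

# Random error terms drawn from the residuals of the linear model.
err <- sample(residuals(fit_lm), nrow(pop), replace = TRUE)
pred <- predict(fit_lm, newdata = pop) + err

# Back-transform and remove the constant; simulated zeros stay exactly zero.
pop$income <- ifelse(nonzero, exp(pred) - const, 0)
```

Drawing the error terms from the residuals corresponds to the behavior described for residuals = TRUE; a normal distribution could be substituted in the same place.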
If zeros is TRUE, log is TRUE, const is NULL and there are negative values, a multinomial log-linear model is used to predict negative, zero and positive observations. Categories for the negative values are thereby defined by breaks. In the second step, a linear model is used to predict the positive values and negative values are drawn from uniform distributions in the respective classes.
If zeros is FALSE, log is TRUE and const is NULL, a two-step model is used if there are non-positive values in the variable specified by additional. Whether a log-linear or a multinomial log-linear model is used depends on the number of categories to be used for the non-positive values, as defined by breaks. Again, positive values are then predicted with a linear model and non-positive values are drawn from uniform distributions.
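Drawing non-positive values from uniform distributions within classes defined by break points can be sketched as follows. This is an illustration in base R under stated assumptions: the break points, sample values and weights are hypothetical, and weighted class proportions stand in for the probabilities that a fitted (multinomial) log-linear model would predict.

```r
# Sketch: draw a class, then a value uniformly within that class.
set.seed(42)

# Hypothetical break points defining classes for the non-positive values.
breaks <- c(-1000, -500, -100, 0)

# Hypothetical sample values and sampling weights.
x <- c(-900, -700, -450, -300, -200, -50, -20, 0, -10, -600)
w <- rep(1, length(x))

# Categorize; include.lowest keeps the minimum inside the first class.
cls <- cut(x, breaks = breaks, include.lowest = TRUE)

# Weighted class proportions (stand-in for model-based probabilities).
p <- tapply(w, cls, sum) / sum(w)

# Draw class indices, then uniform values between the class limits.
n_pop <- 1000
k <- sample(seq_along(p), n_pop, replace = TRUE, prob = p)
sim <- runif(n_pop, min = breaks[k], max = breaks[k + 1])
```

Within each class the draws carry no further structure, which is why the choice of breaks determines how finely the non-positive part of the distribution is reproduced.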
The number of CPUs is selected automatically in the following manner. The number of CPUs used is equal to the number of strata. However, if the number of available CPUs is less than the number of strata, the number of CPUs minus one is used by default. This should be the best strategy, but the user can also override this decision.
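One reading of this rule can be written as a small helper. The function choose_cpus is hypothetical and not part of simPop; the lower bound of one CPU and the fallback when the core count cannot be detected are assumptions added so the sketch always returns a usable value.

```r
# Hypothetical helper mirroring the rule described above; not a simPop function.
choose_cpus <- function(n_strata, available = parallel::detectCores()) {
  if (is.na(available)) available <- 1L  # detectCores() may return NA
  if (available >= n_strata) {
    n_strata                 # enough CPUs: one per stratum
  } else {
    max(1L, available - 1L)  # fewer CPUs than strata: leave one free
  }
}

choose_cpus(n_strata = 9, available = 4)  # 3
choose_cpus(n_strata = 3, available = 8)  # 3
```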
simStructure, simCategorical,
simComponents, simEUSILC

data(eusilcS)
inp <- specifyInput(data=eusilcS, hhid="db030", hhsize="hsize", strata="db040", weight="db090")
simPop <- simStructure(data=inp, method="direct",
basicHHvars=c("age", "rb090", "hsize", "pl030", "pb220a"))
# multinomial model with random draws
eusilcM <- simContinuous(simPop, additional="netIncome", upper=200000, equidist=FALSE)
class(eusilcM)
# two-step regression
eusilcT <- simContinuous(simPop, additional="netIncome", method = "lm")
class(eusilcT)