distfreereg (version 1.1)

compare: Compare the simulated statistic distribution with the observed statistic distribution used in distribution-free parametric regression testing

Description

Simulate response data repeatedly with true_mean as the mean and true_covariance as the covariance structure, each time running distfreereg on the simulated data. The observed statistics and p-values are saved, as are the simulated statistics from the first replication.

See the Comparing Distributions with the distfreereg Package vignette for an introduction.

Usage

compare(true_mean, true_method = NULL, true_method_args = NULL, true_covariance,
	true_X = NULL, true_data = NULL, theta = NULL, n = NULL, reps = 1e3,
	prog = reps/10, simulate_args = NULL, err_dist_fun = NULL,
	err_dist_args = NULL, keep = NULL, manual = NULL, update_args = NULL,
	global_override = NULL, ...)

Value

An object of class compare with the following components:

call

The matched call.

Y

The matrix whose columns contain the model outcome values used for the corresponding repetitions.

theta

Supplied vector of parameter values.

true_mean

Supplied object specifying the true mean function.

true_covariance

List containing element(s) that specify the true covariance structure.

true_X

Supplied matrix of true covariate values.

true_data

Supplied data frame of true covariate values.

test_mean

Supplied object specifying the mean function being tested.

covariance

List containing element(s) that specify the test covariance structure.

X

Supplied matrix of test covariate values.

data

Supplied data frame of test covariate values.

observed_stats

The observed statistics collected in each repetition.

mcsim_stats

The simulated statistics from the first repetition. (They are the same for each repetition, because compare uses update.distfreereg.)

p

The p-values for the observed statistics.

dfrs

A list containing the outputs of distfreereg for repetitions specified in keep. Included when keep is not NULL.

manual

A list containing the results of the function specified by the argument manual. Included when manual is not NULL.

Arguments

true_mean

Object specifying the mean structure of the true model. It is used to generate the true values of Y that are passed internally to distfreereg.

true_method

Character vector of length one; specifies the function (e.g., lm) to use to create a model when true_mean is a formula.

true_method_args

Optional list; values are passed to the function specified by true_method.

true_covariance

Named list; specifies the covariance structures of the true error distribution in the format described in the documentation for the covariance argument of distfreereg. Required when true_mean is a function or nls object, or true_method is "nls".

true_X, true_data

Optional numeric matrix or data frame, respectively; specifies the covariate values for the true model. true_X is used when true_mean is a function that has an X or x argument, and true_data is used when true_mean is a formula or model object.

theta

Numeric vector; used as the (true) parameter values for the model when true_mean is a function.

n

Optional integer; indicates how long each simulated data vector should be. Required only when no covariate values are specified for either the true or test mean. Silently converted to integer if numeric.

reps

Integer; specifies number of replications. Silently converted to integer if numeric.

prog

Integer or Inf; if finite, a progress message is given when the current repetition is a multiple of prog. Default value is reps/10, unless reps is less than 10, in which case the default is 1. If Inf, no progress messages are given. Silently converted to integer if finite numeric.

simulate_args

Optional list; specifies additional named arguments to pass to simulate.

err_dist_fun

Character string; specifies the name of the function to be used to simulate errors when true_mean is a function or nls object, or true_method is "nls". See details.

err_dist_args

Optional list; specifies additional named arguments to pass to err_dist_fun.

keep

A vector of integers, or the character string "all". If not NULL, the distfreereg object from each repetition whose number appears in keep is included in the output. Using keep = "all" is equivalent to keep = 1:reps.

manual

Optional function; applied to the distfreereg object created in each repetition. Its results are saved in the list manual in the returned object.

update_args

Optional named list; specifies arguments to pass to update.distfreereg.

global_override

Optional named list; specifies arguments to pass to the override argument of distfreereg on each call to that function.

...

Additional arguments passed to distfreereg. See details.

Author

Jesse Miller

Warnings

The generation of new outcome values requires specifying an error distribution. The default behavior when true_mean is a function, an nls object, or a formula with true_method equal to "nls" is to use a multivariate normal error distribution, but different error-generating functions can be defined by the user. When true_mean is a model object that is not an nls object, or a formula with true_method not equal to "nls", the errors are generated using simulate and are therefore distributed according to that function's specifications.

In short, the asymptotic behavior is determined for a specific (true) error distribution, even though the test itself is distribution-free.

Details

This function allows the user to explore the asymptotic behavior of the distributions involved in the test conducted by distfreereg. If the sample size is large enough and the true covariance matrix of the errors is known or is estimated well enough, then the observed and simulated statistics have nearly the same distribution. How large the sample size must be depends on the details of the situation. This function can be used to determine how large the sample size must be to obtain approximately equal distributions, and to estimate the power of the test against a specific alternative.
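The power estimate mentioned above comes down to a rejection rate: the share of repetitions whose p-value falls at or below the chosen level. The following is a minimal base-R sketch of that arithmetic only. The helper estimate_rejection_rate and the vector p_vals are hypothetical stand-ins, and the exact layout of the p component is not assumed here. The package's own rejection function (see See Also) is presumably the supported route.

```r
# Hypothetical helper: share of p-values at or below alpha, with a
# binomial standard error for that share.
estimate_rejection_rate <- function(p, alpha = 0.05) {
  stopifnot(is.numeric(p), length(p) > 0, alpha > 0, alpha < 1)
  rate <- mean(p <= alpha)
  list(rate = rate, std_error = sqrt(rate * (1 - rate) / length(p)))
}

# Stand-in p-values; in practice these would come from one statistic's
# p-values in the object returned by compare().
set.seed(1)
p_vals <- runif(1000)
res <- estimate_rejection_rate(p_vals, alpha = 0.05)
res$rate
res$std_error
```

When the true and tested models agree, the rate should sit near alpha once the sample size is large enough. Under an alternative, the same quantity estimates power.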

The user specifies a particular true model which is used to generate outcome values. There are three cases:

  • When true_mean is a function, this function determines the mean of the outcome values and err_dist_fun is used to generate errors. The error-generating function will usually include an element of true_covariance as an argument, and in that case must accept the appropriate class of object. For example, if the true covariance is a list of matrices corresponding to a block-diagonal covariance matrix, then err_dist_fun must accept such a list as an argument.

  • When true_mean is an nls object, or when it is a formula and true_method is "nls", the function determined by the formula (in the model call or user-specified, respectively) is used to determine the mean function, and err_dist_fun generates the errors.

  • When true_mean is a model object that is not an nls object, or a formula with true_method not equal to "nls", simulate is used to generate outcome values.

If none of these cases applies to true_mean, then compare cannot be used. (E.g., true_mean cannot be a glm object fitted using a "quasi" family, because simulate does not work for that family.)
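For the block-diagonal situation described in the first case, an error-generating function has to work through a list of matrices. The sketch below is one way this might look, assuming SqrtSigma arrives as a list of square-root blocks whose sizes sum to n. The name block_err_fun is hypothetical and not part of the package.

```r
# Hypothetical error generator for a block-diagonal covariance.
# Assumes 'SqrtSigma' is a list of square matrices, one per block.
block_err_fun <- function(n, reps, SqrtSigma) {
  stopifnot(is.list(SqrtSigma))
  sizes <- vapply(SqrtSigma, nrow, integer(1))
  stopifnot(sum(sizes) == n)
  # Generate each block's errors separately, then stack them row-wise
  blocks <- lapply(SqrtSigma, function(S) {
    Z <- matrix(rnorm(nrow(S) * reps), nrow = nrow(S), ncol = reps)
    S %*% Z
  })
  do.call(rbind, blocks)
}

# Stand-alone check with two blocks of sizes 2 and 3
set.seed(1)
sqrt_blocks <- list(diag(2), 2 * diag(3))
E_block <- block_err_fun(n = 5, reps = 4, SqrtSigma = sqrt_blocks)
dim(E_block)
```

The result has n rows and reps columns, which is the shape required of an error-generating function.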

The user also specifies arguments to pass to distfreereg, most notably a model to test comprising a mean function test_mean and a covariance structure specified by covariance. For each repetition, compare sends the simulated data, as Y or as part of data, to distfreereg.

The true_covariance argument specifies the covariance structure that is available to err_dist_fun for generating errors. The needs of err_dist_fun can vary (for example, the default function uses SqrtSigma to generate multivariate normal errors), so any one of the elements Sigma, SqrtSigma, P, and Q (defined in the documentation of distfreereg) can be specified. Any element needed by err_dist_fun is calculated automatically if not supplied.

The value of err_dist_fun must be a function whose output is a numeric matrix with n rows and reps columns. Each column is used as the vector of errors in one repetition. The error function's arguments can include the special values n, reps, Sigma, SqrtSigma, P, and Q. These arguments are automatically assigned their corresponding values from the values passed to compare. For example, the default value rmvnorm uses SqrtSigma to generate multivariate normal values with mean 0 and covariance Sigma.
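As an illustration of these requirements, here is a sketch of a user-defined error generator that draws heavy-tailed errors instead of normal ones. The function name t_err_fun and its df argument are hypothetical, and the sketch assumes SqrtSigma is supplied as a full n-by-n numeric matrix.

```r
# Hypothetical error generator: Student-t errors rescaled to covariance Sigma.
# Assumes 'SqrtSigma' is a full n x n numeric matrix.
t_err_fun <- function(n, reps, SqrtSigma, df = 5) {
  stopifnot(is.matrix(SqrtSigma), nrow(SqrtSigma) == n, df > 2)
  # i.i.d. t draws scaled to unit variance (the variance of t_df is df/(df - 2))
  Z <- matrix(rt(n * reps, df = df), nrow = n, ncol = reps) * sqrt((df - 2) / df)
  # Each column is one repetition's error vector
  SqrtSigma %*% Z
}

# Stand-alone check with a diagonal covariance matrix, for which the
# elementwise square root is also the matrix square root
set.seed(1)
Sigma_demo <- diag(c(1, 2, 3, 4))
E_t <- t_err_fun(n = 4, reps = 6, SqrtSigma = sqrt(Sigma_demo))
dim(E_t)
```

Going by the argument descriptions above, such a function would presumably be supplied by name, as in err_dist_fun = "t_err_fun", with err_dist_args = list(df = 5) carrying the extra argument.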

The argument keep is useful for diagnosing problems, but caution should be used lest a very large object be created. It is often sufficient to save the distfreereg objects from only the first few replications.

For more specialized needs, the manual argument allows the calculation and saving of objects during each repetition. For example, using manual = function(x) residuals(x) will save the (raw) residuals from each repetition.
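A sketch of the manual argument in use follows. It reuses the setup from the Examples section and adds only the manual function quoted above. It assumes the distfreereg package is installed. The object name cdfr_manual is illustrative, and reps is kept small only so the sketch runs quickly.

```r
library(distfreereg)

set.seed(20240201)
n <- 100
func <- function(X, theta) theta[1] + theta[2]*X[,1]
Sig <- rWishart(1, df = n, Sigma = diag(n))[,,1]
theta <- c(2,5)
X <- matrix(rexp(n, rate = 1))

# Save the (raw) residuals from each repetition via 'manual'
cdfr_manual <- compare(true_mean = func, true_X = X, true_covariance = list(Sigma = Sig),
                       test_mean = func, X = X, covariance = list(Sigma = Sig),
                       reps = 5, prog = Inf, theta = theta,
                       theta_init = rep(1, length(theta)),
                       manual = function(x) residuals(x))

# The results are stored in the 'manual' component of the output
length(cdfr_manual$manual)
```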

The first repetition creates a distfreereg object. During each subsequent repetition, this object is passed to update.distfreereg to create a new object. The update_args argument can be used to modify this call.

If necessary, global_override can be used to pass an override argument to distfreereg in each repetition. For example, using global_override = list(theta_hat = theta) forces the estimated parameter vector used in the test in each call to be the true parameter vector theta.

See Also

asymptotics, distfreereg, rejection, plot.compare, ks.test.compare

Examples

set.seed(20240201)
n <- 100
func <- function(X, theta) theta[1] + theta[2]*X[,1]
Sig <- rWishart(1, df = n, Sigma = diag(n))[,,1]
theta <- c(2,5)
X <- matrix(rexp(n, rate = 1))
# In practice, 'reps' should be much larger
cdfr <- compare(true_mean = func, true_X = X, true_covariance = list(Sigma = Sig),
                test_mean = func, X = X, covariance = list(Sigma = Sig),
                reps = 10, prog = Inf, theta = theta, theta_init = rep(1, length(theta)))

cdfr$p