asterdata: Object Describing Saturated Aster Model

Description

Encapsulate all data describing a saturated aster model into a single R object. All other functions in the package take model descriptions of this form. The contract for objects of class "asterdata", including those constructed by this function, is described below. The functions validasterdata and is.validasterdata test conformance to the contract.

Usage

asterdata(data, vars, pred, group, code, families, delta,
  response.name = "resp", varb.name = "varb",
  tolerance = 8 * .Machine$double.eps)
validasterdata(object, tolerance = 8 * .Machine$double.eps)
is.validasterdata(object, tolerance = 8 * .Machine$double.eps)

Arguments

Value

An object of class "asterdata" is a list containing the following components.

redata
a data frame having nrow(data) * length(vars) rows and containing variables having names in setdiff(names(data), vars) and also the names "id", response.name, and varb.name. Produced from data using the reshape function. Each variable in setdiff(names(data), vars) is repeated length(vars) times. The variable named response.name is the concatenation of the variables in data with names in vars. The variable named varb.name is a factor having levels vars that says which of the variables in the data frame data correspond to which components of the response vector. The variable named "id" is an integer vector that says which of the individuals (which rows of data) correspond to which rows of redata. Not all objects of class "asterdata" need have an id variable, although all those constructed by this function do.

repred
an integer vector satisfying length(repred) == nrow(redata) specifying the (arrows of the) graphical structure of the aster model for all individuals. Must be nonnegative and satisfy all(repred < seq(along = repred)). Zero indicates the predecessor is an initial node (formerly called root node) of the graph. Nonzero indicates the element of the response vector with index repred[j] is the predecessor of the element of the response vector with index j.

Note that repred is determined by pred but is quite different from it. Firstly, the lengths differ. Secondly, repred is not just a repetition of pred. The numbers in pred, if nonzero, are indices for the vector vars whereas the numbers in repred, if nonzero, are row indices for the data frame redata.

initial
a numeric vector specifying constants associated with initial nodes (formerly called root nodes) of the graphical model for all individuals. If repred[j] == 0 then the predecessor of node j is an initial node associated with the constant initial[j], which must be a positive integer unless the family associated with the arrow from this initial node to node j is infinitely divisible (the only such family currently implemented being Poisson), in which case initial[j] must be a strictly positive and finite real number. If repred[j] != 0, then initial[j] is ignored and may be any numeric value, including NA or NaN. This function always makes initial equal to rep(1, nrow(redata)) but the more general description above is valid for objects of class "asterdata" constructed by hand.

regroup
an integer vector satisfying length(regroup) == nrow(redata) specifying the dependence group graphical structure of the aster model for all individuals. Must be nonnegative and satisfy all(regroup < seq(along = regroup)). Zero indicates the corresponding element of the response vector is not in a dependence group (the corresponding element of the response vector is conditionally independent of all elements of the response vector given its predecessor variable). Nonzero indicates the element of the response vector with index regroup[j] is in the same dependence group as the element of the response vector with index j, which requires repred[regroup[j]] == repred[j] (elements of the response vector in the same dependence group are conditionally independent of all other elements of the response vector given their common predecessor variable but not conditionally independent of each other).

Note that regroup is determined by group but is quite different from it. Firstly, the lengths differ. Secondly, regroup is not just a repetition of group. The numbers in group, if nonzero, are indices for the vector vars whereas the numbers in regroup, if nonzero, are row indices for the data frame redata.

recode
an integer vector satisfying length(recode) == nrow(redata) specifying the annotation of the graphical structure of the aster model for all individuals (which families label which arrows). For component j the arrow in question goes from the element of the response vector indexed by repred[j] to the element of the response vector indexed by j if repred[j] is nonzero, and from the constant initial[j] to the element of the response vector indexed by j if repred[j] is zero. Components are in seq(along = families) and indicate that the arrow for component j is labeled with the family described by families[recode[j]].

Note that regroup[j] == k requires recode[j] == recode[k] when regroup[j] != 0. Also note that recode is determined by code but is different from it. Firstly, the lengths differ. Secondly, recode need not be just a repetition of code. This function always makes recode equal to rep(code, each = nrow(data)), which has the required length nrow(redata), but the more general description above is valid for objects of class "asterdata" constructed by hand.

families
a copy of the argument of the same name of this function except that any character string abbreviations are converted to objects of class "astfam".

redelta
a numeric vector satisfying length(redelta) == nrow(redata) specifying the degeneracies of the aster model for all individuals. If not the zero vector, the degenerate model specified is the limit as $s \to \infty$ of nondegenerate models having conditional canonical parameter vector $\theta + s \delta$ (note that the conditional canonical parameter vector is always used here, regardless of whether conditional or unconditional canonical affine submodels are to be used).

Note that redelta is determined by delta but is different from it. Firstly, the lengths differ. Secondly, redelta need not be just a repetition of delta. This function always makes redelta equal to rep(delta, each = nrow(data)), which has the required length nrow(redata), but the more general description above is valid for objects of class "asterdata" constructed by hand.

response.name
a character string giving the name of the response variable in redata. For this function, a copy of the argument response.name.

In addition an object of class "asterdata" may contain (and those constructed by this function do contain) components pred, group, and code, which are copies of the arguments of the same names of this function. Objects of class "asterdata" not constructed by this function need not contain these additional components, since they may make no sense if the graph for all individuals is not the repetition of isomorphic subgraphs, one for each individual.
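
The relationship between the per-individual vectors (pred, group, code, delta) and the expanded components (repred, regroup, recode, redelta) can be sketched in base R without the package. This is an illustration only, not the package's implementation. It assumes that redata is stacked one variable at a time (all individuals for vars[1], then all individuals for vars[2], and so on); the toy data values are made up and the helper expand_index is hypothetical.

```r
# Toy data: 3 individuals, 3 response variables, 1 covariate (values made up)
data <- data.frame(m1 = c(1, 0, 1), n1 = c(2.5, 0, 1.2),
    n2 = c(6.25, 0, 1.44), x = c(0.1, 0.2, 0.3))
vars <- c("m1", "n1", "n2")
pred <- c(0, 1, 1)
group <- c(0, 0, 2)
code <- c(1, 2, 2)
delta <- c(0, 0, 0)

n <- nrow(data)

# Long format: one row per (individual, variable) pair, stacked by variable
redata <- reshape(data, varying = list(vars), direction = "long",
    timevar = "varb", times = as.factor(vars), v.names = "resp")

# Hypothetical helper: turn indices into vars (one subgraph) into row
# indices into redata (all individuals), keeping zeros as zeros
expand_index <- function(idx, n) {
    k <- rep(seq_along(idx), each = n)          # variable of each row
    i <- rep(seq_len(n), times = length(idx))   # individual of each row
    as.integer(ifelse(idx[k] == 0, 0, (idx[k] - 1) * n + i))
}

repred <- expand_index(pred, n)
regroup <- expand_index(group, n)
recode <- rep(as.integer(code), each = n)
redelta <- rep(delta, each = n)
initial <- rep(1, nrow(redata))

# Conditions taken from the component descriptions above
stopifnot(length(repred) == nrow(redata))
stopifnot(all(repred < seq(along = repred)))
stopifnot(all(regroup < seq(along = regroup)))
in.group <- regroup != 0
stopifnot(all(repred[regroup[in.group]] == repred[in.group]))
stopifnot(all(recode[regroup[in.group]] == recode[in.group]))
```

In this sketch the nonzero entries of pred index vars, while the nonzero entries of repred index rows of redata, which is the distinction drawn in the notes above.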

Details

Response variables in dependence groups are taken to be in the order they appear in the response vector. The first to appear in the response vector is the first canonical statistic for the dependence group distribution, the second to appear the second canonical statistic, and so forth. The number of response variables in the dependence group must match the dimension of the dependence group distribution.
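
As an illustration of this ordering, the members of each dependence group can be recovered from regroup in the order they appear in the response vector. The following is a minimal sketch that relies only on the contract property all(regroup < seq(along = regroup)); the regroup values and the names leader and members are made up for the example.

```r
# Made-up regroup vector for 9 response elements:
# elements 7, 8, 9 are linked to elements 4, 5, 6, respectively
regroup <- c(0L, 0L, 0L, 0L, 0L, 0L, 4L, 5L, 6L)

# Because every link points to an earlier index, one forward pass
# finds the first element (leader) of the group of each element
leader <- seq_along(regroup)
for (j in seq_along(regroup))
    if (regroup[j] != 0) leader[j] <- leader[regroup[j]]

# Indices of each group, in order of appearance in the response vector
members <- split(seq_along(regroup), leader)
members <- members[lengths(members) > 1]   # keep only groups of size > 1
```

Here members[["4"]] lists elements 4 and 7 in that order, so element 4 would supply the first canonical statistic of its group and element 7 the second.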

This function only handles the usual case where each individual corresponds to an isomorphic subgraph of the full graph and all initial nodes (formerly called root nodes) correspond to the constant one. Each row of data is the data for one individual. The vectors vars, pred, group, code, and delta (if not missing) describe the subgraph for one individual (which is the same for all individuals).

In other cases, for which this function lacks the flexibility to construct the appropriate object of class "asterdata", such an object must be constructed by hand, either using R statements not involving this function or by modifying an object produced by this function. See the Value section for a description of such objects. The functions validasterdata and is.validasterdata can be used to check whether objects constructed by hand have been constructed correctly.
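
For objects built by hand, several of the index conditions stated in the Value section are simple to test directly. The function below is a hypothetical stand-in written for illustration: check_index_contract is not part of the package and covers only some of the conditions, so validasterdata should still be used for the real check.

```r
# Hypothetical helper, not part of the package: checks some of the
# conditions on repred, regroup, and recode described for "asterdata"
check_index_contract <- function(repred, regroup, recode, nfam) {
    n <- length(repred)
    idx <- seq_len(n)
    stopifnot(length(regroup) == n, length(recode) == n)
    # indices are nonnegative and point strictly backward
    stopifnot(all(repred >= 0), all(repred < idx))
    stopifnot(all(regroup >= 0), all(regroup < idx))
    # family codes index the families list
    stopifnot(all(recode %in% seq_len(nfam)))
    # members of a dependence group share predecessor and family
    g <- regroup != 0
    stopifnot(all(repred[regroup[g]] == repred[g]))
    stopifnot(all(recode[regroup[g]] == recode[g]))
    invisible(TRUE)
}

# Passes: two individuals, graph m1 -> (n1, n2), with n1 and n2 grouped
ok <- check_index_contract(
    repred = c(0, 0, 1, 2, 1, 2),
    regroup = c(0, 0, 0, 0, 3, 4),
    recode = c(1, 1, 2, 2, 2, 2),
    nfam = 2)

# Fails: element 1 cannot have a later element as its predecessor
bad <- try(check_index_contract(
    repred = c(2, 0, 1, 2, 1, 2),
    regroup = c(0, 0, 0, 0, 3, 4),
    recode = c(1, 1, 2, 2, 2, 2),
    nfam = 2), silent = TRUE)
```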

Examples

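# test1 is an example data set supplied with the package
data(test1)
# m1 has an initial node as predecessor; n1 and n2 both have predecessor m1
# and form one dependence group (group[3] = 2 links n2 to n1), so both
# arrows are labeled with the normal location-scale family (code 2)
fred <- asterdata(test1, vars = c("m1", "n1", "n2"), pred = c(0, 1, 1),
    group = c(0, 0, 2), code = c(1, 2, 2),
    families = list("bernoulli", "normal.location.scale"))
# check that the result conforms to the "asterdata" contract
is.validasterdata(fred)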