Functions for producing EBLUP small area estimates of the dynamic or Rao-Yu time series models through either ML or REML estimation of the variance components. The functions can fit univariate or multivariate models.
eblupDyn(formula, D, TI, vardir, method = c("REML", "ML"),
MAXITER = 1000, PRECISION = .1e-05, data,
max.rho = NULL, dampening = 1, ...)
eblupRY(formula, D, TI, vardir, method = c("REML", "ML"),
MAXITER = 1000, PRECISION = .1e-05, data,
max.rho = .98, dampening = 0.9, ...)
In the univariate case, a vector of length D*nt with
the eblup estimates. In the multivariate case, a data frame of D*nt rows
and NV columns.
A list summarizing the fit of the model with the following:
model: form of the model: T - Dynamic or Rao-Yu; REML
or ML.
covergence: a logical value indicating whether the
convergence criterion was met.
TI: specified TI for model (see above).
estcoef: a data frame with the estimated model
coefficients (beta) in the first column ,
their asymptotic standard errors (std.error) in the
second column, the t statistics (tvalue) in the third column,
and the p-values (pvalue) of the significance of each
coefficient in last column.
estvarcomp: a data frame with the estimated values
of the variances and correlation coefficients in the first column
(estimate) and their asymptotic standard errors in the
second column (std.error).
iterations: number of iterations performed by the
Fisher-scoring algorithm.
goodness: the log-likelihood and, if REML, the restricted
log-likelihood.
A labelled vector with the estimated variance components, correlations, and number of iterations.
A labelled vector of coefficients of the model or models.
A data frame with D rows and one or more columns of
numeric or character domain identifiers.
An ordered vector of the variance components, which may be
used as starting values for additional iterations, see
dynRYfit.
MSE estimates for eblup.
The g1 term of the MSE estimate.
The g2 term of the MSE estimate.
The g3 term of the MSE estimate.
Estimates based on fixed effects only.
The variance-covariance matrix for the estimates in
coef.
Weights given to the direct estimate in forming eblup.
Weights given to the direct estimate, including effects through estimating the fixed effect coefficients.
Estimates requested by the specified contrasts.
MSE estimates for contrast.est.
The g1 term in the estimation of contrast.mse.
The g2 term in the estimation of contrast.mse.
The g3 term in the estimation of contrast.mse.
Contrast estimates based on the fixed effect model.
Variance estimates for the fixed effect model.
Weight wt1 given to the direct estimate in estimating the contrasts.
Weight wt2 in estimating the contrasts.
Information matrix for the components of delta.
Variance covariance matrix for coef.
The formula or list of formulas implemented.
For a univariate model, a formula for the linear
regression relationship between the dependent variable and the
independent variable(s). The variables included in formula must have
length equal to D*nt and be sorted in ascending order by time
within each domain, where nt is the number of time instances specified
by TI.
For a multivariate model, a list of formulas, one for each
dependent variable. The number of dependent variables, NV, is
determined by the length of the list. The dependent variables included
in the formulas must each have length equal to D*nt and be sorted
in ascending order by time within each component within each domain,
where nt is the number of time instances specified by TI.
The sorting requirement is an extension of the sorting requirement for the
univariate model. Further details of the model specification are given under
Details.
The total number of domains.
If TI is of length 1, it should specify the number of equally
spaced time instances, nt. For example, TI can be the number
of years in an unbroken sequence of years.
If some of the time instances in the time interval are to be omitted from the model,
the first element of TI should be 1, the last element should be the ending
instance, and the elements in between should indicate the other instances to include.
For example, over a period of 6 years, TI=c(1,2, 4:6) omits the 3rd year
from the model, giving nt=5.
For the univariate model, the sampling covariance matrix
for the direct estimates of the D*nt elements of the dependent
variable, where nt is the number of instances specified by TI.
The covariance matrix should be in the form of a square
matrix with D*nt rows and columns. Non-zero covariances between
domains are not allowed, so the matrix must have a block diagonal form
with D blocks, each of which is a square matrix with nt
rows and columns. Note that within domain, non-zero covariances are
allowed over time.
Alternatively, vardir can be a list of
D covariance matrices, each with nt rows and columns.
For the multivariate model, the square covariance matrix for the
D*NV*nt elements of the dependent variables. The matrix must be
in the form of a square matrix with D*NV*nt rows and columns. The
variances and covariances should be in the sort order of time within
dependent variable within domain. Non-zero covariances between domains
are not allowed, but non-zero covariances may be present across time
and between components within a domain.
Alternatively, vardir can be a list of
D covariance matrices, each with NV*nt rows and columns.
Whether restricted maximum likelihood REML or
maximum likelihood ML should be used.
The maximum number of iterations allowed for the Fisher-scoring algorithm.
The convergence tolerance limit for the Fisher-scoring algorithm.
An optional data frame containing the variables named in
formula. By default the variables are taken from the
environment from which eblupDyn is called. Because
vardir will be of a different size than the variables
in formula, data will not be searched for
vardir.
If not NULL, the maximum value allowed for
rho. Note the different defaults for eblupDyn and
eblupRY.
A multiplier of the computed update to parameters after iteration 5. A value less than 1 may slow the iterations but lessens the chance of overshooting the optimum choice. The default values were determined experimentally, but may be modified.
Other parameters passed to dynRYfit that affect
convergence, provide starting values, or request specific results.
The exceptions are y, X, NV, M,
ncolx, and model, which will be determined by
eblupDyn or eblupRY.
Robert E. Fay, Mamadou Diallo
A typical model has the form response ~ terms where response
is the (numeric) response vector and terms is a series of terms that
specifies a linear predictor for response.
A formula has an implied intercept term. To remove this use either
y ~ x - 1 or y ~ 0 + x. See link{formula} for more details of
allowed formulae.
eblupDyn and eblupRY parse formula by calling core
R functions to determine X, then calling dynRYfit.
Factors appearing in terms will be converted to indicator variables
in forming X before calling dynRYfit.
As a last step, eblupDyn or eblupRY finalize the list that
they return.
The additional parameters passed to dynRYfit may
include contrast.matrix, which specifies linear combinations
of estimates within domains, such as the sum over dependent variables
or moving averages across time. Corresponding MSE estimates are provided
for the contrasts.
The argument ids accepts a data frame with D
rows of domain identifiers. These ids are returned in the list from
eblupDyn or eblupRY.
If iter.history is set to TRUE, the returned object will
include additional items with values of statistics at each step of the
iteration; see dynRYfit for details on
delta.hist,
llikelihood.hist,
adj.hist,
s.hist,
ix.hist,
adj.factor.hist, and
warning.hist.
The default action is to include the history only if the iterations fail, in which case the history might suggest what went wrong. In the case of convergence, the history is usually not of interest, in which case omitting it reduces the size of the returned object.
MSE estimation for REML for both the Rao-Yu and dynamic models follows the results summarized in Rao and Molina (2015, pp. 98-111). The MSE estimates incorporate g1, g2, and g3 terms. Our simulations show that the REML estimates have somewhat smaller MSEs than the ML estimates, but this is not reflected in the comparison of the estimated MSEs returned by the functions. The MSE estimates under REML perform quite well on average. The MSE estimates for ML use the same estimator as for REML, but they are modest underestimates of the true MSE in the same simulations.
- Fay, R.E. and Herriot, R.A. (1979). Estimation of income from small places: An application of James-Stein procedures to census data. Journal of the American Statistical Association 74, 269-277.
- Fay, R.E., Planty, M. and Diallo, M.S. (2013). Small area estimates from the National Crime Victimization Survey. Proceedings of the Joint Statistical Meetings. American Statistical Association, pp. 1544-1557.
- Rao, J.N.K. and Molina, I. (2015). Small Area Estimation, 2nd ed. Wiley, Hoboken, NJ.
- Rao, J.N.K. and Yu, M. (1994). Small area estimation by combining time series and cross-sectional data. Canadian Journal of Statistics 22, 511-528.
D <- 20 # number of domains
TI <- 5 # number of years
set.seed(1)
data <- data.frame(Y= mvrnormSeries(D=D, TI=TI, rho.dyn=.9, sigma.v.dyn=1,
sigma.u.dyn=.19, sigma.e=diag(5)), X=rep(1:TI, times=D))
result.dyn <- eblupDyn(Y ~ X, D, TI, vardir = diag(100), data=data)
result.dyn$fit
if (FALSE) {
require(sae)
data(spacetime) # Load data set from sae package
data(spacetimeprox) # Load proximity matrix
D <- nrow(spacetimeprox) # number of domains
TI <- length(unique(spacetime$Time)) # number of time instants
# Fit model ST with AR(1) time effects for each domain
resultST <- eblupSTFH(Y ~ X1 + X2, D, TI, Var, spacetimeprox,
data=spacetime)
resultT <- eblupDyn(Y ~ X1 + X2, D, TI, vardir = diag(spacetime$Var),
data=spacetime, ids=spacetime$Area)
resultT.RY <- eblupRY(Y ~ X1 + X2, D, TI, vardir = diag(spacetime$Var),
data=spacetime, ids=spacetime$Area)
resultST$fit
resultT$fit
resultT.RY$fit
rowsT <- seq(TI, TI*D, by=TI)
data.frame(Domain=spacetime$Area[rowsT], Y=spacetime$Y[rowsT],
EBLUP_ST=resultST$eblup[rowsT],
EBLUB_Dyn=resultT$eblup[rowsT],
EBLUP_RY=resultT.RY$eblup[rowsT])
}
Run the code above in your browser using DataLab