EstStempCens: ML estimation in spatio-temporal model with censored/missing responses

Description

Return the maximum likelihood estimates of the unknown parameters of spatio-temporal model with censored/missing responses. The estimates are obtained using SAEM algorithm. Also, the function computes the observed information matrix using the method developed by Thomas (1982). The types of censoring considered are left, right or missing values.

Usage

EstStempCens(y, x, cc, tempo, coord, inits.phi, inits.rho, inits.tau2,
  tau2.fixo = FALSE, type.Data = "balanced", cens.type = "left",
  method = "nlminb", kappa = 0, type.S = "exponential",
  IMatrix = TRUE, lower.lim = c(0.01, -0.99, 0.01), upper.lim = c(30,
  0.99, 20), M = 20, perc = 0.25, MaxIter = 300, pc = 0.2,
  error = 10^-6)

Arguments

a vector of responses.

a matrix or vector of covariates.

a vector of censoring indicators. For each observation: 1 if censored/missing and 0 if non-censored/non-missing.

tempo

a vector of time.

coord

a matrix of coordinates of the spatial locations.

inits.phi

initial value of the spatial scaling parameter.

inits.rho

initial value of the time scaling parameter.

inits.tau2

initial value of the the nugget effect parameter.

tau2.fixo

TRUE or FALSE. Indicate if the nugget effect (\(\tau^2\)) parameter must be fixed. By default = FALSE.

type.Data

type of the data: 'balanced' for balanced data and 'unbalanced' for unbalanced data. By default = balanced.

cens.type

type of censoring: 'left' for left censoring, 'right' for rigth censoring and 'missing' for missing response. By default = left.

method

optimization method used to estimate (\(\phi\), \(\rho\) and \(\tau^2\)): 'optim' for the function optim and 'nlminb' for the function nlminb. By default = nlminb.

kappa

parameter for all spatial covariance functions. In the case of exponential, gaussian and spherical function \(\kappa\) is equal to zero.

type.S

type of spatial covariance function: 'exponential' for exponential, 'gaussian' for gaussian, 'matern' for matern, 'pow.exp' for power exponential and 'spherical' for spherical function, respectively. Default is exponential function.

IMatrix

TRUE or FALSE. Indicate if the observed information matrix will be computed. By default = TRUE.

lower.lim, upper.lim

vectors of lower and upper bounds for the optimization method. If unspecified, the default is c(0.01,-0.99,0.01) for the lower bound and c(30,0.99,20) for the upper bound.

number of Monte Carlo samples for stochastic aproximation. By default = 20.

perc

percentage of burn-in on the Monte Carlo sample. By default = 0.25.

MaxIter

the maximum number of iterations of the SAEM algorithm. By default = 300.

percentage of iterations of the SAEM algorithm with no memory. By default = 0.2

error

the convergence maximum error. By default = 10^-6.

Value

The function returns an object of class Est.StempCens which is a list given by:

data.model: Returns a list with all data components given in input.
results.model: A list given by:

theta

final estimation of \(\theta = (\beta, \sigma^2, \tau^2, \phi, \rho)\).

Theta

estimated parameters in all iterations, \(\theta = (\beta, \sigma^2, \tau^2, \phi, \rho)\).

beta

estimated \(\beta\).

sigma2

estimated \(\sigma^2\).

tau2

estimated \(\tau^2\).

phi

estimated \(\phi\).

rho

estimated \(\rho\).

PsiInv

estimated \(\Psi^-1\).

Cov

estimated \(\Sigma\).

SAEMy

stochastic approximation of the first moment for the truncated normal distribution.

SAEMyy

stochastic approximation of the second moment for the truncated normal distribution.

Hessian

Hessian matrix, the negative of the conditional expected second derivative matrix given the observed values.

Louis

the observed information matrix using the Louis' method.

loglik

log likelihood for SAEM method.

AIC

Akaike information criteria.

BIC

Bayesian information criteria.

AICcorr

corrected AIC by the number of parameters.

iteration

number of iterations needed to convergence.

Details

The spatio-temporal Gaussian model is giving by:

\( Y(s_{i},t_{j})= \mu(s_{i},t_{j})+ Z(s_{i},t_{j}) + \epsilon(s_{i},t_{j}),\)

where, the deterministic term \(\mu(s_{i},t_{j})\) and the stochastic terms \(Z(s_{i},t_{j})\), \(\epsilon(s_{i},t_{j})\) can depend on the observed spatio-temporal index for \(Y(s_{i},t_{j})\). We assume \(Z\) is normally distributed with zero-mean and covariance matrix \(\Sigma_z = \sigma^2 \Omega_{\phi\rho}(s,t)\) where \(\sigma^2\) is the model variance, \(\phi\) and \(\rho\) are the spatial and time scaling parameters; \(\epsilon(s_i,t_j)\) is an independent and identically distributed measurement error with \(E[\epsilon(s_i,t_j)]=0\), variance function \(Var[\epsilon(s_i,t_j)]=\tau^2\) (the nugget effect) and \(Cov[\epsilon(s_i,t_j), \epsilon(s_k,t_l)]=0\) \(for all s_i =! s_k\) and \(t_j =! t_l\).

In particular, we define \(\mu(s_i,t_j)\), the mean of the stochastic process as

\(\mu(s_i,t_j)=\sum_{k=1}^{p} x_k(s_i,t_j)\beta_k,\)

where \(x_1(s_i,t_j),..., x_p(s_i,t_j)\) are known functions of \((s_i,t_j)\), and \(\beta_1,...,\beta_p\) are unknown parameters to be estimated. Equivalently, in matrix notation, we have our spatio-temporal linear model as follows:

\(Y = X \beta + Z + \epsilon,\)

\(Z ~ N(0,\sigma^2 \Omega_{\phi\rho}),\)

\(\epsilon ~ N(0,\tau^2 I).\)

Therefore the spatio-temporal process, \(Y\), has normal distribution with mean \(E[Y]=X\beta\) and variance \(\Sigma=\sigma^2\Omega_{\phi\rho}+\tau^2 I_m\). We assume that \(\Sigma\) is non-singular, and \(X\) has full rank. Using standard geostatistical terms, \(\tau^2\) is the nugget effect (or the measurement error variance), \(\sigma^2\) the sill, and \(\Omega_{\phi\rho}=[r_{ij}]\) is the \(m x m\) spatio-temporal correlation matrix with diagonal elements \(r_{ii}=1\), for \(i=1,...,m\).

The estimation process was computed via SAEM algorithm initially proposed by Deylon et. al.(1999).

Examples

Run this code

# NOT RUN {
# Initial parameter values
beta <- c(-1,1.50); phi <- 5; rho <- 0.45; tau2 <- 0.80; sigma2 <- 1.5
# Simulating data
n1 <- 9    # Number of spatial locations
n2 <- 4    # Number of temporal index
set.seed(700)
x.coord <- round(runif(n1,0,10),9)   # X coordinate
y.coord <- round(runif(n1,0,10),9)   # Y coordinate
coordenadas <- cbind(x.coord,y.coord) # Cartesian coordinates without repetitions
coord2 <- cbind(rep(x.coord,each=n2),rep(y.coord,each=n2)) # Cartesian coordinates with repetitions
time <- as.matrix(seq(1,n2,1))      # Time index without repetitions
time2 <- as.matrix(rep(time,n1))    # Time index with repetitions
x1 <- rexp(n1*n2,2)
x2 <- rnorm(n1*n2,2,1)
x <- cbind(x1,x2)
media <- x%*%beta
# Covariance matrix
H <- as.matrix(dist(coordenadas)) # Spatial distances
Mt <- as.matrix(dist(time))       # Temporal distances
Cov <- CovarianceM(phi,rho,tau2,sigma2,distSpa=H,disTemp=Mt,kappa=0,type.S="exponential")
# Data
require(mvtnorm)
y <- as.vector(rmvnorm(1,mean=as.vector(media),sigma=Cov))
perc <- 0.2
aa=sort(y);  bb=aa[1:(perc*n1*n2)];  cutof<-bb[perc*n1*n2]
cc=matrix(1,(n1*n2),1)*(y<=cutof)
y[cc==1] <- cutof
# Estimation
est_teste <- EstStempCens(y, x, cc, time2, coord2, inits.phi=3.5, inits.rho=0.5, inits.tau2=0.7,
                          type.Data="balanced", cens.type="left", method="nlminb", kappa=0,
                          type.S="exponential",
                          IMatrix=TRUE, lower.lim=c(0.01,-0.99,0.01), upper.lim=c(30,0.99,20), M=20,
                          perc=0.25, MaxIter=25, pc=0.2, error = 10^-6)

# }
# NOT RUN {
# New York data
library(spTimer)
library(sp)
library(rgdal)
# Transform coordinates
station <- as.data.frame(NYdata[,2:3])
names(station) <- c("lon","lat")
coordinates(station) <- ~lon+lat
proj4string(station) <- CRS("+init=epsg:32116 +proj=longlat +datum=NAD83 +no_defs
                             +ellps=GRS80 +towgs84=0,0,0")
station2 <- spTransform(station,CRS("+proj=utm +zone=18 +north +datum=NAD83"))
station <- as.data.frame(station2)
names(station) <- c("x.Coord","y.Coord")
NYdata <- cbind(NYdata,station)
coord <- unique(station)
EstEstimation <- c(1,2,3,5,6,7,8,9,11,12,13,14,15,16,17,18,19,20,22,23,24,26,27,28)
EstEstimation <- coord[EstEstimation,]
dataEstimation <- NYdata[(NYdata[,11]%in%EstEstimation[,1])&(NYdata[,12]%in%EstEstimation[,2]),]
cc <- vector("numeric",length=nrow(dataEstimation))
cc[is.na(dataEstimation$o8hrmax)==1] <- 1
time <- rep(seq(1,62),nrow(EstEstimation))
dados <- cbind(dataEstimation,cc,time)
names(dados) <- c("s.index","Longitude","Latitude","Year","Month","Day","o8hrmax","cMAXTMP",
                  "WDSP","RH","x.Coord","y.Coord","censure","time")
# Data
y     <- dados$o8hrmax
cc    <- dados$censure
x     <- as.matrix(cbind(dados$cMAXTMP,dados$WDSP,dados$RH))
tempo <- as.data.frame(dados$time)
coord <- as.matrix(cbind(dados$x.Coord/1000,dados$y.Coord/1000)) # Coordinates in Km
# Power exponential model for the spatial covariance structure
est <- EstStempCens(y, x, cc, tempo, coord, inits.phi = 50, inits.rho = 0.30, inits.tau2 = 25,
                    type.Data="balanced", cens.type="missing", method="nlminb", kappa=0.50,
                    type.S="pow.exp",
                    IMatrix=TRUE, lower.lim=c(0.01,-0.80,0.01), upper.lim=c(500,0.80,150), M=20,
                    perc=0.25, MaxIter=500, pc=0.2, error = 10^-5)
# }

Run the code above in your browser using DataLab