plrm.est: Semiparametric estimates for the unknown components of the regression function in partial linear regression (PLR) models

Description

This routine computes estimates of $\beta$ and $m(newt_j)$ ($j=1,\dots,J$) from a sample $\{(Y_i, X_{i1}, \dots, X_{ip}, t_i) : i=1,\dots,n\}$, where $\beta = (\beta_1,\dots,\beta_p)^T$ is an unknown vector parameter, $m(\cdot)$ is a smooth but unknown function, and
$$Y_i = X_{i1}\beta_1 + \dots + X_{ip}\beta_p + m(t_i) + \epsilon_i.$$
The random errors $\epsilon_i$ are allowed to be a time series (i.e., serially dependent). The estimation combines kernel smoothing with ordinary least squares.
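The combination of kernel smoothing and least squares can be illustrated with a minimal R sketch. It assumes the Speckman (1988) construction with Nadaraya-Watson weights and the quadratic (Epanechnikov) kernel: $Y$ and the columns of $X$ are smoothed against $t$ with bandwidth b, $\beta$ is estimated by OLS on what remains, and $m$ is then estimated by smoothing $Y - X\hat\beta$ with bandwidth h. The helpers `nw_weights` and `plr_sketch` are hypothetical, not functions of the package, and the routine's exact formulas may differ (for example in boundary handling).

```r
# Minimal sketch (assumption): Speckman-type estimation with Nadaraya-Watson
# weights and the quadratic (Epanechnikov) kernel. Not the package's code.
epan <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)

# Hypothetical helper: matrix of NW weights; row j smooths at t0[j]
nw_weights <- function(t0, t, h) {
  K <- epan(outer(t0, t, "-") / h)
  K / rowSums(K)
}

# Hypothetical helper: estimates of beta and of m at the design points
plr_sketch <- function(y, X, t, b, h = b) {
  X <- as.matrix(X)
  Wb <- nw_weights(t, t, b)
  X.tilde <- X - Wb %*% X                    # (I - W_b) X
  y.tilde <- y - Wb %*% y                    # (I - W_b) Y
  beta <- drop(qr.solve(X.tilde, y.tilde))   # least-squares fit on the smoothed-out data
  Wh <- nw_weights(t, t, h)
  m.t <- drop(Wh %*% (y - X %*% beta))       # smooth the partial residuals
  list(beta = beta, m.t = m.t)
}

# Simulated data in the spirit of Example 2a
set.seed(1)
n <- 100
t <- ((1:n) - 0.5) / n
X <- matrix(rnorm(2 * n), nrow = n)
y <- drop(X %*% c(0.05, 0.01)) + 0.25 * t * (1 - t) + rnorm(n, 0, 0.01)
fit <- plr_sketch(y, X, t, b = 0.2)
fit$beta
```

Using one bandwidth for the parametric step and another for the final smooth mirrors the (b, h) pair in the argument list; by default the sketch sets h = b.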

Usage

plrm.est(data = data, b = NULL, h = NULL, newt = NULL, estimator = "NW", 
kernel = "quadratic")

Value

A list containing:

beta: a vector containing the estimate of $\beta$.
m.t: a vector containing the estimate of the nonparametric part, $m$, evaluated at the design points.
m.newt: a vector containing the estimate of the nonparametric part, $m$, evaluated at newt.
residuals: a vector containing the residuals, Y - X*beta - m.t.
fitted.values: a vector containing the fitted values, X*beta + m.t.
b: the bandwidth used to estimate $\beta$.
h: the second bandwidth; (b, h) is the pair of bandwidths used to estimate $m$.
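The relationships among the returned components can be checked with a small self-contained R sketch. It uses a hypothetical Nadaraya-Watson/OLS fit with an Epanechnikov kernel and b = h = 0.2 (assumed values, not output of plrm.est) and assembles a list with the same element names; m.newt equals m.t here because newt is taken to be the design points.

```r
# Hypothetical sketch: how the Value components relate to each other
set.seed(2)
n <- 50
t <- ((1:n) - 0.5) / n
X <- matrix(rnorm(2 * n), nrow = n)
y <- drop(X %*% c(0.05, 0.01)) + 0.25 * t * (1 - t) + rnorm(n, 0, 0.01)

# NW smoother matrix with an Epanechnikov kernel, bandwidth 0.2 (assumed)
K <- pmax(0.75 * (1 - (outer(t, t, "-") / 0.2)^2), 0)
W <- K / rowSums(K)

beta.hat <- drop(qr.solve(X - W %*% X, y - W %*% y))
m.t <- drop(W %*% (y - X %*% beta.hat))

out <- list(
  beta = beta.hat,
  m.t = m.t,
  m.newt = m.t,                                  # newt = NULL: design points
  residuals = y - drop(X %*% beta.hat) - m.t,    # Y - X*beta - m.t
  fitted.values = drop(X %*% beta.hat) + m.t,    # X*beta + m.t
  b = 0.2,
  h = 0.2
)
```

Because the list contains both m.t and m.newt, the abbreviation out$m is ambiguous under R's partial matching of `$` and returns NULL, so the full element name should be used.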

Arguments

data

data[, 1] contains the values of the response variable, $Y$;

data[, 2:(p+1)] contains the values of the "linear" explanatory variables, $X_1, ..., X_p$;

data[, p+2] contains the values of the "nonparametric" explanatory variable, $t$.
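A data matrix with this layout can be assembled with cbind(). The sketch below uses simulated values and p = 2, both illustrative assumptions, and reads the three blocks of columns back out.

```r
# Sketch of the expected column layout of 'data' (p = 2 linear covariates)
set.seed(3)
n <- 20
p <- 2
y <- rnorm(n)
X <- matrix(rnorm(n * p), nrow = n)
tt <- ((1:n) - 0.5) / n
data <- cbind(y, X, tt)

Y.col  <- data[, 1]            # response Y
X.cols <- data[, 2:(p + 1)]    # "linear" explanatory variables X_1, ..., X_p
t.col  <- data[, p + 2]        # "nonparametric" explanatory variable t
```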

b

Bandwidth used to estimate the parametric part of the model. If both b and h are NULL (the default), it is selected by cross-validation under the constraint b = h; if b is NULL but h is not, b = h is used.

h

Bandwidth used, together with b, to estimate the nonparametric part of the model: (b, h) is the pair of bandwidths for estimating $m$. If both b and h are NULL (the default), they are selected by cross-validation under the constraint b = h; if b is NULL but h is not, b = h is used; if h is NULL but b is not, h = b is used.
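The defaulting rules for b and h can be summarized in a few lines of R. This is a sketch of the documented behaviour only: `resolve_bandwidths` is a hypothetical helper, and `cv_select` is a stand-in for the cross-validation step, not the package's selector.

```r
# Hypothetical helper mirroring the documented defaults for (b, h).
# 'cv_select' is an assumed stand-in for cross-validation; it returns one
# bandwidth, used for both b and h.
resolve_bandwidths <- function(b = NULL, h = NULL, cv_select) {
  if (is.null(b) && is.null(h)) {
    b <- h <- cv_select()   # both NULL: CV with the constraint b = h
  } else if (is.null(b)) {
    b <- h                  # only h given: b = h
  } else if (is.null(h)) {
    h <- b                  # only b given: h = b
  }
  c(b = b, h = h)
}

cv <- function() 0.3        # dummy "CV" result, for illustration only
resolve_bandwidths(h = 0.1, cv_select = cv)
resolve_bandwidths(b = 0.2, cv_select = cv)
resolve_bandwidths(cv_select = cv)
```

Thanks to lazy evaluation, cv_select() is only called when both bandwidths are missing.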

newt

Values of the "nonparametric" explanatory variable at which the estimator of $m$ is evaluated. If NULL (the default), $m$ is evaluated at the values of data[, p+2].
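Evaluating $m$ at newt amounts to applying kernel weights centred at the new points to the partial residuals $Y_i - X_i^T\hat\beta$. The sketch below assumes Nadaraya-Watson weights with the quadratic kernel; `nw_at` is a hypothetical helper, and noise-free partial residuals are used only to make the result easy to check.

```r
# Sketch (assumption): NW estimate of m at arbitrary points 'newt', computed
# from partial residuals r_i = Y_i - X_i' beta.hat with an Epanechnikov kernel.
nw_at <- function(newt, t, r, h) {
  u <- outer(newt, t, "-") / h
  K <- pmax(0.75 * (1 - u^2), 0)
  drop(K %*% r) / rowSums(K)
}

t <- ((1:100) - 0.5) / 100
r <- 0.25 * t * (1 - t)        # noise-free partial residuals, illustration only
newt <- c(0.25, 0.5, 0.75)
m.newt <- nw_at(newt, t, r, h = 0.1)
m.newt
```

A point of newt lying farther than h from every $t_i$ receives no kernel weight and yields NaN (0/0), so newt should stay within the range covered by the design.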

estimator

Selects the smoother: "NW" (Nadaraya-Watson) or "LLP" (local linear polynomial). The default is "NW".
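The difference between the two smoothers is easiest to see at the boundary of the design. The sketch below uses the textbook local-constant (Nadaraya-Watson) and local linear weights with the quadratic kernel; `nw_fit` and `ll_fit` are hypothetical helpers, and the package's "LLP" implementation may differ in detail.

```r
epan <- function(u) pmax(0.75 * (1 - u^2), 0)

# Nadaraya-Watson: locally constant weighted fit at x0
nw_fit <- function(x0, t, r, h) {
  k <- epan((t - x0) / h)
  sum(k * r) / sum(k)
}

# Local linear: intercept at x0 of a kernel-weighted least-squares line
ll_fit <- function(x0, t, r, h) {
  d <- t - x0
  k <- epan(d / h)
  s1 <- sum(k * d)
  s2 <- sum(k * d^2)
  w <- k * (s2 - d * s1)     # equivalent local linear weights
  sum(w * r) / sum(w)
}

t <- ((1:100) - 0.5) / 100
r <- 1 + 2 * t               # a straight line, no noise
c(NW = nw_fit(0, t, r, 0.1), LLP = ll_fit(0, t, r, 0.1))
```

At x0 = 0 only points to the right receive weight, so the locally constant fit is pulled upward, while the local linear fit reproduces the line exactly.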

kernel

Selects the kernel: "gaussian", "quadratic" (Epanechnikov), "triweight" or "uniform". The default is "quadratic".
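For reference, the four kernel choices in their usual normalized forms are written out below. The normalizing constants are an assumption, since the package may scale its kernels differently; Nadaraya-Watson and local linear weights are ratios of kernel values, so a constant rescaling of K leaves them unchanged.

```r
# Usual normalized forms of the four kernels (assumed, not taken from the package)
kernels <- list(
  gaussian  = function(u) dnorm(u),
  quadratic = function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0),
  triweight = function(u) ifelse(abs(u) <= 1, (35 / 32) * (1 - u^2)^3, 0),
  uniform   = function(u) ifelse(abs(u) <= 1, 0.5, 0)
)

# Check that each kernel integrates to 1 over its support
vals <- sapply(names(kernels), function(k) {
  lim <- if (k == "gaussian") Inf else 1
  integrate(kernels[[k]], -lim, lim)$value
})
vals
```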

Author

German Aneiros Perez ganeiros@udc.es

Ana Lopez Cheda ana.lopez.cheda@udc.es

Details

Expressions for the estimators of $\beta$ and $m$ are given on page 52 of Aneiros-Perez et al. (2004).
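For orientation, Speckman-type estimators (Speckman, 1988) take the form below. We assume, without reproducing page 52 of Aneiros-Perez et al. (2004), that the routine follows this construction, with $w_{n,h}$ denoting the Nadaraya-Watson (or local linear) weights built from the chosen kernel $K$:
$$\hat\beta_b = (\tilde X_b^T \tilde X_b)^{-1} \tilde X_b^T \tilde Y_b, \qquad \tilde X_b = (I - W_b)X, \quad \tilde Y_b = (I - W_b)Y,$$
$$\hat m_{b,h}(t) = \sum_{i=1}^{n} w_{n,h}(t, t_i)\,\big(Y_i - X_i^T \hat\beta_b\big),$$
where $X$ is the $n \times p$ matrix with rows $X_i^T = (X_{i1}, \dots, X_{ip})$, $W_b = \big(w_{n,b}(t_i, t_j)\big)_{i,j=1}^{n}$ and, for Nadaraya-Watson weights, $w_{n,h}(t, t_i) = K\big((t - t_i)/h\big) \big/ \sum_{j=1}^{n} K\big((t - t_j)/h\big)$.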

References

Aneiros-Perez, G., Gonzalez-Manteiga, W. and Vieu, P. (2004) Estimation and testing in a partial linear regression under long-memory dependence. Bernoulli 10, 49-78.

Hardle, W., Liang, H. and Gao, J. (2000) Partially Linear Models. Physica-Verlag.

Speckman, P. (1988) Kernel smoothing in partial linear models. J. R. Statist. Soc. B 50, 413-436.

Examples


# EXAMPLE 1: REAL DATA
data(barnacles1)
data <- as.matrix(barnacles1)
data <- diff(data, 12)
data <- cbind(data,1:nrow(data))

b.h <- plrm.gcv(data)$bh.opt
ajuste <- plrm.est(data=data, b=b.h[1], h=b.h[2])
ajuste$beta
plot(data[,4], ajuste$m.t, type="l", xlab="t", ylab="m(t)")

plot(data[,1], ajuste$fitted.values, xlab="y", ylab="y.hat", main="y.hat vs y")
abline(0,1)

mean(ajuste$residuals^2)/var(data[,1])



# EXAMPLE 2: SIMULATED DATA
## Example 2a: independent data

set.seed(1234)
# We generate the data
n <- 100
t <- ((1:n)-0.5)/n
beta <- c(0.05, 0.01)
m <- function(t) {0.25*t*(1-t)}
f <- m(t)

x <- matrix(rnorm(2*n, 0, 1), nrow=n)
xb <- x %*% beta
epsilon <- rnorm(n, 0, 0.01)
y <- xb + f + epsilon
data_ind <- matrix(c(y, x, t), nrow=n)

# We estimate the components of the PLR model
# (CV bandwidth)
a <- plrm.est(data_ind)

a$beta

est <- a$m.t
plot(t, est, type="l", lty=2, ylab="")
points(t, 0.25*t*(1-t), type="l")
legend(x="topleft", legend = c("m", "m hat"), col=c("black", "black"), lty=c(1,2))


## Example 2b: dependent data
# We generate the data
x <- matrix(rnorm(2*n, 0, 1), nrow=n)
xb <- x %*% beta
epsilon <- arima.sim(list(order = c(1,0,0), ar = 0.7), sd = 0.01, n = n)
y <- xb + f + epsilon
data_dep <- matrix(c(y, x, t), nrow=n)

# We estimate the components of the PLR model
# (CV bandwidth)
h <- plrm.cv(data_dep, ln.0=2)$bh.opt[3,1]
a <- plrm.est(data_dep, h=h)

a$beta

est <- a$m.t
plot(t, est, type="l", lty=2, ylab="")
points(t, 0.25*t*(1-t), type="l")
legend(x="topleft", legend = c("m", "m hat"), col=c("black", "black"), lty=c(1,2))
