ldblm is a localized version of a distance-based linear model. As in the global model dblm, explanatory information is coded as distances between individuals.
Neighborhood definition for localizing is done by the (semi)metric dist1, whereas a second (semi)metric dist2 (which may coincide with dist1) is used for distance-based prediction. Both dist1 and dist2 can either be computed from observed explanatory variables or directly input as a matrix of squared interdistances or as a Gram matrix. The response is a continuous variable, as in the ordinary linear model. The model allows for a mixture of continuous and qualitative explanatory variables or, in fact, for more general quantities such as functional data.
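
As a rough illustration of the two direct input forms mentioned above, the following base R sketch builds a matrix of squared interdistances from a few observed variables and doubly centers it to obtain a Gram matrix. It assumes uniform weights and the classical relation G = -1/2 J D2 J with J = I - 11'/n. The names Z_demo, D2_demo and G_demo are made up for the illustration, and the results are plain matrices, not the D2 or Gram class objects that ldblm expects.

```r
set.seed(123)
n <- 6
Z_demo <- matrix(rnorm(n * 2), nrow = n)   # observed explanatory variables (illustrative)

# squared Euclidean interdistances between the n observations
D2_demo <- as.matrix(dist(Z_demo))^2

# double centering with uniform weights (assumption): G = -1/2 J D2 J
J <- diag(n) - matrix(1 / n, n, n)
G_demo <- -0.5 * J %*% D2_demo %*% J

# sanity checks: rows of G sum to zero, and the squared distances are
# recovered from G through d2(i,j) = g(i,i) + g(j,j) - 2 g(i,j)
g <- diag(G_demo)
D2_back <- outer(g, g, "+") - 2 * G_demo
max_err <- max(abs(D2_back - D2_demo))
row_sums <- rowSums(G_demo)
```

Both matrices carry the same distance information, which is why either one can serve as input.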
Notation convention: in distance-based methods we must distinguish observed explanatory variables, which we denote by Z or z, from Euclidean coordinates, which we denote by X or x. For an explanation of the meaning of both terms see the bibliography references below.
## S3 method for class 'formula':
ldblm(formula,data,...,kind.of.kernel=1,
metric1="euclidean",metric2=metric1,method="GCV",weights,
user_h=NULL,h.range=NULL,noh=10,k.knn=3,rel.gvar=0.95,eff.rank=NULL)
# method for distance class 'dist' or 'dissimilarity'
ldblm.dist(y,dist1,dist2=dist1,kind.of.kernel=1,
method="GCV",weights,user_h=quantile(dist1,.25)^.5,
h.range=quantile(as.matrix(dist1),c(.05,0.5))^.5,noh=10,
k.knn=3,rel.gvar=0.95,eff.rank=NULL,...)
# method for distance class 'D2'
ldblm.D2(y,D2_1,D2_2=D2_1,kind.of.kernel=1,method="GCV",
weights,user_h=NULL,h.range=NULL,noh=10,k.knn=3,rel.gvar=0.95,
eff.rank=NULL,...)
# method for class 'Gram'
ldblm.Gram(y,G1,G2=G1,kind.of.kernel=1,method="GCV",
weights,user_h=NULL,h.range=NULL,noh=10,k.knn=3,rel.gvar=0.95,
eff.rank=NULL,...)
dist1: a dist or dissimilarity class object. Distances between observations, used for neighborhood localizing definition. Weights for observations are computed as a decreasing function of their dist1 distances.
dist2: a dist or dissimilarity class object. Distances between observations, used for fitting dblm. Default dist2=dist1.
D2_1: a D2 class object. Squared distances matrix between individuals. One of the alternative ways of entering distance information to a function. See the Details section in dblm. See above.
D2_2: a D2 class object. Squared distances between observations. One of the alternative ways of entering distance information to a function. See the Details section in dblm. See above.
G1: a Gram class object. Doubly centered inner product matrix associated with the squared distances matrix D2_1.
G2: a Gram class object. Doubly centered inner product matrix associated with the squared distances matrix D2_2. Default G2=G1.
metric1: metric used to compute dist1 from observed explanatory variables. One of "euclidean" (default), "manhattan", or "gower".
metric2: metric used to compute dist2 from observed explanatory variables. One of "euclidean" (default), "manhattan", or "gower".
method: one of AIC, BIC, OCV, GCV (default) and user_h.
OCV and GCV
user_h: bandwidth, set by the user, controlling the size of the local neighborhood of Z. Smoothing parameter (default: 1st quartile of all the distances d(i,j) in dist1). Applies only if method="user_h".
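h.range: range of bandwidth h values for automatic bandwidth choice.
noh: number of h values within h.range for automatic bandwidth choice (if method!="user_h").
k.knn: default k.knn=3.
rel.gvar: in each dblm iteration, take the lowest effective rank with a relative geometric variability higher than or equal to rel.gvar. Default value rel.gvar=0.95.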
eff.rank: effective rank used in each dblm iteration. If specified, its value overrides rel.gvar. Default eff.rank=NULL.
An object of class ldblm containing the following components:
(if method!="user_h").
the distances (class "D2" or "dist") used to calculate the weights of the observations.
the distances (class "D2" or "dist") used to fit the dblm.
The method involves two semi-metrics, dist1 and dist2. Both semi-metrics can coincide. For instance, when dist1=||xi-xj|| and dist2=||(xi,xi^2,xi^3)-(xj,xj^2,xj^3)|| the estimator for new observations coincides with fitting a local cubic polynomial regression.
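
The local cubic case can be mimicked in base R without the package. The sketch below is only an analogy under stated assumptions: it uses a Gaussian kernel on dist1 = |xi - x0| for the weights (the kernels selected by kind.of.kernel may differ), a hand-picked bandwidth h, and an ordinary weighted lm on the coordinates (x, x^2, x^3) in place of the distance-based fit driven by dist2. All variable names are illustrative.

```r
set.seed(42)
n <- 200
x <- runif(n, -2, 2)
y <- sin(2 * x) + rnorm(n, sd = 0.2)

x0 <- 0.5   # new observation where a prediction is wanted
h  <- 0.5   # bandwidth, chosen by hand for this sketch

# role of dist1: distances to the neighborhood center define the weights
d1 <- abs(x - x0)
w  <- exp(-0.5 * (d1 / h)^2)   # Gaussian kernel (assumption)

# role of dist2: Euclidean distances on (x, x^2, x^3) correspond to a
# linear model on those three coordinates, i.e. a cubic polynomial in x
fit  <- lm(y ~ x + I(x^2) + I(x^3), weights = w)
pred <- unname(predict(fit, newdata = data.frame(x = x0)))
truth <- sin(2 * x0)
```

Repeating the weighted fit for each new x0 gives the local cubic polynomial regression curve.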
The set of bandwidth h values checked in automatic bandwidth choice is defined by h.range and noh, together with k.knn. For each h in this set a local linear model is fitted, and the optimal h is decided according to the statistic specified in method.
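
The search over candidate bandwidths can be sketched in base R as follows. This is not the package's code: it assumes an equally spaced grid of noh values between the endpoints of the range, a Gaussian kernel, a local linear fit on one Euclidean coordinate, and the usual GCV criterion n * RSS / (n - tr(S))^2, where S is the smoother matrix. The role of k.knn is not modeled, and local_linear_hat and gcv are hypothetical helpers.

```r
set.seed(7)
n <- 60
x <- seq(-2, 2, length.out = n)
y <- x + x^2 + rnorm(n, sd = 0.3)
d <- as.matrix(dist(x))   # plays the role of dist1

# Hypothetical helper: smoother ("hat") matrix of a local linear fit
local_linear_hat <- function(x, d, h) {
  n <- length(x)
  S <- matrix(0, n, n)
  for (i in seq_len(n)) {
    w   <- exp(-0.5 * (d[i, ] / h)^2)     # Gaussian kernel weights around x[i]
    X   <- cbind(1, x - x[i])             # design matrix centered at x[i]
    XtW <- t(X * w)                       # X'W
    S[i, ] <- solve(XtW %*% X, XtW)[1, ]  # intercept row = fitted value at x[i]
  }
  S
}

# Hypothetical helper: generalized cross-validation score
gcv <- function(y, S) {
  n <- length(y)
  rss <- sum((y - S %*% y)^2)
  n * rss / (n - sum(diag(S)))^2
}

noh     <- 10
h_range <- unname(quantile(d[upper.tri(d)], c(0.05, 0.5)))
h_grid  <- seq(h_range[1], h_range[2], length.out = noh)
gcv_values <- sapply(h_grid, function(h) gcv(y, local_linear_hat(x, d, h)))
h_opt <- h_grid[which.min(gcv_values)]
```

A larger noh gives a finer grid at the cost of one full set of local fits per candidate h.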
kind.of.kernel designates which kernel function is to be used in determining individual weights from dist1 values. See density for more information.
dblm for distance-based linear models.
ldbglm for local distance-based generalized linear models.
summary.ldblm for summary.
plot.ldblm for plots.
predict.ldblm for predictions.
# example of use of the ldblm function
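library(dbstats)  # package providing ldblm
n <- 100   # number of observations
p <- 1     # number of explanatory variables
k <- 5     # upper bound of the random regression coefficients
Z <- matrix(rnorm(n * p), nrow = n)
b1 <- matrix(runif(p) * k, nrow = p)
b2 <- matrix(runif(p) * k, nrow = p)
b3 <- matrix(runif(p) * k, nrow = p)
s <- 1     # noise standard deviation
e <- rnorm(n) * s
# cubic polynomial signal plus noise
y <- Z %*% b1 + Z^2 %*% b2 + Z^3 %*% b3 + e
# squared distances between observations, tagged with class "D2"
D2 <- as.matrix(dist(Z)^2)
class(D2) <- "D2"
# formula interface, bandwidth chosen by GCV among noh = 3 candidates
ldblm1 <- ldblm(y ~ Z, kind.of.kernel = 1, method = "GCV", noh = 3, k.knn = 3)
# D2 interface with method = "user_h" (bandwidth taken from the user_h argument)
ldblm2 <- ldblm.D2(y, D2_1 = D2, D2_2 = D2, kind.of.kernel = 1, method = "user_h", k.knn = 3)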