ldbglm is a localized version of a distance-based generalized linear model. As in the global model dbglm, explanatory information is coded as distances between individuals.
Neighborhood definition for localizing is done by the (semi)metric dist1, whereas a second (semi)metric dist2 (which may coincide with dist1) is used for distance-based prediction. Both dist1 and dist2 can either be computed from observed explanatory variables or input directly as a matrix of squared interdistances or as a Gram matrix. Response and link function are specified as in dbglm, the global distance-based generalized linear model.
The model allows for a mixture of continuous and qualitative explanatory variables or, in fact, more general quantities such as functional data.
Notation convention: in distance-based methods we must distinguish observed explanatory variables, which we denote by Z or z, from Euclidean coordinates, which we denote by X or x. For an explanation of the meaning of both terms, see the bibliography references below.
## S3 method for class 'formula':
ldbglm(formula,data,...,family=gaussian(),kind.of.kernel=1,
metric1="euclidean",metric2=metric1,method="GCV",weights,
user_h=NULL,h.range=NULL,noh=10,k.knn=3,
rel.gvar=0.95,eff.rank=NULL,maxiter=100,eps1=1e-10,
eps2=1e-10)
# method for distance class 'dist' or 'dissimilarity'
ldbglm.dist(y,dist1,dist2=dist1,family=gaussian(),kind.of.kernel=1,
method="GCV",weights,user_h=quantile(dist1,.25)^.5,
h.range=quantile(as.matrix(dist1),c(.05,.25))^.5,noh=10,k.knn=3,
rel.gvar=0.95,eff.rank=NULL,maxiter=100,eps1=1e-10,eps2=1e-10,...)
# method for distance class 'D2'
ldbglm.D2(y,D2_1,D2_2=D2_1,family=gaussian(),kind.of.kernel=1,
method="GCV",weights,user_h=NULL,h.range=NULL,noh=10,
k.knn=3,rel.gvar=0.95,eff.rank=NULL,maxiter=100,eps1=1e-10,
eps2=1e-10,...)
# method for class 'Gram'
ldbglm.Gram(y,G1,G2=G1,kind.of.kernel=1,user_h=NULL,
family=gaussian(),method="GCV",weights,h.range=NULL,noh=10,
k.knn=3,rel.gvar=0.95,eff.rank=NULL,maxiter=100,eps1=1e-10,
eps2=1e-10,...)
dist1: dist or dissimilarity class object. Distances between observations, used for the localizing neighborhood definition. Weights for observations are computed as a decreasing function of their dist1 distances.
dist2: dist or dissimilarity class object. Distances between observations, used for fitting dbglm. Default dist2=dist1.
D2_1: D2 class object. Squared distances matrix between individuals. One of the alternative ways of entering distance information to a function; see the Details section in dblm. See dist1 above for its role in this function.
D2_2: D2 class object. Squared distances between observations. One of the alternative ways of entering distance information to a function; see the Details section in dblm. See dist2 above for its role in this function.
G1: Gram class object. Doubly centered inner product matrix associated with the squared distances matrix D2_1.
G2: Gram class object. Doubly centered inner product matrix associated with the squared distances matrix D2_2. Default G2=G1.
metric1: metric used to compute dist1 from observed explanatory variables. One of "euclidean" (default), "manhattan", or "gower".
metric2: metric used to compute dist2 from observed explanatory variables. One of "euclidean" (default), "manhattan", or "gower".
method: criterion used to choose the bandwidth h. One of "AIC", "BIC", "OCV", "GCV" (default) and "user_h".
user_h: bandwidth set by the user, controlling the size of the local neighborhood of Z. Smoothing parameter (default: first quartile of all the distances d(i,j) in dist1). Applies only if method="user_h".
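h.range: range of bandwidth values for automatic bandwidth choice (the default is computed from the distances in dist1).
noh: number of bandwidth h values within h.range for automatic bandwidth choice (if method!="user_h").
k.knn: used together with h.range and noh to define the set of bandwidth values checked in automatic bandwidth choice. By default k.knn=3.
rel.gvar: relative geometric variability threshold. In each dblm iteration, take the lowest effective rank with a relative geometric variability greater than or equal to rel.gvar. Default value rel.gvar=0.95.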
eff.rank: effective rank used in each dblm iteration. If specified, its value overrides rel.gvar. When eff.rank=NULL (the default), rel.gvar is used.
maxiter: maximum number of dblm iterations (default 100).
eps1: convergence tolerance for the "DevStat" stopping criterion, a small positive number; the iterations converge when |dev - dev_{old}|/|dev| < eps1, that is, when the deviance has become stationary.
eps2: convergence tolerance for the "mustat" stopping criterion, a small positive number; the iterations converge when |mu - mu_{old}|/|mu| < eps2, that is, when the fitted values mu have become stationary.
An object of class ldbglm containing the following components:
the bandwidth h used in the fitting process (if method!="user_h").
the family object used.
the distances (object of class "D2" or "dist") used to calculate the weights of the observations.
the distances (object of class "D2" or "dist") used to fit the dbglm.
Objects of class "ldbglm" are actually of class c("ldbglm", "ldblm"), inheriting the plot.ldblm and summary.ldblm methods from class "ldblm".
The ways of entering distance information are the same as in dblm.
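As a minimal base-R sketch (not code from the package), the lines below show how the distance forms relate. They build the squared interdistances matrix from observed variables Z, turn it into the doubly centered Gram matrix G = -1/2 J D2 J, and pick an effective rank from rel.gvar. The helper eff_rank_from_gvar is hypothetical. Treating relative geometric variability as the cumulative share of the eigenvalues of G is an assumption.

```r
set.seed(1)
n <- 50
Z <- cbind(rnorm(n), runif(n))          # observed explanatory variables

D2 <- as.matrix(dist(Z))^2              # squared Euclidean interdistances
J  <- diag(n) - matrix(1 / n, n, n)     # centering matrix I - 11'/n
G  <- -0.5 * J %*% D2 %*% J             # doubly centered Gram matrix

# For Euclidean distances, G equals the inner products of the centered Z.
Zc <- scale(Z, center = TRUE, scale = FALSE)
all.equal(G, Zc %*% t(Zc), check.attributes = FALSE)

# Hypothetical helper (assumption): smallest rank whose eigenvalues of G
# account for at least a share rel.gvar of their total.
eff_rank_from_gvar <- function(G, rel.gvar = 0.95) {
  ev <- eigen(G, symmetric = TRUE, only.values = TRUE)$values
  ev <- pmax(ev, 0)                     # clip tiny negative round-off
  which(cumsum(ev) / sum(ev) >= rel.gvar)[1]
}

eff_rank_from_gvar(G, rel.gvar = 0.95)
```

Here Z has two columns, so G has rank 2 and the selected effective rank is 1 or 2.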
The set of bandwidth h values checked in automatic bandwidth choice is defined by h.range and noh, together with k.knn. For each h in it, a local generalized linear model is fitted, and the optimal h is decided according to the statistic specified in method.
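The bandwidth search can be illustrated with a small base-R sketch. This is not the package's implementation, and several pieces are assumptions. A kernel-weighted local average (a local-constant fit with Gaussian response) stands in for the local generalized linear model. The noh grid values are taken to be equally spaced over h.range. k.knn is treated as the minimum number of observations with positive weight in every neighborhood. GCV is computed as n*RSS/(n - tr(S))^2 for the linear smoother S. The function names are hypothetical.

```r
# Epanechnikov kernel on u = d / h (zero outside |u| < 1).
epan <- function(u) ifelse(abs(u) < 1, 0.75 * (1 - u^2), 0)

# Row-normalized kernel weights: row i holds the weights used to predict
# observation i from all observations, i.e. a linear smoother matrix S.
smoother_matrix <- function(D, h) {
  W <- epan(D / h)
  W / rowSums(W)
}

# Hypothetical helper: evaluate GCV on an equally spaced grid of noh values.
gcv_select_h <- function(y, D, h.range, noh = 10, k.knn = 3) {
  h.range <- as.numeric(h.range)
  hs <- seq(h.range[1], h.range[2], length.out = noh)
  n <- length(y)
  gcv <- rep(NA_real_, noh)
  for (j in seq_len(noh)) {
    # Assumed role of k.knn: skip h if any neighborhood has fewer than
    # k.knn observations with positive weight (the point itself included).
    if (min(rowSums(D < hs[j])) < k.knn) next
    S <- smoother_matrix(D, hs[j])
    rss <- sum((y - S %*% y)^2)
    gcv[j] <- n * rss / (n - sum(diag(S)))^2
  }
  list(h = hs, gcv = gcv, h.opt = hs[which.min(gcv)])
}

set.seed(2)
z <- sort(runif(80))
y <- sin(2 * pi * z) + rnorm(80, sd = 0.2)
D <- as.matrix(dist(z))
sel <- gcv_select_h(y, D, h.range = quantile(D, c(0.05, 0.25)), noh = 10)
sel$h.opt
```

Bandwidths rejected by the k.knn check keep an NA score, and which.min ignores them.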
kind.of.kernel designates which kernel function is used to determine individual weights from dist1 values. See density for more information.
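For reference, the sketch below writes out several standard kernel functions of u = d/h and turns distances into normalized weights that decrease with distance. The list, its names and the functional forms are assumptions based on common textbook definitions. The kernels available through kind.of.kernel, and their numbering, may differ.

```r
# Common kernel functions on u = d / h (assumed forms).
kernels <- list(
  epanechnikov = function(u) ifelse(abs(u) <= 1, 3/4 * (1 - u^2), 0),
  biweight     = function(u) ifelse(abs(u) <= 1, 15/16 * (1 - u^2)^2, 0),
  triweight    = function(u) ifelse(abs(u) <= 1, 35/32 * (1 - u^2)^3, 0),
  triangular   = function(u) ifelse(abs(u) <= 1, 1 - abs(u), 0),
  uniform      = function(u) ifelse(abs(u) <= 1, 1/2, 0),
  normal       = function(u) dnorm(u)
)

# Weights for observations at distances d from a neighborhood center,
# normalized to sum to one.
kernel_weights <- function(d, h, kernel = kernels$epanechnikov) {
  w <- kernel(d / h)
  w / sum(w)
}

d <- c(0, 0.2, 0.5, 0.9, 1.5)
round(kernel_weights(d, h = 1), 3)   # 0.345 0.331 0.259 0.066 0.000
```

With a compactly supported kernel, observations farther than h from the center get zero weight. The normal kernel gives every observation some positive weight.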
For gamma distributions, the domain of the canonical link function is not the same as the permitted range of the mean. In particular, the linear predictor might be negative, yielding an impossible negative mean. Should that event occur, dbglm stops with an error message. The proposed alternative is to use a non-canonical link function.
dbglm for distance-based generalized linear models.
ldblm for local distance-based linear models.
summary.ldbglm for summary.
plot.ldbglm for plots.
predict.ldbglm for predictions.
# example of ldbglm usage
library(dbstats)
z <- rnorm(100)
y <- rbinom(100, 1, plogis(z))
D2<-as.matrix(dist(z))^2
class(D2)<-"D2"
# Distance-based generalized linear model
dbglm2<-dbglm.D2(y,D2,family=binomial(link = "logit"))
# Local Distance-based generalized linear model
ldbglm2<-ldbglm.D2(y,D2,family=binomial(link = "logit"),noh=3)
# check the difference of both
sum((y-ldbglm2$fit)^2)
sum((y-dbglm2$fit)^2)
plot(z,y)
points(z,ldbglm2$fit,col=3)
points(z,dbglm2$fit,col=2)
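The note on gamma distributions and the eps1/eps2 stopping rules can be illustrated together using base R alone. Under the canonical inverse link, a negative linear predictor maps to a negative mean. Under the log link, the mean exp(eta) is always positive. The hypothetical function irls_gamma_log is a plain global IRLS fit, not distance-based and not the package's iterated dblm algorithm. It stops when either the "DevStat" or the "mustat" criterion is met. Measuring |mu - mu_old|/|mu| with the Euclidean norm is an assumption.

```r
# Canonical (inverse) link: a negative linear predictor gives a negative mean.
Gamma(link = "inverse")$linkinv(-0.5)   # -2
# Log link: the mean exp(eta) is positive for any eta.
Gamma(link = "log")$linkinv(-0.5)       # exp(-0.5), about 0.607

# Hypothetical helper: IRLS for a gamma GLM with log link.
irls_gamma_log <- function(X, y, maxiter = 100, eps1 = 1e-10, eps2 = 1e-10) {
  fam <- Gamma(link = "log")
  wt <- rep(1, length(y))
  eta <- log(y); mu <- y                # start from the data (requires y > 0)
  dev <- Inf
  for (iter in seq_len(maxiter)) {
    # Log link: d mu / d eta = mu and V(mu) = mu^2, so the IRLS working
    # weights are all 1 and the working response is eta + (y - mu) / mu.
    z <- eta + (y - mu) / mu
    beta <- lm.fit(X, z)$coefficients
    eta <- drop(X %*% beta)
    mu_old <- mu; dev_old <- dev
    mu <- fam$linkinv(eta)
    dev <- sum(fam$dev.resids(y, mu, wt))
    dev_stat <- abs(dev - dev_old) / abs(dev)                 # "DevStat"
    mu_stat <- sqrt(sum((mu - mu_old)^2)) / sqrt(sum(mu^2))   # "mustat"
    if (dev_stat < eps1 || mu_stat < eps2) break
  }
  list(coefficients = beta, fitted.values = mu, deviance = dev, iter = iter)
}

set.seed(3)
n <- 200
x <- runif(n)
X <- cbind(1, x)
y <- rgamma(n, shape = 4, rate = 4 / exp(0.5 + 1.2 * x))  # mean exp(0.5 + 1.2 x)
fit <- irls_gamma_log(X, y)
fit$coefficients
coef(glm(y ~ x, family = Gamma(link = "log")))  # should agree closely
```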