dblm is a variety of linear model where explanatory information
is coded as distances between individuals. These distances can either
be computed from observed explanatory variables or directly input as
a squared interdistances matrix. The response is a continuous variable as
in the ordinary linear model. Since distances can be computed from a mixture
of continuous and qualitative explanatory variables or,
in fact, from more general quantities, dblm is a proper extension of lm.
Notation convention: in distance-based methods we must distinguish
observed explanatory variables, which we denote by Z or z, from
Euclidean coordinates, which we denote by X or x. For an explanation
of both terms, see the bibliographic references below.
## S3 method for class 'formula':
dblm(formula,data,...,metric="euclidean",method="OCV",
full_search=FALSE,weights,rel.gvar=0.95,eff.rank)
# method for distance class 'dist' or 'dissimilarity'
dblm.dist(y,distance,...,method="OCV",full_search=FALSE,
weights,rel.gvar=0.95,eff.rank)
# method for distance class 'D2'
dblm.D2(y,D2,...,method="OCV",full_search=FALSE,weights,rel.gvar=0.95,
eff.rank)
# method for class 'Gram'
dblm.Gram(y,G,...,method="OCV",full_search=FALSE,weights,rel.gvar=0.95,
eff.rank)
D2 class object. Squared distances matrix between individuals.
See details below to learn the usage of dblm.D2.
Gram class object. Doubly centered inner product matrix of the
squared distances matrix D2.
See details below to learn the usage of dblm.Gram.
"euclidean" (default), "manhattan", or "gower".
"AIC", "BIC", method.
Needs to be specified only if method is "AIC", "BIC", "OCV" or "GCV"
rel.gvar. Default value (rel.gvar=0.95) uses a 95% of the total vari
method="eff.rank".
dblm containing the following components:
method="AIC").
method="BIC").
The dblm model uses the distance matrix between individuals
to find an appropriate prediction method.
There are many ways to compute this matrix besides
the three metrics offered as parameters of this function.
Several R packages also address this problem, in particular
dist in the package stats and daisy in the package cluster
(the three metrics in dblm call the daisy function).
Another way to enter a distance matrix to the model is through an object
of class "D2" (containing the squared distances matrix).
An object of class "dist" or "dissimilarity" can
easily be transformed into one of class "D2". See disttoD2.
Conversely, an object of class "D2" can be transformed into one
of class "dist". See D2toDist.
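The relationship between the two representations can be sketched in base R, without the package helpers. This is only an illustration of the arithmetic involved (squaring and taking square roots of the distances): the matrix built here is a plain matrix, not an object of class "D2", and the snippet makes no claim about how disttoD2 or D2toDist are implemented. The data are simulated.

```r
# Base-R sketch: squared distances from a "dist" object and back again.
# Z is simulated and purely illustrative.
set.seed(1)
Z <- matrix(rnorm(5 * 2), nrow = 5)

D      <- dist(Z)               # class "dist", Euclidean by default
D2     <- as.matrix(D)^2        # plain matrix of squared distances
D_back <- as.dist(sqrt(D2))     # back to class "dist"

# The round trip recovers the original distances
round_trip_ok <- isTRUE(all.equal(as.vector(D), as.vector(D_back)))
round_trip_ok
```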
The S3 method for class "Gram" uses the doubly centered inner product matrix G=XX'.
It can also easily be transformed into one of class "D2".
See D2toG and GtoD2.
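As a rough illustration of what "doubly centered inner product matrix" means, the following base-R sketch builds G from squared Euclidean distances and checks that it equals XX' for column-centered coordinates X. It assumes equal weights for all individuals and Euclidean distances; it produces plain matrices rather than objects of class "Gram" or "D2", and it is not meant to reproduce the internals of D2toG or GtoD2.

```r
# Base-R sketch of double centering, assuming uniform weights.
set.seed(1)
n  <- 6
Z  <- matrix(rnorm(n * 2), nrow = n)
D2 <- as.matrix(dist(Z))^2             # squared Euclidean distances

J <- diag(n) - matrix(1 / n, n, n)     # centering matrix
G <- -0.5 * J %*% D2 %*% J             # doubly centered inner products

# For Euclidean distances, G equals X X' with X the column-centered data
Xc <- scale(Z, center = TRUE, scale = FALSE)
gram_ok <- isTRUE(all.equal(G, tcrossprod(Xc), check.attributes = FALSE))

# Going back: d2_ij = g_ii + g_jj - 2 * g_ij
g       <- diag(G)
D2_back <- outer(g, g, "+") - 2 * G
back_ok <- isTRUE(all.equal(D2_back, D2, check.attributes = FALSE))

c(gram_ok, back_ok)
```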
The weights array is adequate when responses for different individuals
have different variances. In this case the weights array should be
(proportional to) the reciprocal of the variances vector.
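The phrase "proportional to" can be illustrated with ordinary weighted least squares. The sketch below uses lm as a stand-in, on the assumption that weights play the same role in dblm; the variances v are simulated and taken as known, which is rarely the case in practice.

```r
# Sketch with lm as a stand-in: weights are reciprocal variances, and any
# proportional rescaling of them leaves the fitted coefficients unchanged.
set.seed(1)
n <- 50
z <- rnorm(n)
v <- runif(n, 0.5, 4)                      # assumed known error variances
y <- 1 + 2 * z + rnorm(n, sd = sqrt(v))

w        <- 1 / v                          # reciprocal of the variances
w_scaled <- w / sum(w)                     # proportional rescaling

fit1 <- lm(y ~ z, weights = w)
fit2 <- lm(y ~ z, weights = w_scaled)

same_coef <- isTRUE(all.equal(coef(fit1), coef(fit2)))
same_coef
```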
When using method="eff.rank" or method="rel.gvar",
a compromise between the possible consequences of a bad choice has to be
reached. If the rank is too large, the model can be overfitted, possibly
leading to an increased prediction error for new cases
(even though R2 is higher). On the other hand, a small rank suggests
model inadequacy (R2 is small). The other four methods are less error
prone (but they still do not guarantee good predictions).
summary.dblm for summary.
plot.dblm for plots.
predict.dblm for predictions.
ldblm for distance-based local linear models.
# easy example to illustrate usage of the dblm function
n <- 100
p <- 3
k <- 5
Z <- matrix(rnorm(n*p),nrow=n)
b <- matrix(runif(p)*k,nrow=p)
s <- 1
e <- rnorm(n)*s
y <- Z%*%b + e
D <- dist(Z)
dblm1 <- dblm.dist(y,D)
lm1 <- lm(y~Z)
# compare with lm: the mean difference in fitted values should be close to zero
mean(lm1$fitted.values-dblm1$fitted.values)
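The comparison with lm can be understood through Euclidean (principal) coordinates. The following sketch uses only base R: it obtains coordinates X from the distances with stats::cmdscale and regresses y on them. With Euclidean distances and rank equal to the number of columns of Z, the fitted values coincide with those of lm(y~Z); with lower ranks, R2 can only decrease. This illustrates the underlying idea under those assumptions and is not a description of what dblm does internally or of which rank its default method selects. All data are simulated.

```r
# Base-R sketch: regression on principal coordinates derived from distances.
set.seed(1)
n <- 100
p <- 3
Z <- matrix(rnorm(n * p), nrow = n)
y <- drop(Z %*% c(1, 2, 3) + rnorm(n))

X <- cmdscale(dist(Z), k = p)      # Euclidean coordinates from the distances

fit_z <- lm(y ~ Z)                 # ordinary linear model on observed Z
fit_x <- lm(y ~ X)                 # linear model on the p coordinates

# Largest discrepancy between the two sets of fitted values
max_diff <- max(abs(fitted(fit_z) - fitted(fit_x)))
max_diff

# R2 as a function of the number of coordinates used (the rank)
r2 <- sapply(1:p, function(k) {
  summary(lm(y ~ X[, 1:k, drop = FALSE]))$r.squared
})
r2
```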