`clsd`

computes the logspline density, density
derivative, distribution, and smoothed quantiles for a one (1)
dimensional continuous variable using the approach of Racine
(2013).

```
clsd(x = NULL,
beta = NULL,
xeval = NULL,
degree = NULL,
segments = NULL,
degree.min = 2,
degree.max = 25,
segments.min = 1,
segments.max = 100,
lbound = NULL,
ubound = NULL,
basis = "tensor",
knots = "quantiles",
penalty = NULL,
deriv.index = 1,
deriv = 1,
elastic.max = TRUE,
elastic.diff = 3,
do.gradient = TRUE,
er = NULL,
monotone = TRUE,
monotone.lb = -250,
n.integrate = 500,
nmulti = 1,
method = c("L-BFGS-B", "Nelder-Mead", "BFGS", "CG", "SANN"),
verbose = FALSE,
quantile.seq = seq(.01,.99,by=.01),
random.seed = 42,
maxit = 10^5,
max.attempts = 25,
NOMAD = FALSE)
```

x

a numeric vector of training data

beta

a numeric vector of coefficients (default `NULL`

)

xeval

a numeric vector of evaluation data

degree

integer/vector specifying the polynomial degree of the
B-spline basis for each dimension of the continuous `x`

(default
`degree=2`

)

segments

integer/vector specifying the number of segments of the
B-spline basis for each dimension of the continuous `x`

(i.e. number of knots minus one) (default `segments=1`

, i.e. Bezier
curve)

segments.min,segments.max

when `elastic.max=FALSE`

, the
minimum/maximum segments of the B-spline basis for each of the
continuous predictors (default
`segments.min=1`

,`segments.max=100`

)

degree.min,degree.max

when `elastic.max=FALSE`

the
minimum/maximum degree of the B-spline basis for each of the
continuous predictors (default `degree.min=2`

,
`degree.max=25`

)

lbound,ubound

lower/upper bound for the support of the density. For example, if there is a priori knowledge that the density equals zero to the left of 0, and has a discontinuity at 0, the user could specify lbound = 0. However, if the density is essentially zero near 0, one does not need to specify lbound

basis

a character string (default `basis="tensor"`

)
indicating whether the additive or tensor product B-spline basis
matrix for a multivariate polynomial spline or generalized B-spline
polynomial basis should be used

knots

a character string (default `knots="quantiles"`

)
specifying where knots are to be placed. ‘quantiles’ specifies
knots placed at equally spaced quantiles (equal number of observations
lie in each segment) and ‘uniform’ specifies knots placed at
equally spaced intervals

deriv

an integer `l`

(default `deriv=1`

) specifying
whether to compute the univariate `l`

th partial derivative for
each continuous predictor (and difference in levels for each
categorical predictor) or not and if so what order. Note that if
`deriv`

is higher than the spline degree of the associated
continuous predictor then the derivative will be zero and a warning
issued to this effect

deriv.index

an integer `l`

(default `deriv.index=1`

)
specifying the index (currently only supports 1) of the variable whose
derivative is requested

nmulti

integer number of times to restart the process of finding extrema of
the cross-validation function from different (random) initial
points (default `nmulti=1`

)

penalty

the parameter to be used in the AIC criterion. The
method chooses the number of degrees plus number of segments
(knots-1) that maximizes `2*logl-penalty*(degree+segments)`

. The
default is to use the penalty parameter of `log(n)/2`

(`2`

would deliver standard AIC, `log(n)`

standard BIC)

elastic.max,elastic.diff

a logical value/integer indicating
whether to use ‘elastic’ search bounds such that the optimal
degree/segment must lie `elastic.diff`

units from the
respective search bounds

do.gradient

a logical value indicating whether or not to use
the analytical gradient during optimization (defaults to `TRUE`

)

er

a scalar indicating the fraction of data range to extend
the tails (default `1/log(n)`

, see `extendrange`

for
further details)

monotone

a logical value indicating whether modify
the standard B-spline basis function so that it is tailored for
density estimation (default `TRUE`

)

monotone.lb

a negative bound specifying the lower bound on
the linear segment coefficients used when (`monotone=FALSE`

)

n.integrate

the number of evenly spaced integration points on the extended range specified by `er`

(defaults to `500`

)

method

see `optim`

for details

verbose

a logical value which when `TRUE`

produces verbose output
during optimization

quantile.seq

a sequence of numbers lying in \([0,1]\) on which quantiles from the logspline distribution are obtained

random.seed

seeds the random number generator for initial
parameter values when `optim`

is called

maxit

maximum number of iterations used by `optim`

max.attempts

maximum number of attempts to undertake if `optim`

fails for any set of initial parameters for each value of
`nmulti`

NOMAD

a logical value which when `TRUE`

calls `snomadr`

to determine the optimal `degree`

and `segments`

`clsd`

returns a `clsd`

object. The generic functions
`coef`

, `fitted`

, `plot`

and
`summary`

support objects of this type (`er=FALSE`

plots the density on the sample realizations (default is ‘extended
range’ data), see `er`

above, `distribution=TRUE`

plots
the distribution). The returned object has the following components:

estimates of the density function at the sample points

the density evaluated on the ‘extended range’ of the data

estimates of the derivative of the density function at the sample points

estimates of the derivative of the density function evaluated on the ‘extended range’ of the data

estimates of the distribution function at the sample points

the distribution evaluated on the ‘extended range’ of the data

the ‘extended range’ of the data

integer/vector specifying the degree of the B-spline
basis for each dimension of the continuous `x`

integer/vector specifying the number of segments of
the B-spline basis for each dimension of the continuous `x`

vector of quantiles

vector generated by `quantile.seq`

or input by the
user (lying in `[0,1]`

) from which the quantiles `xq`

are
obtained

This function should be considered to be in ‘beta’ status until further notice.

If smoother estimates are desired and `degree=degree.min`

, increase
`degree.min`

to, say, `degree.min=3`

.

The use of ‘regression’ B-splines can lead to undesirable behavior at
the endpoints of the data (i.e. when `monotone=FALSE`

). The
default ‘density’ B-splines ought to be well-behaved in these regions.

Typical usages are (see below for a list of options and also the examples at the end of this help file)

model <- clsd(x)

`clsd`

computes a logspline density estimate of a one (1)
dimensional continuous variable.

The spline model employs the tensor product B-spline basis matrix for
a multivariate polynomial spline via the B-spline routines in the GNU
Scientific Library (https://www.gnu.org/software/gsl/) and the
`tensor.prod.model.matrix`

function.

When `basis="additive"`

the model becomes additive in nature
(i.e. no interaction/tensor terms thus semiparametric not fully
nonparametric).

When `basis="tensor"`

the model uses the multivariate tensor
product basis.

Racine, J.S. (2013), “Logspline Mixed Data Density Estimation,” manuscript.

# NOT RUN { ## Old Faithful eruptions data histogram and clsd density library(MASS) data(faithful) attach(faithful) model <- clsd(eruptions) ylim <- c(0,max(model$density,hist(eruptions,breaks=20,plot=FALSE)$density)) plot(model,ylim=ylim) hist(eruptions,breaks=20,freq=FALSE,add=TRUE,lty=2) rug(eruptions) summary(model) coef(model) ## Simulated data set.seed(42) require(logspline) ## Example - simulated data n <- 250 x <- sort(rnorm(n)) f.dgp <- dnorm(x) model <- clsd(x) ## Standard (cubic) estimate taken from the logspline package ## Compute MSEs mse.clsd <- mean((fitted(model)-f.dgp)^2) model.logspline <- logspline(x) mse.logspline <- mean((dlogspline(x,model.logspline)-f.dgp)^2) ylim <- c(0,max(fitted(model),dlogspline(x,model.logspline),f.dgp)) plot(model, ylim=ylim, sub=paste("MSE: logspline = ",format(mse.logspline),", clsd = ", format(mse.clsd)), lty=3, col=3) xer <- model$xer lines(xer,dlogspline(xer,model.logspline),col=2,lty=2) lines(xer,dnorm(xer),col=1,lty=1) rug(x) legend("topright",c("DGP", paste("Cubic Logspline Density (package 'logspline', knots = ", model.logspline$nknots,")",sep=""), paste("clsd Density (degree = ", model$degree, ", segments = ", model$segments,", penalty = ",round(model$penalty,2),")",sep="")), lty=1:3, col=1:3, bty="n", cex=0.75) summary(model) coef(model) ## Simulate data with known bounds set.seed(42) n <- 10000 x <- runif(n,0,1) model <- clsd(x,lbound=0,ubound=1) plot(model) # } # NOT RUN { # } # NOT RUN { <!-- %% End dontrun --> # }