smoothz: Smoothing functions.

Description

Routines for k-Nearest Neighbor density and regression estimation, optionally using parallel computation.

Usage

smoothz(z,sf,k,checkna=TRUE,cls=NULL,nchunks=length(cls),scalefirst=FALSE)
smoothzpred(newx,oldx,oldxregest,checkna=TRUE,cls=NULL,nchunks=length(cls))
knnreg(data,k) 
knndens(data,k)

Arguments

z

The data, in data frame or matrix form. In the regression case, the response variable is assumed to be in the last column.

sf

Smoothing function (unquoted): knnreg for regression or knndens for density estimation.

k

Number of nearest neighbors.

nchunks

Number of chunks to break the computation into.

newx

New X values at which to predict.

oldx

X-variable values in the training set.

oldxregest

Estimated regression values in the training set.

checkna

If TRUE, remove any row having at least one NA value.

cls

Cluster to use (see the parallel package) for parallel computation.

data

Data to be smoothed.

scalefirst

If TRUE, apply scale to the data before smoothing.

Value

smoothz: vector of smoothed values, one for each row of z (see Details).

smoothzpred: vector of predicted Y values for newx.

Details

The smoothed values are computed at the input data points themselves (this form is needed for another application). For instance, in the regression case the i-th element of the output of smoothz is the estimated regression function at the i-th row of z.
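As an illustration of what "smoothed at the input points" means in the regression case, here is a minimal base-R sketch of k-NN regression evaluated at every row of z. It is not the regtools implementation: the function name knn_reg_at_points is hypothetical, and it assumes Euclidean distance on the predictor columns and that each point counts as one of its own k neighbors (the package may handle these details differently).

```r
# Hypothetical sketch (not the package code): k-NN regression at the data points.
# Assumes the last column of z is the response, the other columns are numeric
# predictors, and each point counts as one of its own k neighbors.
knn_reg_at_points <- function(z, k) {
  z <- as.matrix(z)
  p <- ncol(z)
  x <- z[, -p, drop = FALSE]
  y <- z[, p]
  d <- as.matrix(dist(x))  # n x n Euclidean distance matrix
  est <- apply(d, 1, function(row) {
    nbrs <- order(row)[1:k]  # indices of the k closest rows, including self
    mean(y[nbrs])            # average response over those neighbors
  })
  unname(est)
}
```

For example, knn_reg_at_points(cbind(c(1, 2, 3, 10, 11), c(1, 1, 1, 5, 5)), k = 2) gives one estimate per row. Because dist() builds a full n x n matrix, this sketch is only practical for small data sets.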

The density estimates are not normalized to have total hypervolume equal to 1.0.
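To see why such estimates need not integrate to 1.0, consider a minimal base-R sketch of an unnormalized k-NN density. It assumes the common form in which the density at a point is proportional to 1 / r^d, where r is the distance to the point's k-th nearest other point and d is the number of columns; the constants (k, n, unit-ball volume) are simply dropped. The function name knn_dens_at_points is hypothetical, and regtools' knndens may scale its values differently.

```r
# Hypothetical sketch (not the package code): unnormalized k-NN density
# estimates at the data points. The value at each point is 1 / r^d, with r the
# distance to its k-th nearest other point and d the number of columns.
# Constants are dropped, so the estimates do not integrate to 1.
knn_dens_at_points <- function(data, k) {
  x <- as.matrix(data)
  d <- ncol(x)
  dm <- as.matrix(dist(x))
  # sorted row starts with the zero self-distance, so take element k + 1
  rk <- apply(dm, 1, function(row) sort(row)[k + 1])
  unname(1 / rk^d)
}
```

Points in tightly clustered regions get large values and isolated points get small ones. Exact duplicate rows can make r zero and the estimate infinite, which a real implementation would need to guard against.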

When nchunks is non-null, smoothing is done within chunks only: the smoothed value at a point is computed only from its neighbors in that point's own chunk.
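The effect of chunking can be seen in a small base-R sketch of within-chunk k-NN regression. It assumes that chunks are contiguous blocks of rows of roughly equal size, each holding at least k rows; how smoothz actually assigns rows to chunks is not stated here. The function name chunked_knn_reg is hypothetical. The sketch runs the chunks with lapply(); with a cluster from the parallel package, parallel::parLapply could be substituted.

```r
# Hypothetical sketch (not the package code) of within-chunk k-NN regression.
# Assumes the response is the last column of z and that chunks are contiguous
# blocks of rows, each containing at least k rows.
chunked_knn_reg <- function(z, k, nchunks) {
  z <- as.matrix(z)
  n <- nrow(z)
  p <- ncol(z)
  # k-NN regression at every point of one chunk, using only that chunk's rows
  smooth_chunk <- function(zc) {
    d <- as.matrix(dist(zc[, -p, drop = FALSE]))
    unname(apply(d, 1, function(row) mean(zc[order(row)[1:k], p])))
  }
  # assign rows 1..n to nchunks contiguous, roughly equal-sized chunks
  chunk_id <- ceiling(seq_len(n) * nchunks / n)
  idx <- split(seq_len(n), chunk_id)
  # serial here; parallel::parLapply(cls, idx, ...) could replace lapply()
  pieces <- lapply(idx, function(ix) smooth_chunk(z[ix, , drop = FALSE]))
  res <- numeric(n)
  res[unlist(idx, use.names = FALSE)] <- unlist(pieces, use.names = FALSE)
  res
}
```

With z <- cbind(c(1, 2, 3.5, 4, 5, 6), c(0, 0, 0, 9, 9, 9)) and k = 2, the fourth point's estimate is 4.5 when nchunks = 1 (its nearest other neighbor, row 3, has response 0) but 9 when nchunks = 2, because row 3 then lies in a different chunk.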

The smoothzpred function applies only to the regression case. It assumes that smoothz has previously been called on the training data, yielding regression function estimates oldxregest at the points oldx. For each row of newx, smoothzpred then finds the closest row oldx[j,] of oldx and uses the corresponding value oldxregest[j] as the predicted value at that row of newx.
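The nearest-point lookup described above can be sketched in a few lines of base R. This is an illustration of the idea, not smoothzpred's code: the function name nn_predict is hypothetical, and it assumes Euclidean distance, with ties resolved in favor of the first closest row (which.min's behavior).

```r
# Hypothetical sketch of the lookup described above (not the package code):
# for each row of newx, find the closest row of oldx (Euclidean distance) and
# return the stored regression estimate for that row.
nn_predict <- function(newx, oldx, oldxregest) {
  newx <- as.matrix(newx)
  oldx <- as.matrix(oldx)
  nearest <- apply(newx, 1, function(pt) {
    d2 <- colSums((t(oldx) - pt)^2)  # squared distances to every row of oldx
    which.min(d2)                    # index of the first closest row
  })
  unname(oldxregest[nearest])
}
```

For example, with oldx <- matrix(c(0, 10, 20), ncol = 1) and oldxregest <- c(1.5, 2.5, 3.5), the new points 1, 12 and 30 are mapped to 1.5, 2.5 and 3.5. Squared distances are compared because they give the same nearest row as true distances without the square root.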

Examples

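library(regtools)  # smoothz, knnreg and the prgeng data
library(ggplot2)   # ggplot, geom_smooth

# programmers and engineers in Silicon Valley, 2000 census, age 25-65
data(prgeng)
pg <- prgeng
pg1 <- pg[pg$age >= 25 & pg$age <= 65,]
# column 1 is age; column 8 (wage income) is the response, placed last
estreg <- smoothz(pg1[,c(1,8)],sf=knnreg,k=100)
age <- pg1[,1]
p <- ggplot(data.frame(age,estreg))
p + geom_smooth(aes(x=age,y=estreg))
# peak earnings appear to occur around age 45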
