freqparcoord (version 1.0.1)

smoothz: Smoothing functions.

Description

Routines for k-Nearest Neighbor density and regression estimation, optionally using parallel computation.

Usage

smoothz(z,sf,k,checkna=TRUE,cls=NULL,nchunks=length(cls),scalefirst=FALSE)
smoothzpred(newx,oldx,oldxregest,checkna=TRUE,cls=NULL,nchunks=length(cls))
knnreg(data,k)
knndens(data,k)

Arguments

z
The data, in data frame or matrix form. In the regression case, the response variable is assumed to be in the last column.
sf
Smoothing function (unquoted), knnreg for regression or knndens for density estimation.
k
Number of nearest neighbors.
nchunks
Number of chunks to break the computation into.
newx
New X data to predict from.
oldx
X-variable values in the training set.
oldxregest
Estimated regression values in the training set.
checkna
If TRUE, remove any row having at least one NA value.
cls
Cluster to use (see the parallel package) for parallel computation.
data
Data to be smoothed.
scalefirst
Apply scale to the data before smoothing.

Value

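For smoothz, the smoothed (regression or density) values at the input data points; see Details. For smoothzpred, a vector of predicted Y values for newx.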

Details

The smoothed values are calculated at the input data points (needed in this form for another application). So, for instance, the i-th value of the output of smoothz in the regression case is the estimated regression function at the i-th row of z.
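To make "estimated at the input data points" concrete, here is a minimal base-R sketch of k-nearest-neighbor regression evaluated at each row of its input. The helper knn_reg_sketch is hypothetical and is not the package's knnreg; in particular, counting each point among its own neighbors is an assumption made for this illustration.

```r
# Hypothetical helper, not part of freqparcoord: brute-force k-NN regression
# evaluated at the input points themselves.
knn_reg_sketch <- function(z, k) {
  z <- as.matrix(z)
  p <- ncol(z)
  x <- z[, -p, drop = FALSE]   # predictor columns
  y <- z[, p]                  # response is assumed to be the last column
  d <- as.matrix(dist(x))      # pairwise Euclidean distances between rows
  # Assumption: each point counts as one of its own k nearest neighbors.
  vapply(seq_len(nrow(x)), function(i) {
    nbrs <- order(d[i, ])[seq_len(k)]
    mean(y[nbrs])              # average response over the k nearest rows
  }, numeric(1))
}

set.seed(1)
x <- runif(200)
y <- 2 * x + rnorm(200, sd = 0.1)
est <- knn_reg_sketch(cbind(x, y), k = 20)
length(est)  # one estimate per row of the input
```

The i-th element of est corresponds to the i-th row of the input, mirroring the layout described above.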

The density estimates are not normalized to have total hypervolume equal to 1.0.

When nchunks is non-null, smoothing is done within each chunk only: the smoothed value at a point is computed only from its neighbors in that point's chunk.
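The effect of within-chunk smoothing can be sketched in base R as follows. This is an illustration only: the smoother knn_mean is a hypothetical one-dimensional stand-in, and splitting rows into contiguous blocks with parallel::splitIndices is an assumption about how chunks might be formed, not a statement of what smoothz does internally.

```r
# Hypothetical 1-D k-NN mean smoother, used only for this illustration.
knn_mean <- function(x, y, k) {
  vapply(seq_along(x), function(i) {
    mean(y[order(abs(x - x[i]))[seq_len(k)]])
  }, numeric(1))
}

set.seed(2)
x <- runif(100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)

# Assumption: rows are split into contiguous blocks, as splitIndices() does.
chunks <- parallel::splitIndices(length(x), 4)

chunked <- numeric(length(x))
for (rows in chunks) {
  # Neighbors are searched only among the rows of this chunk.
  chunked[rows] <- knn_mean(x[rows], y[rows], k = 5)
}

full <- knn_mean(x, y, k = 5)   # neighbors drawn from all 100 points
mean(abs(chunked - full))       # typical gap between the two estimates
```

Because each chunk sees only a fraction of the data, the nearest neighbors found inside a chunk are generally farther away than those found in the full data set, so the chunked estimate trades some accuracy for speed.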

The smoothzpred function applies only to the regression case. It is assumed that smoothz has been previously called on oldx, yielding regression function estimates oldxregest at those points. The smoothzpred function then finds, for each point newx[i], the closest point oldx[j] in oldx, and uses the corresponding value oldxregest[j] as the predicted value at newx[i].
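The lookup described above can be sketched in a few lines of base R. The function nn_predict_sketch is hypothetical and uses Euclidean distance as an assumption; it only illustrates the nearest-point rule and is not the package's smoothzpred.

```r
# Hypothetical helper: predict at each new point by copying the estimate
# stored at the closest training point (1-nearest-neighbor lookup).
nn_predict_sketch <- function(newx, oldx, oldxregest) {
  newx <- as.matrix(newx)
  oldx <- as.matrix(oldx)
  apply(newx, 1, function(pt) {
    d2 <- rowSums(sweep(oldx, 2, pt)^2)  # squared distance to every oldx row
    oldxregest[which.min(d2)]            # estimate at the closest oldx row
  })
}

oldx <- matrix(c(1, 2, 3, 4), ncol = 1)
oldxregest <- c(10, 20, 30, 40)          # estimates previously computed at oldx
newx <- matrix(c(1.2, 3.9), ncol = 1)
pred <- nn_predict_sketch(newx, oldx, oldxregest)
pred  # 1.2 is closest to 1 and 3.9 is closest to 4, giving 10 and 40
```

Note that under this rule the prediction is piecewise constant in newx: every new point that shares the same closest training point receives the same value.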

Examples

library(freqparcoord)
library(ggplot2)  # provides ggplot() and geom_smooth() used below
# programmers and engineers in Silicon Valley, 2000 census, age 25-65
data(prgeng)
pg <- prgeng
pg1 <- pg[pg$age >= 25 & pg$age <= 65,]
estreg <- smoothz(pg1[,c(1,8)],sf=knnreg,k=100)
age <- pg1[,1]
p <- ggplot(data.frame(age,estreg))
p + geom_smooth(aes(x=age,y=estreg))
# peak earnings appear to occur around age 45