FRESA.CAD (version 2.2.0)

getKNNpredictionFromFormula: Predict classification using KNN

Description

This function returns the classification of the samples in a test set using a k-nearest neighbors (KNN) algorithm with Euclidean distances, given a formula and a training set.
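To make the description concrete, the following is a minimal base-R sketch of KNN classification with Euclidean distances. It is not the package's implementation: the helper name knn_sketch and the toy data are hypothetical and chosen only for illustration.

```r
# Hypothetical helper (not part of FRESA.CAD): classify each test row by the
# majority vote of its k nearest training rows under Euclidean distance.
knn_sketch <- function(train_x, test_x, train_y, k = 3) {
  train_x <- as.matrix(train_x)
  test_x <- as.matrix(test_x)
  apply(test_x, 1, function(row) {
    # Euclidean distance from this test row to every training row
    d <- sqrt(rowSums(sweep(train_x, 2, row)^2))
    # Indices of the k closest training rows
    nearest <- order(d)[seq_len(k)]
    # Majority vote among the labels of those rows
    votes <- table(train_y[nearest])
    names(votes)[which.max(votes)]
  })
}

# Toy data: two well-separated clusters labeled 0 and 1
train_x <- data.frame(x1 = c(0, 0.1, 0.2, 1, 1.1, 1.2),
                      x2 = c(0, 0.2, 0.1, 1, 1.2, 1.1))
train_y <- c(0, 0, 0, 1, 1, 1)
test_x <- data.frame(x1 = c(0.05, 1.05), x2 = c(0.05, 1.05))

pred <- knn_sketch(train_x, test_x, train_y, k = 3)
print(pred)
```

Because the distances are computed on raw feature values, features measured on larger scales dominate the vote, so scaling the columns beforehand is often worth considering.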

Usage

getKNNpredictionFromFormula(model.formula, trainData, testData, Outcome = "CLASS", nk = 3)

Arguments

model.formula
An object of class formula with the formula to be used
trainData
A data frame with the data to train the model, where all variables are stored in different columns
testData
A data frame similar to trainData, but with the data set to be predicted
Outcome
The name of the column in trainData that stores the variable to be predicted by the model
nk
The number of neighbors used to generate the KNN classification
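As an illustration of how a formula can determine which columns of trainData and testData take part in the distance computation, the sketch below extracts variable names with base R's all.vars. How the function itself parses model.formula is not stated here, so this is an assumption; the formulas, the toy data frame, and the variable names feature_names and surv_features are hypothetical.

```r
# Hypothetical formula with the outcome on the left-hand side
model_formula <- pgstat ~ 1 + age + g2 + grade
outcome <- "pgstat"

# all.vars() lists every variable named in the formula; dropping the outcome
# leaves the candidate features
feature_names <- setdiff(all.vars(model_formula), outcome)
print(feature_names)

# For a survival-style formula the time variable also has to be dropped.
# all.vars() works on the unevaluated formula, so no extra package is needed.
surv_formula <- Surv(pgtime, pgstat) ~ 1 + age + grade
surv_features <- setdiff(all.vars(surv_formula), c(outcome, "pgtime"))
print(surv_features)

# Subset a toy data frame to the selected feature columns
train <- data.frame(pgstat = c(0, 1, 0), age = c(60, 65, 70),
                    g2 = c(10, 12, 14), grade = c(2, 3, 2), other = 1:3)
train_features <- train[, feature_names, drop = FALSE]
print(colnames(train_features))
```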

Value

prediction
A vector with the predicted outcome for the testData data set
prob
The proportion of the k neighbors that voted for the class reported in prediction
binProb
The proportion of the k neighbors that voted for the outcome class equal to 1
featureList
A vector with the names of the features used by the KNN procedure
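The relationship between prob and binProb can be sketched with knn from the class package (listed under See Also). This assumes a binary outcome coded 0/1; the conversion shown for bin_prob is one plausible reading of the definitions above, not necessarily the package's internal computation, and the toy data are hypothetical.

```r
library(class)  # provides knn(); a recommended package shipped with R

# Toy training data: two clusters labeled 0 and 1
train_x <- data.frame(x1 = c(0, 0.1, 0.2, 1, 1.1, 1.2),
                      x2 = c(0, 0.2, 0.1, 1, 1.2, 1.1))
train_y <- factor(c(0, 0, 0, 1, 1, 1))

# Three test points: inside cluster 0, between the clusters, inside cluster 1
test_x <- data.frame(x1 = c(0.05, 0.5, 1.05),
                     x2 = c(0.05, 0.6, 1.05))

# prob = TRUE attaches the proportion of votes for the winning class
pred <- knn(train_x, test_x, cl = train_y, k = 3, prob = TRUE)
prob <- attr(pred, "prob")

# Assumed conversion: proportion of neighbors voting for class 1
bin_prob <- ifelse(pred == "1", prob, 1 - prob)

print(data.frame(prediction = pred, prob = prob, binProb = bin_prob))
```

Under this reading, prob is always at least 0.5 for a binary outcome, while binProb ranges over the whole interval from 0 to 1 and can be used directly as a score for class 1.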

See Also

predictForFresa, knn

Examples

## Not run: 
# Start the graphics device driver to save all plots in a pdf format
pdf(file = "Example.pdf")
# Get the stage C prostate cancer data from the rpart package
library(rpart)
data(stagec)
# Split the stages into several columns
dataCancer <- cbind(stagec[,c(1:3,5:6)],
                    gleason4 = 1*(stagec[,7] == 4),
                    gleason5 = 1*(stagec[,7] == 5),
                    gleason6 = 1*(stagec[,7] == 6),
                    gleason7 = 1*(stagec[,7] == 7),
                    gleason8 = 1*(stagec[,7] == 8),
                    gleason910 = 1*(stagec[,7] >= 9),
                    eet = 1*(stagec[,4] == 2),
                    diploid = 1*(stagec[,8] == "diploid"),
                    tetraploid = 1*(stagec[,8] == "tetraploid"),
                    notAneuploid = 1-1*(stagec[,8] == "aneuploid"))
# Remove the incomplete cases
dataCancer <- dataCancer[complete.cases(dataCancer),]
# Load a pre-established data frame with the names and descriptions of all variables
data(cancerVarNames)
# Split the data set into train and test samples
trainDataCancer <- dataCancer[1:(nrow(dataCancer)/2),]
testDataCancer <- dataCancer[(nrow(dataCancer)/2+1):nrow(dataCancer),]
# Get a Cox proportional hazards model using:
# - 10 bootstrap loops
# - Train data
# - Age as a covariate
# - zIDI as the feature inclusion criterion
cancerModel <- ForwardSelection.Model.Bin(loops = 10,
                                           covariates = "1 + age",
                                           Outcome = "pgstat",
                                           variableList = cancerVarNames,
                                           data = trainDataCancer,
                                           type = "COX",
                                           timeOutcome = "pgtime",
                                           selectionType = "zIDI")
# Predict the outcome of the test data sample using KNN
KNNPrediction <- getKNNpredictionFromFormula(model.formula = cancerModel$formula,
                                             trainData = trainDataCancer,
                                             testData = testDataCancer,
                                             Outcome = "pgstat",
                                             nk = 5)
# Shut down the graphics device driver
dev.off()
## End(Not run)