mining: Powerful function that trains and tests a particular fit model under several runs and a given validation method

Description

Powerful function that trains and tests a particular fit model under several runs and a given validation method. Since there can be a huge number of models, the fitted models are not stored. Yet, several useful statistics (e.g. predictions) are returned.

Usage

mining(x, data = NULL, Runs = 1, method = NULL, model = "default", 
       task = "default", search = "heuristic", mpar = NULL,
       feature="none", scale = "default", transform = "none", 
       debug = FALSE, ...)

Arguments

x

a symbolic description (formula) of the model to be fit. If x contains the data, then data=NULL (similar to x in ksvm, kernlab package).

data

an optional data frame (columns denote attributes, rows show examples) containing the training data, when using a formula.

Runs

number of runs used (e.g. 1, 5, 10, 20, 30)

method

a vector with c(vmethod,vpar), where vmethod is:

all -- all NROW examples are used as both training and test sets (no vpar is needed).
holdout -- standard hol

model

See fit for details.

task

See fit for details.

mpar

See fit for details.

feature

See fit for more details about feature="none", "sabs" or "sbs" options. For the mining function, additional options are feature=fmethod,

scale

See fit for details.

transform

See fit for details.

debug

If TRUE, shows some information about each run.

...

See fit for details.

Value

A list with the components:
- $time -- vector with time elapsed for each run.
- $test -- vector list, where each element contains the test (target) results for each run.
- $pred -- vector list, where each element contains the predicted results for each test set and each run.
- $error -- vector with an error metric for each run (the error depends on the metric parameter of mpar, valid options are explained in mmetric).
- $mpar -- data.frame with each fit model mpar parameters, the sequence repeats Runs (times vpar if kfold is used).
- $model -- the model.
- $task -- the task.
- $method -- the external validation method.
- $sen -- a matrix with the 1-D sensitivity analysis input importances. The number of rows is Runs times vpar, if kfold, else is Runs.
- $sresponses -- a vector list with a size equal to the number of attributes (useful for graph="VEC"). Each element contains a list with the 1-D sensitivity analysis input responses (n -- name of the attribute; l -- number of levels; x -- attribute values; y -- 1-D sensitivity responses). Important note: sresponses (and "VEC" graphs) are only available if feature="sabs" or "simp" related (see feature).
- $runs -- the Runs.
- $attributes -- vector list with all attributes (features) selected in each run (and fold if kfold) if a feature selection algorithm is used.
- $feature -- the feature.
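
The components listed above are plain R vectors and lists, so they can be post-processed directly. The sketch below is only an illustration: M_mock is a hand-built, hypothetical list that mimics the shape of $runs, $method, $test and $pred (a real object would come from a mining call), and the error is recomputed by hand rather than through mmetric.

```r
# Mock object with the same shape as a few of the components listed above
# (hypothetical values; a real object would be returned by mining()).
M_mock <- list(
  runs = 2,
  method = c("kfold", 3),          # c() coerces 3 to the string "3"
  test = list(c(1.0, 2.0, 3.0), c(2.0, 4.0)),
  pred = list(c(1.5, 2.0, 2.0), c(2.0, 3.0))
)

# Mean absolute error of each run, from the paired test/pred vectors
mae_per_run <- mapply(function(y, p) mean(abs(y - p)), M_mock$test, M_mock$pred)
print(mae_per_run)   # 0.5 0.5

# Row count described for $mpar and $sen: Runs times vpar when kfold is used
vpar <- as.numeric(M_mock$method[2])
rows <- if (M_mock$method[1] == "kfold") M_mock$runs * vpar else M_mock$runs
print(rows)          # 6
```

Note that a method vector such as c("kfold",3) is stored as character, which is why vpar is converted back with as.numeric before the multiplication.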

Details

Powerful function that trains and tests a particular fit model under several runs and a given validation method (see [Cortez, 2010] for more details). Several Runs are performed. In each run, the same validation method is adopted (e.g. holdout) and several relevant statistics are stored. Warning: be patient, this function can require considerable computational effort, especially if a high number of Runs is used.
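
The run-and-validate loop described above can be sketched in base R. This is a minimal illustration under stated assumptions, not rminer's internal code: run_holdout and its arguments are hypothetical names, lm stands in for the fit model, and the error is fixed to the mean absolute error.

```r
# Sketch of a repeated-holdout loop (base R only; not rminer's implementation).
# run_holdout() and its arguments are hypothetical names used for illustration.
run_holdout <- function(formula, data, runs = 3, ratio = 2/3, seed = 123) {
  set.seed(seed)
  target <- all.vars(formula)[1]            # name of the response variable
  out <- list(time = numeric(runs), test = vector("list", runs),
              pred = vector("list", runs), error = numeric(runs))
  for (i in seq_len(runs)) {
    start <- proc.time()[["elapsed"]]
    # Random split: `ratio` of the rows for training, the rest for testing
    train_idx <- sample(nrow(data), size = round(ratio * nrow(data)))
    model <- lm(formula, data = data[train_idx, ])   # stand-in for the fit model
    test_set <- data[-train_idx, ]
    out$test[[i]] <- test_set[[target]]
    out$pred[[i]] <- unname(predict(model, newdata = test_set))
    out$error[i] <- mean(abs(out$test[[i]] - out$pred[[i]]))  # MAE of this run
    out$time[i] <- proc.time()[["elapsed"]] - start
    # `model` is overwritten in the next iteration, so no fitted model is kept
  }
  out
}

d <- data.frame(x1 = rnorm(200, 100, 20), x2 = rnorm(200, 100, 20))
d$y <- 0.7 * sin(d$x1 / (25 * pi)) + 0.3 * sin(d$x2 / (25 * pi))
res <- run_holdout(y ~ x1 + x2, d, runs = 3)
print(res$error)
```

Only the per-run targets, predictions, errors and timings are retained, which keeps memory use low even when the number of runs is high.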

References

To check for more details about rminer and for citation purposes: P. Cortez. Data Mining with Neural Networks and Support Vector Machines Using the R/rminer Tool. In P. Perner (Ed.), Advances in Data Mining - Applications and Theoretical Aspects, 10th Industrial Conference on Data Mining (ICDM 2010), Lecture Notes in Artificial Intelligence 6171, pp. 572-583, Berlin, Germany, July, 2010. Springer. ISBN: 978-3-642-14399-1. Springer: http://www.springerlink.com/content/e7u36014r04h0334 PDF: http://www3.dsi.uminho.pt/pcortez/2010-rminer.pdf

Examples

### simple regression example
x1=rnorm(200,100,20); x2=rnorm(200,100,20)
y=0.7*sin(x1/(25*pi))+0.3*sin(x2/(25*pi))
M=mining(y~x1+x2,Runs=2,model="mlpe",search=2)
print(M)
print(mmetric(M,metric="MAE"))

### classification example (task="prob")
data(iris)
M=mining(Species~.,iris,Runs=10,method=c("kfold",3),model="dt")
print(mmetric(M,metric="CONF"))
print(mmetric(M,metric="AUC"))
print(meanint(mmetric(M,metric="AUC")))
mgraph(M,graph="ROC",TC=2,baseline=TRUE,Grid=10,leg="Versicolor",
       main="Versicolor ROC")
mgraph(M,graph="LIFT",TC=2,baseline=TRUE,Grid=10,leg="Versicolor",
       main="Versicolor LIFT")
M2=mining(Species~.,iris,Runs=10,method=c("kfold",3),model="svm")
L=vector("list",2)
L[[1]]=M;L[[2]]=M2
mgraph(L,graph="ROC",TC=2,baseline=TRUE,Grid=10,leg=c("DT","SVM"),main="ROC")

### regression example
data(sin1reg)
M=mining(y~.,data=sin1reg,Runs=3,method=c("holdout",2/3),model="mlpe",
         search="heuristic5",mpar=c(50,3,"kfold",3,"MAE"),feature="sabs")
print(mmetric(M,metric="MAE"))
print(M$mpar)
cat("median H nodes:",medianminingpar(M)[1],"\n")
print(M$attributes)
mgraph(M,graph="RSC",Grid=10,main="sin1 MLPE scatter plot")
mgraph(M,graph="REP",Grid=10,main="sin1 MLPE scatter plot",sort=FALSE)
mgraph(M,graph="REC",Grid=10,main="sin1 MLPE REC")
mgraph(M,graph="IMP",Grid=10,main="input importances",xval=0.1,leg=names(sin1reg))
mgraph(M,graph="VEC",Grid=10,main="x1 VEC curve",xval=1,leg=names(sin1reg)[1])

### another classification example
data(iris)
M=mining(Species~.,data=iris,Runs=2,method=c("kfold",2),model="svm",
         search="heuristic",mpar=c(NA,NA,"kfold",3,"AUC"),feature="s")
print(mmetric(M,metric="AUC",TC=2))
mgraph(M,graph="ROC",TC=2,baseline=TRUE,Grid=10,leg="SVM",main="ROC",intbar=FALSE)
mgraph(M,graph="IMP",TC=2,Grid=10,main="input importances",xval=0.1,
       leg=names(iris),axis=1)
mgraph(M,graph="VEC",TC=2,Grid=10,main="Petal.Width VEC curve",
       data=iris,xval=4)
