rminer (version 1.4.3)

mparheuristic: Function that returns a list of searching (hyper)parameters for a particular classification or regression model

Description

Returns a list of (hyper)parameter values to search for a particular classification or regression model. The result is meant to be placed in the search argument used by the fit or mining functions, for example: search=list(search=mparheuristic(...),...).

Usage

mparheuristic(model, n = NA, lower = NA, upper = NA, by = NA, exponential = NA, 
              kernel = "rbfdot")

Arguments

model

model type name. See fit for the model details (e.g., "ksvm").

n

number of searches or heuristic (either n or by should be used; n takes precedence over by). By default, the searches are linear for all models except the support vector machine based models ("ksvm", "rsvm", "lssvm"), which assume a 2^search-range. If this argument is of character type, it is assumed to be a heuristic. Possible heuristic values are:

  • heuristic - only one model is fit, uses default rminer values, same as mparheuristic(model).

  • heuristic5 - 5 hyperparameter searches from lower to upper, only works for the following models: ctree, rpart, kknn, ksvm, lssvm, mlp, mlpe, randomForest, multinom, rvm. Notes: rpart - different cp values (see rpart.control); ctree - different mincriterion values (see ctree_control); randomForest -- upper argument should be set to the number of inputs, since mtry is searched; ksvm, lssvm or rvm - the optional kernel argument can be used.

  • heuristic10 - same as heuristic5 but with 10 searches from lower to upper.

  • mlp_t - heuristic 33 from Delgado 2014 paper, 10 searches, works only when model=mlp or model=mlpe.

  • avNNet_t - heuristic 34 from Delgado 2014 paper, 9 searches, works only when model=mlpe.

  • nnet_t - heuristic 36 from Delgado 2014 paper, 25 searches, works only when model=mlp or model=mlpe.

  • svm_C - heuristic 48 from Delgado 2014 paper, 130 searches (may take time), works only when model=ksvm.

  • svmRadial_t - heuristic 52 from Delgado 2014 paper, 25 searches, works only when model=ksvm.

  • svmLinear_t - heuristic 54 from Delgado 2014 paper, 5 searches, works only when model=ksvm.

  • svmPoly_t - heuristic 55 from Delgado 2014 paper, 27 searches, works only when model=ksvm.

  • lsvmRadial_t - heuristic 56 from Delgado 2014 paper, 10 searches, works only when model=lssvm.

  • rpart_t - heuristic 59 from Delgado 2014 paper, 10 searches, works only when model=rpart.

  • rpart2_t - heuristic 60 from Delgado 2014 paper, 10 searches, works only when model=rpart.

  • ctree_t - heuristic 63 from Delgado 2014 paper, 10 searches, works only when model=ctree.

  • ctree2_t - heuristic 64 from Delgado 2014 paper, 10 searches, works only when model=ctree.

  • rf_t - heuristic 131 from Delgado 2014 paper, 10 searches, works only when model=randomForest.

  • knn_R - heuristic 154 from Delgado 2014 paper, 19 searches, works only when model=kknn.

  • knn_t - heuristic 155 from Delgado 2014 paper, 10 searches, works only when model=kknn.

  • multinom_t - heuristic 167 from Delgado 2014 paper, 10 searches, works only when model=multinom.
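As a quick reference, the Delgado 2014 heuristics listed above can be collected in a small lookup table. This is only a sketch in base R: the delgado data frame and the heuristics_for helper are hypothetical names introduced here and are not part of rminer; the search counts and model names are copied from the list above.

```r
# Hypothetical lookup table (not part of rminer), transcribed from the list above.
delgado <- data.frame(
  heuristic = c("mlp_t", "avNNet_t", "nnet_t", "svm_C", "svmRadial_t",
                "svmLinear_t", "svmPoly_t", "lsvmRadial_t", "rpart_t",
                "rpart2_t", "ctree_t", "ctree2_t", "rf_t", "knn_R",
                "knn_t", "multinom_t"),
  searches  = c(10, 9, 25, 130, 25, 5, 27, 10, 10, 10, 10, 10, 10, 19, 10, 10),
  models    = c("mlp,mlpe", "mlpe", "mlp,mlpe", "ksvm", "ksvm",
                "ksvm", "ksvm", "lssvm", "rpart",
                "rpart", "ctree", "ctree", "randomForest", "kknn",
                "kknn", "multinom"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: which listed heuristics accept a given model name?
heuristics_for <- function(model) {
  ok <- vapply(strsplit(delgado$models, ","),
               function(m) model %in% m, logical(1))
  delgado[ok, c("heuristic", "searches")]
}

print(heuristics_for("ksvm"))  # the four ksvm-only heuristics and their sizes
```

Checking the table before calling mparheuristic helps avoid pairing a heuristic with a model it does not support, and the searches column gives a rough idea of the cost (svm_C, with 130 searches, is by far the largest).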

lower

lower bound for the (hyper)parameter (if NA a default value is assumed).

upper

upper bound for the (hyper)parameter (if NA a default value is assumed).

by

increment in the sequence (if NA a default value is assumed depending on n).

exponential

whether an exponential scale should be used in the search sequence. NA (the default) assumes a linear scale unless model is a support vector machine; a numeric value sets the base of the scale (the examples use exponential=2 and exponential=10).

kernel

optional kernel type, only used when model="ksvm", model="rsvm" or model="lssvm". Currently mapped kernels are "rbfdot" (Gaussian), "polydot" (polynomial) and "vanilladot" (linear); see ksvm for kernel details.

Value

A list with one or more (hyper)parameter values to be searched.

Details

This function facilitates the definition of the search argument used by fit or mining functions. Using simple heuristics, reasonable (hyper)parameter search values are suggested for several rminer models. For models not mapped in this function, the function returns NULL, which means that no hyperparameter search is executed (often, this implies using rminer or R function default values).

The heuristic assumes lower and upper bounds for a (hyper)parameter. If n=1, then rminer or R defaults are assumed. Otherwise, a search is created using seq(lower,upper,by), where by is either set by the user or computed from n. For model="ksvm", 2^seq(...) is used for sigma and C, and (1/10)^seq(...) is used for scale.
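The sequence construction described above can be sketched in base R. The sketch_grid helper below is hypothetical and is not rminer's implementation: in particular, deriving the step from n via evenly spaced points and returning NA for n=1 are assumptions made for illustration.

```r
# Hypothetical helper (not part of rminer): builds a search sequence from
# lower/upper bounds, giving n precedence over by when both are supplied.
sketch_grid <- function(lower, upper, n = NA, by = NA, exponential = NA) {
  if (!is.na(n)) {
    if (n == 1) return(NA)                  # assumption: NA signals "use defaults"
    s <- seq(lower, upper, length.out = n)  # assumption: evenly spaced points
  } else if (!is.na(by)) {
    s <- seq(lower, upper, by = by)
  } else {
    stop("either n or by must be given")
  }
  if (!is.na(exponential)) s <- exponential^s  # e.g. 2^seq(...) for SVM models
  s
}

print(sketch_grid(1, 9, n = 5))                    # linear scale
print(sketch_grid(5, 15, by = 2))                  # step given directly
print(sketch_grid(-5, 1, n = 4, exponential = 2))  # base-2 exponential scale
```

The exponential form spreads candidate values over several orders of magnitude, which is why it is the usual choice for parameters such as sigma and C.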

References

  • To check for more details about rminer and for citation purposes: P. Cortez. Data Mining with Neural Networks and Support Vector Machines Using the R/rminer Tool. In P. Perner (Ed.), Advances in Data Mining - Applications and Theoretical Aspects 10th Industrial Conference on Data Mining (ICDM 2010), Lecture Notes in Artificial Intelligence 6171, pp. 572-583, Berlin, Germany, July, 2010. Springer. ISBN: 978-3-642-14399-1. @Springer: https://link.springer.com/chapter/10.1007/978-3-642-14400-4_44 http://www3.dsi.uminho.pt/pcortez/2010-rminer.pdf

  • This tutorial shows additional code examples: P. Cortez. A tutorial on using the rminer R package for data mining tasks. Teaching Report, Department of Information Systems, ALGORITMI Research Centre, Engineering School, University of Minho, Guimaraes, Portugal, July 2015. http://hdl.handle.net/1822/36210

  • Some lower/upper bounds and heuristics were retrieved from: M. Fernandez-Delgado, E. Cernadas, S. Barro and D. Amorim. Do we need hundreds of classifiers to solve real world classification problems? In The Journal of Machine Learning Research, 15(1), 3133-3181, 2014.

See Also

fit and mining.

Examples

# NOT RUN {
## "kknn"
s=mparheuristic("kknn",n="heuristic")
print(s) 
s=mparheuristic("kknn",n=1) # same thing
print(s) 
s=mparheuristic("kknn",n="heuristic5")
print(s) 
s=mparheuristic("kknn",n=5) # same thing
print(s)
s=mparheuristic("kknn",lower=5,upper=15,by=2)
print(s)
# exponential scale:
s=mparheuristic("kknn",lower=1,upper=5,by=1,exponential=2)
print(s)

## "mlpe"
s=mparheuristic("mlpe")
print(s) # "NA" means set size with min(inputs/2,10) in fit
s=mparheuristic("mlpe",n="heuristic10")
print(s) 
s=mparheuristic("mlpe",n=10) # same thing
print(s) 
s=mparheuristic("mlpe",n=10,lower=2,upper=20) 
print(s) 

## "randomForest", upper should be set to the number of inputs = max mtry
s=mparheuristic("randomForest",n=10,upper=6)
print(s) 

## "ksvm"
s=mparheuristic("ksvm",n=10)
print(s) 
s=mparheuristic("ksvm",n=10,kernel="vanilladot")
print(s) 
s=mparheuristic("ksvm",n=10,kernel="polydot")
print(s) 

## lssvm
s=mparheuristic("lssvm",n=10)
print(s) 

## rvm 
s=mparheuristic("rvm",n=5)
print(s) 
s=mparheuristic("rvm",n=5,kernel="vanilladot")
print(s) 

## "rpart" and "ctree" are special cases (see help(fit,package=rminer) examples):
s=mparheuristic("rpart",n=3) # 3 cp values
print(s) 
s=mparheuristic("ctree",n=3) # 3 mincriterion values
print(s) 

### examples with fit
### classification
data(iris)
# ksvm and rbfdot:
model="ksvm";kernel="rbfdot"
s=mparheuristic(model,n="heuristic5",kernel=kernel)
print(s) # 5 sigma values
search=list(search=s,method=c("holdout",2/3,123))
# task "prob" is assumed, optimization of "AUC":
M=fit(Species~.,data=iris,model=model,search=search,fdebug=TRUE)
print(M@mpar)

# different lower and upper range:
s=mparheuristic(model,n=5,kernel=kernel,lower=-5,upper=1)
print(s) # from 2^-5 to 2^1 
search=list(search=s,method=c("holdout",2/3,123))
# task "prob" is assumed, optimization of "AUC":
M=fit(Species~.,data=iris,model=model,search=search,fdebug=TRUE)
print(M@mpar)

# different exponential scale: 
s=mparheuristic(model,n=5,kernel=kernel,lower=-4,upper=0,exponential=10)
print(s) # from 10^-4 to 10^0 
search=list(search=s,method=c("holdout",2/3,123))
# task "prob" is assumed, optimization of "AUC":
M=fit(Species~.,data=iris,model=model,search=search,fdebug=TRUE)
print(M@mpar)

# "lssvm" Gaussian model, pure classification and ACC optimization, full iris:
model="lssvm";kernel="rbfdot"
s=mparheuristic("lssvm",n=3,kernel=kernel)
print(s)
search=list(search=s,method=c("holdout",2/3,123))
M=fit(Species~.,data=iris,model=model,search=search,fdebug=TRUE)
print(M@mpar)

# test several heuristic5 searches, full iris:
n="heuristic5";inputs=ncol(iris)-1
model=c("ctree","rpart","kknn","ksvm","lssvm","mlpe","randomForest")
for(i in 1:length(model))
 {
  cat("--- i:",i,"model:",model[i],"\n")
  if(model[i]=="randomForest") s=mparheuristic(model[i],n=n,upper=inputs) 
  else s=mparheuristic(model[i],n=n)
  print(s)
  search=list(search=s,method=c("holdout",2/3,123))
  M=fit(Species~.,data=iris,model=model[i],search=search,fdebug=TRUE)
  print(M@mpar)
 }


# test several Delgado 2014 searches (some cases launch warnings):
model=c("mlp","mlpe","mlp","ksvm","ksvm","ksvm",
        "ksvm","lssvm","rpart","rpart","ctree",
        "ctree","randomForest","kknn","kknn","multinom")
n=c("mlp_t","avNNet_t","nnet_t","svm_C","svmRadial_t","svmLinear_t",
    "svmPoly_t","lsvmRadial_t","rpart_t","rpart2_t","ctree_t",
    "ctree2_t","rf_t","knn_R","knn_t","multinom_t")
inputs=ncol(iris)-1
for(i in 1:length(model))
 {
  cat("--- i:",i,"model:",model[i],"heuristic:",n[i],"\n")
  if(model[i]=="randomForest") s=mparheuristic(model[i],n=n[i],upper=inputs) 
  else s=mparheuristic(model[i],n=n[i])
  print(s)
  search=list(search=s,method=c("holdout",2/3,123))
  M=fit(Species~.,data=iris,model=model[i],search=search,fdebug=TRUE)
  print(M@mpar)
 }

### regression
data(sa_ssin)
s=mparheuristic("ksvm",n=3,kernel="polydot")
print(s)
search=list(search=s,metric="MAE",method=c("holdout",2/3,123))
M=fit(y~.,data=sa_ssin,model="ksvm",search=search,fdebug=TRUE)
print(M@mpar)

# regression task, predict iris "Petal.Width":
data(iris)
ir2=iris[,1:4]
names(ir2)[ncol(ir2)]="y" # change output name
n=3;inputs=ncol(ir2)-1 # 3 hyperparameter searches
model=c("ctree","rpart","kknn","ksvm","mlpe","randomForest","rvm")
for(i in 1:length(model))
 {
  cat("--- i:",i,"model:",model[i],"\n")
  if(model[i]=="randomForest") s=mparheuristic(model[i],n=n,upper=inputs)
  else s=mparheuristic(model[i],n=n)
  print(s)
  search=list(search=s,method=c("holdout",2/3,123))
  M=fit(y~.,data=ir2,model=model[i],search=search,fdebug=TRUE)
  print(M@mpar)
 }
# }
