Learn R Programming

parallelML (version 1.2)

parallelML: Parallel-Voting version of machine learning algorithms

Description

By sampling your data, running the machine learning algorithm on these samples in parallel on your own machine and letting your models vote on a prediction, we return much faster predictions than the regular machine learning algorithm and possibly even more accurate predictions.

Usage

"parallelML"(MLCall, MLPackage, samplingSize = 0.2,numberCores = detectCores(), underSample = FALSE, underSampleTarget = NULL, sampleMethod = "bagging")

Arguments

MLCall
Your call to a machine learning algorithm. All arguments in this call should be named and the package only allows formula calls. Hence the call should look like "machineLearningAlgorithm(formula = ..., data = ..., ...)".
MLPackage
A character string of the package which provides your machine learning algorithm. This is needed since all cores should load the package.
samplingSize
Size of your data you will take in each sample.
numberCores
Number of cores of your machine you want to use. Is set equal to the number of samples you take.
underSample
Logical wether you want to take an undersample on your desired target.
underSampleTarget
When you set underSample to TRUE, underSampleTarget takes your target you want to keep in every sample. e.g. If you have 5 elements of category1 and 100 elements of category2 and your sampleSize is 0.2, then every sample will contain 25 elements, namely the 5 of category1 and 20 of category2.
sampleMethod
String which decides wether you sample on your observations (bagging) or on your variables (random).

Value

A list containing of numberCores machine learning models.

See Also

This package can be regarded as a parallel extension of machine learning algorithms, therefor check the package of the machine learning algorithm you want to use.

Examples

Run this code
## Not run: 
# # Load the library which provides svm
# library(e1071)
# 
# # Create your data
# data(iris)
# 
# # Create a model
# parSvmModel <- parallelML("svm(formula = Species ~ ., data = iris)",
#                      "e1071",samplingSize = 0.8)
#                      
# # Get prediction
# parSvmPred   <- predictML("predict(parSvmModel,newdata=iris)",
#                           "e1071","vote")
# 
# # Check the quality
# table(parSvmPred,iris$Species)
# ## End(Not run)
## Not run: 
# # Load the library which provides rpart
# library(rpart)
# 
# # Create your data
# data("magicData")
# 
# # Create a model
# parTreeModel  <- parallelML("rpart(formula = V11 ~ ., data = trainData[,-1])",
#                             "rpart",samplingSize = 0.8)
# 
# # Get prediction
# parTrainTreePred  <- predictML("predict(parTreeModel,newdata=trainData[,-1],type='class')",
#                                "rpart","vote")
# parTestTreePred  <- predictML("predict(parTreeModel,newdata=testData[,-1],type='class')",
#                               "rpart","vote")
# 
# # Check the quality
# table(parTrainTreePred,trainData$V11)
# table(parTestTreePred,testData$V11)	
# ## End(Not run)
## Not run: 
# # Load the library which provides svm
# library(e1071)
# 
# # Create your data
# data(iris)
# subdata <- iris[1:60,]
# 
# # Create a model
# parsvmmodel   <- parallelML("svm(formula = Species ~ ., data = subdata)",
#                             "e1071",samplingSize = 0.8,
#                             underSample = TRUE, underSampleTarget = "versicolor")
#                             
# # Get prediction                            
# parsvmpred    <- predictML("predict(parsvmmodel,newdata=subdata)",
#                            "e1071","vote")
#                            
# # Check the quality                           
# table(parsvmpred,subdata$Species)
# ## End(Not run)
## Not run: 
# # Load the library which provides svm
# library(e1071)
# 
# # Create your data
# data(iris)
# 
# # Create a model
# parsvmmodel   <- parallelML("svm(formula = Species ~ ., data = iris)",
#                             "e1071",samplingSize = 0.6,
#                             sampleMethod = "random")
#                             
# # Get prediction                            
# parsvmpred    <- predictML("predict(parsvmmodel,newdata=iris)",
#                            "e1071","vote")
#                            
# # Check the quality                           
# table(parsvmpred,iris$Species)
# ## End(Not run)

Run the code above in your browser using DataLab