Learn R Programming

parallelML (version 1.2)

trainSample: Sample data in parallel

Description

Sample data or data and output in parallel: each core provides one sample of your desired size.

Usage

trainSample(data, numberCores = detectCores(), samplingSize = 0.2, underSample = FALSE, toPredict = NULL, underSampleTarget = NULL, sampleMethod = "bagging")

Arguments

data
A data frame, or structure convertable to a data frame, which you want to sample upon.
numberCores
In this setting equal to number of different training samples you are creating: one for each core you are using.
samplingSize
Size of your training sample in percentage.
underSample
Logical wether you want to take an undersample on your desired target.
toPredict
The column of your dataset you want to predict
underSampleTarget
When you set underSample to TRUE, underSampleTarget takes your target you want to keep in every sample. e.g. If you have 5 elements of category1 and 100 elements of category2 and your sampleSize is 0.2, then every sample will contain 25 elements, namely the 5 of category1 and 20 of category2.
sampleMethod
String which decides wether you sample on your observations (bagging) or on your variables (random).

Value

You get a list of length numberCores. Each core has created one item of your list, namely a data frame containing a a samplingSize size sample of data.

See Also

Under the hood this function uses foreach, and sample

Examples

Run this code
## Not run: 
# # Create your data
# x <- data.frame(1:10,10:1)
# 
# # Sampling on observations
# trainSample(x,numberCores=2,samplingSize = 0.5)
# 
# #Create your data
# data(iris)
# 
# # Sampling on variables
# trainSample(iris,numberCores=2,samplingSize = 0.6,
#             toPredict = "Species", sampleMethod = "random")
# 
# # Create your data
# data(iris)
# data <- iris[1:110,]
# 
# # Sampling
# trainSamples <- trainSample(data,2,0.2,TRUE,"Species","virginica")
# ## End(Not run)

Run the code above in your browser using DataLab