liquidData: Loads or downloads training and testing data

Description

Searches several locations for the files name.train.csv and name.test.csv. If they are found, the function loads or downloads them, parses them, and returns a liquidData object. The files may also be gzipped, with the names name.train.csv.gz and name.test.csv.gz.
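The lookup described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: find_data_files is a hypothetical helper, it handles local directories only (the download step for remote locations is left out), and it tries the plain file names before the gzipped ones.

```r
# Hypothetical helper (not part of liquidSVM): look for <name>.train.csv and
# <name>.test.csv, plain or gzipped, in each local directory in turn.
find_data_files <- function(name, loc = c(".", "~/liquidData")) {
  for (dir in loc) {
    for (ext in c(".csv", ".csv.gz")) {
      train <- file.path(dir, paste0(name, ".train", ext))
      test  <- file.path(dir, paste0(name, ".test", ext))
      if (file.exists(train) && file.exists(test)) {
        return(list(train = train, test = test))
      }
    }
  }
  NULL  # nothing found in any of the directories
}

# Usage: write a tiny headerless data set into a temporary directory
# and find it again.
tmp <- tempfile("liquidData-demo")
dir.create(tmp)
write.table(head(trees, 20), file.path(tmp, "demo.train.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)
write.table(tail(trees, 11), file.path(tmp, "demo.test.csv"),
            sep = ",", row.names = FALSE, col.names = FALSE)

files <- find_data_files("demo", loc = tmp)
train <- read.csv(files$train, header = FALSE)
```

The first location that holds both files wins, which mirrors the idea of loc being an ordered vector of places to search.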

Usage

liquidData(name, factor_cols, header = FALSE, loc = c(".", "~/liquidData",
  system.file("data", package = "liquidSVM"), "../../../data",
  "https://www.isa.uni-stuttgart.de/liquidData"), prob = NULL,
  testSize = NULL, trainSize = NULL, stratified = NULL)
ttsplit(data, target = NULL, testProb = 0.2, testSize = NULL,
  stratified = NULL)
sample.liquidData(liquidData, prob = 0.2, trainSize = NULL,
  testSize = NULL, stratified = NULL)
# S3 method for liquidData
print(x, ...)

Arguments

name

name of the data set. If not given, a list of the names available in loc is returned.

factor_cols

list of column numbers that are factors (or list of header names, if header=TRUE)

header

whether the data files have headers

loc

vector of locations where the data should be searched for

prob

probability of sample being put into test set

testSize

size of the test set. If stratified, this will only be approximately fulfilled.

trainSize

size of the train set. If stratified, this will only be approximately fulfilled.

stratified

whether sampling should be done separately in every bin defined by the unique values of the target column. Can also be the index or name of the column in data that should be used to define the bins.

data

the given data set

target

optional name or index of the target variable. If neither this nor stratified is specified, there will be no stratification.

testProb

probability of sample being put into test set

liquidData

the given liquidData

x

the model to print

...

other arguments to print.default
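To illustrate what stratified sampling means for the arguments above, here is a minimal base R sketch. It is an assumption about the general technique, not the code used by ttsplit or sample.liquidData: pick_test_rows is a hypothetical helper that draws a fraction testProb of the rows, either from all rows at once or separately within each bin. Because the count is rounded per bin, the total can deviate slightly from the requested size, which is one plausible reason a size is only approximately fulfilled under stratification.

```r
# Hypothetical helper (not the package's code): choose test-set row indices.
# bins = NULL samples from all rows at once; otherwise rows are sampled
# separately within each bin defined by the values of `bins`.
pick_test_rows <- function(n, testProb = 0.2, bins = NULL) {
  if (is.null(bins)) {
    return(sort(sample.int(n, round(testProb * n))))
  }
  groups <- split(seq_len(n), bins)
  picked <- lapply(groups, function(idx) {
    # index via sample.int() so that bins of length 1 behave correctly
    idx[sample.int(length(idx), round(testProb * length(idx)))]
  })
  sort(unlist(picked, use.names = FALSE))
}

set.seed(1)
test_idx <- pick_test_rows(nrow(iris), testProb = 0.2, bins = iris$Species)
table(iris$Species[test_idx])   # 10 rows from each of the three species
```

Without bins the 30 test rows would be spread over the species at random, so the group sizes would vary from draw to draw.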

Value

If name is specified, a liquidData object: an environment with $train and $test data sets as well as $name and, optionally, $target as the name of the target variable. If no name is specified, a character vector of the names available in loc.
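Since the returned object is an environment, it has reference semantics in R. The sketch below builds a stand-in by hand with the fields listed above to show what that implies; it does not call liquidSVM, and the names mock and shrink_train as well as the way the rows are split are purely illustrative assumptions.

```r
# Hand-built stand-in with the documented fields; NOT created by liquidSVM.
mock <- new.env()
mock$name   <- "iris-demo"                  # hypothetical data set name
mock$target <- "Species"
test_rows   <- seq(5, nrow(iris), by = 5)   # every fifth row, for illustration
mock$train  <- iris[-test_rows, ]
mock$test   <- iris[test_rows, ]

# Environments are passed by reference: a function can modify the object
# in place without returning a copy.
shrink_train <- function(d, n) {
  d$train <- head(d$train, n)
  invisible(d)
}
shrink_train(mock, 50)
nrow(mock$train)   # now 50, although mock was never reassigned
```

Code that needs an independent copy of such an object therefore has to copy the fields explicitly rather than rely on assignment.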

Examples

# NOT RUN {
banana <- liquidData('banana-mc')

## to get a smaller sample
liquidData('banana-mc',prob=0.2)
## if you disable stratified then there is some variance in the group sizes:
liquidData('banana-mc',prob=0.2, stratified=FALSE)

# }
# NOT RUN {
## to download a file from our web directory

liquidData("gisette")

## To get a list of available names:
liquidData()
# }
# NOT RUN {
## to produce a liquidData from some dataset
ttsplit(iris)
# the following will be stratified
ttsplit(iris,'Species')

# specify a testSize:
ttsplit(trees, testSize=10)
## example for sample.liquidData
banana <- liquidData('banana-mc')
sample.liquidData(banana, prob=0.1)
# this is equivalent to
liquidData('banana-mc', prob=0.1)
## example for print
banana <- liquidData("banana-mc")
print(banana)
# }
