Read the task data using tdmReadDataset, split them into a test part and
a training/validation part, and return a TDMdata object.
tdmReadAndSplit(opts, tdm, nExp = 0, dset = NULL)
opts: a list from which we need here the elements
READ.INI: [T] =T: do read and split, =F: return NULL
READ.*: other settings for tdmReadDataset
filename: needed for tdmReadDataset
filetest: needed for tdmReadDataset
TST.testFrac: [0.1] set this fraction of the data aside for testing
TST.COL: string with name for the partitioning column, if tdm$umode is not "SP_T".
(If tdm$umode=="SP_T", then TST.COL="tdmSplit" is used.)
tdm: a list from which we need here the elements
mainFile: if not NULL, set working dir to dir(mainFile) before executing tdmReadDataset
umode: [ "RSUB" | "CV" | "TST" | "SP_T" ], how to divide the data into training/validation data for tuning
and test data for the unbiased runs
SPLIT.SEED: if NULL, set random number generator (RNG) to tdmRandomSeed when constructing
dataObj. If not NULL, set RNG to SPLIT.SEED + nExp, which gives a deterministic test set split.
stratified: [NULL] string specifying the column with the response variable for classification.
If not NULL, do the split by stratified sampling (at least one record of each class level
found in dset[,tdm$stratified] shall appear in the train-vali-set). Recommended for classification.
nExp: [0] experiment counter, used to select a reproducible different seed, if tdm$SPLIT.SEED!=NULL
dset: [NULL] if non-NULL, reading of dset is skipped and the given data frame dset is used.
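The interplay of SPLIT.SEED, nExp and stratified can be illustrated with a small base-R sketch. It is not TDMR's implementation: the helper split_sketch, its argument names and the toy data frame are hypothetical, and the case SPLIT.SEED=NULL (where TDMR uses tdmRandomSeed) is simply left unseeded here.

```r
# Hypothetical helper, not part of TDMR: returns a 0/1 vector (1 = test record).
split_sketch <- function(dset, testFrac = 0.1, splitSeed = NULL, nExp = 0,
                         stratified = NULL) {
  # A fixed seed shifted by the experiment counter gives a reproducible
  # split that still differs from experiment to experiment.
  # (Assumption: with splitSeed = NULL the RNG is left untouched in this sketch.)
  if (!is.null(splitSeed)) set.seed(splitSeed + nExp)
  n <- nrow(dset)
  isTest <- integer(n)
  if (is.null(stratified)) {
    isTest[sample.int(n, round(testFrac * n))] <- 1L
  } else {
    # Sample within each class level; capping at (level size - 1) keeps
    # at least one record of every level in the train-vali part.
    for (idx in split(seq_len(n), dset[[stratified]], drop = TRUE)) {
      nTest <- min(floor(testFrac * length(idx)), length(idx) - 1L)
      if (nTest > 0) isTest[idx[sample.int(length(idx), nTest)]] <- 1L
    }
  }
  isTest
}

# Toy data with an unbalanced class column (hypothetical).
toy <- data.frame(x = 1:20, cls = rep(c("a", "b", "c"), times = c(10, 9, 1)))
s1 <- split_sketch(toy, testFrac = 0.2, splitSeed = 42, nExp = 1, stratified = "cls")
s2 <- split_sketch(toy, testFrac = 0.2, splitSeed = 42, nExp = 1, stratified = "cls")
```

Calling the helper twice with the same splitSeed and nExp yields the same split, and the single record of class "c" always stays in the train-vali part.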
dataObj, either NULL (if opts$READ.INI==FALSE) or an object of class TDMdata containing
dset: a data frame with the complete data set
TST.COL: string, the name of the column in dset which has a 1 for
records belonging to the test set and a 0 for train/vali records. If tdm$umode=="SP_T", then
TST.COL="tdmSplit", else TST.COL=opts$TST.COL.
filename: opts$filename, from where the data were read
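How the 0/1 partitioning column separates the two parts can be sketched in base R. The list below is only a toy stand-in with assumed element names dset, TST.COL and filename; it is not a real TDMdata object, and with the package itself the accessors dsetTrnVa.TDMdata and dsetTest.TDMdata are presumably the intended way to get the same result.

```r
# Toy stand-in for the returned object (assumption: elements are reachable by name).
dataObj <- list(
  dset = data.frame(x = 1:6, tdmSplit = c(0, 0, 1, 0, 1, 0)),
  TST.COL = "tdmSplit",
  filename = "toy.csv"   # hypothetical file name
)

# Records with a 0 in the partitioning column form the train-vali part,
# records with a 1 form the test part.
isTest <- dataObj$dset[[dataObj$TST.COL]] == 1
trnVa <- dataObj$dset[!isTest, ]
test  <- dataObj$dset[isTest, ]
```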
If dset is NULL, the files specified in opts are read into dset, see
tdmReadDataset for details. Then, depending on the value of tdm$umode:
"SP_T": split the data randomly into training and test data with test
set fraction according to opts$TST.testFrac. Make use of tdm$SPLIT.SEED
and tdm$stratified, if given. Set TST.COL to "tdmSplit".
"RSUB", "CV": use all data for training/validation. That is, the
training-validation split is done later in tdmClassifyLoop or
tdmRegressLoop.
"TST": split the data into training and test data according to column
opts$TST.COL (usually "TST.COL"), which carries a 1 for each test record and a 0 otherwise.
If opts$filetest is specified, then all records from this file will
carry a 1 in opts$TST.COL. All records from opts$filename carry a 0.
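The case distinction on tdm$umode described above can be summarized in a short base-R sketch. The function mark_test_records and its arguments are hypothetical and only mirror the described behavior; seeding, stratification and file reading are left out.

```r
# Hypothetical helper, not part of TDMR: returns the data frame together
# with the name of the partitioning column for the given umode.
mark_test_records <- function(dset, umode, testFrac = 0.1, tstCol = "TST.COL") {
  switch(umode,
    SP_T = {
      # Random split: mark a fraction testFrac of the records with a 1.
      isTest <- integer(nrow(dset))
      isTest[sample.int(nrow(dset), round(testFrac * nrow(dset)))] <- 1L
      dset[["tdmSplit"]] <- isTest
      list(dset = dset, TST.COL = "tdmSplit")
    },
    # No test split here: all data stay available for training/validation.
    RSUB = ,
    CV = list(dset = dset, TST.COL = tstCol),
    TST = {
      # The 0/1 column must already be present in the data.
      stopifnot(tstCol %in% names(dset))
      list(dset = dset, TST.COL = tstCol)
    },
    stop("unknown umode: ", umode)
  )
}

# "TST" with records from two sources (toy data standing in for
# opts$filename and opts$filetest).
trainPart <- data.frame(x = c(1, 2, 3), TST.COL = 0)
testPart  <- data.frame(x = c(4, 5),    TST.COL = 1)
resTst <- mark_test_records(rbind(trainPart, testPart), "TST")

set.seed(1)
resSpt <- mark_test_records(data.frame(x = 1:10), "SP_T", testFrac = 0.3)
```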
dsetTrnVa.TDMdata, dsetTest.TDMdata, tdmReadDataset, tdmBigLoop