
partition_dataset: partitions a latin-hypercube summary file into training, testing, and validation sets, for use in the development of emulations of a simulation.

Usage:
partition_dataset(dataset, parameters, measures, percent_train = 75,
  percent_test = 15, percent_validation = 10, seed = NULL,
  normalise = FALSE, sample_mins = NULL, sample_maxes = NULL,
  timepoint = NULL)
Arguments:

dataset: LHC summary file to partition
parameters: Simulation parameters the emulation will take as input
measures: Simulation responses of interest
percent_train: Percentage of the dataset to use for training
percent_test: Percentage of the dataset to use for testing
percent_validation: Percentage of the dataset to use for validation
seed: For specifying a particular seed when randomly splitting the set
normalise: Whether the data needs to be normalised to lie between 0 and 1. For emulation creation to be successful, all data must be normalised prior to use in training and testing (see the scaling sketch after this list)
sample_mins: The minimum value used for each parameter in generating the latin-hypercube sample
sample_maxes: The maximum value used for each parameter in generating the latin-hypercube sample
timepoint: Simulation timepoint for which this summary file was created
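As the normalise argument notes, all data must lie between 0 and 1 before training. A minimal sketch of the min-max scaling this implies, assuming sample_mins and sample_maxes are ordered to match parameters; the helper name normalise_lhc is illustrative, not part of the package API:

# Sketch of min-max scaling, as normalise = TRUE implies: each parameter
# column is mapped from [min, max] to [0, 1] using the sampling bounds.
# normalise_lhc is a hypothetical helper for illustration only.
normalise_lhc <- function(dataset, parameters, sample_mins, sample_maxes) {
  for (i in seq_along(parameters)) {
    p <- parameters[i]
    dataset[[p]] <- (dataset[[p]] - sample_mins[i]) / (sample_maxes[i] - sample_mins[i])
  }
  dataset
}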
Value:

A partitioned dataset containing training, testing, and validation sets, together with the sample mins and maxes, so that any predictions generated from this normalised data can be rescaled correctly.
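Because the returned object carries the sample mins and maxes, a prediction made on the normalised scale can be mapped back to its original range. A hedged sketch of that inverse transform; min_val and max_val stand for the stored bounds of the quantity being rescaled, and the names are illustrative only:

# Inverse of the min-max scaling above: map a prediction on the
# normalised [0, 1] scale back to its original range.
rescale_prediction <- function(prediction, min_val, max_val) {
  prediction * (max_val - min_val) + min_val
}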
Examples:
data("sim_data_for_emulation")
parameters <- c("stableBindProbability", "chemokineExpressionThreshold",
                "initialChemokineExpressionValue", "maxChemokineExpressionValue",
                "maxProbabilityOfAdhesion", "adhesionFactorExpressionSlope")
measures <- c("Velocity", "Displacement", "PatchArea")
sample_maxes <- cbind(100, 0.9, 0.5, 0.08, 1, 5)
sample_mins <- cbind(0, 0.1, 0.1, 0.015, 0.1, 0.25)
partitionedData <- partition_dataset(sim_data_for_emulation, parameters,
  measures, percent_train = 75, percent_test = 15, percent_validation = 10,
  normalise = TRUE, sample_mins = sample_mins, sample_maxes = sample_maxes)
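After running the example, the contents of the returned list can be inspected before training an emulator. The component names are not documented on this page, so the names suggested in the comments below are assumptions; confirm them with names(partitionedData):

# List the components of the returned object and their dimensions
str(partitionedData, max.level = 1)
# If the splits are exposed under names such as $training, $testing and
# $validation (an assumption, not confirmed by this page), the split
# proportions could be checked, e.g.:
# nrow(partitionedData$training) / nrow(sim_data_for_emulation)  # approx. 0.75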