MIC (version 1.1.0)

train_test_filesystem: Organise files into a train-test filesystem

Description

Organise files into a train-test filesystem

Usage

train_test_filesystem(
  path_to_files,
  file_ext,
  split = 0.8,
  train_folder = "train",
  test_folder = "test",
  shuffle = TRUE,
  overwrite = FALSE
)

Value

named vector of the train and test directory paths, with elements "train" and "test"

Arguments

path_to_files

directory containing files

file_ext

file extension to filter

split

proportion of files assigned to the training set (default 0.8); the remaining files go to the test set

train_folder

name of the training folder (subdirectory); created if it does not exist

test_folder

name of the testing folder (subdirectory); created if it does not exist

shuffle

randomise files when splitting (if FALSE, files will be sorted by filename prior to splitting)

overwrite

force overwrite of files that already exist
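
To make the interaction between split and shuffle concrete, here is a minimal base R sketch of how such a partition could be computed. It is an illustration only, not the MIC implementation: the helper name split_files and the use of floor() for rounding are assumptions, and the package may round or order files differently.

```r
# Hypothetical helper, not part of MIC: partitions a vector of file names.
split_files <- function(files, split = 0.8, shuffle = TRUE) {
  # shuffle = TRUE randomises the order; otherwise sort by filename
  files <- if (shuffle) sample(files) else sort(files)
  # number of training files (rounding down is an assumption)
  n_train <- floor(length(files) * split)
  list(
    train = head(files, n_train),
    test  = tail(files, length(files) - n_train)
  )
}

set.seed(123)
parts <- split_files(paste0(1:10, ".fna"), split = 0.8, shuffle = TRUE)
length(parts$train)
length(parts$test)
```

With 10 files and split = 0.8, this sketch places 8 files in the training set and 2 in the test set; every file lands in exactly one of the two.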

Examples

set.seed(123)
# create 10 random DNA files
tmp_dir <- tempdir()
# remove any existing .fna files
# (list.files() takes a regular expression, not a glob)
file.remove(
  list.files(tmp_dir, pattern = "\\.fna$", full.names = TRUE)
)

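for (i in seq_len(10)) {
  # 100 random bases per sequence
  dna <- paste0(sample(c("A", "T", "C", "G"), 100, replace = TRUE), collapse = "")
  # write a FASTA file: header line, then the sequence
  writeLines(c(paste0(">", i), dna), file.path(tmp_dir, paste0(i, ".fna")))
}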

# split files into train and test directories
paths <- train_test_filesystem(tmp_dir,
                               file_ext = "fna",
                               split = 0.8,
                               shuffle = TRUE,
                               overwrite = TRUE)

list.files(paths[["train"]])
list.files(paths[["test"]])
