process_delim: A function to read in large data files as a filebacked `big.matrix`

Description

A function to read in large data files as a filebacked big.matrix

Usage

process_delim(
  data_dir,
  data_file,
  feature_id,
  rds_dir = data_dir,
  rds_prefix,
  logfile = NULL,
  overwrite = FALSE,
  quiet = FALSE,
  ...
)

Value

The file path to the newly created .rds file

Arguments

data_dir: The directory containing the file.
data_file: The file to be read in, without the filepath. This should be a file of numeric values. Example: use data_file = "myfile.txt", not data_file = "~/mydirectory/myfile.txt". Note: if your file has headers/column names, set header = TRUE; this will be passed to bigmemory::read.big.matrix().
feature_id: A string naming the column in the feature data X that holds the row IDs (e.g., identifiers for each row, sample, or participant). Duplicate IDs are not allowed.
rds_dir: The directory where the user wants to create the .rds and .bk files. Defaults to data_dir
rds_prefix: String specifying the user's preferred filename for the to-be-created .rds file (will be created inside the rds_dir folder). Note: rds_prefix cannot be the same as data_prefix
logfile: Optional: the prefix (character string) of the log file to be written in rds_dir. Defaults to NULL (no log file written). Note: do not append .log to the filename; this is done automatically.
overwrite: Logical: if .bk/.rds files already exist for the specified directory/prefix, should these be overwritten? Defaults to FALSE. Set to TRUE if you want to change the imputation method you're using, etc.
quiet: Logical: should console messages be silenced? Defaults to FALSE
...: Optional: other arguments to be passed to bigmemory::read.big.matrix(). Note: sep is an option to pass here, as is header.
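
The data_file and feature_id requirements above can be checked before calling process_delim(). The following is a minimal base-R sketch, not part of the package: the toy file, its column names (id, x1, x2), and the variable names are all hypothetical and exist only for illustration. It splits a full path into the data_dir and data_file pieces and confirms that the ID column has no duplicates.

```r
# Hypothetical toy file, written to a temp directory for this illustration only
full_path <- file.path(tempdir(), "toy.txt")
toy <- data.frame(id = c(101, 102, 103),
                  x1 = c(0.1, 0.2, 0.3),
                  x2 = c(1, 0, 1))
write.table(toy, full_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Split the full path into the two pieces the arguments ask for:
# the directory (data_dir) and the bare filename (data_file)
data_dir  <- dirname(full_path)
data_file <- basename(full_path)

# Read the ID column with base R and check that it contains no duplicates
ids <- read.delim(full_path, header = TRUE, sep = "\t")[["id"]]
has_duplicates <- anyDuplicated(ids) > 0
```

With these values in hand, data_dir, data_file, and feature_id = "id" would be the inputs for this toy file, along with sep = "\t" and header = TRUE passed through the ... argument.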

Examples

temp_dir <- tempdir()
colon_dat <- process_delim(data_file = "colon2.txt",
 data_dir = find_example_data(parent = TRUE), overwrite = TRUE,
 rds_dir = temp_dir, rds_prefix = "processed_colon2", sep = "\t", header = TRUE)

colon2 <- readRDS(colon_dat)
str(colon2)
