read_tidys: Read tidy-shaped files

Description

A function that imports tidy-shaped files into R. Largely acts as a wrapper for read.csv, read_xls, or read_xlsx, but can handle multiple files at once, can read a subset of rows/columns rather than the entire file, and can add the filename or run name as a column in the output.

Usage

read_tidys(
  files,
  filetype = NULL,
  startrow = NULL,
  endrow = NULL,
  startcol = NULL,
  endcol = NULL,
  sheet = NULL,
  run_names = NULL,
  run_names_header = NULL,
  run_names_dot = FALSE,
  run_names_path = TRUE,
  run_names_ext = FALSE,
  na.strings = c("NA", ""),
  extension,
  names_to_col,
  ...
)

Value

A single tidy data.frame, or a list of tidy-shaped data.frames named by filename.

Arguments

files

A vector of filepaths (relative to current working directory) where each one is a tidy-shaped data file

filetype

(optional) the type(s) of the files. Options include:

"csv", "xls", or "xlsx".

"tbl" or "table" to use read.table to read the file, "csv2" to use read.csv2, "delim" to use read.delim, or "delim2" to use read.delim2.

If not provided, read_tidys will infer the filetype(s) from the extension(s) in files. When an extension is not "csv", "xls", or "xlsx", "table" is used.
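
The fallback rule described above can be sketched in base R. This is only an illustration of the documented rule, not the package's implementation; the helper name infer_filetype is hypothetical, and exact (case-sensitive) matching of the extension is an assumption.

```r
# Hypothetical helper: map file extensions to a filetype, falling back to
# "table" for anything that is not "csv", "xls", or "xlsx".
# Assumption: extensions are matched exactly (case-sensitive).
infer_filetype <- function(files) {
  ext <- tools::file_ext(files)
  ifelse(ext %in% c("csv", "xls", "xlsx"), ext, "table")
}

filetypes <- infer_filetype(c("a.csv", "b.xlsx", "c.txt", "d"))
print(filetypes)  # "csv" "xlsx" "table" "table"
```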

startrow, endrow, startcol, endcol

(optional) the rows and columns where the data are located in files.

Can be a vector or list the same length as files, or a single value that applies to all files. Values can be numeric or a string that will be automatically converted to numeric by from_excel.
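
As an illustration of the string-to-numeric conversion, the sketch below turns Excel-style column letters into column numbers. It assumes that this is the kind of conversion from_excel performs; the helper excel_col_to_num is hypothetical and is not the package's own code.

```r
# Hypothetical helper: convert Excel-style column letters to numbers by
# treating the letters as base-26 digits (A = 1, ..., Z = 26).
excel_col_to_num <- function(x) {
  vapply(toupper(x), function(s) {
    digits <- utf8ToInt(s) - utf8ToInt("A") + 1L
    powers <- rev(seq_along(digits)) - 1
    as.integer(sum(digits * 26^powers))
  }, integer(1), USE.NAMES = FALSE)
}

cols <- excel_col_to_num(c("A", "Z", "AA", "AB"))
print(cols)  # 1 26 27 28
```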

If not provided, data is presumed to begin on the first row and column of the file(s) and end on the last row and column of the file(s).
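
To make the row/column selection concrete, here is a base-R sketch of cutting a tidy block out of a file that has extra lines above the data, treating the first selected row as the header. It approximates the idea only and is not how read_tidys is implemented; the file contents are made up for the example.

```r
# Write a small file with two non-data lines above the tidy block
tmp <- tempfile(fileext = ".csv")
writeLines(c("exported by instrument,,",
             "run 7,,",
             "Time,Well,OD",
             "0,A1,0.01",
             "900,A1,0.02",
             "1800,A1,0.05"), tmp)

# Read everything as text without a header, then select the block
raw <- read.csv(tmp, header = FALSE, colClasses = "character",
                na.strings = c("NA", ""))
startrow <- 3; endrow <- 5; startcol <- 1; endcol <- 3
block <- raw[startrow:endrow, startcol:endcol]

# The first row of the block (startrow) supplies the column names
tidy <- block[-1, , drop = FALSE]
names(tidy) <- unlist(block[1, ])
rownames(tidy) <- NULL

# Convert the text columns back to sensible types
tidy <- utils::type.convert(tidy, as.is = TRUE)
print(tidy)
```

Because endrow is 5 here, the last line of the file is left out and the result has two data rows.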

sheet

The sheet of the input files where data is located (if input files are .xls or .xlsx). If not specified, defaults to the first sheet.

run_names

Names to give the tidy files read in. If not specified, the file names are used. These names may be added to the resulting data.frame(s) depending on the value of the run_names_header argument.

run_names_header

Should the run names (provided in run_names or inferred from files) be added as a column to the output?

If run_names_header is TRUE, they will be added with the column name "run_name"

If run_names_header is FALSE, they will not be added.

If run_names_header is a string, they will be added and the column name will be the string specified for run_names_header.

If run_names_header is NULL, they will be added only if multiple tidy data.frames are being read, in which case the column name will be "run_name".
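
The effect of adding run names as a column can be sketched in base R as follows. This mimics the documented behavior for two files and a string header; it is an approximation for illustration, not the package's code, and the file contents are invented.

```r
# Create two small tidy-shaped files in a temporary directory
dir <- tempfile("runs_")
dir.create(dir)
files <- file.path(dir, c("run1.csv", "run2.csv"))
write.csv(data.frame(Time = c(0, 900), OD = c(0.01, 0.02)),
          files[1], row.names = FALSE)
write.csv(data.frame(Time = c(0, 900), OD = c(0.03, 0.06)),
          files[2], row.names = FALSE)

# Run names taken from the file names (path and extension dropped here)
run_names <- tools::file_path_sans_ext(basename(files))

# Read each file, name the list by run, and add the run name as a column
tidy_list <- lapply(files, read.csv, na.strings = c("NA", ""))
names(tidy_list) <- run_names
header <- "run_name"  # the column name used when the header is TRUE or NULL
tidy_list <- Map(function(df, nm) { df[[header]] <- nm; df },
                 tidy_list, run_names)
print(tidy_list$run2)
```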

run_names_dot

If run_names are inferred from filenames, should the leading './' (if any) be retained?

run_names_path

If run_names are inferred from filenames, should the path (if any) be retained?

run_names_ext

If run_names are inferred from filenames, should the file extension (if any) be retained?
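
How the three run_names_* switches might combine can be sketched in base R. The helper infer_run_names is hypothetical and only approximates the documented options using the defaults shown in Usage; the package's actual handling of edge cases may differ.

```r
# Hypothetical helper mirroring run_names_dot, run_names_path and
# run_names_ext (defaults FALSE, TRUE and FALSE, as in Usage)
infer_run_names <- function(files, dot = FALSE, path = TRUE, ext = FALSE) {
  out <- files
  if (!dot)  out <- sub("^\\./", "", out)           # drop a leading "./"
  if (!path) out <- basename(out)                    # drop directories
  if (!ext)  out <- tools::file_path_sans_ext(out)   # drop the extension
  out
}

files <- c("./data/run1.csv", "./data/run2.xlsx")
print(infer_run_names(files))                 # "data/run1" "data/run2"
print(infer_run_names(files, path = FALSE))   # "run1" "run2"
print(infer_run_names(files, dot = TRUE, ext = TRUE))  # unchanged
```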

na.strings

A character vector of strings which are to be interpreted as NA values by read.csv, read_xls, read_xlsx, or read.table

extension

Deprecated, use filetype instead

names_to_col

Deprecated, use run_names_header instead

...

Other arguments passed to read.csv, read_xls, read_xlsx, or read.table

Details

startrow, endrow, startcol, endcol, sheet, and filetype can each be either a single value that applies to all files, or a vector or list the same length as files.

Note that the row given by startrow is always assumed to be a header row.
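
Examples

The following is a minimal usage sketch rather than an official example. It assumes the gcplyr package (which provides read_tidys) is installed, and only calls the function if so; the file contents are invented, and the arguments used are those listed under Usage.

```r
# Create two small tidy-shaped csv files in a temporary directory
dir <- tempfile("tidy_example_")
dir.create(dir)
files <- file.path(dir, c("run1.csv", "run2.csv"))
write.csv(data.frame(Time = c(0, 900), Well = "A1", OD = c(0.01, 0.02)),
          files[1], row.names = FALSE)
write.csv(data.frame(Time = c(0, 900), Well = "A1", OD = c(0.03, 0.06)),
          files[2], row.names = FALSE)

# Assumption: gcplyr is installed; skip the call otherwise
if (requireNamespace("gcplyr", quietly = TRUE)) {
  tidy_list <- gcplyr::read_tidys(
    files = files,
    run_names_header = "run",   # add run names in a column called "run"
    run_names_path = FALSE      # drop the directory from inferred run names
  )
  str(tidy_list)
}
```

Since the header is in the first row of each file and the data fill the whole file, startrow, endrow, startcol, and endcol are left at their defaults.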