reader: Flexibly load a text or binary file, accepting multiple file formats.

Description

Uses the file extension to distinguish between binary, csv, and other text formats, then tries to automatically determine the other parameters needed to read the file. It will attempt to detect the delimiter, whether there is a header (column names), and whether the first column should be used as row names or left as a data column. Internal calls to the standard file-reading functions use 'stringsAsFactors=FALSE'.
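The first two detection steps described above (classify by extension, then guess the delimiter) can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the reader package's implementation: the helper names classify_file() and guess_delim(), the extension lists, and the candidate delimiters are all hypothetical.

```r
# Hypothetical sketch, NOT the reader package's implementation.
# Step 1: classify a file by extension. `treatas` overrides the real extension
# and `more.types` lists extra extensions to read as text (assumed semantics).
classify_file <- function(fn, treatas = NULL, more.types = NULL) {
  ext <- if (is.null(treatas)) tolower(tools::file_ext(fn)) else tolower(treatas)
  if (ext %in% c("rda", "rdata")) return("binary")
  if (ext == "csv") return("csv")
  if (ext %in% c("txt", "tsv", "dat", tolower(more.types))) return("text")
  "unknown"
}

# Step 2: guess the delimiter as the candidate that splits every sampled line
# into the same number (> 1) of fields, preferring the one giving most fields.
guess_delim <- function(lines, candidates = c("\t", ",", ";", "|")) {
  best <- NA_character_
  best_fields <- 1L
  for (d in candidates) {
    # number of fields on each line when split on this candidate
    n_fields <- vapply(strsplit(lines, d, fixed = TRUE), length, integer(1))
    if (length(unique(n_fields)) == 1L && n_fields[1] > best_fields) {
      best <- d
      best_fields <- n_fields[1]
    }
  }
  best  # NA when no candidate splits the lines consistently
}
```

A naive field count like this ignores quoting, so a quoted field that contains the delimiter would throw it off; a more careful version would strip or parse quotes before counting.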

Usage

reader(fn, dir = "", want.type = NULL, def = "\t", force.read = TRUE, header = NA, h.test.p = 0.05, quiet = TRUE, treatas = NULL, override = FALSE, more.types = NULL, auto.vec = TRUE, one.byte = TRUE, ...)

Arguments

fn

the file name, either including its path or, if dir is specified, without it

dir

optional directory if separate path/filename is preferred

want.type

if loading a binary file that contains multiple objects, the is() type of the object you want to load

def

the default delimiter to try first

force.read

attempt to read the file even if the file type looks unsupported

header

whether the file has a header; this is autodetected by default (NA), but can be set to TRUE or FALSE if you don't trust the autodetection

h.test.p

p-value threshold used to discriminate between the number of characters in a column name and in a column value (the sensitivity parameter for automatic header detection)

quiet

run without messages and warnings

treatas

a standard file extension, e.g., 'txt', to treat the file as

override

assume the first column contains row names, regardless of the heuristic

more.types

optionally add more file types which are read as text

auto.vec

if the file seems to have only a single column, automatically return the result as a vector rather than a data.frame with 1 column

one.byte

logical parameter, passed to 'get.delim': whether to look only for 1-byte delimiters, or to also search for 'whitespace', which is a multibyte (wildcard) delimiter type. Use one.byte = FALSE to read fixed-width files, e.g., many PLINK files.

...

further arguments passed to the function that 'reader' uses to parse the file; depending on the file type, this may be read.table(), read.delim(), or read.csv().
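The header and row-name heuristics controlled by h.test.p and override could look roughly like the sketch below. Both functions, looks_like_header() and first_col_is_rownames(), are hypothetical illustrations of the idea rather than the package's code; in particular, the one-sample t-test on character counts and the majority vote across columns are assumptions about how a p-value threshold such as h.test.p might be applied.

```r
# Hypothetical heuristics, NOT reader's actual code. `tab` is assumed to be a
# character matrix holding the raw fields of the file (header not yet removed).

# Header test: for each column, ask whether the character count of the first
# row is unusual relative to the remaining rows (one-sample t-test at level p,
# playing the role of h.test.p); call it a header if most columns say yes.
looks_like_header <- function(tab, p = 0.05) {
  flags <- vapply(seq_len(ncol(tab)), function(j) {
    first <- nchar(tab[1, j])
    rest <- nchar(tab[-1, j])
    if (length(rest) < 2L || sd(rest) == 0) {
      # too few rows or no spread to test against: any difference counts
      return(isTRUE(first != rest[1]))
    }
    t.test(rest, mu = first)$p.value < p
  }, logical(1))
  mean(flags) > 0.5
}

# Row-name test: treat the first column as row names when its values are
# unique and not numeric (the outcome that override = TRUE would force).
first_col_is_rownames <- function(df) {
  x <- df[[1]]
  !is.numeric(x) && anyDuplicated(x) == 0L
}
```

A length-based test will miss headers whose names happen to be as long as the values beneath them, which is why header can be set explicitly. Separately, base R's read.table() already uses the first column as row names when the header line has one fewer field than the data rows, which is the layout that write.table(..., row.names = TRUE) produces.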

Value

returns the most appropriate object for the file type: usually a data.frame, except for binary files (or a vector for single-column files when auto.vec = TRUE)
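For binary files, the returned object depends on which saved object is picked (want.type), and for single-column text files on auto.vec. A minimal base R sketch of both ideas follows; load_by_type() and simplify_single_column() are hypothetical names, and returning the first object that matches is an assumption rather than documented behaviour.

```r
# Hypothetical helpers, NOT reader's code.
# Load an .rda file into a private environment and return the first object
# for which methods::is(obj, want.type) is TRUE (cf. the want.type argument).
load_by_type <- function(file, want.type) {
  env <- new.env()
  obj_names <- load(file, envir = env)  # load() returns the restored names
  for (nm in obj_names) {
    obj <- get(nm, envir = env)
    if (methods::is(obj, want.type)) return(obj)
  }
  stop("no object of type '", want.type, "' found in ", file)
}

# Collapse a one-column data frame to a plain vector (cf. auto.vec).
simplify_single_column <- function(x, auto.vec = TRUE) {
  if (auto.vec && is.data.frame(x) && ncol(x) == 1L) x[[1]] else x
}
```

Loading into a fresh environment keeps the objects in the file from overwriting variables in the caller's workspace.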

Examples


library(reader)  # provides reader()
orig.dir <- getwd(); setwd(tempdir()); # move to temporary dir
# create some datasets
df <- data.frame(ID=paste("ID",101:110,sep=""),
  scores=sample(70,10,TRUE)+30,age=sample(7,10,TRUE)+11)
DNA <- apply(matrix(c("A","C","G","T")[sample(4,100,TRUE)],nrow=10),
                                                1,paste,collapse="")
fix.wid <- c("    MyVal    Results        Check",
  "    0.234      42344          yes",
  "    0.334        351          yes","    0.224         46           no",
  "    0.214     445391          yes")
# save data to various file formats
test.files <- c("temp.txt","temp2.txt","temp3.csv",
                              "temp4.rda","temp5.fasta","temp6.txt")
write.table(df,file=test.files[1],col.names=FALSE,row.names=FALSE,sep="|",quote=TRUE)
write.table(df,file=test.files[2],col.names=TRUE,row.names=TRUE,sep="\t",quote=FALSE)
write.csv(df,file=test.files[3])
save(df,file=test.files[4])
writeLines(DNA,con=test.files[5])
writeLines(fix.wid,con=test.files[6])
# use the same reader() function call to read in each file
for(cc in 1:length(test.files)) {
  cat(test.files[cc],"\n")
  myobj <- reader(test.files[cc])  # add 'quiet=FALSE' to see some working
  print(myobj); cat("\n\n")
}
# inspect files before deleting if desired
unlink(test.files) 
# myobj <- reader(file.choose()); myobj # run this to attempt opening a file
setwd(orig.dir) # reset working directory to original
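The fixed-width file temp6.txt above is the kind of file that one.byte = FALSE is meant for. The base R snippet below is an illustration, not reader's internal code (get.delim may work differently): read.table() with sep = "" treats any run of white space as a single delimiter, whereas splitting on a single one-byte space leaves many empty fields.

```r
# Base R illustration (not reader's code) of why fixed-width files need a
# 'whitespace' delimiter rather than a single space character.
fix.wid <- c("    MyVal    Results        Check",
             "    0.234      42344          yes",
             "    0.334        351          yes")

# sep = "" means "any run of white space"; leading blanks are skipped too
ws <- read.table(text = fix.wid, header = TRUE, sep = "",
                 stringsAsFactors = FALSE)
str(ws)

# splitting on a single one-byte space instead produces many empty fields
one_space <- strsplit(fix.wid[2], " ", fixed = TRUE)[[1]]
sum(one_space != "")   # only 3 of the fields are non-empty
```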
