write.big.matrix: File interface for a "big.matrix"

Description

Create a big.matrix by reading from a suitably-formatted ASCII file, or write the contents of a big.matrix to a file.

Usage

write.big.matrix(x, filename, row.names = FALSE, col.names = FALSE, sep = ",")
# S4 method for big.matrix,character
write.big.matrix(x, filename, row.names = FALSE, col.names = FALSE, sep = ",")
read.big.matrix(
  filename,
  sep = ",",
  header = FALSE,
  col.names = NULL,
  row.names = NULL,
  has.row.names = FALSE,
  ignore.row.names = FALSE,
  type = NA,
  skip = 0,
  separated = FALSE,
  backingfile = NULL,
  backingpath = NULL,
  descriptorfile = NULL,
  binarydescriptor = FALSE,
  extraCols = NULL,
  shared = options()$bigmemory.default.shared
)
# S4 method for character
read.big.matrix(
  filename,
  sep = ",",
  header = FALSE,
  col.names = NULL,
  row.names = NULL,
  has.row.names = FALSE,
  ignore.row.names = FALSE,
  type = NA,
  skip = 0,
  separated = FALSE,
  backingfile = NULL,
  backingpath = NULL,
  descriptorfile = NULL,
  binarydescriptor = FALSE,
  extraCols = NULL,
  shared = options()$bigmemory.default.shared
)

Value

A big.matrix object is returned by read.big.matrix, while write.big.matrix creates an output file (filename may include a path).

Arguments

x: a big.matrix.
filename: the name of an input/output file.
row.names: a vector of names; these are used even if row names appear to exist in the file.
col.names: a vector of names; these are used even if column names exist in the file.
sep: a field delimiter.
header: if TRUE, the first line (after a possible skip) should contain column names.
has.row.names: if TRUE, then the first column contains row names.
ignore.row.names: if TRUE while has.row.names is also TRUE, the row names in the file are ignored.
type: preferably specified, "integer" for example.
skip: number of lines to skip at the head of the file.
separated: use separated column organization of the data instead of column-major organization.
backingfile: the root name for the file(s) for the cache of x.
backingpath: the path to the directory containing the file backing cache.
descriptorfile: the file to be used for the description of the filebacked matrix.
binarydescriptor: the flag to specify whether the binary RDS format should be used for the backingfile description, for subsequent use with attach.big.matrix; if NULL or FALSE, the dput() file format is used.
extraCols: the optional number of extra columns to be appended to the matrix for future use.
shared: if TRUE, the resulting big.matrix can be shared across processes.
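
As a minimal sketch of how the file-backing arguments fit together, the following reads a small file into a file-backed big.matrix and then re-attaches it through its descriptor file. The file names demo.txt, demo.bin, and demo.desc are placeholders chosen for illustration, and the sketch assumes the bigmemory package is installed.

```r
library(bigmemory)

# Hypothetical file names, all placed in the session's temporary directory.
tmp <- tempdir()
src <- file.path(tmp, "demo.txt")

# Write a small integer matrix to a comma-separated file.
write.big.matrix(as.big.matrix(matrix(1:10, 5, 2)), src)

# Read it into a file-backed big.matrix: the data live in demo.bin and
# the description needed to re-attach them is saved in demo.desc.
fb <- read.big.matrix(src, type = "integer",
                      backingfile = "demo.bin",
                      backingpath = tmp,
                      descriptorfile = "demo.desc")
is.filebacked(fb)

# Attach a second handle to the same backing file via the descriptor.
fb2 <- attach.big.matrix(file.path(tmp, "demo.desc"))
fb2[, ]
```

Passing the full path of the descriptor file to attach.big.matrix keeps the sketch independent of the current working directory.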

Author

John W. Emerson and Michael J. Kane <bigmemoryauthors@gmail.com>

Details

Files must contain only one atomic type (all integer, for example). You, the user, should know whether your file has row and/or column names, and various combinations of options should be helpful in obtaining the desired behavior.
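
One such combination, sketched here under the assumption that the file has a header line but no row-name column, pairs col.names = TRUE on writing with header = TRUE on reading. The file name named.txt is a placeholder.

```r
library(bigmemory)

f <- file.path(tempdir(), "named.txt")   # hypothetical file name

# A 3 x 2 integer matrix with column names but no row names.
m <- matrix(1:6, 3, 2, dimnames = list(NULL, c("a", "b")))
write.big.matrix(as.big.matrix(m), f, col.names = TRUE)

# header = TRUE tells the reader that the first line holds column names.
named <- read.big.matrix(f, header = TRUE, type = "integer")
colnames(named)
named[, ]
```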

When reading from a file, if type is not specified we try to make a reasonable guess for you without making any guarantees at this point. Unless you have really large integer values, we recommend you consider "short". If you have something that is essentially categorical, you might even be able to use "char", with huge memory savings for large data sets.
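
The effect of the type argument can be sketched as follows, assuming a file of small integers. The file name small_ints.txt is a placeholder. The same file is read once as "short" (2 bytes per element) and once as "char" (1 byte per element), and typeof() reports the storage type of each result.

```r
library(bigmemory)

f <- file.path(tempdir(), "small_ints.txt")   # hypothetical file name
write.big.matrix(as.big.matrix(matrix(1:10, 5, 2)), f)

# Read the same data with two compact storage types.
s  <- read.big.matrix(f, type = "short")
ch <- read.big.matrix(f, type = "char")

typeof(s)
typeof(ch)

# Both hold the same values, because 1..10 fits in either type.
s[, ]
ch[, ]
```

Values outside the range of the chosen type cannot be represented, so it is worth checking the range of the data before choosing "short" or "char".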

Any non-numeric entry will be ignored and replaced with NA, so reading something that traditionally would be a data.frame won't cause an error. A warning is issued.

Wishlist: we'd like to provide an option to ignore specified columns while doing reads. Or perhaps to specify columns targeted for factor or character conversion to numeric values. Would you use such features? Email us and let us know!

Examples

# Without specifying the type, this big.matrix x will hold integers.

x <- as.big.matrix(matrix(1:10, 5, 2))
x[2,2] <- NA
x[,]
temp_dir <- tempdir()
if (!dir.exists(temp_dir)) dir.create(temp_dir)
write.big.matrix(x, file.path(temp_dir, "foo.txt"))

# Just for fun, I'll read it back in as character (1-byte integers):
y <- read.big.matrix(file.path(temp_dir, "foo.txt"), type="char")
y[,]

# Other examples:
w <- as.big.matrix(matrix(1:10, 5, 2), type='double')
w[1,2] <- NA
w[2,2] <- -Inf
w[3,2] <- Inf
w[4,2] <- NaN
w[,]
write.big.matrix(w, file.path(temp_dir, "bar.txt"))
w <- read.big.matrix(file.path(temp_dir, "bar.txt"), type="double")
w[,]
w <- read.big.matrix(file.path(temp_dir, "bar.txt"), type="short")
w[,]

# Another example using row names (which we don't like).
x <- as.big.matrix(as.matrix(iris), type='double')
rownames(x) <- as.character(1:nrow(x))
head(x)
write.big.matrix(x, file.path(temp_dir, 'IrisData.txt'), col.names=TRUE, 
                 row.names=TRUE)
y <- read.big.matrix(file.path(temp_dir, "IrisData.txt"), header=TRUE, 
                     has.row.names=TRUE)
head(y)

# The following would fail with a dimension mismatch:
if (FALSE) y <- read.big.matrix(file.path(temp_dir, "IrisData.txt"), 
                                header=TRUE)
