dcf
Read and Write Data in DCF Format
Reads or writes an R object from/to a file in Debian Control File format.
Usage
read.dcf(file, fields = NULL, all = FALSE, keep.white = NULL)

write.dcf(x, file = "", append = FALSE, useBytes = FALSE,
          indent = 0.1 * getOption("width"),
          width = 0.9 * getOption("width"),
          keep.white = NULL)
Arguments
- file
either a character string naming a file or a connection. "" indicates output to the console. For read.dcf this can name a compressed file (see gzfile).
- fields
Fields to read from the DCF file. Default is to read all fields.
- all
a logical indicating whether, in case of multiple occurrences of a field in a record, all of them should be gathered. If all is false (default), only the last such occurrence is used.
- keep.white
a character vector with the names of the fields for which whitespace should be kept as is, or NULL (default) indicating that there are no such fields. Coerced to character if possible. For fields where whitespace is not to be kept as is, read.dcf removes leading and trailing whitespace, and write.dcf folds using strwrap.
- x
the object to be written, typically a data frame. If it is not one, an attempt is made to coerce x to a data frame.
- append
logical. If TRUE, the output is appended to the file. If FALSE, any existing file of that name is destroyed.
- useBytes
logical to be passed to writeLines(), see there: “for expert use”.
- indent
a positive integer specifying the indentation for continuation lines in output entries.
- width
a positive integer giving the target column for wrapping lines in the output.
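As an illustration of indent and width, the following sketch writes a record with one long field to a temporary file and reads it back. The field names and the values 40 and 4 are chosen only for this example.

```r
# Hypothetical record with one long field value (names are illustrative only).
long_text <- paste(rep("word", 40), collapse = " ")
x <- data.frame(Package = "demo", Description = long_text,
                stringsAsFactors = FALSE)

out <- tempfile()
write.dcf(x, file = out, width = 40, indent = 4)

# The long value is folded; continuation lines start with whitespace.
lines <- readLines(out)
print(lines)

# Reading the file back and collapsing whitespace recovers the original text.
back <- read.dcf(out)
restored <- gsub("[[:space:]]+", " ", back[1, "Description"])
unlink(out)
```

Fields named in keep.white would be written without this folding.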
Details
DCF is a simple format for storing databases in plain text files that can easily be directly read and written by humans. DCF is used in various places to store R system information, like descriptions and contents of packages.
The DCF rules as implemented in R are:
A database consists of one or more records, each with one or more named fields. Not every record must contain each field. Fields may appear more than once in a record.
Regular lines start with a non-whitespace character.
Regular lines are of the form tag:value, i.e., have a name tag and a value for the field, separated by : (only the first : counts). The value can be empty (i.e., whitespace only).
Lines starting with whitespace are continuation lines (to the preceding field) if at least one character in the line is non-whitespace. Continuation lines where the only non-whitespace character is a . are taken as blank lines (allowing for multi-paragraph field values).
Records are separated by one or more empty (i.e., whitespace only) lines.
Individual lines may be arbitrarily long; prior to R 3.0.2 the length limit was approximately 8191 bytes per line.
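The rules above can be tried out on a small in-memory database. This is a minimal sketch: the records, field names and URL are made up for illustration, and textConnection is used only to avoid creating a file.

```r
# Two hypothetical records separated by an empty line.
dcf_text <- c(
  "Package: alpha",
  "Title: First line",
  "  second line",        # continuation line of the Title field
  "  .",                  # a lone '.' stands for a blank line in the value
  "  new paragraph",
  "",                     # empty line: end of record
  "Package: beta",
  "URL: https://example.org/path"  # only the first ':' separates tag and value
)

con <- textConnection(dcf_text)
db <- read.dcf(con)
close(con)

# One row per record, one column per field seen in any record.
print(dim(db))
print(db[2, "URL"])
# Record 'beta' has no Title field, so that cell is NA.
print(is.na(db[2, "Title"]))
```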
Note that read.dcf(all = FALSE) reads the file byte-by-byte. This allows a DESCRIPTION file to be read and only its ASCII fields used, or its Encoding field used to re-encode the remaining fields.
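One way the Encoding field might be used is sketched below. The file content is hypothetical (a latin1-encoded author name written as raw bytes), and the use of iconv for the re-encoding step is an assumption of this example, not something read.dcf does itself.

```r
# Write a small DESCRIPTION-like file whose Author field contains the
# latin1 byte 0xE9 (e with acute accent). Content is made up for illustration.
tmp <- tempfile()
writeBin(c(charToRaw("Package: demo\nEncoding: latin1\nAuthor: Jos"),
           as.raw(0xe9),
           charToRaw("\n")),
         tmp)

# read.dcf returns the bytes as found in the file.
desc <- read.dcf(tmp)
enc <- unname(desc[1, "Encoding"])

# Re-encode a non-ASCII field using the declared encoding.
author <- iconv(desc[1, "Author"], from = enc, to = "UTF-8")
unlink(tmp)
```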
write.dcf does not write NA fields.
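A short sketch of this behaviour, using a made-up two-record data frame in which the second record has no License value:

```r
# Hypothetical data: the License of 'beta' is missing.
x <- data.frame(Package = c("alpha", "beta"),
                License = c("MIT", NA),
                stringsAsFactors = FALSE)

out <- tempfile()
write.dcf(x, file = out)

# Only one License line is written; the NA entry is skipped.
lines <- readLines(out)
print(lines)

# Reading the file back gives NA again for the missing field.
back <- read.dcf(out)
unlink(out)
```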
Value
The default read.dcf(all = FALSE) returns a character matrix with one row per record and one column per field. Leading and trailing whitespace of field values is ignored unless a field is listed in keep.white. If a tag name is specified in the file, but the corresponding value is empty, then an empty string is returned. If the tag name of a field is specified in fields but never used in a record, then the corresponding value is NA.
If fields are repeated within a record, the last one encountered is returned. Malformed lines lead to an error.
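The cases described above (empty value, requested but absent field, repeated field) can be seen in one small sketch. The record and its field values are invented for the example.

```r
# A single hypothetical record with an empty Title and a repeated Depends field.
con <- textConnection(c(
  "Package: demo",
  "Title:",
  "Depends: R",
  "Depends: methods"
))
m <- read.dcf(con, fields = c("Package", "Title", "Depends", "License"))
close(con)

print(m)
# Title is present but empty, so it is returned as an empty string.
# License was requested but never appears, so it is NA.
# Depends occurs twice; with all = FALSE the last occurrence is kept.
```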
For read.dcf(all = TRUE) a data frame is returned, again with one row per record and one column per field. The columns are lists of character vectors for fields with multiple occurrences, and character vectors otherwise.
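A minimal sketch of the all = TRUE result, using two invented records in which Depends occurs twice in the first record:

```r
con <- textConnection(c(
  "Package: demo",
  "Depends: R",
  "Depends: methods",
  "",
  "Package: other"
))
df <- read.dcf(con, all = TRUE)
close(con)

# Package occurs at most once per record: a plain character column.
print(df$Package)
# Depends occurs more than once in a record: a list column.
print(df$Depends[[1]])
```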
Note that an empty file is a valid DCF file, and read.dcf will return a zero-row matrix or data frame.
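For instance, with the default all = FALSE and a freshly created empty temporary file (a sketch only):

```r
# Create an empty file and read it as DCF.
empty <- tempfile()
file.create(empty)

m <- read.dcf(empty)
print(dim(m))   # zero rows, as there are no records

unlink(empty)
```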
For write.dcf, invisible NULL.
Note
As of R 3.4.0, ‘whitespace’ in all cases includes newlines.
References
https://www.debian.org/doc/debian-policy/index.html#document-ch-controlfields.
Note that R does not require encoding in UTF-8, which is a recent Debian requirement. Nor does it use the Debian-specific sub-format which allows comment lines starting with #.
See Also
available.packages, which uses read.dcf to read the indices of package repositories.
Examples
## Create a reduced version of the DESCRIPTION file in package 'splines'
x <- read.dcf(file = system.file("DESCRIPTION", package = "splines"),
              fields = c("Package", "Version", "Title"))
write.dcf(x)

## An online DCF file with multiple records (requires network access)
con <- url("http://cran.r-project.org/src/contrib/PACKAGES")
y <- read.dcf(con, all = TRUE)
close(con)
utils::str(y)