LaF (version 0.5)

detect_dm_csv: Automatically detect data models for CSV-files

Description

Automatically detect data models for CSV-files. The resulting data model can be passed to laf_open to open the file.

Usage

detect_dm_csv(filename, sep=",", dec=".", header=FALSE, nrows=1000, 
    nlines=NULL, sample=FALSE, factor_fraction=0.4, ...)

Arguments

filename
character string containing the filename of the CSV file.
sep
character vector containing the separator used in the file.
dec
the character used for decimal points.
header
logical indicating whether the first line in the file contains the column names.
nrows
the number of lines that are read to detect the column types. The more lines are read, the more likely it is that the correct types are detected.
nlines
(only needed when the sample option is used) the expected number of lines in the file. If not specified, the number of lines in the file is calculated first.
sample
by default the first nrows lines are read in for determining the column types. When sample is used random lines from the file are used. This is more robust, but takes longer.
factor_fraction
the fraction of unique strings in a column below which the column is converted to a factor/categorical. For more information see Details.
...
additional arguments are passed on to read.table. However, be careful when using these, as some of these arguments are not supported by laf_open_csv.
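
As an illustration of the sample and nlines arguments described above, the following is a minimal sketch that detects the column types from randomly sampled lines. The file, its contents, and the chosen values of nrows and nlines are assumptions made for this example only; the file is written without a header so that nlines equals the number of data rows.

```r
library(LaF)

# Write a headerless CSV file with 500 rows to a temporary location
# (the file and its contents are illustrative assumptions)
csv_file <- tempfile(fileext = ".csv")
testdata <- data.frame(
    a = 1:500,
    b = runif(500)
)
write.table(testdata, file = csv_file, row.names = FALSE, col.names = FALSE,
    sep = ",")

# Detect the column types from 100 randomly sampled lines; nlines is passed
# so that the number of lines does not have to be calculated first
model <- detect_dm_csv(csv_file, sep = ",", header = FALSE, nrows = 100,
    nlines = 500, sample = TRUE)

# Inspect the detected data model
str(model)
```

Sampling is mainly of interest for large files in which the first nrows lines are not representative of the rest of the file.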

Value

  • detect_dm_csv returns a data model which can be used by laf_open. The data model can be written to file using write_dm.

Details

The argument factor_fraction determines the fraction of unique strings below which a column is converted to factor/categorical. If all columns need to be converted to character, a value larger than one can be used. A value smaller than zero will ensure that all columns are converted to categorical. Note that LaF stores the levels of a categorical in memory. Therefore, categorical columns with a very large number of (almost) unique levels can cause memory problems.
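
To see how factor_fraction affects detection for a given file, one option is to run the detection with two extreme values and compare the resulting data models. The sketch below does this for a small test file; the file contents and the values 2 and -1 are assumptions chosen for illustration, and the code only prints the models rather than assuming a particular outcome.

```r
library(LaF)

# Small CSV file with one string column (contents are illustrative)
csv_file <- tempfile(fileext = ".csv")
testdata <- data.frame(
    id = 1:20,
    name = rep(c("jan", "pier"), 10)
)
write.table(testdata, file = csv_file, row.names = FALSE, col.names = TRUE,
    sep = ",")

# Detection with a factor_fraction larger than one
model_high <- detect_dm_csv(csv_file, header = TRUE, factor_fraction = 2)

# Detection with a factor_fraction smaller than zero
model_low <- detect_dm_csv(csv_file, header = TRUE, factor_fraction = -1)

# Compare the column types detected in both cases
str(model_high)
str(model_low)
```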

See Also

See write_dm to write the data model to file. The data models can be used to open a file using laf_open.
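
A minimal sketch of that workflow: detect a data model, store it with write_dm, read it back with read_dm, and open the file with laf_open. The test file and the name of the model file are assumptions made for this example.

```r
library(LaF)

# Small CSV file with a header line (contents are illustrative)
csv_file <- tempfile(fileext = ".csv")
testdata <- data.frame(
    a = 1:5,
    b = c(0.5, 1.5, 2.5, 3.5, 4.5)
)
write.table(testdata, file = csv_file, row.names = FALSE, col.names = TRUE,
    sep = ",")

# Detect the data model and write it to a model file
model <- detect_dm_csv(csv_file, header = TRUE)
model_file <- tempfile(fileext = ".yaml")
write_dm(model, model_file)

# Read the data model back and use it to open the CSV file
model2 <- read_dm(model_file)
laf <- laf_open(model2)
```

Storing the data model avoids having to repeat the detection each time the same file is opened.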

Examples

# Load the LaF package
library(LaF)

# Generate test data
ntest <- 10
column_types <- c("integer", "integer", "double", "string")
testdata <- data.frame(
    a = 1:ntest,
    b = sample(1:2, ntest, replace=TRUE),
    c = round(runif(ntest), 13),
    d = sample(c("jan", "pier", "tjores", "corneel"), ntest, replace=TRUE)
    )
# Write test data to csv file
write.table(testdata, file="tmp.csv", row.names=FALSE, col.names=TRUE, sep=',')

# Detect data model
model <- detect_dm_csv("tmp.csv", header=TRUE)

# Create LaF-object
laf <- laf_open(model)
