
kutils (version 0.93)

keyTemplate: Create variable key template

Description

A variable key is a human-readable document that describes the variables in a data set. A key can be revised and re-imported into R to guide the re-importation and recoding of the data. It might also be called a "programmable codebook." This function inspects a data frame, notes its variable names, their classes, and their legal values, and then creates a table summarizing that information. The aim is to create a document that principal investigators and research assistants can use to keep a project well organized. Please see the vignette in this package.
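To make the idea concrete, here is a minimal base-R sketch of the kind of inspection described above. It is not the kutils implementation. The function name miniKeyTemplate, the output columns (name, class, values), the "|" separator, and the choice to leave the values field empty when a variable has more than max.levels distinct values are all assumptions made for illustration.

```r
## Hypothetical sketch, NOT the kutils implementation: summarize each
## column of a data frame as one row holding its name, class, and
## observed values.
miniKeyTemplate <- function(dframe, max.levels = 15) {
  rows <- lapply(names(dframe), function(nm) {
    x <- dframe[[nm]]
    vals <- if (is.factor(x)) {
      levels(x)                                  # factors: keep every level
    } else if (is.integer(x) || is.character(x) || inherits(x, "Date")) {
      u <- sort(unique(x[!is.na(x)]))
      ## Assumption: too many distinct values -> leave the field empty
      if (length(u) > max.levels) character(0) else as.character(u)
    } else {
      character(0)                               # e.g. doubles: not enumerated
    }
    data.frame(name = nm, class = class(x)[1],
               values = paste(vals, collapse = "|"),
               stringsAsFactors = FALSE)
  })
  do.call(rbind, rows)
}

toy <- data.frame(id = 1:3, grp = factor(c("a", "b", "a")),
                  score = c(2.5, 3.1, 4.0))
miniKeyTemplate(toy)
```

For an ordered factor, class(x)[1] is "ordered", so ordered and unordered factors stay distinguishable in the sketch's class column.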

Usage

keyTemplate(dframe, long = FALSE, sort = FALSE, file = NULL,
  outdir = getwd(), max.levels = 15, safeNumericToInteger = TRUE)

Arguments

dframe
A data frame
long
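Default FALSE, which produces the original key with one row per variable. If TRUE, a long-style key with one row per value per variable is created. See Details.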
sort
Default FALSE. Should the rows representing the variables be sorted alphabetically? Otherwise, they appear in the order in which they were included in the original dataset.
file
Default NULL, meaning no file is produced. Because the primary purpose of this function is to create the output file, users should supply a text string naming it. The function is designed for three storage formats: comma-separated values (CSV), MS Excel XLSX, and R's RDS. The format is chosen by the file name's suffix, which should be ".csv", ".xlsx", or ".rds". If XLSX output is requested, make sure the openxlsx package is installed.
outdir
The output directory for the new variable key files. Default is current working directory.
max.levels
Regulates the enumeration of values for integer, character, and Date variables: when the existing values of such a variable are enumerated, what is the maximum number of levels that should be included in the key? Default = 15. This does not affect variables declared as factor or ordered, for which all levels are always included.
safeNumericToInteger
Default TRUE: Should we treat values which appear to be integers as integers? If a column is numeric, it might be safe to treat it as an integer. In many csv data sets, the values coded c(1, 2, 3) are really integers, not floats c(1.0, 2.0, 3.0). See safeInteger.
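The check behind safeNumericToInteger can be illustrated with a short base-R sketch. This is not kutils::safeInteger; the helper name toIntegerIfSafe and the tolerance sqrt(.Machine$double.eps) are assumptions. A double column is converted only if every non-missing value is finite, has no fractional part (within the tolerance), and fits in R's integer range.

```r
## Hypothetical helper, NOT kutils::safeInteger: convert a double vector
## to integer only when every non-missing value is a whole number.
toIntegerIfSafe <- function(x, tol = sqrt(.Machine$double.eps)) {
  if (!is.double(x)) return(x)                    # only doubles are candidates
  ok <- x[!is.na(x)]                              # ignore missing values
  whole <- all(is.finite(ok)) &&
           all(abs(ok - round(ok)) < tol) &&      # no fractional part
           all(abs(ok) <= .Machine$integer.max)   # fits in R's integer type
  if (whole) as.integer(round(x)) else x
}

toIntegerIfSafe(c(1, 2, 3))    # integer 1 2 3
toIntegerIfSafe(c(1.5, 2))     # left as double
```

Checking is.finite before the rounding test keeps Inf from producing NA inside the comparison.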

Value

A key in the form of a data frame. May also be saved on disk.
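The suffix convention for the file argument can be sketched as follows. The helper saveKeySketch is hypothetical and is not part of kutils; it only shows how ".csv", ".rds", and ".xlsx" could map to utils::write.csv, saveRDS, and openxlsx::write.xlsx, with outdir prepended to the file name.

```r
## Hypothetical helper (not part of kutils): choose the storage format
## from the file suffix and write the key into outdir.
saveKeySketch <- function(key, file, outdir = getwd()) {
  path <- file.path(outdir, file)
  ext <- tolower(tools::file_ext(file))
  switch(ext,
    csv  = utils::write.csv(key, path, row.names = FALSE),
    rds  = saveRDS(key, path),
    xlsx = {
      if (!requireNamespace("openxlsx", quietly = TRUE))
        stop("install the openxlsx package to write XLSX files")
      openxlsx::write.xlsx(key, path)
    },
    stop("unsupported file suffix: ", ext))
  invisible(path)                                 # return the path written
}

key <- data.frame(name = c("x1", "x2"), class = c("factor", "integer"),
                  stringsAsFactors = FALSE)
saveKeySketch(key, "demo.key.rds", outdir = tempdir())
```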

Details

The variable key can be created in two formats. The original style has one row per variable and uses a compact notation to record current values and required recodes. It is more compact and probably easier for experts to use, but it may be harder for non-programmers. The long style has one row per value per variable, so in a larger project the long key can be quite voluminous. However, in a larger project, the long key seems more likely to produce the intended result.
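The difference between the two layouts can be seen by expanding a one-row-per-variable table into one row per value. This is an illustration only: the column names, the "|" separator, and the helper wideToLongSketch are assumptions, not kutils' key format. The variables x3 and x6 mirror those in the Examples below.

```r
## Hypothetical illustration of the two layouts; the column names and the
## "|" separator are assumptions, not kutils' key format.
wideKey <- data.frame(name   = c("x3", "x6"),
                      class  = c("ordered", "integer"),
                      values = c("med|lo|hi", "1|2|3|4|5"),
                      stringsAsFactors = FALSE)

wideToLongSketch <- function(wide, sep = "|") {
  pieces <- lapply(seq_len(nrow(wide)), function(i) {
    vals <- strsplit(wide$values[i], sep, fixed = TRUE)[[1]]
    if (length(vals) == 0) vals <- NA_character_  # keep one row per variable
    data.frame(name = wide$name[i], class = wide$class[i],
               value = vals, stringsAsFactors = FALSE)
  })
  do.call(rbind, pieces)
}

longKey <- wideToLongSketch(wideKey)
nrow(longKey)  # 8: three rows for x3, five for x6
```

Two rows become eight, which is why a long key grows quickly when variables have many values.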

After a key is created, it should be re-imported into R with the kutils::keyImport function. Then the key structure can guide the importation and recoding of the data set.
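The idea of a key guiding recoding can be sketched in base R. This is not how keyImport or any other kutils function works internally. The helper applyKeySketch and the key columns name, value_old, and value_new are hypothetical. Here the code 999, like the value planted in x4 in the Examples below, is mapped to NA, and values absent from the key are left unchanged.

```r
## Hypothetical sketch of key-guided recoding; the column names
## (name, value_old, value_new) are assumptions for illustration.
key <- data.frame(name      = c("x4", "x4", "x4"),
                  value_old = c("1", "2", "999"),
                  value_new = c("1", "2", NA),
                  stringsAsFactors = FALSE)

applyKeySketch <- function(dframe, key) {
  for (nm in unique(key$name)) {
    k <- key[key$name == nm, ]
    old <- as.character(dframe[[nm]])
    idx <- match(old, k$value_old)                 # key row for each value
    new <- ifelse(is.na(idx), old, k$value_new[idx])  # unmatched: keep
    dframe[[nm]] <- type.convert(new, as.is = TRUE)   # restore a numeric type
  }
  dframe
}

applyKeySketch(data.frame(x4 = c(1, 2, 999, 3)), key)
```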

Examples

library(kutils)
set.seed(234234)
N <- 200
mydf <- data.frame(x5 = rnorm(N),
                   x4 = rpois(N, lambda = 3),
                   x3 = ordered(sample(c("lo", "med", "hi"),
                                       size = N, replace = TRUE),
                                levels = c("med", "lo", "hi")),
                   x2 = letters[sample(c(1:4, 6), N, replace = TRUE)],
                   x1 = factor(sample(c("cindy", "bobby", "marcia",
                                        "greg", "peter"), N,
                                      replace = TRUE)),
                   x7 = ordered(letters[sample(c(1:4, 6), N, replace = TRUE)]),
                   x6 = sample(c(1:5), N, replace = TRUE),
                   stringsAsFactors = FALSE)
## Insert out-of-range values into x4 and x5
mydf$x4[sample(1:N, 10)] <- 999
mydf$x5[sample(1:N, 10)] <- -999

## Should be same as content of
## write.csv(mydf, file = "../inst/extdata/mydf.csv", row.names = FALSE)
mydf.templ <- keyTemplate(mydf, file = "mydf.templ.csv")
mydf.templ_long <- keyTemplate(mydf, long = TRUE, file = "mydf.templ_long.csv")
## write.csv(mydf.templ, file = "../inst/extdata/mydf.templ.csv", row.names = FALSE)
## write.csv(mydf.templ_long, file = "../inst/extdata/mydf.templ_long.csv", row.names = FALSE)
## smartSave(mydf.templ, file = "mydf.templ.xlsx", outdir = ".")
## smartSave(mydf.templ_long, file = "mydf.templ_long.xlsx", outdir = ".")

## Try with the national longitudinal study data
data(natlongsurv)
natlong.templ <- keyTemplate(natlongsurv, file = "natlongsurv.templ.csv",
                           max.levels = 15, sort = TRUE)

natlong.templlong <- keyTemplate(natlongsurv, long = TRUE,
                                 file = "natlongsurv.templ_long.csv", sort = TRUE)

if (requireNamespace("openxlsx", quietly = TRUE)) {
   openxlsx::write.xlsx(natlong.templ, file = "natlongsurv.templ.xlsx")
   openxlsx::write.xlsx(natlong.templlong, file = "natlongsurv.templ_long.xlsx")
}
