codeplan: Describe structure of Data Sets and Importers

Description

The function codeplan() creates a data frame that describes the structure of an item list (a data.set object or an importer object), so that this structure can be stored and and recovered. The resulting data frame has a particular print method that delimits the output to one line per variable.

With setCodeplan an item list structure (as returned by codeplan()) can be applied to a data frame or data set. It is also possible to use an assignment like codeplan(x) <- value to a similar effect.

Usage

codeplan(x)
# S4 method for item.list
codeplan(x)
# S4 method for item
codeplan(x)
setCodeplan(x,value)
# S4 method for data.frame,codeplan
setCodeplan(x,value)
# S4 method for data.frame,NULL
setCodeplan(x,value)
# S4 method for data.set,codeplan
setCodeplan(x,value)
# S4 method for data.set,NULL
setCodeplan(x,value)
# S4 method for item,codeplan
setCodeplan(x,value)
# S4 method for item,NULL
setCodeplan(x,value)
# S4 method for atomic,codeplan
setCodeplan(x,value)
# S4 method for atomic,NULL
setCodeplan(x,value)
codeplan(x) <- value
read_codeplan(filename,type)
write_codeplan(x,filename,type,pretty)

Value

If applicable, codeplan returns a list with additional S3 class attribute "codeplan". For arguments for which the relevant information does not exist, the function returns NULL.

The list has at least one element or several elements, named after the variable in the "item.list" or "data.set" x. Each list element is a list itself with the following elements:

annotation: a named character vector,
labels: a named list of labels and labelled values
value.filter: a list with at least two elements named "class" and "filter", and optionally another element named "range". The "class" element determines the class of the value filter and equals either "missing.values", "valid.values", or "valid.range". An element named "range" may only be needed if "class" is "missing.values", as it is possible (like in SPSS) to have both individual missing values and a range of missing values.
mode: a character string that describes storage mode, such as "character", "integer", or "numeric".
measurement: a character string with the measurement level, "nominal", "ordinal", "interval", or "ratio".

If codeplan(x)<-value or setCodeplan(x,value) is used and value is NULL, all the special information about annotation, labels, value filters, etc. is removed from the resulting object, which then is usually a mere atomic vector or data frame.

Arguments

x: for codeplan(x) an object that inherits from class "item.list", i.e. can be a "data.set" object or an "importer" object, it can also be an object that inherits from class "item". For write_codeplan an object from class "codeplan".
value: an object as it would be returned by codeplan(x) or NULL.
filename: a character string, the name of the file that is to be read or to be written.
type: a character string (either "yaml" or "json") oder NULL (the default), gives the type of the file into which the codeplan is written or from which it is read. If type is NULL then the file type is inferred from the file name ending (".yaml" or ",yml" for "yaml", ".json" for "json").
pretty: a logical value, whether the JSON output created by write_codeplan(...) should be prettified.

Examples

Run this code

Data1 <- data.set(
          vote = sample(c(1,2,3,8,9,97,99),size=300,replace=TRUE),
          region = sample(c(rep(1,3),rep(2,2),3,99),size=300,replace=TRUE),
          income = exp(rnorm(300,sd=.7))*2000
          )

Data1 <- within(Data1,{
  description(vote) <- "Vote intention"
  description(region) <- "Region of residence"
  description(income) <- "Household income"
  foreach(x=c(vote,region),{
    measurement(x) <- "nominal"
    })
  measurement(income) <- "ratio"
  labels(vote) <- c(
                    Conservatives         =  1,
                    Labour                =  2,
                    "Liberal Democrats"   =  3,
                    "Don't know"          =  8,
                    "Answer refused"      =  9,
                    "Not applicable"      = 97,
                    "Not asked in survey" = 99)
  labels(region) <- c(
                    England               =  1,
                    Scotland              =  2,
                    Wales                 =  3,
                    "Not applicable"      = 97,
                    "Not asked in survey" = 99)
  foreach(x=c(vote,region,income),{
    annotation(x)["Remark"] <- "This is not a real survey item, of course ..."
    })
  missing.values(vote) <- c(8,9,97,99)
  missing.values(region) <- c(97,99)
})
cpData1 <- codeplan(Data1)

Data2 <- data.frame(
          vote = sample(c(1,2,3,8,9,97,99),size=300,replace=TRUE),
          region = sample(c(rep(1,3),rep(2,2),3,99),size=300,replace=TRUE),
          income = exp(rnorm(300,sd=.7))*2000
          )
codeplan(Data2) <- cpData1
codeplan(Data2)
codebook(Data2)

# Note the difference between 'as.data.frame' and setting
# the codeplan to NULL:
Data2df <- as.data.frame(Data2)
codeplan(Data2) <- NULL
str(Data2)
str(Data2df)
codeplan(Data2) <- NULL # Does not change anything

# Codeplans of survey items can also be inquired and manipulated:
vote <- Data1$vote
str(vote)
cp.vote <- codeplan(vote)
codeplan(vote) <- NULL
str(vote)
codeplan(vote) <- cp.vote
vote

fn.json <- paste0(tempfile(),".json")
write_codeplan(codeplan(Data1),filename=fn.json)
codeplan(Data2) <- read_codeplan(fn.json)
codeplan(Data2)

Run the code above in your browser using DataLab