ksformat (version 0.3.5)

fparse: Parse Format Definitions from 'SAS'-like Text

Description

Reads format definitions written in a human-friendly 'SAS'-like syntax and returns a list of ks_format and/or ks_invalue objects. All parsed formats are automatically stored in the global format library.

Usage

fparse(text = NULL, file = NULL)

Value

A named list of ks_format and/or ks_invalue objects. Names correspond to the format names defined in the text. All formats are automatically registered in the global format library.

Arguments

text

Character string or character vector containing format definitions. If a character vector is supplied, its elements are joined with newlines before parsing.

file

Path to a text file containing format definitions. Exactly one of text or file must be provided.
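
The file argument can be exercised with a temporary file. The following is a minimal sketch, not taken from the package documentation: the format name yesno and its labels are made up for illustration, and the call is guarded so the snippet does nothing when ksformat is not installed.

```r
# Write a small definition file; the format "yesno" is a made-up example.
fmt_path <- tempfile(fileext = ".txt")
writeLines(c(
  "VALUE yesno (character)",
  "  \"Y\" = \"Yes\"",
  "  \"N\" = \"No\"",
  "  .missing = \"Unknown\"",
  ";"
), fmt_path)

# Guarded call: skipped when ksformat is not available.
if (requireNamespace("ksformat", quietly = TRUE)) {
  formats <- ksformat::fparse(file = fmt_path)  # supply file only, not text
  print(names(formats))                         # names of the parsed formats
  ksformat::fclear()                            # empty the global format library
}
```

Because exactly one of text or file may be given, text is left at its default of NULL here.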

Details

The syntax supports two block types:

VALUE blocks define formats (value -> label):


VALUE name (type)
  "value1" = "Label 1"
  "value2" = "Label 2"
  [low, high) = "Range Label (half-open)"
  (low, high] = "Range Label (open-low, closed-high)"
  .missing = "Missing Label"
  .other = "Other Label"
;

INVALUE blocks define reverse formats (label -> numeric value):


INVALUE name
  "Label 1" = 1
  "Label 2" = 2
;

Syntax rules:

  • Blocks start with VALUE or INVALUE keyword and end with ;

  • The type in parentheses is optional; it defaults to "auto" for VALUE and "numeric" for INVALUE

  • Values can be quoted or unquoted

  • Ranges use interval notation with explicit bounds: a square bracket includes the bound, a parenthesis excludes it

  • Legacy range syntax low - high is also supported

  • Special range keywords: LOW (-Inf) and HIGH (Inf)

  • .missing and .other are special directives

  • Lines starting with /*, *, //, or # are comments
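
The interval notation in the rules above can be illustrated in base R. This is a sketch of the usual mathematical reading of the brackets, not the package's implementation; in_range is a hypothetical helper, and LOW and HIGH are represented by -Inf and Inf as the keyword rule states.

```r
# Hypothetical helper: TRUE where x lies between low and high.
# incl_low / incl_high say whether each bound belongs to the interval.
in_range <- function(x, low, high, incl_low = TRUE, incl_high = FALSE) {
  above <- if (incl_low) x >= low else x > low
  below <- if (incl_high) x <= high else x < high
  above & below
}

ages <- c(0, 17.9, 18, 65, 120)

child  <- in_range(ages, 0, 18)                     # [0, 18)
adult  <- in_range(ages, 18, 65)                    # [18, 65)
senior <- in_range(ages, 65, Inf, incl_high = TRUE) # [65, HIGH]

# Each age falls in exactly one of the three adjacent ranges.
print(data.frame(ages, child, adult, senior))
```

Half-open ranges such as [0, 18) followed by [18, 65) tile the number line without gaps or overlaps, which is why the boundary value 18 matches only the second range.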

Examples

# Parse multiple format definitions from text
fparse(text = '
VALUE sex (character)
  "M" = "Male"
  "F" = "Female"
  .missing = "Unknown"
;

VALUE age (numeric)
  [0, 18)    = "Child"
  [18, 65)   = "Adult"
  [65, HIGH]  = "Senior"
  .missing   = "Age Unknown"
;

// Invalue block
INVALUE race_inv
  "White" = 1
  "Black" = 2
  "Asian" = 3
;
')

fput(c("M", "F", NA), "sex")
fputn(c(5, 25, 70, NA), "age")
finputn(c("White", "Black"), "race_inv")
fprint()
fclear()

# Parse date/time/datetime format definitions
fparse(text = '
VALUE enrldt (date)
  pattern = "DATE9."
  .missing = "Not Enrolled"
;

VALUE visit_time (time)
  pattern = "TIME8."
;

VALUE stamp (datetime)
  pattern = "DATETIME20."
;
')

fput(as.Date("2025-03-01"), "enrldt")
fput(36000, "visit_time")
fput(as.POSIXct("2025-03-01 10:00:00", tz = "UTC"), "stamp")
fclear()

# Parse multilabel format
fparse(text = '
VALUE risk (numeric, multilabel)
  [0, 3]  = "Low Risk"
  [0, 7]  = "Monitored"
  (3, 7]  = "Medium Risk"
  (7, 10] = "High Risk"
;
')
fput_all(c(2, 5, 9), "risk")
fclear()
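
To see which labels each value in the multilabel example falls under, the overlapping ranges can be checked by hand in base R. This is only a sketch of the interval logic: the data frame layout and the all_labels helper are assumptions made for illustration, and the return shape of fput_all may differ.

```r
# Ranges copied from the "risk" definition above; the layout is an assumption.
risk <- data.frame(
  low       = c(0, 0, 3, 7),
  high      = c(3, 7, 7, 10),
  incl_low  = c(TRUE, TRUE, FALSE, FALSE),
  incl_high = c(TRUE, TRUE, TRUE, TRUE),
  label     = c("Low Risk", "Monitored", "Medium Risk", "High Risk"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each value, collect every label whose range contains it.
all_labels <- function(x, ranges) {
  lapply(x, function(v) {
    above <- ifelse(ranges$incl_low, v >= ranges$low, v > ranges$low)
    below <- ifelse(ranges$incl_high, v <= ranges$high, v < ranges$high)
    ranges$label[above & below]
  })
}

matches <- all_labels(c(2, 5, 9), risk)
print(matches)
```

Under this reading, 2 and 5 each match two ranges while 9 matches only one, which is the situation a multilabel format is meant to handle.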
