Parse a SAS Dataline Command
A parser for simple SAS dataline command texts. A data.frame is built with the column names listed in the INPUT section. The data object is created in the given environment.
ParseSASDatalines(x, env = .GlobalEnv, overwrite = FALSE)
- x: the SAS text
- env: environment in which the dataset should be created.
- overwrite: logical. If set to TRUE, the function will silently overwrite any existing object in env that has the same name as declared in the SAS DATA section. If set to FALSE (default), an error is raised if an object with the same name already exists.
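The overwrite behaviour described above can be pictured with a small base R sketch. The helper `assign_checked` is hypothetical and is not taken from the package source; in particular, looking only in `env` itself (`inherits = FALSE`) is an assumption.

```r
# Hypothetical helper (not part of the package): create an object in `env`,
# refusing to replace an existing one unless overwrite = TRUE.
assign_checked <- function(name, value, env = .GlobalEnv, overwrite = FALSE) {
  # assumption: only `env` itself is checked, not its parent environments
  if (!overwrite && exists(name, envir = env, inherits = FALSE)) {
    stop("object '", name, "' already exists in the target environment")
  }
  assign(name, value, envir = env)
  invisible(value)
}

e <- new.env()
assign_checked("asurvey", data.frame(id = 1), env = e)

# a second call with overwrite = FALSE raises an error, caught here
res <- tryCatch(
  assign_checked("asurvey", data.frame(id = 2), env = e),
  error = function(err) "error"
)

# with overwrite = TRUE the existing object is replaced silently
assign_checked("asurvey", data.frame(id = 2), env = e, overwrite = TRUE)
```

Using a dedicated environment such as `e` instead of `.GlobalEnv` keeps the workspace untouched while experimenting.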
The SAS DATA step is designed for quickly creating a dataset from scratch. The whole step normally consists of the DATA part defining the name of the dataset, an INPUT line declaring the variables, and a DATALINES command followed by the values.
The default delimiter used to separate the variables is a space (thus each value must be a single word). A $ after a variable name indicates that the preceding variable contains character values rather than numeric values. Without specific instructions, SAS assumes that variables are numeric. The function will fail if it encounters a character value where a numeric value is expected.
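To make the role of the $ marker concrete, the following base R sketch derives variable names and types from an INPUT line. The helper `parse_input_spec` is hypothetical and does not reproduce the package's actual implementation; it assumes that the $ is separated from the variable name by whitespace.

```r
# Hypothetical helper (not part of the package): read names and types
# from a SAS INPUT statement.
parse_input_spec <- function(input_line) {
  # drop the INPUT keyword and the trailing semicolon
  spec <- sub("^\\s*INPUT\\s+", "", input_line, ignore.case = TRUE)
  spec <- sub("\\s*;\\s*$", "", spec)
  tokens <- strsplit(trimws(spec), "\\s+")[[1]]

  is_dollar <- tokens == "$"
  # a variable is character if the token directly after it is "$"
  is_char <- c(is_dollar[-1], FALSE)[!is_dollar]

  data.frame(
    name = tokens[!is_dollar],
    type = ifelse(is_char, "character", "numeric"),
    stringsAsFactors = FALSE
  )
}

spec <- parse_input_spec("INPUT id sex $ age inc r1 r2 r3 ;")
spec
```

Every variable without a trailing $ is treated as numeric, mirroring the SAS default described above.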
Each row in the DATALINES block creates a corresponding row in the dataset. Note that a ; is not needed after every row; instead, a single ; is placed at the end of the entire data step.
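The row-by-row mapping can be illustrated in base R with `read.table`, which also splits on whitespace by default. This is only a sketch of the idea, not the function's internal code; the column names and classes are supplied by hand here.

```r
# Three data rows in the whitespace-separated layout used by DATALINES
rows <- "
1   F   35 17  7 2 2
17  M   50 14  5 5 3
33  F   45  6  7 2 7
"

cols    <- c("id", "sex", "age", "inc", "r1", "r2", "r3")
classes <- c("numeric", "character", rep("numeric", 5))

# each non-empty line becomes one row of the data frame
d <- read.table(text = rows, col.names = cols, colClasses = classes)

# a non-numeric value in a numeric column makes the read fail
bad <- tryCatch(
  read.table(text = "1 F x 17 7 2 2\n", col.names = cols, colClasses = classes),
  error = function(err) conditionMessage(err)
)
```

Declaring `sex` as character explicitly also prevents a column consisting only of F values from being read as logical.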
More complex command structures in the INPUT section, e.g. other delimiters (dlm), are not (yet) supported.
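For data that use another delimiter, one possible workaround is to read the values directly with base R instead. The snippet below is a sketch under that assumption and does not involve the parser; the comma-separated sample is made up for illustration.

```r
# Illustrative comma-delimited rows (not taken from the SAS example)
csv_rows <- "1,F,35\n17,M,50\n"

d_csv <- read.table(
  text       = csv_rows,
  sep        = ",",                       # delimiter other than whitespace
  col.names  = c("id", "sex", "age"),
  colClasses = c("numeric", "character", "numeric")
)
```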
txt <- "
DATA asurvey;
INPUT id sex $ age inc r1 r2 r3 ;
DATALINES;
1   F   35 17  7 2 2
17  M   50 14  5 5 3
33  F   45  6  7 2 7
49  M   24 14  7 5 7
65  F   52  9  4 7 7
81  M   44 11  7 7 7
2   F   34 17  6 5 3
18  M   40 14  7 5 2
34  F   47  6  6 5 6
50  M   35 17  5 7 5
;
"

(d.frm <- ParseSASDatalines(txt))