The declared objects are very similar to the haven_labelled_spss objects
from package haven. They are created with exactly the same arguments, but
they feature a fundamental difference in the treatment of (declared)
missing values.
In package haven, existing values are treated as if they were missing.
By contrast, in package declared the NA values are treated as if they
were existing values.
This difference is fundamental and points to an inconsistency in package
haven: while existing values can be identified as missing using the
function is.na(), they are in fact present in the vector, and other
packages (most importantly the base ones) do not know these values should be
treated as missing.
Consequently, the existing values are interpreted as missing only by package
haven. Statistical procedures will use those values as if they were
valid values.
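
The behaviour described above can be checked with a short sketch. It assumes package haven is installed, and the values and labels are made up for illustration. is.na() should flag the last element, yet mean() still takes the value -91 into the calculation.

```r
library(haven)

# Illustrative data: -91 is a "don't know" code flagged as missing
x1 <- labelled_spss(
  c(1:5, -91),
  labels = c(Good = 1, Bad = 5, DK = -91),
  na_values = -91
)

# haven reports the last element as missing ...
is.na(x1)

# ... but the value -91 is still stored in the vector,
# so it is included in the computation
mean(x1)
```
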
Package declared approaches the problem in exactly the opposite way:
instead of treating existing values as missing, it treats (certain) NA values
as existing. It does that by storing an attribute containing the indices of
those NA values which are to be treated as declared missing values, and it
refreshes this attribute each time the declared object is changed.
This is a trade-off with important implications when subsetting datasets:
every declared variable gets this attribute refreshed, which takes some time
depending on the number of variables in the data.
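
A minimal sketch of the same data as a declared object, assuming package declared is installed. The data are illustrative, and the attributes are only listed with attributes() rather than accessed by name, since their exact names are not given here.

```r
library(declared)

# Same kind of arguments as haven's labelled_spss(), illustrative data
x2 <- declared(
  c(1:5, -91),
  labels = c(Good = 1, Bad = 5, DK = -91),
  na_values = -91
)

# Print the object, including its declared missing value
x2

# Underneath, the sixth element is stored as a regular NA,
# which base R functions recognise as missing
is.na(unclass(x2))

# The attributes hold the metadata, including the positions
# of the declared missing values
attributes(x2)

# Subsetting returns a shorter vector; inspect its attributes
# to see the refreshed positions
attributes(x2[4:6])
```

Comparing the two attributes() listings is one way to see the refresh described above: the position recorded for -91 should differ between the full vector and the subset.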
The generic function as.declared() attempts to coerce only the compatible
types of objects, namely haven_labelled and factors. Dedicated class
methods can be written for any other type of object, and users are free to
write their own. To end up with a declared object, additional metadata is
needed, such as value labels and which values should be treated as missing.
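
As a hedged illustration of the coercion, the sketch below converts a haven object and a factor. It assumes both packages are installed and uses made-up data; the labels and missing values of the haven object are expected, not guaranteed here, to carry over.

```r
library(haven)
library(declared)

# A haven object with one user-defined missing value (illustrative data)
h <- labelled_spss(
  c(1:5, -91),
  labels = c(Good = 1, Bad = 5, DK = -91),
  na_values = -91
)

# Coerce to a declared object and check the resulting class
d <- as.declared(h)
class(d)

# Factors are the other compatible type named above
f <- factor(c("low", "high", "low"), levels = c("low", "high"))
as.declared(f)
```
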
The measurement level is optional and, for the moment, purely aesthetic. It
might, however, be useful to (automatically) determine if a declared object
is suitable for a certain statistical analysis. For instance, regression
requires quantitative variables, while some declared objects are certainly
categorical despite using numbers to denote categories.
The measurement level distinguishes between "categorical" and "quantitative"
types of variables, and additionally recognizes "nominal" and "ordinal" as
categorical, and similarly recognizes "interval", "ratio", "discrete" and
"continuous" as quantitative.
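
The classification above can be written down as a small lookup in base R. This is only a sketch: measurement_types and is_quantitative() are hypothetical names introduced for illustration and are not part of package declared.

```r
# Hypothetical lookup table, not part of package declared:
# maps each measurement level named above to its broad type
measurement_types <- c(
  nominal    = "categorical",
  ordinal    = "categorical",
  interval   = "quantitative",
  ratio      = "quantitative",
  discrete   = "quantitative",
  continuous = "quantitative"
)

# Hypothetical helper: is a given measurement level quantitative,
# and hence a candidate for an analysis such as regression?
is_quantitative <- function(level) {
  # The two broad types map to themselves
  broad <- c(
    measurement_types,
    categorical  = "categorical",
    quantitative = "quantitative"
  )
  if (!level %in% names(broad)) {
    stop("Unknown measurement level: ", level)
  }
  unname(broad[level]) == "quantitative"
}

is_quantitative("ratio")
is_quantitative("ordinal")
```

A check of this kind is one way the measurement level could be used to screen variables before an analysis.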