The function factor is used to encode a vector as a factor (the
terms ‘category’ and ‘enumerated type’ are also used for
factors). If argument ordered is TRUE, the factor
levels are assumed to be ordered. For compatibility with S there is
also a function ordered.
is.factor, is.ordered, as.factor and as.ordered
are the membership and coercion functions for these classes.
factor(x = character(), levels, labels = levels, exclude = NA, ordered = is.ordered(x), nmax = NA)
addNA(x, ifany = FALSE)
x: a vector of data, usually taking a small number of distinct values.
levels: an optional vector of the values (as character strings) that
x might have taken. The default is the unique set of values taken by
as.character(x), sorted into increasing order of
x. Note that this set can be specified as smaller than sort(unique(x)).
labels: either an optional character vector of (unique) labels for the levels (in the same order as
levels after removing those in
exclude), or a character string of length 1.
exclude: a vector of values to be excluded when forming the set of levels. This may be a factor with the same level set as
x or should be a character vector.
ordered: logical flag to determine if the levels should be regarded as ordered (in the order given).
nmax: an upper bound on the number of levels; see ‘Details’.
...: (in ordered(.)): any of the above, apart from ordered itself.
ifany: only add an
NA level if it is used, i.e. if any(is.na(x)).
Ordered factors differ from factors only in their class, but methods and the model-fitting functions treat the two classes quite differently.
The encoding of the vector happens as follows. First all the values
in exclude are removed from levels. If x[i] equals
levels[j], then the
i-th element of the result is
j. If no match is found for x[i] in levels
(which will happen for excluded values) then the i-th element
of the result is set to NA.
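As a rough illustration of the matching rule just described, the following sketch reproduces the codes by hand with match(). The vector x and the names levs and codes are made up for this example. It is not the actual implementation of factor, only a small check that the rule gives the same result for this input.

```r
# Hypothetical data; 'levs' and 'codes' are illustrative names only.
x <- c("b", "a", "c", "a")

levs <- sort(unique(x))                 # default level set, sorted
levs <- levs[is.na(match(levs, "c"))]   # remove the values in 'exclude'
codes <- match(x, levs)                 # x[i] equal to levs[j] gives j, else NA

f <- factor(x, exclude = "c")
codes           # hand-made codes
as.integer(f)   # codes stored by factor()
levels(f)       # the reduced level set
```

The excluded value "c" has no match among the remaining levels, so its code is NA.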
Normally the ‘levels’ used as an attribute of the result are
the reduced set of levels after removing those in
exclude, but
this can be altered by supplying
labels. This should either
be a set of new labels for the levels, or a character string, in
which case the levels are that character string with a sequence
number appended.
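The two forms of labels can be compared side by side. This is a minimal sketch with invented data (the "lo"/"hi" values and the label strings are assumptions for illustration).

```r
# Hypothetical answers coded "lo"/"hi".
x <- c("lo", "hi", "lo")

# One label per level, given in the same order as 'levels'.
f1 <- factor(x, levels = c("lo", "hi"), labels = c("Low", "High"))
levels(f1)

# A single string: the levels become that string with a sequence number.
f2 <- factor(x, levels = c("lo", "hi"), labels = "grp")
levels(f2)
```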
factor(x, exclude = NULL) applied to a factor without
NAs is a no-operation unless there are unused levels: in
that case, a factor with the reduced level set is returned. If
exclude is used, since R version 3.4.0, excluding non-existing
character levels is equivalent to excluding nothing, and when
exclude is a
character vector, that is
applied to the levels of x. Alternatively,
exclude can be a factor with the same level set as
x and will exclude the levels present in exclude.
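The behaviour of exclude on an existing factor can be sketched as follows. The example assumes R 3.4.0 or later, and the object names f, g, h and k are hypothetical.

```r
f <- factor(c("a", "b", "c"))

# Excluding a level that does not exist changes nothing.
g <- factor(f, exclude = "zzz")
levels(g)

# A character 'exclude' is applied to the levels of f.
h <- factor(f, exclude = "b")
levels(h)
as.integer(h)   # the element that was "b" now has a missing code

# 'exclude' given as a factor with the same level set as f.
k <- factor(f, exclude = factor("b", levels = levels(f)))
levels(k)
```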
If NA is a level, the way to set a code to be missing (as
opposed to the code of the missing level) is to
use is.na on the left-hand-side of an assignment (as in
is.na(f)[i] <- TRUE; indexing inside
is.na does not work).
Under those circumstances missing values are currently printed as
<NA>, i.e., identical to entries of level NA.
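The difference between "the code of the NA level" and "a missing code" is easy to lose in print, so a small sketch may help. The data are invented for illustration.

```r
# NA becomes a level because nothing is excluded.
f <- factor(c("a", NA, "b"), exclude = NULL)
levels(f)        # includes NA as the last level
as.integer(f)    # every element has a (non-missing) code
is.na(f)         # so nothing is reported as missing

# Make the code of the third element missing.
is.na(f)[3] <- TRUE
as.integer(f)
is.na(f)
```

After the assignment, elements 2 and 3 both print as <NA>, although only element 3 has a missing code.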
is.factor is generic: you can write methods to handle
specific classes of objects, see InternalMethods.
Where levels is not supplied,
unique is called.
Since factors typically have quite a small number of levels, for large
vectors x it is helpful to supply
nmax as an upper bound
on the number of unique values.
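A minimal sketch of supplying nmax for a long vector follows. The vector length, the seed and the bound of 10 are arbitrary choices for the example; the only requirement assumed here is that nmax is not smaller than the true number of distinct values.

```r
set.seed(1)   # arbitrary seed, for reproducibility only
x <- sample(c("a", "b", "c"), 1e5, replace = TRUE)

f1 <- factor(x)              # unique() called without a bound
f2 <- factor(x, nmax = 10)   # bound passed on to unique()

identical(f1, f2)            # same factor either way
```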
factor returns an object of class
"factor" which has a
set of integer codes the length of
x with a
"levels" attribute of mode
character and unique
(!anyDuplicated(.)) entries. If argument ordered
is true (or
ordered() is used) the result has class
c("ordered", "factor").
Undocumentedly for a long time,
factor(x) loses all attributes(x) but
"names", and resets "levels" and "class".
Applying factor to an ordered or unordered factor returns a
factor (of the same type) with just the levels which occur: see also
[.factor for a more transparent way to achieve this.
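The structure of the returned object can be inspected directly. The following is a small sketch on an invented named vector; the object names are hypothetical.

```r
x <- c(first = "b", second = "a", third = "b")   # hypothetical named vector
f <- factor(x)

typeof(f)       # integer codes underneath
names(f)        # names are kept
levels(f)

o <- factor(x, ordered = TRUE)
class(o)

# Subsetting keeps unused levels; factor() or drop = TRUE removes them.
sub <- f[f == "b"]
levels(sub)
levels(factor(sub))
levels(sub[, drop = TRUE])
```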
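is.factor returns TRUE or FALSE depending on
whether its argument is of type factor or not. Correspondingly,
is.ordered returns TRUE when its argument is an ordered
factor and FALSE otherwise.
as.factor coerces its argument to a factor.
It is an abbreviated (sometimes faster) form of factor.
as.ordered(x) returns x if this is ordered, and ordered(x) otherwise.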
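The membership and coercion functions can be tried on a toy vector. This is only a sketch; the data and object names are made up.

```r
x <- c("low", "high", "low")   # plain character vector
f <- as.factor(x)              # unordered factor
o <- as.ordered(f)             # ordered factor with the same levels

is.factor(x)
is.factor(f)
is.ordered(f)
is.ordered(o)

# as.ordered() returns an already ordered factor unchanged.
identical(as.ordered(o), o)
```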
addNA modifies a factor by turning
NA into an extra
level (so that
NA values are counted in tables, for instance).
.valid.factor(object) checks the validity of a factor,
currently only levels(object), and returns
TRUE if it is
valid, otherwise a string describing the validity problem. This
function is used for validObject(<factor>).
In earlier versions of R, storing character data as a factor was more space efficient if there is even a small proportion of repeats. However, identical character strings now share storage, so the difference is small in most cases. (Integer values are stored in 4 bytes whereas each reference to a character string needs a pointer of 4 or 8 bytes.)
The interpretation of a factor depends on both the codes and the
"levels" attribute. Be careful only to compare factors with
the same set of levels (in the same order). In particular,
as.numeric applied to a factor is meaningless, and may
happen by implicit coercion. To transform a factor f to
approximately its original numeric values,
as.numeric(levels(f))[f] is recommended and slightly more
efficient than as.numeric(as.character(f)).
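The pitfall is easiest to see on a factor built from numbers. A short sketch with invented values:

```r
f <- factor(c(10, 20, 10, 5))   # levels are "5", "10", "20"

as.numeric(f)                   # the internal codes, not the data
as.numeric(levels(f))[f]        # recommended way back to the numbers
as.numeric(as.character(f))     # same result, converts every element
```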
The levels of a factor are by default sorted, but the sort order may well depend on the locale at the time of creation, and should not be assumed to be ASCII.
There are some anomalies associated with factors that have
NA as a level. It is suggested to use them sparingly, e.g.,
only for tabulation purposes.
Comparison operators and group generic methods
There are "factor" and
"ordered" methods for the
group generic Ops which
provide methods for the Comparison operators,
and for the min, max, and
range generics in Summary of
"ordered". (The rest of the groups and the
Math group generate an error as they
are not meaningful for factors.)
Only == and != can be used for factors: a factor can
only be compared to another factor with an identical set of levels
(not necessarily in the same ordering) or to a character vector.
Ordered factors are compared in the same way, but the general dispatch
mechanism precludes comparing ordered and unordered factors.
All the comparison operators are available for ordered factors. Collation is done by the levels of the operands: if both operands are ordered factors they must have the same level set.
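The comparison rules above can be sketched on small invented factors. All object names here are hypothetical.

```r
f <- factor(c("a", "b", "a"))
g <- factor(c("a", "a", "b"), levels = c("b", "a"))  # same level set, other order

f == "a"   # comparison with a character vector
f == g     # allowed: identical level sets, ordering may differ

o <- factor(c("lo", "hi", "mid"), levels = c("lo", "mid", "hi"),
            ordered = TRUE)
o < "hi"   # collation follows the level order, not the alphabet
max(o)

# Math group functions are not meaningful for factors and signal an error.
res <- tryCatch(sqrt(f), error = function(e) "error")
res
```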
Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.
[.factor for subsetting of factors.
(ff <- factor(substring("statistics", 1:10, 1:10), levels = letters))
as.integer(ff)      # the internal codes
(f. <- factor(ff))  # drops the levels that do not occur
ff[, drop = TRUE]   # the same, more transparently

factor(letters[1:20], labels = "letter")

class(ordered(4:1)) # "ordered", inheriting from "factor"
z <- factor(LETTERS[3:1], ordered = TRUE)
## and "relational" methods work:
stopifnot(sort(z)[c(1,3)] == range(z), min(z) < max(z))

## suppose you want "NA" as a level, and to allow missing values.
(x <- factor(c(1, 2, NA), exclude = NULL))
is.na(x)[2] <- TRUE
x  # [1] 1 <NA> <NA>
is.na(x)
# [1] FALSE TRUE FALSE

## More rational, since R 3.4.0 :
factor(c(1:2, NA), exclude = "" )  # keeps <NA> , as
factor(c(1:2, NA), exclude = NULL) # always did
## exclude = <character>
z # ordered levels 'A < B < C'
factor(z, exclude = "C") # does exclude
factor(z, exclude = "B") # ditto

## Using addNA()
Month <- airquality$Month
table(addNA(Month))
table(addNA(Month, ifany = TRUE))