factor is used to encode a vector as a factor (the terms ‘category’ and ‘enumerated type’ are also used for factors). If argument ordered is TRUE, the factor levels are assumed to be ordered. For compatibility with S there is also a function ordered.
is.factor, is.ordered, as.factor and as.ordered are the membership and coercion functions for these classes.
factor(x = character(), levels, labels = levels, exclude = NA, ordered = is.ordered(x), nmax = NA)
addNA(x, ifany = FALSE)
x might have taken. The default is the unique set of values taken by as.character(x), sorted into increasing order of x. Note that this set can be specified as smaller than
levels after removing those in exclude), or a character string of length 1.
x, and will be coerced if necessary.
ordered(.)): any of the above, apart from
NA level if it is used, i.e. if
factor returns an object of class "factor" which has a set of integer codes the length of x with a "levels" attribute of mode character and unique (!anyDuplicated(.)) entries. If argument ordered is true (or ordered() is used) the result has class c("ordered", "factor").
Applying factor to an ordered or unordered factor returns a factor (of the same type) with just the levels which occur: see also [.factor for a more transparent way to achieve this.
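is.factor returns TRUE or FALSE depending on whether its argument is of type factor or not. Correspondingly, is.ordered returns TRUE when its argument is an ordered factor and FALSE otherwise.
as.factor coerces its argument to a factor. It is an abbreviated form of factor.
as.ordered(x) returns x if this is ordered, and ordered(x) otherwise.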
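The membership and coercion functions can be tried on a small vector. This is a minimal sketch with made-up data (the values "low" and "high" are illustrative only):

```r
# Hypothetical data for illustration.
f <- factor(c("low", "high", "low"))

is.factor(f)       # TRUE
is.factor("low")   # FALSE: a character vector is not a factor
is.ordered(f)      # FALSE: f is an unordered factor

o <- as.ordered(f) # coerce to an ordered factor
is.ordered(o)      # TRUE
class(o)           # "ordered" "factor"

# as.ordered() hands back an already ordered factor unchanged.
identical(as.ordered(o), o)  # TRUE
```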
addNA modifies a factor by turning NA into an extra level (so that NA values are counted in tables, for instance).
The interpretation of a factor depends on both the codes and the "levels" attribute. Be careful only to compare factors with the same set of levels (in the same order). In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).
The levels of a factor are by default sorted, but the sort order may well depend on the locale at the time of creation, and should not be assumed to be ASCII.
There are some anomalies associated with factors that have NA as a level. It is suggested to use them sparingly, e.g., only for tabulation purposes.
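The difference between the integer codes and the original values is easy to see on a small numeric example. This is a sketch with made-up numbers, not part of the original examples:

```r
# Hypothetical numeric data stored as a factor.
f <- factor(c(10, 5, 20, 5))

levels(f)      # "5" "10" "20": sorted in increasing order of the numeric input
as.numeric(f)  # 2 1 3 1: the integer codes, not the original values

# Recommended idiom: convert the levels once, then index by the factor.
as.numeric(levels(f))[f]      # 10 5 20 5

# Equivalent result, but converts every element rather than every level.
as.numeric(as.character(f))   # 10 5 20 5
```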
There are "factor" and "ordered" methods for the group generic Ops which provide methods for the Comparison operators, and for the min, max, and range generics in Summary of "ordered". (The rest of the groups and the Math group generate an error as they are not meaningful for factors.)
Only == and != can be used for factors: a factor can only be compared to another factor with an identical set of levels (not necessarily in the same ordering) or to a character vector. Ordered factors are compared in the same way, but the general dispatch mechanism precludes comparing ordered and unordered factors.
All the comparison operators are available for ordered factors. Collation is done by the levels of the operands: if both operands are ordered factors they must have the same level set.
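The comparison rules can be illustrated as follows. This is a hedged sketch: the vectors f, g and sizes are invented for the example, and the exact wording of the error raised for mismatched level sets is not shown because it may differ between R versions.

```r
# Unordered factors: only == and != are meaningful.
f <- factor(c("a", "b", "a"))
f == "a"    # TRUE FALSE TRUE: comparison with a character vector

# Same level set in a different order is allowed.
g <- factor(c("a", "b", "b"), levels = c("b", "a"))
f == g      # TRUE TRUE FALSE

# Different level sets cannot be compared; capture the error message.
res <- tryCatch(factor("a") == factor("b"),
                error = function(e) conditionMessage(e))

# Ordered factors support all comparison operators, collated by level order.
sizes <- factor(c("small", "large", "medium"),
                levels = c("small", "medium", "large"), ordered = TRUE)
sizes < "large"           # TRUE FALSE TRUE
as.character(max(sizes))  # "large"
```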
The type of the vector x is not restricted; it only must have an as.character method and be sortable (by sort.list).
Ordered factors differ from factors only in their class, but methods and the model-fitting functions treat the two classes quite differently.
The encoding of the vector happens as follows. First all the values in exclude are removed from levels. If x[i] equals levels[j], then the i-th element of the result is j. If no match is found for x[i] in levels (which will happen for excluded values) then the i-th element of the result is set to NA.
Normally the ‘levels’ used as an attribute of the result are the reduced set of levels after removing those in exclude, but this can be altered by supplying labels. This should either be a set of new labels for the levels, or a character string, in which case the levels are that character string with a sequence number appended.
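The encoding steps can be imitated by hand with match. This is a simplified sketch under the default exclude = NA, not the actual implementation of factor; the input vector and the label "grp" are made up for illustration.

```r
# Hypothetical input; NA plays the role of an excluded value.
x <- c("b", "a", "c", "a", NA)

# Candidate levels: sorted unique values. sort() drops NA,
# which mirrors the default exclude = NA.
lev <- sort(unique(as.character(x)))

# Element i gets code j when x[i] equals lev[j]; no match gives NA.
codes <- match(as.character(x), lev)

f <- factor(x)
identical(codes, as.integer(f))  # TRUE
identical(lev, levels(f))        # TRUE

# A single label string gets a sequence number appended for each level.
levels(factor(x, labels = "grp"))  # "grp1" "grp2" "grp3"
```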
factor(x, exclude = NULL) applied to a factor is a no-operation unless there are unused levels: in that case, a factor with the reduced level set is returned. If exclude is used it should also be a factor with the same level set as x or a set of codes for the levels to be excluded.
The codes of a factor may contain NA. For a numeric x, set exclude = NULL to make NA an extra level (prints as <NA>); by default, this is the last level.
If NA is a level, the way to set a code to be missing (as opposed to the code of the missing level) is to use is.na on the left-hand-side of an assignment (as in is.na(f)[i] <- TRUE; indexing inside is.na does not work). Under those circumstances missing values are currently printed as <NA>, i.e., identical to entries of level NA.
is.factor is generic: you can write methods to handle specific classes of objects, see InternalMethods.
Where levels is not supplied, unique is called. Since factors typically have quite a small number of levels, for large vectors x it is helpful to supply nmax as an upper bound on the number of unique values.
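A minimal sketch of supplying nmax for a long vector with few distinct values. The data are simulated, and the bound of 10 is an arbitrary assumption chosen to sit comfortably above the true number of unique values; any speed-up depends on the data and is not measured here.

```r
# Simulated data: 100,000 values drawn from three distinct strings.
set.seed(1)
x <- sample(c("lo", "mid", "hi"), 1e5, replace = TRUE)

# nmax is only a hint passed on to unique(); the result is unchanged.
f <- factor(x, nmax = 10)

nlevels(f)               # 3
identical(f, factor(x))  # TRUE
```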
[.factor for subsetting of factors.
gl for construction of balanced factors and C for factors with specified contrasts.
levels and nlevels for accessing the levels, and unclass to get integer codes.
(ff <- factor(substring("statistics", 1:10, 1:10), levels = letters))
as.integer(ff)      # the internal codes
(f. <- factor(ff))  # drops the levels that do not occur
ff[, drop = TRUE]   # the same, more transparently

factor(letters[1:20], labels = "letter")

class(ordered(4:1)) # "ordered", inheriting from "factor"
z <- factor(LETTERS[3:1], ordered = TRUE)
## and "relational" methods work:
stopifnot(sort(z)[c(1,3)] == range(z), min(z) < max(z))

## suppose you want "NA" as a level, and to allow missing values.
(x <- factor(c(1, 2, NA), exclude = NULL))
is.na(x)[2] <- TRUE  # set the second code to missing
x  # [1] 1    <NA> <NA>
is.na(x)
# [1] FALSE  TRUE FALSE

## Using addNA()
Month <- airquality$Month
table(addNA(Month))
table(addNA(Month, ifany = TRUE))