`factor` is used to encode a vector as a factor (the terms ‘category’ and ‘enumerated type’ are also used for factors). If argument `ordered` is `TRUE`, the factor levels are assumed to be ordered. For compatibility with S there is also a function `ordered`.

`is.factor`, `is.ordered`, `as.factor` and `as.ordered` are the membership and coercion functions for these classes.

```
factor(x = character(), levels, labels = levels,
       exclude = NA, ordered = is.ordered(x), nmax = NA)

ordered(x, ...)

is.factor(x)
is.ordered(x)

as.factor(x)
as.ordered(x)

addNA(x, ifany = FALSE)
```

x

a vector of data, usually taking a small number of distinct values.

levels

an optional vector of the values (as character strings) that `x` might have taken. The default is the unique set of values taken by `as.character(x)`, sorted into increasing order of `x`. Note that this set can be specified as smaller than `sort(unique(x))`.

labels

either an optional character vector of labels for the levels (in the same order as `levels` after removing those in `exclude`), or a character string of length 1.

exclude

a vector of values to be excluded when forming the set of levels. This should be of the same type as `x`, and will be coerced if necessary.

ordered

logical flag to determine if the levels should be regarded as ordered (in the order given).

nmax

an upper bound on the number of levels; see ‘Details’.

...

(in `ordered(.)`): any of the above, apart from `ordered` itself.

ifany

only add an `NA` level if it is used, i.e. if `any(is.na(x))`.
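
As a small sketch of how the `levels`, `labels` and `ordered` arguments interact (the vector `x` and the names `f1` to `f5` are made up for illustration and do not come from the original page):

```r
x <- c("lo", "hi", "mid", "hi")   # hypothetical input data

f1 <- factor(x)                                  # default levels: sorted unique values
f2 <- factor(x, levels = c("lo", "mid", "hi"))   # explicit level order
f3 <- factor(x, levels = c("lo", "hi"))          # smaller level set: "mid" is coded as NA
f4 <- factor(x, levels = c("lo", "mid", "hi"),
             labels = c("Low", "Medium", "High"))  # relabel the levels
f5 <- factor(x, levels = c("lo", "mid", "hi"), ordered = TRUE)

levels(f2)   # "lo" "mid" "hi"
f3           # third element prints as <NA>
f4           # Low High Medium High
class(f5)    # "ordered" "factor"
```

Supplying `levels` explicitly also avoids relying on the default sort order.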

`factor` returns an object of class `"factor"` which has a set of integer codes the length of `x` with a `"levels"` attribute of mode `character` and unique (`!anyDuplicated(.)`) entries. If argument `ordered` is true (or `ordered()` is used) the result has class `c("ordered", "factor")`.

Applying `factor` to an ordered or unordered factor returns a factor (of the same type) with just the levels which occur: see also `[.factor` for a more transparent way to achieve this.

`is.factor` returns `TRUE` or `FALSE` depending on whether its argument is of type factor or not. Correspondingly, `is.ordered` returns `TRUE` when its argument is an ordered factor and `FALSE` otherwise.

`as.factor` coerces its argument to a factor.
It is an abbreviated form of `factor`.

`as.ordered(x)` returns `x` if this is ordered, and `ordered(x)` otherwise.

`addNA` modifies a factor by turning `NA` into an extra level (so that `NA` values are counted in tables, for instance).

The interpretation of a factor depends on both the codes and the `"levels"` attribute. Be careful only to compare factors with the same set of levels (in the same order). In particular, `as.numeric` applied to a factor is meaningless (it returns the internal integer codes rather than the original values), and may happen by implicit coercion. To transform a factor `f` to approximately its original numeric values, `as.numeric(levels(f))[f]` is recommended and slightly more efficient than `as.numeric(as.character(f))`.
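
The difference between the codes and the original values can be seen in a short sketch (the factor `f` below is a made-up example):

```r
f <- factor(c(10.5, 3, 10.5, 7))   # hypothetical numeric data stored as a factor

levels(f)                          # "3" "7" "10.5" (character strings)
codes <- as.numeric(f)             # 3 1 3 2: the internal codes, not the data
vals1 <- as.numeric(levels(f))[f]  # 10.5 3 10.5 7: convert the levels, then index by f
vals2 <- as.numeric(as.character(f))  # same result, converting every element
```

The first form converts only the (few) levels and then indexes them by the codes, which is why it tends to be the cheaper of the two.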

The levels of a factor are by default sorted, but the sort order may well depend on the locale at the time of creation, and should not be assumed to be ASCII.

There are some anomalies associated with factors that have `NA` as a level. It is suggested to use them sparingly, e.g., only for tabulation purposes.

There are `"factor"` and `"ordered"` methods for the group generic `Ops`, which provide methods for the Comparison operators, and `"ordered"` methods for the `min`, `max`, and `range` generics in `Summary`. (The rest of the groups and the `Math` group generate an error as they are not meaningful for factors.)

Only `==` and `!=` can be used for factors: a factor can only be compared to another factor with an identical set of levels (not necessarily in the same ordering) or to a character vector. Ordered factors are compared in the same way, but the general dispatch mechanism precludes comparing ordered and unordered factors.

All the comparison operators are available for ordered factors.
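
As a rough illustration of the comparison rules above (the factors `f`, `g` and `sizes` are made-up examples, not taken from the original page):

```r
f <- factor(c("a", "b", "a"))
g <- factor(c("a", "a", "b"), levels = c("b", "a"))  # same level set, different order

eq_chr <- f == "a"   # TRUE FALSE TRUE: comparison with a character vector
eq_fac <- f == g     # TRUE FALSE FALSE: allowed because the level sets match

sizes <- ordered(c("small", "large", "medium"),
                 levels = c("small", "medium", "large"))
lt  <- sizes < "large"   # TRUE FALSE TRUE: ordering follows the levels
rng <- range(sizes)      # ordered factor holding the smallest and largest level used
```

Note that `<` here follows the order of the levels, not the alphabetical order of the labels.
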
Collation is done by the levels of the operands: if both operands are ordered factors they must have the same level set.

The type of the vector `x` is not restricted; it must only have an `as.character` method and be sortable (by `sort.list`).

Ordered factors differ from factors only in their class, but methods and the model-fitting functions treat the two classes quite differently.

The encoding of the vector happens as follows. First all the values in `exclude` are removed from `levels`. If `x[i]` equals `levels[j]`, then the `i`-th element of the result is `j`. If no match is found for `x[i]` in `levels` (which will happen for excluded values) then the `i`-th element of the result is set to `NA`.
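
The encoding steps just described can be imitated with `match`. This is only a sketch of the described behaviour under simple assumptions (a character vector without missing values; `x`, `lev` and `codes` are illustrative names), not the actual implementation of `factor`:

```r
x <- c("b", "a", "c", "a")        # hypothetical input
exclude <- "c"

lev <- sort(unique(x))            # default levels: "a" "b" "c"
lev <- lev[!lev %in% exclude]     # step 1: drop the excluded values
codes <- match(x, lev)            # step 2: x[i] equals lev[j] gives j, no match gives NA
codes                             # 2 1 NA 1

f <- factor(x, exclude = exclude) # the real thing, for comparison
as.integer(f)                     # same codes
levels(f)                         # "a" "b"
```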

Normally the ‘levels’ used as an attribute of the result are the reduced set of levels after removing those in `exclude`, but this can be altered by supplying `labels`. This should either be a set of new labels for the levels, or a character string, in which case the levels are that character string with a sequence number appended.

`factor(x, exclude = NULL)` applied to a factor is a no-operation unless there are unused levels: in that case, a factor with the reduced level set is returned. If `exclude` is used it should also be a factor with the same level set as `x` or a set of codes for the levels to be excluded.

The codes of a factor may contain `NA`. For a numeric `x`, set `exclude = NULL` to make `NA` an extra level (prints as `<NA>`); by default, this is the last level.

If `NA` is a level, the way to set a code to be missing (as opposed to the code of the missing level) is to use `is.na` on the left-hand-side of an assignment (as in `is.na(f)[i] <- TRUE`; indexing inside `is.na` does not work).
Under those circumstances missing values are currently printed as `<NA>`, i.e., identical to entries of level `NA`.

`is.factor` is generic: you can write methods to handle specific classes of objects, see InternalMethods.

Where `levels` is not supplied, `unique` is called.
Since factors typically have quite a small number of levels, for large vectors `x` it is helpful to supply `nmax` as an upper bound on the number of unique values.

See also `[.factor` for subsetting of factors; `gl` for construction of balanced factors and `C` for factors with specified contrasts; `levels` and `nlevels` for accessing the levels, and `unclass` to get integer codes.

```
(ff <- factor(substring("statistics", 1:10, 1:10), levels = letters))
as.integer(ff) # the internal codes
(f. <- factor(ff)) # drops the levels that do not occur
ff[, drop = TRUE] # the same, more transparently
factor(letters[1:20], labels = "letter")
class(ordered(4:1)) # "ordered", inheriting from "factor"
z <- factor(LETTERS[3:1], ordered = TRUE)
## and "relational" methods work:
stopifnot(sort(z)[c(1,3)] == range(z), min(z) < max(z))
## suppose you want "NA" as a level, and to allow missing values.
(x <- factor(c(1, 2, NA), exclude = NULL))
is.na(x)[2] <- TRUE
x # [1] 1 <NA> <NA>
is.na(x)
# [1] FALSE TRUE FALSE
## Using addNA()
Month <- airquality$Month
table(addNA(Month))
table(addNA(Month, ifany = TRUE))
```
