# factor

0th

Percentile

##### Factors

The function factor is used to encode a vector as a factor (the terms ‘category’ and ‘enumerated type’ are also used for factors). If argument ordered is TRUE, the factor levels are assumed to be ordered. For compatibility with S there is also a function ordered.

is.factor, is.ordered, as.factor and as.ordered are the membership and coercion functions for these classes.

Keywords
NA, category
##### Usage
factor(x = character(), levels, labels = levels,
exclude = NA, ordered = is.ordered(x), nmax = NA)ordered(x, …)is.factor(x)
is.ordered(x)as.factor(x)
as.ordered(x)addNA(x, ifany = FALSE)
##### Arguments
x

a vector of data, usually taking a small number of distinct values.

levels

an optional vector of the unique values (as character strings) that x might have taken. The default is the unique set of values taken by as.character(x), sorted into increasing order of x. Note that this set can be specified as smaller than sort(unique(x)).

labels

either an optional character vector of labels for the levels (in the same order as levels after removing those in exclude), or a character string of length 1. Duplicated values in labels can be used to map different values of x to the same factor level.

exclude

a vector of values to be excluded when forming the set of levels. This may be factor with the same level set as x or should be a character.

ordered

logical flag to determine if the levels should be regarded as ordered (in the order given).

nmax

an upper bound on the number of levels; see ‘Details’.

(in ordered(.)): any of the above, apart from ordered itself.

ifany

only add an NA level if it is used, i.e. if any(is.na(x)).

##### Details

The type of the vector x is not restricted; it only must have an as.character method and be sortable (by order).

Ordered factors differ from factors only in their class, but methods and the model-fitting functions treat the two classes quite differently.

The encoding of the vector happens as follows. First all the values in exclude are removed from levels. If x[i] equals levels[j], then the i-th element of the result is j. If no match is found for x[i] in levels (which will happen for excluded values) then the i-th element of the result is set to NA.

Normally the ‘levels’ used as an attribute of the result are the reduced set of levels after removing those in exclude, but this can be altered by supplying labels. This should either be a set of new labels for the levels, or a character string, in which case the levels are that character string with a sequence number appended.

factor(x, exclude = NULL) applied to a factor without NAs is a no-operation unless there are unused levels: in that case, a factor with the reduced level set is returned. If exclude is used, since R version 3.4.0, excluding non-existing character levels is equivalent to excluding nothing, and when exclude is a character vector, that is applied to the levels of x. Alternatively, exclude can be factor with the same level set as x and will exclude the levels present in exclude.

The codes of a factor may contain NA. For a numeric x, set exclude = NULL to make NA an extra level (prints as <NA>); by default, this is the last level.

If NA is a level, the way to set a code to be missing (as opposed to the code of the missing level) is to use is.na on the left-hand-side of an assignment (as in is.na(f)[i] <- TRUE; indexing inside is.na does not work). Under those circumstances missing values are currently printed as <NA>, i.e., identical to entries of level NA.

is.factor is generic: you can write methods to handle specific classes of objects, see InternalMethods.

Where levels is not supplied, unique is called. Since factors typically have quite a small number of levels, for large vectors x it is helpful to supply nmax as an upper bound on the number of unique values.

##### Value

factor returns an object of class "factor" which has a set of integer codes the length of x with a "levels" attribute of mode character and unique (!anyDuplicated(.)) entries. If argument ordered is true (or ordered() is used) the result has class c("ordered", "factor"). Undocumentedly for a long time, factor(x) loses all attributes(x) but "names", and resets "levels" and "class".

Applying factor to an ordered or unordered factor returns a factor (of the same type) with just the levels which occur: see also [.factor for a more transparent way to achieve this.

is.factor returns TRUE or FALSE depending on whether its argument is of type factor or not. Correspondingly, is.ordered returns TRUE when its argument is an ordered factor and FALSE otherwise.

as.factor coerces its argument to a factor. It is an abbreviated (sometimes faster) form of factor.

as.ordered(x) returns x if this is ordered, and ordered(x) otherwise.

addNA modifies a factor by turning NA into an extra level (so that NA values are counted in tables, for instance).

.valid.factor(object) checks the validity of a factor, currently only levels(object), and returns TRUE if it is valid, otherwise a string describing the validity problem. This function is used for validObject(<factor>).

##### Note

In earlier versions of R, storing character data as a factor was more space efficient if there is even a small proportion of repeats. However, identical character strings now share storage, so the difference is small in most cases. (Integer values are stored in 4 bytes whereas each reference to a character string needs a pointer of 4 or 8 bytes.)

##### Warning

The interpretation of a factor depends on both the codes and the "levels" attribute. Be careful only to compare factors with the same set of levels (in the same order). In particular, as.numeric applied to a factor is meaningless, and may happen by implicit coercion. To transform a factor f to approximately its original numeric values, as.numeric(levels(f))[f] is recommended and slightly more efficient than as.numeric(as.character(f)).

The levels of a factor are by default sorted, but the sort order may well depend on the locale at the time of creation, and should not be assumed to be ASCII.

There are some anomalies associated with factors that have NA as a level. It is suggested to use them sparingly, e.g., only for tabulation purposes.

##### Comparison operators and group generic methods

There are "factor" and "ordered" methods for the group generic Ops which provide methods for the Comparison operators, and for the min, max, and range generics in Summary of "ordered". (The rest of the groups and the Math group generate an error as they are not meaningful for factors.)

Only == and != can be used for factors: a factor can only be compared to another factor with an identical set of levels (not necessarily in the same ordering) or to a character vector. Ordered factors are compared in the same way, but the general dispatch mechanism precludes comparing ordered and unordered factors.

All the comparison operators are available for ordered factors. Collation is done by the levels of the operands: if both operands are ordered factors they must have the same level set.

##### References

Chambers, J. M. and Hastie, T. J. (1992) Statistical Models in S. Wadsworth & Brooks/Cole.

[.factor for subsetting of factors.

gl for construction of balanced factors and C for factors with specified contrasts. levels and nlevels for accessing the levels, and unclass to get integer codes.

##### Aliases
• factor
• ordered
• is.factor
• is.ordered
• as.factor
• as.ordered
• is.na<-.factor
• Math.factor
• Ops.factor
• Summary.factor
• Ops.ordered
• Summary.ordered
• .valid.factor
##### Examples
library(base) # NOT RUN { (ff <- factor(substring("statistics", 1:10, 1:10), levels = letters)) as.integer(ff) # the internal codes (f. <- factor(ff)) # drops the levels that do not occur ff[, drop = TRUE] # the same, more transparently factor(letters[1:20], labels = "letter") class(ordered(4:1)) # "ordered", inheriting from "factor" z <- factor(LETTERS[3:1], ordered = TRUE) ## and "relational" methods work: stopifnot(sort(z)[c(1,3)] == range(z), min(z) < max(z)) # } # NOT RUN { ## suppose you want "NA" as a level, and to allow missing values. (x <- factor(c(1, 2, NA), exclude = NULL)) is.na(x) <- TRUE x #  1 <NA> <NA> is.na(x) #  FALSE TRUE FALSE ## More rational, since R 3.4.0 : factor(c(1:2, NA), exclude = "" ) # keeps <NA> , as factor(c(1:2, NA), exclude = NULL) # always did ## exclude = <character> z # ordered levels 'A < B < C' factor(z, exclude = "C") # does exclude factor(z, exclude = "B") # ditto ## Now, labels maybe duplicated: ## factor() with duplicated labels allowing to "merge levels" x <- c("Man", "Male", "Man", "Lady", "Female") ## Map from 4 different values to only two levels: (xf <- factor(x, levels = c("Male", "Man" , "Lady", "Female"), labels = c("Male", "Male", "Female", "Female"))) #>  Male Male Male Female Female #> Levels: Male Female ## Using addNA() Month <- airquality\$Month table(addNA(Month)) table(addNA(Month, ifany = TRUE)) # } 
Documentation reproduced from package base, version 3.5.3, License: Part of R 3.5.3

### Community examples

xiaowyang@gmail.com at Dec 3, 2018 base v3.5.1

> travel_time <- c("Dec", "Jan", "Feb", "Mar", "Apr", "May", "Jun") > duration (variable, levels=c(0, 1, 2, 3, 4, 5, 6)  Dec Jan Feb Mar Apr May Jun Levels: 1, 1, 2, 3, 4, 5, 6

xiaowyang@gmail.com at Dec 3, 2018 base v3.5.1

> travel_time <- c("Dec", "Jan", "Feb", "Mar", "Apr", "May", "Jun") > duration (variable, levels=c(0, 1, 2, 3, 4, 5, 6)  Dec Jan Feb Mar Apr May Jun Levels: 1, 1, 2, 3, 4, 5, 6

xiaowyang@gmail.com at Dec 3, 2018 base v3.5.1

> travel_time <- c("Dec", "Jan", "Feb", "Mar", "Apr", "May", "Jun") > duration (variable, levels=c(0, 1, 2, 3, 4, 5, 6)  Dec Jan Feb Mar Apr May Jun Levels: 1, 1, 2, 3, 4, 5, 6

xiaowyang@gmail.com at Dec 3, 2018 base v3.5.1

mark@niemannross.com at Nov 5, 2018 base v3.5.1

Example for [LinkedIn Learning : R for Data Science, Lunchbreak Lessons](https://linkedin-learning.pxf.io/rweekly_factor) r # Factors are lists of unique values, Stored as integers I.am.a.vector <- c("blue","black","green","white","black","blue","blue") # color of cars passing my window I.am.a.factor <- factor(I.am.a.vector) I.am.a.vector # notice the repeat of blue and black levels(I.am.a.factor) # no repeat! Efficient storage. levels(I.am.a.factor) <- c("negro","azul","verde","blanco") I.am.a.factor # now in spanish table(I.am.a.factor) # count frequency of values. table is a collection of factors nlevels(I.am.a.factor) # # of unique values barplot(table(I.am.a.factor)) # extra credit levels(ordered(I.am.a.factor)) sum(table(factor(I.am.a.vector,exclude = "blue"))) # count all cars except blue cars 

gvalentini4@gmail.com at Oct 13, 2018 base v3.5.1

Use function factor() to create an ordinal categorical variable: r weather <- c("cold", "cool", "warm", "hot") f_weather <- factor(weather, ordered = TRUE, levels = c("cold", "cool", "warm", "hot")) f_weather 

dcp17kshitizgoel@imt.ac.in at Dec 24, 2017 base v3.4.3

dcp17kshitizgoel@imt.ac.in at Dec 24, 2017 base v3.4.3

dcp17kshitizgoel@imt.ac.in at Dec 24, 2017 base v3.4.3

dcp17kshitizgoel@imt.ac.in at Dec 24, 2017 base v3.4.3

dcp17kshitizgoel@imt.ac.in at Dec 24, 2017 base v3.4.3

dcp17kshitizgoel@imt.ac.in at Dec 24, 2017 base v3.4.3

dcp17kshitizgoel@imt.ac.in at Dec 24, 2017 base v3.4.3

dcp17kshitizgoel@imt.ac.in at Dec 24, 2017 base v3.4.3

dcp17kshitizgoel@imt.ac.in at Dec 24, 2017 base v3.4.3

dcp17kshitizgoel@imt.ac.in at Dec 24, 2017 base v3.4.3

dcp17kshitizgoel@imt.ac.in at Dec 24, 2017 base v3.4.3

dcp17kshitizgoel@imt.ac.in at Dec 24, 2017 base v3.4.3

maxinehe93@gmail.com at Oct 24, 2017 base v3.4.1

Set levels: ==  > test <- c("small", "big") > factor(test, levels=c("small", "big"), ordered=TRUE)  small big Levels: small < big