forcats v0.2.0

by Hadley Wickham

Tools for Working with Categorical Variables (Factors)

Helpers for reordering factor levels (including moving specified levels to front, ordering by first appearance, reversing, and randomly shuffling), and tools for modifying factor levels (including collapsing rare levels into other, 'anonymising', and manually 'recoding').

Readme

forcats

Overview

R uses factors to handle categorical variables, variables that have a fixed and known set of possible values. Historically, factors were much easier to work with than character vectors, so many base R functions automatically convert character vectors to factors. (For more historical context, I recommend stringsAsFactors: An unauthorized biography by Roger Peng, and stringsAsFactors = <sigh> by Thomas Lumley.) These days, making factors automatically is no longer so helpful, so packages in the tidyverse never create them automatically.

However, factors are still useful when you have true categorical data, and when you want to override the ordering of character vectors to improve display. The goal of the forcats package is to provide a suite of useful tools that solve common problems with factors. If you're not familiar with factors, the best place to start is the chapter on factors in R for Data Science.
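
To make the point about ordering concrete, here is a minimal sketch using only base R. The month vector is illustrative; `month.abb` is R's built-in constant of English month abbreviations.

```r
# Illustrative data: a few month abbreviations in arbitrary order
x <- c("Dec", "Apr", "Jan", "Mar")

# A character vector sorts alphabetically, which is rarely what you want
# for months
sort(x)
#> [1] "Apr" "Dec" "Jan" "Mar"

# A factor sorts by the order of its levels, so calendar order is kept
f <- factor(x, levels = month.abb)
as.character(sort(f))
#> [1] "Jan" "Mar" "Apr" "Dec"
```

The same level order is what plotting and tabulating functions typically use for display, which is why controlling it matters.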

Installation

# The easiest way to get forcats is to install the whole tidyverse:
install.packages("tidyverse")

# Alternatively, install just forcats:
install.packages("forcats")

# Or the development version from GitHub:
# install.packages("devtools")
devtools::install_github("tidyverse/forcats")

Getting started

forcats is not part of the core tidyverse, so you need to load it explicitly:

library(tidyverse)
library(forcats)

Factors are used to describe categorical variables with a fixed and known set of levels. You can create factors with the base factor() or readr::parse_factor():

x1 <- c("Dec", "Apr", "Jan", "Mar")
month_levels <- c(
  "Jan", "Feb", "Mar", "Apr", "May", "Jun", 
  "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
)

factor(x1, month_levels)
#> [1] Dec Apr Jan Mar
#> Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

parse_factor(x1, month_levels)
#> [1] Dec Apr Jan Mar
#> Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

The advantage of parse_factor() is that it will generate a warning if values of x are not valid levels:

x2 <- c("Dec", "Apr", "Jam", "Mar")

factor(x2, month_levels)
#> [1] Dec  Apr  <NA> Mar 
#> Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

parse_factor(x2, month_levels)
#> Warning: 1 parsing failure.
#> row col           expected actual
#>   3  -- value in level set    Jam
#> [1] Dec  Apr  <NA> Mar 
#> attr(,"problems")
#> # A tibble: 1 × 4
#>     row   col           expected actual
#>   <int> <int>              <chr>  <chr>
#> 1     3    NA value in level set    Jam
#> Levels: Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec

Once you have the factor, forcats provides helpers for solving common problems.
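
As a rough illustration of the reordering helpers, the sketch below applies a few of them to a small factor. The `size` vector is made up for this example; the functions are the ones listed in the package index.

```r
library(forcats)

# Hypothetical data, used only for illustration
size <- factor(c("medium", "small", "large", "small", "small", "large"))

# By default, levels are in alphabetical order
levels(size)
#> [1] "large"  "medium" "small"

# Order levels by first appearance in the data
levels(fct_inorder(size))
#> [1] "medium" "small"  "large"

# Order levels by frequency, most common first
levels(fct_infreq(size))
#> [1] "small"  "large"  "medium"

# Move chosen levels to the front by hand
levels(fct_relevel(size, "small", "medium"))
#> [1] "small"  "medium" "large"

# Reverse the current level order
levels(fct_rev(size))
#> [1] "small"  "medium" "large"
```

None of these calls change the values stored in the factor; only the order of the levels changes.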

Functions in forcats

Name             Description
fct_lump         Lump together least/most common factor levels into "other"
fct_inorder      Reorder factor levels by first appearance or frequency
fct_anon         Anonymise factor levels
as_factor        Convert input to a factor
fct_count        Count entries in a factor
fct_rev          Reverse order of factor levels
fct_relevel      Reorder factor levels by hand
fct_reorder      Reorder factor levels by sorting along another variable
fct_drop         Drop unused levels
fct_relabel      Automatically relabel factor levels, collapse as necessary
fct_recode       Change factor levels by hand
fct_other        Replace levels with "other"
fct_c            Concatenate factors, combining levels
fct_collapse     Collapse factor levels into manually defined groups
fct_shift        Shift factor levels to left or right, wrapping around at end
lvls             Low-level functions for manipulating levels
fct_shuffle      Randomly permute factor levels
fct_explicit_na  Make missing values explicit
fct_expand       Add additional levels to a factor
%>%              Pipe operator
fct_unify        Unify the levels in a list of factors
fct_unique       Unique values of a factor
forcats-package  forcats: Tools for Working with Categorical Variables (Factors)
lvls_union       Find all levels in a list of factors
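
For a sense of how the level-modifying functions behave, here is a short sketch. The `animal` factor and the new level names are invented for illustration.

```r
library(forcats)

# Hypothetical data, used only for illustration
animal <- factor(c("cat", "cat", "dog", "cat", "dog", "emu", "fox"))

# Count how often each level occurs (returns a data frame with columns f and n)
counts <- fct_count(animal)
counts$n
#> [1] 3 2 1 1

# Keep the 2 most common levels and lump the rest into "Other"
levels(fct_lump(animal, n = 2))
#> [1] "cat"   "dog"   "Other"

# Rename levels by hand, written as new_name = "old_name"
levels(fct_recode(animal, feline = "cat", canine = "dog"))
#> [1] "feline" "canine" "emu"    "fox"

# Collapse several levels into manually defined groups
levels(fct_collapse(animal, pet = c("cat", "dog"), wild = c("emu", "fox")))
#> [1] "pet"  "wild"
```

Each function returns a new factor and leaves `animal` itself unchanged.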

Details

License GPL-3
Encoding UTF-8
LazyData true
URL http://forcats.tidyverse.org, https://github.com/tidyverse/forcats
BugReports https://github.com/tidyverse/forcats/issues
RoxygenNote 5.0.1.9000
NeedsCompilation no
Packaged 2017-01-22 19:24:31 UTC; hornik
Repository CRAN
Date/Publication 2017-01-23 16:39:48
