freq: Frequency Table

Description

Creates a frequency table for a vector or variable from a data frame, with options for weighting, sorting, handling labelled data, defining custom missing values, and displaying cumulative percentages.

When styled = TRUE (the default), the function prints a spicy-formatted ASCII table using print.spicy_freq_table() and spicy_print_table(); otherwise, it returns a plain data.frame containing frequencies and proportions.

Usage

freq(
  data,
  x = NULL,
  weights = NULL,
  digits = 1L,
  valid = TRUE,
  cum = FALSE,
  sort = "",
  na_val = NULL,
  labelled_levels = c("prefixed", "labels", "values"),
  factor_levels = c("observed", "all"),
  rescale = TRUE,
  decimal_mark = ".",
  styled = TRUE,
  ...
)

Value

With styled = FALSE, a plain data.frame (with no extra attributes) containing the columns:

value - unique values or factor levels
n - frequency count (weighted if applicable)
prop - proportion of total
valid_prop - proportion among valid (non-missing) responses (if valid = TRUE)
cum_prop, cum_valid_prop - cumulative proportions (if cum = TRUE)
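The following minimal base-R sketch shows how these columns relate to one another for a small numeric vector. It is not spicy's implementation. It assumes that missing values form their own row and that the valid-based columns are left NA on that row.

```r
# Minimal base-R sketch of the styled = FALSE columns (not spicy's own code).
# Assumption: NA forms its own row, and valid-based columns are NA on that row.
x <- c(1, 2, 2, 3, 3, 3, NA)

tab <- table(x, useNA = "ifany")   # counts per value, NA kept as a level
value <- names(tab)                # "1" "2" "3" NA
n <- as.vector(tab)                # 1 2 3 1

is_valid <- !is.na(value)
n_valid <- sum(n[is_valid])        # 6 non-missing observations

prop <- n / sum(n)                               # share of all 7 rows
valid_prop <- ifelse(is_valid, n / n_valid, NA)  # share of the 6 valid rows
cum_prop <- cumsum(prop)                         # running total of prop
cum_valid_prop <- ifelse(is_valid, cumsum(n * is_valid) / n_valid, NA)

res <- data.frame(value, n, prop, valid_prop, cum_prop, cum_valid_prop)
res
```

The denominator is the key difference: prop divides by all rows, including NA, while valid_prop divides only by the non-missing rows.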

With styled = TRUE (default), prints the formatted table to the console and invisibly returns a spicy_freq_table object: the same data.frame carrying rendering metadata as attributes (digits, data_name, var_name, var_label, class_name, n_total, n_valid, weighted, rescaled, weight_var) used by print.spicy_freq_table().

Arguments

data

A data.frame, vector, or factor. If a data frame is provided, specify the target variable x. If both data and x are supplied as vectors, data is ignored with a warning.

x

A variable from data (unquoted).

weights

Optional numeric vector of weights (same length as x). The variable may be referenced as a bare name when it belongs to data, or as a qualified expression like other$w (evaluated in the calling environment), which always takes precedence over data lookup. Observations with NA weights are dropped from the table with a warning; see Details.

digits

Number of decimal digits to display for percentages (default: 1).

valid

Logical. If TRUE (default), display valid percentages (excluding missing values).

cum

Logical. If TRUE, add cumulative percentages; if FALSE (the default), omit them.

sort

Sorting method for values:

"" - no sorting (default)
"+" - increasing frequency
"-" - decreasing frequency
"name+" - alphabetical A-Z
"name-" - alphabetical Z-A
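The five sort codes map naturally onto base R's order(). The sketch below applies them to a small count table. sort_tab() is a hypothetical helper, not part of spicy. Ties keep their input order because order() is stable, but the source does not say whether spicy breaks ties the same way.

```r
# Hypothetical helper mirroring the documented sort codes (not spicy's code).
sort_tab <- function(tab, sort = "") {
  if (identical(sort, "")) return(tab)             # "" keeps the input order
  ord <- switch(sort,
    "+"     = order(tab$n),                        # increasing frequency
    "-"     = order(tab$n, decreasing = TRUE),     # decreasing frequency
    "name+" = order(tab$value),                    # alphabetical A-Z
    "name-" = order(tab$value, decreasing = TRUE), # alphabetical Z-A
    stop("unknown sort code: ", sort)
  )
  tab[ord, , drop = FALSE]
}

tab <- data.frame(value = c("b", "a", "c"), n = c(5, 2, 9))
sort_tab(tab, "-")$value      # "c" "b" "a"
sort_tab(tab, "name+")$value  # "a" "b" "c"
```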

na_val

Atomic vector of numeric or character values to be treated as missing (NA).

For labelled variables (from haven or labelled), this argument must refer to the underlying coded values, not the visible labels.

Example:

library(labelled)
x <- labelled(c(1, 2, 3, 1, 2, 3), labels = c("Low" = 1, "Medium" = 2, "High" = 3))
freq(x, na_val = 1) # Treat all "Low" as missing
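The following base-R sketch explains why na_val must refer to codes rather than labels. The matching is done against the stored values, and a label string never matches a numeric code. recode_na() is a hypothetical helper that assumes a simple %in% match, which may not be how spicy does it internally.

```r
# Hypothetical helper: recode stored values listed in na_val to NA.
recode_na <- function(x, na_val) {
  x[x %in% na_val] <- NA   # matching is on stored values, not labels
  x
}

codes <- c(1, 2, 3, 1, 2, 3)          # codes behind Low / Medium / High
recode_na(codes, na_val = 1)          # NA 2 3 NA 2 3
recode_na(codes, na_val = "Low")      # unchanged: "Low" is not a stored value
recode_na(c(1, 2, 3, 99, 99), 99)     # sentinel 99 becomes NA
```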

labelled_levels

For labelled variables, defines how labels and values are displayed:

"prefixed" or "p" - show labels as [value] label (default)
"labels" or "l" - show only labels
"values" or "v" - show only numeric codes
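The three display modes can be pictured as three ways of turning a named label vector into level strings. In the sketch below, display_levels() is hypothetical. It uses match.arg(), whose partial matching happens to accept the documented one-letter abbreviations, but that is an assumption about behavior and not a description of spicy's code.

```r
# Hypothetical sketch of the three labelled_levels display modes.
display_levels <- function(labels, mode = c("prefixed", "labels", "values")) {
  mode <- match.arg(mode)  # partial matching: "p", "l", "v" also work
  switch(mode,
    prefixed = paste0("[", unname(labels), "] ", names(labels)),  # [value] label
    labels   = names(labels),                                     # label only
    values   = as.character(unname(labels))                       # code only
  )
}

labels <- c("Low" = 1, "Medium" = 2, "High" = 3)
display_levels(labels, "p")  # "[1] Low" "[2] Medium" "[3] High"
```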

factor_levels

Character. Controls how factor and labelled values are displayed in the frequency table. "observed" (the default; matches Stata's tab) shows only levels present in the data. "all" (matches SPSS FREQUENCIES and code_book()'s default) keeps every declared level, including unused ones, which appear with n = 0.
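Base R's table() already shows the same distinction. Tabulating a factor keeps every declared level, while droplevels() first removes unused ones. The snippet below only illustrates the two policies and makes no claim about how spicy implements them.

```r
# Base-R illustration of the two factor_levels policies (not spicy's code).
f <- factor(c("Yes", "No", "Yes"), levels = c("Yes", "No", "Maybe"))

all_levels <- table(f)              # like "all": Yes 2, No 1, Maybe 0
observed <- table(droplevels(f))    # like "observed": Yes 2, No 1
```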

rescale

Logical. If TRUE (default), rescale weights so that their total equals the unweighted sample size (length(weights)). See Details for the interaction with NA weights.

decimal_mark

Character used as the decimal mark in printed percentages. Either "." (the default) or ",". Matches the decimal_mark argument of cross_tab() and the three table_*() helpers, so European-locale users get a consistent experience across the package.
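Base R's formatC() has a decimal.mark argument, so the effect of digits and decimal_mark on printed percentages can be approximated as shown below. fmt_pct() is a hypothetical helper. It assumes percentages are derived from proportions by multiplying by 100, and the print method may format them differently.

```r
# Hypothetical formatter combining digits and decimal_mark (not spicy's code).
fmt_pct <- function(p, digits = 1L, decimal_mark = ".") {
  stopifnot(decimal_mark %in% c(".", ","))  # the two documented choices
  formatC(100 * p, format = "f", digits = digits, decimal.mark = decimal_mark)
}

fmt_pct(c(0.1234, 0.0708, 0.806))                       # "12.3" "7.1" "80.6"
fmt_pct(c(0.1234, 0.0708, 0.806), decimal_mark = ",")  # "12,3" "7,1" "80,6"
```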

styled

Logical. If TRUE (default), print the formatted spicy table. If FALSE, return a plain data.frame with frequency values.

...

Additional arguments passed to print.spicy_freq_table().

Details

This function is designed to mimic common frequency procedures from statistical software such as SPSS or Stata, while integrating the flexibility of R's data structures.

It automatically detects the type of input (vector, factor, or labelled) and applies appropriate transformations, including:

Handling of labelled variables from the labelled or haven packages
Optional recoding of specific values as missing (na_val)
Optional weighting with a rescaling mechanism
Support for cumulative percentages (cum = TRUE)
Multiple display modes for labels via labelled_levels
Schema-vs-observed level display via factor_levels

For factor and labelled inputs, the factor_levels argument controls whether declared-but-unobserved levels appear in the output. The default "observed" drops them (Stata tab behavior); "all" keeps them with n = 0, matching SPSS FREQUENCIES and code_book()'s default. For schema-level inspection without computing frequencies, use varlist() or code_book() with factor_levels = "all".

When weighting is applied (weights), each observation contributes its weight, rather than 1, to the frequencies and percentages. The argument rescale = TRUE normalizes the weights so that their sum equals the unweighted sample size (length(weights)).
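The following base-R sketch assumes that weighted frequencies are per-category sums of weights and that rescaling multiplies every weight by length(w) / sum(w). Under that assumption, rescaling changes the weighted N but leaves the percentages unchanged, because every weight is multiplied by the same constant.

```r
# Base-R sketch of weighted counts and rescaling (assumed, not spicy's code).
sex <- c("Male", "Female", "Female", "Male", "Female")
w   <- c(12, 8, 10, 15, 5)

w_rescaled <- w * length(w) / sum(w)       # weights now sum to length(w) = 5

n_raw      <- tapply(w, sex, sum)          # Female 23, Male 27
n_rescaled <- tapply(w_rescaled, sex, sum) # Female 2.3, Male 2.7

# Same proportions either way: rescaling only changes the weighted N.
all.equal(n_raw / sum(n_raw), n_rescaled / sum(n_rescaled))  # TRUE
```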

Missing values in weights cause those observations to be dropped from the table entirely (with a warning), matching the behavior of cross_tab() in spicy 0.11.0+. With rescale = TRUE, the remaining (non-NA) weights are normalized so that the total weighted N equals the number of rows with non-NA weights. With rescale = FALSE, the total weighted N is the actual sum of the non-NA weights.
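The sketch below traces the two totals described above on a toy example. drop_na_weights() is a hypothetical helper whose warning text is invented for illustration. It only mirrors the documented rules: drop NA-weighted rows, then optionally rescale to the number of kept rows.

```r
# Hypothetical helper mirroring the documented NA-weight rules.
drop_na_weights <- function(x, w, rescale = TRUE) {
  keep <- !is.na(w)
  if (any(!keep)) {
    warning(sum(!keep), " observation(s) with NA weights dropped", call. = FALSE)
  }
  x <- x[keep]
  w <- w[keep]
  if (rescale) w <- w * length(w) / sum(w)  # total becomes the kept-row count
  tapply(w, x, sum)                         # weighted frequency per category
}

sex <- c("Male", "Female", "Female", "Male", "Female", "Male")
w   <- c(12, NA, 10, 15, 5, NA)

sum(drop_na_weights(sex, w, rescale = TRUE))   # 4: rows with non-NA weights
sum(drop_na_weights(sex, w, rescale = FALSE))  # 42: sum of non-NA weights
```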

Examples


library(spicy)

# Frequency table with labelled ordered factor
freq(sochealth, education)
freq(sochealth, self_rated_health, sort = "-")

library(labelled)

# Simple numeric vector
x <- c(1, 2, 2, 3, 3, 3, NA)
freq(x)

# Plain vector with a sentinel value recoded as missing
freq(c(1, 2, 3, 99, 99), na_val = 99)

# Labelled variable (haven-style)
x_lbl <- labelled(
  c(1, 2, 3, 1, 2, 3, 1, 2, NA),
  labels = c("Low" = 1, "Medium" = 2, "High" = 3)
)
var_label(x_lbl) <- "Satisfaction level"

# Treat value 1 ("Low") as missing
freq(x_lbl, na_val = 1)

# Display only labels, add cumulative %
freq(x_lbl, labelled_levels = "labels", cum = TRUE)

# Display values only, sorted descending
freq(x_lbl, labelled_levels = "values", sort = "-")

# Show all declared factor levels, including unused ones (SPSS-style).
# The default "observed" mirrors Stata's `tab` and drops unused levels.
f <- factor(c("Yes", "No", "Yes"), levels = c("Yes", "No", "Maybe"))
freq(f, factor_levels = "all")

# With weighting
df <- data.frame(
  sex = factor(c("Male", "Female", "Female", "Male", NA, "Female")),
  weight = c(12, 8, 10, 15, 7, 9)
)

# Weighted frequencies (normalized)
freq(df, sex, weights = weight, rescale = TRUE)

# Weighted frequencies (without rescaling)
freq(df, sex, weights = weight, rescale = FALSE)

# Base R style, with weights and cumulative percentages
freq(df$sex, weights = df$weight, cum = TRUE)

# Piped version (native |> pipe, R >= 4.1.0), sorted alphabetically Z-A ("name-")
df |> freq(sex, sort = "name-")

# European decimal mark (matches `cross_tab()` and the `table_*()` family)
freq(sochealth, education, decimal_mark = ",")

# Non-styled return (for programmatic use)
f <- freq(df, sex, styled = FALSE)
head(f)