rare_plot

Generates a rarefaction curve showing the expected number of distinct
categories discovered as sampling effort increases. The curve is estimated
using Monte Carlo permutations of the observation order.
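
The permutation idea described above can be sketched in base R. This is a minimal illustration, not the package's actual implementation: the helper name `rarefaction_sketch` and the 95% quantile band are assumptions made for this example.

```r
# Hypothetical helper, NOT part of nomiShape: Monte Carlo rarefaction in base R.
rarefaction_sketch <- function(x, reps = 1000, max_effort = NULL) {
  x <- x[!is.na(x)]
  n <- length(x)
  if (is.null(max_effort)) max_effort <- n
  max_effort <- min(max_effort, n)

  # One column per permutation: running count of distinct categories
  # seen after 1, 2, ..., max_effort observations.
  curves <- replicate(reps, {
    shuffled <- x[sample.int(n)][seq_len(max_effort)]
    cumsum(!duplicated(shuffled))
  })
  curves <- matrix(curves, nrow = max_effort)  # keep matrix shape in edge cases

  data.frame(
    effort = seq_len(max_effort),
    mean   = rowMeans(curves),                           # expected richness
    lower  = apply(curves, 1, quantile, probs = 0.025),  # assumed 95% band
    upper  = apply(curves, 1, quantile, probs = 0.975)
  )
}

# Toy data: five categories with unequal frequencies.
set.seed(1)
species <- sample(c("a", "b", "c", "d", "e"), size = 60, replace = TRUE,
                  prob = c(0.5, 0.25, 0.15, 0.07, 0.03))
curve <- rarefaction_sketch(species, reps = 200)
head(curve)
```

Each permutation yields a non-decreasing accumulation curve, so the averaged curve always starts at 1 and ends at the number of distinct categories in the sample.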

Provides tools for visualizing and analyzing the shape of discrete nominal frequency distributions. The package introduces centered frequency plots, in which nominal categories are ordered from the most frequent category at the center toward less frequent categories on both sides, facilitating the detection of distributional patterns such as uniformity, dominance, symmetry, skewness, and long-tail behavior. In addition, the package supports Pareto charts for the study of dominance and cumulative frequency structure in nominal data. The package is designed for exploratory data analysis and statistical teaching, offering visualizations that emphasize distributional form rather than arbitrary category ordering.

Norberto Asensio

nomiShape

Visualization and Analysis of Nominal Variable Distributions

rare_plot function


Arguments

Rarefaction curve for nominal variables — rare_plot

<dl>

<dt>df</dt>
<dd>A data frame containing the nominal variable.</dd>


<dt>var</dt>
<dd>Character string specifying the nominal variable column.</dd>


<dt>reps</dt>
<dd>Number of random permutations used to estimate the curve.
The default is 1000. Smaller values can be used to reduce computation
time when working with large datasets, at the cost of less precise
confidence intervals.</dd>


<dt>max_effort</dt>
<dd>Maximum sampling effort to compute. If NULL (default),
the full sample size is used. For very large datasets, this argument
allows users to limit the rarefaction curve to a smaller number of
observations in order to explore how quickly categories accumulate
and to approximate the minimum sample size required to capture most
of the category diversity.</dd>

</dl>
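
A call using the arguments listed above might look like the following sketch. The toy data frame and its `species` column are invented for illustration, the call is guarded so the snippet still runs when nomiShape is not installed, and the return value is not described in this section, so nothing is assumed about it.

```r
# Toy data; the column name "species" is made up for this example.
set.seed(42)
df <- data.frame(
  species = sample(c("oak", "pine", "birch", "elm", "ash"),
                   size = 200, replace = TRUE,
                   prob = c(0.45, 0.25, 0.15, 0.10, 0.05))
)

# Only attempt the calls when nomiShape is available.
if (requireNamespace("nomiShape", quietly = TRUE)) {
  # Full curve with the documented defaults (reps = 1000, max_effort = NULL).
  nomiShape::rare_plot(df = df, var = "species")

  # Faster, coarser run: fewer permutations and a curve limited to a
  # sampling effort of 50 observations.
  nomiShape::rare_plot(df = df, var = "species", reps = 200, max_effort = 50)
}
```

Lowering `reps` trades precision of the intervals for speed, while `max_effort` truncates the curve to the early part of the accumulation.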
