Creates a log-log plot for checking Zipf's law, which states that the frequency of an item (classically, a word) is inversely proportional to its rank in the frequency table. The plot compares the observed frequency distribution of elements with the distribution expected if Zipf's law held exactly.
Usage
zipf_plot(sequences_long)
Value
A `ggplot` object that visualizes the observed and expected frequencies of elements according to Zipf's law. The plot includes:
Rank
The rank of each element based on its frequency, plotted on a log scale.
Count
The observed frequency of each element, plotted on a log scale.
Expected
The frequency each element would have if Zipf's law held exactly, shown as a grey dashed line.
Arguments
sequences_long
A data frame containing at least a column named `element`, which holds the elements of the sequences. The frequency of each distinct element is used to create the plot.
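As an illustration of the expected input shape, the sketch below builds a small data frame by hand. It assumes, based on the argument name, that the data are in long format with one row per element occurrence. The `sequence_id` column and the toy values are hypothetical; only `element` is required by the description above.

```r
# Hypothetical input: one row per element occurrence ("long" format, assumed).
# `sequence_id` is an illustrative extra column; only `element` is documented.
sequences_long <- data.frame(
  sequence_id = c(1, 1, 1, 2, 2, 3),
  element     = c("a", "b", "a", "a", "c", "b"),
  stringsAsFactors = FALSE
)

# Occurrences per distinct element, most frequent first
element_counts <- sort(table(sequences_long$element), decreasing = TRUE)
element_counts
```

A data frame of this shape can then be passed as `zipf_plot(sequences_long)`.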
Details
- **Observed Frequencies**: Calculated from the provided `sequences_long` data frame.
- **Expected Frequencies**: Calculated from Zipf's law, under which an element's frequency is inversely proportional to its rank (frequency ∝ 1 / rank).
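- **Plotting**: Both observed and expected frequencies are plotted on log-log axes, where an exact Zipf distribution falls on a straight line with slope -1, so deviations of the observed points from the dashed line are easy to see.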
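The three steps above can be sketched roughly as follows. This is not the package's implementation: the helper names `zipf_table()` and `zipf_sketch_plot()` are hypothetical, and the scaling constant is an assumption. Here the expected count at rank r is taken to be C / r, with C set to the count of the top-ranked element; the actual function may normalise differently. Ties are ranked in sort order.

```r
# Sketch only, under the assumptions stated above.
zipf_table <- function(sequences_long) {
  # Observed frequencies: occurrences per distinct element, descending
  counts <- sort(table(sequences_long$element), decreasing = TRUE)
  out <- data.frame(
    element = names(counts),
    count   = as.integer(counts),
    stringsAsFactors = FALSE
  )
  out$rank <- seq_len(nrow(out))
  # Assumption: scale the Zipf curve so rank 1 matches the top observed count
  out$expected <- out$count[1] / out$rank
  out
}

zipf_sketch_plot <- function(sequences_long) {
  freq <- zipf_table(sequences_long)
  ggplot2::ggplot(freq, ggplot2::aes(x = rank)) +
    ggplot2::geom_point(ggplot2::aes(y = count)) +
    ggplot2::geom_line(ggplot2::aes(y = expected),
                       colour = "grey50", linetype = "dashed") +
    ggplot2::scale_x_log10() +
    ggplot2::scale_y_log10() +
    ggplot2::labs(x = "Rank", y = "Count")
}

# Toy data that follows Zipf's law exactly: counts 12, 6, 4, 3
toy <- data.frame(
  element = rep(c("a", "b", "c", "d"), times = c(12, 6, 4, 3)),
  stringsAsFactors = FALSE
)
freq <- zipf_table(toy)

# Build the plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- zipf_sketch_plot(toy)
}
```

With the toy data, the observed points coincide with the dashed line; real data typically deviate from it, most visibly at the lowest ranks and in the long tail.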