For a single dataframe, the tibble returned contains the columns:
col_name, character vector containing column names of df1.
cnt, an integer column containing the count of unique levels found in each column,
including NA.
common, a character column containing the name of the most common level.
common_pcnt, the percentage of each column occupied by the most common level shown in
common.
levels, a named list containing relative frequency tibbles for each feature.
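As an illustration of what these columns summarise, the following base R sketch computes analogous quantities for a single character vector. The helper summarise_levels() and the column names value and prop are hypothetical, introduced only for this example; this is not the package's own implementation.

```r
# Hypothetical helper: summarise one categorical column in base R.
summarise_levels <- function(x) {
  # Relative frequency of each level, counting NA as its own level
  freq <- prop.table(table(x, useNA = "ifany"))
  top <- which.max(freq)
  list(
    cnt = length(freq),                         # number of unique levels, including NA
    common = names(freq)[top],                  # most common level
    common_pcnt = 100 * as.numeric(freq[top]),  # its share, as a percentage
    levels = data.frame(value = names(freq), prop = as.numeric(freq))
  )
}

x <- c("a", "b", "a", NA, "a", "c")
res <- summarise_levels(x)
res$cnt          # 4
res$common       # "a"
res$common_pcnt  # 50
```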
For a pair of dataframes, the tibble returned contains the columns:
col_name, character vector containing names of columns appearing in both
df1 and df2.
jsd, a numeric column containing the Jensen-Shannon divergence. This measures the
difference in relative frequencies of levels in a pair of categorical features. Values near
0 indicate agreement of the distributions, while values near 1 indicate disagreement.
pval, the p-value corresponding to a null hypothesis test that the true frequencies of the categories are equal.
A small p-value indicates evidence that the two sets of relative frequencies are actually different. The test
is based on a modified Chi-squared statistic.
lvls_1, lvls_2, the relative frequency of levels in each of df1 and df2.
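To make the jsd column concrete, here is a minimal base R sketch of the Jensen-Shannon divergence between two relative-frequency vectors. It assumes base-2 logarithms, which bound the value between 0 and 1, and assumes that both vectors are already aligned to the same set of levels. The function name jsd() is chosen for this example only and may differ from what the package does internally.

```r
# Jensen-Shannon divergence between two relative-frequency vectors.
# Assumption: base-2 logarithms, so the result lies in [0, 1].
jsd <- function(p, q) {
  m <- (p + q) / 2
  # Kullback-Leibler term, treating 0 * log(0) as 0
  kl <- function(a, b) sum(ifelse(a > 0, a * log2(a / b), 0))
  (kl(p, m) + kl(q, m)) / 2
}

jsd(c(0.5, 0.5), c(0.5, 0.5))  # identical distributions: 0
jsd(c(1, 0), c(0, 1))          # no shared levels: 1
```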
For a grouped dataframe, the tibble returned is as for a single dataframe, but where
the first k columns are the grouping columns. There will be as many rows in the result
as there are unique combinations of the grouping variables.
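The shape of a grouped result can be sketched in base R with a toy data frame and a single summarised feature. The data and column names below (g1, g2, f) are invented for illustration, and aggregate() stands in for the package's grouped summary rather than reproducing it.

```r
df <- data.frame(
  g1 = c("x", "x", "y", "y", "y"),
  g2 = c("u", "v", "u", "u", "v"),
  f  = c("a", "b", "a", "a", "c")
)

# Most common level of f within each combination of the grouping columns
out <- aggregate(f ~ g1 + g2, data = df,
                 FUN = function(v) names(which.max(table(v))))

names(out)                       # grouping columns come first: "g1" "g2" "f"
nrow(out)                        # 4
nrow(unique(df[c("g1", "g2")]))  # 4, one row per observed combination
```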