diversity: Main function to compute diversity measures

Description

Main function of the package. The diversity function computes diversity measures for a dataset with entities, categories and values.

Usage

diversity(data, type = "all", category_row = FALSE, dis = NULL,
  method = "euclidean", q = 0, alpha = 1, beta = 1, base = exp(1))

Arguments

data

A numeric matrix with entities $i$ in the rows and categories $j$ in the columns. Cells show the respective value (value of abundance) of entity $i$ in the category $j$. It can also be a transpose of the previous matrix, that is, a matrix with categories

type

A string or a vector of strings with the mnemonics of the diversity measures to compute. The available measures are: "variety", (Shannon) "entropy", "blau", "gini-simpson", "simpson", "true-diversity", "herfindahl-hirschman", "berger-parker", "r

category_row

A flag to indicate that categories are in the rows. The analysis assumes that the categories are in the columns of the matrix. If the categories are in the rows and the entities in the columns, then the parameter "category_row" has to be set to TRUE. The

dis

Optional square matrix of distances or dissimilarities between categories. It allows the user to provide her own matrix of dissimilarities between categories. The category names have to be both in the rows and in the columns, and these must be the exact s

method

The "rao-stirling" and "rao"-diversity indices use a disparity function to measure the distance between objects. If the user does not provide a matrix with disparities by using the parameter 'dis', then a matrix of disparities is computed using the method

q

The parameter used for the true diversity index. This parameter is also used for the Renyi entropy. The default value is 0.

alpha

Parameter for Rao-Stirling diversity. The default value is 1.

beta

Parameter for Rao-Stirling diversity. The default value is 1.

base

Base of the logarithm. Used in Entropy calculations. The default value is exp(1).

Value

A data frame with diversity measures as columns for each entity.

Details

Notation used in the following formulas: $N$, category count; $p_i$, proportion of the entity made up of category $i$; $d_{ij}$, disparity between categories $i$ and $j$; $q$, $\alpha$ and $\beta$, parameters.

The diversity measures available in the package are listed below. The title of each entry gives the mnemonic values that the parameter "type" accepts to compute that measure (e.g. diversity(data, type='variety') or diversity(data, type='v')):

variety, v: Category counts per entity [MacArthur 1965] $$\sum_i(p_i^0)$$.

entropy, e: Shannon entropy per entity [Shannon 1948] $$- \sum_i(p_i \log p_i)$$

Herfindahl-Hirschman, hh, hhi: The Herfindahl-Hirschman Index, used in economics to measure market concentration. $$\sum_i(p_i^2)$$

gini-simpson, gs: Gini-Simpson index per object [Gini 1912]. This measure is also known as the Gibbs-Martin index or the Blau index in sociology, psychology and management studies. $$1 - \sum_i(p_i^2)$$
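The four measures defined so far can be checked by hand with a few lines of base R. This is a minimal sketch of the formulas, not the package's implementation; the abundance vector is made-up illustrative data, and dropping empty categories before taking logarithms is an assumption (it treats $0 \log 0$ as 0).

```r
# Hypothetical abundances of one entity across four categories (illustrative data).
abundance <- c(a = 10, b = 5, c = 5, d = 0)

p     <- abundance / sum(abundance)  # proportions p_i
p_pos <- p[p > 0]                    # assumption: empty categories are ignored

variety      <- length(p_pos)             # sum_i p_i^0 over the categories present
entropy      <- -sum(p_pos * log(p_pos))  # Shannon entropy, natural logarithm
hhi          <- sum(p^2)                  # Herfindahl-Hirschman
gini_simpson <- 1 - hhi                   # Gini-Simpson

print(c(variety = variety, entropy = entropy, hhi = hhi, gini_simpson = gini_simpson))
```

With these proportions (0.5, 0.25, 0.25, 0), the variety is 3, the Herfindahl-Hirschman index is 0.375 and the Gini-Simpson index is 0.625.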

simpson, s: Simpson index per entity [Simpson 1949]. $$D = \sum_i n_i(n_i-1) / (N(N-1))$$ Here $n_i$ is the abundance of category $i$ and $N$ is the total abundance of the entity (not the category count). When this measure is requested, the associated variations, Simpson's Index of Diversity $1-D$ and the Reciprocal Simpson $1/D$, are also computed.
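Unlike the proportion-based measures, the Simpson index works on raw counts. The following base R sketch evaluates the formula and its two variations on hypothetical counts; the variable names are illustrative and the code is not taken from the package.

```r
# Hypothetical counts n_i of one entity in three categories (illustrative data).
n <- c(10, 5, 5)
N <- sum(n)  # total abundance of the entity

D            <- sum(n * (n - 1)) / (N * (N - 1))  # Simpson index
simpson_div  <- 1 - D                             # Simpson's Index of Diversity
simpson_recp <- 1 / D                             # Reciprocal Simpson

print(c(D = D, one_minus_D = simpson_div, reciprocal = simpson_recp))
```

Here the numerator is 90 + 20 + 20 = 130 and the denominator is 20 × 19 = 380, so $D = 130/380$.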

true-diversity, td: True diversity index per entity [Hill 1973]. This measure is parameterized by $q$. When $q=1$ the expression is undefined, so an approximation is computed. The default value for $q$ is 0. $$(\sum_i p_i^q)^{1/(1-q)}$$
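The $q=1$ case can be illustrated with the known limit of the formula, which is the exponential of the Shannon entropy. The sketch below uses that limit directly; how the package itself approximates the value at $q=1$ is not stated here, so treat `true_diversity` as a hypothetical helper rather than the package's code.

```r
# Hypothetical helper: true diversity (Hill number) of order q for proportions p.
true_diversity <- function(p, q) {
  p <- p[p > 0]  # assumption: empty categories are ignored
  if (isTRUE(all.equal(q, 1))) {
    # Limit of the general formula as q -> 1: exp(Shannon entropy).
    return(exp(-sum(p * log(p))))
  }
  sum(p^q)^(1 / (1 - q))
}

p <- c(0.5, 0.25, 0.25)  # illustrative proportions

true_diversity(p, 0)         # order 0: the variety
true_diversity(p, 2)         # order 2: reciprocal of the Herfindahl-Hirschman index
true_diversity(p, 1)         # order 1: exp(entropy)
true_diversity(p, 1 + 1e-6)  # evaluating close to 1 approaches the same value
```

For these proportions, order 0 gives 3 (the number of categories present) and order 2 gives 1/0.375.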

berger-parker, bp: The Berger-Parker index equals the maximum $p_i$ value in the entity, i.e. the proportional abundance of the most abundant category. When this measure is requested, its reciprocal is also computed.

renyi, re: Renyi entropy per object. This measure is a generalization of the Shannon entropy parameterized by $q$. It corresponds to the logarithm of the true diversity index. The default value for $q$ is 0. $$(1-q)^{-1} \log(\sum_i p_i^q)$$
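The relation between the Renyi entropy and the true diversity index can be verified numerically. The sketch below defines both formulas as hypothetical base R helpers (natural logarithm, $q \neq 1$) and checks that one is the logarithm of the other; it is an illustration of the formulas, not the package's code.

```r
# Hypothetical helpers implementing the two formulas for q != 1.
renyi <- function(p, q) {
  p <- p[p > 0]
  log(sum(p^q)) / (1 - q)
}
true_diversity <- function(p, q) {
  p <- p[p > 0]
  sum(p^q)^(1 / (1 - q))
}

p <- c(0.5, 0.25, 0.25)  # illustrative proportions

# The Renyi entropy equals the log of the true diversity for each order tested.
for (q in c(0, 0.5, 2)) {
  stopifnot(isTRUE(all.equal(renyi(p, q), log(true_diversity(p, q)))))
}

renyi(p, 0)  # log of the variety, here log(3)
```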

evenness, ev: Pielou evenness per entity across categories [Pielou, 1969]. It is based on the Shannon entropy, normalized by the logarithm of the variety $v$. $$-\sum_i(p_i \log p_i)/\log{v}$$
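As a quick illustration, evenness reaches 1 when all present categories have equal proportions and falls below 1 otherwise. The helper below is a hypothetical base R sketch of the formula, assuming at least two categories are present so that the denominator is non-zero.

```r
# Hypothetical helper: Pielou evenness for a vector of proportions p.
evenness <- function(p) {
  p <- p[p > 0]
  stopifnot(length(p) > 1)  # log(1) = 0 would make the ratio undefined
  -sum(p * log(p)) / log(length(p))
}

evenness(c(0.25, 0.25, 0.25, 0.25))  # perfectly even
evenness(c(0.5, 0.25, 0.25))         # less even, below 1
```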

rao: Rao diversity. $$\sum_{ij}d_{ij} p_i p_j$$

rao-stirling, rs: Rao-Stirling diversity per entity across categories [Stirling, 2007]. Default values are $\alpha=1$ and $\beta=1$. For the pairwise disparities, the measure can use the Jaccard index, Euclidean distance or cosine similarity, among others. $$\sum_{ij}{d_{ij}}^\alpha {(p_i p_j )}^\beta$$
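The distance-weighted measures can be sketched in base R given a vector of proportions and a disparity matrix. Two assumptions are made here and are not confirmed by this page: the sum runs over all ordered pairs with $i \neq j$ (so each unordered pair is counted twice), and the diagonal is excluded explicitly. The helper `rao_stirling` and the matrix `d` are hypothetical.

```r
# Hypothetical helper: Rao-Stirling diversity for proportions p and disparities d.
rao_stirling <- function(p, d, alpha = 1, beta = 1) {
  stopifnot(length(p) == nrow(d), nrow(d) == ncol(d))
  terms <- d^alpha * outer(p, p)^beta
  diag(terms) <- 0  # assumption: pairs with i == j are excluded (0^0 is 1 in R)
  sum(terms)
}

p <- c(0.5, 0.25, 0.25)  # illustrative proportions
d <- matrix(c(0, 1, 2,
              1, 0, 1,
              2, 1, 0), nrow = 3, byrow = TRUE)  # illustrative disparities

rao_stirling(p, d)             # alpha = beta = 1 gives the Rao form
rao_stirling(p, d, alpha = 0)  # disparities ignored; only the proportions matter
```

Under these assumptions, setting $\alpha=0$ and $\beta=1$ gives $\sum_{i \neq j} p_i p_j = 1 - \sum_i p_i^2$, which is the Gini-Simpson index of the same proportions.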

References

Gini, C. (1912). "Variabilita e mutabilita" (Variability and Mutability). Memorie di metodologica statistica.

Hill, M. (1973). "Diversity and evenness: a unifying notation and its consequences". Ecology 54: 427-432.

MacArthur, R. (1965). "Patterns of Species Diversity". Biological Reviews 40: 510-533.

Pielou, E. (1969). "An Introduction to Mathematical Ecology". Wiley.

Shannon, C. (1948). "A Mathematical Theory of Communication". Bell System Technical Journal 27 (3): 379-423.

Simpson, E. H. (1949). "Measurement of Diversity". Nature 163: 688.

Stirling, A. (2007). "A General Framework for Analysing Diversity in Science, Technology and Society". Journal of the Royal Society Interface 4: 707-719.

Examples

data(pantheon)
diversity(pantheon)
diversity(pantheon, type='variety')
diversity(geese, type='berger-parker', category_row=TRUE)
#reading csv data matrix
path_to_file <- system.file("extdata", "PantheonMatrix.csv", package = "diverse")
X <- read_data(path = path_to_file)
diversity(data=X, type="gini")
diversity(data=X, type="rao-stirling", method="cosine")
diversity(data=X, type="all", method="jaccard")

#reading csv dataframe
path_to_file <- system.file("extdata", "PantheonEdges.csv", package = "diverse")
X <- read_data(path = path_to_file)
#true diversity
diversity(data=X, type="td", q=1)
#rao stirling with different parameters
diversity(data=X, type="rao-stirling", method="euclidean", alpha=0, beta=1)
#more than one diversity measure
diversity(data=X, type=c('e','ev','bp','s'))
