rlda.bernoulliMH: LDA for Bernoulli (binary) data using Metropolis-Hastings.

Description

This function implements Latent Dirichlet Allocation (LDA) with a stick-breaking prior for Bernoulli data. rlda.bernoulliMH works with a binary data.frame.

Usage

rlda.bernoulliMH(data, loc.id, n_community, alpha0, alpha1, gamma,
  n_gibbs, nadapt, ll_prior = TRUE, display_progress = TRUE)

Arguments

data

A binary data.frame where each row is a sampling unit (e.g., plots, locations, times) and each column is a categorical type of element (e.g., species, firms, issues). Every entry in this data.frame must be 0 or 1.
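Before fitting, it can help to confirm that the input meets these requirements. The sketch below uses a hypothetical helper, is_binary_df (not part of Rlda), and a made-up toy table. It only checks the conditions stated above and assumes the columns are numeric 0/1 values.

```r
# Hypothetical helper (not part of Rlda): checks that a data.frame is
# non-empty, has numeric columns, and contains only 0s and 1s (NA fails).
is_binary_df <- function(df) {
  is.data.frame(df) &&
    nrow(df) > 0 && ncol(df) > 0 &&
    all(vapply(df, is.numeric, logical(1))) &&
    all(unlist(df) %in% c(0, 1))
}

# Toy presence/absence table: 3 plots (rows) by 4 species (columns).
toy <- data.frame(sp1 = c(1, 0, 1), sp2 = c(0, 0, 1),
                  sp3 = c(1, 1, 0), sp4 = c(0, 1, 1))

is_binary_df(toy)                             # TRUE
is_binary_df(transform(toy, sp1 = sp1 * 2))   # FALSE: sp1 now contains 2
```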

loc.id

A vector with one entry per row of data identifying the location of that row; repeated values mark repeated presence/absence records from the same location.
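A minimal sketch of two ways loc.id might be built, assuming it holds one entry per row of data: one where every row is its own location (the pattern used in the Examples section) and one where several rows are repeated surveys of the same site. The toy table and site layout are made up for illustration.

```r
# Toy data: 5 rows of presence/absence records for 2 species.
toy <- data.frame(sp1 = c(1, 0, 1, 1, 0), sp2 = c(0, 1, 1, 0, 0))

# Case 1: every row is its own location.
loc_single <- seq(1, nrow(toy))

# Case 2 (hypothetical layout): rows 1-2 are repeated surveys of site 1,
# rows 3-5 are repeated surveys of site 2.
loc_repeated <- c(1, 1, 2, 2, 2)

# Assumed requirement: one loc.id entry per row of data.
stopifnot(length(loc_single) == nrow(toy), length(loc_repeated) == nrow(toy))

# The number of distinct locations is what sizes Theta (see Value).
length(unique(loc_single))    # 5
length(unique(loc_repeated))  # 2
```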

n_community

Total number of communities to return. It must be less than the number of columns of data.

alpha0

Hyperparameter associated with the Beta prior Beta(alpha0, alpha1).

alpha1

Hyperparameter associated with the Beta prior Beta(alpha0, alpha1).
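To see what a given pair of hyperparameters implies, the prior mean of a Beta(a, b) variable is a / (a + b). The sketch assumes Beta(alpha0, alpha1) follows the standard (shape1, shape2) parameterization used by R's rbeta; with the values from the Examples section the prior places most of its mass near 0.

```r
# Assumption: Beta(alpha0, alpha1) means shape1 = alpha0, shape2 = alpha1.
alpha0 <- 0.01
alpha1 <- 0.99

# Prior mean of Beta(a, b) is a / (a + b).
prior_mean <- alpha0 / (alpha0 + alpha1)
prior_mean   # 0.01

# Monte Carlo check: the average of many prior draws should be close
# to prior_mean.
set.seed(1)
draws <- rbeta(1e5, shape1 = alpha0, shape2 = alpha1)
mean(draws)
```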

gamma

Hyperparameter associated with the Stick-Breaking prior.
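A common form of the stick-breaking prior draws V_k ~ Beta(1, gamma) and sets the weight of community k to V_k times the product of (1 - V_j) over j < k. The sketch below assumes this standard construction, truncated at n_community by forcing the last break to 1 so the weights sum to one; whether Rlda uses exactly this parameterization is an assumption. Smaller gamma tends to concentrate weight on the first few communities.

```r
# Truncated stick-breaking weights (assumed construction, not Rlda's code).
stick_breaking <- function(n_community, gamma) {
  v <- rbeta(n_community, shape1 = 1, shape2 = gamma)
  v[n_community] <- 1                               # last piece takes the rest
  remaining <- c(1, cumprod(1 - v)[-n_community])   # stick left before break k
  v * remaining                                     # weight of community k
}

set.seed(42)
w_small <- stick_breaking(5, gamma = 0.1)  # tends to favor early communities
w_large <- stick_breaking(5, gamma = 10)   # tends to spread weight out
round(w_small, 3)
round(w_large, 3)
sum(w_small)  # 1 (up to floating-point error)
```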

n_gibbs

Total number of Gibbs samples.

nadapt

Total number of adaptation iterations.
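In adaptive Metropolis-Hastings samplers, an adaptation phase typically tunes the proposal step size toward a target acceptance rate and then freezes it. The sketch below is a generic, hypothetical one-parameter example of that idea (adaptive_rwmh is not an Rlda function, and Rlda's internal scheme may differ); it targets a standard normal so the result is easy to check.

```r
# Generic adaptive random-walk Metropolis-Hastings for one parameter.
adaptive_rwmh <- function(log_target, init, n_iter, nadapt,
                          step = 1, target_acc = 0.44, batch = 50) {
  x <- init
  out <- numeric(n_iter)
  acc <- 0
  for (i in seq_len(n_iter)) {
    prop <- x + rnorm(1, sd = step)                    # symmetric proposal
    if (log(runif(1)) < log_target(prop) - log_target(x)) {
      x <- prop
      acc <- acc + 1
    }
    out[i] <- x
    # Only during the first nadapt iterations: after each batch, widen the
    # step if acceptance is too high and shrink it if too low.
    if (i <= nadapt && i %% batch == 0) {
      step <- if (acc / batch > target_acc) step * 1.1 else step / 1.1
      acc <- 0
    }
  }
  list(draws = out, step = step)
}

set.seed(7)
fit <- adaptive_rwmh(function(z) dnorm(z, log = TRUE), init = 0,
                     n_iter = 5000, nadapt = 1000, step = 10)
post <- fit$draws[-(1:1000)]   # drop the adaptation phase
c(mean = mean(post), sd = sd(post))
```

Draws from the adaptation phase are discarded because the chain is not a fixed Markov kernel while the step size is still changing.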

ll_prior

Logical scalar: TRUE if the log-likelihood should also include the prior terms, FALSE otherwise.
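As a rough illustration of the difference, consider a single Bernoulli probability p with a Beta(alpha0, alpha1) prior. The sketch assumes that ll_prior = FALSE corresponds to the data log-likelihood alone and ll_prior = TRUE adds the log prior density; the toy data and the value of p are made up, and the exact terms Rlda sums may differ.

```r
# Toy binary observations and a candidate Bernoulli probability p.
y <- c(1, 0, 0, 1, 0)
p <- 0.4
alpha0 <- 0.01
alpha1 <- 0.99

# Assumed ll_prior = FALSE: log-likelihood of the data only.
ll_data <- sum(dbinom(y, size = 1, prob = p, log = TRUE))

# Assumed ll_prior = TRUE: add the log-density of the Beta prior at p.
ll_with_prior <- ll_data + dbeta(p, shape1 = alpha0, shape2 = alpha1, log = TRUE)

c(ll_data = ll_data, ll_with_prior = ll_with_prior)
```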

display_progress

Logical scalar: TRUE if a progress bar should be shown, FALSE otherwise.

Value

An R list with three elements:

Theta

The probability that each observation (e.g., a location) belongs to each cluster (e.g., a community). It is a matrix with n_gibbs rows and length(unique(loc.id)) * n_community columns.
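A typical post-processing step is to average Theta over post-burn-in iterations and reshape the result into a locations-by-communities matrix. The sketch uses a simulated stand-in for res$Theta (random numbers, not real output, so rows do not sum to one as real memberships would) and assumes the location index varies fastest across columns; check the column order in the package before relying on it.

```r
# Simulated stand-in for res$Theta: n_gibbs rows, n_loc * n_community columns.
n_gibbs <- 100; n_loc <- 4; n_community <- 3
set.seed(3)
Theta <- matrix(runif(n_gibbs * n_loc * n_community),
                nrow = n_gibbs, ncol = n_loc * n_community)

# Discard the first half as burn-in and average the remaining draws.
keep <- seq(n_gibbs / 2 + 1, n_gibbs)
theta_mean <- colMeans(Theta[keep, , drop = FALSE])

# Assumption: all locations for community 1 come first, then community 2,
# and so on, so a column-major fill gives a locations x communities matrix.
theta_hat <- matrix(theta_mean, nrow = n_loc, ncol = n_community)
dim(theta_hat)  # 4 3
```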

Phi

The probability associated with each variable (e.g., a species) in each cluster (e.g., a community). It is a matrix with n_gibbs rows and ncol(data) * n_community columns.
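One way to summarize Phi is to list the variables with the highest posterior mean in each community. The sketch uses a simulated stand-in for res$Phi (random numbers, not real output) and assumes the variable (species) index varies fastest across columns, matching the ncol(data) * n_community ordering above; the species names are made up.

```r
# Simulated stand-in for res$Phi: n_gibbs rows, n_species * n_community columns.
n_gibbs <- 100; n_species <- 6; n_community <- 2
species <- paste0("sp", seq_len(n_species))
set.seed(5)
Phi <- matrix(runif(n_gibbs * n_species * n_community),
              nrow = n_gibbs, ncol = n_species * n_community)

# Posterior mean after discarding the first half as burn-in.
phi_mean <- colMeans(Phi[seq(n_gibbs / 2 + 1, n_gibbs), , drop = FALSE])

# Assumption: species vary fastest, giving a species x communities matrix.
phi_hat <- matrix(phi_mean, nrow = n_species, ncol = n_community,
                  dimnames = list(species, paste0("community", seq_len(n_community))))

# Three species with the highest posterior mean in each community.
top <- apply(phi_hat, 2, function(p) names(sort(p, decreasing = TRUE))[1:3])
top
```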

LogLikelihood

The vector of log-likelihoods computed for each Gibbs sample.
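The log-likelihood trace is commonly used to judge convergence and choose a burn-in. The sketch uses a simulated stand-in for res$LogLikelihood (a made-up chain that rises and then levels off) and compares block means; a burn-in of 500 is an illustrative choice, not a recommendation from the package.

```r
# Simulated stand-in for res$LogLikelihood (illustrative numbers only).
set.seed(11)
n_gibbs <- 1000
ll <- -5000 + 4000 * (1 - exp(-(1:n_gibbs) / 50)) + rnorm(n_gibbs, sd = 10)

# Mean log-likelihood of consecutive blocks of 200 iterations; once the
# chain has stabilized, later blocks should have similar means.
blocks <- split(ll, ceiling(seq_along(ll) / 200))
block_means <- vapply(blocks, mean, numeric(1))
round(block_means)

# Drop the early, still-increasing part of the chain as burn-in.
burn_in <- 500
ll_post <- ll[-seq_len(burn_in)]
```

For a visual check, plot(ll, type = "l") draws the trace in base R.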

Details

rlda.bernoulliMH uses a modified Latent Dirichlet Allocation method to construct mixed-membership clusters using Bayesian inference. The data must be a non-empty data.frame with binary values (0 or 1) for each variable (column) in each observation (row).
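Mixed membership means each location is a weighted blend of communities. Under an assumed Bernoulli form of LDA (following the structure in the references, not confirmed against Rlda's source), the probability that species s is present at location l is the sum over communities k of theta[l, k] * phi[k, s]. The small matrices below are made up to show the arithmetic.

```r
# Made-up memberships: locations x communities; each row sums to one.
theta <- matrix(c(0.9, 0.1,
                  0.2, 0.8), nrow = 2, byrow = TRUE)
# Made-up presence probabilities: communities x species; each entry is its
# own probability in [0, 1], so rows need not sum to one.
phi <- matrix(c(0.7, 0.1, 0.5,
                0.0, 0.6, 0.5), nrow = 2, byrow = TRUE)

stopifnot(isTRUE(all.equal(rowSums(theta), c(1, 1))))

# Assumed model: P(y[l, s] = 1) = sum_k theta[l, k] * phi[k, s].
p_presence <- theta %*% phi   # locations x species
p_presence
```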

References

Blei, David M., Andrew Y. Ng, and Michael I. Jordan. "Latent Dirichlet allocation." Journal of Machine Learning Research 3 (2003): 993-1022. http://www.jmlr.org/papers/volume3/blei03a/blei03a.pdf
Valle, Denis, et al. "Decomposing biodiversity data using the Latent Dirichlet Allocation model, a probabilistic multivariate statistical method." Ecology Letters 17.12 (2014): 1591-1601.

Examples


# NOT RUN {
library(Rlda)
# Presence/absence data
data(presence)
# Set seed for reproducibility
set.seed(9842)
# Hyperparameters for each prior distribution
gamma <- 0.1
alpha0 <- 0.01
alpha1 <- 0.99
# Run LDA for Bernoulli data; each row of presence is its own location
res <- rlda.bernoulliMH(data = presence, loc.id = seq(1, nrow(presence)),
                        n_community = 5, alpha0 = alpha0, alpha1 = alpha1,
                        gamma = gamma, n_gibbs = 1000, nadapt = 1000,
                        ll_prior = TRUE, display_progress = TRUE)
# }
