

genscore

This repository contains the generalized score matching estimator introduced in the paper "Generalized Score Matching for Non-Negative Data" (http://www.jmlr.org/papers/volume20/18-278/18-278.pdf), an estimator for high-dimensional graphical models or parameters in truncated distributions. It is a generalization of the regularized score matching estimator in "Estimation of High-Dimensional Graphical Models Using Regularized Score Matching" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5476334/, https://github.com/linlina/highscore).

The current (second) version further extends the estimator to distributions on more general domain types:

  1. The real space,
  2. The non-negative orthant of the real space,
  3. A union of intervals as the uniform domain for each component,
  4. The (p-1)-dimensional simplex (all components positive and summing to 1), and
  5. Intersections and unions of domains defined by polynomial inequalities.

The distributions covered include

  1. the univariate truncated normal distribution,
  2. Gaussian graphical models,
  3. truncated Gaussian graphical models,
  4. exponential square-root graphical models (Inouye et al., 2016),
  5. "gamma graphical models" (Yu et al., 2019),
  6. "a-b models" (Yu et al., 2019), and
  7. the A^d model (Aitchison, 1985).

Installation from GitHub

install.packages(c("devtools", "knitr"))
devtools::install_github("sqyu/genscore", build_vignettes=TRUE)
# Set build_vignettes to FALSE if you do not wish to build the vignette (which takes a few minutes).

Usage

For a complete guide to its usage, consult the package vignette (also available as precompiled HTML), which can be opened from R with:

vignette("gen_vignette")

References

Some parts of the code were initially derived from https://github.com/linlina/highscore and http://www1.maths.leeds.ac.uk/~wally.gilks/adaptive.rejection/web_page/Welcome.html.

John Aitchison. A general class of distributions on the simplex. Journal of the Royal Statistical Society: Series B (Methodological), 47(1):136–146, 1985. https://doi.org/10.1111/j.2517-6161.1985.tb01341.x

David Inouye, Pradeep Ravikumar, and Inderjit Dhillon. Square root graphical models: Multivariate generalizations of univariate exponential families that permit positive dependencies. In Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 2445–2453, 2016. http://proceedings.mlr.press/v48/inouye16.html

Lina Lin, Mathias Drton, and Ali Shojaie. Estimation of high-dimensional graphical models using regularized score matching. Electron. J. Stat., 10(1):806–854, 2016. https://doi.org/10.1214/16-EJS1126

Shiqing Yu, Mathias Drton, and Ali Shojaie. Graphical models for non-negative data using generalized score matching. In International Conference on Artificial Intelligence and Statistics, pages 1781–1790, 2018. http://proceedings.mlr.press/v84/yu18b.html

Shiqing Yu, Mathias Drton, and Ali Shojaie. Generalized score matching for non-negative data. Journal of Machine Learning Research, 20(76):1–70, 2019. http://jmlr.org/papers/v20/18-278.html

Installation from CRAN

install.packages('genscore')

Version

1.0.0

License

GPL-3

Maintainer

Shiqing Yu

Last Published

April 3rd, 2020

Functions in genscore (1.0.0)

binarySearch_bin

Finds the index of the bin a number belongs to using binary search.
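
As an illustration of the technique, here is a minimal base-R sketch of binary search over bins, assuming the bins are disjoint closed intervals given by sorted vectors `lefts` and `rights`. The function name `binary_search_bin_sketch` and its arguments are hypothetical; the package's own binarySearch_bin may use a different signature and return convention.

```r
# Hypothetical sketch: return the index i such that lefts[i] <= x <= rights[i],
# or NA if x falls in no bin. Assumes lefts/rights are sorted and the bins
# are disjoint.
binary_search_bin_sketch <- function(x, lefts, rights) {
  lo <- 1L
  hi <- length(lefts)
  while (lo <= hi) {
    mid <- (lo + hi) %/% 2L
    if (x < lefts[mid]) {
      hi <- mid - 1L          # x lies to the left of bin `mid`
    } else if (x > rights[mid]) {
      lo <- mid + 1L          # x lies to the right of bin `mid`
    } else {
      return(mid)             # lefts[mid] <= x <= rights[mid]
    }
  }
  NA_integer_                 # x is in a gap between bins or outside all of them
}

lefts <- c(0, 2, 5)
rights <- c(1, 3, Inf)
binary_search_bin_sketch(2.5, lefts, rights)  # returns 2
binary_search_bin_sketch(4, lefts, rights)    # returns NA
```

Each step halves the range of candidate bins, so the search takes O(log k) comparisons for k bins, compared with O(k) for the naive scan described under naiveSearch_bin.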
crbound_sigma

The Cramér-Rao lower bound (times n) for estimating the variance parameter from a univariate truncated normal sample with known mean parameter.
diff_lists

Computes the sum of absolute differences between two lists.
frac_pow

Evaluates x^(a/b) and |x|^(a/b) for integers a and b, extending the conventional power operation.
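
In base R, `(-8)^(1/3)` is `NaN`, because a fractional power of a negative number is not real in general. One natural extension, assumed here rather than taken from the package source, is to use the real b-th root when b is odd. The function `frac_pow_sketch` and its `use_abs` argument are hypothetical.

```r
# Hypothetical sketch of x^(a/b) for integers a and b (b > 0). With
# use_abs = TRUE it returns |x|^(a/b); otherwise, for odd b it uses the real
# b-th root (so negative x is allowed), and for even b it gives NaN for x < 0.
frac_pow_sketch <- function(x, a, b, use_abs = FALSE) {
  if (use_abs) return(abs(x)^(a / b))
  if (b %% 2 == 1) {
    # x^(a/b) = (x^(1/b))^a, and the real b-th root of x is sign(x) * |x|^(1/b)
    return(sign(x)^a * abs(x)^(a / b))
  }
  ifelse(x < 0, NaN, x^(a / b))
}

(-8)^(1/3)                                 # NaN in base R
frac_pow_sketch(-8, 1, 3)                  # -2
frac_pow_sketch(-8, 2, 3)                  # approximately 4
frac_pow_sketch(-8, 1, 3, use_abs = TRUE)  # 2
```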
find_max_ind

Finds the max index in a vector that does not exceed a target number.
eBIC

eBIC score with or without refitting.
get_dist

Finds the distance from each element of a matrix x to the boundary of the domain, with the other elements in the same row held fixed.
estimate

The main function for the generalized score-matching estimator for graphical models.
get_elts_exp

The R implementation to get the elements necessary for calculations for the exponential square-root setting (a=0.5, b=0.5).
get_elts_gamma

The R implementation to get the elements necessary for calculations for the gamma setting (a=0.5, b=0).
get_crit_nopenalty

Minimized loss for unpenalized restricted asymmetric models.
get_elts_loglog

The R implementation to get the elements necessary for calculations for the log-log setting (a=0, b=0).
get_elts_gauss

The R implementation to get the elements necessary for calculations for the Gaussian setting on R^p.
get_g0

Calculates the l2 distance to the boundary of the domain and its gradient for some domains.
get_trun

The truncation point for h.
get_g0_ada

Adaptively truncates the l2 distance to the boundary of the domain and its gradient for some domains.
h_of_dist

Finds the distance from each element of a matrix x to the boundary of the domain, with the other elements in the same row held fixed (dist(x, domain)), and calculates the element-wise h(dist(x, domain)) and h'(dist(x, domain)) (w.r.t. each element in x).
gcd

Finds the greatest (positive) common divisor of two integers.
gen

Random data generator from general a-b distributions with general domain types, assuming a and b are rational numbers.
make_domain

Creates a list of elements that defines the domain for a multivariate distribution.
get_h_hp

Generator of h and hp (derivative of h) functions.
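
The generalized score matching loss evaluates a function h of the distance to the boundary together with its derivative h', which is why a generator returns both. Below is a minimal sketch of such a generator using closures, assuming the truncated power form h(x) = min(x^power, trunc); the function name `make_min_pow_h_hp` and its arguments are hypothetical and not necessarily one of the package's built-in modes.

```r
# Hypothetical generator: returns h(x) = min(x^power, trunc) and its
# derivative hp(x) for x >= 0 (elementwise). The derivative is taken to be 0
# on the flat part, where x^power >= trunc.
make_min_pow_h_hp <- function(power, trunc) {
  stopifnot(power > 0, trunc > 0)
  h <- function(x) pmin(x^power, trunc)
  hp <- function(x) ifelse(x^power < trunc, power * x^(power - 1), 0)
  list(h = h, hp = hp)
}

hh <- make_min_pow_h_hp(power = 2, trunc = 1)
hh$h(c(0.5, 2))   # 0.25 1.00
hh$hp(c(0.5, 2))  # 1 0
```

Returning the pair from one generator keeps h and hp consistent with each other, since both close over the same `power` and `trunc`.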
make_folds

Helper function for making fold IDs for cross validation.
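
A common way to build fold IDs is to recycle the labels 1, ..., nfold over the n observations and then shuffle them, so fold sizes differ by at most one. The sketch below assumes that scheme; `make_folds_sketch` is a hypothetical name, not the package's make_folds.

```r
# Hypothetical sketch: assign each of n observations to one of nfold folds
# so that fold sizes differ by at most one, in random order.
make_folds_sketch <- function(n, nfold) {
  stopifnot(nfold >= 1, nfold <= n)
  sample(rep_len(seq_len(nfold), n))
}

set.seed(1)
ids <- make_folds_sketch(n = 10, nfold = 3)
table(ids)  # fold sizes 4, 3, 3
# Observations in fold k form the test set; the rest form the training set.
test_idx <- which(ids == 1)
```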
get_h_hp_adaptive

Generator of adaptive h and hp (derivative of h) functions.
diff_vecs

Computes the sum of absolute differences in the finite non-NA/NULL elements between two vectors.
in_bound

Returns whether a vector or each row of a matrix falls inside a domain.
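
To make the domain types listed at the top concrete, here is a base-R sketch of row-wise membership checks for three of them: the non-negative orthant, a union of intervals applied to every component, and the simplex. The functions `in_rplus`, `in_uniform` and `in_simplex` are hypothetical; the package's in_bound works on domain objects created by make_domain instead.

```r
# Hypothetical sketch of domain-membership checks for a matrix x whose rows are
# points in R^p. Each function returns one logical per row.
in_rplus <- function(x) apply(x, 1, function(r) all(r >= 0))

# Each coordinate must lie in the same union of closed intervals
# [lefts[k], rights[k]] (the "uniform" domain type).
in_uniform <- function(x, lefts, rights) {
  apply(x, 1, function(r)
    all(vapply(r, function(v) any(v >= lefts & v <= rights), logical(1))))
}

# All components positive and summing to 1, up to a numerical tolerance.
in_simplex <- function(x, tol = 1e-10) {
  apply(x, 1, function(r) all(r > 0) && abs(sum(r) - 1) < tol)
}

x <- rbind(c(0.2, 0.8), c(-0.1, 1.1), c(0.5, 0.5))
in_rplus(x)                                           # TRUE FALSE TRUE
in_uniform(x, lefts = c(0, 0.6), rights = c(0.3, 1))  # TRUE FALSE FALSE
in_simplex(x)                                         # TRUE FALSE TRUE
```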
makecoprime

Makes two integers coprime.
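
Making two integers coprime amounts to dividing both by their greatest common divisor, which Euclid's algorithm computes; this is how an exponent a/b can be reduced to lowest terms. Both `gcd_sketch` and `makecoprime_sketch` are hypothetical names, and the package's gcd and makecoprime may treat signs differently.

```r
# Euclid's algorithm for the greatest (positive) common divisor of two integers.
gcd_sketch <- function(a, b) {
  a <- abs(a); b <- abs(b)
  while (b != 0) {
    r <- a %% b
    a <- b
    b <- r
  }
  a
}

# Hypothetical sketch: divide both integers by their gcd so they become coprime.
makecoprime_sketch <- function(a, b) {
  g <- gcd_sketch(a, b)
  if (g == 0) return(c(a, b))  # both are zero; nothing to reduce
  c(a, b) %/% g
}

gcd_sketch(12, -18)       # 6
makecoprime_sketch(6, 4)  # 3 2
```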
interval_intersection

Finds the intersection between two unions of intervals.
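
When each union is stored as sorted, disjoint intervals, the intersection can be found in one linear sweep with two pointers, similar to merging sorted lists. The sketch assumes closed intervals given by `lefts`/`rights` vectors; `interval_intersection_sketch` is a hypothetical name and the package may handle open endpoints differently.

```r
# Hypothetical sketch: intersect two unions of closed intervals, each given as
# sorted, disjoint endpoints. Returns a list with lefts and rights.
interval_intersection_sketch <- function(l1, r1, l2, r2) {
  out_l <- numeric(0); out_r <- numeric(0)
  i <- 1; j <- 1
  while (i <= length(l1) && j <= length(l2)) {
    lo <- max(l1[i], l2[j])
    hi <- min(r1[i], r2[j])
    if (lo <= hi) {           # the current pair of intervals overlaps
      out_l <- c(out_l, lo); out_r <- c(out_r, hi)
    }
    # Advance whichever interval ends first; it cannot overlap anything later.
    if (r1[i] < r2[j]) i <- i + 1 else j <- j + 1
  }
  list(lefts = out_l, rights = out_r)
}

# [0, 3] U [5, 10] intersected with [2, 6] U [8, Inf)
interval_intersection_sketch(c(0, 5), c(3, 10), c(2, 8), c(6, Inf))
# lefts 2 5 8; rights 3 6 10
```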
read_exponent

Parses the exponent part into power_numer and power_denom.
random_init_uniform

Generates random numbers from a finite union of intervals.
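
To sample uniformly from a finite union of disjoint finite intervals, one can first pick an interval with probability proportional to its length and then draw uniformly inside it. A sketch under that assumption; `runif_union_sketch` is hypothetical and not the package's random_init_uniform.

```r
# Hypothetical sketch: draw n points uniformly from a finite union of disjoint
# finite intervals [lefts[k], rights[k]].
runif_union_sketch <- function(n, lefts, rights) {
  lens <- rights - lefts
  stopifnot(all(is.finite(lens)), all(lens > 0))
  # Choose an interval for each draw with probability proportional to length.
  k <- sample.int(length(lefts), n, replace = TRUE, prob = lens)
  runif(n, min = lefts[k], max = rights[k])
}

set.seed(2)
z <- runif_union_sketch(1000, lefts = c(0, 2), rights = c(1, 5))
mean(z > 2)  # close to 3/4, the share of the total length in [2, 5]
```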
get_elts

The function wrapper to get the elements necessary for calculations for all settings.
domain_for_C

Returns a list to be passed to C that represents the domain.
get_elts_ab

The R implementation to get the elements necessary for calculations for general a and b.
get_safe_log_h_hp

Asymptotic log of h and hp functions for large x for some modes.
s_output

Helper function for outputting if verbose.
interval_union

Finds the union of two unions of intervals.
get_results

Estimates \(\mathbf{K}\) and \(\boldsymbol{\eta}\) using elts from get_elts() given one \(\lambda_{\mathbf{K}}\) (and \(\lambda_{\boldsymbol{\eta}}\) if non-profiled non-centered), applying warm starts with strong screening rules.
lambda_max

Analytic solution for the minimum \(\lambda_{\mathbf{K}}\) that gives the empty graph.
random_init_polynomial

Randomly generates an initial point in the domain defined by a single polynomial with no negative coefficients.
mu_sigmasqhat

Estimates the mu and sigma squared parameters from a univariate truncated normal sample.
random_init_simplex

Generates a random point in the (p-1)-simplex.
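
One standard way to generate a random point in the (p-1)-simplex is to normalize p independent Exp(1) draws, which yields the flat Dirichlet(1, ..., 1) distribution, i.e. a uniform point on the simplex. The sketch assumes that method; `random_simplex_sketch` is hypothetical and the package may use a different construction.

```r
# Hypothetical sketch: a uniformly random point in the interior of the
# (p-1)-simplex, obtained by normalizing p i.i.d. Exp(1) draws.
random_simplex_sketch <- function(p) {
  e <- rexp(p)
  e / sum(e)
}

set.seed(3)
u <- random_simplex_sketch(4)
sum(u)  # 1 up to rounding; all components are positive
```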
update_finite_infinity_for_uniform

Returns the maximum of finite_infinity and 10 times the largest absolute value among the finite elements of lefts and rights.
read_exponential

Parses the integer coefficient in an exponential term.
varhat

Asymptotic variance (times n) of the estimator for mu or sigmasq for the univariate truncated normal assuming the other parameter is known.
tp_fp

Calculates the true and false positive rates given the estimated and true edges.
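
For an undirected graph, the true positive rate is the fraction of true edges that are selected, and the false positive rate is the fraction of non-edges that are selected, counting each pair once. A sketch assuming both graphs are given as symmetric p x p matrices whose non-zero off-diagonal entries mark edges; `tp_fp_sketch` is a hypothetical name.

```r
# Hypothetical sketch: true and false positive rates of an estimated graph,
# comparing the strict upper triangles of two p x p adjacency matrices.
tp_fp_sketch <- function(est, truth) {
  ut <- upper.tri(truth)
  e <- est[ut] != 0
  tr <- truth[ut] != 0
  c(tpr = sum(e & tr) / sum(tr),    # fraction of true edges recovered
    fpr = sum(e & !tr) / sum(!tr))  # fraction of non-edges falsely selected
}

truth <- matrix(0, 3, 3); truth[1, 2] <- truth[2, 1] <- 1
est <- matrix(0, 3, 3); est[1, 2] <- est[2, 1] <- 1; est[1, 3] <- est[3, 1] <- 1
tp_fp_sketch(est, truth)  # tpr 1.0, fpr 0.5
```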
read_uniform_term

Attempts to parse a single term in x into power_numer and power_denom.
search_bin

Finds the index of the bin a number belongs to.
refit

Loss for a refitted (restricted) unpenalized model.
rlaplace_truncated_centered

Generates centered Laplace variables with scale 1.
parse_ab

Parses an ab setting into rational numbers a and b.
ran_mat

Random generator of matrices with given eigenvalues.
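
A symmetric matrix with prescribed eigenvalues can be built as Q diag(ev) Q^T for any orthogonal Q. The sketch below takes Q from the QR decomposition of a Gaussian matrix, which is one common choice; `ran_mat_sketch` is hypothetical and not necessarily how the package's ran_mat draws Q.

```r
# Hypothetical sketch: a random symmetric p x p matrix with eigenvalues ev,
# formed as Q diag(ev) Q^T with Q orthogonal (from qr() of a Gaussian matrix).
ran_mat_sketch <- function(ev) {
  p <- length(ev)
  Q <- qr.Q(qr(matrix(rnorm(p * p), p, p)))
  Q %*% diag(ev, nrow = p) %*% t(Q)
}

set.seed(4)
M <- ran_mat_sketch(c(3, 2, 1))
eigen(M, symmetric = TRUE)$values  # 3 2 1, up to rounding
```

With all eigenvalues positive, the result is positive definite and can serve as an inverse covariance matrix in a simulation.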
get_postfix_rule

Changes a logical expression in infix notation to postfix notation using the shunting-yard algorithm.
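
The shunting-yard algorithm reads tokens left to right, sends operands straight to the output, and holds operators on a stack until an operator of lower precedence or a closing parenthesis forces them out. A sketch for rules built from operand tokens, "&", "|" and parentheses, assuming "&" binds tighter than "|" (the usual convention) and that the input is already tokenized; `to_postfix_sketch` is a hypothetical name.

```r
# Hypothetical sketch of the shunting-yard algorithm. Input and output are
# character vectors of tokens; "&" has higher precedence than "|", and both
# are left-associative.
to_postfix_sketch <- function(tokens) {
  prec <- c("&" = 2, "|" = 1)
  out <- character(0)
  ops <- character(0)  # operator stack; the top is the last element
  for (tok in tokens) {
    if (tok %in% names(prec)) {
      # Pop operators of greater or equal precedence (left associativity).
      while (length(ops) > 0 && ops[length(ops)] != "(" &&
             prec[[ops[length(ops)]]] >= prec[[tok]]) {
        out <- c(out, ops[length(ops)]); ops <- ops[-length(ops)]
      }
      ops <- c(ops, tok)
    } else if (tok == "(") {
      ops <- c(ops, tok)
    } else if (tok == ")") {
      while (length(ops) > 0 && ops[length(ops)] != "(") {
        out <- c(out, ops[length(ops)]); ops <- ops[-length(ops)]
      }
      if (length(ops) == 0) stop("mismatched parentheses")
      ops <- ops[-length(ops)]  # discard the "("
    } else {
      out <- c(out, tok)        # operand, e.g. the index of an inequality
    }
  }
  while (length(ops) > 0) {
    if (ops[length(ops)] == "(") stop("mismatched parentheses")
    out <- c(out, ops[length(ops)]); ops <- ops[-length(ops)]
  }
  out
}

to_postfix_sketch(c("1", "|", "2", "&", "3"))
# "1" "2" "3" "&" "|"
to_postfix_sketch(c("(", "1", "|", "2", ")", "&", "3"))
# "1" "2" "|" "3" "&"
```

The postfix form can then be evaluated with a single stack, without any further precedence handling.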
get_elts_trun_gauss

The R implementation to get the elements necessary for calculations for the Gaussian setting (a=1, b=1) on domains other than R^p.
naiveSearch_bin

Finds the index of the bin a number belongs to using naive search.
parse_ineq

Parses an ineq expression into a list of elements that represents the ineq.
get_elts_loglog_simplex

The R implementation to get the elements necessary for calculations for the log-log setting (a=0, b=0) on the p-simplex.
get_h_hp_vector

Generator of h and hp (derivative of h) functions.
s_at

Returns the character at a position of a string.
rexp_truncated

Generates translated and truncated exponential variables.
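
One way to generate translated and truncated exponential variables is inverse-CDF sampling: shift an Exp(rate) variable to start at `lo`, restrict it to [lo, hi], and invert the resulting CDF. The sketch assumes this parameterization; `rexp_truncated_sketch` and its arguments are hypothetical.

```r
# Hypothetical sketch: draw n values from Exp(rate) translated to start at lo
# and truncated to [lo, hi], by inverting the truncated CDF
#   F(x) = (1 - exp(-rate * (x - lo))) / (1 - exp(-rate * (hi - lo))).
rexp_truncated_sketch <- function(n, lo, hi = Inf, rate = 1) {
  stopifnot(hi > lo, rate > 0)
  mass <- -expm1(-rate * (hi - lo))  # 1 - exp(-rate * (hi - lo)), computed stably
  u <- runif(n)
  lo - log1p(-u * mass) / rate
}

set.seed(5)
y <- rexp_truncated_sketch(1000, lo = 1, hi = 2, rate = 3)
range(y)  # within [1, 2]
```

Using `expm1` and `log1p` keeps the computation accurate when rate * (hi - lo) is small.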
read_one_term

Parses the first term of a non-uniform expression.
rlaplace_truncated

Generates Laplace variables truncated to a finite union of intervals.
test_lambda_bounds

Searches for a tight bound for \(\lambda_{\boldsymbol{K}}\) that gives the empty or complete graph, starting from a given lambda with a given step size.
test_lambda_bounds2

Searches for a tight bound for \(\lambda_{\boldsymbol{K}}\) that gives the empty or complete graph, starting from a given lambda.
cov_cons

Random generator of inverse covariance matrices.
AUC

Calculates the AUC of an ROC curve.
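
The area under an ROC curve is usually computed with the trapezoidal rule over points sorted by false positive rate. The sketch below assumes the curve is given as vectors of (fpr, tpr) points and anchors it at (0, 0) and (1, 1); `auc_sketch` is a hypothetical name and the package's AUC may treat the endpoints differently.

```r
# Hypothetical sketch: area under an ROC curve given as points (fpr, tpr),
# computed with the trapezoidal rule after sorting by false positive rate.
auc_sketch <- function(fpr, tpr) {
  o <- order(fpr, tpr)
  x <- c(0, fpr[o], 1)
  y <- c(0, tpr[o], 1)
  # Width of each segment times the average height at its two ends.
  sum(diff(x) * (y[-1] + y[-length(y)]) / 2)
}

auc_sketch(fpr = c(0.1, 0.5), tpr = c(0.6, 0.9))  # 0.805
auc_sketch(fpr = 0.5, tpr = 0.5)                  # 0.5, a random guess
```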
avgrocs

Takes the vertical average of ROC curves.
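
Vertical averaging fixes a grid of false positive rates, reads each curve's true positive rate at those points by linear interpolation, and averages across curves. A sketch under the assumption that each curve is a list with `fpr` and `tpr` vectors and is anchored at (0, 0) and (1, 1); `avg_roc_sketch` is a hypothetical name.

```r
# Hypothetical sketch of vertical averaging of ROC curves.
avg_roc_sketch <- function(rocs, grid = seq(0, 1, by = 0.1)) {
  tprs <- do.call(cbind, lapply(rocs, function(r) {
    # Anchor each curve at (0, 0) and (1, 1), then interpolate onto the grid;
    # at a repeated fpr (a vertical jump) the largest tpr is used.
    approx(x = c(0, r$fpr, 1), y = c(0, r$tpr, 1), xout = grid, ties = max)$y
  }))
  data.frame(fpr = grid, tpr = rowMeans(tprs))
}

rocs <- list(list(fpr = c(0.2, 0.6), tpr = c(0.5, 0.8)),
             list(fpr = 0.4, tpr = 0.9))
avg_roc_sketch(rocs, grid = c(0, 0.4, 1))  # tpr 0, 0.775, 1
```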
compare_two_sub_results

Compares two lists returned from get_results().
check_endpoints

Checks if two equally sized numeric vectors satisfy the requirements for being left and right endpoints of a domain defined as a union of intervals.
beautify_rule

Replaces runs of consecutive "&"s and "|"s in a string with a single "&" and "|", respectively.
crbound_mu

The Cramér-Rao lower bound (times n) for estimating the mean parameter from a univariate truncated normal sample with known variance parameter.
compare_two_results

Compares two lists returned from estimate().
calc_crit

Calculates the penalized or unpenalized loss in K and eta given arbitrary data.