genscore

This repository contains the generalized score matching estimator introduced in the paper "Generalized Score Matching for Non-Negative Data" (https://www.jmlr.org/papers/volume20/18-278/18-278.pdf), an estimator for high-dimensional graphical models or parameters in truncated distributions. It is a generalization of the regularized score matching estimator in "Estimation of High-Dimensional Graphical Models Using Regularized Score Matching" (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5476334/, https://github.com/linlina/highscore).

The current (second) version further generalizes the distributions to general domain types, including

  1. The real space,
  2. The non-negative orthant of the real space,
  3. A union of intervals as the uniform domain for each component,
  4. The (p-1)-dimensional simplex (with all components positive and summing to 1), and
  5. Intersections and unions of domains defined by polynomial inequalities.
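To make the five domain types concrete, here is a minimal base-R sketch of membership checks. The function names are hypothetical and are not part of genscore (the package has its own `make_domain` and `in_bound`); boundary conventions (open versus closed) are an assumption chosen for simplicity.

```r
# Hypothetical helpers, NOT genscore functions: each returns TRUE when the
# numeric vector x lies in the corresponding domain type.

# 1. The real space: any finite vector.
in_real <- function(x) all(is.finite(x))

# 2. The non-negative orthant.
in_nonneg_orthant <- function(x) all(x >= 0)

# 3. A union of intervals [lefts[k], rights[k]], applied to every component.
in_interval_union <- function(x, lefts, rights) {
  all(vapply(x, function(xi) any(xi >= lefts & xi <= rights), logical(1)))
}

# 4. The (p-1)-dimensional simplex: positive components that sum to 1.
in_simplex <- function(x, tol = 1e-10) all(x > 0) && abs(sum(x) - 1) < tol

# 5. A polynomial inequality (here the unit ball), and an intersection of two
#    domains built with &&; a union would use || instead.
in_unit_ball <- function(x) sum(x^2) <= 1
in_ball_and_orthant <- function(x) in_unit_ball(x) && in_nonneg_orthant(x)
```

For example, `in_interval_union(c(0.5, 2.5), lefts = c(0, 2), rights = c(1, 3))` checks each component against the union [0, 1] ∪ [2, 3].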

The distributions covered include

  1. the univariate truncated normal distribution,
  2. Gaussian graphical models,
  3. truncated Gaussian graphical models,
  4. exponential square-root graphical models (Inouye et al, 2016),
  5. "gamma graphical models" (Yu et al, 2019),
  6. "a-b models" (Yu et al, 2019), and
  7. the A^d model (Aitchison, 1985).
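Several of these families are special cases of the a-b models, whose density, as I understand it from Yu et al. (2019), is proportional to exp(-(1/(2a)) (x^a)' K (x^a) + (1/b) eta' (x^b)), with log(x) taking the place of x^a/a or x^b/b when the power is 0. The sketch below evaluates that unnormalized log-density in base R. The function name `ab_log_density` is hypothetical and is not part of genscore; treat the formula as an assumption to be checked against the paper.

```r
# Hypothetical helper, NOT a genscore function: unnormalized log-density of an
# a-b model at a point x. For non-integer powers, x must be non-negative
# (strictly positive when a or b is 0, since log(x) is used).
ab_log_density <- function(x, K, eta, a, b) {
  # Quadratic term: -(1/(2a)) (x^a)' K (x^a); for a = 0, -(1/2) log(x)' K log(x).
  xa <- if (a == 0) log(x) else x^a
  quad <- -drop(t(xa) %*% K %*% xa) / (if (a == 0) 2 else 2 * a)
  # Linear term: (1/b) eta' (x^b); for b = 0, eta' log(x).
  xb <- if (b == 0) log(x) else x^b / b
  quad + sum(eta * xb)
}

K <- diag(2)
# Gaussian setting (a = 1, b = 1): -x'Kx/2 + eta'x.
ab_log_density(c(1, 2), K, eta = c(0, 0), a = 1, b = 1)
# Gamma setting (a = 0.5, b = 0): -sqrt(x)' K sqrt(x) + eta' log(x).
ab_log_density(c(1, 4), K, eta = c(1, 1), a = 0.5, b = 0)
```

The (a, b) pairs used here match the settings named in the function index: gaussian (1, 1), exponential square-root (0.5, 0.5), gamma (0.5, 0), and log-log (0, 0).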

Installation from CRAN

install.packages("genscore")

Installation from GitHub

install.packages(c("devtools", "knitr"))
devtools::install_github("sqyu/genscore")

Usage

For a complete guide to its usage, please consult the package vignette, which can be opened from within R:

vignette("gen_vignette")

References

Some parts of the code were initially derived from https://github.com/linlina/highscore and http://www1.maths.leeds.ac.uk/~wally.gilks/adaptive.rejection/web_page/Welcome.html.

John Aitchison. A general class of distributions on the simplex. Journal of the Royal Statistical Society: Series B (Methodological), 47(1):136–146, 1985. https://doi.org/10.1111/j.2517-6161.1985.tb01341.x

David Inouye, Pradeep Ravikumar, and Inderjit Dhillon. Square root graphical models: Multivariate generalizations of univariate exponential families that permit positive dependencies. In Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 2445–2453, 2016. https://proceedings.mlr.press/v48/inouye16.html

Lina Lin, Mathias Drton, and Ali Shojaie. Estimation of high-dimensional graphical models using regularized score matching. Electronic Journal of Statistics, 10(1):806–854, 2016. https://doi.org/10.1214/16-EJS1126

Shiqing Yu, Mathias Drton, and Ali Shojaie. Graphical models for non-negative data using generalized score matching. In International Conference on Artificial Intelligence and Statistics, pages 1781–1790, 2018. https://proceedings.mlr.press/v84/yu18b.html

Shiqing Yu, Mathias Drton, and Ali Shojaie. Generalized score matching for non-negative data. Journal of Machine Learning Research, 20(76):1–70, 2019. https://jmlr.org/papers/v20/18-278.html

Package details

Version: 1.0.2.2
License: GPL-3
Maintainer: Shiqing Yu
Last Published: December 16th, 2023
Monthly Downloads: 202

Functions in genscore (1.0.2.2)

get_elts

The function wrapper to get the elements necessary for calculations for all settings.
get_elts_ab

The R implementation to get the elements necessary for calculations for general a and b.
compare_two_results

Compares two lists returned from estimate().
crbound_sigma

The Cramér-Rao lower bound (times n) for estimating the variance parameter from a univariate truncated normal sample with known mean parameter.
diff_lists

Computes the sum of absolute differences between two lists.
get_elts_gauss

The R implementation to get the elements necessary for calculations for the gaussian setting on R^p.
compare_two_sub_results

Compares two lists returned from get_results().
get_elts_loglog

The R implementation to get the elements necessary for calculations for the log-log setting (a=0, b=0).
get_elts_loglog_simplex

The R implementation to get the elements necessary for calculations for the log-log setting (a=0, b=0) on the p-simplex.
get_elts_trun_gauss

The R implementation to get the elements necessary for calculations for the gaussian setting (a=1, b=1) on domains other than R^p.
get_elts_gamma

The R implementation to get the elements necessary for calculations for the gamma setting (a=0.5, b=0).
get_g0

Calculates the l2 distance to the boundary of the domain and its gradient for some domains.
get_elts_exp

The R implementation to get the elements necessary for calculations for the exponential square-root setting (a=0.5, b=0.5).
get_h_hp_vector

Generator of h and hp (derivative of h) functions.
crbound_mu

The Cramér-Rao lower bound (times n) for estimating the mean parameter from a univariate truncated normal sample with known variance parameter.
diff_vecs

Computes the sum of absolute differences in the finite non-NA/NULL elements between two vectors.
domain_for_C

Returns a list to be passed to C that represents the domain.
h_of_dist

Finds the distance of each element in a matrix x to the boundary of the domain while fixing the others in the same row (dist(x, domain)), and calculates element-wise h(dist(x, domain)) and h'(dist(x, domain)) (w.r.t. each element in x).
get_trun

The truncation point for h when h is truncated (bounded, but not naturally bounded).
get_postfix_rule

Changes a logical expression in infix notation to postfix notation using the shunting-yard algorithm.
get_safe_log_h_hp

Asymptotic log of h and hp functions for large x for modes with an unbounded h.
get_dist

Finds the distance of each element in a matrix x to the boundary of the domain while fixing the others in the same row.
get_crit_nopenalty

Minimized loss for unpenalized restricted asymmetric models.
get_results

Estimate \(\mathbf{K}\) and \(\boldsymbol{\eta}\) using elts from get_elts() given one \(\lambda_{\mathbf{K}}\) (and \(\lambda_{\boldsymbol{\eta}}\) if non-profiled non-centered) and applying warm-start with strong screening rules.
get_h_hp_adaptive

Generator of adaptive h and hp (derivative of h) functions.
get_h_hp

Generator of h and hp (derivative of h) functions.
interval_intersection

Finds the intersection between two unions of intervals.
in_bound

Returns whether a vector or each row of a matrix falls inside a domain.
read_exponential

Parses the integer coefficient in an exponential term.
get_g0_ada

Adaptively truncates the l2 distance to the boundary of the domain and its gradient for some domains.
read_one_term

Parses the first term of a non-uniform expression.
interval_union

Finds the union between two unions of intervals.
lambda_max

Analytic solution for the minimum \(\lambda_{\mathbf{K}}\) that gives the empty graph.
refit

Loss for a refitted (restricted) unpenalized model.
read_uniform_term

Attempts to parse a single term in x into power_numer and power_denom.
make_folds

Helper function for making fold IDs for cross validation.
make_domain

Creates a list of elements that defines the domain for a multivariate distribution.
random_init_polynomial

Randomly generates an initial point in the domain defined by a single polynomial with no negative coefficient.
random_init_simplex

Generates a random point in the (p-1)-simplex.
update_finite_infinity_for_uniform

Maximum between finite_infinity and 10 times the max abs value of finite elements in lefts and rights.
tp_fp

Calculates the true and false positive rates given the estimated and true edges.
makecoprime

Makes two integers coprime.
mu_sigmasqhat

Estimates the mu and sigma squared parameters from a univariate truncated normal sample.
test_lambda_bounds2

Searches for a tight bound for \(\lambda_{\boldsymbol{K}}\) that gives the empty or complete graph, starting from a given lambda.
test_lambda_bounds

Searches for a tight bound for \(\lambda_{\boldsymbol{K}}\) that gives the empty or complete graph, starting from a given lambda with a given step size.
parse_ab

Parses an ab setting into rational numbers a and b.
naiveSearch_bin

Finds the index of the bin a number belongs to using naive search.
random_init_uniform

Generates random numbers from a finite union of intervals.
read_exponent

Parses the exponent part into power_numer and power_denom.
parse_ineq

Parses an ineq expression into a list of elements that represents the ineq.
ran_mat

Random generator of matrices with given eigenvalues.
s_output

Helper function for outputting if verbose.
search_bin

Finds the index of the bin a number belongs to.
varhat

Asymptotic variance (times n) of the estimator for mu or sigmasq for the univariate normal on a general domain assuming the other parameter is known.
rlaplace_truncated

Generates Laplace variables truncated to a finite union of intervals.
rexp_truncated

Generates translated and truncated exponential variables.
rlaplace_truncated_centered

Generates centered Laplace variables with scale 1.
s_at

Returns the character at a position of a string.
avgrocs

Takes the vertical average of ROC curves.
AUC

Calculates the AUC of an ROC curve.
check_endpoints

Checks if two equally sized numeric vectors satisfy the requirements for being left and right endpoints of a domain defined as a union of intervals.
cov_cons

Random generator of inverse covariance matrices.
calc_crit

Calculates penalized or unpenalized loss in K and eta given arbitrary data.
find_max_ind

Finds the max index in a vector that does not exceed a target number.
gcd

Finds the greatest (positive) common divisor of two integers.
frac_pow

Evaluates x^(a/b) and |x|^(a/b) with integer a and b with extension to conventional operations.
gen

Random data generator from general a-b distributions with general domain types, assuming a and b are rational numbers.
beautify_rule

Replaces consecutive "&"s and "|"s in a string with a single "&" and "|".
binarySearch_bin

Finds the index of the bin a number belongs to using binary search.
estimate

The main function for the generalized score-matching estimator for graphical models.
eBIC

eBIC score with or without refitting.
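
The index above mentions converting a logical expression from infix to postfix notation with the shunting-yard algorithm (get_postfix_rule). As an illustration of that technique only, here is a minimal base-R sketch. The function `to_postfix` is hypothetical and is not the package's implementation; it assumes operands are integer indices, the operators are "&" and "|", and "&" binds tighter than "|" (as in R), which may differ from the convention genscore uses.

```r
# Hypothetical sketch, NOT genscore's get_postfix_rule: infix to postfix
# conversion of a logical rule such as "(1 | 2) & 3" via shunting-yard.
# Assumption: "&" has higher precedence than "|"; both are left-associative.
to_postfix <- function(rule) {
  tokens <- regmatches(rule, gregexpr("[0-9]+|[&|()]", rule))[[1]]
  prec <- c("&" = 2, "|" = 1)
  output <- character(0)  # postfix output queue
  stack <- character(0)   # operator stack (top is the last element)
  for (tok in tokens) {
    if (grepl("^[0-9]+$", tok)) {
      output <- c(output, tok)  # operands go straight to the output
    } else if (tok %in% names(prec)) {
      # Pop operators of greater or equal precedence (left associativity).
      while (length(stack) > 0 && stack[length(stack)] %in% names(prec) &&
             prec[[stack[length(stack)]]] >= prec[[tok]]) {
        output <- c(output, stack[length(stack)])
        stack <- stack[-length(stack)]
      }
      stack <- c(stack, tok)
    } else if (tok == "(") {
      stack <- c(stack, tok)
    } else {  # tok is ")": pop until the matching "("
      while (length(stack) > 0 && stack[length(stack)] != "(") {
        output <- c(output, stack[length(stack)])
        stack <- stack[-length(stack)]
      }
      if (length(stack) == 0) stop("unbalanced parentheses")
      stack <- stack[-length(stack)]  # discard the "("
    }
  }
  if ("(" %in% stack) stop("unbalanced parentheses")
  paste(c(output, rev(stack)), collapse = " ")
}

to_postfix("(1 | 2) & 3")  # "1 2 | 3 &"
to_postfix("1 | 2 & 3")    # "1 2 3 & |"
```

A postfix rule can then be evaluated with a single stack pass, which is why this form is convenient for checking domains defined by intersections and unions of inequalities.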