Given data on a collection of units, this function computes r-values, which are percentiles constructed to maximize the agreement between the reported percentiles and the percentiles of the effect of interest. Additional details about r-values are provided below and can also be found in the listed references.
rvalues(data, family = gaussian, hypers = "estimate", prior = "conjugate",
alpha.grid = NULL, ngrid = NULL, smooth = "none", control = list())
data: a data frame or a matrix with the number of rows equal to the number of sampling units. The first column should contain the main estimates, and the second column should contain the nuisance terms.
family: an argument which determines the sampling distribution; this could be either family = gaussian, family = tdist, family = binomial, or family = poisson.
hypers: values of the hyperparameters; only meaningful when the conjugate prior is used. If set to "estimate", the hyperparameters are found through maximum likelihood; otherwise, the user should supply a vector of length two.
prior: the form of the prior; either prior = "conjugate" or prior = "nonparametric".
alpha.grid: a numeric vector of points in (0,1); this grid is used in the discrete approximation of r-values.
ngrid: number of grid points for alpha.grid; only relevant when alpha.grid = NULL.
smooth: either smooth = "none" or a value between 0 and 10; this determines the level of smoothing applied to the estimate of \(\lambda(\alpha)\) (see below for the definition of \(\lambda(\alpha)\)). If smooth is given a number, the number is used as the bass argument in supsmu.
control: a list of control parameters for estimation of the prior; only used when the prior is nonparametric.
An object of class "rvals" which is a list containing at least the following components:
main: a data frame containing the r-values and the r-value rankings, along with the rankings from several other common procedures.
aux: a list containing other extraneous information.
rvalues: a vector of r-values.
The r-value computation assumes the following two-level sampling model:
\(X_i \mid \theta_i \sim p(x \mid \theta_i, \eta_i)\) and \(\theta_i \sim F\), for \(i = 1, \ldots, n\),
with parameters of interest \(\theta_i\), effect size estimates \(X_i\), and nuisance terms \(\eta_i\). The form of \(p(x \mid \theta_i, \eta_i)\) is determined by the family argument. When family = gaussian, it is assumed that \(X_i \mid \theta_i, \eta_i \sim N(\theta_i, \eta_i^{2})\). When family = binomial, the \((X_i, \eta_i)\) represent the number of successes and the number of trials, respectively, and it is assumed that \(X_i \mid \theta_i, \eta_i \sim \mathrm{Binomial}(\theta_i, \eta_i)\). When family = poisson, the \(X_i\) should be counts, and it is assumed that \(X_i \mid \theta_i, \eta_i \sim \mathrm{Poisson}(\theta_i \eta_i)\).
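As an illustration of the data layout implied by this model, the following base R sketch simulates a two-column data set for each of the three families described above. The particular choices of \(F\) (a standard normal, a Beta(2, 8), and a Gamma(2, 1)) and the ranges of the nuisance terms are assumptions made only for this example, as are the object names.

```r
# Sketch: simulate (estimate, nuisance term) pairs from the two-level model.
# All priors and nuisance-term ranges below are illustrative assumptions.
set.seed(1)
n <- 200

# Gaussian: X_i | theta_i, eta_i ~ N(theta_i, eta_i^2)
theta_g <- rnorm(n, mean = 0, sd = 1)          # assumed F: N(0, 1)
eta_g   <- runif(n, min = 0.5, max = 2)        # standard errors
gauss_dat <- cbind(X = rnorm(n, mean = theta_g, sd = eta_g), eta = eta_g)

# Binomial: X_i successes out of eta_i trials with success probability theta_i
theta_b <- rbeta(n, shape1 = 2, shape2 = 8)    # assumed F: Beta(2, 8)
eta_b   <- sample(10:100, n, replace = TRUE)   # numbers of trials
binom_dat <- cbind(X = rbinom(n, size = eta_b, prob = theta_b), eta = eta_b)

# Poisson: X_i | theta_i, eta_i ~ Poisson(theta_i * eta_i)
theta_p <- rgamma(n, shape = 2, rate = 1)      # assumed F: Gamma(2, 1)
eta_p   <- runif(n, min = 1, max = 10)         # exposures
pois_dat <- cbind(X = rpois(n, lambda = theta_p * eta_p), eta = eta_p)

head(gauss_dat)
```

Each matrix has the main estimates in the first column and the nuisance terms in the second, so with the package loaded it could be passed as the data argument, for instance rvalues(binom_dat, family = binomial).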
The distribution of the effect sizes \(F\) may be a parametric distribution that is conjugate to the corresponding family argument, or it may be estimated nonparametrically. When it is desired that \(F\) be parametric (i.e., prior = "conjugate"), the default is to estimate the hyperparameters (i.e., hypers = "estimate"), but these may be supplied by the user as a vector of length two. To estimate \(F\) nonparametrically, one should use prior = "nonparametric" (see npmle for further details about nonparametric estimation of \(F\)).
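To make the idea of estimating hyperparameters by maximum likelihood concrete, here is a minimal base R sketch for the Gaussian case. It assumes a normal prior \(F = N(\mu, \tau^2)\), under which the marginal distribution is \(X_i \sim N(\mu, \tau^2 + \eta_i^2)\). The parameterization, the function neg_loglik, and the use of optim are assumptions for illustration; the package's internal routine may differ.

```r
# Sketch: maximum likelihood for (mu, tau) under an assumed normal prior.
set.seed(2)
n     <- 500
eta   <- runif(n, min = 0.5, max = 1.5)     # known standard errors
theta <- rnorm(n, mean = 1, sd = 2)         # true hyperparameters: mu = 1, tau = 2
x     <- rnorm(n, mean = theta, sd = eta)

# Negative marginal log-likelihood; par = c(mu, log(tau)) keeps tau positive
neg_loglik <- function(par, x, eta) {
  mu  <- par[1]
  tau <- exp(par[2])
  -sum(dnorm(x, mean = mu, sd = sqrt(tau^2 + eta^2), log = TRUE))
}

fit <- optim(c(mean(x), 0), neg_loglik, x = x, eta = eta)
hyper_hat <- c(mu = fit$par[1], tau = exp(fit$par[2]))
hyper_hat
```

Optimizing over log(tau) rather than tau avoids the need for a box constraint in the optimizer.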
The r-value \(r_i\) assigned to the \(i\)th case of interest is determined by
\( r_i = \inf\{ \alpha \in (0,1) : V_\alpha(X_i,\eta_i) \ge \lambda(\alpha) \} \),
where \(V_\alpha(X_i,\eta_i) = P(\theta_i \ge \theta_\alpha \mid X_i,\eta_i)\) is the posterior probability that \(\theta_i\) exceeds the threshold \(\theta_\alpha\), and \(\lambda(\alpha)\) is the upper-\(\alpha\)th quantile associated with the marginal distribution of \(V_\alpha(X_i,\eta_i)\) (i.e., \(P(V_\alpha(X_i,\eta_i) \ge \lambda(\alpha)) = \alpha\)). Similarly, the threshold \(\theta_\alpha\) is the upper-\(\alpha\)th quantile of \(F\) (i.e., \(P(\theta_i \ge \theta_\alpha) = \alpha\)).
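The definition above can be traced through a small grid-based sketch in base R. It assumes the Gaussian family with a normal prior whose hyperparameters mu and tau are known, and it approximates \(\lambda(\alpha)\) by the empirical quantile of the computed posterior probabilities. These choices, and all object names, are assumptions for illustration and are not taken from the package's implementation.

```r
# Sketch: discrete approximation of r-values on a grid of alpha values.
set.seed(3)
n   <- 300
mu  <- 0
tau <- 1                                    # assumed known: F = N(mu, tau^2)
eta   <- runif(n, min = 0.5, max = 2)
theta <- rnorm(n, mean = mu, sd = tau)
x     <- rnorm(n, mean = theta, sd = eta)

# Normal-normal posterior of theta_i given (x_i, eta_i)
shrink    <- tau^2 / (tau^2 + eta^2)
post_mean <- mu + shrink * (x - mu)
post_sd   <- sqrt(shrink) * eta

alpha_grid  <- seq(0.01, 0.99, by = 0.01)
theta_alpha <- qnorm(1 - alpha_grid, mean = mu, sd = tau)  # upper quantiles of F

# V[i, j] = P(theta_i >= theta_alpha[j] | x_i, eta_i)
V <- sapply(theta_alpha, function(t) {
  pnorm(t, mean = post_mean, sd = post_sd, lower.tail = FALSE)
})

# lambda[j]: empirical upper-alpha_grid[j] quantile of column j of V
lambda <- sapply(seq_along(alpha_grid), function(j) {
  quantile(V[, j], probs = 1 - alpha_grid[j], names = FALSE)
})

# r_i: smallest grid value of alpha with V >= lambda; set to 1 if none qualifies
above <- sweep(V, 2, lambda, ">=")
r <- apply(above, 1, function(a) if (any(a)) alpha_grid[which(a)[1]] else 1)

summary(r)
```

Units with small r are those that enter the top-\(\alpha\) list at a small \(\alpha\), so sorting by r gives the ranking.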
Henderson, N.C. and Newton, M.A. (2015) Making the Cut: Improved Ranking and Selection for Large-Scale Inference. http://arxiv.org/abs/1312.5776
# NOT RUN {
library(rvalues)
### Binomial example with Beta prior:
data(fluEnrich)
flu.rvals <- rvalues(fluEnrich, family = binomial)
hist(flu.rvals$rvalues)
### look at the r-values for indices 10 and 2484
fig_indices <- c(10,2484)
fluEnrich[fig_indices,]
flu.rvals$rvalues[fig_indices]
### Gaussian sampling distribution with nonparametric prior
### (no control list is passed here, so the default settings
### for the nonparametric estimate are used)
data(hiv)
hiv.rvals <- rvalues(hiv, prior = "nonparametric")
# }