From the vector of specified n.seeds and the possible waves 1:n.wave around each seed,
the function selects a single number n.seed and an n.wave (the optimal seed-wave combination)
that produce a labeled snowball with multiple inclusions (LSMI) sample with desired
bootstrap confidence intervals for a parameter of interest. Here, "desired" means that
the interval (and the corresponding seed-wave combination) is selected as having the
best coverage (closest to the specified level prob), based on a cross-validation
procedure with proxy estimates of the parameter.
See Algorithm 2 by Gel et al. (2017) and Details below.
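The selection rule described above (pick the combination whose coverage is closest to prob) can be sketched in base R. This is only an illustration of the final selection step: the coverage values below are made up for the example, and the names coverage, seeds_grid, and waves_grid are hypothetical, not objects produced by the package.

```r
# Made-up coverage values for illustration only:
# rows are candidate n.seed values, columns are waves 1:3.
prob <- 0.95
seeds_grid <- c(10, 20, 30)   # hypothetical candidate n.seeds
waves_grid <- 1:3             # hypothetical waves 1:n.wave
coverage <- matrix(
  c(0.80, 0.89, 0.93,
    0.85, 0.94, 0.98,
    0.90, 0.97, 1.00),
  nrow = 3, byrow = TRUE
)

# Linear (column-major) index of the entry closest to the target level.
k <- which.min(abs(coverage - prob))
# Convert the linear index to (row, column) positions.
idx <- arrayInd(k, dim(coverage))

best_combination <- c(
  n.seed = seeds_grid[idx[1, 1]],
  n.wave = waves_grid[idx[1, 2]]
)
best_combination
```

How the coverage values themselves are estimated (the proxy sampling and cross-validation) is not shown here; see Algorithm 2 in the cited reference for that part.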
lsmi_cv(
net,
n.seeds,
n.wave,
seeds = NULL,
B = 100,
prob = 0.95,
cl = 1,
param = c("mu"),
method = c("percentile", "basic"),
proxyRep = 19,
proxySize = 30
)
a network object that is a list containing:
degree
the degree sequence of the network, which is an integer vector of length \(n\);
edges
the edgelist, which is a two-column matrix, where each row is an edge of the network;
n
the network order (i.e., number of nodes in the network).
The network object can be simulated by random_network, selected from the networks
available in artificial_networks, converted from an igraph object with
igraph_to_network, etc.
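As a rough illustration of the list structure described above, the following base R sketch assembles the three components by hand for a toy undirected network. It assumes nodes are labeled 1 to n; whether a hand-built list like this is accepted by every function in the package is an assumption, so the constructors named above remain the safer route.

```r
# Toy undirected network on 4 nodes: edges 1-2, 2-3, 3-4, 2-4.
edges <- matrix(c(1, 2,
                  2, 3,
                  3, 4,
                  2, 4), ncol = 2, byrow = TRUE)
n <- 4L

# Degree of a node = number of times its ID appears in the edgelist.
degree <- tabulate(as.integer(edges), nbins = n)

# Assemble the list with the three documented components.
net <- list(degree = degree, edges = edges, n = n)
str(net)
```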
an integer vector of numbers of seeds for snowball sampling
(cf. a single integer n.seed in lsmi). Only n.seeds <= n are retained.
If seeds is specified, only values n.seeds < length(unique(seeds)) are retained
and automatically supplemented by length(unique(seeds)).
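The retention rule above can be mimicked with a small base R helper. The function filter_n_seeds is hypothetical (it is not part of the package) and only restates the documented rule; sorting and de-duplicating the result is an extra assumption made here for readability.

```r
# Hypothetical helper restating the documented rule for n.seeds.
filter_n_seeds <- function(n.seeds, n, seeds = NULL) {
  # Keep only candidate values that do not exceed the network order.
  n.seeds <- sort(unique(n.seeds[n.seeds <= n]))
  if (!is.null(seeds)) {
    k <- length(unique(seeds))
    # Keep values below the number of distinct seeds, then append that number.
    n.seeds <- c(n.seeds[n.seeds < k], k)
  }
  n.seeds
}

filter_n_seeds(c(10, 20, 30), n = 25)                 # 30 exceeds n and is dropped
filter_n_seeds(c(10, 20, 30), n = 100, seeds = 1:15)  # 20 and 30 replaced by 15
```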
an integer defining the number of waves (order of the neighborhood)
to be recorded around the seed in the LSMI. For example, n.wave = 1 corresponds to
an LSMI with the seed and its first neighbors. Note that the algorithm allows for
multiple inclusions.
a vector of numeric IDs of pre-specified seeds. If specified, LSMIs are constructed around each such seed.
a positive integer, the number of bootstrap replications to perform. Default is 100.
confidence level for the intervals. Default is 0.95 (i.e., 95% confidence).
parameter to specify a computer cluster for bootstrapping, passed to
the package parallel (default is 1, meaning no cluster is used).
Possible values are:
a cluster object (list) produced by makeCluster. In this case, a new cluster is neither started nor stopped;
NULL. In this case, the function will attempt to detect
available cores (see detectCores) and, if there are
multiple cores (\(>1\)), a cluster will be started with
makeCluster. If started, the cluster will be stopped
after computations are finished;
a positive integer defining the number of cores to start a cluster.
If cl = 1, no attempt to create a cluster will be made.
If cl > 1, a cluster will be started (using makeCluster)
and stopped afterwards (using stopCluster).
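The three cases for cl can be summarized with a small sketch built on the parallel package. The helper resolve_cluster is hypothetical and only mirrors the behavior described above; it is not the package's internal code.

```r
library(parallel)

# Hypothetical helper mirroring the documented handling of `cl`.
# Returns the cluster to use (or NULL) and whether the caller should stop it.
resolve_cluster <- function(cl = 1) {
  # Case 1: an existing cluster object is used as is and left running.
  if (inherits(cl, "cluster")) {
    return(list(cluster = cl, stop_after = FALSE))
  }
  # Case 2: NULL means "detect the available cores".
  if (is.null(cl)) {
    cores <- detectCores()
    cl <- if (is.na(cores)) 1 else cores
  }
  # Case 3: a positive integer; start a cluster only if more than one core.
  if (cl > 1) {
    return(list(cluster = makeCluster(cl), stop_after = TRUE))
  }
  list(cluster = NULL, stop_after = FALSE)
}

res <- resolve_cluster(1)   # no cluster is created
is.null(res$cluster)
```

A caller following this pattern would run stopCluster(res$cluster) at the end only when res$stop_after is TRUE, so that a user-supplied cluster is never shut down.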
The parameter of interest for which to run a cross-validation
and select optimal n.seed and n.wave. Currently, only one
selection is possible: "mu" (the network mean degree).
method for calculating the bootstrap intervals. Default is "percentile" (see Details).
The number of times to repeat proxy sampling. Default is 19.
The size of the proxy sample. Default is 30.
A list consisting of:
A numeric vector of length 2 with the bootstrap confidence interval
(lower bound, upper bound) for the parameter of interest. This interval is
obtained by bootstrapping node degrees in an LSMI with the optimal combination
of n.seed and n.wave (the combination is reported in best_combination).
Point estimate of the parameter of interest
(based on the LSMI with n.seed seeds and n.wave waves reported in best_combination).
An integer vector of length 2 containing the optimal n.seed and n.wave
selected via cross-validation.
A vector of numeric IDs of the seeds that were used
in the LSMI with the optimal combination of n.seed and n.wave.
Currently, the bootstrap intervals can be calculated with two alternative
methods: "percentile" or "basic". The "percentile" intervals correspond to
Efron's \(100\cdot\)prob% intervals
(see Efron 1979; also Equation 5.18 by Davison and Hinkley 1997 and Equation 3 by Gel et al. 2017; Chen et al. 2018):
$$(\theta^*_{[B\alpha/2]}, \theta^*_{[B(1-\alpha/2)]}),$$
where \(\theta^*_{[B\alpha/2]}\) and \(\theta^*_{[B(1-\alpha/2)]}\)
are empirical quantiles of the bootstrap distribution with B bootstrap
replications for parameter \(\theta\)
(\(\theta\) can be \(f(k)\) or \(\mu\)), and \(\alpha = 1 -\) prob.
The "basic" method produces intervals
(see Equation 5.6 by Davison and Hinkley 1997):
$$(2\hat{\theta} - \theta^*_{[B(1-\alpha/2)]}, 2\hat{\theta} - \theta^*_{[B\alpha/2]}),$$
where \(\hat{\theta}\) is the sample estimate of the parameter.
Note that this method can lead to negative confidence bounds, especially
when \(\hat{\theta}\) is close to 0.
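The two interval formulas above can be illustrated with a short base R sketch. It uses ordinary resampling of a simulated degree vector as a stand-in, not the LSMI bootstrap that the package performs, and quantile() with its default type may differ slightly from the order statistics \(\theta^*_{[\cdot]}\) in the formulas; the variable names are chosen for this example only.

```r
set.seed(1)
x <- rpois(50, lambda = 3)   # simulated stand-in for observed node degrees
B <- 100
prob <- 0.95
alpha <- 1 - prob

theta_hat <- mean(x)         # sample estimate of the mean degree
# Bootstrap replications of the mean (plain resampling with replacement).
theta_star <- replicate(B, mean(sample(x, replace = TRUE)))

# Empirical quantiles of the bootstrap distribution.
q <- quantile(theta_star, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)

# Percentile interval: the quantiles themselves.
ci_percentile <- c(q[1], q[2])
# Basic interval: quantiles reflected around the sample estimate.
ci_basic <- c(2 * theta_hat - q[2], 2 * theta_hat - q[1])

rbind(percentile = ci_percentile, basic = ci_basic)
```

Both intervals have the same width; the basic interval is the percentile interval reflected around \(\hat{\theta}\), which is why its lower bound can fall below 0 when \(\hat{\theta}\) is small.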
# NOT RUN {
net <- artificial_networks[[1]]
a <- lsmi_cv(net, n.seeds = c(10, 20, 30), n.wave = 5, B = 100)
# }