sdtoolkit package. It organizes many subsidiary functions into an interactive session to aid the user in identifying policy-relevant scenarios. It is based on Friedman and Fisher's PRIM (Patient Rule Induction Method), but includes additional modifications and diagnostics to better suit the scenario discovery task.

sdprim(x, y = NULL,
thresh = NULL,
peel.alpha = 0.1,
paste.alpha = 0.05,
mass.min = 0.001,
pasting = TRUE,
box.init = NULL,
coverage = TRUE,
outfile = "boxsum.txt",
csvfile = "primboxes.csv",
repro = TRUE,
nbump = 10,
dfrac = 0.5,
threshtype = ">",
trajplot_xlim = c(0,1),
trajplot_ylim = c(0,1),
peel_crit = 1)
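sdprim builds on PRIM, whose core step is top-down peeling. Starting from a box that covers all of the data, it repeatedly trims a thin slice (a fraction peel.alpha of the cases still in the box) off whichever face most increases the share of interesting cases inside the box, and stops once the box would hold less than a minimum mass. The base R sketch below illustrates only that step. It is not sdtoolkit code: prim_peel and its arguments are hypothetical names whose roles merely mirror peel.alpha and mass.min. The peel criterion is simply the in-box mean of a zero-one y (sdprim's peel_crit options are not reproduced), and pasting is omitted.

```r
# Minimal sketch of PRIM top-down peeling (Friedman and Fisher). NOT sdtoolkit's
# implementation: prim_peel, peel_alpha and mass_min are hypothetical names that
# only mirror the roles of sdprim's peel.alpha and mass.min.
prim_peel <- function(x, y, peel_alpha = 0.1, mass_min = 0.05) {
  x <- as.matrix(x)
  n <- nrow(x)
  # Start from the bounding box of the data: row 1 = lower, row 2 = upper bounds
  box <- rbind(apply(x, 2, min), apply(x, 2, max))
  inbox <- rep(TRUE, n)
  traj <- list()
  repeat {
    best <- NULL
    best_density <- -Inf
    for (j in seq_len(ncol(x))) {
      # Candidate cuts: peel about peel_alpha of the in-box cases
      # off the bottom (side 1) or the top (side 2) of dimension j
      q <- quantile(x[inbox, j], c(peel_alpha, 1 - peel_alpha), names = FALSE)
      for (side in 1:2) {
        keep <- if (side == 1) inbox & x[, j] >= q[1] else inbox & x[, j] <= q[2]
        k <- sum(keep)
        # Skip peels that remove nothing or leave less than the minimum mass
        if (k == sum(inbox) || k < mass_min * n) next
        d <- mean(y[keep])
        if (d > best_density) {
          best_density <- d
          best <- list(j = j, side = side, bound = q[side], keep = keep)
        }
      }
    }
    if (is.null(best)) break  # no admissible peel left
    box[best$side, best$j] <- best$bound
    inbox <- best$keep
    traj[[length(traj) + 1]] <- list(
      box = box,
      coverage = sum(y[inbox]) / sum(y),  # share of all interesting cases kept
      density = mean(y[inbox]),           # share of in-box cases that are interesting
      mass = mean(inbox)                  # share of all cases still in the box
    )
  }
  traj
}

# Usage on the quakes data, thresholding magnitude at 5.0
data(quakes)
traj <- prim_peel(quakes[, c(1:3, 5)], 1 * (quakes[, "mag"] > 5.0))
tradeoff <- cbind(coverage = sapply(traj, `[[`, "coverage"),
                  density = sapply(traj, `[[`, "density"))
tail(tradeoff)
```

Each step of the returned trajectory gives up some coverage in exchange for (usually) higher density, which is the kind of trade-off the box sequence returned by sdprim summarizes.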
y: … see thresh and threshtype below.
thresh: If y is not a zero-one vector, you must specify a real value on which to threshold the data.
box.init: … a two-row matrix with ncol(x) columns, with the first row specifying lower bounds on each dimension and the second row specifying upper bounds.
outfile: … Set to NA (no quotes) if you would not like a file written out. … sdprim will also ask whether you would like to write it out as csv, and give you the chance to change the filename.
nbump: If repro = TRUE, how many resamplings should be performed? Currently only sampling without replacement is supported.
dfrac: If repro = TRUE, what fraction of the dataset should be resampled each time?
threshtype: … the direction in which y should be compared with thresh. With the default ">", cases whose y value exceeds thresh are treated as the cases of interest.
… with attributes estats and olap, which are ensemble statistics for the entire box sequence. While the structure of the output below may interest advanced users, the non-R user need not be familiar with it: several functions, such as seqinfo and dimplot, interpret and display the output in a friendlier form.
… olap attribute of the box sequence.
dimlist: three logical vectors (either, lower, and upper), each with length equal to the number of input dimensions. The vectors upper and lower indicate whether the upper and lower ends of each dimension were restricted, and either is simply the logical OR of lower and upper. Thus, one way to see only the restricted box dimensions is with the code bs[[boxnumber]]$box[,bs[[boxnumber]]$dimlist$either].
… where sdprim was called with argument repro = TRUE. The columns correspond to the columns of the input matrix; the first row gives the reproducibility statistics when PRIM was matched on coverage, the second when it was matched on density. Each entry is the fraction of times that dimension was restricted when PRIM was rerun on nbump random subsamples (each of size N*dfrac, where N is the number of cases) of the dataset.
… like box, except that the bounds are normalized to range from zero to one. These normalized bounds are used by the dimplot command to visualize dimension restrictions.
The sdprim algorithm is highly interactive, and the user will receive several prompts while running it. Note that during this process, at least on MS Windows versions, you will receive a …
See also sd.start for reading in and cleaning data, seqinfo for viewing the output of sdprim, and dimplot for visualizing dimension restrictions.
library(sdtoolkit)
#Load some example data to play with:
data(quakes)
#quakes is a 1000 by 5 dataset of earthquake information. This has no obvious
#policy significance, but we can use this built-in dataset to illustrate the use
#of PRIM.
#Here are the columns:
colnames(quakes)
#We will say magnitude is the output of interest, and call earthquakes with
#magnitude greater than 5.0 'interesting.' We can then call sdprim in two ways.
#First, make an input matrix from columns 1,2,3 and 5
inputs <- quakes[,c(1:3,5)] #could also do quakes[,-4]
#Now extract the unthresholded y vector:
yout <- quakes[,"mag"] #could also do quakes[,4]
#Now we can either call sdprim and threshold inside PRIM, like this:
myboxes <- sdprim(x=inputs, y=yout, thresh=5.0, threshtype=">")
#Or we can first threshold yout:
ythresh <- 1*(yout>5.0)
#and then call sdprim without worrying about the thresholds:
myboxes <- sdprim(x=inputs, y=ythresh)
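Once y has been thresholded (as ythresh above), the box quantities described in the output sections can be computed by hand for any box. The base R sketch below is illustrative only. The box bounds are hand-picked, not produced by sdprim. Coverage and density use the usual scenario discovery definitions (the share of all interesting cases that fall inside the box, and the share of in-box cases that are interesting). Rescaling the bounds against the observed range of each input is an assumption about how the normalized bounds are computed.

```r
# Hand-built illustration; the box below is hypothetical, not a PRIM result.
data(quakes)
inputs <- as.matrix(quakes[, c(1:3, 5)])   # lat, long, depth, stations
ythresh <- 1 * (quakes[, "mag"] > 5.0)

# Observed range of each input: row 1 = minimum, row 2 = maximum
rng <- rbind(apply(inputs, 2, min), apply(inputs, 2, max))

# Box laid out like box.init: row 1 = lower bounds, row 2 = upper bounds
box <- rng
box[1, "stations"] <- 40   # illustrative lower restriction
box[2, "depth"] <- 300     # illustrative upper restriction

# Cases inside the box on every dimension
inbox <- apply(inputs, 1, function(r) all(r >= box[1, ] & r <= box[2, ]))

coverage <- sum(ythresh[inbox]) / sum(ythresh)  # interesting cases captured
density <- mean(ythresh[inbox])                 # in-box cases that are interesting

# dimlist-style flags: was each end pulled in from the observed range?
lower <- box[1, ] > rng[1, ]
upper <- box[2, ] < rng[2, ]
dimlist <- list(either = lower | upper, lower = lower, upper = upper)

# Only the restricted dimensions, analogous to box[, dimlist$either]
restricted <- box[, dimlist$either, drop = FALSE]

# Bounds rescaled to [0, 1] relative to each input's observed range (assumed)
box_norm <- sweep(sweep(box, 2, rng[1, ]), 2, rng[2, ] - rng[1, ], "/")

print(c(coverage = coverage, density = density))
print(restricted)
print(round(box_norm, 3))
```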