riskyr

A toolbox for rendering risk literacy more transparent

Starting with a condition (e.g., a disease), a corresponding decision (e.g., a clinical judgment or diagnostic test), and basic probabilities (e.g., the condition's prevalence prev, and the decision's sensitivity sens and specificity spec), we provide a range of functions and metrics to compute, translate, and represent risk-related information (e.g., as probabilities or frequencies for a population of N individuals). By offering a variety of perspectives on the interplay between key parameters, riskyr renders teaching and training of risk literacy more transparent.

Motivation

Solving a problem simply means representing it so as to make the solution transparent. (H.A. Simon)[1]

The issues addressed by riskyr are less of a computational and more of a representational nature (i.e., concerning the expression in and translation between different formats of information). Whereas people tend to find it difficult to understand and compute information expressed in terms of probabilities, the same information is often easy to understand and compute when expressed in terms of frequencies. But rather than just expressing probabilistic information in terms of frequencies, riskyr allows translating between formats and illustrates their relationships in a variety of transparent and interactive ways.

Basic assumptions and goals driving the current development of riskyr include the following:

  1. Effective training in risk literacy requires simple tools and transparent representations.

  2. More specifically, it would be desirable to have a set of (computational and representational) tools that allow various calculations, translations (between formats), and a range of alternative views on the interplay between probabilities and frequencies.

  3. Seeing a variety of visualizations that illustrate how parameters and metrics interact and influence each other facilitates active and explorative learning. It is particularly helpful to view the same or similar relationships from alternative representations or to inspect the change of one parameter as a function of changes in other parameters.

To deliver on these assumptions and goals, we provide a range of computational and representational tools. Importantly, the objects and functions in the riskyr toolbox are not isolated, but complement, explain, and support each other. All functions and visualizations can be used separately and explored interactively, providing immediate feedback on the effect of changes in parameter values. A variety of customization options lets users explore and design representations of risk-related information that suit their personal goals and needs.

Installation

You can install the latest development version of riskyr from its GitHub repository at https://github.com/hneth/riskyr:

# install.packages("devtools")
devtools::install_github("hneth/riskyr")

Quick Start Guide

Defining a scenario

riskyr is designed to address problems like the following:[2]

Screening for hustosis

A new screening device for detecting the clinical condition of hustosis is developed. Its current version is very good, but not yet perfect. It has the following properties:

  1. About 4% of the general population suffer from hustosis.
  2. If someone suffers from hustosis, there is a chance of 80% that he or she will test positively for the condition.
  3. If someone is free from hustosis, there is a chance of 5% that he or she will still test positively for the condition.

Mr. and Ms. Smith have both been screened with this device:

  • Mr. Smith tested positively (i.e., received a diagnosis of hustosis).
  • Ms. Smith tested negatively (i.e., was judged to be free of hustosis).

Please answer the following questions:

  • What is the probability that Mr. Smith actually suffers from hustosis?
  • What is the probability that Ms. Smith is actually free of hustosis?

Probabilities provided

The first challenge in solving such problems is in understanding the information provided. The problem description provides three essential probabilities:

  1. The condition's prevalence (in the general population) of 4%: prev = .04.
  2. The device's or diagnostic decision's sensitivity of 80%: sens = .80.
  3. The device's or diagnostic decision's false alarm rate of 5%, implying a specificity of (100% − 5%) = 95%: spec = .95.
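
As a minimal base-R sketch (plain variables, not riskyr objects), the essential probabilities and their complements can be written down as follows. The names mirt (miss rate) and fart (false alarm rate) follow the labels in the summary output further below; the false alarm rate of 5% matches the spec = (1 - .05) used in the scenario definition.

```r
# Essential probabilities of the hustosis problem (plain R variables)
prev <- 0.04   # prevalence of the condition
sens <- 0.80   # sensitivity (hit rate) of the test
fart <- 0.05   # false alarm rate of the test

# Complementary probabilities
spec <- 1 - fart   # specificity
mirt <- 1 - sens   # miss rate

round(c(prev = prev, sens = sens, mirt = mirt, spec = spec, fart = fart), 2)
```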

Understanding the questions asked

The second challenge here lies in understanding the questions that are being asked -- and in realizing that their answers are not simply the decision's sensitivity or specificity values. Instead, we are asked to provide two conditional probabilities:

  • The conditional probability of suffering from the condition given a positive test result, aka. the positive predictive value (PPV).
  • The conditional probability of being free of the condition given a negative test result, aka. the negative predictive value (NPV).
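
Both values follow from Bayes' theorem. The following base-R sketch does not use any riskyr function; it assumes the three essential probabilities of the hustosis problem (with a false alarm rate of 5%, i.e., spec = .95) and computes PPV and NPV directly:

```r
prev <- 0.04   # prevalence
sens <- 0.80   # sensitivity
spec <- 0.95   # specificity (1 - false alarm rate)

# PPV: p(condition | positive decision)
PPV <- (prev * sens) / (prev * sens + (1 - prev) * (1 - spec))

# NPV: p(no condition | negative decision)
NPV <- ((1 - prev) * spec) / ((1 - prev) * spec + prev * (1 - sens))

round(c(PPV = PPV, NPV = NPV), 3)   # PPV is 0.400 and NPV is 0.991 after rounding
```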

Translating into frequencies

One of the best tricks in risk literacy education is to translate probabilistic information into frequencies.[3] To do this, we imagine a representative sample of N = 1000 individuals. Rather than asking about the probabilities for Mr. and Ms. Smith, we could re-frame the questions as:

Assuming a representative sample of 1000 individuals:

  • What proportion of individuals with a positive test result actually suffer from hustosis?
  • What proportion of individuals with a negative test result are actually free of hustosis?
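
The translation into frequencies can be sketched in a few lines of base R. The variable names cond_true and cond_false are local to this example, and rounding to whole individuals is an assumption of the sketch rather than a statement about riskyr's internals:

```r
N    <- 1000   # size of the imagined sample
prev <- 0.04
sens <- 0.80
spec <- 0.95

cond_true  <- round(N * prev)         # individuals with hustosis
cond_false <- N - cond_true           # individuals without hustosis

hi <- round(cond_true * sens)         # hits (true positives)
mi <- cond_true - hi                  # misses (false negatives)
cr <- round(cond_false * spec)        # correct rejections (true negatives)
fa <- cond_false - cr                 # false alarms (false positives)

c(hi = hi, mi = mi, fa = fa, cr = cr)

hi / (hi + fa)   # proportion of positive tests with hustosis
cr / (cr + mi)   # proportion of negative tests without hustosis
```

Answering both questions now only requires dividing one count by another, which is the main appeal of the frequency format.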

Using riskyr

Here is how riskyr allows you to view and solve such problems:

library(riskyr)  # loads the package
#> Welcome to riskyr!
#> riskyr.guide() opens user guides.

Creating a scenario

Let us define a new riskyr scenario (called hustosis) with the information provided by our problem:

## (1) Create your own scenario: ----------
hustosis <- riskyr(scen.lbl = "Example", 
                   cond.lbl = "hustosis",
                   dec.lbl = "screening test",
                   popu.lbl = "representative sample", 
                   N = 1000, 
                   prev = .04, sens = .80, spec = (1 - .05)
                   )

Summary

To obtain a quick overview of key parameter values, we could ask for the summary of our hustosis scenario:

summary(hustosis)  # summarizes key parameter values
#> Scenario:  Example 
#> 
#> Condition:  hustosis 
#> Decision:  screening test 
#> Population:  representative sample 
#> N =  1000 
#> Source:  Source information for this scenario 
#> 
#> Probabilities:
#> 
#>  Essential probabilities:
#> prev sens mirt spec fart 
#> 0.04 0.80 0.20 0.95 0.05 
#> 
#>  Other probabilities:
#>  ppod   PPV   NPV   FDR   FOR 
#> 0.080 0.400 0.991 0.600 0.009 
#> 
#> Frequencies:
#> 
#>  by conditions:
#>  cond.true cond.false 
#>         40        960 
#> 
#>  by decision:
#> dec.pos dec.neg 
#>      80     920 
#> 
#>  by correspondence (of decision to condition):
#> dec.cor dec.err 
#>     944      56 
#> 
#>  4 essential (SDT) frequencies:
#>  hi  mi  fa  cr 
#>  32   8  48 912 
#> 
#> Accuracy:
#> 
#>  acc:
#> 0.944

The summary distinguishes between probabilities, frequencies, and accuracy information. In Probabilities we find the answer to both of our questions when taking into account the information provided above:

  • The conditional probability that Mr. Smith actually suffers from hustosis given his positive test result is 40% (as PPV = 0.400).

  • The conditional probability that Ms. Smith is actually free of hustosis given her negative test result is 99.1% (as NPV = 0.991).
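
The remaining values of the summary can be traced back to the four essential frequencies. As a hedged base-R sketch (starting from the counts printed in the summary, not from riskyr objects):

```r
# Four essential frequencies, copied from the summary output
hi <- 32; mi <- 8; fa <- 48; cr <- 912
N  <- hi + mi + fa + cr

ppod <- (hi + fa) / N     # proportion of positive decisions
PPV  <- hi / (hi + fa)    # positive predictive value
NPV  <- cr / (cr + mi)    # negative predictive value
FDR  <- fa / (hi + fa)    # false detection rate (1 - PPV)
FOR  <- mi / (cr + mi)    # false omission rate (1 - NPV)
acc  <- (hi + cr) / N     # overall accuracy

round(c(ppod = ppod, PPV = PPV, NPV = NPV, FDR = FDR, FOR = FOR, acc = acc), 3)
```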

In case you are surprised by these answers, you are a good candidate for additional instruction in risk literacy. One of the strengths of riskyr is to analyze and view the scenario from a variety of different perspectives. Here is a quick overview of its different types of visualizations:

Tree diagram

## View graphics: 
plot(hustosis, plot.type = "tree", by = "dc")  # plot a tree diagram (by decision):

This particular tree, which splits the population of N = 1000 individuals into two subgroups by decision (by = "dc"), actually contains the answer to the second version of our questions:

  • The proportion of individuals with a positive test result who actually suffer from hustosis is the frequency of "true positive" cases (shown in darker green) divided by "decision positive" cases (shown in purple): 32/80 = .400 (corresponding to our value of PPV above).
  • The proportion of individuals with a negative test result who are actually free from hustosis is the frequency of "true negative" cases (shown in lighter green) divided by "decision negative" cases (shown in blue): 912/920 = .991 (corresponding to our value of NPV above, except for minimal differences due to rounding).

Of course, the frequencies of these ratios were already contained in the hustosis summary above. But the representation in the form of a tree diagram makes it easier to understand which frequencies are required to answer the question.

Icon array

plot(hustosis, plot.type = "icons")   # plot an icon array: 

Mosaic plot

plot(hustosis, plot.type = "mosaic")  # plot a mosaic plot: 

Curves

plot(hustosis, plot.type = "curve")   # plot curves (as a function of prevalence):
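
To get a feeling for what such curves express, the dependence of PPV and NPV on prevalence can be tabulated in base R. This is only an illustrative sketch: the helper functions ppv() and npv() and the chosen prevalence values are assumptions of this example and not part of riskyr.

```r
# Predictive values as functions of prevalence (Bayes' theorem)
ppv <- function(prev, sens, spec) {
  (prev * sens) / (prev * sens + (1 - prev) * (1 - spec))
}
npv <- function(prev, sens, spec) {
  ((1 - prev) * spec) / ((1 - prev) * spec + prev * (1 - sens))
}

prev_range <- c(0.01, 0.04, 0.10, 0.25, 0.50)   # hypothetical prevalence values

tab <- data.frame(
  prev = prev_range,
  PPV  = round(ppv(prev_range, sens = 0.80, spec = 0.95), 3),
  NPV  = round(npv(prev_range, sens = 0.80, spec = 0.95), 3)
)
tab
```

With sensitivity and specificity held constant, PPV rises and NPV falls as the condition becomes more prevalent.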

Planes

plot(hustosis, plot.type = "plane")   # plot plane (as a function of sens x spec):
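
The values underlying such a plane can be approximated by evaluating PPV on a grid of sensitivity and specificity values for a fixed prevalence. The following base-R sketch uses outer(); the helper ppv_grid() and the grid values are assumptions chosen for illustration, not riskyr code.

```r
prev <- 0.04   # fixed prevalence of the hustosis scenario

# PPV for given sensitivity and specificity (vectorized over both)
ppv_grid <- function(sens, spec) {
  (prev * sens) / (prev * sens + (1 - prev) * (1 - spec))
}

sens_vals <- c(0.60, 0.70, 0.80, 0.90)   # hypothetical sensitivity values
spec_vals <- c(0.80, 0.90, 0.95, 0.99)   # hypothetical specificity values

grid <- outer(sens_vals, spec_vals, ppv_grid)
dimnames(grid) <- list(sens = as.character(sens_vals),
                       spec = as.character(spec_vals))
round(grid, 3)
```

Reading along a row shows how strongly PPV depends on specificity when the prevalence is low.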

Using existing scenarios

As defining your own scenarios can be cumbersome and the literature is full of existing problems (that study so-called Bayesian reasoning), riskyr provides a set of (currently 25) pre-defined scenarios (stored in a list scenarios). Here is an example that shows how you can select and explore them:

Selecting a scenario

Let us assume you want to learn more about the controversy surrounding screening procedures for prostate cancer (known as PSA screening). Scenario 21 in our collection of scenarios is from an article on this topic (Arkes & Gaissmaier, 2012). To select a particular scenario, simply assign it to an R object. For instance, we can assign Scenario 21 to s21:

## (2) Explore an existing riskyr scenario: ---------- 
s21 <- scenarios$n21  # assign pre-defined Scenario 21 to s21.

Summary

The following commands provide a quick overview of the scenario content in text form:

# Show basic scenario information: 
s21$scen.lbl  # shows descriptive label:
#> [1] "PSA test 1 (high prev)"
s21$cond.lbl  # shows current condition:
#> [1] "prostate cancer"
s21$dec.lbl   # shows current decision:
#> [1] "PSA test"
s21$popu.lbl  # shows current population:
#> [1] "1000 patients with symptoms diagnostic of prostate cancer taking a PSA test."
s21$scen.apa  # shows current source: 
#> [1] "Arkes, H. R., & Gaissmaier, W. (2012). Psychological research and the prostate-cancer screening controversy. Psychological Science, 23(6), 547--553."

## View parameters:
summary(s21)  # shows key parameter information:
#> Scenario:  PSA test 1 (high prev) 
#> 
#> Condition:  prostate cancer 
#> Decision:  PSA test 
#> Population:  1000 patients with symptoms diagnostic of prostate cancer taking a PSA test. 
#> N =  1000 
#> Source:  Arkes & Gaissmaier (2012), p. 550 
#> 
#> Probabilities:
#> 
#>  Essential probabilities:
#> prev sens mirt spec fart 
#> 0.50 0.21 0.79 0.94 0.06 
#> 
#>  Other probabilities:
#>  ppod   PPV   NPV   FDR   FOR 
#> 0.135 0.778 0.543 0.222 0.457 
#> 
#> Frequencies:
#> 
#>  by conditions:
#>  cond.true cond.false 
#>        500        500 
#> 
#>  by decision:
#> dec.pos dec.neg 
#>     135     865 
#> 
#>  by correspondence (of decision to condition):
#> dec.cor dec.err 
#>     575     425 
#> 
#>  4 essential (SDT) frequencies:
#>  hi  mi  fa  cr 
#> 105 395  30 470 
#> 
#> Accuracy:
#> 
#>  acc:
#> 0.575
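
As a consistency check, the frequencies and predictive values of this summary can be reproduced from its three essential probabilities. The helper freq_from_prob() below is a hypothetical base-R function written for this example (it is not a riskyr function), and its rounding to whole individuals is an assumption:

```r
# Hypothetical helper: four essential frequencies from N, prev, sens, and spec
freq_from_prob <- function(N, prev, sens, spec) {
  cond_true  <- round(N * prev)
  cond_false <- N - cond_true
  hi <- round(cond_true * sens)
  cr <- round(cond_false * spec)
  c(hi = hi, mi = cond_true - hi, fa = cond_false - cr, cr = cr)
}

f <- freq_from_prob(N = 1000, prev = 0.50, sens = 0.21, spec = 0.94)
f

PPV <- f[["hi"]] / (f[["hi"]] + f[["fa"]])
NPV <- f[["cr"]] / (f[["cr"]] + f[["mi"]])
acc <- (f[["hi"]] + f[["cr"]]) / sum(f)

round(c(PPV = PPV, NPV = NPV, acc = acc), 3)
```

The low sensitivity of .21 explains why NPV is only slightly above chance in this high-prevalence scenario.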

Generating the following plots will provide you with a quick visual exploration of the scenario:

Network diagram

plot(s21) # plots a network diagram (by default):

Icon array

plot(s21, plot.type = "icons")   # plot an icon array: 

Mosaic plot

plot(s21, plot.type = "mosaic")   # plot a mosaic plot: 

Curves

The following curves show values of conditional probabilities as a function of prevalence:

plot(s21, plot.type = "curve", what = "all")  # plot all curves (as a function of prevalence):

Planes

The following surface shows the negative predictive value (NPV) as a function of sensitivity and specificity (for a given prevalence):

plot(s21, plot.type = "plane", what = "NPV")  # plot plane (as a function of sens x spec):

We hope that these examples succeeded in whetting your appetite for visual exploration. If so, call riskyr.guide() for viewing the package vignettes and obtaining additional information.

About

riskyr originated out of a series of lectures and workshops on risk literacy in spring/summer 2017. The current version (riskyr 0.1.0, as of Feb. 16, 2018) is still under development. Its primary developers and designers are Hansjörg Neth, Felix Gaisbauer, and Nico Gradwohl, who are researchers at the department of Social Psychology and Decision Sciences at the University of Konstanz, Germany.

The riskyr package is open source software written in R and released under the GPL 2 | GPL 3 licenses.

Please email us at contact.riskyr@gmail.com in case you want to use, adapt, or share this software.

Contact

We appreciate your feedback, comments, or questions.

Reference

To cite riskyr in derivations and publications use:

  • Neth, H., Gaisbauer, F., Gradwohl, N., & Gaissmaier, W. (2018). riskyr: A toolbox for rendering risk literacy more transparent. Social Psychology and Decision Sciences, University of Konstanz, Germany. Computer software (R package version 0.1.0, Feb. 16, 2018). Retrieved from https://github.com/hneth/riskyr.

A BibTeX entry for LaTeX users is:

@Manual{,
  title = {riskyr: A toolbox for rendering risk literacy more transparent},
  author = {Hansjörg Neth and Felix Gaisbauer and Nico Gradwohl and Wolfgang Gaissmaier},
  year = {2018},
  organization = {Social Psychology and Decision Sciences, University of Konstanz},
  address = {Konstanz, Germany},
  note = {R package (version 0.1.0, Feb. 16, 2018)},
  url = {https://github.com/hneth/riskyr},
  }    

Calling citation("riskyr") in the package also displays this information.

References

  • Arkes, H. R., & Gaissmaier, W. (2012). Psychological research and the prostate-cancer screening controversy. Psychological Science, 23, 547--553.

  • Gigerenzer, G. (2002). Reckoning with risk: Learning to live with uncertainty. London, UK: Penguin.

  • Gigerenzer, G. (2014). Risk savvy: How to make good decisions. New York, NY: Penguin.

  • Gigerenzer, G., Gaissmaier, W., Kurz-Milcke, E., Schwartz, L., & Woloshin, S. (2007). Helping doctors and patients make sense of health statistics. Psychological Science in the Public Interest, 8, 53--96.

  • Gigerenzer, G., & Hoffrage, U. (1995). How to improve Bayesian reasoning without instruction: Frequency formats. Psychological Review, 102, 684--704.

  • Hoffrage, U., Gigerenzer, G., Krauss, S., & Martignon, L. (2002). Representation facilitates reasoning: What natural frequencies are and what they are not. Cognition, 84, 343--352.

  • Hoffrage, U., Krauss, S., Martignon, L., & Gigerenzer, G. (2015). Natural frequencies improve Bayesian reasoning in simple and complex inference tasks. Frontiers in Psychology, 6, 1473.

  • Hoffrage, U., Lindsey, S., Hertwig, R., & Gigerenzer, G. (2000). Communicating statistical information. Science, 290, 2261--2262.

  • Kurzenhäuser, S., & Hoffrage, U. (2002). Teaching Bayesian reasoning: An evaluation of a classroom tutorial for medical students. Medical Teacher, 24, 516--521.

  • Kurz-Milcke, E., Gigerenzer, G., & Martignon, L. (2008). Transparency in risk communication. Annals of the New York Academy of Sciences, 1128, 18--28.

  • Sedlmeier, P., & Gigerenzer, G. (2001). Teaching Bayesian reasoning in less than two hours. Journal of Experimental Psychology: General, 130, 380--400.

[1] Simon, H.A. (1996). The Sciences of the Artificial (3rd ed.). The MIT Press, Cambridge, MA. (p. 132).

[2] See Gigerenzer (2002, 2014), Gigerenzer and Hoffrage (1995), Gigerenzer et al. (2007), and Hoffrage et al. (2015) for lots of similar problems. Also, Sedlmeier and Gigerenzer (2001) and Kurzenhäuser and Hoffrage (2002) report related training programs.

[3] See Gigerenzer and Hoffrage (1995) and Hoffrage et al. (2000, 2002) on the concept of natural frequencies.

Install: install.packages('riskyr')
Monthly Downloads: 204
Version: 0.1.0
License: GPL-2 | GPL-3
Maintainer: Hansjoerg Neth
Last Published: February 19th, 2018

Functions in riskyr (0.1.0)

  • as_pb: Display a percentage as a (rounded) probability.
  • comp_spec: Compute a decision's specificity from its false alarm rate.
  • cond.false: Number of individuals for which the condition is false.
  • is_freq: Verify that input is a frequency (positive integer value).
  • is_perc: Verify that input is a percentage (numeric value from 0 to 100).
  • is_valid_prob_set: Verify that a set of probability inputs is valid.
  • is_valid_prob_pair: Verify that a pair of probability inputs can be a pair of complementary probabilities.
  • plot_icons: Plot an icon array of a population.
  • sens: The sensitivity (or hit rate) of a decision process or diagnostic procedure.
  • plot_mosaic: Plot a mosaic plot of population frequencies.
  • spec: The specificity of a decision process or diagnostic procedure.
  • PPV: The positive predictive value of a decision process or diagnostic procedure.
  • accu: A list containing current accuracy information.
  • comp_FDR: Compute a decision's false detection rate (FDR) from probabilities.
  • comp_NPV: Compute a decision's negative predictive value (NPV) from probabilities.
  • comp_PPV: Compute a decision's positive predictive value (PPV) from probabilities.
  • comp_freq_prob: Compute frequencies from (3 essential) probabilities.
  • comp_FOR: Compute a decision's false omission rate (FOR) from probabilities.
  • comp_comp_pair: Compute a probability's (missing) complement and return both.
  • FDR: The false detection rate of a decision process or diagnostic procedure.
  • FOR: The false omission rate (FOR) of a decision process or diagnostic procedure.
  • comp_mirt: Compute a decision's miss rate from its sensitivity.
  • comp_complement: Compute a probability's complement probability.
  • comp_popu: Compute a population table from frequencies.
  • N: Number of individuals in the population.
  • comp_freq: Compute frequencies from (3 essential) probabilities.
  • comp_freq_freq: Compute frequencies from (4 essential) frequencies.
  • cr: Frequency of correct rejections or true negatives (TN).
  • is_complement: Verify that two numbers are complements.
  • cond.true: Number of individuals for which the condition is true.
  • comp_prob_prob: Compute probabilities from (3 essential) probabilities.
  • comp_min_N: Compute a suitable minimum population size value N.
  • comp_acc: Compute overall accuracy (acc) from probabilities.
  • is_extreme_prob_set: Verify that a set of probabilities describes an extreme case.
  • NPV: The negative predictive value of a decision process or diagnostic procedure.
  • comp_accu: Compute accuracy of current classification results.
  • comp_sens: Compute a decision's sensitivity from its miss rate.
  • df.scenarios: A collection of riskyr scenarios from various sources.
  • dec.cor: Number of individuals for which the decision is correct.
  • fa: Frequency of false alarms or false positives (FP).
  • comp_ppod: Compute the proportion of positive decisions (ppod) from probabilities.
  • comp_prev: Compute the condition's prevalence (baseline probability) from frequencies.
  • dec.neg: Number of individuals for which the decision is negative.
  • init_pal: Initialize basic color information.
  • dec.pos: Number of individuals for which the decision is positive.
  • init_txt: Initialize basic text elements.
  • init_num: Initialize basic numeric variables.
  • hi: Frequency of hits or true positives (TP).
  • pal: List current values of basic color information.
  • dec.err: Number of individuals for which the decision is erroneous.
  • plot.riskyr: Plot information of a riskyr object.
  • prev: The prevalence (baseline probability) of a condition.
  • print.summary.riskyr: Printing summarized risk information.
  • fart: The false alarm rate (or false positive rate) of a decision process or diagnostic procedure.
  • prob: List current probability information.
  • freq: List current frequency information.
  • riskyr: Create riskyr scenarios.
  • is_prob: Verify that input is a probability (numeric value from 0 to 1).
  • is_suff_prob_set: Verify a sufficient set of probability inputs.
  • plot_curve: Plot curves of selected values (e.g., PPV or NPV) as a function of prevalence.
  • plot_fnet: Plot a network diagram of frequencies and probabilities.
  • mirt: The miss rate of a decision process or diagnostic procedure.
  • num: List current values of basic numeric variables.
  • is_valid_prob_triple: Verify that a triple of essential probability inputs is valid.
  • mi: Frequency of misses or false negatives (FN).
  • plot_plane: Plot a plane of selected values (e.g., PPV or NPV) as a function of sensitivity and specificity.
  • riskyr.guide: Opens the riskyr package guides.
  • popu: A population table based on current frequencies.
  • plot_tree: Plot a tree diagram of frequencies and probabilities.
  • scenarios: A collection of riskyr scenarios from various sources.
  • ppod: The proportion (or baseline) of a positive decision.
  • summary.riskyr: Summarizing a riskyr scenario.
  • txt: List current values of basic text elements.
  • as_pc: Display a probability as a (rounded) percentage.
  • comp_fart: Compute a decision's false alarm rate from its specificity.
  • comp_prob: Compute probabilities from (3 essential) probabilities.
  • comp_complete_prob_set: Compute a complete set of probabilities from valid probability inputs.
  • comp_prob_freq: Compute probabilities from (4 essential) frequencies.