Diffusion: The Ratcliff Diffusion Model

Description

Density, distribution function, quantile function, and random generation for the Ratcliff diffusion model with the following parameters: a (threshold separation), z (starting point), v (drift rate), t0 (non-decision time/response time constant), d (differences in speed of response execution), sv (inter-trial-variability of drift), st0 (inter-trial-variability of non-decisional components), sz (inter-trial-variability of starting point), and s (diffusion constant). Note that the parameterization or defaults of non-decision time variability st0 and diffusion constant s differ from what is often found in the literature, and that the parameterization of z and sz has changed compared to previous versions (now absolute and not relative).

Usage

recalc_t0(t0, st0)
ddiffusion(rt, response = "upper", a, v, t0, z = 0.5 * a, d = 0,
  sz = 0, sv = 0, st0 = 0, s = 1, precision = 3,
  stop_on_error = FALSE)
pdiffusion(rt, response = "upper", a, v, t0, z = 0.5 * a, d = 0,
  sz = 0, sv = 0, st0 = 0, s = 1, precision = 3, maxt = 20,
  stop_on_error = FALSE, use_precise = TRUE)
qdiffusion(p, response = "upper", a, v, t0, z = 0.5 * a, d = 0,
  sz = 0, sv = 0, st0 = 0, s = 1, precision = 3, maxt = 20,
  interval = c(0, 10), scale_p = FALSE, scale_max = Inf,
  stop_on_error = FALSE, max_diff = 1e-04)
rdiffusion(n, a, v, t0, z = 0.5 * a, d = 0, sz = 0, sv = 0,
  st0 = 0, s = 1, precision = 3, stop_on_error = TRUE, maxt = 20,
  interval = c(0, 10), method = c("fastdm", "qdiffusion"))

Arguments

t0

non-decision time or response time constant (in seconds). Lower bound for the duration of all non-decisional processes (encoding and response execution). Typical range: 0.1 < t0 < 0.5

st0

inter-trial-variability of non-decisional components. Range of a uniform distribution with mean t0 + st0/2 describing the distribution of actual t0 values across trials. Accounts for response times below t0. Reduces skew of predicted RT distributions. Values different from 0 can slow computation considerably. Typical range: 0 < st0 < 0.2. Default is 0.

rt

a vector of RTs. For convenience, this can also be a data.frame with columns rt and response (such as returned from rdiffusion or rLBA). See examples.

response

character vector. Which response boundary should be tested? Possible values are c("upper", "lower"), possibly abbreviated and "upper" being the default. Alternatively, a numeric vector with values 1=lower and 2=upper. For convenience, response is converted via as.numeric also allowing factors (see examples). Ignored if the first argument is a data.frame.

a

threshold separation. Amount of information that is considered for a decision. Large values indicate a conservative decisional style. Typical range: 0.5 < a < 2

v

drift rate. Average slope of the information accumulation process. The drift gives information about the speed and direction of the accumulation of information. Large (absolute) values of drift indicate good performance. If the received information supports the response linked to the upper threshold, the sign is positive, and vice versa. Typical range: -5 < v < 5

z

starting point. Indicator of an a priori bias in decision making. When the starting point z deviates from 0.5*a, the amount of information necessary for a decision differs between response alternatives. Default is 0.5*a (i.e., no bias).

d

differences in speed of response execution (in seconds). Positive values indicate that response execution is faster for responses linked to the upper threshold than for responses linked to the lower threshold. Typical range: -0.1 < d < 0.1. Default is 0.

sz

inter-trial-variability of starting point. Range of a uniform distribution with mean z describing the distribution of actual starting points from specific trials. Values different from 0 can predict fast errors (but can slow computation considerably). Typical range: 0 < sz < 0.5. Default is 0.

sv

inter-trial-variability of drift rate. Standard deviation of a normal distribution with mean v describing the distribution of actual drift rates from specific trials. Values different from 0 can predict slow errors. Typical range: 0 < sv < 2. Default is 0.

s

diffusion constant; standard deviation of the random noise of the diffusion process (i.e., within-trial variability), scales a, v, and sv. Needs to be fixed to a constant in most applications. Default is 1. Note that the default used by Ratcliff and in other applications is often 0.1.

precision

numerical scalar value. Precision of calculation. Corresponds roughly to the number of decimals of the predicted CDFs that are calculated accurately. Default is 3.

stop_on_error

logical. Should the diffusion functions return 0 if the parameter values are outside the allowed range (FALSE) or produce an error in this case (TRUE)?

maxt

maximum rt allowed, used to stop integration problems. Larger values lead to considerably longer calculation times.

use_precise

boolean. Should pdiffusion use the precise version for calculating the CDF? The default is TRUE which is highly recommended. Using FALSE (i.e., the imprecise version) is hardly any faster and produces clearly wrong results for most parameter settings.

p

vector of probabilities. For convenience, this can also be a data.frame with columns p and response. See examples.

interval

a vector containing the end-points of the interval to be searched for the desired quantiles (i.e., RTs) in qdiffusion. Default is c(0, 10).

scale_p

logical. Should the entered probabilities automatically be scaled by the maximally predicted probability? Default is FALSE. Convenience argument for obtaining predicted quantiles. Can be slow as the maximally predicted probability is calculated individually for each p.

scale_max

numerical scalar. RT value at which the maximally predicted probability is calculated if scale_p is TRUE.

max_diff

numeric. Maximum acceptable difference between desired and observed probability of the quantile function (qdiffusion).

n

desired number of observations.

method

character. Experimental implementation of an alternative way of generating random variates via the quantile function (qdiffusion) and random uniform values. For simple calls, the default method "fastdm" is dramatically faster.
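Because s scales a, v, and sv, parameter values reported on the s = 0.1 scale can be moved to the default s = 1 scale by multiplying them by 10. The following is a minimal sketch in base R. The helper rescale_diffusion_pars is hypothetical and not part of the package. It also rescales z and sz, on the assumption that they are given on the same absolute scale as a.

```r
# Hypothetical helper (not part of the package): move parameters between
# two values of the diffusion constant s.
rescale_diffusion_pars <- function(pars, s_from = 0.1, s_to = 1) {
  # a, v, sv are named in the description of s;
  # z and sz are included under the assumption that they are absolute.
  scaled <- c("a", "v", "sv", "z", "sz")
  ratio <- s_to / s_from
  idx <- names(pars) %in% scaled
  pars[idx] <- pars[idx] * ratio
  pars  # time-based parameters such as t0, st0, and d are left unchanged
}

lit_pars <- c(a = 0.12, v = 0.25, t0 = 0.3)  # values on the s = 0.1 scale
new_pars <- rescale_diffusion_pars(lit_pars)
print(new_pars)
```

Alternatively, the original values can be kept and s = 0.1 passed explicitly, as done in the Wagenmakers et al. (2007) example below.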

Value

ddiffusion gives the density, pdiffusion gives the distribution function, qdiffusion gives the quantile function (i.e., predicted RTs), and rdiffusion generates random response times and decisions (returning a data.frame with columns rt (numeric) and response (factor)).

The length of the result is determined by n for rdiffusion, equal to the length of rt for ddiffusion and pdiffusion, and equal to the length of p for qdiffusion.

The distribution parameters (as well as response) are recycled to the length of the result. In other words, the functions are completely vectorized for all parameters and even the response boundary.
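As an illustration of this recycling, the sketch below passes one drift rate per trial while a and t0 are single values. It assumes that the functions documented here are provided by the rtdists package and that the package is installed; the call is skipped otherwise.

```r
# Assumption: ddiffusion is provided by the rtdists package.
if (requireNamespace("rtdists", quietly = TRUE)) {
  rts  <- c(0.6, 0.8, 1.0, 1.2)
  resp <- c("upper", "lower", "upper", "lower")
  # One drift rate per trial; a and t0 are recycled to length 4.
  dens <- rtdists::ddiffusion(rts, response = resp,
                              a = 1, v = c(1, 2, 1, 2), t0 = 0.3)
  print(length(dens))
}
```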

Details

The Ratcliff diffusion model (Ratcliff, 1978) is a mathematical model for two-choice discrimination tasks. It is based on the assumption that information is accumulated continuously until one of two decision thresholds is hit. For introductions see Ratcliff and McKoon (2008), Voss, Rothermund, and Voss (2004), Voss, Nagler, and Lerche (2013), or Wagenmakers (2009).
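The accumulation process described above can be sketched with a simple Euler-Maruyama simulation in base R. This is an illustration only: simulate_trial is a hypothetical function, it ignores all inter-trial variabilities (sz, sv, st0) and d, and it is not the algorithm used by rdiffusion.

```r
# Simulate one trial: evidence starts at z and drifts with rate v plus
# Gaussian noise (sd s) until it leaves the interval (0, a).
simulate_trial <- function(a, v, t0, z = 0.5 * a, s = 1,
                           dt = 0.001, max_t = 20) {
  x <- z
  t <- 0
  while (x > 0 && x < a && t < max_t) {
    x <- x + v * dt + s * sqrt(dt) * rnorm(1)
    t <- t + dt
  }
  response <- if (x >= a) "upper" else if (x <= 0) "lower" else NA_character_
  data.frame(rt = t0 + t, response = response)  # decision time plus t0
}

set.seed(1)
trials <- do.call(rbind, replicate(200,
  simulate_trial(a = 1, v = 2, t0 = 0.5), simplify = FALSE))
prop_upper <- mean(trials$response == "upper", na.rm = TRUE)
print(prop_upper)
```

The step size dt trades accuracy against speed; smaller values approximate the continuous process more closely.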

All functions are fully vectorized across all parameters as well as the response to match the length of rt (i.e., the output is always of length equal to the length of rt). This allows for trialwise parameters for each model parameter.

For convenience, all functions (with the exception of rdiffusion) allow the first argument to be a data.frame containing the information of the first and second argument in two columns (i.e., rt/p and response). Other columns (as well as a separately passed response argument) will be ignored. This allows, for example, passing the data.frame generated by rdiffusion directly to pdiffusion. See examples.

Quantile Function

Due to the bivariate nature of the diffusion model, the diffusion processes reaching each response boundary only return the defective CDF that does not reach 1. Only the sum of the CDF for both boundaries reaches 1. Therefore, qdiffusion can only return quantiles/RTs for any accumulator up to the maximal probability of that accumulator's CDF. This can be obtained by evaluating the CDF at Inf.
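For the special case without inter-trial variability (sz = 0, sv = 0), the maximal probability of each boundary has a well-known closed form for a Wiener process with two absorbing boundaries. The sketch below uses that formula in base R; p_upper is a hypothetical helper and is not how pdiffusion computes its result.

```r
# Probability that a Wiener process with drift v and diffusion constant s,
# started at z, reaches the upper boundary a before the lower boundary 0.
p_upper <- function(a, v, z = 0.5 * a, s = 1) {
  if (v == 0) return(z / a)  # limiting case of zero drift
  (1 - exp(-2 * v * z / s^2)) / (1 - exp(-2 * v * a / s^2))
}

p_u <- p_upper(a = 1, v = 2)  # maximal probability of the "upper" CDF
p_l <- 1 - p_u                # maximal probability of the "lower" CDF
print(c(upper = p_u, lower = p_l))
```

With nonzero sv or sz the maximal probabilities differ from this closed form, so evaluating pdiffusion at Inf remains the general approach.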

As a convenience for the user, if scale_p = TRUE in the call to qdiffusion the desired probabilities are automatically scaled by the maximal probability for the corresponding response. Note that this can be slow as the maximal probability is calculated separately for each desired probability. See examples.

Also note that quantiles (i.e., predicted RTs) are obtained by numerically minimizing the absolute difference between the desired probability and the value returned from pdiffusion using optimize. If the difference between the desired probability and the probability corresponding to the returned quantile is above a certain threshold (argument max_diff, default 0.0001), NA is returned instead of a quantile. This can happen either because the desired probability is above the maximal probability for this accumulator or because the interval searched for the quantile is too narrow (argument interval, default is c(0, 10)).
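The search strategy described above can be sketched in base R as follows. The toy defective CDF stands in for pdiffusion, and both toy_cdf and toy_quantile are hypothetical names used for illustration; this is not the package's implementation.

```r
# Toy defective CDF (assumption for illustration): a shifted exponential
# CDF that reaches at most p_max instead of 1.
toy_cdf <- function(t, p_max = 0.88, t0 = 0.5, rate = 4) {
  p_max * pexp(pmax(t - t0, 0), rate = rate)
}

# Minimize |CDF(t) - p| over the interval and return NA when the
# remaining difference exceeds max_diff.
toy_quantile <- function(p, cdf = toy_cdf, interval = c(0, 10),
                         max_diff = 1e-04) {
  fit <- optimize(function(t) abs(cdf(t) - p),
                  interval = interval, tol = 1e-10)
  if (fit$objective > max_diff) return(NA_real_)
  fit$minimum
}

q_ok  <- toy_quantile(0.5)   # below p_max, so a quantile exists
q_bad <- toy_quantile(0.95)  # above p_max, so NA is returned
print(c(q_ok, q_bad))
```

A tight tol is used here because the default tolerance of optimize can leave a difference larger than max_diff even when a valid quantile exists.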

References

Ratcliff, R. (1978). A theory of memory retrieval. Psychological Review, 85(2), 59-108.

Ratcliff, R., & McKoon, G. (2008). The diffusion decision model: Theory and data for two-choice decision tasks. Neural Computation, 20(4), 873-922.

Voss, A., Rothermund, K., & Voss, J. (2004). Interpreting the parameters of the diffusion model: An empirical validation. Memory & Cognition, 32(7), 1206-1220.

Voss, A., Nagler, M., & Lerche, V. (2013). Diffusion Models in Experimental Psychology: A Practical Introduction. Experimental Psychology, 60(6), 385-402. doi:10.1027/1618-3169/a000218

Wagenmakers, E.-J., van der Maas, H. L. J., & Grasman, R. P. P. P. (2007). An EZ-diffusion model for response time and accuracy. Psychonomic Bulletin & Review, 14(1), 3-22.

Wagenmakers, E.-J. (2009). Methodological and empirical developments for the Ratcliff diffusion model of response times and accuracy. European Journal of Cognitive Psychology, 21(5), 641-671.

Examples

# NOT RUN {
## identical calls (but different random values)
rt1 <- rdiffusion(500, a=1, v=2, t0=0.5)
head(rt1)
rt2 <- rdiffusion(500, a=1, v=2, t0=0.5, d=0, sz=0, sv=0, st0=0)
head(rt2)
  
# get density for random RTs (possible to specify arguments for pdiffusion in same way):
sum(log(ddiffusion(rt1$rt, rt1$response, a=1, v=2, t0=0.5)))  # response is factor
sum(log(ddiffusion(rt1$rt, as.numeric(rt1$response), a=1, v=2, t0=0.5))) # response is numeric
sum(log(ddiffusion(rt1$rt, as.character(rt1$response), a=1, v=2, t0=0.5))) # response is character
sum(log(ddiffusion(rt1, a=1, v=2, t0=0.5))) # response is data.frame


sum(log(ddiffusion(rt2$rt, rt2$response, a=1, v=2, t0=0.5)))

# can we recover the parameters?
ll_diffusion <- function(pars, rt, response) 
{
  densities <- ddiffusion(rt, response=response, 
               a=pars[1], v=pars[2], t0=pars[3], 
               sz=pars[4], 
               st0=pars[5], sv=pars[6])
  if (any(densities == 0)) return(1e6)
  return(-sum(log(densities)))
}

# }
# NOT RUN {
start <- c(runif(2, 0.5, 3), 0.1, runif(3, 0, 0.5))
names(start) <- c("a", "v", "t0", "sz", "st0", "sv")
recov <- nlminb(start, ll_diffusion, lower = 0, rt=rt1$rt, response=rt1$response)
round(recov$par, 3)
#     a     v    t0    sz   st0    sv 
# 1.019 1.879 0.496 0.000 0.000 0.389 
## results of course depend on random seed for rdiffusion and runif
# }
# NOT RUN {
## replicate Table 1 from Wagenmakers et al. (2007) using rdiffusion:

n <- 1e5 # number of samples
# take parameter values from Table 2 and set s to 0.1
george <- rdiffusion(n, a = 0.12, v = 0.25, t0 = 0.3, s = 0.1)
rich   <- rdiffusion(n, a = 0.12, v = 0.25, t0 = 0.25, s = 0.1)
amy    <- rdiffusion(n, a = 0.08, v = 0.25, t0 = 0.3, s = 0.1)
mark   <- rdiffusion(n, a = 0.08, v = 0.25, t0 = 0.25, s = 0.1)

george$id <- "george"
rich$id <- "rich"
amy$id <- "amy"
mark$id <- "mark"

wag <- rbind(george, rich, amy, mark)
wag$id <- factor(wag$id, levels = c("george", "rich", "amy", "mark"))

opt <- options()
options(digits = 3)
aggregate(cbind(rt, as.numeric(response)-1) ~ id, wag, mean)
#       id    rt    V2
# 1 george 0.517 0.952
# 2   rich 0.467 0.953
# 3    amy 0.422 0.881
# 4   mark 0.372 0.882
options(digits = 1)
aggregate(rt ~ id, wag, var)
#       id    rt
# 1 george 0.024
# 2   rich 0.024
# 3    amy 0.009
# 4   mark 0.009
options(opt)
# }
# NOT RUN {

## plot density:
curve(ddiffusion(x, a=1, v=2, t0=0.5, response = "upper"), 
      xlim=c(0,3), main="Density of upper responses", ylab="density", xlab="response time")
curve(ddiffusion(x, a=1, v=2, t0=0.5, st0=0.2, response = "upper"), 
      add=TRUE, lty = 2)
legend("topright", legend=c("no", "yes"), title = "Non-decision time variability (st0)?", lty = 1:2)

# plot cdf:
curve(pdiffusion(x, a=1, v=2, t0=0.5, st0=0.2, response="u"), 
     xlim = c(0, 3),ylim = c(0,1), 
     ylab = "cumulative probability", xlab = "response time",
     main = "CDF of diffusion model with non-decision time variability")
curve(pdiffusion(x, a=1, v=2, t0=0.5, st0=0.2, response="l"), 
     add=TRUE, lty = 2)
legend("topleft", legend=c("upper", "lower"), title="response boundary", lty=1:2)

# }
# NOT RUN {
### qdiffusion can only return values up to maximal predicted probability:
(max_p <- pdiffusion(Inf, a=1, v=2, t0=0.5, st0=0.2, sz = 0.1, sv = 0.5, response="u"))
# [1] 0.87
# (Note: with the current integration routine, evaluate pdiffusion at Inf rather than at a smaller value.)

qdiffusion(0.87, a=1, v=2, t0=0.5, st0=0.2, sz = 0.1, sv = 0.5, response="u")
# [1] 1.945802

qdiffusion(0.88, a=1, v=2, t0=0.5, st0=0.2, sz = 0.1, sv = 0.5, response="u")
# NA with warning.

# to get predicted quantiles, scale required quantiles by maximally predicted response rate:
qs <- c(.1, .3, .5, .7, .9)
qdiffusion(qs*max_p, a=1, v=2, t0=0.5, st0=0.2, sz = 0.1, sv = 0.5, response="u")

# or set scale_p to TRUE which scales automatically by maximum p
# (but can be slow as it calculates max_p for each probability separately) 
qdiffusion(qs, a=1, v=2, t0=0.5, st0=0.2, sz = 0.1, sv = 0.5, response="u", scale_p = TRUE)


# qdiffusion also accepts a data.frame as first argument:
t3 <- data.frame(p = rep(c(0.05, 0.1, 0.87), 2), response = rep(c("upper", "lower"), each = 3))
#      p response
# 1 0.05    upper
# 2 0.10    upper
# 3 0.87    upper
# 4 0.05    lower
# 5 0.10    lower
# 6 0.87    lower
qdiffusion(t3, a=1, v=2, t0=0.5, st0=0.2, sz = 0.1, sv = 0.5, scale_p = TRUE)
# }
# NOT RUN {
## LBA and diffusion can be used interchangeably:
rt1 <- rLBA(500, A=0.5, b=1, t0 = 0.5, mean_v=c(2.4, 1.6), sd_v=c(1,1.2))
rt2 <- rdiffusion(500, a=1, v=2, t0=0.5)

# data can also be passed as data.frame (same is true for pLBA):
sum(log(dLBA(rt1, A=0.5, b=1, t0 = 0.5, mean_v=c(2.4, 1.6), sd_v=c(1,1.2))))
sum(log(dLBA(rt2, A=0.5, b=1, t0 = 0.5, mean_v=c(2.4, 1.6), sd_v=c(1,1.2))))

sum(log(ddiffusion(rt1, a=1, v=2, t0=0.5)))
sum(log(ddiffusion(rt2, a=1, v=2, t0=0.5)))

# }
