qddplot: Quantile Difference Diagram

Description

Creates quantile difference diagrams. It takes two numerical datasets, sorts them, calculates differences on the same quantiles (i.e. lowest with the lowest, median with the median, 90% with 90% etc.) and displays these differences. It can be used to check the differences in two datasets. Appropriate for visual illustration of Wilcoxon rank test and F-test of equality of variances.

Usage

qddplot(x, y, 
	remove.ratio = 0.1, differences.range = NA, differences.rangemin = 10, 
	differences.drawzero = TRUE, quantiles.drawhalf = TRUE, 
	quantiles.showaxis = TRUE, line.lwd = 5, xlab = "Quantile", 
	ylab = "Difference", main = "Quantile Differences", ...)

Arguments

x, y

vectors of numeric values to be compared. It is not necessary to have the same length.

remove.ratio

a real number in range [0.0, 0.5). Indicates how much leading and tailing data should not be displayed in order to avoid a bias on the diagram caused by outliers.

differences.range

numeric, indicating the value range (i.e. the y axis) to be shown. The default value is NA, indicating the maximum absolute value is considered. This can be overridden with an explicit value.

differences.rangemin

numeric, indicating the minimum value of the value range. Used in combination with differences.range=NA (the default), else ignored.

differences.drawzero

logical, indicating if the y=0 helper line should be displayed.

quantiles.drawhalf

logical, indicating if the x=0.5 helper line should be displayed.

quantiles.showaxis

logical, indicating if the custom values on the x axis should be displayed.

line.lwd

width of the main line.

xlab

title for the x axis, see plot.

ylab

title for the y axis, see plot.

main

plot title, see plot.

...

standard graphic parameters. See par for details.

Details

The quantile difference diagrams can be used to visualize the results of Wilcoxon rank tests (wilcox.test). Wilcoxon rank test basically checks if in one dataset are the values are different (more specifically: higher or lower) than those in the other. If it turns that numbers found in dataset A are higher than those in dataset B, that doesn't necessary mean all the values in A are higher than any value in B, nor it means that in case of pairwise comparison on the same quantile (e.g. minimum with the minimum, median with the median etc.), values in A are higher than values in B. On the other hand, in case of quantile comparison, if significantly higher number of elements are higher in A than B, comparing the number of elements where elements in B are higher than A, then the Wilcoxon rank test results is an appropriately low p-value, indicating that the null-hypothesis (the medians are equal) can be rejected. If in case of comparison, if in all cases of quantiles the values in subset A are higher than values in subset B, then the result would look like a line completely on the same part of y coordinate (either completely on the positive or on the negative part). This is rare. In most of the cases the diagram resembles on a more or less straight line. The slope indicates the difference between variances (see also var.test). The position of the line indicates if the values in one dataset is higher than those in another; being close to 0 at 50%, if they are similar.

Examples

Run this code

# Using default settings with random data.
qddplot(rnorm(100, 30, 50), rnorm(200, 10, 10));

# remove.ratio = 0.0 means the outliers are not removed. The result is usually biased.
qddplot(rnorm(100, 30, 50), rnorm(200, 10, 10), remove.ratio=0.0);

# remove.ratio = 0.25 means only the middle half is displayed, the upper and lower quantile not.
qddplot(rnorm(100, 30, 50), rnorm(200, 10, 10), remove.ratio=0.25);

# Illustrating similar and different medians and variances on 4 quantile difference diagrams.
# This is also an illustration for setting custom main title and subtitle.
dataSetA <- seq(-20, 20) + rnorm(41);
dataSetB <- seq(-15, 25) + rnorm(41);
dataSetC <- seq(-40, 40) + rnorm(81);
dataSetD <- seq(-20, 20) + rnorm(41);
op <- par(mfrow=c(2,2));
qddplot(dataSetA, dataSetD, main = "Similar median, similar variance", 
	sub = "-20...20 vs. -20...20");
qddplot(dataSetA, dataSetB, main = "Different median, similar variance", 
	sub = "-20...20 vs. -15...25");
qddplot(dataSetA, dataSetC, main = "Similar median, different variance", 
	sub = "-20...20 vs. -40...40");
qddplot(dataSetB, dataSetC, main = "Different median, different variance", 
	sub = "-15...25 vs. -40...40"); 
par(op);

# Change plot style: thicker line in red color.
qddplot(rnorm(100, 30, 50), rnorm(200, 10, 10), line.lwd=10, col="red");

# Hide axes, helper lines and captions.
qddplot(rnorm(100, 30, 50), rnorm(200, 10, 10), differences.drawzero=FALSE, 
	quantiles.drawhalf=FALSE, quantiles.showaxis=FALSE, line.lwd=1, yaxt='n', 
	main="", sub="", xlab="", ylab="");

# Do not consider minimal range.
qddplot(rnorm(100, 1, 1), rnorm(100, 2, 1), differences.rangemin=NA);

# Set an explicit range.
qddplot(rnorm(100, 0, 200), rnorm(100, 0, 200), differences.range=40);

Run the code above in your browser using DataLab