datana (version 1.1.1)

xyboxplot: Function for building a scatterplot with superposing boxplots

Description

The function draws a scatterplot of a response variable Y against a predictor variable X and superposes boxplots of Y computed by classes (i.e., groups) of X. The aim of the graph is to give a sense of how the distribution of the response variable changes with the predictor variable.

Usage

xyboxplot(
  x = x,
  y = y,
  col.dots = "blue",
  transp.dots = 0.1,
  xlab = NULL,
  ylab = NULL,
  num.classes = 10,
  segre.type = "percentile",
  limi.classes = NA,
  x.category = FALSE,
  pch.dots = 19,
  col.box = "red",
  transp.boxp = 0.07,
  xlim = NA,
  ylim = NA,
  class.ticks.lwd = 1,
  class.ticks.col = "red",
  class.marks.col = "black",
  cex.dots = 0.7,
  class.marks = FALSE,
  class.ticks = TRUE
)

Value

The function produces the graph described above.

Arguments

x

A numeric vector representing the X-axis variable.

y

A numeric vector representing the Y-axis variable (response).

col.dots

A string specifying the dot colors. The default value is "blue".

transp.dots

A numeric value used as the transparency of the dots in the figure. The default is set to 0.1.

xlab

(optional) A string specifying the X-axis label.

ylab

(optional) A string specifying the Y-axis label.

num.classes

The number of classes into which the X-axis variable is grouped for drawing the boxplots. The default is set to 10.

segre.type

A string specifying the type of segregation used to build the classes. The types are: (a) percentile, which assigns the same (or nearly the same) number of observations to each of the num.classes classes; (b) user.defined, which requires the user to provide the num.classes-1 limits between the classes. The default is set to percentile. Notice that if user.defined is specified, the option limi.classes must also be provided.

limi.classes

A vector of size num.classes-1 containing the limits to be used for defining the classes.

x.category

A logical value. If set to TRUE, the X-axis variable is treated as categorical when drawing the boxplots. The default is set to FALSE.

pch.dots

A numeric value specifying the plotting symbol (shape) of the dots. The default is set to 19.

col.box

A string specifying the boxplot color. The default is "red".

transp.boxp

A numeric value used as the transparency of the boxplots in the figure. The default is set to 0.07.

xlim

(optional) A numeric vector with the minimum and maximum, in that order, of the X-axis.

ylim

(optional) A numeric vector with the minimum and maximum, in that order, of the Y-axis.

class.ticks.lwd

The numeric width of the tick line for each of the X-axis variable classes. The default is set to 1.

class.ticks.col

A string with the color of the tick line for each of the X-axis variable classes. The default is set to "red".

class.marks.col

A string with the color of the mark value for each of the X-axis variable classes. The default is set to "black".

cex.dots

A numeric factor altering the size of the dots. The default value is 0.7.

class.marks

A logical value (TRUE or FALSE) indicating whether the numeric value of each of the X-axis variable classes should be printed. The default is set to FALSE.

class.ticks

A logical value (TRUE or FALSE) indicating whether the tick of each of the X-axis variable classes should be drawn. The default is set to TRUE.
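
The relation between num.classes and limi.classes can be illustrated outside the package. The following is a minimal sketch in base R, not the datana implementation: the x values are made up, the limits are taken from the Examples section, and the handling of values that fall exactly on a limit is an assumption.

```r
## Minimal sketch in base R; this is not the datana source code.
x <- c(95, 120, 150, 180, 200, 230, 260, 300)   # made-up values
num.classes <- 4
limi.classes <- c(140, 195, 250)                # num.classes - 1 limits

## The number of limits must agree with the number of classes
stopifnot(length(limi.classes) == num.classes - 1)

## findInterval() returns 0 below the first limit, so add 1 to get classes 1..4
## (assumption: a value equal to a limit goes to the upper class)
x.class <- findInterval(x, limi.classes) + 1
print(table(x.class))
```

Three limits split the range of x into four classes, which is why limi.classes has size num.classes-1.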

Author

Christian Salas-Eljatib

Details

Notice that, by default, the superposed boxplots for the Y-axis variable are computed by grouping the X-axis variable into 10 classes. Those classes are set by computing the 0.1, 0.2, ..., 0.9 quantiles of the X-axis variable; therefore, each group has the same (or nearly the same) number of observations. The width of each boxplot represents the extent of the X-axis variable class used for drawing it.
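
The grouping described above can be reproduced approximately with base R. The sketch below is an illustration under stated assumptions, not the code used by xyboxplot: the data are simulated, and quantile(), cut() and boxplot() with add = TRUE stand in for whatever the package does internally.

```r
## Minimal sketch in base R; this is not the datana source code.
set.seed(42)                          # simulated data, for illustration only
x <- runif(200, min = 50, max = 400)
y <- 0.03 * x + rnorm(200)

num.classes <- 10
## Interior probabilities 0.1, 0.2, ..., 0.9
probs <- seq_len(num.classes - 1) / num.classes
limits <- quantile(x, probs = probs, names = FALSE)

## Class boundaries: minimum, interior limits, maximum
breaks <- c(min(x), limits, max(x))
x.class <- cut(x, breaks = breaks, include.lowest = TRUE)

## Each class holds about the same number of observations
class.counts <- table(x.class)
print(class.counts)

## Extent of each class on the X-axis and its midpoint
class.width <- diff(breaks)
class.mid <- breaks[-length(breaks)] + class.width / 2

## Scatterplot, then one boxplot per class centred on the class midpoint,
## with box widths proportional to the class widths
plot(x, y, pch = 19, cex = 0.7, col = adjustcolor("blue", alpha.f = 0.1))
boxplot(y ~ x.class, add = TRUE, at = class.mid, width = class.width,
        boxwex = max(class.width), axes = FALSE,
        col = adjustcolor("red", alpha.f = 0.07), border = "red")
```

Because the classes hold equal counts, boxes are narrow where the X-axis observations are dense and wide where they are sparse.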

References

  • Salas-Eljatib C. 2021. Análisis de datos con el programa estadístico R: una introducción aplicada. Ediciones Universidad Mayor. Santiago, Chile. 170 p. https://eljatib.com

  • Salas C, Stage AR, and Robinson AP. 2008. Modeling effects of overstory density and competing vegetation on tree height growth. Forest Science 54(1):107-122. doi:10.1093/forestscience/54.1.107

Examples

library(datana)  # attach the package so that xyboxplot() is found
df <- datana::fishgrowth
xyboxplot(x=df$length,y=df$scale)
xyboxplot(x=df$length,y=df$scale,col.dots = "red",
          xlab="Variable X")
xyboxplot(x=df$length,y=df$scale,xlab="Variable X")

## dots with alpha channel
xyboxplot(x=df$length,y=df$scale,xlab="Variable X",
transp.dots = 0.4)

## with categorical x
xyboxplot(x=df$age,y=df$length,x.category = TRUE)

## fixed x axis limits
xyboxplot(x=df$age,y=df$length,x.category = TRUE, xlim = c(0,10))

## class ticks with line width 0.5
xyboxplot(x=df$age,y=df$length,x.category = TRUE, xlim = c(0,10),
          class.ticks.lwd = .5)

## class ticks in red with line width 2
xyboxplot(x=df$age,y=df$length,x.category = TRUE, xlim = c(0,10),
          class.ticks.lwd = 2, class.ticks.col = "red")

## larger dots
xyboxplot(x=df$age,y=df$length,x.category = TRUE, xlim = c(0,10),
          cex.dots = 1.5)

## green class ticks, without class marks
xyboxplot(x=df$age,y=df$length,x.category = TRUE, xlim = c(0,10),
          class.marks = FALSE, class.ticks.col = "green")

### the x variable is not treated as a categorical variable
df <- datana::fishgrowth
## print classes ticks, by default with red color
xyboxplot(x=df$length, y=df$scale)

## don't print ticks
xyboxplot(x=df$length, y=df$scale, class.ticks=FALSE)

## print classes marks values
xyboxplot(x=df$length, y=df$scale, class.marks=TRUE)

## print classes marks values without ticks
xyboxplot(x=df$length, y=df$scale, class.marks=TRUE, class.ticks=FALSE)

## change class marks and ticks colors
xyboxplot(x=df$length, y=df$scale, class.marks=TRUE,
          class.marks.col = "red",
          class.ticks.col = "blue")

## bigger ticks
xyboxplot(x=df$length, y=df$scale, class.marks=TRUE,
          class.marks.col = "red",
          class.ticks.col = "blue", class.ticks.lwd=3)

## Changing the number of the X-variable classes
xyboxplot(x=df$length,y=df$scale,num.classes=5)

## Defining the classes not by percentiles, but by fixed values
xyboxplot(x=df$length,y=df$scale,xlim=c(0,410),
ylim=c(0,20),num.classes=4,
segre.type="fixed",limi.classes=c(140,195,250))

## Note that the limits must be in agreement with the num.classes
xyboxplot(x=df$length,y=df$scale,xlim=c(0,410),ylim=c(0,20),
num.classes=5,segre.type="fixed",limi.classes=c(100,160,200,250))