Learn R Programming

midfieldr (version 1.0.1)

order_multiway: Order categorical variables of multiway data

Description

Transform a data frame such that two independent categorical variables are factors with levels ordered for display in a multiway dot plot. Multiway data comprise a single quantitative value (or response) for every combination of levels of two categorical variables. The ordering of the rows and panels is crucial to the perception of effects (Cleveland, 1993).

Usage

order_multiway(
  dframe,
  quantity,
  categories,
  ...,
  method = NULL,
  ratio_of = NULL
)

Value

A data.table with the following properties:

  • Rows are not modified.

  • Grouping structures are not preserved.

  • The columns specified by categories are converted to factors and ordered.

  • The column specified by quantity is converted to type double if it is an integer.

  • Two columns are added. Caution! An existing column with the same name as one of the added columns is silently overwritten.

The names of the added columns incorporate the names of the multiway variables. Columns added:

CATEGORY_median columns (when ordering method is "median")

Numeric. Two columns of medians of the quantitative variable grouped by the categorical variables. The CATEGORY placeholder in the column name is replaced by a category name from the categories argument. For example, suppose categories = c("program", "people") and method = "median". The two new column names would be program_median and people_median.

CATEGORY_QUANTITY columns (when ordering method is "percent")

Numeric. Two columns of percentages based on the same ratio that produces the quantitative variable except grouped by the categorical variables. The CATEGORY placeholder in the column name is replaced by a category name from the categories argument; the QUANTITY placeholder is replaced by the quantitative variable name in the quantity argument. For example, suppose categories = c("program", "people"), and quantity = "grad_rate", and method = "percent". The two new column names would be program_grad_rate and people_grad_rate.

Arguments

dframe

Data frame with one numeric variable and two categorical variables of class character or factor. Two additional numeric columns required when using the "percent" ordering method.

quantity

Character, name (in quotes) of the single multiway quantitative variable

categories

Character, vector of names (in quotes) of the two multiway categorical variables

...

Not used, forces later arguments to be used by name.

method

Character, “median” (default) or “percent”, method of ordering the levels of the categories. The median method computes the medians of the quantitative column grouped by category. The percent method computes percentages based on the same ratio underlying the quantitative percentage variable except grouped by category.

ratio_of

Character vector with the names (in quotes) of the numerator and denominator columns that produced the quantitative variable, required when method is "percent". Names can be in any order; the algorithm assumes that the parameter with the larger column sum is the denominator of the ratio.

Details

In our context, "multiway" refers to the data structure and graph design defined by Cleveland (1993), not to the methods of analysis described by Kroonenberg (2008).

Multiway data comprise three variables: a categorical variable of m levels; a second independent categorical variable of n levels; and a quantitative variable (or response) of length mn that cross-classifies the categories, that is, there is a value of the response for each combination of levels of the two categorical variables.

In a multiway dot plot, one category is encoded by the panels, the second category is encoded by the rows of each panel, and the quantitative variable is encoded along identical horizontal scales.

References

Cleveland WS (1993). Visualizing Data. Hobart Press, Summit, NJ.

Kroonenberg PM (2008). Applied Multiway Data Analysis. Wiley, Hoboken, NJ.

Examples

Run this code
# Subset of built-in data set
dframe <- study_results[program == "EE" | program == "ME"]
dframe[, people := paste(race, sex)]
dframe[, c("race", "sex") := NULL]
data.table::setcolorder(dframe, c("program", "people"))

# Class before ordering
class(dframe$program)
class(dframe$people)

# Class and levels after ordering
mw1 <- order_multiway(dframe, 
                      quantity = "stickiness", 
                      categories = c("program", "people"))
class(mw1$program)
levels(mw1$program)
class(mw1$people)
levels(mw1$people)

# Display category medians 
mw1

# Existing factors (if any) are re-ordered
mw2 <- dframe
mw2$program <- factor(mw2$program, levels = c("ME", "EE"))

# Levels before conditioning
levels(mw2$program) 

# Levels after conditioning
mw2 <- order_multiway(dframe, 
                      quantity = "stickiness", 
                      categories = c("program", "people"))
levels(mw2$program) 

# Ordering using percent method
order_multiway(dframe, 
               quantity = "stickiness", 
               categories = c("program", "people"), 
               method = "percent", 
               ratio_of = c("graduates", "ever_enrolled"))

Run the code above in your browser using DataLab