
GAPR (version 0.1.0)

GAP: Generalized Association Plots

Description

Generates a generalized association plot for the given matrix or data frame, with optional proximity computation, ordering, flipping, coloring, and export options.

Usage

GAP(
  data,
  isProximityMatrix = FALSE,
  XdNum = NULL,
  XcNum = NULL,
  YdNum = NULL,
  YcNum = NULL,
  row.name = NULL,
  Xd.name = NULL,
  Xc.name = NULL,
  row.prox = NULL,
  col.prox = NULL,
  show.row.prox = TRUE,
  show.col.prox = TRUE,
  row.order = NULL,
  col.order = NULL,
  row.flip = NULL,
  col.flip = NULL,
  row.externalOrder = NULL,
  col.externalOrder = NULL,
  original.color = NULL,
  row.color = NULL,
  col.color = NULL,
  Xd.color = NULL,
  Xc.color = NULL,
  Yd.color = NULL,
  Yc.color = NULL,
  row.label.size = NULL,
  col.label.size = NULL,
  Xd.label.size = NULL,
  Xc.label.size = NULL,
  Yd.label.size = NULL,
  Yc.label.size = NULL,
  colorbar.margin = 1.5,
  border = FALSE,
  border.width = 1,
  isContainMissingValue = 0,
  MissingValue.color = "gray",
  exp.row_order = FALSE,
  exp.column_order = FALSE,
  exp.row_names = FALSE,
  exp.column_names = FALSE,
  exp.Xc = FALSE,
  exp.Yc = FALSE,
  exp.Xd = FALSE,
  exp.Yd = FALSE,
  exp.Xd_codebook = FALSE,
  exp.Yd_codebook = FALSE,
  exp.originalmatrix = FALSE,
  exp.row_prox = FALSE,
  exp.col_prox = FALSE,
  PNGfilename = NULL,
  PNGwidth = 1800,
  PNGheight = 1200,
  PNGres = 150,
  show.plot = FALSE
)

Value

A composite plot (e.g., heatmap with annotations) is saved or displayed. Additional information may be exported based on the settings.

If one or more export-related options (exp.*) are set to TRUE, the function returns a list containing the requested components. Each element in the list corresponds to an exportable data object.

Arguments

data

A matrix or data frame to be visualized.

isProximityMatrix

Logical. Whether the input data is already a proximity matrix.

XdNum, XcNum, YdNum, YcNum

Integer vectors of column indices in data specifying the discrete (Xd, Yd) and continuous (Xc, Yc) covariates on the X and Y axes.

row.name

Either a character vector of row names, or an integer vector of column indices in data to be used as row names.

Xd.name, Xc.name

Either a string or a character vector to be used as labels for the discrete (Xd.name) or continuous (Xc.name) X covariates.

row.prox, col.prox

A string indicating the method used to compute row/column proximity.

show.row.prox, show.col.prox

Logical. Whether to show row/column proximity matrices.

row.order, col.order

A string specifying the method used to order rows/columns.

row.flip, col.flip

A string specifying the row/column flipping method.

row.externalOrder, col.externalOrder

Integer vectors used as external references for flipping.

original.color

Color palette for the original data matrix.

row.color, col.color

Color palettes for the row/column proximity matrices.

Xd.color, Xc.color, Yd.color, Yc.color

Color palettes for covariate matrices.

row.label.size, col.label.size

Numeric values controlling the font size of row and column labels.

Xd.label.size, Xc.label.size, Yd.label.size, Yc.label.size

Numeric values controlling the font size of covariate labels for X and Y axes.

colorbar.margin

Numeric. The margin space between the colorbar and the main plot area.

border

Logical. Whether to draw borders around each matrix.

border.width

Numeric value specifying border width.

isContainMissingValue

Integer. Set to 1 if the input data contains missing values; otherwise, use 0.

MissingValue.color

Color to represent missing values in the matrix. Default is "gray".

exp.row_order, exp.column_order

Logical. Whether to export row/column order.

exp.row_names, exp.column_names

Logical. Whether to export sorted row/column names.

exp.Xc, exp.Yc, exp.Xd, exp.Yd

Logical. Whether to export sorted covariate matrices.

exp.Xd_codebook, exp.Yd_codebook

Logical. Whether to export codebooks for discrete covariates.

exp.originalmatrix

Logical. Whether to export the reordered original matrix.

exp.row_prox, exp.col_prox

Logical. Whether to export computed proximity matrices (after ordering).

PNGfilename

A string specifying the output filename for the PNG image.

PNGwidth, PNGheight

Width/height of the PNG image in pixels.

PNGres

Resolution of the PNG image in DPI.

show.plot

Logical. Whether to display the plot in the R graphics window after generation.

Details

isProximityMatrix

If isProximityMatrix = TRUE, you may directly provide a proximity matrix as the input data. In this case, only row-based settings will be applied, such as row.order, row.flip, and row.externalOrder. Note that correlation matrices (e.g., "pearson") must be converted to distance matrices before being used, and the selected color scheme must also be one of the supported diverging palettes (e.g., "GAP_Blue_White_Red", "BrBG", "PiYG", "PRGn", "PuOr", "RdBu", "RdGy").
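As a minimal sketch of the conversion mentioned above, a Pearson correlation matrix can be turned into a dissimilarity with base R. The choice of 1 - r is an assumption made for illustration; this page does not state which conversion the package expects, and the built-in iris data is only a stand-in.

```r
# Numeric part of the built-in iris data (stand-in for your own data)
x <- as.matrix(iris[, 1:4])

# Pearson correlation between columns (variables): 4 x 4, values in [-1, 1]
col_cor <- cor(x, method = "pearson")

# Assumed conversion: 1 - r maps r = 1 to distance 0 and r = -1 to distance 2
col_dist <- 1 - col_cor

# A usable distance matrix is symmetric with a zero diagonal
isSymmetric(col_dist)
all(abs(diag(col_dist)) < 1e-12)
```

The resulting col_dist could then be passed as data with isProximityMatrix = TRUE, together with row-based settings only.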

XdNum, XcNum, YdNum, YcNum

These parameters are used to specify which columns in data should be treated as covariates on the X or Y axes. Provide the column indices (e.g., XdNum = c(3, 5)) of discrete or continuous variables.
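Column indices can be looked up by name rather than counted by hand. The sketch below uses base R's match() on the built-in iris data, which is only a stand-in; the variable names y_discrete and y_continuous are illustrative and not part of the package.

```r
df <- iris

# Index of a discrete (factor) column, looked up by name
y_discrete <- match("Species", names(df))

# Indices of continuous columns, looked up by name
y_continuous <- match(c("Sepal.Length", "Sepal.Width"), names(df))

# match() returns NA for a misspelled name, so check before use
stopifnot(!anyNA(c(y_discrete, y_continuous)))

y_discrete    # 5
y_continuous  # 1 2
```

These vectors could then be supplied as, for example, YdNum = y_discrete and YcNum = y_continuous.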

Xd.name, Xc.name

If not provided, the default labels will be a sequence of numbers based on the number of selected variables (e.g., "1", "2", ..., up to the length of XdNum or XcNum).

row.name

This parameter can be:

  • A character vector providing custom row names.

  • An integer or integer vector of column indices indicating the column(s) in data to be used as row names.

  • NULL (the default), in which case the row names are generated automatically as 1:nrow(data).

row.prox, col.prox

Available proximity methods for row.prox and col.prox include:

  • "euclidean"

  • "pearson"

  • "kendall"

  • "spearman"

  • "atancorr" (adjusted tangent correlation)

  • "city-block" (Manhattan distance)

  • "abs_pearson"

  • "uncenteredcorr"

  • "abs_uncenteredcorr"

  • "maximum"

  • "canberra"

For binary data, the following methods are supported:

  • "hamman"

  • "jaccard"

  • "phi"

  • "rao"

  • "rogers"

  • "simple"

  • "sneath"

  • "yule"

show.row.prox, show.col.prox

If set to TRUE, the corresponding proximity matrix will be visualized. If set to FALSE, the proximity matrix will not be shown, but the associated proximity and ordering methods will still be applied. In such cases, the dendrogram (tree structure) will appear alongside the original plot, reflecting the proximity-based ordering.

row.order, col.order

The ordering method determines how the rows or columns are reordered. Supported options include:

  • "original" — Use the original data order.

  • "random" — Randomly permute the order.

  • "reverse" — Reverse the original order.

  • "r2e" — Rank-two ellipse ordering.

  • "single" — Single-linkage hierarchical clustering.

  • "complete" — Complete-linkage hierarchical clustering.

  • "average" — Average-linkage hierarchical clustering (UPGMA).

  • any method name from the seriation package — such as "TSP", "Spectral", "ARSA", etc.

If the ordering method is "original", "random", or "reverse", then proximity matrices are not required, and the parameters row.prox or col.prox may be left unset.

For all other ordering methods, a proximity matrix must be computed first. Therefore, row.prox or col.prox must be specified accordingly.

Note: it is necessary to explicitly specify one of the valid ordering options; the function does not assume a default.
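To make the dependency on a proximity matrix concrete, the sketch below reproduces the general idea of Euclidean proximity followed by average-linkage ordering, using base R's dist() and hclust(). It illustrates the concept only and is not GAPR's internal implementation; the order it produces may differ from what GAP() returns, for example because of flipping.

```r
# First ten rows of the numeric iris columns (stand-in data)
x <- as.matrix(iris[1:10, 1:4])

# Step 1: proximity matrix (Euclidean distance between rows)
row_prox <- dist(x, method = "euclidean")

# Step 2: average-linkage hierarchical clustering (UPGMA) on that proximity
tree <- hclust(row_prox, method = "average")

# Step 3: the leaf order of the tree is a permutation of the row indices
row_order <- tree$order
row_order

# Rows rearranged according to the tree
x[row_order, ]
```

Without step 1 there is nothing for step 2 to cluster, which is why the proximity method must be given for any tree-based or seriation-based ordering.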

row.flip, col.flip

Supported flipping methods include:

  • "r2e" — Flip using the rank-two ellipse (R2E) method.

  • "uncle" — Apply uncle-flipping based on tree structure.

  • "grandpa" — Apply grandpa-flipping based on tree structure.

Usage restrictions:

  1. Flipping is only applicable when a hierarchical clustering tree is generated. Therefore, if row.order or col.order is set to "original", "random", "reverse", "r2e", or a seriation method, tree structures are not built and flipping cannot be applied.

  2. When using "r2e" as the ordering method, only "r2e" flipping is allowed. "uncle" or "grandpa" flipping will be ignored.

  3. Do not specify both externalOrder and flip at the same time. These options are mutually exclusive. If both are provided, the function will throw an error.

row.externalOrder, col.externalOrder

External orders are used as references when flipping the hierarchical clustering tree. If a tree is available, the external order guides the flipping of the dendrogram’s leaf nodes to better match a predefined sequence.

Important: Do not use externalOrder together with flip; they are mutually exclusive.

Color settings

The function supports a variety of color palette options for visualizing the original matrix, proximity matrices, and covariate matrices.

Supported built-in palettes include:

  • "GAP_Rainbow"

  • "GAP_Blue_White_Red"

  • "GAP_d"

  • "grayscale_palette"

You may also specify any palette name from the RColorBrewer package. However, note that some palettes—such as those under the "Qualitative" category—are not suitable for visualizing continuous data like proximity matrices.

All palette names must be passed as character strings (e.g., "GAP_Rainbow", "Set1").

original.color: The system will automatically determine the appropriate default color palette based on data type. If the input data is binary, the default is a grayscale palette; otherwise, it defaults to "GAP_Rainbow".

row.color, col.color: The system chooses a default palette based on the proximity method used. For distance-based methods (e.g., "euclidean", "city-block"), the default is "GAP_Rainbow". For correlation-based methods (e.g., "pearson", "spearman"), the default is "GAP_Blue_White_Red".

Xd.color, Yd.color (discrete covariates): The default color palette is "GAP_d", which supports up to 16 distinct categories. If there are more than 16 unique levels, a custom palette should be provided by the user.
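Before relying on the default "GAP_d" palette, it can help to count the distinct levels of each discrete covariate. This sketch uses base R only; the helper name count_levels and the toy data frame are hypothetical, and the limit of 16 is taken from the text above.

```r
# Hypothetical helper: number of distinct non-missing values per selected column
count_levels <- function(data, cols) {
  vapply(data[cols], function(v) length(unique(v[!is.na(v)])), integer(1))
}

# Toy data: two discrete columns and one continuous column
df <- data.frame(
  group = c("a", "b", "a", "c"),
  batch = c(1, 1, 2, 2),
  value = c(0.3, 1.2, 2.5, 0.7)
)

n_levels <- count_levels(df, c(1, 2))
n_levels              # group: 3, batch: 2

# TRUE when every discrete covariate fits the 16-category default palette
all(n_levels <= 16)
```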

Label size settings

Font sizes for axis labels and covariate matrices can be customized individually. Default values are:

  • row.label.size: 2

  • col.label.size: 8

  • Xd.label.size, Xc.label.size, Yd.label.size, Yc.label.size: 8

You may increase or decrease these values to improve readability depending on figure size and resolution.

Export-related options (exp.*)

When any of the exp.* parameters are set to TRUE, the corresponding data will be stored in a list and returned by the function. This allows users to programmatically retrieve the order, reordered matrix, proximity matrices, covariate data, or codebooks after plotting.

PNG output settings

The following parameters control the export of the PNG image:

  • PNGfilename: The name of the PNG file to be saved.

    • The file extension .png must be included manually (e.g., "myplot.png").

    • If no file path is specified, the image will be saved in a system-generated temporary directory (via tempdir()) using the default filename "output_plot.png".

    • To save the image to a specific location, provide the full path (e.g., "C:/.../myplot.png").

  • PNGwidth: Width of the output image in pixels. Default = 1800.

  • PNGheight: Height of the output image in pixels. Default = 1200.

  • PNGres: Resolution (dots per inch, DPI). Default = 150.
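Pixel dimensions and resolution together determine the physical size of the image: inches = pixels / DPI. The sketch below checks what the defaults imply and shows one way to build an explicit output path; the filename "myplot.png" is only an example.

```r
png_width  <- 1800  # pixels (default PNGwidth)
png_height <- 1200  # pixels (default PNGheight)
png_res    <- 150   # DPI (default PNGres)

# Physical size implied by the defaults
width_in  <- png_width / png_res    # 12 inches
height_in <- png_height / png_res   # 8 inches

# Explicit path inside the session's temporary directory
out_file <- file.path(tempdir(), "myplot.png")
```

Doubling PNGres while keeping the pixel dimensions fixed halves the physical size, so labels appear larger relative to the figure.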

Examples

# Example using the crabs dataset from the MASS package
if (requireNamespace("MASS", quietly = TRUE)) {
  df_crabs <- MASS::crabs
  CRAB_result <- GAP(
    data = df_crabs,
    YdNum = c(1,2),        # First two columns as Y discrete covariates
    YcNum = 3,             # Third column as Y continuous covariate
    row.name = c(1,2,3),   # Use first three columns as row names
    row.prox = "euclidean",
    col.prox = "euclidean",
    row.order = "average",
    col.order = "average",
    row.flip = "r2e",
    col.flip = "r2e",
    border = TRUE,
    border.width = 1,
    exp.row_order = TRUE,
    exp.column_order = TRUE,
    exp.row_names = TRUE,
    exp.column_names = TRUE,
    exp.Yd_codebook = TRUE,
    exp.Yd = TRUE,
    exp.Yc = TRUE,
    exp.originalmatrix = TRUE,
    exp.row_prox = TRUE,
    exp.col_prox = TRUE,
    PNGfilename = file.path(tempdir(), "output_plot.png"),
    show.plot = TRUE
  )

  # Access exported results:
  CRAB_result$row_order       # Row order after ordering
  CRAB_result$column_order    # Column order after ordering
  CRAB_result$row_names       # Row names after ordering
  CRAB_result$column_names    # Column names after ordering
  CRAB_result$Yd_codebook     # Codebook for Y discrete covariates
  CRAB_result$Yd              # Y discrete covariates after ordering
  CRAB_result$Yc              # Y continuous covariates after ordering
  CRAB_result$originalmatrix  # Original matrix (after ordering)
  CRAB_result$row_prox        # Row proximity matrix (after ordering)
  CRAB_result$col_prox        # Column proximity matrix (after ordering)

  # Evaluate row ordering quality
  AR(CRAB_result$row_prox, CRAB_result$row_order)
  GAR(CRAB_result$row_prox, CRAB_result$row_order, w = 10)
  RGAR(CRAB_result$row_prox, CRAB_result$row_order, w = 10)

  # Evaluate column ordering quality
  AR(CRAB_result$col_prox, CRAB_result$column_order)
  GAR(CRAB_result$col_prox, CRAB_result$column_order)
  RGAR(CRAB_result$col_prox, CRAB_result$column_order)
}
