unpivotr (version 0.3.1)

join_header: Join a bag of data cells to some header, by proximity in a given direction, e.g. NNW searches up and up-left from a data cell to find a header cell.

Description

A bag of data cells is a data frame with at least the columns 'row' and 'col', as well as any others that carry information about the cells, e.g. their values. Cells in a table are associated with header cells by proximity. Having collected header cells and data cells into separate data frames, 'join_header' and the related functions 'NNW', 'ABOVE', etc., join the values in the header cells to the data cells, choosing the nearest header to each cell in a given direction.
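As a minimal sketch of what these two data frames might look like, the base R snippet below builds a small bag of data cells and a matching bag of header cells by hand. The table layout, the values, and the column names 'value' and 'header' are invented for illustration; only 'row' and 'col' are required by the description above.

```r
# Hypothetical 2 x 3 table: one header row above one row of values.
#
#          col 1    col 2    col 3
#   row 1  "Male"   (empty)  "Female"
#   row 2  10       20       30

# Header cells: one row per non-empty header cell.
header <- data.frame(
  row = c(1L, 1L),
  col = c(1L, 3L),
  header = c("Male", "Female"),
  stringsAsFactors = FALSE
)

# Data cells: one row per value cell, with its position and value.
bag <- data.frame(
  row = c(2L, 2L, 2L),
  col = c(1L, 2L, 3L),
  value = c(10, 20, 30)
)

bag
header
```

Each row of either data frame describes one cell of the original table, so the position of a cell survives even after the table has been taken apart.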

Usage

join_header(bag, header, direction, boundaries = NULL)

N(bag, header)

E(bag, header)

S(bag, header)

W(bag, header)

NNW(bag, header)

NNE(bag, header)

ENE(bag, header)

ESE(bag, header)

SSE(bag, header)

SSW(bag, header)

WSW(bag, header)

WNW(bag, header)

ABOVE(bag, header, boundaries = NULL)

LEFT(bag, header, boundaries = NULL)

BELOW(bag, header, boundaries = NULL)

RIGHT(bag, header, boundaries = NULL)

Arguments

bag

Data frame. A bag of data cells including at least the columns 'row' and 'col', which are numeric/integer vectors.

header

Data frame. A bag of header cells including at least the columns 'row' and 'col', which are numeric/integer vectors.

direction

Character vector of length 1. A compass direction in which to search for the nearest header. See 'Details'.

boundaries

Data frame. Only applies to the directions "ABOVE", "RIGHT", "BELOW" and "LEFT". A bag of cells in one row or one column, demarcating boundaries within which to match headers with cells. For example, a boundary could be a bag of cells with borders on one side. This is useful when the nearest header might be the wrong header because it lies on the other side of a border.
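The idea behind boundaries can be sketched in base R without calling the package. This is an illustration only, not the package's implementation: the helper 'group_of' is hypothetical, the cell positions are invented, and the sketch assumes that each boundary cell marks the first column of a group of columns.

```r
# Headers in row 1 at columns 3 and 6; data cells in row 2, columns 1 to 6.
header <- data.frame(row = 1L, col = c(3L, 6L),
                     header = c("A", "B"), stringsAsFactors = FALSE)
bag <- data.frame(row = 2L, col = 1:6)

# Assumption: each boundary cell marks the first column of a group,
# so columns 1-3 form one group and columns 4-6 form another.
boundaries <- data.frame(row = 1L, col = c(1L, 4L))

# Hypothetical helper: the index of the group that a column falls into.
group_of <- function(col, boundaries) {
  findInterval(col, sort(boundaries$col))
}

# Match each data cell to the header that shares its group.
bag$header <- header$header[
  match(group_of(bag$col, boundaries), group_of(header$col, boundaries))
]
bag$header
```

In this layout the data cell in column 4 is one column away from header "A" and two columns away from header "B", so distance alone would pick "A". With the boundary at column 4 it is matched to "B", which lies on the same side of the border.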

Functions

  • N: Join nearest header in the 'N' direction.

  • E: Join nearest header in the 'E' direction.

  • S: Join nearest header in the 'S' direction.

  • W: Join nearest header in the 'W' direction.

  • NNW: Join nearest header in the 'NNW' direction.

  • NNE: Join nearest header in the 'NNE' direction.

  • ENE: Join nearest header in the 'ENE' direction.

  • ESE: Join nearest header in the 'ESE' direction.

  • SSE: Join nearest header in the 'SSE' direction.

  • SSW: Join nearest header in the 'SSW' direction.

  • WSW: Join nearest header in the 'WSW' direction.

  • WNW: Join nearest header in the 'WNW' direction.

  • ABOVE: Join nearest header in the 'ABOVE' direction.

  • LEFT: Join nearest header in the 'LEFT' direction.

  • BELOW: Join nearest header in the 'BELOW' direction.

  • RIGHT: Join nearest header in the 'RIGHT' direction.

Details

Headers are associated with data by proximity in a given direction. The directions are mapped to the points of the compass, where 'N' is north (up), 'E' is east (right), and so on. `join_header()` finds the nearest header to a given data cell in a given direction, and joins its value to the data cell. The most common directions to search are 'NNW' (for left-aligned headers at the top of the table) and 'WNW' (for top-aligned headers at the side of the table).

The difference between 'N' and 'ABOVE' (and similar pairs of directions) is that 'N' finds headers directly above the data cell, whereas 'ABOVE' matches the nearest header, whether above-left, above-right or directly above the data cell. This is useful for matching headers that are not aligned to the edge of the data cells that they refer to. There can be a tie in the directions 'ABOVE', 'BELOW', 'LEFT' and 'RIGHT', causing NAs to be returned in place of header values.

The full list of available directions is 'N', 'E', 'S', 'W', 'NNW', 'NNE', 'ENE', 'ESE', 'SSE', 'SSW', 'WSW', 'WNW', 'ABOVE', 'BELOW', 'LEFT', 'RIGHT'. For convenience, each direction is also provided as its own function that wraps `join_header()`.
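The difference between 'N', 'NNW' and 'ABOVE' can be illustrated with a rough base R sketch. It does not call the package and is not its implementation: the helpers 'match_n', 'match_nnw' and 'match_above' are hypothetical, the data are invented, and the sketch assumes that all header cells sit in a single row above the data cells.

```r
header <- data.frame(row = 1L, col = c(1L, 3L),
                     header = c("Male", "Female"), stringsAsFactors = FALSE)
bag <- data.frame(row = 2L, col = 1:4, value = c(10, 20, 30, 40))

# 'N'-style: the header must be in exactly the same column as the data cell.
match_n <- function(bag, header) {
  header$header[match(bag$col, header$col)]
}

# 'NNW'-style: the nearest header at or to the left of the data cell's column.
match_nnw <- function(bag, header) {
  header <- header[order(header$col), ]
  idx <- findInterval(bag$col, header$col) # 0 if no header at or to the left
  idx[idx == 0] <- NA
  header$header[idx]
}

# 'ABOVE'-style: the nearest header by column distance, left or right.
# An equidistant pair of headers is a tie, which yields NA.
match_above <- function(bag, header) {
  vapply(bag$col, function(col) {
    d <- abs(header$col - col)
    nearest <- which(d == min(d))
    if (length(nearest) == 1L) header$header[nearest] else NA_character_
  }, character(1))
}

match_n(bag, header)     # "Male" NA     "Female" NA
match_nnw(bag, header)   # "Male" "Male" "Female" "Female"
match_above(bag, header) # "Male" NA     "Female" "Female"
```

The data cell in column 2 shows all three behaviours: 'N'-style matching finds no header directly above it, 'NNW'-style matching falls back to the header to its upper left, and 'ABOVE'-style matching returns NA because the two headers are equally far away.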

Examples

# NOT RUN {
library(dplyr)
library(unpivotr) # provides purpose, tidy_table() and the direction functions
# Load some pivoted data
(x <- purpose$`NNW WNW`)
# Make a tidy representation
cells <- tidy_table(x)
cells <- cells[!is.na(cells$chr), ]
head(cells)
# Select the cells containing the values
datacells <-
  cells %>%
  filter(row >= 3, col >= 3)
head(datacells)
# Select the row headers
row_headers <-
  cells %>%
  filter(col <= 2) %>%
  select(row, col, header = chr) %>%
  split(.$col) # Separate each column of headers
row_headers
# Select the column headers
col_headers <-
  cells %>%
  filter(row <= 2) %>%
  select(row, col, header = chr) %>%
  split(.$row) # Separate each row of headers
col_headers
# From each data cell, search for the nearest one of each of the headers
datacells %>%
  NNW(col_headers$`1`) %>%
  N(col_headers$`2`) %>%
  WNW(row_headers$`1`) %>%
  W(row_headers$`2`)
# }
