Learn R Programming

replyr (version 0.8.2)

moveValuesToColumnsQ: Map sets rows to columns (query based).

Description

Transform data facts from rows into additional columns using SQL and controlTable.

Usage

moveValuesToColumnsQ(keyColumns, controlTable, tallTableName, my_db, ...,
  columnsToCopy = NULL,
  tempNameGenerator = replyr::makeTempNameGenerator("mvtcq"), strict = TRUE,
  checkNames = TRUE, showQuery = FALSE)

Arguments

keyColumns

character list of column defining row groups

controlTable

table specifying mapping (local data frame)

tallTableName

name of table containing data to be mapped (db/Spark data)

my_db

db handle

...

force later arguments to be by name.

columnsToCopy

character list of column names to copy

tempNameGenerator

a tempNameGenerator from replyr::makeTempNameGenerator()

strict

logical, if TRUE check control table contents for uniqueness

checkNames

logical, if TRUE check names

showQuery

if TRUE print query

Value

wide table built by mapping key-grouped tallTable rows to one row per group

Details

This is using the theory of "fluid data"n (https://github.com/WinVector/cdata), which includes the principle that each data cell has coordinates independent of the storage details and storage detail dependent coordinates (usually row-id, column-id, and group-id) can be re-derived at will (the other principle is that there may not be "one true preferred data shape" and many re-shapings of data may be needed to match data to different algorithms and methods).

The controlTable defines the names of each data element in the two notations: the notation of the tall table (which is row oriented) and the notation of the wide table (which is column oriented). controlTable[ , 1] (the group label) cross colnames(controlTable) (the column labels) are names of data cells in the long form. controlTable[ , 2:ncol(controlTable)] (column labels) are names of data cells in the wide form. To get behavior similar to tidyr::gather/spread one build the control table by running an appropiate query over the data.

Some discussion and examples can be found here: https://winvector.github.io/replyr/articles/FluidData.html and here https://github.com/WinVector/cdata.

See Also

moveValuesToColumns, moveValuesToRowsQ, buildPivotControlTable

Examples

Run this code
# NOT RUN {
my_db <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
tallTableName <- 'dat'
d <- dplyr::copy_to(my_db,
  dplyr::tribble(
   ~ID,   ~group, ~col1,              ~col2,
   "id1", "aa",   "val_id1_gaa_col1", "val_id1_gaa_col2",
   "id1", "bb",   "val_id1_gbb_col1", "val_id1_gbb_col2",
   "id2", "aa",   "val_id2_gaa_col1", "val_id2_gaa_col2",
   "id2", "bb",   "val_id2_gbb_col1", "val_id2_gbb_col2",
   "id3", "aa",   "val_id3_gaa_col1", "val_id3_gaa_col2",
   "id3", "bb",   "val_id3_gbb_col1", "val_id3_gbb_col2" ),
         tallTableName,
         overwrite = TRUE, temporary=TRUE)
controlTable <- dplyr::tribble(~group, ~col1, ~col2,
                                 'aa',  'c1',  'c2',
                                 'bb',  'c3',  'c4')
keyColumns <- 'ID'
moveValuesToColumnsQ(keyColumns,
                     controlTable,
                     tallTableName,
                     my_db)
# # Source:   table<mvtcq_y579atnjk3zevjqvkeok_0000000001> [?? x 5]
# # Database: sqlite 3.19.3 [:memory:]
#      ID               c1               c2               c3               c4
#   <chr>            <chr>            <chr>            <chr>            <chr>
# 1   id1 val_id1_gaa_col1 val_id1_gaa_col2 val_id1_gbb_col1 val_id1_gbb_col2
# 2   id2 val_id2_gaa_col1 val_id2_gaa_col2 val_id2_gbb_col1 val_id2_gbb_col2
# 3   id3 val_id3_gaa_col1 val_id3_gaa_col2 val_id3_gbb_col1 val_id3_gbb_col2

# }

Run the code above in your browser using DataLab