Learn R Programming

flattabler

Pivot tables are generally used to present raw and summary data. They are generated from spreadsheets and, more recently, also from R (pivottabler).

If we generate pivot tables from our own data, flattabler package is not necessary. But, if we get data in pivot table format and need to represent or analyse it using another tool, this package can be very helpful: It can save us several hours of programming or manual transformation.

flattabler package offers a set of operations that allow us to transform one or more pivot tables into a flat table.

Installation

You can install the released version of flattabler from CRAN with:

install.packages("flattabler")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("josesamos/flattabler")

Example

A pivot table contains label rows and columns, and an array of values, usually numeric data. It can contain additional information, such as table header or footer.

Below is an example of a pivot table obtained from the pivottabler package. It is included in flattabler package in the form of the variable df_pivottabler, defined as a data frame.

V1V2V3V4V5V6V7V8V9V10
Express PassengerOrdinary PassengerTotal
DMUEMUHSTTotalDMUEMUTotal
Number of TrainsArriva Trains Wales307930798308303909
CrossCountry2213373222865636322928
London Midland56388849144875591282013379248279
Virgin Trains2137645785948594
Total3298715306732490256484282013468583710
Maximum SpeedArriva Trains Wales9090909090
CrossCountry125125125100100125
London Midland100110110100100100110
Virgin Trains125125125125
Total125125125125100100100125

The transformation to obtain a flat table from the previous pivot table using flattabler package is as follows:

library(flattabler)

ft <- pivot_table(df_pivottabler) |>
  define_labels(n_col = 2, n_row = 2) |>
  fill_labels() |>
  remove_agg() |>
  fill_values() |>
  unpivot(na_rm = TRUE)

The result is a tibble object that can be further transformed, for example, by the dplyr package to remove the added data.

ft <- ft |>
  dplyr::filter(col2 != "Total") |>
  dplyr::filter(row2 != "Total")

The result obtained is as follows:

col1col2row1row2value
Number of TrainsArriva Trains WalesExpress PassengerDMU3079
Number of TrainsArriva Trains WalesOrdinary PassengerDMU830
Number of TrainsCrossCountryExpress PassengerDMU22133
Number of TrainsCrossCountryExpress PassengerHST732
Number of TrainsCrossCountryOrdinary PassengerDMU63
Number of TrainsLondon MidlandExpress PassengerDMU5638
Number of TrainsLondon MidlandExpress PassengerEMU8849
Number of TrainsLondon MidlandOrdinary PassengerDMU5591
Number of TrainsLondon MidlandOrdinary PassengerEMU28201
Number of TrainsVirgin TrainsExpress PassengerDMU2137
Number of TrainsVirgin TrainsExpress PassengerEMU6457
Maximum SpeedArriva Trains WalesExpress PassengerDMU90
Maximum SpeedArriva Trains WalesOrdinary PassengerDMU90
Maximum SpeedCrossCountryExpress PassengerDMU125
Maximum SpeedCrossCountryExpress PassengerHST125
Maximum SpeedCrossCountryOrdinary PassengerDMU100
Maximum SpeedLondon MidlandExpress PassengerDMU100
Maximum SpeedLondon MidlandExpress PassengerEMU110
Maximum SpeedLondon MidlandOrdinary PassengerDMU100
Maximum SpeedLondon MidlandOrdinary PassengerEMU100
Maximum SpeedVirgin TrainsExpress PassengerDMU125
Maximum SpeedVirgin TrainsExpress PassengerEMU125

Once we have defined the necessary transformations for a pivot table, we can apply them to any other with the same structure. Candidate tables can have different number of rows or columns, depending on the number of labels, but they must have the same number of rows and columns of labels, and the same number of header or footer rows, so that the transformations are the same for each table.

To easily perform this operation, we define a function f from the transformations, as shown below.

f <- function(pt) {
  pt |>
    set_page(1, 1) |>
    define_labels(n_col = 2, n_row = 2) |>
    remove_top(1) |>
    fill_labels() |>
    remove_agg() |>
    fill_values() |>
    remove_k() |>
    replace_dec() |>
    unpivot()
}

folder <- system.file("extdata", "csvfolder", package = "flattabler")
lpt <- read_text_folder(folder)

lft <- flatten_table_list(lpt, f)

lft
#> # A tibble: 201 × 6
#>    page  col1  col2  row1  row2  value
#>    <chr> <chr> <chr> <chr> <chr> <chr>
#>  1 M1    b1    a01   e2    d4    1.88 
#>  2 M1    b1    a05   e1    d1    1.91 
#>  3 M1    b1    a05   e2    d3    1.10 
#>  4 M1    b1    a05   e2    d4    2.25 
#>  5 M1    b1    a09   e1    d1    2.55 
#>  6 M1    b1    a09   e1    d2    2.74 
#>  7 M1    b1    a09   e2    d3    3.99 
#>  8 M1    b1    a13   e1    d1    2.99 
#>  9 M1    b1    a13   e1    d2    1.02 
#> 10 M1    b1    a13   e2    d3    3.48 
#> # ℹ 191 more rows

In this way we can generate a flat table from a list of pivot tables. The list of pivot tables is generated using package functions to import them from various data sources.

Copy Link

Version

Install

install.packages('flattabler')

Monthly Downloads

269

Version

2.1.2

License

MIT + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Jose Samos

Last Published

May 1st, 2024

Functions in flattabler (2.1.2)

remove_agg

Remove rows and columns with aggregated data
get_page

Get the page information of a pivot table
pf_ex_compact

Pivot table result of transforming a data frame with a column with data from two label fields
remove_cols

Remove columns from a pivot table
spacer_rows

Spacer rows
remove_bottom

Remove bottom rows from a pivot table
remove_k

Remove thousands separator
remove_empty

Remove empty rows and columns from a pivot table
spacer_columns

Spacer columns
read_text_file

Import text file
read_excel_sheet

Import Excel file sheet
remove_top

Remove top rows from a pivot table
remove_rows

Remove rows from a pivot table
pt_ex

Pivot table with with thousands indicator and decimal numbers
remove_right

Remove right columns from a pivot table
remove_left

Remove left columns from a pivot table
pivot_table

pivot_table S3 class
fill_values

Fill in missing values
read_excel_file

Import Excel file
read_excel_folder

Import one sheet from each Excel file in a folder
unpivot

Unpivot a pivot table
replace_dec

Replace decimal separator
set_page

Set page information to a pivot table
df_set_h_v

Set of pivot tables on one sheet
define_labels

Define the quantity of rows and columns that contain labels
divide

Divide table
extract_labels

Extract labels
df_set_h

Set of pivot tables placed horizontally on one sheet
df_ex

Pivot table in data frame with with thousands indicator and decimal numbers
df_ex_compact

Pivot table in data frame with a column with data from two label fields
df_set_v

Set of pivot tables placed vertically on one sheet
assign_names

data frame col and row names
df_pivottabler

Pivot table with basic and subtotal labels in the same column
fill_labels

Fill in missing labels
flatten_table_list

Transform a pivot_table object list into a flat table
ft_set

Flat table with page from a pivot table with with thousands indicator and decimal numbers
ft_ex

Flat table with page from a pivot table with with thousands indicator and decimal numbers
fill_vector

Fill in missing values in a vector
ft_ex_v2

Flat table without page from a pivot table with with thousands indicator and decimal numbers
get_col_values

Get column values
read_text_folder

Import all text files in a folder