split_headers: Split headers from a sheet or a view

Description

Split headers from a sheet or a view. This function is designed to work well with merge_headers, as some of the headers that will be detected here are going to be multi-row and therefore need collapsing.

Usage

split_headers(x, data_allow_merged = FALSE)
split_headers_find(x, use_frames = TRUE, data_allow_merged = FALSE)
split_headers_apply(x, n)

Arguments

A worksheet or a worksheet view

data_allow_merged

If TRUE, horizontal merged cells are allowed in data regions, otherwise a region is considered not to be a data region if it has any merged cells that extend horizontally. Vertical merged cells, so long as they do not span horizontally, are allowed.

Single integer indicating how many rows are considered to be headers.

Value

For split_headers and split_headers_apply, a worksheet view; in this view the data$headers element will be a worksheet view of the headers. For split_headers_find, a single integer representing the number of rows of headers found (with zero indicating no headers).

Details

Like split_metadata, we need to employ some heuristics to get this right. These will probably evolve as we throw them at more spreadsheets.

This seems pretty hard to generalise, but at the same time it's easy enough to tell when looking at a spreadsheet which bits are headers, but it seems hard to teach the package what these rules are.

Things that are going to be common include varying the background colour; to detect that we're going to need some colour distance metrics (plain colour shifts aren't enough because multi-colour headings are going to be common). Deciding what the relevant colour differences is tricky because that will depend on the sheet so I should look into perhaps something about edge detection here?

Another common way of marking out header rows will be boldness. That often will carry through to the row labels too though, but we'd be OK looking for rows where essentially everything is bold vs ones where a few things are bold.

If we're lucky enough to have split frames and we have a full sheet (not a view) then we can probably treat that as the point where the headers start.

Otherwise we can filter based on things like