split_metadata(x, include_merged = TRUE, min_jump = 2, min_data_block = 5)
split_metadata_find(x, include_merged = TRUE, min_jump = 2, min_data_block = 5)
split_metadata_apply(x, n)

For split_metadata and split_metadata_apply,
a worksheet view; in this view the data$metadata element
will be a worksheet view of the metadata. For
split_metadata_find, a single integer representing the
number of rows of metadata found (with zero indicating no
metadata).
Sometimes a worksheet is laid out like this:

    mmm
    mmm
    HHHHHHHH
    dddddddd
    dddddddd

where m is some metadata (perhaps indicating table name,
creator, dates, etc), H is the header and d is the
actual data. This function will split the metadata (m)
part off, leaving a table that is more suitable for further
processing. In many ways this is like the skip argument to
readxl::read_excel and read.csv, but we will retain
the metadata (somewhere!).
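
To illustrate the comparison with skip, here is a minimal base R sketch. It does not use this package: the CSV text and the n_meta value are made up for illustration, and the number of metadata lines is assumed to be known in advance.

```r
# Hypothetical CSV text: two metadata lines, then a header row and data.
txt <- c("Table: rainfall",
         "Creator: A. Person",
         "site,year,mm",
         "a,2001,10",
         "b,2002,12")

n_meta <- 2L                                 # assumed known for this example

# Keep the metadata lines instead of throwing them away...
metadata <- txt[seq_len(n_meta)]

# ...and skip the same number of lines when reading the table itself.
dat <- read.csv(text = txt, skip = n_meta)

names(dat)  # "site" "year" "mm"
nrow(dat)   # 2
```

With skip alone the first two lines would simply be lost; here they are kept alongside the table.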
The idea here is that the boundary between the metadata block and
the data falls where the number of non-blank cells per row shifts.
Some heuristics are needed to help, and they need to be tuneable:
whether merged cells count as non-empty (include_merged; default is
to include them), how big a jump to look for (min_jump; default is
2 columns), and how many consecutive rows of the same size to require
in the data block (min_data_block; default is 5 rows).
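
The heuristic described above can be sketched roughly as follows. This is not the package's implementation: find_metadata_rows is a hypothetical helper that takes a vector of non-blank cell counts per row, treats a "jump" as an increase of at least min_jump cells, and assumes the choice made by include_merged is already reflected in those counts.

```r
# Hypothetical helper: given the number of non-blank cells in each row,
# return the number of leading metadata rows (0 if none are found).
find_metadata_rows <- function(counts, min_jump = 2, min_data_block = 5) {
  n <- length(counts)
  for (i in seq_len(n)[-1]) {
    end <- i + min_data_block - 1
    if (end > n) break                       # not enough rows left for a data block
    jump <- counts[i] - counts[i - 1]        # change in width at row i
    block <- counts[i:end]                   # candidate header + data rows
    if (jump >= min_jump && all(block == counts[i])) {
      return(i - 1L)                         # every row before i is metadata
    }
  }
  0L
}

# Two narrow metadata rows followed by six rows that are 8 cells wide.
counts <- c(3, 3, 8, 8, 8, 8, 8, 8)
find_metadata_rows(counts)  # 2
```

Requiring min_data_block rows of equal width after the jump guards against a single wide metadata row being mistaken for the start of the table.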
Other cues that might be useful, but which aren't supported yet, include differences in colours and fonts between the metadata and the main block; these are likely to change at the point where one switches to the other.