split_metadata(x, include_merged = TRUE, min_jump = 2, min_data_block = 5)
split_metadata_find(x, include_merged = TRUE, min_jump = 2, min_data_block = 5)
split_metadata_apply(x, n)
For split_metadata and split_metadata_apply, a worksheet view; in this view the data$metadata element will be a worksheet view of the metadata. For split_metadata_find, a single integer giving the number of rows of metadata found (zero indicates no metadata).
Consider a sheet laid out as

    mmm
    mmm
    HHHHHHHH
    dddddddd
    dddddddd

where m is some metadata (perhaps indicating table name, creator, dates, etc.), H is the header and d is the actual data. This function splits the metadata (m) part off, leaving a table that is more suitable for further processing. In many ways this is like the skip argument to readxl::read_excel and read.csv, but we retain the metadata (in the data$metadata element of the result) rather than discarding it.
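As a rough illustration of the split itself, here is a minimal base R sketch. It assumes the sheet has already been read into a plain character matrix (the hypothetical sheet below) rather than the worksheet view the package actually works with, and split_at_row() is a hypothetical helper, not part of the package. It only mirrors what split_metadata_apply(x, n) is described as doing: peeling off the first n rows as metadata.

```r
# Hypothetical helper: works on a plain character matrix, NOT on the
# package's worksheet view objects.
split_at_row <- function(m, n) {
  stopifnot(is.matrix(m), n >= 0, n <= nrow(m))
  rows <- seq_len(nrow(m))
  # Logical row selection avoids the m[-integer(0), ] trap when n == 0
  list(
    metadata = m[rows <= n, , drop = FALSE],  # first n rows: the "m" block
    data     = m[rows > n, , drop = FALSE]    # header ("H") followed by data ("d")
  )
}

# Hypothetical sheet: two metadata rows, one header row, five data rows
sheet <- rbind(
  c("Table 1: sales",     "",       "",      ""),
  c("Created 2016-01-01", "",       "",      ""),
  c("id",                 "region", "units", "price"),
  c("1",                  "north",  "10",    "2.5"),
  c("2",                  "south",  "12",    "2.5"),
  c("3",                  "east",   "7",     "3.0"),
  c("4",                  "west",   "9",     "3.0"),
  c("5",                  "north",  "11",    "2.0")
)

parts <- split_at_row(sheet, 2)
parts$data[1, ]  # the header row
```

Selecting rows with a logical test rather than negative indexing keeps the n = 0 case (no metadata) correct, since m[-integer(0), ] would silently return zero rows.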
The idea here is that the metadata block ends, and the header and data block begins, where the number of non-blank cells per row shifts: in the layout above, the metadata rows are narrow while the header and data rows share a wider, consistent width. Some heuristics are needed to detect this, and they need to be tunable:

- include_merged: do merged cells count as non-empty? (default TRUE, i.e. merged cells are counted)
- min_jump: how big a jump in the number of non-blank cells to look for (default 2 columns)
- min_data_block: how many rows of the same size to look for in the data block (default 5 rows)
Other cues that might be useful, but which aren't supported yet, include differences in colours and fonts between the metadata and the main block; the switch from one to the other is likely to coincide with a change in formatting.