PTXQC (version 0.82.6)

MQDataReader$getInvalidLines: Detect broken lines (e.g. due to Excel import+export)

Description

When a MaxQuant (MQ) txt file is edited in Microsoft Excel, saving it can corrupt the file: Excel limits the content of a single cell to 32,767 characters (see http://office.microsoft.com/en-001/excel-help/excel-specifications-and-limits-HP010342495.aspx), while MQ cells can easily reach 60k characters (e.g. in the oxidation sites column). Affected cells trigger a line break, effectively splitting one line into two (or more).
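To judge whether a file is at risk before opening it in Excel, one can check for cells longer than the limit. The following is a minimal sketch, not part of PTXQC: the helper name `has_oversized_cells` is hypothetical, and it assumes the file is tab-separated.

```r
# Excel's per-cell limit of 32,767 characters.
excel_cell_limit <- 32767

# Hypothetical helper (not part of PTXQC): returns TRUE if any
# tab-separated field in 'lines' is longer than 'limit' characters.
has_oversized_cells <- function(lines, limit = excel_cell_limit) {
  # Split every line at tabs and pool all fields into one vector.
  fields <- unlist(strsplit(lines, "\t", fixed = TRUE))
  any(nchar(fields) > limit)
}
```

For a file on disk this could be called as `has_oversized_cells(readLines(path))`, where `path` points to the MQ txt file in question.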

Arguments

Value

Returns a vector of indices of broken (i.e. invalid) lines

Details

If the table has an 'id' column, we can simply check that the numbers are consecutive. If no 'id' column is available, we detect line breaks by counting the number of NAs per row and finding outliers. The line break must then be in the outlier line (plus the preceding or following one). Depending on where the break happened, we can also detect both lines right away (if both have more NAs than expected).
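The two checks described above can be sketched roughly as follows. This is an illustration only and not the PTXQC implementation: the function names are hypothetical, the id check assumes the first row is intact, and the outlier rule (median plus three times the MAD) is an assumption chosen for simplicity.

```r
# Hypothetical helper: flag rows whose 'id' does not continue the sequence.
# Assumes the first row is intact and ids increase by 1 per row.
find_broken_by_id <- function(ids) {
  ids <- suppressWarnings(as.numeric(as.character(ids)))
  stopifnot(length(ids) > 0, !is.na(ids[1]))
  broken <- integer(0)
  expected <- ids[1]
  for (i in seq_along(ids)) {
    if (!is.na(ids[i]) && ids[i] == expected) {
      expected <- expected + 1   # row continues the sequence
    } else {
      broken <- c(broken, i)     # missing or unexpected id
    }
  }
  broken
}

# Hypothetical helper: flag rows with unusually many NAs.
# Assumed outlier rule: NA count above median + 3 * MAD.
find_broken_by_na <- function(df) {
  na_count <- rowSums(is.na(df))
  cutoff <- median(na_count) + 3 * mad(na_count)
  unname(which(na_count > cutoff))
}

# Use the 'id' column when present, otherwise fall back to NA counting.
find_broken_lines <- function(df) {
  if ("id" %in% colnames(df)) {
    find_broken_by_id(df$id)
  } else {
    find_broken_by_na(df)
  }
}
```

The id check is the more reliable of the two, since a continuation row rarely happens to carry the next expected number in its first column. The NA-based fallback only flags candidates, so the neighbouring rows should be inspected as well.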

Currently, we have no good strategy to fix the problem: the columns are no longer aligned, so they do not have the class (e.g. numeric) they should have. One would therefore need to undo the line break and read the whole file again.

[Workaround: edit the file in LibreOffice 4.0.x or above, which seems not to have this limitation]