The parseLatex function parses LaTeX source, producing a structured object.
parseLatex(
  text,
  verbose = FALSE,
  verbatim = c("verbatim", "verbatim*", "Sinput", "Soutput"),
  verb = "\\Sexpr",
  defcmd = c("\\newcommand", "\\renewcommand", "\\providecommand", "\\def",
             "\\let"),
  defenv = c("\\newenvironment", "\\renewenvironment"),
  catcodes = defaultCatcodes,
  recover = FALSE,
  showErrors = recover,
  ...
)
parseLatex returns the parsed LaTeX in a list with class "LaTeX2". Items in the list have class "LaTeX2item".
text: A character vector containing LaTeX source code.
verbose: If TRUE, print debug error messages.
verbatim: A character vector containing the names of LaTeX environments holding verbatim text.
verb: A character vector containing LaTeX macros that should be assumed to hold verbatim text.
defcmd, defenv: Character vectors of macros that are assumed to define new macro commands or environments, respectively. See the note below about some limitations.
catcodes: A list or data frame holding LaTeX "catcodes", such as defaultCatcodes.
recover: If TRUE, attempt to recover from errors and continue parsing. See Details below.
showErrors: If TRUE, show errors after parsing.
...: Additional parameters to pass to showErrors.
The LaTeX defining commands have fairly simple syntax, but \def and \let from plain TeX have quite variable syntax, and parseLatex() does not attempt to handle all of it. Stick with simple syntax like \def\bea{\begin{eqnarray}} and it should work.
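As a minimal sketch of the simple syntax recommended above, the following parses a one-line \def definition. It assumes the parseLatex package is installed and R 4.0 or later (for raw strings); the variable name defs is only illustrative.

```r
library(parseLatex)

# A simple \def in the form the note above recommends.
# r"(...)" is a raw string (R >= 4.0), so backslashes need no doubling.
defs <- parseLatex(r"(\def\bea{\begin{eqnarray}})")

# The result is a list with class "LaTeX2".
class(defs)

# Print the parsed object.
defs
```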
Some versions of LaTeX such as pdflatex only handle ASCII inputs, while others such as xelatex allow Unicode input. parseLatex allows Unicode input.
During processing of LaTeX input, an interpreter can change the handling of characters as it goes, using the \catcode macro or others such as \makeatletter. However, parseLatex() is purely a parser, not an interpreter, so it can't do that, but the user can change handling for the whole call using the catcodes argument.
catcodes should be a list or data frame with at least two columns:
char should be a column of single characters.
catcode should be a column of integers in the range 0 to 15 giving the corresponding catcode.
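A table in this shape might be built as in the sketch below, which treats "@" as a letter for one call (roughly the effect \makeatletter would have in an interpreter). It assumes that defaultCatcodes can be converted to a data frame with columns named char and catcode; the names cats and parsed_at and the macro \my@macro are made up for illustration.

```r
library(parseLatex)

# Start from the defaults and keep only the two required columns.
# Assumption: defaultCatcodes has columns named "char" and "catcode".
cats <- as.data.frame(defaultCatcodes)[, c("char", "catcode")]

# Add "@" as a letter (catcode 11) for this call.
cats <- rbind(cats, data.frame(char = "@", catcode = 11L))

# The custom table applies to the whole call.
parsed_at <- parseLatex(r"(\my@macro{x})", catcodes = cats)
parsed_at
```

Starting from the defaults rather than an empty table keeps the usual meaning of characters such as the backslash and braces.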
During parsing, parseLatex will check these values first. If the input character doesn't match anything, then it will be categorized:
as a letter (catcode 11) using the ICU function u_hasBinaryProperty(c, UCHAR_ALPHABETIC) (or iswalpha(c) on Windows),
as a control character (catcode 15) if its code point is less than 32,
as "other" (catcode 12) otherwise.
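The lookup order described above can be sketched in plain R. The helper guess_catcode and the table tbl are hypothetical and not part of parseLatex, and the [[:alpha:]] regular expression class only stands in for the ICU or iswalpha() test, so results may differ from the parser for some characters.

```r
# Hypothetical helper, not part of parseLatex: mimics the lookup order
# described above for a single character.
guess_catcode <- function(ch, catcodes) {
  # 1. Explicit entries in the catcodes table win.
  hit <- match(ch, catcodes$char)
  if (!is.na(hit)) return(as.integer(catcodes$catcode[hit]))
  # 2. Alphabetic characters are letters. [[:alpha:]] is an approximation
  #    of the ICU / iswalpha() test used by the real parser.
  if (grepl("^[[:alpha:]]$", ch)) return(11L)
  # 3. Code points below 32 are control characters.
  if (utf8ToInt(ch) < 32L) return(15L)
  # 4. Everything else is "other".
  12L
}

# A tiny illustrative table (not the package defaults).
tbl <- data.frame(char = c("\\", "@"), catcode = c(0L, 11L))

guess_catcode("@", tbl)     # found in the table
guess_catcode("a", tbl)     # alphabetic fallback
guess_catcode("\001", tbl)  # control-character fallback
guess_catcode("!", tbl)     # "other" fallback
```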
When recover = TRUE, the parser will mark each error in the output, and attempt to continue parsing. This may lead to a cascade of errors, but will sometimes help in locating the first error. The section of text related to the error will be marked as an item with tag ERROR.
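A small sketch of error recovery follows, using deliberately malformed input. It assumes the parseLatex package is installed; the input string and the names broken and res are illustrative, and which errors are marked for this input is not shown here.

```r
library(parseLatex)

# Deliberately malformed input: the brace group is never closed.
broken <- r"(\textbf{unclosed text)"

# recover = TRUE asks the parser to mark errors and keep going;
# showErrors = FALSE turns off the error report after parsing.
res <- parseLatex(broken, recover = TRUE, showErrors = FALSE)

# Print the result and look for items tagged ERROR.
res
```

Leaving showErrors at its default (the value of recover) would also show the errors once parsing finishes.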
LaTeX2, LaTeX2item
parsed <- parseLatex(r"(fran\c{c}ais)")
parsed