These functions help you respond to web pages that declare incorrect encodings. You can use guess_encoding() to figure out what the real encoding is (and then supply that to the encoding argument of read_html()), or use repair_encoding() to fix character vectors after the fact.
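To make "repairing after the fact" concrete, here is a minimal base-R sketch that uses neither rvest nor stringi. It swaps in base R's validUTF8() as a crude stand-in for guess_encoding() and iconv() for the re-decoding step that repair_encoding(x, from = ...) performs. The Latin-1 sample bytes are an assumption chosen for illustration, and this is not how rvest itself is implemented.

```r
# Assumed input: the bytes of "café" written in ISO-8859-1 (Latin-1),
# read as if they were UTF-8 text.
latin1_bytes <- as.raw(c(0x63, 0x61, 0x66, 0xe9))
x <- rawToChar(latin1_bytes)

# Crude check in place of guess_encoding(): a lone 0xE9 byte is not a
# valid UTF-8 sequence, so this returns FALSE.
validUTF8(x)

# Re-decode from the encoding we believe is correct, analogous to
# repair_encoding(x, from = "ISO-8859-1").
fixed <- iconv(x, from = "ISO-8859-1", to = "UTF-8")
print(fixed)
```

A real detector weighs byte statistics rather than only checking UTF-8 validity, which is why guess_encoding() reports several candidates with confidences.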
guess_encoding(x)
repair_encoding(x, from = NULL)
x: A character vector.
from: The encoding that the string is actually in. If NULL, guess_encoding() will be used.
These functions are wrappers around tools from the fantastic stringi package, so you'll need to make sure you have it installed.
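Since the functions wrap stringi, roughly the same two steps can be done with stringi directly. This is a sketch, assuming stringi is installed; it is not rvest's source. The detection step most likely corresponds to stri_enc_detect(), which returns candidate encodings with confidences, and the repair step to stri_conv(). The French sample sentence and the choice to convert from ISO-8859-1 are assumptions for illustration. Detection is probabilistic, so the code does not blindly trust the top guess.

```r
library(stringi)

# Assumed sample text, converted to ISO-8859-1 bytes to simulate a page
# whose real encoding differs from what it declares.
utf8_text <- "Le caf\u00e9 est pr\u00eat, d\u00e9j\u00e0 servi."
latin1_raw <- iconv(utf8_text, from = "UTF-8", to = "ISO-8859-1",
                    toRaw = TRUE)[[1]]

# Detection step: a data frame of candidate encodings with
# Encoding, Language and Confidence columns.
guesses <- stri_enc_detect(latin1_raw)[[1]]
print(guesses)

# Repair step: convert from the encoding we believe is correct into UTF-8.
fixed <- stri_conv(latin1_raw, from = "ISO-8859-1", to = "UTF-8")
print(fixed)
```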
library(rvest)
# A file with bad encoding included in the package
path <- system.file("html-ex", "bad-encoding.html", package = "rvest")
x <- read_html(path)
x %>% html_nodes("p") %>% html_text()
guess_encoding(x)
# Two valid encodings, only one of which is correct
read_html(path, encoding = "ISO-8859-1") %>% html_nodes("p") %>% html_text()
read_html(path, encoding = "ISO-8859-2") %>% html_nodes("p") %>% html_text()