encoding

Guess and repair faulty character encoding.

These functions help you deal with web pages that declare an incorrect encoding. Use guess_encoding to work out what the real encoding is (and then supply it to the encoding argument of read_html), or use repair_encoding to fix character vectors after the fact.

Usage
guess_encoding(x)

repair_encoding(x, from = NULL)

Arguments
x

A character vector.

from

The encoding that the string is actually in. If NULL, guess_encoding will be used.

stringi

These functions are wrappers around tools from the fantastic stringi package, so you'll need to make sure you have it installed.
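To illustrate the kind of stringi tools involved, here is a minimal sketch that calls stringi directly. It is not the rvest implementation; the byte sequence and the variable names (raw_bytes, candidates, fixed) are made up for the example, and the choice of ISO-8859-1 as the source encoding is an assumption made by hand rather than taken from the detection result.

```r
# Fail early if stringi is missing, since the encoding helpers depend on it
stopifnot(requireNamespace("stringi", quietly = TRUE))

# Hypothetical input: the ISO-8859-1 (Latin-1) bytes for "caf" followed by e-acute
raw_bytes <- as.raw(c(0x63, 0x61, 0x66, 0xe9))

# Ask stringi for candidate encodings; the first element describes our single input
candidates <- stringi::stri_enc_detect(raw_bytes)[[1]]
candidates

# Convert the bytes to UTF-8, naming the source encoding explicitly (assumed here)
fixed <- stringi::stri_conv(raw_bytes, from = "ISO-8859-1", to = "UTF-8")
fixed
```

Detection is a statistical guess and tends to be unreliable on very short inputs like this one, so treat the candidates as suggestions and check the converted text by eye.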

Aliases
  • encoding
  • guess_encoding
  • repair_encoding
Examples
library(rvest)
# NOT RUN {
# A file with bad encoding included in the package
path <- system.file("html-ex", "bad-encoding.html", package = "rvest")
x <- read_html(path)
x %>% html_nodes("p") %>% html_text()

guess_encoding(x)

# Two valid encodings, only one of which is correct
read_html(path, encoding = "ISO-8859-1") %>% html_nodes("p") %>% html_text()
read_html(path, encoding = "ISO-8859-2") %>% html_nodes("p") %>% html_text()
# }
Documentation reproduced from package rvest, version 0.3.2, License: GPL-3
