url_decode

Encode or decode a URI

Encodes or decodes a URI/URL.

Usage
url_decode(urls)
url_encode(urls)
Arguments
urls
a vector of URLs to decode or encode.
Details

URL encoding and decoding is an essential prerequisite to proper web interaction and to data analysis around things like server-side logs. The relevant IETF RFC (RFC 3986) mandates the percent-encoding of characters outside its permitted set, such as spaces and non-ASCII characters, along with reserved characters like slashes when they appear as data rather than as delimiters.
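
As a rough illustration of the percent-encoding rule, the base-R sketch below writes every UTF-8 byte outside RFC 3986's unreserved set (letters, digits, "-", ".", "_", "~") as "%" followed by two uppercase hex digits. The helper percent_encode_sketch is hypothetical and not part of urltools; unlike url_encode, it makes no attempt to keep the scheme or delimiters intact, so it only suits individual URL components.

```r
# Sketch only: percent-encode every byte outside the RFC 3986 unreserved set.
# percent_encode_sketch() is a hypothetical helper, NOT part of urltools.
percent_encode_sketch <- function(x) {
  # RFC 3986 unreserved characters: ALPHA / DIGIT / "-" / "." / "_" / "~"
  unreserved <- utf8ToInt(paste0(c(LETTERS, letters, 0:9, "-", ".", "_", "~"),
                                 collapse = ""))
  vapply(x, function(s) {
    codes <- as.integer(charToRaw(enc2utf8(s)))  # the string's UTF-8 bytes
    keep <- codes %in% unreserved
    pieces <- ifelse(keep,
                     intToUtf8(codes, multiple = TRUE),  # unreserved: keep as-is
                     sprintf("%%%02X", codes))           # anything else: %XX
    paste0(pieces, collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

percent_encode_sketch(c("a b", "caf\u00e9", "path/to"))
# returns c("a%20b", "caf%C3%A9", "path%2Fto")
```

Note that the slash in "path/to" is encoded here, which is exactly why a whole URL cannot be passed through such an encoder without first separating its scheme.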

Base R provides URLdecode and URLencode, which handle URL encoding, in theory. In practice, they have a set of substantial problems that the urltools implementation solves:

  • No vectorisation: Both base R functions operate on single URLs, not vectors of URLs. This means that, when confronted with a vector of URLs that need encoding or decoding, your only option is to loop from within R. This can be incredibly computationally costly with large datasets. url_encode and url_decode are implemented in C++ and entirely vectorised, allowing for a substantial performance improvement.
  • No scheme recognition: encoding the slashes in, say, http:// is a good way of making sure your URL no longer works. Because of this, the only thing you can safely encode with URLencode (unless you refuse to encode reserved characters) is a partial URL lacking the initial scheme, which requires additional operations to set up and makes encoding or decoding more complex. url_encode detects the scheme and silently splits it off, leaving it unencoded so that the resulting URL is valid.
  • ASCII NULs: server-side data can get very messy and sometimes includes out-of-range characters. Unfortunately, URLdecode's response to these characters is to convert them to NULs, which R character strings cannot contain, at which point your URLdecode call breaks. url_decode simply ignores them.
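
To make the first two problems concrete, the sketch below spells out the base-R route: split off a leading scheme by hand, then call URLencode (with reserved = TRUE) and URLdecode once per element via vapply. The helper names base_encode_urls and base_decode_urls, and the regular expression used to recognise a scheme (letters followed by "://"), are assumptions for illustration; this is not how urltools works internally.

```r
# Sketch only: the manual base-R workaround for scheme handling and looping.
# base_encode_urls() and base_decode_urls() are hypothetical helpers,
# NOT part of urltools or base R.
# Assumption: a scheme is "letters://" at the very start of the URL.
scheme_pattern <- "^[A-Za-z][A-Za-z0-9+.-]*://"

base_encode_urls <- function(urls) {
  rest <- sub(scheme_pattern, "", urls)                   # URL minus any scheme
  scheme <- substr(urls, 1L, nchar(urls) - nchar(rest))  # removed prefix, or ""
  # Apply URLencode() element by element; reserved = TRUE also escapes
  # "/", ":", "(" and ")" in the remainder of each URL.
  encoded <- vapply(rest, utils::URLencode, character(1), reserved = TRUE,
                    USE.NAMES = FALSE)
  paste0(scheme, encoded)  # re-attach the untouched scheme
}

base_decode_urls <- function(urls) {
  # URLdecode() is likewise applied one element at a time.
  vapply(urls, utils::URLdecode, character(1), USE.NAMES = FALSE)
}

urls <- c("https://example.com/a b", "ftp://host/x(1)", "no/scheme")
enc <- base_encode_urls(urls)
# enc is c("https://example.com%2Fa%20b", "ftp://host%2Fx%281%29", "no%2Fscheme")
base_decode_urls(enc)  # recovers the original three URLs
```

With urltools, the corresponding steps are single calls, url_encode(urls) and url_decode(urls), with no R-level loop. Be aware too that URLencode returns any string that already contains a %XX escape unchanged unless repeated = TRUE is set.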

Value

a character vector containing the encoded (or decoded) versions of "urls".

See Also

Bob Rudis's Punycode package on GitHub, for handling punycode in URLs.

Aliases
  • url_decode
  • url_encode
Examples

url_decode("https://en.wikipedia.org/wiki/File:Vice_City_Public_Radio_%28logo%29.jpg")
url_encode("https://en.wikipedia.org/wiki/File:Vice_City_Public_Radio_(logo).jpg")

## Not run: 
# A demonstrator of the contrasting behaviours around out-of-range characters
URLdecode("%gIL")
url_decode("%gIL")
## End(Not run)
Documentation reproduced from package urltools, version 1.4.0, License: MIT + file LICENSE
