openssl (version 0.5)

hashing: Vectorized hashing functions

Description

Bindings to hash functions in OpenSSL. Supported inputs are binary data (raw vector), strings (character vector) or a connection object. Functions are vectorized for character vectors: a vector with n strings returns n hashes. When a connection object is passed, its contents are stream-hashed, which minimizes the amount of memory required.
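As a rough illustration of the three input types, the sketch below hashes a character vector, a raw vector and a file connection. The temporary file and the variable names are hypothetical, and the comparison assumes that raw and connection inputs return the same representation, which may differ between package versions.

```r
library(openssl)

# Character input is vectorized: one hash per string
inputs <- c("foo", "bar", "baz")
hashes <- md5(inputs)
n_hashes <- length(hashes)

# Write the bytes of "foo" to a temporary file (hypothetical test fixture)
tmp <- tempfile()
writeBin(charToRaw("foo"), tmp)

# Hash the same bytes once from memory and once streamed from the file
from_raw  <- md5(charToRaw("foo"))
from_file <- md5(file(tmp))

# Both routes should agree, since the underlying bytes are identical
same_hash <- all(unclass(from_file) == unclass(from_raw))
print(n_hashes)
print(same_hash)

unlink(tmp)
```

Streaming from a connection is mainly useful for files too large to read into memory in one piece.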

Usage

sha1(x, salt = "")

sha256(x, salt = "")

sha512(x, salt = "")

md4(x, salt = "")

md5(x, salt = "")

ripemd160(x, salt = "")

Arguments

x
a character vector, raw vector or connection object.
salt
a salt (http://en.wikipedia.org/wiki/Salt_(cryptography)) appended to each input element to anonymize it or to prevent dictionary attacks. See Details.

Details

The "sha256" algorithm is generally recommended for sensitive information. While md5 and the weaker members of the sha family (such as sha1) are usually sufficient for generating identifiers where only accidental collisions matter, they are no longer considered secure for cryptographic purposes.

In applications where hashes should be irreversible (such as names or passwords), it is often recommended to add a random, fixed salt to each input before hashing. This prevents attacks in which the hashes of common or short strings are simply looked up. See Examples. A common special case is adding a random salt to a large number of records to test for uniqueness within the dataset, while simultaneously rendering the results incomparable to other datasets.
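The effect of a fixed random salt can be sketched as follows. The sketch assumes the salt is simply appended to each element, as the Arguments section describes, and does the appending by hand with paste0() so that it does not depend on the exact behaviour of the salt argument. The record values, the 16-byte salt length and the variable names are illustrative choices only.

```r
library(openssl)

records <- c("john", "mary", "john")

# One random salt for the whole dataset, rendered as a hex string
# (16 bytes is an arbitrary, illustrative length)
salt <- paste(as.character(rand_bytes(16)), collapse = "")

# Append the same salt to every record before hashing
salted <- unclass(sha256(paste0(records, salt)))
plain  <- unclass(sha256(records))

# Duplicates within the dataset still produce identical hashes ...
dup_detected <- salted[1] == salted[3]
distinct_ok  <- salted[1] != salted[2]

# ... but the salted hashes no longer match the unsalted ones,
# so they cannot be looked up or compared across datasets
differs_from_plain <- salted[1] != plain[1]

print(c(dup_detected, distinct_ok, differs_from_plain))
```

Because the salt is random, the salted hashes change on every run. The salt has to be stored if the same records need to be matched again later.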

References

OpenSSL manual: https://www.openssl.org/docs/crypto/EVP_DigestInit.html. Digest types: https://www.openssl.org/docs/apps/dgst.html

Examples

# Support both strings and binary
md5("foo")
md5(charToRaw("foo"))

# Compare to digest
library(digest)
digest("foo", "md5", serialize = FALSE)

# Other way around
digest(cars, skip = 0)
md5(serialize(cars, NULL))

# Vectorized for strings
md5(c("foo", "bar", "baz"))

# Stream-verify from connections (including files)
myfile <- system.file("CITATION")
md5(file(myfile))

# Check md5 against: http://cran.r-project.org/bin/windows/base/old/3.1.1/md5sum.txt
md5(url("http://cran.r-project.org/bin/windows/base/old/3.1.1/R-3.1.1-win.exe"))

# Use a salt to prevent dictionary attacks
sha1("admin") # googleable
sha1("admin", salt = "some_random_salt_value") # not googleable

# Use a random salt to identify duplicates while anonymizing values
sha256("john") # googleable
sha256(c("john", "mary", "john"), salt = rand_bytes(100))
