openssl (version 0.3)

crypto digest: Vectorized hashing functions

Description

Bindings to cryptographic hashing functions available in OpenSSL's libcrypto. Both binary and string inputs are supported and the output type will match the input type. Functions are fully vectorized for the case of character vectors: a vector with n strings will return n hashes.

Usage

sha1(x, salt = "")

sha256(x, salt = "")

sha512(x, salt = "")

md4(x, salt = "")

md5(x, salt = "")

ripemd160(x, salt = "")

Arguments

x
a character vector, raw vector or connection object.
salt
a salt (http://en.wikipedia.org/wiki/Salt_(cryptography)) appended to each input element to anonymize values or prevent dictionary attacks. See details.

Details

This family of hashing functions implements bindings to OpenSSL's crypto module, which allows for cryptographically hashing strings and raw (binary) vectors. When passed a connection object, the functions stream-hash its binary contents. To hash other types of objects, first convert them with a suitable mapping function such as serialize or as.character.

The full range of OpenSSL-supported cryptographic functions is available. The "sha256" or "sha512" algorithm is generally recommended for sensitive information. While md5 and weaker members of the sha family (such as sha1) are probably sufficient for identifiers where only accidental collisions matter, cryptographic weaknesses have been directly or indirectly identified in their output.

In applications where hashes should be irreversible (such as names or passwords) it is often recommended to add a random, fixed salt to each input before hashing. This prevents attacks in which hashes of common and/or short strings can simply be looked up. See examples. A common special case is adding a random salt to a large number of records to test for uniqueness within the dataset, while simultaneously rendering the results incomparable to other datasets.
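The uniqueness test described above can be sketched as follows. This is a minimal illustration, not the package's internal implementation: it assumes, as the argument description states, that a salt is simply appended to each input element, and it appends the salt by hand with paste0 rather than relying on the salt argument. The records vector and the 16-byte salt length are hypothetical choices made for the example.

```r
library(openssl)

# Hypothetical records; the values are illustrative only
records <- c("john", "mary", "john")

# One random salt, generated once and reused for every record.
# as.character() on a raw vector gives two-digit hex strings.
salt <- paste(as.character(rand_bytes(16)), collapse = "")

# Append the same salt to every element, then hash element-wise
hashed <- sha256(paste0(records, salt))

# Equal inputs still produce equal hashes, so duplicates stay detectable
is_dup <- duplicated(hashed)
print(is_dup)
```

Because the salt is drawn at random and then discarded, the hashes cannot be matched against hashes computed elsewhere with a different salt, while duplicates within this one dataset remain visible.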

References

OpenSSL manual: https://www.openssl.org/docs/crypto/EVP_DigestInit.html. Digest types: https://www.openssl.org/docs/apps/dgst.html

Examples

# Support both strings and binary
md5("foo")
md5(charToRaw("foo"))

# Compare to digest
library(digest)
digest("foo", "md5", serialize = FALSE)

# Other way around
digest(cars, skip = 0)
md5(serialize(cars, NULL))

# Vectorized for strings
md5(c("foo", "bar", "baz"))

# Stream-verify from connections (including files)
myfile <- system.file("CITATION")
md5(file(myfile))

# Check md5 from: http://cran.r-project.org/bin/windows/base/old/3.1.1/md5sum.txt
md5(url("http://cran.r-project.org/bin/windows/base/old/3.1.1/R-3.1.1-win.exe"))

# Use a salt to prevent dictionary attacks
sha1("admin") # googleable
sha1("admin", salt = "some_random_salt_value") # not googleable

# Use a random salt to identify duplicates while anonymizing values
sha256("john") # googleable
sha256(c("john", "mary", "john"), salt = rand_bytes(100))
