
Oliver Keyes
39 packages on CRAN
1 package on GitHub
Survey systems and other third-party data sources commonly use non-standard representations of logical values in qualitative data - "Yes", "No" and "N/A", say. batman is a package designed to seamlessly convert these into logicals. It is highly localised, containing equivalents to boolean values in languages including German, French, Spanish, Italian, Turkish, Chinese and Polish.
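A minimal usage sketch, assuming batman's to_logical() converter:

    library(batman)
    # convert free-text survey responses into a logical vector;
    # unrecognised and "N/A"-style values should come back as NA
    to_logical(c("Yes", "No", "N/A"))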
A connector to the API for 'Wordnik' <https://www.wordnik.com>, a dictionary service that also provides bigram generation, word frequency data, and a whole host of other functionality.
Extracts Exchangeable Image File Format (EXIF) metadata, such as camera make and model, ISO speed and the date and time the picture was taken, from JPEG images. Incorporates the 'easyexif' (https://github.com/mayanklahiri/easyexif) library.
A dataset of favourite numbers, selected from an online poll of over 30,000 people by Alex Bellos (http://pages.bloomsbury.com/favouritenumber).
Provides tools to encode lat/long pairs into geohashes, decode those geohashes, and identify their neighbours.
Read data from the City of Portland's 'HYDRA' <http://or.water.usgs.gov/precip/> rainfall datasets within R.
Human names are complicated and nonstandard things. Humaniformat, which is based on Anthony Ettinger's 'humanparser' project (https://github.com/chovy/humanparser), provides functions for parsing human names, making a best-guess attempt to distinguish sub-components such as prefixes, suffixes, middle names and salutations.
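A brief sketch, assuming the package's parse_names() function and its data-frame output:

    library(humaniformat)
    # best-guess split of full names into components such as
    # salutation, first/middle/last name and suffix
    parse_names(c("Mr. Jim Jeffries", "Ada Lovelace"))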
Reformat currency-based data as numeric values (or numeric values as currency-based data) and convert between currencies.
A data package containing public domain information on requests made by the 'MuckRock' (https://www.muckrock.com/) project under the United States Freedom of Information Act.
'Open Location Codes' <http://openlocationcode.com/> are a Google-created standard for identifying geographic locations. 'olctools' provides utilities for validating, encoding and decoding entries that follow this standard.
A connector to ORES (<http://ores.wmflabs.org/>), an AI project to provide edit scoring for content on Wikipedia and other Wikimedia projects. This lets a researcher identify if edits are likely to be reverted, damaging, or made in good faith.
A connector to the API maintained by the Open Source Initiative <https://api.opensource.org/licenses/>, which provides machine-readable metadata about a variety of open source software licenses.
Pageview data from the 'Wikimedia' sites, such as 'Wikipedia' <https://www.wikipedia.org/>, from entire projects to per-article levels of granularity, through the new RESTful API and data source <https://wikimedia.org/api/rest_v1/?doc>.
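A minimal sketch, assuming this is the 'pageviews' package and its article_pageviews() function:

    library(pageviews)
    # daily views of a single article over January 2016,
    # across all platforms and user types (the defaults)
    article_pageviews(project = "en.wikipedia",
                      article = "R (programming language)",
                      start = "2016010100", end = "2016013100")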
A wrapper around the 'Parsing Expression Grammar Template Library', a C++11 library for generating Parsing Expression Grammars, that makes it accessible within Rcpp. With this, developers can implement their own grammars and easily expose them in R packages.
Functions to test whether a number is prime and generate the prime numbers within a specified range. Based around an implementation of Wilson's theorem for testing an integer's primality.
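A short sketch, assuming this is the 'primes' package with is_prime() and generate_primes():

    library(primes)
    is_prime(c(2, 3, 4, 5))            # TRUE TRUE FALSE TRUE
    generate_primes(min = 0, max = 20) # 2 3 5 7 11 13 17 19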
A client library for 'The Guardian' (https://www.guardian.com/) and its API, this package allows users to search for Guardian articles and retrieve both their content and metadata.
Functions to reconstruct sessions from web log or other user trace data and calculate various metrics around them, producing tabular output that is compatible with 'dplyr'- or 'data.table'-centered processes.
Connectors to online and offline sources for taking IP addresses and geolocating them to country, city, timezone and other geographic ranges. For individual connectors, see the package index.
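A minimal sketch, assuming this is the 'rgeolocate' package, whose maxmind() connector ships with a GeoLite2 country database:

    library(rgeolocate)
    # path to the bundled MaxMind GeoLite2 country database
    db <- system.file("extdata", "GeoLite2-Country.mmdb", package = "rgeolocate")
    maxmind("196.200.60.51", db, c("country_code", "country_name"))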
Provides functions to retrieve and reformat data from the 'Star Wars' API (SWAPI) <https://swapi.co/>.
A connector to the 'What3Words' (http://what3words.com/) service, which represents each 3m by 3m square on earth with a unique trio of English-language words.
'Radix trees', or 'tries', are key-value data structures optimised for efficient lookups, similar in purpose to hash tables. 'triebeard' provides an implementation of 'radix trees' for use in R programming and in developing packages with 'Rcpp'.
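A minimal sketch of a radix-tree lookup with triebeard's trie() and longest_match():

    library(triebeard)
    # build a trie of key/value pairs, then retrieve the value whose
    # key is the longest stored prefix of the query string
    t <- trie(keys = c("amo", "amor", "amour"),
              values = c("stem", "spanish", "french"))
    longest_match(t, "amorous")   # matches the key "amor"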
A toolkit for all URL-handling needs, including encoding and decoding, parsing, parameter extraction and modification. All functions are designed to be both fast and entirely vectorised. It is intended to be useful for people dealing with web-related datasets, such as server-side logs, although it may also be useful in other situations involving large sets of URLs.
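A short sketch, assuming this is the 'urltools' package:

    library(urltools)
    url <- "https://www.example.com/search?q=r&lang=en"
    url_parse(url)        # one row per URL: scheme, domain, port, path, parameter, fragment
    param_get(url, "q")   # pull out the 'q' query parameter
    url_decode(url_encode("a b"))   # round-trip percent-encoding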
R is used by a vast array of people for a vast array of purposes - including web analytics. This package contains functions for consuming and munging various common forms of request log, including the Common and Combined Web Log formats and various Amazon access logs.
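A minimal sketch, assuming this is the 'webreadr' package and its read_combined() reader (the other formats follow the same pattern):

    library(webreadr)
    # parse a Combined Log Format access log into a data frame;
    # "access.log" is a hypothetical file path
    logs <- read_combined("access.log")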
Retrieve data from the 'Whoapi' (https://whoapi.com) store of domain information, including a domain's geographic location, registration status and search prominence.
Utilities to generate bounding boxes from 'WKT' (Well-Known Text) objects and R data types, validate 'WKT' objects and convert object types from the 'sp' package into 'WKT' representations.
A wrapper for the MediaWiki API, aimed particularly at the Wikimedia 'production' wikis, such as Wikipedia. It can be used to retrieve page text, information about users or the history of pages, and elements of the category tree.
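A minimal sketch, assuming this is the 'WikipediR' package and its page_content() function:

    library(WikipediR)
    # fetch the wikitext of a page from English-language Wikipedia
    page <- page_content("en", "wikipedia",
                         page_name = "R (programming language)",
                         as_wikitext = TRUE)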
Anonymized data from surveys conducted by Forwards <https://forwards.github.io/>, the R Foundation task force on women and other under-represented groups. Currently it contains a single data set of responses to a survey of attendees at useR! 2016 <https://www.r-project.org/useR-2016/>, the R user conference held at Stanford University, Stanford, California, USA, June 27-30, 2016.
A toolkit for manipulating, validating and testing 'IP' addresses and ranges, along with datasets relating to 'IP' addresses. Tools are also provided to map 'IPv4' blocks to country codes. While it primarily has support for the 'IPv4' address space, more extensive 'IPv6' support is intended.
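A brief sketch, assuming this is the 'iptools' package with ip_to_numeric() and numeric_to_ip():

    library(iptools)
    # convert dotted-quad IPv4 addresses to integers and back
    ip_to_numeric("192.168.0.1")   # 3232235521
    numeric_to_ip(3232235521)      # "192.168.0.1"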
Bindings to OpenSSL libssl and libcrypto, plus custom SSH pubkey parsers. Supports RSA, DSA and EC curves P-256, P-384 and P-521. Cryptographic signatures can either be created and verified manually or via x509 certificates. AES can be used in cbc, ctr or gcm mode for symmetric encryption; RSA for asymmetric (public key) encryption or EC for Diffie-Hellman. High-level envelope functions combine RSA and AES for encrypting arbitrary sized data. Other utilities include key generators, hash functions (md5, sha1, sha256, etc.), base64 encoder, a secure random number generator, and 'bignum' math methods for manually performing crypto calculations on large multibyte integers.
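A sketch of a few of the high-level helpers:

    library(openssl)
    sha256("hello, world")          # hash a string
    base64_encode(rand_bytes(16))   # encode 16 secure random bytes
    key <- rsa_keygen(2048)         # generate an RSA keypair
    # sign a message with the private key, verify with the public key
    sig <- signature_create(charToRaw("msg"), sha256, key = key)
    signature_verify(charToRaw("msg"), sig, sha256, pubkey = key$pubkey)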
Provides a collection of phonetic algorithms including Soundex, Metaphone, NYSIIS, Caverphone, and others.
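A quick sketch, assuming this is the 'phonics' package:

    library(phonics)
    # phonetically similar names share an encoding
    soundex(c("Robert", "Rupert"))   # both encode to "R163"
    metaphone("Thompson")
    nysiis("Knuth")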
String operations the Python way - a package for those of us who miss Python's string methods while we're working in R.
A new implementation of EDM algorithms based on research software previously developed for internal use in the Sugihara Lab (UCSD/SIO). Contains C++ compiled objects that use time delay embedding to perform state-space reconstruction and nonlinear forecasting and an R interface to those objects using 'Rcpp'. It supports both the simplex projection method from Sugihara & May (1990) <DOI:10.1038/344734a0> and the S-map algorithm in Sugihara (1994) <DOI:10.1098/rsta.1994.0106>. In addition, this package implements convergent cross mapping as described in Sugihara et al. (2012) <DOI:10.1126/science.1227079> and multiview embedding as described in Ye & Sugihara (2016) <DOI:10.1126/science.aag0863>.
R interface to the 'LTP'-Cloud service for Natural Language Processing in Chinese (http://www.ltp-cloud.com/).
A suite of custom R Markdown formats and templates for authoring journal articles and conference submissions.
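This appears to be the 'rticles' package; new drafts are created through rmarkdown, e.g.:

    # start a Journal of Statistical Software article from the bundled template;
    # "my_article.Rmd" is a hypothetical output file
    rmarkdown::draft("my_article.Rmd", template = "jss_article", package = "rticles")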
Text mining for word processing and sentiment analysis using 'dplyr', 'ggplot2', and other tidy tools.
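A minimal sketch, assuming this is the 'tidytext' package and its unnest_tokens() verb:

    library(tidytext)
    library(dplyr)
    d <- tibble(line = 1:2,
                text = c("tidy text mining is fun", "so is sentiment analysis"))
    d %>%
      unnest_tokens(word, text) %>%  # one token per row
      count(word, sort = TRUE)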
Convert natural language text into tokens. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, tweets, Penn Treebank, regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the 'stringi' and 'Rcpp' packages for fast yet correct tokenization in 'UTF-8'.
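A short sketch, assuming this is the 'tokenizers' package:

    library(tokenizers)
    x <- "The quick brown fox jumps over the lazy dog."
    tokenize_words(x)
    tokenize_ngrams(x, n = 2)   # shingled bigrams
    count_words(x)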