easyr

Helpful functions from Oliver Wyman Actuarial Consulting.

easyr makes difficult operations easy.

Installation

You can install the latest version available on CRAN:


install.packages('easyr')
require(easyr)

Or install the latest version from GitHub:


devtools::install_github( "oliver-wyman-actuarial/easyr" )
require(easyr)

Getting Started

Tutorial: https://www.kaggle.com/brycechamberlain/easyr-tutorial.

Here is what a project looks like using easyr:

# start with begin() to set up your workspace.
# begin will set the working directory to the location of this file and
#     run anything in fun/ or functions/ so put your functions there.
require(easyr)
begin()

# read.any reads in your data regardless of format, with powerful typing to get numbers and dates.
# use ?read.any to see the many options.
dt = read.any( 'path/to/file.extension' )

# let's look at a data dictionary to understand our data.
View( dict( dt ) )

# begin has already loaded dplyr and magrittr so you are ready to go.
dt %<>% 
  filter( !is.na(id) ) %>% 
  mutate( newcol = oldcol1 + oldcol2 )

# use w to quickly write to out.csv.
w( dt )

Function categories:

  • shorthand: protect your hands and move faster by typing less when using common functions.
  • type conversion: convert fields to dates, numbers, characters, and logical.
  • data wrangling: join and replace, explore data, factor-friendly joins and binds, etc.
  • workflow: caching, run folder, validate data, etc.

Data:

  • nastrings: common NA character values.
  • states: U.S. state abbreviations and names.
  • cblind: color set built by and optimized for color-blind users.

Built, shared, and managed by Oliver Wyman Actuarial Consulting.

Now accepting proposed contributions through GitHub!

Highlights

  • begin sets up your workspace.
  • read.any reads many file types, automatically selecting the best read function for you, and auto-types incoming data so you don't have to.
  • jrepl joins a mapping and adds a column or replaces values where matches occur. It is optimized to use a combination of inner and left joins and will error out if data is duplicated in the join.
  • cc replaces paste0 to reduce typing.
  • dict returns information about a dataset's columns. fldict does the same for a folder of datasets.
  • eq handles NA comparisons, where == returns NA and chaos ensues.
  • crun concatenates and runs a vector of characters as a command.
  • fmat converts dates and numbers to pretty strings.
  • tonum, todate, tobool flexibly convert character vectors with minimal work.
  • Check out the detailed list of functions below for more.
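To make the join-and-replace idea behind jrepl concrete, here is a minimal base-R sketch. It is not easyr's implementation: jrepl is described above as using a combination of inner and left joins, while this sketch swaps in a match()-based lookup, and the name join_replace_sketch and its arguments are hypothetical. It shows the two behaviors described: matched values are replaced, and duplicate keys in the mapping raise an error.

```r
# Hypothetical sketch of join-and-replace in base R; not easyr's jrepl.
# jrepl is described as using inner and left joins; this sketch swaps in
# a match()-based lookup to show the same observable behavior.
join_replace_sketch <- function(x, map, by, col, map_col) {
  # A mapping with duplicate keys would duplicate rows in a join, so stop.
  if (anyDuplicated(map[[by]]) > 0) stop("duplicate keys in mapping on '", by, "'")
  # Position of each key of x within the mapping (NA when unmatched).
  idx <- match(x[[by]], map[[by]])
  hit <- !is.na(idx)
  # Replace only where a match occurs; unmatched rows keep their value.
  x[[col]][hit] <- map[[map_col]][idx[hit]]
  x
}

claims <- data.frame(state = c("VA", "TX", "ZZ"),
                     name  = c(NA, NA, "unknown"),
                     stringsAsFactors = FALSE)
lookup <- data.frame(state = c("TX", "VA"),
                     full  = c("Texas", "Virginia"),
                     stringsAsFactors = FALSE)

out <- join_replace_sketch(claims, lookup, by = "state", col = "name", map_col = "full")
out$name  # c("Virginia", "Texas", "unknown")
```

Checking for duplicate keys up front mirrors the "error out if data is duplicated" behavior, so a bad mapping fails loudly instead of silently multiplying rows.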

Philosophy

This package comes from code we've written to make our daily work more efficient. We rely on it heavily in our organization.

It is built on the following tenets:

  • Fingers are precious: strive to reduce the amount of typing and hand strain during coding. This means avoiding the shift key and choosing short names. Many function names won't be intuitive at first but will save you many keystrokes. A good example: cc exists almost exclusively so you don't have to type paste0.

  • Generic scope: avoid functions that apply to domain-specific tasks. These belong in other packages.

Make A Contribution

Any and all contributions are welcome. The easiest way to contribute is to add an Issue. This can be a bug you've identified or an idea for improving easyr. Please be detailed and provide examples to make it easy for the community to resolve your issue or idea.

If you would like to make a more material contribution via Pull Request, please consider:

  • The Issues page lists open issues that we need your help to resolve.
  • build-install-test.R is included to let you run tests. Please run this to ensure your changes don't cause tests or examples to fail.
  • tests/testthat folder contains tests. Consider adding a test to validate your change and prevent someone else from breaking it in the future.
  • cmd-code-run-checks.txt contains command-line scripts you can run to check whether your changes will be acceptable to CRAN. If they aren't, they'll require extra work by us before we can submit to CRAN.

Support

Submit an Issue or Pull Request via GitHub and the community will review it.

Functions

Here are the functions in easyr by category. Use ?functionName to view detailed documentation for a function.

Shorthand

Common operations shortened for elegance, simplicity, and speed.

  • cc: Shorthand paste0/paste function to make typing these common functions easier. Intuitively understands how to combine various-length inputs.
  • coalf: dplyr's coalesce function, but handles factors appropriately. Checks each argument vector, starting with the first, until a non-missing value is found.
  • crun: Concatenate arguments and run them as a command. Shorthand for eval( parse( text = paste0( ... ) ) ). Consider also base::get(), which can get an object from a string, but only if the object already exists.
  • ddiff: Date difference function, plus shorthand mdiff, qdiff, and ydiff.
  • eq: Vectorized, flexible equality comparison that treats NA as a value. Returns TRUE if both values are NA and FALSE when only one is NA.
  • gr: Get the golden ratio.
  • left/right/mid: Behave like Excel's LEFT, RIGHT, and MID functions.
  • nanull: Facilitates checking for missing values. NULL values can cause errors in is.na checks, and is.na can cause warnings or errors inside if() when it returns multiple values.
  • %ni%: Not in. The opposite of the %in% operator: x %ni% y is equivalent to !(x %in% y).
  • isval: Opposite of nanull.
  • read.txt: Read the text of a file into a character variable.
  • other shorthand (multiple): Functions to save you keystrokes: na (is.na), nan (is.nan), null (is.null), ischar (is.character), isdate (is.Date), isnum (is.numeric), tochar (as.character).
  • pad0: Adds leading zeros to a character vector to make each value a specific length. For values longer than the length passed, leading zeros are removed.
  • spl: Extract a uniform random sample from a dataset or vector.
  • strx: base::str (structure) function, but only for names matching a character value (regex).
  • w: Write function. Writes to CSV without row names and automatically adds .csv to the file name if it isn't there already. Changes the extension to .csv if another extension is passed.
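The NA handling described for eq and %ni% can be sketched in a few lines of base R. These are hypothetical stand-ins (eq_sketch and %notin% are illustrative names, not easyr functions) meant to show the logic, not the package source.

```r
# Hypothetical base-R sketches of the eq and %ni% ideas; not easyr's source.

# NA-aware, vectorized equality: TRUE when both values are NA,
# FALSE when exactly one is NA, otherwise the usual == result.
eq_sketch <- function(x, y) {
  out <- x == y
  out[is.na(x) & is.na(y)] <- TRUE
  out[xor(is.na(x), is.na(y))] <- FALSE
  out
}

# "Not in": the negation of %in% (named %notin% here to avoid masking easyr).
`%notin%` <- function(x, table) !(x %in% table)

c(1, NA, 3, NA) == c(1, NA, 4, 5)           # c(TRUE, NA, FALSE, NA)
eq_sketch(c(1, NA, 3, NA), c(1, NA, 4, 5))  # c(TRUE, TRUE, FALSE, FALSE)
c("a", "b", "c") %notin% "b"                # c(TRUE, FALSE, TRUE)
```

Because eq_sketch never returns NA, its result is safe to use directly inside if() or for subsetting, which is the practical problem the plain == comparison causes.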

Type Conversion

Helpful for setting or changing variable/vector data types.

  • atype: Auto-type a data frame: automatically determine data types and perform conversions per column. Used by read.any to automatically set types.
  • char2fac, fac2char: Convert all character columns to factors and vice versa.
  • match.factors: Modifies two datasets so matching factor columns have the same levels. Typically used before joining or binding rows in the easyr functions bindf, ijoinf, and ljoinf.
  • tobool: Flexible boolean conversion function.
  • todate: Flexible date conversion function using lubridate. Works with dates in many formats without needing to know the format in advance.
  • tonum: Flexible number conversion for converting strings to numbers. Handles dollar signs, commas, apostrophes, and spaces.
  • xldate: Converts Excel date serial numbers to something usable in R.
  • fmat: Format numbers and dates into character quickly and easily.
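A rough base-R sketch of two of these conversions follows. tonum_sketch and xldate_sketch are hypothetical names; they assume the only characters to strip are dollar signs, commas, apostrophes, and whitespace, and that Excel dates use the Windows 1900 date system, whose serial numbers count days from 1899-12-30 (valid for dates after February 28, 1900). easyr's tonum and xldate handle more cases than this.

```r
# Hypothetical sketches of tonum and xldate behavior; not easyr's source.

# Strip dollar signs, commas, apostrophes, and whitespace, then convert.
# Anything still non-numeric becomes NA (with a warning from as.numeric).
tonum_sketch <- function(x) {
  as.numeric(gsub("[$,'[:space:]]", "", x))
}

# Assumes Excel's Windows (1900) date system, which counts days from
# 1899-12-30 for dates after 1900-02-28, so that date works as an origin.
xldate_sketch <- function(serial) {
  as.Date(serial, origin = "1899-12-30")
}

tonum_sketch(c("$1,234.50", "1'000", " 42 "))  # c(1234.5, 1000, 42)
xldate_sketch(43831)                           # as.Date("2020-01-01")
```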

Data Wrangling

Help with reading and manipulating data.

  • binbyvol: Bins a numerical column according to another numerical column's volume.
  • bindf: dplyr's bind_rows doesn't work well when data frames have factors. This function handles factors before applying bind_rows.
  • dict: Get information about a data frame or data table. Use getinfo to explore a single column instead.
  • drows: Pull rows with a duplicated value.
  • getbetterint: Takes bucket names of binned values such as [1e3,2e3) or [0.1234567, 0.2) and formats them nicely as 1,000-2,000 or 0.12-0.20.
  • fldict: Data dictionary for all data in a folder.
  • getinfo: Get information about a column in a data frame or data table. Use dict to explore all columns in a dataset instead.
  • namesx: Get column names that match a pattern.
  • ijoinf: dplyr's joins don't work well when data frames have factors. This function handles factors before applying dplyr::inner_join. Also available are ljoinf, rjoinf, and fjoinf for left, right, and full joins.
  • jrepl: Join and replace. Joins to another dataset and replaces matched values in a given column. Good for quickly grabbing values from another dataset to fill in or replace.
  • read.any: Flexible read function to handle many types of files, data types, etc. Reduces downstream errors from read issues. Currently handles CSV, TSV, DBF, RDS, XLS (incl. when formatted as HTML), and XLSX.
  • sch: Search a data frame or vector. Attempts to replicate Excel search, but with regex.
  • short_dollars: Formats numeric plot axis values as dollars, dividing by 1,000 and appending K.
  • short_nums: Shortens axis numbering to thousands or millions and adds a suffix.
  • sumnum: Summarize all numeric columns in a dataset.
  • tcol: Transpose operation that sets column names equal to a column in the original data.
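The idea behind drows, keeping every row whose value repeats (including the first occurrence), can be sketched with base R's duplicated(). dup_rows_sketch is a hypothetical name, not part of easyr.

```r
# Hypothetical sketch of the drows idea; not easyr's source.
# Keep every row whose value in `col` occurs more than once,
# including the first occurrence of each duplicated value.
dup_rows_sketch <- function(df, col) {
  v <- df[[col]]
  df[duplicated(v) | duplicated(v, fromLast = TRUE), , drop = FALSE]
}

policies <- data.frame(id  = c(1, 2, 2, 3, 3, 3),
                       amt = c(10, 20, 25, 30, 35, 40))
dup_rows_sketch(policies, "id")$id   # c(2, 2, 3, 3, 3)
```

Scanning once from the front and once from the back is what catches the first copy of each duplicate, which duplicated() alone would skip.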

Workflow

Operations to run projects and organize code.

  • begin: Perform common operations before running a script, including clearing environment variables, disabling scientific notation, loading common packages, and setting the working directory to the location of the current file.
  • caching: Caching functions, including cache.init, cache.ok, save.cache, and clear.cache.
  • check_equal: Check actual versus expected values and get helpful metrics back.
  • hashfiles: Create a hash representing the state of files or folders. Helpful for checking for changes.
  • runfolder: Run scripts in a folder. If an error occurs, it tells you which file had the error. Helpful for running ordered scripts.
  • tcmsg: Easy tryCatch implementation that returns the same message on error or warning, making tryCatch blocks easier to write.
  • tcwarn: Like tcmsg, but issues a warning instead of an error when an error occurs, so code can continue to run.
  • validate.equal: Check that two data frames are equivalent.
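One way to fingerprint files for change detection, in the spirit of hashfiles, is to combine per-file MD5 hashes into a single hash. The sketch below uses tools::md5sum from base R; hash_files_sketch is a hypothetical name, and easyr's hashfiles may use a different hashing approach.

```r
# Hypothetical sketch of hashing the state of files/folders; not easyr's source.
hash_files_sketch <- function(paths) {
  # Expand folders into the files they contain (recursively).
  files <- unlist(lapply(paths, function(p) {
    if (dir.exists(p)) list.files(p, recursive = TRUE, full.names = TRUE) else p
  }))
  files <- sort(files)
  # One MD5 per file, in a stable (sorted) order.
  per_file <- unname(tools::md5sum(files))
  # Combine file paths and per-file hashes into a single MD5.
  tmp <- tempfile()
  on.exit(unlink(tmp))
  writeLines(c(files, per_file), tmp)
  unname(tools::md5sum(tmp))
}

proj <- tempfile("proj")
dir.create(proj)
writeLines("a <- 1", file.path(proj, "a.R"))
h1 <- hash_files_sketch(proj)
writeLines("a <- 2", file.path(proj, "a.R"))
h2 <- hash_files_sketch(proj)
h1 == h2  # FALSE: the file's contents changed
```

Including the paths in the combined hash means adding, removing, or renaming a file also changes the result, not just editing one.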

Data

These data resources are also included.

  • nastrings: List of strings considered NA by easyr. Includes blank strings, "NA", Excel errors, etc.
  • states: Helpful dataset of U.S. state abbreviations and names.
  • cblind: Charting colors optimized for and selected by colorblind individuals.
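A common use of a list like nastrings is to blank out NA-like values in character data. The vector below is a small, hypothetical subset chosen for illustration; it is not easyr's actual nastrings contents, and na_if_like_sketch is an illustrative name.

```r
# A small, hypothetical subset of NA-like strings for illustration only;
# it is not a copy of easyr's nastrings.
na_like <- c("", "NA", "N/A", "NULL", "#N/A", "#DIV/0!")

# Replace NA-like strings with NA, ignoring surrounding whitespace.
na_if_like_sketch <- function(x, na_values = na_like) {
  x[trimws(x) %in% na_values] <- NA
  x
}

na_if_like_sketch(c("12", " ", "#N/A", "NULL", "x"))  # c("12", NA, NA, NA, "x")
```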

Version: 0.2-0
License: GPL (>= 2)
Maintainer: Bryce Chamberlain
Last Published: January 31st, 2020

Functions in easyr (0.2-0)

  • cc: Concatenate.
  • astext: As Text
  • begin: Begin
  • atype: Auto-Type
  • cache.ok: Check Cache Status
  • clear.cache: Clear Cache
  • binbyvol: Bin by Volume
  • crun: Concatenate and run.
  • ecopy: Copy to Clipboard
  • char2fac: Characters to Factors
  • bindf: Bind Rows with Factors
  • cache.init: Initialize cache.
  • ddiff: Date difference (or difference in days).
  • dict: Get Data Dictionary
  • cblind: cblind
  • eq: NA-Friendly Equality Comparison
  • drows: Get Rows with Duplicates
  • getbetterint: Get Better Int
  • getinfo: Get Info
  • likedate: Like Date
  • fmat: Number Formatter
  • fldict: Get Data Dictionary for Files in Folder
  • gr: Golden Ratio
  • isdate: Shorthand for lubridate::is.Date
  • %ni%: Not-In
  • coalf: Factor-friendly Coalesce
  • fac2char: Factors to Characters
  • nanull: NA / NULL Check
  • nastrings: NA Strings
  • ljoinf: Left Join with Factors
  • jrepl: Join and Replace Values.
  • mid: mid
  • isfac: Shorthand for is.factor
  • na: Shorthand for is.na
  • right: right
  • tonum: Convert to Number
  • tochar: Shorthand for as.character
  • rjoinf: Right Join with Factors
  • usepkg: Use Package
  • left: left
  • todate: Convert to Date
  • fjoinf: Full Join with Factors
  • charnum: Check for Number Formatted as Character.
  • match.factors: Match Factors.
  • mdiff: Date Difference in Months
  • tobool: Convert to Logical/Boolean
  • w: Write
  • read.txt: Read File as Text
  • read.any: Read Any File
  • tcwarn: tryCatch with warning
  • runfolder: Run Folder
  • validate.equal: Validate Equal
  • checkeq: Check Value or Control Total
  • headers_row: Identify headers row.
  • ijoinf: Inner Join with Factors
  • hashfiles: Hash Files
  • isnum: Shorthand for is.numeric
  • rx: Read Excel
  • nan: Shorthand for is.nan
  • namesx: Names Like
  • save.cache: Save Cache. Saves the arguments to a cache file, using the cache.num last checked with cache.ok.
  • isval: Is Valid / Is a Value / NA NULL Check
  • sch: Search a Data Frame.
  • qdiff: Date Difference in Quarters
  • null: Shorthand for is.null
  • spl: Sample
  • pad0: Pad with Zeros
  • rany_fixColNames: Fix column names.
  • ischar: Shorthand for is.character
  • states: states
  • sumnum: Summarize All Numeric Columns
  • strx: Structure with Like
  • xldate: Convert Excel Number to Date
  • ydiff: Date Difference in Years
  • tcmsg: tryCatch with Message
  • tcol: Transpose at Column.