
Note: there is a newer version (0.0.2.0) of this package.

genvar: an R package for imperative data manipulation and regression (like Stata)

Installation

To install genvar from GitHub, type the following at the R console:

install.packages("devtools", dependencies=TRUE)
library(devtools)
install_github("flynnzac/genvar")
library(genvar)

Motivation

The goal of this package is to remove one barrier to using R, a free software statistical package, for researchers in the social sciences who are used to Stata's data model and imperative syntax. Stata assumes a rectangular model for data (there are observations and variables), while R allows for more flexible data structures. Stata also uses an imperative language in which commands intentionally modify the state of the dataset, while R uses a more function-based syntax. R's additional flexibility has advantages, but in the social sciences data are almost always in the (observation, variable) framework, the Stata way of working with data is ingrained, and R's flexibility can make routine Stata tasks harder because the user has to know a much wider variety of functions to get the desired result. This package addresses the problem by implementing a Stata-like method for manipulating data in R, so that this data modification approach (which I will call the "imperative" approach because it involves issuing commands to modify state) is available in a free software package.

The package implements an environment with one active dataset. Users modify or reference variables in that dataset by issuing "commands", as opposed to R's standard model of applying functions to objects and returning values.
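As a rough illustration of this model, the base R sketch below keeps one active dataset in a private environment and lets a "command" change it in place instead of returning a new object. The names use_sketch and dropvar_sketch are hypothetical; they are not genvar's API or implementation.

```r
# Hypothetical sketch (not genvar's implementation): one active dataset kept
# in a private environment and modified in place by "commands".
.active <- new.env()

# mark a data frame as the active dataset
use_sketch <- function(df) assign("data", df, envir = .active)

dropvar_sketch <- function(vars) {
  df <- get("data", envir = .active)
  # drop the named columns and store the result back as the active dataset
  assign("data", df[setdiff(names(df), vars)], envir = .active)
  invisible(NULL)  # a command changes state; it does not return the data
}

use_sketch(data.frame(educ = c(12, 16), wage = c(10, 25), black = c(0, 1)))
dropvar_sketch("black")
names(get("data", envir = .active))  # "educ" "wage"
```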

genvar also uses R regression packages (plm, sandwich, and clubSandwich) to bring panel regression, robust and clustered standard errors, time series operators, and fixed effects together in one estimation command. This mostly ties together other R packages, which use a function-object interface, behind an imperative interface. The goal is not just to replicate Stata's environment, but to offer an improved imperative data environment that takes advantage of the additional flexibility of R.

To get a feel for what genvar looks like, see the example in examples/test.r. The syntax should (hopefully) be more intuitive than standard R for people who are used to thinking in the Stata data model and its imperative language.

Bug Reporting

Report any bugs or feature requests to the GitHub repo https://github.com/flynnzac/genvar. I am always willing to add features that you would like ported to this environment in R.

See below for the basic concepts and the reference manual for a list of commands.

Unique genvar variable types

Variable lists

Variable lists in genvar are specified by quoting the names of variables, as in "educ wage black". Names can also include wildcard characters. For example, if the dataset consists of the variables "x100 x2 x3", they can all be included by specifying "x*". To list only "x2 x3", specify "x?", because ? matches exactly one character.
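The wildcard rules can be sketched in base R with utils::glob2rx(), which turns a glob pattern into an anchored regular expression. This assumes genvar follows ordinary glob semantics (* matches any run of characters, ? exactly one); match_varlist is a hypothetical helper, not a genvar function.

```r
# Hypothetical helper: expand a quoted, space-separated variable list that
# may contain glob wildcards into the matching column names.
match_varlist <- function(pattern, vars) {
  # split the quoted list on whitespace and match each piece separately
  pieces <- strsplit(trimws(pattern), "[[:space:]]+")[[1]]
  hits <- unlist(lapply(pieces, function(p) {
    grep(utils::glob2rx(p), vars, value = TRUE)
  }))
  unique(hits)  # a variable matched by two pieces is listed once
}

vars <- c("x100", "x2", "x3")
match_varlist("x*", vars)  # "x100" "x2" "x3"
match_varlist("x?", vars)  # "x2" "x3"
```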

Quoted Expressions

Many genvar commands work by using "quoted expressions" which are bits of code enclosed in quotation marks. For example, to use genvar's gen command to generate log wages, you might type gen("lnwage", "log(wage)"). The second argument is a quoted expression. The quotes are necessary so that R does not try to execute log(wage) outside of the genvar environment. If you need to use a quotation mark in a quoted expression, escape it like so: gen("hello", "\"hello\"") to generate a variable called hello that contains the string hello for every observation.
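One way a quoted expression can be evaluated is to parse the string and evaluate it with the dataset's columns in scope. The base R sketch below assumes that model; eval_quoted is a hypothetical helper, not genvar's internal code.

```r
# Hypothetical helper: evaluate a quoted expression with the columns of
# `data` visible as variables.
eval_quoted <- function(expr, data) {
  # parse the string into R code, then look up names like `wage` in `data`
  eval(parse(text = expr), envir = data, enclos = globalenv())
}

df <- data.frame(wage = c(10, 20, 40))
df$lnwage <- eval_quoted("log(wage)", df)

# The escaped quotes make the expression the string literal "hello";
# a length-one result is recycled to every row.
df$hello <- eval_quoted("\"hello\"", df)
df
```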

Basic overview of currently available functions

Use the use function to load a dataset into the genvar environment.

Then, modify the dataset or add transformations of existing variables with the gen command.

Analyze the data using summarize, reg (for linear regression), logit or probit (for binary regression), or execute arbitrary R code in the genvar environment with the do command (if anyone writes a package that makes use of this interface to add new commands, let me know!). You can create a dataset of summary statistics with collapse.
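To show what a dataset of summary statistics looks like, here is a plain R sketch using stats::aggregate(), which returns one row per group. The data are invented, and this is not how genvar's collapse is called.

```r
# Invented example data: wages and hours by years of education.
df <- data.frame(
  educ  = c(12, 12, 16, 16, 16),
  wage  = c(10, 14, 20, 24, 28),
  hours = c(40, 35, 40, 45, 50)
)

# One row per value of educ, with the mean of each listed variable.
means <- aggregate(cbind(wage, hours) ~ educ, data = df, FUN = mean)
means
# educ 12: wage 12, hours 37.5
# educ 16: wage 24, hours 45
```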

There is support for panel data through xtset (reg also handles panel regression; see its manual page), and L can be used to generate lags and leads of variables in the panel.
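Stata-style lags respect the time index, so a gap in the panel yields a missing lag rather than the previous row's value. The sketch below assumes those semantics and implements them in base R by matching each row to the row with the same id at time - k; panel_lag is a hypothetical helper, not genvar's L.

```r
# Hypothetical helper: lag (k > 0) or lead (k < 0) of x within each panel
# unit, looking up the observation with the same id at time - k.
panel_lag <- function(x, id, time, k = 1) {
  key     <- paste(id, time, sep = "_")
  lag_key <- paste(id, time - k, sep = "_")
  x[match(lag_key, key)]  # NA when (id, time - k) is not in the data
}

df <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  year = c(2000, 2001, 2003, 2000, 2001),  # unit 1 has no 2002 row
  x    = c(5, 6, 8, 1, 2)
)
df$x_lag  <- panel_lag(df$x, df$id, df$year)          # NA 5 NA NA 1
df$x_lead <- panel_lag(df$x, df$id, df$year, k = -1)  # 6 NA NA 2 NA
```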

The data can be reshaped from long-to-wide or from wide-to-long using the shape command.
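Base R's stats::reshape() gives a sense of what a long-to-wide reshape produces; reshape(direction = "long") handles the reverse. This is only a comparison point: genvar's shape command may use different arguments and column names.

```r
# Long format: one row per (id, year).
long <- data.frame(
  id   = c(1, 1, 2, 2),
  year = c(2000, 2001, 2000, 2001),
  wage = c(10, 12, 20, 22)
)

# Wide format: one row per id, one wage column per year.
wide <- reshape(long, direction = "wide",
                idvar = "id", timevar = "year", v.names = "wage")
names(wide)  # "id" "wage.2000" "wage.2001"
```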

forvar applies code to each variable in a given variable list in the dataset.
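The pattern forvar automates, running the same code once per variable, looks like this in plain R. The variable list is assumed to be already expanded into column names, and standardizing is only an example operation; this is not forvar's syntax.

```r
df <- data.frame(x1 = c(1, 2, 3), x2 = c(10, 20, 30), id = c(1, 2, 3))

# Apply the same transformation to every variable in the list;
# columns not in the list (here, id) are left untouched.
for (v in c("x1", "x2")) {
  df[[v]] <- (df[[v]] - mean(df[[v]])) / sd(df[[v]])
}
```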

Let me know if you have any feature requests!

Examples

Check out the examples folder for examples. test.r shows most of the features.


Install

install.packages('genvar')

Monthly Downloads

4

Version

0.0.1.4

License

GPL-3

Maintainer

Zach Flynn

Last Published

October 13th, 2019

Functions in genvar (0.0.1.4)

dropvar

drops variables in varlist format from the dataset
estimates_save

save genvar estimates
gen

generates a new variable that is a transformation of existing variables in the dataset, or replaces an existing variable
forvar

apply a function to each of a list of variables
estimates_restore

restore genvar estimates
savedata

saves data to a CSV or RDS file
shape

reshapes a data set from wide to long or from long to wide formats
estimates_store

store genvar estimates
keepif

keeps some rows in the dataset and drops the rest
estimates_use

loads genvar estimates from file
fillin

Fully rectangularize a dataset
addobs

add observations to the data set
keepvar

keeps some variables in the dataset and drops the others
pred

gets fitted values from a genvar regression object
preserve

preserve a data set before modification
estimates_print

display estimation results
xtset

prepares a panel dataset for lag operations
dropif

drops rows from the dataset
do

Executes R code on the dataset
is_loaded

a command to determine whether data is loaded
headdata

get first few observations
restore

restore a dataset from a previous preserve to be currently used
forval

Execute code in the dataset's environment for each value of a vector, replacing a macro with that value in each iteration
rename

renames variables in the dataset
use

uses a dataset, marking it as the active dataset
getdata

exports data frame from genvar environment to R environment
varlist

creates a formula object from a varlist, mostly for internal use.
gvplot

convenience interface to R's plot command
logit

estimate a logistic regression
listif

prints the part of the dataset that satisfies certain conditions
subset.varlist

generate a varlist that is a subset of another
tostring

convert a variable of another type into a string variable
taildata

get last few observations
probit

estimate a probit regression
reg

regress y on x with robust standard errors, clustered standard errors, HAC standard errors, panel fixed effects, etc.
summarize

summarize a variable list, giving basic descriptive statistics
builddata

creates a dataset of a given number of observations
describe

lists the names of the variables in the dataset
capture

captures an expression, returning TRUE if there was an error and FALSE otherwise
collapse

collapses a data set by variables using arbitrary aggregation functions
L

a function to take lags and leads with panel data
assert_loaded

assert a dataset is loaded in genvar and error otherwise
clear

clears the dataset in memory
count

Counts the number of observations (optionally, only those satisfying a condition)
destring

convert a variable of string type into a numeric variable
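The documented behaviour of capture (TRUE on error, FALSE otherwise) can be sketched with base R's tryCatch(). capture_sketch is hypothetical and not genvar's source; it relies on R's lazy evaluation so that the expression only runs inside the error handler's scope.

```r
# Hypothetical sketch of capture: evaluate an expression and report
# whether it raised an error.
capture_sketch <- function(expr) {
  tryCatch({
    force(expr)  # arguments are lazy, so the expression runs here
    FALSE
  }, error = function(e) TRUE)
}

capture_sketch(log(10))          # FALSE
capture_sketch(stop("no data"))  # TRUE
```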