
Christopher Gandrud
Please report any bugs or suggestions at: https://github.com/christophergandrud/DataCombine/issues.
DataCombine is a set of miscellaneous tools intended to make combining data sets--especially time-series cross-section data--easier. The package is continually being developed as I turn lines of code that I frequently use into single functions. It currently includes the following functions:
CasesTable: reports cases after listwise deletion of missing values for time-series cross-sectional data.
change: calculates the absolute, percentage, and proportion change from a specified lag, including within groups.
CountSpell: returns a variable counting the spell number for an observation. Works with grouped data.
dMerge: merges 2 data frames and reports, drops, or keeps only duplicates.
DropNA: drops rows from a data frame when they have missing (NA) values on a given variable(s).
FillDown: fills in missing (NA) values with the previous non-missing value.
FillIn: fills in missing values of a variable from one data frame with the values from another variable.
FindDups: finds duplicated values in a data frame and subsets it to either include or not include them.
FindReplace: replaces multiple patterns found in a character string column of a data frame.
grepl.sub: subsets a data frame if a specified pattern is found in a character string.
InsertRow: allows the user to insert a row into a data frame. Largely implements Ari B. Friedman's function.
MoveFront: moves variables to the front of a data frame. This can be useful if you have a data frame with many variables and want to move a variable or variables to the front.
NaVar: creates new variable(s) indicating if there are missing values in other variable(s).
shift: creates lag and lead variables, including for time-series cross-sectional data. The shifted variable is returned to a new vector. This function is largely based on TszKin Julian's shift function.
slide: creates lag and lead variables, including for time-series cross-sectional data. The slid variables are added to the original data frame. This expands the capabilities of shift.
slideMA: creates a moving average for a period before or after each time point for a given variable.
SpreadDummy: spreads a dummy variable (1's and 0's) over a specified time period and for specified groups.
StartEnd: finds the starting and ending time points of a spell, including for time-series cross-sectional data.
rmExcept: removes all objects from a workspace except those specified by the user.
TimeExpand: expands a data set so that it includes an observation for each time point in a sequence. Works with grouped data.
TimeFill: creates a continuous Unit-Time-Dummy data frame from a data frame with Unit-Start-End times.
VarDrop: drops one or more variables from a data frame.

I will continue to add to the package as I build data sets and run across other pesky tasks I do repeatedly that would be simpler if they were completed by a single function.
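
To make the idea of a within-group lag concrete, here is a minimal base R sketch of the kind of task that slide and change are described as handling. It does not call DataCombine and is not the package's implementation. The data frame and its column names (country, year, gdp) are made up for illustration.

```r
# Hypothetical toy panel: two countries observed over three years.
panel <- data.frame(
  country = rep(c("A", "B"), each = 3),
  year    = rep(2001:2003, times = 2),
  gdp     = c(10, 12, 15, 20, 21, 25)
)

# Sort by unit and time so that "previous row" means "previous period".
panel <- panel[order(panel$country, panel$year), ]

# Lag gdp by one period within each country. The first year of each
# country has no earlier observation, so its lag is NA.
panel$gdp_lag1 <- ave(panel$gdp, panel$country,
                      FUN = function(x) c(NA, head(x, -1)))

# Absolute change from the lagged value, again within each country.
panel$gdp_change <- panel$gdp - panel$gdp_lag1

print(panel)
```

Doing this by hand means remembering to sort first and to stop the lag from crossing group boundaries, which is the sort of repeated chore the functions above are meant to wrap in a single call.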
DataCombine is on CRAN:
install.packages('DataCombine')
You can also install the most recent stable version with install_github from the devtools package:
devtools::install_github('christophergandrud/DataCombine')