speccurvieR (version 0.4.2)

sca: Perform specification curve analysis

Description

sca() is the workhorse function of the package. It estimates a model for every possible combination of the supplied controls and, by default, returns a data frame in which each row contains the pertinent information and parameters for one model. This data frame can then be passed to plotCurve() or any other plotting function in the package. Alternatively, if `returnFormulae = TRUE`, it returns a list of formula objects covering every possible combination of controls, and the models are not estimated.
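To make "every possible combination of the controls" concrete, here is a minimal base R sketch that enumerates the subsets of a control vector and builds one formula per subset. The helper `build_formulae()` is hypothetical and is not part of speccurvieR; in particular, including the specification with no controls is an assumption of this sketch, not something this page states about sca().

```r
# Hypothetical helper (not a speccurvieR function): build one formula per
# subset of `controls`, always keeping `x` on the right-hand side.
build_formulae <- function(y, x, controls) {
  # All non-empty subsets, from size 1 up to length(controls).
  non_empty <- unlist(
    lapply(seq_along(controls),
           function(k) combn(controls, k, simplify = FALSE)),
    recursive = FALSE
  )
  # Assumption: the specification with no controls is also included.
  subsets <- c(list(character(0)), non_empty)
  lapply(subsets, function(s) reformulate(c(x, s), response = y))
}

formulae <- build_formulae("Salnty", "T_degC", c("ChlorA", "O2Sat"))
for (f in formulae) print(f)
```

Under this assumption, k controls yield 2^k formulae, so the number of models doubles with each control added.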

Usage

sca(
  y,
  x,
  controls,
  data,
  weights = NULL,
  family = "linear",
  link = NULL,
  fixedEffects = NULL,
  returnFormulae = FALSE,
  progressBar = TRUE,
  parallel = FALSE,
  workers = 2
)

Value

When `returnFormulae` is `FALSE`, a data frame where each row contains the independent variable's coefficient estimate, standard error, test statistic, and p-value, along with the model specification and measures of model fit. When `returnFormulae` is `TRUE`, a list of formula objects.
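As a rough illustration of what one such row holds, the sketch below fits a single model with `lm()` on the built-in `mtcars` data and collects the same kinds of quantities. The column names are made up for this example and may not match those sca() actually returns.

```r
# One specification, fitted by hand; `wt` plays the role of x.
fit <- lm(mpg ~ wt + hp, data = mtcars)
coefs <- summary(fit)$coefficients

# Illustrative column names only (assumption: not the package's own names).
result_row <- data.frame(
  formula   = deparse(formula(fit)),
  coef      = coefs["wt", "Estimate"],
  se        = coefs["wt", "Std. Error"],
  statistic = coefs["wt", "t value"],
  p         = coefs["wt", "Pr(>|t|)"],
  AIC       = AIC(fit),
  adjR2     = summary(fit)$adj.r.squared
)
print(result_row)
```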

Arguments

y

A string containing the column name of the dependent variable in data.

x

A string containing the column name of the independent variable in data.

controls

A vector of strings containing the column names of the control variables in data.

data

A dataframe containing y, x, controls, and (optionally) the variables to be used for fixed effects or clustering.

weights

Optional string with the column name in `data` that contains weights.

family

A string indicating the family of models to be used. Defaults to "linear" for OLS regression but supports all families supported by `glm()`.

link

A string specifying the link function to be used for the model. Defaults to `NULL` for OLS regression using `lm()` or `fixest::feols()` depending on whether fixed effects are supplied. Supports all link functions supported by the family parameter of `glm()`.
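For readers unfamiliar with how a family and a link combine in `glm()`, the following sketch turns two strings into a family object and fits one model. The helper `make_family()` is hypothetical; how sca() handles these strings internally is not described here, and the package's own "linear" value is not a `glm()` family, so it is not covered by this sketch.

```r
# Hypothetical helper (not a speccurvieR function): map family and link
# strings to a glm() family object.
make_family <- function(family, link = NULL) {
  fam_fun <- match.fun(family)  # e.g. "binomial" resolves to stats::binomial
  if (is.null(link)) fam_fun() else fam_fun(link = link)
}

fam <- make_family("binomial", link = "probit")
fit <- glm(am ~ wt, data = mtcars, family = fam)
print(fam)
```

Calling `make_family("poisson")` without a link falls back to that family's default link, which is `"log"` for Poisson in base R.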

fixedEffects

A string containing the column name of the variable in `data` to be used for fixed effects. Defaults to `NULL`, in which case no fixed effects are included.

returnFormulae

A boolean. When `TRUE` a list of model formula objects is returned but the models are not estimated. Defaults to `FALSE` in which case a dataframe of model results is returned.

progressBar

A boolean indicating whether the user wants a progress bar for model estimation. Defaults to `TRUE`.

parallel

A boolean indicating whether to parallelize model estimation. Parallelization only offers a speed advantage when a large (> 1000) number of models is being estimated. Defaults to `FALSE`.

workers

An integer indicating the number of workers to use for parallelization. Defaults to 2.
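The general pattern behind `parallel` and `workers` can be sketched with base R's `parallel` package: start a fixed number of worker processes, hand each a share of the formulae, and collect the estimates. This is only an illustration of the idea on `mtcars`; it makes no claim about which parallel backend sca() uses.

```r
library(parallel)

# Four specifications sharing the same x (`wt`).
formulae <- list(mpg ~ wt, mpg ~ wt + hp, mpg ~ wt + qsec, mpg ~ wt + hp + qsec)

# Extract the coefficient on `wt` from one fitted specification.
fit_one <- function(f) unname(coef(lm(f, data = datasets::mtcars))["wt"])

cl <- makeCluster(2)  # two workers, mirroring workers = 2
estimates <- tryCatch(
  parSapply(cl, formulae, fit_one),
  finally = stopCluster(cl)  # always release the workers
)
print(estimates)
```

Starting the workers and shipping data to them has a fixed cost, which is why parallel estimation tends to pay off only when many models are fitted.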

Examples

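library(speccurvieR)

# Estimate a model for every combination of two controls.
sca(y = "Salnty", x = "T_degC", controls = c("ChlorA", "O2Sat"),
    data = bottles, progressBar = TRUE, parallel = FALSE)
# Controls may be interaction terms; estimate in parallel with two workers.
sca(y = "Salnty", x = "T_degC", controls = c("ChlorA*NO3uM", "O2Sat*NO3uM"),
    data = bottles, progressBar = TRUE, parallel = TRUE, workers = 2)
# Return the list of formula objects without estimating the models.
sca(y = "Salnty", x = "T_degC", controls = c("ChlorA", "O2Sat*NO3uM"),
    data = bottles, progressBar = TRUE, parallel = FALSE,
    returnFormulae = TRUE)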