subpop: Inference on Most and Least Affected Groups

Description

subpop conducts set inference on the groups of most and least affected units. When subgroup = NULL, the output is for the whole sample; otherwise the results are for the specified subgroup. The output of subpop is a list containing six components: cs_most, cs_least, u, subgroup, most and least. As the names indicate, cs_most and cs_least are the confidence sets for the most and least affected units. u stores the percentile index that defines the most and least affected groups. subgroup stores the indicators for the subpopulation. most and least store the data of the most and least affected groups. The confidence sets can be visualized with plot.subpop, and the two groups can be tabulated with summary.subpop.

Usage

subpop(
  fm,
  data,
  method = c("ols", "logit", "probit", "QR"),
  var_type = c("binary", "continuous", "categorical"),
  var,
  compare,
  subgroup = NULL,
  samp_weight = NULL,
  taus = c(5:95)/100,
  u = 0.1,
  alpha = 0.1,
  b = 500,
  seed = 1,
  parallel = FALSE,
  ncores = detectCores(),
  boot_type = c("nonpar", "weighted")
)

Arguments

fm

Regression formula

data

The data in use

method

Model used to estimate partial effects. Four options: "logit" (binary response), "probit" (binary response), "ols" (interactive linear with additive errors), "QR" (linear model with non-additive errors). Default is "ols".

var_type

The type of the variable of interest. Three options: "binary", "categorical", "continuous". Default is "binary".

var

Variable T of interest. Should be a character string.

compare

If the variable of interest is categorical, the user needs to specify which two categories to compare. Should be a 1 by 2 character vector. For example, if the two levels to compare are 1 and 3, then compare = c("1", "3"), which calculates the partial effect of moving from 1 to 3. To use this option, users first need to specify var as a factor variable.
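The preparation described for compare can be sketched in base R as follows. The data frame toy and its column educ are hypothetical stand-ins, not part of the package; the sketch only shows the factor conversion and a sanity check on the two levels before they would be passed to subpop.

```r
# Hypothetical data: a categorical variable coded as numbers.
toy <- data.frame(educ = c(1, 3, 2, 1, 3, 2))

# The variable of interest must be a factor before using compare.
toy$educ <- factor(toy$educ)

# The two levels to compare, given as a character vector of length 2.
compare <- c("1", "3")

# Check that both requested levels actually exist in the factor.
stopifnot(is.factor(toy$educ), all(compare %in% levels(toy$educ)))
print(levels(toy$educ))
```

The check on levels() is a defensive step added for this sketch; it catches a mistyped level before any estimation is attempted.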

subgroup

Subgroup of interest. Default is NULL. The specification should be a logical vector. For example, suppose data contains an indicator variable for women (1 if female, 0 if male). If users are interested in the SPE for women, they should specify subgroup = data[, "female"] == 1.
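A minimal sketch of building such a logical indicator, using a hypothetical data frame toy in place of the user's data. It only illustrates the shape the subgroup argument is described as expecting: one TRUE or FALSE per observation.

```r
# Hypothetical data standing in for the user's data set.
toy <- data.frame(
  female = c(1, 0, 1, 1, 0),
  income = c(30, 45, 52, 38, 61)
)

# Logical indicator with one entry per row: TRUE for women.
women <- toy[, "female"] == 1

# The indicator should be logical and as long as the data has rows.
stopifnot(is.logical(women), length(women) == nrow(toy))

# Number of observations that fall in the subgroup.
print(sum(women))
```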

samp_weight

Sampling weights of the data. Input should be an n by 1 vector, where n denotes the sample size. Default is NULL.

taus

Indexes for quantile regression. Default is c(5:95)/100.
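To make the default concrete, the expression c(5:95)/100 can be inspected in base R. The coarser grid at the end is a hypothetical alternative for illustration; whether it is adequate for a given application is an assumption, not something this page states.

```r
# Default grid of quantile indexes from the Usage section.
taus <- c(5:95) / 100

# The grid runs from 0.05 to 0.95 in steps of 0.01.
print(length(taus))
print(range(taus))

# A hypothetical coarser grid; fewer quantile indexes to evaluate.
coarse_taus <- seq(0.1, 0.9, by = 0.1)
print(coarse_taus)
```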

u

Percentile that defines the most and least affected groups. Default is 0.1.
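As a rough illustration of what a percentile cutoff like u = 0.1 means, the sketch below classifies simulated partial effects into most and least affected groups. The vector pe and the use of quantile() with >= and <= are assumptions made for this sketch; the package's internal classification rule may differ in detail.

```r
set.seed(1)

# Hypothetical estimated partial effects, one per unit.
pe <- rnorm(200)
u <- 0.1

# Units at or above the (1 - u) quantile: the most affected group.
most <- pe >= quantile(pe, 1 - u)

# Units at or below the u quantile: the least affected group.
least <- pe <= quantile(pe, u)

# Group sizes; no unit can belong to both groups when u < 0.5.
print(c(most = sum(most), least = sum(least)))
stopifnot(!any(most & least))
```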

alpha

Significance level for the confidence sets. Should be between 0 and 1. Default is 0.1.

b

Number of bootstrap draws. Default is 500.

seed

Seed for pseudo-random number generation, used for reproducibility. Default is 1.

parallel

Whether to use parallel computation. The default is FALSE, in which case only 1 CPU is used. If TRUE, the user can specify the number of CPUs with the ncores option.

ncores

Number of cores for computation. Default is detectCores(), a function from the parallel package that detects the number of CPUs on the current host. For large datasets, parallel computing is highly recommended since the bootstrap is time-consuming.
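One way to choose a value for ncores is sketched below. Leaving one core free and falling back to 1 when detection fails are design choices of this sketch, not recommendations from the package; detectCores() is documented to return NA when the count cannot be determined.

```r
library(parallel)

# Number of CPUs reported for the current host (may be NA).
available <- detectCores()

# Leave one core free for the rest of the system, but never go below 1.
ncores <- if (is.na(available)) 1L else max(1L, available - 1L)

print(ncores)
```

The resulting value could then be passed as ncores together with parallel = TRUE.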

boot_type

Type of bootstrap. Default is "nonpar", which implements the nonparametric bootstrap. The alternative is "weighted", which implements the weighted bootstrap.
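The difference between the two bootstrap types can be illustrated generically on a sample mean. This is a sketch of the two ideas, not the package's implementation: the simulated data x, the number of draws, and in particular the standard exponential weights are assumptions chosen for illustration.

```r
set.seed(1)

# Hypothetical sample and number of bootstrap draws.
x <- rnorm(100, mean = 2)
n <- length(x)
draws <- 200

# Nonparametric bootstrap: resample observations with replacement.
nonpar <- replicate(draws, mean(sample(x, n, replace = TRUE)))

# Weighted bootstrap: keep every observation and draw random positive
# weights instead (standard exponential weights assumed here).
weighted <- replicate(draws, {
  w <- rexp(n)
  sum(w * x) / sum(w)
})

# Both give a bootstrap standard error for the sample mean.
print(c(nonpar = sd(nonpar), weighted = sd(weighted)))
```

Because the weighted version never drops an observation, it can be convenient when resampling rows might leave a small category empty.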

Examples

# NOT RUN {
data("mortgage")
### Regression Specification
fm <- deny ~ black + p_irat + hse_inc + ccred + mcred + pubrec +
   ltv_med + ltv_high + denpmi + selfemp + single + hischl
### Issue the subpop command
set_b <- subpop(fm, data = mortgage, method = "logit", var = "black",
                u = 0.1, alpha = 0.1, b = 50)

# }
