search.bin: Create a Model Set for Binary Choice Models

Description

Use this function to create a binary choice model set and search for the best models (and other information) based on in-sample and out-of-sample evaluation metrics.

Usage

search.bin(
  data,
  combinations,
  metrics = get.search.metrics(),
  modelChecks = get.search.modelchecks(),
  items = get.search.items(),
  options = get.search.options(),
  costMatrices = NULL,
  searchLogit = TRUE,
  searchProbit = FALSE,
  optimOptions = get.options.newton(),
  aucOptions = get.options.roc()
)

Value

A nested list with the following members:

counts: Information about the expected number of models, number of estimated models, failed estimations, and some details about the failures.
results: A data frame with requested information in items list.
info: The arguments and some general information about the search process such as the elapsed time.

Note that the output does not contain any estimation results, but minimum required data to estimate the models (Use summary() function to get the estimation).

Arguments

data: A list that determines data and other required information for the search process. Use get.data() function to generate it from a matrix or a data.frame.
combinations: A list that determines the combinations of the exogenous variables in the search process. Use get.combinations() function to define it.
metrics: A list of options for measuring performance. Use get.search.metrics function to get them.
modelChecks: A list of options for excluding a subset of the model set. Use get.search.modelchecks function to get them.
items: A list of options for specifying the purpose of the search. Use get.search.items function to get them.
options: A list of extra options for performing the search. Use get.search.options function to get them.
costMatrices: A list of numeric matrices where each one determines how to score the calculated probabilities. Given the number of choices n, a frequency cost matrix is an m x n+1 matrix. The first column determines the thresholds. Elements in the j-th column determine the costs corresponding to the j-1-th choice in y. It can be NULL if it is not selected in metrics.
searchLogit: If TRUE, logit regressions are added to the model set.
searchProbit: If TRUE, probit regressions are added to the model set.
optimOptions: A list for Newton optimization options. Use get.options.newton function to get the options.
aucOptions: A list for AUC calculation options. Use get.options.roc function to get the options.

Examples

Run this code

# We simulate some data for this example:
# sample data:
n = 50 # number of observations
num_x_r <- 3L # number of relevant explanatory variables
num_x_ir <- 20 # (relatively large) number of irrelevant explanatory variables
set.seed(340)
sample <- sim.bin(num_x_r, n)
x_ir <- lapply(1:num_x_ir, function(x) rnorm(n))

# prepare data:
data <- data.frame(sample$y, sample$x, x_ir)
colnames(data) <- c("Y", colnames(sample$x), paste0("z", 1:num_x_ir))

# Use glm function to estimate and analyse:
fit <- glm(Y ~ . - Y, data = data, family = binomial())
summary(fit)

# You can also use this package estimation function:
data0 <- get.data(data,
                equations = list(Y ~ . - Y),
                addIntercept = FALSE)
fit <- estim.bin(data = data0)
# format and print coefficients:
print(fit)

# Alternatively, You can define a binary choice model set:
x_sizes = c(1:3) # assuming we know the number of relevant explanatory variables is less than 3
metric_options <- get.search.metrics(typesIn = c("sic")) # We use SIC for searching
search_res <- search.bin(data = data0,
                         combinations = get.combinations(sizes = x_sizes),
                         metrics = metric_options)
print(search_res)

# Use summary function to estimate the best model:
search_sum <- summary(search_res, y = sample$y, x = data[,3:ncol(data)])

# format and print coefficients:
s_fit <- summary(search_res)
print(s_fit$results[[1]]$value)

# Try a step-wise search for creating a larger model set:
search_res <- search.bin(data = data0,
                         combinations = get.combinations(
                           sizes = list(c(1, 2, 3), c(4)),
                           stepsNumVariables = c(NA, 7)),
                         metrics = metric_options)
# combinations argument is different

print(search_res)
# Use summary like before.

Run the code above in your browser using DataLab

Description

Usage

Value

Arguments

See Also

Examples