FSR: FSR

Description

FSR

Usage

FSR(Xy, max_poly_degree = 3, max_interaction_degree = 2, outcome = NULL,
  linear_estimation = FALSE, threshold_include = 0.01,
  threshold_estimate = 0.001, min_models = NULL, max_fails = 2,
  standardize = FALSE, pTraining = 0.8, file_name = NULL,
  store_fit = "none", max_block = 250, noisy = TRUE, seed = NULL)

Arguments

Xy

matrix or data.frame; outcome must be in final column.
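Because the outcome is read from the final column of Xy, a data set whose response sits elsewhere needs its columns reordered first. A minimal sketch in base R, assuming mpg is the intended response in mtcars (that choice is an illustration, not something this page specifies):

```r
# Move the intended response (assumed here to be mpg) to the final column.
response <- "mpg"
Xy <- mtcars[, c(setdiff(names(mtcars), response), response)]

# The last column is now the outcome; all other columns are features.
names(Xy)[ncol(Xy)]

# Xy can then be passed on, e.g. out <- FSR(Xy)
```

Without this step, a call such as FSR(mtcars) treats whatever column happens to be last (carb) as the outcome.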

max_poly_degree

highest power to which continuous features are raised; default 3 (cubic).

max_interaction_degree

highest interaction order; default 2 (allow x_i*x_j). Also interacts each level of factors with continuous features.
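To make the two degree settings concrete, the sketch below hand-builds the kind of terms they describe for two continuous features: powers up to 3 and one pairwise interaction. It uses base R's model.matrix() purely as an illustration and is not FSR()'s internal code; the features wt and hp are arbitrary choices.

```r
# Illustration only: cubic terms and a pairwise interaction for two
# continuous features, built by hand with base R.
expanded <- model.matrix(
  ~ wt + I(wt^2) + I(wt^3) + hp + I(hp^2) + I(hp^3) + wt:hp,
  data = mtcars
)

# One intercept column, three powers per feature, one interaction column.
colnames(expanded)
```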

outcome

Treat y as either 'continuous', 'binary', 'multinomial', or NULL (auto-detect based on response).

linear_estimation

Logical: model the outcome as linear and estimate with ordinary least squares? Recommended for speed on large datasets even if the outcome is categorical. (For a multinomial outcome, this means the response is treated as a vector.) If FALSE, the estimator is chosen based on 'outcome' (i.e., OLS for 'continuous' outcomes, glm() to estimate logistic regression models for 'binary' outcomes, and nnet::multinom() for 'multinomial' outcomes).
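The trade-off for a binary outcome can be sketched with base R alone: ordinary least squares on a 0/1 response versus logistic regression through glm(). The outcome am and the predictor wt are illustrative assumptions, and the code only mirrors the choice of estimator described above, not how FSR() calls these functions.

```r
# Binary outcome: am (0/1) in mtcars, used purely for illustration.

# Linear estimation: ordinary least squares on the 0/1 response.
fit_linear <- lm(am ~ wt, data = mtcars)

# Alternative for a 'binary' outcome: logistic regression via glm().
fit_logit <- glm(am ~ wt, data = mtcars, family = binomial)

# Fitted probabilities from the logistic model stay within [0, 1];
# fitted values from OLS are not constrained to that range.
range(fitted(fit_logit))
range(fitted(fit_linear))
```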

threshold_include

minimum improvement required to include a recently added term in the model (change in fit, originally on a 0 to 1 scale). -1.001 means 'include all'. Default: 0.01. (Adjusted R^2 for linear models, pseudo R^2 for logistic regression, out-of-sample accuracy for multinomial models. In the latter two cases, the same adjustment for the number of predictors is applied as pseudo-R^2.)
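The inclusion rule for a linear model can be sketched as a comparison of adjusted R^2 before and after a candidate term is added. The models below and the use of in-sample adjusted R^2 are simplifying assumptions for illustration; the data on which FSR() measures the change is not stated here.

```r
threshold_include <- 0.01

# Model before and after adding the candidate term hp.
fit_before <- lm(mpg ~ wt, data = mtcars)
fit_after  <- lm(mpg ~ wt + hp, data = mtcars)

# Change in adjusted R^2 attributable to the new term.
improvement <- summary(fit_after)$adj.r.squared -
  summary(fit_before)$adj.r.squared

# Keep the term only if the improvement exceeds the threshold.
include_term <- improvement > threshold_include
include_term
```

Under this rule a threshold of -1.001 always passes, since a change on a 0 to 1 scale cannot fall below -1.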

threshold_estimate

minimum improvement to keep estimating (pseudo R^2 so scale 0 to 1). -1.001 means 'estimate all'. Default: 0.001.

min_models

minimum number of models to estimate. Defaults to the number of features (unless P > N).

max_fails

maximum number of models that FSR() can fail to estimate computationally before exiting. Default: 2.

standardize

if TRUE (not default), standardizes continuous variables.
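As a rough picture of what standardizing continuous variables means, the sketch below centers and scales two columns with base R's scale(). Whether FSR() uses scale() internally is an assumption; only the end result (mean 0, standard deviation 1) is the point.

```r
# Two continuous features chosen for illustration.
continuous <- mtcars[, c("wt", "hp")]

# Center each column to mean 0 and rescale to standard deviation 1.
standardized <- scale(continuous)

colMeans(standardized)      # approximately 0 for each column
apply(standardized, 2, sd)  # 1 for each column
```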

pTraining

proportion of the data used for training. Default: 0.8.
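A training proportion of 0.8 corresponds to a split like the one sketched below in base R. The use of sample() and floor() is an assumption made for illustration; the exact rounding and sampling FSR() performs is not documented here.

```r
set.seed(1)  # fixed only so the illustration is reproducible
pTraining <- 0.8

n <- nrow(mtcars)
train_idx <- sample(n, size = floor(pTraining * n))

train <- mtcars[train_idx, ]   # rows used to fit models
test  <- mtcars[-train_idx, ]  # held-out rows for out-of-sample fit

c(train = nrow(train), test = nrow(test))
```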

file_name

If a file name (and path) is provided, saves output after each model is estimated as an .RData file. ex: file_name = "results.RData". See also store_fit for options as to how much to store in the outputted object.
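A saved .RData file can be inspected later with base R's load(). The sketch below uses a stand-in object because the name and structure of what FSR() writes are not documented on this page; loading into a separate environment shows which objects a file contains without overwriting anything in the workspace.

```r
# Stand-in for saved output; the real object's name and structure are unknown.
placeholder <- list(note = "stand-in for FSR() results")
path <- file.path(tempdir(), "results.RData")
save(placeholder, file = path)

# Load into its own environment and list the objects the file contained.
loaded <- new.env()
object_names <- load(path, envir = loaded)
object_names
```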

store_fit

If file_name is provided, FSR() will return coefficients, measures of fit, and call details. Should entire fit objects be saved as well? Options are "none" (default; save only those other items), "accepted_only" (only models that meet the threshold), and "all".

max_block

Most of the linear algebra is done recursively in blocks to ease memory management. Default: 250. Increasing or decreasing this value may slow computation.

noisy

display measures of fit, progress, etc. Recommended.

seed

Automatically set, but can also be passed as a parameter.

Value

list with slope coefficients, model details, and measures of fit

Examples

# NOT RUN {
out <- FSR(mtcars)
# }