stochQN (version 0.1.2-1)

stochastic.logistic.regression: Stochastic Logistic Regression

Description

Stochastic Logistic Regression

Usage

stochastic.logistic.regression(formula = NULL, pos_class = NULL,
  dim = NULL, intercept = TRUE, x0 = NULL, optimizer = "adaQN",
  optimizer_args = list(initial_step = 0.1, verbose = FALSE),
  lambda = 0.001, random_seed = 1, val_data = NULL)

Arguments

formula

Formula for the model, if it is fit to data.frames instead of matrices/vectors.

pos_class

If fit to data in the form of data.frames, a string indicating which of the classes is the positive one. If fit to data in the form of matrices/vectors, pass `NULL`.

dim

Dimensionality of the model (number of features). Ignored when passing `formula` or when passing `x0`. If the intercept is added from the option `intercept` here, it should not be counted towards `dim`.

intercept

Whether to add an intercept to the covariates. Only used when fitting to matrices/vectors. Ignored when passing `formula` (for formulas without intercept, put `-1` in the RHS to get rid of the intercept).

x0

Initial values of the variables. If passed, will ignore `dim` and `random_seed`. If not passed, will generate random starting values ~ Norm(0, 0.1).

optimizer

The optimizer to use - one of `adaQN` (recommended), `SQN`, `oLBFGS`.

optimizer_args

Arguments to pass to the optimizer (same ones as the functions of the same name). Must be a list. See the documentation of each optimizer for the parameters they take.

lambda

Regularization parameter. Be aware that the functions assume the loss (the negative log-likelihood) is divided by the number of observations, so this number should be small.

random_seed

Random seed to use for the initialization of the variables. Ignored when passing `x0`.

val_data

Validation data (only used for `adaQN`). If passed, must be a list with entries `X`, `y` (if passing data.frames for fitting), and optionally `w` (sample weights).
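
The note on `lambda` above can be illustrated with a small base-R sketch. It does not call stochQN; it assumes an L2 penalty of the form `lambda * sum(w^2)`, which this page does not confirm, and the helper name `avg_logistic_loss` is hypothetical. The point it shows is only that a loss averaged over observations does not grow with the number of rows, so the penalty is weighed against a per-observation quantity.

```r
# Penalized logistic loss averaged over observations.
# Assumption: L2 penalty of the form lambda * sum(w^2); the exact
# penalty used inside stochQN is not stated on this page.
avg_logistic_loss <- function(w, X, y, lambda = 0.001) {
  z <- as.vector(X %*% w)
  # log(1 + exp(z)) - y * z is the per-observation negative log-likelihood
  nll <- log1p(exp(z)) - y * z
  mean(nll) + lambda * sum(w^2)
}

data("iris")
X <- as.matrix(iris[, 1:4])
y <- as.numeric(iris$Species == "setosa")
w <- c(0.1, 0.2, -0.3, -0.1)  # arbitrary coefficients for illustration

loss_small <- avg_logistic_loss(w, X, y)

# Stacking the same data 10 times leaves the averaged loss unchanged,
# so the same lambda keeps the same relative weight at any data size.
X_big <- X[rep(seq_len(nrow(X)), 10), ]
y_big <- rep(y, 10)
loss_big <- avg_logistic_loss(w, X_big, y_big)

print(c(loss_small, loss_big))
```

Had the loss been a sum rather than a mean, the data term would be 10 times larger in the second call and the same `lambda` would regularize far less.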

Value

An object of class `stoch_logistic`, which can be fit to batches of data through the function `partial_fit_logistic`.

Details

Binary logistic regression, fit in batches using this package's own optimizers.
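
To make "fit in batches" concrete, here is a minimal base-R sketch of batch-wise fitting. It deliberately uses plain mini-batch gradient descent, not `adaQN`, `SQN`, or `oLBFGS`, and it does not reproduce the package's internals. The fixed step size, the L2 penalty form, and reading `Norm(0, 0.1)` as a standard deviation of 0.1 are all assumptions made for illustration.

```r
# Plain mini-batch gradient descent on the logistic loss.
# This is a stand-in technique, NOT one of the stochQN optimizers.
sigmoid <- function(z) 1 / (1 + exp(-z))

avg_loss <- function(w, X, y, lambda) {
  z <- as.vector(X %*% w)
  mean(log1p(exp(z)) - y * z) + lambda * sum(w^2)
}

data("iris")
X <- cbind(1, as.matrix(iris[, 1:4]))  # intercept column + 4 features
y <- as.numeric(iris$Species == "setosa")

lambda <- 0.001
step <- 0.05        # hypothetical fixed step size
batch_size <- 50

set.seed(1)
# Starting values; treating 0.1 as a standard deviation is an assumption
w <- rnorm(ncol(X), mean = 0, sd = 0.1)
loss_start <- avg_loss(w, X, y, lambda)

for (i in 1:1000) {
  # Each update only sees one randomly drawn batch of rows
  batch <- sample(nrow(X), size = batch_size, replace = TRUE)
  Xb <- X[batch, , drop = FALSE]
  yb <- y[batch]
  # Gradient of the batch-averaged loss plus the L2 penalty
  resid <- sigmoid(as.vector(Xb %*% w)) - yb
  grad <- as.vector(crossprod(Xb, resid)) / batch_size + 2 * lambda * w
  w <- w - step * grad
}

loss_end <- avg_loss(w, X, y, lambda)
accuracy <- mean((sigmoid(as.vector(X %*% w)) >= 0.5) == y)
cat(sprintf("Loss: %.3f -> %.3f, accuracy: %.2f%%\n",
            loss_start, loss_end, 100 * accuracy))
```

In the package itself, the body of the loop corresponds to one call to `partial_fit_logistic` on the batch, with the chosen optimizer deciding how the coefficients are updated.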

See Also

partial_fit_logistic, coef.stoch_logistic, predict.stoch_logistic, adaQN, SQN, oLBFGS

Examples

# NOT RUN {
library(stochQN)

### Load Iris dataset
data("iris")

### Example with X + y interface
X <- as.matrix(iris[, c("Sepal.Length", "Sepal.Width",
  "Petal.Length", "Petal.Width")])
y <- as.numeric(iris$Species == "setosa")

### Initialize model with default parameters
model <- stochastic.logistic.regression(dim = 4)

### Fit to 10 randomly-subsampled batches
batch_size <- as.integer(nrow(X) / 3)
for (i in 1:10) {
  set.seed(i)
  batch <- sample(nrow(X),
      size = batch_size, replace=TRUE)
  ### Pass only the sampled rows, so each update sees one batch
  partial_fit_logistic(model, X[batch, ], y[batch])
}

### Check classification accuracy
cat(sprintf(
  "Accuracy after 10 iterations: %.2f%%\n",
  100 * mean(
    predict(model, X, type = "class") == y)
  ))


### Example with formula interface
iris_df <- iris
levels(iris_df$Species) <- c("setosa", "other", "other")

### Initialize model with default parameters
model <- stochastic.logistic.regression(Species ~ .,
  pos_class="setosa")

### Fit to 10 randomly-subsampled batches
batch_size <- as.integer(nrow(iris_df) / 3)
for (i in 1:10) {
  set.seed(i)
  batch <- sample(nrow(iris_df),
      size=batch_size, replace=TRUE)
  ### Pass only the sampled rows, so each update sees one batch
  partial_fit_logistic(model, iris_df[batch, ])
}
### Check classification accuracy
cat(sprintf(
  "Accuracy after 10 iterations: %.2f%%\n",
  100 * mean(
    predict(model, iris_df, type = "class") == iris_df$Species)
  ))
# }
