OptimalBinningWoE (version 1.0.8)

fit_logistic_regression: Fit Logistic Regression Model

Description

This function fits a logistic regression model to binary classification data. It supports both dense and sparse matrix inputs for the predictor variables. The optimization is performed using the L-BFGS algorithm.

Usage

fit_logistic_regression(X_r, y_r, maxit = 300L, eps_f = 1e-08, eps_g = 1e-05)

Value

A list containing the results of the logistic regression fit:

coefficients

Numeric vector of estimated regression coefficients.

se

Numeric vector of standard errors for the coefficients.

z_scores

Numeric vector of z-statistics for testing coefficient significance.

p_values

Numeric vector of p-values associated with the z-statistics.

loglikelihood

Scalar. The maximized log-likelihood value.

gradient

Numeric vector. The gradient at the solution.

hessian

Matrix. The Hessian matrix evaluated at the solution.

convergence

Logical. Whether the algorithm converged successfully.

iterations

Integer. Number of iterations performed.

message

Character. Convergence message.

Arguments

X_r

A numeric matrix or sparse matrix (dgCMatrix) of predictor variables. Rows represent observations and columns represent features.

y_r

A numeric vector of binary outcome values (0 or 1). Must have the same number of observations as rows in X_r.

maxit

Integer. Maximum number of iterations for the optimizer. Default is 300.

eps_f

Numeric. Convergence tolerance for the function value. Default is 1e-8.

eps_g

Numeric. Convergence tolerance for the gradient norm. Default is 1e-5.
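Since `X_r` may be passed as a `dgCMatrix`, a sparse design matrix can be built with the Matrix package (shipped with R as a recommended package). This is a small sketch of constructing such an input; the variable names are illustrative, not part of the package API:

```r
library(Matrix)

set.seed(42)
n <- 50
# Mostly-zero predictors plus an intercept column of ones
X_dense  <- cbind(1, matrix(rbinom(n * 2, 1, 0.1), n, 2))
X_sparse <- Matrix(X_dense, sparse = TRUE)  # coerces to a dgCMatrix

class(X_sparse)
```

Either `X_dense` or `X_sparse` can then be supplied as `X_r`; the sparse form saves memory when most entries are zero.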

Details

The logistic regression model estimates the probability of the binary outcome \(y_i \in \{0, 1\}\) given predictors \(x_i\): $$P(y_i = 1 \mid x_i) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_{i1} + \ldots + \beta_p x_{ip})}}$$

The function maximizes the log-likelihood: $$\ell(\beta) = \sum_{i=1}^n [y_i \cdot (\beta^T x_i) - \ln(1 + e^{\beta^T x_i})]$$
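As a sketch (not the package internals), the log-likelihood above can be evaluated directly in R for a given coefficient vector, guarding `log(1 + e^eta)` against overflow for large linear predictors:

```r
# Log-likelihood of a logistic model: sum_i [ y_i * eta_i - log(1 + e^{eta_i}) ]
loglik <- function(beta, X, y) {
  eta <- drop(X %*% beta)
  # For large eta, log1p(exp(eta)) overflows; use the limit log(1 + e^eta) ~ eta
  log1pexp <- ifelse(eta > 30, eta, log1p(exp(eta)))
  sum(y * eta - log1pexp)
}
```

At `beta = 0` every fitted probability is 0.5, so the log-likelihood reduces to `-n * log(2)`, which is a convenient spot-check.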

Standard errors are the square roots of the diagonal of the inverse observed information matrix (the negative of the Hessian of the log-likelihood, evaluated at the estimated coefficients). Z-scores (each coefficient divided by its standard error) and two-sided p-values are derived under the asymptotic normality of the maximum-likelihood estimator.
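The Wald statistics described above follow mechanically from the coefficients and their standard errors. A small sketch with illustrative numbers (not real model output):

```r
# Wald z-scores and two-sided p-values under asymptotic normality
beta_hat <- c(0.52, 1.18)  # illustrative coefficient estimates
se_hat   <- c(0.21, 0.30)  # illustrative standard errors

z <- beta_hat / se_hat
p <- 2 * pnorm(-abs(z))    # two-sided tail probability

round(cbind(z = z, p = p), 4)
```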

Examples

library(OptimalBinningWoE)

# Generate sample data
set.seed(123)
n <- 100
p <- 3
X <- matrix(rnorm(n * p), n, p)
# Add intercept column
X <- cbind(1, X)
colnames(X) <- c("(Intercept)", "X1", "X2", "X3")

# True coefficients
beta_true <- c(0.5, 1.2, -0.8, 0.3)

# Generate linear predictor
eta <- X %*% beta_true

# Generate binary outcome
prob <- 1 / (1 + exp(-eta))
y <- rbinom(n, 1, prob)

# Fit logistic regression
result <- fit_logistic_regression(X, y)

# View coefficients and statistics
print(data.frame(
  Coefficient = result$coefficients,
  Std_Error = result$se,
  Z_score = result$z_scores,
  P_value = result$p_values
))

# Check convergence
cat("Converged:", result$convergence, "\n")
cat("Log-Likelihood:", result$loglikelihood, "\n")
