semimrFull: Semiparametric Mixture Regression Models with Single-index Proportion and Fully Iterative Backfitting

Description

Assume that $\boldsymbol{x} = (\boldsymbol{x}_1,\cdots,\boldsymbol{x}_n)$ is an n by p matrix of predictors and $Y = (Y_1,\cdots,Y_n)$ is an n-dimensional response vector. The conditional distribution of $Y$ given $\boldsymbol{x}$ can be written as: $$f(y|\boldsymbol{x},\boldsymbol{\alpha},\pi,m,\sigma^2) = \sum_{j=1}^C\pi_j(\boldsymbol{\alpha}^{\top}\boldsymbol{x}) \phi(y|m_j(\boldsymbol{\alpha}^{\top}\boldsymbol{x}),\sigma_j^2(\boldsymbol{\alpha}^{\top}\boldsymbol{x})).$$ `semimrFull' estimates the mixture of single-index models described above, where $\phi(y|m_j(\boldsymbol{\alpha}^{\top}\boldsymbol{x}),\sigma_j^2(\boldsymbol{\alpha}^{\top}\boldsymbol{x}))$ is the normal density with mean $m_j(\boldsymbol{\alpha}^{\top}\boldsymbol{x})$ and variance $\sigma_j^2(\boldsymbol{\alpha}^{\top}\boldsymbol{x})$, and $\pi_j(\cdot), m_j(\cdot), \sigma_j^2(\cdot)$ are unknown smooth single-index functions. Because each of these functions depends on $\boldsymbol{x}$ only through the scalar index $\boldsymbol{\alpha}^{\top}\boldsymbol{x}$, the model can handle high-dimensional nonparametric problems. The function uses kernel regression and a fully iterative backfitting (FIB) estimation procedure (Xiang and Yao, 2020).
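To make the model concrete, the following base R sketch simulates data from a two-component version of the mixture above and evaluates its conditional density. The index direction `alpha`, the functions `pi1`, `m`, and `sig2`, and the helper `mix_density` are all illustrative assumptions; none of them come from the package or from Xiang and Yao (2020).

```r
# Hypothetical two-component single-index mixture (C = 2).
# All functional forms below are illustrative assumptions.
set.seed(1)
n <- 200; p <- 3
x <- matrix(rnorm(n * p), n, p)
alpha <- c(1, 1, 1) / sqrt(3)            # unit-norm index direction
z <- drop(x %*% alpha)                   # scalar index alpha' x

pi1  <- function(z) plogis(z)            # mixing proportion of component 1
m    <- list(function(z) 1 + sin(z), function(z) -1 + 0.5 * z)
sig2 <- list(function(z) rep(0.25, length(z)), function(z) 0.1 + 0.1 * z^2)

# Draw component labels, then responses from the selected normal component
lab  <- ifelse(runif(n) < pi1(z), 1, 2)
mu_i <- ifelse(lab == 1, m[[1]](z), m[[2]](z))
sd_i <- sqrt(ifelse(lab == 1, sig2[[1]](z), sig2[[2]](z)))
y <- rnorm(n, mean = mu_i, sd = sd_i)

# Conditional mixture density f(y | x) at a single index value z0
mix_density <- function(y, z0) {
  pi1(z0) * dnorm(y, m[[1]](z0), sqrt(sig2[[1]](z0))) +
    (1 - pi1(z0)) * dnorm(y, m[[2]](z0), sqrt(sig2[[2]](z0)))
}
```

Data generated this way (`x`, `y`) have the structure the function is designed for: proportions, means, and variances all vary with the observation, but only through the one-dimensional index `z`.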

Usage

semimrFull(x, y, h = NULL, coef = NULL, ini = NULL, grid = NULL, maxiter = 100)

Value

A list containing the following elements:

pi: matrix of estimated mixing proportions.
mu: estimated component means.
var: estimated component variances.
coef: estimated regression coefficients.
run: total number of iterations performed until convergence.

Arguments

x: an n by p matrix of observations where n is the number of observations and p is the number of explanatory variables.
y: an n-dimensional vector of response values.
h: bandwidth for the kernel regression. Default is NULL, and the bandwidth is computed in the function by cross-validation.
coef: initial value of $\boldsymbol{\alpha}^{\top}$ in the model, which plays the role of the regression coefficients in a regression model. Default is NULL, and the value is computed in the function by sliced inverse regression (Li, 1991).
ini: initial values for the parameters. Default is NULL, in which case the initial values are obtained under the assumption of a linear mixture model. If specified, it should be a list of the form list(pi, mu, var), where pi is a vector of mixing proportions, mu is a vector of component means, and var is a vector of component variances.
grid: grid points at which nonparametric functions are estimated. Default is NULL, which uses the estimated mixing proportions, component means, and component variances as the grid points after the algorithm converges.
maxiter: maximum number of iterations. Default is 100.
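
The role of the bandwidth `h` can be illustrated with a generic kernel smoother on a scalar index. The sketch below is a plain Nadaraya-Watson regression with leave-one-out cross-validation in base R; it is an assumption-laden illustration of the idea, not the cross-validation criterion that semimrFull uses internally, and the helpers `nw_fit` and `loo_cv` are hypothetical names.

```r
# Gaussian-kernel (Nadaraya-Watson) smoother of y on a scalar index z,
# evaluated at the points z0 with bandwidth h
nw_fit <- function(z, y, h, z0 = z) {
  sapply(z0, function(u) {
    w <- dnorm((z - u) / h)              # kernel weights around u
    sum(w * y) / sum(w)                  # locally weighted average
  })
}

# Leave-one-out cross-validation score (mean squared prediction error)
loo_cv <- function(z, y, h) {
  pred <- sapply(seq_along(z), function(i) nw_fit(z[-i], y[-i], h, z[i]))
  mean((y - pred)^2)
}

# Toy data: a smooth function of the index plus noise (illustrative only)
set.seed(2)
z <- runif(100, -2, 2)
y <- sin(z) + rnorm(100, sd = 0.2)

# Pick the bandwidth with the smallest score over a small candidate grid
h_grid <- c(0.05, 0.1, 0.2, 0.4, 0.8, 1.6)
cv <- sapply(h_grid, function(h) loo_cv(z, y, h))
h_cv <- h_grid[which.min(cv)]
```

A small bandwidth tracks the data closely but is noisy, while a large one oversmooths; supplying `h` directly, as the example in this page does, skips the data-driven search.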

References

Xiang, S. and Yao, W. (2020). Semiparametric mixtures of regressions with single-index for model based clustering. Advances in Data Analysis and Classification, 14(2), 261-292.

Li, K. C. (1991). Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414), 316-327.

Examples


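library(MixSemiRob)  # assumed package providing semimrFull, sinvreg, and NBA

# Predictors (columns 1, 2, 4) and response (column 3) from the NBA data
xx = NBA[, c(1, 2, 4)]
yy = NBA[, 3]
# Scale each predictor and the response by its standard deviation
x = xx/t(matrix(rep(sqrt(diag(var(xx))), length(yy)), nrow = 3))
y = yy/sd(yy)
# Initial index direction from sliced inverse regression
ini_bs = sinvreg(x, y)
ini_b = ini_bs$direction[, 1]
# Fit on the first 50 observations with a fixed bandwidth
est = semimrFull(x[1:50, ], y[1:50], h = 0.3442, coef = ini_b)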