The article for this function is in preparation for publication. Please be patient.
A two-stage subsampling algorithm for big data under linear regression with potential model misspecification.
The first stage obtains a random sample of size \(r_0\) and estimates the model parameters.
Using the estimated parameters, subsampling probabilities are evaluated for the A-, L- and L1-optimality criteria,
and for the RLmAMSE and enhanced RLmAMSE (log-odds and power) subsampling methods.
Using these subsampling probabilities, a second sample of size \(r \ge r_0\) is obtained.
Finally, the two samples are combined and the model parameters are estimated under each of the A-, L- and L1-optimality
criteria and the RLmAMSE and enhanced RLmAMSE (log-odds and power) methods.
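The two-stage flow described above can be sketched in a few lines. The following is a minimal illustration in Python, not the function's own code: it handles a single covariate with an intercept, uses only probabilities of the L-optimality form (absolute pilot residual times the norm of the covariate vector), and combines the two samples by weighting each point with the inverse of its sampling probability. The names wls_line and two_stage_subsample, the simulated data, and the combination rule are assumptions made for this example; the RLmAMSE variants are not covered.

```python
import math
import random


def wls_line(xs, ys, ws):
    """Weighted least squares fit of y = b0 + b1 * x via the 2x2 normal equations."""
    sw = sum(ws)
    sx = sum(w * x for w, x in zip(ws, xs))
    sy = sum(w * y for w, y in zip(ws, ys))
    sxx = sum(w * x * x for w, x in zip(ws, xs))
    sxy = sum(w * x * y for w, x, y in zip(ws, xs, ys))
    det = sw * sxx - sx * sx
    b1 = (sw * sxy - sx * sy) / det
    b0 = (sy - b1 * sx) / sw
    return b0, b1


def two_stage_subsample(x, y, r0, r, rng):
    """Hypothetical sketch of the two-stage procedure for one covariate."""
    if r < r0:
        raise ValueError("r must satisfy r >= r0")
    n = len(x)

    # Stage 1: uniform random sample of size r0 and a pilot estimate.
    idx0 = [rng.randrange(n) for _ in range(r0)]
    p0, p1 = wls_line([x[i] for i in idx0], [y[i] for i in idx0], [1.0] * r0)

    # Subsampling probabilities (assumed L-optimality form):
    # proportional to |pilot residual| * ||(1, x_i)||.
    scores = [abs(y[i] - p0 - p1 * x[i]) * math.hypot(1.0, x[i]) for i in range(n)]
    total = sum(scores)
    probs = [s / total for s in scores]

    # Stage 2: draw r points with replacement using these probabilities.
    idx1 = rng.choices(range(n), weights=probs, k=r)

    # Combine both samples; each point is weighted by 1 / (its sampling probability).
    # Stage-1 points were drawn uniformly, so their probability is 1 / n.
    idx = idx0 + idx1
    ws = [float(n)] * r0 + [1.0 / probs[i] for i in idx1]
    return wls_line([x[i] for i in idx], [y[i] for i in idx], ws)


# Simulated data for illustration only: y = 1 + 2x + noise.
rng = random.Random(1)
N = 5000
x = [rng.uniform(-2.0, 2.0) for _ in range(N)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
b0_hat, b1_hat = two_stage_subsample(x, y, r0=200, r=600, rng=rng)
```

On this simulated data, b0_hat and b1_hat should land near the intercept 1 and slope 2 used to generate it. Swapping the scores line for another criterion changes the method while leaving the rest of the flow unchanged.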
NOTE: If the input parameters do not satisfy their domain conditions,
the relevant error messages are produced so that the inputs can be corrected.
If \(r \ge r_0\) is not satisfied then an error message will be produced.
If the big data \(X,Y\) have any missing values then an error message will be produced.
The big data size \(N\) is compared with the sizes of \(X\), \(Y\) and F_estimate_Full;
if they do not match, an error message will be produced.
If the scaling factor does not satisfy \(\alpha > 1\), an error message will be produced.
If proportion is not in the interval \((0,1]\), an error message will be produced.
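The checks listed in this note can be summarised as a short validation routine. The sketch below is written in Python purely for illustration and is not the function's own code; the helper name check_inputs, the argument names, the representation of \(X\) as a list of rows, and the wording of the messages are all assumptions.

```python
import math


def _is_missing(value):
    """Treat None and NaN as missing values."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def check_inputs(r0, r, X, Y, N, f_estimate_full, alpha, proportion):
    """Hypothetical helper mirroring the documented checks; raises ValueError on failure."""
    if r < r0:
        raise ValueError("r >= r0 is not satisfied")
    if any(_is_missing(v) for row in X for v in row) or any(_is_missing(v) for v in Y):
        raise ValueError("X or Y has missing values")
    if not (len(X) == N and len(Y) == N and len(f_estimate_full) == N):
        raise ValueError("N does not match the sizes of X, Y and F_estimate_Full")
    if not alpha > 1:
        raise ValueError("the scaling factor alpha must be greater than 1")
    if not 0 < proportion <= 1:
        raise ValueError("proportion must lie in (0, 1]")


# Example call with inputs that pass every check.
check_inputs(r0=2, r=3, X=[[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]], Y=[0.1, 0.2, 0.3],
             N=3, f_estimate_full=[0.1, 0.2, 0.3], alpha=2.0, proportion=1.0)
```

Raising on the first failed condition keeps each message tied to a single input, which matches the one-message-per-condition wording of the note.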
model is a formula input built from the covariates through spline terms (s()),
squared terms (I()), interaction terms (lo()) or automatically. If model is empty, NA,
NaN or not one of the defined inputs, an error message is printed. By default,
model="Auto", which is the main effects model with the spline terms.