splitratio() finds the optimal splitting ratio by assuming that a polynomial regression model with interactions can approximate the true model. The number of parameters in the model is estimated from the full data using stepwise regression. A simpler alternative is to set the number of parameters to the square root of the number of unique rows in the input matrix of the dataset. See Joseph (2022) for details.
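The simpler rule can be sketched in a few lines of R. This is an illustrative sketch only, not the package's source: the helper name simple_ratio_sketch is hypothetical, and the conversion from the number of parameters p to a testing fraction assumes the sqrt(p):1 training-to-testing rule described in Joseph (2022).

```r
# Hypothetical helper (not part of the package): sketches the "simple" rule.
simple_ratio_sketch <- function(x) {
  x <- as.matrix(x)
  n_unique <- nrow(unique(x))  # number of distinct rows in the input matrix
  p <- sqrt(n_unique)          # number of parameters under the simple rule
  # Assumption: a training:testing ratio of sqrt(p):1, so the testing
  # fraction is 1 / (sqrt(p) + 1).
  1 / (sqrt(p) + 1)
}

# A 4 x 4 grid has 16 unique rows, so p = 4 and the fraction is 1 / (2 + 1).
x_grid <- cbind(rep(1:4, each = 4), rep(1:4, times = 4))
ratio_grid <- simple_ratio_sketch(x_grid)

# Duplicated rows do not change the result, since only unique rows count.
ratio_dup <- simple_ratio_sketch(rbind(x_grid, x_grid))
```

Under these assumptions both calls give 1/3, that is, one third of the data would be held out for testing.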
Usage
splitratio(x, y, method = "simple", degree = 2)
Arguments
x
Input matrix
y
Response (output variable)
method
This can be "simple" or "regression". The default method, "simple", uses the square root of the number of unique rows in x as the number of parameters, whereas "regression" estimates the number of parameters using stepwise regression. The "regression" method works only with a continuous output variable.
degree
This specifies the degree of the polynomial to be fitted, which is needed only if method = "regression" is used. Default is 2.
Value
Splitting ratio, which is the fraction of the dataset to be used for testing.
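Because the returned value is a testing fraction, it can be multiplied by the number of rows to size the test set. The following is a minimal sketch on simulated data: the data, the fallback value 0.25, and the use of plain random sampling are all assumptions made for illustration and are not prescribed by this help page.

```r
set.seed(1)
n <- 100
x <- matrix(runif(2 * n), ncol = 2)           # simulated inputs
y <- x[, 1] + x[, 2]^2 + rnorm(n, sd = 0.1)   # simulated continuous response

# Use splitratio() when the SPlit package is installed; otherwise fall back
# to a placeholder fraction (0.25 is illustrative only).
gamma <- if (requireNamespace("SPlit", quietly = TRUE)) {
  SPlit::splitratio(x, y, method = "simple")
} else {
  0.25
}

n_test <- round(gamma * n)            # number of rows held out for testing
test_idx <- sample(n, n_test)         # random split, for illustration only

x_test  <- x[test_idx, , drop = FALSE]
y_test  <- y[test_idx]
x_train <- x[-test_idx, , drop = FALSE]
y_train <- y[-test_idx]
```

The remaining n - n_test rows form the training set.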
References
Joseph, V. R. (2022). Optimal Ratio for Data Splitting. Statistical Analysis & Data Mining: The ASA Data Science Journal, to appear.