which_poly: Find optimal polynomial model

Description

which_poly fits polynomial regressions of every degree from 0 (a constant) up to mmax (6 by default) to the data provided. It returns the variance of the residuals for each degree and, by default, plots that variance against the degree, to help suggest the best polynomial degree for the regression. The regression coefficients can then be calculated with the function polysolveLS.

Usage

which_poly(pts, mmax = 6, plt = TRUE, tol = NULL)

Value

A data frame with two columns. The first, named m, contains the degrees of all polynomials tested. The second, named sige, contains the variance of the residuals for each value of m. By default, the function also displays a plot of sige vs m.

Arguments

pts: An $n \times 2$ matrix or data frame where each row contains the coordinates of a data point used for regression.
mmax: An integer. The highest degree of the polynomial to be used to calculate the variance of the residuals. The default value is 6.
plt: A logical value controlling whether the plot of the variance vs the polynomial degree is displayed. The default is plt=TRUE.
tol: A real number. The solution of a linear system can be compromised when the condition number of the matrix of coefficients is particularly high (ill-conditioned matrices). tol is a threshold on the reciprocal of the condition number. When that reciprocal falls below tol, ill-conditioning is deemed severe enough that an accurate solution cannot be guaranteed, and the function stops execution with an error message. The solution can in fact still be accurate despite ill-conditioning, and the user can force its calculation by supplying a value of tol smaller than 1e-17. The default is NULL, which corresponds to tol=1e-17.
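
To get a feel for the magnitudes tol is compared against, the reciprocal condition number of a normal-equations matrix can be estimated with base R's rcond(). The following is only a sketch: it assumes a plain Vandermonde design matrix and does not reproduce the check that which_poly performs internally, which may differ.

```r
# Illustrative only: not the package's own conditioning check.
x <- seq(-2, 5, length.out = 21)

rc <- sapply(0:10, function(m) {
  V <- outer(x, 0:m, "^")   # design matrix with columns x^0, ..., x^m
  A <- crossprod(V)         # normal-equations matrix t(V) %*% V
  rcond(A)                  # estimate of the reciprocal condition number
})

print(data.frame(m = 0:10, rcond = rc))
```

The estimate shrinks quickly as the degree grows, which is why higher values of mmax are more likely to need a smaller tol.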

Details

The ability of a polynomial regression to account for most of the data variability, without fitting data noise, is reflected in how the variance of the residuals,

$$ \sigma_e^2 = \frac{1}{n-m-1} \sum_{i=1}^n \epsilon_i^2, $$

drops as the degree of the polynomial used for the regression increases. Here $\epsilon_i$ are the residuals, $n$ is the number of data points and $m$ is the degree of the polynomial. A sudden drop, followed by values that decrease slowly or alternate between slight increases and decreases, indicates that the degree at which the sudden drop occurs belongs to the polynomial that models most of the data variability while neglecting data noise. As polynomial regression is normally used with polynomials of degree up to 4 or 5, a default set of polynomials up to degree 6 is tried here. Degrees higher than 6 can be forced by the user, but the risk with higher degrees is that the system of normal equations associated with the regression becomes severely ill-conditioned. In such situations the user should change the tolerance (tol) to a value smaller than the default 1e-17.
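
The variance formula above can be reproduced with base R alone. This is a minimal sketch, not the package's implementation: resid_variance is a hypothetical helper, and it uses lm() with poly() in place of whatever solver which_poly relies on.

```r
# Hypothetical helper, not part of the package: residual variance per degree.
resid_variance <- function(x, y, mmax = 6) {
  n <- length(x)
  sige <- numeric(mmax + 1)
  for (m in 0:mmax) {
    # Degree 0 is the constant model; higher degrees use poly()
    fit <- if (m == 0) lm(y ~ 1) else lm(y ~ poly(x, degree = m))
    # Sum of squared residuals divided by n - m - 1
    sige[m + 1] <- sum(residuals(fit)^2) / (n - m - 1)
  }
  data.frame(m = 0:mmax, sige = sige)
}

# Noisy data around the quadratic x^2 - 5*x + 6
x <- seq(-2, 5, length.out = 21)
set.seed(7766)
y <- x^2 - 5 * x + 6 + rnorm(21, mean = 0, sd = 0.5)

res <- resid_variance(x, y)
print(res)
```

For data like these, generated from a quadratic, the sudden drop described above should appear at m = 2, with little further change for higher degrees.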

Examples


# 21 points close to the quadratic x^2 - 5*x + 6
x <- seq(-2,5,length.out=21)
set.seed(7766)
eps <- rnorm(21,mean=0,sd=0.5)
y <- x^2-5*x+6+eps

# Data frame
pts <- data.frame(x=x,y=y)

# Try function without plot
ddd <- which_poly(pts,plt=FALSE)
print(ddd)

# Try function with plot and extending
# highest polynomials' degree to 10
ddd <- which_poly(pts,mmax=10)
