Learn R Programming

berryFunctions (version 1.4)

mReg: multiple regression

Description

Multiple regression fitting various function types including e.g. linear, cubic, logarithmic, exponential, power, reciprocal. Quick way to find out what function type fits the data best. Plots data and fitted functions and adds a legend with the functions (or their types=structure) sorted by R squared. Returns the fitted functions with their parameters and R^2 values in a data.frame.

Usage

mReg(x, y, Poly45=FALSE, exp_4=FALSE,
xf=deparse(substitute(x)),  yf=deparse(substitute(y)), ncolumns=9, plot=TRUE,
add=FALSE,  nbest=12, R2min, selection=NULL, digits=2, extend=0.4,
xlim=range(x, finite=TRUE) + c(-1,1)* extend * diff(range(x,  finite=TRUE)),
ylim=range(y, finite=TRUE) + c(-1,1)* extend * diff(range(y, finite=TRUE)),
xlab=xf,  ylab=yf, las=1, lwd=rep(1, 12), lty=rep(1, 12), col=NULL,
pcol=par("col"), pch=16, legend=TRUE, legargs=NULL, legendform="nameform", ...)

Arguments

x
Numerical vector. x coordinates of points for regression
y
Y-values
Poly45
Logical. Should 4th and 5th degree polynomials also be fitted? DEFAULT: FALSE, as the formulas are very long.
exp_4
Logical. Return 4-parametric exponential distibution fits? (only best fit is plotted). exp_4par_ini has the initial values of exponential fitting with the data relocated to first quadrant. The others are optimized with the methods of
xf
Character. x name for Formula. DEFAULT: substitute(x) before replacing zeros in x and y
yf
Ditto for y
ncolumns
Number of columns in output. Set lower to avoid overcrowding the console. DEFAULT: 9
plot
Logical. plot data and fitted functions? DEFAULT: TRUE
add
Logical. add lines to existing plot? DEFAULT: FALSE
nbest
Integer. Number of best fitting functions to be plotted (console output table always has all). DEFAULT: 12
R2min
Numerical. Minimum Rsquared value for function type to be plotted. Suggestion: 0.6 (2/3 of variation of y is explained by function of x). DEFAULT: empty
selection
Integers of functions to be plotted, assigned as in list above. DEFAULT: NULL, meaning all
digits
Integer. number of significant digits used for rounding formula parameters and R^2 displayed. DEFAULT: 2
extend
Numerical. Extention of axis ranges (proportion of range). DEFAULT: 0.4
xlim
Numerical vector with two values, defining the x-range of the lines to be plotted. DEFAULT: extended range(x)
ylim
Ditto for Y-axis
xlab
Character. default labels for axis labeling and for formulas. DEFAULT: substitute(x) before replacing zeros in x and y
ylab
Ditto for y axis.
las
Integer in 0:4. label axis style. See par. DEFAULT: 1
lwd
Numerical of length 12. line width for lines. DEFAULT: rep(1,12)
lty
Numerical of length 12. line type. DEFAULT: rep(1,12)
col
Numerical of length 12. line colors. DEFAULT: NULL, means they are specified internally
pcol
Numerical. Color used for the data-points themselves. DEFAULT: par('col')
pch
Integer or single character. Point CHaracter for the data points. See par. DEFAULT: 16
legend
Logical. add legend to plot? DEFAULT: TRUE
legargs
List. Arguments passed to legend. Will overwrite defaults. DEFAULT: NULL
legendform
One of 'full', 'form', 'nameform' or 'name'. Complexity (and length) of legend in plot. See Details. DEFAULT: 'nameform'
...
Further graphical parameters passed to plot

Value

  • data.frame with rounded R squared, formulas, and full R^2 and parameters for further use. Rownames are the names (types) of function. Sorted decreasingly by R^2

warning

A well fitting function does NOT imply correct causation! A good fit does NOT mean that you describe the bahaviour of a system adequatly! Extrapolation can be DANGEROUS! Always extrapolate to see if a function fits the expected results there as well. Avoid overfitting: Poly45 will often yield good results (in terms of R^2), but can be way overfitted. And outside the range of values, they act wildly.

Details

legendform : example full : 7.8*x + 6.31 form : a*x+b nameform : linear a*x+b name : linear full can be quite long, especially with Poly45=TRUE!

References

Listed here: http://rclickhandbuch.wordpress.com/berryfunctions/#mReg

See Also

glm, lm, optim

Examples

Run this code
set.seed(12)
x <- c(runif(100,0,3), runif(200, 3, 25)) # random from uniform distribution
y <- 12.367*log10(x)+7.603+rnorm(300)     # random from normal distribution
plot(x,y, xlim=c(0,40))
mReg(x,y) # Warnings come from negative y-values

# Passing arguments to legend:
mReg(x,y, pch=1, legargs=list(x="bottomright", cex=0.7), legendform="form")

mReg(x,y, extend=0.2) # less empty space around data points
mReg(x,y, nbest=4) # only 4 distributions plotted
mReg(x,y, legargs=list(x=7, y=8, bty="o")) # Legend position as coordinates

View(mReg(x,y, Poly45=TRUE, exp_4=TRUE, plot=FALSE)) # exp_4: fit more distributions
# optim methods often yield different results, so be careful using this.
# I might insert a possibility to specify initial values for optim.
# 4 Parameters allow several combinations to yield similarly good results!
plot( 0:10, 3.5*exp(0.8*( 0:10 + 2      )) + 15 , type="l")
lines(0:10,  18*exp(0.8*( 0:10 - 2.5e-05)) - 5, col=2)


# okay, different dataset:
x <- c(1.3, 1.6, 2.1, 2.9, 4.4, 5.7, 6.6, 8.3, 8.6, 9.5)
y <- c(8.6, 7.9, 6.6, 5.6, 4.3, 3.7, 3.2, 2.5, 2.5, 2.2)
mReg(x,y, legargs=list(cex=0.7, x="topright"), main="dangers of extrapolation")
points(x,y, cex=2, lwd=2)
# Polynomial fits are good within the data range, but, in this case obviously,
# be really careful extrapolating! If you know that further data will also be low,
# add another point to test differences:
mReg(c(x,11,13,15), c(y,2,2,2), xf="myX", yf="myY", Poly45=TRUE, legendform="name")
points(x,y, cex=2, lwd=2)
# The Polynomials are still very good: they have 5 to 6 Parameters, after all!
# Poly45 is set to FALSE by default to avoid such overfitting.
 
mReg(x,y, pcol=8, ncol=0) # no return to console

# only plot a subset: best n fits, minimum fit quality, or user selection
mReg(x,y, pcol=8, ncol=2, nbest=4)
mReg(x,y, pcol=8, ncol=2, R2min=0.7)
mReg(x,y, pcol=8, ncol=2, selection=c(2,5,8))
# selecting the fifth degree polynomial activates Poly45 (in the output table)

# Add to axisting plot:
plot(x,y, xlim=c(0,40))
mReg(x,y, add=TRUE, lwd=12:1/2, ncol=0)
# lwd, lty can be vectors of length 12, specifying each line separately.
# Give those in fix order (see section notes), not in best-fit order of the legend.
# The order is Polynomial(1:5), log, exp, power, reciprocal, rational, exp_4_param
# color has to be a vector of 12
# opposedly, lwd and lty are repeated 12 times, if only one value is given


# One more dataset:
j <- c(5,8,10,9,13,6,2) ; k <- c(567,543,587,601,596,533,512)
# Inset from margin of plot region:
mReg(j,k, legargs=list(x="bottomright", inset=.05, bty="o"), legendform="name")
# Legend forms
mReg(j,k, legargs=list(x="bottomright"), legendform="name")
mReg(j,k, legargs=list(x="bottomright"), legendform="form")
mReg(j,k, legargs=list(x="bottomright"), legendform="nameform")
mReg(j,k, legargs=list(x="bottomright"), legendform="full")


# The question that got me started on this whole function...
# exponential decline of temperature of a mug of hot chocolate
tfile <- system.file("extdata/Temp.txt", package="berryFunctions")
temp <- read.table(tfile, header=TRUE, dec=",")
head(temp)
plot(temp)
temp <- temp[-20,] # missing value - rmse would complain about it

x <- temp$Minuten
y <- temp$Temp 
mReg(x,y, exp_4=TRUE, selection=11)
# y=49*e^(-0.031*(x - 0  )) + 25 correct, judged from the model:
# Temp=T0 - Te *exp(k*t) + Te     with    T0=73.76,  Tend=26.21, k=-0.031
# optmethod="Nelder-Mead"  # y=52*e^(-0.031*(x + 3.4)) + 26 wrong


x <- seq(1, 1000, 1)
y <- (x+22)/(x+123) # can't find an analytical solution so far. Want to check out nls
mReg(x, y, posx="right")


# Solitaire Results. According to en.wikipedia.org/wiki/Klondike_(solitaire):
# Points=700000/Time + Score
# I recorded my results on a windows XP system in 2010 as an excuse to play this game a lot.
# Games longer than 100 seconds are intentionally bad to find what happens then...
Time <- c(81, 89, 101, 81, 85, 86, 78, 104, 120, 81, 76, 79, 92, 81, 107, 108,
73, 79, 77, 85, 141, 88, 92, 65, 96, 99, 93, 73, 91, 83, 96, 91, 93, 85, 70, 113,
113, 102, 84, 123, 89, 73, 90, 97, 82, 124, 101, 85, 78, 96, 85, 84, 83, 84, 153,
92, 88, 79, 112, 81, 74, 72, 99, 97, 108, 98, 60, 111, 175, 71, 76, 68, 195, 297,
73, 109, 69, 69, 461, 75, 84, 81, 65, 66, 120, 88, 92, 347, 408, 103, 63, 272,
227, 341, 250, 74, 71, 85, 120, 84, 325, 496, 120, 110)

Points <- c(9189, 8324, 7250, 9089, 8685, 8569, 9516, 7126, 6139, 9124, 9786,
9421, 8033, 9059, 6973, 6917, 10111, 9451, 9526, 8814, 5225, 8529, 8067, 11423,
7755, 7547, 7992, 10106, 8168, 8964, 7657, 8252, 8067, 8719, 10551, 6521, 6546,
7350, 8904, 6019, 8304, 10111, 8362, 7787, 9094, 6105, 7325, 8809, 9526, 7852,
8699, 8853, 8969, 8900, 4790, 8027, 8314, 9451, 6716, 9194, 9806, 10167, 7417,
7689, 6850, 7512, 12203, 6854, 4451, 10426, 9764, 10738, 4152, 2712, 10116, 6780,
10693, 10563, 1918, 9881, 8799, 9074, 11403, 11193, 6226, 8429, 8087, 2383, 2150,
7170, 11753, 2896, 3639, 2313, 2880, 9991, 10376, 8709, 6156, 8794, 2451, 1753, 
6178, 6678)

mReg(Time, Points) # and yes, reciprocal ranks highest! Play Fast!

Run the code above in your browser using DataLab