bfsl: Calculates the Best-fit Straight Line

Description

bfsl calculates the best-fit straight line to independent points with (possibly correlated) normally distributed errors in both coordinates.

Usage

bfsl(...)
# S3 method for default
bfsl(x, y = NULL, sd_x = 0, sd_y = 1, r = 0, control = bfsl_control(), ...)
# S3 method for formula
bfsl(
  formula,
  data = parent.frame(),
  sd_x,
  sd_y,
  r = 0,
  control = bfsl_control(),
  ...
)

Arguments

...

Further arguments passed to or from other methods.

x

A vector of x observations or a data frame (or an object coercible by as.data.frame to a data frame) containing the named vectors x, y, and optionally sd_x, sd_y and r. If weights w_x and w_y are given, then sd_x and sd_y are calculated from sd_x = 1/sqrt(w_x) and sd_y = 1/sqrt(w_y). Specifying y, sd_x, sd_y or r directly as function arguments overrides these variables in the data structure.

y

A vector of y observations.

sd_x

A vector of x measurement error standard deviations. If it is of length one, all data points are assumed to have the same x standard deviation.

sd_y

A vector of y measurement error standard deviations. If it is of length one, all data points are assumed to have the same y standard deviation.

r

A vector of correlation coefficients between errors in x and y. If it is of length one, all data points are assumed to have the same correlation coefficient.

control

A list of control settings. See bfsl_control for the names of the settable control values and their effect.

formula

A formula specifying the bivariate model (as in lm, but here only y ~ x makes sense).

data

A data.frame containing the variables of the model.

Value

An object of class "bfsl", which is a list containing the following components:

coefficients

A 2x2 matrix whose columns hold the fitted coefficients (intercept and slope) and their standard errors.

chisq

The goodness of fit (see Details).

fitted.values

The fitted mean values.

residuals

The residuals, that is y observations minus fitted values.

df.residual

The residual degrees of freedom.

cov.ab

The covariance of the slope and intercept.

control

The control list used, see the control argument.

convInfo

A list with convergence information.

call

The matched call.

data

A list containing x, y, sd_x, sd_y and r.
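The components listed above can be read from the returned list with `$`. The following is a minimal sketch, assuming the bfsl package is installed and that its bundled pearson_york_data set is reachable as bfsl::pearson_york_data; the guard skips the code otherwise.

```r
# Assumption: the bfsl package is installed; skip quietly if it is not.
if (requireNamespace("bfsl", quietly = TRUE)) {
  fit <- bfsl::bfsl(bfsl::pearson_york_data)

  print(fit$coefficients)  # 2x2 matrix: estimates and their standard errors
  print(fit$chisq)         # goodness of fit
  print(fit$df.residual)   # residual degrees of freedom
  print(fit$cov.ab)        # covariance of slope and intercept
}
```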

Details

bfsl provides the general least-squares estimation solution to the problem of fitting a straight line to independent data with (possibly correlated) normally distributed errors in both x and y.

With sd_x = 0 the (weighted) ordinary least squares solution is obtained. The calculated standard errors of the slope and intercept, multiplied by sqrt(chisq), correspond to the ordinary least squares standard errors.
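The relation between the two kinds of standard errors can be illustrated with base R alone. This sketch does not call bfsl; it uses simulated data and assumes that chisq is the weighted residual sum of squares divided by the residual degrees of freedom.

```r
# Base R only: weighted least squares with known y standard deviations (sd_x = 0).
set.seed(1)
n    <- 20
x    <- seq(1, 10, length.out = n)
sd_y <- runif(n, 0.5, 1.5)               # simulated measurement errors
y    <- 2 + 0.5 * x + rnorm(n, sd = sd_y)

w   <- 1 / sd_y^2                         # weights from the standard deviations
fit <- lm(y ~ x, weights = w)

# Standard errors when sd_y is taken at face value (no rescaling)
X        <- cbind(1, x)
se_known <- sqrt(diag(solve(t(X) %*% (w * X))))

# Reduced chi-squared: weighted residual sum of squares per degree of freedom
chisq <- sum(w * residuals(fit)^2) / df.residual(fit)

# lm() rescales by the residual variance, so its standard errors
# equal se_known * sqrt(chisq)
se_lm <- unname(summary(fit)$coefficients[, "Std. Error"])
print(all.equal(se_lm, unname(se_known * sqrt(chisq))))
```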

With sd_x = c, sd_y = d, where c and d are positive numbers, and r = 0 the Deming regression solution is obtained. If additionally c = d, the orthogonal distance regression solution, also known as major axis regression, is obtained.
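For constant, uncorrelated errors the Deming slope has a well-known closed form. The sketch below is base R on simulated data and is not the package's implementation; deming_fit is a hypothetical helper introduced here for illustration. With equal error standard deviations, its slope is cross-checked against the first principal component, which defines the major axis.

```r
# Hypothetical helper: closed-form Deming fit for constant error
# standard deviations sd_x (in x) and sd_y (in y), with r = 0.
deming_fit <- function(x, y, sd_x = 1, sd_y = 1) {
  delta <- (sd_y / sd_x)^2               # ratio of y to x error variances
  s_xx  <- var(x)
  s_yy  <- var(y)
  s_xy  <- cov(x, y)
  slope <- (s_yy - delta * s_xx +
            sqrt((s_yy - delta * s_xx)^2 + 4 * delta * s_xy^2)) / (2 * s_xy)
  c(intercept = mean(y) - slope * mean(x), slope = slope)
}

set.seed(2)
x <- rnorm(50, mean = 5, sd = 2)
y <- 1 + 0.8 * x + rnorm(50, sd = 0.5)

# Equal error standard deviations: orthogonal distance regression
ortho <- deming_fit(x, y, sd_x = 1, sd_y = 1)

# Cross-check: the major axis is the direction of the first principal component
pc1       <- prcomp(cbind(x, y))$rotation[, 1]
pca_slope <- unname(pc1["y"] / pc1["x"])
print(all.equal(unname(ortho["slope"]), pca_slope))
```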

Setting sd_x = sd(x), sd_y = sd(y) and r = 0 leads to the geometric mean regression solution, also known as reduced major axis regression or standardised major axis regression.
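The name "geometric mean regression" comes from the slope being the geometric mean of the y-on-x slope and the reciprocal of the x-on-y slope, which reduces to sd(y)/sd(x) with the sign of the correlation. A small base R check on simulated data (not a call to bfsl) illustrates this:

```r
set.seed(3)
x <- rnorm(40, mean = 10, sd = 3)
y <- 4 - 0.6 * x + rnorm(40, sd = 1)

# Reduced major axis slope: ratio of standard deviations, signed by the correlation
rma_slope     <- sign(cor(x, y)) * sd(y) / sd(x)
rma_intercept <- mean(y) - rma_slope * mean(x)

# Geometric mean of the y-on-x slope and the reciprocal of the x-on-y slope
b_yx     <- unname(coef(lm(y ~ x))[2])
b_xy     <- unname(coef(lm(x ~ y))[2])
gm_slope <- sign(b_yx) * sqrt(b_yx / b_xy)

print(all.equal(rma_slope, gm_slope))
```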

The goodness of fit metric chisq is a weighted reduced chi-squared statistic. It compares the deviations of the points from the fit line to the assigned measurement error standard deviations. If x and y are indeed related by a straight line, and if the assigned measurement errors are correct (and normally distributed), then chisq is expected to be close to 1. A chisq > 1 indicates underfitting: the fit does not fully capture the data or the measurement errors have been underestimated. A chisq < 1 indicates overfitting: either the model is improperly fitting noise, or the measurement errors have been overestimated.
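How the statistic reacts to misjudged measurement errors can be seen in a short simulation. This is a base R sketch for the sd_x = 0 case, not a call to bfsl; reduced_chisq is a hypothetical helper that assumes the statistic is the weighted residual sum of squares divided by the residual degrees of freedom.

```r
set.seed(4)
n       <- 200
x       <- seq(0, 10, length.out = n)
sd_true <- 0.3
y       <- 1 + 2 * x + rnorm(n, sd = sd_true)

# Hypothetical helper: reduced chi-squared for an assumed constant y error
reduced_chisq <- function(sd_assumed) {
  w   <- rep(1 / sd_assumed^2, n)
  fit <- lm(y ~ x, weights = w)
  sum(w * residuals(fit)^2) / df.residual(fit)
}

chisq_correct <- reduced_chisq(sd_true)      # errors correct: close to 1
chisq_under   <- reduced_chisq(sd_true / 2)  # errors underestimated: 4 times larger
chisq_over    <- reduced_chisq(sd_true * 2)  # errors overestimated: 4 times smaller

print(c(correct = chisq_correct, under = chisq_under, over = chisq_over))
```

Halving the assumed standard deviation leaves the fitted line unchanged but quadruples the statistic, which is why a large chisq cannot by itself distinguish a poor model from underestimated errors.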

References

York, D. (1968). Least squares fitting of a straight line with correlated errors. Earth and Planetary Science Letters, 5, 320–324. https://doi.org/10.1016/S0012-821X(68)80059-7

Examples

# NOT RUN {
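# Pearson's data with York's weights; convert weights to standard deviations
x = pearson_york_data$x
y = pearson_york_data$y
sd_x = 1/sqrt(pearson_york_data$w_x)
sd_y = 1/sqrt(pearson_york_data$w_y)

# Default method (vectors) and formula method
bfsl(x, y, sd_x, sd_y)
bfsl(y~x, pearson_york_data, sd_x, sd_y)

# Data frame input: sd_x and sd_y are derived from w_x and w_y
fit = bfsl(pearson_york_data)
plot(fit)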

# }
