Usage
phyreg(control, test, data, subset, phydata, taxmatrix, heightsdata,
rho = -1, lorho = 0.3, hirho = 0.6, errrho = 0.02, minrho = 1e-04,
tolerance = 1e-06, oppf = 5, opdf = 0, parmx = 0, parmxz = 0,
opfunccall = 0, addDF = 0, linputs = FALSE, sinputs = FALSE, means = FALSE,
lmshortx = FALSE, lmshortxz = FALSE, lmlongx = FALSE, lmlongxz = FALSE,
hinput = FALSE, paper = FALSE, dfwarning = TRUE, oprho = FALSE, plot=FALSE,
reset=FALSE)
Arguments
control
The null hypothesis model, which should be a formula with a response variable. If there are no variables to control for, specify y~1
. (y~0
is not permitted.)
test
The test terms. To control for y~b
, and test for a*x
, you can specify (i) test="a*x"
or (ii) test=.~b+a*x
(Note the LHS *must* be a dot for this "update-style" specification), or (iii) test=~+a*x
. (Note the "+" and the absence of the response variable in this final, "adding terms", form.)
data
A mandatory dataframe containing the variables in the model formulae
subset
a vector of the same length as the data, containing 0/1 for exclusion or inclusion of a species. If unspecified, the internal default is to include all species. To return to being unspecified, say subset=NULL
phydata
a vector with the phylogeny in the internal package format (containing for each non-root node the ID number of its parent). Either phydata
or taxmatrix
must be supplied. To unspecify, say phydata=NULL
. A phylogeny in newick or phylo format can be converted into the internal format, preserving height information, with phyfromnewick
or phyfromphylo
.
taxmatrix
a dataframe with a set of taxonomic vectors to specify the working phylogeny. There should be no species column, and the others should start with low levels (e.g. genus) and end with the highest (perhaps order). If taxmatrix
is specified, you can alternatively specify heightsdata
with a vector containing one height for each level you've given, plus a height for the root, or if that is NULL
the default "Figure 2" heights of Grafen (1989) will be used. Either taxmatrix
or phydata
(-- not both) must be supplied. Despecify with taxmatrix=NULL
.
heightsdata
If not specified, the height of nodes will be calculated using the default "Figure 2" method of Grafen (1989). If a height is specified for each node, then the vector will be one element longer than phydata
, as the root has a height but no parent. If taxmatrix
is specified, then heightsdata
can also be specified as a vector of length one plus the number of taxonomic vectors, in which case each node is given a height corresponding to its level in the taxonomy -- the extra element is the height of the root, and the species are assumed to have a height of zero. A parent node must be higher than its daughter -- zero branch lengths are not allowed, but are equivalent to combining the daughters of the two nodes under a single node.
rho
If zero or negative, rho
will be fitted by maximum likelihood. If positive, that value will taken for rho and no fitting will be done. Once the node heights are scaled to lie between 0 and 1, each height is raised to the power rho before being used to determine the error structure in the model. See Grafen (1989) for details.
lorho
The starting lower bound for the search for rho
hirho
The starting upper bound for the search for rho
errrho
The search for rho will stop when it is known to lie in an interval of length errrho
minrho
The search for rho will stop when the whole current search interval lies below minrho
. A zero value of rho is problematic for the program.
tolerance
This tolerance decides whether there is enough variability among the daughters of a higher node to justify putting it into the short regression. This is because the residuals in the long regression on the control variables are used to calculate a contrast; if they are very small, then the contrast is effectively being determined by machine rounding errors, and if they are zero, then all contrasts should also be zero and so the higher node will be irrelevant in a regression through the origin.
oppf
If non-zero, the p-value and F-statistic are printed out at each call of phyreg
, with ndigits=oppf
. If zero, they are not printed out.
opdf
If non-zero, a breakdown of degrees of freedom is printed out at each call of phyreg
.
parmx
If non-zero, the parameters in the long regression with the control formula are printed out at each call of phyreg
.
parmxz
If non-zero, the parameters in the long regression with the control+test formula are printed out at each call of phyreg
. Note these should be regarded with caution, as the value of rho was fitted just on the control formula. For better estimates, run phyreg
again with the old control+test as the new control, and set opparmx=1
. But opparmxz
will usually give a good rough guide.
plot
A plot is produced based on the output of the short regression, which has one point for each higher node included. The data for this plot are contained after x<-phyreg(...)
as x$details$influence$RCT
and x$details$influence$DR
. The horizontal axis, RCT
, is the residual of the point after fitting the control+test model. The vertical axis, DR
, contains the improvement in that point through including the test terms, measured as the difference between the squares of the residuals. In each case, the residuals are adjusted for degrees of freedom, i.e. divided by the residual degree of freedom of the control or control+test model as appropriate. The variables are then scaled to make them easier to read and interpret. DR
is scaled so it adds to 100, and each point then indicates the percentage of the improvement in SS that is due to a higher node (possibly negative if it fits less well after the test variables are added). RCT
is scaled so the mean square error is 1
, and so Normal errors would give 95% of points within 1.96 of the mean (but remember the regression is through the origin so the average of RCT
does not have to be zero). The significance of the plot is that extreme points on the horizontal axis fit badly after adding the test variables, while those near zero fit well. Points near zero on the vertical axis have been little changed in their contribution to badness-of-fit by adding the test variables, while highly positive points have had a lot of their deviation explained by the test variables.
opfunccall
If non-zero, the function call is printed out
addDF
Advised to leave well alone. Included for backwards compatibility with the original GLIM (Trademark) version. It was originally but misguidedly intended to make a correction to the total degrees of freedom for the test in recognition of the fitting of rho. However, the fitting of rho is now regarded as affecting the numerator and denominator SS equally, and so not to require the correction.
linputs
If TRUE
or 1
, the inputs to the long regression will be stored. After x<-phyreg(...,linputs=1)
, it will be found that x$linputs$y
, x$linputs$designxz
and x$linputs$w
will contain the response, design matrix, and weights for the long regression, respectively. Use with care. The whole point of the the problem of unrecognised phylogeny is that the standard errors from this regression cannot be trusted, though with a dependable binary phylogeny, the output can be accepted at face value.
sinputs
If TRUE
or 1
, the inputs to the short regression will be stored. After x<-phyreg(...,sinputs=1)
, it will be found that x$sinputs$y
, x$sinputs$designxz
and x$sinputs$w
will contain the response, design matrix, and weights for the short regression, respectively. Use with care. Not only can the standard errors from this regression not be trusted, but the parameter estimates are biassed!
means
If TRUE
or 1
, the phylogenetically-weighted means will be printed for the response and for each variable in the design matrix of the control+test model (in the original species dataset, which are the same as the weighted means in the long regression).
lmshortx
If TRUE
or 1
, then after x<-phyreg(...,lmshortx=1)
, the lm.wfit()
output from the short regression with the control formula will be stored in x$lmshortx
. Use with care -- the parameter estimates are biassed...
lmshortxz
If TRUE
or 1
, then after x<-phyreg(...,lmshortxz=1)
, the lm.wfit()
output from the short regression with the control+test formula will be stored in x$lmshortxz
. Use with care -- the parameter estimates are biassed...
lmlongx
If TRUE
or 1
, then after x<-phyreg(...,lmlongx=1)
, the lm.wfit()
output from the long regression with the control formula will be stored in x$lmlongx
. The SEs cannot be trusted for non-binary phylogenies.
lmlongxz
If TRUE
or 1
, then after x<-phyreg(...,lmlongxz=1)
, the lm.wfit()
output from the long regression with the control+test formula will be stored in x$lmlongxz
. The SEs cannot be trusted for non-binary phylogenies.
hinput
If TRUE
or 1
, then after x<-phyreg(...,hinput=1)
, the heights used will be stored in x$hinput
. If you use taxmatrix=...
, this is a way to obtain the internal "phy" version of the heights for each higher node. These heights have not been modulated by rho i.e. they are the starting heights before rho is applied.
paper
If TRUE
or 1
, then after x<-phyreg(...,paper=1)
, various matrices that appear in the appendix of the source paper (Grafen 1989) will be stored under x$paper
. Type names(x$paper)
to see which ones!
dfwarning
If TRUE
or 1
, the loss of degrees of freedom for phylogenetic reasons is detailed. It is a "warning", so if none are lost, nothing is printed, and it doesn't occupy space in your output.
oprho
If nonzero, details of rho will be printed with ndigits=oprho
. If rho was fitted, the value of rho and the log-likelihood will be given, and how the search ended (lower boundary, or an internal maximum).
reset
If TRUE
or 1, the parameters are all reset to the start-of-session values before applying the other parameters you specify in the same call of phyreg
. Useful if you want to start a new project, or just be sure there's no previous setting that is lurking and affecting your results.