Package:  phyreg
Type:     Package
Version:  0.7
Date:     2014-02-08
License:  GPL-2 | GPL-3
This package provides all the basic output for all the regressions we've discussed. As the PGLS and independent contrasts regressions are equivalent, you can find the residual sums of squares, parameter estimates, t-tests for them, etc., in the output from the "long regression". The single contrasts regression is the test provided by the package, and its detailed output is under the "short regression". One element of the PGLS output is not provided by the long regression, namely the fitted values, so the output of phyreg contains them as pglsFVx and pglsFVxz, for the control and control+test models, respectively. From them you can calculate the residuals as the y-variable minus the fitted values. It is important, however, to say that the only hypothesis testing that is justified is the F-ratio provided by phyreg -- the others you can obtain or construct are invalid, and the only use I can see for them is to show quite how wrong they are.
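As a minimal sketch of that residual calculation: the element names pglsFVx and pglsFVxz follow the text above, but the exact call to phyreg and the names y, x, z, mydata and myphylo are illustrative assumptions -- see phyreg for the real interface.

library(phyreg)

## Fit a control model y ~ x with test term z (argument names assumed).
fit <- phyreg(y ~ x, test = "z", data = mydata, phydata = myphylo)

## Residuals are the y-variable minus the fitted values, under the control
## model and under the control+test model respectively.  If any species were
## dropped for missing values, subset mydata$y to the species actually used.
res_control     <- mydata$y - fit$pglsFVx
res_controltest <- mydata$y - fit$pglsFVxz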
If it sounds like I'm saying "I was right 25 years ago, and am still right today", you have understood me exactly, but this is not a matter of mere assertion. The key question is whether the single contrasts I extract are the right ones. I am not at the time of writing aware of any substantial discussion of that point -- perhaps it was too technical at the time. You may like to read sections 3(b) and 3(c) of Grafen (1989), and also Grafen (1992), which explicitly tackles the question of whether other contrasts might work, and adds some further points. For example, if the test is for adding two independent variables X and Z, then we may well desire a test that would give the same answer if we instead proposed to add X-Z and X+Z, as they contain the same information -- contrasts based on test variables cannot do this, while it is automatic under the phylogenetic regression. Unless all those arguments are overturned, to my mind the phylogenetic regression is indeed the (only) right way to do linear models on comparative data. Fortunately, it is now available in R!
There is one unusual feature of the method that I draw attention to here. In all linear models, the logic of each test involves controlling for some model terms while adding some test term(s). In simple cases such as ordinary multiple regression, the same analysis will give lots of tests and the user need only work out what the (often implicit) control and test terms are for each test. For example, in usual regressions, Type I sums of squares test for each variable, controlling for all previous variables, while Type III sums of squares test for each variable, controlling for all the others. With the phylogenetic regression, only one test is performed for a given analysis, and it is necessary to be explicit each time about the control terms and the test terms. This is because the single contrast taken across the daughters of one higher node depends upon the residuals in the control model, and so one analysis must always have the same control variables in all the tests it provides. In principle, then, we could have a single fixed set of control variables, but a number of different sets of test variables -- but currently, only one set of test variables is handled. (A major gain in efficiency can be obtained for this situation on the assumption you have no missing values. Obtain rho for an analysis with a given set of control variables, and then set rho in the arguments of subsequent calls to phyreg with the same control variables. This works because rho is fitted just to the control model, and most of the execution time is spent fitting rho. But the fitted value of rho will also depend on which species are included, hence the caveat about missing values.)
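A minimal sketch of that rho re-use: only the rho argument, and the fact that it is fitted just to the control model, come from the description above; the remaining argument names, the output element holding the fitted rho, and the data objects are illustrative assumptions.

library(phyreg)

## First analysis: rho is estimated here, against the control model y ~ x,
## and this search is where most of the execution time goes.
fit1 <- phyreg(y ~ x, test = "z", data = mydata, phydata = myphylo)

## Second analysis with the SAME control variables and (with no missing
## values) the same species included: supply the already-fitted rho to skip
## the expensive search.
fit2 <- phyreg(y ~ x, test = "w", data = mydata, phydata = myphylo,
               rho = fit1$rho)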
One key conceptual difference between Felsenstein's 1985 paper and my 1989 paper is that Felsenstein assumed that the traits themselves undergo Brownian motion over evolutionary time, whereas I did my analysis on the basis that only the error term does so. (This point is rediscovered every now and then.) It is important because (i) it is a much smaller assumption, (ii) it allows independent variables to be categorical, and (iii) it means that as we add independent variables to a model, the correct branch lengths may well change, because the error before you add X includes X, while afterwards it doesn't. So adding to the model a variable that explains a lot of variation between major groups, thereby removing that variation from the error, may well mean we should alter our branch lengths to increase the relative lengths of those nearer the species end. Thus there is no hard and fast principle about branch lengths reflecting durations between splits, or indeed anything else -- they must be treated pragmatically. I also don't see any merit in alternative evolutionary processes (such as Ornstein-Uhlenbeck) when applied to the error -- however appropriate they may or may not be for the traits themselves.
Full details and examples are given under phyreg.