setCor(y,x,data,z=NULL,n.obs=NULL,use="pairwise",std=TRUE,square=FALSE,
main="Regression Models")
setCor.diagram(sc,main="Regression model",digits=2,show=TRUE,...)
set.cor(y,x,data,z=NULL,n.obs=NULL,use="pairwise",std=TRUE,square=FALSE,
main="Regression Models")
mat.regress(y, x,data, z=NULL,n.obs=NULL,use="pairwise",square=FALSE)
Unfortunately, the R2 is sensitive to one of the canonical correlations being very high. An alternative, T2, is the proportion of additive variance and is the average of the squared canonicals. (Cohen et al., 2003), see also Cramer and Nicewander (1979). This average, because it includes some very small canonical correlations, will tend to be too small. Cohen et al. admonition is appropriate: "In the final analysis, however, analysts must be guided by their substantive and methodological conceptions of the problem at hand in their choice of a measure of association." ( p613).
Yet another measure of the association between two sets is just the simple, unweighted correlation between the two sets. That is,
$$R_{uw} =\frac{ 1 R_{xy} 1' }{(1R_{yy}1')^{.5} (1R_{xx}1')^{.5}}$$ where Rxy is the matrix of correlations between the two sets. This is just the simple (unweighted) sums of the correlations in each matrix. This technique exemplifies the robust beauty of linear models and is particularly appropriate in the case of one dimension in both x and y, and will be a drastic underestimate in the case of items where the betas differ in sign.
When finding the unweighted correlations, as is done in alpha
, items are flipped so that they all are positively signed.
A typical use in the SAPA project is to form item composites by clustering or factoring (see fa
,ICLUST
, principal
), extract the clusters from these results (factor2cluster
), and then form the composite correlation matrix using cluster.cor
. The variables in this reduced matrix may then be used in multiple R procedures using set.cor
.
Although the overall matrix can have missing correlations, the correlations in the subset of the matrix used for prediction must exist. If the number of observations is entered, then the conventional confidence intervals, statistical significance, and shrinkage estimates are reported. If the input is rectangular, correlations or covariances are found from the data. The print function reports t and p values for the beta weights, the summary function just reports the beta weights.
J. Cohen, P. Cohen, S.G. West, and L.S. Aiken. (2003) Applied multiple regression/correlation analysis for the behavioral sciences. L. Erlbaum Associates, Mahwah, N.J., 3rd ed edition.
H. Hotelling. (1936) Relations between two sets of variates. Biometrika 28(3/4):321-377.
E.Cramer and W. A. Nicewander (1979) Some symmetric, invariant measures of multivariate association. Psychometrika, 44:43-54.
cluster.cor
, factor2cluster
,principal
,ICLUST
, link{cancor}
and cca in the yacca package.#the Kelly data from Hoteling
kelly <- structure(list(speed = c(1, 0.4248, 0.042, 0.0215, 0.0573), power = c(0.4248,
1, 0.1487, 0.2489, 0.2843), words = c(0.042, 0.1487, 1, 0.6693,
0.4662), symbols = c(0.0215, 0.2489, 0.6693, 1, 0.6915), meaningless = c(0.0573,
0.2843, 0.4662, 0.6915, 1)), .Names = c("speed", "power", "words",
"symbols", "meaningless"), class = "data.frame", row.names = c(NA,
-5L))
kelly
setCor(1:2,3:5,kelly)
#Hotelling reports canonical correlations of .3073 and .0583 or squared correlations of
# 0.09443329 and 0.00339889 vs. our values of 0.0946 0.0035,
setCor(y=c(7:9),x=c(1:6),data=Thurstone,n.obs=213)
#now try partialling out some variables
set.cor(y=c(7:9),x=c(1:3),z=c(4:6),data=Thurstone) #compare with the previous
#compare complete print out with summary printing
sc <- setCor(x=c("gender","education"),y=c("SATV","SATQ"),data=sat.act) # regression from raw data
sc
summary(sc)
Run the code above in your browser using DataLab