cor.sdf: Run a bivariate correlation on an edsurvey.data.frame.

Description

Computes the correlation between two variables, x and y, in an edsurvey.data.frame.

Usage

cor.sdf(x, y, data, method = c("Pearson", "Spearman", "Polychoric",
  "Polyserial"), weightVar = "default", reorder = NULL,
  schoolMergeVarStudent = NULL, schoolMergeVarSchool = NULL,
  omittedLevels = TRUE, defaultConditions = TRUE, recode = NULL)

Arguments

x

a character variable name of class factor from the data to be correlated with y.

y

a character variable name of class factor from the data to be correlated with x.

data

an edsurvey.data.frame.

method

a character string indicating which correlation coefficient is to be computed. One of “Pearson” (default), “Spearman”, “Polychoric”, or “Polyserial”.

weightVar

character indicating the weight variable to use; see Details.

reorder

a list to reorder variables. Defaults to NULL. Can be set as reorder = list(var1 = c("a","b","c"), var2 = c("4", "3", "2", "1")). See Examples.

schoolMergeVarStudent

a character variable name from the student file used to merge student and school data files. Set to NULL by default.

schoolMergeVarSchool

a character variable name from the school file used to merge student and school data files. Set to NULL by default.

omittedLevels

a logical value. When set to the default value of TRUE, drops those levels of all factor variables that are specified in edsurvey.data.frame. Use print on an edsurvey.data.frame to see the omitted levels.

defaultConditions

a logical value. When set to the default value of TRUE, uses the default conditions stored in edsurvey.data.frame to subset the data. Use print on an edsurvey.data.frame to see the default conditions.

recode

a list of lists to recode variables. Defaults to NULL. Can be set as recode = list(var1 = list(from = c("a","b","c"), to = "d")). See Examples.
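
The shapes of the recode and reorder lists can be illustrated on a plain factor. The following is a minimal sketch in base R, not EdSurvey's internal implementation: the helpers apply_recode and apply_reorder are hypothetical names introduced here, and the data are made up. It assumes recode collapses every level listed in from into the single label given in to, and that reorder sets the level order.

```r
# Hypothetical helpers for illustration only; they are not part of EdSurvey.
apply_recode <- function(f, rule) {
  # Map every level listed in rule$from onto the single label rule$to.
  lv <- levels(f)
  lv[lv %in% rule$from] <- rule$to
  levels(f) <- lv  # duplicated labels merge the corresponding levels
  f
}

apply_reorder <- function(f, new_order) {
  # Rebuild the factor so that its levels follow new_order.
  factor(as.character(f), levels = new_order)
}

# Made-up responses
x <- factor(c("0%", "1-5%", "6-10%", "Over 90%", "0%"))

# Same structure as one entry of recode = list(var = list(from = ..., to = ...))
rule1 <- list(from = "0%", to = "None")
rule2 <- list(from = c("1-5%", "6-10%", "Over 90%"), to = "Between 0% and 100%")
x2 <- apply_recode(apply_recode(x, rule1), rule2)

# Same structure as one entry of reorder = list(var = c(...))
x3 <- apply_reorder(x2, c("Between 0% and 100%", "None"))

levels(x2)
levels(x3)
```

Because Spearman and polychoric correlations depend on the ordering of the categories, the level order chosen through reorder can change the sign or size of the estimate.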

Value

An edsurvey.cor object that has print and summary methods.

The class includes the following elements:

correlation

The estimated correlation coefficient.

Zse

Square root of the variance (Vimp + Vjrr).

correlates

A vector of length two showing the columns for which the correlation coefficient was calculated.

variables

The correlates that are discrete.

order

A list that shows the order of each variable.

method

The type of correlation estimated.

Vjrr

The jackknife component of variance estimate.

Vimp

The imputation component of the variance estimate.

weight

The weight variable used.

npv

The number of plausible values used.

njk

The number of jackknife replicates used.
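
The relationship between Zse, Vjrr, and Vimp described above can be checked by hand. The sketch below uses a plain list with made-up numbers in place of a real edsurvey.cor object; only the element names are taken from the list above, and the values are assumptions chosen for illustration.

```r
# Stand-in for an edsurvey.cor result; the numbers are invented.
res <- list(
  correlation = 0.30,
  Vjrr = 0.0004,  # jackknife component of the variance
  Vimp = 0.0001   # imputation component of the variance
)

# Zse is described as the square root of the total variance (Vimp + Vjrr).
total_variance <- res$Vimp + res$Vjrr
Zse <- sqrt(total_variance)
Zse
```

On a real result the same elements would be read with the $ operator, for example result$Vjrr, assuming the object is stored as a list with the element names given above.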

Details

Note that the getData arguments and recode may be useful (see Examples). The correlation methods are calculated as described in separate documentation.
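
To give a sense of what a weighted correlation involves, the sketch below computes a generic weighted Pearson correlation with stats::cov.wt from base R. It is not the cor.sdf implementation, which is documented separately and also handles plausible values and jackknife replicates. The data and weights are simulated, and weighted_cor is a hypothetical helper name.

```r
# Simulated data and made-up sampling weights, for illustration only.
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
w <- runif(n, 0.5, 2)

# Hypothetical helper: weighted Pearson correlation via stats::cov.wt.
weighted_cor <- function(x, y, w) {
  stats::cov.wt(cbind(x, y), wt = w, cor = TRUE)$cor[1, 2]
}

r_weighted <- weighted_cor(x, y, w)
r_equal <- weighted_cor(x, y, rep(1, n))  # equal weights

# With equal weights the result matches the unweighted Pearson correlation.
all.equal(r_equal, cor(x, y))
```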

Examples

# NOT RUN {
# read in the example data (generated, not real student data)
sdf <- readNAEP(system.file("extdata/data", "M36NT2PM.dat", package = "NAEPprimer"))

# for two categorical variables any of the following work
c1_pears <- cor.sdf(x="b017451", y="b003501", data=sdf, method="Pearson",
                    weightVar="origwt")
c1_spear <- cor.sdf(x="b017451", y="b003501", data=sdf, method="Spearman",
                    weightVar="origwt")
c1_polyc <- cor.sdf(x="b017451", y="b003501", data=sdf, method="Polychoric",
                    weightVar="origwt")

c1_pears
c1_spear
c1_polyc

# these take a while to calculate for large datasets, so limit to a subset
sdf_dnf <- subset(sdf, b003601 == 1)

# for a categorical variable and a scale score any of the following work
c2_pears <- cor.sdf(x="composite", y="b017451", data=sdf_dnf, method="Pearson",
                    weightVar="origwt")
c2_spear <- cor.sdf(x="composite", y="b017451", data=sdf_dnf, method="Spearman",
                    weightVar="origwt")
c2_polys <- cor.sdf(x="composite", y="b017451", data=sdf_dnf, method="Polyserial",
                    weightVar="origwt")

c2_pears
c2_spear
c2_polys

# recode two variables
cor.sdf(x="c046501", y="c044006", data=sdf, method="Spearman", weightVar="origwt",
        recode=list(c046501=list(from="0%",to="None"),
                    c046501=list(from=c("1-5%", "6-10%", "11-25%", "26-50%",
                                        "51-75%", "76-90%", "Over 90%"),
                                 to="Between 0% and 100%"),
                    c044006=list(from=c("1-5%", "6-10%", "11-25%", "26-50%",
                                        "51-75%", "76-90%", "Over 90%"),
                                 to="Between 0% and 100%")))

# reorder two variables
cor.sdf(x="b017451", y="sdracem", data=sdf, method="Spearman", weightVar="origwt", 
        reorder=list(sdracem=c("White", "Hispanic", "Black", "Asian/Pacific Island",
                               "Amer Ind/Alaska Natv", "Other"),
                     b017451=c("Every day", "2 or 3 times a week", "About once a week",
                               "Once every few weeks", "Never or hardly ever")))

# recode two variables and reorder
cor.sdf(x="pared", y="b013801", data=sdf, method="Spearman", weightVar = "origwt",
        recode=list(pared=list(from="Some ed after H.S.", to="Graduated H.S."),
                    pared=list(from="Graduated college", to="Graduated H.S."),
                    b013801=list(from="0-10", to="Less than 100"), 
                    b013801=list(from="11-25", to="Less than 100"),
                    b013801=list(from="26-100", to="Less than 100")),
        reorder=list(b013801=c("Less than 100", ">100")))
# }