
mbgraphic (version 1.0.1)

splines2d: Spline-based dependency measure for pairs of variables

Description

Calculates a smoothing-spline-based measure that quantifies functional dependencies between two variables. The models are fitted with the function gam from the package mgcv.

Usage

splines2d(x, y = NULL, binning = FALSE, b = 50, anchor = "min", parallel=FALSE)

Arguments

x

A numeric vector, a numeric matrix, or a data frame. For a data frame, only the numeric variables are used.

y

A numeric vector.

binning

A logical value or a character string. Whether binning should be used: TRUE or "equi" for equidistant binning, "quant" for quantile-based binning, or "hexb" for hexagonal binning. Default is FALSE.

b

A positive integer. Number of bins in each variable.

anchor

A character string or a numeric value. How the anchor point should be chosen: "min" (default) for the minimum of each variable, "ggplot" for the method used in ggplot graphics, "nice" for a "pretty" anchor point, or a user-specified value.

parallel

A logical value. Whether or not parallelization should be used. Default is FALSE.

Value

A numeric value giving the value of the measure if a pair of vectors is supplied. Otherwise, a data frame with the following variables:

splines2d

Value of the measure.

x1

Number of first variable.

x2

Number of second variable.

nx1

Name of first variable (missing if x is not a data frame).

nx2

Name of second variable (missing if x is not a data frame).

tarvar

The variable that was used as the target variable (the one that delivered the higher value of the measure).
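The two return forms described above can be illustrated with a small sketch. It uses only the documented arguments x and y; the simulated data and the names df, pair_value, and all_pairs are made up for illustration, and the calls run only if mbgraphic is installed.

```r
# Simulated data (hypothetical): b depends on a through a quadratic
# relationship, c is pure noise.
set.seed(1)
x <- runif(200, -2, 2)
y <- x^2 + rnorm(200, sd = 0.3)
df <- data.frame(a = x, b = y, c = rnorm(200))

if (requireNamespace("mbgraphic", quietly = TRUE)) {
  # Pair of vectors: a single numeric value of the measure
  pair_value <- mbgraphic::splines2d(x, y)
  print(pair_value)

  # Data frame: one row per pair of numeric variables
  all_pairs <- mbgraphic::splines2d(df)
  print(all_pairs[order(all_pairs$splines2d, decreasing = TRUE), ])
}
```

In the data frame result, the column tarvar indicates which variable of each pair served as the target in the better-fitting model.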

Details

For each pair of variables x and y, two models are fitted: one where x depends on y and one where y depends on x. The proportion of explained variance is calculated for both models, and the maximum is returned. The "cr" basis (cubic regression splines) is used for faster calculation.

The number of start knots depends on the number of unique values in the independent variable. If there are fewer than 20 unique values, 3 start knots are used; otherwise 10 are used.

The smoothing parameter is determined by cross-validation.
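The procedure described above can be sketched directly with mgcv. This is a minimal illustration under stated assumptions, not the package's actual source: the helper names explained_variance and splines2d_sketch are hypothetical, the proportion of explained variance is taken from the dev.expl element of summary.gam, and cross-validation is assumed to mean mgcv's generalized cross-validation (method = "GCV.Cp"). Binning and parallelization are left out.

```r
library(mgcv)

# Hypothetical helper: proportion of variance in `target` explained by a
# cubic regression spline ("cr" basis) in `predictor`.
explained_variance <- function(target, predictor) {
  # Start knots as described above: 3 if the independent variable has
  # fewer than 20 unique values, 10 otherwise.
  k <- if (length(unique(predictor)) < 20) 3 else 10
  d <- data.frame(target = target, predictor = predictor)
  # Assumption: smoothing parameter chosen by generalized cross-validation.
  fit <- gam(target ~ s(predictor, bs = "cr", k = k),
             data = d, method = "GCV.Cp")
  # Assumption: for a Gaussian model, the proportion of deviance explained
  # equals the proportion of variance explained.
  summary(fit)$dev.expl
}

# Hypothetical sketch of the measure: fit both directions, keep the maximum.
splines2d_sketch <- function(x, y) {
  max(explained_variance(y, x), explained_variance(x, y))
}

# Simulated example: y is a noisy quadratic function of x.
set.seed(42)
x <- runif(300, -2, 2)
y <- x^2 + rnorm(300, sd = 0.2)
splines2d_sketch(x, y)
```

In this example the model for y given x fits well, while x cannot be recovered from y because the relationship is not invertible. Taking the maximum over both directions is what lets the measure pick up such one-directional functional dependencies. The sketch does not handle variables with fewer than 3 unique values, for which a basis of this size cannot be set up.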

References

S. N. Wood (2006). Generalized Additive Models: An Introduction with R. CRC Press, London.

S. N. Wood (2016). mgcv: Mixed GAM Computation Vehicle with GCV/AIC/REML Smoothness Estimation. https://cran.r-project.org/package=mgcv

See Also

gam in mgcv, dcor2d

Examples

library(mbgraphic)
data(Election2005)

# Spline-based measure for all pairs of variables
spl <- splines2d(Election2005)

# Order the pairs by decreasing value of the measure
o_spl <- spl[order(spl$splines2d, decreasing = TRUE), ]

# Show the 10 pairs with the highest values
o_spl[1:10, ]

# Show the 4 scatterplots with the highest values
par(mfrow = c(2, 2))
for (i in 1:4) {
  # Look up both variables of the i-th pair by name
  nx1 <- as.character(o_spl$nx1[i])
  nx2 <- as.character(o_spl$nx2[i])
  plot(Election2005[[nx1]], Election2005[[nx2]],
       xlab = nx1, ylab = nx2, pch = 19)
}
