ShapeSelectForest (version 1.1)

shape: Shape Selection

Description

Given a predictor vector $\bold{x}$, e.g., years, and a matrix $\bold{ymat}$ whose columns are response vectors, e.g., Landsat signals, the shape routine selects the shape that best fits each response vector according to the Bayes information criterion (BIC) or the cone information criterion (CIC).

Usage

shape(x, ymat, infocrit = "CIC", flat = TRUE, dec = TRUE, jp = TRUE, 
invee = TRUE, vee = TRUE, inc = TRUE, db = TRUE, nsim = 1e+3, 
edf0 = NULL, get.edf0 = FALSE, random = FALSE, msg = TRUE)

Arguments

x
An $n$ by $1$ predictor vector, for example, years.
ymat
An $n$ by $N$ matrix whose columns are response vectors corresponding to x, for example, Landsat signals.
infocrit
The criterion used to select the best shape for a scatterplot. It can be either the Bayes information criterion ("BIC") or the cone information criterion ("CIC").
flat
A logical flag. If it is TRUE, the flat shape is a candidate; otherwise, it is excluded.
dec
A logical flag. If it is TRUE, the decreasing shape is a candidate; otherwise, it is excluded.
jp
A logical flag. If it is TRUE, the one-jump shape is a candidate; otherwise, it is excluded.
invee
A logical flag. If it is TRUE, the inverted-vee shape is a candidate; otherwise, it is excluded.
vee
A logical flag. If it is TRUE, the vee shape is a candidate; otherwise, it is excluded.
inc
A logical flag. If it is TRUE, the increasing shape is a candidate; otherwise, it is excluded.
db
A logical flag. If it is TRUE, the double-jump shape is a candidate; otherwise, it is excluded. The routine is usually slower when the double-jump shape is included than when it is excluded.
nsim
Number of simulations used to get the edf0 vector. The default is nsim = 1e+3. See the References section for more details about edf0.
edf0
The edf0 given by the user. When $\bold{x}$ is an equally spaced vector whose number of elements is between $20$ and $40$, the user doesn't need to provide an edf0 vector; otherwise, the user has to set get.edf0 to TRUE so that the shape routine will
get.edf0
A logical flag. When $\bold{x}$ is not an equally spaced vector whose number of elements is between $20$ and $40$, the user has to set get.edf0 to TRUE so that the shape routine will simulate an edf0 vector, or the user can choose to simulate an edf0
random
A parameter used by the maintainer to test if each shape option can be both included and excluded.
msg
A logical flag. If msg is TRUE, a warning message is printed when there is a non-convergence problem; otherwise, no warning message is printed. The default is msg = TRUE.
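
The rule stated for edf0 and get.edf0 can be checked before calling the routine. The following is a minimal base-R sketch; `needs_edf0` is a hypothetical helper, not part of ShapeSelectForest, and it assumes that "between 20 and 40" includes both endpoints.

```r
# Hypothetical helper (not part of ShapeSelectForest): returns TRUE when,
# by the rule above, an edf0 vector has to be simulated or supplied.
# Assumption: "between 20 and 40" is read as inclusive.
needs_edf0 <- function(x, tol = 1e-8) {
  n <- length(x)
  gaps <- diff(x)
  # equally spaced: every consecutive gap matches the first one
  equally_spaced <- n >= 2 && all(abs(gaps - gaps[1]) < tol)
  in_range <- n >= 20 && n <= 40
  !(equally_spaced && in_range)
}

needs_edf0(1985:2010)                # 26 equally spaced years
needs_edf0(c(1985:2000, 2003:2010))  # 24 years with a gap
needs_edf0(1985:1995)                # only 11 years
```

Only the first call returns FALSE; for the other two vectors, the rule says get.edf0 should be set to TRUE or an edf0 vector provided.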

Value

  • shape: An $N$ by $1$ vector. The $i$th element is the best shape for the $i$th scatterplot.
  • ic: A $k$ by $N$ matrix where the $i$th column is the vector of "BIC" or "CIC" values used to choose the best shape for the $i$th scatterplot. $k$ is the number of shapes allowed by the user.
  • thetab: An $n$ by $N$ matrix where the $i$th column is the vector of predicted values for the chosen shape for the $i$th scatterplot.
  • x: The argument x.
  • ymat: The argument ymat.
  • infocrit: The argument infocrit.
  • k: The number of knots used.
  • bs: A list of coefficient vectors. Each vector holds the coefficients of the regression basis functions for one scatterplot.
  • ijps: A list storing the position of the first jump for scatterplots whose best shape is one-jump or double-jump. It also stores the position of the knot from where $\bold{f}$ starts increasing (decreasing) for scatterplots whose best shape is vee (inverted vee).
  • jjps: A list storing the position of the second jump for scatterplots whose best shape is double-jump.
  • m_is: A vector storing the centering values for the first ramp edge for scatterplots whose best shape is one-jump or double-jump.
  • m_js: A vector storing the centering values for the second ramp edge for scatterplots whose best shape is double-jump.
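
To make the layout of ic concrete, here is a small base-R sketch on a made-up matrix. The numbers and the row names are invented for illustration and are not package output. The sketch also assumes the usual convention that a smaller information criterion value indicates a better fit.

```r
# Toy k-by-N criterion matrix (made-up values, not package output):
# k = 3 allowed shapes in the rows, N = 4 scatterplots in the columns.
# The row names are hypothetical labels, not the package's shape codes.
ic <- matrix(c(5.1, 4.2, 6.0,
               3.3, 3.9, 3.1,
               7.5, 7.4, 7.9,
               2.2, 2.8, 2.5),
             nrow = 3,
             dimnames = list(c("shape_a", "shape_b", "shape_c"), NULL))

# Assumption: the smallest criterion value in a column marks the best shape.
best_row <- apply(ic, 2, which.min)
rownames(ic)[best_row]
```

Each column is handled independently, which matches the description of ic as holding one vector of criterion values per scatterplot.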

coneproj package: https://cran.r-project.org/package=coneproj

Details

Given a scatterplot of $(x_i, y_i)$, $i=1,\ldots,n$, where $\bold{x}$ could be a vector of years and $\bold{y}$ could be a vector of Landsat signals, constrained least-squares spline fits are obtained for the following shapes:
  1. flat
  2. decreasing
  3. one-jump, i.e., decreasing, jump up, decreasing
  4. inverted vee (increasing then decreasing)
  5. vee (decreasing then increasing)
  6. linear increasing
  7. double-jump, i.e., decreasing, jump up, decreasing, jump up, decreasing
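
As a rough illustration of what some of these shapes look like, the base-R sketch below builds noise-free toy series over the years 1985 to 2010. The slopes, the jump size, and the break year 1998 are arbitrary choices for illustration; this is not how the package fits the shapes.

```r
x <- 1985:2010
yrs <- x - min(x)   # years since the start: 0, 1, ..., 25

# Toy noise-free series; all constants are arbitrary illustration values.
decreasing   <- 10 - 0.2 * yrs
one_jump     <- 10 - 0.2 * yrs + 4 * (x >= 1998)  # decreasing, jump up at 1998, decreasing
vee          <- abs(x - 1998)                     # decreasing, then increasing
inverted_vee <- -abs(x - 1998)                    # increasing, then decreasing

# A one-jump series goes down every year except for the single upward step.
sum(diff(one_jump) > 0)
```

Plotting each vector against x (for example with `plot(x, one_jump, type = "b")`) gives a quick visual reference for the shape names used in the list.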

References

Meyer, M. C. (2013a) Semi-parametric additive constrained regression. Journal of Nonparametric Statistics 25(3), 715.

Meyer, M. C. (2013b) A simple new algorithm for quadratic programming with applications in statistics. Communications in Statistics 42(5), 1126--1139.

Liao, X. and M. C. Meyer (2014) coneproj: An R package for the primal or dual cone projections with routines for constrained regression. Journal of Statistical Software 61(12), 1--22.

See Also

plotshape, edf0s

Examples

# load the package that provides shape() and plotshape()
library(ShapeSelectForest)

# import the matrix of Landsat signals
data("ymat")

# define the predictor vector: the years 1985 to 2010
x <- 1985:2010

# Example 1:
# call the shape routine allowing a double-jump shape, using "BIC"
ans <- shape(x, ymat, "BIC")
plotshape(ans, ids = 1:6, both = TRUE, form = TRUE)

# Example 2:
# call the shape routine not allowing a double-jump shape, using "CIC"
ans <- shape(x, ymat, "CIC", db = FALSE)
plotshape(ans, ids = 1:6, both = TRUE, form = TRUE)
