outlier: Screen Data for Outliers

Description

A chosen column of a data frame is screened for outliers, which are then marked and/or eliminated. Outliers are identified either by absolute lower and upper limits or from z-transformed data.

Usage

outlier(data, dv, todo = "na", res.name = "outlier", upper.limit = NaN, lower.limit = NaN, limit.exact = FALSE, upper.z = NaN, lower.z = NaN, z.exact = FALSE, factors = NaN, z.keep = TRUE, z.name = "zscores", vsj = FALSE, print.summary = TRUE)

Arguments

data

A data frame containing the data to be screened as well as the relevant condition variables.

dv

Character string specifying the name of the variable within data that is to be screened for outliers. Alternatively, dv can be the corresponding column index.

todo

Character string specifying the fate of outliers: "na" - outliers are turned into NAs, "elim" - rows containing outliers are deleted from the data frame, "nothing" - the data are left unchanged and outliers are only marked in the res.name column, DEFAULT: todo = "na".
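
The three options can be illustrated with a short base-R sketch. The helper apply_todo() is hypothetical and not part of the package; it assumes the outlier flags are already available as a 0/1 vector with one entry per row.

```r
# Hypothetical helper, not part of the package: applies one of the three
# `todo` options to a data frame, given a 0/1 flag per row.
apply_todo <- function(data, dv, flag, todo = c("na", "elim", "nothing")) {
  todo <- match.arg(todo)
  if (todo == "na") {
    data[flag == 1, dv] <- NA                # keep rows, blank out outlying values
  } else if (todo == "elim") {
    data <- data[flag == 0, , drop = FALSE]  # delete rows containing outliers
  }
  data                                       # "nothing": data returned unchanged
}

d <- data.frame(rt = c(300, 450, 2500, 410))
flag <- c(0, 0, 1, 0)
apply_todo(d, "rt", flag, "na")$rt       # 300 450 NA 410
nrow(apply_todo(d, "rt", flag, "elim"))  # 3
```

Note that "na" preserves the number of rows (useful when rows must stay aligned with other data), whereas "elim" shortens the data frame.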

res.name

Character string specifying the name of the variable wherein outliers are marked, DEFAULT: res.name = "outlier".

upper.limit

An optional numeric value specifying the absolute upper limit; values above this limit are deemed outliers (see limit.exact).

lower.limit

An optional numeric value specifying the absolute lower limit; values below this limit are deemed outliers (see limit.exact).

limit.exact

Logical; if TRUE, values equal to lower.limit or upper.limit are also deemed outliers, DEFAULT: limit.exact = FALSE.
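
The effect of limit.exact amounts to switching between strict and inclusive comparisons. The following base-R sketch illustrates this; flag_limits() is a hypothetical helper, not the package's code, and it treats NaN as "no limit set", mirroring the defaults in the Usage line.

```r
# Hypothetical helper (not the package's code): flag values outside absolute
# limits. With exact = TRUE, values equal to a limit are flagged as well.
flag_limits <- function(x, lower = NaN, upper = NaN, exact = FALSE) {
  flag <- rep(0L, length(x))
  if (!is.nan(upper)) {
    hit <- if (exact) x >= upper else x > upper
    flag[which(hit)] <- 1L   # which() skips NA comparisons
  }
  if (!is.nan(lower)) {
    hit <- if (exact) x <= lower else x < lower
    flag[which(hit)] <- 1L
  }
  flag
}

x <- c(100, 200, 500, 1000, 3000)
flag_limits(x, lower = 200, upper = 1000)                # 1 0 0 0 1
flag_limits(x, lower = 200, upper = 1000, exact = TRUE)  # 1 1 0 1 1
```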

upper.z

An optional numeric value specifying how many standard deviations above its cell mean a value must lie to be identified as an outlier (the upper z-score limit).

lower.z

An optional numeric value specifying how many standard deviations below its cell mean a value must lie to be identified as an outlier (the lower z-score limit).

z.exact

Logical; if TRUE, z-scores equal to lower.z or upper.z are also deemed outliers, DEFAULT: z.exact = FALSE.

factors

A character string or vector of character strings (e.g., c("subject", "condition")) naming the condition variables used to split the data into cells; z-scores are computed separately within each cell.
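
Cell-wise z-scoring can be sketched in base R with ave() and interaction(). The helpers zscores_by_cell() and flag_z() are hypothetical and not the package's implementation. The sketch also assumes the sample standard deviation (n - 1 denominator, as computed by sd()) and assumes lower.z is supplied as a positive number that is compared against -lower.z.

```r
# Hypothetical sketch, not the package's implementation.
# z-transform `dv` separately within each cell formed by crossing `factors`.
zscores_by_cell <- function(data, dv, factors) {
  cell <- interaction(data[factors], drop = TRUE)
  ave(data[[dv]], cell, FUN = function(v) {
    # Sample SD (n - 1 denominator) is an assumption of this sketch
    (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)
  })
}

# Flag z-scores beyond the limits. Assumes lower.z is given as a positive
# number of SDs and compared against -lower.z; exact = TRUE also flags ties.
flag_z <- function(z, lower.z = NaN, upper.z = NaN, exact = FALSE) {
  flag <- rep(0L, length(z))
  if (!is.nan(upper.z)) {
    flag[which(if (exact) z >= upper.z else z > upper.z)] <- 1L
  }
  if (!is.nan(lower.z)) {
    flag[which(if (exact) z <= -lower.z else z < -lower.z)] <- 1L
  }
  flag
}

d <- data.frame(
  subject = rep(c("a", "b"), each = 5),
  rt      = c(400, 410, 420, 430, 900, 600, 610, 620, 630, 640)
)
z <- zscores_by_cell(d, "rt", "subject")
flag_z(z, lower.z = 1.5, upper.z = 1.5)  # 0 0 0 0 1 0 0 0 0 0
```

The low limit of 1.5 in this toy example is deliberate: with the sample SD, no z-score in a cell of n values can exceed (n - 1)/sqrt(n) in absolute value, which is about 1.79 for n = 5. Small cells therefore cannot produce large z-scores.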

z.keep

Logical; if TRUE, z-scores are stored in an additional column; if FALSE, they are discarded after outlier correction, DEFAULT: z.keep = TRUE.

z.name

Character string specifying the name of the column in which z-scores are stored, DEFAULT: z.name = "zscores".

vsj

Not yet implemented; reserved for a future version, DEFAULT: vsj = FALSE.

print.summary

Logical; if TRUE, a short summary of the identified outliers is printed, DEFAULT: print.summary = TRUE.

Value

outlier(data, ...) returns the original data frame with the outlier correction applied. This data frame also has one additional column, named by res.name, containing outlier flags (0 = not suspicious, 1 = outlier). If z-scores are computed and kept (z.keep = TRUE), they are returned in a further column named by z.name.

Details

If both absolute limits and z-limits are specified, the absolute limits are applied first, and z-scores are then computed only from the data points that remain.
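
The order of operations matters because extreme values inflate the cell mean and standard deviation. The base-R sketch below illustrates this sequence; screen_sketch() is a hypothetical function, not the package's implementation. For brevity it assumes exclusive comparisons (limit.exact = FALSE), only an upper z-limit, todo = "na", and z.keep = TRUE.

```r
# Hypothetical sketch (not the package's implementation) of the order of
# operations: absolute limits first, then z-scores from the remaining values.
screen_sketch <- function(data, dv, factors, lower.limit, upper.limit,
                          upper.z, res.name = "outlier", z.name = "zscores") {
  x <- data[[dv]]
  # Step 1: absolute limits (exclusive, as with limit.exact = FALSE)
  flag <- as.integer(x < lower.limit | x > upper.limit)
  # Step 2: z-scores within cells, computed only from values not yet flagged
  remaining <- ifelse(flag == 1L, NA, x)
  cell <- interaction(data[factors], drop = TRUE)
  z <- ave(remaining, cell, FUN = function(v) {
    (v - mean(v, na.rm = TRUE)) / sd(v, na.rm = TRUE)
  })
  flag[which(z > upper.z)] <- 1L
  # Step 3: store flags and z-scores, then turn outliers into NA (todo = "na")
  data[[res.name]] <- flag
  data[[z.name]] <- z
  data[[dv]][flag == 1L] <- NA
  data
}

d <- data.frame(subject = "s1", rt = c(50, 400, 410, 420, 430, 900, 5000))
res <- screen_sketch(d, "rt", "subject", lower.limit = 100, upper.limit = 3000,
                     upper.z = 1.5)
res$outlier  # 1 0 0 0 0 1 1
```

Here 900 is flagged only because 50 and 5000 were removed first. If z-scores were computed from all seven values, 900 would lie below the mean (about 1087) and would not exceed upper.z.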
