A chosen column of a data frame is screened for outliers; outliers are marked and/or eliminated. Outliers are identified by absolute lower and upper limits, by cutoffs on z-transformed data, or by both. At least one absolute limit or z-value cutoff needs to be entered.
outlier(data, dv,
todo = "na", res.name = "outlier",
upper.limit = NaN, lower.limit = NaN,
limit.exact = FALSE,
upper.z = NaN, lower.z = NaN,
z.exact = FALSE, factors = NaN,
z.keep = TRUE, z.name = "zscores",
vsj = FALSE,
print.summary = TRUE)
outlier(data,...) returns the original data frame with the outlier correction applied. This data frame also has one additional column containing flags for outliers (0 = not suspicious, 1 = outlier). If z-scores are requested, these scores are returned as an additional column.
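The shape of the returned data frame can be illustrated with a minimal base-R sketch. The helper apply_fate and the demo data are hypothetical and are not the package's own code; the sketch only mimics the documented flag column (0/1) and the three documented values of todo.

```r
# Hypothetical helper (not part of the package): adds a 0/1 flag column and
# applies one of the three documented fates to the screened column.
apply_fate <- function(data, dv, is_out, todo = "na", res.name = "outlier") {
  data[[res.name]] <- as.integer(is_out)   # 0 = not suspicious, 1 = outlier
  if (todo == "na") {
    data[[dv]][is_out] <- NA               # keep the row, set the value to NA
  } else if (todo == "elim") {
    data <- data[!is_out, , drop = FALSE]  # delete rows containing outliers
  }                                        # "nothing": only the flag is added
  data
}

# Made-up response times; values above 2000 are treated as outliers here
demo <- data.frame(rt = c(450, 520, 3000, 480))
flagged <- apply_fate(demo, "rt", is_out = demo$rt > 2000, todo = "na")
trimmed <- apply_fate(demo, "rt", is_out = demo$rt > 2000, todo = "elim")
```

In this sketch, flagged keeps all four rows with the third value set to NA, whereas trimmed contains only the three unflagged rows.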
data: A data frame containing the data to be screened as well as appropriate condition variables.
dv: Character string specifying the name of the variable within data that is to be screened for outliers. Alternatively, dv can be the appropriate column index.
todo: Character string specifying the fate of outliers: "na" - outliers are turned into NAs, "elim" - rows containing outliers are deleted from the data frame, "nothing" - nothing happens; default: todo = "na".
res.name: Character string specifying the name of the variable to be used for marking outliers; default: res.name = "outlier".
upper.limit: An optional numeric value specifying the absolute upper limit defining outliers.
lower.limit: An optional numeric value specifying the absolute lower limit defining outliers.
limit.exact: Logical; if TRUE, values equal to lower.limit/upper.limit are deemed outliers.
upper.z: An optional numeric value specifying how many standard deviations within a cell a value must exceed to be identified as an outlier.
lower.z: An optional numeric value specifying how many standard deviations within a cell a value must undercut to be identified as an outlier.
factors: A string or vector of strings (e.g., c("subject","condition")) stating the conditions that should be used for splitting the data.
z.exact: Logical; if TRUE, z-values equal to lower.z/upper.z are deemed outliers.
z.keep: Logical; if TRUE, z-scores are stored in an additional column. If FALSE, z-scores are discarded after outlier correction.
z.name: Character string specifying a name for the variable that should be used for storing z-scores.
vsj: To be implemented in a future version...
print.summary: Logical; if TRUE, a short summary of identified outliers is printed.
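Two of the arguments above are easier to see in code: the exact switches (strict versus inclusive comparison against a cutoff) and the cell-wise z-scores defined by the splitting factors. The following base-R sketch uses hypothetical helpers (above, below) and made-up data. It assumes that z-scores are based on the sample standard deviation as computed by sd(), which the text above does not confirm, and it is not the package's own code.

```r
# Hypothetical helpers: with exact = TRUE a value sitting exactly on the
# cutoff counts as an outlier, otherwise only values beyond the cutoff do.
above <- function(x, cutoff, exact = FALSE) if (exact) x >= cutoff else x > cutoff
below <- function(x, cutoff, exact = FALSE) if (exact) x <= cutoff else x < cutoff

rt <- c(200, 350, 2000)
strict    <- above(rt, 2000)                # 2000 itself is not flagged
inclusive <- above(rt, 2000, exact = TRUE)  # 2000 itself is flagged

# z-scores computed separately within each cell; here the cells are subjects
demo <- data.frame(
  subject = rep(c("s1", "s2"), each = 4),
  rt      = c(400, 410, 420, 430, 900, 910, 920, 930)
)
demo$zscores <- ave(demo$rt, demo$subject,
                    FUN = function(v) (v - mean(v)) / sd(v))
```

Although the two subjects differ by 500 in their raw values, their z-scores are identical, because each value is standardized against its own cell. Further splitting variables can be passed to ave() as additional grouping arguments.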
Markus Janczyk, Roland Pfister
If both absolute limits and z-limits are specified, absolute limits are processed first and z-scores are computed for the remaining data points.
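The order of processing matters because a single extreme value inflates the standard deviation and can mask milder outliers. The following base-R sketch illustrates this with made-up response times. The absolute limit of 10000 and the z-cutoff of 2 are arbitrary assumptions chosen for illustration, and the code is not the package's own implementation.

```r
# Made-up response times with one mild (1500) and one extreme (60000) value
rt <- c(400, 420, 440, 460, 480, 500, 520, 540, 1500, 60000)

# z-scores on all data: the extreme value dominates mean and SD
z_all <- (rt - mean(rt)) / sd(rt)

# Stage 1: apply the absolute upper limit first (assumed cutoff: 10000)
keep <- rt <= 10000

# Stage 2: compute z-scores for the remaining data points only
z_rest <- (rt[keep] - mean(rt[keep])) / sd(rt[keep])

# Compare how the value 1500 fares against an assumed z-cutoff of 2
z_1500_all  <- z_all[rt == 1500]
z_1500_rest <- z_rest[rt[keep] == 1500]
```

In this sketch, the value 1500 stays within 2 standard deviations when z-scores are computed on all data, but exceeds that cutoff once the extreme value has been removed by the absolute limit.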
split; zscores