sitar (version 1.4.0)

velout: Identify outliers with abnormal velocity in growth curves

Description

Quickly identifies putative outliers in a large number of growth curves.

Usage

velout(x, y, id, data, lag = 1, velpower = 0.5, limit = 5, linearise = FALSE)

Value

Returns a data frame with columns: id, x, y (from the call), code (as described below), vel1, vel2 and vel3 (corresponding to the velocities AB, BC and AC described under Details). The 'data' attribute contains the name of 'data'.

Code is a factor taking values between 0 and 8, with 0 normal (see table below). Values 1-6 depend on the pattern of abnormal velocities, while 7 and 8 indicate a duplicate age (7 for the first in an individual and 8 for later ones). Edge outliers, i.e. first or last for an individual, have just one velocity. Code 4 indicates a conventional outlier, with both AB and BC abnormal and AC normal. Code 6 is an edge outlier. Other codes are not necessarily outliers, e.g. codes 1 or 3 may be adjacent to a code 4. Use codeplot to look at individual curves, and zapvelout to delete outliers.

code  AB+BC  AC  interpretation
0     0      0   no outlier
0     0      NA  no outlier
1     0      1   rare pattern
2     1      0   complicated - look at curve
3     1      1   adjacent to simple outlier
4     2      0   single outlier
5     2      1   double outlier
6     1      NA  edge outlier
7     -      -   first duplicate age
8     -      -   later duplicate age
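
The pattern in the table can be sketched as a small lookup. This is only an illustration: it reads the AB+BC column as the number of abnormal adjacent velocities (0, 1 or 2) and the AC column as 1 when AC is abnormal, which is an interpretation of the table rather than something the help page states. The helper `velcode` is hypothetical and is not part of sitar; the duplicate-age codes 7 and 8 are not covered.

```r
# Hypothetical helper reproducing codes 0-6 from the table above;
# not sitar's internal code.
velcode <- function(n_adjacent, ac) {
  ifelse(is.na(ac),
         ifelse(n_adjacent > 0, 6, 0),  # edge points: AC unavailable
         2 * n_adjacent + ac)           # interior points: codes 0 to 5
}

# The eight non-duplicate rows of the table, in order
n_adjacent <- c(0, 0, 0, 1, 1, 2, 2, 1)
ac         <- c(0, NA, 1, 0, 1, 0, 1, NA)
codes <- velcode(n_adjacent, ac)
print(codes)
# 0 0 1 2 3 4 5 6
```

For interior points the code is simply twice the number of abnormal adjacent velocities plus the AC flag, which is why a single outlier (both AB and BC abnormal, AC normal) gets code 4.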

Arguments

x

age vector.

y

outcome vector, typically weight or height.

id

factor identifying each subject.

data

data frame containing x, y and id.

lag

lag between measurements for defining growth velocity.

velpower

a value, typically between 0 and 1, defining the power of delta x to use when calculating velocity as delta(y)/delta(x)^velpower. The default of 0.5 is midway between velocity and increment.

limit

the number of standard deviations beyond which a velocity is deemed to be an outlier.

linearise

if TRUE y is converted to a residual about the median curve of y versus x.
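
As a rough illustration of the velpower argument, the snippet below applies the formula delta(y)/delta(x)^velpower to one made-up interval. The numbers are invented for the example; only the formula comes from this page.

```r
# Made-up interval: a gain of 6 units in y over 4 units of x
dy <- 6
dx <- 4

# velpower = 0 gives the raw increment, 1 gives the conventional velocity,
# and the default 0.5 lies in between
res <- sapply(c(increment = 0, default = 0.5, velocity = 1),
              function(p) dy / dx^p)
print(res)
# increment 6, default 3, velocity 1.5
```

With velpower = 0 the length of the interval is ignored, while with velpower = 1 short intervals can produce very large velocities from small measurement errors; the default sits between the two.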

Author

Tim Cole tim.cole@ucl.ac.uk

Details

The algorithm works by viewing serial measurements in each growth curve as triplets (A-B-C) and comparing the velocities between them. Velocity is calculated as

diff(y, lag = lag) / diff(x, lag = lag) ^ velpower
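
A minimal base R sketch of the triplet velocities for a single individual follows. The helper `vel` and the toy data are hypothetical, and computing the AC velocity with twice the lag is an assumption; the page does not say how velout obtains it.

```r
# Hypothetical helper applying the formula above; not part of sitar
vel <- function(x, y, lag = 1, velpower = 0.5) {
  diff(y, lag = lag) / diff(x, lag = lag)^velpower
}

# Toy curve for one child; the fourth height is a made-up error
age    <- c(8, 9, 10, 11, 12, 13)
height <- c(125, 130, 135, 160, 145, 150)

v1 <- vel(age, height, lag = 1)  # velocities between adjacent measurements
v2 <- vel(age, height, lag = 2)  # assumed AC velocities, skipping one point

# Triplet A-B-C centred on the fourth measurement (A = 3rd, C = 5th)
triplet <- c(AB = v1[3], BC = v1[4], AC = v2[3])
print(triplet)
```

Here AB (25) and BC (-15) stand out against the other adjacent velocities (all 5), while AC (about 7.07) is unremarkable, which is the pattern described for a code 4 outlier.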

Missing values for x or y are ignored. If any of the AB, BC or AC velocities are abnormal (more than limit SDs in absolute value from the median for the dataset) the code for B is non-zero.
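
The abnormality rule can be sketched as below. The velocities are made up, and using sd() as the SD estimate is an assumption: this page does not say how velout estimates the SD, so the function may well use a different (for example, more robust) estimate.

```r
# Made-up pooled velocities: 60 ordinary values plus one extreme value
v <- c(rep(c(4, 5, 6), 20), 25)
limit <- 5

# Assumption: sd() stands in for whatever SD estimate velout uses
abnormal <- abs(v - median(v)) > limit * sd(v)
print(which(abnormal))
# 61
```

Lowering limit flags more velocities; the example at the end of this page uses limit=3 rather than the default of 5.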

See Also

codeplot, zapvelout

Examples

outliers <- velout(age, height, id, heights, limit=3)