milkloss_detect: Identify milk loss events and resilience indicators from daily milk yields

Description

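Identifies milk loss events (perturbation episodes) in daily milk yield records by comparing the observed yield with a predicted baseline, and computes resilience indicators for each event.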

Usage

milkloss_detect(
  data,
  id_col,
  dim_col,
  MY_col = "MY_real",
  MY_pred,
  dim_start = 1L,
  dim_end = 305L,
  rec_mode = c("pctbase", "band", "resid"),
  drop_pct = 0.1,
  min_len = 1L,
  tol = 0.05,
  stick = 3L,
  rec = 1
)

Value

A list with two data frames:

episodes: individual milk loss events and their resilience indicators;
aggregates: milk loss events aggregated per individual.

The resilience indicators identified are described in the Details section.
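
As an illustration of the inputs and the returned list, the sketch below builds a toy data frame and calls the function as shown in Usage. The column names cow_id and DIM, the simulated dip, and the chosen argument values are assumptions made for this example only. The call is guarded so that the snippet also runs when the package providing milkloss_detect is not loaded.

```r
# Toy lactation: one animal, 60 days, with a simulated dip on DIM 20-27.
dim <- 1:60
baseline <- 35 - 0.05 * dim                      # hypothetical predicted yield
obs <- baseline
obs[20:27] <- baseline[20:27] - c(3, 6, 8, 6, 4, 3, 2, 1)

toy <- data.frame(
  cow_id  = "A",        # hypothetical ID column name
  DIM     = dim,        # hypothetical days-in-milk column name
  MY_real = obs,        # observed yield (matches the default MY_col)
  MY_pred = baseline    # predicted yield (baseline)
)

# Only call the function if it is available in the current session.
if (exists("milkloss_detect")) {
  res <- milkloss_detect(
    data = toy,
    id_col = "cow_id",
    dim_col = "DIM",
    MY_col = "MY_real",
    MY_pred = "MY_pred",
    dim_start = 1L,
    dim_end = 60L,
    rec_mode = "band",
    tol = 0.05,
    stick = 3L
  )
  str(res$episodes)     # one row per milk loss event
  str(res$aggregates)   # one row per individual
}
```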

Arguments

data: A data frame containing the observed and predicted daily milking records.
id_col: The name of the column containing the individual IDs.
dim_col: The name of the column containing the days in milk.
MY_col: The name of the column containing the observed milk yield.
MY_pred: The name of the column containing the predicted milk yield (baseline).
dim_start: The first day in milk to consider when identifying milk loss events and resilience indicators.
dim_end: The last day in milk to consider when identifying milk loss events and resilience indicators.
rec_mode: How "recovery" is defined. One of:
  "pctbase": recovery when the observed value reaches a given fraction of the baseline (rec), for a given number of consecutive days (stick);
  "band": recovery when the observation is inside a tolerance band around the baseline (+/- tol), for at least stick consecutive days;
  "resid": recovery when the residual has improved enough from the nadir (by a fraction rec of the nadir's absolute residual) for stick consecutive days.
drop_pct: Minimum relative drop from the anchor (baseline reference) to accept an episode.
min_len: Minimum number of consecutive days with negative residuals required to define an episode.
tol: Half-width of the tolerance band around the baseline, in relative terms (used only in "band" mode).
stick: Minimum number of consecutive days in recovery to consider an episode finished.
rec: Minimum relative recovery from the nadir to finish an episode (used in "pctbase" and "resid" modes).
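
To make the three rec_mode options more concrete, here is a minimal sketch of one possible reading of each criterion. The helper names recovered_flag and first_sticky are hypothetical, the exact inequalities are an interpretation of the argument descriptions above rather than the package's code, and the vectors are assumed to start at the nadir with one record per day.

```r
# Daily flag: does each post-nadir day meet the recovery criterion?
# Assumes element 1 of obs/baseline is the nadir (interpretation, not package code).
recovered_flag <- function(obs, baseline, mode = c("pctbase", "band", "resid"),
                           tol = 0.05, rec = 1) {
  mode <- match.arg(mode)
  resid <- obs - baseline
  switch(mode,
    pctbase = obs >= rec * baseline,                       # fraction of baseline
    band    = abs(resid) <= tol * baseline,                # inside +/- tol band
    resid   = resid >= resid[1] + rec * abs(resid[1])      # improved from nadir
  )
}

# Index of the first day that starts a run of `stick` consecutive TRUE flags.
first_sticky <- function(flag, stick = 3L) {
  n <- length(flag)
  if (n < stick) return(NA_integer_)
  for (i in seq_len(n - stick + 1L)) {
    if (all(flag[i:(i + stick - 1L)])) return(i)
  }
  NA_integer_
}

dim      <- 13:20                                   # nadir at DIM 13
obs      <- c(22, 24, 26, 28, 29.5, 30, 30, 30)
baseline <- rep(30, length(dim))

day_pctbase <- dim[first_sticky(recovered_flag(obs, baseline, "pctbase", rec = 1))]
day_band    <- dim[first_sticky(recovered_flag(obs, baseline, "band", tol = 0.05))]
day_resid   <- dim[first_sticky(recovered_flag(obs, baseline, "resid", rec = 0.5))]
```

On this toy series the strictest criterion ("pctbase" with rec = 1) declares recovery latest, and "resid" with rec = 0.5 declares it earliest.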

Details

The function computes several descriptors of milk-yield perturbation episodes.

1) Nadir (day of minimum)

The worst day inside the episode (deepest point of the perturbation).

t_hat = argmin_{t in [t_start, t_end]} obs(t)

Nadir = obs(t_hat)

where t_start and t_end are the episode boundaries.
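
The definition above can be sketched in a few lines of base R. The toy vectors and the episode boundaries are assumptions for illustration; note that which.min returns the first minimum if the lowest yield occurs on more than one day.

```r
dim <- 10:20
obs <- c(30, 28, 25, 22, 24, 26, 28, 29.5, 30, 30, 30)
t_start <- 11   # hypothetical episode boundaries
t_end   <- 17

in_ep <- dim >= t_start & dim <= t_end            # rows inside the episode
i_hat <- which(in_ep)[which.min(obs[in_ep])]      # row index of the minimum
t_hat <- dim[i_hat]                               # day of the nadir
nadir <- obs[i_hat]                               # yield at the nadir
```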

2) Amplitude (drop)

Depth of the dip relative to the baseline at the episode start.

A = baseline(t_start) - obs(t_hat)

Some variants use baseline(t_hat) instead of baseline(t_start); here the start of the episode is used as the reference.
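
A small numeric sketch of the two reference choices, using made-up values in which the baseline declines by 0.1 per day. Only the first quantity corresponds to the definition of A used here; the second is shown for comparison.

```r
dim      <- 11:17                                  # episode days only
obs      <- c(28, 25, 22, 24, 26, 28, 29)
baseline <- 30 - 0.1 * (dim - 10)                  # hypothetical declining baseline

i_hat <- which.min(obs)                            # nadir row (DIM 13)

A_start <- baseline[1] - obs[i_hat]                # reference: baseline(t_start)
A_nadir <- baseline[i_hat] - obs[i_hat]            # variant: baseline(t_hat)
```

With a declining baseline, the start-referenced amplitude is slightly larger than the nadir-referenced one.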

3) ML_per_event (AUD)

Total milk lost (in baseline units) over the episode, i.e., the integrated milk deficit.

ML_per_event = AUD = sum_{t=t_start..t_end} [baseline(t) - obs(t)]

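Because records may be missing for some DIMs, AUD is computed with day-weighting: each observation contributes

(baseline(t) - obs(t)) * delta_days

where delta_days is the gap to the next observed DIM (the last observation has weight 1). With complete daily records every delta_days equals 1, and the weighted sum reduces to the plain sum above.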
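
The day-weighting can be sketched as follows on a toy episode with one missing test day (DIM 14). All values are made up for illustration.

```r
dim      <- c(11, 12, 13, 15, 16, 17)      # DIM 14 is missing
obs      <- c(28, 25, 22, 24, 27, 29)
baseline <- rep(30, length(dim))

deficit    <- baseline - obs               # daily milk deficit
delta_days <- c(diff(dim), 1)              # gap to next record; last weight = 1

AUD            <- sum(deficit * delta_days)   # day-weighted milk loss
AUD_unweighted <- sum(deficit)                # ignores the gap, for comparison
```

The record on DIM 13 is counted twice because it also stands in for the missing DIM 14.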

4) Time-to-baseline (TTB)

Time after the nadir until the profile returns to (and stays near) the baseline.

Recovery is declared when obs(t) re-enters a tolerance band around the baseline and stays there for stick consecutive days (controlled by tol and stick).

We find the smallest tau >= 0 such that for all u in the interval from t_hat + tau to t_hat + tau + stick - 1:

abs(obs(u) - baseline(u)) <= tol * baseline(u)

Then: TTB = tau.

If this condition is never satisfied before the end of the analysis window (dim_end, DIM 305 by default), TTB is set to NA (right-censored).
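
The search for tau can be sketched as below. The helper name ttb_band is hypothetical, and the sketch assumes one record per day with no gaps, so that consecutive rows are consecutive days.

```r
# Smallest tau >= 0 such that obs stays within +/- tol * baseline
# for `stick` consecutive days starting at t_hat + tau; NA if never.
ttb_band <- function(dim, obs, baseline, t_hat, tol = 0.05, stick = 3L) {
  in_band <- abs(obs - baseline) <= tol * baseline
  start   <- which(dim == t_hat)
  stopifnot(length(start) == 1)
  n <- length(dim)
  for (i in seq(start, n)) {
    j <- i + stick - 1L
    if (j > n) break                        # not enough days left to confirm
    if (all(in_band[i:j])) return(dim[i] - t_hat)
  }
  NA_real_                                  # right-censored
}

dim      <- 10:20
obs      <- c(30, 28, 25, 22, 24, 26, 28, 29.5, 30, 30, 30)
baseline <- rep(30, length(dim))

ttb_ok       <- ttb_band(dim, obs, baseline, t_hat = 13)
ttb_censored <- ttb_band(dim, pmin(obs, 27), baseline, t_hat = 13)  # never recovers
```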

5) Recovery half-life (t_1_2)

Earliest time after nadir when half of the drop has been recovered.

With amplitude A as above, define the half-recovery level:

L_half = baseline(t_start) - A / 2

Then:

t_1_2 = min{tau >= 0 : obs(t_hat + tau) >= L_half}
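
A worked sketch of the half-life on toy values follows. Returning NA when the half-recovery level is never reached is an assumption of this sketch; the text above does not say how that case is handled.

```r
dim      <- 10:20
obs      <- c(30, 28, 25, 22, 24, 26, 28, 29.5, 30, 30, 30)
baseline <- rep(30, length(dim))
t_start  <- 11
t_hat    <- 13

A      <- baseline[dim == t_start] - obs[dim == t_hat]   # amplitude
L_half <- baseline[dim == t_start] - A / 2               # half-recovery level

post <- dim >= t_hat                     # days from the nadir onward
tau  <- dim[post] - t_hat                # elapsed days since the nadir
hit  <- which(obs[post] >= L_half)       # days at or above the half level

t_half <- if (length(hit) > 0) tau[hit[1]] else NA_real_  # NA is an assumption
```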

6) Slopes (decline and recovery)

Average daily change during the decline into the nadir and during early recovery, summarizing the episode shape.

For a K-day local window:

DeclineSlope = (obs(min(t_hat, t_start + K)) - obs(t_start)) / (min(t_hat, t_start + K) - t_start)

RecoverySlope = (obs(min(t_end, t_hat + K)) - obs(t_hat)) / (min(t_end, t_hat + K) - t_hat)
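
The two slope formulas can be evaluated directly. In this sketch K = 3 is an arbitrary choice (K is not one of the function arguments listed above), and obs_at is a hypothetical lookup helper. If the nadir falls on t_start, the decline denominator is zero and the slope is undefined.

```r
dim <- 10:20
obs <- c(30, 28, 25, 22, 24, 26, 28, 29.5, 30, 30, 30)
t_start <- 11
t_hat   <- 13
t_end   <- 17
K       <- 3                                  # hypothetical window length

obs_at <- function(d) obs[match(d, dim)]      # yield on a given DIM

t_dec <- min(t_hat, t_start + K)              # end of decline window
t_rec <- min(t_end, t_hat + K)                # end of recovery window

decline_slope  <- (obs_at(t_dec) - obs_at(t_start)) / (t_dec - t_start)
recovery_slope <- (obs_at(t_rec) - obs_at(t_hat)) / (t_rec - t_hat)
```

A negative decline slope and a positive recovery slope are the expected signs for a dip followed by a rebound.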

7) AUC_deviation

Trapezoidal area under the curve of the milk deficit baseline(t) - obs(t) across the whole episode. It summarizes how much milk was lost and for how long.

Conceptually:

AUC_deviation = integral_{t_start..t_end} [baseline(t) - obs(t)] dt

In practice this is approximated via the trapezoidal rule on discrete DIMs.
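
A minimal base R sketch of the trapezoidal rule applied to the milk deficit, on made-up values:

```r
dim      <- 11:17
obs      <- c(28, 25, 22, 24, 26, 28, 29.5)
baseline <- rep(30, length(dim))

deficit <- baseline - obs                       # milk deficit per day

# Trapezoid between each pair of consecutive records:
# width * mean of the two deficits.
widths  <- diff(dim)
heights <- (head(deficit, -1) + tail(deficit, -1)) / 2

AUC_deviation <- sum(widths * heights)
```

Because diff(dim) supplies the interval widths, the same code also handles unevenly spaced DIMs.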

8) prod_decline_slope_amp

Product of the decline slope (anchor -> nadir) and the amplitude (anchor - nadir). It combines speed and depth of the decline into a single indicator of how "aggressive" the drop is.

prod_decline_slope_amp = DeclineSlope * A

9) prod_recovery_slope_TTB

Product of the recovery slope (nadir -> recovery) and time-to-baseline (TTB). It combines how fast the animal recovers with how long recovery takes, summarizing recovery efficiency.

prod_recovery_slope_TTB = RecoverySlope * TTB
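
The two product indicators are simple multiplications. The sketch below uses illustrative input values and shows that a censored TTB (NA) propagates to its product.

```r
decline_slope  <- -3    # illustrative values, kg/day
recovery_slope <- 2
amplitude      <- 8     # kg
ttb            <- 4     # days

prod_decline_slope_amp  <- decline_slope * amplitude    # negative for a real drop
prod_recovery_slope_TTB <- recovery_slope * ttb

# If recovery was never observed, TTB is NA and so is the product.
prod_censored <- recovery_slope * NA_real_
```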