mcs_mileage: Simulation of Unknown Covered Distances using a Monte Carlo Approach

Description

This function simulates covered distances for units whose distance is unknown, i.e. mileage = NA.

First, random numbers are drawn from the annual mileage distribution estimated by dist_mileage. Second, the drawn annual distances are scaled linearly to the actual operating times (in days). See 'Details'.
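
As a rough illustration of these two steps, the following base R sketch simulates missing distances for a lognormal annual mileage distribution. It is not the package's implementation: the toy data are hypothetical, and estimating the lognormal parameters as the mean and standard deviation of the log annual distances is an assumption about what dist_mileage does.

```r
# Hypothetical toy data: distances (km) and operating times (days).
mileage <- c(NA, 15655, 13629, 18292, NA, 33555)
time    <- c(501, 170, 210, 213, 596, 312)

known <- !is.na(mileage)

# Annualise the known distances using the linear relationship.
annual_known <- mileage[known] / time[known] * 365

# Assumed lognormal fit: mean and sd of the log annual distances.
mu    <- mean(log(annual_known))
sigma <- sd(log(annual_known))

# Step 1: draw one annual distance per unit with unknown mileage.
set.seed(1234)
annual_drawn <- rlnorm(sum(!known), meanlog = mu, sdlog = sigma)

# Step 2: scale the drawn annual distances to the actual operating times.
simulated <- mileage
simulated[!known] <- annual_drawn * time[!known] / 365
simulated
```

Known distances are passed through unchanged; only the NA entries are replaced by simulated values.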

Usage

mcs_mileage(
  mileage,
  time,
  status = NULL,
  id = paste0("ID", seq_len(length(time))),
  distribution = c("lognormal", "exponential")
)

Value

A list containing the following elements:

data : A tibble with classes wt_mcs_data and wt_reliability_data if status is provided. Since the class wt_reliability_data enables the direct usage of data inside estimate_cdf.wt_reliability_data, the required lifetime characteristic is automatically set to the distance mileage.

If status = NULL, the class is wt_mcs_data only, which estimate_cdf does not support because status is missing.

The tibble contains the following columns:
- mileage : Simulated distances for unknown mileage and input distances for known mileage.
- time : Input operating times.
- status (optional) :
  - If argument status = NULL, the column status does not exist.
  - If argument status is provided, the column contains the entered binary data (0 or 1).
- id : Identification of every unit.
sim_data : A tibble with column sim_mileage that holds the simulated distances for unknown mileage and 0 otherwise.
model_estimation : A list containing a named list ("mileage_distribution") with output of dist_mileage.

Arguments

mileage: A numeric vector of distances covered. Use NA for missing elements.
time: A numeric vector of operating times. Use NA for missing elements.
status: Optional argument. If used, it must be a vector of binary data (0 or 1) indicating whether unit i is a right-censored observation (= 0) or a failure (= 1). The effect of status on the return value is described in 'Value'.
id: A vector for the identification of every unit.
distribution: Assumed distribution of the random variable (annual mileage), either "lognormal" or "exponential".

Details

Assumption of linear relationship: Suppose the distance covered by a vehicle is unknown. A distance of 3500.25 kilometers (km) is drawn from the annual mileage distribution, and the known operating time is 200 days (d). The resulting distance for this vehicle is then $$3500.25\,\text{km} \cdot \frac{200\,\text{d}}{365\,\text{d}} \approx 1917.945\,\text{km}$$
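
The arithmetic of this example can be checked directly in R. The variable names below are purely illustrative and are not part of the package.

```r
# Values taken from the worked example above.
annual_distance_km <- 3500.25  # drawn from the annual mileage distribution
operating_time_d   <- 200      # known operating time in days

# Linear conversion from an annual distance to the operating time.
simulated_distance_km <- annual_distance_km * operating_time_d / 365

round(simulated_distance_km, 3)
# → 1917.945
```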

Examples

# Data for examples:
date_of_registration <- c("2014-08-17", "2014-03-29", "2014-12-06",
                          "2014-09-09", "2014-05-14", "2014-07-01",
                          "2014-06-16", "2014-04-03", "2014-05-23",
                          "2014-05-09", "2014-05-31", "2014-08-12",
                          "2014-04-13", "2014-02-15", "2014-07-07",
                          "2014-03-12", "2014-05-27", "2014-06-02",
                          "2014-05-20", "2014-03-21", "2014-06-19",
                          "2014-02-12", "2014-03-27")
date_of_repair       <- c(NA, "2014-09-15", "2015-07-04", "2015-04-10", NA,
                          NA, "2015-04-24", NA, "2015-04-25", "2015-04-24",
                          "2015-06-12", NA, "2015-05-04", NA, NA, "2015-05-22",
                          NA, "2015-09-17", NA, "2015-08-15", "2015-11-26",
                          NA, NA)
date_of_analysis     <- "2015-12-31"

## Assume that mileage is only known for units that have failed
## (i.e. !is.na(date_of_repair)).
mileage              <- c(NA, 15655, 13629, 18292, NA, NA, 33555, NA, 21737,
                          29870, 21068, NA, 122283, NA, NA, 36088, NA, 11153,
                          NA, 122842, 20349, NA, NA)

## time in service is the difference between repair and registration for failed
## items and the difference between date of analysis and date of registration
## for intact units.
time_in_service <- difftime(
  as.Date(date_of_repair, format = "%Y-%m-%d"),
  as.Date(date_of_registration, format = "%Y-%m-%d"),
  units = "days"
)
time_in_service[is.na(time_in_service)] <- difftime(
  as.Date(date_of_analysis, format = "%Y-%m-%d"),
  as.Date(date_of_registration[is.na(time_in_service)], format = "%Y-%m-%d"),
  units = "days"
)
time_in_service <- as.numeric(time_in_service)

# Example 1 - Reproducibility of drawn random numbers:
set.seed(1234)
mcs_distances <- mcs_mileage(
  mileage = mileage,
  time = time_in_service,
  distribution = "lognormal"
)

# Example 2 - MCS for distances assuming an exponential annual mileage distribution:
mcs_distances_2 <- mcs_mileage(
  mileage = mileage,
  time = time_in_service,
  distribution = "exponential"
)

status <- ifelse(!is.na(date_of_repair), 1, 0)

# Example 3 - MCS for distances using status:
mcs_distances_3 <- mcs_mileage(
  mileage = mileage,
  time = time_in_service,
  status = status,
  distribution = "lognormal"
)

## Using result of *$data in estimate_cdf()
prob_estimation <- estimate_cdf(
  x = mcs_distances_3$data,
  methods = "kaplan"
)

plot_prob_estimation <- plot_prob(prob_estimation)