rmw_train_model: Function to train a random forest model to predict (usually) pollutant concentrations using meteorological and time variables.

Description

Trains a random forest model to predict a dependent variable, usually a pollutant concentration, from meteorological and time variables.

Usage

rmw_train_model(
  df,
  variables,
  n_trees = 300,
  mtry = NULL,
  min_node_size = 5,
  keep_inbag = TRUE,
  n_cores = NA,
  verbose = FALSE
)

Value

A ranger model object, a named list.
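
As a rough illustration of what a ranger model object looks like, the sketch below fits a small forest with ranger::ranger directly and inspects a few of its elements. The built-in airquality data set and the formula are stand-ins chosen for this example only; they are not part of this function, and the object returned by rmw_train_model may differ in detail.

```r
library(ranger)

# Keep things reproducible
set.seed(123)

# Stand-in data: built-in air quality observations with missing rows dropped
df <- na.omit(airquality)

# Fit a small regression forest directly with ranger (illustration only)
model <- ranger(Ozone ~ Wind + Temp, data = df, num.trees = 300)

# The model is a named list with class "ranger"
class(model)
names(model)

# Some elements that are commonly inspected
model$num.trees         # number of trees grown
model$prediction.error  # out-of-bag prediction error
model$r.squared         # out-of-bag R squared
```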

Arguments

df: Input tibble after preparation with rmw_prepare_data. The function checks that df meets a number of constraints before modelling.
variables: Independent/explanatory variables used to predict "value".
n_trees: Number of trees to grow to make up the forest.
mtry: Number of variables to possibly split at in each node. Default is the (rounded down) square root of the number of variables.
min_node_size: Minimum node size.
keep_inbag: Should in-bag data be kept in the ranger model object? This needs to be TRUE if standard errors are to be calculated when predicting with the model.
n_cores: Number of CPU cores to use for the model calculation. The default, NA, uses the system's total number of cores minus one.
verbose: Should the function give messages?
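
The sketch below illustrates two of the arguments above under stated assumptions: how the default mtry follows from the number of variables, and why in-bag data must be kept for standard errors. It calls ranger::ranger directly with its own argument names (num.trees, min.node.size, keep.inbag) on the built-in airquality data set, which is only a stand-in for prepared input; how rmw_train_model passes its arguments on internally is assumed here, not documented above.

```r
library(ranger)

# Keep things reproducible
set.seed(123)

# Default mtry for the eight variables used in the example on this page:
# the rounded-down square root of the number of variables
variables <- c(
  "ws", "wd", "air_temp", "rh", "date_unix", "day_julian", "weekday", "hour"
)
mtry_default <- floor(sqrt(length(variables)))
mtry_default

# Stand-in data: built-in air quality observations with missing rows dropped
df <- na.omit(airquality)

# Keep the in-bag counts so that standard errors can be estimated later
model <- ranger(
  Ozone ~ Solar.R + Wind + Temp + Month,
  data = df,
  num.trees = 300,
  min.node.size = 5,
  keep.inbag = TRUE
)

# type = "se" requires a model trained with keep.inbag = TRUE
predictions <- predict(model, data = df, type = "se")
head(predictions$predictions)
head(predictions$se)
```

Training the same forest with keep.inbag = FALSE would still allow ordinary predictions, but the call with type = "se" would fail, which is why keep_inbag defaults to TRUE here.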

Author

Stuart K. Grange

Examples

# Load packages
library(dplyr)
library(rmweather)

# Keep things reproducible
set.seed(123)

# Prepare example data
data_london_prepared <- data_london %>%
  filter(variable == "no2") %>%
  rmw_prepare_data()

# Train a model using common meteorological and time variables
model <- rmw_train_model(
  data_london_prepared,
  variables = c(
    "ws", "wd", "air_temp", "rh", "date_unix", "day_julian", "weekday", "hour"
  ),
  n_trees = 300
)