rmweather (version 0.2.62)

rmw_train_model: Function to train a random forest model to predict (usually) pollutant concentrations using meteorological and time variables.

Description

Function to train a random forest model to predict (usually) pollutant concentrations using meteorological and time variables.

Usage

rmw_train_model(
  df,
  variables,
  n_trees = 300,
  mtry = NULL,
  min_node_size = 5,
  keep_inbag = TRUE,
  n_cores = NA,
  verbose = FALSE
)

Value

A ranger model object, a named list.

Arguments

df

Input tibble after preparation with rmw_prepare_data. df must satisfy a number of constraints, which are checked before modelling.

variables

Names of the independent/explanatory variables in df used to predict "value".

n_trees

Number of trees to grow to make up the forest.

mtry

Number of variables to possibly split at in each node. Default is the (rounded down) square root of the number of variables.

min_node_size

Minimal node size.

keep_inbag

Should in-bag data be kept in the ranger model object? This needs to be TRUE if standard errors are to be calculated when predicting with the model.

n_cores

Number of CPU cores to use for the model calculation. The default is the system's total number of cores minus one.

verbose

Should the function give messages?
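
The documented defaults for mtry and n_cores can be reproduced by hand, which may help when choosing explicit values. The following is a minimal sketch in base R under stated assumptions: the names mtry_default and n_cores_default are hypothetical and not part of rmweather, and using parallel::detectCores() to count cores is an assumption about how "system's total" might be obtained, not a statement of how the package does it.

```r
# Illustrative only: these objects are not part of rmweather
variables <- c(
  "ws", "wd", "air_temp", "rh", "date_unix", "day_julian", "weekday", "hour"
)

# Documented default for mtry: rounded-down square root of the number of
# variables; with eight variables this is floor(sqrt(8)) = 2
mtry_default <- floor(sqrt(length(variables)))

# Documented default for n_cores: the system's total minus one
# Assumption: the total is counted with parallel::detectCores()
n_cores_total <- parallel::detectCores()

# detectCores() can return NA on some platforms, so fall back to one core
# and never go below one
n_cores_default <- if (is.na(n_cores_total)) 1L else max(1L, n_cores_total - 1L)
```

Passing such values explicitly, for example mtry = mtry_default, makes a model specification independent of any future change in the defaults.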

Author

Stuart K. Grange

See Also

rmw_prepare_data, rmw_normalise

Examples

# \donttest{

# Load packages
library(dplyr)
library(rmweather)

# Keep things reproducible
set.seed(123)

# Prepare example data
data_london_prepared <- data_london %>% 
  filter(variable == "no2") %>% 
  rmw_prepare_data()

# Calculate a model using common meteorological and time variables
model <- rmw_train_model(
  data_london_prepared,
  variables = c(
    "ws", "wd", "air_temp", "rh", "date_unix", "day_julian", "weekday", "hour"
  ),
  n_trees = 300
)

# }
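
The keep_inbag argument matters because standard errors at prediction time rely on the in-bag counts stored in the model. The sketch below illustrates that dependency by calling ranger directly on the built-in airquality data rather than going through rmw_train_model; it assumes the ranger package is installed, and the mapping of n_trees, min_node_size, and keep_inbag onto ranger's num.trees, min.node.size, and keep.inbag is an assumption based on the argument names, not something this page confirms.

```r
library(ranger)

# Keep things reproducible
set.seed(123)

# Built-in example data: ozone with a few meteorological variables
# Rows with missing values are dropped before modelling
data_air <- na.omit(airquality)

# Grow a forest and keep the in-bag counts, which are needed for standard errors
fit <- ranger(
  Ozone ~ Solar.R + Wind + Temp + Month,
  data = data_air,
  num.trees = 300,
  min.node.size = 5,
  keep.inbag = TRUE,
  num.threads = 1
)

# Predict with standard errors for the first ten observations
new_data <- data_air[1:10, ]
pred <- predict(fit, data = new_data, type = "se", num.threads = 1)

# pred$predictions holds the predicted values, pred$se the standard errors
head(data.frame(prediction = pred$predictions, se = pred$se))
```

If the forest had been grown with keep.inbag = FALSE, the predict call with type = "se" would fail, which is why keep_inbag defaults to TRUE here.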