LogisticReg: Downscaling using interpolation and logistic regression.

Description

This function performs downscaling by combining an interpolation with a logistic regression. See multinom for further details. It is recommended that the observations are already on the target grid; otherwise, the function will also interpolate the observed field onto the target grid. The coarse-scale and observational data can be either global or regional. In the latter case, the region is defined by the user. In principle, the coarse-scale and observational data are intended to be of the same variable, although different variables are also admitted.
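The logistic-regression step can be pictured at a single grid point. The sketch below is a minimal illustration under stated assumptions, not the CSDownscale implementation: it uses synthetic data, takes the ensemble-mean anomaly as the only predictor (the idea behind log_reg_method = "ens_mean") and fits nnet::multinom to the observed tercile categories. The names hcst, pred and obs_cat are hypothetical.

```r
# Minimal sketch, NOT the package code: tercile logistic regression at one
# grid point with nnet::multinom and the ensemble-mean anomaly as predictor.
library(nnet)

set.seed(1)
n_sdate <- 30
n_member <- 5

# Hypothetical hindcast at one grid point, laid out as [sdate, member]
hcst <- matrix(rnorm(n_sdate * n_member), nrow = n_sdate)
# Hypothetical observations, correlated with the ensemble mean
obs <- rowMeans(hcst) + rnorm(n_sdate, sd = 0.5)

# Predictor: ensemble-mean anomaly with respect to the hindcast climatology
ens_mean <- rowMeans(hcst)
pred <- ens_mean - mean(ens_mean)

# Predictand: observed tercile category (1 = lower, 2 = middle, 3 = upper)
thr <- quantile(obs, probs = c(1/3, 2/3))
obs_cat <- factor(findInterval(obs, thr) + 1, levels = 1:3)

# Multinomial logistic regression; trace = FALSE silences the optimiser log
fit <- multinom(cat ~ pred, data = data.frame(cat = obs_cat, pred = pred),
                trace = FALSE)

# Probability of each category for a new, hypothetical ensemble-mean anomaly
p <- predict(fit, newdata = data.frame(pred = 0.8), type = "probs")
print(round(p, 3))
```

The three probabilities sum to one; with LOOCV and on a full grid, the same fit would be repeated per grid point and per left-out year.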

Usage

LogisticReg(
  exp,
  obs,
  exp_cor = NULL,
  exp_lats,
  exp_lons,
  obs_lats,
  obs_lons,
  target_grid,
  int_method = NULL,
  log_reg_method = "ens_mean",
  probs_cat = c(1/3, 2/3),
  return_most_likely_cat = FALSE,
  points = NULL,
  method_point_interp = NULL,
  lat_dim = "lat",
  lon_dim = "lon",
  sdate_dim = "sdate",
  member_dim = "member",
  time_dim = "time",
  source_file_exp = NULL,
  source_file_obs = NULL,
  region = NULL,
  loocv = TRUE,
  ncores = NULL
)

Value

A list of three elements. 'data' contains the downscaled data, either as the probability of each category or as the most likely category. 'lat' contains the downscaled latitudes, and 'lon' the downscaled longitudes.
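When return_most_likely_cat = TRUE, the per-category probabilities are reduced to a single category. A minimal sketch of that reduction, assuming a hypothetical matrix of probabilities laid out as [sdate, category]; the actual dimension names and layout of 'data' may differ:

```r
# Hypothetical probabilities for 3 start dates and 3 categories (rows sum to 1)
probs <- matrix(c(0.50, 0.30, 0.20,
                  0.10, 0.35, 0.55,
                  0.25, 0.45, 0.30),
                nrow = 3, byrow = TRUE,
                dimnames = list(sdate = 1:3,
                                cat = c("lower", "middle", "upper")))

# Index of the category with the highest probability for each start date
most_likely <- apply(probs, 1, which.max)
print(most_likely)
```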

Arguments

exp: an array with named dimensions containing the experimental field on the coarse scale to be downscaled. The object must have at least the latitude, longitude, start date and member dimensions. The object is expected to be already subset for the desired region. The data can cover a single region or two adjoining regions, e.g., a region crossing the Greenwich meridian. In the latter case, the borders of the region must be specified in the parameter 'region' to obtain correct results. See parameter 'region'.
obs: an array with named dimensions containing the observational field. The object must have, at least, the dimensions latitude, longitude and start date. The object is expected to be already subset for the desired region.
exp_cor: an optional array with named dimensions containing the seasonal forecast experiment data. If the forecast is provided, it will be downscaled using the hindcast and observations; if not provided, the hindcast will be downscaled instead. The default value is NULL.
exp_lats: a numeric vector containing the latitude values in 'exp'. Latitudes must range from -90 to 90.
exp_lons: a numeric vector containing the longitude values in 'exp'. Longitudes can range from -180 to 180 or from 0 to 360.
obs_lats: a numeric vector containing the latitude values in 'obs'. Latitudes must range from -90 to 90.
obs_lons: a numeric vector containing the longitude values in 'obs'. Longitudes can range from -180 to 180 or from 0 to 360.
target_grid: a character vector indicating the target grid to be passed to CDO. It must be a grid recognised by CDO or a NetCDF file.
int_method: a character vector indicating the regridding method to be passed to CDORemap. Accepted methods are "con", "bil", "bic", "nn" and "con2". Method "nn" requires CDO version 1.9.8 or newer; method "con2" requires CDO version 2.2.2 or older.
log_reg_method: a character vector indicating the logistic regression method to be used. Accepted methods are "ens_mean", "ens_mean_sd" and "sorted_members". "ens_mean" uses the ensemble-mean anomalies as predictors in the logistic regression; "ens_mean_sd" uses the ensemble-mean anomalies and the ensemble spread (computed as the standard deviation across all members) as predictors; and "sorted_members" uses all the members, sorted in decreasing order, as predictors. Default is "ens_mean".
probs_cat: a numeric vector indicating the percentile thresholds that split the climatological distribution into classes (categories). Default is c(1/3, 2/3), i.e. terciles. See convert2prob.
return_most_likely_cat: if TRUE, the function returns the most likely category; if FALSE, it returns the probability of each category. Default is FALSE.
points: a list of two elements containing the latitudes and longitudes of the point locations to which the model data are downscaled. The two elements must be named as indicated by the parameters 'lat_dim' and 'lon_dim'. When downscaling to point locations, only regular grids are allowed for exp and obs. Only needed if the downscaling is to point locations.
method_point_interp: a character vector indicating the interpolation method to interpolate model gridded data into the point locations. Accepted methods are "nearest", "bilinear", "9point", "invdist4nn", "NE", "NW", "SE", "SW". Only needed if the downscaling is to a point location.
lat_dim: a character vector indicating the name of the latitude dimension in exp and obs. Default is "lat".
lon_dim: a character vector indicating the name of the longitude dimension in exp and obs. Default is "lon".
sdate_dim: a character vector indicating the name of the start date dimension in exp and obs. Default is "sdate".
member_dim: a character vector indicating the name of the member dimension in exp and obs. Default is "member".
time_dim: a character vector indicating the name of the time dimension in exp and obs. Default is "time".
source_file_exp: a character vector with a path to an example file of the exp data. Only needed if the downscaling is to a point location.
source_file_obs: a character vector with a path to an example file of the obs data. Only needed if the downscaling is to a point location.
region: a numeric vector indicating the borders of the downscaling region. It consists of four elements in this order: lonmin, lonmax, latmin, latmax. lonmin refers to the left border, while lonmax refers to the right border. latmin indicates the lower border, whereas latmax indicates the upper border. If set to NULL (default), the function takes the first and last elements of the latitudes and longitudes in obs.
loocv: a logical value indicating whether to perform leave-one-out cross-validation when fitting the logistic regression. In this procedure, all values from the corresponding year are excluded, so that none of a given year's data is used when fitting the model for that year. Default is TRUE.
ncores: an integer indicating the number of cores to use in parallel computation. The default value is NULL.
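The three options of log_reg_method differ only in the predictors passed to the regression. A minimal sketch that builds each predictor set at one grid point from a hypothetical hindcast laid out as [sdate, member]; it follows the parameter description, not the package source, and for "sorted_members" it assumes raw member values (whether the package sorts raw values or anomalies is not stated above):

```r
set.seed(42)
# Hypothetical hindcast at one grid point: 10 start dates x 5 members
hcst <- matrix(rnorm(10 * 5), nrow = 10)

ens_mean <- rowMeans(hcst)
ens_mean_anom <- ens_mean - mean(ens_mean)  # anomaly w.r.t. the sdate mean

# "ens_mean": a single predictor, the ensemble-mean anomaly
X_ens_mean <- data.frame(ens_mean = ens_mean_anom)

# "ens_mean_sd": ensemble-mean anomaly plus spread (sd across members)
X_ens_mean_sd <- data.frame(ens_mean = ens_mean_anom,
                            ens_sd = apply(hcst, 1, sd))

# "sorted_members": every member as a predictor, sorted in decreasing order
sorted <- t(apply(hcst, 1, sort, decreasing = TRUE))
X_sorted <- as.data.frame(sorted)
colnames(X_sorted) <- paste0("m", seq_len(ncol(sorted)))
```

Sorting makes the columns exchangeable across start dates: column m1 is always the warmest (largest) member, regardless of which member number produced it.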
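The probs_cat thresholds turn the observed climatology into categories. The package refers to convert2prob for this step; the base-R sketch below only illustrates the idea with nine hypothetical observed values and the default terciles:

```r
# Hypothetical observed climatology at one location
obs <- c(2.1, 3.4, 1.8, 4.0, 2.9, 3.7, 1.2, 2.5, 3.1)
probs_cat <- c(1/3, 2/3)

# Category boundaries from the empirical quantiles
thresholds <- quantile(obs, probs = probs_cat)
# 1 = below the first threshold, 2 = between, 3 = at or above the second
category <- findInterval(obs, thresholds) + 1
table(category)
```

With these nine values each category receives three of them, as expected for terciles; passing, for example, c(0.1, 0.9) instead would define a lower decile, a central class and an upper decile.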
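The loocv option follows the usual leave-one-out pattern over start dates. In the minimal sketch below, a linear model stands in for the logistic regression purely to keep the example short, and the data are synthetic; only the loop structure, which excludes the target year from the fit, reflects the description of loocv:

```r
set.seed(7)
years <- 2001:2010
x <- rnorm(length(years))                  # hypothetical predictor
y <- 0.8 * x + rnorm(length(years), sd = 0.3)  # hypothetical predictand

pred_cv <- numeric(length(years))
for (i in seq_along(years)) {
  train <- setdiff(seq_along(years), i)    # drop every value of year i
  fit <- lm(y ~ x, data = data.frame(x = x[train], y = y[train]))
  pred_cv[i] <- predict(fit, newdata = data.frame(x = x[i]))
}
```

Each prediction therefore comes from a model that never saw that year, which avoids overly optimistic skill estimates at the cost of one fit per start date.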

Author

J. Ramon, jaumeramong@gmail.com

E. Duzenli, eren.duzenli@bsc.es

Examples


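# Requires the CSDownscale package and a CDO installation on the system path
library(CSDownscale)

# Synthetic hindcast on a coarse 4 x 5 grid: 5 members, 15 start dates
exp <- rnorm(1500)
dim(exp) <- c(member = 5, lat = 4, lon = 5, sdate = 15)
exp_lons <- 1:5
exp_lats <- 1:4

# Synthetic observations on a finer 12 x 15 grid covering the same area
obs <- rnorm(2700)
dim(obs) <- c(lat = 12, lon = 15, sdate = 15)
obs_lons <- seq(1, 5, 4/14)
obs_lats <- seq(1, 4, 3/11)

# Run only if the cdo executable is found, since the regridding relies on CDO
if (Sys.which("cdo") != "") {
  res <- LogisticReg(exp = exp, obs = obs, exp_lats = exp_lats, exp_lons = exp_lons,
                     obs_lats = obs_lats, obs_lons = obs_lons, int_method = 'bil',
                     target_grid = 'r1280x640', probs_cat = c(1/3, 2/3))
}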