daily_response: daily_response

Description

The function calculates all possible values of a selected statistical metric between one or more response variables and daily sequences of environmental data. Calculations are based on a moving window, which is defined by two arguments: the window width and the window's location in the matrix of daily sequences of environmental data. The window width can be fixed (use fixed_width) or variable (use the lower_limit and upper_limit arguments). With a variable width, all window widths between the lower and upper limit are used. All calculated metrics are stored in a matrix: the row name of each stored value indicates the window width, and the column name indicates the window's location in the matrix of daily sequences of environmental data.
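The moving-window idea can be illustrated with a short base R sketch. It is not the dendroTools implementation: the helper name `window_metric_matrix`, the use of Pearson correlation as the metric and leaving windows that would run past the last day as NA are assumptions made for illustration.

```r
# Hypothetical helper, not part of dendroTools: builds a matrix of
# correlations between one response series and window-aggregated daily data.
# Rows = window widths, columns = start day of the window.
window_metric_matrix <- function(response, env, lower_limit, upper_limit,
                                 FUN = mean) {
  n_days <- ncol(env)
  widths <- lower_limit:upper_limit
  out <- matrix(NA_real_, nrow = length(widths), ncol = n_days,
                dimnames = list(widths, seq_len(n_days)))
  for (i in seq_along(widths)) {
    w <- widths[i]
    # Only start days whose window fits inside the available days
    for (start in seq_len(n_days - w + 1)) {
      cols <- start:(start + w - 1)
      # Aggregate the window separately for every year (row) of env
      aggregated <- apply(env[, cols, drop = FALSE], 1, FUN)
      out[i, start] <- cor(response, aggregated)
    }
  }
  out
}

# Toy data: 20 years x 60 days of random "temperatures"
set.seed(1)
env <- matrix(rnorm(20 * 60), nrow = 20,
              dimnames = list(1981:2000, 1:60))
response <- rowMeans(env[, 10:24]) + rnorm(20, sd = 0.1)
m <- window_metric_matrix(response, env, lower_limit = 10, upper_limit = 20)
dim(m)  # 11 60: widths 10..20 by start days 1..60
```

Windows that do not fit (for example a 20-day window starting on day 42 of a 60-day year) stay NA, which is why the upper-right corner of such a matrix is empty.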

Usage

daily_response(
  response,
  env_data,
  method = "cor",
  metric = "r.squared",
  cor_method = "pearson",
  lower_limit = 30,
  upper_limit = 90,
  fixed_width = 0,
  previous_year = FALSE,
  neurons = 1,
  brnn_smooth = TRUE,
  remove_insignificant = FALSE,
  alpha = 0.05,
  row_names_subset = FALSE,
  PCA_transformation = FALSE,
  log_preprocess = TRUE,
  components_selection = "automatic",
  eigenvalues_threshold = 1,
  N_components = 2,
  aggregate_function = "mean",
  temporal_stability_check = "sequential",
  k = 2,
  k_running_window = 30,
  cross_validation_type = "blocked",
  subset_years = NULL,
  plot_specific_window = NULL,
  ylimits = NULL,
  seed = NULL,
  tidy_env_data = FALSE,
  reference_window = "start",
  boot = FALSE,
  boot_n = 1000,
  boot_ci_type = "norm",
  boot_conf_int = 0.95,
  day_interval = ifelse(c(previous_year == TRUE, previous_year == TRUE), c(-1, 366),
    c(1, 366)),
  dc_method = NULL,
  dc_nyrs = NULL,
  dc_f = 0.5,
  dc_pos.slope = FALSE,
  dc_constrain.nls = c("never", "when.fail", "always"),
  dc_span = "cv",
  dc_bass = 0,
  dc_difference = FALSE,
  cor_na_use = "everything"
)

Value

a list with the following elements:

$calculations - a matrix with calculated metrics
$method - the character string of a method
$metric - the character string indicating the metric used for calculations
$analysed_period - a character string specifying the analysed period, based on the row names. If there are no row names, this element is NA
$optimized_return - a data frame with two columns: the response variable and the aggregated daily data (aggregated with aggregate_function) that return the optimal result. This data frame can be used directly to calibrate a model for climate reconstruction
$optimized_return_all - a data frame with the aggregated daily data that returned the optimal result, computed for the entire env_data (not only for the subset of analysed years)
$transfer_function - a ggplot2 object: a scatter plot of the optimized return with the transfer line of the selected method
$temporal_stability - a data frame with calculations of selected metric for different temporal subsets
$cross_validation - a data frame with cross validation results
$plot_heatmap - ggplot2 object: a heatmap of calculated metrics
$plot_extreme - ggplot2 object: line plot of a row with the highest value in a matrix of calculated metrics
$plot_specific - ggplot2 object: line plot of a row with a selected window width in a matrix of calculated metrics
$PCA_output - princomp object: the output of the PCA
$type - a character string describing the type of analysis: daily or monthly
$reference_window - a character string indicating which reference window was used for calculations
$boot_lower - a matrix with the lower limits of the bootstrap confidence intervals
$boot_upper - a matrix with the upper limits of the bootstrap confidence intervals
$aggregated_climate - matrix with all aggregated climate series
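Because $calculations stores window widths as row names and days as column names, the optimal window can be read back from it with base R. The matrix below is a made-up stand-in for $calculations, and treating the largest value as optimal is an assumption that fits r.squared and adj.r.squared; with negative correlations one may instead want the largest absolute value.

```r
# Toy stand-in for a $calculations matrix: rows = window widths,
# columns = day of year (values are made up for illustration)
calc <- matrix(c(0.10, 0.35, 0.20,
                 0.15, 0.52, 0.30,
                 0.05, 0.40, 0.25),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("30", "31", "32"), c("120", "121", "122")))

# Row/column position of the largest value
best <- which(calc == max(calc, na.rm = TRUE), arr.ind = TRUE)
best_width <- as.integer(rownames(calc)[best[1, "row"]])
best_day   <- as.integer(colnames(calc)[best[1, "col"]])
c(width = best_width, day = best_day)  # width 31, day 121
```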

Arguments

response: a data frame with tree-ring proxy variables as columns and (optionally) years as row names. Row names should match those of the env_data data frame; if they do not, set row_names_subset = TRUE.
env_data: a data frame with daily sequences of environmental data as columns and years as row names. Each row represents a year and each column represents a day of the year. Row names should match those of the response data frame; if they do not, set row_names_subset = TRUE. Alternatively, env_data can be a tidy data frame with three columns, i.e. Year, DOY and a third column with values of mean temperature, precipitation sum, etc. If tidy data are passed to the function, set the argument tidy_env_data to TRUE.
method: a character string specifying which method to use. Current possibilities are "cor" (default), "lm" and "brnn".
metric: a character string specifying which metric to use. Current possibilities are "r.squared" and "adj.r.squared". If method = "cor", metric is not relevant.
cor_method: a character string indicating which correlation coefficient is to be computed. One of "pearson" (default), "kendall", or "spearman".
lower_limit: lower limit of window width
upper_limit: upper limit of window width
fixed_width: fixed window width used for calculations. If fixed_width is set to a value other than 0 (the default), upper_limit and lower_limit are ignored
previous_year: if set to TRUE, env_data and response variables are rearranged so that the previous year is also used for calculations of the selected statistical metric.
neurons: positive integer that indicates the number of neurons used for brnn method
brnn_smooth: if set to TRUE, a smoothing algorithm is applied that removes unrealistic calculations which are a result of neural net failure.
remove_insignificant: if set to TRUE, all correlations below the significance threshold, based on the selected alpha, are removed. For the "lm" and "brnn" methods, the squared correlation threshold is used
alpha: significance level used to remove insignificant calculations.
row_names_subset: if set to TRUE, row names are used to subset the env_data and response data frames. Only years present in both data frames are kept.
PCA_transformation: if set to TRUE, all variables in the response data frame will be transformed using PCA transformation.
log_preprocess: if set to TRUE, variables are log-transformed before being used in the PCA
components_selection: character string specifying how to select the principal components used as predictors. There are three options: "automatic", "manual" and "plot_selection". If set to "automatic", all components with eigenvalues above 1 are selected; this threshold can be changed with the eigenvalues_threshold argument. If set to "manual", the user sets the number of components with the N_components argument. If set to "plot_selection", a scree plot is shown and the user must manually enter the number of components to be used as predictors.
eigenvalues_threshold: threshold for automatic selection of Principal Components
N_components: number of Principal Components used as predictors
aggregate_function: character string specifying how the daily data should be aggregated. The default is 'mean', the other options are 'median', 'sum', 'min' and 'max'
temporal_stability_check: character string specifying how the temporal stability between the optimal selection and the response variable(s) is analysed. Current possibilities are "sequential", "progressive" and "running_window". The sequential check splits the data into k splits and calculates the selected metric for each split. The progressive check splits the data into k splits, calculates the metric for the first split and then progressively adds one split at a time, recalculating the selected metric each time. For the running window, set the length of the running window with the k_running_window argument.
k: integer, number of breaks (splits) for temporal stability and cross validation analysis.
k_running_window: the length of the running window for the temporal stability check. Applicable only if temporal_stability_check is set to "running_window".
cross_validation_type: character string specifying how to perform cross validation between the optimal selection and the response variables. If set to "blocked", years are not shuffled. If set to "randomized", years are shuffled.
subset_years: a subset of years to be analyzed. Should be given in the form of subset_years = c(1980, 2005)
plot_specific_window: integer representing window width to be displayed for plot_specific
ylimits: limits of the y axis for plot_extreme and plot_specific, given in the form ylimits = c(0, 1)
seed: optional seed argument for reproducible results
tidy_env_data: if set to TRUE, env_data should be supplied as a data frame with three columns: "Year", "DOY" and "Precipitation/Temperature/etc."
reference_window: character string specifying the day to which each calculation is referenced. There are three options: 'start' (default), 'end' and 'middle'. If set to 'start', each calculation is related to the first day of the window; if set to 'middle', to the middle day of the window; if set to 'end', to the last day of the window. For example, consider a correlation calculated for the window from DOY 15 to DOY 35. With reference_window = 'start', this calculation is related to DOY 15; with 'end', to DOY 35; and with 'middle', to DOY 25. The optimal selection, i.e. the consecutive days that return the highest calculated metric as shown in the $plot_extreme output, is the same for all three reference windows.
boot: logical; if TRUE, a bootstrap procedure is used to estimate the correlation coefficients, R squared or adjusted R squared metrics
boot_n: The number of bootstrap replicates
boot_ci_type: A character string representing the type of bootstrap intervals required. The value should be any subset of the values c("norm","basic", "stud", "perc", "bca").
boot_conf_int: A scalar or vector containing the confidence level(s) of the required interval(s)
day_interval: a vector of two values: the lower and upper limits of the interval of days used to calculate statistical metrics. Negative values indicate days of the previous growing season. This argument overrides the calculation limits defined by the lower_limit and upper_limit arguments.
dc_method: a character string determining the method used to detrend climate (environmental) data. Possible values are c("Spline", "ModNegExp", "Mean", "Friedman", "ModHugershoff"). The default, NULL, means that no detrending is applied (see dplR R package).
dc_nyrs: a number giving the rigidity of the smoothing spline; if dc_nyrs is NULL, it defaults to 0.67 of the series length (see dplR R package).
dc_f: a number between 0 and 1 giving the frequency response or wavelength cutoff. Defaults to 0.5 (see dplR R package).
dc_pos.slope: a logical flag. Will allow for a positive slope to be used in method "ModNegExp" and "ModHugershoff". If FALSE the line will be horizontal (see dplR R package).
dc_constrain.nls: a character string which controls the constraints of the "ModNegExp" model and the "ModHugershoff" (see dplR R package).
dc_span: a numeric value controlling method "Friedman", or "cv" (default) for automatic choice by cross-validation (see dplR R package).
dc_bass: a numeric value controlling the smoothness of the fitted curve in method "Friedman" (see dplR R package).
dc_difference: a logical flag. Compute residuals by subtraction if TRUE, otherwise use division (see dplR R package).
cor_na_use: an optional character string giving a method for computing covariances in the presence of missing values for correlation coefficients. This must be (an abbreviation of) one of the strings "everything" (default), "all.obs", "complete.obs", "na.or.complete", or "pairwise.complete.obs". See also the documentation for the base cor() function.
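For previous_year (and negative values in day_interval), one way to picture the rearrangement is to place each year's daily data next to the previous year's data. This is a hypothetical sketch, not the dendroTools code: the helper name `add_previous_year` and the "-" prefix for previous-year columns are illustrative assumptions. Note that the first year is lost because it has no preceding year.

```r
# Hypothetical sketch: pair each year with the daily data of the previous
# year, so that columns run from previous-year days to current-year days.
add_previous_year <- function(env) {
  n <- nrow(env)
  prev <- env[-n, , drop = FALSE]   # years t - 1
  curr <- env[-1, , drop = FALSE]   # years t
  out <- cbind(prev, curr)
  # Prefix previous-year days with "-" (illustrative naming only)
  colnames(out) <- c(paste0("-", colnames(env)), colnames(env))
  rownames(out) <- rownames(env)[-1]  # rows are labelled by the current year
  out
}

# Toy data: 3 years x 2 days
env_toy <- matrix(1:6, nrow = 3, dimnames = list(2000:2002, 1:2))
r <- add_previous_year(env_toy)
r
```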
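For remove_insignificant, a common way to turn alpha into a correlation threshold is the two-sided t test of zero correlation with n - 2 degrees of freedom, solved for r. Whether dendroTools computes its threshold exactly this way is an assumption; the formula below applies to Pearson correlations (Kendall and Spearman use different tests), and `critical_r` is a hypothetical helper name.

```r
# Two-sided critical value of Pearson's r for sample size n and level alpha.
# From t = r * sqrt(n - 2) / sqrt(1 - r^2), solving for r at t = t_crit.
critical_r <- function(n, alpha = 0.05) {
  t_crit <- qt(1 - alpha / 2, df = n - 2)
  t_crit / sqrt(n - 2 + t_crit^2)
}

r_crit <- critical_r(n = 30, alpha = 0.05)
r_crit    # about 0.361
r_crit^2  # the same threshold on the R squared scale, about 0.130
```

Squaring the threshold is what makes it comparable with r.squared values from the "lm" and "brnn" methods.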
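The "automatic" option of components_selection keeps components whose eigenvalue exceeds eigenvalues_threshold. The sketch below shows that rule with base R's princomp(); running the PCA on the correlation matrix (cor = TRUE) and the helper name `select_components` are assumptions, not a description of dendroTools internals. The log transformation requires strictly positive proxy values.

```r
# Hypothetical sketch of "automatic" component selection with princomp().
select_components <- function(proxies, eigenvalues_threshold = 1,
                              log_preprocess = TRUE) {
  x <- if (log_preprocess) log(proxies) else proxies  # requires positive values
  pca <- princomp(x, cor = TRUE)        # PCA on the correlation matrix
  eigenvalues <- pca$sdev^2             # variances of the components
  keep <- which(eigenvalues > eigenvalues_threshold)
  pca$scores[, keep, drop = FALSE]      # scores used as predictors
}

# Toy proxies: a and b share a common signal, c is independent noise
set.seed(3)
common <- rnorm(50)
proxies <- data.frame(a = exp(common + rnorm(50, sd = 0.3)),
                      b = exp(common + rnorm(50, sd = 0.3)),
                      c = exp(rnorm(50)))
scores <- select_components(proxies)
ncol(scores)
```

With a correlation-matrix PCA the eigenvalues sum to the number of variables, so a threshold of 1 keeps components that explain more than an average single variable.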
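The three temporal_stability_check schemes differ only in which years enter each calculation. The following hypothetical sketch (`stability_sketch` is not a dendroTools function) uses Pearson correlation between an aggregated climate series x and a proxy y and splits the years into k consecutive blocks with cut(); dendroTools may split or score the data differently.

```r
# Hypothetical sketch of the three temporal_stability_check schemes.
stability_sketch <- function(x, y,
                             type = c("sequential", "progressive",
                                      "running_window"),
                             k = 2, k_running_window = 30) {
  type <- match.arg(type)
  n <- length(x)
  # Assign consecutive years to k blocks of (nearly) equal size
  block <- cut(seq_len(n), breaks = k, labels = FALSE)
  idx <- switch(type,
    # each block on its own
    sequential = lapply(seq_len(k), function(b) which(block == b)),
    # first block, then first two blocks, ..., then all blocks
    progressive = lapply(seq_len(k), function(b) which(block <= b)),
    # every run of k_running_window consecutive years
    running_window = lapply(seq_len(n - k_running_window + 1),
                            function(s) s:(s + k_running_window - 1)))
  vapply(idx, function(i) cor(x[i], y[i]), numeric(1))
}

set.seed(42)
x <- rnorm(40)
y <- x + rnorm(40)
stability_sketch(x, y, "sequential", k = 2)                      # 2 values
stability_sketch(x, y, "progressive", k = 4)                     # last uses all years
stability_sketch(x, y, "running_window", k_running_window = 10)  # 31 values
```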
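The difference between the "blocked" and "randomized" values of cross_validation_type can be shown through fold assignment alone. In this hypothetical sketch (`make_folds` is an illustrative name), blocked folds are contiguous runs of years, while randomized folds shuffle the same labels; passing seed mirrors the purpose of the seed argument but sets the global random number generator.

```r
# Hypothetical fold assignment for k-fold cross validation.
make_folds <- function(n, k = 2, type = c("blocked", "randomized"),
                       seed = NULL) {
  type <- match.arg(type)
  folds <- cut(seq_len(n), breaks = k, labels = FALSE)  # contiguous blocks
  if (type == "randomized") {
    if (!is.null(seed)) set.seed(seed)  # reproducible shuffling
    folds <- sample(folds)              # shuffle years across folds
  }
  folds
}

years <- 1981:1990
blocked <- make_folds(length(years), k = 2, type = "blocked")
split(years, blocked)     # 1981-1985 and 1986-1990
randomized <- make_folds(length(years), k = 2, type = "randomized", seed = 1)
split(years, randomized)  # same fold sizes, years mixed
```

Blocked folds keep autocorrelated neighbouring years together, which usually gives a more conservative check for time series than shuffling.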
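With tidy_env_data = TRUE, env_data is given in long form (Year, DOY, value). The sketch below shows how such data map onto the wide layout described for env_data (years as rows, days as columns). The column name Temperature and the use of tapply() are illustrative; this is not necessarily how dendroTools converts the data internally.

```r
# Made-up tidy data: 2 years x 3 days
tidy <- data.frame(
  Year = rep(c(2001, 2002), each = 3),
  DOY  = rep(1:3, times = 2),
  Temperature = c(1.5, 2.0, 0.5, -1.0, 0.0, 3.0)
)

# Years become row names, DOY becomes columns (one value per Year/DOY pair)
wide <- tapply(tidy$Temperature, list(tidy$Year, tidy$DOY), mean)
wide <- as.data.frame(wide)
wide
```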
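The reference_window rule can be checked against the DOY 15 to DOY 35 example in the argument description. This hypothetical helper (`reference_day` is not a dendroTools function) rounds the middle day down for even window widths, which is an assumption.

```r
# Hypothetical helper: the day a window result is attached to, given the
# first day of the window and its width in days.
reference_day <- function(start_day, width,
                          reference_window = c("start", "middle", "end")) {
  reference_window <- match.arg(reference_window)
  end_day <- start_day + width - 1
  switch(reference_window,
         start  = start_day,
         middle = start_day + floor((width - 1) / 2),  # rounds down for even widths
         end    = end_day)
}

# Window from DOY 15 to DOY 35 is 21 days wide
refs <- sapply(c("start", "middle", "end"),
               function(r) reference_day(15, 21, r))
refs  # start 15, middle 25, end 35
```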

Examples


# \donttest{
# Load the dendroTools R package
library(dendroTools)

# Load data
data(data_MVA)
data(data_TRW)
data(data_TRW_1)
data(example_proxies_individual)
data(example_proxies_1)
data(LJ_daily_temperatures)

# 1 Example with fixed width. Lower and upper limits are ignored.
example_daily_response <- daily_response(response = data_MVA,
    env_data = LJ_daily_temperatures,
    method = "cor", fixed_width = 40, cor_method = "spearman",
    row_names_subset = TRUE, previous_year = TRUE,
    remove_insignificant = TRUE, boot = TRUE,
    alpha = 0.005, aggregate_function = 'mean',
    reference_window = "start")

summary(example_daily_response)
plot(example_daily_response, type = 1)
plot(example_daily_response, type = 2)

# 2 Example for past and present. Use subset_years argument.
example_MVA_early <- daily_response(response = data_MVA,
    env_data = LJ_daily_temperatures, cor_method = "kendall",
    method = "cor", lower_limit = 21, upper_limit = 90,
    row_names_subset = TRUE, previous_year = TRUE,
    remove_insignificant = TRUE, alpha = 0.05,
    plot_specific_window = 60, subset_years = c(1940, 1980),
    aggregate_function = 'sum')

example_MVA_late <- daily_response(response = data_MVA,
    env_data = LJ_daily_temperatures,
    method = "cor", lower_limit = 21, upper_limit = 60,
    row_names_subset = TRUE, previous_year = TRUE,
    remove_insignificant = TRUE, alpha = 0.05,
    plot_specific_window = 60, subset_years = c(1981, 2010),
    aggregate_function = 'sum')

plot(example_MVA_early, type = 1)
plot(example_MVA_late, type = 1)
plot(example_MVA_early, type = 2)
plot(example_MVA_late, type = 2)

# 3 Example PCA
example_PCA <- daily_response(response = example_proxies_individual,
    env_data = LJ_daily_temperatures, method = "lm",
    lower_limit = 21, upper_limit = 180,
    row_names_subset = TRUE, remove_insignificant = TRUE,
    alpha = 0.01, PCA_transformation = TRUE,
    components_selection = "manual", N_components = 2)

summary(example_PCA$PCA_output)
summary(example_PCA)
plot(example_PCA, type = 2)

# 4 Example negative correlations
example_neg_cor <- daily_response(response = data_TRW_1,
    env_data = LJ_daily_temperatures, previous_year = TRUE,
    method = "cor", lower_limit = 21, upper_limit = 90,
    row_names_subset = TRUE, remove_insignificant = TRUE,
    alpha = 0.05)

summary(example_neg_cor)
plot(example_neg_cor, type = 1)
plot(example_neg_cor, type = 2)
example_neg_cor$temporal_stability

# 5 Example of multiproxy analysis
summary(example_proxies_1)
cor(example_proxies_1)

example_multiproxy <- daily_response(response = example_proxies_1,
   env_data = LJ_daily_temperatures,
   method = "lm", metric = "adj.r.squared",
   lower_limit = 21, upper_limit = 180,
   row_names_subset = TRUE, previous_year = FALSE,
   remove_insignificant = TRUE, alpha = 0.05)

plot(example_multiproxy, type = 1)

# 6 Example to test the temporal stability
example_MVA_ts <- daily_response(response = data_MVA,
   env_data = LJ_daily_temperatures, method = "brnn",
   lower_limit = 100, metric = "adj.r.squared", upper_limit = 180,
   row_names_subset = TRUE, remove_insignificant = TRUE, alpha = 0.05,
   temporal_stability_check = "running_window", k_running_window = 10)

example_MVA_ts$temporal_stability

# 7 Example with nonlinear brnn estimation
example_brnn <- daily_response(response = data_MVA,
   env_data = LJ_daily_temperatures, method = "brnn", boot = FALSE,
   lower_limit = 100, metric = "adj.r.squared", upper_limit = 101,
   row_names_subset = TRUE, remove_insignificant = TRUE, boot_n = 10)

summary(example_brnn)
# }
