Basic Behavior:
This function makes a few basic checks to ensure that the response curve data
includes the expected information and does not include any mistakes. If no
problems are detected, this function will be silent with no return value. If a
problem is detected, then the user will be notified in one or more ways:
If error_on_failure is TRUE, then this function will
throw an error with a short message. If print_information is
also TRUE, then additional information will be printed to the R
terminal.
If error_on_failure is FALSE and
print_information is also FALSE, then this function will
throw a warning with a short message.
If error_on_failure is FALSE and
print_information is TRUE, information about the problem will
be printed to the R terminal.
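The three notification modes above can be summarized in a short sketch. The helper notify_problem and its arguments msg and details are hypothetical names introduced for illustration; this is not the package's actual implementation, only one way the described behavior could be arranged in base R.

```r
# Hypothetical helper illustrating the notification rules described above.
# `msg` is the short message; `details` is the additional information.
notify_problem <- function(msg, details, error_on_failure, print_information) {
    if (print_information) {
        # Additional information goes to the R terminal
        print(details)
    }

    if (error_on_failure) {
        # Stop the script with a short message
        stop(msg)
    } else if (!print_information) {
        # Neither an error nor printed details: fall back to a warning
        warning(msg)
    }

    invisible(NULL)
}
```

With error_on_failure set to FALSE and print_information set to TRUE, this sketch only prints the details and lets the calling script continue.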
This function will (optionally) perform several checks:
Checking for infinite values: If col_to_ignore_for_inf is not
NULL, no numeric columns in exdf_obj should have
infinite values, with the exception of columns designated in
col_to_ignore_for_inf.
Checking required columns: All elements of identifier_columns
should be present as columns in exdf_obj. If
driving_column is not NULL, it should also be present as
a column in exdf_obj. If constant_col is not empty, then
these columns must also be present in exdf_obj.
Checking the number of points in each curve: The general idea is to
ensure that each curve has the expected number of points. Several
options can be specified via the value of expected_npts, as
discussed below.
Checking the driving column: If driving_column is not
NULL, then each curve should have the same sequence of values
in this column. To allow for small variations, a nonzero
driving_column_tolerance can be specified.
Checking the constant columns: If constant_col is not empty,
then each specified column should either be constant, or only vary by
a specified amount. See details below.
By default, most of these checks are not performed; the exceptions are the
simplest ones, such as checking for infinite values or checking that key
columns are present. This enables an "opt-in" style of use, where users can
specify just the checks they wish to make.
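As a rough illustration of the driving column check, the sketch below compares each curve's sequence of driving values against the sequence from the first curve. It uses a plain data frame rather than an exdf object. The function name same_driving_sequence, the column names, and the choice of the first curve as the reference are all assumptions made for this example; the actual function may compare curves differently.

```r
# Hypothetical helper: TRUE if every curve has the same sequence of values in
# the driving column, to within `tolerance`
same_driving_sequence <- function(df, id_col, driving_col, tolerance = 0) {
    # Split the driving column values by curve
    sequences <- split(df[[driving_col]], df[[id_col]])

    # Use the first curve as the reference sequence (an assumption)
    reference <- sequences[[1]]

    all(vapply(sequences, function(x) {
        length(x) == length(reference) &&
            all(abs(x - reference) <= tolerance)
    }, logical(1)))
}

# Toy data: two curves whose setpoints differ slightly
curves <- data.frame(
    curve_id = rep(c("a", "b"), each = 3),
    CO2_r_sp = c(400, 200, 100, 401, 199, 100)
)

same_driving_sequence(curves, "curve_id", "CO2_r_sp")                # FALSE
same_driving_sequence(curves, "curve_id", "CO2_r_sp", tolerance = 2) # TRUE
```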
More Details:
There are several options for checking the number of points in each curve:
If expected_npts is a single negative number, no check will be
performed.
If expected_npts is 0, then each curve is expected to have the
same number of points.
If expected_npts is a single positive number, then each curve
is expected to have that many points. For example, if
expected_npts is 7, then each curve must have 7 points.
If expected_npts is a pair of positive numbers, then each curve
is expected to have a number of points lying within the range defined
by expected_npts. For example, if expected_npts is
c(6, 8), then each curve must have no fewer than 6 points and
no more than 8 points.
If expected_npts is a pair of numbers, one of which is zero and
one of which is positive, then the positive number specifies a range;
each curve must differ from the average number of points by less than
the range. For example, if expected_npts is c(0, 3),
then every curve must have a number of points within 3 of the average
number of points.
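The rules above for interpreting expected_npts can be sketched as a small base R function. The name npts_ok is hypothetical, and npts is assumed to be a numeric vector holding the number of points in each curve. The text does not say whether the limit in the "average" case is inclusive; the sketch assumes it is.

```r
# Hypothetical helper: TRUE if the per-curve point counts in `npts` satisfy
# the rule encoded by `expected_npts`
npts_ok <- function(npts, expected_npts) {
    if (length(expected_npts) == 1) {
        if (expected_npts < 0) {
            TRUE                          # negative: no check is performed
        } else if (expected_npts == 0) {
            length(unique(npts)) == 1     # zero: all curves have the same count
        } else {
            all(npts == expected_npts)    # positive: exact count required
        }
    } else if (any(expected_npts == 0)) {
        # One zero and one positive value: limit the distance from the
        # average number of points (inclusive limit is an assumption)
        all(abs(npts - mean(npts)) <= max(expected_npts))
    } else {
        # Two positive values: an allowed range of point counts
        all(npts >= min(expected_npts) & npts <= max(expected_npts))
    }
}

npts <- c(6, 7, 8)

npts_ok(npts, -1)       # TRUE
npts_ok(npts, 0)        # FALSE
npts_ok(npts, 7)        # FALSE
npts_ok(npts, c(6, 8))  # TRUE
npts_ok(npts, c(0, 1))  # TRUE
```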
There are two options for checking columns that should be constant:
A value of NA indicates that all values of that column must be
exactly identical; this check applies to numeric and character
columns.
A numeric value indicates that the range of values of that column must
be smaller than the specified range; this check applies to numeric
columns only.
For example, setting constant_col = list(species = NA, Qin = 10) means
that each curve must have only a single value of the species column,
and that the value of the Qin column cannot vary by more than 10 across
each curve.
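A minimal sketch of how these two options might be applied to a single curve is shown below. The function name curve_is_constant and the toy data are assumptions for illustration, and a plain data frame stands in for one curve. The sketch treats a range exactly equal to the limit as acceptable, which the text does not specify.

```r
# Hypothetical helper: check one curve against a list of limits shaped like
# list(species = NA, Qin = 10)
curve_is_constant <- function(curve, limits) {
    all(vapply(names(limits), function(col) {
        values <- curve[[col]]
        limit <- limits[[col]]
        if (is.na(limit)) {
            # NA: every value must be identical
            length(unique(values)) == 1
        } else {
            # Numeric: the spread of values must stay within the limit
            diff(range(values)) <= limit
        }
    }, logical(1)))
}

# Toy data for a single curve
one_curve <- data.frame(
    species = c("soybean", "soybean", "soybean"),
    Qin = c(1995, 2000, 2004)
)

curve_is_constant(one_curve, list(species = NA, Qin = 10)) # TRUE (Qin range is 9)
curve_is_constant(one_curve, list(species = NA, Qin = 5))  # FALSE
```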
Use Cases:
Using check_response_curve_data is not strictly necessary, but it can
be helpful both to you and to anyone else reading your analysis code. Here are
a few typical use cases:
Average response curves: It is common to calculate and plot
average response curves, either manually or by using
xyplot_avg_rc. However, this only makes sense if
each curve followed the same sequence of the driving variable. In this
case, check_response_curve_data can be used to confirm that
each curve used the same values of CO2_r_sp (for an A-Ci curve)
or Qin (for an A-Q curve).
Removing recovery points: It is common to wish to remove one
or more recovery points from a set of curves. The safest way to do
this is to confirm that all the curves use the same sequence of
setpoints; then you can be sure that, for example, points 9 and 10 are
the recovery points in every curve.
Making a statement of expectations: If you measured a set of
A-Ci curves where each curve has 16 points and used the same sequence
of CO2_r setpoints, you could record this somewhere in your
notes. But it would be even more meaningful to use
check_response_curve_data in your script with
expected_npts set to 16. If this check passes, then it means
not only that your claim is correct, but also that the identifier
columns are being interpreted properly.
Checking identifiers: If the data set includes some
identifying metadata, such as a species or location, it may be helpful
to confirm that each curve has a single value of these "identifier"
columns. Otherwise, the data set may be difficult to interpret.
Checking measurement conditions: If the response curves are
expected to be measured under constant temperature, humidity, light,
or other environmental variables, it may be helpful to confirm that
these variables do not vary too much across each individual curve.
Otherwise, parameter values estimated from the curves may not be
meaningful.
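To make the recovery point use case more concrete, the sketch below removes a point by its position within each curve, which is only safe once every curve is known to follow the same sequence of setpoints. The toy data frame, its column names, and the choice of point 3 as the recovery point are assumptions for this example; it uses only base R and does not reflect how the package itself removes points.

```r
# Toy data: two curves with the same four setpoints; the third point in each
# curve is a return to 400 and plays the role of a recovery point
curves <- data.frame(
    curve_id = rep(c("a", "b"), each = 4),
    CO2_r_sp = rep(c(400, 200, 400, 600), times = 2)
)

# Index each point within its own curve (1, 2, 3, 4 for every curve)
point_index <- ave(seq_len(nrow(curves)), curves$curve_id, FUN = seq_along)

# Drop the recovery point from every curve
recovery_points <- c(3)
curves_clean <- curves[!point_index %in% recovery_points, ]

nrow(curves_clean) # 6
```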
Sometimes the response curves in a large data set were not all measured with
the same sequence of setpoints. If only a few different sequences were used,
it is possible to split them into groups and separately run
check_response_curve_data on each group. This scenario is discussed in
the Frequently Asked Questions vignette.
Even if none of the above situations are relevant to you, it may still be
helpful to run check_response_curve_data with
expected_npts set to 0 and error_on_failure set to FALSE.
With these settings, if there are curves with different numbers of points, the
function will print the number of points in each curve to the R terminal, but
won't stop the rest of the script from running. This can be useful for
detecting problems with the curve_identifier column. For example, if
the longest curves in the set are known to have 17 points, but
check_response_curve_data identifies a curve with 34 points, it is
clear that the same identifier was accidentally used for two different curves.
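The kind of per-curve count described here can be approximated in base R with table. The sketch below uses a toy data frame with hypothetical identifier values; it only illustrates the idea of spotting a reused identifier and is not the output format of check_response_curve_data.

```r
# Toy data: the identifier "plant-3" was accidentally used for two curves
curves <- data.frame(
    curve_identifier = c(
        rep("plant-1", 17),
        rep("plant-2", 17),
        rep("plant-3", 34)
    )
)

# Count the number of points associated with each identifier
npts <- table(curves$curve_identifier)
print(npts)

# Flag identifiers with more points than the longest expected curve
suspicious <- names(npts)[as.vector(npts) > 17]
suspicious # "plant-3"
```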