Suppose the element data of one data set (DT1) are biased because the measured concentrations result from a mixture of two substances, one of which has the element concentrations of DT2. To correct DT1 to \(DT_{corrected}\), a fraction of DT2 has to be subtracted from DT1. The basic equation for the correction is: $$ DT_{corrected}=\frac{DT1 - x * DT2}{1 - x} $$ where x is the amount (fraction) of DT2 to be subtracted.
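As a minimal sketch of the basic equation, assume two hypothetical named vectors `dt1` and `dt2` with concentrations of the same elements and an assumed, already known amount `x`. All names and values are invented for illustration and are not taken from the package:

```r
# Hypothetical concentrations (e.g. mg/kg); values are made up for illustration
dt1 <- c(Ti = 5, Cu = 10, Zn = 40)     # biased sample (mixture)
dt2 <- c(Ti = 4000, Cu = 30, Zn = 70)  # admixed substance
x <- 0.001                             # assumed amount of DT2 in DT1

# Basic correction: subtract the DT2 share and rescale to the remaining fraction
dt_corrected <- (dt1 - x * dt2) / (1 - x)
print(dt_corrected)
```

The division by `1 - x` rescales the result so that it refers to the material without the DT2 share.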
The function is written for the case that x is unknown. To calculate x, at least one element concentration in \(DT_{corrected}\) must be zero or known. Suppose \(vars_{i}\) has a very low concentration, close to zero, in \(DT_{corrected}\), so that \(DT_{corrected}[vars_{i}]=0\) can be assumed. The numerator of the basic equation then vanishes for \(vars_{i}\), which gives: $$ x = \frac{DT1[vars_{i}]}{DT2[vars_{i}]} $$
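The calculation of x from a single element could look as follows. This is only a sketch with invented values; it assumes that Ti is practically absent from the uncontaminated material, and the object names are hypothetical:

```r
# Hypothetical concentrations; values are made up for illustration
dt1 <- c(Ti = 5, Cu = 10, Zn = 40)
dt2 <- c(Ti = 4000, Cu = 30, Zn = 70)

# Assumption: DT_corrected[Ti] = 0, hence x = DT1[Ti] / DT2[Ti]
x <- dt1[["Ti"]] / dt2[["Ti"]]

# Apply the basic correction with this x
dt_corrected <- (dt1 - x * dt2) / (1 - x)
print(x)
print(dt_corrected)  # Ti is zero up to floating-point rounding
```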
The function was developed to correct element concentrations in plants for adhering particles. Exact and reproducible analysis of element concentrations in plant tissue is the basis for many research fields such as environmental, health, phytomining, agricultural or provenance studies. Unfortunately, plant samples collected in the field always carry particles on their tissue surfaces, such as airborne dust or soil particles. If not removed, these particles may bias the element concentrations measured in plant samples.
For a full description of the calculations and the background of correcting plants for adhering particles, please refer to:
Pospiech, S., Fahlbusch, W., Sauer, B., Pasold, T., & Ruppert, H. (2017). Alteration of trace element concentrations in plants by adhering particles – Methods of correction. Chemosphere, 182, 501-508; see also the section Details.
Correction.AdheringParticles(DT1, DT2 = NULL, vars = NULL,
vars_ignore = c("As", "Se", "Sn", "V", "Be", "Ge", "Pt"), method, element,
id.vars, group1.vars, group2.vars, var_subgroup, offset = 0,
use_only_DT2 = TRUE, DT2_replace = NULL, Errors = TRUE,
return_as_list = TRUE, negative_values = FALSE,
set_statistical_0 = FALSE, Error_method = "gauss", STD_DT1 = STD_Plant,
STD_DT2 = STD_Soil, minNr_DT1 = 100, minNr_DT2 = 100)
data.frame or data.table, samples in rows and variables in columns.
data.frame or data.table, samples in rows and variables in columns.
optional, character vector of column names of DT1 and DT2. By default the columns are selected by the function select.VarsElements.
Please make sure the columns given in vars are of class numeric.
character vector of column names, only for 'method 3'. These variables are ignored when calculating the median of the amount of DT2 (x) in 'method 3'. Please note: the function still returns corrected values for these columns, because they are only ignored for calculating the median of x. Default is "As", "Se", "Sn", "V", "Be", "Ge" and "Pt". Please see Details for further explanation.
unquoted name denoting the method (not a character string: please give m3 instead of "m3"). Options are m1, m2, m3 and subtr. Default is m3. Please see Details.
string, only for method 1. Denotes the column from which the amount of DT2 (x) is calculated.
column with unique (!) entries for each row. Class can be integer (corresponding row numbers) or character (e.g. sample IDs).
If missing, all columns but vars will be assigned to it.
Please note: Function is faster and more stable if id.vars is provided.
character vector, column name(s) for subsetting DT1 and DT2
optional, column name for subsetting DT1 and DT2 if some entries in group1.vars are empty.
optional, character vector of one column name of DT1. This option affects only the error calculation, hence it is ignored if Errors is set to FALSE.
If provided, DT1 is split into subsets by group1.vars and 'var_subgroup', and the error is calculated for each of these subsets.
Please see the Details for further information.
numeric, default is 0. The offset reduces the subtracted amount of DT2 (x): x = x - offset. If used with m2, all concentrations stay > 0. A reasonable offset is e.g. offset = 0.0001.
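The effect of the offset can be sketched with invented values. The snippet takes the smallest ratio as x (as described for m2) and reduces it by the offset; it is an illustration under these assumptions, not the package's internal code:

```r
# Hypothetical concentrations; values are made up for illustration
dt1 <- c(Ti = 5, Cu = 10, Zn = 40)
dt2 <- c(Ti = 4000, Cu = 30, Zn = 70)
offset <- 0.0001

x <- min(dt1 / dt2)   # smallest ratio of the sample, here Ti
x_used <- x - offset  # x = x - offset

dt_corrected <- (dt1 - x_used * dt2) / (1 - x_used)
print(dt_corrected)   # Ti stays slightly above zero instead of becoming 0
```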
logical, default is TRUE. If there are not enough DT2 data of the location, should the DT2s of the region be used? If use_only_DT2 is set to FALSE, the Upper Crust is used for the correction.
optional. If a DT1 sample has no DT2 data of the corresponding location, this option defines which data are used as DT2 instead. Default is the built-in data set UpperCrust (geochemical composition of the earth's upper crust). To use something else, please provide a named vector or a one-row data.table with the values to be used instead of DT2.
logical, should absolute errors be calculated and appended to the list output? Default is TRUE.
If Errors is set to TRUE, it overrides the option return_as_list and the function always returns a list.
logical, should the result be returned as a list? Default is TRUE.
logical, should negative values be returned? If set to FALSE negative values are set to 0. Default is FALSE.
logical, only for method 3. Should all values of the variables contributing to the median of x be set to 0? Default is FALSE.
method with which the error should be calculated. At the moment you can choose between "gauss" (default) and "biggest". See Details for explanation.
optional, data.frame or data.table object for calculating errors for DT1, e.g. the standards. Please see Details. If left empty a default of 5.2% relative error is used.
optional, data.frame or data.table object for calculating errors for DT2, e.g. the standards. Please see Details. If left empty a default of 5.2% relative error is used.
minimum number of samples/observations in DT1 for calculating a relative error of observations.
If the number of observations of DT1 is smaller than minNr_DT1, the error is calculated via the data set STD_DT1.
Default is 100.
minimum number of samples/observations in DT2 for calculating a relative error of observations.
If the number of observations of DT2 is smaller than minNr_DT2, the error is calculated via the data set STD_DT2.
Default is 100.
data.frame (or data.table if DT1 is data.table) according to method.
The main option of this function is method, which determines how the amount of DT2 to be subtracted (x) is calculated.
There are four options:
Method 1: calculate x via a fixed element
Method 2: calculate x via the element with the smallest ratio between DT1[vars] and DT2[vars]
Method 3: calculate x via the median of several, very small ratios between DT1[vars] and DT2[vars]
Method subtr: calculate the concentrations for \(x * DT2[vars]\)
To Method 1:
For example, using Ti as element, \(DT_{corrected}\) is calculated with \( x = DT1[Ti]/DT2[Ti]\).
Typical elements for the option element are e.g. Ti, Al, Zr, Sc, ...
This may lead to negative concentrations for some elements.
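The idea of method 1 can be sketched with two hypothetical samples. The matrices `dt1` and `dt2` and all values are invented, and the code only illustrates the arithmetic, not the package's implementation:

```r
# Two hypothetical samples (rows) and three elements (columns)
dt1 <- cbind(Ti = c(5, 8), Zr = c(0.1, 0.5), Cu = c(10, 12))
dt2 <- cbind(Ti = c(4000, 4000), Zr = c(200, 200), Cu = c(30, 30))
element <- "Ti"

# One x per sample, based on the fixed element
x <- dt1[, element] / dt2[, element]

# x is recycled down the columns, so row i is corrected with x[i]
dt_corrected <- (dt1 - x * dt2) / (1 - x)
print(dt_corrected)
```

In this made-up example Zr of the first sample becomes negative, because its ratio DT1/DT2 is smaller than the ratio of Ti.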
To Method 2: This method subtracts the smallest possible content of DT2 from DT1 (smallest x). For each row/sample, the element with the smallest ratio \( x = DT1[vars]/DT2[vars]\) is taken as element, hence every sample may be corrected based on a different element. With this method there are no negative concentrations.
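A sketch of method 2 with the same kind of invented data: the smallest ratio is determined for every row, so the element used for the correction can differ between samples. Object names and values are hypothetical:

```r
# Two hypothetical samples (rows) and three elements (columns)
dt1 <- cbind(Ti = c(5, 8), Zr = c(0.1, 0.5), Cu = c(10, 12))
dt2 <- cbind(Ti = c(4000, 4000), Zr = c(200, 200), Cu = c(30, 30))

ratios <- dt1 / dt2

# Smallest ratio per sample and the element it belongs to
x <- apply(ratios, 1, min)
which_el <- colnames(ratios)[apply(ratios, 1, which.min)]

# Row i is corrected with its own smallest x[i]
dt_corrected <- (dt1 - x * dt2) / (1 - x)
print(which_el)
print(dt_corrected)
```

Here the first sample is corrected via Zr and the second via Ti; the element with the smallest ratio ends up at zero (up to floating-point rounding) and all others stay positive.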
To Method 3:
In order to reduce the uncertainty of the content of DT2 in DT1 (x) that arises from relying on only one element, as in methods 1 and 2, an average of the x of several elements can be calculated.
With \(\Delta x\) being the absolute error of x, the median is calculated from the x of all elements whose values \( x - \Delta x\) are smaller than \( x_{smallest} + \Delta x_{smallest}\).
The value of the median \(\bar{x}\) is then used as x.
This may lead to negative concentrations for some elements.
Because the x of all elements whose error overlaps the error of the element with the smallest x are statistically indistinguishable, we suggest setting all elements contributing to \(\bar{x}\) to zero, because these small values should not be interpreted:
Set option set_statistical_0 to TRUE.
It is advisable to exclude elements with a huge error margin via the option vars_ignore, because they could severely increase the median \(\bar{x}\) by "opening" the window of error ranges for many elements with significantly higher ratios.
This could lead to an unnaturally high median \(\bar{x}\), resulting in an overcorrection.
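The selection of elements for the median in method 3 can be sketched for a single sample. The snippet assumes a relative error of 5.2% for both DT1 and DT2 and Gaussian error propagation for the ratio; these assumptions, the object names and all values are illustrative only and may differ from the package's internal error calculation:

```r
# One hypothetical sample; values are made up for illustration
dt1 <- c(Ti = 5, Zr = 0.26, Sc = 0.017, Cu = 10, As = 0.004)
dt2 <- c(Ti = 4000, Zr = 200, Sc = 14, Cu = 30, As = 5)
vars_ignore <- "As"
rel_err <- 0.052                        # assumed relative error of DT1 and DT2

x <- dt1 / dt2
dx <- x * sqrt(rel_err^2 + rel_err^2)   # assumed Gaussian propagation for a ratio

# Elements in vars_ignore do not take part in the median
use <- setdiff(names(x), vars_ignore)
el_min <- use[which.min(x[use])]
upper <- x[[el_min]] + dx[[el_min]]     # x_smallest + delta x_smallest

# Elements whose x - delta x lies below that limit contribute to the median
contributing <- use[(x[use] - dx[use]) < upper]
x_med <- median(x[contributing])

# All elements are corrected, including those in vars_ignore
dt_corrected <- (dt1 - x_med * dt2) / (1 - x_med)

# Optional step corresponding to set_statistical_0 = TRUE
dt_stat0 <- dt_corrected
dt_stat0[contributing] <- 0
print(contributing)
print(dt_corrected)
```

In this example Ti, Zr and Sc contribute to the median, Sc and As become negative after the correction, and the optional last step sets the contributing elements to zero.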
If the option id.vars is provided, the function prints the 'group1.vars' and 'id.vars' of the sample.
For examples and more information please refer to: Pospiech, S., Fahlbusch, W., Sauer, B., Pasold, T., & Ruppert, H. (2017). Alteration of trace element concentrations in plants by adhering particles – Methods of correction. Chemosphere, 182, 501-508.
Other ratio functions: preparationDT2,
ratioDT,
ratio_append_smallest