Correction.AdheringParticles: Correction.AdheringParticles

Description

Suppose the element concentrations of one data set (DT1) are biased because each sample is a mixture of two substances, one of which has the element concentrations given in DT2. To correct DT1 to $DT_{corrected}$, a fraction of DT2 has to be subtracted from DT1. The basic equation for the correction is: $$ DT_{corrected}=\frac{DT1 - x * DT2}{1 - x} $$ where x is the fraction of DT2 to be subtracted.
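The basic equation can be written out as a short R sketch. The helper `correct_mixture` and the concentrations below are made up for illustration and are not part of the package; the sketch only assumes that x is already known.

```r
# Hypothetical helper (not part of the package): apply the basic correction
# equation for a known fraction x of DT2 contained in DT1.
correct_mixture <- function(dt1, dt2, x) {
  stopifnot(x >= 0, x < 1)
  (dt1 - x * dt2) / (1 - x)
}

# Made-up concentrations (mg/kg) for one plant sample and its soil
plant <- c(Ti = 5, Al = 100, K = 20000)
soil  <- c(Ti = 5000, Al = 70000, K = 15000)

corrected <- correct_mixture(plant, soil, x = 0.0005)

# Mixing the corrected plant with the soil fraction restores the measurement
remixed <- 0.0005 * soil + (1 - 0.0005) * corrected
```

Dividing by `1 - x` rescales the remaining material to 100 %, so `remixed` reproduces the measured `plant` values.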

The function is written for the case that x is unknown. To calculate x, at least one element concentration in $DT_{corrected}$ must be zero or known. Suppose $vars_{i}$ has a very low concentration, close to zero, in $DT_{corrected}$, so that $DT_{corrected}[vars_{i}]=0$ can be assumed; then: $$ x = \frac{DT1[vars_{i}]}{DT2[vars_{i}]} $$
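As a minimal sketch of this step, assume that Ti is the element whose concentration in the clean plant is close to zero. The values are made up and the code does not call the package function.

```r
# Made-up concentrations (mg/kg); Ti is assumed to be ~0 in the clean plant
plant <- c(Ti = 5, Al = 100, K = 20000)
soil  <- c(Ti = 5000, Al = 70000, K = 15000)

# Fraction of soil in the sample, derived from the reference element
x <- plant[["Ti"]] / soil[["Ti"]]

# Basic correction equation with the derived x
corrected <- (plant - x * soil) / (1 - x)
```

By construction, the reference element ends up at zero in `corrected`, while all other elements are reduced by their share of the soil fraction.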

The function was developed to correct element concentrations in plants for adhering particles. Exact and reproducible analysis of element concentrations in plant tissue is the basis for many research fields such as environmental, health, phytomining, agricultural or provenance studies. Unfortunately, plant samples collected in the field always carry particles on their tissue surfaces, such as airborne dust or soil particles. If not removed, these particles may bias the element concentrations measured in plant samples.

For a full description of the calculations and the background of correcting plants for adhering particles, please refer to the section Details and to:

Pospiech, S., Fahlbusch, W., Sauer, B., Pasold, T., & Ruppert, H. (2017). Alteration of trace element concentrations in plants by adhering particles – Methods of correction. Chemosphere, 182, 501-508.

Usage

Correction.AdheringParticles(DT1, DT2 = NULL, vars = NULL,
  vars_ignore = c("As", "Se", "Sn", "V", "Be", "Ge", "Pt"), method, element,
  id.vars, group1.vars, group2.vars, var_subgroup, offset = 0,
  use_only_DT2 = TRUE, DT2_replace = NULL, Errors = TRUE,
  return_as_list = TRUE, negative_values = FALSE,
  set_statistical_0 = FALSE, Error_method = "gauss", STD_DT1 = STD_Plant,
  STD_DT2 = STD_Soil, minNr_DT1 = 100, minNr_DT2 = 100)

Arguments

DT1

data.frame or data.table, samples in rows and variables in columns

DT2

data.frame or data.table, samples in rows and variables in columns.

vars

optional, character vector of column names of DT1 and DT2, default is function select.VarsElements. Please make sure the columns given in vars are of class numeric.

vars_ignore

character vector of column names, only for 'method 3'. These variables are ignored when calculating the median of the amount of DT2 (x) in 'method 3'. Please note: the function still returns corrected values for these columns, because they are only ignored for calculating the median of x. Default is "As", "Se", "Sn", "V", "Be", "Ge" and "Pt". Please see Details for further explanation.

method

unquoted name denoting the method, not a character string (please give m3 instead of "m3"). Options are m1, m2, m3 and subtr. Default is m3. Please see Details.

element

string, only for method 1. Denotes the column from which the amount of DT2 (x) is calculated.

id.vars

column with unique (!) entries for each row. Class can be integer (corresponding row numbers) or character (e.g. sample IDs). If missing, all columns except vars will be assigned to it. Please note: the function is faster and more stable if id.vars is provided.

group1.vars

character vector, column name(s) for subsetting DT1 and DT2

group2.vars

optional, column name for subsetting DT1 and DT2 if some entries in group1.vars are empty.

var_subgroup

optional, character vector of one column name of DT1. This option affects only the error calculation, hence it is ignored if Errors is set to FALSE. If provided, DT1 is split into subsets by group1.vars and 'var_subgroup', and the error is calculated for each of these subsets. Please see Details for further information.

offset

numeric, default is 0. The offset reduces the subtracted amount of DT2 (x): x = x - offset. If used with m2, all concentrations stay > 0. A reasonable value is e.g. offset = 0.0001.
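A minimal sketch of the effect of offset, assuming x is taken from the smallest ratio as in method 2. The values are made up and the code does not call the package function.

```r
# Made-up concentrations (mg/kg)
plant <- c(Ti = 5, Al = 100, K = 20000)
soil  <- c(Ti = 5000, Al = 70000, K = 15000)

# Smallest ratio, as in method 2 (here Ti)
x <- min(plant / soil)

# The offset reduces the subtracted amount of DT2
offset <- 0.0001
x_used <- x - offset

corrected <- (plant - x_used * soil) / (1 - x_used)
```

Without the offset, Ti would be corrected to exactly zero; with it, a small positive concentration remains.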

use_only_DT2

logical, default is TRUE (as shown in Usage). If there are not enough DT2 data for the location, should the DT2 data of the region be used? If use_only_DT2 is set to FALSE, the Upper Crust is used for the correction instead.

DT2_replace

optional. If a DT1 sample does not have DT2 data for the corresponding location, this option defines which data are used as DT2 instead. Default is the built-in data set UpperCrust (geochemical composition of the earth's upper crust). To use something else, please provide a named vector or a one-row data.table with the values to be used instead of DT2.

Errors

logical, should absolute errors be calculated and appended to the output list? Default is TRUE (as shown in Usage). If Errors is set to TRUE, it overrides the option return_as_list and the function always returns a list.

return_as_list

logical, should the result be returned as a list? Default is TRUE (as shown in Usage).

negative_values

logical, should negative values be returned? If set to FALSE negative values are set to 0. Default is FALSE.

set_statistical_0

logical, only for method 3. Should all values of the variables contributing to the median of x be set to 0? Default is FALSE.

Error_method

method with which the error should be calculated. At the moment you can choose between "gauss" (default) and "biggest". See Details for explanation.

STD_DT1

optional, data.frame or data.table object for calculating errors for DT1, e.g. the standards. Please see Details. If left empty a default of 5.2% relative error is used.

STD_DT2

optional, data.frame or data.table object for calculating errors for DT2, e.g. the standards. Please see Details. If left empty a default of 5.2% relative error is used.

minNr_DT1

minimum number of samples/observations in DT1 for calculating a relative error of observations. If the number of observations in DT1 is smaller than minNr_DT1, the error is calculated via the data set STD_DT1. Default is 100 (as shown in Usage).

minNr_DT2

minimum number of samples/observations in DT2 for calculating a relative error of observations. If the number of observations in DT2 is smaller than minNr_DT2, the error is calculated via the data set STD_DT2. Default is 100 (as shown in Usage).

Value

data.frame (or data.table if DT1 is data.table) according to method.

Details

The main option of this function is method, which determines how the amount of DT2 to be subtracted (x) is calculated. There are four options:

Method 1: calculate x via a fixed element
Method 2: calculate x via the element with the smallest ratio between DT1[vars] and DT2[vars]
Method 3: calculate x via the median of several, very small ratios between DT1[vars] and DT2[vars]
Method subtr: calculate the concentrations for $x * DT2[vars]$

Method 1: For example, with Ti as element, $DT_{corrected}$ is calculated with $ x = DT1[Ti]/DT2[Ti]$. Typical elements for the option element are e.g. Ti, Al, Zr, Sc, ... This may lead to negative concentrations for some elements.

Method 2: This method subtracts the smallest possible amount of DT2 from DT1 (smallest x). For each row/sample, the element with the smallest of all ratios $ x = DT1[vars]/DT2[vars]$ is taken as element, so each sample may be corrected based on a different element. With this method there are no negative concentrations.
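The row-wise selection of the smallest ratio can be sketched as follows. This is an illustration with made-up data under the assumption that the rows of both tables are already matched; it is not the package's implementation.

```r
# Made-up concentrations (mg/kg); row i of dt2 is the soil for row i of dt1
dt1 <- data.frame(Ti = c(5, 8), Al = c(100, 60), K = c(20000, 18000))
dt2 <- data.frame(Ti = c(5000, 4000), Al = c(70000, 80000), K = c(15000, 16000))

m1 <- as.matrix(dt1)
m2 <- as.matrix(dt2)

# Ratio DT1/DT2 for every element of every sample
ratios <- m1 / m2

# Smallest ratio per sample and the element it belongs to
x <- apply(ratios, 1, min)
element <- colnames(ratios)[apply(ratios, 1, which.min)]

# x has one entry per row, so it is recycled down the columns (row-wise)
corrected <- (m1 - x * m2) / (1 - x)
```

Here the first sample is corrected via Ti and the second via Al. Because the smallest ratio is used, no element can fall below zero.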

Method 3: Methods 1 and 2 derive the amount of DT2 in DT1 (x) from a single element. To reduce the resulting uncertainty, an average over the x of several elements can be calculated. With $\Delta x$ being the absolute error of x, the median is calculated from the x of all elements for which $ x - \Delta x$ is smaller than $ x_{smallest} + \Delta x_{smallest}$. The median $\bar{x}$ is then used as x. This may lead to negative concentrations for some elements. Because the x of all elements whose error range overlaps that of the element with the smallest x are statistically indistinguishable, we suggest setting all elements contributing to $\bar{x}$ to zero, since these small values should not be interpreted: set the option set_statistical_0 to TRUE.

It is advisable to list elements with a very large error margin in the option vars_ignore, because they could severely increase the median $\bar{x}$ by "opening" the window of error ranges to many elements with significantly higher ratios. This could lead to an unnaturally high median $\bar{x}$ and thus to an overcorrection.
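The window selection of method 3 can be sketched as follows. The data are made up. The error model is an assumption for illustration only: a relative error of 5.2 % for both data sets, combined by Gaussian error propagation for a ratio. The package may calculate $\Delta x$ differently.

```r
# Made-up concentrations (mg/kg)
plant <- c(Ti = 5, Al = 75, Zr = 0.11, K = 20000, Ca = 5000)
soil  <- c(Ti = 5000, Al = 70000, Zr = 100, K = 15000, Ca = 10000)

# Assumption: same relative error for both data sets
rel_err <- 0.052

x  <- plant / soil
# Assumed Gaussian error propagation for a ratio of two measurements
dx <- x * sqrt(rel_err^2 + rel_err^2)

# Elements whose lower error bound lies below the upper bound of the smallest x
smallest  <- which.min(x)
in_window <- (x - dx) < (x[smallest] + dx[smallest])

# The median of the x inside the window is used for the correction
x_median  <- median(x[in_window])
corrected <- (plant - x_median * soil) / (1 - x_median)

# Mirrors the described effect of set_statistical_0 = TRUE
corrected_0 <- corrected
corrected_0[in_window] <- 0
```

In this example Ti, Al and Zr fall inside the window and the median is the ratio of Al. Ti becomes slightly negative in `corrected`, which is the situation set_statistical_0 is meant to handle.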

If the option id.vars is provided, the function prints the 'group1.vars' and 'id.vars' of the sample.

For examples and more information please refer to: Pospiech, S., Fahlbusch, W., Sauer, B., Pasold, T., & Ruppert, H. (2017). Alteration of trace element concentrations in plants by adhering particles – Methods of correction. Chemosphere, 182, 501-508.
