Function to simultaneously replace all missing data in a historical database of several pollen types using different interpolation methods.
interpollen(data, method = "lineal", maxdays = 30, plot = TRUE,
factor = 2, ndays = 3, spar = 0.5, data2 = NULL, data3 = NULL,
data4 = NULL, data5 = NULL, mincorr = 0.6, result = "wide")
A data.frame object containing the general database on which the interpolation must be performed. This data.frame must include a first column in Date format and the remaining columns in numeric format. Each column must contain the information for one pollen type. It is not necessary to insert missing gaps; the function will automatically detect them.
A character string specifying the method used to calculate and generate the missing pollen data. The implemented methods are: "lineal", "movingmean", "spline", "tseries" or "neighbour". More detailed information about the different methods can be found in Details. The method argument will be "lineal" by default.
A numeric (integer) value specifying the maximum number of consecutive days with missing data that the algorithm will interpolate. If a gap is longer than this value, it will not be interpolated. Not valid with the "tseries" method. The maxdays argument will be 30 by default.
A logical argument. If TRUE, graphical previews of the input database will be plotted at the end of the interpolation process. All the interpolated gaps will be marked in red. The plot argument will be TRUE by default.
A numeric (integer) value bigger than 1. Only valid if the "movingmean" method is chosen. The argument specifies the factor by which the gap size is multiplied to establish the window of the moving mean that will fill the gap. More detailed information about the selection of the factor can be found in Details. The factor argument will be 2 by default.
A numeric (integer) value bigger than 1. Only valid if the "spline" method is chosen. Specifies the number of days on each side of the gap that are used to perform the spline regression. The ndays argument will be 3 by default.
A numeric (double) value ranging from 0 to 1 specifying the degree of smoothness of the spline regression adjustment. The smoother the adjustment, the more data are treated as outliers by the spline regression. Only valid if the "spline" method is chosen. The spar argument will be 0.5 by default.
A data.frame object (each one) including the database of a neighbour pollen station which will be used to interpolate missing data in the target station. Only valid if the "neighbour" method is chosen. This data.frame must include a first column in Date format and the remaining columns in numeric format, one pollen type per column. It is not necessary to insert the missing gaps; the function will automatically detect them. The arguments will be NULL by default.
A numeric (double) value ranging from 0 to 1. It specifies the minimum correlation coefficient (Spearman correlation) that neighbour stations must have with the target station to be taken into account for the interpolation. Only valid if the "neighbour" method is chosen. The mincorr argument will be 0.6 by default.
A character string specifying the format of the resulting data.frame. Only "wide" or "long" are accepted. The result argument will be "wide" by default.
This function returns different results:
If result = "wide", returns a data.frame including the original data completed with the interpolated data.
If result = "long", returns a data.frame containing your data in long format (the first column for date, the second for pollen type, the third for concentration and an additional fourth column with 1 if the value has been interpolated or 0 if not).
If plot = TRUE, plots of the daily values for each year and pollen type are drawn in the active graphic window. Interpolated values are marked in red. If the method argument is "tseries", the seasonality is also represented in grey.
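To make the long layout concrete, the following base R sketch reshapes a small wide table by hand. The pollen values, the flag table and the column names (date, type, value, interpolated) are hypothetical and are not necessarily the names that interpollen uses.

```r
# Hypothetical wide table: one Date column, one numeric column per pollen type.
wide <- data.frame(
  date = as.Date("2015-04-01") + 0:2,
  Betula = c(12, 15, 18),
  Poaceae = c(3, 4, 5)
)

# Hypothetical flags: TRUE where a value was interpolated.
flag <- data.frame(
  Betula = c(FALSE, TRUE, FALSE),
  Poaceae = c(FALSE, FALSE, TRUE)
)

types <- names(wide)[-1]

# Stack the pollen columns: date, pollen type, concentration, 1/0 flag.
long <- data.frame(
  date = rep(wide$date, times = length(types)),
  type = rep(types, each = nrow(wide)),
  value = unlist(wide[types], use.names = FALSE),
  interpolated = as.integer(unlist(flag[types], use.names = FALSE))
)

print(long)
```

Each row of the long table describes one day and one pollen type, so the interpolated flag can be used directly to filter or colour values.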
This function interpolates missing data in a pollen database using 5 different methods, which are described below. For each pollen type, interpolation is carried out automatically for gaps no longer than the maxdays argument.
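The gap-length rule can be sketched in base R with rle(). This is only an illustration of the idea on a hypothetical vector; how interpollen detects gaps internally is not documented here.

```r
# Hypothetical daily series with a 2-day gap and a 4-day gap.
conc <- c(4, NA, NA, 7, 9, NA, NA, NA, NA, 12)
maxdays <- 3

# Run-length encoding of the NA mask gives one run per gap.
runs <- rle(is.na(conc))
ends <- cumsum(runs$lengths)
starts <- ends - runs$lengths + 1

gaps <- data.frame(start = starts, end = ends, length = runs$lengths)
gaps <- gaps[runs$values, ]              # keep only the runs of missing days

# Gaps longer than maxdays are left untouched.
gaps$interpolate <- gaps$length <= maxdays
print(gaps)
```

With maxdays = 3, the first gap (days 2 to 3) qualifies for interpolation and the second (days 6 to 9) does not.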
"lineal"
method. The interpolation will be carried out by tracing a straight line between the gap extremes.
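A minimal sketch of the straight-line idea, using approx() from base R on a hypothetical vector. It illustrates the technique and is not taken from the package source.

```r
# Hypothetical daily concentrations with a three-day gap (NA).
conc <- c(10, 12, NA, NA, NA, 20, 18)
days <- seq_along(conc)
known <- !is.na(conc)

# approx() joins the known points with straight lines and evaluates them
# at every day, so observed values are returned unchanged.
filled <- approx(x = days[known], y = conc[known], xout = days)$y
print(filled)
# 10 12 14 16 18 20 18
```

The gap extremes are 12 and 20, four days apart, so each missing day rises by 2.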
"movingmean"
method. It calculates the moving mean of the daily pollen concentrations with a window size equal to the gap size multiplied by the factor argument, and replaces the missing data with the moving mean for those days. The window is dynamic: for each gap in the database, the window size of the moving mean changes depending on the gap size.
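The dynamic window can be sketched as follows. The series is hypothetical and contains a single gap, and the centring of the window on each missing day is an assumption, since the text above only fixes the window size.

```r
# Hypothetical daily series with one two-day gap.
conc <- c(8, 10, 12, NA, NA, 16, 18, 20, 22)
factor <- 2

gap <- which(is.na(conc))            # days 4 and 5
window <- length(gap) * factor       # gap size x factor = 4 days

filled <- conc
half <- window %/% 2
for (i in gap) {
  # Assumption: the window is centred on the missing day and clipped to the
  # series, so it spans the day itself plus `half` days on each side.
  idx <- max(1, i - half):min(length(conc), i + half)
  filled[i] <- mean(conc[idx], na.rm = TRUE)
}
print(filled)
```

A longer gap would produce a proportionally wider window, which is what makes the moving mean adapt to each gap.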
"spline"
method. The interpolation will be carried out by fitting a spline regression to the days before and after the gap. The number of days on each side of the gap used for the spline regression is specified by the ndays argument. The smoothness of the fit can be adjusted with the spar argument.
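As a rough sketch of this step, the example below fits smooth.spline() from base R to ndays values on each side of a gap. Using smooth.spline() is an assumption made for illustration; the data are hypothetical.

```r
# Hypothetical daily series with a two-day gap.
conc <- c(5, 9, 14, 20, NA, NA, 26, 21, 15, 10)
ndays <- 3
spar <- 0.5

gap <- which(is.na(conc))                       # days 5 and 6
before <- (min(gap) - ndays):(min(gap) - 1)     # days 2 to 4
after <- (max(gap) + 1):(max(gap) + ndays)      # days 7 to 9
train <- c(before, after)

# Fit the smoothing spline on the surrounding days only, then predict
# the missing days from it.
fit <- smooth.spline(x = train, y = conc[train], spar = spar)
filled <- conc
filled[gap] <- predict(fit, x = gap)$y
print(filled)
```

Larger spar values give a smoother curve that follows the individual surrounding days less closely.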
"tseries"
method. The interpolation will be carried out by analysing the time series of the pollen database. It performs a seasonal-trend decomposition based on LOESS (Cleveland et al., 1990). The seasonality of the historical database is extracted and used to predict the missing data by performing a linear regression with the target year.
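The two stages (decomposition, then regression on the seasonal component) can be sketched with stl() and lm() from base R. The synthetic series, the shortened 30-day "year" and the linear bridging of the gap before calling stl() are all assumptions made to keep the example small and runnable.

```r
set.seed(1)
period <- 30                         # hypothetical "year" length, kept short
years <- 4
day <- seq_len(period * years)

# Synthetic series: one seasonal peak per period plus noise.
season <- 50 * pmax(0, sin(2 * pi * (day %% period) / period))
conc <- pmax(season + rnorm(length(day), sd = 3), 0)

gap <- 100:104                       # missing days in the last period
obs <- conc
obs[gap] <- NA

# stl() does not accept NA, so the gap is bridged linearly first (assumption).
known <- !is.na(obs)
bridged <- approx(day[known], obs[known], xout = day)$y

# Seasonal-trend decomposition based on LOESS; keep the seasonal component.
dec <- stl(ts(bridged, frequency = period), s.window = "periodic")
seasonal <- as.numeric(dec$time.series[, "seasonal"])

# Linear regression of the target period on the seasonality
# (lm() drops the rows where the observation is missing).
target <- (period * (years - 1) + 1):(period * years)
fit <- lm(y ~ s, data = data.frame(y = obs[target], s = seasonal[target]))

filled <- obs
filled[gap] <- predict(fit, newdata = data.frame(s = seasonal[gap]))
print(round(filled[gap], 1))
```

The regression rescales the historical seasonal shape to the level of the target year, which is why the method does not depend on the length of the gap.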
"neighbour"
method. Nearby stations provided by the user are used to interpolate the missing data of the target station. First, a Spearman correlation is computed between the target station and each neighbour station, and neighbour stations with a correlation coefficient smaller than the mincorr value are discarded. For each gap, a linear regression is performed between the neighbour stations and the target station to determine the equation that converts the pollen concentrations of the neighbour stations into the pollen concentration of the target station. Only neighbour stations without any missing data during the gap period are taken into account for each gap.
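The selection and regression steps can be sketched in base R as follows. The two neighbour stations (near and far) and all values are simulated for illustration; the sketch follows the description above rather than the package source.

```r
set.seed(42)
n <- 60
target <- round(runif(n, 0, 100))

# Hypothetical neighbour stations for the same pollen type.
neighbours <- data.frame(
  near = 0.8 * target + rnorm(n, sd = 5),   # closely related station
  far = runif(n, 0, 100)                    # unrelated station
)

gap <- 20:24
target_obs <- target
target_obs[gap] <- NA
mincorr <- 0.6

# Step 1: Spearman correlation with the target; drop weak neighbours.
rho <- sapply(neighbours, function(x) {
  cor(x, target_obs, method = "spearman", use = "complete.obs")
})
keep <- names(rho)[rho >= mincorr]

# Step 2: keep only neighbours with complete data inside the gap.
complete <- sapply(neighbours[gap, keep, drop = FALSE], function(x) !anyNA(x))
keep <- keep[complete]

# Step 3: regress the target on the remaining neighbours, then predict the gap.
train <- cbind(y = target_obs, neighbours[keep])
fit <- lm(y ~ ., data = train)              # rows with missing y are dropped
filled <- target_obs
filled[gap] <- predict(fit, newdata = train[gap, , drop = FALSE])

print(keep)
print(round(filled[gap], 1))
```

Here only the near station passes the mincorr filter, so the gap is filled from a single-predictor regression.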
Cleveland RB, Cleveland WS, McRae JE, Terpenning I (1990) STL: a seasonal-trend decomposition procedure based on loess. J Off Stat 6(1):3-33.
# NOT RUN {
data("munich_pollen")
interpollen(munich_pollen, method = "lineal", plot = FALSE)
# }