The top and bottom percentiles of selected variables, as calculated by percdata, can be plotted by percplot, which offers a vivid check for possible outliers. It uses reshape2::melt or dataprep::melt to melt the data, and ggplot2 and scales to plot it.
percplot(data, start = NULL, end = NULL, group = NULL, ncol = NULL,
diff = 0.1, part = 'both')
data: A data frame to calculate percentiles for, from the column start to the column end.
start: The column number of the first variable to calculate percentiles for.
end: The column number of the last variable to calculate percentiles for.
group: The column number of the grouping variable. Use it when the data needs to be processed in groups. If grouping is not required, leave it at the default (NULL); if grouping is required, set group to the column number (position) of the grouping variable. If there is more than one grouping variable, combine them into a single grouping column in advance.
ncol: The total number of columns of the plot.
diff: The common difference between quantile's probs. Default is 0.1.
part: Which percentiles (parts) to plot: bottom, top, or both. Default is 'both' (or 2) for both the bottom and top parts. Set it to 'bottom' or 0 for the bottom part only, or to 'top' or 1 for the top part only.
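When more than one grouping variable is needed, the variables can be merged into a single column before calling percplot. The following is a minimal base-R sketch; the data frame df and its columns site, season, v1 and v2 are hypothetical and do not come from dataprep.

```r
# Hypothetical toy data: two grouping columns followed by two measured variables
df <- data.frame(
  site   = c("A", "A", "B", "B"),
  season = c("winter", "summer", "winter", "summer"),
  v1     = c(1.2, 3.4, 2.2, 5.1),
  v2     = c(10, 20, 15, 30)
)

# Combine the two grouping variables into one column
df$grp <- paste(df$site, df$season, sep = "_")

# Keep only the combined group and the measured variables,
# so the column positions are easy to read off
df <- df[, c("grp", "v1", "v2")]

# Column number of the combined grouping variable
group_col <- which(names(df) == "grp")

# The call would then look like this (not run here):
# percplot(df, start = 2, end = 3, group = group_col)
```

Using paste keeps the combined group readable in plot labels; interaction(df$site, df$season) would be an alternative if a factor is preferred.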
Top (highest or greatest) and bottom (lowest or smallest) percentiles are plotted.
Quantile with probs = 0
Quantile with probs = 0.001
Quantile with probs = 0.002
Quantile with probs = 0.003
Quantile with probs = 0.004
Quantile with probs = 0.005
Quantile with probs = 0.995
Quantile with probs = 0.996
Quantile with probs = 0.997
Quantile with probs = 0.998
Quantile with probs = 0.999
Quantile with probs = 1
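The probabilities listed above follow from diff: with the default diff = 0.1, consecutive percentiles are 0.1 percentage points apart, which is 0.001 on the probs scale. The sketch below reproduces the same probabilities with base R's quantile. The variable names and the simulated vector x are illustrative assumptions, and this is not necessarily how percdata computes the values internally.

```r
diff_pct <- 0.1   # common difference, in percentage points (the default of diff)
steps    <- 0:5   # six percentiles on each side, as in the list above

# Bottom part: 0, 0.001, ..., 0.005
bottom_probs <- steps * diff_pct / 100
# Top part: 0.995, ..., 0.999, 1
top_probs <- 1 - rev(steps) * diff_pct / 100

# Hypothetical skewed variable standing in for one column of the data
set.seed(1)
x <- rlnorm(1e4)

bottom <- quantile(x, probs = bottom_probs, na.rm = TRUE)
top    <- quantile(x, probs = top_probs, na.rm = TRUE)

print(bottom)
print(top)
```

The first bottom quantile equals the minimum of x and the last top quantile equals its maximum, so isolated extreme values show up as jumps at the ends of the two sequences.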
Four scenarios are considered according to the scales of the x and y axes, namely the ranges of the x and y values. For example, the code sd(diff(log(as.numeric(as.character(names(data[, start:end])))))) / mean(diff(log(as.numeric(as.character(names(data[, start:end])))))) < 0.1 & max(data[, start:end], na.rm = T) / min(data[, start:end], na.rm = T) >= 10^3 means that the coefficient of variation of the lagged differences of log(x) is below 0.1 and, at the same time, the maximum y is at least 1000 times the minimum y.
1. Example data is from https://smear.avaa.csc.fi/download. It includes particle number concentrations in SMEAR I Varrio forest.
2. Wickham, H. 2007. Reshaping data with the reshape package. Journal of Statistical Software, 21(12):1-20.
3. Wickham, H. 2009. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. http://ggplot2.org.
4. Wickham, H. 2016. ggplot2: elegant graphics for data analysis. Springer-Verlag New York.
5. Wickham, H. 2017. scales: Scale Functions for Visualization. 0.5.0 ed. https://github.com/hadley/scales.
6. Wickham, H. & Seidel, D. 2019. scales: Scale Functions for Visualization. R package version 1.1.0. https://CRAN.R-project.org/package=scales.
dataprep::percdata and dataprep::melt
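# NOT RUN {
library(dataprep)
# Plot percentiles of columns 5 to 65 of data, grouped by column 4
percplot(data, 5, 65, 4)
# Plot percentiles of columns 3 to 7 of data1, grouped by column 2
percplot(data1, 3, 7, 2)
# }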