Estimates the probability density function for a data sample.
estimatePDF(sample, pdfLength = NULL, estimationPoints = NULL,
lowerBound = NULL, upperBound = NULL, target = 70, lagrangeMin = 1,
lagrangeMax = 200, debug = 0, outlierCutoff = 7, smooth = TRUE)
returns true if the pdf calculated is not considered an acceptable estimate of the data according to the scoring function.
represents the quality of the solution returned. Values of 40 to 70 indicate high confidence in the estimate. Values less than 5 are considered to be of poor quality. For more information on scoring see the referenced publication.
estimated range of density data
estimated probability density function
estimated cumulative distribution function (CDF)
scaled quantile residual. Provides a sample-size invariant measure of the fluctuations in the estimate.
length of the returned scaled quantile residual. In most cases, this is the size of the input sample. Exceptions occur when outliers are detected or when the failedSolution flag is true.
values of the Lagrange multipliers. Can be used to reproduce the expansions for an analytical solution.
inverse of cdf for the sample.
sample: the data sample from which to calculate the density estimate. If the sample has more than 1 column, the multivariate estimation function, estimatePDFmv(), is called instead.
pdfLength: the desired length of the estimate returned. The default value is calculated from the sample length. Overriding this calculation can increase or decrease the resolution of the estimate.
estimationPoints: a vector containing the points to estimate. If not specified, this is calculated automatically to span the entire sample data.
lowerBound: the lower bound of the PDF, if known. The default value is calculated from the range of the data sample.
upperBound: the upper bound of the PDF, if known. The default value is calculated from the range of the data sample.
target: a value from 1 to 100 representing the desired confidence percentage for the estimate score. The default of 70% represents the most likely score based on empirical simulations. A lower value may smooth estimates. A higher value tends to overfit to the sample and is not recommended.
lagrangeMin: minimum number of Lagrange multipliers
lagrangeMax: maximum number of Lagrange multipliers
debug: verbose output printed to console
outlierCutoff: outliers are automatically detected and removed according to the formula: < Q1 - outlierCutoff * IQR; or > Q3 + outlierCutoff * IQR, where Q1, Q3, and IQR represent the first quartile, third quartile, and inter-quartile range, respectively. Setting outlierCutoff = 0 turns off outlier detection.
smooth: minimizes noise in estimates, particularly in areas of low data density
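The outlier rule described for outlierCutoff can be illustrated with a short base-R sketch. This is not the package's internal code: the helper name flagOutliers is hypothetical, and the use of R's default quantile definition is an assumption, so the exact cut points may differ slightly from those used by estimatePDF().

```r
# Hypothetical helper illustrating the documented rule:
# flag x < Q1 - outlierCutoff * IQR or x > Q3 + outlierCutoff * IQR.
flagOutliers <- function(x, outlierCutoff = 7) {
  # outlierCutoff = 0 turns detection off, as the documentation states
  if (outlierCutoff == 0) return(rep(FALSE, length(x)))
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # Q1 and Q3 (default quantile type, an assumption)
  iqr <- q[2] - q[1]
  x < q[1] - outlierCutoff * iqr | x > q[2] + outlierCutoff * iqr
}

set.seed(1)
x <- c(rnorm(200), 50)        # 200 standard-normal points plus one planted extreme value
isOutlier <- flagOutliers(x)  # default cutoff of 7, matching the usage signature
kept <- x[!isOutlier]         # the sample that would remain after removal
```

With the default cutoff of 7 only very extreme points are flagged, so typical samples pass through unchanged.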
Jenny Farmer, Donald Jacobs
A nonparametric density estimator based on the maximum-entropy method. Accurately predicts a probability density function (PDF) for random data using a novel iterative scoring function to determine the best fit without overfitting to the sample.
Farmer, J. and D. Jacobs (2018). "High throughput nonparametric probability density estimation." PLoS One 13(5): e0196937.
# Estimate the density of a standard normal sample of 1000 points using default parameters
sampleSize = 1000
sample = rnorm(sampleSize, 0, 1)
dist = estimatePDF(sample)
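When the support of the data is known, the documented lowerBound and upperBound arguments can be supplied instead of letting them be inferred from the sample range. The sketch below assumes that estimatePDF() is provided by a package named PDFEstimator (the page itself does not name the package) and uses only arguments that appear in the usage signature. The call is skipped if that package is not installed.

```r
# Sketch: pass a known lower bound for non-negative data.
# Assumption: estimatePDF() is exported by a package named PDFEstimator.
set.seed(42)
expSample <- rexp(500, rate = 1)  # exponential data, supported on [0, Inf)

if (requireNamespace("PDFEstimator", quietly = TRUE)) {
  est <- PDFEstimator::estimatePDF(expSample, lowerBound = 0)
  str(est)  # inspect the components of the returned estimate
}
```

Leaving upperBound at its default lets the estimator choose the upper end of the range from the sample.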