rfPermute
Description
rfPermute estimates the significance of importance metrics for a Random Forest model by permuting the response variable. It will produce null distributions of importance metrics for each predictor variable and p-values of observed importances. The package also includes several summary and visualization functions for randomForest and rfPermute results. See rfPermuteTutorial() in the package for a guide on running, summarizing, and diagnosing rfPermute and randomForest models.
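The permutation idea described above can be illustrated with a minimal sketch that uses only the randomForest package. This is a conceptual illustration, not rfPermute's internal code: the variable names (`obs_imp`, `null_imp`, `n_rep`) are hypothetical, the number of permutations is kept small for speed, and the `(count + 1) / (n_rep + 1)` p-value formula is a common convention that is assumed here rather than taken from the package.

```r
# Conceptual sketch only: NOT rfPermute's internal implementation.
library(randomForest)

set.seed(1)
aq <- na.omit(airquality)

# Observed importance (%IncMSE) from a model fit to the real response
obs_rf <- randomForest(Ozone ~ ., data = aq, ntree = 100, importance = TRUE)
obs_imp <- importance(obs_rf, type = 1)[, 1]

# Null distribution: refit the model after shuffling the response,
# which breaks any real association with the predictors
n_rep <- 20  # hypothetical small value to keep the sketch fast
null_imp <- replicate(n_rep, {
  perm <- aq
  perm$Ozone <- sample(perm$Ozone)
  rf <- randomForest(Ozone ~ ., data = perm, ntree = 100, importance = TRUE)
  importance(rf, type = 1)[, 1]
})
# null_imp is a (predictors x n_rep) matrix of null importances

# One-sided permutation p-value per predictor, counting the observed
# value as one draw from the null (assumed convention)
p_val <- (rowSums(null_imp >= obs_imp) + 1) / (n_rep + 1)
p_val
```

A predictor whose observed importance is rarely matched by the permuted fits gets a small p-value. In practice, far more permutations than 20 would be needed for stable estimates.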
Contact
- submit suggestions and bug-reports: https://github.com/ericarcher/rfPermute/issues
- send a pull request: https://github.com/ericarcher/rfPermute/
- e-mail: eric.archer@noaa.gov
Installation
To install the stable version from CRAN:
install.packages('rfPermute')
To install the latest version from GitHub:
# make sure you have devtools installed
if (!require('devtools')) install.packages('devtools')
# install from GitHub
devtools::install_github('EricArcher/rfPermute')
Current Functions
Variable importance p-value estimation, summary, and visualization
- rfPermute: Estimate Permutation p-values for Random Forest Importance Metrics
- importance: Extract rfPermute Importance Scores and p-values
- plotNull: Plot Random Forest Importance Null Distributions
- plotImpPreds: Distribution of Important Variables
Random Forest model summary
- summary: Summarize rfPermute and randomForest models
- confusionMatrix: Confusion Matrix
- casePredictions: Return predictions and votes for training cases
- pctCorrect: Percent Correctly Classified
Random Forest model visualization and diagnostics
- plotInbag: Distribution of sample inbag rates
- plotPredictedProbs: Distribution of prediction assignment probabilities
- plotProximity: Plot Random Forest Proximity Scores
- plotTrace: Trace of cumulative error rates in forest
- plotVotes: Vote Distribution
Miscellaneous functions
- combineRP: Combine rfPermute models
- balancedSampsize: Balanced Sample Size
- cleanRFdata: Clean Random Forest Input Data
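A typical session with the functions listed above might look like the following sketch. It uses only function names that appear in this README. The `num.rep` argument (number of permutations) is an assumption about the v2.5 interface, since older versions used `nrep`; check `?rfPermute` for the installed version. Loading randomForest alongside rfPermute is also an assumption made so that the `importance` generic is available.

```r
# Hedged usage sketch; argument names may differ between package versions.
library(randomForest)
library(rfPermute)

set.seed(1)
aq <- na.omit(airquality)

# Fit the model and build null distributions of importance.
# num.rep is assumed here (older versions used nrep).
rp <- rfPermute(Ozone ~ ., data = aq, ntree = 100, num.rep = 50)

# Model summary, importance scores with p-values, and null distributions
summary(rp)
importance(rp)
plotNull(rp)
```

The number of permutations is kept low here so the example runs quickly.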
Changelog
version 2.5.5 (on CRAN)
- move of package to SWFSC/rfPermute as main GitHub repository
version 2.5.4 (on CRAN)
- fixed print.rfPermute output for regression models.
version 2.5.2
- fixed bug in plotImportance heatmap to now properly choose top rather than bottom n predictors.
- update package documentation for CRAN
version 2.5.1
- added pct.correct argument to plotTrace(). Default is now to have y-axis as 1 - OOB error rate.
version 2.5
NOTE: v2.5 is a large redevelopment of the package. The structure of rfPermute model objects has changed, making them incompatible with previous versions. Also, the names and functionality of several functions have changed to make them more consistent with one another.
A tutorial (under construction) is available within the package as rfPermuteTutorial().
version 2.2 (on CRAN)
- moved value of OOB expected error rate to end of output vector in exptdErrRate
- changed default of threshold argument in classConfInt and confusionMatrix to NULL
- added new grouping and labelling options to proximityPlot()
- added binomial test for priors in exptdErrRate and confusionMatrix
version 2.1.81
- Fixed bug in pctCorrect
- Added casePredictions
- Updated parallel code
version 2.1.7
- Fixed bug in parallel processing code.
version 2.1.6
- Added plotConfMat, plotOOBtimes, plotRFtrace, plotInbag, and plotImpVarDist visualizations.
- Changed confusionMatrix so it will work when a randomForest model doesn't have a $confusion element, such as when the model is the result of combine-ing multiple models.
- Improved efficiency and stability of parallel processing code. Changed default value of num.cores to NULL.
version 2.1.5
- Added type argument to plotVotes to choose between area and bar charts.
- Changed plot.rfPermute to plotNull to avoid clashes and maintain functionality of randomForest::plot.randomForest.
- Changed name of proximity.plot to proximityPlot, exptd.err.rate to exptdErrRate, and clean.rf.data to cleanRFdata to make camelCase naming scheme more consistent in package.
- Changed plotNull from base graphics to ggplot2.
- Added symb.metab data set.
version 2.1.1
- Added n argument to impHeatmap.
- Added functions: classConfInt, confusionMatrix, plotVotes, pctCorrect.
version 2.0.1
- Fixed bug in plot.rfPermute that was reporting the p-value incorrectly at the top of the figure.
- Fixed multi-threading in rfPermute so it works on Windows too.
- Added impHeatmap function.
- Switched proximity.plot to use ggplot2 graphics.
version 2.0
- Fixed bug with calculation of p-values not respecting importance measure scaling (division by standard deviations). New format of output of rfPermute has separate $null.dist and $pval elements, each with results for unscaled and scaled importance measures. See ?rfPermute for more information.
- rp.importance and plot.rfPermute now take a scale argument to specify whether or not importance values should be scaled by standard deviations.
- If nrep = 0 for rfPermute, a randomForest object is returned.
version 1.9.3
- Fixed import declarations to avoid grid name clashes.
- Fixed logic error in clean.rf.data where fixed predictors were not removed.
- Fixed error in use of main argument in plot.rp.importance.
version 1.9.2
- Added this NEWS.md
- Added README.md
- Added num.cores argument to rfPermute to take advantage of multi-threading
version 1.9.1
- Added internal keyword to calc.imp.pval to keep it from indexing
- Updated imports to match new CRAN policies