
Note: a newer version (2.3.1) of this package is available.

pophelper 1.2.0

pophelper is an R package and web app to analyse and visualise population structure. pophelper currently supports output files generated by population analysis programs such as ADMIXTURE, fastSTRUCTURE, STRUCTURE and TESS. The pophelper package can be used to tabulate runs, summarise runs, estimate K using the Evanno method, export files for CLUMPP, export files for DISTRUCT and generate barplot figures. Additionally, there are functions to create spatial plots and interpolation using geographical coordinates.

For a detailed demonstration and walkthrough, refer to the online vignette. Video tutorials based on version 1.1.5 are also available. New versions and updates are shown only on the GitHub page.

Installation

You need R (> 3.1.2) installed on your system. R is open-source and freely available to download for Windows, Mac and other operating systems. First install the 'devtools' package from CRAN, then use it to install pophelper from GitHub.

#install devtools package from CRAN
install.packages('devtools',dependencies=TRUE)
library(devtools)

#install pophelper package from GitHub
install_github('royfrancis/pophelper')

#load library for use
library(pophelper)

pophelper 1.2.0 has been tested using R version 3.3.1 on Windows 7 64bit, Mac OS X 10.10.1 (Yosemite) 64bit, Linux Ubuntu 16.04.1 and Scientific Linux 6.7 (R 3.3.0). Note that pophelper 1.2.0 includes binary executables for CLUMPP and DISTRUCT.
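
Before installing, it can help to confirm that the local setup meets the requirements stated above. The following is a minimal sketch using base R only; the variable names r_ok and has_devtools are illustrative and not part of pophelper.

```r
# Check that the running R is newer than 3.1.2, as required above
r_ok <- getRversion() > "3.1.2"

# Check whether devtools is already installed, without attaching it
has_devtools <- requireNamespace("devtools", quietly = TRUE)

if (!r_ok) {
  message("R ", getRversion(), " is too old; pophelper needs R > 3.1.2")
} else if (!has_devtools) {
  message("Install devtools first: install.packages('devtools')")
} else {
  message("Ready to run install_github('royfrancis/pophelper')")
}
```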

Web App

An online interactive version of pophelper is now available at pophelper.com.

List of Functions

For help on any function, use
?tabulateRunsStructure
?evannoMethodStructure

For functions where one or more files need to be selected, the selection can be performed interactively. Windows users can use choose.files(multi=TRUE). Mac users can use file.choose() for single selection and tk_choose.files() from the tcltk package for multiple selection.
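
The choice between these selectors can be wrapped in a small helper. The sketch below is an assumption for illustration: select_run_files() is a hypothetical function, not part of pophelper. It also adds a non-interactive fallback based on list.files() for use in scripts.

```r
# Hypothetical helper (not part of pophelper): pick run files interactively
# in an interactive session, otherwise list every file in a directory.
select_run_files <- function(dir = ".", interactive_mode = interactive()) {
  if (!interactive_mode) {
    # Non-interactive fallback: full paths of all regular files in `dir`
    files <- list.files(dir, full.names = TRUE)
    return(files[!file.info(files)$isdir])
  }
  if (.Platform$OS.type == "windows") {
    # choose.files() exists only in Windows builds of R (utils package)
    return(utils::choose.files(multi = TRUE))
  }
  # tk_choose.files() comes from the tcltk package bundled with R
  tcltk::tk_choose.files(multi = TRUE)
}
```

The returned character vector of file paths is the kind of input that the file-based functions listed below expect.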

tabulateRunsStructure()   # Get a tabulation of several STRUCTURE files
summariseRunsStructure()  # Summarise runs by repeats for each K
evannoMethodStructure()   # Perform the Evanno method on summarised data
runsToDfStructure()       # Convert STRUCTURE run files to R dataframe
clumppExportStructure()   # Export files for use with CLUMPP

collectRunsTess()         # Collect TESS output from multiple directories into one
tabulateRunsTess()        # Get a tabulation of several TESS files
summariseRunsTess()       # Summarise runs by repeats for each K
runsToDfTess()            # Convert TESS run files to R dataframe
clumppExportTess()        # Export files for use with CLUMPP

tabulateRunsMatrix()      # Get a tabulation of several MATRIX files
summariseRunsMatrix()     # Summarise runs by repeats for each K
runsToDfMatrix()          # Convert MATRIX run files to R dataframe
clumppExportMatrix()      # Export files for use with CLUMPP

collectClumppOutput()     # Collect CLUMPP output into a common directory
plotRuns()                # Plot a barplot from STRUCTURE/TESS/Matrix/table files
plotMultiline()           # Plot a multi-line barplot from STRUCTURE/TESS/Matrix/table file
distructExport()          # Export files for DISTRUCT from STRUCTURE/TESS/Matrix/table file

plotRunsInterpolate()     # Spatially interpolate clusters from a STRUCTURE/TESS run file
plotRunsSpatial()         # Cluster by max assignment and plot points spatially

analyseRuns()             # A wrapper function to quickly tabulate, summarise, perform the Evanno method,
                          # export files for CLUMPP and generate barplots from STRUCTURE or TESS run files.

Functions and workflow


Fig 1. Workflow for STRUCTURE files. Files/objects are indicated in black text and functions are indicated in blue. The analyseRuns() function is a wrapper function which can be used to run several functions together. This is indicated by the asterisk and the orange path. For clumpp results, the clumpp executable must be run to continue with the workflow.


Fig 2. Workflow for TESS files. Files/objects are indicated in black text and functions are indicated in blue. The analyseRuns() function is a wrapper function which can be used to run several functions together. This is indicated by the asterisk and the orange path. For clumpp results, the clumpp executable must be run to continue with the workflow.


Fig 3. Workflow for MATRIX files. Files/objects are indicated in black text and functions are indicated in blue. The analyseRuns() function is a wrapper function which can be used to run several functions together. This is indicated by the asterisk and the orange path. For clumpp results, the clumpp executable must be run to continue with the workflow.


Fig 4. Plots from Evanno Method.
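
The quantities shown in the Evanno plots can be reproduced by hand. The sketch below is a simplified base R version that works on the mean and standard deviation of L(K) per K; it is not pophelper's implementation, evanno_delta_k() is a hypothetical name, and the input numbers are made up for illustration.

```r
# Hypothetical summary table: one row per K with the mean and standard
# deviation of L(K) over repeats. Values are invented for illustration.
runs <- data.frame(
  k      = 1:5,
  mean_l = c(-5000, -4200, -4100, -4080, -4070),
  sd_l   = c(2, 4, 10, 20, 25)
)

# Assumes rows are sorted by K and that K values are consecutive
evanno_delta_k <- function(k, mean_l, sd_l) {
  # First-order rate of change: L'(K) = L(K) - L(K-1); undefined for the first K
  d1 <- c(NA, diff(mean_l))
  # Second-order rate of change: |L''(K)| = |L'(K+1) - L'(K)|; undefined at both ends
  d2 <- c(abs(diff(d1)), NA)
  # Delta K = |L''(K)| / sd(L(K))
  data.frame(k = k, d1 = d1, d2 = d2, delta_k = d2 / sd_l)
}

res <- evanno_delta_k(runs$k, runs$mean_l, runs$sd_l)
# K with the largest Delta K (NA values are ignored by which.max)
best_k <- res$k[which.max(res$delta_k)]
```

Delta K is undefined for the smallest and largest K, which is why at least three consecutive values of K are needed.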


Fig 5. Left: Single run plotted separately without pop labels (top) and with pop labels (bottom). Right: Two runs joined together in one image without pop labels (top) and with pop labels (bottom).


Fig 6. Left: Single run plotted separately with pop labels sorted by one cluster (top) and sorted by all clusters (bottom). Right Top: Single run with pop labels reordered. Pop B before Pop A. Right Bottom: Two runs joined together in one image with pop labels reordered and individuals sorted by all clusters.
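
The two sorting modes in Fig 6 can be illustrated on a small table of assignment probabilities. This is a base R sketch, not pophelper code. In particular, the "all clusters" ordering below (group individuals by their dominant cluster, then sort within each group) is an assumed interpretation, and the data are invented.

```r
# Hypothetical assignment probabilities for four individuals and K=2
q <- data.frame(
  Cluster1 = c(0.10, 0.70, 0.55, 0.20),
  Cluster2 = c(0.90, 0.30, 0.45, 0.80),
  row.names = c("ind1", "ind2", "ind3", "ind4")
)

# Sort by one cluster: increasing membership in Cluster1
by_one <- q[order(q$Cluster1), ]

# Sort by all clusters (assumed rule): group by dominant cluster,
# then order by decreasing membership within each group
dominant <- max.col(as.matrix(q), ties.method = "first")
top <- apply(as.matrix(q), 1, max)
by_all <- q[order(dominant, -top), ]
```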


Fig 7. Left: Combined files (Three STRUCTURE runs for K=4). Middle: Aligned files (Three STRUCTURE runs for K=4 aligned using CLUMPP). Right: Merged file (Three runs for K=4 merged into one table/figure using CLUMPP).
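
Conceptually, a merged file is the element-wise mean of the aligned runs. The base R sketch below shows only that averaging step on invented data with K=2 for brevity; it assumes the runs are already aligned (cluster columns in the same order) and does not perform the alignment that CLUMPP does.

```r
# Three hypothetical aligned runs: rows are individuals, columns are clusters
run1 <- matrix(c(0.90, 0.10,
                 0.20, 0.80), ncol = 2, byrow = TRUE)
run2 <- matrix(c(0.80, 0.20,
                 0.30, 0.70), ncol = 2, byrow = TRUE)
run3 <- matrix(c(0.85, 0.15,
                 0.25, 0.75), ncol = 2, byrow = TRUE)
runs <- list(run1, run2, run3)

# Element-wise mean across runs; each row still sums to 1
merged <- Reduce(`+`, runs) / length(runs)
```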


Fig 8. Left: plotMultiline default output. Middle Left: Modified output where samples per line and lines per page were defined manually. Middle Right: Individuals sorted by cluster 1. Right: Individuals sorted by all clusters.

Fig 9. Multiline plots with (left) standard colours, (middle) rich.colors() from gplots package and (right) brewer.pal(8,"Spectral") from RColorBrewer package.

Fig 10. Interpolated plot of one TESS run file containing 6 clusters (K=6). The default interpolation algorithm (method) used was kriging. In this particular case, it is known that K=2, therefore only cluster 2 has useful information.

Fig 11. Interpolated plot of one cluster (Cluster 2) of one TESS run file containing 6 clusters (K=6) showing different interpolation methods. Row 1 from left: bilinear, bicubic and Inverse distance weighting. Row 2 from left: Nearest neighbour and Kriging.
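
To give a feel for what one of these methods does, here is a minimal base R sketch of inverse distance weighting. It is written from the general definition of the method and is not the code used by plotRunsInterpolate(); the function idw(), the power parameter and the sample data are all assumptions for illustration.

```r
# Inverse distance weighting: estimate z at (x0, y0) as a weighted mean of
# the sampled values, with weights 1 / distance^power
idw <- function(x, y, z, x0, y0, power = 2) {
  d <- sqrt((x - x0)^2 + (y - y0)^2)
  # A target point that coincides with a sample takes that sample's value
  if (any(d == 0)) return(mean(z[d == 0]))
  w <- 1 / d^power
  sum(w * z) / sum(w)
}

# Hypothetical samples: coordinates and membership in one cluster
x <- c(0, 1, 0, 1)
y <- c(0, 0, 1, 1)
z <- c(0.9, 0.1, 0.8, 0.2)

# Equidistant from all four samples, so this is the plain mean of z
centre <- idw(x, y, z, 0.5, 0.5)

# Interpolate over a regular 5 x 5 grid
grid <- expand.grid(x = seq(0, 1, by = 0.25), y = seq(0, 1, by = 0.25))
grid$z <- mapply(function(gx, gy) idw(x, y, z, gx, gy), grid$x, grid$y)
```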

Fig 12. Interpolation plots showing some of the available colour palettes.

Fig 13. Some of the plots created using the function plotRunsSpatial().
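
plotRunsSpatial() is described above as clustering by max assignment. The assignment step itself can be sketched in base R as follows; the data are invented and this is not the package's own code.

```r
# Hypothetical assignment probabilities for three individuals and K=2
q <- data.frame(
  Cluster1 = c(0.9, 0.2, 0.4),
  Cluster2 = c(0.1, 0.8, 0.6)
)

# For each individual (row), pick the cluster with the highest probability
assigned <- colnames(q)[max.col(as.matrix(q), ties.method = "first")]
```

Each individual then carries a single cluster label, which can be mapped to a colour or symbol when points are plotted at their coordinates.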

For a detailed demonstration and description, refer to the vignette.

Disclaimer

The pophelper R package is offered free and without warranty of any kind, either expressed or implied. I will not be held liable to you for any damage arising out of the use, modification or inability to use this program. pophelper R package can be used, redistributed and/or modified freely for non-commercial purposes subject to the original source being properly cited. Licensed under GPL-3. Please make sure you verify all your results by eye.

Contact

If you have any comments, suggestions, corrections or ideas on ways to improve or extend this package, feel free to contact me. In case of issues, you can also add a comment to the issues section on GitHub. Preferred contact email is roy[dot]m[dot]francis[at]outlook[dot]com. If you don't receive a reply from me in 48 hours, consider sending an email to roy[dot]francis[at]ebc[dot]uu[dot]se as well.

2016 Roy M Francis

Version: 1.2.0
License: GPL-3
Maintainer: Roy Francis
Last Published: December 29th, 2020

Functions in pophelper (1.2.0)

determineRowsAndCols    # Internal: Determine rows and columns for arbitrary number of plots
distructColours         # Internal: Vector of 90 Distruct colours
getOS                   # Internal: Find current OS
getPlotParams           # Internal: Generate parameters for plots with labels
plotRunsSpatial         # Cluster by max assignment and plot points spatially
popLabels               # Internal: Handles pop subset/order
unitConverter           # Internal: Convert between dimension units
clumppExportAdmixture   # Combine ADMIXTURE runs and export files for use with software CLUMPP
clumppExportMatrix      # Combine MATRIX runs and export files for use with software CLUMPP
distructExport          # Generate input files for DISTRUCT.
ellipseCI               # Internal: ellipseCI
summariseRunsAdmixture  # Summarise ADMIXTURE runs
summariseRunsMatrix     # Summarise MATRIX runs
tabulateRunsStructure   # Tabulate STRUCTURE runs
tabulateRunsTess        # Tabulate TESS runs
summariseRunsStructure  # Summarise STRUCTURE runs
summariseRunsTess       # Summarise TESS runs
clumppExportStructure   # Combine STRUCTURE runs and export files for use with software CLUMPP
clumppExportTess        # Combine TESS runs and export files for use with software CLUMPP
runsToDfAdmixture       # Convert ADMIXTURE cluster files to R dataframe.
runsToDfMatrix          # Convert MATRIX cluster files to R dataframe.
getColours              # Internal: Get Colours
getDim                  # Internal: Get dimensions for figures.
plotRuns                # Plot STRUCTURE, TESS, MATRIX or TAB files as barplots.
plotRunsInterpolate     # Interpolate STRUCTURE, TESS or MATRIX runs spatially
tabulateRunsAdmixture   # Tabulate Admixture runs (Deprecated)
tabulateRunsMatrix      # Tabulate Matrix runs
collectClumppOutput     # Collect CLUMPP output files from multiple folders
collectRunsTess         # Collect TESS cluster run files from multiple folders
evannoMethodStructure   # Perform the Evanno method for STRUCTURE runs.
getColors               # Internal: Get Colours
analyseRuns             # Analyse STRUCTURE, TESS or MATRIX runs. Wrapper around several smaller functions.
checkRuns               # Internal: Check if a selected run is STRUCTURE, TESS, MATRIX or TAB file.
llToUtmzone             # Internal: Find UTM zone from a latitude and longitude
plotMultiline           # Plot STRUCTURE, TESS, MATRIX run files and TAB files in multiline
runsToDfStructure       # Convert STRUCTURE run files to R dataframes.
runsToDfTess            # Convert TESS cluster files to R dataframe.