strataG v2.4.905





Summaries and Population Structure Analyses of Genetic Data

A toolkit for analyzing stratified population genetic data. Functions are provided for summarizing and checking loci (haploid, diploid, and polyploid) and single-stranded DNA sequences, for calculating most population subdivision metrics, and for running external programs such as structure and fastsimcoal. The package is further described in Archer et al. (2016) <doi:10.1111/1755-0998.12559>.





strataG is a toolkit for summarizing haploid sequence and multilocus genetic data and for analyzing population structure. Specific individuals, loci, or strata can be selected using standard R '[' indexing. The package contains functions for summarizing haploid and diploid loci (e.g., allelic richness, heterozygosity, haplotypic diversity) and haploid sequences by locus and by stratum, as well as functions for computing by-site base frequencies and identifying variable and fixed sites among strata. Both overall and pairwise standard tests of population structure are available, such as PHIst, Fst, Gst, and Jost's D. If individuals are stratified according to multiple schemes, the stratification can be changed with the stratify() function and summaries or tests re-run on the new object. The package also includes wrappers for several external programs, including fastsimcoal2, STRUCTURE, and mafft, as well as multiple functions for converting data objects to and from other population genetics packages such as adegenet, pegas, and phangorn.
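Several of the population structure metrics listed above are built on expected heterozygosity. As a rough illustration only, the base-R sketch below applies the textbook formulas for expected heterozygosity, Nei's Gst, and Jost's D to made-up allele frequencies at a single locus in two equally weighted strata. It does not call strataG, and the variable names and frequencies are hypothetical; strataG's own functions work on gtypes objects and may apply sample-size corrections that are omitted here.

```r
# Illustrative sketch only: textbook formulas, not strataG's implementation.
# Hypothetical allele frequencies at one locus (rows = strata, cols = alleles)
freqs <- rbind(
  popA = c(a1 = 0.7, a2 = 0.2, a3 = 0.1),
  popB = c(a1 = 0.3, a2 = 0.5, a3 = 0.2)
)

# Expected heterozygosity within each stratum: He = 1 - sum(p_i^2)
hs.by.stratum <- 1 - rowSums(freqs^2)
hs <- mean(hs.by.stratum)

# Total expected heterozygosity from the pooled (mean) allele frequencies
p.bar <- colMeans(freqs)
ht <- 1 - sum(p.bar^2)

# Nei's Gst and Jost's D for k equally weighted strata
k <- nrow(freqs)
gst <- (ht - hs) / ht
jost.d <- ((ht - hs) / (1 - hs)) * (k / (k - 1))

# Hs = 0.54, Ht = 0.605, Gst ~ 0.1074, D ~ 0.2826
round(c(Hs = hs, Ht = ht, Gst = gst, D = jost.d), 4)
```

The two statistics differ in scale because Gst divides the between-strata component (Ht - Hs) by total heterozygosity, whereas D divides it by 1 - Hs.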


To install the stable version from CRAN:

install.packages('strataG')

To install the latest version from GitHub:

# make sure you have Rtools installed (needed on Windows to compile the package's C++ code)
if (!requireNamespace('devtools', quietly = TRUE)) install.packages('devtools')
# install from GitHub
devtools::install_github('ericarcher/strataG', build_vignettes = TRUE)


Vignettes are available on several topics:

  • Creating and manipulating gtypes ("gtypes")
  • Genotype and sequence summaries ("summaries")
  • Working with sequences ("sequences")
  • Tests of population structure ("population.structure")
  • Installing external programs ("external.programs")

To see the list of all available vignettes:

vignette(package = "strataG")

To open a specific vignette:

vignette("gtypes", "strataG")

There is also a tutorial detailing running fastsimcoal2 through strataG available through the function fscTutorial().


The paper (doi:10.1111/1755-0998.12559) is the preferred citation:

Archer, F. I., Adams, P. E. and Schneiders, B. B. (2016), strataG: An R package for manipulating, summarizing and analysing population genetic data. Mol Ecol Resour. doi:10.1111/1755-0998.12559

If desired, the current release version of the package can be cited as:

Archer, F. 2016. strataG: An R package for manipulating, summarizing and analysing population genetic data. R package version 1.0.6. Zenodo.


version 2.4.9 (devel)

  • Deleted functions: alleleFreqFormat, as.array.gtypes
  • Changed structure of gtypes object, making it no longer compatible with previous versions
  • Fixed and enhanced arlequinRead() so that it will read and parse all .arp files. Added arp2gtypes() to create a gtypes object from parsed .arp files.
  • Improved performance of several standard summary functions, most notably dupGenotypes().
  • Full rework of fastsimcoal2 wrapper.
  • Removed strataGUI().

version 2.1

  • fixed error in ldNe when missing data are present
  • added STANDARD marker type to fastsimcoal
  • added na.rm = TRUE to the calculation of mean locus summaries by strata in summary.gtypes. This avoids NaNs when there is a locus with genotypes missing for all samples.
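The effect of the na.rm change above can be seen in plain R: a locus with no genotyped samples yields a summary of 0/0, which R evaluates to NaN, and any mean that includes it is NaN unless NaN values are dropped. The per-locus values below are hypothetical and do not come from summary.gtypes.

```r
# Hypothetical per-locus heterozygosity values for one stratum; locus3 has
# no genotyped samples, so its summary is 0/0, which R evaluates to NaN
locus.het <- c(locus1 = 0.50, locus2 = 0.60, locus3 = 0 / 0)

mean.without.na.rm <- mean(locus.het)               # NaN propagates
mean.with.na.rm    <- mean(locus.het, na.rm = TRUE) # NaN is dropped: 0.55
```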
  • explicitly convert x to a data.frame in df2gtypes in case it is a data.table or tibble.

version 2.0.2 (current on GitHub)

  • NOTE: In order to speed up indexing the data in large data sets, this version changes the underlying structure of the gtypes object by replacing the @loci data.frame slot with a @data data.table slot. The data.table has an id character column, a strata character column, and every column afterwards represents one locus. The @strata slot has been removed.
  • The loci accessor has been removed.
  • Added as.array, which returns a 3-dimensional array with dimensions of [id, locus, allele].
  • The print (show) function for gtypes objects no longer shows a by-locus summary. The display was getting too slow for data sets with a large number of loci.
  • The summary function now includes by-sample results.
  • Fixed computational errors in population structure metrics due to incorrect sorting of stratification.
  • Added maf to return the minor allele frequency for each locus.
  • Added ldNe to calculate Ne.
  • Added expandHaplotypes to expand the haplotypes in a gtypes object to one sequence per individual.
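As a rough illustration of what a per-locus minor allele frequency summary reports, the base-R sketch below takes diploid genotypes stored as two allele columns per locus, pools both allele copies, ignores missing alleles, and returns the frequency of the least common allele at each locus. The column layout (loc1.1, loc1.2, ...) and the helper minor.allele.freq are hypothetical; this is neither strataG's internal @data layout nor its maf() implementation.

```r
# Hypothetical diploid genotypes: two allele columns per locus, NA = missing
geno <- data.frame(
  id     = c("s1", "s2", "s3", "s4"),
  strata = c("A", "A", "B", "B"),
  loc1.1 = c("A", "A", "G", "A"),
  loc1.2 = c("G", "A", "G", "A"),
  loc2.1 = c("C", "C", "C", NA),
  loc2.2 = c("C", "T", "C", NA),
  stringsAsFactors = FALSE
)

# Frequency of the least common allele among the non-missing allele copies
minor.allele.freq <- function(alleles) {
  counts <- table(alleles)                 # table() drops NA by default
  if (sum(counts) == 0) return(NA_real_)   # locus not genotyped at all
  freqs <- counts / sum(counts)
  if (length(freqs) < 2) return(0)         # monomorphic: no minor allele
  min(freqs)
}

# Pool both allele columns of each locus and summarize
loci <- c("loc1", "loc2")
maf.by.locus <- sapply(loci, function(loc) {
  cols <- paste0(loc, c(".1", ".2"))
  minor.allele.freq(unlist(geno[, cols]))
})
# loc1: G is 3 of 8 copies (0.375); loc2: T is 1 of 6 non-missing copies
```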

version 1.0.6

  • Added read.arlequin back. Fixed missing function error with write.arlequin.
  • Added summarizeSamples
  • Changed evanno from base graphics to ggplot2
  • Updated logic in labelHaplotypes to assign haplotypes if possible alternative site combinations match a present haplotype
  • Added Zenodo DOI
  • Added shiny app (strataGUI) for creating gtypes objects, QA/QC, and population structure analyses
  • Added type argument to structurePlot to select between area and bar charts
  • Changed haplotypeLikelihoods to sequenceLikelihoods
  • neiDa now creates haplotypes before calculating metric
  • Fixed error in writePhase that was creating improper input files for PHASE

version 1.0.5

  • Fixed error in dupGenotypes, propSharedLoci, and propSharedIDs where missing genotypes were not being properly counted.
  • Added
  • Removed gtypes2df.
  • Added arguments to as.matrix.gtypes to include id and strata columns in output.
  • Removed the jmodeltest function as this functionality is available in the modeltest function in the phangorn package.
  • Added conversion functions gtypes2phyDat and phyDat2gtypes to facilitate interoperability with the phangorn package.
  • Removed read.arlequin.
  • Added alleleNames accessor for gtypes object, which returns list of allele names for each locus.

version 1.0

  • New version with different gtypes format from previous versions. See vignettes for instructions and examples.

Functions in strataG

Name Description
Convert gtypes to data.frame or matrix
alleleSplit Split Alleles For Diploid Data
as.multidna Convert to multidna
allelicRichness Allelic Richness
LDgenepop Linkage Disequilibrium
bowhead.snp.position Bowhead Whale SNP Genotype Groups
baseFreqs Base Frequencies
arlequin Read and Write Arlequin Files
alleleFreqs Allele Frequencies
createConsensus Consensus Sequence
df2gtypes Convert a data.frame to gtypes
fscRun Run fastsimcoal
dolph.seqs Dolphin mtDNA D-loop Sequences
fscWrite Write fastsimcoal2 input files
fasta Read and Write FASTA
fixedDifferences Fixed Differences
TiTvRatio Transition / Transversion Ratio
gtypes2genind Convert Between gtypes And genind objects.
dolph.haps Dolphin mtDNA Haplotype Sequences
dloop.g Dolphin dLoop gtypes Object
dolph.msats Dolphin Microsatellite Genotypes
gtypes2loci Convert Between gtypes And loci objects.
fixedSites Fixed Sites
expandHaplotypes Expand Haplotypes
clumpp Run CLUMPP
evanno Run Evanno Method on STRUCTURE Results
bowhead.snps Bowhead Whale SNP Genotypes
neiDa Nei's Da
lowFreqSubs Low Frequency Substitutions
mRatio M ratio
msats.g Dolphin Microsatellite gtypes Object
fusFs Fu's Fs
gelato GELATo - Group ExcLusion and Assignment Test
dolph.strata Dolphin Genetic Stratification and Haplotypes
gtypes2phyDat Convert Between gtypes And phyDat objects.
heterozygosity Heterozygosity
readGenData Read Genetic Data
removeSequences Remove Sequences
numGenotyped Number of Individuals Genotyped
numAlleles Number of Alleles
freq2GenData Convert Haplotype Frequency Matrices
landscape2gtypes Convert Rmetasim landscape
jackHWE Hardy-Weinberg Equilibrium Jackknife
gtypes.accessors gtypes Accessors
structure STRUCTURE
labelHaplotypes Find and label haplotypes
Show a gtypes object
nucleotideDiversity Nucleotide Diversity
phase PHASE
structurePlot Plot STRUCTURE Results
nucleotideDivergence Nucleotide Divergence
variableSites Variable Sites
is.gtypes Test if object is gtypes
ldNe ldNe
mostDistantSequences Most Distant Sequences
iupac IUPAC Codes
popGenEqns Population Genetics Equations
Write NEXUS File for SNAPP
fscRead Read fastsimcoal output
fsc.input Input functions for fastsimcoal parameters
dupGenotypes Duplicate Genotypes
maf Minor Allele Frequencies
mostRepresentativeSequences Representative Sequences
numMissing Number Missing Data
popStructTest Population Differentiation Tests
popStructStat Population structure statistics
sfs Site Frequency Spectrum
mafft MAFFT Alignment
genepop Run GENEPOP
gtypes-class gtypes Class
sharedLoci Shared Loci
summarizeInds Individual Summaries
summarizeAll Summarize Genotypes and Sequences
strataSplit Split Strata
stratify Stratify gtypes
summarizeLoci Locus Summaries
hweTest Hardy-Weinberg Equilibrium
initialize,gtypes-method gtypes Constructor
permuteStrata Permute strata
sequence2gtypes Convert Sequences To gtypes
maverickRun Run MavericK
summarizeSeqs Sequence Summaries
sequenceLikelihoods Sequence Likelihoods
summary,gtypes-method Summarize gtypes Object
privateAlleles Private Alleles
simGammaHaps Simulate Haplotypes
propUniqueAlleles Proportion Unique Alleles
strataG-package Summaries and population structure analyses of DNA sequence genotypic data
mega Read and Write MEGA
theta Theta
writeGtypes Write gtypes
trimNs Trim N's From Sequences
tajimasD Tajima's D




License GNU General Public License
Collate strataG-package.R gtypes.class.R gtypes.accessors.R is.gtypes.R gtypes.initialize.R strataG-internal.R alleleFreqs.R alleleSplit.R allelicRichness.R as.multidna.R dupGenotypes.R heterozygosity.R labelHaplotypes.R numAlleles.R numGenotyped.R numMissing.R permuteStrata.R privateAlleles.R propUniqueAlleles.R readGenData.R removeSequences.R sharedLoci.R strataSplit.R stratify.R summarizeLoci.R summarizeInds.R summarizeSeqs.R gtypes.summary.R writeGtypes.R df2gtypes.R sequence2gtypes.R baseFreqs.R createConsensus.R iupac.R expandHaplotypes.R trimNs.R fasta.R fixedSites.R variableSites.R nucleotideDiversity.R fixedDifferences.R lowFreqSubs.R sequenceLikelihoods.R nucleotideDivergence.R popGenEqns.R freq2GenData.R gtypes2genind.R gtypes2loci.R gtypes2phyDat.R theta.R maf.R TiTvRatio.R simGammaHaps.R mostDistantSequences.R mostRepresentativeSequences.R mega.R mRatio.R neiDa.R mafft.R arlequin.R ldNe.R genepop.R hweTest.R jackHWE.R LDgenepop.R fsc.input.R fscWrite.R fscRun.R fscRead.R sfs.R structure.R structurePlot.R evanno.R clumpp.R fusFs.R tajimasD.R popStructStat.R popStructTest.R RcppExports.R gelato.R phase.R maverickRun.R summarizeAll.R landscape2gtypes.R
LazyData TRUE
VignetteBuilder knitr
LinkingTo Rcpp
Encoding UTF-8
RoxygenNote 7.0.2
NeedsCompilation yes
Packaged 2020-02-24 00:30:34 UTC; ericarcher
Repository CRAN
Date/Publication 2020-02-28 07:10:02 UTC
