join: Merge species data sets on common columns (species)

Description

Merges any number of species matrices on their common columns to create a new data set whose number of columns equals the number of unique columns across all data frames. This is needed when analysing fossil data sets with respect to training set samples.

Usage

join(…, verbose = FALSE, na.replace = TRUE, split = TRUE, value = 0,
     type = c("outer", "left", "inner"))
# S3 method for join
head(x, …)
# S3 method for join
tail(x, …)

Arguments

…

for join, data frames containing the data sets to be merged. For the head and tail methods, additional arguments to head and tail, in particular "n" to control the number of rows of each joined data set to display.

verbose

logical; if TRUE, the function prints out the dimensions of the data frames in …, as well as those of the returned, merged data frame.

na.replace

logical; where a column in one data frame has no matching column in another, the samples from the data frame lacking that column will contain missing values (NA) for it. If na.replace is TRUE, these missing values are replaced with value (0 by default). Replacing with zero is standard practice in ecology and palaeoecology. To use a different replacement, supply it via value, or set na.replace to FALSE and do the replacement later.

split

logical; should the samples of the merged data set be split back into individual data frames, but now with common columns (i.e. species)?

value

numeric; value to replace NA with if na.replace is TRUE.

type

character; type of join to perform. "outer" returns the union of the variables in the data frames to be merged, such that the resulting objects have columns for all variables found across all the data frames to be merged. "left" returns the left outer (or simply the left) join, such that the merged data frames contain the set of variables found in the first supplied data frame. "inner" returns the inner join, such that the merged data frames contain the intersection of the variables in the supplied data frames. See Details.

x

an object of class "join", usually the result of a call to join.

Value

If split = TRUE, an object of class "join", a list of data frames, with as many components as the number of data frames originally merged.

Otherwise, an object of class c("join", "data.frame"), a data frame containing the merged data sets.

head.join and tail.join return a list, each component of which is the result of a call to head or tail on each data set component of the joined object.
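The behaviour described for the head and tail methods can be pictured with base R alone. The sketch below uses a plain list of two toy data frames (the names joined, train and core are hypothetical) in place of a real "join" object, and assumes the methods simply apply head or tail to each component; the actual methods may differ in detail.

```r
## a plain list of two toy data frames standing in for a "join" object
## (hypothetical data, not produced by join())
joined <- list(train = data.frame(spA = 1:6,   spB = 6:1),
               core  = data.frame(spA = 11:16, spB = 16:11))

## apply head() and tail() to each component, passing n through
first_rows <- lapply(joined, head, n = 2)
last_rows  <- lapply(joined, tail, n = 2)

## each result is a list with one truncated data frame per data set
str(first_rows)
```

The n argument is passed on to every component, so each data set is truncated to the same number of rows.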

Details

When merging multiple data frames the set of variables in the merged data can be determined via a number of routes. join currently provides three join types: the outer join, the left outer (or simply the left) join, and the inner join. Which type of join is performed is determined by the argument type.

The outer join returns the union of the set of variables found in the data frames to be merged. This means that the resulting data frame(s) contain columns for all the variables observed across all the data frames supplied for merging.
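The idea of the outer join can be sketched in base R without calling join itself. The helper pad_to and the toy data frames train and core are hypothetical names used only for illustration; this is a minimal sketch of the concept under those assumptions, not the package's implementation.

```r
## toy species matrices (hypothetical data): rows are samples, columns are species
train <- data.frame(spA = c(10, 20), spB = c(5, 0), row.names = c("t1", "t2"))
core  <- data.frame(spB = c(1, 2),   spC = c(7, 8), row.names = c("c1", "c2"))

## union of the species names across both data frames
all_vars <- union(names(train), names(core))

## hypothetical helper: add any missing columns, filled with `value`,
## and return the columns in a common order
pad_to <- function(df, vars, value = 0) {
  missing_vars <- setdiff(vars, names(df))
  for (v in missing_vars) df[[v]] <- rep(value, nrow(df))
  df[, vars, drop = FALSE]
}

train_outer <- pad_to(train, all_vars)
core_outer  <- pad_to(core, all_vars)

## both data frames now share the same columns, so they can be stacked
merged <- rbind(train_outer, core_outer)
merged
```

Filling with zero here mirrors the default described for na.replace and value; passing a different value to the helper plays the same role as the value argument.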

With the left outer join the resulting data frame(s) contain only the set of variables found in the first data frame provided.

The inner join returns the intersection of the set of variables found in the supplied data frames. The resulting data frame(s) contain only the variables common to all supplied data frames.
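The difference between the three join types comes down to which set of column names is kept. The following base R sketch works on column names only; the species codes and the list vars are hypothetical, and the code illustrates the set logic rather than how join is implemented.

```r
## column names of three toy data frames (hypothetical species codes)
vars <- list(first  = c("spA", "spB", "spC"),
             second = c("spB", "spC", "spD"),
             third  = c("spC", "spE"))

outer_vars <- Reduce(union, vars)      # every variable seen in any data frame
left_vars  <- vars[[1]]                # variables of the first data frame only
inner_vars <- Reduce(intersect, vars)  # variables shared by all data frames

outer_vars  # "spA" "spB" "spC" "spD" "spE"
left_vars   # "spA" "spB" "spC"
inner_vars  # "spC"
```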

Examples

# NOT RUN {
## load the analogue package, which provides join() and the example data
library(analogue)

## load the example data
data(swapdiat, swappH, rlgh)

## merge training and test set on columns
dat <- join(swapdiat, rlgh, verbose = TRUE)

## extract the merged data sets and convert to proportions
swapdiat <- dat[[1]] / 100
rlgh <- dat[[2]] / 100

## merge training and test set using left join
head(join(swapdiat, rlgh, verbose = TRUE, type = "left"))

## load the example data
data(ImbrieKipp, SumSST, V12.122)

## merge training and test set on columns
dat <- join(ImbrieKipp, V12.122, verbose = TRUE)

## extract the merged data sets and convert to proportions
ImbrieKipp <- dat[[1]] / 100
V12.122 <- dat[[2]] / 100

## show just the first few lines of each data set
head(dat, n = 4)

## show just the last few lines of each data set
tail(dat, n = 4)

## merge training and test set using inner join
head(join(ImbrieKipp, V12.122, verbose = TRUE, type = "inner"))

## merge training and test set using outer join and replace
## NA with -99.9
head(join(ImbrieKipp, V12.122, verbose = TRUE, value = -99.9))
# }
