PNC (version 0.1.0)

merge_dataset: Merge Two Datasets Based on Species Column

Description

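This function merges two data frames on their shared 'species' column, keeping every species and every column found in either input. Where a value is missing in one dataset, the non-missing value from the other is used. Where both datasets hold non-missing values for the same species and column, the conflict is resolved according to the priority argument.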

Usage

merge_dataset(main_data, additional_data, priority = "main")

Value

A data frame containing all unique species from both input datasets, with all columns from both datasets. The 'species' column is placed first, followed by all other columns in alphabetical order.
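
The column layout described above can be reproduced in base R. The following is a minimal sketch on a toy data frame (the object names `df` and `other_cols` are illustrative and not part of the package); whether the package sorts column names in a locale-aware way is not stated here, so treat the ordering of mixed-case names as an assumption.

```r
# Toy data frame with columns in arbitrary order (illustrative values only).
df <- data.frame(
  LeafN = 13.10,
  species = "Abies alba",
  family = "Pinaceae",
  LA = 25.58,
  stringsAsFactors = FALSE
)

# Put 'species' first, then the remaining columns sorted by name.
# sort() on character vectors follows the collation of the current locale,
# so mixed-case names such as "LA" and "LeafN" may order differently
# from one system to another.
other_cols <- sort(setdiff(names(df), "species"))
df <- df[, c("species", other_cols), drop = FALSE]

names(df)
```

If a locale-independent order is needed, `sort(x, method = "radix")` sorts character vectors in the C locale.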

Arguments

main_data

A data frame containing the primary dataset. Must include a 'species' column.

additional_data

A data frame containing the secondary dataset. Must include a 'species' column.

priority

A character string specifying how to handle conflicts when both datasets contain non-missing values for the same species and column. Options are:

  • "main" (default): Use values from main_data

  • "additional": Use values from additional_data

  • "mean": Use the mean of the two values for numeric columns; use the value from main_data for non-numeric columns
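
To make the three options concrete, here is a minimal sketch of how a single pair of values might be resolved under each setting. `resolve_value` is a hypothetical helper written for illustration; it is not exported by PNC and may differ from the package's internal logic.

```r
# Hypothetical helper (not part of PNC): resolve one pair of scalar values.
resolve_value <- function(main_val, add_val, priority = "main") {
  priority <- match.arg(priority, c("main", "additional", "mean"))

  # If either side is missing there is no conflict: return the other side.
  if (is.na(main_val)) return(add_val)
  if (is.na(add_val)) return(main_val)

  # Both values are present, so the priority setting decides.
  if (priority == "additional") return(add_val)
  if (priority == "mean" && is.numeric(main_val) && is.numeric(add_val)) {
    return(mean(c(main_val, add_val)))
  }
  # "main", or "mean" applied to non-numeric values.
  main_val
}

resolve_value(10, 20, "main")                       # 10
resolve_value(10, 20, "additional")                 # 20
resolve_value(10, 20, "mean")                       # 15
resolve_value("Pinaceae", "Rosaceae", "mean")       # "Pinaceae"
resolve_value(NA, 25.58, "main")                    # 25.58
```

Note that the priority only matters when both values are non-missing; a lone non-missing value is returned regardless of the setting.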

Details

The function performs the following operations:

  • Combines all unique species from both datasets

  • Includes all columns from both datasets

  • Fills a missing value in one dataset with the non-missing value from the other dataset, when one is available

  • Resolves conflicts based on the specified priority

  • For duplicate species within a dataset, only the first occurrence is used
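
The steps above can be sketched in base R as follows. This is an illustrative reimplementation under stated assumptions, not the PNC source code: `merge_sketch` is a hypothetical name, and the sketch assumes character (not factor) columns and that columns shared by both inputs have compatible types.

```r
# Hypothetical sketch of the documented steps; not the package's implementation.
merge_sketch <- function(main_data, additional_data, priority = "main") {
  priority <- match.arg(priority, c("main", "additional", "mean"))
  stopifnot("species" %in% names(main_data),
            "species" %in% names(additional_data))

  # Keep only the first occurrence of each species within a dataset.
  main_data <- main_data[!duplicated(main_data$species), , drop = FALSE]
  additional_data <- additional_data[!duplicated(additional_data$species), ,
                                     drop = FALSE]

  # Union of species and of non-key columns across both inputs.
  all_species <- union(main_data$species, additional_data$species)
  all_cols <- union(setdiff(names(main_data), "species"),
                    setdiff(names(additional_data), "species"))

  # Row position of each species in each input (NA when the species is absent).
  i_main <- match(all_species, main_data$species)
  i_add <- match(all_species, additional_data$species)
  n <- length(all_species)

  out <- data.frame(species = all_species, stringsAsFactors = FALSE)
  for (col in all_cols) {
    # A column that is absent from one input contributes only NA.
    m <- if (col %in% names(main_data)) main_data[[col]][i_main] else rep(NA, n)
    a <- if (col %in% names(additional_data)) additional_data[[col]][i_add] else rep(NA, n)

    # Start from the preferred side and fill its gaps from the other side.
    first <- if (priority == "additional") a else m
    second <- if (priority == "additional") m else a
    res <- first
    gap <- is.na(res)
    res[gap] <- second[gap]

    # For "mean", average numeric values that are present on both sides.
    if (priority == "mean" && is.numeric(m) && is.numeric(a)) {
      both <- !is.na(m) & !is.na(a)
      res[both] <- (m[both] + a[both]) / 2
    }
    out[[col]] <- res
  }

  # 'species' first, remaining columns sorted by name (locale-dependent).
  out[, c("species", sort(all_cols)), drop = FALSE]
}

# Toy inputs (illustrative only).
main <- data.frame(species = c("Abies alba", "Crataegus monogyna"),
                   LA = c(NA, 449.15), LeafN = c(13.10, 17.46),
                   stringsAsFactors = FALSE)
extra <- data.frame(species = c("Abies alba", "Corydalis solida"),
                    LA = c(25.58, NA), LMA = c(0.19, 0.2),
                    stringsAsFactors = FALSE)

res <- merge_sketch(main, extra)
res
```

In this toy run the missing LA for "Abies alba" is filled from the second data frame, and "Corydalis solida" appears with NA in the columns that only the first data frame provides.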

Examples

# Create sample datasets
main_data <- data.frame(
  species = c("Abies alba", "Coussapoa trinervia", "Crataegus monogyna"),
  genus = c("Abies", "Coussapoa", "Crataegus"),
  family = c("Pinaceae", "Urticaceae", "Rosaceae"),
  LA = c(NA, 2050.24, 449.15),
  LeafN = c(13.10, 14.52, 17.46),
  Seedmass = c(53.64, NA, 95.92),
  stringsAsFactors = FALSE
)

additional_data <- data.frame(
  species = c("Abies alba", "Corydalis solida"),
  genus = c("Abies", "Corydalis"),
  family = c("Pinaceae", "Papaveraceae"),
  LA = c(25.58, NA),
  LMA = c(0.19, 0.2),
  PlantHeight = c(53.66, 0.14),
  stringsAsFactors = FALSE
)

# Merge with main data priority (default)
merge_dataset(main_data, additional_data)
