dataSDA

Datasets and Basic Statistics for Symbolic Data Analysis

Overview

dataSDA collects a diverse range of symbolic datasets and provides functions for converting traditional data into symbolic data formats. It supports reading, writing, and converting symbolic data across several formats, and computes descriptive statistics for symbolic variables.

Installation

From GitHub

# install.packages("devtools")
devtools::install_github("hanmingwu1103/dataSDA")

From source

Download the latest source tarball from the Releases page, then install it, replacing the file name below with the version you downloaded:

install.packages("dataSDA_0.1.5.tar.gz", repos = NULL, type = "source")

Features

Descriptive Statistics

Interval-valued data (int_*)

Compute mean, variance, covariance, and correlation for interval-valued data with 8 methods: CM, VM, QM, SE, FV, EJD, GQ, SPT.

library(dataSDA)
data(mushroom.int)

int_mean(mushroom.int, var_name = "Pileus.Cap.Width")
int_var(mushroom.int, var_name = c("Stipe.Length", "Stipe.Thickness"), method = c("CM", "FV", "EJD"))

int_cov(mushroom.int, var_name1 = "Pileus.Cap.Width",
        var_name2 = c("Stipe.Length", "Stipe.Thickness"),
        method = c("CM", "VM", "EJD", "GQ", "SPT"))
int_cor(mushroom.int, var_name1 = "Pileus.Cap.Width",
        var_name2 = "Stipe.Length", method = "CM")
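To give a feel for how two of the method codes differ, here is a minimal base-R sketch. It assumes that CM refers to the centers method (each interval replaced by its midpoint) and VM to the vertices method (both endpoints treated as observations), which is their usual meaning in symbolic data analysis. The toy data frame and its column names are made up for illustration, and the code is not the package's implementation; in particular, the variance denominator used by int_var may differ.

```r
# Toy interval-valued variable in min-max form (made-up values).
toy <- data.frame(
  width_min = c(3, 4, 6),
  width_max = c(7, 9, 12)
)

# Centers method (assumed meaning of "CM"): use the interval midpoints.
centers <- (toy$width_min + toy$width_max) / 2
cm_mean <- mean(centers)
cm_var  <- var(centers)   # sample variance of the midpoints

# Vertices method (assumed meaning of "VM"): pool both endpoints.
vertices <- c(toy$width_min, toy$width_max)
vm_mean <- mean(vertices)
vm_var  <- var(vertices)  # also reflects the spread inside each interval
```

Both approaches give the same mean, but the vertices variance is larger because it also counts the width of each interval, which the midpoints discard.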

Histogram-valued data (hist_*)

Compute mean, variance, covariance, and correlation for histogram-valued data with methods BG and L2W; hist_cov and hist_cor also support BD and B.

library(dataSDA)
library(HistDAWass)  # provides the BLOOD histogram-valued dataset

hist_mean(HistDAWass::BLOOD, var_name = "Cholesterol", method = "BG")
hist_var(HistDAWass::BLOOD, var_name = "Cholesterol", method = "L2W")

hist_cov(HistDAWass::BLOOD, var_name1 = "Cholesterol",
         var_name2 = "Hemoglobin", method = "BD")
hist_cor(HistDAWass::BLOOD, var_name1 = "Cholesterol",
         var_name2 = "Hemoglobin", method = "BG")
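As a rough illustration of what a histogram-valued mean involves, the base-R sketch below computes the mean of a single histogram observation as the probability-weighted average of its bin midpoints, which assumes values are uniformly spread within each bin. The bin edges and probabilities are made up, and this is only a sketch of the general idea, not the code behind hist_mean.

```r
# One histogram-valued observation (made-up values):
# bin edges and the probability mass in each bin.
breaks <- c(150, 180, 210, 240)
probs  <- c(0.2, 0.5, 0.3)

# Midpoint of each bin.
mids <- (head(breaks, -1) + tail(breaks, -1)) / 2

# Mean of the observation: probability-weighted average of bin midpoints.
hist_obs_mean <- sum(probs * mids)
```

Averaging this quantity over all observations gives a sample mean for the histogram-valued variable under the same within-bin uniformity assumption.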

Data Format Conversion

| Function | Description |
| --- | --- |
| RSDA_to_MM | Convert RSDA / symbolic_tbl to MM (min-max) format |
| MM_to_iGAP | Convert MM format to iGAP format |
| iGAP_to_MM | Convert iGAP format to MM format |
| RSDA_to_iGAP | Convert RSDA format to iGAP format |
| SODAS_to_MM | Convert SODAS format to MM format |
| SODAS_to_iGAP | Convert SODAS format to iGAP format |
| RSDA_format | Convert conventional data to RSDA format |
| set_variable_format | One-hot encode set variables for RSDA format |
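The difference between the two layouts can be sketched in base R. The example assumes that an iGAP-style table stores each interval as a single "min,max" string, while MM format stores the bounds in separate numeric columns. The helper split_interval_column and the _min/_max column suffixes are hypothetical names chosen for illustration; they are not part of dataSDA, whose own converters are the functions in the table above.

```r
# Hypothetical iGAP-style table: each interval is one "min,max" string.
igap <- data.frame(
  length = c("0.28,0.66", "0.30,0.74"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split one "min,max" column into two numeric columns.
split_interval_column <- function(x, name) {
  parts <- strsplit(x, ",", fixed = TRUE)
  out <- data.frame(
    vapply(parts, function(p) as.numeric(p[1]), numeric(1)),
    vapply(parts, function(p) as.numeric(p[2]), numeric(1))
  )
  names(out) <- paste0(name, c("_min", "_max"))
  out
}

# Min-max (MM) style result: one column per bound.
mm <- split_interval_column(igap$length, "length")
```

Keeping the bounds as separate numeric columns makes ordinary column-wise arithmetic possible, which is presumably why the MM layout is a common intermediate format.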

Utilities

| Function | Description |
| --- | --- |
| clean_colnames | Clean column names of a data frame |
| write_csv_table | Write data to CSV file |

Datasets

The package includes 32 built-in datasets for symbolic data analysis:

Interval-valued datasets (symbolic_tbl class)

Abalone, Cars.int, ChinaTemp.int, age_cholesterol_weight.int, baseball.int, bird.int, blood_pressure.int, car.int, finance.int, hierarchy.int, horses.int, lackinfo.int, LoansbyPurpose.int, mushroom.int, nycflights.int, ohtemp.int, profession.int, soccer.bivar.int, veterinary.int

iGAP / data.frame datasets

Abalone.iGAP, Face.iGAP, airline_flights, airline_flights2, crime, crime2, fuel_consumption, health_insurance, health_insurance2, hierarchy, mushroom, occupations, occupations2

Dependencies

Authors

License

GPL (>= 2)

Install

install.packages('dataSDA')

Monthly Downloads

203

Version

0.1.8

Maintainer

Han-Ming Wu

Last Published

February 11th, 2026

Functions in dataSDA (0.1.8)

| Name | Title |
| --- | --- |
| credit_card.int | Credit Card Expenses Interval Dataset |
| histogram_stats | Statistics for Histogram Data |
| bird_species_extended.mix | Bird Species Extended Mixed Symbolic Dataset |
| iGAP_to_MM | iGAP to MM |
| interval_position | Position and Scale Measures for Interval Data |
| interval_geometry | Geometric Properties of Interval Data |
| int_convert_format | Convert Interval Data Format |
| int_detect_format | Detect Interval Data Format |
| iGAP_to_RSDA | iGAP to RSDA |
| crime2 | Crime Demographics Modal-Valued Dataset |
| mushroom.int | Mushroom Species Interval Dataset |
| fuel_consumption | Fuel Consumption by Region Dataset |
| employment.int | European Employment by Gender and Age Interval Dataset |
| blood_pressure.int | Blood Pressure Interval Dataset |
| interval_distance | Distance Measures for Interval Data |
| int_list_conversions | List Available Format Conversions |
| clean_colnames | clean_colnames |
| bird.mix | Bird Species Mixed Symbolic Dataset |
| bird_species.mix | Bird Species Mixed Symbolic Dataset |
| china_temp.int | China Meteorological Stations Quarterly Temperature Interval Dataset |
| car.int | Car Models Interval Dataset |
| tennis.int | Tennis Court Types Interval Dataset |
| trivial_intervals.int | Trivial and Non-Trivial Intervals Example Dataset |
| interval_uncertainty | Uncertainty and Variability Measures for Interval Data |
| interval_utils | Internal Utility Functions for Interval Data |
| cars.int | Cars Interval Dataset |
| health_insurance.mix | Health Insurance Mixed Symbolic Dataset |
| veterinary.int | Veterinary Interval Dataset |
| mushroom_fuzzy | Mushroom Species Fuzzy/Symbolic Dataset |
| interval_similarity | Similarity Measures for Interval Data |
| set_variable_format | Set Variable Format |
| interval_stats | Statistics for Interval Data |
| health_insurance2 | Health Insurance Modal-Valued Dataset |
| soccer_bivar.int | French Soccer Championship Bivariate Interval Dataset |
| loans_by_purpose.int | Loans by Purpose Interval Dataset |
| lackinfo.int | Lack of Information Questionnaire Interval Dataset |
| world_cup.int | World Cup Soccer Teams Interval Dataset |
| teams.int | Pickup League Teams Interval Dataset |
| interval_robust | Robust Statistics for Interval Data |
| interval_shape | Distribution Shape Measures for Interval Data |
| horses.int | Horse Breeds Interval Dataset |
| occupations | Occupation Salaries Dataset |
| occupations2 | Occupation Salaries Modal-Valued Dataset |
| write_csv_table | Write Symbolic Data Table |
| nycflights.int | New York City Flights Interval Dataset |
| temperature_city.int | World Cities Monthly Temperature Interval Dataset |
| ohtemp.int | Ohio River Basin 30-Year Trimmed Mean Daily Temperatures Interval Dataset |
| town_services.mix | Town Services Concatenated Mixed Symbolic Dataset |
| oils.int | Oils and Fats Interval Dataset |
| lung_cancer.hist | Lung Cancer Treatments by State Histogram-Valued Dataset |
| hierarchy.int | Hierarchy Interval Dataset |
| hierarchy | Hierarchy Dataset |
| profession.int | Profession Work Salary Time Interval Dataset |
| mushroom | Mushroom Species Dataset (Original Format) |
| crime | Crime Demographics Dataset |
| abalone.int | Abalone Interval Dataset |
| RSDA_to_MM | RSDA to MM |
| RSDA_format | RSDA Format |
| MM_to_RSDA | MM to RSDA |
| MM_to_iGAP | MM to iGAP |
| face.iGAP | Face Dataset (iGAP Format) |
| airline_flights.hist | JFK Airport Airline Flights Histogram-Valued Dataset |
| airline_flights2 | JFK Airport Airline Flights Modal-Valued Dataset |
| finance.int | Finance Sector Interval Dataset |
| bank_rates | Bank Interest Rates AR Model Symbolic Dataset |
| energy_consumption.distr | US Energy Consumption Distribution-Valued Dataset |
| age_cholesterol_weight.int | Age-Cholesterol-Weight Interval Dataset |
| acid_rain.int | Acid Rain Pollution Indices Interval Dataset |
| abalone.iGAP | Abalone Dataset (iGAP Format) |
| SODAS_to_MM | SODAS to MM |
| RSDA_to_iGAP | RSDA to iGAP |
| SODAS_to_iGAP | SODAS to iGAP |
| baseball.int | Baseball Teams Interval Dataset |
| bats.int | Bat Species Interval Dataset |