autodb

autodb is an R package for automatically normalising a data frame to third normal form, with the intention of easing the process of data cleaning. (Using it to design your actual database is not advised.)

Installation

Install the stable released version from CRAN with:

install.packages("autodb")

Install the unstable development version from GitHub with:

# install.packages("devtools")
devtools::install_github("CharnelMouse/autodb")

Example

Turning a simple data frame into a database:

library(autodb)
#> 
#> Attaching package: 'autodb'
#> The following object is masked from 'package:stats':
#> 
#>     decompose
summary(ChickWeight)
#>      weight           Time           Chick     Diet   
#>  Min.   : 35.0   Min.   : 0.00   13     : 12   1:220  
#>  1st Qu.: 63.0   1st Qu.: 4.00   9      : 12   2:120  
#>  Median :103.0   Median :10.00   20     : 12   3:120  
#>  Mean   :121.8   Mean   :10.72   10     : 12   4:118  
#>  3rd Qu.:163.8   3rd Qu.:16.00   17     : 12          
#>  Max.   :373.0   Max.   :21.00   19     : 12          
#>                                  (Other):506
db <- autodb(ChickWeight)
db
#> database with 2 relations
#> 4 attributes: weight, Time, Chick, Diet
#> relation Chick: Chick, Diet; 50 records
#>   key 1: Chick
#> relation Time_Chick: Time, Chick, weight; 578 records
#>   key 1: Time, Chick
#> references:
#> Time_Chick.{Chick} -> Chick.{Chick}
graphviz_text <- gv(db)
DiagrammeR::grViz(graphviz_text)
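
To see what the two relations above mean, the same split can be reproduced by hand in base R. This is only a sketch of the idea, not how autodb works internally: the variable names (cw, chick, time_chick, rejoined, same_rows) are made up for illustration, and the relations are hard-coded from the printed result rather than discovered.

```r
# Plain data frame copy of the built-in dataset (drops the groupedData classes).
cw <- as.data.frame(ChickWeight)

# Relation "Chick": one record per chick, since Chick determines Diet.
chick <- unique(cw[, c("Chick", "Diet")])

# Relation "Time_Chick": one record per (Time, Chick) pair.
time_chick <- unique(cw[, c("Time", "Chick", "weight")])

# Record counts, for comparison with the printed database above.
nrow(chick)
nrow(time_chick)

# Joining along the reference Time_Chick.{Chick} -> Chick.{Chick}
# recovers the original rows, so nothing was lost by splitting.
rejoined <- merge(time_chick, chick, by = "Chick")
same_rows <- setequal(
  do.call(paste, cw),
  do.call(paste, rejoined[names(cw)])
)
nrow(rejoined) == nrow(cw) && same_rows
```

For a database object, the package's own rejoin function is documented as joining a database back into a data frame, which covers this last step without manual merging.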

Using the exclude argument to forbid certain variables from appearing in keys:

summary(CO2)
#>      Plant             Type         Treatment       conc          uptake     
#>  Qn1    : 7   Quebec     :42   nonchilled:42   Min.   :  95   Min.   : 7.70  
#>  Qn2    : 7   Mississippi:42   chilled   :42   1st Qu.: 175   1st Qu.:17.90  
#>  Qn3    : 7                                    Median : 350   Median :28.30  
#>  Qc1    : 7                                    Mean   : 435   Mean   :27.21  
#>  Qc3    : 7                                    3rd Qu.: 675   3rd Qu.:37.12  
#>  Qc2    : 7                                    Max.   :1000   Max.   :45.50  
#>  (Other):42
db2_noexclude <- autodb(CO2)
db2_noexclude
#> database with 3 relations
#> 5 attributes: Plant, Type, Treatment, conc, uptake
#> relation Plant: Plant, Type, Treatment; 12 records
#>   key 1: Plant
#> relation Plant_conc: Plant, conc, Treatment, uptake; 84 records
#>   key 1: Plant, conc
#>   key 2: Treatment, conc, uptake
#> relation conc_uptake: conc, uptake, Type; 82 records
#>   key 1: conc, uptake
#> references:
#> Plant_conc.{Plant} -> Plant.{Plant}
#> Plant_conc.{conc, uptake} -> conc_uptake.{conc, uptake}
graphviz_text2_noexclude <- gv(db2_noexclude)
DiagrammeR::grViz(graphviz_text2_noexclude)
db2 <- autodb(CO2, exclude = "uptake")
db2
#> database with 2 relations
#> 5 attributes: Plant, Type, Treatment, conc, uptake
#> relation Plant: Plant, Type, Treatment; 12 records
#>   key 1: Plant
#> relation Plant_conc: Plant, conc, uptake; 84 records
#>   key 1: Plant, conc
#> references:
#> Plant_conc.{Plant} -> Plant.{Plant}
graphviz_text2 <- gv(db2)
DiagrammeR::grViz(graphviz_text2)
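
The keys involving uptake in the first result are worth a closer look, since they are what exclude removes. A minimal base R check, with variable names chosen only for illustration, counts the distinct (conc, uptake) pairs and lists the rows where a pair repeats. If the two counts agree, Type is determined by (conc, uptake) in this sample, but only because the few repeated measurements happen to share a Type, which suggests a coincidence of the data rather than a rule worth modelling.

```r
# Plain data frame copy of the built-in dataset.
co2 <- as.data.frame(CO2)

# Distinct (conc, uptake) pairs, with and without Type attached.
key <- co2[, c("conc", "uptake")]
n_pairs <- nrow(unique(key))
n_pairs_with_type <- nrow(unique(co2[, c("conc", "uptake", "Type")]))

# Equal counts mean no (conc, uptake) pair occurs with two different Types,
# i.e. {conc, uptake} -> Type holds in this particular sample.
n_pairs == n_pairs_with_type

# Rows whose (conc, uptake) pair appears more than once: these are the only
# rows that could have contradicted the dependency.
repeated <- co2[duplicated(key) | duplicated(key, fromLast = TRUE), ]
repeated[order(repeated$conc, repeated$uptake), ]
```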

There are also functions for doing each step of the database creation separately, including functional dependency detection and normalisation. See the vignette for more details.
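
As a rough illustration of what the dependency detection step has to find, here is a brute-force search in base R. It is a naive substitute written for this README, not the DFD search that discover is documented as using, and the helpers holds_fd and naive_discover are hypothetical names that do not exist in the package. It tries every non-empty determinant set, so it is only practical for a handful of columns.

```r
# Hypothetical helper: does the attribute set `lhs` determine `rhs` in `df`?
# True when adding `rhs` to `lhs` creates no new distinct combinations.
holds_fd <- function(df, lhs, rhs) {
  nrow(unique(df[, lhs, drop = FALSE])) ==
    nrow(unique(df[, c(lhs, rhs), drop = FALSE]))
}

# Hypothetical helper: exhaustive search for minimal determinant sets.
# Determinants are tried from smallest to largest, and supersets of an
# already-found determinant are skipped so only minimal ones are kept.
naive_discover <- function(df) {
  found <- list()
  for (rhs in names(df)) {
    others <- setdiff(names(df), rhs)
    minimal <- list()
    for (size in seq_along(others)) {
      for (lhs in combn(others, size, simplify = FALSE)) {
        is_superset <- any(vapply(
          minimal,
          function(m) all(m %in% lhs),
          logical(1)
        ))
        if (!is_superset && holds_fd(df, lhs, rhs)) {
          minimal <- c(minimal, list(lhs))
        }
      }
    }
    for (lhs in minimal) {
      found <- c(found, list(list(determinant = lhs, dependant = rhs)))
    }
  }
  found
}

fds <- naive_discover(as.data.frame(ChickWeight))
labels <- vapply(
  fds,
  function(fd) paste(paste(fd$determinant, collapse = ", "), "->", fd$dependant),
  character(1)
)
print(labels)
```

The printed dependencies can be compared against the keys in the ChickWeight database shown earlier. Constant columns (dependencies with an empty determinant) are deliberately ignored to keep the sketch short.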

Monthly Downloads: 195

Version: 3.1.0

License: BSD_3_clause + file LICENSE

Maintainer: Mark Webster

Last Published: September 18th, 2025

Functions in autodb (3.1.0)

merge_schemas: Merge relation schemas in given pairs
gv.relation_schema: Generate Graphviz input text to plot relation schemas
records: Relational data records
nudge: Nudge meta-analysis data
insert: Insert data
subschemas: Schema subschemas
subrelations: Database subrelations
gv.data.frame: Generate Graphviz input text to plot a data frame
references: Schema references
reduce.database_schema: Remove database schema relations not linked to the given relations
gv.database: Generate Graphviz input text to plot databases
gv.database_schema: Generate Graphviz input text to plot database schemas
relation_schema: Relation schema vectors
gv.relation: Generate Graphviz input text to plot relations
synthesise: Synthesise relation schemas from functional dependencies
rename_attrs: Rename relational data attributes
detset: Determinant sets
normalise: Create normalised database schemas from functional dependencies
reduce: Remove relations not linked to the main relations
merge_empty_keys: Merge relation schemas with empty keys
relation: Relation vectors
rejoin: Join a database into a data frame
keys: Relational data keys
reduce.database: Remove database relations not linked to the main relations
attrs: Relational data attributes
create: Create instance of a schema
attrs_order: Relational data attribute order
d2: Generate D2 input text to plot objects
d2.data.frame: Generate D2 input text to plot a data frame
autodb: Create a normalised database from a data frame
autoref: Add foreign key references to a normalised database
database: Databases
autodb-package: Database-style normalisation for data.frames
d2.relation_schema: Generate D2 input text to plot relation schemas
df_rbind: Combine R Objects by Rows or Columns
database_schema: Database schemas
discover: Dependency discovery with DFD
df_duplicated: Determine Duplicate Elements
dependant: Dependants
decompose: Decompose a data frame based on given normalised dependencies
functional_dependency: Functional dependency vectors
gv: Generate Graphviz input text to plot objects
df_equiv: Test data frames for equivalence under row reordering