Learn R Programming

autodb

autodb is an R package for automatic normalisation of a data frame to third normal form, with the intention of easing the process of data cleaning. (Usage to design your actual database for you is not advised.)

Installation

Install the stable released version from CRAN with:

install.packages("autodb")

Install the unstable development version from GitHub with:

# install.packages("devtools")
devtools::install_github("CharnelMouse/autodb")

Example

Turning a simple data frame into a database:

library(autodb)
#> 
#> Attaching package: 'autodb'
#> The following object is masked from 'package:stats':
#> 
#>     decompose
summary(ChickWeight)
#>      weight           Time           Chick     Diet   
#>  Min.   : 35.0   Min.   : 0.00   13     : 12   1:220  
#>  1st Qu.: 63.0   1st Qu.: 4.00   9      : 12   2:120  
#>  Median :103.0   Median :10.00   20     : 12   3:120  
#>  Mean   :121.8   Mean   :10.72   10     : 12   4:118  
#>  3rd Qu.:163.8   3rd Qu.:16.00   17     : 12          
#>  Max.   :373.0   Max.   :21.00   19     : 12          
#>                                  (Other):506
db <- autodb(ChickWeight)
db
#> database with 2 relations
#> 4 attributes: weight, Time, Chick, Diet
#> relation Chick: Chick, Diet; 50 records
#>   key 1: Chick
#> relation Time_Chick: Time, Chick, weight; 578 records
#>   key 1: Time, Chick
#> references:
#> Time_Chick.{Chick} -> Chick.{Chick}
graphviz_text <- gv(db)
DiagrammeR::grViz(graphviz_text)

Using the exclude argument to forbid certain variables from appearing in keys:

summary(CO2)
#>      Plant             Type         Treatment       conc          uptake     
#>  Qn1    : 7   Quebec     :42   nonchilled:42   Min.   :  95   Min.   : 7.70  
#>  Qn2    : 7   Mississippi:42   chilled   :42   1st Qu.: 175   1st Qu.:17.90  
#>  Qn3    : 7                                    Median : 350   Median :28.30  
#>  Qc1    : 7                                    Mean   : 435   Mean   :27.21  
#>  Qc3    : 7                                    3rd Qu.: 675   3rd Qu.:37.12  
#>  Qc2    : 7                                    Max.   :1000   Max.   :45.50  
#>  (Other):42
db2_noexclude <- autodb(CO2)
db2_noexclude
#> database with 3 relations
#> 5 attributes: Plant, Type, Treatment, conc, uptake
#> relation Plant: Plant, Type, Treatment; 12 records
#>   key 1: Plant
#> relation Plant_conc: Plant, conc, Treatment, uptake; 84 records
#>   key 1: Plant, conc
#>   key 2: Treatment, conc, uptake
#> relation conc_uptake: conc, uptake, Type; 82 records
#>   key 1: conc, uptake
#> references:
#> Plant_conc.{Plant} -> Plant.{Plant}
#> Plant_conc.{conc, uptake} -> conc_uptake.{conc, uptake}
graphviz_text2_noexclude <- gv(db2_noexclude)
DiagrammeR::grViz(graphviz_text2_noexclude)
db2 <- autodb(CO2, exclude = "uptake")
db2
#> database with 2 relations
#> 5 attributes: Plant, Type, Treatment, conc, uptake
#> relation Plant: Plant, Type, Treatment; 12 records
#>   key 1: Plant
#> relation Plant_conc: Plant, conc, uptake; 84 records
#>   key 1: Plant, conc
#> references:
#> Plant_conc.{Plant} -> Plant.{Plant}
graphviz_text2 <- gv(db2)
DiagrammeR::grViz(graphviz_text2)

There are also functions for doing each step of the database creation separately, including functional dependency detection and normalisation. See the vignette for more details.

Copy Link

Version

Install

install.packages('autodb')

Monthly Downloads

348

Version

3.2.4

License

BSD_3_clause + file LICENSE

Issues

Pull Requests

Stars

Forks

Maintainer

Mark Webster

Last Published

November 17th, 2025

Functions in autodb (3.2.4)

gv

Generate Graphviz input text to plot objects
insert

Insert data
gv.relation_schema

Generate Graphviz input text to plot relation schemas
gv.relation

Generate Graphviz input text to plot relations
gv.database_schema

Generate Graphviz input text to plot database schemas
keys

Relational data keys
gv.database

Generate Graphviz input text to plot databases
discover

Dependency discovery with DFD
gv.data.frame

Generate Graphviz input text to plot a data frame
normalise

Create normalised database schemas from functional dependencies
references

Schema references
reduce.database

Remove database relations not linked to the main relations
merge_schemas

Merge relation schemas in given pairs
rejoin

Join a database into a data frame
functional_dependency

Functional dependency vectors
reduce

Remove relations not linked to the main relations
merge_empty_keys

Merge relation schemas with empty keys
nudge

Nudge meta-analysis data
reduce.database_schema

Remove database schema relations not linked to the given relations
records

Relational data records
rename_attrs

Rename relational data attributes
synthesise

Synthesise relation schemas from functional dependencies
relation

Relation vectors
subrelations

Database subrelations
relation_schema

Relation schema vectors
subschemas

Schema subschemas
autoref

Add foreign key references to a normalised database
autodb-package

Database-style normalisation for data.frames
d2

Generate D2 input text to plot objects
d2.database

Generate D2 input text to plot databases
d2.database_schema

Generate D2 input text to plot database schemas
attrs

Relational data attributes
d2.data.frame

Generate D2 input text to plot a data frame
create

Create instance of a schema
autodb

Create a normalised database from a data frame
attrs_order

Relational data attribute order
df_equiv

Test data frames for equivalence under row reordering
d2.relation

Generate D2 input text to plot relations
d2.relation_schema

Generate D2 input text to plot relation schemas
decompose

Decompose a data frame based on given normalised dependencies
dependant

Dependants
df_rbind

Combine R Objects by Rows or Columns
detset

Determinant sets
database

Databases
df_duplicated

Determine Duplicate Elements
database_schema

Database schemas