Relational model

Adding "views" to autodb, part three: summarising the schema

Last time, we wrote a function to check whether a given database schema is connected – ignoring relations for constants, as a special case – and wrote a property test to show that the schemas generated by normalise and autodb aren’t always connected. Our goal is to add virtual “pre-decomposition” relations to make the schema connected; I’m referring to them as “views”, which is a misnomer, but is much shorter to write.

Adding "views" to autodb, part two: initial tests

Last time, I showed how I think that adding “views” – i.e. previous decomposition steps, or “pre-decompositions” – to a database schema might ensure that it stays connected. Now we need to work out how to implement it.

    library(autodb)
    ##
    ## Attaching package: 'autodb'
    ## The following object is masked from 'package:stats':
    ##
    ##     decompose

    show <- function(x) DiagrammeR::grViz(gv(x), width = "100%")

Test property

The problem we want to solve is that the database schema given by autodb sometimes isn’t connected, so the first thing we need is a function that checks whether a schema is connected.
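As a rough illustration of what such a connectivity check could look like, here is a minimal sketch in base R. It does not use autodb's own classes or functions: the schema is assumed to be given as a character vector of relation names plus a data frame of references with hypothetical `child` and `parent` columns, and `is_connected` and its `ignore` argument are names made up for this example. The `ignore` argument stands in for the special-casing of relations for constants.

```r
# Hypothetical helper, not part of autodb: treats the schema as an undirected
# graph (relations are nodes, references are edges) and checks connectivity.
is_connected <- function(relations, references, ignore = character()) {
  nodes <- setdiff(relations, ignore)
  if (length(nodes) <= 1) return(TRUE)
  # keep only references between the relations still under consideration
  keep <- references$child %in% nodes & references$parent %in% nodes
  child <- references$child[keep]
  parent <- references$parent[keep]
  # breadth-first search from the first relation, ignoring edge direction
  visited <- nodes[1]
  frontier <- nodes[1]
  while (length(frontier) > 0) {
    neighbours <- union(parent[child %in% frontier], child[parent %in% frontier])
    frontier <- setdiff(neighbours, visited)
    visited <- c(visited, frontier)
  }
  all(nodes %in% visited)
}

# Made-up example schema: two pairs of relations with no reference between the pairs
refs <- data.frame(
  child = c("a_b", "c_d"),
  parent = c("b", "d"),
  stringsAsFactors = FALSE
)
is_connected(c("a_b", "b", "c_d", "d"), refs)
is_connected(c("a_b", "b"), refs)
is_connected(c("a_b", "b", "constants"), refs, ignore = "constants")
```

The first call reports a disconnected schema, since nothing links the two pairs; the other two report connected ones. A real implementation would read the relation names and references from the schema object itself rather than from hand-built vectors.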

Adding "views" to autodb, part one: motivation

autodb is still chugging along. Most of the additions I’ve made recently have been stuff that’s nice to have around the edges. Add a code generator for diagrams in D2. Improve the documentation a bit. Speed the existing algorithms up a bit. But there’s been no large addition since I added the FDHits search algorithm earlier in the year, which sped up the search by several orders of magnitude.

autodb is on CRAN

I’m pleased to announce that autodb is now on CRAN, in addition to GitHub. I haven’t submitted anything to CRAN before, so I’m really pleased. It’s been a bit of a long road: I started writing it nearly three years ago. This was a side project, and I’m not the quickest worker to begin with. Hopefully, the trade-off is that there aren’t many bugs left in what I have written.

R has no consistent table class

The usual case

R, in addition to being array-based, can also be table-based: it has a table class in the base language, data.frame. This is great, because a lot of data comes in table form. Here are some simple examples:

    twocols <- data.frame(
      a = rep(1:3, 4),
      b = rep(1:2, 6)
    )
    twocols
    ##    a b
    ## 1  1 1
    ## 2  2 2
    ## 3  3 1
    ## 4  1 2
    ## 5  2 1
    ## 6  3 2
    ## 7  1 1
    ## 8  2 2
    ## 9  3 1
    ## 10 1 2
    ## 11 2 1
    ## 12 3 2
    onecol <- data.