Today we’ll rant about the term “NoSQL”, which is even more ambiguous than “big data” with its ever-changing definition. By the way, I quite dislike the term “big data” as well; it has become a big machine of marketing hyperbole instead of solid technical explanation.
Before we get into that, I wanted to remind you that it’s a good time to plan to attend the WWDVC this year. If you’ve already reserved your seat, congratulations.
Otherwise, here’s the link:
[ WWDVC ]
Alright, back to NoSQL, sometimes expanded as “Not Only SQL”, when it really should be “Also … SQL”. The SQL interface is almost always an add-on capability programmed on top of the store.
First a bit of history …
Now, I don’t know if you’ve heard of symbolic expressions, a.k.a. S-expressions. These extremely powerful constructs come from the Lisp family of languages, whose descendants include Common Lisp and Scheme. The original Lisp dates from the late 1950s, so by the late 60s and early 70s these ideas were already well established.
Basically, you would enclose things in parentheses (brackets for the original English speakers).
So, data would look something like this:
'(ID1 (name John)
      (address (("123 acme street")
                ("Farmville")
                ("KY") ("USA")))
      (profession superman)
      (children ("Jack" "Black" "Brick" "Dumbo")))
Elements could easily be accessed via list manipulation.
Need a key-value pair? Easy.
'((orange . orange) (orange . carrot) (yellow . mango) (green . pepper) (yellow . pepper))
It’s called an association list, or a-list. There are built-in functions such as assoc to retrieve elements, and your own can easily be programmed too.
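For readers who don’t have a Lisp handy, here is a rough Python sketch of the a-list above and how lookup works. The helper names `assoc` and `assoc_all` are my own for illustration (the first mimics what Lisp’s `assoc` returns); this is a minimal sketch, not how any Lisp implements it.

```python
# Python stand-in for the a-list above: an ordered list of (key, value) pairs.
# Unlike a dict, an a-list may hold the same key more than once.
alist = [
    ("orange", "orange"),
    ("orange", "carrot"),
    ("yellow", "mango"),
    ("green", "pepper"),
    ("yellow", "pepper"),
]


def assoc(key, pairs):
    """Return the first (key, value) pair whose key matches, or None."""
    for pair in pairs:
        if pair[0] == key:
            return pair
    return None


def assoc_all(key, pairs):
    """Return every value stored under key, in insertion order."""
    return [value for k, value in pairs if k == key]


print(assoc("yellow", alist))      # first match only
print(assoc_all("yellow", alist))  # all matches
```

The point is that a key-value “store” is nothing more than a list plus a scan, which is why it was trivial to build on top of list manipulation.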
And even the code looked the same. A factorial function in Scheme would be:
(define (fact x)
  (if (= x 0)
      1
      (* x (fact (- x 1)))))
The code and data look the same because code is data and data is code. This was beautiful and extremely powerful. In fact, so powerful that many people had a hard time comprehending the capabilities. Lisp macros, which are self-expanding and self-evaluating constructs, would make most programmers’ heads explode, for all their meta-programming power.
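To make “code is data” concrete outside Lisp, here is a toy sketch in Python. It assumes S-expressions are modeled as nested Python lists with symbols as strings; the `evaluate` function and the `(params, body)` convention for functions are my own inventions for illustration, far simpler than a real Lisp evaluator.

```python
import operator

# Assumption: S-expressions are nested Python lists, symbols are strings.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "=": operator.eq}


def evaluate(expr, env):
    """Evaluate a nested-list expression against an environment dict."""
    if isinstance(expr, str):       # a symbol: look up its value
        return env[expr]
    if not isinstance(expr, list):  # a literal such as a number
        return expr
    head, *args = expr
    if head == "if":                # special form: only one branch is evaluated
        test, then_branch, else_branch = args
        return evaluate(then_branch if evaluate(test, env) else else_branch, env)
    if head in OPS:                 # built-in binary operator
        left, right = (evaluate(arg, env) for arg in args)
        return OPS[head](left, right)
    # Otherwise a user-defined function, stored in env as (params, body).
    params, body = env[head]
    values = [evaluate(arg, env) for arg in args]
    return evaluate(body, {**env, **dict(zip(params, values))})


# The factorial function, held as plain data.
fact_body = ["if", ["=", "x", 0], 1, ["*", "x", ["fact", ["-", "x", 1]]]]
env = {"fact": (["x"], fact_body)}

print(evaluate(["fact", 5], env))
```

Note that `fact_body` is an ordinary list: you can index into it, store it, or have another program rewrite it before it is evaluated, which is the essence of what macros do.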
A lot of folks are also put off by superficial things like parentheses.
Sidebar: There’s an inside joke that Lisp is an acronym that stands for “Lots of Irritating Silly Parentheses”.
Lisp was, in effect, one of the first non-relational data stores. It’s so powerful that you can write anything in it. In fact, the award-winning triple store AllegroGraph is written in Common Lisp. Common Lisp also has CLOS, the Common Lisp Object System, which is an object system rather than a store in itself, but persistent object stores have been built on top of it.
You could just as easily write your own fully functional relational database in it. All you’d need is a SQL parser and a storage mechanism and bang … it’s a NoSQL database with SQL capabilities.
Most students of Lisp have done this as an elementary exercise and most Lisp books walk you through at least one example of it.
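Here is roughly what that exercise looks like, sketched in Python rather than Lisp. Everything in it is hypothetical: the `STORAGE` layout, the `run_query` name, and the tiny SQL subset it accepts (a single table, an optional single equality filter). It only shows that the SQL layer is a thin add-on over a non-relational store.

```python
import re

# Hypothetical storage mechanism: table name -> list of row dicts.
STORAGE = {
    "people": [
        {"name": "John", "state": "KY", "profession": "superman"},
        {"name": "Jane", "state": "OH", "profession": "pilot"},
    ]
}

# Toy "SQL parser". It only understands:
#   SELECT col[, col ...] FROM table [WHERE col = 'value']
QUERY = re.compile(
    r"^\s*SELECT\s+(?P<cols>[\w\s,]+?)\s+FROM\s+(?P<table>\w+)"
    r"(?:\s+WHERE\s+(?P<col>\w+)\s*=\s*'(?P<val>[^']*)')?\s*;?\s*$",
    re.IGNORECASE,
)


def run_query(sql, storage=STORAGE):
    """Parse the tiny SQL subset and answer it from the in-memory store."""
    match = QUERY.match(sql)
    if match is None:
        raise ValueError(f"unsupported query: {sql!r}")
    columns = [col.strip() for col in match["cols"].split(",")]
    rows = storage[match["table"]]
    if match["col"] is not None:  # apply the optional WHERE filter
        rows = [row for row in rows if row.get(match["col"]) == match["val"]]
    return [tuple(row.get(col) for col in columns) for row in rows]


print(run_query("SELECT name, profession FROM people WHERE state = 'KY'"))
```

A real engine would need joins, types, indexes and a proper grammar, but the shape stays the same: parser in front, storage behind.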
And … the core of this (S-expressions and list processing) predates even the C language, which appeared in 1972.
Of course we saw symbolic expressions re-invented poorly again as XML and JSON and other formats, as only data.
Ok, history lesson over.
Back to the term “NoSQL”.
A common question that’s asked is what to do with Data Vaults and NoSQL databases. What differences would there be in design?
First, you’d need Data Vault 2.0. You can’t do this on DV 1.0 (formerly called simply Data Vault).
To learn more about Data Vault 2.0, visit
[ Data Vault 2.0 ]
Then you’d definitely want to qualify which NoSQL database you’re using, because the same design construct cannot be used for every one of them. BigTable-style implementations such as Accumulo, HBase, Cassandra and BigTable itself are implemented as column stores, and they have completely different use cases from the other kinds.
The data stored in document stores like CouchDB or MongoDB, on the other hand, will be in a completely different format. The equivalent of a table is a collection of documents, but there’s no structural restriction or enforcement that they contain the same attributes.
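To see what “no structural enforcement” means for design, here is a small Python sketch in which plain dicts stand in for documents. The collection contents and the helpers `find` and `attribute_sets` are made up for illustration; this is not the API of CouchDB or MongoDB.

```python
# Hypothetical collection: plain dicts stand in for documents.
# Nothing forces the documents to share the same attributes.
customers = [
    {"_id": 1, "name": "John", "address": {"city": "Farmville", "state": "KY"}},
    {"_id": 2, "name": "Jack", "children": ["Black", "Brick"]},
    {"_id": 3, "name": "Dumbo", "address": {"city": "Lexington"}},
]


def find(collection, path, value):
    """Return documents whose dotted path equals value; missing fields never match."""
    matches = []
    for doc in collection:
        node = doc
        for key in path.split("."):
            node = node.get(key) if isinstance(node, dict) else None
            if node is None:  # attribute absent in this document
                break
        if node is not None and node == value:
            matches.append(doc)
    return matches


def attribute_sets(collection):
    """Top-level attribute names per document id, to expose differing shapes."""
    return {doc["_id"]: sorted(doc) for doc in collection}


print([doc["_id"] for doc in find(customers, "address.state", "KY")])
print(attribute_sets(customers))
```

Any model you load from such a collection has to decide what a missing attribute means, a question a relational table with fixed columns never forces on you.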
And, if you go for graph databases like AllegroGraph or Neo4J, then you’re looking at something entirely different.
The design decisions will vary considerably.
And I haven’t even talked about key-value stores or object stores here.
Now you know why the generic term NoSQL is such a misnomer. We’ll talk about each of these in detail soon.
In the majority of cases, these are unsuitable for the Data Warehouse component.
But there are certain conditions that, when met, make some of these ideal for some of the Data Warehouse components, sometimes with marked performance improvements.
This is only available in Data Vault 2.0, here:
[ Data Vault 2.0 ]
That’s all folks!