Cloud DBMS < High Performance DBMS
In my post here I suggested that database computing was becoming a special case of high-performance computing. This trend will bump up against the trend towards cloud computing and the bump will be noisy.
In the case of general commercial computing customers running cloudy virtualized servers paid a 5%-20% performance penalty… but the economics still worked for the cloud side.
For high-performance database computing it is unclear how much the penalty will be? If a virtualized, cloudy, database gives up performance because SIMD becomes problematic, priming the cache becomes hard, CPU stalls become more common, and there is a move from a shared nothing architecture to SANs or SAN-like shared data devices, then the penalty may be 300%-500% and the cloud databases will likely lose.
As I noted in the series starting here, there are lots of issues around high-performance database computing in the cloud. It will be interesting to see how the database vendors manage the bump and the noise. So keep an eye out. If your database of choice starts to look cloudy… if it becomes virtualized and it starts moving from a shared-nothing cluster to a SAN… then you will know which side of the bump they are betting on. And if they pick the cloudy side then you need to ask how they plan to architect the system to hold the penalty to under 20%…
I also mentioned in that series that in-memory databases had an advantage over peripheral-based databases as they did not have to pay a penalty for de-coupling the IO bandwidth that is part of a shared-nothing cluster. But even those vendors have to manage the fact that the database is abstracted… virtualized… away from the hardware.
If I were King I would develop a high-performance database that implemented the features of a cloud database: elasticity, easy provisioning, multi-tenancy; over bare metal. Then you might get the best of both worlds.
Database Computing is Supercomputing… Some external reading: May 2013
I would like to recommend to you John Appleby’s post here on the HANA blog site. While the title suggests the article is about HANA, in fact it is about trends in computing and processors… and very relevant to posts here past, present, and upcoming…
I would also recommend Curt Monash’s site. His notes on Teradata here mirror my observation that a 30%-50% performance boost per release cycle is the target for most commercial databases… and what wins in the general market. This is why the in-memory capabilities offered by HANA and maybe DB2 BLU are so disruptive. These products should offer way more than that… not 1.5X but 100X in some instances.
Finally I recommend “What Every Programmer Should Know About Memory” by Ulrich Drepper here. This paper provides a great foundation for the deep hardware topics to come.
Database computing is becoming a special case, a commercial case, of supercomputing… high-performance computing (HPC) to those less inclined to superlatives. Over the next few years the differentiation between products will increasingly be due to the use of high-performance computing techniques: in-memory techniques, vector processing, massive parallelism, and use of HPC instruction sets.
This may help you to get ready…
Teradata CPU Planning
I suggested here that Teradata shipped the EDW 6700 series without waiting for Ivy Bridge because they could not use the cores effectively… but it could be that Haswell (see here) fit their release schedule better. It will be interesting to see whether they can use all of the cores then?
@Sapphire
If anyone is at SAP’s Sapphire conference this week, look me up… I’ll be in the Intel booth every afternoon chatting about HANA and Hadoop….
The Fog is Getting Thicker…
I renamed this so that Teradata folks would not get here so often… its not really about Intelligent Memory… just prompted by it. The post on Intelligent Memory is here. – Rob
Two quick comments on Teradata’s recent announcement of Intelligent Memory.
First… very very cool. More on this to come.
Next… life is going to become very hard for my readers and for bloggers in this space. The notion of an in-memory database is becoming rightfully blurred… as is the notion of column store.
Oracle blurs the concepts with words like “database in-memory” and “hybrid column compression” which is neither an in-memory database or a column store.
Teradata blurs the concept with a strong offering that uses DRAM as a block-IO device (like the old RAM-disks we used to configure on our PCs).
Teradata and Greenplum blur the idea of a column store by adding columnar tables over their row store database engines.
I’m not a fan of the double-speak… but the ability of companies to apply the 80/20 rule to stretch their architectures and glue on new advanced technologies is a good thing for consumers.
But it becomes very hard to distinguish the products now.
In future blogs I’ll try to point out differences… but we’ll have to go a little deeper into the Database Fog.
How Good Is Teradata’s Intelligent Memory?
A 30 feet chunk of the cliff below the apartment building fell to Pacific Ocean. (Photo credit: Wikipedia)
Jason asked a great question in the comment section here… he asked… does Teradata’s Intelligent Memory erode HANA’s value proposition? Let me answer here in a more general way that is applicable to the general database space…
Every time a vendor puts more silicon between the CPU and the disk they will improve their performance (and increase their price). Does this erode HANA’s value proposition? Sure. Every advance by any vendor erodes every other vendor’s position.
To win business a new database product has to be faster than the competition. In my experience you have to be at least 30% faster to unseat the incumbent. If you are 50% faster you will win a lot of business. If you are 2x, 100%, faster you win nearly every time.
Therefore the questions are:
- Did the Teradata announcement eliminate a set of competitors from reaching these thresholds when Teradata is the incumbent? Yup. It is very smart.
- Does Intelligent Memory allow Teradata to reach these thresholds when they compete against another incumbent. Yup.
- Did it eliminate HANA from reaching these thresholds when competing with Teradata? I do not think so… in fact I’m pretty sure it is not the case… HANA should still be way over the 2x threshold… but the reasons why will require a deeper dive… stay tuned.
In the picture attached a 30 foot chunk eroded… but Exadata still stands. Will it be condemned?
Note: Here is a commercial post on the SAP HANA blog site that describes at a high level why I think HANA retains a distinct architectural advantage.
Memory Trends and HANA
If the Gartner estimates here are correct… then DRAM prices will fall 50% per year per year over the next several years… and then in 2015 non-volatile RAM (see the related articles below) will become generally available.
It has been suggested that memory prices will fall slower than data warehouses will grow (see here). That does not seem to be the case… and the combination of cheaper memory and then non-volatile memory will make in-memory databases like SAP HANA ever more compelling. In fact, as I predicted… and to their credit, Teradata is adding more memory (see here).
Related articles
