I renamed this so that Teradata folks would not get here so often… it’s not really about Intelligent Memory… just prompted by it. The post on Intelligent Memory is here. – Rob
Two quick comments on Teradata’s recent announcement of Intelligent Memory.
First… very very cool. More on this to come.
Next… life is going to become very hard for my readers and for bloggers in this space. The notion of an in-memory database is becoming rightfully blurred… as is the notion of column store.
Oracle blurs the concepts with words like “database in-memory” and “hybrid column compression”, which describe neither an in-memory database nor a column store.
Teradata blurs the concept with a strong offering that uses DRAM as a block-IO device (like the old RAM-disks we used to configure on our PCs).
Teradata and Greenplum blur the idea of a column store by adding columnar tables over their row store database engines.
I’m not a fan of the double-speak… but the ability of companies to apply the 80/20 rule to stretch their architectures and glue on new advanced technologies is a good thing for consumers.
But it becomes very hard to distinguish the products now.
In future blogs I’ll try to point out differences… but we’ll have to go a little deeper into the Database Fog.
Jason asked a great question in the comment section here… he asked… does Teradata’s Intelligent Memory erode HANA’s value proposition? Let me answer here in a more general way that is applicable to the general database space…
Every time a vendor puts more silicon between the CPU and the disk they will improve their performance (and increase their price). Does this erode HANA’s value proposition? Sure. Every advance by any vendor erodes every other vendor’s position.
To win business a new database product has to be faster than the competition. In my experience you have to be at least 30% faster to unseat the incumbent. If you are 50% faster you will win a lot of business. If you are 2x, 100%, faster you win nearly every time.
Therefore the questions are:
- Did the Teradata announcement eliminate a set of competitors from reaching these thresholds when Teradata is the incumbent? Yup. It is very smart.
- Does Intelligent Memory allow Teradata to reach these thresholds when they compete against another incumbent? Yup.
- Did it eliminate HANA from reaching these thresholds when competing with Teradata? I do not think so… in fact I’m pretty sure it is not the case… HANA should still be way over the 2x threshold… but the reasons why will require a deeper dive… stay tuned.
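To make the thresholds above concrete, here is a minimal sketch that turns two run-times into a speedup ratio and maps it onto my rule of thumb. It assumes that "30% faster" means a ratio of 1.3x and "50% faster" means 1.5x; the function names and the timings are hypothetical and only there for illustration.

```python
def speedup(incumbent_seconds: float, challenger_seconds: float) -> float:
    """Return how many times faster the challenger is (2.0 means 2x)."""
    return incumbent_seconds / challenger_seconds


def likely_outcome(ratio: float) -> str:
    """Map a speedup ratio onto the rule-of-thumb thresholds (an assumption:
    30% faster is read as 1.3x, 50% faster as 1.5x)."""
    if ratio >= 2.0:
        return "wins nearly every time"
    if ratio >= 1.5:
        return "wins a lot of business"
    if ratio >= 1.3:
        return "can unseat the incumbent"
    return "unlikely to unseat the incumbent"


if __name__ == "__main__":
    # Hypothetical timings: the incumbent needs 60 s, the challenger 40 s.
    ratio = speedup(60.0, 40.0)
    print(f"{ratio:.2f}x -> {likely_outcome(ratio)}")
```

The point of the sketch is that every improvement by the incumbent moves the bar: if the incumbent halves its run-time, the challenger's ratio halves with it.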
In the picture attached a 30 foot chunk eroded… but Exadata still stands. Will it be condemned?
Note: Here is a commercial post on the SAP HANA blog site that describes at a high level why I think HANA retains a distinct architectural advantage.
If the Gartner estimates here are correct… then DRAM prices will fall 50% per year over the next several years… and then in 2015 non-volatile RAM (see the related articles below) will become generally available.
It has been suggested that memory prices will fall slower than data warehouses will grow (see here). That does not seem to be the case… and the combination of cheaper memory and then non-volatile memory will make in-memory databases like SAP HANA ever more compelling. In fact, as I predicted… and to their credit, Teradata is adding more memory (see here).
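The arithmetic behind this argument is simple compounding, and a rough sketch may help. It assumes memory prices fall by a fixed fraction each year and the warehouse grows by a fixed fraction each year; the 40% growth rate, the 100 TB starting size, and the normalized price are hypothetical numbers, not estimates from Gartner or anyone else.

```python
def memory_cost(start_tb: float, start_price_per_tb: float,
                growth_per_year: float, price_decline_per_year: float,
                years: int) -> float:
    """Cost of holding the whole warehouse in memory after `years` years."""
    size_tb = start_tb * (1.0 + growth_per_year) ** years
    price_per_tb = start_price_per_tb * (1.0 - price_decline_per_year) ** years
    return size_tb * price_per_tb


if __name__ == "__main__":
    # Hypothetical: 100 TB today, price normalized to 1.0 per TB,
    # data grows 40% per year, memory prices fall 50% per year.
    for year in range(4):
        print(year, round(memory_cost(100.0, 1.0, 0.40, 0.50, year), 2))
```

With a 50% annual price decline the cost of keeping everything in memory only rises if the data more than doubles every year; at exactly 100% growth the two effects cancel.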
In the post here I listed the units of parallelism (UoP) applied by various products on a single node. Those findings are summarized in the table below.
Cores per Node
UoP per Node
|Greenplum||DCA UAP Edition||
|Recommends 1 Segment for each 2 cores. Maybe some multi-threading per query so it could be greater than 8 on the average… and could be 16 with hyper-threads… but not more than 32 for sure.|
|Maybe only 12… cannot find if they use hyper-threads.|
|May use hyper-threads but limited by 16 FPGAs.|
|HANA||Any Xeon E7-4800||
A UoP is defined as the maximum number of instructions that can execute in parallel on a single node for a single query. Note that in the comments there was a lively debate where some readers wanted to count threads or processes or slices that were “active” but in a wait state. Since any program can start threads that wait I do not count these as UoP (later we might devise a new measure named units of waiting that would gauge the inefficiency in any given design by measuring the amount of waiting around required to keep the CPUs fed… maybe the measure would be valuable in measuring the inefficiency of the queue at your doctor’s office or at any government agency).
On some CPUs vendors such as Intel allow two threads to execute instructions in-parallel in a core. This is called hyper-threading and, if implemented, it allows for two UoP on a single core. Rather than constantly qualify the statements for the rest of this blog when I refer to cores I mean to imply hyper-threads.
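Here is a small sketch of how I count UoP under the definition above. It assumes the count is capped both by the hardware (cores times threads per core) and by the number of parallel workers the database actually runs per query; threads that merely wait do not raise the cap. The function name and the example configurations are hypothetical.

```python
def uop_per_node(physical_cores: int, threads_per_core: int,
                 software_workers: int) -> int:
    """Maximum instructions that can execute in parallel for one query.

    Hardware caps the count at cores * threads_per_core; the database caps
    it at the number of workers it runs for a single query.
    """
    hardware_threads = physical_cores * threads_per_core
    return min(hardware_threads, software_workers)


if __name__ == "__main__":
    # Hypothetical node: 16 cores, no hyper-threading, one worker
    # (segment) for every 2 cores, so 8 workers.
    print(uop_per_node(16, 1, 8))
    # Hypothetical node: 8 cores with hyper-threading, 64 software threads.
    # Only 16 can execute at once; the other 48 wait and are not counted.
    print(uop_per_node(8, 2, 64))
```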
The lively comments in the blog included some discussion of the sort of techniques used by vendors to try and keep the cores in the CPU on each node fed. It is these techniques that lead to more active I/O streams than cores and more threads than cores.
For several years now Intel and the other CPU manufacturers have been building ever more cores into their products. This has allowed them to continue the trend known as Moore’s Law. Multi-core is now a fact of life and even phones, tablets, and personal computers have multi-core chips.
But if you look at the table you can see that the database products above, even the newly announced products from Teradata and Netezza, are using CPUs with relatively few cores. The high-end Intel processors have 40 cores and the databases, with the exception of HANA, use Intel products with at most 16 cores. Further, Intel will deliver Ivy Bridge processors to the market this year with 120 cores. These vendors know this… yet they have chosen to deliver appliances with the previous generation CPUs. You might ask why?
I believe that there is an architectural reason for this (also a marketing reason covered here).
It is very hard to keep 80 cores (counting hyper-threads) fed with data when you have to perform block I/O. It will be nearly impossible to keep the 240 coming with Ivy Bridge fed. One solution is to deploy more nodes in a shared-nothing configuration with fewer cores per node… but this will be expensive, requiring more power, floorspace, administration, etc. This is the solution taken by most of the vendors above. Another solution is to solve the problem without I/O with an in-memory database (IMDB) architecture. This is the solution taken by SAP with HANA.
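A back-of-the-envelope sketch shows why more cores do not help when block I/O is the limit. It assumes a node's I/O subsystem delivers a fixed bandwidth and each core can consume data at a fixed scan rate; the 4,000 MB/s and 500 MB/s figures are hypothetical and chosen only to keep the arithmetic easy.

```python
def cores_kept_busy(io_mb_per_s: float, per_core_scan_mb_per_s: float,
                    cores: int) -> float:
    """Number of cores the I/O subsystem can keep fed with data."""
    fed = io_mb_per_s / per_core_scan_mb_per_s
    return min(float(cores), fed)


def utilization(io_mb_per_s: float, per_core_scan_mb_per_s: float,
                cores: int) -> float:
    """Fraction of the node's cores doing useful work."""
    return cores_kept_busy(io_mb_per_s, per_core_scan_mb_per_s, cores) / cores


if __name__ == "__main__":
    # Hypothetical node: 4,000 MB/s of block I/O, 500 MB/s scanned per core.
    for cores in (16, 80, 240):
        print(cores, f"{utilization(4000.0, 500.0, cores):.1%}")
```

Under these made-up numbers the I/O feeds 8 cores no matter how many are installed, so every extra core beyond that just waits. Shared-nothing designs raise the I/O side by adding nodes; an IMDB removes the block I/O term.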
Intel, IBM, and the rest will continue to build out using the multi-core approach for the foreseeable future. IMDB products will be able to fully utilize this product. Other products will struggle to take full advantage as we can see already… they will adapt and adjust and do what they can… but ultimately IMDB will win, I think… because there is just no other way to keep up as Moore’s Law continues to drive technology… no other way to feed the CPU engines with data fast enough.
If I am right then you will see more IMDB offerings from more vendors, including from the major vendors in the near future (note that this does not include the announcements of “database in memory” from Oracle which is not by any measure an in-memory database).
This is the underlying reason why Donald Feinberg (and Timo Elliott) are right on here. Every organization will be running in-memory… and soon.
If I were the Register I would have titled this: Raging Stuffed Elephant To Devour Two Warehouse Vendors… I love the Register… if you do not read it have a look…
This post is about the market implications of architecture…
Let us assume that Hadoop matures and finds a permanent place in the market. This is not certain with some folks expressing concern (here) and others boundless enthusiasm (here). So let’s assume… and consider where it might fit.
One place is in the data warehouse market… This view says Hadoop replaces the DBMS for data warehouses. But the very mature BI/DW market requires a high level of operational integrity and Hadoop is not there yet… it is advancing rapidly as an enterprise platform and I believe it will get there… but it will be 3-4 years. This is the thinking I provided here that leads me to draw the picture in Figure 1.
It is not that I believe that Hadoop will consume the data warehouse market but I believe that very large EDW’s… those over 1PB… and maybe over 500TB will be compelled by the economics of “free” to move big warehouses to Hadoop. So Hadoop will likely move down into the EDW space from the top.
Another option suggests that Big Data will be a platform unto itself. In this view Hadoop will sit beside the existing BI/DW platform and feed that platform the results of queries that derive structure from unstructured data… and/or that aggregate Big Data into consumable chunks. This is where Hadoop sits today.
In data warehouse terms this positions Hadoop as a very large independent analytic data mart. Figure 2 depicts this. Note that an analytics data mart, and a Hadoop cluster, require far less in the way of operational infrastructure… they share very similar technical requirements.
This leads me to the point of this post… if Hadoop becomes a very large analytic data mart then where will Greenplum and Netezza fit in 2-3 years? Both vendors are positioning themselves in the analytic space… Greenplum almost exclusively so. Both vendors offer integrated Hadoop products… Greenplum offers the Greenplum database and Hadoop in the same hardware cluster (see here for their latest announcement)… Netezza provides a Hadoop connector (here). But if you believe in Hadoop… as both vendors ardently do… where do their databases fit in the analytics space once Hadoop matures and fully supports SQL? In the next 3-4 years what will these RDBMSs offer in the big data analytics space that will be compelling enough to make the configuration in Figure 3 attractive?
I know that today Hadoop cannot do all that either Netezza or Greenplum can do. I understand that Netezza has two positions in the market… as an analytic appliance and as a data mart appliance… so it may survive in the mart space. But the overlap of technical requirements between Hadoop and an analytic data mart… combined with the enormous human investment in Hadoop R&D, both in the core and in the eco-system… make me wonder about where “Big Data” analytic relational databases will fit?
Note that this is not a criticism of the Greenplum RDBMS. Greenplum is a very fine product, one of the best EDW platforms around. I’ll have more to say about it when I provide my 2 Cents… But if Figure 2 describes the end state for analytics in 2-3 years then where is the place for the Figure 3 architecture? If Figure 3 is the end state then I do not see where the line will be drawn between the analytic workload that requires Greenplum and the workload that will run on Hadoop. I barely can see it now… and I cannot see it at all in the near future.
Both EMC Greenplum and IBM seem to strongly believe in Hadoop… they must see the overlap in functionality and feel the market momentum of Hadoop. They must see, better than most, that Hadoop wins this battle.
This is a rehash of my post for SAP here… I thought you might find it interesting as it describes the architecture HANA uses to support OLTP and BI against a single table.
A couple of points to think about:
- If you have only one database structure you can optimize for only one query; e.g. the OLTP query is fast against an OLTP structure but slow against a BI structure… or vice versa.
- If you have two structures you have to ETL the data between the two at some cost. There is cost in keeping a replica of the data, cost in developing, administering, and executing the ETL process. In addition there is a lost opportunity cost hidden in the latency of the data. You cannot see the current state of the business by querying the BI data as some data has not yet been ETL’d across.
- OLTP performance is normally paramount; so the perfect system would not compromise that performance or compromise it only a little.
Let’s look at the HANA approach to this at a high level.
HANA provides a single view of a table to an application or a user, but under-the-covers each table includes an OLTP-optimized part, a BI-optimized part, and a mechanism for moving data from one part to the other.
When a transaction hits the system, inserts, updates, and deletes are processed in the OLTP part with no performance penalty. The read portion of the OLTP query accesses the read-optimized internal structure with no performance penalty. Note that reading a single column in a column store, which is the key for the transaction, is roughly equivalent to reading an index structure on top of a standard disk-based DBMS… except that the column is always in-memory, which means I/O is never required. This provides the HANA system with an advantage over a disk-based system. Disk I/O is 120+ times slower than memory access so even an index is unlikely to beat in-memory. See here for some numbers you should know.
After the transaction is committed into the internal, OLTP-optimized part, a process starts that moves the data to the BI optimized part. This is called a delta merge as the OLTP portion holds all of the changes, the delta, in the data set.
When a BI query starts it can limit the scan to only partitions in the BI optimized part, or if real-time data is required it can scan both parts. The small portion of the scan that accesses the OLTP/delta portion is sub-optimal when compared to the scan of the BI part, but not slow at all as the data is all in-memory.
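The flow just described can be sketched as a toy table with two parts. This is an illustration of the idea only, not HANA's implementation: it assumes a row-oriented, append-only delta and a column-oriented main part, handles inserts only (no updates or deletes), and every name in it (`DeltaMergeTable`, `scan_sum`, and so on) is hypothetical.

```python
class DeltaMergeTable:
    """Toy table: a write-optimized delta plus a read-optimized columnar main."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.delta = []                            # row-oriented, append-only
        self.main = {c: [] for c in self.columns}  # one list per column

    def insert(self, row: dict) -> None:
        """OLTP write: append the row to the delta, leaving main untouched."""
        self.delta.append(tuple(row[c] for c in self.columns))

    def delta_merge(self) -> None:
        """Move committed rows from the delta into the columnar main part."""
        for row in self.delta:
            for column, value in zip(self.columns, row):
                self.main[column].append(value)
        self.delta.clear()

    def scan_sum(self, column: str, include_delta: bool = True) -> float:
        """BI read: scan the main column, and the delta if real-time data is needed."""
        total = sum(self.main[column])
        if include_delta:
            position = self.columns.index(column)
            total += sum(row[position] for row in self.delta)
        return total


if __name__ == "__main__":
    table = DeltaMergeTable(["id", "amount"])
    table.insert({"id": 1, "amount": 10})
    table.insert({"id": 2, "amount": 20})
    print(table.scan_sum("amount"))                       # sees unmerged rows
    print(table.scan_sum("amount", include_delta=False))  # main part only
    table.delta_merge()
    print(table.scan_sum("amount", include_delta=False))  # rows now in main
```

The application sees one table throughout; only the `include_delta` choice decides whether a BI scan pays the small extra cost of reading the unmerged rows.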
We can tease the performance apart as follows:
- There is an OLTP insert/update/delete “write” portion… and HANA executes this like any OLTP database, as fast as an OLTP RDBMS, with a commit after a write-to-log;
- There is an OLTP select “read” portion… and HANA performs this in the in-memory column store faster than many OLTP databases… and scans the delta structure as fast as any OLTP database;
- There is a delta merge from the OLTP write-optimized part to the BI read-optimized column store that is hundreds to tens of thousands of times faster than any ETL tool; and
- There is a BI select portion that scans the in-memory column store hundreds to thousands of times faster than a disk-based BI database.
- If the BI query requires access to real-time data then an in-memory scan of the delta file is required… there is no analogy to this in a system with separate OLTP and BI tables.
There seems to be a sort of odd tradition for bloggers to look back at the past year as the New Year starts to unfold. Here is my review of my posts and some presents.
Far and away the most viewed post was Exalytics vs. HANA What are they thinking? This simply notes that these two products are not really comparable sharing only the descriptor “in-memory”.
My Favorite Post
I liked this the best… ’nuff said: What is Big Data?
OK, here is my 2nd favorite: A Quick Five Minute Rule Update for In-memory Databases, but you probably need to read the prequel first: The Five Minute Rule and In-memory Databases
These papers and the underlying thinking by smarter folks than I will inform you about the definition of Hot Data from the point of pure IT economics.
The Most Under-rated Post
This is the post I thought was the most important… as it might strongly influence data warehouse platform buying decisions over the next few years… And it might even influence the stocks you pick: The Future of Hadoop and Big Data DBMSs
Some Other Posts to Read
Here are two posts that informed me:
The Five Minute Rule… This will point you to a Wikipedia article that will point you to the whole series of papers.
What Every Programmer Should Know About Memory… This paper goes into gory detail about how memory works inside a processor. It is hardware-centric for you software folks… but provides the basis for understanding why in-memory DBMSs are fast and why Exadata is not an in-memory DBMS.
And some other Good Stuff
Kevin Closson on Exadata
Thank you for your attention last year. I hope that each of you has a safe, prosperous, and happy new year…
I posted a blog on the SAP site here that discussed the implications of mobile clients. I want to re-emphasize the issue as it is crucial.
While at Greenplum we routinely replaced older EDW platforms and provided stunning performance. I recall one customer in particular where we were given a query that ran in 7 hours and Greenplum executed the query in seven seconds. This was exceptional… more typical were cases where we reduced run-times from several hours to under 30 minutes… to 10 minutes… to 5 minutes. I’m sure that every major competitor: Teradata, Greenplum, Netezza, and Exadata has similar stories to tell.
But 5 minutes will not cut it if you are servicing a mobile client where sub-second response to the device is a requirement… and 10 minutes is out of the question. It does not matter if it ran in 10 hours before… 10 minute response is not acceptable to a mobile device.
Today we see sub-second response delivered to our phones by custom applications built on special high-performance platforms designed specifically to service a mobile client: iPhones, iPads, and Android devices.
But what will we do about the BI applications built on commercial platforms which have just used every trick in the book to become one of the 5 minute stories mentioned above?
I think that there are only a few architectural choices.
- We can rewrite the high-value queries as custom applications using specialized infrastructure… at great expense… and leaving the vast majority of queries un-serviced.
- We can apply the 80/20 rule to get the easiest queries serviced with only 20% of the effort. But according to Murphy the 20% left will be the highest value queries.
- We can tack on expensive, specialized, accelerators to some queries… to those that can be accelerated… but again we leave too much behind.
- Or we can move to a general purpose high performance computing platform that can service the existing BI workload with sub-second response.
In-memory computing will play a role… Exalytics provides option #3… HANA option #4.
SSD devices may play a role… but the performance improvements being quoted by vendors who use SSD as a block I/O device are 10X or less. A 10X improvement applied to a query that was just improved to 10 minutes yields a 1 minute query… still not the expected level of service.
IT departments will have to evaluate the price/performance, not just the price, as they consider their next platform purchases. The definition of adequate response is changing… and the old adequate, at the least cost, may not cut it. Mobile clients are here to stay. The productivity gains expected from these devices are significant. High performance BI computing is going to be a requirement.
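The gap is easy to work out. This little sketch assumes a one-second target as the stand-in for "sub-second response"; the function names are hypothetical.

```python
def response_after(current_seconds: float, speedup_factor: float) -> float:
    """Response time once a query is accelerated by `speedup_factor`."""
    return current_seconds / speedup_factor


def required_speedup(current_seconds: float, target_seconds: float) -> float:
    """Speedup factor needed to bring a query down to the target time."""
    return current_seconds / target_seconds


if __name__ == "__main__":
    ten_minutes = 10 * 60.0
    # A 10X gain from SSD used as a block I/O device.
    print(response_after(ten_minutes, 10.0), "seconds")
    # What a mobile client needs, assuming a 1-second target.
    print(required_speedup(ten_minutes, 1.0), "X")
```

A 10-minute query needs something like a 600X improvement to reach one second, so a 10X accelerator closes only a small part of the distance.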
When I was at Greenplum… and now again at SAP… I ran into a strange logic from Teradata about query concurrency. They claimed that query concurrency was a good thing and an indicator of excellent workload management. Let’s look at a simple picture of how that works.
In Figure 1 we depict a single query on a Teradata cluster. Since each node is working in parallel the picture is representative no matter how many nodes are attached. In the picture each line represents the time it takes to read a block from disk. To make the picture simple we will show I/O taking only 1/10th of the clock time… in the real world it is slower.
Given this simplification we can see that a single query can only consume 10% of the CPU… and the rest of the time the CPU is idle… waiting for work. We also represented some I/O to spool files… as Teradata writes all intermediate results to disk and then reads them in the next step. But this picture is a little unfair to Greenplum and HANA as I do not represent spool I/O completely. For each qualifying row the data is read from the table on disk, written to spool, and then read from spool in the subsequent step. But this note is about concurrency… so I simplified the picture.
Figure 2 shows the same query running on Greenplum. Note that Greenplum uses a data flow architecture that pushes tuples from step to step in the execution plan without writing them to disk. As a result the query completes very quickly after the last tuple is scanned from the table.
Let me say again… this story is about CPU utilization, concurrency, and workload management… I’m not trying to say that there are not optimizations that might make Teradata outperform Greenplum… or optimizations that might make Greenplum even faster still… I just want you to see the impact on concurrency of the spool architecture versus the data flow architecture.
Note that on Greenplum the processors are 20% busy in the interval that the query runs. For complex queries with lots of steps the data flow architecture provides an even more significant advantage to Greenplum. If there are 20 steps in the execution plan then Teradata will do spool I/O, first writing then reading the intermediate results while Greenplum manages all of the results in-memory after the initial reads.
In Figure 3 we see the impact of having the data in-memory as with HANA or TimesTen. Again, I am ignoring the implications of HANA’s columnar orientation and so forth… but you can clearly see the implications by removing block I/O.
Now let’s look at the same pictures with 2 concurrent queries. Let’s assume no workload management… just first in, first out.
In Figure 4 we see Teradata with two concurrent queries. Teradata has both queries executing at the same time. The second query is using up the wasted space made available while the CPUs wait for Query 1’s I/O to complete. Teradata spools the intermediate results to disk, which reduces the impact on memory while they wait. This is very wasteful as described here and here (in short, the Five Minute Rule suggests that data that will be reused right away is more economically stored in memory)… but Teradata carries a legacy from the days when memory was dear.
But to be sure… Teradata has two queries running concurrently. And the CPU is now 20% busy.
Figure 5 shows the two-query picture for Greenplum. Like Teradata, they use the gaps to do work and get both queries running concurrently. Greenplum uses the CPU much more efficiently and does not write and read to spool in between every step.
In Figure 6 we see HANA with two queries. Since one query consumed all of the CPU the second query waits… then blasts through. There is no concurrency… but the work is completed in a fraction of the time required by Teradata.
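The pictures in Figures 1 through 6 boil down to a very simple model, sketched here. It assumes identical queries started together, first in first out, where each query keeps the CPU busy for a fixed fraction of its run-time and waits on I/O for the rest; spool I/O and everything else is ignored. The busy fractions follow the simplified pictures above (10%, 20%, and 100%); the one CPU-second per query is a hypothetical number.

```python
def cpu_utilization(concurrent_queries: int,
                    busy_fraction_per_query: float) -> float:
    """Share of the CPU in use, capped at 100%."""
    return min(1.0, concurrent_queries * busy_fraction_per_query)


def elapsed_seconds(queries: int, cpu_seconds_per_query: float,
                    busy_fraction_per_query: float) -> float:
    """Wall-clock time to finish all queries when they start together."""
    total_cpu_work = queries * cpu_seconds_per_query
    return total_cpu_work / cpu_utilization(queries, busy_fraction_per_query)


if __name__ == "__main__":
    # Busy fractions taken from the simplified pictures, not from measurements.
    systems = {"spool to disk": 0.10, "data flow": 0.20, "in-memory": 1.00}
    for name, busy in systems.items():
        one = elapsed_seconds(1, 1.0, busy)
        two = elapsed_seconds(2, 1.0, busy)
        print(f"{name}: 1 query {one:.1f}s, 2 queries {two:.1f}s")
```

In this model the disk-based systems finish two queries in the same wall-clock time as one because the second query fills idle CPU, while the in-memory system takes twice as long for two queries and still finishes first.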
If we continue to add queries using these simple models we would get to the point where there is no CPU available on any architecture. At this point workload management comes into play. If there is no CPU then all that can be done is to either manage queries in a queue… letting them wait for resources to start… or start them and let them wastefully thrash in and out… there is really no other architectural option.
So using this very simple depiction eventually all three systems find themselves in the same spot… no CPU to spare. But there is much more to the topic and I’ve hinted about these in previous posts.
Starting more queries than you can service is wasteful. Queries have to swap in and out of memory and/or in and out of spool (more I/O!) and/or in and out of the processor caches. It is best to control concurrency… not embrace it.
Running virtual instances of the database instead of lightweight threads adds significant communications overhead. Instances often become unbalanced as the data returned makes the shards uneven. Since queries end when the slowest instance finishes its work this can reduce query performance. Each time you preempt a running query you have to restore state and repopulate the processor’s cache… which slows the query by 12X-20X. … Columnar storage helps… but if the data is decompressed too soon then the help is sub-optimal… and so on… all of the tricks used by databases and described in these blogs count.
But what does not count is query concurrency. When Teradata plays this card against Greenplum or HANA they are not talking architecture… it is silliness. Query throughput is what matters. Anyone would take a system that processes 100,000 queries per hour over a system that processes 50,000 queries per hour but lets them all run concurrently.
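Little's Law makes the distinction between concurrency and throughput explicit: the average number of queries in flight equals throughput times average response time. The sketch below applies it to the two systems in the example; the response times (0.36 and 72 seconds) and the backlog of 200,000 queries are hypothetical numbers picked for easy arithmetic.

```python
def queries_in_flight(queries_per_hour: float,
                      avg_response_seconds: float) -> float:
    """Little's Law: average concurrency = throughput * average response time."""
    queries_per_second = queries_per_hour / 3600.0
    return queries_per_second * avg_response_seconds


def hours_to_finish(backlog_queries: float, queries_per_hour: float) -> float:
    """Time needed to work through a backlog at a given throughput."""
    return backlog_queries / queries_per_hour


if __name__ == "__main__":
    # Hypothetical fast system: 100,000 queries/hour, 0.36 s average response.
    print(queries_in_flight(100_000, 0.36), hours_to_finish(200_000, 100_000))
    # Hypothetical concurrent system: 50,000 queries/hour, 72 s average response.
    print(queries_in_flight(50_000, 72.0), hours_to_finish(200_000, 50_000))
```

Under these assumptions the second system shows far more concurrency, yet it takes twice as long to finish the same work. High concurrency here is a symptom of slow queries, not a sign of capacity.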
I’ve been picking on Teradata lately as they have been marketing hard… a little too hard. Teradata is a fine system and they should be proud of their architecture and their place in the market. I am proud to have worked for them. I’ll lay off for a while.
I recently pointed out some silliness published by Teradata to several SAP prospects. There is more nonsense that was sent and I’d like to take a moment to clear up these additional claims.
In their note to HANA prospects they used the following numbers from the paper SAP published here:
|# of Query Streams||
|# of Queries per Hour (Throughput)||
Teradata makes several claims from these numbers. First they claim that the numbers demonstrate a bottleneck that is tied to either the NUMA effect or to the SMP Knee Curve. This nonsense is the subject of a previous blog here.
For any database system as you increase the number of queries to the point where there is contention the throughput decreases. This is just common sense. If you have 10 cores and 10 threads and there is no contention then all threads run at the same speed as fast as possible. If you add an 11th thread then throughput falls off, as one thread has to wait for a core. As you add more threads the throughput falls further until the system is saturated and throughput flattens. Figure 1 is an example of the saturation curve you would expect from any system as the throughput flattens.
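The 10-cores-and-10-threads example can be written down as a tiny model. It assumes each thread can use one full core, that there is no context-switch or cache overhead, and that each core completes a fixed number of queries per hour (the figure of 100 is hypothetical). The sketch separates total throughput, which flattens, from the throughput each thread sees, which falls.

```python
def total_throughput(threads: int, cores: int,
                     queries_per_core_hour: float) -> float:
    """System throughput: grows with threads until every core is busy."""
    return min(threads, cores) * queries_per_core_hour


def per_thread_throughput(threads: int, cores: int,
                          queries_per_core_hour: float) -> float:
    """Throughput seen by one thread once the cores are shared."""
    return total_throughput(threads, cores, queries_per_core_hour) / threads


if __name__ == "__main__":
    # Hypothetical: 10 cores, each completing 100 queries per hour.
    for threads in (5, 10, 11, 20, 40):
        total = total_throughput(threads, 10, 100.0)
        each = per_thread_throughput(threads, 10, 100.0)
        print(threads, total, round(each, 1))
```

A real system adds overhead as threads contend, so the measured curve bends before the knee and may sag after it, but the general shape of rising and then flattening is the same.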
There are some funny twists to this, though. If you are an IMDB then each query can use 100% of a core. If you are multi-threaded IMDB then each query can use 100% of all cores. If you are a disk-based system then you give up the CPU to another query while you wait for I/O… so throughput falls. I’ll address these twists in a separate blog… but you will see a hint at the issue here.
Teradata claims that these numbers reflect a scaling issue. This is a very strange claim. Teradata tests scaling by adding hardware, data, and queries in equal amounts to see if the query performance holds constant… or they add hardware and data to look for a correlation between the number of nodes and query performance… hoping that as the nodes increase the response time decreases. In fact Teradata scales well… as does HANA… But the hardware is constant in the HANA benchmark so there is no view into scaling at all. Let me emphasize this… you cannot say anything about scaling from the numbers above.
Teradata claims that they can extrapolate the saturation point for the system… this represents very bad mathematics. They take the four data points in the table and create an S curve like the one in Figure 1… except they invert it to show how throughput decreases as you move towards the saturation point… Figure 2 shows the problem.
If you draw a straight line through the curve using any sort of math you miss the long tail at the end. This is an approximation of the picture Teradata drew… but even in their picture you can see a tail forming… which they ignore. It is also questionable math to extrapolate from only four observations. The bottom line is that you cannot extrapolate the saturation point from these four numbers… you just don’t know how far out the tail will run unless you measure it.
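Here is a sketch of why a straight line through four points misses the tail. It uses entirely synthetic data, not the benchmark numbers: I assume a saturating curve where per-stream throughput is `t_max / (streams + k)`, sample it at four hypothetical stream counts, fit a least-squares line, and compare the line with the curve further out. The curve shape, the constants, and the function names are all assumptions made for illustration.

```python
def per_stream_throughput(streams: float, t_max: float = 100.0,
                          k: float = 10.0) -> float:
    """Synthetic saturating system: per-stream throughput decays with a long tail."""
    return t_max / (streams + k)


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    observed = [1, 5, 10, 20]  # four hypothetical stream counts
    slope, intercept = fit_line(observed,
                                [per_stream_throughput(s) for s in observed])
    zero_crossing = -intercept / slope
    print("line says throughput hits zero at", round(zero_crossing, 1), "streams")
    print("synthetic curve at 60 streams:", round(per_stream_throughput(60), 2))
    print("line at 60 streams:", round(slope * 60 + intercept, 2))
```

The fitted line crosses zero at a finite stream count while the synthetic curve is still positive well beyond it. The only way to know how far the tail runs on a real system is to measure it.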
To prove this is nonsense you just have to look here. It turns out that SAP publicly published these benchmark results in two separate papers and this second one has numbers out to 60 streams. Unsurprisingly at 60 streams HANA processed 112,602 queries per hour while Teradata told their customers that it would saturate well short of that… at 49,601 queries (they predicted that HANA would thrash and the number of queries/hour would fall back… more FUD).
Teradata is sending propaganda to their prospects with scary extrapolations and pronouncements of architectural bottlenecks in HANA. The mathematics behind their numbers is weak and their incorrect use of deep architectural terms demonstrates ignorance of the concepts. They are trying to create Fear, Uncertainty, and Doubt. Bad marketing… not architecture, methinks.