The Cost of Dollars per Terabyte
Let me be blunt: using price per terabyte as the measure of a data warehouse platform is holding back the entire business intelligence industry.
Consider this… The Five Minute Rule (see here and here) clearly describes the economics of HW technology… suggesting exactly when data should be retained in memory versus when it may be moved to a peripheral device. But vendors who add sufficient memory to abide by the Rule find themselves significantly improving the price/performance of their products but weakening their price/TB and therefore weakening their competitive position.
We see this all of the time. Almost every database system could benefit from a little more memory. The more modern systems which use a data flow paradigm, Greenplum for example, try to minimize I/O by using memory effectively. But the incentive is to keep the memory configured low to keep their price/TB down. Others, like Teradata, use memory carefully (see here) and write intermediate results to disk or SSD to keep their price/TB down… but they violate the Five Minute Rule with each spool I/O. Note that this is not a criticism of Teradata… they could use more memory to good effect… but the use of price/TB as the guiding principle dissuades them.
Now comes Amazon Redshift… with the lowest imaginable price/TB… and little mention of price/performance at all. Again, do not misunderstand… I think that Redshift is a good thing. Customers should have options that trade-off performance for price… and there are other things I like about Redshift that I’ll save for another post. But if price/TB is the only measure then performance becomes far too unimportant. When price/TB is the driver performance becomes just a requirement to be met. The result is that today adequate performance is OK if the price/TB is low. Today IT departments are judged harshly for spending too much per terabyte… and judged less harshly or excused if performance becomes barely adequate or worse.
I believe that in the next year or two that every BI/DW eco-system will be confronted with the reality of providing sub-three second response to every query as users move to mobile devices: phones, tablets, watches, etc. IT departments will be faced with two options:
- They can procure more expensive systems with a high price/TB ratio… but with an effective price/performance ratio and change the driving metric… or
- They can continue to buy inexpensive systems based on a low price/TB and then spend staff dollars to build query-specific data structures (aggregates, materialized views, data marts, etc.) to achieve the required performance.
It is time for price/performance to become the driver and support for some number of TBs to be a requirement. This will delight users who will appreciate better, not adequate, performance. It will lower the TCO by reducing the cost of developing and operating query-specific systems and structures. It will provide the agility so missed in the DW space by letting companies use hardware performance to solve problems instead of people. It is time.