The Cost of Dollars per Terabyte


(Photo credit: Images_of_Money)

Let me be blunt: using price per terabyte as the measure of a data warehouse platform is holding back the entire business intelligence industry.

Consider this… The Five Minute Rule (see here and here) clearly describes the economics of hardware technology… suggesting exactly when data should be retained in memory versus when it should be moved to a peripheral device. But vendors who add enough memory to abide by the Rule significantly improve the price/performance of their products while weakening their price/TB… and therefore weakening their competitive position.
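The Rule's arithmetic is simple enough to sketch. Here is a minimal Python sketch of the Gray/Graefe break-even interval; every price and rate in it is an illustrative assumption, not a vendor quote.

```python
# The Five Minute Rule break-even interval (Gray & Graefe):
# a page is worth keeping in memory if it is re-referenced more
# often than this interval; otherwise disk is the cheaper home.
# All prices and rates below are illustrative assumptions.

def break_even_seconds(pages_per_mb_ram: float,
                       ios_per_sec_per_drive: float,
                       price_per_drive: float,
                       price_per_mb_ram: float) -> float:
    """(PagesPerMBofRAM / AccessesPerSecondPerDisk)
       * (PricePerDiskDrive / PricePerMBofRAM)"""
    return (pages_per_mb_ram / ios_per_sec_per_drive) * \
           (price_per_drive / price_per_mb_ram)

# Assumed numbers: 8 KB pages (128 per MB), a $500 drive doing
# 120 random I/Os per second, and RAM at half a cent per MB.
interval = break_even_seconds(128, 120, 500.0, 0.005)
print(f"Break-even interval: {interval / 60:.0f} minutes")
```

As memory gets cheaper the interval grows… which is exactly the pressure the Rule puts on configurations: more and more data belongs in RAM, even though adding that RAM raises the system's price/TB.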

We see this all of the time. Almost every database system could benefit from a little more memory. The more modern systems that use a data-flow paradigm, Greenplum for example, try to minimize I/O by using memory effectively. But the incentive is to keep the configured memory low to keep the price/TB down. Others, like Teradata, use memory carefully (see here) and write intermediate results to disk or SSD to keep their price/TB down… but they violate the Five Minute Rule with each spool I/O. Note that this is not a criticism of Teradata… they could use more memory to good effect… but the use of price/TB as the guiding metric dissuades them.

Now comes Amazon Redshift… with the lowest imaginable price/TB… and little mention of price/performance at all. Again, do not misunderstand… I think that Redshift is a good thing. Customers should have options that trade off performance for price… and there are other things I like about Redshift that I’ll save for another post. But when price/TB is the only measure, performance stops mattering… it becomes just a requirement to be met, and adequate is good enough as long as the price/TB is low. Today IT departments are judged harshly for spending too much per terabyte… and judged less harshly, or excused, if performance becomes barely adequate or worse.

I believe that in the next year or two every BI/DW ecosystem will be confronted with the reality of providing sub-three-second response to every query as users move to mobile devices: phones, tablets, watches, etc. IT departments will be faced with two options:

  1. They can procure more expensive systems with a high price/TB ratio… but with an effective price/performance ratio and change the driving metric… or
  2. They can continue to buy inexpensive systems based on a low price/TB and then spend staff dollars to build query-specific data structures (aggregates, materialized views, data marts, etc.) to achieve the required performance.
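The trade-off between the two options can be made concrete with a back-of-the-envelope sketch; every dollar figure below is a made-up assumption for illustration, not a real system price.

```python
# Back-of-the-envelope TCO for the two options above.
# All dollar figures are illustrative assumptions.

def tco(system_cost: float, annual_staff_cost: float, years: int = 3) -> float:
    """Total cost of ownership: purchase price plus staff cost over the term."""
    return system_cost + annual_staff_cost * years

# Option 1: a higher price/TB system that meets the performance
# requirement out of the box, with modest tuning staff.
option_1 = tco(system_cost=2_000_000, annual_staff_cost=100_000)

# Option 2: a low price/TB system, plus staff building aggregates,
# materialized views, and marts to reach the required performance.
option_2 = tco(system_cost=800_000, annual_staff_cost=600_000)

print(f"Option 1: ${option_1:,.0f}  Option 2: ${option_2:,.0f}")
```

Under these assumed numbers the “cheap” option costs more over three years… the point is not the specific figures but that the staff dollars spent building query-specific structures belong in the comparison.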

It is time for price/performance to become the driver and support for some number of TBs to be a requirement. This will delight users, who will appreciate better, not merely adequate, performance. It will lower TCO by reducing the cost of developing and operating query-specific systems and structures. It will restore the agility so sorely missed in the DW space by letting companies use hardware performance to solve problems instead of people. It is time.


5 thoughts on “The Cost of Dollars per Terabyte”

  1. Largely, I think this is a manifestation of two things:

    1) trying to make a simple comparison metric for the financial decision makers who have to budget and pay for these systems – they are not cheap, and explaining the complexities and cost/benefits of competing systems can take more time than executives have, or are willing to give – in addition to the “marketing mischief” and general bias employed by vendors, intentional or not.

    2) as the marketplace for BI fractures from generalized BI computing platforms into solution-specific platforms, meaningful comparison increasingly requires a custom benchmark for each and every problem. But consumers/customers are increasingly resorting to religion or overly simplistic metrics to make up for a lack of time or ability to invest in the fact generation needed to make a good decision.


  2. Hi Rob, this article reminded me of another article I recently saw in the Financial Times: a quick interview with the CEO of Kroger. He mentioned they’ve been doing really well by giving customers options across various price points. For example, for cost-conscious folks there is a private-label jelly at about $0.06 an ounce, filled with high-fructose syrup. On the top end there is also a private-label item, but it’s more of a preserves fruit spread. It costs $0.30 an ounce and is arguably a healthier product. The bean counter could argue both items are “jelly”, but in fact they serve totally different markets, tastes and preferences.

    Thanks for writing this article. I enjoyed it.


  3. I’m not convinced price per TB is holding back the entire BI industry, but it’s definitely a blunt comparison tool.

    The race to the bottom on price may make sense to sales folks (“buy mine it’s cheaper for just as much”) and bean counters (“that new one is cheaper for the same size, and they’re both just computers”) but it doesn’t mean much to those at the sharp end.

    “..vendors who add sufficient memory to abide by the Rule find themselves significantly improving the price/performance of their products but weakening their price/TB and therefore weakening their competitive position.”

    Is this suggesting that as RAM is added as a performance aid the price must increase, meaning for a fixed amount of disk or data the price per TB (of disk/data) rises? If so, this would suggest that vendors are being dissuaded from adding RAM to aid performance due to concerns about degrading their cost per TB. Interesting take, for sure.

    The price per TB seems to be a relatively new sales/marketing approach led by those trying to eat what is basically Teradata’s lunch. Pre-IBM, Netezza never led with the price per TB message, not in the UK or EMEA anyway. How they go to market now as part of IBM I’m not sure. Price/TB seems to be an attempt to shine a light on what was/is perceived to be the relatively high cost of Teradata and other established platforms.

    As those that read blogs such as this are hopefully all too aware, the price per unit of disk storage acquired tells us nothing about the capabilities of a given system, and that’s where BI success lives or dies. That and a skilled SI to help ‘do the doing’, of course ;-)


  4. Hi Paul,

    I think that price/TB has been the standard for a long time.

    This does not mean that it is the only criterion by any means… but it drives the economic discussions for both sharp and dull. The sharp bring it up to negotiate down price… and the dull think that it is the standard measure.

    It does not mean that every vendor sells with a price/TB message… it just means that every vendor knows that it will come up sooner or later in the cycle… and it influences product plans.

    Actually I think it should be augmented with a metric that judges skilled SIs by price/kg… which would be just as meaningful…



  5. Pingback: Thoughts on AWS Redshift… | Database Fog Blog
