Part 7 – How Hadooped is Greenplum, the Pivotal GPDB?

Now for Greenplum and Hadoop. To continue this thread on RDBMS-Hadoop integration (Part 1, Part 2, Part 3, Part 4, Part 5, Part 6): I have suggested that we could evaluate an integration architecture using three criteria: how parallel are the pipes that move data between the RDBMS and the parallel file system; is there intelligence to push down predicates; and is there…

The Hype of Big Data

As a preface to this, you might check out the definition I suggested for Big Data last week here… – Rob
I left Greenplum in large part because they made their mark in… and then abandoned… the data warehouse market for a series of big hype plays: first analytics and data science; then analytics, data science,…

Who is How Columnar? Exadata, Teradata, and HANA – Part 2: Column Processing

In my last post here I suggested that there were three levels of maturity around column orientation and described the first level, PAX, which provides columnar compression. This apparently is the level Exadata operates at with its Hybrid Columnar Compression. In this post we will consider the next two levels of maturity: early materialized column…

The Fog is Getting Thicker…

I renamed this so that Teradata folks would not get here so often… it's not really about Intelligent Memory… just prompted by it. The post on Intelligent Memory is here. – Rob
Two quick comments on Teradata’s recent announcement of Intelligent Memory. First… very, very cool. More on this to come. Next… life is going…

Hadoop and the EDW

Cloudera and Teradata have jointly published a nice paper here that presents an interesting perspective on how Hadoop and an EDW play together. Simply put, Hadoop becomes the staging area for “raw data streams” while the EDW stores data from “operational systems”. Hadoop then analyzes the raw data and shares the results with the EDW. Two…

Some Unaudited HANA Performance Numbers

The following performance numbers are being reported publicly for HANA. HANA scans data at 3MB/msec/core; on a high-end 80-core server this translates to 240GB/sec per node. HANA inserts rows at 1.5M records/sec/core, or 120M records/sec per node. HANA aggregates 12M records/sec/core, or 960M records/sec per node. These numbers seem reasonable: A 100X improvement over disk-based scan…
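Each per-node figure quoted for HANA is the per-core rate multiplied by the core count. Here is a minimal sketch of that arithmetic, assuming decimal units (1GB = 1000MB) and perfectly linear scaling across the 80 cores. The constant and function names are illustrative only; they are not part of any HANA tooling.

```python
# Per-core rates as quoted in the post; 80 cores is the post's "high-end" server.
CORES_PER_NODE = 80

SCAN_MB_PER_MSEC_PER_CORE = 3            # scan rate, MB per millisecond per core
INSERT_ROWS_PER_SEC_PER_CORE = 1_500_000  # insert rate, records per second per core
AGG_ROWS_PER_SEC_PER_CORE = 12_000_000    # aggregation rate, records per second per core


def per_node(per_core_rate, cores=CORES_PER_NODE):
    """Scale a per-core rate to a whole node, assuming linear scaling (an assumption)."""
    return per_core_rate * cores


# MB/msec -> MB/sec (x1000 msec per sec), then MB/sec -> GB/sec (/1000, decimal units).
scan_gb_per_sec = per_node(SCAN_MB_PER_MSEC_PER_CORE) * 1000 // 1000
insert_rows_per_sec = per_node(INSERT_ROWS_PER_SEC_PER_CORE)
agg_rows_per_sec = per_node(AGG_ROWS_PER_SEC_PER_CORE)

print(f"scan:      {scan_gb_per_sec} GB/sec per node")
print(f"insert:    {insert_rows_per_sec // 1_000_000}M records/sec per node")
print(f"aggregate: {agg_rows_per_sec // 1_000_000}M records/sec per node")
```

The results match the quoted 240GB/sec, 120M records/sec, and 960M figures, so the per-node numbers are straight multiples of the per-core ones rather than independent measurements.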