I love the tweetness of the @howarddresner posts restated here regarding data science… and the dialog it has started… and I would like to add a twist to the conversation.
First a story…
I once consulted for a company who was building a marketing service for their clients. They targeted customers with products provided by those clients using data they had in-house. The company had a team of data scientists who built targeting models. The same team built models/reports that evaluated the effectiveness of the targeting. Somehow their evaluations demonstrated that they were brilliant… producing results that were unprecedented and completely justifying the service.
But a close look at the targeting and the evaluation showed that the targeting was weak and that the results were grossly inflated.
Karl Popper tells us that science must be falsifiable. But science requires enough rigor that somebody must attempt to falsify any claims.
“Data Science” is not science in this respect. It is alchemy… and the shortage of data scientists is twice as bad as you think… because there must be two data scientists for every claim of data discovery… one to discover it and one to test the discovery.
My Father used to say” “Figures never lie… but liars figure”. I am not accusing all data scientists of being as unethical as those in my story… but I worry that under the algorithmic mumbo-jumbo there emerges new versions of the truth that are utterly untested and will often prove inaccurate.
- Karl Popper, and Social Probability (statelegitimacy.com)
In the late 1980′s and the early 1990′s the term for software that business users executed to run reports, fire off canned queries, and/or to explore data ad hoc was called “decision support” software. Later, and still today, the term “business intelligence” came into use.
I never understood the sense of the switch. The term “business intelligence” is vague… sort of fluffy and pretentious. “Decision support” implies a purpose. In the years when the switch from one term to the other was in progress, if you asked the question: what do you mean by “business intelligence” the answer was… it is “decision support”.
Today the analytics that underlie both terms are becoming more sophisticated, and they execute in near-real-time. It could be said that there is business intelligence in the process that acquires data, analyzes it, discovers a pattern, and applies a rule automatically as a result. But the software programmer who built the system was focused on automating the decision process… not on creating intelligence.
A clear focus on supporting complex decisions will increase the chances of delivering a return on your investment in analytics. ”Intelligence” is not useful unless it is applied to make a better decision. I vote for a return to the phrase “decision support”.
OLAP searches a set of pre-aggregated data… a cube. If the cube is large enough that you don’t bump into the edges you might think that your search is ad hoc… but that is an illusion. The set is prescribed not ad hoc.
In the 1980′s we sent paper reports out… they were moved on a pallet with a fork-lift. The reports aggregated key metrics to many levels in a hierarchy sliced and diced across many dimensions. Today we take the lines off the reports and store them digitally in a cube and provide tools to let users navigate the cube to build their reports. What they build looks, to a large extent, like the reports from the 80′s.
Data warehousing provides more data and better data… so there are more cubes, more dimensions, more reports… and hopefully more business intelligence. But these reports provide 1980′s quality business intelligence on a screen instead of on paper… bounded by the OLAP cube.
When you hear folks talk about data science and data mining and advanced analytics and optimization… they are talking about advanced mathematical treatment of the data… know that this is going to require technology that is beyond the capabilities of a OLAP engine.
Exalytics is a OLAP engine. Here are some Exalytics use cases from a proponent. They are about OLAP dashboards… good stuff… but hardly advanced analytics. Oracle says that Exalytics is engineered for Extreme Analytics. If we agree that “extreme” analytics is not in any way advanced… then I agree.
Here is a sound bite on Big Data I composed for another source…
Big Data is relative. For some firms Big Data will be measured in petabytes and for other in hundreds of gigabytes. The point is that very detailed data provides the vital statistics that quantify the health of your business.
To store and access Big Data you need to build on a scalable platform that can grow. To process Big Data you need a fully scalable parallel computing environment.
With the necessary infrastructure in place the challenge becomes: how do you gauge your business and how do you change the decision-making processes to use the gauges?