Recently, there has been a conversation about what defines “Big Data”. It’s my position (among others) that Big Data is data so large that a single computer cannot process it in a timely manner. Hence, we have grid computing. Grid computing is not inexpensive, and it is overkill for many organizations.
The term “Huge Data” has been bandied about as well. In the conversations about what constitutes Big Data, it was more or less agreed that Huge Data is a data set that sits somewhere between 10GB and 20GB in size. (Note: In about two years I will look back at this article and laugh about writing that a 20GB data set is huge for desktops and small servers.) The term Big Data is so abused and misused by the technical press, and even by many of the BI vendors, that it’s almost an irrelevant term. But Huge Data has my interest, and I will tell you why.
The other day I read a blog article on the failure of Big Data projects. The article cites a failure rate of 55%. I was not surprised by that kind of failure rate. I was surprised that no solutions were being offered. In the analytics world, especially in finance and health care, we tend to work with data that comes from a data warehouse or a specialized data mart. The specialized data mart is really an analytics data mart, with the data cleaned and transformed into a form that is useful for analysis.
Analytical data marts are cost effective. This is especially true when the server required is modest compared to the monster DBs running on big iron. Departments can almost always afford a smaller server, and they can expect and receive much better turnaround time on jobs than most data warehouses provide. Data marts are more easily expandable and can be tuned more effectively for analytics. Heck, I’ve yet to work on a mainframe or large data warehouse that could outrun a smaller server or desktop for most of my needs.
The cost of a WPS server license on a four, eight or even sixteen core analytics data mart is quite reasonable. With WPS on the desktop and a WPS Linux server, analysts can remotely submit code to the data mart and receive the log, listings and graphics back in their desktop workbench. But the real beauty of running WPS on your data mart platform is that WPS comes with all the database access engines as part of the package. If you have worked in a large environment with multiple database vendors, you can see how this can be very cost effective when it comes to importing data from all these different databases into an analytical data mart.
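To make that concrete, here is a minimal sketch of what pulling data into the mart through one of those access engines might look like in the SAS language that WPS runs. The server path, schema, table, and credentials are all hypothetical, and the exact engine options depend on your database and how the WPS installation is configured:

```sas
/* Hypothetical example: attach a vendor database to WPS
   using a bundled access engine. The connection details
   (path, schema, user, password) are made up. */
libname mart oracle user=analyst password=secret
        path=martdb schema=finance;

/* Summarize a (hypothetical) claims table; the log,
   listing and output come back to the desktop workbench. */
proc means data=mart.claims n mean sum;
  class region;
  var paid_amount;
run;
```

The point of the sketch is that the same `libname` pattern works whether the engine is Oracle, SQL Server, MySQL or another supported database, so there is no separate connector to license per vendor.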
About the author: Phil Rack is President of MineQuest Business Analytics, LLC, located in Grand Rapids, Michigan. Phil has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is an authorized reseller of WPS in North America.