Tag Archives: Business Intelligence

Some thoughts on a rainy Monday

The more I use Linux, the more I come around to understanding just how much I can do with it. As a matter of fact, I could easily do without Windows and switch 100% of the way over to Linux if I wanted. The desktop(s) and business applications have really gotten that good.

Windows 8 just soured me on the whole MS ecosystem. When they bolted the Metro interface onto a server OS — that was the last straw for me. Whoever made the decision to strap a touch interface onto a server should be let go. Shown the door. Asked to leave…

I have Apple hardware here in the office, and it runs well, but I just have not been able to embrace it like so many others have. Apple makes some fine hardware, and there’s a load of support for Office productivity applications as well as analytical apps. Both WPS and R run quite well on OS X. As a matter of fact, I see a lot of R users who work on OS X as their preferred platform.

But Linux, and specifically Ubuntu 12.04 and 14.04, has been especially good. I don’t have memory issues when I run large simulations in R that require a lot of RAM. With Windows, that is often a problem: trying to allocate a large block of memory fails when there isn’t sufficient contiguous memory to hold a large array, vector or data frame. Memory management is significantly different under Linux than under Windows.
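To make that concrete, here’s a minimal sketch in R (the size is made up purely for illustration). On Windows, a request like this often dies with an error along the lines of “cannot allocate vector of size …” even when total free RAM looks adequate; on Linux the same request generally succeeds if the memory is there.

    # Minimal sketch: R wants one contiguous block for a vector.
    # One billion doubles is roughly 8 GB -- adjust n for your machine.
    n <- 1e9
    x <- numeric(n)                        # the allocation Windows often refuses
    print(object.size(x), units = "Gb")    # confirm what was actually allocated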

NVidia’s CUDA framework seems to be used predominantly on Linux rather than Windows. I’m not sure why that is, to be honest.

I’ve been reading a lot of articles stating that MS is working feverishly to get Windows 9 out the door. No doubt (at least in my mind) it has to do with the terrible Metro interface and people staying away in droves. Of course, you can slap Start8 by Stardock on Windows 8 and it makes it usable by implementing the start button, and kudos to Stardock for doing such a thing, but I still can’t find a way to embrace MS on the desktop any longer.

An interesting phenomenon I have been witnessing is how much analytical and scientific development has been happening on the Linux platforms over the years. There are a lot of Linux tools out there that are helpful if you are a data scientist or working with “BIG DATA.” My experience in reselling WPS is that there is an equal amount of interest (perhaps more) in running it on Linux servers as on Windows servers. Cost is one factor, but so is performance: Linux often outperforms Windows Server dollar for dollar and CPU second to CPU second.

About the author: Phil Rack is President of MineQuest Business Analytics, LLC, located in Grand Rapids, Michigan. Phil has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is an authorized reseller of WPS in North America.

Thursday Ramblings

Does anyone do comparisons of graphics cards and measure performance in a VM? Specifically, do certain graphics cards boost performance when running VMs on the desktop? I like to see my windows “snap” open when I switch from VM to VM. As a developer, I often wonder whether spending an additional $150 on a popular graphics card will yield a perceptible performance boost.

Speaking of graphics cards, we recently bought a couple of used Nvidia Quadro graphics cards from a local CAD/CAM company that is upgrading its workstations. I got these at about 5% of their original retail price, so I’m happy. We were having problems getting a couple of servers to go into sleep mode using Lights Out, and we discovered that we needed a different graphics card to accomplish this. The plus side is that these are Nvidia cards with 240 CUDA cores and 4GB of RAM, so we now have the opportunity to try our hand at CUDA development if we want. I’m mostly interested in using CUDA for R.
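If we do get around to it, the sort of thing I have in mind looks like the sketch below, which uses the gputools package for R. Treat it as illustrative rather than tested on these cards: it assumes a working CUDA toolkit and a successful gputools install.

    # Hedged sketch: offload a matrix multiply to the GPU via gputools.
    library(gputools)

    a <- matrix(runif(2000 * 2000), nrow = 2000)
    b <- matrix(runif(2000 * 2000), nrow = 2000)

    system.time(cpu <- a %*% b)            # CPU baseline
    system.time(gpu <- gpuMatMult(a, b))   # same product on the CUDA cores

    all.equal(cpu, gpu)                    # should agree within FP tolerance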

One drawback to using CUDA, as I understand it, is that it is a single-user interface. Say you have a CUDA GPU in a server: only one job at a time can access the CUDA cores. If you have 240 CUDA cores on your GPU and would like to appropriate 80 CUDA cores to an application — thinking you can run three of your apps at a time — that is not possible. What it seems you have to do is install three graphics cards in the box so that each user or job has access to its own card.

There’s a new Remote Desktop application coming out from MS that will run on your Android device(s), as well as a new release in the Apple App Store. I use the RDC from my Mac mini and it works great. I’m not sure what they could add to the app to make it more compelling, however.

Tom’s Hardware has a fascinating article on SSDs and performance in a RAID setup. On our workstations and servers, we have SSDs acting as a cache for the work and perm folders on our drive arrays. According to the article, RAID 0 performance tends to top out with three SSDs for writes and around four for reads.

FancyCache from Romex Software has become PrimoCache. It has at least one new feature that I would like to test: L2 caching using an SSD. PrimoCache is in beta, so if you have the memory and hardware, it might be advantageous to give it a spin to see how it could improve your BI stack. We did a performance review of FancyCache in a series of posts on Analytic Workstations.

FYI, PrimoCache is not the only caching software available that can be used in a WPS environment. SuperSpeed has a product called SuperCache Express 5 for Desktop Systems. I’m unsure whether SuperCache can utilize an SSD as a Level 2 cache. It is decently priced at $80 for a desktop version but $450 for a standard Windows Server version. I have to admit, $450 for a utility would give me pause. For that kind of money, the results would have to be pretty spectacular. SuperSpeed offers a free evaluation as well.

If you are running a Linux box and want to enjoy the benefits of SSD caching, there’s a great blog article from Kyle Manna on how to do this for Ubuntu. I’m very intrigued by this, and if I find some extra time, I may give it the old solid-state spin. There’s also this announcement about the Linux 3.10 kernel and bcache that may make life a whole lot easier.

About the author: Phil Rack is President of MineQuest Business Analytics, LLC, located in Grand Rapids, Michigan. Phil has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is an authorized reseller of WPS in North America.

Building a BI Consulting Company Part II

In the previous post I mentioned that software costs and licensing can be a major impediment to running a competitive consulting business. I’ve written numerous times demonstrating the cost difference between a WPS license and our competitor’s licensed product. You can see those articles here and here.

If you’re a small business and/or just starting an analytics business, then cash flow is a major issue. You expect that there will be some significant startup costs, but wisely choosing your products can have a major impact on whether you will be successful or not.

The same goes for what you can do with the license. For example, some software companies put the screws to you when you want to use their licensed software in a B2B fashion. This can be as innocuous as creating reports and data sets for your customer. The vendor, if they realize it, will then dramatically increase your license fees.

How about licensing issues between your company and the software vendor when they have a vested interest in a software solution and you want to offer a competing product? Or perhaps (and more likely), what if they develop a product that competes with your solution and decide that they no longer want to provide your organization with a software license? This is a very real scenario when software companies want to create or move into vertical market applications at the expense of their license holders.

So those are a few things to consider with regard to software costs and licensing. Do your research and ask questions of the vendor. It never hurts to be informed.

About the author: Phil Rack is President of MineQuest Business Analytics, LLC, located in Grand Rapids, Michigan. Phil has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is an authorized reseller of WPS in North America.

Building a BI Consulting Company

Over the last few weeks, I’ve been engaged in a series of conversations regarding consulting and the hardware and software necessary to run a successful consulting house. In the last year we’ve seen so many references to “big data,” and many of us in the consulting field just shrug our shoulders and smirk, because we’ve understood all along that “big data” is mostly hype. If you want to be precise about it, the term (and what we should be concerned with) is actually “big analytics.”

As a BI consultant or consulting house, you don’t have to replicate your client’s systems or data warehouse to consult on “big analytics.” As a matter of fact, some of the most successful BI consulting going on today is being done for companies that have outsourced a portion of their analytics to a third party. For example, loyalty cards are a driving force in retail, and many organizations have outsourced that analysis to third-party analytics firms. We also see a growing opportunity in health care for fraud detection and pricing of procedures and prescriptions.

So the question comes down to this: what is your consulting focus? Is it providing knowledge and programming expertise to a company and performing the consulting remotely (or even onsite), or is it more encompassing, moving in the direction where you have the client’s data on your systems and perform a daily/weekly/monthly service?

I’m inclined to argue that the more financially successful consulting firms are the ones that take the client’s data and provide the analytics services away from the client. The rates and fees are higher than when you are on site, and there is limited travel time and expense to deal with.

I often see quotes for servers that clients have solicited from Dell, IBM or HP when they are sizing hardware to run WPS. I am amazed at how reasonably an organization can purchase or lease hardware that is immensely powerful for processing data sets with WPS. I’ve seen 16- and 32-core servers that can run dozens of WPS jobs simultaneously priced between $40K and $60K.

I’m convinced that if you have a good services offering (and a decent sales staff who can find you clients), this is the golden age in analytics for smaller firms and for firms considering jumping into this space. My observations of advertising agencies and others who offer such services bear out that the supply of talent is low and the demand is high.

Of course, hardware cost is just one factor in this line of business. In a future column we will talk about how software cost and licensing can constrain you to the point where you can’t provide any services to third parties, or can set you free and allow you to make significantly more money. Software licensing is a major component of running a profitable BI/Analytics service.

About the author: Phil Rack is President of MineQuest Business Analytics, LLC, located in Grand Rapids, Michigan. Phil has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is an authorized reseller of WPS in North America.

Complexity and Cost

This past weekend, my wife and I went to a lovely wedding. It was a Catholic wedding that was amazingly short, but the priest gave a very interesting sermon on complexity and cost. He talked about complexity in our lives and the costs, both direct and indirect, that we each experience. One example he gave was smart phones: how expensive they are in terms of the outright cost of service, as well as the indirect cost, that being how much time we spend playing with and looking at these gadgets at the expense of the people and relationships around us.

His sermon got me thinking. This is true for software, and business intelligence in particular. The cost of non-open-source software can be pretty high. And the reason for that? Support costs, sales costs, maintenance costs, legal costs, etc…

I often see how companies have purposely fragmented their products so that they can charge more for additional libraries and modules. This has increased cost tremendously for the consumer. Our competitor is a prime example. They send out a local or regional salesperson to chat up the prospect. Often, that person can’t answer the customer’s questions because of the complexity of the product, so they send out a Sales Engineer or two to visit the prospect, answer those questions and chat them up a second time. Now we have three people in the mix, each making a hundred grand a year (at least), involved in the sale. The price of the software has to increase to cover all the people involved in the sale.

Here’s another example of added complexity: different pricing for the same product depending on how you use it. Take companies that are B2B in nature. Actuarial firms, claims processors, advertising agencies, etc. are often labeled as data service providers because they want to use the software in a B2B capacity. Sometimes this is as innocuous as a Contract Research Organization providing statistical analysis. The cost here comes from a different license (think lawyers), people to audit the customer, and employees to enforce the license. It all adds up!

The above examples illustrate everything that is wrong with traditional ways of thinking about software. At MineQuest Business Analytics, we’re proud that we are able to help keep costs down for the customer. We don’t have such draconian licensing for companies that are DSPs. We don’t have an organization that is set up to milk and churn the customer for every last cent. What we do have is a company that is dedicated to providing the best service and software at an affordable price.

About the author: Phil Rack is President of MineQuest Business Analytics, LLC, located in Grand Rapids, Michigan. Phil has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is an authorized reseller of WPS in North America.

Business Analytics Predictions for 2011

1. Companies will begin to look hard at their present enterprise agreements and, due to economic uncertainty, start renegotiating them in an effort to cut costs. One familiar refrain will be that basic analytical software has become a commodity and that they will not continue to pay high annual licensing costs. We will see this trend accelerate dramatically with local and state governments, which are being crushed by looming deficits.

2. Open source will continue to make inroads in the analytics sphere. R will continue to grow in the enterprise by virtue of its popularity in academic circles. As students enter the work force, they will want to use the software they’re most comfortable with.

3. Enterprises will start offering analysis, reports and data to trade partners, showing how they can improve their services with each other. This will be a win-win for both organizations.

4. As hardware capability increases, analytical software pricing will become a major concern, because businesses will want to use the software on more platforms and in more areas of the company. Linux will be the platform of choice for most of these companies due to its low cost and high performance.

5. Desktop analytics, contrary to popular opinion, will continue to dominate the enterprise. The desktop is where the hardcore data analysts live, and it is also where the new algorithms will be developed. Visualization software will also start to become common on the desktop. Businesses that shortchange their analytical development staff with low-powered desktops and small LCD monitors will see less active development by their staffs.

6. We will see enterprises who have invested in specific high-cost analytic languages, and who have put rules, reports and algorithms into production on large servers, either recode to a new language or migrate to compatible, lower-cost languages.

7. The role of innovation will be double-edged. There will be companies and organizations that invest heavily in analytics and see advantages over their competitors. There will also be companies that gain competitive advantage by utilizing their BI stack more effectively, making it available throughout the company.

8. Licensing will continue to hamper companies and organizations, as well as constrain growth, by restricting what companies can provide (reports, data, etc.) to their customers by virtue of being labeled Data Service Providers. Processing of third-party data will be a monumental problem for companies due to license issues.

9. The days of processing large amounts of data on z/OS are all but over. I know this has been said before, but there just isn’t growth on that platform. Plus, all the innovation in analytics is taking place on the desktop and smaller servers. Companies will look at moving their analytics to z/Linux and other Linux platforms in an attempt to save money on hardware and software costs.

10. Multi-threaded applications running on the BI stack will be all the rage. As core counts and memory availability continue to expand, the ability to make use of SMP and MPP hardware will be more important than ever.

11. Server pricing based on client access or client counts will begin to decline. Competition for customers will make such pricing ill-advised.

12. The allure of cloud computing will be strong, but with regulatory constraints, privacy laws, and the fear of losing control of data (e.g., WikiLeaks), the two largest service sectors in the United States (banking and healthcare) will have taken note and will continue to avoid the use of public clouds.

13. Just as in other parts of the economy where we see bubbles created and burst, in 2011 social media such as Facebook and Twitter will start to be seen as a venue for narcissists and a waste of time for many people. Companies that have invested millions of dollars to “mine” tweets will find such analysis less than helpful given the low ROI, and it will begin to fall out of favor.

14. Mobile applications will be hot. The delivery of analytics on devices such as iPads, other tablets, and smart phones will become much more common.

15. Since “flat is the new norm,” cell phone providers will find that the high cost of data plans drives potential customers away, and since Wi-Fi connectivity has become so common (almost everywhere I go there is free Wi-Fi), we will see a decrease in 3G and 4G use, and Wi-Fi-only tablets will dominate. We are already seeing this trend with the Apple iPod touch vs. the iPhone, and now the RIM BlackBerry PlayBook will also be offered in a Wi-Fi-only version.

About the author: Phil Rack is President of MineQuest, LLC and has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is a reseller of WPS in North America.

What is in Your BI Stack?

Earlier this week, I was sitting around with a few friends at a place called the Tilted Kilt (not a bad place, either), talking about what constitutes their analytics platforms, that is, the software they use on a regular basis. One fellow works for a major finance house here in town, the second person works for an educational consortium, and the third works for an advertising agency of about 100 employees.

Pretty much as expected, both the finance and the education employees are stuck with what is “provided” by their employer. In other words, they’re not allowed by the IT group to add any software to the “standard” analytics desktop (whatever that means). The software for these two folks was pretty straightforward: an expensive stat package (SAS) and Excel.

Lynn, who works for the ad agency, was unusual in my view because of the diversity of tools she had at her disposal. She had the standard Microsoft Office install, but also SPSS, Stata, RapidMiner and R, as well as a data visualization package whose name I simply cannot remember right now.

I understand that some of the tools Lynn uses are chosen because they are open source and cost effective, but she’s also one of the smartest data analysts I’ve known over the last six or seven years. It started me thinking about what I use most often. Currently, my BI stack consists of:

WPS – a SAS language compatible software application

R – open source statistics and graphics

Bridge to R – interface into the R system for WPS users

Excel – spreadsheet

GGobi – data visualization

Google Refine – data cleansing

Looking at my list, three of the six software applications are open source.

I’m curious to hear from others about what constitutes your BI stack and whether your organization allows you to augment the software with tools of your choice. I’m especially interested in hearing how your company deals with open source software, and whether you think that having a choice of tools allows you to think outside the box.

About the author: Phil Rack is President of MineQuest, LLC and has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is a reseller of WPS in North America.

MultiCores and Software Pricing – the Paradox

Computing power is increasing dramatically. A recent article I read in PC World announced the availability of a 12-core CPU from AMD, and what was interesting to me was the price point: the story stated that the low-end AMD 12-core CPU was priced at $293. Samsung has announced that it is starting production of 4Gb DDR3 memory chips, which will make 16GB and 32GB DIMMs possible in desktops. Slap three 32GB DIMMs in your desktop and you have 96GB of RAM. Perhaps that’s not surprising to you in some ways, but it did raise a lot of eyebrows among Quants who are looking to utilize more computing power.

Related to this, a few weeks ago J.D. Long, who runs the blog Cerebral Mastication, made a statement to me about the growing popularity of R. I had commented that two years ago I rarely saw a copy of R on a Quant’s desktop at the clients I consult with, but today I see R on 50% of the desktops. J.D. stated, “I suspect R is getting adopted from the ground up, not top down.” The more I think about that statement, the more I think it’s a very astute observation.

But there’s a twist to this story, and that is how software companies, especially those in the database and BI areas, price their software. Most of these companies price their software by the “logical CPU” or by the “core.” These companies can’t get away with doing such a thing on desktops, but they can on the server, and the cost of such software can escalate quickly on even a mid-size server. I’m contending that pricing in such a manner is quite risky and can make your software less valuable to your customers.

Let me explain my thoughts. In a way, it’s a paradox. As your software becomes more expensive (I’m talking about software that costs five or six figures), you move from being a “partner” to being a risk. Instead of being able to penetrate numerous departments with your software, the cost becomes a barrier to entry into those departments. Cost can also be such a factor that it’s unrealistic to think a single company will continue to invest in your solutions. It’s the nature of business to avoid risk and find the most cost-effective solution.

No longer is it true that you need to spend money to make money. What I’m postulating is that many open source solutions are available that are good enough to replace the standard bearers. Talend and Pentaho are two examples in the BI sector that sit on a server. These systems don’t have to be best of breed; they just have to be good enough to operate effectively in the corporate environment.

Is R a replacement for number crunching on the desktop? I believe it is. In the last week, I’ve been exploring graphics in R and how they could complement or replace SAS/Graph. I found that I can produce a bar chart, a histogram, a contour plot, a filled contour plot, a perspective plot, and even a 3-D plot with only a line or two of code for each plot type.
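For the skeptics, here is roughly what that looks like in base R graphics. This is a toy sketch built on a made-up surface and the built-in mtcars data set, not anything from client work:

    # Toy surface for the contour and perspective examples.
    z <- outer(seq(-2, 2, 0.1), seq(-2, 2, 0.1),
               function(x, y) exp(-(x^2 + y^2)))

    barplot(table(mtcars$cyl))        # bar chart
    hist(mtcars$mpg)                  # histogram
    contour(z)                        # contour plot
    filled.contour(z)                 # filled contour plot
    persp(z, theta = 30, phi = 30)    # perspective / 3-D plot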

So this is what I think will happen. In the short term, Quants and developers will use R in conjunction with popular BI software. As they become more familiar and confident with their abilities, and with the near-term availability of six- and 12-core CPUs with 64GB to 96GB of memory, desktop workstations will replace a lot of servers for number crunching in R. We can see this in the “if you can’t beat them, join them” strategy of companies like SAS and SPSS, which have incorporated R into their products to deflect the criticism that they don’t offer cutting-edge statistical technology.

But the bottom line is that open source BI software is maturing, and with vendor pricing on servers based on logical CPUs, these companies have effectively killed the goose that laid the golden egg.

About the author: Phil Rack is President of MineQuest, LLC and has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is a reseller of WPS in North America.

WPS, VMs, Linux and Windows

I certainly hope that everyone has had a great Christmas. I always enjoy this holiday, probably the most of any. The week between Christmas and New Year’s always seems slow. Maybe this is a godsend, because I’ll have some time to play with software!

I discovered a great blog called “The Fat Bloke Sings” that focuses quite a bit on VirtualBox. It’s a cool site, and the author posts lots of information and video tutorials on using VirtualBox. The video on teleportation is worth watching and definitely shows the need for some scripts to automate, and more easily allow, the transport of a running VirtualBox VM from one machine to another.

After watching the video, I couldn’t help but think about how this could be used in the BI space, specifically if you are running critical systems or real-time BI. I’ve never had great success in terms of speed and performance when running SAS or WPS (or any other disk-intensive BI program, for that matter) in a VM. I typically use VMs for development purposes and not production work. That said, I do know from reading the latest spec sheets on VirtualBox that they’ve been able to increase I/O speed as well as paging. But if you are running a system that hits large datasets on a regular basis, I think a VM solution could be problematic.

Another thing I’m interested in is whether there’s any real speed difference between running WPS on a Linux system and on a Windows system. I have a very generic setup with both Windows XP and Fedora 11 installed on a quad-core machine. The hard drives are pretty vanilla 500GB, 7200 RPM drives: Linux and Windows are installed on disk 0; disk 1 is partitioned between Linux and Windows, with a 250GB work drive shared between the two; and disk 2 holds the permanent data. This is probably the fairest way to do a real comparison between the two.

If I get a chance, perhaps I’ll do some benchmarks on this system looking at disk I/O, real time and CPU time between the two platforms. I’ve heard from the Linux and Unix zealots that the ’nix platforms are faster than Windows, but I’ve never seen a direct comparison.
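If I do, the harness will probably look something like this little R sketch, which times a large sequential write and read. The file name and row count are placeholders; the point is that the same script runs unchanged on both platforms:

    # Rough benchmark sketch: time a large sequential write and read.
    n  <- 5e6                                     # placeholder row count
    df <- data.frame(id = seq_len(n), x = rnorm(n), y = rnorm(n))

    wr <- system.time(write.csv(df, "bench.csv", row.names = FALSE))
    rd <- system.time(dat <- read.csv("bench.csv"))

    wr   # "elapsed" is real (wall clock) time; "user" + "system" is CPU time
    rd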

About the author: Phil Rack is President of MineQuest, LLC and has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is a reseller of WPS in North America.

Greenplum, Teradata, WPS and SAS

I’ve long been an advocate of SAS and WPS programmers having database skills, and I’ve harped on this in the past on this blog. It’s often something that makes the difference between landing a job and not landing one.

One problem, though, is how to acquire the database skills that are in demand. In my opinion, there are four database systems that are pretty popular in the BI industry: DB2, Oracle, SQL Server and Teradata. There are other databases in use, but the larger companies tend to cluster around these four products.

In the last week or two, there have been two announcements that have caught my eye and are directly applicable to SAS and WPS developers. First, Greenplum announced a free Single Node Edition license for its software. This license allows you to install a version of Greenplum’s DB on up to an eight-core Linux box.

Greenplum is an interesting company in the database analytics field. It’s a firm with a product that is rapidly becoming popular and is nipping at the heels of Teradata. I’ve downloaded the Greenplum software but have yet to install it.

Teradata, a week after Greenplum’s announcement, decided to offer a “Teradata Express” appliance. Teradata Express is for developers and for training, and is available as a VMware appliance, a Windows installer, or an image on Amazon’s EC2. You can get a nice little setup on Amazon and practice to your heart’s content for less than fifteen cents an hour.

You can access these database systems with either WPS or SAS. WPS comes with the required drivers for Teradata. If you are running SAS, you will have to pay for a SAS/ACCESS license for OLE DB, ODBC or Teradata.

About the author: Phil Rack is President of MineQuest, LLC and has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is a reseller of WPS in North America.