High Performance Workstations for BI

One thing I really enjoy is a powerful workstation for performing analytics. It’s fun, and can be insightful, to speculate on the design and then build a custom higher-end workstation for running BI applications like WPS and R.

Ars Builds

Every quarter, Ars Technica goes through an exercise where they build three PCs, mainly to assess gaming performance, and then do a price vs. performance comparison. After reading a few of these quarterly builds you will soon see a trend: the graphics card plays a major role in their performance assessment. The CPU, the number of cores and fixed storage tend to play only a minimal role when comparing the machines.

This of course is in contrast to what we want to do for our performance benchmarks. We are looking at a holistic approach of CPU throughput, disk I/O and graphics for getting the most for the dollar on a workstation build. But Ars does have a lot to recommend it when it comes to benchmarking, and I think it’s worthwhile including some of their ideas.

What Constitutes a High End Analytics Workstation?

This is an interesting question and one that I will throw out for debate. It’s so easy to get caught up in spending thousands of dollars, if not ten thousand dollars (see the next section), for a workstation. One thing that even the casual observer will soon notice is that being on the bleeding edge is a very expensive proposition. It’s an old adage that you are only as good as your tools. There’s also the adage that it’s a poor craftsman who blames his tools. In the BI world, especially when speed means success, it’s important to have good tools.

As a basis for what constitutes a high end workstation, I will offer the following as a point of entry.

  • At least 4 logical CPUs.
  • At least 8GB of RAM, preferably 16GB to 32GB.
  • Multiple hard drives for OS, temporary workspace and permanent data set storage.
  • A graphics card that can be used for more than displaying graphics, i.e. parallel computing.
  • A large display – 24” capable of at least 1920×1080.

As a mid-tier solution, I would think that a workstation composed of the following components would be ideal.

  • At least 8 logical CPUs.
  • A minimum of 16GB of RAM.
  • Multiple hard drives for OS, temporary workspace and permanent data set storage with emphasis on RAID storage solutions and SSD Caching.
  • A graphics card that can be used for more than displaying graphics, i.e. parallel computing.
  • A large display – 24” capable of at least 1920×1080.

As a high end solution, I would think that a workstation built with the following hardware would be close to ultimate for many (if not most) analysts.

  • Eight to 16 logical CPUs – Xeon class (or possibly a step down to an Intel i7).
  • A minimum of 32GB of RAM and up to 64GB.
  • Multiple hard drives for OS, temporary workspace and permanent data set storage with emphasis on RAID storage solutions and SSD Caching.
  • A graphics card that can be used for more than displaying graphics, i.e. parallel computing.
  • Multiple 24” displays capable of at least 1920×1200 each.
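The CPU and RAM minimums in the three lists above can be sketched as a small lookup. This is only an illustration in Python: the function name `classify_workstation` and the tier labels are made up for the example, and it deliberately ignores storage, graphics and displays, which the lists also call for.

```python
# Thresholds taken from the three tier lists; names are illustrative only.
TIERS = [
    # (tier label, minimum logical CPUs, minimum RAM in GB), highest tier first
    ("high end", 8, 32),
    ("mid-tier", 8, 16),
    ("entry", 4, 8),
]


def classify_workstation(logical_cpus, ram_gb):
    """Return the highest tier whose CPU and RAM minimums are both met."""
    for label, min_cpus, min_ram in TIERS:
        if logical_cpus >= min_cpus and ram_gb >= min_ram:
            return label
    return "below entry"


# A quad-core machine stays in the entry tier even with extra RAM.
print(classify_workstation(4, 16))
print(classify_workstation(8, 16))
print(classify_workstation(12, 64))
```

Checking the highest tier first means a machine is always reported at the best level it qualifies for on both counts, so a shortfall in either CPUs or RAM holds it back.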

I do have a bias towards hardware that is upgradeable. All-in-one solutions tend to be one shot deals and thus expensive. I like upgradeability for graphics cards, memory, hard drives and even CPUs. Expandability can save you thousands of dollars over a period of a few years.

The New Mac Pro – a Game Changer?

The new Mac Pro is pretty radical from a number of perspectives. It’s obviously built for video editing, but its small size is what stands out most in my opinion. As a business analytics computer it offers some intriguing prospects. You have multiple cores, lots of RAM and high end graphics, but limited internal storage. That’s the main criticism that I have about the new Mac Pro. The base machine comes with 256GB of storage and that’s not much for handling large data sets. You are forced to go to external storage solutions to be able to process large data sets. Although I’ve not priced out the cost of adding external storage, I’m sure it’s not inexpensive.

Benchmarks

This is a tough one for me because organizations have such a wide array of hardware, and some benchmarks are going to require hardware that has specific capabilities. For example, doing parallel processing in R requires graphics cards that are CUDA enabled. There is also the fact that we use the Bridge to R for invoking R code, and the Bridge to R only runs on WPS (and not SAS).

I did write a benchmark a while ago that I like a lot. It provides information on the hardware platform (i.e. the amount of memory and the number of LCPUs available) and just runs the basic suite of PROCs that I know are available in both WPS and SAS. Moving to more statistically oriented PROCs such as Logistic and GLM may be difficult because SAS license holders may not have the statistical libraries necessary to run the tests. That’s a major drawback to licensing the SAS System. You are nickel and dimed to death all the time. The alternative to this is to have a workstation benchmark that is specific to WPS.

Perhaps the benchmark can be written so that it tests whether certain PROCs and libraries are available and also determines whether the required hardware (such as CUDA processors) is present to run that specific benchmark. Really, the idea is to determine the performance of specific software on a specific set of hardware, not to compare R, WPS and SAS.
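The idea of gating each test on what is actually available might look something like the following. It is a rough sketch in Python rather than WPS or SAS code, and it is not the benchmark program mentioned above. The function names, the test list and the placeholder workloads are all hypothetical, and looking for an `nvidia-smi` executable on the path is only an assumed hint that a CUDA-capable card is installed, not a definitive check.

```python
import os
import shutil
import time


def detect_capabilities():
    """Collect the facts the benchmark needs before choosing which tests to run."""
    return {
        "logical_cpus": os.cpu_count() or 1,
        # Assumption: nvidia-smi on the PATH hints at a CUDA-capable card.
        "cuda": shutil.which("nvidia-smi") is not None,
    }


def run_benchmarks(tests, capabilities):
    """Time each test whose requirements are met; record None for skipped tests."""
    results = {}
    for name, requires, func in tests:
        if all(capabilities.get(key) for key in requires):
            start = time.perf_counter()
            func()
            results[name] = time.perf_counter() - start
        else:
            results[name] = None  # requirement missing, so the test is skipped
    return results


def sort_test():
    # Placeholder CPU workload: sort a reversed range of integers.
    sorted(range(200_000, 0, -1))


def gpu_test():
    # Placeholder only; a real test would hand work to the graphics card.
    pass


# (test name, list of required capabilities, function to time)
TESTS = [
    ("sort", [], sort_test),
    ("gpu_matrix", ["cuda"], gpu_test),
]

capabilities = detect_capabilities()
print("logical CPUs:", capabilities["logical_cpus"])
for name, seconds in run_benchmarks(TESTS, capabilities).items():
    print(name, "skipped" if seconds is None else f"{seconds:.4f}s")
```

Recording a skipped test explicitly, rather than dropping it, keeps the report the same shape on every machine, which makes results from different workstations easier to line up.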

Price and Performance Metrics

One aspect of Ars that I really like is that when they do their benchmarks, they calculate a cost comparison for each build. They often base this on hardware pricing at the time of the benchmark. What they don’t do is price in the cost of the software for such things as video editing. I think it’s important to show the cost of both hardware and software as part of a price and performance metric.
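To make the point concrete, here is a minimal sketch in Python of a cost-per-benchmark-point figure with and without software included. The function name `cost_per_point`, the two builds and every dollar amount and score are invented purely for illustration; they are not real prices or results.

```python
def cost_per_point(hardware_cost, software_cost, score):
    """Dollars spent per benchmark point; lower is better."""
    if score <= 0:
        raise ValueError("score must be positive")
    return (hardware_cost + software_cost) / score


# Hypothetical builds: all figures are made up for the example.
builds = {
    "build A": {"hardware": 1500.0, "software": 500.0, "score": 100.0},
    "build B": {"hardware": 3000.0, "software": 500.0, "score": 140.0},
}

for name, b in builds.items():
    hardware_only = cost_per_point(b["hardware"], 0.0, b["score"])
    with_software = cost_per_point(b["hardware"], b["software"], b["score"])
    print(f"{name}: ${hardware_only:.2f}/point hardware only, "
          f"${with_software:.2f}/point with software")
```

Because the software cost is the same on both builds, it weighs more heavily on the cheaper machine, which is exactly the kind of effect a hardware-only comparison hides.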

Moving Forward

I’m going to take some time and modify the WPS Workstation Benchmark Program that I wrote so that it doesn’t spew out so much unnecessary output into the listing window. I would like it to just show the output from the benchmark report. I think it would also be prudent to see if some R code could be included in the benchmark and compare and contrast the performance if there are some CUDA cores available for assisting in the computations.

About the author: Phil Rack is President of MineQuest Business Analytics, LLC, located in Grand Rapids, Michigan. Phil has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is an authorized reseller of WPS in North America.