Category Archives: Threaded Processing

Thursday Ramblings

Does anyone do comparisons of graphics cards and measure performance in a VM? Specifically, do certain graphics cards boost performance when running VMs on the desktop? I like to see my windows “snap” open when I switch from VM to VM. As a developer, I often wonder if spending an additional $150 on a popular graphics card will yield a perceptible performance boost.

Speaking of graphics cards, we recently bought a couple of used Nvidia Quadro graphics cards from a local CAD/CAM company that is upgrading its workstations. I got these at about 5% of their original retail price, so I’m happy. We were having problems getting a couple of servers to go into sleep mode using Lights Out, and we discovered that we needed a different graphics card to accomplish this. The plus side is that these are Nvidia cards with 240 CUDA cores and 4GB of RAM, so we now have the opportunity to try our hand at CUDA development if we want. I’m mostly interested in using CUDA for R.

One drawback to using CUDA, as I understand it, is that it is effectively single-user. If you have a CUDA GPU in a server, only one job at a time can access the CUDA cores. Say you have 240 CUDA cores on your GPU and would like to allocate 80 cores to each application, thinking you can run three of your apps at a time. That is not possible. What it seems you have to do is install three graphics cards in the box so that each user or job has access to its own card.
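As a rough sketch of the one-card-per-job arrangement described above (taking that description at face value), the Python below pairs each job with its own GPU index and builds an environment that exposes only that card to a child process through the `CUDA_VISIBLE_DEVICES` environment variable. The job names and the helper functions are hypothetical and only for illustration.

```python
import os


def assign_gpus(job_names, gpu_count):
    """Give each job its own GPU index; jobs beyond gpu_count have to wait."""
    assigned = {name: idx for idx, name in enumerate(job_names[:gpu_count])}
    waiting = list(job_names[gpu_count:])
    return assigned, waiting


def env_for_gpu(gpu_index):
    """Copy the current environment and expose a single GPU to a child process."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


# Hypothetical job names: four jobs competing for three cards.
assigned, waiting = assign_gpus(["job_a", "job_b", "job_c", "job_d"], gpu_count=3)
print(assigned)
print(waiting)
```

The dictionary returned by `env_for_gpu` could be handed to whatever launches the job, for example as the `env` argument of `subprocess.Popen`.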

There’s a new Remote Desktop application coming out from MS that will run on your Android device(s), as well as a new release from the Apple Store. I use the RDC from my Mac mini and it works great. I’m not sure what they could throw in the app to make it more compelling, however.

Tom’s Hardware has a fascinating article on SSDs and performance in a RAID setup. On our workstations and servers, we have SSDs acting as a cache for the work and perm folders on our drive arrays. According to the article, RAID0 performance tends to top out with three SSDs for writes and around four for reads.

FancyCache from Romex Software has become PrimoCache. It has at least one new feature that I would like to test, and that is L2 caching using an SSD. PrimoCache is in beta, so if you have the memory and hardware, it might be advantageous to give it a spin to see how it could improve your BI stack. We did a performance review of FancyCache in a series of posts on Analytic Workstations.

FYI, PrimoCache is not the only caching software available that can be used in a WPS environment. SuperSpeed has a product called SuperCache Express 5 for Desktop Systems. I’m unsure if SuperCache can utilize an SSD as a Level 2 cache. It is decently priced at $80 for a desktop version but $450 for a standard Windows Server version. I have to admit, $450 for a utility would give me cause for pause. For that kind of money, the results would have to be pretty spectacular. SuperSpeed offers a free evaluation as well.

If you are running a Linux box and want to enjoy the benefits of SSD caching, there’s a great blog article on how to do this for Ubuntu from Kyle Manna. I’m very intrigued by this and if I find some extra time, may give it the old Solid State Spin. There’s also this announcement about the Linux 3.10 Kernel and BCache that may make life a whole lot easier.

About the author: Phil Rack is President of MineQuest Business Analytics, LLC located in Grand Rapids, Michigan. Phil has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is an authorized reseller of WPS in North America.

Services for WPS Conversions and Evaluations

It’s been a while since we’ve reviewed and re-jiggered the services that we offer and the ones we do offer need a bit of a makeover. Although our existing services are still pretty pertinent, we are looking at expanding and rounding out our consulting services in the area of SAS to WPS conversions. In the next few weeks we’ll be modifying our website to reflect these changes as well as creating some marketing brochures that can be downloaded and shared.

Many organizations are looking to significantly reduce software cost over the next two to three years. They don’t want to necessarily change their current architecture and most want to continue using existing source code whenever and wherever possible. Based on those premises, we’ll soon be putting together a services portfolio that encompasses the following practices.

Work with IT or an organization’s Analytics Department in providing a WPS Proof of Concept.

  1. Evaluate Price/Performance
  2. Define the requirements for an analytical and/or reporting replacement.

Assist in the evaluation of WPS Software as a replacement to existing SAS products.

Perform detailed Code Evaluation on existing SAS user and production SAS code libraries to evaluate compatibility with WPS and provide or recommend workarounds as necessary.

Recommend hardware and specific configurations for a WPS Server Installation.

Provide SMP libraries for Symmetrical Multi-Processor Hardware.

Install and test The Bridge to R.

Provide guidance to companies who are Data Service Providers on how best to reduce their exposure to SAS DSP fees.

Provide Consulting to departments and users who are focused on particular projects, i.e.

  1. For re-architecting their systems.
  2. For jump start/quick start scenarios.

Although these last two may seem a little “out there” for many people, you would be surprised to find out how often a company acquires another organization and inherits a system that requires immediate attention in terms of licensing, cost reduction or consulting assistance to move the system to a new platform. It’s also not rare for a company to need to move its source code off of SAS immediately due to DSP issues, escalating server costs or license problems. In these situations, MineQuest and World Programming can be of immense help.

Multi-Processing WPS and SAS

In an article appearing on the Computerworld website, “Desktop multi-processing: Not so fast,” the author, Lamont Wood, discusses some issues surrounding desktop applications and their ability to use all the cores and Hyper-Threads found on most modern desktop CPUs. It’s an interesting read, and it discusses problems that have been found and are being worked on for typical desktop users. But WPS and SAS programmers are not typical users (at least in my opinion), and I think the type of work and processing they tend to do can often be threaded, especially when it comes to moving systems into production.

So how do Wood’s assertions affect WPS and SAS programmers? Developers who use the SAS language tend to fall into two camps. There are those who do a lot of ETL, reporting and database work, and then there are those who live in the statistics world. Both camps can make use of parallel processing to decrease total run time. In my world, I’m after decreasing total elapsed run time and am not so concerned with CPU time.

So exactly how does a WPS or SAS developer run code that is parallelized? A SAS user has a few options, and they vary depending on what products the user has licensed. For example, some SAS/STAT PROCs implement parallel processing, as do some Base SAS PROCs. Here’s a list I found on the SAS support website of PROCs that make use of SMP (symmetrical multi-processing) hardware. If you are a SAS user and have SAS/Connect, you can use that expensive product to run portions of your code in parallel to decrease your execution time.

If you are a WPS developer, you can use MPExec (Multi-Processor Execute) to run portions of your WPS code simultaneously and recombine the log files, list files and datasets that were spawned from each thread. MPExec is very easy to use and I won’t go into it again other than to say you only have to learn three new keywords to implement it in your code.

At one level, MPExec is similar to SAS/Connect in that you can spawn multiple WPS sessions where each session runs a thread. You, as the developer, are responsible for eyeballing your WPS code to decide which sections will run in an individual thread. At another level, MPExec doesn’t carry the baggage or the expense of SAS/Connect, so it’s easier to use and to justify.

Running MPExec or any other methodology that allows for multi-processing of your SAS and WPS programs often boils down to just two factors: first, whether you have sufficient cores to execute the threads that you want to run, and second, whether you have sufficient I/O to read and write the data sets for all the threads you are executing. I find that I/O is typically what hinders WPS and SAS multi-threaded programs the most. But if you have sufficient hardware, you can get some awesome gains out of your tried and true production code using MPExec.
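To make those two factors concrete, here is a minimal Python sketch that caps the number of concurrent segments by the core count and by an I/O limit. The function name and the `io_streams` parameter are assumptions made for illustration; nothing here is part of MPExec, and how many simultaneous read/write streams your storage can sustain is something you would have to measure yourself.

```python
import os


def pick_thread_count(segments, cores=None, io_streams=None):
    """Cap concurrent segments by available cores and, if known, by I/O capacity."""
    cores = cores or os.cpu_count() or 1
    limits = [segments, cores]
    if io_streams is not None:
        # io_streams: assumed number of simultaneous data set reads/writes
        # the disks can handle before they become the bottleneck.
        limits.append(io_streams)
    return max(1, min(limits))


print(pick_thread_count(8, cores=4))                # limited by cores
print(pick_thread_count(8, cores=4, io_streams=2))  # limited by I/O
```

Treat the cap as a starting point rather than a rule; oversubscribing the cores can still pay off when some threads spend much of their time waiting on disk.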

And the best part of MPExec? MPExec is free if you license WPS for the Windows desktop or Windows Server from MineQuest.

MPExec Documentation Preview

We have the first draft of the documentation for MPExec. MPExec is a software add-in that allows for multi-threading WPS applications. I’ll place the document out on the MineQuest website for anyone interested in taking a look at it. Be forewarned, it is a first draft so expect some grammar and spelling issues.

The documentation is pretty short, only five pages long, so you can easily read and understand what is required to run MPExec. Adding threading to a WPS program using MPExec is easy to do. Simply add a few statements to existing code segments that are good candidates for concurrent processing and you can dramatically reduce your application’s execution time.

Right now, we are planning on including it as part of the MineQuest MacroLib. You can get the MacroLib included for free if you choose to license WPS (either Windows Desktops or Windows Servers) from MineQuest directly. The MacroLib includes some other useful applications such as XMLRead and XMLWrite, and the Bridge to R for WPS. We also plan on supporting MPExec on the Unix/Linux platforms in the future and we’ll announce that availability when we are thoroughly done testing.

If you’ve licensed WPS from WPC directly or some other reseller, we can offer an annual license for the MacroLib so you have access to all the goodies as if you had purchased directly from us. Right now, the pricing for the MacroLib is $119 for the desktop version. The pricing for the Windows Server version is set at 15% of the annual WPS license fee.

The documentation for MPExec can be found at:

MPExec requires WPS release 2.3.5. By the end of June, we expect to be able to begin fulfilling phone orders. Make sure you check our website for an announcement!

MPExec – Everything but the Documentation

I finally finished the programming for MPExec and have the runtime notes for the log looking just the way I want. It’s a nice little system to run concurrent WPS processes on a Windows workstation or server and get full utilization of your processor cores. I have a log file that you can view where we are running eight processes on four cores. You will want to look at the last thirty-five or forty lines in the log to see the summary of the multi-threading for all eight processes. You can view the log at:

Over the next few days, I’ll be writing the documentation for MPExec. Installation is pretty easy and the only configuration required is to write a few lines to your file. Any code changes to your existing programs will be pretty simple. There are only four new commands, so you can multi-thread your WPS program without much work.

Currently, MPExec requires WPS version 2.3.5 which is the latest release. We use some new functionality in this release to get our code to execute the way we want and not have to use an external DLL or .EXE file. Hence the requirement for the latest release of WPS.

We also intend to get MPExec running on the Unix and Linux platforms. We will announce that availability at a later date.

Finally, we intend to include MPExec as part of the macrolib that we include when you license WPS from MineQuest. The new macrolib will include the Bridge to R for WPS, XMLRead and XMLWrite as well as MPExec. If you have licensed WPS through another reseller or through WPC directly, we can sell you a license for the MineQuest macrolib separately. Contact us directly for pricing.


Implementing Threaded Processes in WPS

I’ve been working the last few days on creating code that will allow a WPS user to run parts of a WPS program in parallel. I’ve made a lot of progress and it seems to be working fine. It needs a lot of cleaning up and some tuning yet, but I’m fairly satisfied with the progress.

The good news is that it’s fairly easy to implement in your existing code. Simply adding a few lines of code before and after each segment will spawn a new thread that is executed separately. Once all the threads have finished executing, we read in the LOG and LST files so the output appears in the host program’s log and list files.

There are a couple of nifty attributes that are worth mentioning. First, you can pass a macro variable’s value to the spawned program. Secondly, you have access to all the work files that were created at the end of all the threads from your Host program. This way, you can use the temp data sets in future steps and PROCS.
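The general pattern (spawn separate processes, hand each one a value, wait for all of them, then pull their logs back into the host) can be sketched in a few lines of Python. This is only an illustration of the idea and not how MPExec itself is implemented; the child program, segment names and log format below are all made up, with a tiny Python one-liner standing in for a spawned WPS session.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for a spawned program: writes one log line using the value passed in.
CHILD_CODE = (
    "import sys; from pathlib import Path; "
    "name, value, log = sys.argv[1:4]; "
    "Path(log).write_text(f'{name}: received value {value}\\n')"
)


def run_segments(names, value):
    """Run one child process per segment, then gather their logs in order."""
    with tempfile.TemporaryDirectory() as tmp:
        logs = [Path(tmp) / f"{name}.log" for name in names]
        procs = [
            subprocess.Popen(
                [sys.executable, "-c", CHILD_CODE, name, str(value), str(log)]
            )
            for name, log in zip(names, logs)
        ]
        for proc in procs:
            proc.wait()  # the host only continues once every child is done
        return "".join(log.read_text() for log in logs)


host_log = run_segments(["seg1", "seg2"], 2013)
print(host_log, end="")
```

Passing the value on the command line plays the role of the macro variable here, and reading the per-segment log files after the last `wait()` mirrors the step where the output is folded back into the host program’s log.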

Performance seems pretty decent on my workstation, which is a quad-core with 8GB of RAM. Once I get the code cleaned up and documented, I’ll run some benchmarks on a server. On my workstation, however, running four tasks with 10 million records each in parallel took 42 seconds. Running these sequentially took 1:30 (mm:ss). When I upped that to five tasks with 10 million records and a different mix of PROCS and data steps, sequential processing took 2:37 (mm:ss) and the parallel run took 1:23. So you can see, there are some real potential performance increases available.
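For anyone who wants those timings as ratios, this small Python snippet converts the mm:ss figures above to seconds and divides sequential time by parallel time. The helper names are just for illustration.

```python
def to_seconds(mmss):
    """Convert an 'm:ss' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def speedup(sequential, parallel):
    """Ratio of sequential elapsed time to parallel elapsed time."""
    return to_seconds(sequential) / to_seconds(parallel)


four_tasks = speedup("1:30", "0:42")  # 90 s versus 42 s
five_tasks = speedup("2:37", "1:23")  # 157 s versus 83 s
print(round(four_tasks, 2), round(five_tasks, 2))  # prints 2.14 1.89
```

So the four-task run finished a little more than twice as fast in parallel, and the five-task run a little less than twice as fast.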

It will be interesting to see how well this can scale in terms of number of threads. I’m seeing that the larger the data set to be processed, the better the performance. I suppose this is reasonable when you consider that each thread must fire up another instance of WPS so that smaller threads are handicapped by the start-up and initialization of the system.
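A very simple model shows why start-up cost hurts small threads more. The Python sketch below assumes the work splits evenly across threads and that every spawned session pays the same fixed start-up cost; the numbers in the example are invented, not measurements from WPS.

```python
def modeled_speedup(work_seconds, startup_seconds, threads):
    """Sequential time divided by modeled parallel time.

    Assumes the work divides evenly and each spawned session pays
    startup_seconds before it does anything useful.
    """
    parallel = startup_seconds + work_seconds / threads
    return work_seconds / parallel


# Hypothetical figures: a 2 second start-up cost and four threads.
small_job = modeled_speedup(work_seconds=8, startup_seconds=2, threads=4)
large_job = modeled_speedup(work_seconds=80, startup_seconds=2, threads=4)
print(small_job, round(large_job, 2))
```

With the same start-up cost, the larger job gets much closer to the ideal four-fold gain, which is the pattern described above.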

There are some ugly issues, though. Log analyzers are going to be confused by duplicate line numbers, since each threaded process will have line numbers that also appear in other threaded processes. Making the log appear in such a way that it’s not just a mass of text that is hard to follow also needs to be addressed.

But these are things that are doable and I’ll provide some more information as time permits.
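One possible way to tackle the duplicate line numbers, sketched in Python: prefix every log line with the label of the thread it came from and wrap each thread’s section in begin and end markers. The labels, markers and sample log lines are invented for illustration and are not what a real WPS log or MPExec produces.

```python
def tag_log(thread_label, log_text):
    """Prefix each line with its thread label so repeated line numbers stay distinct."""
    return [f"[{thread_label}] {line}" for line in log_text.splitlines()]


def merge_logs(logs_by_thread):
    """Combine per-thread logs into one list of lines with clear section markers."""
    merged = []
    for label, text in logs_by_thread.items():
        merged.append(f"===== begin {label} =====")
        merged.extend(tag_log(label, text))
        merged.append(f"===== end {label} =====")
    return merged


# Made-up sample logs in which both threads reuse line numbers 1 and 2.
merged = merge_logs({
    "thread1": "1 data one;\n2 run;",
    "thread2": "1 data two;\n2 run;",
})
print("\n".join(merged))
```

A log analyzer could then key on the label plus the line number instead of the line number alone.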