Category Archives: MPExec

Introducing WPS Version 3

It’s a new year and version 3 of WPS is out and available. There are a lot of new items in this release and I think it has been well worth the wait. I personally feel that this is the blockbuster release that will turn heads in the corporate environment because of the many added features. With version 3, World Programming has made the software friendlier and easier to use, added new procedure support both statistically and in the core language, and added support for more language elements.

I’ve been beta testing WPS 3 for a few months now and so I’ve had a bit of time to try the software out and evaluate it extensively. Below is a list of items and features that really stood out to me in WPS V3. It’s far from being exhaustive but these are probably the most pertinent for many WPS users and SAS customers who are considering WPS as an alternative to SAS.

Workbench Support on Additional Platforms

The Eclipse workbench is available on Unix, Linux, Windows, AIX, Solaris and OS X platforms. The GUI that you’ve been using on Windows is now available on all the platforms except for z/OS. Amazingly, the Eclipse Workbench is identical (and I mean identical) across the platforms. Now when companies and organizations add servers to their data center, the users will have the same experience (read: no learning curve) on all the platforms.

Apple OS X Support

WPS has been available on Apple’s OS X for a while, but now it has the Eclipse Workbench as the GUI. The WPS implementation on OS X is native, meaning you don’t have to run Parallels or Boot Camp to run WPS on your Mac. WPS is a true 64-bit implementation on OS X. The technical requirements for the Mac OS X implementation are an x86 chip and version 10.5 (Leopard) or higher. WPS V3 works fine on Lion as you can see in the screen shot below.

About a month or two ago I bought a Mac Mini just to test WPS and to port the Bridge to R over to it. The Mac implementation is a bit prettier than on Windows or Linux in my opinion. It may be the font, but it does look nice as you can see in the image above. In testing WPS on OS X, I was amazed how easy it was to transfer my knowledge of the WPS implementation on Windows over to OS X.

There are a lot of folks who love their Macs, and WPS running natively gives the Mac faithful a SAS language compatible system that takes full advantage of their hardware. Mac users give up nothing over the Windows or Linux versions of WPS, and desktop pricing for the Mac is the same as for the other desktop platforms. I’m excited for university students, where there is a preponderance of Macs, and I can see WPS being taken up there because of the hardware and operating system support.

Linux Support

As mentioned earlier, the workbench is also available on Linux. I have it running on both Ubuntu 10.04 LTS and Fedora 13. As you can see in the screen shots below, WPS looks the same on each of the platforms. One thing to note about the Linux version: it is only available as a server license. But for shops that have a Linux platform, WPS gives end users the same GUI experience as the Windows version.

Many shops run Red Hat as their preferred version of Linux and WPS does run just dandy on that platform. I find that many of the larger customers we have do use Red Hat, but I suspect that the smaller businesses use Ubuntu or Fedora instead.

AIX and Solaris

The workbench is also available on these platforms. I don’t have a license or even a machine to run these two implementations on, but I would suspect that the experience is identical to the Apple and Linux versions discussed above.

WPS Link

This is an exciting feature that is being introduced in WPS 3.0. WPS Link is a Client/Server architecture that allows you to remotely execute code on Linux or Unix servers. So for example, you can be running WPS on your Windows or Mac OS X desktop and submit your WPS code over to a WPS Linux server. The log and listing files come back from the remote machine to your desktop. WPS Link is different from SAS/Connect in that this is just a client to execute your programs on the remote box. You can’t upload and download files from the desktop to the server. But, if a lot of your processing takes place on the server and that’s where your data sets reside, this is a nifty solution for processing data on a remote server.

WPS Link is easy to set up from your WPS Desktop. Simply go to the Server Connection tab and select Local. Right mouse click on Local and you get a drop down menu. Select New || Server Connection || Remote SSH.

From that point you get a dialog box where you simply fill out the hostname (or IP address), the Connection Name (how about “WPS Remote Server”), your user name and the Launch Command. The Launch Command was the most difficult part for me. It’s simply the full path to the wpslinks module on the server. On my server it is:

/home/minequest/wps-3.0.0.0.19782-private/bin/wpslinks

Make sure you have SSH running on your Linux server and you will be able to remotely start WPS on the server from your desktop!

Submitting code from the desktop to the WPS Server is relatively easy. Simply go to the Submit Icon and you will find a downward facing arrow next to it. You can select whether you want to run the code Locally or on the Server. One of the cool innovations with WPS Link is that you can have multiple servers registered and choose which WPS server you would like to run your code on. Note the options in the drop down menu shown below.

WPS Link comes standard as part of your WPS purchase. So if you license WPS on the desktop, WPS Link will be available to you as part of your desktop license. Also be aware that in this first release, WPS Link requires WPS running on an AIX, Solaris or Linux server, or on Linux on Mainframe System/z. I suspect that you will see Windows Servers being supported in a later release. But for our customers running large volumes of data on Linux or Unix, this feature will be very welcome.

One other note of importance on WPS Link: you can connect from a desktop to a server and from server to server. You cannot link from desktop to desktop.

Procedure Support

In version 3, World Programming has added support for the following statistical procedures.

  • TTEST
  • GLM
  • GLMMOD
  • STDIZE
  • PRINCOMP
  • DISTANCE
  • FACTOR

Database Engine Support

One of the things that SAS does that I really don’t like is the up-charge for database engines. With WPS, you get the database engines as part of the WPS license. You don’t buy these separately. With version 3 of WPS, there is now support for an XML Libname engine and Sybase databases.

If you’re a shop that has a few different databases from different vendors, it would be financially wise to take a look at WPS just for the cost reduction with the included access engines.

Other Enhancements

WPS has now implemented multi-threaded support for PROC MEANS and PROC SUMMARY. It’s always fun to look at the execution time for CPU and Real Time when dealing with multi-threaded procedures. From some of the WPS documentation you can see that the multi-threaded support that is used in MEANS and SUMMARY has made it into procedures that make use of this code such as PROC TTEST and I assume PROC CORR.

Also, for those of you who were unaware, PROC SORT is also multi-threaded and is very fast. If you do a fair amount of sorting in your environment, it would be beneficial to be running on a 64-bit platform and load up with memory. The more memory you have available to WPS, the more data it can store in RAM when sorting and this results in faster sort times.

There is now an import and export wizard for importing files into WPS and writing out WPS data sets to text files. This is pretty big in my opinion. I could never remember the PROC IMPORT statements even though I use them often. Probably everyone uses a code template for importing and exporting but this just makes life easier.
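For anyone who still wants the code template, here is a minimal sketch of the kind of statements the wizard saves you from remembering. It uses the usual SAS language form of PROC IMPORT and PROC EXPORT; the file paths and the data set name are made up for illustration, and I’m assuming a comma-delimited file with the column names in the first row.

```sas
/* Read a CSV file into a data set (hypothetical path and names). */
proc import datafile="c:\wpstestdata\sales.csv"
            out=work.sales
            dbms=csv
            replace;
  getnames=yes;   /* first row holds the column names */
run;

/* Write the data set back out as a CSV file. */
proc export data=work.sales
            outfile="c:\wpstestdata\sales_out.csv"
            dbms=csv
            replace;
run;
```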

The data set viewer has been enhanced over the previous release. You can view the variable labels instead of just the variable names. You can also show and hide variable columns.

Also supported for the first time are the SYSTASK and WAITFOR commands. I’ve been waiting for these two language statements to be supported for a while. As part of the Bridge to R on Windows there is a module called MPExec. MPExec allows you to run multiple programs in parallel. I’ve had to resort to other methods in lieu of SYSTASK and WAITFOR to get this to work. Now that these two commands are available, we will update MPExec to use them instead, which will make MPExec portable to other platforms.
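As a rough illustration of why these two statements matter for running programs in parallel, here is a sketch in the standard SAS language form of SYSTASK and WAITFOR. This is not the MPExec code. The program paths and the way wps.exe is invoked on the command line are assumptions for the example, and I have not checked which SYSTASK options WPS V3 accepts.

```sas
/* Start two independent programs without waiting for either one.   */
/* The wps.exe arguments and the .sas paths are hypothetical.       */
systask command "wps.exe c:\jobs\job_a.sas" taskname=job_a status=rc_a nowait;
systask command "wps.exe c:\jobs\job_b.sas" taskname=job_b status=rc_b nowait;

/* Block here until both named tasks have completed. */
waitfor _all_ job_a job_b;

/* STATUS= stores each task's return code in a macro variable. */
%put NOTE: job_a rc=&rc_a job_b rc=&rc_b;
```

The point of the pattern is that the calling program only pauses at WAITFOR, so the two jobs overlap instead of running one after the other.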

Performance

Performance has improved in each release of WPS and WPS V3 is no exception. With multi-threaded support for more procedures and better data handling, WPS is a very viable candidate to replace SAS at many organizations. MineQuest has many customers who run large data sets (millions, tens of millions and hundreds of millions of records) through the WPS System and performance is more than adequate. These companies are able to save loads of money over the competing software system.

Expandability

With affordable pricing, customers can use WPS to perform reporting and analytics in areas and for departments where it just wasn’t justifiable before. Using WPS in lieu of SAS allows many organizations to expand the analytics platform and run more data through the system. This is because such advantageous pricing allows them to purchase an additional server at great savings over our competitors.

Data Service Providers

We love Data Service Providers! Seriously, we find that DSPs are some of our best customers. With WPS V3, licensing terms stay pretty much the same. World Programming LTD does not have DSP fees and you are free to use WPS to service your customers by providing them with data sets, reports and analytics. For any organization that is paying DSP fees, we can dramatically reduce your license fees. Your customer is your customer.

Caveats

Before you download the latest release of WPS, there is one caveat to keep in mind for upgrading. If you are using batch command files to run your jobs, you need to modify the name of the WPS executable in your .bat or .cmd file. The program name you want to use is WPS.EXE (as opposed to the former name which was WPSI.EXE).

Also in WPS V3 the WPD dataset has changed and is faster and more robust. You can still read WPD V2 datasets but V3 datasets are the default in this release.

Finally, you will need a new license key to be able to install V3. Don’t expect to simply download it and run it using the V2 key as you have been able to do previously. If you are a MineQuest customer, contact us if you want to upgrade to V3 and we will facilitate getting you a license key.

Evaluations

MineQuest is offering free 30 day evaluations of WPS V3. You can contact us at (614) 457-3714 to request an evaluation or by email. If you prefer to request your evaluation by email, send a request to info@minequest.com.

About the author: Phil Rack is President of MineQuest, LLC. and has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is a reseller of WPS in North America.

Services for WPS Conversions and Evaluations

It’s been a while since we’ve reviewed and re-jiggered the services that we offer and the ones we do offer need a bit of a makeover. Although our existing services are still pretty pertinent, we are looking at expanding and rounding out our consulting services in the area of SAS to WPS conversions. In the next few weeks we’ll be modifying our website to reflect these changes as well as creating some marketing brochures that can be downloaded and shared.

Many organizations are looking to significantly reduce software cost over the next two to three years. They don’t want to necessarily change their current architecture and most want to continue using existing source code whenever and wherever possible. Based on those premises, we’ll soon be putting together a services portfolio that encompasses the following practices.

Work with IT or an organization’s analytics department in providing a WPS Proof of Concept.

  1. Evaluate Price/Performance
  2. Define the requirements for an analytical and/or reporting replacement.

Assist in the evaluation of WPS Software as a replacement to existing SAS products.

Perform detailed Code Evaluation on existing SAS user and production SAS code libraries to evaluate compatibility with WPS and provide or recommend workarounds as necessary.

Recommend hardware and specific configurations for a WPS Server Installation.

Provide SMP libraries for Symmetrical Multi-Processor Hardware.

Install and test The Bridge to R.

Provide guidance to companies who are Data Service Providers on how best to reduce their exposure to SAS DSP fees.

Provide Consulting to departments and users who are focused on particular projects, i.e.

  1. For re-architecting their systems.
  2. For jump start/quick start scenarios.

Although these last two may seem a little “out there” for many people, you would be surprised to find out how common it is that a company acquires another organization and inherits a system that requires immediate attention in terms of licensing, cost reduction or consulting assistance to move the system to a new platform. It’s also not a rare situation where a company needs to immediately move their source code off of SAS due to DSP issues, escalating server costs or license problems. In these situations, MineQuest and World Programming can be of immense help.

Looking Down the Road

We’ve had the Bridge to R out for about a week now and so far the feedback has been very positive. The Bridge is pretty stable and I think fairly easy to use after you set up R and install the Bridge itself.

Which brings us to what we may want to include in the next release. What we’re thinking about is including MPExec, and a nifty little macro called WPS2XML which allows you to create an XML file as well as generate a schema (DTD or XSD) from your WPS data set. The code for both utilities is already written and just requires testing in a more stressful environment.

So, what is MPExec? It’s a very robust utility that we wrote for WPS customers last year. MPExec stands for Multi-Processor Execution and it allows the WPS user to thread their programs so that they can run multiple parts of the program at the same time. On a multi-core desktop or server, one can dramatically reduce a program’s execution time, depending on how well the program can be threaded.

Most programmers (and SAS programmers especially) think in a top down fashion when designing their programs. For example, you may have multiple steps to extract and clean data from a database, another set of steps that access data in a transport file that was created on the mainframe and also needs to be cleaned and sorted, and finally, some historical data that is sitting in another database on another server.

None of the three steps outlined above share any data, so nothing forces them to run sequentially. Why not run them all at the same time on your multi-core desktop or server and save time? That’s exactly what MPExec allows you to do.
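Here is a rough sketch of how those three pieces might be laid out as MPExec threads. The macro calls (%MPExec, %startThread, %stopThread, %WaitForThreads) and the ;;;; line before %stopThread are copied from the benchmark listing elsewhere in this archive. Everything else is a hypothetical placeholder: the librefs, folders, data set and variable names are invented, and plain folder librefs stand in for the database and transport file connections a real job would use.

```sas
%MPExec;

/* Thread 1: extract and clean the database data (placeholder libref). */
%startThread(Db_Extract);
    libname dbdata 'c:\jobs\dbdata\';
    libname stage  'c:\jobs\stage\';
    data stage.db_clean;
      set dbdata.orders;
      if not missing(amount);
    run;
;;;;
%stopThread;

/* Thread 2: clean and sort the data that came from the mainframe. */
%startThread(Mf_Load);
    libname mfdata 'c:\jobs\mainframe\';
    libname stage  'c:\jobs\stage\';
    data stage.mf_clean;
      set mfdata.accounts;
      if not missing(acct_id);
    run;
    proc sort data=stage.mf_clean; by acct_id; run;
;;;;
%stopThread;

/* Thread 3: pull the historical data from the other server. */
%startThread(Hist_Load);
    libname hist  'c:\jobs\history\';
    libname stage 'c:\jobs\stage\';
    data stage.hist;
      set hist.prior_years;
    run;
;;;;
%stopThread;

/* Hold the main program until all three threads are done. */
%WaitForThreads;
```

Each thread writes to a permanent library in this sketch, on the assumption that the threads do not share a WORK library. Any steps that combine the three results would go after %WaitForThreads.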

I’m interested in hearing what other users may have interest in when it comes to utilities to enhance and expand their use of WPS. Currently, the Bridge to R and the accompanying utilities are only available on Windows platforms. Is there a need or interest to have them also execute on Linux/Unix/Solaris? Of course, we’re always interested in hearing ideas about how we can expand our utilities to include R in a more seamless fashion as well.

MPExec Documentation Preview

We have the first draft of the documentation for MPExec. MPExec is a software add-in that allows for multi-threading WPS applications. I’ll place the document out on the MineQuest website for anyone interested in taking a look at it. Be forewarned, it is a first draft so expect some grammar and spelling issues.

The documentation is pretty short, only five pages long, so you can easily read and understand what is required to run MPExec. Adding threading to a WPS program using MPExec is easy to do. Simply add a few statements to existing code segments that are good candidates for concurrent processing and you can dramatically reduce your application’s execution time.

Right now, we are planning on including it as part of the MineQuest MacroLib. You can get the MacroLib included for free if you choose to license WPS (either Windows Desktops or Windows Servers) from MineQuest directly. The MacroLib includes some other useful applications such as XMLRead and XMLWrite, and the Bridge to R for WPS. We also plan on supporting MPExec on the Unix/Linux platforms in the future and we’ll announce that availability when we are thoroughly done testing.

If you’ve licensed WPS from WPC directly or some other reseller, we can offer an annual license for the MacroLib so you have access to all the goodies as if you had purchased directly from us. Right now, the pricing for the MacroLib is $119 for the desktop version. The pricing for the Windows Server version is set at 15% of the annual WPS license fee.

The documentation for MPExec can be found at: http://www.minequest.com/Misc/mpexec_users_guide.pdf

MPExec requires WPS release 2.3.5. By the end of June, we expect to be able to begin fulfilling phone orders. Make sure you check our website for an announcement!

Benchmarks WPS in a Threaded Environment

I finally completed the writing of MPExec over the weekend. MPExec is the name of the macros that create an environment where you can run multiple threads or processes at the same time. All in all, I have maybe 40 hours into the project and I’m quite satisfied with the results. I’m sure it would have normally taken me more than 40 hours but I was able to piggy-back on existing code that I had written for the Bridge to R.

I decided to make MPExec as portable as possible. Thus, I didn’t use any of the outside exec files that I used in the Bridge to R to do such things as check for file existence and spawn new threads. Btw, FileExist is a new function in WPS 2.3.5 and works great! The other thing that I see in version 2.3.5 is that I can shut off log messages/notes/source and not have the log fill up with blank lines as in the earlier releases. This is fantastic and allows my logs to look professional without lots of white space.

I don’t think I can get the code to execute any faster than it does on my development machine when executing multiple threads. So let’s take a look at some benchmarks. Below is a table that shows how well a Quad-Core PC can execute multiple WPS threads. All times are in minutes and seconds.

 

Threads   1,000,000 Records   2,000,000 Records   5,000,000 Records
          Par / Seq           Par / Seq           Par / Seq
2         0:18 / 0:22         0:18 / 0:28         0:25 / 0:41
4         0:19 / 0:44         0:20 / 0:54         0:42 / 1:28
6         0:22 / 1:06         0:30 / 1:20         0:58 / 2:01
8         0:28 / 1:44         0:34 / 1:56         2:16 / 2:49

A brief explanation of the table is in order. I ran each test three times, took the average time of the three runs and rounded up. I developed the test programs so that one thread was creating the data and then performing a SORT, MEANS and FREQ on the data. The other thread that executed was always a UNIVARIATE and a CORR using a permanent data set with 600,000 records. I kept this balance of creating temp data sets and using permanent data sets when I had more than two threads. So, when running four threads, I had two CORR and UNIVARIATE jobs running and two SORT, MEANS and FREQ jobs running. With six threads, I had three of each and with eight threads, I had four of each running. For an example of the code, see the bottom of this post.

I ran the test times sequentially as well. This gives us the time that it would take to run the programs without threading. Comparing the Parallel times with the sequential times, we can get an idea of how much faster we can run our code using threading.
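To put a number on that comparison, the table can be turned into speedup ratios, meaning sequential time divided by parallel time. The sketch below is just arithmetic on the published table. I converted the times to seconds by hand, and the data set and variable names are my own for this example.

```sas
/* Benchmark times from the table, converted to seconds. */
data speedup;
  input threads records par_sec seq_sec;
  /* Speedup = sequential time / parallel time, to two decimals. */
  speedup = round(seq_sec / par_sec, 0.01);
  datalines;
2 1000000 18 22
2 2000000 18 28
2 5000000 25 41
4 1000000 19 44
4 2000000 20 54
4 5000000 42 88
6 1000000 22 66
6 2000000 30 80
6 5000000 58 121
8 1000000 28 104
8 2000000 34 116
8 5000000 136 169
;
run;

proc print data=speedup noobs;
run;
```

For example, eight threads on 1,000,000 records works out to 104 / 28, or roughly 3.7 times faster than running sequentially, while eight threads on 5,000,000 records is only 169 / 136, or roughly 1.2 times faster.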

One thing to note: since we are always running the CORR and UNIVARIATE jobs using 600,000 records (and from a different drive array), these times tend to be pretty constant. This is true especially with two and four threads with one or two million records. The times stop being constant once we reach 5,000,000 records with six or eight threads. At that point the test machine’s temp drives become overwhelmed and the jobs are I/O bound.

With a fast drive array for your work space (temp files), you can really get some amazing decreases in your execution times by using threading. The system I’m running these tests on has a two drive RAID-0 setup for temp space. If I were to add an additional drive to that array, I’m sure the execution times with eight threads and five million records would be much lower… perhaps by 30 to 40%.

WPS Code for benchmarking two threads.

%MPExec;

/* Number of records generated by the first thread. */
%let iter=1e6;

/* Thread 1: create temp data, then run SORT, MEANS and FREQ on it. */
%startThread(Job_A);

    data a;
      do ii=1 to &iter;
        a=ranuni(0);
        b=ranuni(0);
        c=ranuni(0);
        d=ranuni(0);
        e=ranuni(0);
        f=ranuni(0);
        g=ranuni(0);
        h=ranuni(0);
        i=ranuni(0);
        aa=round(a*10,1);
        output;
      end;
    run;

    proc sort data=a; by ii; run;

    proc means data=a;
    run;

    proc freq data=a;
      tables aa;
    run;

;;;;
%stopThread;

/* Thread 2: UNIVARIATE and CORR on the permanent 600,000 record data set. */
%startThread(Corr_Run);

    libname tstdata 'c:\wpstestdata\';

    proc univariate data=tstdata.d;
      var j k l;
    run;

    proc corr data=tstdata.d;
    run;

;;;;
%stopThread;

/* Wait for both threads to finish. */
%WaitForThreads;