I finally finished writing MPExec over the weekend. MPExec is the name of the set of macros that create an environment where you can run multiple threads or processes at the same time. All in all, I have maybe 40 hours in the project, and I’m quite satisfied with the results. It would normally have taken me more than 40 hours, but I was able to piggyback on code I had already written for the Bridge to R.
I decided to make MPExec as portable as possible. Thus, I didn’t use any of the outside executables that the Bridge to R relies on for tasks such as checking for file existence and spawning new threads. By the way, FileExist is a new function in WPS 2.3.5 and works great! The other thing I noticed in version 2.3.5 is that I can shut off log messages, notes and source without the log filling up with blank lines as it did in earlier releases. This is fantastic and lets my logs look professional without lots of white space.
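Both features are easy to use from ordinary WPS code. The sketch below assumes WPS honours the standard SAS NONOTES, NOSOURCE and NOSOURCE2 system options and the FILEEXIST function the same way SAS does. The CHECK_FILE macro and the file path are hypothetical examples, not part of MPExec.

```sas
/* Quiet the log while a block of code runs, then restore it.
   Assumes WPS supports these standard SAS system options. */
options nonotes nosource nosource2;

/* ... code whose notes and source you don't want in the log ... */

options notes source source2;

/* Hypothetical helper: report whether a file exists.
   %IF is wrapped in a macro because it can't run in open code
   in older releases. */
%macro check_file(path);
  %if %sysfunc(fileexist(&path)) %then %put NOTE: &path exists.;
  %else %put WARNING: &path was not found.;
%mend check_file;

/* Hypothetical path, for illustration only */
%check_file(c:\temp\example.txt);
```

FILEEXIST returns 1 when the file (or directory) is present and 0 otherwise, so it can drive either a macro %IF, as here, or an IF statement inside a DATA step.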
I don’t think I can get the code to execute any faster than it does on my development machine when executing multiple threads, so let’s look at some benchmarks. The table below shows how well a quad-core PC executes multiple WPS threads. Each cell lists the parallel (Par) and sequential (Seq) run times in minutes:seconds.
| Threads | 1,000,000 Records (Par / Seq) | 2,000,000 Records (Par / Seq) | 5,000,000 Records (Par / Seq) |
|---|---|---|---|
| 2 | 0:18 / 0:22 | 0:18 / 0:28 | 0:25 / 0:41 |
| 4 | 0:19 / 0:44 | 0:20 / 0:54 | 0:42 / 1:28 |
| 6 | 0:22 / 1:06 | 0:30 / 1:20 | 0:58 / 2:01 |
| 8 | 0:28 / 1:44 | 0:34 / 1:56 | 2:16 / 2:49 |
A brief explanation of the table is in order. I ran each test three times, averaged the three run times and rounded up. Each test program pairs two kinds of threads. One thread creates a data set and then runs a SORT, MEANS and FREQ on it. The other always runs a UNIVARIATE and a CORR against a permanent data set with 600,000 records. I kept this balance between creating temporary data sets and reading permanent ones when running more than two threads: with four threads, two ran CORR and UNIVARIATE and two ran SORT, MEANS and FREQ; with six threads, three of each; and with eight threads, four of each. An example of the code is at the end of this post.
I also ran each test sequentially, which gives the time the same programs take without threading. Comparing the parallel times with the sequential times gives an idea of how much faster threading lets us run our code.
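Dividing each sequential time by its parallel time gives the speedup from threading. A small DATA step can compute this directly from the benchmark table; the times below are the table's figures converted to seconds by hand, and the data set and variable names are illustrative only, not part of MPExec.

```sas
/* Speedup = sequential time / parallel time.
   Times come from the benchmark table, converted to seconds.
   SPEEDUP and its variable names are illustrative only. */
data speedup;
  input threads records par_sec seq_sec;
  speedup = seq_sec / par_sec;
  format records comma9. speedup 5.2;
  datalines;
2 1000000  18  22
2 2000000  18  28
2 5000000  25  41
4 1000000  19  44
4 2000000  20  54
4 5000000  42  88
6 1000000  22  66
6 2000000  30  80
6 5000000  58 121
8 1000000  28 104
8 2000000  34 116
8 5000000 136 169
;
run;

proc print data=speedup noobs;
run;
```

For example, six threads on 1,000,000 records work out to 3.00 (66 / 22), while eight threads on 5,000,000 records fall to about 1.24 (169 / 136), which is consistent with the temp drives becoming the bottleneck at that size.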
One thing to note: because the CORR and UNIVARIATE threads always read the same 600,000-record data set (from a different drive array), their times stay fairly constant, especially with two and four threads at one or two million records. The gap between parallel and sequential times starts to shrink once we reach 5,000,000 records with six or eight threads, because the test machine’s temp drives become overwhelmed and the jobs turn I/O bound.
With a fast drive array for your WORK space (temp files), you can get dramatic reductions in execution time by using threading. The system I’m running these tests on has a two-drive RAID-0 array for temp space. If I added a third drive to that array, I’m sure the execution times with eight threads and five million records would be much lower, perhaps by 30 to 40%.
WPS code for benchmarking two threads:
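```sas
/* Initialize the MPExec threading environment */
%MPExec;

/* Number of records the first thread generates (1e6 = 1,000,000) */
%let iter=1e6;

/* Thread 1: create a temporary data set, then SORT, MEANS and FREQ it */
%startthread(Job_A);
data a;
  do ii=1 to &iter;
    a=ranuni(0);
    b=ranuni(0);
    c=ranuni(0);
    d=ranuni(0);
    e=ranuni(0);
    f=ranuni(0);
    g=ranuni(0);
    h=ranuni(0);
    i=ranuni(0);
    aa=round(a*10,1);
    output;
  end;
run;
proc sort data=a; by ii; run;
proc means data=a;
run;
proc freq data=a;
  tables aa;
run;
;;;;
%stopThread;

/* Thread 2: UNIVARIATE and CORR on a permanent 600,000-record data set */
%startThread(Corr_Run);
libname tstdata 'c:\wpstestdata\';
proc univariate data=tstdata.d;
  var j k l;
run;
proc corr data=tstdata.d;
run;
;;;;
%stopThread;

/* Wait until both threads have finished */
%WaitForThreads;
```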