Category Archives: WPS

Load and Stability Testing WPS

For the last few days, I’ve been testing WPS with some rather large data sets to try to determine what kind of limits there are and to check the stability of the system under heavy load. So far, I’ve limited my tests to simple PROCs like Append, Copy, Freq, and Means on WPS data sets. In these tests, I keep increasing the size of the data sets as each batch successfully completes. I’ve now run tests on data sets of 1 million, 10 million, 100 million, 1 billion and 2 billion observations.
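As a rough sketch of the kind of test described above (not the actual test programs, which aren't shown here), something like the following could be rerun with a larger size each time. The data set names, the variables and the starting value of the NOBS macro variable are all made up for illustration.

```sas
%let nobs = 1000000;  /* hypothetical starting size; raise it after each clean run */

/* Build one batch of synthetic observations */
data work.batch(drop=seed u);
  seed = 12345;
  do id = 1 to &nobs;
    segment = mod(id, 10);            /* ten arbitrary groups for PROC FREQ */
    call ranuni(seed, u);             /* uniform random number in (0,1) */
    amount = round(u * 1000, 0.01);   /* arbitrary numeric measure for PROC MEANS */
    output;
  end;
run;

/* Append the batch to a growing target data set (created on the first run) */
proc append base=work.big data=work.batch;
run;

/* Simple procs to confirm the data set can still be read end to end */
proc freq data=work.big;
  tables segment;
run;

proc means data=work.big n mean min max;
  var amount;
run;
```

Rerunning the append step is one way to push the target data set toward the billions of observations without having to generate it all in a single data step.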

I believe that the upper limit of a WPS data set on Windows is 2^31 - 1, or 2,147,483,647 observations. I’ve tried to append data to go beyond that number and WPS graciously tells me that there is an error, “Too many records in data set xxxxxx” and refuses to load any more observations into the data set. An interesting side note: in discussing the upper bound limit with a colleague, he reminded me that SAS had this same limitation at one time as well.
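The arithmetic behind that number is easy to check in a data step. This is only a sanity check of the figure quoted above. The value happens to be the largest signed 32-bit integer, which may be where the limit comes from, but that is an assumption and not something WPS documents here.

```sas
data _null_;
  limit = 2**31 - 1;     /* 2,147,483,648 minus 1 */
  put limit= comma13.;   /* should write limit=2,147,483,647 to the log */
run;
```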

I’ve never had an opportunity at any site that I’ve worked at to process such large amounts of data. I have consulted for a client that had 800 million credit card accounts and 32 billion transactions in their data warehouse (different vendors, private label cards, etc…) but analyzing a single portfolio never involved more than 160 million records.

Running these tests can be time consuming. Getting my head around the size of these files is something that does not come naturally to me. For example, going from a dataset of 1 million records to 2 billion records means the data set is 2,000 times larger. I don’t have the fastest Windows Server in town (I do have a lot of storage though) and I just basically let it chug while I work on other projects. But these tests do illustrate that WPS is capable of handling large files.

The real reason I’ve been performing such tests is to determine if WPS can replace SAS to do the heavy lifting for which it is so often used. At this point, I believe the answer is yes. For those companies that are interested in using WPS as a backend for developing and implementing vertical market applications, database access and web enablement, without having to contend with the sky-high licensing fees requested by SAS Institute, this is something to have on your short list for evaluation. For those organizations that need to do daily heavy lifting of data (sorting, summarization and reporting), WPS can fulfill most of your needs in replacing SAS/Base, SAS/Access and SAS/IntrNet.

The WPS Eclipse Configurable IDE

One of the cool things about WPS is the IDE. The environment that hosts WPS is the Eclipse IDE and it’s quite configurable. I’ve written about this in a previous blog posting but I think it’s important to contrast this open environment with the closed IDE that SAS has. With Eclipse, you can use “plug-ins” to expand the capability of the IDE. I downloaded XMLBuddy’s Eclipse plug-in and installed it by simply unzipping the files into the plug-in directory. When I fired up WPS and clicked on File||New||Other, I was presented with an option to use the XMLBuddy editor. I can use that to edit and validate XML as well as generate DTDs. With the Professional version of XMLBuddy, you can do even more.

Here’s a screen shot that shows the XML Drop Down menu and the XMLBuddy Editor in WPS.

The WPS Libname Engine

One other thing I wanted to mention. WPS can read SAS data sets natively! No longer do you have to go through the torturous and excruciating pain of exporting your SAS data sets when running under Windows to some bizarre format so they can be read in elsewhere. Note, however, that WPS cannot write a SAS data set. To read a SAS data set using WPS, just create a libname statement with the SAS data set engine name. For example, if your SAS data set is in the directory D:\SASDATA you can write:

LIBNAME sasdsets sas7bdset "D:\SASDATA";

And presto, you have access to SAS data. That’s a great feature when you are dealing with users and companies who are in a mixed environment and are using both SAS and WPS.
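Since WPS can read but not write SAS data sets, a typical next step would be to copy what you need into a native WPS data set. Here is a minimal sketch, assuming the engine name and path shown above and a hypothetical SAS data set called ACCOUNTS in that directory:

```sas
/* Read-only library pointing at the SAS data sets (engine name as given above) */
LIBNAME sasdsets sas7bdset "D:\SASDATA";

/* Inspect the structure of the hypothetical ACCOUNTS data set */
proc contents data=sasdsets.accounts;
run;

/* Copy it into the WORK library, where it is stored in the WPS native format */
data work.accounts;
  set sasdsets.accounts;
run;
```

From that point on, WORK.ACCOUNTS can be sorted, indexed or updated like any other WPS data set, while the original SAS file is left untouched.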

Links:
WPS – www.teamwpc.co.uk

XMLBuddy – www.xmlbuddy.com

WPS Beta5 Release

Well, I finally got my hands on the latest WPS beta release (beta5) and see a lot of improvements and fixes. With this release, I’ve been stress testing it with large data sets and, with a few exceptions (ODBC related), performance and stability are quite good.

I’ve been focusing on writing macros to implement PROC RANK and PROC STANDARD in WPS. Both of these procs are absent in the latest release but I’ve reached a point where I’m nearly done with these macros. Once I finish these macros, I can do about 95% of what I can do in SAS/Base as a consultant and developer.
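To give an idea of what such a replacement can look like, here is a minimal sketch of a standardizing macro built only from PROC MEANS and a data step. It is not the macro described above. The macro name, its parameters and the temporary names _STATS, _MEAN and _STD are all invented for this illustration, and it handles a single variable with no BY groups or weights.

```sas
%macro standardize(data=, out=, var=, mean=0, std=1);
  /* Step 1: compute the sample mean and standard deviation of &var */
  proc means data=&data noprint;
    var &var;
    output out=_stats(keep=_mean _std) mean=_mean std=_std;
  run;

  /* Step 2: rescale each non-missing value to the requested mean and std */
  data &out(drop=_mean _std);
    if _n_ = 1 then set _stats;   /* load the statistics once; they are retained */
    set &data;
    if not missing(&var) and _std > 0 then
      &var = &mean + &std * (&var - _mean) / _std;
  run;
%mend standardize;

/* Hypothetical usage: rescale SCORE to mean 0 and standard deviation 1 */
/* %standardize(data=work.scores, out=work.scores_std, var=score) */
```

A ranking macro can follow the same pattern, with a PROC SORT followed by a data step that assigns the rank, though handling ties the way PROC RANK does takes more care.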

I’m really getting excited about this product for a number of reasons. First, as a consultant and SAS developer, I see the opportunity to expand my client base. Many small and medium-sized businesses simply cannot afford to license SAS at the rates the Institute demands. WPS gives these smaller companies a shot at implementing SAS-like solutions without breaking the bank. With more companies being able to afford these solutions using WPS, I believe my consulting practice will grow with this market expansion.

I’ve also read comments that WPS doesn’t have the statistical procs that SAS does. In my 20 plus years as a SAS consultant, I’m still amazed at how much of the work I do is performed in just Base and Access. Unless you’re a statistician, you will probably never miss those procs anyway.

I also like the fact that there’s competition in this arena. With competition, I feel that things will only get better. There will certainly be more improvements with an eye towards the developers who are the guns in the trenches (and not just the CIO and the non-stop kissing up to management). That’s why I’m excited about WPS: it has placed its focus on the developer.

Watch out, SAS Institute. There’s a value player in the market now!

Links: http://www.teamwpc.co.uk

A Look at the WPS IDE

I’ve received some requests asking to see what the development environment looks like for WPS on Windows. I guess that’s fair considering all the complaining I did in a previous blog posting on how terribly retro SAS’s environment is.

Below are some screen captures that show the WPS editor. Note that the IDE uses the Eclipse environment. The more I use it, the more I like it. It’s fast and fairly intuitive to use. Windows are nicely laid out and easy to find.

The first screen capture is of the IDE and the Editor. I have a SAS program in the editor that reads the SF3 Census file.

The second screen below is a browse window that has been opened onto the WPS data set that was just created.

The third screen capture shows the output window with a proc means that was just run. Note that I didn’t set up any page size or line size options.

And finally, the fourth screen capture shows the WPS LOG window.

As you can see from the screen shots, if you are a SAS developer, this should all look pretty familiar to you. The environment is friendly and easy to use, and any programmer should be able to adapt to it easily.

I’ll be posting more about WPS, the SAS/Base alternative, in the next few days.

Stay tuned!

Links: http://www.teamwpc.co.uk

Thoughts on Benchmarking SAS and WPS

In a previous post, I stated that I wrote some benchmarks comparing SAS to WPS to get an idea of how they stack up against each other. After giving this some additional thought, I’ve decided that this is not fair to WPS, mainly because it’s still in beta. I think a comparison should be made with the release versions of both packages.

So, with the knowledge of those at World Programming, I will write some code that will perform some simple benchmarks between the two systems when WPS goes live (i.e. not a beta release) and will compare and contrast the times for the data step and a few PROCs such as Means, Sort, Freq and Tabulate. I’ve also agreed to first let World Programming review the source code that will be used and will entertain comments from them about how well written the code is and whether it’s a justifiable benchmark.
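The benchmark code itself has not been written yet, so the following is only a sketch of one way such timings could be collected identically in both systems. The data set, the seed, the TIMEIT macro and the T0 macro variable are all hypothetical. Turning on the STIMER system option is another way to get step timings in the log.

```sas
/* Hypothetical test data: 1 million rows in 10 groups */
data work.bench(drop=seed);
  seed = 2468;
  do id = 1 to 1000000;
    grp = mod(id, 10);
    call ranuni(seed, x);
    output;
  end;
run;

/* Hypothetical helper: reports wall-clock seconds elapsed since &t0 was set */
%macro timeit(label);
  %let elapsed = %sysevalf(%sysfunc(datetime()) - &t0);
  %put NOTE: &label took &elapsed seconds.;
%mend timeit;

%let t0 = %sysfunc(datetime());
proc sort data=work.bench out=work.sorted;
  by grp x;
run;
%timeit(PROC SORT)

%let t0 = %sysfunc(datetime());
proc means data=work.bench noprint;
  class grp;
  var x;
  output out=work.stats mean=mean_x;
run;
%timeit(PROC MEANS)

%let t0 = %sysfunc(datetime());
proc freq data=work.bench noprint;
  tables grp / out=work.counts;
run;
%timeit(PROC FREQ)
```

Because each step ends with an explicit RUN statement, the step finishes before the macro call that follows it is processed, so the reported time covers the whole step.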

I will post the code and the data sets that I use on my website (www.minequest.com) so that others can review the code and can comment and critique the source code as well. This is not designed to be an exhaustive benchmarking between the two but will provide those interested a simple basis for comparison.

Links: http://www.teamwpc.co.uk
http://www.sas.com
http://www.minequest.com

WPS – The World Programming System

I thought I would share with everyone my latest little time burner… beta testing WPS (the World Programming System). I’ve only had it a day and have run about a dozen programs through it, and so far, I’m impressed. WPS is a SAS alternative that mimics SAS/Base pretty darn closely. It has a great IDE (it uses Eclipse) for interactive programming and development. You can check out the WPS website at: http://www.teamwpc.co.uk/index.html to see what they have to offer and how they are positioning the product.

So far, the only problems I’ve had running WPS have been with a few programs that use some of the random number generators. That was easily taken care of by switching the syntax to use CALL RANNOR and CALL RANUNI instead. I was pleasantly surprised to be able to just copy almost all of my SAS programs that read large data sets, create indexes, and run some PROC FREQs and PROC MEANS to validate the load, and have them work without any changes.
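For anyone who hasn't used the CALL routine forms, here is a small illustration. Which generators caused trouble isn't spelled out above, so the variable names and seed values below are arbitrary. The main difference from the function form is that the seed lives in a variable that must be initialized before the first call and is updated by each call.

```sas
data work.random(drop=seed1 seed2);
  seed1 = 12345;             /* seed for the uniform stream */
  seed2 = 54321;             /* seed for the normal stream */
  do i = 1 to 5;
    call ranuni(seed1, u);   /* u: uniform random number between 0 and 1 */
    call rannor(seed2, z);   /* z: standard normal random number */
    output;
  end;
run;

proc print data=work.random;
run;
```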

Performance is not on par with SAS, however. WPS is 10% to 20% slower on doing MEANS and FREQS. The data step code is hit or miss when it comes to execution times. Sometimes WPS is a few percent faster and other times, SAS will outrun it easily. I still have not found the elements that account for the discrepancies but overall, SAS is faster. Sometimes, a lot faster.

According to the WPS website, pricing for the PC version is around $600 for first year fees and half that for the renewal. The download is 60 MB in size. That amazed me after being conditioned to feed CDs to SAS over the years.

Below is a list of language elements that are supported.

Procedures
Append, Compare, Contents, Copy, Datasets, Delete, Export, Format, Freq, Import, Means, Options, Print, Printto, Sort, SQL, Summary, Tabulate, Transpose

Library Engines
DB2, DB2 (z/OS), MySQL, ODBC, SASV6, SASV8, Xport, SPSS, SPSSDIR, SqlServer, Teradata, Teradata (z/OS), WPD (z/OS), WPD1 (WPD is the WPS counterpart of the SAS data set)

Data Step Statements
ABORT, ARRAY, ATTRIB, BY, CALL, CARDS, CONTINUE, DATA, DATALINES, DELETE, DO, DO (iterative), DO UNTIL, DO WHILE, DROP, END, FILE, FORMAT, GO TO, IF (subsetting), IF-THEN/ELSE, INFILE, INFORMAT, INPUT, KEEP, LABEL, Labels (statement), LEAVE, LENGTH, LINK, LIST, MERGE, OUTPUT, PUT, RENAME, RETAIN, RETURN, SELECT, SET, STOP, Sum, UPDATE, WHERE

Data Step Functions and CALL Routines
ABS, ARCOS, ARSIN, ARTAN, BAND, BLSHIFT, BNOT, BOR, BRSHIFT, BXOR, BYTE, CALL EXECUTE, CALL RANCAU, CALL RANNOR, CALL RANUNI, CALL SYMDEL, CALL SYMPUT, CALL SYSTEM, CEIL, CHOOSEC, CHOOSEN, COMPRESS, COS, COSH, CSS, CV, DATE, DATEJUL, DATEPART, DATETIME, DAY, DHMS, DIF, DIM, EXIST, EXP, FLOOR, GETOPTION, HBOUND, HMS, HOUR, INDEX, INPUT, INT, INTNX, JULDATE, JULDATE7, KURTOSIS, LAG, LBOUND, LEFT, LENGTH, LIBREF, LOG, LOG10, LOG2, LOWCASE, MAX, MDY, MEAN, MIN, MINUTE, MISSING, MOD, MONTH, N, NMISS, PUT, QTR, RANGE, RANCAU, RANNOR, RANUNI, REPEAT, RIGHT, ROUND, SCAN, SECOND, SIGN, SIN, SINH, SKEWNESS, SQRT, STD, SUBSTR, SUM, SYMGET, SYSPARM, SYSPROD, SYSTEM, TAN, TANH, TIME, TIMEPART, TODAY, TRANSLATE, TRIM, UPCASE, USS, VAR, VERIFY, WEEKDAY, YEAR, YYQ

Data Set Options
COMPRESS, DROP, FIRSTOBS, IN, INDEX, KEEP, LABEL, OBS, POINTOBS, RENAME, REPLACE, WHERE

Macro Processor
%BQUOTE, %* comment, %DO, %DO (iterative), %DO %UNTIL, %DO %WHILE, %END, %EVAL, %GLOBAL, %GOTO, %IF-%THEN/%ELSE, %INDEX, %label, %LENGTH, %LET, %LOCAL, %LOWCASE, %MACRO, %MEND, %NRBQUOTE, %NRQUOTE, %NRSTR, %PUT, %QLOWCASE, %QSCAN, %QSUBSTR, %QSYSFUNC, %QUOTE, %QUPCASE, %SCAN, %STR, %SUBSTR, %SUPERQ, %SYSCALL, %SYSEVALF, %SYSFUNC, %SYSPROD, %SYSRC, %UNQUOTE, %UPCASE

System Options
AUTOEXEC, BLKSIZE, BYLINE, CENTER, CHARCODE, COMPRESS, DATE, DB2IN, DB2SSID, DKRICOND, DKROCOND, DSNFERR, ECHOAUTO, ENGINE, ERRORABEND, ERRORS, FILEBLKSIZE(device-type), FILESPPRI, FILESPSEC, FILESYSTEM, FILEUNIT, FIRSTOBS, FMTERR, INITSTMT, _LAST_, LINESIZE, MACRO, MACROGEN, MAUTOSOURCE, MERROR, MISSING, MLOGIC, MPRINT, MRECALL, MSGLEVEL, MTRACE, NOTES, NUMBER, OBS, OLDMAC, PAGENO, PAGESIZE, REPLACE, S, S2, SASAUTOS, SERROR, SORTCUTP, SORTEQOP, SORTLIST, SORTMSG, SORTNAME, SORTOPTS, SORTPARM, SORTPGM, SORTSIZE, SORTSUMF, SOURCE, SOURCE2, STIMER, SUMSIZE, SYMBOLGEN, SYSPARM, SYSPREF, USER, VNFERR, WORK, WORKINIT, WORKTERM, WPSTRACE, XCMD, XSYNC, XWAIT, YEARCUTOFF