A Parallel Method for Analytical Processing

August 26th, 2008

I’ve been doing some interesting work with R and WPS the last few days and I want to share a little bit about what I’m doing. I developed the Bridge to R a few months back and I’m basically satisfied with how it works. In the next release I will be introducing some new features to make things a bit easier to use, and I will clean up the documentation. One of my interests, however, is how to maximize CPU usage with WPS and R so that I can complete statistical tasks more quickly.

What I have running on my development machine is a system that is written in WPS (I’m doing most of my SAS language development in WPS now due to the better IDE), Delphi and R. This system gives you the ability to launch multiple R programs and allows them to run in parallel. After the last R program completes, WPS grabs the R listing and log files, writes them to the appropriate windows and continues on processing WPS statements.  It’s nice and tidy and it’s fun to watch the CPU meter bounce around like crazy on a CPU intensive set of programs.

At this point, I let the operating system balance which CPU should run which jobs but I might change that. I am able to force R to use a specific core or logical processor, and since R is not multi-threaded (i.e. it doesn’t use more than a single logical CPU), there might be some benefit in letting the user specify in code which processor each job should run on. I’m undecided on this aspect though, because I’m thinking that the OS (Vista in this case) can handle the workload more efficiently than I or a user can.
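For readers wondering what pinning a job to a core involves, here is a minimal sketch in Python. It is not the WPS/Delphi code described above, which isn't shown here. It assumes a Windows version whose `start` command accepts `/affinity` with a hexadecimal mask, and the function names and the R script name are hypothetical.

```python
def affinity_mask(cores):
    """Return the hex bitmask that selects the given zero-based logical CPUs."""
    mask = 0
    for core in cores:
        if core < 0:
            raise ValueError("core numbers are zero-based and non-negative")
        mask |= 1 << core          # bit n set means logical CPU n is allowed
    return format(mask, "X")


def pinned_command(command, cores):
    """Wrap a command list so cmd.exe's `start` pins it to the given cores.

    Assumption: Windows only. The empty string is the window title `start`
    expects, /b avoids a new console window and /wait blocks until the
    job has finished.
    """
    return ["cmd", "/c", "start", "", "/b", "/wait",
            "/affinity", affinity_mask(cores)] + list(command)


# Hypothetical usage: keep one R job on logical CPUs 0 and 2.
print(pinned_command(["Rscript", "cluster.R"], [0, 2]))
```

Leaving the mask out altogether is the "let the OS decide" option, which is what I'm doing now.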

The calling conventions I’m using are straightforward too. Consider the macro calls below:

%RppStart(machineName, Mode);

%Rstart();

                R code here…

%Rstop;

%RppStop ;

Where

%RppStart is the macro calling convention and stands for “R Parallel Program Start”

MachineName - currently, the name is always the local machine name. This could be expanded later to allow other machines on the network to execute the R code.

Mode - takes a value of P for Parallel or S for sequential processing.

%RppStop  - is the macro that pulls together all the output from the completed R programs and writes information to the proper WPS windows.

Say you want to run three R programs in parallel: a cluster analysis, a factor analysis, and a logistic regression. The complete sequence for calling the series of R programs would look like:

%RppStart(Phil-Vista,P);

%Rstart(Auto,MySASdataSetName,NoGraphWindow);
      R code for cluster here…
%Rstop;

%Rstart(Auto,MySASdataSetName,NoGraphWindow);
      R code for factor  here…
%Rstop;

%Rstart(Auto,MySASdataSetName,NoGraphWindow);
      R code for logistic here…
%Rstop;

%RppStop;

 

All three programs above would be launched almost simultaneously,  execute at the same time and any code past the %RppStop would not run until all the programs in the %RppStart / %RppStop block had completed.  
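The launch-then-wait behavior of the %RppStart / %RppStop block can be sketched outside of WPS as well. The Python below is only an illustration of the pattern, not the actual implementation: `run_block` is a hypothetical name, and treating standard output as the "listing" and standard error as the "log" is an assumption made for the example.

```python
import subprocess
import tempfile
from pathlib import Path


def run_block(commands, mode="P", workdir=None):
    """Run every command and return a list of (returncode, listing, log).

    mode "P" starts all jobs before waiting on any of them; mode "S" runs
    them one at a time. Either way the function returns only after the
    last job has finished, which is the barrier role %RppStop plays.
    """
    if mode not in ("P", "S"):
        raise ValueError("mode must be 'P' or 'S'")
    workdir = Path(workdir or tempfile.mkdtemp(prefix="rpp_"))

    jobs = []
    for i, cmd in enumerate(commands):
        lst = workdir / f"job{i}.lst"      # stdout, treated as the listing
        log = workdir / f"job{i}.log"      # stderr, treated as the log
        with open(lst, "w") as out, open(log, "w") as err:
            proc = subprocess.Popen(cmd, stdout=out, stderr=err)
        if mode == "S":
            proc.wait()                    # sequential: finish before the next
        jobs.append((proc, lst, log))

    results = []
    for proc, lst, log in jobs:
        proc.wait()                        # barrier: every job must be done
        results.append((proc.returncode, lst.read_text(), log.read_text()))
    return results
```

With R installed and on the path, a call such as `run_block([["Rscript", "cluster.R"], ["Rscript", "factor.R"], ["Rscript", "logistic.R"]], mode="P")` would mirror the three-program example; the script names are placeholders. Each job writes to its own files, so a chatty job cannot stall the others while it waits for its output to be read.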

One interesting use for such a method is running CPU intensive programs such as forecasting models, which really don’t take a lot of storage but often require testing numerous methods to obtain a best fit on literally hundreds of products. As desktop computers get more powerful with more cores for processing, such analysis will become more common. Moving to the servers, we see Intel announcing six-core CPUs and multiple socket motherboards, so enterprise analytics will become quicker and more affordable too.
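As a rough illustration of why this kind of work parallelizes so well, here is a small Python sketch that picks the best of a few forecasting methods for each product. Everything in it is an assumption made for the example: the three toy methods, the holdout of three periods and the function names. The point is only that each product is scored independently of the others.

```python
def naive(history):
    """Forecast the next value as the last observed value."""
    return history[-1]


def mean(history):
    """Forecast the next value as the mean of all observed values."""
    return sum(history) / len(history)


def moving_average(history, window=3):
    """Forecast the next value as the mean of the last `window` values."""
    tail = history[-window:]
    return sum(tail) / len(tail)


METHODS = {"naive": naive, "mean": mean, "moving_average": moving_average}


def best_method(series, holdout=3):
    """Return (method name, mean absolute error) for the lowest one-step-ahead
    error over the last `holdout` points of the series."""
    if len(series) <= holdout:
        raise ValueError("series too short for the requested holdout")
    scores = {}
    for name, method in METHODS.items():
        errors = [abs(method(series[:t]) - series[t])
                  for t in range(len(series) - holdout, len(series))]
        scores[name] = sum(errors) / holdout
    winner = min(scores, key=scores.get)
    return winner, scores[winner]


def best_methods(products, map_fn=map):
    """Score every product; `map_fn` lets the caller swap in a parallel map."""
    names = list(products)
    results = map_fn(best_method, [products[n] for n in names])
    return dict(zip(names, results))
```

Because `best_method` only looks at one series, the built-in `map` can be replaced by the `map` method of a `concurrent.futures.ProcessPoolExecutor` to spread hundreds of products across the available cores.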

 

 

 

The Value Proposition in BI Software

August 22nd, 2008

Much has been written about the value of software, how it is derived and the cost associated with using it. One of the things that I think is rather bizarre is just how explicit some software vendors are in how you can use their software to run your business. Let’s take a look at one example that I’m quite aware of: taking your business analytics and putting it on the web so your suppliers can view some statistics about your company’s usage of their products and how much of their products you think you may be using in the future.

SAS has a product called SAS/IntrNet. I kid you not, that’s how they spell it… IntrNet. The philosophy that SAS has in regards to this product is that you should pay a license fee that is insanely expensive because anyone in your company might have access to SAS by using SAS/IntrNet. If you want to allow your partners and vendors access to YOUR CORPORATE INFORMATION, then be prepared for SAS to lay claim to an even bigger chunk of your wallet. The amount of the “bigger chunk” is also predicated on the size of your company. If you’re an organization that has revenues of $500 million or more, you are going to pay a lot more for the same software than a company that has $450 million in revenue… and I mean a lot more.

Which brings up another issue that I hear complaints about quite often, and that’s how SAS plays the game on pricing. Cruise over to the SAS web site and see if you can find how much SAS will cost. I can virtually guarantee you that you will not find pricing on their website. The reason is that they want to talk to you to try to get an idea of how much you can pay. The shake-down is only the beginning! Someone has to pay for the cost of free M&M’s, the onsite health club and the 35 hour work week. Note: MineQuest fully discloses the cost of licensing WPS on our website. You can see those costs by viewing our pricing page.

Back to the main issue: what is the real value proposition of your BI software and what is a solution to these rising costs? Reconsider your BI stack and how you are using this expensive software. Break down how the software is used into three categories: advanced analytics, ETL and reporting, and web presentation and report distribution. Take a hard look at how your Quants are using their advanced analytics software for modeling. It’s a good guess that they have licensed procedures and modules that they will never, ever use. What is the value proposition in paying for software that your organization will never use? Solution: only license the modules that are currently being used for the advanced analytics. Supplement the non-licensed Procs with an open source package such as R.

For ETL and Reporting, these processes tend to be more than a one-time deal. Reports are often run daily, weekly or monthly, and ETL processes are often created to cleanse data going into a warehouse or other database system on a regular basis. If you are using SAS for these two processes, and the SAS language is a natural for these, then consider dramatically reducing your cost and license a copy of WPS. The same code that your programmers have invested time in writing can be directly imported into WPS for doing the exact same thing.

Finally, let’s take a look at web presentation and report distribution using SAS/IntrNet. It’s expensive to acquire, expensive to keep, and quite frankly insanely expensive to use if you want to let your partners and vendors access your data. The silver lining here is that swapping out SAS/IntrNet for WPS-WEB is not going to cost you anything. WPS-WEB comes with WPS. You are not going to pay for an add-on that is more than the cost of the base product… it’s included in the WPS license.

As the economy tightens up and you need to look to reduce costs and streamline operations, this is a good time to review your BI software license agreements and pricing. It’s also a very good time to open negotiations with your BI sales reps asking them what they can do to reduce cost for your business.

WPS Reseller Announcement

August 13th, 2008

Here’s a “preannouncement announcement” for those of you who are interested in licensing a copy of WPS on Windows. MineQuest will shortly become a reseller of WPS on Windows platforms to firms, universities and individuals, primarily in the United States and Canada. Our pricing will be virtually the same as that offered by WPC but we can offer a few more things that I feel make it attractive to purchase your licenses from us. First, we can offer services for conversions from SAS to WPS. Second, we have a good deal of knowledge and experience in using the SAS language (25 years) and we can apply it to our consulting business, particularly when it comes to implementing a WPS solution.

The other reason for licensing WPS from MineQuest is that we will be providing a free copy of a Bridge to R for WPS Users with every Desktop and Terminal Server license. A Bridge to R allows you to run R code from within the WPS Workbench, returning the R output and listing files to your WPS log and listing files. The Bridge provides the means to perform some very advanced statistical analysis that is not available in packages such as SAS.

In becoming a reseller of WPS software, I believe we are making a commitment to some rather bold statements that we (MineQuest) have been making about this product as a cost effective replacement for SAS.  WPS is constantly improving and is quite compatible with your existing SAS/Base source code. In the consulting world where I work, 95% of the programming that I do for my clients revolves around SAS/Base. The other five percent is using SAS/Access and SAS/Graph. When I have run into an issue, I’ve always been able to resolve it fairly easily. WPS just works with my existing SAS/Base code.

We also hope to have WPS and our SAS and WPS consulting services available via the GSA Schedule. The GSA Schedule will allow our government customers to purchase WPS directly and not have to go through the hassles of a bid process. GSA customers can also purchase licenses using their GSA credit cards, which is significant when you consider how inexpensive WPS is in comparison to SAS.

As soon as we iron out some logistics in taking credit card orders (in the next few days), we’ll be making a formal announcement.

A Tribute to a Good Friend, Rusty Williams

August 12th, 2008

Once in a while, reality hits you pretty hard and that usually happens in your personal life. I lost a dear friend on Sunday. Rusty died of brain cancer and had only been diagnosed with the disease a few weeks earlier.

I first met Rusty early in my career. We were both on our second or third job in the mid-1980s; he was a COBOL programmer and I was a SAS developer. We both were given the task of supporting the same division in our company so we worked together often. Once in a while, things just click between two people and we laughed, giggled and drank beer for a few years while we “worked” together.

Rusty was a teacher in the truest sense of the word. His documentation was pristine and accurate. SQL was just starting to gain popularity and he took the time to teach SQL to me. If I had a nickel for every time he would yell over the cubicle wall, “check your commas,” I would be sitting pretty.

As our careers developed, we both went into consulting, albeit with different companies. We stayed good friends and our paths crossed professionally over the years and we had the pleasure of working together again. No matter what the project, we knew we would have fun.

I don’t know how many times we took off to go to a Browns game in Cleveland or an OSU football game here in Columbus. It was never the event itself that was important. It was just spending time talking, and the camaraderie, that was special.

One thing that I admired about Rusty was how much he loved his two kids. Both Alex and Lauren were the focus of his life. He never grew tired of talking about them growing up and how much fun he had being with them.

Thank you Rusty for being a good friend and I’ll see you again on the other side.

Social Networking and Consulting

August 10th, 2008

Social networking sites are all the rage today. Some are better than others because of relevance and the quality of people they attract. Social networking in my mind is really a misnomer. It should probably be called business networking because that’s what most of us are interested in after we pass the age of 25.

As a SAS developer and consultant, I find the most important networking sites are LinkedIn and the SASConsulting forum on Google.  The SASConsulting forum on Google is really top notch in that there are discussions ranging from client issues, billing and rates to Cloud and Grid computing and how business analytics may take advantage of these technologies. The SASConsulting forum is a private forum but has over three hundred members. If you have any interest, you can request membership at http://groups.google.com/group/sasconsulting/

On the other end of the spectrum, we have the Analyticbridge that has so many special interest groups that it’s just plain difficult to find anything. As a matter of fact, as I write this (August 9th 2008) the SAS and Statistical Programming group has not had any activity since June 3, 2008. There are just too many groups and some of them seem to be about the same thing. It surely needs to be pruned to be more effective.

So what it comes down to is how you can use such sites to your greatest advantage and make the most efficient use of your time. When I get a request to “link” or to be “friends” with someone who has 500+ contacts, I wonder what they do all day. How can they actually get any work done if they are adding 10 to 25 new contacts a day?  Obviously, some of these folks are recruiters/headhunters but for many of these folks, it’s just a game and they provide no intrinsic value to your network.

WPS Graphics and some news on A Bridge to R

August 9th, 2008

It’s been just over a week since I published A Bridge to R for SAS Users and I’ve received some very encouraging feedback on the product. In the first week of availability, our weblogs revealed that there were over 160 viewings of the online demonstration of the Bridge for SAS Users and just over 100 downloads. I suspect that there’s much more demand for these kinds of products than just what the download numbers reveal. At any rate, feel free to let me know your thoughts on the Bridge and what improvements and additional capabilities you would like to see.

One thing that I just recently discovered is that the latest release of WPS supports PROC GREPLAY. As they begin developing the graphics library, I’ll be interested to watch which types of graphics procedures and options are included. I’m not a big fan of SAS/Graph so I don’t have an investment in code or have invested much time in developing my own graphics libraries with this product (I typically use Excel for graphics) but there are a number of folks who have. Beyond pie, line and bar charts, what are people using SAS/Graph for? That seems to be the basic business graphics for 99% of the users.

There’s a lot more going on folks, but at the moment I can’t write about it. I need to wait for some things to settle down before I can make any announcements.

A Bridge to R for SAS Users

July 31st, 2008

I’ve recently finished writing a piece of software called a “Bridge to R for SAS Users” and an accompanying installation guide and want to let those who may be interested know about its availability. The Bridge to R for SAS Users is very similar to a version I wrote for WPS users in that you can write R code inside the SAS IDE and execute the R code on your desktop. The output from R, both the LST files and LOG files, is routed back to the appropriate SAS window.

This release is a Community Preview and will operate until October 31, 2008. After that date, you will have the option to purchase a license if you decide you would like to continue using it. We’re looking at an annual license of approximately $49.95 for the Bridge.

The Bridge to R for SAS Users requires you to be running SAS 9.2 on a Windows desktop platform (tested with XP Pro, Vista Ultimate 32 and Vista Ultimate 64). If you want to read SAS data sets directly using R, you will need to download and install the SAS ODBC driver from the SAS web site. If you don’t want to go through the process of setting up the SAS ODBC driver, you can use the CSV option instead. Amazingly, there’s virtually no difference in execution time between using a CSV file or the SAS ODBC driver when reading data sets of less than 50,000 records and 20 or so variables.

I have put together a short screen cast that demonstrates how the Bridge to R for SAS Users integrates into the SAS environment and how the Bridge is used. You can view the screen cast at the MineQuest, LLC website by clicking here.

If you wish to use the software during the Community Preview period, you can download the Installation and Users Guide as well as the actual program from the download section of the MineQuest web site.

Links:
Bridge to R Screen Cast: http://www.minequest.com/Misc/bridge2rdemo/Bridge_2_R_for_SAS.html

Bridge to R for SAS Users download: http://www.minequest.com/downloads.html

An Update on the WPS2R Bridge for SAS

July 19th, 2008

We finally have a version of the WPS2R Bridge for SAS working on Windows XP, Vista (both 64 and 32 bit versions) and Windows Server 2003. The WPS2R Bridge is a facility that allows the SAS programmer working in WPS or SAS to write and process R source code from within the SAS IDE. I’m excited that this works as well as it does. I think it’s a much more friendly and intuitive way of learning and using R for the SAS developer than anything else out there.

In writing the code for the Bridge, the processing is actually straightforward. There is, however, a lot of code that had to be written to catch user errors such as missing or incorrect data set names, incorrect transport methods, etc… before submitting the R source code to R.

The WPS2R Bridge for SAS works with the SAS/ODBC driver to natively read SAS data sets. If you don’t want to go through the short exercise of setting up the SAS/ODBC driver, you can forego that and just use the CSV transport method. Speaking of transport methods, we support three different transport calls: CSV for comma separated values, AUTO for having the Bridge create the code behind the scenes to connect to the SAS ODBC driver, and MANUAL for allowing the user to write their own connection code. The idea behind the architecture of the WPS2R Bridge is to make connecting to R as simple as possible so that the developer can concentrate on writing code and not worry about connections.
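To give a feel for what the three transport methods amount to, here is a hedged Python sketch that validates a data set name and then generates the R lines each transport would need. It is not the Bridge's actual code, which isn't shown in this post. The function names, the `df` data frame name and the `csv_path` and `dsn` parameters are hypothetical, and the use of the RODBC package on the R side is an assumption.

```python
import re

# SAS names start with a letter or underscore, then letters, digits, underscores.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*\Z")


def check_dataset(dataset):
    """Return a list of problems with a [libref.]member name (empty if none)."""
    parts = dataset.split(".")
    if len(parts) > 2:
        return [f"too many levels in data set name: {dataset!r}"]
    problems = []
    if len(parts) == 2 and not (_NAME.match(parts[0]) and len(parts[0]) <= 8):
        problems.append(f"invalid libref: {parts[0]!r}")
    if not (_NAME.match(parts[-1]) and len(parts[-1]) <= 32):
        problems.append(f"invalid data set name: {parts[-1]!r}")
    return problems


def r_preamble(transport, dataset, csv_path=None, dsn=None):
    """Return R source lines that load `dataset` into a data frame named df."""
    problems = check_dataset(dataset)
    if problems:
        raise ValueError("; ".join(problems))
    transport = transport.upper()
    if transport == "CSV":
        if not csv_path:
            raise ValueError("CSV transport needs csv_path")
        path = csv_path.replace("\\", "/")   # R accepts forward slashes on Windows
        return [f'df <- read.csv("{path}")']
    if transport == "AUTO":
        if not dsn:
            raise ValueError("AUTO transport needs an ODBC data source name")
        return ["library(RODBC)",
                f'ch <- odbcConnect("{dsn}")',
                f'df <- sqlQuery(ch, "select * from {dataset}")',
                "odbcClose(ch)"]
    if transport == "MANUAL":
        return []                            # the user writes the connection code
    raise ValueError(f"unknown transport method: {transport!r}")
```

The sketch only checks that a name is well formed. Confirming that the data set actually exists needs a live SAS or WPS session, which is where the real error checking has to happen.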

There are a few things left to do before releasing the WPS2R Bridge for SAS. I want to revisit some of the error messages that I use. I think I can be a little more informative than what I currently have in place. I also need to radically update the documentation for this product.

The WPS2R Bridge for SAS will require that you be running Windows XP or Vista. You will also need to have SAS 9.2 as well as a copy of R installed. If you want to read SAS data sets natively, you will also have to download a copy of the SAS ODBC driver that is available from the SAS web site. Regarding the requirement that you have SAS 9.2, that was a conscious design decision. Previous versions of SAS do not have some of the facilities we make use of that are available in 9.2.

In the next few days, I’ll create a screencast that demonstrates how the WPS2R Bridge for SAS works. We will put out a community preview of the software that will be good for 90 days so SAS users can get a taste of how the Bridge and R work. My goal is to have the Community Preview available on August 1, 2008. After the community preview period is over, the software will no longer work but if you find that it’s something useful, you will be able to license it for around $50.

A Little Background on Developing the WPS2R Bridge

July 14th, 2008

I’ve received numerous emails asking why I’m developing the WPS2R Bridge. Basically, and without pulling any punches, I’ve decided to write an interface into R that is callable from WPS and SAS for several reasons.

General Background
Provide a mechanism for SAS developers to access R programs from within the WPS workbench. This mechanism, called the WPS2R Bridge, allows you to gain access to more advanced statistical procedures as well as a great graphics library.

Provide a way for companies to stop the hemorrhaging of cash that they have to pay out to SAS Institute for licenses. I think everyone with the exception of the dick nozzles at SAS (aka System Design Architects) understands that we’re in a global economic downturn. This is the perfect time to start  moving your code to WPS and saving a lot of money.  For a pricing comparison of SAS and WPS on a Windows Server, take a look here.

For organizations that are working in a mixed environment using R and SAS, the WPS2R Bridge allows the developers to continue working with SAS code in a more user friendly IDE. Your organization’s developers can continue to leverage their SAS coding skills and reuse much of their code.

Allow SAS Developers and SAS Consulting companies to begin building Vertical Market Applications (VMAs) using WPS and R. Since WPS doesn’t have the statistical capabilities that SAS has, using R in lieu of SAS/STAT, SAS/ETS, SAS/OR, etc… can be a huge cost savings for both the developers who have to license all the SAS products as well as the companies that will be purchasing these new VMAs.

Moving away from a 100% proprietary system gives developers, analysts, and organizations that are looking to license VMAs alternatives that they currently don’t have. Options to pick and choose and mix and match are important when designing custom analytical software.

The Myth of Large Data Sets
The criticism that R cannot handle large data files is for the most part misplaced. Let’s put this into perspective once and for all. The people who most often criticize R’s supposed inability to handle large files are simply parroting what they’ve heard from others. There’s a difference between manipulating large data sets (doing ETL and getting them into a format for statistical processing) and actually running the statistical processes. My last three consulting contracts, all with Fortune 500 companies, have had analysis performed on business portfolios of 55,000, 80,000 and 145,000 businesses. When the data was summarized and organized into a format that the analysis could be run on, it would easily run on a PC with 2GB of RAM using R.

Many government agencies don’t have so many records that analysis in R would be prevented either. Take the State of Ohio’s Department of Tax. They have roughly 215,000 businesses that have one employee or more. The Columbus City Schools, which is the largest public school district in the state, has fewer than 60,000 students. Both of these organizations’ data could fit on a decent desktop PC for analysis using WPS and R for statistical processing. Even using a 32 bit OS, the argument that R is not useful for large data sets is a red herring in my opinion. When we see R running on a Windows 64 bit platform, the arguments will be even less credible. Now that we see beta versions of R running on Intel hardware for Macintosh, I suspect that the wait for a 64 bit Windows version is not too far off.
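A quick back-of-the-envelope calculation shows why these row counts are comfortable. The Python sketch below assumes 100 numeric columns per record, which is a figure picked for illustration and not one taken from the projects above, and it relies on R storing numeric values as 8-byte doubles.

```python
BYTES_PER_DOUBLE = 8   # R stores numeric values as 8-byte doubles


def frame_megabytes(rows, numeric_columns):
    """Approximate size in MB (10**6 bytes) of an all-numeric data frame."""
    return rows * numeric_columns * BYTES_PER_DOUBLE / 1_000_000


# Row counts mentioned above; the 100-column width is an assumption.
for rows in (55_000, 80_000, 145_000, 215_000):
    size = frame_megabytes(rows, 100)
    print(f"{rows:>7,} rows x 100 columns ~ {size:.0f} MB")
```

Under that assumption the four data sets come to roughly 44, 64, 116 and 172 MB. R does make working copies of the data while fitting models, so real memory use will be some multiple of these figures, but that still leaves room on a 2GB machine.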

WPS2R Bridge v2

July 9th, 2008

Over the last few weeks, readers of SAS-L have been witnessing a debate about R and its popularity, especially in regards to SAS’s statistical procedures. For those who have missed all the postings, I’ll summarize what I think are the most important points that have been made. R is indeed popular, especially the graphics routines. Using R in place of SAS for statistical procedures has become more common, and major universities and corporations are looking at R and open source software not necessarily as a replacement for SAS/STAT (though indeed it can be just that) but as complementary software. R is used by pharma companies as well as by the FDA for research and validation. I know that R is also being used in Risk Management in banking for its capabilities.

R is also being used for its advanced statistical routines that SAS does not have. There’s lots of criticism that SAS has fallen behind on, if not ignored, many of the advanced statistical routines that have become available over the last few years. Bayesian statistics is one area that SAS has ignored until the latest release. It’s widely acknowledged by many professional SAS developers that the Institute is putting all its time and effort into vertical market applications and is ignoring the developer who has to write SAS code for a living. Just look at the lack of development of the SAS IDE over the last 15 years if you need proof.

It seems that R is becoming mainstream. So, for WPS users this is indeed exciting. By using the WPS2R Bridge, you can use WPS and R together and not leave your favorite programming environment and have access to R for statistics. Currently, the WPS2R Bridge is only operational with WPS and I will soon be making this free software even more robust. I now have a working prototype where you can pass macro variable values from your WPS programs to R.
 

Right about now, you’re probably doing your best Gary Coleman impression and asking “Whatcha talkin about Willis?”

Being able to use macro variables in R, generated from a WPS program, is powerful. This gives you the ability to reuse much of your R code, callable from within WPS, without the hassles of changing your R programs. Macro substitution allows you to more easily automate your R software that has been integrated into WPS. For example, you can modify/substitute variable names, pass directory and file names to R, as well as pass almost any values that R might need. You will still use the same syntax, i.e. use the ampersand sign in front of text that you want resolved. This cuts down your learning curve dramatically and makes life a lot easier for developers and consultants who have to review and modify code.
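For anyone who wants to see the idea in miniature, here is a small Python sketch of ampersand substitution applied to R source text. It is an illustration only and not how the prototype is implemented: `resolve` is a hypothetical name, and leaving unknown references untouched is a choice made for this example.

```python
import re

# &name, optionally followed by a period that ends the reference (as in SAS).
_REF = re.compile(r"&([A-Za-z_][A-Za-z0-9_]*)\.?")


def resolve(r_source, macro_vars):
    """Replace &name or &name. with its value; leave unknown references alone."""
    values = {name.upper(): str(value) for name, value in macro_vars.items()}

    def swap(match):
        name = match.group(1).upper()        # macro names are case-insensitive
        return values.get(name, match.group(0))

    return _REF.sub(swap, r_source)


# Hypothetical usage: the same R template reused for different inputs.
template = 'df <- read.csv("&dir/&file..csv")\nsummary(lm(&y ~ &x, data = df))'
print(resolve(template, {"dir": "C:/data", "file": "sales",
                         "y": "revenue", "x": "price"}))
```

One thing to watch for is that R uses `&` as a logical operator. An expression written with spaces, such as `a & b`, is left alone by this pattern, but `a&b` would look like a macro reference, so a real implementation needs a rule for that case.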

One of my goals with the WPS2R Bridge is to continue the integration of R and WPS. Hopefully, with a release of an ODBC driver from World Programming, we will be able to transparently read WPS data sets in R. Between extending the macro language to R and reading native WPS data sets, developers, researchers and analysts can begin building production based statistical systems based on WPS and enjoy the tremendous cost savings afforded by using such a combination.

By the end of the month, I’ll post a new release of the WPS2R Bridge that implements the macro feature. I need some time to test the code and write some documentation for this. If you have any questions or suggestions, let me know by sending an email to: BlogPost@MineQuest.com