March 1st, 2010
We’ve had the Bridge to R out for about a week now and so far the feedback has been very positive. The Bridge is pretty stable and, I think, fairly easy to use once you set up R and install the Bridge itself.
Which brings us to what we may want to include in the next release. What we’re thinking about is including MPExec, and a nifty little macro called WPS2XML which allows you to create an XML file as well as generate a schema (DTD or XSD) from your WPS data set. The code for both utilities is already written and just requires testing in a more stressful environment.
So, what is MPExec? It’s a very robust utility that we wrote for WPS customers last year. MPExec stands for Multi-Processor Execution and it allows the WPS user to thread their programs so that they can run multiple parts of the program at the same time. On a multi-core desktop or server, one can dramatically reduce a program’s execution time, depending on how well the program can be threaded.
Most programmers (and SAS programmers especially) think in a top-down fashion when designing their programs. For example, you may have multiple steps to extract and clean data from a database, another set of steps that access data in a transport file that was created on the mainframe and also needs to be cleaned and sorted, and finally, some historical data that is sitting in another database on another server.
None of the three steps outlined above shares data with the others, so nothing forces them to run sequentially. Why not run these all at the same time on your multi-core desktop or server and save time? That’s exactly what MPExec allows you to do.
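To make the idea concrete, here is a minimal sketch in Python rather than WPS code. It is not MPExec and does not use MPExec syntax; the three function names are hypothetical stand-ins for the independent steps described above, and the sleeps simply simulate work.

```python
from concurrent.futures import ThreadPoolExecutor
import time


# Hypothetical stand-ins for three steps that share no data.
def extract_database():
    time.sleep(0.2)  # simulate extracting and cleaning database data
    return "database extract done"


def read_transport_file():
    time.sleep(0.2)  # simulate reading, cleaning and sorting a transport file
    return "transport file done"


def pull_historical_data():
    time.sleep(0.2)  # simulate pulling historical data from another server
    return "historical data done"


def run_in_parallel(tasks):
    """Start every task at once and return the results in task order."""
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [future.result() for future in futures]


if __name__ == "__main__":
    start = time.perf_counter()
    results = run_in_parallel(
        [extract_database, read_transport_file, pull_historical_data]
    )
    elapsed = time.perf_counter() - start
    for line in results:
        print(line)
    print(f"elapsed: {elapsed:.2f}s")
```

Because the steps do not depend on each other, the elapsed time should be close to that of the slowest step rather than the sum of all three. Steps that do share data would still have to wait for one another.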
I’m interested in hearing which utilities other users would like to see to enhance and expand their use of WPS. Currently, the Bridge to R and the accompanying utilities are only available on Windows platforms. Is there a need or interest to have them also execute on Linux/Unix/Solaris? Of course, we’re always interested in hearing ideas about how we can expand our utilities to include R in a more seamless fashion as well.
About the author: Phil Rack is President of MineQuest, LLC. and has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is a reseller of WPS in North America.
Tags: Bridge to R, MPExec, Open Source, Parallel Processing, R, RSTATS, WPS
February 24th, 2010
It’s been a while since I last posted to my blog. I’ve been very busy on two different projects as well as writing RFPs for WPS license sales. Looking back on the blog, the last post was January 27th. WOW!
So what’s the latest news? A new release of the Bridge to R is now available for WPS users. This release, v2.4, is a bit more robust and can recover from some programming errors that were formerly catastrophic.
But the highlight of version 2.4 is that it now includes support for GGobi for interactive data visualization. We’ve simplified calling GGobi and passing data to it so all you really need to do is type:
%ggobi( datasetname );
If you’re not familiar with GGobi, I suggest you visit www.ggobi.org and view the tutorials and demos on the website. There are some really nice things that they’ve implemented.
If you’re a WPS user, you will need version 2.4 of WPS to run the Bridge to R, because it makes use of some new WPS features that require v2.4.
If you have licensed WPS from MineQuest, and you’ve not received a copy of version 2.4 of the Bridge to R, contact sales@minequest.com and we’ll get a copy right out to you.
The Bridge to R is provided as a free utility from MineQuest only if you purchase your WPS license directly from us. If you’ve licensed a copy of WPS directly from World Programming or from another reseller, the Bridge to R can be purchased for $149 for a desktop license and $499 for a Windows Server license.
Posted in Bridge to R, R, RSTATS, data visualization, ggobi
January 27th, 2010
When you install your copy of WPS on your Windows PC or Windows server, there are a number of PDF files that describe the use of WPS. If you go to the help system and type “batch” in the search box, you will see a link that explains how to set up a Windows command file to run WPS in batch mode.
I have a command file, RunWPS.CMD, that I use on a pretty regular basis and that provides some additional information on the WPS job you’re executing. RunWPS adds two things to the batch program that I personally like to see. First, it writes to the screen whether the job executed correctly or ended with an error. If there’s an error, it tells you the type of error based on the ErrorLevel value returned by WPS. Second, it gives you the start time, stop time and elapsed time of the job.
You run the batch program by simply typing:
RunWPS myprogram.sas
If you want to run your WPS program at a priority other than NORMAL, you can do that too. For example, if you want to run your program in the background at low priority, you can type:
Start /Low /B RunWPS myprogram.sas
or at high priority by using the command:
Start /High /B RunWPS myprogram.sas
After you download the program, you will have to change line 10 in RunWPS.CMD to reflect the location where WPSI.EXE is installed. On 64-bit Vista and 64-bit Windows 7, WPS installs by default to:
c:\Program Files (x86)\World Programming WPS 2
So you will want to modify line 10 of RunWPS.CMD to read:
SET wpsloc=c:\Program Files (x86)\World Programming WPS 2
If you are running a 32-bit operating system, the default install location for WPS is c:\Program Files\World Programming WPS 2
In that case, change line 10 of RunWPS.CMD to read:
SET wpsloc=c:\Program Files\World Programming WPS 2
You can download the 2kb zip file at: http://www.minequest.com/downloads/RunWps.zip
The program is fairly well documented and you can easily change the format of the start, stop and elapsed time from hh:mm:ss to hh:mm:ss:ms. Most of the program deals with calculating elapsed time, so don’t be scared by all the statements dealing with time! Finally, one caveat: the program will not properly report the elapsed time if it is more than 24 hours.
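For readers who want to see the same idea outside of a Windows command file, here is a minimal sketch in Python. It is not the contents of RunWPS.CMD; the function names are my own, and it only distinguishes a zero exit code from a non-zero one because the specific ErrorLevel values WPS returns are not listed in this post.

```python
import subprocess
import sys
import time
from datetime import datetime


def format_elapsed(seconds):
    """Format a duration as hh:mm:ss; the hours field keeps counting past 24."""
    total = int(round(seconds))
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def run_and_time(command):
    """Run a command, print start/stop/elapsed times, and return its exit code."""
    started = datetime.now()
    tick = time.monotonic()
    completed = subprocess.run(command)
    elapsed = time.monotonic() - tick
    stopped = datetime.now()

    if completed.returncode == 0:
        print("Job completed without error.")
    else:
        print(f"Job ended with exit code {completed.returncode}.")
    print(f"Start:   {started:%H:%M:%S}")
    print(f"Stop:    {stopped:%H:%M:%S}")
    print(f"Elapsed: {format_elapsed(elapsed)}")
    return completed.returncode


if __name__ == "__main__" and len(sys.argv) > 1:
    # Assumed usage: python run_and_time.py <command> [arguments...]
    sys.exit(run_and_time(sys.argv[1:]))
```

Measuring the elapsed time as a plain number of seconds, instead of subtracting clock times, is what lets this version report jobs that run longer than 24 hours.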
Tags: Batch, SAS Replacement, Windows, WPS
January 26th, 2010
Over the last few weeks, we’ve been working a bit on the next release of the Bridge to R for WPS. For those of you who are not familiar with the Bridge to R, it allows you to execute R programs from within the WPS Workbench and return the log and list files back into the WPS GUI.
The next release will contain a couple of program fixes but the major feature that we will be adding is the integration of a data visualization system. The visualization system that the Bridge to R will be talking to is called GGobi. GGobi is open source and allows you to perform analysis using highly interactive graphics in a dynamic fashion. Here’s a screenshot of GGobi executing via WPS.
[Screenshot: GGobi executing via WPS]
What the interface into GGobi does for you, the programmer analyst, is automate loading the data into GGobi and calling the software for execution.
To get an idea of what GGobi can provide to you, I suggest you visit their website and take a look at the tutorials and demos at http://www.ggobi.org/docs/
Currently, we have the Bridge to R talking to GGobi through an R interface. This works OK but leaves the R GraphicsViewer on the screen and looks much more cramped than necessary. One thing we are going to try to do is get rid of the R GraphicsViewer window that pops up and talk more directly to GGobi, bypassing R.
If we write a version that bypasses R, you will only have to download and install GGobi on your workstation. Using R, the method you will use to invoke GGobi can be seen in the sample code below:
%Rstart(csv,mydata,GRAPHWINDOW);
datalines4;
library(rggobi)
g <- ggobi(mydata)
;;;;
%Rstop;
There are some advantages to writing code as above, especially if you like to code in R. However, for most folks I think what we are really striving for is to have an interface that is simple to call and easy to remember. For example, I think the following code snippet is much simpler:
%GGOBI( dsetname );
where dsetname is the name of the data set that contains the data you want to visualize.
Right now, we’re looking at at least a few weeks before the next version of the Bridge to R is available. We’ll have to update the documentation to reflect what has to be done for installation as well as thoroughly test and debug the software. But things are looking good!
Posted in Bridge to R, Business Intelligence, Open Source, R
January 19th, 2010
In my previous blog post I discussed a little history as well as opinion on the pitfalls of creating a Vertical Market Application (VMA) using the SAS System. In this post, I want to discuss some alternatives that you might want to consider to either (1) port your application to WPS or (2) create your VMA from scratch to run under WPS.
To me, the major drawback to using SAS to create a VMA is cost and license complexity. You constantly have to worry about which SAS products you need to have in your portfolio. If you want to create graphics in your VMA, you need SAS/GRAPH; if you want to be able to connect to a myriad of database engines, you need to have numerous SAS/ACCESS engines included in your product, etc…
With WPS, that is taken care of for you. The price of the product is all inclusive and you don’t have to worry about which access engines you need to include because they’re already there! A lot of pain and paperwork is eliminated by taking the WPS Bundle approach.
Another pro to using WPS is the low cost of the product, especially when compared to SAS. If you’re developing an application that is server based, the difference between WPS and SAS pricing, and therefore in your cost, your customers’ cost, and inevitably your profits, is tremendous. Take a look at our WPS vs SAS cost comparison that we did back in October to get some real world numbers. A quick note here: this cost comparison is the most viewed and downloaded file on our website. I think that says something!
So we’ve talked about the advantage of WPS pricing; now let us take a look at licensing. As a developer, you are encouraged to create products that run on WPS software.
You may use the WPS SOFTWARE for the processing of third party data, programs and applications and for the creation of products that RUN on WPS SOFTWARE. You may process data produced or consumed by WPS SOFTWARE on other platforms.
What you are not allowed to do with WPS is sell time to other parties (people outside of your company) to use the software on your network or on a public network. Here’s a passage from the WP license agreement.
If you license the WPS SOFTWARE for server usage you are permitted to RUN the WPS SOFTWARE for both attended and unattended operation. Remote user sessions connected to the same server via a private network controlled by you may also RUN the WPS SOFTWARE on the same server without additional WPS SOFTWARE licenses. Concurrent usage by two or more users of the WPS software is permitted. Third party users are excluded from the right to access WPS SOFTWARE on the server unless the third party holds an appropriate WPS SOFTWARE license.
I want to point out that you don’t have to be a member of some "Partner Program" that costs $10K and up per year to develop a VMA with WPS. Funny, I’m writing this and looking at the SAS Partner Program web page and they have seven different categories and three levels of participation for their Partner Program. And yet, they only have three third-party VMAs that I can find. And even given that, they don’t list or promote these other VMAs. Sheesh!
Another advantage to using WPS to develop your VMAs is that you can run your application on all the platforms that WPS supports. Currently, WPS runs on Windows Desktops and Servers, Linux x86, Mac OS X on x86, Sun Solaris for SPARC, Sun Solaris x86, IBM’s AIX on pSeries, z/OS on System z, and Linux on System z. That’s a sufficiently large number of platforms to support development and sales of your application in virtually any corporate environment.
Tags: Business, Licensing, SAS, SAS Replacement, Vertical Market Applications, WPS
January 17th, 2010
This blog post is going to be in a couple of parts due to the nature and complexity of the subject, and because I want to cover a little history (as well as offer some personal opinion) for users creating Vertical Market Apps (VMA).
Have you ever wondered why there are so few vertical market applications for resale written in SAS by third party vendors? I have wondered about that and have some thoughts to offer on that matter too. But first, what are the third-party apps that are available? I can only think of a few: MXG, CA’s MICS, Futrix, and you can probably throw in Link King, although it is not offered for sale but provided at no cost. Even though Link King is offered freely, its quality is pretty decent and it could easily be offered for resale or even provided using SaaS (Software as a Service).
I think there are numerous reasons why there are so few VMAs written in SAS. First, relicensing SAS is prohibitively expensive, complex and confusing. Will SAS allow you to relicense its products if you create one that competes on some level with a VMA they currently have?
The second, and more important, issue is who owns the customer. Does SAS own YOUR customer because you’ve decided to create a SAS based VMA or do you own the customer?
From personal experience from when we were a SAS Quality Partner years ago, if you develop an application that competes on any level with something SAS has or is interested in moving into, you can expect to get the runaround on license issues and never get a clear answer. Unless you know what your costs are going to be, it’s impossible to create a pricing strategy and know how much development effort will have to be put into the product.
In our case, we were looking at creating (#1) a standardized test tracking system that would allow school districts to track a student’s progress over years and (#2) even assign students to classrooms based on likelihood of passing state mandated tests so schools would be less likely to be labeled deficient or as a failure. We had a working prototype, and when we approached SAS on licensing issues, we were given the runaround and they even refused to sign an NDA, though they were very keen on seeing what we had.
Let’s take a hypothetical example. If you are a financial services company that offers risk products such as credit scoring, fraud detection and portfolio evaluation (companies like Fitch, Moody’s, S&P, Fair Isaac, etc…), will SAS prevent you from reselling your software products if they are written using the SAS System because that’s an obvious market they want to be a player in?
The question, in my opinion, is how abusive is SAS when it comes to third parties wanting to develop VMAs using the SAS System? If they’re not technically a monopoly, are they acting as one when it comes to this kind of behavior? Does SAS have APIs that are only known to their development staff and are not available to others who are looking to develop VMAs? Is there a good reason why the SAS data set layout is proprietary in nature and not open for use without having to have the SAS System on your hardware?
These are all questions and issues you need to think about before spending lots of money and time when getting in bed with a large company. I think that a company that has been in business for 35+ years and has only three or four third-party VMAs tells a story about how difficult it is to work with them in creating applications for resale.
Tags: Business, Licensing, SAS, SAS Replacement, Vertical Market Applications, WPS
January 14th, 2010
After working for at least a week on revamping the MineQuest website, we can finally take a nice long breath. I’ve wanted to change the look of the site as well as reorganize the structure to make things easier to find and more efficient.
The website revolves around three areas that we are engaged in. First, is consulting and contract programming for WPS and SAS. The second area that we are featuring is WPS by World Programming. WPS is the SAS language alternative that runs SAS code on Windows, Linux, AIX, Solaris and Macintosh platforms. The third area that the website focuses on is the Bridge to R. The Bridge to R is our own product that allows WPS users to execute R from inside WPS.
When recreating the website, one thing we did was create two columns for most of the web pages. The column on the left contains information on how to quickly find information that we’ve identified through our weblogs that users are most likely looking for on those topics. The text on the right hand side offers information on the current page for the topic you requested.
We’ve also included pricing for WPS on the Windows Desktop and Server platforms that you can download in PDF format. Pricing for other platforms that we resell, such as Linux, Solaris, AIX and Macintosh, is available if you contact us and request a quote. We hope to roll out more pricing on the website in the near future.
Over the next few weeks, we’ll continue to flesh out the rest of the website, tweak it where it needs tweaking and add back in more downloads. The downloads that have been referenced in prior blog postings are still available so you can continue to download them, but we’ve not yet had time to categorize and post them on the new site.
Regarding the downloads section, there were a number of macros and such that are no longer necessary and will be deprecated. The newer versions of WPS have many of these functions built in, which makes these tools obsolete.
Tags: Linux, MineQuest, Pricing, R, SAS, SAS Replacement, WPS
January 13th, 2010
On Tuesday Jan-12-2010, the Inquirer ran a story with the title " British ‘David’ pokes US ‘Goliath’ — Gets Big Blue thumbs up" and I thought it was rather interesting for its tone but much more interesting is the content of the article.
World Programming (WP), the software developer behind WPS, an alternative SAS language system, has come out with a release for z/Linux. For those who are unaware, z/Linux is going to be a big deal in the mainframe market because of the way the hardware and OS are priced, and how software that runs on z/Linux is priced. The Inquirer does a good job at describing this and you can read the article for yourself at http://www.theinquirer.net/inquirer/news/1585633/british-david-pokes-us-goliath-eye. There’s also a press release on the WP website at: http://www.teamwpc.co.uk/press/zlinux_accreditation.
What I find interesting is that the familiarity people have with Linux and WPS will be directly applicable to running their existing applications on this platform. SAS doesn’t have a product on z/Linux and they would most likely have to undercut their z/OS pricing to be successful in selling on z/Linux. Basically, they would be cannibalizing one product for another.
Using WPS on z/Linux, you would have the ability to run the exact same WPS code on any of the Linux/Unix/Solaris platforms without change. There’s no JCL to worry about, so if you were smart, you would only have to modify some libname statements and some file statements that you either put at the beginning of your code or "included" in your job.
I personally think this is a fairly large market for WP and one where SAS has been caught napping.
Tags: Linux, Pricing, SAS, SAS Replacement, WPS, z/Linux
December 30th, 2009
I’ve been working on setting up some baseline testing of how Virtual Machines perform running WPS. This proved to be a bigger PITA than I originally thought because I realized the Linux VM I had was too small to do any testing with data sets of any appreciable size. Since there aren’t any easy-to-use tools for expanding a VirtualBox VM partition, I ended up just recreating the VM with a larger footprint.
What I wanted to test is how much faster WPS is when running on a native host and compare and contrast those timings with WPS running in an XP VM and in a Linux Fedora VM, all running under SUN’s VirtualBox software.
The VMs are set up to be as identical as I can make them. Each has 3GB of RAM and two logical CPUs dedicated to it. The host system is an Intel Quad-Core Q6600 with 8GB of RAM and 1.2 TB of hard drive space spanning four drives in a RAID-1 configuration. I set up the graphics card parameters to be identical in each machine as well.
One thing I did learn in this exercise is that there’s a fair amount of tuning you can do to a VM. That ranges from setting the number of cores dedicated to the VM to installing a real storage/IO driver inside the VM to get the best performance you can.
One mistake I made early on was trying to use a shared folder as the temp disk space for WPS. VirtualBox is slow as molasses reading and writing to a shared folder. Performance improved dramatically when I had WPS use the temp folder inside the VM instead of the shared folder.
What I ended up doing was writing a simple benchmark program that invoked as many WPS PROCS as I could (and that I typically use) as well as some data steps and SQL steps. I’m not trying to be exhaustive in writing the benchmark but I want to get a feel for how VirtualBox performs in contrast to running the same application in a non-VM environment. I also wanted to have some kind of realistic number of records being processed for the test runs. I decided to run a 100,000 record test and then a 1,000,000 record test.
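The distinction between real (elapsed) time and CPU time matters for reading the results, so here is a minimal sketch of how the two can be measured side by side. It is written in Python and is not the WPS benchmark program; `time_step` and `sum_of_squares` are hypothetical names, and the workload is a stand-in for a real procedure.

```python
import time


def time_step(step, *args):
    """Run step(*args) once and return (real_seconds, cpu_seconds, result)."""
    real_start = time.perf_counter()  # wall-clock time
    cpu_start = time.process_time()   # CPU time used by this process
    result = step(*args)
    cpu_seconds = time.process_time() - cpu_start
    real_seconds = time.perf_counter() - real_start
    return real_seconds, cpu_seconds, result


def sum_of_squares(n):
    """Stand-in workload that touches n 'records'."""
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    for n in (100_000, 1_000_000):
        real, cpu, _ = time_step(sum_of_squares, n)
        print(f"{n:>9,} records  real={real:.3f}s  cpu={cpu:.3f}s")
```

When a step spends most of its time waiting on disk, real time runs well ahead of CPU time, which is worth keeping in mind when comparing the Real and CPU columns for the VM runs.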
I’m afraid the table below will wrap unless you have your text size set to small. Just in case it does, the results are also available as an Excel spreadsheet.
Below are the results from the benchmark program.
All times are in seconds.

| Procedures and Data Steps | Obs. | Vista x64 Native Real | Vista x64 Native CPU | XP 3GB Vbox Real | XP 3GB Vbox CPU | Fedora 3GB Vbox Real | Fedora 3GB Vbox CPU |
|---|---|---|---|---|---|---|---|
| Statistically Oriented Procedures | | | | | | | |
| Proc Corr - 12 Vars | 100K | 0.453 | 0.452 | 0.57 | 0.55 | 0.586 | 0.135 |
| Proc Corr - 12 Vars | 1,000K | 4.252 | 4.258 | 5.187 | 5.157 | 5.718 | 1.804 |
| Proc Means - 19 Vars | 100K | 0.184 | 0.202 | 0.27 | 0.26 | 0.358 | 0.122 |
| Proc Means - 19 Vars | 1,000K | 1.768 | 1.887 | 2.343 | 2.323 | 3.383 | 0.425 |
| Proc Summary - 19 Vars | 100K | 0.131 | 0.14 | 0.27 | 0.18 | 0.32 | 0.03 |
| Proc Summary - 19 Vars | 1,000K | 1.357 | 1.404 | 1.802 | 1.782 | 3.349 | 0.439 |
| Proc Univariate - 19 Vars | 100K | 1.484 | 1.513 | 2.343 | 2.133 | 2.235 | 1.441 |
| Proc Univariate - 19 Vars | 1,000K | 13.768 | 14.258 | 18.957 | 18.887 | 18.327 | 11.127 |
| Proc Standard - 12 Vars | 100K | 0.493 | 0.499 | 0.65 | 0.64 | 0.922 | 0.17 |
| Proc Standard - 12 Vars | 1,000K | 5.006 | 5.257 | 11.626 | 6.99 | 8.829 | 1.459 |
| Proc Rank - 12 Vars | 100K | 0.425 | 0.452 | 0.62 | 0.61 | 1.016 | 0.348 |
| Proc Rank - 12 Vars | 1,000K | 5.129 | 5.288 | 11.416 | 7.57 | 13.047 | 4.494 |
| Proc Freq - two var crosstab | 100K | 0.157 | 0.156 | 0.3 | 0.21 | 0.391 | 0.078 |
| Proc Freq - two var crosstab | 1,000K | 1.567 | 1.638 | 2.193 | 2.183 | 3.634 | 0.491 |
| Data Manipulation | | | | | | | |
| Proc Append - Two 50K obs data sets, 24 Variables | 100K | 0.059 | 0.046 | 0.11 | 0.1 | 0.216 | 0.009 |
| Proc Append - Two 50K obs data sets, 24 Variables | 1,000K | 0.459 | 0.53 | 11.326 | 1.231 | 2.285 | 0.127 |
| Proc Sort - 24 Vars | 100K | 0.153 | 0.171 | 0.34 | 0.32 | 1.028 | 0.167 |
| Proc Sort - 24 Vars | 1,000K | 1.269 | 1.856 | 20.349 | 3.845 | 12.235 | 2.893 |
| Proc Datasets - Create Simple Index | 100K | 1.725 | 1.809 | 1.612 | 1.462 | 8.335 | 0.796 |
| Proc Datasets - Create Simple Index | 1,000K | 22.937 | 23.384 | 18.797 | 18.396 | 117.437 | 6.967 |
| Proc SQL - Simple Where Returns 10% of Obs. | 100K | 0.074 | 0.093 | 0.12 | 0.12 | 0.269 | 0.033 |
| Proc SQL - Simple Where Returns 10% of Obs. | 1,000K | 0.871 | 0.92 | 1.792 | 1.261 | 2.606 | 0.241 |
| Data Step Create Records | 100K | 0.341 | 0.343 | 0.56 | 0.48 | 0.556 | 0.261 |
| Data Step Create Records | 1,000K | 3.294 | 3.307 | 8.872 | 4.676 | 5.874 | 3.587 |
| Proc Transpose - 1 Var, 1 ID var, by var | 100K | 2.779 | 2.979 | 3.184 | 3.124 | 3.791 | 3.12 |
| Proc Transpose - 1 Var, 1 ID var, by var | 1,000K | 27.263 | 28.204 | 32.156 | 31.815 | 34.932 | 29.461 |
| Proc SQL - Join | 100K | 0.266 | 0.28 | 0.731 | 0.55 | 0.805 | 0.095 |
| Proc SQL - Join | 1,000K | 5.783 | 6.676 | 35.1 | 13.669 | 23.381 | 3.72 |
| Data Step Merge | 100K | 0.407 | 0.39 | 1.291 | 0.68 | 0.932 | 0.199 |
| Data Step Merge | 1,000K | 3.883 | 4.087 | 9.233 | 6.719 | 6.197 | 2.712 |
| Reporting | | | | | | | |
| Proc Tabulate - 2 Vars | 100K | 0.238 | 0.202 | 1.001 | 0.25 | 0.442 | 0.165 |
| Proc Tabulate - 2 Vars | 1,000K | 2.11 | 2.262 | 2.984 | 2.804 | 4.458 | 1.88 |
| Proc Print - 1,000 Obs. 24 Vars | 100K | 0.09 | 0.093 | 0.931 | 0.1 | 0.119 | 0.09 |
| Proc Print - 1,000 Obs. 24 Vars | 1,000K | 0.087 | 0.062 | 0.1 | 0.09 | 0.118 | 0.095 |
| Data Access | | | | | | | |
| Create Transport File - 24 Vars | 100K | 0.254 | 0.187 | 0.69 | 0.3 | 0.62 | 0.063 |
| Create Transport File - 24 Vars | 1,000K | 2.973 | 1.856 | 9.103 | 3.304 | 6.204 | 1.056 |
| Read Transport File - 24 Vars | 100K | 0.232 | 0.249 | 0.38 | 0.37 | 0.633 | 0.197 |
| Read Transport File - 24 Vars | 1,000K | 2.341 | 2.355 | 7.3 | 4.025 | 6.351 | 1.412 |
| Create SPSS file - 24 Vars | 100K | 0.167 | 0.156 | 1.061 | 0.34 | 1.29 | 0.154 |
| Create SPSS file - 24 Vars | 1,000K | 1.707 | 1.887 | 5.467 | 3.505 | 8.55 | 0.769 |
| Read SPSS file - 24 Vars | 100K | 0.222 | 0.218 | 0.971 | 0.35 | 0.873 | 0.12 |
| Read SPSS file - 24 Vars | 1,000K | 2.19 | 2.308 | 7.34 | 3.975 | 7.451 | 1.657 |
| Proc Import - CSV; 24 Vars | 100K | 0.816 | 0.811 | 1.161 | 1.091 | 1.386 | 0.446 |
| Proc Import - CSV; 24 Vars | 1,000K | 8.262 | 8.58 | 12.788 | 11.055 | 13.74 | 7.045 |
| Proc Export - CSV; 24 Vars | 100K | 1.816 | 1.887 | 2.343 | 2.153 | 3.077 | 2.244 |
| Proc Export - CSV; 24 Vars | 1,000K | 17.987 | 18.517 | 22.882 | 21.971 | 30.306 | 20.288 |
| Data Management | | | | | | | |
| Proc Delete | 100K | 0.002 | 0 | 0.1 | 0.1 | 0.014 | 0.005 |
| Proc Delete | 1,000K | 0.11 | 0 | 0.2 | 0.2 | 0.034 | 0.004 |
| Proc Copy - 24 Vars | 100K | 0.88 | 0.109 | 0.24 | 0.23 | 0.62 | 0.063 |
| Proc Copy - 24 Vars | 1,000K | 2.973 | 1.856 | 9.103 | 3.304 | 4.629 | 0.187 |
| Proc Contents | 100K | 0.002 | 0 | 0 | 0 | 0.003 | 0.001 |
| Proc Contents | 1,000K | 0.008 | 0 | 0.781 | 0 | 0.002 | 0 |
| Graphics | | | | | | | |
| Proc Plot | 100K | 0.12 | 0.14 | 0.64 | 0.22 | 0.453 | 0.083 |
| Proc Plot | 1,000K | 1.243 | 1.294 | 2.113 | 2.093 | 4.147 | 0.577 |
| Proc Chart | 100K | 0.06 | 0.062 | 0.16 | 0.11 | 0.217 | 0.041 |
| Proc Chart | 1,000K | 0.594 | 0.624 | 1.171 | 1.131 | 1.981 | 0.269 |
| Proc Gplot | 100K | 0.258 | 0.28 | 0.46 | 0.41 | 1.135 | 0.179 |
| Proc Gplot | 1,000K | 2.435 | 2.371 | 5.247 | 3.785 | 9.985 | 1.319 |
| Proc Gchart | 100K | 0.081 | 0.109 | 0.14 | 0.11 | 0.237 | 0.049 |
| Proc Gchart | 1,000K | 0.613 | 0.592 | 1.131 | 1.121 | 2.046 | 2.22 |
| Total of all Times | | 158.608 | 161.546 | 302.108 | 206.42 | 394.115 | 119.629 |
One of the first things that jumps out at me is the elapsed time for creating an index in the Linux VM. The test took 117 seconds to complete. At first I thought this might just be an artifact of running a large dataset in a virtual machine but I’m now of the belief that this is a reflection of a first release of WPS on Linux and this aspect of it needs some performance tuning.
The other issue I see (and I don’t have the CIS background to figure this out on my own) is the CPU time for the Linux tests. Those times seem rather small to me on many of the PROCS that I benchmarked, especially when you contrast them with the Windows tests.
Other than those two issues, I don’t think anything else really jumps out at me. The performance timings you see in the chart are pretty indicative of how other applications run in terms of real time. I will say that with some work and perhaps throwing $1,000 at the problem, I could get the timings for the VMs to drop by 30% or 40%. I’d start by adding the Intel Matrix Storage Controller in the VMs as well as adding some disk storage so that I had two additional virtual disks. That way, I could have a separate temp work space as well as a disk for my permanent files and be able to read and write simultaneously. I’m certain I could get I/O down considerably.
In my opinion, if you have modest size data sets that you process on a regular basis (modest meaning no more than a few million records), running WPS or most any other BI application in a VM is quite possible. The reason for running a BI application in a VM is to save money, in that you only pay for the cores that you need for your application and not all the cores on the server. Hence, if you have a 16 core server and you only need six cores for your research group, you only pay for six cores. That holds for WPS licensing but not SAS licensing. The greedy folks at SAS want you to pay for all 16 cores whether you use them or not.
I hope this information helps those folks who are looking at WPS and thinking about running it in a VM. I could have included some other PROCS such as LOGISTIC, REG and COMPARE, but quite frankly I became exhausted doing this study.
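To put numbers on both observations, here is a small Python sketch that works only from the "Total of all Times" row of the table. The function names are my own; the figures are copied from that row and nothing else is assumed.

```python
# Totals copied from the "Total of all Times" row of the results table (seconds).
totals = {
    "Vista x64 Native": {"real": 158.608, "cpu": 161.546},
    "XP 3GB Vbox": {"real": 302.108, "cpu": 206.42},
    "Fedora 3GB Vbox": {"real": 394.115, "cpu": 119.629},
}


def slowdown(platform, baseline="Vista x64 Native"):
    """Total real time of a platform relative to the native run."""
    return totals[platform]["real"] / totals[baseline]["real"]


def cpu_share(platform):
    """Total CPU time as a fraction of total real time."""
    return totals[platform]["cpu"] / totals[platform]["real"]


if __name__ == "__main__":
    for name in totals:
        print(
            f"{name:<18} slowdown={slowdown(name):.2f}x  "
            f"cpu/real={cpu_share(name):.2f}"
        )
```

On these totals the XP VM comes out at roughly 1.9 times the native real time and the Fedora VM at roughly 2.5 times, while reported CPU time is only about 30% of real time in the Fedora VM compared with about 68% in the XP VM. Keep in mind that the single 117 second index test accounts for a large part of the Fedora total.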
Tags: Cloud Computing, Licensing, Linux, SAS, SAS Replacement, Vbox, Virtual Box, Windows, WPS
December 27th, 2009
I certainly hope that everyone has had a great Christmas. I always enjoy this holiday, probably the most of any. It seems like the week between Christmas and New Years is always slow. Maybe this is a godsend because I’ll have some time to play with software!
I discovered a great blog called "The Fat Bloke Sings" that focuses quite a bit on VirtualBox. This is a cool site and the author has lots of information and video tutorials on using VirtualBox. The video on teleportation is worth watching and definitely shows the need for some scripts to automate and more easily allow the transport of a running VirtualBox VM from one machine to another.
After watching the video, I couldn’t help but think about how this could be used in the BI space, specifically if you are running critical systems or real-time BI. I’ve never had great success in terms of speed and performance when running SAS or WPS (or any other disk intensive BI program for that matter) in a VM. I typically use VMs for development purposes and not production work. That said, I do know from reading the latest spec sheets on Vbox that they’ve been able to increase I/O speed as well as paging. But if you are running a system that hits large datasets on a regular basis, I think a VM solution could be problematic.
Another thing I’m interested in is whether there’s any real speed difference between running WPS on a Linux system and on a Windows system. I have a very generic system setup where I have both Windows XP and Fedora 11 installed on a Quad-Core machine. The hard drives are pretty vanilla 500GB, 7200 RPM drives with Linux and Windows installed on disk 0, and disk 1 partitioned between Linux and Windows with a 250GB work drive between the two. Finally, I have disk 2 as a drive with the permanent data. This is probably the fairest way to do a real comparison between the two.
If I get a chance, perhaps I’ll do some benchmarks on a system looking at disk I/O, Real time and CPU time between the two platforms. I’ve heard from the Linux and Unix zealots that the ‘nix platforms are faster than Windows but I’ve never seen a direct comparison.
Tags: Business Intelligence, Linux, Windows, WPS, WPS 2.4