Category Archives: WPS V3

Why WPS v3.3 is important

A few weeks ago, I posted a blog about the new release of WPS v3.3. Quite a few people and companies have been waiting for this release, and I want to talk about why this is such a historic release. I also want to write about what this release means to data scientists.

First, this release is truly a blockbuster. With the Interop for R and Python modules, this is, as far as I am aware, the first time a software product has allowed a developer to easily use the language of SAS, R and Python to implement a program, project or product, all in one development environment. As data science has evolved, R and Python have become more popular, but BOTH lack the data management capabilities of WPS. With WPS, you not only get the language of SAS to process your data, but you also get a multitude of database engines to read, write and access data in the most popular databases.

The beauty of such integration is ease of development. If you have been a developer for as long as I have, you know how much time it takes to learn a new development environment. Now, with WPS v3.3, a developer can stay within a single development environment for all of their analytical development needs. The developer doesn't have to learn an R IDE to write, test and execute R code, and the same can be said for Python. The Python interface in WPS gives you development control as well as execution control of Python programs, just as the R interface does for R programs.

Organizations want to take advantage of R and Python integration because it allows them to create and expand programs and projects. It will quickly become apparent to third-party developers (i.e., those who want to create vertical market applications) that these two additional languages radically expand their tool sets going forward. For many, it also means running self-contained production jobs, where control and execution are handled by WPS rather than by a collection of separate tasks or processes that each have to be managed individually.

I have been fortunate to have had access to Alpha and Beta builds of WPS v3.3 and was totally blown away by the improvements and additions that I saw. For example:

  • The data step is faster.
  • Inclusion of the Python programming language.
  • Implementation of Proc IML.
  • Faster data engine access for many databases including multi-threaded loading.
  • More complete graphics output in statistical procedures.
  • PDF support.

But what makes this one of the most compelling releases is the integration of R and Python. With Python specifically, you can now execute machine learning code from WPS. I'm sure many of you who read this blog are involved in credit scoring, fraud detection, anti-money laundering, market basket analysis, loyalty programs and other real-time analytics. With Python and the OpenCL and CUDA libraries, you can now perform incredibly high-speed processing of data on your desktop or server GPU.

When looking at the breadth of the WPS offering, it’s amazing how much is included for the data scientist who needs to work in multiple languages supporting data analytics. With all the database engines (including Hadoop), WPS Graphics, IML, R, Python, WPS Statistics, and WPS Time Series, I almost faint thinking what the cost would be for something similar from our competitor SAS Institute.

The other aspect of WPS v3.3 that is so enticing is the licensing. There are two components to this that deserve discussion. The first is that Data Service Providers (DSPs) can use this software to develop statistical and mathematical models and provide them to third parties. Let's face it, many if not most organizations lack the in-house expertise to develop these models themselves.

The second component is the creation of Vertical Market Applications (VMAs) using the WPS software. The very reasonable cost of the software is a driving factor in being able to create and resell your VMA at a price that small, mid-size and large companies can all afford. Using WPS as the basis for your VMA is advantageous because you're not beholden to another organization that requires you to pay partnership fees for access to the software and marketing. I think you would be shocked to learn how competitive, and perhaps ruthless, a software vendor can be when it comes to introducing and pricing a competing VMA.

If you want to learn more about the latest release of WPS v3.3, especially as it pertains to workstations, read the previous blog post and download the latest brochure for v3.3. You can request an evaluation of WPS v3.3 by contacting info@minequest.com or by filling out the evaluation or quote request form on the MineQuest website.

I hope everyone had a Merry Christmas, and I wish everyone a Happy New Year.

 

About the author: Phil Rack is President of MineQuest Business Analytics, LLC located in beautiful Tucson, Arizona. Phil has been a SAS language developer for more than 25 years. MineQuest provides WPS consulting and contract programming services and is an authorized reseller of WPS in North America.

WPS v3.3 Now Available

On Thursday, December 15th, World Programming Limited (WPL) introduced WPS v3.3. This new version is available for immediate download. With v3.3, WPS includes a slew of new procedures that will be of great value to those who hold WPS licenses and to those who are looking to convert from SAS to WPS.

New Language Procedures

Matrix language support is now available with PROC IML. PROC IML is included as a standard procedure in WPS and is not an additional-cost module. A 430-page programming guide in PDF format, detailing how to use the Matrix Programming Language, is included in the installation folder.
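As a minimal sketch of the kind of matrix code PROC IML runs, the step below solves the ordinary least squares normal equations b = (X'X)^-1 X'y. It assumes WPS follows the standard IML syntax (braces for matrix literals, T() for transpose, INV() for the inverse, and CREATE/APPEND/CLOSE to save a matrix as a data set). The toy matrices and the OLS_B data set name are illustrative only.

```sas
proc iml;
   /* Toy design matrix: a column of ones (intercept) plus one predictor */
   x = {1 1,
        1 2,
        1 3,
        1 4};
   y = {2, 4, 6, 8};                    /* y is exactly 2 times the predictor */
   b = inv(t(x) * x) * t(x) * y;        /* OLS normal equations */
   print b;                             /* intercept near 0, slope near 2 */

   /* Save the estimates as a WORK data set so later steps can use them */
   create ols_b from b[colname="estimate"];
   append from b;
   close ols_b;
quit;
```

Because y is an exact multiple of the predictor, the intercept should come out as 0 and the slope as 2, which makes the result easy to check by eye.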

Python support is now included in WPS with PROC PYTHON. PROC PYTHON allows a WPS user to create, edit and invoke Python programs from within WPS. The implementation of PROC PYTHON is very similar to that of PROC R. PROC PYTHON is included in WPS and is not an additional-cost module.

ODS Support

WPS now supports PDF as an output destination, in addition to the HTML and text destinations. Note that PDF support is currently available on all platforms except z/OS.
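A minimal sketch of the usual ODS "sandwich" for PDF output, assuming WPS accepts the standard ODS PDF FILE= and ODS PDF CLOSE statements. The output path and the DECADES data set (rounded US census populations in millions, 1970 to 2010) are illustrative.

```sas
/* Rounded US census populations in millions, 1970-2010 */
data decades;
   input year pop2 @@;
   datalines;
1970 213.3 1980 236.5 1990 258.7 2000 291.4 2010 308.7
;
run;

/* Hypothetical output path: adjust for your system */
ods pdf file="c:\temp\census_summary.pdf";

proc print data=decades noobs;
   title "US Population (millions), 1970-2010";
run;

ods pdf close;   /* closes the PDF destination and finishes writing the file */
```

Everything produced between the two ODS PDF statements goes to the PDF file; the other open destinations keep receiving output as usual.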

New Statistical Procedures

PROC ACECLUS – Provides two methods for approximating the within-cluster covariance structure for a clustering model, under the assumption that the clusters are multivariate Gaussian with equal covariance matrices.

PROC CANCORR – Identifies and measures the associations among two sets of variables.

PROC GENMOD – Fits generalized linear models.

PROC LIFEREG – Fits parametric accelerated failure time models to data that may be left-, right- or interval-censored.

PROC LIFETEST – Estimates non-parametric survival functions in the presence of censored data using Kaplan-Meier or actuarial methods.

PROC LOESS – Fits non-parametric regression surfaces to multi-dimensional input data. The smoothness of the non-parametric model can be controlled. Outliers in the input data are detected.

PROC MI – Performs multiple imputation of missing values in an input data set.

PROC MIXED – Fits a mixed linear model to input data.

PROC MODECLUS – Clusters observations using non-parametric density estimation and produces various cluster output statistics.

PROC PHREG – Fits the Cox proportional hazards model to survival data.

PROC PROBIT – Fits binary or ordinal response regression models, useful for dose-response type analysis. Various types of models are supported by the procedure. Parameter estimates are generated through maximum likelihood estimation. Model fit statistics enable the quality of the generated model to be assessed.

PROC VARCOMP – Estimates variance components for general linear models with random effects, where the associated covariance matrix is assumed to be diagonal.

Note that WPS Statistics is included in the cost of a WPS license and is not a module that needs to be licensed separately at an additional cost.

New Graphics Procedure

PROC GBARLINE – The GBARLINE procedure has been added to WPS. This procedure lets you generate bar charts with plot data overlaid on the bars.
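A hedged sketch of a bar-line chart, assuming WPS follows the familiar GBARLINE syntax: a BAR statement whose SUMVAR= variable sets the bar heights, and a PLOT statement whose SUMVAR= variable supplies the overlaid line. The DECADES data set and the GROWTH variable are hypothetical illustrations built from rounded census figures.

```sas
data decades;
   input year pop2 @@;
   growth = dif(pop2);   /* change since the previous census, in millions; missing for 1970 */
   datalines;
1970 213.3 1980 236.5 1990 258.7 2000 291.4 2010 308.7
;
run;

proc gbarline data=decades;
   bar year / discrete sumvar=pop2;   /* one bar per census year */
   plot / sumvar=growth;              /* line overlaid on the bars */
run;
quit;
```

The DISCRETE option asks for one bar per distinct YEAR value instead of letting the procedure bin the numeric variable into midpoints.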

New Data Engine

XLSX Engine – This is a cross-platform engine that provides read and write access to files in Microsoft Excel format. You can process Excel data on any platform you choose and are no longer limited to Windows platforms. The XLSX engine is included in WPS and is not an additional-cost module.
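A minimal round-trip sketch, assuming the engine is referenced as XLSX on a LIBNAME statement and that each data set in the library maps to a worksheet in the workbook. The workbook path and the XL libref are hypothetical.

```sas
/* Hypothetical workbook path; the XL libref is illustrative */
libname xl xlsx "c:\temp\decades.xlsx";

/* Write a small table (rounded US census populations, millions) as a worksheet */
data xl.decades;
   input year pop2 @@;
   datalines;
1970 213.3 1980 236.5 1990 258.7 2000 291.4 2010 308.7
;
run;

/* Read the worksheet back into the WORK library */
data work.from_excel;
   set xl.decades;
run;

libname xl clear;   /* release the workbook */
```

Because the engine is cross-platform, the same program should run unchanged on Linux or OS X once the path is adjusted.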

Data Engine Enhancements

NETEZZAM – A replacement engine for the NETEZZA engine. NETEZZAM provides multi-threaded operation using a new architecture that enables significant performance increases. The NETEZZAM engine is included in WPS and is not an additional-cost module.

ORACLEM – A replacement for the ORACLE engine of prior releases. ORACLEM is also multi-threaded, bringing performance increases. The ORACLEM engine is included in WPS and is not an additional-cost module.

Both of the above engines support bulk-loading data.

There are a number of additional language features and Workbench features that are worth investigating as well. WPS v3.3 is a major release that augments the language, the procedures and the overall functionality.

For a list of all the WPS procedures and database engines that are currently supported in v3.3, you can download a two-page brochure from MineQuest. This brochure lists the database engines that are supported on the Linux, OS X and Windows platforms, as well as language and PROC support.

PROC REG in WPS v3.2 – New Graphics and PMML

For those of you who have downloaded WPS v3.2, there are a number of new features. I want to show two of them using PROC REG. First, WPS now has the ability to create plots for PROC REG. Quite handy indeed!

Also, PROC REG in v3.2 has experimental support for PMML (Predictive Model Markup Language).

Here is some sample code that demonstrates both the plots and the PMML output.

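*--> Data is US census population data from 1790 to 2010;
data census;
   input year pop @@;                /* several year/pop pairs per input line */
   pop2 = Round(Pop/1000000,.1);     /* population in millions */
   popsq=pop2*pop2;                  /* squared term (not used in the model below) */
   lpop=lag(pop2);                   /* previous census population; missing for 1790 */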
cards;
1790 3929214 1800 5308483 1810 7239881 1820 9638453 1830 12860702 1840 17063353
1850 23191876 1860 31443321 1870 38558371 1880 50189209 1890 62979766 1900 76212168
1910 92228496 1920 106021537 1930 123202624 1940 142164569 1950 161325798
1960 189323175 1970 213302031 1980 236542199 1990 258709873 2000 291421906 2010 308745538
;;;;
run;

*--> PROC REG with the OUTPMML= option to write the model in PMML form;

filename outfile 'c:\temp\regpmml.txt';
proc reg data=census outpmml=outfile pmmlver="4_2" plots;
   model pop2 = year lpop;
   title "US Census Population - PROC REG";
run;

 

US Census Population - PROC REG
The REG Procedure
Model: MODEL1
Dependent variable: pop2

Number of Observations Read                    23
Number of Observations Used                    22
Number of Observations with Missing Values      1

Analysis of Variance
Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              2           206768        103384   9307.59   <.0001
Error             19        211.04266      11.10751
Corrected Total   21           206979

Root MSE            3.332793   R-Square   0.998980
Dependent Mean    111.704545   Adj R-Sq   0.998873
Coeff Var           2.983579

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1           -299.75395         71.30929     -4.20     0.0005
year         1              0.16607          0.03878      4.28     0.0004
lpop         1              0.97176          0.02754     35.28     <.0001

[Figures: residual plot and diagnostics panel produced by the PLOTS option]

The PMML output generated is:

<?xml version="1.0" encoding="utf-8" ?>
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
    <Header copyright="World Programming Limited 2002-2015">
        <Application name="World Programming System (WPS)" version="3.2.0"/>
    </Header>
    <DataDictionary numbeOfFields="5">
        <DataField name="year" optype="continuous" dataType="double"/>
        <DataField name="pop" optype="continuous" dataType="double"/>
        <DataField name="pop2" optype="continuous" dataType="double"/>
        <DataField name="popsq" optype="continuous" dataType="double"/>
        <DataField name="lpop" optype="continuous" dataType="double"/>
    </DataDictionary>
    <RegressionModel functionName="regression" targetFieldName="pop2">
        <MiningSchema>
            <MiningField name="year"/>
            <MiningField name="lpop"/>
            <MiningField name="pop2" usageType="target"/>
        </MiningSchema>
        <RegressionTable intercept="-299.753951850233">
            <NumericPredictor name="year" coefficient="0.166074316077245"/>
            <NumericPredictor name="lpop" coefficient="0.971762137737628"/>
        </RegressionTable>
    </RegressionModel>
</PMML>
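The RegressionTable element is simply the fitted equation pop2 = intercept + b_year × year + b_lpop × lpop. As a sketch, the DATA step below applies those PMML coefficients by hand to two census years: 1800 (with the 1790 population of 3.9 million as LPOP) and 2010 (with the 2000 population of 291.4 million). The SCORED data set and the POP2_HAT variable are illustrative names; any PMML consumer should compute the same predictions from the XML.

```sas
data scored;
   input year lpop;
   /* RegressionTable: intercept plus one coefficient per NumericPredictor */
   pop2_hat = -299.753951850233
              + 0.166074316077245 * year
              + 0.971762137737628 * lpop;
   datalines;
1800 3.9
2010 291.4
;
run;

proc print data=scored noobs;
run;
```

Scoring by hand like this is a quick way to confirm that a model exported to PMML reproduces the PROC REG parameter estimates.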

Interested in a free 30-day evaluation of WPS? If your organization is located in North America, simply fill out the Evaluation Request form on our website.

About the author: Phil Rack is President of MineQuest Business Analytics, LLC located in beautiful Tucson, Arizona. Phil has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is an authorized reseller of WPS in North America.

Post Installation Steps for WPS Workstations

We recently wrote a short technical document on a set of post installation steps that MineQuest Business Analytics recommends after you install WPS on your workstation. We are often asked what needs to be done after WPS is installed to get the greatest performance out of WPS without too much hassle.

The document walks you through modifying your WPS configuration file, moving your work folder to another drive, why you want to install R (for using PROC R, of course!), creating an autoexec.sas file, turning on write caching and a few other pointers. You don't need to do all of the suggestions; after all, they are just suggestions. But they are useful modifications that will enable you to get more out of WPS on your workstation.

You can find the document “Post Installation Steps for WPS Workstations” in the Papers Section of the MineQuest website.

About the author: Phil Rack is President of MineQuest Business Analytics, LLC located in Grand Rapids, Michigan. Phil has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is an authorized reseller of WPS in North America.

Who says only big companies can afford to utilize Business Intelligence?

One of the reasons I got into reselling WPS was the fact (and it's still a fact) that it's very expensive for a new firm or startup to utilize SAS products. Actually, it's prohibitively expensive. A commercial startup business is looking at $8,700 for a desktop license that provides access to BASE, GRAPH and STAT. That $8,700 is for the first year, and it doesn't include access to a database, Open Source R or reading and writing desktop files like Excel and Access. Add those necessities to the price and you are looking at more than $15,000 for the first year and more than $4,200 for renewal.

With WPS, our pricing is different. We kind of joke that whatever SAS does pricing-wise, we do just the opposite. We don't have a high barrier in terms of cost to start using our products. Actually, we encourage you to use our products! Currently, we charge $1,311 for a single desktop license. That's the cost for the first year, and it includes all the database engines that you would want.

We don’t have a high barrier to using the language. If you are already familiar with the language of SAS, then you are ready to go with WPS.

We don’t have a high barrier when it comes to accessing your SAS data sets. We can read and write SAS data sets just fine.

But enough about barriers, let’s talk about servers.

The pricing differential is even greater when you start looking at servers. You can license a small WPS server for less than $5,700. That's a two-LCPU server, and it includes all the bells and whistles that our desktop licenses include, meaning all the database access engines. The nice thing about our licensing is that we don't have client license fees. Client license fees are fees that you pay to be able to access the server you just bought! It's a stupid fee and we try not to do stupid things!

Another way we differ from our competitor is that we don't have Data Service Provider fees. Let's face it, many small companies (and large companies too) provide data and reports to their customers and vendors for further analysis and research. As a DSP, you will pay significantly more for your SAS license than what is listed. Expect to pay at least 30% more, and oftentimes a lot more.

If you're a startup, the message is clear. You probably don't have a lot of money to toss around, and cash flow is an issue. MineQuest has partnered with Balboa Capital to help companies manage their licensing costs. By working with Balboa Capital, you can pay a monthly amount toward your license. You will have to take out a two-year WPS license to qualify for the program, but it's an easy and efficient way to manage your resources.

On ErrorAbend

One issue I always had with the SAS System as a developer was what happened when a batch job encountered an error. The SAS System would set the number of observations to zero and go into syntax-checking mode for the remainder of the program.

This had some virtues, but more often than not the error was thrown because I had misspelled a variable name in a MEANS or FREQ statement that I was using to check my output. SAS would go into syntax-checking mode, and the rest of my program would not execute even though it was correct.

WPS, when running in batch, doesn't do this, but if you want the same effect for your batch jobs, it's easy enough to implement. Consider the following macro, %OnErrorAbend. When you call it after a PROC or DATA step, %OnErrorAbend checks that the program is not running in the foreground (&SYSENV is not FORE) and checks the value of the automatic macro variable &SYSERR. If &SYSERR is 3, the macro sets OPTIONS OBS=0 and writes a note to the log.

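%macro onerrorabend;
  /* Act only in batch: &SYSENV is FORE when running interactively */
  %if %eval(&syserr eq 3) and &sysenv NE FORE %then %do;
     options obs=0;   /* like syntax-checking mode: later steps read no observations */
     %put NOTE: WPS has been set with OPTION OBS=0 and will continue to check statements.;
  %end;
%mend onerrorabend;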

Below is a sample program that, when run in batch, puts the system into syntax-checking mode and effectively stops the execution of any downstream statements. The deliberate error is the misspelled variable IK in the TABLES statement of PROC FREQ.

/* Generate 2,000 rows of random test data, written to both A and B */
data a b;
do ii=1 to 2000;
  x=ranuni(0)* 10;
  y=Round(ranuni(0),.01)* 100;
  z=round(ranuni(0),.01)* 10000;
  

  a=ranuni(0)* 10;
  b=Round(ranuni(0),.01)* 100;
  c=round(ranuni(0),.01)* 10000;
  
  e=ranuni(0)* 10;
  f=Round(ranuni(0),.01)* 100;
  g=round(ranuni(0),.01)* 10000;
  
  i=ranuni(0)* 10;
  j=Round(ranuni(0),.01)* 100;
  k=round(ranuni(0),.01)* 10000;

  output;
end;
run;

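proc freq data=a;
tables ik;          /* deliberate error: IK is not a variable in A */
run;

%onerrorabend;      /* check &SYSERR from the PROC FREQ step */


proc means data=b;  /* processes no observations once OBS=0 has been set */
run;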

Banking, Financial Services and WPS

As a consultant as well as a reseller, MineQuest Business Analytics often has the opportunity to see and hear about the BI stack that our customers use. Some of these systems are incredibly complex, while others are remarkably simple.

One thing that we have seen time-and-time again is how the mainframe is used in the banking and finance industry. It’s been an evolving process, but the complex statistical systems seem to have moved to less expensive servers and heavy duty analytical workstations while the mainframe has become a repository for data.

Accessing data on the mainframe, whether it's in VSAM, Oracle or DB2, is a cinch when WPS is also on the mainframe. As analytics have moved away from the mainframe, the use of expensive software like our competitor's is being called into question. The mainframe is now typically used for MXG (computer performance analytics) and ETL work. Much of what is done on the mainframe is just the extraction and summarization of data that is then downloaded to the distributed systems that almost all banks and finance houses have in place.

Putting WPS on your mainframe can save you a lot of money over our competitor’s product on the same machine. If you have not already taken a look at WPS on z/OS you owe it to your company’s bottom line to investigate this product further. You will be pleasantly surprised at what you will see.

Finally, MineQuest Business Analytics can help your organization migrate your current processing to WPS. We can provide project management services as well as consulting, assessments, code review and code migration for your organization.

 

Creating a WPS Launch Icon in Ubuntu

I use Ubuntu as the Linux OS for WPS, and WPS is pretty easy to install on it. However, unlike the vast majority of people out there who run WPS in batch mode, I like to run it in interactive mode using the Eclipse Workbench. Hence I want an icon that I can click on to start WPS. Here's how to do it.

On the Ubuntu desktop, right-click on an empty part of the screen and you will get a small option menu. Click on “Create Launcher…” and a dialog box like the following pops up:

[Screenshot: the Create Launcher dialog box]

On my Ubuntu Linux Server, I installed WPS into a folder named wps-3.0.1. The directions below use that folder name as our example. You may have installed WPS into another folder so be sure to consider that when performing the tasks below.

Name: WPS 3.0.1

Command: /home/minequest/wps-3.0.1/eclipse/workbench

Comment: WPS 3.0.1 Linux

Click on the icon on the upper left hand of the Create Launcher Dialog Box (the little spring) and you will get a choose icon list box. Simply go to the WPS install folder and go into the eclipse folder. There you will find a file named icon.xpm. Click on icon.xpm and then click Open and then click OK.

That’s all there is to it. You should have the WPS icon installed and available from your desktop.

An Interesting Fact on Pricing

When you license WPS from MineQuest Business Analytics, you don’t pay a sales tax on your purchase unless your business has a nexus in the state of Michigan. Now there are a lot of businesses that don’t have nexus in this state. Our competitor however does have offices in many states (including Michigan) and thus must charge you sales tax for your home state.

If you look at the pricing comparison for a two-core WPS server vs. a two-core SAS server, the amount of sales tax you pay on the SAS license is often as much as, if not more than, the cost of the WPS server! At a 6.4% sales tax rate, a two-core SAS server with a first-year license fee of $85,423 incurs about $5,467 in sales tax. That's more than a two-core WPS server on x86 hardware costs.
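The sales-tax figure is simple arithmetic (85,423 × 0.064). A minimal sketch in a DATA step, using the fee and rate from the example; the TAX_CALC data set name is illustrative.

```sas
data tax_calc;
   license = 85423;            /* first-year fee for the two-core SAS server */
   rate    = 0.064;            /* 6.4% sales tax rate */
   tax     = license * rate;   /* sales tax owed on the license */
   put tax= dollar10.2;        /* write the result to the log */
run;
```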

It's hard to argue that a SAS Server license is not overpriced.

test, Test and TEST

I’m always amazed and somewhat peeved about how much error checking one has to have in their SAS language program. Even for the simplest things. So today’s mantra is test, Test, and TEST!

I was writing some code the other day to copy a WPS data set from my PC to a server share. The basic code is really quite simple: only three lines using PROC COPY. But what if the user has misidentified the source libname or the destination libname? Do you just let it blow up and hope the user looks at the logs? Then there are the data sets to be copied, if you are using the SELECT statement: do they actually exist? And do you check whether the data set already exists at the destination, and if so, just overwrite the file?

Although writing the code to check for these conditions was trivial, albeit time-consuming, it is well worth it. I purposely decided not to automatically overwrite an existing data set on the server, and that is good for two reasons. First, I want the user to be forced to make the decision to overwrite the data set by using PROC DATASETS or PROC DELETE before the copy takes place. That makes deleting the data set their responsibility.

Second, I found that writing a data set with the same name as an existing one can sometimes create problems under Windows when the server folder is shared. I have had experiences where Windows locks the file on the server and the copy never takes place. PROC COPY just hangs, leaving a file with a .lck extension. Something is going on there that makes it just not reliable.

One interesting thing to note: I don't seem to have the locking problem on Linux. The copy takes place without issue every single time.
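A minimal sketch of the pre-copy checks described in this post, written as a hypothetical %SAFECOPY macro (it is an illustration, not the original code). It assumes the LIBREF and EXIST functions are available through %SYSFUNC, as they are in the language of SAS. It refuses to overwrite an existing target and leaves a return code in the global macro variable COPY_RC: 0 = copied, 1 = bad source libref, 2 = bad destination libref, 3 = source data set missing, 4 = target already exists.

```sas
%macro safecopy(inlib=, outlib=, ds=);
   %global copy_rc;
   %let copy_rc = 0;
   /* LIBREF() returns 0 when the libref is currently assigned */
   %if %sysfunc(libref(&inlib)) ne 0 %then %do;
      %put ERROR: Source library &inlib is not assigned.;
      %let copy_rc = 1;
   %end;
   %else %if %sysfunc(libref(&outlib)) ne 0 %then %do;
      %put ERROR: Destination library &outlib is not assigned.;
      %let copy_rc = 2;
   %end;
   /* EXIST() returns 1 when the data set is present */
   %else %if not %sysfunc(exist(&inlib..&ds)) %then %do;
      %put ERROR: Data set &inlib..&ds does not exist.;
      %let copy_rc = 3;
   %end;
   %else %if %sysfunc(exist(&outlib..&ds)) %then %do;
      %put ERROR: &outlib..&ds already exists. Delete it with PROC DATASETS or PROC DELETE first.;
      %let copy_rc = 4;
   %end;
   %else %do;
      /* All checks passed: copy only the selected data set */
      proc copy in=&inlib out=&outlib;
         select &ds;
      run;
   %end;
%mend safecopy;
```

Usage, with hypothetical librefs PCLIB and SRVLIB: `%safecopy(inlib=pclib, outlib=srvlib, ds=mydata);` then test &COPY_RC in the calling program; any nonzero value means nothing was copied and the log explains why.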
