Tag Archives: Graphics

PROC REG WPS v3.2–New Graphics and PMML

For those of you who have downloaded WPS v3.2, there are a number of new features. I want to show two of them using PROC REG. First, WPS can now create plots for PROC REG. Quite handy indeed!

Second, PROC REG in v3.2 adds experimental support for PMML (Predictive Model Markup Language).

Here is some sample code that demonstrates the plots.

*--> Data is census population data from 1790 to 2010;
data census;
   input year pop @@;
   pop2 = Round(Pop/1000000,.1);
   popsq=pop2*pop2;
   lpop=lag(pop2);
cards;
1790 3929214 1800 5308483 1810 7239881 1820 9638453 1830 12860702 1840 17063353
1850 23191876 1860 31443321 1870 38558371 1880 50189209 1890 62979766 1900 76212168
1910 92228496 1920 106021537 1930 123202624 1940 142164569 1950 161325798
1960 189323175 1970 213302031 1980 236542199 1990 258709873 2000 291421906 2010 308745538
;;;;
run;

*--> PROC REG with the PMML attribute to output the model in PMML form.;

filename outfile 'c:\temp\regpmml.txt';
Proc Reg data=census outpmml=outfile pmmlver="4_2" plots;
model pop2 = year lpop;
Title "US Census Population – PROC REG";
run;

 

US Census Population – PROC REG
The REG Procedure
Model: MODEL1
Dependent variable: pop2

Number of Observations Read                   23
Number of Observations Used                   22
Number of Observations with Missing Values     1

Analysis of Variance
Source             DF   Sum of Squares   Mean Square   F Value   Pr > F
Model               2           206768        103384   9307.59   <.0001
Error              19        211.04266      11.10751
Corrected Total    21           206979

Root MSE            3.332793   R-Square   0.998980
Dependent Mean    111.704545   Adj R-Sq   0.998873
Coeff Var           2.983579

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1           -299.75395         71.30929     -4.20     0.0005
year         1              0.16607          0.03878      4.28     0.0004
lpop         1              0.97176          0.02754     35.28     <.0001

[Figure: residual plot]

[Figure: diagnostics panel]
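The listing reports 23 observations read but only 22 used. That is consistent with the LAG function returning a missing value on the first observation, so the 1790 row has no lpop and drops out of the regression. As a quick cross-check outside WPS, here is a minimal Python sketch that rebuilds pop2 and lpop from the same census counts. It is not WPS code, the names build_census, rows and used are made up for the illustration, and Python's round() is assumed to be a close enough stand-in for the SAS ROUND function on these values.

```python
# Census counts copied from the CARDS block: year / population pairs.
RAW = """
1790 3929214 1800 5308483 1810 7239881 1820 9638453 1830 12860702 1840 17063353
1850 23191876 1860 31443321 1870 38558371 1880 50189209 1890 62979766 1900 76212168
1910 92228496 1920 106021537 1930 123202624 1940 142164569 1950 161325798
1960 189323175 1970 213302031 1980 236542199 1990 258709873 2000 291421906 2010 308745538
"""


def build_census(raw):
    """Mimic the DATA step: pop2 in millions (one decimal) and its lag."""
    tokens = raw.split()
    rows = []
    prev_pop2 = None  # stands in for the missing value LAG returns on row 1
    for year, pop in zip(tokens[0::2], tokens[1::2]):
        pop2 = round(int(pop) / 1_000_000, 1)
        rows.append({"year": int(year), "pop2": pop2, "lpop": prev_pop2})
        prev_pop2 = pop2
    return rows


rows = build_census(RAW)
# The regression can only use rows where every model variable is present.
used = [r for r in rows if r["lpop"] is not None]
dependent_mean = sum(r["pop2"] for r in used) / len(used)

print(len(rows), len(used), round(dependent_mean, 6))
# 23 22 111.704545
```

The row counts and the mean of pop2 over the 22 complete rows agree with the "Number of Observations" and "Dependent Mean" lines in the listing.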
 

The PMML output generated is:

<?xml version="1.0" encoding="utf-8" ?>
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
    <Header copyright="World Programming Limited 2002-2015">
        <Application name="World Programming System (WPS)" version="3.2.0"/>
    </Header>
    <DataDictionary numbeOfFields="5">
        <DataField name="year" optype="continuous" dataType="double"/>
        <DataField name="pop" optype="continuous" dataType="double"/>
        <DataField name="pop2" optype="continuous" dataType="double"/>
        <DataField name="popsq" optype="continuous" dataType="double"/>
        <DataField name="lpop" optype="continuous" dataType="double"/>
    </DataDictionary>
    <RegressionModel functionName="regression" targetFieldName="pop2">
        <MiningSchema>
            <MiningField name="year"/>
            <MiningField name="lpop"/>
            <MiningField name="pop2" usageType="target"/>
        </MiningSchema>
        <RegressionTable intercept="-299.753951850233">
            <NumericPredictor name="year" coefficient="0.166074316077245"/>
            <NumericPredictor name="lpop" coefficient="0.971762137737628"/>
        </RegressionTable>
    </RegressionModel>
</PMML>
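The point of PMML is that another tool can pick up the model and score new data with it. For a linear regression that just means intercept plus the sum of coefficient times predictor. Here is a minimal Python sketch of that idea using only the standard library. It is an illustration, not a PMML engine: PMML_TEXT is an abridged copy of the output above, load_linear_model and score are hypothetical helper names, and the sketch assumes every NumericPredictor uses the default exponent of 1.

```python
import xml.etree.ElementTree as ET

# Abridged copy of the PMML output: only the regression table is kept.
PMML_TEXT = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
    <RegressionModel functionName="regression" targetFieldName="pop2">
        <RegressionTable intercept="-299.753951850233">
            <NumericPredictor name="year" coefficient="0.166074316077245"/>
            <NumericPredictor name="lpop" coefficient="0.971762137737628"/>
        </RegressionTable>
    </RegressionModel>
</PMML>"""

# The PMML elements live in this XML namespace, so lookups must use it.
NS = {"pmml": "http://www.dmg.org/PMML-4_2"}


def load_linear_model(pmml_text):
    """Return (intercept, {predictor name: coefficient}) from the first table."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefs = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefs


def score(intercept, coefs, row):
    """Linear predictor: intercept + sum of coefficient * value."""
    return intercept + sum(c * row[name] for name, c in coefs.items())


intercept, coefs = load_linear_model(PMML_TEXT)

# Score 2010 using the rounded 2000 population (291.4 million) as lpop.
pred_2010 = score(intercept, coefs, {"year": 2010, "lpop": 291.4})
print(round(pred_2010, 2))
```

With these inputs the model predicts roughly 317.2 million for 2010, against an observed pop2 of 308.7.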

Interested in a free 30-day evaluation of WPS? If your organization is located in North America, simply fill out the Evaluation Request from our website.

About the author: Phil Rack is President of MineQuest Business Analytics, LLC, located in beautiful Tucson, Arizona. Phil has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is an authorized reseller of WPS in North America.

But can it do the Cowboy Hat?

One of the interesting things about having a blog is the comments and emails that you receive. In the previous posting, I demonstrated how you can use the Bridge to R and R graphics to create histograms and contour plots that rival anything that SAS/Graph can do. Well, I received an email from a long lost soul with whom I worked in Michigan for about a year on a banking project. He asked, "But can it do the Cowboy Hat?"

That made me laugh out loud and brought back a lot of memories. David and I were both doing a lot of SAS programming work for a bank, and this was in the era of IBM 3179G graphics terminals. SAS had just pushed out a new procedure called g3d for creating three-dimensional plots. We were both kind of blown away by the sample program that came with SAS/Graph called the "Cowboy Hat." Well, long story short, David and I both wanted to find a way to create that kind of plot with our bank data, even if it killed us.

Over the course of three weeks (mostly in our spare time, obviously), we tried numerous variations of data that we had in our DB, but the plots that came out of g3d never looked anywhere near as good as the sample program. We would submit programs and wait for 10 minutes to slowly see our graphics terminal display the top part of the plot, wait another five seconds, and then watch the bottom part of the plot appear. We were always disappointed at what we had created.

One day, we learned that there was a project going on inside the bank to optimize the branch system and that all the customer accounts had been geocoded with lat/long coordinates. Both our eyes lit up at the same time because we knew we finally had some data we could use to make a "cool" plot. As we investigated the file, things just kept getting better. There were additional coordinate pairs that specified the nearest "major intersection" and the coordinates of the centroid for the census block group to which the account belonged.

We ended up using deposit balances for our Z axis, and the lat/long coordinates for our X and Y pair. After tweaking the code, we got this amazing plot to come out. It was a plot showing the strength of deposit balances over a specified area (Wayne, Oakland and Macomb counties), and the plot demonstrated all the characteristics of the Central Business District and Central Place Theory that are discussed in modern geography. We were both tickled to death. In retrospect, between the mainframe chargebacks and our time that we billed, that plot could vie for the most expensive plot made to that date. But we didn’t care! It looked amazingly cool.

So to answer your question, David, "Yes, R can do the Cowboy Hat." For reference, here’s a link to the SAS/Graph version.

Here’s the WPS and R code to do the same plot.

*--> Cowboy hat;
data hat;
   do x=-5 to 5 by .25;
      do y=-5 to 5 by .25;
         z=sin(sqrt(x*x+y*y));
         output;
      end;
   end;
run;

*--> Perspective plots;
%Rstart(csv, hat,GRAPHWINDOW);
datalines4;

library(akima)
attach(hat)
surface <- interp(x, y, z)

persp(surface, axes=TRUE, theta=110, phi=15,
      ticktype="detailed", expand=0.50)
title("Cowboy Hat with R",font.main=1)

;;;; 
%Rstop(import=);
 

And the R version of the Cowboy Hat:

[Figure: Cowboy Hat perspective plot created with R]
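If you want to see what the hat DATA step actually hands to R, here is a minimal Python sketch of the same grid. It is only an illustration outside WPS, and the names axis and hat are chosen to mirror the SAS code. Stepping from -5 to 5 by .25 gives 41 values per axis, so the data set holds 41 x 41 = 1681 rows, and because z depends only on the distance from the origin, the surface is symmetric around the center, which is what gives the plot its hat shape.

```python
import math

STEP = 0.25
N = 41  # -5 to 5 by .25 yields 41 values on each axis

# Build the axis from integer steps so the values stay exact in binary.
axis = [-5 + STEP * i for i in range(N)]

# One (x, y, z) row per grid point, in the same loop order as the DATA step.
hat = [(x, y, math.sin(math.sqrt(x * x + y * y))) for x in axis for y in axis]

z_at_origin = next(z for x, y, z in hat if x == 0 and y == 0)
z_min = min(z for _, _, z in hat)
z_max = max(z for _, _, z in hat)

print(len(hat), z_at_origin, round(z_min, 3), round(z_max, 3))
```

The crown of the hat is the ring where the distance from the origin is close to pi/2, and the brim dips where it is close to 3*pi/2.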

About the author: Phil Rack is President of MineQuest, LLC. and has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is a reseller of WPS in North America.

Graphics with the Bridge to R

One of the nice features of the Bridge to R for WPS users is how well it complements WPS. For example, WPS supports the graphics procedures Gchart and Gplot. Gchart can create vertical and horizontal bar charts as well as pie charts. The Gplot procedure allows you to create x, y plots (i.e., scatter plots) for displaying two-dimensional data. Granted, bar charts and pie charts account for most of the graphics that are used in business, but what if you want to create graphs and plots that aren’t currently supported by WPS?

This is where the Bridge to R makes it easy to access the graphics capabilities in R. For those of you who have not delved into this area before, it’s really pretty simple and the quality of the graphics is amazing.

I’m going to show you some code examples that take data from WPS, load it into R using the Bridge to R, and create a graph or plot. We will create four different plots in the examples below: a simple histogram, two contour plots and a perspective plot. So let’s get started!

The code below starts out by creating 1500 observations with the three variables x, y, and z. We interface with the R system with the statement %Rstart. We load the library MASS, attach the resulting R data frame so its columns can be referenced by name, and call the histogram plotting routine “truehist(y)”. The last statement, “title”, simply applies a title to the plot with the desired color.

Program Histogram.sas

data plotdata;
  do ii=1 to 1500;
     x= rannor(0);
     y = rannor(1);
     z = 1000 * rannor(0);
output;
end;
run;

*--> histogram;
%Rstart(csv,plotdata,GRAPHWINDOW);
datalines4;

library(MASS)
attach(plotdata)

truehist(y)                  # create histogram
title(main="Histogram of y", # title the plot
      col.main="blue",
      font.main=4)

;;;; 
%Rstop;

 

 Output from WPS program Histogram.sas

 

[Figure: histogram of y]
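One detail worth knowing about truehist is that, by default, it draws the bars on a density scale rather than as raw counts, so the bar areas add up to 1. Here is a minimal Python sketch of that scaling, offered as an illustration only. It does not reproduce the rannor stream from WPS (Python's own generator is used instead), density_histogram is a hypothetical helper, and the bin width of 0.5 is an arbitrary choice for the example rather than the width truehist would pick.

```python
import math
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable
y = [rng.gauss(0.0, 1.0) for _ in range(1500)]  # 1500 standard normal draws


def density_histogram(values, width):
    """Map each bin's left edge to its bar height on a density scale."""
    counts = {}
    for v in values:
        left_edge = math.floor(v / width) * width
        counts[left_edge] = counts.get(left_edge, 0) + 1
    n = len(values)
    # Dividing by n * width makes the bar areas sum to 1.
    return {edge: c / (n * width) for edge, c in sorted(counts.items())}


WIDTH = 0.5  # assumed bin width, chosen only for this example
bars = density_histogram(y, WIDTH)
total_area = sum(height * WIDTH for height in bars.values())

for edge, height in bars.items():
    print(f"{edge:6.2f}  {'#' * round(height * 100)}")
```

Because the heights are densities, a histogram drawn this way can be compared directly with a normal density curve regardless of how many observations were generated.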

What if you want to create a contour plot? SAS has a proc called Gcontour that WPS doesn’t provide support for at this time. Using the Bridge to R, you can easily create contour plots. In the example below, we use the same data set as before and create three plots. They are, in order of creation, a simple contour plot, a filled contour plot, and a perspective plot.

Program Contour.sas

data plotdata;
  do ii=1 to 1500;
     x= rannor(0);
     y = rannor(1);
     z = 1000 * rannor(0);
output;
end;
run;

*--> contour plots;
%Rstart(csv,plotdata,GRAPHWINDOW);
datalines4;

library(akima)
attach(plotdata)

surface <- interp(x,y,z)

contour(surface) # create the contour plot
title(main="Contour Plot of x, y, z", col.main="blue", font.main=4)

filled.contour(surface) # create filled contour
title(main="Filled Contour Plot of x, y, z", col.main="blue", font.main=4)

persp(surface,      # create perspective plot
      col="green", expand=0.3)
title(main="Perspective Plot of x, y, z", col.main="blue", font.main=4)

;;;; 
%Rstop;

 

Output Generated by Contour.sas

Contour Plot

[Figure: contour plot of x, y, z]

Filled Contour Plot

[Figure: filled contour plot of x, y, z]

Perspective Plot

[Figure: perspective plot of x, y, z]

So you can see just how easy it is to more fully flesh out your WPS graphics with the Bridge to R. We’ve only touched on the power of R when it comes to graphics but this gives you an idea of the flexibility that R and the Bridge can offer the data analyst. Remember, the Bridge to R is offered at no cost to those customers who license WPS from MineQuest!

