
High Performance Workstations for BI

One thing I really enjoy is a powerful workstation for performing analytics. It’s fun, and often insightful, to speculate on the design of a custom higher-end workstation for running BI applications like WPS and R, and then to build it.

Ars Builds

Every quarter, Ars Technica goes through an exercise where they build three PCs, mainly to assess gaming performance, and then do a price vs. performance comparison. After reading a few of these quarterly builds you will soon see a trend: the graphics card plays a major role in their performance assessment. The CPU, the number of cores and fixed storage tend to play a minimal role when comparing the machines.

This of course is in contrast to what we want to do for our performance benchmarks. We are looking at a holistic approach covering CPU throughput, disk I/O and graphics to get the most for the dollar on a workstation build. But Ars does have a lot to recommend when it comes to benchmarking, and I think it’s worthwhile including some of their ideas.

What Constitutes a High End Analytics Workstation?

This is an interesting question and one that I will throw out for debate. It’s so easy to get caught up in spending thousands of dollars, if not ten thousand dollars (see the next section), on a workstation. One thing that even the casual observer will soon notice is that being on the bleeding edge is a very expensive proposition. It’s an old adage that you are only as good as your tools. There’s also the adage that it’s a poor craftsman who blames his tools. In the BI world, especially when speed means success, it’s important to have good tools.

As a basis for what constitutes a high end workstation, I will offer the following as a point of entry.

  • At least 4 logical CPUs.
  • At least 8GB of RAM, preferably 16GB to 32GB.
  • Multiple hard drives for OS, temporary workspace and permanent data set storage.
  • A graphics card that can be used for more than displaying graphics, i.e. parallel computing.
  • A large display – 24” capable of at least 1920×1080.

As a mid-tier solution, I would think that a workstation built from the following components would be ideal.

  • At least 8 logical CPUs.
  • A minimum of 16GB of RAM.
  • Multiple hard drives for OS, temporary workspace and permanent data set storage with emphasis on RAID storage solutions and SSD Caching.
  • A graphics card that can be used for more than displaying graphics, i.e. parallel computing.
  • A large display – 24” capable of at least 1920×1080.

As a high end solution, I would think that a workstation built with the following hardware would be close to ultimate for many (if not most) analysts.

  • Eight to 16 logical CPUs – Xeon class (or possibly a step down to an Intel i7).
  • A minimum of 32GB of RAM and up to 64GB.
  • Multiple hard drives for OS, temporary workspace and permanent data set storage with emphasis on RAID storage solutions and SSD Caching.
  • A graphics card that can be used for more than displaying graphics, i.e. parallel computing.
  • Multiple 24” displays capable of at least 1920×1200 each.

I do have a bias towards hardware that is upgradeable. All-in-one solutions tend to be one-shot deals and thus expensive. I like upgradability for graphics cards, memory, hard drives and even CPUs. Expandability can save you thousands of dollars over a period of a few years.

The New Mac Pro – a Game Changer?

The new Mac Pro is pretty radical from a number of perspectives. It’s obviously built for video editing but its small size is radical in my opinion. As a Business Analytics computer it offers some intriguing prospects. You have multiple cores, lots of RAM, high end graphics but limited internal storage. That’s the main criticism that I have about the new Mac Pro. The base machine comes with 256GB of storage and that’s not much for handling large data sets. You are forced to go to external storage solutions to be able to process large data sets. Although I’ve not priced out the cost of adding external storage, I’m sure it’s not inexpensive.

Benchmarks

This is a tough one for me because organizations have such an array of hardware, and some benchmarks are going to require hardware that has specific capabilities, for example graphics cards that are CUDA enabled to do parallel processing in R. There is also the fact that we use the Bridge to R for invoking R code, and the Bridge to R only runs on WPS (and not SAS).

I did write a benchmark a while ago that I like a lot. It reports on the hardware platform (i.e. the amount of memory and the number of LCPUs available) and runs just the basic suite of PROCs that I know are available in both WPS and SAS. Moving to more statistically oriented PROCs such as LOGISTIC and GLM may be difficult because SAS license holders may not have the statistical libraries necessary to run the tests. That’s a major drawback to licensing the SAS System: you are nickel-and-dimed to death all the time. The alternative is to have a workstation benchmark that is specific to WPS.

Perhaps the benchmark can be written so that it tests whether certain PROCs and libraries are available, and also determines whether the required hardware (such as CUDA processors) is present, before it runs that specific benchmark. Really, the idea is to determine the performance of specific software on a specific set of hardware, not to compare R, WPS and SAS.
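
One way to picture that kind of capability check is a small harness that pairs each benchmark step with a prerequisite test and skips the step when the test fails. The sketch below is in Python purely for illustration. It is not the WPS benchmark program, and the names run_suite, has_executable and the probe tool "no-such-cuda-probe" are hypothetical stand-ins.

```python
import shutil
import time


def has_executable(name):
    """Return True if a program with this name is found on the PATH."""
    return shutil.which(name) is not None


def run_suite(tests):
    """Run each (label, prerequisite, workload) entry.

    A step whose prerequisite returns False is reported as skipped
    instead of failing the whole benchmark.
    """
    results = []
    for label, prerequisite, workload in tests:
        if not prerequisite():
            results.append({"test": label, "status": "skipped", "seconds": None})
            continue
        start = time.perf_counter()
        workload()
        elapsed = time.perf_counter() - start
        results.append({"test": label, "status": "ran", "seconds": elapsed})
    return results


# Hypothetical suite: one step that always runs, and one that needs a
# (made-up) GPU probe tool to be installed.
suite = [
    ("cpu_sum_of_squares", lambda: True,
     lambda: sum(i * i for i in range(200_000))),
    ("gpu_step", lambda: has_executable("no-such-cuda-probe"),
     lambda: None),
]

for row in run_suite(suite):
    print(row)
```

The report then shows which steps ran and which were skipped on a given machine, so results from differently equipped workstations can still be read side by side.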

Price and Performance Metrics

One aspect of Ars that I really like is that when they do their benchmarks, they calculate a cost comparison for each build, usually based on hardware pricing at the time of the benchmark. What they don’t do is price in the cost of the software for such things as video editing. I think it’s important to show the cost of both hardware and software as part of a price and performance metric.
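
As a rough illustration of such a metric, the sketch below folds hardware and software cost into a single dollars-per-throughput figure. Everything in it is an assumption made for the example: the Build class, the choice of "benchmark runs per hour" as the throughput measure, and all prices and timings, which are invented and do not come from any real build.

```python
from dataclasses import dataclass


@dataclass
class Build:
    name: str
    hardware_cost: float      # dollars, at time of benchmark
    software_cost: float      # dollars, licenses needed to run the test
    benchmark_seconds: float  # elapsed time of one benchmark run

    @property
    def total_cost(self):
        return self.hardware_cost + self.software_cost

    @property
    def runs_per_hour(self):
        return 3600.0 / self.benchmark_seconds

    @property
    def dollars_per_run_per_hour(self):
        """Total cost divided by throughput; lower is better."""
        return self.total_cost / self.runs_per_hour


# Invented figures, for illustration only.
builds = [
    Build("entry", hardware_cost=1200.0, software_cost=800.0, benchmark_seconds=300.0),
    Build("mid-tier", hardware_cost=2500.0, software_cost=800.0, benchmark_seconds=180.0),
]

for b in sorted(builds, key=lambda b: b.dollars_per_run_per_hour):
    print(f"{b.name}: total ${b.total_cost:,.0f}, "
          f"{b.runs_per_hour:.1f} runs/hour, "
          f"${b.dollars_per_run_per_hour:,.2f} per run/hour")
```

Including the software line makes it visible when a cheaper machine is no bargain once licensing is counted, which a hardware-only comparison would hide.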

Moving Forward

I’m going to take some time and modify the WPS Workstation Benchmark Program that I wrote so that it doesn’t spew out so much unnecessary output into the listing window. I would like it to just show the output from the benchmark report. I think it would also be prudent to see if some R code could be included in the benchmark and compare and contrast the performance if there are some CUDA cores available for assisting in the computations.

About the author: Phil Rack is President of MineQuest Business Analytics, LLC located in Grand Rapids, Michigan. Phil has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is an authorized reseller of WPS in North America.

Plotting Points on a Street Level Map using the Bridge to R and WPS

In the last few installments of this blog, I have shown how you can use WPS and the Bridge to R to calculate drive distances, geocode records and pull down a map from Google Maps. I want to use this post to pull all this together and show how you can geocode your addresses and plot them on a street level map.

First, some background you need to know about using Google for geocoding and mapping. There are limits to what Google will allow you to do with their services before they want you to start paying. You can geocode 2,500 records a day for free. You can pull down 25,000 maps a day for free. Once you start moving past these limits, there are fees involved.

One thing that you should probably start to consider is caching geocoded records locally so that you don’t have to go back to the Google geocoder every time you want to plot some points on a map. I could easily run through 2,500 addresses in a day. The limit on the number of maps is just not an issue for me. I think 25,000 maps a day is a very liberal offering for the kind of work that I would want to use the service for.
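
A local cache of this kind can be sketched in a few lines. The example below is in Python and uses SQLite only as one possible store. The GeocodeCache class and the stand-in geocoder are hypothetical, and no real Google call is made; the coordinates in the lookup table are copied from the geocoding output shown in another post on this blog.

```python
import sqlite3


class GeocodeCache:
    """Remember geocoded addresses so repeat lookups cost no quota."""

    def __init__(self, path, geocoder):
        self.geocoder = geocoder   # callable: address -> (lon, lat)
        self.remote_calls = 0      # how often the real service was hit
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS geocode "
            "(address TEXT PRIMARY KEY, lon REAL, lat REAL)"
        )

    @staticmethod
    def _key(address):
        # Normalize case and spacing so trivial variants share one entry.
        return " ".join(address.lower().split())

    def lookup(self, address):
        key = self._key(address)
        row = self.db.execute(
            "SELECT lon, lat FROM geocode WHERE address = ?", (key,)
        ).fetchone()
        if row is not None:
            return row
        lon, lat = self.geocoder(address)
        self.remote_calls += 1
        self.db.execute("INSERT INTO geocode VALUES (?, ?, ?)", (key, lon, lat))
        self.db.commit()
        return (lon, lat)


# Stand-in for the real geocoder (an assumption for this sketch).
KNOWN = {
    "2305 E Paris Ave Se, Grand Rapids, MI": (-85.5692342, 42.921498),
    "3960 28th St Se, Grand Rapids, MI": (-85.5685794, 42.9125298),
}

cache = GeocodeCache(":memory:", lambda address: KNOWN[address])
for addr in ["2305 E Paris Ave Se, Grand Rapids, MI",
             "2305 E Paris Ave Se, Grand Rapids, MI",
             "3960 28th St Se, Grand Rapids, MI"]:
    print(addr, cache.lookup(addr))
print("remote calls:", cache.remote_calls)
```

Passing a file name instead of ":memory:" keeps the cache between sessions. Two stations that share an address, as happens in the sample data, then count only once against the daily limit.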

In the sample code below, I split the mapping process into two components to make the entire process easier to understand. I first geocode the file to get the latitude and longitude for each record. The second part of the process is creating a map and using the latitudes and longitudes to plot points on the map. We could have put this into a single step but it wouldn’t be as clear or as flexible.

Without further ado, here’s the code using the Bridge to R and WPS.

data gasstations;
input company $1-29 address $30-52 city $53-64 state $66-67;
addr2geocode=trim(address)||', '||trim(city)||', '||trim(state);
cards;
Citgo Gas Station            5189 28th St Se        Grand Rapids MI
28th Street BP               5155 28th St Se        Grand Rapids MI
Twenty-Eighth Street C Store 5556 28th St Se        Grand Rapids MI
Speedway                     4045 28th St Se        Grand Rapids MI
Speedway                     2305 E Paris Ave Se    Grand Rapids MI
Superamerica                 2305 E Paris Ave Se    Grand Rapids MI
Shell Food Mart              3960 28th St Se        Grand Rapids MI
Admiral Petroleum            3927 28th St Se        Grand Rapids MI
Cascade C Store              4591 Cascade Rd Se     Grand Rapids MI
Friendly Food Shops          6799 Cascade Rd Se     Grand Rapids MI
Family Fare Quick Stop       6799 Cascade Rd Se     Grand Rapids MI
Cascade Citgo                6820 Cascade Rd Se     Grand Rapids MI
Dutton Fuel Mart LLC         2560 E Beltline Ave Se Grand Rapids MI
Centerpointe Marathon        2560 E Beltline Ave Se Grand Rapids MI
Shell Food Mart              2600 E Beltline Ave Se Grand Rapids MI
Speedway                     4018 Cascade Rd Se     Grand Rapids MI
Grand Rapids Gas Incorporated3214 28th St Se        Grand Rapids MI
Cascade Shell                4033 Cascade Rd Se     Grand Rapids MI
Speedway                     4665 44th St Se        Kentwood     MI
Super Petroleum Incorporated 2411 28th St Se        Grand Rapids MI
;;;;
run;


*--> Geocode the addresses using the Google Geocoder. Keep the geocoded records
     in the output dataset named locs for further processing.;

%rstart(dataformat=csv,data=gasstations,rGraphicsFile=);
datalines4;

## options(repos=structure(c(CRAN="http://cran.case.edu/")))
## install.packages("ggmap", dependencies = FALSE)

   attach(gasstations)

   library(ggmap)

   gaddress <- as.character(gasstations$addr2geocode)
   locs <- geocode(gaddress,output="more")

;;;;
%rstop(import=locs);


*--> Pull a Google map that is centered on a particular address and plot the locations
     on the map. Use the data set that was created (locs) above that contains the 
     lat and longs to plot the points.;
     
Title 'Gas Stations on or Near 28th Street';
Title2 'Grand Rapids, Michigan';     
     
%rstart(dataformat=csv,data=locs,rGraphicsFile=);
datalines4;

attach(locs)
addr <- locs;

library(ggmap)

map.center <- geocode('3960 28th St Se, Grand Rapids, MI');

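## Build the base map, then overlay the points. The point size is a fixed
## setting, so it goes outside aes() rather than being mapped as data.
grmap <- qmap(c(lon = map.center$lon, lat = map.center$lat), zoom = 13, color = 'color', legend = 'topleft')
grmap + geom_point(aes(x = lon, y = lat), size = 3.0, data = addr)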

;;;;
%rstop(import=);

The map that is created looks like this:

[Image: GR_Stations]

I cropped this down a bit and got rid of the borders so that it would be easier to view on this blog. Note the black points on the map that indicate the locations of the gas stations. We could continue by adding labels to the points with the names of the service stations, but that would be a good exercise for the reader who wants to learn more about using ggmap and street level mapping.

If you want to learn more about ggmap and street level mapping, I encourage you to take a look at the article “ggmap: Spatial Visualization with ggplot2” from The R Journal, which is available in PDF format. What I have presented is really a quick and dirty set of examples that just begin to scratch the surface of what ggmap can do for you.


Creating a Street Level Map with WPS and the Bridge to R

Creating a street level map using the Bridge to R and WPS is actually pretty easy. As in our other examples (see the two previous blogs) we again use ggmap to pull down a map from Google Maps and display it using HTML. Amazingly, this only takes four lines of R code. Here’s an example:

 

%rstart(dataformat=man,data=,rGraphicsFile=);
datalines4;

   library(ggmap)

   bp <- "4045 28th St Se, Grand Rapids, MI, USA"
   grmap <- qmap(bp, zoom=12)
   print(grmap)

;;;;
%rstop(import=);

The code is fairly easy to follow. We load the ggmap library that will do most of the work for us. We center the map using the address “4045 28th St Se, Grand Rapids, MI, USA”. The next line queries the map with a specified zoom level (we are using zoom level 12). Finally, we print the map using the print function.

This is what the map looks like.

[Image: b2rplt_1700486050_2_1]

 

We can actually take this a bit further. Instead of using a known address, we can use a place of interest for querying and creating the map. If we replace the address in the code above with “White House, Washington DC, USA” we get a map like the one below.

[Image: b2rplt_1700487946_3_1]

So now we have seen how easy it is to pull down a map from Google using ggmap and the Bridge to R for WPS. If you have a copy of the Bridge to R, I recommend you play with the demonstration programs to get an idea of what you can do with the software and the mapping service. It’s always fun to see what gets rendered using ggmap, R and the Bridge to R.


Geocoding with WPS and the Bridge to R

 

One of the things that I truly enjoy is the flexibility of the SAS language and how well WPS allows you to integrate the product with other services. One aspect of research, whether you are in the social sciences, marketing or some other area of study, is the use and application of location data.

I’m not talking necessarily about getting the data from your cell phone on where you are or have been, but in taking address data and using it to create a business advantage. Visualizing data on a map is important for many people but it’s often a laborious task to get all the data enhanced so that it can be mapped or plotted. Specifically, I’m talking about taking an address and finding additional information such as political districts and latitude and longitude for each address.

Using the Bridge to R and WPS, we can use R to geocode our data. In our example, we will use ggmap, which was developed and written by Hadley Wickham and David Kahle. It is a truly amazing package and every time I use it, I learn something new.

In this example, I have 20 records that have the name and address of gas stations in Cascade Township which is a part of Grand Rapids, Michigan. What we want to do is geocode these 20 records to find their latitude and longitude. Below is the entire code snippet to do just that.

data gasstations;
input company $1-29 address $30-52 city $53-64 state $66-67;
addr2geocode=trim(address)||', '||trim(city)||', '||trim(state);
cards;
Citgo Gas Station            5189 28th St Se        Grand Rapids MI
28th Street BP               5155 28th St Se        Grand Rapids MI
Twenty-Eighth Street C Store 5556 28th St Se        Grand Rapids MI
Speedway                     4045 28th St Se        Grand Rapids MI
Speedway                     2305 E Paris Ave Se    Grand Rapids MI
Superamerica                 2305 E Paris Ave Se    Grand Rapids MI
Shell Food Mart              3960 28th St Se        Grand Rapids MI
Admiral Petroleum            3927 28th St Se        Grand Rapids MI
Cascade C Store              4591 Cascade Rd Se     Grand Rapids MI
Friendly Food Shops          6799 Cascade Rd Se     Grand Rapids MI
Family Fare Quick Stop       6799 Cascade Rd Se     Grand Rapids MI
Cascade Citgo                6820 Cascade Rd Se     Grand Rapids MI
Dutton Fuel Mart LLC         2560 E Beltline Ave Se Grand Rapids MI
Centerpointe Marathon        2560 E Beltline Ave Se Grand Rapids MI
Shell Food Mart              2600 E Beltline Ave Se Grand Rapids MI
Speedway                     4018 Cascade Rd Se     Grand Rapids MI
Grand Rapids Gas Incorporated3214 28th St Se        Grand Rapids MI
Cascade Shell                4033 Cascade Rd Se     Grand Rapids MI
Speedway                     4665 44th St Se        Kentwood     MI
Super Petroleum Incorporated 2411 28th St Se        Grand Rapids MI
;;;;
run;

proc print data=gasstations;
var addr2geocode;
run;


%rstart(dataformat=csv,data=gasstations,rGraphicsFile=);
datalines4;

   attach(gasstations)

   library(ggmap)

   gaddress <- as.character(gasstations$addr2geocode)
   locs <- geocode(gaddress,output="more")

;;;;
%rstop(import=locs);

proc print data=locs(drop=var2);
run;

The output that is returned from the PROC Print looks like:

                                                 The WPS System                     19:07 Thursday, November 14, 2013    1
                                                                                                                                    
 Obs          lon          lat type             loctype                                                                             
                                                                                                                                    
   1  -85.5396645   42.9127946 street_address   rooftop                                                                             
   2   -85.540616    42.913151 street_address   rooftop                                                                             
   3  -85.5308863   42.9129211 street_address   range_interpolated                                                                  
   4  -85.5678243   42.9128289 street_address   rooftop                                                                             
   5  -85.5692342    42.921498 street_address   range_interpolated                                                                  
   6  -85.5692342    42.921498 street_address   range_interpolated                                                                  
   7  -85.5685794   42.9125298 street_address   range_interpolated                                                                  
   8  -85.5700129   42.9125533 street_address   range_interpolated                                                                  
                                                                                                                                    
 Obs address                                                                                               north        south       
                                                                                                                                    
   1 5189 28th street southeast, grand rapids, mi 49508, usa                                         42.91414358  42.91144562       
   2 5155 28th street southeast, grand rapids, mi 49512, usa                                         42.91449998  42.91180202       
   3 5556 28th street southeast, grand rapids, mi 49512, usa                                         42.91427683  42.91157887       
   4 4045 28th street southeast, grand rapids, mi 49512, usa                                         42.91417788  42.91147992       
   5 2305 east paris avenue southeast, grand rapids, mi 49546, usa                                   42.92284723  42.92014927       
   6 2305 east paris avenue southeast, grand rapids, mi 49546, usa                                   42.92284723  42.92014927       
   7 3960 28th street southeast, grand rapids, mi 49512, usa                                         42.91388553  42.91118757       
   8 3927 28th street southeast, grand rapids, mi 49512, usa                                         42.91389553  42.91119757       
                                                                                                                                    
 Obs         east         west  postal_code country         administrative_area_level_2 administrative_area_level_1 locality        
                                                                                                                                    
   1 -85.53831552 -85.54101348        49508 united states             kent                      michigan            grand rapids    
   2 -85.53926702 -85.54196498        49512 united states             kent                      michigan            grand rapids    
   3 -85.52953782 -85.53223578        49512 united states             kent                      michigan            grand rapids    
   4 -85.56647532 -85.56917328        49512 united states             kent                      michigan            grand rapids    
   5 -85.56787602 -85.57057398        49546 united states             kent                      michigan            grand rapids    
   6 -85.56787602 -85.57057398        49546 united states             kent                      michigan            grand rapids    
   7 -85.56723042 -85.56992838        49512 united states             kent                      michigan            grand rapids    
   8 -85.56866387 -85.57136183        49512 united states             kent                      michigan            grand rapids    
                                                                                                                                    
 Obs street                                  streetNo point_of_interest query                                                       
                                                                                                                                    
   1 28th street southeast                       5189        NA         5189 28th St Se, Grand Rapids, MI                           
   2 28th street southeast                       5155        NA         5155 28th St Se, Grand Rapids, MI                           
   3 28th street southeast                       5556        NA         5556 28th St Se, Grand Rapids, MI                           
   4 28th street southeast                       4045        NA         4045 28th St Se, Grand Rapids, MI                           
   5 east paris avenue southeast                 2305        NA         2305 E Paris Ave Se, Grand Rapids, MI                       
   6 east paris avenue southeast                 2305        NA         2305 E Paris Ave Se, Grand Rapids, MI                       
   7 28th street southeast                       3960        NA         3960 28th St Se, Grand Rapids, MI                           
   8 28th street southeast                       3927        NA         3927 28th St Se, Grand Rapids, MI

                                                 The WPS System                     19:07 Thursday, November 14, 2013    2
                                                                                                                                    
 Obs          lon          lat type             loctype                                                                             
                                                                                                                                    
   9   -85.556224    42.946438 street_address   rooftop                                                                             
  10    -85.50019    42.915388 street_address   rooftop                                                                             
  11    -85.50019    42.915388 street_address   rooftop                                                                             
  12   -85.499809    42.913584 street_address   rooftop                                                                             
  13   -85.583252    42.916731 street_address   rooftop                                                                             
  14   -85.583252    42.916731 street_address   rooftop                                                                             
  15   -85.583243    42.916087 street_address   rooftop                                                                             
  16   -85.570236    42.947743 street_address   rooftop                                                                             
                                                                                                                                    
 Obs address                                                                                               north        south       
                                                                                                                                    
   9 4591 cascade road southeast, grand rapids, mi 49546, usa                                        42.94778698  42.94508902       
  10 6799 cascade road southeast, grand rapids, mi 49546, usa                                        42.91673698  42.91403902       
  11 6799 cascade road southeast, grand rapids, mi 49546, usa                                        42.91673698  42.91403902       
  12 6820 cascade road southeast, grand rapids, mi 49546, usa                                        42.91493298  42.91223502       
  13 2560 east beltline avenue southeast, centerpointe mall, grand rapids, mi 49546, usa             42.91807998  42.91538202       
  14 2560 east beltline avenue southeast, centerpointe mall, grand rapids, mi 49546, usa             42.91807998  42.91538202       
  15 2600 east beltline avenue southeast, centerpointe mall, grand rapids, mi 49546, usa             42.91743598  42.91473802       
  16 4018 cascade road southeast, grand rapids, mi 49546, usa                                        42.94909198  42.94639402       
                                                                                                                                    
 Obs         east         west  postal_code country         administrative_area_level_2 administrative_area_level_1 locality        
                                                                                                                                    
   9 -85.55487502 -85.55757298        49546 united states             kent                      michigan            grand rapids    
  10 -85.49884102 -85.50153898        49546 united states             kent                      michigan            grand rapids    
  11 -85.49884102 -85.50153898        49546 united states             kent                      michigan            grand rapids    
  12 -85.49846002 -85.50115798        49546 united states             kent                      michigan            grand rapids    
  13 -85.58190302 -85.58460098        49546 united states             kent                      michigan            grand rapids    
  14 -85.58190302 -85.58460098        49546 united states             kent                      michigan            grand rapids    
  15 -85.58189402 -85.58459198        49546 united states             kent                      michigan            grand rapids    
  16 -85.56888702 -85.57158498        49546 united states             kent                      michigan            grand rapids    
                                                                                                                                    
 Obs street                                  streetNo point_of_interest query                                                       
                                                                                                                                    
   9 cascade road southeast                      4591        NA         4591 Cascade Rd Se, Grand Rapids, MI                        
  10 cascade road southeast                      6799        NA         6799 Cascade Rd Se, Grand Rapids, MI                        
  11 cascade road southeast                      6799        NA         6799 Cascade Rd Se, Grand Rapids, MI                        
  12 cascade road southeast                      6820        NA         6820 Cascade Rd Se, Grand Rapids, MI                        
  13 east beltline avenue southeast              2560        NA         2560 E Beltline Ave Se, Grand Rapids, MI                    
  14 east beltline avenue southeast              2560        NA         2560 E Beltline Ave Se, Grand Rapids, MI                    
  15 east beltline avenue southeast              2600        NA         2600 E Beltline Ave Se, Grand Rapids, MI                    
  16 cascade road southeast                      4018        NA         4018 Cascade Rd Se, Grand Rapids, MI

                                                 The WPS System                     19:07 Thursday, November 14, 2013    3
                                                                                                                                    
 Obs          lon          lat type             loctype                                                                             
                                                                                                                                    
  17  -85.5879549    42.912112 street_address   rooftop                                                                             
  18   -85.569238    42.948355 street_address   rooftop                                                                             
  19  -85.5499342   42.8836312 street_address   range_interpolated                                                                  
  20   -85.607639    42.912997 street_address   rooftop                                                                             
                                                                                                                                    
 Obs address                                                                                               north        south       
                                                                                                                                    
  17 3214 28th street southeast, grand rapids, mi 49512, usa                                         42.91346098  42.91076302       
  18 4033 cascade road southeast, grand rapids, mi 49546, usa                                        42.94970398  42.94700602       
  19 4665 44th street southeast, kentwood, mi 49512, usa                                             42.88497343  42.88227547       
  20 2411 28th street southeast, grand rapids, mi 49512, usa                                         42.91434598  42.91164802       
                                                                                                                                    
 Obs         east         west  postal_code country         administrative_area_level_2 administrative_area_level_1 locality        
                                                                                                                                    
  17 -85.58660592 -85.58930388        49512 united states             kent                      michigan            grand rapids    
  18 -85.56788902 -85.57058698        49546 united states             kent                      michigan            grand rapids    
  19 -85.54858527 -85.55128323        49512 united states             kent                      michigan            kentwood        
  20 -85.60629002 -85.60898798        49512 united states             kent                      michigan            grand rapids    
                                                                                                                                    
 Obs street                                  streetNo point_of_interest query                                                       
                                                                                                                                    
  17 28th street southeast                       3214        NA         3214 28th St Se, Grand Rapids, MI                           
  18 cascade road southeast                      4033        NA         4033 Cascade Rd Se, Grand Rapids, MI                        
  19 44th street southeast                       4665        NA         4665 44th St Se, Kentwood, MI                               
  20 28th street southeast                       2411        NA         2411 28th St Se, Grand Rapids, MI                           
                                                                                                                

About the author: Phil Rack is President of MineQuest Business Analytics, LLC located in Grand Rapids, Michigan. Phil has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is an authorized reseller of WPS in North America.

Calculating Driving Distances with WPS and the Bridge to R

A few weeks ago, there was a posting on SAS-L where the poster was attempting to get the driving distance between two cities using Google’s mapping services. I found that a rather interesting question and decided to see what I could do using WPS and the Bridge to R.

For those who are unfamiliar with the Bridge to R, it is a product from MineQuest Business Analytics that allows you to execute R statements from within the WPS environment. You can pass WPS datasets to R and return R data frames to WPS quite easily. You also get the R log and list files returned to your WPS session in the corresponding log and list windows.

Here is the code that we used to create a driving distance matrix between three pairs of cities. The output is printed using PROC PRINT in WPS.

*--> data set for drive distances;
data rdset;
input fromdest $1-17 todest $ 20-36;
cards;
Grand Rapids, MI   State College, PA
Columbus, OH       Grand Rapids, MI
Chicago, IL        Grand Rapids, MI
;;;;
run;


%Rstart(dataformat=csv,data=rdset,rGraphicsFile=);
datalines4;

    attach(rdset)
    library(ggmap)

    from <- as.character(fromdest)
    to  <- as.character(todest)

    mydist <- mapdist(from,to)

;;;;
%rstop(import=mydist);

proc print data=mydist(drop=var2);
format m comma10. km comma8.2 miles 8.2 seconds comma7. minutes comma8.2 hours 6.2;
run;

And this is the output:

      Obs    from                  to                              m          km       miles    seconds     minutes     hours       
                                                                                                                                    
       1     Grand Rapids, MI      State College, PA         843,978      843.98      524.45     28,256      470.93      7.85       
       2     Columbus, OH          Grand Rapids, MI          521,289      521.29      323.93     17,543      292.38      4.87       
       3     Chicago, IL           Grand Rapids, MI          285,836      285.84      177.62      9,695      161.58      2.69       
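
The derived columns in this output follow from just two measurements, m (meters) and seconds. The short R sketch below reproduces them from the values printed above. The miles factor of 0.0006214 per meter is an assumption chosen because it matches the printed figures; it is not taken from the ggmap documentation.

```r
# Meters and seconds copied from the PROC PRINT output above.
mydist <- data.frame(
  from    = c("Grand Rapids, MI", "Columbus, OH", "Chicago, IL"),
  to      = c("State College, PA", "Grand Rapids, MI", "Grand Rapids, MI"),
  m       = c(843978, 521289, 285836),
  seconds = c(28256, 17543, 9695)
)

# Derive the remaining columns from m and seconds.
mydist$km      <- mydist$m / 1000
mydist$miles   <- mydist$m * 0.0006214   # assumed conversion factor
mydist$minutes <- mydist$seconds / 60
mydist$hours   <- mydist$minutes / 60

# Round to two decimals, as the 8.2 style formats do in the listing.
print(round(mydist[, c("km", "miles", "minutes", "hours")], 2))
```

Rounded to two decimals, these values agree with the km, miles, minutes and hours columns in the listing.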
                                                                                                                             

So you can see how handy WPS and the Bridge to R can be as a resource, kind of a Swiss Army knife if you like.


Thoughts on Mapping and Geocoding in WPS

I’ve been working on a framework for a set of new macros to be included in the Bridge to R that I think will be very useful for many WPS users. Coming from a background in Demography, I’ve always been partial to maps and charts. There’s a plethora of open source products out there, as well as APIs from Google, Bing, etc., that allow a user to create some pretty darn nice maps.

I recently became aware of a new R library by Hadley Wickham and David Kahle called ggmap. Professor Wickham has created some phenomenal software for the open source R system. Hadley created ggplot2, which is truly the standard for graphics in the R world. He has also written a book on ggplot2 that is well worth purchasing and can be found on Amazon.

For most of us in the WPS world, we are rather limited to the native graphics available in the product. That was mostly overcome with the Bridge to R that we created a few years ago. So there are two things that I see as important to address at this juncture: geocoding and mapping.

First I want to discuss geocoding. Geocoding has always been this strange process that is 10x more convoluted than it really needs to be. Most of us want to provide an address from our data set and get back the latitude and longitude for that address. A much smaller group of users wants to enhance their data with latitude and longitude as well as zip code, etc. Either way, using external services from commercial companies to do such a thing is often expensive. This is especially true for smaller data sets, where there is a standard fee plus so much per name.

The second aspect worth discussing is the availability of mapping software and the associated cost. Some of these programs are expensive to say the least. Unless you intend to make a career out of creating maps (and I do not), other alternatives need to be looked at to keep costs down.

So, long story short, it’s worth investigating how to interface WPS with open source and cost-free solutions for map making. I have briefly looked at Google, Bing and OpenStreetMap. There are pros and cons to each one of them, but I want easy and nice looking, and that’s the driver behind my development.

Professors Wickham and Kahle have done a lot in this area, trying to address the shortcomings of R for mapping. I could not approach their creative genius and determination in creating ggmap, but I can create an interface that makes using ggmap easier for WPS users. So, my summer adventure is to create a clean interface using WPS and the Bridge to R so that WPS users can create some extraordinary maps using ggmap.


What are the benefits when buying from MineQuest?

I’m often asked what the benefits are of buying a WPS license from MineQuest Business Analytics.

There are many, but I will touch on three of them in this post. First, when you purchase your server licenses from MineQuest, we provide a means of protecting your license investment. If you find that you need to scale up your server from say four to eight cores in the middle of your license period, we can upgrade your license so that you don’t lose money in the transition. We do require that you stay on the same operating system, but other than that, you get full credit for the time remaining on your license when you trade up.

The second benefit is that you automatically get a copy of the Bridge to R. The Bridge to R allows you to interface WPS into the R system for running more advanced statistical routines and enhanced graphics.

Third, we offer a first line of support for our customers. We have a few years of WPS experience under our belt and have developed products based on WPS. We know the product relatively well and we provide consulting services to companies needing it. We can implement WPS Link (i.e. submit programs from a desktop to a WPS Server) for customers who want a client server environment on any x86 architecture, and we can help users with installation and product overviews.


Another View of R and Big Data

I was reading a blog entry the other day that just blew me away. Librestats has a blog entry entitled, “R at 12,000 Cores” and it is a very good (and fun) read. It’s amazing what can be done by the open source advocates and this article is a great example of that.

After reading the article, I can’t help but think about the relationship between extremely large data, server size (both CPUs and RAM) and how fast data is growing. There has to be a way to crunch through the amount of data that is piling up and this article addresses that issue.

I believe you will begin seeing vendors embrace R more openly, mainly because they have to embrace it. There aren’t any companies that can develop code at the breakneck pace at which the R community is putting out packages. It’s truly amazing and cost effective to model data in the state-of-the-art way that the above article describes.

Even small companies can make use of multiple servers with dozens of cores and lots of RAM rather inexpensively. Using Linux and R on a set of servers, an organization can have a hundred cores at its disposal for crunching data without paying very much in licensing fees.

I have been giving some thought to making the Bridge to R run in parallel on a single server as well as across a set of servers using WPS and pbdR or Rmpi. This way, WPS would handle the management between the servers and the data transparently and provide for number crunching at very low cost. God knows we have a few extra multi-core servers lying around here, so it may be an interesting adventure to give this a spin!

My first thought and intention is to make the code backward compatible. Perhaps just add a macro that can be called that contains the information needed to implement running R across cores and on a grid. It could be something as simple as:

%Rconfig(RconfigFile=xyz, RunInParallel=True|False);

The remaining statements in the Bridge to R would continue as they are and the R code would be pushed to the servers based on the information in the RconfigFile. WPS would still collect the output from these jobs and route the appropriate information to the log and listing window as well as the graphics to the graphics viewing window (wrapped in HTML) for users to view their output.
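
To give a feel for the single-server half of this idea, here is a minimal R sketch that uses the parallel package shipped with base R. It is not Bridge to R code and not how %Rconfig would work; the data frame, the column names and the two-worker cluster are all made up for illustration.

```r
library(parallel)

# Hypothetical data: 1,000 rows in four groups.
set.seed(42)
df <- data.frame(group = rep(1:4, each = 250), x = rnorm(1000))

# Split the data into one chunk per group.
chunks <- split(df, df$group)

# Start two local worker processes (an arbitrary choice for this sketch).
cl <- makeCluster(2)

# Summarize each chunk on a worker, then shut the workers down.
results <- parLapply(cl, chunks, function(d) {
  data.frame(group = d$group[1], n = nrow(d), mean_x = mean(d$x))
})
stopCluster(cl)

# Combine the per-chunk results into a single data frame.
summary_df <- do.call(rbind, results)
print(summary_df)
```

Spreading the same work across several servers would need something like pbdR or Rmpi, plus the job management that WPS would supply, none of which is shown here.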

 

Is R Worthy of the Enterprise?

I’ve been a big proponent of R for the last few years and have written extensively on R in this blog as well. A lot of folks have written that they believe R is worthy of being in the Enterprise and I have to say, at this point, I’m just not so sure of that.

My gripe with R is just how slow it seems to be for performing the basics such as descriptive statistics and frequency tables. When you compare the timings for these procedures against WPS or SAS using moderately sized data sets (i.e. 500,000 records), R is left in the dust.

What really caused my reversal in thought towards R is that I started to test the R library SAS7BDAT to read a SAS version 7 data set. I thought it might be a nice addition to the Bridge to R to be able to read a SAS data set directly. As I got into testing the library for performance issues, I was a little surprised by what I discovered. Just reading in a SAS v7 data set that has five variables and 500,000 observations (or records) to perform a simple t-test, WPS was up to 18 times faster. The larger the data set, the faster WPS was relative to R.

I have always heard that R is supposed to be fast because the data frame is held in memory. I also think it has its place in education for learning statistics and data analysis. But the corporate world is another story. Using WPS, I can often blow R out of the water in terms of performance and this is with reading the WPS data set from the hard drive AND performing the computations.
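
If you want to see what the R side of such a comparison looks like on your own machine, a rough sketch along these lines will do. It is not the benchmark described above: the data are simulated in memory rather than read from a SAS data set, the variable names are invented, and the timings will vary by machine.

```r
# Simulate a data set with five variables and 500,000 records.
set.seed(1)
n <- 500000
df <- data.frame(
  id    = seq_len(n),
  group = sample(c("a", "b"), n, replace = TRUE),
  x     = rnorm(n),
  y     = rnorm(n),
  z     = rnorm(n)
)

# Time the basics: descriptive statistics, a frequency table and a t-test.
timing <- system.time({
  desc <- summary(df$x)
  freq <- table(df$group)
  tt   <- t.test(x ~ group, data = df)
})

print(timing["elapsed"])
```

Timing the equivalent steps in WPS gives you the other half of the comparison. Keep in mind that the figures quoted above also include the time to read the data set from disk.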

Personally, I think the strength of R is in the development of algorithms for models and graphics. ggplot2 is absolutely awesome and allows you to do some amazing graphs. But for running production jobs, especially time critical jobs, using WPS for the models when appropriate is a much better solution to the problem.

Don’t forget there’s still time to get into the action to win a Google Nexus 7 Tablet. If you register to take out a WPS evaluation before September 30th, 2012, you will automatically be registered in the drawing for the tablet. Certain conditions apply, so read the earlier blog post for all the details. You can request a WPS evaluation by going to the MineQuest Business Analytics website at the WPS evaluation page.

Bridge to R Study Edition now available

After giving some consideration to creating a Study Edition of the Bridge to R, we finally have put the package together. Starting on Friday, May 18th, you can request a download package for the Bridge to R SE to use to learn and experiment with using WPS.

The Bridge to R SE is limited to reading 1500 records from a WPS dataset and passing those observations on to R. This should be sufficient for testing your R programs running in WPS and to learn the language nuances of the Bridge to R.

For more information and download instructions, go to the Bridge to R SE page. There’s also a short video that shows what the Bridge to R can do for you.

About the author: Phil Rack is President of MineQuest, LLC and has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is a reseller of WPS in North America.