Tag Archives: RSTATS

Submitting R Programs Remotely using Dropbox

One of the great software applications currently available is a product called Dropbox. Dropbox is a downloadable piece of software that lets you access your files from different computers: you drop a file into your Dropbox folder, and Dropbox automatically syncs it to every computer that has access to that folder. The great thing about Dropbox is that it just works, and it’s as smooth as can be.

I’ve been using Dropbox for about two or three months now and thought how great it would be to extend its functionality: place a WPS or R file into a specific folder and have it automatically execute and write the output back into the Dropbox folder. Basically, you would have access to your organization’s server for executing programs while travelling or working onsite.

My experimentation with this is under Windows, and I put together a little application that allows you to remotely submit an R job. On my server, a filewatcher program monitors the Dropbox folder of my choosing, and when it sees a new R program (i.e. one with a .R extension) it fires up R and processes the program. The system writes any output back to the Dropbox folder, so you also have your .lst and .log files to review. You can also write output directly from your program (say, an R data frame you created) by referencing the folder in your program.
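Here is a minimal Python sketch of the polling-filewatcher idea described above. It is not the Drop4R code: it assumes R's Rscript executable is on the PATH, simply polls the folder every few seconds, and writes each program's standard output to a .lst file and its messages to a .log file next to it. All function names are hypothetical.

```python
import subprocess
import time
from pathlib import Path


def find_new_programs(folder, seen):
    """Return the .R files in `folder` whose names are not in `seen`."""
    return [
        path
        for path in sorted(Path(folder).iterdir())
        if path.is_file() and path.suffix.lower() == ".r" and path.name not in seen
    ]


def run_program(path, runner=subprocess.run):
    """Run one R program with Rscript and write its output next to it.

    Standard output goes to <name>.lst and standard error (messages,
    warnings, errors) to <name>.log; this naming is an assumption.
    """
    result = runner(["Rscript", str(path)], capture_output=True, text=True)
    path.with_suffix(".lst").write_text(result.stdout)
    path.with_suffix(".log").write_text(result.stderr)
    return result.returncode


def watch(folder, poll_seconds=5.0, runner=subprocess.run, max_polls=None):
    """Poll `folder` and run each new .R program once.

    max_polls=None means watch forever; a number is handy for testing.
    """
    seen = set()
    polls = 0
    while max_polls is None or polls < max_polls:
        for program in find_new_programs(folder, seen):
            seen.add(program.name)
            run_program(program, runner)
        polls += 1
        time.sleep(poll_seconds)
```

Usage would be something like watch("C:/Users/me/Dropbox/Drop4R"), where the path is hypothetical. A real service would also need to wait until Dropbox has finished syncing a file before running it, and to remember which programs it has already processed across restarts.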

I’ve included a little video of how R and Dropbox can be used to submit R programs on a remote server using a browser and place the output back into a Dropbox folder.

Video: a short (2:30) demonstration of Drop4R

Of course, you don’t have to use a browser to place the files in the Dropbox folder. You can always copy and paste or drag and drop the R program into the Dropbox folder, and the Job Spawner will simply execute it.

I’ve created a small zip file that contains a first draft of an installation guide on how to set up Drop4R on your Windows computers. I’ve made the application freely available, and you can use it without any restrictions.

Links:

Installation Guide: Dropbox Guide

Drop4R Installation File: drop4r.zip

About the author: Phil Rack is President of MineQuest, LLC, and has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is a reseller of WPS in North America.


Bridge to R v2.4.2 Available for a 60 Day Trial

The latest Bridge to R (version 2.4.2) is now available for download on an extended 60 day trial. The Bridge to R allows you to execute R syntax from within your WPS or SAS IDE and return the log and listing files from R into the SAS or WPS log and listing windows. The Bridge eliminates the need to license SAS/IML Studio to access R from SAS. This version of the Bridge also brings SAS back into the picture: both platforms, WPS and SAS, are supported.

Requirements

The Bridge has minimal requirements. They are:

· WPS 2.4.x or SAS 9.2.x

· Windows Desktop Operating System

· R versions 2.7.x through 2.11.0.

Note that R release 2.11.0 is fairly new, and not all of the packages on CRAN have been brought forward to it yet. Specifically, the Hmisc package still has not been released for 2.11.0, and some of our example programs rely on the Hmisc library.

The Bridge to R has also been tested on the x64 R build (i.e. the 64-bit alpha build for Windows), and so far it seems to work fine with that release as well.

Download

You can download the Bridge to R by going to the MineQuest website at:

http://minequest.com/BridgePreview.html

From the above web page, you can download the Bridge for your specific installation (i.e. WPS or SAS) as well as watch a torturous video of what the Bridge is able to do. At least the video is only six minutes long, and it does provide the background you need to decide whether this is something you want to add to your software portfolio.

Installation

Place the Bridge2R.zip file on your desktop and unzip the package. The structure and contents of the folder should be:

\Bridge2R
\Bridge2R\SASMACR.WPCCAT
\Bridge2R\Bridge to R v242.pdf
\Bridge2R\samples\

There’s also a short installation and user guide that you can read before downloading the software; the same guide is included in the zip file.

If you have questions about installation, please visit the support forum that we just set up to help answer these kinds of questions.


Notes on the Next Release of the Bridge to R

Thought I’d write a little about what MineQuest has been working on for the next release of the Bridge to R. We just wrapped up the programming for the latest release and I’m pretty happy with what is right around the corner for our current users and new users as well.

First, we’ve added the ability to export WPS data sets directly into your R workspace as R data frames. We’ve always supported taking a single WPS data set into a data frame, but this release makes it easy to export multiple data sets into R. This required a lot of effort and is based on requests from numerous customers who are using the Bridge.

Originally, I envisioned that people would use the Bridge in much the same way they would use a WPS or SAS procedure. They would create a data set that contained all the variables they needed for a specific statistical routine in R and use that for their analysis. But I was easily convinced that this was short-sighted because it didn’t allow the analyst to move all the data sets needed for such things as matrix operations into the R workspace.

The other thing that convinced me this was necessary is a book I recently became aware of, "A Practical Guide to Geostatistical Mapping" by Tomislav Hengl. Tomislav writes about mapping, and to create maps you need multiple data sets: one that contains the data to be displayed and one that contains the coordinates. I eventually want to provide some mapping data sets for the Bridge to R so that one can create maps with it, so the ability to read multiple WPS data sets is necessary.

Exporting WPS data sets to R is accomplished by specifying the names of the WPS data sets in the %Rstart() clause. Here’s an example:

%Rstart(DataFormat=xpt, data=a b c, rGraphicsViewer=No)

The data sets a, b and c are automatically exported to R data frames without any other commands or programming.

The other improvement in the next release of the Bridge to R is that you can import multiple data frames from your R session to WPS. This is easily done and just requires the analyst to list the R data frames on the Import= clause of the %Rstop macro to bring all the frames back into WPS. For example:

%Rstop(import=dataframe1 dataframe2 dataframe3);

where dataframe1, dataframe2 and dataframe3 are the names of the R data frames that you want to import back into WPS. This will create three WPS data sets named dataframe1, dataframe2, and dataframe3, respectively.
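To make the round trip above more concrete, here is a hypothetical Python sketch of the R code a bridge like this could generate behind the scenes: each data set named in data= is read from an XPORT file with read.xport() from R's foreign package, the user's R code runs, and each frame named in import= is written out with write.csv() for WPS to read back. This is not how the Bridge to R is documented to work; the folder, the one-file-per-data-set layout and the function names are all assumptions.

```python
def export_statements(datasets, folder):
    """R lines that load each exported XPORT file as a data frame (hypothetical layout)."""
    lines = ["library(foreign)"]
    for name in datasets.split():
        lines.append(f'{name} <- read.xport("{folder}/{name}.xpt")')
    return lines


def import_statements(frames, folder):
    """R lines that write each requested data frame to CSV for WPS to read back."""
    return [
        f'write.csv({frame}, "{folder}/{frame}.csv", row.names = FALSE)'
        for frame in frames.split()
    ]


def build_r_program(data, user_code, import_, folder):
    """Wrap the user's R code between the generated export and import steps."""
    return "\n".join(
        export_statements(data, folder)
        + [user_code.strip()]
        + import_statements(import_, folder)
    )


if __name__ == "__main__":
    # Mirrors data=a b c on %Rstart and import=dataframe1 on %Rstop.
    print(build_r_program("a b c", "dataframe1 <- merge(a, b)", "dataframe1", "C:/bridge/work"))
```

The generated script would then be handed to R in batch mode, with the CSV files converted back into WPS data sets afterwards.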

We’ve also added more error checking to the Bridge. We now catch errors when using the XPORT transport format. One problem with XPORT as a transport format is that it’s limited to eight-character variable names. We now examine all the WPS data sets before they are exported to make sure that the variable names are eight characters or fewer, and if not, we throw an exception, report it and don’t try to process the R code, because we already know it won’t execute.
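The pre-export check can be sketched in Python, assuming the variable names of each data set are already available as a list (how the Bridge actually reads them from WPS is not shown, and the function names are hypothetical). The point is that the whole export is rejected before any R code runs:

```python
XPORT_MAX_NAME = 8  # the eight-character variable name limit described above


def long_variable_names(datasets, limit=XPORT_MAX_NAME):
    """Return {data set: [names longer than limit]} for every offending data set."""
    problems = {}
    for dataset, variables in datasets.items():
        too_long = [name for name in variables if len(name) > limit]
        if too_long:
            problems[dataset] = too_long
    return problems


def check_before_export(datasets):
    """Raise before any R code runs if a data set cannot go through XPORT."""
    problems = long_variable_names(datasets)
    if problems:
        details = "; ".join(
            f"{dataset}: {', '.join(names)}" for dataset, names in sorted(problems.items())
        )
        raise ValueError(
            f"variable names longer than {XPORT_MAX_NAME} characters: {details}"
        )
```

Reporting every offending name in one message lets the analyst fix all of them at once instead of rerunning the job once per error.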

By the way, the reason we support the transport format is due to customer requests from those in the biostats area. They wanted to make sure that they can pass a possible data processing audit and they felt much more comfortable with the XPORT format than passing data via a CSV format.

So what’s left? For the next release of the Bridge to R (due by the end of April 2010), we are updating the documentation and adding more sample R programs that demonstrate how to use the Bridge. We are adding another half dozen R graphics sample programs and a few more statistical programs as well.

I’m very confident that the Bridge to R can complement WPS by allowing analysts to do just about any kind of graphics or statistical procedure from within the WPS IDE. With the low cost of the Bridge (free if you license WPS from MineQuest) and the use of open source R, you can replace SAS/IML, SAS/GRAPH and many of the SAS statistical modules and have a state-of-the-art analytics platform.


Graphics with the Bridge to R

One of the nice features of the Bridge to R for WPS users is how well it complements WPS. For example, WPS supports the graphics procedures Gchart and Gplot. Gchart can create vertical and horizontal bar charts as well as pie charts. The Gplot procedure allows you to create x, y plots (i.e. scatter plots) for displaying two dimensional data. Granted, bar charts and pie charts account for most of the graphics that are used in business, but what if you want to create graphs and plots that aren’t currently supported by WPS?

This is where the Bridge to R makes it easy to access the graphics capabilities in R. For those of you who have not delved into this area before, it’s really pretty simple and the quality of the graphics is amazing.

I’m going to show you some code examples that take data from WPS, load it into R using the Bridge to R and create a graph or plot. We will create four different plots in the examples below: a simple histogram, two contour plots and a perspective plot. So let’s get started!

The code below starts by creating 1500 observations of the three variables x, y and z. We interface with the R system through the %Rstart statement. We then load the MASS library, attach the plotdata data frame so that its columns can be referenced by name, and call the histogram plotting routine truehist(y). The last statement, title(), simply applies a title to the plot in the desired color.

Program Histogram.sas

data plotdata;
  do ii=1 to 1500;
    x = rannor(0);
    y = rannor(1);
    z = 1000 * rannor(0);
    output;
  end;
run;

*--> histogram;
%Rstart(csv,plotdata,GRAPHWINDOW);
datalines4;

library(MASS)
attach(plotdata)

truehist(y)                  # create histogram
title(main="Histogram of y", # title the plot
      col.main="blue",
      font.main=4)

;;;; 
%Rstop;

 

Output from WPS program Histogram.sas (figure: histogram of y)

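What if you want to create a contour plot? SAS has a procedure called Gcontour that WPS doesn’t support at this time, but with the Bridge to R you can easily create contour plots. In the example below, we use the same data set as before and create three plots. In order of creation, they are a simple contour plot, a filled contour plot and a perspective plot. Because x, y and z are scattered points rather than a grid, the program first calls interp() from the akima package to interpolate them onto a regular grid, which is the form contour(), filled.contour() and persp() expect.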

Program Contour.sas

data plotdata;
  do ii=1 to 1500;
    x = rannor(0);
    y = rannor(1);
    z = 1000 * rannor(0);
    output;
  end;
run;

*--> contour plots;
%Rstart(csv,plotdata,GRAPHWINDOW);
datalines4;

library(akima)
attach(plotdata)

surface <- interp(x,y,z)

contour(surface) # create the contour plot
title(main="Contour Plot of x, y, z", col.main="blue", font.main=4)

filled.contour(surface) # create filled contour
title(main="Filled Contour Plot of x, y, z", col.main="blue", font.main=4)

persp(surface,      # create perspective plot
      col="green", expand=0.3)
title(main="Perspective Plot of x, y, z", col.main="blue", font.main=4)

;;;; 
%Rstop;

 

Output Generated by Contour.sas (figures: Contour Plot, Filled Contour Plot, Perspective Plot)

So you can see just how easy it is to more fully flesh out your WPS graphics with the Bridge to R. We’ve only touched on the power of R when it comes to graphics but this gives you an idea of the flexibility that R and the Bridge can offer the data analyst. Remember, the Bridge to R is offered at no cost to those customers who license WPS from MineQuest!



Importing and Exporting R data frames with the Bridge to R

Even though the weather has been spectacular, I spent the better part of the weekend indoors watching the NCAA Basketball Tournament and adding a new feature to the Bridge to R. I think anyone who uses the Bridge will find it very useful: the ability to read an R data frame into WPS.

We’ve always provided the ability to create a data frame within the Bridge, but we have always struggled with how to read in a frame that was created in R. Part of the dilemma is keeping an arm’s length distance from R so as not to infringe on copyright or overstep the license agreement.

The other problematic aspect is catching all the errors when something goes wrong. Eventually, a data frame won’t exist or will be misspecified, and things can go south quickly.

What I decided to do is extend the %Rstop macro, which completes and executes the R interface, with an “Import=” option. When you list data frames in the “Import=” clause (i.e. ones that you created in your R session), the Bridge to R automatically looks for those frames, reads them in and converts them to WPS data sets for you.

Below is a short code example using the lm (linear model) function in R. If you’re an R programmer, you know that almost everything in R is an object, so what you need to do is create a data frame from the object. In the example, we create three data frames, coeff, resid and fitted, which hold the model coefficients, the residuals and the fitted values, respectively.

%Rstart(csv,census,NOGRAPHWINDOW);
datalines4;

  attach(census)
  census.lm <- lm(pop2 ~ year+lpop, data=census)
  summary(census.lm)
  coeff = data.frame(coef(census.lm))
  resid = data.frame(resid(census.lm))
  fitted = data.frame(fitted(census.lm))

;;;;
%Rstop(import=coeff resid fitted);

The last line of the code above shows the import clause in the %Rstop macro. By listing the data frames that you created in R (using exactly the same capitalization, since R names are case-sensitive), the Bridge can read in the data frames and create the corresponding WPS data sets.

This new enhancement will be available in the next release of the Bridge to R. In the next week, we’ll create a video or two on using GGobi from the Bridge, as well as one showing how to import data frames.



Looking Down the Road

We’ve had the Bridge to R out for about a week now, and so far the feedback has been very positive. The Bridge is pretty stable, and I think it’s fairly easy to use once you set up R and install the Bridge itself.

Which brings us to what we may want to include in the next release. We’re thinking about including MPExec and a nifty little macro called WPS2XML, which allows you to create an XML file as well as generate a schema (DTD or XSD) from your WPS data set. The code for both utilities is already written and just requires testing in a more stressful environment.
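As an illustration of what a data-set-to-XML utility does, here is a hypothetical Python sketch (not the WPS2XML macro) that writes rows as one XML element per observation and generates a matching DTD. The TABLE root element and the element layout are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET


def to_xml(table_name, rows):
    """Serialize rows (dicts of column -> value) as one element per observation."""
    root = ET.Element("TABLE")
    for row in rows:
        observation = ET.SubElement(root, table_name)
        for column, value in row.items():
            # Missing values become empty elements.
            ET.SubElement(observation, column).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")


def to_dtd(table_name, columns):
    """A DTD describing the layout produced by to_xml()."""
    lines = [
        f"<!ELEMENT TABLE ({table_name}*)>",
        f"<!ELEMENT {table_name} ({', '.join(columns)})>",
    ]
    lines += [f"<!ELEMENT {column} (#PCDATA)>" for column in columns]
    return "\n".join(lines)
```

Generating the schema from the same column list as the data keeps the two in step; an XSD generator would follow the same pattern but could also carry numeric versus character types.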

So, what is MPExec? It’s a very robust utility that we wrote for WPS customers last year. MPExec stands for Multi-Processor Execution, and it allows WPS users to thread their programs so that they can run multiple parts of a program at the same time. On a multi-core desktop or server, this can dramatically reduce a program’s execution time, depending on how well the program can be threaded.

Most programmers (and SAS programmers especially) think in a top-down fashion when designing their programs. For example, you may have multiple steps to extract and clean data from a database, another set of steps that access data in a transport file that was created on the mainframe and also needs to be cleaned and sorted, and finally, some historical data that is sitting in another database on another server.

None of the three steps outlined above share any data, so there is no reason they have to run sequentially. Why not run them all at the same time on your multi-core desktop or server and save time? That’s exactly what MPExec allows you to do.
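The idea behind MPExec can be illustrated with a short Python sketch that launches each independent step as its own operating-system process and waits for all of them. This is not MPExec itself, and the command lines are stand-ins: a real setup would launch WPS programs in batch mode, using whatever command line your installation provides.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_job(command):
    """Run one independent job to completion and return its exit code."""
    return subprocess.run(command, capture_output=True, text=True).returncode


def run_in_parallel(commands, max_workers=None):
    """Start every job at once and wait for all of them.

    Each job is a separate OS process, so the jobs can use separate cores;
    the threads here only wait on those processes.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_job, commands))


if __name__ == "__main__":
    # Stand-ins for the three independent steps (database extract,
    # mainframe transport file, historical data); each just prints a line.
    steps = [[sys.executable, "-c", f"print('step {n}')"] for n in (1, 2, 3)]
    print(run_in_parallel(steps))
```

Because the steps share no data, the exit codes come back in the same order as the commands regardless of which step finishes first, and with enough cores the total run time approaches that of the slowest step rather than the sum of all three.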

I’m interested in hearing what other utilities users would like to see to enhance and expand their use of WPS. Currently, the Bridge to R and the accompanying utilities are only available on Windows platforms. Is there a need or interest to have them also run on Linux/Unix/Solaris? Of course, we’re always interested in hearing ideas about how we can expand our utilities to include R in a more seamless fashion as well.


Linux Port of the Bridge to R

After some trial and error, I’ve finally been able to get WPS on Linux to talk, at arm’s length, to the statistics package R. I’m pretty excited about this because it opens new avenues for the Bridge to R.

Given my timeline, I hope to have the Bridge to R on Linux (x86) completed and tested by November 5th, 2009. If any WPS sites running Linux are interested in playing with a beta release, let me know and we will arrange for you to get a copy of the compiled macro catalog.

I’m making some assumptions here, but I imagine that the same code that implements the Bridge for Linux will also be able to run under Sun Solaris (x86) and Mac OS X. Compatibility across different Linux (x86) distributions is something we will test down the road.

I do like the idea of running Linux as a 64-bit OS so we can run R programs that require a large memory address space. As 64-bit platforms become common, I think this will become the standard installation for almost all servers and many desktop machines.
