Tag Archives: Open Source

Thoughts on Open Source

[Photo: Unbelievable sunsets in Tucson, AZ]

I recently read an article on Linux.com by Esther Shein titled “The Case for Open Source Software at Work” where she discusses the results of a survey on the use of Open Source software in companies. It’s a pretty interesting read, and it makes the argument that IT workers feel strongly about the importance of having access to source code.

The elephant in the room that is never addressed is how that value is measured by accounting or, say, purchasing. For example, how much of the perceived value in other parts of the company comes simply from viewing the software as free… i.e. cost free?

Individuals are different in their purchasing and use habits. Most individuals I know are driven by price as the first factor, with popularity and completeness as additional considerations.

I can’t recall ever seeing a survey of corporate buyers that separates the desire for free software from the desire for open source code. I imagine that it may be measured internally by some companies, but I would love to see a public survey that addresses the issue.

My own opinion, derived from comparing MS Office with LibreOffice, is that quality and support are the most important drivers for desktop office software. Every large company that I have consulted with uses MS Office. They may use an older version, but they use MS Office.

When I switch my thoughts to analytical software, I see the same thing. Corporations purchase or license software like WPS or SAS because of support and completeness. Documentation is a big factor here too. Individuals who don’t have the financial resources to license analytical software like the aforementioned products gravitate towards free software.

I do grudgingly use R when needed, but I prefer WPS over any other analytical software. It’s based on a language that I have used for 30 years and feel very comfortable with. I find it much easier to debug my code, and I like that if I choose to build a product, I know it will run on Windows, OS X, Linux, and the mainframe.

When I factor in that I can license WPS for a bit over $3 a day on a Windows or Mac workstation (and our competitor charges just north of $41 a day for your first year), I find it compelling to have WPS in my BI stack. I can still use R and Python, but the language of SAS is just too rich and broad to ignore.

About the author: Phil Rack is President of MineQuest Business Analytics, LLC located in beautiful Tucson, Arizona. Phil has been a SAS language developer for more than 25 years. MineQuest provides WPS consulting and contract programming services and is an authorized reseller of WPS in North America.

Complexity and Cost

This past weekend, my wife and I went to a lovely wedding. It was a Catholic wedding that was amazingly short, but the priest gave a very interesting sermon on complexity and cost. He talked about complexity in our lives and the cost, both direct and indirect, that we each experience. One example he gave was smart phones: how expensive they are in terms of the outright cost of service, as well as the indirect cost of how much time we spend playing with and staring at these gadgets at the expense of the people and relationships around us.

His sermon got me thinking. The same is true for software, and business intelligence in particular. The cost of non-open-source software can be pretty high. And the reason for that? Support costs, sales costs, maintenance costs, legal costs, etc…

I often see how companies have purposely fragmented their products so that they can charge more for additional libraries and modules. This has increased cost tremendously for the consumer. Our competitor is a prime example. They send out a local or regional salesperson to chat up the prospect. Often, that person can’t answer the questions the customer has because of the complexity of the product. So they send out a sales engineer or two to visit the prospect, answer those questions, and chat them up a second time. Now we have three people in the mix, each making a hundred grand a year (at least), involved in the sale. The price of the software has to go up for the customer because of all the people involved in the sale.

Here’s another example of added complexity: different pricing for the same product depending on how you use it. Take companies that are B2B in nature. Firms such as actuarial firms, claims processors, advertising agencies, etc… are often labeled as data service providers because they want to use the software in a B2B capacity. Sometimes this is as innocuous as a contract research organization providing statistical analysis. The cost here comes from a different license (think lawyers), people to audit the customer, and employees to enforce the license. It all adds up!

The above examples illustrate everything that is wrong with traditional ways of thinking about software. At MineQuest Business Analytics, we’re proud that we are able to help keep costs down for the customer. We don’t have draconian licensing for companies that are DSPs. We don’t have an organization that is set up to milk and churn the customer for every last cent. What we do have is a company that is dedicated to providing the best service and software at an affordable price.

About the author: Phil Rack is President of MineQuest Business Analytics, LLC located in Grand Rapids, Michigan. Phil has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is an authorized reseller of WPS in North America.

Open Source BI

I ran across an interesting blog post the other day and thought it was worth sharing. The article, Open source BI: New kids on the block, is a set of viewpoints from open source vendors discussing Jim Goodnight’s comment that “We haven’t noticed [open source BI] a lot. Most of our companies need industrial-strength software that has been tested; put through every possible scenario or failure to make sure everything works correctly. That’s what you’re getting from software companies like us – they’re well tested and it scales to very, very large amounts of data.”

I find it an entertaining read and agree with some of what is argued, but I think the bigger point being missed is not whether open source BI will continue to gain momentum and replace commercial BI products, but how open source will become integrated into, and begin working in tandem with, commercial products.


About the author: Phil Rack is President of MineQuest, LLC, and has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is a reseller of WPS in North America.

Business Analytics Predictions for 2011

1. Companies will begin to take a hard look at their present enterprise agreements and, due to economic uncertainty, start renegotiating them in an effort to cut costs. One familiar refrain will be that basic analytical software has become a commodity and they will not continue to pay high annual licensing costs. We will see this trend accelerate dramatically with local and state governments, which are being crushed by looming deficits.

2. Open Source will continue to make inroads in the analytics sphere. R will continue to grow in the enterprise by virtue of its popularity in academic circles. As students enter the work force, they will want to use the software they are most comfortable with at the time.

3. Enterprises will start offering analysis, reports and data to trade partners that show how they can improve their services with each other. This will be a win-win scenario for both organizations.

4. As hardware capability increases, analytical software pricing will become a major concern as businesses will want to use the software on more platforms and in more areas of the company. Linux will be the platform of choice for most of these companies due to low cost and high performance.

5. Desktop analytics, contrary to popular opinion, will continue to dominate the enterprise. The desktop is where the hardcore data analysts live, and it is also where the new algorithms will be developed. Visualization software will also start to become common on the desktop. Businesses that shortchange their analytical development staff with low-powered desktops and small LCD monitors will see less active development by their staffs.

6. We will see enterprises that have invested in specific high-cost analytic languages, and have put rules, reports, and algorithms into production on large servers, either recode to a new language or migrate to compatible and lower-cost languages.

7. The role of innovation will be double-edged. There will be companies and organizations that invest heavily in analytics and see advantages over their competitors. There will also be companies that gain competitive advantage by utilizing their BI stack more effectively, making it available throughout the company.

8. Licensing will continue to hamper companies and organizations, as well as constrain growth, by restricting what companies can provide (reports, data, etc…) to their customers by virtue of being labeled Data Service Providers. Processing third-party data will be a monumental problem for companies due to license issues.

9. The days of processing large amounts of data on z/OS are all but over. I know this has been said before, but there just isn’t growth on that platform. Plus, all the innovation in analytics is taking place on the desktop and smaller servers. Companies will look at moving their analytics to z/Linux and other Linux platforms in an attempt to save money on hardware and software costs.

10. Multi-threaded applications running on the BI stack will be all the rage. As core counts and memory availability continue to expand, the ability to make use of SMP and MPP hardware will be more important than ever.

11. Server pricing based on client access or client counts will begin to decline. Competition for customers will make such pricing ill-advised.

12. The allure of cloud computing will be strong, but with regulatory constraints and privacy laws, plus the fear of losing control of data (i.e. WikiLeaks), the two largest service sectors in the United States, banking and healthcare, will have taken note and will continue to avoid the use of public clouds.

13. Just as in other parts of the economy where we see the creation and bursting of bubbles, in 2011 social media such as Facebook and Twitter will start to be seen as a venue for narcissists and a waste of time for many people. Companies that have invested millions of dollars to “mine” tweets will find such analysis less than helpful given the low ROI, and that kind of analysis will begin to fall out of favor.

14. Mobile applications will be hot. The delivery of analytics on devices such as iPads and other tablets including smart phones will become much more common.

15. Since “flat is the new norm,” cell phone providers will find that the high cost of data plans drives potential customers away, and since Wi-Fi connectivity has become so common (almost everywhere I go there is free Wi-Fi), we will see a decrease in 3G and 4G use, and Wi-Fi-only tablets will dominate. We are already seeing this trend with the Apple iPod touch vs. the iPhone, and now the RIM BlackBerry PlayBook will also be offered in a Wi-Fi-only version.

About the author: Phil Rack is President of MineQuest, LLC, and has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is a reseller of WPS in North America.

Creating Maps with WPS and the Bridge to R

A while back, I demonstrated how you can use the Bridge to R to create almost any graph or plot using WPS in combination with R. I showed how you can create the cowboy hat as well as some basic and not-so-basic charts and plots. One thing that I didn’t demonstrate was how you can create thematic maps (aka choropleth maps) using the Bridge.

Today, I want to delve into that area a little and provide some programming samples that you can use to create these maps. First, you need a copy of the Bridge to R and WPS (or SAS) to run these demos. Some of the later code also uses the county2000 dataset available from the downloads section of the minequest.com website.

First, a little background. Thematic mapping is a great way to show how certain attributes change or vary across political boundaries: for example, how states differ in terms of income tax assessment, or which counties in the country are the most populous. Providing a visual map that helps your users understand variation across geography is always helpful, in my opinion. R provides a library called “maps” that contains the polygons for drawing thematic maps and a means of attaching a variable whose values you want to display over a given geography. I will show how you can use the state and county outlines from R to do just that with the Bridge to R.

To draw a simple outline of the United States using the Bridge, it only takes three lines of R code. For example:

Program 1. Displaying U.S. State Outlines.

    *--> Outline of the United States - by state;
    %Rstart(dataformat=manual, data=, rGraphicsViewer=true);
    datalines4;

    library(maps)   # load the maps package and its boundary data
    map("state", interior = TRUE, projection="polyconic", col="blue")   # draw the map
    title('United States')   # add the title

    ;;;;
    %Rstop(import=);

Map 1. U.S. State Outlines Map.


We can expand on the above map by adding one more line of code which will draw the county outlines inside of the state boundary outline.

Program 2: Creating State and County Outlines.

    *--> Outline of the United States - by state/county;
    %Rstart(dataformat=manual, data=, rGraphicsViewer=true);
    datalines4;

    library(maps)

    map('county', boundary=TRUE,
        interior=TRUE, projection="polyconic", col='lightgray', fill=TRUE, resolution=0, lty=1)
    map('state', boundary=FALSE,
        projection="polyconic", col='white', fill=FALSE, add=TRUE, lty=1)
    ;;;;
    %Rstop(import=);

Map 2. State and County Outlines.


We can take this one step further by selecting only the geographic areas we are interested in displaying, passing just those regions to R as an argument. In this case, I’ve taken the liberty of passing the string containing the regions to R via a macro variable. The Bridge to R can pass macro variables to R to help minimize typing and mistakes.

Program 3. Selecting specific areas to map.

    *--> Great Lakes States by county - How to map a subset;
    %let geogarea = 'ohio','michigan','indiana','illinois','wisconsin';

    %Rstart(dataformat=manual, data=, rGraphicsViewer=true);
    datalines4;

    library(maps)
    map('county', region=c(&geogarea), boundary=TRUE,
        interior=TRUE, projection="polyconic", col='lightgray', fill=TRUE, resolution=0, lty=1)
    map('state', region=c(&geogarea), boundary=FALSE,
        projection="polyconic", col='white', fill=FALSE, add=TRUE, lty=1)

    title('Great Lakes States')

    ;;;;
    %Rstop(import=);

When we run the code above (Program 3), we are presented with a map that contains just the counties for the Great Lakes states: Ohio, Michigan, Indiana, Illinois, and Wisconsin.

Map 3. Great Lakes States.


So far, I’ve shown you how to (1) create a map, (2) overlay two geographic areas (state and county) on a map, and (3) select a specific subset of the data to display (the Great Lakes states). Let’s move on and see how you can map your own data using the Bridge to R and the R maps library.

The data I’m using to create the county population density map below is from a zip file that you can download from the MineQuest website. Basically, I’m using WPS to manipulate the data into a format that R can use, and then, via the Bridge to R, calling the mapping routines to display it.

Program 4. Displaying your data in a thematic map.

    libname cntydata 'c:\data';

    proc format;
      value popval
        0-24999       = 1
        25000-99999   = 2
        100000-249999 = 3
        250000-499999 = 4
        500000-749999 = 5
        750000-high   = 6;
    run;

    data cntydata(keep=names cntypop);
      set cntydata.county2000;
      length names $ 32 cntypop 8;
      cntypop = pop100;
      if state in('02','15','72') then delete;

      x=indexw(name,'County');
      if x > 0 then cntyname=substr(name,1,x-1);

      y=indexw(name,'Parish');
      if y > 0 then cntyname=substr(name,1,y-1);

      names=trim(lowcase(fipname(state)))||','||trim(lowcase(cntyname));
      format cntypop popval.;
    run;

    *--> create a US map at county level showing population density;
    %Rstart(dataformat=csv, data=cntydata, rGraphicsViewer=true);
    datalines4;

    library(maps)   # load the maps library
    popdata <- (cntydata)

    # define the color map to be used
    cols <- c("#F1EEF6", "#D4B9DA", "#C994C7", "#DF65B0", "#DD1C77", "#980043")

    mp <- map("county", plot=FALSE, namesonly=TRUE)

    # draw the county outlines
    map("county", col=cols[popdata[match(mp,popdata$names),]$cntypop], fill=TRUE, projection="polyconic")

    # draw the state outlines
    map('state', boundary=FALSE, projection="polyconic", col='white', fill=FALSE, add=TRUE, lty=1)

    title('U.S. County Population Density')
    ;;;;
    %Rstop(import=);

Map 4. U.S. County Population Density.


Above is the map generated by the code in Program 4. Personally, I think it’s a nice thematic map and does demonstrate population density by county. It obviously can be enhanced by adding a legend and perhaps a footnote, but I will leave most of that up to you. The R code that creates the map is only seven lines long and could easily be made into a template for further expanding the map as well as for code reuse.
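
If you do want to add a legend, here is a minimal sketch of what that might look like, placed just before the ;;;; line in the R block of Program 4. It reuses the cols palette and the six popval classes defined above; the placement, labels, and sizing are my own illustrative choices rather than anything from the Bridge itself.

    # hedged sketch: add a legend keyed to the six popval population classes
    legend("bottomleft",
           legend = c("< 25,000", "25,000-99,999", "100,000-249,999",
                      "250,000-499,999", "500,000-749,999", "750,000+"),
           fill = cols, bty = "n", cex = 0.8, title = "2000 population")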

For more information on creating maps with R, visit CRAN at http://cran.r-project.org/web/packages/maps/index.html and download the maps.pdf file.

About the author: Phil Rack is President of MineQuest, LLC, and has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is a reseller of WPS in North America.

Bridge to R Preview Still Available

Just a quick reminder to those who are interested in the Bridge to R. The free trial preview of the Bridge is still available for download at: http://minequest.com/BridgePreview.html. The trial is time limited and will stop working after June 30th, 2010.

There are two versions available, depending on whether you’re a WPS user or a SAS user. If you’re on the fence about whether R and the Bridge to R are something you want to explore, and you would like to see a short web video on the Bridge, there are additional links as well as installation instructions at the link above.

About the author: Phil Rack is President of MineQuest, LLC, and has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is a reseller of WPS in North America.

Sizing up Grid Computing

There’s a lot of talk, as well as writing, about grid computing and its value to the corporation. I, for one, have always had an interest in the subject and its possible applications. Using or creating a grid has always been one of those things where I have seen limited value in the corporate world, mainly because of how I defined a grid. Historically, I’ve thought of a grid as something that is only good for high-performance number crunching for mathematical or statistical modeling, mostly brute-force computations.

But that has changed as I’ve seen how organizations define a grid in terms of their computing environment. To many of these companies, “the grid” is simply a holistic view of the combined corporate compute power across all the divisions of the company. And that may be a pretty good way of looking at it, too. Consider a company that has numerous WPS server licenses and wants to make sure it is getting the biggest bang for the buck from its software investment. It would make sense to make that computing power available across the enterprise.

Historically, doing this required some expensive software and a fair amount of expert consulting to implement. It was often something found only in large organizations and rarely in more modestly sized companies.

My view, and what I want to write about, is how this has changed and how implementing a grid can now be done in a less costly fashion. Briefly, the major software that has been used for grid environments and scheduling includes LSF and SGE (Sun Grid Engine), among others. Some are commercial and some are open source, so it makes sense to look at your options closely.

Recently, I discovered another grid system called Condor. Condor is available from the University of Wisconsin and is a powerful grid engine. What I like about Condor is that it is available on the most common platforms that I tend to work on and use: Windows, Mac OS X, Linux, AIX, and Solaris. There is a lot of work going on around Condor, and I can imagine that it’s going to continue to grow in breadth and popularity.

Condor is open source software licensed under the Apache 2.0 license, so it is free software that can be used in the development of commercial products. I could see some enterprising company integrating Condor into the desktop with the client portion (i.e. Condor Personal) and including a master with the server software. That would be instant grid computing.

This weekend, I downloaded Condor and installed it on a couple of machines on my network. I limited the installation to Windows boxes for testing. Between reading parts of the manual (960 pages!) and writing test programs, I was able to install and run Condor in about a day. I’m happy to tell you that you can submit WPS programs from a client PC and have them execute on any number of servers without too much hassle. As a matter of fact, you can submit a WPS program to the master controller, disconnect your laptop from the network, go to dinner, and reconnect later to pick up the output for viewing.
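
To give a flavor of what submitting a WPS job looks like, here is a rough sketch of a Condor submit description file. The executable path, file names, and WPS command-line options are assumptions on my part (the options mirror the familiar SAS-style batch switches) and will need adjusting for your own installation:

    # wps_job.sub -- hypothetical submit description file for a WPS batch job
    universe                = vanilla
    executable              = C:\WPS\bin\wps.exe
    arguments               = -sysin report.sas -log report.log -print report.lst
    transfer_input_files    = report.sas
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    output                  = condor_stdout.txt
    error                   = condor_stderr.txt
    log                     = condor_job.log
    queue

You would hand this to the grid with condor_submit wps_job.sub and keep an eye on it with condor_q; the log, listing, and output files come back to the submitting machine when the job completes.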

I’ve only scratched the surface of using Condor to implement a WPS grid environment, but there are a few things I don’t like. There isn’t a GUI for monitoring the grid; everything is done from the command line. This is a product that just cries out for a GUI, and I imagine that someone will create one, but for now this detracts from an otherwise fine piece of software. Update: I have found a web interface for Condor but have not tried it.

About the author: Phil Rack is President of MineQuest, LLC, and has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is a reseller of WPS in North America.

Bridge to R and 64-bit R on Windows

After writing the last blog post, “MultiCores and Software Pricing – the Paradox,” and almost on cue, CRAN released a 64-bit alpha version of R for Windows. I quickly downloaded the 64-bit version and installed it on my 64-bit Windows machine to test it with the Bridge. Guess what? The Bridge to R threw an error. Oops!

The error is a subtle one, but we ended up changing a single line of code in one of the macros to properly quote some text and voila! The Bridge to R initialized and ran the test code. It’s pretty exciting to see R running in 64-bit mode on Windows; issuing the R command “memory.limit()” returns 8195 megabytes. So now we have a large memory address space to use with R on Windows to solve even more memory-intensive programming problems.

At the end of the month, we’re going to release an extended trial of the Bridge to R for WPS users who have not yet had an opportunity to give the Bridge a spin. Normally, we provide a 30-day trial, which is usually sufficient time to evaluate the software. By offering an extended trial, you have the time to really shake out the software beyond just playing with the sample programs.

Once you start using the Bridge with WPS, you will be tickled at how much more work and computing power are available to you. With the Bridge, you have access to a rich library of graphics as well as statistical routines to round out and embellish your WPS software investment. So check back to this blog or the MineQuest website on April 30th to see when and where you can download the Bridge to R for WPS Users.

About the author: Phil Rack is President of MineQuest, LLC, and has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is a reseller of WPS in North America.

MultiCores and Software Pricing – the Paradox

Computing power is increasing dramatically. A recent article I read in PC World announced the availability of a 12-core CPU from AMD. What was interesting to me was the price point: the story stated that the low-end AMD 12-core CPU was priced at $293 per CPU. Samsung has announced that it is starting production of 4GB DDR3 modules that will yield 16GB and 32GB DIMMs for desktops. Slap three 32GB DIMMs in your desktop and you have 96GB of RAM. Perhaps that’s not surprising to you in some ways, but it did raise a lot of eyebrows among quants who are looking to utilize more computing power.

Related to this, a few weeks ago J.D. Long, who runs the blog Cerebral Mastication, made a statement to me about the growing popularity of R. I had commented that two years ago I rarely saw a copy of R on a quant’s desktop at the clients I consult with, but today I see R on 50% of the desktops. J.D. stated, “I suspect R is getting adopted from the ground up, not top down.” The more I think about that statement, the more I think it’s a very astute observation.

But there’s a twist to this story, and that is how software companies, especially those in the database and BI areas, price their software. Most of these companies price their software by the “logical CPU” or by the “core.” They can’t get away with doing that on desktops, but they can on the server, and the cost of such software can escalate quickly on even a mid-size server. I contend that pricing in such a manner is quite risky and can make your software less valuable to your customers.

Let me explain my thoughts. In a way, it’s a paradox. As your software becomes more expensive (I’m talking about when software costs reach five or six figures), you move from being a “partner” to being a risk. Instead of being able to penetrate numerous departments with your software, the cost becomes a barrier to entry into those departments. Also, cost can be such a factor that it’s unrealistic to think a single company will continue to invest in your solutions. It’s the nature of business to avoid risk and to find the most cost-effective solution.

No longer is it true that you need to spend money to make money. What I’m postulating is that many open source solutions are available that are good enough to replace the standard bearers. Talend and Pentaho are two examples in the BI sector that sit on a server. These systems don’t have to be best of breed; they just have to be good enough to operate effectively in the corporate environment.

Is R a replacement for number crunching on the desktop? I believe it is. In the last week, I’ve been exploring graphics in R and how they could complement or replace SAS/GRAPH. I found that I can do a bar chart, a histogram, a contour plot, a filled contour plot, a perspective plot, and even a 3-D plot with only a line or two of code for each plot type, as the sketch below illustrates.
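
To give a flavor of what I mean, here is a rough sketch of that style of one- and two-line base R plotting, using R’s built-in demo data sets (faithful and volcano) rather than any client data; the options shown are just illustrative defaults:

    # histogram of Old Faithful eruption times
    hist(faithful$eruptions, col = "steelblue", main = "Eruption times")

    # contour, filled contour, and perspective plots of the volcano elevation data
    contour(volcano, main = "Contour plot")
    filled.contour(volcano, color.palette = terrain.colors, main = "Filled contour")
    persp(volcano, theta = 30, phi = 30, main = "Perspective plot")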

So this is what I think will happen. In the short term, quants and developers will use R in conjunction with popular BI software. As they become more familiar and confident with their abilities, and with the near-term availability of six- and twelve-core CPUs with 64GB to 96GB of memory, desktop workstations will replace a lot of servers performing number crunching with R. We can see this in the “if you can’t beat them, join them” strategy of companies like SAS and SPSS, which have incorporated R into their products to try to deflect the criticism that they don’t offer cutting-edge statistical technology.

But the bottom line is that open source BI software is maturing, and with vendor pricing on servers based on logical CPUs, these companies have effectively killed the goose that laid the golden egg.

About the author: Phil Rack is President of MineQuest, LLC, and has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is a reseller of WPS in North America.

Notes on the Next Release of the Bridge to R

Thought I’d write a little about what MineQuest has been working on for the next release of the Bridge to R. We just wrapped up the programming for the latest release and I’m pretty happy with what is right around the corner for our current users and new users as well.

First, we’ve added the ability to export WPS data sets directly into your R workspace as R data frames. We’ve always provided support for taking a single WPS data set into a data frame, but this release makes it easy to export multiple data sets into R. This actually required a lot of effort and is based on requests from numerous customers who are using the Bridge.

Originally, I envisioned that people would use the Bridge in much the same way they would use a WPS or SAS procedure: they would create a data set that contained all the variables they needed for a specific statistical routine in R and use that for their analysis. But I was easily convinced that this was short-sighted, because it didn’t allow the analyst to move all the data sets needed for such things as matrix operations into the R workspace.

The other thing that convinced me this was necessary is that I recently became aware of a book called "A Practical Guide to Geostatistical Mapping" by Tomislav Hengl. Tomislav writes about mapping, and to create maps you need multiple data sets: one that contains the data to be displayed and one that contains the coordinates. I eventually want to provide some mapping data sets for the Bridge to R so that one can create maps using the Bridge, so the ability to read multiple WPS data sets is necessary.

Exporting WPS data sets to R is accomplished by specifying the names of the WPS data sets in the %Rstart() clause. Here’s an example:

%Rstart(DataFormat=xpt, data=a b c, rGraphicsViewer=No)

The data sets a, b, c are automatically exported to R dataframes for you without any other commands or programming.

The other improvement in the next release of the Bridge to R is that you can import multiple data frames from your R session to WPS. This is easily done and just requires the analyst to list the R data frames on the Import= clause of the %Rstop macro to bring all the frames back into WPS. For example:

%Rstop(import=dataframe1 dataframe2 dataframe3);

where dataframe1, dataframe2 and dataframe3 are the names of the R data frames that you want to import back into WPS. This will create three WPS data sets named dataframe1, dataframe2, and dataframe3, respectively.
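
Putting the two improvements together, here is a minimal sketch of what a round trip might look like. The data set and data frame names (a, b, c, and result) are purely illustrative, and I am assuming the %Rstart call is terminated with a semicolon as in the earlier map programs:

    *--> create three small WPS data sets to send to R;
    data a; x=1; run;
    data b; y=2; run;
    data c; z=3; run;

    %Rstart(DataFormat=xpt, data=a b c, rGraphicsViewer=No);
    datalines4;

    result <- cbind(a, b, c)   # the exported data sets arrive as R data frames

    ;;;;
    %Rstop(import=result);

    *--> result now exists as a WPS data set;
    proc print data=result; run;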

We’ve also added more error checking to the Bridge. We now catch errors when using the XPORT transport format. One problem with using XPORT as a transport format is that it’s limited to eight-character variable names. We now examine all the WPS data sets before they are exported to make sure that the variable names are eight characters or less in length; if not, we throw an exception, report on it, and don’t try to process the R code because we already know it won’t execute.
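
For readers curious what that kind of check looks like, here is a minimal sketch of the idea. This is my own illustration, not the Bridge’s actual code; it assumes WPS’s PROC SQL exposes the DICTIONARY.COLUMNS table the way SAS does, and MYDATA is a hypothetical data set name:

    /* flag any variable names longer than 8 characters before an XPORT export */
    %let longnames=;
    proc sql noprint;
      select name into :longnames separated by ' '
      from dictionary.columns
      where libname='WORK' and memname='MYDATA' and length(name) > 8;
    quit;

    %macro check_xpt;
      %if %length(&longnames) > 0 %then
        %put ERROR: These variable names exceed 8 characters: &longnames;
    %mend check_xpt;
    %check_xpt;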

By the way, the reason we support the transport format is due to customer requests from those in the biostats area. They wanted to make sure that they can pass a possible data processing audit and they felt much more comfortable with the XPORT format than passing data via a CSV format.

So what’s left? With the next release of the Bridge to R (due by the end of April 2010), we are updating the documentation and adding more sample R programs that demonstrate how to use the Bridge. We are adding another half dozen R graphics sample programs and a few more statistical programs as well.

I’m very confident that the Bridge to R, when used with WPS, can complement the WPS system by allowing analysts to do just about any kind of graphics or statistical procedure, all from within the WPS IDE. With the low cost of the Bridge (free if you license WPS from MineQuest) and the use of open source R, you can replace SAS/IML, SAS/GRAPH, and many of the SAS statistical modules and be state-of-the-art on your analytics platform.

About the author: Phil Rack is President of MineQuest, LLC, and has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is a reseller of WPS in North America.