
Thoughts on Mapping and Geocoding in WPS

I’ve been working on a framework for a set of new macros to be included in the Bridge to R that I think will be very useful for many WPS users. Coming from a background in demography, I’ve always been partial to maps and charts. There’s a plethora of open source products out there, as well as APIs from Google, Bing, etc., that allow a user to create some pretty darn nice maps.

I recently became aware of a new R library by Hadley Wickham and David Kahle called ggmap. Professor Wickham has created some phenomenal software for the open source R system. Hadley created ggplot2, which is truly the standard for graphics in the R world. He has also written a book on ggplot2 that is well worth purchasing and is available on Amazon.

For most of us in the WPS world, we are rather limited to the native graphics available in the product. That limitation was mostly overcome with the Bridge to R that we created a few years ago. At this juncture, I see two things that still need to be addressed: geocoding and mapping.

First, I want to discuss geocoding. Geocoding has always been a strange process that is 10x more convoluted than it needs to be. Most of us want to provide an address from our data set and get back the latitude and longitude for that address. A much smaller group of users wants to enhance their data with latitude and longitude as well as zip code, etc. Either way, using external services from commercial companies to do this is often expensive. This is especially true for smaller data sets, where there is a standard fee plus a charge per name.

The second aspect worth discussing is the availability of mapping software and the associated cost. Some of these programs are expensive, to say the least. Unless you intend to make a career out of creating maps (and I do not), you need to look at other alternatives to keep costs down.

So, long story short, it’s worth investigating how WPS can interface with open source and free solutions for map making. I have briefly looked at Google, Bing and OpenStreetMap. Each has its pros and cons, but I want something easy to use that produces nice-looking maps, and that’s the driver behind my development.

Professors Wickham and Kahle have done a lot in this area to address the shortcomings of R for mapping. I could not approach their creative genius and determination in creating ggmap, but I can create an interface that makes ggmap easier to use for WPS users. So my summer adventure is to create a clean interface using WPS and the Bridge to R so that WPS users can create some extraordinary maps with ggmap.

About the author: Phil Rack is President of MineQuest Business Analytics, LLC, located in Grand Rapids, Michigan. Phil has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is an authorized reseller of WPS in North America.

What is in Your BI Stack?

Earlier this week, I was sitting around with a few friends at a place called the Tilted Kilt (not a bad place either), talking about what makes up their analytics platform in terms of the software they use on a regular basis. One fellow works for a major finance house here in town, the second person works for an educational consortium, and the third works for an advertising agency of about 100 employees.

Pretty much as expected, both the finance and the education employees are stuck with what is “provided” by their employer. In other words, the IT group does not allow them to add any software to the “standard” analytics desktop (whatever that means). The software for these two folks was pretty straightforward: an expensive stat package (SAS) and Excel.

Lynn, who works for the ad agency, was fairly unique in my view because of the diversity of tools at her disposal. She had the standard Microsoft Office install, but also SPSS, Stata, RapidMiner and R, as well as a data visualization package whose name I simply cannot remember right now.

I understand that some of the tools Lynn uses were chosen because they are open source and cost effective, but she’s also one of the smartest data analysts I’ve known over the last six or seven years. It got me thinking about what I use most often, and currently my BI stack consists of:

WPS – a SAS language compatible software application

R – open source statistics and graphics

Bridge to R – interface into the R system for WPS users

Excel – spreadsheet

Ggobi – data visualization

Google Refine – data cleansing

Looking at my list, three of the six software applications are open source.

I’m curious to hear from others about what constitutes your BI stack and whether your organization allows you to augment it with tools of your choice. I’m especially interested in hearing how your company deals with open source software and whether you think that having a choice of tools allows you to think outside the box.


Creating Maps with WPS and the Bridge to R

A while back, I demonstrated how you can use the Bridge to R to create almost any graph or plot using WPS in combination with R. I showed how you can create the cowboy hat as well as some basic and not-so-basic charts and plots. One thing I didn’t demonstrate was how to create thematic maps (aka choropleth maps) using the Bridge.

Today, I want to delve into that area a little and provide some programming samples that you can use to create these maps. First, you need a copy of the Bridge to R and WPS (or SAS) to run these demos. On the R side, the maps package must be installed, and the projection= option used in these programs also relies on the mapproj package. Some of the later code also uses the county2000 dataset, available from the downloads section of the minequest.com website.

First, a little background. Thematic mapping is a great way to show how certain attributes vary across political boundaries: for example, how states differ in income tax assessments, or which counties are the most populous in the country. In my opinion, giving your users a map that shows variation across geography is always helpful. R provides a library called “maps” that contains the polygons for drawing thematic maps, along with a way to attach a variable whose variation you want to display across a given geography. I will show how you can use the state and county outlines from R to do just that with the Bridge to R.

To draw a simple outline of the United States using the Bridge, it only takes three lines of R code. For example:

Program 1. Displaying U.S. State Outlines.

    *--> Outline of the United States - by state;
    %Rstart(dataformat=manual, data=, rGraphicsViewer=true);
    datalines4;
    library(maps)   # load the maps package (boundary polygons)
    # draw the state outlines; projection= uses the mapproj package
    map("state", interior = TRUE, projection="polyconic", col="blue")
    title('United States')   # add the title
    ;;;;
    %Rstop(import=);

Map 1. U.S. State Outlines Map.


We can expand on the map above by adding one more call to map() that draws the county outlines inside the state boundaries.

Program 2: Creating State and County Outlines.

    *--> Outline of the United States - by state/county;
    %Rstart(dataformat=manual, data=, rGraphicsViewer=true);
    datalines4;
    library(maps)
    # draw every county, filled in light gray
    map('county', boundary=TRUE,
        interior=TRUE, projection="polyconic", col='lightgray', fill=TRUE, resolution=0, lty=1)
    # overlay the state boundaries in white on top of the counties
    map('state', boundary=FALSE,
        projection="polyconic", col='white', fill=FALSE, add=TRUE, lty=1)
    ;;;;
    %Rstop(import=);

Map 2. State County Outlines.


We can take this one step further and display only the geographic areas we are interested in by passing R the list of regions to draw. In this case, I’ve passed the string containing the regions to R in a macro variable. The Bridge to R can pass macro variables to R, which minimizes typing and the chance of making mistakes.
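Passing a macro variable this way amounts to text substitution: wherever &geogarea appears in the R code, the value from the %let statement has to be pasted in before R sees it, because R itself knows nothing about macro variables. The Python sketch below illustrates that idea only. It is not how the Bridge to R is implemented. It assumes simple &name and &name. references and deliberately ignores nested && references and macro calls. The function name resolve_macro_vars is hypothetical.

```python
import re

# Matches &name or &name. (a trailing dot ends the reference, as in SAS).
_MACRO_REF = re.compile(r"&([A-Za-z_][A-Za-z0-9_]*)\.?")


def resolve_macro_vars(code: str, macro_vars: dict) -> str:
    """Paste macro variable values into code text before it is sent to R.

    Hypothetical helper for illustration. Names are matched case-insensitively
    and unknown references are left untouched. Nested (&&) references and
    macro calls are not handled.
    """
    lookup = {name.lower(): value for name, value in macro_vars.items()}

    def substitute(match):
        # Fall back to the original text when the variable is not defined.
        return lookup.get(match.group(1).lower(), match.group(0))

    return _MACRO_REF.sub(substitute, code)


if __name__ == "__main__":
    r_line = "map('county', region=c(&geogarea), boundary=TRUE)"
    print(resolve_macro_vars(r_line, {"geogarea": "'ohio','michigan','indiana'"}))
    # prints: map('county', region=c('ohio','michigan','indiana'), boundary=TRUE)
```

The quotes around each state name live in the macro value itself, which is why the %let statement in Program 3 quotes every state.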

Program 3. Selecting specific areas to map.

    *--> Great Lakes States by county - How to map a subset;
    %let geogarea = 'ohio','michigan','indiana','illinois','wisconsin';

    %Rstart(dataformat=manual, data=, rGraphicsViewer=true);
    datalines4;
    library(maps)
    # the macro variable geogarea supplies the list of states to draw
    map('county', region=c(&geogarea), boundary=TRUE,
        interior=TRUE, projection="polyconic", col='lightgray', fill=TRUE, resolution=0, lty=1)
    map('state', region=c(&geogarea), boundary=FALSE,
        projection="polyconic", col='white', fill=FALSE, add=TRUE, lty=1)
    title('Great Lakes States')
    ;;;;
    %Rstop(import=);

When we run the code above (Program 3), we get a map that contains just the counties of the Great Lakes states: Ohio, Michigan, Indiana, Illinois, and Wisconsin.

Map 3. Great Lakes States.


So far, I’ve shown you how to (1) create a map, (2) overlay two geographic areas (state and county) on a map, and (3) select a specific subset of the data to display (the Great Lakes states). Let’s move on and see how you can map your own data using the Bridge to R and the R maps library.

The data I’m using to create the county population density map below comes from a zip file that you can download from the MineQuest website. Basically, I’m using WPS to manipulate the data into a format that R can use, and then using the Bridge to R to call the mapping routines that display it.
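The WPS data step in Program 4 does two things. It assigns each county a population class from 1 to 6 (the popval. format), and it builds a lowercase "state,county" key that matches the names in the R maps county database. The Python sketch below mirrors both steps so the logic can be seen outside WPS. pop_class and county_key are hypothetical names. The sketch takes a state name rather than a FIPS code, and unlike the WPS step, a county name containing neither "County" nor "Parish" is kept whole rather than blanked.

```python
from bisect import bisect_right

# Lower bound of each population class in the popval. format (classes 1-6).
CLASS_LOWER_BOUNDS = [0, 25_000, 100_000, 250_000, 500_000, 750_000]


def pop_class(population: int) -> int:
    """Return the class (1-6) for a county population, mirroring PROC FORMAT popval."""
    if population < 0:
        raise ValueError("population cannot be negative")
    # bisect_right counts how many lower bounds are <= population,
    # which is exactly the 1-based class number.
    return bisect_right(CLASS_LOWER_BOUNDS, population)


def county_key(state_name: str, county_name: str) -> str:
    """Build a lowercase 'state,county' key such as 'ohio,franklin'.

    Everything from the word 'County' or 'Parish' onward is dropped,
    as in the WPS step. A name containing neither word is kept whole.
    """
    words = county_name.split()
    for designation in ("County", "Parish"):
        if designation in words:
            words = words[: words.index(designation)]
            break
    return f"{state_name.strip().lower()},{' '.join(words).lower()}"


if __name__ == "__main__":
    print(pop_class(30_000), county_key("OHIO", "Franklin County"))
```

Using class numbers rather than raw counts keeps the R side simple: each class is a direct index into a six-color palette.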

Program 4. Displaying your data in a thematic map.

    libname cntydata 'c:\data';

    /* Group county population into six classes that index the R color palette */
    proc format;
      value popval
        0-24999       = 1
        25000-99999   = 2
        100000-249999 = 3
        250000-499999 = 4
        500000-749999 = 5
        750000-high   = 6;
    run;

    /* Build a 'state,county' key that matches the county names in the R maps database */
    data cntydata(keep=names cntypop);
      set cntydata.county2000;
      length names $ 32 cntypop 8;
      cntypop = pop100;
      if state in('02','15','72') then delete;     /* drop Alaska, Hawaii and Puerto Rico */
      x=indexw(name,'County');                     /* strip the word County ... */
      if x > 0 then cntyname=substr(name,1,x-1);
      y=indexw(name,'Parish');                     /* ... or Parish (Louisiana) */
      if y > 0 then cntyname=substr(name,1,y-1);
      names=trim(lowcase(fipname(state)))||','||trim(lowcase(cntyname));
      format cntypop popval.;
    run;

    *--> Create a US map at the county level showing population density;
    %Rstart(dataformat=csv, data=cntydata, rGraphicsViewer=true);
    datalines4;
    library(maps)   # load the maps library
    popdata <- cntydata   # data frame passed to R by the Bridge

    # define the color map to be used (one color per population class)
    cols <- c("#F1EEF6", "#D4B9DA", "#C994C7", "#DF65B0", "#DD1C77", "#980043")

    # get the county polygon names ("state,county") in the order map() draws them
    mp <- map("county", plot=FALSE, namesonly=TRUE)

    # draw the counties, filling each one with the color for its population class
    map("county", col=cols[popdata[match(mp, popdata$names),]$cntypop], fill=TRUE, projection="polyconic")

    # draw the state outlines
    map('state', boundary=FALSE, projection="polyconic", col='white', fill=FALSE, add=TRUE, lty=1)

    title('U.S. County Population Density')
    ;;;;
    %Rstop(import=);

Map 4. U.S. County Population Density.


Above is the map generated by the code in Program 4. Personally, I think it’s a nice thematic map, and it does show population density by county. It could obviously be enhanced by adding a legend and perhaps a footnote, but I will leave that up to you. The R code that creates the map is only seven lines long, so it could easily be turned into a template for building further maps and for code reuse.
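All of the coloring in Program 4 comes from one expression, cols[popdata[match(mp,popdata$names),]$cntypop]. For each polygon name in mp, match() finds the first row of popdata with the same name, or NA if there is none. The population class in that row then picks a color from cols, using R's 1-based indexing. The Python sketch below mirrors that lookup so it can be seen in isolation. polygon_colors is a hypothetical name, and None stands in for R's NA.

```python
def polygon_colors(polygon_names, records, palette):
    """Return one fill color per polygon, or None where no data row matches.

    records is a sequence of (key, population_class) pairs. Like R's match(),
    the first row with a given key wins. Classes are 1-based, as in R.
    """
    # Build the lookup keeping the first occurrence of each key.
    lookup = {}
    for key, pop_class in records:
        lookup.setdefault(key, pop_class)

    colors = []
    for name in polygon_names:
        pop_class = lookup.get(name)
        # Shift the 1-based class to Python's 0-based palette index.
        colors.append(None if pop_class is None else palette[pop_class - 1])
    return colors


if __name__ == "__main__":
    palette = ["#F1EEF6", "#D4B9DA", "#C994C7", "#DF65B0", "#DD1C77", "#980043"]
    records = [("ohio,franklin", 6), ("ohio,adams", 1)]
    print(polygon_colors(["ohio,adams", "ohio,franklin", "ohio,nowhere"], records, palette))
```

The lookup is driven by the polygon names rather than the data rows. That is why the colors line up with the order in which map() draws the counties, and why a county missing from the data simply gets no color.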

For more information on creating maps with R, visit CRAN at http://cran.r-project.org/web/packages/maps/index.html and download the maps.pdf file.


Version 2.4 of the Bridge to R now Available

It’s been a while since I last posted to my blog. I’ve been very busy on two different projects as well as writing RFPs for WPS license sales. Looking back on the blog, the last post was January 27th. WOW!

So what’s the latest news? A new release of the Bridge to R is now available for WPS users. This release, v2.4, is more robust and can recover from some programming errors that were formerly catastrophic.

But the highlight of Version 2.4 is that it now includes support for Ggobi for interactive data visualization. We’ve simplified calling Ggobi and passing data to it so all you really need to do is type:

%ggobi( datasetname );

If you’re not familiar with Ggobi, I suggest you visit www.ggobi.org and view the tutorials and demos on the website. There are some really nice things that they’ve implemented.

If you’re a WPS user, you will need version 2.4 of WPS to run this release of the Bridge to R, because it makes use of some new features introduced in WPS v2.4.

If you have licensed WPS from MineQuest, and you’ve not received a copy of version 2.4 of the Bridge to R, contact sales@minequest.com and we’ll get a copy right out to you.

The Bridge to R is provided as a free utility from MineQuest only if you purchase your WPS license directly from us. If you’ve licensed a copy of WPS directly from World Programming or from another reseller, the Bridge to R can be purchased for $149 for a desktop license and $499 for a Windows Server license.
