Utilities for Reading and Writing Excel

Here at MineQuest Business Analytics, we use Windows desktops and servers, Linux servers and Apple’s OS X on the desktop. As a Value Added Reseller, I can tell you that a lot of WPS applications run on, and are being developed for, these platforms. Mainly because of hardware and software pricing (commodity x86 CPUs), most of our WPS sales involve Windows and Linux.

One issue that I run into is that reading an Excel file is a piece of cake on Windows but not so easy on OS X or Linux. It seems that Microsoft doesn’t provide an ODBC driver for reading Excel on OS X. That does seem strange since they have an Office product that runs on OS X.

We have developed utilities to read and write Excel files on the three platforms mentioned above. We are currently writing the documentation, which will take perhaps another week. MineQuest will release these utilities to its customers as a free add-on just for ordering their WPS licenses through us. If you procured your WPS licenses through another reseller or directly from World Programming Ltd, there is a small one-time charge to acquire these macros.

We will also provide the source code to these two utilities (ReadExcel and WriteExcel) so that developers can enhance and modify the code themselves. The support for the product will only be to update code that was developed by others to enhance the product.

If you want to use the utilities in a product that you or your company will be selling, then you will be required to have a commercial license. There is a small charge for a commercial license so that you can redistribute the product in your own application.

Note that these utilities are not designed to work in SAS. They are specific to WPS.

The utility to read an Excel workbook is a simple macro call.

          %ReadExcel( data = mydataset,
                      Workbook = "c:\temp\testbook1.xlsx",
                      Sheet = "sheet1");

Where

Data is the name of the WPS/WPD data set you want to create.
Workbook is the name of the Excel Workbook you want to read the data from.
Sheet is the name of the sheet in the workbook from which you want to read the data.

The utility to write an Excel Workbook is just as simple.

          %WriteExcel( data = a,
                       Workbook = "c:\excelutils\data\testworkbook.xlsx",
                       Sheet = "sheet1",
                       Datefmt = mm/dd/yyyy,
                       javamemsize = 500m,
                       Replace = TRUE);

Where

   Data is the WPS/WPD data set that you want to write out to Excel.
   Workbook is the name and location of the workbook you want to create.
   Sheet is the name of the sheet in the workbook that you want to create.
   Datefmt is the format you want applied to the date fields.
   Javamemsize is the amount of memory you want to allocate to java (optional).
   Replace is whether you want to delete the existing workbook before creating the new workbook.

What I like about these two utilities is that you can use the exact same syntax across all three OS platforms to read or create an Excel Workbook. If you are creating or already have a product (e.g. a vertical market application) in WPS, you can open up your product to other platforms without changing any code.

About the author: Phil Rack is President of MineQuest Business Analytics, LLC located in Grand Rapids, Michigan. Phil has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is an authorized reseller of WPS in North America.


Some thoughts on a rainy Monday

The more I use Linux, the more I come to understand just how much I can do with it. As a matter of fact, I could easily do without Windows and switch 100% of the way over to Linux if I wanted. The desktop(s) and business applications have really gotten that good.

Windows 8 just soured me on the whole MS ecosystem. When they bolted the Metro interface onto a server OS, that was the last straw for me. Whoever made the decision to strap a touch interface onto a server should be let go. Shown the door. Asked to leave…

I have Apple hardware here in the office, and it runs well, but I just have not been able to embrace it like so many others have. Apple makes some fine hardware and there’s a load of support for Office productivity applications as well as analytical apps. WPS runs quite well on OS X, as does R. As a matter of fact, I see a lot of R users who work on OS X as their preferred platform.

But Linux, and specifically Ubuntu 12.04 and 14.04, has been especially good. I don’t have memory issues when I run large simulations in R that require a lot of RAM. With Windows, that is often a problem: you try to allocate a large block of memory and there isn’t enough contiguous memory to hold a large array, vector or data frame. Memory management is significantly different under Linux than under Windows.

NVIDIA’s CUDA framework seems to be used predominantly on Linux rather than Windows. I’m not sure why that is, to be honest.

I’ve been reading a lot of articles stating that MS is working feverishly to get Windows 9 out the door. No doubt (at least in my mind) it has to do with the terrible Metro interface and people staying away in droves. Of course, you can slap Start8 by Stardock on Windows 8 and it makes it usable by restoring the start button, and kudos to Stardock for doing such a thing, but I still can’t find a way to embrace MS on the desktop any longer.

An interesting phenomenon that I have been witnessing is how much analytical and scientific development has been happening over the years on the Linux platforms. There are a lot of tools out there for Linux that are helpful if you are a data scientist or working with “BIG DATA”. My experience in reselling WPS is that there is at least as much interest in using Linux on servers as in running Windows servers. Cost is one factor but performance is also a factor. Linux often outperforms Windows Server dollar for dollar and CPU second for CPU second.


Post Installation Steps for WPS Workstations

We recently wrote a short technical document on a set of post installation steps that MineQuest Business Analytics recommends after you install WPS on your workstation. We are often asked what needs to be done after WPS is installed to get the greatest performance out of WPS without too much hassle.

The document walks you through modifying your WPS configuration file, moving your work folder to another drive, why you want to install R (for using PROC R of course!), creating an autoexec.sas file, turning on write caching and a few other pointers. You don’t need to do all of the suggestions; after all, they are just suggestions, but they are useful modifications that will enable you to get more out of WPS on your workstation.

You can find the document “Post Installation Steps for WPS Workstations” in the Papers Section of the MineQuest website.


WPS Release 3.1.1 now Available

For those who are unaware, World Programming had a small release this month, version 3.1.1. This is mostly a maintenance release but it does include PROC CORRESP.

A few other things of interest: v3.1.1 adds the NOSPARSE and OUTEXPECT options in PROC FREQ as well as the RAND function. Finally, PROC LOGISTIC now supports the LINK=GLOGIT option.

MineQuest Business Analytics strongly suggests that you upgrade your existing version of WPS to version 3.1.1 to make sure you have all the maintenance and stability enhancements of the current release.


WPS for Workstations

In the last few weeks, we put together a document that describes the World Programming System for workstations and desktops. The document describes some of the licensing behind WPS and what procedures and database engines are supported.

If you are considering a WPS solution and want some detailed background on the product before purchasing a WPS Workstation license, this document should help.

You can download the Product Overview from our website by clicking the link below.

Product Overview – WPS for Workstations (1.02MB PDF)


A Summer Project

One of my summer projects is building and performance tuning a relatively inexpensive analytics server. Many of the parts that are being used have been scavenged from another server or two that have been retired. One thing I want to do this summer is report on what I have discovered in performance tuning a modest server.

The server consists of a six-core AMD processor and 16GB of RAM to start out with. I would like to experiment with different combinations of RAM, hard drives, hard disk controller cards and perhaps an SSD or two. The OS will be Linux, Ubuntu 14.04 specifically.

My baseline build has just two work drives in RAID-0 that use the SATA 3 ports on the motherboard. I will use the Workstation Performance Assessment Program that I wrote about back in 2012. I’ve slightly modified that program so that it doesn’t spew output into the listing, with the exception of the actual performance benchmark.

One thing I have already learned is that you need to make sure that you have the Write Cache enabled. In Ubuntu, you would do this by going to Disks and clicking on the options button at the top right of the dialog box and then selecting Drive Settings. Simply select the Write Cache and click on Enable Write Cache. You will need to do this for each disk in the RAID array.

[Screenshot: ubuntu_disk_cache]

When I enabled the write cache, my timings for the PROCs and data steps that ran against data sets on the work array dropped 35%. That’s a big improvement!
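If you want to put a number like that on your own system, one option is to total the step timings from a log captured before and after the change. Here is a minimal Python sketch of that idea. It is not part of WPS or of the benchmark program; it assumes the log reports step timings on "real time : <seconds>" lines, and the two log excerpts and their values are made up purely for illustration.

```python
import re

# Matches step timing lines such as "      real time : 0.274".
REAL_TIME = re.compile(r"real time\s*:\s*([0-9]*\.?[0-9]+)")

def total_real_time(log_text):
    """Sum every 'real time' value (in seconds) found in a log."""
    return sum(float(value) for value in REAL_TIME.findall(log_text))

def percent_drop(before, after):
    """Percentage by which 'after' is lower than 'before'."""
    return (before - after) / before * 100.0

# Hypothetical log excerpts, invented for this example.
log_cache_off = """
NOTE: The data step took :
      real time : 12.000
      cpu time  : 3.100
NOTE: Procedure sort step took :
      real time : 8.000
      cpu time  : 2.400
"""
log_cache_on = """
NOTE: The data step took :
      real time : 7.800
      cpu time  : 3.000
NOTE: Procedure sort step took :
      real time : 5.200
      cpu time  : 2.300
"""

before = total_real_time(log_cache_off)
after = total_real_time(log_cache_on)
print(f"{before:.1f}s -> {after:.1f}s, a drop of {percent_drop(before, after):.0f}%")
```

Only the "real time" lines are summed here; CPU time barely moves with a disk setting, so wall-clock time is the number to watch.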


So What is this PROC R?

At the end of March 2014, World Programming introduced a new PROC that is a first for WPS. World Programming created a unique PROC called PROC R. What PROC R is all about is interfacing into the R ecosystem using WPS and allowing you, the user/developer/statistician, to write R code and have it execute from within WPS.

PROC R is pretty neat. In contrast to our main competitor, integration with the R system comes as part of the WPS package. With the SAS System, you need to license IML (mucho moolaa $$$) to use Open Source R.

PROC R works as you would expect it to. You write your R code, submit the job and WPS manages the output for you. The output comes back into your LOG and LISTING windows (or files if you are running batch) just like any other PROCEDURE. R graphs, charts, plots, maps, etc… come back and are viewable from the WPS Workbench as well.

So what does PROC R look like? I think it is pretty straightforward because there are only a couple of keywords to remember. Here’s an example using some population data that I have.

*--> You must have the R Hmisc package installed before running this program.;

data censusHistory;
   input year pop @@;
   pop2 = Round(Pop/1000000,.1);
   popsq=pop2*pop2;
   lpop=lag(pop2);
cards;
1790 3929214 1800 5308483 1810 7239881 1820 9638453 1830 12860702 1840 17063353 
1850 23191876 1860 31443321 1870 38558371 1880 50189209 1890 62979766 1900 76212168 
1910 92228496 1920 106021537 1930 123202624 1940 142164569 1950 161325798 
1960 189323175 1970 213302031 1980 236542199 1990 258709873 2000 291421906 
;;;;
run;


PROC R;
submit;

 # if you have the Hmisc package already installed - you do not need to run this part
 
 # options(repos=structure(c(CRAN="http://cran.case.edu/")))
 # install.packages("Hmisc", dependencies = FALSE)
 
 ENDSUBMIT;
 Quit;

 
 
 

PROC R;
EXPORT data=censusHistory R=census;

submit;

    census.lm <- lm(pop2 ~ year+lpop, data=census)
    summary(census.lm)
    coef(census.lm)
    resid(census.lm)
    fitted(census.lm)
    plot(census.lm)
    
ENDSubmit;
quit;

Here’s what the WPS log looks like for the above code.

87        *--> You must have the R Hmisc package installed before running this program.;
88        
89        data censusHistory;
90           input year pop @@;
91           pop2 = Round(Pop/1000000,.1);
92           popsq=pop2*pop2;
93           lpop=lag(pop2);
94        cards;

NOTE: A new line was read when INPUT statement read past the end of a line
NOTE: Data set "WORK.censusHistory" has 22 observation(s) and 5 variable(s)
NOTE: The data step took :
      real time : 0.012
      cpu time  : 0.015


95        1790 3929214 1800 5308483 1810 7239881 1820 9638453 1830 12860702 1840 17063353
96        1850 23191876 1860 31443321 1870 38558371 1880 50189209 1890 62979766 1900 76212168
97        1910 92228496 1920 106021537 1930 123202624 1940 142164569 1950 161325798
98        1960 189323175 1970 213302031 1980 236542199 1990 258709873 2000 291421906
99        ;;;;
100       run;
101       
102       
103       PROC R;
104       submit;
105       
106        # if you have the Hmisc package already installed - you do not need to run this part
107       
108        # options(repos=structure(c(CRAN="http://cran.case.edu/")))
109        # install.packages("Hmisc", dependencies = FALSE)
110       
111        ENDSUBMIT;
NOTE: Using R version 3.0.2 (2013-09-25) from C:\Program Files\R\R-3.0.2

NOTE: Submitting statements to R:

> 
>  # if you have the Hmisc package already installed - you do not need to run this part
>  
>  # options(repos=structure(c(CRAN="http://cran.case.edu/")))
>  # install.packages("Hmisc", dependencies = FALSE)
>  

NOTE: Processing of R statements complete

112        Quit;
NOTE: Procedure R step took :
      real time : 0.274
      cpu time  : 0.015


113       
114       
115       
116       
117       
118       PROC R;
NOTE: Using R version 3.0.2 (2013-09-25) from C:\Program Files\R\R-3.0.2
119       EXPORT data=censusHistory R=census;
NOTE: Creating R data frame 'census' from data set 'WORK.censusHistory'

120       
121       submit;
122       
123           census.lm <- lm(pop2 ~ year+lpop, data=census)
124           summary(census.lm)
125           coef(census.lm)
126           resid(census.lm)
127           fitted(census.lm)
128           plot(census.lm)
129       
130       ENDSubmit;

NOTE: Submitting statements to R:

> 
>     census.lm <- lm(pop2 ~ year+lpop, data=census)
>     summary(census.lm)
>     coef(census.lm)
>     resid(census.lm)
>     fitted(census.lm)
>     plot(census.lm)
>     

NOTE: Processing of R statements complete
NOTE: Successfully written image C:\Users\ADMINI~1\AppData\Local\Temp\WPS Temporary Data\_TD8748\ODS LISTING images\I0000005.jpeg
NOTE: Successfully written image C:\Users\ADMINI~1\AppData\Local\Temp\WPS Temporary Data\_TD8748\ODS LISTING images\I0000006.jpeg
NOTE: Successfully written image C:\Users\ADMINI~1\AppData\Local\Temp\WPS Temporary Data\_TD8748\ODS LISTING images\I0000007.jpeg
NOTE: Successfully written image C:\Users\ADMINI~1\AppData\Local\Temp\WPS Temporary Data\_TD8748\ODS LISTING images\I0000008.jpeg

131       quit;
NOTE: Procedure R step took :
      real time : 0.372
      cpu time  : 0.015


132       run;
133 

The output generated by R.

Call:                                                                                                                               
lm(formula = pop2 ~ year + lpop, data = census)                                                                                     
Residuals:                                                                                                                          
    Min      1Q  Median      3Q     Max                                                                                             
-4.9170 -0.9663 -0.0757  1.0125  5.8427                                                                                             
Coefficients:                                                                                                                       
              Estimate Std. Error t value Pr(>|t|)                                                                                  
(Intercept) -213.08618   53.54030  -3.980 0.000878 ***                                                                              
year           0.11848    0.02915   4.064 0.000728 ***                                                                              
lpop           1.01868    0.02191  46.491  < 2e-16 ***                                                                              
---                                                                                                                                 
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1                                                                      
Residual standard error: 2.338 on 18 degrees of freedom                                                                             
  (1 observation deleted due to missingness)                                                                                        
Multiple R-squared:  0.9994,    Adjusted R-squared:  0.9993                                                                            
F-statistic: 1.521e+04 on 2 and 18 DF,  p-value: < 2.2e-16                                                                          
 (Intercept)         year         lpop                                                                                              
-213.0861783    0.1184845    1.0186849                                                                                              
          2           3           4           5           6           7           8           9          10          11          12 
 1.14121085  0.43020696 -0.29013941 -0.61982824 -0.96633352 -0.32965525  0.47152164 -1.86653985  1.21408360  1.01249332 -0.01151889 
         13          14          15          16          17          18          19          20          21          22             
 1.35699492 -2.32680910 -0.36950625 -0.07573220 -1.51559104  5.84268162  0.13465832 -2.29862522 -4.91696081  3.98338854             
         2          3          4          5          6          7          8          9         10         11         12         13 
  4.158789   6.769793   9.890139  13.519828  18.066334  23.529655  30.928478  40.466540  48.985916  61.987507  76.211519  90.843005 
        14         15         16         17         18         19         20         21         22                                  
108.326809 123.569506 142.275732 162.815591 183.457318 213.165342 238.798625 263.616961 287.416611 
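
Two details in that output are worth a quick check: the "1 observation deleted due to missingness" line and the 18 residual degrees of freedom. Both follow from the LAG in the data step, since the first row has no previous value. The short Python sketch below is my own illustration and not something WPS or R produces. It rebuilds pop2 and lpop from the same cards data and reproduces the first fitted value and residual from the printed coefficients. Because those coefficients are rounded to seven decimals, the match is only approximate.

```python
# Population counts copied from the data step above (1790 through 2000).
pops = [3929214, 5308483, 7239881, 9638453, 12860702, 17063353,
        23191876, 31443321, 38558371, 50189209, 62979766, 76212168,
        92228496, 106021537, 123202624, 142164569, 161325798,
        189323175, 213302031, 236542199, 258709873, 291421906]
years = list(range(1790, 2001, 10))

# pop2: population in millions, rounded to one decimal place.
pop2 = [round(p / 1_000_000, 1) for p in pops]
# lpop: the previous row's pop2; the first row has no predecessor.
lpop = [None] + pop2[:-1]

# lm() drops any row with a missing value, so only complete rows are kept.
rows = [(y, p, l) for y, p, l in zip(years, pop2, lpop) if l is not None]
n_coefficients = 3  # intercept, year, lpop
residual_df = len(rows) - n_coefficients

# Coefficients as printed by coef(census.lm), rounded to 7 decimals.
intercept, b_year, b_lpop = -213.0861783, 0.1184845, 1.0186849

year, actual, lagged = rows[0]  # the 1800 row, labelled "2" in the R output
fitted = intercept + b_year * year + b_lpop * lagged
residual = actual - fitted

print(f"rows used: {len(rows)}, residual df: {residual_df}")
print(f"{year}: fitted {fitted:.4f}, residual {residual:.4f}")
```

That accounts for the row labels in the residual and fitted listings starting at 2 rather than 1.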

Finally, the graphs that R generated.

 

[Images: the four plots written by plot(census.lm): I0000001, I0000002, I0000003, I0000004]

 

This gives you an idea of how easy PROC R is to use in WPS.  PROC R is available across all platforms that are supported by R and that includes the WPS platforms of Linux, AIX, Solaris, Unix, Windows and OS X.

WPS is a high value and low cost alternative to the SAS System and with the new PROC R being included (not an additional purchase), it confirms the value proposition of WPS in both small and large companies.


What’s New in WPS v3.1

We have a new release of WPS out the door and we wanted to share the news! This is a major release and includes a number of new features and procedures.

New and Enhanced Communication Features

WPS Link – WPS Link is an interface for the communication between a desktop version of WPS (or a fat client) and a WPS Server. WPS Link implements the Eclipse Workbench that allows a user to submit WPS programs from their desktop to a server. WPS Link also includes a file explorer where a user can store their programs on the server and access them as if they were on their desktop. WPS Link will only talk to a WPS Server and does not provide desktop-to-desktop communications. WPS Link is included as part of your WPS license fee.

WPS Communicate – is a product that allows for the remote submitting and scripting of WPS programs to a server. It differs from WPS Link in that WPS Communicate allows for the Upload and Download of files and data sets programmatically. Communicate will only communicate between a desktop copy of WPS and a Server or Server-to-Server. WPS Communicate does not provide desktop-to-desktop communications. The WPS Communicate client is included in your desktop license and a server connection is included with a server license of WPS.

 

New Procedures

PROC ARIMA – ARIMA (autoregressive integrated moving average) is a time series modeling technique to help better understand your data or predict future points in a data series.

PROC EXPAND – is a procedure that allows the WPS user to expand or contract time series data and interpolate missing values as well.

PROC FORECAST – a forecasting module that implements basic forecasting methods that are highly automated. Proc forecast is able to forecast hundreds of series at a time using either separate variables or with the use of the By statement.

PROC HTTP – allows access to remote “cloud-based” files.

PROC JAVAINFO – allows the WPS developer to ascertain information about the Java environment that WPS is using.

PROC KDE – The KDE procedure performs either univariate or bivariate kernel density estimation.

PROC R – Proc R is the first procedure written by World Programming that is unique to WPS. Proc R allows you to execute R code from within the Eclipse environment and to exchange data frames and WPS data sets between the two applications.

PROC SOAP – The PROC SOAP procedure reads in XML from a file using a fileref and writes XML output to another file that also has a fileref.

PROC VARCLUS – Varclus is a procedure that implements variable reduction by separating variables into non-overlapping groups (i.e. clusters).

PROC X12 – is a procedure that seasonally adjusts monthly or quarterly time series data.

 

New System Features

DBCS – Double Byte Character Support is now available for the first time in this release. DBCS allows for support for languages that have more than 256 characters. Not available on z/OS.

JavaObj – is an interface that allows the WPS and Java developer to run Java Programs from WPS.

Secure Email – WPS now has support for secure email.

 

Database Engine Features

SAP Hana – Support for SAP’s in-memory database is now included as a new access engine.

Actian Matrix – A new database engine for Actian Matrix (formerly ParAccel) has been implemented and is included as a component of the system.

Netezza – WPS engine for Netezza named pipe bulk loading and unloading.

MySQL – Added bulk insert functionality for MySQL.

SSL – added support for SSL (Secure Sockets Layer).

 

Pricing and licensing

Pricing for the new release remains the same, starting at $1,311 for a new workstation license. Don’t expect any price increases until the end of the year. With regard to licensing, there are no up-charges for Data Service Providers.

 

Evaluations and Quotes

MineQuest Business Analytics is an authorized reseller of the World Programming System. Contact us for a quote or to arrange a free 30-day evaluation of any of our products on a desktop or server.


Who says only big companies can afford to utilize Business Intelligence?

One of the reasons I got into reselling WPS was the fact (and it’s still a fact) that it’s very expensive for a new firm or startup to utilize SAS products. Actually, it’s prohibitively expensive. A commercial startup business is looking at $8,700 for a desktop license that provides access to BASE, GRAPH and STAT. That $8,700 is for the first year and it doesn’t include access to a database, Open Source R or reading and writing to desktop files like Excel and Access. Add those necessities to the price and you are looking at more than $15,000 for the first year and more than $4,200 for renewal.

With WPS our pricing is different. We kind of joke that whatever SAS does pricing wise, we do just the opposite. We don’t have a high barrier in terms of cost to start using our products. Actually, we encourage you to use our products! Currently, we charge $1,311 for a single desktop license. That’s the cost for the first year and it includes all the database engines that you would want.
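
To make the gap concrete, here is the arithmetic on the first-year figures quoted above, as a small Python sketch. The only assumption is that "more than $15,000" is treated as exactly $15,000, so the second comparison is a lower bound.

```python
# First-year figures quoted in this post (US dollars).
SAS_BASE = 8_700          # BASE, GRAPH and STAT
SAS_WITH_EXTRAS = 15_000  # "more than $15,000", used here as a lower bound
WPS_DESKTOP = 1_311       # single desktop license, database engines included

def times_more_expensive(other, wps=WPS_DESKTOP):
    """How many times the WPS price the other price is."""
    return other / wps

def extra_cost(other, wps=WPS_DESKTOP):
    """Dollar difference between the other price and the WPS price."""
    return other - wps

for label, price in [("SAS base", SAS_BASE), ("SAS with extras", SAS_WITH_EXTRAS)]:
    print(f"{label}: ${price:,} vs ${WPS_DESKTOP:,} "
          f"({times_more_expensive(price):.1f}x, ${extra_cost(price):,} more)")
```

On these numbers the first-year difference is $7,389 for the base bundle and at least $13,689 once database access, R and desktop file access are added.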

We don’t have a high barrier to using the language. If you are already familiar with the language of SAS, then you are ready to go with WPS.

We don’t have a high barrier when it comes to accessing your SAS data sets. We can read and write SAS data sets just fine.

But enough about barriers, let’s talk about servers.

The pricing differential is even greater when you start looking at servers. You can license a small WPS server for less than $5,700. That’s a two LCPU server and it includes all the bells and whistles that our desktop licenses include as well. Meaning it includes all the database access engines. The nice thing about our licensing is that we don’t have client license fees. Client license fees are fees that you pay to be able to access the server you just bought! It’s a stupid fee and we try not to do stupid things!

Another way we differ from our competitor is that we don’t have Data Service Provider fees. Let’s face it, many small companies (and large companies too) provide data and reports to their customers and vendors for further analysis and research. As a DSP, you will pay significantly more for your SAS license than what is listed. Expect to pay at least 30% more and often times, a lot more.

If you’re a startup, the message is clear. You probably don’t have a lot of money to toss around and cash flow is an issue. MineQuest has partnered with Balboa Capital to help companies manage their licensing costs. By working with Balboa Capital, you can manage your license costs by paying a monthly amount towards your license. You will have to take out a two-year WPS license to qualify for the program, but it’s an easy and efficient way to manage your resources.


High Performance Workstations for BI

There’s one thing I really enjoy and that’s powerful workstations for performing analytics. It’s fun to play around with the hardware, and it can be insightful to speculate on a design and then build a custom higher-end workstation for running BI applications like WPS and R.

Ars Builds

Every quarter, Ars Technica goes through an exercise where they build three PCs, mainly to assess gaming performance, and then do a price vs. performance comparison. There’s a trend that you will soon see after reading a few of these quarterly builds: the graphics card plays a major role in their performance assessment. The CPU, number of cores and fixed storage tend to matter little when comparing the machines.

This of course is in contrast to what we want to do for our performance benchmarks. We are looking at a holistic approach of CPU throughput, disk I/O and graphics for getting the most for the dollar on a workstation build. But Ars does have a lot to recommend it when it comes to benchmarking and I think it’s worthwhile including some of their ideas.

What Constitutes a High End Analytics Workstation?

This is an interesting question and one that I will throw out for debate. It’s so easy to get caught up in spending thousands of dollars, if not ten thousand dollars (see the next section), for a workstation. One thing that even the casual observer will soon notice is that being on the bleeding edge is a very expensive proposition. It’s an old adage that you are only as good as your tools. There’s also the adage that it’s a poor craftsman that blames his tools. In the BI world, especially when speed means success, it’s important to have good tools.

As a basis for what constitutes a high end workstation, I will offer the following as a point of entry.

  • At least 4 Logical CPUs.
  • At least 8GB of RAM, preferably 16GB to 32GB.
  • Multiple hard drives for OS, temporary workspace and permanent data set storage.
  • A graphics card that can be used for more than displaying graphics, i.e. parallel computing.
  • A large display – 24” capable of at least 1920×1080.

As a mid-tier solution, I would think that a workstation comprised of the following components would be ideal.

  • At least 8 Logical CPUs.
  • A minimum of 16GB of RAM.
  • Multiple hard drives for OS, temporary workspace and permanent data set storage with emphasis on RAID storage solutions and SSD Caching.
  • A graphics card that can be used for more than displaying graphics, i.e. parallel computing.
  • A large display – 24” capable of at least 1920×1080.

As a high end solution, I would think that a workstation built with the following hardware would be close to ultimate for many (if not most) analysts.

  • Eight to 16 Logical CPUs – Xeon Class (or possibly a step down to an Intel i7).
  • A minimum of 32GB of RAM and up to 64GB.
  • Multiple hard drives for OS, temporary workspace and permanent data set storage with emphasis on RAID storage solutions and SSD Caching.
  • A graphics card that can be used for more than displaying graphics, i.e. parallel computing.
  • Multiple 24” displays capable of at least 1920×1200 each.

I do have a bias towards hardware that is upgradeable. All-in-one solutions tend to be one shot deals and thus expensive. I like upgradability for graphics cards, memory, hard drives and even CPUs. Expandability can save you thousands of dollars over a period of a few years.
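
For anyone who wants to see where an existing machine lands, the three tiers above can be boiled down to a rough Python sketch. This is only my shorthand for the lists. It looks at logical CPU count and RAM alone, and the function name and tier labels are made up for the example. Drives, graphics card, displays and CPU class are not checked.

```python
def classify_workstation(logical_cpus, ram_gb):
    """Place a machine in one of the tiers using only LCPU count and RAM."""
    if logical_cpus >= 8 and ram_gb >= 32:
        return "high end"
    if logical_cpus >= 8 and ram_gb >= 16:
        return "mid-tier"
    if logical_cpus >= 4 and ram_gb >= 8:
        return "entry"
    return "below entry"

# A six-core machine with 16GB of RAM has enough memory for the mid tier
# but falls short of its eight logical CPUs.
print(classify_workstation(6, 16))   # entry
print(classify_workstation(8, 16))   # mid-tier
print(classify_workstation(16, 64))  # high end
```

Checking the most demanding tier first keeps the rules short, since any machine that meets a higher tier also meets the ones below it.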

The New Mac Pro – a Game Changer?

The new Mac Pro is pretty radical from a number of perspectives. It’s obviously built for video editing but its small size is radical in my opinion. As a Business Analytics computer it offers some intriguing prospects. You have multiple cores, lots of RAM, high end graphics but limited internal storage. That’s the main criticism that I have about the new Mac Pro. The base machine comes with 256GB of storage and that’s not much for handling large data sets. You are forced to go to external storage solutions to be able to process large data sets. Although I’ve not priced out the cost of adding external storage, I’m sure it’s not inexpensive.

Benchmarks

This is a tough one for me because so many organizations have such an array of hardware and some benchmarks are going to require hardware that has specific capabilities. For example, Graphics Cards that are CUDA enabled to do parallel processing in R. Or the fact that we use the Bridge to R for invoking R code and the Bridge to R only runs on WPS (and not SAS).

I did write a benchmark a while ago that I like a lot. It provides information on the hardware platform (i.e. amount of memory and the number of LCPUs available) and just runs the basic suite of PROCs that I know are available in both WPS and SAS. Moving to more statistically oriented PROCs such as LOGISTIC and GLM may be difficult because SAS license holders may not have the statistical libraries necessary to run the tests. That’s a major drawback to licensing the SAS System. You are nickel and dimed to death all the time. The alternative to this is to have a Workstation benchmark that is specific to WPS.

Perhaps the benchmark can be written where it tests if certain PROCS and Libraries are available and also determine if the hardware required is present (such as CUDA processors) to run that specific benchmark. Really, the idea is to determine the performance of the specific software for a specific set of hardware and not a comparison between R, WPS and SAS.
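
The detection half of that idea could look something like the Python sketch below. It is not part of the existing benchmark program. It assumes that finding Rscript on the PATH means R is installed and that finding nvidia-smi is a reasonable stand-in for a CUDA-capable card, and the benchmark names it returns are invented for the example. Checking which PROCs are licensed would have to happen inside WPS itself and is not attempted here.

```python
import os
import shutil

# Executables to look for on the PATH. Treating their presence as proof
# that a capability exists is an assumption made for this sketch.
TOOLS = {
    "r": "Rscript",        # R's command-line front end
    "cuda": "nvidia-smi",  # NVIDIA driver utility, a proxy for a CUDA GPU
}

def detect_capabilities(which=shutil.which, cpu_count=os.cpu_count):
    """Report the logical CPU count and which optional tools are present."""
    caps = {"logical_cpus": cpu_count() or 1}
    for name, executable in TOOLS.items():
        caps[name] = which(executable) is not None
    return caps

def runnable_benchmarks(caps):
    """Pick the (hypothetically named) benchmark sections that can run."""
    plan = ["base_procs"]          # always runs
    if caps["r"]:
        plan.append("r_code")      # needs R
    if caps["r"] and caps["cuda"]:
        plan.append("r_cuda")      # needs R and a CUDA-capable GPU
    return plan

caps = detect_capabilities()
print(caps)
print(runnable_benchmarks(caps))
```

Passing the lookup functions in as parameters makes it easy to try the planning logic against a pretend machine without touching the real hardware.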

Price and Performance Metrics

One aspect of Ars that I really like is that when they do their benchmarks, they work out the cost comparison for each build. They often base this on hardware pricing at the time of the benchmark. What they don’t do is price in the cost of the software for such things as video editing, etc… I think it’s important to show the cost of both hardware and software in a price-performance metric.
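
Here is one way such a metric might be computed, as a hedged Python sketch: benchmark runs per hour for every $1,000 of total cost. The metric itself, the two builds, their hardware prices and their timings are all hypothetical. The two software prices are first-year desktop license figures mentioned elsewhere on this blog, used only to show two price levels.

```python
def runs_per_hour_per_1000(hardware_cost, software_cost, benchmark_seconds):
    """Benchmark runs per hour for every $1,000 spent (higher is better)."""
    runs_per_hour = 3600.0 / benchmark_seconds
    total_cost_thousands = (hardware_cost + software_cost) / 1000.0
    return runs_per_hour / total_cost_thousands

# Hypothetical builds: (name, hardware cost in dollars, benchmark time in seconds).
builds = [("A", 1500, 120), ("B", 3000, 90)]

def best_build(software_cost):
    """Name of the build with the best score at a given software price."""
    return max(builds,
               key=lambda b: runs_per_hour_per_1000(b[1], software_cost, b[2]))[0]

for software_cost in (1311, 8700):
    scores = {name: round(runs_per_hour_per_1000(hw, software_cost, secs), 2)
              for name, hw, secs in builds}
    print(f"software ${software_cost:,}: {scores} -> best is {best_build(software_cost)}")
```

With these made-up numbers, the cheaper build wins when the software is inexpensive, while the faster build wins once the software price dominates the total. That is the reason for pricing software into the comparison.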

Moving Forward

I’m going to take some time and modify the WPS Workstation Benchmark Program that I wrote so that it doesn’t spew out so much unnecessary output into the listing window. I would like it to just show the output from the benchmark report. I think it would also be prudent to see if some R code could be included in the benchmark and compare and contrast the performance if there are some CUDA cores available for assisting in the computations.
