Category Archives: Hardware

High Performance Workstations for BI

There’s one thing I really enjoy, and that’s a powerful workstation for performing analytics. It’s fun to play around with, and it can be insightful to speculate on a design and then build a custom higher-end workstation for running BI applications like WPS and R.

ARS Builds

Every quarter, Ars Technica goes through an exercise where they build three PCs, mainly to assess gaming performance, and then do a price-versus-performance comparison. After reading a few of these quarterly builds you will notice a trend: the graphics card plays the major role in their performance assessment, while the CPU, number of cores and fixed storage carry relatively little weight in comparing the machines.

This of course is in contrast to what we want to do for our performance benchmarks. We are taking a holistic approach, weighing CPU throughput, disk I/O and graphics to get the most for the dollar on a workstation build. But Ars does have a lot to recommend when it comes to benchmarking, and I think it’s worthwhile including some of their ideas.

What Constitutes a High End Analytics Workstation?

This is an interesting question and one that I will throw out for debate. It’s so easy to get caught up in spending thousands of dollars, if not ten thousand dollars (see the next section), for a workstation. One thing that even the casual observer will soon notice is that being on the bleeding edge is a very expensive proposition. It’s an old adage that you are only as good as your tools. There’s also the adage that it’s a poor craftsman who blames his tools. In the BI world, especially when speed means success, it’s important to have good tools.

As a basis for what constitutes a high end workstation, I will offer the following as a point of entry.

  • At least 4 logical CPUs.
  • At least 8GB of RAM, preferably 16GB to 32GB.
  • Multiple hard drives for the OS, temporary workspace and permanent data set storage.
  • A graphics card that can be used for more than displaying graphics, i.e. parallel computing.
  • A large display – 24” capable of at least 1920×1080.

As a mid-tier solution, I would think that a workstation comprised of the following components would be ideal.

  • At least 8 logical CPUs.
  • A minimum of 16GB of RAM.
  • Multiple hard drives for the OS, temporary workspace and permanent data set storage, with an emphasis on RAID storage solutions and SSD caching.
  • A graphics card that can be used for more than displaying graphics, i.e. parallel computing.
  • A large display – 24” capable of at least 1920×1080.

As a high end solution, I would think that a workstation built with the following hardware would be close to ultimate for many (if not most) analysts.

  • Eight to 16 logical CPUs – Xeon class (or possibly step down to an Intel i7).
  • A minimum of 32GB of RAM and up to 64GB.
  • Multiple hard drives for the OS, temporary workspace and permanent data set storage, with an emphasis on RAID storage solutions and SSD caching.
  • A graphics card that can be used for more than displaying graphics, i.e. parallel computing.
  • Multiple 24” displays capable of at least 1920×1200 each.

I do have a bias towards hardware that is upgradeable. All-in-one solutions tend to be one-shot deals and thus expensive. I like upgradability for graphics cards, memory, hard drives and even CPUs. Expandability can save you thousands of dollars over a period of a few years.

The New Mac Pro – a Game Changer?

The new Mac Pro is pretty radical from a number of perspectives. It’s obviously built for video editing, but its small size alone is striking. As a business analytics computer it offers some intriguing prospects: multiple cores, lots of RAM and high-end graphics, but limited internal storage. That’s my main criticism of the new Mac Pro. The base machine comes with 256GB of storage, and that’s not much for handling large data sets. You are forced to go to external storage solutions to be able to process large data sets. Although I’ve not priced out the cost of adding external storage, I’m sure it isn’t inexpensive.

Benchmarks

This is a tough one for me because organizations have such an array of hardware, and some benchmarks are going to require hardware with specific capabilities. For example, graphics cards that are CUDA-enabled for parallel processing in R, or the fact that we use the Bridge to R for invoking R code, and the Bridge to R only runs on WPS (and not SAS).

I did write a benchmark a while ago that I like a lot. It reports information on the hardware platform (i.e. the amount of memory and the number of LCPUs available) and runs the basic suite of PROCs that I know is available in both WPS and SAS. Moving to more statistically oriented PROCs such as LOGISTIC and GLM may be difficult because SAS license holders may not have the statistical libraries necessary to run the tests. That’s a major drawback to licensing the SAS System: you are nickel-and-dimed to death all the time. The alternative is to have a workstation benchmark that is specific to WPS.

Perhaps the benchmark could be written so that it tests whether certain PROCs and libraries are available, and also determines whether the required hardware (such as CUDA processors) is present before running that specific portion of the benchmark. Really, the idea is to measure the performance of a specific piece of software on a specific set of hardware, not to compare R, WPS and SAS against one another.
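To make that concrete, here is a rough sketch of the shape such a capability-aware, timed test could take. This is not the actual benchmark program: the dataset, the record count and the licensing check are illustrative, and SYSPROD() is the SAS licensing function, so under WPS the check would need to be done differently.

options fullstimer;                          /* report real and CPU time in the log */

data work.bench;                             /* generate an illustrative test dataset */
   do id = 1 to 1000000;
      x = ranuni(1);
      y = rannor(1);
      event = (ranuni(2) > 0.5);             /* binary response for the optional stats test */
      output;
   end;
run;

%let t0 = %sysfunc(datetime());

proc sort data=work.bench out=work.bench_sorted;   /* core PROCs available in both WPS and SAS */
   by x;
run;

proc means data=work.bench_sorted mean std min max;
   var x y;
run;

%let elapsed = %sysevalf(%sysfunc(datetime()) - &t0);
%put NOTE: Base PROC suite ran in &elapsed seconds.;

%macro maybe_stats;                          /* only run the statistical PROCs if licensed */
   %if %sysfunc(sysprod(stat)) = 1 %then %do;
      proc logistic data=work.bench_sorted;
         model event = x y;
      run;
   %end;
   %else %put NOTE: Statistical procedures not licensed - skipping PROC LOGISTIC.;
%mend maybe_stats;
%maybe_stats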

Price and Performance Metrics

One aspect of Ars that I really like is that when they do their benchmarks, they work out the cost comparison for each build, typically based on hardware pricing at the time of the benchmark. What they don’t do is price in the cost of the software for things such as video editing. I think it’s important to show the cost of both hardware and software as part of a price-performance benchmark.

Moving Forward

I’m going to take some time and modify the WPS Workstation Benchmark Program that I wrote so that it doesn’t spew so much unnecessary output into the listing window; I would like it to show only the benchmark report. I think it would also be prudent to see whether some R code could be included in the benchmark, to compare and contrast performance when CUDA cores are available to assist in the computations.
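The cleanest route I can think of is to close the listing destination while the benchmark steps run and reopen it just for the final report. Here is a minimal sketch, assuming the report comes out of a final PROC PRINT step against a results dataset (the dataset name is hypothetical) and that WPS’s ODS honors the same LISTING statements as SAS, which I still need to verify:

ods listing close;              /* suppress routine PROC output while the benchmark runs */

/* ... benchmark PROC steps execute here, writing nothing to the listing ... */

ods listing;                    /* turn the listing back on for the report */
proc print data=work.bench_results noobs label;
   title "WPS Workstation Benchmark Report";
run;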

About the author: Phil Rack is President of MineQuest Business Analytics, LLC located in Grand Rapids, Michigan. Phil has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is an authorized reseller of WPS in North America.

Thursday Ramblings

Does anyone do comparisons of graphics cards and measure performance in a VM? Specifically, do certain graphics cards boost performance when running VMs on the desktop? I like to see my windows “snap” open when I switch from VM to VM. As a developer, I often wonder whether spending an additional $150 on a popular graphics card will yield a perceptible performance boost.

Speaking of graphics cards, we recently bought a couple of used Nvidia Quadro graphics cards from a local CAD/CAM company that is upgrading their workstations. I got these at about 5% of their original retail price so I’m happy. We were having problems getting a couple of servers to go into sleep mode using Lights Out and we discovered that we needed a different graphics card to accomplish this. The plus side is that these are Nvidia cards with 240 CUDA cores and 4GB of RAM. So we now have the opportunity to try our hand at CUDA development if we want. I’m mostly interested in using CUDA for R.

One drawback to using CUDA, as I understand it, is that the GPU is effectively a single-user resource. Say you have a CUDA GPU in a server: only one job at a time can access the CUDA cores. If you have 240 CUDA cores on your GPU and would like to allocate 80 cores to each application, thinking you could run three of your apps at a time, that is not possible. What it seems you have to do is install three graphics cards in the box, with each user or job having access to a single card.

There’s a new Remote Desktop application coming out from Microsoft that will run on your Android device(s), as well as a new release in the Apple App Store. I use RDC from my Mac mini and it works great. I’m not sure what they could add to the app to make it more compelling, however.

Tom’s Hardware has a fascinating article on SSDs and performance in a RAID setup. On our workstations and servers, we have SSDs acting as a cache for the work and perm folders on our drive arrays. According to the article, RAID 0 performance tends to top out at three SSDs for writes and around four for reads.

FancyCache from Romex Software has become PrimoCache. It has at least one new feature that I would like to test, and that is L2 caching using an SSD. PrimoCache is in beta, so if you have the memory and hardware, it might be worth giving it a spin to see how it could improve your BI stack. We did a performance review of FancyCache in a series of posts on Analytic Workstations.

FYI, PrimoCache is not the only caching software available that can be used in a WPS environment. SuperSpeed has a product called SuperCache Express 5 for Desktop Systems. I’m unsure whether SuperCache can utilize an SSD as a Level 2 cache. It is decently priced at $80 for a desktop version but $450 for a standard Windows Server version. I have to admit, $450 for a utility would give me pause. For that kind of money, the results would have to be pretty spectacular. SuperSpeed offers a free evaluation as well.

If you are running a Linux box and want to enjoy the benefits of SSD caching, there’s a great blog article on how to do this for Ubuntu from Kyle Manna. I’m very intrigued by this and if I find some extra time, may give it the old Solid State Spin. There’s also this announcement about the Linux 3.10 Kernel and BCache that may make life a whole lot easier.

About the author: Phil Rack is President of MineQuest Business Analytics, LLC located in Grand Rapids, Michigan. Phil has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is an authorized reseller of WPS in North America.

Servers and Power Savings

We run a few servers here at MineQuest for developing WPS applications, and we are always looking for ways to reduce cost and our carbon footprint. Running a server is not necessarily cheap, but it’s not all that expensive either. For example, our Windows server, a six-core machine with 32GB of RAM filled with 11 hard drives and a few SSDs, uses about 115 watts of power while turned on. That works out to about $10 a month. When we put the server into sleep mode, the power draw drops to about 4 watts.

In reality, we only use the servers for about 12 hours a day during the week and maybe four hours a day on the weekends for doing maintenance. By putting a server into sleep mode when it’s not being used, we lower our estimated bill to about $3.40 a month.
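For anyone who wants to check the arithmetic, here is the rough math. The electricity rate is my assumption (about $0.12 per kWh); your rate will vary, which is why these can only be ballpark figures:

Always on:   0.115 kW x ~730 hours/month ≈ 84 kWh ≈ $10/month
Sleep mode:  0.115 kW x ~295 awake hours + 0.004 kW x ~435 sleeping hours ≈ 36 kWh, or a few dollars a month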

We use a software product called Lights Out on our Windows desktops and servers that does all the work for us. It also calculates the energy cost to date for the server and the amount of money saved by putting it into sleep mode.

The product we use, Lights-Out – Green IT for Windows Server Solutions, is very reasonably priced. I strongly suggest that anyone who runs a Windows server and wants to save some money on their electric bill take a look at the product. They offer a free 30-day evaluation of the software. You can’t ask for much more than that!

About the author: Phil Rack is President of MineQuest Business Analytics, LLC located in Grand Rapids, Michigan. Phil has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is an authorized reseller of WPS in North America.

Analytical Data Marts

Recently, there has been a conversation on what defines “Big Data”. It’s my position (among others) that Big Data is data that is so large that a single computer cannot process it in a timely manner. Hence, we have grid computing. Grid computing is not inexpensive and is overkill for many organizations.

The term “Huge Data” has been bandied about as well. In the conversations regarding what is Big Data, it was sort of agreed that Huge Data is a data set that sits somewhere between 10GB and 20GB in size. (Note: In about two years I will look back at this article and laugh about writing that a 20GB data set is huge for desktops and small servers.) The term Big Data is so abused and misused by the technical press and even many of the BI vendors that it’s almost an irrelevant term. But Huge Data has my interest and I will tell you why.

The other day I read a blog article on the failure of Big Data projects. The article talks about a failure rate of 55%. I was not surprised by that kind of failure rate; what did surprise me was that no solutions were being offered. In the analytics world, especially in finance and health care, we tend to work with data that comes from a data warehouse or a specialized data mart. The specialized data mart is really an analytics data mart, with the data cleaned and transformed into a form that is useful for analysis.

Analytical data marts are cost effective. This is especially true when the server required is modest compared to the monster DBs running on big iron. Departments can almost always afford a smaller server, and can expect and receive much better turnaround time on jobs than most data warehouses provide. Data marts are more easily expandable and can be tuned more effectively for analytics. Heck, I’ve yet to work on a mainframe or large data warehouse that could outrun a smaller server or desktop for most of my needs.

The cost of a WPS server license on a four-, eight- or even sixteen-core analytics data mart is quite reasonable. With WPS on the desktop and a WPS Linux server, analysts can remotely submit code to the data mart and receive the log, listing and graphics back in their desktop workbench. But the biggest beauty of running WPS as your data mart platform is that WPS comes with all the database access engines as part of the package. If you have worked in a large environment with multiple database vendors, you can see how cost effective this is when it comes to importing data from all those different databases into an analytical data mart.
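As a sketch of what that looks like in practice, pulling a warehouse table into the analytics data mart is just a couple of LIBNAME statements and a DATA step. The ODBC DSN, credentials, path and table names below are made up for illustration:

libname whse odbc dsn="corp_warehouse" user="analyst" password="XXXXXXXX";   /* database access engine */
libname mart "/data/analytics_mart";                                          /* permanent data mart storage */

data mart.claims_2013;
   set whse.claims(where=(service_year = 2013));     /* extract only what the analysts need */
   if missing(paid_amount) then paid_amount = 0;      /* light cleaning on the way in */
run;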

About the author: Phil Rack is President of MineQuest Business Analytics, LLC located in Grand Rapids, Michigan. Phil has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is an authorized reseller of WPS in North America.

Another View of R and Big Data

I was reading a blog entry the other day that just blew me away. Librestats has a blog entry entitled, “R at 12,000 Cores” and it is a very good (and fun) read. It’s amazing what can be done by the open source advocates and this article is a great example of that.

After reading the article, I can’t help but think about the relationship between extremely large data, server size (both CPU’s and RAM) and how fast data is growing. There has to be a way to crunch through the amount of data that is piling up and this article addresses that issue.

I believe you will begin seeing vendors embrace R more openly, mainly because they have to embrace it. No single company can develop code at the breakneck pace at which the R community is putting out packages. It’s truly amazing, and it is cost effective to model data in the way the article above describes as the state of the art.

Even small companies can make use of multiple servers with dozens of cores and lots of RAM rather inexpensively. Using Linux and R on a set of servers, an organization can have a hundred cores at its disposal for crunching data while paying very little in licensing fees.

I have been giving some thought to making the Bridge to R run in parallel on a single server as well as across a set of servers using WPS and pbdR or Rmpi. This way, WPS would manage the servers and the data transparently and provide number crunching at very low cost. God knows we have a few extra multi-core servers lying around here, so it may be an interesting adventure to give this a spin!

My first thought and intention is to make the code backward compatible. Perhaps just add a macro that can be called that contains the information needed to implement running R across cores and on a grid. It could be something as simple as:

%Rconfig(RconfigFile=xyz, RunInParallel=True||False);

The remaining statements in the Bridge to R would continue as they are and the R code would be pushed to the servers based on the information in the RconfigFile. WPS would still collect the output from these jobs and route the appropriate information to the log and listing window as well as the graphics to the graphics viewing window (wrapped in HTML) for users to view their output.
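Since %Rconfig is still just an idea, the simplest implementation I can picture is a macro that stashes the settings in global macro variables for the rest of the Bridge to R macros to read later. This is purely hypothetical; the variable names and the validation are mine:

%macro Rconfig(RconfigFile=, RunInParallel=False);
   %global r_config_file r_run_parallel;
   %let r_config_file  = &RconfigFile;
   %let r_run_parallel = %upcase(&RunInParallel);
   %if &r_run_parallel = TRUE and %length(&r_config_file) = 0 %then
      %put WARNING: RunInParallel=True was requested but no RconfigFile was supplied.;
%mend Rconfig;

%Rconfig(RconfigFile=servers.cfg, RunInParallel=True);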

 

New WPS vs SAS Pricing Comparisons on a Windows Server

We updated our server pricing comparison of WPS on a two, four and eight core server with SAS on the same hardware. We included the cost of the SAS/Toolkit (it’s included with WPS) in the comparison, but for many, it might not be something that is important for them.

At any rate, the figures are there so that an individual or an organization can subtract out the price if they so choose and see for themselves the pricing difference for the two products for the first year and a three year window. It’s pretty amazing how cost effective WPS is on a server in comparison to SAS.

You can view the document by clicking: Pricing_Comparisons_Between_WPS_and_SAS. (pdf ~467kb)

Don’t forget there’s still time to get into the action to win a Google Nexus 7 Tablet. If you register to take out a WPS evaluation before September 30th, 2012, you will automatically be entered in the drawing for the tablet. Certain conditions apply, so read the earlier blog post for all the details. You can request a WPS evaluation by going to the MineQuest Business Analytics website at the WPS evaluation page.

Taking WPS 3.01 for a Quick Spin

I’m writing this today from the testing facilities at MineQuest Business Analytics, the center of the BI Universe. Haha! I’ve always wanted to write that as an opening line.

Anyway, I finally installed the latest GA release of WPS v3.01 on all the machines here. It took me a little while to set up all the configuration files so that WPS uses the optimal disk array for the WORK location, among other things. Just thinking about this, it’s probably the first time I’ve ever had the exact same release on all the desktops and servers.
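For what it’s worth, the relevant line in those configuration files is tiny: a SAS-style system option pointing WORK at the fast array. The path below is from my setup, and the exact option spelling is worth confirming against the WPS documentation:

/* point the WPS WORK location at the optimal disk array */
-work "D:\wpswork"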

I wanted to test out the WPS Link technology more thoroughly. For those folks who are unfamiliar with WPS Link, it allows your workstation version of WPS to link to a Linux server for the purposes of submitting WPS code. So, basically you need WPS on both the desktop and on the Linux server. I have WPS on a Mac, a desktop, laptop and a VM running XP. My desktop host is running Vista x64.

One thing that I do like when I’m on the Mac is that the fonts just seem nicer than on Windows. It’s just a bit more aesthetically pleasing to me. The Eclipse Workbench is available across all the platforms except for z/OS as far as I can tell. On Linux, the fonts are similar to the Mac in style but seem a bit heavier. I imagine a lot of that has to do with the platform and font support from Apple on OS X versus the Linux Open Source Community.

Interestingly, I can submit my code to the Linux server from all these clients and it works amazingly well. The server is a small box with four cores and only 8GB of RAM, but lots of disk space. I ended up setting a few options, such as MAXMEM and SORTSIZE, to reasonable levels so that everyone plays well together. Small jobs are almost instant. I’ll start testing with some large jobs next week.
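The settings themselves are one-liners. These are the values I landed on for this little box, and whether MAXMEM and SORTSIZE belong in the configuration file or in an OPTIONS statement is something to check against the WPS documentation rather than take from me:

/* keep any single session or sort from grabbing all 8GB on the server */
-sortsize 512M
-maxmem 2G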

I’ve stated before that sizing a server for a workgroup isn’t always easy. But with 16GB of RAM and four cores of compute power, you can run four to eight simultaneous users quite readily. When you think about the compute power you get with WPS on a server, factoring in the price versus our competition, it’s just so small business friendly and startup friendly.

Don’t forget there’s still time to get into the action to win a Google Nexus 7 Tablet. If you register to take out a WPS evaluation before September 30th, 2012, you will automatically be entered in the drawing for the tablet. Certain conditions apply, so read the earlier blog post for all the details. You can request a WPS evaluation by going to the MineQuest Business Analytics website at the WPS evaluation page.

Analytic Workstations – Conclusions

Continued from: Analytic Workstations – Part III

After going through a few days of tests and tuning on the new workstation, I can say I’m satisfied with the results. I’m sure I could eke out a few more seconds here and there in reduced run times, but I don’t think it is worth the additional effort.

Going forward, I think it would be an interesting exercise to examine the effect of swapping out the Intel i5 processor for an Intel i7. That would take the workstation from four logical CPUs to eight (i.e. four cores with hyperthreading), so run times should theoretically improve.

The other component to look at would be adding memory to the system so it has the full complement that the motherboard can handle – 32GB. The additional memory would be useful for the large sorts in the benchmark, as well as for adding another 4GB to 8GB of memory to the Level 1 cache.

Finally, the other enhancement, one that I’m loath to do, is overclocking the CPU. I don’t think that’s a worthy exercise, because how often do you see a corporate workstation overclocked? I’ve never seen one, to be honest.

One issue that I did not mention in the previous blog posts is the advantage gained by using a Level 1 cache like FancyCache along with SSDs. By using FancyCache, I reduce the number of writes hitting the SSDs and thus the wear and tear on them. If I don’t have to write to the SSDs and can instead use the Level 1 cache, my investment in solid state disks should last longer.

About the author: Phil Rack is President of MineQuest, LLC and has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is a reseller of WPS in North America.

Analytic Workstations – Part III


After giving some thought to how I could improve on the benchmarks I presented in the previous blog post – Analytic Workstations Part II – I realized that I had three choices. I could opt for a faster processor such as an eight-core Intel i7, add more memory, or change out the I/O system where my temp work space resides.

Given the $500 budget and the fact that I have $182 left to spend, the CPU upgrade would be prohibitive. I could swap out my 16GB of RAM for 32GB, but that too would put me over the $182 remaining in the build budget. The CPU upgrade would cost about $260 and the memory upgrade around $200. That leaves the disk system upgrade, which I can easily do with the $182 remaining in the kitty.

I decided to buy two 60GB SSDs and put them in RAID 0. That provides me with about 112GB of temp storage space, and since my motherboard has two SATA III sockets available, it was an obvious choice. The two SSDs cost me $65 each, and I even get a $10 rebate. I also had to buy a tray for the two drives, so my cost at this point is $438. Still under budget!

Installing the drives was pretty easy and below are the results from Anvil’s Storage Utilities for the RAID Array.

Drive F: Raid 0 2x60GB Level 2 Cache.


Not too bad when you compare it against what was originally being used for the temp work space. So basically I get 980+ MB/s read speed and 840+ MB/s write speed. Contrast that with 186 MB/s read and 168 MB/s write in the old configuration. That’s roughly a 5x increase in both read and write speeds.

I reran the benchmarks with FancyCache set at 4GB and a deferred-write latency of 10 seconds. Using the new SSDs for temp work space, here are the benchmark results alongside the other tests done previously.

Record Count (size)     Metric   No Cache    Level 1 4GB Cache   RAID 0 SSDs + Level 1 Cache
1 Million (500 MB)      Real     17.823      10.66               10.5
                        CPU      14.118      14.695              14.2
2 Million (1 GB)        Real     33.751      21.189              21.5
                        CPU      28.204      28.828              29.3
4 Million (2 GB)        Real     59.888      43.672              44.4
                        CPU      57.611      58.141              59.4
8 Million (4 GB)        Real     02:08.7     01:35.4             01:32.7
                        CPU      01:58.2     02:00.2             01:59.5
16 Million (8 GB)       Real     06:59.6     07:04.7             04:16.4
                        CPU      04:05.5     04:06.3             04:07.6
32 Million (16 GB)      Real     25:38.7     25:45.3             10:00.6
                        CPU      09:10.6     09:26.5             08:41.1
64 Million (32 GB)      Real     57:31.7     56:04.8             23:49.2
                        CPU      19:51.9     18:52.9             19:11.5

(Times under one minute are in seconds; longer times are shown as mm:ss.)

 

As you can see in the table above, once we move into dataset sizes of 16 million records (8 GB) and larger, the SSDs in the RAID 0 array really show their stuff. For dataset sizes of 8 million records and less, the data basically sits in the 4GB cache the whole time, so we don’t see any additional improvement in performance for those datasets.

The run time for the 32 million record dataset dropped by a factor of about 2.5, and the 64 million record dataset shows a proportionate gain: the program executed the test script about 2.4 times faster in the Level 1 cache plus SSD environment. These are pretty amazing numbers.

Although I didn’t achieve my goal of creating a machine that could execute a benchmark where real time was always less than CPU time, I did get pretty close. There is still room for experimentation with block size and Level 1 cache size to try to increase the performance of the machine, but the return on the time needed to do this would probably be minuscule.

Overall, this has been an enlightening experience for me. I’ve been able to take a pretty vanilla workstation and tune it using both software and hardware and show what $500 can do in terms of upgrading your hardware. Across the board, for every size dataset that I tend to process, I significantly reduced processing time when running WPS.

I’m going to go over the numbers a bit more in the coming days and I will post a round-up of my thoughts and present some justifications for what might be done going forward to improve performance for this workstation. After all, I still have $62 left in my budget.

Continue Reading: Analytic Workstations – Conclusions

About the author: Phil Rack is President of MineQuest, LLC and has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is a reseller of WPS in North America.


Analytic Workstations – Part II

In the previous post, Analytic Workstations – Part I, I touched on my disappointment with the performance of a new workstation that I had just built. I expected a bit more “oomph” right out of the box, especially for the smaller datasets. So in this post, I’m going to step through some performance tuning and see what kind of performance gains we can realize.

It’s obvious that this machine is being held back by I/O. With 16GB of RAM, I think we should try to make better use of that memory rather than letting it just sit there waiting to be used by WPS or other processes. First, we need to find some software that uses some of that memory for disk caching. I’ve found two utilities that implement caching: SuperCache by SuperSpeed and FancyCache by Romex Software. For my testing and tuning, I’m going to use FancyCache. It is currently in beta test, so if you desire, you can download it and give it a spin for 180 days to see if it is something that you are comfortable using.

After installing FancyCache, you are provided with a screen that allows you to configure the software. I’ve tested FancyCache using different block sizes and algorithms, and I found that a block size of 4K with the LRU (Least Recently Used) algorithm and 4GB of RAM works best for me. I also enabled deferred writes and set the latency to 10 seconds.

[Screenshot: the FancyCache configuration screen]

After you set the configuration, you will have to reboot your workstation for FancyCache to use your new settings. Using Anvil’s Storage Utilities, let’s see what kind of performance our D drive (used for WPS work space) is capable of.

[Screenshot: Anvil’s Storage Utilities results for drive D with FancyCache enabled]

Now this is amazing! The performance improvement is out of sight. Compare the above chart with the baseline chart we did in the previous post for the same drive (see below).

[Screenshot: baseline Anvil’s Storage Utilities results for drive D with no cache]

Read speed is 136.9 times faster with FancyCache. Write speeds are an incredible 90 times faster. The 4K IOPS for reads are a whopping 700 times faster, and for writes they are 260 times faster. This is pretty breathtaking and shows just how slow a hard disk can be compared to reading from and writing to memory.

As they say, the proof is in the pudding and it’s time to look at the benchmarks with FancyCache running on the workstation.

Record Count (size)     Metric   No Cache    Level 1 4GB Cache
1 Million (500 MB)      Real     17.823      10.66
                        CPU      14.118      14.695
2 Million (1 GB)        Real     33.751      21.189
                        CPU      28.204      28.828
4 Million (2 GB)        Real     59.888      43.672
                        CPU      57.611      58.141
8 Million (4 GB)        Real     02:08.7     01:35.4
                        CPU      01:58.2     02:00.2
16 Million (8 GB)       Real     06:59.6     07:04.7
                        CPU      04:05.5     04:06.3
32 Million (16 GB)      Real     25:38.7     25:45.3
                        CPU      09:10.6     09:26.5
64 Million (32 GB)      Real     57:31.7     56:04.8
                        CPU      19:51.9     18:52.9

(Times under one minute are in seconds; longer times are shown as mm:ss.)

By simply using a disk cache and utilizing system RAM more efficiently, we are able to reduce processing time for datasets of eight million records and less. Looking at the table above, we also see that real time is lower than CPU time for the first four dataset sizes, between one million and eight million records. I’m pretty sure at this point that I am CPU bound.

It appears that all the data for the first three trials runs directly out of the cache. I suspect that the fourth trial, with eight million records, is getting flushed, meaning some of the data is being written to the hard drive rather than staying entirely in the cache. The datasets with more than eight million records really can’t make use of the cache and run slightly (and insignificantly) slower than with no cache whatsoever.

For the most part I am satisfied with the improvement afforded by FancyCache. Probably 90 percent of all the processing I do is with datasets in the two to four million record range, and perhaps 5 percent are in the eight million record range, as big as 4GB in size. But those last three trials, the ones that are 16 million to 64 million records in size, are taunting me, just begging to see what can be done to improve their performance. Plus, I still have $182 left to play with out of my $500 budget.

Continue to: Analytic Workstations – Part III

About the author: Phil Rack is President of MineQuest, LLC and has been a SAS language developer for more than 25 years. MineQuest provides WPS and SAS consulting and contract programming services and is a reseller of WPS in North America.