Big Computing: SAS versus R - The longest discussion on Linkedin I have ever seen

Wednesday, February 22, 2012

SAS versus R - The longest discussion on Linkedin I have ever seen

Six months ago Oleg Okun posed the following question to the Advanced Business Analytics, Data Mining and Predictive Modeling Group on Linkedin:

SAS versus R

Did anyone have to justify to a prospect/customer why R is better than SAS? What arguments did you provide? Did your prospect/customer agree with them? Why do you think, despite being free and having a lot of packages, R is still not a favorite in Data Mining/Predictive Analytics in the corporate world?

I responded to the original question and have engaged in some of the discussion as time has gone by. It has been fun and interesting. The range and breadth of this discussion thread, and the number of participants, are amazing. It has never gone stale and there are new contributors every day. The most recent topics include Oracle R Enterprise, which was not even in existence when Oleg originally posed the question.

Here is a sample of the discussion:

John Charnes • @Daniel Lieb -- Hope all's well with you, Dan. Yes, the dig on R has been limits on the size of data sets that it can handle. However, I listened to a webcast last week that described Oracle R Enterprise. The speakers described analyses of terabytes of data by running existing R scripts directly against data stored in Oracle Database 11g. I haven't used it yet myself, but it is definitely worth checking out.

Ashish Patel • Support and security are the prime concerns for corporates when it comes to open source tool adoption.
The last thing a business wants is a tool failure that hits the bottom line.
Enterprise-wide configuration and update management plays a crucial role too.
For example, after Red Hat started providing support for Linux, enterprise adoption increased.

David Tussey • A side comment, I see Python and the various packages (SciPy, NumPy) as the emerging winner in this battle. Python is much easier to program than R, and more "data friendly" when it comes to file and database manipulation.

Greg Sterijevski • Some general comments on SAS vs R:

R: tons of estimation techniques, experimental as well as tried and true ones
SAS: not as many techniques, and a long experimental dev stage before a proc is considered finalized

Usage Consistency:
R: None or minimal, each package is its own point process
SAS: Very consistent use across procs and packages. If a class statement is supported by a proc, then it works identically to the class statement in any other proc. Concomitantly, all elements of a technique are typically fleshed out before it goes into production. R might have an estimation technique 'XYZ'; in SAS that technique is typically not considered finished unless it implements a slew of ancillary functionality (like hypothesis testing, parameter restriction, hccme covariance matrices).

R: pretty good coverage for base packages
SAS: a humongous test library accumulated over 20+ years. A consistent, if not complete, testing philosophy.

Numeric Consistency:
R: Not sure, have only run R on Win and Linux
SAS: Numerically 'close' results on platforms from mainframe to PC. Furthermore, an almost slavish insistence on numeric results not changing for a given technique from release to release. In other words, if your actions change the results (even insignificantly) of benched tests, you'd better have a very good reason.
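To make the idea of "benched tests" concrete, here is a minimal sketch in Python (not SAS or R, and not part of Greg's comment). It assumes a hypothetical dictionary of benchmark values recorded from an earlier release and flags any statistic that drifts beyond a tight tolerance. The names `BENCHMARK`, `summarize`, and `check_against_benchmark` are made up for illustration.

```python
import math

# Hypothetical benchmark: results recorded from an earlier "release".
BENCHMARK = {"mean": 3.0, "variance": 2.5}


def summarize(values):
    """Return the sample mean and unbiased sample variance of values."""
    n = len(values)
    mean = math.fsum(values) / n
    variance = math.fsum((v - mean) ** 2 for v in values) / (n - 1)
    return {"mean": mean, "variance": variance}


def check_against_benchmark(result, benchmark, rel_tol=1e-12):
    """Return the names of statistics that drifted from the benchmark."""
    return [
        name
        for name, expected in benchmark.items()
        if not math.isclose(result[name], expected, rel_tol=rel_tol)
    ]


# An empty list means the current code still reproduces the benchmark.
drifted = check_against_benchmark(summarize([1.0, 2.0, 3.0, 4.0, 5.0]), BENCHMARK)
print(drifted)
```

A non-empty list is the point at which, in Greg's words, you had better have a very good reason.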

Output reusability:
R: Excellent. The techniques typically yield a structure whose members can be accessed in a very natural way. One can chain together strings of calcs to come up with much more complicated 'meta models' very easily.

SAS: State of the art in 1980. I haven't kept up with their improvements in the last few years, but the only reasonable way to capture and reuse output used to be an OUT= statement (if supported) or to snatch output from the ODS. SAS/IML is a bit of a standout in that it works a bit like R or Python, but the feature set is not as complete.
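The "structure whose members can be accessed" style can be sketched in Python as well. This is only an illustration of the pattern, not R code and not anything from the thread: `LinearFit` and `fit_line` are hypothetical names, and the fit is a plain one-variable least-squares line.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class LinearFit:
    """Hypothetical result object for a one-variable least-squares fit."""
    intercept: float
    slope: float
    residuals: list


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    return LinearFit(intercept, slope, residuals)


x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
fit = fit_line(x, y)

# Reuse the fitted members directly in a follow-on calculation.
prediction = fit.intercept + fit.slope * 5.0

# Chain a second model onto the first: fit a line to the residuals.
second = fit_line(x, fit.residuals)
```

Because the result is an ordinary object, the output of one model becomes the input of the next without any intermediate export step, which is the convenience the comment credits to R.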

Overall, there is not a clear winner. If your client is looking for a supported, vetted analytics engine then SAS edges out the competition. If you are a startup with a bit more time than money, R or even an open source library like Apache Commons Math, Mahout, or NumPy will do. For the organization which is not afraid of a bit of coding, the open source solution offers the ability to tailor their analytic system to the business need.

Alfredo Roccato • In the real world (I'm speaking of large commercial organizations) where 80%-90% of the time is spent in large scale data processing, SAS has proven to be a very efficient and flexible tool. In an academic context, where most of the time is spent in analysis, mainly dealing with toy data, there is no doubt that R is the preferred software. In my opinion these packages do not compete with each other, even if there is a considerable overlap in statistical methodologies. Rather, better communication would benefit both: you can use SAS for complex data manipulation and R for all the analyses written by the multitude of its contributors.

If you are on Linkedin, join the group and the "SAS versus R" thread.




  2. R runs out of memory (all data is stored in memory, whereas SAS writes data sets to disk). So if you are handling anything larger than your RAM (8-32 GB?) you will have to be smart about how you do things in R, whereas in SAS you can just run a transpose on a 100 million row data set and "it just works".
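
One common way of "being smart" about data larger than RAM is to stream it and keep only running totals. The sketch below is in Python rather than R or SAS and is only an illustration under simple assumptions: `streaming_mean` is a hypothetical helper, the in-memory `sample` stands in for a file too large to load, and a streaming mean is a much easier case than the transpose mentioned above.

```python
import csv
import io


def streaming_mean(rows, column):
    """Compute the mean of one column while holding only one row in memory."""
    total = 0.0
    count = 0
    for row in rows:
        total += float(row[column])
        count += 1
    return total / count if count else float("nan")


# Stand-in for a file too large for RAM; a real run would pass an open file.
sample = io.StringIO("id,amount\n1,10.0\n2,20.0\n3,60.0\n")
mean_amount = streaming_mean(csv.DictReader(sample), "amount")
print(mean_amount)
```

Memory use stays flat regardless of how many rows the file has, because `csv.DictReader` yields one row at a time.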
