Big Computing: Figuring out the number of R users in comparison to SAS, SPSS and all

Friday, July 15, 2011

I have written about this problem before and have tried several approaches to arrive at an estimate. I have also quoted Robert A. Muenchen's article "The Popularity of Data Analysis Software" before. It is a great article, and Muenchen maintains it as a living document, constantly updating its contents. If you have not looked at it in a while, it is worth another look.

There are two areas of Muenchen's article that I would like to talk about: Internet searches and job postings. For R, both of these measures are problematic and may undercount the searches and job postings that actually concern R.

A person can search effectively for SAS, SPSS, or Stata, but not really for R. The problem is that R is a single character with many other common uses (Toys R Us, R. Kelly, R-rated movies, etc.). As a result, a number of searches for "R" may be about something else entirely, and the searches that are about the language are more likely than searches for SAS or SPSS to include additional terms. This creates a mismatch in the search results no matter how you define the query. I will say that this has improved as the number of R users has grown over the last few years. I ran the Google AdWords campaigns for the Revolution Computing website, and this proved to be a challenging problem to crack. It took a lot of thought and refinement that would not have been required if R had a more distinctive name. Another search term to stay away from in Google AdWords is BI; it does not always mean Business Intelligence.
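
To make the disambiguation problem concrete, here is a minimal Python sketch of the kind of filtering a search-volume analysis has to do before "R" queries can be counted at all. The context and noise word lists and the function name `is_r_stats_query` are my own hypothetical choices, not anything provided by Google AdWords; a real campaign would need far longer lists and would still misclassify some queries.

```python
import re

# Hypothetical words suggesting a query is about the R language.
R_CONTEXT = {"package", "cran", "stats", "statistics", "regression",
             "ggplot2", "rstudio", "plot", "programming", "install"}
# Hypothetical words suggesting an unrelated meaning of "R".
R_NOISE = {"toys", "kelly", "rated", "movie", "movies"}


def is_r_stats_query(query):
    """Guess whether a search query refers to the R language."""
    tokens = set(re.findall(r"[a-z0-9]+", query.lower()))
    if "rstats" in tokens:
        return True  # the unambiguous tag counts on its own
    if "r" not in tokens or tokens & R_NOISE:
        return False  # no bare "R", or a known unrelated meaning
    # A bare "R" only counts when a statistics-related word accompanies it.
    return bool(tokens & R_CONTEXT)


if __name__ == "__main__":
    queries = ["Toys R Us coupons", "R package for GARCH", "R Kelly tour",
               "ggplot2 in R", "R rated movies", "rstats"]
    for q in queries:
        print(q, "->", is_r_stats_query(q))
```

Note that a bare "R" query is rejected, which is exactly the volume that goes missing when R is compared against SAS or SPSS.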

Job postings have a similar issue, but with an added twist. Job posters rarely list the requirement solely as "R" programming, and job seekers rarely describe themselves as an "R programmer." Again, I ran into this problem while trying to find talent for Revolution Computing. Job posters and job seekers often list their R skill as "R/S", "R/S+", "R/SPlus", "R/S-Plus", or numerous other permutations. Once again, R's single-character name is the problem.
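
One way to count postings despite all these spellings is a regular expression that accepts the variants listed above. This Python sketch is only an illustration: the pattern, the helper names `r_variants` and `count_r_postings`, and the sample postings are hypothetical, and a pattern this simple will still miss some forms and catch some false positives.

```python
import re

# Hypothetical pattern: a standalone capital "R", optionally followed by one of
# the R/S variants seen in postings ("R/S", "R/S+", "R/SPlus", "R/S-Plus").
# The lookbehind/lookahead reject matches inside words ("PERL") and "R&D".
R_SKILL = re.compile(r"(?<![\w-])R(?:\s*/\s*S(?:\+|[-\s]?(?i:plus))?)?(?![\w+&-])")


def r_variants(posting):
    """Return every R-skill mention found in one job posting."""
    return [m.group(0) for m in R_SKILL.finditer(posting)]


def count_r_postings(postings):
    """Count postings that mention R in any recognized form."""
    return sum(1 for p in postings if r_variants(p))


if __name__ == "__main__":
    # Made-up postings for illustration only.
    postings = [
        "Statistician: SAS and R/S-Plus experience required",
        "Analyst with R/SPlus or SPSS skills",
        "Data scientist (C++, R, Python)",
        "R&D engineer, MATLAB",
        "Modeler familiar with R/S+",
    ]
    for p in postings:
        print(r_variants(p), "<-", p)
    print(count_r_postings(postings), "of", len(postings), "postings mention R")
```

A posting that says "R/SAS" is counted as a plain "R" mention, which seems like the right call since it still asks for R.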

On Twitter, this has been addressed by using the hashtag #rstats instead of R. The improvement in how easily R content can be found, and the reduction in confusion, have been huge. If R ever expanded its name to something like Rstats, it would greatly improve the quality of searches for information and help employers find employees. Just a thought.

2 comments:

  1. Interesting. For the first approach, I feel that instead of looking at the "R" keyword alone, we could look at the popular combinations R users search for. If we start with a list of popular keywords, like "R package", "R stats", "R forum", or "GARCH/XYZ model in R", and then look up related keywords for each (using the AdWords tool), we would get a fairly large sample of the total population of R-related keywords. Then, by extrapolation or some statistical fitting technique, the gross volume of R searches could be estimated.

  2. I agree that would work for estimating the number of R users. The problem comes when you try to compare those estimates to the number of SAS or SPSS users. For example, "R packages" is a great keyword combination for R, but "SAS packages" or "SPSS packages" does not really work for SAS or SPSS.

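
The first commenter's "statistical fitting" idea could take many forms, and neither commenter specified one. One assumed option, sketched below in Python, is to fit a Zipf-style power law (log volume linear in log rank) to the ranked search volumes of the sampled keywords and extrapolate the tail out to an assumed total number of R-related keywords. The functions `fit_zipf` and `estimate_total`, the power-law assumption, and the keyword count are all hypothetical.

```python
import math


def fit_zipf(volumes):
    """Fit log(volume) = a - b * log(rank) by ordinary least squares.

    `volumes` are positive search volumes for the sampled keywords; they are
    ranked from largest to smallest before fitting. Returns (a, b).
    """
    if len(volumes) < 2 or min(volumes) <= 0:
        raise ValueError("need at least two positive volumes")
    ranked = sorted(volumes, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(v) for v in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # slope of log(volume) against log(rank)
    return mean_y - slope * mean_x, -slope


def estimate_total(volumes, total_keywords):
    """Observed volume plus the fitted volume for the unobserved ranks.

    `total_keywords` is an assumed size of the whole R keyword population;
    ranks len(volumes)+1 .. total_keywords are filled in from the fit.
    """
    if total_keywords < len(volumes):
        raise ValueError("total_keywords must be at least the sample size")
    a, b = fit_zipf(volumes)
    tail = sum(
        math.exp(a - b * math.log(rank))
        for rank in range(len(volumes) + 1, total_keywords + 1)
    )
    return sum(volumes) + tail


if __name__ == "__main__":
    # Made-up volumes for illustration only, not real AdWords data.
    sample = [12000, 5400, 3900, 2900, 2400]
    print(round(estimate_total(sample, total_keywords=500)))
```

As the reply points out, a total like this is only comparable across tools if the same procedure is applied to SAS and SPSS keyword sets that actually make sense for those products.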