Big Computing: To pipe or not to pipe that is the question - Adventures in R

Friday, June 5, 2015

To pipe or not to pipe that is the question - Adventures in R

Recently I started using some of Hadley Wickham's new packages, like ggvis and dplyr. In a number of examples he used pipes when presenting his code. What are pipes? A pipe is the "%>%" notation, which lets the programmer rewrite nested function calls as a chain that reads from left to right. The idea is that this rewriting makes the code more readable.
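Here is a minimal sketch of the idea, assuming the magrittr package is installed (the mtcars data set ships with R). The variable names nested and piped are only for illustration. The pipe hands whatever is on its left to the function on its right as the first argument, so the two forms below should give the same answer:

```r
library(magrittr)  # provides the %>% operator

# Nested style: read from the inside out
nested <- round(mean(mtcars$mpg), 1)

# Piped style: read from left to right
piped <- mtcars$mpg %>% mean() %>% round(1)

# Both forms compute the same value
identical(nested, piped)
```

The piped version says "take mpg, then average it, then round it", which is the order the work actually happens in.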

At first I totally disagreed that this made my code more readable; in fact, I was of the opposite opinion. I thought it made my code unreadable. I have to admit this may be because I am so used to the brackets inside of brackets inside of brackets inside of brackets style of R code that I can just read it.

However, now that I have had more time to get used to it, and time to reflect that when I first started using R I could not read nested code easily either, I have softened on that position somewhat. Let me show you:

Using the ggvis package, the standard code would look like this:

library(ggvis)  # load the package first
layer_points(ggvis(mtcars, x = ~wt, y = ~mpg, fill = ~disp, shape = ~factor(cyl)))

This code has only three sets of brackets, but see how much movement of your eyes it takes to figure it out. First you read the data set "mtcars", then the package "ggvis", then the type of plot "layer_points", then the variables "wt" and "mpg", and finally the design elements "fill" and "shape". That took four scans of the code to find the elements. This is slow going, but not hard, because I have been reading this type of code for years and kinda know where to look.

Now for the same code with pipes:

mtcars %>%
  ggvis(x = ~wt, y = ~mpg, fill = ~disp, shape = ~factor(cyl)) %>%
  layer_points()

With this code I read the data set (first), the package (second), the variables (third), the design elements (fourth) and the function (last). Overall this is much more efficient on the eyes than the first example, although I would point out that, for the way I read code, the function is in the wrong place: it sits at the end of the chain rather than third in the progression. Overall I would encourage you to use pipes, because in the end I am finding that my code is easier to read, not only for me but also for the people I tend to present to, who in general are not R programmers. In fact, in many cases they are not even programmers.

The pipe gets loaded as a dependency of ggvis, but it can also be installed on its own. It is available on CRAN as the magrittr package.
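If you want to try the pipe without ggvis, something like the sketch below should work. It assumes you have a CRAN mirror set up, and the name result is just for illustration:

```r
# One-time install from CRAN (uncomment if magrittr is not installed yet)
# install.packages("magrittr")

library(magrittr)

# Take three numbers, take their square roots, then add them up
result <- c(1, 4, 9) %>% sqrt() %>% sum()
result
```

Here sqrt() turns the vector into 1, 2, 3 and sum() adds those up, so result is 6.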





