Big Computing: June 2015

Tuesday, June 9, 2015

R for Kids

I have been saying for years that someone should write an R for Kids book. No one has. I do not know why. Books on other programming languages written for kids have been very successful. Python for Kids is the #8 Python book title on Amazon. Only two Minecraft books sell better in the children's software category. I do not believe that most people who buy Python for Kids are kids, or that most copies are even bought for kids. I think a lot of buyers are adults looking for a gentle introduction to a programming language. It is similar in concept to the "_______ for Dummies" series of books, just less in your face about it.

It also steams me a little that Python has a book for kids but R does not. R is a much larger community than Python. R has more users than Python. Not that I would ever enter into the R vs Python discussion, but R is just plain cooler than Python.

So I have never written a book. I have never done an online video. But I cannot take it anymore. This void has to be filled. If JD Long (my first choice because of R in the voice of Dr. Seuss) or Hadley Wickham (my second choice because, well, he is Hadley) won't step up to the plate, I will. It may be a little rough at first, but I hope you will all bear with me and contribute any content that you feel is relevant. I will add a GitHub repository for this as soon as it gets going, and I will post the link here as well.

Wish me luck and code with your kid.

Kirk

Friday, June 5, 2015

To pipe or not to pipe that is the question - Adventures in R

Recently I started using some of Hadley Wickham's newer packages like ggvis and dplyr. In a number of examples he used pipes when he was presenting his code. What are pipes? A pipe is the "%>%" notation in code that allows the programmer to rewrite nested function calls as a left-to-right chain. The idea is that this rewriting makes the code more readable.
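
To make the notation concrete, here is a minimal sketch of one calculation written both ways. It assumes the magrittr package (which provides "%>%") is installed; the vector x and the chain of base R functions are just illustrative choices, not anything from Hadley's examples.

```r
library(magrittr)  # provides the %>% operator

x <- c(1, 4, 9)  # hypothetical example data

# Nested style: read from the innermost call outward
nested <- round(mean(sqrt(x)), 1)

# Piped style: read left to right; each result becomes the
# first argument of the next call
piped <- x %>% sqrt() %>% mean() %>% round(1)

identical(nested, piped)  # compare the two results
```

Both versions run the same three functions in the same order, so the pipe only changes how the code reads, not what it computes.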

At first I totally disagreed that this made my code more readable. In fact, I was of the opposite opinion: I thought it made my code unreadable. I have to admit this may be because I am so used to the brackets inside of brackets inside of brackets inside of brackets style of R code that I can just read it.

However, now that I have had more time to get used to it, and time to reflect that when I first started using R I could not read the code easily either, I have softened my position somewhat. Let me show you:

Using the ggvis package, the standard code would look like this:

layer_points(ggvis(mtcars, x = ~wt, y = ~mpg, fill = ~disp, shape = ~factor(cyl)))

This code has only three sets of brackets, but see how much eye movement it takes to figure it out. First you read the data set "mtcars", then the package "ggvis", then the type of plot "layer_points", then the variables "wt" and "mpg", and finally the design elements "fill" and "shape". That took four scans of the code to find the elements. This is slow going, but not hard, because I have been reading this type of code for years and kinda know where to look.

Now for the same code with pipes:

mtcars %>%
  ggvis(x = ~wt, y = ~mpg, fill = ~disp, shape = ~factor(cyl)) %>%
  layer_points()

So in this code I read data set (first), package (second), function (last), variables (third) and design (fourth). Overall this is much easier on the eyes than the first example, although I would point out that, for the way I read code, the function is in the wrong place: at the end of the code rather than third in the progression. I would encourage you to use pipes, because I am finding that my code is easier to read not only for me but also for the people I tend to present to, who in general are not R programmers. In fact, in many cases they are not even programmers.

Although the pipe comes along when ggvis is loaded, because the package that provides it is a dependency of ggvis, it can also be installed on its own. It is available on CRAN as the magrittr package.
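
As a rough sketch of using the pipe without ggvis, the lines below load magrittr by itself and pipe the built-in mtcars data through two base R functions. The install line is commented out because it only needs to be run once; the choice of subset() and nrow(), and the name n_four_cyl, are mine and only for illustration.

```r
# install.packages("magrittr")  # one-time install from CRAN
library(magrittr)

# Keep the four-cylinder cars, then count the rows
n_four_cyl <- mtcars %>%
  subset(cyl == 4) %>%
  nrow()

n_four_cyl
```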






Thursday, June 4, 2015

An example of using ggvis package in R with pipes for data visualization

This is an example of using the ggvis package with pipes. The ggvis package is a further development of the ggplot2 package and is designed around the grammar of graphics. It is a very easy-to-use package that, combined with pipes, gives easy-to-read code.

First you will need to install the ggvis package:

install.packages("ggvis")

Then you will need to load that package:

require(ggvis)
## Loading required package: ggvis

For the first example we are going to use the built-in R data set mtcars. Actually, using ggvis without pipes makes the code look kinda ugly:

layer_points(ggvis(mtcars, x = ~wt, y = ~mpg, fill = ~disp, shape = ~factor(cyl)))
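
With pipes, the same plot can be written as a second code chunk. This is a sketch of the piped form of the call above, assuming ggvis is installed; the name p is only there so the plot object can be inspected, and is not part of the original call.

```r
library(ggvis)  # loading ggvis also makes the %>% pipe available

# Data set first, then the mapping of variables, then the layer
p <- mtcars %>%
  ggvis(x = ~wt, y = ~mpg, fill = ~disp, shape = ~factor(cyl)) %>%
  layer_points()

# Typing p at the console draws the plot
```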


As you can see, the second code chunk is much easier to read. First comes the data set you are using. Second come your x and y values and how the points are going to be presented (color, shape, etc). Finally it says what kind of graph this is (points, lines, etc). That is basically it.

Just for fun, let's do a graph with the iris data set. This is kind of fun because, unlike the mtcars data set, which is really a regression data set, the iris data set is a classification data set. So we are not looking to see if we can find a trend line but rather how the variables define the groups. In this case the groups are the species of iris (setosa, versicolor, virginica). So here is the plot:

iris %>%
  ggvis(x = ~Sepal.Length,y = ~Sepal.Width,fill = ~factor(Species)) %>%
  layer_points()
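
One hedged way to see how the variables define the groups without a plot is to compare the average sepal measurements per species. This sketch uses base R's aggregate() with the magrittr pipe (assumed installed); the name species_means is only for illustration.

```r
library(magrittr)

# Mean sepal length and width for each species;
# the dot marks where the piped data set goes
species_means <- iris %>%
  aggregate(cbind(Sepal.Length, Sepal.Width) ~ Species, data = ., FUN = mean)

species_means
```

Setosa should stand out with the shortest and widest sepals on average, which is the kind of separation the plot shows visually.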


Pretty cool

Monday, June 1, 2015

Open Data Science Conference in Boston

This past weekend I went to the Open Data Science Conference in Boston. This was a really well-run conference with about 1,200 attendees. The concept behind it was to bring the open source tools together at one conference so that these groups can collaborate and develop a better data science community rather than just, say, an R or Python community.

In this effort I would have to say that the conference was successful. There was a significant number of both Python and R users there, and the talks seemed evenly split between R and Python. I will even admit that I went to a few Python talks because the topics were interesting.

As at most conferences, the most valuable time may be away from the talks and the structure of the conference itself. That was true for this conference. So many people I have worked with over the years were there that there were not enough breaks between sessions to catch up with them all on what they are doing now. I will say that everyone I talked to is still working in the field and happy with the kinds of projects they are working on. A good sign of the health of the field.

I had a good time in Boston and look forward to returning to the conference next year. Meanwhile, I believe they are going to post videos of the sessions online in the next week or so.