Big Computing: Everything old in Data Science is new again - Graphs

Wednesday, October 1, 2014

Everything old in Data Science is new again - Graphs

Twenty years ago, when I was still in school, we used graphs heavily before we tried to fit a model. The data sets were pretty small, but we used the graphs to help select the model we were going to fit. Fitting a model at the time was time-consuming and tedious, so it was important to minimize the number of inappropriate models that you might try. I even remember having templates of graphs of data which basically said: if your data looks like this, do this.

As time went on, I continued to use graphs, but I confess I spent less time graphing the data in various ways than I did back then and more time experimenting with a suite of models. The result was that I lost some of the connection to the data that is the art and feel side of data science.

Lately I have been playing with RStudio's Shiny apps (along with everyone else). It has reminded me of the importance of exploring your data graphically and introduced me to a lot of types of graphs that I had never seen or used. Many of these plots provide powerful insight into the structure and nature of the data I was working with, insight that I do not believe I could have gotten any other way.

So I thought I might spend the next couple of weeks doing relatively short blog posts on graphing different types of data sets with different types of graphs. I hope this simple review of one of the core elements of data science will be as helpful to you as it has been to me.

For more detail you can always depend on Hadley. His stuff is always great.

