Big Computing: In defense of the abuse of Statistics and their charts

Sunday, May 15, 2011

In defense of the abuse of Statistics and their charts

I took a Statistics class in 1990 at Cornell from a guy named Lionel Weiss. He started the first class by telling us he was going to teach us to "make numbers lie". His point was that we always need to be very careful in how we look at data, pick our models, and verify our results. It sounds so simple and basic that it was hard for me to believe it even needed to be said in an undergraduate class. Boy, was I a naive optimist.

The thing I failed to recognize at the time was that most analysis is not done to figure out what is going on, but to support a position already taken by the analyst or the person the analyst works for. The results of this would be funny if they weren't so scary.

Andrew Gelman wrote a post on his blog recently about research papers out of China, which he got from the Statistics blog forum. While no conclusions are drawn from the study, it is suggested that the Chinese researchers knew the conclusion they were supposed to reach and therefore produced an even better result than had been obtained before. While it would be comforting for us to say that the problem exists over there and not here, I have heard enough stories to say that is not true. I recently talked to an individual who was working on a research paper with another person. The second person developed an analysis that supported one of his own beliefs. The first researcher strongly disagreed and pointed to some major problems in the other researcher's analysis. The result was that the first researcher had his name removed from the paper, but the second researcher published the paper anyway.

Junk Charts not only goes after visualizations that are misleading because of poor representation choices, but also those that may have been chosen to be misleading. In this second group I would put this graphic posted on Junk Charts. There are so many problems in this chart comparison that it is hard to argue they were oversights rather than the work of an overzealous analyst trying to support a predetermined conclusion.
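
To make the point concrete, here is a minimal sketch of one of the oldest tricks in this category: truncating the y-axis so a tiny difference looks dramatic. The numbers below are made up for illustration; they are not taken from the graphic discussed above.

# A minimal sketch of how axis choices can mislead.
# The values are hypothetical, invented only to show the effect.
import matplotlib.pyplot as plt

groups = ["Group A", "Group B"]
values = [98.2, 99.1]  # nearly identical measurements

fig, (misleading, honest) = plt.subplots(1, 2, figsize=(8, 3))

# Left panel: truncated y-axis makes a 0.9-point gap look enormous.
misleading.bar(groups, values)
misleading.set_ylim(98, 99.2)
misleading.set_title("Truncated axis")

# Right panel: an axis starting at zero shows the difference in proportion.
honest.bar(groups, values)
honest.set_ylim(0, 110)
honest.set_title("Axis from zero")

plt.tight_layout()
plt.show()

Both panels plot exactly the same two numbers; only the axis limits differ. That is the whole game when someone wants a chart to support a conclusion it does not really contain.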

On the blog Numbers rule your world I saw this post comparing life expectancy to the number of retirement years. It has the problem of comparing averages of two disconnected things, and it is another example of either sloppy work or a graph built to put forth a desired result.

So, more than 20 years later, Lionel was right. We can make "numbers lie". However, if we want to produce useful results, we must fight the urge to produce a result that supports our own bias. We should make the extra effort to try to refute findings that support our own beliefs before we publish them to the world. Let the data speak to us instead of us telling the data what to say.
