Big Computing: May 2015

Sunday, May 31, 2015

Coursera Data Science compared to Data Camp courses for R

Recently Data Camp has really expanded their offering of R tutorials, so I thought the time had come to revisit the courses offered on Coursera for Data Science compared to those offered on Data Camp. This review is based solely on my own experience.

I took the courses in the Coursera Data Science series last year, usually enrolling in a class or two at a time. Each course in the nine-course series was a month long and consisted of a series of lectures, some quizzes, and one or two projects in which the student worked on a piece of data and submitted the results for evaluation, either by automated grading or peer review. The series starts with setting up R, RStudio, and GitHub, then goes on to cover data visualization, data manipulation, regression, and some machine learning. I found the courses to be a good base overview of the skills and tools needed to work in R as a data scientist. My greatest concern is that the hardest part is the first few classes that set everything up; after that I found the classes to be pretty easy. In fact, I suspect that many of the students who fail to finish the series do so because they cannot even get started. The forums are a good source of information and help, and I found them really important when I got stuck. The other downside was the peer review of projects. Few reviewers spent much time and effort on this part of the class, and their reviews were in many cases unhelpful or just plain wrong. In one case I did a project incorrectly yet all my reviewers gave me full credit; in another I did a project differently from, but just as correctly as, many of the other students, and received poor reviews simply because my work did not look like that of my reviewers.

The Data Camp courses differ from the Coursera classes in that you run R in the Data Camp environment. This is a benefit because you do not have to go through the work of setting up everything you need to do this work, but it has the corresponding downside that you are really only learning how to do this on the Data Camp site and not in the real world. I did enjoy the interactive, step-by-step method of working through examples that is the core of the Data Camp approach. I did not like that the interface requires your work to be in the exact format the teacher used, which could be very vexing at times.

At this stage I would more strongly recommend the Coursera classes because they really get you ready to do real work. However, if you are frustrated or getting stuck with the Coursera series, do some modules on Data Camp. They will function as a remedial trainer and build up your skill and confidence to take on more challenging and more independent tasks.

Friday, May 29, 2015

Fake Data used to make fake research papers and studies

Recently a lot of attention has been given to Michael LaCour, a UCLA graduate student, fabricating data and other material in a widely referenced academic paper. It is certainly an embarrassment for the other authors and the publishers of the article. I think it is time to be honest: this issue is not a rare problem, and that has been well known for some time.

The first time I heard about faked research papers was in a blog post by Andrew Gelman. I later heard him give a talk on the same topic. His methods were quickly able to vet out a number of Chinese studies that were obviously faked. I think this is where the current system of research papers falls down: it only meets the standard of peer review. There is no systematized effort to analyze or reproduce the results and findings of a study. Peer review is not a high enough standard because we are at our core human, which means prone to error and our own biases. We need to raise the standard of acceptance for scholarly articles.

Recent Blog post by Gelman on Research Fraud (there are others)

It is even better that Jeff Leek is involved in this discussion with Gelman.

"I had a brief email exchange with Jeff Leek regarding our recent discussions of replication, criticism, and the self-correcting process of science.
Jeff writes:
(1) I can see the problem with serious, evidence-based criticisms not being published in the same journal (and linked to) studies that are shown to be incorrect. I have been mostly seeing these sorts of things show up in blogs. But I’m not sure that is a bad thing. I think people read blogs more than they read the literature. I wonder if this means that blogs will eventually be a sort of “shadow literature”? 
(2) I think there is a ton of bad literature out there, just like there is a ton of bad stuff on Google. If we focus too much on the bad stuff we will be paralyzed. I still manage to find good papers despite all the bad papers. 
(3) I think one positive solution to this problem is to incentivize/publish referee reports and give people credit for a good referee report just like they get credit for a good paper. Then, hopefully the criticisms will be directly published with the paper, plus it will improve peer review.
A key decision point is what to do when we encounter bad research that gets publicity. Should we hype it up (the “Psychological Science” strategy), slam it (which is often what I do), ignore it (Jeff’s suggestion), or do further research to contextualize it (as Dan Kahan sometimes does)?
OK, I’m not planning to take that last option any time soon: research requires work, and I have enough work to do already. And we’re not in the business of hype here (unless the topic is Stan). So let’s talk about the other two options: slamming bad research or ignoring it. Slamming can be fun but it can carry an unpleasant whiff of vigilantism. So maybe ignoring the bad stuff is the better option. As I wrote earlier:
Ultimately, though, I don’t know if the approach of “the critics” (including myself) is the right one. What if, every time someone pointed me to a bad paper, I were to just ignore it and instead post on something good? Maybe that would be better. The good news blog, just like the happy newspaper that only prints stories of firemen who rescue cats stuck in trees and cures for cancer. But . . . the only trouble is that newspapers, even serious newspapers, can have low standards for reporting “cures for cancer” etc. For example, here’s the Washington Post and here’s the New York Times. Unfortunately, these major news organizations seem often to follow the “if it’s published in a top journal, it must be correct” rule.
Still and all, maybe it would be best for me, Ivan Oransky, Uri Simonsohn, and all the rest of us to just turn the other cheek, ignore the bad stuff and just resolutely focus on good news. It would be a reasonable choice, I think, and I would fully respect someone who were to blog just on stuff that he or she likes.
Why, then?
Why, then, do I spend time criticizing research mistakes and misconduct, given that it could even be counterproductive by drawing attention to sorry efforts that otherwise might be more quickly forgotten?
The easiest answer is education. When certain mistakes are made over and over, I can make a contribution by naming, exploring, and understanding the error (as in this famous example or, indeed, many of the items on the lexicon).
Beyond this, exploring errors can be a useful research direction. For example, our criticism in 2007 of the notorious beauty-and-sex-ratio study led in 2009 to a more general exploration of the issue of statistical significance, which in turn led to a currently-in-the-revise-and-resubmit-stage article on a new approach to design analysis.
Similarly, the anti-plagiarism rants of Thomas Basbøll and myself led to a paper on the connection between plagiarism and ideas of statistical evidence, and another paper on storytelling as model checking. So, for me, criticism can open doors to new research.
But it’s not just about research
One more thing, and it’s a biggie. People talk about the self-correcting nature of the scientific process. But this self-correction only happens if people do the correction. And, in the meantime, bad ideas can have consequences.
The most extreme example was the infamous Excel error by Reinhart and Rogoff, which may well have influenced government macroeconomic policy. In a culture of open data and open criticism, the problem might well have been caught. Recall that the paper was published in 2009, its errors came to light in 2013, but as early as 2010, Dean Baker was publicly asking for the data.
Scientific errors and misrepresentations can also have indirect influences. Consider …, where Stephen Jay Gould notoriously… And evolutionary psychology continues to be a fertile area for pseudoscience. Just the other day, Tyler Cowen posted, on a paper called “Money, Status, and the Ovulatory Cycle,” which he labeled as the “politically incorrect paper of the month.”
The trouble is that the first two authors are Kristina Durante, Vladas Griskevicius, and I can’t really believe anything that comes out of that research team, given they earlier published the ridiculous claim that among women in relationships, 40% in the ovulation period supported Romney, compared to 23% in the non-fertile part of their cycle. (For more on this issue, see section 5 of this paper.)
Does publication and publicity of ridiculous research cause problems (besides wasting researchers’ time)? Maybe so. Two malign effects that I can certainly imagine coming from this sort of work are (a) a reinforcing of gender stereotypes, and (b) a cynical attitude about voting and political participation. Some stereotypes reflect reality, I’m sure of that—and I’m with Steven Pinker on not wanting to stop people from working in controversial areas. But I don’t think anything is gained from the sort of noise-mining that allows researchers to find whatever they want. At this point we as statisticians can contribute usefully by stepping in and saying: Hey, this stuff is bogus! There ain’t no 24% vote swings. If you think it’s important to demonstrate that people are affected in unexpected ways by hormones, then fine, do it. But do some actual scientific research. Finding “p less than 0.05” patterns in a non-representative between-subjects study doesn’t cut it, if your goal is to estimate within-person effects.
What about meeeeeeeee?
Should I be spending time on this? That’s another question. All sorts of things are worth doing by somebody but not necessarily by me. Maybe I’d be doing more for humanity by working on Stan, or studying public opinion trends in more detail, or working harder on pharmacokinetic modeling, or figuring out survey weighting, or going into cancer research. Or maybe I should chuck it all and do direct services with poor people, or get a million-dollar job, make a ton of money, and then give it all away. Lots of possibilities. For this, all I can say is that these little investigations can be interesting and fruitful for my general understanding of statistics (see the items under the heading “Why then” above). But, sure, too much criticism would be too much.
“Bumblers and pointers”
A few months ago after I published an article criticizing some low-quality published research, I received the following email:
There are two kinds of people in science: bumblers and pointers. Bumblers are the people who get up every morning and make mistakes, trying to find truth but mainly tripping over their own feet, occasionally getting it right but typically getting it wrong. Pointers are the people who stand on the sidelines, point at them, and say “You bumbled, you bumbled.” These are our only choices in life.
The sad thing is, this email came from a psychology professor! Pretty sad to think that he thought those were our two choices in life. I hope he doesn’t teach this to his students. I like to do both, indeed at the same time: When I do research (“bumble”), I aim criticism at myself, poking holes in everything I do (“pointing”). And when I criticize (“pointing”), I do so in the spirit of trying to find truth (“bumbling”).
If you’re a researcher and think you can do only one or the other of these two things, you’re really missing out."

Article from Buzz Feed on Michael LaCour

"A study claiming that gay people advocating same-sex marriage can change voters’ minds has been retracted due to fraud. 
What’s more, the funding agencies credited with supporting the study deny having any involvement.
The study was published last December in Science, and received lots of media attention (including from BuzzFeed News). It found that a 20-minute, one-on-one conversation with a gay political canvasser could steer California voters in favor of same-sex marriage. Not only that, but these changed opinions lasted for months and influenced other people in the voter’s household, the study found.
Donald Green, the senior author on the study, retracted it shortly after learning that his co-author, UCLA graduate student Michael LaCour, had faked the results of surveys supposedly taken by voters. On Thursday afternoon, Science posted an official retraction, citing funding discrepancies and “statistical irregularities.”
“I am deeply embarrassed by this turn of events and apologize to the editors, reviewers, and readers of Science,” Green, a professor of political science at Columbia University, said in his retraction letter to the journal, as posted on the Retraction Watch blog.
“There was an incredible mountain of fabrications with the most baroque and ornate ornamentation. There were stories, there were anecdotes, my dropbox is filled with graphs and charts, you’d think no one would do this except to explore a very real data set,” Green told Ira Glass, host of the This American Life radio program, last week. This American Life had featured the study in an episode in April.
“I stand by the findings,” LaCour told BuzzFeed News by email. He also said he will provide “a definitive response” by May 29.
The problems came to light after three other researchers tried, and failed, to replicate the study. David Broockman, of Stanford, Joshua Kalla, of the University of California, Berkeley, and Peter Aronow of Yale found eight statistical irregularities in the data set. No one of these would by itself be proof of wrongdoing, they wrote, but all of them collectively suggest that “the data were not collected as described.” 
Broockman, Kalla, and Aronow told Green about the paper’s “irregularities” and sent him a summary of their concerns. According to his retraction letter, Green then contacted Lynn Vavreck, LaCour’s adviser at UCLA, who confronted him. LaCour couldn’t come up with the raw data of his survey results. He claimed that he accidentally deleted the file, but a representative from Qualtrics — the online survey software program he used — told UCLA that there was no evidence of such a deletion. What’s more, according to what Green told Politico, the company didn’t know anything about the project and “denied having the capabilities” to do the survey.
Vavreck also asked LaCour for the contact information of the survey respondents. He didn’t have it, and apparently confessed that he hadn’t used any of the study’s grant money to conduct any of the surveys. 
What happened, apparently, is that people from the Los Angeles LGBT Center — more than 1,000 volunteers, according to This American Life — really did go out and talk to people about same-sex marriage; it’s just that those people were never actually surveyed about their opinions. As one of the canvassers told Ira Glass: “LaCour gave them lists of people he claimed to have signed up for the online survey. Then canvassers did their jobs and went to those houses. This took hundreds of hours.” 
David Fleischer, a leader of the LGBT Center, sent BuzzFeed News a statement about the study: 
“We were shocked and disheartened when we learned yesterday of the apparent falsification of data by independent researcher Michael LaCour,” Fleischer stated. 
“We are not in a position to fully interpret or assess the apparent irregularities in the research as we do not have access to the full body of information and, by design, have maintained an arms-length relationship with the evaluation of the project,” Fleischer added. “We support Donald Green’s retraction of the Science article and are grateful that the problems with LaCour’s research have been exposed.”
In the study’s acknowledgements, LaCour states that he received funding from three organizations — the Ford Foundation, Williams Institute at UCLA, and the Evelyn and Walter Haas, Jr., Fund. But when contacted by BuzzFeed News, all three funders denied having any involvement with LaCour and his work. (In 2012, the Haas, Jr. Fund gave a grant to the Los Angeles LGBT Center related to their canvassing work, but the Center said that LaCour’s involvement did not begin until 2013.) Science cited this fabrication in its official retraction
There are at least two CVs that were reportedly published on LaCour’s website but have since been taken down. Both list hundreds of thousands of dollars in grants for his work. One of these listings, a $160,000 grant in 2014 from the Jay and Rose Phillips Family Foundation of Minnesota, was made up, according to reporting by Jesse Singal at The Science of Us.

Political scientists are shocked and disappointed by the news of the fabrication, especially because the study was so celebrated.

“The whole episode is tragic,” David Nickerson, an associate professor of political science at Notre Dame, told BuzzFeed News. 
The tainted study was some of the strongest evidence to date for the 60-year-old “contact hypothesis,” which says that the best way to reduce prejudice against individuals in a minority group is to boost interactions between them and the majority. 
“It’s pretty clear that what you think about the world, policy issues, can be shaped by who you come into contact with,” Ryan Enos, an assistant professor of government at Harvard, told BuzzFeed News. 
Before LaCour and Green’s study, there was a lot of survey-based evidence for the contact hypothesis. In a study in 2011, for example, Gregory Lewis from Georgia State University compiled data from 27 national surveys and found that “people who know LGBs are much more likely to support gay rights.” And last year, a study by Andrew Flores of UCLA found that the higher the population of gay people in legislative districts, the more likely those districts will support rights for same-sex couples. 
The trouble with these survey-based studies, though, is that it’s impossible to determine causality: Does having gay friends make you more supportive of them, or does being supportive make you more friendly? 
“That’s a very, very difficult hypothesis to tease out using plain old survey data,” Patrick Egan, a political scientist at NYU, told BuzzFeed News. LaCour and Green’s study, in contrast, was a field experiment that could compare the opinions of the same group of people before and after having contact with a gay person. 
That’s why the study’s fabrication is so disappointing, Egan said. “It’s a real loss to knowledge that we don’t actually have real data coming out of this experiment.”

Same-sex marriage advocates say they will still be pushing this kind of “field persuasion.”

It’s “really disheartening to see that someone apparently tainted a study,” Marc Solomon, national campaign director of Freedom to Marry, told BuzzFeed News by email. But this approach has long been a key component of strategy for the LGBT movement, and there is other evidence for it beyond this one study, he said.
In Maine, for example, the organization found that about one-quarter of opponents to same-sex marriage became more supportive after having an in-depth conversation. Freedom To Marry has worked closely with social scientists “to ensure that we can prove that what we’re doing works!” Solomon added. “The efficacy of it has been proven multiple times.”"

Wednesday, May 20, 2015

Evidence that Data Scientists should never be singers: The Overfitting music video

Finally, irrefutable proof that Data Scientists should never become entertainers. Yes, the evil that is the Overfitting song has been released upon us. The words are scary and fitted over the overplayed Michael Jackson song Thriller. I know that many hang onto the magic song that was SVD, but I promise you that song was purely an outlier in the realm of geek songs. The Overfitting song is more the norm.

So if you are a data scientist, predictive modeler, or simply an R programmer, please do not sing! Just build better models and try not to overfit.
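Since the cure for a scary song is a boring demonstration, here is a minimal base R sketch of what overfitting actually looks like. The data and polynomial degrees are made up purely for illustration: a 15-degree polynomial always beats a simpler fit on the training points, but typically does worse on held-out data.

```r
# Simulated data: a smooth signal plus noise
set.seed(42)
x <- runif(30)
y <- sin(2 * pi * x) + rnorm(30, sd = 0.3)
train <- 1:20
test  <- 21:30

# A modest model and an over-flexible one, fit to the training rows only
fit_simple  <- lm(y ~ poly(x, 3),  subset = train)
fit_complex <- lm(y ~ poly(x, 15), subset = train)

rmse <- function(fit, idx) {
  sqrt(mean((y[idx] - predict(fit, data.frame(x = x[idx])))^2))
}

# The complex model wins on the training data...
rmse(fit_simple, train); rmse(fit_complex, train)
# ...but typically loses on the held-out test data
rmse(fit_simple, test); rmse(fit_complex, test)
```

Comparing error on data the model has never seen is the whole game; training error alone will always flatter the more complex model.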

The Importance of Time when Building Cohort for Healthcare Data

In the Temporal Relativity in Cohort Builds video, Dr Eran Bellin explains this important aspect of correctly building a cohort. The simple fact is that doing this correctly can produce very different results than doing it incorrectly. As Dr Bellin says himself, "It is important to understand the temporal relationships between condition lines used to build a cohort. Further, the index event line is especially important to identify the condition line whose event is going to define the index date of the cohort. By recapitulating a study from the medical literature we will demonstrate how a temporally aware cohort object is built."
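To make the idea concrete, here is a hypothetical base R sketch of a temporally aware cohort build. The tables, column names, and conditions are my own invention for illustration, not from Dr Bellin's video; the point is that the index event defines time zero for each patient, and only observations on or after it belong in the cohort.

```r
# Hypothetical diagnosis table: first diagnosis date is the index event
dx <- data.frame(id = c(1, 2, 3),
                 dx_date = as.Date(c("2014-01-10", "2014-06-01", "2015-02-20")))

# Hypothetical lab results, some recorded before the diagnosis
lab <- data.frame(id = c(1, 1, 2, 3),
                  lab_date = as.Date(c("2013-12-01", "2014-03-15",
                                       "2014-07-01", "2015-01-05")),
                  a1c = c(8.1, 7.2, 9.0, 6.5))

# Attach each patient's index date to every lab row
cohort <- merge(lab, dx, by = "id")

# Keep only labs on or after the index date; labs before it answer a
# different clinical question and silently bias results if included
cohort <- cohort[cohort$lab_date >= cohort$dx_date, ]
```

Dropping the last filter is exactly the kind of temporal mistake that produces very different, and wrong, results from the same raw data.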

Dr Bellin is an authority on how to do predictive analytics on historical patient data; he has been doing it for over 20 years. This video is one in a series of videos he produced. He also wrote the book Riddles in Accountable Healthcare: A Primer to Develop Analytic Intuition for Medical Homes and Population Health, a critical read for anyone trying to understand how to do predictive analytics on historical patient data.

Tuesday, May 19, 2015

Anatomy of a Simple Multiple Event Cohort

Building a multiple event cohort can be essential to doing any sort of analytics on patient data in healthcare. Here Dr Bellin gives an example of building a simple multiple event cohort with patient data.

Dr Bellin is a thought leader in healthcare analytics, having worked with patient data for over 20 years. His work has improved the performance of healthcare systems and the outcomes of patients. He also recently published a book on healthcare analytics called Riddles in Accountable Healthcare.

This is the second video in the series. I posted the first video earlier. Here is the Link to that video.

Monday, May 18, 2015

Video of Max Kuhn's talk at the NYC Data Science Academy on Applied Predictive Modeling

While I was working on another post I came across this video of Max Kuhn giving a talk to a Meetup of data scientists about his book Applied Predictive Modeling. The video itself is quite long, running just over an hour; however, Max's talks tend to be well worth it. He covers a variety of classification models which are the basis of his R package, caret.

Link to Slides from Max Kuhn's talk on Random Forest and Caret

Sunday, May 17, 2015

Video slides of the R Caret package including Random Forest (RF)

By far the most visited page on my blog is the example of Random Forest that I posted about a year ago. I wrote it when I was taking the Coursera Data Science classes, which use the caret package in their machine learning section. A few months back Max Kuhn, the creator of the caret package, gave a talk for the Orange County R Users group on caret, which was recorded. I am posting it here for anyone interested in a deeper dive into caret, because caret does so much more than just Random Forest (RF). It is fairly lengthy at about an hour long, but well worth it.

Also here is the link to my simple example of using Caret for Random Forest on the Iris data set.

The caret package is so much more than just Random Forest. It can do a lot of preprocessing, with things like centering and scaling, and there are almost 200 classification models in caret beyond Random Forest. It has really become the de facto tool for establishing the baseline predictive power of a dataset and building out a superior parsimonious model.
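For anyone who wants to try it, here is a minimal sketch of that caret workflow on the iris data, assuming the caret and randomForest packages are installed. The resampling settings here are just illustrative choices, not a recommendation.

```r
library(caret)

set.seed(123)
# Hold out 20% of iris as a test set, stratified by Species
idx   <- createDataPartition(iris$Species, p = 0.8, list = FALSE)
train_set <- iris[idx, ]
test_set  <- iris[-idx, ]

# train() wraps the model fit: preProcess does the centering and
# scaling mentioned above, and trainControl adds 5-fold cross-validation
# for tuning the Random Forest's mtry parameter
fit <- train(Species ~ ., data = train_set,
             method = "rf",
             preProcess = c("center", "scale"),
             trControl = trainControl(method = "cv", number = 5))

# Evaluate on the held-out rows
confusionMatrix(predict(fit, test_set), test_set$Species)
```

Swapping `method = "rf"` for any of caret's other model codes reuses the same partitioning, preprocessing, and resampling machinery, which is exactly why it works so well as a baseline-building tool.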

Saturday, May 16, 2015

Video presentation example of Analysis in Healthcare: The Cohort Paradigm

Recently Dr Eran Bellin did a series of presentations on analytical approaches to data in healthcare, including the use of the cohort paradigm. Dr Bellin is the author of Riddles in Accountable Healthcare and is probably the most experienced practitioner of analytics to improve outcomes, having practiced a data-driven approach to healthcare in a hospital environment for over 20 years.

Friday, May 15, 2015

Coursera's Machine Learning class by Andrew Ng of Stanford University

Recently I took the Machine Learning course offered on Coursera, taught by Andrew Ng of Stanford University. I am really pleased with what Professor Ng has done here. His course covers a lot of material in a very accessible and understandable way that makes it useful to non-experts and a nice refresher for those who work in the field. I also really like his teaching style.

I find that I only like about 50% of the Coursera classes that I look at or take. I am not sure if that is a good yield or not; it is just my experience. I am a big fan of the Data Science track, which is taught by a group of professors out of the Bloomberg School of Public Health at Johns Hopkins. It is a really great series of classes, taught well, that truly gives you the tools to start doing real work in the field of data science. If you are just getting started and really want to start working with data, I would strongly recommend that track. It is based in R using RStudio and GitHub.

If there is one complaint I have with the Machine Learning class, it is that it is not taught in R; it is taught in Octave and MATLAB. Not that those are not useful languages or environments for this work; I just feel that R is more appropriate for the type of people likely to take this class. I understand that I have an R bias, but there are reasons for this opinion beyond that. First, within the Coursera classes in this area of interest R is the dominant language, so using R here would add some consistency to the portfolio. My weakest link is learning new programming languages, so not needing to do that is really helpful and lets me spend more time learning the concepts rather than a new language. Second, students are more likely to be taught R at their schools and universities, so by teaching the basics of R, Coursera would put students further along the R learning curve when they need to use it in practice.

Link to Coursera Machine Learning Class

Thursday, May 14, 2015

We need to keep what the Patriots did to the game balls in perspective

The Patriots and Tom Brady have had a tough week with the NFL's final report on the modification of the game balls. The punishment came down as a four-game suspension for Tom Brady and the Patriots' loss of two draft picks over the next two years. Public comments have abounded, with everyone weighing in on what the appropriate punishment should be and what kind of infraction this should be considered. These comments have spanned the full range of suggestions, with heavy weight at the extremes.

One of the comments I hear is that the infraction damages the integrity of the game. I do not believe this to be accurate. Modifying equipment is a long-standing practice in sports, and there are rules against it. All those who get caught modifying equipment should be punished, but not all those who modify equipment get caught. The key here is that Tom and the Patriots were found, by the preponderance of the evidence, to have broken the rule. This simply does not rise to the level of game fixing or point shaving and should not be treated as such an infraction.

Baseball is the sport best known for these types of equipment modifications. Batters use corked bats and pine tar to gain an illegal advantage, while pitchers scuff balls or coat them with foreign substances to improve their results. These actions have not hurt their reputations or the sport's. Whitey Ford is probably the originator of the scuffed baseball, and Gaylord Perry was ejected for throwing spitballs. Both men are respected in baseball and members of the Hall of Fame.

In the NFL, players modify equipment as well to get an edge. Linemen use vaseline to make it harder for rushers to grab them, receivers use stickum to improve their ability to catch the ball, and quarterbacks modify the football to increase their grip. QB Brad Johnson and receiver Jerry Rice have admitted as much, and linemen from the Denver Broncos and others have been fined for it. The issue here is that the NFL does not catch enough people cheating; the odds of getting caught are too low. The NFL must do a better job of finding people who cheat and punishing them appropriately.

Tom Brady and the Patriots did what any competitive person and organization will do: they took whatever advantage they could, even an illegal one, because the risk of getting caught was almost zero. Make enforcement better and keep giving out harsh punishments for cheating, and Tom Brady and the Patriots won't cheat. Neither will all the other teams and players cheating along with them.

Wednesday, May 13, 2015

Videos from the NYC R conference posted

The videos from the NYC R Conference have been posted on their website. Jared Lander did a great job putting this conference together. The speakers were awesome, and the amount of data science talent in the room was amazing. Basically this conference ended up being a gathering of the thought leaders in Data Science on the East Coast. Sadly I was not able to attend the first half of the conference on Friday because I was working in Boston, but I was there all day Saturday, and it was impressive.

Here is a link to the website and the videos

The talks I would most recommend watching from this conference are those from Andrew Gelman, Bryan Lewis, and Hilary Parker.

Tuesday, May 12, 2015

PGA selects CareerBuilder to be the new title sponsor of the Desert PGA Tour event

Yesterday it was announced that CareerBuilder will be the title sponsor of the PGA Tour event in the desert. This is really good news for the event, which needed a replacement for Humana, exiting after 2015. However, I think it is an even more significant event for the PGA in general, because I believe CareerBuilder is one of the first, if not the first, new-economy companies to sponsor a major tour event. Traditionally events were sponsored by companies like American Express, Humana, Firestone, Chrysler, and other mature businesses. Technology and cloud-based companies like CareerBuilder are no longer starving tech startups but established, mature companies with solid multi-tiered marketing strategies that include things like sponsorship of mainstream events. Also, given that CareerBuilder plays in the crowded job-search market against companies like LinkedIn, Monster, Ladders, and others, it is important that it differentiate itself from the also-rans in the space. Congratulations to the PGA and CareerBuilder! I will see you on the course.

Here is a link to the announcement 

Here is the text of the announcement from the website:"

CareerBuilder Becomes New Title Sponsor of Former Bob Hope Classic

CareerBuilder Challenge will continue to partner with the Clinton Foundation and honor Bob Hope’s legacy

PONTE VEDRA BEACH, Fla. and LA QUINTA, Calif. (May 11, 2015) – The PGA TOUR, Desert Classic Charities and the Clinton Foundation announced today that CareerBuilder, the global leader in human capital solutions, is the new title sponsor of the former Bob Hope Classic after entering a six-year agreement that runs through 2021.
The newly named CareerBuilder Challenge will continue to honor and celebrate the legacy of Bob Hope and his longtime role as tournament host; and through its existing relationship with the Clinton Foundation, the tournament again will promote a health-focused theme in 2016. 
“We are extremely pleased to welcome CareerBuilder as the new title sponsor, joining the Clinton Foundation in support of a tournament that has been an important part of the PGA TOUR schedule for more than 55 years,” said PGA TOUR Commissioner Tim Finchem. “While long associated with the great Bob Hope as the tournament host, the CareerBuilder Challenge has the distinction of being one of only two PGA TOUR tournaments, along with the AT&T Pebble Beach National Pro-Am, in which amateurs play alongside the professionals during actual tournament competition. This is a unique event that also has had a significant charitable impact throughout the Coachella Valley since 1960. We look forward to working with CareerBuilder, the Clinton Foundation and Desert Classic Charities to continue to build upon this tradition.” 
Since its introduction in 1960 as the Palm Springs Golf Classic, the tournament has generated more than $56 million for numerous non-profit organizations in the Coachella Valley that enrich the lives of Valley residents. 
“CareerBuilder is excited to become the title sponsor of a tournament that is not only rich in tradition and sports excellence, but also has a strong commitment to philanthropy,” said Matt Ferguson, CEO of CareerBuilder. “This is a great venue to showcase our technology and evolution into a global HR SaaS company. We look forward to working with the PGA TOUR and the Clinton Foundation as we continue on our mission to empower employment around the world.”
The tournament underwent a transformation in 2012 when Humana and the Clinton Foundation joined forces with Desert Classic Charities and the PGA TOUR to solidify the tournament’s future and redefine it as not only a tournament, but a strategic platform to establish and communicate new initiatives in health and well-being, including a major conference hosted by former President Bill Clinton. 
“We have been proud to partner with the PGA TOUR and Desert Classic Charities to improve health and wellness in the Coachella Valley and beyond, and we are so pleased to continue this work through the CareerBuilder Challenge,” said Valerie Alexander, Chief Marketing Officer of the Clinton Foundation. “We look forward to being part of an exciting new chapter of this storied tournament, which will enable the community to enjoy world-class golf and give us the opportunity to help even more people live healthier lives.”
Also in 2012, the tournament adopted the current format of a four-day tournament with the first three rounds played in a pro-am format; the pro-am teams consisting of one professional and one amateur playing in groups of four; and for each day of the three round pro-am competition, the professional playing with a different amateur partner. While all amateurs compete in daily competitions as well as an overall, three-day competition, the experience became even richer for six of the amateurs beginning in 2014, as the top three gross and top three net leaders through three rounds now advance to compete during Sunday’s final round, playing individual stroke play. 
“Desert Classic Charities is thrilled CareerBuilder will be our title sponsor, only the third in the 56-year history of our tournament,” said John Foster, Chairman and President of Desert Classic Charities. “I know the people of the desert communities will join us in warmly welcoming this great company as we look forward to a long and beneficial partnership.”
The tournament continues to honor the memory of Bob Hope, who became the tournament host in 1965 and was a constant presence throughout the years until his passing on July 27, 2003 at age 100. One element of tribute is the Bob Hope Trophy that is awarded to the champion of the CareerBuilder Challenge.
“On behalf of our family and the Bob Hope Legacy, we thank the PGA TOUR and look forward to an exciting association with CareerBuilder and renewing our association with the Clinton Foundation,” said daughter Linda Hope. “Dad's spirit will continue to live on in the game he loved; in the Tournament which bore his name, and in the charities that benefited from the marriage of those two. Dad's dream is in good hands.”
In addition to replacing Humana as the tournament’s title sponsor, CareerBuilder is joining the PGA TOUR Official Marketing Partner program as the “Official Career Site of the PGA TOUR and Champions Tour.” 
About CareerBuilder
CareerBuilder is the global leader in human capital solutions, helping companies target and attract great talent. Its online career site,®, is the largest in the United States with more than 24 million unique visitors and 1 million jobs. CareerBuilder works with the world’s top employers, providing everything from labor market intelligence to talent management software and other recruitment solutions. Owned by Gannett Co., Inc. (NYSE:GCI), Tribune Company and The McClatchy Company (NYSE:MNI), CareerBuilder and its subsidiaries operate in the United States, Europe, South America, Canada and Asia. For more information, visit
About the Clinton Foundation
The Clinton Foundation convenes businesses, governments, NGOs, and individuals to improve global health and wellness, increase opportunity for women and girls, reduce childhood obesity, create economic opportunity and growth, and help communities address the effects of climate change. Because of our work, 26,000 American schools are providing kids with healthy food choices in an effort to eradicate childhood obesity; 36,000 farmers in Malawi have improved their incomes by more than 500 percent; 248 million tons of greenhouse gas emissions are being reduced in cities worldwide; more than 5,000 people have been trained in marketable job skills in Colombia; 8.2 million people have access to lifesaving HIV/AIDS medications; $200 million in strategic investments have been made, impacting the health of 75 million people in the U.S.; and members of the Clinton Global Initiative have made nearly 3,200 Commitments to Action to improve more than 430 million lives around the world. Learn more at, on Facebook at and on Twitter @ClintonFdn.
About Desert Classic Charities
Since its inception, Desert Classic Charities, the charitable entity that organizes the PGA TOUR event in the Coachella Valley, has contributed more than $56 million to a wide range of Coachella Valley charitable organizations and Eisenhower Medical Center. The scope of giving is broad and includes support for structured and mentoring programs for children, social services, and food and safe shelter for the less fortunate. Desert Classic Charities is dedicated to continuing its mission to serve human needs in the Coachella Valley and beyond by generating funds and opportunities every year through the event.
The PGA TOUR is the world’s premier membership organization for touring professional golfers, co-sanctioning more than 130 tournaments on the PGA TOUR, Champions Tour, Tour, PGA TOUR LatinoamĂ©rica, PGA TOUR Canada and PGA TOUR China. 
The PGA TOUR’s mission is to entertain and inspire its fans, deliver substantial value to its partners, create outlets for volunteers to give back, generate significant charitable and economic impact in the communities in which it plays, and provide financial opportunities for TOUR players.
Worldwide, PGA TOUR tournaments are broadcast to more than 1 billion households in 226 countries and territories in 32 languages. Virtually all tournaments are organized as non-profit organizations in order to maximize charitable giving. In 2014, tournaments across all Tours generated a record $140.5 million for local and national charitable organizations, after surpassing $2 billion in all-time charitable contributions early in the year. 
The PGA TOUR's web site is PGATOUR.COM, the No. 1 site in golf, and the organization is headquartered in Ponte Vedra Beach, FL.

CareerBuilder Media Contact
For all media inquiries and interview requests, contact:

Jennifer Grasz
(P) 773-527-1164

Monday, May 11, 2015

Slides from Bryan Lewis at the NYC R meetup on htmlwidgets visualizations

Bryan Lewis gave a great presentation on htmlwidgets at the New York City R meetup. The presentation really shows the power of R to leverage functionality from other languages, such as JavaScript graphics libraries. I have been an advocate of visualizations for a long time because I believe they are a great way to communicate results and provide insight to lay people. The next critical step in the use of data for analytics is the democratization of the field, and that can only happen by making it more accessible to non-experts.
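To give a flavor of what htmlwidgets makes possible, here is a minimal sketch (not from Bryan's slides) using the dygraphs package, one of the htmlwidgets-based packages; it assumes dygraphs is installed, and any htmlwidget follows the same pattern of an R call producing an interactive JavaScript chart:

```r
# Minimal htmlwidgets sketch using the dygraphs package (assumed installed).
# A single R call wraps the dygraphs JavaScript library; the result renders
# as an interactive chart in the RStudio viewer or in a browser.
library(dygraphs)

# ldeaths is a built-in monthly time series of UK lung-disease deaths
dygraph(ldeaths, main = "Monthly UK Lung Disease Deaths") %>%
  dyRangeSelector()  # adds an interactive zoom/pan selector, all in JavaScript
```

The appeal is exactly the democratization point above: the R user writes two lines, and the JavaScript interactivity comes along for free.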

Here are the slides:

Thursday, May 7, 2015

R Meetup - Building an R package

Last night the Connecticut R Users Group had a Meetup on how to build an R package. It was a small group, which was great because it made for a very informal talk. I was amazed at how much had changed since Jared Lander gave us this talk in 2012. It has become downright easy to create an R package using the tools now available to everyone.

I did the talk using RStudio, roxygen2 and devtools. It was quick, taking about 15 minutes. Since it was such a small group, we created an R package together in real time during the talk. It was simple.
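The whole workflow we walked through can be sketched in a few lines. This is a hedged outline, not the exact package from the Meetup; the package name `greeter` and the `hello()` function are made up for illustration, and it assumes devtools and roxygen2 are installed:

```r
# Sketch of building an R package with devtools + roxygen2 (illustrative names).
library(devtools)

create("greeter")  # scaffold a new package directory with DESCRIPTION, R/, etc.

# A function saved in greeter/R/hello.R, documented with roxygen2 comments:
#' Say hello to someone
#'
#' @param name A character string.
#' @return A greeting string.
#' @export
hello <- function(name) {
  paste0("Hello, ", name, "!")
}

document("greeter")  # roxygen2 turns the #' comments into man/ pages and NAMESPACE
install("greeter")   # build and install the package locally
```

The roxygen2 comments are what make this so much easier than it was in 2012: documentation and the NAMESPACE file are generated from the source itself instead of being maintained by hand.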

After the talk I found a video where he builds an R package. It is also about 15 minutes long.

In the future, I think we will record the talk even when we have such a small group, because the walk-through I was able to do was really valuable.