Big Computing: Some Great R Users Meetups are coming up

Tuesday, October 25, 2011

Some Great R Users Meetups are coming up

If only I had a ton of frequent flier miles, I would go to them all. However, I am running a little low, so I won't be able to make every one. Here are the ones that I think will be awesome:

Nov 3 (NYC Predictive Analytics) Hidden Markov Models in a Nutshell

Description: Hidden Markov Models (HMMs) have emerged as a powerful paradigm for modeling stochastic processes and pattern sequences. HMMs were originally applied to speech recognition, where they became the dominant technology. In recent years, they have attracted growing interest in automatic target detection and classification, computational molecular biology, bioinformatics, computational finance, mine detection, handwritten character/word recognition, and other computer vision applications. The purpose of this talk is to define HMMs and their categories, present the corresponding underlying problems, and explain the step-by-step working of the most popular procedure for HMM parameter estimation: the Baum-Welch algorithm.

Bio: Oualid Missaoui is a researcher with Pipeline Financial Group, Inc., where he is in charge of developing a data mining and pattern recognition based algorithmic trading framework. He received his Ph.D. in Computer Engineering & Science from the University of Louisville (2010) for his research in the fields of machine learning, landmine detection, and image processing. He earned his engineering degree in Signal and Systems and M.Sc. in Applied Mathematics from Ecole Polytechnique de Tunisie (2003, 2005).

This is one of my favorite groups to attend, and any time there is a talk on Markov models, I am there.

Nov 8 (NYC R Meetup) Parallel R with Hadoop

R is free, open-source, and in many ways a data scientist's dream ... but it strains under new-age Big Data problems. One solution is to use Hadoop's scalable, parallel computing framework to drive R. In this talk, consultant and author of the forthcoming book Parallel R, Q Ethan McCallum, will walk through the what, how, and why of getting R to dance with the elephant.
We will also have a lightning talk from JunHo Cho, who will introduce his tool RHive, which integrates R with Hive.
Q recently co-wrote a book with Steve Weston of R foreach fame. I am excited to read it when it comes out.

Nov 10 (Greater Boston R users) Teaching Statistics with Open Source Tools

Nicholas Horton, Associate Professor of Mathematics and Statistics at Smith College, will be presenting on how to ease the use of R in an academic environment. This talk is hosted by Gordon College and we know it is a bit out of town (and early in the day for some), but we hope you can attend. It will be a great talk for beginner R users or those who haven't made the switch to R, but want to!
Summary: Professor Horton will demonstrate the use of the mosaic package, which was created with instructors and students in mind to help facilitate the use of modeling in introductory statistics, science, and calculus courses. He'll give an overview of these systems for use in introductory statistics courses and undergraduate research projects. No prior experience with R or the mosaic package is necessary. Minor refreshments will be provided.
I am always interested in how people approach things with steep learning curves.

Nov 14 (DC R User Group) Moneyball Meets R: Sabermetrics with the MLB Pitch Data Set by Mike Driscoll

For our next meetup we'll have some fun with Mike Driscoll (fellow Data and R Geek, organizer of the Bay Area R meetup group, CTO of MetaMarkets, O'Reilly Strata/OSCON speaker, and author of the "The Three Sexy Skills of Data Geeks" blog post) while he talks about the validation of Bill James’ sabermetrics approach to batting performance using 30 years of Major League Baseball statistics, and a derived predictor for batters’ salaries using R.
He will highlight R’s functional programming features, its compact syntax for statistical modeling, and its ease of connectivity with persistent data stores. This talk will emphasize techniques and approach over detail. 
I am a huge sabermetrics and Mike Driscoll fan. I saw Mike speak most recently at Strata NYC where he probably skipped the sabermetrics stuff because both the Mets and Yankees were already out of it.

Nov 15 (Boston Predictive Analytics) Big Data and Hadoop: Applications from Enterprises and Individuals

6:30 - 6:50: Overview of Big Data and Hadoop: Jeffrey Kelly, an industry analyst covering Big Data, will be presenting the state of the industry. In addition to early-adopting web-based companies, he will be covering a variety of "use cases" that are now occurring across more industry verticals.

6:50 - 7:00: Web/Mobile and Big Data: Sanjay Vakil, a technology manager at TripAdvisor, will be presenting past and current Big Data projects that his team has been developing.

7:00 - 7:30: Enterprise Case Studies: Rob Lancaster and Patrick Angeles of Cloudera, a company that provides enterprise solutions extending Hadoop's functionality, will be presenting a high-level overview of big data and associated applications. They will then present a variety of "use cases", including a dive into the technical details of Hadoop and related software.

7:30 - 8:00:  "Open Data" Project:  Satish Gopalakrishnan and Vineet Manohar will be presenting their Wikipedia / Hadoop project which they created as part of the Hack/Reduce event this past summer at Microsoft NERD.  Their computer program was voted the coolest hack using Hadoop with open data.

I love this short talk format, and Hadoop is the hot buzzword of the year.