Big Computing: Data Mining is not new or scary. Target can predict if you're pregnant. Walmart can predict when you will buy Pop-Tarts and beer. MIT students can predict if you are straight or gay

Monday, February 27, 2012


Data mining is not new or scary. Humans have been collecting information and using it to better understand human behavior since before recorded history. The only change is that, as data storage capacity and computer processing speed have increased, so has data miners' ability to study larger data sets and use more complex models. The UPC codes scanned at checkout let stores record exactly what their customers buy, and the shopper cards we all use were not created to help us but to let stores better track our purchases and behaviors.

The first mainstream article I can remember about this described Walmart addressing the needs of its customers before a major storm. By analyzing sales data from its stores ahead of past storms, Walmart learned that people stocked up on Pop-Tarts and beer. Interestingly, it was not just any Pop-Tarts but strawberry Pop-Tarts, whose sales jumped to about seven times their normal rate. I guess strawberry just goes better with beer. Walmart combined this information with weather data to ship large quantities of Pop-Tarts and beer to its stores ahead of major storms. As a result, the stores no longer ran out of these products as they had in the past, and Walmart increased its revenue.

Here is a link to that article: What Walmart Knows about their Customers
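
The pattern Walmart found can be sketched as a simple sales "lift" calculation: compare each item's average sales on days before a storm with its average on ordinary days. This is a minimal Python sketch, not Walmart's actual system; the item names, the sales figures, and the `sales_lift` function are all hypothetical.

```python
from statistics import mean

# Hypothetical daily unit sales per item, split into days that preceded a
# storm and ordinary days. These numbers are invented for illustration.
pre_storm_sales = {
    "strawberry pop-tarts": [70, 68, 72],
    "beer": [300, 320, 310],
    "bread": [110, 105, 115],
}
normal_sales = {
    "strawberry pop-tarts": [10, 9, 11],
    "beer": [150, 140, 160],
    "bread": [100, 95, 105],
}


def sales_lift(pre, normal):
    """Return each item's ratio of average pre-storm sales to average normal sales."""
    return {item: mean(pre[item]) / mean(normal[item]) for item in pre if item in normal}


lifts = sales_lift(pre_storm_sales, normal_sales)
# List items from the largest pre-storm jump to the smallest.
for item, lift in sorted(lifts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item}: {lift:.1f}x normal")
```

Items with a lift well above 1 are the ones worth pre-shipping when a storm is forecast; items near 1, like bread in this made-up data, are not.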

The recent Target story is really not very different from the Walmart article of several years before. What Target added as it sifted through the data was individual customer data. It achieved this additional level of detail by getting customers to use a Target credit card or a shopper card, which let Target link purchases of particular items to a customer it already had information about (age, location, income, etc.). This type of personal information allows retailers to direct promotions to specific customers (in this case, pregnant women) in an effort to maximize revenue.

Here is a link to that article: How Companies Learn Your Secrets
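
The step that separates the Target story from the Walmart one, tying each purchase to a known customer, can be sketched as grouping transactions by card ID, joining them to customer profiles, and scoring each customer. This is a hypothetical Python sketch: the card IDs, profiles, indicator products, weights, and the 0.5 cutoff are all invented, and a real model would learn its weights from historical data rather than hard-coding them.

```python
from collections import defaultdict

# Hypothetical customer profiles keyed by a store credit card or shopper card ID.
profiles = {
    101: {"age": 23, "zip": "06510"},
    102: {"age": 41, "zip": "06511"},
}

# Hypothetical checkout records: (card_id, product) pairs.
transactions = [
    (101, "unscented lotion"),
    (101, "prenatal vitamins"),
    (101, "cotton balls"),
    (102, "beer"),
    (102, "cotton balls"),
]

# Hypothetical indicator products and weights (illustrative only).
INDICATOR_WEIGHTS = {
    "unscented lotion": 0.3,
    "prenatal vitamins": 0.5,
    "cotton balls": 0.1,
}


def score_customers(transactions, profiles, weights):
    """Sum indicator weights over each customer's distinct purchases and attach the profile."""
    purchases = defaultdict(set)
    for card_id, product in transactions:
        purchases[card_id].add(product)
    results = {}
    for card_id, items in purchases.items():
        score = sum(weights.get(item, 0.0) for item in items)
        results[card_id] = {**profiles.get(card_id, {}), "score": round(score, 2)}
    return results


scores = score_customers(transactions, profiles, INDICATOR_WEIGHTS)
# Customers above a hypothetical cutoff would receive the targeted promotion.
targeted = [card_id for card_id, r in scores.items() if r["score"] >= 0.5]
print(targeted)
```

Without the card ID there is nothing to group by, which is why the shopper card, not the scoring, is what makes individual-level prediction possible.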

Retailers have improved their predictive analytics massively in the last decade. Where retailers were once concerned with figuring out what customers in a region would purchase, they are now working out what an individual consumer wants to purchase before that person walks into the store or goes online. Retailers are getting very accurate and specific. A good example is the Netflix Prize, an open predictive analytics competition that produced an improved movie recommender for Netflix customers.
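
The Netflix Prize was about predicting how a user would rate a movie they had not yet seen. As a rough illustration of that idea, here is a minimal item-based collaborative filtering sketch in Python. It is not the prize-winning approach, which blended many models, and the users, movies, and ratings are made up.

```python
from math import sqrt

# Hypothetical user -> {movie: rating} data on a 1-5 scale.
ratings = {
    "ann": {"Alien": 5, "Heat": 4, "Up": 1},
    "bob": {"Alien": 4, "Heat": 5, "Up": 2, "Jaws": 5},
    "cat": {"Alien": 1, "Up": 5, "Jaws": 2},
}


def item_vectors(ratings):
    """Pivot user ratings into item -> {user: rating}."""
    items = {}
    for user, prefs in ratings.items():
        for item, r in prefs.items():
            items.setdefault(item, {})[user] = r
    return items


def cosine(a, b):
    """Cosine similarity between two items over the users who rated both."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(a[u] ** 2 for u in common))
    norm_b = sqrt(sum(b[u] ** 2 for u in common))
    return dot / (norm_a * norm_b)


def predict(ratings, user, target):
    """Predict a rating as the similarity-weighted average of the user's other ratings."""
    items = item_vectors(ratings)
    num = den = 0.0
    for item, r in ratings[user].items():
        sim = cosine(items[target], items[item])
        num += sim * r
        den += abs(sim)
    return num / den if den else None


print(round(predict(ratings, "ann", "Jaws"), 2))
```

Cosine similarity on raw ratings is the simplest choice; mean-centering each user's ratings first (adjusted cosine) usually gives sharper similarities.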

None of these efforts are evil in terms of what they were trying to achieve. The goal is to accurately identify the wants and needs of customers, which results in greater revenue for the retailer and better service for the customer. That is not a bad goal.

Providing the goods people want, and not the ones they don't, was a key issue for 7-Eleven stores. The company has achieved this with great effect through years of point-of-sale data collection. Here is a case study on data analytics and 7-Eleven. Gone are the days of stale, out-of-date food at 7-Eleven that no one will ever buy.
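
The case study's details are not given here, but the general idea of using point-of-sale history to avoid over-ordering perishable food can be sketched with a moving-average forecast. This Python sketch is hypothetical: the function names, the 7-day window, the safety-stock value, and the sales figures are assumptions, not 7-Eleven's method.

```python
def forecast_demand(daily_sales, window=7):
    """Forecast next-day demand as the average of the last `window` days of sales."""
    recent = daily_sales[-window:]
    if not recent:
        return 0.0
    return sum(recent) / len(recent)


def order_quantity(daily_sales, on_hand, safety_stock=2, window=7):
    """Order enough to cover forecast demand plus a safety buffer, minus stock on hand."""
    needed = forecast_demand(daily_sales, window) + safety_stock
    return max(0, round(needed - on_hand))


# Hypothetical point-of-sale counts for one sandwich over the last week.
sandwich_sales = [12, 15, 11, 14, 13, 16, 10]
print(order_quantity(sandwich_sales, on_hand=4))
```

Ordering from recent sales rather than a fixed quota is what keeps unsold items from sitting on the shelf until they go stale.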

However, there is a potential dark side to all this data collection and the predictive analytics performed on it. It can be used to discover things about individuals that they did not want, or agree, to reveal to others. It can be abused.

A research paper by two MIT students showed that, by examining a person's friends on Facebook, one could predict that person's sexual orientation. The problem arises when that individual is not ready to reveal their sexuality to the outside world. This is similar to the case in which Target revealed to the father of a teenage girl that she was pregnant; that young girl may not have wanted her father to know about her pregnancy. Here is a link to the MIT paper.
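
The reason this works is that people's friends tend to resemble them, so something a person never disclosed can sometimes be guessed from what their friends disclose publicly. The Python sketch below is a deliberately simplified, hypothetical version of that idea (a fixed threshold on the fraction of friends who disclose an attribute); it is not necessarily the method used in the MIT paper, and the names, friend lists, and threshold are invented.

```python
# Hypothetical social graph: person -> list of friends.
friends = {
    "alex": ["bea", "cal", "dee", "eli"],
    "fay": ["bea", "gus", "hal"],
}

# Hypothetical set of people who have publicly disclosed some attribute.
disclosed = {"bea", "cal", "dee"}


def disclosed_fraction(person, friends, disclosed):
    """Fraction of a person's friends who publicly disclose the attribute."""
    circle = friends.get(person, [])
    if not circle:
        return 0.0
    return sum(friend in disclosed for friend in circle) / len(circle)


def infer(person, friends, disclosed, threshold=0.5):
    """Guess that the person shares the attribute if the fraction exceeds the threshold."""
    return disclosed_fraction(person, friends, disclosed) > threshold


print(infer("alex", friends, disclosed), infer("fay", friends, disclosed))
```

Note that "alex" never disclosed anything; the guess comes entirely from other people's public profiles, which is exactly the privacy problem described above.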

Recently, the American Civil Liberties Union (ACLU) has expressed concerns about the data collected by traffic light cameras. Apparently this data becomes available both to the government and to the company the government hires to collect it. It can also become available to anyone who requests it under the Freedom of Information Act. The ACLU's concern is that this data is being collected without the individual's permission. This differs from the Target situation, where customers provided their personal information to Target. The same concern applies to the license plate scanning done by a number of cities and towns in Connecticut. The ACLU of Connecticut has filed suit to force the towns to purge the data periodically and to put reasonable privacy controls on it.
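
A periodic purge of the kind the ACLU of Connecticut is seeking can be sketched as a retention rule that discards any scan older than a fixed window. The 30-day period, the record layout, the plate numbers, and the `purge_old_scans` function are hypothetical; the post does not say what retention period the suit asks for.

```python
from datetime import datetime, timedelta

# Hypothetical retention period; a real one would be set by policy or a court.
RETENTION = timedelta(days=30)


def purge_old_scans(scans, now, retention=RETENTION):
    """Keep only plate scans recorded within the retention window ending at `now`."""
    cutoff = now - retention
    return [scan for scan in scans if scan["seen_at"] >= cutoff]


# Hypothetical license plate scan records.
scans = [
    {"plate": "ABC123", "seen_at": datetime(2012, 2, 20)},
    {"plate": "XYZ789", "seen_at": datetime(2012, 1, 1)},
]
kept = purge_old_scans(scans, now=datetime(2012, 2, 27))
print([scan["plate"] for scan in kept])
```

Run on a schedule, a rule like this limits how far back anyone, including a Freedom of Information Act requester, can trace a car's movements.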

Data mining and predictive analytics are here to stay because they are powerful and useful tools for the organizations that use them. In many cases they provide insights and results that benefit not only the organizations that employ them but also the community at large. If you feel that you do not want to be part of this system, then do not participate. How? That is easy. Buy everything with cash. Do not use any frequent shopper cards. Do not use or own a cell phone, and stay away from the internet. There is a price to opting out of the system, just as there is a price for choosing to be part of it, but it can be done.
