iTunes allows users to look at their song catalog by simple criteria such as:
- name of the track
- length of the track
- album name
If I sort by track name, I can see "Track 11" twice, once from each of two albums. I noticed that the two files have the same size, and soon realized that the two albums are the same. Deleting one album will save me about 50 MB.
Figure: iTunes - visually sort through piles of "unstructured" data to find overlap (waste!)
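The manual check described above (same track name, same file size) can be sketched in a few lines. This is only an illustration: the catalog entries, album names, and sizes are made up, and `find_probable_duplicates` is a hypothetical helper, not anything iTunes provides.

```python
from collections import defaultdict

# Hypothetical catalog: (track name, album, size in bytes). All values are invented.
tracks = [
    ("Track 10", "Album A", 4_100_000),
    ("Track 11", "Album A", 5_200_000),
    ("Track 11", "Album A (copy)", 5_200_000),
    ("Track 12", "Album B", 5_200_000),
]

def find_probable_duplicates(catalog):
    """Group tracks by (name, size); any group with more than one album is a suspect."""
    groups = defaultdict(list)
    for name, album, size in catalog:
        groups[(name, size)].append(album)
    return {key: albums for key, albums in groups.items() if len(albums) > 1}

duplicates = find_probable_duplicates(tracks)
for (name, size), albums in duplicates.items():
    print(f"{name} ({size} bytes) appears in: {', '.join(albums)}")
```

Matching on name and size is only a heuristic, the same one I used by eye. Comparing a hash of the file contents would be a stronger confirmation before deleting anything.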
So now I am starting to understand the fields of:
- Business Intelligence (BI): what is happening - Wendy's is gaining business
- Business Analytics (BA): why it is happening - Wendy's is shifting to healthier chicken
- HP ex-CEO spends $12B on Autonomy
My guess is that "big data" will require:
- collection of data from
  - social media & networks: what does Joe like, and who influences him
  - business: Jane responds well to coupons, dislikes emails with updates but no discount
  - browsing: what is Ken researching, and what does he spend the most time looking at
- some way to index the data and give it structure, without hardwiring that structure so that it implies relationships
- putting it into a database (relational, etc.)
  - Oracle
  - IBM DB2 / Informix
  - SAP/Sybase
  - Microsoft SQL Server
- storing it
  - large disk arrays (RAID)
  - caching in localized flash storage
- allowing access to it
  - cloud application
  - cloud storage infrastructure - accessed by cloud and non-cloud apps
- software to create structure from the data
- software to analyze data to infer patterns, correlation, causality
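One way to picture "index the data without hardwiring structure" is an inverted index: every word points back to the records that contain it, regardless of which fields each record happens to have. The sketch below is my own assumption about what that could look like, not how any of the products above work. The records and the `build_inverted_index` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical records from different sources; no shared schema is assumed.
records = [
    {"source": "social", "person": "Joe", "likes": "chicken sandwiches"},
    {"source": "business", "person": "Jane", "responds_to": "coupons"},
    {"source": "browsing", "person": "Ken", "researching": "chicken recipes"},
]

def build_inverted_index(items):
    """Map each lowercase word to the set of record positions that mention it."""
    index = defaultdict(set)
    for record_id, record in enumerate(items):
        for value in record.values():
            for word in str(value).lower().split():
                index[word].add(record_id)
    return index

index = build_inverted_index(records)

# Look up a word without knowing which field it lives in.
chicken_hits = sorted(index["chicken"])
print([records[i]["person"] for i in chicken_hits])
```

The index finds that Joe and Ken both touch "chicken" even though their records come from different sources with different fields. It says nothing about why, which is the part left to the analysis software.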
