Thursday, December 22, 2011

Big Data/BI/BA

An a-ha moment came as I listened to Christmas music and looked for ways to free up space on my Mac, which is constantly running out of storage.

iTunes allows users to look at their song catalog by simple criteria such as:
  • name of the track
  • length of the track
  • album name
Once iTunes laid this information out visually, my vast pool of mp3 "data" suddenly showed a pattern: overlap. How? I have two albums performed by the Canadian Brass. For kicks (well, not really, I was trying to find a way to add an album cover) I sorted first by artist and then by track. The two albums (luckily) use generic names for their songs, so the songs are labeled "Track 01", "Track 02", and so forth.

With the list sorted by track, I could see "Track 11" twice, once from each of the two albums. I noticed that the two copies were the same size, and soon I realized that the two albums are the same. Deleting one album will save me about 50 MB.

iTunes - visually sort through piles of "unstructured" data to find overlap (waste!)
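The same check I did by eye can be sketched in a few lines of Python. This is only a rough sketch under my own assumptions, not anything iTunes does: the function names `file_digest` and `find_duplicates` are made up for illustration, and it assumes the tracks sit as ordinary files under one folder. It first groups files by size (the clue I noticed), then compares content hashes, since two files of the same size are not necessarily identical.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root):
    """Return groups of files under root with identical size and content."""
    # Step 1: bucket files by size, the cheap clue visible in a sorted list.
    by_size = defaultdict(list)
    for path in Path(root).rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    # Step 2: within each size bucket, confirm equality by hashing content.
    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        by_hash = defaultdict(list)
        for path in paths:
            by_hash[file_digest(path)].append(path)
        for group in by_hash.values():
            if len(group) > 1:
                duplicates.append(sorted(group))
    return duplicates
```

Calling `find_duplicates` on a music folder returns a list of groups, each group holding the paths of files that are byte-for-byte the same. Hashing only the files that already share a size keeps the expensive step small.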


So now I am starting to understand the field:
  • Business Intelligence (BI) : what is happening - Wendy's is gaining business
  • Business Analytics (BA) : why is this happening - Wendy's is shifting to healthier chicken
  • HP ex-CEO spends $12B on Autonomy

My guess is that "big data" will require:
  • collection of data from
    • social media & networks : what does Joe like, who influences him
    • business : Jane responds well to coupons, dislikes update emails that carry no discount
    • browsing : what is Ken researching, what does he spend the most time looking at
  • some way to index the data and give it structure, without hardwiring that structure so rigidly that it implies relationships that may not exist
  • putting it into a database (relational, etc.)
    • Oracle
    • IBM DB2 / Informix
    • SAP/Sybase
    • Microsoft SQL Server
  • storing it
    • large disk arrays (RAID)
    • local flash storage for caching
  • allowing access to it
    • cloud application 
    • cloud storage infrastructure - accessed by cloud app and non-cloud app
  • software to create structure from the data
  • software to analyze data to infer patterns, correlation, causality
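To make the "index the data without hardwiring structure" idea a bit more concrete, here is a minimal sketch, assuming each piece of collected data is just a loose bag of fields. The function name `build_index`, the field names, and the sample records (including "Amy", who is not in my list above) are all invented for illustration. The index maps every (field, value) pair to the records that contain it, so records with different fields can live side by side and no relationship between them is assumed up front.

```python
from collections import defaultdict


def build_index(records):
    """Map each (field, value) pair to the ids of records containing it."""
    index = defaultdict(set)
    for record_id, record in records.items():
        for field, value in record.items():
            index[(field, value)].add(record_id)
    return index


# Hypothetical sample data: each record has whatever fields its source offers.
records = {
    "joe": {"source": "social", "likes": "hiking"},
    "jane": {"source": "business", "responds_to": "coupons"},
    "ken": {"source": "browsing", "researching": "cameras"},
    "amy": {"source": "social", "likes": "cameras", "responds_to": "coupons"},
}

index = build_index(records)

# Look up everyone who responds to coupons, whatever source the fact came from.
coupon_fans = index[("responds_to", "coupons")]
```

Any overlap between people only shows up at lookup time, which is roughly what sorting by track did for me in iTunes.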


