My Take on Hadoop World 2011
- by Jean-Pierre Dijcks
I’m sure some of you have read pieces about Hadoop World, and I did see some headlines that were, shall we say, interesting.
  For me, the keynote by Larry Feinsmith of JP Morgan Chase & Co was one of the highlights of the conference. The reason is simple: he addressed some real use cases outside of internet and ad platforms.
  The following are my notes. Since the keynote was recorded, I presume you can go and watch it at Hadoopworld.com at some point…
  On the use cases that were mentioned: 
   
    ETL – how can I do complex data transformation at scale 
     
      Doing Basel III liquidity analysis 
      Private banking – transaction filtering to feed [relational] data marts 
     
    Common Data Platform – a place to keep data that is (or will be) valuable some day, to someone, somewhere 
     
      360 Degree view of customers – become pro-active and look at events across lines of business. For example, make sure the mortgage folks know when direct deposits into an account have stopped, so that the bank can pro-actively service the customer
      Treasury and Security – Global Payment Hub [I think this is really consolidation of data to cross-reference activity across businesses and geographies]
     
    Data Mining 
     
      Bypass data engineering [I interpret this as running the analysis on the full, large data set rather than on samples]
      Fraud prevention – work on event triggers, say a number of failed log-ins to the website. When they occur, grab web logs, firewall logs and rules and start to figure out who is trying to log in. Is this me, who forgot my password, or is it someone in some other country trying to guess passwords?
      Trade quality analysis – do a batch analysis of all trades done and run them through an analysis or comparison pipeline
     
   
  One of the key requests – if you can say it like that – was for vendors and entrepreneurs to make sure that new tools work with existing tools. JPMC has a large footprint of BI tools, and Big Data reporting and tools should work with those existing tools rather than stand apart from them.
  Security and Entitlement – how to protect data within a large cluster from unwanted snooping was another topic that came up. 
  I thought his Elephant ears graph was interesting (I couldn’t actually read the points on it, but the concept certainly made some sense). It was also telling that, when asked for a show of hands, the audience did not (!) think that RDBMS and Hadoop technology would overlap completely within a few years.
  Another interesting session came from Disney, discussing how the company is building a DaaS (Data as a Service) platform and how Hadoop processing capabilities are mixed with database technologies. I thought this was one of the best sessions I have seen in a long time. It discussed real use cases, where problems existed, how they were solved and how Disney planned some of it.
  The planning focused on three things/phases: 
   
    Determine the Strategy – Design a platform and evangelize this within the organization 
    Focus on the people – Hire key people, grow and train the staff (and do not overload what you have with new things on top of their day-to-day job), leverage a partner with experience 
    Work on Execution of the strategy – Implement the platform, with Hadoop next to the other technologies, and work toward the DaaS platform
   
  This kind of fitted with some of the LinkedIn comments, best summarized in “Think Platform – Think Hadoop”. In other words [my interpretation], step back and engineer a platform (like DaaS in the Disney example), then layer the rest of the solutions on top of this platform.
  One general observation: I got the impression that we have knowledge gaps left and right. On the one hand, people are looking for more information and details on the Hadoop tools and languages. On the other, I got the impression that the capabilities of today’s relational databases are underestimated, mostly in terms of data volumes, parallel processing capabilities and things like commodity hardware scale-out models.
  All in all I liked this conference. It was great to chat with a wide range of people on Oracle big data, on big data, on use cases and all sorts of other stuff. Just hope they get a set of bigger rooms next time… and yes, I hope I’m going to be back next year!