This guest post is by Vinod Kumar. Vinod Kumar has worked with SQL Server extensively since joining the industry over a decade ago. Having worked on various versions of SQL Server, including 7.0, as well as Oracle 7.3 and other database technologies, he now works with the Microsoft Technology Center (MTC) as a Technology Architect.
Let us read the blog post in Vinod’s own voice.
I think the series from Pinal is a good one for anyone planning to start their Big Data journey from the basics. In my daily customer interactions the “Big Data” buzz always comes up, and I generally react by asking: “Sir, do you really have a ‘Big Data’ problem or do you have a big Data problem?” Generally, there is a silence in the air when I ask this question. Data is everywhere in organizations, be it big data, small data, or all data, and for a few it is bad data, which is the same as no data :). Now, don’t discount me as someone who opposes “Big Data”; I am as much a supporter of it as I am a critic of the way people abuse the term.
In this post, I want to let my mind flow so that you can see these concepts from the direction I look at them. This is not an exhaustive dump of what is in my mind, but you will surely get the drift of how I question customers about Big Data terms!
Is Big Data Relevant to me?
Many of my customers come to me like a blank whiteboard, with no idea of “why Big Data”. They want to jump on the technology bandwagon and decipher insights from their unexplored data, a.k.a. unstructured data, along with their structured data. So what are the industry scenarios that come to mind? Here are some of them:
Fraud detection: Banks and credit card companies monitor your spending habits in real time.
Customer Segmentation: Applies in every industry, from Banking to Retail to Aviation to Utility and others, that deals with end customers who consume its products and services.
Customer Sentiment Analysis: Respond to negative brand perception on social media, or amplify positive perception.
Sales and Marketing Campaign: Understand the impact and get closer to customer delight.
Call Center Analysis: Attempt to take unstructured voice recordings and analyze them for content and sentiment.
Reduce Re-admissions: How to build proactive follow-up engagement with patients.
Patient Monitoring: How to track Inpatient, Out-Patient, Emergency Visits, Intensive Care Units etc.
Preventive Care: Disease identification and risk stratification are crucial business functions in the medical field.
Claims fraud detection: There is no precise dollar figure that one can put on this, but it is a big issue for the medical field.
Customer Sentiment Analysis, Customer Care Centers, Campaign Management.
Supply Chain Analysis: All sensor and RFID data can be tracked for warehouse space optimization.
Location based marketing: Based on where a check-in happens, retail stores can optimize their marketing.
Price optimization and Plans, Finding Customer churn, Customer loyalty programs
Call Detail Record (CDR) Analysis, Network optimizations, User Location analysis
Customer Behavior Analysis
Fraud Detection & Analysis, Pricing based on customer
Sentiment Analysis, Loyalty Management
Agents Analysis, Customer Value Management
This list can go on to other areas like Utility, Manufacturing, Travel, ITES etc. As you can see, there are interesting use cases for each of these industry verticals. This is just a representative list.
Where to start?
A lot of times I try to quiz customers on a number of dimensions before starting a Big Data conversation.
Are you getting the data you need the way you want it and in a timely manner?
Can you get in and analyze the data you need?
How quickly does IT respond to your BI requests?
How easily can you get at the data that you need to run your business/department/project?
How are you currently measuring your business?
Can you get the data you need to react WITHIN THE QUARTER and influence behaviors to meet your numbers, or is it always “rear-view mirror”?
How are you measuring:
Supply Chain Efficiencies
Predictive product / service positioning
What are your key challenges in driving collaboration across your global business? What are the challenges in innovation?
What challenges are you facing in getting more information out of your data?
Note: Garbage in is garbage out. This holds good for all reporting / analytics requirements.
Big Data POCs?
A number of customers start by setting up a small team to work on Big Data. That is a great start for building understanding, but I tend to ask such customers a number of other questions. Some of the common ones are:
To what degree are your advanced analytics (natural language processing, sentiment analysis, predictive analytics and classification) paired with your Big Data efforts?
Do you have dedicated resources exploring the possibilities of advanced analytics in Big Data for your business line?
Do you plan to employ machine learning technology while doing Advanced Analytics?
How is Social Media being monitored in your organization?
What is your ability to scale in terms of storage and processing power?
Do you have a system in place to sort incoming data in near real time by potential value, data quality, and use frequency?
Do you use event-driven architecture to manage incoming data?
Do you have specialized data services that can accommodate different formats, security, and the management requirements of multiple data sources?
Is your organization currently using or considering in-memory analytics?
To what degree are you able to correlate data from your Big Data infrastructure with that from your enterprise data warehouse?
Have you extended the role of Data Stewards to include ownership of big data components?
Do you prioritize data quality based on the source system (for example, Facebook/Twitter data has lower quality thresholds than radio frequency identification (RFID) data for a tracking system)?
Do your retention policies consider the different legal responsibilities for storing Big Data for a specific amount of time?
Do Data Scientists work in close collaboration with Data Stewards to ensure data quality?
How is access to attributes of Big Data being given out in the organization?
Are roles related to Big Data (Advanced Analyst, Data Scientist) clearly defined?
How involved is risk management in the Big Data governance process?
Is there a set of documented policies regarding Big Data governance?
Is there an enforcement mechanism or approach to ensure that policies are followed?
Who is the key sponsor for your Big Data governance program? (The CIO is best)
Do you have defined policies surrounding the use of social media data for potential employees and customers, as well as the use of customer Geo-location data?
How accessible are complex analytic routines to your user base?
What is the level of involvement with outside vendors and third parties in regard to the planning and execution of Big Data projects?
What programming technologies are utilized by your data warehouse/BI staff when working with Big Data?
These are some of the important questions I ask each customer who is actively evaluating Big Data trends for their organization. These questions give you a sense of direction: where to start, what to use, how to secure, how to analyze and more.
Any Big Data analysis is incomplete without a compelling story. The best way to understand this is to watch Hans Rosling’s Gapminder videos (2:17 to 6:06) about the third world myths. Don’t get overwhelmed by the Big Data buzzword; what matters is the destination your data points you to.
In this blog post, we did not particularly look at any Big Data technologies. This is a set of questions to keep in mind as you embark on your Big Data journey. I did write about some of the basics in my blog: Big Data – Big Hype yet Big Opportunity. Do let me know if these questions make sense.
Reference: Pinal Dave (http://blog.sqlauthority.com)