Search Results

Search found 67192 results on 2688 pages for 'excel external data'.

Page 2/2688 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • SQL SERVER – Advanced Data Quality Services with Melissa Data – Azure Data Market

    - by pinaldave
    There has been much fanfare over the new SQL Server 2012, and especially around its new companion product Data Quality Services (DQS). Among the many new features is the addition of this integrated knowledge-driven product that enables data stewards everywhere to profile, match, and cleanse data. In addition to the homegrown rules that data stewards can design and implement, there are also connectors to third party providers that are hosted in the Azure Datamarket marketplace.  In this review, I leverage SQL Server 2012 Data Quality Services, and proceed to subscribe to a third party data cleansing product through the Datamarket to showcase this unique capability. Crucial Questions For the purposes of the review, I used a database I had in an Excel spreadsheet with name and address information. Upon a cursory inspection, there are miscellaneous problems with these records; some addresses are missing ZIP codes, others missing a city, and some records are slightly misspelled or have unparsed suites. With DQS, I can easily add a knowledge base to help standardize my values, such as for state abbreviations. But how do I know that my address is correct? And if my address is not correct, what should it be corrected to? The answer lies in a third party knowledge base by the acknowledged USPS certified address accuracy experts at Melissa Data. Reference Data Services Within DQS there is a handy feature to actually add reference data from many different third-party Reference Data Services (RDS) vendors. DQS simplifies the processes of cleansing, standardizing, and enriching data through custom rules and through service providers from the Azure Datamarket. A quick jump over to the Datamarket site shows me that there are a handful of providers that offer data directly through Data Quality Services. Upon subscribing to these services, one can attach a DQS domain or composite domain (fields in a record) to a reference data service provider, and begin using it to cleanse, standardize, and enrich that data. Besides what I am looking for (address correction and enrichment), it is possible to subscribe to a host of other services including geocoding, IP address reference, phone checking and enrichment, as well as name parsing, standardization, and genderization.  These capabilities extend the data quality that DQS has natively by quite a bit. For my current address correction review, I needed to first sign up to a reference data provider on the Azure Data Market site. For this example, I used Melissa Data’s Address Check Service. They offer free one-month trials, so if you wish to follow along, or need to add address quality to your own data, I encourage you to sign up with them. Once I subscribed to the desired Reference Data Provider, I navigated my browser to the Account Keys within My Account to view the generated account key, which I then inserted into the DQS Client – Configuration under the Administration area. Step by Step to Guide That was all it took to hook in the subscribed provider -Melissa Data- directly to my DQS Client. The next step was for me to attach and map in my Reference Data from the newly acquired reference data provider, to a domain in my knowledge base. On the DQS Client home screen, I selected “New Knowledge Base” under Knowledge Base Management on the left-hand side of the home screen. Under New Knowledge Base, I typed a Name and description of my new knowledge base, then proceeded to the Domain Management screen. Here I established a series of domains (fields) and then linked them all together as a composite domain (record set). Using the Create Domain button, I created the following domains according to the fields in my incoming data: Name Address Suite City State Zip I added a Suite column in my domain because Melissa Data has the ability to return missing Suites based on last name or company. And that’s a great benefit of using these third party providers, as they have data that the data steward would not normally have access to. The bottom line is, with these third party data providers, I can actually improve my data. Next, I created a composite domain (fulladdress) and added the (field) domains into the composite domain. This essentially groups our address fields together in a record to facilitate the full address cleansing they perform. I then selected my newly created composite domain and under the Reference Data tab, added my third party reference data provider –Melissa Data’s Address Check- and mapped in each domain that I had to the provider’s Schema. Now that my composite domain has been married to the Reference Data service, I can take the newly published knowledge base and create a project to cleanse and enrich my data. My next task was to create a new Data Quality project, mapping in my data source and matching it to the appropriate domain column, and then kick off the verification process. It took just a few minutes with some progress indicators indicating that it was working. When the process concluded, there was a helpful set of tabs that place the response records into categories: suggested; new; invalid; corrected (automatically); and correct. Accepting the suggestions provided by  Melissa Data allowed me to clean up all the records and flag the invalid ones. It is very apparent that DQS makes address data quality simplistic for any IT professional. Final Note As I have shown, DQS makes data quality very easy. Within minutes I was able to set up a data cleansing and enrichment routine within my data quality project, and ensure that my address data was clean, verified, and standardized against real reference data. As reviewed here, it’s easy to see how both SQL Server 2012 and DQS work to take what used to require a highly skilled developer, and empower an average business or database person to consume external services and clean data. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQL Utility, T SQL, Technology Tagged: DQS

    Read the article

  • Excel chart on one column of date/times

    - by Segfault
    Hello, I have a list of date/timestamps, and I would like to plot these to a chart in Excel. What I'm looking for is a chart that shows "events per hour" or something similar. Can this be done easily if I just have one column of data (the list of timestamps)? I'm using excel 2007 and looking at two data sets. One is 56k events and the other 750, both over the span of a few days.

    Read the article

  • [change] Vlookup onto another workbook

    - by Arcadian
    Hello, here is my dilemma. I have two worksheets one that has the name of clients and one that i want to copy the names to depending on the city. For instance: associated to each column is last name, first name and city. i have hundreds of names associated to different cities and what i would like is from worksheet1.xls to copy all the New York clients to worksheet2.xls either when i open worksheet2 or via macro what ever is easier and because last name is in one cell and the first name is in the other i would have to copy both. I saw that its possible to link cells from one worksheet to another and then do a vlookup depending on the criteria. Is that the best easiest way or is there another? thanks in advance and cheers oleg

    Read the article

  • populating a list from another worksheet with 1 criteria

    - by Arcadian
    Hello, here is my dilemma. I have two worksheets one that has the name of clients and one that i want to copy the names to depending on the city. For instance: associated to each column is last name, first name and city. i have hundreds of names associated to different cities and what i would like is from worksheet1.xls to copy all the New York clients to worksheet2.xls either when i open worksheet2 or via macro what ever is easier and because last name is in one cell and the first name is in the other i would have to copy both. I saw that its possible to link cells from one worksheet to another and then do a vlookup depending on the criteria. Is that the best easiest way or is there another? thanks in advance and cheers oleg

    Read the article

  • Summing of total with dynamics rows coming external datasource

    - by Gainster
    I am using Excel 2010 and retrieving data from SQL analysis service. When I refresh the data from Excel, the rows all refresh as they are bound to an external datasource. I am adding a separate column with a formula to sum the totals. With an increment or decrement of these rows, the alignment of custom columns goes out. How can I resolve this problem that summing of values become dynamic with adding and removal of rows?

    Read the article

  • Excel not properly recalculating values

    - by gms8994
    I have an excel sheet with values in it (this sheet is generated by a custom perl script, but I don't think that's where the problem lies). In it, I have a formula: =sum(indirect(concatenate(address(6,column()),":",address(17,column())))) The purpose of this formula is to give me the SUM() of the cells in the current column, between rows 6 and 17. In Gnumeric Spreadsheet, as soon as I open the file, this works. But in Excel (both 2003 and 2007), opening the file gives #VALUE! errors in the fields with this formula, stating that the INDIRECT call with the values $B$6:$B$17 will result in an error. Here's the kink in the issue. If I edit the field (via F2), and make no changes, and hit enter, the values update. Also, it seems, if I save the file as .xlsx (Excel 2007 format), the values update upon opening. Unfortunately, I'm not sure that creating an xlsx is a possibility with the modules that I'm using, and many of our clients probably wouldn't be able to use it anyway. Any suggestions? Editing 200+ files every month for each client isn't going to be feasible, so if there's something I'm missing, please let me know.

    Read the article

  • Exploring the Excel Services REST API

    - by jamiet
    Over the last few years Analysis Services guru Chris Webb and I have been on something of a crusade to enable better access to data that is locked up in countless Excel workbooks that litter the hard drives of enterprise PCs. The most prominent manifestation of that crusade up to now has been a forum thread that Chris began on Microsoft Answers entitled Excel Web App API? Chris began that thread with: I was wondering whether there was an API for the Excel Web App? Specifically, I was wondering if it was possible (or if it will be possible in the future) to expose data in a spreadsheet in the Excel Web App as an OData feed, in the way that it is possible with Excel Services? Up to recently the last 10 words of that paragraph "in the way that it is possible with Excel Services" had completely washed over me however a comment on my recent blog post Thoughts on ExcelMashup.com (and a rant) by Josh Booker in which Josh said: Excel Services is a service application built for sharepoint 2010 which exposes a REST API for excel documents. We're looking forward to pros like you giving it a try now that Office365 makes sharepoint more easily accessible.  Can't wait for your future blog about using REST API to load data from Excel on Offce 365 in SSIS. made me think that perhaps the Excel Services REST API is something I should be looking into and indeed that is what I have been doing over the past few days. And you know what? I'm rather impressed with some of what Excel Services' REST API has to offer. Unfortunately Excel Services' REST API also has one debilitating aspect that renders this blog post much less useful than it otherwise would be; namely that it is not publicly available from the Excel Web App on SkyDrive. Therefore all I can do in this blog post is show you screenshots of what the REST API provides in Sharepoint rather than linking you directly to those REST resources; that's a great shame because one of the benefits of a REST API is that it is easily and ubiquitously demonstrable from a web browser. Instead I am hosting a workbook on Sharepoint in Office 365 because that does include Excel Services' REST API but, again, all I can do is show you screenshots. N.B. If anyone out there knows how to make Office-365-hosted spreadsheets publicly-accessible (i.e. without requiring a username/password) please do let me know (because knowing which forum on which to ask the question is an exercise in futility). In order to demonstrate Excel Services' REST API I needed some decent data and for that I used the World Tourism Organization Statistics Database and Yearbook - United Nations World Tourism Organization dataset hosted on Azure Datamarket (its free, by the way); this dataset "provides comprehensive information on international tourism worldwide and offers a selection of the latest available statistics on international tourist arrivals, tourism receipts and expenditure" and you can explore the data for yourself here. If you want to play along at home by viewing the data as it exists in Excel then it can be viewed here. Let's dive in.   The root of Excel Services' REST API is the model resource which resides at: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model Note that this is true for every workbook hosted in a Sharepoint document library - each Excel workbook is a RESTful resource. (Update: Mark Stacey on Twitter tells me that "It's turned off by default in onpremise Sharepoint (1 tickbox to turn on though)". Thanks Mark!) The data is provided as an ATOM feed but I have Firefox's feed reading ability turned on so you don't see the underlying XML goo. As you can see there are four top level resources, Ranges, Charts, Tables and PivotTables; exploring one of those resources is where things get interesting. Let's take a look at the Tables Resource: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Tables Our workbook contains only one table, called ‘Table1’ (to reiterate, you can explore this table yourself here). Viewing that table via the REST API is pretty easy, we simply append the name of the table onto our previous URI: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Tables('Table1') As you can see, that quite simply gives us a representation of the data in that table. What you cannot see from this screenshot is that this is pure HTML that is being served up; that is all well and good but actually we can do more interesting things. If we specify that the data should be returned not as HTML but as: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Tables('Table1')?$format=image then that data comes back as a pure image and can be used in any web page where you would ordinarily use images. This is the thing that I really like about Excel Services’ REST API – we can embed an image in any web page but instead of being a copy of the data, that image is actually live – if the underlying data in the workbook were to change then hitting refresh will show a new image. Pretty cool, no? The same is true of any Charts or Pivot Tables in your workbook - those can be embedded as images too and if the underlying data changes, boom, the image in your web page changes too. There is a lot of data in the workbook so the image returned by that previous URI is too large to show here so instead let’s take a look at a different resource, this time a range: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Ranges('Data!A1|C15') That URI returns cells A1 to C15 from a worksheet called “Data”: And if we ask for that as an image again: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Ranges('Data!A1|C15')?$format=image Were this image resource not behind a username/password then this would be a live image of the data in the workbook as opposed to one that I had to copy and upload elsewhere. Nonetheless I hope this little wrinkle doesn't detract from the inate value of what I am trying to articulate here; that an existing image in a web page can be changed on-the-fly simply by inserting some data into an Excel workbook. I for one think that that is very cool indeed! I think that's enough in the way of demo for now as this shows what is possible using Excel Services' REST API. Of course, not all features work quite how I would like and here is a bulleted list of some of my more negative feedback: The URIs are pig-ugly. Are "_vti_bin" & "ExcelRest.aspx" really necessary as part of the URI? Would this not be better: http://server/Documents/TourismExpenditureInMillionsOfUSD.xlsx/Model/Tables(‘Table1’) That URI provides the necessary addressability and is a lot easier to remember. Discoverability of these resources is not easy, we essentially have to handcrank a URI ourselves. Take the example of embedding a chart into a blog post - would it not be better if I could browse first through the document library to an Excel workbook and THEN through the workbook to the chart/range/table that I am interested in? Call it a wizard if you like. That would be really cool and would, I am sure, promote this feature and cut down on the copy-and-paste disease that the REST API is meant to alleviate. The resources that I demonstrated can be returned as feeds as well as images or HTML simply by changing the format parameter to ?$format=atom however for some inexplicable reason they don't return OData and no-one on the Excel Services team can tell me why (believe me, I have asked). $format is an OData parameter however other useful parameters such as $top and $filter are not supported. It would be nice if they were. Although I haven't demonstrated it here Excel Services' REST API does provide a makeshift way of altering the data by changing the value of specific cells however what it does not allow you to do is add new data into the workbook. Google Docs allows this and was one of the motivating factors for Chris Webb's forum post that I linked to above. None of this works for Excel workbooks hosted on SkyDrive This blog post is as long as it needs to be for a short introduction so I'll stop now. If you want to know more than I recommend checking out a few links: Excel Services REST API documentation on MSDNSo what does REST on Excel Services look like??? by Shahar PrishExcel Services in SharePoint 2010 REST API Syntax by Christian Stich. Any thoughts? Let's hear them in the comments section below! @Jamiet 

    Read the article

  • Exploring the Excel Services REST API

    - by jamiet
    Over the last few years Analysis Services guru Chris Webb and I have been on something of a crusade to enable better access to data that is locked up in countless Excel workbooks that litter the hard drives of enterprise PCs. The most prominent manifestation of that crusade up to now has been a forum thread that Chris began on Microsoft Answers entitled Excel Web App API? Chris began that thread with: I was wondering whether there was an API for the Excel Web App? Specifically, I was wondering if it was possible (or if it will be possible in the future) to expose data in a spreadsheet in the Excel Web App as an OData feed, in the way that it is possible with Excel Services? Up to recently the last 10 words of that paragraph "in the way that it is possible with Excel Services" had completely washed over me however a comment on my recent blog post Thoughts on ExcelMashup.com (and a rant) by Josh Booker in which Josh said: Excel Services is a service application built for sharepoint 2010 which exposes a REST API for excel documents. We're looking forward to pros like you giving it a try now that Office365 makes sharepoint more easily accessible.  Can't wait for your future blog about using REST API to load data from Excel on Offce 365 in SSIS. made me think that perhaps the Excel Services REST API is something I should be looking into and indeed that is what I have been doing over the past few days. And you know what? I'm rather impressed with some of what Excel Services' REST API has to offer. Unfortunately Excel Services' REST API also has one debilitating aspect that renders this blog post much less useful than it otherwise would be; namely that it is not publicly available from the Excel Web App on SkyDrive. Therefore all I can do in this blog post is show you screenshots of what the REST API provides in Sharepoint rather than linking you directly to those REST resources; that's a great shame because one of the benefits of a REST API is that it is easily and ubiquitously demonstrable from a web browser. Instead I am hosting a workbook on Sharepoint in Office 365 because that does include Excel Services' REST API but, again, all I can do is show you screenshots. N.B. If anyone out there knows how to make Office-365-hosted spreadsheets publicly-accessible (i.e. without requiring a username/password) please do let me know (because knowing which forum on which to ask the question is an exercise in futility). In order to demonstrate Excel Services' REST API I needed some decent data and for that I used the World Tourism Organization Statistics Database and Yearbook - United Nations World Tourism Organization dataset hosted on Azure Datamarket (its free, by the way); this dataset "provides comprehensive information on international tourism worldwide and offers a selection of the latest available statistics on international tourist arrivals, tourism receipts and expenditure" and you can explore the data for yourself here. If you want to play along at home by viewing the data as it exists in Excel then it can be viewed here. Let's dive in.   The root of Excel Services' REST API is the model resource which resides at: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model Note that this is true for every workbook hosted in a Sharepoint document library - each Excel workbook is a RESTful resource. (Update: Mark Stacey on Twitter tells me that "It's turned off by default in onpremise Sharepoint (1 tickbox to turn on though)". Thanks Mark!) The data is provided as an ATOM feed but I have Firefox's feed reading ability turned on so you don't see the underlying XML goo. As you can see there are four top level resources, Ranges, Charts, Tables and PivotTables; exploring one of those resources is where things get interesting. Let's take a look at the Tables Resource: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Tables Our workbook contains only one table, called ‘Table1’ (to reiterate, you can explore this table yourself here). Viewing that table via the REST API is pretty easy, we simply append the name of the table onto our previous URI: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Tables('Table1') As you can see, that quite simply gives us a representation of the data in that table. What you cannot see from this screenshot is that this is pure HTML that is being served up; that is all well and good but actually we can do more interesting things. If we specify that the data should be returned not as HTML but as: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Tables('Table1')?$format=image then that data comes back as a pure image and can be used in any web page where you would ordinarily use images. This is the thing that I really like about Excel Services’ REST API – we can embed an image in any web page but instead of being a copy of the data, that image is actually live – if the underlying data in the workbook were to change then hitting refresh will show a new image. Pretty cool, no? The same is true of any Charts or Pivot Tables in your workbook - those can be embedded as images too and if the underlying data changes, boom, the image in your web page changes too. There is a lot of data in the workbook so the image returned by that previous URI is too large to show here so instead let’s take a look at a different resource, this time a range: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Ranges('Data!A1|C15') That URI returns cells A1 to C15 from a worksheet called “Data”: And if we ask for that as an image again: http://server/_vti_bin/ExcelRest.aspx/Documents/TourismExpenditureInMillionsOfUSD.xlsx/model/Ranges('Data!A1|C15')?$format=image Were this image resource not behind a username/password then this would be a live image of the data in the workbook as opposed to one that I had to copy and upload elsewhere. Nonetheless I hope this little wrinkle doesn't detract from the inate value of what I am trying to articulate here; that an existing image in a web page can be changed on-the-fly simply by inserting some data into an Excel workbook. I for one think that that is very cool indeed! I think that's enough in the way of demo for now as this shows what is possible using Excel Services' REST API. Of course, not all features work quite how I would like and here is a bulleted list of some of my more negative feedback: The URIs are pig-ugly. Are "_vti_bin" & "ExcelRest.aspx" really necessary as part of the URI? Would this not be better: http://server/Documents/TourismExpenditureInMillionsOfUSD.xlsx/Model/Tables(‘Table1’) That URI provides the necessary addressability and is a lot easier to remember. Discoverability of these resources is not easy, we essentially have to handcrank a URI ourselves. Take the example of embedding a chart into a blog post - would it not be better if I could browse first through the document library to an Excel workbook and THEN through the workbook to the chart/range/table that I am interested in? Call it a wizard if you like. That would be really cool and would, I am sure, promote this feature and cut down on the copy-and-paste disease that the REST API is meant to alleviate. The resources that I demonstrated can be returned as feeds as well as images or HTML simply by changing the format parameter to ?$format=atom however for some inexplicable reason they don't return OData and no-one on the Excel Services team can tell me why (believe me, I have asked). $format is an OData parameter however other useful parameters such as $top and $filter are not supported. It would be nice if they were. Although I haven't demonstrated it here Excel Services' REST API does provide a makeshift way of altering the data by changing the value of specific cells however what it does not allow you to do is add new data into the workbook. Google Docs allows this and was one of the motivating factors for Chris Webb's forum post that I linked to above. None of this works for Excel workbooks hosted on SkyDrive This blog post is as long as it needs to be for a short introduction so I'll stop now. If you want to know more than I recommend checking out a few links: Excel Services REST API documentation on MSDNSo what does REST on Excel Services look like??? by Shahar PrishExcel Services in SharePoint 2010 REST API Syntax by Christian Stich. Any thoughts? Let's hear them in the comments section below! @Jamiet 

    Read the article

  • Export to Excel from ASP.NET produces files unreadable with Excel Viewer

    - by MainMa
    Hi, I want to output some dynamic data from an ASP.NET website to Excel. I found that the easiest way which does not require to use Excel XML or to install Excel on server machine is to output data as a table and specify application/vnd.ms-excel type. The problem is when I do so and try to open the file in Excel Viewer, I receive the following error message: Microsoft Excel Viewer cannot open files of this type. Then nothing is opened. Even an easiest code such as: protected void Page_Load(object sender, EventArgs e) { Response.Clear(); Response.Buffer = true; HttpContext.Current.Response.AddHeader("content-disposition", "attachment; filename=Example.xls"); Response.ContentType = "application/vnd.ms-excel"; StringWriter stringWriter = new System.IO.StringWriter(); HtmlTextWriter htmlTextWriter = new System.Web.UI.HtmlTextWriter(stringWriter); dataGrid.RenderControl(htmlTextWriter); Response.Write("<html><body><table><tr><td>A</td><td>B</td></tr><tr><td>C</td><td>D</td></tr></table></body></html>"); Response.End(); } produces the same error. What can cause such behavior? Is it because the Viewer cannot read such files, but full Excel version can?

    Read the article

  • Can JSON be made easily and safely editable by the non-technical Excel crowd?

    - by glitch
    I'm looking for a data storage format that's very intuitive and easy to edit. It should be ideally targeted towards the same crowd as Excel. At the same time I would like the data structure to be a tree. Ideally this would be JSON, since it offers both the tree aspect and allows for more interesting constructs like arrays. That and parsing libraries for JSON are ubiquitous, so I don't have to reinvent the wheel. The problem is that, at least with a non-specialized text editor, JSON is a giant pain to edit for a non-technical user. I'm thinking along the lines of someone who might have used Excel in the past, but never a real text editor. Someone who might not be comfortable with the idea of preserving JSON syntax by hand. Are there data formats out there that would fit this profile? I'd very much prefer this to be a JSON actually, but then it would require a solid editing tool that would hide the underlying implementation from the user. Think Excel and how it abstracts CSV syntax from the user. The reason I'm looking for something like this is because the team has been working with pretty hierarchical data for a while now and we've hit the limits of how easy it is to represent in simple CSVs without having to create complex rules for how represent hierarchy semantics from each row. Any suggestions?

    Read the article

  • Big Data – Is Big Data Relevant to me? – Big Data Questionnaires – Guest Post by Vinod Kumar

    - by Pinal Dave
    This guest post is by Vinod Kumar. Vinod Kumar has worked with SQL Server extensively since joining the industry over a decade ago. Working on various versions of SQL Server 7.0, Oracle 7.3 and other database technologies – he now works with the Microsoft Technology Center (MTC) as a Technology Architect. Let us read the blog post in Vinod’s own voice. I think the series from Pinal is a good one for anyone planning to start on Big Data journey from the basics. In my daily customer interactions this buzz of “Big Data” always comes up, I react generally saying – “Sir, do you really have a ‘Big Data’ problem or do you have a big Data problem?” Generally, there is a silence in the air when I ask this question. Data is everywhere in organizations – be it big data, small data, all data and for few it is bad data which is same as no data :). Wow, don’t discount me as someone who opposes “Big Data”, I am a big supporter as much as I am a critic of the abuse of this term by the people. In this post, I wanted to let my mind flow so that you can also think in the direction I want you to see these concepts. In any case, this is not an exhaustive dump of what is in my mind – but you will surely get the drift how I am going to question Big Data terms from customers!!! Is Big Data Relevant to me? Many of my customers talk to me like blank whiteboard with no idea – “why Big Data”. They want to jump into the bandwagon of technology and they want to decipher insights from their unexplored data a.k.a. unstructured data with structured data. So what are these industry scenario’s that come to mind? Here are some of them: Financials Fraud detection: Banks and Credit cards are monitoring your spending habits on real-time basis. Customer Segmentation: applies in every industry from Banking to Retail to Aviation to Utility and others where they deal with end customer who consume their products and services. Customer Sentiment Analysis: Responding to negative brand perception on social or amplify the positive perception. Sales and Marketing Campaign: Understand the impact and get closer to customer delight. Call Center Analysis: attempt to take unstructured voice recordings and analyze them for content and sentiment. Medical Reduce Re-admissions: How to build a proactive follow-up engagements with patients. Patient Monitoring: How to track Inpatient, Out-Patient, Emergency Visits, Intensive Care Units etc. Preventive Care: Disease identification and Risk stratification is a very crucial business function for medical. Claims fraud detection: There is no precise dollars that one can put here, but this is a big thing for the medical field. Retail Customer Sentiment Analysis, Customer Care Centers, Campaign Management. Supply Chain Analysis: Every sensors and RFID data can be tracked for warehouse space optimization. Location based marketing: Based on where a check-in happens retail stores can be optimize their marketing. Telecom Price optimization and Plans, Finding Customer churn, Customer loyalty programs Call Detail Record (CDR) Analysis, Network optimizations, User Location analysis Customer Behavior Analysis Insurance Fraud Detection & Analysis, Pricing based on customer Sentiment Analysis, Loyalty Management Agents Analysis, Customer Value Management This list can go on to other areas like Utility, Manufacturing, Travel, ITES etc. So as you can see, there are obviously interesting use cases for each of these industry verticals. These are just representative list. Where to start? A lot of times I try to quiz customers on a number of dimensions before starting a Big Data conversation. Are you getting the data you need the way you want it and in a timely manner? Can you get in and analyze the data you need? How quickly is IT to respond to your BI Requests? How easily can you get at the data that you need to run your business/department/project? How are you currently measuring your business? Can you get the data you need to react WITHIN THE QUARTER to impact behaviors to meet your numbers or is it always “rear-view mirror?” How are you measuring: The Brand Customer Sentiment Your Competition Your Pricing Your performance Supply Chain Efficiencies Predictive product / service positioning What are your key challenges of driving collaboration across your global business?  What the challenges in innovation? What challenges are you facing in getting more information out of your data? Note: Garbage-in is Garbage-out. Hold good for all reporting / analytics requirements Big Data POCs? A number of customers get into the realm of setting a small team to work on Big Data – well it is a great start from an understanding point of view, but I tend to ask a number of other questions to such customers. Some of these common questions are: To what degree is your advanced analytics (natural language processing, sentiment analysis, predictive analytics and classification) paired with your Big Data’s efforts? Do you have dedicated resources exploring the possibilities of advanced analytics in Big Data for your business line? Do you plan to employ machine learning technology while doing Advanced Analytics? How is Social Media being monitored in your organization? What is your ability to scale in terms of storage and processing power? Do you have a system in place to sort incoming data in near real time by potential value, data quality, and use frequency? Do you use event-driven architecture to manage incoming data? Do you have specialized data services that can accommodate different formats, security, and the management requirements of multiple data sources? Is your organization currently using or considering in-memory analytics? To what degree are you able to correlate data from your Big Data infrastructure with that from your enterprise data warehouse? Have you extended the role of Data Stewards to include ownership of big data components? Do you prioritize data quality based on the source system (that is Facebook/Twitter data has lower quality thresholds than radio frequency identification (RFID) for a tracking system)? Do your retention policies consider the different legal responsibilities for storing Big Data for a specific amount of time? Do Data Scientists work in close collaboration with Data Stewards to ensure data quality? How is access to attributes of Big Data being given out in the organization? Are roles related to Big Data (Advanced Analyst, Data Scientist) clearly defined? How involved is risk management in the Big Data governance process? Is there a set of documented policies regarding Big Data governance? Is there an enforcement mechanism or approach to ensure that policies are followed? Who is the key sponsor for your Big Data governance program? (The CIO is best) Do you have defined policies surrounding the use of social media data for potential employees and customers, as well as the use of customer Geo-location data? How accessible are complex analytic routines to your user base? What is the level of involvement with outside vendors and third parties in regard to the planning and execution of Big Data projects? What programming technologies are utilized by your data warehouse/BI staff when working with Big Data? These are some of the important questions I ask each customer who is actively evaluating Big Data trends for their organizations. These questions give you a sense of direction where to start, what to use, how to secure, how to analyze and more. Sign off Any Big data is analysis is incomplete without a compelling story. The best way to understand this is to watch Hans Rosling – Gapminder (2:17 to 6:06) videos about the third world myths. Don’t get overwhelmed with the Big Data buzz word, the destination to what your data speaks is important. In this blog post, we did not particularly look at any Big Data technologies. This is a set of questionnaire one needs to keep in mind as they embark their journey of Big Data. I did write some of the basics in my blog: Big Data – Big Hype yet Big Opportunity. Do let me know if these questions make sense?  Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • EXcel VBA : Excel Macro to create table in a PowerPoint

    - by Balaji.N.S
    Hi friends, My requirement is I have a Excel which contains some data. I would like to select some data from the excel and open a PowerPoint file and Create Table in PowerPoint and populate the data in to it Right now I have succeeded in collecting the data from excel opening a PowerPoint file through Excel VBA Code. Code for Opening the PowerPoint from Excel. Set objPPT = CreateObject("Powerpoint.application") objPPT.Visible = True Dim file As String file = "C:\Heavyhitters_new.ppt" Set pptApp = CreateObject("PowerPoint.Application") Set pptPres = pptApp.Presentations.Open(file) Now how do I create the table in PowerPoint from Excel and populate the data. Timely help will be very much appreciated. Thanks in advance,

    Read the article

  • Create a linear trend line in Excel graphs with logarithmic scale

    - by Redsoft7
    I I have an Excel scatter chart with x and y values. I set the logarithmic scale in x-axis and y-axis. When I add a linear trend line to the graph, the line is not linear but appears like a curve. How can I make a linear trend line on a logarithmic-scaled chart? Sample data: x: 18449 22829 25395 36869 101419 125498 208144 2001508 14359478 17301785 y: 269,09 273,89 239,50 239,50 175,13 176,73 151,94 135,15 131,55 121,55

    Read the article

  • Big Data – Basics of Big Data Architecture – Day 4 of 21

    - by Pinal Dave
    In yesterday’s blog post we understood how Big Data evolution happened. Today we will understand basics of the Big Data Architecture. Big Data Cycle Just like every other database related applications, bit data project have its development cycle. Though three Vs (link) for sure plays an important role in deciding the architecture of the Big Data projects. Just like every other project Big Data project also goes to similar phases of the data capturing, transforming, integrating, analyzing and building actionable reporting on the top of  the data. While the process looks almost same but due to the nature of the data the architecture is often totally different. Here are few of the question which everyone should ask before going ahead with Big Data architecture. Questions to Ask How big is your total database? What is your requirement of the reporting in terms of time – real time, semi real time or at frequent interval? How important is the data availability and what is the plan for disaster recovery? What are the plans for network and physical security of the data? What platform will be the driving force behind data and what are different service level agreements for the infrastructure? This are just basic questions but based on your application and business need you should come up with the custom list of the question to ask. As I mentioned earlier this question may look quite simple but the answer will not be simple. When we are talking about Big Data implementation there are many other important aspects which we have to consider when we decide to go for the architecture. Building Blocks of Big Data Architecture It is absolutely impossible to discuss and nail down the most optimal architecture for any Big Data Solution in a single blog post, however, we can discuss the basic building blocks of big data architecture. Here is the image which I have built to explain how the building blocks of the Big Data architecture works. Above image gives good overview of how in Big Data Architecture various components are associated with each other. In Big Data various different data sources are part of the architecture hence extract, transform and integration are one of the most essential layers of the architecture. Most of the data is stored in relational as well as non relational data marts and data warehousing solutions. As per the business need various data are processed as well converted to proper reports and visualizations for end users. Just like software the hardware is almost the most important part of the Big Data Architecture. In the big data architecture hardware infrastructure is extremely important and failure over instances as well as redundant physical infrastructure is usually implemented. NoSQL in Data Management NoSQL is a very famous buzz word and it really means Not Relational SQL or Not Only SQL. This is because in Big Data Architecture the data is in any format. It can be unstructured, relational or in any other format or from any other data source. To bring all the data together relational technology is not enough, hence new tools, architecture and other algorithms are invented which takes care of all the kind of data. This is collectively called NoSQL. Tomorrow Next four days we will answer the Buzz Words – Hadoop. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Big Data – Evolution of Big Data – Day 3 of 21

    - by Pinal Dave
    In yesterday’s blog post we answered what is the Big Data. Today we will understand why and how the evolution of Big Data has happened. Though the answer is very simple, I would like to tell it in the form of a history lesson. Data in Flat File In earlier days data was stored in the flat file and there was no structure in the flat file.  If any data has to be retrieved from the flat file it was a project by itself. There was no possibility of retrieving the data efficiently and data integrity has been just a term discussed without any modeling or structure around. Database residing in the flat file had more issues than we would like to discuss in today’s world. It was more like a nightmare when there was any data processing involved in the application. Though, applications developed at that time were also not that advanced the need of the data was always there and there was always need of proper data management. Edgar F Codd and 12 Rules Edgar Frank Codd was a British computer scientist who, while working for IBM, invented the relational model for database management, the theoretical basis for relational databases. He presented 12 rules for the Relational Database and suddenly the chaotic world of the database seems to see discipline in the rules. Relational Database was a promising land for all the unstructured database users. Relational Database brought into the relationship between data as well improved the performance of the data retrieval. Database world had immediately seen a major transformation and every single vendors and database users suddenly started to adopt the relational database models. Relational Database Management Systems Since Edgar F Codd proposed 12 rules for the RBDMS there were many different vendors who started them to build applications and tools to support the relationship between database. This was indeed a learning curve for many of the developer who had never worked before with the modeling of the database. However, as time passed by pretty much everybody accepted the relationship of the database and started to evolve product which performs its best with the boundaries of the RDBMS concepts. This was the best era for the databases and it gave the world extreme experts as well as some of the best products. The Entity Relationship model was also evolved at the same time. In software engineering, an Entity–relationship model (ER model) is a data model for describing a database in an abstract way. Enormous Data Growth Well, everything was going fine with the RDBMS in the database world. As there were no major challenges the adoption of the RDBMS applications and tools was pretty much universal. There was a race at times to make the developer’s life much easier with the RDBMS management tools. Due to the extreme popularity and easy to use system pretty much every data was stored in the RDBMS system. New age applications were built and social media took the world by the storm. Every organizations was feeling pressure to provide the best experience for their users based the data they had with them. While this was all going on at the same time data was growing pretty much every organization and application. Data Warehousing The enormous data growth now presented a big challenge for the organizations who wanted to build intelligent systems based on the data and provide near real time superior user experience to their customers. Various organizations immediately start building data warehousing solutions where the data was stored and processed. The trend of the business intelligence becomes the need of everyday. Data was received from the transaction system and overnight was processed to build intelligent reports from it. Though this is a great solution it has its own set of challenges. The relational database model and data warehousing concepts are all built with keeping traditional relational database modeling in the mind and it still has many challenges when unstructured data was present. Interesting Challenge Every organization had expertise to manage structured data but the world had already changed to unstructured data. There was intelligence in the videos, photos, SMS, text, social media messages and various other data sources. All of these needed to now bring to a single platform and build a uniform system which does what businesses need. The way we do business has also been changed. There was a time when user only got the features what technology supported, however, now users ask for the feature and technology is built to support the same. The need of the real time intelligence from the fast paced data flow is now becoming a necessity. Large amount (Volume) of difference (Variety) of high speed data (Velocity) is the properties of the data. The traditional database system has limits to resolve the challenges this new kind of the data presents. Hence the need of the Big Data Science. We need innovation in how we handle and manage data. We need creative ways to capture data and present to users. Big Data is Reality! Tomorrow In tomorrow’s blog post we will try to answer discuss Basics of Big Data Architecture. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Working with Temporal Data in SQL Server

    - by Dejan Sarka
    My third Pluralsight course, Working with Temporal Data in SQL Server, is published. I am really proud on the second part of the course, where I discuss optimization of temporal queries. This was a nearly impossible task for decades. First solutions appeared only lately. I present all together six solutions (and one more that is not a solution), and I invented four of them. http://pluralsight.com/training/Courses/TableOfContents/working-with-temporal-data-sql-server

    Read the article

  • How to search for newline or linebreak characters in Excel?

    - by Highly Irregular
    I've imported some data into Excel (from a text file) and it contains some sort of newline characters. It looks like this initially: If I hit F2 (to edit) then Enter (to save changes) on each of the cells with a newline (without actually editing anything), Excel automatically changes the layout to look like this: I don't want these newlines characters here, as it messes up data processing further down the track. How can I do a search for these to detect more of them? The usual search function doesn't accept an enter character as a search character.

    Read the article

  • How To - Guide to Importing Data from a MySQL Database to Excel using MySQL for Excel

    - by Javier Treviño
    Fetching data from a database to then get it into an Excel spreadsheet to do analysis, reporting, transforming, sharing, etc. is a very common task among users. There are several ways to extract data from a MySQL database to then import it to Excel; for example you can use the MySQL Connector/ODBC to configure an ODBC connection to a MySQL database, then in Excel use the Data Connection Wizard to select the database and table from which you want to extract data from, then specify what worksheet you want to put the data into.  Another way is to somehow dump a comma delimited text file with the data from a MySQL table (using the MySQL Command Line Client, MySQL Workbench, etc.) to then in Excel open the file using the Text Import Wizard to attempt to correctly split the data in columns. These methods are fine, but involve some degree of technical knowledge to make the magic happen and involve repeating several steps each time data needs to be imported from a MySQL table to an Excel spreadsheet. So, can this be done in an easier and faster way? With MySQL for Excel you can. MySQL for Excel features an Import MySQL Data action where you can import data from a MySQL Table, View or Stored Procedure literally with a few clicks within Excel.  Following is a quick guide describing how to import data using MySQL for Excel. This guide assumes you already have a working MySQL Server instance, Microsoft Office Excel 2007 or 2010 and MySQL for Excel installed. 1. Opening MySQL for Excel Being an Excel Add-In, MySQL for Excel is opened from within Excel, so to use it open Excel, go to the Data tab located in the Ribbon and click MySQL for Excel at the far right of the Ribbon. 2. Creating a MySQL Connection (may be optional) If you have MySQL Workbench installed you will automatically see the same connections that you can see in MySQL Workbench, so you can use any of those and there may be no need to create a new connection. If you want to create a new connection (which normally you will do only once), in the Welcome Panel click New Connection, which opens the Setup New Connection dialog. Here you only need to give your new connection a distinctive Connection Name, specify the Hostname (or IP address) where the MySQL Server instance is running on (if different than localhost), the Port to connect to and the Username for the login. If you wish to test if your setup is good to go, click Test Connection and an information dialog will pop-up stating if the connection is successful or errors were found. 3.Opening a connection to a MySQL Server To open a pre-configured connection to a MySQL Server you just need to double-click it, so the Connection Password dialog is displayed where you enter the password for the login. 4. Selecting a MySQL Schema After opening a connection to a MySQL Server, the Schema Selection Panel is shown, where you can select the Schema that contains the Tables, Views and Stored Procedures you want to work with. To do so, you just need to either double-click the desired Schema or select it and click Next >. 5. Importing data… All previous steps were really the basic minimum needed to drill-down to the DB Object Selection Panel  where you can see the Database Objects (grouped by type: Tables, Views and Procedures in that order) that you want to perform actions against; in the case of this guide, the action of importing data from them. a. From a MySQL Table To import from a Table you just need to select it from the list of Database Objects’ Tables group, after selecting it you will note actions below the list become available; then click Import MySQL Data. The Import Data dialog is displayed; you can see some basic information here like the name of the Excel worksheet the data will be imported to (in the window title), the Table Name, the total Row Count and a 10 row preview of the data meant for the user to see the columns that the table contains and to provide a way to select which columns to import. The Import Data dialog is designed with defaults in place so all data is imported (all rows and all columns) by just clicking Import; this is important to minimize the number of clicks needed to get the job done. After the import is performed you will have the data in the Excel worksheet formatted automatically. If you need to override the defaults in the Import Data dialog to change the columns selected for import or to change the number of imported rows you can easily do so before clicking Import. In the screenshot below the defaults are overridden to import only the first 3 columns and rows 10 – 60 (Limit to 50 Rows and Start with Row 10). If the number of rows to be imported exceeds the maximum number of rows Excel can hold in its worksheet, a warning will be displayed in the dialog, meaning the imported number of rows will be limited by that maximum number (65,535 rows if the worksheet is in Compatibility Mode).  In the screenshot below you can see the Table contains 80,559 rows, but only 65,534 rows will be imported since the first row is used for the column names if the Include Column Names as Headers checkbox is checked. b. From a MySQL View Similar to the way of importing from a Table, to import from a View you just need to select it from the list of Database Objects’ Views group, then click Import MySQL Data. The Import Data dialog is displayed; identically to the way everything looks when importing from a table, the dialog displays the View Name, the total Row Count and the data preview grid. Since Views are really a filtered way to display data from Tables, it is actually as if we are extracting data from a Table; so the Import Data dialog is actually identical for those 2 Database Objects. After the import is performed, the data in the Excel spreadsheet looks like the following screenshot. Note that you can override the defaults in the Import Data dialog in the same way described above for importing data from Tables. Also the Compatibility Mode warning will be displayed if data exceeds the maximum number of rows explained before. c. From a MySQL Procedure Too import from a Procedure you just need to select it from the list of Database Objects’ Procedures group (note you can see Procedures here but not Functions since these return a single value, so by design they are filtered out). After the selection is made, click Import MySQL Data. The Import Data dialog is displayed, but this time you can see it looks different to the one used for Tables and Views.  Given the nature of Store Procedures, they require first that values are supplied for its Parameters and also Procedures can return multiple Result Sets; so the Import Data dialog shows the Procedure Name and the Procedure Parameters in a grid where their values are input. After you supply the Parameter Values click Call. After calling the Procedure, the Result Sets returned by it are displayed at the bottom of the dialog; output parameters and the return value of the Procedure are appended as the last Result Set of the group. You can see each Result Set is displayed as a tab so you can see a preview of the returned data.  You can specify if you want to import the Selected Result Set (default), All Result Sets – Arranged Horizontally or All Result Sets – Arranged Vertically using the Import drop-down list; then click Import. After the import is performed, the data in the Excel spreadsheet looks like the following screenshot.  Note in this example all Result Sets were imported and arranged vertically. As you can see using MySQL for Excel importing data from a MySQL database becomes an easy task that requires very little technical knowledge, so it can be done by any type of user. Hope you enjoyed this guide! Remember that your feedback is very important for us, so drop us a message: MySQL on Windows (this) Blog - https://blogs.oracle.com/MySqlOnWindows/ Forum - http://forums.mysql.com/list.php?172 Facebook - http://www.facebook.com/mysql Cheers!

    Read the article

  • New Communications Industry Data Model with "Factory Installed" Predictive Analytics using Oracle Da

    - by charlie.berger
    Oracle Introduces Oracle Communications Data Model to Provide Actionable Insight for Communications Service Providers   We've integrated pre-installed analytical methodologies with the new Oracle Communications Data Model to deliver automated, simple, yet powerful predictive analytics solutions for customers.  Churn, sentiment analysis, identifying customer segments - all things that can be anticipated and hence, preconcieved and implemented inside an applications.  Read on for more information! TM Forum Management World, Nice, France - 18 May 2010 News Facts To help communications service providers (CSPs) manage and analyze rapidly growing data volumes cost effectively, Oracle today introduced the Oracle Communications Data Model. With the Oracle Communications Data Model, CSPs can achieve rapid time to value by quickly implementing a standards-based enterprise data warehouse that features communications industry-specific reporting, analytics and data mining. The combination of the Oracle Communications Data Model, Oracle Exadata and the Oracle Business Intelligence (BI) Foundation represents the most comprehensive data warehouse and BI solution for the communications industry. Also announced today, Hong Kong Broadband Network enhanced their data warehouse system, going live on Oracle Communications Data Model in three months. The leading provider increased its subscriber base by 37 percent in six months and reduced customer churn to less than one percent. Product Details Oracle Communications Data Model provides industry-specific schema and embedded analytics that address key areas such as customer management, marketing segmentation, product development and network health. CSPs can efficiently capture and monitor critical data and transform it into actionable information to support development and delivery of next-generation services using: More than 1,300 industry-specific measurements and key performance indicators (KPIs) such as network reliability statistics, provisioning metrics and customer churn propensity. Embedded OLAP cubes for extremely fast dimensional analysis of business information. Embedded data mining models for sophisticated trending and predictive analysis. Support for multiple lines of business, such as cable, mobile, wireline and Internet, which can be easily extended to support future requirements. With Oracle Communications Data Model, CSPs can jump start the implementation of a communications data warehouse in line with communications-industry standards including the TM Forum Information Framework (SID), formerly known as the Shared Information Model. Oracle Communications Data Model is optimized for any Oracle Database 11g platform, including Oracle Exadata, which can improve call data record query performance by 10x or more. Supporting Quotes "Oracle Communications Data Model covers a wide range of business areas that are relevant to modern communications service providers and is a comprehensive solution - with its data model and pre-packaged templates including BI dashboards, KPIs, OLAP cubes and mining models. It helps us save a great deal of time in building and implementing a customized data warehouse and enables us to leverage the advanced analytics quickly and more effectively," said Yasuki Hayashi, executive manager, NTT Comware Corporation. "Data volumes will only continue to grow as communications service providers expand next-generation networks, deploy new services and adopt new business models. They will increasingly need efficient, reliable data warehouses to capture key insights on data such as customer value, network value and churn probability. With the Oracle Communications Data Model, Oracle has demonstrated its commitment to meeting these needs by delivering data warehouse tools designed to fill communications industry-specific needs," said Elisabeth Rainge, program director, Network Software, IDC. "The TM Forum Conformance Mark provides reassurance to customers seeking standards-based, and therefore, cost-effective and flexible solutions. TM Forum is extremely pleased to work with Oracle to certify its Oracle Communications Data Model solution. Upon successful completion, this certification will represent the broadest and most complete implementation of the TM Forum Information Framework to date, with more than 130 aggregate business entities," said Keith Willetts, chairman and chief executive officer, TM Forum. Supporting Resources Oracle Communications Oracle Communications Data Model Data Sheet Oracle Communications Data Model Podcast Oracle Data Warehousing Oracle Communications on YouTube Oracle Communications on Delicious Oracle Communications on Facebook Oracle Communications on Twitter Oracle Communications on LinkedIn Oracle Database on Twitter The Data Warehouse Insider Blog

    Read the article

  • Big Data – How to become a Data Scientist and Learn Data Science? – Day 19 of 21

    - by Pinal Dave
    In yesterday’s blog post we learned the importance of the analytics in Big Data Story. In this article we will understand how to become a Data Scientist for Big Data Story. Data Scientist is a new buzz word, everyone seems to be wanting to become Data Scientist. Let us go over a few key topics related to Data Scientist in this blog post. First of all we will understand what is a Data Scientist. In the new world of Big Data, I see pretty much everyone wants to become Data Scientist and there are lots of people I have already met who claims that they are Data Scientist. When I ask what is their role, I have got a wide variety of answers. What is Data Scientist? Data scientists are the experts who understand various aspects of the business and know how to strategies data to achieve the business goals. They should have a solid foundation of various data algorithms, modeling and statistics methodology. What do Data Scientists do? Data scientists understand the data very well. They just go beyond the regular data algorithms and builds interesting trends from available data. They innovate and resurrect the entire new meaning from the existing data. They are artists in disguise of computer analyst. They look at the data traditionally as well as explore various new ways to look at the data. Data Scientists do not wait to build their solutions from existing data. They think creatively, they think before the data has entered into the system. Data Scientists are visionary experts who understands the business needs and plan ahead of the time, this tremendously help to build solutions at rapid speed. Besides being data expert, the major quality of Data Scientists is “curiosity”. They always wonder about what more they can get from their existing data and how to get maximum out of future incoming data. Data Scientists do wonders with the data, which goes beyond the job descriptions of Data Analysist or Business Analysist. Skills Required for Data Scientists Here are few of the skills a Data Scientist must have. Expert level skills with statistical tools like SAS, Excel, R etc. Understanding Mathematical Models Hands-on with Visualization Tools like Tableau, PowerPivots, D3. j’s etc. Analytical skills to understand business needs Communication skills On the technology front any Data Scientists should know underlying technologies like (Hadoop, Cloudera) as well as their entire ecosystem (programming language, analysis and visualization tools etc.) . Remember that for becoming a successful Data Scientist one require have par excellent skills, just having a degree in a relevant education field will not suffice. Final Note Data Scientists is indeed very exciting job profile. As per research there are not enough Data Scientists in the world to handle the current data explosion. In near future Data is going to expand exponentially, and the need of the Data Scientists will increase along with it. It is indeed the job one should focus if you like data and science of statistics. Courtesy: emc Tomorrow In tomorrow’s blog post we will discuss about various Big Data Learning resources. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Unstructured Data - The future of Data Administration

    Some have claimed that there is a problem with the way data is currently managed using the relational paradigm do to the rise of unstructured data in modern business. PCMag.com defines unstructured data as data that does not reside in a fixed location. They further explain that unstructured data refers to data in a free text form that is not bound to any specific structure. With the rise of unstructured data in the form of emails, spread sheets, images and documents the critics have a right to argue that the relational paradigm is not as effective as the object oriented data paradigm in managing this type of data. The relational paradigm relies heavily on structure and relationships in and between items of data. This type of paradigm works best in a relation database management system like Microsoft SQL, MySQL, and Oracle because data is forced to conform to a structure in the form of tables and relations can be derived from the existence of one or more tables. These critics also claim that database administrators have not kept up with reality because their primary focus in regards to data administration deals with structured data and the relational paradigm. The relational paradigm was developed in the 1970’s as a way to improve data management when compared to standard flat files. Little has changed since then, and modern database administrators need to know more than just how to handle structured data. That is why critics claim that today’s data professionals do not have the proper skills in order to store and maintain data for modern systems when compared to the skills of system designers, programmers , software engineers, and data designers  due to the industry trend of object oriented design and development. I think that they are wrong. I do not disagree that the industry is moving toward an object oriented approach to development with the potential to use more of an object oriented approach to data.   However, I think that it is business itself that is limiting database administrators from changing how data is stored because of the potential costs, and impact that might occur by altering any part of stored data. Furthermore, database administrators like all technology workers constantly are trying to improve their technical skills in order to excel in their job, so I think that accusing data professional is not just when the root cause of the lack of innovation is controlled by business, and it is business that will suffer for their inability to keep up with technology. One way for database professionals to better prepare for the future of database management is start working with data in the form of objects and so that they can extract data from the objects so that the stored information within objects can be used in relation to the data stored in a using the relational paradigm. Furthermore, I think the use of pattern matching will increase with the increased use of unstructured data because object can be selected, filtered and altered based on the existence of a pattern found within an object.

    Read the article

  • Working with PivotTables in Excel

    - by Mark Virtue
    PivotTables are one of the most powerful features of Microsoft Excel.  They allow large amounts of data to be analyzed and summarized in just a few mouse clicks. In this article, we explore PivotTables, understand what they are, and learn how to create and customize them. Note:  This article is written using Excel 2010 (Beta).  The concept of a PivotTable has changed little over the years, but the method of creating one has changed in nearly every iteration of Excel.  If you are using a version of Excel that is not 2010, expect different screens from the ones you see in this article. A Little History In the early days of spreadsheet programs, Lotus 1-2-3 ruled the roost.  Its dominance was so complete that people thought it was a waste of time for Microsoft to bother developing their own spreadsheet software (Excel) to compete with Lotus.  Flash-forward to 2010, and Excel’s dominance of the spreadsheet market is greater than Lotus’s ever was, while the number of users still running Lotus 1-2-3 is approaching zero.  How did this happen?  What caused such a dramatic reversal of fortunes? Industry analysts put it down to two factors:  Firstly, Lotus decided that this fancy new GUI platform called “Windows” was a passing fad that would never take off.  They declined to create a Windows version of Lotus 1-2-3 (for a few years, anyway), predicting that their DOS version of the software was all anyone would ever need.  Microsoft, naturally, developed Excel exclusively for Windows.  Secondly, Microsoft developed a feature for Excel that Lotus didn’t provide in 1-2-3, namely PivotTables.  The PivotTables feature, exclusive to Excel, was deemed so staggeringly useful that people were willing to learn an entire new software package (Excel) rather than stick with a program (1-2-3) that didn’t have it.  This one feature, along with the misjudgment of the success of Windows, was the death-knell for Lotus 1-2-3, and the beginning of the success of Microsoft Excel. Understanding PivotTables So what is a PivotTable, exactly? Put simply, a PivotTable is a summary of some data, created to allow easy analysis of said data.  But unlike a manually created summary, Excel PivotTables are interactive.  Once you have created one, you can easily change it if it doesn’t offer the exact insights into your data that you were hoping for.  In a couple of clicks the summary can be “pivoted” – rotated in such a way that the column headings become row headings, and vice versa.  There’s a lot more that can be done, too.  Rather than try to describe all the features of PivotTables, we’ll simply demonstrate them… The data that you analyze using a PivotTable can’t be just any data – it has to be raw data, previously unprocessed (unsummarized) – typically a list of some sort.  An example of this might be the list of sales transactions in a company for the past six months. Examine the data shown below: Notice that this is not raw data.  In fact, it is already a summary of some sort.  In cell B3 we can see $30,000, which apparently is the total of James Cook’s sales for the month of January.  So where is the raw data?  How did we arrive at the figure of $30,000?  Where is the original list of sales transactions that this figure was generated from?  It’s clear that somewhere, someone must have gone to the trouble of collating all of the sales transactions for the past six months into the summary we see above.  How long do you suppose this took?  An hour?  Ten?  Probably. If we were to track down the original list of sales transactions, it might look something like this: You may be surprised to learn that, using the PivotTable feature of Excel, we can create a monthly sales summary similar to the one above in a few seconds, with only a few mouse clicks.  We can do this – and a lot more too! How to Create a PivotTable First, ensure that you have some raw data in a worksheet in Excel.  A list of financial transactions is typical, but it can be a list of just about anything:  Employee contact details, your CD collection, or fuel consumption figures for your company’s fleet of cars. So we start Excel… …and we load such a list… Once we have the list open in Excel, we’re ready to start creating the PivotTable. Click on any one single cell within the list: Then, from the Insert tab, click the PivotTable icon: The Create PivotTable box appears, asking you two questions:  What data should your new PivotTable be based on, and where should it be created?  Because we already clicked on a cell within the list (in the step above), the entire list surrounding that cell is already selected for us ($A$1:$G$88 on the Payments sheet, in this example).  Note that we could select a list in any other region of any other worksheet, or even some external data source, such as an Access database table, or even a MS-SQL Server database table.  We also need to select whether we want our new PivotTable to be created on a new worksheet, or on an existing one.  In this example we will select a new one: The new worksheet is created for us, and a blank PivotTable is created on that worksheet: Another box also appears:  The PivotTable Field List.  This field list will be shown whenever we click on any cell within the PivotTable (above): The list of fields in the top part of the box is actually the collection of column headings from the original raw data worksheet.  The four blank boxes in the lower part of the screen allow us to choose the way we would like our PivotTable to summarize the raw data.  So far, there is nothing in those boxes, so the PivotTable is blank.  All we need to do is drag fields down from the list above and drop them in the lower boxes.  A PivotTable is then automatically created to match our instructions.  If we get it wrong, we only need to drag the fields back to where they came from and/or drag new fields down to replace them. The Values box is arguably the most important of the four.  The field that is dragged into this box represents the data that needs to be summarized in some way (by summing, averaging, finding the maximum, minimum, etc).  It is almost always numerical data.  A perfect candidate for this box in our sample data is the “Amount” field/column.  Let’s drag that field into the Values box: Notice that (a) the “Amount” field in the list of fields is now ticked, and “Sum of Amount” has been added to the Values box, indicating that the amount column has been summed. If we examine the PivotTable itself, we indeed find the sum of all the “Amount” values from the raw data worksheet: We’ve created our first PivotTable!  Handy, but not particularly impressive.  It’s likely that we need a little more insight into our data than that. Referring to our sample data, we need to identify one or more column headings that we could conceivably use to split this total.  For example, we may decide that we would like to see a summary of our data where we have a row heading for each of the different salespersons in our company, and a total for each.  To achieve this, all we need to do is to drag the “Salesperson” field into the Row Labels box: Now, finally, things start to get interesting!  Our PivotTable starts to take shape….   With a couple of clicks we have created a table that would have taken a long time to do manually. So what else can we do?  Well, in one sense our PivotTable is complete.  We’ve created a useful summary of our source data.  The important stuff is already learned!  For the rest of the article, we will examine some ways that more complex PivotTables can be created, and ways that those PivotTables can be customized. First, we can create a two-dimensional table.  Let’s do that by using “Payment Method” as a column heading.  Simply drag the “Payment Method” heading to the Column Labels box: Which looks like this: Starting to get very cool! Let’s make it a three-dimensional table.  What could such a table possibly look like?  Well, let’s see… Drag the “Package” column/heading to the Report Filter box: Notice where it ends up…. This allows us to filter our report based on which “holiday package” was being purchased.  For example, we can see the breakdown of salesperson vs payment method for all packages, or, with a couple of clicks, change it to show the same breakdown for the “Sunseekers” package: And so, if you think about it the right way, our PivotTable is now three-dimensional.  Let’s keep customizing… If it turns out, say, that we only want to see cheque and credit card transactions (i.e. no cash transactions), then we can deselect the “Cash” item from the column headings.  Click the drop-down arrow next to Column Labels, and untick “Cash”: Let’s see what that looks like…As you can see, “Cash” is gone. Formatting This is obviously a very powerful system, but so far the results look very plain and boring.  For a start, the numbers that we’re summing do not look like dollar amounts – just plain old numbers.  Let’s rectify that. A temptation might be to do what we’re used to doing in such circumstances and simply select the whole table (or the whole worksheet) and use the standard number formatting buttons on the toolbar to complete the formatting.  The problem with that approach is that if you ever change the structure of the PivotTable in the future (which is 99% likely), then those number formats will be lost.  We need a way that will make them (semi-)permanent. First, we locate the “Sum of Amount” entry in the Values box, and click on it.  A menu appears.  We select Value Field Settings… from the menu: The Value Field Settings box appears. Click the Number Format button, and the standard Format Cells box appears: From the Category list, select (say) Accounting, and drop the number of decimal places to 0.  Click OK a few times to get back to the PivotTable… As you can see, the numbers have been correctly formatted as dollar amounts. While we’re on the subject of formatting, let’s format the entire PivotTable.  There are a few ways to do this.  Let’s use a simple one… Click the PivotTable Tools/Design tab: Then drop down the arrow in the bottom-right of the PivotTable Styles list to see a vast collection of built-in styles: Choose any one that appeals, and look at the result in your PivotTable:   Other Options We can work with dates as well.  Now usually, there are many, many dates in a transaction list such as the one we started with.  But Excel provides the option to group data items together by day, week, month, year, etc.  Let’s see how this is done. First, let’s remove the “Payment Method” column from the Column Labels box (simply drag it back up to the field list), and replace it with the “Date Booked” column: As you can see, this makes our PivotTable instantly useless, giving us one column for each date that a transaction occurred on – a very wide table! To fix this, right-click on any date and select Group… from the context-menu: The grouping box appears.  We select Months and click OK: Voila!  A much more useful table: (Incidentally, this table is virtually identical to the one shown at the beginning of this article – the original sales summary that was created manually.) Another cool thing to be aware of is that you can have more than one set of row headings (or column headings): …which looks like this…. You can do a similar thing with column headings (or even report filters). Keeping things simple again, let’s see how to plot averaged values, rather than summed values. First, click on “Sum of Amount”, and select Value Field Settings… from the context-menu that appears: In the Summarize value field by list in the Value Field Settings box, select Average: While we’re here, let’s change the Custom Name, from “Average of Amount” to something a little more concise.  Type in something like “Avg”: Click OK, and see what it looks like.  Notice that all the values change from summed totals to averages, and the table title (top-left cell) has changed to “Avg”: If we like, we can even have sums, averages and counts (counts = how many sales there were) all on the same PivotTable! Here are the steps to get something like that in place (starting from a blank PivotTable): Drag “Salesperson” into the Column Labels Drag “Amount” field down into the Values box three times For the first “Amount” field, change its custom name to “Total” and it’s number format to Accounting (0 decimal places) For the second “Amount” field, change its custom name to “Average”, its function to Average and it’s number format to Accounting (0 decimal places) For the third “Amount” field, change its name to “Count” and its function to Count Drag the automatically created field from Column Labels to Row Labels Here’s what we end up with: Total, average and count on the same PivotTable! Conclusion There are many, many more features and options for PivotTables created by Microsoft Excel – far too many to list in an article like this.  To fully cover the potential of PivotTables, a small book (or a large website) would be required.  Brave and/or geeky readers can explore PivotTables further quite easily:  Simply right-click on just about everything, and see what options become available to you.  There are also the two ribbon-tabs: PivotTable Tools/Options and Design.  It doesn’t matter if you make a mistake – it’s easy to delete the PivotTable and start again – a possibility old DOS users of Lotus 1-2-3 never had. We’ve included an Excel that should work with most versions of Excel, so you can download to practice your PivotTable skills. Download Our Practice Excel File Similar Articles Productive Geek Tips Magnify Selected Cells In Excel 2007Share Access Data with Excel in Office 2010Make Excel 2007 Print Gridlines In Workbook FileMake Excel 2007 Always Save in Excel 2003 FormatConvert Older Excel Documents to Excel 2007 Format TouchFreeze Alternative in AutoHotkey The Icy Undertow Desktop Windows Home Server – Backup to LAN The Clear & Clean Desktop Use This Bookmarklet to Easily Get Albums Use AutoHotkey to Assign a Hotkey to a Specific Window Latest Software Reviews Tinyhacker Random Tips Revo Uninstaller Pro Registry Mechanic 9 for Windows PC Tools Internet Security Suite 2010 PCmover Professional Ben & Jerry’s Free Cone Day, 3/23/10 New Stinger from McAfee Helps Remove ‘FakeAlert’ Threats Google Apps Marketplace: Tools & Services For Google Apps Users Get News Quick and Precise With Newser Scan for Viruses in Ubuntu using ClamAV Replace Your Windows Task Manager With System Explorer

    Read the article

  • excel rows,find if include,low and high

    - by Malin Pedersen
    Link to full-size image For what is marked in orange: As mentioned in the example in the picture it says "headphones". I would like it to search through all the lines in column A, to find something that has that name in it, then it should count the number of people, and come out with the number (in how many) the "middle price" I want it to take the price of B (depending on where it found it called headphones) and take the average price of it. In secured, as I would like it to count how many of them (from the number, or from the beginning) that have "secured" as "no" and "yes." I would like to use this on several things. For what is marked in pink: Where would I find the average price of all the goods, and what the name of the particular item is? Same with the highest and lowest price. How can I do this?

    Read the article

  • Excel INDIRECT function and conditional formatting - highlighting a row

    - by Ehryk
    I'm having an issue with conditional formatting using the INDIRECT function. I'm doing something similar to Using INDIRECT and AND/IF for conditional formatting , but the only answer there isn't working for me. Basically, I want to highlight rows where B is not blank and F is blank. INDIRECT will work for ONE of the conditions, but = AND(INDIRECT("B"&ROW()) > 0, INDIRECT("F"&ROW()) = "") does not work at all. The answer in the question points to replacing the references with relative ones, so I'm thinking this should work: = AND ($B2 > 0, $F2 = "") But it does not, nor does ISBLANK($F@) or ISEMPTY($F2) (the cell contains a formula that sometimes will return "", I want the row highlighted in these cases but only when something is in column B). Am I missing something about relative references? Why doesn't INDIRECT work with AND/OR?

    Read the article

  • Developing an analytics's system processing large amounts of data - where to start

    - by Ryan
    Imagine you're writing some sort of Web Analytics system - you're recording raw page hits along with some extra things like tagging cookies etc and then producing stats such as Which pages got most traffic over a time period Which referers sent most traffic Goals completed (goal being a view of a particular page) And more advanced things like which referers sent the most number of vistors who later hit a goal. The naieve way of approaching this would be to throw it in a relational database and run queries over it - but that won't scale. You could pre-calculate everything (have a queue of incoming 'hits' and use to update report tables) - but what if you later change a goal - how could you efficiently re-calculate just the data that would be effected. Obviously this has been done before ;) so any tips on where to start, methods & examples, architecture, technologies etc.

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >