Search Results

Search found 37647 results on 1506 pages for 'sql performance'.

Page 132/1506 | < Previous Page | 128 129 130 131 132 133 134 135 136 137 138 139  | Next Page >

  • SSIS - XML Source Script

    - by simonsabin
    The XML Source in SSIS is great if you have a 1 to 1 mapping between entity and table. You can do more complex mapping but it becomes very messy and won't perform. What other options do you have? The challenge with XML processing is to not need a huge amount of memory. I remember using the early versions of Biztalk with loaded the whole document into memory to map from one document type to another. This was fine for small documents but was an absolute killer for large documents. You therefore need a streaming approach. For flexibility however you want to be able to generate your rows easily, and if you've ever used the XmlReader you will know its ugly code to write. That brings me on to LINQ. The is an implementation of LINQ over XML which is really nice. You can write nice LINQ queries instead of the XMLReader stuff. The downside is that by default LINQ to XML requires a whole XML document to work with. No streaming. Your code would look like this. We create an XDocument and then enumerate over a set of annoymous types we generate from our LINQ statement XDocument x = XDocument.Load("C:\\TEMP\\CustomerOrders-Attribute.xml");   foreach (var xdata in (from customer in x.Elements("OrderInterface").Elements("Customer")                        from order in customer.Elements("Orders").Elements("Order")                        select new { Account = customer.Attribute("AccountNumber").Value                                   , OrderDate = order.Attribute("OrderDate").Value }                        )) {     Output0Buffer.AddRow();     Output0Buffer.AccountNumber = xdata.Account;     Output0Buffer.OrderDate = Convert.ToDateTime(xdata.OrderDate); } As I said the downside to this is that you are loading the whole document into memory. I did some googling and came across some helpful videos from a nice UK DPE Mike Taulty http://www.microsoft.com/uk/msdn/screencasts/screencast/289/LINQ-to-XML-Streaming-In-Large-Documents.aspx. Which show you how you can combine LINQ and the XmlReader to get a semi streaming approach. I took what he did and implemented it in SSIS. What I found odd was that when I ran it I got different numbers between using the streamed and non streamed versions. I found the cause was a little bug in Mikes code that causes the pointer in the XmlReader to progress past the start of the element and thus foreach (var xdata in (from customer in StreamReader("C:\\TEMP\\CustomerOrders-Attribute.xml","Customer")                                from order in customer.Elements("Orders").Elements("Order")                                select new { Account = customer.Attribute("AccountNumber").Value                                           , OrderDate = order.Attribute("OrderDate").Value }                                ))         {             Output0Buffer.AddRow();             Output0Buffer.AccountNumber = xdata.Account;             Output0Buffer.OrderDate = Convert.ToDateTime(xdata.OrderDate);         } These look very similiar and they are the key element is the method we are calling, StreamReader. This method is what gives us streaming, what it does is return a enumerable list of elements, because of the way that LINQ works this results in the data being streamed in. static IEnumerable<XElement> StreamReader(String filename, string elementName) {     using (XmlReader xr = XmlReader.Create(filename))     {         xr.MoveToContent();         while (xr.Read()) //Reads the first element         {             while (xr.NodeType == XmlNodeType.Element && xr.Name == elementName)             {                 XElement node = (XElement)XElement.ReadFrom(xr);                   yield return node;             }         }         xr.Close();     } } This code is specifically designed to return a list of the elements with a specific name. The first Read reads the root element and then the inner while loop checks to see if the current element is the type we want. If not we do the xr.Read() again until we find the element type we want. We then use the neat function XElement.ReadFrom to read an element and all its sub elements into an XElement. This is what is returned and can be consumed by the LINQ statement. Essentially once one element has been read we need to check if we are still on the same element type and name (the inner loop) This was Mikes mistake, if we called .Read again we would advance the XmlReader beyond the start of the Element and so the ReadFrom method wouldn't work. So with the code above you can use what ever LINQ statement you like to flatten your XML into the rowsets you want. You could even have multiple outputs and generate your own surrogate keys.        

    Read the article

  • Joins in LINQ to SQL

    - by rajbk
    The following post shows how to write different types of joins in LINQ to SQL. I am using the Northwind database and LINQ to SQL for these examples. NorthwindDataContext dataContext = new NorthwindDataContext(); Inner Join var q1 = from c in dataContext.Customers join o in dataContext.Orders on c.CustomerID equals o.CustomerID select new { c.CustomerID, c.ContactName, o.OrderID, o.OrderDate }; SELECT [t0].[CustomerID], [t0].[ContactName], [t1].[OrderID], [t1].[OrderDate]FROM [dbo].[Customers] AS [t0]INNER JOIN [dbo].[Orders] AS [t1] ON [t0].[CustomerID] = [t1].[CustomerID] Left Join var q2 = from c in dataContext.Customers join o in dataContext.Orders on c.CustomerID equals o.CustomerID into g from a in g.DefaultIfEmpty() select new { c.CustomerID, c.ContactName, a.OrderID, a.OrderDate }; SELECT [t0].[CustomerID], [t0].[ContactName], [t1].[OrderID] AS [OrderID], [t1].[OrderDate] AS [OrderDate]FROM [dbo].[Customers] AS [t0]LEFT OUTER JOIN [dbo].[Orders] AS [t1] ON [t0].[CustomerID] = [t1].[CustomerID] Inner Join on multiple //We mark our anonymous type properties as a and b otherwise//we get the compiler error "Type inferencce failed in the call to 'Join’var q3 = from c in dataContext.Customers join o in dataContext.Orders on new { a = c.CustomerID, b = c.Country } equals new { a = o.CustomerID, b = "USA" } select new { c.CustomerID, c.ContactName, o.OrderID, o.OrderDate }; SELECT [t0].[CustomerID], [t0].[ContactName], [t1].[OrderID], [t1].[OrderDate]FROM [dbo].[Customers] AS [t0]INNER JOIN [dbo].[Orders] AS [t1] ON ([t0].[CustomerID] = [t1].[CustomerID]) AND ([t0].[Country] = @p0) Inner Join on multiple with ‘OR’ clause var q4 = from c in dataContext.Customers from o in dataContext.Orders.Where(a => a.CustomerID == c.CustomerID || c.Country == "USA") select new { c.CustomerID, c.ContactName, o.OrderID, o.OrderDate }; SELECT [t0].[CustomerID], [t0].[ContactName], [t1].[OrderID], [t1].[OrderDate]FROM [dbo].[Customers] AS [t0], [dbo].[Orders] AS [t1]WHERE ([t1].[CustomerID] = [t0].[CustomerID]) OR ([t0].[Country] = @p0) Left Join on multiple with ‘OR’ clause var q5 = from c in dataContext.Customers from o in dataContext.Orders.Where(a => a.CustomerID == c.CustomerID || c.Country == "USA").DefaultIfEmpty() select new { c.CustomerID, c.ContactName, o.OrderID, o.OrderDate }; SELECT [t0].[CustomerID], [t0].[ContactName], [t1].[OrderID] AS [OrderID], [t1].[OrderDate] AS [OrderDate]FROM [dbo].[Customers] AS [t0]LEFT OUTER JOIN [dbo].[Orders] AS [t1] ON ([t1].[CustomerID] = [t0].[CustomerID]) OR ([t0].[Country] = @p0)

    Read the article

  • SQL Sharding and SQL Azure&hellip;

    - by Dave Noderer
    Herve Roggero has just published a paper that outlines patterns for scaling using SQL Azure and the Blue Syntax (he and Scott Klein’s company) sharding api. You can find the paper at: http://www.bluesyntax.net/files/EnzoFramework.pdf Herve and Scott have also just released an Apress book Pro SQL Azure. The idea of being able to split (shard) database operations automatically and control them from a web based management console is very appealing. These ideas have been talked about for a long time and implemented in thousands of very custom ways that have been costly, complicated and fragile. Now, there is light at the end of the tunnel. Scaling database access will become easier and move into the mainstream of application development. The main cost is using an api whenever accessing the database. The api will direct the query to the correct database(s) which may be located locally or in the cloud. It is inevitable that the api will change in the future, perhaps incorporated into a Microsoft offering. Even if this is the case, your application has now been architected to utilize these patterns and details of the actual api will be less important. Herve does a great job of laying out the concepts which every developer and architect should be familiar with!

    Read the article

  • Formatting Keywords to UPPERCASE In Oracle SQL Developer

    - by thatjeffsmith
    I received this question from a customer today, and it took me more than a few minutes to remember where this preference was located in SQL Developer. This tells me that the topic is ripe for blogging How do I go FROM: select * from scott.emp where ename like '%JEFF%' TO SELECT * FROM scott.emp WHERE ename LIKE '%JEFF%' It’s all in the formatting You need to access the formatting preferences under the Tools menu. It takes a bit of navigating to get there, so bear with me: Tools Database SQL Formatter Oracle Formatting Click ‘Edit’ on the profile Other Case change: ‘Keywords Uppercase’ It’s easy to find once you know where to look? You can tell it to leave the case alone, upper everything, upper only the keywords, lower everything. Accessing the Formatter Options We allow separate formatting options for different RDBMS. You need to make sure you’re accessing the ‘Oracle Formatting’ page in the preferences. You can then choose to edit the default options OR you can do what I have done – save the defaults as a new set of options. I’ve called my profile ‘JeffCustom.’ I can now switch back and forth now through different sets of formatting options. You need to hit the ‘Edit’ button to get to the formatting options editor. A good number of people seem to miss this. Select your profile, then hit the ‘Edit’ button

    Read the article

  • MDX Studio download #mdx #ssas

    - by Marco Russo (SQLBI)
    Short version: the latest available version of MDX Studio can be downloaded from http://www.sqlbi.com/tools/mdx-studio/ Long version: Last week Stacia Misner twitted that the online version of MDX Studio was no longer available. It was hosted on http://mdx.mosha.com. It was a sad news, and it is also not good that nobody is maintaining the desktop version of MDX Studio. The latest release is the 0.4.14 and as I am writing it is still available on a SkyDrive link provided by Mosha Pasumansky, who wrote MDX Studio. Mosha does not work in Microsoft now and the entire BI community hopes that somebody will continue its work on this product. Unfortunately, it cannot be published on CodePlex because of some IP restrictions. Only bad news? Well, I hope no. The first good news is that MDX Studio also works with Analysis Services 2012 in Multidimensional mode. The second news is that, after having checked that we can do that, we created a web page on SQLBI web site to download the latest available release of MDX Studio. I hope it will be necessary to update it in the future, by now it is just a way to simplify the finding and download of this precious tool, and to grant that it will not disappear in case the current SkyDrive using to host the download would be discontinued, like it happened to the MDX Studio online version. Now a question to the BI Community: I know that there was some content available regarding tutorial on MDX Studio. I’d like to gather it and to put all in a single place. If you have such content, please contact me directly writing to marco (dot) russo (at) sqlbi [dot] com. Thanks!

    Read the article

  • ESXi 5 network performance is slow

    - by R D
    We just did a fresh install of ESXi 5 on a host that was running ESX 4 before. Nothing has changed hardware wise. After the upgrade network performance is much slower. Even copying a big file from one VM to another VM within same virtual switch is slower compared to other hosts that are running ESX 4. Network cards are auto-negotiating at 1Gbps as were on ESX 4 prior to upgrade. All settings are default and I haven't played with Advanced Settings at all. Before opening a case with vmware, wanted to know if I am missing something or if others have experienced similar issues and found a fix?

    Read the article

  • Connecting to SQL database using SQLCMD

    - by kaleidoscope
    As we all know, there are a number of ways you can connect to your SQL Azure Database. One of the quick options is to try to connect to SQL server is SQLCMD. To start the SQLCMD utility and connect to a named instance of SQL Server Open a Command Prompt window, and type sqlcmd -S myServer\instanceName. Replace myServer\instanceName with the name of the computer and the instance of SQL Server that you want to connect to. Press ENTER. The sqlcmd prompt (1>) indicates that you are connected to the specified instance of SQL Server. SQL Management Studio offers the facility to use SQLCMD from within SQL scripts by using SQLCMD Mode. How to: Enable SQLCMD mode in the Transact-SQL Editor (About how to start the editor, see How to: Start the Transact-SQL Editor.) To toggle SQLCMD mode from the Data menu 1. Open the query in the Transact-SQL editor. 2. On the Data menu, point to Transact-SQL Editor, and click SQLCMD Mode. To toggle SQLCMD mode from the toolbar 1. Open the query in the Transact-SQL editor. 2. On the Transact-SQL Editor toolbar, click SQLCMD Mode. To toggle SQLCMD mode from the shortcut menu 1. Open the query in the Transact-SQL editor. 2. Right-click anywhere in the editor window, and then click SQLCMD Mode. For more information follow below link http://msdn.microsoft.com/en-us/library/ms170207.aspx   Geeta, G

    Read the article

  • Upgrade to 2008 R2

    - by DavidWimbush
    I don't like it, Carruthers. It's just too quiet. Well, I've done the pre-production server, the main live server and the Reporting/BI server with remarkably little trouble. Pre-production and live were rebuilds. I failed live over to our log shipping standby for the duration, which has a gotcha I blogged about before. When I failed back to the primary live server again, it was very quick to bring the databases online. I understand the databases don't actually get upgraded until you recover them but there was no noticable delay. It's gone from 2005 Workgroup - limited to 4GB of memory - to 2008 R2 Standard so it can now use nearly all of the 30GB in the server. It's soo much faster. The reporting/BI server I upgraded in situ. This took a while but, again, went smoothly. Just watch out, because the master database was left at compatibility level 90. Also the upgrade decided to use the reporting service's credentials for database access when running reports. It didn't preserve the existing credentials and I had to go into the Reporting Configuration Manager to put them back in. Make sure you know what credentials your server is using before you upgrade. All things considered, a fairly painless experience. Now I just have to upgrade and reset our log shipping standby server again!

    Read the article

  • Edinburgh this Thurs (25th) - Rob Carrol talks about how to build a high performance, scalable repor

    - by tonyrogerson
    Scottish Area SQL Server User Group Meeting, Edinburgh - Thursday 25th March An evening of SQL Server 2008 Reporting Services Scalability and Performance with Rob Carrol, see how to build a high performance, scalable reporting platform and the tuning techniques required to ensure that report performance remains optimal as your platform grows. Pizza and drinks will be provided! Register at http://www.sqlserverfaq.com/events/221/SQL-Server-2008-Reporting-Services-Scalability-and-Performance.aspx...(read more)

    Read the article

  • How to open ports on modem for better torrent performance

    - by Mehper C. Palavuzlar
    I've been using utorrent to download and upload torrents for a long time. Recently someone told me that I need to open port(s) for utorrent from my modem settings for better downloading and uploading performance. Is it true? If yes, how can I do that? My utorrent version: 2.0 and the port used for incoming connections: 61829. My modem: Yaksu S200 ADSL router modem and I can reach its settings via web interface. I looked at the settings but they seem a bit complicated to me. Other info you may need to know: I have dynamic IP. I'm using Win7 x64.

    Read the article

  • Apache mpm-itk Performance

    - by Matt Beckman
    I manage a bunch of VPSs with memory ranging from 1GB to 8GB. Most of these websites are Joomla websites, and the servers must support multiple sites/users/S-FTP. I use mpm-itk almost exclusively (mostly due to it's convenience in these shared environments). However, I'm aware it isn't known for performance, so I need some advice on making it faster. Due to the lack of documentation when I first went the way of mpm-itk, I included only one setting in the config, and that was to limit each user to 50 clients (the rest I left up to defaults): <IfModule mpm_itk_module> MaxClientsVHost 50 </IfModule> Are there any better alternatives available? Are there any settings supported in mpm-prefork or mpm-worker that are also supported in mpm-itk? Thanks!

    Read the article

  • Performance Testing through distributed jmeter instances and bamboo

    - by user1617754
    I´m working on performance test for several services running in an Amazon network. Our architecture is: Continuous Integration server running in our facilities (Bamboo); A Jmeter server instance in the same network than the services to test; A Jmeter client connected to the JMeter server (ssh tunnels) in our facilities. I want to start the execution of tests from bamboo, and see the different results on it too. Bamboo with <---------> Jmeter server <--------> WebService Jmeter client on Amazon on Amazon Has anybody tried something like this?

    Read the article

  • Performance hit with new hard drive?

    - by aaaidan
    I've recently upgraded my laptop's internal hard drive from a 160GB to 1TB drive. I cloned the drive, then installed it. The general system performance seems appreciably slower. In particular application launches seem to take much longer. Is this possible, or am I just expecting too much from the new drive? It's running a Macbook Pro which is a couple of years old. Any ideas? 160 GB 7MB cache 5400 rpm NCQ (Hitachi HTS545016B9SA02) -- original drive 1 TB 8MB cache 5400 rpm SATA300 NCQ (Western Digital WD10TPVT-00HT5T0) Sisoftware links: Hitachi HTS545016B9SA02 Western Digital WD10TPVT-00HT5T0

    Read the article

  • Troubleshooting server performance with a hosted server?

    - by ProfessionalAmateur
    We are in a tough spot, we have a hosted server with the following specs: OS: Windows Server 2008 R2 Enterprise SP1 64bit Processor: Intel Xeon X7550 @ 2GHz (8 processors) RAM: 16GB The file system is on a SAN or NAS (not sure). We are seeing very odd issues where a user will open a 25MB .xslb file and it takes literally 60-120 seconds sometimes. The server is just dog slow for excel. Resources are not being pegged, CPU never jumps up, plenty of RAM... it's just oddly slow. Our host has been looking at the issue for several weeks with not much to show for it. Is there a utility I can run myself that will help trackdown our issue? I have found Server Performance Advisor V1.0 Any experience in using it? Our host is ultimately responsible for fixing this, but we are going on 1 month and our users are losing patience. Any tips would be helpful.

    Read the article

  • Improving terminal server performance for a specfic app

    - by Matt
    We have a windows 2003 terminal server running 2X application load balancign that is hosting a client's application that is accessed by around 50 users. Each user has there own database. The database is a file based database. The application is developed under Delphi so I think the database may be BDE based. As you can imagine, there is probably quite a lot of disk i/o. Here are some of the perfmon settings. Logged in users (average) 20 - 25 CPU Utilization (average) 80 - 100% Disk Queue Length (average) 1.6 % Disk time (average) 111 Page faults/sec (average) 1400 The application takes on average about a minute to load up. As usual, the budget is tight. Is there basic windows performance tuning tips that people can recommend to improve things before we fork out on more RAM etc. Server is a 2.8GHz Xeon with 3GB of RAM.

    Read the article

  • SQL Server Windows Auth Login sees Domain as untrusted...

    - by Mr Shoubs
    I've had someone set up a domain controller on windows 2008 on one server, and sql server 2008 on another. The domain seems to be working fine, I'm logged on as a domain user on both servers, nothing seems to be a problem there. However, when I try to add a domain user/group to SQL Server Security (e.g. clicking ok from the create login screen) it says it can't find it (even though I've used the search to find the correct account in the first place), when I try to logon (even though I haven't added it yet) it says something about the account being part of an untrusted domain instead of saying I don't have permission to log on. Anyone have any ideas on what is set up incorrectly?

    Read the article

  • Disable Memory Modules In BIOS for Testing Purposes (Optimize Nehalem/Gulftown Memory Performance)

    - by Bob
    I recently acquired an HP Z800 with two Intel Xeon X5650 (Gulftown) 6 core processors. The person that configured the system chose 16GB (8 x 2GB DDR3-1333). I'm assuming this person was unaware these processors have 3 memory channels and to optimize memory performance one should choose memory in multiples of three. Based on this information, I have a question: By entering the BIOS, can I disable the bank on each processor that has the single memory module? If so, will this have any adverse effects or behave differently than physically removing the modules? I ask due to the fact that I prefer to store the extra memory in the system if it truly behaves as if the memory is not even there. Also, I see this as an opportunity to test 12GB vs. 16GB to see if there is a noticeable difference. Note: According to http://www.delltechcenter.com/page/04-08-2009+-+Nehalem+and+Memory+Configurations?t=anon, the current configuration reduces the overall data transfer speed to 1066 and in addition, the memory bandwidth goes down by about 23%.

    Read the article

  • Connecting MS SQL using freetds and unixodbc: isql - no default driver specified

    - by Dejan
    I am trying to connect to the MS SQL database using freetds and unixodbc. I have read various guides how to do it, but no one works fine for me. When I try to connect to the database using isql tool, I get the following error: $ isql -v TS username password [IM002][unixODBC][Driver Manager]Data source name not found, and no default driver specified [ISQL]ERROR: Could not SQLConnect Have anybody already successfully established the connection to the MS SQL database using freetds and unixodbc on Ubuntu 12.04? I would really appreciate some help. Below is the procedure I used to configure the freetds and unixodbc. Thanks for your help in advance! Procedure First, I have installed the following packages sudo apt-get unixodbc unixodbc-dev freetds-dev tdsodbc and configured freetds as follows: --- /etc/freetds/freetds.conf --- [TS] host = SERVER port = 1433 tds version = 7.0 client charset = UTF-8 Using tsql tool I can successfully connect to the database by executing tsql -S TS -U username -P password As I need an odbc connection I configured odbcinst.ini as follows: --- /etc/odbcinst.ini --- [FreeTDS] Description = FreeTDS Driver = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so Setup = /usr/lib/x86_64-linux-gnu/odbc/libtdsS.so FileUsage = 1 CPTimeout = CPResuse = client charset = utf-8 and odbc.ini as follows: --- /etc/odbc.ini --- [TS] Description = "test" Driver = FreeTDS Servername = SERVER Server = SERVER Port = 1433 Database = DBNAME Trace = No Trying to connect to the database using isql tool with such a configuration results the following error: $ isql -v TS username password [IM002][unixODBC][Driver Manager]Data source name not found, and no default driver specified [ISQL]ERROR: Could not SQLConnect

    Read the article

  • SSMS Tools Pack now supports Denali CTP1

    - by AaronBertrand
    Earlier today, Mladen Prajdic ( blog | twitter ) released an updated version of his SSMS Tools Pack (v.1.9.4), a free add-in for Management Studio that provides a ton of helpful functionality that isn't available with the native tools. I'm really glad this happened, because I've installed Denali on all of my VMs and have been using it for most of my work, and I've been missing some of the little things the tool adds. In addition to adding Denali support, Mladen also fixed a handful of minor bugs...(read more)

    Read the article

  • The Data Scientist

    - by BuckWoody
    A new term - well, perhaps not that new - has come up and I’m actually very excited about it. The term is Data Scientist, and since it’s new, it’s fairly undefined. I’ll explain what I think it means, and why I’m excited about it. In general, I’ve found the term deals at its most basic with analyzing data. Of course, we all do that, and the term itself in that definition is redundant. There is no science that I know of that does not work with analyzing lots of data. But the term seems to refer to more than the common practices of looking at data visually, putting it in a spreadsheet or report, or even using simple coding to examine data sets. The term Data Scientist (as far as I can make out this early in it’s use) is someone who has a strong understanding of data sources, relevance (statistical and otherwise) and processing methods as well as front-end displays of large sets of complicated data. Some - but not all - Business Intelligence professionals have these skills. In other cases, senior developers, database architects or others fill these needs, but in my experience, many lack the strong mathematical skills needed to make these choices properly. I’ve divided the knowledge base for someone that would wear this title into three large segments. It remains to be seen if a given Data Scientist would be responsible for knowing all these areas or would specialize. There are pretty high requirements on the math side, specifically in graduate-degree level statistics, but in my experience a company will only have a few of these folks, so they are expected to know quite a bit in each of these areas. Persistence The first area is finding, cleaning and storing the data. In some cases, no cleaning is done prior to storage - it’s just identified and the cleansing is done in a later step. This area is where the professional would be able to tell if a particular data set should be stored in a Relational Database Management System (RDBMS), across a set of key/value pair storage (NoSQL) or in a file system like HDFS (part of the Hadoop landscape) or other methods. Or do you examine the stream of data without storing it in another system at all? This is an important decision - it’s a foundation choice that deals not only with a lot of expense of purchasing systems or even using Cloud Computing (PaaS, SaaS or IaaS) to source it, but also the skillsets and other resources needed to care and feed the system for a long time. The Data Scientist sets something into motion that will probably outlast his or her career at a company or organization. Often these choices are made by senior developers, database administrators or architects in a company. But sometimes each of these has a certain bias towards making a decision one way or another. The Data Scientist would examine these choices in light of the data itself, starting perhaps even before the business requirements are created. The business may not even be aware of all the strategic and tactical data sources that they have access to. Processing Once the decision is made to store the data, the next set of decisions are based around how to process the data. An RDBMS scales well to a certain level, and provides a high degree of ACID compliance as well as offering a well-known set-based language to work with this data. In other cases, scale should be spread among multiple nodes (as in the case of Hadoop landscapes or NoSQL offerings) or even across a Cloud provider like Windows Azure Table Storage. In fact, in many cases - most of the ones I’m dealing with lately - the data should be split among multiple types of processing environments. This is a newer idea. Many data professionals simply pick a methodology (RDBMS with Star Schemas, NoSQL, etc.) and put all data there, regardless of its shape, processing needs and so on. A Data Scientist is familiar not only with the various processing methods, but how they work, so that they can choose the right one for a given need. This is a huge time commitment, hence the need for a dedicated title like this one. Presentation This is where the need for a Data Scientist is most often already being filled, sometimes with more or less success. The latest Business Intelligence systems are quite good at allowing you to create amazing graphics - but it’s the data behind the graphics that are the most important component of truly effective displays. This is where the mathematics requirement of the Data Scientist title is the most unforgiving. In fact, someone without a good foundation in statistics is not a good candidate for creating reports. Even a basic level of statistics can be dangerous. Anyone who works in analyzing data will tell you that there are multiple errors possible when data just seems right - and basic statistics bears out that you’re on the right track - that are only solvable when you understanding why the statistical formula works the way it does. And there are lots of ways of presenting data. Sometimes all you need is a “yes” or “no” answer that can only come after heavy analysis work. In that case, a simple e-mail might be all the reporting you need. In others, complex relationships and multiple components require a deep understanding of the various graphical methods of presenting data. Knowing which kind of chart, color, graphic or shape conveys a particular datum best is essential knowledge for the Data Scientist. Why I’m excited I love this area of study. I like math, stats, and computing technologies, but it goes beyond that. I love what data can do - how it can help an organization. I’ve been fortunate enough in my professional career these past two decades to work with lots of folks who perform this role at companies from aerospace to medical firms, from manufacturing to retail. Interestingly, the size of the company really isn’t germane here. I worked with one very small bio-tech (cryogenics) company that worked deeply with analysis of complex interrelated data. So  watch this space. No, I’m not leaving Azure or distributed computing or Microsoft. In fact, I think I’m perfectly situated to investigate this role further. We have a huge set of tools, from RDBMS to Hadoop to allow me to explore. And I’m happy to share what I learn along the way.

    Read the article

  • SQL in the City (Charlotte) Wrap Up

    - by drsql
    Ok, it has been quite a while since the event, two weeks and a day to be exact, but I needed a rest before hitting Windows Live Writer again. Speaking is exhausting, traveling is exhausting, and well, I replaced my laptop and had to get all of my software back together. (Between Windows 8.1 sync features, Dropbox and Skydrive, it has never been easier…but I digress.) There are plenty of great vendors out there, but one of my favorites has always been Red-Gate. I have written half of a book with them,...(read more)

    Read the article

  • Raid0 performance degradation?

    - by davy8
    Not sure if this belongs here or on SuperUser, feel free to move as appropriate. I've noticed the performance on my RAID0 setup seems to have degraded over the past months. The throughput is fine, but I think the random access time has increased or something. In use I generally see about 1-5mb/sec when loading stuff in Visual Studio and other apps and it doesn't seem like the CPU is bottlenecking as the CPU utilization is pretty low. I don't recall what Access Time used to be, but HD Tune is reporting 12.6ms Read throughput is showing as averaging about 125MB/sec so it should be great for sequential reads. Defrag daily and it shows fragmentation levels low, so that shouldn't be an issue. Additional info, Windows 7 x64, Intel raid controller on mobo, WD Black 500GB (I think 32mb cache) x2.

    Read the article

  • Raid0 performance degradation?

    - by davy8
    Not sure if this belongs here or on SuperUser, feel free to move as appropriate. I've noticed the performance on my RAID0 setup seems to have degraded over the past months. The throughput is fine, but I think the random access time has increased or something. In use I generally see about 1-5mb/sec when loading stuff in Visual Studio and other apps and it doesn't seem like the CPU is bottlenecking as the CPU utilization is pretty low. I don't recall what Access Time used to be, but HD Tune is reporting 12.6ms Read throughput is showing as averaging about 125MB/sec so it should be great for sequential reads. Defrag daily and it shows fragmentation levels low, so that shouldn't be an issue. Additional info, Windows 7 x64, Intel raid controller on mobo, WD Black 500GB (I think 32mb cache) x2.

    Read the article

  • Smart defaults [SSDT]

    - by jamiet
    I’ve just discovered a new, somewhat hidden, feature in SSDT that I didn’t know about and figured it would be worth highlighting here because I’ll bet not many others know it either; the feature is called Smart Defaults. It gets around the problem of adding a NOT NULLable column to an existing table that has got data in it – previous to SSDT you would need to define a DEFAULT constraint however it does feel rather cumbersome to create an object purely for the purpose of pushing through a deployment – that’s the situation that Smart Defaults is meant to alleviate. The Smart Defaults option exists in the advanced section of a Publish Profile file: The description of the setting is “Automatically provides a default value when updating a table that contains data with a column that does not allow null values”, in other words checking that option will cause SSDT to insert an arbitrary default value into your newly created NON NULLable column. In case you’re wondering how it does it, here’s how: SSDT creates a DEFAULT CONSTRAINT at the same time as the column is created and then immediately removes that constraint: ALTER TABLE [dbo].[T1]    ADD [C1] INT NOT NULL,         CONSTRAINT [SD_T1_1df7a5f76cf44bb593506d05ff9a1e2b] DEFAULT 0 FOR [C1];ALTER TABLE [dbo].[T1] DROP CONSTRAINT [SD_T1_1df7a5f76cf44bb593506d05ff9a1e2b]; You can then update the value as appropriate in a Post-Deployment script. Pretty cool! On the downside, you can only specify this option for the whole project, not for an individual table or even an individual column – I’m not sure that I’d want to turn this on for an entire project as it could hide problems that a failed deployment would highlight, in other words smart defaults could be seen to be “papering over the cracks”. If you think that should be improved go and vote (and leave a comment) at [SSDT] Allow us to specify Smart defaults per table or even per column. @Jamiet

    Read the article

< Previous Page | 128 129 130 131 132 133 134 135 136 137 138 139  | Next Page >