Search Results

Search found 134806 results on 5393 pages for 'file server'.

Page 143/5393 | < Previous Page | 139 140 141 142 143 144 145 146 147 148 149 150 | Next Page >

SQL SERVER – Solution – Puzzle – SELECT * vs SELECT COUNT(*)

- by pinaldave

Earlier I have published Puzzle Why SELECT * throws an error but SELECT COUNT(*) does not. This question have received many interesting comments. Let us go over few of the answers, which are valid. Before I start the same, let me acknowledge Rob Farley who has not only answered correctly very first but also started interesting conversation in the same thread. The usual question will be what is the right answer. I would like to point to official Microsoft Connect Items which discusses the same. RGarvao https://connect.microsoft.com/SQLServer/feedback/details/671475/select-test-where-exists-select tiberiu utan http://connect.microsoft.com/SQLServer/feedback/details/338532/count-returns-a-value-1 Rob Farley count(*) is about counting rows, not a particular column. It doesn’t even look to see what columns are available, it’ll just count the rows, which in the case of a missing FROM clause, is 1. “select *” is designed to return columns, and therefore barfs if there are none available. Even more odd is this one: select ‘blah’ where exists (select *) You might be surprised at the results… Koushik The engine performs a “Constant scan” for Count(*) where as in the case of “SELECT *” the engine is trying to perform either Index/Cluster/Table scans. amikolaj When you query ‘select * from sometable’, SQL replaces * with the current schema of that table. With out a source for the schema, SQL throws an error. so when you query ‘select count(*)’, you are counting the one row. * is just a constant to SQL here. Check out the execution plan. Like the description states – ‘Scan an internal table of constants.’ You could do ‘select COUNT(‘my name is adam and this is my answer’)’ and get the same answer. Netra Acharya SELECT * Here, * represents all columns from a table. So it always looks for a table (As we know, there should be FROM clause before specifying table name). So, it throws an error whenever this condition is not satisfied. SELECT COUNT(*) Here, COUNT is a Function. So it is not mandetory to provide a table. Check it out this: DECLARE @cnt INT SET @cnt = COUNT(*) SELECT @cnt SET @cnt = COUNT(‘x’) SELECT @cnt Naveen Select 1 / Select ‘*’ will return 1/* as expected. Select Count(1)/Count(*) will return the count of result set of select statement. Count(1)/Count(*) will have one 1/* for each row in the result set of select statement. Select 1 or Select ‘*’ result set will contain only 1 result. so count is 1. Where as “Select *” is a sysntax which expects the table or equauivalent to table (table functions, etc..). It is like compilation error for that query. Ramesh Hi Friends, Count is an aggregate function and it expects the rows (list of records) for a specified single column or whole rows for *. So, when we use ‘select *’ it definitely give and error because ‘*’ is meant to have all the fields but there is not any table and without table it can only raise an error. So, in the case of ‘Select Count(*)’, there will be an error as a record in the count function so you will get the result as ’1'. Try using : Select COUNT(‘RAMESH’) and think there is an error ‘Must specify table to select from.’ in place of ‘RAMESH’ Pinal : If i am wrong then please clarify this. Sachin Nandanwar Any aggregate function expects a constant or a column name as an expression. DO NOT be confused with * in an aggregate function.The aggregate function does not treat it as a column name or a set of column names but a constant value, as * is a key word in SQL. You can replace any value instead of * for the COUNT function.Ex Select COUNT(5) will result as 1. The error resulting from select * is obvious it expects an object where it can extract the result set. I sincerely thank you all for wonderful conversation, I personally enjoyed it and I am sure all of you have the same feeling. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: CodeProject, Pinal Dave, PostADay, Readers Contribution, Readers Question, SQL, SQL Authority, SQL Puzzle, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

Read the article
Date and Time Support in SQL Server 2008

- by Aamir Hasan

Using the New Date and Time Data Types Normal 0 false false false EN-US X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} 1. The new date and time data types in SQL Server 2008 offer increased range and precision and are ANSI SQL compatible. 2. Separate date and time data types minimize storage space requirements for applications that need only date or time information. Moreover, the variable precision of the new time data type increases storage savings in exchange for reduced accuracy. 3. The new data types are mostly compatible with the original date and time data types and use the same Transact-SQL functions. 4. The datetimeoffset data type allows you to handle date and time information in global applications that use data that originates from different time zones. SELECT c.name, p.* FROM politics pJOIN country cON p.country = c.codeWHERE YEAR(Independence) < 1753ORDER BY IndependenceGO8. Highlight the SELECT statement and click Execute ( ) to show the use of some of the date functions.T-SQLSELECT c.name AS [Country Name], CONVERT(VARCHAR(12), p.Independence, 107) AS [Independence Date], DATEDIFF(YEAR, p.Independence, GETDATE()) AS [Years Independent (appox)], p.GovernmentFROM politics pJOIN country cON p.country = c.codeWHERE YEAR(Independence) < 1753ORDER BY IndependenceGO10. Select the SET DATEFORMAT statement and click Execute ( ) to change the DATEFORMAT to day-month-year.T-SQLSET DATEFORMAT dmyGO11. Select the DECLARE and SELECT statements and click Execute ( ) to show how the datetime and datetime2 data types interpret a date literal.T-SQLSET DATEFORMAT dmyDECLARE @dt datetime = '2008-12-05'DECLARE @dt2 datetime2 = '2008-12-05'SELECT MONTH(@dt) AS [Month-Datetime], DAY(@dt) AS [Day-Datetime]SELECT MONTH(@dt2) AS [Month-Datetime2], DAY(@dt2) AS [Day-Datetime2]GO12. Highlight the DECLARE and SELECT statements and click Execute ( ) to use integer arithmetic on a datetime variable.T-SQLDECLARE @dt datetime = '2008-12-05'SELECT @dt + 1GO13. Highlight the DECLARE and SELECT statements and click Execute ( ) to show how integer arithmetic is not allowed for datetime2 variables.T-SQLDECLARE @dt2 datetime = '2008-12-05'SELECT @dt2 + 1GO14. Highlight the DECLARE and SELECT statements and click Execute ( ) to show how to use DATE functions to do simple arithmetic on datetime2 variables.T-SQLDECLARE @dt2 datetime2(7) = '2008-12-05'SELECT DATEADD(d, 1, @dt2)GO15. Highlight the DECLARE and SELECT statements and click Execute ( ) to show how the GETDATE function can be used with both datetime and datetime2 data types.T-SQLDECLARE @dt datetime = GETDATE();DECLARE @dt2 datetime2(7) = GETDATE();SELECT @dt AS [GetDate-DateTime], @dt2 AS [GetDate-DateTime2]GO16. Draw attention to the values returned for both columns and how they are equal.17. Highlight the DECLARE and SELECT statements and click Execute ( ) to show how the SYSDATETIME function can be used with both datetime and datetime2 data types.T-SQLDECLARE @dt datetime = SYSDATETIME();DECLARE @dt2 datetime2(7) = SYSDATETIME();SELECT @dt AS [Sysdatetime-DateTime], @dt2 AS [Sysdatetime-DateTime2]GO18. Draw attention to the values returned for both columns and how they are different.Programming Global Applications with DateTimeOffset 2. If you have not previously created the SQLTrainingKitDB database while completing another demo in this training kit, highlight the CREATE DATABASE statement and click Execute ( ) to do so now.T-SQLCREATE DATABASE SQLTrainingKitDBGO3. Select the USE and CREATE TABLE statements and click Execute ( ) to create table datetest in the SQLTrainingKitDB database.T-SQLUSE SQLTrainingKitDBGOCREATE TABLE datetest ( id integer IDENTITY PRIMARY KEY, datetimecol datetimeoffset, EnteredTZ varchar(40)); Reference:http://www.microsoft.com/downloads/details.aspx?FamilyID=E9C68E1B-1E0E-4299-B498-6AB3CA72A6D7&displaylang=en

Read the article
SQL SERVER – OVER clause with FIRST _VALUE and LAST_VALUE – Analytic Functions Introduced in SQL Server 2012 – ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

- by pinaldave

Yesterday I had discussed two analytical functions FIRST_VALUE and LAST_VALUE. After reading the blog post I received very interesting question. “Don’t you think there is bug in your first example where FIRST_VALUE is remain same but the LAST_VALUE is changing every line. I think the LAST_VALUE should be the highest value in the windows or set of result.” I find this question very interesting because this is very commonly made mistake. No there is no bug in the code. I think what we need is a bit more explanation. Let me attempt that first. Before you do that I suggest you read yesterday’s blog post as this question is related to that blog post. Now let’s have fun following query: USE AdventureWorks GO SELECT s.SalesOrderID,s.SalesOrderDetailID,s.OrderQty, FIRST_VALUE(SalesOrderDetailID) OVER (ORDER BY SalesOrderDetailID) FstValue, LAST_VALUE(SalesOrderDetailID) OVER (ORDER BY SalesOrderDetailID) LstValue FROM Sales.SalesOrderDetail s WHERE SalesOrderID IN (43670, 43669, 43667, 43663) ORDER BY s.SalesOrderID,s.SalesOrderDetailID,s.OrderQty GO The above query will give us the following result: As per the reader’s question the value of the LAST_VALUE function should be always 114 and not increasing as the rows are increased. Let me re-write the above code once again with bit extra T-SQL Syntax. Please pay special attention to the ROW clause which I have added in the above syntax. USE AdventureWorks GO SELECT s.SalesOrderID,s.SalesOrderDetailID,s.OrderQty, FIRST_VALUE(SalesOrderDetailID) OVER (ORDER BY SalesOrderDetailID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FstValue, LAST_VALUE(SalesOrderDetailID) OVER (ORDER BY SalesOrderDetailID ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) LstValue FROM Sales.SalesOrderDetail s WHERE SalesOrderID IN (43670, 43669, 43667, 43663) ORDER BY s.SalesOrderID,s.SalesOrderDetailID,s.OrderQty GO Now once again check the result of the above query. The result of both the query is same because in OVER clause the default ROWS selection is always UNBOUNDED PRECEDING AND CURRENT ROW. If you want the maximum value of the windows with OVER clause you need to change the syntax to UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING for ROW clause. Now run following query and pay special attention to ROW clause again. USE AdventureWorks GO SELECT s.SalesOrderID,s.SalesOrderDetailID,s.OrderQty, FIRST_VALUE(SalesOrderDetailID) OVER (PARTITION BY SalesOrderID ORDER BY SalesOrderDetailID ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FstValue, LAST_VALUE(SalesOrderDetailID) OVER (PARTITION BY SalesOrderID ORDER BY SalesOrderDetailID ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) LstValue FROM Sales.SalesOrderDetail s WHERE SalesOrderID IN (43670, 43669, 43667, 43663) ORDER BY s.SalesOrderID,s.SalesOrderDetailID,s.OrderQty GO Here is the resultset of the above query which is what questioner was asking. So in simple word, there is no bug but there is additional syntax needed to add to get your desired answer. The same logic also applies to PARTITION BY clause when used. Here is quick example of how we can further partition the query by SalesOrderDetailID with this new functions. USE AdventureWorks GO SELECT s.SalesOrderID,s.SalesOrderDetailID,s.OrderQty, FIRST_VALUE(SalesOrderDetailID) OVER (PARTITION BY SalesOrderID ORDER BY SalesOrderDetailID ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) FstValue, LAST_VALUE(SalesOrderDetailID) OVER (PARTITION BY SalesOrderID ORDER BY SalesOrderDetailID ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) LstValue FROM Sales.SalesOrderDetail s WHERE SalesOrderID IN (43670, 43669, 43667, 43663) ORDER BY s.SalesOrderID,s.SalesOrderDetailID,s.OrderQty GO Above query will give us windowed resultset on SalesOrderDetailsID as well give us FIRST and LAST value for the windowed resultset. There are lots to discuss for this two functions and we have just explored tip of the iceberg. In future post I will discover it further deep. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, PostADay, SQL, SQL Authority, SQL Function, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
SQL SERVER – UNION ALL and ORDER BY – How to Order Table Separately While Using UNION ALL

- by pinaldave

I often see developers trying following syntax while using ORDER BY. SELECT Columns FROM TABLE1 ORDER BY Columns UNION ALL SELECT Columns FROM TABLE2 ORDER BY Columns However the above query will return following error. Msg 156, Level 15, State 1, Line 5 Incorrect syntax near the keyword ‘ORDER’. It is not possible to use two different ORDER BY in the UNION statement. UNION returns single resultsetand as per the Logical Query Processing Phases. However, if your requirement is such that you want your top and bottom query of the UNION resultset independently sorted but in the same resultset you can add an additional static column and order by that column. Let us re-create the same scenario. First create two tables and populated with sample data. USE tempdb GO -- Create table CREATE TABLE t1 (ID INT, Col1 VARCHAR(100)); CREATE TABLE t2 (ID INT, Col1 VARCHAR(100)); GO -- Sample Data Build INSERT INTO t1 (ID, Col1) SELECT 1, 'Col1-t1' UNION ALL SELECT 2, 'Col2-t1' UNION ALL SELECT 3, 'Col3-t1'; INSERT INTO t2 (ID, Col1) SELECT 3, 'Col1-t2' UNION ALL SELECT 2, 'Col2-t2' UNION ALL SELECT 1, 'Col3-t2'; GO If we SELECT the data from both the table using UNION ALL . -- SELECT without ORDER BY SELECT ID, Col1 FROM t1 UNION ALL SELECT ID, Col1 FROM t2 GO We will get the data in following order. However, our requirement is to get data in following order. If we need data ordered by Column1 we can ORDER the resultset ordered by Column1. -- SELECT with ORDER BY SELECT ID, Col1 FROM t1 UNION ALL SELECT ID, Col1 FROM t2 ORDER BY ID GO Now to get the data in independently sorted in UNION ALL let us add additional column OrderKey and use ORDER BY on that column. I think the description does not do proper justice let us see the example here. -- SELECT with ORDER BY - with ORDER KEY SELECT ID, Col1, 'id1' OrderKey FROM t1 UNION ALL SELECT ID, Col1, 'id2' OrderKey FROM t2 ORDER BY OrderKey, ID GO The above query will give the desired result. Now do not forget to clean up the database by running the following script. -- Clean up DROP TABLE t1; DROP TABLE t2; GO Here is the complete script used in this example. USE tempdb GO -- Create table CREATE TABLE t1 (ID INT, Col1 VARCHAR(100)); CREATE TABLE t2 (ID INT, Col1 VARCHAR(100)); GO -- Sample Data Build INSERT INTO t1 (ID, Col1) SELECT 1, 'Col1-t1' UNION ALL SELECT 2, 'Col2-t1' UNION ALL SELECT 3, 'Col3-t1'; INSERT INTO t2 (ID, Col1) SELECT 3, 'Col1-t2' UNION ALL SELECT 2, 'Col2-t2' UNION ALL SELECT 1, 'Col3-t2'; GO -- SELECT without ORDER BY SELECT ID, Col1 FROM t1 UNION ALL SELECT ID, Col1 FROM t2 GO -- SELECT with ORDER BY SELECT ID, Col1 FROM t1 UNION ALL SELECT ID, Col1 FROM t2 ORDER BY ID GO -- SELECT with ORDER BY - with ORDER KEY SELECT ID, Col1, 'id1' OrderKey FROM t1 UNION ALL SELECT ID, Col1, 'id2' OrderKey FROM t2 ORDER BY OrderKey, ID GO -- Clean up DROP TABLE t1; DROP TABLE t2; GO I am sure there are many more ways to achieve this, what method would you use if you have to face the similar situation? Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Best Practices, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
SQL SERVER – Introduction to Big Data – Guest Post

- by pinaldave

BIG Data – such a big word – everybody talks about this now a days. It is the word in the database world. In one of the conversation I asked my friend Jasjeet Sigh the same question – what is Big Data? He instantly came up with a very effective write-up. Jasjeet is working as a Technical Manager with Koenig Solutions. He leads the SQL domain, and holds rich IT industry experience. Talking about Koenig, it is a 19 year old IT training company that offers several certification choices. Some of its courses include SharePoint Training, Project Management certifications, Microsoft Trainings, Business Intelligence programs, Web Design and Development courses etc. Big Data, as the name suggests, is about data that is BIG in nature. The data is BIG in terms of size, and it is difficult to manage such enormous data with relational database management systems that are quite popular these days. Big Data is not just about being large in size, it is also about the variety of the data that differs in form or type. Some examples of Big Data are given below : Scientific data related to weather and atmosphere, Genetics etc Data collected by various medical procedures, such as Radiology, CT scan, MRI etc Data related to Global Positioning System Pictures and Videos Radio Frequency Data Data that may vary very rapidly like stock exchange information Apart from difficulties in managing and storing such data, it is difficult to query, analyze and visualize it. The characteristics of Big Data can be defined by four Vs: Volume: It simply means a large volume of data that may span Petabyte, Exabyte and so on. However it also depends organization to organization that what volume of data they consider as Big Data. Variety: As discussed above, Big Data is not limited to relational information or structured Data. It can also include unstructured data like pictures, videos, text, audio etc. Velocity: Velocity means the speed by which data changes. The higher is the velocity, the more efficient should be the system to capture and analyze the data. Missing any important point may lead to wrong analysis or may even result in loss. Veracity: It has been recently added as the fourth V, and generally means truthfulness or adherence to the truth. In terms of Big Data, it is more of a challenge than a characteristic. It is difficult to ascertain the truth out of the enormous amount of data and the one that has high velocity. There are always chances of having un-precise and uncertain data. It is a challenging task to clean such data before it is analyzed. Big Data can be considered as the next big thing in the IT sector in terms of innovation and development. If appropriate technologies are developed to analyze and use the information, it can be the driving force for almost all industrial segments. These include Retail, Manufacturing, Service, Finance, Healthcare etc. This will help them to automate business decisions, increase productivity, and innovate and develop new products. Thanks Jasjeet Singh for an excellent write up. Jasjeet Sign is working as a Technical Manager with Koenig Solutions. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Database, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: Big Data

Read the article
SQL SERVER – Partition Parallelism Support in expressor 3.6

- by pinaldave

I am very excited to learn that there is a new version of expressor’s data integration platform coming out in March of this year. It will be version 3.6, and I look forward to using it and telling everyone about it. Let me describe a little bit more about what will be so great in expressor 3.6: Greatly enhanced user interface Parallel Processing Bulk Artifact Upgrading The User Interface First let me cover the most obvious enhancements. The expressor Studio user interface (UI) has had some significant work done. Kudos to the expressor Engineering team; the entire UI is a visual masterpiece that is very responsive and intuitive. The improvements are more than just eye candy; they provide significant productivity gains when developing expressor Dataflows. Operator shape icons now include a description that identifies the function of each operator, instead of having to guess at the function by the icon. Operator shapes and highlighting depict the current function and status: Disabled, enabled, complete, incomplete, and error. Each status displays an appropriate message in the message panel with correction suggestions. Floating or docking property panels provide descriptive tool tips for each property as well as auto resize when adjusting the canvas, without having to search Help or the need to scroll around to get access to the property. Progress and status indicators let you know when an operation is working. “No limit” canvas with snap-to-grid allows automatic sizing and accurate positioning when you have numerous operators in the Dataflow. The inline tool bar offers quick access to pan, zoom, fit and overview functions. Selecting multiple artifacts with a right click context allows you to easily manage your workspace more efficiently. Partitioning and Parallel Processing Partitioning allows each operator to process multiple subsets of records in parallel as opposed to processing all records that flow through that operator in a single sequential set. This capability allows the user to configure the expressor Dataflow to run in a way that most efficiently utilizes the resources of the hardware where the Dataflow is running. Partitions can exist in most individual operators. Using partitions increases the speed of an expressor data integration application, therefore improving performance and load times. With the expressor 3.6 Enterprise Edition, expressor simplifies enabling parallel processing by adding intuitive partition settings that are easy to configure. Bulk Artifact Upgrading Bulk Artifact Upgrading sounds a bit intimidating, but it actually is not and it is a welcome addition to expressor Studio. In past releases, users were prompted to confirm that they wanted to upgrade their individual artifacts only when opened. This was a cumbersome and repetitive process. Now with bulk artifact upgrading, a user can easily select what artifact or group of artifacts to upgrade all at once. As you can see, there are many new features and upgrade options that will prove to make expressor Studio quicker and more efficient. I hope I’m not the only one who is excited about all these new upgrades, and that I you try expressor and share your experience with me. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

Read the article
SQL SERVER – Renaming Index – Index Naming Conventions

- by pinaldave

If you are regular reader of this blog, you must be aware of that there are two kinds of blog posts 1) I share what I learn recently 2) I share what I learn and request your participation. Today’s blog post is where I need your opinion to make this blog post a good reference for future. Background Story Recently I came across system where users have changed the name of the few of the table to match their new standard naming convention. The name of the table should be self explanatory and they should have explain their purpose without either opening it or reading documentations. Well, not every time this is possible but again this should be the goal of any database modeler. Well, I no way encourage the name of the tables to be too long like ‘ContainsDetailsofNewInvoices’. May be the name of the table should be ‘Invoices’ and table should contain a column with New/Processed bit filed to indicate if the invoice is processed or not (if necessary). Coming back to original story, the database had several tables of which the name were changed. Story Continues… To continue the story let me take simple example. There was a table with the name ’ReceivedInvoices’, it was changed to new name as ‘TblInvoices’. As per their new naming standard they had to prefix every talbe with the words ‘Tbl’ and prefix every view with the letters ‘Vw’. Personally I do not see any need of the prefix but again, that issue is not here to discuss. Now after changing the name of the table they faced very interesting situation. They had few indexes on the table which had name of the table. Let us take an example. Old Name of Table: ReceivedInvoice Old Name of Index: Index_ReceivedInvoice1 Here is the new names New Name of Table: TblInvoices New Name of Index: ??? Well, their dilemma was what should be the new naming convention of the Indexes. Here is a quick proposal of the Index naming convention. Do let me know your opinion. If Index is Primary Clustered Index: PK_TableName If Index is Non-clustered Index: IX_TableName_ColumnName1_ColumnName2… If Index is Unique Non-clustered Index: UX_TableName_ColumnName1_ColumnName2… If Index is Columnstore Non-clustered Index: CL_TableName Here ColumnName is the column on which index is created. As there can be only one Primary Key Index and Columnstore Index per table, they do not require ColumnName in the name of the index. The purpose of this new naming convention is to increase readability. When any user come across this index, without opening their properties or definition, user can will know the details of the index. T-SQL script to Rename Indexes Here is quick T-SQL script to rename Indexes EXEC sp_rename N'SchemaName.TableName.IndexName', N'New_IndexName', N'INDEX'; GO Your Contribute Please Well, the organization has already defined above four guidelines, personally I follow very similar guidelines too. I have seen many variations like adding prefixes CL for Clustered Index and NCL for Non-clustered Index. I have often seen many not using UX prefix for Unique Index but rather use generic IX prefix only. Now do you think if they have missed anything in the coding standard. Is NCI and CI prefixed required to additionally describe the index names. I have once received suggestion to even add fill factor in the index name – which I do not recommend at all. What do you think should be ideal name of the index, so it explains all the most important properties? Additionally, you are welcome to vote if you believe changing the name of index is just waste of time and energy. Note: The purpose of the blog post is to encourage all to participate with their ideas. I will write follow up blog posts in future compiling all the suggestions. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Index, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
SQL SERVER – Asynchronous Update and Timestamp – Check if Row Values are Changed Since Last Retrieve

- by pinaldave

Here is the question received just this morning. “Pinal, Our application is much different than other application you might have come across. In simple words, I would like to call it Asynchronous Updated Application. We need your quick opinion about one of the situation which we are facing. From business side: We have bidding system (similar to eBay but not exactly) and where multiple parties bid on one item, during the last few minutes of bidding many parties try to bid at the same time with the same price. When they hit submit, we would like to check if the original data which they retrieved is changed or not. If the original data which they have retrieved is the same, we will accept their new proposed price. If original data are changed, they will have to resubmit the data with new price. From technical side: We have a row which we retrieve in our application. Multiple users are retrieving the same row. Some of the users will update the value of the row and submit. However, only the very first user should be allowed to update the row and remaining all the users will have to re-fetch the row and updated it once again. We do not want to lock any record as that will create other problems. Do you have any solution for this kind of situation?” Fantastic Question. I believe there is good chance that we can use timestamp datatype in this kind of application. Before we continue let us see following simple example. USE tempdb GO CREATE TABLE SampleTable (ID INT, Col1 VARCHAR(100), TimeStampCol TIMESTAMP) GO INSERT INTO SampleTable (ID, Col1) VALUES (1, 'FirstVal') GO SELECT ID, Col1, TimeStampCol FROM SampleTable st GO UPDATE SampleTable SET Col1 = 'NextValue' GO SELECT ID, Col1, TimeStampCol FROM SampleTable st GO DROP TABLE SampleTable GO Now let us see the resultset. Here is the simple explanation of the scenario. We created a table with simple column with TIMESTAMP datatype. When we inserted a very first value the timestamp was generated. When we updated any value in that row, the timestamp was updated with the new value. Every single time when we update any value in the row, it will generate new timestamp value. Now let us apply this in an original question’s scenario. In that case multiple users are retrieving the same row. Everybody will have the same now same TimeStamp with them. Before any user update any value they should once again retrieve the timestamp from the table and compare with the timestamp they have with them. If both of the timestamp have the same value – the original row has not been updated and we can safely update the row with the new value. After initial update, now the row will contain a new timestamp. Any subsequent update to the same row should also go to the same process of checking the value of the timestamp they have in their memory. In this case, the timestamp from memory will be different from the timestamp in the row. This indicates that row in the table has changed and new updates should not be allowed. I believe timestamp can be very very useful in this kind of scenario. Is there any better alternative? Please leave a comment with the suggestion and I will post on the blog with due credit. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

Read the article
SQL SERVER – Finding Different ColumnName From Almost Identitical Tables

- by pinaldave

I have mentioned earlier on this blog that I love social media – Facebook and Twitter. I receive so many interesting questions that sometimes I wonder how come I never faced them in my real life scenario. Well, let us see one of the similar situation. Here is one of the questions which I received on my social media handle. “Pinal, I have a large database. I did not develop this database but I have inherited this database. In our database we have many tables but all the tables are in pairs. We have one archive table and one current table. Now here is interesting situation. For a while due to some reason our organization has stopped paying attention to archive data. We did not archive anything for a while. If this was not enough we even changed the schema of current table but did not change the corresponding archive table. This is now becoming a huge huge problem. We know for sure that in current table we have added few column but we do not know which ones. Is there any way we can figure out what are the new column added in the current table and does not exist in the archive tables? We cannot use any third party tool. Would you please guide us?” Well here is the interesting example of how we can use sys.column catalogue views and get the details of the newly added column. I have previously written about EXCEPT over here which is very similar to MINUS of Oracle. In following example we are going to create two tables. One of the tables has extra column. In our resultset we will get the name of the extra column as we are comparing the catalogue view of the column name. USE AdventureWorks2012 GO CREATE TABLE ArchiveTable (ID INT, Col1 VARCHAR(10), Col2 VARCHAR(100), Col3 VARCHAR(100)); CREATE TABLE CurrentTable (ID INT, Col1 VARCHAR(10), Col2 VARCHAR(100), Col3 VARCHAR(100), ExtraCol INT); GO -- Columns in ArchiveTable but not in CurrentTable SELECT name ColumnName FROM sys.columns WHERE OBJECT_NAME(OBJECT_ID) = 'ArchiveTable' EXCEPT SELECT name ColumnName FROM sys.columns WHERE OBJECT_NAME(OBJECT_ID) = 'CurrentTable' GO -- Columns in CurrentTable but not in ArchiveTable SELECT name ColumnName FROM sys.columns WHERE OBJECT_NAME(OBJECT_ID) = 'CurrentTable' EXCEPT SELECT name ColumnName FROM sys.columns WHERE OBJECT_NAME(OBJECT_ID) = 'ArchiveTable' GO DROP TABLE ArchiveTable; DROP TABLE CurrentTable; GO The above query will return us following result. I hope this solves the problems. It is not the most elegant solution ever possible but it works. Here is the puzzle back to you – what native T-SQL solution would you have provided in this situation? Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL System Table, SQL Tips and Tricks, T SQL, Technology

Read the article
uWSGI log file...permission denied to read file

- by bkev

I have a server running Django/Nginx/uWSGI with uWSGI in emperor mode, and the error log for it (the vassal-level error log, not the emperor-level log) has a continual permissions error every time it spawns a new worker, like so: Tue Jun 26 19:34:55 2012 - Respawned uWSGI worker 2 (new pid: 9334) Error opening file for reading: Permission denied Problem is, I don't know what file it's having trouble opening; it's not the log file, obviously, since I'm looking at it and it's writing to that without issue. Any way to find out? I'm running the apt-get version of uWSGI 1.0.3-debian through Upstart on Ubuntu 12.04. The site is working successfully, aside from what seems like a memory leak...hence my looking at the log file. My Upstart conf file description "uWSGI" start on runlevel [2345] stop on runlevel [06] respawn env UWSGI=/usr/bin/uwsgi env LOGTO=/var/log/uwsgi/emperor.log exec $UWSGI \ --master \ --emperor /etc/uwsgi/vassals \ --die-on-term \ --auto-procname \ --no-orphans \ --logto $LOGTO \ --logdate My Vassal ini file: [uwsgi] # Variables base = /srv/env/mysiteenv # Generic Config uid = uwsgi gid = uwsgi socket = 127.0.0.1:5050 master = true processes = 2 reload-on-as = 128 harakiri = 60 harakiri-verbose = true auto-procname = true plugins = http,python cache = 2000 home = %(base) pythonpath = %(base)/mysite module = wsgi logto = /srv/log/mysite/uwsgi_error.log logdate = true

Read the article
Create "raw disk file" from WIM file

- by Joe Baltimore

First timer here. I've searched around here, but haven't found a question like the one I have. Apologies if I missed it. The challenge at hand: produce a "raw disk image file" from a given WIM file. What I am pursuing so far is to use imagex.exe with the "/apply" operation to take the WIM and lay it down in a directory on a server. That seems to produce all the necessary "stuff" I need in that directory. How would I take that content and produce a "raw disk image file"? I'm told the definition of "raw disk image file" is a block-by-block copy of the disk image, which I hope is the output of the "imagex.exe /apply" command I use currently, but stored in a single file I can hand back to another system in our solution. imagex.exe /apply image.wim 1 R:\WimImagePoint I would like to take the contents of R:\WimImagePoint and produce the elusive (to me) "raw disk image file". ISO is not what they want, nor is anything requiring winPE. Any pointers? External utilities' references are welcome. Would like to avoid unmanaged code solutions as much as possible, but will entertain them if that's the only route. Also, I am not married to the idea of imagex /apply as the starting point, it's just the comfort zone so far.

Read the article
Getting rsync to move file from source to destination ?

- by fabien-barbier

Is rsync is a good choice for my project ? I have to : - copy files from source to destination folder via SSH, - be sure all files are copied, - delete source files after copy. - if I have conflict name, I have to rename files. It looks like I can use option : --remove-source-files (to delete source files) But how rsync manage conflict, can I had rules ? Use case on my project : I run scientific calculation on server A and results are inserted in folder "process", for each calculation I have a repository like this : /process/calc1. Now I would like to transfer repository "/calc1" to server B (I get /process/calc1), and delete "calc1" from server A. ...During another calculation I get "/process/calc2" on server A, the idea is also to move "calc2" in "/process/" directory on server B, then I have now on server B : - /process/calc1 - /process/calc2 (and /process/ on server A is empty). How rsync will manage conflict (on server B) if I have another folder like "/process/calc1" in server A after a new calculation (if "/process/calc1" already exist on server B) ? Is it possible to add rules with rsync, and rename "/process/calc1" by "process/calc1R2" in server B ? And so on (ex:calc1R3) ? Thanks.

Read the article
One row is skipped each time the program scans a matrix from file !

- by ZaZu

Hello there, I had this code working yesterday, but it seems like I edited it a bit and lost the working version. I cant get this to work anymore. I basically want to scan a matrix from a .txt file. But each time it scans the first row, the second one is skipped, and it reads the third instead :( Here is my code : for(i=0;i<=test->rowmat1;i++){ for(j=0;j<=test->colmat1;j++){ fscanf(fin,"%f\t",&test->mat[i][j]); } fscanf(fin,"%*[^\n]",&test->mat[i][j]); } For example, for a matrix of : 1.00 2.00 3.00 4.00 5.00 6.00 7.00 8.00 9.00 10.00 11.00 12.00 If I extract 3 rows and 3 cols, I get : 1.00 2.00 3.00 7.00 8.00 9.00 Then fails, it wants to skip over the second line but there is nothing after 10 11 12 Why did it stop working ? What do I have wrong ? Please help, Thanks in advance.

Read the article
How to create a simple server/client application using boost.asio?

- by the_drow

I was going over the examples of boost.asio and I am wondering why there isn't an example of a simple server/client example that prints a string on the server and then returns a response to the client. I tried to modify the echo server but I can't really figure out what I'm doing at all. Can anyone find me a template of a client and a template of a server? I would like to eventually create a server/client application that receives binary data and just returns an acknowledgment back to the client that the data is received.

Read the article
Java: how does isDirectory and isFile work in File-class?

- by HH

$ javac GetAllDirs.java GetAllDirs.java:16: cannot find symbol symbol : variable checkFile location: class GetAllDirs System.out.println(checkFile.getName()); ^ 1 error $ cat GetAllDirs.java import java.util.*; import java.io.*; public class GetAllDirs { public void getAllDirs(File file) { if(file.isDirectory()){ System.out.println(file.getName()); File checkFile = new File(file.getCanonicalPath()); }else if(file.isFile()){ System.out.println(file.getName()); File checkFile = new File(file.getParent()); }else{ // checkFile should get Initialized at least HERE! File checkFile = file; } System.out.println(file.getName()); // WHY ERROR HERE: checkfile not found System.out.println(checkFile.getName()); } public static void main(String[] args) { GetAllDirs dirs = new GetAllDirs(); File current = new File("."); dirs.getAllDirs(current); } }

Read the article
Fatal error when using FILE* in Windows from DLL

- by AlannY

Hi there. Recently, I found a problem with Visual C++ 2008 compiler, but using minor hack avoid it. Currently, I cannot use the same hack, but problem exists as in 2008 as in 2010 (Express). So, I've prepared for you 2 simple C file: one for DLL, one for program: DLL (file-dll.c): #include <stdio.h> __declspec(dllexport) void print_to_stream (FILE *stream) { fprintf (stream, "OK!\n"); } And for program, which links this DLL via file-dll.lib: Program: #include <stdio.h> __declspec(dllimport) void print_to_stream (FILE *stream); int main (void) { print_to_stream (stdout); return 0; } To compile and link DLL: cl /LD file-dll.c To compile and link program: cl file-test.c file-dll.lib When invoking file-test.exe, I got the fatal error (similar to segmentation fault in UNIX). As I said early, I had that the same problem before: about transferring FILE* pointer to DLL. I thought, that it may be because of compiler mismatch, but now I'm using one compiler for everything and it's not the problem. ;-( What can I do now? UPD: I've found solution: cl /LD /MD file-dll.c cl /MD file-test.c file-dll.lib The key is to link to dynamic library, but (I did not know it) by default it links staticaly and (hencefore) error occurs (I see why). P.S. Thanks for patience.

Read the article
Drawbacks with reading and writing a big file to or from disk at once instead of small chunks?

- by Johann Gerell

I work mainly on Windows and Windows CE based systems, where CreateFile, ReadFile and WriteFile are the work horses, no matter if I'm in native Win32 land or in managed .Net land. I have so far never had any obvious problem writing or reading big files in one chunk, as opposed to looping until several smaller chunks are processed. I usually delegate the IO work to a background thread that notifies me when it's done. But looking at file IO tutorials or "textbook examples", I often find the "loop with small chunks" used without any explanation of why it's used instead of the more obvious (I dare to say!) "do it all at once". Are there any drawbacks to the way I do that I haven't understood?

Read the article
C++ ofstream cannot write to file....

- by user69514

Hey I am trying to write some numbers to a file, but when I open the file it is empty. Can you help me out here? Thanks. /** main function **/ int main(){ /** variables **/ RandGen* random_generator = new RandGen; int random_numbers; string file_name; /** ask user for quantity of random number to produce **/ cout << "How many random number would you like to create?" << endl; cin >> random_numbers; /** ask user for the name of the file to store the numbers **/ cout << "Enter name of file to store random number" << endl; cin >> file_name; /** now create array to store the number **/ int random_array [random_numbers]; /** file the array with random integers **/ for(int i=0; i<random_numbers; i++){ random_array[i] = random_generator -> randInt(-20, 20); cout << random_array[i] << endl; } /** open file and write contents of random array **/ const char* file = file_name.c_str(); ofstream File(file); /** write contents to the file **/ for(int i=0; i<random_numbers; i++){ File << random_array[i] << endl; } /** close the file **/ File.close(); return 0; /** END OF PROGRAM **/ }

Read the article
Write a file in UTF-8 using FileWriter (Java)?

- by user1280970

I have the following code however, I want it to write as a UTF-8 file to handle foreign characters. Is there a way of doing this, is there some need to have a parameter? I would really appreciate your help with this. Thanks. try { BufferedReader reader = new BufferedReader(new FileReader("C:/Users/Jess/My Documents/actresses.list")); writer = new BufferedWriter(new FileWriter("C:/Users/Jess/My Documents/actressesFormatted.csv")); while( (line = reader.readLine()) != null) { //If the line starts with a tab then we just want to add a movie //using the current actor's name. if(line.length() == 0) continue; else if(line.charAt(0) == '\t') { readMovieLine2(0, line, surname.toString(), forename.toString()); } //Else we've reached a new actor else { readActorName(line); } } } catch (IOException e) { e.printStackTrace(); } }

Read the article
Uploading a file fails under WordPress

- by Ash

I'm using WordPress and I'm following W3's guide for uploading a file: HTML code: <html> <body> <form action="upload_file.php" method="post" enctype="multipart/form-data"> <label for="file">Filename:</label> <input type="file" name="file" id="file" /> <br /> <input type="submit" name="submit" value="Submit" /> </form> </body> </html> PHP code (upload_file.php): <?php if ($_FILES["file"]["error"] > 0) { echo "Error: " . $_FILES["file"]["error"] . "<br />"; } else { echo "Upload: " . $_FILES["file"]["name"] . "<br />"; echo "Type: " . $_FILES["file"]["type"] . "<br />"; echo "Size: " . ($_FILES["file"]["size"] / 1024) . " Kb<br />"; echo "Stored in: " . $_FILES["file"]["tmp_name"]; } ?> The HTML code is pasted in a PHP page template and the PHP file under the WP installation directory under www. The problem is when I submit the file I get Error: 1. If I remark the "if" part of the PHP code and leave the "else" part I get: Upload: IMG_4258.JPG Type: Size: 0 Kb Stored in: So at least I know the PHP code is running. But what's causing it to fail? Is there a problem with the code or is WordPress meddling with the process?

Read the article
Will .NET 4.0 apps work on Win 2008 R2 Server Core?

- by markus

When Windows Server 2008 R2 was launched, the "server core" edition started to become useful to me, because it lets me deploy .NET background applications isolated on their own virtual machine instance with only a small fraction of all the disk space overhead of a default Windows Server installation, and very few Windows Updates. It comes with a subset of .NET 3.5 SP1 integrated (as an optional feature). Now that .NET 4.0 is released, the redistributables explicitly state that it's not support on Server Core. Any chance that there will be a separate download available for Server Core (e. g. without WPF) any time soon, has anybody heard about it?

Read the article
Which file types are worth compressing (zipping) for remote storage? For which of them the compresse

- by user193655

I am storing documents in sql server in varbinary(max) fileds, I use filestream optionally when a user has: (DB_Size + Docs_Size) ~> 0.8 * ExpressEdition_Max_DB_Size I am currently zipping all the files, anyway this is done because the Document Read/Write work was developed 10 years ago where Storage was more expensive than now. Many files when zipped are almost as big as the original (a zipped pdf is about 95% of original size). And anyway unzipping has some overhead, that becomes twice when I need also to "Check-in"/Update the file because I need to zip it. So I was thinking of giving to the users the option to choose whether the file type will be zipped or not by providing some meaningful default values. For my experience I would impose the following rules: 1) zip by default: txt, bmp, rtf 2) do not zip by default: jpg, jpeg, Microsoft Office files, Open Office files, png, tif, tiff Could you suggest other file types chosen among the most common or comment on the ones I listed here?

Read the article
SQL SERVER – Core Concepts – Elasticity, Scalability and ACID Properties – Exploring NuoDB an Elastically Scalable Database System

- by pinaldave

I have been recently exploring Elasticity and Scalability attributes of databases. You can see that in my earlier blog posts about NuoDB where I wanted to look at Elasticity and Scalability concepts. The concepts are very interesting, and intriguing as well. I have discussed these concepts with my friend Joyti M and together we have come up with this interesting read. The goal of this article is to answer following simple questions What is Elasticity? What is Scalability? How ACID properties vary from NOSQL Concepts? What are the prevailing problems in the current database system architectures? Why is NuoDB an innovative and welcome change in database paradigm? Elasticity This word’s original form is used in many different ways and honestly it does do a decent job in holding things together over the years as a person grows and contracts. Within the tech world, and specifically related to software systems (database, application servers), it has come to mean a few things - allow stretching of resources without reaching the breaking point (on demand). What are resources in this context? Resources are the usual suspects – RAM/CPU/IO/Bandwidth in the form of a container (a process or bunch of processes combined as modules). When it is about increasing resources the simplest idea which comes to mind is the addition of another container. Another container means adding a brand new physical node. When it is about adding a new node there are two questions which comes to mind. 1) Can we add another node to our software system? 2) If yes, does adding new node cause downtime for the system? Let us assume we have added new node, let us see what the new needs of the system are when a new node is added. Balancing incoming requests to multiple nodes Synchronization of a shared state across multiple nodes Identification of “downstate” and resolution action to bring it to “upstate” Well, adding a new node has its advantages as well. Here are few of the positive points Throughput can increase nearly horizontally across the node throughout the system Response times of application will increase as in-between layer interactions will be improved Now, Let us put the above concepts in the perspective of a Database. When we mention the term “running out of resources” or “application is bound to resources” the resources can be CPU, Memory or Bandwidth. The regular approach to “gain scalability” in the database is to look around for bottlenecks and increase the bottlenecked resource. When we have memory as a bottleneck we look at the data buffers, locks, query plans or indexes. After a point even this is not enough as there needs to be an efficient way of managing such large workload on a “single machine” across memory and CPU bound (right kind of scheduling) workload. We next move on to either read/write separation of the workload or functionality-based sharing so that we still have control of the individual. But this requires lots of planning and change in client systems in terms of knowing where to go/update/read and for reporting applications to “aggregate the data” in an intelligent way. What we ideally need is an intelligent layer which allows us to do these things without us getting into managing, monitoring and distributing the workload. Scalability In the context of database/applications, scalability means three main things Ability to handle normal loads without pressure E.g. X users at the Y utilization of resources (CPU, Memory, Bandwidth) on the Z kind of hardware (4 processor, 32 GB machine with 15000 RPM SATA drives and 1 GHz Network switch) with T throughput Ability to scale up to expected peak load which is greater than normal load with acceptable response times Ability to provide acceptable response times across the system E.g. Response time in S milliseconds (or agreed upon unit of measure) – 90% of the time The Issue – Need of Scale In normal cases one can plan for the load testing to test out normal, peak, and stress scenarios to ensure specific hardware meets the needs. With help from Hardware and Software partners and best practices, bottlenecks can be identified and requisite resources added to the system. Unfortunately this vertical scale is expensive and difficult to achieve and most of the operational people need the ability to scale horizontally. This helps in getting better throughput as there are physical limits in terms of adding resources (Memory, CPU, Bandwidth and Storage) indefinitely. Today we have different options to achieve scalability: Read & Write Separation The idea here is to do actual writes to one store and configure slaves receiving the latest data with acceptable delays. Slaves can be used for balancing out reads. We can also explore functional separation or sharing as well. We can separate data operations by a specific identifier (e.g. region, year, month) and consolidate it for reporting purposes. For functional separation the major disadvantage is when schema changes or workload pattern changes. As the requirement grows one still needs to deal with scale need in manual ways by providing an abstraction in the middle tier code. Using NOSQL solutions The idea is to flatten out the structures in general to keep all values which are retrieved together at the same store and provide flexible schema. The issue with the stores is that they are compromising on mostly consistency (no ACID guarantees) and one has to use NON-SQL dialect to work with the store. The other major issue is about education with NOSQL solutions. Would one really want to make these compromises on the ability to connect and retrieve in simple SQL manner and learn other skill sets? Or for that matter give up on ACID guarantee and start dealing with consistency issues? Hybrid Deployment – Mac, Linux, Cloud, and Windows One of the challenges today that we see across On-premise vs Cloud infrastructure is a difference in abilities. Take for example SQL Azure – it is wonderful in its concepts of throttling (as it is shared deployment) of resources and ability to scale using federation. However, the same abilities are not available on premise. This is not a mistake, mind you – but a compromise of the sweet spot of workloads, customer requirements and operational SLAs which can be supported by the team. In today’s world it is imperative that databases are available across operating systems – which are a commodity and used by developers of all hues. An Ideal Database Ability List A system which allows a linear scale of the system (increase in throughput with reasonable response time) with the addition of resources A system which does not compromise on the ACID guarantees and require developers to learn new paradigms A system which does not force fit a new way interacting with database by learning Non-SQL dialect A system which does not force fit its mechanisms for providing availability across its various modules. Well NuoDB is the first database which has all of the above abilities and much more. In future articles I will cover my hands-on experience with it. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: NuoDB

Read the article
SQL SERVER – Using expressor Composite Types to Enforce Business Rules

- by pinaldave

One of the features that distinguish the expressor Data Integration Platform from other products in the data integration space is its concept of composite types, which provide an effective and easily reusable way to clearly define the structure and characteristics of data within your application. An important feature of the composite type approach is that it allows you to easily adjust the content of a record to its ultimate purpose. For example, a record used to update a row in a database table is easily defined to include only the minimum set of columns, that is, a value for the key column and values for only those columns that need to be updated. Much like a class in higher level programming languages, you can also use the composite type as a way to enforce business rules onto your data by encapsulating a datum’s name, data type, and constraints (for example, maximum, minimum, or acceptable values) as a single entity, which ensures that your data can not assume an invalid value. To what extent you use this functionality is a decision you make when designing your application; the expressor design paradigm does not force this approach on you. Let’s take a look at how these features are used. Suppose you want to create a group of applications that maintain the employee table in your human resources database. Your table might have a structure similar to the HumanResources.Employee table in the AdventureWorks database. This table includes two columns, EmployeID and rowguid, that are maintained by the relational database management system; you cannot provide values for these columns when inserting new rows into the table. Additionally, there are columns such as VacationHours and SickLeaveHours that you might choose to update for all employees on a monthly basis, which justifies creation of a dedicated application. By creating distinct composite types for the read, insert and update operations against this table, you can more easily manage this table’s content. When developing this application within expressor Studio, your first task is to create a schema artifact for the database table. This process is completely driven by a wizard, only requiring that you select the desired database schema and table. The resulting schema artifact defines the mapping of result set records to a record within the expressor data integration application. The structure of the record within the expressor application is a composite type that is given the default name CompositeType1. As you can see in the following figure, all columns from the table are included in the result set and mapped to an identically named attribute in the default composite type. If you are developing an application that needs to read this table, perhaps to prepare a year-end report of employees by department, you would probably not be interested in the data in the rowguid and ModifiedDate columns. A typical approach would be to drop this unwanted data in a downstream operator. But using an alternative composite type provides a better approach in which the unwanted data never enters your application. While working in expressor Studio’s schema editor, simply create a second composite type within the same schema artifact, which you could name ReadTable, and remove the attributes corresponding to the unwanted columns. The value of an alternative composite type is even more apparent when you want to insert into or update the table. In the composite type used to insert rows, remove the attributes corresponding to the EmployeeID primary key and rowguid uniqueidentifier columns since these values are provided by the relational database management system. And to update just the VacationHours and SickLeaveHours columns, use a composite type that includes only the attributes corresponding to the EmployeeID, VacationHours, SickLeaveHours and ModifiedDate columns. By specifying this schema artifact and composite type in a Write Table operator, your upstream application need only deal with the four required attributes and there is no risk of unintentionally overwriting a value in a column that does not need to be updated. Now, what about the option to use the composite type to enforce business rules? If you review the composition of the default composite type CompositeType1, you will note that the constraints defined for many of the attributes mirror the table column specifications. For example, the maximum number of characters in the NationaIDNumber, LoginID and Title attributes is equivalent to the maximum width of the target column, and the size of the MaritalStatus and Gender attributes is limited to a single character as required by the table column definition. If your application code leads to a violation of these constraints, an error will be raised. The expressor design paradigm then allows you to handle the error in a way suitable for your application. For example, a string value could be truncated or a numeric value could be rounded. Moreover, you have the option of specifying additional constraints that support business rules unrelated to the table definition. Let’s assume that the only acceptable values for marital status are S, M, and D. Within the schema editor, double-click on the MaritalStatus attribute to open the Edit Attribute window. Then click the Allowed Values checkbox and enter the acceptable values into the Constraint Value text box. The schema editor is updated accordingly. There is one more option that the expressor semantic type paradigm supports. Since the MaritalStatus attribute now clearly specifies how this type of information should be represented (a single character limited to S, M or D), you can convert this attribute definition into a shared type, which will allow you to quickly incorporate this definition into another composite type or into the description of an output record from a transform operator. Again, double-click on the MaritalStatus attribute and in the Edit Attribute window, click Convert, which opens the Share Local Semantic Type window that you use to name this shared type. There’s no requirement that you give the shared type the same name as the attribute from which it was derived. You should supply a name that makes it obvious what the shared type represents. In this posting, I’ve overviewed the expressor semantic type paradigm and shown how it can be used to make your application development process more productive. The beauty of this feature is that you choose when and to what extent you utilize the functionality, but I’m certain that if you opt to follow this approach your efforts will become more efficient and your work will progress more quickly. As always, I encourage you to download and evaluate expressor Studio for your current and future data integration needs. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: CodeProject, Pinal Dave, PostADay, SQL, SQL Authority, SQL Documentation, SQL Query, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology

Read the article
Fraud Detection with the SQL Server Suite Part 2

- by Dejan Sarka

This is the second part of the fraud detection whitepaper. You can find the first part in my previous blog post about this topic. My Approach to Data Mining Projects It is impossible to evaluate the time and money needed for a complete fraud detection infrastructure in advance. Personally, I do not know the customer’s data in advance. I don’t know whether there is already an existing infrastructure, like a data warehouse, in place, or whether we would need to build one from scratch. Therefore, I always suggest to start with a proof-of-concept (POC) project. A POC takes something between 5 and 10 working days, and involves personnel from the customer’s site – either employees or outsourced consultants. The team should include a subject matter expert (SME) and at least one information technology (IT) expert. The SME must be familiar with both the domain in question as well as the meaning of data at hand, while the IT expert should be familiar with the structure of data, how to access it, and have some programming (preferably Transact-SQL) knowledge. With more than one IT expert the most time consuming work, namely data preparation and overview, can be completed sooner. I assume that the relevant data is already extracted and available at the very beginning of the POC project. If a customer wants to have their people involved in the project directly and requests the transfer of knowledge, the project begins with training. I strongly advise this approach as it offers the establishment of a common background for all people involved, the understanding of how the algorithms work and the understanding of how the results should be interpreted, a way of becoming familiar with the SQL Server suite, and more. Once the data has been extracted, the customer’s SME (i.e. the analyst), and the IT expert assigned to the project will learn how to prepare the data in an efficient manner. Together with me, knowledge and expertise allow us to focus immediately on the most interesting attributes and identify any additional, calculated, ones soon after. By employing our programming knowledge, we can, for example, prepare tens of derived variables, detect outliers, identify the relationships between pairs of input variables, and more, in only two or three days, depending on the quantity and the quality of input data. I favor the customer’s decision of assigning additional personnel to the project. For example, I actually prefer to work with two teams simultaneously. I demonstrate and explain the subject matter by applying techniques directly on the data managed by each team, and then both teams continue to work on the data overview and data preparation under our supervision. I explain to the teams what kind of results we expect, the reasons why they are needed, and how to achieve them. Afterwards we review and explain the results, and continue with new instructions, until we resolve all known problems. Simultaneously with the data preparation the data overview is performed. The logic behind this task is the same – again I show to the teams involved the expected results, how to achieve them and what they mean. This is also done in multiple cycles as is the case with data preparation, because, quite frankly, both tasks are completely interleaved. A specific objective of the data overview is of principal importance – it is represented by a simple star schema and a simple OLAP cube that will first of all simplify data discovery and interpretation of the results, and will also prove useful in the following tasks. The presence of the customer’s SME is the key to resolving possible issues with the actual meaning of the data. We can always replace the IT part of the team with another database developer; however, we cannot conduct this kind of a project without the customer’s SME. After the data preparation and when the data overview is available, we begin the scientific part of the project. I assist the team in developing a variety of models, and in interpreting the results. The results are presented graphically, in an intuitive way. While it is possible to interpret the results on the fly, a much more appropriate alternative is possible if the initial training was also performed, because it allows the customer’s personnel to interpret the results by themselves, with only some guidance from me. The models are evaluated immediately by using several different techniques. One of the techniques includes evaluation over time, where we use an OLAP cube. After evaluating the models, we select the most appropriate model to be deployed for a production test; this allows the team to understand the deployment process. There are many possibilities of deploying data mining models into production; at the POC stage, we select the one that can be completed quickly. Typically, this means that we add the mining model as an additional dimension to an existing DW or OLAP cube, or to the OLAP cube developed during the data overview phase. Finally, we spend some time presenting the results of the POC project to the stakeholders and managers. Even from a POC, the customer will receive lots of benefits, all at the sole risk of spending money and time for a single 5 to 10 day project: The customer learns the basic patterns of frauds and fraud detection The customer learns how to do the entire cycle with their own people, only relying on me for the most complex problems The customer’s analysts learn how to perform much more in-depth analyses than they ever thought possible The customer’s IT experts learn how to perform data extraction and preparation much more efficiently than they did before All of the attendees of this training learn how to use their own creativity to implement further improvements of the process and procedures, even after the solution has been deployed to production The POC output for a smaller company or for a subsidiary of a larger company can actually be considered a finished, production-ready solution It is possible to utilize the results of the POC project at subsidiary level, as a finished POC project for the entire enterprise Typically, the project results in several important “side effects” Improved data quality Improved employee job satisfaction, as they are able to proactively contribute to the central knowledge about fraud patterns in the organization Because eventually more minds get to be involved in the enterprise, the company should expect more and better fraud detection patterns After the POC project is completed as described above, the actual project would not need months of engagement from my side. This is possible due to our preference to transfer the knowledge onto the customer’s employees: typically, the customer will use the results of the POC project for some time, and only engage me again to complete the project, or to ask for additional expertise if the complexity of the problem increases significantly. I usually expect to perform the following tasks: Establish the final infrastructure to measure the efficiency of the deployed models Deploy the models in additional scenarios Through reports By including Data Mining Extensions (DMX) queries in OLTP applications to support real-time early warnings Include data mining models as dimensions in OLAP cubes, if this was not done already during the POC project Create smart ETL applications that divert suspicious data for immediate or later inspection I would also offer to investigate how the outcome could be transferred automatically to the central system; for instance, if the POC project was performed in a subsidiary whereas a central system is available as well Of course, for the actual project, I would repeat the data and model preparation as needed It is virtually impossible to tell in advance how much time the deployment would take, before we decide together with customer what exactly the deployment process should cover. Without considering the deployment part, and with the POC project conducted as suggested above (including the transfer of knowledge), the actual project should still only take additional 5 to 10 days. The approximate timeline for the POC project is, as follows: 1-2 days of training 2-3 days for data preparation and data overview 2 days for creating and evaluating the models 1 day for initial preparation of the continuous learning infrastructure 1 day for presentation of the results and discussion of further actions Quite frequently I receive the following question: are we going to find the best possible model during the POC project, or during the actual project? My answer is always quite simple: I do not know. Maybe, if we would spend just one hour more for data preparation, or create just one more model, we could get better patterns and predictions. However, we simply must stop somewhere, and the best possible way to do this, according to my experience, is to restrict the time spent on the project in advance, after an agreement with the customer. You must also never forget that, because we build the complete learning infrastructure and transfer the knowledge, the customer will be capable of doing further investigations independently and improve the models and predictions over time without the need for a constant engagement with me.

Read the article

Search Results

Search found 134806 results on 5393 pages for 'file server'.

Page 143/5393 | < Previous Page | 139 140 141 142 143 144 145 146 147 148 149 150 | Next Page >

- by pinaldave

- by Aamir Hasan

- by pinaldave

- by pinaldave

- by pinaldave

- by pinaldave

- by pinaldave

- by pinaldave

- by pinaldave

- by bkev

- by Joe Baltimore

- by fabien-barbier

- by ZaZu

- by the_drow

- by HH

- by AlannY

- by Johann Gerell

- by user69514

- by user1280970

- by Ash

- by markus

- by user193655

- by pinaldave

- by pinaldave

- by Dejan Sarka

< Previous Page | 139 140 141 142 143 144 145 146 147 148 149 150 | Next Page >