Search Results

Search found 1894 results on 76 pages for 'phil factor'.

Page 6/76 | < Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13  | Next Page >

  • Table Variables: an empirical approach.

    - by Phil Factor
    It isn’t entirely a pleasant experience to publish an article only to have it described on Twitter as ‘Horrible’, and to have it criticized on the MVP forum. When this happened to me in the aftermath of publishing my article on Temporary tables recently, I was taken aback, because these critics were experts whose views I respect. What was my crime? It was, I think, to suggest that, despite the obvious quirks, it was best to use Table Variables as a first choice, and to use local Temporary Tables if you hit problems due to these quirks, or if you were doing complex joins using a large number of rows. What are these quirks? Well, table variables have advantages if they are used sensibly, but this requires some awareness by the developer about the potential hazards and how to avoid them. You can be hit by a badly-performing join involving a table variable. Table Variables are a compromise, and this compromise doesn’t always work out well. Explicit indexes aren’t allowed on Table Variables, so one cannot use covering indexes or non-unique indexes. The query optimizer has to make assumptions about the data rather than using column distribution statistics when a table variable is involved in a join, because there aren’t any column-based distribution statistics on a table variable. It assumes a reasonably even distribution of data, and is likely to have little idea of the number of rows in the table variables that are involved in queries. However complex the heuristics that are used might be in determining the best way of executing a SQL query, and they most certainly are, the Query Optimizer is likely to fail occasionally with table variables, under certain circumstances, and produce a Query Execution Plan that is frightful. The experienced developer or DBA will be on the lookout for this sort of problem. In this blog, I’ll be expanding on some of the tests I used when writing my article to illustrate the quirks, and include a subsequent example supplied by Kevin Boles. A simplified example. We’ll start out by illustrating a simple example that shows some of these characteristics. We’ll create two tables filled with random numbers and then see how many matches we get between the two tables. We’ll forget indexes altogether for this example, and use heaps. We’ll try the same Join with two table variables, two table variables with OPTION (RECOMPILE) in the JOIN clause, and with two temporary tables. It is all a bit jerky because of the granularity of the timing that isn’t actually happening at the millisecond level (I used DATETIME). However, you’ll see that the table variable is outperforming the local temporary table up to 10,000 rows. Actually, even without a use of the OPTION (RECOMPILE) hint, it is doing well. What happens when your table size increases? The table variable is, from around 30,000 rows, locked into a very bad execution plan unless you use OPTION (RECOMPILE) to provide the Query Analyser with a decent estimation of the size of the table. However, if it has the OPTION (RECOMPILE), then it is smokin’. Well, up to 120,000 rows, at least. It is performing better than a Temporary table, and in a good linear fashion. What about mixed table joins, where you are joining a temporary table to a table variable? You’d probably expect that the query analyzer would throw up its hands and produce a bad execution plan as if it were a table variable. After all, it knows nothing about the statistics in one of the tables so how could it do any better? Well, it behaves as if it were doing a recompile. And an explicit recompile adds no value at all. (we just go up to 45000 rows since we know the bigger picture now)   Now, if you were new to this, you might be tempted to start drawing conclusions. Beware! We’re dealing with a very complex beast: the Query Optimizer. It can come up with surprises What if we change the query very slightly to insert the results into a Table Variable? We change nothing else and just measure the execution time of the statement as before. Suddenly, the table variable isn’t looking so much better, even taking into account the time involved in doing the table insert. OK, if you haven’t used OPTION (RECOMPILE) then you’re toast. Otherwise, there isn’t much in it between the Table variable and the temporary table. The table variable is faster up to 8000 rows and then not much in it up to 100,000 rows. Past the 8000 row mark, we’ve lost the advantage of the table variable’s speed. Any general rule you may be formulating has just gone for a walk. What we can conclude from this experiment is that if you join two table variables, and can’t use constraints, you’re going to need that Option (RECOMPILE) hint. Count Dracula and the Horror Join. These tables of integers provide a rather unreal example, so let’s try a rather different example, and get stuck into some implicit indexing, by using constraints. What unusual words are contained in the book ‘Dracula’ by Bram Stoker? Here we get a table of all the common words in the English language (60,387 of them) and put them in a table. We put them in a Table Variable with the word as a primary key, a Table Variable Heap and a Table Variable with a primary key. We then take all the distinct words used in the book ‘Dracula’ (7,558 of them). We then create a table variable and insert into it all those uncommon words that are in ‘Dracula’. i.e. all the words in Dracula that aren’t matched in the list of common words. To do this we use a left outer join, where the right-hand value is null. The results show a huge variation, between the sublime and the gorblimey. If both tables contain a Primary Key on the columns we join on, and both are Table Variables, it took 33 Ms. If one table contains a Primary Key, and the other is a heap, and both are Table Variables, it took 46 Ms. If both Table Variables use a unique constraint, then the query takes 36 Ms. If neither table contains a Primary Key and both are Table Variables, it took 116383 Ms. Yes, nearly two minutes!! If both tables contain a Primary Key, one is a Table Variables and the other is a temporary table, it took 113 Ms. If one table contains a Primary Key, and both are Temporary Tables, it took 56 Ms.If both tables are temporary tables and both have primary keys, it took 46 Ms. Here we see table variables which are joined on their primary key again enjoying a  slight performance advantage over temporary tables. Where both tables are table variables and both are heaps, the query suddenly takes nearly two minutes! So what if you have two heaps and you use option Recompile? If you take the rogue query and add the hint, then suddenly, the query drops its time down to 76 Ms. If you add unique indexes, then you've done even better, down to half that time. Here are the text execution plans.So where have we got to? Without drilling down into the minutiae of the execution plans we can begin to create a hypothesis. If you are using table variables, and your tables are relatively small, they are faster than temporary tables, but as the number of rows increases you need to do one of two things: either you need to have a primary key on the column you are using to join on, or else you need to use option (RECOMPILE) If you try to execute a query that is a join, and both tables are table variable heaps, you are asking for trouble, well- slow queries, unless you give the table hint once the number of rows has risen past a point (30,000 in our first example, but this varies considerably according to context). Kevin’s Skew In describing the table-size, I used the term ‘relatively small’. Kevin Boles produced an interesting case where a single-row table variable produces a very poor execution plan when joined to a very, very skewed table. In the original, pasted into my article as a comment, a column consisted of 100000 rows in which the key column was one number (1) . To this was added eight rows with sequential numbers up to 9. When this was joined to a single-tow Table Variable with a key of 2 it produced a bad plan. This problem is unlikely to occur in real usage, and the Query Optimiser team probably never set up a test for it. Actually, the skew can be slightly less extreme than Kevin made it. The following test showed that once the table had 54 sequential rows in the table, then it adopted exactly the same execution plan as for the temporary table and then all was well. Undeniably, real data does occasionally cause problems to the performance of joins in Table Variables due to the extreme skew of the distribution. We've all experienced Perfectly Poisonous Table Variables in real live data. As in Kevin’s example, indexes merely make matters worse, and the OPTION (RECOMPILE) trick does nothing to help. In this case, there is no option but to use a temporary table. However, one has to note that once the slight de-skew had taken place, then the plans were identical across a huge range. Conclusions Where you need to hold intermediate results as part of a process, Table Variables offer a good alternative to temporary tables when used wisely. They can perform faster than a temporary table when the number of rows is not great. For some processing with huge tables, they can perform well when only a clustered index is required, and when the nature of the processing makes an index seek very effective. Table Variables are scoped to the batch or procedure and are unlikely to hang about in the TempDB when they are no longer required. They require no explicit cleanup. Where the number of rows in the table is moderate, you can even use them in joins as ‘Heaps’, unindexed. Beware, however, since, as the number of rows increase, joins on Table Variable heaps can easily become saddled by very poor execution plans, and this must be cured either by adding constraints (UNIQUE or PRIMARY KEY) or by adding the OPTION (RECOMPILE) hint if this is impossible. Occasionally, the way that the data is distributed prevents the efficient use of Table Variables, and this will require using a temporary table instead. Tables Variables require some awareness by the developer about the potential hazards and how to avoid them. If you are not prepared to do any performance monitoring of your code or fine-tuning, and just want to pummel out stuff that ‘just runs’ without considering namby-pamby stuff such as indexes, then stick to Temporary tables. If you are likely to slosh about large numbers of rows in temporary tables without considering the niceties of processing just what is required and no more, then temporary tables provide a safer and less fragile means-to-an-end for you.

    Read the article

  • Sniffing out SQL Code Smells: Inconsistent use of Symbolic names and Datatypes

    - by Phil Factor
    It is an awkward feeling. You’ve just delivered a database application that seems to be working fine in production, and you just run a few checks on it. You discover that there is a potential bug that, out of sheer good chance, hasn’t kicked in to produce an error; but it lurks, like a smoking bomb. Worse, maybe you find that the bug has started its evil work of corrupting the data, but in ways that nobody has, so far detected. You investigate, and find the damage. You are somehow going to have to repair it. Yes, it still very occasionally happens to me. It is not a nice feeling, and I do anything I can to prevent it happening. That’s why I’m interested in SQL code smells. SQL Code Smells aren’t necessarily bad practices, but just show you where to focus your attention when checking an application. Sometimes with databases the bugs can be subtle. SQL is rather like HTML: the language does its best to try to carry out your wishes, rather than to be picky about your bugs. Most of the time, this is a great benefit, but not always. One particular place where this can be detrimental is where you have implicit conversion between different data types. Most of the time it is completely harmless but we’re  concerned about the occasional time it isn’t. Let’s give an example: String truncation. Let’s give another even more frightening one, rounding errors on assignment to a number of different precision. Each requires a blog-post to explain in detail and I’m not now going to try. Just remember that it is not always a good idea to assign data to variables, parameters or even columns when they aren’t the same datatype, especially if you are relying on implicit conversion to work its magic.For details of the problem and the consequences, see here:  SR0014: Data loss might occur when casting from {Type1} to {Type2} . For any experienced Database Developer, this is a more frightening read than a Vampire Story. This is why one of the SQL Code Smells that makes me edgy, in my own or other peoples’ code, is to see parameters, variables and columns that have the same names and different datatypes. Whereas quite a lot of this is perfectly normal and natural, you need to check in case one of two things have gone wrong. Either sloppy naming, or mixed datatypes. Sure it is hard to remember whether you decided that the length of a log entry was 80 or 100 characters long, or the precision of a number. That is why a little check like this I’m going to show you is excellent for tidying up your code before you check it back into source Control! 1/ Checking Parameters only If you were just going to check parameters, you might just do this. It simply groups all the parameters, either input or output, of all the routines (e.g. stored procedures or functions) by their name and checks to see, in the HAVING clause, whether their data types are all the same. If not, it lists all the examples and their origin (the routine) Even this little check can occasionally be scarily revealing. ;WITH userParameter AS  ( SELECT   c.NAME AS ParameterName,  OBJECT_SCHEMA_NAME(c.object_ID) + '.' + OBJECT_NAME(c.object_ID) AS ObjectName,  t.name + ' '     + CASE     --we may have to put in the length            WHEN t.name IN ('char', 'varchar', 'nchar', 'nvarchar')             THEN '('               + CASE WHEN c.max_length = -1 THEN 'MAX'                ELSE CONVERT(VARCHAR(4),                    CASE WHEN t.name IN ('nchar', 'nvarchar')                      THEN c.max_length / 2 ELSE c.max_length                    END)                END + ')'         WHEN t.name IN ('decimal', 'numeric')             THEN '(' + CONVERT(VARCHAR(4), c.precision)                   + ',' + CONVERT(VARCHAR(4), c.Scale) + ')'         ELSE ''      END  --we've done with putting in the length      + CASE WHEN XML_collection_ID <> 0         THEN --deal with object schema names             '(' + CASE WHEN is_XML_Document = 1                    THEN 'DOCUMENT '                    ELSE 'CONTENT '                   END              + COALESCE(               (SELECT QUOTENAME(ss.name) + '.' + QUOTENAME(sc.name)                FROM sys.xml_schema_collections sc                INNER JOIN Sys.Schemas ss ON sc.schema_ID = ss.schema_ID                WHERE sc.xml_collection_ID = c.XML_collection_ID),'NULL') + ')'          ELSE ''         END        AS [DataType]  FROM sys.parameters c  INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID  WHERE OBJECT_SCHEMA_NAME(c.object_ID) <> 'sys'   AND parameter_id>0)SELECT CONVERT(CHAR(80),objectName+'.'+ParameterName),DataType FROM UserParameterWHERE ParameterName IN   (SELECT ParameterName FROM UserParameter    GROUP BY ParameterName    HAVING MIN(Datatype)<>MAX(DataType))ORDER BY ParameterName   so, in a very small example here, we have a @ClosingDelimiter variable that is only CHAR(1) when, by the looks of it, it should be up to ten characters long, or even worse, a function that should be a char(1) and seems to let in a string of ten characters. Worth investigating. Then we have a @Comment variable that can't decide whether it is a VARCHAR(2000) or a VARCHAR(MAX) 2/ Columns and Parameters Actually, once we’ve cleared up the mess we’ve made of our parameter-naming in the database we’re inspecting, we’re going to be more interested in listing both columns and parameters. We can do this by modifying the routine to list columns as well as parameters. Because of the slight complexity of creating the string version of the datatypes, we will create a fake table of both columns and parameters so that they can both be processed the same way. After all, we want the datatypes to match Unfortunately, parameters do not expose all the attributes we are interested in, such as whether they are nullable (oh yes, subtle bugs happen if this isn’t consistent for a datatype). We’ll have to leave them out for this check. Voila! A slight modification of the first routine ;WITH userObject AS  ( SELECT   Name AS DataName,--the actual name of the parameter or column ('@' removed)  --and the qualified object name of the routine  OBJECT_SCHEMA_NAME(ObjectID) + '.' + OBJECT_NAME(ObjectID) AS ObjectName,  --now the harder bit: the definition of the datatype.  TypeName + ' '     + CASE     --we may have to put in the length. e.g. CHAR (10)           WHEN TypeName IN ('char', 'varchar', 'nchar', 'nvarchar')             THEN '('               + CASE WHEN MaxLength = -1 THEN 'MAX'                ELSE CONVERT(VARCHAR(4),                    CASE WHEN TypeName IN ('nchar', 'nvarchar')                      THEN MaxLength / 2 ELSE MaxLength                    END)                END + ')'         WHEN TypeName IN ('decimal', 'numeric')--a BCD number!             THEN '(' + CONVERT(VARCHAR(4), Precision)                   + ',' + CONVERT(VARCHAR(4), Scale) + ')'         ELSE ''      END  --we've done with putting in the length      + CASE WHEN XML_collection_ID <> 0 --tush tush. XML         THEN --deal with object schema names             '(' + CASE WHEN is_XML_Document = 1                    THEN 'DOCUMENT '                    ELSE 'CONTENT '                   END              + COALESCE(               (SELECT TOP 1 QUOTENAME(ss.name) + '.' + QUOTENAME(sc.Name)                FROM sys.xml_schema_collections sc                INNER JOIN Sys.Schemas ss ON sc.schema_ID = ss.schema_ID                WHERE sc.xml_collection_ID = XML_collection_ID),'NULL') + ')'          ELSE ''         END        AS [DataType],       DataObjectType  FROM   (Select t.name AS TypeName, REPLACE(c.name,'@','') AS Name,          c.max_length AS MaxLength, c.precision AS [Precision],           c.scale AS [Scale], c.[Object_id] AS ObjectID, XML_collection_ID,          is_XML_Document,'P' AS DataobjectType  FROM sys.parameters c  INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID  AND parameter_id>0  UNION all  Select t.name AS TypeName, c.name AS Name, c.max_length AS MaxLength,          c.precision AS [Precision], c.scale AS [Scale],          c.[Object_id] AS ObjectID, XML_collection_ID,is_XML_Document,          'C' AS DataobjectType            FROM sys.columns c  INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID   WHERE OBJECT_SCHEMA_NAME(c.object_ID) <> 'sys'  )f)SELECT CONVERT(CHAR(80),objectName+'.'   + CASE WHEN DataobjectType ='P' THEN '@' ELSE '' END + DataName),DataType FROM UserObjectWHERE DataName IN   (SELECT DataName FROM UserObject   GROUP BY DataName    HAVING MIN(Datatype)<>MAX(DataType))ORDER BY DataName     Hmm. I can tell you I found quite a few minor issues with the various tabases I tested this on, and found some potential bugs that really leap out at you from the results. Here is the start of the result for AdventureWorks. Yes, AccountNumber is, for some reason, a Varchar(10) in the Customer table. Hmm. odd. Why is a city fifty characters long in that view?  The idea of the description of a colour being 256 characters long seems over-ambitious. Go down the list and you'll spot other mistakes. There are no bugs, but just mess. We started out with a listing to examine parameters, then we mixed parameters and columns. Our last listing is for a slightly more in-depth look at table columns. You’ll notice that we’ve delibarately removed the indication of whether a column is persisted, or is an identity column because that gives us false positives for our code smells. If you just want to browse your metadata for other reasons (and it can quite help in some circumstances) then uncomment them! ;WITH userColumns AS  ( SELECT   c.NAME AS columnName,  OBJECT_SCHEMA_NAME(c.object_ID) + '.' + OBJECT_NAME(c.object_ID) AS ObjectName,  REPLACE(t.name + ' '   + CASE WHEN is_computed = 1 THEN ' AS ' + --do DDL for a computed column          (SELECT definition FROM sys.computed_columns cc           WHERE cc.object_id = c.object_id AND cc.column_ID = c.column_ID)     --we may have to put in the length            WHEN t.Name IN ('char', 'varchar', 'nchar', 'nvarchar')             THEN '('               + CASE WHEN c.Max_Length = -1 THEN 'MAX'                ELSE CONVERT(VARCHAR(4),                    CASE WHEN t.Name IN ('nchar', 'nvarchar')                      THEN c.Max_Length / 2 ELSE c.Max_Length                    END)                END + ')'       WHEN t.name IN ('decimal', 'numeric')       THEN '(' + CONVERT(VARCHAR(4), c.precision) + ',' + CONVERT(VARCHAR(4), c.Scale) + ')'       ELSE ''      END + CASE WHEN c.is_rowguidcol = 1          THEN ' ROWGUIDCOL'          ELSE ''         END + CASE WHEN XML_collection_ID <> 0            THEN --deal with object schema names             '(' + CASE WHEN is_XML_Document = 1                THEN 'DOCUMENT '                ELSE 'CONTENT '               END + COALESCE((SELECT                QUOTENAME(ss.name) + '.' + QUOTENAME(sc.name)                FROM                sys.xml_schema_collections sc                INNER JOIN Sys.Schemas ss ON sc.schema_ID = ss.schema_ID                WHERE                sc.xml_collection_ID = c.XML_collection_ID),                'NULL') + ')'            ELSE ''           END + CASE WHEN is_identity = 1             THEN CASE WHEN OBJECTPROPERTY(object_id,                'IsUserTable') = 1 AND COLUMNPROPERTY(object_id,                c.name,                'IsIDNotForRepl') = 0 AND OBJECTPROPERTY(object_id,                'IsMSShipped') = 0                THEN ''                ELSE ' NOT FOR REPLICATION '               END             ELSE ''            END + CASE WHEN c.is_nullable = 0               THEN ' NOT NULL'               ELSE ' NULL'              END + CASE                WHEN c.default_object_id <> 0                THEN ' DEFAULT ' + object_Definition(c.default_object_id)                ELSE ''               END + CASE                WHEN c.collation_name IS NULL                THEN ''                WHEN c.collation_name <> (SELECT                collation_name                FROM                sys.databases                WHERE                name = DB_NAME()) COLLATE Latin1_General_CI_AS                THEN COALESCE(' COLLATE ' + c.collation_name,                '')                ELSE ''                END,'  ',' ') AS [DataType]FROM sys.columns c  INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID  WHERE OBJECT_SCHEMA_NAME(c.object_ID) <> 'sys')SELECT CONVERT(CHAR(80),objectName+'.'+columnName),DataType FROM UserColumnsWHERE columnName IN (SELECT columnName FROM UserColumns  GROUP BY columnName  HAVING MIN(Datatype)<>MAX(DataType))ORDER BY columnName If you take a look down the results against Adventureworks, you'll see once again that there are things to investigate, mostly, in the illustration, discrepancies between null and non-null datatypes So I here you ask, what about temporary variables within routines? If ever there was a source of elusive bugs, you'll find it there. Sadly, these temporary variables are not stored in the metadata so we'll have to find a more subtle way of flushing these out, and that will, I'm afraid, have to wait!

    Read the article

  • Curing the Database-Application mismatch

    - by Phil Factor
    If an application requires access to a database, then you have to be able to deploy it so as to be version-compatible with the database, in phase. If you can deploy both together, then the application and database must normally be deployed at the same version in which they, together, passed integration and functional testing.  When a single database supports more than one application, then the problem gets more interesting. I’ll need to be more precise here. It is actually the application-interface definition of the database that needs to be in a compatible ‘version’.  Most databases that get into production have no separate application-interface; in other words they are ‘close-coupled’.  For this vast majority, the whole database is the application-interface, and applications are free to wander through the bowels of the database scot-free.  If you’ve spurned the perceived wisdom of application architects to have a defined application-interface within the database that is based on views and stored procedures, any version-mismatch will be as sensitive as a kitten.  A team that creates an application that makes direct access to base tables in a database will have to put a lot of energy into keeping Database and Application in sync, to say nothing of having to tackle issues such as security and audit. It is not the obvious route to development nirvana. I’ve been in countless tense meetings with application developers who initially bridle instinctively at the apparent restrictions of being ‘banned’ from the base tables or routines of a database.  There is no good technical reason for needing that sort of access that I’ve ever come across.  Everything that the application wants can be delivered via a set of views and procedures, and with far less pain for all concerned: This is the application-interface.  If more than zero developers are creating a database-driven application, then the project will benefit from the loose-coupling that an application interface brings. What is important here is that the database development role is separated from the application development role, even if it is the same developer performing both roles. The idea of an application-interface with a database is as old as I can remember. The big corporate or government databases generally supported several applications, and there was little option. When a new application wanted access to an existing corporate database, the developers, and myself as technical architect, would have to meet with hatchet-faced DBAs and production staff to work out an interface. Sure, they would talk up the effort involved for budgetary reasons, but it was routine work, because it decoupled the database from its supporting applications. We’d be given our own stored procedures. One of them, I still remember, had ninety-two parameters. All database access was encapsulated in one application-module. If you have a stable defined application-interface with the database (Yes, one for each application usually) you need to keep the external definitions of the components of this interface in version control, linked with the application source,  and carefully track and negotiate any changes between database developers and application developers.  Essentially, the application development team owns the interface definition, and the onus is on the Database developers to implement it and maintain it, in conformance.  Internally, the database can then make all sorts of changes and refactoring, as long as source control is maintained.  If the application interface passes all the comprehensive integration and functional tests for the particular version they were designed for, nothing is broken. Your performance-testing can ‘hang’ on the same interface, since databases are judged on the performance of the application, not an ‘internal’ database process. The database developers have responsibility for maintaining the application-interface, but not its definition,  as they refactor the database. This is easily tested on a daily basis since the tests are normally automated. In this setting, the deployment can proceed if the more stable application-interface, rather than the continuously-changing database, passes all tests for the version of the application. Normally, if all goes well, a database with a well-designed application interface can evolve gracefully without changing the external appearance of the interface, and this is confirmed by integration tests that check the interface, and which hopefully don’t need to be altered at all often.  If the application is rapidly changing its ‘domain model’  in the light of an increased understanding of the application domain, then it can change the interface definitions and the database developers need only implement the interface rather than refactor the underlying database.  The test team will also have to redo the functional and integration tests which are, of course ‘written to’ the definition.  The Database developers will find it easier if these tests are done before their re-wiring  job to implement the new interface. If, at the other extreme, an application receives no further development work but survives unchanged, the database can continue to change and develop to keep pace with the requirements of the other applications it supports, and needs only to take care that the application interface is never broken. Testing is easy since your automated scripts to test the interface do not need to change. The database developers will, of course, maintain their own source control for the database, and will be likely to maintain versions for all major releases. However, this will not need to be shared with the applications that the database servers. On the other hand, the definition of the application interfaces should be within the application source. Changes in it have to be subject to change-control procedures, as they will require a chain of tests. Once you allow, instead of an application-interface, an intimate relationship between application and database, we are in the realms of impedance mismatch, over and above the obvious security problems.  Part of this impedance problem is a difference in development practices. Whereas the application has to be regularly built and integrated, this isn’t necessarily the case with the database.  An RDBMS is inherently multi-user and self-integrating. If the developers work together on the database, then a subsequent integration of the database on a staging server doesn’t often bring nasty surprises. A separate database-integration process is only needed if the database is deliberately built in a way that mimics the application development process, but which hampers the normal database-development techniques.  This process is like demanding a official walking with a red flag in front of a motor car.  In order to closely coordinate databases with applications, entire databases have to be ‘versioned’, so that an application version can be matched with a database version to produce a working build without errors.  There is no natural process to ‘version’ databases.  Each development project will have to define a system for maintaining the version level. A curious paradox occurs in development when there is no formal application-interface. When the strains and cracks happen, the extra meetings, bureaucracy, and activity required to maintain accurate deployments looks to IT management like work. They see activity, and it looks good. Work means progress.  Management then smile on the design choices made. In IT, good design work doesn’t necessarily look good, and vice versa.

    Read the article

  • Source-control 'wet-work'?

    - by Phil Factor
    When a design or creative work is flawed beyond remedy, it is often best to destroy it and start again. The other day, I lost the code to a long and intricate SQL batch I was working on. I’d thought it was impossible, but it happened. With all the technology around that is designed to prevent this occurring, this sort of accident has become a rare event.  If it weren’t for a deranged laptop, and my distraction, the code wouldn’t have been lost this time.  As always, I sighed, had a soothing cup of tea, and typed it all in again.  The new code I hastily tapped in  was much better: I’d held in my head the essence of how the code should work rather than the details: I now knew for certain  the start point, the end, and how it should be achieved. Instantly the detritus of half-baked thoughts fell away and I was able to write logical code that performed better.  Because I could work so quickly, I was able to hold the details of all the columns and variables in my head, and the dynamics of the flow of data. It was, in fact, easier and quicker to start from scratch rather than tidy up and refactor the existing code with its inevitable fumbling and half-baked ideas. What a shame that technology is now so good that developers rarely experience the cleansing shock of losing one’s code and having to rewrite it from scratch.  If you’ve never accidentally lost  your code, then it is worth doing it deliberately once for the experience. Creative people have, until Technology mistakenly prevented it, torn up their drafts or sketches, threw them in the bin, and started again from scratch.  Leonardo’s obsessive reworking of the Mona Lisa was renowned because it was so unusual:  Most artists have been utterly ruthless in destroying work that didn’t quite make it. Authors are particularly keen on writing afresh, and the results are generally positive. Lawrence of Arabia actually lost the entire 250,000 word manuscript of ‘The Seven Pillars of Wisdom’ by accidentally leaving it on a train at Reading station, before rewriting a much better version.  Now, any writer or artist is seduced by technology into altering or refining their work rather than casting it dramatically in the bin or setting a light to it on a bonfire, and rewriting it from the blank page.  It is easy to pick away at a flawed work, but the real creative process is far more brutal. Once, many years ago whilst running a software house that supplied commercial software to local businesses, I’d been supervising an accounting system for a farming cooperative. No packaged system met their needs, and it was all hand-cut code.  For us, it represented a breakthrough as it was for a government organisation, and success would guarantee more contracts. As you’ve probably guessed, the code got mangled in a disk crash just a week before the deadline for delivery, and the many backups all proved to be entirely corrupted by a faulty tape drive.  There were some fragments left on individual machines, but they were all of different versions.  The developers were in despair.  Strangely, I managed to re-write the bulk of a three-month project in a manic and caffeine-soaked weekend.  Sure, that elegant universally-applicable input-form routine was‘nt quite so elegant, but it didn’t really need to be as we knew what forms it needed to support.  Yes, the code lacked architectural elegance and reusability. By dawn on Monday, the application passed its integration tests. The developers rose to the occasion after I’d collapsed, and tidied up what I’d done, though they were reproachful that some of the style and elegance had gone out of the application. By the delivery date, we were able to install it. It was a smaller, faster application than the beta they’d seen and the user-interface had a new, rather Spartan, appearance that we swore was done to conform to the latest in user-interface guidelines. (we switched to Helvetica font to look more ‘Bauhaus’ ). The client was so delighted that he forgave the new bugs that had crept in. I still have the disk that crashed, up in the attic. In IT, we have had mixed experiences from complete re-writes. Lotus 123 never really recovered from a complete rewrite from assembler into C, Borland made the mistake with Arago and Quattro Pro  and Netscape’s complete rewrite of their Navigator 4 browser was a white-knuckle ride. In all cases, the decision to rewrite was a result of extreme circumstances where no other course of action seemed possible.   The rewrite didn’t come out of the blue. I prefer to remember the rewrite of Minix by young Linus Torvalds, or the rewrite of Bitkeeper by a slightly older Linus.  The rewrite of CP/M didn’t do too badly either, did it? Come to think of it, the guy who decided to rewrite the windowing system of the Xerox Star never regretted the decision. I’ll agree that one should often resist calls for a rewrite. One of the worst habits of the more inexperienced programmer is to denigrate whatever code he or she inherits, and then call loudly for a complete rewrite. They are buoyed up by the mistaken belief that they can do better. This, however, is a different psychological phenomenon, more related to the idea of some motorcyclists that they are operating on infinite lives, or the occasional squaddies that if they charge the machine-guns determinedly enough all will be well. Grim experience brings out the humility in any experienced programmer.  I’m referring to quite different circumstances here. Where a team knows the requirements perfectly, are of one mind on methodology and coding standards, and they already have a solution, then what is wrong with considering  a complete rewrite? Rewrites are so painful in the early stages, until that point where one realises the payoff, that even I quail at the thought. One needs a natural disaster to push one over the edge. The trouble is that source-control systems, and disaster recovery systems, are just too good nowadays.   If I were to lose this draft of this very blog post, I know I’d rewrite it much better. However, if you read this, you’ll know I didn’t have the nerve to delete it and start again.  There was a time that one prayed that unreliable hardware would deliver you from an unmaintainable mess of a codebase, but now technology has made us almost entirely immune to such a merciful act of God. An old friend of mine with long experience in the software industry has long had the idea of the ‘source-control wet-work’,  where one hires a malicious hacker in some wild eastern country to hack into one’s own  source control system to destroy all trace of the source to an application. Alas, backup systems are just too good to make this any more than a pipedream. Somehow, it would be difficult to promote the idea. As an alternative, could one construct a source control system that, on doing all the code-quality metrics, would systematically destroy all trace of source code that failed the quality test? Alas, I can’t see many managers buying into the idea. In reading the full story of the near-loss of Toy Story 2, it set me thinking. It turned out that the lucky restoration of the code wasn’t the happy ending one first imagined it to be, because they eventually came to the conclusion that the plot was fundamentally flawed and it all had to be rewritten anyway.  Was this an early  case of the ‘source-control wet-job’?’ It is very hard nowadays to do a rapid U-turn in a development project because we are far too prone to cling to our existing source-code.

    Read the article

  • A Slice of Raspberry Pi

    - by Phil Factor
    Guest editorial for the ITPro/SysAdmin newsletter The Raspberry Pi Foundation has done a superb design job on their new $35 network-enabled Linux computer. This tiny machine, incorporating an ARM processor on a Broadcom BCM2835 multimedia chip, aims to put the fun back into learning computing. The public response has been overwhelmingly positive.Note that aim: "…to put the fun back". Education in Information Technology is in dire straits. It always has been, but seems to have deteriorated further still, even in the face of improved provision of equipment.In many countries, the government controls the curriculum. It predicted a shortage in office-based IT skills, and so geared the ICT curriculum toward mind-numbing training in word-processing and spreadsheet skills. Instead, the shortage has turned out to be in people with an engineering-mindset, who can solve problems with whatever technologies are available and learn new techniques quickly, in a rapidly-changing field.In retrospect, the assumption that specific training was required rather than an education was an idiotic response to the arrival of mainstream information technology. As a result, ICT became a disaster area, which discouraged a generation of youngsters from a career in IT, and thereby led directly to the shortage of people with the skills that are required to exploit the potential of Information Technology..Raspberry Pi aims to reverse the trend. This is a rig that is geared to fast graphics in high resolution. It is no toy. It should be a superb games machine. However, the use of Fedora, Debian, or Arch Linux ARM shows the more serious educational intent behind the Foundation's work. It looks like it will even do some office work too!So, get hold of any power supply that provides a 5VDC source at the required 700mA; an old Blackberry charger will do or, alternatively, it will run off four AA cells. You'll need a USB hub to support the mouse and keyboard, and maybe a hard drive. You'll want a DVI monitor (with audio out) or TV (sound and video). You'll also need to be able to cope with wired Ethernet 10/100, if you want networking.With this lot assembled, stick the paraphernalia on the back of the HDTV with Blu Tack, get a nice keyboard, and you have a classy Linux-based home computer. The major cost is in the T.V and the keyboard. If you're not already writing software for this platform, then maybe, at a time when some countries are talking of orders in the millions, you should consider it.

    Read the article

  • The Presentation Isn't Over Until It's Over

    - by Phil Factor
    The senior corporate dignitaries settled into their seats looking important in a blue-suited sort of way. The lights dimmed as I strode out in front to give my presentation.  I had ten vital minutes to make my pitch.  I was about to dazzle the top management of a large software company who were considering the purchase of my software product. I would present them with a dazzling synthesis of diagrams, graphs, followed by  a live demonstration of my software projected from my laptop.  My preparation had been meticulous: It had to be: A year’s hard work was at stake, so I’d prepared it to perfection.  I stood up and took them all in, with a gaze of sublime confidence. Then the laptop expired. There are several possible alternative plans of action when this happens     A. Stare at the smoking laptop vacuously, flapping ones mouth slowly up and down     B. Stand frozen like a statue, locked in indecision between fright and flight.     C. Run out of the room, weeping     D. Pretend that this was all planned     E. Abandon the presentation in favour of a stilted and tedious dissertation about the software     F. Shake your fist at the sky, and curse the sense of humour of your preferred deity I started for a few seconds on plan B, normally referred to as the ‘Rabbit in the headlamps of the car’ technique. Suddenly, a little voice inside my head spoke. It spoke the famous inane words of Yogi Berra; ‘The game isn't over until it's over.’ ‘Too right’, I thought. What to do? I ran through the alternatives A-F inclusive in my mind but none appealed to me. I was completely unprepared for this. Nowadays, longevity has since taught me more than I wanted to know about the wacky sense of humour of fate, and I would have taken two laptops. I hadn’t, but decided to do the presentation anyway as planned. I started out ignoring the dead laptop, but pretending, instead that it was still working. The audience looked startled. They were expecting plan B to be succeeded by plan C, I suspect. They weren’t used to denial on this scale. After my introductory talk, which didn’t require any visuals, I came to the diagram that described the application I’d written.  I’d taken ages over it and it was hot stuff. Well, it would have been had it been projected onto the screen. It wasn’t. Before I describe what happened then, I must explain that I have thespian tendencies.  My  triumph as Professor Higgins in My Fair Lady at the local operatic society is now long forgotten, but I remember at the time of my finest performance, the moment that, glancing up over the vast audience of  moist-eyed faces at the during the poignant  scene between Eliza and Higgins at the end, I  realised that I had a talent that one day could possibly  be harnessed for commercial use I just talked about the diagram as if it was there, but throwing in some extra description. The audience nodded helpfully when I’d done enough. Emboldened, I began a sort of mime, well, more of a ballet, to represent each slide as I came to it. Heaven knows I’d done my preparation and, in my mind’s eye, I could see every detail, but I had to somehow project the reality of that vision to the audience, much the same way any actor playing Macbeth should do the ghost of Banquo.  My desperation gave me a manic energy. If you’ve ever demonstrated a windows application entirely by mime, gesture and florid description, you’ll understand the scale of the challenge, but then I had nothing to lose. With a brief sentence of description here and there, and arms flailing whilst outlining the size and shape of  graphs and diagrams, I used the many tricks of mime, gesture and body-language  learned from playing Captain Hook, or the Sheriff of Nottingham in pantomime. I set out determinedly on my desperate venture. There wasn’t time to do anything but focus on the challenge of the task: the world around me narrowed down to ten faces and my presentation: ten souls who had to be hypnotized into seeing a Windows application:  one that was slick, well organized and functional I don’t remember the details. Eight minutes of my life are gone completely. I was a thespian berserker.  I know however that I followed the basic plan of building the presentation in a carefully controlled crescendo until the dazzling finale where the results were displayed on-screen.  ‘And here you see the results, neatly formatted and grouped carefully to enhance the significance of the figures, together with running trend-graphs!’ I waved a mime to signify an animated  window-opening, and looked up, in my first pause, to gaze defiantly  at the audience.  It was a sight I’ll never forget. Ten pairs of eyes were gazing in rapt attention at the imaginary window, and several pairs of eyes were glancing at the imaginary graphs and figures.  I hadn’t had an audience like that since my starring role in  Beauty and the Beast.  At that moment, I realized that my desperate ploy might work. I sat down, slightly winded, when my ten minutes were up.  For the first and last time in my life, the audience of a  ‘PowerPoint’ presentation burst into spontaneous applause. ‘Any questions?’ ‘Yes,  Have you got an agent?’ Yes, in case you’re wondering, I got the deal. They bought the software product from me there and then. However, it was a life-changing experience for me and I have never ever again trusted technology as part of a presentation.  Even if things can’t go wrong, they’ll go wrong and they’ll kill the flow of what you’re presenting.  if you can’t do something without the techno-props, then you shouldn’t do it.  The greatest lesson of all is that great presentations require preparation and  ‘stage-presence’ rather than fancy graphics. They’re a great supporting aid, but they should never dominate to the point that you’re lost without them.

    Read the article

  • Source-control 'wet-work'?

    - by Phil Factor
    When a design or creative work is flawed beyond remedy, it is often best to destroy it and start again. The other day, I lost the code to a long and intricate SQL batch I was working on. I’d thought it was impossible, but it happened. With all the technology around that is designed to prevent this occurring, this sort of accident has become a rare event.  If it weren’t for a deranged laptop, and my distraction, the code wouldn’t have been lost this time.  As always, I sighed, had a soothing cup of tea, and typed it all in again.  The new code I hastily tapped in  was much better: I’d held in my head the essence of how the code should work rather than the details: I now knew for certain  the start point, the end, and how it should be achieved. Instantly the detritus of half-baked thoughts fell away and I was able to write logical code that performed better.  Because I could work so quickly, I was able to hold the details of all the columns and variables in my head, and the dynamics of the flow of data. It was, in fact, easier and quicker to start from scratch rather than tidy up and refactor the existing code with its inevitable fumbling and half-baked ideas. What a shame that technology is now so good that developers rarely experience the cleansing shock of losing one’s code and having to rewrite it from scratch.  If you’ve never accidentally lost  your code, then it is worth doing it deliberately once for the experience. Creative people have, until Technology mistakenly prevented it, torn up their drafts or sketches, threw them in the bin, and started again from scratch.  Leonardo’s obsessive reworking of the Mona Lisa was renowned because it was so unusual:  Most artists have been utterly ruthless in destroying work that didn’t quite make it. Authors are particularly keen on writing afresh, and the results are generally positive. Lawrence of Arabia actually lost the entire 250,000 word manuscript of ‘The Seven Pillars of Wisdom’ by accidentally leaving it on a train at Reading station, before rewriting a much better version.  Now, any writer or artist is seduced by technology into altering or refining their work rather than casting it dramatically in the bin or setting a light to it on a bonfire, and rewriting it from the blank page.  It is easy to pick away at a flawed work, but the real creative process is far more brutal. Once, many years ago whilst running a software house that supplied commercial software to local businesses, I’d been supervising an accounting system for a farming cooperative. No packaged system met their needs, and it was all hand-cut code.  For us, it represented a breakthrough as it was for a government organisation, and success would guarantee more contracts. As you’ve probably guessed, the code got mangled in a disk crash just a week before the deadline for delivery, and the many backups all proved to be entirely corrupted by a faulty tape drive.  There were some fragments left on individual machines, but they were all of different versions.  The developers were in despair.  Strangely, I managed to re-write the bulk of a three-month project in a manic and caffeine-soaked weekend.  Sure, that elegant universally-applicable input-form routine was‘nt quite so elegant, but it didn’t really need to be as we knew what forms it needed to support.  Yes, the code lacked architectural elegance and reusability. By dawn on Monday, the application passed its integration tests. The developers rose to the occasion after I’d collapsed, and tidied up what I’d done, though they were reproachful that some of the style and elegance had gone out of the application. By the delivery date, we were able to install it. It was a smaller, faster application than the beta they’d seen and the user-interface had a new, rather Spartan, appearance that we swore was done to conform to the latest in user-interface guidelines. (we switched to Helvetica font to look more ‘Bauhaus’ ). The client was so delighted that he forgave the new bugs that had crept in. I still have the disk that crashed, up in the attic. In IT, we have had mixed experiences from complete re-writes. Lotus 123 never really recovered from a complete rewrite from assembler into C, Borland made the mistake with Arago and Quattro Pro  and Netscape’s complete rewrite of their Navigator 4 browser was a white-knuckle ride. In all cases, the decision to rewrite was a result of extreme circumstances where no other course of action seemed possible.   The rewrite didn’t come out of the blue. I prefer to remember the rewrite of Minix by young Linus Torvalds, or the rewrite of Bitkeeper by a slightly older Linus.  The rewrite of CP/M didn’t do too badly either, did it? Come to think of it, the guy who decided to rewrite the windowing system of the Xerox Star never regretted the decision. I’ll agree that one should often resist calls for a rewrite. One of the worst habits of the more inexperienced programmer is to denigrate whatever code he or she inherits, and then call loudly for a complete rewrite. They are buoyed up by the mistaken belief that they can do better. This, however, is a different psychological phenomenon, more related to the idea of some motorcyclists that they are operating on infinite lives, or the occasional squaddies that if they charge the machine-guns determinedly enough all will be well. Grim experience brings out the humility in any experienced programmer.  I’m referring to quite different circumstances here. Where a team knows the requirements perfectly, are of one mind on methodology and coding standards, and they already have a solution, then what is wrong with considering  a complete rewrite? Rewrites are so painful in the early stages, until that point where one realises the payoff, that even I quail at the thought. One needs a natural disaster to push one over the edge. The trouble is that source-control systems, and disaster recovery systems, are just too good nowadays.   If I were to lose this draft of this very blog post, I know I’d rewrite it much better. However, if you read this, you’ll know I didn’t have the nerve to delete it and start again.  There was a time that one prayed that unreliable hardware would deliver you from an unmaintainable mess of a codebase, but now technology has made us almost entirely immune to such a merciful act of God. An old friend of mine with long experience in the software industry has long had the idea of the ‘source-control wet-work’,  where one hires a malicious hacker in some wild eastern country to hack into one’s own  source control system to destroy all trace of the source to an application. Alas, backup systems are just too good to make this any more than a pipedream. Somehow, it would be difficult to promote the idea. As an alternative, could one construct a source control system that, on doing all the code-quality metrics, would systematically destroy all trace of source code that failed the quality test? Alas, I can’t see many managers buying into the idea. In reading the full story of the near-loss of Toy Story 2, it set me thinking. It turned out that the lucky restoration of the code wasn’t the happy ending one first imagined it to be, because they eventually came to the conclusion that the plot was fundamentally flawed and it all had to be rewritten anyway.  Was this an early  case of the ‘source-control wet-job’?’ It is very hard nowadays to do a rapid U-turn in a development project because we are far too prone to cling to our existing source-code.

    Read the article

  • Normalisation and 'Anima notitia copia' (Soul of the Database)

    - by Phil Factor
    (A Guest Editorial for Simple-Talk) The other day, I was staring  at the sys.syslanguages  table in SQL Server with slightly-raised eyebrows . I’d just been reading Chris Date’s  interesting book ‘SQL and Relational Theory’. He’d made the point that you’re not necessarily doing relational database operations by using a SQL Database product.  The same general point was recently made by Dino Esposito about ASP.NET MVC.  The use of ASP.NET MVC doesn’t guarantee you a good application design: It merely makes it possible to test it. The way I’d describe the sentiment in both cases is ‘you can hit someone over the head with a frying-pan but you can’t call it cooking’. SQL enables you to create relational databases. However,  even if it smells bad, it is no crime to do hideously un-relational things with a SQL Database just so long as it’s necessary and you can tell the difference; not only that but also only if you’re aware of the risks and implications. Naturally, I’ve never knowingly created a database that Codd would have frowned at, but around the edges are interfaces and data feeds I’ve written  that have caused hissy fits amongst the Normalisation fundamentalists. Part of the problem for those who agonise about such things  is the misinterpretation of Atomicity.  An atomic value is one for which, in the strange virtual universe you are creating in your database, you don’t have any interest in any of its component parts.  If you aren’t interested in the electrons, neutrinos,  muons,  or  taus, then  an atom is ..er.. atomic. In the same way, if you are passed a JSON string or XML, and required to store it in a database, then all you need to do is to ask yourself, in your role as Anima notitia copia (Soul of the database) ‘have I any interest in the contents of this item of information?’.  If the answer is ‘No!’, or ‘nequequam! Then it is an atomic value, however complex it may be.  After all, you would never have the urge to store the pixels of images individually, under the misguided idea that these are the atomic values would you?  I would, of course,  ask the ‘Anima notitia copia’ rather than the application developers, since there may be more than one application, and the applications developers may be designing the application in the absence of full domain knowledge, (‘or by the seat of the pants’ as the technical term used to be). If, on the other hand, the answer is ‘sure, and we want to index the XML column’, then we may be in for some heavy XML-shredding sessions to get to store the ‘atomic’ values and ensure future harmony as the application develops. I went back to looking at the sys.syslanguages table. It has a months column with the months in a delimited list January,February,March,April,May,June,July,August,September,October,November,December This is an ordered list. Wicked? I seem to remember that this value, like shortmonths and days, is treated as a ‘thing’. It is merely passed off to an external  C++ routine in order to format a date in a particular language, and never accessed directly within the database. As far as the database is concerned, it is an atomic value.  There is more to normalisation than meets the eye.

    Read the article

  • A Knights Tale

    - by Phil Factor
    There are so many lessons to be learned from the story of Knight Capital losing nearly half a billion dollars as a result of a deployment gone wrong. The Knight Capital Group (KCG N) was an American global financial services firm engaging in market making, electronic execution, and institutional sales and trading. According to the recent order (File No.3.15570) against Knight Capital by U.S. Securities and Exchange Commission?, Knight had, for many years used some software which broke up incoming “parent” orders into smaller “child” orders that were then transmitted to various exchanges or trading venues for execution. A tracking ‘cumulative quantity’ function counted the number of ‘child’ orders and stopped the process once the total of child orders matched the ‘parent’ and so the parent order had been completed. Back in the mists of time, some code had been added to it  which was excuted if a particular flag was set. It was called ‘power peg’ and seems to have had a similar design and purpose, but, one guesses, would have shared the same tracking function. This code had been abandoned in 2003, but never deleted. In 2005, The tracking function was moved to an earlier point in the main process. It would seem from the account that, from that point, had that flag ever been set, the old ‘Power Peg’ would have been executed like Godzilla bursting from the ice, making child orders without limit without any tracking function. It wasn’t, presumably because the software that set the flag was removed. In 2012, nearly a decade after ‘Power Peg’ was abandoned, Knight prepared a new module to their software to cope with the imminent Retail Liquidity Program (RLP) for the New York Stock Exchange. By this time, the flag had remained unused and someone made the fateful decision to reuse it, and replace the old ‘power peg’ code with this new RLP code. Had the two actions been done together in a single automated deployment, and the new deployment tested, all would have been well. It wasn’t. To quote… “Beginning on July 27, 2012, Knight deployed the new RLP code in SMARS in stages by placing it on a limited number of servers in SMARS on successive days. During the deployment of the new code, however, one of Knight’s technicians did not copy the new code to one of the eight SMARS computer servers. Knight did not have a second technician review this deployment and no one at Knight realized that the Power Peg code had not been removed from the eighth server, nor the new RLP code added. Knight had no written procedures that required such a review.” (para 15) “On August 1, Knight received orders from broker-dealers whose customers were eligible to participate in the RLP. The seven servers that received the new code processed these orders correctly. However, orders sent with the repurposed flag to the eighth server triggered the defective Power Peg code still present on that server. As a result, this server began sending child orders to certain trading centers for execution. Because the cumulative quantity function had been moved, this server continuously sent child orders, in rapid sequence, for each incoming parent order without regard to the number of share executions Knight had already received from trading centers. Although one part of Knight’s order handling system recognized that the parent orders had been filled, this information was not communicated to SMARS.” (para 16) SMARS routed millions of orders into the market over a 45-minute period, and obtained over 4 million executions in 154 stocks for more than 397 million shares. By the time that Knight stopped sending the orders, Knight had assumed a net long position in 80 stocks of approximately $3.5 billion and a net short position in 74 stocks of approximately $3.15 billion. Knight’s shares dropped more than 20% after traders saw extreme volume spikes in a number of stocks, including preferred shares of Wells Fargo (JWF) and semiconductor company Spansion (CODE). Both stocks, which see roughly 100,000 trade per day, had changed hands more than 4 million times by late morning. Ultimately, Knight lost over $460 million from this wild 45 minutes of trading. Obviously, I’m interested in all this because, at one time, I used to write trading systems for the City of London. Obviously, the US SEC is in a far better position than any of us to work out the failings of Knight’s IT department, and the report makes for painful reading. I can’t help observing, though, that even with the breathtaking mistakes all along the way, that a robust automated deployment process that was ‘all-or-nothing’, and tested from soup to nuts would have prevented the disaster. The report reads like a Greek Tragedy. All the way along one wants to shout ‘No! not that way!’ and ‘Aargh! Don’t do it!’. As the tragedy unfolds, the audience weeps for the players, trapped by a cruel fate. All application development and deployment requires defense in depth. All IT goes wrong occasionally, but if there is a culture of defensive programming throughout, the consequences are usually containable. For financial systems, these defenses are required by statute, and ignored only by the foolish. Knight’s mistakes weren’t made by just one hapless sysadmin, but were progressive errors by an  IT culture spanning at least ten years.  One can spell these out, but I think they’re obvious. One can only hope that the industry studies what happened in detail, learns from the mistakes, and draws the right conclusions.

    Read the article

  • Software Tuned to Humanity

    - by Phil Factor
    I learned a great deal from a cynical old programmer who once told me that the ideal length of time for a compiler to do its work was the same time it took to roll a cigarette. For development work, this is oh so true. After intently looking at the editing window for an hour or so, it was a relief to look up, stretch, focus the eyes on something else, and roll the possibly-metaphorical cigarette. This was software tuned to humanity. Likewise, a user’s perception of the “ideal” time that an application will take to move from frame to frame, to retrieve information, or to process their input has remained remarkably static for about thirty years, at around 200 ms. Anything else appears, and always has, to be either fast or slow. This could explain why commercial applications, unlike games, simulations and communications, aren’t noticeably faster now than they were when I started programming in the Seventies. Sure, they do a great deal more, but the SLAs that I negotiated in the 1980s for application performance are very similar to what they are nowadays. To prove to myself that this wasn’t just some rose-tinted misperception on my part, I cranked up a Z80-based Jonos CP/M machine (1985) in the roof-space. Within 20 seconds from cold, it had loaded Wordstar and I was ready to write. OK, I got it wrong: some things were faster 30 years ago. Sure, I’d now have had all sorts of animations, wizzy graphics, and other comforting features, but it seems a pity that we have used all that extra CPU and memory to increase the scope of what we develop, and the graphical prettiness, but not to speed the processes needed to complete a business procedure. Never mind the weight, the response time’s great! To achieve 200 ms response times on a Z80, or similar, performance considerations influenced everything one did as a developer. If it meant writing an entire application in assembly code, applying every smart algorithm, and shortcut imaginable to get the application to perform to spec, then so be it. As a result, I’m a dyed-in-the-wool performance freak and find it difficult to change my habits. Conversely, many developers now seem to feel quite differently. While all will acknowledge that performance is important, it’s no longer the virtue is once was, and other factors such as user-experience now take precedence. Am I wrong? If not, then perhaps we need a new school of development technique to rival Agile, dedicated once again to producing applications that smoke the rear wheels rather than pootle elegantly to the shops; that forgo skeuomorphism, cute animation, or architectural elegance in favor of the smell of hot rubber. I struggle to name an application I use that is truly notable for its blistering performance, and would dearly love one to do my everyday work – just as long as it doesn’t go faster than my brain.

    Read the article

  • Sitting Pretty

    - by Phil Factor
    Guest Editorial for Simple-Talk IT Pro newsletter'DBAs and SysAdmins generally prefer an expression of calmness under adversity. It is a subtle trick, and requires practice in front of a mirror to get it just right. Too much adversity and they think you're not coping; too much calmness and they think you're under-employed' I dislike the term 'avatar', when used to describe a portrait photograph. An avatar, in the sense of a picture, is merely the depiction of one's role-play alter-ego, often a ridiculous bronze-age deity. However, professional image is important. The choice and creation of online photos has an effect on the way your message is received and it is important to get that right. It is fine to use that photo of you after ten lagers on holiday in an Ibiza nightclub, but what works on Facebook looks hilarious on LinkedIn. My splendid photograph that I use online was done by a professional photographer at great expense and I've never had the slightest twinge of regret when I remember how much I paid for it. It is me, but a more pensive and dignified edition, oozing trust and wisdom. One gasps at the magical skill that a professional photographer can conjure up, without digital manipulation, to make the best of a derisory noggin (ed: slang for a head). Even if he had offered to depict me as a semi-naked, muscle-bound, sword-wielding hero, I'd have demurred. No, any professional person needs a carefully cultivated image that looks right. I'd never thought of using that profile shot, though I couldn't help noticing the photographer flinch slightly when he first caught sight of my face. There is a problem with using an avatar. The use of a single image doesn't express the appropriate emotion. At the moment, it is weird to see someone with a laughing portrait writing something solemn. A neutral cast to the face, somewhat like a passport photo, is probably the best compromise. Actually, the same is true of a working life in IT. One of the first skills I learned was not to laugh at managers, but, instead, to develop a facial expression that promoted a sense of keenness, energy and respect. Every profession has its own preferred facial cast. A neighbour of mine has the natural gift of a face that displays barely repressed grief. Though he is characteristically cheerful, he earns a remarkable income as a pallbearer. DBAs and SysAdmins generally prefer an expression of calmness under adversity. It is a subtle trick, and requires practice in front of a mirror to get it just right. Too much adversity and they think you're not coping; too much calmness and they think you're under-employed. With an appropriate avatar, you could do away with a lot of the need for 'smilies' to give clues as to the meaning of what you've written on forums and blogs. If you had a set of avatars, showing the full gamut of human emotions expressible in writing: Rage, fear, reproach, joy, ebullience, apprehension, exasperation, dissembly, irony, pathos, euphoria, remorse and so on. It would be quite a drop-down list on forums, but given the vast prairies of space on the average hard drive, who cares? It would cut down on the number of spats in Forums just as long as one picks the right avatar. As an unreconstructed geek, I find it hard to admit to the value of image in the workplace, but it is true. Just as we use professionals to tidy up and order our CVs and job applications, we should employ experts to enhance our professional image. After all you don't perform surgery or dentistry on yourself do you?

    Read the article

  • Learn Many Languages

    - by Phil Factor
    Around twenty-five years ago, I was trying to solve the problem of recruiting suitable developers for a large business. I visited the local University (it was a Technical College then). My mission was to remind them that we were a large, local employer of technical people and to suggest that, as they were in the business of educating young people for a career in IT, we should work together. I anticipated a harmonious chat where we could suggest to them the idea of mentioning our name to some of their graduates. It didn’t go well. The academic staff displayed a degree of revulsion towards the whole topic of IT in the world of commerce that surprised me; tweed met charcoal-grey, trainers met black shoes. However, their antipathy to commerce was something we could have worked around, since few of their graduates were destined for a career as university lecturers. They asked me what sort of language skills we needed. I tried ducking the invidious task of naming computer languages, since I wanted recruits who were quick to adapt and learn, with a broad understanding of IT, including development methodologies, technologies, and data. However, they pressed the point and I ended up saying that we needed good working knowledge of C and BASIC, though FORTRAN and COBOL were, at the time, still useful. There was a ghastly silence. It was as if I’d recommended the beliefs and practices of the Bogomils of Bulgaria to a gathering of Cardinals. They stared at me severely, like owls, until the head of department broke the silence, informing me in clipped tones that they taught only Modula 2. Now, I wouldn’t blame you if at this point you hurriedly had to look up ‘Modula 2′ on Wikipedia. Based largely on Pascal, it was a specialist language for embedded systems, but I’ve never ever come across it in a commercial business application. Nevertheless, it was an excellent teaching language since it taught modules, scope control, multiprogramming and the advantages of encapsulating a set of related subprograms and data structures. As long as the course also taught how to transfer these skills to other, more useful languages, it was not necessarily a problem. I said as much, but they gleefully retorted that the biggest local employer, a defense contractor specializing in Radar and military technology, used nothing but Modula 2. “Why teach any other programming language when they will be using Modula 2 for all their working lives?” said a complacent lecturer. On hearing this, I made my excuses and left. There could be no meeting of minds. They were providing training in a specific computer language, not an education in IT. Twenty years later, I once more worked nearby and regularly passed the long-deserted ‘brownfield’ site of the erstwhile largest local employer; the end of the cold war had led to lean times for defense contractors. A digger was about to clear the rubble of the long demolished factory along with the accompanying growth of buddleia and thistles, in order to lay the infrastructure for ‘affordable housing’. Modula 2 was a distant memory. Either those employees had short working lives or they’d retrained in other languages. The University, by contrast, was thriving, but I wondered if their erstwhile graduates had ever cursed the narrow specialization of their training in IT, as they struggled with the unexpected variety of their subsequent careers.

    Read the article

  • At times, you need to hire a professional.

    - by Phil Factor
    After months of increasingly demanding toil, the development team I belonged to was told that the project was to be canned and the whole team would be fired.  I’d been brought into the team as an expert in the data implications of a business re-engineering of a major financial institution. Nowadays, you’d call me a data architect, I suppose.  I’d spent a happy year being paid consultancy fees solving a succession of interesting problems until the point when the company lost is nerve, and closed the entire initiative. The IT industry was in one of its characteristic mood-swings downwards.  After the announcement, we met in the canteen. A few developers had scented the smell of death around the project already hand had been applying unsuccessfully for jobs. There was a sense of doom in the mass of dishevelled and bleary-eyed developers. After giving vent to anger and despair, talk turned to getting new employment. It was then that I perked up. I’m not an obvious choice to give advice on getting, or passing,  IT interviews. I reckon I’ve failed most of the job interviews I’ve ever attended. I once even failed an interview for a job I’d already been doing perfectly well for a year. The jobs I’ve got have mostly been from personal recommendation. Paradoxically though, from years as a manager trying to recruit good staff, I know a lot about what IT managers are looking for.  I gave an impassioned speech outlining the important factors in getting to an interview.  The most important thing, certainly in my time at work is the quality of the résumé or CV. I can’t even guess the huge number of CVs (résumés) I’ve read through, scanning for candidates worth interviewing.  Many IT Developers find it impossible to describe their  career succinctly on two sides of paper.  They leave chunks of their life out (were they in prison?), get immersed in detail, put in irrelevancies, describe what was going on at work rather than what they themselves did, exaggerate their importance, criticize their previous employers, aren’t  aware of the important aspects of a role to a potential employer, suffer from shyness and modesty,  and lack any sort of organized perspective of their work. There are many ways of failing to write a decent CV. Many developers suffer from the delusion that their worth can be recognized purely from the code that they write, and shy away from anything that seems like self-aggrandizement. No.  A resume must make a good impression, which means presenting the facts about yourself in a clear and positive way. You can’t do it yourself. Why not have your resume professionally written? A good professional CV Writer will know the qualities being looked for in a CV and interrogate you to winkle them out. Their job is to make order and sense out of a confused career, to summarize in one page a mass of detail that presents to any recruiter the information that’s wanted. To stand back and describe an accurate summary of your skills, and work-experiences dispassionately, without rancor, pity or modesty. You are no more capable of producing an objective documentation of your career than you are of taking your own appendix out.  My next recommendation was more controversial. This is to have a professional image overhaul, or makeover, followed by a professionally-taken photo portrait. I discovered this by accident. It is normal for IT professionals to face impossible deadlines and long working hours by looking more and more like something that had recently blocked a sink. Whilst working in IT, and in a state of personal dishevelment, I’d been offered the role in a high-powered amateur production of an old ex- Broadway show, purely for my singing voice. I was supposed to be the presentable star. When the production team saw me, the air was thick with tension and despair. I was dragged kicking and protesting through a succession of desperate grooming, scrubbing, dressing, dieting. I emerged feeling like “That jewelled mass of millinery, That oiled and curled Assyrian bull, Smelling of musk and of insolence.” (Tennyson Maud; A Monodrama (1855) Section v1 stanza 6) I was then photographed by a professional stage photographer.  When the photographs were delivered, I was amazed. It wasn’t me, but it looked somehow respectable, confident, trustworthy.   A while later, when the show had ended, I took the photos, and used them for work. They went with the CV to job applications. It did the trick better than I could ever imagine.  My views went down big with the developers. Old rivalries were put immediately to one side. We voted, with a show of hands, to devote our energies for the entire notice period to getting employable. We had a team sourcing the CV Writer,  a team organising the make-overs and photographer, and a third team arranging  mock interviews. A fourth team determined the best websites and agencies for recruitment, with the help of friends in the trade.  Because there were around thirty developers, we were in a good negotiating position.  Of the three CV Writers we found who lived locally, one proved exceptional. She was an ex-journalist with an eye to detail, and years of experience in manipulating language. We tried her skills out on a developer who seemed a hopeless case, and he was called to interview within a week.  I was surprised, too, how many companies were experts at image makeovers. Within the month, we all looked like those weird slick  people in the ‘Office-tagged’ stock photographs who stare keenly and interestedly at PowerPoint slides in sleek chromium-plated high-rise offices. The portraits we used still adorn the entries of many of my ex-colleagues in LinkedIn. After a months’ worth of mock interviews, and technical Q&A, our stutters, hesitations, evasions and periphrastic circumlocutions were all gone.  There is little more to relate. With the résumés or CVs, mugshots, and schooling in how to pass interviews, we’d all got new and better-paid jobs well  before our month’s notice was ended. Whilst normally, an IT team under the axe is a sad and depressed place to belong to, this wonderful group of people had proved the power of organized group action in turning the experience to advantage. It left us feeling slightly guilty that we were somehow cheating, but I guess we were merely leveling the playing-field.

    Read the article

  • Musings on the launch of SQL Monitor

    - by Phil Factor
    For several years, I was responsible for the smooth running of a large number of enterprise database servers. We ran a network monitoring tool that was primitive by today’s standards but which performed the useful function of polling every system, including all the Servers in my charge. It ran a configurable script for each service that you needed to monitor that was merely required to return one of a number of integer values. These integer values represented the pain level of the service, from 10 (“hurtin’ real bad”) to 1 (“Things is great”). Not only could you program the visual appearance of each server on the network diagram according to the value of the integer, but you could even opt to run a sound file. Very soon, we had a large TFT Screen, high on the wall of the server room, with every server represented by an icon, and a speaker next to it that would give out a series of grunts, groans, snores, shrieks and funeral marches, depending on the problem. One glance at the display, and you could dive in with iSQL/QA/SSMS and check what was going on with your favourite diagnostic tools. If you saw a server icon burst into flames on the screen or droop like a jelly, you dropped your mug of coffee to do it.  It was real fun, but I remember it more for the huge difference it made to have that real-time visibility into how your servers are performing. The management soon stopped making jokes about the real reason we wanted the TFT screen. (It rendered DVDs beautifully they said; particularly flesh-tints). If you are instantly alerted when things start to go wrong, then there was a good chance you could fix it before being alerted to the problem by the users of the system.  There is a world of difference between this sort of tool, one that gives whoever is ‘on watch’ in the server room the first warning of a potential problem on one of any number of servers, and the breed of tool that attempts to provide some sort of prosthetic DBA Brain. I like to get the early warning, to get the right information to help to diagnose a problem: No auto-fix, but just the information. I prefer to leave the task of ascertaining the exact cause of a problem to my own routines, custom code, intuition and forensic instincts. A simulated aircraft cockpit doesn’t do anything for me, especially before I know where I should be flying.  Time has moved on, and that TFT screen is now, with SQL Monitor, an iPad or any other mobile or static device that can support a browser. Rather than trying to reproduce the conceptual topology of the servers, it lists them in their groups so as to give a display that scales with the increasing number of databases you monitor.  It gives the history of the major events and trends for the servers. It gives the icons and colours that you can spot out of the corner of your eye, but goes on to give you just enough information in drill-down to give you a much clearer idea of where to look with your DBA tools and routines. It doesn't swamp you with information.  Whereas a few server and database-level problems are pretty easily fixed, others depend on judgement and experience to sort out.  Although the idea of an application that automates the bulk of a DBA’s skills is attractive to many, I can’t see it happening soon. SQL Server’s complexity increases faster than the panaceas can be created. In the meantime, I believe that the best way of helping  DBAs  is to make the monitoring process as simple and effective as possible,  and provide the right sort of detail and ‘evidence’ to allow them to decide on the fix. In the end, it is still down to the skill of the DBA.

    Read the article

  • Hadoop, NOSQL, and the Relational Model

    - by Phil Factor
    (Guest Editorial for the IT Pro/SysAdmin Newsletter)Whereas Relational Databases fit the world of commerce like a glove, it is useless to pretend that they are a perfect fit for all human endeavours. Although, with SQL Server, we’ve made great strides with indexing text, in processing spatial data and processing markup, there is still a problem in dealing efficiently with large volumes of ephemeral semi-structured data. Key-value stores such as Cassandra, Project Voldemort, and Riak are of great value for ephemeral data, and seem of equal value as a data-feed that provides aggregations to an RDBMS. However, the Document databases such as MongoDB and CouchDB are ideal for semi-structured data for which no fixed schema exists; analytics and logging are obvious examples. NoSQL products, such as MongoDB, tackle the semi-structured data problem with panache. MongoDB is designed with a simple document-oriented data model that scales horizontally across multiple servers. It doesn’t impose a schema, and relies on the application to enforce the data structure. This is another take on the old ‘EAV’ problem (where you don’t know in advance all the attributes of a particular entity) It uses a clever replica set design that allows automatic failover, and uses journaling for data durability. It allows indexing and ad-hoc querying. However, for SQL Server users, the obvious choice for handling semi-structured data is Apache Hadoop. There will soon be an ODBC Driver for Apache Hive .and an Add-in for Excel. Additionally, there are now two Hadoop-based connectors for SQL Server; the Apache Hadoop connector for SQL Server 2008 R2, and the SQL Server Parallel Data Warehouse (PDW) connector. We can connect to Hadoop process the semi-structured data and then store it in SQL Server. For one steeped in the culture of Relational SQL Databases, I might be expected to throw up my hands in the air in a gesture of contempt for a technology that was, judging by the overblown journalism on the subject, about to make my own profession as archaic as the Saggar makers bottom knocker (a potter’s assistant who helped the saggar maker to make the bottom of the saggar by placing clay in a metal hoop and bashing it). However, on the contrary, I find that I'm delighted with the advances made by the NoSQL databases in the past few years. Having the flow of ideas from the NoSQL providers will knock any trace of complacency out of the providers of Relational Databases and inspire them into back-fitting some features, such as horizontal scaling, with sharding and automatic failover into SQL-based RDBMSs. It will do the breed a power of good to benefit from all this lateral thinking.

    Read the article

  • Database Migration Scripts: Getting from place A to place B

    - by Phil Factor
    We’ll be looking at a typical database ‘migration’ script which uses an unusual technique to migrate existing ‘de-normalised’ data into a more correct form. So, the book-distribution business that uses the PUBS database has gradually grown organically, and has slipped into ‘de-normalisation’ habits. What’s this? A new column with a list of tags or ‘types’ assigned to books. Because books aren’t really in just one category, someone has ‘cured’ the mismatch between the database and the business requirements. This is fine, but it is now proving difficult for their new website that allows searches by tags. Any request for history book really has to look in the entire list of associated tags rather than the ‘Type’ field that only keeps the primary tag. We have other problems. The TypleList column has duplicates in there which will be affecting the reporting, and there is the danger of mis-spellings getting there. The reporting system can’t be persuaded to do reports based on the tags and the Database developers are complaining about the unCoddly things going on in their database. In your version of PUBS, this extra column doesn’t exist, so we’ve added it and put in 10,000 titles using SQL Data Generator. /* So how do we refactor this database? firstly, we create a table of all the tags. */IF  OBJECT_ID('TagName') IS NULL OR OBJECT_ID('TagTitle') IS NULL  BEGIN  CREATE TABLE  TagName (TagName_ID INT IDENTITY(1,1) PRIMARY KEY ,     Tag VARCHAR(20) NOT NULL UNIQUE)  /* ...and we insert into it all the tags from the list (remembering to take out any leading spaces */  INSERT INTO TagName (Tag)     SELECT DISTINCT LTRIM(x.y.value('.', 'Varchar(80)')) AS [Tag]     FROM     (SELECT  Title_ID,          CONVERT(XML, '<list><i>' + REPLACE(TypeList, ',', '</i><i>') + '</i></list>')          AS XMLkeywords          FROM   dbo.titles)g    CROSS APPLY XMLkeywords.nodes('/list/i/text()') AS x ( y )  /* we can then use this table to provide a table that relates tags to articles */  CREATE TABLE TagTitle   (TagTitle_ID INT IDENTITY(1, 1),   [title_id] [dbo].[tid] NOT NULL REFERENCES titles (Title_ID),   TagName_ID INT NOT NULL REFERENCES TagName (Tagname_ID)   CONSTRAINT [PK_TagTitle]       PRIMARY KEY CLUSTERED ([title_id] ASC, TagName_ID)       ON [PRIMARY])        CREATE NONCLUSTERED INDEX idxTagName_ID  ON  TagTitle (TagName_ID)  INCLUDE (TagTitle_ID,title_id)        /* ...and it is easy to fill this with the tags for each title ... */        INSERT INTO TagTitle (Title_ID, TagName_ID)    SELECT DISTINCT Title_ID, TagName_ID      FROM        (SELECT  Title_ID,          CONVERT(XML, '<list><i>' + REPLACE(TypeList, ',', '</i><i>') + '</i></list>')          AS XMLkeywords          FROM   dbo.titles)g    CROSS APPLY XMLkeywords.nodes('/list/i/text()') AS x ( y )    INNER JOIN TagName ON TagName.Tag=LTRIM(x.y.value('.', 'Varchar(80)'))    END    /* That's all there was to it. Now we can select all titles that have the military tag, just to try things out */SELECT Title FROM titles  INNER JOIN TagTitle ON titles.title_ID=TagTitle.Title_ID  INNER JOIN Tagname ON Tagname.TagName_ID=TagTitle.TagName_ID  WHERE tagname.tag='Military'/* and see the top ten most popular tags for titles */SELECT Tag, COUNT(*) FROM titles  INNER JOIN TagTitle ON titles.title_ID=TagTitle.Title_ID  INNER JOIN Tagname ON Tagname.TagName_ID=TagTitle.TagName_ID  GROUP BY Tag ORDER BY COUNT(*) DESC/* and if you still want your list of tags for each title, then here they are */SELECT title_ID, title, STUFF(  (SELECT ','+tagname.tag FROM titles thisTitle    INNER JOIN TagTitle ON titles.title_ID=TagTitle.Title_ID    INNER JOIN Tagname ON Tagname.TagName_ID=TagTitle.TagName_ID  WHERE ThisTitle.title_id=titles.title_ID  FOR XML PATH(''), TYPE).value('.', 'varchar(max)')  ,1,1,'')    FROM titles  ORDER BY title_ID So we’ve refactored our PUBS database without pain. We’ve even put in a check to prevent it being re-run once the new tables are created. Here is the diagram of the new tag relationship We’ve done both the DDL to create the tables and their associated components, and the DML to put the data in them. I could have also included the script to remove the de-normalised TypeList column, but I’d do a whole lot of tests first before doing that. Yes, I’ve left out the assertion tests too, which should check the edge cases and make sure the result is what you’d expect. One thing I can’t quite figure out is how to deal with an ordered list using this simple XML-based technique. We can ensure that, if we have to produce a list of tags, we can get the primary ‘type’ to be first in the list, but what if the entire order is significant? Thank goodness it isn’t in this case. If it were, we might have to revisit a string-splitter function that returns the ordinal position of each component in the sequence. You’ll see immediately that we can create a synchronisation script for deployment from a comparison tool such as SQL Compare, to change the schema (DDL). On the other hand, no tool could do the DML to stuff the data into the new table, since there is no way that any tool will be able to work out where the data should go. We used some pretty hairy code to deal with a slightly untypical problem. We would have to do this migration by hand, and it has to go into source control as a batch. If most of your database changes are to be deployed by an automated process, then there must be a way of over-riding this part of the data synchronisation process to do this part of the process taking the part of the script that fills the tables, Checking that the tables have not already been filled, and executing it as part of the transaction. Of course, you might prefer the approach I’ve taken with the script of creating the tables in the same batch as the data conversion process, and then using the presence of the tables to prevent the script from being re-run. The problem with scripting a refactoring change to a database is that it has to work both ways. If we install the new system and then have to rollback the changes, several books may have been added, or had their tags changed, in the meantime. Yes, you have to script any rollback! These have to be mercilessly tested, and put in source control just in case of the rollback of a deployment after it has been in place for any length of time. I’ve shown you how to do this with the part of the script .. /* and if you still want your list of tags for each title, then here they are */SELECT title_ID, title, STUFF(  (SELECT ','+tagname.tag FROM titles thisTitle    INNER JOIN TagTitle ON titles.title_ID=TagTitle.Title_ID    INNER JOIN Tagname ON Tagname.TagName_ID=TagTitle.TagName_ID  WHERE ThisTitle.title_id=titles.title_ID  FOR XML PATH(''), TYPE).value('.', 'varchar(max)')  ,1,1,'')    FROM titles  ORDER BY title_ID …which would be turned into an UPDATE … FROM script. UPDATE titles SET  typelist= ThisTaglistFROM     (SELECT title_ID, title, STUFF(    (SELECT ','+tagname.tag FROM titles thisTitle      INNER JOIN TagTitle ON titles.title_ID=TagTitle.Title_ID      INNER JOIN Tagname ON Tagname.TagName_ID=TagTitle.TagName_ID    WHERE ThisTitle.title_id=titles.title_ID    ORDER BY CASE WHEN tagname.tag=titles.[type] THEN 1 ELSE 0  END DESC    FOR XML PATH(''), TYPE).value('.', 'varchar(max)')    ,1,1,'')  AS ThisTagList  FROM titles)fINNER JOIN Titles ON f.title_ID=Titles.title_ID You’ll notice that it isn’t quite a round trip because the tags are in a different order, though we’ve managed to make sure that the primary tag is the first one as originally. So, we’ve improved the database for the poor book distributors using PUBS. It is not a major deal but you’ve got to be prepared to provide a migration script that will go both forwards and backwards. Ideally, database refactoring scripts should be able to go from any version to any other. Schema synchronization scripts can do this pretty easily, but no data synchronisation scripts can deal with serious refactoring jobs without the developers being able to specify how to deal with cases like this.

    Read the article

  • When things go awry

    - by Phil Factor
    The moment the Entrepreneur opened his mouth on prime-time national TV, spelled out the URL and waxed big on how exciting ‘his’ new website was, I knew I was in for a busy night. I’d designed and built it. All at once, half a million people tried to log into the website. Although all my stress-testing paid off, I have to admit that the network locked up tight long before there was any danger of a database or website problem. Soon afterwards, the Entrepreneur and the Big Boss were there in the autopsy meeting. We picked through all our systems in detail to see how they’d borne the unexpected strain. Mercifully, in view of the sour mood of the Big Boss, it turned out that the only thing we could have done better was buy a bigger pipe to and from the internet. We’d specified that ‘big pipe’ when designing the system. The Big Boss had then railed at the cost and so we’d subsequently compromised. I felt that my design decisions were vindicated. The Big Boss brooded for a while. Then he made the significant comment: “What really ****** me off is the fact that, for ten minutes, we couldn’t take people’s money.” At that point I stopped feeling smug. Had the internet connection been better, the system would have reached its limit and failed rather precipitously, and that wasn’t what he wanted. Then it occurred to me that what had gummed up the connection was all those images on the site, that had made it so impressive for the visitors. If there had been a way to automatically pare down the site to the bare essentials under stress… Hmm. I began to consider disaster-recovery in the broadest sense – maintaining a service in spite of unusual or unexpected events. What he said makes a lot of sense: sacrifice whatever isn’t essential to keep the core service running when we approach the capacity limits. Maybe in IT we should borrow (or revive) the business concept of the ‘Skeleton service’, maintaining only the priority parts under stress, using a process that is well-prepared and carefully rehearsed. How might this work? Whatever the event we have to prepare for, it is all about understanding the priorities; knowing what one can dispense with when the going gets tough. In the event of database disaster, it’s much faster to deploy a skeletal system with only the essential data than to restore the entire system, though there would have to be a reconciliation process to update the revived database retrospectively, once the emergency was over. It isn’t just the database that could be designed for resilience. One could prepare for unusually high traffic in a website by designing a system that degraded gradually to a ‘skeletal’ site, one that maintained the commercial essentials without fat images, JavaScript libraries and razzmatazz. This is all what the Big Boss scathingly called ‘a mere technicality’. It seems to me that what is needed first is a culture of application and database design which acknowledges that we live in a very imperfect world, and react accordingly when things go awry.

    Read the article

  • Separating text strings into a table of individual words in SQL via XML.

    - by Phil Factor
    p.MsoNormal {margin-top:0cm; margin-right:0cm; margin-bottom:10.0pt; margin-left:0cm; line-height:115%; font-size:11.0pt; font-family:"Calibri","sans-serif"; } Nearly nine years ago, Mike Rorke of the SQL Server 2005 XML team blogged ‘Querying Over Constructed XML Using Sub-queries’. I remember reading it at the time without being able to think of a use for what he was demonstrating. Just a few weeks ago, whilst preparing my article on searching strings, I got out my trusty function for splitting strings into words and something reminded me of the old blog. I’d been trying to think of a way of using XML to split strings reliably into words. The routine I devised turned out to be slightly slower than the iterative word chop I’ve always used in the past, so I didn’t publish it. It was then I suddenly remembered the old routine. Here is my version of it. I’ve unwrapped it from its obvious home in a function or procedure just so it is easy to appreciate. What it does is to chop a text string into individual words using XQuery and the good old nodes() method. I’ve benchmarked it and it is quicker than any of the SQL ways of doing it that I know about. Obviously, you can’t use the trick I described here to do it, because it is awkward to use REPLACE() on 1…n characters of whitespace. I’ll carry on using my iterative function since it is able to tell me the location of each word as a character-offset from the start, and also because this method leaves punctuation in (removing it takes time!). However, I can see other uses for this in passing lists as input or output parameters, or as return values.   if exists (Select * from sys.xml_schema_collections where name like 'WordList')   drop XML SCHEMA COLLECTION WordList go create xml schema collection WordList as ' <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="words">        <xs:simpleType>               <xs:list itemType="xs:string" />        </xs:simpleType> </xs:element> </xs:schema>'   go   DECLARE @string VARCHAR(MAX) –we'll get some sample data from the great Ogden Nash Select @String='This is a song to celebrate banks, Because they are full of money and you go into them and all you hear is clinks and clanks, Or maybe a sound like the wind in the trees on the hills, Which is the rustling of the thousand dollar bills. Most bankers dwell in marble halls, Which they get to dwell in because they encourage deposits and discourage withdrawals, And particularly because they all observe one rule which woe betides the banker who fails to heed it, Which is you must never lend any money to anybody unless they don''t need it. I know you, you cautious conservative banks! If people are worried about their rent it is your duty to deny them the loan of one nickel, yes, even one copper engraving of the martyred son of the late Nancy Hanks; Yes, if they request fifty dollars to pay for a baby you must look at them like Tarzan looking at an uppity ape in the jungle, And tell them what do they think a bank is, anyhow, they had better go get the money from their wife''s aunt or ungle. But suppose people come in and they have a million and they want another million to pile on top of it, Why, you brim with the milk of human kindness and you urge them to accept every drop of it, And you lend them the million so then they have two million and this gives them the idea that they would be better off with four, So they already have two million as security so you have no hesitation in lending them two more, And all the vice-presidents nod their heads in rhythm, And the only question asked is do the borrowers want the money sent or do they want to take it withm. Because I think they deserve our appreciation and thanks, the jackasses who go around saying that health and happi- ness are everything and money isn''t essential, Because as soon as they have to borrow some unimportant money to maintain their health and happiness they starve to death so they can''t go around any more sneering at good old money, which is nothing short of providential. '   –we now turn it into XML declare @xml_data xml(WordList)  set @xml_data='<words>'+ replace(@string,'&', '&amp;')+'</words>'    select T.ref.value('.', 'nvarchar(100)')  from (Select @xml_data.query('                      for $i in data(/words) return                      element li { $i }               '))  A(list) cross apply A.List.nodes('/li') T(ref)     …which gives (truncated, of course)…

    Read the article

  • Documentation and Test Assertions in Databases

    - by Phil Factor
    When I first worked with Sybase/SQL Server, we thought our databases were impressively large but they were, by today’s standards, pathetically small. We had one script to build the whole database. Every script I ever read was richly annotated; it was more like reading a document. Every table had a comment block, and every line would be commented too. At the end of each routine (e.g. procedure) was a quick integration test, or series of test assertions, to check that nothing in the build was broken. We simply ran the build script, stored in the Version Control System, and it pulled everything together in a logical sequence that not only created the database objects but pulled in the static data. This worked fine at the scale we had. The advantage was that one could, by reading the source code, reach a rapid understanding of how the database worked and how one could interface with it. The problem was that it was a system that meant that only one developer at the time could work on the database. It was very easy for a developer to execute accidentally the entire build script rather than the selected section on which he or she was working, thereby cleansing the database of everyone else’s work-in-progress and data. It soon became the fashion to work at the object level, so that programmers could check out individual views, tables, functions, constraints and rules and work on them independently. It was then that I noticed the trend to generate the source for the VCS retrospectively from the development server. Tables were worst affected. You can, of course, add or delete a table’s columns and constraints retrospectively, which means that the existing source no longer represents the current object. If, after your development work, you generate the source from the live table, then you get no block or line comments, and the source script is sprinkled with silly square-brackets and other confetti, thereby rendering it visually indigestible. Routines, too, were affected. In our system, every routine had a directly attached string of unit-tests. A retro-generated routine has no unit-tests or test assertions. Yes, one can still commit our test code to the VCS but it’s a separate module and teams end up running the whole suite of tests for every individual change, rather than just the tests for that routine, which doesn’t scale for database testing. With Extended properties, one can get the best of both worlds, and even use them to put blame, praise or annotations into your VCS. It requires a lot of work, though, particularly the script to generate the table. The problem is that there are no conventional names beyond ‘MS_Description’ for the special use of extended properties. This makes it difficult to do splendid things such ensuring the integrity of the build by running a suite of tests that are actually stored in extended properties within the database and therefore the VCS. We have lost the readability of database source code over the years, and largely jettisoned the use of test assertions as part of the database build. This is not unexpected in view of the increasing complexity of the structure of databases and number of programmers working on them. There must, surely, be a way of getting them back, but I sometimes wonder if I’m one of very few who miss them.

    Read the article

  • IsNumeric() Broken? Only up to a point.

    - by Phil Factor
    In SQL Server, probably the best-known 'broken' function is poor ISNUMERIC() . The documentation says 'ISNUMERIC returns 1 when the input expression evaluates to a valid numeric data type; otherwise it returns 0. ISNUMERIC returns 1 for some characters that are not numbers, such as plus (+), minus (-), and valid currency symbols such as the dollar sign ($).'Although it will take numeric data types (No, I don't understand why either), its main use is supposed to be to test strings to make sure that you can convert them to whatever numeric datatype you are using (int, numeric, bigint, money, smallint, smallmoney, tinyint, float, decimal, or real). It wouldn't actually be of much use anyway, since each datatype has different rules. You actually need a RegEx to do a reasonably safe check. The other snag is that the IsNumeric() function  is a bit broken. SELECT ISNUMERIC(',')This cheerfully returns 1, since it believes that a comma is a currency symbol (not a thousands-separator) and you meant to say 0, in this strange currency.  However, SELECT ISNUMERIC(N'£')isn't recognized as currency.  '+' and  '-' is seen to be numeric, which is stretching it a bit. You'll see that what it allows isn't really broken except that it doesn't recognize Unicode currency symbols: It just tells you that one numeric type is likely to accept the string if you do an explicit conversion to it using the string. Both these work fine, so poor IsNumeric has to follow suit. SELECT  CAST('0E0' AS FLOAT)SELECT  CAST (',' AS MONEY) but it is harder to predict which data type will accept a '+' sign. SELECT  CAST ('+' AS money) --0.00SELECT  CAST ('+' AS INT)   --0SELECT  CAST ('+' AS numeric)/* Msg 8115, Level 16, State 6, Line 4 Arithmetic overflow error converting varchar to data type numeric.*/SELECT  CAST ('+' AS FLOAT)/*Msg 8114, Level 16, State 5, Line 5Error converting data type varchar to float.*/> So we can begin to say that the maybe IsNumeric isn't really broken, but is answering a silly question 'Is there some numeric datatype to which i can convert this string? Almost, but not quite. The bug is that it doesn't understand Unicode currency characters such as the euro or franc which are actually valid when used in the CAST function. (perhaps they're delaying fixing the euro bug just in case it isn't necessary).SELECT ISNUMERIC (N'?23.67') --0SELECT  CAST (N'?23.67' AS money) --23.67SELECT ISNUMERIC (N'£100.20') --1SELECT  CAST (N'£100.20' AS money) --100.20 Also the CAST function itself is quirky in that it cannot convert perfectly reasonable string-representations of integers into integersSELECT ISNUMERIC('200,000')       --1SELECT  CAST ('200,000' AS INT)   --0/*Msg 245, Level 16, State 1, Line 2Conversion failed when converting the varchar value '200,000' to data type int.*/  A more sensible question is 'Is this an integer or decimal number'. This cuts out a lot of the apparent quirkiness. We do this by the '+E0' trick. If we want to include floats in the check, we'll need to make it a bit more complicated. Here is a small test-rig. SELECT  PossibleNumber,         ISNUMERIC(CAST(PossibleNumber AS NVARCHAR(20)) + 'E+00') AS Hack,        ISNUMERIC (PossibleNumber + CASE WHEN PossibleNumber LIKE '%E%'                                          THEN '' ELSE 'E+00' END) AS Hackier,        ISNUMERIC(PossibleNumber) AS RawIsNumericFROM    (SELECT CAST(',' AS NVARCHAR(10)) AS PossibleNumber          UNION SELECT '£' UNION SELECT '.'         UNION SELECT '56' UNION SELECT '456.67890'         UNION SELECT '0E0' UNION SELECT '-'         UNION SELECT '-' UNION SELECT '.'         UNION  SELECT N'?' UNION SELECT N'¢'        UNION  SELECT N'?' UNION SELECT N'?34.56'         UNION SELECT '-345' UNION SELECT '3.332228E+09') AS examples Which gives the result ... PossibleNumber Hack Hackier RawIsNumeric-------------- ----------- ----------- ------------? 0 0 0- 0 0 1, 0 0 1. 0 0 1¢ 0 0 1£ 0 0 1? 0 0 0?34.56 0 0 00E0 0 1 13.332228E+09 0 1 1-345 1 1 1456.67890 1 1 156 1 1 1 I suspect that this is as far as you'll get before you abandon IsNumeric in favour of a regex. You can only get part of the way with the LIKE wildcards, because you cannot specify quantifiers. You'll need full-blown Regex strings like these ..[-+]?\b[0-9]+(\.[0-9]+)?\b #INT or REAL[-+]?\b[0-9]{1,3}\b #TINYINT[-+]?\b[0-9]{1,5}\b #SMALLINT.. but you'll get even these to fail to catch numbers out of range.So is IsNumeric() an out and out rogue function? Not really, I'd say, but then it would need a damned good lawyer.

    Read the article

  • creating email forwarding accounts on the fly using

    - by Phil Jackson
    I've recently asked a question on Stack Overflow; http://stackoverflow.com/questions/2637137/creating-email-forwarding-accounts-on-the-fly-using-php and was I would find the solution here. first instruction was: Ask on the serverfault what operations you have have to perform. what config file to edit or something. Any help would be much appreciated. regards, phil

    Read the article

  • Wild Card DNS setup problems

    - by Phil Jackson
    Hi, this is my firast time with dedicated servers and im having problem setting up a wildcard sub-domain. I previously tried * /-----/ 14400 /-----/ IN /-----/ A /-----/ (serverip) waited 30 hours and nothing. so i then tried * /-----/ 14400 /-----/ IN /-----/ CNAME /-----/ actvbiv.co.uk. waited another 30 hours, nothing Im now trying; *.actvbiz.co.uk /-----/ 14400 /-----/ IN /-----/ CNAME /-----/ actvbiv.co.uk. Am i doing this correctly? using WHM. Regards, Phil Jackson

    Read the article

  • Secure Yourself by Using Two-Step Verification on These 16 Web Services

    - by Chris Hoffman
    Two-factor authentication, also known as 2-step verification, provides additional security for your online accounts. Even if someone discovers your password, they’ll need a special one-time code to log in after you enable two-factor authentication on these services. Notably absent from this list are banks and other financial institutions. It’s a shame that you can use two-factor authentication to protect your in-game currency in an MMORPG, but not the real money in your bank account. Secure Yourself by Using Two-Step Verification on These 16 Web Services How to Fix a Stuck Pixel on an LCD Monitor How to Factory Reset Your Android Phone or Tablet When It Won’t Boot

    Read the article

  • SQL Server - Rebuilding Indexes

    - by Renso
    Goal: Rebuild indexes in SQL server. This can be done one at a time or with the example script below to rebuild all index for a specified table or for all tables in a given database. Why? The data in indexes gets fragmented over time. That means that as the index grows, the newly added rows to the index are physically stored in other sections of the allocated database storage space. Kind of like when you load your Christmas shopping into the trunk of your car and it is full you continue to load some on the back seat, in the same way some storage buffer is created for your index but once that runs out the data is then stored in other storage space and your data in your index is no longer stored in contiguous physical pages. To access the index the database manager has to "string together" disparate fragments to create the full-index and create one contiguous set of pages for that index. Defragmentation fixes that. What does the fragmentation affect?Depending of course on how large the table is and how fragmented the data is, can cause SQL Server to perform unnecessary data reads, slowing down SQL Server’s performance.Which index to rebuild?As a rule consider that when reorganize a table's clustered index, all other non-clustered indexes on that same table will automatically be rebuilt. A table can only have one clustered index.How to rebuild all the index for one table:The DBCC DBREINDEX command will not automatically rebuild all of the indexes on a given table in a databaseHow to rebuild all indexes for all tables in a given database:USE [myDB]    -- enter your database name hereDECLARE @tableName varchar(255)DECLARE TableCursor CURSOR FORSELECT table_name FROM information_schema.tablesWHERE table_type = 'base table'OPEN TableCursorFETCH NEXT FROM TableCursor INTO @tableNameWHILE @@FETCH_STATUS = 0BEGINDBCC DBREINDEX(@tableName,' ',90)     --a fill factor of 90%FETCH NEXT FROM TableCursor INTO @tableNameENDCLOSE TableCursorDEALLOCATE TableCursorWhat does this script do?Reindexes all indexes in all tables of the given database. Each index is filled with a fill factor of 90%. While the command DBCC DBREINDEX runs and rebuilds the indexes, that the table becomes unavailable for use by your users temporarily until the rebuild has completed, so don't do this during production  hours as it will create a shared lock on the tables, although it will allow for read-only uncommitted data reads; i.e.e SELECT.What is the fill factor?Is the percentage of space on each index page for storing data when the index is created or rebuilt. It replaces the fill factor when the index was created, becoming the new default for the index and for any other nonclustered indexes rebuilt because a clustered index is rebuilt. When fillfactor is 0, DBCC DBREINDEX uses the fill factor value last specified for the index. This value is stored in the sys.indexes catalog view. If fillfactor is specified, table_name and index_name must be specified. If fillfactor is not specified, the default fill factor, 100, is used.How do I determine the level of fragmentation?Run the DBCC SHOWCONTIG command. However this requires you to specify the ID of both the table and index being. To make it a lot easier by only requiring you to specify the table name and/or index you can run this script:[email protected] int,@IndexID int,@IndexName varchar(128)--Specify the table and index namesSELECT @IndexName = ‘index_name’    --name of the indexSET @ID = OBJECT_ID(‘table_name’)  -- name of the tableSELECT @IndexID = IndIDFROM sysindexesWHERE id = @ID AND name = @IndexName--Show the level of fragmentationDBCC SHOWCONTIG (@id, @IndexID)Here is an example:DBCC SHOWCONTIG scanning 'Tickets' table...Table: 'Tickets' (1829581556); index ID: 1, database ID: 13TABLE level scan performed.- Pages Scanned................................: 915- Extents Scanned..............................: 119- Extent Switches..............................: 281- Avg. Pages per Extent........................: 7.7- Scan Density [Best Count:Actual Count].......: 40.78% [115:282]- Logical Scan Fragmentation ..................: 16.28%- Extent Scan Fragmentation ...................: 99.16%- Avg. Bytes Free per Page.....................: 2457.0- Avg. Page Density (full).....................: 69.64%DBCC execution completed. If DBCC printed error messages, contact your system administrator.What's important here?The Scan Density; Ideally it should be 100%. As time goes by it drops as fragmentation occurs. When the level drops below 75%, you should consider re-indexing.Here are the results of the same table and clustered index after running the script:DBCC SHOWCONTIG scanning 'Tickets' table...Table: 'Tickets' (1829581556); index ID: 1, database ID: 13TABLE level scan performed.- Pages Scanned................................: 692- Extents Scanned..............................: 87- Extent Switches..............................: 86- Avg. Pages per Extent........................: 8.0- Scan Density [Best Count:Actual Count].......: 100.00% [87:87]- Logical Scan Fragmentation ..................: 0.00%- Extent Scan Fragmentation ...................: 22.99%- Avg. Bytes Free per Page.....................: 639.8- Avg. Page Density (full).....................: 92.10%DBCC execution completed. If DBCC printed error messages, contact your system administrator.What's different?The Scan Density has increased from 40.78% to 100%; no fragmentation on the clustered index. Note that since we rebuilt the clustered index, all other index were also rebuilt.

    Read the article

  • Standards Corner: OAuth WG Client Registration Problem

    - by Tanu Sood
    Phil Hunt is an active member of multiple industry standards groups and committees (see brief bio at the end of the post) and has spearheaded discussions, creation and ratifications of  Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-family:"Calibri","sans-serif"; mso-ascii- mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi- mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} industry standards including the Kantara Identity Governance Framework, among others. Being an active voice in the industry standards development world, we have invited him to share his discussions, thoughts, news & updates, and discuss use cases, implementation success stories (and even failures) around industry standards on this monthly column. Author: Phil Hunt This afternoon, the OAuth Working Group will meet at IETF88 in Vancouver to discuss some important topics important to the maturation of OAuth. One of them is the OAuth client registration problem.OAuth (RFC6749) was initially developed with a simple deployment model where there is only monopoly or singleton cloud instance of a web API (e.g. there is one Facebook, one Google, on LinkedIn, and so on). When the API publisher and API deployer are the same monolithic entity, it easy for developers to contact the provider and register their app to obtain a client_id and credential.But what happens when the API is for an open source project where there may be 1000s of deployed copies of the API (e.g. such as wordpress). In these cases, the authors of the API are not the people running the API. In these scenarios, how does the developer obtain a client_id? An example of an "open deployed" API is OpenID Connect. Connect defines an OAuth protected resource API that can provide personal information about an authenticated user -- in effect creating a potentially common API for potential identity providers like Facebook, Google, Microsoft, Salesforce, or Oracle. In Oracle's case, Fusion applications will soon have RESTful APIs that are deployed in many different ways in many different environments. How will developers write apps that can work against an openly deployed API with whom the developer can have no prior relationship?At present, the OAuth Working Group has two proposals two consider: Dynamic RegistrationDynamic Registration was originally developed for OpenID Connect and UMA. It defines a RESTful API in which a prospective client application with no client_id creates a new client registration record with a service provider and is issued a client_id and credential along with a registration token that can be used to update registration over time.As proof of success, the OIDC community has done substantial implementation of this spec and feels committed to its use. Why not approve?Well, the answer is that some of us had some concerns, namely: Recognizing instances of software - dynamic registration treats all clients as unique. It has no defined way to recognize that multiple copies of the same client are being registered other then assuming if the registration parameters are similar it might be the same client. Versioning and Policy Approval of open APIs and clients - many service providers have to worry about change management. They expect to have approval cycles that approve versions of server and client software for use in their environment. In some cases approval might be wide open, but in many cases, approval might be down to the specific class of software and version. Registration updates - when does a client actually need to update its registration? Shouldn't it be never? Is there some characteristic of deployed code that would cause it to change? Options lead to complexity - because each client is treated as unique, it becomes unclear how the clients and servers will agree on what credentials forms are acceptable and what OAuth features are allowed and disallowed. Yet the reality is, developers will write their application to work in a limited number of ways. They can't implement all the permutations and combinations that potential service providers might choose. Stateful registration - if the primary motivation for registration is to obtain a client_id and credential, why can't this be done in a stateless fashion using assertions? Denial of service - With so much stateful registration and the need for multiple tokens to be issued, will this not lead to a denial of service attack / risk of resource depletion? At the very least, because of the information gathered, it would difficult for service providers to clean up "failed" registrations and determine active from inactive or false clients. There has yet to be much wide-scale "production" use of dynamic registration other than in small closed communities. Client Association A second proposal, Client Association, has been put forward by Tony Nadalin of Microsoft and myself. We took at look at existing use patterns to come up with a new proposal. At the Berlin meeting, we considered how WS-STS systems work. More recently, I took a review of how mobile messaging clients work. I looked at how Apple, Google, and Microsoft each handle registration with APNS, GCM, and WNS, and a similar pattern emerges. This pattern is to use an existing credential (mutual TLS auth), or client bearer assertion and swap for a device specific bearer assertion.In the client association proposal, the developer's registration with the API publisher is handled by having the developer register with an API publisher (as opposed to the party deploying the API) and obtaining a software "statement". Or, if there is no "publisher" that can sign a statement, the developer may include their own self-asserted software statement.A software statement is a special type of assertion that serves to lock application registration profile information in a signed assertion. The statement is included with the client application and can then be used by the client to swap for an instance specific client assertion as defined by section 4.2 of the OAuth Assertion draft and profiled in the Client Association draft. The software statement provides a way for service provider to recognize and configure policy to approve classes of software clients, and simplifies the actual registration to a simple assertion swap. Because the registration is an assertion swap, registration is no longer "stateful" - meaning the service provider does not need to store any information to support the client (unless it wants to). Has this been implemented yet? Not directly. We've only delivered draft 00 as an alternate way of solving the problem using well-known patterns whose security characteristics and scale characteristics are well understood. Dynamic Take II At roughly the same time that Client Association and Software Statement were published, the authors of Dynamic Registration published a "split" version of the Dynamic Registration (draft-richer-oauth-dyn-reg-core and draft-richer-oauth-dyn-reg-management). While some of the concerns above are addressed, some differences remain. Registration is now a simple POST request. However it defines a new method for issuing client tokens where as Client Association uses RFC6749's existing extension point. The concern here is whether future client access token formats would be addressed properly. Finally, Dyn-reg-core does not yet support software statements. Conclusion The WG has some interesting discussion to bring this back to a single set of specifications. Dynamic Registration has significant implementation, but Client Association could be a much improved way to simplify implementation of the overall OpenID Connect specification and improve adoption. In fairness, the existing editors have already come a long way. Yet there are those with significant investment in the current draft. There are many that have expressed they don't care. They just want a standard. There is lots of pressure on the working group to reach consensus quickly.And that folks is how the sausage is made.Note: John Bradley and Justin Richer recently published draft-bradley-stateless-oauth-client-00 which on first look are getting closer. Some of the details seem less well defined, but the same could be said of client-assoc and software-statement. I hope we can merge these specs this week. Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin:0in; mso-para-margin-bottom:.0001pt; mso-pagination:widow-orphan; font-family:"Calibri","sans-serif"; mso-ascii- mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi- mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} About the Writer: Phil Hunt joined Oracle as part of the November 2005 acquisition of OctetString Inc. where he headed software development for what is now Oracle Virtual Directory. Since joining Oracle, Phil works as CMTS in the Identity Standards group at Oracle where he developed the Kantara Identity Governance Framework and provided significant input to JSR 351. Phil participates in several standards development organizations such as IETF and OASIS working on federation, authorization (OAuth), and provisioning (SCIM) standards.  Phil blogs at www.independentid.com and a Twitter handle of @independentid.

    Read the article

< Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13  | Next Page >