Search Results


From NaN to Infinity...and Beyond!

- by Tony Davis

It is hard to believe that it was once possible to corrupt a SQL Server database by storing perfectly normal data values in a table, but it is true. In SQL Server 2000 and before, one could inadvertently load invalid data values into certain data types via RPC calls or bulk insert methods rather than DML. In the particular case of the FLOAT data type, this meant that common 'special values' for this type, namely NaN (not-a-number) and +/- infinity, could be quite happily plugged into the database from an application and stored as 'out-of-range' values. This was like a time-bomb. When one then tried to query this data, the values were unsupported and so data pages containing them were flagged as being corrupt. Any query that needed to read a column containing the special value could fail or return unpredictable results. Microsoft even had to issue a hotfix to deal with failures in the automatic recovery process, caused by the presence of these NaN values, which rendered the whole database inaccessible!

This problem is history for those of us on more current versions of SQL Server, but its ghost still haunts us. Recently, for example, a developer on Red Gate’s SQL Response team reported a strange problem when attempting to load historical monitoring data into a SQL Server 2005 database via the C# ADO.NET provider. The ratios used in some of their reporting calculations occasionally threw out NaN or infinity values, and the subsequent attempts to load these values resulted in a nasty error. It turns out to be a different manifestation of the same problem. SQL Server 2005 still does not fully support the IEEE 754 standard for floating point numbers, in that the FLOAT data type still cannot handle NaN or infinity values. Instead, Microsoft just added validation checks that prevent the 'invalid' values from being loaded in the first place.

For people migrating to SQL Server 2005 from SQL Server 2000 databases that contained out-of-range FLOAT (or DATETIME etc.) data, Microsoft have added a DATA_PURITY clause to the SQL Server 2005 version of the DBCC CHECKDB (or CHECKTABLE) command. When enabled, this will seek out the corrupt data, but won’t fix it. You have to do this yourself in what can often be a slow, painful manual process. Our development team, after a quizzical shrug of the shoulders, simply decided to represent NaN and infinity values as NULL, and move on, accepting the minor inconvenience of not being able to tell them apart.

However, what of scientific, engineering and other applications that really would like the luxury of being able to both store and access these perfectly reasonable floating point data values? The sticking point seems to be the stipulation in the IEEE 754 standard that, when NaN is compared to any other value including itself, the answer is "unequal" (i.e. FALSE). This is clearly different from normal number comparisons and has repercussions for such things as indexing operations. Even so, this hardly applies to infinity values, which are single definite values. In fact, there was some encouraging talk in the Connect note on this issue that they might be supported 'in the SQL Server 2008 timeframe'. It didn't happen; SQL 2008 doesn't support NaN or infinity values, though one could be forgiven for thinking otherwise, based on the MSDN documentation for the FLOAT type, which states that "The behavior of float and real follows the IEEE 754 specification on approximate numeric data types". However, the truth is revealed in the XPath documentation, which states that "…float (53) is not exactly IEEE 754. For example, neither NaN (Not-a-Number) nor infinity is used…".

Is it really so hard to fix this problem the right way, and properly support in SQL Server the IEEE 754 standard for the floating point data type, NaNs, infinities and all? Oracle seems to have managed it quite nicely with its BINARY_FLOAT and BINARY_DOUBLE types, so it is technically possible.
We have an enterprise-class database that is marketed as being part of an 'integrated' Windows platform. Absurdly, we have .NET and XPath libraries that fully support the standard for floating point numbers, and we can't even properly store these values, let alone query them, in the SQL Server database! Cheers, Tony.
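As a rough illustration of the two points discussed in this piece (NaN comparing unequal to itself, and the workaround of storing special values as NULL), here is a minimal C sketch. The helper names `float_or_null` and `is_nan_by_comparison` are hypothetical and are not taken from any Red Gate or SQL Server code; the sketch only assumes a C99 compiler with IEEE 754 doubles.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: returns true and copies the value to *out when it
   is an ordinary finite number. Returns false for NaN and +/- infinity,
   meaning "store NULL instead" for a column that rejects special values. */
bool float_or_null(double value, double *out)
{
    assert(out != NULL);
    if (!isfinite(value)) {
        return false;
    }
    *out = value;
    return true;
}

/* Under IEEE 754 comparison rules, NaN is the only value that compares
   unequal to itself. (Aggressive flags such as -ffast-math may break this.) */
bool is_nan_by_comparison(double value)
{
    return value != value;
}
```

Note that this mapping loses information in exactly the way the article describes: once NaN and both infinities become NULL, they can no longer be told apart.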

Dynamically creating meta tags in asp.net mvc

- by Jalpesh P. Vadgama

As we all know, meta tags play an important role in search engine optimization, and if we want our site to rank well on search engines then we have to include them. Some time ago I blogged about dynamically creating meta tags in asp.net 2.0/3.5 sites. In this blog post I am going to explain how we can create meta tags dynamically, and very easily, in asp.net mvc.

To create meta tags dynamically we have to build them on the server side, so I have created a method like the following.

    public string HomeMetaTags()
    {
        System.Text.StringBuilder strMetaTag = new System.Text.StringBuilder();
        strMetaTag.AppendFormat(@"<meta content='{0}' name='Keywords'/>", "Home Action Keyword");
        strMetaTag.AppendFormat(@"<meta content='{0}' name='Description'/>", "Home Description Keyword");
        return strMetaTag.ToString();
    }

Here you can see that I have written a method which returns a string containing the meta tags. You can put any logic here: you can fetch the values from the database, or from XML based on a key that is passed in. For demo purposes I have hardcoded them. The method builds the meta tag string and returns it.

Now I am going to store the meta tags in ViewBag, just as we do with the title. In this post I am using the standard template, so the welcome message is already stored in ViewBag. In the same way, I am going to save the meta tags in ViewBag like the following.

    public ActionResult Index()
    {
        ViewBag.Message = "Welcome to ASP.NET MVC!";
        ViewBag.MetaTag = HomeMetaTags();
        return View();
    }

In the above code you can see that I have stored the meta tags in ViewBag.MetaTag. As I am using the standard ASP.NET MVC3 template, the head element lives in the _layout.cshtml file in the Shared folder. To render the meta tags I have modified the head part of _layout.cshtml like the following.

    <head>
        <title>@ViewBag.Title</title>
        <link href="@Url.Content("~/Content/Site.css")" rel="stylesheet" type="text/css" />
        <script src="@Url.Content("~/Scripts/jquery-1.5.1.min.js")" type="text/javascript"></script>
        @Html.Raw(ViewBag.MetaTag)
    </head>

In the above code you can see that I have used the @Html.Raw method to embed the meta tags in the _layout.cshtml page. Html.Raw writes its output to the head section without HTML-encoding it. As we have already built the HTML tags in the string method, we don't need the encoding; if the values come from a database or from user input, however, they should be encoded before being formatted into the tags.

Now it’s time to run the application in the browser. Once you run the application and click on view source, you will find the meta tags for the home page in the head section. That’s it. It’s very easy to create meta tags dynamically. Hope you liked it. Stay tuned for more. Till then, happy programming.

Monitoring the Application alongside SQL Server

- by Tony Davis

Sometimes, on Simple-Talk, it takes a while to spot strange and unexpected patterns of user activity, or small bugs. For example, one morning we spotted that an article’s comment count had leapt to 1485, but that only four were displayed. With some rooting around in Google Analytics, and the endlessly annoying Community Server admin-interface, we were able to work out that a few days previously the article had been subject to a spam attack and that the comment count was for some reason including both accepted and unaccepted comments (which in turn uncovered a bug in the SQL). This sort of incident made us a lot keener on monitoring Simple-talk website usage more effectively. However, the metrics we wanted are troublesome, because they are far too specific for Google Analytics to measure, and the SQL Server backend doesn’t keep sufficient information to enable us to plot trends. The latter could provide, for example, the total number of comments made on, or votes cast for, articles, over all time, but not the number that occur by hour over a set time. We lacked a baseline, in other words. We couldn’t alter the database, as it is a bought-in package. We had neither the resources nor inclination to build-in dedicated application monitoring. Possibly, we could investigate a third-party tool to do the job; but then it occurred to us that we were already using a monitoring tool (SQL Monitor) to keep an eye on the database. It stored data, made graphs and sent alerts. Could we get it to monitor some aspects of the application as well? Of course, SQL Monitor’s single purpose is to check and monitor SQL Server, over time, rather than to monitor applications that use SQL Server. However, how different is the business of gathering and plotting SQL Server Wait Stats, from gathering and plotting various aspects of user activity on the site? Not a lot, it turns out. 
The latest version allows us to write our own custom monitoring scripts, meaning that we could now monitor any metric in the application that returns an integer. It took little time to write a simple SQL query that collects basic metrics of the total number of subscribers, votes cast, comments made, or views of articles, over time. The SQL Monitor database polls Simple-Talk every second or so in order to get the latest totals, and can then store and plot this information, or even correlate SQL Server usage to application usage. You can see the live data by visiting monitor.red-gate.com. Click the "Analysis" tab, and select one of the "Simple-talk:" entries in the "Show" box and an appropriate date range (e.g. last 30 days).

It’s nascent, and we’re still working on it, but it’s already given us more confidence that we’ll quickly spot trends, bugs, or bursts of ‘abnormal’ activity. If there is a sudden rise in comments, we get an alert, and if it’s due to a spam attack, we can moderate or ban the perpetrator very quickly. We’ve often argued that a tool should perform a single job well rather than turn into a Swiss-army knife, but ironically we’ve rather appreciated being able to make best use of what’s there anyway for a slightly different purpose. Is this a good or common practice? What do you think? Cheers, Tony.

A temporary disagreement

- by Tony Davis

Last month, Phil Factor caused a furore amongst some MVPs with an article that attempted to offer simple advice to developers regarding the use of table variables, versus local and global temporary tables, in their code. Phil makes clear that table variables do come with some fairly major limitations (no distribution statistics, no parallel query plans for queries that modify table variables) but goes on to suggest that for reasonably small-scale strategic uses, and with a bit of due care and testing, table variables are a "good thing". Not everyone shares his opinion; in fact, I imagine he was rather aghast to learn that there were those who felt his article was akin to pulling the pin out of a grenade and tossing it into the database. Table variables should be avoided in almost all cases, according to their advice, in favour of temp tables. In other words, a fairly major feature of SQL Server should be more-or-less 'off limits' to developers.

The problem with temp tables is that, because they are scoped either in the procedure or the connection, it is easy to allow them to hang around for too long, eating up precious memory and bulking up the shared tempdb database. Unless they are explicitly dropped, global temporary tables, and local temporary tables created within a connection rather than within a stored procedure, will persist until the connection is closed or, with connection pooling, until the connection is reused. It's also quite common with ASP.NET applications to have connection leaks, as Bill Vaughn explains in his chapter in the "SQL Server Deep Dives" book, meaning that the web page exits without closing the connection object, maybe due to an error condition. The connection object will then hang around in the heap for what might be hours before being picked up by the garbage collector.

Table variables are much safer in this regard, since they are batch-scoped and so are cleaned up automatically once the batch is complete. This also makes them intuitive for the developer to use, because they conform to scoping rules that are closer to those in procedural code. On the surface, then, they look like an ideal way to deal with issues related to tempdb memory hogging. So why did Phil qualify his recommendation to use table variables? This is another of those cases where, like scalar UDFs and multi-statement table-valued UDFs, developers can sometimes get into trouble with a relatively benign-looking feature, due to the way it's been implemented in SQL Server. Once again the biggest problem is how they are handled internally by the SQL Server query optimizer, which can make very poor choices for JOIN orders and so on in the absence of statistics, especially when joining to tables with highly-skewed data. The resulting execution plans can be horrible, as will be the resulting performance. If the JOIN is to a large table, that will hurt.

Ideally, Microsoft would simply fix this issue so that developers can't get burned in this way; table variables have been around since SQL Server 2000, so Microsoft has had a bit of time to get it right. As I commented in regard to UDFs, when developers discover issues like this with such standard features, the database becomes an alien planet to them, where death lurks around each corner, and they continue to avoid these "killer" features years after the problems have eventually been resolved. In the meantime, what is the right approach? Is it to say "hammers can kill, don't ever use hammers", or is it to try to explain, as Phil's article and follow-up blog post have tried to do, what the feature was intended for, why care must be applied in its use, and so enable developers to make properly-informed decisions, without requiring them to delve deep into the inner workings of SQL Server? Cheers, Tony.

How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving, so I'm republishing it here.

How I Got 15x Improvement Without Really Trying

John Feo, Sun Microsystems

Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques.

Introduction

Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and ensure that monies supporting scientific research are used as effectively as possible.

Scientific computer codes divide into three broad categories: ISV, community, and personal.

ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran.

Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARMM, BLAST, and FASTA.

Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor to student or student to student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes.

Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile.

Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize.

Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized.

The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices, but they do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research.
    #     cache performance | redundant operations | loop structures     performance improvement
    1     x x                                                             15.5
    2     x                                                               2.8
    3     x x                                                             2.5
    4     x                                                               2.1
    5     x x                                                             2.0
    6     x                                                               5.0
    7     x                                                               5.8
    8     x                                                               6.3
    9                                                                     2.2
    10    x x                                                             3.3

Table 1: Area of improvement and performance gains of 10 codes

The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summarizes the work and suggests a possible solution to the issues raised.

Optimizing cache performance

Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do.

When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. Six of the ten codes studied here benefited from such high level optimizations.

Array Accesses

The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes (1, 2, 4, and 10) got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls.
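To make the access-order point concrete for C, here is a minimal sketch of the same reduction written with the loops in both orders. It is an illustration only, not code from any of the ten programs; the function names and the flat row-major layout (`a[i * cols + j]`) are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Column-first traversal of a row-major array: consecutive inner-loop
   iterations touch elements that are `cols` doubles apart, so each
   access may land on a different cache line. */
double sum_column_first(const double *a, size_t rows, size_t cols)
{
    assert(a != NULL);
    double sum = 0.0;
    for (size_t j = 0; j < cols; j++)
        for (size_t i = 0; i < rows; i++)
            sum += a[i * cols + j];
    return sum;
}

/* Row-first traversal: the loops are interchanged so the inner loop
   walks memory with unit stride, which is cache order for C arrays. */
double sum_row_first(const double *a, size_t rows, size_t cols)
{
    assert(a != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < cols; j++)
            sum += a[i * cols + j];
    return sum;
}
```

Both functions visit the same elements; only the order differs. For Fortran, which stores arrays column by column, the roles of the two versions are reversed.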
In code 1, the compiler failed to invert a key loop because of complex addressing

    do I = 0, 1010, delta_x
       IM = I - delta_x
       IP = I + delta_x
       do J = 5, 995, delta_x
          JM = J - delta_x
          JP = J + delta_x
          T1 = CA1(IP, J) + CA1(I, JP)
          T2 = CA1(IM, J) + CA1(I, JM)
          S1 = T1 + T2 - 4 * CA1(I, J)
          CA(I, J) = CA1(I, J) + D * S1
       end do
    end do

In code 2, the culprit is conditionals

    do I = 1, N
       do J = 1, N
          If (IFLAG(I,J) .EQ. 0) then
             T1 = Value(I, J-1)
             T2 = Value(I-1, J)
             T3 = Value(I, J)
             T4 = Value(I+1, J)
             T5 = Value(I, J+1)
             Value(I,J) = 0.25 * (T1 + T2 + T5 + T4)
             Delta = ABS(T3 - Value(I,J))
             If (Delta .GT. MaxDelta) MaxDelta = Delta
          endif
       enddo
    enddo

I fixed both programs by inverting the loops by hand.

Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10.

Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is

    L1: for i      L2: for i      L3: for i
        for l          for l          for j
        for k          for j          for k
        for j          for k          for l

So only L3 accesses array elements in cache order. L1 is a very complex loop, much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays.
Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops, aligning the loop with cache.

Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists.

Array Strides

When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed with stride 1, then the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes.

Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly, providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into contiguous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid.
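The compress-and-restore step described above can be sketched in C as follows. This is a one-dimensional simplification written for illustration; the article's grids are two-dimensional Fortran arrays, and the names `gather_strided` and `scatter_strided` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Copy every `stride`-th element of `grid` (length n) into the contiguous
   buffer `packed`, which must hold at least (n + stride - 1) / stride
   elements. Returns the number of elements copied. */
size_t gather_strided(const double *grid, size_t n, size_t stride,
                      double *packed)
{
    assert(grid != NULL && packed != NULL && stride > 0);
    size_t count = 0;
    for (size_t i = 0; i < n; i += stride)
        packed[count++] = grid[i];
    return count;
}

/* Copy the `count` packed values back to their strided positions in
   `grid` once the computation on the packed buffer has converged. */
void scatter_strided(const double *packed, size_t count, size_t stride,
                     double *grid)
{
    assert(grid != NULL && packed != NULL && stride > 0);
    for (size_t k = 0; k < count; k++)
        grid[k * stride] = packed[k];
}
```

The iteration itself then runs over `packed` with unit stride. The two copies are pure overhead, so the approach pays off only when many sweeps are made over the packed data between the gather and the scatter.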
The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes

    do j = 1, GZ
       do i = 1, GZ
          T1 = CA1(i+0, j-1) + CA1(i-1, j+0)
          T4 = CA1(i+1, j+0) + CA1(i+0, j+1)
          S1 = T1 + T4 - 4 * CA1(i+0, j+0)
          CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1
       enddo
    enddo

where CA and CA1 are compressed arrays of size GZ.

Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection.

Data reuse

In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure: as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3).
In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4,

    do J = 1, GZ-2, 2
       do I = 1, GZ-2, 2
          T1 = CA1(i+0, j-1) + CA1(i-1, j+0)
          T2 = CA1(i+1, j-1) + CA1(i+0, j+0)
          T3 = CA1(i+0, j+0) + CA1(i-1, j+1)
          T4 = CA1(i+1, j+0) + CA1(i+0, j+1)
          T5 = CA1(i+2, j+0) + CA1(i+1, j+1)
          T6 = CA1(i+1, j+1) + CA1(i+0, j+2)
          T7 = CA1(i+2, j+1) + CA1(i+1, j+2)
          S1 = T1 + T4 - 4 * CA1(i+0, j+0)
          S2 = T2 + T5 - 4 * CA1(i+1, j+0)
          S3 = T3 + T6 - 4 * CA1(i+0, j+1)
          S4 = T4 + T7 - 4 * CA1(i+1, j+1)
          CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1
          CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2
          CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3
          CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4
       enddo
    enddo

The loop body reads 12 distinct array elements, whereas the rolled loop shown in the previous section executes 20 reads to compute the same four values.

In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before

    for (k = 0; k < NK[u]; k++) {
        sum = 0.0;
        for (y = 0; y < NY; y++) {
            sum += W[y][u][k] * delta[y];
        }
        backprop[i++] = sum;
    }

and after code

    for (k = 0; k < KK - 8; k+=8) {
        sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0;
        sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0;
        for (y = 0; y < NY; y++) {
            sum0 += W[y][0][k+0] * delta[y];
            sum1 += W[y][0][k+1] * delta[y];
            sum2 += W[y][0][k+2] * delta[y];
            sum3 += W[y][0][k+3] * delta[y];
            sum4 += W[y][0][k+4] * delta[y];
            sum5 += W[y][0][k+5] * delta[y];
            sum6 += W[y][0][k+6] * delta[y];
            sum7 += W[y][0][k+7] * delta[y];
        }
        backprop[k+0] = sum0;
        backprop[k+1] = sum1;
        backprop[k+2] = sum2;
        backprop[k+3] = sum3;
        backprop[k+4] = sum4;
        backprop[k+5] = sum5;
        backprop[k+6] = sum6;
        backprop[k+7] = sum7;
    }

for one of the loops unrolled 8 times.

Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial.
Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends.

Reducing instruction count

Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques.

The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent.

Memory operations

The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory.

Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops.
Consider the following loop from code 3

    for (y = 0; y < NY; y++) {
        i = 0;
        for (u = 0; u < NU; u++) {
            for (k = 0; k < NK[u]; k++) {
                dW[y][u][k] += delta[y] * I1[i++];
            }
        }
    }

Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as

    for (y = 0; y < NY; y++) {
        i = 0;
        Dy = delta[y];
        for (u = 0; u < NU; u++) {
            for (k = 0; k < NK[u]; k++) {
                dW[y][u][k] += Dy * I1[i++];
            }
        }
    }

Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler cannot determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays

    #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1])

The macro is too complex for the compiler to understand and so it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define

    a0 = MAT4D(a,q,0,j,k)

before the loop and then replace all instances of

    *MAT4D(a,q,i,j,k)

in the loop with

    a0[i]

A similar problem appears in code 6, a Fortran program. The key loop in this program is

    do n1 = 1, nh
       nx1 = (n1 - 1) / nz + 1
       nz1 = n1 - nz * (nx1 - 1)
       do n2 = 1, nh
          nx2 = (n2 - 1) / nz + 1
          nz2 = n2 - nz * (nx2 - 1)
          ndx = nx2 - nx1
          ndy = nz2 - nz1
          gxx = grn(1,ndx,ndy)
          gyy = grn(2,ndx,ndy)
          gxy = grn(3,ndx,ndy)
          balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1
          balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy) * h1
       end do
    end do

The programmer has written this loop well: there are no loop invariant operations with respect to n1 and n2.
However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays.

Data operations

Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 ≤ i < N, 0 ≤ j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling.

    for (i = 0; i < N; i+=8) {
        for (j = 0; j < M; j++) {
            sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0;
            sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0;
            for (k = 0; k < K; k++) {
                sum0 += A[i+0][k] * B[j][k];
                sum1 += A[i+1][k] * B[j][k];
                sum2 += A[i+2][k] * B[j][k];
                sum3 += A[i+3][k] * B[j][k];
                sum4 += A[i+4][k] * B[j][k];
                sum5 += A[i+5][k] * B[j][k];
                sum6 += A[i+6][k] * B[j][k];
                sum7 += A[i+7][k] * B[j][k];
            }
            C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3;
            C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7;
        }
    }

This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer.

In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time.
We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time.

The main loop in code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index, while others are a function of the outer loop index but not the inner loop index:

    for (j = 0; j < N; j++) {
        for (i = 0; i < M; i++) {
            r = i * hrmax;
            R = A[j];
            temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]);
            high = temp * kcoeff * B[j] * PRM[2] * PRM[4];
            low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0));
            kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0)) : low * pow(R/PRM[6], PRM[5]);
            < rest of loop omitted >
        }
    }

Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array.

    for (i = 0; i < M; i++) {
        r = i * hrmax;
        TEMP[i] = pow(r, PRM[3]);
    }

[N.B.: the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times.
The final version of the code is

    for (j = 0; j < N; j++) {
        R = rig[j] / 1000.;
        tmp1 = kcoeff * par[2] * beta[j] * par[4];
        tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]);
        tmp3 = 1.0 + (par[4] * par[4] * R * R);
        tmp4 = par[6] * par[6] / tmp2;
        tmp5 = R * R / tmp3;
        tmp6 = pow(R / par[6], par[5]);
        if ((par[3] == 0.0) && (R > par[6])) {
            for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5;
        } else if ((par[3] == 0.0) && (R <= par[6])) {
            for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6;
        } else if ((par[3] != 0.0) && (R > par[6])) {
            for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5;
        } else if ((par[3] != 0.0) && (R <= par[6])) {
            for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6;
        }
        for (i = 0; i < M; i++) {
            kap = KAP[i];
            r = i * hrmax;
            < rest of loop omitted >
        }
    }

Maybe not the prettiest piece of code, but certainly much more efficient than the original loop.

Copy operations

Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays, one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying, but pointers are not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1.
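The extra-dimension layout, with an index swap taking the place of the copy, can be sketched in C as follows. This is only an illustration under stated assumptions: the grid size N, the array name value, the functions sweep and run, and the averaging update rule are all hypothetical and do not come from code 1.

```c
#include <stddef.h>

#define N 4                      /* hypothetical grid size, illustration only */

/* Both time levels live in one array; the first index selects the level. */
static double value[2][N][N];

/* Placeholder update rule (not from the paper): each new value is the
 * average of its left and right neighbors in the old level, with wraparound. */
static void sweep(int oldi, int newi)
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            value[newi][i][j] = 0.5 * (value[oldi][i][(j + 1) % N] +
                                       value[oldi][i][(j + N - 1) % N]);
}

/* Runs `iters` sweeps. Initial values are expected in level 1, mirroring
 * the text. Returns the index of the level holding the latest values. */
int run(int iters)
{
    int oldi = 0, newi = 1;
    for (int t = 0; t < iters; t++) {
        int tmp = oldi; oldi = newi; newi = tmp;   /* swap indices, copy nothing */
        sweep(oldi, newi);
    }
    return newi;
}
```

The cost of resetting the data structures is then two integer assignments per iteration, independent of N, whereas a copy touches every element of the array.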
At the start of each iteration, swap old and new to reset the arrays. The old-new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary copying did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers.

Optimizing loop structures

Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate conditionals or isolate them in their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet:

    MaxDelta = 0.0
    do J = 1, N
       do I = 1, M
          < code omitted >
          Delta = abs(OldValue - NewValue)
          if (Delta > MaxDelta) MaxDelta = Delta
       enddo
    enddo
    if (MaxDelta .gt. 0.001) goto 200

Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as

    MaxDelta = .false.
    do J = 1, N
       do I = 1, M
          < code omitted >
          Delta = abs(OldValue - NewValue)
          MaxDelta = MaxDelta .or. (Delta .gt. 0.001)
       enddo
    enddo
    if (MaxDelta) goto 200

thereby eliminating the conditional expression from the inner loop.

A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffic by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surprising since it is the general tendency of programmers to write thick loops.

As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays.
I found such an occasion in code 10 where I split the loop

    do i = 1, n
       do j = 1, m
          A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i)
          B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i)
          A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i)
          B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i)
          C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i)
          D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i)
          C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i)
          D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i)
       end do
    end do

into two disjoint loops

    do i = 1, n
       do j = 1, m
          A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i)
          B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i)
          A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i)
          B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i)
       end do
    end do
    do i = 1, n
       do j = 1, m
          C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i)
          D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i)
          C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i)
          D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i)
       end do
    end do

Conclusions

Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem: personal scientific research codes are highly inefficient and not running in parallel. The developers are unaware of simple optimization techniques to make programs run faster.
They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in question have not studied the books or manuals available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be of much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding.

About the Author

John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel programming, parallel programming languages, and application performance.

Read the article
SQL SERVER – Integrate Your Data with Skyvia – Cloud ETL Solution

- by Pinal Dave

These days data integration often becomes a key aspect of business success. For business analysts it is very important to get integrated data from various sources, such as relational databases and cloud CRMs, in order to make correct and successful decisions. There are various data integration solutions on the market, and today I will tell you about one of them: Skyvia. Skyvia is a cloud data integration service that allows integrating data in cloud CRMs and different relational databases. It is a completely online solution and does not require anything except a browser. Skyvia provides powerful ETL tools for data import, export, replication, and synchronization for SQL Server and other databases and cloud CRMs. You can use Skyvia data import tools to load data from various sources to SQL Server (and SQL Azure). Skyvia supports cloud CRMs such as Salesforce and Microsoft Dynamics CRM, and databases such as MySQL and PostgreSQL. You can even migrate data from SQL Server to SQL Server, or from SQL Server to other databases and cloud CRMs. Additionally, Skyvia supports import of CSV files, either uploaded manually or stored on cloud file storage services such as Dropbox, Box, and Google Drive, or on FTP servers. When data import is not enough, Skyvia offers bidirectional data synchronization. With this tool, you can synchronize SQL Server data with other databases and cloud CRMs. After performing the first synchronization, Skyvia tracks data changes in the synchronized data storages. In SQL Server databases (and other relational databases) it creates additional tracking tables and triggers. This allows synchronizing only the changed data. Skyvia also maps records to each other by their primary key values, so it does not require different sources to have the same primary key structure. It can still match the corresponding records without adding any columns or changing the data structure. The only requirement for synchronization is that primary keys must be autogenerated.
With Skyvia it is not necessary for data to have the same structure in the integrated data storages. Skyvia supports powerful mapping mechanisms that allow synchronizing data with completely different structures. It supports complex mathematical and string expressions when mapping data, lookups, and so on. You may use data splitting, that is, loading data from a single CSV file or source table to multiple related target tables. Or you may load data from several source CSV files or tables to several related target tables. In each case Skyvia preserves data relations. It builds the corresponding relations between the target data automatically. If you often work with cloud CRM data, the native CRM reporting and analysis tools may not be enough for you, while a vast set of professional data analysis and reporting tools is available for SQL Server. With Skyvia you can quickly copy your cloud CRM data to a SQL Server database and apply the corresponding SQL Server tools to the data. In this case you can use the Skyvia data replication tools. They allow you to quickly copy cloud CRM data to SQL Server or other databases without customizing any mapping. You just need to specify the columns to copy data from. Target database tables will be created automatically. Skyvia offers powerful filtering settings to replicate only the records you need. Skyvia also provides the capability to export data from SQL Server (including SQL Azure) and other databases and cloud CRMs to CSV files. These files can be either downloaded manually or loaded to cloud file storages or an FTP server. You can use export, for example, to back up SQL Azure data to Dropbox. Any data integration operation can be scheduled for automatic execution. Thus, you can automate your SQL Azure data backup or data synchronization: just configure it once, then schedule it, and benefit from automatic data integration with Skyvia.
Currently, registering for and using Skyvia is completely free, so you can try it yourself and find out whether its data migration and integration tools suit you. Visit this link to register on Skyvia: https://app.skyvia.com/register Reference: Pinal Dave (http://blog.sqlauthority.com)

Read the article
Oracle Big Data Software Downloads

- by Mike.Hallett(at)Oracle-BI&EPM

Companies have been making business decisions for decades based on transactional data stored in relational databases. Beyond that critical data is a potential treasure trove of less structured data: weblogs, social media, email, sensors, and photographs that can be mined for useful information. Oracle offers a broad integrated portfolio of products to help you acquire and organize these diverse data sources and analyze them alongside your existing data to find new insights and capitalize on hidden relationships.

The Oracle Big Data Connectors downloads include:
- Oracle SQL Connector for Hadoop Distributed File System Release 2.1.0
- Oracle Loader for Hadoop Release 2.1.0
- Oracle Data Integrator Companion 11g
- Oracle R Connector for Hadoop v 2.1

Oracle Big Data Documentation

The Oracle Big Data solution offers an integrated portfolio of products to help you organize and analyze your diverse data sources alongside your existing data to find new insights and capitalize on hidden relationships.
- Oracle Big Data, Release 2.2.0 - E41604_01 zip (27.4 MB)
- Integrated Software and Big Data Connectors User's Guide (HTML, PDF)

Oracle Data Integrator (ODI) Application Adapter for Hadoop

Apache Hadoop is designed to handle and process data that typically comes from non-relational data sources, in volumes beyond what relational databases handle. Typical processing in Hadoop includes data validation and transformations that are programmed as MapReduce jobs. Designing and implementing a MapReduce job usually requires expert programming knowledge. However, when you use Oracle Data Integrator with the Application Adapter for Hadoop, you do not need to write MapReduce jobs. Oracle Data Integrator uses Hive and the Hive Query Language (HiveQL), a SQL-like language for implementing MapReduce jobs.
Employing familiar and easy-to-use tools and pre-configured knowledge modules (KMs), the application adapter provides the following capabilities:
- Loading data into Hadoop from the local file system and HDFS
- Performing validation and transformation of data within Hadoop
- Loading processed data from Hadoop to an Oracle database for further processing and generating reports

Oracle Database Loader for Hadoop

Oracle Loader for Hadoop is an efficient and high-performance loader for fast movement of data from a Hadoop cluster into a table in an Oracle database. It pre-partitions the data if necessary and transforms it into a database-ready format. Oracle Loader for Hadoop is a Java MapReduce application that balances the data across reducers to help maximize performance.

Oracle R Connector for Hadoop

Oracle R Connector for Hadoop is a collection of R packages that provide:
- Interfaces to work with Hive tables, the Apache Hadoop compute infrastructure, the local R environment, and Oracle database tables
- Predictive analytic techniques, written in R or Java as Hadoop MapReduce jobs, that can be applied to data in HDFS files

You install and load this package as you would any other R package. Using simple R functions, you can perform tasks such as:
- Access and transform HDFS data using a Hive-enabled transparency layer
- Use the R language for writing mappers and reducers
- Copy data between R memory, the local file system, HDFS, Hive, and Oracle databases
- Schedule R programs to execute as Hadoop MapReduce jobs and return the results to any of those locations

Oracle SQL Connector for Hadoop Distributed File System

Using Oracle SQL Connector for HDFS, you can use an Oracle Database to access and analyze data residing in Hadoop in these formats:
- Data Pump files in HDFS
- Delimited text files in HDFS
- Hive tables

For other file formats, such as JSON files, you can stage the input in Hive tables before using Oracle SQL Connector for HDFS.
Oracle SQL Connector for HDFS uses external tables to provide Oracle Database with read access to Hive tables, and to delimited text files and Data Pump files in HDFS.

Related Documentation
- Cloudera's Distribution Including Apache Hadoop Library (HTML)
- Oracle R Enterprise (HTML)
- Oracle NoSQL Database (HTML)

Recent Blog Posts
- Big Data Appliance vs. DIY Price Comparison
- Big Data: Architecture Overview
- Big Data: Achieve the Impossible in Real-Time
- Big Data: Vertical Behavioral Analytics
- Big Data: In-Memory MapReduce
- Flume and Hive for Log Analytics
- Building Workflows in Oozie

Read the article
Q&A: Oracle's Paul Needham on How to Defend Against Insider Attacks

- by Troy Kitch

Source: Database Insider Newsletter: The threat from insider attacks continues to grow. In fact, just since January 1, 2014, insider breaches have been reported by a major consumer bank, a major healthcare organization, and a range of state and local agencies, according to the Privacy Rights Clearinghouse. We asked Paul Needham, Oracle senior director, product management, to shed light on the nature of these pernicious risks—and how organizations can best defend themselves against the threat from insider risks. Q. First, can you please define the term "insider" in this context? A. According to the CERT Insider Threat Center, a malicious insider is a current or former employee, contractor, or business partner who "has or had authorized access to an organization's network, system, or data and intentionally exceeded or misused that access in a manner that negatively affected the confidentiality, integrity, or availability of the organization's information or information systems." Q. What has changed with regard to insider risks? A. We are actually seeing the risk of privileged insiders growing. In the latest Independent Oracle Users Group Data Security Survey, the number of organizations that had not taken steps to prevent privileged user access to sensitive information had grown from 37 percent to 42 percent. Additionally, 63 percent of respondents say that insider attacks represent a medium-to-high risk—higher than any other category except human error (by an insider, I might add). Q. What are the dangers of this type of risk? A. Insiders tend to have special insight and access into the kinds of data that are especially sensitive. Breaches can result in long-term legal issues and financial penalties. They can also damage an organization's brand in a way that directly impacts its bottom line. Finally, there is the potential loss of intellectual property, which can have serious long-term consequences because of the loss of market advantage. Q. 
How can organizations protect themselves against abuse of privileged access? A. Every organization has privileged users and that will always be the case. The questions are how much access should those users have to application data stored in the database, and how can that default access be controlled? Oracle Database Vault (See image) was designed specifically for this purpose and helps protect application data against unauthorized access. Oracle Database Vault can be used to block default privileged user access from inside the database, as well as increase security controls on the application itself. Attacks can and do come from inside the organization, and they are just as likely to come from outside as attempts to exploit a privileged account. Using Oracle Database Vault protection, boundaries can be placed around database schemas, objects, and roles, preventing privileged account access from being exploited by hackers and insiders. A new Oracle Database Vault capability called privilege analysis identifies privileges and roles used at runtime, which can then be audited or revoked by the security administrators to reduce the attack surface and increase the security of applications overall. For a more comprehensive look at controlling data access and restricting privileged data in Oracle Database, download Needham's new e-book, Securing Oracle Database 12c: A Technical Primer.

Read the article
Autoscaling in a modern world…. last chapter

- by Steve Loethen

As we all know as coders, things like logging are never important. Our code will work right the first time. So, you can understand my surprise when the first time I deployed the autoscaling worker role to the actual Azure fabric, it did not scale. I mean, it worked on my machine. How dare the datacenter argue with that. So, how did I track down the problem? (It turned out to be not so much the code as the lack of the right certificate.) When I ran it locally in the developer fabric, I was able to see a wealth of information: lots of periodic status info every time the autoscaler came around to check on my rules and decide whether to act. But that information was not making it to Azure storage. The diagnostics were not being transferred to where I could easily see and use them to track down why things were not cooperating. After a bit of digging, I discovered the problem. You need to add a bit of extra configuration to get the correct information stored for you. I added the following to my app.config:

Code Snippet

    <system.diagnostics>
      <sources>
        <source name="Autoscaling General" switchName="SourceSwitch" switchType="System.Diagnostics.SourceSwitch">
          <listeners>
            <add name="AzureDiag" />
            <remove name="Default"/>
          </listeners>
        </source>
        <source name="Autoscaling Updates" switchName="SourceSwitch" switchType="System.Diagnostics.SourceSwitch">
          <listeners>
            <add name="AzureDiag" />
            <remove name="Default"/>
          </listeners>
        </source>
      </sources>
      <switches>
        <add name="SourceSwitch" value="Verbose, Information, Warning, Error, Critical" />
      </switches>
      <sharedListeners>
        <add type="Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitorTraceListener,Microsoft.WindowsAzure.Diagnostics, Version=1.0.0.0, Culture=neutral,PublicKeyToken=31bf3856ad364e35" name="AzureDiag"/>
      </sharedListeners>
      <trace>
        <listeners>
          <add type="Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitorTraceListener,Microsoft.WindowsAzure.Diagnostics, Version=1.0.0.0, Culture=neutral,PublicKeyToken=31bf3856ad364e35" name="AzureDiagnostics">
            <filter type="" />
          </add>
        </listeners>
      </trace>
    </system.diagnostics>

Suddenly all the rich tracing info I needed was filling up my storage account. After a few cycles of attempting to scale, I identified the cert problem, uploaded a correct certificate, and away it went. I hope this was helpful.

Read the article
SQLAuthority News – Monthly list of Puzzles and Solutions on SQLAuthority.com

- by pinaldave

This has been a very interesting month for SQLAuthority.com. We had a variety of puzzles in which everybody participated, and lots of interesting conversations, which we have shared. Let us start with the latest puzzles and work our way down. A few answers were also posted on Facebook.

SQL SERVER – Puzzle Involving NULL – Resolve – Error – Operand data type void type is invalid for sum operator
This puzzle involves NULL and throws an error. The challenge is to resolve the error, and there are multiple ways to do so. Readers have contributed various methods. A few of them have even explained why this error shows up. NULLs are a very important part of the database, and if one of the columns contains NULL, the result can be totally different from the one expected.

SQL SERVER – T-SQL Scripts to Find Maximum between Two Numbers
I modified a script provided by a friend to find the greater of two numbers. My script has a small bug in it. However, lots of readers have suggested better scripts. Madhivanan has written a blog post on the subject over here.

SQL SERVER – BI Quiz Hint – Performance Tuning Cubes – Hints
This quiz is hosted on my friend Jacob‘s site. I have written many hints on how one can tune cubes. Now one can take part here and win exciting prizes.

SQL SERVER – Solution – Generating Zero Without using Any Numbers in T-SQL
Madhivanan asked a very interesting question on his blog: how to generate zero without using any numbers in T-SQL. He has demonstrated various methods of generating zero. I asked the same question on my blog and got many interesting answers, which I have shared.

SQL SERVER – Solution – Puzzle – Statistics are not Updated but are Created Once
I have to accept that this was the most difficult puzzle. In it I asked why the statistics of the tables are not getting updated even though the settings are correct. The puzzle tests various concepts: 1) indexes, 2) statistics, 3) database settings, etc. There are multiple ways of solving this puzzle. It was interesting, as many took an interest but only a few got it right.

SQL SERVER – Question to You – When to use Function and When to use Stored Procedure
This is a rather straightforward question and not a typical puzzle. The answers from readers are great; however, there is still room for more detailed answers.

SQL SERVER – Selecting Domain from Email Address
I wrote on selecting domains from email addresses. Madhivanan made a puzzle out of a simple question. He wrote a follow-up post over here. In his post he describes various ways one can find email addresses from a list of domains.

Well, this is not a puzzle, but an amazing guest post by Feodor Georgiev, who has written on the subject Job Interviewing the Right Way (and for the Right Reasons). It is an article everyone should read.

Reference: Pinal Dave (http://blog.SQLAuthority.com)

Read the article
jtreg update, March 2012

- by jjg

There is a new update for jtreg 4.1, b04, available. The primary changes have been to support faster and more reliable test runs, especially for tests in the jdk/ repository. [ For users inside Oracle, there is preliminary direct support for gathering code coverage data using jcov while running tests, and for generating a coverage report when all the tests have been run. ] -- jtreg can be downloaded from the OpenJDK jtreg page: http://openjdk.java.net/jtreg/. Scratch directories On platforms like Windows, if a test leaves a file open when the test is over, that can cause a problem for downstream tests, because the scratch directory cannot be emptied beforehand. This is addressed in agentvm mode by discarding any agents using that scratch directory and starting new agents using a new empty scratch directory. Successive directives use suffices _1, _2, etc. If you see such directories appearing in the work directory, that is an indication that files were left open in the preceding directory in the series. Locking support Some tests use shared system resources such as fixed port numbers. This causes a problem when running tests concurrently. So, you can now mark a directory such that all the tests within all such directories will be run sequentially, even if you use -concurrency:N on the command line to run the rest of the tests in parallel. This is seen as a short term solution: it is recommended that tests not use shared system resources whenever possible. If you are running multiple instances of jtreg on the same machine at the same time, you can use a new option -lock:file to specify a file to be used for file locking; otherwise, the locking will just be within the JVM used to run jtreg. "autovm mode" By default, if no options to the contrary are given on the command line, tests will be run in othervm mode. Now, a test suite can be marked so that the default execution mode is "agentvm" mode. 
In conjunction with this, you can now mark a directory such that all the tests within that directory will be run in "othervm" mode. Conceptually, this is equivalent to putting /othervm on every appropriate action on every test in that directory and any subdirectories. This is seen as a short term solution: it is recommended tests be adapted to use agentvm mode, or use "@run main/othervm" explicitly.
Info in test result files
The user name and jtreg version info are now stored in the properties near the beginning of the .jtr file.
Build
The makefiles used to build and test jtreg have been reorganized and simplified. jtreg is now using JT Harness version 4.4.
Other
- jtreg provides access to GNOME_DESKTOP_SESSION_ID when set.
- jtreg ensures that shell tests are given an absolute path for the JDK under test.
- jtreg now honors the "first sentence rule" for the description given by @summary.
- jtreg saves the default locale before executing a test in samevm or agentvm mode, and restores it afterwards.
Bug fixes
- jtreg tried to execute a test even if the compilation failed in agentvm mode because of a JVM crash.
- jtreg did not correctly handle the -compilejdk option.
Acknowledgements
Thanks to Alan, Amy, Andrey, Brad, Christine, Dima, Max, Mike, Sherman, Steve and others for their help, suggestions, bug reports and for testing this latest version.

Read the article
Is this simple XOR encrypted communication absolutely secure?

- by user3123061

Say Alice has a 4GB USB flash memory and Peter also has a 4GB USB flash memory. They meet once and save on both memories two files named alice_to_peter.key (2GB) and peter_to_alice.key (2GB), which contain randomly generated bits. Then they never meet again and communicate electronically. Alice also maintains a variable called alice_pointer and Peter maintains a variable called peter_pointer, both initially set to zero.
When Alice needs to send a message to Peter, she does:
encrypted_message_to_peter[n] = message_to_peter[n] XOR alice_to_peter.key[alice_pointer + n]
where n is the index of the n-th byte of the message. Then alice_pointer is attached at the beginning of the encrypted message, (alice_pointer + encrypted message) is sent to Peter, and alice_pointer is incremented by the length of the message (and for maximum security the used part of the key can be erased).
Peter receives encrypted_message, reads the alice_pointer stored at the beginning of the message and does this:
message_to_peter[n] = encrypted_message_to_peter[n] XOR alice_to_peter.key[alice_pointer + n]
And for maximum security, after reading the message he also erases the used part of the key. - EDIT: In fact this step, with this simple algorithm (without integrity check and authentication), decreases security; see Paulo Ebermann's post below.
When Peter needs to send a message to Alice, they do the analogous steps with peter_to_alice.key and peter_pointer.
With this trivial scheme they can send, each day for the next 50 years, 2GB / (50 * 365) = approx. 115kB of encrypted data in both directions. If they need to send more data, they simply use larger memory for the keys; for example, with today's 2TB hard disks (1TB keys) it is possible to exchange 60MB/day for the next 50 years! (That is practically a lot of data; for example, with compression it is more than an hour of high quality voice communication.)
It seems to me there is no way for an attacker to read the encrypted messages without the keys, even if they have an infinitely fast computer, because even with an infinitely fast computer, by brute force they get every possible message that fits the length of the message, but this is an astronomical number of messages and the attacker doesn't know which of them is the actual message. Am I right? Is this communication scheme really absolutely secure? And if it's secure, does this communication method have its own name? (I mean XOR encryption is well-known, but what's the name of this concrete practical application using large memories on both communication sides for keys? I am humbly expecting that this application has been invented by someone before me :-) )
Note: If it's absolutely secure then it's amazing, because with today's low-cost large memories it is practically a much cheaper way of secure communication than expensive quantum cryptography, and with equivalent security!
EDIT: I think it will be more and more practical in the future with the lower and lower cost of memory. It can solve secure communication forever. Today you have no certainty whether someone will successfully attack existing ciphers a year later and make their often expensive implementations insecure. In many cases, before communication there is a step where the communicating sides meet personally; that's the time to generate large keys. I think it's perfect for military communication, for example for communication with submarines, which can have a hard drive with large keys installed, and the military headquarters can have a hard drive for each submarine they have. It can also be practical in everyday life, for example for controlling your bank account, because when you create your account you meet with the bank, etc.
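The scheme described above matches what is usually called a one-time pad. A minimal Python sketch of the pointer bookkeeping might look like the following; the names Sender, receive and xor_bytes are made up for illustration, a 1024-byte random key stands in for the 2GB key file, and (as the EDIT in the question already notes) nothing here provides integrity checking or authentication.

```python
import secrets
from typing import Tuple


def xor_bytes(data: bytes, key: bytes, offset: int) -> bytes:
    """XOR data with key[offset : offset + len(data)]; refuse to run past the key."""
    if offset < 0 or offset + len(data) > len(key):
        raise ValueError("not enough unused key material")
    return bytes(d ^ key[offset + i] for i, d in enumerate(data))


class Sender:
    """Hypothetical helper: tracks how much of the shared key has been used."""

    def __init__(self, key: bytes):
        self.key = key
        self.pointer = 0  # plays the role of alice_pointer

    def send(self, message: bytes) -> Tuple[int, bytes]:
        # The pointer travels in the clear in front of the ciphertext.
        packet = (self.pointer, xor_bytes(message, self.key, self.pointer))
        self.pointer += len(message)  # key bytes are never reused
        return packet


def receive(key: bytes, packet: Tuple[int, bytes]) -> bytes:
    """Decrypt with the same key bytes the sender used."""
    pointer, ciphertext = packet
    return xor_bytes(ciphertext, key, pointer)


# Stand-in for alice_to_peter.key (assumption: 1024 random bytes instead of 2GB).
alice_to_peter_key = secrets.token_bytes(1024)
alice = Sender(alice_to_peter_key)
first = alice.send(b"hello Peter")
second = alice.send(b"second message")
print(receive(alice_to_peter_key, first), receive(alice_to_peter_key, second))
```

The security argument in the question depends on the key being truly random, kept secret, and never reused, which is why the sketch advances the pointer on every send and raises an error instead of wrapping around when the key runs out.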

Read the article
Merge sort versus quick sort performance

- by Giorgio

I have implemented merge sort and quick sort using C (GCC 4.4.3 on Ubuntu 10.04 running on a 4 GB RAM laptop with an Intel DUO CPU at 2GHz) and I wanted to compare the performance of the two algorithms. The prototypes of the sorting functions are:
void merge_sort(const char **lines, int start, int end);
void quick_sort(const char **lines, int start, int end);
i.e. both take an array of pointers to strings and sort the elements with index i : start <= i <= end. I have produced some files containing random strings with length on average 4.5 characters. The test files range from 100 lines to 10000000 lines. I was a bit surprised by the results because, even though I know that merge sort has complexity O(n log(n)) while quick sort is O(n^2) in the worst case, I have often read that on average quick sort should be as fast as merge sort. However, my results are the following.
Up to 10000 strings, both algorithms perform equally well. For 10000 strings, both require about 0.007 seconds.
For 100000 strings, merge sort is slightly faster with 0.095 s against 0.121 s.
For 1000000 strings merge sort takes 1.287 s against 5.233 s of quick sort.
For 5000000 strings merge sort takes 7.582 s against 118.240 s of quick sort.
For 10000000 strings merge sort takes 16.305 s against 1202.918 s of quick sort.
So my question is: are my results as expected, meaning that quick sort is comparable in speed to merge sort for small inputs but, as the size of the input data grows, the fact that its complexity is quadratic will become evident? Here is a sketch of what I did. In the merge sort implementation, the partitioning consists of calling merge sort recursively, i.e.
merge_sort(lines, start, (start + end) / 2);
merge_sort(lines, 1 + (start + end) / 2, end);
Merging of the two sorted sub-arrays is performed by reading the data from the array lines and writing it to a global temporary array of pointers (this global array is allocated only once).
After each merge the pointers are copied back to the original array. So the strings are stored once but I need twice as much memory for the pointers. For quick sort, the partition function chooses the last element of the array to sort as the pivot and scans the previous elements in one loop. After it has produced a partition of the type start ... {elements <= pivot} ... pivotIndex ... {elements > pivot} ... end it calls itself recursively: quick_sort(lines, start, pivotIndex - 1); quick_sort(lines, pivotIndex + 1, end); Note that this quick sort implementation sorts the array in-place and does not require additional memory, therefore it is more memory efficient than the merge sort implementation. So my question is: is there a better way to implement quick sort that is worthwhile trying out? If I improve the quick sort implementation and perform more tests on different data sets (computing the average of the running times on different data sets) can I expect a better performance of quick sort wrt merge sort? EDIT Thank you for your answers. My implementation is in-place and is based on the pseudo-code I have found on wikipedia in Section In-place version: function partition(array, 'left', 'right', 'pivotIndex') where I choose the last element in the range to be sorted as a pivot, i.e. pivotIndex := right. I have checked the code over and over again and it seems correct to me. In order to rule out the case that I am using the wrong implementation I have uploaded the source code on github (in case you would like to take a look at it). Your answers seem to suggest that I am using the wrong test data. I will look into it and try out different test data sets. I will report as soon as I have some results.
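One thing worth trying, offered here only as a hedged sketch and not as a diagnosis of the code on github: with short random strings, large files will contain many repeated keys, and a last-element pivot with an "elements <= pivot" partition can then produce very unbalanced splits. The Python below (used for brevity; the idea carries over to C) counts the elements examined by such a partition versus a random-pivot, three-way partition on an input made entirely of duplicates. The function names and the 200-element test input are assumptions for illustration.

```python
import random


def quick_sort_last_pivot(a, lo, hi, counter):
    """Last element as pivot, {<= pivot} | pivot | {> pivot}, as described in the question."""
    if lo >= hi:
        return
    pivot = a[hi]
    i = lo
    for j in range(lo, hi):
        counter[0] += 1  # one element examined
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]
    quick_sort_last_pivot(a, lo, i - 1, counter)
    quick_sort_last_pivot(a, i + 1, hi, counter)


def quick_sort_three_way(a, lo, hi, counter):
    """Random pivot with a three-way partition: {< pivot} | {== pivot} | {> pivot}."""
    if lo >= hi:
        return
    pivot = a[random.randint(lo, hi)]
    lt, i, gt = lo, lo, hi
    while i <= gt:
        counter[0] += 1  # one element examined
        if a[i] < pivot:
            a[lt], a[i] = a[i], a[lt]
            lt += 1
            i += 1
        elif a[i] > pivot:
            a[i], a[gt] = a[gt], a[i]
            gt -= 1
        else:
            i += 1
    # Elements equal to the pivot are already in place and are not recursed into.
    quick_sort_three_way(a, lo, lt - 1, counter)
    quick_sort_three_way(a, gt + 1, hi, counter)


duplicates = ["ab"] * 200  # worst case for the "<= pivot" scheme: all keys equal
count_last = [0]
quick_sort_last_pivot(list(duplicates), 0, len(duplicates) - 1, count_last)
count_three = [0]
quick_sort_three_way(list(duplicates), 0, len(duplicates) - 1, count_three)
print(count_last[0], count_three[0])  # prints "19900 200"
```

On 200 equal keys the last-pivot version examines n(n-1)/2 = 19900 elements while the three-way version finishes in a single pass of 200. Whether duplicates are what actually slows down the measurements above depends on the test files, so this is a hypothesis to check rather than a conclusion.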

Read the article
FairWarning Privacy Monitoring Solutions Rely on MySQL to Secure Patient Data

- by Rebecca Hansen

FairWarning® solutions have audited well over 120 billion events, each of which was processed and stored in a MySQL database. FairWarning is the world's leading supplier of privacy monitoring solutions for electronic health records, relied on by over 1,200 Hospitals and 5,000 Clinics to keep their patients' data safe. In January 2014, FairWarning was awarded the highest commendation in healthcare IT as the first ever Category Leader for Patient Privacy Monitoring in the "2013 Best in KLAS: Software & Services" report[1]. FairWarning has used MySQL as their solutions' database from their start in 2005 to worldwide expansion and market leadership. FairWarning recently migrated their solutions from MyISAM to InnoDB and updated from MySQL 5.5 to 5.6. Following are some of the benefits they've had as a result of those changes and reasons for their continued reliance on MySQL (from FairWarning MySQL Case Study).
Scalability to Handle Terabytes of Data
FairWarning's customers have a lot of data:
- On average, FairWarning customers receive over 700,000 events to be processed daily.
- Over 25% of their customers receive over 30 million events per day, which equates to over 1 billion events and nearly one terabyte (TB) of new data each month.
- Databases range in size from a few hundred GBs to 10+ TBs for enterprise deployments (data are rolled off after 13 months).
Low or Zero Admin = Few DBAs
"MySQL has not required a lot of administration. After it's been tuned, configured, and optimized for size on initial setup, we have very low administrative costs. I can scale and add more customers without adding DBAs. This has had a big, positive impact on our business." - Chris Arnold, FairWarning Vice President of Product Management and Engineering.
Performance Schema
As the size of FairWarning's customers has increased, so have their tables and data volumes. MySQL 5.6's new maintenance and management features have helped FairWarning keep up.
In particular, the MySQL 5.6 performance schema's low-level metrics have provided critical insight into how the system is performing and why.
Support for Multi-CPU Threads
MySQL 5.6's support for multiple concurrent CPU threads and FairWarning's custom data loader allow multiple files to load into a single table simultaneously vs. one at a time. As a result, their data load time has been reduced by 500%.
MySQL Enterprise Hot Backup
Because hospitals and clinics never stop, FairWarning solutions can't either. FairWarning changed from using mysqldump to MySQL Enterprise Hot Backup, which has reduced downtime, restore time, and storage requirements. For many of their larger customers, restore time has decreased by 80%.
MySQL Enterprise Edition and Product Roadmap Provide Complete Solution
"MySQL's product roadmap fully addresses our needs. We like the fact that MySQL Enterprise Edition has everything included; there's no need to purchase separate modules." - Chris Arnold
Learn More>>
- FairWarning MySQL Case Study
- Why MySQL 5.6 is an Even Better Embedded Database for Your Products presentation
- Updating Your Products to MySQL 5.6, Best Practices for OEMs on-demand webinar (audio and / or slides + Q&A transcript)
- MyISAM to InnoDB – Why and How on-demand webinar (same stuff)
- Top 10 Reasons to Use MySQL as an Embedded Database white paper
[1] 2013 Best in KLAS: Software & Services report, January, 2014. © 2014 KLAS Enterprises, LLC. All rights reserved.

Read the article
MySQL Cluster 7.3: On-Demand Webinar and Q&A Available

- by Mat Keep

The on-demand webinar for the MySQL Cluster 7.3 Development Release is now available. You can learn more about the design, implementation and getting started with all of the new MySQL Cluster 7.3 features from the comfort and convenience of your own device, including:
- Foreign Key constraints in MySQL Cluster
- Node.js NoSQL API
- Auto-installation of higher performance, distributed clusters
We received some great questions over the course of the webinar, and I wanted to share those for the benefit of a broader audience.
Q. What Foreign Key actions are supported?
A. The core referential actions defined in the SQL:2003 standard are implemented: CASCADE, RESTRICT, NO ACTION, SET NULL
Q. Where are Foreign Keys implemented, i.e. data nodes or SQL nodes?
A. They are implemented in the data nodes, and therefore can be enforced for both the SQL and NoSQL APIs.
Q. Are they compatible with the InnoDB Foreign Key implementation?
A. Yes, with the following exceptions:
- InnoDB doesn't support "No Action" constraints, MySQL Cluster does.
- You can choose to suspend FK constraint enforcement with InnoDB using the FOREIGN_KEY_CHECKS parameter; at the moment, MySQL Cluster ignores that parameter.
- You cannot set up FKs between 2 tables where one is stored using MySQL Cluster and the other InnoDB.
- You cannot change primary keys through the NDB API, which means that the MySQL Server actually has to simulate such operations by deleting and re-adding the row. If the PK in the parent table has a FK constraint on it then this causes non-ideal behaviour. With Restrict or No Action constraints, the change will result in an error. With Cascaded constraints, you'd want the rows in the child table to be updated with the new FK value but the implicit delete of the row from the parent table would remove the associated rows from the child table, and the subsequent implicit insert into the parent wouldn't reinstate the child rows.
For this reason, an attempt to add an ON UPDATE CASCADE where the parent column is a primary key will be rejected.
Q. Does adding or dropping Foreign Keys cause downtime due to a schema change?
A. Nope, this is an online operation. MySQL Cluster supports a number of on-line schema changes, i.e. adding and dropping indexes, adding columns, etc.
Q. Where can I see an example of node.js with MySQL Cluster?
A. Check out the tutorial and download the code from GitHub.
Q. Can I use the auto-installer to support remote deployments? How about setting up MySQL Cluster 7.2?
A. Yes to both!
Q. Can I get a demo?
A. Check out the tutorial. You can download the code from http://labs.mysql.com/ (go to the Select Build drop-down box).
Q. What is the minimum internet speed required for a geo-distributed cluster with synchronous replication?
A. If you're splitting your cluster between sites then we recommend a network latency of 20ms or less. Alternatively, use MySQL asynchronous replication, where the latency of your WAN doesn't impact the latency of your reads/writes.
Q. Where can one learn more about the PayPal project with MySQL Cluster?
A. Take a look at the following - you'll find press coverage, a video and slides from their keynote presentation.
So, if you want to learn more, listen to the new MySQL Cluster 7.3 on-demand webinar. MySQL Cluster 7.3 is still in the development phase, so it would be great to get your feedback on these new features, and things you want to see!

Read the article
MEB Support to NetBackup MMS

- by Hema Sridharan

In MySQL Enterprise Backup 3.6, a new option was introduced to support backup to tapes via the SBT interface. SBT stands for System Backup to Tape, an Oracle API that helps to perform backup and restore jobs via media management software (MMS) such as Oracle's Secure Backup (OSB). There are other storage managers, like IBM's Tivoli Storage Manager (TSM) and Symantec's NetBackup (NB), which are also supported by MEB, but we don't guarantee that it will function as expected for every release. MEB supports SBT API version 2.0. In this blog, I am primarily going to focus on the interface between MEB and Symantec's NB. If we are using tapes for backup, ensure that the tape library and tape drives are compatible.
Test Setup
1. Install NB 7.5 master and media servers on Linux. (NB 7.1 can also be used, but for testing purposes I used NB 7.5.)
2. Install MEB 3.8, also on Linux.
3. Install the NB admin console on your Windows desktop and configure the NB master server from there.
Note: Ensure that you have root user permission to install NetBackup.
Configuration Steps for MEB and NB
Once MEB and NB are installed, ensure that NB is linked to MEB by specifying the library /usr/openv/netbackup/bin/libobk.so64 in the mysqlbackup command line using --sbt-lib-path. Configure the NB master server from the Windows console; that is, configure the storage units by specifying the storage unit name, disk type, media server name, etc. Create NetBackup policies that are user selectable, and make sure that the policy type is "Oracle". Define the clients where MEB will be executed. Sometimes this will be a different host where MEB is run, and sometimes the same media server where NB and the tapes are attached.
Once the installation and configuration steps are performed for MEB and NB, the next part is the actual execution. MEB should be run as a single-file backup using the --backup-image option with the prefix sbt: (a tag which tells MEB that it should stream the backup image through the SBT interface), which is sent to the NB client via the SBT interface. The resulting backup image is stored where NB stores the images that it backs up. The following diagram shows how MEB interacts with MMS through the SBT interface.
Backup
The following parameters should also be ready for the execution:
--sbt-lib-path: Path to the SBT library specific to NetBackup MMS. The SBT library for NetBackup is /usr/openv/netbackup/bin/libobk.so64
--sbt-environment: Environment variables specific to NetBackup must be defined. In our example below, we use NB_ORA_SERV=myserver.com, NB_ORA_CLIENT=myserver.com, NB_ORA_POLICY=NBU-MEB, ORACLE_HOME=/export/home2/tmp/hema/mysql-server/
./mysqlbackup --port=13000 --protocol=tcp --user=root --backup-image=sbt:bkpsbtNB --sbt-lib-path=/usr/openv/netbackup/bin/libobk.so64 --sbt-environment="NB_ORA_SERV=myserver.com, NB_ORA_CLIENT=myserver.com, NB_ORA_POLICY=NBU-MEB, ORACLE_HOME=/export/home2/tmp/hema/mysql-server/" --backup-dir=/export/home2/tmp/hema/MEB_bkdir/ backup-to-image
Once the backup is completed successfully, this should appear in the Activity Monitor in the NetBackup Console.
For restore, the image contents have to be extracted using the image-to-backup-dir command, and then the apply-log and copy-back steps are applied.
./mysqlbackup --sbt-lib-path=/usr/openv/netbackup/bin/libobk.so64 --backup-dir=/export/home2/tmp/hema/NBMEB/ --backup-image=sbt:bkpsbtNB image-to-backup-dir
Now apply the logs as usual, shut down the server, perform the restore, restart the server and check the data contents.
./mysqlbackup --backup-dir=/export/home2/tmp/hema/NBMEB/ apply-log
./mysqlbackup --datadir=/export/home2/tmp/hema/mysql-server/mysql-5.5-meb-repo/mysql-test/var/mysqld.1/data/ --backup-dir=/export/home2/tmp/hema/MEB_bkpdir/ --innodb_log_files_in_group=2 --innodb_log_file_size=5M --user=root --port=13000 --protocol=tcp copy-back
The NB console should show the "Restore" job as done. If you don't see that, there is something wrong with MEB or NetBackup. You can also refer to more detailed steps of MEB and NB integration in the whitepaper here

Read the article
Page output caching for dynamic web applications

- by Mike Ellis

I am currently working on a web application where the user steps (forward or back) through a series of pages with "Next" and "Previous" buttons, entering data until they reach a page with the "Finish" button. Until finished, all data is stored in Session state, then sent to the mainframe database via web services at the end of the process. Some of the pages display data from previous pages in order to collect additional information. These pages can never be cached because they are different for every user. Pages that don't display this dynamic data can be cached, but only the first time they load. After that, the data that was previously entered needs to be displayed. This requires Page_Load to fire, which means the page can't be cached at that point. A couple of weeks ago, I knew almost nothing about implementing page caching. Now I still don't know much, but I know a little bit, and here is the solution that I developed with the help of others on my team and a lot of reading and trial-and-error. We have a base page class defined from which all pages inherit. In this class I have defined a method that sets the caching settings programmatically. Pages that can be cached call this base page method in their Page_Load event within an if(!IsPostBack) block, which ensures that only the page itself gets cached, not the data on the page.
    if(!IsPostBack) { ... SetCacheSettings(); ... }
    protected void SetCacheSettings() {
        Response.Cache.AddValidationCallback(new HttpCacheValidateHandler(Validate), null);
        Response.Cache.SetExpires(DateTime.Now.AddHours(1));
        Response.Cache.SetSlidingExpiration(true);
        Response.Cache.SetValidUntilExpires(true);
        Response.Cache.SetCacheability(HttpCacheability.ServerAndNoCache);
    }
The AddValidationCallback sets up an HttpCacheValidateHandler method called Validate which runs logic when a cached page is requested. The Validate method signature is standard for this method type.
    public static void Validate(HttpContext context, Object data, ref HttpValidationStatus status) {
        string visited = context.Request.QueryString["v"];
        if (visited != null && "1".Equals(visited)) {
            status = HttpValidationStatus.IgnoreThisRequest; //force a page load
        } else {
            status = HttpValidationStatus.Valid; //load from cache
        }
    }
I am using the HttpValidationStatus values IgnoreThisRequest or Valid which forces the Page_Load event method to run or allows the page to load from cache, respectively. Which one is set depends on the value in the querystring. The value in the querystring is set up on each page in the "Next" and "Previous" button click event methods based on whether the page that the button click is taking the user to has any data on it or not.
    bool hasData = HasPageBeenVisited(url);
    if (hasData) { url += VISITED; }
    Response.Redirect(url);
The HasPageBeenVisited method determines whether the destination page has any data on it by checking one of its required data fields. (I won't include it here because it is very system-dependent.) VISITED is a string constant containing "?v=1" and gets appended to the url if the destination page has been visited. The reason this logic is within the "Next" and "Previous" button click event methods is because 1) the Validate method is static which doesn't allow it to access non-static data such as the data fields for a particular page, and 2) at the time at which the Validate method runs, either the data has not yet been deserialized from Session state or is not available (different AppDomain?) because anytime I accessed the Session state information from the Validate method, it was always empty.

Read the article
Speeding up procedural texture generation

- by FalconNL

Recently I've begun working on a game that takes place in a procedurally generated solar system. After a bit of a learning curve (having neither worked with Scala, OpenGL 2 ES or Libgdx before), I have a basic tech demo going where you spin around a single procedurally textured planet: The problem I'm running into is the performance of the texture generation. A quick overview of what I'm doing: a planet is a cube that has been deformed to a sphere. To each side, a n x n (e.g. 256 x 256) texture is applied, which are bundled in one 8n x n texture that is sent to the fragment shader. The last two spaces are not used, they're only there to make sure the width is a power of 2. The texture is currently generated on the CPU, using the updated 2012 version of the simplex noise algorithm linked to in the paper 'Simplex noise demystified'. The scene I'm using to test the algorithm contains two spheres: the planet and the background. Both use a greyscale texture consisting of six octaves of 3D simplex noise, so for example if we choose 128x128 as the texture size there are 128 x 128 x 6 x 2 x 6 = about 1.2 million calls to the noise function. The closest you will get to the planet is about what's shown in the screenshot and since the game's target resolution is 1280x720 that means I'd prefer to use 512x512 textures. Combine that with the fact the actual textures will of course be more complicated than basic noise (There will be a day and night texture, blended in the fragment shader based on sunlight, and a specular mask. I need noise for continents, terrain color variation, clouds, city lights, etc.) and we're looking at something like 512 x 512 x 6 x 3 x 15 = 70 million noise calls for the planet alone. In the final game, there will be activities when traveling between planets, so a wait of 5 or 10 seconds, possibly 20, would be acceptable since I can calculate the texture in the background while traveling, though obviously the faster the better. 
Getting back to our test scene, performance on my PC isn't too terrible, though still too slow considering the final result is going to be about 60 times worse: 128x128 : 0.1s 256x256 : 0.4s 512x512 : 1.7s This is after I moved all performance-critical code to Java, since trying to do so in Scala was a lot worse. Running this on my phone (a Samsung Galaxy S3), however, produces a more problematic result: 128x128 : 2s 256x256 : 7s 512x512 : 29s Already far too long, and that's not even factoring in the fact that it'll be minutes instead of seconds in the final version. Clearly something needs to be done. Personally, I see a few potential avenues, though I'm not particularly keen on any of them yet: Don't precalculate the textures, but let the fragment shader calculate everything. Probably not feasible, because at one point I had the background as a fullscreen quad with a pixel shader and I got about 1 fps on my phone. Use the GPU to render the texture once, store it and use the stored texture from then on. Upside: might be faster than doing it on the CPU since the GPU is supposed to be faster at floating point calculations. Downside: effects that cannot (easily) be expressed as functions of simplex noise (e.g. gas planet vortices, moon craters, etc.) are a lot more difficult to code in GLSL than in Scala/Java. Calculate a large amount of noise textures and ship them with the application. I'd like to avoid this if at all possible. Lower the resolution. Buys me a 4x performance gain, which isn't really enough plus I lose a lot of quality. Find a faster noise algorithm. If anyone has one I'm all ears, but simplex is already supposed to be faster than perlin. Adopt a pixel art style, allowing for lower resolution textures and fewer noise octaves. While I originally envisioned the game in this style, I've come to prefer the realistic approach. I'm doing something wrong and the performance should already be one or two orders of magnitude better. 
If this is the case, please let me know. If anyone has any suggestions, tips, workarounds, or other comments regarding this problem I'd love to hear them.
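The call counts quoted above are easy to sanity-check. The short Python sketch below reproduces them; the function name and the parameter labels (faces, layers, calls_per_texel) are an interpretation of the factors in the question, not something it states, and the phone estimate is a plain linear extrapolation from the 128x128 timing.

```python
def noise_calls(size, faces, layers, calls_per_texel):
    """Noise evaluations: one per texel, per cube face, per layer, per call on that texel."""
    return size * size * faces * layers * calls_per_texel


# Test scene: 128x128, 6 faces, 2 spheres (planet + background), 6 octaves.
test_scene = noise_calls(128, faces=6, layers=2, calls_per_texel=6)

# Planned planet: 512x512, 6 faces, 3 textures, roughly 15 noise calls per texel.
final_planet = noise_calls(512, faces=6, layers=3, calls_per_texel=15)

ratio = final_planet / test_scene

# Naive linear extrapolation from the 2 s measured on the phone at 128x128.
phone_estimate_seconds = 2 * ratio

print(test_scene, final_planet, ratio, phone_estimate_seconds)
```

This gives 1,179,648 calls for the test scene and 70,778,880 for the planned planet, a factor of exactly 60, which lines up with the "about 60 times worse" figure and puts the naive phone estimate at around two minutes per planet.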

Read the article
Is SugarCRM really adequate for custom development (or adequate at all)? [closed]

- by dukeofgaming

Have you used SugarCRM for custom development successfully? If so, have you done it programmatically or through the Module Builder? Were you successful? If not, why? I used SugarCRM for a project about two years ago. I ran into errors from the very installation, having to hack the actual installation file to deploy the software on the server, and other errors that I can't recall now. Two years later, I'm picking it up for a project once again, and I'm feeling like I should have developed the whole thing from scratch myself. Some examples:
- I couldn't install it on the server (again). I had to install it locally, then copy the files and database over to the server and manually edit the config file.
- Constantly getting deployment errors from the module builder. One reason is that SugarCRM keeps creating a record in the upgrade_history table for a file that does not exist; I keep deleting that record and it keeps coming back corrupt.
- I get other deployment errors, but have not figured them out. Then I have to roll back all files and the database to try again.
- I deleted a custom module with relationships; the relationships stayed in the other modules and cannot be deleted anymore, with PHP warnings all over the place.
- Quick create for custom modules does not appear; a hack is needed.
- Its whole cache directory is a joke; permanent data/files are stored there.
- The module builder interface makes required fields disappear.
- Edit the wrong thing and the module builder won't deploy again; then pray that Quick Repair and/or Rebuild Relationships do the trick.
My impression of SugarCRM now is that, regardless of its pretty exterior and apparent functionality, it is a very low quality piece of software. This scared me even more: http://amplicate.com/hate/sugarcrm; a quote: I wis this info had been available when I tried to implement it 2 years ago... I searched high and low and the only info I found was positive. Yes, it's a piece of crap. The community edition was full of bugs... nothing worked.
Essentially I got fired for implementing it. I'm glad though, because now I work for myself, am much happier and make more money... so, I should really thank SugarCRM for sucking so much I guess! I figured that perhaps some of you have had similar experiences, and have either stuck with SugarCRM or moved on to another solution. I'm very interested in knowing what your resolutions were, or what your current situations are, to make up my own mind, since the project I'm working on is long term and I'm feeling SugarCRM will be more of an obstacle than an aid. After further failed attempts to continue using this software I kept stumbling upon dead ends when using the module editor, and I could only recover from these errors by using version control. We are now moving on to a custom implementation using Symfony; perhaps if we were using it with its out-of-the-box modules we would have stuck with it.

Read the article
Big Data – Basics of Big Data Architecture – Day 4 of 21

- by Pinal Dave

In yesterday's blog post we understood how the Big Data evolution happened. Today we will understand the basics of Big Data architecture.
Big Data Cycle
Just like every other database-related application, a Big Data project has its own development cycle, though the three Vs certainly play an important role in deciding the architecture of Big Data projects. Just like every other project, a Big Data project goes through similar phases of capturing, transforming, integrating and analyzing the data, and building actionable reporting on top of it. While the process looks almost the same, the architecture is often totally different due to the nature of the data. Here are a few of the questions which everyone should ask before going ahead with a Big Data architecture.
Questions to Ask
- How big is your total database?
- What is your reporting requirement in terms of time: real time, semi real time or at frequent intervals?
- How important is data availability and what is the plan for disaster recovery?
- What are the plans for network and physical security of the data?
- What platform will be the driving force behind the data and what are the different service level agreements for the infrastructure?
These are just basic questions; based on your application and business needs you should come up with a custom list of questions to ask. As I mentioned earlier, these questions may look quite simple but the answers will not be. When we are talking about a Big Data implementation there are many other important aspects which we have to consider when we decide on the architecture.
Building Blocks of Big Data Architecture
It is absolutely impossible to discuss and nail down the most optimal architecture for any Big Data solution in a single blog post; however, we can discuss the basic building blocks of Big Data architecture. Here is the image which I have built to explain how the building blocks of the Big Data architecture work.
The above image gives a good overview of how the various components in a Big Data architecture are associated with each other. In Big Data, many different data sources are part of the architecture, hence extraction, transformation and integration form one of the most essential layers of the architecture. Most of the data is stored in relational as well as non-relational data marts and data warehousing solutions. As per the business need, the data is processed and converted into proper reports and visualizations for end users. Just like the software, the hardware is one of the most important parts of the Big Data architecture: failover instances as well as redundant physical infrastructure are usually implemented.

NoSQL in Data Management

NoSQL is a very famous buzzword, and it really means Not Relational SQL or Not Only SQL. This is because in a Big Data architecture the data can be in any format: unstructured, relational, or any other format, from any other data source. Relational technology alone is not enough to bring all this data together, hence new tools, architectures and algorithms have been invented which take care of all these kinds of data. These are collectively called NoSQL.

Tomorrow

Over the next four days we will look at the buzzword Hadoop.

Reference: Pinal Dave (http://blog.sqlauthority.com)
Filed under: Big Data, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Read the article
std::map for storing static const Objects

- by Sean M.

I am making a game similar to Minecraft, and I am trying to find a way to keep a map of Block objects sorted by their id. This is almost identical to the way that Minecraft does it: it declares a bunch of static final Block objects and initializes them, and the constructor of each block puts a reference to that block into whatever the Java equivalent of a std::map is, so there is a central place to get ids and the Blocks with those ids. The problem is that I am making my game in C++ and trying to do the exact same thing.

In Block.h, I am declaring the Blocks like so:

    //Block.h
    public:
        static const Block Vacuum;
        static const Block Test;

And in Block.cpp I am initializing them like so:

    //Block.cpp
    const Block Block::Vacuum = Block("Vacuum", 0, 0);
    const Block Block::Test = Block("Test", 1, 0);

The block constructor looks like this:

    Block::Block(std::string name, uint16 id, uint8 tex)
    {
        //Check for repeat ids
        if (IdInUse(id))
        {
            fprintf(stderr, "Block id %u is already in use!", (uint32)id);
            throw std::runtime_error("You cannot reuse block ids!");
        }
        _id = id;

        //Check for repeat names
        if (NameInUse(name))
        {
            //%s expects a C string, so pass name.c_str() rather than the std::string itself
            fprintf(stderr, "Block name %s is already in use!", name.c_str());
            throw std::runtime_error("You cannot reuse block names!");
        }
        _name = name;

        _tex = tex;
        //fprintf(stdout, "Using texture %u\n", _tex);
        _transparent = false;
        _solidity = 1.0f;

        idMap[id] = this;
        nameMap[name] = this;
    }

And finally, the maps that I'm using to store references to Blocks in relation to their names and ids are declared as such:

    std::map<uint16, Block*> Block::idMap = std::map<uint16, Block*>(); //The map of block ids
    std::map<std::string, Block*> Block::nameMap = std::map<std::string, Block*>(); //The map of block names

The problem comes when I try to get the Blocks in the maps using a method called const Block* GetBlock(uint16 id), where the last line is return idMap.at(id);.
This line returns a Block with completely random values, such as _visibility = 0xcccc, as I found out through debugging. So my question is: is there something wrong with the blocks being declared as const objects, then stored as pointers and accessed later on? The reason I can't store them as Block& is that doing so makes a copy of the Block when it is entered, so the block wouldn't have any of the attributes that could be set afterwards in the constructor of any child class, so I think I need to store them as pointers. Any help is greatly appreciated, as I don't fully understand pointers yet. Just ask if you need to see any other parts of the code.
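
One possible explanation, offered as a sketch rather than a confirmed diagnosis, involves two hazards in the code above. First, `const Block Block::Vacuum = Block("Vacuum", 0, 0);` may build a temporary whose `this` is what gets registered (copy elision was only guaranteed from C++17 on); the temporary is then destroyed, and 0xCC is the pattern MSVC debug builds use to fill uninitialized stack memory. Second, the static maps may not yet be constructed when the static blocks register themselves. The version below assumes `std::uint16_t`/`std::uint8_t` in place of the question's `uint16`/`uint8` typedefs, and the names `IdMap()`, `NameMap()`, `GetBlockByName()`, `name()`, `id()` and `tex()` are hypothetical helpers introduced for illustration.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

class Block {
public:
    // Defined below with direct initialization, so the registered `this`
    // is the final object and never a temporary.
    static const Block Vacuum;
    static const Block Test;

    Block(std::string name, std::uint16_t id, std::uint8_t tex)
        : name_(std::move(name)), id_(id), tex_(tex) {
        // Validate both keys first so a rejected block leaves no pointer behind.
        if (IdMap().count(id_) != 0)
            throw std::runtime_error("You cannot reuse block ids!");
        if (NameMap().count(name_) != 0)
            throw std::runtime_error("You cannot reuse block names!");
        IdMap()[id_] = this;
        NameMap()[name_] = this;
    }

    // Copying is disabled: a copy would not be the object the maps point to.
    Block(const Block&) = delete;
    Block& operator=(const Block&) = delete;

    // Both lookups throw std::out_of_range for an unknown key (std::map::at).
    static const Block* GetBlock(std::uint16_t id) { return IdMap().at(id); }
    static const Block* GetBlockByName(const std::string& name) {
        return NameMap().at(name);
    }

    const std::string& name() const { return name_; }
    std::uint16_t id() const { return id_; }
    std::uint8_t tex() const { return tex_; }

private:
    // Function-local statics are constructed on first use, so the maps
    // exist before the first Block constructor touches them.
    static std::map<std::uint16_t, const Block*>& IdMap() {
        static std::map<std::uint16_t, const Block*> map;
        return map;
    }
    static std::map<std::string, const Block*>& NameMap() {
        static std::map<std::string, const Block*> map;
        return map;
    }

    std::string name_;
    std::uint16_t id_;
    std::uint8_t tex_;
};

// Direct initialization: no temporary Block is created or copied.
const Block Block::Vacuum("Vacuum", 0, 0);
const Block Block::Test("Test", 1, 0);
```

With this layout, `Block::GetBlock(0)` returns the address of `Block::Vacuum` itself. The sketch assumes every registered block lives for the whole program; a block destroyed earlier would leave a dangling pointer in both maps.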

Read the article
Entity Framework - Single EDMX Mapping Multiple Databases

- by michaelalisonalviar

Because of my recent craze for Entity Framework, thanks to Sir Humprey, I have been continuously searching the Internet for tutorials on how to apply it to our current system. I've come to learn that with EF I can eliminate the numerous hand-coded methods/functions for CRUD operations and my overused assigning of connection strings, Data Adapters or Data Readers: Entity Framework will map my desired database, do its magic to create entities for each table I want (using EF Power Tools), and provide all the methods/functions for my CRUD operations.

But as I began applying it to a new project I was assigned to, I realized our current server is designed to keep similar entities in different databases. For example, our lookup tables are stored in LookupDB, accounting-related tables are in AccountingDB, and sales-related tables are in SalesDB. My dilemma is that I have to use an existing table from LookupDB as a lookup for my new table. Then I found Miss Rachel's blog (here). Thank you, Miss Rachel! It enables me to let EF think that my TableLookup1 is in the AccountingDB, using the following steps.

I'm on VS 2010, using C#, Entity Framework 5, and SQL Server 2008 as our DB server.

Step 1: Creating a SQL synonym. If you want a more detailed discussion of synonyms, this is what I read -> (link here). Simply put, a synonym enabled me to simplify my query for the lookup table when I'm using the AccountingDB, from

SELECT [columns] FROM LookupDB.dbo.TableLookup1

to

SELECT [columns] FROM TableLookup1

Syntax: CREATE SYNONYM TableLookup1 (1) FOR LookupDB.dbo.TableLookup1 (2)
(1) What you want to call the table in your other DB
(2) DatabaseName.schema.TableName

Step 2: We will now follow Miss Rachel's steps. You can either visit the link to the original topic I posted earlier or just follow the steps I took.

1. I created a Visual Studio solution that will contain the 4 projects needed to complete the merging.
2. The first project will contain the edmx file pointing to the AccountingDB.
3. The second project will contain the edmx file pointing to the LookupDB.
4. The third project will be our repository for the merged edmx file. Create an edmx file pointing to AccountingDB, as this is the database that we created the synonym on. Reminder: aside from using the same name for the entities, please make sure that you have the same Model Namespace for all your entities.
5. The fourth project will contain the beautiful EDMX merger that Miss Rachel created, which frees you from hard-coding the merge or recoding the edmx file of the third project every time a change is made to either of the first two projects' edmx files.
6. Run the solution, but make sure that in the solution's properties "Single startup project" is selected and the project containing the EDMX merger is chosen.
7. After running the solution, double-click the EDMX file of the third project and set Lazy Loading Enabled = False. This will let you use the tables/entities that you see in that EDMX file.
8. Feel free to do your CRUD operations.

I don't know if EF 5 already has a feature to support synonyms, as I am still a newbie in that respect, but I have seen a link listing suggested Entity Framework upgrades, and one of them is "Support for multiple databases". So that's it! Thanks for reading!

Read the article
Using GMail's SMTP and IMAP servers in Notification Mailer

- by Saroja Kandepuneni

Overview

GMail offers free, reliable and popular SMTP and IMAP services, which is why many people are interested in using it. GMail can be used for testing or debugging purposes when there are no in-house SMTP/IMAP servers. This blog explains how to install the GMail SSL certificate in the concurrent tier, test the connection using a standalone program, run Mailer diagnostics, and configure the GMail IMAP and SMTP servers for the Workflow Notification Mailer inbound and outbound connections.

GMail servers configuration

SMTP server
Host Name: smtp.gmail.com
SSL Port: 465
TLS/SSL Required: Yes
User Name: Your full email address (including @gmail.com or @your_domain.com)
Password: Your GMail password

IMAP server
Host Name: imap.gmail.com
SSL Port: 993
TLS/SSL Required: Yes
User Name: Your full email address (including @gmail.com or @your_domain.com)
Password: Your GMail password

GMail SSL Certificate Installation

The following is the procedure to install the GMail SSL certificate.

Copy the below GMail SSL certificate to a file, e.g. gmail.cer:

-----BEGIN CERTIFICATE-----
MIIDWzCCAsSgAwIBAgIKaNPuGwADAAAisjANBgkqhkiG9w0BAQUFADBGMQswCQYDVQQGEwJVUzETMBEGA1UEChMKR29vZ2xlIEluYzEiMCAGA1UEAxMZR29vZ2xlIEludGVybmV0IEF1dGhvcml0eTAeFw0xMTAyMTYwNDQzMDRaFw0xMjAyMTYwNDUzMDRaMGgxCzAJBgNVBAYTAlVTMRMwEQYDVQQIEwpDYWxpZm9ybmlhMRYwFAYDVQQHEw1Nb3VudGFpbiBWaWV3MRMwEQYDVQQKEwpHb29nbGUgSW5jMRcwFQYDVQQDEw5pbWFwLmdtYWlsLmNvbTCBnzANBgkqhkiG9w0BAQEFAAOBjQAwgYkCgYEAqfPyPSEHpfzvXx+9zGUxoxcOXFrGKCbZ8bfUd8JonC7rfId32t0gyAoLCgM6eU4lN05VenNZUoChL/nrX+ApdMQv9UFV58aYSBMU/pMmK5GXansbXlpHao09Mc8eur2xV+4cnEtxUvzpco/OaG15HDXcr46c6hN6P4EEFRcb0ccCAwEAAaOCASwwggEoMB0GA1UdDgQWBBQj27IIOfeIMyk1hDRzfALz4WpRtzAfBgNVHSMEGDAWgBS/wDDr9UMRPme6npH7/Gra42sSJDBbBgNVHR8EVDBSMFCgTqBMhkpodHRwOi8vd3d3LmdzdGF0aWMuY29tL0dvb2dsZUludGVybmV0QXV0aG9yaXR5L0dvb2dsZUludGVybmV0QXV0aG9yaXR5LmNybDBmBggrBgEFBQcBAQRaMFgwVgYIKwYBBQUHMAKGSmh0dHA6Ly93d3cuZ3N0YXRpYy5jb20vR29vZ2xlSW50ZXJuZXRBdXRob3JpdHkvR29vZ2xlSW50ZXJuZXRBdXRob3JpdHkuY3J0MCEGCSsGAQQBgjcUAgQUHhIAVwBlAGIAUwBlAHIAdgBlAHIwDQYJKoZIhvcNAQEFBQADgYEAxHVhW4aII3BPrKQGUdhOLMmdUyyr3TVmhJM9tPKhcKQ/IcBYUev6gLsB7FH/n2bIJkkIilwZWIsj9jVJaQyJWP84Hjs3kus4fTpAOHKkLqrbIZDYjwVueLmbOqr1U1bNe4E/LTyEf37+Y5hcveWBQduIZnHn1sDE2gA7LnUxvAU=
-----END CERTIFICATE-----

Install the SSL certificate into the default JRE location or any other location using the commands below.

Installing into the default JRE location in an EBS instance:

# keytool -import -trustcacerts -keystore $AF_JRE_TOP/lib/security/cacerts -storepass changeit -alias gmail-lnx_chainnedcert -file gmail.cer

Installing into a custom location:

# keytool -import -trustcacerts -keystore <customLocation> -storepass changeit -alias gmail-lnx_chainnedcert -file gmail.cer

<customLocation> -- directory in the instance where the certificate needs to be installed

After running the above command you can see the following response:

Trust this certificate? [no]: yes
Certificate was added to keystore

Running Mailer Command Line Diagnostics

Run Mailer command line diagnostics from the concurrent tier where Mailer is running, to check the IMAP connection, using the command below:

$AFJVAPRG -classpath $AF_CLASSPATH -Dprotocol=imap -Ddbcfile=$FND_SECURE/$TWO_TASK.dbc -Dserver=imap.gmail.com -Dport=993 -Dssl=Y -Dtruststore=$AF_JRE_TOP/lib/security/cacerts -Daccount=<gmail username> -Dpassword=<password> -Dconnect_timeout=120 -Ddebug=Y -Dlogfile=GmailImapTest.log -DdebugMailSession=Y oracle.apps.fnd.wf.mailer.Mailer

Run Mailer command line diagnostics from the concurrent tier where Mailer is running, to check the SMTP connection, using the command below:

$AFJVAPRG -classpath $AF_CLASSPATH -Dprotocol=smtp -Ddbcfile=$FND_SECURE/$TWO_TASK.dbc -Dserver=smtp.gmail.com -Dport=465 -Dssl=Y -Dtruststore=$AF_JRE_TOP/lib/security/cacerts -Daccount=<gmail username> -Dpassword=<password> -Dconnect_timeout=120 -Ddebug=Y -Dlogfile=GmailSmtpTest.log -DdebugMailSession=Y oracle.apps.fnd.wf.mailer.Mailer

Standalone program to verify the IMAP connection

Run the below standalone program from the concurrent tier node where Mailer is running to verify the connection with the GMail IMAP server. It connects to the GMail IMAP server with the given GMail user name and password and lists all the folders that exist in that account. If the GMail IMAP server is not working for the Mailer, check whether the PROCESSED and DISCARD folders exist for the GMail account; if not, create them manually by logging into the GMail account.

Sample program to test GMail IMAP connection

The standalone program can be run as below:

$java GmailIMAPTest GmailUsername GMailUserPassword

Standalone program to verify the SMTP connection

Run the below standalone program from the concurrent tier node where Mailer is running to verify the connection with the GMail SMTP server.
It connects to the GMail SMTP server by authenticating with the given user name and password and sends a test email message to the given recipient email address.

Sample program to test GMail SMTP connection

The standalone program can be run as below:

$java GmailSMTPTest GmailUsername gMailPassword recipientEmailAddress

Warnings

As gmail.com is an external domain, the Mailer concurrent tier must be allowed to connect to the GMail server.
Please keep in mind when using it in corporate facilities that the e-mail data would be stored outside the corporate network.

Read the article
Best Design Pattern for Coupling User Interface Components and Data Structures

- by szahn

I have a Windows desktop application with a tree view. Due to the lack of a sound data-binding solution for a tree view, I've implemented my own layer of abstraction on it to bind nodes to my own data structure. The requirements are as follows:

Populate a tree view with nodes that resemble fields in a data structure.
When a node is clicked, display the appropriate control to modify the value of that property in the instance of the data structure.

The tree view is populated with instances of custom TreeNode classes that inherit from TreeNode. The responsibility of each custom TreeNode class is to (1) format the node text to represent the name and value of the associated field in my data structure, (2) return the control used to modify the property value, (3) get the value of the field in the control, and (4) set the field's value from the control. My custom TreeNode implementation has a property called "Control" which retrieves the proper custom control in the form of the base control. The control instance is stored in the custom node and instantiated upon first retrieval. So each custom node has an associated custom control which extends a base abstract control class.
Example TreeNode implementation:

    //The Tree Node Base Class
    public abstract class TreeViewNodeBase : TreeNode
    {
        public abstract CustomControlBase Control { get; }

        public TreeViewNodeBase(ExtractionField field)
        {
            UpdateControl(field);
        }

        public virtual void UpdateControl(ExtractionField field)
        {
            Control.UpdateControl(field);
            UpdateCaption(FormatValueForCaption());
        }

        public virtual void SaveChanges(ExtractionField field)
        {
            Control.SaveChanges(field);
            UpdateCaption(FormatValueForCaption());
        }

        public virtual string FormatValueForCaption()
        {
            return Control.FormatValueForCaption();
        }

        public virtual void UpdateCaption(string newValue)
        {
            this.Text = Caption;
            this.LongText = newValue;
        }
    }

    //The tree node implementation class
    public class ExtractionTypeNode : TreeViewNodeBase
    {
        private CustomDropDownControl control;

        public override CustomControlBase Control
        {
            get
            {
                //Create the control lazily, on first retrieval
                if (control == null)
                {
                    control = new CustomDropDownControl();
                    control.label1.Text = Caption;
                    control.comboBox1.Items.Clear();
                    control.comboBox1.Items.AddRange(
                        Enum.GetNames(typeof(ExtractionField.ExtractionType)));
                }
                return control;
            }
        }

        public ExtractionTypeNode(ExtractionField field) : base(field)
        {
        }
    }

    //The custom control base class
    public abstract class CustomControlBase : UserControl
    {
        public abstract void UpdateControl(ExtractionField field);
        public abstract void SaveChanges(ExtractionField field);
        public abstract string FormatValueForCaption();
    }

    //The custom control generic implementation (view)
    public partial class CustomDropDownControl : CustomControlBase
    {
        public CustomDropDownControl()
        {
            InitializeComponent();
        }

        public override void UpdateControl(ExtractionField field)
        {
            //Nothing to do here
        }

        public override void SaveChanges(ExtractionField field)
        {
            //Nothing to do here
        }

        public override string FormatValueForCaption()
        {
            //Nothing to do here
            return string.Empty;
        }
    }

    //The custom control specific implementation
    public class FieldExtractionTypeControl : CustomDropDownControl
    {
        public override void UpdateControl(ExtractionField field)
        {
            comboBox1.SelectedIndex = comboBox1.FindStringExact(field.Extraction.ToString());
        }

        public override void SaveChanges(ExtractionField field)
        {
            field.Extraction = (ExtractionField.ExtractionType)
                Enum.Parse(typeof(ExtractionField.ExtractionType), comboBox1.SelectedItem.ToString());
        }

        public override string FormatValueForCaption()
        {
            return string.Empty;
        }
    }

The problem is that I have "generic" controls which inherit from CustomControlBase. These are just "views" with no logic. Then I have specific controls that inherit from the generic controls. I don't have any functions or business logic in the generic controls because the specific controls should govern how data is associated with the data structure. What is the best design pattern for this?

Read the article
maintaining a growing, diverse codebase with continuous integration

- by Nate

I need some help with the philosophy and design of a continuous integration setup. Our current CI setup uses buildbot. When I started out designing it, I inherited (well, not strictly, as I was involved in its design a year earlier) a bespoke CI builder that was tailored to run the entire build at once, overnight. After a while, we decided that this was insufficient, and started exploring different CI frameworks, eventually choosing buildbot. One of my goals in transitioning to buildbot (besides getting to enjoy all the whiz-bang extras) was to overcome some of the inadequacies of our bespoke nightly builder.

Humor me for a moment, and let me explain what I have inherited. The codebase for my company is almost 150 unique C++ Windows applications, each of which has dependencies on one or more of a dozen internal libraries (and many on 3rd party libraries as well). Some of these libraries are interdependent, and have depending applications that (while they have nothing to do with each other) have to be built with the same build of that library. Half of these applications and libraries are considered "legacy" and unportable, and must be built with several distinct configurations of the IBM compiler (for which I have written unique subclasses of Compile), and the other half are built with Visual Studio. The code for each compiler is stored in two separate Visual SourceSafe repositories (which I am simply handling using a bunch of ShellCommands, as there is no support for VSS).

Our original nightly builder simply took down the source for everything, and built stuff in a certain order. There was no way to build only a single application, or pick a revision, or to group things. It would launch virtual machines to build a number of the applications. It wasn't very robust, it wasn't distributable, and it wasn't terribly extensible. I wanted to be able to overcome all of these limitations in buildbot.
The way I did this originally was to create entries for each of the applications we wanted to build (all 150ish of them), then create triggered schedulers that could build various applications as groups, and then subsume those groups under an overall nightly build scheduler. These could run on dedicated slaves (no more virtual machine chicanery), and if I wanted I could simply add new slaves. Now, if we want to do a full build out of schedule, it's one click, but we can also build just one application should we so desire.

There are four weaknesses of this approach, however. One is our source tree's complex web of dependencies. In order to simplify config maintenance, all builders are generated from a large dictionary. The dependencies are retrieved and built in a not-terribly robust fashion (namely, keying off of certain things in my build-target dictionary). The second is that each build has between 15 and 21 build steps, which is hard to browse and look at in the web interface, and since there are around 150 columns, it takes forever to load (think from 30 seconds to multiple minutes). Thirdly, we no longer have autodiscovery of build targets (although, as much as one of my coworkers harps on me about this, I don't see what it got us in the first place). Finally, the aforementioned coworker likes to constantly bring up the fact that we can no longer perform a full build on our local machine (though I never saw what that got us, either, considering that it took three times as long as the distributed build; I think he is just paranoid about ever breaking the build).

Now, moving to new development, we are starting to use g++ and subversion (not porting the old repository, mind you - just for the new stuff). Also, we are starting to do more unit testing ("more" might give the wrong picture... it's more like any), and integration testing (using python). I'm having a hard time figuring out how to fit these into my existing configuration.
So, where have I gone wrong philosophically here? How can I best proceed (with buildbot - it's the only piece of the puzzle I have license to work on) so that my configuration is actually maintainable? How do I address some of my design's weaknesses? What really works in terms of CI strategies for large, (possibly over-)complex codebases?

Read the article
