Search Results

Search found 20 results on 1 pages for 'degenerate'.

Page 1/1 | 1 

  • OWB 11gR2 – Degenerate Dimensions

    - by David Allan
    Ever wondered how to build degenerate dimensions in OWB and get the benefits of slowly changing dimensions and cube loading? Now its possible through some changes in 11gR2 to make the dimension and cube loading much more flexible. This will let you get the benefits of OWB's surrogate key handling and slowly changing dimension reference when loading the fact table and need degenerate dimensions (see Ralph Kimball's degenerate dimensions design tip). Here we will see how to use the cube operator to load slowly changing, regular and degenerate dimensions. The cube and cube operator can now work with dimensions which have no surrogate key as well as dimensions with surrogates, so you can get the benefit of the cube loading and incorporate the degenerate dimension loading. What you need to do is create a dimension in OWB that is purely used for ETL metadata; the dimension itself is never deployed (its table is, but has not data) it has no surrogate keys has a single level with a business attribute the degenerate dimension data and a dummy attribute, say description just to pass the OWB validation. When this degenerate dimension is added into a cube, you will need to configure the fact table created and set the 'Deployable' flag to FALSE for the foreign key generated to the degenerate dimension table. The degenerate dimension reference will then be in the cube operator and used when matching. Create the degenerate dimension using the regular wizard. Delete the Surrogate ID attribute, this is not needed. Define a level name for the dimension member (any name). After the wizard has completed, in the editor delete the hierarchy STANDARD that was automatically generated, there is only a single level, no need for a hierarchy and this shouldn't really be created. Deploy the implementing table DD_ORDERNUMBER_TAB, this needs to be deployed but with no data (the mapping here will do a left outer join of the source data with the empty degenerate dimension table). Now, go ahead and build your cube, use the regular TIMES dimension for example and your degenerate dimension DD_ORDERNUMBER, can add in SCD dimensions etc. Configure the fact table created and set Deployable to false, so the foreign key does not get generated. Can now use the cube in a mapping and load data into the fact table via the cube operator, this will look after surrogate lookups and slowly changing dimension references.   If you generate the SQL you will see the ON clause for matching includes the columns representing the degenerate dimension columns. Here we have seen how this use case for loading fact tables using degenerate dimensions becomes a whole lot simpler using OWB 11gR2. I'm sure there are other use cases where using this mix of dimensions with surrogate and regular identifiers is useful, Fact tables partitioned by date columns is another classic example that this will greatly help and make the cube operator much more useful. Good to hear any comments.

    Read the article

  • What is the practical use of IBOs / degenerate vertex in OpenGL?

    - by 0xFAIL
    Vertices in 3D models CAN get cut in the process of optimizing 3D geometry, (degenerate vertices) by 3D graphics software (Blender, ...) when exporting because they aren't needed when reusing a vertex for multiple triangles. (In the current case 3D data is exported from Blender as .ply and read by a simple application that displays the 3D model) Every vertex has a few attributes like position, color, normal, tangent,... But the data for each vertex that is cut through the vertex sharing is lost and is missing in the vertex shader. Modern shader techniques like Bump or Normal mapping require normals/tangents per vertex which are also cut. To use complex shader techniques IBOs must not be used? Or is there a way to use IBOs and retain the data per vertex that was origionally lost?

    Read the article

  • Subselecting with MDX

    - by Vince
    Greetings stack overflow community. I've recently started building an OLAP cube in SSAS2008 and have gotten stuck. I would be grateful if someone could at least point me towards the right direction. Situation: Two fact tables, same cube. FactCalls holds information about calls made by subscribers, FactTopups holds topup data. Both tables have numerous common dimensions one of them being the Subscriber dimension. FactCalls             FactTopups SubscriberKey      SubscriberKey CallDuration         DateKey CallCost               Topup Value ... What I am trying to achieve is to be able to build FactCalls reports based on distinct subscribers that have topped up their accounts within the last 7 days. What I am basically looking for an MDX equivalent to SQL's: select * from FactCalls where SubscriberKey in ( select distinct SubscriberKey from FactTopups where ... ); I've tried creating a degenerate dimension for both tables containing SubscriberKey and doing: Exist( [Calls Degenerate].[Subscriber Key].Children, [Topups Degenerate].[Subscriber Key].Children ) Without success. Kind regards, Vince

    Read the article

  • Why should I use a switched network over routed?

    - by SRobertJames
    Now that routers are affordable, why should I build a network using Layer 2 switches, which degenerate to broadcasting under poor conditions, and not just use real routing at Layer 3? Edit: Got some great replies. Let me clarify the question: Of course, at the lowest level, you want to plug your end nodes into a switch, not a router (as demonstrated by AlReece). I'm referring to switches which are used to bridge traffic between segments - that is, switches connected to other switches.

    Read the article

  • How does Requiring users to Periodically Change their Passwords Improve Security? [closed]

    - by Bob Kaufman
    I've had the same password for some sites for years with no regrets. Meanwhile, at work, I find myself being forced to change passwords every two to three months. My thinking is that if a password gets compromised, requiring that I change it several weeks out isn't going to protect me or the network very much. Moreover, I find that by being required to change passwords frequently, I degenerate into a predictable password pattern (e.g., BearsFan111, BearsFan222, ...) which results in easier to remember and easier to guess passwords. Is there a sound argument for requiring that passwords be changed periodically?

    Read the article

  • SQL Rally Pre-Con: Data Warehouse Modeling – Making the Right Choices

    - by Davide Mauri
    As you may have already learned from my old post or Adam’s or Kalen’s posts, there will be two SQL Rally in North Europe. In the Stockholm SQL Rally, with my friend Thomas Kejser, I’ll be delivering a pre-con on Data Warehouse Modeling: Data warehouses play a central role in any BI solution. It's the back end upon which everything in years to come will be created. For this reason, it must be rock solid and yet flexible at the same time. To develop such a data warehouse, you must have a clear idea of its architecture, a thorough understanding of the concepts of Measures and Dimensions, and a proven engineered way to build it so that quality and stability can go hand-in-hand with cost reduction and scalability. In this workshop, Thomas Kejser and Davide Mauri will share all the information they learned since they started working with data warehouses, giving you the guidance and tips you need to start your BI project in the best way possible?avoiding errors, making implementation effective and efficient, paving the way for a winning Agile approach, and helping you define how your team should work so that your BI solution will stand the test of time. You'll learn: Data warehouse architecture and justification Agile methodology Dimensional modeling, including Kimball vs. Inmon, SCD1/SCD2/SCD3, Junk and Degenerate Dimensions, and Huge Dimensions Best practices, naming conventions, and lessons learned Loading the data warehouse, including loading Dimensions, loading Facts (Full Load, Incremental Load, Partitioned Load) Data warehouses and Big Data (Hadoop) Unit testing Tracking historical changes and managing large sizes With all the Self-Service BI hype, Data Warehouse is become more and more central every day, since if everyone will be able to analyze data using self-service tools, it’s better for him/her to rely on correct, uniform and coherent data. Already 50 people registered from the workshop and seats are limited so don’t miss this unique opportunity to attend to this workshop that is really a unique combination of years and years of experience! http://www.sqlpass.org/sqlrally/2013/nordic/Agenda/PreconferenceSeminars.aspx See you there!

    Read the article

  • Self-referencing anonymous closures: is JavaScript incomplete?

    - by Tom Auger
    Does the fact that anonymous self-referencing function closures are so prevelant in JavaScript suggest that JavaScript is an incomplete specification? We see so much of this: (function () { /* do cool stuff */ })(); and I suppose everything is a matter of taste, but does this not look like a kludge, when all you want is a private namespace? Couldn't JavaScript implement packages and proper classes? Compare to ActionScript 3, also based on EMACScript, where you get package com.tomauger { import bar; class Foo { public function Foo(){ // etc... } public function show(){ // show stuff } public function hide(){ // hide stuff } // etc... } } Contrast to the convolutions we perform in JavaScript (this, from the jQuery plugin authoring documentation): (function( $ ){ var methods = { init : function( options ) { // THIS }, show : function( ) { // IS }, hide : function( ) { // GOOD }, update : function( content ) { // !!! } }; $.fn.tooltip = function( method ) { // Method calling logic if ( methods[method] ) { return methods[ method ].apply( this, Array.prototype.slice.call( arguments, 1 )); } else if ( typeof method === 'object' || ! method ) { return methods.init.apply( this, arguments ); } else { $.error( 'Method ' + method + ' does not exist on jQuery.tooltip' ); } }; })( jQuery ); I appreciate that this question could easily degenerate into a rant about preferences and programming styles, but I'm actually very curious to hear how you seasoned programmers feel about this and whether it feels natural, like learning different idiosyncrasies of a new language, or kludgy, like a workaround to some basic programming language components that are just not implemented?

    Read the article

  • Weird order when painting triangle outlines using GL_LINE_STRIP

    - by RayDeeA
    I'm developing an app for iOS-Plaftorms using OpenGL. Currently I'm having a weird issue when painting a plane (terrain) which consists of multiple subplanes, where each subplane consists of 2 triangles forming a rect. I'm painting this terrain as a wireframe by using a call to glDrawElements and provide the parameters GL_Line_Strip and the precalculated indices. The problem is that the triangles get painted in the wrong order or are rather vertically mirrored. They do not get painted in the order how I specified the indices, which is confusing. This is the simplified code to generate the vertices: for(NSInteger y = - gridSegmentsY / 2; y < gridSegmentsY / 2; y ++) { for(NSInteger x = - gridSegmentsX / 2; x < gridSegmentsX / 2; x ++) { vertices[pos++] = x * 5; vertices[pos++] = y * 5; vertices[pos++] = 0; } } This is how I generate the indices including degenerated ones (To use as IBO). pos = 0; for(int y = 0; y < gridSegmentsY - 1; y ++) { if (y > 0) { // Degenerate begin: repeat first vertex indices[pos++] = (unsigned short)(y * gridSegmentsY); } for(int x = 0; x < gridSegmentsX; x++) { // One part of the strip indices[pos++] = (unsigned short)((y * gridSegmentsY) + x); indices[pos++] = (unsigned short)(((y + 1) * gridSegmentsY) + x); } if (y < gridSegmentsY - 2) { // Degenerate end: repeat last vertex indices[pos++] = (unsigned short)(((y + 1) * gridSegmentsY) + (gridSegmentsX - 1)); } } So in this part... indices[pos++] = (unsigned short)((y * gridSegmentsY) + x); indices[pos++] = (unsigned short)(((y + 1) * gridSegmentsY) + x); ...I'm setting the first index in the indices array to point to the current (x,y) and the next index to (x,y+1). I'm doin' this for all x's in the current strip, then I'm handling degenerated triangles and repeat this procedure for the next strip (y+1). This method is taken from http://www.learnopengles.com/android-lesson-eight-an-introduction-to-index-buffer-objects-ibos/ So I expect the resulting mesh to get painted like: a----b----c | /| /| | / | / | | / | / | |/ |/ | d----e----f | /| /| | / | / | | / | / | |/ |/ | g----h----i by painting it as described using: glDrawElements(GL_LINE_STRIP, indexCount, GL_UNSIGNED_SHORT, 0); ...since I expect GL_Line_Strip to paint first a line from (a-d), then from (d-b), then (b, e)... and so on (as specified in the indices calculation) But what actually gets painted is: *----*----* |\ |\ | | \ | \ | | \ | \ | | \| \| *----*----* |\ |\ | | \ | \ | | \ | \ | | \| \| *----*----* So the triangles are somehow painted in the wrong order and I need to know why? ;). Does somebody know? Does the problem lie in using GL_Line_Strip or is there a bug in my code? My eye is at (0.0f, 0.0f, 20.0f) and looks at (0,0,0). The mesh is painted along the x-axis & y-axis from left to right with z = 0, so the mesh should not be flipped or anything.

    Read the article

  • Setting up functional Tests in Flex

    - by Dan Monego
    I'm setting up a functional test suite for an application that loads an external configuration file. Right now, I'm using flexunit's addAsync function to load it and then again to test if the contents point to services that exist and can be accessed. The trouble with this is that having this kind of two (or more) stage method means that I'm running all of my tests in the context of one test with dozens of asserts, which seems like a kind of degenerate way to use the framework, and makes bugs harder to find. Is there a way to have something like an asynchronous setup? Is there another testing framework that handles this better?

    Read the article

  • A Taxonomy of Numerical Methods v1

    - by JoshReuben
    Numerical Analysis – When, What, (but not how) Once you understand the Math & know C++, Numerical Methods are basically blocks of iterative & conditional math code. I found the real trick was seeing the forest for the trees – knowing which method to use for which situation. Its pretty easy to get lost in the details – so I’ve tried to organize these methods in a way that I can quickly look this up. I’ve included links to detailed explanations and to C++ code examples. I’ve tried to classify Numerical methods in the following broad categories: Solving Systems of Linear Equations Solving Non-Linear Equations Iteratively Interpolation Curve Fitting Optimization Numerical Differentiation & Integration Solving ODEs Boundary Problems Solving EigenValue problems Enjoy – I did ! Solving Systems of Linear Equations Overview Solve sets of algebraic equations with x unknowns The set is commonly in matrix form Gauss-Jordan Elimination http://en.wikipedia.org/wiki/Gauss%E2%80%93Jordan_elimination C++: http://www.codekeep.net/snippets/623f1923-e03c-4636-8c92-c9dc7aa0d3c0.aspx Produces solution of the equations & the coefficient matrix Efficient, stable 2 steps: · Forward Elimination – matrix decomposition: reduce set to triangular form (0s below the diagonal) or row echelon form. If degenerate, then there is no solution · Backward Elimination –write the original matrix as the product of ints inverse matrix & its reduced row-echelon matrix à reduce set to row canonical form & use back-substitution to find the solution to the set Elementary ops for matrix decomposition: · Row multiplication · Row switching · Add multiples of rows to other rows Use pivoting to ensure rows are ordered for achieving triangular form LU Decomposition http://en.wikipedia.org/wiki/LU_decomposition C++: http://ganeshtiwaridotcomdotnp.blogspot.co.il/2009/12/c-c-code-lu-decomposition-for-solving.html Represent the matrix as a product of lower & upper triangular matrices A modified version of GJ Elimination Advantage – can easily apply forward & backward elimination to solve triangular matrices Techniques: · Doolittle Method – sets the L matrix diagonal to unity · Crout Method - sets the U matrix diagonal to unity Note: both the L & U matrices share the same unity diagonal & can be stored compactly in the same matrix Gauss-Seidel Iteration http://en.wikipedia.org/wiki/Gauss%E2%80%93Seidel_method C++: http://www.nr.com/forum/showthread.php?t=722 Transform the linear set of equations into a single equation & then use numerical integration (as integration formulas have Sums, it is implemented iteratively). an optimization of Gauss-Jacobi: 1.5 times faster, requires 0.25 iterations to achieve the same tolerance Solving Non-Linear Equations Iteratively find roots of polynomials – there may be 0, 1 or n solutions for an n order polynomial use iterative techniques Iterative methods · used when there are no known analytical techniques · Requires set functions to be continuous & differentiable · Requires an initial seed value – choice is critical to convergence à conduct multiple runs with different starting points & then select best result · Systematic - iterate until diminishing returns, tolerance or max iteration conditions are met · bracketing techniques will always yield convergent solutions, non-bracketing methods may fail to converge Incremental method if a nonlinear function has opposite signs at 2 ends of a small interval x1 & x2, then there is likely to be a solution in their interval – solutions are detected by evaluating a function over interval steps, for a change in sign, adjusting the step size dynamically. Limitations – can miss closely spaced solutions in large intervals, cannot detect degenerate (coinciding) solutions, limited to functions that cross the x-axis, gives false positives for singularities Fixed point method http://en.wikipedia.org/wiki/Fixed-point_iteration C++: http://books.google.co.il/books?id=weYj75E_t6MC&pg=PA79&lpg=PA79&dq=fixed+point+method++c%2B%2B&source=bl&ots=LQ-5P_taoC&sig=lENUUIYBK53tZtTwNfHLy5PEWDk&hl=en&sa=X&ei=wezDUPW1J5DptQaMsIHQCw&redir_esc=y#v=onepage&q=fixed%20point%20method%20%20c%2B%2B&f=false Algebraically rearrange a solution to isolate a variable then apply incremental method Bisection method http://en.wikipedia.org/wiki/Bisection_method C++: http://numericalcomputing.wordpress.com/category/algorithms/ Bracketed - Select an initial interval, keep bisecting it ad midpoint into sub-intervals and then apply incremental method on smaller & smaller intervals – zoom in Adv: unaffected by function gradient à reliable Disadv: slow convergence False Position Method http://en.wikipedia.org/wiki/False_position_method C++: http://www.dreamincode.net/forums/topic/126100-bisection-and-false-position-methods/ Bracketed - Select an initial interval , & use the relative value of function at interval end points to select next sub-intervals (estimate how far between the end points the solution might be & subdivide based on this) Newton-Raphson method http://en.wikipedia.org/wiki/Newton's_method C++: http://www-users.cselabs.umn.edu/classes/Summer-2012/csci1113/index.php?page=./newt3 Also known as Newton's method Convenient, efficient Not bracketed – only a single initial guess is required to start iteration – requires an analytical expression for the first derivative of the function as input. Evaluates the function & its derivative at each step. Can be extended to the Newton MutiRoot method for solving multiple roots Can be easily applied to an of n-coupled set of non-linear equations – conduct a Taylor Series expansion of a function, dropping terms of order n, rewrite as a Jacobian matrix of PDs & convert to simultaneous linear equations !!! Secant Method http://en.wikipedia.org/wiki/Secant_method C++: http://forum.vcoderz.com/showthread.php?p=205230 Unlike N-R, can estimate first derivative from an initial interval (does not require root to be bracketed) instead of inputting it Since derivative is approximated, may converge slower. Is fast in practice as it does not have to evaluate the derivative at each step. Similar implementation to False Positive method Birge-Vieta Method http://mat.iitm.ac.in/home/sryedida/public_html/caimna/transcendental/polynomial%20methods/bv%20method.html C++: http://books.google.co.il/books?id=cL1boM2uyQwC&pg=SA3-PA51&lpg=SA3-PA51&dq=Birge-Vieta+Method+c%2B%2B&source=bl&ots=QZmnDTK3rC&sig=BPNcHHbpR_DKVoZXrLi4nVXD-gg&hl=en&sa=X&ei=R-_DUK2iNIjzsgbE5ID4Dg&redir_esc=y#v=onepage&q=Birge-Vieta%20Method%20c%2B%2B&f=false combines Horner's method of polynomial evaluation (transforming into lesser degree polynomials that are more computationally efficient to process) with Newton-Raphson to provide a computational speed-up Interpolation Overview Construct new data points for as close as possible fit within range of a discrete set of known points (that were obtained via sampling, experimentation) Use Taylor Series Expansion of a function f(x) around a specific value for x Linear Interpolation http://en.wikipedia.org/wiki/Linear_interpolation C++: http://www.hamaluik.com/?p=289 Straight line between 2 points à concatenate interpolants between each pair of data points Bilinear Interpolation http://en.wikipedia.org/wiki/Bilinear_interpolation C++: http://supercomputingblog.com/graphics/coding-bilinear-interpolation/2/ Extension of the linear function for interpolating functions of 2 variables – perform linear interpolation first in 1 direction, then in another. Used in image processing – e.g. texture mapping filter. Uses 4 vertices to interpolate a value within a unit cell. Lagrange Interpolation http://en.wikipedia.org/wiki/Lagrange_polynomial C++: http://www.codecogs.com/code/maths/approximation/interpolation/lagrange.php For polynomials Requires recomputation for all terms for each distinct x value – can only be applied for small number of nodes Numerically unstable Barycentric Interpolation http://epubs.siam.org/doi/pdf/10.1137/S0036144502417715 C++: http://www.gamedev.net/topic/621445-barycentric-coordinates-c-code-check/ Rearrange the terms in the equation of the Legrange interpolation by defining weight functions that are independent of the interpolated value of x Newton Divided Difference Interpolation http://en.wikipedia.org/wiki/Newton_polynomial C++: http://jee-appy.blogspot.co.il/2011/12/newton-divided-difference-interpolation.html Hermite Divided Differences: Interpolation polynomial approximation for a given set of data points in the NR form - divided differences are used to approximately calculate the various differences. For a given set of 3 data points , fit a quadratic interpolant through the data Bracketed functions allow Newton divided differences to be calculated recursively Difference table Cubic Spline Interpolation http://en.wikipedia.org/wiki/Spline_interpolation C++: https://www.marcusbannerman.co.uk/index.php/home/latestarticles/42-articles/96-cubic-spline-class.html Spline is a piecewise polynomial Provides smoothness – for interpolations with significantly varying data Use weighted coefficients to bend the function to be smooth & its 1st & 2nd derivatives are continuous through the edge points in the interval Curve Fitting A generalization of interpolating whereby given data points may contain noise à the curve does not necessarily pass through all the points Least Squares Fit http://en.wikipedia.org/wiki/Least_squares C++: http://www.ccas.ru/mmes/educat/lab04k/02/least-squares.c Residual – difference between observed value & expected value Model function is often chosen as a linear combination of the specified functions Determines: A) The model instance in which the sum of squared residuals has the least value B) param values for which model best fits data Straight Line Fit Linear correlation between independent variable and dependent variable Linear Regression http://en.wikipedia.org/wiki/Linear_regression C++: http://www.oocities.org/david_swaim/cpp/linregc.htm Special case of statistically exact extrapolation Leverage least squares Given a basis function, the sum of the residuals is determined and the corresponding gradient equation is expressed as a set of normal linear equations in matrix form that can be solved (e.g. using LU Decomposition) Can be weighted - Drop the assumption that all errors have the same significance –-> confidence of accuracy is different for each data point. Fit the function closer to points with higher weights Polynomial Fit - use a polynomial basis function Moving Average http://en.wikipedia.org/wiki/Moving_average C++: http://www.codeproject.com/Articles/17860/A-Simple-Moving-Average-Algorithm Used for smoothing (cancel fluctuations to highlight longer-term trends & cycles), time series data analysis, signal processing filters Replace each data point with average of neighbors. Can be simple (SMA), weighted (WMA), exponential (EMA). Lags behind latest data points – extra weight can be given to more recent data points. Weights can decrease arithmetically or exponentially according to distance from point. Parameters: smoothing factor, period, weight basis Optimization Overview Given function with multiple variables, find Min (or max by minimizing –f(x)) Iterative approach Efficient, but not necessarily reliable Conditions: noisy data, constraints, non-linear models Detection via sign of first derivative - Derivative of saddle points will be 0 Local minima Bisection method Similar method for finding a root for a non-linear equation Start with an interval that contains a minimum Golden Search method http://en.wikipedia.org/wiki/Golden_section_search C++: http://www.codecogs.com/code/maths/optimization/golden.php Bisect intervals according to golden ratio 0.618.. Achieves reduction by evaluating a single function instead of 2 Newton-Raphson Method Brent method http://en.wikipedia.org/wiki/Brent's_method C++: http://people.sc.fsu.edu/~jburkardt/cpp_src/brent/brent.cpp Based on quadratic or parabolic interpolation – if the function is smooth & parabolic near to the minimum, then a parabola fitted through any 3 points should approximate the minima – fails when the 3 points are collinear , in which case the denominator is 0 Simplex Method http://en.wikipedia.org/wiki/Simplex_algorithm C++: http://www.codeguru.com/cpp/article.php/c17505/Simplex-Optimization-Algorithm-and-Implemetation-in-C-Programming.htm Find the global minima of any multi-variable function Direct search – no derivatives required At each step it maintains a non-degenerative simplex – a convex hull of n+1 vertices. Obtains the minimum for a function with n variables by evaluating the function at n-1 points, iteratively replacing the point of worst result with the point of best result, shrinking the multidimensional simplex around the best point. Point replacement involves expanding & contracting the simplex near the worst value point to determine a better replacement point Oscillation can be avoided by choosing the 2nd worst result Restart if it gets stuck Parameters: contraction & expansion factors Simulated Annealing http://en.wikipedia.org/wiki/Simulated_annealing C++: http://code.google.com/p/cppsimulatedannealing/ Analogy to heating & cooling metal to strengthen its structure Stochastic method – apply random permutation search for global minima - Avoid entrapment in local minima via hill climbing Heating schedule - Annealing schedule params: temperature, iterations at each temp, temperature delta Cooling schedule – can be linear, step-wise or exponential Differential Evolution http://en.wikipedia.org/wiki/Differential_evolution C++: http://www.amichel.com/de/doc/html/ More advanced stochastic methods analogous to biological processes: Genetic algorithms, evolution strategies Parallel direct search method against multiple discrete or continuous variables Initial population of variable vectors chosen randomly – if weighted difference vector of 2 vectors yields a lower objective function value then it replaces the comparison vector Many params: #parents, #variables, step size, crossover constant etc Convergence is slow – many more function evaluations than simulated annealing Numerical Differentiation Overview 2 approaches to finite difference methods: · A) approximate function via polynomial interpolation then differentiate · B) Taylor series approximation – additionally provides error estimate Finite Difference methods http://en.wikipedia.org/wiki/Finite_difference_method C++: http://www.wpi.edu/Pubs/ETD/Available/etd-051807-164436/unrestricted/EAMPADU.pdf Find differences between high order derivative values - Approximate differential equations by finite differences at evenly spaced data points Based on forward & backward Taylor series expansion of f(x) about x plus or minus multiples of delta h. Forward / backward difference - the sums of the series contains even derivatives and the difference of the series contains odd derivatives – coupled equations that can be solved. Provide an approximation of the derivative within a O(h^2) accuracy There is also central difference & extended central difference which has a O(h^4) accuracy Richardson Extrapolation http://en.wikipedia.org/wiki/Richardson_extrapolation C++: http://mathscoding.blogspot.co.il/2012/02/introduction-richardson-extrapolation.html A sequence acceleration method applied to finite differences Fast convergence, high accuracy O(h^4) Derivatives via Interpolation Cannot apply Finite Difference method to discrete data points at uneven intervals – so need to approximate the derivative of f(x) using the derivative of the interpolant via 3 point Lagrange Interpolation Note: the higher the order of the derivative, the lower the approximation precision Numerical Integration Estimate finite & infinite integrals of functions More accurate procedure than numerical differentiation Use when it is not possible to obtain an integral of a function analytically or when the function is not given, only the data points are Newton Cotes Methods http://en.wikipedia.org/wiki/Newton%E2%80%93Cotes_formulas C++: http://www.siafoo.net/snippet/324 For equally spaced data points Computationally easy – based on local interpolation of n rectangular strip areas that is piecewise fitted to a polynomial to get the sum total area Evaluate the integrand at n+1 evenly spaced points – approximate definite integral by Sum Weights are derived from Lagrange Basis polynomials Leverage Trapezoidal Rule for default 2nd formulas, Simpson 1/3 Rule for substituting 3 point formulas, Simpson 3/8 Rule for 4 point formulas. For 4 point formulas use Bodes Rule. Higher orders obtain more accurate results Trapezoidal Rule uses simple area, Simpsons Rule replaces the integrand f(x) with a quadratic polynomial p(x) that uses the same values as f(x) for its end points, but adds a midpoint Romberg Integration http://en.wikipedia.org/wiki/Romberg's_method C++: http://code.google.com/p/romberg-integration/downloads/detail?name=romberg.cpp&can=2&q= Combines trapezoidal rule with Richardson Extrapolation Evaluates the integrand at equally spaced points The integrand must have continuous derivatives Each R(n,m) extrapolation uses a higher order integrand polynomial replacement rule (zeroth starts with trapezoidal) à a lower triangular matrix set of equation coefficients where the bottom right term has the most accurate approximation. The process continues until the difference between 2 successive diagonal terms becomes sufficiently small. Gaussian Quadrature http://en.wikipedia.org/wiki/Gaussian_quadrature C++: http://www.alglib.net/integration/gaussianquadratures.php Data points are chosen to yield best possible accuracy – requires fewer evaluations Ability to handle singularities, functions that are difficult to evaluate The integrand can include a weighting function determined by a set of orthogonal polynomials. Points & weights are selected so that the integrand yields the exact integral if f(x) is a polynomial of degree <= 2n+1 Techniques (basically different weighting functions): · Gauss-Legendre Integration w(x)=1 · Gauss-Laguerre Integration w(x)=e^-x · Gauss-Hermite Integration w(x)=e^-x^2 · Gauss-Chebyshev Integration w(x)= 1 / Sqrt(1-x^2) Solving ODEs Use when high order differential equations cannot be solved analytically Evaluated under boundary conditions RK for systems – a high order differential equation can always be transformed into a coupled first order system of equations Euler method http://en.wikipedia.org/wiki/Euler_method C++: http://rosettacode.org/wiki/Euler_method First order Runge–Kutta method. Simple recursive method – given an initial value, calculate derivative deltas. Unstable & not very accurate (O(h) error) – not used in practice A first-order method - the local error (truncation error per step) is proportional to the square of the step size, and the global error (error at a given time) is proportional to the step size In evolving solution between data points xn & xn+1, only evaluates derivatives at beginning of interval xn à asymmetric at boundaries Higher order Runge Kutta http://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods C++: http://www.dreamincode.net/code/snippet1441.htm 2nd & 4th order RK - Introduces parameterized midpoints for more symmetric solutions à accuracy at higher computational cost Adaptive RK – RK-Fehlberg – estimate the truncation at each integration step & automatically adjust the step size to keep error within prescribed limits. At each step 2 approximations are compared – if in disagreement to a specific accuracy, the step size is reduced Boundary Value Problems Where solution of differential equations are located at 2 different values of the independent variable x à more difficult, because cannot just start at point of initial value – there may not be enough starting conditions available at the end points to produce a unique solution An n-order equation will require n boundary conditions – need to determine the missing n-1 conditions which cause the given conditions at the other boundary to be satisfied Shooting Method http://en.wikipedia.org/wiki/Shooting_method C++: http://ganeshtiwaridotcomdotnp.blogspot.co.il/2009/12/c-c-code-shooting-method-for-solving.html Iteratively guess the missing values for one end & integrate, then inspect the discrepancy with the boundary values of the other end to adjust the estimate Given the starting boundary values u1 & u2 which contain the root u, solve u given the false position method (solving the differential equation as an initial value problem via 4th order RK), then use u to solve the differential equations. Finite Difference Method For linear & non-linear systems Higher order derivatives require more computational steps – some combinations for boundary conditions may not work though Improve the accuracy by increasing the number of mesh points Solving EigenValue Problems An eigenvalue can substitute a matrix when doing matrix multiplication à convert matrix multiplication into a polynomial EigenValue For a given set of equations in matrix form, determine what are the solution eigenvalue & eigenvectors Similar Matrices - have same eigenvalues. Use orthogonal similarity transforms to reduce a matrix to diagonal form from which eigenvalue(s) & eigenvectors can be computed iteratively Jacobi method http://en.wikipedia.org/wiki/Jacobi_method C++: http://people.sc.fsu.edu/~jburkardt/classes/acs2_2008/openmp/jacobi/jacobi.html Robust but Computationally intense – use for small matrices < 10x10 Power Iteration http://en.wikipedia.org/wiki/Power_iteration For any given real symmetric matrix, generate the largest single eigenvalue & its eigenvectors Simplest method – does not compute matrix decomposition à suitable for large, sparse matrices Inverse Iteration Variation of power iteration method – generates the smallest eigenvalue from the inverse matrix Rayleigh Method http://en.wikipedia.org/wiki/Rayleigh's_method_of_dimensional_analysis Variation of power iteration method Rayleigh Quotient Method Variation of inverse iteration method Matrix Tri-diagonalization Method Use householder algorithm to reduce an NxN symmetric matrix to a tridiagonal real symmetric matrix vua N-2 orthogonal transforms     Whats Next Outside of Numerical Methods there are lots of different types of algorithms that I’ve learned over the decades: Data Mining – (I covered this briefly in a previous post: http://geekswithblogs.net/JoshReuben/archive/2007/12/31/ssas-dm-algorithms.aspx ) Search & Sort Routing Problem Solving Logical Theorem Proving Planning Probabilistic Reasoning Machine Learning Solvers (eg MIP) Bioinformatics (Sequence Alignment, Protein Folding) Quant Finance (I read Wilmott’s books – interesting) Sooner or later, I’ll cover the above topics as well.

    Read the article

  • virtual machines, dual booting and data disks on SSD

    - by stevemarvell
    This is in planning, so if I've got the strategy wrong, please let me know. There are multiple questions here, but I think they all degenerate to the same answers. The hardware is a laptop with a single SSD. I'm trying to not lose the performance of the SSD. I plan a native dual booting Windows (plus cygwin) and Linux machine which is my BYOD and represents the development environment. I keep the codebase on a shared partition (though sometimes this is an external thunderbolt SSD) which can be natively "mounted" by whichever OS is in operation. I boot into one or the other environments depending on the task in hand. Sometime I have to develop with windows tools, but generally, Linux is my preferred development environment. It would be ideal if I could VM the other OS and run either in either. I'm going to assume, because I've not found a sensible VM based solution, that I have get samba involved to share the code partition between VMs. Is this going to blow my SSD performance in the VM? The client also supplies me with a VM for the target environment, usually linux. This is not often suited to development and is used for testing only. I normally keep two copies of this, one as a sandbox and one which I deploy to using the client's preferred method. I keep these VM snapshots on the shared partition. The latter is interacted with over the network and so has no disk sharing requirements. However, it would be useful for the sandbox to be able to "mount" the code base from the natively running OS. Is this samba or nfs again, depending on the native OS? Am I missing a trick which allows this to all work smoothly with all four environments running at once without loosing the SSD performance?

    Read the article

  • Triangle Line-Segment Intersection - detecting near misses

    - by Will
    A ray is a very poor approximation of a player! I think approximating a player with a sphere traveling a straight line each game tick will solve my problems of the player intersecting edges of scenery because their line segment missed it yet their own model is not infinitely thin... I have a 3D triangle and a line segment. I have the normal triangle-line-segment intersection code which I admit I have only a woolly grasp of. To model movement and compute collisions of the player I have to determine if a line passes within sphere-radius of a triangle. But I can find no convenient line near-miss intersection code! Here's the classic triangle intersection ### commented ### code with my starting assumptions: function triangle_ray_intersection(a,b,c,ray_origin,ray_dir,ray_radius) { // http://softsurfer.com/Archive/algorithm_0105/algorithm_0105.htm#intersect_RayTriangle%28%29 // get triangle edge vectors and plane normal var u = vec3_sub(b,a); var v = vec3_sub(c,a); var n = vec3_cross(u,v); if(n[0]==0 && n[1]==0 && n[2]==0) return null; // triangle is degenerate var w0 = vec3_sub(ray_origin,a); var j = vec3_dot(n,ray_dir); if(Math.abs(j) < 0.00000001) { //### if parallel, might still pass within ray_radius of it return null; // parallel, disjoint or on plane } var i = -vec3_dot(n,w0); // get intersect point of ray with triangle plane var k = i / j; if(k < 0.0) return null; // ray goes away from triangle //### as its a line segment, k > 1+ray_radius means no intersect var hit = vec3_add(ray_origin,vec3_scale(ray_dir,k)); // intersect point of ray and plane // is I inside T? //### here I'm a bit lost; this is presumably computing barycentric coordinates? var uu = vec3_dot(u,u); var uv = vec3_dot(u,v); var vv = vec3_dot(v,v); var w = vec3_sub(hit,a); var wu = vec3_dot(w,u); var wv = vec3_dot(w,v); var D = uv * uv - uu * vv; var s = (uv * wv - vv * wu) / D; //### therefore, compute if its within ray_radius scaled to the 0..1 of barycentric coordinates? if(s<0.0 || s>1.0) return null; // I is outside T var t = (uv * wu - uu * wv) / D; if(t<0.0 || (s+t)>1.0) return null; // I is outside T //### finally, if it passses a barycentric test it might still be too far //### to a point; must check that its distance from a corner is within ray_radius too if more than one barycentric coord is >1 //### so we have rounded corners... return [hit,n]; // I is in T } Given the distance between the point of plane intersection and each corner, I ought to be able to determine distance at world scale of how far beyond the edge - beyond 1.0 in barycentric coordinates for each axis - that point is... At this point my head explodes! Is this the right track? What's the actual code? UPDATE: you can earn 100 pts on SO if you answer this question there...! How can you determine if a line segment passes within some distance of a triangle?

    Read the article

  • The theory of evolution applied to software

    - by Michel Grootjans
    I recently realized the many parallels you can draw between the theory of evolution and evolving software. Evolution is not the proverbial million monkeys typing on a million typewriters, where one of them comes up with the complete works of Shakespeare. We would have noticed by now, since the proverbial monkeys are now blogging on the Internet ;-) One of the main ideas of the theory of evolution is the balance between random mutations and natural selection. Random mutations happen all the time: millions of mutations over millions of years. Most of them are totally useless. Some of them are beneficial to the evolved species. Natural selection favors the beneficially mutated species. Less beneficial mutations die off. The mutated rabbit doesn't have to be faster than the fox. It just has to be faster than the other rabbits.   Theory of evolution Evolving software Random mutations happen all the time. Most of these mutations are so bad, the new species dies off, or cannot reproduce. Developers write new code all the time. New ideas come up during the act of writing software. The really bad ones don't get past the stage of idea. The bad ones don't get committed to source control. Natural selection favors the beneficial mutated species Good ideas and new code gets discussed in group during informal peer review. Less than good code gets refactored. Enhanced code makes it more readable, maintainable... A good set of traits makes the species superior to others. It becomes widespread A good design tends to make it easier to add new features, easier to understand the current implementations, easier to optimize for performance...thus superior. The best designs get carried over from project to project. They appear in blogs, articles and books about principles, patterns and practices.   Of course the act of writing software is deliberate. This can hardly be called random mutations. Though it sometimes might seem that code evolves through a will of its own ;-) Does this mean that evolving software (evolution) is better than a big design up front (creationism)? Not necessarily. It's a false idea to think that a project starts from scratch and everything evolves from there. Everyone carries his experience of what works and what doesn't. Up front design is necessary, but is best kept simple and minimal, just enough to get you started. Let the good experiences and ideas help to drive the process, whether they come from you or from others, from past experience or from the most junior developer on your team. Once again, balance is the keyword. Balance design up front with evolution on a daily basis. How do you know what balance is right? Through your own experience of what worked and what didn't (here's evolution again). Notes: The evolution of software can quickly degenerate without discipline. TDD is a discipline that leaves little to chance on that part. Write your test to describe the new behavior. Write just enough code to make it behave as specified. Refactor to evolve the code to a higher standard. The responsibility of good design rests continuously on each developers' shoulders. Promiscuous pair programming helps quickly spreading the design to the whole team.

    Read the article

  • Can I select 0 columns in SQL Server?

    - by Woody Zenfell III
    I am hoping this question fares a little better than the similar Create a table without columns. Yes, I am asking about something that will strike most as pointlessly academic. It is easy to produce a SELECT result with 0 rows (but with columns), e.g. SELECT a = 1 WHERE 1 = 0. Is it possible to produce a SELECT result with 0 columns (but with rows)? e.g. something like SELECT NO COLUMNS FROM Foo. (This is not valid T-SQL.) I came across this because I wanted to insert several rows without specifying any column data for any of them. e.g. (SQL Server 2005) CREATE TABLE Bar (id INT NOT NULL IDENTITY PRIMARY KEY) INSERT INTO Bar SELECT NO COLUMNS FROM Foo -- Invalid column name 'NO'. -- An explicit value for the identity column in table 'Bar' can only be specified when a column list is used and IDENTITY_INSERT is ON. One can insert a single row without specifying any column data, e.g. INSERT INTO Foo DEFAULT VALUES. One can query for a count of rows (without retrieving actual column data from the table), e.g. SELECT COUNT(*) FROM Foo. (But that result set, of course, has a column.) I tried things like INSERT INTO Bar () SELECT * FROM Foo -- Parameters supplied for object 'Bar' which is not a function. -- If the parameters are intended as a table hint, a WITH keyword is required. and INSERT INTO Bar DEFAULT VALUES SELECT * FROM Foo -- which is a standalone INSERT statement followed by a standalone SELECT statement. I can do what I need to do a different way, but the apparent lack of consistency in support for degenerate cases surprises me. I read through the relevant sections of BOL and didn't see anything. I was surprised to come up with nothing via Google either.

    Read the article

  • How do I define a Calculated Measure in MDX based on a Dimension Attribute?

    - by ShaneD
    I would like to create a calculated measure that sums up only a specific subset of records in my fact table based on a dimension attribute. Given: Dimension Date LedgerLineItem {Charge, Payment, Write-Off, Copay, Credit} Measures LedgerAmount Relationships * LedgerLineItem is a degenerate dimension of FactLedger If I break down LedgerAmount by LedgerLineItem.Type I can easily see how much is charged, paid, credit, etc, but when I do not break it down by LedgerLineItem.Type I cannot easily add the credit, paid, credit, etc into a pivot table. I would like to create separate calculated measures that sum only specific type (or multiple types) of ledger facts. An example of the desired output would be: | Year | Charged | Total Paid | Amount - Ledger | | 2008 | $1000 | $600 | -$400 | | 2009 | $2000 | $1500 | -$500 | | Total | $3000 | $2100 | -$900 | I have tried to create the calculated measure a couple of ways and each one works in some circumstances but not in others. Now before anyone says do this in ETL, I have already done it in ETL and it works just fine. What I am trying to do as part of learning to understand MDX better is to figure out how to duplicate what I have done in the ETL in MDX as so far I am unable to do that. Here are two attempts I have made and the problems with them. This works only when ledger type is in the pivot table. It returns the correct amount of the ledger entries (although in this case it is identical to [amount - ledger] but when I try to remove type and just get the sum of all ledger entries it returns unknown. CASE WHEN ([Ledger].[Type].currentMember = [Ledger].[Type].&[Credit]) OR ([Ledger].[Type].currentMember = [Ledger].[Type].&[Paid]) OR ([Ledger].[Type].currentMember = [Ledger].[Type].&[Held Money: Copay]) THEN [Measures].[Amount - ledger] ELSE 0 END This works only when ledger type is not in the pivot table. It always returns the total payment amount, which is incorrect when I am slicing by type as I would only expect to see the credit portion under credit, the paid portion, under paid, $0 under charge, etc. sum({([Ledger].[Type].&[Credit]), ([Ledger].[Type].&[Paid]), ([Ledger].[Type].&[Held Money: Copay])}, [Measures].[Amount - ledger]) Is there any way to make this return the correct numbers regardless of whether Ledger.Type is included in my pivot table or not?

    Read the article

  • How can I inject Javascript (including Prototype.js) in other sites without cluttering the global na

    - by Daniel Magliola
    I'm currently on a project that is a big site that uses the Prototype library, and there is already a humongous amount of Javascript code. We're now working on a piece of code that will get "injected" into other people's sites (picture people adding a <script> tag in their sites) which will then run our code and add a bunch of DOM elements and functionality to their site. This will have new pieces of code, and will also reuse a lot of the code that we use on our main site. The problem I have is that it's of course not cool to just add a <script> that will include Prototype in people's pages. If we do that in a page that's already using ANY framework, we're guaranteed to screw everything up. jQuery gives us the option to "rename" the $ object, so it could handle this situation decently, except obviously for the fact that we're not using jQuery, so we'd have to migrate everything. Right now i'm contemplating a number of ugly choices, and I'm not sure what's best... Rewrite everything to use jQuery, with a renamed $ object everywhere. Creating a "new" Prototype library with only the subset we'd be using in "injected" code, and renaming $ to something else. Then again I'd have to adapt the parts of my code that would be shared somehow. Not using a library at all in injected code, to keep it as clean as possible, and rewriting the shared code to use no library at all. This would obviously degenerate into us creating our own frankenstein of a library, which is probably the worst case scenario ever. I'm wondering what you guys think I could do, and also whether there's some magic option that would solve all my problems... For example, do you think I could use something like Caja / Cajita to sandbox my own code and isolate it from the rest of the site, and have Prototype inside of there? Or am I completely missing the point with that? I also read once about a technique for bookmarklets, were you add your code like this: (function() { /* your code */ })(); And then your code is all inside your anonymous function and you haven't touched the global namespace at all. Do you think I could make one file containing: (function() { /* Full Code of the Prototype file here */ /* All my code that will run in the "other" site */ InitializeStuff_CreateDOMElements_AttachEventHandlers(); })(); Would that work? Would it accomplish the objective of not cluttering the global namespace, and not killing the functionality on a site that uses jQuery, for example? Or is Prototype too complex somehow to isolate it like that? (NOTE: I think I know that that would create closures everywhere and that's slower, but I don't care too much about performance, my code is not doing anything that complex)

    Read the article

  • SQL SERVER – Weekly Series – Memory Lane – #039

    - by Pinal Dave
    Here is the list of selected articles of SQLAuthority.com across all these years. Instead of just listing all the articles I have selected a few of my most favorite articles and have listed them here with additional notes below it. Let me know which one of the following is your favorite article from memory lane. 2007 FQL – Facebook Query Language Facebook list following advantages of FQL: Condensed XML reduces bandwidth and parsing costs. More complex requests can reduce the number of requests necessary. Provides a single consistent, unified interface for all of your data. It’s fun! UDF – Get the Day of the Week Function The day of the week can be retrieved in SQL Server by using the DatePart function. The value returned by the function is between 1 (Sunday) and 7 (Saturday). To convert this to a string representing the day of the week, use a CASE statement. UDF – Function to Get Previous And Next Work Day – Exclude Saturday and Sunday While reading ColdFusion blog of Ben Nadel Getting the Previous Day In ColdFusion, Excluding Saturday And Sunday, I realize that I use similar function on my SQL Server Database. This function excludes the Weekends (Saturday and Sunday), and it gets previous as well as next work day. Complete Series of SQL Server Interview Questions and Answers Data Warehousing Interview Questions and Answers – Introduction Data Warehousing Interview Questions and Answers – Part 1 Data Warehousing Interview Questions and Answers – Part 2 Data Warehousing Interview Questions and Answers – Part 3 Data Warehousing Interview Questions and Answers Complete List Download 2008 Introduction to Log Viewer In SQL Server all the windows event logs can be seen along with SQL Server logs. Interface for all the logs is same and can be launched from the same place. This log can be exported and filtered as well. DBCC SHRINKFILE Takes Long Time to Run If you are DBA who are involved with Database Maintenance and file group maintenance, you must have experience that many times DBCC SHRINKFILE operations takes a long time but any other operations with Database are relatively quicker. mssqlsystemresource – Resource Database The purpose of resource database is to facilitates upgrading to the new version of SQL Server without any hassle. In previous versions whenever version of SQL Server was upgraded all the previous version system objects needs to be dropped and new version system objects to be created. 2009 Puzzle – Write Script to Generate Primary Key and Foreign Key In SQL Server Management Studio (SSMS), there is no option to script all the keys. If one is required to script keys they will have to manually script each key one at a time. If database has many tables, generating one key at a time can be a very intricate task. I want to throw a question to all of you if any of you have scripts for the same purpose. Maximizing View of SQL Server Management Studio – Full Screen – New Screen I had explained the following two different methods: 1) Open Results in Separate Tab - This is a very interesting method as result pan shows up in a different tab instead of the splitting screen horizontally. 2) Open SSMS in Full Screen - This works always and to its best. Not many people are aware of this method; hence, very few people use it to enhance performance. 2010 Find Queries using Parallelism from Cached Plan T-SQL script gets all the queries and their execution plan where parallelism operations are kicked up. Pay attention there is TOP 10 is used, if you have lots of transactional operations, I suggest that you change TOP 10 to TOP 50 This is the list of the all the articles in the series of computed columns. SQL SERVER – Computed Column – PERSISTED and Storage This article talks about how computed columns are created and why they take more storage space than before. SQL SERVER – Computed Column – PERSISTED and Performance This article talks about how PERSISTED columns give better performance than non-persisted columns. SQL SERVER – Computed Column – PERSISTED and Performance – Part 2 This article talks about how non-persisted columns give better performance than PERSISTED columns. SQL SERVER – Computed Column and Performance – Part 3 This article talks about how Index improves the performance of Computed Columns. SQL SERVER – Computed Column – PERSISTED and Storage – Part 2 This article talks about how creating index on computed column does not grow the row length of table. SQL SERVER – Computed Columns – Index and Performance This article summarized all the articles related to computed columns. 2011 SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Data Warehousing Concepts – Day 21 of 31 What is Data Warehousing? What is Business Intelligence (BI)? What is a Dimension Table? What is Dimensional Modeling? What is a Fact Table? What are the Fundamental Stages of Data Warehousing? What are the Different Methods of Loading Dimension tables? Describes the Foreign Key Columns in Fact Table and Dimension Table? What is Data Mining? What is the Difference between a View and a Materialized View? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Data Warehousing Concepts – Day 22 of 31 What is OLTP? What is OLAP? What is the Difference between OLTP and OLAP? What is ODS? What is ER Diagram? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Data Warehousing Concepts – Day 23 of 31 What is ETL? What is VLDB? Is OLTP Database is Design Optimal for Data Warehouse? If denormalizing improves Data Warehouse Processes, then why is the Fact Table is in the Normal Form? What are Lookup Tables? What are Aggregate Tables? What is Real-Time Data-Warehousing? What are Conformed Dimensions? What is a Conformed Fact? How do you Load the Time Dimension? What is a Level of Granularity of a Fact Table? What are Non-Additive Facts? What is a Factless Facts Table? What are Slowly Changing Dimensions (SCD)? SQL SERVER – Interview Questions and Answers – Frequently Asked Questions – Data Warehousing Concepts – Day 24 of 31 What is Hybrid Slowly Changing Dimension? What is BUS Schema? What is a Star Schema? What Snow Flake Schema? Differences between the Star and Snowflake Schema? What is Difference between ER Modeling and Dimensional Modeling? What is Degenerate Dimension Table? Why is Data Modeling Important? What is a Surrogate Key? What is Junk Dimension? What is a Data Mart? What is the Difference between OLAP and Data Warehouse? What is a Cube and Linked Cube with Reference to Data Warehouse? What is Snapshot with Reference to Data Warehouse? What is Active Data Warehousing? What is the Difference between Data Warehousing and Business Intelligence? What is MDS? Explain the Paradigm of Bill Inmon and Ralph Kimball. SQL SERVER – Azure Interview Questions and Answers – Guest Post by Paras Doshi – Day 25 of 31 Paras Doshi has submitted 21 interesting question and answers for SQL Azure. 1.What is SQL Azure? 2.What is cloud computing? 3.How is SQL Azure different than SQL server? 4.How many replicas are maintained for each SQL Azure database? 5.How can we migrate from SQL server to SQL Azure? 6.Which tools are available to manage SQL Azure databases and servers? 7.Tell me something about security and SQL Azure. 8.What is SQL Azure Firewall? 9.What is the difference between web edition and business edition? 10.How do we synchronize On Premise SQL server with SQL Azure? 11.How do we Backup SQL Azure Data? 12.What is the current pricing model of SQL Azure? 13.What is the current limitation of the size of SQL Azure DB? 14.How do you handle datasets larger than 50 GB? 15.What happens when the SQL Azure database reaches Max Size? 16.How many databases can we create in a single server? 17.How many servers can we create in a single subscription? 18.How do you improve the performance of a SQL Azure Database? 19.What is code near application topology? 20.What were the latest updates to SQL Azure service? 21.When does a workload on SQL Azure get throttled? SQL SERVER – Interview Questions and Answers – Guest Post by Malathi Mahadevan – Day 26 of 31 Malachi had asked a simple question which has several answers. Each answer makes you think and ponder about the reality of the IT world. Look at the simple question – ‘What is the toughest challenge you have faced in your present job and how did you handle it’? and its various answers. Each answer has its own story. SQL SERVER – Interview Questions and Answers – Guest Post by Rick Morelan – Day 27 of 31 Rick Morelan of Joes2Pros has written an excellent blog post on the subject how to find top N values. Most people are fully aware of how the TOP keyword works with a SELECT statement. After years preparing so many students to pass the SQL Certification I noticed they were pretty well prepared for job interviews too. Yes, they would do well in the interview but not great. There seemed to be a few questions that would come up repeatedly for almost everyone. Rick addresses similar questions in his lucid writing skills. 2012 Observation of Top with Index and Order of Resultset SQL Server has lots of things to learn and share. It is amazing to see how people evaluate and understand different techniques and styles differently when implementing. The real reason may be absolutely different but we may blame something totally different for the incorrect results. Read the blog post to learn more. How do I Record Video and Webcast How to Convert Hex to Decimal or INT Earlier I asked regarding a question about how to convert Hex to Decimal. I promised that I will post an answer with Due Credit to the author but never got around to post a blog post around it. Read the original post over here SQL SERVER – Question – How to Convert Hex to Decimal. Query to Get Unique Distinct Data Based on Condition – Eliminate Duplicate Data from Resultset The natural reaction will be to suggest DISTINCT or GROUP BY. However, not all the questions can be solved by DISTINCT or GROUP BY. Let us see the following example, where a user wanted only latest records to be displayed. Let us see the example to understand further. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Memory Lane, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL, Technology

    Read the article

  • PTLQueue : a scalable bounded-capacity MPMC queue

    - by Dave
    Title: Fast concurrent MPMC queue -- I've used the following concurrent queue algorithm enough that it warrants a blog entry. I'll sketch out the design of a fast and scalable multiple-producer multiple-consumer (MPSC) concurrent queue called PTLQueue. The queue has bounded capacity and is implemented via a circular array. Bounded capacity can be a useful property if there's a mismatch between producer rates and consumer rates where an unbounded queue might otherwise result in excessive memory consumption by virtue of the container nodes that -- in some queue implementations -- are used to hold values. A bounded-capacity queue can provide flow control between components. Beware, however, that bounded collections can also result in resource deadlock if abused. The put() and take() operators are partial and wait for the collection to become non-full or non-empty, respectively. Put() and take() do not allocate memory, and are not vulnerable to the ABA pathologies. The PTLQueue algorithm can be implemented equally well in C/C++ and Java. Partial operators are often more convenient than total methods. In many use cases if the preconditions aren't met, there's nothing else useful the thread can do, so it may as well wait via a partial method. An exception is in the case of work-stealing queues where a thief might scan a set of queues from which it could potentially steal. Total methods return ASAP with a success-failure indication. (It's tempting to describe a queue or API as blocking or non-blocking instead of partial or total, but non-blocking is already an overloaded concurrency term. Perhaps waiting/non-waiting or patient/impatient might be better terms). It's also trivial to construct partial operators by busy-waiting via total operators, but such constructs may be less efficient than an operator explicitly and intentionally designed to wait. A PTLQueue instance contains an array of slots, where each slot has volatile Turn and MailBox fields. The array has power-of-two length allowing mod/div operations to be replaced by masking. We assume sensible padding and alignment to reduce the impact of false sharing. (On x86 I recommend 128-byte alignment and padding because of the adjacent-sector prefetch facility). Each queue also has PutCursor and TakeCursor cursor variables, each of which should be sequestered as the sole occupant of a cache line or sector. You can opt to use 64-bit integers if concerned about wrap-around aliasing in the cursor variables. Put(null) is considered illegal, but the caller or implementation can easily check for and convert null to a distinguished non-null proxy value if null happens to be a value you'd like to pass. Take() will accordingly convert the proxy value back to null. An advantage of PTLQueue is that you can use atomic fetch-and-increment for the partial methods. We initialize each slot at index I with (Turn=I, MailBox=null). Both cursors are initially 0. All shared variables are considered "volatile" and atomics such as CAS and AtomicFetchAndIncrement are presumed to have bidirectional fence semantics. Finally T is the templated type. I've sketched out a total tryTake() method below that allows the caller to poll the queue. tryPut() has an analogous construction. Zebra stripping : alternating row colors for nice-looking code listings. See also google code "prettify" : https://code.google.com/p/google-code-prettify/ Prettify is a javascript module that yields the HTML/CSS/JS equivalent of pretty-print. -- pre:nth-child(odd) { background-color:#ff0000; } pre:nth-child(even) { background-color:#0000ff; } border-left: 11px solid #ccc; margin: 1.7em 0 1.7em 0.3em; background-color:#BFB; font-size:12px; line-height:65%; " // PTLQueue : Put(v) : // producer : partial method - waits as necessary assert v != null assert Mask = 1 && (Mask & (Mask+1)) == 0 // Document invariants // doorway step // Obtain a sequence number -- ticket // As a practical concern the ticket value is temporally unique // The ticket also identifies and selects a slot auto tkt = AtomicFetchIncrement (&PutCursor, 1) slot * s = &Slots[tkt & Mask] // waiting phase : // wait for slot's generation to match the tkt value assigned to this put() invocation. // The "generation" is implicitly encoded as the upper bits in the cursor // above those used to specify the index : tkt div (Mask+1) // The generation serves as an epoch number to identify a cohort of threads // accessing disjoint slots while s-Turn != tkt : Pause assert s-MailBox == null s-MailBox = v // deposit and pass message Take() : // consumer : partial method - waits as necessary auto tkt = AtomicFetchIncrement (&TakeCursor,1) slot * s = &Slots[tkt & Mask] // 2-stage waiting : // First wait for turn for our generation // Acquire exclusive "take" access to slot's MailBox field // Then wait for the slot to become occupied while s-Turn != tkt : Pause // Concurrency in this section of code is now reduced to just 1 producer thread // vs 1 consumer thread. // For a given queue and slot, there will be most one Take() operation running // in this section. // Consumer waits for producer to arrive and make slot non-empty // Extract message; clear mailbox; advance Turn indicator // We have an obvious happens-before relation : // Put(m) happens-before corresponding Take() that returns that same "m" for T v = s-MailBox if v != null : s-MailBox = null ST-ST barrier s-Turn = tkt + Mask + 1 // unlock slot to admit next producer and consumer return v Pause tryTake() : // total method - returns ASAP with failure indication for auto tkt = TakeCursor slot * s = &Slots[tkt & Mask] if s-Turn != tkt : return null T v = s-MailBox // presumptive return value if v == null : return null // ratify tkt and v values and commit by advancing cursor if CAS (&TakeCursor, tkt, tkt+1) != tkt : continue s-MailBox = null ST-ST barrier s-Turn = tkt + Mask + 1 return v The basic idea derives from the Partitioned Ticket Lock "PTL" (US20120240126-A1) and the MultiLane Concurrent Bag (US8689237). The latter is essentially a circular ring-buffer where the elements themselves are queues or concurrent collections. You can think of the PTLQueue as a partitioned ticket lock "PTL" augmented to pass values from lock to unlock via the slots. Alternatively, you could conceptualize of PTLQueue as a degenerate MultiLane bag where each slot or "lane" consists of a simple single-word MailBox instead of a general queue. Each lane in PTLQueue also has a private Turn field which acts like the Turn (Grant) variables found in PTL. Turn enforces strict FIFO ordering and restricts concurrency on the slot mailbox field to at most one simultaneous put() and take() operation. PTL uses a single "ticket" variable and per-slot Turn (grant) fields while MultiLane has distinct PutCursor and TakeCursor cursors and abstract per-slot sub-queues. Both PTL and MultiLane advance their cursor and ticket variables with atomic fetch-and-increment. PTLQueue borrows from both PTL and MultiLane and has distinct put and take cursors and per-slot Turn fields. Instead of a per-slot queues, PTLQueue uses a simple single-word MailBox field. PutCursor and TakeCursor act like a pair of ticket locks, conferring "put" and "take" access to a given slot. PutCursor, for instance, assigns an incoming put() request to a slot and serves as a PTL "Ticket" to acquire "put" permission to that slot's MailBox field. To better explain the operation of PTLQueue we deconstruct the operation of put() and take() as follows. Put() first increments PutCursor obtaining a new unique ticket. That ticket value also identifies a slot. Put() next waits for that slot's Turn field to match that ticket value. This is tantamount to using a PTL to acquire "put" permission on the slot's MailBox field. Finally, having obtained exclusive "put" permission on the slot, put() stores the message value into the slot's MailBox. Take() similarly advances TakeCursor, identifying a slot, and then acquires and secures "take" permission on a slot by waiting for Turn. Take() then waits for the slot's MailBox to become non-empty, extracts the message, and clears MailBox. Finally, take() advances the slot's Turn field, which releases both "put" and "take" access to the slot's MailBox. Note the asymmetry : put() acquires "put" access to the slot, but take() releases that lock. At any given time, for a given slot in a PTLQueue, at most one thread has "put" access and at most one thread has "take" access. This restricts concurrency from general MPMC to 1-vs-1. We have 2 ticket locks -- one for put() and one for take() -- each with its own "ticket" variable in the form of the corresponding cursor, but they share a single "Grant" egress variable in the form of the slot's Turn variable. Advancing the PutCursor, for instance, serves two purposes. First, we obtain a unique ticket which identifies a slot. Second, incrementing the cursor is the doorway protocol step to acquire the per-slot mutual exclusion "put" lock. The cursors and operations to increment those cursors serve double-duty : slot-selection and ticket assignment for locking the slot's MailBox field. At any given time a slot MailBox field can be in one of the following states: empty with no pending operations -- neutral state; empty with one or more waiting take() operations pending -- deficit; occupied with no pending operations; occupied with one or more waiting put() operations -- surplus; empty with a pending put() or pending put() and take() operations -- transitional; or occupied with a pending take() or pending put() and take() operations -- transitional. The partial put() and take() operators can be implemented with an atomic fetch-and-increment operation, which may confer a performance advantage over a CAS-based loop. In addition we have independent PutCursor and TakeCursor cursors. Critically, a put() operation modifies PutCursor but does not access the TakeCursor and a take() operation modifies the TakeCursor cursor but does not access the PutCursor. This acts to reduce coherence traffic relative to some other queue designs. It's worth noting that slow threads or obstruction in one slot (or "lane") does not impede or obstruct operations in other slots -- this gives us some degree of obstruction isolation. PTLQueue is not lock-free, however. The implementation above is expressed with polite busy-waiting (Pause) but it's trivial to implement per-slot parking and unparking to deschedule waiting threads. It's also easy to convert the queue to a more general deque by replacing the PutCursor and TakeCursor cursors with Left/Front and Right/Back cursors that can move either direction. Specifically, to push and pop from the "left" side of the deque we would decrement and increment the Left cursor, respectively, and to push and pop from the "right" side of the deque we would increment and decrement the Right cursor, respectively. We used a variation of PTLQueue for message passing in our recent OPODIS 2013 paper. ul { list-style:none; padding-left:0; padding:0; margin:0; margin-left:0; } ul#myTagID { padding: 0px; margin: 0px; list-style:none; margin-left:0;} -- -- There's quite a bit of related literature in this area. I'll call out a few relevant references: Wilson's NYU Courant Institute UltraComputer dissertation from 1988 is classic and the canonical starting point : Operating System Data Structures for Shared-Memory MIMD Machines with Fetch-and-Add. Regarding provenance and priority, I think PTLQueue or queues effectively equivalent to PTLQueue have been independently rediscovered a number of times. See CB-Queue and BNPBV, below, for instance. But Wilson's dissertation anticipates the basic idea and seems to predate all the others. Gottlieb et al : Basic Techniques for the Efficient Coordination of Very Large Numbers of Cooperating Sequential Processors Orozco et al : CB-Queue in Toward high-throughput algorithms on many-core architectures which appeared in TACO 2012. Meneghin et al : BNPVB family in Performance evaluation of inter-thread communication mechanisms on multicore/multithreaded architecture Dmitry Vyukov : bounded MPMC queue (highly recommended) Alex Otenko : US8607249 (highly related). John Mellor-Crummey : Concurrent queues: Practical fetch-and-phi algorithms. Technical Report 229, Department of Computer Science, University of Rochester Thomasson : FIFO Distributed Bakery Algorithm (very similar to PTLQueue). Scott and Scherer : Dual Data Structures I'll propose an optimization left as an exercise for the reader. Say we wanted to reduce memory usage by eliminating inter-slot padding. Such padding is usually "dark" memory and otherwise unused and wasted. But eliminating the padding leaves us at risk of increased false sharing. Furthermore lets say it was usually the case that the PutCursor and TakeCursor were numerically close to each other. (That's true in some use cases). We might still reduce false sharing by incrementing the cursors by some value other than 1 that is not trivially small and is coprime with the number of slots. Alternatively, we might increment the cursor by one and mask as usual, resulting in a logical index. We then use that logical index value to index into a permutation table, yielding an effective index for use in the slot array. The permutation table would be constructed so that nearby logical indices would map to more distant effective indices. (Open question: what should that permutation look like? Possibly some perversion of a Gray code or De Bruijn sequence might be suitable). As an aside, say we need to busy-wait for some condition as follows : "while C == 0 : Pause". Lets say that C is usually non-zero, so we typically don't wait. But when C happens to be 0 we'll have to spin for some period, possibly brief. We can arrange for the code to be more machine-friendly with respect to the branch predictors by transforming the loop into : "if C == 0 : for { Pause; if C != 0 : break; }". Critically, we want to restructure the loop so there's one branch that controls entry and another that controls loop exit. A concern is that your compiler or JIT might be clever enough to transform this back to "while C == 0 : Pause". You can sometimes avoid this by inserting a call to a some type of very cheap "opaque" method that the compiler can't elide or reorder. On Solaris, for instance, you could use :"if C == 0 : { gethrtime(); for { Pause; if C != 0 : break; }}". It's worth noting the obvious duality between locks and queues. If you have strict FIFO lock implementation with local spinning and succession by direct handoff such as MCS or CLH,then you can usually transform that lock into a queue. Hidden commentary and annotations - invisible : * And of course there's a well-known duality between queues and locks, but I'll leave that topic for another blog post. * Compare and contrast : PTLQ vs PTL and MultiLane * Equivalent : Turn; seq; sequence; pos; position; ticket * Put = Lock; Deposit Take = identify and reserve slot; wait; extract & clear; unlock * conceptualize : Distinct PutLock and TakeLock implemented as ticket lock or PTL Distinct arrival cursors but share per-slot "Turn" variable provides exclusive role-based access to slot's mailbox field put() acquires exclusive access to a slot for purposes of "deposit" assigns slot round-robin and then acquires deposit access rights/perms to that slot take() acquires exclusive access to slot for purposes of "withdrawal" assigns slot round-robin and then acquires withdrawal access rights/perms to that slot At any given time, only one thread can have withdrawal access to a slot at any given time, only one thread can have deposit access to a slot Permissible for T1 to have deposit access and T2 to simultaneously have withdrawal access * round-robin for the purposes of; role-based; access mode; access role mailslot; mailbox; allocate/assign/identify slot rights; permission; license; access permission; * PTL/Ticket hybrid Asymmetric usage ; owner oblivious lock-unlock pairing K-exclusion add Grant cursor pass message m from lock to unlock via Slots[] array Cursor performs 2 functions : + PTL ticket + Assigns request to slot in round-robin fashion Deconstruct protocol : explication put() : allocate slot in round-robin fashion acquire PTL for "put" access store message into slot associated with PTL index take() : Acquire PTL for "take" access // doorway step seq = fetchAdd (&Grant, 1) s = &Slots[seq & Mask] // waiting phase while s-Turn != seq : pause Extract : wait for s-mailbox to be full v = s-mailbox s-mailbox = null Release PTL for both "put" and "take" access s-Turn = seq + Mask + 1 * Slot round-robin assignment and lock "doorway" protocol leverage the same cursor and FetchAdd operation on that cursor FetchAdd (&Cursor,1) + round-robin slot assignment and dispersal + PTL/ticket lock "doorway" step waiting phase is via "Turn" field in slot * PTLQueue uses 2 cursors -- put and take. Acquire "put" access to slot via PTL-like lock Acquire "take" access to slot via PTL-like lock 2 locks : put and take -- at most one thread can access slot's mailbox Both locks use same "turn" field Like multilane : 2 cursors : put and take slot is simple 1-capacity mailbox instead of queue Borrow per-slot turn/grant from PTL Provides strict FIFO Lock slot : put-vs-put take-vs-take at most one put accesses slot at any one time at most one put accesses take at any one time reduction to 1-vs-1 instead of N-vs-M concurrency Per slot locks for put/take Release put/take by advancing turn * is instrumental in ... * P-V Semaphore vs lock vs K-exclusion * See also : FastQueues-excerpt.java dice-etc/queue-mpmc-bounded-blocking-circular-xadd/ * PTLQueue is the same as PTLQB - identical * Expedient return; ASAP; prompt; immediately * Lamport's Bakery algorithm : doorway step then waiting phase Threads arriving at doorway obtain a unique ticket number Threads enter in ticket order * In the terminology of Reed and Kanodia a ticket lock corresponds to the busy-wait implementation of a semaphore using an eventcount and a sequencer It can also be thought of as an optimization of Lamport's bakery lock was designed for fault-tolerance rather than performance Instead of spinning on the release counter, processors using a bakery lock repeatedly examine the tickets of their peers --

    Read the article

  • Agile Development

    - by James Oloo Onyango
    Alot of literature has and is being written about agile developement and its surrounding philosophies. In my quest to find the best way to express the importance of agile methodologies, i have found Robert C. Martin's "A Satire Of Two Companies" to be both the most concise and thorough! Enjoy the read! Rufus Inc Project Kick Off Your name is Bob. The date is January 3, 2001, and your head still aches from the recent millennial revelry. You are sitting in a conference room with several managers and a group of your peers. You are a project team leader. Your boss is there, and he has brought along all of his team leaders. His boss called the meeting. "We have a new project to develop," says your boss's boss. Call him BB. The points in his hair are so long that they scrape the ceiling. Your boss's points are just starting to grow, but he eagerly awaits the day when he can leave Brylcream stains on the acoustic tiles. BB describes the essence of the new market they have identified and the product they want to develop to exploit this market. "We must have this new project up and working by fourth quarter October 1," BB demands. "Nothing is of higher priority, so we are cancelling your current project." The reaction in the room is stunned silence. Months of work are simply going to be thrown away. Slowly, a murmur of objection begins to circulate around the conference table.   His points give off an evil green glow as BB meets the eyes of everyone in the room. One by one, that insidious stare reduces each attendee to quivering lumps of protoplasm. It is clear that he will brook no discussion on this matter. Once silence has been restored, BB says, "We need to begin immediately. How long will it take you to do the analysis?" You raise your hand. Your boss tries to stop you, but his spitwad misses you and you are unaware of his efforts.   "Sir, we can't tell you how long the analysis will take until we have some requirements." "The requirements document won't be ready for 3 or 4 weeks," BB says, his points vibrating with frustration. "So, pretend that you have the requirements in front of you now. How long will you require for analysis?" No one breathes. Everyone looks around to see whether anyone has some idea. "If analysis goes beyond April 1, we have a problem. Can you finish the analysis by then?" Your boss visibly gathers his courage: "We'll find a way, sir!" His points grow 3 mm, and your headache increases by two Tylenol. "Good." BB smiles. "Now, how long will it take to do the design?" "Sir," you say. Your boss visibly pales. He is clearly worried that his 3 mms are at risk. "Without an analysis, it will not be possible to tell you how long design will take." BB's expression shifts beyond austere.   "PRETEND you have the analysis already!" he says, while fixing you with his vacant, beady little eyes. "How long will it take you to do the design?" Two Tylenol are not going to cut it. Your boss, in a desperate attempt to save his new growth, babbles: "Well, sir, with only six months left to complete the project, design had better take no longer than 3 months."   "I'm glad you agree, Smithers!" BB says, beaming. Your boss relaxes. He knows his points are secure. After a while, he starts lightly humming the Brylcream jingle. BB continues, "So, analysis will be complete by April 1, design will be complete by July 1, and that gives you 3 months to implement the project. This meeting is an example of how well our new consensus and empowerment policies are working. Now, get out there and start working. I'll expect to see TQM plans and QIT assignments on my desk by next week. Oh, and don't forget that your crossfunctional team meetings and reports will be needed for next month's quality audit." "Forget the Tylenol," you think to yourself as you return to your cubicle. "I need bourbon."   Visibly excited, your boss comes over to you and says, "Gosh, what a great meeting. I think we're really going to do some world shaking with this project." You nod in agreement, too disgusted to do anything else. "Oh," your boss continues, "I almost forgot." He hands you a 30-page document. "Remember that the SEI is coming to do an evaluation next week. This is the evaluation guide. You need to read through it, memorize it, and then shred it. It tells you how to answer any questions that the SEI auditors ask you. It also tells you what parts of the building you are allowed to take them to and what parts to avoid. We are determined to be a CMM level 3 organization by June!"   You and your peers start working on the analysis of the new project. This is difficult because you have no requirements. But from the 10-minute introduction given by BB on that fateful morning, you have some idea of what the product is supposed to do.   Corporate process demands that you begin by creating a use case document. You and your team begin enumerating use cases and drawing oval and stick diagrams. Philosophical debates break out among the team members. There is disagreement as to whether certain use cases should be connected with <<extends>> or <<includes>> relationships. Competing models are created, but nobody knows how to evaluate them. The debate continues, effectively paralyzing progress.   After a week, somebody finds the iceberg.com Web site, which recommends disposing entirely of <<extends>> and <<includes>> and replacing them with <<precedes>> and <<uses>>. The documents on this Web site, authored by Don Sengroiux, describes a method known as stalwart-analysis, which claims to be a step-by-step method for translating use cases into design diagrams. More competing use case models are created using this new scheme, but again, people can't agree on how to evaluate them. The thrashing continues. More and more, the use case meetings are driven by emotion rather than by reason. If it weren't for the fact that you don't have requirements, you'd be pretty upset by the lack of progress you are making. The requirements document arrives on February 15. And then again on February 20, 25, and every week thereafter. Each new version contradicts the previous one. Clearly, the marketing folks who are writing the requirements, empowered though they might be, are not finding consensus.   At the same time, several new competing use case templates have been proposed by the various team members. Each template presents its own particularly creative way of delaying progress. The debates rage on. On March 1, Prudence Putrigence, the process proctor, succeeds in integrating all the competing use case forms and templates into a single, all-encompassing form. Just the blank form is 15 pages long. She has managed to include every field that appeared on all the competing templates. She also presents a 159- page document describing how to fill out the use case form. All current use cases must be rewritten according to the new standard.   You marvel to yourself that it now requires 15 pages of fill-in-the-blank and essay questions to answer the question: What should the system do when the user presses Return? The corporate process (authored by L. E. Ott, famed author of "Holistic Analysis: A Progressive Dialectic for Software Engineers") insists that you discover all primary use cases, 87 percent of all secondary use cases, and 36.274 percent of all tertiary use cases before you can complete analysis and enter the design phase. You have no idea what a tertiary use case is. So in an attempt to meet this requirement, you try to get your use case document reviewed by the marketing department, which you hope will know what a tertiary use case is.   Unfortunately, the marketing folks are too busy with sales support to talk to you. Indeed, since the project started, you have not been able to get a single meeting with marketing, which has provided a never-ending stream of changing and contradictory requirements documents.   While one team has been spinning endlessly on the use case document, another team has been working out the domain model. Endless variations of UML documents are pouring out of this team. Every week, the model is reworked.   The team members can't decide whether to use <<interfaces>> or <<types>> in the model. A huge disagreement has been raging on the proper syntax and application of OCL. Others on the team just got back from a 5-day class on catabolism, and have been producing incredibly detailed and arcane diagrams that nobody else can fathom.   On March 27, with one week to go before analysis is to be complete, you have produced a sea of documents and diagrams but are no closer to a cogent analysis of the problem than you were on January 3. **** And then, a miracle happens.   **** On Saturday, April 1, you check your e-mail from home. You see a memo from your boss to BB. It states unequivocally that you are done with the analysis! You phone your boss and complain. "How could you have told BB that we were done with the analysis?" "Have you looked at a calendar lately?" he responds. "It's April 1!" The irony of that date does not escape you. "But we have so much more to think about. So much more to analyze! We haven't even decided whether to use <<extends>> or <<precedes>>!" "Where is your evidence that you are not done?" inquires your boss, impatiently. "Whaaa . . . ." But he cuts you off. "Analysis can go on forever; it has to be stopped at some point. And since this is the date it was scheduled to stop, it has been stopped. Now, on Monday, I want you to gather up all existing analysis materials and put them into a public folder. Release that folder to Prudence so that she can log it in the CM system by Monday afternoon. Then get busy and start designing."   As you hang up the phone, you begin to consider the benefits of keeping a bottle of bourbon in your bottom desk drawer. They threw a party to celebrate the on-time completion of the analysis phase. BB gave a colon-stirring speech on empowerment. And your boss, another 3 mm taller, congratulated his team on the incredible show of unity and teamwork. Finally, the CIO takes the stage to tell everyone that the SEI audit went very well and to thank everyone for studying and shredding the evaluation guides that were passed out. Level 3 now seems assured and will be awarded by June. (Scuttlebutt has it that managers at the level of BB and above are to receive significant bonuses once the SEI awards level 3.)   As the weeks flow by, you and your team work on the design of the system. Of course, you find that the analysis that the design is supposedly based on is flawedno, useless; no, worse than useless. But when you tell your boss that you need to go back and work some more on the analysis to shore up its weaker sections, he simply states, "The analysis phase is over. The only allowable activity is design. Now get back to it."   So, you and your team hack the design as best you can, unsure of whether the requirements have been properly analyzed. Of course, it really doesn't matter much, since the requirements document is still thrashing with weekly revisions, and the marketing department still refuses to meet with you.     The design is a nightmare. Your boss recently misread a book named The Finish Line in which the author, Mark DeThomaso, blithely suggested that design documents should be taken down to code-level detail. "If we are going to be working at that level of detail," you ask, "why don't we simply write the code instead?" "Because then you wouldn't be designing, of course. And the only allowable activity in the design phase is design!" "Besides," he continues, "we have just purchased a companywide license for Dandelion! This tool enables 'Round the Horn Engineering!' You are to transfer all design diagrams into this tool. It will automatically generate our code for us! It will also keep the design diagrams in sync with the code!" Your boss hands you a brightly colored shrinkwrapped box containing the Dandelion distribution. You accept it numbly and shuffle off to your cubicle. Twelve hours, eight crashes, one disk reformatting, and eight shots of 151 later, you finally have the tool installed on your server. You consider the week your team will lose while attending Dandelion training. Then you smile and think, "Any week I'm not here is a good week." Design diagram after design diagram is created by your team. Dandelion makes it very difficult to draw these diagrams. There are dozens and dozens of deeply nested dialog boxes with funny text fields and check boxes that must all be filled in correctly. And then there's the problem of moving classes between packages. At first, these diagram are driven from the use cases. But the requirements are changing so often that the use cases rapidly become meaningless. Debates rage about whether VISITOR or DECORATOR design patterns should be used. One developer refuses to use VISITOR in any form, claiming that it's not a properly object-oriented construct. Someone refuses to use multiple inheritance, since it is the spawn of the devil. Review meetings rapidly degenerate into debates about the meaning of object orientation, the definition of analysis versus design, or when to use aggregation versus association. Midway through the design cycle, the marketing folks announce that they have rethought the focus of the system. Their new requirements document is completely restructured. They have eliminated several major feature areas and replaced them with feature areas that they anticipate customer surveys will show to be more appropriate. You tell your boss that these changes mean that you need to reanalyze and redesign much of the system. But he says, "The analysis phase is system. But he says, "The analysis phase is over. The only allowable activity is design. Now get back to it."   You suggest that it might be better to create a simple prototype to show to the marketing folks and even some potential customers. But your boss says, "The analysis phase is over. The only allowable activity is design. Now get back to it." Hack, hack, hack, hack. You try to create some kind of a design document that might reflect the new requirements documents. However, the revolution of the requirements has not caused them to stop thrashing. Indeed, if anything, the wild oscillations of the requirements document have only increased in frequency and amplitude.   You slog your way through them.   On June 15, the Dandelion database gets corrupted. Apparently, the corruption has been progressive. Small errors in the DB accumulated over the months into bigger and bigger errors. Eventually, the CASE tool just stopped working. Of course, the slowly encroaching corruption is present on all the backups. Calls to the Dandelion technical support line go unanswered for several days. Finally, you receive a brief e-mail from Dandelion, informing you that this is a known problem and that the solution is to purchase the new version, which they promise will be ready some time next quarter, and then reenter all the diagrams by hand.   ****   Then, on July 1 another miracle happens! You are done with the design!   Rather than go to your boss and complain, you stock your middle desk drawer with some vodka.   **** They threw a party to celebrate the on-time completion of the design phase and their graduation to CMM level 3. This time, you find BB's speech so stirring that you have to use the restroom before it begins. New banners and plaques are all over your workplace. They show pictures of eagles and mountain climbers, and they talk about teamwork and empowerment. They read better after a few scotches. That reminds you that you need to clear out your file cabinet to make room for the brandy. You and your team begin to code. But you rapidly discover that the design is lacking in some significant areas. Actually, it's lacking any significance at all. You convene a design session in one of the conference rooms to try to work through some of the nastier problems. But your boss catches you at it and disbands the meeting, saying, "The design phase is over. The only allowable activity is coding. Now get back to it."   ****   The code generated by Dandelion is really hideous. It turns out that you and your team were using association and aggregation the wrong way, after all. All the generated code has to be edited to correct these flaws. Editing this code is extremely difficult because it has been instrumented with ugly comment blocks that have special syntax that Dandelion needs in order to keep the diagrams in sync with the code. If you accidentally alter one of these comments, the diagrams will be regenerated incorrectly. It turns out that "Round the Horn Engineering" requires an awful lot of effort. The more you try to keep the code compatible with Dandelion, the more errors Dandelion generates. In the end, you give up and decide to keep the diagrams up to date manually. A second later, you decide that there's no point in keeping the diagrams up to date at all. Besides, who has time?   Your boss hires a consultant to build tools to count the number of lines of code that are being produced. He puts a big thermometer graph on the wall with the number 1,000,000 on the top. Every day, he extends the red line to show how many lines have been added. Three days after the thermometer appears on the wall, your boss stops you in the hall. "That graph isn't growing quickly enough. We need to have a million lines done by October 1." "We aren't even sh-sh-sure that the proshect will require a m-million linezh," you blather. "We have to have a million lines done by October 1," your boss reiterates. His points have grown again, and the Grecian formula he uses on them creates an aura of authority and competence. "Are you sure your comment blocks are big enough?" Then, in a flash of managerial insight, he says, "I have it! I want you to institute a new policy among the engineers. No line of code is to be longer than 20 characters. Any such line must be split into two or more preferably more. All existing code needs to be reworked to this standard. That'll get our line count up!"   You decide not to tell him that this will require two unscheduled work months. You decide not to tell him anything at all. You decide that intravenous injections of pure ethanol are the only solution. You make the appropriate arrangements. Hack, hack, hack, and hack. You and your team madly code away. By August 1, your boss, frowning at the thermometer on the wall, institutes a mandatory 50-hour workweek.   Hack, hack, hack, and hack. By September 1st, the thermometer is at 1.2 million lines and your boss asks you to write a report describing why you exceeded the coding budget by 20 percent. He institutes mandatory Saturdays and demands that the project be brought back down to a million lines. You start a campaign of remerging lines. Hack, hack, hack, and hack. Tempers are flaring; people are quitting; QA is raining trouble reports down on you. Customers are demanding installation and user manuals; salespeople are demanding advance demonstrations for special customers; the requirements document is still thrashing, the marketing folks are complaining that the product isn't anything like they specified, and the liquor store won't accept your credit card anymore. Something has to give.    On September 15, BB calls a meeting. As he enters the room, his points are emitting clouds of steam. When he speaks, the bass overtones of his carefully manicured voice cause the pit of your stomach to roll over. "The QA manager has told me that this project has less than 50 percent of the required features implemented. He has also informed me that the system crashes all the time, yields wrong results, and is hideously slow. He has also complained that he cannot keep up with the continuous train of daily releases, each more buggy than the last!" He stops for a few seconds, visibly trying to compose himself. "The QA manager estimates that, at this rate of development, we won't be able to ship the product until December!" Actually, you think it's more like March, but you don't say anything. "December!" BB roars with such derision that people duck their heads as though he were pointing an assault rifle at them. "December is absolutely out of the question. Team leaders, I want new estimates on my desk in the morning. I am hereby mandating 65-hour work weeks until this project is complete. And it better be complete by November 1."   As he leaves the conference room, he is heard to mutter: "Empowermentbah!" * * * Your boss is bald; his points are mounted on BB's wall. The fluorescent lights reflecting off his pate momentarily dazzle you. "Do you have anything to drink?" he asks. Having just finished your last bottle of Boone's Farm, you pull a bottle of Thunderbird from your bookshelf and pour it into his coffee mug. "What's it going to take to get this project done? " he asks. "We need to freeze the requirements, analyze them, design them, and then implement them," you say callously. "By November 1?" your boss exclaims incredulously. "No way! Just get back to coding the damned thing." He storms out, scratching his vacant head.   A few days later, you find that your boss has been transferred to the corporate research division. Turnover has skyrocketed. Customers, informed at the last minute that their orders cannot be fulfilled on time, have begun to cancel their orders. Marketing is re-evaluating whether this product aligns with the overall goals of the company. Memos fly, heads roll, policies change, and things are, overall, pretty grim. Finally, by March, after far too many sixty-five hour weeks, a very shaky version of the software is ready. In the field, bug-discovery rates are high, and the technical support staff are at their wits' end, trying to cope with the complaints and demands of the irate customers. Nobody is happy.   In April, BB decides to buy his way out of the problem by licensing a product produced by Rupert Industries and redistributing it. The customers are mollified, the marketing folks are smug, and you are laid off.     Rupert Industries: Project Alpha   Your name is Robert. The date is January 3, 2001. The quiet hours spent with your family this holiday have left you refreshed and ready for work. You are sitting in a conference room with your team of professionals. The manager of the division called the meeting. "We have some ideas for a new project," says the division manager. Call him Russ. He is a high-strung British chap with more energy than a fusion reactor. He is ambitious and driven but understands the value of a team. Russ describes the essence of the new market opportunity the company has identified and introduces you to Jane, the marketing manager, who is responsible for defining the products that will address it. Addressing you, Jane says, "We'd like to start defining our first product offering as soon as possible. When can you and your team meet with me?" You reply, "We'll be done with the current iteration of our project this Friday. We can spare a few hours for you between now and then. After that, we'll take a few people from the team and dedicate them to you. We'll begin hiring their replacements and the new people for your team immediately." "Great," says Russ, "but I want you to understand that it is critical that we have something to exhibit at the trade show coming up this July. If we can't be there with something significant, we'll lose the opportunity."   "I understand," you reply. "I don't yet know what it is that you have in mind, but I'm sure we can have something by July. I just can't tell you what that something will be right now. In any case, you and Jane are going to have complete control over what we developers do, so you can rest assured that by July, you'll have the most important things that can be accomplished in that time ready to exhibit."   Russ nods in satisfaction. He knows how this works. Your team has always kept him advised and allowed him to steer their development. He has the utmost confidence that your team will work on the most important things first and will produce a high-quality product.   * * *   "So, Robert," says Jane at their first meeting, "How does your team feel about being split up?" "We'll miss working with each other," you answer, "but some of us were getting pretty tired of that last project and are looking forward to a change. So, what are you people cooking up?" Jane beams. "You know how much trouble our customers currently have . . ." And she spends a half hour or so describing the problem and possible solution. "OK, wait a second" you respond. "I need to be clear about this." And so you and Jane talk about how this system might work. Some of her ideas aren't fully formed. You suggest possible solutions. She likes some of them. You continue discussing.   During the discussion, as each new topic is addressed, Jane writes user story cards. Each card represents something that the new system has to do. The cards accumulate on the table and are spread out in front of you. Both you and Jane point at them, pick them up, and make notes on them as you discuss the stories. The cards are powerful mnemonic devices that you can use to represent complex ideas that are barely formed.   At the end of the meeting, you say, "OK, I've got a general idea of what you want. I'm going to talk to the team about it. I imagine they'll want to run some experiments with various database structures and presentation formats. Next time we meet, it'll be as a group, and we'll start identifying the most important features of the system."   A week later, your nascent team meets with Jane. They spread the existing user story cards out on the table and begin to get into some of the details of the system. The meeting is very dynamic. Jane presents the stories in the order of their importance. There is much discussion about each one. The developers are concerned about keeping the stories small enough to estimate and test. So they continually ask Jane to split one story into several smaller stories. Jane is concerned that each story have a clear business value and priority, so as she splits them, she makes sure that this stays true.   The stories accumulate on the table. Jane writes them, but the developers make notes on them as needed. Nobody tries to capture everything that is said; the cards are not meant to capture everything but are simply reminders of the conversation.   As the developers become more comfortable with the stories, they begin writing estimates on them. These estimates are crude and budgetary, but they give Jane an idea of what the story will cost.   At the end of the meeting, it is clear that many more stories could be discussed. It is also clear that the most important stories have been addressed and that they represent several months worth of work. Jane closes the meeting by taking the cards with her and promising to have a proposal for the first release in the morning.   * * *   The next morning, you reconvene the meeting. Jane chooses five cards and places them on the table. "According to your estimates, these cards represent about one perfect team-week's worth of work. The last iteration of the previous project managed to get one perfect team-week done in 3 real weeks. If we can get these five stories done in 3 weeks, we'll be able to demonstrate them to Russ. That will make him feel very comfortable about our progress." Jane is pushing it. The sheepish look on her face lets you know that she knows it too. You reply, "Jane, this is a new team, working on a new project. It's a bit presumptuous to expect that our velocity will be the same as the previous team's. However, I met with the team yesterday afternoon, and we all agreed that our initial velocity should, in fact, be set to one perfectweek for every 3 real-weeks. So you've lucked out on this one." "Just remember," you continue, "that the story estimates and the story velocity are very tentative at this point. We'll learn more when we plan the iteration and even more when we implement it."   Jane looks over her glasses at you as if to say "Who's the boss around here, anyway?" and then smiles and says, "Yeah, don't worry. I know the drill by now."Jane then puts 15 more cards on the table. She says, "If we can get all these cards done by the end of March, we can turn the system over to our beta test customers. And we'll get good feedback from them."   You reply, "OK, so we've got our first iteration defined, and we have the stories for the next three iterations after that. These four iterations will make our first release."   "So," says Jane, can you really do these five stories in the next 3 weeks?" "I don't know for sure, Jane," you reply. "Let's break them down into tasks and see what we get."   So Jane, you, and your team spend the next several hours taking each of the five stories that Jane chose for the first iteration and breaking them down into small tasks. The developers quickly realize that some of the tasks can be shared between stories and that other tasks have commonalities that can probably be taken advantage of. It is clear that potential designs are popping into the developers' heads. From time to time, they form little discussion knots and scribble UML diagrams on some cards.   Soon, the whiteboard is filled with the tasks that, once completed, will implement the five stories for this iteration. You start the sign-up process by saying, "OK, let's sign up for these tasks." "I'll take the initial database generation." Says Pete. "That's what I did on the last project, and this doesn't look very different. I estimate it at two of my perfect workdays." "OK, well, then, I'll take the login screen," says Joe. "Aw, darn," says Elaine, the junior member of the team, "I've never done a GUI, and kinda wanted to try that one."   "Ah, the impatience of youth," Joe says sagely, with a wink in your direction. "You can assist me with it, young Jedi." To Jane: "I think it'll take me about three of my perfect workdays."   One by one, the developers sign up for tasks and estimate them in terms of their own perfect workdays. Both you and Jane know that it is best to let the developers volunteer for tasks than to assign the tasks to them. You also know full well that you daren't challenge any of the developers' estimates. You know these people, and you trust them. You know that they are going to do the very best they can.   The developers know that they can't sign up for more perfect workdays than they finished in the last iteration they worked on. Once each developer has filled his or her schedule for the iteration, they stop signing up for tasks.   Eventually, all the developers have stopped signing up for tasks. But, of course, tasks are still left on the board.   "I was worried that that might happen," you say, "OK, there's only one thing to do, Jane. We've got too much to do in this iteration. What stories or tasks can we remove?" Jane sighs. She knows that this is the only option. Working overtime at the beginning of a project is insane, and projects where she's tried it have not fared well.   So Jane starts to remove the least-important functionality. "Well, we really don't need the login screen just yet. We can simply start the system in the logged-in state." "Rats!" cries Elaine. "I really wanted to do that." "Patience, grasshopper." says Joe. "Those who wait for the bees to leave the hive will not have lips too swollen to relish the honey." Elaine looks confused. Everyone looks confused. "So . . .," Jane continues, "I think we can also do away with . . ." And so, bit by bit, the list of tasks shrinks. Developers who lose a task sign up for one of the remaining ones.   The negotiation is not painless. Several times, Jane exhibits obvious frustration and impatience. Once, when tensions are especially high, Elaine volunteers, "I'll work extra hard to make up some of the missing time." You are about to correct her when, fortunately, Joe looks her in the eye and says, "When once you proceed down the dark path, forever will it dominate your destiny."   In the end, an iteration acceptable to Jane is reached. It's not what Jane wanted. Indeed, it is significantly less. But it's something the team feels that can be achieved in the next 3 weeks.   And, after all, it still addresses the most important things that Jane wanted in the iteration. "So, Jane," you say when things had quieted down a bit, "when can we expect acceptance tests from you?" Jane sighs. This is the other side of the coin. For every story the development team implements,   Jane must supply a suite of acceptance tests that prove that it works. And the team needs these long before the end of the iteration, since they will certainly point out differences in the way Jane and the developers imagine the system's behaviour.   "I'll get you some example test scripts today," Jane promises. "I'll add to them every day after that. You'll have the entire suite by the middle of the iteration."   * * *   The iteration begins on Monday morning with a flurry of Class, Responsibilities, Collaborators sessions. By midmorning, all the developers have assembled into pairs and are rapidly coding away. "And now, my young apprentice," Joe says to Elaine, "you shall learn the mysteries of test-first design!"   "Wow, that sounds pretty rad," Elaine replies. "How do you do it?" Joe beams. It's clear that he has been anticipating this moment. "OK, what does the code do right now?" "Huh?" replied Elaine, "It doesn't do anything at all; there is no code."   "So, consider our task; can you think of something the code should do?" "Sure," Elaine said with youthful assurance, "First, it should connect to the database." "And thereupon, what must needs be required to connecteth the database?" "You sure talk weird," laughed Elaine. "I think we'd have to get the database object from some registry and call the Connect() method. "Ah, astute young wizard. Thou perceives correctly that we requireth an object within which we can cacheth the database object." "Is 'cacheth' really a word?" "It is when I say it! So, what test can we write that we know the database registry should pass?" Elaine sighs. She knows she'll just have to play along. "We should be able to create a database object and pass it to the registry in a Store() method. And then we should be able to pull it out of the registry with a Get() method and make sure it's the same object." "Oh, well said, my prepubescent sprite!" "Hay!" "So, now, let's write a test function that proves your case." "But shouldn't we write the database object and registry object first?" "Ah, you've much to learn, my young impatient one. Just write the test first." "But it won't even compile!" "Are you sure? What if it did?" "Uh . . ." "Just write the test, Elaine. Trust me." And so Joe, Elaine, and all the other developers began to code their tasks, one test case at a time. The room in which they worked was abuzz with the conversations between the pairs. The murmur was punctuated by an occasional high five when a pair managed to finish a task or a difficult test case.   As development proceeded, the developers changed partners once or twice a day. Each developer got to see what all the others were doing, and so knowledge of the code spread generally throughout the team.   Whenever a pair finished something significant whether a whole task or simply an important part of a task they integrated what they had with the rest of the system. Thus, the code base grew daily, and integration difficulties were minimized.   The developers communicated with Jane on a daily basis. They'd go to her whenever they had a question about the functionality of the system or the interpretation of an acceptance test case.   Jane, good as her word, supplied the team with a steady stream of acceptance test scripts. The team read these carefully and thereby gained a much better understanding of what Jane expected the system to do. By the beginning of the second week, there was enough functionality to demonstrate to Jane. She watched eagerly as the demonstration passed test case after test case. "This is really cool," Jane said as the demonstration finally ended. "But this doesn't seem like one-third of the tasks. Is your velocity slower than anticipated?"   You grimace. You'd been waiting for a good time to mention this to Jane but now she was forcing the issue. "Yes, unfortunately, we are going more slowly than we had expected. The new application server we are using is turning out to be a pain to configure. Also, it takes forever to reboot, and we have to reboot it whenever we make even the slightest change to its configuration."   Jane eyes you with suspicion. The stress of last Monday's negotiations had still not entirely dissipated. She says, "And what does this mean to our schedule? We can't slip it again, we just can't. Russ will have a fit! He'll haul us all into the woodshed and ream us some new ones."   You look Jane right in the eyes. There's no pleasant way to give someone news like this. So you just blurt out, "Look, if things keep going like they're going, we're not going to be done with everything by next Friday. Now it's possible that we'll figure out a way to go faster. But, frankly, I wouldn't depend on that. You should start thinking about one or two tasks that could be removed from the iteration without ruining the demonstration for Russ. Come hell or high water, we are going to give that demonstration on Friday, and I don't think you want us to choose which tasks to omit."   "Aw forchrisakes!" Jane barely manages to stifle yelling that last word as she stalks away, shaking her head. Not for the first time, you say to yourself, "Nobody ever promised me project management would be easy." You are pretty sure it won't be the last time, either.   Actually, things went a bit better than you had hoped. The team did, in fact, have to drop one task from the iteration, but Jane had chosen wisely, and the demonstration for Russ went without a hitch. Russ was not impressed with the progress, but neither was he dismayed. He simply said, "This is pretty good. But remember, we have to be able to demonstrate this system at the trade show in July, and at this rate, it doesn't look like you'll have all that much to show." Jane, whose attitude had improved dramatically with the completion of the iteration, responded to Russ by saying, "Russ, this team is working hard, and well. When July comes around, I am confident that we'll have something significant to demonstrate. It won't be everything, and some of it may be smoke and mirrors, but we'll have something."   Painful though the last iteration was, it had calibrated your velocity numbers. The next iteration went much better. Not because your team got more done than in the last iteration but simply because the team didn't have to remove any tasks or stories in the middle of the iteration.   By the start of the fourth iteration, a natural rhythm has been established. Jane, you, and the team know exactly what to expect from one another. The team is running hard, but the pace is sustainable. You are confident that the team can keep up this pace for a year or more.   The number of surprises in the schedule diminishes to near zero; however, the number of surprises in the requirements does not. Jane and Russ frequently look over the growing system and make recommendations or changes to the existing functionality. But all parties realize that these changes take time and must be scheduled. So the changes do not cause anyone's expectations to be violated. In March, there is a major demonstration of the system to the board of directors. The system is very limited and is not yet in a form good enough to take to the trade show, but progress is steady, and the board is reasonably impressed.   The second release goes even more smoothly than the first. By now, the team has figured out a way to automate Jane's acceptance test scripts. The team has also refactored the design of the system to the point that it is really easy to add new features and change old ones. The second release was done by the end of June and was taken to the trade show. It had less in it than Jane and Russ would have liked, but it did demonstrate the most important features of the system. Although customers at the trade show noticed that certain features were missing, they were very impressed overall. You, Russ, and Jane all returned from the trade show with smiles on your faces. You all felt as though this project was a winner.   Indeed, many months later, you are contacted by Rufus Inc. That company had been working on a system like this for its internal operations. Rufus has canceled the development of that system after a death-march project and is negotiating to license your technology for its environment.   Indeed, things are looking up!

    Read the article

  • Conceal packet loss in PCM stream

    - by ZeroDefect
    I am looking to use 'Packet Loss Concealment' to conceal lost PCM frames in an audio stream. Unfortunately, I cannot find a library that is accessible without all the licensing restrictions and code bloat (...up for some suggestions though). I have located some GPL code written by Steve Underwood for the Asterisk project which implements PLC. There are several limitations; although, as Steve suggests in his code, his algorithm can be applied to different streams with a bit of work. Currently, the code works with 8kHz 16-bit signed mono streams. Variations of the code can be found through a simple search of Google Code Search. My hope is that I can adapt the code to work with other streams. Initially, the goal is to adjust the algorithm for 8+ kHz, 16-bit signed, multichannel audio (all in a C++ environment). Eventually, I'm looking to make the code available under the GPL license in hopes that it could be of benefit to others... Attached is the code below with my efforts. The code includes a main function that will "drop" a number of frames with a given probability. Unfortunately, the code does not quite work as expected. I'm receiving EXC_BAD_ACCESS when running in gdb, but I don't get a trace from gdb when using 'bt' command. Clearly, I'm trampimg on memory some where but not sure exactly where. When I comment out the *amdf_pitch* function, the code runs without crashing... int main (int argc, char *argv[]) { std::ifstream fin("C:\\cc32kHz.pcm"); if(!fin.is_open()) { std::cout << "Failed to open input file" << std::endl; return 1; } std::ofstream fout_repaired("C:\\cc32kHz_repaired.pcm"); if(!fout_repaired.is_open()) { std::cout << "Failed to open output repaired file" << std::endl; return 1; } std::ofstream fout_lossy("C:\\cc32kHz_lossy.pcm"); if(!fout_lossy.is_open()) { std::cout << "Failed to open output repaired file" << std::endl; return 1; } audio::PcmConcealer Concealer; Concealer.Init(1, 16, 32000); //Generate random numbers; srand( time(NULL) ); int value = 0; int probability = 5; while(!fin.eof()) { char arr[2]; fin.read(arr, 2); //Generate's random number; value = rand() % 100 + 1; if(value <= probability) { char blank[2] = {0x00, 0x00}; fout_lossy.write(blank, 2); //Fill in data; Concealer.Fill((int16_t *)blank, 1); fout_repaired.write(blank, 2); } else { //Write data to file; fout_repaired.write(arr, 2); fout_lossy.write(arr, 2); Concealer.Receive((int16_t *)arr, 1); } } fin.close(); fout_repaired.close(); fout_lossy.close(); return 0; } PcmConcealer.hpp /* * Code adapted from Steve Underwood of the Asterisk Project. This code inherits * the same licensing restrictions as the Asterisk Project. */ #ifndef __PCMCONCEALER_HPP__ #define __PCMCONCEALER_HPP__ /** 1. What does it do? The packet loss concealment module provides a suitable synthetic fill-in signal, to minimise the audible effect of lost packets in VoIP applications. It is not tied to any particular codec, and could be used with almost any codec which does not specify its own procedure for packet loss concealment. Where a codec specific concealment procedure exists, the algorithm is usually built around knowledge of the characteristics of the particular codec. It will, therefore, generally give better results for that particular codec than this generic concealer will. 2. How does it work? While good packets are being received, the plc_rx() routine keeps a record of the trailing section of the known speech signal. If a packet is missed, plc_fillin() is called to produce a synthetic replacement for the real speech signal. The average mean difference function (AMDF) is applied to the last known good signal, to determine its effective pitch. Based on this, the last pitch period of signal is saved. Essentially, this cycle of speech will be repeated over and over until the real speech resumes. However, several refinements are needed to obtain smooth pleasant sounding results. - The two ends of the stored cycle of speech will not always fit together smoothly. This can cause roughness, or even clicks, at the joins between cycles. To soften this, the 1/4 pitch period of real speech preceeding the cycle to be repeated is blended with the last 1/4 pitch period of the cycle to be repeated, using an overlap-add (OLA) technique (i.e. in total, the last 5/4 pitch periods of real speech are used). - The start of the synthetic speech will not always fit together smoothly with the tail of real speech passed on before the erasure was identified. Ideally, we would like to modify the last 1/4 pitch period of the real speech, to blend it into the synthetic speech. However, it is too late for that. We could have delayed the real speech a little, but that would require more buffer manipulation, and hurt the efficiency of the no-lost-packets case (which we hope is the dominant case). Instead we use a degenerate form of OLA to modify the start of the synthetic data. The last 1/4 pitch period of real speech is time reversed, and OLA is used to blend it with the first 1/4 pitch period of synthetic speech. The result seems quite acceptable. - As we progress into the erasure, the chances of the synthetic signal being anything like correct steadily fall. Therefore, the volume of the synthesized signal is made to decay linearly, such that after 50ms of missing audio it is reduced to silence. - When real speech resumes, an extra 1/4 pitch period of sythetic speech is blended with the start of the real speech. If the erasure is small, this smoothes the transition. If the erasure is long, and the synthetic signal has faded to zero, the blending softens the start up of the real signal, avoiding a kind of "click" or "pop" effect that might occur with a sudden onset. 3. How do I use it? Before audio is processed, call plc_init() to create an instance of the packet loss concealer. For each received audio packet that is acceptable (i.e. not including those being dropped for being too late) call plc_rx() to record the content of the packet. Note this may modify the packet a little after a period of packet loss, to blend real synthetic data smoothly. When a real packet is not available in time, call plc_fillin() to create a sythetic substitute. That's it! */ /*! Minimum allowed pitch (66 Hz) */ #define PLC_PITCH_MIN(SAMPLE_RATE) ((double)(SAMPLE_RATE) / 66.6) /*! Maximum allowed pitch (200 Hz) */ #define PLC_PITCH_MAX(SAMPLE_RATE) ((SAMPLE_RATE) / 200) /*! Maximum pitch OLA window */ //#define PLC_PITCH_OVERLAP_MAX(SAMPLE_RATE) ((PLC_PITCH_MIN(SAMPLE_RATE)) >> 2) /*! The length over which the AMDF function looks for similarity (20 ms) */ #define CORRELATION_SPAN(SAMPLE_RATE) ((20 * (SAMPLE_RATE)) / 1000) /*! History buffer length. The buffer must also be at leat 1.25 times PLC_PITCH_MIN, but that is much smaller than the buffer needs to be for the pitch assessment. */ //#define PLC_HISTORY_LEN(SAMPLE_RATE) ((CORRELATION_SPAN(SAMPLE_RATE)) + (PLC_PITCH_MIN(SAMPLE_RATE))) namespace audio { typedef struct { /*! Consecutive erased samples */ int missing_samples; /*! Current offset into pitch period */ int pitch_offset; /*! Pitch estimate */ int pitch; /*! Buffer for a cycle of speech */ float *pitchbuf;//[PLC_PITCH_MIN]; /*! History buffer */ short *history;//[PLC_HISTORY_LEN]; /*! Current pointer into the history buffer */ int buf_ptr; } plc_state_t; class PcmConcealer { public: PcmConcealer(); ~PcmConcealer(); void Init(int channels, int bit_depth, int sample_rate); //Process a block of received audio samples. int Receive(short amp[], int frames); //Fill-in a block of missing audio samples. int Fill(short amp[], int frames); void Destroy(); private: int amdf_pitch(int min_pitch, int max_pitch, short amp[], int channel_index, int frames); void save_history(plc_state_t *s, short *buf, int channel_index, int frames); void normalise_history(plc_state_t *s); /** Holds the states of each of the channels **/ std::vector< plc_state_t * > ChannelStates; int plc_pitch_min; int plc_pitch_max; int plc_pitch_overlap_max; int correlation_span; int plc_history_len; int channel_count; int sample_rate; bool Initialized; }; } #endif PcmConcealer.cpp /* * Code adapted from Steve Underwood of the Asterisk Project. This code inherits * the same licensing restrictions as the Asterisk Project. */ #include "audio/PcmConcealer.hpp" /* We do a straight line fade to zero volume in 50ms when we are filling in for missing data. */ #define ATTENUATION_INCREMENT 0.0025 /* Attenuation per sample */ #if !defined(INT16_MAX) #define INT16_MAX (32767) #define INT16_MIN (-32767-1) #endif #ifdef WIN32 inline double rint(double x) { return floor(x + 0.5); } #endif inline short fsaturate(double damp) { if (damp > 32767.0) return INT16_MAX; if (damp < -32768.0) return INT16_MIN; return (short)rint(damp); } namespace audio { PcmConcealer::PcmConcealer() : Initialized(false) { } PcmConcealer::~PcmConcealer() { Destroy(); } void PcmConcealer::Init(int channels, int bit_depth, int sample_rate) { if(Initialized) return; if(channels <= 0 || bit_depth != 16) return; Initialized = true; channel_count = channels; this->sample_rate = sample_rate; ////////////// double min = PLC_PITCH_MIN(sample_rate); int imin = (int)min; double max = PLC_PITCH_MAX(sample_rate); int imax = (int)max; plc_pitch_min = imin; plc_pitch_max = imax; plc_pitch_overlap_max = (plc_pitch_min >> 2); correlation_span = CORRELATION_SPAN(sample_rate); plc_history_len = correlation_span + plc_pitch_min; ////////////// for(int i = 0; i < channel_count; i ++) { plc_state_t *t = new plc_state_t; memset(t, 0, sizeof(plc_state_t)); t->pitchbuf = new float[plc_pitch_min]; t->history = new short[plc_history_len]; ChannelStates.push_back(t); } } void PcmConcealer::Destroy() { if(!Initialized) return; while(ChannelStates.size()) { plc_state_t *s = ChannelStates.at(0); if(s) { if(s->history) delete s->history; if(s->pitchbuf) delete s->pitchbuf; memset(s, 0, sizeof(plc_state_t)); delete s; } ChannelStates.erase(ChannelStates.begin()); } ChannelStates.clear(); Initialized = false; } //Process a block of received audio samples. int PcmConcealer::Receive(short amp[], int frames) { if(!Initialized) return 0; int j = 0; for(int k = 0; k < ChannelStates.size(); k++) { int i; int overlap_len; int pitch_overlap; float old_step; float new_step; float old_weight; float new_weight; float gain; plc_state_t *s = ChannelStates.at(k); if (s->missing_samples) { /* Although we have a real signal, we need to smooth it to fit well with the synthetic signal we used for the previous block */ /* The start of the real data is overlapped with the next 1/4 cycle of the synthetic data. */ pitch_overlap = s->pitch >> 2; if (pitch_overlap > frames) pitch_overlap = frames; gain = 1.0 - s->missing_samples * ATTENUATION_INCREMENT; if (gain < 0.0) gain = 0.0; new_step = 1.0/pitch_overlap; old_step = new_step*gain; new_weight = new_step; old_weight = (1.0 - new_step)*gain; for (i = 0; i < pitch_overlap; i++) { int index = (i * channel_count) + j; amp[index] = fsaturate(old_weight * s->pitchbuf[s->pitch_offset] + new_weight * amp[index]); if (++s->pitch_offset >= s->pitch) s->pitch_offset = 0; new_weight += new_step; old_weight -= old_step; if (old_weight < 0.0) old_weight = 0.0; } s->missing_samples = 0; } save_history(s, amp, j, frames); j++; } return frames; } //Fill-in a block of missing audio samples. int PcmConcealer::Fill(short amp[], int frames) { if(!Initialized) return 0; int j =0; for(int k = 0; k < ChannelStates.size(); k++) { short *tmp = new short[plc_pitch_overlap_max]; int i; int pitch_overlap; float old_step; float new_step; float old_weight; float new_weight; float gain; short *orig_amp; int orig_len; orig_amp = amp; orig_len = frames; plc_state_t *s = ChannelStates.at(k); if (s->missing_samples == 0) { // As the gap in real speech starts we need to assess the last known pitch, //and prepare the synthetic data we will use for fill-in normalise_history(s); s->pitch = amdf_pitch(plc_pitch_min, plc_pitch_max, s->history + plc_history_len - correlation_span - plc_pitch_min, j, correlation_span); // We overlap a 1/4 wavelength pitch_overlap = s->pitch >> 2; // Cook up a single cycle of pitch, using a single of the real signal with 1/4 //cycle OLA'ed to make the ends join up nicely // The first 3/4 of the cycle is a simple copy for (i = 0; i < s->pitch - pitch_overlap; i++) s->pitchbuf[i] = s->history[plc_history_len - s->pitch + i]; // The last 1/4 of the cycle is overlapped with the end of the previous cycle new_step = 1.0/pitch_overlap; new_weight = new_step; for ( ; i < s->pitch; i++) { s->pitchbuf[i] = s->history[plc_history_len - s->pitch + i]*(1.0 - new_weight) + s->history[plc_history_len - 2*s->pitch + i]*new_weight; new_weight += new_step; } // We should now be ready to fill in the gap with repeated, decaying cycles // of what is in pitchbuf // We need to OLA the first 1/4 wavelength of the synthetic data, to smooth // it into the previous real data. To avoid the need to introduce a delay // in the stream, reverse the last 1/4 wavelength, and OLA with that. gain = 1.0; new_step = 1.0/pitch_overlap; old_step = new_step; new_weight = new_step; old_weight = 1.0 - new_step; for (i = 0; i < pitch_overlap; i++) { int index = (i * channel_count) + j; amp[index] = fsaturate(old_weight * s->history[plc_history_len - 1 - i] + new_weight * s->pitchbuf[i]); new_weight += new_step; old_weight -= old_step; if (old_weight < 0.0) old_weight = 0.0; } s->pitch_offset = i; } else { gain = 1.0 - s->missing_samples*ATTENUATION_INCREMENT; i = 0; } for ( ; gain > 0.0 && i < frames; i++) { int index = (i * channel_count) + j; amp[index] = s->pitchbuf[s->pitch_offset]*gain; gain -= ATTENUATION_INCREMENT; if (++s->pitch_offset >= s->pitch) s->pitch_offset = 0; } for ( ; i < frames; i++) { int index = (i * channel_count) + j; amp[i] = 0; } s->missing_samples += orig_len; save_history(s, amp, j, frames); delete [] tmp; j++; } return frames; } void PcmConcealer::save_history(plc_state_t *s, short *buf, int channel_index, int frames) { if (frames >= plc_history_len) { /* Just keep the last part of the new data, starting at the beginning of the buffer */ //memcpy(s->history, buf + len - plc_history_len, sizeof(short)*plc_history_len); int frames_to_copy = plc_history_len; for(int i = 0; i < frames_to_copy; i ++) { int index = (channel_count * (i + frames - plc_history_len)) + channel_index; s->history[i] = buf[index]; } s->buf_ptr = 0; return; } if (s->buf_ptr + frames > plc_history_len) { /* Wraps around - must break into two sections */ //memcpy(s->history + s->buf_ptr, buf, sizeof(short)*(plc_history_len - s->buf_ptr)); short *hist_ptr = s->history + s->buf_ptr; int frames_to_copy = plc_history_len - s->buf_ptr; for(int i = 0; i < frames_to_copy; i ++) { int index = (channel_count * i) + channel_index; hist_ptr[i] = buf[index]; } frames -= (plc_history_len - s->buf_ptr); //memcpy(s->history, buf + (plc_history_len - s->buf_ptr), sizeof(short)*len); frames_to_copy = frames; for(int i = 0; i < frames_to_copy; i ++) { int index = (channel_count * (i + (plc_history_len - s->buf_ptr))) + channel_index; s->history[i] = buf[index]; } s->buf_ptr = frames; return; } /* Can use just one section */ //memcpy(s->history + s->buf_ptr, buf, sizeof(short)*len); short *hist_ptr = s->history + s->buf_ptr; int frames_to_copy = frames; for(int i = 0; i < frames_to_copy; i ++) { int index = (channel_count * i) + channel_index; hist_ptr[i] = buf[index]; } s->buf_ptr += frames; } void PcmConcealer::normalise_history(plc_state_t *s) { short *tmp = new short[plc_history_len]; if (s->buf_ptr == 0) return; memcpy(tmp, s->history, sizeof(short)*s->buf_ptr); memcpy(s->history, s->history + s->buf_ptr, sizeof(short)*(plc_history_len - s->buf_ptr)); memcpy(s->history + plc_history_len - s->buf_ptr, tmp, sizeof(short)*s->buf_ptr); s->buf_ptr = 0; delete [] tmp; } int PcmConcealer::amdf_pitch(int min_pitch, int max_pitch, short amp[], int channel_index, int frames) { int i; int j; int acc; int min_acc; int pitch; pitch = min_pitch; min_acc = INT_MAX; for (i = max_pitch; i <= min_pitch; i++) { acc = 0; for (j = 0; j < frames; j++) { int index1 = (channel_count * (i+j)) + channel_index; int index2 = (channel_count * j) + channel_index; //std::cout << "Index 1: " << index1 << ", Index 2: " << index2 << std::endl; acc += abs(amp[index1] - amp[index2]); } if (acc < min_acc) { min_acc = acc; pitch = i; } } std::cout << "Pitch: " << pitch << std::endl; return pitch; } } P.S. - I must confess that digital audio is not my forte...

    Read the article

1