Search Results

Search found 145 results on 6 pages for 'analogy'.

Page 6/6 | < Previous Page | 2 3 4 5 6

How can I implement a site with ASP.NET MVC without using Visual Studio?

- by Cheeso

I have seen ASP.NET MVC Without Visual Studio, which asks, Is it possible to produce a website based on ASP.NET MVC, without using Visual Studio? And the accepted answer is, yes. Ok, next question: how? Here's an analogy. If I want to create an ASP.NET Webforms page, I load up my favorite text editor, create a file named Something.aspx. Then I insert into that file, some boilerplate: <%@ Page Language="C#" Debug="true" Trace="false" Src="Sourcefile.cs" Inherits="My.Namespace.ContentsPage" %> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <title>Title goes here </title> <link rel="stylesheet" type="text/css" href="css/style.css"></link> <style type="text/css"> #elementid { font-size: 9pt; color: Navy; ... more css ... } </style> <script type="text/javascript" language='javascript'> // insert javascript here. </script> </head> <body> <asp:Literal Id='Holder' runat='server'/> <br/> <div id='msgs'></div> </body> </html> Then I also create the Sourcefile.cs file: namespace My.Namespace { using System; using System.Web; using System.Xml; // etc... public class ContentsPage : System.Web.UI.Page { protected System.Web.UI.WebControls.Literal Holder; void Page_Load(Object sender, EventArgs e) { // page load logic here } } } And that is a working ASPNET page, created in a text editor. Drop it into an IIS virtual directory, and it's working. What do I have to do, to make a basic, hello, World ASPNET MVC app, in a text editor? (without Visual Studio) Suppose I want a basic MVC app with a controller, one view, and a simple model. What files would I need to create, and what would go into them?

Read the article
How can I implement ASP.NET MVC without using Visual Studio?

- by Cheeso

I have seen ASP.NET MVC Without Visual Studio, which asks, Is it possible to produce a website based on ASP.NET MVC, without using Visual Studio? And the accepted answer is, yes. Ok, next question: how? Here's an analogy. If I want to create an ASP.NET Webforms page, I load up my favorite text editor, create a file named Something.aspx. Then I insert into that file, some boilerplate: <%@ Page Language="C#" Debug="true" Trace="false" Src="Sourcefile.cs" Inherits="My.Namespace.ContentsPage" %> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"> <head> <title>Title goes here </title> <link rel="stylesheet" type="text/css" href="css/style.css"></link> <style type="text/css"> #elementid { font-size: 9pt; color: Navy; ... more css ... } </style> <script type="text/javascript" language='javascript'> // insert javascript here. </script> </head> <body> <asp:Literal Id='Holder' runat='server'/> <br/> <div id='msgs'></div> </body> </html> Then I also create the Sourcefile.cs file: namespace My.Namespace { using System; using System.Web; using System.Xml; // etc... public class ContentsPage : System.Web.UI.Page { protected System.Web.UI.WebControls.Literal Holder; void Page_Load(Object sender, EventArgs e) { // page load logic here } } } And that is a working ASPNET page, created in a text editor. Drop it into an IIS virtual directory, and it's working. What do I have to do, to make a basic, hello, World ASPNET MVC app, in a text editor? (without Visual Studio)

Read the article
How do I do distributed UML development (à la FOSS)?

- by James A. Rosen

I have a UML project (built in IBM's Rational System Architect/Modeler, so stored in their XML format) that has grown quite large. Additionally, it now contains several pieces that other groups would like to re-use. I come from a software development (especially FOSS) background, and am trying to understand how to use that as an analogy here. The problem I am grappling with is similar to the Fragile Base Class problem. Let me start with how it works in an object-oriented (say, Java or Ruby) FOSS ecosystem: Group 1 publishes some "core" package, say "net/smtp version 1.0" Group 2 includes Group 1's net/smtp 1.0 package in the vendor library of their software project At some point, Group 1 creates a new 2.0 branch of net/smtp that breaks backwards compatibility (say, it removes an old class or method, or moves a class from one package to another). They tell users of the 1.0 version that it will be deprecated in one year. Group 2, when they have the time, updates to net/smtp 2.0. When they drop in the new package, their compiler (or test suite, for Ruby) tells them about the incompatibility. They do have to make some manual changes, but all of the changes are in the code, in plain text, a medium with which they are quite familiar. Plus, they can often use their IDE's (or text editor's) "global-search-and-replace" function once they figure out what the fixes are. When we try to apply this model to UML in RSA, we run into some problems. RSA supports some fairly powerful refactorings, but they seem to only work if you have write access to all of the pieces. If I rename a class in one package, RSA can rename the references, but only at the same time. It's very difficult to look at the underlying source (the XML) and figure out what's broken. To fix such a problem in the RSA editor itself means tons of clicking on things -- there is no good equivalent of "global-search-and-replace," at least not after an incomplete refactor. They real sticking point seems to be that RSA assumes that you want to do all your editing using their GUI, but that makes certain operations prohibitively difficult. Does anyone have examples of open-source UML projects that have overcome this problem? What strategies do they use for communicating changes?

Read the article
Is a many-to-many relationship with extra fields the right tool for my job?

- by whichhand

Previously had a go at asking a more specific version of this question, but had trouble articulating what my question was. On reflection that made me doubt if my chosen solution was correct for the problem, so this time I will explain the problem and ask if a) I am on the right track and b) if there is a way around my current brick wall. I am currently building a web interface to enable an existing database to be interrogated by (a small number of) users. Sticking with the analogy from the docs, I have models that look something like this: class Musician(models.Model): first_name = models.CharField(max_length=50) last_name = models.CharField(max_length=50) dob = models.DateField() class Album(models.Model): artist = models.ForeignKey(Musician) name = models.CharField(max_length=100) class Instrument(models.Model): artist = models.ForeignKey(Musician) name = models.CharField(max_length=100) Where I have one central table (Musician) and several tables of associated data that are related by either ForeignKey or OneToOneFields. Users interact with the database by creating filtering criteria to select a subset of Musicians based on data the data on the main or related tables. Likewise, the users can then select what piece of data is used to rank results that are presented to them. The results are then viewed initially as a 2 dimensional table with a single row per Musician with selected data fields (or aggregates) in each column. To give you some idea of scale, the database has ~5,000 Musicians with around 20 fields of related data. Up to here is fine and I have a working implementation. However, it is important that I have the ability for a given user to upload there own annotation data sets (more than one) and then filter and order on these in the same way they can with the existing data. The way I had tried to do this was to add the models: class UserDataSets(models.Model): user = models.ForeignKey(User) name = models.CharField(max_length=100) description = models.CharField(max_length=64) results = models.ManyToManyField(Musician, through='UserData') class UserData(models.Model): artist = models.ForeignKey(Musician) dataset = models.ForeignKey(UserDataSets) score = models.IntegerField() class Meta: unique_together = (("artist", "dataset"),) I have a simple upload mechanism enabling users to upload a data set file that consists of 1 to 1 relationship between a Musician and their "score". Within a given user dataset each artist will be unique, but different datasets are independent from each other and will often contain entries for the same musician. This worked fine for displaying the data, starting from a given artist I can do something like this: artist = Musician.objects.get(pk=1) dataset = UserDataSets.objects.get(pk=5) print artist.userdata_set.get(dataset=dataset.pk) However, this approach fell over when I came to implement the filtering and ordering of query set of musicians based on the data contained in a single user data set. For example, I could easily order the query set based on all of the data in the UserData table like this: artists = Musician.objects.all().order_by(userdata__score) But that does not help me order by the results of a given single user dataset. Likewise I need to be able to filter the query set based on the "scores" from different user data sets (eg find all musicians with a score 5 in dataset1 and < 2 in dataset2). Is there a way of doing this, or am I going about the whole thing wrong?

Read the article
Agile Awakenings and the Rules of Agile

- by Robert May

For those that care, you can read my history of management and technology to understand why I think I’m qualified to talk about this at all. It’s boring, so feel free to skip it. Awakenings I first started to play around with the idea of “agile” in 2004 or 2005. I found a book on the Rational Unified Process that I thought was good, and attempted to implement parts of it. I thought I was agile, but really, it wasn’t. I still didn’t understand the concept of a team. I still wanted to tell the team what to do and how to get it done. I still thought I was smarter than the team. After that job, I started work on another project and began helping that team. The first few months were really rough. We were implementing Scrum, which was relatively new to everyone on the team, and, quite frankly, I was doing a poor job of it. I was trying to micro-manage every aspect of the teams work, and we were all miserable. The moment of change came when the senior architect bailed on the project. His comment to me was: “This isn’t Agile. Where are the stand-ups? Where are the stories?” He was dead on, and I finally woke up. I finally realized that I was the problem! I wasn’t trusting the team. I wasn’t helping the team. I was being a manager. Like many (most?), I was claiming to be Agile and use Scrum, but I wasn’t in fact following the rules Scrum. Since then, I’ve done a lot of studying, hands on practice, coaching of many different teams, and other learning around Scrum, and I have discovered that Scrum has some rules that must be followed for success, even though the process is about continuous improvement. I’ve been practicing Scrum right for about 4 years now and have helped multiple teams implement it successfully, so what you’re about to get is based on experience, rather than just theory. The Rules of Scrum In my experience, what I’ve found is that most companies that claim to be doing Scrum or Agile are actually NOT doing either. This stems largely because they think that they can “adopt the rules of Agile that fit their organization.” Sadly, many of them think that this means they can adopt iterations (sprints) and not much else. Either that, or they think they can do whatever they want, or were doing before, and call it Scrum. This is simply not true. Here are some rules that must be followed for you to really be doing Scrum. I’ll go into detail on each one of these posts in future blog posts and update links here. My intent is that this will help other teams implementing scrum to see more success. Agile does not allow you to do whatever you want A Product Owner is required A ScrumMaster is required The team must function as a Team, and QA must be part of the team Support from upper management is required A prioritized product backlog is required A prioritized sprint backlog is required Release planning is required Complete spring planning is required Showcases are required Velocity must be measured Retrospectives are required Daily stand-ups are required Visibility is absolutely required For now, I think that’s enough, although I reserve the right to add more. If you’re breaking any of these rules, you’re probably not doing Scrum. There are exceptions to these rules, but until you have practiced Scrum for a while, you don’t know what those exceptions are. Breaking the Rules Many teams break these rules because they are the ones that expose the most pain. Scrum is not Advil. It’s not intended to mask the pain, its intended to cure it. Let me explain that analogy a bit more. Recently, my 7 year old son broke his arm, quite severely (see the X-Ray to the right). That caused him a great deal of pain. We went first to one doctor, and after viewing the X-Ray, they determined that there was no way that they’d cast the arm at their location. It was simply too bad of a break for them to deal with. They did, however, give him some Advil for the pain and put a splint on his arm to stabilize the broken bones. Within minutes, he was feeling much better. Had we been stupid, we could have gone home and he’d have been just as happy as ever . . . until the pain medication wore off or one of his siblings touched the splint. Then, all of that pain would come right back to the top. Sure, he could make it go away by just taking more Advil and moving the splint out of the way, but that wasn’t going to fix the problem permanently. We ended up in an emergency room with a doctor who could fix his arm. However, we were warned that the fix was going to be VERY painful, and it was. Even with heavy sedation (Propofol), my son was in enough pain that he squirmed and wiggled trying to get his arm away from the doctor. He had to endure this pain in order to have a functional arm. But the setting wasn’t the end. He had to have several casts, had to have it re-broken once, since the first setting didn’t take and finally was given a clean bill of health. Agile implementation is much like this story. Agile was developed as a result of people recognizing that the development methodologies that were currently in place simply were ineffective. However, the fix to the broken development that’s been festering for many years is not painless. Many people start Agile thinking that things will be wonderful. They won’t! Agile is about visibility, and often, it brings great pain to surface. It causes all of the missed deadlines, the cowboy coders, the coasters, the micro-managers, the lazy, and all of the other problems that are really part of your development process now to become painfully visible to EVERYONE. Many people don’t like this exposure. Agile will make the pain better, but not if you remove the cast (the rules above) prematurely and start breaking the rules that expose the most pain. The healing will take time and is not instant (like Advil). Figuring out what the true source of pain and fixing it is very valuable to you, your team, and your company. Remember as you’re doing this that Agile isn’t the source of the pain, it’s really just exposing it. Find the source. My recommendation is that ALL of these rules are followed for a minimum of six months, and preferably for an entire year, before you decide to break any of these rules. Get a few good releases under your belt. Figure out what your velocity is and start firing as a team. Chances are, after you see agile really in action, you won’t want to break the rules because you’ll see their value. More Reading Jean Tabaka recently published a list of 78 Things I Have Learned in 6 Years of Agile Coaching. Highly recommended. Technorati Tags: Agile,Scrum,Rules

Read the article
A Guided Tour of Complexity

- by JoshReuben

I just re-read Complexity – A Guided Tour by Melanie Mitchell , protégé of Douglas Hofstadter ( author of “Gödel, Escher, Bach”) http://www.amazon.com/Complexity-Guided-Tour-Melanie-Mitchell/dp/0199798109/ref=sr_1_1?ie=UTF8&qid=1339744329&sr=8-1 here are some notes and links: Evolved from Cybernetics, General Systems Theory, Synergetics some interesting transdisciplinary fields to investigate: Chaos Theory - http://en.wikipedia.org/wiki/Chaos_theory – small differences in initial conditions (such as those due to rounding errors in numerical computation) yield widely diverging outcomes for chaotic systems, rendering long-term prediction impossible. System Dynamics / Cybernetics - http://en.wikipedia.org/wiki/System_Dynamics – study of how feedback changes system behavior Network Theory - http://en.wikipedia.org/wiki/Network_theory – leverage Graph Theory to analyze symmetric / asymmetric relations between discrete objects Algebraic Topology - http://en.wikipedia.org/wiki/Algebraic_topology – leverage abstract algebra to analyze topological spaces There are limits to deterministic systems & to computation. Chaos Theory definitely applies to training an ANN (artificial neural network) – different weights will emerge depending upon the random selection of the training set. In recursive Non-Linear systems http://en.wikipedia.org/wiki/Nonlinear_system – output is not directly inferable from input. E.g. a Logistic map: Xt+1 = R Xt(1-Xt) Different types of bifurcations, attractor states and oscillations may occur – e.g. a Lorenz Attractor http://en.wikipedia.org/wiki/Lorenz_system Feigenbaum Constants http://en.wikipedia.org/wiki/Feigenbaum_constants express ratios in a bifurcation diagram for a non-linear map – the convergent limit of R (the rate of period-doubling bifurcations) is 4.6692016 Maxwell’s Demon - http://en.wikipedia.org/wiki/Maxwell%27s_demon - the Second Law of Thermodynamics has only a statistical certainty – the universe (and thus information) tends towards entropy. While any computation can theoretically be done without expending energy, with finite memory, the act of erasing memory is permanent and increases entropy. Life & thought is a counter-example to the universe’s tendency towards entropy. Leo Szilard and later Claude Shannon came up with the Information Theory of Entropy - http://en.wikipedia.org/wiki/Entropy_(information_theory) whereby Shannon entropy quantifies the expected value of a message’s information in bits in order to determine channel capacity and leverage Coding Theory (compression analysis). Ludwig Boltzmann came up with Statistical Mechanics - http://en.wikipedia.org/wiki/Statistical_mechanics – whereby our Newtonian perception of continuous reality is a probabilistic and statistical aggregate of many discrete quantum microstates. This is relevant for Quantum Information Theory http://en.wikipedia.org/wiki/Quantum_information and the Physics of Information - http://en.wikipedia.org/wiki/Physical_information. Hilbert’s Problems http://en.wikipedia.org/wiki/Hilbert's_problems pondered whether mathematics is complete, consistent, and decidable (the Decision Problem – http://en.wikipedia.org/wiki/Entscheidungsproblem – is there always an algorithm that can determine whether a statement is true). Godel’s Incompleteness Theorems http://en.wikipedia.org/wiki/G%C3%B6del's_incompleteness_theorems proved that mathematics cannot be both complete and consistent (e.g. “This statement is not provable”). Turing through the use of Turing Machines (http://en.wikipedia.org/wiki/Turing_machine symbol processors that can prove mathematical statements) and Universal Turing Machines (http://en.wikipedia.org/wiki/Universal_Turing_machine Turing Machines that can emulate other any Turing Machine via accepting programs as well as data as input symbols) that computation is limited by demonstrating the Halting Problem http://en.wikipedia.org/wiki/Halting_problem (is is not possible to know when a program will complete – you cannot build an infinite loop detector). You may be used to thinking of 1 / 2 / 3 dimensional systems, but Fractal http://en.wikipedia.org/wiki/Fractal systems are defined by self-similarity & have non-integer Hausdorff Dimensions !!! http://en.wikipedia.org/wiki/List_of_fractals_by_Hausdorff_dimension – the fractal dimension quantifies the number of copies of a self similar object at each level of detail – eg Koch Snowflake - http://en.wikipedia.org/wiki/Koch_snowflake Definitions of complexity: size, Shannon entropy, Algorithmic Information Content (http://en.wikipedia.org/wiki/Algorithmic_information_theory - size of shortest program that can generate a description of an object) Logical depth (amount of info processed), thermodynamic depth (resources required). Complexity is statistical and fractal. John Von Neumann’s other machine was the Self-Reproducing Automaton http://en.wikipedia.org/wiki/Self-replicating_machine . Cellular Automata http://en.wikipedia.org/wiki/Cellular_automaton are alternative form of Universal Turing machine to traditional Von Neumann machines where grid cells are locally synchronized with their neighbors according to a rule. Conway’s Game of Life http://en.wikipedia.org/wiki/Conway's_Game_of_Life demonstrates various emergent constructs such as “Glider Guns” and “Spaceships”. Cellular Automatons are not practical because logical ops require a large number of cells – wasteful & inefficient. There are no compilers or general program languages available for Cellular Automatons (as far as I am aware). Random Boolean Networks http://en.wikipedia.org/wiki/Boolean_network are extensions of cellular automata where nodes are connected at random (not to spatial neighbors) and each node has its own rule –> they demonstrate the emergence of complex & self organized behavior. Stephen Wolfram’s (creator of Mathematica, so give him the benefit of the doubt) New Kind of Science http://en.wikipedia.org/wiki/A_New_Kind_of_Science proposes the universe may be a discrete Finite State Automata http://en.wikipedia.org/wiki/Finite-state_machine whereby reality emerges from simple rules. I am 2/3 through this book. It is feasible that the universe is quantum discrete at the plank scale and that it computes itself – Digital Physics: http://en.wikipedia.org/wiki/Digital_physics – a simulated reality? Anyway, all behavior is supposedly derived from simple algorithmic rules & falls into 4 patterns: uniform , nested / cyclical, random (Rule 30 http://en.wikipedia.org/wiki/Rule_30) & mixed (Rule 110 - http://en.wikipedia.org/wiki/Rule_110 localized structures – it is this that is interesting). interaction between colliding propagating signal inputs is then information processing. Wolfram proposes the Principle of Computational Equivalence - http://mathworld.wolfram.com/PrincipleofComputationalEquivalence.html - all processes that are not obviously simple can be viewed as computations of equivalent sophistication. Meaning in information may emerge from analogy & conceptual slippages – see the CopyCat program: http://cognitrn.psych.indiana.edu/rgoldsto/courses/concepts/copycat.pdf Scale Free Networks http://en.wikipedia.org/wiki/Scale-free_network have a distribution governed by a Power Law (http://en.wikipedia.org/wiki/Power_law - much more common than Normal Distribution). They are characterized by hubs (resilience to random deletion of nodes), heterogeneity of degree values, self similarity, & small world structure. They grow via preferential attachment http://en.wikipedia.org/wiki/Preferential_attachment – tipping points triggered by positive feedback loops. 2 theories of cascading system failures in complex systems are Self-Organized Criticality http://en.wikipedia.org/wiki/Self-organized_criticality and Highly Optimized Tolerance http://en.wikipedia.org/wiki/Highly_optimized_tolerance. Computational Mechanics http://en.wikipedia.org/wiki/Computational_mechanics – use of computational methods to study phenomena governed by the principles of mechanics. This book is a great intuition pump, but does not cover the more mathematical subject of Computational Complexity Theory – http://en.wikipedia.org/wiki/Computational_complexity_theory I am currently reading this book on this subject: http://www.amazon.com/Computational-Complexity-Christos-H-Papadimitriou/dp/0201530821/ref=pd_sim_b_1 stay tuned for that review!

Read the article
Building KPIs to monitor your business Its not really about the Technology

When I have discussions with people about Business Intelligence, one of the questions the inevitably come up is about building KPIs and how to accomplish that. From a technical level the concept of a KPI is very simple, almost too simple in that it is like the tip of an iceberg floating above the water. The key to that iceberg is not really the tip, but the mass of the iceberg that is hidden beneath the surface upon which the tip sits. The analogy of the iceberg is not meant to indicate that the foundation of the KPI is overly difficult or complex. The disparity in size in meant to indicate that the larger thing that needs to be defined is not the technical tip, but the underlying business definition of what the KPI means. From a technical perspective the KPI consists of primarily the following items: Actual Value This is the actual value data point that is being measured. An example would be something like the amount of sales. Target Value This is the target goal for the KPI. This is a number that can be measured against Actual Value. An example would be $10,000 in monthly sales. Target Indicator Range This is the definition of ranges that define what type of indicator the user will see comparing the Actual Value to the Target Value. Most often this is defined by stoplight, but can be any indicator that is going to show a status in a quick fashion to the user. Typically this would be something like: Red Light = Actual Value more than 5% below target; Yellow Light = Within 5% of target either direction; Green Light = More than 5% higher than Target Value Status\Trend Indicator This is an optional attribute of a KPI that is typically used to show some kind of trend. The vast majority of these indicators are used to show some type of progress against a previous period. As an example, the status indicator might be used to show how the monthly sales compare to last month. With this type of indicator there needs to be not only a definition of what the ranges are for your status indictor, but then also what value the number needs to be compared against. So now we have an idea of what data points a KPI consists of from a technical perspective lets talk a bit about tools. As you can see technically there is not a whole lot to them and the choice of technology is not as important as the definition of the KPIs, which we will get to in a minute. There are many different types of tools in the Microsoft BI stack that you can use to expose your KPI to the business. These include Performance Point, SharePoint, Excel, and SQL Reporting Services. There are pluses and minuses to each technology and the right technology is based a lot on your goals and how you want to deliver the information to the users. Additionally, there are other non-Microsoft tools that can be used to expose KPI indicators to your business users. Regardless of the technology used as your front end, the heavy lifting of KPI is in the business definition of the values and benchmarks for that KPI. The discussion about KPIs is very dependent on the history of an organization and how much they are exposed to the attributes of a KPI. Often times when discussing KPIs with a business contact who has not been exposed to KPIs the discussion tends to also be a session educating the business user about what a KPI is and what goes into the definition of a KPI. The majority of times the business user has an idea of what their actual values are and they have been tracking those numbers for some time, generally in Excel and all manually. So they will know the amount of sales last month along with sales two years ago in the same month. Where the conversation tends to get stuck is when you start discussing what the target value should be. The actual value is answering the What and How much questions. When you are talking about the Target values you are asking the question Is this number good or bad. Typically, the user will know whether or not the value is good or bad, but most of the time they are not able to quantify what is good or bad. Their response is usually something like I just know. Because they have been watching the sales quantity for years now, they can tell you that a 5% decrease in sales this month might actually be a good thing, maybe because the salespeople are all waiting until next month when the new versions come out. It can sometimes be very hard to break the business people of this habit. One of the fears generally is that the status indicator is not subjective. Thus, in the scenario above, the business user is going to be fearful that their boss, just looking at a negative red indicator, is going to haul them out to the woodshed for a bad month. But, on the flip side, if all you are displaying is the amount of sales, only a person with knowledge of last month sales and the target amount for this month would have any idea if $10,000 in sales is good or not. Here is where a key point about KPIs needs to be communicated to both the business user and any user who might be viewing the results of that KPI. The KPI is just one tool that is used to report on business performance. The KPI is meant as a quick indicator of one business statistic. It is not meant to tell the entire story. It does not answer the question Why. Its primary purpose is to objectively and quickly expose an area of the business that might warrant more review. There is always going to be the need to do further analysis on any potential negative or neutral KPI. So, hopefully, once you have convinced your business user to come up with some target numbers and ranges for status indicators, you then need to take the next step and help them answer the Why question. The main question here to ask is, Okay, you see the indicator and you need to discover why the number is what is, where do you go?. The answer is usually a combination of sources. A sales manager might have some of the following items at their disposal (Marketing report showing a decrease in the promotional discounts for the month, Pricing Report showing the reduction of prices of older models, an Inventory Report showing the discontinuation of a particular product line, or a memo showing the ending of a large affiliate partnership. The answers to the question Why are never as simple as a single indicator value. Bring able to quickly get to this information is all about designing how a user accesses the KPIs and then also how easily they can get to the additional information they need. This is where a Dashboard mentality can come in handy. For example, the business user can have a dashboard that shows their KPIs, but also has links to some of the common reports that they run regarding Sales Data. The users boss may have the same KPIs on their dashboard, but instead of links to individual reports they are going to have a link to a status report that was created by the user that pulls together all the data about the KPI in a summary format the users boss can review. So some of the key things to think about when building or evaluating KPIs for your organization: Technology should not be the driving factor KPIs are of little value without some indicator for whether a value is good, bad or neutral. KPIs only give an answer to the Is this number good\bad? question Make sure the ability to drill into the Why of a KPI is close at hand and relevant to the user who is viewing the KPI. The KPI is a key business tool when defined properly to help monitor business performance across the enterprise in an objective and consistent manner. At times it might feel like the process of defining the business aspects of a KPI can sometimes be arduous, the payoff in the end can far outweigh the costs. Some of the benefits of going through this process are a better understanding of the key metrics for an organization and the measure of those metrics and a consistent snapshot of business performance that can be utilized across the organization. And I think that these are benefits to any organization regardless of the technology or the implementation.Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

Read the article
Master-slave vs. peer-to-peer archictecture: benefits and problems

- by Ashok_Ora

Normal 0 false false false EN-US X-NONE X-NONE Almost two decades ago, I was a member of a database development team that introduced adaptive locking. Locking, the most popular concurrency control technique in database systems, is pessimistic. Locking ensures that two or more conflicting operations on the same data item don’t “trample” on each other’s toes, resulting in data corruption. In a nutshell, here’s the issue we were trying to address. In everyday life, traffic lights serve the same purpose. They ensure that traffic flows smoothly and when everyone follows the rules, there are no accidents at intersections. As I mentioned earlier, the problem with typical locking protocols is that they are pessimistic. Regardless of whether there is another conflicting operation in the system or not, you have to hold a lock! Acquiring and releasing locks can be quite expensive, depending on how many objects the transaction touches. Every transaction has to pay this penalty. To use the earlier traffic light analogy, if you have ever waited at a red light in the middle of nowhere with no one on the road, wondering why you need to wait when there’s clearly no danger of a collision, you know what I mean. The adaptive locking scheme that we invented was able to minimize the number of locks that a transaction held, by detecting whether there were one or more transactions that needed conflicting eyou could get by without holding any lock at all. In many “well-behaved” workloads, there are few conflicts, so this optimization is a huge win. If, on the other hand, there are many concurrent, conflicting requests, the algorithm gracefully degrades to the “normal” behavior with minimal cost. We were able to reduce the number of lock requests per TPC-B transaction from 178 requests down to 2! Wow! This is a dramatic improvement in concurrency as well as transaction latency. The lesson from this exercise was that if you can identify the common scenario and optimize for that case so that only the uncommon scenarios are more expensive, you can make dramatic improvements in performance without sacrificing correctness. So how does this relate to the architecture and design of some of the modern NoSQL systems? NoSQL systems can be broadly classified as master-slave sharded, or peer-to-peer sharded systems. NoSQL systems with a peer-to-peer architecture have an interesting way of handling changes. Whenever an item is changed, the client (or an intermediary) propagates the changes synchronously or asynchronously to multiple copies (for availability) of the data. Since the change can be propagated asynchronously, during some interval in time, it will be the case that some copies have received the update, and others haven’t. What happens if someone tries to read the item during this interval? The client in a peer-to-peer system will fetch the same item from multiple copies and compare them to each other. If they’re all the same, then every copy that was queried has the same (and up-to-date) value of the data item, so all’s good. If not, then the system provides a mechanism to reconcile the discrepancy and to update stale copies. So what’s the problem with this? There are two major issues: First, IT’S HORRIBLY PESSIMISTIC because, in the common case, it is unlikely that the same data item will be updated and read from different locations at around the same time! For every read operation, you have to read from multiple copies. That’s a pretty expensive, especially if the data are stored in multiple geographically separate locations and network latencies are high. Second, if the copies are not all the same, the application has to reconcile the differences and propagate the correct value to the out-dated copies. This means that the application program has to handle discrepancies in the different versions of the data item and resolve the issue (which can further add to cost and operation latency). Resolving discrepancies is only one part of the problem. What if the same data item was updated independently on two different nodes (copies)? In that case, due to the asynchronous nature of change propagation, you might land up with different versions of the data item in different copies. In this case, the application program also has to resolve conflicts and then propagate the correct value to the copies that are out-dated or have incorrect versions. This can get really complicated. My hunch is that there are many peer-to-peer-based applications that don’t handle this correctly, and worse, don’t even know it. Imagine have 100s of millions of records in your database – how can you tell whether a particular data item is incorrect or out of date? And what price are you willing to pay for ensuring that the data can be trusted? Multiple network messages per read request? Discrepancy and conflict resolution logic in the application, and potentially, additional messages? All this overhead, when all you were trying to do was to read a data item. Wouldn’t it be simpler to avoid this problem in the first place? Master-slave architectures like the Oracle NoSQL Database handles this very elegantly. A change to a data item is always sent to the master copy. Consequently, the master copy always has the most current and authoritative version of the data item. The master is also responsible for propagating the change to the other copies (for availability and read scalability). Client drivers are aware of master copies and replicas, and client drivers are also aware of the “currency” of a replica. In other words, each NoSQL Database client knows how stale a replica is. This vastly simplifies the job of the application developer. If the application needs the most current version of the data item, the client driver will automatically route the request to the master copy. If the application is willing to tolerate some staleness of data (e.g. a version that is no more than 1 second out of date), the client can easily determine which replica (or set of replicas) can satisfy the request, and route the request to the most efficient copy. This results in a dramatic simplification in application logic and also minimizes network requests (the driver will only send the request to exactl the right replica, not many). So, back to my original point. A well designed and well architected system minimizes or eliminates unnecessary overhead and avoids pessimistic algorithms wherever possible in order to deliver a highly efficient and high performance system. If you’ve every programmed an Oracle NoSQL Database application, you’ll know the difference! /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;}

Read the article
D2K to OA Framework Transition

- by PRajkumar

What is the difference between D2K form and OA Framework? It is a very innocent but important question for someone that desires to make transition from D2K to OA Framework. I hope you have already read and implemented OA Framework Getting Started. I will re-visit my own experience of implementing HelloWorld program in "OA Framework". When I implemented HelloWorld a year ago, I had no clue as to what I was doing & why I was doing those steps. I merely copied the steps from Oracle Tutorial without understanding them. Hence in this blog, I will try to explain in simple manner the meaning of OA Framework HelloWorld Program and compare the steps to D2K form [where possible]. To keep things simple, only basics will be discussed. Following key Steps were needed for HelloWorld Step 1 Create a new Workspace and a new Project as dictated by Oracle's tutorial. When defining project, you will specify a default package, which in this case was oracle.apps.ak.hello This means the following: - ak is the short name of the Application in Oracle [means fnd_applications.short_name] hello is the name of your project Step 2 Next, you will create a OA Page within hello project Think OA Page as the fmx file itself in D2K. I am saying so because this page gets attached to the form function. This page will be created within hello project, hence the package name oracle.apps.ak.hello.webui Note the webui, it is a convention to have page in webui, means this page represents the Web User Interface You will assign the default AM [OAApplicationModule]. Think of AM "Connection Manager" and "Transaction State Manager" for your page I can't co-relate this to anything in D2k, as there is no concept of Connection Pooling and that D2k is not stateless. Reason being that as soon as you kick off a D2K Form, it connects to a single session of Oracle and sticks to that single Oracle database session. So is not the case in OAF, hence AM is needed. Step 3 You create Region within the Page. · Region is what will store your fields. Text input fields will be of type messageTextInput. Think of Canvas in D2K. You can have nested regions. Stacked Canvas in D2K comes the closest to this component of OA Framework Step 4 Add a button to one of the nested regions The itemStyle should be submitButton, in case you want the page to be submitted when this button is clicked There is no WHEN-BUTTON-PRESSED trigger in OAF. In Framework, you will add a controller java code to handle events like Form Submit button clicks. JDeveloper generates the default code for you. Primarily two functions [should I call methods] will be created processRequest [for UI Rendering Handling] and processFormRequest Think of processRequest as WHEN-NEW-FORM-INSTANCE, though processRequest is very restrictive. Note What is the difference between processRequest and processFormRequest? These two methods are available in the Default Controller class that gets created. processFormRequest This method is commonly used to react/respond to the event that has taken place, for example click of a button. Some examples are if(oapagecontext.getParameter("Cancel") != null) (Do your processing for Cancellation/ Rollback) if(oapagecontext.getParameter("Submit") != null) (Do your validations and commit here) if(oapagecontext.getParameter("Update") != null) (Do your validations and commit here) In the above three examples, you could be calling oapagecontext.forwardImmediately to re-direct the page navigation to some other page if needed. processRequest In this method, usually page rendering related code is written. Effectively, each GUI component is a bean that gets initialised during processRequest. Those who are familiar with D2K forms, something like pre-query may be written in this method. Step 5 In the controller to access the value in field "HelloName" the command is String userContent = pageContext.getParameter("HelloName"); In D2k, we used :block.field. In OAFramework, at submission of page, all the field values get passed into to OAPageContext object. Use getParameter to access the field value To set the value of the field, use OAMessageTextInputBean field HelloName = (OAMessageTextInputBean)webBean.findChildRecursive("HelloName"); fieldHelloName.setText(pageContext,"Setting the default value" ); Note when setting field value in controller: Note 1. Do not set the value in processFormRequest Note 2. If the field comes from View Object, then do not use setText in controller Note 3. For control fields [that are not based on View Objects], you can use setText to assign values in processRequest method Lets take some notes to expand beyond the HelloWorld Project Note 1 In D2K-forms we sort of created a Window, attached to Canvas, and then fields within that Canvas. However in OA Framework, think of Page being fmx/Window, think of Region being a Canvas, and fields being within Regions. This is not a formal/accurate understanding of analogy between D2k and Framework, but is close to being logical. Note 2 In D2k, your Forms fmb file was compiled to fmx. It was fmx file that was deployed on mid-tier. In case of OAF, your OA Page is nothing but a XML file. We call this MDS [meta data]. Whatever name you give to "Page" in OAF, an XML file of the same name gets created. This xml file must then be loaded into database by using XML Importer command. Note 3 Apart from MDS XML file, almost everything else is merely deployed to your mid-tier. Usually this is underneath $JAVA_TOP/oracle/apps/../.. All java files will go underneath java top/oracle/apps/../.. etc. Note 4 When building tutorial, ignore the steps for setting "Attribute Sets". These are not mandatory. Oracle might just have developed their tutorials without including these. Think of these like Visual Attributes of D2K forms Note 5 Controller is where you will write any java code in OA Framework. You can create a Controller per Page or have a different Controller for each of the Regions with the same Page. Note 6 In the method processFormRequest of the Controller, you can access the values of the page by using notation pageContext.getParameter("<fieldname here>"). This method processFormRequest is executed when the OAF Screen/Page is submitted by click of a button. Note 7 Inside the controller, all the Database Related interactions for example interaction with View Objects happen via Application Module. But why so? Because Application Module Manages the transaction state of the Application. OAApplicationModuleImpl oaapplicationmoduleimpl = OAApplicationModuleImpl)oapagecontext.getApplicationModule(oawebbean); OADBTransaction oadbtransaction = OADBTransaction)oaapplicationmoduleimpl.getDBTransaction(); Note 8 In D2K, we have control block or a block based on database view. Similarly, in OA Framework, if the field does not have view Object attached, then it is like a control field. Hence in HelloWorld example, field HelloName is a control field [in D2K terminology]. A view Object can either be based on a view/table, synonym or on a SQL statement. Note 9 I wish to access the fields in multi record block that is based on view Object. Can I do this in Controller? Sure you can. To traverse through those records, do the below · Get the reference to the View Object using (OAViewObject)oapagecontext.getApplicationModule(oawebbean).findViewObject("VO Name Here") · Loop through the records in View Objects using count returned from oaviewobject.getFetchedRowCount() · For each record, fetch the value of the fields within the loop as oracle.jbo.Row row = oaviewobject.getRowAtRangeIndex(loop index here); (String)row.getAttribute("Column name of VO here ");

Read the article
Metro: Promises

- by Stephen.Walther

The goal of this blog entry is to describe the Promise class in the WinJS library. You can use promises whenever you need to perform an asynchronous operation such as retrieving data from a remote website or a file from the file system. Promises are used extensively in the WinJS library. Asynchronous Programming Some code executes immediately, some code requires time to complete or might never complete at all. For example, retrieving the value of a local variable is an immediate operation. Retrieving data from a remote website takes longer or might not complete at all. When an operation might take a long time to complete, you should write your code so that it executes asynchronously. Instead of waiting for an operation to complete, you should start the operation and then do something else until you receive a signal that the operation is complete. An analogy. Some telephone customer service lines require you to wait on hold – listening to really bad music – until a customer service representative is available. This is synchronous programming and very wasteful of your time. Some newer customer service lines enable you to enter your telephone number so the customer service representative can call you back when a customer representative becomes available. This approach is much less wasteful of your time because you can do useful things while waiting for the callback. There are several patterns that you can use to write code which executes asynchronously. The most popular pattern in JavaScript is the callback pattern. When you call a function which might take a long time to return a result, you pass a callback function to the function. For example, the following code (which uses jQuery) includes a function named getFlickrPhotos which returns photos from the Flickr website which match a set of tags (such as “dog” and “funny”): function getFlickrPhotos(tags, callback) { $.getJSON( "http://api.flickr.com/services/feeds/photos_public.gne?jsoncallback=?", { tags: tags, tagmode: "all", format: "json" }, function (data) { if (callback) { callback(data.items); } } ); } getFlickrPhotos("funny, dogs", function(data) { $.each(data, function(index, item) { console.log(item); }); }); The getFlickr() function includes a callback parameter. When you call the getFlickr() function, you pass a function to the callback parameter which gets executed when the getFlicker() function finishes retrieving the list of photos from the Flickr web service. In the code above, the callback function simply iterates through the results and writes each result to the console. Using callbacks is a natural way to perform asynchronous programming with JavaScript. Instead of waiting for an operation to complete, sitting there and listening to really bad music, you can get a callback when the operation is complete. Using Promises The CommonJS website defines a promise like this (http://wiki.commonjs.org/wiki/Promises): “Promises provide a well-defined interface for interacting with an object that represents the result of an action that is performed asynchronously, and may or may not be finished at any given point in time. By utilizing a standard interface, different components can return promises for asynchronous actions and consumers can utilize the promises in a predictable manner.” A promise provides a standard pattern for specifying callbacks. In the WinJS library, when you create a promise, you can specify three callbacks: a complete callback, a failure callback, and a progress callback. Promises are used extensively in the WinJS library. The methods in the animation library, the control library, and the binding library all use promises. For example, the xhr() method included in the WinJS base library returns a promise. The xhr() method wraps calls to the standard XmlHttpRequest object in a promise. The following code illustrates how you can use the xhr() method to perform an Ajax request which retrieves a file named Photos.txt: var options = { url: "/data/photos.txt" }; WinJS.xhr(options).then( function (xmlHttpRequest) { console.log("success"); var data = JSON.parse(xmlHttpRequest.responseText); console.log(data); }, function(xmlHttpRequest) { console.log("fail"); }, function(xmlHttpRequest) { console.log("progress"); } ) The WinJS.xhr() method returns a promise. The Promise class includes a then() method which accepts three callback functions: a complete callback, an error callback, and a progress callback: Promise.then(completeCallback, errorCallback, progressCallback) In the code above, three anonymous functions are passed to the then() method. The three callbacks simply write a message to the JavaScript Console. The complete callback also dumps all of the data retrieved from the photos.txt file. Creating Promises You can create your own promises by creating a new instance of the Promise class. The constructor for the Promise class requires a function which accepts three parameters: a complete, error, and progress function parameter. For example, the code below illustrates how you can create a method named wait10Seconds() which returns a promise. The progress function is called every second and the complete function is not called until 10 seconds have passed: (function () { "use strict"; var app = WinJS.Application; function wait10Seconds() { return new WinJS.Promise(function (complete, error, progress) { var seconds = 0; var intervalId = window.setInterval(function () { seconds++; progress(seconds); if (seconds > 9) { window.clearInterval(intervalId); complete(); } }, 1000); }); } app.onactivated = function (eventObject) { if (eventObject.detail.kind === Windows.ApplicationModel.Activation.ActivationKind.launch) { wait10Seconds().then( function () { console.log("complete") }, function () { console.log("error") }, function (seconds) { console.log("progress:" + seconds) } ); } } app.start(); })(); All of the work happens in the constructor function for the promise. The window.setInterval() method is used to execute code every second. Every second, the progress() callback method is called. If more than 10 seconds have passed then the complete() callback method is called and the clearInterval() method is called. When you execute the code above, you can see the output in the Visual Studio JavaScript Console. Creating a Timeout Promise In the previous section, we created a custom Promise which uses the window.setInterval() method to complete the promise after 10 seconds. We really did not need to create a custom promise because the Promise class already includes a static method for returning promises which complete after a certain interval. The code below illustrates how you can use the timeout() method. The timeout() method returns a promise which completes after a certain number of milliseconds. WinJS.Promise.timeout(3000).then( function(){console.log("complete")}, function(){console.log("error")}, function(){console.log("progress")} ); In the code above, the Promise completes after 3 seconds (3000 milliseconds). The Promise returned by the timeout() method does not support progress events. Therefore, the only message written to the console is the message “complete” after 10 seconds. Canceling Promises Some promises, but not all, support cancellation. When you cancel a promise, the promise’s error callback is executed. For example, the following code uses the WinJS.xhr() method to perform an Ajax request. However, immediately after the Ajax request is made, the request is cancelled. // Specify Ajax request options var options = { url: "/data/photos.txt" }; // Make the Ajax request var request = WinJS.xhr(options).then( function (xmlHttpRequest) { console.log("success"); }, function (xmlHttpRequest) { console.log("fail"); }, function (xmlHttpRequest) { console.log("progress"); } ); // Cancel the Ajax request request.cancel(); When you run the code above, the message “fail” is written to the Visual Studio JavaScript Console. Composing Promises You can build promises out of other promises. In other words, you can compose promises. There are two static methods of the Promise class which you can use to compose promises: the join() method and the any() method. When you join promises, a promise is complete when all of the joined promises are complete. When you use the any() method, a promise is complete when any of the promises complete. The following code illustrates how to use the join() method. A new promise is created out of two timeout promises. The new promise does not complete until both of the timeout promises complete: WinJS.Promise.join([WinJS.Promise.timeout(1000), WinJS.Promise.timeout(5000)]) .then(function () { console.log("complete"); }); The message “complete” will not be written to the JavaScript Console until both promises passed to the join() method completes. The message won’t be written for 5 seconds (5,000 milliseconds). The any() method completes when any promise passed to the any() method completes: WinJS.Promise.any([WinJS.Promise.timeout(1000), WinJS.Promise.timeout(5000)]) .then(function () { console.log("complete"); }); The code above writes the message “complete” to the JavaScript Console after 1 second (1,000 milliseconds). The message is written to the JavaScript console immediately after the first promise completes and before the second promise completes. Summary The goal of this blog entry was to describe WinJS promises. First, we discussed how promises enable you to easily write code which performs asynchronous actions. You learned how to use a promise when performing an Ajax request. Next, we discussed how you can create your own promises. You learned how to create a new promise by creating a constructor function with complete, error, and progress parameters. Finally, you learned about several advanced methods of promises. You learned how to use the timeout() method to create promises which complete after an interval of time. You also learned how to cancel promises and compose promises from other promises.

Read the article
Performance Optimization – It Is Faster When You Can Measure It

- by Alois Kraus

Performance optimization in bigger systems is hard because the measured numbers can vary greatly depending on the measurement method of your choice. To measure execution timing of specific methods in your application you usually use Time Measurement Method Potential Pitfalls Stopwatch Most accurate method on recent processors. Internally it uses the RDTSC instruction. Since the counter is processor specific you can get greatly different values when your thread is scheduled to another core or the core goes into a power saving mode. But things do change luckily: Intel's Designer's vol3b, section 16.11.1 "16.11.1 Invariant TSC The time stamp counter in newer processors may support an enhancement, referred to as invariant TSC. Processor's support for invariant TSC is indicated by CPUID.80000007H:EDX[8]. The invariant TSC will run at a constant rate in all ACPI P-, C-. and T-states. This is the architectural behavior moving forward. On processors with invariant TSC support, the OS may use the TSC for wall clock timer services (instead of ACPI or HPET timers). TSC reads are much more efficient and do not incur the overhead associated with a ring transition or access to a platform resource." DateTime.Now Good but it has only a resolution of 16ms which can be not enough if you want more accuracy. Reporting Method Potential Pitfalls Console.WriteLine Ok if not called too often. Debug.Print Are you really measuring performance with Debug Builds? Shame on you. Trace.WriteLine Better but you need to plug in some good output listener like a trace file. But be aware that the first time you call this method it will read your app.config and deserialize your system.diagnostics section which does also take time. In general it is a good idea to use some tracing library which does measure the timing for you and you only need to decorate some methods with tracing so you can later verify if something has changed for the better or worse. In my previous article I did compare measuring performance with quantum mechanics. This analogy does work surprising well. When you measure a quantum system there is a lower limit how accurately you can measure something. The Heisenberg uncertainty relation does tell us that you cannot measure of a quantum system the impulse and location of a particle at the same time with infinite accuracy. For programmers the two variables are execution time and memory allocations. If you try to measure the timings of all methods in your application you will need to store them somewhere. The fastest storage space besides the CPU cache is the memory. But if your timing values do consume all available memory there is no memory left for the actual application to run. On the other hand if you try to record all memory allocations of your application you will also need to store the data somewhere. This will cost you memory and execution time. These constraints are always there and regardless how good the marketing of tool vendors for performance and memory profilers are: Any measurement will disturb the system in a non predictable way. Commercial tool vendors will tell you they do calculate this overhead and subtract it from the measured values to give you the most accurate values but in reality it is not entirely true. After falling into the trap to trust the profiler timings several times I have got into the habit to Measure with a profiler to get an idea where potential bottlenecks are. Measure again with tracing only the specific methods to check if this method is really worth optimizing. Optimize it Measure again. Be surprised that your optimization has made things worse. Think harder Implement something that really works. Measure again Finished! - Or look for the next bottleneck. Recently I have looked into issues with serialization performance. For serialization DataContractSerializer was used and I was not sure if XML is really the most optimal wire format. After looking around I have found protobuf-net which uses Googles Protocol Buffer format which is a compact binary serialization format. What is good for Google should be good for us. A small sample app to check out performance was a matter of minutes: using ProtoBuf; using System; using System.Diagnostics; using System.IO; using System.Reflection; using System.Runtime.Serialization; [DataContract, Serializable] class Data { [DataMember(Order=1)] public int IntValue { get; set; } [DataMember(Order = 2)] public string StringValue { get; set; } [DataMember(Order = 3)] public bool IsActivated { get; set; } [DataMember(Order = 4)] public BindingFlags Flags { get; set; } } class Program { static MemoryStream _Stream = new MemoryStream(); static MemoryStream Stream { get { _Stream.Position = 0; _Stream.SetLength(0); return _Stream; } } static void Main(string[] args) { DataContractSerializer ser = new DataContractSerializer(typeof(Data)); Data data = new Data { IntValue = 100, IsActivated = true, StringValue = "Hi this is a small string value to check if serialization does work as expected" }; var sw = Stopwatch.StartNew(); int Runs = 1000 * 1000; for (int i = 0; i < Runs; i++) { //ser.WriteObject(Stream, data); Serializer.Serialize<Data>(Stream, data); } sw.Stop(); Console.WriteLine("Did take {0:N0}ms for {1:N0} objects", sw.Elapsed.TotalMilliseconds, Runs); Console.ReadLine(); } } The results are indeed promising: Serializer Time in ms N objects protobuf-net 807 1000000 DataContract 4402 1000000 Nearly a factor 5 faster and a much more compact wire format. Lets use it! After switching over to protbuf-net the transfered wire data has dropped by a factor two (good) and the performance has worsened by nearly a factor two. How is that possible? We have measured it? Protobuf-net is much faster! As it turns out protobuf-net is faster but it has a cost: For the first time a type is de/serialized it does use some very smart code-gen which does not come for free. Lets try to measure this one by setting of our performance test app the Runs value not to one million but to 1. Serializer Time in ms N objects protobuf-net 85 1 DataContract 24 1 The code-gen overhead is significant and can take up to 200ms for more complex types. The break even point where the code-gen cost is amortized by its faster serialization performance is (assuming small objects) somewhere between 20.000-40.000 serialized objects. As it turned out my specific scenario involved about 100 types and 1000 serializations in total. That explains why the good old DataContractSerializer is not so easy to take out of business. The final approach I ended up was to reduce the number of types and to serialize primitive types via BinaryWriter directly which turned out to be a pretty good alternative. It sounded good until I measured again and found that my optimizations so far do not help much. After looking more deeper at the profiling data I did found that one of the 1000 calls did take 50% of the time. So how do I find out which call it was? Normal profilers do fail short at this discipline. A (totally undeserved) relatively unknown profiler is SpeedTrace which does unlike normal profilers create traces of your applications by instrumenting your IL code at runtime. This way you can look at the full call stack of the one slow serializer call to find out if this stack was something special. Unfortunately the call stack showed nothing special. But luckily I have my own tracing as well and I could see that the slow serializer call did happen during the serialization of a bool value. When you encounter after much analysis something unreasonable you cannot explain it then the chances are good that your thread was suspended by the garbage collector. If there is a problem with excessive GCs remains to be investigated but so far the serialization performance seems to be mostly ok. When you do profile a complex system with many interconnected processes you can never be sure that the timings you just did measure are accurate at all. Some process might be hitting the disc slowing things down for all other processes for some seconds as well. There is a big difference between warm and cold startup. If you restart all processes you can basically forget the first run because of the OS disc cache, JIT and GCs make the measured timings very flexible. When you are in need of a random number generator you should measure cold startup times of a sufficiently complex system. After the first run you can try again getting different and much lower numbers. Now try again at least two times to get some feeling how stable the numbers are. Oh and try to do the same thing the next day. It might be that the bottleneck you found yesterday is gone today. Thanks to GC and other random stuff it can become pretty hard to find stuff worth optimizing if no big bottlenecks except bloatloads of code are left anymore. When I have found a spot worth optimizing I do make the code changes and do measure again to check if something has changed. If it has got slower and I am certain that my change should have made it faster I can blame the GC again. The thing is that if you optimize stuff and you allocate less objects the GC times will shift to some other location. If you are unlucky it will make your faster working code slower because you see now GCs at times where none were before. This is where the stuff does get really tricky. A safe escape hatch is to create a repro of the slow code in an isolated application so you can change things fast in a reliable manner. Then the normal profilers do also start working again. As Vance Morrison does point out it is much more complex to profile a system against the wall clock compared to optimize for CPU time. The reason is that for wall clock time analysis you need to understand how your system does work and which threads (if you have not one but perhaps 20) are causing a visible delay to the end user and which threads can wait a long time without affecting the user experience at all. Next time: Commercial profiler shootout.

Read the article
Spooling in SQL execution plans

- by Rob Farley

Sewing has never been my thing. I barely even know the terminology, and when discussing this with American friends, I even found out that half the words that Americans use are different to the words that English and Australian people use. That said – let’s talk about spools! In particular, the Spool operators that you find in some SQL execution plans. This post is for T-SQL Tuesday, hosted this month by me! I’ve chosen to write about spools because they seem to get a bad rap (even in my song I used the line “There’s spooling from a CTE, they’ve got recursion needlessly”). I figured it was worth covering some of what spools are about, and hopefully explain why they are remarkably necessary, and generally very useful. If you have a look at the Books Online page about Plan Operators, at http://msdn.microsoft.com/en-us/library/ms191158.aspx, and do a search for the word ‘spool’, you’ll notice it says there are 46 matches. 46! Yeah, that’s what I thought too... Spooling is mentioned in several operators: Eager Spool, Lazy Spool, Index Spool (sometimes called a Nonclustered Index Spool), Row Count Spool, Spool, Table Spool, and Window Spool (oh, and Cache, which is a special kind of spool for a single row, but as it isn’t used in SQL 2012, I won’t describe it any further here). Spool, Table Spool, Index Spool, Window Spool and Row Count Spool are all physical operators, whereas Eager Spool and Lazy Spool are logical operators, describing the way that the other spools work. For example, you might see a Table Spool which is either Eager or Lazy. A Window Spool can actually act as both, as I’ll mention in a moment. In sewing, cotton is put onto a spool to make it more useful. You might buy it in bulk on a cone, but if you’re going to be using a sewing machine, then you quite probably want to have it on a spool or bobbin, which allows it to be used in a more effective way. This is the picture that I want you to think about in relation to your data. I’m sure you use spools every time you use your sewing machine. I know I do. I can’t think of a time when I’ve got out my sewing machine to do some sewing and haven’t used a spool. However, I often run SQL queries that don’t use spools. You see, the data that is consumed by my query is typically in a useful state without a spool. It’s like I can just sew with my cotton despite it not being on a spool! Many of my favourite features in T-SQL do like to use spools though. This looks like a very similar query to before, but includes an OVER clause to return a column telling me the number of rows in my data set. I’ll describe what’s going on in a few paragraphs’ time. So what does a Spool operator actually do? The spool operator consumes a set of data, and stores it in a temporary structure, in the tempdb database. This structure is typically either a Table (ie, a heap), or an Index (ie, a b-tree). If no data is actually needed from it, then it could also be a Row Count spool, which only stores the number of rows that the spool operator consumes. A Window Spool is another option if the data being consumed is tightly linked to windows of data, such as when the ROWS/RANGE clause of the OVER clause is being used. You could maybe think about the type of spool being like whether the cotton is going onto a small bobbin to fit in the base of the sewing machine, or whether it’s a larger spool for the top. A Table or Index Spool is either Eager or Lazy in nature. Eager and Lazy are Logical operators, which talk more about the behaviour, rather than the physical operation. If I’m sewing, I can either be all enthusiastic and get all my cotton onto the spool before I start, or I can do it as I need it. “Lazy” might not the be the best word to describe a person – in the SQL world it describes the idea of either fetching all the rows to build up the whole spool when the operator is called (Eager), or populating the spool only as it’s needed (Lazy). Window Spools are both physical and logical. They’re eager on a per-window basis, but lazy between windows. And when is it needed? The way I see it, spools are needed for two reasons. 1 – When data is going to be needed AGAIN. 2 – When data needs to be kept away from the original source. If you’re someone that writes long stored procedures, you are probably quite aware of the second scenario. I see plenty of stored procedures being written this way – where the query writer populates a temporary table, so that they can make updates to it without risking the original table. SQL does this too. Imagine I’m updating my contact list, and some of my changes move data to later in the book. If I’m not careful, I might update the same row a second time (or even enter an infinite loop, updating it over and over). A spool can make sure that I don’t, by using a copy of the data. This problem is known as the Halloween Effect (not because it’s spooky, but because it was discovered in late October one year). As I’m sure you can imagine, the kind of spool you’d need to protect against the Halloween Effect would be eager, because if you’re only handling one row at a time, then you’re not providing the protection... An eager spool will block the flow of data, waiting until it has fetched all the data before serving it up to the operator that called it. In the query below I’m forcing the Query Optimizer to use an index which would be upset if the Name column values got changed, and we see that before any data is fetched, a spool is created to load the data into. This doesn’t stop the index being maintained, but it does mean that the index is protected from the changes that are being done. There are plenty of times, though, when you need data repeatedly. Consider the query I put above. A simple join, but then counting the number of rows that came through. The way that this has executed (be it ideal or not), is to ask that a Table Spool be populated. That’s the Table Spool operator on the top row. That spool can produce the same set of rows repeatedly. This is the behaviour that we see in the bottom half of the plan. In the bottom half of the plan, we see that the a join is being done between the rows that are being sourced from the spool – one being aggregated and one not – producing the columns that we need for the query. Table v Index When considering whether to use a Table Spool or an Index Spool, the question that the Query Optimizer needs to answer is whether there is sufficient benefit to storing the data in a b-tree. The idea of having data in indexes is great, but of course there is a cost to maintaining them. Here we’re creating a temporary structure for data, and there is a cost associated with populating each row into its correct position according to a b-tree, as opposed to simply adding it to the end of the list of rows in a heap. Using a b-tree could even result in page-splits as the b-tree is populated, so there had better be a reason to use that kind of structure. That all depends on how the data is going to be used in other parts of the plan. If you’ve ever thought that you could use a temporary index for a particular query, well this is it – and the Query Optimizer can do that if it thinks it’s worthwhile. It’s worth noting that just because a Spool is populated using an Index Spool, it can still be fetched using a Table Spool. The details about whether or not a Spool used as a source shows as a Table Spool or an Index Spool is more about whether a Seek predicate is used, rather than on the underlying structure. Recursive CTE I’ve already shown you an example of spooling when the OVER clause is used. You might see them being used whenever you have data that is needed multiple times, and CTEs are quite common here. With the definition of a set of data described in a CTE, if the query writer is leveraging this by referring to the CTE multiple times, and there’s no simplification to be leveraged, a spool could theoretically be used to avoid reapplying the CTE’s logic. Annoyingly, this doesn’t happen. Consider this query, which really looks like it’s using the same data twice. I’m creating a set of data (which is completely deterministic, by the way), and then joining it back to itself. There seems to be no reason why it shouldn’t use a spool for the set described by the CTE, but it doesn’t. On the other hand, if we don’t pull as many columns back, we might see a very different plan. You see, CTEs, like all sub-queries, are simplified out to figure out the best way of executing the whole query. My example is somewhat contrived, and although there are plenty of cases when it’s nice to give the Query Optimizer hints about how to execute queries, it usually doesn’t do a bad job, even without spooling (and you can always use a temporary table). When recursion is used, though, spooling should be expected. Consider what we’re asking for in a recursive CTE. We’re telling the system to construct a set of data using an initial query, and then use set as a source for another query, piping this back into the same set and back around. It’s very much a spool. The analogy of cotton is long gone here, as the idea of having a continual loop of cotton feeding onto a spool and off again doesn’t quite fit, but that’s what we have here. Data is being fed onto the spool, and getting pulled out a second time when the spool is used as a source. (This query is running on AdventureWorks, which has a ManagerID column in HumanResources.Employee, not AdventureWorks2012) The Index Spool operator is sucking rows into it – lazily. It has to be lazy, because at the start, there’s only one row to be had. However, as rows get populated onto the spool, the Table Spool operator on the right can return rows when asked, ending up with more rows (potentially) getting back onto the spool, ready for the next round. (The Assert operator is merely checking to see if we’ve reached the MAXRECURSION point – it vanishes if you use OPTION (MAXRECURSION 0), which you can try yourself if you like). Spools are useful. Don’t lose sight of that. Every time you use temporary tables or table variables in a stored procedure, you’re essentially doing the same – don’t get upset at the Query Optimizer for doing so, even if you think the spool looks like an expensive part of the query. I hope you’re enjoying this T-SQL Tuesday. Why not head over to my post that is hosting it this month to read about some other plan operators? At some point I’ll write a summary post – once I have you should find a comment below pointing at it. @rob_farley

Read the article
Managed Service Architectures Part I

- by barryoreilly

Instead of thinking about service oriented architecture, a concept that is continually defined, redefined, abused and mistreated, perhaps it is time to drop the acronym and consider what we actually need to get the job done. ‘Pure’ SOA involves the modeling of an organisation’s processes, the so called ‘Top Down’ approach, followed by the implementation of these processes as services. Another approach, more commonly seen in the wild, is the bottom up approach. This usually involves services that simply start popping up in the organization, and SOA in this case is often just an attempt to rein in these services. Such projects, although described as SOA projects for a variety of reasons, have clearly little relation to process driven architecture. Much has been written about these two approaches, with many deciding that a hybrid of both methods is needed to succeed with SOA. These hybrid methods are a sensible compromise, but one gets the feeling that there is too much focus on ‘Succeeding with SOA’. Organisations who focus too much on bottom up development, or who waste too much time and money on top down approaches that don’t produce results, are often recommended to attempt an ‘agile’(Erl) or ‘middle-out’ (Microsoft) approach in order to succeed with SOA. The problem with recommending this approach is that, in most cases, succeeding with SOA isn’t the aim of the project. If a project is started with the simple aim of ‘Succeeding with SOA’ then the reasons for the projects existence probably need to be questioned. There are a number of things we can be sure of: · An organisation will have a number of disparate IT systems · Some of these systems will have redundant data and functionality · Integration will give considerable ROI · Integration will already be under way. · Services will already exist in the organisation · These services will be inconsistent in their implementation and in their governance So there are three goals here: 1. Alignment between the business and IT 2. Integration of disparate systems 3. Management of services. 2 and 3 are going to happen, in fact they must happen if any degree of return is expected from the IT department. Ignoring 1 is considered a typical mistake in SOA implementations, as it ignores the business implications. However, the business implication of this approach is the money saved in more efficient IT processes. 2 and 3 are ongoing, and they will continue happening, even if a large project to produce a SOA metamodel is started. The result will then be an unstructured cackle of services, and a metamodel that is already going out of date. So we get stuck in and rebuild our services so that they match the metamodel, with the far reaching consequences that this will have on all our LOB systems are current. Lets imagine that this actually works ( how often do we rip and replace working software because it doesn't fit a certain pattern? Never -that's the point of integration), we will now be working with a metamodel that is out of date, and most likely incomplete if the organisation is large. Accepting that an object can have more than one model over time, with perhaps more than one model being at any given time will help us realise the limitations of the top down model. It is entirely normal , and perhaps necessary, for an organisation to be able to view an entity from different perspectives. So, instead of trying to constantly force these goals in a straight line, why not let them happen in parallel, and manage the changes in each layer. If company A has chosen to model their business processes and create a business architecture, there will be a reason behind this. Often the aim is to make the business more flexible and able to cope with change, through alignment between the business and the IT department. If company B’s IT department recognizes the problem of wild services springing up everywhere, and decides to do something about it, by designing a platform and processes for the introduction of services, is this not a valid approach? With the hybrid approach, it is recommended that company A begin deploying services as quickly as possible. Based on models that are clearly incomplete, and which will therefore change rapidly and often in the near future. Natural business evolution will also mean that the models can be guaranteed to change in the not so near future. To ‘Succeed with SOA’ Company B needs to go back to the drawing board and start modeling processes and objects. So, in effect, we are telling business analysts to start developing code based on a model they are unsure of, and telling programmers to ignore the obvious and growing problems in their IT department and start drawing lines and boxes. Could the problem be that there are two different problem domains? And the whole concept of SOA as it being described by clever salespeople today creates an example of oft dreaded ‘tight coupling’ between these two domains? Could it be that we have taken two large problem areas, and bundled the solution together in order to create a magic bullet? And then convinced ourselves that the bullet actually exists? Company A wants to have a closer relationship between the business and its IT department, in order to become a more flexible organization. Company B wants to decrease the maintenance costs of its IT infrastructure. If both companies focus on succeeding with SOA, then they aren’t focusing on their actual goals. If Company A starts building services from incomplete models, without a gameplan, they will end up in the same situation as company B, with wild services. If company B focuses on modeling, they could easily end up with the same problems as company A. Now we have two companies, who a short while ago had one problem each, that now have two problems each. This has happened because of a focus on ‘Succeeding with SOA’, rather than solving the problem at hand. This is not to suggest that the two problem domains are unrelated, a strategy that encompasses both will obviously be good for the organization. But only if the organization realizes this and can develop such a strategy. This strategy cannot be bought in a box. Anyone who has worked with SOA for a while will be used to analyzing the solutions to a problem and judging the solution’s level of coupling. If we have two applications that each perform separate functions, but need to communicate with each other, we create a integration layer between them, perhaps with a service, but we do all we can to reduce the dependency between the two systems. Using the same approach, we can separate the modeling (business architecture) and the service hosting (technical architecture). The business architecture describes the processes and business objects in the business domain. The technical architecture describes the hosting and management and implementation of services. The glue that binds these together, the integration layer in our analogy, is the service contract, where the operations map the processes to their technical implementation, and the messages map business concepts to software objects in the implementation. If we reduce the coupling between these layers, we should be able to allow developers to develop services, and business analysts to develop models, without the changes rippling through from one side to the other. This would allow company A to carry on modeling, and company B to develop a service platform, each achieving their intended goal, without necessarily creating the problems seen in pure top down or bottom up approaches. Company B could then at a later date map their service infrastructure to a unified model, and company A could carry on modeling, insulating deployed services from changes in the ongoing modeling. How do we do this? The concept of service virtualization has been around for a while, and is instantly realizable in Microsoft’s Managed Services Engine. Here we can create a layer of virtual services, which represent the business analyst’s view, presenting uniform contracts to the outside world. These services can then transform and route messages to the actual service implementations. I like to think of the virtual services with their beautifully modeled interfaces as ‘SOA services’, and the implementations as simple integration ‘adapter’ services providing an interface to a technical implementation. The Managed Services Engine also provides policy based control over services, regardless of where they are deployed, simplifying handling of security, logging, exception handling etc. This solves a big problem. The pressure to deliver services quickly is always there in projects. It is very important to quickly show value when implementing service architectures. There is also pressure to deliver quality, and you can’t easily do both at the same time. This approach allows quick delivery with quality increasing over time, allowing modeling and service development to occur in parallel and independent of each other. The link between business modeling and service implementation is not one that is obvious to many organizations, and requires a certain maturity to realize and drive forward. It is also completely possible that a company can benefit from one without the other, even if this approach is frowned upon today, there are many companies doing so and seeing ROI. Of course there are disadvantages to this. The biggest one being the transformations necessary between the virtual interfaces and the service implementations. Bad choices in developing the services in the service implementation could mean that it is impossible to map the modeled processes to the implementation with redevelopment of the service. In many cases the architect will not have a choice here anyway, as proprietary systems are often delivered with predeveloped services. The alternative is to wait until the model is finished and then build the service according the model. However, if that approach worked we wouldn’t be having this discussion! And even when it does work, natural business evolution will mean that the two concepts (model and implementation) will immediately start to drift away from each other, so coupling them tightly together so that they are forever bound to the model that only applies at the time of the modeling work will not really achieve a great deal. Architecture is all about trade offs, and here a choice has to be made. The choice is between something will initially be of low quality but will work, or something that may well be impossible to achieve in most situations. In conclusion, top-down is a natural approach for business analysts, and bottom-up is a natural approach for developers. Instead of trying to force something on both that neither want, and which has not shown itself to be successful, why not let them get on with their jobs, and let an enterprise architect coordinate the processes?

Read the article
Code Behaviour via Unit Tests

- by Dewald Galjaard

Normal 0 false false false EN-ZA X-NONE X-NONE /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-para-margin-top:0cm; mso-para-margin-right:0cm; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0cm; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-fareast-font-family:"Times New Roman"; mso-fareast-theme-font:minor-fareast; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin; mso-bidi-font-family:"Times New Roman"; mso-bidi-theme-font:minor-bidi;} Some four months ago my car started acting up. Symptoms included a sputtering as my car’s computer switched between gears intermittently. Imagine building up speed, then when you reach 80km/h the car magically and mysteriously decide to switch back to third or even second gear. Clearly it was confused! I managed to track down a technician, an expert in his field to help me out. As he fitted his handheld computer to some hidden port under the dash, he started to explain “These cars are quite intelligent, you know. When they sense something is wrong they run in a restrictive program which probably account for how you managed to drive here in the first place...” I was surprised and thought this was certainly going to be an interesting test drive. The car ran smoothly down the first couple of stretches as the technician ran through routine checks. Then he said “Ok, all looking good. We need to start testing aspects of the gearbox. Inside the gearbox there are a couple of sensors. One of them is a speed sensor which talks to the computer, which in turn will decide which gear to switch to. The restrictive program avoid these sensors altogether and allow the computer to obtain its input from other [non-affected] sources”. Then, as soon as he forced the speed sensor to come back online the symptoms and ill behaviour re-emerged... What an incredible analogy for getting into a discussion on unit testing software? Besides I should probably put my ill fortune to some good use, right? This example provide a lot of insight into how and why we should conduct unit tests when writing code. More importantly, it captures what is easily and unfortunately often the most overlooked goal of writing unit tests by those new to the art and those who oppose it alike - The goal of writing unit tests is to test the behaviour of our code under predefined conditions. Although it is very possible to test the intrinsic workings of each and every component in your code, writing several tests for each method in practise will soon prove to be an exhausting and ultimately fruitless exercise given the certain and ever changing nature of business requirements. Consequently it is true and quite possible whilst conducting proper unit tests, to call any single method several times as you examine and contemplate different scenarios. Let’s write some code to demonstrate what I mean. In my example I make use of the Moq framework and NUnit to create my tests. Truly you can use whatever you’re comfortable with. First we’ll create an ISpeedSensor interface. This is to represent the speed sensor located in the gearbox. Then we’ll create a Gearbox class which we’ll pass to a constructor when we instantiate an object of type Computer. All three are described below. ISpeedSensor.cs namespace AutomaticVehicle { public interface ISpeedSensor { int ReportCurrentSpeed(); } } Gearbox.cs namespace AutomaticVehicle { public class Gearbox { private ISpeedSensor _speedSensor; public Gearbox( ISpeedSensor gearboxSpeedSensor ) { _speedSensor = gearboxSpeedSensor; } /// <summary> /// This method obtain it's reading from the speed sensor. /// </summary> /// <returns></returns> public int ReportCurrentSpeed() { return _speedSensor.ReportCurrentSpeed(); } } } Computer.cs namespace AutomaticVehicle { public class Computer { private Gearbox _gearbox; public Computer( Gearbox gearbox ) { } public int GetCurrentSpeed() { return _gearbox.ReportCurrentSpeed( ); } } } Since this post is about Unit testing, that is exactly what we’ll create next. Create a second project in your solution. I called mine AutomaticVehicleTests and I immediately referenced the respective nunit, moq and AutomaticVehicle dll’s. We’re going to write a test to examine what happens inside the Computer class. ComputerTests.cs namespace AutomaticVehicleTests { [TestFixture] public class ComputerTests { [Test] public void Computer_Gearbox_SpeedSensor_DoesThrow() { // Mock ISpeedSensor in gearbox Mock< ISpeedSensor > speedSensor = new Mock< ISpeedSensor >( ); speedSensor.Setup( n => n.ReportCurrentSpeed() ).Throws<Exception>(); Gearbox gearbox = new Gearbox( speedSensor.Object ); // Create Computer instance to test it's behaviour towards an exception in gearbox Computer carComputer = new Computer( gearbox ); // For simplicity let’s assume for now the car only travels at 60 km/h. Assert.AreEqual( 60, carComputer.GetCurrentSpeed( ) ); } } } What is happening in this test? We have created a mocked object using the ISpeedsensor interface which we've passed to our Gearbox object. Notice that I created the mocked object using an interface, not the implementation. I’ll talk more about this in future posts but in short I do this to accentuate the fact that I'm not not really concerned with how SpeedSensor work internally at this particular point in time. Next I’ve gone ahead and created a scenario where I’ve declared the speed sensor in Gearbox to be faulty by forcing it to throw an exception should we ask Gearbox to report on its current speed. Sneaky, sneaky. This test is a simulation of how things may behave in the real world. Inevitability things break, whether it’s caused by mechanical failure, some logical error on your part or a fellow developer which didn’t consult the documentation (or the lack thereof ) - whether you’re calling a speed sensor, making a call to a database, calling a web service or just trying to write a file to disk. It’s a scenario I’ve created and this test is about how the code within the Computer instance will behave towards any such error as I’ve depicted. Now, if you’ve followed closely in my final assert method you would have noticed I did something quite unexpected. I might be getting ahead of myself now but I’m testing to see if the value returned is equal to what I expect it to be under perfect conditions – I’m not testing to see if an error has been thrown! Why is that? Well, in short this is TDD. Test Driven Development is about first writing your test to define the result we want, then to go back and change the implementation within your class to obtain the desired output (I need to make sure I can drive back to the repair shop. Remember? ) So let’s go ahead and run our test as is. It’s fails miserably... Good! Let’s go back to our Computer class and make a small change to the GetCurrentSpeed method. Computer.cs public int GetCurrentSpeed() { try { return _gearbox.ReportCurrentSpeed( ); } catch { RunRestrictiveProgram( ); } } This is a simple solution, I know, but it does provide a way to allow for different behaviour. You’re more than welcome to provide an implementation for RunRestrictiveProgram should you feel the need to. It's not within the scope of this post or related to the point I'm trying to make. What is important is to notice how the focus has shifted in our approach from how things can break - to how things behave when broken. Happy coding!

Read the article
On REST: WADL or not IDL, is the following approach right ?

- by redben

This question is a bit long, please bear with me. In REST, i think we should not need WADL or any IDL. But rather something that would implicitly cover its concept. The way I think about it is when we (humans) surf the Web, when we go to a web site for the first time, we don't know what services it provides. You discover those on the html home page (or a sitemap page in a help section) or maybe just the main menu on the home page. If you make an analogy, the homepage or site map to us humans is what WSDL is to WS-* or what WADL could be to a REST service. Only that its just like any other html content. I think that in REST the following is a good way to do things, respecting the HATEOS paradigm. Have a top level (or default) resource that lists links to your other resources. For a library example, say RestLibrary.com/ it could be something like: <root xmlns:lib="http://librarystandards.com/libraryml"> <resource class="lib:book"> <link type="application/vnd.libraryml+xml" template="mylib.com/book/{isbn}" /> <link type="application/vnd.libraryml+xml" rel="add" href="mylib.com/book" method="POST" /> <link type="application/vnd.libraryml+xml" rel="update" template="mylib.com/book/{isbn}" method="PUT" /> </resource> <resource class="lib:bookList"> <link template="mylib.com/book?keywords={keywords}" type="application/vnd.openlibrary+xml" rel="search" /> </resource> </root> Note that it is assumed that the media type "application/vnd.libraryml+xml" is a defined standard or (may be just proprietary vocabulary) named libraryml. Also, the client should be able to understand this "homepage" resource (elements root, resource and link). This is the part that could be used instead of WADL : an Abstract vocabulary that should be understandable by any client. You could use an existing standard like Atom for example. But the main idea is to have an abstract vocabulary understandable by any client. Why not WADL then ? well wadl is only for service discovery. The idea here is to have an light abstract vocabulary that would serve as a base for hypermedia. A "root" vocabulary. Like in owl we have owl:thing...etc Now if the client knows the "libraryml" standard it can follow the links to the things it understands (after parsing the media type properties and xmlns). If not, it just won't. When i can't understand how to deal with something in REST architecture i tend to see how we Humans do it in the Web. In the Web, we have the Generic language that is HTML that enables site builders to deliver any specific content, regardless of its meaning to the client (the user), Browsers understand HTML but not the "meaning" of its content. It is the user that understands the (domain specific) content. If i go to say QuantumPhysics.org, my browser can render the home page (it is just html after all) and i can read the home page. If i understand quantum then fine i can continue browsing. If i don't i just get out (unless i want to learn the hardway :) ) In the RetsLibrary.com example the client app is just like me+my browser on QuantumPhysics.org. the media type "application/vnd.libraryml+xml" is quantum physics (knowledge). http is http in both examples. Now HTML of QuantumPhysics.org is in RestLibrary.com is XML + that tiny little abstract vocabulary (root resource and link, that you could replace with something like Atom). So does this approach have any value ? don't we need a root tiny hyper-vocabulary so we can succeed with hypermedia and the "initial URI" concept ? edit Yeah why not RDF as the root vocabulary !

Read the article
Help to restructure my Doc/View more correctly

- by Harvey

Edited by OP. My program is in need of a lot of cleanup and restructuring. In another post I asked about leaving the MFC DocView framework and going to the WinProc & Message Loop way (what is that called for short?). Well at present I am thinking that I should clean up what I have in Doc View and perhaps later convert to non-MFC it that even makes sense. My Document class currently has almost nothing useful in it. I think a place to start is the InitInstance() function (posted below). In this part: POSITION pos=pDocTemplate->GetFirstDocPosition(); CLCWDoc *pDoc=(CLCWDoc *)pDocTemplate->GetNextDoc(pos); ASSERT_VALID(pDoc); POSITION vpos=pDoc->GetFirstViewPosition(); CChildView *pCV=(CChildView *)pDoc->GetNextView(vpos); This seem strange to me. I only have one doc and one view. I feel like I am going about it backwards with GetNextDoc() and GetNextView(). To try to use a silly analogy; it's like I have a book in my hand but I have to look up in it's index to find out what page the Title of the book is on. I'm tired of feeling embarrassed about my code. I either need correction or reassurance, or both. :) Also, all the miscellaneous items are in no particular order. I would like to rearrange them into an order that may be more standard, structured or straightforward. ALL suggestions welcome! BOOL CLCWApp::InitInstance() { InitCommonControls(); if(!AfxOleInit()) return FALSE; // Initialize the Toolbar dll. (Toolbar code by Nikolay Denisov.) InitGuiLibDLL(); // NOTE: insert GuiLib.dll into the resource chain SetRegistryKey(_T("Real Name Removed")); // Register document templates CSingleDocTemplate* pDocTemplate; pDocTemplate = new CSingleDocTemplate( IDR_MAINFRAME, RUNTIME_CLASS(CLCWDoc), RUNTIME_CLASS(CMainFrame), RUNTIME_CLASS(CChildView)); AddDocTemplate(pDocTemplate); // Parse command line for standard shell commands, DDE, file open CCmdLineInfo cmdInfo; ParseCommandLine(cmdInfo); // Dispatch commands specified on the command line // The window frame appears on the screen in here. if (!ProcessShellCommand(cmdInfo)) { AfxMessageBox("Failure processing Command Line"); return FALSE; } POSITION pos=pDocTemplate->GetFirstDocPosition(); CLCWDoc *pDoc=(CLCWDoc *)pDocTemplate->GetNextDoc(pos); ASSERT_VALID(pDoc); POSITION vpos=pDoc->GetFirstViewPosition(); CChildView *pCV=(CChildView *)pDoc->GetNextView(vpos); if(!cmdInfo.m_Fn1.IsEmpty() && !cmdInfo.m_Fn2.IsEmpty()) { pCV->OpenF1(cmdInfo.m_Fn1); pCV->OpenF2(cmdInfo.m_Fn2); pCV->DoCompare(); // Sends a paint message when complete } // enable file manager drag/drop and DDE Execute open m_pMainWnd->DragAcceptFiles(TRUE); m_pMainWnd->ShowWindow(SW_SHOWNORMAL); m_pMainWnd->UpdateWindow(); // paints the window background pCV->bDoSize=true; //Prevent a dozen useless size calculations return TRUE; } Thanks

Read the article
Oracle BI Server Modeling, Part 1- Designing a Query Factory

- by bob.ertl(at)oracle.com

Welcome to Oracle BI Development's BI Foundation blog, focused on helping you get the most value from your Oracle Business Intelligence Enterprise Edition (BI EE) platform deployments. In my first series of posts, I plan to show developers the concepts and best practices for modeling in the Common Enterprise Information Model (CEIM), the semantic layer of Oracle BI EE. In this segment, I will lay the groundwork for the modeling concepts. First, I will cover the big picture of how the BI Server fits into the system, and how the CEIM controls the query processing. Oracle BI EE Query Cycle The purpose of the Oracle BI Server is to bridge the gap between the presentation services and the data sources. There are typically a variety of data sources in a variety of technologies: relational, normalized transaction systems; relational star-schema data warehouses and marts; multidimensional analytic cubes and financial applications; flat files, Excel files, XML files, and so on. Business datasets can reside in a single type of source, or, most of the time, are spread across various types of sources. Presentation services users are generally business people who need to be able to query that set of sources without any knowledge of technologies, schemas, or how sources are organized in their company. They think of business analysis in terms of measures with specific calculations, hierarchical dimensions for breaking those measures down, and detailed reports of the business transactions themselves. Most of them create queries without knowing it, by picking a dashboard page and some filters. Others create their own analysis by selecting metrics and dimensional attributes, and possibly creating additional calculations. The BI Server bridges that gap from simple business terms to technical physical queries by exposing just the business focused measures and dimensional attributes that business people can use in their analyses and dashboards. After they make their selections and start the analysis, the BI Server plans the best way to query the data sources, writes the optimized sequence of physical queries to those sources, post-processes the results, and presents them to the client as a single result set suitable for tables, pivots and charts. The CEIM is a model that controls the processing of the BI Server. It provides the subject areas that presentation services exposes for business users to select simplified metrics and dimensional attributes for their analysis. It models the mappings to the physical data access, the calculations and logical transformations, and the data access security rules. The CEIM consists of metadata stored in the repository, authored by developers using the Administration Tool client. Presentation services and other query clients create their queries in BI EE's SQL-92 language, called Logical SQL or LSQL. The API simply uses ODBC or JDBC to pass the query to the BI Server. Presentation services writes the LSQL query in terms of the simplified objects presented to the users. The BI Server creates a query plan, and rewrites the LSQL into fully-detailed SQL or other languages suitable for querying the physical sources. For example, the LSQL on the left below was rewritten into the physical SQL for an Oracle 11g database on the right. Logical SQL Physical SQL SELECT "D0 Time"."T02 Per Name Month" saw_0, "D4 Product"."P01 Product" saw_1, "F2 Units"."2-01 Billed Qty (Sum All)" saw_2 FROM "Sample Sales" ORDER BY saw_0, saw_1 WITH SAWITH0 AS ( select T986.Per_Name_Month as c1, T879.Prod_Dsc as c2, sum(T835.Units) as c3, T879.Prod_Key as c4 from Product T879 /* A05 Product */ , Time_Mth T986 /* A08 Time Mth */ , FactsRev T835 /* A11 Revenue (Billed Time Join) */ where ( T835.Prod_Key = T879.Prod_Key and T835.Bill_Mth = T986.Row_Wid) group by T879.Prod_Dsc, T879.Prod_Key, T986.Per_Name_Month ) select SAWITH0.c1 as c1, SAWITH0.c2 as c2, SAWITH0.c3 as c3 from SAWITH0 order by c1, c2 Probably everybody reading this blog can write SQL or MDX. However, the trick in designing the CEIM is that you are modeling a query-generation factory. Rather than hand-crafting individual queries, you model behavior and relationships, thus configuring the BI Server machinery to manufacture millions of different queries in response to random user requests. This mass production requires a different mindset and approach than when you are designing individual SQL statements in tools such as Oracle SQL Developer, Oracle Hyperion Interactive Reporting (formerly Brio), or Oracle BI Publisher. The Structure of the Common Enterprise Information Model (CEIM) The CEIM has a unique structure specifically for modeling the relationships and behaviors that fill the gap from logical user requests to physical data source queries and back to the result. The model divides the functionality into three specialized layers, called Presentation, Business Model and Mapping, and Physical, as shown below. Presentation services clients can generally only see the presentation layer, and the objects in the presentation layer are normally the only ones used in the LSQL request. When a request comes into the BI Server from presentation services or another client, the relationships and objects in the model allow the BI Server to select the appropriate data sources, create a query plan, and generate the physical queries. That's the left to right flow in the diagram below. When the results come back from the data source queries, the right to left relationships in the model show how to transform the results and perform any final calculations and functions that could not be pushed down to the databases. Business Model Think of the business model as the heart of the CEIM you are designing. This is where you define the analytic behavior seen by the users, and the superset library of metric and dimension objects available to the user community as a whole. It also provides the baseline business-friendly names and user-readable dictionary. For these reasons, it is often called the "logical" model--it is a virtual database schema that persists no data, but can be queried as if it is a database. The business model always has a dimensional shape (more on this in future posts), and its simple shape and terminology hides the complexity of the source data models. Besides hiding complexity and normalizing terminology, this layer adds most of the analytic value, as well. This is where you define the rich, dimensional behavior of the metrics and complex business calculations, as well as the conformed dimensions and hierarchies. It contributes to the ease of use for business users, since the dimensional metric definitions apply in any context of filters and drill-downs, and the conformed dimensions enable dashboard-wide filters and guided analysis links that bring context along from one page to the next. The conformed dimensions also provide a key to hiding the complexity of many sources, including federation of different databases, behind the simple business model. Note that the expression language in this layer is LSQL, so that any expression can be rewritten into any data source's query language at run time. This is important for federation, where a given logical object can map to several different physical objects in different databases. It is also important to portability of the CEIM to different database brands, which is a key requirement for Oracle's BI Applications products. Your requirements process with your user community will mostly affect the business model. This is where you will define most of the things they specifically ask for, such as metric definitions. For this reason, many of the best-practice methodologies of our consulting partners start with the high-level definition of this layer. Physical Model The physical model connects the business model that meets your users' requirements to the reality of the data sources you have available. In the query factory analogy, think of the physical layer as the bill of materials for generating physical queries. Every schema, table, column, join, cube, hierarchy, etc., that will appear in any physical query manufactured at run time must be modeled here at design time. Each physical data source will have its own physical model, or "database" object in the CEIM. The shape of each physical model matches the shape of its physical source. In other words, if the source is normalized relational, the physical model will mimic that normalized shape. If it is a hypercube, the physical model will have a hypercube shape. If it is a flat file, it will have a denormalized tabular shape. To aid in query optimization, the physical layer also tracks the specifics of the database brand and release. This allows the BI Server to make the most of each physical source's distinct capabilities, writing queries in its syntax, and using its specific functions. This allows the BI Server to push processing work as deep as possible into the physical source, which minimizes data movement and takes full advantage of the database's own optimizer. For most data sources, native APIs are used to further optimize performance and functionality. The value of having a distinct separation between the logical (business) and physical models is encapsulation of the physical characteristics. This encapsulation is another enabler of packaged BI applications and federation. It is also key to hiding the complex shapes and relationships in the physical sources from the end users. Consider a routine drill-down in the business model: physically, it can require a drill-through where the first query is MDX to a multidimensional cube, followed by the drill-down query in SQL to a normalized relational database. The only difference from the user's point of view is that the 2nd query added a more detailed dimension level column - everything else was the same. Mappings Within the Business Model and Mapping Layer, the mappings provide the binding from each logical column and join in the dimensional business model, to each of the objects that can provide its data in the physical layer. When there is more than one option for a physical source, rules in the mappings are applied to the query context to determine which of the data sources should be hit, and how to combine their results if more than one is used. These rules specify aggregate navigation, vertical partitioning (fragmentation), and horizontal partitioning, any of which can be federated across multiple, heterogeneous sources. These mappings are usually the most sophisticated part of the CEIM. Presentation You might think of the presentation layer as a set of very simple relational-like views into the business model. Over ODBC/JDBC, they present a relational catalog consisting of databases, tables and columns. For business users, presentation services interprets these as subject areas, folders and columns, respectively. (Note that in 10g, subject areas were called presentation catalogs in the CEIM. In this blog, I will stick to 11g terminology.) Generally speaking, presentation services and other clients can query only these objects (there are exceptions for certain clients such as BI Publisher and Essbase Studio). The purpose of the presentation layer is to specialize the business model for different categories of users. Based on a user's role, they will be restricted to specific subject areas, tables and columns for security. The breakdown of the model into multiple subject areas organizes the content for users, and subjects superfluous to a particular business role can be hidden from that set of users. Customized names and descriptions can be used to override the business model names for a specific audience. Variables in the object names can be used for localization. For these reasons, you are better off thinking of the tables in the presentation layer as folders than as strict relational tables. The real semantics of tables and how they function is in the business model, and any grouping of columns can be included in any table in the presentation layer. In 11g, an LSQL query can also span multiple presentation subject areas, as long as they map to the same business model. Other Model Objects There are some objects that apply to multiple layers. These include security-related objects, such as application roles, users, data filters, and query limits (governors). There are also variables you can use in parameters and expressions, and initialization blocks for loading their initial values on a static or user session basis. Finally, there are Multi-User Development (MUD) projects for developers to check out units of work, and objects for the marketing feature used by our packaged customer relationship management (CRM) software. The Query Factory At this point, you should have a grasp on the query factory concept. When developing the CEIM model, you are configuring the BI Server to automatically manufacture millions of queries in response to random user requests. You do this by defining the analytic behavior in the business model, mapping that to the physical data sources, and exposing it through the presentation layer's role-based subject areas. While configuring mass production requires a different mindset than when you hand-craft individual SQL or MDX statements, it builds on the modeling and query concepts you already understand. The following posts in this series will walk through the CEIM modeling concepts and best practices in detail. We will initially review dimensional concepts so you can understand the business model, and then present a pattern-based approach to learning the mappings from a variety of physical schema shapes and deployments to the dimensional model. Along the way, we will also present the dimensional calculation template, and learn how to configure the many additivity patterns.

Read the article
Inside the Concurrent Collections: ConcurrentDictionary

- by Simon Cooper

Using locks to implement a thread-safe collection is rather like using a sledgehammer - unsubtle, easy to understand, and tends to make any other tool redundant. Unlike the previous two collections I looked at, ConcurrentStack and ConcurrentQueue, ConcurrentDictionary uses locks quite heavily. However, it is careful to wield locks only where necessary to ensure that concurrency is maximised. This will, by necessity, be a higher-level look than my other posts in this series, as there is quite a lot of code and logic in ConcurrentDictionary. Therefore, I do recommend that you have ConcurrentDictionary open in a decompiler to have a look at all the details that I skip over. The problem with locks There's several things to bear in mind when using locks, as encapsulated by the lock keyword in C# and the System.Threading.Monitor class in .NET (if you're unsure as to what lock does in C#, I briefly covered it in my first post in the series): Locks block threads The most obvious problem is that threads waiting on a lock can't do any work at all. No preparatory work, no 'optimistic' work like in ConcurrentQueue and ConcurrentStack, nothing. It sits there, waiting to be unblocked. This is bad if you're trying to maximise concurrency. Locks are slow Whereas most of the methods on the Interlocked class can be compiled down to a single CPU instruction, ensuring atomicity at the hardware level, taking out a lock requires some heavy lifting by the CLR and the operating system. There's quite a bit of work required to take out a lock, block other threads, and wake them up again. If locks are used heavily, this impacts performance. Deadlocks When using locks there's always the possibility of a deadlock - two threads, each holding a lock, each trying to aquire the other's lock. Fortunately, this can be avoided with careful programming and structured lock-taking, as we'll see. So, it's important to minimise where locks are used to maximise the concurrency and performance of the collection. Implementation As you might expect, ConcurrentDictionary is similar in basic implementation to the non-concurrent Dictionary, which I studied in a previous post. I'll be using some concepts introduced there, so I recommend you have a quick read of it. So, if you were implementing a thread-safe dictionary, what would you do? The naive implementation is to simply have a single lock around all methods accessing the dictionary. This would work, but doesn't allow much concurrency. Fortunately, the bucketing used by Dictionary allows a simple but effective improvement to this - one lock per bucket. This allows different threads modifying different buckets to do so in parallel. Any thread making changes to the contents of a bucket takes the lock for that bucket, ensuring those changes are thread-safe. The method that maps each bucket to a lock is the GetBucketAndLockNo method: private void GetBucketAndLockNo( int hashcode, out int bucketNo, out int lockNo, int bucketCount) { // the bucket number is the hashcode (without the initial sign bit) // modulo the number of buckets bucketNo = (hashcode & 0x7fffffff) % bucketCount; // and the lock number is the bucket number modulo the number of locks lockNo = bucketNo % m_locks.Length; } However, this does require some changes to how the buckets are implemented. The 'implicit' linked list within a single backing array used by the non-concurrent Dictionary adds a dependency between separate buckets, as every bucket uses the same backing array. Instead, ConcurrentDictionary uses a strict linked list on each bucket: This ensures that each bucket is entirely separate from all other buckets; adding or removing an item from a bucket is independent to any changes to other buckets. Modifying the dictionary All the operations on the dictionary follow the same basic pattern: void AlterBucket(TKey key, ...) { int bucketNo, lockNo; 1: GetBucketAndLockNo( key.GetHashCode(), out bucketNo, out lockNo, m_buckets.Length); 2: lock (m_locks[lockNo]) { 3: Node headNode = m_buckets[bucketNo]; 4: Mutate the node linked list as appropriate } } For example, when adding another entry to the dictionary, you would iterate through the linked list to check whether the key exists already, and add the new entry as the head node. When removing items, you would find the entry to remove (if it exists), and remove the node from the linked list. Adding, updating, and removing items all follow this pattern. Performance issues There is a problem we have to address at this point. If the number of buckets in the dictionary is fixed in the constructor, then the performance will degrade from O(1) to O(n) when a large number of items are added to the dictionary. As more and more items get added to the linked lists in each bucket, the lookup operations will spend most of their time traversing a linear linked list. To fix this, the buckets array has to be resized once the number of items in each bucket has gone over a certain limit. (In ConcurrentDictionary this limit is when the size of the largest bucket is greater than the number of buckets for each lock. This check is done at the end of the TryAddInternal method.) Resizing the bucket array and re-hashing everything affects every bucket in the collection. Therefore, this operation needs to take out every lock in the collection. Taking out mutiple locks at once inevitably summons the spectre of the deadlock; two threads each hold a lock, and each trying to acquire the other lock. How can we eliminate this? Simple - ensure that threads never try to 'swap' locks in this fashion. When taking out multiple locks, always take them out in the same order, and always take out all the locks you need before starting to release them. In ConcurrentDictionary, this is controlled by the AcquireLocks, AcquireAllLocks and ReleaseLocks methods. Locks are always taken out and released in the order they are in the m_locks array, and locks are all released right at the end of the method in a finally block. At this point, it's worth pointing out that the locks array is never re-assigned, even when the buckets array is increased in size. The number of locks is fixed in the constructor by the concurrencyLevel parameter. This simplifies programming the locks; you don't have to check if the locks array has changed or been re-assigned before taking out a lock object. And you can be sure that when a thread takes out a lock, another thread isn't going to re-assign the lock array. This would create a new series of lock objects, thus allowing another thread to ignore the existing locks (and any threads controlling them), breaking thread-safety. Consequences of growing the array Just because we're using locks doesn't mean that race conditions aren't a problem. We can see this by looking at the GrowTable method. The operation of this method can be boiled down to: private void GrowTable(Node[] buckets) { try { 1: Acquire first lock in the locks array // this causes any other thread trying to take out // all the locks to block because the first lock in the array // is always the one taken out first // check if another thread has already resized the buckets array // while we were waiting to acquire the first lock 2: if (buckets != m_buckets) return; 3: Calculate the new size of the backing array 4: Node[] array = new array[size]; 5: Acquire all the remaining locks 6: Re-hash the contents of the existing buckets into array 7: m_buckets = array; } finally { 8: Release all locks } } As you can see, there's already a check for a race condition at step 2, for the case when the GrowTable method is called twice in quick succession on two separate threads. One will successfully resize the buckets array (blocking the second in the meantime), when the second thread is unblocked it'll see that the array has already been resized & exit without doing anything. There is another case we need to consider; looking back at the AlterBucket method above, consider the following situation: Thread 1 calls AlterBucket; step 1 is executed to get the bucket and lock numbers. Thread 2 calls GrowTable and executes steps 1-5; thread 1 is blocked when it tries to take out the lock in step 2. Thread 2 re-hashes everything, re-assigns the buckets array, and releases all the locks (steps 6-8). Thread 1 is unblocked and continues executing, but the calculated bucket and lock numbers are no longer valid. Between calculating the correct bucket and lock number and taking out the lock, another thread has changed where everything is. Not exactly thread-safe. Well, a similar problem was solved in ConcurrentStack and ConcurrentQueue by storing a local copy of the state, doing the necessary calculations, then checking if that state is still valid. We can use a similar idea here: void AlterBucket(TKey key, ...) { while (true) { Node[] buckets = m_buckets; int bucketNo, lockNo; GetBucketAndLockNo( key.GetHashCode(), out bucketNo, out lockNo, buckets.Length); lock (m_locks[lockNo]) { // if the state has changed, go back to the start if (buckets != m_buckets) continue; Node headNode = m_buckets[bucketNo]; Mutate the node linked list as appropriate } break; } } TryGetValue and GetEnumerator And so, finally, we get onto TryGetValue and GetEnumerator. I've left these to the end because, well, they don't actually use any locks. How can this be? Whenever you change a bucket, you need to take out the corresponding lock, yes? Indeed you do. However, it is important to note that TryGetValue and GetEnumerator don't actually change anything. Just as immutable objects are, by definition, thread-safe, read-only operations don't need to take out a lock because they don't change anything. All lockless methods can happily iterate through the buckets and linked lists without worrying about locking anything. However, this does put restrictions on how the other methods operate. Because there could be another thread in the middle of reading the dictionary at any time (even if a lock is taken out), the dictionary has to be in a valid state at all times. Every change to state has to be made visible to other threads in a single atomic operation (all relevant variables are marked volatile to help with this). This restriction ensures that whatever the reading threads are doing, they never read the dictionary in an invalid state (eg items that should be in the collection temporarily removed from the linked list, or reading a node that has had it's key & value removed before the node itself has been removed from the linked list). Fortunately, all the operations needed to change the dictionary can be done in that way. Bucket resizes are made visible when the new array is assigned back to the m_buckets variable. Any additions or modifications to a node are done by creating a new node, then splicing it into the existing list using a single variable assignment. Node removals are simply done by re-assigning the node's m_next pointer. Because the dictionary can be changed by another thread during execution of the lockless methods, the GetEnumerator method is liable to return dirty reads - changes made to the dictionary after GetEnumerator was called, but before the enumeration got to that point in the dictionary. It's worth listing at this point which methods are lockless, and which take out all the locks in the dictionary to ensure they get a consistent view of the dictionary: Lockless: TryGetValue GetEnumerator The indexer getter ContainsKey Takes out every lock (lockfull?): Count IsEmpty Keys Values CopyTo ToArray Concurrent principles That covers the overall implementation of ConcurrentDictionary. I haven't even begun to scratch the surface of this sophisticated collection. That I leave to you. However, we've looked at enough to be able to extract some useful principles for concurrent programming: Partitioning When using locks, the work is partitioned into independant chunks, each with its own lock. Each partition can then be modified concurrently to other partitions. Ordered lock-taking When a method does need to control the entire collection, locks are taken and released in a fixed order to prevent deadlocks. Lockless reads Read operations that don't care about dirty reads don't take out any lock; the rest of the collection is implemented so that any reading thread always has a consistent view of the collection. That leads us to the final collection in this little series - ConcurrentBag. Lacking a non-concurrent analogy, it is quite different to any other collection in the class libraries. Prepare your thinking hats!

Read the article
How to count each digit in a range of integers?

- by Carlos Gutiérrez

Imagine you sell those metallic digits used to number houses, locker doors, hotel rooms, etc. You need to find how many of each digit to ship when your customer needs to number doors/houses: 1 to 100 51 to 300 1 to 2,000 with zeros to the left The obvious solution is to do a loop from the first to the last number, convert the counter to a string with or without zeros to the left, extract each digit and use it as an index to increment an array of 10 integers. I wonder if there is a better way to solve this, without having to loop through the entire integers range. Solutions in any language or pseudocode are welcome. Edit: Answers review John at CashCommons and Wayne Conrad comment that my current approach is good and fast enough. Let me use a silly analogy: If you were given the task of counting the squares in a chess board in less than 1 minute, you could finish the task by counting the squares one by one, but a better solution is to count the sides and do a multiplication, because you later may be asked to count the tiles in a building. Alex Reisner points to a very interesting mathematical law that, unfortunately, doesn’t seem to be relevant to this problem. Andres suggests the same algorithm I’m using, but extracting digits with %10 operations instead of substrings. John at CashCommons and phord propose pre-calculating the digits required and storing them in a lookup table or, for raw speed, an array. This could be a good solution if we had an absolute, unmovable, set in stone, maximum integer value. I’ve never seen one of those. High-Performance Mark and strainer computed the needed digits for various ranges. The result for one millon seems to indicate there is a proportion, but the results for other number show different proportions. strainer found some formulas that may be used to count digit for number which are a power of ten. Robert Harvey had a very interesting experience posting the question at MathOverflow. One of the math guys wrote a solution using mathematical notation. Aaronaught developed and tested a solution using mathematics. After posting it he reviewed the formulas originated from Math Overflow and found a flaw in it (point to Stackoverflow :). noahlavine developed an algorithm and presented it in pseudocode. A new solution After reading all the answers, and doing some experiments, I found that for a range of integer from 1 to 10n-1: For digits 1 to 9, n*10(n-1) pieces are needed For digit 0, if not using leading zeros, n*10n-1 - ((10n-1) / 9) are needed For digit 0, if using leading zeros, n*10n-1 - n are needed The first formula was found by strainer (and probably by others), and I found the other two by trial and error (but they may be included in other answers). For example, if n = 6, range is 1 to 999,999: For digits 1 to 9 we need 6*105 = 600,000 of each one For digit 0, without leading zeros, we need 6*105 – (106-1)/9 = 600,000 - 111,111 = 488,889 For digit 0, with leading zeros, we need 6*105 – 6 = 599,994 These numbers can be checked using High-Performance Mark results. Using these formulas, I improved the original algorithm. It still loops from the first to the last number in the range of integers, but, if it finds a number which is a power of ten, it uses the formulas to add to the digits count the quantity for a full range of 1 to 9 or 1 to 99 or 1 to 999 etc. Here's the algorithm in pseudocode: integer First,Last //First and last number in the range integer Number //Current number in the loop integer Power //Power is the n in 10^n in the formulas integer Nines //Nines is the resut of 10^n - 1, 10^5 - 1 = 99999 integer Prefix //First digits in a number. For 14,200, prefix is 142 array 0..9 Digits //Will hold the count for all the digits FOR Number = First TO Last CALL TallyDigitsForOneNumber WITH Number,1 //Tally the count of each digit //in the number, increment by 1 //Start of optimization. Comments are for Number = 1,000 and Last = 8,000. Power = Zeros at the end of number //For 1,000, Power = 3 IF Power 0 //The number ends in 0 00 000 etc Nines = 10^Power-1 //Nines = 10^3 - 1 = 1000 - 1 = 999 IF Number+Nines <= Last //If 1,000+999 < 8,000, add a full set Digits[0-9] += Power*10^(Power-1) //Add 3*10^(3-1) = 300 to digits 0 to 9 Digits[0] -= -Power //Adjust digit 0 (leading zeros formula) Prefix = First digits of Number //For 1000, prefix is 1 CALL TallyDigitsForOneNumber WITH Prefix,Nines //Tally the count of each //digit in prefix, //increment by 999 Number += Nines //Increment the loop counter 999 cycles ENDIF ENDIF //End of optimization ENDFOR SUBROUTINE TallyDigitsForOneNumber PARAMS Number,Count REPEAT Digits [ Number % 10 ] += Count Number = Number / 10 UNTIL Number = 0 For example, for range 786 to 3,021, the counter will be incremented: By 1 from 786 to 790 (5 cycles) By 9 from 790 to 799 (1 cycle) By 1 from 799 to 800 By 99 from 800 to 899 By 1 from 899 to 900 By 99 from 900 to 999 By 1 from 999 to 1000 By 999 from 1000 to 1999 By 1 from 1999 to 2000 By 999 from 2000 to 2999 By 1 from 2999 to 3000 By 1 from 3000 to 3010 (10 cycles) By 9 from 3010 to 3019 (1 cycle) By 1 from 3019 to 3021 (2 cycles) Total: 28 cycles Without optimization: 2,235 cycles Note that this algorithm solves the problem without leading zeros. To use it with leading zeros, I used a hack: If range 700 to 1,000 with leading zeros is needed, use the algorithm for 10,700 to 11,000 and then substract 1,000 - 700 = 300 from the count of digit 1. Benchmark and Source code I tested the original approach, the same approach using %10 and the new solution for some large ranges, with these results: Original 104.78 seconds With %10 83.66 With Powers of Ten 0.07 A screenshot of the benchmark application: If you would like to see the full source code or run the benchmark, use these links: Complete Source code (in Clarion): http://sca.mx/ftp/countdigits.txt Compilable project and win32 exe: http://sca.mx/ftp/countdigits.zip Accepted answer noahlavine solution may be correct, but l just couldn’t follow the pseudo code, I think there are some details missing or not completely explained. Aaronaught solution seems to be correct, but the code is just too complex for my taste. I accepted strainer’s answer, because his line of thought guided me to develop this new solution.

Read the article
A Taxonomy of Numerical Methods v1

- by JoshReuben

Numerical Analysis – When, What, (but not how) Once you understand the Math & know C++, Numerical Methods are basically blocks of iterative & conditional math code. I found the real trick was seeing the forest for the trees – knowing which method to use for which situation. Its pretty easy to get lost in the details – so I’ve tried to organize these methods in a way that I can quickly look this up. I’ve included links to detailed explanations and to C++ code examples. I’ve tried to classify Numerical methods in the following broad categories: Solving Systems of Linear Equations Solving Non-Linear Equations Iteratively Interpolation Curve Fitting Optimization Numerical Differentiation & Integration Solving ODEs Boundary Problems Solving EigenValue problems Enjoy – I did ! Solving Systems of Linear Equations Overview Solve sets of algebraic equations with x unknowns The set is commonly in matrix form Gauss-Jordan Elimination http://en.wikipedia.org/wiki/Gauss%E2%80%93Jordan_elimination C++: http://www.codekeep.net/snippets/623f1923-e03c-4636-8c92-c9dc7aa0d3c0.aspx Produces solution of the equations & the coefficient matrix Efficient, stable 2 steps: · Forward Elimination – matrix decomposition: reduce set to triangular form (0s below the diagonal) or row echelon form. If degenerate, then there is no solution · Backward Elimination –write the original matrix as the product of ints inverse matrix & its reduced row-echelon matrix à reduce set to row canonical form & use back-substitution to find the solution to the set Elementary ops for matrix decomposition: · Row multiplication · Row switching · Add multiples of rows to other rows Use pivoting to ensure rows are ordered for achieving triangular form LU Decomposition http://en.wikipedia.org/wiki/LU_decomposition C++: http://ganeshtiwaridotcomdotnp.blogspot.co.il/2009/12/c-c-code-lu-decomposition-for-solving.html Represent the matrix as a product of lower & upper triangular matrices A modified version of GJ Elimination Advantage – can easily apply forward & backward elimination to solve triangular matrices Techniques: · Doolittle Method – sets the L matrix diagonal to unity · Crout Method - sets the U matrix diagonal to unity Note: both the L & U matrices share the same unity diagonal & can be stored compactly in the same matrix Gauss-Seidel Iteration http://en.wikipedia.org/wiki/Gauss%E2%80%93Seidel_method C++: http://www.nr.com/forum/showthread.php?t=722 Transform the linear set of equations into a single equation & then use numerical integration (as integration formulas have Sums, it is implemented iteratively). an optimization of Gauss-Jacobi: 1.5 times faster, requires 0.25 iterations to achieve the same tolerance Solving Non-Linear Equations Iteratively find roots of polynomials – there may be 0, 1 or n solutions for an n order polynomial use iterative techniques Iterative methods · used when there are no known analytical techniques · Requires set functions to be continuous & differentiable · Requires an initial seed value – choice is critical to convergence à conduct multiple runs with different starting points & then select best result · Systematic - iterate until diminishing returns, tolerance or max iteration conditions are met · bracketing techniques will always yield convergent solutions, non-bracketing methods may fail to converge Incremental method if a nonlinear function has opposite signs at 2 ends of a small interval x1 & x2, then there is likely to be a solution in their interval – solutions are detected by evaluating a function over interval steps, for a change in sign, adjusting the step size dynamically. Limitations – can miss closely spaced solutions in large intervals, cannot detect degenerate (coinciding) solutions, limited to functions that cross the x-axis, gives false positives for singularities Fixed point method http://en.wikipedia.org/wiki/Fixed-point_iteration C++: http://books.google.co.il/books?id=weYj75E_t6MC&pg=PA79&lpg=PA79&dq=fixed+point+method++c%2B%2B&source=bl&ots=LQ-5P_taoC&sig=lENUUIYBK53tZtTwNfHLy5PEWDk&hl=en&sa=X&ei=wezDUPW1J5DptQaMsIHQCw&redir_esc=y#v=onepage&q=fixed%20point%20method%20%20c%2B%2B&f=false Algebraically rearrange a solution to isolate a variable then apply incremental method Bisection method http://en.wikipedia.org/wiki/Bisection_method C++: http://numericalcomputing.wordpress.com/category/algorithms/ Bracketed - Select an initial interval, keep bisecting it ad midpoint into sub-intervals and then apply incremental method on smaller & smaller intervals – zoom in Adv: unaffected by function gradient à reliable Disadv: slow convergence False Position Method http://en.wikipedia.org/wiki/False_position_method C++: http://www.dreamincode.net/forums/topic/126100-bisection-and-false-position-methods/ Bracketed - Select an initial interval , & use the relative value of function at interval end points to select next sub-intervals (estimate how far between the end points the solution might be & subdivide based on this) Newton-Raphson method http://en.wikipedia.org/wiki/Newton's_method C++: http://www-users.cselabs.umn.edu/classes/Summer-2012/csci1113/index.php?page=./newt3 Also known as Newton's method Convenient, efficient Not bracketed – only a single initial guess is required to start iteration – requires an analytical expression for the first derivative of the function as input. Evaluates the function & its derivative at each step. Can be extended to the Newton MutiRoot method for solving multiple roots Can be easily applied to an of n-coupled set of non-linear equations – conduct a Taylor Series expansion of a function, dropping terms of order n, rewrite as a Jacobian matrix of PDs & convert to simultaneous linear equations !!! Secant Method http://en.wikipedia.org/wiki/Secant_method C++: http://forum.vcoderz.com/showthread.php?p=205230 Unlike N-R, can estimate first derivative from an initial interval (does not require root to be bracketed) instead of inputting it Since derivative is approximated, may converge slower. Is fast in practice as it does not have to evaluate the derivative at each step. Similar implementation to False Positive method Birge-Vieta Method http://mat.iitm.ac.in/home/sryedida/public_html/caimna/transcendental/polynomial%20methods/bv%20method.html C++: http://books.google.co.il/books?id=cL1boM2uyQwC&pg=SA3-PA51&lpg=SA3-PA51&dq=Birge-Vieta+Method+c%2B%2B&source=bl&ots=QZmnDTK3rC&sig=BPNcHHbpR_DKVoZXrLi4nVXD-gg&hl=en&sa=X&ei=R-_DUK2iNIjzsgbE5ID4Dg&redir_esc=y#v=onepage&q=Birge-Vieta%20Method%20c%2B%2B&f=false combines Horner's method of polynomial evaluation (transforming into lesser degree polynomials that are more computationally efficient to process) with Newton-Raphson to provide a computational speed-up Interpolation Overview Construct new data points for as close as possible fit within range of a discrete set of known points (that were obtained via sampling, experimentation) Use Taylor Series Expansion of a function f(x) around a specific value for x Linear Interpolation http://en.wikipedia.org/wiki/Linear_interpolation C++: http://www.hamaluik.com/?p=289 Straight line between 2 points à concatenate interpolants between each pair of data points Bilinear Interpolation http://en.wikipedia.org/wiki/Bilinear_interpolation C++: http://supercomputingblog.com/graphics/coding-bilinear-interpolation/2/ Extension of the linear function for interpolating functions of 2 variables – perform linear interpolation first in 1 direction, then in another. Used in image processing – e.g. texture mapping filter. Uses 4 vertices to interpolate a value within a unit cell. Lagrange Interpolation http://en.wikipedia.org/wiki/Lagrange_polynomial C++: http://www.codecogs.com/code/maths/approximation/interpolation/lagrange.php For polynomials Requires recomputation for all terms for each distinct x value – can only be applied for small number of nodes Numerically unstable Barycentric Interpolation http://epubs.siam.org/doi/pdf/10.1137/S0036144502417715 C++: http://www.gamedev.net/topic/621445-barycentric-coordinates-c-code-check/ Rearrange the terms in the equation of the Legrange interpolation by defining weight functions that are independent of the interpolated value of x Newton Divided Difference Interpolation http://en.wikipedia.org/wiki/Newton_polynomial C++: http://jee-appy.blogspot.co.il/2011/12/newton-divided-difference-interpolation.html Hermite Divided Differences: Interpolation polynomial approximation for a given set of data points in the NR form - divided differences are used to approximately calculate the various differences. For a given set of 3 data points , fit a quadratic interpolant through the data Bracketed functions allow Newton divided differences to be calculated recursively Difference table Cubic Spline Interpolation http://en.wikipedia.org/wiki/Spline_interpolation C++: https://www.marcusbannerman.co.uk/index.php/home/latestarticles/42-articles/96-cubic-spline-class.html Spline is a piecewise polynomial Provides smoothness – for interpolations with significantly varying data Use weighted coefficients to bend the function to be smooth & its 1st & 2nd derivatives are continuous through the edge points in the interval Curve Fitting A generalization of interpolating whereby given data points may contain noise à the curve does not necessarily pass through all the points Least Squares Fit http://en.wikipedia.org/wiki/Least_squares C++: http://www.ccas.ru/mmes/educat/lab04k/02/least-squares.c Residual – difference between observed value & expected value Model function is often chosen as a linear combination of the specified functions Determines: A) The model instance in which the sum of squared residuals has the least value B) param values for which model best fits data Straight Line Fit Linear correlation between independent variable and dependent variable Linear Regression http://en.wikipedia.org/wiki/Linear_regression C++: http://www.oocities.org/david_swaim/cpp/linregc.htm Special case of statistically exact extrapolation Leverage least squares Given a basis function, the sum of the residuals is determined and the corresponding gradient equation is expressed as a set of normal linear equations in matrix form that can be solved (e.g. using LU Decomposition) Can be weighted - Drop the assumption that all errors have the same significance –-> confidence of accuracy is different for each data point. Fit the function closer to points with higher weights Polynomial Fit - use a polynomial basis function Moving Average http://en.wikipedia.org/wiki/Moving_average C++: http://www.codeproject.com/Articles/17860/A-Simple-Moving-Average-Algorithm Used for smoothing (cancel fluctuations to highlight longer-term trends & cycles), time series data analysis, signal processing filters Replace each data point with average of neighbors. Can be simple (SMA), weighted (WMA), exponential (EMA). Lags behind latest data points – extra weight can be given to more recent data points. Weights can decrease arithmetically or exponentially according to distance from point. Parameters: smoothing factor, period, weight basis Optimization Overview Given function with multiple variables, find Min (or max by minimizing –f(x)) Iterative approach Efficient, but not necessarily reliable Conditions: noisy data, constraints, non-linear models Detection via sign of first derivative - Derivative of saddle points will be 0 Local minima Bisection method Similar method for finding a root for a non-linear equation Start with an interval that contains a minimum Golden Search method http://en.wikipedia.org/wiki/Golden_section_search C++: http://www.codecogs.com/code/maths/optimization/golden.php Bisect intervals according to golden ratio 0.618.. Achieves reduction by evaluating a single function instead of 2 Newton-Raphson Method Brent method http://en.wikipedia.org/wiki/Brent's_method C++: http://people.sc.fsu.edu/~jburkardt/cpp_src/brent/brent.cpp Based on quadratic or parabolic interpolation – if the function is smooth & parabolic near to the minimum, then a parabola fitted through any 3 points should approximate the minima – fails when the 3 points are collinear , in which case the denominator is 0 Simplex Method http://en.wikipedia.org/wiki/Simplex_algorithm C++: http://www.codeguru.com/cpp/article.php/c17505/Simplex-Optimization-Algorithm-and-Implemetation-in-C-Programming.htm Find the global minima of any multi-variable function Direct search – no derivatives required At each step it maintains a non-degenerative simplex – a convex hull of n+1 vertices. Obtains the minimum for a function with n variables by evaluating the function at n-1 points, iteratively replacing the point of worst result with the point of best result, shrinking the multidimensional simplex around the best point. Point replacement involves expanding & contracting the simplex near the worst value point to determine a better replacement point Oscillation can be avoided by choosing the 2nd worst result Restart if it gets stuck Parameters: contraction & expansion factors Simulated Annealing http://en.wikipedia.org/wiki/Simulated_annealing C++: http://code.google.com/p/cppsimulatedannealing/ Analogy to heating & cooling metal to strengthen its structure Stochastic method – apply random permutation search for global minima - Avoid entrapment in local minima via hill climbing Heating schedule - Annealing schedule params: temperature, iterations at each temp, temperature delta Cooling schedule – can be linear, step-wise or exponential Differential Evolution http://en.wikipedia.org/wiki/Differential_evolution C++: http://www.amichel.com/de/doc/html/ More advanced stochastic methods analogous to biological processes: Genetic algorithms, evolution strategies Parallel direct search method against multiple discrete or continuous variables Initial population of variable vectors chosen randomly – if weighted difference vector of 2 vectors yields a lower objective function value then it replaces the comparison vector Many params: #parents, #variables, step size, crossover constant etc Convergence is slow – many more function evaluations than simulated annealing Numerical Differentiation Overview 2 approaches to finite difference methods: · A) approximate function via polynomial interpolation then differentiate · B) Taylor series approximation – additionally provides error estimate Finite Difference methods http://en.wikipedia.org/wiki/Finite_difference_method C++: http://www.wpi.edu/Pubs/ETD/Available/etd-051807-164436/unrestricted/EAMPADU.pdf Find differences between high order derivative values - Approximate differential equations by finite differences at evenly spaced data points Based on forward & backward Taylor series expansion of f(x) about x plus or minus multiples of delta h. Forward / backward difference - the sums of the series contains even derivatives and the difference of the series contains odd derivatives – coupled equations that can be solved. Provide an approximation of the derivative within a O(h^2) accuracy There is also central difference & extended central difference which has a O(h^4) accuracy Richardson Extrapolation http://en.wikipedia.org/wiki/Richardson_extrapolation C++: http://mathscoding.blogspot.co.il/2012/02/introduction-richardson-extrapolation.html A sequence acceleration method applied to finite differences Fast convergence, high accuracy O(h^4) Derivatives via Interpolation Cannot apply Finite Difference method to discrete data points at uneven intervals – so need to approximate the derivative of f(x) using the derivative of the interpolant via 3 point Lagrange Interpolation Note: the higher the order of the derivative, the lower the approximation precision Numerical Integration Estimate finite & infinite integrals of functions More accurate procedure than numerical differentiation Use when it is not possible to obtain an integral of a function analytically or when the function is not given, only the data points are Newton Cotes Methods http://en.wikipedia.org/wiki/Newton%E2%80%93Cotes_formulas C++: http://www.siafoo.net/snippet/324 For equally spaced data points Computationally easy – based on local interpolation of n rectangular strip areas that is piecewise fitted to a polynomial to get the sum total area Evaluate the integrand at n+1 evenly spaced points – approximate definite integral by Sum Weights are derived from Lagrange Basis polynomials Leverage Trapezoidal Rule for default 2nd formulas, Simpson 1/3 Rule for substituting 3 point formulas, Simpson 3/8 Rule for 4 point formulas. For 4 point formulas use Bodes Rule. Higher orders obtain more accurate results Trapezoidal Rule uses simple area, Simpsons Rule replaces the integrand f(x) with a quadratic polynomial p(x) that uses the same values as f(x) for its end points, but adds a midpoint Romberg Integration http://en.wikipedia.org/wiki/Romberg's_method C++: http://code.google.com/p/romberg-integration/downloads/detail?name=romberg.cpp&can=2&q= Combines trapezoidal rule with Richardson Extrapolation Evaluates the integrand at equally spaced points The integrand must have continuous derivatives Each R(n,m) extrapolation uses a higher order integrand polynomial replacement rule (zeroth starts with trapezoidal) à a lower triangular matrix set of equation coefficients where the bottom right term has the most accurate approximation. The process continues until the difference between 2 successive diagonal terms becomes sufficiently small. Gaussian Quadrature http://en.wikipedia.org/wiki/Gaussian_quadrature C++: http://www.alglib.net/integration/gaussianquadratures.php Data points are chosen to yield best possible accuracy – requires fewer evaluations Ability to handle singularities, functions that are difficult to evaluate The integrand can include a weighting function determined by a set of orthogonal polynomials. Points & weights are selected so that the integrand yields the exact integral if f(x) is a polynomial of degree <= 2n+1 Techniques (basically different weighting functions): · Gauss-Legendre Integration w(x)=1 · Gauss-Laguerre Integration w(x)=e^-x · Gauss-Hermite Integration w(x)=e^-x^2 · Gauss-Chebyshev Integration w(x)= 1 / Sqrt(1-x^2) Solving ODEs Use when high order differential equations cannot be solved analytically Evaluated under boundary conditions RK for systems – a high order differential equation can always be transformed into a coupled first order system of equations Euler method http://en.wikipedia.org/wiki/Euler_method C++: http://rosettacode.org/wiki/Euler_method First order Runge–Kutta method. Simple recursive method – given an initial value, calculate derivative deltas. Unstable & not very accurate (O(h) error) – not used in practice A first-order method - the local error (truncation error per step) is proportional to the square of the step size, and the global error (error at a given time) is proportional to the step size In evolving solution between data points xn & xn+1, only evaluates derivatives at beginning of interval xn à asymmetric at boundaries Higher order Runge Kutta http://en.wikipedia.org/wiki/Runge%E2%80%93Kutta_methods C++: http://www.dreamincode.net/code/snippet1441.htm 2nd & 4th order RK - Introduces parameterized midpoints for more symmetric solutions à accuracy at higher computational cost Adaptive RK – RK-Fehlberg – estimate the truncation at each integration step & automatically adjust the step size to keep error within prescribed limits. At each step 2 approximations are compared – if in disagreement to a specific accuracy, the step size is reduced Boundary Value Problems Where solution of differential equations are located at 2 different values of the independent variable x à more difficult, because cannot just start at point of initial value – there may not be enough starting conditions available at the end points to produce a unique solution An n-order equation will require n boundary conditions – need to determine the missing n-1 conditions which cause the given conditions at the other boundary to be satisfied Shooting Method http://en.wikipedia.org/wiki/Shooting_method C++: http://ganeshtiwaridotcomdotnp.blogspot.co.il/2009/12/c-c-code-shooting-method-for-solving.html Iteratively guess the missing values for one end & integrate, then inspect the discrepancy with the boundary values of the other end to adjust the estimate Given the starting boundary values u1 & u2 which contain the root u, solve u given the false position method (solving the differential equation as an initial value problem via 4th order RK), then use u to solve the differential equations. Finite Difference Method For linear & non-linear systems Higher order derivatives require more computational steps – some combinations for boundary conditions may not work though Improve the accuracy by increasing the number of mesh points Solving EigenValue Problems An eigenvalue can substitute a matrix when doing matrix multiplication à convert matrix multiplication into a polynomial EigenValue For a given set of equations in matrix form, determine what are the solution eigenvalue & eigenvectors Similar Matrices - have same eigenvalues. Use orthogonal similarity transforms to reduce a matrix to diagonal form from which eigenvalue(s) & eigenvectors can be computed iteratively Jacobi method http://en.wikipedia.org/wiki/Jacobi_method C++: http://people.sc.fsu.edu/~jburkardt/classes/acs2_2008/openmp/jacobi/jacobi.html Robust but Computationally intense – use for small matrices < 10x10 Power Iteration http://en.wikipedia.org/wiki/Power_iteration For any given real symmetric matrix, generate the largest single eigenvalue & its eigenvectors Simplest method – does not compute matrix decomposition à suitable for large, sparse matrices Inverse Iteration Variation of power iteration method – generates the smallest eigenvalue from the inverse matrix Rayleigh Method http://en.wikipedia.org/wiki/Rayleigh's_method_of_dimensional_analysis Variation of power iteration method Rayleigh Quotient Method Variation of inverse iteration method Matrix Tri-diagonalization Method Use householder algorithm to reduce an NxN symmetric matrix to a tridiagonal real symmetric matrix vua N-2 orthogonal transforms Whats Next Outside of Numerical Methods there are lots of different types of algorithms that I’ve learned over the decades: Data Mining – (I covered this briefly in a previous post: http://geekswithblogs.net/JoshReuben/archive/2007/12/31/ssas-dm-algorithms.aspx ) Search & Sort Routing Problem Solving Logical Theorem Proving Planning Probabilistic Reasoning Machine Learning Solvers (eg MIP) Bioinformatics (Sequence Alignment, Protein Folding) Quant Finance (I read Wilmott’s books – interesting) Sooner or later, I’ll cover the above topics as well.

Read the article

Search Results

Search found 145 results on 6 pages for 'analogy'.

Page 6/6 | < Previous Page | 2 3 4 5 6

- by Cheeso

- by Cheeso

- by James A. Rosen

- by whichhand

- by Robert May

- by JoshReuben

- by Ashok_Ora

- by PRajkumar

- by Stephen.Walther

- by Alois Kraus

- by Rob Farley

- by barryoreilly

- by Dewald Galjaard

- by redben

- by Harvey

- by bob.ertl(at)oracle.com

- by Simon Cooper

- by Carlos Gutiérrez

- by JoshReuben

< Previous Page | 2 3 4 5 6