Search Results

Search found 9706 results on 389 pages for 'aggregate functions'.

Page 288/389 | < Previous Page | 284 285 286 287 288 289 290 291 292 293 294 295 | Next Page >

A little gem from MPN–FREE online course on Architectural Guidance for Migrating Applications to Windows Azure Platform

- by Eric Nelson

I know a lot of technical people who work in partners (ISVs, System Integrators etc). I know that virtually none of them would think of going to the Microsoft Partner Network (MPN) learning portal to find some deep and high quality technical content. Instead they would head to MSDN, Channel 9, msdev.com etc. I am one of those people :-) Hence imagine my surprise when i stumbled upon this little gem Architectural Guidance for Migrating Applications to Windows Azure Platform (your company and hence your live id need to be a member of MPN – which is free to join). This is first class stuff – and represents about 4 hours which is really 8 if you stop and ponder :) Course Structure The course is divided into eight modules. Each module explores a different factor that needs to be considered as part of the migration process. Module 1: Introduction: This section provides an introduction to the training course, highlighting the values of the Windows Azure Platform for developers. Module 2: Dynamic Environment: This section goes into detail about the dynamic environment of the Windows Azure Platform. This session will explain the difference between current development states and the Windows Azure Platform environment, detail the functions of roles, and highlight development considerations to be aware of when working with the Windows Azure Platform. Module 3: Local State: This session details the local state of the Windows Azure Platform. This section details the different types of storage within the Windows Azure Platform (Blobs, Tables, Queues, and SQL Azure). The training will provide technical guidance on local storage usage, how to write to blobs, how to effectively use table storage, and other authorization methods. Module 4: Latency and Timeouts: This session goes into detail explaining the considerations surrounding latency, timeouts and how to assess an IT portfolio. Module 5: Transactions and Bandwidth: This session details the performance metrics surrounding transactions and bandwidth in the Windows Azure Platform environment. This session will detail the transactions and bandwidth costs involved with the Windows Azure Platform and mitigation techniques that can be used to properly manage those costs. Module 6: Authentication and Authorization: This session details authentication and authorization protocols within the Windows Azure Platform. This session will detail information around web methods of authorization, web identification, Access Control Benefits, and a walkthrough of the Windows Identify Foundation. Module 7: Data Sensitivity: This session details data considerations that users and developers will experience when placing data into the cloud. This section of the training highlights these concerns, and details the strategies that developers can take to increase the security of their data in the cloud. Module 8: Summary Provides an overall review of the course.

Read the article
Documenting sp_ssiscatalog

- by jamiet

What is the best way to document an API? Moreover, what is the best way to document a T-SQL API? Before I try to answer those questions I should explain what I mean by “a T-SQL API”. I think of an API as being a collection of well-defined, known, code modules that provide some notion of a service to whomever uses it; in T-SQL terms I tend to think of a collection of stored procedures and functions as a form of API. Its a loose definition, I admit, and in SQL Server circles we don’t tend to think of stored procedures collectively as an API but if you think about it that’s exactly what they are. The question of how to document a T-SQL API came to my mind as I worked on sp_ssiscatalog. How could I make it easy for people to learn about the capabilities of sp_ssiscatalog without forcing them to dig through the code and find out for themselves? My opening gambit was to write documentation pages on the wiki at http://ssisreportingpack.codeplex.com. That’s kinda useful but it does suffer the disadvantage that someone using sp_ssiscatalog needs to go visit a webpage to read it – I want the documentation to be available wherever the user is using sp_ssiscatalog. Moreover, maintaining the wiki is a real PITA. Intellisense works up to a point, I guess: but that only shows whatever SQL Server knows about the various parameters, which isn’t all that much! I wanted a better way for my API users to learn about its capabilities and so I hit upon the idea of simply using PRINT statements within the code itself to inform the user what options are available; hence I added such PRINT statements in the latest check-in. Now when you execute (for example): EXEC sp_ssiscatalog @operation_type='execs' you can hit F6 a few times to view the messages pane and you shall see something like this: Notice that I’m returning information about all the parameters that can be used to affect the results that just got returned. I really do think this will be very useful to anyone using sp_ssiscatalog; I myself am always forgetting what the parameters are and I wrote the damn thing so I can’t really expect anyone else to remember them. I have not yet made available a release that has these changes in it but when I do I’ll blog about it right here. At the time of writing the latest available release of sp_ssiscatalog is DB v1.0.1.0 but if you want to the latest and greatest simply download it straight from source. Feedback is welcome as always. @Jamiet

Read the article
"Whole-team" C++ features?

- by Blaisorblade

In C++, features like exceptions impact your whole program: you can either disable them in your whole program, or you need to deal with them throughout your code. As a famous article on C++ Report puts it: Counter-intuitively, the hard part of coding exceptions is not the explicit throws and catches. The really hard part of using exceptions is to write all the intervening code in such a way that an arbitrary exception can propagate from its throw site to its handler, arriving safely and without damaging other parts of the program along the way. Since even new throws exceptions, every function needs to provide basic exception safety — unless it only calls functions which guarantee throwing no exception — unless you disable exceptions altogether in your whole project. Hence, exceptions are a "whole-program" or "whole-team" feature, since they must be understood by everybody in a team using them. But not all C++ features are like that, as far as I know. A possible example is that if I don't get templates but I do not use them, I will still be able to write correct C++ — or will I not?. I can even call sort on an array of integers and enjoy its amazing speed advantage wrt. C's qsort (because no function pointer is called), without risking bugs — or not? It seems templates are not "whole-team". Are there other C++ features which impact code not directly using them, and are hence "whole-team"? I am especially interested in features not present in C. Update: I'm especially looking for features where there's no language-enforced sign you need to be aware of them. The first answer I got mentioned const-correctness, which is also whole-team, hence everybody needs to learn about it; however, AFAICS it will impact you only if you call a function which is marked const, and the compiler will prevent you from calling it on non-const objects, so you get something to google for. With exceptions, you don't even get that; moreover, they're always used as soon as you use new, hence exceptions are more "insidious". Since I can't phrase this as objectively, though, I will appreciate any whole-team feature. Appendix: Why this question is objective (if you wonder) C++ is a complex language, so many projects or coding guides try to select "simple" C++ features, and many people try to include or exclude some ones according to mostly subjective criteria. Questions about that get rightfully closed regularly here on SO. Above, instead, I defined (as precisely as possible) what a "whole-team" language feature is, provide an example (exceptions), together with extensive supporting evidence in the literature about C++, and ask for whole-team features in C++ beyond exceptions. Whether you should use "whole-team" features, or whether that's a relevant concept, might be subjective — but that only means the importance of this question is subjective, like always.

Read the article
F# Project Euler Problem 1

- by MarkPearl

Every now and then I give project Euler a quick browse. Since I have been playing with F# I have found it a great way to learn the basics of the language. Today I thought I would give problem 1 an attempt… Problem 1 If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The sum of these multiples is 23. Find the sum of all the multiples of 3 or 5 below 1000. My F# Solution I broke this problem into two functions… 1) be able to generate a collection of numbers that are multiples of a number but but are smaller than another number. let GenerateMultiplesOfXbelowY X Y = X |> Seq.unfold (fun i -> if (i<Y) then Some(i, i+X) else None) I then needed something that generated collections for multiples of 3 & 5 and then removed any duplicates. Once this was done I would need to sum these all together to get a result. I found the Seq object to be extremely useful to achieve this… let Multiples = Seq.append (GenerateMultiplesOfXbelowY 3 1000) (GenerateMultiplesOfXbelowY 5 1000) |> Seq.distinct |> Seq.fold(fun acc a -> acc + a) 0 |> Console.WriteLine |> Console.ReadLine My complete solution was … open System let GenerateMultiplesOfXbelowY X Y = X |> Seq.unfold (fun i -> if (i<Y) then Some(i, i+X) else None) let Multiples = Seq.append (GenerateMultiplesOfXbelowY 3 1000) (GenerateMultiplesOfXbelowY 5 1000) |> Seq.distinct |> Seq.fold(fun acc a -> acc + a) 0 |> Console.WriteLine |> Console.ReadLine Which seemed to generate the correct result in a relatively short period of time although I am sure I will get some comments from the experts who know of some intrinsic method to achieve all of this in one method call.

Read the article
How is fundamental mathematics efficiently evaluated by programming languages?

- by Korvin Szanto

As I get more and more involved with the theory behind programming, I find myself fascinated and dumbfounded by seemingly simple things.. I realize that my understanding of the majority of fundamental processes is justified through circular logic Q: How does this work? A: Because it does! I hate this realization! I love knowledge, and on top of that I love learning, which leads me to my question (albeit it's a broad one). Question: How are fundamental mathematical operators assessed with programming languages? How have current methods been improved? Example var = 5 * 5; My interpretation: $num1 = 5; $num2 = 5; $num3 = 0; while ($num2 > 0) { $num3 = $num3 + $num1; $num2 = $num2 - 1; } echo $num3; This seems to be highly inefficient. With Higher factors, this method is very slow while the standard built in method is instantanious. How would you simulate multiplication without iterating addition? var = 5 / 5; How is this even done? I can't think of a way to literally split it 5 into 5 equal parts. var = 5 ^ 5; Iterations of iterations of addition? My interpretation: $base = 5; $mod = 5; $num1 = $base; while ($mod > 1) { $num2 = 5; $num3 = 0; while ($num2 > 0) { $num3 = $num3 + $num1; $num2 = $num2 - 1; } $num1 = $num3; $mod -=1; } echo $num3; Again, this is EXTREMELY inefficient, yet I can't think of another way to do this. This same question extends to all mathematical related functions that are handled automagically.

Read the article
C Programming arrays, I dont understand how I would go about making this program, If anyone can just guide me through the basic outline please :) [on hold]

- by Rashmi Kohli

Problem The temperature of a car engine has been measured, from real-world experiments, as shown in the table and graph below: Time (min) Temperature (oC) 0 20 1 36 2 61 3 68 4 77 5 110 Use linear regression to find the engine’s temperature at 1.5 minutes, 4.3 minutes, and any other time specified by the user. Background In engineering, many times we measure several data points in an experiment, but then we need to predict a value that we have not measured which lies between two measured values, such as the problem statement above. If the relation between the measured parameters seems to be roughly linear, then we can use linear regression to find the relationship between those parameters. In the graph of the problem statement above, the relation seems to be roughly linear. Hence, we can apply linear regression to the above problem. Assuming y {y0, y1, …yn-1} has a linear relation with x {x0, x1, … xn-1}, we can say that: y = mx+b where m and b can be found with linear regression as follows: For the problem in this lab, using linear regression gives us the following line (in blue) compared to the measured curve (in red). As you can see, there is usually a difference between the measured values and the estimated (predicted) values. What linear regression does is to minimize those differences and still give us a straight line (blue). Other methods, such as non-linear regression, are also possible to achieve higher accuracy and better curve fitting. Requirements Your program should first print the table of the temperatures similar to the way it’s printed in the problem statement. It should then calculate the temperature at minute 1.5 and 4.3 and show the answers to the user. Next, it should prompt the user to enter a time in minutes (or -1 to quit), and after reading the user’s specified time it should give the value of the engine’s temperature at that time. It should then go back to the prompt. Hints •Use a one dimensional array to store the temperature values given in the problem statement. •Use functions to separate tasks such as calculating m, calculating b, calculating the temperature at a given time, printing the prompt, etc. You can then give your algorithm as well as you pseudo code per function, as opposed to one large algorithm diagram or one large sequence of pseudo code.

Read the article
Designing Snake AI

- by Ronald

I'm new to this gamedev stackechange but have used the math and cs sites before. So, I'm in a competition to create AI for a snake that will compete with 3 other snakes in 5 minute rounds where the rules are much like the traditional Nokia snake game except that there are 4 snakes, the board is 50x50 and there are a number of small obstacles on the field. Like the Nokia game, your snake grows when you get to the fruit and if you crash into yourself, another snake or the wall you die. The game runs with a 50ms delay between moves and the server sends the new game state every 50ms which the code must analyze and what not and output the next move. The winner is the snake who had the longest length at any point in the game. Tie breakers are decided by kills. So far what I have done is implemented an A* graph search from each snake to determine if my snake is the closest to the apple and if it is, it goes for the apple. Otherwise, I made a neat little algorithm to determine the emptiest area of the board, which my snake goes for, to anticipate the next apple. Other than this I have some small survivability checks to ensure my snake isn't walking into a trap that it can't get out and if it does get stuck, I have something to give it a better chance of getting out. ... Anyway, I've tested my snake on a test server and it does quite well. Generally, my strategy of only going for the apple when its a sure thing and finding space when its not makes it grow faster than any other snakes (some snakes do a similar thing but often just go to the middle or a corner) sometimes it wins these trial games but is more often than not beaten by the same snake who seems to have the edge on survivability(my snake grows quicker but then dies somehow and this other snake just plods slowly along and wins on consistency. So I was wondering about any ideas anyone has to try and improve my snake. Or maybe ideas at a new approach to take. My functions and classes are good so changes that might seem drastic shouldn't be too bad. I encourage all ideas. Any thoughts ??

Read the article
Is it dangerous for me to give some of my Model classes Control-like methods?

- by Pureferret

In my personal project I have tried to stick to MVC, but I've also been made aware that sticking to MVC too tightly can be a bad thing as it makes writing awkward and forces the flow of the program in odd ways (i.e. some simple functions can be performed by something that normally wouldn't, and avoid MVC related overheads). So I'm beginning to feel justified in this compromise: I have some 'manager programs' that 'own' data and have some way to manipulate it, as such I think they'd count as both part of the model, and part of the control, and to me this feels more natural than keepingthem separate. For instance: One of my Managers is the PlayerCharacterManager that has these methods: void buySkill(PlayerCharacter playerCharacter, Skill skill); void changeName(); void changeRole(); void restatCharacter(); void addCharacterToGame(); void createNewCharacter(); PlayerCharacter getPlayerCharacter(); List<PlayerCharacter> getPlayersCharacter(Player player); List<PlayerCharacter> getAllCharacters(); I hope the mothod names are transparent enough that they don't all need explaining. I've called it a manager because it will help manage all of the PlayerCharacter 'model' objects the code creates, and create and keep a map of these. I may also get it to store other information in the future. I plan to have another two similar classes for this sort of control, but I will orchestrate when and how this happens, and what to do with the returned data via a pure controller class. This splitting up control between informed managers and the controller, as opposed to operating just through a controller seems like it will simplify my code and make it flow more. My question is, is this a dangerous choice, in terms of making the code harder to follow/test/fix? Is this somethign established as good or bad or neutral? I oculdn't find anything similar except the idea of Actors but that's not quite why I'm trying to do. Edit: Perhaps an example is needed; I'm using the Controller to update the view and access the data, so when I click the 'Add new character to a player button' it'll call methods in the controller that then go and tell the PlayerCharacterManager class to create a new character instance, it'll call the PlayerManager class to add that new character to the player-character map, and then it'll add this information to the database, and tell the view to update any GUIs effected. That is the sort of 'control sequence' I'm hoping to create with these manager classes.

Read the article
Advantages to Multiple Methods over Switch

- by tandu

I received a code review from a senior developer today asking "By the way, what is your objection to dispatching functions by way of a switch statement?" I have read in many places about how pumping an argument through switch to call methods is bad OOP, not as extensible, etc. However, I can't really come up with a definitive answer for him. I would like to settle this for myself once and for all. Here are our competing code suggestions (php used as an example, but can apply more universally): class Switch { public function go($arg) { switch ($arg) { case "one": echo "one\n"; break; case "two": echo "two\n"; break; case "three": echo "three\n"; break; default: throw new Exception("Unknown call: $arg"); break; } } } class Oop { public function go_one() { echo "one\n"; } public function go_two() { echo "two\n"; } public function go_three() { echo "three\n"; } public function __call($_, $__) { throw new Exception("Unknown call $_ with arguments: " . print_r($__, true)); } } Part of his argument was "It (switch method) has a much cleaner way of handling default cases than what you have in the generic __call() magic method." I disagree about the cleanliness and in fact prefer call, but I would like to hear what others have to say. Arguments I can come up with in support of Oop scheme: A bit cleaner in terms of the code you have to write (less, easier to read, less keywords to consider) Not all actions delegated to a single method. Not much difference in execution here, but at least the text is more compartmentalized. In the same vein, another method can be added anywhere in the class instead of a specific spot. Methods are namespaced, which is nice. Does not apply here, but consider a case where Switch::go() operated on a member rather than a parameter. You would have to change the member first, then call the method. For Oop you can call the methods independently at any time. Arguments I can come up with in support of Switch scheme: For the sake of argument, cleaner method of dealing with a default (unknown) request Seems less magical, which might make unfamiliar developers feel more comfortable Anyone have anything to add for either side? I'd like to have a good answer for him.

Read the article
Ad-hoc reporting similar to Microstrategy/Pentaho - is OLAP really the only choice (is OLAP even sufficient)?

- by TheBeefMightBeTough

So I'm getting ready to develop an API in Java that will provide all dimensions, metrics, hierarchies, etc to a user such that they can pick and choose what they want (say, e.g., dimensions of Location (a store) and Weekly, and the metric Product Sales $), provide their choices to the api, and have it spit out an object that contains the answer to their question (the object would probably be a set of cells). I don't even believe there will be much drill up/down. The data warehouse the APIwill interface with is in a standard form (FACT tables, dimensions, star schema format). My question is, is an OLAP framework such as Mondrian the only way to achieve something akin to ad-hoc reporting? I can envisage a really large Cube (or VirtualCube) that contains most of the dimensions and metrics the user could ever want, which would give the illusion of ad-hoc reporting. The problem is that there is a ton of setup to do (so much XML) to get the framework to work with the data. Further it requires specific knowledge, such as MDX, and even moreso learning the framework peculiars (Mondrian API). Finally, I am not positive it will scale much better than simply making queries against a SQL database. OLAP to me feels like very old technology. Is performance really an issue anymore? The alternative I can think of would be dynamic SQL. If the existing tables in the data warehouse conform to a naming scheme (FACT_, DIM_, etc), or if a very simple config file/ database table containing config information existed that stored which tables are fact tables, which are dimensions, and what metrics are available, then couldn't the api read from that and assembly the appropriate sql query? Would this necessarily be harder than learning MDX, Mondrian (or another OLAP framework), and creating all the cubes? In general, I feel that OLAP is at the same time too powerful (supports drill up/down, complex functions) and outdated and am reluctant to base my architecture on it. However, I am unsure if the alternative(s), such as rolling my own ad-hoc reporting framework using dynamic SQL would remove any complexity while still fulfilling requirements, both functional and non-functional (e.g., scalability; some FACT tables have many millions of rows). I also wonder about other techniques (e.g., hive). Has anyone here tried to do ad-hoc reporting? Any advice? I expect this project to take a pretty long time (3 months min, but probably longer), so I just do not want to commit to an architecture without being absolutely sure of its pros and cons. Thanks so much.

Read the article
The term "interface" in C++

- by Flexo

Java makes a clear distinction between class and interface. (I believe C# does also, but I have no experience with it). When writing C++ however there is no language enforced distinction between class and interface. Consequently I've always viewed interface as a workaround for the lack of multiple inheritance in Java. Making such a distinction feels arbitrary and meaningless in C++. I've always tended to go with the "write things in the most obvious way" approach, so if in C++ I've got what might be called an interface in Java, e.g.: class Foo { public: virtual void doStuff() = 0; ~Foo() = 0; }; and I then decided that most implementers of Foo wanted to share some common functionality I would probably write: class Foo { public: virtual void doStuff() = 0; ~Foo() {} protected: // If it needs this to do its thing: int internalHelperThing(int); // Or if it doesn't need the this pointer: static int someOtherHelper(int); }; Which then makes this not an interface in the Java sense anymore. Instead C++ has two important concepts, related to the same underlying inheritance problem: virtual inhertiance Classes with no member variables can occupy no extra space when used as a base "Base class subobjects may have zero size" Reference Of those I try to avoid #1 wherever possible - it's rare to encounter a scenario where that genuinely is the "cleanest" design. #2 is however a subtle, but important difference between my understanding of the term "interface" and the C++ language features. As a result of this I currently (almost) never refer to things as "interfaces" in C++ and talk in terms of base classes and their sizes. I would say that in the context of C++ "interface" is a misnomer. It has come to my attention though that not many people make such a distinction. Do I stand to lose anything by allowing (e.g. protected) non-virtual functions to exist within an "interface" in C++? (My feeling is the exactly the opposite - a more natural location for shared code) Is the term "interface" meaningful in C++ - does it imply only pure virtual or would it be fair to call C++ classes with no member variables an interface still?

Read the article
When a co-worker asks you to teach him what you know, do you share the information or keep it to yourself? [closed]

- by Chuck

I am the only developer/DBA in a small IT department. There is another guy who can do it, but he's more of a backup as he spends his time working on IT support stuff. Anyway we have a new hire and I've been training him on the IT support side of things. Seems like he is eager to learn and be productive, but nobody is going out of their way to show him anything. He's been asking me to teach him database design, SQL, etc. For some reason, the boss has him working with me. He is also sending him to meetings that I go to, yet he hasn't said outright that I have to teach him anything. Meanwhile, the boss insists on doing a lot of the support work himself (i.e. he hoards information and doesn't delegate to anyone). I'm a little bit on the fence. First, the new guy doesn't yet have a strong foundation on the IT support functions which is where we really need help at this time. Second, I paid thousands of dollars for classes and spent many hours learning this stuff. Is it my responsibility to teach others skills that I had to learn on my own? Others here really aren't quick to share information so I'm not sure that I should either in this environment. I do know that if I get him involved, and get him started on projects, then I'd be responsible for his mistakes. I had to take the heat for the other guy when he made mistakes. OTOH the guy wants to learn something, is motivated, and I don't want to stop him. We've had our share of slackers in the group and it's nice to have someone who is willing to work for a change. So what would you guys do? Would you teach him the skills that you spent all of that time learning? Set him up with a test database on his PC and recommend some books for him? Encourage him to get a strong foundation in IT support first and ask later? We haven't had a new hire in years, let alone one that is interested in what I do, so this is new to me.

Read the article
Imperative vs. component based programming [closed]

- by AlexW

I've been thinking about how programming and more specifically the teaching of programming is advocated amongst the community (online). Often I've heard that Ruby and RoR is an ideal platform for learning to program. I completely disagree... RoR and Ruby are based on the application of the component based paradigm, which means they are ideal for rapid application development. This is much like the MVC model in PHP and ASP.NET But, learning a proper imperative language like Java or C/C++ (or even Perl and PHP) is the only way for a new programmer to explore logic itself, and not get too bogged down in architectural concerns like the need for separation of concerns, and the preference for components. Maybe it's a personal preference thing. I rather think that the most interesting aspects to programming are the procedural bits of code I write that actually do stuff rather than the project planning, and modelling that comes about from fully object oriented engineering or simply using the MVC model. I know this may sound confused to some of you. I feel strongly though that the best way for programming to be taught is through imperative and procedural methods. Architectural (component) methods come later, if at all. After all, none of the amazing algorithms that exist were based on OOP practice! It's all procedural code when it comes to the 'magic'. OOP is useful in creating products and utilities. Algorithms are what makes things happen, and move data around, and so imperative (and/or procedural) code are what matters most. When I see programmers recommending Ruby on Rails to newbie developers, I think it's just so wrong. Just because you write less code with Ruby does not make it easier to do! It's the opposite... you have to know loads more to appreciate its succinct nature. New coders who really want to understand the nuts and bolts of coding need to go away and figure out writing methods/functions (i.e. imperative programming) and working in procedural style, in order to grasp the fundamentals, first, before looking into architectural ways of working. So, my question is: should Ruby ever be recommended as a first language? I think no (obviously)... what arguments are there for it?

Read the article
database independent coding framework options?

- by statirasystems

Background: I have not programmed in a while besides doing VBA and a little VB.NET. So please forgive my language use. I'm green and have a head cold. I am reading all I can now, but I have no programming circles to draw from. The information I am providing is to help guide you to what I am looking for. I am not confident I can ask the question properly. Story: I have four different projects that I am starting. Obviously I won't be working on all at the same time however they each will have similar needs and be inter related. They are as follows: Desktop Environment/System User Interface - basically a product that runs on major computers via mono or .net that unifies the look and functions. In the context of the up coming question it would be able to directly access data of various types. It would work in tandum with my office suite, system manager, and network application framework. Office Suite - technically it would not be a suite since I will be doing it from one interfacel except for the Communications Application. As far as the question, it will need to be able to link to various data sources for storing files and using, manipulating, and presenting information. System Manager - an intellegent system to manage and administer the entire network and all equipment. As far as the question, needs to be able to access data for archiving and and for accessing it's own settings stored in various formats, sql or xml. Network Application Framework - A complete system that can be used for ERP, CRM, CMS, Errata, File Management, and so on. As to the question to be able to access it's own or interlink with existing applications. Requirement: C#, Simplifies and reduces coding, use the same code to access diffent databases(ie MySQL, MS SQL, ACCESS, XML, ...), Mono would be nice but not a must, Question: What librarys, frameworks, or other options would be able to help with this? Is there a good resource to guide me? I don't want arguing over what is best, just information to help me further understand and make an educated decision.

Read the article
Application workflow

- by manseuk

I am in the planning process for a new application, the application will be written in PHP (using the Symfony 2 framework) but I'm not sure how relevant that is. The application will be browser based, although there will eventually be API access for other systems to interact with the data stored within the application, again probably not relavent at this point. The application manages SIM cards for lots of different providers - each SIM card belongs to a single provider but a single customer might have many SIM cards across many providers. The application allows the user to perform actions against the SIM card - for example Activate it, Barr it, Check on its status etc Some of the providers provide an API for doing this - so a single access point with multiple methods eg activateSIM, getStatus, barrSIM etc. The method names differ for each provider and some providers offer methods for extra functions that others don't. Some providers don't have APIs but do offer these methods by sending emails with attachments - the attachments are normally a CSV file that contains the SIM reference and action required - the email is processed by the provider and replied to once the action has been complete. To give you an example - the front end of my application will provide a customer with a list of SIM cards they own and give them access to the actions that are provided by the provider of each specific SIM card - some methods may require extra data which will either be stored in the backend or collected from the user frontend. Once the user has selected their action and added any required data I will handle the process in the backend and provide either instant feedback, in the case of the providers with APIs, or start the process off by sending an email and waiting for its reply before processing it and updating the backend so that next time the user checks the SIM card its status is correct (ie updated by a backend process). My reason for creating this question is because I'm stuck !! I'm confused about how to approach the actual workflow logic. I was thinking about creating a Provider Interface with the most common methods getStatus, activateSIM and barrSIM and then implementing that interface for each provider. So class Provider1 implements Provider - Then use a Factory to create the required class depending on user selected SIM card and invoking the method selected. This would work fine if all providers offered the same methods but they don't - there are a subset which are common but some providers offer extra methods - how can I implement that flexibly ? How can I deal with the processes where the workflow is different - ie some methods require and API call and value returned and some require an email to be sent and the next stage of the process doesn't start until the email reply is recieved ... Please help ! (I hope this is a readable question and that this is the correct place to be asking) Update I guess what I'm trying to avoid is a big if or switch / case statement - some design pattern that gives me a flexible approach to implementing this kind of fluid workflow .. anyone ?

Read the article
What is the right way to group this project into classes?

- by sigil

I originally asked this on SO, where it was closed and recommended that I ask it here instead. I'm trying to figure out how to group all the functions necessary for my project into classes. The goal of the project is to execute the following process: Get the user's FTP credentials (username & password). Check to make sure the credentials establish a valid connection to the FTP server. Query several Sharepoint lists and join the results of those queries to create a list of items that need to have action taken on them. Each item in the list has a folder. For each item: Zip the contents of the folder. Upload the folder to the FTP server using SFTP Update the item's Sharepoint data. Email the user an Excel report showing, e.g., Items without folder paths Items that failed to zip or upload Steps 2-5 are performed on a periodic basis; if step 2 returns an invalid connection, the user is alerted and the process returns to step 1. If at any point the user presses a certain key, the process terminates. I've defined the following set of classes, each of which is in its own .cs file: SFTP: file transfer processes DataHandler: Sharepoint data retrieval/querying/updating processes. Also makes and uploads the zip files. Exceptions: Not just one class, this is the .cs file where I have all of my exception classes. Report: Builds and sends the report. Program: The main class for running the program. I recognize that the DataHandler class is a god object, but I don't have a good idea of how to refactor it. I feel like it should be more fine-grained than just breaking it into Sharepoint, Zip, and Upload, but maybe that's it. Also, I haven't yet worked out how to combine the periodic behavior with the "wait for user input at any point in the process" part; I think that involves threads, which means other classes to manage the threads... I'm not that well-versed in design patterns, but is there one that fits this project well? If this is too big of a topic to neatly explain in an SO answer, I'll also accept a link to a good tutorial on what I'm trying to do here.

Read the article
Laptop freezes and seems to crash, but continues working after waiting for a few minutes [closed]

- by Corwin

I've had this old notebook laying around and because i was missing a second machine (My wife usually steals the first ;) ) I considered installing Linux. As a php developer I work with Linux servers (usually fedora) on a daily basis and because its an older machine that I want to use for development, linux seemed the best option. Speedwise I expected a good experience, better than Windows 7 on the same machine. The results where terrible. I tried ubuntu 12.04. The shell never got past showing the background. The system doesn't freeze since the mouse still works and I can use ctrl+alt+f2 etc to enter terminal mode. I expected hardware problems en even exchanged the harddisk en Ram memory. No luck though, so I started over and tried 11.10 Same results so I tried 10.04.4 which did install properly. Not sure if unity was the problem, but it seems likely. But then I tried simply things like surfing on the net, the system frooze and I thought it crashed so after a few minutes I pulled the plug and rebooted. But it happened again and I waited. After a few minutes the system came back to life like nothing happened. Long story short. Besides the fact that the entire interface is very sluggish, any and all graphical functions freezes the system. The more elaborate the animation would be, the longer it freezes. I switch chromium from window to fullscreenmode and had to wait 15 minutes to continue. I don't see the animation that's probably supposed to be in between. It just freezes and then after unfreezing its fullscreen. I don't think its a bug. I suspect the problem is with my graphics card. Like I said, its and old system. So old that I can't even find the original Ati drivers anywere. (I'll post the details of my system at the end of my post) I'm at a loss as to what to do next. I tried other Distro's. So far only dreamlinux works normally. Linux Mint won't start as a live CD. I think I simply need a driver update but I can't find them anywhere. Does anyone have the same experience ? Maybe even someone who has or had the same notebook running Ubuntu at some point ? Anyway, here are the specs: http://www.nec-driver.com/nec-driver/NEC-Versa-P550---FP550-Driver_421.html

Read the article
How should I refactor switch statements like this (Switching on type) to be more OO?

- by Taytay

I'm seeing some code like this in our code base, and want to refactor it: (Typescript psuedocode follows): class EntityManager{ private findEntityForServerObject(entityType:string, serverObject:any):IEntity { var existingEntity:IEntity = null; switch(entityType) { case Types.UserSetting: existingEntity = this.getUserSettingByUserIdAndSettingName(serverObject.user_id, serverObject.setting_name); break; case Types.Bar: existingEntity = this.getBarByUserIdAndId(serverObject.user_id, serverObject.id); break; //Lots more case statements here... } return existingEntity; } } The downsides of switching on type are self-explanatory. Normally, when switching behavior based on type, I try to push the behavior into subclasses so that I can reduce this to a single method call, and let polymorphism take care of the rest. However, the following two things are giving me pause: 1) I don't want to couple the serverObject with the class that is storing all of these objects. It doesn't know where to look for entities of a certain type. And unfortunately, the identity of a type of ServerObject varies with the type of ServerObject. (So sometimes it's just an ID, other times it's a combination of an id and a uniquely identifying string, etc). And this behavior doesn't belong down there on those subclasses. It is the responsibility of the EntityManager and its delegates. 2) In this case, I can't modify the ServerObject classes since they're plain old data objects. It should be mentioned that I've got other instances of the above method that take a parameter like "IEntity" and proceed to do almost the same thing (but slightly modify the name of the methods they're calling to get the identity of the entity). So, we might have: case Types.Bar: existingEntity = this.getBarByUserIdAndId(entity.getUserId(), entity.getId()); break; So in that case, I can change the entity interface and subclasses, but this isn't behavior that belongs in that class. So, I think that points me to some sort of map. So eventually I will call: private findEntityForServerObject(entityType:string, serverObject:any):IEntity { return aMapOfSomeSort[entityType].findByServerObject(serverObject); } private findEntityForEntity(someEntity:IEntity):IEntity { return aMapOfSomeSort[someEntity.entityType].findByEntity(someEntity); } Which means I need to register some sort of strategy classes/functions at runtime with this map. And again, I darn well better remember to register one for each my my types, or I'll get a runtime exception. Is there a better way to refactor this? I feel like I'm missing something really obvious here.

Read the article
How to check if a variable is an integer? [closed]

- by FyrePlanet

I'm going through my C++ book and have currently made a working Guess The Number game. The game generates a random number based on current time, has the user input their guess, and then tells them whether it was too high, too low, or the correct number. The game functions fine if you enter a number, but returns 'Too High' if you enter something that is not a number (such as a letter or punctuation). Not only does it return 'too high', but it also continually returns it. I was wondering what I could do to check if the guess, as input by the user, is an integer and if it is not, to return 'Not a number. Guess again.' Here is the code. #include <iostream> #include <cstdlib> #include <ctime> using namespace std; int main() { srand(time(0)); // seed the random number generator int theNumber = rand() % 100 + 1; // random number between 1 and 100 int tries = 0, guess; cout << "\tWelcome to Guess My Number!\n\n"; do { cout << "Enter a guess: "; cin >> guess; ++tries; if (guess > theNumber) cout << "Too high!\n\n"; if (guess < theNumber) cout << "Too low!\n\n"; } while (guess != theNumber); cout << "\nThat's it! You got it in " << tries << " guesses!\n"; cout << "\nPress Enter to exit.\n"; cin.ignore(cin.rdbuf()->in_avail() + 1); return 0; }

Read the article
Point in polygon OR point on polygon using LINQ

- by wageoghe

As noted in an earlier question, How to Zip enumerable with itself, I am working on some math algorithms based on lists of points. I am currently working on point in polygon. I have the code for how to do that and have found several good references here on SO, such as this link Hit test. So, I can figure out whether or not a point is in a polygon. As part of determining that, I want to determine if the point is actually on the polygon. This I can also do. If I can do all of that, what is my question you might ask? Can I do it efficiently using LINQ? I can already do something like the following (assuming a Pairwise extension method as described in my earlier question as well as in links to which my question/answers links, and assuming a Position type that has X and Y members). I have not tested much, so the lambda might not be 100% correct. Also, it does not take very small differences into account. public static PointInPolygonLocation PointInPolygon(IEnumerable<Position> pts, Position pt) { int numIntersections = pts.Pairwise( (p1, p2) => { if (p1.Y != p2.Y) { if ((p1.Y >= pt.Y && p2.Y < pt.Y) || (p1.Y < pt.Y && p2.Y >= pt.Y)) { if (p1.X < p1.X && p2.X < pt.X) { return 1; } if (p1.X < pt.X || p2.X < pt.X) { if (((pt.Y - p1.Y) * ((p1.X - p2.X) / (p1.Y - p2.Y)) * p1.X) < pt.X) { return 1; } } } } return 0; }).Sum(); if (numIntersections % 2 == 0) { return PointInPolygonLocation.Outside; } else { return PointInPolygonLocation.Inside; } } This function, PointInPolygon, takes the input Position, pt, iterates over the input sequence of position values, and uses the Jordan Curve method to determine how many times a ray extended from pt to the left intersects the polygon. The lambda expression will yield, into the "zipped" list, 1 for every segment that is crossed, and 0 for the rest. The sum of these values determines if pt is inside or outside of the polygon (odd == inside, even == outside). So far, so good. Now, for any consecutive pairs of position values in the sequence (i.e. in any execution of the lambda), we can also determine if pt is ON the segment p1, p2. If that is the case, we can stop the calculation because we have our answer. Ultimately, my question is this: Can I perform this calculation (maybe using Aggregate?) such that we will only iterate over the sequence no more than 1 time AND can we stop the iteration if we encounter a segment that pt is ON? In other words, if pt is ON the very first segment, there is no need to examine the rest of the segments because we have the answer. It might very well be that this operation (particularly the requirement/desire to possibly stop the iteration early) does not really lend itself well to the LINQ approach. It just occurred to me that maybe the lambda expression could yield a tuple, the intersection value (1 or 0 or maybe true or false) and the "on" value (true or false). Maybe then I could use TakeWhile(anontype.PointOnPolygon == false). If I Sum the tuples and if ON == 1, then the point is ON the polygon. Otherwise, the oddness or evenness of the sum of the other part of the tuple tells if the point is inside or outside.

Read the article
How do you automap List<float> or float[] with Fluent NHibernate?

- by Tom Bushell

Having successfully gotten a sample program working, I'm now starting to do Real Work with Fluent NHibernate - trying to use Automapping on my project's class heirarchy. It's a scientific instrumentation application, and the classes I'm mapping have several properties that are arrays of floats e.g. private float[] _rawY; public virtual float[] RawY { get { return _rawY; } set { _rawY = value; } } These arrays can contain a maximum of 500 values. I didn't expect Automapping to work on arrays, but tried it anyway, with some success at first. Each array was auto mapped to a BLOB (using SQLite), which seemed like a viable solution. The first problem came when I tried to call SaveOrUpdate on the objects containing the arrays - I got "No persister for float[]" exceptions. So my next thought was to convert all my arrays into ILists e.g. public virtual IList<float> RawY { get; set; } But now I get: NHibernate.MappingException: Association references unmapped class: System.Single Since Automapping can deal with lists of complex objects, it never occured to me it would not be able to map lists of basic types. But after doing some Googling for a solution, this seems to be the case. Some people seem to have solved the problem, but the sample code I saw requires more knowledge of NHibernate than I have right now - I didn't understand it. Questions: 1. How can I make this work with Automapping? 2. Also, is it better to use arrays or lists for this application? I can modify my app to use either if necessary (though I prefer lists). Edit: I've studied the code in Mapping Collection of Strings, and I see there is test code in the source that sets up an IList of strings, e.g. public virtual IList<string> ListOfSimpleChildren { get; set; } [Test] public void CanSetAsElement() { new MappingTester<OneToManyTarget>() .ForMapping(m => m.HasMany(x => x.ListOfSimpleChildren).Element("columnName")) .Element("class/bag/element").Exists(); } so this must be possible using pure Automapping, but I've had zero luck getting anything to work, probably because I don't have the requisite knowlege of manually mapping with NHibernate. Starting to think I'm going to have to hack this (by encoding the array of floats as a single string, or creating a class that contains a single float which I then aggregate into my lists), unless someone can tell me how to do it properly. End Edit Here's my CreateSessionFactory method, if that helps formulate a reply... private static ISessionFactory CreateSessionFactory() { ISessionFactory sessionFactory = null; const string autoMapExportDir = "AutoMapExport"; if( !Directory.Exists(autoMapExportDir) ) Directory.CreateDirectory(autoMapExportDir); try { var autoPersistenceModel = AutoMap.AssemblyOf<DlsAppOverlordExportRunData>() .Where(t => t.Namespace == "DlsAppAutomapped") .Conventions.Add( DefaultCascade.All() ) ; sessionFactory = Fluently.Configure() .Database(SQLiteConfiguration.Standard .UsingFile(DbFile) .ShowSql() ) .Mappings(m => m.AutoMappings.Add(autoPersistenceModel) .ExportTo(autoMapExportDir) ) .ExposeConfiguration(BuildSchema) .BuildSessionFactory() ; } catch (Exception e) { Debug.WriteLine(e); } return sessionFactory; }

Read the article
Implementing a popularity algorithm in Django

- by TheLizardKing

I am creating a site similar to reddit and hacker news that has a database of links and votes. I am implementing hacker news' popularity algorithm and things are going pretty swimmingly until it comes to actually gathering up these links and displaying them. The algorithm is simple: Y Combinator's Hacker News: Popularity = (p - 1) / (t + 2)^1.5` Votes divided by age factor. Where` p : votes (points) from users. t : time since submission in hours. p is subtracted by 1 to negate submitter's vote. Age factor is (time since submission in hours plus two) to the power of 1.5.factor is (time since submission in hours plus two) to the power of 1.5. I asked a very similar question over yonder http://stackoverflow.com/questions/1964395/complex-ordering-in-django but instead of contemplating my options I choose one and tried to make it work because that's how I did it with PHP/MySQL but I now know Django does things a lot differently. My models look something (exactly) like this class Link(models.Model): category = models.ForeignKey(Category) user = models.ForeignKey(User) created = models.DateTimeField(auto_now_add = True) modified = models.DateTimeField(auto_now = True) fame = models.PositiveIntegerField(default = 1) title = models.CharField(max_length = 256) url = models.URLField(max_length = 2048) def __unicode__(self): return self.title class Vote(models.Model): link = models.ForeignKey(Link) user = models.ForeignKey(User) created = models.DateTimeField(auto_now_add = True) modified = models.DateTimeField(auto_now = True) karma_delta = models.SmallIntegerField() def __unicode__(self): return str(self.karma_delta) and my view: def index(request): popular_links = Link.objects.select_related().annotate(karma_total = Sum('vote__karma_delta')) return render_to_response('links/index.html', {'links': popular_links}) Now from my previous question, I am trying to implement the algorithm using the sorting function. An answer from that question seems to think I should put the algorithm in the select and sort then. I am going to paginate these results so I don't think I can do the sorting in python without grabbing everything. Any suggestions on how I could efficiently do this? EDIT This isn't working yet but I think it's a step in the right direction: from django.shortcuts import render_to_response from linkett.apps.links.models import * def index(request): popular_links = Link.objects.select_related() popular_links = popular_links.extra( select = { 'karma_total': 'SUM(vote.karma_delta)', 'popularity': '(karma_total - 1) / POW(2, 1.5)', }, order_by = ['-popularity'] ) return render_to_response('links/index.html', {'links': popular_links}) This errors out into: Caught an exception while rendering: column "karma_total" does not exist LINE 1: SELECT ((karma_total - 1) / POW(2, 1.5)) AS "popularity", (S... EDIT 2 Better error? TemplateSyntaxError: Caught an exception while rendering: missing FROM-clause entry for table "vote" LINE 1: SELECT ((vote.karma_total - 1) / POW(2, 1.5)) AS "popularity... My index.html is simply: {% block content %} {% for link in links %} karma-up {{ link.karma_total }} karma-down {{ link.title }} Posted by {{ link.user }} to {{ link.category }} at {{ link.created }} {% empty %} No Links {% endfor %} {% endblock content %} EDIT 3 So very close! Again, all these answers are great but I am concentrating on a particular one because I feel it works best for my situation. from django.db.models import Sum from django.shortcuts import render_to_response from linkett.apps.links.models import * def index(request): popular_links = Link.objects.select_related().extra( select = { 'popularity': '(SUM(links_vote.karma_delta) - 1) / POW(2, 1.5)', }, tables = ['links_link', 'links_vote'], order_by = ['-popularity'], ) return render_to_response('links/test.html', {'links': popular_links}) Running this I am presented with an error hating on my lack of group by values. Specifically: TemplateSyntaxError at / Caught an exception while rendering: column "links_link.id" must appear in the GROUP BY clause or be used in an aggregate function LINE 1: ...karma_delta) - 1) / POW(2, 1.5)) AS "popularity", "links_lin... Not sure why my links_link.id wouldn't be in my group by but I am not sure how to alter my group by, django usually does that.

Read the article
N-tier Repository POCOs - Aggregates?

- by Sam

Assume the following simple POCOs, Country and State: public partial class Country { public Country() { States = new List<State>(); } public virtual int CountryId { get; set; } public virtual string Name { get; set; } public virtual string CountryCode { get; set; } public virtual ICollection<State> States { get; set; } } public partial class State { public virtual int StateId { get; set; } public virtual int CountryId { get; set; } public virtual Country Country { get; set; } public virtual string Name { get; set; } public virtual string Abbreviation { get; set; } } Now assume I have a simple respository that looks something like this: public partial class CountryRepository : IDisposable { protected internal IDatabase _db; public CountryRepository() { _db = new Database(System.Configuration.ConfigurationManager.AppSettings["DbConnName"]); } public IEnumerable<Country> GetAll() { return _db.Query<Country>("SELECT * FROM Countries ORDER BY Name", null); } public Country Get(object id) { return _db.SingleById(id); } public void Add(Country c) { _db.Insert(c); } /* ...And So On... */ } Typically in my UI I do not display all of the children (states), but I do display an aggregate count. So my country list view model might look like this: public partial class CountryListVM { [Key] public int CountryId { get; set; } public string Name { get; set; } public string CountryCode { get; set; } public int StateCount { get; set; } } When I'm using the underlying data provider (Entity Framework, NHibernate, PetaPoco, etc) directly in my UI layer, I can easily do something like this: IList<CountryListVM> list = db.Countries .OrderBy(c => c.Name) .Select(c => new CountryListVM() { CountryId = c.CountryId, Name = c.Name, CountryCode = c.CountryCode, StateCount = c.States.Count }) .ToList(); But when I'm using a repository or service pattern, I abstract away direct access to the data layer. It seems as though my options are to: Return the Country with a populated States collection, then map over in the UI layer. The downside to this approach is that I'm returning a lot more data than is actually needed. -or- Put all my view models into my Common dll library (as opposed to having them in the Models directory in my MVC app) and expand my repository to return specific view models instead of just the domain pocos. The downside to this approach is that I'm leaking UI specific stuff (MVC data validation annotations) into my previously clean POCOs. -or- Are there other options? How are you handling these types of things?

Read the article
Join and sum not compatible matrices through data.table

- by leodido

My goal is to "sum" two not compatible matrices (matrices with different dimensions) using (and preserving) row and column names. I've figured this approach: convert the matrices to data.table objects, join them and then sum columns vectors. An example: > M1 1 3 4 5 7 8 1 0 0 1 0 0 0 3 0 0 0 0 0 0 4 1 0 0 0 0 0 5 0 0 0 0 0 0 7 0 0 0 0 1 0 8 0 0 0 0 0 0 > M2 1 3 4 5 8 1 0 0 1 0 0 3 0 0 0 0 0 4 1 0 0 0 0 5 0 0 0 0 0 8 0 0 0 0 0 > M1 %ms% M2 1 3 4 5 7 8 1 0 0 2 0 0 0 3 0 0 0 0 0 0 4 2 0 0 0 0 0 5 0 0 0 0 0 0 7 0 0 0 0 1 0 8 0 0 0 0 0 0 This is my code: M1 <- matrix(c(0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0), byrow = TRUE, ncol = 6) colnames(M1) <- c(1,3,4,5,7,8) M2 <- matrix(c(0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0), byrow = TRUE, ncol = 5) colnames(M2) <- c(1,3,4,5,8) # to data.table objects DT1 <- data.table(M1, keep.rownames = TRUE, key = "rn") DT2 <- data.table(M2, keep.rownames = TRUE, key = "rn") # join and sum of common columns if (nrow(DT1) > nrow(DT2)) { A <- DT2[DT1, roll = TRUE] A[, list(X1 = X1 + X1.1, X3 = X3 + X3.1, X4 = X4 + X4.1, X5 = X5 + X5.1, X7, X8 = X8 + X8.1), by = rn] } That outputs: rn X1 X3 X4 X5 X7 X8 1: 1 0 0 2 0 0 0 2: 3 0 0 0 0 0 0 3: 4 2 0 0 0 0 0 4: 5 0 0 0 0 0 0 5: 7 0 0 0 0 1 0 6: 8 0 0 0 0 0 0 Then I can convert back this data.table to a matrix and fix row and column names. The questions are: how to generalize this procedure? I need a way to automatically create list(X1 = X1 + X1.1, X3 = X3 + X3.1, X4 = X4 + X4.1, X5 = X5 + X5.1, X7, X8 = X8 + X8.1) because i want to apply this function to matrices which dimensions (and row/columns names) are not known in advance. In summary I need a merge procedure that behaves as described. there are other strategies/implementations that achieve the same goal that are, at the same time, faster and generalized? (hoping that some data.table monster help me) to what kind of join (inner, outer, etc. etc.) is assimilable this procedure? Thanks in advance. p.s.: I'm using data.table version 1.8.2 EDIT - SOLUTIONS @Aaron solution. No external libraries, only base R. It works also on list of matrices. add_matrices_1 <- function(...) { a <- list(...) cols <- sort(unique(unlist(lapply(a, colnames)))) rows <- sort(unique(unlist(lapply(a, rownames)))) out <- array(0, dim = c(length(rows), length(cols)), dimnames = list(rows,cols)) for (m in a) out[rownames(m), colnames(m)] <- out[rownames(m), colnames(m)] + m out } @MadScone solution. Used reshape2 package. It works only on two matrices per call. add_matrices_2 <- function(m1, m2) { m <- acast(rbind(melt(M1), melt(M2)), Var1~Var2, fun.aggregate = sum) mn <- unique(colnames(m1), colnames(m2)) rownames(m) <- mn colnames(m) <- mn m } BENCHMARK (100 runs with microbenchmark package) Unit: microseconds expr min lq median uq max 1 add_matrices_1 196.009 257.5865 282.027 291.2735 549.397 2 add_matrices_2 13737.851 14697.9790 14864.778 16285.7650 25567.448 No need to comment the benchmark: @Aaron solution wins. I'll continue to investigate a similar solution for data.table objects. I'll add other solutions eventually reported or discovered.

Read the article
Please help fix and optimize this query

- by user607217

I am working on a system to find potential duplicates in our customers table (SQL 2005). I am using the built-in SOUNDEX value that our software computes when customers are added/updated, but I also implemented the double metaphone algorithm for better matching. This is the most-nested query I have created, and I can't help but think there is a better way to do it and I'd like to learn. In the inner-most query I am joining the customer table to the metaphone table I created, then finding customers that have identical pKey (primary phonetic key). I take that, union that with customers that have matching soundex values, and then proceed to score those matches with various text similarity functions. This is currently working, but I would also like to add a union of customers whose aKey (alternate phonetic key) match. This would be identical to "QUERY A" except to substitute on (c1Akey = c2Akey) for the join. However, when I attempt to include that, I get errors when I try to execute my query. Here is the code: --Create aggregate ranking select c1Name, c2Name, nDiff, c1Addr, c2Addr, aDiff, c1SSN, c2SSN, sDiff, c1DOB, c2DOB, dDiff, nDiff+aDiff+dDiff+sDiff as Score ,(sDiff+dDiff)*1.5 + (nDiff+dDiff)*1.5 + (nDiff+sDiff)*1.5 + aDiff *.5 + nDiff *.5 as [Rank] FROM ( --Create match scores for different fields SELECT c1Name, c2Name, c1Addr, c2Addr, c1SSN, c2SSN, c1LTD, c2LTD, c1DOB, c2DOB, dbo.Jaro(c1name, c2name) AS nDiff, dbo.JaroWinkler(c1addr, c2addr) AS aDiff, CASE WHEN c1dob = '1901-01-01' OR c2dob = '1901-01-01' OR c1dob = '1800-01-01' OR c2dob = '1800-01-01' THEN .5 ELSE dbo.SmithWaterman(c1dob, c2dob) END AS dDiff, CASE WHEN c1ssn = '000-00-0000' OR c2ssn = '000-00-0000' THEN .5 ELSE dbo.Jaro(c1ssn, c2ssn) END AS sDiff FROM -- Generate list of possible matches based on multiple phonetic matching fields ( select * from -- List of similar names from pKey field of ##Metaphone table --QUERY A BEGIN (select customers.custno as c1Custno, name as c1Name, haddr as c1Addr, ssn as c1SSN, lasttripdate as c1LTD, dob as c1DOB, soundex as c1Soundex, pkey as c1Pkey, akey as c1Akey from Customers WITH (nolock) join ##Metaphone on customers.custno = ##Metaphone.custno) as c1 JOIN (select customers.custno as c2Custno, name as c2Name, haddr as c2Addr, ssn as c2SSN, lasttripdate as c2LTD, dob as c2DOB, soundex as c2Soundex, pkey as c2Pkey, akey as c2Akey from Customers with (nolock) join ##Metaphone on customers.custno = ##Metaphone.custno) as c2 on (c1Pkey = c2Pkey) and (c1Custno < c2Custno) WHERE (c1Name <> 'PARENT, GUARDIAN') and c1soundex != c2soundex --QUERY A END union --List of similar names from pregenerated SOUNDEX field (select t1.custno, t1.name, t1.haddr, t1.ssn, t1.lasttripdate, t1.dob, t1.[soundex], 0, 0, t2.custno, t2.name, t2.haddr, t2.ssn, t2.lasttripdate, t2.dob, t2.[soundex], 0, 0 from Customers t1 WITH (nolock) join customers t2 with (nolock) on t1.[soundex] = t2.[soundex] and t1.custno < t2.custno where (t1.name <> 'PARENT, GUARDIAN')) ) as a ) as b where (sDiff+dDiff)*1.5 + (nDiff+dDiff)*1.5 + (nDiff+sDiff)*1.5 + aDiff *.5 + nDiff *.5 >= 7.5 order by [rank] desc, score desc Previously, I was using joins such as on c1.pkey = c2.pkey or c1.akey = c2.akey or c1.soundex = c2.soundex but the performance was horrendous, and using unions seems to be working a lot better. Out of 103K customers, tt is currently generating a list of 8.5M potential matches (based on the phonetic codes) in 2.25 minutes, and then taking another 2 to score, rank and filter those down to about 3000. So I am happy with the performance, I just can't help but think there is a better way to structure this, and I need help adding the extra union condition. Thanks!

Read the article

Search Results

Search found 9706 results on 389 pages for 'aggregate functions'.

Page 288/389 | < Previous Page | 284 285 286 287 288 289 290 291 292 293 294 295 | Next Page >

- by Eric Nelson

- by jamiet

- by Blaisorblade

- by MarkPearl

- by Korvin Szanto

- by Rashmi Kohli

- by Ronald

- by Pureferret

- by tandu

- by TheBeefMightBeTough

- by Flexo

- by Chuck

- by AlexW

- by statirasystems

- by manseuk

- by sigil

- by Corwin

- by Taytay

- by FyrePlanet

- by wageoghe

- by Tom Bushell

- by TheLizardKing

- by Sam

- by leodido

- by user607217

< Previous Page | 284 285 286 287 288 289 290 291 292 293 294 295 | Next Page >