Search Results

Search found 97 results on 4 pages for 'algorithmic'.

Page 4/4 | < Previous Page | 1 2 3 4 

  • FIlling a Java Bean tree structure from a csv flat file

    - by Clem
    Hi, I'm currently trying to construct a list of bean classes in Java from a flat description file formatted in csv. Concretely : Here is the structure of the csv file : MES_ID;GRP_PARENT_ID;GRP_ID;ATTR_ID M1 ; ;G1 ;A1 M1 ; ;G1 ;A2 M1 ;G1 ;G2 ;A3 M1 ;G1 ;G2 ;A4 M1 ;G2 ;G3 ;A5 M1 ; ;G4 ;A6 M1 ; ;G4 ;A7 M1 ; ;G4 ;A8 M2 ; ;G1 ;A1 M2 ; ;G1 ;A2 M2 ; ;G2 ;A3 M2 ; ;G2 ;A4 It corresponds to the hierarchical data structure : M1 ---G1 ------A1 ------A2 ------G2 ---------A3 ---------A4 ---------G3 ------------A5 ------G4 ---------A7 ---------A8 M2 ---G1 ------A1 ------A2 ---G2 ------A3 ------A4 Remarks : A message M can have an infinite number of groups G and attributes A A group G can have an infinite number of attributes and an infinite number of under-groups each of them having under-groups too That beeing said, I'm trying to read this flat csv decription to store it in this structure of beans : Map<String, MBean> messages = new HashMap<String, Mbean>(); == public class MBean { private String mes_id; private Map<String, GBean> groups; } public class GBean { private String grp_id; private Map<String, ABean> attributes; private Map<String, GBean> underGroups; } public class ABean { private String attr_id; } Reading the csv file sequentially is ok and I've been investigating how to use recursion to store the description data, but couldn't find a way. Thanks in advance for any of your algorithmic ideas. I hope it will put you in the mood of thinking about this ... I've to admit that I'm out of ideas :s

    Read the article

  • Autodetect Presence of CSV Headers in a File

    - by banzaimonkey
    Short question: How do I automatically detect whether a CSV file has headers in the first row? Details: I've written a small CSV parsing engine that places the data into an object that I can access as (approximately) an in-memory database. The original code was written to parse third-party CSV with a predictable format, but I'd like to be able to use this code more generally. I'm trying to figure out a reliable way to automatically detect the presence of CSV headers, so the script can decide whether to use the first row of the CSV file as keys / column names or start parsing data immediately. Since all I need is a boolean test, I could easily specify an argument after inspecting the CSV file myself, but I'd rather not have to (go go automation). I imagine I'd have to parse the first 3 to ? rows of the CSV file and look for a pattern of some sort to compare against the headers. I'm having nightmares of three particularly bad cases in which: The headers include numeric data for some reason The first few rows (or large portions of the CSV) are null There headers and data look too similar to tell them apart If I can get a "best guess" and have the parser fail with an error or spit out a warning if it can't decide, that's OK. If this is something that's going to be tremendously expensive in terms of time or computation (and take more time than it's supposed to save me) I'll happily scrap the idea and go back to working on "important things". I'm working with PHP, but this strikes me as more of an algorithmic / computational question than something that's implementation-specific. If there's a simple algorithm I can use, great. If you can point me to some relevant theory / discussion, that'd be great, too. If there's a giant library that does natural language processing or 300 different kinds of parsing, I'm not interested.

    Read the article

  • Better ways to implement a modulo operation (algorithm question)

    - by ryxxui
    I've been trying to implement a modular exponentiator recently. I'm writing the code in VHDL, but I'm looking for advice of a more algorithmic nature. The main component of the modular exponentiator is a modular multiplier which I also have to implement myself. I haven't had any problems with the multiplication algorithm- it's just adding and shifting and I've done a good job of figuring out what all of my variables mean so that I can multiply in a pretty reasonable amount of time. The problem that I'm having is with implementing the modulus operation in the multiplier. I know that performing repeated subtractions will work, but it will also be slow. I found out that I could shift the modulus to effectively subtract large multiples of the modulus but I think there might still be better ways to do this. The algorithm that I'm using works something like this (weird pseudocode follows): result,modulus : integer (n bits) (previously defined) shiftcount : integer (initialized to zero) while( (modulus<result) and (modulus(n-1) != 1) ){ modulus = modulus << 1 shiftcount++ } for(i=shiftcount;i>=0;i++){ if(modulus<result){result = result-modulus} if(i!=0){modulus = modulus << 1} } So...is this a good algorithm, or at least a good place to start? Wikipedia doesn't really discuss algorithms for implementing the modulo operation, and whenever I try to search elsewhere I find really interesting but incredibly complicated (and often unrelated) research papers and publications. If there's an obvious way to implement this that I'm not seeing, I'd really appreciate some feedback.

    Read the article

  • Interactive Web Collaboration Platform

    - by user1477586
    I am currently unsatisfied with most, if not all, web-based collaboration tools I've ever used. Examples of what I'm considering a "web-based collaboration tool" are TWiki, WebEx, etc. What I have in mind of developing is to develop my own. The target "market" is for software/hardware developers who will likely be working on separate projects linked by a common goal. Example: You work at a company who is designing a brand new printer. So, one man is working on the ink cartridge, another on the paper feed, one on the circuitry of the printer and one more is working on the drivers. What I would like to make (unless it already exists) is a website that, upon loading, presents the user with a web that has each project within it's own bubble, graphically. These bubbles would all be linked to the central bubble of "[Company Name]". Upon selecting an individual bubble the browser would focus and zoom in on that project bubble and expand more information (in bullets or with more affiliated bubbles) about subprojects, progress, setbacks, etc. Then, at a terminal "node" (i.e. a bubble with no sub-bubbles) a selection would then load a page with information relevant to that node. That's a lot of monologue. In short, I want to know how I might approach this. My background is with algorithmic use of Python and C++. I've never done interactive web design like this; though, I have gone through the W3C tutorials on HTML4/5. I'm willing to learn a new language I'm just wondering which language/set of languages seem(s) most appropriate for this project. Thanks, world, for imparting your knowledge to me. An example of what the start page might look like is:

    Read the article

  • Quickest algorithm for finding sets with high intersection

    - by conradlee
    I have a large number of user IDs (integers), potentially millions. These users all belong to various groups (sets of integers), such that there are on the order of 10 million groups. To simplify my example and get to the essence of it, let's assume that all groups contain 20 user IDs (i.e., all integer sets have a cardinality of 100). I want to find all pairs of integer sets that have an intersection of 15 or greater. Should I compare every pair of sets? (If I keep a data structure that maps userIDs to set membership, this would not be necessary.) What is the quickest way to do this? That is, what should my underlying data structure be for representing the integer sets? Sorted sets, unsorted---can hashing somehow help? And what algorithm should I use to compute set intersection)? I prefer answers that relate C/C++ (especially STL), but also any more general, algorithmic insights are welcome. Update Also, note that I will be running this in parallel in a shared memory environment, so ideas that cleanly extend to a parallel solution are preferred.

    Read the article

  • Verizon SongID - How is it programmed?

    - by CheeseConQueso
    For anyone not familiar with Verizon's SongID program, it is a free application downloadable through Verizon's VCast network. It listens to a song for 10 seconds at any point during the song and then sends this data to some all-knowing algorithmic beast that chews it up and sends you back all the ID3 tags (artist, album, song, etc...) The first two parts and last part are straightforward, but what goes on during the processing after the recorded sound is sent? I figure it must take the sound file (what format?), parse it (how? with what?) for some key identifiers (what are these? regular attributes of wave functions? phase/shift/amplitude/etc), and check it against a database. Everything I find online about how this works is something generic like what I typed above. From audiotag.info This service is based on a sophisticated audio recognition algorithm combining advanced audio fingerprinting technology and a large songs' database. When you upload an audio file, it is being analyzed by an audio engine. During the analysis its audio “fingerprint” is extracted and identified by comparing it to the music database. At the completion of this recognition process, information about songs with their matching probabilities are displayed on screen.

    Read the article

  • Performing text processing on flatpage content to include handling of custom tag

    - by Dzejkob
    Hi. I'm using flatpages app in my project to manage some html content. That content will include images, so I've made a ContentImage model allowing user to upload images using admin panel. The user should then be able to include those images in content of the flatpages. He can of course do that by manually typing image url into <img> tag, but that's not what I'm looking for. To make including images more convenient, I'm thinking about something like this: User edits an additional, let's say pre_content field of CustomFlatPage model (I'm using custom flatpage model already) instead of defining <img> tags directly, he uses a custom tag, something like [img=...] where ... is name of the ContentImage instance now the hardest part: before CustomFlatPage is saved, pre_content field is checked for all [img=...] occurences and they are processed like this: ContentImage model is searched if there's image instance with given name and if so, [img=...] is replaced with proper <img> tag. flatpage actual content is filled with processed pre_content and then flatpage is saved (pre_content is leaved unchanged, as edited by user) The part that I can't cope with is text processing. Should I use regular expressions? Apparently they can be slow for large strings. And how to organize logic? I assume it's rather algorithmic question, but I'm not familliar with text processing in Python enough, to do it myself. Can somebody give me any clues?

    Read the article

  • Architecture for database analytics

    - by David Cournapeau
    Hi, We have an architecture where we provide each customer Business Intelligence-like services for their website (internet merchant). Now, I need to analyze those data internally (for algorithmic improvement, performance tracking, etc...) and those are potentially quite heavy: we have up to millions of rows / customer / day, and I may want to know how many queries we had in the last month, weekly compared, etc... that is the order of billions entries if not more. The way it is currently done is quite standard: daily scripts which scan the databases, and generate big CSV files. I don't like this solutions for several reasons: as typical with those kinds of scripts, they fall into the write-once and never-touched-again category tracking things in "real-time" is necessary (we have separate toolset to query the last few hours ATM). this is slow and non-"agile" Although I have some experience in dealing with huge datasets for scientific usage, I am a complete beginner as far as traditional RDBM go. It seems that using column-oriented database for analytics could be a solution (the analytics don't need most of the data we have in the app database), but I would like to know what other options are available for this kind of issues.

    Read the article

  • How can I create photo effects in Android?

    - by PaulH
    I'd like to make an Android app that lets a user apply cool effects to photos taken with the camera. There are already a few out there, I know, but I'd like to try my own hand at one. I'm trying to figure out the best way to implement these effects. Here are some examples from the excellent Vignette app (which I own): http://www.flickr.com/groups/vignetteforandroid/pool/ I have been googling and stack-overflowing, but so far I've mostly found some references to published papers or books. I am ordering this one from Amazon presently - Digital Image Processing: An Algorithmic Introduction using Java After some reading, I think I have a basic understanding of manipulating the RGB values for all the pixels in the image. My main question is how do I come up with a transformation that produces cool effects? By cool effects I mean some like those in the Vignette app or IPhone apps: ToyCamera Polarize I already have quite a bit of experience with Java, and I've made my first app for android already. Any ideas? Thanks in advance.

    Read the article

  • Three most critical programming concepts

    - by Todd
    I know this has probably been asked in one form or fashion but I wanted to pose it once again within the context of my situation (and probably others here @ SO). I made a career change to Software Engineering some time ago without having an undergrad or grad degree in CS. I've supplemented my undergrad and grad studies in business with programming courses (VB, Java,C, C#) but never performed academic coursework in the other related disciplines (algorithms, design patterns, discrete math, etc.)...just mostly self-study. I know there are several of you who have either performed interviews and/or made hiring decisions. Given recent trends in demand, what would you say are the three most essential Comp Sci concepts that a developer should have a solid grasp of outside of language syntax? For example, I've seen blog posts of the "Absolute minimum X that every programmer must know" variety...that's what I'm looking for. Again if it's truly a redundancy please feel free to close; my feelings won't be hurt. (Closest ones I could find were http://stackoverflow.com/questions/164048/basic-programming-algorithmic-concepts- which was geared towards a true beginner, and http://stackoverflow.com/questions/648595/essential-areas-of-knowledge-which I didn't feel was concrete enough). Thanks in advance all! T.

    Read the article

  • Why junit ComparisonFailure is not used by assertEquals(Object, Object) ?

    - by Philippe Blayo
    In Junit 4, do you see any drawback to throw a ComparisonFailure instead of an AssertionError when assertEquals(Object, Object) fails ? assertEquals(Object, Object) throws a ComparisonFailure if both expected and actual are String an AssertionError if either is not a String @Test(expected=ComparisonFailure.class ) public void twoString() { assertEquals("a String", "another String"); } @Test(expected=AssertionError.class ) public void oneString() { assertEquals("a String", new Object()); } The two reasons why I ask the question: ComparisonFailure provide far more readable way to spot the differences in dialog box of eclipse or Intellij IDEA (FEST-Assert throws this exception) Junit 4 already use String.valueOf(Object) to build message "expected ... but was ..." (format method invoqued by Assert.assertEquals(message, Object, Object) in junit-4.8.2): static String format(String message, Object expected, Object actual) { ... String expectedString= String.valueOf(expected); String actualString= String.valueOf(actual); if (expectedString.equals(actualString)) return formatted + "expected: " + formatClassAndValue(expected, expectedString) +" but was: " + formatClassAndValue(actual, actualString); else return formatted +"expected:<"+ expectedString +"> but was:<"+ actualString +">"; Isn't it possible in assertEquals(message, Object, Object) to replace fail(format(message, expected, actual)); by throw new ComparisonFailure(message, formatClassAndValue(expectedObject, expectedString), formatClassAndValue(actualObject, actualString)); Do you see any compatibility issue with other tool, any algorithmic problem with that... ?

    Read the article

  • What happens when you create an instance of an object containing no state in C#?

    - by liquorice
    I am I think ok at algorithmic programming, if that is the right term? I used to play with turbo pascal and 8086 assembly language back in the 1980s as a hobby. But only very small projects and I haven't really done any programming in the 20ish years since then. So I am struggling for understanding like a drowning swimmer. So maybe this is a very niave question or I'm just making no sense at all, but say I have an object kind of like this: class Something : IDoer { void Do(ISomethingElse x) { x.DoWhatEverYouWant(42); } } And then I do var Thing1 = new Something(); var Thing2 = new Something(); Thing1.Do(blah); Thing2.Do(blah); does Thing1 = Thing2? does "new Something()" create anything? Or is it not much different different from having a static class, except I can pass it around and swap it out etc. Is the "Do" procedure in the same location in memory for both the Thing1(blah) and Thing2(blah) objects? I mean when executing it, does it mean there are two Something.Do procedures or just one?

    Read the article

  • Career guidance/advice for Junior-level Software Engineer [closed]

    - by John Do
    I have quite a few questions on my mind, so please bare with me. Please don't feel obligated to answer all of them, any as you choose will do. I'd appreciate if you could share some insight on any of these. Before I begin, some context: I currently have almost two years of professional experience as a Software Engineer, mainly developing software in Java. At this point, I feel that I have reached the peak in my career growth at the current company I am at and therefore I am looking for a new job, ideally again, as a Software Engineer. I have been interviewing for the past few months casually but have not had luck with companies I have a passion for. So, in no particular order - 1) In general, what are your thoughts on having graduate degrees in CS / Software Engineering. How much does it influence a salary increase, and do you think it's beneficial when working on real-world problems? I get the sense that a graduate degree in the field is trivial unless you really have a passion for research. 2) In general, in professional practice, how often had you have to write your own data structures and "complex" algorithms from scratch? In my own work, I have found myself relying mainly on third-party frameworks and the Java standard library to implement solutions as per business requirements. What are your thoughts on this? 3) In terms of resume, I feel the most ambivalent here. I want to be able to "blemish" my resume to a certain extent so that it stands out from others', but at the same time I do not want to over-exagerate my abilities. How do you strike a balance here? For example: I say that I am proficient in Java with data structures and algorithms. This is obviously a subjective and relative statement. I've taken the classes in my undergrad, and I've applied it in my work experience. What I feel as "prociency" can be seen as junior-level to others. How do you know what to say? Most of the time, recruiters (with no technical background) will be looking for keywords that stand out. This leads me to my next question (4). 4) Just from interviewing for the past few months (and getting plenty of rejections), I've come to realize that I may not be as proficient in data structures and algorithms as I thought I was. Do you think it's a good idea to remove the "proficient in java/data structure and algorithms"? I feel that being too hoenst on the resume will impede me from scoring opportunities to even have an interview with top-notch companies. What are your thoughts? 5) What is the absolute "must-have" knowledge going into a technical interview? I've been practicing several algorithmic and data sturcture problems now, and I feel that my abilities to solve arbitrary problems efficiently has not gotten significantly better. Do you think these abilities are something innate - it's either you have in you, or you don't? How can you teach yourself to learn, if you will? 6) How easy is it to go from industry/function to the next? I work mainly with backend technologies and I'm now interested in working with the frontend, i.e javascript,jquery,php or even mobile development. In your own experience, how did you not get pidgeon holed in your career? I feel that the choices you make now ultimately decide your future. As cliche as it sounds, I think it may be true. Here's what I mean: you've worked mainly as a backend engineer, people are interested in you doing the same thing since you've already accumulated experience in that function. How do get experience in a new function if people won't accept you because you don't already have it? It's a catch 22, you see... Are side projects the only real way to help you move from one function to another that you're truly interested in? For example: I could start writing my own mobile applications, even though I've worked mainly on the backend. Thanks so much for the long read. As a relatively new engineer to the real world, I am very humble and would like those who are experienced to shed some light. Thank you so much.

    Read the article

  • Developing Schema Compare for Oracle (Part 6): 9i Query Performance

    - by Simon Cooper
    All throughout the EAP and beta versions of Schema Compare for Oracle, our main request was support for Oracle 9i. After releasing version 1.0 with support for 10g and 11g, our next step was then to get version 1.1 of SCfO out with support for 9i. However, there were some significant problems that we had to overcome first. This post will concentrate on query execution time. When we first tested SCfO on a 9i server, after accounting for various changes to the data dictionary, we found that database registration was taking a long time. And I mean a looooooong time. The same database that on 10g or 11g would take a couple of minutes to register would be taking upwards of 30 mins on 9i. Obviously, this is not ideal, so a poke around the query execution plans was required. As an example, let's take the table population query - the one that reads ALL_TABLES and joins it with a few other dictionary views to get us back our list of tables. On 10g, this query takes 5.6 seconds. On 9i, it takes 89.47 seconds. The difference in execution plan is even more dramatic - here's the (edited) execution plan on 10g: -------------------------------------------------------------------------------| Id | Operation | Name | Bytes | Cost |-------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 108K| 939 || 1 | SORT ORDER BY | | 108K| 939 || 2 | NESTED LOOPS OUTER | | 108K| 938 ||* 3 | HASH JOIN RIGHT OUTER | | 103K| 762 || 4 | VIEW | ALL_EXTERNAL_LOCATIONS | 2058 | 3 ||* 20 | HASH JOIN RIGHT OUTER | | 73472 | 759 || 21 | VIEW | ALL_EXTERNAL_TABLES | 2097 | 3 ||* 34 | HASH JOIN RIGHT OUTER | | 39920 | 755 || 35 | VIEW | ALL_MVIEWS | 51 | 7 || 58 | NESTED LOOPS OUTER | | 39104 | 748 || 59 | VIEW | ALL_TABLES | 6704 | 668 || 89 | VIEW PUSHED PREDICATE | ALL_TAB_COMMENTS | 2025 | 5 || 106 | VIEW | ALL_PART_TABLES | 277 | 11 |------------------------------------------------------------------------------- And the same query on 9i: -------------------------------------------------------------------------------| Id | Operation | Name | Bytes | Cost |-------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 16P| 55G|| 1 | SORT ORDER BY | | 16P| 55G|| 2 | NESTED LOOPS OUTER | | 16P| 862M|| 3 | NESTED LOOPS OUTER | | 5251G| 992K|| 4 | NESTED LOOPS OUTER | | 4243M| 2578 || 5 | NESTED LOOPS OUTER | | 2669K| 1440 ||* 6 | HASH JOIN OUTER | | 398K| 302 || 7 | VIEW | ALL_TABLES | 342K| 276 || 29 | VIEW | ALL_MVIEWS | 51 | 20 ||* 50 | VIEW PUSHED PREDICATE | ALL_TAB_COMMENTS | 2043 | ||* 66 | VIEW PUSHED PREDICATE | ALL_EXTERNAL_TABLES | 1777K| ||* 80 | VIEW PUSHED PREDICATE | ALL_EXTERNAL_LOCATIONS | 1744K| ||* 96 | VIEW | ALL_PART_TABLES | 852K| |------------------------------------------------------------------------------- Have a look at the cost column. 10g's overall query cost is 939, and 9i is 55,000,000,000 (or more precisely, 55,496,472,769). It's also having to process far more data. What on earth could be causing this huge difference in query cost? After trawling through the '10g New Features' documentation, we found item 1.9.2.21. Before 10g, Oracle advised that you do not collect statistics on data dictionary objects. From 10g, it advised that you do collect statistics on the data dictionary; for our queries, Oracle therefore knows what sort of data is in the dictionary tables, and so can generate an efficient execution plan. On 9i, no statistics are present on the system tables, so Oracle has to use the Rule Based Optimizer, which turns most LEFT JOINs into nested loops. If we force 9i to use hash joins, like 10g, we get a much better plan: -------------------------------------------------------------------------------| Id | Operation | Name | Bytes | Cost |-------------------------------------------------------------------------------| 0 | SELECT STATEMENT | | 7587K| 3704 || 1 | SORT ORDER BY | | 7587K| 3704 ||* 2 | HASH JOIN OUTER | | 7587K| 822 ||* 3 | HASH JOIN OUTER | | 5262K| 616 ||* 4 | HASH JOIN OUTER | | 2980K| 465 ||* 5 | HASH JOIN OUTER | | 710K| 432 ||* 6 | HASH JOIN OUTER | | 398K| 302 || 7 | VIEW | ALL_TABLES | 342K| 276 || 29 | VIEW | ALL_MVIEWS | 51 | 20 || 50 | VIEW | ALL_PART_TABLES | 852K| 104 || 78 | VIEW | ALL_TAB_COMMENTS | 2043 | 14 || 93 | VIEW | ALL_EXTERNAL_LOCATIONS | 1744K| 31 || 106 | VIEW | ALL_EXTERNAL_TABLES | 1777K| 28 |------------------------------------------------------------------------------- That's much more like it. This drops the execution time down to 24 seconds. Not as good as 10g, but still an improvement. There are still several problems with this, however. 10g introduced a new join method - a right outer hash join (used in the first execution plan). The 9i query optimizer doesn't have this option available, so forcing a hash join means it has to hash the ALL_TABLES table, and furthermore re-hash it for every hash join in the execution plan; this could be thousands and thousands of rows. And although forcing hash joins somewhat alleviates this problem on our test systems, there's no guarantee that this will improve the execution time on customers' systems; it may even increase the time it takes (say, if all their tables are partitioned, or they've got a lot of materialized views). Ideally, we would want a solution that provides a speedup whatever the input. To try and get some ideas, we asked some oracle performance specialists to see if they had any ideas or tips. Their recommendation was to add a hidden hook into the product that allowed users to specify their own query hints, or even rewrite the queries entirely. However, we would prefer not to take that approach; as well as a lot of new infrastructure & a rewrite of the population code, it would have meant that any users of 9i would have to spend some time optimizing it to get it working on their system before they could use the product. Another approach was needed. All our population queries have a very specific pattern - a base table provides most of the information we need (ALL_TABLES for tables, or ALL_TAB_COLS for columns) and we do a left join to extra subsidiary tables that fill in gaps (for instance, ALL_PART_TABLES for partition information). All the left joins use the same set of columns to join on (typically the object owner & name), so we could re-use the hash information for each join, rather than re-hashing the same columns for every join. To allow us to do this, along with various other performance improvements that could be done for the specific query pattern we were using, we read all the tables individually and do a hash join on the client. Fortunately, this 'pure' algorithmic problem is the kind that can be very well optimized for expected real-world situations; as well as storing row data we're not using in the hash key on disk, we use very specific memory-efficient data structures to store all the information we need. This allows us to achieve a database population time that is as fast as on 10g, and even (in some situations) slightly faster, and a memory overhead of roughly 150 bytes per row of data in the result set (for schemas with 10,000 tables in that means an extra 1.4MB memory being used during population). Next: fun with the 9i dictionary views.

    Read the article

  • How can I estimate the entropy of a password?

    - by Wug
    Having read various resources about password strength I'm trying to create an algorithm that will provide a rough estimation of how much entropy a password has. I'm trying to create an algorithm that's as comprehensive as possible. At this point I only have pseudocode, but the algorithm covers the following: password length repeated characters patterns (logical) different character spaces (LC, UC, Numeric, Special, Extended) dictionary attacks It does NOT cover the following, and SHOULD cover it WELL (though not perfectly): ordering (passwords can be strictly ordered by output of this algorithm) patterns (spatial) Can anyone provide some insight on what this algorithm might be weak to? Specifically, can anyone think of situations where feeding a password to the algorithm would OVERESTIMATE its strength? Underestimations are less of an issue. The algorithm: // the password to test password = ? length = length(password) // unique character counts from password (duplicates discarded) uqlca = number of unique lowercase alphabetic characters in password uquca = number of uppercase alphabetic characters uqd = number of unique digits uqsp = number of unique special characters (anything with a key on the keyboard) uqxc = number of unique special special characters (alt codes, extended-ascii stuff) // algorithm parameters, total sizes of alphabet spaces Nlca = total possible number of lowercase letters (26) Nuca = total uppercase letters (26) Nd = total digits (10) Nsp = total special characters (32 or something) Nxc = total extended ascii characters that dont fit into other categorys (idk, 50?) // algorithm parameters, pw strength growth rates as percentages (per character) flca = entropy growth factor for lowercase letters (.25 is probably a good value) fuca = EGF for uppercase letters (.4 is probably good) fd = EGF for digits (.4 is probably good) fsp = EGF for special chars (.5 is probably good) fxc = EGF for extended ascii chars (.75 is probably good) // repetition factors. few unique letters == low factor, many unique == high rflca = (1 - (1 - flca) ^ uqlca) rfuca = (1 - (1 - fuca) ^ uquca) rfd = (1 - (1 - fd ) ^ uqd ) rfsp = (1 - (1 - fsp ) ^ uqsp ) rfxc = (1 - (1 - fxc ) ^ uqxc ) // digit strengths strength = ( rflca * Nlca + rfuca * Nuca + rfd * Nd + rfsp * Nsp + rfxc * Nxc ) ^ length entropybits = log_base_2(strength) A few inputs and their desired and actual entropy_bits outputs: INPUT DESIRED ACTUAL aaa very pathetic 8.1 aaaaaaaaa pathetic 24.7 abcdefghi weak 31.2 H0ley$Mol3y_ strong 72.2 s^fU¬5ü;y34G< wtf 88.9 [a^36]* pathetic 97.2 [a^20]A[a^15]* strong 146.8 xkcd1** medium 79.3 xkcd2** wtf 160.5 * these 2 passwords use shortened notation, where [a^N] expands to N a's. ** xkcd1 = "Tr0ub4dor&3", xkcd2 = "correct horse battery staple" The algorithm does realize (correctly) that increasing the alphabet size (even by one digit) vastly strengthens long passwords, as shown by the difference in entropy_bits for the 6th and 7th passwords, which both consist of 36 a's, but the second's 21st a is capitalized. However, they do not account for the fact that having a password of 36 a's is not a good idea, it's easily broken with a weak password cracker (and anyone who watches you type it will see it) and the algorithm doesn't reflect that. It does, however, reflect the fact that xkcd1 is a weak password compared to xkcd2, despite having greater complexity density (is this even a thing?). How can I improve this algorithm? Addendum 1 Dictionary attacks and pattern based attacks seem to be the big thing, so I'll take a stab at addressing those. I could perform a comprehensive search through the password for words from a word list and replace words with tokens unique to the words they represent. Word-tokens would then be treated as characters and have their own weight system, and would add their own weights to the password. I'd need a few new algorithm parameters (I'll call them lw, Nw ~= 2^11, fw ~= .5, and rfw) and I'd factor the weight into the password as I would any of the other weights. This word search could be specially modified to match both lowercase and uppercase letters as well as common character substitutions, like that of E with 3. If I didn't add extra weight to such matched words, the algorithm would underestimate their strength by a bit or two per word, which is OK. Otherwise, a general rule would be, for each non-perfect character match, give the word a bonus bit. I could then perform simple pattern checks, such as searches for runs of repeated characters and derivative tests (take the difference between each character), which would identify patterns such as 'aaaaa' and '12345', and replace each detected pattern with a pattern token, unique to the pattern and length. The algorithmic parameters (specifically, entropy per pattern) could be generated on the fly based on the pattern. At this point, I'd take the length of the password. Each word token and pattern token would count as one character; each token would replace the characters they symbolically represented. I made up some sort of pattern notation, but it includes the pattern length l, the pattern order o, and the base element b. This information could be used to compute some arbitrary weight for each pattern. I'd do something better in actual code. Modified Example: Password: 1234kitty$$$$$herpderp Tokenized: 1 2 3 4 k i t t y $ $ $ $ $ h e r p d e r p Words Filtered: 1 2 3 4 @W5783 $ $ $ $ $ @W9001 @W9002 Patterns Filtered: @P[l=4,o=1,b='1'] @W5783 @P[l=5,o=0,b='$'] @W9001 @W9002 Breakdown: 3 small, unique words and 2 patterns Entropy: about 45 bits, as per modified algorithm Password: correcthorsebatterystaple Tokenized: c o r r e c t h o r s e b a t t e r y s t a p l e Words Filtered: @W6783 @W7923 @W1535 @W2285 Breakdown: 4 small, unique words and no patterns Entropy: 43 bits, as per modified algorithm The exact semantics of how entropy is calculated from patterns is up for discussion. I was thinking something like: entropy(b) * l * (o + 1) // o will be either zero or one The modified algorithm would find flaws with and reduce the strength of each password in the original table, with the exception of s^fU¬5ü;y34G<, which contains no words or patterns.

    Read the article

  • How to build, sort and print a tree of a sort?

    - by Tuplanolla
    This is more of an algorithmic dilemma than a language-specific problem, but since I'm currently using Ruby I'll tag this as such. I've already spent over 20 hours on this and I would've never believed it if someone told me writing a LaTeX parser was a walk in the park in comparison. I have a loop to read hierarchies (that are prefixed with \m) from different files art.tex: \m{Art} graphical.tex: \m{Art}{Graphical} me.tex: \m{About}{Me} music.tex: \m{Art}{Music} notes.tex: \m{Art}{Music}{Sheet Music} site.tex: \m{About}{Site} something.tex: \m{Something} whatever.tex: \m{Something}{That}{Does Not}{Matter} and I need to sort them alphabetically and print them out as a tree About Me (me.tex) Site (site.tex) Art (art.tex) Graphical (graphical.tex) Music (music.tex) Sheet Music (notes.tex) Something (something.tex) That Does Not Matter (whatever.tex) in (X)HTML <ul> <li>About</li> <ul> <li><a href="me.tex">Me</a></li> <li><a href="site.tex">Site</a></li> </ul> <li><a href="art.tex">Art</a></li> <ul> <li><a href="graphical.tex">Graphical</a></li> <li><a href="music.tex">Music</a></li> <ul> <li><a href="notes.tex">Sheet Music</a></li> </ul> </ul> <li><a href="something.tex">Something</a></li> <ul> <li>That</li> <ul> <li>Doesn't</li> <ul> <li><a href="whatever.tex">Matter</a></li> </ul> </ul> </ul> </ul> using Ruby without Rails, which means that at least Array.sort and Dir.glob are available. All of my attempts were formed like this (as this part should work just fine). def fss_brace_array(ss_input)#a concise version of another function; converts {1}{2}...{n} into an array [1, 2, ..., n] or returns an empty array ss_output = ss_input[1].scan(%r{\{(.*?)\}}) rescue ss_output = [] ensure return ss_output end #define tree s_handle = File.join(:content.to_s, "*") Dir.glob("#{s_handle}.tex").each do |s_handle| File.open(s_handle, "r") do |f_handle| while s_line = f_handle.gets if s_all = s_line.match(%r{\\m\{(\{.*?\})+\}}) s_all = s_all.to_a #do something with tree, fss_brace_array(s_all) and s_handle break end end end end #do something else with tree

    Read the article

  • Improving performance on data pasting 2000 rows with validations

    - by Lohit
    I have N rows (which could be nothing less than 1000) on an excel spreadsheet. And in this sheet our project has 150 columns like this: Now, our application needs data to be copied (using normal Ctrl+C) and pasted (using Ctrl+V) from the excel file sheet on our GUI sheet. Copy pasting 1000 records takes around 5-6 seconds which is okay for our requirement, but the problem is when we need to make sure the data entered is valid. So we have to validate data in each row generate appropriate error messages and format the data as per requirement. So we need to at runtime parse and evaluate data in each row. Now all the formatting of data and validations come from the back-end database and we have it in a data-table (dtValidateAndFormatConditions). The conditions would be around 50. So you can see how slow this whole process becomes since N X 150 X 50 operations are required to complete this whole process. Initially it took approximately 2-3 minutes but now i have reduced it to 20 - 30 seconds. However i have increased the speed by making an expression parser of my own - and not by any algorithm, is there any other way i can improve performance, by using Divide and Conquer or some other mechanism. Currently i am not really sure how to go about this. Here is what part of my code looks like: public virtual void ValidateAndFormatOnCopyPaste(DataTable DtCopied, int CurRow) { foreach (DataRow dRow in dtValidateAndFormatConditions.Rows) { string Condition = dRow["Condition"]; string FormatValue = Value = dRow["Value"]; GetValidatedFormattedData(DtCopied,ref Condition, ref FormatValue ,iRowIndex); Condition = Parse(Condition); dRow["Condition"] = Condition; FormatValue = Parse(FormatValue ); dRow["Value"] = FormatValue; } } The above code gets called row-wise like this: public override void ValidateAndFormat(DataTable dtChangedRecords, CellRange cr) { int iRowStart = cr.Row, iRowEnd = cr.Row + cr.RowCount; for (int iRow = iRowStart; iRow < iRowEnd; iRow++) { ValidateAndFormatOnCopyPaste(dtChangedRecords,iRow); } } Please know my question needs a more algorithmic solution than code optimization, however any answers containing code related optimizations will be appreciated as well. (Tagged Linq because although not seen i have been using linq in some parts of my code).

    Read the article

  • How can I represent a line of music notes in a way that allows fast insertion at any index?

    - by chairbender
    For "fun", and to learn functional programming, I'm developing a program in Clojure that does algorithmic composition using ideas from this theory of music called "Westergaardian Theory". It generates lines of music (where a line is just a single staff consisting of a sequence of notes, each with pitches and durations). It basically works like this: Start with a line consisting of three notes (the specifics of how these are chosen are not important). Randomly perform one of several "operations" on this line. The operation picks randomly from all pairs of adjacent notes that meet a certain criteria (for each pair, the criteria only depends on the pair and is independent of the other notes in the line). It inserts 1 or several notes (depending on the operation) between the chosen pair. Each operation has its own unique criteria. Continue randomly performing these operations on the line until the line is the desired length. The issue I've run into is that my implementation of this is quite slow, and I suspect it could be made faster. I'm new to Clojure and functional programming in general (though I'm experienced with OO), so I'm hoping someone with more experience can point out if I'm not thinking in a functional paradigm or missing out on some FP technique. My current implementation is that each line is a vector containing maps. Each map has a :note and a :dur. :note's value is a keyword representing a musical note like :A4 or :C#3. :dur's value is a fraction, representing the duration of the note (1 is a whole note, 1/4 is a quarter note, etc...). So, for example, a line representing the C major scale starting on C3 would look like this: [ {:note :C3 :dur 1} {:note :D3 :dur 1} {:note :E3 :dur 1} {:note :F3 :dur 1} {:note :G3 :dur 1} {:note :A4 :dur 1} {:note :B4 :dur 1} ] This is a problematic representation because there's not really a quick way to insert into an arbitrary index of a vector. But insertion is the most frequently performed operation on these lines. My current terrible function for inserting notes into a line basically splits the vector using subvec at the point of insertion, uses conj to join the first part + notes + last part, then uses flatten and vec to make them all be in a one-dimensional vector. For example if I want to insert C3 and D3 into the the C major scale at index 3 (where the F3 is), it would do this (I'll use the note name in place of the :note and :dur maps): (conj [C3 D3 E3] [C3 D3] [F3 G3 A4 B4]), which creates [C3 D3 E3 [C3 D3] [F3 G3 A4 B4]] (vec (flatten previous-vector)) which gives [C3 D3 E3 C3 D3 F3 G3 A4 B4] The run time of that is O(n), AFAIK. I'm looking for a way to make this insertion faster. I've searched for information on Clojure data structures that have fast insertion but haven't found anything that would work. I found "finger trees" but they only allow fast insertion at the start or end of the list. Edit: I split this into two questions. The other part is here.

    Read the article

  • Auto-Suggest via &lsquo;Trie&rsquo; (Pre-fix Tree)

    - by Strenium
    Auto-Suggest (Auto-Complete) “thing” has been around for a few years. Here’s my little snippet on the subject. For one of my projects, I had to deal with a non-trivial set of items to be pulled via auto-suggest used by multiple concurrent users. Simple, dumb iteration through a list in local cache or back-end access didn’t quite cut it. Enter a nifty little structure, perfectly suited for storing and matching verbal data: “Trie” (http://tinyurl.com/db56g) also known as a Pre-fix Tree: “Unlike a binary search tree, no node in the tree stores the key associated with that node; instead, its position in the tree defines the key with which it is associated. All the descendants of a node have a common prefix of the string associated with that node, and the root is associated with the empty string. Values are normally not associated with every node, only with leaves and some inner nodes that correspond to keys of interest.” This is a very scalable, performing structure. Though, as usual, something ‘fast’ comes at a cost of ‘size’; fortunately RAM is more plentiful today so I can live with that. I won’t bore you with the detailed algorithmic performance here - Google can do a better job of such. So, here’s C# implementation of all this. Let’s start with individual node: Trie Node /// <summary> /// Contains datum of a single trie node. /// </summary> public class AutoSuggestTrieNode {     public char Value { get; set; }       /// <summary>     /// Gets a value indicating whether this instance is leaf node.     /// </summary>     /// <value>     ///     <c>true</c> if this instance is leaf node; otherwise, a prefix node <c>false</c>.     /// </value>     public bool IsLeafNode { get; private set; }       public List<AutoSuggestTrieNode> DescendantNodes { get; private set; }         /// <summary>     /// Initializes a new instance of the <see cref="AutoSuggestTrieNode"/> class.     /// </summary>     /// <param name="value">The phonetic value.</param>     /// <param name="isLeafNode">if set to <c>true</c> [is leaf node].</param>     public AutoSuggestTrieNode(char value = ' ', bool isLeafNode = false)     {         Value = value;         IsLeafNode = isLeafNode;           DescendantNodes = new List<AutoSuggestTrieNode>();     }       /// <summary>     /// Gets the descendants of the pre-fix node, if any.     /// </summary>     /// <param name="descendantValue">The descendant value.</param>     /// <returns></returns>     public AutoSuggestTrieNode GetDescendant(char descendantValue)     {         return DescendantNodes.FirstOrDefault(descendant => descendant.Value == descendantValue);     } }   Quite self-explanatory, imho. A node is either a “Pre-fix” or a “Leaf” node. “Leaf” contains the full “word”, while the “Pre-fix” nodes act as indices used for matching the results.   Ok, now the Trie: Trie Structure /// <summary> /// Contains structure and functionality of an AutoSuggest Trie (Pre-fix Tree) /// </summary> public class AutoSuggestTrie {     private readonly AutoSuggestTrieNode _root = new AutoSuggestTrieNode();       /// <summary>     /// Adds the word to the trie by breaking it up to pre-fix nodes + leaf node.     /// </summary>     /// <param name="word">Phonetic value.</param>     public void AddWord(string word)     {         var currentNode = _root;         word = word.Trim().ToLower();           for (int i = 0; i < word.Length; i++)         {             var child = currentNode.GetDescendant(word[i]);               if (child == null) /* this character hasn't yet been indexed in the trie */             {                 var newNode = new AutoSuggestTrieNode(word[i], word.Count() - 1 == i);                   currentNode.DescendantNodes.Add(newNode);                 currentNode = newNode;             }             else                 currentNode = child; /* this character is already indexed, move down the trie */         }     }         /// <summary>     /// Gets the suggested matches.     /// </summary>     /// <param name="word">The phonetic search value.</param>     /// <returns></returns>     public List<string> GetSuggestedMatches(string word)     {         var currentNode = _root;         word = word.Trim().ToLower();           var indexedNodesValues = new StringBuilder();         var resultBag = new ConcurrentBag<string>();           for (int i = 0; i < word.Trim().Length; i++)  /* traverse the trie collecting closest indexed parent (parent can't be leaf, obviously) */         {             var child = currentNode.GetDescendant(word[i]);               if (child == null || word.Count() - 1 == i)                 break; /* done looking, the rest of the characters aren't indexed in the trie */               indexedNodesValues.Append(word[i]);             currentNode = child;         }           Action<AutoSuggestTrieNode, string> collectAllMatches = null;         collectAllMatches = (node, aggregatedValue) => /* traverse the trie collecting matching leafNodes (i.e. "full words") */             {                 if (node.IsLeafNode) /* full word */                     resultBag.Add(aggregatedValue); /* thread-safe write */                   Parallel.ForEach(node.DescendantNodes, descendandNode => /* asynchronous recursive traversal */                 {                     collectAllMatches(descendandNode, String.Format("{0}{1}", aggregatedValue, descendandNode.Value));                 });             };           collectAllMatches(currentNode, indexedNodesValues.ToString());           return resultBag.OrderBy(o => o).ToList();     }         /// <summary>     /// Gets the total words (leafs) in the trie. Recursive traversal.     /// </summary>     public int TotalWords     {         get         {             int runningCount = 0;               Action<AutoSuggestTrieNode> traverseAllDecendants = null;             traverseAllDecendants = n => { runningCount += n.DescendantNodes.Count(o => o.IsLeafNode); n.DescendantNodes.ForEach(traverseAllDecendants); };             traverseAllDecendants(this._root);               return runningCount;         }     } }   Matching operations and Inserts involve traversing the nodes before the right “spot” is found. Inserts need be synchronous since ordering of data matters here. However, matching can be done in parallel traversal using recursion (line 64). Here’s sample usage:   [TestMethod] public void AutoSuggestTest() {     var autoSuggestCache = new AutoSuggestTrie();       var testInput = @"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer nec odio. Praesent libero.                 Sed cursus ante dapibus diam. Sed nisi. Nulla quis sem at nibh elementum imperdiet. Duis sagittis ipsum. Praesent mauris.                 Fusce nec tellus sed augue semper porta. Mauris massa. Vestibulum lacinia arcu eget nulla. Class aptent taciti sociosqu ad                 litora torquent per conubia nostra, per inceptos himenaeos. Curabitur sodales ligula in libero. Sed dignissim lacinia nunc.                 Curabitur tortor. Pellentesque nibh. Aenean quam. In scelerisque sem at dolor. Maecenas mattis. Sed convallis tristique sem.                 Proin ut ligula vel nunc egestas porttitor. Morbi lectus risus, iaculis vel, suscipit quis, luctus non, massa. Fusce ac                 turpis quis ligula lacinia aliquet. Mauris ipsum. Nulla metus metus, ullamcorper vel, tincidunt sed, euismod in, nibh. Quisque                 volutpat condimentum velit. Class aptent taciti sociosqu ad litora torquent per conubia nostra, per inceptos himenaeos. Nam                 nec ante. Sed lacinia, urna non tincidunt mattis, tortor neque adipiscing diam, a cursus ipsum ante quis turpis. Nulla                 facilisi. Ut fringilla. Suspendisse potenti. Nunc feugiat mi a tellus consequat imperdiet. Vestibulum sapien. Proin quam. Etiam                 ultrices. Suspendisse in justo eu magna luctus suscipit. Sed lectus. Integer euismod lacus luctus magna. Quisque cursus, metus                 vitae pharetra auctor, sem massa mattis sem, at interdum magna augue eget diam. Vestibulum ante ipsum primis in faucibus orci                 luctus et ultrices posuere cubilia Curae; Morbi lacinia molestie dui. Praesent blandit dolor. Sed non quam. In vel mi sit amet                 augue congue elementum. Morbi in ipsum sit amet pede facilisis laoreet. Donec lacus nunc, viverra nec.";       testInput.Split(' ').ToList().ForEach(word => autoSuggestCache.AddWord(word));       var testMatches = autoSuggestCache.GetSuggestedMatches("le"); }   ..and the result: That’s it!

    Read the article

  • Scala n00b: Critique my code

    - by Peter
    G'day everyone, I'm a Scala n00b (but am experienced with other languages) and am learning the language as I find time - very much enjoying it so far! Usually when learning a new language the first thing I do is implement Conway's Game of Life, since it's just complex enough to give a good sense of the language, but small enough in scope to be able to whip up in a couple of hours (most of which is spent wrestling with syntax). Anyhoo, having gone through this exercise with Scala I was hoping the Scala gurus out there might take a look at the code I've ended up with and provide feedback on it. I'm after anything - algorithmic improvements (particularly concurrent solutions!), stylistic improvements, alternative APIs or language constructs, disgust at the length of my function names - whatever feedback you've got, I'm keen to hear it! You should be able to run the following script via "scala GameOfLife.scala" - by default it will run a 20x20 board with a single glider on it - please feel free to experiment. // CONWAY'S GAME OF LIFE (SCALA) abstract class GameOfLifeBoard(val aliveCells : Set[Tuple2[Int, Int]]) { // Executes a "time tick" - returns a new board containing the next generation def tick : GameOfLifeBoard // Is the board empty? def empty : Boolean = aliveCells.size == 0 // Is the given cell alive? protected def alive(cell : Tuple2[Int, Int]) : Boolean = aliveCells contains cell // Is the given cell dead? protected def dead(cell : Tuple2[Int, Int]) : Boolean = !alive(cell) } class InfiniteGameOfLifeBoard(aliveCells : Set[Tuple2[Int, Int]]) extends GameOfLifeBoard(aliveCells) { // Executes a "time tick" - returns a new board containing the next generation override def tick : GameOfLifeBoard = new InfiniteGameOfLifeBoard(nextGeneration) // The next generation of this board protected def nextGeneration : Set[Tuple2[Int, Int]] = aliveCells flatMap neighbours filter shouldCellLiveInNextGeneration // Should the given cell should live in the next generation? protected def shouldCellLiveInNextGeneration(cell : Tuple2[Int, Int]) : Boolean = (alive(cell) && (numberOfAliveNeighbours(cell) == 2 || numberOfAliveNeighbours(cell) == 3)) || (dead(cell) && numberOfAliveNeighbours(cell) == 3) // The number of alive neighbours for the given cell protected def numberOfAliveNeighbours(cell : Tuple2[Int, Int]) : Int = aliveNeighbours(cell) size // Returns the alive neighbours for the given cell protected def aliveNeighbours(cell : Tuple2[Int, Int]) : Set[Tuple2[Int, Int]] = aliveCells intersect neighbours(cell) // Returns all neighbours (whether dead or alive) for the given cell protected def neighbours(cell : Tuple2[Int, Int]) : Set[Tuple2[Int, Int]] = Set((cell._1-1, cell._2-1), (cell._1, cell._2-1), (cell._1+1, cell._2-1), (cell._1-1, cell._2), (cell._1+1, cell._2), (cell._1-1, cell._2+1), (cell._1, cell._2+1), (cell._1+1, cell._2+1)) // Information on where the currently live cells are protected def xVals = aliveCells map { cell => cell._1 } protected def xMin = (xVals reduceLeft (_ min _)) - 1 protected def xMax = (xVals reduceLeft (_ max _)) + 1 protected def xRange = xMin until xMax + 1 protected def yVals = aliveCells map { cell => cell._2 } protected def yMin = (yVals reduceLeft (_ min _)) - 1 protected def yMax = (yVals reduceLeft (_ max _)) + 1 protected def yRange = yMin until yMax + 1 // Returns a simple graphical representation of this board override def toString : String = { var result = "" for (y <- yRange) { for (x <- xRange) { if (alive (x,y)) result += "# " else result += ". " } result += "\n" } result } // Equality stuff override def equals(other : Any) : Boolean = { other match { case that : InfiniteGameOfLifeBoard => (that canEqual this) && that.aliveCells == this.aliveCells case _ => false } } def canEqual(other : Any) : Boolean = other.isInstanceOf[InfiniteGameOfLifeBoard] override def hashCode = aliveCells.hashCode } class FiniteGameOfLifeBoard(val boardWidth : Int, val boardHeight : Int, aliveCells : Set[Tuple2[Int, Int]]) extends InfiniteGameOfLifeBoard(aliveCells) { override def tick : GameOfLifeBoard = new FiniteGameOfLifeBoard(boardWidth, boardHeight, nextGeneration) // Determines the coordinates of all of the neighbours of the given cell override protected def neighbours(cell : Tuple2[Int, Int]) : Set[Tuple2[Int, Int]] = super.neighbours(cell) filter { cell => cell._1 >= 0 && cell._1 < boardWidth && cell._2 >= 0 && cell._2 < boardHeight } // Information on where the currently live cells are override protected def xRange = 0 until boardWidth override protected def yRange = 0 until boardHeight // Equality stuff override def equals(other : Any) : Boolean = { other match { case that : FiniteGameOfLifeBoard => (that canEqual this) && that.boardWidth == this.boardWidth && that.boardHeight == this.boardHeight && that.aliveCells == this.aliveCells case _ => false } } override def canEqual(other : Any) : Boolean = other.isInstanceOf[FiniteGameOfLifeBoard] override def hashCode : Int = { 41 * ( 41 * ( 41 + super.hashCode ) + boardHeight.hashCode ) + boardWidth.hashCode } } class GameOfLife(initialBoard: GameOfLifeBoard) { // Run the game of life until the board is empty or the exact same board is seen twice // Important note: this method does NOT necessarily terminate!! def go : Unit = { var currentBoard = initialBoard var previousBoards = List[GameOfLifeBoard]() while (!currentBoard.empty && !(previousBoards contains currentBoard)) { print(27.toChar + "[2J") // ANSI: clear screen print(27.toChar + "[;H") // ANSI: move cursor to top left corner of screen println(currentBoard.toString) Thread.sleep(75) // Warning: unbounded list concatenation can result in OutOfMemoryExceptions ####TODO: replace with LRU bounded list previousBoards = List(currentBoard) ::: previousBoards currentBoard = currentBoard tick } // Print the final board print(27.toChar + "[2J") // ANSI: clear screen print(27.toChar + "[;H") // ANSI: move cursor to top left corner of screen println(currentBoard.toString) } } // Script starts here val simple = Set((1,1)) val square = Set((4,4), (4,5), (5,4), (5,5)) val glider = Set((2,1), (3,2), (1,3), (2,3), (3,3)) val initialBoard = glider (new GameOfLife(new FiniteGameOfLifeBoard(20, 20, initialBoard))).go //(new GameOfLife(new InfiniteGameOfLifeBoard(initialBoard))).go // COPYRIGHT PETER MONKS 2010 Thanks! Peter

    Read the article

  • rotating bitmaps. In code.

    - by Marco van de Voort
    Is there a faster way to rotate a large bitmap by 90 or 270 degrees than simply doing a nested loop with inverted coordinates? The bitmaps are 8bpp and typically 2048*2400*8bpp Currently I do this by simply copying with argument inversion, roughly (pseudo code: for x = 0 to 2048-1 for y = 0 to 2048-1 dest[x][y]=src[y][x]; (In reality I do it with pointers, for a bit more speed, but that is roughly the same magnitude) GDI is quite slow with large images, and GPU load/store times for textures (GF7 cards) are in the same magnitude as the current CPU time. Any tips, pointers? An in-place algorithm would even be better, but speed is more important than being in-place. Target is Delphi, but it is more an algorithmic question. SSE(2) vectorization no problem, it is a big enough problem for me to code it in assembler Duplicates How do you rotate a two dimensional array?. Follow up to Nils' answer Image 2048x2700 - 2700x2048 Compiler Turbo Explorer 2006 with optimization on. Windows: Power scheme set to "Always on". (important!!!!) Machine: Core2 6600 (2.4 GHz) time with old routine: 32ms (step 1) time with stepsize 8 : 12ms time with stepsize 16 : 10ms time with stepsize 32+ : 9ms Meanwhile I also tested on a Athlon 64 X2 (5200+ iirc), and the speed up there was slightly more than a factor four (80 to 19 ms). The speed up is well worth it, thanks. Maybe that during the summer months I'll torture myself with a SSE(2) version. However I already thought about how to tackle that, and I think I'll run out of SSE2 registers for an straight implementation: for n:=0 to 7 do begin load r0, <source+n*rowsize> shift byte from r0 into r1 shift byte from r0 into r2 .. shift byte from r0 into r8 end; store r1, <target> store r2, <target+1*<rowsize> .. store r8, <target+7*<rowsize> So 8x8 needs 9 registers, but 32-bits SSE only has 8. Anyway that is something for the summer months :-) Note that the pointer thing is something that I do out of instinct, but it could be there is actually something to it, if your dimensions are not hardcoded, the compiler can't turn the mul into a shift. While muls an sich are cheap nowadays, they also generate more register pressure afaik. The code (validated by subtracting result from the "naieve" rotate1 implementation): const stepsize = 32; procedure rotatealign(Source: tbw8image; Target:tbw8image); var stepsx,stepsy,restx,resty : Integer; RowPitchSource, RowPitchTarget : Integer; pSource, pTarget,ps1,ps2 : pchar; x,y,i,j: integer; rpstep : integer; begin RowPitchSource := source.RowPitch; // bytes to jump to next line. Can be negative (includes alignment) RowPitchTarget := target.RowPitch; rpstep:=RowPitchTarget*stepsize; stepsx:=source.ImageWidth div stepsize; stepsy:=source.ImageHeight div stepsize; // check if mod 16=0 here for both dimensions, if so -> SSE2. for y := 0 to stepsy - 1 do begin psource:=source.GetImagePointer(0,y*stepsize); // gets pointer to pixel x,y ptarget:=Target.GetImagePointer(target.imagewidth-(y+1)*stepsize,0); for x := 0 to stepsx - 1 do begin for i := 0 to stepsize - 1 do begin ps1:=@psource[rowpitchsource*i]; // ( 0,i) ps2:=@ptarget[stepsize-1-i]; // (maxx-i,0); for j := 0 to stepsize - 1 do begin ps2[0]:=ps1[j]; inc(ps2,RowPitchTarget); end; end; inc(psource,stepsize); inc(ptarget,rpstep); end; end; // 3 more areas to do, with dimensions // - stepsy*stepsize * restx // right most column of restx width // - stepsx*stepsize * resty // bottom row with resty height // - restx*resty // bottom-right rectangle. restx:=source.ImageWidth mod stepsize; // typically zero because width is // typically 1024 or 2048 resty:=source.Imageheight mod stepsize; if restx>0 then begin // one loop less, since we know this fits in one line of "blocks" psource:=source.GetImagePointer(source.ImageWidth-restx,0); // gets pointer to pixel x,y ptarget:=Target.GetImagePointer(Target.imagewidth-stepsize,Target.imageheight-restx); for y := 0 to stepsy - 1 do begin for i := 0 to stepsize - 1 do begin ps1:=@psource[rowpitchsource*i]; // ( 0,i) ps2:=@ptarget[stepsize-1-i]; // (maxx-i,0); for j := 0 to restx - 1 do begin ps2[0]:=ps1[j]; inc(ps2,RowPitchTarget); end; end; inc(psource,stepsize*RowPitchSource); dec(ptarget,stepsize); end; end; if resty>0 then begin // one loop less, since we know this fits in one line of "blocks" psource:=source.GetImagePointer(0,source.ImageHeight-resty); // gets pointer to pixel x,y ptarget:=Target.GetImagePointer(0,0); for x := 0 to stepsx - 1 do begin for i := 0 to resty- 1 do begin ps1:=@psource[rowpitchsource*i]; // ( 0,i) ps2:=@ptarget[resty-1-i]; // (maxx-i,0); for j := 0 to stepsize - 1 do begin ps2[0]:=ps1[j]; inc(ps2,RowPitchTarget); end; end; inc(psource,stepsize); inc(ptarget,rpstep); end; end; if (resty>0) and (restx>0) then begin // another loop less, since only one block psource:=source.GetImagePointer(source.ImageWidth-restx,source.ImageHeight-resty); // gets pointer to pixel x,y ptarget:=Target.GetImagePointer(0,target.ImageHeight-restx); for i := 0 to resty- 1 do begin ps1:=@psource[rowpitchsource*i]; // ( 0,i) ps2:=@ptarget[resty-1-i]; // (maxx-i,0); for j := 0 to restx - 1 do begin ps2[0]:=ps1[j]; inc(ps2,RowPitchTarget); end; end; end; end;

    Read the article

  • The Incremental Architect&rsquo;s Napkin - #5 - Design functions for extensibility and readability

    - by Ralf Westphal
    Originally posted on: http://geekswithblogs.net/theArchitectsNapkin/archive/2014/08/24/the-incremental-architectrsquos-napkin---5---design-functions-for.aspx The functionality of programs is entered via Entry Points. So what we´re talking about when designing software is a bunch of functions handling the requests represented by and flowing in through those Entry Points. Designing software thus consists of at least three phases: Analyzing the requirements to find the Entry Points and their signatures Designing the functionality to be executed when those Entry Points get triggered Implementing the functionality according to the design aka coding I presume, you´re familiar with phase 1 in some way. And I guess you´re proficient in implementing functionality in some programming language. But in my experience developers in general are not experienced in going through an explicit phase 2. “Designing functionality? What´s that supposed to mean?” you might already have thought. Here´s my definition: To design functionality (or functional design for short) means thinking about… well, functions. You find a solution for what´s supposed to happen when an Entry Point gets triggered in terms of functions. A conceptual solution that is, because those functions only exist in your head (or on paper) during this phase. But you may have guess that, because it´s “design” not “coding”. And here is, what functional design is not: It´s not about logic. Logic is expressions (e.g. +, -, && etc.) and control statements (e.g. if, switch, for, while etc.). Also I consider calling external APIs as logic. It´s equally basic. It´s what code needs to do in order to deliver some functionality or quality. Logic is what´s doing that needs to be done by software. Transformations are either done through expressions or API-calls. And then there is alternative control flow depending on the result of some expression. Basically it´s just jumps in Assembler, sometimes to go forward (if, switch), sometimes to go backward (for, while, do). But calling your own function is not logic. It´s not necessary to produce any outcome. Functionality is not enhanced by adding functions (subroutine calls) to your code. Nor is quality increased by adding functions. No performance gain, no higher scalability etc. through functions. Functions are not relevant to functionality. Strange, isn´t it. What they are important for is security of investment. By introducing functions into our code we can become more productive (re-use) and can increase evolvability (higher unterstandability, easier to keep code consistent). That´s no small feat, however. Evolvable code can hardly be overestimated. That´s why to me functional design is so important. It´s at the core of software development. To sum this up: Functional design is on a level of abstraction above (!) logical design or algorithmic design. Functional design is only done until you get to a point where each function is so simple you are very confident you can easily code it. Functional design an logical design (which mostly is coding, but can also be done using pseudo code or flow charts) are complementary. Software needs both. If you start coding right away you end up in a tangled mess very quickly. Then you need back out through refactoring. Functional design on the other hand is bloodless without actual code. It´s just a theory with no experiments to prove it. But how to do functional design? An example of functional design Let´s assume a program to de-duplicate strings. The user enters a number of strings separated by commas, e.g. a, b, a, c, d, b, e, c, a. And the program is supposed to clear this list of all doubles, e.g. a, b, c, d, e. There is only one Entry Point to this program: the user triggers the de-duplication by starting the program with the string list on the command line C:\>deduplicate "a, b, a, c, d, b, e, c, a" a, b, c, d, e …or by clicking on a GUI button. This leads to the Entry Point function to get called. It´s the program´s main function in case of the batch version or a button click event handler in the GUI version. That´s the physical Entry Point so to speak. It´s inevitable. What then happens is a three step process: Transform the input data from the user into a request. Call the request handler. Transform the output of the request handler into a tangible result for the user. Or to phrase it a bit more generally: Accept input. Transform input into output. Present output. This does not mean any of these steps requires a lot of effort. Maybe it´s just one line of code to accomplish it. Nevertheless it´s a distinct step in doing the processing behind an Entry Point. Call it an aspect or a responsibility - and you will realize it most likely deserves a function of its own to satisfy the Single Responsibility Principle (SRP). Interestingly the above list of steps is already functional design. There is no logic, but nevertheless the solution is described - albeit on a higher level of abstraction than you might have done yourself. But it´s still on a meta-level. The application to the domain at hand is easy, though: Accept string list from command line De-duplicate Present de-duplicated strings on standard output And this concrete list of processing steps can easily be transformed into code:static void Main(string[] args) { var input = Accept_string_list(args); var output = Deduplicate(input); Present_deduplicated_string_list(output); } Instead of a big problem there are three much smaller problems now. If you think each of those is trivial to implement, then go for it. You can stop the functional design at this point. But maybe, just maybe, you´re not so sure how to go about with the de-duplication for example. Then just implement what´s easy right now, e.g.private static string Accept_string_list(string[] args) { return args[0]; } private static void Present_deduplicated_string_list( string[] output) { var line = string.Join(", ", output); Console.WriteLine(line); } Accept_string_list() contains logic in the form of an API-call. Present_deduplicated_string_list() contains logic in the form of an expression and an API-call. And then repeat the functional design for the remaining processing step. What´s left is the domain logic: de-duplicating a list of strings. How should that be done? Without any logic at our disposal during functional design you´re left with just functions. So which functions could make up the de-duplication? Here´s a suggestion: De-duplicate Parse the input string into a true list of strings. Register each string in a dictionary/map/set. That way duplicates get cast away. Transform the data structure into a list of unique strings. Processing step 2 obviously was the core of the solution. That´s where real creativity was needed. That´s the core of the domain. But now after this refinement the implementation of each step is easy again:private static string[] Parse_string_list(string input) { return input.Split(',') .Select(s => s.Trim()) .ToArray(); } private static Dictionary<string,object> Compile_unique_strings(string[] strings) { return strings.Aggregate( new Dictionary<string, object>(), (agg, s) => { agg[s] = null; return agg; }); } private static string[] Serialize_unique_strings( Dictionary<string,object> dict) { return dict.Keys.ToArray(); } With these three additional functions Main() now looks like this:static void Main(string[] args) { var input = Accept_string_list(args); var strings = Parse_string_list(input); var dict = Compile_unique_strings(strings); var output = Serialize_unique_strings(dict); Present_deduplicated_string_list(output); } I think that´s very understandable code: just read it from top to bottom and you know how the solution to the problem works. It´s a mirror image of the initial design: Accept string list from command line Parse the input string into a true list of strings. Register each string in a dictionary/map/set. That way duplicates get cast away. Transform the data structure into a list of unique strings. Present de-duplicated strings on standard output You can even re-generate the design by just looking at the code. Code and functional design thus are always in sync - if you follow some simple rules. But about that later. And as a bonus: all the functions making up the process are small - which means easy to understand, too. So much for an initial concrete example. Now it´s time for some theory. Because there is method to this madness ;-) The above has only scratched the surface. Introducing Flow Design Functional design starts with a given function, the Entry Point. Its goal is to describe the behavior of the program when the Entry Point is triggered using a process, not an algorithm. An algorithm consists of logic, a process on the other hand consists just of steps or stages. Each processing step transforms input into output or a side effect. Also it might access resources, e.g. a printer, a database, or just memory. Processing steps thus can rely on state of some sort. This is different from Functional Programming, where functions are supposed to not be stateful and not cause side effects.[1] In its simplest form a process can be written as a bullet point list of steps, e.g. Get data from user Output result to user Transform data Parse data Map result for output Such a compilation of steps - possibly on different levels of abstraction - often is the first artifact of functional design. It can be generated by a team in an initial design brainstorming. Next comes ordering the steps. What should happen first, what next etc.? Get data from user Parse data Transform data Map result for output Output result to user That´s great for a start into functional design. It´s better than starting to code right away on a given function using TDD. Please get me right: TDD is a valuable practice. But it can be unnecessarily hard if the scope of a functionn is too large. But how do you know beforehand without investing some thinking? And how to do this thinking in a systematic fashion? My recommendation: For any given function you´re supposed to implement first do a functional design. Then, once you´re confident you know the processing steps - which are pretty small - refine and code them using TDD. You´ll see that´s much, much easier - and leads to cleaner code right away. For more information on this approach I call “Informed TDD” read my book of the same title. Thinking before coding is smart. And writing down the solution as a bunch of functions possibly is the simplest thing you can do, I´d say. It´s more according to the KISS (Keep It Simple, Stupid) principle than returning constants or other trivial stuff TDD development often is started with. So far so good. A simple ordered list of processing steps will do to start with functional design. As shown in the above example such steps can easily be translated into functions. Moving from design to coding thus is simple. However, such a list does not scale. Processing is not always that simple to be captured in a list. And then the list is just text. Again. Like code. That means the design is lacking visuality. Textual representations need more parsing by your brain than visual representations. Plus they are limited in their “dimensionality”: text just has one dimension, it´s sequential. Alternatives and parallelism are hard to encode in text. In addition the functional design using numbered lists lacks data. It´s not visible what´s the input, output, and state of the processing steps. That´s why functional design should be done using a lightweight visual notation. No tool is necessary to draw such designs. Use pen and paper; a flipchart, a whiteboard, or even a napkin is sufficient. Visualizing processes The building block of the functional design notation is a functional unit. I mostly draw it like this: Something is done, it´s clear what goes in, it´s clear what comes out, and it´s clear what the processing step requires in terms of state or hardware. Whenever input flows into a functional unit it gets processed and output is produced and/or a side effect occurs. Flowing data is the driver of something happening. That´s why I call this approach to functional design Flow Design. It´s about data flow instead of control flow. Control flow like in algorithms is of no concern to functional design. Thinking about control flow simply is too low level. Once you start with control flow you easily get bogged down by tons of details. That´s what you want to avoid during design. Design is supposed to be quick, broad brush, abstract. It should give overview. But what about all the details? As Robert C. Martin rightly said: “Programming is abot detail”. Detail is a matter of code. Once you start coding the processing steps you designed you can worry about all the detail you want. Functional design does not eliminate all the nitty gritty. It just postpones tackling them. To me that´s also an example of the SRP. Function design has the responsibility to come up with a solution to a problem posed by a single function (Entry Point). And later coding has the responsibility to implement the solution down to the last detail (i.e. statement, API-call). TDD unfortunately mixes both responsibilities. It´s just coding - and thereby trying to find detailed implementations (green phase) plus getting the design right (refactoring). To me that´s one reason why TDD has failed to deliver on its promise for many developers. Using functional units as building blocks of functional design processes can be depicted very easily. Here´s the initial process for the example problem: For each processing step draw a functional unit and label it. Choose a verb or an “action phrase” as a label, not a noun. Functional design is about activities, not state or structure. Then make the output of an upstream step the input of a downstream step. Finally think about the data that should flow between the functional units. Write the data above the arrows connecting the functional units in the direction of the data flow. Enclose the data description in brackets. That way you can clearly see if all flows have already been specified. Empty brackets mean “no data is flowing”, but nevertheless a signal is sent. A name like “list” or “strings” in brackets describes the data content. Use lower case labels for that purpose. A name starting with an upper case letter like “String” or “Customer” on the other hand signifies a data type. If you like, you also can combine descriptions with data types by separating them with a colon, e.g. (list:string) or (strings:string[]). But these are just suggestions from my practice with Flow Design. You can do it differently, if you like. Just be sure to be consistent. Flows wired-up in this manner I call one-dimensional (1D). Each functional unit just has one input and/or one output. A functional unit without an output is possible. It´s like a black hole sucking up input without producing any output. Instead it produces side effects. A functional unit without an input, though, does make much sense. When should it start to work? What´s the trigger? That´s why in the above process even the first processing step has an input. If you like, view such 1D-flows as pipelines. Data is flowing through them from left to right. But as you can see, it´s not always the same data. It get´s transformed along its passage: (args) becomes a (list) which is turned into (strings). The Principle of Mutual Oblivion A very characteristic trait of flows put together from function units is: no functional units knows another one. They are all completely independent of each other. Functional units don´t know where their input is coming from (or even when it´s gonna arrive). They just specify a range of values they can process. And they promise a certain behavior upon input arriving. Also they don´t know where their output is going. They just produce it in their own time independent of other functional units. That means at least conceptually all functional units work in parallel. Functional units don´t know their “deployment context”. They now nothing about the overall flow they are place in. They are just consuming input from some upstream, and producing output for some downstream. That makes functional units very easy to test. At least as long as they don´t depend on state or resources. I call this the Principle of Mutual Oblivion (PoMO). Functional units are oblivious of others as well as an overall context/purpose. They are just parts of a whole focused on a single responsibility. How the whole is built, how a larger goal is achieved, is of no concern to the single functional units. By building software in such a manner, functional design interestingly follows nature. Nature´s building blocks for organisms also follow the PoMO. The cells forming your body do not know each other. Take a nerve cell “controlling” a muscle cell for example:[2] The nerve cell does not know anything about muscle cells, let alone the specific muscel cell it is “attached to”. Likewise the muscle cell does not know anything about nerve cells, let a lone a specific nerve cell “attached to” it. Saying “the nerve cell is controlling the muscle cell” thus only makes sense when viewing both from the outside. “Control” is a concept of the whole, not of its parts. Control is created by wiring-up parts in a certain way. Both cells are mutually oblivious. Both just follow a contract. One produces Acetylcholine (ACh) as output, the other consumes ACh as input. Where the ACh is going, where it´s coming from neither cell cares about. Million years of evolution have led to this kind of division of labor. And million years of evolution have produced organism designs (DNA) which lead to the production of these different cell types (and many others) and also to their co-location. The result: the overall behavior of an organism. How and why this happened in nature is a mystery. For our software, though, it´s clear: functional and quality requirements needs to be fulfilled. So we as developers have to become “intelligent designers” of “software cells” which we put together to form a “software organism” which responds in satisfying ways to triggers from it´s environment. My bet is: If nature gets complex organisms working by following the PoMO, who are we to not apply this recipe for success to our much simpler “machines”? So my rule is: Wherever there is functionality to be delivered, because there is a clear Entry Point into software, design the functionality like nature would do it. Build it from mutually oblivious functional units. That´s what Flow Design is about. In that way it´s even universal, I´d say. Its notation can also be applied to biology: Never mind labeling the functional units with nouns. That´s ok in Flow Design. You´ll do that occassionally for functional units on a higher level of abstraction or when their purpose is close to hardware. Getting a cockroach to roam your bedroom takes 1,000,000 nerve cells (neurons). Getting the de-duplication program to do its job just takes 5 “software cells” (functional units). Both, though, follow the same basic principle. Translating functional units into code Moving from functional design to code is no rocket science. In fact it´s straightforward. There are two simple rules: Translate an input port to a function. Translate an output port either to a return statement in that function or to a function pointer visible to that function. The simplest translation of a functional unit is a function. That´s what you saw in the above example. Functions are mutually oblivious. That why Functional Programming likes them so much. It makes them composable. Which is the reason, nature works according to the PoMO. Let´s be clear about one thing: There is no dependency injection in nature. For all of an organism´s complexity no DI container is used. Behavior is the result of smooth cooperation between mutually oblivious building blocks. Functions will often be the adequate translation for the functional units in your designs. But not always. Take for example the case, where a processing step should not always produce an output. Maybe the purpose is to filter input. Here the functional unit consumes words and produces words. But it does not pass along every word flowing in. Some words are swallowed. Think of a spell checker. It probably should not check acronyms for correctness. There are too many of them. Or words with no more than two letters. Such words are called “stop words”. In the above picture the optionality of the output is signified by the astrisk outside the brackets. It means: Any number of (word) data items can flow from the functional unit for each input data item. It might be none or one or even more. This I call a stream of data. Such behavior cannot be translated into a function where output is generated with return. Because a function always needs to return a value. So the output port is translated into a function pointer or continuation which gets passed to the subroutine when called:[3]void filter_stop_words( string word, Action<string> onNoStopWord) { if (...check if not a stop word...) onNoStopWord(word); } If you want to be nitpicky you might call such a function pointer parameter an injection. And technically you´re right. Conceptually, though, it´s not an injection. Because the subroutine is not functionally dependent on the continuation. Firstly continuations are procedures, i.e. subroutines without a return type. Remember: Flow Design is about unidirectional data flow. Secondly the name of the formal parameter is chosen in a way as to not assume anything about downstream processing steps. onNoStopWord describes a situation (or event) within the functional unit only. Translating output ports into function pointers helps keeping functional units mutually oblivious in cases where output is optional or produced asynchronically. Either pass the function pointer to the function upon call. Or make it global by putting it on the encompassing class. Then it´s called an event. In C# that´s even an explicit feature.class Filter { public void filter_stop_words( string word) { if (...check if not a stop word...) onNoStopWord(word); } public event Action<string> onNoStopWord; } When to use a continuation and when to use an event dependens on how a functional unit is used in flows and how it´s packed together with others into classes. You´ll see examples further down the Flow Design road. Another example of 1D functional design Let´s see Flow Design once more in action using the visual notation. How about the famous word wrap kata? Robert C. Martin has posted a much cited solution including an extensive reasoning behind his TDD approach. So maybe you want to compare it to Flow Design. The function signature given is:string WordWrap(string text, int maxLineLength) {...} That´s not an Entry Point since we don´t see an application with an environment and users. Nevertheless it´s a function which is supposed to provide a certain functionality. The text passed in has to be reformatted. The input is a single line of arbitrary length consisting of words separated by spaces. The output should consist of one or more lines of a maximum length specified. If a word is longer than a the maximum line length it can be split in multiple parts each fitting in a line. Flow Design Let´s start by brainstorming the process to accomplish the feat of reformatting the text. What´s needed? Words need to be assembled into lines Words need to be extracted from the input text The resulting lines need to be assembled into the output text Words too long to fit in a line need to be split Does sound about right? I guess so. And it shows a kind of priority. Long words are a special case. So maybe there is a hint for an incremental design here. First let´s tackle “average words” (words not longer than a line). Here´s the Flow Design for this increment: The the first three bullet points turned into functional units with explicit data added. As the signature requires a text is transformed into another text. See the input of the first functional unit and the output of the last functional unit. In between no text flows, but words and lines. That´s good to see because thereby the domain is clearly represented in the design. The requirements are talking about words and lines and here they are. But note the asterisk! It´s not outside the brackets but inside. That means it´s not a stream of words or lines, but lists or sequences. For each text a sequence of words is output. For each sequence of words a sequence of lines is produced. The asterisk is used to abstract from the concrete implementation. Like with streams. Whether the list of words gets implemented as an array or an IEnumerable is not important during design. It´s an implementation detail. Does any processing step require further refinement? I don´t think so. They all look pretty “atomic” to me. And if not… I can always backtrack and refine a process step using functional design later once I´ve gained more insight into a sub-problem. Implementation The implementation is straightforward as you can imagine. The processing steps can all be translated into functions. Each can be tested easily and separately. Each has a focused responsibility. And the process flow becomes just a sequence of function calls: Easy to understand. It clearly states how word wrapping works - on a high level of abstraction. And it´s easy to evolve as you´ll see. Flow Design - Increment 2 So far only texts consisting of “average words” are wrapped correctly. Words not fitting in a line will result in lines too long. Wrapping long words is a feature of the requested functionality. Whether it´s there or not makes a difference to the user. To quickly get feedback I decided to first implement a solution without this feature. But now it´s time to add it to deliver the full scope. Fortunately Flow Design automatically leads to code following the Open Closed Principle (OCP). It´s easy to extend it - instead of changing well tested code. How´s that possible? Flow Design allows for extension of functionality by inserting functional units into the flow. That way existing functional units need not be changed. The data flow arrow between functional units is a natural extension point. No need to resort to the Strategy Pattern. No need to think ahead where extions might need to be made in the future. I just “phase in” the remaining processing step: Since neither Extract words nor Reformat know of their environment neither needs to be touched due to the “detour”. The new processing step accepts the output of the existing upstream step and produces data compatible with the existing downstream step. Implementation - Increment 2 A trivial implementation checking the assumption if this works does not do anything to split long words. The input is just passed on: Note how clean WordWrap() stays. The solution is easy to understand. A developer looking at this code sometime in the future, when a new feature needs to be build in, quickly sees how long words are dealt with. Compare this to Robert C. Martin´s solution:[4] How does this solution handle long words? Long words are not even part of the domain language present in the code. At least I need considerable time to understand the approach. Admittedly the Flow Design solution with the full implementation of long word splitting is longer than Robert C. Martin´s. At least it seems. Because his solution does not cover all the “word wrap situations” the Flow Design solution handles. Some lines would need to be added to be on par, I guess. But even then… Is a difference in LOC that important as long as it´s in the same ball park? I value understandability and openness for extension higher than saving on the last line of code. Simplicity is not just less code, it´s also clarity in design. But don´t take my word for it. Try Flow Design on larger problems and compare for yourself. What´s the easier, more straightforward way to clean code? And keep in mind: You ain´t seen all yet ;-) There´s more to Flow Design than described in this chapter. In closing I hope I was able to give you a impression of functional design that makes you hungry for more. To me it´s an inevitable step in software development. Jumping from requirements to code does not scale. And it leads to dirty code all to quickly. Some thought should be invested first. Where there is a clear Entry Point visible, it´s functionality should be designed using data flows. Because with data flows abstraction is possible. For more background on why that´s necessary read my blog article here. For now let me point out to you - if you haven´t already noticed - that Flow Design is a general purpose declarative language. It´s “programming by intention” (Shalloway et al.). Just write down how you think the solution should work on a high level of abstraction. This breaks down a large problem in smaller problems. And by following the PoMO the solutions to those smaller problems are independent of each other. So they are easy to test. Or you could even think about getting them implemented in parallel by different team members. Flow Design not only increases evolvability, but also helps becoming more productive. All team members can participate in functional design. This goes beyon collective code ownership. We´re talking collective design/architecture ownership. Because with Flow Design there is a common visual language to talk about functional design - which is the foundation for all other design activities.   PS: If you like what you read, consider getting my ebook “The Incremental Architekt´s Napkin”. It´s where I compile all the articles in this series for easier reading. I like the strictness of Function Programming - but I also find it quite hard to live by. And it certainly is not what millions of programmers are used to. Also to me it seems, the real world is full of state and side effects. So why give them such a bad image? That´s why functional design takes a more pragmatic approach. State and side effects are ok for processing steps - but be sure to follow the SRP. Don´t put too much of it into a single processing step. ? Image taken from www.physioweb.org ? My code samples are written in C#. C# sports typed function pointers called delegates. Action is such a function pointer type matching functions with signature void someName(T t). Other languages provide similar ways to work with functions as first class citizens - even Java now in version 8. I trust you find a way to map this detail of my translation to your favorite programming language. I know it works for Java, C++, Ruby, JavaScript, Python, Go. And if you´re using a Functional Programming language it´s of course a no brainer. ? Taken from his blog post “The Craftsman 62, The Dark Path”. ?

    Read the article

< Previous Page | 1 2 3 4