Search Results

Search found 362 results on 15 pages for 'semantics'.

Page 12/15 | < Previous Page | 8 9 10 11 12 13 14 15 | Next Page >

How might I wrap the FindXFile-style APIs to the STL-style Iterator Pattern in C++?

- by BillyONeal

Hello everyone :) I'm working on wrapping up the ugly innards of the FindFirstFile/FindNextFile loop (though my question applies to other similar APIs, such as RegEnumKeyEx or RegEnumValue, etc.) inside iterators that work in a manner similar to the Standard Template Library's istream_iterators. I have two problems here. The first is with the termination condition of most "foreach" style loops. STL style iterators typically use operator!= inside the exit condition of the for, i.e. std::vector<int> test; for(std::vector<int>::iterator it = test.begin(); it != test.end(); it++) { //Do stuff } My problem is I'm unsure how to implement operator!= with such a directory enumeration, because I do not know when the enumeration is complete until I've actually finished with it. I have sort of a hack together solution in place now that enumerates the entire directory at once, where each iterator simply tracks a reference counted vector, but this seems like a kludge which can be done a better way. The second problem I have is that there are multiple pieces of data returned by the FindXFile APIs. For that reason, there's no obvious way to overload operator* as required for iterator semantics. When I overload that item, do I return the file name? The size? The modified date? How might I convey the multiple pieces of data to which such an iterator must refer to later in an ideomatic way? I've tried ripping off the C# style MoveNext design but I'm concerned about not following the standard idioms here. class SomeIterator { public: bool next(); //Advances the iterator and returns true if successful, false if the iterator is at the end. std::wstring fileName() const; //other kinds of data.... }; EDIT: And the caller would look like: SomeIterator x = ??; //Construct somehow while(x.next()) { //Do stuff } Thanks! Billy3

Read the article
Permuting a binary tree without the use of lists

- by Banang

I need to find an algorithm for generating every possible permutation of a binary tree, and need to do so without using lists (this is because the tree itself carries semantics and restraints that cannot be translated into lists). I've found an algorithm that works for trees with the height of three or less, but whenever I get to greater hights, I loose one set of possible permutations per height added. Each node carries information about its original state, so that one node can determine if all possible permutations have been tried for that node. Also, the node carries information on weather or not it's been 'swapped', i.e. if it has seen all possible permutations of it's subtree. The tree is left-centered, meaning that the right node should always (except in some cases that I don't need to cover for this algorithm) be a leaf node, while the left node is always either a leaf or a branch. The algorithm I'm using at the moment can be described sort of like this: if the left child node has been swapped swap my right node with the left child nodes right node set the left child node as 'unswapped' if the current node is back to its original state swap my right node with the lowest left nodes' right node swap the lowest left nodes two childnodes set my left node as 'unswapped' set my left chilnode to use this as it's original state set this node as swapped return null return this; else if the left child has not been swapped if the result of trying to permute left child is null return the permutation of this node else return the permutation of the left child node if this node has a left node and a right node that are both leaves swap them set this node to be 'swapped' The desired behaviour of the algoritm would be something like this: branch / | branch 3 / | branch 2 / | 0 1 branch / | branch 3 / | branch 2 / | 1 0 <-- first swap branch / | branch 3 / | branch 1 <-- second swap / | 2 0 branch / | branch 3 / | branch 1 / | 0 2 <-- third swap branch / | branch 3 / | branch 0 <-- fourth swap / | 1 2 and so on... Sorry for the ridiculisly long and waddly explanation, would really, really apreciate any sort of help you guys could offer me. Thanks a bunch!

Read the article
What are the lengths/limits C preprocessor as a language creation tool? Where can I learn more about

- by Weston C

In his FAQ @ http://www2.research.att.com/~bs/bs_faq.html#bootstrapping, Bjarne Stroustrup says: To build [Cfront, the first C++ compiler], I first used C to write a "C with Classes"-to-C preprocessor. "C with Classes" was a C dialect that became the immediate ancestor to C++... I then wrote the first version of Cfront in "C with Classes". When I read this, it piqued my interest in the C preprocessor. I'd seen its macro capabilities as suitable for simplifying common expressions but hadn't thought about its ability to significantly add to syntax and semantics on the level that I imagine bringing classes to C took. So now I have a couple of questions on my mind: 1) Are there other examples of this approach to bootstrapping a language off of C? 2) Is the source to Stroustrup's original work available anywhere? 3) Where could I learn more about the specifics of utilizing this technique? 4) What are the lengths/limits of that approach? Could one, say, create a set of preprocessor macros that let someone write in something significantly Lisp/Scheme like?

Read the article
Non standard interaction among two tables to avoid very large merge

- by riko

Suppose I have two tables A and B. Table A has a multi-level index (a, b) and one column (ts). b determines univocally ts. A = pd.DataFrame( [('a', 'x', 4), ('a', 'y', 6), ('a', 'z', 5), ('b', 'x', 4), ('b', 'z', 5), ('c', 'y', 6)], columns=['a', 'b', 'ts']).set_index(['a', 'b']) AA = A.reset_index() Table B is another one-column (ts) table with non-unique index (a). The ts's are sorted "inside" each group, i.e., B.ix[x] is sorted for each x. Moreover, there is always a value in B.ix[x] that is greater than or equal to the values in A. B = pd.DataFrame( dict(a=list('aaaaabbcccccc'), ts=[1, 2, 4, 5, 7, 7, 8, 1, 2, 4, 5, 8, 9])).set_index('a') The semantics in this is that B contains observations of occurrences of an event of type indicated by the index. I would like to find from B the timestamp of the first occurrence of each event type after the timestamp specified in A for each value of b. In other words, I would like to get a table with the same shape of A, that instead of ts contains the "minimum value occurring after ts" as specified by table B. So, my goal would be: C: ('a', 'x') 4 ('a', 'y') 7 ('a', 'z') 5 ('b', 'x') 7 ('b', 'z') 7 ('c', 'y') 8 I have some working code, but is terribly slow. C = AA.apply(lambda row: ( row[0], row[1], B.ix[row[0]].irow(np.searchsorted(B.ts[row[0]], row[2]))), axis=1).set_index(['a', 'b']) Profiling shows the culprit is obviously B.ix[row[0]].irow(np.searchsorted(B.ts[row[0]], row[2]))). However, standard solutions using merge/join would take too much RAM in the long run. Consider that now I have 1000 a's, assume constant the average number of b's per a (probably 100-200), and consider that the number of observations per a is probably in the order of 300. In production I will have 1000 more a's. 1,000,000 x 200 x 300 = 60,000,000,000 rows may be a bit too much to keep in RAM, especially considering that the data I need is perfectly described by a C like the one I discussed above. How would I improve the performance?

Read the article
Beginner SQL section: avoiding repeated expression

- by polygenelubricants

I'm entirely new at SQL, but let's say that on the StackExchange Data Explorer, I just want to list the top 15 users by reputation, and I wrote something like this: SELECT TOP 15 DisplayName, Id, Reputation, Reputation/1000 As RepInK FROM Users WHERE RepInK > 10 ORDER BY Reputation DESC Currently this gives an Error: Invalid column name 'RepInK', which makes sense, I think, because RepInK is not a column in Users. I can easily fix this by saying WHERE Reputation/1000 > 10, essentially repeating the formula. So the questions are: Can I actually use the RepInK "column" in the WHERE clause? Do I perhaps need to create a virtual table/view with this column, and then do a SELECT/WHERE query on it? Can I name an expression, e.g. Reputation/1000, so I only have to repeat the names in a few places instead of the formula? What do you call this? A substitution macro? A function? A stored procedure? Is there an SQL quicksheet, glossary of terms, language specification, anything I can use to quickly pick up the syntax and semantics of the language? I understand that there are different "flavors"?

Read the article
Behavior difference between UIView.subviews and [NSView subviews]

- by zpasternack

I have a piece of code in an iPhone app, which removes all subviews from a UIView subclass. It looks like this: NSArray* subViews = self.subviews; for( UIView *aView in subViews ) { [aView removeFromSuperview]; } This works fine. In fact, I never really gave it much thought until I tried nearly the same thing in a Mac OS X app (from an NSView subclass): NSArray* subViews = [self subviews]; for( NSView *aView in subViews ) { [aView removeFromSuperview]; } That totally doesn’t work. Specifically, at runtime, I get this: *** Collection <NSCFArray: 0x1005208a0> was mutated while being enumerated. I ended up doing it like so: NSArray* subViews = [[self subviews] copy]; for( NSView *aView in subViews ) { [aView removeFromSuperview]; } [subViews release]; That's fine. What’s bugging me, though, is why does it work on the iPhone? subviews is a copy property: @property(nonatomic,readonly,copy) NSArray *subviews; My first thought was, maybe @synthesize’d getters return a copy when the copy attribute is specified. The doc is clear on the semantics of copy for setters, but doesn’t appear to say either way for getters (or at least, it’s not apparent to me). And actually, doing a few tests of my own, this clearly does not seem to be the case. Which is good, I think returning a copy would be problematic, for a few reasons. So the question is: how does the above code work on the iPhone? NSView is clearly returning a pointer to the actual array of subviews, and perhaps UIView isn’t. Perhaps it’s simply an implementation detail of UIView, and I shouldn’t get worked up about it. Can anyone offer any insight?

Read the article
Multithreaded linked list traversal

- by Rob Bryce

Given a (doubly) linked list of objects (C++), I have an operation that I would like multithread, to perform on each object. The cost of the operation is not uniform for each object. The linked list is the preferred storage for this set of objects for a variety of reasons. The 1st element in each object is the pointer to the next object; the 2nd element is the previous object in the list. I have solved the problem by building an array of nodes, and applying OpenMP. This gave decent performance. I then switched to my own threading routines (based off Windows primitives) and by using InterlockedIncrement() (acting on the index into the array), I can achieve higher overall CPU utilization and faster through-put. Essentially, the threads work by "leap-frog'ing" along the elements. My next approach to optimization is to try to eliminate creating/reusing the array of elements in my linked list. However, I'd like to continue with this "leap-frog" approach and somehow use some nonexistent routine that could be called "InterlockedCompareDereference" - to atomically compare against NULL (end of list) and conditionally dereference & store, returning the dereferenced value. I don't think InterlockedCompareExchangePointer() will work since I cannot atomically dereference the pointer and call this Interlocked() method. I've done some reading and others are suggesting critical sections or spin-locks. Critical sections seem heavy-weight here. I'm tempted to try spin-locks but I thought I'd first pose the question here and ask what other people are doing. I'm not convinced that the InterlockedCompareExchangePointer() method itself could be used like a spin-lock. Then one also has to consider acquire/release/fence semantics... Ideas? Thanks!

Read the article
do the Python libraries have a natural dependence on the global namespace?

- by msw

I first ran into this when trying to determine the relative performance of two generators: t = timeit.repeat('g.get()', setup='g = my_generator()') So I dug into the timeit module and found that the setup and statement are evaluated with their own private, initially empty namespaces so naturally the binding of g never becomes accessible to the g.get() statement. The obvious solution is to wrap them into a class, thus adding to the global namespace. I bumped into this again when attempting, in another project, to use the multiprocessing module to divide a task among workers. I even bundled everything nicely into a class but unfortunately the call pool.apply_async(runmc, arg) fails with a PicklingError because buried inside the work object that runmc instantiates is (effectively) an assignment: self.predicate = lambda x, y: x > y so the whole object can't be (understandably) pickled and whereas: def foo(x, y): return x > y pickle.dumps(foo) is fine, the sequence bar = lambda x, y: x > y yields True from callable(bar) and from type(bar), but it Can't pickle <function <lambda> at 0xb759b764>: it's not found as __main__.<lambda>. I've given only code fragments because I can easily fix these cases by merely pulling them out into module or object level defs. The bug here appears to be in my understanding of the semantics of namespace use in general. If the nature of the language requires that I create more def statements I'll happily do so; I fear that I'm missing an essential concept though. Why is there such a strong reliance on the global namespace? Or, what am I failing to understand? Namespaces are one honking great idea -- let's do more of those!

Read the article
Give the mount point of a path

- by Charles Stewart

The following, very non-robust shell code will give the mount point of $path: (for i in $(df|cut -c 63-99); do case $path in $i*) echo $i;; esac; done) | tail -n 1 Is there a better way to do this? Postscript This script is really awful, but has the redeeming quality that it Works On My Systems. Note that several mount points may be prefixes of $path. Examples On a Linux system: cas@txtproof:~$ path=/sys/block/hda1 cas@txtproof:~$ for i in $(df -a|cut -c 57-99); do case $path in $i*) echo $i;; esac; done| tail -1 /sys On a Mac osx system cas local$ path=/dev/fd/0 cas local$ for i in $(df -a|cut -c 63-99); do case $path in $i*) echo $i;; esac; done| tail -1 /dev Note the need to vary cut's parameters, because of the way df's output differs: indeed, awk is better. Answer It looks like munging tabular output is the only way within the shell, but df /dev/fd/impossible | tail -1 | awk '{ print $NF}' is a big improvement on what I had. Note two differences in semantics: firstly, df $path insists that $path names an existing file, the script I had above doesn't care; secondly, there are no worries about dereferncing symlinks. It's not difficult to write Python code to do the job.

Read the article
Is it possible to tie nested generics?

- by Michael Deardeuff

Is it possible to tie nested generics/captures together? I often have the problem of having a Map lookup of class to genericized item of said class. In concrete terms I want something like this (no, T is not declared anywhere). private Map<Class<T>, ServiceLoader<T>> loaders = Maps.newHashMap(); In short, I want loaders.put/get to have semantics something like these: <T> ServiceLoader<T> get(Class<T> klass) {...} <T> void put(Class<T> klass, ServiceLoader<T> loader) {...} Is the following the best I can do? Do I have to live with the inevitable @SuppressWarnings("unchecked") somewhere down the line? private Map<Class<?>, ServiceLoader<?>> loaders = Maps.newHashMap();

Read the article
Do the 'up to date' guarantees provided by final field in Java's memory model extend to indirect ref

- by mattbh

The Java language spec defines semantics of final fields in section 17.5: The usage model for final fields is a simple one. Set the final fields for an object in that object's constructor. Do not write a reference to the object being constructed in a place where another thread can see it before the object's constructor is finished. If this is followed, then when the object is seen by another thread, that thread will always see the correctly constructed version of that object's final fields. It will also see versions of any object or array referenced by those final fields that are at least as up-to-date as the final fields are. My question is - does the 'up-to-date' guarantee extend to the contents of nested arrays, and nested objects? An example scenario: Thread A constructs a HashMap of ArrayLists, then assigns the HashMap to final field 'myFinal' in an instance of class 'MyClass' Thread B sees a (non-synchronized) reference to the MyClass instance and reads 'myFinal', and accesses and reads the contents of one of the ArrayLists In this scenario, are the members of the ArrayList as seen by Thread B guaranteed to be at least as up to date as they were when MyClass's constructor completed?

Read the article
How to store arbitrary data for some HTML tags

- by nickf

I'm making a page which has some interaction provided by javascript. Just as an example: links which send an AJAX request to get the content of articles and then display that data in a div. Obviously in this example, I need each link to store an extra bit of information: the id of the article. The way I've been handling it in case was to put that information in the href link this: <a class="article" href="#5"> I then use jQuery to find the a.article elements and attach the appropriate event handler. (don't get too hung up on the usability or semantics here, it's just an example) Anyway, this method works, but it smells a bit, and isn't extensible at all (what happens if the click function has more than one parameter? what if some of those parameters are optional?) The immediately obvious answer was to use attributes on the element. I mean, that's what they're for, right? (Kind of). <a articleid="5" href="link/for/non-js-users.html"> In my recent question I asked if this method was valid, and it turns out that short of defining my own DTD (I don't), then no, it's not valid or reliable. A common response was to put the data into the class attribute (though that might have been because of my poorly-chosen example), but to me, this smells even more. Yes it's technically valid, but it's not a great solution. Another method I'd used in the past was to actually generate some JS and insert it into the page in a <script> tag, creating a struct which would associate with the object. var myData = { link0 : { articleId : 5, target : '#showMessage' // etc... }, link1 : { articleId : 13 } }; <a href="..." id="link0"> But this can be a real pain in butt to maintain and is generally just very messy. So, to get to the question, how do you store arbitrary pieces of information for HTML tags?

Read the article
Difference between SET autocommit=1 and START TRANSACTION in mysql (Have I missed something?)

- by tkolar

Hey there, I am reading up on transactions in mysql and am not sure whether I have grasped something specific correctly, and I want to be sure I understood that correctly, so here goes. I know what a transaction is supposed to do, I'm just not sure whether I understood the statement semantics or not. So, my question is, is anything wrong, (and, if that is the case, what is wrong) with the following: By default, autocommit mode is enabled in mysql. Now, SET autocommit=0; will begin a transaction, SET autocommit=1; will implicitly commit. It is possible to COMMIT; as well as ROLLBACK;, in both of which cases autocommit is still set to 0 afterwards (and a new transaction is implicitly started). START TRANSACTION; will basically SET autocommit=0; until a COMMIT; or ROLLBACK; takes place. In other words, START TRANSACTION; and SET autocommit=0; are equivalent, except for the fact that START TRANSACTION; does the equivalent of implicitly adding a SET autocommit=0; after COMMIT; or ROLLBACK; If that is the case, I don't understand http://dev.mysql.com/doc/refman/5.5/en/set-transaction.html#isolevel_serializable - seeing as having an isolation level implies that there is a transaction, meaning that autocommit should be off anyway? And if there is another difference (other than the one described above) between beginning a transaction and setting autocommit, what is it? Thanks a lot in advance for your help!

Read the article
Nested/Sub data types in haskell

- by Tom Carstens

So what would be nice is if you could do something like the following (not necessarily with this format, just the general idea): data Minor = MinorA | MinorB data Major = Minor | MajorB isMinor :: Major -> Bool isMinor Minor = True isMinor _ = False So isMinor MinorA would report True (instead of an error.) At the moment you might do something like: data Major = MinorA | MinorB | MajorB isMinor :: Major -> Bool isMinor MinorA = True isMinor MinorB = True isMinor _ = False It's not terrible or anything, but it doesn't expand nicely (as in if Minor when up to MinorZ this would be terribly clunky). To avoid that problem you can wrap Minor: data Minor = MinorA | MinorB data Major = MajorA Minor | MajorB isMinor :: Major -> Bool isMinor (MajorA _) = True isMinor _ = False But now you have to make sure to wrap your Minors to use them as a Major... again not terrible; just doesn't really express the semantics I'd like very well (i.e. Major can be any Minor or MajorB). The first (legal) example is "Major can be MinorA..." but doesn't have any knowledge of Minor and the second is "Major can be MajorA that takes a Minor..." p.s. No, this isn't really about anything concrete.

Read the article
What does the `new` keyword do

- by Mike

I'm following a Java tutorial online, trying to learn the language, and it's bouncing between two semantics for using arrays. long results[] = new long[3]; results[0] = 1; results[1] = 2; results[2] = 3; and: long results[] = {1, 2, 3}; The tutorial never really mentioned why it switched back and forth between the two so I searched a little on the topic. My current understanding is that the new operator is creating an object of "array of longs" type. What I do not understand is why do I want that, and what are the ramifications of that? Are there certain "array" specific methods that won't work on an array unless it's an "array object"? Is there anything that I can't do with an "array object" that I can do with a normal array? Does the Java VM have to do clean up on objects initialized with the new operator that it wouldn't normally have to do? I'm coming from C, so my Java terminology, may not be correct here, so please ask for clarification if something's not understandable.

Read the article
innobackupex - after restoring - quit without updating PID file

- by clarkk

After restoring a backup the server can't start.. restoring # tar -izxf /var/www/bak/db/2013-11-10-1437_mysql.tar.gz -C /var/www/bak/db_import # innobackupex --use-memory=1G --apply-log /var/www/bak/db_import # service mysql stop # mv /var/lib/mysql /var/lib/mysql-old # mkdir /var/lib/mysql # innobackupex --copy-back /var/www/bak/db_import # chown -R mysql:mysql /var/lib/mysql # service mysql start error log 131110 21:24:20 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql 2013-11-10 21:24:21 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details). 2013-11-10 21:24:21 6194 [Warning] Using pre 5.5 semantics to load error messages from /opt/mysql/server-5.6/share/english/. 2013-11-10 21:24:21 6194 [Warning] If this is not intended, refer to the documentation for valid usage of --lc-messages-dir and --language parameters. 2013-11-10 21:24:21 6194 [Note] Plugin 'FEDERATED' is disabled. /usr/local/mysql/bin/mysqld: Table 'mysql.plugin' doesn't exist 2013-11-10 21:24:21 6194 [ERROR] Can't open the mysql.plugin table. Please run mysql_upgrade to create it. 2013-11-10 21:24:21 6194 [Note] InnoDB: The InnoDB memory heap is disabled 2013-11-10 21:24:21 6194 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins 2013-11-10 21:24:21 6194 [Note] InnoDB: Compressed tables use zlib 1.2.3 2013-11-10 21:24:21 6194 [Note] InnoDB: Using Linux native AIO 2013-11-10 21:24:21 6194 [Note] InnoDB: Not using CPU crc32 instructions 2013-11-10 21:24:21 6194 [Note] InnoDB: Initializing buffer pool, size = 128.0M 2013-11-10 21:24:21 6194 [Note] InnoDB: Completed initialization of buffer pool 2013-11-10 21:24:21 6194 [Note] InnoDB: Highest supported file format is Barracuda. 2013-11-10 21:24:22 6194 [Note] InnoDB: 128 rollback segment(s) are active. 2013-11-10 21:24:22 6194 [Note] InnoDB: Waiting for purge to start 2013-11-10 21:24:22 6194 [Note] InnoDB: 5.6.12 started; log sequence number 636992658 2013-11-10 21:24:22 6194 [Note] Server hostname (bind-address): '127.0.0.1'; port: 3306 2013-11-10 21:24:22 6194 [Note] - '127.0.0.1' resolves to '127.0.0.1'; 2013-11-10 21:24:22 6194 [Note] Server socket created on IP: '127.0.0.1'. 2013-11-10 21:24:22 6194 [ERROR] Fatal error: Can't open and lock privilege tables: Table 'mysql.user' doesn't exist 131110 21:24:22 mysqld_safe mysqld from pid file /var/run/mysqld/mysqld.pid ended mysql_upgrade /opt/mysql/server-5.6/bin/mysql_upgrade -u root -pxxxxx -P 3308 Warning: Using a password on the command line interface can be insecure. Looking for 'mysql' as: /opt/mysql/server-5.6/bin/mysql Looking for 'mysqlcheck' as: /opt/mysql/server-5.6/bin/mysqlcheck FATAL ERROR: Upgrade failed

Read the article
How to enable caching on Apache / Ubuntu Linux?

- by Jim Mischel

I have a large (several megabytes) XML file that's updated rather frequently (every 10 minutes or less) and gets a lot of traffic. I'd like to implement some caching to reduce bandwidth and server load. Looking at the Apache documents, I see a dizzying array of configuration options that involve various combinations of mod_expires, mod_headers, and mod_cache (and variants). I end up running in circles and the results aren't what I expect. I'm comfortable editing the various configuration files if I have some idea what I'm supposed to change. But at the moment I'm poking around in the dark and that's never a comfortable feeling. So, perhaps if I describe what I want, somebody here can take me by the hand and say, "This is what you need to do." Periodically, this file, call it "stuff.xml" is updated and a new version copied to the directory. The external url would be, for example, http://example.com/stuff.xml. Understand, this part works. Whenever I request the file, I get the expected result. But the file is big and I want to save bandwidth, so first I'd like to implement conditional GET semantics with the If-Modified-Since header. How do I do this? I've enabled mod_headers and mod_expired and added the <FilesMatching> section in my httpd.conf as recommended in countless examples I've seen online, but that didn't change the behavior when made a conditional GET request. I always get a status 200 with the entire document. So how the heck do I implement this? That'll cut down on neeless transfers. I'd also like to limit the amount of data transferred. Seeing as this is XML, gzipping it should save me 50% or more. My next step would be to somehow gzip the file and, if it's not too difficult, store it in memory. That'll cut down on per-access data transfer, and also reduce disk transfers. So how do I implement this type of caching? Thanks in advance.

Read the article
Why does writing a file to an NFS share send a COMMIT operation to the NFS server?

- by Antonis Christofides

I have a Debian squeeze (2.6.32-5-amd64) which is at the same time a NFS4 server and client (it mounts itself through NFS4). The local directory that leads directly to disk is /nfs4exports/mydir, whereas /nfs4mounts/mydir is the same thing mounted through NFS, using the machine's external IP address. Here is the line from fstab: 192.168.1.75:/mydir /nfs4mounts/mydir nfs4 soft 0 0 I have an application that writes many small files. If I write directly to /nfs4exports/mydir, it writes thousands of files per second; but if I write to /nfs4mounts/mydir, it writes 4 files per second or so. I can greatly increase speed if I add async to /etc/exports. (Writing a single large file to the NFS-mounted directory goes at more than 100 MB/s.) I examine the server statistics and I see that whenever a file is written, it is "committed" (this also happens with NFSv3): root@debianvboxtest:~# mount -t nfs4 192.168.1.75:/mydir /mnt root@debianvboxtest:~# nfsstat|grep -A 2 'nfs v4 operations' Server nfs v4 operations: op0-unused op1-unused op2-future access close commit 0 0% 0 0% 0 0% 10 4% 1 0% 1 0% root@debianvboxtest:~# echo 'hello' >/mnt/test1056 root@debianvboxtest:~# nfsstat|grep -A 2 'nfs v4 operations' Server nfs v4 operations: op0-unused op1-unused op2-future access close commit 0 0% 0 0% 0 0% 11 4% 2 0% 2 0% Now in the RFC, I read this: The COMMIT operation is similar in operation and semantics to the POSIX fsync(2) system call that synchronizes a file's state with the disk (file data and metadata is flushed to disk or stable storage). COMMIT performs the same operation for a client, flushing any unsynchronized data and metadata on the server to the server's disk or stable storage for the specified file. I don't understand why the client commits. I don't think that the "echo" shell built-in command runs fsync; if echo wrote to a local file and then the machine went down, the file might be lost. In contrast, the NFS client appears to be sending a COMMIT upon completion of the echo. Why? I am reluctant to use the async NFS server option, because it would apparently ignore COMMIT. I feel as if I had a local filesystem and I had to choose between syncing every file upon close and ignoring fsync altogether. What have I understood wrong?

Read the article
How can the Private Bytes of a process be significantly less than its effect on the system commit charge?

- by bacar

On a 64-bit Windows Server 2003, I can see using taskmgr or process explorer that the total commit charge is around 3.5GB, yet when I sum the Private Bytes consumed by each process (by running pslist -m and adding all values under the Priv column) the total comes in at 1.6GB. I know which process seems to be causing this (sqlservr.exe) as when I kill the process, the commit charge drops dramatically. However the process in question is consuming only ~220MB of Private Bytes yet killing the process drops the commit charge by ~1.6GB. How is this possible? How can the commit charge be so significantly greater than Private Bytes, which should represent the amount of committed memory? If some other factor contributes to the commit charge, what is that factor and how can I view its impact in process explorer? Note: I claim that I understand the difference between reserved and committed memory already: my investigations above relate specifically to Private Bytes which includes only committed memory and excludes reserved memory. the Virtual Size of the process in this case is over 4GB, but this should be irrelevant - Virtual Size in procexp represents reserved, not committed memory, and should not contribute to the commit charge. I'm particularly interested in generalised answers to this question: I'm assuming that if sqlservr.exe can behave in this way, that any process potentially could. Further Investigations I notice that pointing Sysinternals VMMap at this process reports a committed "Private Data" of 1.6GB despite Procexp's reported a Private Bytes of 220MB. This is particularly strange given that the documentation for this field in the "Windows® Sysinternals Administrator's Reference" states that: Private Data memory is memory that is allocated by VirtualAlloc and that is not further handled by the Heap Manager or the .NET runtime, or assigned to the Stack category... VMMap’s definition of “Private Data” is more granular than that of Process Explorer’s “private bytes.” Procexp’s “private bytes” includes all private committed memory belonging to the process. i.e. that VMMap's committed "Private Data" should be smaller than procexp's "Private Bytes". Also, after reading the 'Process committed memory' section of Mark Russinovich's excellent Pushing the Limits of Windows: Virtual Memory, he highlights two cases which won't show up in Private Bytes: File mapping views with copy-on-write semantics (however, according to VMMap there is no significant space allocated to Mapped Files). pagefile-backed virtual memory (however, I tried testlimit with the -l flag as suggested, and no significant memory is consumed by pagefile-backed sections)

Read the article
C# 4.0: Dynamic Programming

- by Paulo Morgado

The major feature of C# 4.0 is dynamic programming. Not just dynamic typing, but dynamic in broader sense, which means talking to anything that is not statically typed to be a .NET object. Dynamic Language Runtime The Dynamic Language Runtime (DLR) is piece of technology that unifies dynamic programming on the .NET platform, the same way the Common Language Runtime (CLR) has been a common platform for statically typed languages. The CLR always had dynamic capabilities. You could always use reflection, but its main goal was never to be a dynamic programming environment and there were some features missing. The DLR is built on top of the CLR and adds those missing features to the .NET platform. The Dynamic Language Runtime is the core infrastructure that consists of: Expression Trees The same expression trees used in LINQ, now improved to support statements. Dynamic Dispatch Dispatches invocations to the appropriate binder. Call Site Caching For improved efficiency. Dynamic languages and languages with dynamic capabilities are built on top of the DLR. IronPython and IronRuby were already built on top of the DLR, and now, the support for using the DLR is being added to C# and Visual Basic. Other languages built on top of the CLR are expected to also use the DLR in the future. Underneath the DLR there are binders that talk to a variety of different technologies: .NET Binder Allows to talk to .NET objects. JavaScript Binder Allows to talk to JavaScript in SilverLight. IronPython Binder Allows to talk to IronPython. IronRuby Binder Allows to talk to IronRuby. COM Binder Allows to talk to COM. Whit all these binders it is possible to have a single programming experience to talk to all these environments that are not statically typed .NET objects. The dynamic Static Type Let’s take this traditional statically typed code: Calculator calculator = GetCalculator(); int sum = calculator.Sum(10, 20); Because the variable that receives the return value of the GetCalulator method is statically typed to be of type Calculator and, because the Calculator type has an Add method that receives two integers and returns an integer, it is possible to call that Sum method and assign its return value to a variable statically typed as integer. Now lets suppose the calculator was not a statically typed .NET class, but, instead, a COM object or some .NET code we don’t know he type of. All of the sudden it gets very painful to call the Add method: object calculator = GetCalculator(); Type calculatorType = calculator.GetType(); object res = calculatorType.InvokeMember("Add", BindingFlags.InvokeMethod, null, calculator, new object[] { 10, 20 }); int sum = Convert.ToInt32(res); And what if the calculator was a JavaScript object? ScriptObject calculator = GetCalculator(); object res = calculator.Invoke("Add", 10, 20); int sum = Convert.ToInt32(res); For each dynamic domain we have a different programming experience and that makes it very hard to unify the code. With C# 4.0 it becomes possible to write code this way: dynamic calculator = GetCalculator(); int sum = calculator.Add(10, 20); You simply declare a variable who’s static type is dynamic. dynamic is a pseudo-keyword (like var) that indicates to the compiler that operations on the calculator object will be done dynamically. The way you should look at dynamic is that it’s just like object (System.Object) with dynamic semantics associated. Anything can be assigned to a dynamic. dynamic x = 1; dynamic y = "Hello"; dynamic z = new List<int> { 1, 2, 3 }; At run-time, all object will have a type. In the above example x is of type System.Int32. When one or more operands in an operation are typed dynamic, member selection is deferred to run-time instead of compile-time. Then the run-time type is substituted in all variables and normal overload resolution is done, just like it would happen at compile-time. The result of any dynamic operation is always dynamic and, when a dynamic object is assigned to something else, a dynamic conversion will occur. Code Resolution Method double x = 1.75; double y = Math.Abs(x); compile-time double Abs(double x) dynamic x = 1.75; dynamic y = Math.Abs(x); run-time double Abs(double x) dynamic x = 2; dynamic y = Math.Abs(x); run-time int Abs(int x) The above code will always be strongly typed. The difference is that, in the first case the method resolution is done at compile-time, and the others it’s done ate run-time. IDynamicMetaObjectObject The DLR is pre-wired to know .NET objects, COM objects and so forth but any dynamic language can implement their own objects or you can implement your own objects in C# through the implementation of the IDynamicMetaObjectProvider interface. When an object implements IDynamicMetaObjectProvider, it can participate in the resolution of how method calls and property access is done. The .NET Framework already provides two implementations of IDynamicMetaObjectProvider: DynamicObject : IDynamicMetaObjectProvider The DynamicObject class enables you to define which operations can be performed on dynamic objects and how to perform those operations. For example, you can define what happens when you try to get or set an object property, call a method, or perform standard mathematical operations such as addition and multiplication. ExpandoObject : IDynamicMetaObjectProvider The ExpandoObject class enables you to add and delete members of its instances at run time and also to set and get values of these members. This class supports dynamic binding, which enables you to use standard syntax like sampleObject.sampleMember, instead of more complex syntax like sampleObject.GetAttribute("sampleMember").

Read the article
Session Report: What’s New in JSF: A Complete Tour of JSF 2.2

- by Janice J. Heiss

On Wednesday, Ed Burns, Consulting Staff Member at Oracle, presented a session, CON3870 -- “What’s New in JSF: A Complete Tour of JSF 2.2,” in which he provided an update on recent developments in JavaServer Faces 2.2. He began by emphasizing that, “JavaServer Faces 2.2 continues the evolution of the Java EE standard user interface technology. Like previous releases, this iteration is very community-driven and transparent.” He pointed out that since JSF was introduced at the 2001 JavaOne Keynote, it has had a long and successful run and has found a home in applications where the UI logic resides entirely on the server where the model and UI logic is. In such cases, the browser performs fairly simple functions. However, developers can take advantage of the power of browsers, something that Project Avatar is focused on by letting developers author their applications so the UI logic is running on the client and communicating to the back end via RESTful web services. “Most importantly,” remarked Burns, “JSF 2.2 offers a really good migration path because even in the scope of one application you could have an app written with JSF that has its UI logic on the server and, on a gradual basis, you could migrate parts of the app over to use client-side technologies. This can be done at any level of granularity – per page or per collection of pages. It all depends on what you want to do.” His presentation, which focused on the basic new features of JSF 2.2, began by restating the scope of JSF and encouraged attendees to check out Roger Kitain’s session: CON5133 “Techniques for Responsive Real-Time Web UIs.” Burns explained that JSF has endured because, “We still need web apps that are maintainable, localizable, quick to build, accessible, secure, look great and are fun to use.” It is used on every continent – the curious can go here to check out where its unofficial usage is tracked. He emphasized the significance of the UI logic being substantially on the server. This: Separates Component Semantics from Rendering, Allows components to “own” their little patch of the UI -- encode/decode, And offers a well-defined lifecycle: Inversion of Control. Burns reminded attendees that JSR-344, the spec for JSF 2.2, is now on Java Community Process 2.8, a revised version of the JCP that allows for more openness and transparency. He then offered some tools for community access to JSF 2.2: * Public java.net projects spec http://jsf-spec.java.net/ impl http://jsf.java.net/ Open Source: GPL+Classpath Exception * Mailing Lists [email protected] Public readable archive, JSPA signed member read/write [email protected] Public readable archive, any java.net member read/write All mail sent to jsr344-experts is sent to users. * Issue Tracker spec http://jsf-spec.java.net/issues/ impl http://jsf.java.net/issues/ JSF 2.2, which is JSR 344, has a Public Review Draft planned by December 2012 with no need for a Renewal Ballot. The Early Draft Review of JSR 344 was published on December 8, 2011. Interested developers are encouraged to offer their input. Six Big Ticket Features of JSF 2.2 Burns summarized the six big ticket features of JSF 2.2:* HTML5 Friendly Markup Support Pass through attributes and elements * Faces Flows* Cross Site Request Forgery Protection* Loading Facelets via ResourceHandler* File Upload Component* Multi-Templating He explained that he called it “HTML 5 friendly” because there is really nothing HTML 5 specific about it -- it could be 4. But it enables developers to use new elements that are present in HTML5 without having a JSF component library that is written to take advantage of those specifically. It gives the page author the ability to use plain HTML5 to write their page, but to still take advantage of the server-side available in JSF. He presented a demo showing JSF 2.2’s ability to leverage the expressiveness of HTML5. Burns then explained the significance of face flows, which offer function points and quantify how much work has taken place, something of great value to JSF users. He went on to talk about JSF 2.2.’s cross-site request forgery protection (CSRF) and offered details about how it protects applications against attack. Then he talked about JSF 2.2’s File Upload Component and explained that the final specification will have Ajax and non-Ajax support. The current milestone has non-Ajax support implemented. He then went on to explain its capacity to add facelets through ResourceHandler. Previously, JSF 2.0 added Facelets and ResourceHandler as disparate units; now in JSF 2.2 the two concepts are unified. Finally, he explained the concept of multi-templating in JSF 2.2 and went on to discuss more medium-level features of the release. For an easy, low maintenance way of staying in touch with JSF developments go to JSF’s Twitter page where every month or so, important updates are offered.

Read the article
Parallelism in .NET – Part 8, PLINQ’s ForAll Method

- by Reed

Parallel LINQ extends LINQ to Objects, and is typically very similar. However, as I previously discussed, there are some differences. Although the standard way to handle simple Data Parellelism is via Parallel.ForEach, it’s possible to do the same thing via PLINQ. PLINQ adds a new method unavailable in standard LINQ which provides new functionality… LINQ is designed to provide a much simpler way of handling querying, including filtering, ordering, grouping, and many other benefits. Reading the description in LINQ to Objects on MSDN, it becomes clear that the thinking behind LINQ deals with retrieval of data. LINQ works by adding a functional programming style on top of .NET, allowing us to express filters in terms of predicate functions, for example. PLINQ is, generally, very similar. Typically, when using PLINQ, we write declarative statements to filter a dataset or perform an aggregation. However, PLINQ adds one new method, which provides a very different purpose: ForAll. The ForAll method is defined on ParallelEnumerable, and will work upon any ParallelQuery<T>. Unlike the sequence operators in LINQ and PLINQ, ForAll is intended to cause side effects. It does not filter a collection, but rather invokes an action on each element of the collection. At first glance, this seems like a bad idea. For example, Eric Lippert clearly explained two philosophical objections to providing an IEnumerable<T>.ForEach extension method, one of which still applies when parallelized. The sole purpose of this method is to cause side effects, and as such, I agree that the ForAll method “violates the functional programming principles that all the other sequence operators are based upon”, in exactly the same manner an IEnumerable<T>.ForEach extension method would violate these principles. Eric Lippert’s second reason for disliking a ForEach extension method does not necessarily apply to ForAll – replacing ForAll with a call to Parallel.ForEach has the same closure semantics, so there is no loss there. Although ForAll may have philosophical issues, there is a pragmatic reason to include this method. Without ForAll, we would take a fairly serious performance hit in many situations. Often, we need to perform some filtering or grouping, then perform an action using the results of our filter. Using a standard foreach statement to perform our action would avoid this philosophical issue: // Filter our collection var filteredItems = collection.AsParallel().Where( i => i.SomePredicate() ); // Now perform an action foreach (var item in filteredItems) { // These will now run serially item.DoSomething(); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This would cause a loss in performance, since we lose any parallelism in place, and cause all of our actions to be run serially. We could easily use a Parallel.ForEach instead, which adds parallelism to the actions: // Filter our collection var filteredItems = collection.AsParallel().Where( i => i.SomePredicate() ); // Now perform an action once the filter completes Parallel.ForEach(filteredItems, item => { // These will now run in parallel item.DoSomething(); }); This is a noticeable improvement, since both our filtering and our actions run parallelized. However, there is still a large bottleneck in place here. The problem lies with my comment “perform an action once the filter completes”. Here, we’re parallelizing the filter, then collecting all of the results, blocking until the filter completes. Once the filtering of every element is completed, we then repartition the results of the filter, reschedule into multiple threads, and perform the action on each element. By moving this into two separate statements, we potentially double our parallelization overhead, since we’re forcing the work to be partitioned and scheduled twice as many times. This is where the pragmatism comes into play. By violating our functional principles, we gain the ability to avoid the overhead and cost of rescheduling the work: // Perform an action on the results of our filter collection .AsParallel() .Where( i => i.SomePredicate() ) .ForAll( i => i.DoSomething() ); The ability to avoid the scheduling overhead is a compelling reason to use ForAll. This really goes back to one of the key points I discussed in data parallelism: Partition your problem in a way to place the most work possible into each task. Here, this means leaving the statement attached to the expression, even though it causes side effects and is not standard usage for LINQ. This leads to my one guideline for using ForAll: The ForAll extension method should only be used to process the results of a parallel query, as returned by a PLINQ expression. Any other usage scenario should use Parallel.ForEach, instead.

Read the article
Red Gate in the Community

- by Nick Harrison

Much has been said recently about Red Gate's community involvement and commitment to the DotNet community. Much of this has been unduly negative. Before you start throwing stones and spewing obscenities, consider some additional facts: Red Gate's software is actually very good. I have worked on many projects where Red Gate's software was instrumental in finishing successfully. Red Gate is VERY good to the community. I have spoken at many user groups and code camps where Red Gate has been a sponsor. Red Gate consistently offers up money to pay for the venue or food, and they will often give away licenses as door prizes. There are many such community events that would not take place without Red Gate's support. All I have ever seen them ask for is to have their products mentioned or be listed as a sponsor. They don't insist on anyone following a specific script. They don't monitor how their products are showcased. They let their products speak for themselves. Red Gate sponsors the Simple Talk web site. I publish there regularly. Red Gate has never exerted editorial pressure on me. No one has ever told me we can't publish this unless you mention Red Gate products. No one has ever said, you need to say nice things about Red Gate products in order to be published. They have told me, "you need to make this less academic, so you don't alienate too many readers. "You need to actually write an introduction so people will know what you are talking about". "You need to write this so that someone who isn't a reflection nut will follow what you are trying to say." In short, they have been good editors worried about the quality of the content and what the readers are likely to be interested in. For me personally, Red Gate and Simple Talk have both been excellent to work with. As for the developer outrage… I am a little embarrassed by so much of the response that I am seeing. So much of the complaints remind me of little children whining "but you promised" Semantics aside. A promise is just a promise. It's not like they "pinky sweared". Sadly no amount name calling or "double dog daring" will change the economics of the situation. Red Gate is not a multibillion dollar corporation. They are a mid size company doing the best they can. Without a doubt, their pockets are not as deep as Microsoft's. I honestly believe that they did try to make the "freemium" model work. Sadly it did not. I have no doubt that they intended for it to work and that they tried to make it work. I also have no doubt that they labored over making this decision. This could not have been an easy decision to make. Many people are gleefully proclaiming a massive backlash against Red Gate swearing off their wonderful products and promising to bash them at every opportunity from now on. This is childish behavior that does not represent professionals. This type of behavior is more in line with bullies in the school yard than professionals in a professional community. Now for my own prediction… This back lash against Red Gate is not likely to last very long. We will all realize that we still need their products. We may look around for alternatives, but realize that they really do have the best in class for every product that they produce, and that they really are not exorbitantly priced. We will see them sponsoring Code Camps and User Groups and be reminded, "hey this isn't such a bad company". On the other hand, software shops like Red Gate, will remember this back lash and give a second thought to supporting open source projects. They will worry about getting involved when an individual wants to turn over control for a product that they developed but can no longer support alone. Who wants to run the risk of not being able to follow through on their best intentions. In the end we may all suffer, even the toddlers among us throwing the temper tantrum, "BUT YOU PROMISED!" Disclaimer Before anyone asks or jumps to conclusions, I do not get paid by Red Gate to say any of this. I have often written about their products, and I have long thought that they are a wonderful company with amazing products. If they ever open an office in the SE United States, I will be one of the first to apply.

Read the article
PanelGridLayout - A Layout Revolution

- by Duncan Mills

With the most recent 11.1.2 patchset (11.1.2.3) there has been a lot of excitement around ADF Essentials (and rightly so), however, in all the fuss I didn't want an even more significant change to get missed - yes you read that correctly, a more significant change! I'm talking about the new panelGridLayout component, I can confidently say that this one of the most revolutionary components that we've introduced in 11g, even though it sounds rather boring. To be totally accurate, panelGrid was introduced in 11.1.2.2 but without any presence in the component palette or other design time support, so it was largely missed unless you read the release notes. However in this latest patchset it's finally front and center. Its time to explore - we (really) need to talk about layout. Let's face it,with ADF Faces rich client, layout is a rather arcane pursuit, once you are a layout master, all bow before you, but it's more of an art than a science, and it is often, in fact, way too difficult to achieve what should (apparently) be a pretty simple. Here's a great example, it's a homework assignment I set for folks I'm teaching this stuff to: The requirements for this layout are: The header is 80px high, the footer is 30px. These are both fixed. The first section of the header containing the logo is 180px wide The logo is centered within the top left hand corner of the header The title text is start aligned in the center zone of the header and will wrap if the browser window is narrowed. It should be aligned in the center of the vertical space The about link is anchored to the right hand side of the browser with a 20px gap and again is center aligned vertically. It will move as the browser window is reduced in width. The footer has a right aligned copyright statement, again middle aligned within a 30px high footer region and with a 20px buffer to the right hand edge. It will move as the browser window is reduced in width. All remaining space is given to a central zone, which, in this case contains a panelSplitter. Expect that at some point in time you'll need a separate messages line in the center of the footer. In the homework assigment I set I also stipulate that no inlineStyles can be used to control alignment or margins and no use of other taglibs (e.g. JSF HTML or Trinidad HTML). So, if we take this purist approach, that basic page layout (in my stock solution) requires 3 panelStretchLayouts, 5 panelGroupLayouts and 4 spacers - not including the spacer I use for the logo and the contents of the central zone splitter - phew! The point is that even a seemingly simple layout needs a bit of thinking about, particulatly when you consider strechting and browser re-size behavior. In fact, this little sample actually teaches you much of what you need to know to become vaguely competant at layouts in the framework. The underlying result of "the way things are" is that most of us reach for panelStretchLayout before even finishing the first sip of coffee as we embark on a new page design. In fact most pages you will see in any moderately complex ADF page will basically be nested panelStretchLayouts and panelGroupLayouts, sometimes many, many levels deep. So this is a problem, we've known this for some time and now we have a good solution. (I should point out that the oft-used Trinidad trh tags are not a particularly good solution as you're tie-ing yourself to an HTML table based layout in that case with a host of attendent issues in resize and bi-di behavior, but I digress.) So, tadaaa, I give to you panelGridLayout. PanelGrid, as the name suggests takes a grid like (dare I say slightly gridbag-like) approach to layout, dividing your layout into rows and colums with margins, sizing, stretch behaviour, colspans and rowspans all rolled in, all without the use of inlineStyle. As such, it provides for a much more powerful and consise way of defining a layout such as the one above that is actually simpler and much more logical to design. The basic building blocks are the panelGridLayout itself, gridRow and gridCell. Your content sits inside the cells inside the rows, all helpfully allowing both streching, valign and halign definitions without the need to nest further panelGroupLayouts. So much simpler! If I break down the homework example above my nested comglomorate of 12 containers and spacers can be condensed down into a single panelGrid with 3 rows and 5 cell definitions (39 lines of source reduced to 24 in the case of the sample). What's more, the actual runtime representation in the browser DOM is much, much simpler, and clean, with basically one DIV per cell (Note that just because the panelGridLayout semantics looks like an HTML table does not mean that it's rendered that way!) . Another hidden benefit is the runtime cost. Because we can use a single layout to achieve much more complex geometries the client side layout code inside the browser is having to work a lot less. This will be a real benefit if your application needs to run on lower powered clients such as netbooks or tablets. So, it's time, if you're on 11.1.2.2 or above, to smile warmly at your panelStretchLayouts, wrap the blanket around it's knees and wheel it off to the Sunset Retirement Home for a well deserved rest. There's a new kid on the block and it wants to be your friend.

Read the article
How can I estimate the entropy of a password?

- by Wug

Having read various resources about password strength I'm trying to create an algorithm that will provide a rough estimation of how much entropy a password has. I'm trying to create an algorithm that's as comprehensive as possible. At this point I only have pseudocode, but the algorithm covers the following: password length repeated characters patterns (logical) different character spaces (LC, UC, Numeric, Special, Extended) dictionary attacks It does NOT cover the following, and SHOULD cover it WELL (though not perfectly): ordering (passwords can be strictly ordered by output of this algorithm) patterns (spatial) Can anyone provide some insight on what this algorithm might be weak to? Specifically, can anyone think of situations where feeding a password to the algorithm would OVERESTIMATE its strength? Underestimations are less of an issue. The algorithm: // the password to test password = ? length = length(password) // unique character counts from password (duplicates discarded) uqlca = number of unique lowercase alphabetic characters in password uquca = number of uppercase alphabetic characters uqd = number of unique digits uqsp = number of unique special characters (anything with a key on the keyboard) uqxc = number of unique special special characters (alt codes, extended-ascii stuff) // algorithm parameters, total sizes of alphabet spaces Nlca = total possible number of lowercase letters (26) Nuca = total uppercase letters (26) Nd = total digits (10) Nsp = total special characters (32 or something) Nxc = total extended ascii characters that dont fit into other categorys (idk, 50?) // algorithm parameters, pw strength growth rates as percentages (per character) flca = entropy growth factor for lowercase letters (.25 is probably a good value) fuca = EGF for uppercase letters (.4 is probably good) fd = EGF for digits (.4 is probably good) fsp = EGF for special chars (.5 is probably good) fxc = EGF for extended ascii chars (.75 is probably good) // repetition factors. few unique letters == low factor, many unique == high rflca = (1 - (1 - flca) ^ uqlca) rfuca = (1 - (1 - fuca) ^ uquca) rfd = (1 - (1 - fd ) ^ uqd ) rfsp = (1 - (1 - fsp ) ^ uqsp ) rfxc = (1 - (1 - fxc ) ^ uqxc ) // digit strengths strength = ( rflca * Nlca + rfuca * Nuca + rfd * Nd + rfsp * Nsp + rfxc * Nxc ) ^ length entropybits = log_base_2(strength) A few inputs and their desired and actual entropy_bits outputs: INPUT DESIRED ACTUAL aaa very pathetic 8.1 aaaaaaaaa pathetic 24.7 abcdefghi weak 31.2 H0ley$Mol3y_ strong 72.2 s^fU¬5ü;y34G< wtf 88.9 [a^36]* pathetic 97.2 [a^20]A[a^15]* strong 146.8 xkcd1** medium 79.3 xkcd2** wtf 160.5 * these 2 passwords use shortened notation, where [a^N] expands to N a's. ** xkcd1 = "Tr0ub4dor&3", xkcd2 = "correct horse battery staple" The algorithm does realize (correctly) that increasing the alphabet size (even by one digit) vastly strengthens long passwords, as shown by the difference in entropy_bits for the 6th and 7th passwords, which both consist of 36 a's, but the second's 21st a is capitalized. However, they do not account for the fact that having a password of 36 a's is not a good idea, it's easily broken with a weak password cracker (and anyone who watches you type it will see it) and the algorithm doesn't reflect that. It does, however, reflect the fact that xkcd1 is a weak password compared to xkcd2, despite having greater complexity density (is this even a thing?). How can I improve this algorithm? Addendum 1 Dictionary attacks and pattern based attacks seem to be the big thing, so I'll take a stab at addressing those. I could perform a comprehensive search through the password for words from a word list and replace words with tokens unique to the words they represent. Word-tokens would then be treated as characters and have their own weight system, and would add their own weights to the password. I'd need a few new algorithm parameters (I'll call them lw, Nw ~= 2^11, fw ~= .5, and rfw) and I'd factor the weight into the password as I would any of the other weights. This word search could be specially modified to match both lowercase and uppercase letters as well as common character substitutions, like that of E with 3. If I didn't add extra weight to such matched words, the algorithm would underestimate their strength by a bit or two per word, which is OK. Otherwise, a general rule would be, for each non-perfect character match, give the word a bonus bit. I could then perform simple pattern checks, such as searches for runs of repeated characters and derivative tests (take the difference between each character), which would identify patterns such as 'aaaaa' and '12345', and replace each detected pattern with a pattern token, unique to the pattern and length. The algorithmic parameters (specifically, entropy per pattern) could be generated on the fly based on the pattern. At this point, I'd take the length of the password. Each word token and pattern token would count as one character; each token would replace the characters they symbolically represented. I made up some sort of pattern notation, but it includes the pattern length l, the pattern order o, and the base element b. This information could be used to compute some arbitrary weight for each pattern. I'd do something better in actual code. Modified Example: Password: 1234kitty$$$$$herpderp Tokenized: 1 2 3 4 k i t t y $ $ $ $ $ h e r p d e r p Words Filtered: 1 2 3 4 @W5783 $ $ $ $ $ @W9001 @W9002 Patterns Filtered: @P[l=4,o=1,b='1'] @W5783 @P[l=5,o=0,b='$'] @W9001 @W9002 Breakdown: 3 small, unique words and 2 patterns Entropy: about 45 bits, as per modified algorithm Password: correcthorsebatterystaple Tokenized: c o r r e c t h o r s e b a t t e r y s t a p l e Words Filtered: @W6783 @W7923 @W1535 @W2285 Breakdown: 4 small, unique words and no patterns Entropy: 43 bits, as per modified algorithm The exact semantics of how entropy is calculated from patterns is up for discussion. I was thinking something like: entropy(b) * l * (o + 1) // o will be either zero or one The modified algorithm would find flaws with and reduce the strength of each password in the original table, with the exception of s^fU¬5ü;y34G<, which contains no words or patterns.

Read the article

Search Results

Search found 362 results on 15 pages for 'semantics'.

Page 12/15 | < Previous Page | 8 9 10 11 12 13 14 15 | Next Page >

- by BillyONeal

- by Banang

- by Weston C

- by riko

- by polygenelubricants

- by zpasternack

- by Rob Bryce

- by msw

- by Charles Stewart

- by Michael Deardeuff

- by mattbh

- by nickf

- by tkolar

- by Tom Carstens

- by Mike

- by clarkk

- by Jim Mischel

- by Antonis Christofides

- by bacar

- by Paulo Morgado

- by Janice J. Heiss

- by Reed

- by Nick Harrison

- by Duncan Mills

- by Wug

< Previous Page | 8 9 10 11 12 13 14 15 | Next Page >