Search Results

Search found 114 results on 5 pages for 'phd'.

Page 5/5 | < Previous Page | 1 2 3 4 5

[LaTeX] positions of page numbers, position of chapter headings, chapters AND Table of Contents, Ref

- by kaikanmonaco

I am writing my PhD thesis (120+ pages) in latex, the deadline is approaching and I am struggling with layout problems. I am using the documentstyle book. I am posting both problems in this one thread because I am not sure if the solution might be related to both problems or not. Problems are: 1.) The page numbers are mostly located on the top-right of each page (this is correct and where I want them to be). However, only on the first page of chapters and on the first page of what I call "special chapters", the page number is located bottom-centered. With "special chapters" I mean: List of Contents, List of Figures, List of Tables, References, Index. My university will not accept the thesis like this. The page number must ALWAYS be top-right one each page, even if the page is the first page of a chapter or the first page of something like the List of Contents. How can I fix this? 2.) On the first page of chapters and "special chapters" (List of Contents...), the chapter title is located far too low on the page. This is the standard layout of LaTeX with documentstyle book I think. However, the chapter title must start at the very top of the page! I.e. the same height as the normal text on the pages that follow. I mean the chapter title, not the header. I.e., if there is a chapter called "Chapter 1 Dynamics of foobar under mechanical stress" then that text has to start from the top the page, but right now it starts several centimeters below the top. How can I fix this? Have tried all kinds of things to no effect, I'd be very thankful for a solution! Thanks.

Read the article
Recent Innovations to ILOM

- by B.Koch

by Josh Rosen If you are wondering how Oracle can make some of the most advanced, reliable, and fault tolerant servers on the market, look no further than Oracle Integrated Lights Out Manager or ILOM. We build ILOM into every server we create, from Oracle x86 Systems such as X3-2 to the SPARC T-Series family. Oracle ILOM is an embedded service processor, but it's really more than that. It's a computer within a computer. It's smart, it's tightly integrated into all aspects of the server's operation, and it's a big reason why Oracle servers are used for some of the most mission-critical workloads out there. To understand the value of ILOM, there is no better place to start than its fault management capability. We have taken the sophisticated fault management architecture from Solaris, developed and refined over a decade, and built it into each and every ILOM. ILOM detects a potential issue at its earliest stage, watching low-level sensors. If the root cause of a problem is not clear from a single error reading, ILOM will look for other clues and combine multiple pieces of information to correctly identify a failing component. ILOM provides peace of mind. We tailor our fault management for each new server platform that we produce. You can rest assured that it's always actively keeping the server healthy. And if there is a problem, you can be confident it will let you know by sending you a notification by e-mail or trap. We also heard IT managers tell us they needed a Ph.D. in computer engineering to manage today's servers. It doesn't have to be that way. Thanks to the latest innovations to Oracle ILOM, we present hardware inventory and status in way that makes sense – to anyone. Green means everything is healthy and red means something is wrong. When a component needs to be replaced a clear message indicates where the problem is and points you at a knowledge article about that problem. It's that simple. Simpler management and simple interfaces mean reduced complexity and lower costs to manage. And we know that's really important. ILOM does all this while also providing advanced service processor features you depend on for managing enterprise class systems. You can remotely control the server power, interact with a virtual video console for the server, and mount media on the server remotely. There is no need to spend money on a KVM switch to get this functionality. And when people hear how advanced ILOM is, they can't believe ILOM is free. All features are enabled and included with each Oracle server that you buy. There are no advanced licenses you need to purchase or features to unlock. Configuring ILOM has also never been easier. It is now possible to configure almost all aspects of the server directly from ILOM. This includes changing BIOS settings, persistently modifying boot order, and optimizing power settings -- all directly from ILOM. But Oracle's innovation does not stop with ILOM. Oracle has engineered Oracle Enterprise Manager Ops Center to integrate directly with ILOM, providing centralized management across all of our servers. Ops Center will discover each of your Oracle servers over the network by searching for ILOMs. When it finds one, it knows how to communicate with ILOM to monitoring and configure that server from application to disk. Since every server that Oracle produces, from x86 Systems to SPARC T-Series up and down the line, comes with Oracle ILOM, you can manage all Oracle servers in the same way. And while all of our servers may have different components on the inside, each with their specialized functions, the way you integrate them and the way you monitor and manage them is exactly the same. Oracle ILOM is state-of-art. If you are looking for a server that make systems management simple and is easy to integrate and maintain, check out the latest advances to Oracle ILOM. Josh Rosen is a Principal Product Manager at Oracle and previously spent more than a decade as a developer and architect of system management software. Josh has worked on system management for many of Oracle's hardware products ranging from the earliest blade systems to the latest Oracle x86 servers.

Read the article
Analysing and measuring the performance of a .NET application (survey results)

- by Laila

Back in December last year, I asked myself: could it be that .NET developers think that you need three days and a PhD to do performance profiling on their code? What if developers are shunning profilers because they perceive them as too complex to use? If so, then what method do they use to measure and analyse the performance of their .NET applications? Do they even care about performance? So, a few weeks ago, I decided to get a 1-minute survey up and running in the hopes that some good, hard data would clear the matter up once and for all. I posted the survey on Simple Talk and got help from a few people to promote it. The survey consisted of 3 simple questions: Amazingly, 533 developers took the time to respond - which means I had enough data to get representative results! So before I go any further, I would like to thank all of you who contributed, because I now have some pretty good answers to the troubling questions I was asking myself. To thank you properly, I thought I would share some of the results with you. First of all, application performance is indeed important to most of you. In fact, performance is an intrinsic part of the development cycle for a good 40% of you, which is much higher than I had anticipated, I have to admit. (I know, "Have a little faith Laila!") When asked what tool you use to measure and analyse application performance, I found that nearly half of the respondents use logging statements, a third use performance counters, and 70% of respondents use a profiler of some sort (a 3rd party performance profilers, the CLR profiler or the Visual Studio profiler). The importance attributed to logging statements did surprise me a little. I am still not sure why somebody would go to the trouble of manually instrumenting code in order to measure its performance, instead of just using a profiler. I personally find the process of annotating code, calculating times from log files, and relating it all back to your source terrifyingly laborious. Not to mention that you then need to remember to turn it all off later! Even when you have logging in place throughout all your code anyway, you still have a fair amount of potentially error-prone calculation to sift through the results; in addition, you'll only get method-level rather than line-level timings, and you won't get timings from any framework or library methods you don't have source for. To top it all, we all know that bottlenecks are rarely where you would expect them to be, so you could be wasting time looking for a performance problem in the wrong place. On the other hand, profilers do all the work for you: they automatically collect the CPU and wall-clock timings, and present the results from method timing all the way down to individual lines of code. Maybe I'm missing a trick. I would love to know about the types of scenarios where you actively prefer to use logging statements. Finally, while a third of the respondents didn't have a strong opinion about code performance profilers, those who had an opinion thought that they were mainly complex to use and time consuming. Three respondents in particular summarised this perfectly: "sometimes, they are rather complex to use, adding an additional time-sink to the process of trying to resolve the existing problem". "they are simple to use, but the results are hard to understand" "Complex to find the more advanced things, easy to find some low hanging fruit". These results confirmed my suspicions: Profilers are seen to be designed for more advanced users who can use them effectively and make sense of the results. I found yet more interesting information when I started comparing samples of "developers for whom performance is an important part of the dev cycle", with those "to whom performance is only looked at in times of crisis", and "developers to whom performance is not important, as long as the app works". See the three graphs below. Sample of developers to whom performance is an important part of the dev cycle: Sample of developers to whom performance is important only in times of crisis: Sample of developers to whom performance is not important, as long as the app works: As you can see, there is a strong correlation between the usage of a profiler and the importance attributed to performance: indeed, the more important performance is to a development team, the more likely they are to use a profiler. In addition, developers to whom performance is an important part of the dev cycle have a higher tendency to use a much wider range of methods for performance measurement and analysis. And, unsurprisingly, the less important performance is, the less varied the methods of measurement are. So all in all, to come back to my random questions: .NET developers do care about performance. Those who care the most use a wider range of performance measurement methods than those who care less. But overall, logging statements, performance counters and third party performance profilers are the performance measurement methods of choice for most developers. Finally, although most of you find code profilers complex to use, those of you who care the most about performance tend to use profilers more than those of you to whom performance is not so important.

Read the article
Is the Scala 2.8 collections library a case of "the longest suicide note in history" ?

- by oxbow_lakes

First note the inflammatory subject title is a quotation made about the manifesto of a UK political party in the early 1980s. This question is subjective but it is a genuine question, I've made it CW and I'd like some opinions on the matter. Despite whatever my wife and coworkers keep telling me, I don't think I'm an idiot: I have a good degree in mathematics from the University of Oxford and I've been programming commercially for almost 12 years and in Scala for about a year (also commercially). I have just started to look at the Scala collections library re-implementation which is coming in the imminent 2.8 release. Those familiar with the library from 2.7 will notice that the library, from a usage perspective, has changed little. For example... > List("Paris", "London").map(_.length) res0: List[Int] List(5, 6) ...would work in either versions. The library is eminently useable: in fact it's fantastic. However, those previously unfamiliar with Scala and poking around to get a feel for the language now have to make sense of method signatures like: def map[B, That](f: A => B)(implicit bf: CanBuildFrom[Repr, B, That]): That For such simple functionality, this is a daunting signature and one which I find myself struggling to understand. Not that I think Scala was ever likely to be the next Java (or /C/C++/C#) - I don't believe its creators were aiming it at that market - but I think it is/was certainly feasible for Scala to become the next Ruby or Python (i.e. to gain a significant commercial user-base) Is this going to put people off coming to Scala? Is this going to give Scala a bad name in the commercial world as an academic plaything that only dedicated PhD students can understand? Are CTOs and heads of software going to get scared off? Was the library re-design a sensible idea? If you're using Scala commercially, are you worried about this? Are you planning to adopt 2.8 immediately or wait to see what happens? Steve Yegge once attacked Scala (mistakenly in my opinion) for what he saw as its overcomplicated type-system. I worry that someone is going to have a field day spreading fud with this API (similarly to how Josh Bloch scared the JCP out of adding closures to Java). Note - I should be clear that, whilst I believe that Josh Bloch was influential in the rejection of the BGGA closures proposal, I don't ascribe this to anything other than his honestly-held beliefs that the proposal represented a mistake.

Read the article
The Java Specialist: An Interview with Java Champion Heinz Kabutz

- by Janice J. Heiss

Dr. Heinz Kabutz is well known for his Java Specialists’ Newsletter, initiated in November 2000, where he displays his acute grasp of the intricacies of the Java platform for an estimated 70,000 readers; for his work as a consultant; and for his workshops and trainings at his home on the Island of Crete where he has lived since 2006 -- where he is known to curl up on the beach with his laptop to hack away, in between dips in the Mediterranean. Kabutz was born of German parents and raised in Cape Town, South Africa, where he developed a love of programming in junior high school through his explorations on a ZX Spectrum computer. He received a B.S. from the University of Cape Town, and at 25, a Ph.D., both in computer science. He will be leading a two-hour hands-on lab session, HOL6500 – “Finding and Solving Java Deadlocks,” at this year’s JavaOne that will explore what causes deadlocks and how to solve them. Q: Tell us about your JavaOne plans.A: I am arriving on Sunday evening and have just one hands-on-lab to do on Monday morning. This is the first time that a non-Oracle team is doing a HOL at JavaOne under Oracle's stewardship and we are all a bit nervous about how it will turn out. Oracle has been immensely helpful in getting us set up. I have a great team helping me: Kirk Pepperdine, Dario Laverde, Benjamin Evans and Martijn Verburg from jClarity, Nathan Reynolds from Oracle, Henri Tremblay of OCTO Technology and Jeff Genender of Savoir Technologies. Monday will be hard work, but after that, I will hopefully get to network with fellow Java experts, attend interesting sessions and just enjoy San Francisco. Oh, and my kids have already given me a shopping list of things to get, like a GoPro Hero 2 dive housing for shooting those nice videos of Crete. (That's me at the beginning diving down.) Q: What sessions are you attending that we should know about?A: Sometimes the most unusual sessions are the best. I avoid the "big names". They often are spread too thin with all their sessions, which makes it difficult for them to deliver what I would consider deep content. I also avoid entertainers who might be good at presenting but who do not say that much.In 2010, I attended a session by Vladimir Yaroslavskiy where he talked about sorting. Although he struggled to speak English, what he had to say was spectacular. There was hardly anybody in the room, having not heard of Vladimir before. To me that was the highlight of 2010. Funnily enough, he was supposed to speak with Joshua Bloch, but if you remember, Google cancelled. If Bloch has been there, the room would have been packed to capacity.Q: Give us an update on the Java Specialists’ Newsletter.A: The Java Specialists' Newsletter continues being read by an elite audience around the world. The apostrophe in the name is significant. It is a newsletter for Java specialists. When I started it twelve years ago, I was trying to find non-obvious things in Java to write about. Things that would be interesting to an advanced audience.As an April Fool's joke, I told my readers in Issue 44 that subscribing would remain free, but that they would have to pay US$5 to US$7 depending on their geographical location. I received quite a few angry emails from that one. I would have not earned that much from unsubscriptions. Most readers stay for a very long time.After Oracle bought Sun, the Java community held its breath for about two years whilst Oracle was figuring out what to do with Java. For a while, we were quite concerned that there was not much progress shown by Oracle. My newsletter still continued, but it was quite difficult finding new things to write about. We have probably about 70,000 readers, which is quite a small number for a Java publication. However, our readers are the top in the Java industry. So I don't mind having "only" 70000 readers, as long as they are the top 0.7%.Java concurrency is a very important topic that programmers think they should know about, but often neglect to fully understand. I continued writing about that and made some interesting discoveries. For example, in Issue 165, I showed how we can get thread starvation with the ReadWriteLock. This was a bug in Java 5, which was corrected in Java 6, but perhaps a bit too much. Whereas we could get starvation of writers in Java 5, in Java 6 we could now get starvation of readers. All of these interesting findings make their way into my courseware to help companies avoid these pitfalls.Another interesting discovery was how polymorphism works in the Server HotSpot compiler in Issue 157 and Issue 158. HotSpot can inline methods from interfaces that have only one implementation class in the JVM. When a new subclass is instantiated and called for the first time, the JVM will undo the previous optimization and re-optimize differently.Here is a little memory puzzle for your readers: public class JavaMemoryPuzzle { private final int dataSize = (int) (Runtime.getRuntime().maxMemory() * 0.6); public void f() { { byte[] data = new byte[dataSize]; } byte[] data2 = new byte[dataSize]; } public static void main(String[] args) { JavaMemoryPuzzle jmp = new JavaMemoryPuzzle(); jmp.f(); }}When you run this you will always get an OutOfMemoryError, even though the local variable data is no longer visible outside of the code block.So here comes the puzzle, that I'd like you to ponder a bit. If you very politely ask the VM to release memory, then you don't get an OutOfMemoryError: public class JavaMemoryPuzzlePolite { private final int dataSize = (int) (Runtime.getRuntime().maxMemory() * 0.6); public void f() { { byte[] data = new byte[dataSize]; } for(int i=0; i<10; i++) { System.out.println("Please be so kind and release memory"); } byte[] data2 = new byte[dataSize]; } public static void main(String[] args) { JavaMemoryPuzzlePolite jmp = new JavaMemoryPuzzlePolite(); jmp.f(); System.out.println("No OutOfMemoryError"); }}Why does this work? When I published this in my newsletter, I received over 400 emails from excited readers around the world, most of whom sent me the wrong explanation. After the 300th wrong answer, my replies became unfortunately a bit curt. Have a look at Issue 174 for a detailed explanation, but before you do, put on your thinking caps and try to figure it out yourself. Q: What do you think Java developers should know that they currently do not know?A: They should definitely get to know more about concurrency. It is a tough subject that most programmers try to avoid. Unfortunately we do come in contact with it. And when we do, we need to know how to protect ourselves and how to solve tricky system errors.Knowing your IDE is also useful. Most IDEs have a ton of shortcuts, which can make you a lot more productive in moving code around. Another thing that is useful is being able to read GC logs. Kirk Pepperdine has a great talk at JavaOne that I can recommend if you want to learn more. It's this: CON5405 – “Are Your Garbage Collection Logs Speaking to You?” Q: What are you looking forward to in Java 8?A: I'm quite excited about lambdas, though I must confess that I have not studied them in detail yet. Maurice Naftalin's Lambda FAQ is quite a good start to document what you can do with them. I'm looking forward to finding all the interesting bugs that we will now get due to lambdas obscuring what is really going on underneath, just like we had with generics.I am quite impressed with what the team at Oracle did with OpenJDK's performance. A lot of the benchmarks now run faster.Hopefully Java 8 will come with JSR 310, the Date and Time API. It still boggles my mind that such an important API has been left out in the cold for so long.What I am not looking forward to is losing perm space. Even though some systems run out of perm space, at least the problem is contained and they usually manage to work around it. In most cases, this is due to a memory leak in that region of memory. Once they bundle perm space with the old generation, I predict that memory leaks in perm space will be harder to find. More contracts for us, but also more pain for our customers.

Read the article
The Java Specialist: An Interview with Java Champion Heinz Kabutz

- by Janice J. Heiss

Dr. Heinz Kabutz is well known for his Java Specialists’ Newsletter, initiated in November 2000, where he displays his acute grasp of the intricacies of the Java platform for an estimated 70,000 readers; for his work as a consultant; and for his workshops and trainings at his home on the Island of Crete where he has lived since 2006 -- where he is known to curl up on the beach with his laptop to hack away, in between dips in the Mediterranean. Kabutz was born of German parents and raised in Cape Town, South Africa, where he developed a love of programming in junior high school through his explorations on a ZX Spectrum computer. He received a B.S. from the University of Cape Town, and at 25, a Ph.D., both in computer science. He will be leading a two-hour hands-on lab session, HOL6500 – “Finding and Solving Java Deadlocks,” at this year’s JavaOne that will explore what causes deadlocks and how to solve them. Q: Tell us about your JavaOne plans.A: I am arriving on Sunday evening and have just one hands-on-lab to do on Monday morning. This is the first time that a non-Oracle team is doing a HOL at JavaOne under Oracle's stewardship and we are all a bit nervous about how it will turn out. Oracle has been immensely helpful in getting us set up. I have a great team helping me: Kirk Pepperdine, Dario Laverde, Benjamin Evans and Martijn Verburg from jClarity, Nathan Reynolds from Oracle, Henri Tremblay of OCTO Technology and Jeff Genender of Savoir Technologies. Monday will be hard work, but after that, I will hopefully get to network with fellow Java experts, attend interesting sessions and just enjoy San Francisco. Oh, and my kids have already given me a shopping list of things to get, like a GoPro Hero 2 dive housing for shooting those nice videos of Crete. (That's me at the beginning diving down.) Q: What sessions are you attending that we should know about?A: Sometimes the most unusual sessions are the best. I avoid the "big names". They often are spread too thin with all their sessions, which makes it difficult for them to deliver what I would consider deep content. I also avoid entertainers who might be good at presenting but who do not say that much.In 2010, I attended a session by Vladimir Yaroslavskiy where he talked about sorting. Although he struggled to speak English, what he had to say was spectacular. There was hardly anybody in the room, having not heard of Vladimir before. To me that was the highlight of 2010. Funnily enough, he was supposed to speak with Joshua Bloch, but if you remember, Google cancelled. If Bloch has been there, the room would have been packed to capacity.Q: Give us an update on the Java Specialists’ Newsletter.A: The Java Specialists' Newsletter continues being read by an elite audience around the world. The apostrophe in the name is significant. It is a newsletter for Java specialists. When I started it twelve years ago, I was trying to find non-obvious things in Java to write about. Things that would be interesting to an advanced audience.As an April Fool's joke, I told my readers in Issue 44 that subscribing would remain free, but that they would have to pay US$5 to US$7 depending on their geographical location. I received quite a few angry emails from that one. I would have not earned that much from unsubscriptions. Most readers stay for a very long time.After Oracle bought Sun, the Java community held its breath for about two years whilst Oracle was figuring out what to do with Java. For a while, we were quite concerned that there was not much progress shown by Oracle. My newsletter still continued, but it was quite difficult finding new things to write about. We have probably about 70,000 readers, which is quite a small number for a Java publication. However, our readers are the top in the Java industry. So I don't mind having "only" 70000 readers, as long as they are the top 0.7%.Java concurrency is a very important topic that programmers think they should know about, but often neglect to fully understand. I continued writing about that and made some interesting discoveries. For example, in Issue 165, I showed how we can get thread starvation with the ReadWriteLock. This was a bug in Java 5, which was corrected in Java 6, but perhaps a bit too much. Whereas we could get starvation of writers in Java 5, in Java 6 we could now get starvation of readers. All of these interesting findings make their way into my courseware to help companies avoid these pitfalls.Another interesting discovery was how polymorphism works in the Server HotSpot compiler in Issue 157 and Issue 158. HotSpot can inline methods from interfaces that have only one implementation class in the JVM. When a new subclass is instantiated and called for the first time, the JVM will undo the previous optimization and re-optimize differently.Here is a little memory puzzle for your readers: public class JavaMemoryPuzzle { private final int dataSize = (int) (Runtime.getRuntime().maxMemory() * 0.6); public void f() { { byte[] data = new byte[dataSize]; } byte[] data2 = new byte[dataSize]; } public static void main(String[] args) { JavaMemoryPuzzle jmp = new JavaMemoryPuzzle(); jmp.f(); }}When you run this you will always get an OutOfMemoryError, even though the local variable data is no longer visible outside of the code block.So here comes the puzzle, that I'd like you to ponder a bit. If you very politely ask the VM to release memory, then you don't get an OutOfMemoryError: public class JavaMemoryPuzzlePolite { private final int dataSize = (int) (Runtime.getRuntime().maxMemory() * 0.6); public void f() { { byte[] data = new byte[dataSize]; } for(int i=0; i<10; i++) { System.out.println("Please be so kind and release memory"); } byte[] data2 = new byte[dataSize]; } public static void main(String[] args) { JavaMemoryPuzzlePolite jmp = new JavaMemoryPuzzlePolite(); jmp.f(); System.out.println("No OutOfMemoryError"); }}Why does this work? When I published this in my newsletter, I received over 400 emails from excited readers around the world, most of whom sent me the wrong explanation. After the 300th wrong answer, my replies became unfortunately a bit curt. Have a look at Issue 174 for a detailed explanation, but before you do, put on your thinking caps and try to figure it out yourself. Q: What do you think Java developers should know that they currently do not know?A: They should definitely get to know more about concurrency. It is a tough subject that most programmers try to avoid. Unfortunately we do come in contact with it. And when we do, we need to know how to protect ourselves and how to solve tricky system errors.Knowing your IDE is also useful. Most IDEs have a ton of shortcuts, which can make you a lot more productive in moving code around. Another thing that is useful is being able to read GC logs. Kirk Pepperdine has a great talk at JavaOne that I can recommend if you want to learn more. It's this: CON5405 – “Are Your Garbage Collection Logs Speaking to You?” Q: What are you looking forward to in Java 8?A: I'm quite excited about lambdas, though I must confess that I have not studied them in detail yet. Maurice Naftalin's Lambda FAQ is quite a good start to document what you can do with them. I'm looking forward to finding all the interesting bugs that we will now get due to lambdas obscuring what is really going on underneath, just like we had with generics.I am quite impressed with what the team at Oracle did with OpenJDK's performance. A lot of the benchmarks now run faster.Hopefully Java 8 will come with JSR 310, the Date and Time API. It still boggles my mind that such an important API has been left out in the cold for so long.What I am not looking forward to is losing perm space. Even though some systems run out of perm space, at least the problem is contained and they usually manage to work around it. In most cases, this is due to a memory leak in that region of memory. Once they bundle perm space with the old generation, I predict that memory leaks in perm space will be harder to find. More contracts for us, but also more pain for our customers. Originally published on blogs.oracle.com/javaone.

Read the article
NLP with greatly contrained input and abilities

- by Mike F

Hat in hand here. I'm a seasoned developer and I would be grateful for a bit of help. I don't have time to read or digest long intricate discussions on theoretical concepts around NLP (or go get my PHD). That said, I have read a few and it's a damn interesting field. The problem is I need real world solutions, for real world products, in real world time frames. The problem I'm having is right now I'm not sure what the right questions are to ask to get started implementing. I believe this is mostly related to vocabulary. I'll read somewhere, a blog post, a forum post, a whitepaper, and it says, I'm doing flooping with the blargy blarg method, and I go google flooping and blargy blarg, and I get references to more obscurity. It seemingly never ends. So, my question is multiphased. First, more generally, how do I become passingly educated on this quickly? Just in time educated. I only need to know what I need to know to take the next step. I've spent 20 years writing code. Explain quick. I'll get it. (I mean provide a reference to something that explains quickly of course). I'm happy to read the right book, but I don't want to read a book where I read the chapter introduction that explains what floopy floop is and then skip over the rest of the chapter with examples of floopy flooping (because now I get what it is). I also don't want to read a book that goes into too much detail with theoretical underpinnings or history. For example, the Jurafsky book seems like way more than I need: http://www.amazon.com/gp/product/0131873210. But I will read it if this is the right book to read. (It's also dang expensive!) I need the root node of the expedited learning tree here, if you will. Point me in the right direction and I'll be quite grateful. I'm expecting quite a lot of firehose drinking - I just need the right firehose. Second, what I need to do is take a single sentence, with a very reduced vocabulary, and get a grammar tree (sorry if this is the wrong terminology) that I can do something with. I know I could easily write this command line input style in c in a more conventional manner, but I need it to be way better than that. But I don't need a chatterbot either. What I'm doing needs to live in a constrained environment. I can't use Python (unfortunately). I can't ship with gigabytes of corpuses. I need any libraries I use to be in c/c++. If I have to write this myself, I will. Hopefully, it will be achievable considering the reduced problem set. Maybe, probably, that's just naive. If so, let me know. :-) Thanks in advance - Mike

Read the article
Google App Engine + Form Validation

- by Iwona

Hi, I would like to do google app engine form validation but I dont know how to do it? I tried like this: from google.appengine.ext.db import djangoforms from django import newforms as forms class SurveyForm(forms.Form): occupations_choices = ( ('1', ""), ('2', "Undergraduate student"), ('3', "Postgraduate student (MSc)"), ('4', "Postgraduate student (PhD)"), ('5', "Lab assistant"), ('6', "Technician"), ('7', "Lecturer"), ('8', "Other" ) ) howreach_choices = ( ('1', ""), ('2', "Typed the URL directly"), ('3', "Site is bookmarked"), ('4', "A search engine"), ('5', "A link from another site"), ('6', "From a book"), ('7', "Other") ) boxes_choices = ( ("des", "Website Design"), ("svr", "Web Server Administration"), ("com", "Electronic Commerce"), ("mkt", "Web Marketing/Advertising"), ("edu", "Web-Related Education") ) name = forms.CharField(label='Name', max_length=100, required=True) email = forms.EmailField(label='Your Email Address:') occupations = forms.ChoiceField(choices=occupations_choices, label='What is your occupation?') howreach = forms.ChoiceField(choices=howreach_choices, label='How did you reach this site?') # radio buttons 1-5 rating = forms.ChoiceField(choices=range(1,6), label='What is your occupation?', widget=forms.RadioSelect) boxes = forms.ChoiceField(choices=boxes_choices, label='Are you involved in any of the following? (check all that apply):', widget=forms.CheckboxInput) comment = forms.CharField(widget=forms.Textarea, required=False) And I wanted to display it like this: template_values = { 'url' : url, 'url_linktext' : url_linktext, 'userName' : userName, 'item1' : SurveyForm() } And I have this error message: Traceback (most recent call last): File "C:\Program Files\Google\google_appengine\google\appengine\ext\webapp_init_.py", line 515, in call handler.get(*groups) File "C:\Program Files\Google\google_appengine\demos\b00213576\main.py", line 144, in get self.response.out.write(template.render(path, template_values)) File "C:\Program Files\Google\google_appengine\google\appengine\ext\webapp\template.py", line 143, in render return t.render(Context(template_dict)) File "C:\Program Files\Google\google_appengine\google\appengine\ext\webapp\template.py", line 183, in wrap_render return orig_render(context) File "C:\Program Files\Google\google_appengine\lib\django\django\template_init_.py", line 168, in render return self.nodelist.render(context) File "C:\Program Files\Google\google_appengine\lib\django\django\template_init_.py", line 705, in render bits.append(self.render_node(node, context)) File "C:\Program Files\Google\google_appengine\lib\django\django\template_init_.py", line 718, in render_node return(node.render(context)) File "C:\Program Files\Google\google_appengine\lib\django\django\template\defaulttags.py", line 209, in render return self.nodelist_true.render(context) File "C:\Program Files\Google\google_appengine\lib\django\django\template_init_.py", line 705, in render bits.append(self.render_node(node, context)) File "C:\Program Files\Google\google_appengine\lib\django\django\template_init_.py", line 718, in render_node return(node.render(context)) File "C:\Program Files\Google\google_appengine\lib\django\django\template_init_.py", line 768, in render return self.encode_output(output) File "C:\Program Files\Google\google_appengine\lib\django\django\template_init_.py", line 757, in encode_output return str(output) File "C:\Program Files\Google\google_appengine\lib\django\django\newforms\util.py", line 26, in str return self.unicode().encode(settings.DEFAULT_CHARSET) File "C:\Program Files\Google\google_appengine\lib\django\django\newforms\forms.py", line 73, in unicode return self.as_table() File "C:\Program Files\Google\google_appengine\lib\django\django\newforms\forms.py", line 144, in as_table return self._html_output(u'%(label)s%(errors)s%(field)s%(help_text)s', u'%s', '', u'%s', False) File "C:\Program Files\Google\google_appengine\lib\django\django\newforms\forms.py", line 129, in _html_output output.append(normal_row % {'errors': bf_errors, 'label': label, 'field': unicode(bf), 'help_text': help_text}) File "C:\Program Files\Google\google_appengine\lib\django\django\newforms\forms.py", line 232, in unicode value = value.str() File "C:\Program Files\Google\google_appengine\lib\django\django\newforms\util.py", line 26, in str return self.unicode().encode(settings.DEFAULT_CHARSET) File "C:\Program Files\Google\google_appengine\lib\django\django\newforms\widgets.py", line 246, in unicode return u'\n%s\n' % u'\n'.join([u'%s' % w for w in self]) File "C:\Program Files\Google\google_appengine\lib\django\django\newforms\widgets.py", line 238, in iter yield RadioInput(self.name, self.value, self.attrs.copy(), choice, i) File "C:\Program Files\Google\google_appengine\lib\django\django\newforms\widgets.py", line 212, in init self.choice_value = smart_unicode(choice[0]) TypeError: 'int' object is unsubscriptable Do You have any idea how I can do this validation in different case? I have tried to do it using this kind of: class ItemUserAnswer(djangoforms.ModelForm): class Meta: model = UserAnswer But I dont know how to add extra labels to this form and it is displayed in one line. Do You have any suggestions? Thanks a lot as it making me crazy why it is still not working:/

Read the article
How to design a high-level application protocol for metadata syncing between devices and server?

- by Jaanus

I am looking for guidance on how to best think about designing a high-level application protocol to sync metadata between end-user devices and a server. My goal: the user can interact with the application data on any device, or on the web. The purpose of this protocol is to communicate changes made on one endpoint to other endpoints through the server, and ensure all devices maintain a consistent picture of the application data. If user makes changes on one device or on the web, the protocol will push data to the central repository, from where other devices can pull it. Some other design thoughts: I call it "metadata syncing" because the payloads will be quite small, in the form of object IDs and small metadata about those ID-s. When client endpoints retrieve new metadata over this protocol, they will fetch actual object data from an external source based on this metadata. Fetching the "real" object data is out of scope, I'm only talking about metadata syncing here. Using HTTP for transport and JSON for payload container. The question is basically about how to best design the JSON payload schema. I want this to be easy to implement and maintain on the web and across desktop and mobile devices. The best approach feels to be simple timer- or event-based HTTP request/response without any persistent channels. Also, you should not have a PhD to read it, and I want my spec to fit on 2 pages, not 200. Authentication and security are out of scope for this question: assume that the requests are secure and authenticated. The goal is eventual consistency of data on devices, it is not entirely realtime. For example, user can make changes on one device while being offline. When going online again, user would perform "sync" operation to push local changes and retrieve remote changes. Having said that, the protocol should support both of these modes of operation: Starting from scratch on a device, should be able to pull the whole metadata picture "sync as you go". When looking at the data on two devices side by side and making changes, should be easy to push those changes as short individual messages which the other device can receive near-realtime (subject to when it decides to contact server for sync). As a concrete example, you can think of Dropbox (it is not what I'm working on, but it helps to understand the model): on a range of devices, the user can manage a files and folders—move them around, create new ones, remove old ones etc. And in my context the "metadata" would be the file and folder structure, but not the actual file contents. And metadata fields would be something like file/folder name and time of modification (all devices should see the same time of modification). Another example is IMAP. I have not read the protocol, but my goals (minus actual message bodies) are the same. Feels like there are two grand approaches how this is done: transactional messages. Each change in the system is expressed as delta and endpoints communicate with those deltas. Example: DVCS changesets. REST: communicating the object graph as a whole or in part, without worrying so much about the individual atomic changes. What I would like in the answers: Is there anything important I left out above? Constraints, goals? What is some good background reading on this? (I realize this is what many computer science courses talk about at great length and detail... I am hoping to short-circuit it by looking at some crash course or nuggets.) What are some good examples of such protocols that I could model after, or even use out of box? (I mention Dropbox and IMAP above... I should probably read the IMAP RFC.)

Read the article
Pain Comes Instantly

- by user701213

When I look back at recent blog entries – many of which are not all that current (more on where my available writing time is going later) – I am struck by how many of them focus on public policy or legislative issues instead of, say, the latest nefarious cyberattack or exploit (or everyone’s favorite new pastime: coining terms for the Coming Cyberpocalypse: “digital Pearl Harbor” is so 1941). Speaking of which, I personally hope evil hackers from Malefactoria will someday hack into my bathroom scale – which in a future time will be connected to the Internet because, gosh, wouldn’t it be great to have absolutely everything in your life Internet-enabled? – and recalibrate it so I’m 10 pounds thinner. The horror. In part, my focus on public policy is due to an admitted limitation of my skill set. I enjoy reading technical articles about exploits and cybersecurity trends, but writing a blog entry on those topics would take more research than I have time for and, quite honestly, doesn’t play to my strengths. The first rule of writing is “write what you know.” The bigger contributing factor to my recent paucity of blog entries is that more and more of my waking hours are spent engaging in “thrust and parry” activity involving emerging regulations of some sort or other. I’ve opined in earlier blogs about what constitutes good and reasonable public policy so nobody can accuse me of being reflexively anti-regulation. That said, you have so many cycles in the day, and most of us would rather spend it slaying actual dragons than participating in focus groups on whether dragons are really a problem, whether lassoing them (with organic, sustainable and recyclable lassos) is preferable to slaying them – after all, dragons are people, too - and whether we need lasso compliance auditors to make sure lassos are being used correctly and humanely. (A point that seems to evade many rule makers: slaying dragons actually accomplishes something, whereas talking about “approved dragon slaying procedures and requirements” wastes the time of those who are competent to dispatch actual dragons and who were doing so very well without the input of “dragon-slaying theorists.”) Unfortunately for so many of us who would just get on with doing our day jobs, cybersecurity is rapidly devolving into the “focus groups on dragon dispatching” realm, which actual dragons slayers have little choice but to participate in. The general trend in cybersecurity is that powers-that-be – which encompasses groups other than just legislators – are often increasingly concerned and therefore feel they need to Do Something About Cybersecurity. Many seem to believe that if only we had the right amount of regulation and oversight, there would be no data breaches: a breach simply must mean Someone Is At Fault and Needs Supervision. (Leaving aside the fact that we have lots of home invasions despite a) guard dogs b) liberal carry permits c) alarm systems d) etc.) Also note that many well-managed and security-aware organizations, like the US Department of Defense, still get hacked. More specifically, many powers-that-be feel they must direct industry in a multiplicity of ways, up to and including how we actually build and deploy information technology systems. The more prescriptive the requirement, the more regulators or overseers a) can be seen to be doing something b) feel as if they are doing something regardless of whether they are actually doing something useful or cost effective. Note: an unfortunate concomitant of Doing Something is that often the cure is worse than the ailment. That is, doing what overseers want creates unfortunate byproducts that they either didn’t foresee or worse, don’t care about. After all, the logic goes, we Did Something. Prescriptive practice in the IT industry is problematic for a number of reasons. For a start, prescriptive guidance is really only appropriate if: • It is cost effective• It is “current” (meaning, the guidance doesn’t require the use of the technical equivalent of buggy whips long after horse-drawn transportation has become passé)*• It is practical (that is, pragmatic, proven and effective in the real world, not theoretical and unproven)• It solves the right problem With the above in mind, heading up the list of “you must be joking” regulations are recent disturbing developments in the Payment Card Industry (PCI) world. I’d like to give PCI kahunas the benefit of the doubt about their intentions, except that efforts by Oracle among others to make them aware of “unfortunate side effects of your requirements” – which is as tactful I can be for reasons that I believe will become obvious below - have gone, to-date, unanswered and more importantly, unchanged. A little background on PCI before I get too wound up. In 2008, the Payment Card Industry (PCI) Security Standards Council (SSC) introduced the Payment Application Data Security Standard (PA-DSS). That standard requires vendors of payment applications to ensure that their products implement specific requirements and undergo security assessment procedures. In order to have an application listed as a Validated Payment Application (VPA) and available for use by merchants, software vendors are required to execute the PCI Payment Application Vendor Release Agreement (VRA). (Are you still with me through all the acronyms?) Beginning in August 2010, the VRA imposed new obligations on vendors that are extraordinary and extraordinarily bad, short-sighted and unworkable. Specifically, PCI requires vendors to disclose (dare we say “tell all?”) to PCI any known security vulnerabilities and associated security breaches involving VPAs. ASAP. Think about the impact of that. PCI is asking a vendor to disclose to them: • Specific details of security vulnerabilities • Including exploit information or technical details of the vulnerability • Whether or not there is any mitigation available (as in a patch) PCI, in turn, has the right to blab about any and all of the above – specifically, to distribute all the gory details of what is disclosed - to the PCI SSC, qualified security assessors (QSAs), and any affiliate or agent or adviser of those entities, who are in turn permitted to share it with their respective affiliates, agents, employees, contractors, merchants, processors, service providers and other business partners. This assorted crew can’t be more than, oh, hundreds of thousands of entities. Does anybody believe that several hundred thousand people can keep a secret? Or that several hundred thousand people are all equally trustworthy? Or that not one of the people getting all that information would blab vulnerability details to a bad guy, even by accident? Or be a bad guy who uses the information to break into systems? (Wait, was that the Easter Bunny that just hopped by? Bringing world peace, no doubt.) Sarcasm aside, common sense tells us that telling lots of people a secret is guaranteed to “unsecret” the secret. Notably, being provided details of a vulnerability (without a patch) is of little or no use to companies running the affected application. Few users have the technological sophistication to create a workaround, and even if they do, most workarounds break some other functionality in the application or surrounding environment. Also, given the differences among corporate implementations of any application, it is highly unlikely that a single workaround is going to work for all corporate users. So until a patch is developed by the vendor, users remain at risk of exploit: even more so if the details of vulnerability have been widely shared. Sharing that information widely before a patch is available therefore does not help users, and instead helps only those wanting to exploit known security bugs. There’s a shocker for you. Furthermore, we already know that insider information about security vulnerabilities inevitably leaks, which is why most vendors closely hold such information and limit dissemination until a patch is available (and frequently limit dissemination of technical details even with the release of a patch). That’s the industry norm, not that PCI seems to realize or acknowledge that. Why would anybody release a bunch of highly technical exploit information to a cast of thousands, whose only “vetting” is that they are members of a PCI consortium? Oracle has had personal experience with this problem, which is one reason why information on security vulnerabilities at Oracle is “need to know” (we use our own row level access control to limit access to security bugs in our bug database, and thus less than 1% of development has access to this information), and we don’t provide some customers with more information than others or with vulnerability information and/or patches earlier than others. Failure to remember “insider information always leaks” creates problems in the general case, and has created problems for us specifically. A number of years ago, one of the UK intelligence agencies had information about a non-public security vulnerability in an Oracle product that they circulated among other UK and Commonwealth defense and intelligence entities. Nobody, it should be pointed out, bothered to report the problem to Oracle, even though only Oracle could produce a patch. The vulnerability was finally reported to Oracle by (drum roll) a US-based commercial company, to whom the information had leaked. (Note: every time I tell this story, the MI-whatever agency that created the problem gets a bit shirty with us. I know they meant well and have improved their vulnerability handling/sharing processes but, dudes, next time you find an Oracle vulnerability, try reporting it to us first before blabbing to lots of people who can’t actually fix the problem. Thank you!) Getting back to PCI: clearly, these new disclosure obligations increase the risk of exploitation of a vulnerability in a VPA and thus, of misappropriation of payment card data and customer information that a VPA processes, stores or transmits. It stands to reason that VRA’s current requirement for the widespread distribution of security vulnerability exploit details -- at any time, but particularly before a vendor can issue a patch or a workaround -- is very poor public policy. It effectively publicizes information of great value to potential attackers while not providing compensating benefits - actually, any benefits - to payment card merchants or consumers. In fact, it magnifies the risk to payment card merchants and consumers. The risk is most prominent in the time before a patch has been released, since customers often have little option but to continue using an application or system despite the risks. However, the risk is not limited to the time before a patch is issued: customers often need days, or weeks, to apply patches to systems, based upon the complexity of the issue and dependence on surrounding programs. Rather than decreasing the available window of exploit, this requirement increases the available window of exploit, both as to time available to exploit a vulnerability and the ease with which it can be exploited. Also, why would hackers focus on finding new vulnerabilities to exploit if they can get “EZHack” handed to them in such a manner: a) a vulnerability b) in a payment application c) with exploit code: the “Hacking Trifecta!“ It’s fair to say that this is probably the exact opposite of what PCI – or any of us – would want. Established industry practice concerning vulnerability handling avoids the risks created by the VRA’s vulnerability disclosure requirements. Specifically, the norm is not to release information about a security bug until the associated patch (or a pretty darn good workaround) has been issued. Once a patch is available, the notice to the user community is a high-level communication discussing the product at issue, the level of risk associated with the vulnerability, and how to apply the patch. The notices do not include either the specific customers affected by the vulnerability or forensic reports with maps of the exploit (both of which are required by the current VRA). In this way, customers have the tools they need to prioritize patching and to help prevent an attack, and the information released does not increase the risk of exploit. Furthermore, many vendors already use industry standards for vulnerability description: Common Vulnerability Enumeration (CVE) and Common Vulnerability Scoring System (CVSS). CVE helps ensure that customers know which particular issues a patch addresses and CVSS helps customers determine how severe a vulnerability is on a relative scale. Industry already provides the tools customers need to know what the patch contains and how bad the problem is that the patch remediates. So, what’s a poor vendor to do? Oracle is reaching out to other vendors subject to PCI and attempting to enlist then in a broad effort to engage PCI in rethinking (that is, eradicating) these requirements. I would therefore urge all who care about this issue, but especially those in the vendor community whose applications are subject to PCI and who may not have know they were being asked to tell-all to PCI and put their customers at risk, to do one of the following: • Contact PCI with your concerns• Contact Oracle (we are looking for vendors to sign our statement of concern)• And make sure you tell your customers that you have to rat them out to PCI if there is a breach involving the payment application I like to be charitable and say “PCI meant well” but in as important a public policy issue as what you disclose about vulnerabilities, to whom and when, meaning well isn’t enough. We need to do well. PCI, as regards this particular issue, has not done well, and has compounded the error by thus far being nonresponsive to those of us who have labored mightily to try to explain why they might want to rethink telling the entire planet about security problems with no solutions. By Way of Explanation… Non-related to PCI whatsoever, and the explanation for why I have not been blogging a lot recently, I have been working on Other Writing Venues with my sister Diane (who has also worked in the tech sector, inflicting upgrades on unsuspecting and largely ungrateful end users). I am pleased to note that we have recently (self-)published the first in the Miss Information Technology Murder Mystery series, Outsourcing Murder. The genre might best be described as “chick lit meets geek scene.” Our sisterly nom de plume is Maddi Davidson and (shameless plug follows): you can order the paper version of the book on Amazon, or the Kindle or Nook versions on www.amazon.com or www.bn.com, respectively. From our book jacket: Emma Jones, a 20-something IT consultant, is working on an outsourcing project at Tahiti Tacos, a restaurant chain offering Polynexican cuisine: refried poi, anyone? Emma despises her boss Padmanabh, a brilliant but arrogant partner in GD Consulting. When Emma discovers His-Royal-Padness’s body (verdict: death by cricket bat), she becomes a suspect.With her overprotective family and her best friend Stacey providing endless support and advice, Emma stumbles her way through an investigation of Padmanabh’s murder, bolstered by fusion food feeding frenzies, endless cups of frou-frou coffee and serious surfing sessions. While Stacey knows a PI who owes her a favor, landlady Magda urges Emma to tart up her underwear drawer before the next cute cop with a search warrant arrives. Emma’s mother offers to fix her up with a PhD student at Berkeley and showers her with self-defense gizmos while her old lover Keoni beckons from Hawai’i. And everyone, even Shaun the barista, knows a good lawyer. Book 2, Denial of Service, is coming out this summer. * Given the rate of change in technology, today’s “thou shalts” are easily next year’s “buggy whip guidance.”

Read the article
CodePlex Daily Summary for Tuesday, November 20, 2012

CodePlex Daily Summary for Tuesday, November 20, 2012Popular ReleasesTwitter Bootstrap for SharePoint: Twitter Bootstrap Master Page for SharePoint 2010: Twitter Bootstrap Master Page for SharePoint 2010 Version 1.0 Beta by Liam Powell @LiamPowell87 for more information visit my blog http://www.LiamPowell.com .WSP file can be deployed to SharePoint using powershell .Rar file contains sourceJson.NET: Json.NET 4.5 Release 11: New feature - Added ITraceWriter, MemoryTraceWriter, DiagnosticsTraceWriter New feature - Added StringEscapeHandling with options to escape HTML and non-ASCII characters New feature - Added non-generic JToken.ToObject methods New feature - Deserialize ISet<T> properties as HashSet<T> New feature - Added implicit conversions for Uri, TimeSpan, Guid New feature - Missing byte, char, Guid, TimeSpan and Uri explicit conversion operators added to JToken New feature - Special case...HigLabo: HigLabo_20121119: HigLabo_2012111 --HigLabo.Mail-- Modify bug fix of ExecuteAppend method. Add ExecuteXList method to ImapClient class. --HigLabo.Net.WindowsLive-- Add AsyncCall to WindowsLiveClient class.mojoPortal: 2.3.9.4: see release notes on mojoportal.com http://www.mojoportal.com/mojoportal-2394-released Note that we have separate deployment packages for .NET 3.5 and .NET 4.0, but we recommend you to use .NET 4, we will probably drop support for .NET 3.5 once .NET 4.5 is available The deployment package downloads on this page are pre-compiled and ready for production deployment, they contain no C# source code and are not intended for use in Visual Studio. To download the source code see getting the lates...VidCoder: 1.4.6 Beta: Brought back the x264 advanced options panel due to popular demand. Thank you for all the feedback. x264 Preset/Profile/Tune/Level has been moved back to the Video tab, along with a copy of the "extra options" string. Added Fast Decode and Zero Latency checkboxes to support multiple Tunes. Added cropping option "None". Audio bitrates that are incompatible with the encoder (such as MP3 > 320 kbps) are no longer preset on the list. Fixed crash on opening VidCoder after de-selecting "re...DotNetNuke® Store: 03.01.07: What's New in this release? IMPORTANT: this version requires DotNetNuke 04.06.02 or higher! DO NOT REPORT BUGS HERE IN THE ISSUE TRACKER, INSTEAD USE THE DotNetNuke Store Forum! Bugs corrected: - Replaced some hard coded references to the default address provider classes by the corresponding interfaces to allow the creation of another address provider with a different name. New Features: - Added the 'pickup' delivery option at checkout. - Added the 'no delivery' option in the Store Admin ...Bundle Transformer - a modular extension for ASP.NET Web Optimization Framework: Bundle Transformer 1.6.10: Version: 1.6.10 Published: 11/18/2012 Now almost all of the Bundle Transformer's assemblies is signed (except BundleTransformer.Yui.dll); In BundleTransformer.SassAndScss the SassAndCoffee.Ruby library was replaced by my own implementation of the Sass- and SCSS-compiler (based on code of the SassAndCoffee.Ruby library version 2.0.2.0); In BundleTransformer.CoffeeScript added support of CoffeeScript version 1.4.0-3; In BundleTransformer.TypeScript added support of TypeScript version 0....ExtJS based ASP.NET 2.0 Controls: FineUI v3.2.0: +2012-11-18 v3.2.0 -?????????????????SelectedValueArray????????（◇?◆:）。 -???????????????????RecoverPropertiesFromJObject????（〓?〓、????、??、Vian_Pan）。 -????????????，?????????????，???SelectedValueArray???????（sam.chang）。 -??Alert.Show???????????（swtseaman）。 -???????????????，??Icon??IconUrl????（swtseaman）。 -?????????TimePicker（??）。 -?????????，??/res.axd?css=blue.css&v=1。 -????????，?????????????，???????。 -????MenuCheckBox（???????）。 -?RadioButton??AutoPostBack??。 -???????FCKEditor?????????...BugNET Issue Tracker: BugNET 1.2: Please read our release notes for BugNET 1.2: http://blog.bugnetproject.com/bugnet-1-2-has-been-released Please do not post questions as reviews. Questions should be posted in the Discussions tab, where they will usually get promptly responded to. If you post a question as a review, you will pollute the rating, and you won't get an answer.Paint.NET PSD Plugin: 2.2.0: Changes: Layer group visibility is now applied to all layers within the group. This greatly improves the visual fidelity of complex PSD files that have hidden layer groups. Layer group names are prefixed so that users can get an indication of the layer group hierarchy. (Paint.NET has a flat list of layers, so the hierarchy is flattened out on load.) The progress bar now reports status when saving PSD files, instead of showing an indeterminate rolling bar. Performance improvement of 1...CRM 2011 Visual Ribbon Editor: Visual Ribbon Editor (1.3.1116.7): [IMPROVED] Detailed error message descriptions for FaultException [FIX] Fixed bug in rule CrmOfflineAccessStateRule which had incorrect State attribute name [FIX] Fixed bug in rule EntityPropertyRule which was missing PropertyValue attribute [FIX] Current connection information was not displayed in status bar while refreshing list of entitiesSuper Metroid Randomizer: Super Metroid Randomizer v5: v5 -Added command line functionality for automation purposes. -Implented Krankdud's change to randomize the Etecoon's item. NOTE: this version will not accept seeds from a previous version. The seed format has changed by necessity. v4 -Started putting version numbers at the top of the form. -Added a warning when suitless Maridia is required in a parsed seed. v3 -Changed seed to only generate filename-legal characters. Using old seeds will still work exactly the same. -Files can now be saved...Caliburn Micro: WPF, Silverlight, WP7 and WinRT/Metro made easy.: Caliburn.Micro v1.4: Changes This version includes many bug fixes across all platforms, improvements to nuget support and...the biggest news of all...full support for both WinRT and WP8. Download Contents Debug and Release Assemblies Samples Readme.txt License.txt Packages Available on Nuget Caliburn.Micro – The full framework compiled into an assembly. Caliburn.Micro.Start - Includes Caliburn.Micro plus a starting bootstrapper, view model and view. Caliburn.Micro.Container – The Caliburn.Micro invers...DirectX Tool Kit: November 15, 2012: November 15, 2012 Added support for WIC2 when available on Windows 8 and Windows 7 with KB 2670838 Cleaned up warning level 4 warningsDotNetNuke® Community Edition CMS: 06.02.05: Major Highlights Updated the system so that it supports nested folders in the App_Code folder Updated the Global Error Handling so that when errors within the global.asax handler happen, they are caught and shown in a page displaying the original HTTP error code Fixed issue that stopped users from specifying Link URLs that open on a new window Security FixesFixed issue in the Member Directory module that could show members to non authenticated users Fixed issue in the Lists modul...fastJSON: v2.0.10: - added MonoDroid projectxUnit.net Contrib: xunitcontrib-resharper 0.7 (RS 7.1, 6.1.1): xunitcontrib release 0.6.1 (ReSharper runner) This release provides a test runner plugin for Resharper 7.1 RTM and 6.1.1, targetting all versions of xUnit.net. (See the xUnit.net project to download xUnit.net itself.) This release drops 7.0 support and targets the latest revisions of the last two major versions of ReSharper (namely 7.0 and 6.1.1). Copies of the plugin that support previous verions of ReSharper can be downloaded from this release. Also note that all builds work against ALL ...OnTopReplica: Release 3.4: Update to the 3 version with major fixes and improvements. Compatible with Windows 8. Now runs (and requires) .NET Framework v.4.0. Added relative mode for region selection (allows the user to select regions as margins from the borders of the thumbnail, useful for windows which have a variable size but fixed size controls, like video players). Improved window seeking when restoring cloned thumbnail or cloning a window by title or by class. Improved settings persistence. Improved co...DotSpatial: DotSpatial 1.4: This is a Minor Release. See the changes in the issue tracker. Minimal -- includes DotSpatial core and essential extensions Extended -- includes debugging symbols and additional extensions Tutorials are available. Just want to run the software? End user (non-programmer) version available branded as MapWindow Want to add your own feature? Develop a plugin, using the template and contribute to the extension feed (you can also write extensions that you distribute in other ways). Components ...WinRT XAML Toolkit: WinRT XAML Toolkit - 1.3.5: WinRT XAML Toolkit based on the Windows 8 RTM SDK. Download the latest source from the SOURCE CODE page. For compiled version use NuGet. You can add it to your project in Visual Studio by going to View/Other Windows/Package Manager Console and entering: PM> Install-Package winrtxamltoolkit Features Attachable Behaviors AwaitableUI extensions Controls Converters Debugging helpers Extension methods Imaging helpers IO helpers VisualTree helpers Samples Recent changes Docum...New ProjectsAzzeton: azzetonBadminton: Source codeBitFox Expression Evaluator: Integrate evaluation of expressions wrote in fox language into your app. Part of BITFOX, a project to help in migration of Visual Foxpro apps to .NET world.Brunch: Brunch is a Visual Studio 2008 add-in that shows the name of a code branch in a toolbar that you can place wherever you want. You'll no longer have to inspect the path of a file in your project to find out which branch you are working in.CapturePoint365 - an Office 365 extension to Cropper: CapturePoint365 - an Office 365 extension to Cropper utility. CLR Profiler: Provides downloads for those who want to use a profiler of managed code, and those who want to write a profiler of managed code.CMS KickStart: Working on building a Content Management System.cricketcodeplex: uploading projectDanish Language Pack for Community Server: Danish Language Packs for Community Server 2.1 and 2007. At its inception, this project contains quite incomplete translations to Danish. This project provides a common resource where all interested parties can gradually improve on this language pack. Please join to improve the contents. The source code tree contains two major folders: One for Community Server 2.1 and one for Community Server 2007.Data Workflow Activities: OData and SQL Server Workflow Activities and DesignersDelegateMock: DelegateMock is C# library for mocking and stubbing delegates.DevMango: MongoDB ToolEasyIADP Application Component: EasyIADP application component is built for application which want to integrate to Intel AppUp Center. Enterprise Library Logging Dynamics Crm 2011 Trace Listener: Enterprise Library Trace Listener that writes to Microsoft Dynamics CRM 2011, formatting the output with an ILogFormatterEPiServer Customizable Page Reference Properties: Customizable PageReference and LinkCollection properties for EPiServer. Allows to easily setup root page and available types for selecting. This project uses cool <a href="http://episerverfpr.codeplex.com">Filtered Page Reference</a> library written by <a href="http://world.episerver.com/Blogs/Lee-Crowe">Lee Crowe</a>. For more information look at this blog post: <a href="http://dotnetcake.blogspot.com/2011/08/episerver-filtered-page-reference-easy.html">EPiServer Filtered Page Referen...exSnake: exSnake is a C# version of the classic game Snake.Friday Shopping: Windows Mobile applicationHIC Projects Home: HIC's central public location for source control, file collaboration and project management.Infinite DoWork: Example of an infinite loop.Kinect n Touch to WWT: Using Kinect and Touch devices with WWT - worldwide telescopeKoka: Koka is a function-oriented strongly typed language that separates pure values from side-effecting computations. Laboratório de Engenharia de Software - Projeto: Criado para estudar e aplicar novas tecnologias web.M26WC - Mono 2.6 Wizard Control: Wizard which runs under Mono2.6 A fork of: http://aerowizard.codeplex.com/Mercado seguridad: Este es mi summaryMVC 4 Web d?t tour du l?ch: Web d?t tour du l?ch mvc 4ObjectMerger: ObjectMerger is a class library with extension methods for merging two different objects of the same (generic) type. It's developed in C#. omr.selector.js: Easy dom selectorORM-Micro: ORM-Micro Easiest and fastest Micro ORM, you've got the queries, you've got the objects, take the best of two worlds !Primary5choo1: ????????DEMOProject1327: wdPubSync: Visual Studio's publishing solutions are slow and somewhat unreliable. PubSync is my solution to these issues. Rhythm Comet: Jogo em C# utilizando o framework XNARuntime Hello Worlds: Runtime Hello Worlds is a very simple project demonstrating a number of ways of implementing "Hello, World" in C#/.NET 4.0 in runtime generated and/or bound code.SeguridadDeSistemas: sumarySercury: ????????，????WCF??????????????。Setup Project Tuner: A VisualStudio addin that gives you some additional views on your setup project (.vdproj) filesSimple Sales Tracking CRM API Wrapper: The Simple Sales Tracking API Wrapper, enables easy extention development and integration with the hosted service at http://www.simplesalestracking.comSkincaretips: Skincare tips coding shownSQScriptRunner: Simple Quick Script Runner allows an administrator to run T-SQL Scripts against one or more servers with common characteristics. For example, an maintenance script might be targeted at a list of mirrored servers, or a list of computers running SQL Express.SystemHelperLibrary: Some helper classesTask Manager Nuke: How to write a dotnetnuke moduleThe TVDB API: Library to utilise The TVDB API.Thermo: This is the software developed for my PhD. It's about computing thermal properties of materials using first principles quantum mechanics.tool projects: Tool Projectstourism: updateVirtualPoSH Script Repository: This is the VirtualPoSH Script Repository!Wheel of Jeopardy: This is the repository for Wheel of Jeopardy project for Software Engineering 605.401.82 SU10 class.

Read the article
Toorcon14

- by danx

Toorcon 2012 Information Security Conference San Diego, CA, http://www.toorcon.org/ Dan Anderson, October 2012 It's almost Halloween, and we all know what that means—yes, of course, it's time for another Toorcon Conference! Toorcon is an annual conference for people interested in computer security. This includes the whole range of hackers, computer hobbyists, professionals, security consultants, press, law enforcement, prosecutors, FBI, etc. We're at Toorcon 14—see earlier blogs for some of the previous Toorcon's I've attended (back to 2003). This year's "con" was held at the Westin on Broadway in downtown San Diego, California. The following are not necessarily my views—I'm just the messenger—although I could have misquoted or misparaphrased the speakers. Also, I only reviewed some of the talks, below, which I attended and interested me. MalAndroid—the Crux of Android Infections, Aditya K. Sood Programming Weird Machines with ELF Metadata, Rebecca "bx" Shapiro Privacy at the Handset: New FCC Rules?, Valkyrie Hacking Measured Boot and UEFI, Dan Griffin You Can't Buy Security: Building the Open Source InfoSec Program, Boris Sverdlik What Journalists Want: The Investigative Reporters' Perspective on Hacking, Dave Maas & Jason Leopold Accessibility and Security, Anna Shubina Stop Patching, for Stronger PCI Compliance, Adam Brand McAfee Secure & Trustmarks — a Hacker's Best Friend, Jay James & Shane MacDougall MalAndroid—the Crux of Android Infections Aditya K. Sood, IOActive, Michigan State PhD candidate Aditya talked about Android smartphone malware. There's a lot of old Android software out there—over 50% Gingerbread (2.3.x)—and most have unpatched vulnerabilities. Of 9 Android vulnerabilities, 8 have known exploits (such as the old Gingerbread Global Object Table exploit). Android protection includes sandboxing, security scanner, app permissions, and screened Android app market. The Android permission checker has fine-grain resource control, policy enforcement. Android static analysis also includes a static analysis app checker (bouncer), and a vulnerablity checker. What security problems does Android have? User-centric security, which depends on the user to grant permission and make smart decisions. But users don't care or think about malware (the're not aware, not paranoid). All they want is functionality, extensibility, mobility Android had no "proper" encryption before Android 3.0 No built-in protection against social engineering and web tricks Alternative Android app markets are unsafe. Simply visiting some markets can infect Android Aditya classified Android Malware types as: Type A—Apps. These interact with the Android app framework. For example, a fake Netflix app. Or Android Gold Dream (game), which uploads user files stealthy manner to a remote location. Type K—Kernel. Exploits underlying Linux libraries or kernel Type H—Hybrid. These use multiple layers (app framework, libraries, kernel). These are most commonly used by Android botnets, which are popular with Chinese botnet authors What are the threats from Android malware? These incude leak info (contacts), banking fraud, corporate network attacks, malware advertising, malware "Hackivism" (the promotion of social causes. For example, promiting specific leaders of the Tunisian or Iranian revolutions. Android malware is frequently "masquerated". That is, repackaged inside a legit app with malware. To avoid detection, the hidden malware is not unwrapped until runtime. The malware payload can be hidden in, for example, PNG files. Less common are Android bootkits—there's not many around. What they do is hijack the Android init framework—alteering system programs and daemons, then deletes itself. For example, the DKF Bootkit (China). Android App Problems: no code signing! all self-signed native code execution permission sandbox — all or none alternate market places no robust Android malware detection at network level delayed patch process Programming Weird Machines with ELF Metadata Rebecca "bx" Shapiro, Dartmouth College, NH https://github.com/bx/elf-bf-tools @bxsays on twitter Definitions. "ELF" is an executable file format used in linking and loading executables (on UNIX/Linux-class machines). "Weird machine" uses undocumented computation sources (I think of them as unintended virtual machines). Some examples of "weird machines" are those that: return to weird location, does SQL injection, corrupts the heap. Bx then talked about using ELF metadata as (an uintended) "weird machine". Some ELF background: A compiler takes source code and generates a ELF object file (hello.o). A static linker makes an ELF executable from the object file. A runtime linker and loader takes ELF executable and loads and relocates it in memory. The ELF file has symbols to relocate functions and variables. ELF has two relocation tables—one at link time and another one at loading time: .rela.dyn (link time) and .dynsym (dynamic table). GOT: Global Offset Table of addresses for dynamically-linked functions. PLT: Procedure Linkage Tables—works with GOT. The memory layout of a process (not the ELF file) is, in order: program (+ heap), dynamic libraries, libc, ld.so, stack (which includes the dynamic table loaded into memory) For ELF, the "weird machine" is found and exploited in the loader. ELF can be crafted for executing viruses, by tricking runtime into executing interpreted "code" in the ELF symbol table. One can inject parasitic "code" without modifying the actual ELF code portions. Think of the ELF symbol table as an "assembly language" interpreter. It has these elements: instructions: Add, move, jump if not 0 (jnz) Think of symbol table entries as "registers" symbol table value is "contents" immediate values are constants direct values are addresses (e.g., 0xdeadbeef) move instruction: is a relocation table entry add instruction: relocation table "addend" entry jnz instruction: takes multiple relocation table entries The ELF weird machine exploits the loader by relocating relocation table entries. The loader will go on forever until told to stop. It stores state on stack at "end" and uses IFUNC table entries (containing function pointer address). The ELF weird machine, called "Brainfu*k" (BF) has: 8 instructions: pointer inc, dec, inc indirect, dec indirect, jump forward, jump backward, print. Three registers - 3 registers Bx showed example BF source code that implemented a Turing machine printing "hello, world". More interesting was the next demo, where bx modified ping. Ping runs suid as root, but quickly drops privilege. BF modified the loader to disable the library function call dropping privilege, so it remained as root. Then BF modified the ping -t argument to execute the -t filename as root. It's best to show what this modified ping does with an example: $ whoami bx $ ping localhost -t backdoor.sh # executes backdoor $ whoami root $ The modified code increased from 285948 bytes to 290209 bytes. A BF tool compiles "executable" by modifying the symbol table in an existing ELF executable. The tool modifies .dynsym and .rela.dyn table, but not code or data. Privacy at the Handset: New FCC Rules? "Valkyrie" (Christie Dudley, Santa Clara Law JD candidate) Valkyrie talked about mobile handset privacy. Some background: Senator Franken (also a comedian) became alarmed about CarrierIQ, where the carriers track their customers. Franken asked the FCC to find out what obligations carriers think they have to protect privacy. The carriers' response was that they are doing just fine with self-regulation—no worries! Carriers need to collect data, such as missed calls, to maintain network quality. But carriers also sell data for marketing. Verizon sells customer data and enables this with a narrow privacy policy (only 1 month to opt out, with difficulties). The data sold is not individually identifiable and is aggregated. But Verizon recommends, as an aggregation workaround to "recollate" data to other databases to identify customers indirectly. The FCC has regulated telephone privacy since 1934 and mobile network privacy since 2007. Also, the carriers say mobile phone privacy is a FTC responsibility (not FCC). FTC is trying to improve mobile app privacy, but FTC has no authority over carrier / customer relationships. As a side note, Apple iPhones are unique as carriers have extra control over iPhones they don't have with other smartphones. As a result iPhones may be more regulated. Who are the consumer advocates? Everyone knows EFF, but EPIC (Electrnic Privacy Info Center), although more obsecure, is more relevant. What to do? Carriers must be accountable. Opt-in and opt-out at any time. Carriers need incentive to grant users control for those who want it, by holding them liable and responsible for breeches on their clock. Location information should be added current CPNI privacy protection, and require "Pen/trap" judicial order to obtain (and would still be a lower standard than 4th Amendment). Politics are on a pro-privacy swing now, with many senators and the Whitehouse. There will probably be new regulation soon, and enforcement will be a problem, but consumers will still have some benefit. Hacking Measured Boot and UEFI Dan Griffin, JWSecure, Inc., Seattle, @JWSdan Dan talked about hacking measured UEFI boot. First some terms: UEFI is a boot technology that is replacing BIOS (has whitelisting and blacklisting). UEFI protects devices against rootkits. TPM - hardware security device to store hashs and hardware-protected keys "secure boot" can control at firmware level what boot images can boot "measured boot" OS feature that tracks hashes (from BIOS, boot loader, krnel, early drivers). "remote attestation" allows remote validation and control based on policy on a remote attestation server. Microsoft pushing TPM (Windows 8 required), but Google is not. Intel TianoCore is the only open source for UEFI. Dan has Measured Boot Tool at http://mbt.codeplex.com/ with a demo where you can also view TPM data. TPM support already on enterprise-class machines. UEFI Weaknesses. UEFI toolkits are evolving rapidly, but UEFI has weaknesses: assume user is an ally trust TPM implicitly, and attached to computer hibernate file is unprotected (disk encryption protects against this) protection migrating from hardware to firmware delays in patching and whitelist updates will UEFI really be adopted by the mainstream (smartphone hardware support, bank support, apathetic consumer support) You Can't Buy Security: Building the Open Source InfoSec Program Boris Sverdlik, ISDPodcast.com co-host Boris talked about problems typical with current security audits. "IT Security" is an oxymoron—IT exists to enable buiness, uptime, utilization, reporting, but don't care about security—IT has conflict of interest. There's no Magic Bullet ("blinky box"), no one-size-fits-all solution (e.g., Intrusion Detection Systems (IDSs)). Regulations don't make you secure. The cloud is not secure (because of shared data and admin access). Defense and pen testing is not sexy. Auditors are not solution (security not a checklist)—what's needed is experience and adaptability—need soft skills. Step 1: First thing is to Google and learn the company end-to-end before you start. Get to know the management team (not IT team), meet as many people as you can. Don't use arbitrary values such as CISSP scores. Quantitive risk assessment is a myth (e.g. AV*EF-SLE). Learn different Business Units, legal/regulatory obligations, learn the business and where the money is made, verify company is protected from script kiddies (easy), learn sensitive information (IP, internal use only), and start with low-hanging fruit (customer service reps and social engineering). Step 2: Policies. Keep policies short and relevant. Generic SANS "security" boilerplate policies don't make sense and are not followed. Focus on acceptable use, data usage, communications, physical security. Step 3: Implementation: keep it simple stupid. Open source, although useful, is not free (implementation cost). Access controls with authentication & authorization for local and remote access. MS Windows has it, otherwise use OpenLDAP, OpenIAM, etc. Application security Everyone tries to reinvent the wheel—use existing static analysis tools. Review high-risk apps and major revisions. Don't run different risk level apps on same system. Assume host/client compromised and use app-level security control. Network security VLAN != segregated because there's too many workarounds. Use explicit firwall rules, active and passive network monitoring (snort is free), disallow end user access to production environment, have a proxy instead of direct Internet access. Also, SSL certificates are not good two-factor auth and SSL does not mean "safe." Operational Controls Have change, patch, asset, & vulnerability management (OSSI is free). For change management, always review code before pushing to production For logging, have centralized security logging for business-critical systems, separate security logging from administrative/IT logging, and lock down log (as it has everything). Monitor with OSSIM (open source). Use intrusion detection, but not just to fulfill a checkbox: build rules from a whitelist perspective (snort). OSSEC has 95% of what you need. Vulnerability management is a QA function when done right: OpenVas and Seccubus are free. Security awareness The reality is users will always click everything. Build real awareness, not compliance driven checkbox, and have it integrated into the culture. Pen test by crowd sourcing—test with logging COSSP http://www.cossp.org/ - Comprehensive Open Source Security Project What Journalists Want: The Investigative Reporters' Perspective on Hacking Dave Maas, San Diego CityBeat Jason Leopold, Truthout.org The difference between hackers and investigative journalists: For hackers, the motivation varies, but method is same, technological specialties. For investigative journalists, it's about one thing—The Story, and they need broad info-gathering skills. J-School in 60 Seconds: Generic formula: Person or issue of pubic interest, new info, or angle. Generic criteria: proximity, prominence, timeliness, human interest, oddity, or consequence. Media awareness of hackers and trends: journalists becoming extremely aware of hackers with congressional debates (privacy, data breaches), demand for data-mining Journalists, use of coding and web development for Journalists, and Journalists busted for hacking (Murdock). Info gathering by investigative journalists include Public records laws. Federal Freedom of Information Act (FOIA) is good, but slow. California Public Records Act is a lot stronger. FOIA takes forever because of foot-dragging—it helps to be specific. Often need to sue (especially FBI). CPRA is faster, and requests can be vague. Dumps and leaks (a la Wikileaks) Journalists want: leads, protecting ourselves, our sources, and adapting tools for news gathering (Google hacking). Anonomity is important to whistleblowers. They want no digital footprint left behind (e.g., email, web log). They don't trust encryption, want to feel safe and secure. Whistleblower laws are very weak—there's no upside for whistleblowers—they have to be very passionate to do it. Accessibility and Security or: How I Learned to Stop Worrying and Love the Halting Problem Anna Shubina, Dartmouth College Anna talked about how accessibility and security are related. Accessibility of digital content (not real world accessibility). mostly refers to blind users and screenreaders, for our purpose. Accessibility is about parsing documents, as are many security issues. "Rich" executable content causes accessibility to fail, and often causes security to fail. For example MS Word has executable format—it's not a document exchange format—more dangerous than PDF or HTML. Accessibility is often the first and maybe only sanity check with parsing. They have no choice because someone may want to read what you write. Google, for example, is very particular about web browser you use and are bad at supporting other browsers. Uses JavaScript instead of links, often requiring mouseover to display content. PDF is a security nightmare. Executible format, embedded flash, JavaScript, etc. 15 million lines of code. Google Chrome doesn't handle PDF correctly, causing several security bugs. PDF has an accessibility checker and PDF tagging, to help with accessibility. But no PDF checker checks for incorrect tags, untagged content, or validates lists or tables. None check executable content at all. The "Halting Problem" is: can one decide whether a program will ever stop? The answer, in general, is no (Rice's theorem). The same holds true for accessibility checkers. Language-theoretic Security says complicated data formats are hard to parse and cannot be solved due to the Halting Problem. W3C Web Accessibility Guidelines: "Perceivable, Operable, Understandable, Robust" Not much help though, except for "Robust", but here's some gems: * all information should be parsable (paraphrasing) * if not parsable, cannot be converted to alternate formats * maximize compatibility in new document formats Executible webpages are bad for security and accessibility. They say it's for a better web experience. But is it necessary to stuff web pages with JavaScript for a better experience? A good example is The Drudge Report—it has hand-written HTML with no JavaScript, yet drives a lot of web traffic due to good content. A bad example is Google News—hidden scrollbars, guessing user input. Solutions: Accessibility and security problems come from same source Expose "better user experience" myth Keep your corner of Internet parsable Remember "Halting Problem"—recognize false solutions (checking and verifying tools) Stop Patching, for Stronger PCI Compliance Adam Brand, protiviti @adamrbrand, http://www.picfun.com/ Adam talked about PCI compliance for retail sales. Take an example: for PCI compliance, 50% of Brian's time (a IT guy), 960 hours/year was spent patching POSs in 850 restaurants. Often applying some patches make no sense (like fixing a browser vulnerability on a server). "Scanner worship" is overuse of vulnerability scanners—it gives a warm and fuzzy and it's simple (red or green results—fix reds). Scanners give a false sense of security. In reality, breeches from missing patches are uncommon—more common problems are: default passwords, cleartext authentication, misconfiguration (firewall ports open). Patching Myths: Myth 1: install within 30 days of patch release (but PCI Â§6.1 allows a "risk-based approach" instead). Myth 2: vendor decides what's critical (also PCI Â§6.1). But Â§6.2 requires user ranking of vulnerabilities instead. Myth 3: scan and rescan until it passes. But PCI Â§11.2.1b says this applies only to high-risk vulnerabilities. Adam says good recommendations come from NIST 800-40. Instead use sane patching and focus on what's really important. From NIST 800-40: Proactive: Use a proactive vulnerability management process: use change control, configuration management, monitor file integrity. Monitor: start with NVD and other vulnerability alerts, not scanner results. Evaluate: public-facing system? workstation? internal server? (risk rank) Decide:on action and timeline Test: pre-test patches (stability, functionality, rollback) for change control Install: notify, change control, tickets McAfee Secure & Trustmarks — a Hacker's Best Friend Jay James, Shane MacDougall, Tactical Intelligence Inc., Canada "McAfee Secure Trustmark" is a website seal marketed by McAfee. A website gets this badge if they pass their remote scanning. The problem is a removal of trustmarks act as flags that you're vulnerable. Easy to view status change by viewing McAfee list on website or on Google. "Secure TrustGuard" is similar to McAfee. Jay and Shane wrote Perl scripts to gather sites from McAfee and search engines. If their certification image changes to a 1x1 pixel image, then they are longer certified. Their scripts take deltas of scans to see what changed daily. The bottom line is change in TrustGuard status is a flag for hackers to attack your site. Entire idea of seals is silly—you're raising a flag saying if you're vulnerable.

Read the article
Guidance: A Branching strategy for Scrum Teams

- by Martin Hinshelwood

Having a good branching strategy will save your bacon, or at least your code. Be careful when deviating from your branching strategy because if you do, you may be worse off than when you started! This is one possible branching strategy for Scrum teams and I will not be going in depth with Scrum but you can find out more about Scrum by reading the Scrum Guide and you can even assess your Scrum knowledge by having a go at the Scrum Open Assessment. You can also read SSW’s Rules to Better Scrum using TFS which have been developed during our own Scrum implementations. Acknowledgements Bill Heys – Bill offered some good feedback on this post and helped soften the language. Note: Bill is a VS ALM Ranger and co-wrote the Branching Guidance for TFS 2010 Willy-Peter Schaub – Willy-Peter is an ex Visual Studio ALM MVP turned blue badge and has been involved in most of the guidance including the Branching Guidance for TFS 2010 Chris Birmele – Chris wrote some of the early TFS Branching and Merging Guidance. Dr Paul Neumeyer, Ph.D Parallel Processes, ScrumMaster and SSW Solution Architect – Paul wanted to have feature branches coming from the release branch as well. We agreed that this is really a spin-off that needs own project, backlog, budget and Team. Scenario: A product is developed RTM 1.0 is released and gets great sales. Extra features are demanded but the new version will have double to price to pay to recover costs, work is approved by the guys with budget and a few sprints later RTM 2.0 is released. Sales a very low due to the pricing strategy. There are lots of clients on RTM 1.0 calling out for patches. As I keep getting Reverse Integration and Forward Integration mixed up and Bill keeps slapping my wrists I thought I should have a reminder: You still seemed to use reverse and/or forward integration in the wrong context. I would recommend reviewing your document at the end to ensure that it agrees with the common understanding of these terms merge (forward integration) from parent to child (same direction as the branch), and merge (reverse integration) from child to parent (the reverse direction of the branch). - one of my many slaps on the wrist from Bill Heys. As I mentioned previously we are using a single feature branching strategy in our current project. The single biggest mistake developers make is developing against the “Main” or “Trunk” line. This ultimately leads to messy code as things are added and never finished. Your only alternative is to NEVER check in unless your code is 100%, but this does not work in practice, even with a single developer. Your ADD will kick in and your half-finished code will be finished enough to pass the build and the tests. You do use builds don’t you? Sadly, this is a very common scenario and I have had people argue that branching merely adds complexity. Then again I have seen the other side of the universe ... branching structures from he... We should somehow convince everyone that there is a happy between no-branching and too-much-branching. - Willy-Peter Schaub, VS ALM Ranger, Microsoft A key benefit of branching for development is to isolate changes from the stable Main branch. Branching adds sanity more than it adds complexity. We do try to stress in our guidance that it is important to justify a branch, by doing a cost benefit analysis. The primary cost is the effort to do merges and resolve conflicts. A key benefit is that you have a stable code base in Main and accept changes into Main only after they pass quality gates, etc. - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft The second biggest mistake developers make is branching anything other than the WHOLE “Main” line. If you branch parts of your code and not others it gets out of sync and can make integration a nightmare. You should have your Source, Assets, Build scripts deployment scripts and dependencies inside the “Main” folder and branch the whole thing. Some departments within MSFT even go as far as to add the environments used to develop the product in there as well; although I would not recommend that unless you have a massive SQL cluster to house your source code. We tried the “add environment” back in South-Africa and while it was “phenomenal”, especially when having to switch between environments, the disk storage and processing requirements killed us. We opted for virtualization to skin this cat of keeping a ready-to-go environment handy. - Willy-Peter Schaub, VS ALM Ranger, Microsoft I think people often think that you should have separate branches for separate environments (e.g. Dev, Test, Integration Test, QA, etc.). I prefer to think of deploying to environments (such as from Main to QA) rather than branching for QA). - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft You can read about SSW’s Rules to better Source Control for some additional information on what Source Control to use and how to use it. There are also a number of branching Anti-Patterns that should be avoided at all costs: You know you are on the wrong track if you experience one or more of the following symptoms in your development environment: Merge Paranoia—avoiding merging at all cost, usually because of a fear of the consequences. Merge Mania—spending too much time merging software assets instead of developing them. Big Bang Merge—deferring branch merging to the end of the development effort and attempting to merge all branches simultaneously. Never-Ending Merge—continuous merging activity because there is always more to merge. Wrong-Way Merge—merging a software asset version with an earlier version. Branch Mania—creating many branches for no apparent reason. Cascading Branches—branching but never merging back to the main line. Mysterious Branches—branching for no apparent reason. Temporary Branches—branching for changing reasons, so the branch becomes a permanent temporary workspace. Volatile Branches—branching with unstable software assets shared by other branches or merged into another branch. Note Branches are volatile most of the time while they exist as independent branches. That is the point of having them. The difference is that you should not share or merge branches while they are in an unstable state. Development Freeze—stopping all development activities while branching, merging, and building new base lines. Berlin Wall—using branches to divide the development team members, instead of dividing the work they are performing. -Branching and Merging Primer by Chris Birmele - Developer Tools Technical Specialist at Microsoft Pty Ltd in Australia In fact, this can result in a merge exercise no-one wants to be involved in, merging hundreds of thousands of change sets and trying to get a consolidated build. Again, we need to find a happy medium. - Willy-Peter Schaub on Merge Paranoia Merge conflicts are generally the result of making changes to the same file in both the target and source branch. If you create merge conflicts, you will eventually need to resolve them. Often the resolution is manual. Merging more frequently allows you to resolve these conflicts close to when they happen, making the resolution clearer. Waiting weeks or months to resolve them, the Big Bang approach, means you are more likely to resolve conflicts incorrectly. - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft Figure: Main line, this is where your stable code lives and where any build has known entities, always passes and has a happy test that passes as well? Many development projects consist of, a single “Main” line of source and artifacts. This is good; at least there is source control . There are however a couple of issues that need to be considered. What happens if: you and your team are working on a new set of features and the customer wants a change to his current version? you are working on two features and the customer decides to abandon one of them? you have two teams working on different feature sets and their changes start interfering with each other? I just use labels instead of branches? That's a lot of “what if’s”, but there is a simple way of preventing this. Branching… In TFS, labels are not immutable. This does not mean they are not useful. But labels do not provide a very good development isolation mechanism. Branching allows separate code sets to evolve separately (e.g. Current with hotfixes, and vNext with new development). I don’t see how labels work here. - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft Figure: Creating a single feature branch means you can isolate the development work on that branch. Its standard practice for large projects with lots of developers to use Feature branching and you can check the Branching Guidance for the latest recommendations from the Visual Studio ALM Rangers for other methods. In the diagram above you can see my recommendation for branching when using Scrum development with TFS 2010. It consists of a single Sprint branch to contain all the changes for the current sprint. The main branch has the permissions changes so contributors to the project can only Branch and Merge with “Main”. This will prevent accidental check-ins or checkouts of the “Main” line that would contaminate the code. The developers continue to develop on sprint one until the completion of the sprint. Note: In the real world, starting a new Greenfield project, this process starts at Sprint 2 as at the start of Sprint 1 you would have artifacts in version control and no need for isolation. Figure: Once the sprint is complete the Sprint 1 code can then be merged back into the Main line. There are always good practices to follow, and one is to always do a Forward Integration from Main into Sprint 1 before you do a Reverse Integration from Sprint 1 back into Main. In this case it may seem superfluous, but this builds good muscle memory into your developer’s work ethic and means that no bad habits are learned that would interfere with additional Scrum Teams being added to the Product. The process of completing your sprint development: The Team completes their work according to their definition of done. Merge from “Main” into “Sprint1” (Forward Integration) Stabilize your code with any changes coming from other Scrum Teams working on the same product. If you have one Scrum Team this should be quick, but there may have been bug fixes in the Release branches. (we will talk about release branches later) Merge from “Sprint1” into “Main” to commit your changes. (Reverse Integration) Check-in Delete the Sprint1 branch Note: The Sprint 1 branch is no longer required as its useful life has been concluded. Check-in Done But you are not yet done with the Sprint. The goal in Scrum is to have a “potentially shippable product” at the end of every Sprint, and we do not have that yet, we only have finished code. Figure: With Sprint 1 merged you can create a Release branch and run your final packaging and testing In 99% of all projects I have been involved in or watched, a “shippable product” only happens towards the end of the overall lifecycle, especially when sprints are short. The in-between releases are great demonstration releases, but not shippable. Perhaps it comes from my 80’s brain washing that we only ship when we reach the agreed quality and business feature bar. - Willy-Peter Schaub, VS ALM Ranger, Microsoft Although you should have been testing and packaging your code all the way through your Sprint 1 development, preferably using an automated process, you still need to test and package with stable unchanging code. This is where you do what at SSW we call a “Test Please”. This is first an internal test of the product to make sure it meets the needs of the customer and you generally use a resource external to your Team. Then a “Test Please” is conducted with the Product Owner to make sure he is happy with the output. You can read about how to conduct a Test Please on our Rules to Successful Projects: Do you conduct an internal "test please" prior to releasing a version to a client? Figure: If you find a deviation from the expected result you fix it on the Release branch. If during your final testing or your “Test Please” you find there are issues or bugs then you should fix them on the release branch. If you can’t fix them within the time box of your Sprint, then you will need to create a Bug and put it onto the backlog for prioritization by the Product owner. Make sure you leave plenty of time between your merge from the development branch to find and fix any problems that are uncovered. This process is commonly called Stabilization and should always be conducted once you have completed all of your User Stories and integrated all of your branches. Even once you have stabilized and released, you should not delete the release branch as you would with the Sprint branch. It has a usefulness for servicing that may extend well beyond the limited life you expect of it. Note: Don't get forced by the business into adding features into a Release branch instead that indicates the unspoken requirement is that they are asking for a product spin-off. In this case you can create a new Team Project and branch from the required Release branch to create a new Main branch for that product. And you create a whole new backlog to work from. Figure: When the Team decides it is happy with the product you can create a RTM branch. Once you have fixed all the bugs you can, and added any you can’t to the Product Backlog, and you Team is happy with the result you can create a Release. This would consist of doing the final Build and Packaging it up ready for your Sprint Review meeting. You would then create a read-only branch that represents the code you “shipped”. This is really an Audit trail branch that is optional, but is good practice. You could use a Label, but Labels are not Auditable and if a dispute was raised by the customer you can produce a verifiable version of the source code for an independent party to check. Rare I know, but you do not want to be at the wrong end of a legal battle. Like the Release branch the RTM branch should never be deleted, or only deleted according to your companies legal policy, which in the UK is usually 7 years. Figure: If you have made any changes in the Release you will need to merge back up to Main in order to finalise the changes. Nothing is really ever done until it is in Main. The same rules apply when merging any fixes in the Release branch back into Main and you should do a reverse merge before a forward merge, again for the muscle memory more than necessity at this stage. Your Sprint is now nearly complete, and you can have a Sprint Review meeting knowing that you have made every effort and taken every precaution to protect your customer’s investment. Note: In order to really achieve protection for both you and your client you would add Automated Builds, Automated Tests, Automated Acceptance tests, Acceptance test tracking, Unit Tests, Load tests, Web test and all the other good engineering practices that help produce reliable software. Figure: After the Sprint Planning meeting the process begins again. Where the Sprint Review and Retrospective meetings mark the end of the Sprint, the Sprint Planning meeting marks the beginning. After you have completed your Sprint Planning and you know what you are trying to achieve in Sprint 2 you can create your new Branch to develop in. How do we handle a bug(s) in production that can’t wait? Although in Scrum the only work done should be on the backlog there should be a little buffer added to the Sprint Planning for contingencies. One of these contingencies is a bug in the current release that can’t wait for the Sprint to finish. But how do you handle that? Willy-Peter Schaub asked an excellent question on the release activities: In reality Sprint 2 starts when sprint 1 ends + weekend. Should we not cater for a possible parallelism between Sprint 2 and the release activities of sprint 1? It would introduce FI’s from main to sprint 2, I guess. Your “Figure: Merging print 2 back into Main.” covers, what I tend to believe to be reality in most cases. - Willy-Peter Schaub, VS ALM Ranger, Microsoft I agree, and if you have a single Scrum team then your resources are limited. The Scrum Team is responsible for packaging and release, so at least one run at stabilization, package and release should be included in the Sprint time box. If more are needed on the current production release during the Sprint 2 time box then resource needs to be pulled from Sprint 2. The Product Owner and the Team have four choices (in order of disruption/cost): Backlog: Add the bug to the backlog and fix it in the next Sprint Buffer Time: Use any buffer time included in the current Sprint to fix the bug quickly Make time: Remove a Story from the current Sprint that is of equal value to the time lost fixing the bug(s) and releasing. Note: The Team must agree that it can still meet the Sprint Goal. Cancel Sprint: Cancel the sprint and concentrate all resource on fixing the bug(s) Note: This can be a very costly if the current sprint has already had a lot of work completed as it will be lost. The choice will depend on the complexity and severity of the bug(s) and both the Product Owner and the Team need to agree. In this case we will go with option #2 or #3 as they are uncomplicated but severe bugs. Figure: Real world issue where a bug needs fixed in the current release. If the bug(s) is urgent enough then then your only option is to fix it in place. You can edit the release branch to find and fix the bug, hopefully creating a test so it can’t happen again. Follow the prior process and conduct an internal and customer “Test Please” before releasing. You can read about how to conduct a Test Please on our Rules to Successful Projects: Do you conduct an internal "test please" prior to releasing a version to a client? Figure: After you have fixed the bug you need to ship again. You then need to again create an RTM branch to hold the version of the code you released in escrow. Figure: Main is now out of sync with your Release. We now need to get these new changes back up into the Main branch. Do a reverse and then forward merge again to get the new code into Main. But what about the branch, are developers not working on Sprint 2? Does Sprint 2 now have changes that are not in Main and Main now have changes that are not in Sprint 2? Well, yes… and this is part of the hit you take doing branching. But would this scenario even have been possible without branching? Figure: Getting the changes in Main into Sprint 2 is very important. The Team now needs to do a Forward Integration merge into their Sprint and resolve any conflicts that occur. Maybe the bug has already been fixed in Sprint 2, maybe the bug no longer exists! This needs to be identified and resolved by the developers before they continue to get further out of Sync with Main. Note: Avoid the “Big bang merge” at all costs. Figure: Merging Sprint 2 back into Main, the Forward Integration, and R0 terminates. Sprint 2 now merges (Reverse Integration) back into Main following the procedures we have already established. Figure: The logical conclusion. This then allows the creation of the next release. By now you should be getting the big picture and hopefully you learned something useful from this post. I know I have enjoyed writing it as I find these exploratory posts coupled with real world experience really help harden my understanding. Branching is a tool; it is not a silver bullet. Don’t over use it, and avoid “Anti-Patterns” where possible. Although the diagram above looks complicated I hope showing you how it is formed simplifies it as much as possible. Technorati Tags: Branching,Scrum,VS ALM,TFS 2010,VS2010

Read the article
How John Got 15x Improvement Without Really Trying

- by rchrd

The following article was published on a Sun Microsystems website a number of years ago by John Feo. It is still useful and worth preserving. So I'm republishing it here. How I Got 15x Improvement Without Really Trying John Feo, Sun Microsystems Taking ten "personal" program codes used in scientific and engineering research, the author was able to get from 2 to 15 times performance improvement easily by applying some simple general optimization techniques. Introduction Scientific research based on computer simulation depends on the simulation for advancement. The research can advance only as fast as the computational codes can execute. The codes' efficiency determines both the rate and quality of results. In the same amount of time, a faster program can generate more results and can carry out a more detailed simulation of physical phenomena than a slower program. Highly optimized programs help science advance quickly and insure that monies supporting scientific research are used as effectively as possible. Scientific computer codes divide into three broad categories: ISV, community, and personal. ISV codes are large, mature production codes developed and sold commercially. The codes improve slowly over time both in methods and capabilities, and they are well tuned for most vendor platforms. Since the codes are mature and complex, there are few opportunities to improve their performance solely through code optimization. Improvements of 10% to 15% are typical. Examples of ISV codes are DYNA3D, Gaussian, and Nastran. Community codes are non-commercial production codes used by a particular research field. Generally, they are developed and distributed by a single academic or research institution with assistance from the community. Most users just run the codes, but some develop new methods and extensions that feed back into the general release. The codes are available on most vendor platforms. Since these codes are younger than ISV codes, there are more opportunities to optimize the source code. Improvements of 50% are not unusual. Examples of community codes are AMBER, CHARM, BLAST, and FASTA. Personal codes are those written by single users or small research groups for their own use. These codes are not distributed, but may be passed from professor-to-student or student-to-student over several years. They form the primordial ocean of applications from which community and ISV codes emerge. Government research grants pay for the development of most personal codes. This paper reports on the nature and performance of this class of codes. Over the last year, I have looked at over two dozen personal codes from more than a dozen research institutions. The codes cover a variety of scientific fields, including astronomy, atmospheric sciences, bioinformatics, biology, chemistry, geology, and physics. The sources range from a few hundred lines to more than ten thousand lines, and are written in Fortran, Fortran 90, C, and C++. For the most part, the codes are modular, documented, and written in a clear, straightforward manner. They do not use complex language features, advanced data structures, programming tricks, or libraries. I had little trouble understanding what the codes did or how data structures were used. Most came with a makefile. Surprisingly, only one of the applications is parallel. All developers have access to parallel machines, so availability is not an issue. Several tried to parallelize their applications, but stopped after encountering difficulties. Lack of education and a perception that parallelism is difficult prevented most from trying. I parallelized several of the codes using OpenMP, and did not judge any of the codes as difficult to parallelize. Even more surprising than the lack of parallelism is the inefficiency of the codes. I was able to get large improvements in performance in a matter of a few days applying simple optimization techniques. Table 1 lists ten representative codes [names and affiliation are omitted to preserve anonymity]. Improvements on one processor range from 2x to 15.5x with a simple average of 4.75x. I did not use sophisticated performance tools or drill deep into the program's execution character as one would do when tuning ISV or community codes. Using only a profiler and source line timers, I identified inefficient sections of code and improved their performance by inspection. The changes were at a high level. I am sure there is another factor of 2 or 3 in each code, and more if the codes are parallelized. The study’s results show that personal scientific codes are running many times slower than they should and that the problem is pervasive. Computational scientists are not sloppy programmers; however, few are trained in the art of computer programming or code optimization. I found that most have a working knowledge of some programming language and standard software engineering practices; but they do not know, or think about, how to make their programs run faster. They simply do not know the standard techniques used to make codes run faster. In fact, they do not even perceive that such techniques exist. The case studies described in this paper show that applying simple, well known techniques can significantly increase the performance of personal codes. It is important that the scientific community and the Government agencies that support scientific research find ways to better educate academic scientific programmers. The inefficiency of their codes is so bad that it is retarding both the quality and progress of scientific research. # cacheperformance redundantoperations loopstructures performanceimprovement 1 x x 15.5 2 x 2.8 3 x x 2.5 4 x 2.1 5 x x 2.0 6 x 5.0 7 x 5.8 8 x 6.3 9 2.2 10 x x 3.3 Table 1 — Area of improvement and performance gains of 10 codes The remainder of the paper is organized as follows: sections 2, 3, and 4 discuss the three most common sources of inefficiencies in the codes studied. These are cache performance, redundant operations, and loop structures. Each section includes several examples. The last section summaries the work and suggests a possible solution to the issues raised. Optimizing cache performance Commodity microprocessor systems use caches to increase memory bandwidth and reduce memory latencies. Typical latencies from processor to L1, L2, local, and remote memory are 3, 10, 50, and 200 cycles, respectively. Moreover, bandwidth falls off dramatically as memory distances increase. Programs that do not use cache effectively run many times slower than programs that do. When optimizing for cache, the biggest performance gains are achieved by accessing data in cache order and reusing data to amortize the overhead of cache misses. Secondary considerations are prefetching, associativity, and replacement; however, the understanding and analysis required to optimize for the latter are probably beyond the capabilities of the non-expert. Much can be gained simply by accessing data in the correct order and maximizing data reuse. 6 out of the 10 codes studied here benefited from such high level optimizations. Array Accesses The most important cache optimization is the most basic: accessing Fortran array elements in column order and C array elements in row order. Four of the ten codes—1, 2, 4, and 10—got it wrong. Compilers will restructure nested loops to optimize cache performance, but may not do so if the loop structure is too complex, or the loop body includes conditionals, complex addressing, or function calls. In code 1, the compiler failed to invert a key loop because of complex addressing do I = 0, 1010, delta_x IM = I - delta_x IP = I + delta_x do J = 5, 995, delta_x JM = J - delta_x JP = J + delta_x T1 = CA1(IP, J) + CA1(I, JP) T2 = CA1(IM, J) + CA1(I, JM) S1 = T1 + T2 - 4 * CA1(I, J) CA(I, J) = CA1(I, J) + D * S1 end do end do In code 2, the culprit is conditionals do I = 1, N do J = 1, N If (IFLAG(I,J) .EQ. 0) then T1 = Value(I, J-1) T2 = Value(I-1, J) T3 = Value(I, J) T4 = Value(I+1, J) T5 = Value(I, J+1) Value(I,J) = 0.25 * (T1 + T2 + T5 + T4) Delta = ABS(T3 - Value(I,J)) If (Delta .GT. MaxDelta) MaxDelta = Delta endif enddo enddo I fixed both programs by inverting the loops by hand. Code 10 has three-dimensional arrays and triply nested loops. The structure of the most computationally intensive loops is too complex to invert automatically or by hand. The only practical solution is to transpose the arrays so that the dimension accessed by the innermost loop is in cache order. The arrays can be transposed at construction or prior to entering a computationally intensive section of code. The former requires all array references to be modified, while the latter is cost effective only if the cost of the transpose is amortized over many accesses. I used the second approach to optimize code 10. Code 5 has four-dimensional arrays and loops are nested four deep. For all of the reasons cited above the compiler is not able to restructure three key loops. Assume C arrays and let the four dimensions of the arrays be i, j, k, and l. In the original code, the index structure of the three loops is L1: for i L2: for i L3: for i for l for l for j for k for j for k for j for k for l So only L3 accesses array elements in cache order. L1 is a very complex loop—much too complex to invert. I brought the loop into cache alignment by transposing the second and fourth dimensions of the arrays. Since the code uses a macro to compute all array indexes, I effected the transpose at construction and changed the macro appropriately. The dimensions of the new arrays are now: i, l, k, and j. L3 is a simple loop and easily inverted. L2 has a loop-carried scalar dependence in k. By promoting the scalar name that carries the dependence to an array, I was able to invert the third and fourth subloops aligning the loop with cache. Code 5 is by far the most difficult of the four codes to optimize for array accesses; but the knowledge required to fix the problems is no more than that required for the other codes. I would judge this code at the limits of, but not beyond, the capabilities of appropriately trained computational scientists. Array Strides When a cache miss occurs, a line (64 bytes) rather than just one word is loaded into the cache. If data is accessed stride 1, than the cost of the miss is amortized over 8 words. Any stride other than one reduces the cost savings. Two of the ten codes studied suffered from non-unit strides. The codes represent two important classes of "strided" codes. Code 1 employs a multi-grid algorithm to reduce time to convergence. The grids are every tenth, fifth, second, and unit element. Since time to convergence is inversely proportional to the distance between elements, coarse grids converge quickly providing good starting values for finer grids. The better starting values further reduce the time to convergence. The downside is that grids of every nth element, n > 1, introduce non-unit strides into the computation. In the original code, much of the savings of the multi-grid algorithm were lost due to this problem. I eliminated the problem by compressing (copying) coarse grids into continuous memory, and rewriting the computation as a function of the compressed grid. On convergence, I copied the final values of the compressed grid back to the original grid. The savings gained from unit stride access of the compressed grid more than paid for the cost of copying. Using compressed grids, the loop from code 1 included in the previous section becomes do j = 1, GZ do i = 1, GZ T1 = CA(i+0, j-1) + CA(i-1, j+0) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) S1 = T1 + T4 - 4 * CA1(i+0, j+0) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 enddo enddo where CA and CA1 are compressed arrays of size GZ. Code 7 traverses a list of objects selecting objects for later processing. The labels of the selected objects are stored in an array. The selection step has unit stride, but the processing steps have irregular stride. A fix is to save the parameters of the selected objects in temporary arrays as they are selected, and pass the temporary arrays to the processing functions. The fix is practical if the same parameters are used in selection as in processing, or if processing comprises a series of distinct steps which use overlapping subsets of the parameters. Both conditions are true for code 7, so I achieved significant improvement by copying parameters to temporary arrays during selection. Data reuse In the previous sections, we optimized for spatial locality. It is also important to optimize for temporal locality. Once read, a datum should be used as much as possible before it is forced from cache. Loop fusion and loop unrolling are two techniques that increase temporal locality. Unfortunately, both techniques increase register pressure—as loop bodies become larger, the number of registers required to hold temporary values grows. Once register spilling occurs, any gains evaporate quickly. For multiprocessors with small register sets or small caches, the sweet spot can be very small. In the ten codes presented here, I found no opportunities for loop fusion and only two opportunities for loop unrolling (codes 1 and 3). In code 1, unrolling the outer and inner loop one iteration increases the number of result values computed by the loop body from 1 to 4, do J = 1, GZ-2, 2 do I = 1, GZ-2, 2 T1 = CA1(i+0, j-1) + CA1(i-1, j+0) T2 = CA1(i+1, j-1) + CA1(i+0, j+0) T3 = CA1(i+0, j+0) + CA1(i-1, j+1) T4 = CA1(i+1, j+0) + CA1(i+0, j+1) T5 = CA1(i+2, j+0) + CA1(i+1, j+1) T6 = CA1(i+1, j+1) + CA1(i+0, j+2) T7 = CA1(i+2, j+1) + CA1(i+1, j+2) S1 = T1 + T4 - 4 * CA1(i+0, j+0) S2 = T2 + T5 - 4 * CA1(i+1, j+0) S3 = T3 + T6 - 4 * CA1(i+0, j+1) S4 = T4 + T7 - 4 * CA1(i+1, j+1) CA(i+0, j+0) = CA1(i+0, j+0) + DD * S1 CA(i+1, j+0) = CA1(i+1, j+0) + DD * S2 CA(i+0, j+1) = CA1(i+0, j+1) + DD * S3 CA(i+1, j+1) = CA1(i+1, j+1) + DD * S4 enddo enddo The loop body executes 12 reads, whereas as the rolled loop shown in the previous section executes 20 reads to compute the same four values. In code 3, two loops are unrolled 8 times and one loop is unrolled 4 times. Here is the before for (k = 0; k < NK[u]; k++) { sum = 0.0; for (y = 0; y < NY; y++) { sum += W[y][u][k] * delta[y]; } backprop[i++]=sum; } and after code for (k = 0; k < KK - 8; k+=8) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (y = 0; y < NY; y++) { sum0 += W[y][0][k+0] * delta[y]; sum1 += W[y][0][k+1] * delta[y]; sum2 += W[y][0][k+2] * delta[y]; sum3 += W[y][0][k+3] * delta[y]; sum4 += W[y][0][k+4] * delta[y]; sum5 += W[y][0][k+5] * delta[y]; sum6 += W[y][0][k+6] * delta[y]; sum7 += W[y][0][k+7] * delta[y]; } backprop[k+0] = sum0; backprop[k+1] = sum1; backprop[k+2] = sum2; backprop[k+3] = sum3; backprop[k+4] = sum4; backprop[k+5] = sum5; backprop[k+6] = sum6; backprop[k+7] = sum7; } for one of the loops unrolled 8 times. Optimizing for temporal locality is the most difficult optimization considered in this paper. The concepts are not difficult, but the sweet spot is small. Identifying where the program can benefit from loop unrolling or loop fusion is not trivial. Moreover, it takes some effort to get it right. Still, educating scientific programmers about temporal locality and teaching them how to optimize for it will pay dividends. Reducing instruction count Execution time is a function of instruction count. Reduce the count and you usually reduce the time. The best solution is to use a more efficient algorithm; that is, an algorithm whose order of complexity is smaller, that converges quicker, or is more accurate. Optimizing source code without changing the algorithm yields smaller, but still significant, gains. This paper considers only the latter because the intent is to study how much better codes can run if written by programmers schooled in basic code optimization techniques. The ten codes studied benefited from three types of "instruction reducing" optimizations. The two most prevalent were hoisting invariant memory and data operations out of inner loops. The third was eliminating unnecessary data copying. The nature of these inefficiencies is language dependent. Memory operations The semantics of C make it difficult for the compiler to determine all the invariant memory operations in a loop. The problem is particularly acute for loops in functions since the compiler may not know the values of the function's parameters at every call site when compiling the function. Most compilers support pragmas to help resolve ambiguities; however, these pragmas are not comprehensive and there is no standard syntax. To guarantee that invariant memory operations are not executed repetitively, the user has little choice but to hoist the operations by hand. The problem is not as severe in Fortran programs because in the absence of equivalence statements, it is a violation of the language's semantics for two names to share memory. Codes 3 and 5 are C programs. In both cases, the compiler did not hoist all invariant memory operations from inner loops. Consider the following loop from code 3 for (y = 0; y < NY; y++) { i = 0; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += delta[y] * I1[i++]; } } } Since dW[y][u] can point to the same memory space as delta for one or more values of y and u, assignment to dW[y][u][k] may change the value of delta[y]. In reality, dW and delta do not overlap in memory, so I rewrote the loop as for (y = 0; y < NY; y++) { i = 0; Dy = delta[y]; for (u = 0; u < NU; u++) { for (k = 0; k < NK[u]; k++) { dW[y][u][k] += Dy * I1[i++]; } } } Failure to hoist invariant memory operations may be due to complex address calculations. If the compiler can not determine that the address calculation is invariant, then it can hoist neither the calculation nor the associated memory operations. As noted above, code 5 uses a macro to address four-dimensional arrays #define MAT4D(a,q,i,j,k) (double *)((a)->data + (q)*(a)->strides[0] + (i)*(a)->strides[3] + (j)*(a)->strides[2] + (k)*(a)->strides[1]) The macro is too complex for the compiler to understand and so, it does not identify any subexpressions as loop invariant. The simplest way to eliminate the address calculation from the innermost loop (over i) is to define a0 = MAT4D(a,q,0,j,k) before the loop and then replace all instances of *MAT4D(a,q,i,j,k) in the loop with a0[i] A similar problem appears in code 6, a Fortran program. The key loop in this program is do n1 = 1, nh nx1 = (n1 - 1) / nz + 1 nz1 = n1 - nz * (nx1 - 1) do n2 = 1, nh nx2 = (n2 - 1) / nz + 1 nz2 = n2 - nz * (nx2 - 1) ndx = nx2 - nx1 ndy = nz2 - nz1 gxx = grn(1,ndx,ndy) gyy = grn(2,ndx,ndy) gxy = grn(3,ndx,ndy) balance(n1,1) = balance(n1,1) + (force(n2,1) * gxx + force(n2,2) * gxy) * h1 balance(n1,2) = balance(n1,2) + (force(n2,1) * gxy + force(n2,2) * gyy)*h1 end do end do The programmer has written this loop well—there are no loop invariant operations with respect to n1 and n2. However, the loop resides within an iterative loop over time and the index calculations are independent with respect to time. Trading space for time, I precomputed the index values prior to the entering the time loop and stored the values in two arrays. I then replaced the index calculations with reads of the arrays. Data operations Ways to reduce data operations can appear in many forms. Implementing a more efficient algorithm produces the biggest gains. The closest I came to an algorithm change was in code 4. This code computes the inner product of K-vectors A(i) and B(j), 0 = i < N, 0 = j < M, for most values of i and j. Since the program computes most of the NM possible inner products, it is more efficient to compute all the inner products in one triply-nested loop rather than one at a time when needed. The savings accrue from reading A(i) once for all B(j) vectors and from loop unrolling. for (i = 0; i < N; i+=8) { for (j = 0; j < M; j++) { sum0 = 0.0; sum1 = 0.0; sum2 = 0.0; sum3 = 0.0; sum4 = 0.0; sum5 = 0.0; sum6 = 0.0; sum7 = 0.0; for (k = 0; k < K; k++) { sum0 += A[i+0][k] * B[j][k]; sum1 += A[i+1][k] * B[j][k]; sum2 += A[i+2][k] * B[j][k]; sum3 += A[i+3][k] * B[j][k]; sum4 += A[i+4][k] * B[j][k]; sum5 += A[i+5][k] * B[j][k]; sum6 += A[i+6][k] * B[j][k]; sum7 += A[i+7][k] * B[j][k]; } C[i+0][j] = sum0; C[i+1][j] = sum1; C[i+2][j] = sum2; C[i+3][j] = sum3; C[i+4][j] = sum4; C[i+5][j] = sum5; C[i+6][j] = sum6; C[i+7][j] = sum7; }} This change requires knowledge of a typical run; i.e., that most inner products are computed. The reasons for the change, however, derive from basic optimization concepts. It is the type of change easily made at development time by a knowledgeable programmer. In code 5, we have the data version of the index optimization in code 6. Here a very expensive computation is a function of the loop indices and so cannot be hoisted out of the loop; however, the computation is invariant with respect to an outer iterative loop over time. We can compute its value for each iteration of the computation loop prior to entering the time loop and save the values in an array. The increase in memory required to store the values is small in comparison to the large savings in time. The main loop in Code 8 is doubly nested. The inner loop includes a series of guarded computations; some are a function of the inner loop index but not the outer loop index while others are a function of the outer loop index but not the inner loop index for (j = 0; j < N; j++) { for (i = 0; i < M; i++) { r = i * hrmax; R = A[j]; temp = (PRM[3] == 0.0) ? 1.0 : pow(r, PRM[3]); high = temp * kcoeff * B[j] * PRM[2] * PRM[4]; low = high * PRM[6] * PRM[6] / (1.0 + pow(PRM[4] * PRM[6], 2.0)); kap = (R > PRM[6]) ? high * R * R / (1.0 + pow(PRM[4]*r, 2.0) : low * pow(R/PRM[6], PRM[5]); < rest of loop omitted > }} Note that the value of temp is invariant to j. Thus, we can hoist the computation for temp out of the loop and save its values in an array. for (i = 0; i < M; i++) { r = i * hrmax; TEMP[i] = pow(r, PRM[3]); } [N.B. – the case for PRM[3] = 0 is omitted and will be reintroduced later.] We now hoist out of the inner loop the computations invariant to i. Since the conditional guarding the value of kap is invariant to i, it behooves us to hoist the computation out of the inner loop, thereby executing the guard once rather than M times. The final version of the code is for (j = 0; j < N; j++) { R = rig[j] / 1000.; tmp1 = kcoeff * par[2] * beta[j] * par[4]; tmp2 = 1.0 + (par[4] * par[4] * par[6] * par[6]); tmp3 = 1.0 + (par[4] * par[4] * R * R); tmp4 = par[6] * par[6] / tmp2; tmp5 = R * R / tmp3; tmp6 = pow(R / par[6], par[5]); if ((par[3] == 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp5; } else if ((par[3] == 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * tmp4 * tmp6; } else if ((par[3] != 0.0) && (R > par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp5; } else if ((par[3] != 0.0) && (R <= par[6])) { for (i = 1; i <= imax1; i++) KAP[i] = tmp1 * TEMP[i] * tmp4 * tmp6; } for (i = 0; i < M; i++) { kap = KAP[i]; r = i * hrmax; < rest of loop omitted > } } Maybe not the prettiest piece of code, but certainly much more efficient than the original loop, Copy operations Several programs unnecessarily copy data from one data structure to another. This problem occurs in both Fortran and C programs, although it manifests itself differently in the two languages. Code 1 declares two arrays—one for old values and one for new values. At the end of each iteration, the array of new values is copied to the array of old values to reset the data structures for the next iteration. This problem occurs in Fortran programs not included in this study and in both Fortran 77 and Fortran 90 code. Introducing pointers to the arrays and swapping pointer values is an obvious way to eliminate the copying; but pointers is not a feature that many Fortran programmers know well or are comfortable using. An easy solution not involving pointers is to extend the dimension of the value array by 1 and use the last dimension to differentiate between arrays at different times. For example, if the data space is N x N, declare the array (N, N, 2). Then store the problem’s initial values in (_, _, 2) and define the scalar names new = 2 and old = 1. At the start of each iteration, swap old and new to reset the arrays. The old–new copy problem did not appear in any C program. In programs that had new and old values, the code swapped pointers to reset data structures. Where unnecessary coping did occur is in structure assignment and parameter passing. Structures in C are handled much like scalars. Assignment causes the data space of the right-hand name to be copied to the data space of the left-hand name. Similarly, when a structure is passed to a function, the data space of the actual parameter is copied to the data space of the formal parameter. If the structure is large and the assignment or function call is in an inner loop, then copying costs can grow quite large. While none of the ten programs considered here manifested this problem, it did occur in programs not included in the study. A simple fix is always to refer to structures via pointers. Optimizing loop structures Since scientific programs spend almost all their time in loops, efficient loops are the key to good performance. Conditionals, function calls, little instruction level parallelism, and large numbers of temporary values make it difficult for the compiler to generate tightly packed, highly efficient code. Conditionals and function calls introduce jumps that disrupt code flow. Users should eliminate or isolate conditionls to their own loops as much as possible. Often logical expressions can be substituted for if-then-else statements. For example, code 2 includes the following snippet MaxDelta = 0.0 do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) if (Delta > MaxDelta) MaxDelta = Delta enddo enddo if (MaxDelta .gt. 0.001) goto 200 Since the only use of MaxDelta is to control the jump to 200 and all that matters is whether or not it is greater than 0.001, I made MaxDelta a boolean and rewrote the snippet as MaxDelta = .false. do J = 1, N do I = 1, M < code omitted > Delta = abs(OldValue ? NewValue) MaxDelta = MaxDelta .or. (Delta .gt. 0.001) enddo enddo if (MaxDelta) goto 200 thereby, eliminating the conditional expression from the inner loop. A microprocessor can execute many instructions per instruction cycle. Typically, it can execute one or more memory, floating point, integer, and jump operations. To be executed simultaneously, the operations must be independent. Thick loops tend to have more instruction level parallelism than thin loops. Moreover, they reduce memory traffice by maximizing data reuse. Loop unrolling and loop fusion are two techniques to increase the size of loop bodies. Several of the codes studied benefitted from loop unrolling, but none benefitted from loop fusion. This observation is not too surpising since it is the general tendency of programmers to write thick loops. As loops become thicker, the number of temporary values grows, increasing register pressure. If registers spill, then memory traffic increases and code flow is disrupted. A thick loop with many temporary values may execute slower than an equivalent series of thin loops. The biggest gain will be achieved if the thick loop can be split into a series of independent loops eliminating the need to write and read temporary arrays. I found such an occasion in code 10 where I split the loop do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do into two disjoint loops do i = 1, n do j = 1, m A24(j,i)= S24(j,i) * T24(j,i) + S25(j,i) * U25(j,i) B24(j,i)= S24(j,i) * T25(j,i) + S25(j,i) * U24(j,i) A25(j,i)= S24(j,i) * C24(j,i) + S25(j,i) * V24(j,i) B25(j,i)= S24(j,i) * U25(j,i) + S25(j,i) * V25(j,i) end do end do do i = 1, n do j = 1, m C24(j,i)= S26(j,i) * T26(j,i) + S27(j,i) * U26(j,i) D24(j,i)= S26(j,i) * T27(j,i) + S27(j,i) * V26(j,i) C25(j,i)= S27(j,i) * S28(j,i) + S26(j,i) * U28(j,i) D25(j,i)= S27(j,i) * T28(j,i) + S26(j,i) * V28(j,i) end do end do Conclusions Over the course of the last year, I have had the opportunity to work with over two dozen academic scientific programmers at leading research universities. Their research interests span a broad range of scientific fields. Except for two programs that relied almost exclusively on library routines (matrix multiply and fast Fourier transform), I was able to improve significantly the single processor performance of all codes. Improvements range from 2x to 15.5x with a simple average of 4.75x. Changes to the source code were at a very high level. I did not use sophisticated techniques or programming tools to discover inefficiencies or effect the changes. Only one code was parallel despite the availability of parallel systems to all developers. Clearly, we have a problem—personal scientific research codes are highly inefficient and not running parallel. The developers are unaware of simple optimization techniques to make programs run faster. They lack education in the art of code optimization and parallel programming. I do not believe we can fix the problem by publishing additional books or training manuals. To date, the developers in questions have not studied the books or manual available, and are unlikely to do so in the future. Short courses are a possible solution, but I believe they are too concentrated to be much use. The general concepts can be taught in a three or four day course, but that is not enough time for students to practice what they learn and acquire the experience to apply and extend the concepts to their codes. Practice is the key to becoming proficient at optimization. I recommend that graduate students be required to take a semester length course in optimization and parallel programming. We would never give someone access to state-of-the-art scientific equipment costing hundreds of thousands of dollars without first requiring them to demonstrate that they know how to use the equipment. Yet the criterion for time on state-of-the-art supercomputers is at most an interesting project. Requestors are never asked to demonstrate that they know how to use the system, or can use the system effectively. A semester course would teach them the required skills. Government agencies that fund academic scientific research pay for most of the computer systems supporting scientific research as well as the development of most personal scientific codes. These agencies should require graduate schools to offer a course in optimization and parallel programming as a requirement for funding. About the Author John Feo received his Ph.D. in Computer Science from The University of Texas at Austin in 1986. After graduate school, Dr. Feo worked at Lawrence Livermore National Laboratory where he was the Group Leader of the Computer Research Group and principal investigator of the Sisal Language Project. In 1997, Dr. Feo joined Tera Computer Company where he was project manager for the MTA, and oversaw the programming and evaluation of the MTA at the San Diego Supercomputer Center. In 2000, Dr. Feo joined Sun Microsystems as an HPC application specialist. He works with university research groups to optimize and parallelize scientific codes. Dr. Feo has published over two dozen research articles in the areas of parallel parallel programming, parallel programming languages, and application performance.

Read the article

Search Results

Search found 114 results on 5 pages for 'phd'.

Page 5/5 | < Previous Page | 1 2 3 4 5

- by kaikanmonaco

- by B.Koch

- by Laila

- by oxbow_lakes

- by Janice J. Heiss

- by Janice J. Heiss

- by Mike F

- by Iwona

- by Jaanus

- by user701213

- by danx

- by Martin Hinshelwood

- by rchrd

< Previous Page | 1 2 3 4 5