Search Results

Search found 5221 results on 209 pages for 'low latency'.

Page 170/209 | < Previous Page | 166 167 168 169 170 171 172 173 174 175 176 177 | Next Page >

Very high Magento/Apache memory usage even without visitors (are we fooled by our hosting company?)

- by MrDobalina

I am no server guy and we have issues with our speed so I come here asking for advise. We have a VPS with 2 cores and 2gb of RAM at a Magento specialized hosting company. Over the course of the last weeks our site speed has gotten worse, even though our store is new, has less than 1000 SKUs and not even 100 visitos a day. At magespeedtest.com we only get 1.87 trans/sec @ 2.11 secs each with a mere 5 concurrent users. Our magento log files are clean, we have no huge database tables or anything like that. When we take a look at our server real time stats, we see that the memory usage jumped up from about 34% to 71% and now 82% in just a few days in idle, with no visitors on the site. Our hosting company said that we do not need to worry about that as it`s maybe related to mysql which creates buffers (which are maybe not even actually being used) and what is important is CPU and swap - stats are ok here. They also said that the low benchmark scores are caused by bad extensions or template modifications on our side. We are not sure if we can trust that statement as we only have 4 plugins installed (all from aheadworks and amasty which are known to be one of the best magento extension developers). Our template modifications are purely html and css, no modifications to the php code. Our pagespeed is ranked with 93/100 in firebug and Magento is properly configured, so the problem really just gets obvious when there are a handful of users on the site at the same time. Can anyone confirm our hosting`s statement about memory usage and where can I start looking for a solution?

Read the article
Raspberry Pi how to format HDD

- by Speed

Hi I am very new to Raspberry Pi environment, so looking for a bit of help to format a usb hard disk drive. I ran lsblk and got sda 8:0 0 37.3G 0 disk sda1 8:1 0 37.3G 0 part looking on web, if tried the following "sudo mkfs.ext4 /dev/sda1 -L USB40gb" it did something but when I tried to mount the drive again, it still showed the files that were there before and I can not create new file/folder "Error creating directory: Permission denied" I am writing this from my windows 8.1 pc so can not cut and paste from the pi. trying to format its output is a bit hard. Oh, there is Nothing written after the word "part" above. There use to be /media/USB40gb so I have done something because this has disappeared. I am using PCManFM 0.9.10 It does not have a format option, which would make life a lot easier, but then its not windows. I think I am running the basic linux os for the pi. It boots to a graphic environment, but I do not know how to advise what it is. I think its OpenBox 2.0.4 Thanks in advance Speed PS: I reran the format string above but this time I changed the label to read USB37gb. I did this to confirm that I was in fact formatting the right drive. Low and behold, it actually formatted the drive, wiping everything from it. Great ... testing it by creating a new folder on the drive and get error msg Permission Denied! So I have fixed the formatting issue by trial and error but still can't use the drive... Suggestions anyone?

Read the article
value types in the vm

- by john.rose

value types in the vm p.p1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times} p.p2 {margin: 0.0px 0.0px 14.0px 0.0px; font: 14.0px Times} p.p3 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times} p.p4 {margin: 0.0px 0.0px 15.0px 0.0px; font: 14.0px Times} p.p5 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier} p.p6 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Courier; min-height: 17.0px} p.p7 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times; min-height: 18.0px} p.p8 {margin: 0.0px 0.0px 0.0px 36.0px; text-indent: -36.0px; font: 14.0px Times; min-height: 18.0px} p.p9 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times; min-height: 18.0px} p.p10 {margin: 0.0px 0.0px 12.0px 0.0px; font: 14.0px Times; color: #000000} li.li1 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times} li.li7 {margin: 0.0px 0.0px 0.0px 0.0px; font: 14.0px Times; min-height: 18.0px} span.s1 {font: 14.0px Courier} span.s2 {color: #000000} span.s3 {font: 14.0px Courier; color: #000000} ol.ol1 {list-style-type: decimal} Or, enduring values for a changing world. Introduction A value type is a data type which, generally speaking, is designed for being passed by value in and out of methods, and stored by value in data structures. The only value types which the Java language directly supports are the eight primitive types. Java indirectly and approximately supports value types, if they are implemented in terms of classes. For example, both Integer and String may be viewed as value types, especially if their usage is restricted to avoid operations appropriate to Object. In this note, we propose a definition of value types in terms of a design pattern for Java classes, accompanied by a set of usage restrictions. We also sketch the relation of such value types to tuple types (which are a JVM-level notion), and point out JVM optimizations that can apply to value types. This note is a thought experiment to extend the JVM’s performance model in support of value types. The demonstration has two phases. Initially the extension can simply use design patterns, within the current bytecode architecture, and in today’s Java language. But if the performance model is to be realized in practice, it will probably require new JVM bytecode features, changes to the Java language, or both. We will look at a few possibilities for these new features. An Axiom of Value In the context of the JVM, a value type is a data type equipped with construction, assignment, and equality operations, and a set of typed components, such that, whenever two variables of the value type produce equal corresponding values for their components, the values of the two variables cannot be distinguished by any JVM operation. Here are some corollaries: A value type is immutable, since otherwise a copy could be constructed and the original could be modified in one of its components, allowing the copies to be distinguished. Changing the component of a value type requires construction of a new value. The equals and hashCode operations are strictly component-wise. If a value type is represented by a JVM reference, that reference cannot be successfully synchronized on, and cannot be usefully compared for reference equality. A value type can be viewed in terms of what it doesn’t do. We can say that a value type omits all value-unsafe operations, which could violate the constraints on value types. These operations, which are ordinarily allowed for Java object types, are pointer equality comparison (the acmp instruction), synchronization (the monitor instructions), all the wait and notify methods of class Object, and non-trivial finalize methods. The clone method is also value-unsafe, although for value types it could be treated as the identity function. Finally, and most importantly, any side effect on an object (however visible) also counts as an value-unsafe operation. A value type may have methods, but such methods must not change the components of the value. It is reasonable and useful to define methods like toString, equals, and hashCode on value types, and also methods which are specifically valuable to users of the value type. Representations of Value Value types have two natural representations in the JVM, unboxed and boxed. An unboxed value consists of the components, as simple variables. For example, the complex number x=(1+2i), in rectangular coordinate form, may be represented in unboxed form by the following pair of variables: /*Complex x = Complex.valueOf(1.0, 2.0):*/ double x_re = 1.0, x_im = 2.0; These variables might be locals, parameters, or fields. Their association as components of a single value is not defined to the JVM. Here is a sample computation which computes the norm of the difference between two complex numbers: double distance(/*Complex x:*/ double x_re, double x_im, /*Complex y:*/ double y_re, double y_im) { /*Complex z = x.minus(y):*/ double z_re = x_re - y_re, z_im = x_im - y_im; /*return z.abs():*/ return Math.sqrt(z_re*z_re + z_im*z_im); } A boxed representation groups component values under a single object reference. The reference is to a ‘wrapper class’ that carries the component values in its fields. (A primitive type can naturally be equated with a trivial value type with just one component of that type. In that view, the wrapper class Integer can serve as a boxed representation of value type int.) The unboxed representation of complex numbers is practical for many uses, but it fails to cover several major use cases: return values, array elements, and generic APIs. The two components of a complex number cannot be directly returned from a Java function, since Java does not support multiple return values. The same story applies to array elements: Java has no ’array of structs’ feature. (Double-length arrays are a possible workaround for complex numbers, but not for value types with heterogeneous components.) By generic APIs I mean both those which use generic types, like Arrays.asList and those which have special case support for primitive types, like String.valueOf and PrintStream.println. Those APIs do not support unboxed values, and offer some problems to boxed values. Any ’real’ JVM type should have a story for returns, arrays, and API interoperability. The basic problem here is that value types fall between primitive types and object types. Value types are clearly more complex than primitive types, and object types are slightly too complicated. Objects are a little bit dangerous to use as value carriers, since object references can be compared for pointer equality, and can be synchronized on. Also, as many Java programmers have observed, there is often a performance cost to using wrapper objects, even on modern JVMs. Even so, wrapper classes are a good starting point for talking about value types. If there were a set of structural rules and restrictions which would prevent value-unsafe operations on value types, wrapper classes would provide a good notation for defining value types. This note attempts to define such rules and restrictions. Let’s Start Coding Now it is time to look at some real code. Here is a definition, written in Java, of a complex number value type. @ValueSafe public final class Complex implements java.io.Serializable { // immutable component structure: public final double re, im; private Complex(double re, double im) { this.re = re; this.im = im; } // interoperability methods: public String toString() { return "Complex("+re+","+im+")"; } public List<Double> asList() { return Arrays.asList(re, im); } public boolean equals(Complex c) { return re == c.re && im == c.im; } public boolean equals(@ValueSafe Object x) { return x instanceof Complex && equals((Complex) x); } public int hashCode() { return 31*Double.valueOf(re).hashCode() + Double.valueOf(im).hashCode(); } // factory methods: public static Complex valueOf(double re, double im) { return new Complex(re, im); } public Complex changeRe(double re2) { return valueOf(re2, im); } public Complex changeIm(double im2) { return valueOf(re, im2); } public static Complex cast(@ValueSafe Object x) { return x == null ? ZERO : (Complex) x; } // utility methods and constants: public Complex plus(Complex c) { return new Complex(re+c.re, im+c.im); } public Complex minus(Complex c) { return new Complex(re-c.re, im-c.im); } public double abs() { return Math.sqrt(re*re + im*im); } public static final Complex PI = valueOf(Math.PI, 0.0); public static final Complex ZERO = valueOf(0.0, 0.0); } This is not a minimal definition, because it includes some utility methods and other optional parts. The essential elements are as follows: The class is marked as a value type with an annotation. The class is final, because it does not make sense to create subclasses of value types. The fields of the class are all non-private and final. (I.e., the type is immutable and structurally transparent.) From the supertype Object, all public non-final methods are overridden. The constructor is private. Beyond these bare essentials, we can observe the following features in this example, which are likely to be typical of all value types: One or more factory methods are responsible for value creation, including a component-wise valueOf method. There are utility methods for complex arithmetic and instance creation, such as plus and changeIm. There are static utility constants, such as PI. The type is serializable, using the default mechanisms. There are methods for converting to and from dynamically typed references, such as asList and cast. The Rules In order to use value types properly, the programmer must avoid value-unsafe operations. A helpful Java compiler should issue errors (or at least warnings) for code which provably applies value-unsafe operations, and should issue warnings for code which might be correct but does not provably avoid value-unsafe operations. No such compilers exist today, but to simplify our account here, we will pretend that they do exist. A value-safe type is any class, interface, or type parameter marked with the @ValueSafe annotation, or any subtype of a value-safe type. If a value-safe class is marked final, it is in fact a value type. All other value-safe classes must be abstract. The non-static fields of a value class must be non-public and final, and all its constructors must be private. Under the above rules, a standard interface could be helpful to define value types like Complex. Here is an example: @ValueSafe public interface ValueType extends java.io.Serializable { // All methods listed here must get redefined. // Definitions must be value-safe, which means // they may depend on component values only. List<? extends Object> asList(); int hashCode(); boolean equals(@ValueSafe Object c); String toString(); } //@ValueSafe inherited from supertype: public final class Complex implements ValueType { … The main advantage of such a conventional interface is that (unlike an annotation) it is reified in the runtime type system. It could appear as an element type or parameter bound, for facilities which are designed to work on value types only. More broadly, it might assist the JVM to perform dynamic enforcement of the rules for value types. Besides types, the annotation @ValueSafe can mark fields, parameters, local variables, and methods. (This is redundant when the type is also value-safe, but may be useful when the type is Object or another supertype of a value type.) Working forward from these annotations, an expression E is defined as value-safe if it satisfies one or more of the following: The type of E is a value-safe type. E names a field, parameter, or local variable whose declaration is marked @ValueSafe. E is a call to a method whose declaration is marked @ValueSafe. E is an assignment to a value-safe variable, field reference, or array reference. E is a cast to a value-safe type from a value-safe expression. E is a conditional expression E0 ? E1 : E2, and both E1 and E2 are value-safe. Assignments to value-safe expressions and initializations of value-safe names must take their values from value-safe expressions. A value-safe expression may not be the subject of a value-unsafe operation. In particular, it cannot be synchronized on, nor can it be compared with the “==” operator, not even with a null or with another value-safe type. In a program where all of these rules are followed, no value-type value will be subject to a value-unsafe operation. Thus, the prime axiom of value types will be satisfied, that no two value type will be distinguishable as long as their component values are equal. More Code To illustrate these rules, here are some usage examples for Complex: Complex pi = Complex.valueOf(Math.PI, 0); Complex zero = pi.changeRe(0); //zero = pi; zero.re = 0; ValueType vtype = pi; @SuppressWarnings("value-unsafe") Object obj = pi; @ValueSafe Object obj2 = pi; obj2 = new Object(); // ok List<Complex> clist = new ArrayList<Complex>(); clist.add(pi); // (ok assuming List.add param is @ValueSafe) List<ValueType> vlist = new ArrayList<ValueType>(); vlist.add(pi); // (ok) List<Object> olist = new ArrayList<Object>(); olist.add(pi); // warning: "value-unsafe" boolean z = pi.equals(zero); boolean z1 = (pi == zero); // error: reference comparison on value type boolean z2 = (pi == null); // error: reference comparison on value type boolean z3 = (pi == obj2); // error: reference comparison on value type synchronized (pi) { } // error: synch of value, unpredictable result synchronized (obj2) { } // unpredictable result Complex qq = pi; qq = null; // possible NPE; warning: “null-unsafe" qq = (Complex) obj; // warning: “null-unsafe" qq = Complex.cast(obj); // OK @SuppressWarnings("null-unsafe") Complex empty = null; // possible NPE qq = empty; // possible NPE (null pollution) The Payoffs It follows from this that either the JVM or the java compiler can replace boxed value-type values with unboxed ones, without affecting normal computations. Fields and variables of value types can be split into their unboxed components. Non-static methods on value types can be transformed into static methods which take the components as value parameters. Some common questions arise around this point in any discussion of value types. Why burden the programmer with all these extra rules? Why not detect programs automagically and perform unboxing transparently? The answer is that it is easy to break the rules accidently unless they are agreed to by the programmer and enforced. Automatic unboxing optimizations are tantalizing but (so far) unreachable ideal. In the current state of the art, it is possible exhibit benchmarks in which automatic unboxing provides the desired effects, but it is not possible to provide a JVM with a performance model that assures the programmer when unboxing will occur. This is why I’m writing this note, to enlist help from, and provide assurances to, the programmer. Basically, I’m shooting for a good set of user-supplied “pragmas” to frame the desired optimization. Again, the important thing is that the unboxing must be done reliably, or else programmers will have no reason to work with the extra complexity of the value-safety rules. There must be a reasonably stable performance model, wherein using a value type has approximately the same performance characteristics as writing the unboxed components as separate Java variables. There are some rough corners to the present scheme. Since Java fields and array elements are initialized to null, value-type computations which incorporate uninitialized variables can produce null pointer exceptions. One workaround for this is to require such variables to be null-tested, and the result replaced with a suitable all-zero value of the value type. That is what the “cast” method does above. Generically typed APIs like List<T> will continue to manipulate boxed values always, at least until we figure out how to do reification of generic type instances. Use of such APIs will elicit warnings until their type parameters (and/or relevant members) are annotated or typed as value-safe. Retrofitting List<T> is likely to expose flaws in the present scheme, which we will need to engineer around. Here are a couple of first approaches: public interface java.util.List<@ValueSafe T> extends Collection<T> { … public interface java.util.List<T extends Object|ValueType> extends Collection<T> { … (The second approach would require disjunctive types, in which value-safety is “contagious” from the constituent types.) With more transformations, the return value types of methods can also be unboxed. This may require significant bytecode-level transformations, and would work best in the presence of a bytecode representation for multiple value groups, which I have proposed elsewhere under the title “Tuples in the VM”. But for starters, the JVM can apply this transformation under the covers, to internally compiled methods. This would give a way to express multiple return values and structured return values, which is a significant pain-point for Java programmers, especially those who work with low-level structure types favored by modern vector and graphics processors. The lack of multiple return values has a strong distorting effect on many Java APIs. Even if the JVM fails to unbox a value, there is still potential benefit to the value type. Clustered computing systems something have copy operations (serialization or something similar) which apply implicitly to command operands. When copying JVM objects, it is extremely helpful to know when an object’s identity is important or not. If an object reference is a copied operand, the system may have to create a proxy handle which points back to the original object, so that side effects are visible. Proxies must be managed carefully, and this can be expensive. On the other hand, value types are exactly those types which a JVM can “copy and forget” with no downside. Array types are crucial to bulk data interfaces. (As data sizes and rates increase, bulk data becomes more important than scalar data, so arrays are definitely accompanying us into the future of computing.) Value types are very helpful for adding structure to bulk data, so a successful value type mechanism will make it easier for us to express richer forms of bulk data. Unboxing arrays (i.e., arrays containing unboxed values) will provide better cache and memory density, and more direct data movement within clustered or heterogeneous computing systems. They require the deepest transformations, relative to today’s JVM. There is an impedance mismatch between value-type arrays and Java’s covariant array typing, so compromises will need to be struck with existing Java semantics. It is probably worth the effort, since arrays of unboxed value types are inherently more memory-efficient than standard Java arrays, which rely on dependent pointer chains. It may be sufficient to extend the “value-safe” concept to array declarations, and allow low-level transformations to change value-safe array declarations from the standard boxed form into an unboxed tuple-based form. Such value-safe arrays would not be convertible to Object[] arrays. Certain connection points, such as Arrays.copyOf and System.arraycopy might need additional input/output combinations, to allow smooth conversion between arrays with boxed and unboxed elements. Alternatively, the correct solution may have to wait until we have enough reification of generic types, and enough operator overloading, to enable an overhaul of Java arrays. Implicit Method Definitions The example of class Complex above may be unattractively complex. I believe most or all of the elements of the example class are required by the logic of value types. If this is true, a programmer who writes a value type will have to write lots of error-prone boilerplate code. On the other hand, I think nearly all of the code (except for the domain-specific parts like plus and minus) can be implicitly generated. Java has a rule for implicitly defining a class’s constructor, if no it defines no constructors explicitly. Likewise, there are rules for providing default access modifiers for interface members. Because of the highly regular structure of value types, it might be reasonable to perform similar implicit transformations on value types. Here’s an example of a “highly implicit” definition of a complex number type: public class Complex implements ValueType { // implicitly final public double re, im; // implicitly public final //implicit methods are defined elementwise from te fields: // toString, asList, equals(2), hashCode, valueOf, cast //optionally, explicit methods (plus, abs, etc.) would go here } In other words, with the right defaults, a simple value type definition can be a one-liner. The observant reader will have noticed the similarities (and suitable differences) between the explicit methods above and the corresponding methods for List<T>. Another way to abbreviate such a class would be to make an annotation the primary trigger of the functionality, and to add the interface(s) implicitly: public @ValueType class Complex { … // implicitly final, implements ValueType (But to me it seems better to communicate the “magic” via an interface, even if it is rooted in an annotation.) Implicitly Defined Value Types So far we have been working with nominal value types, which is to say that the sequence of typed components is associated with a name and additional methods that convey the intention of the programmer. A simple ordered pair of floating point numbers can be variously interpreted as (to name a few possibilities) a rectangular or polar complex number or Cartesian point. The name and the methods convey the intended meaning. But what if we need a truly simple ordered pair of floating point numbers, without any further conceptual baggage? Perhaps we are writing a method (like “divideAndRemainder”) which naturally returns a pair of numbers instead of a single number. Wrapping the pair of numbers in a nominal type (like “QuotientAndRemainder”) makes as little sense as wrapping a single return value in a nominal type (like “Quotient”). What we need here are structural value types commonly known as tuples. For the present discussion, let us assign a conventional, JVM-friendly name to tuples, roughly as follows: public class java.lang.tuple.$DD extends java.lang.tuple.Tuple { double $1, $2; } Here the component names are fixed and all the required methods are defined implicitly. The supertype is an abstract class which has suitable shared declarations. The name itself mentions a JVM-style method parameter descriptor, which may be “cracked” to determine the number and types of the component fields. The odd thing about such a tuple type (and structural types in general) is it must be instantiated lazily, in response to linkage requests from one or more classes that need it. The JVM and/or its class loaders must be prepared to spin a tuple type on demand, given a simple name reference, $xyz, where the xyz is cracked into a series of component types. (Specifics of naming and name mangling need some tasteful engineering.) Tuples also seem to demand, even more than nominal types, some support from the language. (This is probably because notations for non-nominal types work best as combinations of punctuation and type names, rather than named constructors like Function3 or Tuple2.) At a minimum, languages with tuples usually (I think) have some sort of simple bracket notation for creating tuples, and a corresponding pattern-matching syntax (or “destructuring bind”) for taking tuples apart, at least when they are parameter lists. Designing such a syntax is no simple thing, because it ought to play well with nominal value types, and also with pre-existing Java features, such as method parameter lists, implicit conversions, generic types, and reflection. That is a task for another day. Other Use Cases Besides complex numbers and simple tuples there are many use cases for value types. Many tuple-like types have natural value-type representations. These include rational numbers, point locations and pixel colors, and various kinds of dates and addresses. Other types have a variable-length ‘tail’ of internal values. The most common example of this is String, which is (mathematically) a sequence of UTF-16 character values. Similarly, bit vectors, multiple-precision numbers, and polynomials are composed of sequences of values. Such types include, in their representation, a reference to a variable-sized data structure (often an array) which (somehow) represents the sequence of values. The value type may also include ’header’ information. Variable-sized values often have a length distribution which favors short lengths. In that case, the design of the value type can make the first few values in the sequence be direct ’header’ fields of the value type. In the common case where the header is enough to represent the whole value, the tail can be a shared null value, or even just a null reference. Note that the tail need not be an immutable object, as long as the header type encapsulates it well enough. This is the case with String, where the tail is a mutable (but never mutated) character array. Field types and their order must be a globally visible part of the API. The structure of the value type must be transparent enough to have a globally consistent unboxed representation, so that all callers and callees agree about the type and order of components that appear as parameters, return types, and array elements. This is a trade-off between efficiency and encapsulation, which is forced on us when we remove an indirection enjoyed by boxed representations. A JVM-only transformation would not care about such visibility, but a bytecode transformation would need to take care that (say) the components of complex numbers would not get swapped after a redefinition of Complex and a partial recompile. Perhaps constant pool references to value types need to declare the field order as assumed by each API user. This brings up the delicate status of private fields in a value type. It must always be possible to load, store, and copy value types as coordinated groups, and the JVM performs those movements by moving individual scalar values between locals and stack. If a component field is not public, what is to prevent hostile code from plucking it out of the tuple using a rogue aload or astore instruction? Nothing but the verifier, so we may need to give it more smarts, so that it treats value types as inseparable groups of stack slots or locals (something like long or double). My initial thought was to make the fields always public, which would make the security problem moot. But public is not always the right answer; consider the case of String, where the underlying mutable character array must be encapsulated to prevent security holes. I believe we can win back both sides of the tradeoff, by training the verifier never to split up the components in an unboxed value. Just as the verifier encapsulates the two halves of a 64-bit primitive, it can encapsulate the the header and body of an unboxed String, so that no code other than that of class String itself can take apart the values. Similar to String, we could build an efficient multi-precision decimal type along these lines: public final class DecimalValue extends ValueType { protected final long header; protected private final BigInteger digits; public DecimalValue valueOf(int value, int scale) { assert(scale >= 0); return new DecimalValue(((long)value << 32) + scale, null); } public DecimalValue valueOf(long value, int scale) { if (value == (int) value) return valueOf((int)value, scale); return new DecimalValue(-scale, new BigInteger(value)); } } Values of this type would be passed between methods as two machine words. Small values (those with a significand which fits into 32 bits) would be represented without any heap data at all, unless the DecimalValue itself were boxed. (Note the tension between encapsulation and unboxing in this case. It would be better if the header and digits fields were private, but depending on where the unboxing information must “leak”, it is probably safer to make a public revelation of the internal structure.) Note that, although an array of Complex can be faked with a double-length array of double, there is no easy way to fake an array of unboxed DecimalValues. (Either an array of boxed values or a transposed pair of homogeneous arrays would be reasonable fallbacks, in a current JVM.) Getting the full benefit of unboxing and arrays will require some new JVM magic. Although the JVM emphasizes portability, system dependent code will benefit from using machine-level types larger than 64 bits. For example, the back end of a linear algebra package might benefit from value types like Float4 which map to stock vector types. This is probably only worthwhile if the unboxing arrays can be packed with such values. More Daydreams A more finely-divided design for dynamic enforcement of value safety could feature separate marker interfaces for each invariant. An empty marker interface Unsynchronizable could cause suitable exceptions for monitor instructions on objects in marked classes. More radically, a Interchangeable marker interface could cause JVM primitives that are sensitive to object identity to raise exceptions; the strangest result would be that the acmp instruction would have to be specified as raising an exception. @ValueSafe public interface ValueType extends java.io.Serializable, Unsynchronizable, Interchangeable { … public class Complex implements ValueType { // inherits Serializable, Unsynchronizable, Interchangeable, @ValueSafe … It seems possible that Integer and the other wrapper types could be retro-fitted as value-safe types. This is a major change, since wrapper objects would be unsynchronizable and their references interchangeable. It is likely that code which violates value-safety for wrapper types exists but is uncommon. It is less plausible to retro-fit String, since the prominent operation String.intern is often used with value-unsafe code. We should also reconsider the distinction between boxed and unboxed values in code. The design presented above obscures that distinction. As another thought experiment, we could imagine making a first class distinction in the type system between boxed and unboxed representations. Since only primitive types are named with a lower-case initial letter, we could define that the capitalized version of a value type name always refers to the boxed representation, while the initial lower-case variant always refers to boxed. For example: complex pi = complex.valueOf(Math.PI, 0); Complex boxPi = pi; // convert to boxed myList.add(boxPi); complex z = myList.get(0); // unbox Such a convention could perhaps absorb the current difference between int and Integer, double and Double. It might also allow the programmer to express a helpful distinction among array types. As said above, array types are crucial to bulk data interfaces, but are limited in the JVM. Extending arrays beyond the present limitations is worth thinking about; for example, the Maxine JVM implementation has a hybrid object/array type. Something like this which can also accommodate value type components seems worthwhile. On the other hand, does it make sense for value types to contain short arrays? And why should random-access arrays be the end of our design process, when bulk data is often sequentially accessed, and it might make sense to have heterogeneous streams of data as the natural “jumbo” data structure. These considerations must wait for another day and another note. More Work It seems to me that a good sequence for introducing such value types would be as follows: Add the value-safety restrictions to an experimental version of javac. Code some sample applications with value types, including Complex and DecimalValue. Create an experimental JVM which internally unboxes value types but does not require new bytecodes to do so. Ensure the feasibility of the performance model for the sample applications. Add tuple-like bytecodes (with or without generic type reification) to a major revision of the JVM, and teach the Java compiler to switch in the new bytecodes without code changes. A staggered roll-out like this would decouple language changes from bytecode changes, which is always a convenient thing. A similar investigation should be applied (concurrently) to array types. In this case, it seems to me that the starting point is in the JVM: Add an experimental unboxing array data structure to a production JVM, perhaps along the lines of Maxine hybrids. No bytecode or language support is required at first; everything can be done with encapsulated unsafe operations and/or method handles. Create an experimental JVM which internally unboxes value types but does not require new bytecodes to do so. Ensure the feasibility of the performance model for the sample applications. Add tuple-like bytecodes (with or without generic type reification) to a major revision of the JVM, and teach the Java compiler to switch in the new bytecodes without code changes. That’s enough musing me for now. Back to work!

Read the article
256 Windows Azure Worker Roles, Windows Kinect and a 90's Text-Based Ray-Tracer

- by Alan Smith

For a couple of years I have been demoing a simple render farm hosted in Windows Azure using worker roles and the Azure Storage service. At the start of the presentation I deploy an Azure application that uses 16 worker roles to render a 1,500 frame 3D ray-traced animation. At the end of the presentation, when the animation was complete, I would play the animation delete the Azure deployment. The standing joke with the audience was that it was that it was a “$2 demo”, as the compute charges for running the 16 instances for an hour was $1.92, factor in the bandwidth charges and it’s a couple of dollars. The point of the demo is that it highlights one of the great benefits of cloud computing, you pay for what you use, and if you need massive compute power for a short period of time using Windows Azure can work out very cost effective. The “$2 demo” was great for presenting at user groups and conferences in that it could be deployed to Azure, used to render an animation, and then removed in a one hour session. I have always had the idea of doing something a bit more impressive with the demo, and scaling it from a “$2 demo” to a “$30 demo”. The challenge was to create a visually appealing animation in high definition format and keep the demo time down to one hour. This article will take a run through how I achieved this. Ray Tracing Ray tracing, a technique for generating high quality photorealistic images, gained popularity in the 90’s with companies like Pixar creating feature length computer animations, and also the emergence of shareware text-based ray tracers that could run on a home PC. In order to render a ray traced image, the ray of light that would pass from the view point must be tracked until it intersects with an object. At the intersection, the color, reflectiveness, transparency, and refractive index of the object are used to calculate if the ray will be reflected or refracted. Each pixel may require thousands of calculations to determine what color it will be in the rendered image. Pin-Board Toys Having very little artistic talent and a basic understanding of maths I decided to focus on an animation that could be modeled fairly easily and would look visually impressive. I’ve always liked the pin-board desktop toys that become popular in the 80’s and when I was working as a 3D animator back in the 90’s I always had the idea of creating a 3D ray-traced animation of a pin-board, but never found the energy to do it. Even if I had a go at it, the render time to produce an animation that would look respectable on a 486 would have been measured in months. PolyRay Back in 1995 I landed my first real job, after spending three years being a beach-ski-climbing-paragliding-bum, and was employed to create 3D ray-traced animations for a CD-ROM that school kids would use to learn physics. I had got into the strange and wonderful world of text-based ray tracing, and was using a shareware ray-tracer called PolyRay. PolyRay takes a text file describing a scene as input and, after a few hours processing on a 486, produced a high quality ray-traced image. The following is an example of a basic PolyRay scene file. background Midnight_Blue static define matte surface { ambient 0.1 diffuse 0.7 } define matte_white texture { matte { color white } } define matte_black texture { matte { color dark_slate_gray } } define position_cylindrical 3 define lookup_sawtooth 1 define light_wood <0.6, 0.24, 0.1> define median_wood <0.3, 0.12, 0.03> define dark_wood <0.05, 0.01, 0.005> define wooden texture { noise surface { ambient 0.2 diffuse 0.7 specular white, 0.5 microfacet Reitz 10 position_fn position_cylindrical position_scale 1 lookup_fn lookup_sawtooth octaves 1 turbulence 1 color_map( [0.0, 0.2, light_wood, light_wood] [0.2, 0.3, light_wood, median_wood] [0.3, 0.4, median_wood, light_wood] [0.4, 0.7, light_wood, light_wood] [0.7, 0.8, light_wood, median_wood] [0.8, 0.9, median_wood, light_wood] [0.9, 1.0, light_wood, dark_wood]) } } define glass texture { surface { ambient 0 diffuse 0 specular 0.2 reflection white, 0.1 transmission white, 1, 1.5 }} define shiny surface { ambient 0.1 diffuse 0.6 specular white, 0.6 microfacet Phong 7 } define steely_blue texture { shiny { color black } } define chrome texture { surface { color white ambient 0.0 diffuse 0.2 specular 0.4 microfacet Phong 10 reflection 0.8 } } viewpoint { from <4.000, -1.000, 1.000> at <0.000, 0.000, 0.000> up <0, 1, 0> angle 60 resolution 640, 480 aspect 1.6 image_format 0 } light <-10, 30, 20> light <-10, 30, -20> object { disc <0, -2, 0>, <0, 1, 0>, 30 wooden } object { sphere <0.000, 0.000, 0.000>, 1.00 chrome } object { cylinder <0.000, 0.000, 0.000>, <0.000, 0.000, -4.000>, 0.50 chrome } After setting up the background and defining colors and textures, the viewpoint is specified. The “camera” is located at a point in 3D space, and it looks towards another point. The angle, image resolution, and aspect ratio are specified. Two lights are present in the image at defined coordinates. The three objects in the image are a wooden disc to represent a table top, and a sphere and cylinder that intersect to form a pin that will be used for the pin board toy in the final animation. When the image is rendered, the following image is produced. The pins are modeled with a chrome surface, so they reflect the environment around them. Note that the scale of the pin shaft is not correct, this will be fixed later. Modeling the Pin Board The frame of the pin-board is made up of three boxes, and six cylinders, the front box is modeled using a clear, slightly reflective solid, with the same refractive index of glass. The other shapes are modeled as metal. object { box <-5.5, -1.5, 1>, <5.5, 5.5, 1.2> glass } object { box <-5.5, -1.5, -0.04>, <5.5, 5.5, -0.09> steely_blue } object { box <-5.5, -1.5, -0.52>, <5.5, 5.5, -0.59> steely_blue } object { cylinder <-5.2, -1.2, 1.4>, <-5.2, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <5.2, -1.2, 1.4>, <5.2, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <-5.2, 5.2, 1.4>, <-5.2, 5.2, -0.74>, 0.2 steely_blue } object { cylinder <5.2, 5.2, 1.4>, <5.2, 5.2, -0.74>, 0.2 steely_blue } object { cylinder <0, -1.2, 1.4>, <0, -1.2, -0.74>, 0.2 steely_blue } object { cylinder <0, 5.2, 1.4>, <0, 5.2, -0.74>, 0.2 steely_blue } In order to create the matrix of pins that make up the pin board I used a basic console application with a few nested loops to create two intersecting matrixes of pins, which models the layout used in the pin boards. The resulting image is shown below. The pin board contains 11,481 pins, with the scene file containing 23,709 lines of code. For the complete animation 2,000 scene files will be created, which is over 47 million lines of code. Each pin in the pin-board will slide out a specific distance when an object is pressed into the back of the board. This is easily modeled by setting the Z coordinate of the pin to a specific value. In order to set all of the pins in the pin-board to the correct position, a bitmap image can be used. The position of the pin can be set based on the color of the pixel at the appropriate position in the image. When the Windows Azure logo is used to set the Z coordinate of the pins, the following image is generated. The challenge now was to make a cool animation. The Azure Logo is fine, but it is static. Using a normal video to animate the pins would not work; the colors in the video would not be the same as the depth of the objects from the camera. In order to simulate the pin board accurately a series of frames from a depth camera could be used. Windows Kinect The Kenect controllers for the X-Box 360 and Windows feature a depth camera. The Kinect SDK for Windows provides a programming interface for Kenect, providing easy access for .NET developers to the Kinect sensors. The Kinect Explorer provided with the Kinect SDK is a great starting point for exploring Kinect from a developers perspective. Both the X-Box 360 Kinect and the Windows Kinect will work with the Kinect SDK, the Windows Kinect is required for commercial applications, but the X-Box Kinect can be used for hobby projects. The Windows Kinect has the advantage of providing a mode to allow depth capture with objects closer to the camera, which makes for a more accurate depth image for setting the pin positions. Creating a Depth Field Animation The depth field animation used to set the positions of the pin in the pin board was created using a modified version of the Kinect Explorer sample application. In order to simulate the pin board accurately, a small section of the depth range from the depth sensor will be used. Any part of the object in front of the depth range will result in a white pixel; anything behind the depth range will be black. Within the depth range the pixels in the image will be set to RGB values from 0,0,0 to 255,255,255. A screen shot of the modified Kinect Explorer application is shown below. The Kinect Explorer sample application was modified to include slider controls that are used to set the depth range that forms the image from the depth stream. This allows the fine tuning of the depth image that is required for simulating the position of the pins in the pin board. The Kinect Explorer was also modified to record a series of images from the depth camera and save them as a sequence JPEG files that will be used to animate the pins in the animation the Start and Stop buttons are used to start and stop the image recording. En example of one of the depth images is shown below. Once a series of 2,000 depth images has been captured, the task of creating the animation can begin. Rendering a Test Frame In order to test the creation of frames and get an approximation of the time required to render each frame a test frame was rendered on-premise using PolyRay. The output of the rendering process is shown below. The test frame contained 23,629 primitive shapes, most of which are the spheres and cylinders that are used for the 11,800 or so pins in the pin board. The 1280x720 image contains 921,600 pixels, but as anti-aliasing was used the number of rays that were calculated was 4,235,777, with 3,478,754,073 object boundaries checked. The test frame of the pin board with the depth field image applied is shown below. The tracing time for the test frame was 4 minutes 27 seconds, which means rendering the2,000 frames in the animation would take over 148 hours, or a little over 6 days. Although this is much faster that an old 486, waiting almost a week to see the results of an animation would make it challenging for animators to create, view, and refine their animations. It would be much better if the animation could be rendered in less than one hour. Windows Azure Worker Roles The cost of creating an on-premise render farm to render animations increases in proportion to the number of servers. The table below shows the cost of servers for creating a render farm, assuming a cost of $500 per server. Number of Servers Cost 1 $500 16 $8,000 256 $128,000 As well as the cost of the servers, there would be additional costs for networking, racks etc. Hosting an environment of 256 servers on-premise would require a server room with cooling, and some pretty hefty power cabling. The Windows Azure compute services provide worker roles, which are ideal for performing processor intensive compute tasks. With the scalability available in Windows Azure a job that takes 256 hours to complete could be perfumed using different numbers of worker roles. The time and cost of using 1, 16 or 256 worker roles is shown below. Number of Worker Roles Render Time Cost 1 256 hours $30.72 16 16 hours $30.72 256 1 hour $30.72 Using worker roles in Windows Azure provides the same cost for the 256 hour job, irrespective of the number of worker roles used. Provided the compute task can be broken down into many small units, and the worker role compute power can be used effectively, it makes sense to scale the application so that the task is completed quickly, making the results available in a timely fashion. The task of rendering 2,000 frames in an animation is one that can easily be broken down into 2,000 individual pieces, which can be performed by a number of worker roles. Creating a Render Farm in Windows Azure The architecture of the render farm is shown in the following diagram. The render farm is a hybrid application with the following components: · On-Premise o Windows Kinect – Used combined with the Kinect Explorer to create a stream of depth images. o Animation Creator – This application uses the depth images from the Kinect sensor to create scene description files for PolyRay. These files are then uploaded to the jobs blob container, and job messages added to the jobs queue. o Process Monitor – This application queries the role instance lifecycle table and displays statistics about the render farm environment and render process. o Image Downloader – This application polls the image queue and downloads the rendered animation files once they are complete. · Windows Azure o Azure Storage – Queues and blobs are used for the scene description files and completed frames. A table is used to store the statistics about the rendering environment. The architecture of each worker role is shown below. The worker role is configured to use local storage, which provides file storage on the worker role instance that can be use by the applications to render the image and transform the format of the image. The service definition for the worker role with the local storage configuration highlighted is shown below. <?xml version="1.0" encoding="utf-8"?> <ServiceDefinition name="CloudRay" > <WorkerRole name="CloudRayWorkerRole" vmsize="Small"> <Imports> </Imports> <ConfigurationSettings> <Setting name="DataConnectionString" /> </ConfigurationSettings> <LocalResources> <LocalStorage name="RayFolder" cleanOnRoleRecycle="true" /> </LocalResources> </WorkerRole> </ServiceDefinition> The two executable programs, PolyRay.exe and DTA.exe are included in the Azure project, with Copy Always set as the property. PolyRay will take the scene description file and render it to a Truevision TGA file. As the TGA format has not seen much use since the mid 90’s it is converted to a JPG image using Dave's Targa Animator, another shareware application from the 90’s. Each worker roll will use the following process to render the animation frames. 1. The worker process polls the job queue, if a job is available the scene description file is downloaded from blob storage to local storage. 2. PolyRay.exe is started in a process with the appropriate command line arguments to render the image as a TGA file. 3. DTA.exe is started in a process with the appropriate command line arguments convert the TGA file to a JPG file. 4. The JPG file is uploaded from local storage to the images blob container. 5. A message is placed on the images queue to indicate a new image is available for download. 6. The job message is deleted from the job queue. 7. The role instance lifecycle table is updated with statistics on the number of frames rendered by the worker role instance, and the CPU time used. The code for this is shown below. public override void Run() { // Set environment variables string polyRayPath = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), PolyRayLocation); string dtaPath = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), DTALocation); LocalResource rayStorage = RoleEnvironment.GetLocalResource("RayFolder"); string localStorageRootPath = rayStorage.RootPath; JobQueue jobQueue = new JobQueue("renderjobs"); JobQueue downloadQueue = new JobQueue("renderimagedownloadjobs"); CloudRayBlob sceneBlob = new CloudRayBlob("scenes"); CloudRayBlob imageBlob = new CloudRayBlob("images"); RoleLifecycleDataSource roleLifecycleDataSource = new RoleLifecycleDataSource(); Frames = 0; while (true) { // Get the render job from the queue CloudQueueMessage jobMsg = jobQueue.Get(); if (jobMsg != null) { // Get the file details string sceneFile = jobMsg.AsString; string tgaFile = sceneFile.Replace(".pi", ".tga"); string jpgFile = sceneFile.Replace(".pi", ".jpg"); string sceneFilePath = Path.Combine(localStorageRootPath, sceneFile); string tgaFilePath = Path.Combine(localStorageRootPath, tgaFile); string jpgFilePath = Path.Combine(localStorageRootPath, jpgFile); // Copy the scene file to local storage sceneBlob.DownloadFile(sceneFilePath); // Run the ray tracer. string polyrayArguments = string.Format("\"{0}\" -o \"{1}\" -a 2", sceneFilePath, tgaFilePath); Process polyRayProcess = new Process(); polyRayProcess.StartInfo.FileName = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), polyRayPath); polyRayProcess.StartInfo.Arguments = polyrayArguments; polyRayProcess.Start(); polyRayProcess.WaitForExit(); // Convert the image string dtaArguments = string.Format(" {0} /FJ /P{1}", tgaFilePath, Path.GetDirectoryName (jpgFilePath)); Process dtaProcess = new Process(); dtaProcess.StartInfo.FileName = Path.Combine(Environment.GetEnvironmentVariable("RoleRoot"), dtaPath); dtaProcess.StartInfo.Arguments = dtaArguments; dtaProcess.Start(); dtaProcess.WaitForExit(); // Upload the image to blob storage imageBlob.UploadFile(jpgFilePath); // Add a download job. downloadQueue.Add(jpgFile); // Delete the render job message jobQueue.Delete(jobMsg); Frames++; } else { Thread.Sleep(1000); } // Log the worker role activity. roleLifecycleDataSource.Alive ("CloudRayWorker", RoleLifecycleDataSource.RoleLifecycleId, Frames); } } Monitoring Worker Role Instance Lifecycle In order to get more accurate statistics about the lifecycle of the worker role instances used to render the animation data was tracked in an Azure storage table. The following class was used to track the worker role lifecycles in Azure storage. public class RoleLifecycle : TableServiceEntity { public string ServerName { get; set; } public string Status { get; set; } public DateTime StartTime { get; set; } public DateTime EndTime { get; set; } public long SecondsRunning { get; set; } public DateTime LastActiveTime { get; set; } public int Frames { get; set; } public string Comment { get; set; } public RoleLifecycle() { } public RoleLifecycle(string roleName) { PartitionKey = roleName; RowKey = Utils.GetAscendingRowKey(); Status = "Started"; StartTime = DateTime.UtcNow; LastActiveTime = StartTime; EndTime = StartTime; SecondsRunning = 0; Frames = 0; } } A new instance of this class is created and added to the storage table when the role starts. It is then updated each time the worker renders a frame to record the total number of frames rendered and the total processing time. These statistics are used be the monitoring application to determine the effectiveness of use of resources in the render farm. Rendering the Animation The Azure solution was deployed to Windows Azure with the service configuration set to 16 worker role instances. This allows for the application to be tested in the cloud environment, and the performance of the application determined. When I demo the application at conferences and user groups I often start with 16 instances, and then scale up the application to the full 256 instances. The configuration to run 16 instances is shown below. <?xml version="1.0" encoding="utf-8"?> <ServiceConfiguration serviceName="CloudRay" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="1" osVersion="*"> <Role name="CloudRayWorkerRole"> <Instances count="16" /> <ConfigurationSettings> <Setting name="DataConnectionString" value="DefaultEndpointsProtocol=https;AccountName=cloudraydata;AccountKey=..." /> </ConfigurationSettings> </Role> </ServiceConfiguration> About six minutes after deploying the application the first worker roles become active and start to render the first frames of the animation. The CloudRay Monitor application displays an icon for each worker role instance, with a number indicating the number of frames that the worker role has rendered. The statistics on the left show the number of active worker roles and statistics about the render process. The render time is the time since the first worker role became active; the CPU time is the total amount of processing time used by all worker role instances to render the frames. Five minutes after the first worker role became active the last of the 16 worker roles activated. By this time the first seven worker roles had each rendered one frame of the animation. With 16 worker roles u and running it can be seen that one hour and 45 minutes CPU time has been used to render 32 frames with a render time of just under 10 minutes. At this rate it would take over 10 hours to render the 2,000 frames of the full animation. In order to complete the animation in under an hour more processing power will be required. Scaling the render farm from 16 instances to 256 instances is easy using the new management portal. The slider is set to 256 instances, and the configuration saved. We do not need to re-deploy the application, and the 16 instances that are up and running will not be affected. Alternatively, the configuration file for the Azure service could be modified to specify 256 instances. <?xml version="1.0" encoding="utf-8"?> <ServiceConfiguration serviceName="CloudRay" xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration" osFamily="1" osVersion="*"> <Role name="CloudRayWorkerRole"> <Instances count="256" /> <ConfigurationSettings> <Setting name="DataConnectionString" value="DefaultEndpointsProtocol=https;AccountName=cloudraydata;AccountKey=..." /> </ConfigurationSettings> </Role> </ServiceConfiguration> Six minutes after the new configuration has been applied 75 new worker roles have activated and are processing their first frames. Five minutes later the full configuration of 256 worker roles is up and running. We can see that the average rate of frame rendering has increased from 3 to 12 frames per minute, and that over 17 hours of CPU time has been utilized in 23 minutes. In this test the time to provision 140 worker roles was about 11 minutes, which works out at about one every five seconds. We are now half way through the rendering, with 1,000 frames complete. This has utilized just under three days of CPU time in a little over 35 minutes. The animation is now complete, with 2,000 frames rendered in a little over 52 minutes. The CPU time used by the 256 worker roles is 6 days, 7 hours and 22 minutes with an average frame rate of 38 frames per minute. The rendering of the last 1,000 frames took 16 minutes 27 seconds, which works out at a rendering rate of 60 frames per minute. The frame counts in the server instances indicate that the use of a queue to distribute the workload has been very effective in distributing the load across the 256 worker role instances. The first 16 instances that were deployed first have rendered between 11 and 13 frames each, whilst the 240 instances that were added when the application was scaled have rendered between 6 and 9 frames each. Completed Animation I’ve uploaded the completed animation to YouTube, a low resolution preview is shown below. Pin Board Animation Created using Windows Kinect and 256 Windows Azure Worker Roles The animation can be viewed in 1280x720 resolution at the following link: http://www.youtube.com/watch?v=n5jy6bvSxWc Effective Use of Resources According to the CloudRay monitor statistics the animation took 6 days, 7 hours and 22 minutes CPU to render, this works out at 152 hours of compute time, rounded up to the nearest hour. As the usage for the worker role instances are billed for the full hour, it may have been possible to render the animation using fewer than 256 worker roles. When deciding the optimal usage of resources, the time required to provision and start the worker roles must also be considered. In the demo I started with 16 worker roles, and then scaled the application to 256 worker roles. It would have been more optimal to start the application with maybe 200 worker roles, and utilized the full hour that I was being billed for. This would, however, have prevented showing the ease of scalability of the application. The new management portal displays the CPU usage across the worker roles in the deployment. The average CPU usage across all instances is 93.27%, with over 99% used when all the instances are up and running. This shows that the worker role resources are being used very effectively. Grid Computing Scenarios Although I am using this scenario for a hobby project, there are many scenarios where a large amount of compute power is required for a short period of time. Windows Azure provides a great platform for developing these types of grid computing applications, and can work out very cost effective. · Windows Azure can provide massive compute power, on demand, in a matter of minutes. · The use of queues to manage the load balancing of jobs between role instances is a simple and effective solution. · Using a cloud-computing platform like Windows Azure allows proof-of-concept scenarios to be tested and evaluated on a very low budget. · No charges for inbound data transfer makes the uploading of large data sets to Windows Azure Storage services cost effective. (Transaction charges still apply.) Tips for using Windows Azure for Grid Computing Scenarios I found the implementation of a render farm using Windows Azure a fairly simple scenario to implement. I was impressed by ease of scalability that Azure provides, and by the short time that the application took to scale from 16 to 256 worker role instances. In this case it was around 13 minutes, in other tests it took between 10 and 20 minutes. The following tips may be useful when implementing a grid computing project in Windows Azure. · Using an Azure Storage queue to load-balance the units of work across multiple worker roles is simple and very effective. The design I have used in this scenario could easily scale to many thousands of worker role instances. · Windows Azure accounts are typically limited to 20 cores. If you need to use more than this, a call to support and a credit card check will be required. · Be aware of how the billing model works. You will be charged for worker role instances for the full clock our in which the instance is deployed. Schedule the workload to start just after the clock hour has started. · Monitor the utilization of the resources you are provisioning, ensure that you are not paying for worker roles that are idle. · If you are deploying third party applications to worker roles, you may well run into licensing issues. Purchasing software licenses on a per-processor basis when using hundreds of processors for a short time period would not be cost effective. · Third party software may also require installation onto the worker roles, which can be accomplished using start-up tasks. Bear in mind that adding a startup task and possible re-boot will add to the time required for the worker role instance to start and activate. An alternative may be to use a prepared VM and use VM roles. · Consider using the Windows Azure Autoscaling Application Block (WASABi) to autoscale the worker roles in your application. When using a large number of worker roles, the utilization must be carefully monitored, if the scaling algorithms are not optimal it could get very expensive!

Read the article
Using R to Analyze G1GC Log Files

- by user12620111

Using R to Analyze G1GC Log Files body, td { font-family: sans-serif; background-color: white; font-size: 12px; margin: 8px; } tt, code, pre { font-family: 'DejaVu Sans Mono', 'Droid Sans Mono', 'Lucida Console', Consolas, Monaco, monospace; } h1 { font-size:2.2em; } h2 { font-size:1.8em; } h3 { font-size:1.4em; } h4 { font-size:1.0em; } h5 { font-size:0.9em; } h6 { font-size:0.8em; } a:visited { color: rgb(50%, 0%, 50%); } pre { margin-top: 0; max-width: 95%; border: 1px solid #ccc; white-space: pre-wrap; } pre code { display: block; padding: 0.5em; } code.r, code.cpp { background-color: #F8F8F8; } table, td, th { border: none; } blockquote { color:#666666; margin:0; padding-left: 1em; border-left: 0.5em #EEE solid; } hr { height: 0px; border-bottom: none; border-top-width: thin; border-top-style: dotted; border-top-color: #999999; } @media print { * { background: transparent !important; color: black !important; filter:none !important; -ms-filter: none !important; } body { font-size:12pt; max-width:100%; } a, a:visited { text-decoration: underline; } hr { visibility: hidden; page-break-before: always; } pre, blockquote { padding-right: 1em; page-break-inside: avoid; } tr, img { page-break-inside: avoid; } img { max-width: 100% !important; } @page :left { margin: 15mm 20mm 15mm 10mm; } @page :right { margin: 15mm 10mm 15mm 20mm; } p, h2, h3 { orphans: 3; widows: 3; } h2, h3 { page-break-after: avoid; } } pre .operator, pre .paren { color: rgb(104, 118, 135) } pre .literal { color: rgb(88, 72, 246) } pre .number { color: rgb(0, 0, 205); } pre .comment { color: rgb(76, 136, 107); } pre .keyword { color: rgb(0, 0, 255); } pre .identifier { color: rgb(0, 0, 0); } pre .string { color: rgb(3, 106, 7); } var hljs=new function(){function m(p){return p.replace(/&/gm,"&").replace(/"}while(y.length||w.length){var v=u().splice(0,1)[0];z+=m(x.substr(q,v.offset-q));q=v.offset;if(v.event=="start"){z+=t(v.node);s.push(v.node)}else{if(v.event=="stop"){var p,r=s.length;do{r--;p=s[r];z+=("")}while(p!=v.node);s.splice(r,1);while(r'+M[0]+""}else{r+=M[0]}O=P.lR.lastIndex;M=P.lR.exec(L)}return r+L.substr(O,L.length-O)}function J(L,M){if(M.sL&&e[M.sL]){var r=d(M.sL,L);x+=r.keyword_count;return r.value}else{return F(L,M)}}function I(M,r){var L=M.cN?'':"";if(M.rB){y+=L;M.buffer=""}else{if(M.eB){y+=m(r)+L;M.buffer=""}else{y+=L;M.buffer=r}}D.push(M);A+=M.r}function G(N,M,Q){var R=D[D.length-1];if(Q){y+=J(R.buffer+N,R);return false}var P=q(M,R);if(P){y+=J(R.buffer+N,R);I(P,M);return P.rB}var L=v(D.length-1,M);if(L){var O=R.cN?"":"";if(R.rE){y+=J(R.buffer+N,R)+O}else{if(R.eE){y+=J(R.buffer+N,R)+O+m(M)}else{y+=J(R.buffer+N+M,R)+O}}while(L1){O=D[D.length-2].cN?"":"";y+=O;L--;D.length--}var r=D[D.length-1];D.length--;D[D.length-1].buffer="";if(r.starts){I(r.starts,"")}return R.rE}if(w(M,R)){throw"Illegal"}}var E=e[B];var D=[E.dM];var A=0;var x=0;var y="";try{var s,u=0;E.dM.buffer="";do{s=p(C,u);var t=G(s[0],s[1],s[2]);u+=s[0].length;if(!t){u+=s[1].length}}while(!s[2]);if(D.length1){throw"Illegal"}return{r:A,keyword_count:x,value:y}}catch(H){if(H=="Illegal"){return{r:0,keyword_count:0,value:m(C)}}else{throw H}}}function g(t){var p={keyword_count:0,r:0,value:m(t)};var r=p;for(var q in e){if(!e.hasOwnProperty(q)){continue}var s=d(q,t);s.language=q;if(s.keyword_count+s.rr.keyword_count+r.r){r=s}if(s.keyword_count+s.rp.keyword_count+p.r){r=p;p=s}}if(r.language){p.second_best=r}return p}function i(r,q,p){if(q){r=r.replace(/^((]+|\t)+)/gm,function(t,w,v,u){return w.replace(/\t/g,q)})}if(p){r=r.replace(/\n/g,"")}return r}function n(t,w,r){var x=h(t,r);var v=a(t);var y,s;if(v){y=d(v,x)}else{return}var q=c(t);if(q.length){s=document.createElement("pre");s.innerHTML=y.value;y.value=k(q,c(s),x)}y.value=i(y.value,w,r);var u=t.className;if(!u.match("(\\s|^)(language-)?"+v+"(\\s|$)")){u=u?(u+" "+v):v}if(/MSIE [678]/.test(navigator.userAgent)&&t.tagName=="CODE"&&t.parentNode.tagName=="PRE"){s=t.parentNode;var p=document.createElement("div");p.innerHTML=""+y.value+"";t=p.firstChild.firstChild;p.firstChild.cN=s.cN;s.parentNode.replaceChild(p.firstChild,s)}else{t.innerHTML=y.value}t.className=u;t.result={language:v,kw:y.keyword_count,re:y.r};if(y.second_best){t.second_best={language:y.second_best.language,kw:y.second_best.keyword_count,re:y.second_best.r}}}function o(){if(o.called){return}o.called=true;var r=document.getElementsByTagName("pre");for(var p=0;p|=||=||=|\\?|\\[|\\{|\\(|\\^|\\^=|\\||\\|=|\\|\\||~";this.ER="(?![\\s\\S])";this.BE={b:"\\\\.",r:0};this.ASM={cN:"string",b:"'",e:"'",i:"\\n",c:[this.BE],r:0};this.QSM={cN:"string",b:'"',e:'"',i:"\\n",c:[this.BE],r:0};this.CLCM={cN:"comment",b:"//",e:"$"};this.CBLCLM={cN:"comment",b:"/\\*",e:"\\*/"};this.HCM={cN:"comment",b:"#",e:"$"};this.NM={cN:"number",b:this.NR,r:0};this.CNM={cN:"number",b:this.CNR,r:0};this.BNM={cN:"number",b:this.BNR,r:0};this.inherit=function(r,s){var p={};for(var q in r){p[q]=r[q]}if(s){for(var q in s){p[q]=s[q]}}return p}}();hljs.LANGUAGES.cpp=function(){var a={keyword:{"false":1,"int":1,"float":1,"while":1,"private":1,"char":1,"catch":1,"export":1,virtual:1,operator:2,sizeof:2,dynamic_cast:2,typedef:2,const_cast:2,"const":1,struct:1,"for":1,static_cast:2,union:1,namespace:1,unsigned:1,"long":1,"throw":1,"volatile":2,"static":1,"protected":1,bool:1,template:1,mutable:1,"if":1,"public":1,friend:2,"do":1,"return":1,"goto":1,auto:1,"void":2,"enum":1,"else":1,"break":1,"new":1,extern:1,using:1,"true":1,"class":1,asm:1,"case":1,typeid:1,"short":1,reinterpret_cast:2,"default":1,"double":1,register:1,explicit:1,signed:1,typename:1,"try":1,"this":1,"switch":1,"continue":1,wchar_t:1,inline:1,"delete":1,alignof:1,char16_t:1,char32_t:1,constexpr:1,decltype:1,noexcept:1,nullptr:1,static_assert:1,thread_local:1,restrict:1,_Bool:1,complex:1},built_in:{std:1,string:1,cin:1,cout:1,cerr:1,clog:1,stringstream:1,istringstream:1,ostringstream:1,auto_ptr:1,deque:1,list:1,queue:1,stack:1,vector:1,map:1,set:1,bitset:1,multiset:1,multimap:1,unordered_set:1,unordered_map:1,unordered_multiset:1,unordered_multimap:1,array:1,shared_ptr:1}};return{dM:{k:a,i:"",k:a,r:10,c:["self"]}]}}}();hljs.LANGUAGES.r={dM:{c:[hljs.HCM,{cN:"number",b:"\\b0[xX][0-9a-fA-F]+[Li]?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+(?:[eE][+\\-]?\\d*)?L\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\b\\d+\\.(?!\\d)(?:i\\b)?",e:hljs.IMMEDIATE_RE,r:1},{cN:"number",b:"\\b\\d+(?:\\.\\d*)?(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"number",b:"\\.\\d+(?:[eE][+\\-]?\\d*)?i?\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"keyword",b:"(?:tryCatch|library|setGeneric|setGroupGeneric)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\.",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\.\\.\\d+(?![\\w.])",e:hljs.IMMEDIATE_RE,r:10},{cN:"keyword",b:"\\b(?:function)",e:hljs.IMMEDIATE_RE,r:2},{cN:"keyword",b:"(?:if|in|break|next|repeat|else|for|return|switch|while|try|stop|warning|require|attach|detach|source|setMethod|setClass)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"literal",b:"(?:NA|NA_integer_|NA_real_|NA_character_|NA_complex_)\\b",e:hljs.IMMEDIATE_RE,r:10},{cN:"literal",b:"(?:NULL|TRUE|FALSE|T|F|Inf|NaN)\\b",e:hljs.IMMEDIATE_RE,r:1},{cN:"identifier",b:"[a-zA-Z.][a-zA-Z0-9._]*\\b",e:hljs.IMMEDIATE_RE,r:0},{cN:"operator",b:"|=|| Using R to Analyze G1GC Log Files Using R to Analyze G1GC Log Files Introduction Working in Oracle Platform Integration gives an engineer opportunities to work on a wide array of technologies. My team’s goal is to make Oracle applications run best on the Solaris/SPARC platform. When looking for bottlenecks in a modern applications, one needs to be aware of not only how the CPUs and operating system are executing, but also network, storage, and in some cases, the Java Virtual Machine. I was recently presented with about 1.5 GB of Java Garbage First Garbage Collector log file data. If you’re not familiar with the subject, you might want to review Garbage First Garbage Collector Tuning by Monica Beckwith. The customer had been running Java HotSpot 1.6.0_31 to host a web application server. I was told that the Solaris/SPARC server was running a Java process launched using a commmand line that included the following flags: -d64 -Xms9g -Xmx9g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:InitiatingHeapOccupancyPercent=80 -XX:PermSize=256m -XX:MaxPermSize=256m -XX:+PrintGC -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintGCDateStamps -XX:+PrintFlagsFinal -XX:+DisableExplicitGC -XX:+UnlockExperimentalVMOptions -XX:ParallelGCThreads=8 Several sources on the internet indicate that if I were to print out the 1.5 GB of log files, it would require enough paper to fill the bed of a pick up truck. Of course, it would be fruitless to try to scan the log files by hand. Tools will be required to summarize the contents of the log files. Others have encountered large Java garbage collection log files. There are existing tools to analyze the log files: IBM’s GC toolkit The chewiebug GCViewer gchisto HPjmeter Instead of using one of the other tools listed, I decide to parse the log files with standard Unix tools, and analyze the data with R. Data Cleansing The log files arrived in two different formats. I guess that the difference is that one set of log files was generated using a more verbose option, maybe -XX:+PrintHeapAtGC, and the other set of log files was generated without that option. Format 1 In some of the log files, the log files with the less verbose format, a single trace, i.e. the report of a singe garbage collection event, looks like this: {Heap before GC invocations=12280 (full 61): garbage-first heap total 9437184K, used 7499918K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 1 young (4096K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. 2014-05-14T07:24:00.988-0700: 60586.353: [GC pause (young) 7324M->7320M(9216M), 0.1567265 secs] Heap after GC invocations=12281 (full 61): garbage-first heap total 9437184K, used 7496533K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) region size 4096K, 0 young (0K), 0 survivors (0K) compacting perm gen total 262144K, used 144077K [0xffffffff40000000, 0xffffffff50000000, 0xffffffff50000000) the space 262144K, 54% used [0xffffffff40000000, 0xffffffff48cb3758, 0xffffffff48cb3800, 0xffffffff50000000) No shared spaces configured. } A simple grep can be used to extract a summary: $ grep "\[ GC pause (young" g1gc.log 2014-05-13T13:24:35.091-0700: 3.109: [GC pause (young) 20M->5029K(9216M), 0.0146328 secs] 2014-05-13T13:24:35.440-0700: 3.459: [GC pause (young) 9125K->6077K(9216M), 0.0086723 secs] 2014-05-13T13:24:37.581-0700: 5.599: [GC pause (young) 25M->8470K(9216M), 0.0203820 secs] 2014-05-13T13:24:42.686-0700: 10.704: [GC pause (young) 44M->15M(9216M), 0.0288848 secs] 2014-05-13T13:24:48.941-0700: 16.958: [GC pause (young) 51M->20M(9216M), 0.0491244 secs] 2014-05-13T13:24:56.049-0700: 24.066: [GC pause (young) 92M->26M(9216M), 0.0525368 secs] 2014-05-13T13:25:34.368-0700: 62.383: [GC pause (young) 602M->68M(9216M), 0.1721173 secs] But that format wasn't easily read into R, so I needed to be a bit more tricky. I used the following Unix command to create a summary file that was easy for R to read. $ echo "SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime" $ grep "\[GC pause (young" g1gc.log | grep -v mark | sed -e 's/[A-SU-z,]/ /g' -e 's/->/ /' -e 's/: / /g' | more SecondsSinceLaunch BeforeSize AfterSize TotalSize RealTime 2014-05-13T13:24:35.091-0700 3.109 20 5029 9216 0.0146328 2014-05-13T13:24:35.440-0700 3.459 9125 6077 9216 0.0086723 2014-05-13T13:24:37.581-0700 5.599 25 8470 9216 0.0203820 2014-05-13T13:24:42.686-0700 10.704 44 15 9216 0.0288848 2014-05-13T13:24:48.941-0700 16.958 51 20 9216 0.0491244 2014-05-13T13:24:56.049-0700 24.066 92 26 9216 0.0525368 2014-05-13T13:25:34.368-0700 62.383 602 68 9216 0.1721173 Format 2 In some of the log files, the log files with the more verbose format, a single trace, i.e. the report of a singe garbage collection event, was more complicated than Format 1. Here is a text file with an example of a single G1GC trace in the second format. As you can see, it is quite complicated. It is nice that there is so much information available, but the level of detail can be overwhelming. I wrote this awk script (download) to summarize each trace on a single line. #!/usr/bin/env awk -f BEGIN { printf("SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize\n") } ###################### # Save count data from lines that are at the start of each G1GC trace. # Each trace starts out like this: # {Heap before GC invocations=14 (full 0): # garbage-first heap total 9437184K, used 325496K [0xfffffffd00000000, 0xffffffff40000000, 0xffffffff40000000) ###################### /{Heap.*full/{ gsub ( "\\)" , "" ); nf=split($0,a,"="); split(a[2],b," "); getline; if ( match($0, "first") ) { G1GC=1; IncrementalCount=b[1]; FullCount=substr( b[3], 1, length(b[3])-1 ); } else { G1GC=0; } } ###################### # Pull out time stamps that are in lines with this format: # 2014-05-12T14:02:06.025-0700: 94.312: [GC pause (young), 0.08870154 secs] ###################### /GC pause/ { DateTime=$1; SecondsSinceLaunch=substr($2, 1, length($2)-1); } ###################### # Heap sizes are in lines that look like this: # [ 4842M->4838M(9216M)] ###################### /\[ .*]$/ { gsub ( "\\[" , "" ); gsub ( "\ \]" , "" ); gsub ( "->" , " " ); gsub ( "\$ " , " " ); gsub ( "\ $" , " " ); split($0,a," "); if ( split(a[1],b,"M") > 1 ) {BeforeSize=b[1]*1024;} if ( split(a[1],b,"K") > 1 ) {BeforeSize=b[1];} if ( split(a[2],b,"M") > 1 ) {AfterSize=b[1]*1024;} if ( split(a[2],b,"K") > 1 ) {AfterSize=b[1];} if ( split(a[3],b,"M") > 1 ) {TotalSize=b[1]*1024;} if ( split(a[3],b,"K") > 1 ) {TotalSize=b[1];} } ###################### # Emit an output line when you find input that looks like this: # [Times: user=1.41 sys=0.08, real=0.24 secs] ###################### /\[Times/ { if (G1GC==1) { gsub ( "," , "" ); split($2,a,"="); UserTime=a[2]; split($3,a,"="); SysTime=a[2]; split($4,a,"="); RealTime=a[2]; print DateTime,SecondsSinceLaunch,IncrementalCount,FullCount,UserTime,SysTime,RealTime,BeforeSize,AfterSize,TotalSize; G1GC=0; } } The resulting summary is about 25X smaller that the original file, but still difficult for a human to digest. SecondsSinceLaunch IncrementalCount FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ... 2014-05-12T18:36:34.669-0700: 3985.744 561 0 0.57 0.06 0.16 1724416 1720320 9437184 2014-05-12T18:36:34.839-0700: 3985.914 562 0 0.51 0.06 0.19 1724416 1720320 9437184 2014-05-12T18:36:35.069-0700: 3986.144 563 0 0.60 0.04 0.27 1724416 1721344 9437184 2014-05-12T18:36:35.354-0700: 3986.429 564 0 0.33 0.04 0.09 1725440 1722368 9437184 2014-05-12T18:36:35.545-0700: 3986.620 565 0 0.58 0.04 0.17 1726464 1722368 9437184 2014-05-12T18:36:35.726-0700: 3986.801 566 0 0.43 0.05 0.12 1726464 1722368 9437184 2014-05-12T18:36:35.856-0700: 3986.930 567 0 0.30 0.04 0.07 1726464 1723392 9437184 2014-05-12T18:36:35.947-0700: 3987.023 568 0 0.61 0.04 0.26 1727488 1723392 9437184 2014-05-12T18:36:36.228-0700: 3987.302 569 0 0.46 0.04 0.16 1731584 1724416 9437184 Reading the Data into R Once the GC log data had been cleansed, either by processing the first format with the shell script, or by processing the second format with the awk script, it was easy to read the data into R. g1gc.df = read.csv("summary.txt", row.names = NULL, stringsAsFactors=FALSE,sep="") str(g1gc.df) ## 'data.frame': 8307 obs. of 10 variables: ## $ row.names : chr "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ... ## $ SecondsSinceLaunch: num 1.16 1.47 1.97 3.83 6.1 ... ## $ IncrementalCount : int 0 1 2 3 4 5 6 7 8 9 ... ## $ FullCount : int 0 0 0 0 0 0 0 0 0 0 ... ## $ UserTime : num 0.11 0.05 0.04 0.21 0.08 0.26 0.31 0.33 0.34 0.56 ... ## $ SysTime : num 0.04 0.01 0.01 0.05 0.01 0.06 0.07 0.06 0.07 0.09 ... ## $ RealTime : num 0.02 0.02 0.01 0.04 0.02 0.04 0.05 0.04 0.04 0.06 ... ## $ BeforeSize : int 8192 5496 5768 22528 24576 43008 34816 53248 55296 93184 ... ## $ AfterSize : int 1400 1672 2557 4907 7072 14336 16384 18432 19456 21504 ... ## $ TotalSize : int 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 9437184 ... head(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount ## 1 2014-05-12T14:00:32.868-0700: 1.161 0 ## 2 2014-05-12T14:00:33.179-0700: 1.472 1 ## 3 2014-05-12T14:00:33.677-0700: 1.969 2 ## 4 2014-05-12T14:00:35.538-0700: 3.830 3 ## 5 2014-05-12T14:00:37.811-0700: 6.103 4 ## 6 2014-05-12T14:00:41.428-0700: 9.720 5 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 1 0 0.11 0.04 0.02 8192 1400 9437184 ## 2 0 0.05 0.01 0.02 5496 1672 9437184 ## 3 0 0.04 0.01 0.01 5768 2557 9437184 ## 4 0 0.21 0.05 0.04 22528 4907 9437184 ## 5 0 0.08 0.01 0.02 24576 7072 9437184 ## 6 0 0.26 0.06 0.04 43008 14336 9437184 Basic Statistics Once the data has been read into R, simple statistics are very easy to generate. All of the numbers from high school statistics are available via simple commands. For example, generate a summary of every column: summary(g1gc.df) ## row.names SecondsSinceLaunch IncrementalCount FullCount ## Length:8307 Min. : 1 Min. : 0 Min. : 0.0 ## Class :character 1st Qu.: 9977 1st Qu.:2048 1st Qu.: 0.0 ## Mode :character Median :12855 Median :4136 Median : 12.0 ## Mean :12527 Mean :4156 Mean : 31.6 ## 3rd Qu.:15758 3rd Qu.:6262 3rd Qu.: 61.0 ## Max. :55484 Max. :8391 Max. :113.0 ## UserTime SysTime RealTime BeforeSize ## Min. :0.040 Min. :0.0000 Min. : 0.0 Min. : 5476 ## 1st Qu.:0.470 1st Qu.:0.0300 1st Qu.: 0.1 1st Qu.:5137920 ## Median :0.620 Median :0.0300 Median : 0.1 Median :6574080 ## Mean :0.751 Mean :0.0355 Mean : 0.3 Mean :5841855 ## 3rd Qu.:0.920 3rd Qu.:0.0400 3rd Qu.: 0.2 3rd Qu.:7084032 ## Max. :3.370 Max. :1.5600 Max. :488.1 Max. :8696832 ## AfterSize TotalSize ## Min. : 1380 Min. :9437184 ## 1st Qu.:5002752 1st Qu.:9437184 ## Median :6559744 Median :9437184 ## Mean :5785454 Mean :9437184 ## 3rd Qu.:7054336 3rd Qu.:9437184 ## Max. :8482816 Max. :9437184 Q: What is the total amount of User CPU time spent in garbage collection? sum(g1gc.df$UserTime) ## [1] 6236 As you can see, less than two hours of CPU time was spent in garbage collection. Is that too much? To find the percentage of time spent in garbage collection, divide the number above by total_elapsed_time*CPU_count. In this case, there are a lot of CPU’s and it turns out the the overall amount of CPU time spent in garbage collection isn’t a problem when viewed in isolation. When calculating rates, i.e. events per unit time, you need to ask yourself if the rate is homogenous across the time period in the log file. Does the log file include spikes of high activity that should be separately analyzed? Averaging in data from nights and weekends with data from business hours may alias problems. If you have a reason to suspect that the garbage collection rates include peaks and valleys that need independent analysis, see the “Time Series” section, below. Q: How much garbage is collected on each pass? The amount of heap space that is recovered per GC pass is surprisingly low: At least one collection didn’t recover any data. (“Min.=0”) 25% of the passes recovered 3MB or less. (“1st Qu.=3072”) Half of the GC passes recovered 4MB or less. (“Median=4096”) The average amount recovered was 56MB. (“Mean=56390”) 75% of the passes recovered 36MB or less. (“3rd Qu.=36860”) At least one pass recovered 2GB. (“Max.=2121000”) g1gc.df$Delta = g1gc.df$BeforeSize - g1gc.df$AfterSize summary(g1gc.df$Delta) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0 3070 4100 56400 36900 2120000 Q: What is the maximum User CPU time for a single collection? The worst garbage collection (“Max.”) is many standard deviations away from the mean. The data appears to be right skewed. summary(g1gc.df$UserTime) ## Min. 1st Qu. Median Mean 3rd Qu. Max. ## 0.040 0.470 0.620 0.751 0.920 3.370 sd(g1gc.df$UserTime) ## [1] 0.3966 Basic Graphics Once the data is in R, it is trivial to plot the data with formats including dot plots, line charts, bar charts (simple, stacked, grouped), pie charts, boxplots, scatter plots histograms, and kernel density plots. Histogram of User CPU Time per Collection I don't think that this graph requires any explanation. hist(g1gc.df$UserTime, main="User CPU Time per Collection", xlab="Seconds", ylab="Frequency") Box plot to identify outliers When the initial data is viewed with a box plot, you can see the one crazy outlier in the real time per GC. Save this data point for future analysis and drop the outlier so that it’s not throwing off our statistics. Now the box plot shows many outliers, which will be examined later, using times series analysis. Notice that the scale of the x-axis changes drastically once the crazy outlier is removed. par(mfrow=c(2,1)) boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(dominated by a crazy outlier)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") crazy.outlier.df=g1gc.df[g1gc.df$RealTime > 400,] g1gc.df=g1gc.df[g1gc.df$RealTime < 400,] boxplot(g1gc.df$UserTime,g1gc.df$SysTime,g1gc.df$RealTime, main="Box Plot of Time per GC\n(crazy outlier excluded)", names=c("usr","sys","elapsed"), xlab="Seconds per GC", ylab="Time (Seconds)", horizontal = TRUE, outcol="red") box(which = "outer", lty = "solid") Here is the crazy outlier for future analysis: crazy.outlier.df ## row.names SecondsSinceLaunch IncrementalCount ## 8233 2014-05-12T23:15:43.903-0700: 20741 8316 ## FullCount UserTime SysTime RealTime BeforeSize AfterSize TotalSize ## 8233 112 0.55 0.42 488.1 8381440 8235008 9437184 ## Delta ## 8233 146432 R Time Series Data To analyze the garbage collection as a time series, I’ll use Z’s Ordered Observations (zoo). “zoo is the creator for an S3 class of indexed totally ordered observations which includes irregular time series.” require(zoo) ## Loading required package: zoo ## ## Attaching package: 'zoo' ## ## The following objects are masked from 'package:base': ## ## as.Date, as.Date.numeric head(g1gc.df[,1]) ## [1] "2014-05-12T14:00:32.868-0700:" "2014-05-12T14:00:33.179-0700:" ## [3] "2014-05-12T14:00:33.677-0700:" "2014-05-12T14:00:35.538-0700:" ## [5] "2014-05-12T14:00:37.811-0700:" "2014-05-12T14:00:41.428-0700:" options("digits.secs"=3) times=as.POSIXct( g1gc.df[,1], format="%Y-%m-%dT%H:%M:%OS%z:") g1gc.z = zoo(g1gc.df[,-c(1)], order.by=times) head(g1gc.z) ## SecondsSinceLaunch IncrementalCount FullCount ## 2014-05-12 17:00:32.868 1.161 0 0 ## 2014-05-12 17:00:33.178 1.472 1 0 ## 2014-05-12 17:00:33.677 1.969 2 0 ## 2014-05-12 17:00:35.538 3.830 3 0 ## 2014-05-12 17:00:37.811 6.103 4 0 ## 2014-05-12 17:00:41.427 9.720 5 0 ## UserTime SysTime RealTime BeforeSize AfterSize ## 2014-05-12 17:00:32.868 0.11 0.04 0.02 8192 1400 ## 2014-05-12 17:00:33.178 0.05 0.01 0.02 5496 1672 ## 2014-05-12 17:00:33.677 0.04 0.01 0.01 5768 2557 ## 2014-05-12 17:00:35.538 0.21 0.05 0.04 22528 4907 ## 2014-05-12 17:00:37.811 0.08 0.01 0.02 24576 7072 ## 2014-05-12 17:00:41.427 0.26 0.06 0.04 43008 14336 ## TotalSize Delta ## 2014-05-12 17:00:32.868 9437184 6792 ## 2014-05-12 17:00:33.178 9437184 3824 ## 2014-05-12 17:00:33.677 9437184 3211 ## 2014-05-12 17:00:35.538 9437184 17621 ## 2014-05-12 17:00:37.811 9437184 17504 ## 2014-05-12 17:00:41.427 9437184 28672 Example of Two Benchmark Runs in One Log File The data in the following graph is from a different log file, not the one of primary interest to this article. I’m including this image because it is an example of idle periods followed by busy periods. It would be uninteresting to average the rate of garbage collection over the entire log file period. More interesting would be the rate of garbage collect in the two busy periods. Are they the same or different? Your production data may be similar, for example, bursts when employees return from lunch and idle times on weekend evenings, etc. Once the data is in an R Time Series, you can analyze isolated time windows. Clipping the Time Series data Flashing back to our test case… Viewing the data as a time series is interesting. You can see that the work intensive time period is between 9:00 PM and 3:00 AM. Lets clip the data to the interesting period: par(mfrow=c(2,1)) plot(g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Complete Log File", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") clipped.g1gc.z=window(g1gc.z, start=as.POSIXct("2014-05-12 21:00:00"), end=as.POSIXct("2014-05-13 03:00:00")) plot(clipped.g1gc.z$UserTime, type="h", main="User Time per GC\nTime: Limited to Benchmark Execution", xlab="Time of Day", ylab="CPU Seconds per GC", col="#1b9e77") box(which = "outer", lty = "solid") Cumulative Incremental and Full GC count Here is the cumulative incremental and full GC count. When the line is very steep, it indicates that the GCs are repeating very quickly. Notice that the scale on the Y axis is different for full vs. incremental. plot(clipped.g1gc.z[,c(2:3)], main="Cumulative Incremental and Full GC count", xlab="Time of Day", col="#1b9e77") GC Analysis of Benchmark Execution using Time Series data In the following series of 3 graphs: The “After Size” show the amount of heap space in use after each garbage collection. Many Java objects are still referenced, i.e. alive, during each garbage collection. This may indicate that the application has a memory leak, or may indicate that the application has a very large memory footprint. Typically, an application's memory footprint plateau's in the early stage of execution. One would expect this graph to have a flat top. The steep decline in the heap space may indicate that the application crashed after 2:00. The second graph shows that the outliers in real execution time, discussed above, occur near 2:00. when the Java heap seems to be quite full. The third graph shows that Full GCs are infrequent during the first few hours of execution. The rate of Full GC's, (the slope of the cummulative Full GC line), changes near midnight. plot(clipped.g1gc.z[,c("AfterSize","RealTime","FullCount")], xlab="Time of Day", col=c("#1b9e77","red","#1b9e77")) GC Analysis of heap recovered Each GC trace includes the amount of heap space in use before and after the individual GC event. During garbage coolection, unreferenced objects are identified, the space holding the unreferenced objects is freed, and thus, the difference in before and after usage indicates how much space has been freed. The following box plot and bar chart both demonstrate the same point - the amount of heap space freed per garbage colloection is surprisingly low. par(mfrow=c(2,1)) boxplot(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", horizontal = TRUE, col="red") hist(as.vector(clipped.g1gc.z$Delta), main="Amount of Heap Recovered per GC Pass", xlab="Size in KB", breaks=100, col="red") box(which = "outer", lty = "solid") This graph is the most interesting. The dark blue area shows how much heap is occupied by referenced Java objects. This represents memory that holds live data. The red fringe at the top shows how much data was recovered after each garbage collection. barplot(clipped.g1gc.z[,c("AfterSize","Delta")], col=c("#7570b3","#e7298a"), xlab="Time of Day", border=NA) legend("topleft", c("Live Objects","Heap Recovered on GC"), fill=c("#7570b3","#e7298a")) box(which = "outer", lty = "solid") When I discuss the data in the log files with the customer, I will ask for an explaination for the large amount of referenced data resident in the Java heap. There are two are posibilities: There is a memory leak and the amount of space required to hold referenced objects will continue to grow, limited only by the maximum heap size. After the maximum heap size is reached, the JVM will throw an “Out of Memory” exception every time that the application tries to allocate a new object. If this is the case, the aplication needs to be debugged to identify why old objects are referenced when they are no longer needed. The application has a legitimate requirement to keep a large amount of data in memory. The customer may want to further increase the maximum heap size. Another possible solution would be to partition the application across multiple cluster nodes, where each node has responsibility for managing a unique subset of the data. Conclusion In conclusion, R is a very powerful tool for the analysis of Java garbage collection log files. The primary difficulty is data cleansing so that information can be read into an R data frame. Once the data has been read into R, a rich set of tools may be used for thorough evaluation.

Read the article
Mongodb performance on Windows

- by Chris

I've been researching nosql options available for .NET lately and MongoDB is emerging as a clear winner in terms of availability and support, so tonight I decided to give it a go. I downloaded version 1.2.4 (Windows x64 binary) from the mongodb site and ran it with the following options: C:\mongodb\bin>mkdir data C:\mongodb\bin>mongod -dbpath ./data --cpu --quiet I then loaded up the latest mongodb-csharp driver from http://github.com/samus/mongodb-csharp and immediately ran the benchmark program. Having heard about how "amazingly fast" MongoDB is, I was rather shocked at the poor benchmark performance. Starting Tests encode (small).........................................320000 00:00:00.0156250 encode (medium)........................................80000 00:00:00.0625000 encode (large).........................................1818 00:00:02.7500000 decode (small).........................................320000 00:00:00.0156250 decode (medium)........................................160000 00:00:00.0312500 decode (large).........................................2370 00:00:02.1093750 insert (small, no index)...............................2176 00:00:02.2968750 insert (medium, no index)..............................2269 00:00:02.2031250 insert (large, no index)...............................778 00:00:06.4218750 insert (small, indexed)................................2051 00:00:02.4375000 insert (medium, indexed)...............................2133 00:00:02.3437500 insert (large, indexed)................................835 00:00:05.9843750 batch insert (small, no index).........................53333 00:00:00.0937500 batch insert (medium, no index)........................26666 00:00:00.1875000 batch insert (large, no index).........................1114 00:00:04.4843750 find_one (small, no index).............................350 00:00:14.2812500 find_one (medium, no index)............................204 00:00:24.4687500 find_one (large, no index).............................135 00:00:37.0156250 find_one (small, indexed)..............................352 00:00:14.1718750 find_one (medium, indexed).............................184 00:00:27.0937500 find_one (large, indexed)..............................128 00:00:38.9062500 find (small, no index).................................516 00:00:09.6718750 find (medium, no index)................................316 00:00:15.7812500 find (large, no index).................................216 00:00:23.0468750 find (small, indexed)..................................532 00:00:09.3906250 find (medium, indexed).................................346 00:00:14.4375000 find (large, indexed)..................................212 00:00:23.5468750 find range (small, indexed)............................440 00:00:11.3593750 find range (medium, indexed)...........................294 00:00:16.9531250 find range (large, indexed)............................199 00:00:25.0625000 Press any key to continue... For starters, I can get better non-batch insert performance from SQL Server Express. What really struck me, however, was the slow performance of the find_nnnn queries. Why is retrieving data from MongoDB so slow? What am I missing? Edit: This was all on the local machine, no network latency or anything. MongoDB's CPU usage ran at about 75% the entire time the test was running. Edit 2: Also, I ran a trace on the benchmark program and confirmed that 50% of the CPU time spent was waiting for MongoDB to return data, so it's not a performance issue with the C# driver.

Read the article
XSL Template outputting massive chunk of text, rather than HTML. But only on one section

- by Throlkim

I'm having a slightly odd situation with an XSL template. Most of it outputs fine, but a certain for-each loop is causing me problems. Here's the XML: <area> <feature type="Hall"> <Heading><![CDATA[Hall]]></Heading> <Para><![CDATA[Communal gardens, pathway leading to PVCu double glazed communal front door to]]></Para> </feature> <feature type="Entrance Hall"> <Heading><![CDATA[Communal Entrance Hall]]></Heading> <Para><![CDATA[Plain ceiling, centre light fitting, fire door through to inner hallway, wood and glazed panelled front door to]]></Para> </feature> <feature type="Inner Hall"> <Heading><![CDATA[Inner Hall]]></Heading> <Para><![CDATA[Plain ceiling with pendant light fitting and covings, security telephone, airing cupboard housing gas boiler serving domestic hot water and central heating, telephone point, storage cupboard housing gas and electric meters, wooden panelled doors off to all rooms.]]></Para> </feature> <feature type="Lounge (Reception)" width="3.05" length="4.57" units="metre"> <Heading><![CDATA[Lounge (Reception)]]></Heading> <Para><![CDATA[15' 6" x 10' 7" (4.72m x 3.23m) Window to the side and rear elevation, papered ceiling with pendant light fitting and covings, two double panelled radiators, power points, wall mounted security entry phone, TV aerial point.]]></Para> </feature> <feature type="Kitchen" width="3.05" length="3.66" units="metre"> <Heading><![CDATA[Kitchen]]></Heading> <Para><![CDATA[12' x 10' (3.66m x 3.05m) Double glazed window to the rear elevation, textured ceiling with strip lighting, range of base and wall units in Beech with brushed aluminium handles, co-ordinated working surfaces with inset stainless steel sink with mixer taps over, co-ordinated tiled splashbacks, gas and electric cooker points, large storage cupboard with shelving, power points.]]></Para> </feature> <feature type="Entrance Porch"> <Heading><![CDATA[Balcony]]></Heading> <Para><![CDATA[Views across the communal South facing garden, wrought iron balustrade.]]></Para> </feature> <feature type="Bedroom" width="3.35" length="3.96" units="metre"> <Heading><![CDATA[Bedroom One]]></Heading> <Para><![CDATA[13' 6" x 11' 5" (4.11m x 3.48m) Double glazed windows to the front and side elevations, papered ceiling with pendant light fittings and covings, single panelled radiator, power points, telephone point, security entry phone.]]></Para> </feature> <feature type="Bedroom" width="3.05" length="3.35" units="metre"> <Heading><![CDATA[Bedroom Two]]></Heading> <Para><![CDATA[11' 4" x 10' 1" (3.45m x 3.07m) Double glazed window to the front elevation, plain ceiling with centre light fitting and covings, power points.]]></Para> </feature> <feature type="bathroom"> <Heading><![CDATA[Bathroom]]></Heading> <Para><![CDATA[Obscure double glazed window to the rear elevation, textured ceiling with centre light fitting and extractor fan, suite in white comprising of low level WC, wall mounted wash hand basin and walk in shower housing 'Triton T80' electric shower, co-ordinated tiled splashbacks.]]></Para> </feature> </area> And here's the section of my template that processes it: <xsl:for-each select="area"> <li> <xsl:for-each select="feature"> <li> <h5> <xsl:value-of select="Heading"/> </h5> <xsl:value-of select="Para"/> </li> </xsl:for-each> </li> </xsl:for-each> And here's the output: Hall Communal gardens, pathway leading to PVCu double glazed communal front door to Communal Entrance Hall Plain ceiling, centre light fitting, fire door through to inner hallway, wood and glazed panelled front door to Inner Hall Plain ceiling with pendant light fitting and covings, security telephone, airing cupboard housing gas boiler serving domestic hot water and central heating, telephone point, storage cupboard housing gas and electric meters, wooden panelled doors off to all rooms. Lounge (Reception) 15' 6" x 10' 7" (4.72m x 3.23m) Window to the side and rear elevation, papered ceiling with pendant light fitting and covings, two double panelled radiators, power points, wall mounted security entry phone, TV aerial point. Kitchen 12' x 10' (3.66m x 3.05m) Double glazed window to the rear elevation, textured ceiling with strip lighting, range of base and wall units in Beech with brushed aluminium handles, co-ordinated working surfaces with inset stainless steel sink with mixer taps over, co-ordinated tiled splashbacks, gas and electric cooker points, large storage cupboard with shelving, power points. Balcony Views across the communal South facing garden, wrought iron balustrade. Bedroom One 13' 6" x 11' 5" (4.11m x 3.48m) Double glazed windows to the front and side elevations, papered ceiling with pendant light fittings and covings, single panelled radiator, power points, telephone point, security entry phone. Bedroom Two 11' 4" x 10' 1" (3.45m x 3.07m) Double glazed window to the front elevation, plain ceiling with centre light fitting and covings, power points. Bathroom Obscure double glazed window to the rear elevation, textured ceiling with centre light fitting and extractor fan, suite in white comprising of low level WC, wall mounted wash hand basin and walk in shower housing 'Triton T80' electric shower, co-ordinated tiled splashbacks. For reference, here's the entire XSLT: http://pastie.org/private/eq4gjvqoc1amg9ynyf6wzg The rest of it all outputs fine - what am I missing from the above section?

Read the article
What is the fastest cyclic synchronization in Java (ExecutorService vs. CyclicBarrier vs. X)?

- by Alex Dunlop

Which Java synchronization construct is likely to provide the best performance for a concurrent, iterative processing scenario with a fixed number of threads like the one outlined below? After experimenting on my own for a while (using ExecutorService and CyclicBarrier) and being somewhat surprised by the results, I would be grateful for some expert advice and maybe some new ideas. Existing questions here do not seem to focus primarily on performance, hence this new one. Thanks in advance! The core of the app is a simple iterative data processing algorithm, parallelized to the spread the computational load across 8 cores on a Mac Pro, running OS X 10.6 and Java 1.6.0_07. The data to be processed is split into 8 blocks and each block is fed to a Runnable to be executed by one of a fixed number of threads. Parallelizing the algorithm was fairly straightforward, and it functionally works as desired, but its performance is not yet what I think it could be. The app seems to spend a lot of time in system calls synchronizing, so after some profiling I wonder whether I selected the most appropriate synchronization mechanism(s). A key requirement of the algorithm is that it needs to proceed in stages, so the threads need to sync up at the end of each stage. The main thread prepares the work (very low overhead), passes it to the threads, lets them work on it, then proceeds when all threads are done, rearranges the work (again very low overhead) and repeats the cycle. The machine is dedicated to this task, Garbage Collection is minimized by using per-thread pools of pre-allocated items, and the number of threads can be fixed (no incoming requests or the like, just one thread per CPU core). V1 - ExecutorService My first implementation used an ExecutorService with 8 worker threads. The program creates 8 tasks holding the work and then lets them work on it, roughly like this: // create one thread per CPU executorService = Executors.newFixedThreadPool( 8 ); ... // now process data in cycles while( ...) { // package data into 8 work items ... // create one Callable task per work item ... // submit the Callables to the worker threads executorService.invokeAll( taskList ); } This works well functionally (it does what it should), and for very large work items indeed all 8 CPUs become highly loaded, as much as the processing algorithm would be expected to allow (some work items will finish faster than others, then idle). However, as the work items become smaller (and this is not really under the program's control), the user CPU load shrinks dramatically: blocksize | system | user | cycles/sec 256k 1.8% 85% 1.30 64k 2.5% 77% 5.6 16k 4% 64% 22.5 4096 8% 56% 86 1024 13% 38% 227 256 17% 19% 420 64 19% 17% 948 16 19% 13% 1626 Legend: - block size = size of the work item (= computational steps) - system = system load, as shown in OS X Activity Monitor (red bar) - user = user load, as shown in OS X Activity Monitor (green bar) - cycles/sec = iterations through the main while loop, more is better The primary area of concern here is the high percentage of time spent in the system, which appears to be driven by thread synchronization calls. As expected, for smaller work items, ExecutorService.invokeAll() will require relatively more effort to sync up the threads versus the amount of work being performed in each thread. But since ExecutorService is more generic than it would need to be for this use case (it can queue tasks for threads if there are more tasks than cores), I though maybe there would be a leaner synchronization construct. V2 - CyclicBarrier The next implementation used a CyclicBarrier to sync up the threads before receiving work and after completing it, roughly as follows: main() { // create the barrier barrier = new CyclicBarrier( 8 + 1 ); // create Runable for thread, tell it about the barrier Runnable task = new WorkerThreadRunnable( barrier ); // start the threads for( int i = 0; i < 8; i++ ) { // create one thread per core new Thread( task ).start(); } while( ... ) { // tell threads about the work ... // N threads + this will call await(), then system proceeds barrier.await(); // ... now worker threads work on the work... // wait for worker threads to finish barrier.await(); } } class WorkerThreadRunnable implements Runnable { CyclicBarrier barrier; WorkerThreadRunnable( CyclicBarrier barrier ) { this.barrier = barrier; } public void run() { while( true ) { // wait for work barrier.await(); // do the work ... // wait for everyone else to finish barrier.await(); } } } Again, this works well functionally (it does what it should), and for very large work items indeed all 8 CPUs become highly loaded, as before. However, as the work items become smaller, the load still shrinks dramatically: blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.7% 78% 6.1 16k 5.5% 52% 25 4096 9% 29% 64 1024 11% 15% 117 256 12% 8% 169 64 12% 6.5% 285 16 12% 6% 377 For large work items, synchronization is negligible and the performance is identical to V1. But unexpectedly, the results of the (highly specialized) CyclicBarrier seem MUCH WORSE than those for the (generic) ExecutorService: throughput (cycles/sec) is only about 1/4th of V1. A preliminary conclusion would be that even though this seems to be the advertised ideal use case for CyclicBarrier, it performs much worse than the generic ExecutorService. V3 - Wait/Notify + CyclicBarrier It seemed worth a try to replace the first cyclic barrier await() with a simple wait/notify mechanism: main() { // create the barrier // create Runable for thread, tell it about the barrier // start the threads while( ... ) { // tell threads about the work // for each: workerThreadRunnable.setWorkItem( ... ); // ... now worker threads work on the work... // wait for worker threads to finish barrier.await(); } } class WorkerThreadRunnable implements Runnable { CyclicBarrier barrier; @NotNull volatile private Callable<Integer> workItem; WorkerThreadRunnable( CyclicBarrier barrier ) { this.barrier = barrier; this.workItem = NO_WORK; } final protected void setWorkItem( @NotNull final Callable<Integer> callable ) { synchronized( this ) { workItem = callable; notify(); } } public void run() { while( true ) { // wait for work while( true ) { synchronized( this ) { if( workItem != NO_WORK ) break; try { wait(); } catch( InterruptedException e ) { e.printStackTrace(); } } } // do the work ... // wait for everyone else to finish barrier.await(); } } } Again, this works well functionally (it does what it should). blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.4% 80% 6.3 16k 4.6% 60% 30.1 4096 8.6% 41% 98.5 1024 12% 23% 202 256 14% 11.6% 299 64 14% 10.0% 518 16 14.8% 8.7% 679 The throughput for small work items is still much worse than that of the ExecutorService, but about 2x that of the CyclicBarrier. Eliminating one CyclicBarrier eliminates half of the gap. V4 - Busy wait instead of wait/notify Since this app is the primary one running on the system and the cores idle anyway if they're not busy with a work item, why not try a busy wait for work items in each thread, even if that spins the CPU needlessly. The worker thread code changes as follows: class WorkerThreadRunnable implements Runnable { // as before final protected void setWorkItem( @NotNull final Callable<Integer> callable ) { workItem = callable; } public void run() { while( true ) { // busy-wait for work while( true ) { if( workItem != NO_WORK ) break; } // do the work ... // wait for everyone else to finish barrier.await(); } } } Also works well functionally (it does what it should). blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.2% 81% 6.3 16k 4.2% 62% 33 4096 7.5% 40% 107 1024 10.4% 23% 210 256 12.0% 12.0% 310 64 11.9% 10.2% 550 16 12.2% 8.6% 741 For small work items, this increases throughput by a further 10% over the CyclicBarrier + wait/notify variant, which is not insignificant. But it is still much lower-throughput than V1 with the ExecutorService. V5 - ? So what is the best synchronization mechanism for such a (presumably not uncommon) problem? I am weary of writing my own sync mechanism to completely replace ExecutorService (assuming that it is too generic and there has to be something that can still be taken out to make it more efficient). It is not my area of expertise and I'm concerned that I'd spend a lot of time debugging it (since I'm not even sure my wait/notify and busy wait variants are correct) for uncertain gain. Any advice would be greatly appreciated.

Read the article
Problems related to showing MessageBox from non-GUI threads

- by Hans Løken

I'm working on a heavily data-bound Win.Forms application where I've found some strange behavior. The app has separate I/O threads receiving updates through asynchronous web-requests which it then sends to the main/GUI thread for processing and updating of application-wide data-stores (which in turn may be data-bound to various GUI-elements, etc.). The server at the other end of the web-requests requires periodic requests or the session times out. I've gone through several attempted solutions of dealing with thread-issues etc. and I've observed the following behavior: If I use Control.Invoke for sending updates from I/O-thread(s) to main-thread and this update causes a MessageBox to be shown the main form's message pump stops until the user clicks the ok-button. This also blocks the I/O-thread from continuing eventually leading to timeouts on the server. If I use Control.BeginInvoke for sending updates from I/O-thread(s) to main-thread the main form's message pump does not stop, but if the processing of an update leads to a messagebox being shown, the processing of the rest of that update is halted until the user clicks ok. Since the I/O-threads keep running and the message pump keeps processing messages several BeginInvoke's for updates may be called before the one with the message box is finished. This leads to out-of-sequence updates which is unacceptable. I/O-threads add updates to a blocking queue (very similar to http://stackoverflow.com/questions/530211/creating-a-blocking-queuet-in-net/530228#530228). GUI-thread uses a Forms.Timer that periodically applies all updates in the blocking queue. This solution solves both the problem of blocking I/O threads and sequentiality of updates i.e. next update will be never be started until previous is finished. However, there is a small performance cost as well as introducing a latency in showing updates that is unacceptable in the long run. I would like update-processing in the main-thread to be event-driven rather than polling. So to my question. How should I do this to: avoid blocking the I/O-threads guarantee that updates are finished in-sequence keep the main message pump running while showing a message box as a result of an update.

Read the article
DirectShow: Video-Preview and Image (with working code)

- by xsl

Questions / Issues If someone can recommend me a good free hosting site I can provide the whole project file. As mentioned in the text below the TakePicture() method is not working properly on the HTC HD 2 device. It would be nice if someone could look at the code below and tell me if it is right or wrong what I'm doing. Introduction I recently asked a question about displaying a video preview, taking camera image and rotating a video stream with DirectShow. The tricky thing about the topic is, that it's very hard to find good examples and the documentation and the framework itself is very hard to understand for someone who is new to windows programming and C++ in general. Nevertheless I managed to create a class that implements most of this features and probably works with most mobile devices. Probably because the DirectShow implementation depends a lot on the device itself. I could only test it with the HTC HD and HTC HD2, which are known as quite incompatible. HTC HD Working: Video preview, writing photo to file Not working: Set video resolution (CRASH), set photo resolution (LOW quality) HTC HD 2 Working: Set video resolution, set photo resolution Problematic: Video Preview rotated Not working: Writing photo to file To make it easier for others by providing a working example, I decided to share everything I have got so far below. I removed all of the error handling for the sake of simplicity. As far as documentation goes, I can recommend you to read the MSDN documentation, after that the code below is pretty straight forward. void Camera::Init() { CreateComObjects(); _captureGraphBuilder->SetFiltergraph(_filterGraph); InitializeVideoFilter(); InitializeStillImageFilter(); } Dipslay a video preview (working with any tested handheld): void Camera::DisplayVideoPreview(HWND windowHandle) { IVideoWindow *_vidWin; _filterGraph->QueryInterface(IID_IMediaControl,(void **) &_mediaControl); _filterGraph->QueryInterface(IID_IVideoWindow, (void **) &_vidWin); _videoCaptureFilter->QueryInterface(IID_IAMVideoControl, (void**) &_videoControl); _captureGraphBuilder->RenderStream(&PIN_CATEGORY_PREVIEW, &MEDIATYPE_Video, _videoCaptureFilter, NULL, NULL); CRect rect; long width, height; GetClientRect(windowHandle, &rect); _vidWin->put_Owner((OAHWND)windowHandle); _vidWin->put_WindowStyle(WS_CHILD | WS_CLIPSIBLINGS); _vidWin->get_Width(&width); _vidWin->get_Height(&height); height = rect.Height(); _vidWin->put_Height(height); _vidWin->put_Width(rect.Width()); _vidWin->SetWindowPosition(0,0, rect.Width(), height); _mediaControl->Run(); } HTC HD2: If set SetPhotoResolution() is called FindPin will return E_FAIL. If not, it will create a file full of null bytes. HTC HD: Works void Camera::TakePicture(WCHAR *fileName) { CComPtr<IFileSinkFilter> fileSink; CComPtr<IPin> stillPin; CComPtr<IUnknown> unknownCaptureFilter; CComPtr<IAMVideoControl> videoControl; _imageSinkFilter.QueryInterface(&fileSink); fileSink->SetFileName(fileName, NULL); _videoCaptureFilter.QueryInterface(&unknownCaptureFilter); _captureGraphBuilder->FindPin(unknownCaptureFilter, PINDIR_OUTPUT, &PIN_CATEGORY_STILL, &MEDIATYPE_Video, FALSE, 0, &stillPin); _videoCaptureFilter.QueryInterface(&videoControl); videoControl->SetMode(stillPin, VideoControlFlag_Trigger); } Set resolution: Works great on HTC HD2. HTC HD won't allow SetVideoResolution() and only offers one low resolution photo resolution: void Camera::SetVideoResolution(int width, int height) { SetResolution(true, width, height); } void Camera::SetPhotoResolution(int width, int height) { SetResolution(false, width, height); } void Camera::SetResolution(bool video, int width, int height) { IAMStreamConfig *config; config = NULL; if (video) { _captureGraphBuilder->FindInterface(&PIN_CATEGORY_PREVIEW, &MEDIATYPE_Video, _videoCaptureFilter, IID_IAMStreamConfig, (void**) &config); } else { _captureGraphBuilder->FindInterface(&PIN_CATEGORY_STILL, &MEDIATYPE_Video, _videoCaptureFilter, IID_IAMStreamConfig, (void**) &config); } int resolutions, size; VIDEO_STREAM_CONFIG_CAPS caps; config->GetNumberOfCapabilities(&resolutions, &size); for (int i = 0; i < resolutions; i++) { AM_MEDIA_TYPE *mediaType; if (config->GetStreamCaps(i, &mediaType, reinterpret_cast<BYTE*>(&caps)) == S_OK ) { int maxWidth = caps.MaxOutputSize.cx; int maxHeigth = caps.MaxOutputSize.cy; if(maxWidth == width && maxHeigth == height) { VIDEOINFOHEADER *info = reinterpret_cast<VIDEOINFOHEADER*>(mediaType->pbFormat); info->bmiHeader.biWidth = maxWidth; info->bmiHeader.biHeight = maxHeigth; info->bmiHeader.biSizeImage = DIBSIZE(info->bmiHeader); config->SetFormat(mediaType); DeleteMediaType(mediaType); break; } DeleteMediaType(mediaType); } } } Other methods used to build the filter graph and create the COM objects: void Camera::CreateComObjects() { CoInitialize(NULL); CoCreateInstance(CLSID_CaptureGraphBuilder, NULL, CLSCTX_INPROC_SERVER, IID_ICaptureGraphBuilder2, (void **) &_captureGraphBuilder); CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC_SERVER, IID_IGraphBuilder, (void **) &_filterGraph); CoCreateInstance(CLSID_VideoCapture, NULL, CLSCTX_INPROC, IID_IBaseFilter, (void**) &_videoCaptureFilter); CoCreateInstance(CLSID_IMGSinkFilter, NULL, CLSCTX_INPROC, IID_IBaseFilter, (void**) &_imageSinkFilter); } void Camera::InitializeVideoFilter() { _videoCaptureFilter->QueryInterface(&_propertyBag); wchar_t deviceName[MAX_PATH] = L"\0"; GetDeviceName(deviceName); CComVariant comName = deviceName; CPropertyBag propertyBag; propertyBag.Write(L"VCapName", &comName); _propertyBag->Load(&propertyBag, NULL); _filterGraph->AddFilter(_videoCaptureFilter, L"Video Capture Filter Source"); } void Camera::InitializeStillImageFilter() { _filterGraph->AddFilter(_imageSinkFilter, L"Still image filter"); _captureGraphBuilder->RenderStream(&PIN_CATEGORY_STILL, &MEDIATYPE_Video, _videoCaptureFilter, NULL, _imageSinkFilter); } void Camera::GetDeviceName(WCHAR *deviceName) { HRESULT hr = S_OK; HANDLE handle = NULL; DEVMGR_DEVICE_INFORMATION di; GUID guidCamera = { 0xCB998A05, 0x122C, 0x4166, 0x84, 0x6A, 0x93, 0x3E, 0x4D, 0x7E, 0x3C, 0x86 }; di.dwSize = sizeof(di); handle = FindFirstDevice(DeviceSearchByGuid, &guidCamera, &di); StringCchCopy(deviceName, MAX_PATH, di.szLegacyName); } Full header file: #ifndef __CAMERA_H__ #define __CAMERA_H__ class Camera { public: void Init(); void DisplayVideoPreview(HWND windowHandle); void TakePicture(WCHAR *fileName); void SetVideoResolution(int width, int height); void SetPhotoResolution(int width, int height); private: CComPtr<ICaptureGraphBuilder2> _captureGraphBuilder; CComPtr<IGraphBuilder> _filterGraph; CComPtr<IBaseFilter> _videoCaptureFilter; CComPtr<IPersistPropertyBag> _propertyBag; CComPtr<IMediaControl> _mediaControl; CComPtr<IAMVideoControl> _videoControl; CComPtr<IBaseFilter> _imageSinkFilter; void GetDeviceName(WCHAR *deviceName); void InitializeVideoFilter(); void InitializeStillImageFilter(); void CreateComObjects(); void SetResolution(bool video, int width, int height); }; #endif

Read the article
What's the best Scala build system?

- by gatoatigrado

I've seen questions about IDE's here -- Which is the best IDE for Scala development? and What is the current state of tooling for Scala?, but I've had mixed experiences with IDEs. Right now, I'm using the Eclipse IDE with the automatic workspace refresh option, and KDE 4's Kate as my text editor. Here are some of the problems I'd like to solve: use my own editor IDEs are really geared at everyone using their components. I like Kate better, but the refresh system is very annoying (it doesn't use inotify, rather, maybe a 10s polling interval). The reason I don't use the built-in text editor is because broken auto-complete functionalities cause the IDE to hang for maybe 10s. rebuild only modified files The Eclipse build system is broken. It doesn't know when to rebuild classes. I find myself almost half of the time going to project-clean. Worse, it seems even after it has finished building my project, a few minutes later it will pop up with some bizarre error (edit - these errors appear to be things that were previously solved with a project clean, but then come back up...). Finally, setting "Preferences / Continue launch if project contains errors" to "prompt" seems to have no effect for Scala projects (i.e. it always launches even if there are errors). build customization I can use the "nightly" release, but I'll want to modify and use my own Scala builds, not the compiler that's built into the IDE's plugin. It would also be nice to pass [e.g.] -Xprint:jvm to the compiler (to print out lowered code). fast compiling Though Eclipse doesn't always build right, it does seem snappy -- even more so than fsc. I looked at Ant and Maven, though haven't employed either yet (I'll also need to spend time solving #3 and #4). I wanted to see if anyone has other suggestions before I spend time getting a suboptimal build system working. Thanks in advance! UPDATE - I'm now using Maven, passing a project as a compiler plugin to it. It seems fast enough; I'm not sure what kind of jar caching Maven does. A current repository for Scala 2.8.0 is available [link]. The archetypes are very cool, and cross-platform support seems very good. However, about compile issues, I'm not sure if fsc is actually fixed, or my project is stable enough (e.g. class names aren't changing) -- running it manually doesn't bother me as much. If you'd like to see an example, feel free to browse the pom.xml files I'm using [github]. UPDATE 2 - from benchmarks I've seen, Daniel Spiewak is right that buildr's faster than Maven (and, if one is doing incremental changes, Maven's 10 second latency gets annoying), so if one can craft a compatible build file, then it's probably worth it...

Read the article
The best way to separate admin functionality from a public site?

- by AndrewO

I'm working on a site that's grown both in terms of user-base and functionality to the point where it's becoming evident that some of the admin tasks should be separate from the public website. I was wondering what the best way to do this would be. For example, the site has a large social component to it, and a public sales interface. But at the same time, there's back office tasks, bulk upload processing, dashboards (with long running queries), and customer relations tools in the admin section that I would like to not be effected by spikes in public traffic (or effect the public-facing response time). The site is running on a fairly standard Rails/MySQL/Linux stack, but I think this is more of an architecture problem than an implementation one: mainly, how does one keep the data and business logic in sync between these different applications? Some strategies that I'm evaluating: 1) Create a slave database of the public facing database on another machine. Extract out all of the model and library code so that it can be shared between the applications. Create new controllers and views for the admin interfaces. I have limited experience with replication and am not even sure that it's supposed to be used this way (most of the time I've seen it, it's been for scaling out the read capabilities of the same application, rather than having multiple different ones). I'm also worried about the potential for latency issues if the slave is not on the same network. 2) Create new more task/department-specific applications and use a message oriented middleware to integrate them. I read Enterprise Integration Patterns awhile back and they seemed to advocate this for distributed systems. (Alternatively, in some cases the basic Rails-style RESTful API functionality might suffice.) But, I have nightmares about data synchronization issues and the massive re-architecting that this would entail. 3) Some mixture of the two. For example, the only public information necessary for some of the back office tasks is a read-only completion time or status. Would it make sense to have that on a completely separate system and send the data to public? Meanwhile, the user/group admin functionality would be run on a separate system sharing the database? The downside is, this seems to keep many of the concerns I have with the first two, especially the re-architecting. I'm sure the answers are going to be highly dependent on a site's specific needs, but I'd love to hear success (or failure) stories.

Read the article
HTTP Post requests using HttpClient take 2 seconds, why?

- by pableu

Update: You might better hold off this for a bit, I just noticed I could be my fault after all. Working on this all afternoon, and then I find a flaw ten minutes after posting here, ts. Hi, I'am currently coding an android app that submits stuff in the background using HTTP Post and AsyncTask. I use the org.apache.http.client Package for this. I based my code on this example. Basically, my code looks like this: public void postData() { // Create a new HttpClient and Post Header HttpClient httpclient = new DefaultHttpClient(); HttpPost httppost = new HttpPost("http://192.168.1.137:8880/form"); try { List<NameValuePair> nameValuePairs = new ArrayList<NameValuePair>(2); nameValuePairs.add(new BasicNameValuePair("id", "12345")); nameValuePairs.add(new BasicNameValuePair("stringdata", "AndDev is Cool!")); httppost.setEntity(new UrlEncodedFormEntity(nameValuePairs)); // Execute HTTP Post Request HttpResponse response = httpclient.execute(httppost); } catch (ClientProtocolException e) { Log.e(TAG,e.toString()); } catch (IOException e) { Log.e(TAG,e.toString()); } } The problem is that the httpclient.execute(..) line takes around 1.5 to 3 seconds, and I do not understand why. Just requesting a page with HTTP Get takes around 80 ms or so, so the problem doesn't seem to be the network latency itself. The problem doesn't seem to be on the server side either, I have also tried POSTing data to http://www.disney.com/ with similarly slow results. And Firebug shows 1 ms response time when POSTing data to my server locally. This happens on the Emulator and with my Nexus One (both with Android 2.2). If you want to look at the complete code, I've put it on GitHub. It's just a dummy program to do HTTP Post in the background using AsyncTask on the push of a button. It's my first Android app, and my first java code for a long time. And incidentially, also my first question on Stackoverflow ;-) Any ideas why httpclient.execute(httppost) takes so long?

Read the article
How to handle very frequent updates to a Lucene index

- by fsm

I am trying to prototype an indexing/search application which uses very volatile indexing data sources (forums, social networks etc), here are some of the performance requirements, Very fast turn-around time (by this I mean that any new data (such as a new message on a forum) should be available in the search results very soon (less than a minute)) I need to discard old documents on a fairly regular basis to ensure that the search results are not dated. Last but not least, the search application needs to be responsive. (latency on the order of 100 milliseconds, and should support at least 10 qps) All of the requirements I have currently can be met w/o using Lucene (and that would let me satisfy all 1,2 and 3), but I am anticipating other requirements in the future (like search relevance etc) which Lucene makes easier to implement. However, since Lucene is designed for use cases far more complex than the one I'm currently working on, I'm having a hard time satisfying my performance requirements. Here are some questions, a. I read that the optimize() method in the IndexWriter class is expensive, and should not be used by applications that do frequent updates, what are the alternatives? b. In order to do incremental updates, I need to keep committing new data, and also keep refreshing the index reader to make sure it has the new data available. These are going to affect 1 and 3 above. Should I try duplicate indices? What are some common approaches to solving this problem? c. I know that Lucene provides a delete method, which lets you delete all documents that match a certain query, in my case, I need to delete all documents which are older than a certain age, now one option is to add a date field to every document and use that to delete documents later. Is it possible to do range queries on document ids (I can create my own id field since I think that the one created by lucene keeps changing) to delete documents? Is it any faster than comparing dates represented as strings? I know these are very open questions, so I am not looking for a detailed answer, I will try to treat all of your answers as suggestions and use them to inform my design. Thanks! Please let me know if you need any other information.

Read the article
AudioTrack lag: obtainBuffer timed out

- by BTR

I'm playing WAVs on my Android phone by loading the file and feeding the bytes into AudioTrack.write() via the FileInputStream BufferedInputStream DataInputStream method. The audio plays fine and when it is, I can easily adjust sample rate, volume, etc on the fly with nice performance. However, it's taking about two full seconds for a track to start playing. I know AudioTrack has an inescapable delay, but this is ridiculous. Every time I play a track, I get this: 03-13 14:55:57.100: WARN/AudioTrack(3454): obtainBuffer timed out (is the CPU pegged?) 0x2e9348 user=00000960, server=00000000 03-13 14:55:57.340: WARN/AudioFlinger(72): write blocked for 233 msecs, 9 delayed writes, thread 0xba28 I've noticed that the delayed write count increases by one every time I play a track -- even across multiple sessions -- from the time the phone has been turned on. The block time is always 230 - 240ms, which makes sense considering a minimum buffer size of 9600 on this device (9600 / 44100). I've seen this message in countless searches on the Internet, but it usually seems to be related to not playing audio at all or skipping audio. In my case, it's just a delayed start. I'm running all my code in a high priority thread. Here's a truncated-yet-functional version of what I'm doing. This is the thread callback in my playback class. Again, this works (only playing 16-bit, 44.1kHz, stereo files right now), it just takes forever to start and has that obtainBuffer/delayed write message every time. public void run() { // Load file FileInputStream mFileInputStream; try { // mFile is instance of custom file class -- this is correct, // so don't sweat this line mFileInputStream = new FileInputStream(mFile.path()); } catch (FileNotFoundException e) {} BufferedInputStream mBufferedInputStream = new BufferedInputStream(mFileInputStream, mBufferLength); DataInputStream mDataInputStream = new DataInputStream(mBufferedInputStream); // Skip header try { if (mDataInputStream.available() > 44) mDataInputStream.skipBytes(44); } catch (IOException e) {} // Initialize device mAudioTrack = new AudioTrack(AudioManager.STREAM_MUSIC, ConfigManager.SAMPLE_RATE, AudioFormat.CHANNEL_CONFIGURATION_STEREO, AudioFormat.ENCODING_PCM_16BIT, ConfigManager.AUDIO_BUFFER_LENGTH, AudioTrack.MODE_STREAM); mAudioTrack.play(); // Initialize buffer byte[] mByteArray = new byte[mBufferLength]; int mBytesToWrite = 0; int mBytesWritten = 0; // Loop to keep thread running while (mRun) { // This flag is turned on when the user presses "play" while (mPlaying) { try { // Check if data is available if (mDataInputStream.available() > 0) { // Read data from file and write to audio device mBytesToWrite = mDataInputStream.read(mByteArray, 0, mBufferLength); mBytesWritten += mAudioTrack.write(mByteArray, 0, mBytesToWrite); } } catch (IOException e) { } } } } If I can get past the artificially long lag, I can easily deal with the inherit latency by starting my write at a later, predictable position (ie, skip past the minimum buffer length when I start playing a file).

Read the article
SQL SERVER – Update Statistics are Sampled By Default

- by pinaldave

After reading my earlier post SQL SERVER – Create Primary Key with Specific Name when Creating Table on Statistics, I have received another question by a blog reader. The question is as follows: Question: Are the statistics sampled by default? Answer: Yes. The sampling rate can be specified by the user and it can be anywhere between a very low value to 100%. Let us do a small experiment to verify if the auto update on statistics is left on. Also, let’s examine a very large table that is created and statistics by default- whether the statistics are sampled or not. USE [AdventureWorks] GO -- Create Table CREATE TABLE [dbo].[StatsTest]( [ID] [int] IDENTITY(1,1) NOT NULL, [FirstName] [varchar](100) NULL, [LastName] [varchar](100) NULL, [City] [varchar](100) NULL, CONSTRAINT [PK_StatsTest] PRIMARY KEY CLUSTERED ([ID] ASC) ) ON [PRIMARY] GO -- Insert 1 Million Rows INSERT INTO [dbo].[StatsTest] (FirstName,LastName,City) SELECT TOP 1000000 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Update the statistics UPDATE STATISTICS [dbo].[StatsTest] GO -- Shows the statistics DBCC SHOW_STATISTICS ("StatsTest"PK_StatsTest) GO -- Clean up DROP TABLE [dbo].[StatsTest] GO Now let us observe the result of the DBCC SHOW_STATISTICS. The result shows that Resultset is for sure sampling for a large dataset. The percentage of sampling is based on data distribution as well as the kind of data in the table. Before dropping the table, let us check first the size of the table. The size of the table is 35 MB. Now, let us run the above code with lesser number of the rows. USE [AdventureWorks] GO -- Create Table CREATE TABLE [dbo].[StatsTest]( [ID] [int] IDENTITY(1,1) NOT NULL, [FirstName] [varchar](100) NULL, [LastName] [varchar](100) NULL, [City] [varchar](100) NULL, CONSTRAINT [PK_StatsTest] PRIMARY KEY CLUSTERED ([ID] ASC) ) ON [PRIMARY] GO -- Insert 1 Hundred Thousand Rows INSERT INTO [dbo].[StatsTest] (FirstName,LastName,City) SELECT TOP 100000 'Bob', CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%2 = 1 THEN 'Smith' ELSE 'Brown' END, CASE WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 1 THEN 'New York' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 5 THEN 'San Marino' WHEN ROW_NUMBER() OVER (ORDER BY a.name)%10 = 3 THEN 'Los Angeles' ELSE 'Houston' END FROM sys.all_objects a CROSS JOIN sys.all_objects b GO -- Update the statistics UPDATE STATISTICS [dbo].[StatsTest] GO -- Shows the statistics DBCC SHOW_STATISTICS ("StatsTest"PK_StatsTest) GO -- Clean up DROP TABLE [dbo].[StatsTest] GO You can see that Rows Sampled is just the same as Rows of the table. In this case, the sample rate is 100%. Before dropping the table, let us also check the size of the table. The size of the table is less than 4 MB. Let us compare the Result set just for a valid reference. Test 1: Total Rows: 1000000, Rows Sampled: 255420, Size of the Table: 35.516 MB Test 2: Total Rows: 100000, Rows Sampled: 100000, Size of the Table: 3.555 MB The reason behind the sample in the Test1 is that the data space is larger than 8 MB, and therefore it uses more than 1024 data pages. If the data space is smaller than 8 MB and uses less than 1024 data pages, then the sampling does not happen. Sampling aids in reducing excessive data scan; however, sometimes it reduces the accuracy of the data as well. Please note that this is just a sample test and there is no way it can be claimed as a benchmark test. The result can be dissimilar on different machines. There are lots of other information can be included when talking about this subject. I will write detail post covering all the subject very soon. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: Pinal Dave, SQL, SQL Authority, SQL Index, SQL Optimization, SQL Performance, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: SQL Statistics

Read the article
Inspire Geek Love with These Hilarious Geek Valentines

- by Eric Z Goodnight

Want to send some Geek Love to that special someone? Why not do it with these elementary school throwback valentines, and win their heart this upcoming Valentine’s day—the geek way! Read on to see the simple method to make your own custom Valentines, as well as download a set of eleven ready-made ones any geek guy or gal should be delighted get. It’s amore! How to Make Custom Valentines A size we’ve used for all of our Valentines is a 3” x 4” at 150 dpi. This is fairly low resolution for print, but makes a great graphic to email. With your new image open, Navigate to Edit > Fill and fill your background layer with a rich, red color (or whatever appeals to you.) By setting “Use” to “Foreground color as shown above, you’ll paint whatever foreground color you have in your color picker. Press to select the text tool. Set a few text objects, using whatever fonts appeal to you. Pixel fonts, like this one, are freely downloadable, and we’ve already shared a great list of Valentines fonts. Copy an image from the internet if you’re confident your sweetie won’t mind a bit of fair use of copyrighted imagery. If they do mind, find yourself some great Creative Commons images. to do a free transform on your image, sizing it to whatever dimensions work best for your design. Right click your newly added image layer in your panel and Choose “Blending Effects” to pick a Layer Style. “Stroke” with this setting adds a black line around your image. Also turning on “Outer Glow” with this setting puts a dark black shadow around the top and bottom (and sides, although they are hidden). Add some more text. Double entendre is recommended. Click and hold down on the “Rectangle Tool” to get the “Custom Shape Tool.” The custom shape tool has useful vector shapes built into it. Find the “Shape” dropdown in the menu to find the heart image. Click and drag to create a vector heart shape in your image. Your layers panel is where you can change the color, if it happens to use the wrong one at first. Click the color swatch in your panel, highlighted in blue above. will transform your vector heart. You can also use it to rotate, if you like. Add some details, like this Power or Standby symbol, which can be found in symbol fonts, taken from images online, or drawn by hand. Your Valentine is now ready to be saved as a JPG or PNG and sent to the object of your affection! Keep reading to see a list of 11 downloadable How-To Geek Valentines, including this one and the three from the header image. Download The HTG Set of Valentines Download the HTG Geek Valentines (ZIP) Download the HTG Geek Valentines (ZIP) When he’s not wooing ladies with Valentines cards, you can email the author at [email protected] with your Photoshop and Graphics questions. Your questions may be featured in a future How-To Geek article! Latest Features How-To Geek ETC Inspire Geek Love with These Hilarious Geek Valentines How to Integrate Dropbox with Pages, Keynote, and Numbers on iPad RGB? CMYK? Alpha? What Are Image Channels and What Do They Mean? How to Recover that Photo, Picture or File You Deleted Accidentally How To Colorize Black and White Vintage Photographs in Photoshop How To Get SSH Command-Line Access to Windows 7 Using Cygwin How to Kid Proof Your Computer’s Power and Reset Buttons Microsoft’s Windows Media Player Extension Adds H.264 Support Back to Google Chrome Android Notifier Pushes Android Notices to Your Desktop Dead Space 2 Theme for Chrome and Iron Carl Sagan and Halo Reach Mashup – We Humans are Capable of Greatness [Video] Battle the Necromorphs Once Again on Your Desktop with the Dead Space 2 Theme for Windows 7

Read the article
Airtel 3G in Chennai – User experience, Price & What’s the catch?

- by Boonei

Finally ! Here we are with Airtel 3G in India. Now Airtel customers can have a go at real 3G speed. Sources suggest that the delay in rolling out 3G was due to hardware problems. It was provided by Ericsson. Now first things first. Let me get to the point. I had subscribed to Airtel’s 3G pack Rs.100 for 100 MB. This is to check out how good it is, did not want to pay a hefty sum at the first instance. It was pretty smooth upgrading.. After the upgrade I did see the much awaited 3G signal bar on my phone. Ok! now its testing time. User experience First I did a bit of browsing, boy ! it was pretty quick, web pages loaded in a jiffy. I really did not time it because it loaded really quick. I loaded a YouTube Video, no buffering, watched the 4 min Video with no problems, it took around 6 MB of data usage Made a Skype call for about 6 min, voice clarity was really good and data usage was around 4-5 MB Tried Google Maps everything was so fast could not see the difference between computer and my phone, used it for about couple of minutes. Did listen to an Online Radio for about 5 min took about 8 MB of data usage Guess there is no need to say about Facebook or Twitter. It was good obviously. Video Call – Not yet tested Price – Do you get what you pay for ? 3G speed is fantastic, you have to really feel it to enjoy it. But currently in Airtel, 3G is available only in 3 places wiz. Bengaluru, Chennai, Coimbatore. ok ! Its not even there in all the metros? hmmm. 3G signal was not available in all parts of Chennai, often in many places it changed to 2G. Let alone all the places, even in my house when walking from one room to another sometimes its shows 2G. When it chaged from 3G to 2G there was lag in the application when it was loading data which often made me wonder if the application hanged. Currently prices not low. 2G plans in Airtel is Rs.98 for 2GB and for Rs.100 its only 100MB in 3G. Now you decide please, it’s quite a debate. The Catch – There is always a catch right ? If you have bought 3G connection and in places where 3G is not available (2G) and use any application that requires data connections (youtube, browse, chat etc) its changed with 3G!. Meaning if you have bought 100MB of 3G by paying Rs.100 like I did, suppose you used the connection for about 10MB using 2G, then it would reduce from the 100MB to 90 MB….That’s bad ! You cannot have 2G and 3G plans activated at the same point of time in your phone. You will pay 3G price for using 2G. This article titled,Airtel 3G in Chennai – User experience, Price & What’s the catch?, was originally published at Tech Dreams. Grab our rss feed or fan us on Facebook to get updates from us.

Read the article
Raspberry Pi entrance signed backed by Umbraco - Part 1

- by Chris Houston

Being experts on all things Umbraco, we jumped at the chance to help our client, QV Offices, with their pressing signage predicament. They needed to display a sign in the entrance to their building and approached us for our advice. Of course it had to be electronic: displaying multiple names of their serviced office clients, meeting room bookings and on-the-pulse promotions. But with a winding Victorian staircase and minimal storage space how could the monitor be run, updated and managed? That’s where we came in…Raspberry PiUmbraco CMSAutomatic updatesAutomated monitor of the signPower saving when the screen is not in useMounting the screenThe screen that has been used is a standard LED low energy Full HD screen and has been mounted on the wall using it's VESA mounting points, as the wall is a stud wall we were able to add an access panel behind the screen to feed through the mains, HDMI and sensor cables.The Raspberry Pi is then tucked away out of sight in the main electrical cupboard which just happens to be next to the sign, we had an electrician add a power point inside this cupboard to allow us to power the screen and the Raspberry Pi.Designing the interface and editing the contentAlthough a room sign was the initial requirement from QV Offices, their medium term goal has always been to add online meeting booking to their website and hence we suggested adding information about the current and next day's meetings to the sign that would be pulled directly from their online booking system.We produced the design and built the web page to fit exactly on a 1920 x 1080 screen (Full HD in Portrait)As you would expect all the information can be edited via an Umbraco CMS, they are able to add floors, rooms, clients and virtual clients as well as add meeting bookings to their meeting diary.How we configured the Raspberry PiAfter receiving a new Raspberry Pi we downloaded the latest release of Raspbian operating system and followed the official guide which shows how to copy the OS onto an SD card from a Mac, we then followed the majority of steps on this useful guide: 10 Things to Do After Buying a Raspberry Pi.Installing ChromiumWe chose to use the Chromium web browser which for those who do not know is the open sourced version of Google Chrome. You can install this from the terminal with the following command:sudo apt-get install chromium-browserInstalling UnclutterWe found this little application which automatically hides the mouse pointer, it is used in the script below and is installed using the following command:sudo apt-get install unclutterAuto start Chromium and disabling the screen saver, power saving and mouseWhen the Raspberry Pi has been installed it will not have a keyboard or mouse and hence if their was a power cut we needed it to always boot and re-loaded Chromium with the correct URL.Our preferred command line text editor is Nano and I have assumed you know how to use this editor or will be able to work it out pretty quickly.So using the following command:sudo nano /etc/xdg/lxsession/LXDE/autostartWe then changed the autostart file content to:@lxpanel --profile LXDE@pcmanfm --desktop --profile LXDE@xscreensaver -no-splash@xset s off@xset -dpms@xset s noblank@chromium --kiosk --incognito http://www.qvoffices.com/someURL@unclutter -idle 0The first few commands turn off the screen saver and power saving, we then open Cromium in Kiosk Mode (full screen with no menu etc) and pass in the URL to use (I have changed the URL in this example) We found a useful blog post with the Cromium command line switches.Finally we also open an application called Unclutter which auto hides the mouse after 0 seconds, so you will never see a mouse on the sign.We also had to edit the following file:sudo nano /etc/lightdm/lightdm.confAnd added the following line under the [SeatDefault] section:xserver-command=X -s 0 dpmsRefreshing the screenWe decided to try and add a scheduled task that would trigger Chromium to reload the page, at some point in the future we might well change this to using Javascript to update the content, but for now this works fine.First we installed the XDOTool which enables you to script Keyboard commands:sudo apt-get install xdotoolWe used the Refreshing Chromium Browser by Shell Script post as a reference and created the following shell script (which we called refreshing.sh):export DISPLAY=":0"WID=$(xdotool search --onlyvisible --class chromium|head -1)xdotool windowactivate ${WID}xdotool key ctrl+F5This selects the correct display and then sends a CTRL + F5 to refresh Chromium.You will need to give this file execute permissions:chmod a=rwx refreshing.shNow we have the script file setup we just need to schedule it to call this script periodically which is done by using Crontab, to edit this you use the following command:crontab -eAnd we added the following:*/5 * * * * DISPLAY=":.0" /home/pi/scripts/refreshing.sh >/home/pi/cronlog.log 2>&1This calls our script every 5 minutes to refresh the display and it logs any errors to the cronlog.log file.SummaryQV Offices now have a richer and more manageable booking system than they did before we started, and a great new sign to boot.How could we make sure that the sign was running smoothly downstairs in a busy office centre? A second post will follow outlining exactly how Vizioz enabled QV Offices to monitor their sign simply and remotely, from the comfort of their desks.

Read the article
Convert YouTube Videos to MP3 with YouTube Downloader

- by DigitalGeekery

Are you looking for a way to take the music videos you watch on YouTube and convert them to MP3? Today we take a look at an easy way to convert those YouTube videos to MP3 for free with YouTube Downloader. The YouTube Downloader functions in two steps. First, it downloads the video from YouTube in MP4 format, and then allows you to convert that MP4 file to MP3. Note: It also supports conversion conversion to some other formats such as AVI video, MOV, iPhone, PSP, 3GP, and WMV. Installation and usage Download and Install YouTube Downloader. (See download link below) Open the YouTube Downloader by clicking on the desktop icon. Find a YouTube video you’d like to convert to MP3 and copy the URL. Paste the URL into the “Enter video URL” text box in YouTube Downloader. When you hover your mouse over the text box, the text box will auto-fill with the URL from your clipboard. Select the “Download video from YouTube” radio button and click “Ok.” Choose a folder to location to download your YouTube video and click “Save.” The video is downloaded in MP4 format. Now wait while the video is downloaded to your hard drive. Select the “Convert video (previously downloaded) from file” radio button. Click the (…) button to the right of the “Select video file” text box to browse for and select the MP4 file you just downloaded. Then select “MPEG Audio Layer (MP3) from the “Convert to” drop down list. Select “OK” to begin the conversion. Choose the conversion quality by moving the slider to the right or left. The options are: Low (96kbps bite rate), Medium (128kbps bit rate), Optimal (192kbps bit rate), and High 256kbps bit rate). Here you can select the output volume as well. Click “OK” when finished. If there is a portion of the beginning or end of the video that you wish to cut out of the MP3, select the “Cut video” check box and choose a Start and End time. Click “OK” when finished. Note: The start and end time represent the audio portion of the MP3 you wish to keep. All portions before and after these times will be cut. The conversion process will begin and should only take a few moments. Times will vary depending on the size of the video you’re converting. Conversion was successful! The MP3 you converted will be in the same directory you downloaded the video to. Now you’re ready to listen to your MP3 or import it to your Zune, iTunes, or music library. You may also want to delete the MP4 files after the conversion if you will no longer need them. Conclusion YouTube Downloader features a very simple interface that’s user friendly and easy to use. It comes in handy when you watch videos that look horrible, but the sound quality is good. Or if you just need to hear the audio of something posted and don’t need the video. It also allows you to download from Google Video, MySpace, and others. Download YouTube Downloader Similar Articles Productive Geek Tips Download YouTube Videos with Cheetah YouTube DownloaderWatch YouTube Videos in Cinema Style in FirefoxStop YouTube Videos from Automatically Playing in FirefoxRemove Unsuitable Comments from YouTubeImprove YouTube Video Viewing in Google Chrome TouchFreeze Alternative in AutoHotkey The Icy Undertow Desktop Windows Home Server – Backup to LAN The Clear & Clean Desktop Use This Bookmarklet to Easily Get Albums Use AutoHotkey to Assign a Hotkey to a Specific Window Latest Software Reviews Tinyhacker Random Tips Revo Uninstaller Pro Registry Mechanic 9 for Windows PC Tools Internet Security Suite 2010 PCmover Professional Windows Media Player 12: Tweak Video & Sound with Playback Enhancements Own a cell phone, or does a cell phone own you? Make your Joomla & Drupal Sites Mobile with OSMOBI Integrate Twitter and Delicious and Make Life Easier Design Your Web Pages Using the Golden Ratio Worldwide Growth of the Internet

Read the article
DevConnections Slides and Samples Posted

- by Rick Strahl

I’ve posted the slides and samples to my DevConnections Sessions for anyone interested. I had a lot of fun with my sessions this time around mainly because the sessions picked were a little off the beaten track (well, the handlers/modules and e-commerce sessions anyway). For those of you that attended I hope you found the sessions useful. For the rest of you – you can check out the slides and samples if you like. Here’s what was covered: Introduction to jQuery with ASP.NET This session covered mostly the client side of jQuery demonstrated on a small sample page with a variety of incrementally built up examples of selection and page manipulation. This session also introduces some of the basics of AJAX communication talking to ASP.NET. When I do this session it never turns out exactly the same way and this time around the examples were on the more basic side and purely done with hands on demonstrations rather than walk throughs of more complex examples. Alas this session always feels like it needs another half an hour to get through the full sortiment of functionality. The slides and samples cover a wider variety of topics and there are many examples that demonstrate more advanced operations like interacting with WCF REST services, using client templating and building rich client only windowed interfaces. Download Low Level ASP.NET: Handlers and Modules This session was a look at the ASP.NET pipeline and it discusses some of the ASP.NET base architecture and key components from HttpRuntime on up through the various modules and handlers that make up the ASP.NET/IIS pipeline. This session is fun as there are a number of cool examples that demonstrate the power and flexibility of ASP.NET, but some of the examples were external and interfacing with other technologies so they’re not actually included in the downloadable samples. However, there are still a few cool ones in there – there’s an image resizing handler, an image overlay module that stamps images with Sample if loaded from a certain folder, an OpenID authentication module (which failed during the demo due to the crappy internet connection at DevConnections this year :-}), Response filtering using a generic filter stream component, a generic error handler and a few others. The slides cover a lot of the ASP.NET pipeline flow and various HttpRuntime components. Download Electronic Payment Processing in ASP.NET Applications This session covered the business end and integration of electronic credit card processing and PayPal. A good part of this session deals with what’s involved in payment processing, getting signed up and who you have to deal with for your merchant account. We then took a look at integration of credit card processing via some generic components provided with the session that allow processing using a unified class interface with specific implementations for several of the most common gateway providers including Authorize.NET, PayFlowPro, LinkPoint, BluePay etc. We also briefly looked at PayPal Classic implementation which provides a quick and cheap if not quite as professional mechanism for taking payments online. The samples provide the Credit Card processing wrappers for the various gateway providers as well as a PayPal helper class to generate the PayPal redirect urls as well as helper code for dealing with IPN callbacks. Download Hope some of you will find the material useful. Enjoy.© Rick Strahl, West Wind Technologies, 2005-2010Posted in ASP.NET

Read the article
What are developer's problems with helpful error messages?

- by Moo-Juice

It continue to astounds me that, in this day and age, products that have years of use under their belt, built by teams of professionals, still to this day - fail to provide helpful error messages to the user. In some cases, the addition of just a little piece of extra information could save a user hours of trouble. A program that generates an error, generated it for a reason. It has everything at its disposal to inform the user as much as it can, why something failed. And yet it seems that providing information to aid the user is a low-priority. I think this is a huge failing. One example is from SQL Server. When you try and restore a database that is in use, it quite rightly won't let you. SQL Server knows what processes and applications are accessing it. Why can't it include information about the process(es) that are using the database? I know not everyone passes an Applicatio_Name attribute on their connection string, but even a hint about the machine in question could be helpful. Another candidate, also SQL Server (and mySQL) is the lovely string or binary data would be truncated error message and equivalents. A lot of the time, a simple perusal of the SQL statement that was generated and the table shows which column is the culprit. This isn't always the case, and if the database engine picked up on the error, why can't it save us that time and just tells us which damned column it was? On this example, you could argue that there may be a performance hit to checking it and that this would impede the writer. Fine, I'll buy that. How about, once the database engine knows there is an error, it does a quick comparison after-the-fact, between values that were going to be stored, versus the column lengths. Then display that to the user. ASP.NET's horrid Table Adapters are also guilty. Queries can be executed and one can be given an error message saying that a constraint somewhere is being violated. Thanks for that. Time to compare my data model against the database, because the developers are too lazy to provide even a row number, or example data. (For the record, I'd never use this data-access method by choice, it's just a project I have inherited!). Whenever I throw an exception from my C# or C++ code, I provide everything I have at hand to the user. The decision has been made to throw it, so the more information I can give, the better. Why did my function throw an exception? What was passed in, and what was expected? It takes me just a little longer to put something meaningful in the body of an exception message. Hell, it does nothing but help me whilst I develop, because I know my code throws things that are meaningful. One could argue that complicated exception messages should not be displayed to the user. Whilst I disagree with that, it is an argument that can easily be appeased by having a different level of verbosity depending on your build. Even then, the users of ASP.NET and SQL Server are not your typical users, and would prefer something full of verbosity and yummy information because they can track down their problems faster. Why to developers think it is okay, in this day and age, to provide the bare minimum amount of information when an error occurs? It's 2011 guys, come on.

Read the article
USB software protection dongle for Java with an SDK which is cross-platform “for real”. Does it exist?

- by Unai Vivi

What I'd like to ask is if anybody knows about an hardware USB-dongle for software protection which offers a very complete out-of-the-box API support for cross-platform Java deployments. Its SDK should provide a jar (only one, not one different library per OS & bitness) ready to be added to one's project as a library. The jar should contain all the native stuff for the various OSes and bitnesses From the application's point of view, one should continue to write (api calls) once and run everywhere, without having to care where the end-user will run the software The provided jar should itself deal with loading the appropriate native library Does such a thing exist? With what I've tried so far, you have different APIs and compiled libraries for win32, linux32, win64, linux64, etc (or you even have to compile stuff yourself on the target machine), but hey, we're doing Java here, we don't know (and don't care) where the program will run! And we can't expect the end-user to be a software engineer, tweak (and break!) its linux server, link libraries, mess with gcc, litter the filesystem, etc... In general, Java support (in a transparent cross-platform fashion) is quite bad with the dongle SDKs I've evaluated so far (e.g. KeyLok and SecuTech's UniKey). I even purchased (no free evaluation kit available) SecureMetric SDKs&dongles (they should've been "soooo" straighforward to integrate -- according to marketing material :\ ) and they were the worst ever: SecureDongle X has no 64bit support and SecureDongle SD is not cross-platform at all. So, has anyone out there been through this and found the ultimate Java security usb dongle for cross-platform deployments? Note: software is low-volume, high-value; application is off-line (intranet with no internet access), so no online-activation alternatives and the like. -- EDIT Tried out HASP dongles (used to be called "Aladdin"), and added them to the no-no list: here, too, there is no out-of-the-box (out-of-the-jar) support: e.g. end-linux-user has to manually put the .so library (the specific file for the appropriate bitness) in the right place on his filesystem, and export an env. variable accordingly. -- EDIT 2 I really don't understand all the negativity and all the downvoting: is this a taboo topic? Is it so hard to understand that a freelance developer has to put food on the table everyday to feed its family and pay the bills at the end of the month? Please don't talk about "adding value" as a supplier, because that'd be off-topic. Furthermore I'm not in direct contact with end-customers, but there's an intermediate reselling entity: it's this entity I want to prevent selling copies of the software without sharing the revenue. -- EDIT 3 I'd like to emphasize the fact that the question is looking for a technical answer, not one about opinions concerning business models, philosophical lucubrations on the concept of value, resellers' reliability, etc. I cannot change resellers, because this isn't a "general purpose" kind of sw, but a very vertical one and (for some reasons it's not worth explaining here) I must go through them. I just need to prevent the "we sold 2 copies, here's your share [bwahaha we sold 10]" scenario.

Read the article
WebSocket and Java EE 7 - Getting Ready for JSR 356 (TOTD #181)

- by arungupta

WebSocket is developed as part of HTML 5 specification and provides a bi-directional, full-duplex communication channel over a single TCP socket. It provides dramatic improvement over the traditional approaches of Polling, Long-Polling, and Streaming for two-way communication. There is no latency from establishing new TCP connections for each HTTP message. There is a WebSocket API and the WebSocket Protocol. The Protocol defines "handshake" and "framing". The handshake defines how a normal HTTP connection can be upgraded to a WebSocket connection. The framing defines wire format of the message. The design philosophy is to keep the framing minimum to avoid the overhead. Both text and binary data can be sent using the API. WebSocket may look like a competing technology to Server-Sent Events (SSE), but they are not. Here are the key differences: WebSocket can send and receive data from a client. A typical example of WebSocket is a two-player game or a chat application. Server-Sent Events can only push data data to the client. A typical example of SSE is stock ticker or news feed. With SSE, XMLHttpRequest can be used to send data to the server. For server-only updates, WebSockets has an extra overhead and programming can be unecessarily complex. SSE provides a simple and easy-to-use model that is much better suited. SSEs are sent over traditional HTTP and so no modification is required on the server-side. WebSocket require servers that understand the protocol. SSE have several features that are missing from WebSocket such as automatic reconnection, event IDs, and the ability to send arbitrary events. The client automatically tries to reconnect if the connection is closed. The default wait before trying to reconnect is 3 seconds and can be configured by including "retry: XXXX\n" header where XXXX is the milliseconds to wait before trying to reconnect. Event stream can include a unique event identifier. This allows the server to determine which events need to be fired to each client in case the connection is dropped in between. The data can span multiple lines and can be of any text format as long as EventSource message handler can process it. WebSockets provide true real-time updates, SSE can be configured to provide close to real-time by setting appropriate timeouts. OK, so all excited about WebSocket ? Want to convert your POJOs into WebSockets endpoint ? websocket-sdk and GlassFish 4.0 is here to help! The complete source code shown in this project can be downloaded here. On the server-side, the WebSocket SDK converts a POJO into a WebSocket endpoint using simple annotations. Here is how a WebSocket endpoint will look like: @WebSocket(path="/echo")public class EchoBean { @WebSocketMessage public String echo(String message) { return message + " (from your server)"; }} In this code "@WebSocket" is a class-level annotation that declares a POJO to accept WebSocket messages. The path at which the messages are accepted is specified in this annotation. "@WebSocketMessage" indicates the Java method that is invoked when the endpoint receives a message. This method implementation echoes the received message concatenated with an additional string. The client-side HTML page looks like <div style="text-align: center;"> <form action=""> <input onclick="send_echo()" value="Press me" type="button"> <input id="textID" name="message" value="Hello WebSocket!" type="text"><br> </form></div><div id="output"></div> WebSocket allows a full-duplex communication. So the client, a browser in this case, can send a message to a server, a WebSocket endpoint in this case. And the server can send a message to the client at the same time. This is unlike HTTP which follows a "request" followed by a "response". In this code, the "send_echo" method in the JavaScript is invoked on the button click. There is also a <div> placeholder to display the response from the WebSocket endpoint. The JavaScript looks like: <script language="javascript" type="text/javascript"> var wsUri = "ws://localhost:8080/websockets/echo"; var websocket = new WebSocket(wsUri); websocket.onopen = function(evt) { onOpen(evt) }; websocket.onmessage = function(evt) { onMessage(evt) }; websocket.onerror = function(evt) { onError(evt) }; function init() { output = document.getElementById("output"); } function send_echo() { websocket.send(textID.value); writeToScreen("SENT: " + textID.value); } function onOpen(evt) { writeToScreen("CONNECTED"); } function onMessage(evt) { writeToScreen("RECEIVED: " + evt.data); } function onError(evt) { writeToScreen('<span style="color: red;">ERROR:</span> ' + evt.data); } function writeToScreen(message) { var pre = document.createElement("p"); pre.style.wordWrap = "break-word"; pre.innerHTML = message; output.appendChild(pre); } window.addEventListener("load", init, false);</script> In this code The URI to connect to on the server side is of the format ws://<HOST>:<PORT>/websockets/<PATH> "ws" is a new URI scheme introduced by the WebSocket protocol. <PATH> is the path on the endpoint where the WebSocket messages are accepted. In our case, it is ws://localhost:8080/websockets/echo WEBSOCKET_SDK-1 will ensure that context root is included in the URI as well. WebSocket is created as a global object so that the connection is created only once. This object establishes a connection with the given host, port and the path at which the endpoint is listening. The WebSocket API defines several callbacks that can be registered on specific events. The "onopen", "onmessage", and "onerror" callbacks are registered in this case. The callbacks print a message on the browser indicating which one is called and additionally also prints the data sent/received. On the button click, the WebSocket object is used to transmit text data to the endpoint. Binary data can be sent as one blob or using buffering. The HTTP request headers sent for the WebSocket call are: GET ws://localhost:8080/websockets/echo HTTP/1.1Origin: http://localhost:8080Connection: UpgradeSec-WebSocket-Extensions: x-webkit-deflate-frameHost: localhost:8080Sec-WebSocket-Key: mDbnYkAUi0b5Rnal9/cMvQ==Upgrade: websocketSec-WebSocket-Version: 13 And the response headers received are Connection:UpgradeSec-WebSocket-Accept:q4nmgFl/lEtU2ocyKZ64dtQvx10=Upgrade:websocket(Challenge Response):00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00 The headers are shown in Chrome as shown below: The complete source code shown in this project can be downloaded here. The builds from websocket-sdk are integrated in GlassFish 4.0 builds. Would you like to live on the bleeding edge ? Then follow the instructions below to check out the workspace and install the latest SDK: Check out the source code svn checkout https://svn.java.net/svn/websocket-sdk~source-code-repository Build and install the trunk in your local repository as: mvn install Copy "./bundles/websocket-osgi/target/websocket-osgi-0.3-SNAPSHOT.jar" to "glassfish3/glassfish/modules/websocket-osgi.jar" in your GlassFish 4 latest promoted build. Notice, you need to overwrite the JAR file. Anybody interested in building a cool application using WebSocket and get it running on GlassFish ? :-) This work will also feed into JSR 356 - Java API for WebSocket. On a lighter side, there seems to be less agreement on the name. Here are some of the options that are prevalent: WebSocket (W3C API, the URL is www.w3.org/TR/websockets though) Web Socket (HTML5 Demos - html5demos.com/web-socket) Websocket (Jenkins Plugin - wiki.jenkins-ci.org/display/JENKINS/Websocket%2BPlugin) WebSockets (Used by Mozilla - developer.mozilla.org/en/WebSockets, but use WebSocket as well) Web sockets (HTML5 Working Group - www.whatwg.org/specs/web-apps/current-work/multipage/network.html) Web Sockets (Chrome Blog - blog.chromium.org/2009/12/web-sockets-now-available-in-google.html) I prefer "WebSocket" as that seems to be most common usage and used by the W3C API as well. What do you use ?

Read the article
SSH new connection begins to hang (not reject or terminate) after a day or so on Ubuntu 13.04 server

- by kross

Recently we upgraded the server from 12.04 LTS server to 13.04. All was well, including after a reboot. With all packages updated we began to see a strange issue, ssh works for a day or so (unclear on timing) then a later request for SSH hangs (cannot ctrl+c, nothing). It is up and serving webserver traffic etc. Port 22 is open (ips etc altered slightly for posting): nmap -T4 -A x.acme.com Starting Nmap 6.40 ( http://nmap.org ) at 2013-09-12 16:01 CDT Nmap scan report for x.acme.com (69.137.56.18) Host is up (0.026s latency). rDNS record for 69.137.56.18: c-69-137-56-18.hsd1.tn.provider.net Not shown: 998 filtered ports PORT STATE SERVICE VERSION 22/tcp open ssh OpenSSH 6.1p1 Debian 4 (protocol 2.0) | ssh-hostkey: 1024 54:d3:e3:38:44:f4:20:a4:e7:42:49:d0:a7:f1:3e:21 (DSA) | 2048 dc:21:77:3b:f4:4e:74:d0:87:33:14:40:04:68:33:a6 (RSA) |_256 45:69:10:79:5a:9f:0b:f0:66:15:39:87:b9:a1:37:f7 (ECDSA) 80/tcp open http Jetty 7.6.2.v20120308 | http-title: Log in as a Bamboo user - Atlassian Bamboo |_Requested resource was http://x.acme.com/userlogin!default.action;jsessionid=19v135zn8cl1tgso28fse4d50?os_destination=%2Fstart.action Service Info: OS: Linux; CPE: cpe:/o:linux:linux_kernel Service detection performed. Please report any incorrect results at http://nmap.org/submit/ . Nmap done: 1 IP address (1 host up) scanned in 12.89 seconds Here is the ssh -vvv: ssh -vvv x.acme.com OpenSSH_5.9p1, OpenSSL 0.9.8x 10 May 2012 debug1: Reading configuration data /Users/tfergeson/.ssh/config debug1: Reading configuration data /etc/ssh_config debug1: /etc/ssh_config line 20: Applying options for * debug2: ssh_connect: needpriv 0 debug1: Connecting to x.acme.com [69.137.56.18] port 22. debug1: Connection established. debug3: Incorrect RSA1 identifier debug3: Could not load "/Users/tfergeson/.ssh/id_rsa" as a RSA1 public key debug1: identity file /Users/tfergeson/.ssh/id_rsa type 1 debug1: identity file /Users/tfergeson/.ssh/id_rsa-cert type -1 debug1: identity file /Users/tfergeson/.ssh/id_dsa type -1 debug1: identity file /Users/tfergeson/.ssh/id_dsa-cert type -1 debug1: Remote protocol version 2.0, remote software version OpenSSH_6.1p1 Debian-4 debug1: match: OpenSSH_6.1p1 Debian-4 pat OpenSSH* debug1: Enabling compatibility mode for protocol 2.0 debug1: Local version string SSH-2.0-OpenSSH_5.9 debug2: fd 3 setting O_NONBLOCK debug3: load_hostkeys: loading entries for host "x.acme.com" from file "/Users/tfergeson/.ssh/known_hosts" debug3: load_hostkeys: found key type RSA in file /Users/tfergeson/.ssh/known_hosts:10 debug3: load_hostkeys: loaded 1 keys debug3: order_hostkeyalgs: prefer hostkeyalgs: [email protected],[email protected],ssh-rsa debug1: SSH2_MSG_KEXINIT sent debug1: SSH2_MSG_KEXINIT received debug2: kex_parse_kexinit: diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1 debug2: kex_parse_kexinit: [email protected],[email protected],ssh-rsa,[email protected],[email protected],ssh-dss debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,[email protected] debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,[email protected] debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,[email protected],hmac-sha2-256,hmac-sha2-256-96,hmac-sha2-512,hmac-sha2-512-96,hmac-ripemd160,[email protected],hmac-sha1-96,hmac-md5-96 debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,[email protected],hmac-sha2-256,hmac-sha2-256-96,hmac-sha2-512,hmac-sha2-512-96,hmac-ripemd160,[email protected],hmac-sha1-96,hmac-md5-96 debug2: kex_parse_kexinit: none,[email protected],zlib debug2: kex_parse_kexinit: none,[email protected],zlib debug2: kex_parse_kexinit: debug2: kex_parse_kexinit: debug2: kex_parse_kexinit: first_kex_follows 0 debug2: kex_parse_kexinit: reserved 0 debug2: kex_parse_kexinit: ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group14-sha1,diffie-hellman-group1-sha1 debug2: kex_parse_kexinit: ssh-rsa,ssh-dss,ecdsa-sha2-nistp256 debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,[email protected] debug2: kex_parse_kexinit: aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,aes128-cbc,3des-cbc,blowfish-cbc,cast128-cbc,aes192-cbc,aes256-cbc,arcfour,[email protected] debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,[email protected],hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,[email protected],hmac-sha1-96,hmac-md5-96 debug2: kex_parse_kexinit: hmac-md5,hmac-sha1,[email protected],hmac-sha2-256,hmac-sha2-512,hmac-ripemd160,[email protected],hmac-sha1-96,hmac-md5-96 debug2: kex_parse_kexinit: none,[email protected] debug2: kex_parse_kexinit: none,[email protected] debug2: kex_parse_kexinit: debug2: kex_parse_kexinit: debug2: kex_parse_kexinit: first_kex_follows 0 debug2: kex_parse_kexinit: reserved 0 debug2: mac_setup: found hmac-md5 debug1: kex: server->client aes128-ctr hmac-md5 none debug2: mac_setup: found hmac-md5 debug1: kex: client->server aes128-ctr hmac-md5 none debug1: SSH2_MSG_KEX_DH_GEX_REQUEST(1024<1024<8192) sent debug1: expecting SSH2_MSG_KEX_DH_GEX_GROUP debug2: dh_gen_key: priv key bits set: 130/256 debug2: bits set: 503/1024 debug1: SSH2_MSG_KEX_DH_GEX_INIT sent debug1: expecting SSH2_MSG_KEX_DH_GEX_REPLY debug1: Server host key: RSA dc:21:77:3b:f4:4e:74:d0:87:33:14:40:04:68:33:a6 debug3: load_hostkeys: loading entries for host "x.acme.com" from file "/Users/tfergeson/.ssh/known_hosts" debug3: load_hostkeys: found key type RSA in file /Users/tfergeson/.ssh/known_hosts:10 debug3: load_hostkeys: loaded 1 keys debug3: load_hostkeys: loading entries for host "69.137.56.18" from file "/Users/tfergeson/.ssh/known_hosts" debug3: load_hostkeys: found key type RSA in file /Users/tfergeson/.ssh/known_hosts:6 debug3: load_hostkeys: loaded 1 keys debug1: Host 'x.acme.com' is known and matches the RSA host key. debug1: Found key in /Users/tfergeson/.ssh/known_hosts:10 debug2: bits set: 493/1024 debug1: ssh_rsa_verify: signature correct debug2: kex_derive_keys debug2: set_newkeys: mode 1 debug1: SSH2_MSG_NEWKEYS sent debug1: expecting SSH2_MSG_NEWKEYS debug2: set_newkeys: mode 0 debug1: SSH2_MSG_NEWKEYS received debug1: Roaming not allowed by server debug1: SSH2_MSG_SERVICE_REQUEST sent debug2: service_accept: ssh-userauth debug1: SSH2_MSG_SERVICE_ACCEPT received debug2: key: /Users/tfergeson/.ssh/id_rsa (0x7ff189c1d7d0) debug2: key: /Users/tfergeson/.ssh/id_dsa (0x0) debug1: Authentications that can continue: publickey debug3: start over, passed a different list publickey debug3: preferred publickey,keyboard-interactive,password debug3: authmethod_lookup publickey debug3: remaining preferred: keyboard-interactive,password debug3: authmethod_is_enabled publickey debug1: Next authentication method: publickey debug1: Offering RSA public key: /Users/tfergeson/.ssh/id_rsa debug3: send_pubkey_test debug2: we sent a publickey packet, wait for reply debug1: Server accepts key: pkalg ssh-rsa blen 277 debug2: input_userauth_pk_ok: fp 3c:e5:29:6c:9d:27:d1:7d:e8:09:a2:e8:8e:6e:af:6f debug3: sign_and_send_pubkey: RSA 3c:e5:29:6c:9d:27:d1:7d:e8:09:a2:e8:8e:6e:af:6f debug1: read PEM private key done: type RSA debug1: Authentication succeeded (publickey). Authenticated to x.acme.com ([69.137.56.18]:22). debug1: channel 0: new [client-session] debug3: ssh_session2_open: channel_new: 0 debug2: channel 0: send open debug1: Requesting [email protected] debug1: Entering interactive session. debug2: callback start debug2: client_session2_setup: id 0 debug2: fd 3 setting TCP_NODELAY debug2: channel 0: request pty-req confirm 1 debug1: Sending environment. debug3: Ignored env ATLAS_OPTS debug3: Ignored env rvm_bin_path debug3: Ignored env TERM_PROGRAM debug3: Ignored env GEM_HOME debug3: Ignored env SHELL debug3: Ignored env TERM debug3: Ignored env CLICOLOR debug3: Ignored env IRBRC debug3: Ignored env TMPDIR debug3: Ignored env Apple_PubSub_Socket_Render debug3: Ignored env TERM_PROGRAM_VERSION debug3: Ignored env MY_RUBY_HOME debug3: Ignored env TERM_SESSION_ID debug3: Ignored env USER debug3: Ignored env COMMAND_MODE debug3: Ignored env rvm_path debug3: Ignored env COM_GOOGLE_CHROME_FRAMEWORK_SERVICE_PROCESS/USERS/tfergeson/LIBRARY/APPLICATION_SUPPORT/GOOGLE/CHROME_SOCKET debug3: Ignored env JPDA_ADDRESS debug3: Ignored env APDK_HOME debug3: Ignored env SSH_AUTH_SOCK debug3: Ignored env Apple_Ubiquity_Message debug3: Ignored env __CF_USER_TEXT_ENCODING debug3: Ignored env rvm_sticky_flag debug3: Ignored env MAVEN_OPTS debug3: Ignored env LSCOLORS debug3: Ignored env rvm_prefix debug3: Ignored env PATH debug3: Ignored env PWD debug3: Ignored env JAVA_HOME debug1: Sending env LANG = en_US.UTF-8 debug2: channel 0: request env confirm 0 debug3: Ignored env JPDA_TRANSPORT debug3: Ignored env rvm_version debug3: Ignored env M2_HOME debug3: Ignored env HOME debug3: Ignored env SHLVL debug3: Ignored env rvm_ruby_string debug3: Ignored env LOGNAME debug3: Ignored env M2_REPO debug3: Ignored env GEM_PATH debug3: Ignored env AWS_RDS_HOME debug3: Ignored env rvm_delete_flag debug3: Ignored env EC2_PRIVATE_KEY debug3: Ignored env RUBY_VERSION debug3: Ignored env SECURITYSESSIONID debug3: Ignored env EC2_CERT debug3: Ignored env _ debug2: channel 0: request shell confirm 1 debug2: callback done debug2: channel 0: open confirm rwindow 0 rmax 32768 I can hard reboot (only mac monitors at that location) and it will again be accessible. This now happens every single time. It is imperative that I get it sorted. The strange thing is that it behaves initially then starts to hang after several hours. I perused logs previously and nothing stood out. From the auth.log, I can see that it has allowed me in, but still I get nothing back on the client side: Sep 20 12:47:50 cbear sshd[25376]: Accepted publickey for tfergeson from 10.1.10.14 port 54631 ssh2 Sep 20 12:47:50 cbear sshd[25376]: pam_unix(sshd:session): session opened for user tfergeson by (uid=0) UPDATES: Still occurring even after setting UseDNS no and commenting out #session optional pam_mail.so standard noenv This does not appear to be a network/dns related issue, as all services running on the machine are as responsive and accessible as ever, with the exception of sshd. Any thoughts on where to start?

Read the article

Search Results

Search found 5221 results on 209 pages for 'low latency'.

Page 170/209 | < Previous Page | 166 167 168 169 170 171 172 173 174 175 176 177 | Next Page >

- by MrDobalina

- by Speed

- by john.rose

- by Alan Smith

- by user12620111

- by Chris

- by Throlkim

- by Alex Dunlop

- by Hans Løken

- by xsl

- by gatoatigrado

- by AndrewO

- by pableu

- by fsm

- by BTR

- by pinaldave

- by Eric Z Goodnight

- by Boonei

- by Chris Houston

- by DigitalGeekery

- by Rick Strahl

- by Moo-Juice

- by Unai Vivi

- by arungupta

- by kross

< Previous Page | 166 167 168 169 170 171 172 173 174 175 176 177 | Next Page >