jkff - Developer IT

Don't understand the typing of Scala's delimited continuations (A @cps[B,C])

- by jkff

I'm struggling to understand what precisely does it mean when a value has type A @cps[B,C] and what types of this form should I assign to my values when using the delimited continuations facility. I've looked at some sources: http://lamp.epfl.ch/~rompf/continuations-icfp09.pdf http://www.scala-lang.org/node/2096 http://dcsobral.blogspot.com/2009/07/delimited-continuations-explained-in.html http://blog.richdougherty.com/2009/02/delimited-continuations-in-scala_24.html but they didn't give me much intuition into this. In the last link, the author tries to give an explicit explanation, but it is not clear enough anyway. The A here represents the output of the computation, which is also the input to its continuation. The B represents the return type of that continuation, and the C represents its "final" return type—because shift can do further processing to the returned value and change its type. I don't understand the difference between "output of the computation", "return type of the continuation" and "final return type of the continuation". They sound like synonyms.

Read the article

Examples of monoids/semigroups in programming

- by jkff

It is well-known that monoids are stunningly ubiquitous in programing. They are so ubiquitous and so useful that I, as a 'hobby project', am working on a system that is completely based on their properties (distributed data aggregation). To make the system useful I need useful monoids :) I already know of these: Numeric or matrix sum Numeric or matrix product Minimum or maximum under a total order with a top or bottom element (more generally, join or meet in a bounded lattice, or even more generally, product or coproduct in a category) Set union Map union where conflicting values are joined using a monoid Intersection of subsets of a finite set (or just set intersection if we speak about semigroups) Intersection of maps with a bounded key domain (same here) Merge of sorted sequences, perhaps with joining key-equal values in a different monoid/semigroup Bounded merge of sorted lists (same as above, but we take the top N of the result) Cartesian product of two monoids or semigroups List concatenation Endomorphism composition. Now, let us define a quasi-property of an operation as a property that holds up to an equivalence relation. For example, list concatenation is quasi-commutative if we consider lists of equal length or with identical contents up to permutation to be equivalent. Here are some quasi-monoids and quasi-commutative monoids and semigroups: Any (a+b = a or b, if we consider all elements of the carrier set to be equivalent) Any satisfying predicate (a+b = the one of a and b that is non-null and satisfies some predicate P, if none does then null; if we consider all elements satisfying P equivalent) Bounded mixture of random samples (xs+ys = a random sample of size N from the concatenation of xs and ys; if we consider any two samples with the same distribution as the whole dataset to be equivalent) Bounded mixture of weighted random samples Which others do exist?

Read the article

Random access gzip stream

- by jkff

I'd like to be able to do random access into a gzipped file. I can afford to do some preprocessing on it (say, build some kind of index), provided that the result of the preprocessing is much smaller than the file itself. Any advice? My thoughts were: Hack on an existing gzip implementation and serialize its decompressor state every, say, 1 megabyte of compressed data. Then to do random access, deserialize the decompressor state and read from the megabyte boundary. This seems hard, especially since I'm working with Java and I couldn't find a pure-java gzip implementation :( Re-compress the file in chunks of 1Mb and do same as above. This has the disadvantage of doubling the required disk space. Write a simple parser of the gzip format that doesn't do any decompressing and only detects and indexes block boundaries (if there even are any blocks: I haven't yet read the gzip format description)

Read the article

Strange profiling results: definitely non-bottleneck method pops up

- by jkff

I'm profiling a program using sampling profiling in YourKit and JProfiler, and also "manually" (I launch it and press Ctrl-Break several times to get thread dumps). All three methods give me extremely strange results: some tens of percents of time spent in a 3-line method that does not even do any allocation or synchronization and doesn't have loops etc. Moreover, after I made this method into a NOP and even removed its invocation completely, the observable program performance didn't change at all (although it got a negligible memory leak, since it was a method for freeing a cheap resource). I'm thinking that this might be because of the constraints that JVM puts on the moments at which a thread's stacktrace may be taken, and it somehow turns out that in my program it is exactly the moments where this method is invoked, although there is absolutely nothing special about it or the context in which it is invoked. What can be the explanation for this phenomenon? What are the aforementioned constraints? What further measurements can I take to clarify the situation?

Read the article

Articles about replication schemes/algorithms?

- by jkff

I'm designing an hierarchical distributed system (every node has zero or more "master" nodes to which it propagates its current data). The data gets continuously updated and I'd like to guarantee that at least N nodes have almost-current data at any given time. I do not need complete consistency, only eventual consistency (t.i. for any time instant, the current snapshot of data should eventually appear on at least N nodes. It is tricky to define the term "current" here, but still). Nodes may fail and go back up at any moment, and there is no single "central" node. O overflowers! Point me to some good papers describing replication schemes. I've so far found one: Consistency Management in Optimistic Replication Algorithms

Read the article

What is the most efficient way to store a mapping "key -> event stream"?

- by jkff

Suppose there are ~10,000's of keys, where each key corresponds to a stream of events. I'd like to support the following operations: push(key, timestamp, event) - pushes event to the event queue for key, marked with the given timestamp. It is guaranteed that event timestamps for a particular key are pushed in sorted or almost sorted order. tail(key, timestamp) - get all events for key since the given timestamp. Usually the timestamp requests for a given key are almost monotonically increasing, almost synchronously with pushes for the same key. This stuff has to be persistent (although it is not absolutely necessary to persist pushes immediately and to keep tails with pushes strictly in sync), so I'm going to use some kind of database. What is the optimal kind of database structure for this task? Would it be better to use a relational database, a key-value storage, or something else?

Read the article

Tools for viewing logs of unlimited size

- by jkff

It's no secret that application logs can go well beyond the limits of naive log viewers, and the desired viewer functionality (say, filtering the log based on a condition, or highlighting particular message types, or splitting it into sublogs based on a field value, or merging several logs based on a time axis, or bookmarking etc.) is beyond the abilities of large-file text viewers. I wonder: Whether decent specialized applications exist (I haven't found any) What functionality might one expect from such an application? (I'm asking because my student is writing such an application, and the functionality above has already been implemented to a certain extent of usability)

Read the article

Does allocation speed depend on the garbage collector being used?

- by jkff

My app is allocating a ton of objects (1mln per second; most objects are byte arrays of size ~80-100 and strings of the same size) and I think it might be the source of its poor performance. The app's working set is only tens of megabytes. Profiling the app shows that GC time is negligibly small. However, I suspect that perhaps the allocation procedure depends on which GC is being used, and some settings might make allocation faster or perhaps make a positive influence on cache hit rate, etc. Is that so? Or is allocation performance independent on GC settings under the assumption that garbage collection itself takes little time?

Developer IT