Search Results

Search found 12 results on 1 page for 'jpurdy'.

  • CPU Usage in Very Large Coherence Clusters

    - by jpurdy
    When sizing Coherence installations, one of the complicating factors is that these installations (by their very nature) tend to be application-specific: some are large, memory-intensive caches, others act as I/O-intensive transaction-processing platforms, and still others perform CPU-intensive calculations across the data grid. Regardless of the primary resource requirements, Coherence sizing calculations are inherently empirical, in that there are so many permutations that a simple spreadsheet approach to sizing is rarely optimal (though it can provide a good starting estimate). So we typically recommend measuring actual resource usage (primarily CPU cycles, network bandwidth and memory) at a given load, and then extrapolating from those measurements. Of course there may be multiple types of load, and these may have varying degrees of correlation -- for example, an increased request rate may drive up the number of objects "pinned" in memory at any point, but the increase may be less than linear if those objects are naturally shared by concurrent requests. But for most reasonably designed applications, a linear resource model will be acceptably accurate at most levels of scale.

    However, at extreme scale, sizing becomes a bit more complicated as certain cluster management operations -- while very infrequent -- become increasingly critical. This is because certain operations do not naturally tend to scale out. In a small cluster, sizing is primarily driven by the request rate, required cache size, or other application-driven metrics. In larger clusters (e.g. those with hundreds of cluster members), certain infrastructure tasks become intensive, in particular those related to members joining and leaving the cluster, such as introducing new cluster members to the rest of the cluster, or publishing the location of partitions during rebalancing. These tasks have a strong tendency to require all updates to be routed via a single member for the sake of cluster stability and data integrity. Fortunately that member is dynamically assigned in Coherence, so it is not a single point of failure, but it may still become a single point of bottleneck (until the cluster finishes its reconfiguration, at which point this member will have a similar load to the rest of the members).

    The most common cause of scaling issues in large clusters is disabling multicast (by configuring well-known addresses, aka WKA). This obviously impacts network usage, but it also has a large impact on CPU usage, primarily because the senior member must directly communicate certain messages with every other cluster member, and this communication requires significant CPU time. In particular, the need to notify the rest of the cluster about membership changes and corresponding partition reassignments adds stress to the senior member. Given that portions of the network stack may tend to be single-threaded (both in Coherence and the underlying OS), this may be even more problematic on servers with poor single-threaded performance.

    As a result, some extremely large clusters may be configured with fewer partitions than would otherwise be ideal, which increases the size of each partition. When a cache server fails, the other servers will use their fractional backups to recover the state of that server (and take over responsibility for their backed-up portion of that state). The finest granularity of this recovery is a single partition, and the single service thread cannot accept new requests during this recovery. Ordinarily, recovery is practically instantaneous (it is roughly equivalent to the time required to iterate over a set of backup backing map entries and move them to the primary backing map in the same JVM). But certain factors can increase this duration drastically (to several seconds): large partitions, sufficiently slow single-threaded CPU performance, many or expensive indexes to rebuild, etc. The solution, of course, is to mitigate each of those factors, but in many cases this may be challenging.

    Larger clusters also lead to the temptation to place more load on the available hardware resources, spreading CPU resources thin. As an example, while we've long been aware of how garbage collection can cause significant pauses, it usually isn't viewed as a major consumer of CPU (in terms of overall system throughput). Typically, the use of a concurrent collector allows greater responsiveness by minimizing pause times, at the cost of reducing system throughput. However, at a recent engagement, we were forced to turn off the concurrent collector and use a traditional parallel "stop the world" collector to reduce CPU usage to an acceptable level. In summary, there are some less obvious factors that may result in excessive CPU consumption in a larger cluster, so it is even more critical to test at full scale, even though allocating sufficient hardware may often be much more difficult for these large clusters.
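
    The extrapolation step itself is simple in the linear case. The sketch below (plain Java, with entirely hypothetical measurements and targets) shows the kind of back-of-the-envelope projection described above; real sizing should always be validated against measurements at or near the target scale.

        // Hypothetical sizing illustration: all measurements and targets below are made up.
        public class LinearSizingEstimate {
            public static void main(String[] args) {
                double measuredRequestsPerSec = 20_000;   // load applied during the measurement run
                double measuredCpuUtilization = 0.35;     // average cache-server CPU utilization observed
                double measuredHeapUsedGb     = 48;       // aggregate heap pinned by cache data at that load

                double targetRequestsPerSec   = 40_000;   // projected production load
                double scale = targetRequestsPerSec / measuredRequestsPerSec;

                // Linear extrapolation; as noted above, shared objects and cluster management
                // overhead can make the real curve sub- or super-linear at extreme scale.
                System.out.printf("Projected CPU utilization: %.0f%%%n", measuredCpuUtilization * scale * 100);
                System.out.printf("Projected aggregate heap:  %.0f GB%n", measuredHeapUsedGb * scale);
            }
        }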

    Read the article

  • Coherence Query Performance in Large Clusters

    - by jpurdy
    Large clusters (measured in terms of the number of storage-enabled members participating in the largest cache services) may introduce challenges when issuing queries. There is no particular cluster size threshold for this; rather, there is a gradually increasing tendency for issues to arise. The most obvious challenges are that a client's perceived query latency will be determined by the slowest responder (more likely to be a factor in larger clusters), and that adding cache servers will not increase query throughput if the query processing is not compute-bound (which would generally be the case for most indexed queries). If the data set can take advantage of the partition affinity features of Coherence, then the application can use a PartitionedFilter to target a query to a single server (using partition affinity to ensure that all data is in a single partition). If this cannot be done, then avoiding an excessive number of cache server JVMs will help, as will ensuring that each cache server has sufficient CPU resources available and is properly configured to minimize GC pauses (the most common cause of a slow-responding cache server).
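
    As a rough illustration of the PartitionedFilter approach, the sketch below restricts a query to the single partition that holds a shared association key, assuming the 3.x-era PartitionedFilter/PartitionSet APIs. The cache name, filter and key are hypothetical, and it assumes the relevant entries are collocated in one partition via key association.

        import com.tangosol.net.CacheFactory;
        import com.tangosol.net.NamedCache;
        import com.tangosol.net.PartitionedService;
        import com.tangosol.net.partition.PartitionSet;
        import com.tangosol.util.Filter;
        import com.tangosol.util.filter.EqualsFilter;
        import com.tangosol.util.filter.PartitionedFilter;

        import java.util.Set;

        public class SinglePartitionQuery {
            public static void main(String[] args) {
                // Hypothetical cache and filter; assumes the data of interest shares one association key.
                NamedCache orders = CacheFactory.getCache("orders");
                Filter filter = new EqualsFilter("getStatus", "OPEN");

                PartitionedService service = (PartitionedService) orders.getCacheService();

                // Determine which partition holds the association key that all relevant entries share.
                Object associationKey = "customer-42";
                int nPartition = service.getKeyPartitioningStrategy().getKeyPartition(associationKey);

                PartitionSet parts = new PartitionSet(service.getPartitionCount());
                parts.add(nPartition);

                // Only the member owning this partition evaluates the query.
                Set results = orders.entrySet(new PartitionedFilter(filter, parts));
                System.out.println("Matching entries: " + results.size());
            }
        }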

    Read the article

  • Dealing with Fine-Grained Cache Entries in Coherence

    - by jpurdy
    On occasion we have seen significant memory overhead when using very small cache entries. Consider the case where there is a small key (say a synthetic key stored in a long) and a small value (perhaps a number or short string). With most backing maps, each cache entry will require an instance of Map.Entry, and in the case of a LocalCache backing map (used for expiry and eviction), there is additional metadata stored (such as last access time). Given the size of this data (usually a few dozen bytes) and the granularity of Java memory allocation (often a minimum of 32 bytes per object, depending on the specific JVM implementation), it is easily possible to end up with the case where the cache entry appears to be a couple dozen bytes but ends up occupying several hundred bytes of actual heap, resulting in anywhere from a 5x to 10x increase in stated memory requirements. In most cases, this increase applies to only a few small NamedCaches, and is inconsequential -- but in some cases it might apply to one or more very large NamedCaches, in which case it may dominate memory sizing calculations. Ultimately, the requirement is to avoid the per-entry overhead, which can be done either at the application level by grouping multiple logical entries into single cache entries, or at the backing map level, again by combining multiple entries into a smaller number of larger heap objects. At the application level, it may be possible to combine objects based on parent-child or sibling relationships (basically the same requirements that would apply to using partition affinity). If there is no natural relationship, it may still be possible to combine objects, effectively using a Coherence NamedCache as a "map of maps". This forces the application to first find a collection of objects (by performing a partial hash) and then to look within that collection for the desired object. This is most naturally implemented as a collection of entry processors to avoid pulling unnecessary data back to the client (and also to encapsulate that logic within a service layer). At the backing map level, the NIO storage option keeps keys on heap, and so has limited benefit for this situation. The Elastic Data features of Coherence naturally combine entries into larger heap objects, with the caveat that only data -- and not indexes -- can be stored in Elastic Data.
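
    A minimal sketch of the "map of maps" approach follows. The processor and key names are hypothetical; a production version would also need a matching update processor and an appropriate serialization strategy (e.g. POF) for remote execution.

        import com.tangosol.util.InvocableMap;
        import com.tangosol.util.processor.AbstractProcessor;

        import java.util.Map;

        /**
         * Hypothetical sketch of the "map of maps" approach: many small logical entries are
         * grouped under one cache key, and an entry processor extracts a single logical entry
         * without shipping the whole group back to the client.
         */
        public class GetSubEntryProcessor extends AbstractProcessor {
            private final Object subKey;    // logical key within the grouped entry

            public GetSubEntryProcessor(Object subKey) {
                this.subKey = subKey;
            }

            @Override
            public Object process(InvocableMap.Entry entry) {
                if (!entry.isPresent()) {
                    return null;
                }
                Map group = (Map) entry.getValue();     // the grouped entry is stored as a Map
                return group == null ? null : group.get(subKey);
            }
        }

        // Client-side usage, where 'groupKey' is derived from a partial hash of the logical key:
        //   Object value = cache.invoke(groupKey, new GetSubEntryProcessor(logicalKey));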

    Read the article

  • Coherence Data Guarantees for Data Reads - Basic Terminology

    - by jpurdy
    When integrating Coherence into applications, each application has its own set of requirements with respect to data integrity guarantees. Developers often describe these requirements using expressions like "avoiding dirty reads" or "making sure that updates are transactional", but we often find that even in a small group of people, there may be a wide range of opinions as to what these terms mean. This may simply be due to a lack of familiarity, but given that Coherence sits at an intersection of several (mostly) unrelated fields, it may be a matter of conflicting vocabularies (e.g. "consistency" is similar but different in transaction processing versus multi-threaded programming).

    Since almost all data read consistency issues are related to the concept of concurrency, it is helpful to start with a definition of that, or rather what it means for two operations to be concurrent. Rather than implying that they occur "at the same time", concurrency is a slightly weaker statement -- it simply means that it can't be proven that one event precedes (or follows) the other. As an example, in a Coherence application, if two client members mutate two different cache entries sitting on two different cache servers at roughly the same time, it is likely that one update will precede the other by a significant amount of time (say 0.1ms). However, since there is no guarantee that all four members have their clocks perfectly synchronized, and there is no way to precisely measure the time it takes to send a given message between any two members (that have differing clocks), we consider these to be concurrent operations since we cannot (easily) prove otherwise.

    So this leads to a question that we hear quite frequently: "Are the contents of the near cache always synchronized with the underlying distributed cache?". It's easy to see that if an update on a cache server results in a message being sent to each near cache, and then that near cache being updated, there is a window during which the contents are different. However, this is irrelevant, since even if the application reads directly from the distributed cache, another thread may update the cache before the read is returned to the application. Even if no other member modifies a cache entry prior to the local near cache entry being updated (and subsequently read), the purpose of reading a cache entry is to do something with the result, usually either displaying it for consumption by a human or updating the entry based on its current state. In the former case, it's clear that if the data is updated faster than a human can perceive, then there is no problem (and in many cases this can be relaxed even further). For the latter case, the application must assume that the value might potentially be updated before it has a chance to update it. This is almost always the case with read-only caches, and the solution is the traditional optimistic transaction pattern, which requires the application to explicitly state what assumptions it made about the old value of the cache entry. If the application doesn't want to bother stating those assumptions, it is free to lock the cache entry prior to reading it, ensuring that no other threads will mutate the entry -- a pessimistic approach.

    The optimistic approach relies on what is sometimes called a "fuzzy read". In other words, the application assumes that the read should be correct, but it also acknowledges that it might not be. (I use the qualifier "sometimes" because in some writings, "fuzzy read" indicates the situation where the application actually sees an original value and then later sees an updated value within the same transaction -- however, both definitions are roughly equivalent from an application design perspective.) If the read is not correct, it is called a "stale read". Going back to the definition of concurrency, it may seem difficult to precisely define a stale read, but the practical way of detecting one is that it will cause the encompassing transaction to roll back if it tries to update that value. The pessimistic approach relies on a "coherent read", a guarantee that the value returned is not only the same as the primary copy of that value, but also that it will remain that way. In most cases this can be used interchangeably with "repeatable read" (though that term has additional implications when used in the context of a database system).

    In none of the cases above is it possible for the application to perform a "dirty read". A dirty read occurs when the application reads a piece of data that was never committed. In practice the only way this can occur is with multi-phase updates such as transactions, where a value may be temporarily updated but then withdrawn when a transaction is rolled back. If another thread sees that value prior to the rollback, it is a dirty read. If an application uses optimistic transactions, dirty reads will merely result in a lack of forward progress (this is actually one of the main risks of dirty reads -- they can be chained and potentially cause cascading rollbacks).

    The concepts of dirty reads, fuzzy reads, stale reads and coherent reads are able to describe the vast majority of requirements that we see in the field. However, the important thing is to clearly define the terms used to state requirements. A quick web search for each of the terms in this article will show multiple meanings, so I've selected what are generally the most common variations, but it never hurts to state each definition explicitly if they are critical to the success of a project (many applications have sufficiently loose requirements that precise terminology can be avoided).
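
    The optimistic pattern amounts to a compare-and-set: the application states the value it assumed when it read the entry, and the update is applied only if that assumption still holds. The sketch below expresses this as a hypothetical entry processor (Coherence also ships built-in processors such as ConditionalPut and VersionedPut that cover similar ground).

        import com.tangosol.util.InvocableMap;
        import com.tangosol.util.processor.AbstractProcessor;

        import java.util.Objects;

        /**
         * Hypothetical sketch of an optimistic update: the caller states the value it read
         * (its assumption) and the new value; the update is applied only if the assumption
         * still holds, otherwise the caller is expected to re-read and retry.
         */
        public class CompareAndSetProcessor extends AbstractProcessor {
            private final Object expected;
            private final Object newValue;

            public CompareAndSetProcessor(Object expected, Object newValue) {
                this.expected = expected;
                this.newValue = newValue;
            }

            @Override
            public Object process(InvocableMap.Entry entry) {
                Object current = entry.isPresent() ? entry.getValue() : null;
                if (Objects.equals(current, expected)) {
                    entry.setValue(newValue);
                    return Boolean.TRUE;    // assumption held; the update was applied atomically
                }
                return Boolean.FALSE;       // stale read detected; the caller should retry
            }
        }

        // Usage: boolean applied = (Boolean) cache.invoke(key, new CompareAndSetProcessor(oldValue, newValue));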

    Read the article

  • Implementing a Custom Coherence PartitionAssignmentStrategy

    - by jpurdy
    A recent A-Team engagement required the development of a custom PartitionAssignmentStrategy (PAS). By way of background, a PAS is an implementation of a Java interface that controls how a Coherence partitioned cache service assigns partitions (primary and backup copies) across the available set of storage-enabled members. While seemingly straightforward, this is actually a very difficult problem to solve. Traditionally, Coherence used a distributed algorithm spread across the cache servers (and as of Coherence 3.7, this is still the default implementation). With the introduction of the PAS interface, the model of operation was changed so that the logic would run solely in the cache service senior member. Obviously, this makes the development of a custom PAS vastly less complex, and in practice it does not introduce a significant single point of failure/bottleneck. Note that Coherence ships with a default PAS implementation, but it is not used by default. Further, custom PAS implementations are uncommon (this engagement was the first custom implementation that we know of). The particular implementation mentioned above also faced challenges related to managing multiple backup copies, but that won't be discussed here.

    A few challenges arose during design and implementation:

      - Naive algorithms had an unreasonable upper bound of computational cost.
      - There was significant complexity associated with configurations where the member count varied significantly between physical machines.
      - Most of the complexity of a PAS is related to rebalancing, not initial assignment (which is usually fairly simple).

    A custom PAS may need to solve several problems simultaneously, such as:

      - Ensuring that each member has a similar number of primary and backup partitions (e.g. each member holds the same number of primaries and the same number of backups)
      - Ensuring that each member carries similar responsibility (e.g. the most heavily loaded member has no more than one partition more than the least loaded)
      - Ensuring that each partition is on the same member as a corresponding local resource (e.g. for applications that use partitioning across message queues, to ensure that each partition is collocated with its corresponding message queue)
      - Ensuring that a given member holds no more than a given number of partitions (e.g. no member has more than 10 partitions)
      - Ensuring that backups are placed far enough away from the primaries (e.g. on a different physical machine or a different blade enclosure)
      - Achieving the above goals while ensuring that partition movement is minimized.

    These objectives can be even more complicated when the topology of the cluster is irregular. For example, if multiple cluster members may exist on each physical machine, then clearly the possibility exists that at certain points (e.g. following a member failure), the number of members on each machine may vary, in certain cases significantly so. Consider the case where there are three physical machines, with 3, 3 and 9 members respectively. This introduces complexity since the backups for the 9 members on the largest machine must be spread across the other 6 members (to ensure placement on different physical machines), preventing an even distribution. For any given problem like this, there are usually reasonable compromises available, but the key point is that objectives may conflict under extreme (but not at all unlikely) circumstances.

    The most obvious general-purpose partition assignment algorithm (possibly the only general-purpose one) is to define a scoring function for a given mapping of partitions to members, and then apply that function to each possible permutation, selecting the optimal permutation. This would result in N! (factorial) evaluations of the scoring function, which is clearly impractical for all but the smallest values of N (e.g. a partition count in the single digits). It's difficult to prove that more efficient general-purpose algorithms don't exist, but the key takeaway is that algorithms will tend to either have exorbitant worst-case performance or may fail to find optimal solutions (or both) -- it is very important to be able to show that worst-case performance is acceptable. This quickly leads to the conclusion that the problem must be further constrained, perhaps by limiting functionality or by using domain-specific optimizations. Unfortunately, it can be very difficult to design these more focused algorithms.

    In the specific case mentioned, we constrained the solution space to very small clusters (in terms of machine count) with small partition counts and supported exactly two backup copies, and accepted the fact that partition movement could potentially be significant (preferring to solve that issue through brute force). We then used the out-of-the-box PAS implementation as a fallback, delegating to it for configurations that were not supported by our algorithm. Our experience was that the PAS interface is quite usable, but there are intrinsic challenges to designing PAS implementations that should be very carefully evaluated before committing to that approach.
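
    To make the scoring-function idea concrete, the toy sketch below scores one candidate assignment of primary partitions to members. It is not the Coherence PAS SPI, and all names are hypothetical; a real strategy would combine several such terms (backup placement, machine separation, movement cost) and search the assignment space with heuristics rather than enumerating every permutation.

        import java.util.Map;

        /**
         * Toy illustration (not the Coherence PAS SPI) of scoring one candidate assignment
         * of partitions to members. Lower is better; a perfectly balanced assignment scores 0.
         */
        public class AssignmentScorer {

            /** assignment maps partition id -> member id; balance is derived from the counts. */
            public static double score(Map<Integer, Integer> primaryAssignment, int memberCount) {
                int[] perMember = new int[memberCount];
                for (int memberId : primaryAssignment.values()) {
                    perMember[memberId]++;
                }
                int min = Integer.MAX_VALUE, max = 0;
                for (int count : perMember) {
                    min = Math.min(min, count);
                    max = Math.max(max, count);
                }
                // Additional weighted terms (backup separation, partition movement, etc.) would be added here.
                return max - min;
            }
        }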

    Read the article

  • Impact of Server Failure on Coherence Request Processing

    - by jpurdy
    Requests against a given cache server may be temporarily blocked for several seconds following the failure of other cluster members. This may cause issues for applications that cannot tolerate multi-second response times even during failover processing (ignoring for the moment that in practice there are a variety of issues that make such absolute guarantees challenging even when there are no server failures). In general, Coherence is designed around the principle that failures in one member should not affect the rest of the cluster if at all possible. However, it's obvious that if that failed member was managing a piece of state that another member depends on, the second member will need to wait until a new member assumes responsibility for managing that state. This transfer of responsibility is (as of Coherence 3.7) performed by the primary service thread for each cache service. The finest possible granularity for transferring responsibility is a single partition. So the question becomes how to minimize the time spent processing each partition. Here are some optimizations that may reduce this period:

      - Reduce the size of each partition (by increasing the partition count; see the sketch below)
      - Increase the number of JVMs across the cluster (increasing the total number of primary service threads)
      - Increase the number of CPUs across the cluster (making sure that each JVM has a CPU core when needed)
      - Re-evaluate the set of configured indexes (as these will need to be rebuilt when a partition moves)
      - Make sure that the backing map is as fast as possible (in most cases this means running on-heap)
      - Make sure that the cluster is running on hardware with fast CPU cores (since the partition processing is single-threaded)

    As always, proper testing is required to make sure that configuration changes have the desired effect (and also to quantify that effect).
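
    The first item above (partition size) can be sanity-checked with simple arithmetic. The sketch below estimates the average data volume per partition from an assumed total data size; the cache name and the 64 GB figure are hypothetical, not values Coherence reports.

        import com.tangosol.net.CacheFactory;
        import com.tangosol.net.NamedCache;
        import com.tangosol.net.PartitionedService;

        /**
         * Hypothetical sketch: estimate the average amount of data per partition, a rough
         * proxy for how long a single partition transfer/recovery may take.
         */
        public class PartitionSizeEstimate {
            public static void main(String[] args) {
                NamedCache cache = CacheFactory.getCache("orders");     // hypothetical cache
                PartitionedService service = (PartitionedService) cache.getCacheService();

                long totalDataBytes = 64L * 1024 * 1024 * 1024;         // assumed: ~64 GB of primary data
                int partitionCount  = service.getPartitionCount();

                long bytesPerPartition = totalDataBytes / partitionCount;
                System.out.println("Partitions: " + partitionCount
                        + ", ~" + (bytesPerPartition / (1024 * 1024)) + " MB per partition");
                // If this is more than a few tens of MB, consider a higher partition count
                // (conventionally a prime number) so that each transfer is smaller.
            }
        }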

    Read the article

  • Using WKA in Large Coherence Clusters (Disabling Multicast)

    - by jpurdy
    Disabling hardware multicast (by configuring well-known addresses, aka WKA) will place significant stress on the network. For messages that must be sent to multiple servers, rather than having a server send a single packet to the switch and having the switch broadcast that packet to the rest of the cluster, the server must send a packet to each of the other servers. While hardware varies significantly, consider that a server with a single gigabit connection can send at most ~70,000 packets per second. To continue with some concrete numbers, in a cluster with 500 members, that means each server can send at most 140 cluster-wide messages per second. And if there are 10 cluster members on each physical machine, that number shrinks to 14 cluster-wide messages per second (or, with only mild hyperbole, roughly zero). It is also important to keep in mind that network I/O is not only expensive in terms of the network itself, but also in terms of the CPU required to send (or receive) a message (due to things like copying the packet bytes, processing an interrupt, etc.).

    Fortunately, Coherence is designed to rely primarily on point-to-point messages, but there are some features that are inherently one-to-many:

      - Announcing the arrival or departure of a member
      - Updating partition assignment maps across the cluster
      - Creating or destroying a NamedCache
      - Invalidating a cache entry from a large number of client-side near caches
      - Distributing a filter-based request across the full set of cache servers (e.g. queries, aggregators and entry processors)
      - Invoking clear() on a NamedCache

    The first few of these are operations that are primarily routed through a single senior member, and also occur infrequently, so they usually are not a primary consideration. There are cases, however, where the load from introducing new members can be substantial (to the point of destabilizing the cluster). Consider the case where the cluster in the first paragraph grows from 500 members to 1000 members (holding the number of physical machines constant). During this period, there will be 500 new member introductions, each of which may consist of several cluster-wide operations (for the cluster membership itself as well as the partitioned cache services, replicated cache services, invocation services, management services, etc.). Note that all of these introductions will route through that one senior member, which is sharing its network bandwidth with several other members (which will be communicating to a lesser degree with other members throughout this process). While each service may have a distinct senior member, there's a good chance during initial startup that a single member will be the senior for all services (if those services start on the senior before the second member joins the cluster). It's obvious that this could cause CPU and/or network starvation.

    In the current release of Coherence (3.7.1.3 as of this writing), the pure unicast code path also has less sophisticated flow control for cluster-wide messages (compared to the multicast-enabled code path), which may also result in significant heap consumption on the senior member's JVM (from the message backlog). This is almost never a problem in practice, but with sufficient CPU or network starvation, it could become critical.

    For the non-operational concerns (near caches, queries, etc.), the application itself will determine how much load is placed on the cluster. Applications intended for deployment in a pure unicast environment should be careful to avoid excessive dependence on these features. Even in an environment with multicast support, these operations may scale poorly, since even with a constant request rate the underlying workload will grow at roughly the same rate as resources are added. Unless there is an infrastructural requirement to the contrary, multicast should be enabled. If it can't be enabled, care should be taken to ensure the added overhead doesn't lead to performance or stability issues. This is particularly crucial in large clusters.
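
    The per-server figures above follow directly from the packet budget. The sketch below simply restates that arithmetic using the same illustrative numbers; actual NIC throughput and cluster sizes will vary.

        public class UnicastFanoutBudget {
            public static void main(String[] args) {
                // Figures taken from the discussion above; actual hardware will vary.
                int packetsPerSecondPerNic = 70_000;
                int clusterMembers         = 500;
                int membersPerMachine      = 10;

                // Without multicast, one "cluster-wide" message costs (members - 1) unicast packets.
                int fanout = clusterMembers - 1;

                int perServerBudget = packetsPerSecondPerNic / fanout;
                int perMemberBudget = (packetsPerSecondPerNic / membersPerMachine) / fanout;

                System.out.println("Cluster-wide messages/sec per dedicated server: ~" + perServerBudget);
                System.out.println("Cluster-wide messages/sec per member (shared NIC): ~" + perMemberBudget);
            }
        }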

    Read the article

  • Using Queries with Coherence Write-Behind Caches

    - by jpurdy
    Applications that use write-behind caching and wish to query the logical entity set have the option of querying the NamedCache itself or querying the database. In the former case, no particular restrictions exist beyond the limitations intrinsic to the Coherence query engine itself. In the latter case, queries may see partially committed transactions (e.g. with a parent-child relationship, the version of the parent may differ from the version of the child objects) and/or significant version skew (the query may see the current version of one object and a far older version of another object). This is consistent with "read committed" semantics, but the read skew may be far greater than would ever occur in a non-cached environment. As is usually the case, the application developer may choose to accept these limitations (with the hope that they are sufficiently infrequent), or may choose to validate the reads (perhaps via a version flag on the objects). This also applies to situations where a third-party application (such as a reporting tool) is querying the database. In many cases, the database may only be in a consistent state after the Coherence cluster has been halted.
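
    One way to validate such reads is the version-flag idea mentioned above: compare the version seen in the database against the version currently held in the cache. The sketch below is a hypothetical illustration; the domain type and its version field are assumptions, not part of any Coherence API.

        import com.tangosol.net.NamedCache;

        /**
         * Hypothetical sketch: after querying the database behind a write-behind cache,
         * flag rows whose version lags the cached (authoritative) version, since the
         * database may not yet reflect all committed cache updates.
         */
        public class VersionValidator {

            /** Minimal hypothetical domain contract carrying a monotonically increasing version. */
            public interface Versioned {
                long getVersion();
            }

            public static boolean isCurrent(NamedCache cache, Object key, long versionFromDb) {
                Versioned cached = (Versioned) cache.get(key);
                // If the entry is no longer cached, the database row is the best available view.
                return cached == null || cached.getVersion() <= versionFromDb;
            }
        }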

    Read the article

  • Using XA Transactions in Coherence-based Applications

    - by jpurdy
    While the costs of XA transactions are well known (e.g. increased data contention, higher latency, significant disk I/O for logging, availability challenges, etc.), in many cases they are the most attractive option for coordinating logical transactions across multiple resources. There are a few common approaches when integrating Coherence into applications via an application server's transaction manager:

      - Use of Coherence as a read-only cache, applying transactions to the underlying database (or any system of record) instead of the cache.
      - Use of the TransactionMap interface via the included resource adapter.
      - Use of the new ACID transaction framework, introduced in Coherence 3.6.

    Each of these may have significant drawbacks for certain workloads.

    Using Coherence as a read-only cache is the simplest option. In this approach, the application is responsible for managing both the database and the cache (either within the business logic or via application server hooks). This approach also tends to provide limited benefit for many workloads, particularly those that either have queries (given the complexity of maintaining a fully cached data set in Coherence) or are not read-heavy (where the cost of managing the cache may outweigh the benefits of reading from it). All updates are made synchronously to the database, leaving it as both a source of latency and a potential bottleneck. This approach also prevents addressing "hot data" problems (when certain objects are updated by many concurrent transactions), since most database servers offer no facilities for explicitly controlling concurrent updates. Finally, this option tends to be a better fit for key-based access (rather than filter-based access such as queries), since this makes it easier to aggressively invalidate cache entries without worrying about when they will be reloaded. The advantage of this approach is that it allows strong data consistency as long as optimistic concurrency control is used to ensure that database updates are applied correctly regardless of whether the cache contains stale (or even dirty) data. Another benefit is that it avoids the limitations of Coherence's write-through caching implementation.

    TransactionMap is generally used when Coherence acts as the system of record. TransactionMap is not generally compatible with write-through caching, so it will usually be used either to manage a standalone cache or when the cache is backed by a database via write-behind caching. TransactionMap has some restrictions that may limit its utility, the most significant being:

      - The lock-based concurrency model is relatively inefficient and may introduce significant latency and contention. As an example, in a typical configuration, a transaction that updates 20 cache entries will require roughly 40ms just for lock management (assuming all locks are granted immediately, and excluding validation and writing, which will require a similar amount of time). This may be partially mitigated by denormalizing (e.g. combining a parent object and its set of child objects into a single cache entry), at the cost of increasing false contention (e.g. transactions will conflict even when updating different child objects).
      - If the client (application server JVM) fails during the commit phase, locks will be released immediately, and the transaction may be partially committed. In practice, this is usually not as bad as it may sound, since the commit phase is usually very short (all locks having been previously acquired). Note that this vulnerability does not exist when a single NamedCache is used and all updates are confined to a single partition (generally implying the use of partition affinity).
      - The unconventional TransactionMap API is cumbersome but manageable. Only a few methods are transactional, primarily get(), put() and remove().

    The ACID transactions framework (accessed via the Connection class) provides atomicity guarantees by implementing the NamedCache interface, maintaining its own cache data and transaction logs inside a set of private partitioned caches. This feature may be used as either a local transactional resource or as a logging XA resource. However, a lack of database integration precludes the use of this functionality for most applications. A side effect is that this feature has not seen significant adoption, meaning that any use of it is subject to the usual headaches associated with being an early adopter (a greater chance of bugs and a greater risk of hitting an unoptimized code path). As a result, for the moment, we generally recommend against using this feature.

    In summary, it is possible to use Coherence in XA-oriented applications, and several customers are doing this successfully, but it is not a core usage model for the product, so care should be taken before committing to this path. For most applications, the most robust solution is normally to use Coherence as a read-only cache of the underlying data resources, even if this prevents taking advantage of certain product features.
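
    For the first (and generally recommended) approach, the cache never participates in the XA transaction at all: the database update runs under the transaction manager, and the cache entry is simply invalidated once the update succeeds. The sketch below is a hypothetical illustration; the data source, table and cache name are assumptions, and ideally the invalidation would be deferred to an after-commit callback.

        import com.tangosol.net.CacheFactory;
        import com.tangosol.net.NamedCache;

        import javax.sql.DataSource;
        import java.sql.Connection;
        import java.sql.PreparedStatement;
        import java.sql.SQLException;

        /**
         * Hypothetical sketch of the "read-only cache" approach: all updates go to the
         * database (which may be enlisted in an XA transaction by the container), and
         * the corresponding cache entry is invalidated after the update succeeds.
         */
        public class ReadOnlyCacheUpdater {
            private final DataSource dataSource;                        // container-managed, possibly XA
            private final NamedCache accounts = CacheFactory.getCache("accounts");

            public ReadOnlyCacheUpdater(DataSource dataSource) {
                this.dataSource = dataSource;
            }

            public void updateBalance(long accountId, double newBalance) throws SQLException {
                try (Connection con = dataSource.getConnection();
                     PreparedStatement stmt = con.prepareStatement(
                             "UPDATE accounts SET balance = ? WHERE id = ?")) {
                    stmt.setDouble(1, newBalance);
                    stmt.setLong(2, accountId);
                    stmt.executeUpdate();
                }
                // Invalidate aggressively; the next read will reload the committed row.
                // (Ideally this runs only after the transaction commits, e.g. from a synchronization callback.)
                accounts.remove(Long.valueOf(accountId));
            }
        }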

    Read the article

  • Using Queries with Coherence Read-Through Caches

    - by jpurdy
    Applications that rely on partial caches of databases, and use read-through to maintain those caches, face some trade-offs if queries are required. Coherence does not support push-down queries, so queries will apply only to data that currently exists in the cache. This is technically consistent with "read committed" semantics, but the potential absence of data may make the results so unintuitive as to be useless for most use cases (depending on how much of the database is held in cache). Alternatively, the application itself may manually "push down" queries to the database, either retrieving results equivalent to querying the cache directly, or querying the database for a key set and reading the values from the cache (relying on read-through to handle any missing values). Obviously, if the result set is too large, reading through the cache may cause significant thrashing. It's also worth pointing out that if the cache is asynchronously synchronized with the database (perhaps via a database change listener), an application may commit a transaction to the database, then generate a key set from the database via a query, then read those entries through the cache, possibly resulting in a race condition where the application sees older data than it had previously committed. In theory this is not problematic, but in practice it is very unintuitive. For this reason it often makes sense to invalidate the cache when updating the database, forcing the next read-through to update the cache.
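
    A minimal sketch of the "key set from the database, values through the cache" variant follows. The SQL, key type and cache name are hypothetical, and a very large key set would run into the thrashing concern noted above.

        import com.tangosol.net.CacheFactory;
        import com.tangosol.net.NamedCache;

        import javax.sql.DataSource;
        import java.sql.Connection;
        import java.sql.PreparedStatement;
        import java.sql.ResultSet;
        import java.sql.SQLException;
        import java.util.ArrayList;
        import java.util.List;
        import java.util.Map;

        /**
         * Hypothetical sketch: push the query down to the database to obtain a key set,
         * then bulk-read the values through the read-through cache via getAll().
         */
        public class PushDownQuery {

            public static Map findOpenOrders(DataSource dataSource) throws SQLException {
                List keys = new ArrayList();
                try (Connection con = dataSource.getConnection();
                     PreparedStatement stmt = con.prepareStatement(
                             "SELECT id FROM orders WHERE status = 'OPEN'");
                     ResultSet rs = stmt.executeQuery()) {
                    while (rs.next()) {
                        keys.add(Long.valueOf(rs.getLong(1)));
                    }
                }
                NamedCache orders = CacheFactory.getCache("orders");
                // Missing entries are loaded by the configured CacheLoader (read-through);
                // a very large key set can cause cache thrashing, as noted above.
                return orders.getAll(keys);
            }
        }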

    Read the article

  • Using Bulk Operations with Coherence Off-Heap Storage

    - by jpurdy
    Some NamedCache methods (including clear(), entrySet(Filter), aggregate(Filter, …) and invoke(Filter, …)) may generate large intermediate results. The size of these intermediate results may cause out-of-memory exceptions on cache servers, and in some cases on cache clients. This may be particularly problematic if out-of-memory exceptions occur on more than one server (since these operations may be cluster-wide) or if these exceptions cause additional memory use on the surviving servers as they take over partitions from the failed servers. It can be especially problematic with clusters that use off-heap storage (such as the NIO or Elastic Data storage options), since these storage options allow greater than normal cache sizes but do nothing to address the size of intermediate results or final result sets. One workaround is to use a PartitionedFilter, which allows the application to break up a larger operation into a number of smaller operations, each targeting either a set of partitions (useful for reducing the load on each cache server) or a set of members (useful for managing client result set sizes). It is also possible to return a key set, and then pull in the full entries using that key set. This also allows the application to take advantage of near caching, though this may be of limited value if the result is large enough to cause near cache thrashing.
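
    The sketch below illustrates the PartitionedFilter workaround by walking the partition set in fixed-size batches, assuming the 3.x-era PartitionedFilter/PartitionSet APIs. The cache name, filter and batch size are hypothetical; each batch should be fully processed before the next is requested so that memory use stays bounded.

        import com.tangosol.net.CacheFactory;
        import com.tangosol.net.NamedCache;
        import com.tangosol.net.PartitionedService;
        import com.tangosol.net.partition.PartitionSet;
        import com.tangosol.util.Filter;
        import com.tangosol.util.filter.AlwaysFilter;
        import com.tangosol.util.filter.PartitionedFilter;

        import java.util.Set;

        /**
         * Hypothetical sketch: run a filter-based operation in per-partition batches so
         * that no single request materializes the full result set at once.
         */
        public class BatchedQuery {
            public static void main(String[] args) {
                NamedCache cache = CacheFactory.getCache("orders");          // hypothetical cache
                Filter filter = AlwaysFilter.INSTANCE;                       // hypothetical filter
                PartitionedService service = (PartitionedService) cache.getCacheService();

                int partitionCount = service.getPartitionCount();
                int batchSize = 16;                                          // partitions per request

                for (int start = 0; start < partitionCount; start += batchSize) {
                    PartitionSet parts = new PartitionSet(partitionCount);
                    for (int p = start; p < Math.min(start + batchSize, partitionCount); p++) {
                        parts.add(p);
                    }
                    // Process this batch before requesting the next one to bound memory use.
                    Set entries = cache.entrySet(new PartitionedFilter(filter, parts));
                    System.out.println("Batch starting at partition " + start + ": " + entries.size() + " entries");
                }
            }
        }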

    Read the article

  • Using the Coherence ConcurrentMap Interface (Locking API)

    - by jpurdy
    For many developers using Coherence, the first place they look for concurrency control is the com.tangosol.util.ConcurrentMap interface (part of the NamedCache interface). The ConcurrentMap interface includes methods for explicitly locking data. Despite the obvious appeal of a lock-based API, these methods should generally be avoided for a variety of reasons:

      - They are very "chatty" in that they can't be bundled with other operations (such as get and put), and there are no collection-based versions of them.
      - Locks do not directly impact mutating calls (including puts and entry processors), so all code must make explicit lock requests before modifying (or in some cases reading) cache entries.
      - They require coordination of all code that may mutate the objects, including the need to lock at the same level of granularity (there is no built-in lock hierarchy and thus no concept of lock escalation).
      - Even if all code is properly coordinated (or there is only one piece of code), a failure during updates may leave a collection of changes to a set of objects in a partially committed state.
      - There is no concept of a read-only lock.

    In general, use of locking is highly discouraged for most applications. Instead, the use of entry processors provides a far more efficient approach, at the cost of some additional complexity.
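
    To illustrate the difference, the hypothetical sketch below contrasts the explicit lock/get/put/unlock pattern with a single entry processor invocation that performs the same mutation where the data lives (a trivial counter increment; a production processor would also need an appropriate serialization strategy).

        import com.tangosol.net.NamedCache;
        import com.tangosol.util.InvocableMap;
        import com.tangosol.util.processor.AbstractProcessor;

        /**
         * Hypothetical sketch contrasting explicit locking (chatty, easy to get wrong)
         * with a single entry processor invocation that mutates the entry in place.
         */
        public class CounterExamples {

            /** Lock-based approach: several network operations and explicit cleanup. */
            public static void incrementWithLock(NamedCache cache, Object key) {
                if (cache.lock(key, -1)) {                  // block until the lock is granted
                    try {
                        Integer value = (Integer) cache.get(key);
                        cache.put(key, Integer.valueOf(value == null ? 1 : value + 1));
                    } finally {
                        cache.unlock(key);
                    }
                }
            }

            /** Entry processor approach: one invocation, executed where the data lives. */
            public static class IncrementProcessor extends AbstractProcessor {
                @Override
                public Object process(InvocableMap.Entry entry) {
                    Integer value = entry.isPresent() ? (Integer) entry.getValue() : null;
                    Integer next = Integer.valueOf(value == null ? 1 : value + 1);
                    entry.setValue(next);
                    return next;
                }
            }

            public static void incrementWithProcessor(NamedCache cache, Object key) {
                cache.invoke(key, new IncrementProcessor());
            }
        }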

    Read the article
