Search Results

Search found 6716 results on 269 pages for 'distributed algorithm'.

Page 147/269 | < Previous Page | 143 144 145 146 147 148 149 150 151 152 153 154 | Next Page >

Orchestrating the Virtual Enterprise, Part II

- by Kathryn Perry

A guest post by Jon Chorley, Oracle's CSO & Vice President, SCM Product Strategy Almost everyone has ordered from Amazon.com at one time or another. Our orders are as likely to be fulfilled by third parties as they are by Amazon itself. To deliver the order promptly and efficiently, Amazon has to send it to the right fulfillment location and know the availability in that location. It needs to be able to track status of the fulfillment and deal with exceptions. As a virtual enterprise, Amazon's operations, using thousands of trading partners, requires a very different approach to fulfillment than the traditional 'take an order and ship it from your own warehouse' model. Amazon had no choice but to develop a complex, expensive and custom solution to tackle this problem as there used to be no product solution available. Now, other companies who want to follow similar models have a better off-the-shelf choice -- Oracle Distributed Order Orchestration (DOO). Consider how another of our customers is using our distributed orchestration solution. This major airplane manufacturer has a highly complex business and interacts regularly with the U.S. Government and major airlines. It sits in the middle of an intricate supply chain and needed to improve visibility across its many different entities. Oracle Fusion DOO gives the company an orchestration mechanism so it could improve quality, speed, flexibility, and consistency without requiring an organ transplant of these highly complex legacy systems. Many retailers face the challenge of dealing with brick and mortar, Web, and reseller channels. They all need to be knitted together into a virtual enterprise experience that is consistent for their customers. When a large U.K. grocer with a strong brick and mortar retail operation added an online business, they turned to Oracle Fusion DOO to bring these entities together. Disturbing the Peace with Acquisitions Quite often a company's ERP system is disrupted when it acquires a new company. An acquisition can inject a new set of processes and systems -- or even introduce an entirely new business like Sun's hardware did at Oracle. This challenge has been a driver for some of our DOO customers. A large power management company is using Oracle Fusion DOO to provide the flexibility to rapidly integrate additional products and services into its central fulfillment operation. The Flip Side of Fulfillment Meanwhile, we haven't ignored similar challenges on the supply side of the equation. Specifically, how to manage complex supply in a flexible way when there are multiple trading parties involved? How to manage the supply to suppliers? How to manage critical components that need to merge in a tier two or tier three supply chain? By investing in supply orchestration solutions for the virtual enterprise, we plan to give users better visibility into their network of suppliers to help them drive down costs. We also think this technology and full orchestration process can be applied to the financial side of organizations. An example is transactions that flow through complex internal structures to minimize tax exposure. We can help companies manage those transactions effectively by thinking about the internal organization as a virtual enterprise and bringing the same solution set to this internal challenge. The Clear Front Runner No other company is investing in solving the virtual enterprise supply chain issues like Oracle is. Oracle is in a unique position to become the gold standard in this market space. We have the infrastructure of Oracle technology. We already have an Oracle Fusion DOO application which embraces the best of what's required in this area. And we're absolutely committed to extending our Fusion solution to other use cases and delivering even more business value. Jon ChorleyChief Sustainability Officer & Vice President, SCM Product StrategyOracle Corporation

Read the article
Blueprints for Oracle NoSQL Database

- by dan.mcclary

I think that some of the most interesting analytic problems are graph problems. I'm always interested in new ways to store and access graphs. As such, I really like the work being done by Tinkerpop to create Open Source Software to make property graphs more accessible over a wide variety of datastores. Since key-value stores like Oracle NoSQL Database are well-suited to storing property graphs, I decided to extend the Blueprints API to work with it. Below I'll discuss some of the implementation details, but you can check out the finished product here: http://github.com/dwmclary/blueprints-oracle-nosqldb. What's in a Property Graph? In the most general sense, a graph is just a collection of vertices and edges. Vertices and edges can have properties: weights, names, or any number of other traits. In an undirected graph, edges connect vertices without direction. A directed graph specifies that all edges have a head and a tail --- a direction. A multi-graph allows multiple edges to connect two vertices. A "property graph" encompasses all of these traits. Key-Value Stores for Property Graphs Key-Value stores like Oracle NoSQL Database tend to be ideal for implementing property graphs. First, if any vertex or edge can have any number of traits, we can treat it as a hash map. For example: Vertex["name"] = "Mary" Vertex["age"] = 28 Vertex["ID"] = 12345 and so on. This is a natural key-value relationship: the key "name" maps to the value "Mary." Moreover if we maintain two hash maps, one for vertex objects and one for edge objects, we've essentially captured the graph. As such, any scalable key-value store is fertile ground for planting graphs. Oracle NoSQL Database as a Scalable Graph Database While Oracle NoSQL Database offers useful features like tunable consistency, what lends it to storing property graphs is the storage guarantees around its key structure. Keys in Oracle NoSQL Database are divided into two parts: a major key and a minor key. The storage guarantee is simple. Major keys will be distributed across storage nodes, which could encompass a large number of servers. However, all minor keys which are children of a given major key are guaranteed to be stored on the same storage node. For example, the vertices: /Personnel/Vertex/1 and /Personnel/Vertex/2 May be stored on different servers, but /Personnel/Vertex/1-/name and /Personnel/Vertex/1-/age will always be on the same server. This means that we can structure our graph database such that retrieving all the properties for a vertex or edge requires I/O from only a single storage node. Moreover, Oracle NoSQL Database provides a storeIterator which allows us to store a huge number of vertices and edges in a scalable fashion. By storing the vertices and edges as major keys, we guarantee that they are distributed evenly across all storage nodes. At the same time we can use a partial major key to iterate over all the vertices or edges (e.g. we search over /Personnel/Vertex to iterate over all vertices). Fork It! The Blueprints API and Oracle NoSQL Database present a great way to get started using a scalable key-value database to store and access graph data. However, a graph store isn't useful without a good graph to work on. I encourage you to fork or pull the repository, store some data, and try using Gremlin or any other language to explore.

Read the article
My Dog, Cross-Channel Shopping, and Fusion SCM

- by Kathryn Perry

A guest post by Mark Carson, Director, Oracle Fusion Supply Chain Management I was walking my dog Max in an open space behind my house. As we tromped through the tall weeds I remembered it is tick season and that I should get Max some protection. While he sniffed merrily in the tick infested brush, I started shopping in the middle of an open field on my phone. I thought it would be convenient to pick up the tick medicine from a pet store on the way home. Searching the pet store website I saw that they had the medicine, but there was no information on whether the store had any in stock and there were no options for shipping it to the store for pickup. I could return it, but not pick it up which seamed kind of odd. I really didn't feel like making calls to the local stores to find out if they had it. Since the product is popular, I tried one of the large 'everything' stores. Browsing its website I could see that it could be shipped to me, shipped to the store for free, and that the store nearest to me had it in stock. Needless to say, this store became a better option. This experience is a small example of why retailers, distributors, and manufactures have placed a high priority on enabling 'cross-channel commerce.' Shoppers like you and me expect to be able to search, compare, buy and return products on-line and over the phone using a variety of devices including PDAs, tablets and in-store kiosks. The pet store lost my business because its web channel had limited information about its stores. I have spoken with many customers and prospects about cross-channel commerce. They all realize the business implications and urgency behind cross-channel commerce but recognize there are challenges to enable it. New and existing applications must be integrated together globally through a consistent cross-channel business process. Integration is required between applications that provide the initial shopping experience and delivery applications associated with warehouses, stores, and partners. The enablement must be accomplished in a flexible way to react to fast-changing product portfolios and new acquisitions, while at the same time minimizing costs through reuse of existing systems. Meanwhile, the business must continue to grow and decision makers need to balance new capability with peak seasons. The challenges above are not unique to retail. Any customer in any industry who has multiple points for capturing orders and multiple points for fulfilling orders will face these challenges. With this in mind, we had a unique opportunity in Fusion SCM to re-think how to build a set of modular and flexible applications in the order management space that would make these challenges easier to conquer. The results are Fusion Distributed Order Orchestration and Global Order Promising. These applications can help companies, such as the pet store, enable true cross-channel commerce. The apps provide highly adaptable and flexible business processes to automate order orchestration across multiple cross-channel systems. They also show a global view of supply across warehouses, stores, and partners for real-time availability and more accurate order promising. Additional capability includes a standards-based integration framework for seamless execution and the ability to reuse existing systems for faster and lower cost implementations. OK, that was a mouthful of features and benefits. As Max waited to cross the street (he can do basic math too), I wondered if he could relate. He does not care about leash laws, pick-up courtesy, where he can/can't walk, what time of day it is, or even ticks. He does not care about how all these things could make walking complicated. He just wants to walk. Similarly, customers just want to shop and companies just want to make it easier to sell and deliver. You can learn more about Distributed Order Orchestration and Global Order Promising in cross-channel here.

Read the article
New R Interface to Oracle Data Mining Available for Download

- by charlie.berger

The R Interface to Oracle Data Mining ( R-ODM) allows R users to access the power of Oracle Data Mining's in-database functions using the familiar R syntax. R-ODM provides a powerful environment for prototyping data analysis and data mining methodologies. R-ODM is especially useful for: Quick prototyping of vertical or domain-based applications where the Oracle Database supports the application Scripting of "production" data mining methodologies Customizing graphics of ODM data mining results (examples: classification, regression, anomaly detection) The R-ODM interface allows R users to mine data using Oracle Data Mining from the R programming environment. It consists of a set of function wrappers written in source R language that pass data and parameters from the R environment to the Oracle RDBMS enterprise edition as standard user PL/SQL queries via an ODBC interface. The R-ODM interface code is a thin layer of logic and SQL that calls through an ODBC interface. R-ODM does not use or expose any Oracle product code as it is completely an external interface and not part of any Oracle product. R-ODM is similar to the example scripts (e.g., the PL/SQL demo code) that illustrates the use of Oracle Data Mining, for example, how to create Data Mining models, pass arguments, retrieve results etc. R-ODM is packaged as a standard R source package and is distributed freely as part of the R environment's Comprehensive R Archive Network (CRAN). For information about the R environment, R packages and CRAN, see www.r-project.org. R-ODM is particularly intended for data analysts and statisticians familiar with R but not necessarily familiar with the Oracle database environment or PL/SQL. It is a convenient environment to rapidly experiment and prototype Data Mining models and applications. Data Mining models prototyped in the R environment can easily be deployed in their final form in the database environment, just like any other standard Oracle Data Mining model. What is R? R is a system for statistical computation and graphics. It consists of a language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files. The design of R has been heavily influenced by two existing languages: Becker, Chambers & Wilks' S and Sussman's Scheme. Whereas the resulting language is very similar in appearance to S, the underlying implementation and semantics are derived from Scheme. R was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics of the University of Auckland in Auckland, New Zealand. Since mid-1997 there has been a core group (the "R Core Team") who can modify the R source code archive. Besides this core group many R users have contributed application code as represented in the near 1,500 publicly-available packages in the CRAN archive (which has shown exponential growth since 2001; R News Volume 8/2, October 2008). Today the R community is a vibrant and growing group of dozens of thousands of users worldwide. It is free software distributed under a GNU-style copyleft, and an official part of the GNU project ("GNU S"). Resources: R website / CRAN R-ODM

Read the article
Application Logging needs work

Application Logging Application logging is the act of logging events that occur within an application much like how a court report documents what happens in court case. Application logs can be useful for several reasons, but the most common use for logs is to recreate steps to find the root cause of applications errors. Other uses can include the detection of Fraud, verification of user activity, or provide audits on user/data interactions. “Logs can contain different kinds of data. The selection of the data used is normally affected by the motivation leading to the logging. “ (OWASP, 2009) OWASP also stats that logging include applicable debugging information like the event date time, responsible process, and a description of the event. “There are many reasons why a logging system is a necessary part of delivering a distributed application. One of the most important is the ability to track exactly how many users are using the application during different time periods.” (Hatton, 2000) Hatton also states that application logging helps system designers determine whether parts of an application aren't being used as designed. He implies that low usage can be used to identify if users like or do not like aspects of a system based on user usage of the application. This enables application designers to extract why users don't like aspects of an application so that changes can be made to increase its usefulness and effectiveness. “Logging memory usage can also assist you in tuning up the internals of your application. If you're experiencing a randomly occurring problem, being able to match activities performed with the memory status at the time may enable you to discover the cause of the problem. It also gives you a good indication of the health of the distributed server machine at the time any activity is performed. “ (Hatton, 2000) Commonly Logged Application Events (Defined by OWASP) Access of Data Creation of Data Modification of Data in any form Administrative Functions Configuration Changes Debugging Information(Application Events) Authorization Attempts Data Deletion Network Communication Authentication Events Errors/Exceptions Application Error Logging The functionality associated with application error logging is actually the combination of proper error handling and applications logging. If we look back at Figure 4 and Figure 5, these code examples allow developers to handle various types of errors that occur within the life cycle of an application’s execution. Application logging can be applied within the Catch section of the TryCatch statement allowing for the errors to be logged when they occur. By placing the logging within the Catch section specific error details can be accessed that help identify the source of the error, the path to the error, what caused the error and definition of the error that occurred. This can then be logged and reviewed at a later date in order recreate the error that was received based data found in the application log. By allowing applications to log errors developers IT staff can use them to recreate errors that are encountered by end-users or other dependent systems.

Read the article
Webcast On-Demand: Building Java EE Apps That Scale

- by jeckels

With some awesome work by one of our architects, Randy Stafford, we recently completed a webcast on scaling Java EE apps efficiently. Did you miss it? No problem. We have a replay available on-demand for you. Just hit the '+' sign drop-down for access.Topics include: Domain object caching Service response caching Session state caching JSR-107 HotCache and more! Further, we had several interesting questions asked by our audience, and we thought we'd share a sampling of those here for you - just in case you had the same queries yourself. Enjoy! What is the largest Coherence deployment out there? We have seen deployments with over 500 JVMs in the Coherence cluster, and deployments with over 1000 JVMs using the Coherence jar file, in one system. On the management side there is an ecosystem of monitoring tools from Oracle and third parties with dashboards graphing values from Coherence's JMX instrumentation. For lifecycle management we have seen a lot of custom scripting over the years, but we've also integrated closely with WebLogic to leverage its management ecosystem for deploying Coherence-based applications and managing process life cycles. That integration introduces a new Java EE archive type, the Grid Archive or GAR, which embeds in an EAR and can be seen by a WAR in WebLogic. That integration also doesn't require any extra WebLogic licensing if Coherence is licensed. How is Coherence different from a NoSQL Database like MongoDB? Coherence can be considered a NoSQL technology. It pre-dates the NoSQL movement, having been first released in 2001 whereas the term "NoSQL" was coined in 2009. Coherence has a key-value data model primarily but can also be used for document data models. Coherence manages data in memory currently, though disk persistence is in a future release currently in beta testing. Where the data is managed yields a few differences from the most well-known NoSQL products: access latency is faster with Coherence, though well-known NoSQL databases can manage more data. Coherence also has features that well-known NoSQL database lack, such as grid computing, eventing, and data source integration. Finally Coherence has had 15 years of maturation and hardening from usage in mission-critical systems across a variety of industries, particularly financial services. Can I use Coherence for local caching? Yes, you get additional features beyond just a java.util.Map: you get expiration capabilities, size-limitation capabilities, eventing capabilites, etc. Are there APIs available for GoldenGate HotCache? It's mostly a black box. You configure it, and it just puts objects into your caches. However you can treat it as a glass box, and use Coherence event interceptors to enhance its behavior - and there are use cases for that. Are Coherence caches updated transactionally? Coherence provides several mechanisms for concurrency control. If a project insists on full-blown JTA / XA distributed transactions, Coherence caches can participate as resources. But nobody does that because it's a performance and scalability anti-pattern. At finer granularity, Coherence guarantees strict ordering of all operations (reads and writes) against a single cache key if the operations are done using Coherence's "EntryProcessor" feature. And Coherence has a unique feature called "partition-level transactions" which guarantees atomic writes of multiple cache entries (even in different caches) without requiring JTA / XA distributed transaction semantics.

Read the article
Mercurial fails while commiting/updating/etc. using Mercuriual+TrueCrypt+MAC

- by lukewar

While trying to work with Mercurial on project located on TrueCrypt partition I always get en error as follows: ** unknown exception encountered, details follow ** report bug details to http://mercurial.selenic.com/bts/ ** or [email protected] ** Mercurial Distributed SCM (version 1.5.2+20100502) ** Extensions loaded: Traceback (most recent call last): File "/usr/local/bin/hg", line 27, in mercurial.dispatch.run() File "/Library/Python/2.6/site-packages/mercurial/dispatch.py", line 16, in run sys.exit(dispatch(sys.argv[1:])) File "/Library/Python/2.6/site-packages/mercurial/dispatch.py", line 30, in dispatch return _runcatch(u, args) File "/Library/Python/2.6/site-packages/mercurial/dispatch.py", line 50, in _runcatch return _dispatch(ui, args) File "/Library/Python/2.6/site-packages/mercurial/dispatch.py", line 470, in _dispatch return runcommand(lui, repo, cmd, fullargs, ui, options, d) File "/Library/Python/2.6/site-packages/mercurial/dispatch.py", line 340, in runcommand ret = _runcommand(ui, options, cmd, d) File "/Library/Python/2.6/site-packages/mercurial/dispatch.py", line 521, in _runcommand return checkargs() File "/Library/Python/2.6/site-packages/mercurial/dispatch.py", line 475, in checkargs return cmdfunc() File "/Library/Python/2.6/site-packages/mercurial/dispatch.py", line 469, in d = lambda: util.checksignature(func)(ui, *args, **cmdoptions) File "/Library/Python/2.6/site-packages/mercurial/util.py", line 401, in check return func(*args, **kwargs) File "/Library/Python/2.6/site-packages/mercurial/commands.py", line 3332, in update return hg.update(repo, rev) File "/Library/Python/2.6/site-packages/mercurial/hg.py", line 362, in update stats = _merge.update(repo, node, False, False, None) File "/Library/Python/2.6/site-packages/mercurial/merge.py", line 495, in update _checkunknown(wc, p2) File "/Library/Python/2.6/site-packages/mercurial/merge.py", line 77, in _checkunknown for f in wctx.unknown(): File "/Library/Python/2.6/site-packages/mercurial/context.py", line 660, in unknown return self._status[4] File "/Library/Python/2.6/site-packages/mercurial/util.py", line 156, in get result = self.func(obj) File "/Library/Python/2.6/site-packages/mercurial/context.py", line 622, in _status return self._repo.status(unknown=True) File "/Library/Python/2.6/site-packages/mercurial/localrepo.py", line 1023, in status if (f not in ctx1 or ctx2.flags(f) != ctx1.flags(f) File "/Library/Python/2.6/site-packages/mercurial/context.py", line 694, in flags flag = findflag(self._parents[0]) File "/Library/Python/2.6/site-packages/mercurial/context.py", line 690, in findflag return ff(path) File "/Library/Python/2.6/site-packages/mercurial/dirstate.py", line 145, in f if 'x' in fallback(x): TypeError: argument of type 'NoneType' is not iterable It is worth mention that Mercurial works perfectly if project is not located on TrueCrypt partition. Configuration: MacOS X 10.6.3 Mercurial Distributed SCM (version 1.5.2+20100502) Python 2.6.5 Have anyone of you generous people able to help me? :)

Read the article
TransactionScope won't work with DB2 provider

- by Florin

Hi Everyone, I've been trying to use TransactionScope with a DB2 database (using DB2 .Net provider v 9.0.0.2 and c# 2.0) which SHOULD be supported according to IBM. I have tried all the advice i could find on the IBM forums (such as here) to no avail. I have enabled XA transactions on my XP Sp2 machine, tried also from a Win 2003 Server machine but i consistently get the infamous error: ERROR [58005] [IBM][DB2/NT] SQL0998N Error occurred during transaction or heuristic processing. Reason Code = "16". Subcode = "2-80004005". SQLSTATE=58005 The windows event log says: The XA Transaction Manager attempted to load the XA resource manager DLL. The call to LOADLIBRARY for the XA resource manager DLL failed: DLL=C:\APPS\IBM\DB2v95fp2\SQLLIB\BIN\DB2APP.DLL File=d:\comxp_sp2\com\com1x\dtc\dtc\xatm\src\xarmconn.cpp Line=2467. Also, granted the NETWORK SERVICE user full rights to the folder and dll. Here's the MSDTC startup message MS DTC started with the following settings: Security Configuration (OFF = 0 and ON = 1): Network Administration of Transactions = 0, Network Clients = 0, Inbound Distributed Transactions using Native MSDTC Protocol = 0, Outbound Distributed Transactions using Native MSDTC Protocol = 0, Transaction Internet Protocol (TIP) = 0, XA Transactions = 1 Any help would be much appreciated! Thanks, Florin

Read the article
What makes merging in DVCS easy?

- by afriza

I read at Joel on Software: With distributed version control, the distributed part is actually not the most interesting part. The interesting part is that these systems think in terms of changes, not in terms of versions. and at HgInit: When we have to merge, Subversion tries to look at both revisions—my modified code, and your modified code—and it tries to guess how to smash them together in one big unholy mess. It usually fails, producing pages and pages of “merge conflicts” that aren’t really conflicts, simply places where Subversion failed to figure out what we did. By contrast, while we were working separately in Mercurial, Mercurial was busy keeping a series of changesets. And so, when we want to merge our code together, Mercurial actually has a whole lot more information: it knows what each of us changed and can reapply those changes, rather than just looking at the final product and trying to guess how to put it together. By looking at the SVN's repository folder, I have the impression that Subversion is maintaining each revisions as changeset. And from what I know, Hg is using both changeset and snapshot while Git is purely using snapshot to store the data. If my assumption is correct, then there must be other ways that make merging in DVCS easy. What are those?

Read the article
Is their a definitive list for the differences between the current version of SQL Azure and SQL Serv

- by Aim Kai

I am a relative newbie when it comes to SQL Azure!! I was wondering if there was a definitive list somewhere regarding what is and is not supported by SQL Azure in regards to SQL Server 2008? I have had a look through google but I've noticed some of the blog posts are missing things which I have found through my own testing: For example, quite a lot is summarised in this blog entry http://www.keepitsimpleandfast.com/2009/12/main-differences-between-sql-azure-and.html Common Language Runtime (CLR) Database file placement Database mirroring Distributed queries Distributed transactions Filegroup management Global temporary tables Spatial data and indexes SQL Server configuration options SQL Server Service Broker System tables Trace Flags which is a repeat of the MSDN page http://msdn.microsoft.com/en-us/library/ff394115.aspx I've noticed from my own testing that the following seem to have issues when migrating from SQL Server 2008 to the Azure: XML Types (the msdn does mention large custom types - I guess it may include this?? even if the data schema is really small?) Multi-part views I've been using SQL Azure Migration Wizard v3.1.8 to migrate local databases into the cloud. I was wondering if anyone could point to a list or give me any information till when these features are likely to be included in SQL Azure.

Read the article
Choosing between .NET Service Bus Queues vs Azure Queue Service

- by ChrisV

Just a quick question regarding an Azure application. If I have a number of Web and Worker roles that need to communicate, documentation says to use the Azure Queue Service. However, I've just read that the new .NET Service Bus now also offers queues. These look to be more powerful as they appear to offer a much more detailed API. Whilst the .NSB looks more interesting it has a couple of issues that make me wary of using it in distributed application. (for example, Queue Expiration... if I cannot guarantee that a queue will be renewed on time I may lose it all!). Has anyone had any experience using either of these two technologies and could give any advice on when to choose one over the other. I suspect that whilst the service bus looks more powerful, as my use case is really just enabling Web/Worker roles to communicate between each other, that the Azure Queue Service is what I'm after. But I'm just really looking for confirmation of that before progamming myself in to a corner :-) Thanks in advance. UPDATE Have read up about the two systems over the break. It defo looks like .NET service bus is more specifically designed for integrating systems rather than providing a general purpose reliable messaging system. Azure Queues are distributed and so reliable and scalable in a way that .NSB queues are not and so more suitable for code hosted within Azure itself. Thanks for the responses.

Read the article
Is Git ready to be recommended to my boss?

- by Mike Weller

I want to recomment Git to my boss as a new source control system, since we're stuck in the 90s with VSS (ouch), but are the tools and 3rd party support good enough yet? Specifically I'm talking about GUI front-ends similar to TortoiseSVN, decent visual diff/merge support, as well as stuff like email commit notifications and general support from 3rd parties like IDEs and build systems. Even though this will be used by programmers, we really need this kind of stuff in our team. I don't want to leave everyone stuck with a new tool, and even a new source control paradigm (distributed), with nothing but a command-line app and some online tutorials. This would be a step backwards. So what do you think... is Git ready? What decent tools exist for Git and what third party development apps support it? EDIT: My original question was pretty vague so I'm updating it to specifically ask for a list of available tools and 3rd party support for Git. Maybe we can get a community wiki post with a list of stuff. I also do not consider 'use subversion' to be an adequate answer. There are other reasons to use a distributed source control system other than offline editing - private and cheap branches being one of them.

Read the article
Extract 2 numbers preceded with two different strings in a paragrapf using TCL Regular Expression

- by Madhu

Hi, I need to extract two different numbers preceded by two different strings. Employee Id-- Employee16(I need 16) and Employee links-- Employee links:2 (I need 2). Source String looks like following: Employee16, Employee name is QueenRose Working for 46w0d Billing is Distributed 65537 assigned tasks, 0 reordered, 0 unassigned 0 discarded, 0 lost received, 5/255 load received sequence unavailable, 0xC2E7 sent sequence Employee links: 2 active, 0 inactive (max not set, min not set) Dt3/5/10:0, since 46w0d, no tasks pending Dt3/5/10:10, since 21w0d, no tasks rcvd Employee is currently working in Hardware section. Employee19, Employee name is Edward11 Working for 48w4d Billing is Distributed 206801498 assigned tasks, 0 reordered, 0 unassigned 655372 discarded, 0 lost received, 9/255 load received sequence unavailable, 0x23CA sent sequence Employee links: 7 active, 0 inactive (max not set, min not set) Dt3/5/10:0, since 47w2d, tasks pending Dt3/5/10:10, since 28w6d, no tasks pending Dt3/5/10:11, since 18w4d, no tasks pending Dt3/5/10:12, since 18w4d, no tasks pending Dt3/5/10:13, since 18w4d, no tasks pending Dt3/5/10:14, since 18w4d, no tasks pending Dt3/5/10:15, since 7w2d, no tasks pending Employee is currently working in Hardware sectione. Employee6 (inactive) Employee links: 2 Dt3/5/10:0 (inactive) Dt3/5/10:10 (inactive) Employee7 (inactive) Employee links: 2 Dt3/5/10:0 (inactive) Dt3/5/10:10 (inactive) Tried with the following: Multilink(\d+)[^\n\r]*[^M]*Member links:\s+(\d+) But is not listing all the Ids and links. Can anybody help me getting this? Thanks in advance, Madhu.

Read the article
Best practices to deal with "slightly different" branches of source code

- by jedi_coder

This question is rather agnostic than related to a certain version control program. Assume there is a source code tree under certain distributed version control. Let's call it A. At some point somebody else clones it and gets its own copy. Let's call it B. I'll call A and B branches, even if some version control tools have different definitions for branches (some might call A and B repositories). Let's assume that branch A is the "main" branch. In the context of distributed version control this only means that branch A is modified much more actively and the owner of branch B periodically syncs (pulls) new updates from branch A. Let's consider that a certain source file in branch B contains a class (again, it's also language agnostic). The owner of branch B considers that some class methods are more appropriate and groups them together by moving them inside the class body. Functionally nothing has changed - this is a very trivial refactoring of the code. But the change gets reflected in diffs. Now, assuming that this change from branch B will never get merged into branch A, the owner of branch B will always get this difference when pulling from branch A and merging into his own workspace. Even if there's only one such trivial change, the owner of branch B needs to resolve conflicts every time when pulling from branch A. As long as branches A and B are modified independently, more and more conflicts like this appear. What is the workaround for this situation? Which workflow should the owner of branch B follow to minimize the effort for periodically syncing with branch A?

Read the article
Velocity CTP: can we 'search' for objects?

- by Stato Machino

It appears that 'tags' allow us to associate a 'search term' with the objects placed into the Velocity cache space. However, these can only be queried within a 'region'. Further, regions somehow limit the locality of objects in the cache to a single server (or maybe something kinda like that). So this appears to make it hard to perform any operation for which the unique Id of the cached item is not persisted or continuously available to the application that stores and retrieves objects to and from the cache. In any case, I can't see an easy way to 'cleanse' the cache of objects or to find objects across the entire cache that may share some prefix, postfix or infix values in the cache key so that i can clear out the cache of object repeatedly created in unit tests, for example. And I am unsure about the consequences of regions being associated with single server cache locations. So I would appreciate any help with the following questions: What is the difference between a 'distributed cache' (called a 'partitioned' cache??) when using regions, and a 'local cache'? 1.a. In particular, are the region-oriented values in a distributed cache visible through a cache factory that is configured to 'see' the entire cache space? Are the operations of creating and removing 'regions' efficient enough that it would be reasonable to create a region and a group of tags for each bundle of objects that need to be cached? 2.a. Or does this just push the problem of scoping the 'search for objects' up the chain because the ability of the DataCache object to query down through regions and tags as limited as querying for the cache keys of objects themselves. Thanks, Stato

Read the article
Ipsec config problem // openswan

- by user90696

I try to configure Ipsec on server with openswan as client. But receive error - possible, it's auth error. What I wrote wrong in config ? Thank you for answers. #1: STATE_MAIN_I2: sent MI2, expecting MR2 003 "f-net" #1: received Vendor ID payload [Cisco-Unity] 003 "f-net" #1: received Vendor ID payload [Dead Peer Detection] 003 "f-net" #1: ignoring unknown Vendor ID payload [ca917959574c7d5aed4222a9df367018] 003 "f-net" #1: received Vendor ID payload [XAUTH] 108 "f-net" #1: STATE_MAIN_I3: sent MI3, expecting MR3 003 "f-net" #1: discarding duplicate packet; already STATE_MAIN_I3 010 "f-net" #1: STATE_MAIN_I3: retransmission; will wait 20s for response 003 "f-net" #1: discarding duplicate packet; already STATE_MAIN_I3 003 "f-net" #1: discarding duplicate packet; already STATE_MAIN_I3 003 "f-net" #1: discarding duplicate packet; already STATE_MAIN_I3 010 "f-net" #1: STATE_MAIN_I3: retransmission; will wait 40s for response 031 "f-net" #1: max number of retransmissions (2) reached STATE_MAIN_I3. Possible authentication failure: no acceptable response to our first encrypted message 000 "f-net" #1: starting keying attempt 2 of at most 3, but releasing whack other side - Cisco ASA. parameters for my connection on our Linux server : VPN Gateway 8.*.*.* (Cisco ) Phase 1 Exchange Type Main Mode Identification Type IP Address Local ID 4.*.*.* (our Linux server IP) Remote ID 8.*.*.* (VPN server IP) Authentication PSK Pre Shared Key Diffie-Hellman Key Group DH 5 (1536 bit) or DH 2 (1024 bit) Encryption Algorithm AES 256 HMAC Function SHA-1 Lifetime 86.400 seconds / no volume limit Phase 2 Security Protocol ESP Connection Mode Tunnel Encryption Algorithm AES 256 HMAC Function SHA-1 Lifetime 3600 seconds / 4.608.000 kilobytes DPD / IKE Keepalive 15 seconds PFS off Remote Network 192.168.100.0/24 Local Network 1 10.0.0.0/16 ............... Local Network 5 current openswan config : # config setup klipsdebug=all plutodebug="control parsing" protostack=netkey nat_traversal=no virtual_private=%v4:10.0.0.0/8,%v4:192.168.0.0/16,%v4:172.16.0.0/12 oe=off nhelpers=0 conn f-net type=tunnel keyexchange=ike authby=secret auth=esp esp=aes256-sha1 keyingtries=3 pfs=no aggrmode=no keylife=3600s ike=aes256-sha1-modp1024 # left=4.*.*.* leftsubnet=10.0.0.0/16 leftid=4.*.*.* leftnexthop=%defaultroute right=8.*.*.* rightsubnet=192.168.100.0/24 rightid=8.*.*.* rightnexthop=%defaultroute auto=add

Read the article
Test A SSH Connection from Windows commandline

- by IguanaMinstrel

I am looking for a way to test if a SSH server is available from a Windows host. I found this one-liner, but it requires the a Unix/Linux host: ssh -q -o "BatchMode=yes" user@host "echo 2>&1" && echo "UP" || echo "DOWN" Telnet'ing to port 22 works, but that's not really scriptable. I have also played around with Plink, but I haven't found a way to get the functionality of the one-liner above. Does anyone know Plink enough to make this work? Are there any other windows based tools that would work? Please note that the SSH servers in question are behind a corporate firewall and are NOT internet accessible. Arrrg. Figured it out: C:\>plink -batch -v user@host Looking up host "host" Connecting to 10.10.10.10 port 22 We claim version: SSH-2.0-PuTTY_Release_0.62 Server version: SSH-2.0-OpenSSH_4.7p1-hpn12v17_q1.217 Using SSH protocol version 2 Server supports delayed compression; will try this later Doing Diffie-Hellman group exchange Doing Diffie-Hellman key exchange with hash SHA-256 Host key fingerprint is: ssh-rsa 1024 aa:aa:aa:aa:aa:aa:aa:aa:aa:aa:aa:aa:aa:aa:aa:aa Initialised AES-256 SDCTR client->server encryption Initialised HMAC-SHA1 client->server MAC algorithm Initialised AES-256 SDCTR server->client encryption Initialised HMAC-SHA1 server->client MAC algorithm Using username "user". Using SSPI from SECUR32.DLL Attempting GSSAPI authentication GSSAPI authentication initialised GSSAPI authentication initialised GSSAPI authentication loop finished OK Attempting keyboard-interactive authentication Disconnected: Unable to authenticate C:\>

Read the article
All websites migrated from server running IIS6 to IIS7

- by Leah

Hi, I hope someone will be able to help me with this. We have recently migrated all of our clients' sites to a new server running IIS7 - all the sites were originally running on a server running IIS6. Ever since the migration, lots of our clients are reporting error messages. There seems to be quite a number of issues related to sending emails and also, we have had the following error message reported by several different clients: Server Error in '/' Application. -------------------------------------------------------------------------------- Validation of viewstate MAC failed. If this application is hosted by a Web Farm or cluster, ensure that <machineKey> configuration specifies the same validationKey and validation algorithm. AutoGenerate cannot be used in a cluster. Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code. Exception Details: System.Web.HttpException: Validation of viewstate MAC failed. If this application is hosted by a Web Farm or cluster, ensure that <machineKey> configuration specifies the same validationKey and validation algorithm. AutoGenerate cannot be used in a cluster. I have read elsewhere that this error can appear if a button is clicked before the whole page has finished loading. But as this error has now appeared on multiple sites and only since the server migration, it seems to me that it must be something else. I was wondering if someone could tell me if there is something specific which needs to be changed for .NET sites when sites are moved from a server running IIS6 to a server running IIS7? I don't deal with the actual servers very much so I'm afraid this is very much a grey area for me. Any help would be very much appreciated.

Read the article
Estimating compressed file size using a list parameter

- by Sai

I am currently compressing a list of files from a directory in the following format: tar -cvjf test_1.tar.gz -T test_1.lst --no-recursion The above command will compress only those files mentioned in the list. I am doing this because this list is generated such that it fits a DVD. However, during compression the compression rate decreases the estimated file size and there is abundant space left in the DVD. This is something like a Knapsack algorithm. I would like to estimate the compressed file size and add some more files to the list. I found that it is possible to estimate file size using the following command: tar -cjf - Folder/ | wc -c This command does not take a list parameter. Is there a way to estimate compressed file size? I am also looking into options like perl scripts etc. Edit: I think I should provide more information since I have been doing a lot of web search. I came across a perl script(Link)that sort of emulates the Knapsack algorithm. The current problem with the above mentioned script is that it splits the files in their original state. When I compress the files after splitting them, there are opportunities for adding more files which I consider to be inefficient. There are 2 ways I could resolve the inefficiency: a) Compress individual files and save them in a directory using a script. The compressed file could provide a best estimate. I could generate a script using a folder of compressed files and use them on the uncompressed ones. b) Check whether the compressed file's size is less than the required size. If so, I should keep adding files until I meet the requirement. However, the addition of new files to the compressed file is an optimization problem by itself.

Read the article
How does Windows 7 taskbar "color hot-tracking" feature calculate the colour to use?

- by theyetiman

This has intrigued me for quite some time. Does anyone know the algorithm Windows 7 Aero uses to determine the colour to use as the hot-tracking hover highlight on taskbar buttons for currently-running apps? It is definitely based on the icon of the app, but I can't see a specific pattern of where it's getting the colour value from. It doesn't seem to be any of the following: An average colour value from the entire icon, otherwise you would get brown all the time with multi-coloured icons like Chrome. The colour used the most in the image, otherwise you'd get yellow for the SQL Server Management Studio icon (6th from left). Also, the Chrome icon used red, green and yellow in equal measure. A colour located at certain pixel coordinates within the icon, because Chrome is red -indicating the top of the icon - and Notepad++ (2nd from right) is green - indicating the bottom of the icon. I asked this question on ux.stackoverflow.com and it got closed as off-topic, but someone answered with the following: As described by Raymond Chen in this MSDN blog article: Some people ask how it's done. It's really nothing special. The code just looks for the predominant color in the icon. (And, since visual designers are sticklers for this sort of thing, black, white, and shades of gray are not considered "colors" for the purpose of this calculation.) However I wasn't really satisfied with that answer because it doesn't explain how the "predominant" colour is calculated. Surely on the SQL Management Studio icon, the predominant colour, to my eyes at least, is yellow. Yet the highlight is green. I want to know, specifically, what the algorithm is.

Read the article
EFS Remote Encryption

- by Apoulet

We have been trying to setup EFS across our domain. Unfortunately Reading/Writing file over network share does not work, we get an "Access Denied" error. Another worrying fact is that I managed to get it working for 1 machine but no other would work. The machines are all Windows 2008R2, running as VM under ESXi host. According to: http://technet.microsoft.com/en-us/library/bb457116.aspx#EHAA We setup the involved machine to be trusted for delegation The user are not restricted and can be trusted for delegation. The users have logged-in on both side and can read/write encrypted files without issues locally. I enabled Kerberos logging in the registry and this is the relevant logs that I get on the machine that has the encrypted files. In order for all certificate that the user possess (Only Key Name changes): Event ID 5058: Audit Success, "Other System Events" Key file operation. Subject: Security ID: {MyDOMAIN}\{MyID} Account Name: {MyID} Account Domain: {MyDOMAIN} Logon ID: 0xbXXXXXXX Cryptographic Parameters: Provider Name: Microsoft Software Key Storage Provider Algorithm Name: Not Available. Key Name: {CE885431-9B4F-47C2-8415-2D766B999999} Key Type: User key. Key File Operation Information: File Path: C:\Users\{MyID}\AppData\Roaming\Microsoft\Crypto\RSA\S-1-5-21-4585646465656-260371901-2912106767-1207\66099999999991e891f187e791277da03d_dfe9ecd8-31c4-4b0f-9b57-6fd3cab90760 Operation: Read persisted key from file. Return Code: 0x0[/code] Event ID 5061: Audit Faillure, "System Intergrity" [code]Cryptographic operation. Subject: Security ID: {MyDOMAIN}\{MyID} Account Name: {MyID} Account Domain: {MyDOMAIN} Logon ID: 0xbXXXXXXX Cryptographic Parameters: Provider Name: Microsoft Software Key Storage Provider Algorithm Name: RSA Key Name: {CE885431-9B4F-47C2-8415-2D766B999999} Key Type: User key. Cryptographic Operation: Operation: Open Key. Return Code: 0x8009000b Could this be related to this error from the CryptAcquireContext function NTE_BAD_KEY_STATE 0x8009000BL The user password has changed since the private keys were encrypted. The problem is that the users I using at the moment can not change their password.

Read the article
Windows media scaling/interpolation method

- by MichaelH

Usually in Windows, if videos or other media is upscaled from a certain resolution to a higher resolution (e.g. "monitor size"), a bilinear filtering algorithm or similar is used, such that the upscaled material doesn't look blocky. On my system however, the used interpolation algorithm changed from 'bilinear' to 'nearest neighbor' at some point, with the effect that upscaled videos (e.g. viewed in MPC or WMP, and also Skype video streams) and games (e.g. from PopCap) appear rather blocky. Not sure what the common factor between those is, could be DirectShow(?). I am not aware of having changed any setting that could have affected this state, in fact I am not even aware such a setting exists. I'm guesing that some installed software must have changed something on my computer. My computer is running Windows 7, but I had already experienced the same effect on an XP machine some while ago, where it changed back again to the more pleasing bilinear interpolation after a while, as magically as the first time. What could be wrong with this installation, and how can I change this upscaling interpolation behavior?

Read the article
How many guesses per second are possible against an encrypted disk? [closed]

- by HappyDeveloper

I understand that guesses per second depends on the hardware and the encryption algorithm, so I don't expect an absolute number as answer. For example, with an average machine you can make a lot (thousands?) of guesses per second for a hash created with a single md5 round, because md5 is fast, making brute force and dictionary attacks a real danger for most passwords. But if instead you use bcrypt with enough rounds, you can slow the attack down to 1 guess per second, for example. 1) So how does disk encryption usually work? This is how I imagine it, tell me if it is close to reality: When I enter the passphrase, it is hashed with a slow algorithm to generate a key (always the same?). Because this is slow, brute force is not a good approach to break it. Then, with the generated key, the disk is unencrypted on the fly very fast, so there is not a significant performance lose. 2) How can I test this with my own machine? I want to calculate the guesses per second my machine can make. 3) How many guesses per second are possible against an encrypted disk with the fastest PC ever so far?

Read the article
Implementation of ZipCrypto / Zip 2.0 encryption in java

- by gomesla

I'm trying o implement the zipcrypto / zip 2.0 encryption algoritm to deal with encrypted zip files as discussed in http://www.pkware.com/documents/casestudies/APPNOTE.TXT I believe I've followed the specs but just can't seem to get it working. I'm fairly sure the issue has to do with my interpretation of the crc algorithm. The documentation states CRC-32: (4 bytes) The CRC-32 algorithm was generously contributed by David Schwaderer and can be found in his excellent book "C Programmers Guide to NetBIOS" published by Howard W. Sams & Co. Inc. The 'magic number' for the CRC is 0xdebb20e3. The proper CRC pre and post conditioning is used, meaning that the CRC register is pre-conditioned with all ones (a starting value of 0xffffffff) and the value is post-conditioned by taking the one's complement of the CRC residual. Here is the snippet that I'm using for the crc32 public class PKZIPCRC32 { private static final int CRC32_POLYNOMIAL = 0xdebb20e3; private int crc = 0xffffffff; private int CRCTable[]; public PKZIPCRC32() { buildCRCTable(); } private void buildCRCTable() { int i, j; CRCTable = new int[256]; for (i = 0; i <= 255; i++) { crc = i; for (j = 8; j > 0; j--) if ((crc & 1) == 1) crc = (crc >>> 1) ^ CRC32_POLYNOMIAL; else crc >>>= 1; CRCTable[i] = crc; } } private int crc32(byte buffer[], int start, int count, int lastcrc) { int temp1, temp2; int i = start; crc = lastcrc; while (count-- != 0) { temp1 = crc >>> 8; temp2 = CRCTable[(crc ^ buffer[i++]) & 0xFF]; crc = temp1 ^ temp2; } return crc; } public int crc32(int crc, byte buffer) { return crc32(new byte[] { buffer }, 0, 1, crc); } } Below is my complete code. Can anyone see what I'm doing wrong. package org.apache.commons.compress.archivers.zip; import java.io.IOException; import java.io.InputStream; public class ZipCryptoInputStream extends InputStream { public class PKZIPCRC32 { private static final int CRC32_POLYNOMIAL = 0xdebb20e3; private int crc = 0xffffffff; private int CRCTable[]; public PKZIPCRC32() { buildCRCTable(); } private void buildCRCTable() { int i, j; CRCTable = new int[256]; for (i = 0; i <= 255; i++) { crc = i; for (j = 8; j > 0; j--) if ((crc & 1) == 1) crc = (crc >>> 1) ^ CRC32_POLYNOMIAL; else crc >>>= 1; CRCTable[i] = crc; } } private int crc32(byte buffer[], int start, int count, int lastcrc) { int temp1, temp2; int i = start; crc = lastcrc; while (count-- != 0) { temp1 = crc >>> 8; temp2 = CRCTable[(crc ^ buffer[i++]) & 0xFF]; crc = temp1 ^ temp2; } return crc; } public int crc32(int crc, byte buffer) { return crc32(new byte[] { buffer }, 0, 1, crc); } } private static final long ENCRYPTION_KEY_1 = 0x12345678; private static final long ENCRYPTION_KEY_2 = 0x23456789; private static final long ENCRYPTION_KEY_3 = 0x34567890; private InputStream baseInputStream = null; private final PKZIPCRC32 checksumEngine = new PKZIPCRC32(); private long[] keys = null; public ZipCryptoInputStream(ZipArchiveEntry zipEntry, InputStream inputStream, String passwd) throws Exception { baseInputStream = inputStream; // Decryption // ---------- // PKZIP encrypts the compressed data stream. Encrypted files must // be decrypted before they can be extracted. // // Each encrypted file has an extra 12 bytes stored at the start of // the data area defining the encryption header for that file. The // encryption header is originally set to random values, and then // itself encrypted, using three, 32-bit keys. The key values are // initialized using the supplied encryption password. After each byte // is encrypted, the keys are then updated using pseudo-random number // generation techniques in combination with the same CRC-32 algorithm // used in PKZIP and described elsewhere in this document. // // The following is the basic steps required to decrypt a file: // // 1) Initialize the three 32-bit keys with the password. // 2) Read and decrypt the 12-byte encryption header, further // initializing the encryption keys. // 3) Read and decrypt the compressed data stream using the // encryption keys. // Step 1 - Initializing the encryption keys // ----------------------------------------- // // Key(0) <- 305419896 // Key(1) <- 591751049 // Key(2) <- 878082192 // // loop for i <- 0 to length(password)-1 // update_keys(password(i)) // end loop // // Where update_keys() is defined as: // // update_keys(char): // Key(0) <- crc32(key(0),char) // Key(1) <- Key(1) + (Key(0) & 000000ffH) // Key(1) <- Key(1) * 134775813 + 1 // Key(2) <- crc32(key(2),key(1) >> 24) // end update_keys // // Where crc32(old_crc,char) is a routine that given a CRC value and a // character, returns an updated CRC value after applying the CRC-32 // algorithm described elsewhere in this document. keys = new long[] { ENCRYPTION_KEY_1, ENCRYPTION_KEY_2, ENCRYPTION_KEY_3 }; for (int i = 0; i < passwd.length(); ++i) { update_keys((byte) passwd.charAt(i)); } // Step 2 - Decrypting the encryption header // ----------------------------------------- // // The purpose of this step is to further initialize the encryption // keys, based on random data, to render a plaintext attack on the // data ineffective. // // Read the 12-byte encryption header into Buffer, in locations // Buffer(0) thru Buffer(11). // // loop for i <- 0 to 11 // C <- buffer(i) ^ decrypt_byte() // update_keys(C) // buffer(i) <- C // end loop // // Where decrypt_byte() is defined as: // // unsigned char decrypt_byte() // local unsigned short temp // temp <- Key(2) | 2 // decrypt_byte <- (temp * (temp ^ 1)) >> 8 // end decrypt_byte // // After the header is decrypted, the last 1 or 2 bytes in Buffer // should be the high-order word/byte of the CRC for the file being // decrypted, stored in Intel low-byte/high-byte order. Versions of // PKZIP prior to 2.0 used a 2 byte CRC check; a 1 byte CRC check is // used on versions after 2.0. This can be used to test if the password // supplied is correct or not. byte[] encryptionHeader = new byte[12]; baseInputStream.read(encryptionHeader); for (int i = 0; i < encryptionHeader.length; i++) { encryptionHeader[i] ^= decrypt_byte(); update_keys(encryptionHeader[i]); } } protected byte decrypt_byte() { byte temp = (byte) (keys[2] | 2); return (byte) ((temp * (temp ^ 1)) >> 8); } @Override public int read() throws IOException { // // Step 3 - Decrypting the compressed data stream // ---------------------------------------------- // // The compressed data stream can be decrypted as follows: // // loop until done // read a character into C // Temp <- C ^ decrypt_byte() // update_keys(temp) // output Temp // end loop int read = baseInputStream.read(); read ^= decrypt_byte(); update_keys((byte) read); return read; } private final void update_keys(byte ch) { keys[0] = checksumEngine.crc32((int) keys[0], ch); keys[1] = keys[1] + (byte) keys[0]; keys[1] = keys[1] * 134775813 + 1; keys[2] = checksumEngine.crc32((int) keys[2], (byte) (keys[1] >> 24)); } }

Read the article
What is the fastest cyclic synchronization in Java (ExecutorService vs. CyclicBarrier vs. X)?

- by Alex Dunlop

Which Java synchronization construct is likely to provide the best performance for a concurrent, iterative processing scenario with a fixed number of threads like the one outlined below? After experimenting on my own for a while (using ExecutorService and CyclicBarrier) and being somewhat surprised by the results, I would be grateful for some expert advice and maybe some new ideas. Existing questions here do not seem to focus primarily on performance, hence this new one. Thanks in advance! The core of the app is a simple iterative data processing algorithm, parallelized to the spread the computational load across 8 cores on a Mac Pro, running OS X 10.6 and Java 1.6.0_07. The data to be processed is split into 8 blocks and each block is fed to a Runnable to be executed by one of a fixed number of threads. Parallelizing the algorithm was fairly straightforward, and it functionally works as desired, but its performance is not yet what I think it could be. The app seems to spend a lot of time in system calls synchronizing, so after some profiling I wonder whether I selected the most appropriate synchronization mechanism(s). A key requirement of the algorithm is that it needs to proceed in stages, so the threads need to sync up at the end of each stage. The main thread prepares the work (very low overhead), passes it to the threads, lets them work on it, then proceeds when all threads are done, rearranges the work (again very low overhead) and repeats the cycle. The machine is dedicated to this task, Garbage Collection is minimized by using per-thread pools of pre-allocated items, and the number of threads can be fixed (no incoming requests or the like, just one thread per CPU core). V1 - ExecutorService My first implementation used an ExecutorService with 8 worker threads. The program creates 8 tasks holding the work and then lets them work on it, roughly like this: // create one thread per CPU executorService = Executors.newFixedThreadPool( 8 ); ... // now process data in cycles while( ...) { // package data into 8 work items ... // create one Callable task per work item ... // submit the Callables to the worker threads executorService.invokeAll( taskList ); } This works well functionally (it does what it should), and for very large work items indeed all 8 CPUs become highly loaded, as much as the processing algorithm would be expected to allow (some work items will finish faster than others, then idle). However, as the work items become smaller (and this is not really under the program's control), the user CPU load shrinks dramatically: blocksize | system | user | cycles/sec 256k 1.8% 85% 1.30 64k 2.5% 77% 5.6 16k 4% 64% 22.5 4096 8% 56% 86 1024 13% 38% 227 256 17% 19% 420 64 19% 17% 948 16 19% 13% 1626 Legend: - block size = size of the work item (= computational steps) - system = system load, as shown in OS X Activity Monitor (red bar) - user = user load, as shown in OS X Activity Monitor (green bar) - cycles/sec = iterations through the main while loop, more is better The primary area of concern here is the high percentage of time spent in the system, which appears to be driven by thread synchronization calls. As expected, for smaller work items, ExecutorService.invokeAll() will require relatively more effort to sync up the threads versus the amount of work being performed in each thread. But since ExecutorService is more generic than it would need to be for this use case (it can queue tasks for threads if there are more tasks than cores), I though maybe there would be a leaner synchronization construct. V2 - CyclicBarrier The next implementation used a CyclicBarrier to sync up the threads before receiving work and after completing it, roughly as follows: main() { // create the barrier barrier = new CyclicBarrier( 8 + 1 ); // create Runable for thread, tell it about the barrier Runnable task = new WorkerThreadRunnable( barrier ); // start the threads for( int i = 0; i < 8; i++ ) { // create one thread per core new Thread( task ).start(); } while( ... ) { // tell threads about the work ... // N threads + this will call await(), then system proceeds barrier.await(); // ... now worker threads work on the work... // wait for worker threads to finish barrier.await(); } } class WorkerThreadRunnable implements Runnable { CyclicBarrier barrier; WorkerThreadRunnable( CyclicBarrier barrier ) { this.barrier = barrier; } public void run() { while( true ) { // wait for work barrier.await(); // do the work ... // wait for everyone else to finish barrier.await(); } } } Again, this works well functionally (it does what it should), and for very large work items indeed all 8 CPUs become highly loaded, as before. However, as the work items become smaller, the load still shrinks dramatically: blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.7% 78% 6.1 16k 5.5% 52% 25 4096 9% 29% 64 1024 11% 15% 117 256 12% 8% 169 64 12% 6.5% 285 16 12% 6% 377 For large work items, synchronization is negligible and the performance is identical to V1. But unexpectedly, the results of the (highly specialized) CyclicBarrier seem MUCH WORSE than those for the (generic) ExecutorService: throughput (cycles/sec) is only about 1/4th of V1. A preliminary conclusion would be that even though this seems to be the advertised ideal use case for CyclicBarrier, it performs much worse than the generic ExecutorService. V3 - Wait/Notify + CyclicBarrier It seemed worth a try to replace the first cyclic barrier await() with a simple wait/notify mechanism: main() { // create the barrier // create Runable for thread, tell it about the barrier // start the threads while( ... ) { // tell threads about the work // for each: workerThreadRunnable.setWorkItem( ... ); // ... now worker threads work on the work... // wait for worker threads to finish barrier.await(); } } class WorkerThreadRunnable implements Runnable { CyclicBarrier barrier; @NotNull volatile private Callable<Integer> workItem; WorkerThreadRunnable( CyclicBarrier barrier ) { this.barrier = barrier; this.workItem = NO_WORK; } final protected void setWorkItem( @NotNull final Callable<Integer> callable ) { synchronized( this ) { workItem = callable; notify(); } } public void run() { while( true ) { // wait for work while( true ) { synchronized( this ) { if( workItem != NO_WORK ) break; try { wait(); } catch( InterruptedException e ) { e.printStackTrace(); } } } // do the work ... // wait for everyone else to finish barrier.await(); } } } Again, this works well functionally (it does what it should). blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.4% 80% 6.3 16k 4.6% 60% 30.1 4096 8.6% 41% 98.5 1024 12% 23% 202 256 14% 11.6% 299 64 14% 10.0% 518 16 14.8% 8.7% 679 The throughput for small work items is still much worse than that of the ExecutorService, but about 2x that of the CyclicBarrier. Eliminating one CyclicBarrier eliminates half of the gap. V4 - Busy wait instead of wait/notify Since this app is the primary one running on the system and the cores idle anyway if they're not busy with a work item, why not try a busy wait for work items in each thread, even if that spins the CPU needlessly. The worker thread code changes as follows: class WorkerThreadRunnable implements Runnable { // as before final protected void setWorkItem( @NotNull final Callable<Integer> callable ) { workItem = callable; } public void run() { while( true ) { // busy-wait for work while( true ) { if( workItem != NO_WORK ) break; } // do the work ... // wait for everyone else to finish barrier.await(); } } } Also works well functionally (it does what it should). blocksize | system | user | cycles/sec 256k 1.9% 85% 1.30 64k 2.2% 81% 6.3 16k 4.2% 62% 33 4096 7.5% 40% 107 1024 10.4% 23% 210 256 12.0% 12.0% 310 64 11.9% 10.2% 550 16 12.2% 8.6% 741 For small work items, this increases throughput by a further 10% over the CyclicBarrier + wait/notify variant, which is not insignificant. But it is still much lower-throughput than V1 with the ExecutorService. V5 - ? So what is the best synchronization mechanism for such a (presumably not uncommon) problem? I am weary of writing my own sync mechanism to completely replace ExecutorService (assuming that it is too generic and there has to be something that can still be taken out to make it more efficient). It is not my area of expertise and I'm concerned that I'd spend a lot of time debugging it (since I'm not even sure my wait/notify and busy wait variants are correct) for uncertain gain. Any advice would be greatly appreciated.

Read the article

Search Results

Search found 6716 results on 269 pages for 'distributed algorithm'.

Page 147/269 | < Previous Page | 143 144 145 146 147 148 149 150 151 152 153 154 | Next Page >

- by Kathryn Perry

- by dan.mcclary

- by Kathryn Perry

- by charlie.berger

- by jeckels

- by lukewar

- by Florin

- by afriza

- by Aim Kai

- by ChrisV

- by Mike Weller

- by Madhu

- by jedi_coder

- by Stato Machino

- by user90696

- by IguanaMinstrel

- by Leah

- by Sai

- by theyetiman

- by Apoulet

- by MichaelH

- by HappyDeveloper

- by gomesla

- by Alex Dunlop

< Previous Page | 143 144 145 146 147 148 149 150 151 152 153 154 | Next Page >