Search Results

Search found 6716 results on 269 pages for 'distributed algorithm'.

Page 1/269 | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • Recommendations for distributed processing/distributed storage systems

    - by Eddie
    At my organization we have a processing and storage system spread across two dozen linux machines that handles over a petabyte of data. The system right now is very ad-hoc; processing automation and data management is handled by a collection of large perl programs on independent machines. I am looking at distributed processing and storage systems to make it easier to maintain, evenly distribute load and data with replication, and grow in disk space and compute power. The system needs to be able to handle millions of files, varying in size between 50 megabytes to 50 gigabytes. Once created, the files will not be appended to, only replaced completely if need be. The files need to be accessible via HTTP for customer download. Right now, processing is automated by perl scripts (that I have complete control over) which call a series of other programs (that I don't have control over because they are closed source) that essentially transforms one data set into another. No data mining happening here. Here is a quick list of things I am looking for: Reliability: These data must be accessible over HTTP about 99% of the time so I need something that does data replication across the cluster. Scalability: I want to be able to add more processing power and storage easily and rebalance the data on across the cluster. Distributed processing: Easy and automatic job scheduling and load balancing that fits with processing workflow I briefly described above. Data location awareness: Not strictly required but desirable. Since data and processing will be on the same set of nodes I would like the job scheduler to schedule jobs on or close to the node that the data is actually on to cut down on network traffic. Here is what I've looked at so far: Storage Management: GlusterFS: Looks really nice and easy to use but doesn't seem to have a way to figure out what node(s) a file actually resides on to supply as a hint to the job scheduler. GPFS: Seems like the gold standard of clustered filesystems. Meets most of my requirements except, like glusterfs, data location awareness. Ceph: Seems way to immature right now. Distributed processing: Sun Grid Engine: I have a lot of experience with this and it's relatively easy to use (once it is configured properly that is). But Oracle got its icy grip around it and it no longer seems very desirable. Both: Hadoop/HDFS: At first glance it looked like hadoop was perfect for my situation. Distributed storage and job scheduling and it was the only thing I found that would give me the data location awareness that I wanted. But I don't like the namename being a single point of failure. Also, I'm not really sure if the MapReduce paradigm fits the type of processing workflow that I have. It seems like you need to write all your software specifically for MapReduce instead of just using Hadoop as a generic job scheduler. OpenStack: I've done some reading on this but I'm having trouble deciding if it fits well with my problem or not. Does anyone have opinions or recommendations for technologies that would fit my problem well? Any suggestions or advise would be greatly appreciated. Thanks!

    Read the article

  • Building a Redundant / Distributed Application

    - by MattW
    This is more of a "point me in the right direction" question. My team of three and I have built a hosted web app that queues and routes customer chat requests to available customer service agents (It does other things as well, but this is enough background to illustrate the issue). The basic dev architecture today is: a single page ajax web UI (ASP.NET MVC) with floating chat windows (think Gmail) a backend Windows service to queue and route the chat requests this service also logs the chats, calculates service levels, etc a Comet server product that routes data between the web frontend and the backend Windows service this also helps us detect which Agents are still connected (online) And our hardware architecture today is: 2 servers to host the web UI portion of the application a load balancer to route requests to the 2 different web app servers a third server to host the SQL Server DB and the backend Windows service responsible for queuing / delivering chats So as it stands today, one of the web app servers could go down and we would be ok. However, if something would happen to the SQL Server / Windows Service server we would be boned. My question - how can I make this backend Windows service logic be able to be spread across multiple machines (distributed)? The Windows service is written to accept requests from the Comet server, check for available Agents, and route the chat to those agents. How can I make this more distributed? How can I make it so that I can distribute the work of the backend Windows service can be spread across multiple machines for redundancy and uptime purposes? Will I need to re-write it with distributed computing in mind? I should also note that I am hosting all of this on Rackspace Cloud instances - so maybe it is something I should be less concerned about? Thanks in advance for any help!

    Read the article

  • Java - System design with distributed Queues and Locks

    - by sunny
    Looking for inputs to evaluate a design for a system (java) which would have a distributed queue serving several (but not too many) nodes. These nodes would process objects present in the distributed queue and on occasion require a distributed lock across the cluster on an arbitrary (distributed) data structures. These (distributed) data structures could potentially lie in a distributed cache. Eliminating Terracotta (DSO),Hazelcast and Akka what could be alternative choices. Currently considering zookeeper as a distributed locking mechanism. Since the recommendation of a znode is not to exceed the 1M size , the understanding is that zookeeper should not be used a distributed queue. And also from Netflix curator tech note 4. So should a distributed cache, say like memcached, or redis be used to emulate a distributed queue ? i.e. The distributed queue will be stored in the caches and will be locked cluster-wide via zookeeper. Are there potential pitfalls with this high-level approach. The objects don't need to be taken off the queue. The object will pass through a lifecycle which will determine its removal from the queue. There would be about 10k+ objects in a queue at a given time changing states and any node could service one stage of the object's lifecycle. (Although not strictly necessary .. i.e. one node could serve the entire lifecycle if that is more efficient.) Any suggestions/alternatives ? sidenote: new to zookeeper ; redis etc.

    Read the article

  • How to choose an integer linear programming solver ?

    - by Cassie
    Hi all, I am newbie for integer linear programming. I plan to use a integer linear programming solver to solve my combinatorial optimization problem. I am more familiar with C++/object oriented programming on an IDE. Now I am using NetBeans with Cygwin to write my applications most of time. May I ask if there is an easy use ILP solver for me? Or it depends on the problem I want to solve ? I am trying to do some resources mapping optimization. Please let me know if any further information is required. Thank you very much, Cassie.

    Read the article

  • open source gossip-based membership protocol?

    - by Aaron
    I am looking for a library which I can plug into a distributed application which implements any gossip-based membership protocol. Such a library would allow me to send/receive membership lists, merge received membership lists, etc... Even better would be if the library implemented a protocol with performance O(logn) performance guarantees. Does anyone know of any open source library like this? It doesn't need to meet all of the aforementioned requirements; even something partially implemented would be helpful.

    Read the article

  • A leader election algorithm for an oriented hypercube

    - by mick
    I'm stuck with some problem where I have to design a leader election algorithm for an oriented hypercube. This should be done by using a tournament with a number of rounds equal to the dimension D of the hypercube. In each stage d, with 1 <= d < D two candidate leaders of neighbouring d-dimensional hypercubes should compete to become the single candidate leader of the (d+1)-dimensional hypercube that is the union of their respective hypercubes.

    Read the article

  • Distributed storage and computing

    - by Tim van Elteren
    Dear Serverfault community, After researching a number of distributed file systems for deployment in a production environment with the main purpose of performing both batch and real-time distributed computing I've identified the following list as potential candidates, mainly on maturity, license and support: Ceph Lustre GlusterFS HDFS FhGFS MooseFS XtreemFS The key properties that our system should exhibit: an open source, liberally licensed, yet production ready, e.g. a mature, reliable, community and commercially supported solution; ability to run on commodity hardware, preferably be designed for it; provide high availability of the data with the most focus on reads; high scalability, so operation over multiple data centres, possibly on a global scale; removal of single points of failure with the use of replication and distribution of (meta-)data, e.g. provide fault-tolerance. The sensitivity points that were identified, and resulted in the following questions, are: transparency to the processing layer / application with respect to data locality, e.g. know where data is physically located on a server level, mainly for resource allocation and fast processing, high performance, how can this be accomplished? Do you from experience know what solutions provide this transparency and to what extent? posix compliance, or conformance, is mentioned on the wiki pages of most of the above listed solutions. The question here mainly is, how relevant is support for the posix standard? Hadoop for example isn't posix compliant by design, what are the pro's and con's? what about the difference between synchronous and asynchronous opeartion of a distributed file system. Though a synchronous distributed file system has the preference because of reliability it also imposes certain limitations with respect to scalability. What would be, from your expertise, the way to go on this? I'm looking forward to your replies. Thanks in advance! :) With kind regards, Tim van Elteren

    Read the article

  • Elastic versus Distributed in caching.

    - by Mike Reys
    Until now, I hadn't heard about Elastic Caching yet. Today I read Mike Gualtieri's Blog entry. I immediately thought about Oracle Coherence and got a little scare throughout the reading. Elastic Caching is the next step after Distributed Caching. As we've always positioned Coherence as a Distributed Cache, I thought for a brief instance that Oracle had missed a new trend/technology. But then I started reading the characteristics of an Elastic Cache. Forrester definition: Software infrastructure that provides application developers with data caching services that are distributed across two or more server nodes that consistently perform as volumes grow can be scaled without downtime provide a range of fault-tolerance levels Hey wait a minute, doesn't Coherence fullfill all these requirements? Oh yes, I think it does! The next defintion in the article is about Elastic Application Platforms. This is mainly more of the same with the addition of code execution. Now there is analytics functionality in Oracle Coherence. The analytics capability provides data-centric functions like distributed aggregation, searching and sorting. Coherence also provides continuous querying and event-handling. I think that when it comes to providing an Elastic Application Platform (as in the Forrester definition), Oracle is close, nearly there. And what's more, as Elastic Platform is the next big thing towards the big C word, Oracle Coherence makes you cloud-ready ;-) There you go! Find more info on Oracle Coherence here.

    Read the article

  • Significance of Bresenhams Line of Sight algorithm

    - by GamDroid
    What is the significance of Bresenhams Line of Sight algorithm in chasing and evading in games? As far as i know and implemented this algorithm calulates the straight line between two given points. However while implementing it in game development i stored the points calculated using this algorithm in an array.Then im traversing this array for chasing and evading purpose. This looks to be working good with some angles only.In an pixel based environment/tile based. What if there are some obstacles added in the paths of the two points? then this algorithm will not work right? How well can we use the Bresenhams Line algorithm in game development?

    Read the article

  • Building a distributed system on Amazon Web Services

    - by Songo
    Would simply using AWS to build an application make this application a distributed system? For example if someone uses RDS for the database server, EC2 for the application itself and S3 for hosting user uploaded media, does that make it a distributed system? If not, then what should it be called and what is this application lacking for it to be distributed? Update Here is my take on the application to clarify my approach to building the system: The application I'm building is a social game for Facebook. I developed the application locally on a LAMP stack using Symfony2. For production I used an a single EC2 Micro instance for hosting the app itself, RDS for hosting my database, S3 for the user uploaded files and CloudFront for hosting static content. I know this may sound like a naive approach, so don't be shy to express your ideas.

    Read the article

  • Design a Distributed System

    - by Bonton255
    I am preparing for an interview on Distributed Systems. I have gone through a lot of text and understand the basics of the area. However, I need some examples of discussions on designing a distributed system given a scenario. For example, if I were to design a distributed system to calculate if a number N is primary or not, what will the be design of the system, what will be the impact of network latency, CPU performance, node failure, addition of nodes, time synchronization etc. If you guys could present your in-depth thoughts on this example, or point me to some similar discussion, that would be really helpful.

    Read the article

  • Can an issue tracking system be distributed?

    - by Klaim
    I was thinking about issue tracking software like Redmine, Trac or even the one that is in Fossil and something hit me: Is there a reason why Redmine and Trac are not possible to be distributed? Or maybe it's possible and I just don't know how it's possible? If it's not possible, why? By distributed I mean like Facebook or Google or other applications that effectively runs on multiple hardware a the same time but share data.

    Read the article

  • The best algorithm enhancing alpha-beta?

    - by Risa
    I'm studying AI. My teacher gave us source code of a chess-like game and asked us to enhance it. My exercise is to improve the alpha/beta algorithm implementing in that game. The programmer already uses transposition tables, MTD(f) with alpha/beta+memory (MTD(f) is the best algorithm I know by far). So is there any better algorithm to enhance alpha-beta search or a good way to implement MTD(f) in coding a game?

    Read the article

  • Distributed Computing Framework (.NET) - Specifically for CPU Instensive operations

    - by StevenH
    I am currently researching the options that are available (both Open Source and Commercial) for developing a distributed application. "A distributed system consists of multiple autonomous computers that communicate through a computer network." Wikipedia The application is focused on distributing highly cpu intensive operations (as opposed to data intensive) so I'm sure MapReduce solutions don't fit the bill. Any framework that you can recommend ( + give a brief summary of any experience or comparison to other frameworks ) would be greatly appreciated. Thanks.

    Read the article

  • Distributed Development Tools -- (Version control and Project Management)

    - by Macy Abbey
    Hello, I've recently become responsible for choosing which source control and project management software to use for a company that employs me. Currently it uses Jira (project management) and Subversion (version control). I know there are many other options out there -- the ones I know about are all in this article http://mashable.com/2010/07/14/distributed-developer-teams/ . I'm leaning towards recommending they just stay with what they have as it seems workable and any change would have to be worth the cost of switching to say github/basecamp or some other solution. Some details on the team: It's a distributed development shop. Meetings of the whole team in one room are rare. It's currently a very small development team (three developers). The project management software is used by developers and a product manager or two. What are you experiences with version control and project management web applications? Are there any you would recommend and you think are worth the switching cost of time to learn new services / implementing the change? Edit: After educating myself further on the options it appears DVCS offer powerful benefits that may be worth investing in now as opposed to later in the company's lifetime when the switching cost is higher: I'm a Subversion geek, why I should consider or not consider Mercurial or Git or any other DVCS?

    Read the article

  • Distributed Development Tools -- (Version control and Project Management)

    - by Macy Abbey
    I've recently become responsible for choosing which source control and project management software to use for a company that employs me. Currently it uses Jira (project management) and Subversion (version control). I know there are many other options out there -- the ones I know about are all in this article http://mashable.com/2010/07/14/distributed-developer-teams/ . I'm leaning towards recommending they just stay with what they have as it seems workable and any change would have to be worth the cost of switching to say github/basecamp or some other solution. Some details on the team: It's a distributed development shop. Meetings of the whole team in one room are rare. It's currently a very small development team (three developers). The project management software is used by developers and a product manager or two. What are you experiences with version control and project management web applications? Are there any you would recommend and you think are worth the switching cost of time to learn new services / implementing the change? Edit: After educating myself further on the options it appears DVCS offer powerful benefits that may be worth investing in now as opposed to later in the company's lifetime when the switching cost is higher: I'm a Subversion geek, why I should consider or not consider Mercurial or Git or any other DVCS?

    Read the article

  • Matchmaking algorithm with a set of filters

    - by Yuriy Pogrebnyak
    I'm looking for matchmaking algorithm for 1x1 online game. Players must be matched not by their skill or level, as usual, but by some specific filters. Each player sends request, where he specifies some set of parameters (generally, 2-4 parameters). If some parameter is specified, player can be matched only with those who has sent this parameter with exactly the same value, or those who hasn't specified this parameter. I need this algorithm to be thread-safe and preferably fast. It would be great if it'll work for 3-4 or even more parameters, but also I'm looking for algorithm that works with only one parameter (in my case it's game bet). Also I'd appreciate ideas on how to implement or improve this algorithm on my server platform - ASP.NET. One more problem I'm facing is that finding match can't be executed right after user sends request, because if other user sends request before matching for previous is finished, they won't be matched even is they possibly could. So it seems that match finding should be started on schedule, and I need help on how to optimize it and how to choose time interval for starting new match finding. P.S. I've also posted this question on stackoverflow

    Read the article

  • JavaScript distributed computing project

    - by Ben L.
    I made a website that does absolutely nothing, and I've proven to myself that people like to stay there - I've already logged 11+ hours worth of cumulative time on the page. My question is whether it would be possible (or practical) to use the website as a distributed computing site. My first impulse was to find out if there were any JavaScript distributed computing projects already active, so that I could put a piece of code on the page and be done. Unfortunately, all I could find was a big list of websites that thought it might be a cool idea. I'm thinking that I might want to start with something like integer factorization - in this case, RSA numbers. It would be easy for the server to check if an answer was correct (simply test for modulus equals zero), and also easy to implement. Is my idea feasible? Is there already a project out there that I can use?

    Read the article

  • Control diamond square algorithm to generate islands/pangea.

    - by Gabriel A. Zorrilla
    I generated a height map with the diamond square algorithm. The thing is i do not manage to create islands, this is, restrict the height other than water level range to a certain value in the center of the map. I manualy seeded a circle in the middle of the map but the rest of the map still receives heights over the water level. I dont fully understand the Perlin noise algorithm so i'd like to work with my current implementation of the diamond square algorithm which took me 3 days to interpret and code in PHP. :P

    Read the article

  • word disambiguation algorithm (Lesk algorithm)

    - by anyssnordin
    Hii.. Can anybody help me to find an algorithm in Java code to find synonyms of a search word based on the context and I want to implement the algorithm with WordNet database. For example, "I am running a Java program". From the context, I want to find the synonyms for the word "running", but the synonyms must be suitable according to a context.

    Read the article

  • Enumerating combinations in a distributed manner

    - by Reyzooti
    I have a problem where I must analyse 500C5 combinations (255244687600) of something. Distributing it over a 10 node cluster where each cluster processes roughly 10^6 combinations per second means the job will be complete in about 7hours. The problem I have is distributing the 255244687600 combinations over the 10 nodes. I'd like to present each node with 25524468760, however the algorithms I'm using can only produce the combinations sequentially, I'd like to be able to pass the set of elements and a range of combination indicies eg: [0-10^7) or [10^7,2.0 10^7) etc and have the nodes themselves figure out the combinations. The algorithms I'm using at the moment are from the following: http://home.roadrunner.com/~hinnant/combinations.html A logical question I've considered using a master node, that enumerates each of the combinations and sends work to each of the nodes, however the overhead incurred in iterating the combinations from a single node and communicating back and forth work is enormous, and will subsequently lead to the master node becoming the bottleneck. Are there any good combination iterating algorithms geared up for efficient/optimal distributed enumeration?

    Read the article

  • Hopcroft–Karp algorithm in Python

    - by Simon
    I am trying to implement the Hopcroft Karp algorithm in Python using networkx as graph representation. Currently I am as far as this: #Algorithms for bipartite graphs import networkx as nx import collections class HopcroftKarp(object): INFINITY = -1 def __init__(self, G): self.G = G def match(self): self.N1, self.N2 = self.partition() self.pair = {} self.dist = {} self.q = collections.deque() #init for v in self.G: self.pair[v] = None self.dist[v] = HopcroftKarp.INFINITY matching = 0 while self.bfs(): for v in self.N1: if self.pair[v] and self.dfs(v): matching = matching + 1 return matching def dfs(self, v): if v != None: for u in self.G.neighbors_iter(v): if self.dist[ self.pair[u] ] == self.dist[v] + 1 and self.dfs(self.pair[u]): self.pair[u] = v self.pair[v] = u return True self.dist[v] = HopcroftKarp.INFINITY return False return True def bfs(self): for v in self.N1: if self.pair[v] == None: self.dist[v] = 0 self.q.append(v) else: self.dist[v] = HopcroftKarp.INFINITY self.dist[None] = HopcroftKarp.INFINITY while len(self.q) > 0: v = self.q.pop() if v != None: for u in self.G.neighbors_iter(v): if self.dist[ self.pair[u] ] == HopcroftKarp.INFINITY: self.dist[ self.pair[u] ] = self.dist[v] + 1 self.q.append(self.pair[u]) return self.dist[None] != HopcroftKarp.INFINITY def partition(self): return nx.bipartite_sets(self.G) The algorithm is taken from http://en.wikipedia.org/wiki/Hopcroft%E2%80%93Karp_algorithm However it does not work. I use the following test code G = nx.Graph([ (1,"a"), (1,"c"), (2,"a"), (2,"b"), (3,"a"), (3,"c"), (4,"d"), (4,"e"),(4,"f"),(4,"g"), (5,"b"), (5,"c"), (6,"c"), (6,"d") ]) matching = HopcroftKarp(G).match() print matching Unfortunately this does not work, I end up in an endless loop :(. Can someone spot the error, I am out of ideas and I must admit that I have not yet fully understand the algorithm, so it is mostly an implementation of the pseudo code on wikipedia

    Read the article

1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >