Search Results

Search found 874 results on 35 pages for 'scalability'.

Page 8/35 | < Previous Page | 4 5 6 7 8 9 10 11 12 13 14 15  | Next Page >

  • How to calculate real-time stats?

    - by Diego Jancic
    I have a site with millions of users (well, actually it doesn't have any yet, but let's imagine), and I want to calculate some stats like "log-ins in the past hour". The problem is similar to the one described here: http://highscalability.com/blog/2008/4/19/how-to-build-a-real-time-analytics-system.html The simplest approach would be to do a select like this: select count(distinct user_id) from logs where date>='20120601 1200' and date <='20120601 1300' (of course other conditions could apply for the stats, like log-ins per country) Of course this would be really slow, mainly if it has millions (or even thousands) of rows, and I want to query this every time a page is displayed. How would you summarize the data? What should go to the (mem)cache? EDIT: I'm looking for a way to de-normalize the data, or to keep the cache up-to-date. For example I could increment an in-memory variable every time someone logs in, but that would help to know the total amount of logins, not the "logins in the last hour". Hope it's more clear now.

    Read the article

  • Combining cache methods - memcache/disk based

    - by Industrial
    Hi! Here's the deal. We would have taken the complete static html road to solve performance issues, but since the site will be partially dynamic, this won't work out for us. What we have thought of instead is using memcache + eAccelerator to speed up PHP and take care of caching for the most used data. Here's our two approaches that we have thought of right now: Using memcache on all<< major queries and leaving it alone to do what it does best. Usinc memcache for most commonly retrieved data, and combining with a standard harddrive-stored cache for further usage. The major advantage of only using memcache is of course the performance, but as users increases, the memory usage gets heavy. Combining the two sounds like a more natural approach to us, even though the theoretical compromize in performance. Memcached appears to have some replication features available as well, which may come handy when it's time to increase the nodes. What approach should we use? - Is it stupid to compromize and combine the two methods? Should we insted be focusing on utilizing memcache and instead focusing on upgrading the memory as the load increases with the number of users? Thanks a lot!

    Read the article

  • Using memory-based cache together with conventional cache

    - by Industrial
    Hi! Here's the deal. We would have taken the complete static html road to solve performance issues, but since the site will be partially dynamic, this won't work out for us. What we have thought of instead is using memcache + eAccelerator to speed up PHP and take care of caching for the most used data. Here's our two approaches that we have thought of right now: Using memcache on all<< major queries and leaving it alone to do what it does best. Usinc memcache for most commonly retrieved data, and combining with a standard harddrive-stored cache for further usage. The major advantage of only using memcache is of course the performance, but as users increases, the memory usage gets heavy. Combining the two sounds like a more natural approach to us, even though the theoretical compromize in performance. Memcached appears to have some replication features available as well, which may come handy when it's time to increase the nodes. What approach should we use? - Is it stupid to compromize and combine the two methods? Should we insted be focusing on utilizing memcache and instead focusing on upgrading the memory as the load increases with the number of users? Thanks a lot!

    Read the article

  • Network Load Balancing (NLB): is it suitable for "stateful" ASP.NET applications?

    - by micha12
    Hi everybody, I have posted the following question concerning ASP.NET web farms. http://stackoverflow.com/questions/1816756/how-to-create-an-asp-net-web-farm/ Guys recommended using Network Load Balancing (NLB) as a primary way of creating a web farm. However, Wikipedia says that "NLBS is intended for ... stateless applications". Our web application, however, is absolutely "stateful": it is a closed site to which users will have access by login and password, and information for every user will be different: people will see their own trades and operations. Should we still use NLB in this scenario? Thank you.

    Read the article

  • Is having functionality in DB a road block to scalability?

    - by Estefany Velez
    I may not be able to give the right title to the question. But here it is, We are developing financial portal for wealth management. We are expecting over 10000 clients to use the application. The portal calculates various performance analytics based on the the technical analysis of the stock market. We developed lot of the functionality through Stored procedures, user defined functions, triggers etc. through Database. We thought we can gain huge performance boost doing stuff directly in database than through C# code. And we actually did get a huge performance boost. When I tried to brag about the achievement to our CTO, he counter questioned my decision of having functionality implemented in database rather than code. According to him such applications suffer scalability problems. In his words "These days things are kept in memory/cache. Clustered data is hard to manage over time. Facebook, Google have nothing in database. It is the era of thin servers and thick clients. DB is used only to store plain data and functionality should be completely decoupled from the database." Can you guys please give me some suggestions as to whether what he says is right. How to go about architect such an application?

    Read the article

  • What is the Reason large sites don't use MySQL with ASP.NET?

    - by Luke101
    I have read this article from highscalability about stackoverflow and other large websites. Many large high traffic .NET sites such as plentyoffish.com, mysapce and SO all use .NET technologies and use SQL SERver for their database. In the article it says SO said As you add more and more database servers the SQL Server license costs can be outrageous. So by starting scale up and gradually going scale out with non-open source software you can be in a world of financial hurt. I don't understand why don't high traffic .NET sites convert their databases to MySQL as it is waay cheaper then SQL Server

    Read the article

  • Does Seaside scale?

    - by Richard Durr
    Seaside is known as "the heretical web framework". One of the points that make it heretical is that it has much shared state. That however is something which, in my current understanding, hinders easy scaling. Ruby on rails on the other hand shares as less state as possible. It has been known to scale pretty well, even if it is dog slow compared to modern smalltalk vms. flickr uses php and has scaled to an extremly big infrastructure... So has anybody some experience in the scaling of Seaside?

    Read the article

  • Does PHP session conflict with Share-Nothing-Architecture?

    - by Morgan Cheng
    When I first meet PHP, I'm amazed by the idea Sharing-Nothing-Architecture. I once in a project whose scalaiblity suffers from sharing data among different HTTP requests. However, as I proceed my PHP learning. I found that PHP has sessions. This looks conflict with the idea of sharing nothing. So, PHP session is just invented to make counterpart technology of ASP/ASP.NET/J2EE? Should high scalable web sites use PHP session?

    Read the article

  • Articles about replication schemes/algorithms?

    - by jkff
    I'm designing an hierarchical distributed system (every node has zero or more "master" nodes to which it propagates its current data). The data gets continuously updated and I'd like to guarantee that at least N nodes have almost-current data at any given time. I do not need complete consistency, only eventual consistency (t.i. for any time instant, the current snapshot of data should eventually appear on at least N nodes. It is tricky to define the term "current" here, but still). Nodes may fail and go back up at any moment, and there is no single "central" node. O overflowers! Point me to some good papers describing replication schemes. I've so far found one: Consistency Management in Optimistic Replication Algorithms

    Read the article

  • Database caching on a shared host

    - by tau
    Anyone have any ideas how to increase MySQL performance on a shared host? My question has less to do with overall database performance and more to do with simply retrieving user-submitted data. Currently my database will create caches at timed intervals, and then the PHP will selectively access the static files it needs. This has given me a noticeable performance boost, but I am worried about a time in which I have so much data that having to read in big files in PHP will actually be slower. I am just looking for ideas for shared hosting solutions; I am not going to get my own server anytime soon. Thanks!

    Read the article

  • Does AutoSproc Scale Well?

    - by nickyt
    We use AutoSproc as our DAL, not my choice, but it was there when I started working at my job. I was wondering if any one had any experience using AutoSproc with large web applications? I'm just curious if it would scale well as our application is growing and we might need to pop it into a web farm at some point. If it doesn't scale well, what would you suggest then since there are several options out there. Any info is greatly appreciated.

    Read the article

  • Event feed implementation - will it scale?

    - by SlappyTheFish
    Situation: I am currently designing a feed system for a social website whereby each user has a feed of their friends' activities. I have two possible methods how to generate the feeds and I would like to ask which is best in terms of ability to scale. Events from all users are collected in one central database table, event_log. Users are paired as friends in the table friends. The RDBMS we are using is MySQL. Standard method: When a user requests their feed page, the system generates the feed by inner joining event_log with friends. The result is then cached and set to timeout after 5 minutes. Scaling is achieved by varying this timeout. Hypothesised method: A task runs in the background and for each new, unprocessed item in event_log, it creates entries in the database table user_feed pairing that event with all of the users who are friends with the user who initiated the event. One table row pairs one event with one user. The problems with the standard method are well known – what if a lot of people's caches expire at the same time? The solution also does not scale well – the brief is for feeds to update as close to real-time as possible The hypothesised solution in my eyes seems much better; all processing is done offline so no user waits for a page to generate and there are no joins so database tables can be sharded across physical machines. However, if a user has 100,000 friends and creates 20 events in one session, then that results in inserting 2,000,000 rows into the database. Question: The question boils down to two points: Is this worst-case scenario mentioned above problematic, i.e. does table size have an impact on MySQL performance and are there any issues with this mass inserting of data for each event? Is there anything else I have missed?

    Read the article

  • Partitioning requests in code among several servers

    - by Jacques René Mesrine
    I have several forum servers (what they are is irrelevant) which stores posts from users and I want to be able to partition requests among these servers. I'm currently leaning towards partitioning them by geographic location. To improve the locality of data, users will be separated into regions e.g. North America, South America and so on. Is there any design pattern on how to implement the function that maps the partioning property to the server, so that this piece of code has high availability and would not become a single point of failure ? f( Region ) -> Server IP

    Read the article

  • White Label Ecommerce app. Shared or Individual dbs

    - by MetaDan
    Currently I'm working with an in house white label cms that we resell to multiple clients and it all runs from the same box/db. I'm just looking at converting this to have an ecommerce version that we'll run alongside it. I'm wondering whether there will be an issue keeping all the products/categories/orders in one db or whether it would be advisory to separate each instance of the site into its own db for this. These white label instances will only be sold to smaller companies that probably wont have masses of traffic/products and are looking for a simple ecommerce site. Anything larger will definitely get its own hosting and db. But for smaller scale stuff do you think a single db will be ok?

    Read the article

  • Commercial web application--scalable database design

    - by Rob Campbell
    I'm designing a set of web apps to track scientific laboratory data. Each laboratory has several members, each of whom will access both their own data and that of their laboratory as a whole. Many typical queries will thus be expected to return records of multiple members (e.g. my mouse, joe's mouse and sally's mouse). I think I have the database fairly well normalized. I'm now wondering how to ensure that users can efficiently access both their own data and their lab's data set when it is mixed among (hopefully) a whole ton of records from other labs. What I've come up with so far is that most tables will end with two fields: user_id and labgroup_id. The WHERE clause of any SELECT statement will include the appropriate reference to one of the id fields ("...WHERE 'labroup_id=n..." or "...WHERE user_id=n..."). My questions are: Is this an approach that will scale to 10^6 or more records? If so, what's the best way to use these fields in a query so that it most efficiently searches the relevant subset of the database? e.g. Should the first step in querying be to create a temporary table containing just the labgroup's data? Or will indexing using some combination of the id, user_id, and labroup_id fields be sufficient at that scale? I thank any responders very much in advance.

    Read the article

  • Using memcache together with conventional cache

    - by Industrial
    Hi! Here's the deal. We would have taken the complete static html road to solve performance issues, but since the site will be partially dynamic, this won't work out for us. What we have thought of instead is using memcache + eAccelerator to speed up PHP and take care of caching for the most used data. Here's our two approaches that we have thought of right now: Using memcache on all<< major queries and leaving it alone to do what it does best. Usinc memcache for most commonly retrieved data, and combining with a standard harddrive-stored cache for further usage. The major advantage of only using memcache is of course the performance, but as users increases, the memory usage gets heavy. Combining the two sounds like a more natural approach to us, even though the theoretical compromize in performance. Memcached appears to have some replication features available as well, which may come handy when it's time to increase the nodes. What approach should we use? - Is it stupid to compromize and combine the two methods? Should we insted be focusing on utilizing memcache and instead focusing on upgrading the memory as the load increases with the number of users? Thanks a lot!

    Read the article

  • Scalable Database Tagging Schema

    - by Longpoke
    EDIT: To people building tagging systems. Don't read this. It is not what you are looking for. I asked this when I wasn't aware that RDBMS all have their own optimization methods, just use a simple many to many scheme. I have a posting system that has millions of posts. Each post can have an infinite number of tags associated with it. Users can create tags which have notes, date created, owner, etc. A tag is almost like a post itself, because people can post notes about the tag. Each tag association has an owner and date, so we can see who added the tag and when. My question is how can I implement this? It has to be fast searching posts by tag, or tags by post. Also, users can add tags to posts by typing the name into a field, kind of like the google search bar, it has to fill in the rest of the tag name for you. I have 3 solutions at the moment, but not sure which is the best, or if there is a better way. Note that I'm not showing the layout of notes since it will be trivial once I get a proper solution for tags. Method 1. Linked list tagId in post points to a linked list in tag_assoc, the application must traverse the list until flink=0 post: id, content, ownerId, date, tagId, notesId tag_assoc: id, tagId, ownerId, flink tag: id, name, notesId Method 2. Denormalization tags is simply a VARCHAR or TEXT field containing a tab delimited array of tagId:ownerId. It cannot be a fixed size. post: id, content, ownerId, date, tags, notesId tag: id, name, notesId Method 3. Toxi (from: http://www.pui.ch/phred/archives/2005/04/tags-database-schemas.html, also same thing here: http://stackoverflow.com/questions/20856/how-do-you-recommend-implementing-tags-or-tagging) post: id, content, ownerId, date, notesId tag_assoc: ownerId, tagId, postId tag: id, name, notesId Method 3 raises the question, how fast will it be to iterate through every single row in tag_assoc? Methods 1 and 2 should be fast for returning tags by post, but for posts by tag, another lookup table must be made. The last thing I have to worry about is optimizing searching tags by name, I have not worked that out yet. I made an ASCII diagram here: http://pastebin.com/f1c4e0e53

    Read the article

  • How to make write operation idempotent?

    - by Morgan Cheng
    I'm reading article about recently release Gizzard sharding framework by twitter(http://engineering.twitter.com/2010/04/introducing-gizzard-framework-for.html). It mentions that all write operations must be idempotent to make sure high reliability. According to wikipedia, "Idempotent operations are operations that can be applied multiple times without changing the result." But, IMHO, in Gazzard case, idempotent write operation should be operations that sequence doesn't matter. Now, my question is: How to make write operation idempotent? The only thing I can image is to have a version number attached to each write. For example, in blog system. Each blog must have a $blog_id and $content. In application level, we always write a blog content like this write($blog_id, $content, $version). The $version is determined to be unique in application level. So, if application first try to set one blog to "Hello world" and second want it to be "Goodbye", the write is idempotent. We have such two write operations: write($blog_id, "Hello world", 1); write($blog_id, "Goodbye", 2); These two operations are supposed to changed two different records in DB. So, no matter how many times and what sequence these two operations executed, the results are same. This is just my understanding. Please correct me if I'm wrong.

    Read the article

  • Architecture for database analytics

    - by David Cournapeau
    Hi, We have an architecture where we provide each customer Business Intelligence-like services for their website (internet merchant). Now, I need to analyze those data internally (for algorithmic improvement, performance tracking, etc...) and those are potentially quite heavy: we have up to millions of rows / customer / day, and I may want to know how many queries we had in the last month, weekly compared, etc... that is the order of billions entries if not more. The way it is currently done is quite standard: daily scripts which scan the databases, and generate big CSV files. I don't like this solutions for several reasons: as typical with those kinds of scripts, they fall into the write-once and never-touched-again category tracking things in "real-time" is necessary (we have separate toolset to query the last few hours ATM). this is slow and non-"agile" Although I have some experience in dealing with huge datasets for scientific usage, I am a complete beginner as far as traditional RDBM go. It seems that using column-oriented database for analytics could be a solution (the analytics don't need most of the data we have in the app database), but I would like to know what other options are available for this kind of issues.

    Read the article

  • Cache layer for MVC - Model or controller?

    - by Industrial
    Hi everyone, I am having some second thoughts about where to implement the caching part. Where is the most appropriate place to implement it, you think? Inside every model, or in the controller? Approach 1 (psuedo-code): // mycontroller.php MyController extends Controller_class { function index () { $data = $this->model->getData(); echo $data; } } // myModel.php MyModel extends Model_Class{ function getData() { $data = memcached->get('data'); if (!$data) { $query->SQL_QUERY("Do query!"); } return $data; } } Approach 2: // mycontroller.php MyController extends Controller_class { function index () { $dataArray = $this->memcached->getMulti('data','data2'); foreach ($dataArray as $key) { if (!$key) { $data = $this->model->getData(); $this->memcached->set($key, $data); } } echo $data; } } // myModel.php MyModel extends Model_Class{ function getData() { $query->SQL_QUERY("Do query!"); return $data; } } Thoughts: Approach 1: No multiget/multi-set. If a high number of keys would be returned, overhead would be caused. Easier to maintain, all database/cache handling is in each model Approach 2: Better performancewise - multiset/multiget is used More code required Harder to maintain Tell me what you think!

    Read the article

  • Can in-memory SQLite databases scale with concurrency?

    - by Kent Boogaart
    In order to prevent a SQLite in-memory database from being cleaned up, one must use the same connection to access the database. However, using the same connection causes SQLite to synchronize access to the database. Thus, if I have many threads performing reads against an in-memory database, it is slower on a multi-core machine than the exact same code running against a file-backed database. Is there any way to get the best of both worlds? That is, an in-memory database that permits multiple, concurrent calls to the database?

    Read the article

  • Searching Natural Language Sentence Structure

    - by Cerin
    What's the best way to store and search a database of natural language sentence structure trees? Using OpenNLP's English Treebank Parser, I can get fairly reliable sentence structure parsings for arbitrary sentences. What I'd like to do is create a tool that can extract all the doc strings from my source code, generate these trees for all sentences in the doc strings, store these trees and their associated function name in a database, and then allow a user to search the database using natural language queries. So, given the sentence "This uploads files to a remote machine." for the function upload_files(), I'd have the tree: (TOP (S (NP (DT This)) (VP (VBZ uploads) (NP (NNS files)) (PP (TO to) (NP (DT a) (JJ remote) (NN machine)))) (. .))) If someone entered the query "How can I upload files?", equating to the tree: (TOP (SBARQ (WHADVP (WRB How)) (SQ (MD can) (NP (PRP I)) (VP (VB upload) (NP (NNS files)))) (. ?))) how would I store and query these trees in a SQL database? I've written a simple proof-of-concept script that can perform this search using a mix of regular expressions and network graph parsing, but I'm not sure how I'd implement this in a scalable way. And yes, I realize my example would be trivial to retrieve using a simple keyword search. The idea I'm trying to test is how I might take advantage of grammatical structure, so I can weed-out entries with similar keywords, but a different sentence structure. For example, with the above query, I wouldn't want to retrieve the entry associated with the sentence "Checks a remote machine to find a user that uploads files." which has similar keywords, but is obviously describing a completely different behavior.

    Read the article

  • Using AMQP to collect events

    - by synapse
    Does AMQP has any advantages over an ad-hoc implementation for a simple stats gathering scenario? It works like this - clients send events (more than we care to put into persistent storage) to (several) web workers, the workers aggrregate them and write to a single database. I don't think I should consider using AMQP for this because I'll still need web workers to receive events from clients through HTTP and to publish them. Am I missing something?

    Read the article

  • architecture - centraled location for different modules (cms, webapplications, ...) - best practise

    - by NicoJuicy
    Let's just say that i want to create a cms + other online applications. I want them all to integrate into a central location, but they also have to be available seperately (not everyone want's more than the cms solution). Would i create a huge central application that contains all the database, which communicates through a webserice with the "standalone - integrated" modules? Or would i create them seperately and the only thing that the "central" application would do is syncing the information (eg. the cms and another solution can have the same tables (eg. clients or employees). Or do you have another idea? (i know i'm a little vague, but i can't "give" a lot of details because of work - contract). If someone has all the "packages" it should be possible for the central application to integrate all the modules at one place! Or if someone has more than 1 module, it should combine this on the website. What i thought is best, is that the central location contains only the users and their rights (eg. cms - all rights, ...), and the information get synced with every change. (module cms, adding a new client - store locally and send data to the central location, central location - send to modules = table clients updated everywhere) This way it is easy if someone only "bought" a module, they can sync it easily through the complete architecture. I hope i made myself clear!

    Read the article

  • Why does this Java code not utilize all CPU cores?

    - by ReneS
    The attached simple Java code should load all available cpu core when starting it with the right parameters. So for instance, you start it with java VMTest 8 int 0 and it will start 8 threads that do nothing else than looping and adding 2 to an integer. Something that runs in registers and not even allocates new memory. The problem we are facing now is, that we do not get a 24 core machine loaded (AMD 2 sockets with 12 cores each), when running this simple program (with 24 threads of course). Similar things happen with 2 programs each 12 threads or smaller machines. So our suspicion is that the JVM (Sun JDK 6u20 on Linux x64) does not scale well. Did anyone see similar things or has the ability to run it and report whether or not it runs well on his/her machine (= 8 cores only please)? Ideas? I tried that on Amazon EC2 with 8 cores too, but the virtual machine seems to run different from a real box, so the loading behaves totally strange. package com.test; import java.util.concurrent.ExecutorService; import java.util.concurrent.Executors; import java.util.concurrent.Future; import java.util.concurrent.TimeUnit; public class VMTest { public class IntTask implements Runnable { @Override public void run() { int i = 0; while (true) { i = i + 2; } } } public class StringTask implements Runnable { @Override public void run() { int i = 0; String s; while (true) { i++; s = "s" + Integer.valueOf(i); } } } public class ArrayTask implements Runnable { private final int size; public ArrayTask(int size) { this.size = size; } @Override public void run() { int i = 0; String[] s; while (true) { i++; s = new String[size]; } } } public void doIt(String[] args) throws InterruptedException { final String command = args[1].trim(); ExecutorService executor = Executors.newFixedThreadPool(Integer.valueOf(args[0])); for (int i = 0; i < Integer.valueOf(args[0]); i++) { Runnable runnable = null; if (command.equalsIgnoreCase("int")) { runnable = new IntTask(); } else if (command.equalsIgnoreCase("string")) { runnable = new StringTask(); } Future<?> submit = executor.submit(runnable); } executor.awaitTermination(1, TimeUnit.HOURS); } public static void main(String[] args) throws InterruptedException { if (args.length < 3) { System.err.println("Usage: VMTest threadCount taskDef size"); System.err.println("threadCount: Number 1..n"); System.err.println("taskDef: int string array"); System.err.println("size: size of memory allocation for array, "); System.exit(-1); } new VMTest().doIt(args); } }

    Read the article

< Previous Page | 4 5 6 7 8 9 10 11 12 13 14 15  | Next Page >