Search Results

Search found 46 results on 2 pages for 'shard'.

Page 1/2 | 1 2  | Next Page >

  • SQL Azure: Notes on Building a Shard Technology

    - by Herve Roggero
    In Chapter 10 of the book on SQL Azure (http://www.apress.com/book/view/9781430229612) I am co-authoring, I am digging deeper in what it takes to write a Shard. It's actually a pretty cool exercise, and I wanted to share some thoughts on how I am designing the technology. A Shard is a technology that spreads the load of database requests over multiple databases, as transparently as possible. The type of shard I am building is called a Vertical Partition Shard  (VPS). A VPS is a mechanism by which the data is stored in one or more databases behind the scenes, but your code has no idea at design time which data is in which database. It's like having a mini cloud for records instead of services. Imagine you have three SQL Azure databases that have the same schema (DB1, DB2 and DB3), you would like to issue a SELECT * FROM Users on all three databases, concatenate the results into a single resultset, and order by last name. Imagine you want to ensure your code doesn't need to change if you add a new database to the shard (DB4). Now imagine that you want to make sure all three databases are queried at the same time, in a multi-threaded manner so your code doesn't have to wait for three database calls sequentially. Then, imagine you would like to obtain a breadcrumb (in the form of a new, virtual column) that gives you a hint as to which database a record came from, so that you could update it if needed. Now imagine all that is done through the standard SqlClient library... and you have the Shard I am currently building. Here are some lessons learned and techniques I am using with this shard: Parellel Processing: Querying databases in parallel is not too hard using the Task Parallel Library; all you need is to lock your resources when needed Deleting/Updating Data: That's not too bad either as long as you have a breadcrumb. However it becomes more difficult if you need to update a single record and you don't know in which database it is. Inserting Data: I am using a round-robin approach in which each new insert request is directed to the next database in the shard. Not sure how to deal with Bulk Loads just yet... Shard Databases:  I use a static collection of SqlConnection objects which needs to be loaded once; from there on all the Shard commands use this collection Extension Methods: In order to make it look like the Shard commands are part of the SqlClient class I use extension methods. For example I added ExecuteShardQuery and ExecuteShardNonQuery methods to SqlClient. Exceptions: Capturing exceptions in a multi-threaded code is interesting... but I kept it simple for now. I am using the ConcurrentQueue to store my exceptions. Database GUID: Every database in the shard is given a GUID, which is calculated based on the connection string's values. DataTable. The Shard methods return a DataTable object which can be bound to objects.  I will be sharing the code soon as an open-source project in CodePlex. Please stay tuned on twitter to know when it will be available (@hroggero). Or check www.bluesyntax.net for updates on the shard. Thanks!

    Read the article

  • First Shard for SQL Azure and SQL Server

    - by Herve Roggero
    That's it!!!!! It's ready to go and be tested, abused and improved! It requires .NET 4.0 and uses some cool technologies, like caching (the new System.Runtime.Caching) and the Task Parallel Library (System.Threading.Tasks). With this library you can: Define a shard of 1, 2 or 100 SQL databases (a mix of SQL Server and SQL Azure) Read from the shard in parallel or sequentially, and cache resultsets Update, Delete a record from the shard Insert records quickly in the shard with a round-robin load Reset the cache You can download the source code and a sample application here: http://enzosqlshard.codeplex.com/  Note about the breadcrumbs: I had to add a connection GUID in order for the library to know which database a record came from. The GUID is currently calculated on the fly in the library using some of the parameters of the connection string. The GUID is also dynamically added to the result set so the client can pass it back to the library. I am curious to get your feedback on this approach. ** Correction from my previous post: this is a library for a Horizontal Partition Shard (HPS): tables are split across databases horizontally. So in essence, the tables need to have the same schema across the databases.

    Read the article

  • Elasticsearch Shard stuck in INITIALIZING state

    - by Peeyush
    I stopped one of my elastic search node & as a normal behavior elastic search started relocating my shard on that node to another node. But even after 15 hours it is still stuck in INITIALIZING state. Also that shard is fluctuating between two nodes,for some time it stays on one node then automatically shift to another node & keep on doing that after every few hours. Main issue is it is still in INITIALIZING state after so many hours. I am using version 1.2.1. This shard which is stuck, it is a replica. I am getting this error in logs: [ERROR][index.engine.internal ] [mynode] [myindex][3] failed to acquire searcher, source delete_by_query java.lang.NullPointerException [WARN ][index.engine.internal ] [mynode] [myindex][3] failed engine [deleteByQuery/shard failed on replica] [WARN ][cluster.action.shard ] [mynode] [myindex][3] sending failed shard for [myindex][3], node[Sp3URfNVQlq2i4i3EjCakw], [R], s[INITIALIZING], indexUUID [kTikCHshQMKEQ_jAuWWWnw], reason [engine failure, message [deleteByQuery/shard failed on replica][EngineException[[myindex][3] failed to acquire searcher, source delete_by_query]; nested: NullPointerException; ]]

    Read the article

  • SQL SERVER – Shard No More – An Innovative Look at Distributed Peer-to-peer SQL Database

    - by pinaldave
    There is no doubt that SQL databases play an important role in modern applications. In an ideal world, a single database can handle hundreds of incoming connections from multiple clients and scale to accommodate the related transactions. However the world is not ideal and databases are often a cause of major headaches when applications need to scale to accommodate more connections, transactions, or both. In order to overcome scaling issues, application developers often resort to administrative acrobatics, also known as database sharding. Sharding helps to improve application performance and throughput by splitting the database into two or more shards. Unfortunately, this practice also requires application developers to code transactional consistency into their applications. Getting transactional consistency across multiple SQL database shards can prove to be very difficult. Sharding requires developers to think about things like rollbacks, constraints, and referential integrity across tables within their applications when these types of concerns are best handled by the database. It also makes other common operations such as joins, searches, and memory management very difficult. In short, the very solution implemented to overcome throughput issues becomes a bottleneck in and of itself. What if database sharding was no longer required to scale your application? Let me explain. For the past several months I have been following and writing about NuoDB, a hot new SQL database technology out of Cambridge, MA. NuoDB is officially out of beta and they have recently released their first release candidate so I decided to dig into the database in a little more detail. Their architecture is very interesting and exciting because it completely eliminates the need to shard a database to achieve higher throughput. Each NuoDB database consists of at least three or more processes that enable a single database to run across multiple hosts. These processes include a Broker, a Transaction Engine and a Storage Manager.  Brokers are responsible for connecting client applications to Transaction Engines and maintain a global view of the network to keep track of the multiple Transaction Engines available at any time. Transaction Engines are in-memory processes that client applications connect to for processing SQL transactions. Storage Managers are responsible for persisting data to disk and serving up records to the Transaction Managers if they don’t exist in memory. The secret to NuoDB’s approach to solving the sharding problem is that it is a truly distributed, peer-to-peer, SQL database. Each of its processes can be deployed across multiple hosts. When client applications need to connect to a Transaction Engine, the Broker will automatically route the request to the most available process. Since multiple Transaction Engines and Storage Managers running across multiple host machines represent a single logical database, you never have to resort to sharding to get the throughput your application requires. NuoDB is a new pioneer in the SQL database world. They are making database scalability simple by eliminating the need for acrobatics such as sharding, and they are also making general administration of the database simpler as well.  Their distributed database appears to you as a user like a single SQL Server database.  With their RC1 release they have also provided a web based administrative console that they call NuoConsole. This tool makes it extremely easy to deploy and manage NuoDB processes across one or multiple hosts with the click of a mouse button. See for yourself by downloading NuoDB here. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: CodeProject, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, SQLServer, T SQL, Technology Tagged: NuoDB

    Read the article

  • Rename Mongo Shard

    - by HeySteve
    Can I, and if I can, how can I rename a shard in Mongo? Like if I wanted to change the instances of rs0 to rep0 below: mongos> sh.status() --- Sharding Status --- sharding version: { "_id" : 1, "version" : 4, "minCompatibleVersion" : 4, "currentVersion" : 5, "clusterId" : ObjectId("111111111111") } shards: { "_id" : "rs0", "host" : "rs0/mongo0a:27017,mongo0b:27017" } ... I have thought about removing and re-adding the shard, but I'm not sure how I'd do that without having to drain the shard and drop dbs. Currently 0 of the collections have sharding enabled, I just have a few standalones added as shards. Thanks

    Read the article

  • I have to shard a mysql database. I want to start with 12 shards on 2 machines. What is the best w

    - by Tim
    All tables are InnoDb. I would rather not use mysqldump, because the shard sizes will be about 200 GB (about 700 million rows), and that will take too long. I was hoping to just stop mysql for an hour, copy the data files to a new machine, and start back up. But you can't do this with InnoDb, as some data is in the shared tablespace. Even if I have the innodb_file_per_table option set. This is not a website, but a custom application, used by tens of thousands right now, so uptime and performance are important. I suppose I could add logic into my server application to allow for gradual rebalancing / moving of a shard. Does anyone have a better idea?

    Read the article

  • English to French translation of computing terminology

    - by Rich
    I work in France as a Java programmer, mainly in French, but am a native English speaker. My level of French is pretty good (French wife!), but one thing I have problems with is working out whether to use English terminology or a French equivalent. Examples: lock (as in a synchronisation lock) - do I use the verb "locker" or do I use verrouiller? shard (databases) - "un shard" or "un tesson" (which means a shard of glass) ...and so-on... So, what do people recommend? Can anyone point out some good websites for translating this kind of terminology? The usual online translation tools are a bit too everyday English/French, not the slightly more specialised version that I find myself needing.

    Read the article

  • How would you gather client's data on Google App Engine without using Datastore/Backend Instances too much?

    - by ruslan
    I'm relatively new to StackExchange and not sure if it's appropriate place to ask design question. Site gives me a hint "The question you're asking appears subjective and is likely to be closed". Please let me know. Anyway.. One of the projects I'm working on is online survey engine. It's my first big commercial project on Google App Engine. I need your advice on how to collect stats and efficiently record them in DataStore without bankrupting me. Initial requirements are: After user finishes survey client sends list of pairs [ID (int) + PercentHit (double)]. This list shows how close answers of this user match predefined answers of reference answerers (which identified by IDs). I call them "target IDs". Creator of the survey wants to see aggregated % for given IDs for last hour, particular timeframe or from the beginning of the survey. Some surveys may have thousands of target/reference answerers. So I created entity public class HitsStatsDO implements Serializable { @Id transient private Long id; transient private Long version = (long) 0; transient private Long startDate; @Parent transient private Key parent; // fake parent which contains target id @Transient int targetId; private double avgPercent; private long hitCount; } But writing HitsStatsDO for each target from each user would give a lot of data. For instance I had a survey with 3000 targets which was answered by ~4 million people within one week with 300K people taking survey in first day. Even if we assume they were answering it evenly for 24 hours it would give us ~1040 writes/second. Obviously it hits concurrent writes limit of Datastore. I decided I'll collect data for one hour and save that, that's why there are avgPercent and hitCount in HitsStatsDO. GAE instances are stateless so I had to use dynamic backend instance. There I have something like this: // Contains stats for one hour private class Shard { ReadWriteLock lock = new ReentrantReadWriteLock(); Map<Integer, HitsStatsDO> map = new HashMap<Integer, HitsStatsDO>(); // Key is target ID public void saveToDatastore(); public void updateStats(Long startDate, Map<Integer, Double> hits); } and map with shard for current hour and previous hour (which doesn't stay here for long) private HashMap<Long, Shard> shards = new HashMap<Long, Shard>(); // Key is HitsStatsDO.startDate So once per hour I dump Shard for previous hour to Datastore. Plus I have class LifetimeStats which keeps Map<Integer, HitsStatsDO> in memcached where map-key is target ID. Also in my backend shutdown hook method I dump stats for unfinished hour to Datastore. There is only one major issue here - I have only ONE backend instance :) It raises following questions on which I'd like to hear your opinion: Can I do this without using backend instance ? What if one instance is not enough ? How can I split data between multiple dynamic backend instances? It hard because I don't know how many I have because Google creates new one as load increases. I know I can launch exact number of resident backend instances. But how many ? 2, 5, 10 ? What if I have no load at all for a week. Constantly running 10 backend instances is too expensive. What do I do with data from clients while backend instance is dead/restarting? Thank you very much in advance for your thoughts.

    Read the article

  • How can I gather client's data on Google App Engine without using Datastore/Backend Instances too much?

    - by ruslan
    One of the projects I'm working on is online survey engine. It's my first big commercial project on Google App Engine. I need your advice on how to collect stats and efficiently record them in DataStore without bankrupting me. Initial requirements are: After user finishes survey client sends list of pairs [ID (int) + PercentHit (double)]. This list shows how close answers of this user match predefined answers of reference answerers (which identified by IDs). I call them "target IDs". Creator of the survey wants to see aggregated % for given IDs for last hour, particular timeframe or from the beginning of the survey. Some surveys may have thousands of target/reference answerers. So I created entity public class HitsStatsDO implements Serializable { @Id transient private Long id; transient private Long version = (long) 0; transient private Long startDate; @Parent transient private Key parent; // fake parent which contains target id @Transient int targetId; private double avgPercent; private long hitCount; } But writing HitsStatsDO for each target from each user would give a lot of data. For instance I had a survey with 3000 targets which was answered by ~4 million people within one week with 300K people taking survey in first day. Even if we assume they were answering it evenly for 24 hours it would give us ~1040 writes/second. Obviously it hits concurrent writes limit of Datastore. I decided I'll collect data for one hour and save that, that's why there are avgPercent and hitCount in HitsStatsDO. GAE instances are stateless so I had to use dynamic backend instance. There I have something like this: // Contains stats for one hour private class Shard { ReadWriteLock lock = new ReentrantReadWriteLock(); Map<Integer, HitsStatsDO> map = new HashMap<Integer, HitsStatsDO>(); // Key is target ID public void saveToDatastore(); public void updateStats(Long startDate, Map<Integer, Double> hits); } and map with shard for current hour and previous hour (which doesn't stay here for long) private HashMap<Long, Shard> shards = new HashMap<Long, Shard>(); // Key is HitsStatsDO.startDate So once per hour I dump Shard for previous hour to Datastore. Plus I have class LifetimeStats which keeps Map<Integer, HitsStatsDO> in memcached where map-key is target ID. Also in my backend shutdown hook method I dump stats for unfinished hour to Datastore. There is only one major issue here - I have only ONE backend instance :) It raises following questions on which I'd like to hear your opinion: Can I do this without using backend instance ? What if one instance is not enough ? How can I split data between multiple dynamic backend instances? It hard because I don't know how many I have because Google creates new one as load increases. I know I can launch exact number of resident backend instances. But how many ? 2, 5, 10 ? What if I have no load at all for a week. Constantly running 10 backend instances is too expensive. What do I do with data from clients while backend instance is dead/restarting?

    Read the article

  • Implementing database redundancy with sharded tables

    - by ensnare
    We're looking to implement load balancing by horizontally sharding our tables across a cluster of servers. What are some options to implement live redundancy should a server fail? Would it be effective to do (2) INSERTS instead of one ... one to the target shard, and another to a secondary shard which could be accessed should the primary shard not respond? Or is there a better way? Thanks.

    Read the article

  • Finding out which tile a mouse click landed in

    - by Shard
    I am working on an icometric grid based game and im having an issue trying to link a mouse click from the user to a tile. I have been able to split the problem into 2 parts which is first finding a rectangle that sourounds a tile, which I have been able to do but the second part of figuring out from the rectangle which tile the click landed in has got me stumped. Here is an example of a rectangle with tiles on the inside: The rectangle is 70px long and 30px high so if i use an input of say 30x(top)/20y(left) how would I go about determining which tile this fell into?

    Read the article

  • PASS Conference 2011 Topic: Multitenant Design and Sharding with SQL Azure

    - by Herve Roggero
    I am really happy to announce that I have been accepted as a speaker at the 2011 PASS Conference in Seattle. The topic? It will be about SQL Azure scalability using shards, and the Data Federation feature of SQL Azure. I will also talk extensively about the community open-source sharding library Enzo SQL Shard (enzosqlshard.codeplex.com) and show how to make the most out of it. In general, the presentation will provide details about how to properly design an application for sharding, how to make it work for SQL Server, SQL Azure, and how to leverage the upcoming Data Federation technology that Microsoft is planning. The primary objective is to turn sharding an implementation concern, not a development concern. Using a library like Enzo SQL Shard will help you achieve this objective. If you come to PASS Summit this year, come see me and mention you saw this blog!

    Read the article

  • Gravity Forms not loading under https, jQuery is not defined

    - by cmykrgbb
    I am using Gravity Forms on my Wordpress site, and so far so good. The problem is I have made the page secure (https/SSL), and this is making the form not to work. It looks like the issue is how the site is trying to load jQuery. There are 23 JS errors on the page, which seem to be due to a failed jQuery load "Uncaught ReferenceError: jQuery is not defined". If I go to the page where the source is trying to pull the jQuery file, you'll see the error:https://code.jquery.com/jquery-1.7.1.min.js?ver=3.4.2 Screenshot of the error: https://www.evernote.com/shard/s212/sh/326f95d6-a498-4c33-b413-7e968225cc79/c2e380ed0fa02a913f712005c8301185 And this screenshot is the reference in the page source: https://www.evernote.com/shard/s212/sh/ae547962-c017-4321-90a2-c51433e59262/124ae116f2b803771f4eb36c90b5a524 So I have been told I'd want to look into that - that's where the ultimate issue is, but I don't really know what to do next. Is it failing because of Gravity Forms, the HTTPS plugin from Wordpress, my SSL certificate...? Thanks in advance!

    Read the article

  • Can't copy and paste into outlook 2003 from any other aplication

    - by Shard
    I am having an issue with outlook 2003 where i cannot copy and paste anything from any other application into a email that i am working on. I have tried copying from excel, word, notepad and from a web browser. I can confirm that i can copy and paste parts from the same email. Has anyone come across this issue before and/or know of a solution?

    Read the article

  • Good tools that fit on a thumb drive

    - by Shard
    I have been on the lookout lately for some good tools to fill up my flash drive and I thought I would ask the SF community for recomendations on good tools that will fit onto a thumb drive. Some i use are Driver Packs, CCleaner and the portable apps suite

    Read the article

  • How do you deal with denormalization / secondary indexes in database sharding?

    - by Continuation
    Say I have a "message" table with 2 secondary indexes: "recipient_id" "sender_id" I want to shard the "message" table by "recipient_id". That way to retrieve all messages sent to a certain recipient I only need to query one shard. But at the same time, I want to be able to make a query that ask for all messages sent by a certain sender. Now I don't want to send that query to every single shard of the "message" table. One way to do this is to duplicate the data and have a "message_by_sender" table sharded by "sender_id". The problem with that approach is that every time a message has been sent, I need to insert the message into both "message" and "message_by_sender" tables. But what if after inserting into "message" the insertion into "message_by_sender" fail? In that case the message exists in "message" but not in "message_by_sender". How do I make sure that if a message exists in "message" then it also exists in "message_by_sender" without resorting to 2 phase commit? This must be a very common issue for anyone who shards their databases. How do you deal woth it?

    Read the article

  • Regex to validate JSON

    - by Shard
    I am looking for a Regex that allows me to validate json. I am very new to Regex's and i know enough that parsing with Regex is bad but can it be used to validate?

    Read the article

  • Jquery and frames

    - by Shard
    I am currently working on a web application that has been created using a magnitude of frames that stretch down up to 5 times, The issue is that i need to preform some jquery magic throughout the website. What would be the best way to go about this (other than rewriting it which i have considered)? EDIT: The Frame Structure is something along the lines of this: Index.html menu.html banner.html list.html footer.html /lib/index.html header.html body.html footer.html The magic i am referencing is a few hot key shortcuts, find and replace that kind of stuff

    Read the article

  • Using SimpleXMLTreeBuilder in elementtree

    - by Shard
    I have been developing an application with django and elementtree and while deploying it to the production server i have found out it is running python 2.4. I have been able to bundle elementtree but now i am getting the error: "No module named expat; use SimpleXMLTreeBuilder instead" Unfortunately i cannot upgrade python so im stuck with what i got. How do i use SimpleXMLTreeBuilder as the parser and/or will i need to rewrite code?

    Read the article

  • Sharding / indexing strategy for multi-faceted search

    - by Graham
    I'm currently thinking about our database structure and how we modify it for scale. Specifically, we're thinking about using ElasticSearch to provide our search functionality. One common pattern with ElasticSearch seems to be the 'user-routing' pattern; that is, using routing to ensure that any one user's data resides on the same shard. This is great for client-specific search e.g. Gmail. Our application has a constraint such that any user will have a maximum of a few thousand documents, so this pattern seems like a good candidate. However, our search needs to work across all users, as well as targeting a specific user (so I might search my content, Alice's content, or all content). Similarly, we need to provide full-text search across any timeframe; recent months to several years ago. I'm thinking of combining the 'user-routing' and 'index-per-time-interval' patterns: I create an index for each month By default, searches are aliased against the most recent X months If no results are found, we can search against previous X months As we grow, we can reduce the interval X Each document is routed by the user ID So, this should let us do the following: search by user. This will search all indeces across 1 shard search by time. This will search ~2 indeces (by default) across all shards Is this a reasonable approach, considering we may scale to multi-million+ documents? Or should I be denormalizing the data somehow, so that user searches are performed on a totally seperate index from date searches? Thanks for any pros-cons of the above scenario.

    Read the article

  • Custom Lucene Sharding with Hibernate Search

    - by Timo Westkämper
    Has anyone experience with custom Lucene sharding / paritioning using Hibernate Search? The documentation of Hibernate Search says the following about Lucene Sharding : In some cases, it is necessary to split (shard) the indexing data of a given entity type into several Lucene indexes. This solution is not recommended unless there is a pressing need because by default, searches will be slower as all shards have to be opened for a single search. In other words don't do it until you have problems :) Has anyone implemented sharding in such a way for Hibernate Search that also queries can be target to one of the shards? In our case we have Lucene queries that should target only one shard per query.

    Read the article

  • Webscale is all about sharding and its coming to SQL Azure

    - by simonsabin
    There are many that joke about developers always talking about webscale and needing to shard to be able to scale. In reality many systems, if not most, don’t need to be able to scale to numerous nodes because todays processing is so powerful. However in the cloud world where you don’t have 1 big box you have many little ones (instances) you need some way of sharding/federating/distributing data and load. I’ve mentioned before of a PDC presentation on whats coming in SQL Azure, well they’ve put some...(read more)

    Read the article

  • How to make ActiveRecord work with legacy partitioned/sharded databases/tables?

    - by Utensil
    thanks for your time first...after all the searching on google, github and here, and got more confused about the big words(partition/shard/fedorate),I figure that I have to describe the specific problem I met and ask around. My company's databases deals with massive users and orders, so we split databases and tables in various ways, some are described below: way database and table name shard by (maybe it's should be called partitioned by?) YZ.X db_YZ.tb_X order serial number last three digits YYYYMMDD. db_YYYYMMDD.tb date YYYYMM.DD db_YYYYMM.tb_ DD date too The basic concept is that databases and tables are seperated acording to a field(not nessissarily the primary key), and there are too many databases and too many tables, so that writing or magically generate one database.yml config for each database and one model for each table isn't possible or at least not the best solution. I looked into drnic's magic solutions, and datafabric, and even the source code of active record, maybe I could use ERB to generate database.yml and do database connection in around filter, and maybe I could use named_scope to dynamically decide the table name for find, but update/create opertions are bounded to "self.class.quoted_table_name" so that I couldn't easily get my problem solved. And even I could generate one model for each table, because its amount is up to 30 most. But this is just not DRY! What I need is a clean solution like the following DSL: class Order < ActiveRecord::Base shard_by :order_serialno do |key| [get_db_config_by(key), #because some or all of the databaes might share the same machine in a regular way or can be configed by a hash of regex, and it can also be a const get_db_name_by(key), get_tb_name_by(key), ] end end Can anybody enlight me? Any help would be greatly appreciated~~~~

    Read the article

  • Designing a web application to scale

    - by Fahim Akhter
    Hi, While designing a web application facebook application to be precise. Which can spike and increase rapidly because of it vitality and is right intensive. What point should one keep in mind while designing the DB. For example what things should I leave room for if I need to shard or have a Master/Slave combination later (with memcache) Considering I use Relational Database with mySQL

    Read the article

1 2  | Next Page >