Search Results

Search found 549 results on 22 pages for 'nosql'.

Page 8/22 | < Previous Page | 4 5 6 7 8 9 10 11 12 13 14 15  | Next Page >

  • Best data store for billions of rows

    - by Jody Powlette
    I need to be able to store small bits of data (approximately 50-75 bytes) for billions of records (~3 billion/month for a year). The only requirements are fast inserts, fast lookups for all records with the same GUID, and the ability to access the data store from .NET. I'm a SQL Server guy and I think SQL Server can do this, but with all the talk about BigTable, CouchDB, and other NoSQL solutions, it's sounding more and more like an alternative to a traditional RDBMS may be best, due to optimizations for distributed queries and scaling. I tried Cassandra, and the .NET libraries don't currently compile or are all subject to change (along with Cassandra itself). I've looked into many of the NoSQL data stores available, but can't find one that meets my needs as a robust, production-ready platform. If you had to store 36 billion small, flat records so that they're accessible from .NET, what would you choose and why?
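
    One way to model this - a minimal sketch only, using the DataStax Python driver against a hypothetical keyspace, not something the poster tried - is a wide partition per GUID, so "all records with the same GUID" becomes a single-partition read:

        from cassandra.cluster import Cluster
        from uuid import uuid4

        # Assumes a running Cassandra node and a pre-created 'demo' keyspace.
        session = Cluster(['127.0.0.1']).connect('demo')

        # One partition per GUID; records cluster by a time-based id, so
        # inserts are appends and a GUID lookup is one partition scan.
        session.execute("""
            CREATE TABLE IF NOT EXISTS records (
                guid    uuid,
                id      timeuuid,
                payload blob,
                PRIMARY KEY (guid, id)
            )""")

        guid = uuid4()
        session.execute(
            "INSERT INTO records (guid, id, payload) VALUES (%s, now(), %s)",
            (guid, b'50-75 bytes of data'))
        rows = session.execute("SELECT payload FROM records WHERE guid = %s", (guid,))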

    Read the article

  • What is the technological basis for Azure DocumentDB?

    - by user1703840
    Microsoft announced the availability of Azure DocumentDB as follows: "A fully-managed, highly-scalable, NoSQL document database service."
    - Rich query over a schema-free JSON data model
    - Transactional execution of JavaScript logic
    - Scalable storage and throughput
    - Tunable consistency
    - Rapid development with familiar technologies
    - Blazingly fast and write-optimized database service
    I really like the "transactional execution of JavaScript logic". It sounds like an approach similar to that of PostgreSQL's NoSQL features. Does anyone know what the technological basis for the Azure DocumentDB service is? SQL Server?
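
    As a rough illustration of what "transactional execution of JavaScript logic" means in practice - a sketch using the modern azure-cosmos Python SDK, with placeholder account, database and container names - a stored procedure's writes all commit or all roll back together:

        from azure.cosmos import CosmosClient

        # Server-side JavaScript: a thrown error aborts *both* writes.
        SPROC = {"id": "twoWrites", "body": """
        function twoWrites() {
            var coll = getContext().getCollection();
            coll.createDocument(coll.getSelfLink(), {id: 'a', pk: 'demo'}, function (err) {
                if (err) throw err;
                coll.createDocument(coll.getSelfLink(), {id: 'b', pk: 'demo'}, function (err2) {
                    if (err2) throw err2;
                });
            });
        }"""}

        container = (CosmosClient("https://<account>.documents.azure.com", "<key>")
                     .get_database_client("demo")
                     .get_container_client("items"))
        container.scripts.create_stored_procedure(SPROC)
        container.scripts.execute_stored_procedure("twoWrites", partition_key="demo")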

    Read the article

  • Pintura OR Persevere 2.0 - Production Ready?

    - by Scott
    I'm very interested in this framework, coupled with a NoSQL backend like MongoDB. Basically, my blue-sky vision is this: ExtJS/Pintura/MongoDB. I would probably plug in Rhino as the JS engine. Is there anybody here using Pintura in a production environment? What are the pitfalls? What is your general experience? Thanks.

    Read the article

  • Denormalization of large text?

    - by tesmar
    If I have large articles that need to be stored in a database, each associated with many tables, would a NoSQL option help? Should I copy the 1,000-character articles over multiple "buckets", duplicating them each time they are related to a bucket, or should I use a normalized MySQL DB with lots of Memcached?
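
    For scale, here is what the duplicated-bucket option looks like - a sketch with pymongo and invented collection names, not a recommendation:

        from pymongo import MongoClient

        db = MongoClient()["demo"]
        body = "..."  # the ~1,000-character article text

        # Denormalized: every bucket carries its own copy of the article body,
        # so each read is one query but an edit must touch every copy.
        for bucket_id in (10, 11, 12):
            db.buckets.update_one(
                {"_id": bucket_id},
                {"$push": {"articles": {"article_id": 1, "body": body}}},
                upsert=True)
        db.buckets.update_many(
            {"articles.article_id": 1},
            {"$set": {"articles.$.body": "edited body"}})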

    Read the article

  • Column-oriented DBMS and JOIN operations

    - by André
    From some of the research I've done on NoSQL, column-oriented databases (like HBase or Cassandra) seem to avoid the problem of costly JOIN operations, but I don't get how this approach achieves that. Can anyone explain it to me and/or link me to interesting documentation on this area? Thanks
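
    The short answer is that these stores sidestep JOINs by denormalizing at write time: everything a JOIN would assemble is written into one wide row and read back with a single key lookup. A sketch with HBase via the happybase library (table and column names invented; the 'order' and 'customer' column families are assumed to already exist):

        import happybase

        conn = happybase.Connection("localhost")
        orders = conn.table("orders")

        # Customer fields are copied into the order row at write time,
        # instead of being JOINed in from a customers table at read time.
        orders.put(b"order-1001", {
            b"order:total":    b"99.95",
            b"customer:id":    b"42",
            b"customer:name":  b"Alice",
            b"customer:email": b"alice@example.com",
        })

        row = orders.row(b"order-1001")  # one lookup, pre-joined result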

    Read the article

  • Apache Cassandra 0.7 released, the flagship NoSQL DBMS of cloud computing, used by Facebook and Twitter

    Apache Cassandra 0.7 released. The flagship NoSQL DBMS of cloud computing, used by Facebook and Twitter. Update of 12/01/2011 by Idelways. The Apache Foundation has just announced the availability of version 0.7 of Cassandra, the NoSQL database management system designed to handle massive amounts of data. Among this version's new features is the arrival of secondary indexes, an efficient way to execute queries on the local storage nodes rather than on the client side. Another new feature is support for "large rows", which allow up to 2 billion...
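
    For a sense of what the new secondary indexes look like from code, a sketch with pycassa (the Python client of the 0.7 era), assuming a 'Demo' keyspace and a 'Users' column family with an index declared on the 'state' column:

        import pycassa
        from pycassa.index import create_index_expression, create_index_clause

        pool = pycassa.ConnectionPool("Demo")
        users = pycassa.ColumnFamily(pool, "Users")
        users.insert("jsmith", {"name": "John", "state": "UT"})

        # The filtering runs on the storage nodes, not in the client.
        clause = create_index_clause([create_index_expression("state", "UT")])
        for key, columns in users.get_indexed_slices(clause):
            print(key, columns)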

    Read the article

  • How to achieve zero downtime

    - by Hiral Lakdavala
    For an application, we want to achieve zero database and application downtime using an active-active configuration. Our DB is Oracle. Following are my questions: How can we achieve an active-active configuration in Oracle? Would introducing Cassandra/HBase (or any other NoSQL DB) help with zero downtime, or is that only for fast retrieval of data in a large DB? Any other options? Thanks and Regards, Hiral

    Read the article

  • Database which only holds indexes and last X records in memory?

    - by Xeoncross
    I'm looking for a data store that is very memory-efficient while still allowing many object changes per second, and disregarding ACID compliance for the last X records. I need this database for a server without much memory, and I can make a key-value store, document store, or SQL database work. The idea is that indexes/keys are the only thing I need in memory, and all the actual values/objects/rows can be saved on disk due to the low read rate (I just want index/key lookup to be fast). I also don't want records constantly being flushed to disk, so I would like the last X number of records to be held in memory so that 100 or so of them can all be written at once. I don't care if I lose the last 10 seconds' worth of objects/values. I do care if the database as a whole is in danger of becoming corrupt. Is there a data store like this?
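
    No off-the-shelf product is named here, but the requested behaviour is easy to pin down in code. A toy sketch (pure Python, invented class, bytes-valued records): keys live in an in-memory dict, values go to an append-only file, and writes are buffered and flushed roughly 100 at a time:

        import os

        class TailBufferedStore:
            def __init__(self, path, batch_size=100):
                self.index = {}       # key -> byte offset; always in RAM
                self.buffer = {}      # the last N writes, not yet on disk
                self.batch_size = batch_size
                self.f = open(path, "a+b")

            def put(self, key, value):
                self.buffer[key] = value
                if len(self.buffer) >= self.batch_size:
                    self.flush()      # one sequential write per ~100 records

            def get(self, key):
                if key in self.buffer:        # recent write, still in memory
                    return self.buffer[key]
                self.f.seek(self.index[key])  # only the index lookup needs RAM
                length = int(self.f.readline())
                return self.f.read(length)

            def flush(self):
                self.f.seek(0, os.SEEK_END)
                for key, value in self.buffer.items():
                    self.index[key] = self.f.tell()
                    self.f.write(str(len(value)).encode() + b"\n" + value)
                self.f.flush()        # a crash loses at most one batch
                self.buffer.clear()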

    Read the article

  • Master-slave datastore replication, automatic failover, and Wackamole

    - by z8000
    I have 2 dedicated servers provisioned for my next project's datastores. The datastores are configured for master-slave replication. There's no inherent automatic failover, but I of course want this. That is, I'd love for access to the master datastore to always just work, without having to configure a client library to detect when a master is down and fail over to the slave. I've seen Wackamole, which is based on the Spread Toolkit. You provide Wackamole with a set of IPs and a bunch of nodes, and regardless of the up/down state of any of the nodes, those IPs will stay available/up. Wackamole detects when a node goes down and ARPs the IP(s) that were up on the now-down node. It's pretty neat, actually. So, my thought was to use Wackamole to keep the 2 virtual private IPs available/up. Clients would then just always use the same private IP to access the master datastore, and the same but distinct IP for the slave datastore, even if those IPs were hosted on the same node. My datastore servers are accessed over a private network. I am unsure if this messes with Wackamole, though. Is this lunacy? How do you generally handle automatic failover of private services like a datastore? FWIW, it shouldn't matter, but the datastore is Redis. I don't want to hear "use MySQL" please :) Thanks.
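
    For contrast, this is the client-library fallback the poster wants to avoid, sketched with redis-py and made-up addresses; with Wackamole-style IP takeover, all of it collapses to connecting to one floating IP:

        import redis

        MASTER, SLAVE = ("10.0.0.1", 6379), ("10.0.0.2", 6379)

        def get_connection():
            for host, port in (MASTER, SLAVE):
                r = redis.Redis(host=host, port=port, socket_timeout=0.5)
                try:
                    r.ping()
                    return r
                except redis.ConnectionError:
                    continue          # master unreachable: try the slave
            raise RuntimeError("no datastore reachable")

        # With the virtual IP approach, clients would instead just do:
        # r = redis.Redis(host="10.0.0.100")  # the Wackamole-managed IP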

    Read the article

  • MongoDB slave replication lag

    - by Leonid Bugaev
    We're using a standard Mongo setup: 2 replicas + 1 arbiter. Both replica servers use the same AWS m1.medium with RAID10 EBS. We are experiencing constantly growing replication lag on the secondary replica. I tried a full resync (you can see it on the graph), but it helped for only a few hours. Our Mongo usage is really low now, and frankly I can't understand why this happens. iostat 1 for the secondary:

    avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
              80.39   0.00     2.94     0.00   16.67   0.00

    Device:   tps    kB_read/s  kB_wrtn/s  kB_read  kB_wrtn
    xvdap1     0.00       0.00       0.00        0        0
    xvdb       0.00       0.00       0.00        0        0
    xvdfp4    12.75       0.00     189.22        0      193
    xvdfp3    12.75       0.00     189.22        0      193
    xvdfp2     7.84       0.00      40.20        0       41
    xvdfp1     7.84       0.00      40.20        0       41
    md127     19.61       0.00     219.61        0      224

    mongostat for the secondary (why 100% locked? I guess that's the problem):

    insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
    *10 *0 *16 *0 0 2|4 0 30.9g 62.4g 1.65g 0 107 0 0|0 0|0 198b 1k 16 replset-01 SEC 06:55:37
    *4 *0 *8 *0 0 12|0 0 30.9g 62.4g 1.65g 0 91.7 0 0|0 0|0 837b 5k 16 replset-01 SEC 06:55:38
    *4 *0 *7 *0 0 3|0 0 30.9g 62.4g 1.64g 0 110 0 0|0 0|0 342b 1k 16 replset-01 SEC 06:55:39
    *4 *0 *8 *0 0 1|0 0 30.9g 62.4g 1.64g 0 82.9 0 0|0 0|0 62b 1k 16 replset-01 SEC 06:55:40
    *3 *0 *7 *0 0 5|0 0 30.9g 62.4g 1.6g 0 75.2 0 0|0 0|0 466b 2k 16 replset-01 SEC 06:55:41
    *4 *0 *7 *0 0 1|0 0 30.9g 62.4g 1.64g 0 138 0 0|0 0|1 62b 1k 16 replset-01 SEC 06:55:42
    *7 *0 *15 *0 0 3|0 0 30.9g 62.4g 1.64g 0 95.4 0 0|0 0|0 342b 1k 16 replset-01 SEC 06:55:43
    *7 *0 *14 *0 0 1|0 0 30.9g 62.4g 1.64g 0 98 0 0|0 0|0 62b 1k 16 replset-01 SEC 06:55:44
    *8 *0 *17 *0 0 3|0 0 30.9g 62.4g 1.64g 0 96.3 0 0|0 0|0 342b 1k 16 replset-01 SEC 06:55:45
    *7 *0 *14 *0 0 3|0 0 30.9g 62.4g 1.64g 0 96.1 0 0|0 0|0 186b 2k 16 replset-01 SEC 06:55:46

    mongostat for the primary:

    insert query update delete getmore command flushes mapped vsize res faults locked % idx miss % qr|qw ar|aw netIn netOut conn set repl time
    12 30 20 0 0 3 0 30.9g 62.6g 641m 0 0.9 0 0|0 0|0 212k 619k 48 replset-01 M 06:56:41
    5 17 10 0 0 2 0 30.9g 62.6g 641m 0 0.5 0 0|0 0|0 159k 429k 48 replset-01 M 06:56:42
    9 22 16 0 0 3 0 30.9g 62.6g 642m 0 0.7 0 0|0 0|0 158k 276k 48 replset-01 M 06:56:43
    6 18 12 0 0 2 0 30.9g 62.6g 640m 0 0.7 0 0|0 0|0 93k 231k 48 replset-01 M 06:56:44
    6 12 8 0 0 3 0 30.9g 62.6g 640m 0 0.3 0 0|0 0|0 80k 125k 48 replset-01 M 06:56:45
    8 21 14 0 0 9 0 30.9g 62.6g 641m 0 0.6 0 0|0 0|0 118k 419k 48 replset-01 M 06:56:46
    10 34 20 0 0 6 0 30.9g 62.6g 640m 0 1.3 0 0|0 0|0 164k 527k 48 replset-01 M 06:56:47
    6 21 13 0 0 2 0 30.9g 62.6g 641m 0 0.7 0 0|0 0|0 111k 477k 48 replset-01 M 06:56:48
    8 21 15 0 0 2 0 30.9g 62.6g 641m 0 0.7 0 0|0 0|0 204k 336k 48 replset-01 M 06:56:49
    4 12 8 0 0 8 0 30.9g 62.6g 641m 0 0.5 0 0|0 0|0 156k 530k 48 replset-01 M 06:56:50

    Mongo version: 2.0.6

    Read the article

  • How can I get started with BigData?

    - by ????? ????????
    I have a programming background, and I've done lots of database design and written lots of queries with SQL Server. I am really excited about looking at big data solutions, but I know almost nothing about them. The way I want to learn is to sign up for a sandbox where I can try things out. Questions:
    - Is there a sandbox where I can play around with Hadoop? It does not have to be free.
    - Would Amazon EMR be the right path to go?
    - What technologies should I be looking at to get started quickly?
    - Is there a "big data" dataset that is available to play with?
    Thank you so much for your guidance.
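
    The classic first experiment on any Hadoop sandbox (a local VM or Amazon EMR) is word count via Hadoop Streaming, which runs plain Python scripts as mapper and reducer - a sketch, with file names of our own choosing:

        # mapper.py - emit "word<TAB>1" for every word on stdin
        import sys
        for line in sys.stdin:
            for word in line.split():
                print(word + "\t1")

        # reducer.py - input arrives sorted by key, so counts stream in one pass
        import sys
        current, count = None, 0
        for line in sys.stdin:
            word, n = line.rsplit("\t", 1)
            if word != current:
                if current is not None:
                    print(current + "\t" + str(count))
                current, count = word, 0
            count += int(n)
        if current is not None:
            print(current + "\t" + str(count))

        # Run with something like:
        #   hadoop jar hadoop-streaming.jar -input in/ -output out/ \
        #       -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py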

    Read the article

  • Large, high performance object or key/value store for HTTP serving on Linux

    - by Tommy
    I have a service that serves images to end users at a very high rate using plain HTTP. The images vary between 4 and 64 KB, and there are 1,300,000,000 of them in total. The dataset is about 30 TiB in size, and changes (new objects, updates, deletes) make up less than 1% of the requests. The number of requests per second varies from 240 to 9000 and is dispersed pretty evenly, with few objects being especially "hot". As of now, these images are files on an ext3 filesystem distributed read-only across a large number of mid-range servers. This poses several problems:
    - Using a filesystem is very inefficient, since the metadata size is large, the inode/dentry cache is volatile on Linux, and some daemons tend to stat()/readdir() their way through the directory structure, which in my case becomes very expensive.
    - Updating the dataset is very time-consuming and requires remounting between set A and B.
    - The only reasonable handling is operating on the block device for backup, copying, etc.
    What I would like is a daemon that:
    - speaks HTTP (GET, PUT, DELETE and perhaps UPDATE)
    - stores data in an efficient structure; the index should remain in memory, and considering the number of objects, the overhead must be small
    - can handle massive numbers of connections, with slow (if any) time needed to ramp up
    - reads the index into memory at startup
    - provides statistics (nice, but not mandatory)
    I have experimented a bit with Riak, Redis, MongoDB, Kyoto and Varnish with persistent storage, but I haven't had the chance to dig in really deep yet.

    Read the article

  • How to get MongoDB's current working set size

    - by Howard
    From the docs, it says: "For best performance, the majority of your active set should fit in RAM." So, for example, my db.stats() gives me:
    {
        "db" : "mydb",
        "collections" : 16,
        "objects" : 21452,
        "avgObjSize" : 768.0516501957859,
        "dataSize" : 16476244,
        "storageSize" : 25385984,
        "numExtents" : 43,
        "indexes" : 70,
        "indexSize" : 15450112,
        "fileSize" : 469762048,
        "ok" : 1
    }
    Which value is the working set size?
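
    Strictly, the working set is the hot fraction of your data plus indexes, which db.stats() cannot see directly; dataSize + indexSize is its upper bound. A sketch of the arithmetic with pymongo (database name assumed):

        from pymongo import MongoClient

        stats = MongoClient().mydb.command("dbstats")
        upper = stats["dataSize"] + stats["indexSize"]
        print("working set upper bound: %.1f MB" % (upper / 1024.0 / 1024.0))
        # For the numbers above: 16476244 + 15450112 bytes, i.e. about 30 MB -
        # this data set fits in RAM even if every document and index is hot.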

    Read the article

  • CouchDB failing test suite on Linux

    - by user52674
    Hi, I've been trying to install CouchDB on my WebFaction virtual server. I followed the latest instructions from the WebFaction forum (see: http://forum.webfaction.com/viewtopic.php?id=2355) and it runs (just). Futon is very sluggish and I get 502 errors. Anyway, when I run the test suite, it fails on multiple tests. WebFaction support have been great, but don't have the Erlang experience to interpret the error logs. Can anyone help me work out what might be wrong? Test suite result: basics, all_docs, attachments, attachments_multipart, attachment_names, compact, config, conflicts, delayed_commits, design_docs, design_options all fail. The errors are all of the form:
    Exception raised: {"error":"unknown","reason":"<the raw HTML of an nginx '502 Bad Gateway' error page>"}
    except for 'compact', which also has:
    Assertion failed: xhr.responseText == "This is a base64 encoded text"
    Assertion failed: xhr.getResponseHeader("Content-Type") == "text/plain"
    I'm stumped. Does anybody know what these indicate? AL

    Read the article

  • non-mapped virtual memory & total number of connections

    - by tszming
    We have two MongoDB data nodes (a replica set) - a primary and a secondary. I noticed that the non-mapped virtual memory is relatively high and am wondering if it is hurting our MongoDB performance (the server usually peaks at around 6-7K queries per second). In MMS, it is stated: "The most common case of usage of a high amount of memory for non-mapped is that there are very many connections to the database." So we checked the memory usage with db.serverStatus().mem on our secondary:
    {
        "bits" : 64,
        "resident" : 6846,
        "virtual" : 416797,
        "supported" : true,
        "mapped" : 205549,
        "mappedWithJournal" : 411098,
        "note" : "virtual minus mapped is large. could indicate a memory leak"
    }
    Note: we are using 2.0.4, and now the default stack size should be 1 MB per connection. The current number of connections is around 1.1K, but the non-mapped virtual memory (virtual - mappedWithJournal) is around 5699 MB. The trend is quite stable, so I can't say there is a leak here, but where has the memory gone? Any ideas?

    Read the article

  • Which of CouchDB or MongoDB suits my needs?

    - by vonconrad
    Where I work, we use Ruby on Rails to create both backend and frontend applications. Usually, these applications interact with the same MySQL database. It works great for the majority of our data, but we have one situation which I would like to move to a NoSQL environment. We have clients, and our clients have what we call "inventories" - one or more of them. An inventory can have many thousands of items. This is currently done through two relational database tables, inventories and inventory_items. The problems start when two different inventories have different parameters:

    # Inventory item from inventory 1, televisions
    {
        inventory_id: 1,
        sku: 12345,
        name: "Samsung LCD 40 inches",
        model: "582903-4",
        brand: "Samsung",
        screen_size: 40,
        type: "LCD",
        price: 999.95
    }

    # Inventory item from inventory 2, accommodation
    {
        inventory_id: 2,
        sku: "48cab23fa",
        name: "New York Hilton",
        accomodation_type: "hotel",
        star_rating: 5,
        price_per_night: 395
    }

    Since we obviously can't use brand or star_rating as the column name in inventory_items, our solution so far has been to use generic column names such as text_a, text_b, float_a, int_a, etc., and introduce a third table, inventory_schemas. The tables now look like this:

    # Inventory schema for inventory 1, televisions
    {
        inventory_id: 1,
        int_a: sku,
        text_a: name,
        text_b: model,
        text_c: brand,
        int_b: screen_size,
        text_d: type,
        float_a: price
    }

    # Inventory item from inventory 1, televisions
    {
        inventory_id: 1,
        int_a: 12345,
        text_a: "Samsung LCD 40 inches",
        text_b: "582903-4",
        text_c: "Samsung",
        int_b: 40,
        text_d: "LCD",
        float_a: 999.95
    }

    This has worked well... up to a point. It's clunky, it's unintuitive, and it lacks scalability. We have to devote resources to setting up inventory schemas. Using separate tables is not an option. Enter NoSQL. With it, we could let each and every item have its own parameters and still store them together. From the research I've done, it certainly seems like a great alternative for this situation. Specifically, I've looked at CouchDB and MongoDB. Both look great. However, there are a few other bits and pieces we need to be able to do with our inventory:
    - We need to be able to select items from only one (or several) inventories.
    - We need to be able to filter items based on their parameters (e.g. get all items from inventory 2 where type is 'hotel').
    - We need to be able to group items based on parameters (e.g. get the lowest price from items in inventory 1 where brand is 'Samsung').
    - We need to (potentially) be able to retrieve thousands of items at a time.
    - We need to be able to access the data from multiple applications, both backend (to process data) and frontend (to display data).
    - Rapid bulk insertion is desired, though not required.
    Based on the structure and the requirements, is either CouchDB or MongoDB suitable for us? If so, which one would be the best fit? Thanks for reading, and thanks in advance for answers. EDIT: One of the reasons I like CouchDB is that it would be possible for us, in the frontend application, to request data via JavaScript directly from the server after page load, and display the results without having to use any backend code whatsoever. This would lead to better page load times and less server strain, as the fetching/processing of the data would be done client-side.
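
    For what it's worth, each listed requirement maps to a stock MongoDB query - a sketch with pymongo against an assumed single 'items' collection, field names taken from the examples above:

        from pymongo import MongoClient

        items = MongoClient().inventory_db.items

        # 1. items from one or several inventories
        items.find({"inventory_id": {"$in": [1, 2]}})

        # 2. filter on per-item parameters
        items.find({"inventory_id": 2, "accomodation_type": "hotel"})

        # 3. group on parameters: lowest Samsung price in inventory 1
        items.aggregate([
            {"$match": {"inventory_id": 1, "brand": "Samsung"}},
            {"$group": {"_id": None, "lowest": {"$min": "$price"}}},
        ])

        # 4. thousands at a time: find() returns a lazy, batched cursor
        # 5. multiple applications: any driver sees the same documents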

    Read the article

  • MongoDB vs Ehcache caching advice for speeding up a read-only MySQL database

    - by paddydub
    I'm building a route planner web app using Spring/Hibernate/Tomcat and a MySQL database. The database contains read-only data, such as bus stop coordinates and bus times, which is never updated. I'm trying to make the app run faster: each time the application runs, it performs approx. 1000 reads against the database to calculate a route. I have set up Ehcache, which greatly improves the read-from-database times. I'm now setting up Terracotta + Ehcache distributed caching to share the cache between multiple Tomcat JVMs. This seems a bit complicated. I've tried memcached, but it was not performing as fast as Ehcache. I'm wondering if MongoDB would be better suited. I have no experience with NoSQL, but I would appreciate it if anyone has any ideas. All I need is quick access to the read-only database.

    Read the article

  • Suggest Cassandra data model for an existing schema

    - by Andriy Bohdan
    Hello guys! I hope there's someone who can help me by suggesting a suitable data model to be implemented using the NoSQL database Apache Cassandra. More than that, I need it to work under high load and with large amounts of data. Simplified, I have 3 types of objects: Product, Tag and ProductTag.

    Product:
      key - string key
      name - string
      .... - some other fields

    Tag:
      key - string key
      name - unique tag words

    ProductTag:
      product_key - foreign key referring to product
      tag_key - foreign key referring to tag
      rating - the rating of the tag for this product

    Each product may have 0 or many tags. A tag may be assigned to 1 or many products, meaning the relation between products and tags is many-to-many in terms of relational databases. The value of "rating" is updated very often. I need to run the following queries:
    - select objects by key
    - select tags for a product, ordered by rating
    - select products by tag, ordered by rating
    - update rating by product_key and tag_key
    The most important thing is to make these queries really fast on large amounts of data, considering that the rating is constantly updated.
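
    One common Cassandra answer - a sketch only, in CQL via the DataStax Python driver, with invented table and keyspace names - is to denormalize the ratings into query-shaped tables, clustered by rating so results come back pre-sorted:

        from cassandra.cluster import Cluster

        # Assumes a running cluster and a pre-created 'shop' keyspace.
        session = Cluster().connect("shop")

        session.execute("""
            CREATE TABLE IF NOT EXISTS tags_by_product (
                product_key text,
                rating      float,
                tag_key     text,
                PRIMARY KEY (product_key, rating, tag_key)
            ) WITH CLUSTERING ORDER BY (rating DESC, tag_key ASC)""")
        # A mirror table, products_by_tag, serves the reverse query the same way.

        # rating is part of the clustering key, so an update is delete + insert:
        def update_rating(product, tag, old_rating, new_rating):
            session.execute(
                "DELETE FROM tags_by_product WHERE product_key=%s AND rating=%s AND tag_key=%s",
                (product, old_rating, tag))
            session.execute(
                "INSERT INTO tags_by_product (product_key, rating, tag_key) VALUES (%s, %s, %s)",
                (product, new_rating, tag))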

    Read the article

  • How to get a list of items when using Cassandra

    - by Blankman
    When using a NoSQL-type datastore like Cassandra, how would you return a result set based on a column? e.g.
    SELECT * FROM Articles WHERE category='blah' ORDER BY datetime DESC
    Is this something that you would store in a SQL DB and then pull the data from Cassandra? Or can Cassandra handle this type of query? (Assuming millions of rows in a DB.) From what I understand, Cassandra is great at key-based lookups; I'm confused about whether it can and should be used for getting a list of data back and paging through that data as well (and whether it is highly performant at that).

    Read the article

  • Help regarding NoSQL databases like Hadoop, HBase, etc.

    - by user560370
    I am new to distributed NoSQL databases like Hadoop, Cassandra, etc. I have a few questions for which I seek expert advice:
    - Can you list the problems/challenges one will generally face when making a shift from a conventional database like MySQL to these large cluster-based databases?
    - What are the difficulties, if any, when one needs to adapt to a newer version of these open source projects?
    - Can you list the things which are generally stored/kept in memcached for fast rendering of a page?
    - How can I understand the source code of open-source projects so that I can build on it and maybe give back to the community?
    These questions may sound idiotic and basic, but please, it's a request for the experts to answer them in detail and to the best of their abilities.

    Read the article

  • How to modernize an enormous legacy database?

    - by smayers81
    I have a question, just looking for suggestions here. So, my application is 'modernizing' a desktop application by converting it to the web, with an ICEFaces UI and server side written in Java. However, they are keeping around the same Oracle database, which at current count has about 700-900 tables and probably a billion total records in the tables. Some individual tables have 250 million rows, many have over 25 million. Needless to say, the database is not scaling well. As a result, the performance of the application is looking to be abysmal. The architects / decision makers-that-be have all either refused or are unwilling to restructure the persistence. So, basically we are putting a fresh coat of paint on a functional desktop application that currently serves most user needs and does so with relative ease and quick performance. I am having trouble sleeping at night thinking of how poorly this application is going to perform and how difficult it is going to be for everyday users to do their job. So, my question is, what options do I have to mitigate this impending disaster? Is there some type of intermediate layer I can put in between the database and the Java code to speed up performance while at the same time keeping the database structure intact? Caching is obviously an option, but I don't see that as being a cure-all. Is it possible to layer a NoSQL DB in between or something?
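
    The usual "intermediate layer" here is a cache-aside/read-through cache between the application and the untouchable schema - sketched below in Python with pymemcache for brevity (the key scheme and loader function are invented; values must be serialized to bytes), though the same pattern applies from Java:

        from pymemcache.client.base import Client

        cache = Client(("localhost", 11211))

        def get_record(table, pk, load_from_oracle):
            key = "%s:%s" % (table, pk)
            value = cache.get(key)
            if value is None:                       # miss: one trip to Oracle
                value = load_from_oracle(table, pk)
                cache.set(key, value, expire=300)   # 5-minute TTL bounds staleness
            return value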

    Read the article

  • Non-Relational Database Design

    - by Ian Varley
    I'm interested in hearing about design strategies you have used with non-relational "nosql" databases - that is, the (mostly new) class of data stores that don't use traditional relational design or SQL (such as Hypertable, CouchDB, SimpleDB, Google App Engine datastore, Voldemort, Cassandra, SQL Data Services, etc.). They're also often referred to as "key/value stores", and at base they act like giant distributed persistent hash tables. Specifically, I want to learn about the differences in conceptual data design with these new databases. What's easier, what's harder, what can't be done at all? Have you come up with alternate designs that work much better in the non-relational world? Have you hit your head against anything that seems impossible? Have you bridged the gap with any design patterns, e.g. to translate from one to the other? Do you even do explicit data models at all now (e.g. in UML) or have you chucked them entirely in favor of semi-structured / document-oriented data blobs? Do you miss any of the major extra services that RDBMSes provide, like relational integrity, arbitrarily complex transaction support, triggers, etc? I come from a SQL relational DB background, so normalization is in my blood. That said, I get the advantages of non-relational databases for simplicity and scaling, and my gut tells me that there has to be a richer overlap of design capabilities. What have you done? FYI, there have been StackOverflow discussions on similar topics here:
    - the next generation of databases
    - changing schemas to work with Google App Engine
    - choosing a document-oriented database

    Read the article

  • Django Models / SQLAlchemy are bloated! Any truly Pythonic DB models out there?

    - by Luke Stanley
    "Make things as simple as possible, but no simpler." Can we find the solution/s that fix the Python database world? from someAmazingDB import * class Task (model): title = '' isDone = False db.taskList = [] #or db.taskList = expandableTypeCollection(Task) #not sure what this syntax would be db['taskList'].append(Task(title='Beat old sql interfaces',done=False)) db.taskList.append(Task('Illustrate different syntax modes',True)) #at this point it should autosave #we should be able to reload the console and access like: >> from someAmazingDB import * >> print 'Done tasks:' >> for task in db.taskList: >> if task.done: >> print task 'Illustrate different syntax modes' I'm a fan of Python, webPy and Cherry Py, and KISS in general. We're talking automatic Python to SQL type translation or NoSQL. We don't have to totally be SQL compatible! Just a scalable subset or ignore it! Re:model changes, it's ok to ask the developer when they try to change it or have a set of sensible defaults. Here is the challenge: The above code should work with very little modification or thinking required. Why must we put up with compromise when we know better? It's 2010, we should be able to code scalable, simple databases in our sleep. If you think this is important, please upvote!

    Read the article

  • Getting started with massive data

    - by Max
    I'm a math guy and occasionally do some statistics/machine learning analysis consulting projects on the side. The data I have access to are usually on the smaller side, at most a couple hundred megabytes (and almost always far less), but I want to learn more about handling and analyzing data on the gigabyte/terabyte scale. What do I need to know, and what are some good resources to learn from? Hadoop/MapReduce is one obvious start. Is there a particular programming language I should pick up? (I primarily work now in Python, Ruby, R, and occasionally Java, but it seems like C and Clojure are often used for large-scale data analysis?) I'm not really familiar with the whole NoSQL movement, except that it's associated with big data. What's a good place to learn about it, and is there a particular implementation (Cassandra, CouchDB, etc.) I should get familiar with? Where can I learn about applying machine learning algorithms to huge amounts of data? My math background is mostly on the theory side, definitely not on the numerical or approximation side, and I'm guessing most of the standard ML algorithms don't really scale. Any other suggestions on things to learn would be great!

    Read the article
