Large scale storage for incrementally-appended documents?

Posted by Ben Dilts on Stack Overflow
Published on 2011-01-03T01:43:17Z

Filed under: database | mongodb | couchdb | storage

I need to store hundreds of thousands of documents right now, and potentially many millions later. Each document starts out empty and is appended to frequently, but is never otherwise updated or deleted. The documents are not interrelated in any way and only need to be accessed by a unique ID.

A read fetches some subset of a document, almost always starting midway through at an indexed location and running to the end (e.g. "document #4324319, save #53 to the end").

The documents start very small, at a few KB. They typically reach a final size of around 500KB, but many reach 10MB or more.

I'm currently using MySQL (InnoDB) to store these documents. Each incremental save is just dumped into one big table along with the ID of the document it belongs to, so reading part of a document looks like "select * from saves where document_id=14 and save_id > 53 order by save_id", followed by manually concatenating the rows in code.
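To make the current scheme concrete, here is a minimal sketch of the append-and-read pattern described above. It uses Python's built-in sqlite3 module as a stand-in for MySQL (InnoDB) so that it runs on its own. The column layout, the per-document save numbering, and the helper names open_store, append_save and read_after are all assumptions for illustration; only the "saves" table name and the shape of the read query come from the description above.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the store and create the saves table if needed.

    Assumed schema: one row per incremental save, keyed by
    (document_id, save_id) so a range read is one index scan.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS saves ("
        " document_id INTEGER NOT NULL,"
        " save_id INTEGER NOT NULL,"
        " data BLOB NOT NULL,"
        " PRIMARY KEY (document_id, save_id))"
    )
    return conn


def append_save(conn, document_id, data):
    """Append one save to a document and return its save_id.

    Assumption: save_id counts up from 1 within each document.
    """
    with conn:  # commit on success, roll back on error
        (next_id,) = conn.execute(
            "SELECT COALESCE(MAX(save_id), 0) + 1 FROM saves"
            " WHERE document_id = ?",
            (document_id,),
        ).fetchone()
        conn.execute(
            "INSERT INTO saves (document_id, save_id, data)"
            " VALUES (?, ?, ?)",
            (document_id, next_id, data),
        )
    return next_id


def read_after(conn, document_id, after_save_id=0):
    """Return all saves with save_id > after_save_id, concatenated in order."""
    rows = conn.execute(
        "SELECT data FROM saves"
        " WHERE document_id = ? AND save_id > ?"
        " ORDER BY save_id",
        (document_id, after_save_id),
    )
    return b"".join(row[0] for row in rows)
```

With this layout, read_after(conn, 14, 53) corresponds to the query quoted above, and the concatenation step is the join over the returned rows. A composite key on (document_id, save_id) is one way to keep a document's saves adjacent in the index, which matters for the "from save N to the end" access pattern.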

Ideally, I'd like the storage solution to be easily horizontally scalable, with redundancy across servers (e.g. each document stored on at least 3 nodes) with easy recovery of crashed servers.

I've looked at CouchDB and MongoDB as possible replacements for MySQL, but I'm not sure that either of them makes much sense for this particular application, though I'm open to being convinced.

Any input on a good storage solution?

© Stack Overflow or respective owner
