nosql - Page 12 - Developer IT

High performance querying - Suggestions please

- by Alex Takitani

Suppose I have millions of user profiles, each with hundreds of fields (name, gender, preferred pet, and so on). Which database would you choose? Assume a Facebook-like load. Speed is a must, and open source is preferred. I've read a lot about Cassandra, HBase, Mongo, and MySQL, but I just can't decide.

How fast are EC2 nodes between each other?

- by tesmar

Hi, I am looking to set up Amazon EC2 nodes running Rails with Riak. I want to sync the Riak DBs so that, when the cluster gets a query, it can tell where the data lies and retrieve it quickly. In your opinion(s), is the network between EC2 nodes fast enough to query a Riak DB, return the results, and get them back to the client in a timely manner? I am new to all of this, so please be kind :)

Can I do transactions and locks in CouchDB?

- by damian

I need to do transactions (begin, commit, or rollback) and locks (select for update). How can I do this in a document-model DB?

Edit: The case is this: I want to run an auctions site, and I am also thinking about how to handle direct purchases. In a direct purchase I have to decrement the quantity field in the item record, but only if the quantity is greater than zero. That is why I need locks and transactions; I don't know how to address this without them. Can I solve this with CouchDB?
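
A minimal sketch of the "decrement only if quantity is greater than zero" case using optimistic concurrency rather than locks. CouchDB rejects a write whose `_rev` is stale with a 409 Conflict; the `TinyStore` class below is a hypothetical in-memory stand-in for that behavior (it is not a CouchDB client), and `purchase` with its `max_retries` parameter is an illustrative name, not something from the question.

```python
class Conflict(Exception):
    """Raised when a write carries a stale revision (CouchDB answers 409)."""


class TinyStore:
    """Hypothetical in-memory stand-in for a document store with revisions."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        # Return a copy so callers cannot mutate the stored document directly.
        return dict(self._docs[doc_id])

    def put(self, doc_id, doc):
        current = self._docs.get(doc_id)
        # An update must carry the revision it was read at, else it is rejected.
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise Conflict(doc_id)
        new_doc = dict(doc)
        new_doc["_rev"] = current["_rev"] + 1 if current else 1
        self._docs[doc_id] = new_doc
        return new_doc["_rev"]


def purchase(store, item_id, max_retries=5):
    """Decrement quantity if it is above zero; retry when another writer won."""
    for _ in range(max_retries):
        item = store.get(item_id)
        if item["quantity"] <= 0:
            return False  # sold out, nothing to decrement
        item["quantity"] -= 1
        try:
            store.put(item_id, item)
            return True
        except Conflict:
            continue  # re-read the latest revision and try again
    return False
```

The check and the write are not atomic by themselves; what makes this safe is that a concurrent purchase changes the revision, so the slower writer fails and re-reads the new quantity before trying again.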

Cassandra performance slowdown with counter columns

- by tubcvt

I have a 4-node cluster; each node has 16 cores and 24 GB of RAM:

    192.168.23.114  datacenter1  rack1  Up  Normal  44.48 GB  25.00%
    192.168.23.115  datacenter1  rack1  Up  Normal  44.51 GB  25.00%
    192.168.23.116  datacenter1  rack1  Up  Normal  44.51 GB  25.00%
    192.168.23.117  datacenter1  rack1  Up  Normal  44.51 GB  25.00%

We use about 10 column families (counter columns) to produce system statistics reports. The problem is that when I change the replication_factor of this keyspace (which contains the 10 counter column families) from 1 to 2, CPU usage on every node rises from 10% (with replication factor 1) to 90%. Can anyone help me work around this? Why do counter columns consume so much CPU time? Thanks, all.

Revisions: algorithm and data structure

- by SODA

Hi, I need ideas for structuring and processing data with revisions. For example, I have a database of objects (e.g. cars). Each object has a number of properties, which can be arbitrary, so there is no set schema to describe these objects. These objects are probably saved as key-value pairs. Now I need to change a property of an object. I don't want to overwrite it completely: I want to be able to go back and see the history of changes to these properties. That's why I want to add the new property value and keep the old one (so I guess a timestamp would do the job of telling which value is the latest). At the same time, I want to be able to get info about any object in a snap, with only the latest version of each property. Any ideas what the best approach would be? At least please point me in the right direction. Thanks!
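
One way to picture the timestamp idea from the question is an append-only list of `(timestamp, value)` entries per property, with the "latest" view derived from it. This is only a sketch under that assumption; `VersionedObject` and its method names are hypothetical, and a real store would likely cache the latest view instead of recomputing it.

```python
import time
from collections import defaultdict


class VersionedObject:
    """Schemaless object whose property changes are appended, never overwritten."""

    def __init__(self):
        # property name -> list of (timestamp, value) entries
        self._history = defaultdict(list)

    def set(self, prop, value, timestamp=None):
        # Record a new revision; earlier revisions stay in place.
        ts = time.time() if timestamp is None else timestamp
        self._history[prop].append((ts, value))

    def latest(self):
        # Current view: for each property, the value with the newest timestamp.
        return {
            prop: max(entries, key=lambda entry: entry[0])[1]
            for prop, entries in self._history.items()
        }

    def history(self, prop):
        # All revisions of one property, oldest first.
        return sorted(self._history.get(prop, []), key=lambda entry: entry[0])
```

For example, setting `color` to `"red"` at time 1 and `"blue"` at time 2 leaves both entries in `history("color")`, while `latest()` reports only `"blue"`.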

In MongoDB, how can I replicate this simple query using map/reduce in Ruby?

- by Matthew Rathbone

Hi, using the regular MongoDB library in Ruby, I have the following query to find the average filesize across a set of 5001 documents:

    avg = 0
    total = collection.count()
    Rails.logger.info "#{total} asset creation stats in the system"
    collection.find().each {|row| avg += (row["filesize"] * (1/total.to_f)) if row["filesize"]}

It's pretty simple, so I'm trying to do the same using map/reduce as a learning exercise. This is what I came up with:

    map = 'function(){emit("filesizes", {size: this.filesize, num: 1});}'
    reduce = 'function(k, vals){
      var result = {size: 0, num: 0};
      for(var x in vals) {
        var new_total = result.num + vals[x].num;
        result.num = new_total
        result.size = result.size + (vals[x].size * (vals[x].num / new_total));
      }
      return result;
    }'
    @results = collection.map_reduce(map, reduce)

However, the two queries come back with two different results! What am I doing wrong?
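
Two things in the snippets above may explain the mismatch. The loop divides by the count of all documents, including any that lack a filesize, while the reduce function adds each weighted value without rescaling the running result. A common way around both is to have reduce carry only a sum and a count and to divide once at the end. The sketch below simulates that in plain Python, not MongoDB; `map_doc`, `reduce_vals`, `average`, and the `chunk` parameter are hypothetical names, and chunking stands in for the fact that a reduce function may be called again on its own partial outputs.

```python
def map_doc(doc):
    # Emit one {size, num} value per document that actually has a filesize.
    if doc.get("filesize") is not None:
        yield {"size": doc["filesize"], "num": 1}


def reduce_vals(vals):
    # Sums are associative, so reducing partial results gives the same answer.
    return {
        "size": sum(v["size"] for v in vals),
        "num": sum(v["num"] for v in vals),
    }


def average(docs, chunk=2):
    emitted = [value for doc in docs for value in map_doc(doc)]
    # Reduce in chunks, then reduce the partial results (a re-reduce step).
    partials = [
        reduce_vals(emitted[i:i + chunk]) for i in range(0, len(emitted), chunk)
    ]
    total = reduce_vals(partials)
    # Divide only once, after all reducing is done.
    return total["size"] / total["num"] if total["num"] else None
```

With this shape, the result does not depend on how the values are grouped, which is the property the weighted running average in the question lacks.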

What happens to a Redis data store if the data exceeds available RAM?

- by Chris Marisic

What happens to a Redis data store if the data exceeds available RAM?

How to solve the "Digg" problem in MongoDB

- by user193116

A while back, a Digg developer posted this blog entry, "http://about.digg.com/blog/looking-future-cassandra", where he described one of the issues that was not optimally solved in MySQL. This was cited as one of the reasons for their move to Cassandra. I have been playing with MongoDB, and I would like to understand how to design the MongoDB collections for this problem.

From the article, the schema for this information in MySQL:

    CREATE TABLE Diggs (
      id INT(11),
      itemid INT(11),
      userid INT(11),
      digdate DATETIME,
      PRIMARY KEY (id),
      KEY user (userid),
      KEY item (itemid)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

    CREATE TABLE Friends (
      id INT(10) AUTO_INCREMENT,
      userid INT(10),
      username VARCHAR(15),
      friendid INT(10),
      friendname VARCHAR(15),
      mutual TINYINT(1),
      date_created DATETIME,
      PRIMARY KEY (id),
      UNIQUE KEY Friend_unique (userid,friendid),
      KEY Friend_friend (friendid)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

This problem is ubiquitous in social networking implementations. People befriend a lot of people, and they in turn digg a lot of things. Quickly showing a user what his or her friends are up to is critical. I understand that several blogs have since provided a pure RDBMS solution with indexes for this issue; however, I am curious how this could be solved in MongoDB.

What should I do to accommodate large-scale data storage and retrieval?

- by kailashbuki

There are two columns in the table inside a MySQL database. The first column contains the fingerprint, while the second contains the list of documents that have that fingerprint. It's much like an inverted index built by search engines. An example record from the table is shown below:

    34    "doc1, doc2, doc45"

The number of fingerprints is very large (it can range up to trillions). There are basically two operations on the database: inserting/updating a record, and retrieving the record that matches a fingerprint. The table definition, as a Python snippet, is:

    self.cursor.execute("CREATE TABLE IF NOT EXISTS `fingerprint` (fp BIGINT, documents TEXT)")

And the snippet for the insert/update operation is:

    if self.cursor.execute("UPDATE `fingerprint` SET documents=CONCAT(documents,%s) WHERE fp=%s", (","+newDocId, thisFP)) == 0L:
        self.cursor.execute("INSERT INTO `fingerprint` VALUES (%s, %s)", (thisFP, newDocId))

The only bottleneck I have observed so far is the query time in MySQL. My whole application is web based, so time is a critical factor. I have also thought of using Cassandra but know little about it. Please suggest a better way to tackle this problem.
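
Note that the table definition above declares no index on `fp`, so every lookup and update may have to scan the table. A sketch of one alternative layout follows: one row per (fingerprint, document) pair with a composite primary key, so that lookups by `fp` use the key and adding a document is a plain insert rather than a string concatenation. It uses Python's built-in `sqlite3` purely as a runnable stand-in for MySQL; the table name `fingerprint_doc` and the helper names are hypothetical, and `INSERT OR IGNORE` is SQLite syntax (MySQL spells it `INSERT IGNORE`).

```python
import sqlite3


def open_index():
    # In-memory SQLite database standing in for the MySQL server.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fingerprint_doc ("
        " fp INTEGER NOT NULL,"
        " doc_id TEXT NOT NULL,"
        " PRIMARY KEY (fp, doc_id))"  # the key doubles as the lookup index on fp
    )
    return conn


def add_document(conn, fp, doc_id):
    # One row per pair; inserting the same pair twice is silently ignored.
    conn.execute(
        "INSERT OR IGNORE INTO fingerprint_doc (fp, doc_id) VALUES (?, ?)",
        (fp, doc_id),
    )


def documents_for(conn, fp):
    # Served from the primary-key index because fp is its leading column.
    rows = conn.execute(
        "SELECT doc_id FROM fingerprint_doc WHERE fp = ? ORDER BY doc_id", (fp,)
    )
    return [row[0] for row in rows]
```

Whether this is fast enough at the scale described is not something the sketch can show; it only illustrates the indexed, normalized shape.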

Are there any design guidelines for document databases?

- by Hugo

Just wondering if there are any guidelines for designing a document-oriented DB; I am asking about CouchDB in particular. I know that, being schemaless, things can take whatever shape we want, but are there any best practices? Thanks in advance! =D

In Cassandra terminology, what is TimeUUID?

- by knorv

In Cassandra terminology, what is TimeUUID and when is it used?
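
For orientation: a TimeUUID is a version 1 (time-based) UUID, which embeds a 60-bit timestamp counted in 100-nanosecond intervals since 15 October 1582. The sketch below uses only Python's standard `uuid` module, not a Cassandra client, to show that the timestamp can be read back out; `uuid1_to_datetime` is a hypothetical helper name.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Version 1 UUID timestamps count from the start of the Gregorian calendar.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)


def uuid1_to_datetime(value):
    """Recover the creation time embedded in a version 1 (time-based) UUID."""
    if value.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    # value.time is in 100 ns units; integer-divide by 10 to get microseconds.
    return GREGORIAN_EPOCH + timedelta(microseconds=value.time // 10)


event_id = uuid.uuid1()
created_at = uuid1_to_datetime(event_id)
```

Because the timestamp is part of the identifier, such values can serve as unique names that also carry a time ordering.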

What are some "mental steps" a developer must take to begin moving from SQL to NO-SQL (CouchDB, Fath

- by Byron Sommardahl

I have my mind firmly wrapped around relational databases and how to code efficiently against them. Most of my experience is with MySQL and SQL. I like many of the things I'm hearing about document-based databases, especially when someone in a recent podcast mentioned huge performance benefits. So, if I'm going to go down that road, what are some of the mental steps I must take to shift from SQL to NO-SQL?

If it makes any difference in your answer, I'm a C# developer primarily (today, anyhow). I'm used to ORMs like EF and Linq to SQL. Before ORMs, I rolled my own objects with generics and datareaders. Maybe that matters, maybe it doesn't.

Here are some more specific questions:

- How do I need to think about joins?
- How will I query without a SELECT statement?
- What happens to my existing stored objects when I add a property in my code?
- (feel free to add questions of your own here)

Retrieve every key of a column family in Cassandra

- by Matroska

Hi all, I have found no way to translate a simple select like SELECT * FROM USER into Cassandra. Is it possible to simply retrieve all the keys in a ColumnFamily? The only way I have found is a select with a key range (get_range_slices). Is there a way to avoid defining the key range? Thanks, Tobia Loschiavo

How to run MongoDB as a Windows service?

- by stefan.hery

How do I set up MongoDB so it can run as a Windows service? Thanks

Advanced queries in HBase

- by Teflon Ted

Given the following HBase schema scenario (from the official FAQ)...

    How would you design an HBase table for a many-to-many association between two entities, for example Student and Course?

    I would define two tables:

    Student:
      student id
      student data (name, address, ...)
      courses (use course ids as column qualifiers here)

    Course:
      course id
      course data (name, syllabus, ...)
      students (use student ids as column qualifiers here)

    This schema gives you fast access to the queries "show all classes for a student" (student table, courses family) or "all students for a class" (courses table, students family).

How would you satisfy the request: "Give me all the students that share at least two courses in common"? Can you build a "query" in HBase that will return that set, or do you have to retrieve all the pertinent data and crunch it yourself in code?
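
To make the "crunch it yourself in code" option concrete, here is a sketch of the client-side computation, assuming the Course table's students family has already been read into a plain dictionary mapping each course id to its set of student ids. The function name `pairs_sharing_courses` and the `minimum` parameter are hypothetical, and nothing here calls HBase.

```python
from collections import Counter
from itertools import combinations


def pairs_sharing_courses(course_students, minimum=2):
    """Return the student pairs that appear together in at least `minimum` courses."""
    shared = Counter()
    for students in course_students.values():
        # Sort so each pair is always counted under the same (a, b) ordering.
        for pair in combinations(sorted(students), 2):
            shared[pair] += 1
    return {pair for pair, count in shared.items() if count >= minimum}
```

The work grows with the square of each course's enrollment, so for large data sets this kind of pair counting is usually pushed into a batch job rather than run per request.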

How does CouchDB perform for a regularly updated dataset?

- by Ritesh M Nayak

I am planning to use CouchDB on a project. Since the querying mechanism involves writing views (which are a lot like indexes in a regular RDBMS), I was wondering: if the document database is updated very frequently (a write-heavy database), would CouchDB perform well compared to a regular RDBMS? Or do we have to compact/re-index the system occasionally to make it perform faster?

When is porting data from MySQL to CouchDB NOT advisable? Seeking cautionary tales

- by dan

I've dabbled in CouchDB and I have pretty good MySQL experience. I've also created one production application that uses both. I like MySQL but I've run into scaling/concurrency issues with MySQL that CouchDB advertises itself as a general solution for. The problem is that I have MySQL based applications that are pretty huge, and I don't really know whether it would be a good idea or not to try to port them over to a CouchDB datastore. I don't want to put in a lot of time and effort only to find out that my application is really not a good fit for CouchDB. Is there any sort of informed consensus on when porting a MySQL based app to CouchDB is NOT advisable? Any cautionary tales? I think CouchDB is really cool and want to use it more. I'd also like to know ahead of time what specific types of data querying scenarios CouchDB is really not good for, or if CouchDB can really replace MySQL for all the applications I create going forward.

MongoDB equivalent of SQL "OR"

- by Matt

So, MongoDB defaults to "AND" when finding records. For example:

    db.users.find({age: {'$gte': 30, '$lte': 40}});

The above query finds users >= 30 AND <= 40 years old. How would I find users <= 30 OR >= 40 years old?
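
MongoDB has an `$or` operator that takes a list of alternative conditions. The sketch below writes both filters as plain Python dictionaries in the shape a driver would accept, and checks them against sample data with a tiny hypothetical evaluator (`matches`), which handles only `$or`, `$gte`, and `$lte` and is not MongoDB's query engine.

```python
# Comparison operators supported by this illustrative evaluator.
OPS = {
    "$gte": lambda value, bound: value >= bound,
    "$lte": lambda value, bound: value <= bound,
}


def matches(doc, query):
    """Return True if doc satisfies query; top-level keys are AND-ed together."""
    for key, cond in query.items():
        if key == "$or":
            # At least one of the alternative sub-queries must match.
            if not any(matches(doc, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            value = doc.get(key)
            if value is None or not all(
                OPS[op](value, bound) for op, bound in cond.items()
            ):
                return False
        elif doc.get(key) != cond:
            return False
    return True


# Users aged 30 to 40: both bounds sit inside one condition document.
between = {"age": {"$gte": 30, "$lte": 40}}
# Users aged 30 or younger, OR 40 or older.
outside = {"$or": [{"age": {"$lte": 30}}, {"age": {"$gte": 40}}]}
```

In the mongo shell the second filter would be passed the same way as the first, as the argument to `db.users.find(...)`.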

How to design an exception logging table using HyperTable and access it via the Java client?

- by ikevinjp

If I have the following table schema to log an exception (in standard SQL schema): Table: ExceptionLog Columns: ID (Long), ExceptionClass (String), ExceptionMessage (String), Host (String), Port (Integer), HttpHeader (String), HttpPostBody (String), HttpMethod (String) How would I design the same thing in HyperTable (specifically, what is the best approach for efficiency)? And, how would I code it using the HyperTable Java client?

Is Cassandra database row size limited by available memory?

- by Adam Hollidge

I'm working with very long time series -- hundreds of millions of data points in one series -- and am considering Cassandra as a data store. In this question, one of the Cassandra committers (the über helpful jbellis) says that Cassandra rows can be very large, and that column slicing operations are faster than row slices, hence my question: Is the row size still limited by available memory?

Most proper way to use inherited classes with shared scopes in Mongo?

- by Trip

I have the TestVisual class, which inherits from the Game class:

    class TestVisual < Game
      include MongoMapper::Document
    end

    class Game
      include MongoMapper::Document
      belongs_to :maestra
      key :incorrect, Integer
      key :correct, Integer
      key :time_to_complete, Integer
      key :maestra_id, ObjectId
      timestamps!
    end

As you can see, it belongs to Maestra. So I can do Maestra.first.games, but I cannot do Maestra.first.test_visuals. Since I'm working specifically with TestVisuals, that is ideally what I would like to pull. Is this possible with Mongo? If it isn't, or if it isn't necessary, is there a better way to reach the TestVisual object from Maestra and still have it inherit from Game?

Data storage advice needed: Best way to store location + time data?

- by sobedai

I have a project in mind that will require the majority of queries to be keyed off lat/long as well as date and time. Initially, I was thinking of a standard RDBMS where the lat, long, and datetime fields are properly indexed. Then I began thinking of a document-based system where each document is essentially a timestamp with a lat/long within it. Each document could have n objects associated with it. I'm looking for advice on the best type of storage engine for this sort of thing: which of the above ideas would be better, or whether something else entirely is the ideal solution. Thanks

Database solution for 200 million writes/day, monthly summarization queries

- by sb

Hello. I'm looking for help deciding which database system to use. (I've been googling and reading for the past few hours; it now seems worthwhile to ask for help from someone with firsthand knowledge.) I need to log around 200 million rows (or more) per 8-hour workday to a database, then perform weekly/monthly/yearly summary queries on that data. The summary queries would collect data for things like billing statements, e.g. "How many transactions of type A did each user run this month?" (could be more complex, but that's the general idea). I can spread the database among several machines as necessary, but I don't think I can take old data offline. I'll definitely need to be able to query a month's worth of data, maybe a year. These queries would be for my own use and wouldn't need to be generated in real time for an end user (they could run overnight, if needed). Does anyone have any suggestions as to which databases would be a good fit?

P.S. Cassandra looks like it would have no problem handling the writes, but what about the huge monthly table scans? Is anyone familiar with Cassandra/Hadoop MapReduce performance?

How are Cassandra 0.7's secondary indexes stored?

- by user574793

We have been using Cassandra 0.6 and now have column families with millions of keys. We are interested in using the new secondary index feature available in 0.7, but couldn't find any documentation on how the new index is stored. Is there any disk-space limitation, or is the index stored like keys, in that it's spread over multiple nodes? I've tried combing through the Cassandra site for an answer, but to no avail.

How should I best store these files?

- by Triton Man

I have a set of image files. They are generally very small, between 5 KB and 100 KB, but they can be any size, upwards of 50 MB, though this is very rare. Once these images are put into the system, they are never modified. There are about 50 TB of these images in total. They are currently chunked and stored in BLOBs in Oracle, but we want to change this since it requires special software to extract them. These images are sometimes accessed at a rate of over 100 requests per second across about 10 servers. I'm thinking about Hadoop or Cassandra, but I really don't know which would be best or how best to index them.
