Search Results

Search found 549 results on 22 pages for 'nosql'.


  • Hadoop Map/Reduce - simple use example to do the following...

    - by alexeypro
    I have a MySQL database where I store BLOBs (each containing a JSON object) together with an ID for each JSON object. A JSON object contains a lot of different information, say "city: Los Angeles" and "state: California". There are about 500k such records for now, but they are growing, and each JSON object is quite big. My goal is to do real-time searches against this data. Say I want to find all JSON objects that have "state" set to "California" and "city" set to "San Francisco". I want to utilize Hadoop for the task. My idea is that there will be a "job" which takes chunks of, say, 100 records (rows) from MySQL, verifies them against the given search criteria, and returns the IDs of those that qualify. Pros/cons? I understand one might think I should just use plain SQL power for this, but the JSON object structure is pretty "heavy": laid out as SQL schemas it would take at least 3-5 table joins, which (I tried, really) creates quite a headache, and building all the right indexes eats RAM faster than one can think. ;-) Even then, every SQL query has to be analyzed to make sure it uses the indexes; otherwise a full scan is literally painful. And with such a structure the only way "up" is vertical scaling, which I'm not sure is the best option for me, as I can see the JSON objects (the data structure) growing and their number growing too. :-) Help? Can somebody point me to simple examples of how this can be done? Does it make sense at all? Am I missing something important? Thank you.
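    For a flavor of what the filtering step could look like, here is a minimal sketch of a Hadoop Streaming mapper in Python. It assumes the MySQL rows have been exported beforehand as tab-separated "id<TAB>json" lines, and the hard-coded criteria are purely illustrative:

        #!/usr/bin/env python
        # mapper.py -- minimal Hadoop Streaming sketch; assumes each input
        # line is "<id>\t<json blob>" exported from the MySQL table.
        import json
        import sys

        # Hypothetical criteria; in practice they would be passed in, e.g.
        # via -cmdenv variables or a job configuration file.
        CRITERIA = {"state": "California", "city": "San Francisco"}

        for line in sys.stdin:
            try:
                record_id, blob = line.rstrip("\n").split("\t", 1)
                obj = json.loads(blob)
            except ValueError:
                continue  # skip malformed lines
            if all(obj.get(k) == v for k, v in CRITERIA.items()):
                print(record_id)  # emit the IDs that qualify

    Note that a batch job like this sits uneasily next to the real-time requirement; the sketch only shows the per-chunk verification idea.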

    Read the article

  • Is it bad practice to extend the MongoEngine User document?

    - by Soviut
    I'm integrating MongoDB using MongoEngine, which provides the auth and session support a standard pymongo setup would lack. In regular Django auth, it's considered bad practice to extend the User model, since there's no guarantee it will be used correctly everywhere. Is this the case with mongoengine.django.auth? If it is considered bad practice, what is the best way to attach a separate user profile? Django has a mechanism for specifying an AUTH_PROFILE_MODULE. Is this supported in MongoEngine as well, or should I do the lookup manually?
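    A common alternative to extending the User document is a separate profile document that references it. A minimal sketch, assuming mongoengine.django.auth is in use (the profile fields and the manual lookup are illustrative, since an AUTH_PROFILE_MODULE-style mechanism is not guaranteed here):

        from mongoengine import Document, ReferenceField, StringField
        from mongoengine.django.auth import User

        class UserProfile(Document):
            # One profile per user; looked up manually rather than through
            # Django's AUTH_PROFILE_MODULE machinery.
            user = ReferenceField(User, required=True, unique=True)
            display_name = StringField()
            bio = StringField()

        def get_profile(user):
            return UserProfile.objects(user=user).first()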

    Read the article

  • Non-Relational DBMS Design Resources

    - by Matt Luongo
    Hey guys, as a personal project I'm looking to build a rudimentary DBMS. I've read the relevant sections in Elmasri & Navathe (5ed), but could use a more focused text. The rub is that I want to play with novel non-relational data models. While a lot of E&N was great (the indexing implementation details in particular), the more advanced DBMS implementation material targets only the relational model. I could also use something a bit more practical and detail-oriented, with real-world recommendations. I'd like to put off staring at DBMS source code for a while if I can. Any ideas?

    Read the article

  • How to organize integrity tests and code unit tests?

    - by karlthorwald
    I have several files of code-testing code (using a "unittest" class). Later I found it would be nice to also test database integrity (things like: keys have the correct format, parent and child nodes point at each other correctly, and so on), so I put that into a separate directory tree and use the same unittest class for the integrity tests. Now I wonder if it really makes sense to keep these separate. To test the integrity of the data I often duplicate parts of the code I use to test the code that handles the data. But it is not the same: the code tests use test databases (which get deleted after each test), while the integrity tests connect to the live data and analyze it. I want to call the integrity tests from cron and send an alarm if something goes wrong in the live database. How would you handle that? Are there standards for such a setup? What is your experience? My tendency is to put everything in the same file, but that would mean the code tests also get executed by cron against the production environment.
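    One way to keep the suites separate while sharing the unittest machinery is a dedicated cron entry point that discovers only the integrity tree. A sketch, with illustrative directory and module names:

        # run_integrity_checks.py -- cron entry point that runs only the
        # live-data integrity suite, never the code unit tests.
        import sys
        import unittest

        if __name__ == "__main__":
            loader = unittest.TestLoader()
            # Discover only the integrity-test tree, so the unit tests that
            # create and drop throwaway databases never touch production.
            suite = loader.discover("integrity_tests", pattern="test_*.py")
            result = unittest.TextTestRunner(verbosity=1).run(suite)
            # A non-zero exit status lets cron (with MAILTO set) raise the alarm.
            sys.exit(0 if result.wasSuccessful() else 1)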

    Read the article

  • MongoDB C# Driver Unable to Find by Object ID?

    - by Hery
    Using the MongoDB C# driver (http://github.com/samus/mongodb-csharp), it seems that I'm unable to get data by ObjectId. Below is the code I'm using: var spec = new Document { { "_id", id } }; var doc = mc.FindOne(spec); I also tried this: var spec = new Document { { "_id", "ObjectId(\"" + id + "\")" } }; var doc = mc.FindOne(spec); Both return nothing. Meanwhile, if I run the query from the mongo console, it returns the expected result. My question is: does this driver actually support lookup by ObjectId? Thanks.

    Read the article

  • Neo4j Performing shortest path calculations on stored data

    - by paddydub
    I would like to store the following graph data in the database: graph.makeEdge( "s", "c", "cost", (double) 7 ); graph.makeEdge( "c", "e", "cost", (double) 7 ); graph.makeEdge( "s", "a", "cost", (double) 2 ); graph.makeEdge( "a", "b", "cost", (double) 7 ); graph.makeEdge( "b", "e", "cost", (double) 2 ); Then I want to run the Dijkstra algorithm from a web servlet to find shortest paths over the stored graph data, and print the resulting path to an HTML page from the servlet: Dijkstra<Double> dijkstra = getDijkstra( graph, 0.0, "s", "e" );
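    Independent of how Neo4j stores the graph, the computation itself is small. A quick Python sketch of Dijkstra over the same five edges (treated as undirected) confirms the expected answer, s -> a -> b -> e at cost 11:

        import heapq
        from collections import defaultdict

        edges = [("s", "c", 7.0), ("c", "e", 7.0), ("s", "a", 2.0),
                 ("a", "b", 7.0), ("b", "e", 2.0)]

        graph = defaultdict(list)
        for u, v, cost in edges:
            graph[u].append((v, cost))
            graph[v].append((u, cost))  # undirected

        def dijkstra(start, goal):
            queue = [(0.0, start, [start])]  # (cost so far, node, path)
            seen = set()
            while queue:
                cost, node, path = heapq.heappop(queue)
                if node == goal:
                    return cost, path
                if node in seen:
                    continue
                seen.add(node)
                for nxt, c in graph[node]:
                    if nxt not in seen:
                        heapq.heappush(queue, (cost + c, nxt, path + [nxt]))
            return None

        print(dijkstra("s", "e"))  # (11.0, ['s', 'a', 'b', 'e'])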

    Read the article

  • Why do I get an error when inserting rows with Net::Cassandra::Easy and Cassandra 0.5x?

    - by knorv
    When using the Perl module Net::Cassandra::Easy to interface with Cassandra, I use the following code to read columns col[123] from rows row[123] in the column family Standard1: my $cassandra = Net::Cassandra::Easy->new(keyspace => 'Keyspace1', server => 'localhost'); $cassandra->connect(); my $result = $cassandra->get(['row1', 'row2', 'row3'], family => 'Standard1', byname => ['col1', 'col2', 'col3']); This works as expected. However, when trying to insert row row1 with .. $result = $cassandra->mutate(['row1'], family => 'Standard1', insertions => { "col1" => "Value to set." }); .. I get the error message Can't use string ("0") as a SCALAR ref while "strict refs" in use at .../Net/GenThrift/Thrift/BinaryProtocol.pm line 376. What am I doing wrong?

    Read the article

  • MongoDB Schema Design - Real-time Chat

    - by Nick
    I'm starting a project which I think will be particularly suited to MongoDB due to the speed and scalability it affords. The module I'm currently interested in is real-time chat. If I were doing this in a traditional RDBMS, I'd split it out into: Channel (a channel has many users), User (a user has one channel but many messages), Message (a message has a user). For the purpose of this use case, assume there will typically be 5 channels active at one time, each handling at most 5 messages per second. Specific queries that need to be fast: fetch new messages (based on a bookmark; a timestamp maybe, or an incrementing counter?), post a message to a channel, verify that a user can post in a channel. Bearing in mind that the document size limit in MongoDB is 4MB, how would you go about designing the schema? What would yours look like? Are there any gotchas I should watch out for?
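    As a starting point, here is a pymongo sketch of a one-document-per-message design with an incrementing per-channel counter as the bookmark. The names and the counter scheme are assumptions for illustration, not something prescribed by the question:

        from pymongo import ASCENDING, MongoClient

        db = MongoClient().chat  # illustrative database name

        # An index on (channel_id, counter) keeps both queries below fast.
        db.messages.create_index([("channel_id", ASCENDING),
                                  ("counter", ASCENDING)])

        def post_message(channel_id, user_id, text, counter):
            # One small document per message stays well under the size limit.
            db.messages.insert_one({"channel_id": channel_id,
                                    "user_id": user_id,
                                    "text": text,
                                    "counter": counter})

        def fetch_new_messages(channel_id, last_seen_counter):
            # "Give me everything after my bookmark", oldest first.
            return list(db.messages
                        .find({"channel_id": channel_id,
                               "counter": {"$gt": last_seen_counter}})
                        .sort("counter", ASCENDING))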

    Read the article

  • Best document format for addressbook in CouchDB

    - by 2x2p1p
    Hi guys. I have really tried, tried so hard, but I can't understand CouchDB :( I need to record the contact details of several people. Should I put every contact in a single document (codeviewer.org/view/code:df8) or in different documents (codeviewer.org/view/code:df9)? I'm confused, can somebody help me? Thanks.
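    The usual CouchDB advice is one document per contact, since documents are the unit of replication and conflict handling. A minimal sketch with the couchdb-python package (the field layout is just an example, and the database is assumed to exist):

        import couchdb

        server = couchdb.Server("http://localhost:5984/")
        db = server["addressbook"]  # assumes the database already exists

        # One self-contained document per contact.
        db.save({
            "type": "contact",
            "name": "Jane Doe",
            "email": "jane@example.com",
            "phones": [{"label": "mobile", "number": "+1-555-0100"}],
        })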

    Read the article

  • Can DBRefs contain additional fields?

    - by Soviut
    I've encountered several situations when using MongoDB that call for DBRefs. However, I'd also like to cache some fields from the referenced document in the DBRef itself: {$ref:'user', $id:'10285102912A', username:'Soviut'} For example, I may want the username available even though the user document is only referenced. This would give me the benefits of a single-document approach (faster querying, and no manual dereferencing in my code) while still letting me use references where they make sense. The idea is that when the referenced document is updated (a user changes their name, for example), my business layer can automatically update all the documents that reference it. Ultimately, I'm wondering: is it considered good form to store additional fields on my DBRefs? Will it break anything? Will I lose the extra data each time a reference is rewritten? Will drivers like pymongo support it?
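    For what it's worth, PyMongo's DBRef constructor does accept extra keyword arguments, stores them as additional fields alongside $ref/$id, and exposes them as attributes afterwards. A small sketch (the ObjectId value is made up):

        from bson.dbref import DBRef
        from bson.objectid import ObjectId

        # Extra keyword arguments become extra fields in the stored DBRef.
        ref = DBRef("user", ObjectId("4e5f6a7b8c9d0e1f2a3b4c5d"),
                    username="Soviut")

        post = {"title": "Hello", "author": ref}
        # db.posts.insert_one(post)  # stores {$ref, $id, username} intact

        print(ref.username)  # extra fields are readable as attributes

    Whether every driver and tool preserves those extra fields when rewriting a reference is exactly the open part of the question.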

    Read the article

  • Can sphinx be used over cassandra?

    - by Mickey Shine
    I am planning to build a Cassandra-backed store, and I also need a full-text (Chinese) search system. Can Sphinx be used on top of Cassandra? (Sphinx supports an XML input format, but I am not going to use it because it is slow and much of the time is spent on XML parsing.) Alternatively, please share your experiences if you have ever built a full-text search system over Cassandra. Thank you.

    Read the article

  • Choosing a distributed shared memory solution

    - by mindas
    I have a task to build a prototype for a massively scalable distributed shared memory (DSM) app. The prototype would only serve as a proof of concept, but I want to spend my time most effectively by picking the components that would be used in the real solution later on. The aim of this solution is to take data input from an external source, churn it, and make the result available to a number of frontends. Those "frontends" just take the data from the cache and serve it without extra processing. The number of frontend hits on this data can literally be millions per second. The data itself is very volatile; it can (and does) change quite rapidly. However, the frontends should see "old" data until the newest has been processed and cached. The processing and writing is done by a single (redundant) node while the other nodes only read the data; in other words, no read-through behaviour. I was looking into solutions like memcached, but that particular one doesn't fulfil all our requirements, which are listed below:
    - The solution must at least have a Java client API which is reasonably well maintained, as the rest of the app is written in Java and we are seasoned Java developers.
    - The solution must be totally elastic: it should be possible to add new nodes without restarting other nodes in the cluster.
    - The solution must be able to handle failover. Yes, I realize this means some overhead, but the overall served data size isn't big (1 GB max), so this shouldn't be a problem. By "failover" I mean seamless execution without hardcoding/changing server IP addresses (as in memcached clients) when a node goes down.
    - Ideally it should be possible to specify the degree of data overlap (e.g. how many copies of the same data should be stored in the DSM cluster).
    - There is no need to permanently store all the data, but there might be a need to post-process some of it (e.g. serialization to the DB).
    - Price: obviously we prefer free/open source, but we're happy to pay a reasonable amount if a solution is worth it. Either way, a paid 24hr/day support contract is a must.
    - The whole thing has to be hosted in our data centers, so SaaS offerings like Amazon SimpleDB are out of scope. We would only consider them if no other options were available.
    - Ideally the solution would be strictly consistent (as in CAP); however, eventual consistency can be considered as an option.
    Thanks in advance for any ideas.

    Read the article

  • Is Using Python to MapReduce for Cassandra Dumb?

    - by UltimateBrent
    Since Cassandra doesn't have MapReduce built in yet (I think it's coming in 0.7), is it dumb to try to MapReduce with my Python client, or should I just use CouchDB or Mongo or something? The application is stats collection, so I need to be able to sum values with grouping, to increment counters. I'm not actually building Google Analytics, but pretend I am: I want to keep track of which browsers appear, which pages they went to, and visits vs. pageviews. I would just atomically update my counters on write, but Cassandra isn't very good at counters either. Maybe Cassandra just isn't the right choice for this? Thanks!
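    Client-side aggregation in Python is straightforward for grouped counts; the sketch below leans on a hypothetical fetch_rows() generator that pages through the Cassandra data, which is the part the Python Thrift client would have to supply:

        from collections import Counter

        def fetch_rows():
            # Hypothetical: page through the column family with the Python
            # client (e.g. range queries) and yield one dict per hit.
            raise NotImplementedError

        browsers = Counter()
        page_views = Counter()
        for row in fetch_rows():
            browsers[row["browser"]] += 1  # which browsers appear
            page_views[row["page"]] += 1   # pageviews per page

        print(browsers.most_common(10))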

    Read the article

  • Windows Azure Table Storage LINQ Operators

    - by Ryan Elkins
    Currently Table Storage supports From, Where, Take, and First. Are there plans to support any of the other 29 LINQ operators? If we have to code these ourselves, how much of a performance difference are we looking at compared to doing something similar via SQL and SQL Server? Do you see it being somewhat comparable, or will it be far slower if I need to do a Count or Sum or Group By over a gigantic dataset? I like the Azure platform and the idea of cloud-based storage. I like Windows Azure for the amount of data it can store and the schema-less nature of table storage. SQL Azure just won't work, due to the high cost of storage space.

    Read the article

  • What database works well with 200+GB of data?

    - by taw
    I've been using MySQL (with InnoDB, on Amazon RDS) because it's the sort-of-universal default, but it's been ridiculously under-performing, and tweaking it only delays the inevitable. The data is mostly relatively short blobs (<1 kB each) of information about 100M+ URLs. There is (or should be; MySQL can't seem to handle it) a very high volume of inserts/updates/retrievals but few complex queries. Not that complex queries wouldn't be useful, but MySQL is so slow that it's far faster to get the data out, process it locally, and cache the results somewhere. I can keep tweaking MySQL and throwing more hardware at it, but it seems increasingly futile. So what are the options? SQL/relational model/etc. is optional; anything will do as long as it's fast, networked, and language-independent.

    Read the article

  • How to differentiate between time to live and time to idle in ehcache

    - by Jacques René Mesrine
    The ehcache docs say: timeToIdleSeconds: Sets the time to idle for an element before it expires, i.e. the maximum amount of time between accesses before an element expires. timeToLiveSeconds: Sets the time to live for an element before it expires, i.e. the maximum time between creation time and when an element expires. I understand timeToIdleSeconds. But does it mean that after the creation and first access of a cache item, timeToLiveSeconds no longer applies?
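    Both settings apply at once: an element expires as soon as either limit is exceeded, so timeToLiveSeconds keeps counting from creation no matter how often the element is accessed. A small Python sketch of those semantics (illustrative only, not ehcache's implementation):

        import time

        class Entry:
            def __init__(self, value, ttl, tti):
                now = time.time()
                self.value = value
                self.created = now       # TTL is measured from here, forever
                self.last_access = now   # TTI is measured from here, resets
                self.ttl = ttl           # timeToLiveSeconds
                self.tti = tti           # timeToIdleSeconds

            def expired(self):
                now = time.time()
                # Expire when EITHER clock runs out: constant access resets
                # the idle clock but never the live clock.
                return (now - self.created > self.ttl or
                        now - self.last_access > self.tti)

            def get(self):
                if self.expired():
                    return None
                self.last_access = time.time()  # access resets TTI only
                return self.value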

    Read the article

  • inheritance in document database?

    - by nils petersohn
    I am wondering because I searched the PDFs "xxx: the definitive guide" and "beginning xxx" for the word "inheritance" but didn't find anything. Am I missing something? I am doing tablePerHierarchy inheritance with Hibernate and MySQL; has that become deprecated for some reason in xxx? (Replace xxx with the "not only SQL" database you like.)
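    Document databases haven't deprecated the idea; the common pattern is a single collection with a class discriminator, much like table-per-hierarchy. MongoEngine (to pick one "not only SQL" mapper) does this out of the box; a minimal sketch:

        from mongoengine import Document, StringField

        class Animal(Document):
            # Subclasses share one collection; MongoEngine records the
            # concrete class in a "_cls" field automatically.
            meta = {"allow_inheritance": True}
            name = StringField()

        class Dog(Animal):
            breed = StringField()

        Dog(name="Rex", breed="collie").save()
        print(Animal.objects(name="Rex").first())  # comes back as a Dog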

    Read the article

  • High performance querying - Suggestions please

    - by Alex Takitani
    Suppose I have millions of user profiles with hundreds of fields (name, gender, preferred pet and so on) and I want to run searches on those profiles, e.g. all profiles with age between x and y that love butterflies and hate chocolate. Which database would you choose? Assume a Facebook-like load. Speed is a must; open source preferred. I've read a lot about Cassandra, HBase, Mongo, MySQL... I just can't decide.

    Read the article

  • Cassandra Production ready on Windows?

    - by BlackTea
    Does anyone know of any success stories of Cassandra running on Windows in a production environment? I'm doing some work on Cassandra and trying to find the correct platform for it; currently the platform is Windows running MS-SQL as the data store. What are the disadvantages, if any, of running Cassandra in a Windows environment?

    Read the article
