data generation - Page 12

Data Structure Behind Amazon S3s Keys (Filtering Data Structure)

- by dimo414

I'd like to implement a data structure similar to the lookup functionality of Amazon S3. For those of you who don't know what I'm taking about, Amazon S3 stores all files at the root, but allows you to look up groups of files by common prefixes in their names, therefore replicating the power of a directory tree without the complexity of it. The catch is, both lookup and filter operations are O(1) (or close enough that even on very large buckets - S3's disk equivalents - both operations might as well be O(1))). So in short, I'm looking for a data structure that functions like a hash map, with the added benefit of efficient (at the very least not O(n)) filtering. The best I can come up with is extending HashMap so that it also contains a (sorted) list of contents, and doing a binary search for the range that matches the prefix, and returning that set. This seems slow to me, but I can't think of any other way to do it. Does anyone know either how Amazon does it, or a better way to implement this data structure?

Read the article

MySQL: Blank row in table after LOAD DATA INFILE

- by Tom

Hi, I'm uploading a large amount of data from a CSV (I'm doing it via MySQL Workbench): LOAD DATA INFILE 'C:/development/mydoc.csv' INTO TABLE mydatabase.mytable CHARACTER SET utf8 FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' LINES TERMINATED BY '\r'; However, I'm noticing that it keeps adding an empty line full of nulls/zeros after the last record. I'm guessing it's because of the "LINES TERMINATED" command. However, I need that to load the data in correctly. Is there some way around this / some better SQL to avoid the blank row in the table? Thanks

Read the article

Update tableview instantly as data pushed in core data iphone

- by user336685

I need to update the tableview as soon as the content is pushed in core data database. for this AppDelegate.m contains following code NSManagedObjectContext *moc = [self managedObjectContext]; NSFetchRequest *request = [[NSFetchRequest alloc] init]; [request setEntity:[NSEntityDescription entityForName:@"FeedItem" inManagedObjectContext:moc]]; //for loop // push data in code data & then save context [moc save:&error]; ZAssert(error == nil, @"Error saving context: %@", [error localizedDescription]); //for loop ends This code triggers following code from RootviewController.m - (void)controllerWillChangeContent:(NSFetchedResultsController*)controller { [[self tableView] beginUpdates]; } But this updates the tableview only at the end of the for loop ,the table does not get updated after immediate push in db. I tried following code but that didn't work - (void)controllerDidChangeContent:(NSFetchedResultsController *)controller { // In the simplest, most efficient, case, reload the table view. [self.tableView reloadData]; } I have been stuck with this problem for several days.Please help.Thanks in advance for solution.

Read the article

Core Data data type for just the date - not including time

- by Jason

I am new at Core Data, and it seems like it is a great way to manage the data store. However I am also very memory-conscious due to the fact that the iPhone doesn't have that much of it. I was a little surprised to see that the data types are so limited - eg. there is a Date type which includes also the time, but no Date type for just the date! All the time information takes up precious bytes of memory, if I just wanted an attribute with the date (e.g. 2/15/2010 rather than 2/15/2010 02:34:48), how could I do this? Is it possible?

Read the article

Long text input from user and PDF generation

- by Petteri Hietavirta

I have built a web application that can be seen as an overcomplicated application form. There are bunch of text areas with a given character limit. After the form submission various things happen and one of them is PDF generation. The text is queried from the DB and inserted in the PDF template created in iReports. This works fine but the major pain is overflowing text. The maximum number of characters is set based on 'average' text. But sometimes people prefer to write with CAPS or add plenty of linefeeds to format their text. These then cause user's text to overflow the space given in PDF. Unfortunately the PDF document must look like a real application form so I cannot allow unlimited space. What kind of approaches you have used to tackle this? Clean/restrict user input? Calculate the space requirement of the text based on font metrics? Provide preview of the PDF? (too bad users are not allowed to change their input after submission...)

Read the article

XSD, restrictions and code generation

- by bob

Hello, I'm working on some code generation for an existing project and I want to start from a xsd. So I can use tools as Xsd2Code / xsd.exe to generate the code and also the use the xsd to validate the xml. That part works without any problems. I also want to translate some of the restrictions to DataAnnotations (enrich Xsd2Code). For example xs:minInclusive / xs:maxInclusive I can translate to a RangeAttribute. But what to do with custom validation attributes that we created? Can I add custom facets / restrictions? And how? Or is there another solution / best practice. I would like to collect everything in a single (xsd) file so that one file contains the structure of the class (model) including the validation (attributes) that has to be added. <xs:element name="CertainValue"> <xs:simpleType> <xs:restriction base="xs:double"> <xs:minInclusive value="1" /> <xs:maxInclusive value="100" /> <xs_custom:customRule attribute="value" /> </xs:restriction> </xs:simpleType> </xs:element>

Read the article

Type-safe generic data structures in plain-old C?

- by Bradford Larsen

I have done far more C++ programming than "plain old C" programming. One thing I sorely miss when programming in plain C is type-safe generic data structures, which are provided in C++ via templates. For sake of concreteness, consider a generic singly linked list. In C++, it is a simple matter to define your own template class, and then instantiate it for the types you need. In C, I can think of a few ways of implementing a generic singly linked list: Write the linked list type(s) and supporting procedures once, using void pointers to go around the type system. Write preprocessor macros taking the necessary type names, etc, to generate a type-specific version of the data structure and supporting procedures. Use a more sophisticated, stand-alone tool to generate the code for the types you need. I don't like option 1, as it is subverts the type system, and would likely have worse performance than a specialized type-specific implementation. Using a uniform representation of the data structure for all types, and casting to/from void pointers, so far as I can see, necessitates an indirection that would be avoided by an implementation specialized for the element type. Option 2 doesn't require any extra tools, but it feels somewhat clunky, and could give bad compiler errors when used improperly. Option 3 could give better compiler error messages than option 2, as the specialized data structure code would reside in expanded form that could be opened in an editor and inspected by the programmer (as opposed to code generated by preprocessor macros). However, this option is the most heavyweight, a sort of "poor-man's templates". I have used this approach before, using a simple sed script to specialize a "templated" version of some C code. I would like to program my future "low-level" projects in C rather than C++, but have been frightened by the thought of rewriting common data structures for each specific type. What experience do people have with this issue? Are there good libraries of generic data structures and algorithms in C that do not go with Option 1 (i.e. casting to and from void pointers, which sacrifices type safety and adds a level of indirection)?

Read the article

Clever ways of implementing different data structures in C & data structures that should be used mor

- by Yktula

What are some clever (not ordinary) ways of implementing data structures in C, and what are some data structures that should be used more often? For example, what is the most effective way (generating minimal overhead) to implement a directed and cyclic graph with weighted edges in C? I know that we can store the distances in an array as is done here, but what other ways are there to implement this kind of a graph?

Read the article

Using GameKit to transfer CoreData data between iPhones, via NSDictionary

- by OscarTheGrouch

I have an application where I would like to exchange information, managed via Core Data, between two iPhones. First turning the Core Data object to an NSDictionary (something very simple that gets turned into NSData to be transferred). My CoreData has 3 string attributes, 2 image attributes that are transformables. I have looked through the NSDictionary API but have not had any luck with it, creating or adding the CoreData information to it. Any help or sample code regarding this would be greatly appreciated.

Read the article

Good practices for multiple language data in Core Data

- by Luca Bartoletti

Hi, i need a multilingual coredata db in my iphone app. I could create different database for each language but i hope that in iphone sdk exist an automatically way to manage data in different language core data like for resources and string. Someone have some hints?

Read the article

core-data relationships and data structure.

- by Boaz

What is the right way to build iPhone core data for this SMS like app (with location)? - I want to represent an entity of conversation with "profile1" "profile2" that heritage from a profile entity, and a message entity with: "to" "from" "body" where the "to" and "from" are equal to "profile1" and/or "profile2" in the conversation entity. How can I make such a relationships? is there a better way to represent the data (other structure)? Thanks

Read the article

convert unstructured data to structured data?

- by Codeguru

How to convert unstructured data into structured data?For example email,contacts to a structured format. Are there any algorithms to do this??

Read the article

When is the Data Vault model the right model for a data-warehouse?

- by Stephan Eggermont

I recently found a reference to 'Data Vault modeling' as a model for data-warehouses. The models I've seen before are Inmon and Kimball. The author refers to possible performance problems due to the joins needed. It looks like a nice model, but I wonder about the gotcha's. Are there any experience reports on-line?

Read the article

[Visual C++]Forcing memory alignment of variables/data-structures

- by John

I'm looking at using SSE and I gather aligning data on 16byte boundaries is recommended. There are two cases to consider: float data[4]; struct myystruct { float x,y,z,w; }; I'm not sure the first case can be done explicitly, though there's perhaps a compiler option I could use? In the second case I remember being able to control packing in old versions of GCC several years back, is this still possible?

Read the article

SQL SERVER – Data Pages in Buffer Pool – Data Stored in Memory Cache

- by pinaldave

This will drop all the clean buffers so we will be able to start again from there. Now, run the following script and check the execution plan of the query. Have you ever wondered what types of data are there in your cache? During SQL Server Trainings, I am usually asked if there is any way one can know how much data in a table is stored in the memory cache? The more detailed question I usually get is if there are multiple indexes on table (and used in a query), were the data of the single table stored multiple times in the memory cache or only for a single time? Here is a query you can run to figure out what kind of data is stored in the cache. USE AdventureWorks GO SELECT COUNT(*) AS cached_pages_count, name AS BaseTableName, IndexName, IndexTypeDesc FROM sys.dm_os_buffer_descriptors AS bd INNER JOIN ( SELECT s_obj.name, s_obj.index_id, s_obj.allocation_unit_id, s_obj.OBJECT_ID, i.name IndexName, i.type_desc IndexTypeDesc FROM ( SELECT OBJECT_NAME(OBJECT_ID) AS name, index_id ,allocation_unit_id, OBJECT_ID FROM sys.allocation_units AS au INNER JOIN sys.partitions AS p ON au.container_id = p.hobt_id AND (au.type = 1 OR au.type = 3) UNION ALL SELECT OBJECT_NAME(OBJECT_ID) AS name, index_id, allocation_unit_id, OBJECT_ID FROM sys.allocation_units AS au INNER JOIN sys.partitions AS p ON au.container_id = p.partition_id AND au.type = 2 ) AS s_obj LEFT JOIN sys.indexes i ON i.index_id = s_obj.index_id AND i.OBJECT_ID = s_obj.OBJECT_ID ) AS obj ON bd.allocation_unit_id = obj.allocation_unit_id WHERE database_id = DB_ID() GROUP BY name, index_id, IndexName, IndexTypeDesc ORDER BY cached_pages_count DESC; GO Now let us run the query above and observe the output of the same. We can see in the above query that there are four columns. Cached_Pages_Count lists the pages cached in the memory. BaseTableName lists the original base table from which data pages are cached. IndexName lists the name of the index from which pages are cached. IndexTypeDesc lists the type of index. Now, let us do one more experience here. Please note that you should not run this test on a production server as it can extremely reduce the performance of the database. DBCC DROPCLEANBUFFERS This will drop all the clean buffers and we will be able to start again from there. Now run following script and check the execution plan for the same. USE AdventureWorks GO SELECT UnitPrice, ModifiedDate FROM Sales.SalesOrderDetail WHERE SalesOrderDetailID BETWEEN 1 AND 100 GO The execution plans contain the usage of two different indexes. Now, let us run the script that checks the pages cached in SQL Server. It will give us the following output. It is clear from the Resultset that when more than one index is used, datapages related to both or all of the indexes are stored in Memory Cache separately. Let me know what you think of this article. I had a great pleasure while writing this article because I was able to write on this subject, which I like the most. In the next article, we will exactly see what data are cached and those that are not cached, using a few undocumented commands. Reference: Pinal Dave (http://blog.SQLAuthority.com) Filed under: DMV, Pinal Dave, SQL, SQL Authority, SQL Optimization, SQL Query, SQL Scripts, SQL Server, SQL Tips and Tricks, T SQL, Technology Tagged: SQL DMV

Read the article

Bitmask data insertions in SSDT Post-Deployment scripts

- by jamiet

On my current project we are using SQL Server Data Tools (SSDT) to manage our database schema and one of the tasks we need to do often is insert data into that schema once deployed; the typical method employed to do this is to leverage Post-Deployment scripts and that is exactly what we are doing. Our requirement is a little different though, our data is split up into various buckets that we need to selectively deploy on a case-by-case basis. I was going to use a SQLCMD variable for each bucket (defaulted to some value other than “Yes”) to define whether it should be deployed or not so we could use something like this in our Post-Deployment script: IF ($(DeployBucket1Flag) = 'Yes')BEGIN :r .\Bucket1.data.sqlENDIF ($(DeployBucket2Flag) = 'Yes')BEGIN :r .\Bucket2.data.sqlENDIF ($(DeployBucket3Flag) = 'Yes')BEGIN :r .\Bucket3.data.sqlEND That works fine and is, I’m sure, a very common technique for doing this. It is however slightly ugly because we have to litter our deployment with various SQLCMD variables. My colleague James Rowland-Jones (whom I’m sure many of you know) suggested another technique – bitmasks. I won’t go into detail about how this works (James has already done that at Using a Bitmask - a practical example) but I’ll summarise by saying that you can deploy different combinations of the buckets simply by supplying a different numerical value for a single SQLCMD variable. Each bit of that value’s binary representation signifies whether a particular bucket should be deployed or not. This is better demonstrated using the following simple script (which can be easily leveraged inside your Post-Deployment scripts): /* $(DeployData) is a SQLCMD variable that would, if you were using this in SSDT, be declared in the SQLCMD variables section of your project file. It should contain a numerical value, defaulted to 0. In this example I have declared it using a :setvar statement. Test the affect of different values by changing the :setvar statement accordingly. Examples: :setvar DeployData 1 will deploy bucket 1 :setvar DeployData 2 will deploy bucket 2 :setvar DeployData 3 will deploy buckets 1 & 2 :setvar DeployData 6 will deploy buckets 2 & 3 :setvar DeployData 31 will deploy buckets 1, 2, 3, 4 & 5 */ :setvar DeployData 0 DECLARE @bitmask VARBINARY(MAX) = CONVERT(VARBINARY,$(DeployData)); IF (@bitmask & 1 = 1) BEGIN PRINT 'Bucket 1 insertions'; END IF (@bitmask & 2 = 2) BEGIN PRINT 'Bucket 2 insertions'; END IF (@bitmask & 4 = 4) BEGIN PRINT 'Bucket 3 insertions'; END IF (@bitmask & 8 = 8) BEGIN PRINT 'Bucket 4 insertions'; END IF (@bitmask & 16 = 16) BEGIN PRINT 'Bucket 5 insertions'; END An example of running this using DeployData=6 The binary representation of 6 is 110. The second and third significant bits of that binary number are set to 1 and hence buckets 2 and 3 are “activated”. Hope that makes sense and is useful to some of you! @Jamiet P.S. I used the awesome HTML Copy feature of Visual Studio’s Productivity Power Tools in order to format the T-SQL code above for this blog post.

Read the article

Looking for Cutting-Edge Data Integration: 2014 Excellence Awards

- by Sandrine Riley

Normal 0 false false false EN-US X-NONE X-NONE MicrosoftInternetExplorer4 It is nomination time!!! This year's Oracle Fusion Middleware Excellence Awards will honor customers and partners who are creatively using various products across Oracle Fusion Middleware. Think you have something unique and innovative with one or a few of our Oracle Data Integration products? We would love to hear from you! Please submit today. The deadline for the nomination is June 20, 2014. What you win: An Oracle Fusion Middleware Innovation trophy One free pass to Oracle OpenWorld 2014 Priority consideration for placement in Profit magazine, Oracle Magazine, or other Oracle publications & press release Oracle Fusion Middleware Innovation logo for inclusion on your own Website and/or press release Let us reminisce a little… For details on the 2013 Data Integration Winners: Royal Bank of Scotland’s Market and International Banking and The Yalumba Wine Company, check out this blog post: 2013 Oracle Excellence Awards for Fusion Middleware Innovation… and the Winners for Data Integration are… and for details on the 2012 Data Integration Winners: Raymond James and Morrisons, check out this blog post: And the Winners of Fusion Middleware Innovation Awards in Data Integration are… Now to view the 2013 Winners (for all categories). We hope to honor you! Here's what you need to do: Click here to submit your nomination today. And just a reminder: the deadline to submit a nomination is 5pm Pacific Time on June 20, 2014. /* Style Definitions */ table.MsoNormalTable {mso-style-name:"Table Normal"; mso-tstyle-rowband-size:0; mso-tstyle-colband-size:0; mso-style-noshow:yes; mso-style-priority:99; mso-style-qformat:yes; mso-style-parent:""; mso-padding-alt:0in 5.4pt 0in 5.4pt; mso-para-margin-top:0in; mso-para-margin-right:0in; mso-para-margin-bottom:10.0pt; mso-para-margin-left:0in; line-height:115%; mso-pagination:widow-orphan; font-size:11.0pt; font-family:"Calibri","sans-serif"; mso-ascii-font-family:Calibri; mso-ascii-theme-font:minor-latin; mso-hansi-font-family:Calibri; mso-hansi-theme-font:minor-latin;}

Read the article

How do I create tileable solid noise for map generation?

- by nickbadal

Hey guys, I'm trying to figure out how to generate tileable fractals in code (for game maps, but that's irrelevant) I've been trying to modify the Solid Noise plug-in shipped with GIMP (with my extremely limited understanding of how the code works) but I cant get mine to work correctly. My modified code so far (Java) GIMP's solid noise module that I'm basing my code off of (C) Here is what I'm trying to achieve but This is what I'm getting So if anybody can see what I've done wrong, or has a suggestion how I could go about doing it differently, I'd greatly appreciate it. Thanks in advance. And if I'm asking way to much or if just a huge failure at life, I apologize.

Read the article

Help with Perl persistent data storage using Data::Dumper

- by stephenmm

I have been trying to figure this out for way to long tonight. I have googled it to death and none of the examples or my hacks of the examples are getting it done. It seems like this should be pretty easy but I just cannot get it. Here is the code: #!/usr/bin/perl -w use strict; use Data::Dumper; my $complex_variable = {}; my $MEMORY = "$ENV{HOME}/data/memory-file"; $complex_variable->{ 'key' } = 'value'; $complex_variable->{ 'key1' } = 'value1'; $complex_variable->{ 'key2' } = 'value2'; $complex_variable->{ 'key3' } = 'value3'; print Dumper($complex_variable)."TEST001\n"; open M, ">$MEMORY" or die; print M Data::Dumper->Dump([$complex_variable], ['$complex_variable']); close M; $complex_variable = {}; print Dumper($complex_variable)."TEST002\n"; # Then later to restore the value, it's simply: do $MEMORY; #eval $MEMORY; print Dumper($complex_variable)."TEST003\n"; And here is my output: $VAR1 = { 'key2' => 'value2', 'key1' => 'value1', 'key3' => 'value3', 'key' => 'value' }; TEST001 $VAR1 = {}; TEST002 $VAR1 = {}; TEST003 Everything that I read says that the TEST003 output should look identical to the TEST001 output which is exactly what I am trying to achieve. What am I missing here? Should I be "do"ing differently or should I be "eval"ing instead and if so how? Thanks for any help...

Read the article

Relational database data explorer / visualization?

- by Ian Boyd

Is there a tool that can let one browse relational data as a graph of connected nodes? For example, i'm faced with trying to cleanse some anomolous data. i can start with two offending rows. In this particular example, the TransactionID should, by business rules, be unique to the table, but i find a transaction that violates that rule: SELECT * FROM LCTTrans WHERE TransactionID = 1075048 LCTID TransactionID ========= ============= 4358 1075048 4359 1075048 2 row(s) affected But really what i want to begin to hunt down all the related data, to try to see which is right. So this hypothetical software would start by showing me these two rows: Next, i want to see that transaction that is linked into this table: Now that transaction points to an MAL, so show me that: Now lets add those two LCTs, that the transaction is "on". A transaction can be on only one LCT, yet this one is pointing to two: Okay computer, both of those LCTs point to an MAL and the transaction that created them, show me those: Those last two transactions, they also point at an MAL, and they themselves point to an LCT, show me those: Okay, now are there any entries in LCTTrans that point to LCTs 4358 or 4359?... And so on, and so on. Now i did all this manually, running single selects, copying and pasting uniqueidentifier keys and converting them into friendly id numbers so i could easily see the relationships. Is there software that can do this?

Read the article

How to properly set relationships in Core Data when using setValue and data already exists

- by ern

Let's say I have two objects: Articles and Categories. For the sake of this example all relevant categories have already been added to the data store. When looping through data that holds edits for articles, there is category relationship information that needs to be saved. I was planning on using the -setValue method in the Article class in order to set the relationships like so: - (void)setValue:(id)value forUndefinedKey:(NSString *)key { if([key isEqualToString:@"categories"]){ NSLog(@"trying to set categories..."); } } The problem is that value isn't a Category, it is just a string (or array of strings) holding the title of a category. I could certainly do a lookup within this method for each category and assign it, but that seems inefficient when processing a whole bunch of articles at once. Another option is to populate an array of all possible categories and just filter, but my question is where to store that array? Should it be a class method on Article? Is there a way to pass in additional data to the -setValue method? Is there another, better option for setting the relationship I'm not thinking of? Thanks for your help.

Read the article

Suggested Web Application Framework and Database for Enterprise, “Big-Data” App?

- by willOEM

I have a web application that I have been developing for a small group within my company over the past few years, using Pipeline Pilot (plus jQuery and Python scripting) for web development and back-end computation, and Oracle 10g for my RDBMS. Users upload experimental genomic data, which is parsed into a database, and made available for querying, transformation, and reporting. Experimental data sets are large and have many layers of metadata. A given experimental data record might have a foreign key relationship with a table that describes this data point's assay. Assays can cover multiple genes, which can have multiple transcript, which can have multiple mutations, which can affect multiple signaling pathways, etc. Users need to approach this data from any point in those layers in the metadata. Since all data sets for a given data type can run over a billion rows, this results in some large, dynamic queries that are hard to predict. New data sets are added on a weekly basis (~1GB per set). Experimental data is never updated, but the associated metadata can be updated weekly for a few records and yearly for most others. For every data set insert the system sees, there will be between 10 and 100 selects run against it and associated data. It is okay for updates and inserts to run slow, so long as queries run quick and are as up-to-date as possible. The application continues to grow in size and scope and is already starting to run slower than I like. I am worried that we have about outgrown Pipeline Pilot, and perhaps Oracle (as the sole database). Would a NoSQL database or an OLAP system be appropriate here? What web application frameworks work well with systems like this? I'd like the solution to be something scalable, portable and supportable X-years down the road. Here is the current state of the application: Web Server/Data Processing: Pipeline Pilot on Windows Server + IIS Database: Oracle 10g, ~1TB of data, ~180 tables with several billion-plus row tables Network Storage: Isilon, ~50TB of low-priority raw data

Read the article

Simple ADF page using BAM Data Control

- by [email protected]

var gaJsHost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."); document.write(unescape("%3Cscript src='" + gaJsHost + "google-analytics.com/ga.js' type='text/javascript'%3E%3C/script%3E")); try { var pageTracker = _gat._getTracker("UA-15829414-1"); pageTracker._trackPageview(); } catch(err) {} Purpose : In this blog I will walk you through very simple steps to create an ADF page using BAM data control connection.Details : Create the projectOpen JDeveloper (make sure you have installed the SOA extension for JDev)Create new Application using "Generic Application" template.Click on "Next"Shuttle "ADF Faces" to right pane for the project technology.Click "Finish"Create a BAM connectionIn the resource palette click on "Folder->New Connection -> BAM"Enter the connection name and click "Next"Enter Connection details Click on "Test connection" and "Finish"Create the BAM Data Control Open the IDE connection created in above step.Drag and drop "Employees" to "Data controls" palette.Select "Flat Query" and Click "Finish".Create the View Create a new JSF page.From Data control Panel drag and drop "Employees->Query->ADF Read Only table"Right click and Run the page.

Read the article

Data Aggregation of CSV files java

- by royB

I have k csv files (5 csv files for example), each file has m fields which produce a key and n values. I need to produce a single csv file with aggregated data. I'm looking for the most efficient solution for this problem, speed mainly. I don't think by the way that we will have memory issues. Also I would like to know if hashing is really a good solution because we will have to use 64 bit hashing solution to reduce the chance for a collision to less than 1% (we are having around 30000000 rows per aggregation). For example file 1: f1,f2,f3,v1,v2,v3,v4 a1,b1,c1,50,60,70,80 a3,b2,c4,60,60,80,90 file 2: f1,f2,f3,v1,v2,v3,v4 a1,b1,c1,30,50,90,40 a3,b2,c4,30,70,50,90 result: f1,f2,f3,v1,v2,v3,v4 a1,b1,c1,80,110,160,120 a3,b2,c4,90,130,130,180 algorithm that we thought until now: hashing (using concurentHashTable) merge sorting the files DB: using mysql or hadoop or redis. The solution needs to be able to handle Huge amount of data (each file more than two million rows) a better example: file 1 country,city,peopleNum england,london,1000000 england,coventry,500000 file 2: country,city,peopleNum england,london,500000 england,coventry,500000 england,manchester,500000 merged file: country,city,peopleNum england,london,1500000 england,coventry,1000000 england,manchester,500000 The key is: country,city. This is just an example, my real key is of size 6 and the data columns are of size 8 - total of 14 columns. We would like that the solution will be the fastest in regard of data processing.

Read the article

SQL – Download FREE Book – Data Access for HighlyScalable Solutions: Using SQL, NoSQL, and Polyglot Persistence

- by Pinal Dave

Recently I was preparing for Big Data and I ended up on very interesting read for everybody. This is created by Microsoft and it is indeed a fantastic read as per my opinion. It took me some time to read this entire book but it was worth reading this as it tried to answer two of the very interesting questions related to muscle. Here is the abstract from the book: Organizations seeking to use a NoSQL database are therefore faced with a twofold challenge: • Which NoSQL database(s) best meet(s) the needs of the organization? • How does an organization integrate a NoSQL database into its solutions? As I keep on reading the book, I find it very interesting and informative. I suggest if you have time this weekend, download the book and read it. This guide focuses on the most common types of NoSQL database currently available, describes the situations for which they are most suited, and shows examples of how you might incorporate them into a business application. The guide summarizes the experiences of a fictitious organization named Adventure Works, who implemented a solution that comprised an assortment of different databases. Download Data Access for HighlyScalable Solutions: Using SQL, NoSQL, and Polyglot Persistence While we are talking about Big Data and NoSQL do not forget to check out my tomorrow’s blog as I am going to talk about the same subject and it will be very interesting. Reference: Pinal Dave (http://blog.sqlauthority.com) Filed under: Big Data, NoSQL, PostADay, SQL, SQL Authority, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

Search Results

Search found 60391 results on 2416 pages for 'data generation'.

Page 12/2416 | < Previous Page | 8 9 10 11 12 13 14 15 16 17 18 19 | Next Page >

- by dimo414

- by Tom

- by user336685

- by Jason

- by Petteri Hietavirta

- by bob

- by Bradford Larsen

- by Yktula

- by OscarTheGrouch

- by Luca Bartoletti

- by Boaz

- by Codeguru

- by Stephan Eggermont

- by John

- by pinaldave

- by jamiet

- by Sandrine Riley

- by nickbadal

- by stephenmm

- by Ian Boyd

- by ern

- by willOEM

- by [email protected]

- by royB

- by Pinal Dave

< Previous Page | 8 9 10 11 12 13 14 15 16 17 18 19 | Next Page >