large scale nat - Page 69

XML streaming with XProc.

- by Pierre

Hi all, I'm playing with XProc, the XML pipeline language, and XML Calabash (http://xmlcalabash.com/). I'd like to find an example of streaming large XML documents. For example, given the following huge XML document:

<Books>
  <Book> <title>Book-1</title> </Book>
  <Book> <title>Book-2</title> </Book>
  <Book> <title>Book-3</title> </Book>
  <Book> <title>Book-N</title> </Book>
</Books>

How should I proceed to loop (in a streaming fashion) over the N documents of the form

<Books>
  <Book> <title>Book-x</title> </Book>
</Books>

and process each document with an XSLT stylesheet? Is this possible with XProc?
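To make the per-record streaming idea concrete, here is a minimal sketch in plain Python rather than XProc. It uses the standard library's xml.etree.ElementTree.iterparse and makes no claim about what XProc or XML Calabash can stream. The function name iter_book_documents and the sample input are assumptions made up for illustration. Each completed <Book> is wrapped in its own small <Books> document, which could then be handed to an XSLT processor.

```python
import io
import xml.etree.ElementTree as ET


def iter_book_documents(source):
    """Yield one '<Books><Book>...</Book></Books>' string per <Book> element.

    iter_book_documents is a hypothetical helper name, not part of any library.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Book":
            wrapper = ET.Element("Books")
            wrapper.append(elem)
            yield ET.tostring(wrapper, encoding="unicode")
            # Drop the children of the processed <Book> to keep memory low.
            # The emptied element itself stays attached to the root.
            elem.clear()


# Tiny stand-in for the huge input file (assumed sample data).
sample = io.BytesIO(
    b"<Books><Book><title>Book-1</title></Book>"
    b"<Book><title>Book-2</title></Book></Books>"
)
docs = list(iter_book_documents(sample))
for doc in docs:
    print(doc)
```

For a real file, the path would be passed instead of the BytesIO object, and each yielded string would be transformed instead of printed.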


Unable to load huge XML document (incorrectly assumed to be due to the XSLT processing)

- by krisvandenbergh

I'm trying to match certain elements using XSLT. My input document is very large, and the source XML fails to load after processing the following code (consider especially the first line).

<xsl:template match="XMI/XMI.content/Model_Management.Model/Foundation.Core.Namespace.ownedElement/Model_Management.Package/Foundation.Core.Namespace.ownedElement">
  <rdf:RDF>
    <rdf:Description rdf:about="">
      <xsl:for-each select="Foundation.Core.Class">
        <xsl:for-each select="Foundation.Core.ModelElement.name">
          <owl:Class rdf:ID="@Foundation.Core.ModelElement.name" />
        </xsl:for-each>
      </xsl:for-each>
    </rdf:Description>
  </rdf:RDF>
</xsl:template>

Apparently the XSLT fails to load after "Model_Management.Model". The PHP code is as follows:

if ($xml->loadXML($source_xml) == false) {
    die('Failed to load source XML: ' . $http_file);
}

The loadXML call fails and the script immediately dies. I think there are two options now:

1) I should set a maximum execution time. Frankly, I don't know how to do this for the built-in PHP 5 XSLT processor.
2) Think about another way to match.

What would be the best way to deal with this? The input document can be found at http://krisvandenbergh.be/uml_pricing.xml

Any help would be appreciated! Thanks.


Copying a foreign Subversion repository to keep under dependencies

- by Jonathan Sternberg

I want to keep dependencies for my project in our own repository, that way we have consistent libraries for the entire team to work with. For example, I want our project to use the Boost libraries. I've seen this done in the past with putting dependencies under a "vendor" or "dependencies" folder. But I still want to be able to update these dependencies. If a new feature appears in a library and we need it, I want to just be able to update that repository within our own repository. I don't want to have to recopy it and put it under version control again. I'd also like for us to have the ability to change dependencies if a small change is needed without stopping us from ever updating the library. I want the ability to do something like 'svn cp', then be able to 'svn merge' in the future. I just tried this with the boost trunk, but I'm not able to get any history using 'svn log' on the copy I made. How do I do this? What is usually done for large projects with dependencies?


Improving File Read Performance (single file, C++, Windows)

- by david

I have large (hundreds of MB or more) files that I need to read blocks from using C++ on Windows. Currently the relevant functions are:

errorType LargeFile::read( void* data_out, __int64 start_position, __int64 size_bytes ) const
{
    if( !m_open )
    {
        // return error
    }
    else
    {
        seekPosition( start_position );
        DWORD bytes_read;
        BOOL result = ReadFile( m_file, data_out, DWORD( size_bytes ), &bytes_read, NULL );
        if( size_bytes != bytes_read || result != TRUE )
        {
            // return error
        }
    }
    // return no error
}

void LargeFile::seekPosition( __int64 position ) const
{
    LARGE_INTEGER target;
    target.QuadPart = LONGLONG( position );
    SetFilePointerEx( m_file, target, NULL, FILE_BEGIN );
}

The performance of the above does not seem to be very good. Reads are of 4K blocks of the file. Some reads are coherent, most are not. A couple of questions: Is there a good way to profile the reads? What things might improve the performance? For example, would sector-aligning the data be useful? I'm relatively new to file I/O optimization, so suggestions or pointers to articles/tutorials would be helpful.
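One direction that might be worth measuring for scattered 4K reads is memory-mapping the file and visiting the requested offsets in ascending order, so the operating system's page cache does the buffering. The sketch below is Python, not the C++/Win32 code from the question, and it is only an illustration of the idea. Whether it actually helps depends on the access pattern and would need profiling. The function name read_blocks is hypothetical. On Windows, the corresponding native calls are CreateFileMapping and MapViewOfFile.

```python
import mmap

BLOCK_SIZE = 4096  # the 4K block size mentioned in the question


def read_blocks(path, offsets, size=BLOCK_SIZE):
    """Return {offset: bytes} for each requested block.

    Offsets are visited in ascending order so that neighbouring blocks
    are touched together. read_blocks is a hypothetical helper name.
    """
    blocks = {}
    with open(path, "rb") as f:
        # Length 0 maps the whole file; the file must not be empty.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as view:
            for offset in sorted(offsets):
                blocks[offset] = view[offset:offset + size]
    return blocks
```

Timing a batch of reads through this function against the same batch done with seek-and-read is a simple first profiling step.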


Read/Write/Find/Replace huge csv file

- by notapipe

I have a huge (4.5 GB) CSV file. I need to perform basic cut-and-paste and replace operations on some columns. The data is pretty well organized; the only problem is that I cannot work with it in Excel because of the size (2000 rows, 550000 columns). Here is part of the data:

ID,Affection,Sex,DRB1_1,DRB1_2,SENum,SEStatus,AntiCCP,RFUW,rs3094315,rs12562034,rs3934834,rs9442372,rs3737728
D0024949,0,F,0101,0401,SS,yes,?,?,A_A,A_A,G_G,G_G
D0024302,0,F,0101,7,SN,yes,?,?,A_A,G_G,A_G,?_?
D0023151,0,F,0101,11,SN,yes,?,?,A_A,G_G,G_G,G_G

I need to:

- remove the 4th, 5th, 6th, 7th, 8th and 9th columns;
- find every _ character from column 10 onwards and replace it with a space ( ) character;
- replace every ? with zero (0);
- replace every comma with a tab;
- remove the first row (which has the column names);
- replace every 0 with 1, every 1 with 2 and every ? with 0 in the 2nd column;
- replace F with 2, M with 1 and ? with 0 in the 3rd column;

so that in the resulting file the output reads:

D0024949 1 2 A A A A G G G G
D0024302 1 2 A A G G A G 0 0
D0023151 1 2 A A G G G G G G

(both input and output should have one line per row, with no extra blank rows)

Is there a memory-efficient way of doing that with Java (and I need code to do that), or a usable tool for working with this large data so that I can easily apply Excel functionality?
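The listed rules can be applied one row at a time, so only a single line needs to be in memory at once. The sketch below is in Python rather than the Java the question asks for, and it is a sketch under assumptions rather than a finished tool: it assumes the file is plain comma-separated text without quoted fields, and the names convert_row and convert are made up for illustration.

```python
# Recoding tables taken from the rules in the question.
AFFECTION = {"0": "1", "1": "2", "?": "0"}
SEX = {"F": "2", "M": "1", "?": "0"}


def convert_row(line):
    """Turn one comma-separated input row into one tab-separated output row."""
    fields = line.rstrip("\r\n").split(",")
    ident, affection, sex = fields[0], fields[1], fields[2]
    # Columns 4-9 (indexes 3-8) are dropped; column 10 onwards is kept.
    genotypes = [g.replace("?", "0").replace("_", " ") for g in fields[9:]]
    recoded = [ident, AFFECTION.get(affection, affection), SEX.get(sex, sex)]
    return "\t".join(recoded + genotypes)


def convert(src, dst):
    """Stream rows from src to dst, skipping the header and blank lines."""
    next(src, None)  # the first row holds the column names
    for line in src:
        if line.strip():
            dst.write(convert_row(line) + "\n")
```

With real files this would be called as convert(open("in.csv"), open("out.tsv", "w")), where both file names are placeholders.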


MySQL - What is wrong with this query or my database? Terrible performance.

- by Moss

SELECT * FROM `employees` a
LEFT JOIN (SELECT phone1 p1, count(*) c FROM `employees` GROUP BY phone1) b
ON a.phone1 = b.p1;

I'm not sure if it is this query in particular that has the problem. I have been getting terrible performance in general with this database. The table in question has 120,000 rows. I have tried this particular query remotely and locally with the MyISAM and InnoDB engines, with different types of joins, and with and without an index on phone1. I can get this to complete in about 4 minutes on a 10,000 row table, but performance drops exponentially with larger tables. Remotely it loses the connection to the server, and locally it brings my system to its knees and seems to go on forever. This query is only a smaller step I was trying when a larger query couldn't complete.

Maybe I should explain the whole scenario. I have one big flat ugly table that lists a bunch of people and their contact info and the info of the companies they work for. I'm trying to normalize the database and intelligently determine which phone numbers apply to individual people and which apply to an office location. My reasoning is that if a phone number occurs multiple times, and the number of occurrences equals the number of times that the street address it is attached to occurs, then it must be an office number. So the first step is to count each phone number, grouping by phone number. Normally if you just use COUNT()...GROUP BY it will only list the first record it finds in that group, so I figured I have to join the full table to the count table where the phone number matches. This does work, but as I said I can't successfully complete it on any table much larger than 10,000 rows. This seems pathetic, and this doesn't seem like a crazy query to do. Is there a better way to achieve what I want, or do I have to break my large table into 12 pieces, or is there something wrong with the table or db?
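The shape of the query (count per phone number, joined back to every row) can be tried out on a toy table. The sketch below uses Python's built-in sqlite3 module with an in-memory SQLite database, not MySQL, so it says nothing about the MySQL performance problem itself. The table contents and the index name idx_phone1 are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, phone1 TEXT)")
# An index on the grouping/join column (hypothetical name).
conn.execute("CREATE INDEX idx_phone1 ON employees (phone1)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?)",
    [("Ann", "555-0100"), ("Bob", "555-0100"), ("Cy", "555-0199")],
)

# Same structure as the query in the question, without the stray comma.
rows = conn.execute(
    """
    SELECT a.name, a.phone1, b.c
    FROM employees a
    LEFT JOIN (SELECT phone1 AS p1, COUNT(*) AS c
               FROM employees
               GROUP BY phone1) b
      ON a.phone1 = b.p1
    ORDER BY a.name
    """
).fetchall()
for row in rows:
    print(row)
```

Each person is listed once, together with how many rows share that phone number.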


Given a trace of packets, how would you group them into flows?

- by zxcvbnm

I've tried it these ways so far:

1) Make a hash with the source IP/port and destination IP/port as keys. Each position in the hash is a list of packets. The hash is then saved in a file, with each flow separated by some special characters/line. Problem: Not enough memory for large traces.

2) Make a hash with the same key as above, but only keep in memory the file handles. Each packet is then put into the hash[key] that points to the right file. Problems: Too many flows/files (~200k) and it might run out of memory as well.

3) Hash the source IP/port and destination IP/port, then put the info inside a file. The difference between 2 and 3 is that here the files are opened and closed for each operation, so I don't have to worry about running out of memory because I opened too many at the same time. Problems: WAY too slow, same number of files as 2 so also impractical.

4) Make a hash of the source IP/port pairs and then iterate over the whole trace for each flow. Take the packets that are part of that flow and place them into the output file. Problem: Suppose I have a 60 MB trace that has 200k flows. This way, I would process, say, a 60 MB file 200k times. Maybe removing the packets as I iterate would make it not so painful, but so far I'm not sure this would be a good solution.

5) Split them by IP source/destination and then create a single file for each one, separating the flows by special characters. Still too many files (+50k).

Right now I'm using Ruby to do it, which might've been a bad idea, I guess. Currently I've filtered the traces with tshark so that they only have relevant info, so I can't really make them any smaller. I thought about loading everything in memory as described in 1) using C#/Java/C++, but I was wondering if there wouldn't be a better approach here, especially since I might also run out of memory later on even with a more efficient language if I have to use larger traces.

In summary, the problem I'm facing is that I either have too many files or that I run out of memory. I've also tried searching for some tool to filter the info, but I don't think there is one. The ones I've found only return some statistics and wouldn't scan for every flow as I need.
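A middle ground between "one file per flow" and "everything in memory" is to hash each flow key into a small, fixed number of bucket files, then group each bucket in memory on its own. This is not one of the approaches listed above; it is a suggestion sketched in Python rather than Ruby. It assumes each packet is one text line of the form src_ip,src_port,dst_ip,dst_port,rest, which is an invented format, and the names flow_key, partition and flows_in_bucket are hypothetical.

```python
import os
import zlib
from collections import defaultdict


def flow_key(packet_line):
    """Extract (src_ip, src_port, dst_ip, dst_port) from an assumed CSV line."""
    src_ip, src_port, dst_ip, dst_port, _rest = packet_line.split(",", 4)
    return (src_ip, src_port, dst_ip, dst_port)


def partition(lines, workdir, buckets=16):
    """Write each packet line to one of `buckets` files chosen by its flow key."""
    handles = [
        open(os.path.join(workdir, f"bucket-{i}.txt"), "w") for i in range(buckets)
    ]
    try:
        for line in lines:
            line = line.rstrip("\n")
            key = ",".join(flow_key(line)).encode()
            # crc32 is deterministic, so a flow always lands in the same bucket.
            handles[zlib.crc32(key) % buckets].write(line + "\n")
    finally:
        for handle in handles:
            handle.close()
    return [handle.name for handle in handles]


def flows_in_bucket(path):
    """Group the packets of a single bucket file by flow, in memory."""
    flows = defaultdict(list)
    with open(path) as f:
        for line in f:
            line = line.rstrip("\n")
            flows[flow_key(line)].append(line)
    return flows
```

The number of open files is bounded by the bucket count, and memory use is bounded by the largest bucket, so the bucket count can be raised for larger traces.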


How to make a Flash movie scale proportionately to div width?

- by user73119

I have put together an example page detailing my problem. My website is going to have a main wrapper with a max-width property set for compatible browsers. It will stretch to 940px across at most. When it is scaled down, I would like the swf to scale proportionately with it, like an image with a percentage width applied. The Flash movie has dimensions of 940 × 360 pixels. I can't seem to figure out the correct attributes to add to the embed tag to get it to do this. I am currently using jQuery flash embed, but am open to other options, though this is my ideal. In the example I have set the Flash background to black. When I resize the browser window, the Flash movie doesn't scale proportionately to the div; only the photo does, leaving a blank (black) canvas, while the div height stays the same. I can't add a height value in the CSS. How do I make this scale correctly? Adding a noscale param only crops the image, and the swf's height doesn't scale either. All of my code can be viewed in the linked example's source.


Music and Mathematics. Finding the natural scale mathematically. Is this correct?

- by Alfonso de la Osa

Hi! I wrote this post: "Music and Mathematics, finding the Natural and the Pentatonic scales. Central A at 383,56661 Hz." It is a method to find the Natural scale. I want to discuss it and find out whether it is correct. This is the code of the reasoning in JS:

<script>
var c = 1.714285714285714;
var tot = 0;
var scale = [];
while(tot < (14 - c)){
    tot += c;
    scale.push(Math.round(tot));
}
if(scale.length == 8){
    document.write(scale + " " + c + "<br />");
}
</script>


Music and Mathematics. Finding the natural scale generator. The best way?

- by Alfonso de la Osa

Hi! I wrote this post: "Music and Mathematics, finding the Natural and the Pentatonic scales." It is a method to find the Natural scale. I want to discuss it and find out whether it is correct. This is the code of the reasoning in JS:

<script>
var c = 12/7;
var tot = 0;
var scale = [];
while(tot < (14 - c)){
    tot += c;
    scale.push(Math.round(tot));
}
if(scale.length == 8){
    document.write(scale + " " + c + "<br />");
}
</script>
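To see what the snippet actually produces, here is a direct port to Python. It is only a trace of the posted loop, not a judgement on the musical reasoning. The helper js_round is an assumption added because Python's built-in round() rounds halves to even, while JavaScript's Math.round rounds halves up; the names build_scale, step and limit are likewise made up for the port.

```python
import math


def js_round(x):
    """Mimic JavaScript's Math.round for positive numbers (halves round up)."""
    return math.floor(x + 0.5)


def build_scale(step=12 / 7, limit=14):
    """Accumulate `step` and record the rounded running total, as in the JS loop."""
    total = 0.0
    scale = []
    while total < limit - step:
        total += step
        scale.append(js_round(total))
    return scale


scale = build_scale()
print(scale)  # [2, 3, 5, 7, 9, 10, 12, 14]
```

The loop yields eight values, and the gaps between neighbouring values are 1, 2, 2, 2, 1, 2, 2.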


Windows Azure Service Bus Splitter and Aggregator

- by Alan Smith

This article will cover basic implementations of the Splitter and Aggregator patterns using the Windows Azure Service Bus. The content will be included in the next release of the “Windows Azure Service Bus Developer Guide”, along with some other patterns I am working on. I’ve taken the pattern descriptions from the book “Enterprise Integration Patterns” by Gregor Hohpe. I bought a copy of the book in 2004, and recently dusted it off when I started to look at implementing the patterns on the Windows Azure Service Bus. Gregor also presented a session in 2011, “Enterprise Integration Patterns: Past, Present and Future”, which is well worth a look. I’ll be covering more patterns in the coming weeks; I’m currently working on Wire-Tap and Scatter-Gather. There will no doubt be a section on implementing these patterns in my “SOA, Connectivity and Integration using the Windows Azure Service Bus” course.

There are a number of scenarios where a message needs to be divided into a number of sub messages, and also where a number of sub messages need to be combined to form one message. The splitter and aggregator patterns provide a definition of how this can be achieved. This section will focus on the implementation of basic splitter and aggregator patterns using the Windows Azure Service Bus direct programming model. In BizTalk Server, receive pipelines are typically used to implement the splitter pattern, with sequential convoy orchestrations often used to aggregate messages. In the current release of the Service Bus, there is no functionality in the direct programming model that implements these patterns, so it is up to the developer to implement them in the applications that send and receive messages.

Splitter

A message splitter takes a message and splits it into a number of sub messages. As there are different scenarios for how a message can be split into sub messages, message splitters are implemented using different algorithms. The Enterprise Integration Patterns book describes the splitter pattern as follows:

How can we process a message if it contains multiple elements, each of which may have to be processed in a different way? Use a Splitter to break out the composite message into a series of individual messages, each containing data related to one item.

The Enterprise Integration Patterns website provides a description of the Splitter pattern here. In some scenarios a batch message could be split into the sub messages that are contained in the batch. The splitting of a message could be based on the message type of the sub message, or the trading partner that the sub message is to be sent to.

Aggregator

An aggregator takes a stream of related messages and combines them to form one message. The Enterprise Integration Patterns book describes the aggregator pattern as follows:

How do we combine the results of individual, but related messages so that they can be processed as a whole? Use a stateful filter, an Aggregator, to collect and store individual messages until a complete set of related messages has been received. Then, the Aggregator publishes a single message distilled from the individual messages.

The Enterprise Integration Patterns website provides a description of the Aggregator pattern here. A common example of the need for an aggregator is in scenarios where a stream of messages needs to be combined into a daily batch to be sent to a legacy line-of-business application. The BizTalk Server EDI functionality provides support for batching messages in this way using a sequential convoy orchestration.

Scenario

The scenario for this implementation of the splitter and aggregator patterns is the sending and receiving of large messages using a Service Bus queue. In the current release, the Windows Azure Service Bus supports a maximum message size of 256 KB, with a maximum header size of 64 KB. This leaves a safe maximum body size of 192 KB.
The BrokeredMessage class will support messages larger than 256 KB; in fact the Size property is of type long, implying that very large messages may be supported at some point in the future. The 256 KB size restriction is set in the service bus components that are deployed in the Windows Azure data centers. One of the ways of working around this size restriction is to split large messages into a sequence of smaller sub messages in the sending application, send them via a queue, and then reassemble them in the receiving application. This scenario will be used to demonstrate the pattern implementations.

Implementation

The splitter and aggregator will be used to provide functionality to send and receive large messages over the Windows Azure Service Bus. In order to make the implementations generic and reusable they will be implemented as a class library. The splitter will be implemented in the LargeMessageSender class and the aggregator in the LargeMessageReceiver class. A class diagram showing the two classes is shown below.

Implementing the Splitter

The splitter will take a large brokered message and split it into a sequence of smaller sub messages that can be transmitted over the service bus messaging entities. The LargeMessageSender class provides a Send method that takes a large brokered message as a parameter. The implementation of the class is shown below; console output has been added to provide details of the splitting operation.

public class LargeMessageSender
{
    private static int SubMessageBodySize = 192 * 1024;
    private QueueClient m_QueueClient;

    public LargeMessageSender(QueueClient queueClient)
    {
        m_QueueClient = queueClient;
    }

    public void Send(BrokeredMessage message)
    {
        // Calculate the number of sub messages required.
        long messageBodySize = message.Size;
        int nrSubMessages = (int)(messageBodySize / SubMessageBodySize);
        if (messageBodySize % SubMessageBodySize != 0)
        {
            nrSubMessages++;
        }

        // Create a unique session Id.
        string sessionId = Guid.NewGuid().ToString();
        Console.WriteLine("Message session Id: " + sessionId);
        Console.Write("Sending {0} sub-messages", nrSubMessages);

        Stream bodyStream = message.GetBody<Stream>();
        for (int streamOffset = 0; streamOffset < messageBodySize; streamOffset += SubMessageBodySize)
        {
            // Get the stream chunk from the large message
            long arraySize = (messageBodySize - streamOffset) > SubMessageBodySize
                ? SubMessageBodySize : messageBodySize - streamOffset;
            byte[] subMessageBytes = new byte[arraySize];
            int result = bodyStream.Read(subMessageBytes, 0, (int)arraySize);
            MemoryStream subMessageStream = new MemoryStream(subMessageBytes);

            // Create a new message
            BrokeredMessage subMessage = new BrokeredMessage(subMessageStream, true);
            subMessage.SessionId = sessionId;

            // Send the message
            m_QueueClient.Send(subMessage);
            Console.Write(".");
        }
        Console.WriteLine("Done!");
    }
}

The LargeMessageSender class is initialized with a QueueClient that is created by the sending application. When the large message is sent, the number of sub messages is calculated based on the size of the body of the large message. A unique session Id is created to allow the sub messages to be sent as a message session; this session Id will be used for correlation in the aggregator. A for loop is then used to create the sequence of sub messages by creating chunks of data from the stream of the large message. The sub messages are then sent to the queue using the QueueClient. As sessions are used to correlate the messages, the queue used for message exchange must be created with the RequiresSession property set to true.

Implementing the Aggregator

The aggregator will receive the sub messages in the message session that was created by the splitter, and combine them to form a single, large message. The aggregator is implemented in the LargeMessageReceiver class, with a Receive method that returns a BrokeredMessage. The implementation of the class is shown below; console output has been added to provide details of the aggregation operation.

public class LargeMessageReceiver
{
    private QueueClient m_QueueClient;

    public LargeMessageReceiver(QueueClient queueClient)
    {
        m_QueueClient = queueClient;
    }

    public BrokeredMessage Receive()
    {
        // Create a memory stream to store the large message body.
        MemoryStream largeMessageStream = new MemoryStream();

        // Accept a message session from the queue.
        MessageSession session = m_QueueClient.AcceptMessageSession();
        Console.WriteLine("Message session Id: " + session.SessionId);
        Console.Write("Receiving sub messages");

        while (true)
        {
            // Receive a sub message
            BrokeredMessage subMessage = session.Receive(TimeSpan.FromSeconds(5));
            if (subMessage != null)
            {
                // Copy the sub message body to the large message stream.
                Stream subMessageStream = subMessage.GetBody<Stream>();
                subMessageStream.CopyTo(largeMessageStream);

                // Mark the message as complete.
                subMessage.Complete();
                Console.Write(".");
            }
            else
            {
                // The last message in the sequence is our completeness criteria.
                Console.WriteLine("Done!");
                break;
            }
        }

        // Create an aggregated message from the large message stream.
        BrokeredMessage largeMessage = new BrokeredMessage(largeMessageStream, true);
        return largeMessage;
    }
}

The LargeMessageReceiver is initialized using a QueueClient that is created by the receiving application. The Receive method creates a memory stream that will be used to aggregate the large message body. The AcceptMessageSession method on the QueueClient is then called, which will wait for the first message in a message session to become available on the queue. As AcceptMessageSession can throw a timeout exception if no message is available on the queue after 60 seconds, a real-world implementation should handle this accordingly. Once the message session is accepted, the sub messages in the session are received, and their message body streams are copied to the memory stream.
Once all the messages have been received, the memory stream is used to create a large message, which is then returned to the receiving application.

Testing the Implementation

The splitter and aggregator are tested by creating a message sender and a message receiver application. The payload for the large message will be one of the webcast video files from http://www.cloudcasts.net/; the file size is 9,697 KB, well over the 256 KB threshold imposed by the Service Bus. As the splitter and aggregator are implemented in a separate class library, the code used in the sender and receiver console applications is fairly basic. The implementation of the main method of the sending application is shown below.

static void Main(string[] args)
{
    // Create a token provider with the relevant credentials.
    TokenProvider credentials = TokenProvider.CreateSharedSecretTokenProvider(
        AccountDetails.Name, AccountDetails.Key);

    // Create a URI for the service bus.
    Uri serviceBusUri = ServiceBusEnvironment.CreateServiceUri(
        "sb", AccountDetails.Namespace, string.Empty);

    // Create the MessagingFactory
    MessagingFactory factory = MessagingFactory.Create(serviceBusUri, credentials);

    // Use the MessagingFactory to create a queue client
    QueueClient queueClient = factory.CreateQueueClient(AccountDetails.QueueName);

    // Open the input file.
    FileStream fileStream = new FileStream(AccountDetails.TestFile, FileMode.Open);

    // Create a BrokeredMessage for the file.
    BrokeredMessage largeMessage = new BrokeredMessage(fileStream, true);
    Console.WriteLine("Sending: " + AccountDetails.TestFile);
    Console.WriteLine("Message body size: " + largeMessage.Size);
    Console.WriteLine();

    // Send the message with a LargeMessageSender
    LargeMessageSender sender = new LargeMessageSender(queueClient);
    sender.Send(largeMessage);

    // Close the messaging factory.
    factory.Close();
}

The implementation of the main method of the receiving application is shown below.

static void Main(string[] args)
{
    // Create a token provider with the relevant credentials.
    TokenProvider credentials = TokenProvider.CreateSharedSecretTokenProvider(
        AccountDetails.Name, AccountDetails.Key);

    // Create a URI for the service bus.
    Uri serviceBusUri = ServiceBusEnvironment.CreateServiceUri(
        "sb", AccountDetails.Namespace, string.Empty);

    // Create the MessagingFactory
    MessagingFactory factory = MessagingFactory.Create(serviceBusUri, credentials);

    // Use the MessagingFactory to create a queue client
    QueueClient queueClient = factory.CreateQueueClient(AccountDetails.QueueName);

    // Create a LargeMessageReceiver and receive the message.
    LargeMessageReceiver receiver = new LargeMessageReceiver(queueClient);
    BrokeredMessage largeMessage = receiver.Receive();
    Console.WriteLine("Received message");
    Console.WriteLine("Message body size: " + largeMessage.Size);

    string testFile = AccountDetails.TestFile.Replace(@"\In\", @"\Out\");
    Console.WriteLine("Saving file: " + testFile);

    // Save the message body as a file.
    Stream largeMessageStream = largeMessage.GetBody<Stream>();
    largeMessageStream.Seek(0, SeekOrigin.Begin);
    FileStream fileOut = new FileStream(testFile, FileMode.Create);
    largeMessageStream.CopyTo(fileOut);
    fileOut.Close();
    Console.WriteLine("Done!");
}

In order to test the application, the sending application is executed, which will use the LargeMessageSender class to split the message and place it on the queue. The output of the sender console is shown below. The console shows that the body size of the large message was 9,929,365 bytes, and the message was sent as a sequence of 51 sub messages. When the receiving application is executed the results are shown below. The console application shows that the aggregator has received the 51 messages from the message sequence that was created in the sending application. The messages have been aggregated to form a message with a body of 9,929,365 bytes, which is the same as the original large message.
The message body is then saved as a file.

Improvements to the Implementation

The splitter and aggregator patterns in this implementation were created in order to show the usage of the patterns in a demo, which they do quite well. When implementing these patterns in a real-world scenario there are a number of improvements that could be made to the design.

Copying Message Header Properties

When sending a large message using these classes, it would be great if the message header properties in the message that was received were copied from the message that was sent. The sending application may well add information to the message context that will be required in the receiving application. When the sub messages are created in the splitter, the header properties in the first message could be set to the values in the original large message. The aggregator could then use the values from this first sub message to set the properties in the message header of the large message during the aggregation process.

Using Asynchronous Methods

The current implementation uses the synchronous send and receive methods of the QueueClient class. It would be much more performant to use the asynchronous methods; however, doing so may well affect the sequence in which the sub messages are enqueued, which would require the implementation of a resequencer in the aggregator to restore the correct message sequence.

Handling Exceptions

In order to keep the code readable, no exception handling was added to the implementations. In a real-world scenario exceptions should be handled accordingly.
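The chunking arithmetic and the resequencing idea from the article can be checked without any Service Bus at all. The sketch below is Python, not the article's C#, and makes no Service Bus calls: plain dictionaries stand in for BrokeredMessage, and the "sequence" field is an assumption added to illustrate how a resequencer might restore order if sub messages arrived out of sequence. With 192 KB chunks, a body of 9,929,365 bytes gives 50 full chunks plus one final chunk of 98,965 bytes, which matches the 51 sub messages reported above.

```python
import io
import uuid

SUB_MESSAGE_BODY_SIZE = 192 * 1024  # 196,608 bytes, as in LargeMessageSender


def split(body, chunk_size=SUB_MESSAGE_BODY_SIZE):
    """Yield sub messages (plain dicts, a stand-in for BrokeredMessage)."""
    session_id = str(uuid.uuid4())  # correlates the sub messages
    stream = io.BytesIO(body)
    sequence = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # "sequence" is a hypothetical field, not part of the article's code.
        yield {"session_id": session_id, "sequence": sequence, "body": chunk}
        sequence += 1


def aggregate(sub_messages):
    """Reassemble the body, sorting by sequence number first (resequencer)."""
    ordered = sorted(sub_messages, key=lambda m: m["sequence"])
    return b"".join(m["body"] for m in ordered)


# Synthetic payload with the same size as the file used in the article.
body = (b"0123456789" * 992937)[:9929365]
parts = list(split(body))
print(len(parts), len(parts[-1]["body"]))
```

Feeding the parts to aggregate in reverse order still reproduces the original body, which is the property a resequencer would need to provide.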


How does I/O work for large graph databases?

- by tjb1982

I should preface this by saying that I'm mostly a front end web developer, trained as a musician, but over the past few years I've been getting more and more into computer science. So one idea I have as a fun toy project to learn about data structures and C programming was to design and implement my own very simple database that would manage an adjacency list of posts. I don't want SQL (maybe I'll do my own query language? I'm just having fun). It should support ACID. It should be capable of storing 1TB let's say. So with that, I was trying to think of how a database even stores data, without regard to data structures necessarily. I'm working on linux, and I've read that in that world "everything is a file," including hardware (like /dev/*), so I think that that obviously has to apply to a database, too, and it clearly does--whether it's MySQL or PostgreSQL or Neo4j, the database itself is a collection of files you can see in the filesystem. That said, there would come a point in scale where loading the entire database into primary memory just wouldn't work, so it doesn't make sense to design it with that mindset (I assume). However, reading from secondary memory would be much slower and regardless some portion of the database has to be in primary memory in order for you to be able to do anything with it. I read this post: Why use a database instead of just saving your data to disk? And I found it difficult to understand how other databases, like SQLite or Neo4j, read and write from secondary memory and are still very fast (faster, it would seem, than simply writing files to the filesystem as the above question suggests). It seems the key is indexing. But even indexes need to be stored in secondary memory. They are inherently smaller than the database itself, but indexes in a very large database might be prohibitively large, too. 
So my question is how is I/O generally done with large databases like the one I described above that would be at least 1TB storing a big adjacency list? If indexing is more or less the answer, how exactly does indexing work--what data structures should be involved?
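One way to picture why an on-disk index can be fast is that a lookup only has to touch a handful of records rather than the whole file. The sketch below is Python rather than the C the question mentions, and it is far simpler than what real engines do (SQLite, for instance, stores its tables and indexes as B-trees). It assumes fixed-size (key, value) records kept sorted in a file and finds a key by seeking, counting how many records it reads. The names RECORD, write_sorted and lookup are made up for illustration.

```python
import io
import struct

# Each record is a pair of unsigned 64-bit integers: (key, value) = 16 bytes.
RECORD = struct.Struct(">QQ")


def write_sorted(f, pairs):
    """Write (key, value) pairs to f in ascending key order."""
    for key, value in sorted(pairs):
        f.write(RECORD.pack(key, value))


def lookup(f, key):
    """Binary-search the sorted file; return (value or None, records read)."""
    f.seek(0, io.SEEK_END)
    lo, hi = 0, f.tell() // RECORD.size
    reads = 0
    while lo < hi:
        mid = (lo + hi) // 2
        f.seek(mid * RECORD.size)
        found_key, value = RECORD.unpack(f.read(RECORD.size))
        reads += 1
        if found_key == key:
            return value, reads
        if found_key < key:
            lo = mid + 1
        else:
            hi = mid
    return None, reads
```

With 1,000 records a lookup reads at most 10 of them; the same idea, applied to whole pages with many keys per page, is roughly what a B-tree does.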


Why do large IT projects tend to fail or have big cost/schedule overruns?

- by Pratik

I always read about large-scale transformation or integration projects that are a total or almost total disaster. Even if they somehow manage to succeed, the cost and schedule blowout is enormous. What is the real reason behind large projects being more prone to failure? Can agile be used in these sorts of projects, or is the traditional approach still the best? One example from Australia is the Queensland Payroll project, where they changed the test success criteria to deliver the project. See some more failed projects in this SO question. Have you got any personal experience to share?


How can I create multiple identical AWS EC2 server instances with large amounts of persistent data?

- by mojones

I have a CPU-intensive data-processing application that I want to run across many (~100,000) input files. The application needs a large (~20GB) data file in order to run. What I would like to do is:

- create an EC2 machine image that has my application and associated data files installed
- boot up a large number (e.g. 100) of instances of this image
- split my input files up into 100 batches and send one batch to be processed on each instance

I am having trouble figuring out the best way to ensure that each instance has access to the large data file. The data file is too big to fit on the root filesystem of an AMI. I could use Block Storage, but a given Block Storage volume can only be attached to a single instance, so I would need 100 clones. Is there some way to create a custom image that has more space on the root filesystem so that I can include my large data file? Or is there a better way to tackle this problem?
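The batching step in that plan is independent of how the data file ends up on each instance. As a minimal sketch in Python, with made-up file names and a hypothetical helper split_into_batches, the input list can be dealt out round-robin so that every instance gets a near-equal share. It does not touch any AWS API.

```python
def split_into_batches(items, n_batches):
    """Deal items out round-robin into n_batches lists of near-equal size."""
    return [items[i::n_batches] for i in range(n_batches)]


# Placeholder names standing in for the ~100,000 real input files.
input_files = [f"input-{i:06d}.dat" for i in range(100_000)]
batches = split_into_batches(input_files, 100)
print(len(batches), len(batches[0]))
```

Each batch could then be written to a manifest file that the corresponding instance reads at start-up.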


Breaking up a large PHP object used to abstract the database. Best practices?

- by John Kershaw

Two years ago it was thought a single object with functions such as $database->get_user_from_id($ID) would be a good idea. The functions return objects (not arrays), and the front-end code never worries about the database. This was great, until we started growing the database. There's now 30+ tables, and around 150 functions in the database object. It's getting impractical and unmanageable and I'm going to be breaking it up. What is a good solution to this problem? The project is large, so there's a limit to the extent I can change things. My current plan is to extend the current object for each table, then have the database object contain these. So, the above example would turn into (assume "user" is a table) $database->user->get_user_from_id($ID). Instead of one large file, we would have a file for every table.
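The plan described above, one object per table hanging off the main database object, might look roughly like this. The sketch is in Python rather than PHP and uses composition instead of inheritance, which is a swapped-in design choice and not what the question proposes. A dictionary stands in for the real database connection, and the class names UserTable and Database are assumptions. The old method is kept as a thin forwarder so existing front-end code would not need to change all at once.

```python
class UserTable:
    """Holds only the queries that touch the (hypothetical) user table."""

    def __init__(self, rows):
        self._rows = rows  # stand-in for a real connection: {id: record}

    def get_user_from_id(self, user_id):
        return self._rows.get(user_id)


class Database:
    """Facade that exposes one attribute per table."""

    def __init__(self, user_rows):
        self.user = UserTable(user_rows)

    def get_user_from_id(self, user_id):
        # Legacy entry point kept so existing callers continue to work.
        return self.user.get_user_from_id(user_id)


database = Database({7: {"id": 7, "name": "example"}})
print(database.user.get_user_from_id(7))
```

New code would call database.user.get_user_from_id(...), and the forwarding methods could be removed table by table as callers are migrated.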

Read the article

Cloud Computing Forces Better Design Practices

- by Herve Roggero

Is cloud computing simply different than on premise development, or is cloud computing actually forcing you to create better applications than you normally would? In other words, is cloud computing merely imposing different design principles, or forcing better design principles? A little while back I got into a discussion with a developer in which I was arguing that cloud computing, and specifically Windows Azure in his case, was forcing developers to adopt better design principles. His opinion was that cloud computing was not yielding better systems; just different systems. In this blog, I will argue that cloud computing does force developers to use better design practices, and hence better applications. So the first thing to define, of course, is the word “better”, in the context of application development. Looking at a few definitions online, better means “superior quality”. As it relates to this discussion then, I stipulate that cloud computing can yield higher quality applications in terms of scalability, everything else being equal. Before going further I need to also outline the difference between performance and scalability. Performance and scalability are two related concepts, but they don’t mean the same thing. Scalability is the measure of system performance given various loads. So when developers design for performance, they usually give higher priority to a given load and tend to optimize for the given load. When developers design for scalability, the actual performance at a given load is not as important; the ability to ensure reasonable performance regardless of the load becomes the objective. This can lead to very different design choices. 
For example, if your objective is to obtain the fastest response time possible for a service you are building, you may choose to implement a TCP connection that never closes until the client chooses to close the connection (in other words, a tightly coupled service from a connectivity standpoint), and on which a connection session is established for faster processing on the next request (like SQL Server or other database systems for example). If your objective is to scale, you may implement a service that answers requests without keeping session state, so that server resources are released as quickly as possible, like a REST service for example. This alternate design would likely have a slower response time than the TCP service for any given load, but would continue to function at very large loads because of its inherently loosely coupled design. An example of a REST service is the NoSQL implementation in the Microsoft cloud called Azure Tables. Now, back to cloud computing… Cloud computing is designed to help you scale your applications, specifically when you use Platform as a Service (PaaS) offerings. However, it’s not automatic. You can design a tightly-coupled TCP service as discussed above, and as you can imagine, it probably won’t scale even if you place the service in the cloud because it isn’t using a connection pattern that will allow it to scale [note: I am not implying that all TCP systems do not scale; I am just illustrating the scalability concepts with an imaginary TCP service that isn’t designed to scale for the purpose of this discussion]. The other service, using REST, will have a better chance to scale because, by design, it minimizes resource consumption for individual requests and doesn’t tie a client connection to a specific endpoint (which means you can easily deploy this service to hundreds of machines without much trouble, as long as your pockets are deep enough). 
The TCP and REST services discussed above are both valid designs; the TCP service is faster and the REST service scales better. So is it fair to say that one service is fundamentally better than the other? No; not unless you need to scale. And if you don’t need to scale, then you don’t need the cloud in the first place. However, it is interesting to note that if you do need to scale, then a loosely coupled system becomes a better design because it can almost always scale better than a tightly-coupled system. And because most applications grow over time, with an increasing user base, new functional requirements, increased data and so forth, most applications eventually do need to scale. So in my humble opinion, I conclude that a loosely coupled system is not just different than a tightly coupled system; it is a better design, because it will stand the test of time. And in my book, if a system stands the test of time better than another, it is of superior quality. Because cloud computing demands loosely coupled systems so that its underlying service architecture can be leveraged, developers ultimately have no choice but to design loosely coupled systems for the cloud. And because loosely coupled systems are better… … the cloud forces better design practices. My 2 cents.

Read the article

Why use C++ when C works for large projects equally well?

- by Karl

Before I start, please DO NOT make this into a C vs C++ flamewar. This question has nothing to do with which language is better or not. Period. I have read that C++ is said to be fit for large projects. After all, it makes managing code easier. OO and other features, for example the STL. But then why use C++ when C works equally well for large projects? Take the example of the Linux kernel. Or GNOME. Or even Windows I guess, it is written in C right? So why bother at all with the complexity of C++ (templates and all that), when C works well and this is not just a statement, but proper examples have been quoted. If it works for projects of magnitude of the kernel, why is C++ preferred or why is C not used for almost all projects?

Read the article

Can I copy large files faster without using the file cache?

- by Veazer

After adding the preload package, my applications seem to speed up but if I copy a large file, the file cache grows by more than double the size of the file. By transferring a single 3-4 GB virtualbox image or video file to an external drive, this huge cache seems to remove all the preloaded applications from memory, leading to increased load times and general performance drops. Is there a way to copy large, multi-gigabyte files without caching them (i.e. bypassing the file cache)? Or a way to whitelist or blacklist specific folders from being cached?
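
One possible approach, sketched here in Python under stated assumptions, is to copy in chunks and tell the kernel after each chunk that the pages already copied are no longer needed. It relies on `os.posix_fadvise` with `POSIX_FADV_DONTNEED`, which is only available on Unix-like systems and is advisory: the kernel may ignore it, so this is a hint rather than a guarantee of bypassing the cache. The function name `copy_dropping_cache` and the 8 MiB chunk size are assumptions for illustration.

```python
import os


def _advise_dontneed(fd: int, length: int) -> None:
    """Hint that the first `length` bytes of fd can be evicted from the page cache."""
    if hasattr(os, "posix_fadvise"):
        try:
            os.posix_fadvise(fd, 0, length, os.POSIX_FADV_DONTNEED)
        except OSError:
            pass  # advisory only; some filesystems may not support it


def copy_dropping_cache(src: str, dst: str, chunk_size: int = 8 * 1024 * 1024) -> int:
    """Copy src to dst in chunks, asking the kernel to drop cached pages as it goes.

    Returns the number of bytes copied.
    """
    copied = 0
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        while True:
            chunk = fin.read(chunk_size)
            if not chunk:
                break
            fout.write(chunk)
            copied += len(chunk)
            # Dirty pages cannot be dropped, so flush them to disk first.
            fout.flush()
            os.fsync(fout.fileno())
            _advise_dontneed(fin.fileno(), copied)
            _advise_dontneed(fout.fileno(), copied)
    return copied
```

The `fsync` after every chunk trades some throughput for a bounded cache footprint; a larger chunk size reduces that overhead.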

Read the article

What techniques can I use to render very large numbers of objects more efficiently in OpenGL?

- by Luke

You can think of my application as drawing a very large ball-and-stick diagram (or graph). At times, this graph can get very large, to the point where the number of elements outnumbers the pixels on the screen. Currently I am simply passing all of my textures (as GL_POINTS) and lines to the graphics card using VBOs. When the number of elements outnumbers the number of pixels, is this the most efficient way to do this? Or should I do some calculations on the CPU side before handing everything over to the GPU? If it matters, I do use GL_DEPTH_TEST and GL_ALPHA_TEST. I do some alpha blending, but probably not enough to make a huge performance difference. My scene can be static at times, but the user has control over a typical arc-ball camera and can pan, rotate, or zoom. It is during these operations that performance degradation is noticeable.
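
As one illustration of "calculations on the CPU side", the sketch below keeps at most one point per screen pixel (the one nearest the camera) before anything is uploaded. It is a Python sketch of the idea only, not OpenGL code, and rests on assumptions not stated in the question: points are already projected to normalized screen coordinates in [0, 1), a smaller depth means closer to the camera, and hidden points contribute nothing visible (which is not strictly true when alpha blending is used). The function name `decimate_to_pixels` is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]  # (x, y, depth), x and y in [0, 1)


def decimate_to_pixels(points: Iterable[Point], width: int, height: int) -> List[Point]:
    """Keep only the nearest point falling in each pixel of a width x height screen."""
    nearest: Dict[Tuple[int, int], Point] = {}
    for x, y, depth in points:
        if not (0.0 <= x < 1.0 and 0.0 <= y < 1.0):
            continue  # off screen, nothing to draw
        cell = (int(x * width), int(y * height))
        best = nearest.get(cell)
        if best is None or depth < best[2]:
            nearest[cell] = (x, y, depth)
    return list(nearest.values())
```

The output size is bounded by the number of pixels regardless of how large the graph is, but the pass has to be redone whenever the camera moves, so whether it pays off during pan, rotate, or zoom would need to be measured.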

Read the article

How do I scale EC2 and push out code / data to my instances?

- by chris

Unfortunately I only have a limited knowledge of server architecture; I come from a development background. I am looking to ensure my new app can scale properly using EC2. I currently have a t1.micro for development running Windows with SQL Server 2008. The system allows students to come to our site to search for a mentor, update their profile with pictures and employment history, etc. Roughly the same sort of work as a LinkedIn profile. I need this to be able to scale very quickly without wasted resources. I understand the following is important: separation of data, application, etc. I think I will achieve this by hosting images using S3, running the database instance via RDS, and upgrading the EC2 instance. My main question is: How do I push data / code out to multiple EC2 / RDS instances seamlessly?

Read the article

What are the best tools to help work with large Ant files?

- by klfox

I just started working at a company that has a very large Ant build file that imports lots of other large/small Ant files. Needless to say, it's giving me a headache trying to figure out what is going on. What are the best tools out there for: getting some kind of concise answer on what is happening; visualizing the various targets; seeing performance on tasks? It can be multiple tools. Any other tips/suggestions? I tagged this as java since I don't have the reputation to create an ant tag.

Read the article

Is CodeIgniter PHP Framework suitable for large ERP or Business Application?

- by adietan63

Is CodeIgniter recommended for a large web-based ERP or business application? I want to use CodeIgniter for my future project and I'm confused about whether to use it or not. I'm worried that over the long term, or over the lifetime of the application, it may crash or produce bugs or errors. I'm also worried about the performance of the framework when the data becomes larger and contains millions of records. I searched the internet for an answer, but there is no exact answer that satisfies me. I think this question is important for programmers like me who want to use a PHP framework for their large business application. I need advice from you guys in order to decide whether to use it or not. Thank you very much!

Read the article

Does the QPI figure matter for large scale data processing using SAS?

- by xiaodai

At work we use SAS to manipulate large amounts of data every day on our workstations. To give an indication of scale, the largest merge we had to do was merging 24 files of 2GB each into one big file (if you are familiar with SAS, the files are binary compressed too!). If we upgrade our PCs to Core i7, which of the Core i7 975 or the Core i7 960 is better? The main difference between the two seems to be QPI. So does that affect large-scale data processing such as merging datasets in SAS?

Read the article

Polling duplex does not scale... what's the alternative?

- by user80855

Our tests showed that the polling duplex binding simply does not scale and cannot be used on a service within a web farm or even a web garden. We have looked at TCP/IP sockets for a client push method, but the firewall issue does not allow us to use sockets. I was wondering what the alternative "free" solution to this problem is, one allowing us to scale and to push data to the client. I have also tried the solution in this article http://tomasz.janczuk.org/2009/09/scale-out-of-silverlight-http-polling.html but in the end there was too much polling on a database, and performance was affected. Our Silverlight application needs a pub/sub design, but it needs to be reliable and scalable... any ideas?

Read the article

Are there any small scale, durable document/object databases?

- by Joe Doyle

I have a few .Net projects that would benefit from using a document/object database as opposed to a relational one. I think that db4o would be a good choice, but we're not sure how much it costs. I'd love to use MongoDB, but its design isn't for small scale, single server applications. Are there other options out there that I just haven't run across for small scale applications? EDIT: So is this a space that doesn't have a good solution, yet? Are there no small scale & durable document databases? Would my best choice be to use MongoDB and set the --syncdelay option to 1?

Search Results

Search found 13928 results on 558 pages for 'large scale nat'.
