Search Results

Search found 12802 results on 513 pages for 'memory profiler'.

Page 150/513 | < Previous Page | 146 147 148 149 150 151 152 153 154 155 156 157  | Next Page >

  • Objective-c when to release objects

    - by Chris
    -(IBAction)registerUpdate:(id)sender { HTTPRequest* request = [[HTTPRequest alloc] initWithUrl:@"http://www.yahoo.com" delegate:self]; [request doRequest]; } The HTTPRequest makes an asynchronous request and calls the onHTTPResponse method in the current class. My question is do I have to release request? My guess is that I'm supposed to make it an instance variable? [NSString stringWithFormat:@"Data received: %@", [[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding]]; How would I release that string object, or should I assign it to a variable?

    Read the article

  • When are temporaries created as part of a function call destroyed?

    - by Michael Mrozek
    Is a temporary created as part of an argument to a function call guaranteed to stay around until the called function ends, even if the temporary isn't passed directly to the function? There's virtually no chance that was coherent, so here's an example: class A { public: A(int x) : x(x) {printf("Constructed A(%d)\n", x);} ~A() {printf("Destroyed A\n");} int x; int* y() {return &x;} }; void foo(int* bar) { printf("foo(): %d\n", *bar); } int main(int argc, char** argv) { foo(A(4).y()); } If A(4) were passed directly to foo it would definitely not be destroyed until after the foo call ended, but instead I'm calling a method on the temporary and losing any reference to it. I would instinctively think the temporary A would be destroyed before foo even starts, but testing with GCC 4.3.4 shows it isn't; the output is: Constructed A(4) foo(): 4 Destroyed A The question is, is GCC's behavior guaranteed by the spec? Or is a compiler allowed to destroy the temporary A before the call to foo, invaliding the pointer to its member I'm using?

    Read the article

  • XNA: What is the point of Unload()?

    - by Rosarch
    XNA games have an Unload() method, where content is supposed to be unloaded. But what is the point of this? If all the content is being unloaded, then the game must be exiting, in which case everything would be garbage collected anyway, right?

    Read the article

  • Why this kind of release doesn't work?

    - by parkyprg
    Hello, I have a newbie question about the following: - (NSString *)tableView:(UITableView *)tableView titleForHeaderInSection:(NSInteger)section { NSArray *anArray; anArray = [dictionary objectForKey: [NSString stringWithFormat:@"%d", section]]; //here dictionary is of type NSDictionary, initialized in another place. AnObject *obj = [[AnObject alloc] init]; obj = [anArray objectAtIndex:0]; [anArray release]; return obj.title; } If I run it as it is I will get an error. If I don't put [anArray release] it works just fine. I don't quite understand why is this happening? Thanks.

    Read the article

  • Can I just release the top object (iPhone)?

    - by yar
    If I release the object that's holding a reference to the variable that I need to release, is that sufficient? Or must I release at every level of the containment hierarchy? I fear that my logic comes from working with a garbage collector for too long. For instance, I assigned to this property of a UIPickerView instance by hand instead of using IB @property(nonatomic, assign) id<UIPickerViewDelegate> delegate Since it's an assign property, I can't just release the reference after I assign it. When I finally release my UIPickerView instance, do I need to do this: [singlePicker.delegate release]; [singlePicker release]; or is the second line sufficient? Also: Are these assign properties the norm, or is that mostly for Interface Builder? I thought that retain properties were the normal thing to expect.

    Read the article

  • How to copy a structure with pointers to data inside (so to copy pointers and data they point to)?

    - by Kabumbus
    so I have a structure like struct GetResultStructure { int length; char* ptr; }; I need a way to make a full copy of it meaning I need a copy to have a structure with new ptr poinnting on to copy of data I had in original structure. Is It any how possible? I mean any structure I have which contains ptrs will have some fields with its lengths I need a function that would copy my structure coping all ptrs and data they point to by given array of lengthes... Any cool boost function for it? Or any way how to create such function?

    Read the article

  • The difference between delete and delete [] in C++

    - by Ilya Melamed
    I've written a class that contains two pointers, one is char* color_ and one in vertexesset* vertex_ where vertexesset is a class I created. In the destractor I've written at start delete [] color_; delete [] vertex_; When It came to the destructor it gave me a segmentation fault. Then I changed the destructor to: delete [] color_; delete vertex_; And now it works fine. What is the difference between the two?

    Read the article

  • Does LINQ require significantly more processing cycles and memory than lower-level data iteration techniques?

    - by Matthew Patrick Cashatt
    Background I am recently in the process of enduring grueling tech interviews for positions that use the .NET stack, some of which include silly questions like this one, and some questions that are more valid. I recently came across an issue that may be valid but I want to check with the community here to be sure. When asked by an interviewer how I would count the frequency of words in a text document and rank the results, I answered that I would Use a stream object put the text file in memory as a string. Split the string into an array on spaces while ignoring punctuation. Use LINQ against the array to .GroupBy() and .Count(), then OrderBy() said count. I got this answer wrong for two reasons: Streaming an entire text file into memory could be disasterous. What if it was an entire encyclopedia? Instead I should stream one block at a time and begin building a hash table. LINQ is too expensive and requires too many processing cycles. I should have built a hash table instead and, for each iteration, only added a word to the hash table if it didn't otherwise exist and then increment it's count. The first reason seems, well, reasonable. But the second gives me more pause. I thought that one of the selling points of LINQ is that it simply abstracts away lower-level operations like hash tables but that, under the veil, it is still the same implementation. Question Aside from a few additional processing cycles to call any abstracted methods, does LINQ require significantly more processing cycles to accomplish a given data iteration task than a lower-level task (such as building a hash table) would?

    Read the article

  • Creating foreign words' learning site with memory technique (Web 2.0)? Will it work?

    - by Michal P.
    I would like to earn a little money for realizing a good, simple project. My idea is to build a website for learning of chosen by me language (for users knowing English) using mnemonics. Users would be encourage to enter English words with translation to another language and describing the way, how to remember a foreign language word (an association link). Example: if I choose learning Spanish for people who knows English well, it would look like that: every user would be encourage to enter a way to remember a chosen by him/her Spanish word. So he/she would enter to the dictionary (my site database) ,e.g., English word: beach - playa (Spanish word). Then he/she would describe the method to remember Spanish word, e.g., "Image that U r on the beach and U play volleyball" - we have the word play and recall playa (mnemonics). I would like to give possibility of pic hotlinks, encourage for fun or little shocking memory links which is -- in the art of memory -- good. I would choose a language to take a niche of Google Search. The big question is if I don't lose my time on it?? (Maybe I need to find prototype way to check that idea?)

    Read the article

  • java System.nanoTime is really slow. Is it possible to implement a high performance java profiler?

    - by willpowerforever
    I did a test and found the overhead of a function call to System.nanoTime() is at least 500 ns on my machine. Seemed that it is very hard to have a high performance java profiler. For enterprise software, suppose a function takes about 350 seconds and has 12,500,000,000 times of method calls. Therefore, the number of calls to System.nanoTime() is: 12,500,000,000 * 2 = 25,000,000,000 (one for start timestamp, one for end timestamp) And the overhead of System.nanoTime in total is: 500 ns * 25,000,000,000 = 500 * 25000 s = 12500000s. Note: all data from real case. Any better way to acquire the timestamp?

    Read the article

  • How can I create a profiler GUI in WPF?

    - by thehan
    I am looking to create a task profiler in WPF with an UI similar to the one of incredibuild. Eg, timed tasks appear as strips on their respective line. Have a look at: http://baoz.net/wp-content/2009/06/ib1.png to see what I mean. I am unsure what the best way to tackle this problem is? Should I override the Panel class creating a custom layout panel for this or might there be an existing control/library out there that can fit my needs. I have so far had a look at various charting controls but am unsure of weither they can be tweaked to achieve this behaviour or not. I am really hoping for someone who has been using WPF for a while to help me narrow down my search. It is not exactly easy to know what to google for either :)

    Read the article

  • How to find and fix performance problems in ORM powered applications

    - by FransBouma
    Once in a while we get requests about how to fix performance problems with our framework. As it comes down to following the same steps and looking into the same things every single time, I decided to write a blogpost about it instead, so more people can learn from this and solve performance problems in their O/R mapper powered applications. In some parts it's focused on LLBLGen Pro but it's also usable for other O/R mapping frameworks, as the vast majority of performance problems in O/R mapper powered applications are not specific for a certain O/R mapper framework. Too often, the developer looks at the wrong part of the application, trying to fix what isn't a problem in that part, and getting frustrated that 'things are so slow with <insert your favorite framework X here>'. I'm in the O/R mapper business for a long time now (almost 10 years, full time) and as it's a small world, we O/R mapper developers know almost all tricks to pull off by now: we all know what to do to make task ABC faster and what compromises (because there are almost always compromises) to deal with if we decide to make ABC faster that way. Some O/R mapper frameworks are faster in X, others in Y, but you can be sure the difference is mainly a result of a compromise some developers are willing to deal with and others aren't. That's why the O/R mapper frameworks on the market today are different in many ways, even though they all fetch and save entities from and to a database. I'm not suggesting there's no room for improvement in today's O/R mapper frameworks, there always is, but it's not a matter of 'the slowness of the application is caused by the O/R mapper' anymore. Perhaps query generation can be optimized a bit here, row materialization can be optimized a bit there, but it's mainly coming down to milliseconds. Still worth it if you're a framework developer, but it's not much compared to the time spend inside databases and in user code: if a complete fetch takes 40ms or 50ms (from call to entity object collection), it won't make a difference for your application as that 10ms difference won't be noticed. That's why it's very important to find the real locations of the problems so developers can fix them properly and don't get frustrated because their quest to get a fast, performing application failed. Performance tuning basics and rules Finding and fixing performance problems in any application is a strict procedure with four prescribed steps: isolate, analyze, interpret and fix, in that order. It's key that you don't skip a step nor make assumptions: these steps help you find the reason of a problem which seems to be there, and how to fix it or leave it as-is. Skipping a step, or when you assume things will be bad/slow without doing analysis will lead to the path of premature optimization and won't actually solve your problems, only create new ones. The most important rule of finding and fixing performance problems in software is that you have to understand what 'performance problem' actually means. Most developers will say "when a piece of software / code is slow, you have a performance problem". But is that actually the case? If I write a Linq query which will aggregate, group and sort 5 million rows from several tables to produce a resultset of 10 rows, it might take more than a couple of milliseconds before that resultset is ready to be consumed by other logic. If I solely look at the Linq query, the code consuming the resultset of the 10 rows and then look at the time it takes to complete the whole procedure, it will appear to me to be slow: all that time taken to produce and consume 10 rows? But if you look closer, if you analyze and interpret the situation, you'll see it does a tremendous amount of work, and in that light it might even be extremely fast. With every performance problem you encounter, always do realize that what you're trying to solve is perhaps not a technical problem at all, but a perception problem. The second most important rule you have to understand is based on the old saying "Penny wise, Pound Foolish": the part which takes e.g. 5% of the total time T for a given task isn't worth optimizing if you have another part which takes a much larger part of the total time T for that same given task. Optimizing parts which are relatively insignificant for the total time taken is not going to bring you better results overall, even if you totally optimize that part away. This is the core reason why analysis of the complete set of application parts which participate in a given task is key to being successful in solving performance problems: No analysis -> no problem -> no solution. One warning up front: hunting for performance will always include making compromises. Fast software can be made maintainable, but if you want to squeeze as much performance out of your software, you will inevitably be faced with the dilemma of compromising one or more from the group {readability, maintainability, features} for the extra performance you think you'll gain. It's then up to you to decide whether it's worth it. In almost all cases it's not. The reason for this is simple: the vast majority of performance problems can be solved by implementing the proper algorithms, the ones with proven Big O-characteristics so you know the performance you'll get plus you know the algorithm will work. The time taken by the algorithm implementing code is inevitable: you already implemented the best algorithm. You might find some optimizations on the technical level but in general these are minor. Let's look at the four steps to see how they guide us through the quest to find and fix performance problems. Isolate The first thing you need to do is to isolate the areas in your application which are assumed to be slow. For example, if your application is a web application and a given page is taking several seconds or even minutes to load, it's a good candidate to check out. It's important to start with the isolate step because it allows you to focus on a single code path per area with a clear begin and end and ignore the rest. The rest of the steps are taken per identified problematic area. Keep in mind that isolation focuses on tasks in an application, not code snippets. A task is something that's started in your application by either another task or the user, or another program, and has a beginning and an end. You can see a task as a piece of functionality offered by your application.  Analyze Once you've determined the problem areas, you have to perform analysis on the code paths of each area, to see where the performance problems occur and which areas are not the problem. This is a multi-layered effort: an application which uses an O/R mapper typically consists of multiple parts: there's likely some kind of interface (web, webservice, windows etc.), a part which controls the interface and business logic, the O/R mapper part and the RDBMS, all connected with either a network or inter-process connections provided by the OS or other means. Each of these parts, including the connectivity plumbing, eat up a part of the total time it takes to complete a task, e.g. load a webpage with all orders of a given customer X. To understand which parts participate in the task / area we're investigating and how much they contribute to the total time taken to complete the task, analysis of each participating task is essential. Start with the code you wrote which starts the task, analyze the code and track the path it follows through your application. What does the code do along the way, verify whether it's correct or not. Analyze whether you have implemented the right algorithms in your code for this particular area. Remember we're looking at one area at a time, which means we're ignoring all other code paths, just the code path of the current problematic area, from begin to end and back. Don't dig in and start optimizing at the code level just yet. We're just analyzing. If your analysis reveals big architectural stupidity, it's perhaps a good idea to rethink the architecture at this point. For the rest, we're analyzing which means we collect data about what could be wrong, for each participating part of the complete application. Reviewing the code you wrote is a good tool to get deeper understanding of what is going on for a given task but ultimately it lacks precision and overview what really happens: humans aren't good code interpreters, computers are. We therefore need to utilize tools to get deeper understanding about which parts contribute how much time to the total task, triggered by which other parts and for example how many times are they called. There are two different kind of tools which are necessary: .NET profilers and O/R mapper / RDBMS profilers. .NET profiling .NET profilers (e.g. dotTrace by JetBrains or Ants by Red Gate software) show exactly which pieces of code are called, how many times they're called, and the time it took to run that piece of code, at the method level and sometimes even at the line level. The .NET profilers are essential tools for understanding whether the time taken to complete a given task / area in your application is consumed by .NET code, where exactly in your code, the path to that code, how many times that code was called by other code and thus reveals where hotspots are located: the areas where a solution can be found. Importantly, they also reveal which areas can be left alone: remember our penny wise pound foolish saying: if a profiler reveals that a group of methods are fast, or don't contribute much to the total time taken for a given task, ignore them. Even if the code in them is perhaps complex and looks like a candidate for optimization: you can work all day on that, it won't matter.  As we're focusing on a single area of the application, it's best to start profiling right before you actually activate the task/area. Most .NET profilers support this by starting the application without starting the profiling procedure just yet. You navigate to the particular part which is slow, start profiling in the profiler, in your application you perform the actions which are considered slow, and afterwards you get a snapshot in the profiler. The snapshot contains the data collected by the profiler during the slow action, so most data is produced by code in the area to investigate. This is important, because it allows you to stay focused on a single area. O/R mapper and RDBMS profiling .NET profilers give you a good insight in the .NET side of things, but not in the RDBMS side of the application. As this article is about O/R mapper powered applications, we're also looking at databases, and the software making it possible to consume the database in your application: the O/R mapper. To understand which parts of the O/R mapper and database participate how much to the total time taken for task T, we need different tools. There are two kind of tools focusing on O/R mappers and database performance profiling: O/R mapper profilers and RDBMS profilers. For O/R mapper profilers, you can look at LLBLGen Prof by hibernating rhinos or the Linq to Sql/LLBLGen Pro profiler by Huagati. Hibernating rhinos also have profilers for other O/R mappers like NHibernate (NHProf) and Entity Framework (EFProf) and work the same as LLBLGen Prof. For RDBMS profilers, you have to look whether the RDBMS vendor has a profiler. For example for SQL Server, the profiler is shipped with SQL Server, for Oracle it's build into the RDBMS, however there are also 3rd party tools. Which tool you're using isn't really important, what's important is that you get insight in which queries are executed during the task / area we're currently focused on and how long they took. Here, the O/R mapper profilers have an advantage as they collect the time it took to execute the query from the application's perspective so they also collect the time it took to transport data across the network. This is important because a query which returns a massive resultset or a resultset with large blob/clob/ntext/image fields takes more time to get transported across the network than a small resultset and a database profiler doesn't take this into account most of the time. Another tool to use in this case, which is more low level and not all O/R mappers support it (though LLBLGen Pro and NHibernate as well do) is tracing: most O/R mappers offer some form of tracing or logging system which you can use to collect the SQL generated and executed and often also other activity behind the scenes. While tracing can produce a tremendous amount of data in some cases, it also gives insight in what's going on. Interpret After we've completed the analysis step it's time to look at the data we've collected. We've done code reviews to see whether we've done anything stupid and which parts actually take place and if the proper algorithms have been implemented. We've done .NET profiling to see which parts are choke points and how much time they contribute to the total time taken to complete the task we're investigating. We've performed O/R mapper profiling and RDBMS profiling to see which queries were executed during the task, how many queries were generated and executed and how long they took to complete, including network transportation. All this data reveals two things: which parts are big contributors to the total time taken and which parts are irrelevant. Both aspects are very important. The parts which are irrelevant (i.e. don't contribute significantly to the total time taken) can be ignored from now on, we won't look at them. The parts which contribute a lot to the total time taken are important to look at. We now have to first look at the .NET profiler results, to see whether the time taken is consumed in our own code, in .NET framework code, in the O/R mapper itself or somewhere else. For example if most of the time is consumed by DbCommand.ExecuteReader, the time it took to complete the task is depending on the time the data is fetched from the database. If there was just 1 query executed, according to tracing or O/R mapper profilers / RDBMS profilers, check whether that query is optimal, uses indexes or has to deal with a lot of data. Interpret means that you follow the path from begin to end through the data collected and determine where, along the path, the most time is contributed. It also means that you have to check whether this was expected or is totally unexpected. My previous example of the 10 row resultset of a query which groups millions of rows will likely reveal that a long time is spend inside the database and almost no time is spend in the .NET code, meaning the RDBMS part contributes the most to the total time taken, the rest is compared to that time, irrelevant. Considering the vastness of the source data set, it's expected this will take some time. However, does it need tweaking? Perhaps all possible tweaks are already in place. In the interpret step you then have to decide that further action in this area is necessary or not, based on what the analysis results show: if the analysis results were unexpected and in the area where the most time is contributed to the total time taken is room for improvement, action should be taken. If not, you can only accept the situation and move on. In all cases, document your decision together with the analysis you've done. If you decide that the perceived performance problem is actually expected due to the nature of the task performed, it's essential that in the future when someone else looks at the application and starts asking questions you can answer them properly and new analysis is only necessary if situations changed. Fix After interpreting the analysis results you've concluded that some areas need adjustment. This is the fix step: you're actively correcting the performance problem with proper action targeted at the real cause. In many cases related to O/R mapper powered applications it means you'll use different features of the O/R mapper to achieve the same goal, or apply optimizations at the RDBMS level. It could also mean you apply caching inside your application (compromise memory consumption over performance) to avoid unnecessary re-querying data and re-consuming the results. After applying a change, it's key you re-do the analysis and interpretation steps: compare the results and expectations with what you had before, to see whether your actions had any effect or whether it moved the problem to a different part of the application. Don't fall into the trap to do partly analysis: do the full analysis again: .NET profiling and O/R mapper / RDBMS profiling. It might very well be that the changes you've made make one part faster but another part significantly slower, in such a way that the overall problem hasn't changed at all. Performance tuning is dealing with compromises and making choices: to use one feature over the other, to accept a higher memory footprint, to go away from the strict-OO path and execute queries directly onto the RDBMS, these are choices and compromises which will cross your path if you want to fix performance problems with respect to O/R mappers or data-access and databases in general. In most cases it's not a big issue: alternatives are often good choices too and the compromises aren't that hard to deal with. What is important is that you document why you made a choice, a compromise: which analysis data, which interpretation led you to the choice made. This is key for good maintainability in the years to come. Most common performance problems with O/R mappers Below is an incomplete list of common performance problems related to data-access / O/R mappers / RDBMS code. It will help you with fixing the hotspots you found in the interpretation step. SELECT N+1: (Lazy-loading specific). Lazy loading triggered performance bottlenecks. Consider a list of Orders bound to a grid. You have a Field mapped onto a related field in Order, Customer.CompanyName. Showing this column in the grid will make the grid fetch (indirectly) for each row the Customer row. This means you'll get for the single list not 1 query (for the orders) but 1+(the number of orders shown) queries. To solve this: use eager loading using a prefetch path to fetch the customers with the orders. SELECT N+1 is easy to spot with an O/R mapper profiler or RDBMS profiler: if you see a lot of identical queries executed at once, you have this problem. Prefetch paths using many path nodes or sorting, or limiting. Eager loading problem. Prefetch paths can help with performance, but as 1 query is fetched per node, it can be the number of data fetched in a child node is bigger than you think. Also consider that data in every node is merged on the client within the parent. This is fast, but it also can take some time if you fetch massive amounts of entities. If you keep fetches small, you can use tuning parameters like the ParameterizedPrefetchPathThreshold setting to get more optimal queries. Deep inheritance hierarchies of type Target Per Entity/Type. If you use inheritance of type Target per Entity / Type (each type in the inheritance hierarchy is mapped onto its own table/view), fetches will join subtype- and supertype tables in many cases, which can lead to a lot of performance problems if the hierarchy has many types. With this problem, keep inheritance to a minimum if possible, or switch to a hierarchy of type Target Per Hierarchy, which means all entities in the inheritance hierarchy are mapped onto the same table/view. Of course this has its own set of drawbacks, but it's a compromise you might want to take. Fetching massive amounts of data by fetching large lists of entities. LLBLGen Pro supports paging (and limiting the # of rows returned), which is often key to process through large sets of data. Use paging on the RDBMS if possible (so a query is executed which returns only the rows in the page requested). When using paging in a web application, be sure that you switch server-side paging on on the datasourcecontrol used. In this case, paging on the grid alone is not enough: this can lead to fetching a lot of data which is then loaded into the grid and paged there. Keep note that analyzing queries for paging could lead to the false assumption that paging doesn't occur, e.g. when the query contains a field of type ntext/image/clob/blob and DISTINCT can't be applied while it should have (e.g. due to a join): the datareader will do DISTINCT filtering on the client. this is a little slower but it does perform paging functionality on the data-reader so it won't fetch all rows even if the query suggests it does. Fetch massive amounts of data because blob/clob/ntext/image fields aren't excluded. LLBLGen Pro supports field exclusion for queries. You can exclude fields (also in prefetch paths) per query to avoid fetching all fields of an entity, e.g. when you don't need them for the logic consuming the resultset. Excluding fields can greatly reduce the amount of time spend on data-transport across the network. Use this optimization if you see that there's a big difference between query execution time on the RDBMS and the time reported by the .NET profiler for the ExecuteReader method call. Doing client-side aggregates/scalar calculations by consuming a lot of data. If possible, try to formulate a scalar query or group by query using the projection system or GetScalar functionality of LLBLGen Pro to do data consumption on the RDBMS server. It's far more efficient to process data on the RDBMS server than to first load it all in memory, then traverse the data in-memory to calculate a value. Using .ToList() constructs inside linq queries. It might be you use .ToList() somewhere in a Linq query which makes the query be run partially in-memory. Example: var q = from c in metaData.Customers.ToList() where c.Country=="Norway" select c; This will actually fetch all customers in-memory and do an in-memory filtering, as the linq query is defined on an IEnumerable<T>, and not on the IQueryable<T>. Linq is nice, but it can often be a bit unclear where some parts of a Linq query might run. Fetching all entities to delete into memory first. To delete a set of entities it's rather inefficient to first fetch them all into memory and then delete them one by one. It's more efficient to execute a DELETE FROM ... WHERE query on the database directly to delete the entities in one go. LLBLGen Pro supports this feature, and so do some other O/R mappers. It's not always possible to do this operation in the context of an O/R mapper however: if an O/R mapper relies on a cache, these kind of operations are likely not supported because they make it impossible to track whether an entity is actually removed from the DB and thus can be removed from the cache. Fetching all entities to update with an expression into memory first. Similar to the previous point: it is more efficient to update a set of entities directly with a single UPDATE query using an expression instead of fetching the entities into memory first and then updating the entities in a loop, and afterwards saving them. It might however be a compromise you don't want to take as it is working around the idea of having an object graph in memory which is manipulated and instead makes the code fully aware there's a RDBMS somewhere. Conclusion Performance tuning is almost always about compromises and making choices. It's also about knowing where to look and how the systems in play behave and should behave. The four steps I provided should help you stay focused on the real problem and lead you towards the solution. Knowing how to optimally use the systems participating in your own code (.NET framework, O/R mapper, RDBMS, network/services) is key for success as well as knowing what's going on inside the application you built. I hope you'll find this guide useful in tracking down performance problems and dealing with them in a useful way.  

    Read the article

  • SAP dévoile Business Object 4.0, la nouvelle version de sa solution BI intègre la mobilité, les réseaux sociaux et le « in-memory »

    SAP dévoile Business Object 4.0 La nouvelle version de sa solution BI intègre la mobilité, les réseaux sociaux et le « in-memory » SAP vient de dévoiler Business Object 4.0, la prochaine version de sa plate-forme de nouvelle génération de Business Intelligence et de Gestion d'Information d'Entreprise (EIM). [IMG]http://ftp-developpez.com/gordon-fowler/SAP/Slide-5-SAP-BusinessObjects-4.0-Event-Insight2.jpg[/IMG] Après SAP ByDesign 2.6, sa suite ERP en mode SaaS (qui arrive avec un tout nouveau SDK), Business Object 4.0 est la deuxième très grosse annonce de cette année 2011 que Nicolas Sekkaki, Direc...

    Read the article

  • C# Performance Pitfall – Interop Scenarios Change the Rules

    - by Reed
    C# and .NET, overall, really do have fantastic performance in my opinion.  That being said, the performance characteristics dramatically differ from native programming, and take some relearning if you’re used to doing performance optimization in most other languages, especially C, C++, and similar.  However, there are times when revisiting tricks learned in native code play a critical role in performance optimization in C#. I recently ran across a nasty scenario that illustrated to me how dangerous following any fixed rules for optimization can be… The rules in C# when optimizing code are very different than C or C++.  Often, they’re exactly backwards.  For example, in C and C++, lifting a variable out of loops in order to avoid memory allocations often can have huge advantages.  If some function within a call graph is allocating memory dynamically, and that gets called in a loop, it can dramatically slow down a routine. This can be a tricky bottleneck to track down, even with a profiler.  Looking at the memory allocation graph is usually the key for spotting this routine, as it’s often “hidden” deep in call graph.  For example, while optimizing some of my scientific routines, I ran into a situation where I had a loop similar to: for (i=0; i<numberToProcess; ++i) { // Do some work ProcessElement(element[i]); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This loop was at a fairly high level in the call graph, and often could take many hours to complete, depending on the input data.  As such, any performance optimization we could achieve would be greatly appreciated by our users. After a fair bit of profiling, I noticed that a couple of function calls down the call graph (inside of ProcessElement), there was some code that effectively was doing: // Allocate some data required DataStructure* data = new DataStructure(num); // Call into a subroutine that passed around and manipulated this data highly CallSubroutine(data); // Read and use some values from here double values = data->Foo; // Cleanup delete data; // ... return bar; Normally, if “DataStructure” was a simple data type, I could just allocate it on the stack.  However, it’s constructor, internally, allocated it’s own memory using new, so this wouldn’t eliminate the problem.  In this case, however, I could change the call signatures to allow the pointer to the data structure to be passed into ProcessElement and through the call graph, allowing the inner routine to reuse the same “data” memory instead of allocating.  At the highest level, my code effectively changed to something like: DataStructure* data = new DataStructure(numberToProcess); for (i=0; i<numberToProcess; ++i) { // Do some work ProcessElement(element[i], data); } delete data; Granted, this dramatically reduced the maintainability of the code, so it wasn’t something I wanted to do unless there was a significant benefit.  In this case, after profiling the new version, I found that it increased the overall performance dramatically – my main test case went from 35 minutes runtime down to 21 minutes.  This was such a significant improvement, I felt it was worth the reduction in maintainability. In C and C++, it’s generally a good idea (for performance) to: Reduce the number of memory allocations as much as possible, Use fewer, larger memory allocations instead of many smaller ones, and Allocate as high up the call stack as possible, and reuse memory I’ve seen many people try to make similar optimizations in C# code.  For good or bad, this is typically not a good idea.  The garbage collector in .NET completely changes the rules here. In C#, reallocating memory in a loop is not always a bad idea.  In this scenario, for example, I may have been much better off leaving the original code alone.  The reason for this is the garbage collector.  The GC in .NET is incredibly effective, and leaving the allocation deep inside the call stack has some huge advantages.  First and foremost, it tends to make the code more maintainable – passing around object references tends to couple the methods together more than necessary, and overall increase the complexity of the code.  This is something that should be avoided unless there is a significant reason.  Second, (unlike C and C++) memory allocation of a single object in C# is normally cheap and fast.  Finally, and most critically, there is a large advantage to having short lived objects.  If you lift a variable out of the loop and reuse the memory, its much more likely that object will get promoted to Gen1 (or worse, Gen2).  This can cause expensive compaction operations to be required, and also lead to (at least temporary) memory fragmentation as well as more costly collections later. As such, I’ve found that it’s often (though not always) faster to leave memory allocations where you’d naturally place them – deep inside of the call graph, inside of the loops.  This causes the objects to stay very short lived, which in turn increases the efficiency of the garbage collector, and can dramatically improve the overall performance of the routine as a whole. In C#, I tend to: Keep variable declarations in the tightest scope possible Declare and allocate objects at usage While this tends to cause some of the same goals (reducing unnecessary allocations, etc), the goal here is a bit different – it’s about keeping the objects rooted for as little time as possible in order to (attempt) to keep them completely in Gen0, or worst case, Gen1.  It also has the huge advantage of keeping the code very maintainable – objects are used and “released” as soon as possible, which keeps the code very clean.  It does, however, often have the side effect of causing more allocations to occur, but keeping the objects rooted for a much shorter time. Now – nowhere here am I suggesting that these rules are hard, fast rules that are always true.  That being said, my time spent optimizing over the years encourages me to naturally write code that follows the above guidelines, then profile and adjust as necessary.  In my current project, however, I ran across one of those nasty little pitfalls that’s something to keep in mind – interop changes the rules. In this case, I was dealing with an API that, internally, used some COM objects.  In this case, these COM objects were leading to native allocations (most likely C++) occurring in a loop deep in my call graph.  Even though I was writing nice, clean managed code, the normal managed code rules for performance no longer apply.  After profiling to find the bottleneck in my code, I realized that my inner loop, a innocuous looking block of C# code, was effectively causing a set of native memory allocations in every iteration.  This required going back to a “native programming” mindset for optimization.  Lifting these variables and reusing them took a 1:10 routine down to 0:20 – again, a very worthwhile improvement. Overall, the lessons here are: Always profile if you suspect a performance problem – don’t assume any rule is correct, or any code is efficient just because it looks like it should be Remember to check memory allocations when profiling, not just CPU cycles Interop scenarios often cause managed code to act very differently than “normal” managed code. Native code can be hidden very cleverly inside of managed wrappers

    Read the article

  • tile_static, tile_barrier, and tiled matrix multiplication with C++ AMP

    - by Daniel Moth
    We ended the previous post with a mechanical transformation of the C++ AMP matrix multiplication example to the tiled model and in the process introduced tiled_index and tiled_grid. This is part 2. tile_static memory You all know that in regular CPU code, static variables have the same value regardless of which thread accesses the static variable. This is in contrast with non-static local variables, where each thread has its own copy. Back to C++ AMP, the same rules apply and each thread has its own value for local variables in your lambda, whereas all threads see the same global memory, which is the data they have access to via the array and array_view. In addition, on an accelerator like the GPU, there is a programmable cache, a third kind of memory type if you'd like to think of it that way (some call it shared memory, others call it scratchpad memory). Variables stored in that memory share the same value for every thread in the same tile. So, when you use the tiled model, you can have variables where each thread in the same tile sees the same value for that variable, that threads from other tiles do not. The new storage class for local variables introduced for this purpose is called tile_static. You can only use tile_static in restrict(direct3d) functions, and only when explicitly using the tiled model. What this looks like in code should be no surprise, but here is a snippet to confirm your mental image, using a good old regular C array // each tile of threads has its own copy of locA, // shared among the threads of the tile tile_static float locA[16][16]; Note that tile_static variables are scoped and have the lifetime of the tile, and they cannot have constructors or destructors. tile_barrier In amp.h one of the types introduced is tile_barrier. You cannot construct this object yourself (although if you had one, you could use a copy constructor to create another one). So how do you get one of these? You get it, from a tiled_index object. Beyond the 4 properties returning index objects, tiled_index has another property, barrier, that returns a tile_barrier object. The tile_barrier class exposes a single member, the method wait. 15: // Given a tiled_index object named t_idx 16: t_idx.barrier.wait(); 17: // more code …in the code above, all threads in the tile will reach line 16 before a single one progresses to line 17. Note that all threads must be able to reach the barrier, i.e. if you had branchy code in such a way which meant that there is a chance that not all threads could reach line 16, then the code above would be illegal. Tiled Matrix Multiplication Example – part 2 So now that we added to our understanding the concepts of tile_static and tile_barrier, let me obfuscate rewrite the matrix multiplication code so that it takes advantage of tiling. Before you start reading this, I suggest you get a cup of your favorite non-alcoholic beverage to enjoy while you try to fully understand the code. 01: void MatrixMultiplyTiled(vector<float>& vC, const vector<float>& vA, const vector<float>& vB, int M, int N, int W) 02: { 03: static const int TS = 16; 04: array_view<const float,2> a(M, W, vA); 05: array_view<const float,2> b(W, N, vB); 06: array_view<writeonly<float>,2> c(M,N,vC); 07: parallel_for_each(c.grid.tile< TS, TS >(), 08: [=] (tiled_index< TS, TS> t_idx) restrict(direct3d) 09: { 10: int row = t_idx.local[0]; int col = t_idx.local[1]; 11: float sum = 0.0f; 12: for (int i = 0; i < W; i += TS) { 13: tile_static float locA[TS][TS], locB[TS][TS]; 14: locA[row][col] = a(t_idx.global[0], col + i); 15: locB[row][col] = b(row + i, t_idx.global[1]); 16: t_idx.barrier.wait(); 17: for (int k = 0; k < TS; k++) 18: sum += locA[row][k] * locB[k][col]; 19: t_idx.barrier.wait(); 20: } 21: c[t_idx.global] = sum; 22: }); 23: } Notice that all the code up to line 9 is the same as per the changes we made in part 1 of tiling introduction. If you squint, the body of the lambda itself preserves the original algorithm on lines 10, 11, and 17, 18, and 21. The difference being that those lines use new indexing and the tile_static arrays; the tile_static arrays are declared and initialized on the brand new lines 13-15. On those lines we copy from the global memory represented by the array_view objects (a and b), to the tile_static vanilla arrays (locA and locB) – we are copying enough to fit a tile. Because in the code that follows on line 18 we expect the data for this tile to be in the tile_static storage, we need to synchronize the threads within each tile with a barrier, which we do on line 16 (to avoid accessing uninitialized memory on line 18). We also need to synchronize the threads within a tile on line 19, again to avoid the race between lines 14, 15 (retrieving the next set of data for each tile and overwriting the previous set) and line 18 (not being done processing the previous set of data). Luckily, as part of the awesome C++ AMP debugger in Visual Studio there is an option that helps you find such races, but that is a story for another blog post another time. May I suggest reading the next section, and then coming back to re-read and walk through this code with pen and paper to really grok what is going on, if you haven't already? Cool. Why would I introduce this tiling complexity into my code? Funny you should ask that, I was just about to tell you. There is only one reason we tiled our extent, had to deal with finding a good tile size, ensure the number of threads we schedule are correctly divisible with the tile size, had to use a tiled_index instead of a normal index, and had to understand tile_barrier and to figure out where we need to use it, and double the size of our lambda in terms of lines of code: the reason is to be able to use tile_static memory. Why do we want to use tile_static memory? Because accessing tile_static memory is around 10 times faster than accessing the global memory on an accelerator like the GPU, e.g. in the code above, if you can get 150GB/second accessing data from the array_view a, you can get 1500GB/second accessing the tile_static array locA. And since by definition you are dealing with really large data sets, the savings really pay off. We have seen tiled implementations being twice as fast as their non-tiled counterparts. Now, some algorithms will not have performance benefits from tiling (and in fact may deteriorate), e.g. algorithms that require you to go only once to global memory will not benefit from tiling, since with tiling you already have to fetch the data once from global memory! Other algorithms may benefit, but you may decide that you are happy with your code being 150 times faster than the serial-version you had, and you do not need to invest to make it 250 times faster. Also algorithms with more than 3 dimensions, which C++ AMP supports in the non-tiled model, cannot be tiled. Also note that in future releases, we may invest in making the non-tiled model, which already uses tiling under the covers, go the extra step and use tile_static memory on your behalf, but it is obviously way to early to commit to anything like that, and we certainly don't do any of that today. Comments about this post by Daniel Moth welcome at the original blog.

    Read the article

  • What's the difference between "Flash Drive" and "Flash Memory"?

    - by Clive D
    I have a problem with a Blu ray disk I bought. I talked to a Sony technician who advised me to plug a "USB Flash Memory Stick" into the Blu-ray player. Is this something specific? Is there a difference between the following two? "USB Flash Drive" "USB Flash Memory" When I go to Curry's or other sites that sell USB Sticks, they only talk about "USB Flash Drives". I've been in computing for many years and know the basics, but "memory" and "drive" are different things to me.

    Read the article

  • Fatal error: Out of memory (allocated ...) (tried to allocate ... bytes) not due to memory_limit setting

    - by Lorenz Meyer
    Since a few days, I get the following error on my server: Fatal error: Out of memory (allocated 262144) (tried to allocate 393216 bytes) Usually this error is due to a memory consumption that is exceeding the configured memory_limit, but in my case there is no relation. The memory_limit is set to 128MB, and in this case, we not even reach 1MB. Also the server does not have a big load, in fact it is an intranet server, and there are just a few people conected to it. System: Windows Server 2003, 1Go RAM, only 600 MB used. Apache 2.2.4 PHP 5.2.3 This error is appearing randomly. The memory limit reached also is randomly between a few kB to a few MB. Sometimes restarting Apache is required to get rid of the error, sometimes it disapears itself. Restarting Apache or the entire server helps temporarily. Where could this problem come from ? How could I narrow down the error source ?

    Read the article

  • TechEd 2014 Day 3

    - by John Paul Cook
    There is some confusion about durability of data stored in SQL Server in-memory tables, so some review of the concepts is appropriate. The in-memory option is enabled at the database level. Enabling it at the database level only gives you the option to specify the in-memory feature on a table by table basis. No existing tables or new tables will by default become in-memory tables when you enable the feature at the database level. If you choose to make a table an in-memory table, by default it is...(read more)

    Read the article

< Previous Page | 146 147 148 149 150 151 152 153 154 155 156 157  | Next Page >