Search Results

Search found 18811 results on 753 pages for 'dynamic memory allocation'.

Page 235/753 | < Previous Page | 231 232 233 234 235 236 237 238 239 240 241 242  | Next Page >

  • How to copy a structure with pointers to data inside (so to copy pointers and data they point to)?

    - by Kabumbus
    so I have a structure like struct GetResultStructure { int length; char* ptr; }; I need a way to make a full copy of it meaning I need a copy to have a structure with new ptr poinnting on to copy of data I had in original structure. Is It any how possible? I mean any structure I have which contains ptrs will have some fields with its lengths I need a function that would copy my structure coping all ptrs and data they point to by given array of lengthes... Any cool boost function for it? Or any way how to create such function?

    Read the article

  • Resolve property at runtime. C#

    - by user1031381
    I have some class which have some code public IEnumerable<TypeOne> TypeOne { get { if (db != null) { var col = db.Select<TypeOne>(); if (col.Count > 0) return col; } return db2.TypeOne; } } public IEnumerable<TypeTwo> TypeTwo { get { if (db != null) { var col = db.Select<TypeTwo>(); if (col.Count > 0) return col; } return db2.TypeTwo; } } So as You can see there is a lot of duplicated Code and there are same property name and item type of enumerable. I want to call some property of object like "obj.MyProp". And MyProp must be resolved at runtime with some generic or non-generic method. Is it possible?

    Read the article

  • base pointer to derived class

    - by Jay
    Suppose there are Base class and Derived class. Base *A = new Base; Here A is a pointer point to Base class, and new constructs one that A points to. I also saw Base *B = new Derived; How to explain this? B is a pointer to Base Class, and a Derived class constructed and pointed by B? If there is a function derived from Base class, say, Virtual void f(), and it's been overridden in Derived class, then B->f() will invoke which version of the function? version in Base class, or version that overridden in Derived Class. What if there is a new function void g()in Derived, is B->g() going to invoke this function properly? One more is, is int *a = new double; or int *a = new int; legal?

    Read the article

  • Generate Delete Statement From Foreign Key Relationships in SQL 2008 ?

    - by Element
    Is it possible via script/tool to generate a delete statement based on the tables fk relations. i.e. I have the table: DelMe(ID) and there are 30 tables with fk references to its ID that I need to delete first, is there some tool/script that I can run that will generate the 30 delete statements based on the FK relations for me ? (btw I know about cascade delete on the relations, I can't use it in this existing db) I'm using Microsoft SQL Server 2008

    Read the article

  • Main page content populated on the fly?

    - by jcovert
    Is there any reason to NOT have a webpage retrieve it's main content on the fly? For example, I have a page that has a header and a footer, and in the middle of this page is an empty div. When you click on one of the buttons in the header, an http GET is done behind the scenes and the .innerHTML() of the empty div is replaced with the result. I can't think of any reason why this might be a bad idea, but I can't seem to find any pages out there that do it? Please advise!

    Read the article

  • JavaScript not changing display type or color in IE

    - by user445359
    I am trying to switch a series of blocks between "none" and "block" based on the OnMouseOver property and to change the title of the selected list to yellow at the same time. The JavaScript code I have for this is: function switchCat(cat) { var uls = document.getElementsByClassName('lower-ul'); var titles = document.getElementsByClassName('lower-cat-title'); for (var i=0;i<uls.length;i++) { uls[i].style.display = 'none'; titles[i].style.color = 'white'; } if (cat != -1) { var wanted = document.getElementById('lower-cat-'+cat); var wantedTitle = document.getElementById('lower-cat-title-'+cat); wanted.style.display = 'block'; wantedTitle.style.color = 'yellow'; } } It works with Chrome, Opera, and Firefox, however, it does not work with IE. When I test it in IE I get the error "Object doesn't support this property or method." Does anyone know what I am doing wrong?

    Read the article

  • inner workings of PHP (really long PHP script)

    - by econclicks
    I have a really long php script for just 1 page i.e. something like: mywebsite.com/page.php?id=99999 I have about 10000-20000 cases of the id, each with a different settings. Will this slow down my website significantly? i.e. my question is really along the lines of, what happens when php is executed. does the server execute it and display the results, or does the client's computer download it, execute it and display the results. if its the latter, does it mean a really slow load time? each of the 10000-20000 cases has about 20-25 lines of code after it. thanks, xoxo

    Read the article

  • [jQuery] Descriptions on Hover

    - by Nimbuz
    <form action=""> <input type="text" value="" class="card-num" /> <input type="text" value="" class="card-cvv2"/> </form> <div class="description"> <p class="card-num">Enter your 16 digit card number</p> <p class="card-cvv2">Enter the 3 digit number at the back of your card</p> </div> All descriptions are hidden initially. But when hovered on an element that has a matching tag, display description? I'd really appreciate any help. Many thanks!

    Read the article

  • Does LINQ require significantly more processing cycles and memory than lower-level data iteration techniques?

    - by Matthew Patrick Cashatt
    Background I am recently in the process of enduring grueling tech interviews for positions that use the .NET stack, some of which include silly questions like this one, and some questions that are more valid. I recently came across an issue that may be valid but I want to check with the community here to be sure. When asked by an interviewer how I would count the frequency of words in a text document and rank the results, I answered that I would Use a stream object put the text file in memory as a string. Split the string into an array on spaces while ignoring punctuation. Use LINQ against the array to .GroupBy() and .Count(), then OrderBy() said count. I got this answer wrong for two reasons: Streaming an entire text file into memory could be disasterous. What if it was an entire encyclopedia? Instead I should stream one block at a time and begin building a hash table. LINQ is too expensive and requires too many processing cycles. I should have built a hash table instead and, for each iteration, only added a word to the hash table if it didn't otherwise exist and then increment it's count. The first reason seems, well, reasonable. But the second gives me more pause. I thought that one of the selling points of LINQ is that it simply abstracts away lower-level operations like hash tables but that, under the veil, it is still the same implementation. Question Aside from a few additional processing cycles to call any abstracted methods, does LINQ require significantly more processing cycles to accomplish a given data iteration task than a lower-level task (such as building a hash table) would?

    Read the article

  • Creating foreign words' learning site with memory technique (Web 2.0)? Will it work?

    - by Michal P.
    I would like to earn a little money for realizing a good, simple project. My idea is to build a website for learning of chosen by me language (for users knowing English) using mnemonics. Users would be encourage to enter English words with translation to another language and describing the way, how to remember a foreign language word (an association link). Example: if I choose learning Spanish for people who knows English well, it would look like that: every user would be encourage to enter a way to remember a chosen by him/her Spanish word. So he/she would enter to the dictionary (my site database) ,e.g., English word: beach - playa (Spanish word). Then he/she would describe the method to remember Spanish word, e.g., "Image that U r on the beach and U play volleyball" - we have the word play and recall playa (mnemonics). I would like to give possibility of pic hotlinks, encourage for fun or little shocking memory links which is -- in the art of memory -- good. I would choose a language to take a niche of Google Search. The big question is if I don't lose my time on it?? (Maybe I need to find prototype way to check that idea?)

    Read the article

  • Running ARM(EL) executables on ARM(HF) system - Missing Symlink to dynamic Loader?

    - by Uhli
    I am using an Ubuntu 12.10 (ARMHF) distribution on a panda board. I want to run applications compiled for ARMEL. It was not possible due to a changed dynamic loader location (https://wiki.linaro.org/OfficeofCTO/HardFloat/LinkerPathCallApr2012) I succeeded by creating the following symbolic link /lib/ld-linux.so.3 - /lib/ld-linuxarmhf.so.3 Is there a way to install a portability package instead? Is there a reason why this is not done by the distribution? Thanks in advance

    Read the article

  • Groovy 2.0 : plus statique pour un meilleur dynamisme, Invoke Dynamic, vérification des types et compilation statiques sur la première bêta

    Groovy 2.0 : plus statique pour un meilleur dynamisme Invoke Dynamic, vérification des types et compilation statiques sur la première bêta [IMG]http://idelways.developpez.com/news/images/groovy.jpg[/IMG] L'équipe du projet Groovy surprend la communauté du langage en sortant la version 2.0 en bêta, alors que c'est la 1.9 qui était attendue. Comme c'est tendance actuellement, Groovy abandonne la numérotation des versions majeures en deuxième position (1.x), où la version 2.0 devient « mythique et semble ne jamais vouloir arriver », explique Guillaume Laforge, manager du projet. Le projet aura désormais une évolution majeure chaque année, mais cette versi...

    Read the article

  • Using sscanf for the dynamic delimiters (inputted by users) in parsing an input string in C? [migrated]

    - by Zuhakasa
    I'm encountering a problem in parsing a string using sscanf() function. My function gets 2 string parameters. One for input string and another one for a dynamic list of delimiters. How can I use sscanf to parse the input string with the defined delimiters inputted by users. For example: Myfunction(char * input_string, char * delimiter_list){ scanf("%s", input_string); scanf("%s", delimiter_list); sscanf(input_string, ???...); ................ }

    Read the article

  • Getting Memory allocation problem at UIBarButtonItem in Iphone sdk.

    - by monish
    Hi guys, Here I am getting memory allocation problem at UIBarButtonItem and the related code for that is: toolbar = [UIToolbar new]; toolbar.barStyle = UIBarStyleBlackOpaque; [toolbar setFrame:CGRectMake(0, 350,320,20)]; [self.view addSubview:toolbar]; UIBarButtonItem* barItem1 = [[UIBarButtonItem alloc] initWithBarButtonSystemItem:UIBarButtonSystemItemFlexibleSpace target:self action:@selector(categoryConfig:)] ; rightBarItem = [[UIBarButtonItem alloc] initWithBarButtonSystemItem:UIBarButtonSystemItemBookmarks target:self action:@selector(dialogOtherAction:)] ; UIBarButtonItem* barItem2 = [[UIBarButtonItem alloc] initWithBarButtonSystemItem:UIBarButtonSystemItemFlexibleSpace target:self action:@selector(categoryConfig:)] ; NSArray *items = [NSArray arrayWithObjects: barItem1,rightBarItem,barItem2, nil]; [barItem1 release]; [barItem2 release]; [rightBarItem release]; [toolbar setItems:items animated:NO]; after adding UIBarButtonItems into the array items I released them.even though its showing allocations at barbuttons. can any help me for this? Thank you, Monish.

    Read the article

  • SQL SERVER – Faster SQL Server Databases and Applications – Power and Control with SafePeak Caching Options

    - by Pinal Dave
    Update: This blog post is written based on the SafePeak, which is available for free download. Today, I’d like to examine more closely one of my preferred technologies for accelerating SQL Server databases, SafePeak. Safepeak’s software provides a variety of advanced data caching options, techniques and tools to accelerate the performance and scalability of SQL Server databases and applications. I’d like to look more closely at some of these options, as some of these capabilities could help you address lagging database and performance on your systems. To better understand the available options, it is best to start by understanding the difference between the usual “Basic Caching” vs. SafePeak’s “Dynamic Caching”. Basic Caching Basic Caching (or the stale and static cache) is an ability to put the results from a query into cache for a certain period of time. It is based on TTL, or Time-to-live, and is designed to stay in cache no matter what happens to the data. For example, although the actual data can be modified due to DML commands (update/insert/delete), the cache will still hold the same obsolete query data. Meaning that with the Basic Caching is really static / stale cache.  As you can tell, this approach has its limitations. Dynamic Caching Dynamic Caching (or the non-stale cache) is an ability to put the results from a query into cache while maintaining the cache transaction awareness looking for possible data modifications. The modifications can come as a result of: DML commands (update/insert/delete), indirect modifications due to triggers on other tables, executions of stored procedures with internal DML commands complex cases of stored procedures with multiple levels of internal stored procedures logic. When data modification commands arrive, the caching system identifies the related cache items and evicts them from cache immediately. In the dynamic caching option the TTL setting still exists, although its importance is reduced, since the main factor for cache invalidation (or cache eviction) become the actual data updates commands. Now that we have a basic understanding of the differences between “basic” and “dynamic” caching, let’s dive in deeper. SafePeak: A comprehensive and versatile caching platform SafePeak comes with a wide range of caching options. Some of SafePeak’s caching options are automated, while others require manual configuration. Together they provide a complete solution for IT and Data managers to reach excellent performance acceleration and application scalability for  a wide range of business cases and applications. Automated caching of SQL Queries: Fully/semi-automated caching of all “read” SQL queries, containing any types of data, including Blobs, XMLs, Texts as well as all other standard data types. SafePeak automatically analyzes the incoming queries, categorizes them into SQL Patterns, identifying directly and indirectly accessed tables, views, functions and stored procedures; Automated caching of Stored Procedures: Fully or semi-automated caching of all read” stored procedures, including procedures with complex sub-procedure logic as well as procedures with complex dynamic SQL code. All procedures are analyzed in advance by SafePeak’s  Metadata-Learning process, their SQL schemas are parsed – resulting with a full understanding of the underlying code, objects dependencies (tables, views, functions, sub-procedures) enabling automated or semi-automated (manually review and activate by a mouse-click) cache activation, with full understanding of the transaction logic for cache real-time invalidation; Transaction aware cache: Automated cache awareness for SQL transactions (SQL and in-procs); Dynamic SQL Caching: Procedures with dynamic SQL are pre-parsed, enabling easy cache configuration, eliminating SQL Server load for parsing time and delivering high response time value even in most complicated use-cases; Fully Automated Caching: SQL Patterns (including SQL queries and stored procedures) that are categorized by SafePeak as “read and deterministic” are automatically activated for caching; Semi-Automated Caching: SQL Patterns categorized as “Read and Non deterministic” are patterns of SQL queries and stored procedures that contain reference to non-deterministic functions, like getdate(). Such SQL Patterns are reviewed by the SafePeak administrator and in usually most of them are activated manually for caching (point and click activation); Fully Dynamic Caching: Automated detection of all dependent tables in each SQL Pattern, with automated real-time eviction of the relevant cache items in the event of “write” commands (a DML or a stored procedure) to one of relevant tables. A default setting; Semi Dynamic Caching: A manual cache configuration option enabling reducing the sensitivity of specific SQL Patterns to “write” commands to certain tables/views. An optimization technique relevant for cases when the query data is either known to be static (like archive order details), or when the application sensitivity to fresh data is not critical and can be stale for short period of time (gaining better performance and reduced load); Scheduled Cache Eviction: A manual cache configuration option enabling scheduling SQL Pattern cache eviction based on certain time(s) during a day. A very useful optimization technique when (for example) certain SQL Patterns can be cached but are time sensitive. Example: “select customers that today is their birthday”, an SQL with getdate() function, which can and should be cached, but the data stays relevant only until 00:00 (midnight); Parsing Exceptions Management: Stored procedures that were not fully parsed by SafePeak (due to too complex dynamic SQL or unfamiliar syntax), are signed as “Dynamic Objects” with highest transaction safety settings (such as: Full global cache eviction, DDL Check = lock cache and check for schema changes, and more). The SafePeak solution points the user to the Dynamic Objects that are important for cache effectiveness, provides easy configuration interface, allowing you to improve cache hits and reduce cache global evictions. Usually this is the first configuration in a deployment; Overriding Settings of Stored Procedures: Override the settings of stored procedures (or other object types) for cache optimization. For example, in case a stored procedure SP1 has an “insert” into table T1, it will not be allowed to be cached. However, it is possible that T1 is just a “logging or instrumentation” table left by developers. By overriding the settings a user can allow caching of the problematic stored procedure; Advanced Cache Warm-Up: Creating an XML-based list of queries and stored procedure (with lists of parameters) for periodically automated pre-fetching and caching. An advanced tool allowing you to handle more rare but very performance sensitive queries pre-fetch them into cache allowing high performance for users’ data access; Configuration Driven by Deep SQL Analytics: All SQL queries are continuously logged and analyzed, providing users with deep SQL Analytics and Performance Monitoring. Reduce troubleshooting from days to minutes with database objects and SQL Patterns heat-map. The performance driven configuration helps you to focus on the most important settings that bring you the highest performance gains. Use of SafePeak SQL Analytics allows continuous performance monitoring and analysis, easy identification of bottlenecks of both real-time and historical data; Cloud Ready: Available for instant deployment on Amazon Web Services (AWS). As you can see, there are many options to configure SafePeak’s SQL Server database and application acceleration caching technology to best fit a lot of situations. If you’re not familiar with their technology, they offer free-trial software you can download that comes with a free “help session” to help get you started. You can access the free trial here. Also, SafePeak is available to use on Amazon Cloud. Reference: Pinal Dave (http://blog.sqlauthority.com)Filed under: PostADay, SQL, SQL Authority, SQL Performance, SQL Query, SQL Server, SQL Tips and Tricks, T SQL

    Read the article

  • Question about memory allocation when initializing char arrays in C/C++.

    - by Carlos Nunez
    Before anything, I apologize if this question has been asked before. I am programming a simple packet sniffer for a class project. For a little while, I ran into the issue where the source and destination of a packet appeared to be the same. For example, the source and destination of an Ethernet frame would be the same MAC address all of the time. I custom-made ether_ntoa(char *) because Windows does not seem to have ethernet.h like Linux does. Code snippet is below: char *ether_ntoa(u_char etheraddr[ETHER_ADDR_LEN]) { int i, j; char eout[32]; for(i = 0, j = 0; i < 5; i++) { eout[j++] = etheraddr[i] >> 4; eout[j++] = etheraddr[i] & 0xF; eout[j++] = ':'; } eout[j++] = etheraddr[i] >> 4; eout[j++] = etheraddr[i] & 0xF; eout[j++] = '\0'; for(i = 0; i < 17; i++) { if(eout[i] < 10) eout[i] += 0x30; else if(eout[i] < 16) eout[i] += 0x57; } return(eout); } I solved the problem by using malloc() to have the compiler assign memory (i.e. instead of char eout[32], I used char * eout; eout = (char *) malloc (32);). However, I thought that the compiler assigned different memory locations when one sized a char-array at compile time. Is this incorrect? Thanks! Carlos Nunez

    Read the article

  • How much is too much memory allocation in NDK?

    - by Maximus
    The NDK download page notes that, "Typical good candidates for the NDK are self-contained, CPU-intensive operations that don't allocate much memory, such as signal processing, physics simulation, and so on." I came from a C background and was excited to try to use the NDK to operate most of my OpenGL ES functions and any native functions related to physics, animation of vertices, etc... I'm finding that I'm relying quite a bit on Native code and wondering if I may be making some mistakes. I've had no trouble with testing at this point, but I'm curious if I may run into problems in the future. For example, I have game struct defined (somewhat like is seen in the San-Angeles example). I'm loading vertex information for objects dynamically (just what is needed for an active game area) so there's quite a bit of memory allocation happening for vertices, normals, texture coordinates, indices and texture graphic data... just to name the essentials. I'm quite careful about freeing what is allocated between game areas. Would I be safer setting some caps on array sizes or should I charge bravely forward as I'm going now?

    Read the article

  • SAP dévoile Business Object 4.0, la nouvelle version de sa solution BI intègre la mobilité, les réseaux sociaux et le « in-memory »

    SAP dévoile Business Object 4.0 La nouvelle version de sa solution BI intègre la mobilité, les réseaux sociaux et le « in-memory » SAP vient de dévoiler Business Object 4.0, la prochaine version de sa plate-forme de nouvelle génération de Business Intelligence et de Gestion d'Information d'Entreprise (EIM). [IMG]http://ftp-developpez.com/gordon-fowler/SAP/Slide-5-SAP-BusinessObjects-4.0-Event-Insight2.jpg[/IMG] Après SAP ByDesign 2.6, sa suite ERP en mode SaaS (qui arrive avec un tout nouveau SDK), Business Object 4.0 est la deuxième très grosse annonce de cette année 2011 que Nicolas Sekkaki, Direc...

    Read the article

  • C# Performance Pitfall – Interop Scenarios Change the Rules

    - by Reed
    C# and .NET, overall, really do have fantastic performance in my opinion.  That being said, the performance characteristics dramatically differ from native programming, and take some relearning if you’re used to doing performance optimization in most other languages, especially C, C++, and similar.  However, there are times when revisiting tricks learned in native code play a critical role in performance optimization in C#. I recently ran across a nasty scenario that illustrated to me how dangerous following any fixed rules for optimization can be… The rules in C# when optimizing code are very different than C or C++.  Often, they’re exactly backwards.  For example, in C and C++, lifting a variable out of loops in order to avoid memory allocations often can have huge advantages.  If some function within a call graph is allocating memory dynamically, and that gets called in a loop, it can dramatically slow down a routine. This can be a tricky bottleneck to track down, even with a profiler.  Looking at the memory allocation graph is usually the key for spotting this routine, as it’s often “hidden” deep in call graph.  For example, while optimizing some of my scientific routines, I ran into a situation where I had a loop similar to: for (i=0; i<numberToProcess; ++i) { // Do some work ProcessElement(element[i]); } .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } This loop was at a fairly high level in the call graph, and often could take many hours to complete, depending on the input data.  As such, any performance optimization we could achieve would be greatly appreciated by our users. After a fair bit of profiling, I noticed that a couple of function calls down the call graph (inside of ProcessElement), there was some code that effectively was doing: // Allocate some data required DataStructure* data = new DataStructure(num); // Call into a subroutine that passed around and manipulated this data highly CallSubroutine(data); // Read and use some values from here double values = data->Foo; // Cleanup delete data; // ... return bar; Normally, if “DataStructure” was a simple data type, I could just allocate it on the stack.  However, it’s constructor, internally, allocated it’s own memory using new, so this wouldn’t eliminate the problem.  In this case, however, I could change the call signatures to allow the pointer to the data structure to be passed into ProcessElement and through the call graph, allowing the inner routine to reuse the same “data” memory instead of allocating.  At the highest level, my code effectively changed to something like: DataStructure* data = new DataStructure(numberToProcess); for (i=0; i<numberToProcess; ++i) { // Do some work ProcessElement(element[i], data); } delete data; Granted, this dramatically reduced the maintainability of the code, so it wasn’t something I wanted to do unless there was a significant benefit.  In this case, after profiling the new version, I found that it increased the overall performance dramatically – my main test case went from 35 minutes runtime down to 21 minutes.  This was such a significant improvement, I felt it was worth the reduction in maintainability. In C and C++, it’s generally a good idea (for performance) to: Reduce the number of memory allocations as much as possible, Use fewer, larger memory allocations instead of many smaller ones, and Allocate as high up the call stack as possible, and reuse memory I’ve seen many people try to make similar optimizations in C# code.  For good or bad, this is typically not a good idea.  The garbage collector in .NET completely changes the rules here. In C#, reallocating memory in a loop is not always a bad idea.  In this scenario, for example, I may have been much better off leaving the original code alone.  The reason for this is the garbage collector.  The GC in .NET is incredibly effective, and leaving the allocation deep inside the call stack has some huge advantages.  First and foremost, it tends to make the code more maintainable – passing around object references tends to couple the methods together more than necessary, and overall increase the complexity of the code.  This is something that should be avoided unless there is a significant reason.  Second, (unlike C and C++) memory allocation of a single object in C# is normally cheap and fast.  Finally, and most critically, there is a large advantage to having short lived objects.  If you lift a variable out of the loop and reuse the memory, its much more likely that object will get promoted to Gen1 (or worse, Gen2).  This can cause expensive compaction operations to be required, and also lead to (at least temporary) memory fragmentation as well as more costly collections later. As such, I’ve found that it’s often (though not always) faster to leave memory allocations where you’d naturally place them – deep inside of the call graph, inside of the loops.  This causes the objects to stay very short lived, which in turn increases the efficiency of the garbage collector, and can dramatically improve the overall performance of the routine as a whole. In C#, I tend to: Keep variable declarations in the tightest scope possible Declare and allocate objects at usage While this tends to cause some of the same goals (reducing unnecessary allocations, etc), the goal here is a bit different – it’s about keeping the objects rooted for as little time as possible in order to (attempt) to keep them completely in Gen0, or worst case, Gen1.  It also has the huge advantage of keeping the code very maintainable – objects are used and “released” as soon as possible, which keeps the code very clean.  It does, however, often have the side effect of causing more allocations to occur, but keeping the objects rooted for a much shorter time. Now – nowhere here am I suggesting that these rules are hard, fast rules that are always true.  That being said, my time spent optimizing over the years encourages me to naturally write code that follows the above guidelines, then profile and adjust as necessary.  In my current project, however, I ran across one of those nasty little pitfalls that’s something to keep in mind – interop changes the rules. In this case, I was dealing with an API that, internally, used some COM objects.  In this case, these COM objects were leading to native allocations (most likely C++) occurring in a loop deep in my call graph.  Even though I was writing nice, clean managed code, the normal managed code rules for performance no longer apply.  After profiling to find the bottleneck in my code, I realized that my inner loop, a innocuous looking block of C# code, was effectively causing a set of native memory allocations in every iteration.  This required going back to a “native programming” mindset for optimization.  Lifting these variables and reusing them took a 1:10 routine down to 0:20 – again, a very worthwhile improvement. Overall, the lessons here are: Always profile if you suspect a performance problem – don’t assume any rule is correct, or any code is efficient just because it looks like it should be Remember to check memory allocations when profiling, not just CPU cycles Interop scenarios often cause managed code to act very differently than “normal” managed code. Native code can be hidden very cleverly inside of managed wrappers

    Read the article

  • tile_static, tile_barrier, and tiled matrix multiplication with C++ AMP

    - by Daniel Moth
    We ended the previous post with a mechanical transformation of the C++ AMP matrix multiplication example to the tiled model and in the process introduced tiled_index and tiled_grid. This is part 2. tile_static memory You all know that in regular CPU code, static variables have the same value regardless of which thread accesses the static variable. This is in contrast with non-static local variables, where each thread has its own copy. Back to C++ AMP, the same rules apply and each thread has its own value for local variables in your lambda, whereas all threads see the same global memory, which is the data they have access to via the array and array_view. In addition, on an accelerator like the GPU, there is a programmable cache, a third kind of memory type if you'd like to think of it that way (some call it shared memory, others call it scratchpad memory). Variables stored in that memory share the same value for every thread in the same tile. So, when you use the tiled model, you can have variables where each thread in the same tile sees the same value for that variable, that threads from other tiles do not. The new storage class for local variables introduced for this purpose is called tile_static. You can only use tile_static in restrict(direct3d) functions, and only when explicitly using the tiled model. What this looks like in code should be no surprise, but here is a snippet to confirm your mental image, using a good old regular C array // each tile of threads has its own copy of locA, // shared among the threads of the tile tile_static float locA[16][16]; Note that tile_static variables are scoped and have the lifetime of the tile, and they cannot have constructors or destructors. tile_barrier In amp.h one of the types introduced is tile_barrier. You cannot construct this object yourself (although if you had one, you could use a copy constructor to create another one). So how do you get one of these? You get it, from a tiled_index object. Beyond the 4 properties returning index objects, tiled_index has another property, barrier, that returns a tile_barrier object. The tile_barrier class exposes a single member, the method wait. 15: // Given a tiled_index object named t_idx 16: t_idx.barrier.wait(); 17: // more code …in the code above, all threads in the tile will reach line 16 before a single one progresses to line 17. Note that all threads must be able to reach the barrier, i.e. if you had branchy code in such a way which meant that there is a chance that not all threads could reach line 16, then the code above would be illegal. Tiled Matrix Multiplication Example – part 2 So now that we added to our understanding the concepts of tile_static and tile_barrier, let me obfuscate rewrite the matrix multiplication code so that it takes advantage of tiling. Before you start reading this, I suggest you get a cup of your favorite non-alcoholic beverage to enjoy while you try to fully understand the code. 01: void MatrixMultiplyTiled(vector<float>& vC, const vector<float>& vA, const vector<float>& vB, int M, int N, int W) 02: { 03: static const int TS = 16; 04: array_view<const float,2> a(M, W, vA); 05: array_view<const float,2> b(W, N, vB); 06: array_view<writeonly<float>,2> c(M,N,vC); 07: parallel_for_each(c.grid.tile< TS, TS >(), 08: [=] (tiled_index< TS, TS> t_idx) restrict(direct3d) 09: { 10: int row = t_idx.local[0]; int col = t_idx.local[1]; 11: float sum = 0.0f; 12: for (int i = 0; i < W; i += TS) { 13: tile_static float locA[TS][TS], locB[TS][TS]; 14: locA[row][col] = a(t_idx.global[0], col + i); 15: locB[row][col] = b(row + i, t_idx.global[1]); 16: t_idx.barrier.wait(); 17: for (int k = 0; k < TS; k++) 18: sum += locA[row][k] * locB[k][col]; 19: t_idx.barrier.wait(); 20: } 21: c[t_idx.global] = sum; 22: }); 23: } Notice that all the code up to line 9 is the same as per the changes we made in part 1 of tiling introduction. If you squint, the body of the lambda itself preserves the original algorithm on lines 10, 11, and 17, 18, and 21. The difference being that those lines use new indexing and the tile_static arrays; the tile_static arrays are declared and initialized on the brand new lines 13-15. On those lines we copy from the global memory represented by the array_view objects (a and b), to the tile_static vanilla arrays (locA and locB) – we are copying enough to fit a tile. Because in the code that follows on line 18 we expect the data for this tile to be in the tile_static storage, we need to synchronize the threads within each tile with a barrier, which we do on line 16 (to avoid accessing uninitialized memory on line 18). We also need to synchronize the threads within a tile on line 19, again to avoid the race between lines 14, 15 (retrieving the next set of data for each tile and overwriting the previous set) and line 18 (not being done processing the previous set of data). Luckily, as part of the awesome C++ AMP debugger in Visual Studio there is an option that helps you find such races, but that is a story for another blog post another time. May I suggest reading the next section, and then coming back to re-read and walk through this code with pen and paper to really grok what is going on, if you haven't already? Cool. Why would I introduce this tiling complexity into my code? Funny you should ask that, I was just about to tell you. There is only one reason we tiled our extent, had to deal with finding a good tile size, ensure the number of threads we schedule are correctly divisible with the tile size, had to use a tiled_index instead of a normal index, and had to understand tile_barrier and to figure out where we need to use it, and double the size of our lambda in terms of lines of code: the reason is to be able to use tile_static memory. Why do we want to use tile_static memory? Because accessing tile_static memory is around 10 times faster than accessing the global memory on an accelerator like the GPU, e.g. in the code above, if you can get 150GB/second accessing data from the array_view a, you can get 1500GB/second accessing the tile_static array locA. And since by definition you are dealing with really large data sets, the savings really pay off. We have seen tiled implementations being twice as fast as their non-tiled counterparts. Now, some algorithms will not have performance benefits from tiling (and in fact may deteriorate), e.g. algorithms that require you to go only once to global memory will not benefit from tiling, since with tiling you already have to fetch the data once from global memory! Other algorithms may benefit, but you may decide that you are happy with your code being 150 times faster than the serial-version you had, and you do not need to invest to make it 250 times faster. Also algorithms with more than 3 dimensions, which C++ AMP supports in the non-tiled model, cannot be tiled. Also note that in future releases, we may invest in making the non-tiled model, which already uses tiling under the covers, go the extra step and use tile_static memory on your behalf, but it is obviously way to early to commit to anything like that, and we certainly don't do any of that today. Comments about this post by Daniel Moth welcome at the original blog.

    Read the article

< Previous Page | 231 232 233 234 235 236 237 238 239 240 241 242  | Next Page >