Search Results

Search found 13461 results on 539 pages for 'optimizing performance'.


  • IIS/ASP.NET performance incident - Perfmon Current Anonymous Users going through the roof but Requests/sec low

    - by Laurence
    Setup: ASP.NET 4.0 website on IIS 6.0 on Win 2003 64 bit, 8 CPUs, 16GB memory, separate SQL 2005 DB server.

    We had a serious slowdown today on an otherwise fairly well performing ASP.NET site. For a couple of hours all page requests took a very long time to be served, e.g. 30-60s compared to the usual 2s. The CPU and memory usage of w3wp.exe on the webserver was not much higher than normal. The application pool was not in the middle of recycling (and it hadn't recycled for several hours). Bottlenecks in the database were ruled out: no blocking was occurring and query results were being returned quickly.

    I couldn't make any sense of it, so I set up the following Perfmon counters:

      - Current Anonymous Users (for the site in question)
      - Get requests/sec (ditto)
      - Requests/sec for the ASP.NET application running the site

    Get requests/sec was averaging 100-150. Requests/sec for ASP.NET was averaging 5-10. However, Current Anonymous Users was around 200. Then, as I was watching, Current Anonymous Users began to climb steeply, reaching about 500 within a few minutes. All this time Get requests/sec and Requests/sec for ASP.NET were, if anything, going down.

    I did a whole load of things (in a panic!) to try to get the site working, like shutting it down, recycling the app pool, and adding another worker process to the pool. I also extended the expiration time for content (in IIS under HTTP Headers) in an attempt to lower the number of requests for static files (there are a lot of images on the site).

    The site is now back to normal, and the counters are fairly steady, reading (with a Current Connections counter added):

      - Current Anonymous Users: average 30
      - Get requests/sec: average 100
      - Requests/sec for ASP.NET: 5
      - Current Connections: average 300

    I have also observed an inverse relationship between Get requests/sec and Current Anonymous Users. Usually both are fairly steady, but there are short periods when Get requests/sec goes down dramatically and Current Anonymous Users goes up in a perfect mirror image. Then they flip back to their usual levels.

    So, my questions are:

      1. Thinking of the original performance issue: if w3wp.exe CPU and memory usage were normal and there was no DB bottleneck, what could explain page requests taking 20 times longer to be served than usual? What other counters should I be looking at if this happens again?
      2. What explains the inverse relationship between Get requests/sec and Current Anonymous Users?
      3. What could explain Current Anonymous Users going from 200 to 500 within a few minutes?

    Many thanks for any insight into this.


  • Baseline / Benchmark Physical and virtual server performance

    - by EyeonTech
    I am setting up a new server and there are some options. I want to run some benchmarks, and I need your help in determining the best tools and, if possible, pre-configured benchmarks designed for SQL servers on Windows Server 2008/2012.

    Step 1. Run a performance monitor on the current live SQL server (a Windows Server 2008 virtual machine running on ESXi).

    New server hardware rundown:

      - Intel® Server System R1304BTLSHBN - 1U Rack, LGA1155 (http://ark.intel.com/products/53559/Intel-Server-System-R1304BTLSHBN)
      - Intel Xeon E3-1270V2
      - 2x Intel SSD 330 Series 240GB 2.5in SATA 6Gb/s 25nm
      - 1x WD 2TB WD2002FAEX 64M SATA3 Caviar Black
      - 4x 8GB 1333MHz DDR3 ECC CL9 DIMM

    There are several options for configurations and I want to benchmark some of them and share the results.

    Option 1. Configure 2x SSDs at RAID 0. Install Windows Server 2008 directly to the 2TB WD Caviar HDD. Store database files on the RAID 0 volume. Benchmark the OS directly on the hardware as an SQL Server. Store SQL backup databases on the 2TB WD Caviar HDD.

    Option 2. Configure 2x SSDs at RAID 0. Install Windows Server 2012 directly to the 2TB WD Caviar HDD. Install Hyper-V. Install the SQL Server (Server 2008) as a virtual machine. Store the virtual hard disks on the SSDs.

    Option 3. Configure 2x SSDs at RAID 0. Install VMware ESXi on a partition of the 2TB WD Caviar HDD. Install the SQL Server (Server 2008) as a virtual machine. Store the virtual hard disks on the SSDs.

    I have a few tools in mind from http://technet.microsoft.com/en-us/library/cc768530(v=bts.10).aspx. Any tools with pre-configured tests would be fantastic, specifically if there are pre-configured perfmon sets available. Any opinions on the setup to gain the best results are welcome. Thanks in advance.


  • MySQL performance over a (local) network much slower than I would expect

    - by user15241
    MySQL queries in my production environment are taking much longer than I would expect them to. The site in question is a fairly large Drupal site, with many modules installed. The webserver (Nginx) and database server (MySQL) are hosted on separate machines, connected by a 100 Mbps LAN connection (hosted by Rackspace). I have the exact same site running on my laptop for development. Obviously, on my laptop, the webserver and database server are on the same box.

    Here are the results of my database query times:

    Production:
      Executed 291 queries in 320.33 milliseconds. (homepage)
      Executed 517 queries in 999.81 milliseconds. (content page)

    Development:
      Executed 316 queries in 46.28 milliseconds. (homepage)
      Executed 586 queries in 79.09 milliseconds. (content page)

    As can clearly be seen from these results, the time spent querying the MySQL database is much shorter on my laptop, where the MySQL server is running on the same machine as the web server. Why is this?!

    One factor must be the network latency. On average, a round trip from the webserver to the database server takes 0.16ms (shown by ping). That must be added to every single MySQL query. So, taking the content page example above, where 517 queries are executed, network latency alone will add about 82ms to the total query time. However, that doesn't account for the difference I am seeing (79ms on my laptop vs 999ms on the production boxes).

    What other factors should I be looking at? I had thought about upgrading the NIC to a gigabit connection, but clearly there is something else involved. I have run the MySQL performance tuning script from http://www.day32.com/MySQL/ and it tells me that my database server is configured well (better than my laptop, apparently). The only problem reported is "Of 4394 temp tables, 48% were created on disk". This is true in both environments, and in the production environment I have even tried increasing max_heap_table_size and tmp_table_size to 1GB, with no change (I think this is because I have some BLOB and TEXT columns).
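
    The latency estimate in the post can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted above; it assumes, as the post does, exactly one network round trip per query, and it uses the laptop's per-query time as a rough stand-in for pure query execution cost.

```python
# Figures quoted in the post.
PROD_QUERIES = 517        # content page, production
PROD_TOTAL_MS = 999.81
DEV_QUERIES = 586         # content page, development laptop
DEV_TOTAL_MS = 79.09
PING_RTT_MS = 0.16        # measured with ping between web and DB server

# Assumption: one network round trip per query.
network_overhead_ms = PROD_QUERIES * PING_RTT_MS

# Average time per query in each environment.
prod_per_query_ms = PROD_TOTAL_MS / PROD_QUERIES
dev_per_query_ms = DEV_TOTAL_MS / DEV_QUERIES

# Time left over after subtracting ping latency and a laptop-speed execution cost.
unexplained_ms = PROD_TOTAL_MS - network_overhead_ms - dev_per_query_ms * PROD_QUERIES

print(f"network overhead:   {network_overhead_ms:.2f} ms")
print(f"per query (prod):   {prod_per_query_ms:.3f} ms")
print(f"per query (laptop): {dev_per_query_ms:.3f} ms")
print(f"unexplained:        {unexplained_ms:.2f} ms")
```

    Under these assumptions ping latency explains only a small part of the gap, which is consistent with the poster's conclusion that something other than raw round-trip time is involved.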


  • Performance Tuning a High-Load Apache Server

    - by futureal
    I am looking to understand some server performance problems I am seeing with a (for us) heavily loaded web server. The environment is as follows:

      - Debian Lenny (all stable packages + patched to security updates)
      - Apache 2.2.9
      - PHP 5.2.6
      - Amazon EC2 large instance

    The behavior we're seeing is that the web server typically feels responsive, but with a slight delay before it begins handling a request -- sometimes a fraction of a second, sometimes 2-3 seconds in our peak usage times. The actual load on the server is being reported as very high -- often 10.xx or 20.xx as reported by top. Further, running other things on the server during these times (even vi) is very slow, so the load is definitely up there. Oddly enough, Apache remains very responsive, other than that initial delay.

    We have Apache configured as follows, using prefork:

      StartServers          5
      MinSpareServers       5
      MaxSpareServers      10
      MaxClients          150
      MaxRequestsPerChild   0

    And KeepAlive as:

      KeepAlive            On
      MaxKeepAliveRequests 100
      KeepAliveTimeout     5

    Looking at the server-status page, even at these times of heavy load we are rarely hitting the client cap, usually serving between 80 and 100 requests, many of those in the keepalive state. That tells me to rule out "waiting for a handler" as the cause of the initial request slowness, but I may be wrong. Amazon's CloudWatch monitoring tells me that even when our OS is reporting a load of 15, our instance CPU utilization is between 75% and 80%.

    Example output from top:

      top - 15:47:06 up 31 days,  1:38,  8 users,  load average: 11.46, 7.10, 6.56
      Tasks: 221 total,  28 running, 193 sleeping,   0 stopped,   0 zombie
      Cpu(s): 66.9%us, 22.1%sy,  0.0%ni,  2.6%id,  3.1%wa,  0.0%hi,  0.7%si,  4.5%st
      Mem:   7871900k total,  7850624k used,    21276k free,    68728k buffers
      Swap:        0k total,        0k used,        0k free,  3750664k cached

    The majority of the processes look like:

      24720 www-data  15   0  202m  26m 4412 S    9  0.3   0:02.97 apache2
      24530 www-data  15   0  212m  35m 4544 S    7  0.5   0:03.05 apache2
      24846 www-data  15   0  209m  33m 4420 S    7  0.4   0:01.03 apache2
      24083 www-data  15   0  211m  35m 4484 S    7  0.5   0:07.14 apache2
      24615 www-data  15   0  212m  35m 4404 S    7  0.5   0:02.89 apache2

    Example output from vmstat at the same time as the above:

      procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
       r  b   swpd   free   buff   cache   si   so    bi    bo   in   cs us sy id wa
       8  0      0 215084  68908 3774864    0    0   154   228    5    7 32 12 42  9
       6 21      0 198948  68936 3775740    0    0   676  2363 4022 1047 56 16  9 15
      23  0      0 169460  68936 3776356    0    0   432  1372 3762  835 76 21  0  0
      23  1      0 140412  68936 3776648    0    0   280     0 3157  827 70 25  0  0
      20  1      0 115892  68936 3776792    0    0   188     8 2802  532 68 24  0  0
       6  1      0 133368  68936 3777780    0    0   752    71 3501  878 67 29  0  1
       0  1      0 146656  68944 3778064    0    0   308  2052 3312  850 38 17 19 24
       2  0      0 202104  68952 3778140    0    0    28    90 2617  700 44 13 33  5
       9  0      0 188960  68956 3778200    0    0     8     0 2226  475 59 17  6  2
       3  0      0 166364  68956 3778252    0    0     0    21 2288  386 65 19  1  0

    And finally, output from Apache's server-status:

      Server uptime: 31 days 2 hours 18 minutes 31 seconds
      Total accesses: 60102946 - Total Traffic: 974.5 GB
      CPU Usage: u209.62 s75.19 cu0 cs0 - .0106% CPU load
      22.4 requests/sec - 380.3 kB/second - 17.0 kB/request
      107 requests currently being processed, 6 idle workers

      C.KKKW..KWWKKWKW.KKKCKK..KKK.KKKK.KK._WK.K.K.KKKKK.K.R.KK..C.C.K
      K.C.K..WK_K..KKW_CK.WK..W.KKKWKCKCKW.W_KKKKK.KKWKKKW._KKK.CKK...
      KK_KWKKKWKCKCWKK.KKKCK..........................................
      ................................................................

    From my limited experience I draw the following conclusions/questions:

      - We may be allowing far too many KeepAlive requests.
      - I do see some time spent waiting for IO in the vmstat output, although not consistently and not a lot (I think?), so I am not sure whether this is a big concern; I am less experienced with vmstat.
      - Also in vmstat, I see in some iterations a number of processes waiting to be served, which is what I am attributing the initial page load delay on our web server to, possibly erroneously.
      - We serve a mixture of static content (75% or higher) and script content, and the script content is often fairly processor intensive, so finding the right balance between the two is important; long term we want to move statics elsewhere to optimize both servers, but our software is not ready for that today.

    I am happy to provide additional information if anybody has any ideas. The other note is that this is a high-availability production installation, so I am wary of making tweak after tweak, which is why I haven't played with things like the KeepAlive value myself yet.
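
    To put a number on the keepalive observation, the server-status scoreboard can be tallied per worker state. The sketch below is illustrative only: the state letters follow the key that Apache's mod_status prints under the scoreboard, the sample row is the first scoreboard row from the post, and the helper name tally_scoreboard is made up for this example.

```python
from collections import Counter

# Meaning of each scoreboard character, per the key printed by mod_status.
STATES = {
    "_": "waiting for connection",
    "S": "starting up",
    "R": "reading request",
    "W": "sending reply",
    "K": "keepalive (read)",
    "D": "DNS lookup",
    "C": "closing connection",
    "L": "logging",
    "G": "gracefully finishing",
    "I": "idle cleanup of worker",
    ".": "open slot with no current process",
}

def tally_scoreboard(scoreboard: str) -> dict:
    """Count workers per state; characters outside the key (e.g. newlines) are ignored."""
    counts = Counter(ch for ch in scoreboard if ch in STATES)
    return {STATES[ch]: n for ch, n in counts.most_common()}

# First scoreboard row quoted in the post.
row = "C.KKKW..KWWKKWKW.KKKCKK..KKK.KKKK.KK._WK.K.K.KKKKK.K.R.KK..C.C.K"

for state, n in tally_scoreboard(row).items():
    print(f"{n:3d}  {state}")
```

    Feeding the whole scoreboard through the same function shows how many of the busy workers are merely holding keepalive connections rather than generating a reply, which is the figure that matters when judging KeepAliveTimeout.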


  • What is recommended minimum object size for gzip performance benefits?

    - by utt73
    I'm working on improving page display times, and one of the methods is to gzip content from the webserver. Google recommends:

      "Note that gzipping is only beneficial for larger resources. Due to the overhead and latency of compression and decompression, you should only gzip files above a certain size threshold; we recommend a minimum range between 150 and 1000 bytes. Gzipping files below 150 bytes can actually make them larger."

    We serve our content through Akamai, using their network as a proxy and CDN. What they've told me:

      "Following up on your question regarding what is the minimum size Akamai will compress the requested object when sending it to the end user: The minimum size is 860 bytes."

    My reply:

      "What are the reasons Akamai's minimum size is 860 bytes? And why, for example, is this not the case for files Akamai serves for facebook? (see below) Google recommends gzipping more aggressively. And that seems appropriate on our site, where the most frequent hits, by far, are AJAX calls that are <860 bytes."

    Akamai's response:

      "The reasons 860 bytes is the minimum size for compression are twofold: (1) The overhead of compressing an object under 860 bytes outweighs the performance gain. (2) Objects under 860 bytes can be transmitted via a single packet anyway, so there isn't a compelling reason to compress them."

    So I'm here for some fact checking. Is the 860 byte limit due to packet size the end of this reasoning? Why would high traffic sites push this down to the 150 byte limit... just to save on bandwidth costs (since CDNs base their charges on bandwidth offloaded from origin), or is there a performance gain in doing so?
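
    The claim that gzipping very small objects can make them larger is easy to check. The following is a minimal sketch using Python's standard gzip module; the two payloads are made-up stand-ins for a tiny AJAX response and a larger, repetitive one, not data from the site in question.

```python
import gzip

def gzip_sizes(payload: bytes) -> tuple:
    """Return (original size, gzip-compressed size) in bytes."""
    return len(payload), len(gzip.compress(payload))

# Hypothetical payloads, for illustration only.
small = b'{"status":"ok","id":12345}'          # a tiny AJAX-style response
large = b'{"status":"ok","id":12345},' * 100   # a larger, repetitive response

for name, payload in (("small", small), ("large", large)):
    raw, packed = gzip_sizes(payload)
    print(f"{name}: {raw} bytes raw, {packed} bytes gzipped")
```

    The gzip container alone adds 18 bytes of header and trailer, so the small payload grows while the large one shrinks considerably. Where the break-even point lies for real responses depends on their content, so it is worth measuring against actual AJAX payloads rather than relying on either quoted threshold.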


  • How do I measure performance of a virtual server?

    - by Sergey
    I've got a VPS running Ubuntu. Since it is a virtual server, I understand that it shares resources with an unknown number of other servers, and I'm noticing that it's considerably slower than my desktop machine. Is there some tool to measure the performance of the virtual machine? I'd be curious to see some approximate measure similar to bogomips, possibly for CPU (operations/sec), memory and disk read/write speed. I'd like to be able to compare those numbers to my desktop machine.

    I'm not interested in the specs of the actual physical machine my VPS is running on. By doing cat /proc/cpuinfo I can see that it's a nice quad-core Xeon machine, but that doesn't matter to me. I'm basically interested in how fast a program would run in my VPS: how many CPU operations it can perform in a second, and how many bytes it can write to RAM or to disk. I only have ssh access to the machine, so the tool needs to be command-line.

    I could write a script which, say, does some calculations in a loop for a second and counts how many loops it was able to do, or something similar to measure disk and RAM performance. But I'm sure something like this already exists.
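
    The do-it-yourself script described in the last paragraph could look roughly like this. It is a crude sketch, not a substitute for a real benchmark suite: the function names and the arithmetic in the loop are arbitrary choices, and the disk figure is only meaningful if the temporary directory is on a real disk rather than tmpfs.

```python
import os
import tempfile
import time

def cpu_loops_per_second(duration: float = 1.0) -> float:
    """Count how many simple arithmetic iterations complete in `duration` seconds."""
    loops = 0
    x = 0
    deadline = time.perf_counter() + duration
    while time.perf_counter() < deadline:
        x = (x * 31 + 7) % 1000003   # arbitrary integer work
        loops += 1
    return loops / duration

def disk_write_mb_per_second(size_mb: int = 16) -> float:
    """Write `size_mb` MiB to a temporary file, fsync it, and return MiB/s."""
    chunk = os.urandom(1024 * 1024)
    with tempfile.NamedTemporaryFile() as f:
        start = time.perf_counter()
        for _ in range(size_mb):
            f.write(chunk)
        f.flush()
        os.fsync(f.fileno())         # make sure the data really reached the device
        elapsed = time.perf_counter() - start
    return size_mb / elapsed

if __name__ == "__main__":
    print(f"CPU:  {cpu_loops_per_second():,.0f} loops/s")
    print(f"Disk: {disk_write_mb_per_second():,.1f} MiB/s written")
```

    Running the same script on the VPS and on the desktop gives numbers that are comparable with each other, though not with anyone else's results. On a shared host the figures will also vary between runs, so several runs at different times of day are more informative than one.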


  • Performing client-side OAuth authorized Twitter API calls versus server side, how much of a difference is there in terms of performance?

    - by Terence Ponce
    I'm working on a Twitter application in Ruby on Rails. One of the biggest arguments that I have with other people on the project is the method of calling the Twitter API. Before, everything was done on the server: OAuth login, updating the user's Twitter data, and retrieving tweets. Retrieving tweets was the heaviest thing to do since we don't store the tweets in our database, so viewing the tweets means that we have to call the API every time.

    One of the people on the project suggested that we fetch the tweets through Javascript instead to lessen the load on the server. We used GET search, which, correct me if I'm wrong, will be removed when v1.0 becomes completely deprecated, but that really isn't a concern now. When the Twitter API has migrated completely to v1.1 (again, correct me if I'm wrong), every call to the API must be authenticated, so we have to authenticate our Javascript requests to the API. As said here:

      "We don't support or recommend performing OAuth directly through Javascript -- it's insecure and puts your application at risk. The only acceptable way to perform it is if you kept all keys and secrets server-side, computed the OAuth signatures and parameters server side, then issued the request client-side from the server-generated OAuth values."

    If we do exactly what Twitter suggests, the only difference between this and doing everything server-side is that our server won't have to contact the Twitter API anymore every time the user wants to view tweets. Here's how I would picture what's happening every time the user makes a request:

    If we do it through Javascript, it would be harder on my part because I would have to create the signatures manually for every request, but I will gladly do it if the boost in performance is worth all the trouble. Doing it through Ruby on Rails would be very easy since the Twitter gem does most of the grunt work already, so I'm really encouraging the other people on the project to agree with me.

    Is the difference in performance trivial, or is it significant enough to switch to Javascript?


  • How to squeeze the maximum performance out of Unity and GNOME 3?

    - by melvincv
    I see that I do not get good performance with the new Unity desktop, although I should say that Unity has improved a lot since the last release, Ubuntu 11.10. How can I squeeze the maximum performance out of:

      1. Unity
      2. GNOME 3

    My system specs:

    -Processors-
      Intel(R) Pentium(R) Dual CPU E2180 @ 2.00GHz

    -Memory-
      Total Memory : 2049996 kB

    -PCI Devices-
      Host bridge : Intel Corporation 82G33/G31/P35/P31 Express DRAM Controller (rev 10)
      PCI bridge : Intel Corporation 82G33/G31/P35/P31 Express PCI Express Root Port (rev 10) (prog-if 00 [Normal decode])
      VGA compatible controller : Intel Corporation 82G33/G31 Express Integrated Graphics Controller (rev 10) (prog-if 00 [VGA controller])
      USB controller : Intel Corporation N10/ICH 7 Family USB UHCI Controller #1 (rev 01) (prog-if 00 [UHCI])
      USB controller : Intel Corporation N10/ICH 7 Family USB UHCI Controller #2 (rev 01) (prog-if 00 [UHCI])
      USB controller : Intel Corporation N10/ICH 7 Family USB UHCI Controller #3 (rev 01) (prog-if 00 [UHCI])
      USB controller : Intel Corporation N10/ICH 7 Family USB UHCI Controller #4 (rev 01) (prog-if 00 [UHCI])
      USB controller : Intel Corporation N10/ICH 7 Family USB2 EHCI Controller (rev 01) (prog-if 20 [EHCI])
      PCI bridge : Intel Corporation 82801 PCI Bridge (rev e1) (prog-if 01 [Subtractive decode])
      ISA bridge : Intel Corporation 82801GB/GR (ICH7 Family) LPC Interface Bridge (rev 01)
      IDE interface : Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01) (prog-if 8a [Master SecP PriP])
      IDE interface : Intel Corporation N10/ICH7 Family SATA Controller [IDE mode] (rev 01) (prog-if 8f [Master SecP SecO PriP PriO])
      SMBus : Intel Corporation N10/ICH 7 Family SMBus Controller (rev 01)
      Ethernet controller : Intel Corporation PRO/100 VE Network Connection (rev 01)


  • Optimizing AES modes on Solaris for Intel Westmere

    - by danx
    Review

    AES is a strong method of symmetric (secret-key) encryption. It is a U.S. FIPS-approved cryptographic algorithm (FIPS 197) that operates on 16-byte blocks. AES has been available since 2001 and is widely used.

    However, AES by itself has a weakness. AES encryption isn't usually used by itself because identical blocks of plaintext are always encrypted into identical blocks of ciphertext. This encryption can be easily attacked with "dictionaries" of common blocks of text, and it allows one to more easily discern the content of the unknown cryptotext. This mode of encryption is called "Electronic Code Book" (ECB), because one can in theory keep a "code book" of all known cryptotext and plaintext results to cipher and decipher AES. In practice, a complete "code book" is not practical, even in electronic form, but large dictionaries of common plaintext blocks are still possible.

    Here's a diagram of encrypting input data using AES ECB mode:

                   Block 1                         Block 2
               PlainTextInput                  PlainTextInput
                      |                               |
                      |                               |
                     \/                              \/
      AESKey-->(AES Encryption)       AESKey-->(AES Encryption)
                      |                               |
                      |                               |
                     \/                              \/
              CipherTextOutput                CipherTextOutput
                   Block 1                         Block 2

    What's the solution to the same cleartext input producing the same ciphertext output? The solution is to further process the encrypted or decrypted text in such a way that the same text produces different output. This usually involves an Initialization Vector (IV) and XORing the decrypted or encrypted text. As an example, I'll illustrate CBC mode encryption:

                   Block 1                         Block 2
               PlainTextInput                  PlainTextInput
                      |                               |
                      |                               |
                     \/                              \/
      IV >--------->(XOR)           +-------------->(XOR)           +---> . . . .
                      |             |                 |             |
                      |             |                 |             |
                     \/             |                \/             |
      AESKey-->(AES Encryption)     | AESKey-->(AES Encryption)     |
                      |             |                 |             |
                      |             |                 |             |
                     \/             |                \/             |
              CipherTextOutput -----+         CipherTextOutput -----+
                   Block 1                         Block 2

    The steps for CBC encryption are:

      1. Start with a 16-byte Initialization Vector (IV), chosen randomly.
      2. XOR the IV with the first block of input plaintext.
      3. Encrypt the result with AES using a user-provided key. The result is the first 16 bytes of output cryptotext.
      4. Use the cryptotext (instead of the IV) of the previous block to XOR with the next input block of plaintext.

    Another mode besides CBC is Counter Mode (CTR). As with CBC mode, it also starts with a 16-byte IV. However, for subsequent blocks, the IV is just incremented by one. Also, it is the IV (counter) value, not the plaintext input, that is encrypted with AES; the AES encryption result is then XORed with the plaintext input to produce the ciphertext. Here's an illustration:

                   Block 1                         Block 2
                     IV                             IV + 1
                      |                               |
                     \/                              \/
      AESKey-->(AES Encryption)       AESKey-->(AES Encryption)
                      |                               |
                     \/                              \/
      PlainText >-->(XOR)             PlainText >-->(XOR)      IV + 2 ---> . . . .
                      |                               |
                     \/                              \/
              CipherTextOutput                CipherTextOutput
                   Block 1                         Block 2

    Optimization

    Which of these modes can be parallelized?

      - ECB encryption/decryption can be parallelized because it does nothing more than plain AES encryption and decryption on each block, as mentioned above.
      - CBC encryption can't be parallelized because it depends on the output of the previous block. However, CBC decryption can be parallelized because all the encrypted blocks are known at the beginning.
      - CTR encryption and decryption can be parallelized because the input to each block is known: it's just the IV incremented by one for each subsequent block.

    So, in summary, for ECB, CBC, and CTR modes, encryption and decryption can be parallelized with the exception of CBC encryption.

    How do we parallelize encryption? By interleaving. Usually when reading and writing data there are pipeline "stalls" (idle processor cycles) that result from waiting for memory to be loaded or stored to or from CPU registers. Since the software is written to encrypt/decrypt the next data block where pipeline stalls usually occur, we can avoid stalls and crypt with fewer cycles. This software processes 4 blocks at a time, which ensures virtually no waiting ("stalling") for reading or writing data in memory.

    Other Optimizations

    Besides interleaving, other optimizations performed are:

      - Loading the entire key schedule into the 128-bit %xmm registers. This is done once per group of 4 blocks (since 4 blocks of data are processed at a time, when present). The following is loaded:
          - the entire "key schedule" (user input key preprocessed for encryption and decryption). This takes 11, 13, or 15 registers, for AES-128, AES-192, and AES-256, respectively
          - the input data, loaded into another %xmm register
          - the output result, which ends up in the same register after encrypting/decrypting
      - Using SSE and AES-NI instructions. Besides the aesenc, aesenclast, aesdec, aesdeclast, aeskeygenassist, and aesimc AES-NI instructions, Intel has several other instructions that operate on the 128-bit %xmm registers. Some common instructions for encryption are:
          - pxor: exclusive or (very useful)
          - movdqu: load/store a %xmm register from/to memory
          - pshufb: shuffle bytes for byte swapping
          - pclmulqdq: carry-less multiply for GCM mode
      - Combining AES encryption/decryption with CBC or CTR modes processing. Instead of loading input data twice (once for AES encryption/decryption, and again for modes processing, CTR or CBC for example), the input data is loaded once, as both AES and modes operations occur in the same function.

    Performance

    Everyone likes pretty color charts, so here they are. I ran these on Solaris 11 running on a Piketon Platform system with a 4-core Intel Clarkdale processor @3.20GHz. Clarkdale is part of the Westmere processor architecture family. The "before" case is Solaris 11, unmodified. Keep in mind that the "before" case already has been optimized with hand-coded Intel AES-NI assembly. The "after" case has combined AES-NI and mode instructions, interleaved 4 blocks at a time.

    For the first table, lower is better (milliseconds). The first table shows the performance improvement using the Solaris encrypt(1) and decrypt(1) CLI commands. I encrypted and decrypted a 1/2 GByte file on /tmp (swap tmpfs). Encryption improved by about 40% and decryption improved by about 80%. AES-128 is slightly faster than AES-256, as expected.

    The second table shows more detailed timings for CBC, CTR, and ECB modes for the 3 AES key sizes and different data lengths. The results shown are the percentage improvement as shown by an internal PKCS#11 microbenchmark. And keep in mind the previous baseline code already had optimized AES-NI assembly! The key size (AES-128, 192, or 256) makes little difference in relative percentage improvement (although, of course, AES-128 is faster than AES-256). Larger data sizes show better improvement than 128-byte data.

    Availability

    This software is in Solaris 11 FCS. It is available in the 64-bit libcrypto library and the "aes" Solaris kernel module. You must be running hardware that supports AES-NI (for example, the Intel Westmere and Sandy Bridge microprocessor architectures). The easiest way to determine if AES-NI is available is with the isainfo(1) command. For example:

      $ isainfo -v
      64-bit amd64 applications
              pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov amd_sysc cx8 tsc fpu
      32-bit i386 applications
              pclmulqdq aes sse4.2 sse4.1 ssse3 popcnt tscp ahf cx16 sse3 sse2 sse fxsr mmx cmov sep cx8 tsc fpu

    No special configuration or setup is needed to take advantage of this software. Solaris libraries and the kernel automatically determine if they are running on AES-NI-capable machines and execute the correctly tuned software for the current microprocessor.

    Summary

    Maximum throughput of AES cipher modes can be achieved by combining AES encryption with modes processing, interleaving encryption of 4 blocks at a time, and using Intel's wide 128-bit %xmm registers and instructions.

    References

      - "Block cipher modes of operation", Wikipedia. Good overview of AES modes (ECB, CBC, CTR, etc.)
      - "Advanced Encryption Standard", Wikipedia
      - "Current Modes" describes NIST-approved block cipher modes (ECB, CBC, CFB, OFB, CCM, GCM)
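
    The difference between the three modes discussed in the Review section can be sketched in a few lines of Python. This is a structural illustration only, under loud assumptions: toy_block_encrypt is a made-up stand-in built from SHA-256, NOT real AES and not even invertible, so only the encryption direction is shown; CTR is written in its standard form (the counter is encrypted and the result is XORed with the plaintext); and none of this reflects the Solaris assembly implementation.

```python
import hashlib

BLOCK = 16  # AES block size in bytes

def toy_block_encrypt(key: bytes, block: bytes) -> bytes:
    """Stand-in for one AES block encryption. NOT a real cipher (assumption for illustration)."""
    return hashlib.sha256(key + block).digest()[:BLOCK]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def split_blocks(data: bytes) -> list:
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def ecb_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # Every block is encrypted independently: parallelizable, but leaks repeats.
    return b"".join(toy_block_encrypt(key, p) for p in split_blocks(plaintext))

def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    # Each block needs the previous ciphertext block: inherently sequential.
    out, prev = [], iv
    for p in split_blocks(plaintext):
        prev = toy_block_encrypt(key, xor(p, prev))
        out.append(prev)
    return b"".join(out)

def ctr_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    # Block i depends only on IV + i, so all blocks could be computed in parallel.
    start = int.from_bytes(iv, "big")
    out = []
    for i, p in enumerate(split_blocks(plaintext)):
        counter = ((start + i) % (1 << 128)).to_bytes(BLOCK, "big")
        out.append(xor(p, toy_block_encrypt(key, counter)))
    return b"".join(out)

key = b"k" * 16
iv = bytes(16)
plaintext = b"A" * 32  # two identical 16-byte blocks

for name, ct in (("ECB", ecb_encrypt(key, plaintext)),
                 ("CBC", cbc_encrypt(key, iv, plaintext)),
                 ("CTR", ctr_encrypt(key, iv, plaintext))):
    print(name, "identical ciphertext blocks:", ct[:BLOCK] == ct[BLOCK:])
```

    With two identical plaintext blocks, only ECB produces two identical ciphertext blocks. The loop in cbc_encrypt also makes the parallelization argument visible: each iteration consumes the result of the previous one, whereas the iterations of ecb_encrypt and ctr_encrypt are independent of each other.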


  • Premature-Optimization and Performance Anxiety

    - by James Michael Hare
    While writing my post analyzing the new .NET 4 ConcurrentDictionary class (here), I fell into one of the classic blunders that I myself always love to warn about.  After analyzing the differences of time between a Dictionary with locking versus the new ConcurrentDictionary class, I noted that the ConcurrentDictionary was faster with read-heavy multi-threaded operations.  Then, I made the classic blunder of thinking that because the original Dictionary with locking was faster for those write-heavy uses, it was the best choice for those types of tasks.  In short, I fell into the premature-optimization anti-pattern. Basically, the premature-optimization anti-pattern is when a developer is coding very early for a perceived (whether rightly-or-wrongly) performance gain and sacrificing good design and maintainability in the process.  At best, the performance gains are usually negligible and at worst, can either negatively impact performance, or can degrade maintainability so much that time to market suffers or the code becomes very fragile due to the complexity. Keep in mind the distinction above.  I'm not talking about valid performance decisions.  There are decisions one should make when designing and writing an application that are valid performance decisions.  Examples of this are knowing the best data structures for a given situation (Dictionary versus List, for example) and choosing performance algorithms (linear search vs. binary search).  But these in my mind are macro optimizations.  The error is not in deciding to use a better data structure or algorithm, the anti-pattern as stated above is when you attempt to over-optimize early on in such a way that it sacrifices maintainability. In my case, I was actually considering trading the safety and maintainability gains of the ConcurrentDictionary (no locking required) for a slight performance gain by using the Dictionary with locking.  
This would have been a mistake, as I would be trading away maintainability (ConcurrentDictionary requires no locking, which helps readability) and safety (ConcurrentDictionary is safe for iteration even while being modified, and you don't risk the developer locking incorrectly) -- and I fell for it even when I knew to watch out for it.  I think in my case, and it may be true for others as well, a large part of it was due to the time I was trained as a developer.  I began college in the 90s, when C and C++ were king and hardware speed and memory were still relatively precious commodities, not to be squandered.  In those days, using a long instead of a short could waste precious resources, and as such, we were taught to minimize space and favor performance.  This is why in many cases such early code bases were very hard to maintain.  I don't know how many times I heard back then to avoid too many function calls because of the overhead -- and in fact just last year I heard a new hire in the company where I work declare that she didn't want to refactor a long method because of function call overhead.  Now back then, that may have been a valid concern, but with today's hardware, even if you're calling a trivial method in an extremely tight loop (which chances are the JIT compiler would optimize anyway), the gains from removing method calls are negligible for the great majority of applications.  Now, obviously, there are applications where speed is absolutely king (for example drivers, computer games, operating systems) and where such sacrifices may be made.  But I would strongly advise against such optimization because of its cost.  Many folks who perform an optimization think it's always a win-win: they're simply adding speed to the application, so what could possibly be wrong with that?  What they don't realize is the cost of their choice.  
For every piece of straightforward code that you obfuscate with performance enhancements, you risk introducing bugs and adding to the long-term technical debt of the application.  It will become so fragile over time that maintenance will become a nightmare.  I've seen such applications in places I have worked.  There are times I've seen applications where the designer was so obsessed with performance that they even designed their own memory management system to try to squeeze out every ounce of performance.  Unfortunately, the application's stability often suffers as a result, and it is very difficult for anyone other than the original designer to maintain.  I've even seen this recently, when I heard a C++ developer bemoaning that in VS2010 the iterators are about twice as slow as they used to be because Microsoft added range checking (probably as part of the C++0x standard implementation).  To me this was almost a joke.  Twice as slow sounds bad, but it is almost never as bad as you think -- especially if you're gaining safety.  The only time twice as slow really matters is when the original was too slow to begin with.  Think about it.  2 minutes is slow as a response time because 1 minute is slow.  But if an iterator takes 1 microsecond to move one position and a new, safer iterator takes 2 microseconds, this is trivial!  The only way you'd ever really notice this would be in iterating a collection just for the sake of iterating (i.e. no other operations).  To my mind, the added safety makes the extra time worth it.  Always favor safety and maintainability when you can.  I know it can be a hard habit to break, especially if you started your career early or in a language such as C, where developers are very performance conscious.  But in reality, these types of micro-optimizations only end up hurting you in the long run.  Remember the two laws of optimization.  I'm not sure where I first heard these, but they are so true: For beginners: Do not optimize. 
For experts: Do not optimize yet.  If you're a beginner, resist the urge to optimize at all costs.  And if you are an expert, delay that decision.  As long as you have chosen the right data structures and algorithms for your task, your performance will probably be more than sufficient.  Chances are it will be network, database, or disk hits that will be your slow-down, not your code.  As they say, 98% of your code's bottleneck is in 2% of your code, so premature optimization may add maintenance and safety debt without any measurable benefit.  Instead, code for maintainability and safety, and then, only when you find a true bottleneck, go back and optimize further.
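
One way to act on "do not optimize yet" is to measure the supposed overhead before restructuring anything. The following is a minimal sketch in Python rather than the post's C#; the names `add_one`, `with_call`, `inlined` and `compare` are made up for illustration. CPython has no inlining JIT, so the gap it reports may well be larger than in the .NET case the post describes. The point is only the habit of timing both variants.

```python
import timeit


def add_one(x):
    # A trivial helper, standing in for the "small method" someone wants to inline.
    return x + 1


def with_call(n):
    total = 0
    for i in range(n):
        total += add_one(i)      # readable version: one call per iteration
    return total


def inlined(n):
    total = 0
    for i in range(n):
        total += i + 1           # hand-inlined version of the same work
    return total


def compare(n=10_000, repeat=5):
    """Return (call_seconds, inlined_seconds), best of `repeat` runs each."""
    call_time = min(timeit.repeat(lambda: with_call(n), number=1, repeat=repeat))
    inline_time = min(timeit.repeat(lambda: inlined(n), number=1, repeat=repeat))
    return call_time, inline_time


if __name__ == "__main__":
    call_time, inline_time = compare()
    print(f"with call: {call_time:.6f}s  inlined: {inline_time:.6f}s")
```

Whether the difference matters depends on how much of the total run time the loop accounts for, which is exactly what a profiler run on the real application would show.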

    Read the article

  • How to avoid Memory "Hard Fault/sec"

    - by Flavio Oliveira
    I have a problem on my Windows 2008 x64 server and cannot work out how to solve it. Resource Monitor shows about 100 to 200 hard faults/sec, and the machine is generally slow. From what I have read, a hard fault happens when a memory page is no longer available in physical memory and has to be fetched with an I/O operation (disk), and that is a problem. The current hardware is an Intel Core 2 Duo E8400 (3.0 GHz) with 6 GB RAM on Windows Server Web 64-bit. The machine currently has about 2 GB of RAM in use, leaving 4 GB available. Why does the machine require such a high level of disk operations? What can I do to increase performance? Am I experiencing a memory issue? What should be my starting point?

    Read the article

  • SQL Server performance, virtual memory usage

    - by user45641
    Hello, I have a very large DB used mostly for analytics. The performance overall is very sluggish. I just noticed that when running the query below, the amount of virtual memory used greatly exceeds the amount of physical memory available. Currently, physical memory is 10GB (10238 MB) whereas the virtual memory returns significantly more - 8388607 MB. That seems really wrong, but I'm at a bit of a loss on how to proceed. USE [master]; GO select cpu_count , hyperthread_ratio , physical_memory_in_bytes / 1048576 as 'mem_MB' , virtual_memory_in_bytes / 1048576 as 'virtual_mem_MB' , max_workers_count , os_error_mode , os_priority_class from sys.dm_os_sys_info
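
The two figures are easier to compare once converted to the same scale. The sketch below is plain arithmetic on the numbers quoted in the question. The reading that follows it is an assumption, not something the question confirms: 8388607 MB is just under 8 TiB, which is the size of the user-mode virtual address space on 64-bit Windows of that era, so the column may be reporting address space rather than memory in use.

```python
MB_PER_GIB = 1024
MB_PER_TIB = 1024 * 1024

reported_physical_mb = 10238     # mem_MB from the question
reported_virtual_mb = 8388607    # virtual_mem_MB from the question


def mb_to_gib(mb):
    return mb / MB_PER_GIB


def mb_to_tib(mb):
    return mb / MB_PER_TIB


print(f"physical: {mb_to_gib(reported_physical_mb):.2f} GiB")
print(f"virtual:  {mb_to_tib(reported_virtual_mb):.3f} TiB")
```

If that reading is right, the value would be the same on any comparable 64-bit server regardless of load and would not by itself explain the sluggish queries.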

    Read the article

  • Benchmarking Java programs

    - by stefan-ock
    For university, I perform bytecode modifications and analyze their influence on the performance of Java programs. Therefore, I need Java programs (ideally ones used in production) and appropriate benchmarks. For instance, I already have HyperSQL and measure its performance with the benchmark program PolePosition. The Java programs run on a JVM without a JIT compiler. Thanks for your help! P.S.: I cannot use programs that benchmark the performance of the JVM or of the Java language itself (such as Wide Finder).

    Read the article

  • Optimizing MySQL -

    - by Josh
    I've been researching how to optimize MySQL a bit, but I still have a few questions. MySQL Primer Results http://pastie.org/private/lzjukl8wacxfjbjhge6vw Based on this, the first problem seems to be that the max_connections limit is too low. I had a similar problem with Apache initially, the max connection limit was set to 100, and the web server would frequently lock up and take an excruciatingly long time to deliver pages. Raising the connection limit to 512 fixed this issue, and I read that raising the connection limit on MySQL to match this was considered good practice. Being that MySQL has actually been "locking up" recently as well (connections have been refused entirely for a few minutes at a time at random intervals) I'm assuming this is the main cause of the issue. However, as far as table cache goes, I'm not sure what I should set this as. I've read that setting this too high can hinder performance further, so should I raise this to right around 551, 560, 600, or do something else? Lastly, as far as raising the join_buffer_size value goes, this doesn't even seem to be included in Debian's my.cnf file by default. Assuming there's not much I can do about adding indexes, should I look into raising this? Any suggested values? Any suggestions in general here would be appreciated as well. Edit: Here's the number of open tables the MySQL server is reporting. I believe this value is related to my question (Opened_tables: 22574)

    Read the article

  • Optimizing MySQL, Improving Performance of Database Servers

    - by Antoinette O'Sullivan
    Optimization involves improving the performance of a database server and the queries that run against it. Optimization reduces query execution time, and optimized queries benefit everyone who uses the server. When the server runs more smoothly and processes more queries with less, it performs better as a whole. To learn more about how a MySQL developer can make a difference with optimization, take the MySQL for Developers training course. This 5-day instructor-led course is available as:

    Live-Virtual Event: Attend a live class from your own desk - no travel required. Choose from a selection of events on the schedule to suit different timezones.
    In-Class Event: Travel to an education center to attend an event. Below is a selection of the events on the schedule.

    Location                    Date                 Delivery Language
    Vienna, Austria             17 November 2014     German
    Brussels, Belgium           8 December 2014      English
    Sao Paulo, Brazil           14 July 2014         Brazilian Portuguese
    London, England             29 September 2014    English
    Belfast, Ireland            6 October 2014       English
    Dublin, Ireland             27 October 2014      English
    Milan, Italy                10 November 2014     Italian
    Rome, Italy                 21 July 2014         Italian
    Nairobi, Kenya              14 July 2014         English
    Petaling Jaya, Malaysia     25 August 2014       English
    Utrecht, Netherlands        21 July 2014         English
    Makati City, Philippines    29 September 2014    English
    Warsaw, Poland              25 August 2014       Polish
    Lisbon, Portugal            13 October 2014      European Portuguese
    Porto, Portugal             13 October 2014      European Portuguese
    Barcelona, Spain            7 July 2014          Spanish
    Madrid, Spain               3 November 2014      Spanish
    Valencia, Spain             24 November 2014     Spanish
    Basel, Switzerland          4 August 2014        German
    Bern, Switzerland           4 August 2014        German
    Zurich, Switzerland         4 August 2014        German

    The MySQL for Developers course helps prepare you for the MySQL 5.6 Developers OCP certification exam. To register for an event, request an additional event or learn more about the authentic MySQL curriculum, go to http://education.oracle.com/mysql.

    Read the article

  • Can compressing Program Files save space *and* give a significant boost to SSD performance?

    - by Christopher Galpin
    Considering solid-state disk space is still an expensive resource, compressing large folders has appeal. Thanks to VirtualStore, could Program Files be a case where it might even improve performance? Discovery In particular I have been reading: SSD and NTFS Compression Speed Increase? Does NTFS compression slow SSD/flash performance? Will somebody benchmark whole disk compression (HD,SSD) please? (may have to scroll up) The first link is particularly dreamy, but maybe head a little too far in the clouds. The third link has this sexy semi-log graph (logarithmic scale!). Quote (with notes): Using highly compressable data (IOmeter), you get at most a 30x performance increase [for reads], and at least a 49x performance DECREASE [for writes]. Assuming I interpreted and clarified that sentence correctly, this single user's benchmark has me incredibly interested. Although write performance tanks wretchedly, read performance still soars. It gave me an idea. Idea: VirtualStore It so happens that thanks to sanity saving security features introduced in Windows Vista, write access to certain folders such as Program Files is virtualized for non-administrator processes. Which means, in normal (non-elevated) usage, a program or game's attempt to write data to its install location in Program Files (which is perhaps a poor location) is redirected to %UserProfile%\AppData\Local\VirtualStore, somewhere entirely different. Thus, to my understanding, writes to Program Files should primarily only occur when installing an application. This makes compressing it not only a huge source of space gain, but also a potential candidate for performance gain. Testing The beginning of this post has me a bit timid, it suggests benchmarking NTFS compression on a whole drive is difficult because turning it off "doesn't decompress the objects". However it seems to me the compact command is perfectly capable of doing so for both drives and individual folders. 
Could it be only marking them for decompression the next time the OS reads from them? I need to find the answer before I begin my own testing.

    Read the article

  • Performance question: Inverting an array of pointers in-place vs array of values

    - by Anders
    The background for asking this question is that I am solving a linearized equation system (Ax=b), where A is a matrix (typically of dimension less than 100x100) and x and b are vectors. I am using a direct method, meaning that I first invert A, then find the solution by x=A^(-1)b. This step is repeated in an iterative process until convergence. The way I'm doing it now, using a matrix library (MTL4): For every iteration I copy all coefficients of A (values) into the matrix object, then invert. This is the easiest and safest option. Using an array of pointers instead: For my particular case, the coefficients of A happen to be updated between each iteration. These coefficients are stored in different variables (some are arrays, some are not). Would there be a potential for performance gain if I set up A as an array containing pointers to these coefficient variables, then inverted A in-place? The nice thing about the last option is that once I have set up the pointers in A before the first iteration, I would not need to copy any values between successive iterations. The values which are pointed to in A would automatically be updated between iterations. So the performance question boils down to this, as I see it: - The matrix inversion process takes roughly the same amount of time, assuming de-referencing of pointers is inexpensive. - The array of pointers does not need the extra memory for matrix A containing values. - The array-of-pointers option does not have to copy all NxN values of A between each iteration. - The values pointed to in the array-of-pointers option are generally NOT ordered in memory. Hopefully, all values lie relatively close in memory, but *A[0][1] is generally not next to *A[0][0] etc. Any comments on this? Will the last remark affect performance negatively, thus offsetting the positive performance effects?

    Read the article

  • Performance impact: What is the optimal payload for SqlBulkCopy.WriteToServer()?

    - by Linchi Shea
    For many years, I have been using a C# program to generate the TPC-C compliant data for testing. The program relies on the SqlBulkCopy class to load the data generated by the program into the SQL Server tables. In general, the performance of this C# data loader is satisfactory. Lately however, I found myself in a situation where I needed to generate a much larger amount of data than I typically do and the data needed to be loaded within a confined time frame. So I was driven to look into the code...(read more)

    Read the article

  • Performance triage

    - by Dave
    Folks often ask me how to approach a suspected performance issue. My personal strategy is informed by the fact that I work on concurrency issues. (When you have a hammer everything looks like a nail, but I'll try to keep this general). A good starting point is to ask yourself if the observed performance matches your expectations. Expectations might be derived from known system performance limits, prototypes, and other software or environments that are comparable to your particular system-under-test. Some simple comparisons and microbenchmarks can be useful at this stage. It's also useful to write some very simple programs to validate some of the reported or expected system limits. Can that disk controller really tolerate and sustain 500 reads per second? To reduce the number of confounding factors it's better to try to answer that question with a very simple targeted program. And finally, nothing beats having familiarity with the technologies that underlie your particular layer. On the topic of confounding factors, as our technology stacks become deeper and less transparent, we often find our own technology working against us in some unexpected way to choke performance rather than simply running into some fundamental system limit. A good example is the warm-up time needed by just-in-time compilers in Java Virtual Machines. I won't delve too far into that particular hole except to say that it's rare to find good benchmarks and methodology for Java code. Another example is power management on x86. Power management is great, but it can take a while for the CPUs to throttle up from low(er) frequencies to full throttle. And while I love "turbo" mode, it makes benchmarking applications with multiple threads a chore, as you have to remember to turn it off and then back on; otherwise short single-threaded runs may look abnormally fast compared to runs with higher thread counts. 
In general for performance characterization I disable turbo mode and fix the power governor at "performance" state. Another source of complexity is the scheduler, which I've discussed in prior blog entries. Let's say I have a running application and I want to better understand its behavior and performance. We'll presume it's warmed up, is under load, and is in an execution mode representative of what we think the norm would be. It should be in steady-state, if a steady-state mode even exists. On Solaris the very first thing I'll do is take a set of "pstack" samples. Pstack briefly stops the process and walks each of the stacks, reporting symbolic information (if available) for each frame. For Java, pstack has been augmented to understand Java frames, and even report inlining. A few pstack samples can provide powerful insight into what's actually going on inside the program. You'll be able to see calling patterns, which threads are blocked on what system calls or synchronization constructs, memory allocation, etc. If your code is CPU-bound then you'll get a good sense of where the cycles are being spent. (I should caution that normal C/C++ inlining can diffuse an otherwise "hot" method into other methods. This is a rare instance where pstack sampling might not immediately point to the key problem). At this point you'll need to reconcile what you're seeing with pstack and your mental model of what you think the program should be doing. They're often rather different. And generally if there's a key performance issue, you'll spot it with a moderate number of samples. I'll also use OS-level observability tools to look for the existence of bottlenecks where threads contend for locks; other situations where threads are blocked; and the distribution of threads over the system. On Solaris some good tools are mpstat and, to a lesser degree, vmstat. Try running "mpstat -a 5" in one window while the application program runs concurrently. 
One key measure is the voluntary context switch rate "vctx" or "csw", which reflects threads descheduling themselves. It's also good to look at the user, system, and idle CPU percentages. This can give a broad but useful understanding of whether your threads are mostly parked or mostly running. For instance, if your program makes heavy use of malloc/free, then it might be the case that you're contending on the central malloc lock in the default allocator. In that case you'd see malloc calling lock in the stack traces, observe a high csw/vctx rate as threads block for the malloc lock, and your "usr" time would be less than expected. Solaris dtrace is a wonderful and invaluable performance tool as well, but in a sense you have to frame and articulate a meaningful and specific question to get a useful answer, so I tend not to use it for first-order screening of problems. It's also most effective for OS and software-level performance issues as opposed to HW-level issues. For that reason I recommend mpstat & pstack as the first step in performance triage. If some other OS-level issue is evident then it's good to switch to dtrace to drill more deeply into the problem. Only after I've ruled out OS-level issues do I switch to using hardware performance counters to look for architectural impediments.
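
The stack-sampling idea is not tied to Solaris. As a rough illustration only, the sketch below takes a handful of stack samples of one thread inside a Python process and counts which functions appear. It is not pstack: it relies on the CPython-specific `sys._current_frames()`, and `sample_stacks`, `run_demo` and `hot_loop` are hypothetical names invented for the example.

```python
import sys
import threading
import time
from collections import Counter


def sample_stacks(thread, samples=20, interval=0.005):
    """Count in how many samples each function name appears on the thread's stack."""
    seen = Counter()
    for _ in range(samples):
        frame = sys._current_frames().get(thread.ident)
        names = set()
        while frame is not None:              # walk from the innermost frame outwards
            names.add(frame.f_code.co_name)
            frame = frame.f_back
        seen.update(names)                    # each function counts once per sample
        time.sleep(interval)
    return seen


def run_demo(samples=20):
    """Start a busy worker thread, sample it, then shut it down."""
    started = threading.Event()
    stop = threading.Event()

    def hot_loop():
        started.set()
        total = 0
        while not stop.is_set():
            total += 1

    worker = threading.Thread(target=hot_loop)
    worker.start()
    started.wait()                            # only sample once the worker is inside hot_loop
    try:
        return sample_stacks(worker, samples)
    finally:
        stop.set()
        worker.join()


if __name__ == "__main__":
    for name, hits in run_demo().most_common(5):
        print(f"{hits:3d}  {name}")
```

Even with twenty samples, the function that dominates the worker shows up in every one of them, which is the "moderate number of samples" effect described above.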

    Read the article

  • Higher Performance With Spritesheets Than With Rotating Using C# and XNA 4.0?

    - by Manuel Maier
    I would like to know what the performance difference is between using multiple sprites in one file (sprite sheets) to draw a game character that can face in 4 directions and using one sprite per file but rotating that character as needed. I am aware that the sprite sheet method restricts the character to predefined directions, whereas the rotation method would give the character the freedom of "looking everywhere". Here's an example of what I am doing: Single Sprite Method Assume I have a 64x64 texture that points north. I do the following if I want it to point east: spriteBatch.Draw( _sampleTexture, new Rectangle(200, 100, 64, 64), null, Color.White, (float)(Math.PI / 2), Vector2.Zero, SpriteEffects.None, 0); Multiple Sprite Method Now I have a sprite sheet (128x128) where the top-left 64x64 section contains a sprite pointing north, the top-right 64x64 section points east, and so forth. To make it point east, I do the following: spriteBatch.Draw( _sampleSpritesheet, new Rectangle(400, 100, 64, 64), new Rectangle(64, 0, 64, 64), Color.White); So which of these methods uses less CPU time, and what are the pros and cons? Is .NET/XNA optimizing this in any way (e.g. does it notice that the same call was made last frame and then just reuse an already rendered/rotated image that's still in memory)?

    Read the article

  • Disabling CPU management

    - by Tiffany Walker
    If I add the following processor.max_cstate=0 to the kernel command line for boot up, does that disable all CPU power management and throttling? I also found: http://www.experts-exchange.com/OS/Linux/Administration/A_3492-Avoiding-CPU-speed-scaling-in-modern-Linux-distributions-Running-CPU-at-full-speed-Tips.html The link talks about changing the CPU governor from 'ondemand' to 'performance' for all CPUs/cores and disabling kondemand in the kernel. The server is for web hosting. UPDATES: 2.6.32-379.1.1.lve1.1.7.6.el6.x86_64 #1 SMP Sat Aug 4 09:56:37 EDT 2012 x86_64 x86_64 x86_64 GNU/Linux . # dmidecode 2.11 SMBIOS 2.6 present. 74 structures occupying 2878 bytes. Table at 0x0009F000. Handle 0x0000, DMI type 0, 24 bytes BIOS Information Vendor: American Megatrends Inc. Version: 1.0c Release Date: 05/27/2010 Address: 0xF0000 Runtime Size: 64 kB ROM Size: 4096 kB Characteristics: ISA is supported PCI is supported PNP is supported BIOS is upgradeable BIOS shadowing is allowed ESCD support is available Boot from CD is supported Selectable boot is supported BIOS ROM is socketed EDD is supported 5.25"/1.2 MB floppy services are supported (int 13h) 3.5"/720 kB floppy services are supported (int 13h) 3.5"/2.88 MB floppy services are supported (int 13h) Print screen service is supported (int 5h) 8042 keyboard services are supported (int 9h) Serial services are supported (int 14h) Printer services are supported (int 17h) CGA/mono video services are supported (int 10h) ACPI is supported USB legacy is supported LS-120 boot is supported ATAPI Zip drive boot is supported BIOS boot specification is supported Targeted content distribution is supported BIOS Revision: 8.16 Handle 0x0001, DMI type 1, 27 bytes System Information Manufacturer: Supermicro Product Name: X8SIE Version: 0123456789 Serial Number: 0123456789 UUID: 49434D53-0200-9033-2500-33902500D52C Wake-up Type: Power Switch SKU Number: To Be Filled By O.E.M. Family: To Be Filled By O.E.M. 
Handle 0x0002, DMI type 2, 15 bytes Base Board Information Manufacturer: Supermicro Product Name: X8SIE Version: 0123456789 Serial Number: VM11S61561 Asset Tag: To Be Filled By O.E.M. Features: Board is a hosting board Board is replaceable Location In Chassis: To Be Filled By O.E.M. Chassis Handle: 0x0003 Type: Motherboard Contained Object Handles: 0 Handle 0x0003, DMI type 3, 21 bytes Chassis Information Manufacturer: Supermicro Type: Sealed-case PC Lock: Not Present Version: 0123456789 Serial Number: 0123456789 Asset Tag: To Be Filled By O.E.M. Boot-up State: Safe Power Supply State: Safe Thermal State: Safe Security Status: None OEM Information: 0x00000000 Height: Unspecified Number Of Power Cords: 1 Contained Elements: 0
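
The governor mentioned in the linked article can be read back from sysfs, which is one way to check whether a change actually took effect. Note that processor.max_cstate limits idle states (C-states), while the governor controls frequency scaling, so the two are separate mechanisms. The sketch below is a minimal reader, assuming the kernel exposes the usual cpufreq files under /sys/devices/system/cpu. On a machine without a cpufreq driver loaded it simply returns an empty dictionary. The function name `read_governors` is made up for the example.

```python
import glob
from pathlib import Path

# Standard cpufreq location on Linux; the files are absent when no cpufreq driver is loaded.
DEFAULT_PATTERN = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"


def read_governors(pattern=DEFAULT_PATTERN):
    """Return a mapping such as {'cpu0': 'ondemand'} for every CPU that reports a governor."""
    governors = {}
    for path in sorted(glob.glob(pattern)):
        cpu_name = Path(path).parts[-3]       # .../cpu0/cpufreq/scaling_governor -> 'cpu0'
        try:
            governors[cpu_name] = Path(path).read_text().strip()
        except OSError:
            continue                          # CPU went offline or the file is unreadable
    return governors


if __name__ == "__main__":
    for cpu, governor in read_governors().items():
        print(cpu, governor)
```

Seeing 'performance' on every CPU would confirm the governor change only; it says nothing about C-states or about throttling configured in the BIOS.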

    Read the article

  • Very long (>300s) request processing time on Apache Server serving static content from particular IP

    - by Ron Bieber
    We are running an Apache 2.2 server for a very large web site. Over the past few months we have been having some users reporting slow response times, while others (including our resources, both on the internal network and our home networks) do not see any degradation in performance. After a ton of investigation, we finally found a "Deny from none" statement in our configuration that was causing reverse DNS lookups (which were timing out) that solved the bulk of our issues, but we still have some customers that we are seeing in the Apache logs (using %D in the log format) with request processing times of 300s for images, css, javascript and other static content. We've checked all Deny / Allow statements for reoccurrence of "none", as well as all other things we know of that would cause reverse DNS lookups (such as using "REMOTE_HOST" in rewrite rules, using %a instead of %h in our log format configuration) as well as verified that HostnameLookups is set to "Off". As an aside, we've also validated that reverse DNS lookups for folks having this problem do not time out - so I'm fairly certain DNS is not an issue in this case. I've run out of ideas. Are there any Apache configuration scenarios that someone can point me to that I might be missing that would cause request times for static content to take so long only for certain users? Thank you in advance.
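
Since %D is already in the log format, one low-effort next step is to check whether the 300 s requests cluster on a few client addresses. The sketch below assumes a log layout that the question does not actually show: the client (%h) as the first field and %D, which Apache reports in microseconds, as the last. The sample lines and the name `slow_clients` are invented for illustration.

```python
from collections import Counter


def slow_clients(log_lines, threshold_seconds=300.0):
    """Count slow requests per client, assuming %h is the first field and %D the last."""
    counts = Counter()
    for line in log_lines:
        fields = line.split()
        if len(fields) < 2 or not fields[-1].isdigit():
            continue                              # line does not fit the assumed layout
        seconds = int(fields[-1]) / 1_000_000     # %D is in microseconds
        if seconds >= threshold_seconds:
            counts[fields[0]] += 1
    return counts


# Made-up sample lines using documentation-only IP ranges.
sample = [
    '192.0.2.10 - - [01/Jan/2014:10:00:00 +0000] "GET /img/logo.png HTTP/1.1" 200 5120 300012345',
    '192.0.2.10 - - [01/Jan/2014:10:05:01 +0000] "GET /css/site.css HTTP/1.1" 200 2048 300500000',
    '198.51.100.7 - - [01/Jan/2014:10:00:02 +0000] "GET /img/logo.png HTTP/1.1" 200 5120 1800',
]

print(slow_clients(sample).most_common())
```

If the slow entries concentrate on a handful of addresses or networks, that would point towards something on the path to those clients rather than a server-wide configuration problem, though the tally alone cannot prove either.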

    Read the article

  • Understanding Red Hat's recommended tuned profiles

    - by espenfjo
    We are going to roll out tuned (and numad) on ~1000 servers, the majority of them being VMware servers on either NetApp or 3Par storage. According to Red Hat's documentation we should choose the virtual-guest profile. What it is doing can be seen here: tuned.conf We are changing the IO scheduler to NOOP, as both VMware and the NetApp/3Par should do sufficient scheduling for us. However, after investigating a bit I am not sure why they are increasing vm.dirty_ratio and kernel.sched_min_granularity_ns. As far as I have understood, increasing vm.dirty_ratio to 40% will mean that for a server with 20GB RAM, 8GB can be dirty at any given time unless vm.dirty_writeback_centisecs is hit first. And while flushing these 8GB, all IO for the application will be blocked until the dirty pages are freed. Increasing the dirty_ratio would probably mean higher write performance at peaks, as we now have a larger cache, but then again when the cache fills, IO will be blocked for a considerably longer time (several seconds). The other question is why they are increasing sched_min_granularity_ns. If I understand it correctly, increasing this value will decrease the number of time slices per epoch (sched_latency_ns), meaning that running tasks will get more time to finish their work. I can understand this being a very good thing for applications with very few threads, but for e.g. Apache or other processes with a lot of threads, would this not be counter-productive?
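
The arithmetic in the question can be written down directly. This is a sketch under stated assumptions: the 20 GB and 40% figures come from the question, while the 200 MiB/s write-back throughput and the two scheduler values are hypothetical numbers chosen only to show the calculation. The kernel applies dirty_ratio to available memory rather than total RAM, so the first result is an upper bound, and `slices_per_period` follows the question's simplified reading of the scheduler rather than the full CFS behaviour.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def dirty_limit_bytes(ram_bytes, dirty_ratio_percent):
    """Upper bound on dirty page cache, taking the ratio against total RAM."""
    return ram_bytes * dirty_ratio_percent // 100


def flush_seconds(dirty_bytes, write_bytes_per_second):
    """Time to write the dirty pages back at a steady, assumed throughput."""
    return dirty_bytes / write_bytes_per_second


def slices_per_period(sched_latency_ns, sched_min_granularity_ns):
    """How many minimum-length slices fit in one scheduling period (simplified)."""
    return max(1, sched_latency_ns // sched_min_granularity_ns)


limit = dirty_limit_bytes(20 * GIB, 40)           # the question's 20 GB / 40% case
print(limit / GIB)                                 # 8.0
print(flush_seconds(limit, 200 * MIB))             # seconds at an assumed 200 MiB/s
print(slices_per_period(20_000_000, 10_000_000))   # hypothetical nanosecond values
```

At the assumed throughput the full 8 GiB would take tens of seconds to flush, which is why the real storage write rate matters more to the question than the ratio itself.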

    Read the article

  • F# performance in scientific computing

    - by aaa
    Hello. I am curious as to how F# performance compares to C++ performance. I asked a similar question with regard to Java, and the impression I got was that Java is not suitable for heavy number crunching. I have read that F# is supposed to be more scalable and more performant, but how does its real-world performance compare to C++? Specific questions about the current implementation are: How well does it do floating-point? Does it allow vector instructions? How friendly is it towards optimizing compilers? How big a memory footprint does it have? Does it allow fine-grained control over memory locality? Does it have capacity for distributed-memory processors, for example Cray? What features does it have that may be of interest to computational science, where heavy number processing is involved? Are there actual scientific computing implementations that use it? Thanks

    Read the article

  • VS2010 + Resharper 5 performance issues

    - by Jeremy Roberts
    I have been using VS2010 with Resharper 5 for several weeks and am having a performance issue. Sometimes when typing, the cursor will lag and the keystrokes won't show instantaneously. Also, scrolling will lag at times. There is a forum thread started and JetBrains has been responding. Several people (including myself) have added their voice and uploaded some performance profiles. If anyone here has had this issue, I would encourage you to visit the thread and let JetBrains know about it. Has anyone had this problem and have a suggestion to restore performance?

    Read the article
