Search Results

Search found 854 results on 35 pages for 'cores'.

Page 12/35 | < Previous Page | 8 9 10 11 12 13 14 15 16 17 18 19 | Next Page >

HyperV - low CPU usage

- by Klark

I am very new to HyperV and virtual machine philosophy in general, so please expect more or less nooby questions :) I have a server that is only used as a host for virtual machines. OS is windows server 2008 R2 and it is running on 16 CPU and 48 GBs of RAM. On aforementioned server there are 8 VMs, each having 4 CPUs and 4 GBs of RAM. On those VMs we are running some CPU intensive tasks. Each machine has nearly 100% cpu usage. After I noticed slow performance I went to the host machine and started playing with process explorer. It turned out that cpu usage is very low. Also I/O is very low, and of course, memory consumption is high, which is expected. Of course, I don't expect that those 4 virtual cores dedicated to a VM work as fast as real, hardware 4 cores, but still I expected a higher consumption of real hardware. Is this sort of behaviour normal? I see that the most of CPU usage on host machine are marked as interrupts (which I guess is normal) and all those interrupts are passed to only one core (which is strange). Are there out of box optimization that I could perform to finally use all that processing power that is under the hood. My knowledge of virtualization technology is near to embarrassing, so I would be grateful for any links that could enlightened me :) Thanks.

Read the article
poor performance when deleteing many files

- by choppy

I've got two machines: The first is IBM Blade with 24 cores 96GB RAM and single local hard drive with 278GB divided to 4 partitions: 1. c: - 40GB; 3GB free 2. d: - 40GB; 37GB free 3. e: - 198322GB; 198.1 free 4. 100MB (EFI system Partition) Formatted with GPT The other is pizza server with 4 cores 8GB RAM and single local hard drive with 273GB divided to 3 partitions: 1. c: - 136.81; 20GB free 2. d: - 88.74GB; 87.91 free 3. e: - 47.85GB; 46.91 free Formatted with MBR I have two scripts, the first creates 20,000 files in one directory, each file size is 192KB, the second delete the folder (recursive) and prints how much time it toke to delete all files. The problem is on the first server (blade) it takes about 2 minutes to delete all 20,000 files while on the second (pizza) it takes about 4 seconds!? Both servers have clean windows server 2008R2 with no special application running on background. Any ideas what is going on?

Read the article
Tuning Linux + HAProxy

- by react

I'm currently rolling out HAProxy on Centos 6 which will send requests to some Apache HTTPD servers and I'm having issues with performance. I've spent the last couple of days googling and still can't seem to get past 10k/sec connections consistently when benchmarking (sometimes I do get 30k/sec though). I've pinned the IRQ's of the TX/RX queues for both the internal and external NICS to separate CPU cores and made sure HAProxy is pinned to it's own core. I've also made the following adjustments to sysctl.conf: # Max open file descriptors fs.file-max = 331287 # TCP Tuning net.ipv4.tcp_tw_reuse = 1 net.ipv4.ip_local_port_range = 1024 65023 net.ipv4.tcp_max_syn_backlog = 10240 net.ipv4.tcp_max_tw_buckets = 400000 net.ipv4.tcp_max_orphans = 60000 net.ipv4.tcp_synack_retries = 3 net.core.somaxconn = 40000 net.ipv4.tcp_rmem = 4096 8192 16384 net.ipv4.tcp_wmem = 4096 8192 16384 net.ipv4.tcp_mem = 65536 98304 131072 net.core.netdev_max_backlog = 40000 net.ipv4.tcp_tw_reuse = 1 If I use AB to hit the a webserver directly I easily get 30k/s connections. If I stop the webservers and use AB to hit HAProxy then I get 30k/s connections but obviously it's useless. I've also disabled iptables for now since I read that nf_conntrack can slow everything down, no change. I've also disabled the irqbalance service. The fact that I can hit each individual device with 30k/s makes me believe the tuning of the servers is OK and that it must be some HAProxy config? Here's the config which I've built from reading tuning articles, etc http://pastebin.com/zsCyAtgU The server is a dual Xeon CPU E5-2620 (6 cores) with 32GB of RAM. Running Centos 6.2 x64. The private and public interfaces are on separate NICS. Anyone have any ideas? Thanks.

Read the article
100% CPU when doing 4 or more concurrent requests with Magento

- by pancake

Currently I'm having trouble with a server running Magento, it's unbelievably slow. It's a VPS with a few Magento installations on it used for development, so I'm the only one using them. When I do 4 request all 2 seconds after each other I'm finished in 10 seconds. Slow, but still within the limits of my patience. When I do 4 "concurrent" requests, however (opening 4 tabs in a row, very quickly) all four cores go to 100% and stay there for like a minute. How is this possible? I know that there are a lot of possibilities here, so any tips on how to make an Apache/PHP server go faster are also welcome. It used to go a lot faster before, and I've also tried APC, but it kept causing problems (PHP errors, something with memory pools) so I've disabled it. By the way, the Magento cache is off and compiling is also off. I know this makes Magento slower than usual, but I don't think a 60 second response time is normal for any Magento installation. Virtual hardware: 4 Cores and 4096MB RAM Swap is never used (checked with htop) 100GB disk space, of which 10% is in use Software: Debian 6 DirectAdmin and apache custombuild PHP 5.2.17 (CLI) If you need more info, please tell me how to get it, because I probably don't know how. I do know how to use the command line in linux and the usage of quite a few commands, but my experience with managing a server is limited.

Read the article
Parallelism in .NET – Part 9, Configuration in PLINQ and TPL

- by Reed

Parallel LINQ and the Task Parallel Library contain many options for configuration. Although the default configuration options are often ideal, there are times when customizing the behavior is desirable. Both frameworks provide full configuration support. When working with Data Parallelism, there is one primary configuration option we often need to control – the number of threads we want the system to use when parallelizing our routine. By default, PLINQ and the TPL both use the ThreadPool to schedule tasks. Given the major improvements in the ThreadPool in CLR 4, this default behavior is often ideal. However, there are times that the default behavior is not appropriate. For example, if you are working on multiple threads simultaneously, and want to schedule parallel operations from within both threads, you might want to consider restricting each parallel operation to using a subset of the processing cores of the system. Not doing this might over-parallelize your routine, which leads to inefficiencies from having too many context switches. In the Task Parallel Library, configuration is handled via the ParallelOptions class. All of the methods of the Parallel class have an overload which accepts a ParallelOptions argument. We configure the Parallel class by setting the ParallelOptions.MaxDegreeOfParallelism property. For example, let’s revisit one of the simple data parallel examples from Part 2: Parallel.For(0, pixelData.GetUpperBound(0), row => { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } }); .csharpcode, .csharpcode pre { font-size: small; color: black; font-family: consolas, "Courier New", courier, monospace; background-color: #ffffff; /*white-space: pre;*/ } .csharpcode pre { margin: 0em; } .csharpcode .rem { color: #008000; } .csharpcode .kwrd { color: #0000ff; } .csharpcode .str { color: #006080; } .csharpcode .op { color: #0000c0; } .csharpcode .preproc { color: #cc6633; } .csharpcode .asp { background-color: #ffff00; } .csharpcode .html { color: #800000; } .csharpcode .attr { color: #ff0000; } .csharpcode .alt { background-color: #f4f4f4; width: 100%; margin: 0em; } .csharpcode .lnum { color: #606060; } Here, we’re looping through an image, and calling a method on each pixel in the image. If this was being done on a separate thread, and we knew another thread within our system was going to be doing a similar operation, we likely would want to restrict this to using half of the cores on the system. This could be accomplished easily by doing: var options = new ParallelOptions(); options.MaxDegreeOfParallelism = Math.Max(Environment.ProcessorCount / 2, 1); Parallel.For(0, pixelData.GetUpperBound(0), options, row => { for (int col=0; col < pixelData.GetUpperBound(1); ++col) { pixelData[row, col] = AdjustContrast(pixelData[row, col], minPixel, maxPixel); } }); Now, we’re restricting this routine to using no more than half the cores in our system. Note that I included a check to prevent a single core system from supplying zero; without this check, we’d potentially cause an exception. I also did not hard code a specific value for the MaxDegreeOfParallelism property. One of our goals when parallelizing a routine is allowing it to scale on better hardware. Specifying a hard-coded value would contradict that goal. Parallel LINQ also supports configuration, and in fact, has quite a few more options for configuring the system. The main configuration option we most often need is the same as our TPL option: we need to supply the maximum number of processing threads. In PLINQ, this is done via a new extension method on ParallelQuery<T>: ParallelEnumerable.WithDegreeOfParallelism. Let’s revisit our declarative data parallelism sample from Part 6: double min = collection.AsParallel().Min(item => item.PerformComputation()); Here, we’re performing a computation on each element in the collection, and saving the minimum value of this operation. If we wanted to restrict this to a limited number of threads, we would add our new extension method: int maxThreads = Math.Max(Environment.ProcessorCount / 2, 1); double min = collection .AsParallel() .WithDegreeOfParallelism(maxThreads) .Min(item => item.PerformComputation()); This automatically restricts the PLINQ query to half of the threads on the system. PLINQ provides some additional configuration options. By default, PLINQ will occasionally revert to processing a query in parallel. This occurs because many queries, if parallelized, typically actually cause an overall slowdown compared to a serial processing equivalent. By analyzing the “shape” of the query, PLINQ often decides to run a query serially instead of in parallel. This can occur for (taken from MSDN): Queries that contain a Select, indexed Where, indexed SelectMany, or ElementAt clause after an ordering or filtering operator that has removed or rearranged original indices. Queries that contain a Take, TakeWhile, Skip, SkipWhile operator and where indices in the source sequence are not in the original order. Queries that contain Zip or SequenceEquals, unless one of the data sources has an originally ordered index and the other data source is indexable (i.e. an array or IList(T)). Queries that contain Concat, unless it is applied to indexable data sources. Queries that contain Reverse, unless applied to an indexable data source. If the specific query follows these rules, PLINQ will run the query on a single thread. However, none of these rules look at the specific work being done in the delegates, only at the “shape” of the query. There are cases where running in parallel may still be beneficial, even if the shape is one where it typically parallelizes poorly. In these cases, you can override the default behavior by using the WithExecutionMode extension method. This would be done like so: var reversed = collection .AsParallel() .WithExecutionMode(ParallelExecutionMode.ForceParallelism) .Select(i => i.PerformComputation()) .Reverse(); Here, the default behavior would be to not parallelize the query unless collection implemented IList<T>. We can force this to run in parallel by adding the WithExecutionMode extension method in the method chain. Finally, PLINQ has the ability to configure how results are returned. When a query is filtering or selecting an input collection, the results will need to be streamed back into a single IEnumerable<T> result. For example, the method above returns a new, reversed collection. In this case, the processing of the collection will be done in parallel, but the results need to be streamed back to the caller serially, so they can be enumerated on a single thread. This streaming introduces overhead. IEnumerable<T> isn’t designed with thread safety in mind, so the system needs to handle merging the parallel processes back into a single stream, which introduces synchronization issues. There are two extremes of how this could be accomplished, but both extremes have disadvantages. The system could watch each thread, and whenever a thread produces a result, take that result and send it back to the caller. This would mean that the calling thread would have access to the data as soon as data is available, which is the benefit of this approach. However, it also means that every item is introducing synchronization overhead, since each item needs to be merged individually. On the other extreme, the system could wait until all of the results from all of the threads were ready, then push all of the results back to the calling thread in one shot. The advantage here is that the least amount of synchronization is added to the system, which means the query will, on a whole, run the fastest. However, the calling thread will have to wait for all elements to be processed, so this could introduce a long delay between when a parallel query begins and when results are returned. The default behavior in PLINQ is actually between these two extremes. By default, PLINQ maintains an internal buffer, and chooses an optimal buffer size to maintain. Query results are accumulated into the buffer, then returned in the IEnumerable<T> result in chunks. This provides reasonably fast access to the results, as well as good overall throughput, in most scenarios. However, if we know the nature of our algorithm, we may decide we would prefer one of the other extremes. This can be done by using the WithMergeOptions extension method. For example, if we know that our PerformComputation() routine is very slow, but also variable in runtime, we may want to retrieve results as they are available, with no bufferring. This can be done by changing our above routine to: var reversed = collection .AsParallel() .WithExecutionMode(ParallelExecutionMode.ForceParallelism) .WithMergeOptions(ParallelMergeOptions.NotBuffered) .Select(i => i.PerformComputation()) .Reverse(); On the other hand, if are already on a background thread, and we want to allow the system to maximize its speed, we might want to allow the system to fully buffer the results: var reversed = collection .AsParallel() .WithExecutionMode(ParallelExecutionMode.ForceParallelism) .WithMergeOptions(ParallelMergeOptions.FullyBuffered) .Select(i => i.PerformComputation()) .Reverse(); Notice, also, that you can specify multiple configuration options in a parallel query. By chaining these extension methods together, we generate a query that will always run in parallel, and will always complete before making the results available in our IEnumerable<T>.

Read the article
What's up with LDoms: Part 1 - Introduction & Basic Concepts

- by Stefan Hinker

LDoms - the correct name is Oracle VM Server for SPARC - have been around for quite a while now. But to my surprise, I get more and more requests to explain how they work or to give advise on how to make good use of them. This made me think that writing up a few articles discussing the different features would be a good idea. Now - I don't intend to rewrite the LDoms Admin Guide or to copy and reformat the (hopefully) well known "Beginners Guide to LDoms" by Tony Shoumack from 2007. Those documents are very recommendable - especially the Beginners Guide, although based on LDoms 1.0, is still a good place to begin with. However, LDoms have come a long way since then, and I hope to contribute to their adoption by discussing how they work and what features there are today. In this and the following posts, I will use the term "LDoms" as a common abbreviation for Oracle VM Server for SPARC, just because it's a lot shorter and easier to type (and presumably, read). So, just to get everyone on the same baseline, lets briefly discuss the basic concepts of virtualization with LDoms. LDoms make use of a hypervisor as a layer of abstraction between real, physical hardware and virtual hardware. This virtual hardware is then used to create a number of guest systems which each behave very similar to a system running on bare metal: Each has its own OBP, each will install its own copy of the Solaris OS and each will see a certain amount of CPU, memory, disk and network resources available to it. Unlike some other type 1 hypervisors running on x86 hardware, the SPARC hypervisor is embedded in the system firmware and makes use both of supporting functions in the sun4v SPARC instruction set as well as the overall CPU architecture to fulfill its function. The CMT architecture of the supporting CPUs (T1 through T4) provide a large number of cores and threads to the OS. For example, the current T4 CPU has eight cores, each running 8 threads, for a total of 64 threads per socket. To the OS, this looks like 64 CPUs. The SPARC hypervisor, when creating guest systems, simply assigns a certain number of these threads exclusively to one guest, thus avoiding the overhead of having to schedule OS threads to CPUs, as do typical x86 hypervisors. The hypervisor only assigns CPUs and then steps aside. It is not involved in the actual work being dispatched from the OS to the CPU, all it does is maintain isolation between different guests. Likewise, memory is assigned exclusively to individual guests. Here, the hypervisor provides generic mappings between the physical hardware addresses and the guest's views on memory. Again, the hypervisor is not involved in the actual memory access, it only maintains isolation between guests. During the inital setup of a system with LDoms, you start with one special domain, called the Control Domain. Initially, this domain owns all the hardware available in the system, including all CPUs, all RAM and all IO resources. If you'd be running the system un-virtualized, this would be what you'd be working with. To allow for guests, you first resize this initial domain (also called a primary domain in LDoms speak), assigning it a small amount of CPU and memory. This frees up most of the available CPU and memory resources for guest domains. IO is a little more complex, but very straightforward. When LDoms 1.0 first came out, the only way to provide IO to guest systems was to create virtual disk and network services and attach guests to these services. In the meantime, several different ways to connect guest domains to IO have been developed, the most recent one being SR-IOV support for network devices released in version 2.2 of Oracle VM Server for SPARC. I will cover these more advanced features in detail later. For now, lets have a short look at the initial way IO was virtualized in LDoms: For virtualized IO, you create two services, one "Virtual Disk Service" or vds, and one "Virtual Switch" or vswitch. You can, of course, also create more of these, but that's more advanced than I want to cover in this introduction. These IO services now connect real, physical IO resources like a disk LUN or a networt port to the virtual devices that are assigned to guest domains. For disk IO, the normal case would be to connect a physical LUN (or some other storage option that I'll discuss later) to one specific guest. That guest would be assigned a virtual disk, which would appear to be just like a real LUN to the guest, while the IO is actually routed through the virtual disk service down to the physical device. For network, the vswitch acts very much like a real, physical ethernet switch - you connect one physical port to it for outside connectivity and define one or more connections per guest, just like you would plug cables between a real switch and a real system. For completeness, there is another service that provides console access to guest domains which mimics the behavior of serial terminal servers. The connections between the virtual devices on the guest's side and the virtual IO services in the primary domain are created by the hypervisor. It uses so called "Logical Domain Channels" or LDCs to create point-to-point connections between all of these devices and services. These LDCs work very similar to high speed serial connections and are configured automatically whenever the Control Domain adds or removes virtual IO. To see all this in action, now lets look at a first example. I will start with a newly installed machine and configure the control domain so that it's ready to create guest systems. In a first step, after we've installed the software, let's start the virtual console service and downsize the primary domain. root@sun # ldm list NAME STATE FLAGS CONS VCPU MEMORY UTIL UPTIME primary active -n-c-- UART 512 261632M 0.3% 2d 13h 58m root@sun # ldm add-vconscon port-range=5000-5100 \ primary-console primary root@sun # svcadm enable vntsd root@sun # svcs vntsd STATE STIME FMRI online 9:53:21 svc:/ldoms/vntsd:default root@sun # ldm set-vcpu 16 primary root@sun # ldm set-mau 1 primary root@sun # ldm start-reconf primary root@sun # ldm set-memory 7680m primary root@sun # ldm add-config initial root@sun # shutdown -y -g0 -i6 So what have I done: I've defined a range of ports (5000-5100) for the virtual network terminal service and then started that service. The vnts will later provide console connections to guest systems, very much like serial NTS's do in the physical world. Next, I assigned 16 vCPUs (on this platform, a T3-4, that's two cores) to the primary domain, freeing the rest up for future guest systems. I also assigned one MAU to this domain. A MAU is a crypto unit in the T3 CPU. These need to be explicitly assigned to domains, just like CPU or memory. (This is no longer the case with T4 systems, where crypto is always available everywhere.) Before I reassigned the memory, I started what's called a "delayed reconfiguration" session. That avoids actually doing the change right away, which would take a considerable amount of time in this case. Instead, I'll need to reboot once I'm all done. I've assigned 7680MB of RAM to the primary. That's 8GB less the 512MB which the hypervisor uses for it's own private purposes. You can, depending on your needs, work with less. I'll spend a dedicated article on sizing, discussing the pros and cons in detail. Finally, just before the reboot, I saved my work on the ILOM, to make this configuration available after a powercycle of the box. (It'll always be available after a simple reboot, but the ILOM needs to know the configuration of the hypervisor after a power-cycle, before the primary domain is booted.) Now, lets create a first disk service and a first virtual switch which is connected to the physical network device igb2. We will later use these to connect virtual disks and virtual network ports of our guest systems to real world storage and network. root@sun # ldm add-vds primary-vds root@sun # ldm add-vswitch net-dev=igb2 switch-primary primary You are free to choose whatever names you like for the virtual disk service and the virtual switch. I strongly recommend that you choose names that make sense to you and describe the function of each service in the context of your implementation. For the vswitch, for example, you could choose names like "admin-vswitch" or "production-network" etc. This already concludes the configuration of the control domain. We've freed up considerable amounts of CPU and RAM for guest systems and created the necessary infrastructure - console, vts and vswitch - so that guests systems can actually interact with the outside world. The system is now ready to create guests, which I'll describe in the next section. For further reading, here are some recommendable links: The LDoms 2.2 Admin Guide The "Beginners Guide to LDoms" The LDoms Information Center on MOS LDoms on OTN

Read the article
My new laptop - with a really nice battery option

- by Rob Farley

It was about time I got a new laptop, and so I made a phone-call to Dell to discuss my options. I decided not to get an SSD from them, because I’d rather choose one myself – the sales guy tells me that changing the HD doesn’t void my warranty, so that’s good (incidentally, I’d love to hear people’s recommendations for which SSD to get for my laptop). Unfortunately this machine only has one HD slot, but I figure that I’ll put lots of stuff onto external disks anyway. The machine I got was a Dell Studio XPS 16. It’s red (which suits my company), but also has the Intel® Core™ i7-820QM Processor, which is 4 Cores/8 Threads. Makes for a pretty Task Manager, but nothing like the one I saw at SQLBits last year (at 96 cores), or the one that my good friend James Rowland-Jones writes about here. But the reason for this post is actually something in the software that comes with the machine – you know, the stuff that most people uninstall at the earliest opportunity. I had just reinstalled the operating system, and was going through the utilities to get the drivers up-to-date, when I noticed that one of Dell applications included an option to disable battery charging. So I installed it. And sure enough, I can tell the battery not to charge now. Clearly Dell see it as a temporary option, and one that’s designed for when you’re on a plane. But for me, I most often use my laptop with the power plugged in, which means I don’t need to have my battery continually topping itself up. So I really love this option, but I feel like it could go a little further. I’d like “Not Charging” to be the default option, and let me set it when I want to charge it (which should theoretically make my battery last longer). I also intend to work out how this option works, so that I can script it and put it into my StartUp options (so it can be the Default setting). Actually – if someone has already worked this out and can tell me what it does, then please feel free to let me know. Even better would be an external switch. I had a switch on my old laptop (a Dell Latitude) for WiFi, so that I could turn that off before I turned on the computer (this laptop doesn’t give me that option – no physical switch for flight mode). I guess it just means I’ll get used to leaving the WiFi off by default, and turning it on when I want it – might save myself some battery power that way too. Soon I’ll need to take the plunge and sync my iPhone with the new laptop. I’m a little worried that I might lose something – Apple’s messages about how my stuff will be wiped and replaced with what’s on the PC doesn’t fill me with confidence, as it’s a new PC that doesn’t have stuff on it. But having a new machine is definitely a nice experience, and one that I can recommend. I’m sure when I get around to buying an SSD I’ll feel like it’s shiny and new all over again! Share this post: email it! | bookmark it! | digg it! | reddit! | kick it! | live it!

Read the article
How to make software development decisions based on facts

- by Laila

We love to hear stories about the many and varied ways our customers use the tools that we develop, but in our earnest search for stories and feedback, we'd rather forgotten that some of our keenest users are fellow RedGaters, in the same building. It was almost by chance that we discovered how the SQL Source Control team were using SmartAssembly. As it happens, there is a separate account (here on Simple-Talk) of how SmartAssembly was used to support the Early Access program; by providing answers to specific questions about how the SQL Source Control product was used. But what really got us all grinning was how valuable the SQL Source Control team found the reports that SmartAssembly was quickly and painlessly providing. So gather round, my friends, and I'll tell you the Tale Of The Framework Upgrade . <strange mirage effect to denote a flashback. A subtle background string of music starts playing in minor key> Kevin and his team were undecided. They weren't sure whether they could move their software product from .NET 2 to .NET 3.5 , let alone to .NET 4. You see, they were faced with having to guess what version of .NET was already installed on the average user's machine, which I'm sure you'll agree is no easy task. Upgrading their code to .NET 3.5 might put a barrier to people trying the tool, which was the last thing Kevin wanted: "what if our users have to download X, Y, and Z before being able to open the application?" he asked. That fear of users having to do half an hour of downloads (.followed by at least ten minutes of installation. followed by a five minute restart) meant that Kevin's team couldn't take advantage of WCF (Windows Communication Foundation). This made them sad, because WCF would have allowed them to write their code in a much simpler way, and in hours instead of days (as was the case with .NET 2). Oh sure, they had a gut feeling that this probably wasn't the case, 3.5 had been out for so many years, but they weren't sure. <background music switches to major key> SmartAssembly Feature Usage Reporting gave Kevin and his team exactly what they needed: hard data on their users' systems, both hardware and software. I was there, I saw it happen, and that's not the sort of thing a woman quickly forgets. I'll always remember his last words (before he went to lunch): "You get lots of free information by just checking a box in SmartAssembly" is what he said. For example, they could see how many CPU cores their customers were using, and found out that they should be making use of parallelism to take advantage of available cores. But crucially, (and this is the moral of my tale, dear reader), Kevin saw that 99% of SQL Source Control's users were on .NET 3.5 or above. So he knew that they could make the switch and that is was safe to do so. With this reassurance, they could use WCF to not only make development easier, but to also give them a really nice way to do inter-process communication between the Source Control and the SQL Compare products. To have done that on .NET 2.0 was certainly possible <knowing chuckle>, but Microsoft have made it a lot easier with WCF. <strange mirage effect to denote end of flashback> So you see, with Feature Usage Reporting, they finally got the hard evidence they needed to safely make the switch to .NET 3.5, knowing it would not inconvenience their users. And that, my friends, is just the sort of thing we like to hear.

Read the article
F# and the rose-tinted reflection

- by CliveT

We're already seeing increasing use of many cores on client desktops. It is a change that has been long predicted. It is not just a change in architecture, but our notions of efficiency in a program. No longer can we focus on the asymptotic complexity of an algorithm by counting the steps that a single core processor would take to execute it. Instead we'll soon be more concerned about the scalability of the algorithm and how well we can increase the performance as we increase the number of cores. This may even lead us to throw away our most efficient algorithms, and switch to less efficient algorithms that scale better. We might even be willing to waste cycles in order to speculatively execute at the algorithm rather than the hardware level. State is the big headache in this parallel world. At the hardware level, main memory doesn't necessarily contain the definitive value corresponding to a particular address. An update to a location might still be held in a CPU's local cache and it might be some time before the value gets propagated. To get the latest value, and the notion of "latest" takes a lot of defining in this world of rapidly mutating state, the CPUs may well need to communicate to decide who has the definitive value of a particular address in order to avoid lost updates. At the user program level, this means programmers will need to lock objects before modifying them, or attempt to avoid the overhead of locking by understanding the memory models at a very deep level. I think it's this need to avoid statefulness that has led to the recent resurgence of interest in functional languages. In the 1980s, functional languages started getting traction when research was carried out into how programs in such languages could be auto-parallelised. Sadly, the impracticality of some of the languages, the overheads of communication during this parallel execution, and rapid improvements in compiler technology on stock hardware meant that the functional languages fell by the wayside. The one thing that these languages were good at was getting rid of implicit state, and this single idea seems like a solution to the problems we are going to face in the coming years. Whether these languages will catch on is hard to predict. The mindset for writing a program in a functional language is really very different from the way that object-oriented problem decomposition happens - one has to focus on the verbs instead of the nouns, which takes some getting used to. There are a number of hybrid functional/object languages that have been becoming more popular in recent times. These half-way houses make it easy to use functional ideas for some parts of the program while still allowing access to the underlying object-focused platform without a great deal of impedance mismatch. One example is F# running on the CLR which, in Visual Studio 2010, has because a first class member of the pack. Inside Visual Studio 2010, the tooling for F# has improved to the point where it is easy to set breakpoints and watch values change while debugging at the source level. In my opinion, it is the tooling support that will enable the widespread adoption of functional languages - without this support, people will put off any transition into the functional world for as long as they possibly can. Without tool support it will make it hard to learn these languages. One tool that doesn't currently support F# is Reflector. The idea of decompiling IL to a functional language is daunting, but F# is potentially so important I couldn't dismiss the idea. As I'm currently developing Reflector 6.5, I thought it wise to take four days just to see how far I could get in doing so, even if it achieved little more than to be clearer on how much was possible, and how long it might take. You can read what happened here, and of the insights it gave us on ways to improve the tool.

Read the article
Crime Scene Investigation: SQL Server

- by Rodney Landrum

“The packages are running slower in Prod than they are in Dev” My week began with this simple declaration from one of our lead BI developers, quickly followed by an emailed spreadsheet demonstrating that, over 5 executions, an extensive ETL process was running average 630 seconds faster on Dev than on Prod. The situation needed some scientific investigation to determine why the same code, the same data, the same schema would yield consistently slower results on a more powerful server. Prod had yet to be officially christened with a “Go Live” date so I had the time, and having recently been binge watching CSI: New York, I also had the inclination. An inspection of the two systems, Prod and Dev, revealed the first surprise: although Prod was indeed a “bigger” system, with double the amount of RAM of Dev, the latter actually had twice as many processor cores. On neither system did I see much sign of resources being heavily taxed, while the ETL process was running. Without any real supporting evidence, I jumped to a conclusion that my years of performance tuning should have helped me avoid, and that was that the hardware differences explained the better performance on Dev. We spent time setting up a Test system, similarly scoped to Prod except with 4 times the cores, and ported everything across. The results of our careful benchmarks left us truly bemused; the ETL process on the new server was slower than on both other systems. We burned more time tweaking server configurations, monitoring IO and network latency, several times believing we’d uncovered the smoking gun, until the results of subsequent test runs pitched us back into confusion. Finally, I decided, enough was enough. Hadn’t I learned very early in my DBA career that almost all bottlenecks were caused by code and database design, not hardware? It was time to get back to basics. With over 100 SSIS packages and hundreds of queries, each handling specific tasks such as file loads, bulk inserts, transforms, logging, and so on, the task seemed formidable. And yet, after barely an hour spent with Profiler, Extended Events, and wait statistics DMVs, I had a lead in the shape of a query that joined three tables, containing millions of rows, returned 3279 results, but performed 239K logical reads. As soon as I looked at the execution plans for the query in Dev and Test I saw the culprit, an implicit conversion warning on a join predicate field that was numeric in one table and a varchar(50) in another! I turned this information over to the BI developers who quickly resolved the data type mismatches and found and fixed “several” others as well. After the schema changes the same query with the same databases ran in under 1 second on all systems and reduced the logical reads down to fewer than 300. The analysis also revealed that on Dev, the ETL task was pulling data across a LAN, whereas Prod and Test were connected across slower WAN, in large part explaining why the same process ran slower on the latter two systems. Loading the data locally on Prod delivered a further 20% gain in performance. As we progress through our DBA careers we learn valuable lessons. Sometimes, with a project deadline looming and pressure mounting, we choose to forget them. I was close to giving into the temptation to throw more hardware at the problem. I’m pleased at least that I resisted, though I still kick myself for not looking at the code on day one. It can seem a daunting prospect to return to the fundamentals of the code so close to roll out, but with the right tools, and surprisingly little time, you can collect the evidence that reveals the true problem. It is a lesson I trust I will remember for my next 20 years as a DBA, if I’m ever again tempted to bypass the evidence.

Read the article
How to get faster graphics in KVM? VNC is painfully slow with Haiku OS guest, Spice won't install and SDL doesn't work

- by Don Quixote

I've been coming up to speed on the Haiku operating system, an Open Source clone of BeOS 5 Pro. I'm using an Apple MacBook Pro as my development machine. Apple's BootCamp BIOS does not support more than four partitions on the internal hard drive. While I can set up extended and logical partitions, doing so will prevent any of the installed operating systems from booting. To run Haiku directly on the iron, I boot it off a USB stick. Using external storage is also helpful because I am perpetually out of filesystem space. While VirtualBox is documented to allow access to physical drives, I could not actually get it to work. Also VirtualBox can only use one of the host CPU's cores. While VB guests can be configured for more than one CPU, they are only emulated. A full build of the Haiku OS takes 4.5 under VB. I had the hope of reducing build times by using KVM instead, but it's not working nearly as well as VirtualBox did. The Linux Kernel Virtual Machine is broken in all manner of fundamental ways as seen from Haiku. But I'm a coder; maybe I could contribute to fixing some of those problems. The first problem I've got is that Haiku's video in virt-manager is quite painfully slow. When I drag Haiku windows around the desktop, they lag quite far behind where my mouse is. It's quite difficult to move a window to a precise position on the screen. Just imagine that the mouse was connected to the window title bar with a really stretchy spring. Also Haiku's mouse lags quite far behind where I have moved it. I found lots of Personal Package Archives that enable Spice from QEMU / KVM at the Ubuntu Personal Package Arhives. I tried a few of the PPAs but none of them worked; with one of them, the command "add-apt-repository" crashed with a traceback. There is a Wiki page about Spice, but it says that it only works on 64-bit. My Early 2006 MacBook Pro is 32-bit. Its Apple Model Identifier is MacBookPro1,1; these use Core Duos NOT Core 2 Duos. I don't mind building a source deb for 32-bit if I can expect it to work. Is there some reason that Spice should be 64-bit only? Does it need features of the x86_64 Instruction Set Architecture that x86 does not have? When I try using SDL from virt-manager, the configuration for Local SDL Window says "Xauth: /home/mike/.Xauthority". When I try to start my guest, virt-manager emits an error. When I Googled the error message, the usual solution was to make ~/.Xauthority readible. However, .Xauthorty does not exist in my home directory. Instead I have a $XAUTHORITY environment variable. There is no way to configure SDL in virt-manager to use $XAUTHORITY instead of ~/.Xauthority. Neither does it work to copy the value of $XAUTHORITY into the file. I am ready to scream, because I've been five fscking days trying to make KVM work for Haiku development. There is a whole lot more that is broken than the slow video. All I really want to do for now is speed up my full builds of Haiku by using "jam -j2" to use both cores in my CPU. I may try Xen next, but the last time I monkeyed with Xen it was far, far more broken than I am finding KVM to be. Just for now, I would be satisfied if there were some way to use my USB stick as a drive in VirtualBox. VB does allow me to configure /dev/sdb as a drive, but it always causes a fatal error when I try to launch the guest. Thank You For Any Advice You Can Give Me. -

Read the article
How to force two process to run on the same CPU?

- by kovan

Context: I'm programming a software system that consists of multiple processes. It is programmed in C++ under Linux. and they communicate among them using Linux shared memory. Usually, in software development, is in the final stage when the performance optimization is made. Here I came to a big problem. The software has high performance requirements, but in machines with 4 or 8 CPU cores (usually with more than one CPU), it was only able to use 3 cores, thus wasting 25% of the CPU power in the first ones, and more than 60% in the second ones. After many research, and having discarded mutex and lock contention, I found out that the time was being wasted on shmdt/shmat calls (detach and attach to shared memory segments). After some more research, I found out that these CPUs, which usually are AMD Opteron and Intel Xeon, use a memory system called NUMA, which basically means that each processor has its fast, "local memory", and accessing memory from other CPUs is expensive. After doing some tests, the problem seems to be that the software is designed so that, basically, any process can pass shared memory segments to any other process, and to any thread in them. This seems to kill performance, as process are constantly accessing memory from other processes. Question: Now, the question is, is there any way to force pairs of processes to execute in the same CPU?. I don't mean to force them to execute always in the same processor, as I don't care in which one they are executed, altough that would do the job. Ideally, there would be a way to tell the kernel: If you schedule this process in one processor, you must also schedule this "brother" process (which is the process with which it communicates through shared memory) in that same processor, so that performance is not penalized.

Read the article
How to setup matlabpool for multiple processors?

- by JohnIdol

I just setup a Extra Large Heavy Computation EC2 instance to throw it at my Genetic Algorithms problem, hoping to speed up things. This instance has 8 Intel Xeon processors (around 2.4Ghz each) and 7 Gigs of RAM. On my machine I have an Intel Core Duo, and matlab is able to work with my two cores just fine by runinng: matlabpool open 2 On the EC2 instance though, matlab only is capable of detecting 1 out of 8 processors, and if I try running: matlabpool open 8 I get an error saying that the ClusterSize is 1 since there's only 1 core on my CPU. True, there is only 1 core on each CPU, but I have 8 CPUs on the given EC2 instance! So the difference from my machine and the ec2 instance is that I have my 2 cores on a single processor locally, while the EC2 instance has 8 distinct processors. My question is, how do I get matlab to work with those 8 processors? I found this paper, but it seems related to setting up matlab with multiple EC2 instances (not related to multiple processors on the same instance, EC2 or not), which is not my problem. Any help appreciated!

Read the article
What does "cpuid level" means ? Asking just for curiosity

- by ogzylz

For example, I put just 2 core info of a 16 core machine. What does "cpuid level : 6" line means? If u can provide info about lines "bogomips : 5992.10" and "clflush size : 64" I will be appreciated processor : 0 vendor_id : GenuineIntel cpu family : 15 model : 6 model name : Intel(R) Xeon(TM) CPU 3.00GHz stepping : 8 cpu MHz : 2992.689 cache size : 4096 KB physical id : 0 siblings : 4 core id : 0 cpu cores : 2 fpu : yes fpu_exception : yes cpuid level : 6 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx cid cx16 xtpr lahf_lm bogomips : 5992.10 clflush size : 64 cache_alignment : 128 address sizes : 40 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 15 model : 6 model name : Intel(R) Xeon(TM) CPU 3.00GHz stepping : 8 cpu MHz : 2992.689 cache size : 4096 KB physical id : 1 siblings : 4 core id : 0 cpu cores : 2 fpu : yes fpu_exception : yes cpuid level : 6 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm syscall nx lm constant_tsc pni monitor ds_cpl vmx cid cx16 xtpr lahf_lm bogomips : 5985.23 clflush size : 64 cache_alignment : 128 address sizes : 40 bits physical, 48 bits virtual power management:

Read the article
Bash: how to simply parallelize tasks?

- by NoozNooz42

I'm writing a tiny script that calls the "PNGOUT" util on a few hundred PNG files. I simply did this: find $BASEDIR -iname "*png" -exec pngout {} \; And then I looked at my CPU monitor and noticed only one of the core was used, which is quite sad. In this day and age of dual, quad, octo and hexa (?) cores desktop, how do I simply parallelize this task with Bash? (it's not the first time I've had such a need, for quite a lot of these utils are mono-threaded... I already had the case with mp3 encoders). Would simply running all the pngout in the background do? How would my find command look like then? (I'm not too sure how to mix find and the '&' character) I if have three hundreds pictures, this would mean swapping between three hundreds processes, which doesn't seem great anyway!? Or should I copy my three hundreds files or so in "nb dirs", where "nb dirs" would be the number of cores, then run concurrently "nb finds"? (which would be close enough) But how would I do this?

Read the article
Which MySQL Fork/Version to Pick??

- by Drew

As most of you know, Sun acquired MySQL (and later Oracle acquired Sun), and during these acquisitions, there were a lot of FUD in MySQL community which resulted in creation of various forks. Today we have MySQL from MySQL, Percona (XtraDB) MySQL, OurDelta MySQL, MariaDB, Drizzle to name a few. Which brings us to the source of the problem. We are in the process of upgrading our databases (hardware/software) and I would like to know which one of the forks should I go with. Each has their own set of pros/cons. We are currently using MySQL 5.0.x from MySQL/Linux on an 8-core machine. Our new hardware is a monster with 32 cores and 32GB of memory connecting to a fast NetApp Storage via FC. I would like to stick with MySQL from MySQL but I have heard horror stories on how badly MySQL 5.1 performs on many cores. I have also heard that MySQL 5.4 performs better on multi-core machines but that's still not production ready. In addition, I have also heard a lot of good things about Percona builds. This is what I know so far: MySQL 5.1 from MySQL: Reliable choice, but doesn't scale well on a big machine Percona: Scales well, good backing company. I don't have much experience with it MariaDB: Don't know much about it besides that it was founded by Original MySQL developers (including Monty) OurDelta: Don't know much Drizzle: Mostly optimized for cloud computing I would like to know what's the general notion about this problem. Which build/version should I go with? How are you guys picking your builds/versions? Thanks!

Read the article
How to improve Visual C++ compilation times?

- by dtrosset

I am compiling 2 C++ projects in a buildbot, on each commit. Both are around 1000 files, one is 100 kloc, the other 170 kloc. Compilation times are very different from gcc (4.4) to Visual C++ (2008). Visual C++ compilations for one project take in the 20 minutes. They cannot take advantage of the multiple cores because a project depend on the other. In the end, a full compilation of both projects in Debug and Release, in 32 and 64 bits takes more than 2 1/2 hours. gcc compilations for one project take in the 4 minutes. It can be parallelized on the 4 cores and takes around 1 min 10 secs. All 8 builds for 4 versions (Debug/Release, 32/64 bits) of the 2 projects are compiled in less than 10 minutes. What is happening with Visual C++ compilation times? They are basically 5 times slower. What is the average time that can be expected to compile a C++ kloc? Mine are 7 s/kloc with vc++ and 1.4 s/kloc with gcc. Can anything be done to speed-up compilation times on Visual C++?

Read the article
How expensive is a context switch? Is it better to implement a manual task switch than to rely on OS

- by Vilx-

The title says it all. Imagine I have two (three, four, whatever) tasks that have to run in parallel. Now, the easy way to do this would be to create separate threads and forget about it. But on a plain old single-core CPU that would mean a lot of context switching - and we all know that context switching is big, bad, slow, and generally simply Evil. It should be avoided, right? On that note, if I'm writing the software from ground up anyway, I could go the extra mile and implement my own task-switching. Split each task in parts, save the state inbetween, and then switch among them within a single thread. Or, if I detect that there are multiple CPU cores, I could just give each task to a separate thread and all would be well. The second solution does have the advantage of adapting to the number of available CPU cores, but will the manual task-switch really be faster than the one in the OS core? Especially if I'm trying to make the whole thing generic with a TaskManager and an ITask, etc?

Read the article
AIX Checklist for stable obiee deployment

- by user554629

Common AIX configuration issues ( last updated 27 Aug 2012 ) OBIEE is a complicated system with many moving parts and connection points.The purpose of this article is to provide a checklist to discuss OBIEE deployment with your systems administrators. The information in this article is time sensitive, and updated as I discover new issues or details. What makes OBIEE different? When Tech Support suggests AIX component upgrades to a stable, locked-down production AIX environment, it is common to get "push back". "Why is this necessary? We aren't we seeing issues with other software?"It's a fair question that I have often struggled to answer; here are the talking points: OBIEE is memory intensive. It is the entire purpose of the software to trade memory for repetitive, more expensive database requests across a network. OBIEE is implemented in C++ and is very dependent on the C++ runtime to behave correctly. OBIEE is aggressively thread efficient; if atomic operations on a particular architecture do not work correctly, the software crashes. OBIEE dynamically loads third-party database client libraries directly into the nqsserver process. If the library is not thread-safe, or corrupts process memory the OBIEE crash happens in an unrelated part of the code. These are extremely difficult bugs to find. OBIEE software uses 99% common source across multiple platforms: Windows, Linux, AIX, Solaris and HPUX. If a crash happens on only one platform, we begin to suspect other factors. load intensity, system differences, configuration choices, hardware failures. It is rare to have a single product require so many diverse technical skills. My role in support is to understand system configurations, performance issues, and crashes. An analyst trained in Business Analytics can't be expected to know AIX internals in the depth required to make configuration choices. Here are some guidelines. AIX C++ Runtime must be at version 11.1.0.4$ lslpp -L | grep xlC.aixobiee software will crash if xlC.aix.rte is downlevel; this is not a "try it" suggestion.Nov 2011 11.1.0.4 version is appropriate for all AIX versions ( 5, 6, 7 )Download from here:https://www-304.ibm.com/support/docview.wss?uid=swg24031426 No reboot is necessary to install, it can even be installed while applications are using the current version.Restart the apps, and they will pick up the latest version. AIX 5.3 Technology Level 12 is required when running on Power5,6,7 processorsAIX 6.1 was introduced with the newer Power chips, and we have seen no issues with 6.1 or 7.1 versions.Customers with an unstable deployment, dozens of unexplained crashes, became stable after the upgrade.If your AIX system is 5.3, the minimum TL level should be at or higher than this:$ oslevel -s 5300-12-03-1107IBM typically supports only the two latest versions of AIX ( 6.1 and 7.1, for example). AIX 5.3 is still supported and popular running in an LPAR. obiee userid limits$ ulimit -Ha ( hard limits )$ ulimit -a ( default limits )core file size (blocks) unlimiteddata seg size (kbytes) unlimitedfile size (blocks) unlimitedmax memory size (kbytes) unlimitedopen files 10240 cpu time (seconds) unlimitedvirtual memory (kbytes) unlimitedIt is best to establish the values in /etc/security/limitsroot user is needed to observe and modify this file.If you modify a limit, you will need to relog in to change it again. For example,$ ulimit -c 0$ ulimit -c 2097151cannot modify limit: Operation not permitted$ ulimit -c unlimited$ ulimit -c0There are only two meaningful values for ulimit -c ; zero or unlimited.Anything else is likely to produce a truncated core file that cannot be analyzed. Deploy 32-bit or 64-bit ?Early versions of OBIEE offered 32-bit or 64-bit choice to AIX customers.The 32-bit choice was needed if a database vendor did not supply a 64-bit client library.That's no longer an issue and beginning with OBIEE 11, 32-bit code is no longer shipped.A common error that leads to "out of memory" conditions to to accept the 32-bit memory configuration choices on 64-bit deployments. The significant configuration choices are: Maximum process data (heap) size is in an AIX environment variableLDR_CNTRL=IGNOREUNLOAD@LOADPUBLIC@PREREAD_SHLIB@MAXDATA=0x... Two thread stack sizes are made in obiee NQSConfig.INI[ SERVER ]SERVER_THREAD_STACK_SIZE = 0;DB_GATEWAY_THREAD_STACK_SIZE = 0; Sort memory in NQSConfig.INI[ GENERAL ]SORT_MEMORY_SIZE = 4 MB ;SORT_BUFFER_INCREMENT_SIZE = 256 KB ; Choosing a value for MAXDATA:0x080000000 2GB Default maximum 32-bit heap size ( 8 with 7 zeros )0x100000000 4GB 64-bit breaking even with 32-bit ( 1 with 8 zeros )0x200000000 8GB 64-bit double 32-bit max0x400000000 16GB 64-bit safetyUsing 2GB heap size for a 64-bit process will almost certainly lead to an out-of-memory situation.Registers are twice as big ... consume twice as much memory in the heap.Upgrading to a 4GB heap for a 64-bit process is just "breaking even" with 32-bit.A 32-bit process is constrained by the 32-bit virtual addressing limits. Heap memory is used for dynamic requirements of obiee software, thread stacks for each of the configured threads, and sometimes for shared libraries. 64-bit processes are not constrained in this way; extra heap space can be configured for safety against a query that might create a sudden requirement for excessive storage. If the storage is not available, this query might crash the whole server and disrupt existing users.There is no performance penalty on AIX for configuring more memory than required; extra memory can be configured for safety. If there are no other considerations, start with 8GB.Choosing a value for Thread Stack size:zero is the value documented to select an appropriate default for thread stack size. My preference is to change this to an absolute value, even if you intend to use the documented default; it provides better documentation and removes the "surprise" factor.There are two thread types that can be configured. GATEWAY is used by a thread pool to call a database client library to establish a DB connection.The default size is 256KB; many customers raise this to 512KB ( no performance penalty for over-configuring ). This value must be set to 1 MB if Teradata connections are used. SERVER threads are used to run queries. OBIEE uses recursive algorithms during the analysis of query structures which can consume significant thread stack storage. It's difficult to provide guidance on a value that depends on data and complexity. The general notion is to provide more space than you think you need, "double down" and increase the value if you run out, otherwise inspect the query to understand why it is too complex for the thread stack. There are protections built into the software to abort a single user query that is too complex, but the algorithms don't cover all situations.256 KB The default 32-bit stack size. Many customers increased this to 512KB on 32-bit. A 64-bit server is very likely to crash with this value; the stack contains mostly register values, which are twice as big.512 KB The documented 64-bit default. Some early releases of obiee didn't set this correctly, resulting in 256KB stacks.1 MB The recommended 64-bit setting. If your system only ever uses 512KB of stack space, there is no performance penalty for using 1MB stack size.2 MB Many large customers use this value for safety. No performance penalty.nqscheduler does not use the NQSConfig.INI file to set thread stack size.If this process crashes because the thread stack is too small, use this to set 2MB:export OBI_BACKGROUND_STACK_SIZE=2048 Shared libraries are not (shared) When application libraries are loaded at run-time, AIX makes a decision on whether to load the libraries in a "public" memory segment. If the filesystem library permissions do not have the "Read-Other" permission bit, AIX loads the library into private process memory with two significant side-effects:* The libraries reduce the heap storage available. Might be significant in 32-bit processes; irrelevant in 64-bit processes.* Library code is loaded into multiple real pages for execution; one copy for each process.Multiple execution images is a significant issue for both 32- and 64-bit processes.The "real memory pages" saved by using public memory segments is a minor concern. Today's machines typically have plenty of real memory.The real problem with private copies of libraries is that they consume processor cache blocks, which are limited. The same library instructions executing in different real pages will cause memory delays as the i-cache ( instruction cache 128KB blocks) are refreshed from real memory. Performance loss because instructions are delayed is something that is difficult to measure without access to low-level cache fault data. The machine just appears to be running slowly for no observable reason.This is an easy problem to detect, and an easy problem to correct.Detection: "genld -l" AIX command produces a list of the libraries used by each process and the AIX memory address where they are loaded.32-bit public segment is 13 ( "dxxxxxxx" ). private segments are 2-a.64-bit public segment is 9 ( "9xxxxxxxxxxxxxxx") ; private segment is 8.genld -l | grep -v ' d| 9' | sort +2provides a list of privately loaded libraries. Repair: chmod o+r <libname>AIX shared libraries will have a suffix of ".so" or ".a".Another technique is to change all libraries in a selected directory to repair those that might not be currently loaded. The usual directories that need repair are obiee code, httpd code and plugins, database client libraries and java.chmod o+r /shr/dir/*.a /shr/dir/*.so Configure your system for diagnosticsProduction systems shouldn't crash, and yet bad things happen to good software.If obiee software crashes and produces a core, you should configure your system for reliable transfer of the failing conditions to Oracle Tech Support. Here's what we need to be able to diagnose a core file from your system.* fullcore enabled. chdev -lsys0 -a fullcore=true* core naming enabled. chcore -n on -d* ulimit must not truncate core. see item 3.* pstack.sh is used to capture core documentation.* obidoc is used to capture current AIX configuration.* snapcore AIX utility captures core and libraries. Use the proper syntax. $ snapcore -r corename executable-fullpath /tmp/snapcore will contain the .pax.Z output file. It is compressed.* If cores are directed to a common directory, ensure obiee userid can write to the directory. ( chcore -p /cores -d ; chmod 777 /cores )The filesystem must have sufficient space to hold a crashing obiee application.Use: df -k Check the "Free" column ( not "% Used" ) 8388608 is 8GB. Disable Oracle Client Library signal handlingThe Oracle DB Client Library is frequently distributed with the sqlplus development kit.By default, the library enables a signal handler, which will document a call stack if the application crashes. The signal handler is not needed, and definitely disruptive to obiee diagnostics. It needs to be disabled. sqlnet.ora is typically located at: $ORACLE_HOME/network/admin/sqlnet.oraAdd this line at the top of the file: DIAG_SIGHANDLER_ENABLED=FALSE Disable async query in the RPD connection pool.This might be an obiee 10.1.3.4 issue only ( still checking )."async query" must be disabled in the connection pools.It was designed to enable query cancellation to a database, and turned out to have too many edge conditions in normal communication that produced random corruption of data and crashes. Please ensure it is turned off in the RPD. Check AIX error report (errpt).Errors external to obiee applications can trigger crashes. $ /bin/errpt -aHardware errors ( firmware, adapters, disks ) should be reported to IBM support.All application core files are recorded by AIX; the most recent ones are listed first. Reserved for something important to say.

Read the article
Ubuntu suddenly freezes

- by tapan

I've a strange problem with my ubuntu 10.04 installation. Whenever i boot into ubuntu the entire system freezes / hangs soon after (~ 2 mins in). This problem exists on my windows 7 installation too. However if i start World of Warcraft or Warcraft on windows it doesnt hang for the duration i'm playing the game. After i stop playing and exit the game my laptop hangs inside 2 mins. Here is when it gets weirder. If i disconnect the charger, the laptop doesn't hang. However when I start it in ubuntu recovery mode and drop to root shell and use the 'startx' command everything works perfectly. I cannot figure out what the problem is. i have an intel core2duo 2.2ghz processor, intel mobile 965 graphics, 2 GB RAM for more details here is the output of cat /proc/cpuinfo : processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz stepping : 11 cpu MHz : 2201.000 cache size : 4096 KB physical id : 0 siblings : 2 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida tpr_shadow vnmi flexpriority bogomips : 4389.80 clflush size : 64 power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz stepping : 11 cpu MHz : 2201.000 cache size : 4096 KB physical id : 0 siblings : 2 core id : 1 cpu cores : 2 apicid : 1 initial apicid : 1 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida tpr_shadow vnmi flexpriority bogomips : 4388.96 clflush size : 64 power management: here is the output of cat /proc/meminfo MemTotal: 2052440 kB MemFree: 55924 kB Buffers: 579352 kB Cached: 821752 kB SwapCached: 704 kB Active: 897124 kB Inactive: 1032256 kB Active(anon): 412140 kB Inactive(anon): 264804 kB Active(file): 484984 kB Inactive(file): 767452 kB Unevictable: 0 kB Mlocked: 0 kB HighTotal: 1178440 kB HighFree: 6012 kB LowTotal: 874000 kB LowFree: 49912 kB SwapTotal: 995988 kB SwapFree: 986616 kB Dirty: 8928 kB Writeback: 0 kB AnonPages: 527596 kB Mapped: 76536 kB Slab: 39480 kB SReclaimable: 21100 kB SUnreclaim: 18380 kB PageTables: 5672 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 2022208 kB Committed_AS: 1856400 kB VmallocTotal: 122880 kB VmallocUsed: 11928 kB VmallocChunk: 104644 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 4096 kB DirectMap4k: 16376 kB DirectMap4M: 892928 kB Also the kern.log doesn't show any errors. What I want to know is what might be the problem, how i could test for it and if there are any solutions I could try. Thanks :).

Read the article
Is the Core i5 Processor from Intel like the Celerons of yesteryear?

- by Chris

The title pretty much says it. I know that the Core i7's are Quad Core and Hyper-threaded (so 4 cores, and 8 logical), and the Core i5's are Quad Core as well but not Hyper-threaded, does this really make a difference? Or are the only people who are going to care are the ones who CPU intensive operations? I'm a developer, so I'm more concerned about hard drive speed most times than CPU speed. Any thoughts?

Read the article
Ubuntu/Windows suddenly freezes

- by tapan

I've a strange problem with my ubuntu 10.04 installation. Whenever i boot into ubuntu the entire system freezes / hangs soon after (~ 2 mins in). This problem exists on my windows 7 installation too. However if i start World of Warcraft or Warcraft on windows it doesnt hang for the duration i'm playing the game. After i stop playing and exit the game my laptop hangs inside 2 mins. Here is when it gets weirder. If i disconnect the charger, the laptop doesn't hang. However when I start it in ubuntu recovery mode and drop to root shell and use the 'startx' command everything works perfectly. I cannot figure out what the problem is. i have an intel core2duo 2.2ghz processor, intel mobile 965 graphics, 2 GB RAM for more details here is the output of cat /proc/cpuinfo : processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz stepping : 11 cpu MHz : 2201.000 cache size : 4096 KB physical id : 0 siblings : 2 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida tpr_shadow vnmi flexpriority bogomips : 4389.80 clflush size : 64 power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz stepping : 11 cpu MHz : 2201.000 cache size : 4096 KB physical id : 0 siblings : 2 core id : 1 cpu cores : 2 apicid : 1 initial apicid : 1 fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm lahf_lm ida tpr_shadow vnmi flexpriority bogomips : 4388.96 clflush size : 64 power management: here is the output of cat /proc/meminfo MemTotal: 2052440 kB MemFree: 55924 kB Buffers: 579352 kB Cached: 821752 kB SwapCached: 704 kB Active: 897124 kB Inactive: 1032256 kB Active(anon): 412140 kB Inactive(anon): 264804 kB Active(file): 484984 kB Inactive(file): 767452 kB Unevictable: 0 kB Mlocked: 0 kB HighTotal: 1178440 kB HighFree: 6012 kB LowTotal: 874000 kB LowFree: 49912 kB SwapTotal: 995988 kB SwapFree: 986616 kB Dirty: 8928 kB Writeback: 0 kB AnonPages: 527596 kB Mapped: 76536 kB Slab: 39480 kB SReclaimable: 21100 kB SUnreclaim: 18380 kB PageTables: 5672 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 2022208 kB Committed_AS: 1856400 kB VmallocTotal: 122880 kB VmallocUsed: 11928 kB VmallocChunk: 104644 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 4096 kB DirectMap4k: 16376 kB DirectMap4M: 892928 kB Also the kern.log doesn't show any errors. What I want to know is what might be the problem, how i could test for it and if there are any solutions I could try. Thanks :).

Read the article
Fan speed monitor Software for Macbook Pro Unibody on Windows

- by dtmunir

I've tried multiple temperature monitor and fan speed software on my Macbook Pro Unibody under Windows 7 64-bit RC. None of them can report the fan speed. Currently I'm using SpeedFan which reports the CPU temperature of each of the two cores, but is not able to detect or interface with the Fans. Has anyone had any luck with this?

Read the article
Is Prime95 a dual-core tool?

- by Ssvarc

Does Prime95 use multiple cores? (Is there a difference in this regard between running it from Windows, UBCD, or UBCD4Win?) If it doesn't, are there tools out there that will (preferably via a boot disc)? Thanks!

Read the article
MD3200 - 3 to 4x less throughput than MD1220. Am I missing something here?

- by Igor Polishchuk

I have two R710 servers with similar configuration. One in my office has MD1220 attached. Another one in the datacenter of my hosting services vendor has MD3200. I'm getting significantly worse throughput from MD3200 at my vendors setup. I'm mostly interested in sequential writes, and I'm getting these results in bonnie++ and dd tests: Seq. writes on MD1220 in my office: 1.1 GB/s - bonnie++, 1.3GB/s - dd Seq. writes on MD3200 at my vendor's: 240MB/s - bonnie++, 310MB/s - dd Unfortunately, I could not test the exactly the same configurations, but the two I have should be comparable. If anything, my good performing environment is cheaper than the bad performing. I expect at least similar throughput from these two setups. My vendor cannot really help me. Hopefully, somebody more familiar with the DAS performance can look at it and tell if I'm missing something here and my expectations are too high. To summarize, the question here is it reasonable to expect about 100MB/s of sequential write throughput per each couple of drives in RAID10 on MD3200? Is there any trick to enable such performance in MD3200 with dual controller as opposed to simple MD1220 with a single H800 adapter? More details about the configurations: A good one in my office: Dell R710 2CPU X5650 @ 2.67GHz 12 cores 96GB DDR3, OS: RHEL 5.5, kernel 2.6.18-194.26.1.el5 x86_64 20x300GB 2.5" SAS 10K in a single RAID10 1MB chunk size on MD1220 + Dell H800 I/O controller with 1GB cache in the host Not so good one at my vendor's: Dell R710 2CPU L5520 @ 2.27GHz 8 cores 144GB DDR3, OS: RHEL 5.5, kernel 2.6.18-194.11.4.el5 x86_64 20x146GB 2.5" SAS 15K in a single RAID10 512KB chunk size, Dell MD3200, 2 I/O controllers in array with 1GB cache each Additional information. I've also ran the same tests on the same vendor's host, but the storage was: two raids of 14x146GB 15K RPM drives RAID 10, striped together on the OS level on MD3000+MD1000. The performance was about 25% worse than on MD3200 despite having more drives. When I ran similar tests on the internal storage of my vendor's host (2x146GB 15K RPM drives RAID1, Perc 6i) I've got about 128MB/s seq. writes. Just two internal drives gave me about a half of 20 drives' throughput on MD3200. The random I/O performance of the MD3200 setup is ok, it gives me at least 1300 IOPS. I'm mostly have problems with sequentioal I/O throughput. Thank you for looking into it. Regards Igor

Read the article

Search Results

Search found 854 results on 35 pages for 'cores'.

Page 12/35 | < Previous Page | 8 9 10 11 12 13 14 15 16 17 18 19 | Next Page >

- by Klark

- by choppy

- by react

- by pancake

- by Reed

- by Stefan Hinker

- by Rob Farley

- by Laila

- by CliveT

- by Rodney Landrum

- by Don Quixote

- by kovan

- by JohnIdol

- by ogzylz

- by NoozNooz42

- by Drew

- by dtrosset

- by Vilx-

- by user554629

- by tapan

- by Chris

- by tapan

- by dtmunir

- by Ssvarc

- by Igor Polishchuk

< Previous Page | 8 9 10 11 12 13 14 15 16 17 18 19 | Next Page >