Search Results

Search found 23103 results on 925 pages for 'performance issues and ha'.

Page 117/925 | < Previous Page | 113 114 115 116 117 118 119 120 121 122 123 124 | Next Page >

Why database partitioning didn't work? Extract from thedailywtf.com

- by questzen

Original link. http://thedailywtf.com/Articles/The-Certified-DBA.aspx. Article summary: The DBA suggests an approach involving rigorous partitioning, 10 partitions per disk (3 actual disks and 3 raid). The stats show that the performance is non-optimal. Then the DBA suggests an alternative of 1 partition per disk (with more added disks). This also fails. The sys-admin then sets up a single disk, single partition and saves the day. The size of disks was not mentioned but given today,s typical disk sizes (of the order of 100 GB), the partitions ; would be huge, it surprises me that a single disk with all partitions outperformed. Initially I suspect that the data was segregated and hence faster reads. But how come the performance didn't degrade as time went by with all the inserts and updates happening? Saw this on reddit, but the explanation was by far spindle/platter centered. There was no mention in the article about this. Is there any other reason? I can only guess that the tables were using a incorrect hash distribution causing non-uniform allocation across disks (wrong partitioning); this would increase fetch times. Any thoughts?

Read the article
How to find out which process is hogging the linux server?

- by user1149518

We have a RHEL server. Today it suddenly became slow. Symptoms - It was responding slow to ping queries from other server. When I try to login using ssh, it was taking about 10 seconds to login. I was able to resolve the problem by doing some guess work. I killed one process which I thought was culprit. Which resolved the problem. Though I would like to know what's proper approach to detect the culprit in such kind of "slow server" situations. Le me know proper way to resolving such slowness issues and decting the process causing the slowness. These were the conditions when the server was slow - # vmstat 3 3 procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 1 1 176 6730868 285052 4899676 0 0 3 4 0 0 1 1 97 1 0 0 0 176 6751576 285064 4899704 0 0 0 115 15307 37171 1 1 96 3 0 0 0 176 6751948 285068 4899700 0 0 0 23 14813 39559 1 1 98 1 0 # top top - 16:38:18 up 150 days, 19:36, 64 users, load average: 1.68, 1.46, 1.44 Tasks: 1287 total, 2 running, 1284 sleeping, 1 stopped, 0 zombie Cpu(s): 1.3%us, 1.7%sy, 0.1%ni, 95.9%id, 0.7%wa, 0.0%hi, 0.2%si, 0.0%st Mem: 16620824k total, 9867124k used, 6753700k free, 287424k buffers Swap: 8193140k total, 176k used, 8192964k free, 4898996k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 26258 khk 34 19 130m 47m 7088 S 11.2 0.3 385:32.42 edm Though I would like to know what's proper approach to detect the culprit in such kind of "slow server" situations. Le me know proper way to resolving such slowness issues and decting the process causing the slowness.

Read the article
Disk fragmentation when dealing with many small files

- by Zorlack

On a daily basis we generate about 3.4 Million small jpeg files. We also delete about 3.4 Million 90 day old images. To date, we've dealt with this content by storing the images in a hierarchical manner. The heriarchy is something like this: /Year/Month/Day/Source/ This heirarchy allows us to effectively delete days worth of content across all sources. The files are stored on a Windows 2003 server connected to a 14 disk SATA RAID6. We've started having significant performance issues when writing-to and reading-from the disks. This may be due to the performance of the hardware, but I suspect that disk fragmentation may be a culprit at well. Some people have recommended storing the data in a database, but I've been hesitant to do this. An other thought was to use some sort of container file, like a VHD or something. Does anyone have any advice for mitigating this kind of fragmentation? Additional Info: The average file size is 8-14KB Format information from fsutil: NTFS Volume Serial Number : 0x2ae2ea00e2e9d05d Version : 3.1 Number Sectors : 0x00000001e847ffff Total Clusters : 0x000000003d08ffff Free Clusters : 0x000000001c1a4df0 Total Reserved : 0x0000000000000000 Bytes Per Sector : 512 Bytes Per Cluster : 4096 Bytes Per FileRecord Segment : 1024 Clusters Per FileRecord Segment : 0 Mft Valid Data Length : 0x000000208f020000 Mft Start Lcn : 0x00000000000c0000 Mft2 Start Lcn : 0x000000001e847fff Mft Zone Start : 0x0000000002163b20 Mft Zone End : 0x0000000007ad2000

Read the article
Windows XP to remote server 2008 R2 shares - awful response times

- by nick3216

I have a network infrastructure of Windows XP clients (a mix of XP and 64-bit XP), that are accessing a network share on a Windows 2008 R2 server. Whenever users type the address of a folder into the address bar of Windows Explorer it's as snappy at determining the contents of the current folder and presenting them to you in the address bar as if you're working on a local drive. But if you open one of the subfolders users get the animated red torch and 'Searching for items...' dialog, typically for 45 seconds. Similarly when using the open folder dialog to try and select a subfolder on this share it takes, on average, 45 seconds for the dialog to expand each node and show the subfolders of each node. Also, while the Explorer instance accsesing the network share is running slowly users notice that the performance of all other Explorer windows suffers. So while Explorer is searching for files on the network share they can't switch to another task and navigate around their local drive using Explorer because it's now as slow as a dead dog at accessing anything. Are there any settings we can change which will improve the performance accessing network shares?

Read the article
Linux HA cluster w/Xen, Heartbeat, Pacemaker. domU does not failover to secondary node

- by Kendall

I am having the followig problem with an OenSuSE + Heartbeat + Pacemaker + Xen HA cluster: when the node a Xen domU is running on is "dead" the Xen domU running on it is not restarted on the second node. The cluster is setup with two nodes, each running OpenSuSE-11.3, Heartbeat 3.0, and Pacemaker 1.0 in CRM mode. For storage I am using a LUN on an iSCSI SAN device; the LUN is formatted with OCFS2 and managed with LVM. The Xen domU has two logical volumes; one for root and the other for swap. I am using IPMI cards for STONITH devices, and a dedicated ethernet link for heartbeat communications. The ha.cf file is as follows: keepalive 1 deadtime 10 warntime 5 udpport 694 ucast eth1 auto_failback off node dhcp-166 node stage use_logd yes crm yes My resources look as follows: shocrm(live)configure# show node $id="5c1aa924-bba4-4f95-a367-6c9a58ac4a38" dhcp-166 node $id="cebc92eb-af24-4833-aaf0-672adf80b58e" stage primitive Xen-Util ocf:heartbeat:Xen \ meta target-role="Started" \ operations $id="Xen-Util-operations" \ op start interval="0" timeout="60" start-delay="0" \ op stop interval="0" timeout="120" \ params xmfile="/etc/xen/vm/xen-util" primitive my-stonith stonith:external/ipmi \ params hostname="dhcp-166" ipaddr="192.168.3.106" userid="ADMIN" passwd="xxx" \ op monitor interval="2m" timeout="60s" primitive my-stonith2 stonith:external/ipmi \ params hostname="stage" ipaddr="192.168.3.105" userid="ADMIN" passwd="xxx" \ op monitor interval="2m" timeout="60s" property $id="cib-bootstrap-options" \ dc-version="1.0.9-89bd754939df5150de7cd76835f98fe90851b677" \ cluster-infrastructure="Heartbeat" The Xen domU config file is as follows: name = "xen-util" bootloader = "/usr/lib/xen/boot/domUloader.py" #bootargs = "xvda1:/vmlinuz-xen,/initrd-xen" bootargs = "--entry=xvda1:/boot/vmlinuz-xen,/boot/initrd-xen" memory = 4096 disk = [ 'phy:vg_xen/xen-util-root,xvda1,w', 'phy:vg_xen/xen-util-swap,xvda2,w', ] root = "/dev/xvda1" vif = [ 'mac=00:16:3e:42:42:06' ] #vfb = [ 'type=vnc,vncunused=0,vnclisten=192.168.3.172' ] extra = "" Say domU "Xen-Util" is running on node "stage"; if "stage" goes down, "Xen-Util" does not restart on node "dhcp-166". It seems to want to try as an "xm list" will show it for a few seconds and if you "xm console xen-util" it will give a message like "copying /boot/kernel.gz from xvda1 to /var/lib/xen/tmp/kernel.a53gs for booting". However, it never gets past that, eventually gives up, and no longer appears in "xm list". Now, when node "stage" comes back online after being power cycled, it detects that "Xen-Util" isn't running, and starts it (on stage). I've tried starting "Xen-Util" on node "dhcp-166" without the cluster running, and it works fine. No problems. So, I know it works in that respect. Any ideas? Thanks!

Read the article
How to force two process to run on the same CPU?

- by kovan

Context: I'm programming a software system that consists of multiple processes. It is programmed in C++ under Linux. and they communicate among them using Linux shared memory. Usually, in software development, is in the final stage when the performance optimization is made. Here I came to a big problem. The software has high performance requirements, but in machines with 4 or 8 CPU cores (usually with more than one CPU), it was only able to use 3 cores, thus wasting 25% of the CPU power in the first ones, and more than 60% in the second ones. After many research, and having discarded mutex and lock contention, I found out that the time was being wasted on shmdt/shmat calls (detach and attach to shared memory segments). After some more research, I found out that these CPUs, which usually are AMD Opteron and Intel Xeon, use a memory system called NUMA, which basically means that each processor has its fast, "local memory", and accessing memory from other CPUs is expensive. After doing some tests, the problem seems to be that the software is designed so that, basically, any process can pass shared memory segments to any other process, and to any thread in them. This seems to kill performance, as process are constantly accessing memory from other processes. Question: Now, the question is, is there any way to force pairs of processes to execute in the same CPU?. I don't mean to force them to execute always in the same processor, as I don't care in which one they are executed, altough that would do the job. Ideally, there would be a way to tell the kernel: If you schedule this process in one processor, you must also schedule this "brother" process (which is the process with which it communicates through shared memory) in that same processor, so that performance is not penalized.

Read the article
Alternative or succesor to GDBM

- by Anon Guy

We a have a GDBM key-value database as the backend to a load-balanced web-facing application that is in implemented in C++. The data served by the application has grown very large, so our admins have moved the GDBM files from "local" storage (on the webservers, or very close by) to a large, shared, remote, NFS-mounted filesystem. This has affected performance. Our performance tests (in a test environment) show page load times jumping from hundreds of milliseconds (for local disk) to several seconds (over NFS, local network), and sometimes getting as high as 30 seconds. I believe a large part of the problem is that the application makes lots of random reads from the GDBM files, and that these are slow over NFS, and this will be even worse in production (where the front-end and back-end have even more network hardware between them) and as our database gets even bigger. While this is not a critical application, I would like to improve performance, and have some resources available, including the application developer time and Unix admins. My main constraint is time only have the resources for a few weeks. As I see it, my options are: Improve NFS performance by tuning parameters. My instinct is we wont get much out of this, but I have been wrong before, and I don't really know very much about NFS tuning. Move to a different key-value database, such as memcachedb or Tokyo Cabinet. Replace NFS with some other protocol (iSCSI has been mentioned, but i am not familiar with it). How should I approach this problem?

Read the article
Refactoring or Rewriting Monolithic PHP Spaghetti Codebase

- by nategood

I've inherited a really poorly designed PHP spaghetti code project. It's been gaining a good bit of traffic recently and is starting to have performance issues on top of the poor monolithic code base. Its maxing out performance on a chunky 16GB dedicated machine when it really shouldn't be. I'm planning on doing some performance tweaks right off the bat to help the performance issue, but this still won't really help the horrible code base. The team is small but expecting to grow very soon. I've read Joel's article on the troubles of doing a complete rewrite and see the concerns. But how bad does the code base have to be before you consider a rewrite? There is PHP handling logic interjected into what one would usually consider a "view". Even worse, in some places SQL statements are in these same files! The only real separation of presentation and logic are a few PHP scripts that serve as function libraries. These scripts do most of the ORM stuff... if you can even call it that. Trying to slowly refractor this seems like a nightmare. Open to your thoughts and opinions... however not interested in hearing, "Run away, Run away!".

Read the article
How can a single disk in a hardware SATA RAID-10 array bring the entire array to a screeching halt?

- by Stu Thompson

Prelude: I'm a code-monkey that's increasingly taken on SysAdmin duties for my small company. My code is our product, and increasingly we provide the same app as SaaS. About 18 months ago I moved our servers from a premium hosting centric vendor to a barebones rack pusher in a tier IV data center. (Literally across the street.) This ment doing much more ourselves--things like networking, storage and monitoring. As part the big move, to replace our leased direct attached storage from the hosting company, I built a 9TB two-node NAS based on SuperMicro chassises, 3ware RAID cards, Ubuntu 10.04, two dozen SATA disks, DRBD and . It's all lovingly documented in three blog posts: Building up & testing a new 9TB SATA RAID10 NFSv4 NAS: Part I, Part II and Part III. We also setup a Cacit monitoring system. Recently we've been adding more and more data points, like SMART values. I could not have done all this without the awesome boffins at ServerFault. It's been a fun and educational experience. My boss is happy (we saved bucket loads of $$$), our customers are happy (storage costs are down), I'm happy (fun, fun, fun). Until yesterday. Outage & Recovery: Some time after lunch we started getting reports of sluggish performance from our application, an on-demand streaming media CMS. About the same time our Cacti monitoring system sent a blizzard of emails. One of the more telling alerts was a graph of iostat await. Performance became so degraded that Pingdom began sending "server down" notifications. The overall load was moderate, there was not traffic spike. After logging onto the application servers, NFS clients of the NAS, I confirmed that just about everything was experiencing highly intermittent and insanely long IO wait times. And once I hopped onto the primary NAS node itself, the same delays were evident when trying to navigate the problem array's file system. Time to fail over, that went well. Within 20 minuts everything was confirmed to be back up and running perfectly. Post-Mortem: After any and all system failures I perform a post-mortem to determine the cause of the failure. First thing I did was ssh back into the box and start reviewing logs. It was offline, completely. Time for a trip to the data center. Hardware reset, backup an and running. In /var/syslog I found this scary looking entry: Nov 15 06:49:44 umbilo smartd[2827]: Device: /dev/twa0 [3ware_disk_00], 6 Currently unreadable (pending) sectors Nov 15 06:49:44 umbilo smartd[2827]: Device: /dev/twa0 [3ware_disk_07], SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 171 to 170 Nov 15 06:49:45 umbilo smartd[2827]: Device: /dev/twa0 [3ware_disk_10], 16 Currently unreadable (pending) sectors Nov 15 06:49:45 umbilo smartd[2827]: Device: /dev/twa0 [3ware_disk_10], 4 Offline uncorrectable sectors Nov 15 06:49:45 umbilo smartd[2827]: Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error Nov 15 06:49:45 umbilo smartd[2827]: # 1 Short offline Completed: read failure 90% 6576 3421766910 Nov 15 06:49:45 umbilo smartd[2827]: # 2 Short offline Completed: read failure 90% 6087 3421766910 Nov 15 06:49:45 umbilo smartd[2827]: # 3 Short offline Completed: read failure 10% 5901 656821791 Nov 15 06:49:45 umbilo smartd[2827]: # 4 Short offline Completed: read failure 90% 5818 651637856 Nov 15 06:49:45 umbilo smartd[2827]: So I went to check the Cacti graphs for the disks in the array. Here we see that, yes, disk 7 is slipping away just like syslog says it is. But we also see that disk 8's SMART Read Erros are fluctuating. There are no messages about disk 8 in syslog. More interesting is that the fluctuating values for disk 8 directly correlate to the high IO wait times! My interpretation is that: Disk 8 is experiencing an odd hardware fault that results in intermittent long operation times. Somehow this fault condition on the disk is locking up the entire array Maybe there is a more accurate or correct description, but the net result has been that the one disk is impacting the performance of the whole array. The Question(s) How can a single disk in a hardware SATA RAID-10 array bring the entire array to a screeching halt? Am I being naïve to think that the RAID card should have dealt with this? How can I prevent a single misbehaving disk from impacting the entire array? Am I missing something?

Read the article
protobuf-net: Issues deserializing DataMember fields in lieu of read-only property

- by Paul Smith

I'm having issues deserializing certain properties of ORM-generated entities using protobuf-net. I suspect something in the way the ORM manages serialization attributes on read-only properties (uses public backing fields with DataMember attributes & [de]serializes) those instead of the corresponding read-only property, which has an IgnoreDataMember attribute). Guid properties might have issues of their own, but the field vs. property thing is my working theory now. Here's a simplified example of the code. Say I have a class, Account with an AccountID read-only guid, and an AccountName read-write string. I serialize & immediately deserialize a clone. In this scenario I get one of two results (depending on the entity, haven't isolated the specific commonality yet). The deserialized clone either: ...has a different AccountID from the original, or ...throws an Incorrect wire-type deserializing Guid exception while deserializing. Here's example usage... Account acct = new Account() { AccountName = "Bob's Checking" }; Debug.WriteLine(acct.AccountID.ToString()); using (MemoryStream ms = new MemoryStream()) { ProtoBuf.Serializer.Serialize<Account>(ms, acct); Debug.WriteLine(Encoding.UTF8.GetString(ms.GetBuffer())); ms.Position = 0; Account clone = ProtoBuf.Serializer.Deserialize<Account>(ms); Debug.WriteLine(clone.AccountID.ToString()); } And here's an example ORM'd class (simplified; hopefully haven't removed the cause of the issue in the process). Uses a shell game to deserialize read-only properties by exposing the backing field ("can't write" essentially becomes "shouldn't write," but we can scan code for instances of assigning to these fields, so the hack works for our purposes): [DataContract()] [Serializable()] public partial class Account { public Account() { _accountID = Guid.NewGuid(); } [XmlAttribute("AccountID")] [DataMember(Name = "AccountID", Order = 0)] public Guid _accountID; /// <summary> /// A read-only property; XML, JSON and DataContract serializers all seem /// to correctly recognize the public backing field when deserializing: /// </summary> [IgnoreDataMember] [XmlIgnore] public Guid AccountID { get { return this._accountID; } } [IgnoreDataMember] protected string _accountName; [DataMember(Name = "AccountName", Order = 1)] [XmlAttribute] public string AccountName { get { return this._accountName; } set { this._accountName = value; } } } XML, JSON and DataContract serializers all seem to serialize / deserialize matching object graphs here, so this attribute arrangement apparently causes those serializers to correctly assign to the public backing field when deserializing. I've tried protobuf-net with lists vs. single instances, different prefix styles, etc., but always either get the 'incorrect wire type ... Guid' exception, or the Guid property (field) not deserializing correctly. So the specific questions are, is there a quick workaround for this, and/or is there an explanation for both of outcomes 1 & 2 above, and/or can protobuf-net somehow be corralled into behaving like WCF in cases like this (i.e. follow the same DataMember/IgnoreDataMember semantics)? We hope not to have to create a protobuf dependency directly in the entity layer; if that's the case, we'll probably create proxy DTO entities with all public properties having protobuf attributes. (This is a subjective issue I have with all declarative serialization models; it's a ubiquitous pattern, but IMO, "normal" should be to have objects and serialization contracts decoupled.) Thanks!

Read the article
Openfire and LDAP issues

- by clsmith

Thanks in advance for the help. Has anyone see this issue with openfire? Currently I use Openfire Fedora with Auth using windows 2003 and also use mysql for the database. When I bring up two clients and talk to each other the time is slow between messages. Sometimes it can take between 5-15 minutes for something sent to get to the person (this is with only two people on the openfire server). I ran a tcp dump using port 389 and see that the machine is running thousands of queries against ldap. When i plug it into wireshark I notice that it is transferring the entire contact list or checking on the status of the entire contact list ? When I run debug on openfire itself I am presented with only this small message in the log: 2010.06.08 07:01:17 LdapManager: Starting LDAP search... 2010.06.08 07:01:17 LdapManager: ... search finished 2010.06.08 07:01:17 LdapManager: Creating a DirContext in LdapManager.getContext()... 2010.06.08 07:01:17 LdapManager: Created hashtable with context values, attempting to create context... 2010.06.08 07:01:17 LdapManager: ... context created successfully, returning. 2010.06.08 07:01:17 LdapManager: Trying to find a groups's DN based on it's groupname. cn: Spark agents CLT, Base DN: OU="Hidden",DC="Hidden",DC="net"... 2010.06.08 07:01:17 LdapManager: Creating a DirContext in LdapManager.getContext()... 2010.06.08 07:01:17 LdapManager: Created hashtable with context values, attempting to create context... 2010.06.08 07:01:17 LdapManager: ... context created successfully, returning. 2010.06.08 07:01:17 LdapManager: Starting LDAP search... 2010.06.08 07:01:17 LdapManager: ... search finished 2010.06.08 07:01:17 LdapManager: Trying to find a groups's DN based on it's groupname. cn: Spark agents CLT, Base DN: OU="Hidden",DC="Hidden",DC="net"... 2010.06.08 07:01:17 LdapManager: Creating a DirContext in LdapManager.getContext()... 2010.06.08 07:01:17 LdapManager: Created hashtable with context values, attempting to create context... 2010.06.08 07:01:17 LdapManager: ... context created successfully, returning. 2010.06.08 07:01:17 LdapManager: Starting LDAP search... 2010.06.08 07:01:17 LdapManager: ... search finished I thought this was a configuration on my end and started to look into the cache settings on the openfire webpages. I tweaked the settings as recommend by the pages and still get the same issues. I doesnt seem to cache the contact list or this might be a feature never fixed or implemented. Has anyone gone through this before ? I have searched online and I see others have great experience with openfire with no issues like I have, or is it because noone checked the queries ? For the time being I created a new Domain Controller and moved openfire to that computer so it can run local queries. This seems to help reduce the speed alot, but when I run the server performance manager tool I see that with two people only using that openfire server I run 593.7 request per second. Thanks for your help, if I didnt provide enough data please let me know what you need and I can find it.

Read the article
Do classes which have a vector has a member have memory issues

- by user263766

I am just starting out C++, so sorry if this is a dumb question. I have a class Braid whose members are vectors. I have not written an assignment operator. When I do a lot of assignments to an object of the type Braid, I run into memory issues :- 0 0xb7daff89 in _int_malloc () from /lib/libc.so.6 #1 0xb7db2583 in malloc () from /lib/libc.so.6 #2 0xb7f8ac59 in operator new(unsigned int) () from /usr/lib/libstdc++.so.6 #3 0x0804d05e in __gnu_cxx::new_allocator<int>::allocate (this=0xbf800204, __n=1) at /usr/lib/gcc/i686-pc-linux-gnu/4.4.3/../../../../include/c++/4.4.3/ext/new_allocator.h:89 #4 0x0804cb0e in std::_Vector_base<int, std::allocator<int> >::_M_allocate (this=0xbf800204, __n=1) at /usr/lib/gcc/i686-pc-linux-gnu/4.4.3/../../../../include/c++/4.4.3/bits/stl_vector.h:140 #5 0x0804c086 in _Vector_base (this=0xbf800204, __n=1, __a=...) at /usr/lib/gcc/i686-pc-linux-gnu/4.4.3/../../../../include/c++/4.4.3/bits/stl_vector.h:113 #6 0x0804b4b7 in vector (this=0xbf800204, __x=...) at /usr/lib/gcc/i686-pc-linux-gnu/4.4.3/../../../../include/c++/4.4.3/bits/stl_vector.h:242 #7 0x0804b234 in Braid (this=0xbf800204) at braid.h:13 #8 0x080495ed in Braid::cycleBraid (this=0xbf8001b4) at braid.cpp:191 #9 0x080497c6 in Braid::score (this=0xbf800298, b=...) at braid.cpp:251 #10 0x08049c46 in Braid::evaluateMove (this=0xbf800468, move=1, pos=0, depth=2, b=...) I suspect that these memory issues are because the vectors are getting resized. What I want to know is whether objects of type Braid automatically expand when its members expand? he code I am writing is really long so I will post the section which is causing the problems. Here is the relevant section of the code :- class Braid { private : vector<int> braid; //Stores the braid. int strands; vector < vector<bool> > history; vector < vector<bool> > CM; public : Braid () : strands(0) {} Braid operator * (Braid); Braid* inputBraid(int,vector<int>); int printBraid(); int printBraid(vector<vector<int>::iterator>); vector<int>::size_type size() const; ..... ..... } Here is the function which causes the issue :- int Braid::evaluateMove(int move,int pos,int depth,Braid b) { int netscore = 0; Braid curr(*this); curr = curr.move(move,pos); netscore += curr.score(b); while(depth > 1) { netscore += curr.evaluateMove(1,0,depth,b); netscore += curr.evaluateMove(2,0,depth,b); for(int i = 0; i < braid.size();++i) { netscore += curr.evaluateMove(3,i,depth,b); netscore += curr.evaluateMove(4,i,depth,b); netscore += curr.evaluateMove(5,i,depth,b); curr = curr.cycleBraid(); netscore += curr.evaluateMove(6,0,depth,b); } --depth; } return netscore; }

Read the article
X11 performance problem after upgrading from Centos3 to Centos5 with an ATI Rage XL

- by Marcelo Santos

After upgrading a computer from Centos3 to Centos5 an application that does a lot of scrolling took a very high performance hit. top tells me that X is using a lot of CPU and that was not happening before. The machine has an ATI Rage XL with 8MB and X is using the ati driver as there is no proprietary ATI driver for this board on linux. The xorg.conf: Section "Device" Identifier "Videocard0" Driver "ati" EndSection Section "Screen" Identifier "Screen0" Device "Videocard0" DefaultDepth 24 SubSection "Display" Viewport 0 0 Depth 24 Modes "1024x768" "800x600" "640x480" EndSubSection EndSection Section "DRI" Group 0 Mode 0666 EndSection A similar machine that still has Centos3 installed is able to start DRI on the X server while this one is not, this is the Xorg.0.log for the Centos5 machine: drmOpenDevice: node name is /dev/dri/card0 drmOpenDevice: open result is -1, (No such device or address) drmOpenDevice: open result is -1, (No such device or address) drmOpenDevice: Open failed drmOpenDevice: node name is /dev/dri/card0 drmOpenDevice: open result is -1, (No such device or address) drmOpenDevice: open result is -1, (No such device or address) drmOpenDevice: Open failed [drm] failed to load kernel module "mach64" (II) ATI(0): [drm] drmOpen failed (EE) ATI(0): [dri] DRIScreenInit Failed (II) ATI(0): Largest offscreen areas (with overlaps): (II) ATI(0): 1024 x 1279 rectangle at 0,768 (II) ATI(0): 768 x 1280 rectangle at 0,768 (II) ATI(0): Using XFree86 Acceleration Architecture (XAA) Screen to screen bit blits Solid filled rectangles 8x8 mono pattern filled rectangles Indirect CPU to Screen color expansion Solid Lines Offscreen Pixmaps Setting up tile and stipple cache: 32 128x128 slots 10 256x256 slots (==) ATI(0): Backing store disabled (==) ATI(0): Silken mouse enabled (II) ATI(0): Direct rendering disabled (==) RandR enabled I also tried using EXA instead of XAA and setting: Option "AccelMethod" "XAA" Option "XAANoOffscreenPixmaps" "true" uname -a Linux sir5.erg.inpe.br 2.6.18-128.7.1.el5 #1 SMP Mon Aug 24 08:20:55 EDT 2009 i686 i686 i386 GNU/Linux rpm -qa | grep xorg-x11-server xorg-x11-server-utils-7.1-4.fc6 xorg-x11-server-sdk-1.1.1-48.52.el5 xorg-x11-server-Xvfb-1.1.1-48.52.el5 xorg-x11-server-Xnest-1.1.1-48.52.el5 xorg-x11-server-Xorg-1.1.1-48.52.el5 The drmOpenDevice error continues when using the suggested Option "AIGLX" "true".

Read the article
SAN with iSCSI-Target Performance Horrendous

- by Justin

We have a poor man's SAN setup in a 1U Ubuntu server running iSCSI-Target with two 300GB drives in RAID-0. We then are using it for block level storage for virtual machines. The hypervisor is connected to the SAN via gigabit on a dedicated VLAN and interfaces. We only have a single virtual machine setup and doing some benchmarks. If we run hdparm -t /dev/sda1 from the virtual machine, we get 'ok' performance of 75MB/s from the virtual machine to the SAN. Then we basically compile a package with ./configure and make. Things start ok, but then all the sudden the load average on the SAN grows to 7+ and things slow down to a crawl. When we SSH into the SAN and run top, sure the load is 7+, but the CPU usage is basically nothing, also the server has 1.5GB of memory available. When we kill the compile on the virtual machine, slowly the LOAD on the SAN goes back to sub 1 figures. What in the world is causing this? How can we diagnosis this further? Here are two screenshot from the SAN during high load. 1> Output of iotop on the SAN: 2> Output of top on the SAN:

Read the article
Bad Performance when SQL Server hits 99% Memory Usage

- by user15863

I've got a server that reports 8 GB of ram used up at 99%. When restart Sql Server, it drops down to about 5% usage, but gradually builds back up to 99% over about 2 hours. When I look at the sqlserver process, its reported as only using 100k ram, and generally never goes up or below that number by very much. In fact, if I add up all the processes in my TaskManager, it's barely scratching the surface of my total available (yet TaskManager still shows 99% memory usage with "All processes shown"). It appears that Sql Server has a huge memory leak going on but it's not reporting it. The server has ran fine for nearly two years, with this only starting to manifest itself in the last 3-4 weeks. Anyone seen this or have any insight into the problem? EDIT When the server hits 99%, performance goes down hill. All queries to the server, apps, etc. come to a crawl. Restarting the service makes things zippy again, until 2 hours has passed and the server hits 99% once again.

Read the article
Poor NFS Performance: OpenFiler

- by Safin09

Good Day Everyone, I have an issue with OpenFiler, a Linux-based operating that converts a computer system into a SAN/NAS appliance. Here is the problem. In my environment we have two Netapp Storevault 500 appliances that I normally perform backups to a NFS share. There are two backup cronjobs that use ghettoVCB to backup two groups of VM's. One group is a pool of 3 VMs. This takes 13 mins to complete. A second job that backups a pool of 5 VMs to a 2nd Storevault appliance which takes 2 hours. We then installed Openfiler on a old server that has 2 core Xeon processors. There is a software RAID 5 process in place. When performing the same backups to a NFS Openfiler share, the first backup job, which takes 13 mins, takes around 4 hours. The second backup job, which takes 2 hours, takes almost 10 hours to complete. This is unacceptable!!!! Especially considering the strain placed on the host ESX Server. I assumed that because of the software RAID 5, the overhead on the CPU explained the long backup times. I then installed Openfiler on a 2nd server, an IBM x306 machine which has a P4 Intel processor. This time no software RAID or any RAID at all. A single 750GB hard drive that contained the OS and the rest of the disk uses to backup VMs to a NFS share. I performed the first backup job of the pool of 3 VMs. This time the backup job took 1 and 1/2 hours to complete instead of 13 mins!!!!!!!!!! Is Openfiler simply poor at being an NFS Server!!!!!!!!!!!!! Has anyone else had these issues with Openfiler?

Read the article
Performance of Virtual machines on very low end machines

- by TheLQ

I am managing a few cheap servers as my user base isn't large enough to get much more powerful servers. I also don't have the money lying around to invest in a server to prepare for the larger user base. So I'm stuck with the old hardware I have. I am toying with the idea of virtualizing all the current OS's with most likely VMware vSphere Hypervisor (AKA ESXi) Xen (ESXi has too strict of an HCL, and my hardware is too old). Big reasons for doing so: Ability to upgrade and scale hardware rapidly - This is most likely what I'll be doing as I distribute services, get a bigger server, centralize (electricity bills are horrible), distribute, get a bigger server, etc... Manually doing this by reinstalling the entire OS would be a big pain Safety from me - I've made many rookie mistakes, like doing lots of risky work on a vital production server. With a VM I can just backup the state, work on my machine, test, and revert if necessary. No worries, and no OS reinstallation Safety from other factors - As I scale servers might go down, and a backup VM can instantly be started. Various other reasons. However the limiting factor here is hardware. And I mean very depressing hardware. The current server's run off of a Pentium 3 and 4, and have 512 MB and 768 MB RAM respectively (RAM can be upgraded soon however). Is the Virtualization layer small enough to run itself and a Linux OS effectively? Will performance be acceptable (50% CPU overhead for every operation isn't acceptable)? Does it leave enough RAM for the Linux OS? Is this even feasible?

Read the article
Hints on diagnosing performance issue in OpenBSD firewall

- by Tom

My OpenBSD 4.6 pf firewall has started having really bad performance in the past few weeks. I've isolated the firewall (as opposed to the WAN connection, switch, cable, etc.) as the problem, but need a hint on how to further diagnose or fix the problem. The facts: Normal setup is: DSL Modem - FW Ext. NIC - FW Int. NIC - Switch - Laptop Normal setup described above gives only 25 Kbps! Plugging the laptop straight from the DSL modem gives a 1 MBps connection (full speed, as advertised). Therefore, the DSL connection seems to be OK. Plugging the laptop directly into the firewall's internal NIC (bypassing the switch) also gives only 25 Kbps. Therefore, the switch does not seem to be a problem. I've replaced the ethernet cables, but it didn't help. Here's the weird thing. Reloading the ruleset (/sbin/pfctl -Fa -f /etc/pf.conf) causes the laptop's connection to go up to 1 Mbps (i.e. full speed) for a few minutes before it gradually degrades back down to 25Kbps again. Any ideas on what's wrong or how I could further diagnose the problem?

Read the article
Nginx + uWSGI + Django performance stuck on 100rq/s

- by dancio

I have configured Nginx with uWSGI and Django on CentOS 6 x64 (3.06GHz i3 540, 4GB), which should easily handle 2500 rq/s but when I run ab test ( ab -n 1000 -c 100 ) performance stops at 92 - 100 rq/s. Nginx: user nginx; worker_processes 2; events { worker_connections 2048; use epoll; } uWSGI: Emperor /usr/sbin/uwsgi --master --no-orphans --pythonpath /var/python --emperor /var/python/*/uwsgi.ini [uwsgi] socket = 127.0.0.2:3031 master = true processes = 5 env = DJANGO_SETTINGS_MODULE=x.settings env = HTTPS=on module = django.core.handlers.wsgi:WSGIHandler() disable-logging = true catch-exceptions = false post-buffering = 8192 harakiri = 30 harakiri-verbose = true vacuum = true listen = 500 optimize = 2 sysclt changes: # Increase TCP max buffer size setable using setsockopt() net.ipv4.tcp_rmem = 4096 87380 8388608 net.ipv4.tcp_wmem = 4096 87380 8388608 net.core.rmem_max = 8388608 net.core.wmem_max = 8388608 net.core.netdev_max_backlog = 5000 net.ipv4.tcp_max_syn_backlog = 5000 net.ipv4.tcp_window_scaling = 1 net.core.somaxconn = 2048 # Avoid a smurf attack net.ipv4.icmp_echo_ignore_broadcasts = 1 # Optimization for port usefor LBs # Increase system file descriptor limit fs.file-max = 65535 I did sysctl -p to enable changes. Idle server info: top - 13:34:58 up 102 days, 18:35, 1 user, load average: 0.00, 0.00, 0.00 Tasks: 118 total, 1 running, 117 sleeping, 0 stopped, 0 zombie Cpu(s): 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 3983068k total, 2125088k used, 1857980k free, 262528k buffers Swap: 2104504k total, 0k used, 2104504k free, 606996k cached free -m total used free shared buffers cached Mem: 3889 2075 1814 0 256 592 -/+ buffers/cache: 1226 2663 Swap: 2055 0 2055 **During the test:** top - 13:45:21 up 102 days, 18:46, 1 user, load average: 3.73, 1.51, 0.58 Tasks: 122 total, 8 running, 114 sleeping, 0 stopped, 0 zombie Cpu(s): 93.5%us, 5.2%sy, 0.0%ni, 0.2%id, 0.0%wa, 0.1%hi, 1.1%si, 0.0%st Mem: 3983068k total, 2127564k used, 1855504k free, 262580k buffers Swap: 2104504k total, 0k used, 2104504k free, 608760k cached free -m total used free shared buffers cached Mem: 3889 2125 1763 0 256 595 -/+ buffers/cache: 1274 2615 Swap: 2055 0 2055 iotop 30141 be/4 nginx 0.00 B/s 7.78 K/s 0.00 % 0.00 % nginx: wo~er process Where is the bottleneck ? Or what am I doing wrong ?

Read the article
X11 performance problem after upgrading from Centos3 to Centos5 with an ATI Rage XL

- by Marcelo Santos

After upgrading a computer from Centos3 to Centos5 an application that does a lot of scrolling took a very high performance hit. top tells me that X is using a lot of CPU and that was not happening before. The machine has an ATI Rage XL with 8MB and X is using the ati driver as there is no proprietary ATI driver for this board on linux. The xorg.conf: Section "Device" Identifier "Videocard0" Driver "ati" EndSection Section "Screen" Identifier "Screen0" Device "Videocard0" DefaultDepth 24 SubSection "Display" Viewport 0 0 Depth 24 Modes "1024x768" "800x600" "640x480" EndSubSection EndSection Section "DRI" Group 0 Mode 0666 EndSection A similar machine that still has Centos3 installed is able to start DRI on the X server while this one is not, this is the Xorg.0.log for the Centos5 machine: drmOpenDevice: node name is /dev/dri/card0 drmOpenDevice: open result is -1, (No such device or address) drmOpenDevice: open result is -1, (No such device or address) drmOpenDevice: Open failed drmOpenDevice: node name is /dev/dri/card0 drmOpenDevice: open result is -1, (No such device or address) drmOpenDevice: open result is -1, (No such device or address) drmOpenDevice: Open failed [drm] failed to load kernel module "mach64" (II) ATI(0): [drm] drmOpen failed (EE) ATI(0): [dri] DRIScreenInit Failed (II) ATI(0): Largest offscreen areas (with overlaps): (II) ATI(0): 1024 x 1279 rectangle at 0,768 (II) ATI(0): 768 x 1280 rectangle at 0,768 (II) ATI(0): Using XFree86 Acceleration Architecture (XAA) Screen to screen bit blits Solid filled rectangles 8x8 mono pattern filled rectangles Indirect CPU to Screen color expansion Solid Lines Offscreen Pixmaps Setting up tile and stipple cache: 32 128x128 slots 10 256x256 slots (==) ATI(0): Backing store disabled (==) ATI(0): Silken mouse enabled (II) ATI(0): Direct rendering disabled (==) RandR enabled I also tried using EXA instead of XAA and setting: Option "AccelMethod" "XAA" Option "XAANoOffscreenPixmaps" "true" uname -a Linux sir5.erg.inpe.br 2.6.18-128.7.1.el5 #1 SMP Mon Aug 24 08:20:55 EDT 2009 i686 i686 i386 GNU/Linux rpm -qa | grep xorg-x11-server xorg-x11-server-utils-7.1-4.fc6 xorg-x11-server-sdk-1.1.1-48.52.el5 xorg-x11-server-Xvfb-1.1.1-48.52.el5 xorg-x11-server-Xnest-1.1.1-48.52.el5 xorg-x11-server-Xorg-1.1.1-48.52.el5 The drmOpenDevice error continues when using the suggested Option "AIGLX" "true".

Read the article
Setting up MongoDB in High Performance Computing LSF linux cluster

- by Dnaiel

I am trying to run mongo in a LSF cluster computing environment where I have no admin control. Our sysadmin installed mongodb, but it is not running. Any ideas on what should I ask the server admin to do for it to run? Or if I could run it locally? [node1382]allelix> mongod --dbpath /users/dnaiel/ma/mongodb/ Tue Oct 2 21:33:48 [initandlisten] MongoDB starting : pid=22436 port=27017 dbpath=/seq/epigenome01/allelix/ma/mongodb/ 64-bit host=node1382 Tue Oct 2 21:33:48 [initandlisten] Tue Oct 2 21:33:48 [initandlisten] ** WARNING: You are running on a NUMA machine. Tue Oct 2 21:33:48 [initandlisten] ** We suggest launching mongod like this to avoid performance problems: Tue Oct 2 21:33:48 [initandlisten] ** numactl --interleave=all mongod [other options] Tue Oct 2 21:33:48 [initandlisten] Tue Oct 2 21:33:48 [initandlisten] db version v2.2.0, pdfile version 4.5 Tue Oct 2 21:33:48 [initandlisten] git version: f5e83eae9cfbec7fb7a071321928f00d1b0c5207 Tue Oct 2 21:33:48 [initandlisten] build info: Linux ip-10-2-29-40 2.6.21.7-2.ec2.v1.2.fc8xen #1 SMP Fri Nov 20 17:48:28 EST 2009 x86_64 BOOST_LIB_VERSION=1_49 Tue Oct 2 21:33:48 [initandlisten] options: { dbpath: "/users/dnaiel/ma/mongodb/" } Tue Oct 2 21:33:48 [initandlisten] journal dir=users/dnaiel/ma/mongodb/journal Tue Oct 2 21:33:48 [initandlisten] recover begin Tue Oct 2 21:33:48 [initandlisten] info no lsn file in journal/ directory Tue Oct 2 21:33:48 [initandlisten] recover lsn: 0 Tue Oct 2 21:33:48 [initandlisten] recover /seq/epigenome01/allelix/ma/mongodb/journal/j._0 Tue Oct 2 21:33:48 [initandlisten] recover cleaning up Tue Oct 2 21:33:48 [initandlisten] removeJournalFiles Tue Oct 2 21:33:48 [initandlisten] recover done Tue Oct 2 21:33:48 [websvr] admin web console waiting for connections on port 28017 Tue Oct 2 21:33:48 [initandlisten] waiting for connections on port 27017 It basically waits forever and cannot start mongodb. These servers are not webservers but they do have network access, it's a cloud computing LSF environment system. Any advice would be welcome, thanks in advance.

Read the article
VMWare converter performance

- by bellocarico

Hello, I have a question about my test lab. It's more to understand the concept more than apply this into production: I have an ESXi with few VMs linux/windows configured and I'd like to use VMWare converter to create backups. To speedup the process I decided to create a Windows VM on the same ESXi host where I've installed Windows 7 and VMWare Converter. The Host has a gigabit card but it's currently connected to a 100Mb FD port. Windows 7 sees a 1gb card connected. When I do the backup using VMWare converter I specify the host IP as source and destination, so I thought the copy could be faster then use my laptop across the network. Well, to cut a long sotry short: I get dreadful performance (4Mb/sec). I'm a buit confused on this because despite the fact that the host is running 100Mb communication between VMs and hosts shouldn't (correct me if I'm wrong) have any limitation instead. I did tweak windows 7 to optimise network performane but I got just a little improvement. i still need 4 hours to back up a 50Gb (thin) VM. Additionally I wanted to ask: Would jumbo frame help in this? I know that jumbo frame have to be supported end to end, and the network switch where the host is currently connected doesn't support this, but I was wondering: 1) Does ESXi host support jumbo frames at all? 2) Can I enable it somehow? 3) If I do so, I guess bulk transfert between VMs and host would improve, but would this affect the communication going through the real switch as this doesn't do jumbo? Thanks for reading

Read the article
GitHub: Are there external tools for managing issues list vs. project backlog

- by DXM

Recently I posted one of my the projects1 on GitHub and as I was exploring capabilities of the site, I noticed they have a rather decent issue tracking section. I want to use that section as a) other people can report bugs if they'd like and b) other people can see which bugs I'm aware of. However, as others have noted, issues list cannot be prioritized in order to create a project backlog. For now my backlog has been a text file, but I'd like to be able to have it integrated so the same information isn't maintained in different places. Having a fully ordered list, which is something we also practice at work, has been very useful as I can open one file, start with line 1 and fire off 2 or 3 items in one sitting without having to go back to a full issues/stories bucket. GitHub doesn't offer this. What GitHub does offer is a very nice and clean API so issues can easily be exported into anything else. I've searched to see if there are other websites (like Trello) that integrate with GitHub issues, but did not find anything. Does anyone know of such a product, service or offline tool? Those that use GitHub, what is your experience in managing backlog? I kinda hate the idea of manually managing two disconnected lists like some people seem to be doing with Wiki project pages. 1 - are shameless plugs allowed no this site? Searched but didn't find a definite answer. If it's bad practice, STOP and don't read further As a developer I got sick and tired of navigating to same set of folders 30 times a day, so I wrote a little, auto-collapsible utility that gets stuck to the desktop and allows easy access to the folders you constantly use.

Read the article
apache performance timing out

- by Mike

Im running a webserver where I'm hosting about 6-7 websites. Most of these websites get their content from MySQL which is hosted on the same server. Traffic average per day is about 500-600 unique visitors, about 150K hits per week. But for some reason sometimes websites send a timeout, OR sometimes websites dont load all images. I know that I should perhaps separate static content from dynamic content, but for now I think that's not a possibility. I would appreciate any suggestions on how could I improve the performance of apache, so it doesn't keep timing out. Server is running on Sempron LE 1300; 2.3GHz,512K Cache 2GB RAM 10Mbps/1Mbps Services: MySQL, ProFTPD, Apache. Private + Shared = RAM used Program ---------------------------------------------------- 1.2 MiB + 54.0 KiB = 1.2 MiB proftpd 4.1 MiB + 23.0 KiB = 4.1 MiB munin-node 20.8 MiB + 120.5 KiB = 20.9 MiB mysqld 47.3 MiB + 9.9 MiB = 57.3 MiB apache2 (22) top: Mem: 2075356k total, 1826196k used, 249160k free, Timeout 35 KeepAlive On MaxKeepAliveRequests 300 KeepAliveTimeout 5 <IfModule mpm_prefork_module> StartServers 10 MinSpareServers 20 MaxSpareServers 20 MaxClients 60 MaxRequestsPerChild 1000 </IfModule> <IfModule mpm_worker_module> StartServers 2 MaxClients 150 MinSpareThreads 25 MaxSpareThreads 75 ThreadsPerChild 25 MaxRequestsPerChild 0 </IfModule>

Read the article
Bad Mumble control channel performance in KVM guest

- by aef

I'm running a Mumble server (Murmur) on a Debian Wheezy Beta 4 KVM guest which runs on a Debian Wheezy Beta 4 KVM hypervisor. The guest machines are attached to a bridge device on the hypervisor system through Virtio network interfaces. The Hypervisor is attached to a 100Mbit/s uplink and does IP-routing between the guest machines and the remaining Internet. In this setup we're experiencing a clearly recognizable lag between double-clicking a channel in the client and the channel joining action happening. This happens with a lot of different clients between 1.2.3 and 1.2.4 on Linux and Windows systems. Voice quality and latency seems to be completely unaffected by this. Most of the times the client's information dialog states a 16ms latency for both the voice and control channel. The deviation for the control channels mostly is a lot higher than the one of the voice channels. In some situations the control channel is displayed with a 100ms ping and about 1000 deviation. It seems the TCP performance is a problem here. We had no problems on an earlier setup which was in principle quite like the new one. We used Debian Lenny based Xen hypervisor and a soft-virtualised guest machine instead and an earlier version of the Mumble 1.2.3 series. The current murmurd --version says: 1.2.3-349-g315b5f5-2.1

Read the article

Search Results

Search found 23103 results on 925 pages for 'performance issues and ha'.

Page 117/925 | < Previous Page | 113 114 115 116 117 118 119 120 121 122 123 124 | Next Page >

- by questzen

- by user1149518

- by Zorlack

- by nick3216

- by Kendall

- by kovan

- by Anon Guy

- by nategood

- by Stu Thompson

- by Paul Smith

- by clsmith

- by user263766

- by Marcelo Santos

- by Justin

- by user15863

- by Safin09

- by TheLQ

- by Tom

- by dancio

- by Marcelo Santos

- by Dnaiel

- by bellocarico

- by DXM

- by Mike

- by aef

< Previous Page | 113 114 115 116 117 118 119 120 121 122 123 124 | Next Page >