Search Results

Search found 8698 results on 348 pages for 'opn days'.

Page 331/348 | < Previous Page | 327 328 329 330 331 332 333 334 335 336 337 338  | Next Page >

  • Linux server apache httpd processes take i/o wait to close to 100% and lock down server

    - by user3682065
    For about 5 days now, and seemingly out of the blue, my linux server has started locking up from time to time. The pattern is always the same as far as I can tell from top and iotop commands around the time it starts happening: One or more httpd processes (usually one) hang and start using up 100% of CPU power, the %wa goes close to 100% and in the iotop I see several httpd processes with 99.99% in the IO column. I'm also running an SVN server on this machine through apache and the one way that I've been consistently able to reproduce this is to do an SVN commit of new files or an SVN update from the repository on this server (I am the only one using this SVN repository). This will always reproduce this scenario successfully, but until very recently I had no problems at all checking in/out of SVN. But sometimes it just happens for no detectable reason at all it seems. So it seems like there is some issue with my Apache that leads it to have processes use up a lot of read/write upon certain triggers. I was wondering if anyone could help me uncover that issue. EDIT: OK now it's happening again: This is top: [root@server ~]# top top - 10:56:54 up 2:59, 5 users, load average: 171.46, 70.35, 27.01 Tasks: 328 total, 2 running, 326 sleeping, 0 stopped, 0 zombie Cpu(s): 1.9%us, 2.0%sy, 0.0%ni, 0.0%id, 96.1%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 2021144k total, 1968192k used, 52952k free, 2500k buffers Swap: 4194288k total, 2938584k used, 1255704k free, 39008k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 10390 apache 20 0 2774m 936m 6200 D 2.0 47.4 1:52.27 httpd 2149 root 20 0 927m 13m 1040 S 0.7 0.7 1:50.46 namecoind 11 root 20 0 0 0 0 R 0.3 0.0 0:30.10 events/0 23 root 20 0 0 0 0 S 0.3 0.0 0:17.88 kblockd/1 2049 root 20 0 382m 4932 2880 D 0.3 0.2 0:03.67 httpd 2144 root 20 0 1702m 69m 1164 S 0.3 3.5 5:19.68 bitcoind 6325 root 20 0 15164 1100 656 R 0.3 0.1 0:11.09 top 10311 apache 20 0 387m 9496 7320 D 0.3 0.5 0:01.89 httpd 10313 apache 20 0 391m 10m 7364 D 0.3 0.5 0:02.40 httpd 10466 apache 20 0 399m 12m 7392 D 0.3 0.7 0:02.41 httpd 10599 apache 20 0 391m 9324 7340 D 0.3 0.5 0:00.15 httpd 10628 apache 20 0 384m 7620 4052 D 0.3 0.4 0:00.01 httpd 10633 apache 20 0 384m 7048 3504 D 0.3 0.3 0:00.01 httpd 10634 apache 20 0 384m 8012 4048 D 0.3 0.4 0:00.02 httpd 10638 apache 20 0 400m 22m 9.8m D 0.3 1.1 0:01.93 httpd 10640 apache 20 0 385m 8288 4028 D 0.3 0.4 0:00.03 httpd 10641 apache 20 0 401m 21m 6376 D 0.3 1.1 0:01.45 httpd 10759 apache 20 0 385m 8816 3480 D 0.3 0.4 0:01.45 httpd 10773 apache 20 0 384m 8044 3464 D 0.3 0.4 0:00.02 httpd This is an iotop snapshot: Total DISK READ: 5.93 M/s | Total DISK WRITE: 0.00 B/s TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND 10732 be/4 apache 3.76 K/s 0.00 B/s 0.00 % 58.48 % httpd 876 be/3 root 0.00 B/s 52.68 K/s 0.00 % 52.98 % [jbd2/dm-1-8] 10906 be/4 root 124.17 K/s 0.00 B/s 0.00 % 23.03 % sh -c [ -x /usr/local/psa/admin/sbin/backupmng ] && /usr/local/psa/admin/sbin/backupmng >/dev/null 2>&1 2156 be/4 root 206.94 K/s 0.00 B/s 0.00 % 21.15 % bitcoind 10904 be/4 mysql 0.00 B/s 0.00 B/s 0.00 % 18.94 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock 10773 be/4 apache 7.53 K/s 0.00 B/s 0.00 % 14.77 % httpd 10641 be/4 apache 15.05 K/s 0.00 B/s 0.00 % 11.57 % httpd 10399 be/4 apache 1057.29 K/s 0.00 B/s 43.16 % 10.56 % httpd 10682 be/4 sw-cp-se 158.03 K/s 0.00 B/s 0.00 % 7.45 % sw-engine-cgi -c /usr/local/psa/admin/conf/php.ini -d auto_prepend_file=auth.php3 -u psaadm 10774 be/4 apache 3.76 K/s 0.00 B/s 0.00 % 6.53 % httpd 10624 be/4 apache 0.00 B/s 0.00 B/s 0.00 % 5.53 % httpd 10356 be/4 apache 899.26 K/s 0.00 B/s 35.52 % 4.01 % httpd 10795 be/4 apache 0.00 B/s 0.00 B/s 0.00 % 3.93 % httpd 10804 be/4 apache 7.53 K/s 0.00 B/s 0.00 % 3.08 % httpd 4379 be/4 root 2.89 M/s 0.00 B/s 99.99 % 0.00 % namecoind 10619 be/4 apache 462.80 K/s 0.00 B/s 7.80 % 0.00 % httpd 10636 be/4 apache 3.76 K/s 0.00 B/s 0.00 % 0.00 % httpd 10716 be/4 mysql 105.35 K/s 0.00 B/s 5.92 % 0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock 1988 be/4 root 18.81 K/s 0.00 B/s 0.00 % 0.00 % spamd_full.sock I also ran lsof -p for pid 10390 which was way up top under the top command and this is the bottom line where I can sort of see what request this was and it says CLOSE_WAIT: httpd 10390 apache 34u IPv6 315879 0t0 TCP default-domain.com:https->crawl-66-249-65-91.googlebot.com:42907 (CLOSE_WAIT) I'm still not sure what exactly is causing this all to happen though? I killed that service but %wa and load average remain high, I also stopped mysqld and other services. It really only goes down once I stop httpd altogether, and even then I can't start it without finding remaining hanging httpd processes via "netstat -tulpn", killing those or doing "killall -9 httpd" and after waiting a while for it to cycle through all those then doing /etc/init.d/httpd start

    Read the article

  • Can't deploy rails 4 app on Bluehost with Passenger 4 and nginx

    - by user2205763
    I am at Bluehost (dedicated server) trying to run a rails 4 app. I asked to have my server re-imaged, specifying that I do not want rails, ruby, or passenger install automatically as I wanted to install the latest versions myself using a version manager (Bluehost by default offers rails 2.3, ruby 1.8, and passenger 3, which won't work with my app). I installed ruby 1.9.3p327, rails 4.0.0, and passenger 4.0.5. I can verify this by typing, "ruby -v", "rails -v", and "passenger -v" (also "gem -v"). I made sure to install these not as root, so that I don't get a 403 forbidden error when trying to deploy the app. I installed passenger by typing "gem install passenger", and then installed the nginx passenger module (into "/nginx") with "passenger-install-nginx-module". I am trying to run my rails app on a subdomain, http://development.thegraduate.hk (I am using the subdomain to show my client progress on the website). In bluehost I created that subdomain, and had it point to "public_html/thegraduate". I then created a symlink from "rails_apps/thegraduate/public" to "public_html/thegraduate" and verified that the symlink exists. The problem is: when I go to http://development.thegraduate.hk, I get a directory listing. There is nothing resembling a rails app. I have not added a .htaccess file to /rails_apps/thegraduate/public, as that was never specified in the installation of passenger. It was meant to be 'install and go'. When I type "passenger-memory-status", I get 3 things: - Apache processes (7) - Nginx processes (0) - Passenger processes (0) So it appears that nginx and passenger are not running, and I can't figure out how to get it to run (I'm not looking to have it run as a standalone server). Here is my nginx.conf file (/nginx/conf/nginx.conf): #user nobody; worker_processes 1; #error_log logs/error.log; #error_log logs/error.log notice; #error_log logs/error.log info; #pid logs/nginx.pid; events { worker_connections 1024; } http { passenger_root /home/thegrad4/.rbenv/versions/1.9.3-p327/lib/ruby/gems/1.9.1/gems/passenger-4.0.5; passenger_ruby /home/thegrad4/.rbenv/versions/1.9.3-p327/bin/ruby; include mime.types; default_type application/octet-stream; #log_format main '$remote_addr - $remote_user [$time_local] "$request" ' # '$status $body_bytes_sent "$http_referer" ' # '"$http_user_agent" "$http_x_forwarded_for"'; #access_log logs/access.log main; sendfile on; #tcp_nopush on; #keepalive_timeout 0; keepalive_timeout 65; #gzip on; server { listen 80; server_name development.thegraduate.hk; root ~/rails_apps/thegraduate/public; passenger_enabled on; #charset koi8-r; #access_log logs/host.access.log main; location / { root html; index index.html index.htm; } #error_page 404 /404.html; # redirect server error pages to the static page /50x.html # error_page 500 502 503 504 /50x.html; location = /50x.html { root html; } # proxy the PHP scripts to Apache listening on 127.0.0.1:80 # #location ~ \.php$ { # proxy_pass http://127.0.0.1; #} # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000 # #location ~ \.php$ { # root html; # fastcgi_pass 127.0.0.1:9000; # fastcgi_index index.php; # fastcgi_param SCRIPT_FILENAME /scripts$fastcgi_script_name; # include fastcgi_params; #} # deny access to .htaccess files, if Apache's document root # concurs with nginx's one # #location ~ /\.ht { # deny all; #} } # another virtual host using mix of IP-, name-, and port-based configuration # #server { # listen 8000; # listen somename:8080; # server_name somename alias another.alias; # location / { # root html; # index index.html index.htm; # } #} # HTTPS server # #server { # listen 443; # server_name localhost; # ssl on; # ssl_certificate cert.pem; # ssl_certificate_key cert.key; # ssl_session_timeout 5m; # ssl_protocols SSLv2 SSLv3 TLSv1; # ssl_ciphers HIGH:!aNULL:!MD5; # ssl_prefer_server_ciphers on; # location / { # root html; # index index.html index.htm; # } #} } I don't get any errors, just the directory listing. I've tried to be as detailed as possible. Any help on this issue would be greatly appreciated as I've been stumped for the past 3 days. Scouring the web has not helped as my issue seems to be specific to me. Thanks so much. If there are any potential details I forgot to specify, just ask. ** ADDITIONAL INFORMATION ** Going to development.thegraduate.hk/public/ will correctly display the index.html page in /rails_apps/thegraduate/public. However, changing root in the routes.rb file to "root = 'home#index'" does nothing.

    Read the article

  • rm on a directory with millions of files

    - by BMDan
    Background: physical server, about two years old, 7200-RPM SATA drives connected to a 3Ware RAID card, ext3 FS mounted noatime and data=ordered, not under crazy load, kernel 2.6.18-92.1.22.el5, uptime 545 days. Directory doesn't contain any subdirectories, just millions of small (~100 byte) files, with some larger (a few KB) ones. We have a server that has gone a bit cuckoo over the course of the last few months, but we only noticed it the other day when it started being unable to write to a directory due to it containing too many files. Specifically, it started throwing this error in /var/log/messages: ext3_dx_add_entry: Directory index full! The disk in question has plenty of inodes remaining: Filesystem Inodes IUsed IFree IUse% Mounted on /dev/sda3 60719104 3465660 57253444 6% / So I'm guessing that means we hit the limit of how many entries can be in the directory file itself. No idea how many files that would be, but it can't be more, as you can see, than three million or so. Not that that's good, mind you! But that's part one of my question: exactly what is that upper limit? Is it tunable? Before I get yelled at--I want to tune it down; this enormous directory caused all sorts of issues. Anyway, we tracked down the issue in the code that was generating all of those files, and we've corrected it. Now I'm stuck with deleting the directory. A few options here: rm -rf (dir)I tried this first. I gave up and killed it after it had run for a day and a half without any discernible impact. unlink(2) on the directory: Definitely worth consideration, but the question is whether it'd be faster to delete the files inside the directory via fsck than to delete via unlink(2). That is, one way or another, I've got to mark those inodes as unused. This assumes, of course, that I can tell fsck not to drop entries to the files in /lost+found; otherwise, I've just moved my problem. In addition to all the other concerns, after reading about this a bit more, it turns out I'd probably have to call some internal FS functions, as none of the unlink(2) variants I can find would allow me to just blithely delete a directory with entries in it. Pooh. while [ true ]; do ls -Uf | head -n 10000 | xargs rm -f 2/dev/null; done ) This is actually the shortened version; the real one I'm running, which just adds some progress-reporting and a clean stop when we run out of files to delete, is: export i=0; time ( while [ true ]; do ls -Uf | head -n 3 | grep -qF '.png' || break; ls -Uf | head -n 10000 | xargs rm -f 2/dev/null; export i=$(($i+10000)); echo "$i..."; done ) This seems to be working rather well. As I write this, it's deleted 260,000 files in the past thirty minutes or so. Now, for the questions: As mentioned above, is the per-directory entry limit tunable? Why did it take "real 7m9.561s / user 0m0.001s / sys 0m0.001s" to delete a single file which was the first one in the list returned by "ls -U", and it took perhaps ten minutes to delete the first 10,000 entries with the command in #3, but now it's hauling along quite happily? For that matter, it deleted 260,000 in about thirty minutes, but it's now taken another fifteen minutes to delete 60,000 more. Why the huge swings in speed? Is there a better way to do this sort of thing? Not store millions of files in a directory; I know that's silly, and it wouldn't have happened on my watch. Googling the problem and looking through SF and SO offers a lot of variations on "find" that obviously have the wrong idea; it's not going to be faster than my approach for several self-evident reasons. But does the delete-via-fsck idea have any legs? Or something else entirely? I'm eager to hear out-of-the-box (or inside-the-not-well-known-box) thinking. Thanks for reading the small novel; feel free to ask questions and I'll be sure to respond. I'll also update the question with the final number of files and how long the delete script ran once I have that. Final script output!: 2970000... 2980000... 2990000... 3000000... 3010000... real 253m59.331s user 0m6.061s sys 5m4.019s So, three million files deleted in a bit over four hours.

    Read the article

  • Open source CMS for a university department

    - by Greg Kuperberg
    I realize that this type of question gets asked over and over again. Nonetheless, I want to ask a more specific version. I'm in a university math department. Long ago our sysadmins (or just one at the time) switched to a web content management system. At the time, Zope looked like an informed choice. We have used Zope for years, but at least in my opinion, it has always been a controversial decision. At the time I didn't understand why it was so important to have a web CMS. Now I see that it certainly is important, but I don't know that it should be Zope. The good (even necessary) features of Zope for us are: It's free and Linux-based. It is a true CMS and not something else (e.g. wiki or blog) It lets you write HTML and scripts. What I really don't like about Zope is that the outcome of using it is all-or-nothing in a lot of ways. At least in convenient use, it ends up dividing the enterprise into superusers who can do everything, and lusers who can't do anything (except write their own home pages in plain HTML). It has a huge user manual, which end users won't have time to read. Somehow with the access permissions, the simple thing to do is to let a few admins access all of the source and data and that's it. Since this is a math department, the user base varies from real novices to people who understand computers reasonably well. But as it stands, any change that involves Zope has to go through the sysadmins. When the sysadmins are in a hurry, sometimes they will also just add plain HTML pages to the web site instead of using the Zope framework. It doesn't help matters that Zope is fairly disk-intensive and fairly hype-intensive. Not to dwell on Zope too much, but I am wondering what is the right web CMS for a mixed user base of terminal novices, quick studies, and experienced users. Some users might want intermediate permissions, e.g. read permission but not write permission, or permission to change some subset of the pages or see some subset of the database tables. Also it should be Linux-based and open source and a little bit scalable, and of course widely used and well-supported is a good idea. I might guess that the answer is Drupal just because that was the general answer before, but I don't know if it is the right type of CMS for this purpose. (But note that Python is a relatively popular language in a math department, among other reasons because Sage is based on Python.) I can see that I didn't completely define the question and that people are guessing what type of site it is. It is the UC Davis Math Department. The main structure of the site is not suitable for a wiki and it is also not the same thing as a course environment like Moodle. Rather, the site is mostly structured as a generic medium-small enterprise. Some components of the site could be a wiki, Moodle, LaTeX plugin, Request Tracker, etc. However, the main issue is not these components. The main issue is that it would be better to decentralize management of the site. Right now, everything that is in the Zope CMS has to go through the sysadmins. Every other user in the department either has to put in a request to them, or write their own web pages with no help from Zope. There are two main reasons for this: (1) Other people in the department don't have time to read the Zope manual. (2) It's a hassle to set up intermediate permissions in Zope. However, there are other people in the department who know how to write computer programs and use markup languages. I wouldn't want a solution that assumes that users either can't be trusted with much more than drag-and-drop, or that they are IT professionals who sleep with documentation manuals. I'm wondering if Plone/Zope still has this quality, since certainly Zope by itself does. But I also wonder sometimes if common-sense flexibility is unfashionable these days, and that things in general have be either mindlessly easy or incredibly powerful.

    Read the article

  • Cross-platform distributed fault-tolerant (disconnected operation/local cache) filesystem

    - by Adrian Frühwirth
    We are facing a design "challenge" where we are required to set up a storage solution with the following properties: What we need HA a scalable storage backend offline/disconnected operation on the client to account for network outages cross-platform access client-side access from certainly Windows (probably XP upwards), possibly Linux backend integrates with AD/LDAP (permission management (user/group management, ...)) should work reasonably well over slow WAN-links Another problem is that we don't really know all possible use cases here, if people need to be able to have concurrent access to shared files or if they will only be accessing their own files, so a possible solution needs to account for concurrent access and how conflict management would look in this case from a user's point of view. This two years old blog posts sums up the impression that I have been getting during the last couple of days of research, that there are lots of current übercool projects implementing (non-Windows) clustered petabyte-capable blob-storage solutions but that there is none that supports disconnected operation nicely and natively, but I am hoping that we have missed an obvious solution. What we have tried OpenAFS We figured that we want a distributed network filesystem with a local cache and tested OpenAFS (which, as the only currently "stable" DFS supporting disconnected operation, seemed the way to go) for a week but there are several problems with it: it's a real pain to set up there are no official RHEL/CentOS packages the package of the current stable version 1.6.5.1 from elrepo randomly kernel panics on fresh installs, this is an absolute no-go Windows support (including the required Kerberos packages) is mystical. The current client for the 1.6 branch does not run on Windows 8, the current client for the 1.7 does but it just randomly crashes. After that experience we didn't even bother testing on XP and Windows 7. Suffice to say, we couldn't get it working and the whole setup has been so unstable and complicated to setup that it's just not an option for production. Samba + Unison Since OpenAFS was a complete disaster and no other DFS seems to support disconnected operation we went for a simpler idea that would sync files against a Samba server using Unison. This has the following advantages: Samba integrates with ADs; it's a pain but can be done. Samba solves the problem of remotely accessing the storage from Windows but introduces another SPOF and does not address the actual storage problem. We could probably stick any clustered FS underneath Samba, but that means we need a HA Samba setup on top of that to maintain HA which probably adds a lot of additional complexity. I vaguely remember trying to implement redundancy with Samba before and I could not silently failover between servers. Even when online, you are working with local files which will result in more conflicts than would be necessary if a local cache were only touched when disconnected It's not automatic. We cannot expect users to manually sync their files using the (functional, but not-so-pretty) GTK GUI on a regular basis. I attempted to semi-automate the process using the Windows task scheduler, but you cannot really do it in a satisfactory way. On top of that, the way Unison works makes syncing against Samba a costly operation, so I am afraid that it just doesn't scale very well or even at all. Samba + "Offline Files" After that we became a little desparate and gave Windows "offline files" a chance. We figured that having something that is inbuilt into the OS would reduce administrative efforts, helps blaming someone else when it's not working properly and should just work since people have been using this for years. Right? Wrong. We really wanted it to work, but it just doesn't. 30 minutes of copying files around and unplugging network cables/disabling network interfaces left us with (silent! there is only a tiny notification in Windows explorer in the statusbar, which doesn't even open Sync Center if you click on it!) undeletable files on the server (!) and conflicts that should not even be conflicts. In the end, we had one successful sync of a tiny text file, everything else just exploded horribly. Beyond that, there are other problems: Microsoft admits that "offline files" in Windows XP cannot cope with "large files" and therefore does not cache/sync them at all which would mean those files become unavailable if the connection drop In Windows 7 the feature is only available in the Professional/Ultimate/Enterprise editions. Summary Unless there is another fault-tolerant DFS that supports Windows natively I assume that stacking a HA Samba cluster on top of something like GlusterFS/Lustre/whatnot is the only option, but I hope that I am wrong here. How do other companies allow fault-tolerant network access to redundant storage in a heterogeneous environment with Windows?

    Read the article

  • KVM Guest installed from console. But how to get to the guest's console?

    - by badbishop
    I'm trying to install a fully virtualized guest (Fedora 14 x86_64) on KVM (RHEL 6), using command-line only (both hypervisor and guest). It goes without errors, and without a tangible result . I'd like to know how to do a text-only installation. So, here's what I've done: # virt-install \ --name=FE --ram=756 --vcpus=1 \ --file=/var/lib/libvirt/images/FE.img --network bridge:br0 \ --nographics --os-type=linux \ --extra-args='console=tty0' -v \ --cdrom=/media/usb/Fedora-14-x86_64-Live-Desktop.iso Starting install... Creating domain... | 0 B 00:00 Connected to domain FE Escape character is ^] ÿ Now what? As I understand after googling for a couple of days, I should see the guest's output from the text installation, but nothing happens. virt-viewer cannot connect to it, kindly suggesting that I explore all the options by adding --help (which I did). If I reconnect with virsh, I see this: Domain installation still in progress. You can reconnect to the console to complete the installation process. [root@v ~] # virsh console FEConnected to domain FE Escape character is ^] This shows that VM is running # virsh list Id Name State ---------------------------------- 8 FE running Qemu log: LC_ALL=C PATH=/sbin:/usr/sbin:/bin:/usr/bin /usr/libexec/qemu-kvm -S -M rhel6.0.0 -enable-kvm -m 756 -smp 1,sockets=1,cores=1,threads=1 -name FE -uuid 6989d008-7c89-424c-d2d3-f41235c57a18 -nographic -nodefconfig -nodefaults -chardev socket,id=monitor,path=/var/lib/libvirt/qemu/FE.monitor,server,nowait -mon chardev=monitor,mode=control -rtc base=utc -no-reboot -boot d -drive file=/var/lib/libvirt/images/FE.img,if=none,id=drive-ide0-0-0,format=raw,cache=none -device ide-drive,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive file=/media/usb/Fedora-14-x86_64-Live-Desktop.iso,if=none,media=cdrom,id=drive-ide0-1-0,readonly=on,format=raw -device ide-drive,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=20,id=hostnet0 -device rtl8139,netdev=hostnet0,id=net0,mac=52:54:00:0a:65:8d,bus=pci.0,addr=0x2 -chardev pty,id=serial0 -device isa-serial,chardev=serial0 -usb -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x3 char device redirected to /dev/pts/1 Output of /etc/libvirt/qemu/FE.xml # cat /etc/libvirt/qemu/FE.xml <domain type='kvm'> <name>FE</name> <uuid>6989d008-7c89-424c-d2d3-f41235c57a18</uuid> <memory>774144</memory> <currentMemory>774144</currentMemory> <vcpu>1</vcpu> <os> <type arch='x86_64' machine='rhel6.0.0'>hvm</type> <boot dev='hd'/> </os> <features> <acpi/> <apic/> <pae/> </features> <clock offset='utc'/> <on_poweroff>destroy</on_poweroff> <on_reboot>restart</on_reboot> <on_crash>restart</on_crash> <devices> <emulator>/usr/libexec/qemu-kvm</emulator> <disk type='file' device='disk'> <driver name='qemu' type='raw' cache='none'/> <source file='/var/lib/libvirt/images/FE.img'/> <target dev='hda' bus='ide'/> <address type='drive' controller='0' bus='0' unit='0'/> </disk> <disk type='block' device='cdrom'> <driver name='qemu' type='raw'/> <target dev='hdc' bus='ide'/> <readonly/> <address type='drive' controller='0' bus='1' unit='0'/> </disk> <controller type='ide' index='0'> <address type='pci' domain='0x0000' bus='0x00' slot='0x01' function='0x1'/> </controller> <interface type='bridge'> <mac address='52:54:00:0a:65:8d'/> <source bridge='br0'/> <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/> </interface> <serial type='pty'> <target port='0'/> </serial> <console type='pty'> <target port='0'/> </console> <memballoon model='virtio'> <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/> </memballoon> </devices> </domain> I'm obviously missing something that many others don't, but what is it? Thanx in advance!

    Read the article

  • Bind: dns not 'spreaded'

    - by realtebo
    I've elfoip.net with bind $ whois elfoip.net | grep 'Name Server' Name Server: NS.ELFOIP.NET I need elfoip.net be able to serve third levels domain, like mickymouse.elfoip.net, etc... Yes, I'm trying to create an other useless dyndns clone. i've added some third level as A RR. Eg: executing this from the server itself $ dig @localhost mattinauno.elfoip.net ;; ANSWER SECTION: mattinauno.elfoip.net. 60 IN A 192.81.221.113 I was expecting in one or two days, from my pc i can digit in browser mattinauno.elfoip.net and get page a 192.81.221.113 But this is not happening. Are there any prerequisites to satisfy to allow dns of my isp to be able to forward dns resolution of *.elfoip.net to MY dns ? (Or to ask to him and then cache ?) TTL of zone is set a 5m I've not AllowQuey directive, is it necessary for other dns to cache from mine ? I've cheched the zone with bind utility named-checkzone but no error detected. How to diagnose why other dns doesn't take in account RR from mine ? from my home pc dig @ns.elfoip.net mattinauno.elfoip.net ;; ANSWER SECTION: mattinauno.elfoip.net. 60 IN A 192.81.221.113 ;; AUTHORITY SECTION: elfoip.net. 300 IN NS ns.elfoip.net. but dig @8.8.8.8 mattinauno.elfoip.net give no answers Whole zone file: note I've used nsupdate, so this file has been re-edited and re-formatted from this utility ! root@mirko:/var/named# cat elfoip.net.db $ORIGIN . $TTL 300 ; 5 minutes elfoip.net IN SOA ns.elfoip.net. hostmaster.elfoip.net. ( 2013062314 ; serial 3600 ; refresh (1 hour) 600 ; retry (10 minutes) 86400 ; expire (1 day) 60 ; minimum (1 minute) ) NS ns.elfoip.net. A 109.168.99.6 $ORIGIN elfoip.net. $TTL 60 ; 1 minute google A 173.194.35.56 maiscai A 192.81.221.113 mattinadue A 192.81.221.113 mattinauno A 192.81.221.113 $TTL 300 ; 5 minutes ns A 109.168.99.6 $TTL 60 ; 1 minute prova A 208.67.222.222 prova2 A 13.23.34.45 A 13.23.34.46 www CNAME elfoip.net. EDIT: added named.conf.local zone "elfoip.net" { type master; // file "/etc/bind/elfoip.net.db"; file "/var/named/elfoip.net.db"; allow-update { key elfoip.net ; }; }; EDIT: I've no setup list-on directive *EDIT Added a TCPDUMP after [email protected] wwww.elfoip.net from a machine which uses my company internal dns, who allow recursive query. root@mirko:~# tcpdump -i eth0 'port 53' tcpdump: verbose output suppressed, use -v or -vv for full protocol decode listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes 11:57:23.293611 IP host9-210-static.22-87-b.business.telecomitalia.it.45958 > mirko.elfoip.net.domain: 61337+ A? www.elfoip.net. (32) 11:57:23.294114 IP mirko.elfoip.net.domain > host9-210-static.22-87-b.business.telecomitalia.it.45958: 61337* 2/1/1 CNAME elfoip.net., A 109.168.99.6 (95) 11:57:23.294554 IP mirko.elfoip.net.59571 > google-public-dns-a.google.com.domain: 45851+ PTR? 9.210.22.87.in-addr.arpa. (42) 11:57:23.330444 IP google-public-dns-a.google.com.domain > mirko.elfoip.net.59571: 45851 1/0/0 PTR host9-210-static.22-87-b.business.telecomitalia.it. (106) 11:57:23.331181 IP mirko.elfoip.net.44171 > google-public-dns-a.google.com.domain: 33339+ PTR? 8.8.8.8.in-addr.arpa. (38) 11:57:23.439405 IP google-public-dns-a.google.com.domain > mirko.elfoip.net.44171: 33339 1/0/0 PTR google-public-dns-a.google.com. (82) 11:57:31.350654 IP host9-210-static.22-87-b.business.telecomitalia.it.30108 > mirko.elfoip.net.domain: 38269 [1au] A? ns.elfoip.net. (42) 11:57:31.351117 IP mirko.elfoip.net.domain > host9-210-static.22-87-b.business.telecomitalia.it.30108: 38269* 1/1/1 A 109.168.99.6 (72) If i dig @8.8.8.8 www.elfoip.net, NOTHING happens in dump log !

    Read the article

  • snmptt not translating traps, even with translate_log_trap_oid=1

    - by mbrownnyc
    I am having some trouble configuring snmptt to properly translate snmp traps. The following is a problem: /etc/snmp/snmptt.conf reflects: EVENT fgFmTrapIfChange .1.3.6.1.4.1.12356.101.6.0.1004 "Status Events" Critical FORMAT $* EXEC /usr/local/nagios/libexec/eventhandlers/submit_check_result $r "snmp_traps" 2 "$O: $+*" "$*" SDESC Trap is sent to the managing FortiManager if an interface IP is changed Variables: 1: fnSysSerial 2: ifName 3: fgManIfIp 4: fgManIfMask EDESC when a trap is received, /var/log/messages reflects: Sep 6 12:07:32 SNMPMANAGERHOST snmptrapd[15385]: 2012-09-06 12:07:32 <UNKNOWN> [UDP: [192.168.100.2]:162->[192.168.100.31]]: #012.1.3.6.1.2.1.1.3.0 = Timeticks: (707253943) 81 days, 20:35:39.43 #011.1.3.6.1.6.3.1.1.4.1.0 = OID: .1.3.6.1.4.1.12356.101.6.0.1004 #011.1.3.6.1.4.1.12356.100.1.1.1.0 = STRING: FGTNNNNNNNNN #011.1.3.6.1.2.1.31.1.1.1.1.10 = STRING: internal4 #011.1.3.6.1.4.1.12356.101.6.2.1.0 = IpAddress: 192.168.65.100 #011.1.3.6.1.4.1.12356.101.6.2.2.0 = IpAddress: 255.255.255.0 Sep 6 12:07:37 SNMPMANAGERHOST icinga: EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT; 192.168.100.2; snmp_traps; 2; enterprises.12356.101.6.0.1004: enterprises.12356.100.1.1.1.0:FGTNNNNNNNNN ifName.10:internal4 enterprises.12356.101.6.2.1.0:192.168.65.100 enterprises.12356.101.6.2.2.0:255.255.255.0 Since the icinga entry reflects the EXEC, it's obvious there is no translations occurring by snmptt. I have verified that translate_log_trap_oid and net_snmp_perl_enable is enabled in snmptt.ini When using --debug=1 to start snmptt, I see the following in the --debugfile: ********** Net-SNMP version 5.05 Perl module enabled ********** The main NET-SNMP version is reported as NET-SNMP version: 5.5. What else can be done to verify that snmptt is configured properly to translate traps? I have run snmptt-net-snmp-test to verify whatever net-snmp-perl version I have installed properly supports translations. The output indicates it does. /root/snmptt_1.3/snmptt-net-snmp-test --best_guess=2 SNMPTT Net-SNMP Test v1.0 (c) 2003 Alex Burger http://snmptt.sourceforge.net MIBS:RFC1213-MIB best_guess: 2 Testing translateObj ******************** Testing: .1.3.6.1.2.1.1.1, long_names=disabled, include_module=disabled Test passed. Result: sysDescr Testing: .1.3.6.1.2.1.1.1, long_names=disabled, include_module=enabled Test passed. Result: RFC1213-MIB::sysDescr Testing: .1.3.6.1.2.1.1.1, long_names=enabled, include_module=disabled Test passed. Result: .iso.org.dod.internet.mgmt.mib-2.system.sysDescr Testing: .1.3.6.1.2.1.1.1, long_names=enabled, include_module=enabled Test passed. Result: RFC1213-MIB::.iso.org.dod.internet.mgmt.mib-2.system.sysDescr Testing: sysDescr, long_names=disabled, include_module=disabled Test passed. Result: .1.3.6.1.2.1.1.1 Testing: RFC1213-MIB::sysDescr, long_names=disabled, include_module=disabled Test passed. Result: .1.3.6.1.2.1.1.1 Testing: system.sysDescr, long_names=disabled, include_module=disabled Test passed. Result: .1.3.6.1.2.1.1.1 Testing: RFC1213-MIB::system.sysDescr, long_names=disabled, include_module=disabled Test passed. Result: .1.3.6.1.2.1.1.1 Testing: .iso.org.dod.internet.mgmt.mib-2.system.sysDescr, long_names=disabled, include_module=disabled Test passed. Result: .1.3.6.1.2.1.1.1 Testing getType *************** Testing: .1.3.6.1.2.1.4.1 Test passed. Result: INTEGER Testing: ipForwarding Test passed. Result: INTEGER Testing Description ******************* Test passed. Result: ------------------------------------------------- The indication of whether this entity is acting as an IP gateway in respect to the forwarding of datagrams received by, but not addressed to, this entity. IP gateways forward datagrams. IP hosts do not (except those source-routed via the host). Note that for some managed nodes, this object may take on only a subset of the values possible. Accordingly, it is appropriate for an agent to return a `badValue' response if a management station attempts to change this object to an inappropriate value. ------------------------------------------------- I have manually gone through the MIB with the definition that's not resolving, and verified that it is properly linking back to the proper resolved definition. It is: FORTINET-FORTIGATE-MIB.txt contains: fgFmTrapIfChange NOTIFICATION-TYPE OBJECTS { fnSysSerial, ifName, fgManIfIp, fgManIfMask } STATUS current DESCRIPTION "Trap is sent to the managing FortiManager if an interface IP is changed" ::= { fgFmTrapPrefix 1004 } fgFmTrapPrefix OBJECT IDENTIFIER ::= { fgMgmt 0 } fgMgmt OBJECT IDENTIFIER ::= { fnFortiGateMib 6 } fnFortiGateMib ::= { fortinet 101 } IMPORTS FnBoolState, FnIndex, fnAdminEntry, fnSysSerial, fortinet FROM FORTINET-CORE-MIB fortinet MODULE-IDENTITY ::= { enterprises 12356 } LOOKS GOOD!!!!! 1.3.6.1.4.1.12356.101.6.0.1004 I've exhausted all the documentation and even posted fruitlessly in the snmptt-users mailing list. I can not prove it is the MIB. Why would snmptt fail to translate traps? Thanks, Matt

    Read the article

  • Server being used to send spam mail. How do I investigate?

    - by split_account
    Problem I think my server is being used to send spam with sendmail, I'm getting a lot of mail being queued up that I don't recognize and my mail.log and syslog are getting huge. I've shutdown sendmail, so none of it is getting out but I can't work out where it's coming from. Investigation so far: I've tried the solution in the blog post below and also shown in this thread. It's meant to add a header from wherever the mail is being added and log all all mail to file, so I changed the following lines in my php.ini file: mail.add_x_header = On mail.log = /var/log/phpmail.log But nothing is appearing in the phpmail.log. I used the command here to investigate cron jobs for all users, but nothing is out of place. The only cron being run is the cron for the website. And then I brought up all php files which had been modified in the last 30 days but none of them look suspicious. What else can I do to find where this is coming from? Mail.log reports Turned sendmail back on for second. Here is a small sample of the reports: Jun 10 14:40:30 ubuntu12 sm-mta[13684]: s5ADeQdp013684: from=<>, size=2431, class=0, nrcpts=1, msgid=<[email protected]>, proto=ESMTP, daemon=MTA-v4, relay=localhost [127.0.0.1] Jun 10 14:40:30 ubuntu12 sm-msp-queue[13674]: s5ACK1cC011438: to=www-data, delay=01:20:14, xdelay=00:00:00, mailer=relay, pri=571670, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (s5ADeQdp013684 Message accepted for delivery) Jun 10 14:40:30 ubuntu12 sm-mta[13719]: s5ADeQdp013684: to=<[email protected]>, delay=00:00:00, xdelay=00:00:00, mailer=local, pri=32683, dsn=2.0.0, stat=Sent Jun 10 14:40:30 ubuntu12 sm-mta[13684]: s5ADeQdr013684: from=<[email protected]>, size=677, class=0, nrcpts=1, msgid=<[email protected]>, proto=ESMTP, daemon=MTA-v4, relay=localhost [127.0.0.1] Jun 10 14:40:31 ubuntu12 sm-msp-queue[13674]: s5AC0gpi011125: to=www-data, ctladdr=www-data (33/33), delay=01:39:49, xdelay=00:00:01, mailer=relay, pri=660349, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (s5ADeQdr013684 Message accepted for delivery) Jun 10 14:40:31 ubuntu12 sm-mta[13721]: s5ADeQdr013684: to=<[email protected]>, ctladdr=<[email protected]> (33/33), delay=00:00:01, xdelay=00:00:00, mailer=local, pri=30946, dsn=2.0.0, stat=Sent Jun 10 14:40:31 ubuntu12 sm-mta[13684]: s5ADeQdt013684: from=<[email protected]>, size=677, class=0, nrcpts=1, msgid=<[email protected]>, proto=ESMTP, daemon=MTA-v4, relay=localhost [127.0.0.1] Jun 10 14:40:31 ubuntu12 sm-msp-queue[13674]: s5ACF2Nq011240: to=www-data, ctladdr=www-data (33/33), delay=01:25:29, xdelay=00:00:00, mailer=relay, pri=660349, relay=[127.0.0.1] [127.0.0.1], dsn=2.0.0, stat=Sent (s5ADeQdt013684 Message accepted for delivery) Jun 10 14:40:31 ubuntu12 sm-mta[13723]: s5ADeQdt013684: to=<[email protected]>, ctladdr=<[email protected]> (33/33), delay=00:00:00, xdelay=00:00:00, mailer=local, pri=30946, dsn=2.0.0, stat=Sent Ju Further Investigation Spotted 4 spam accounts registered in the past day, which is suspicious however all have normal user privileges. There are no contact forms on the site, there are a number of forms and they take either filtered text input or plain text input. Mail is still being queued up having switched the website to maintenance mode, which blocks out everyone but the admin. Ok more investigation, it looks like the email is being send by my websites cron which runs every 5 minutes. However there are no cron jobs I've set-up which run more than once an hour and show on the website log so presumably someone has managed to edit my cron somehow. Copy of email: V8 T1402410301 K1402411201 N2 P120349 I253/1/369045 MDeferred: Connection refused by [127.0.0.1] Fbs $_www-data@localhost ${daemon_flags}c u Swww-data [email protected] MDeferred: Connection refused by [127.0.0.1] C:www-data rRFC822; [email protected] RPFD:www-data H?P?Return-Path: <?g> H??Received: (from www-data@localhost) by ubuntu12.pcsmarthosting.co.uk (8.14.4/8.14.4/Submit) id s5AEP13T015507 for www-data; Tue, 10 Jun 2014 15:25:01 +0100 H?D?Date: Tue, 10 Jun 2014 15:25:01 +0100 H?x?Full-Name: CronDaemon H?M?Message-Id: <[email protected]> H??From: root (Cron Daemon) H??To: www-data H??Subject: Cron <www-data@ubuntu12> /usr/bin/drush @main elysia-cron H??Content-Type: text/plain; charset=ANSI_X3.4-1968 H??X-Cron-Env: <PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin> H??X-Cron-Env: <COLUMNS=80> H??X-Cron-Env: <SHELL=/bin/sh> H??X-Cron-Env: <HOME=/var/www> H??X-Cron-Env: <LOGNAME=www-data>

    Read the article

  • Bypass cache for mobile user agents, VARNISH+NGINX+W3CACHE

    - by Mike McGhee
    Right now I'm running Wordpress w/ W3 Cache on nginx with varnish front end. I'm trying to use the WP Touch Pro plugin for wordpress to display mobile sites, but it is not working. Shows the desktop theme still. I've put the mobile user agents in the rejected user agents box in w3 cache. Here is the nginx config w3 cache spit out: BEGIN W3TC Page Cache cache location ~ /wp-content/w3tc/pgcache.*html$ { expires modified 3600s; add_header X-Powered-By "W3 Total Cache/0.9.2.4"; add_header Vary "Accept-Encoding, Cookie"; } location ~ /wp-content/w3tc/pgcache.*gzip$ { gzip off; types {} default_type text/html; expires modified 3600s; add_header X-Powered-By "W3 Total Cache/0.9.2.4"; add_header Vary "Accept-Encoding, Cookie"; add_header Content-Encoding gzip; } # END W3TC Page Cache cache # BEGIN W3TC Browser Cache gzip on; gzip_types text/css application/x-javascript text/x-component text/richtext image/svg+xml text/plain text/xsd text/xsl text/xml image/x-icon; location ~ \.(css|js|htc)$ { expires 31536000s; add_header X-Powered-By "W3 Total Cache/0.9.2.4"; } location ~ \.(html|htm|rtf|rtx|svg|svgz|txt|xsd|xsl|xml)$ { expires 3600s; add_header X-Powered-By "W3 Total Cache/0.9.2.4"; } location ~ \.(asf|asx|wax|wmv|wmx|avi|bmp|class|divx|doc|docx|eot|exe|gif|gz|gzip|ico|jpg|jpeg|jpe|mdb|mid|midi|mov|qt|mp3|m4a|mp4|m4v|mpeg|mpg|mpe|mpp|otf|odb|odc|odf|odg|odp|ods|odt|ogg|pdf|png|pot|pps|ppt|pptx|ra|ram|svg|svgz|swf|tar|tif|tiff|ttf|ttc|wav|wma|wri|xla|xls|xlsx|xlt|xlw|zip)$ { expires 31536000s; add_header X-Powered-By "W3 Total Cache/0.9.2.4"; } # END W3TC Browser Cache # BEGIN W3TC Minify core rewrite ^/wp-content/w3tc/min/w3tc_rewrite_test$ /wp-content/w3tc/min/index.php?w3tc_rewrite_test=1 last; rewrite ^/wp-content/w3tc/min/(.+\.(css|js))$ /wp-content/w3tc/min/index.php?file=$1 last; # END W3TC Minify core # BEGIN W3TC Page Cache core rewrite ^(.*\/)?w3tc_rewrite_test$ $1?w3tc_rewrite_test=1 last; set $w3tc_rewrite 1; if ($request_method = POST) { set $w3tc_rewrite 0; } if ($query_string != "") { set $w3tc_rewrite 0; } if ($http_host != "mysite.com") { set $w3tc_rewrite 0; } set $w3tc_rewrite2 1; if ($request_uri !~ \/$) { set $w3tc_rewrite2 0; } if ($request_uri ~* "(sitemap(_index)?\.xml(\.gz)?|[a-z0-9_\-]+-sitemap([0-9]+)?\.xml(\.gz)?)") { set $w3tc_rewrite2 1; } if ($w3tc_rewrite2 != 1) { set $w3tc_rewrite 0; } set $w3tc_rewrite3 1; if ($request_uri ~* "(\/wp-admin\/|\/xmlrpc.php|\/wp-(app|cron|login|register|mail)\.php|\/feed\/|wp-.*\.php|index\.php)") { set $w3tc_rewrite3 0; } if ($request_uri ~* "(wp\-comments\-popup\.php|wp\-links\-opml\.php|wp\-locations\.php)") { set $w3tc_rewrite3 1; } if ($w3tc_rewrite3 != 1) { set $w3tc_rewrite 0; } if ($http_cookie ~* "(comment_author|wp\-postpass|wordpress_\[a\-f0\-9\]\+|wordpress_logged_in)") { set $w3tc_rewrite 0; } if ($http_user_agent ~* "(W3\ Total\ Cache/0\.9\.2\.4|iphone|ipod|ipad|aspen|incognito|webmate|android|dream|cupcake|froyo|blackberry9500|blackberry9520|blackberry9530|blackberry9550|blackberry\ 9800|blackberry\ 9780|webos|s8000|bada)") { set $w3tc_rewrite 0; } set $w3tc_ua ""; if ($http_user_agent ~* "(acer\ s100|android|archos5|blackberry9500|blackberry9530|blackberry9550|blackberry\ 9800|cupcake|docomo\ ht\-03a|dream|htc\ hero|htc\ magic|htc_dream|htc_magic|incognito|ipad|iphone|ipod|kindle|lg\-gw620|liquid\ build|maemo|mot\-mb200|mot\-mb300|nexus\ one|opera\ mini|samsung\-s8000|series60.*webkit|series60/5\.0|sonyericssone10|sonyericssonu20|sonyericssonx10|t\-mobile\ mytouch\ 3g|t\-mobile\ opal|tattoo|webmate|webos)") { set $w3tc_ua _high; } set $w3tc_ref ""; set $w3tc_ssl ""; set $w3tc_enc ""; if ($http_accept_encoding ~ gzip) { set $w3tc_enc _gzip; } set $w3tc_ext ""; if (-f "$document_root/wp-content/w3tc/pgcache/$request_uri/_index$w3tc_ua$w3tc_ref$w3tc_ssl.html$w3tc_enc") { set $w3tc_ext .html; } if ($w3tc_ext = "") { set $w3tc_rewrite 0; } if ($w3tc_rewrite = 1) { rewrite .* "/wp- content/w3tc/pgcache/$request_uri/_index$w3tc_ua$w3tc_ref$w3tc_ssl$w3tc_ext$w3tc_enc" last; } # END W3TC Page Cache core And here is what I have in my varnish vcl.. sub vcl_recv { # Add a unique header containing the client address remove req.http.X-Forwarded-For; set req.http.X-Forwarded-For = client.ip; # Device detection set req.http.X-Device = "desktop"; if ( req.http.User-Agent ~ "iP(hone|od|ad)" || req.http.User-Agent ~ "Android" ) { set req.http.X-Device = "smart"; } elseif ( req.http.User-Agent ~ "(SymbianOS|BlackBerry|SonyEricsson|Nokia|SAMSUNG|^LG)" ) { set req.http.X-Device = "cell"; } Any help is greatly appreciated, I've been banging my head against this for 2 days..

    Read the article

  • Cluster failover and strange gratuitous arp behavior

    - by lazerpld
    I am experiencing a strange Windows 2008R2 cluster related issue that is bothering me. I feel that I have come close as to what the issue is, but still don't fully understand what is happening. I have a two node exchange 2007 cluster running on two 2008R2 servers. The exchange cluster application works fine when running on the "primary" cluster node. The problem occurs when failing over the cluster ressource to the secondary node. When failing over the cluster to the "secondary" node, which for instance is on the same subnet as the "primary", the failover initially works ok and the cluster ressource continues to work for a couple of minutes on the new node. Which means that the recieving node does send out a gratuitous arp reply packet that updated the arp tables on the network. But after x amount of time (typically within 5 minutes time) something updates the arp-tables again because all of a sudden the cluster service does not answer to pings. So basically I start a ping to the exchange cluster address when its running on the "primary node". It works just great. I failover the cluster ressource group to the "secondary node" and I only have loss of one ping which is acceptable. The cluster ressource still answers for some time after being failed over and all of a sudden the ping starts timing out. This is telling me that the arp table initially is updated by the secondary node, but then something (which I haven't found out yet) wrongfully updates it again, probably with the primary node's MAC. Why does this happen - has anyone experienced the same problem? The cluster is NOT running NLB and the problem stops immidiately after failing over back to the primary node where there are no problems. Each node is using NIC teaming (intel) with ALB. Each node is on the same subnet and has gateway and so on entered correctly as far as I am concerned. Edit: I was wondering if it could be related to network binding order maybe? Because I have noticed that the only difference I can see from node to node is when showing the local arp table. On the "primary" node the arp table is generated on the cluster address as the source. While on the "secondary" its generated from the nodes own network card. Any input on this? Edit: Ok here is the connection layout. Cluster address: A.B.6.208/25 Exchange application address: A.B.6.212/25 Node A: 3 physical nics. Two teamed using intels teaming with the address A.B.6.210/25 called public The last one used for cluster traffic called private with 10.0.0.138/24 Node B: 3 physical nics. Two teamed using intels teaming with the address A.B.6.211/25 called public The last one used for cluster traffic called private with 10.0.0.139/24 Each node sits in a seperate datacenter connected together. End switches being cisco in DC1 and NEXUS 5000/2000 in DC2. Edit: I have been testing a little more. I have now created an empty application on the same cluster, and given it another ip address on the same subnet as the exchange application. After failing this empty application over, I see the exact same problem occuring. After one or two minutes clients on other subnets cannot ping the virtual ip of the application. But while clients on other subnets cannot, another server from another cluster on the same subnet has no trouble pinging. But if i then make another failover to the original state, then the situation is the opposite. So now clients on same subnet cannot, and on other they can. We have another cluster set up the same way and on the same subnet, with the same intel network cards, the same drivers and same teaming settings. Here we are not seeing this. So its somewhat confusing. Edit: OK done some more research. Removed the NIC teaming of the secondary node, since it didnt work anyway. After some standard problems following that, I finally managed to get it up and running again with the old NIC teaming settings on one single physical network card. Now I am not able to reproduce the problem described above. So it is somehow related to the teaming - maybe some kind of bug? Edit: Did some more failing over without being able to make it fail. So removing the NIC team looks like it was a workaround. Now I tried to reestablish the intel NIC teaming with ALB (as it was before) and i still cannot make it fail. This is annoying due to the fact that now i actually cannot pinpoint the root of the problem. Now it just seems to be some kind of MS/intel hick-up - which is hard to accept because what if the problem reoccurs in 14 days? There is a strange thing that happened though. After recreating the NIC team I was not able to rename the team to "PUBLIC" which the old team was called. So something has not been cleaned up in windows - although the server HAS been restarted! Edit: OK after restablishing the ALB teaming the error came back. So I am now going to do some thorough testing and i will get back with my observations. One thing is for sure. It is related to Intel 82575EB NICS, ALB and Gratuitous Arp.

    Read the article

  • Resolve Wrong IP from Domain Name only on certain networks

    - by Godric Seer
    I host a personal website on an old desktop that is LAMP based. There are several strange things about this problem so I will break it down into steps. Since I have a dynamic IP, I use no-ip to make sure I have a working domain name at all times. I use the automatic update client, but logged in and checked and my no-ip domain has the proper IP tied to it. Here is a link to the homepage through the no-ip domain for reference. Also, I do a ping and a traceroute on the no-ip domain and get: [eckertzs@localhost ~]$ ping -c 1 endradil.noip.me PING endradil.noip.me (65.24.215.99) 56(84) bytes of data. 64 bytes from endradil.noip.me (65.24.215.99): icmp_seq=1 ttl=64 time=2.23 ms --- endradil.noip.me ping statistics --- 1 packets transmitted, 1 received, 0% packet loss, time 104ms rtt min/avg/max/mdev = 2.233/2.233/2.233/0.000 ms [eckertzs@localhost ~]$ traceroute endradil.noip.me traceroute to endradil.noip.me (65.24.215.99), 30 hops max, 60 byte packets 1 . (192.168.2.1) 1.755 ms 5.409 ms 5.380 ms 2 endradil.noip.me (65.24.215.99) 6.297 ms 9.543 ms 10.324 ms Using this domain, I can connect to my webserver without issue or interruption(the https is required to avoid a redirect serverside, but it works). I also have a domain I have bought on GoDaddy where I have a CNAME record forwarding the www subdomain to my no-ip domain. CNAME Record Host: www Points to: endradil.noip.me TTL: 1 hour For the past several weeks, I never had an issue using the GoDaddy domain to connect (ssh or https). As of the past few days, however, the GoDaddy domain has only worked intermittently, for a few minutes at a time and then will go down for hours at a time. I get server not found errors most of the time. Also, if I happen to be using the GoDaddy domain for an ssh connection, the connection will freeze. I have run online tests of the DNS and have seen that the website is visible by external servers and resolved to the correct IP. I also contacted GoDaddy support but they had no issues connecting to the website, and therefore did not see any issues. My personal computers (Windows desktop, linux laptop, android phone) all fail to connect when on my personal wifi. If I disconnect my phone from the wifi and use my AT&T wireless data, it can connect with both domains without issue. When I attempt to use Google webmaster tools to crawl the site using the GoDaddy domain, Google can not find the site. From my linux laptop, I have found some interesting results when I ping or traceroute the domain. The results from these: [eckertzs@localhost ~]$ ping -c 1 www.endradil.com PING www.endradil.com.Belkin (198.105.244.228) 56(84) bytes of data. --- www.endradil.com.Belkin ping statistics --- 1 packets transmitted, 0 received, 100% packet loss, time 10000ms [eckertzs@localhost ~]$ traceroute www.endradil.com traceroute to www.endradil.com (198.105.244.228), 30 hops max, 60 byte packets 1 . (192.168.2.1) 1.918 ms 2.806 ms 2.772 ms 2 cpe-65-24-208-1.insight.res.rr.com (65.24.208.1) 29.247 ms 29.654 ms 30.094 ms 3 cpe-69-23-24-117.new.res.rr.com (69.23.24.117) 15.597 ms 23.218 ms 23.581 ms 4 agg24.clmcohib01r.midwest.rr.com (65.29.1.52) 30.581 ms 30.556 ms 31.192 ms 5 be27.clevohek01r.midwest.rr.com (65.29.1.38) 30.580 ms 31.062 ms 31.038 ms 6 bu-ether25.atlngamq47w-bcr01.tbone.rr.com (107.14.19.38) 37.863 ms 68.844 ms 43.773 ms 7 107.14.17.178 (107.14.17.178) 51.866 ms 51.019 ms 50.989 ms 8 ae0.pr1.dca10.tbone.rr.com (107.14.17.200) 48.467 ms ae-4-0.a0.lax91.tbone.rr.com (66.109.1.113) 49.912 ms * 9 v413.core1.ash1.he.net (209.51.175.33) 60.270 ms 50.842 ms 50.819 ms 10 100ge5-1.core1.nyc4.he.net (184.105.223.166) 55.597 ms 56.045 ms 56.020 ms 11 xerocole-inc.10gigabitethernet12-4.core1.nyc4.he.net (216.66.41.242) 56.001 ms 55.969 ms 55.992 ms 12 * * * both show the incorrect IP. Also, the traceroute timesout on hops 12 through 255 (output truncated above). The traceroute using site24x7 works and shows reasonable results when run from their california server. From another linux box on a different network but in the same city as me (10 miles away), I still get timeout for traceroute, however the IP resolves correctly for the domain. From this I believe that the DNS result is incorrectly cached in either my router/modem or perhaps even at my ISP level. My question is, first, how do I find out exactly what is wrong, and second, how do I resolve it.

    Read the article

  • my webserver with 16GB ram shows all RAM as used, but is it really, see the 'top'

    - by Alex
    I have some questions about my web server. Its a LAMP web server running centos 5.5 and php5, mysql5. The server gets hundreds (maybe thousand) of concurrent users during peak hours. I'm trying to optimize a little and understand "top". From what I can see: all 16GB of my ram have been used up? does that mean that my server needs more memory? My swap is only 2GB, should it be increased? usually during peak hours my server load average first number is about 2.5-3. What could I do to optimize the server so that the load average even during peak doesn't go above 1? In the past I was told a good working server should stay under 1 load, is this still true? Although even during load of 2.5-3, server pages and applications seem to load with pretty good speed. what should the memory size in php.ini be set to? top - 14:30:18 up 2 days, 12:41, 5 users, load average: 1.25, 1.74, 2.92 Tasks: 305 total, 2 running, 302 sleeping, 0 stopped, 1 zombie Cpu(s): 6.3%us, 0.9%sy, 0.0%ni, 92.5%id, 0.2%wa, 0.0%hi, 0.1%si, 0.0%st Mem: 16427200k total, 16111472k used, 315728k free, 3120316k buffers Swap: 2104496k total, 268k used, 2104228k free, 6216756k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 29080 apache 15 0 358m 36m 5192 S 20.2 0.2 2:08.40 httpd 29093 apache 18 0 357m 36m 5192 S 18.2 0.2 2:02.52 httpd 29079 apache 15 0 370m 49m 5832 S 10.0 0.3 2:32.14 httpd 1812 apache 15 0 370m 49m 5196 S 7.3 0.3 2:25.30 httpd 5204 apache 15 0 358m 36m 5168 S 5.3 0.2 0:59.28 httpd 29075 apache 15 0 370m 48m 5184 S 3.3 0.3 2:15.93 httpd 9712 apache 15 0 360m 38m 5180 S 3.0 0.2 0:54.81 httpd 29072 apache 16 0 358m 36m 5192 S 2.7 0.2 2:24.43 httpd 6310 apache 17 0 388m 67m 5180 S 2.3 0.4 0:58.85 httpd 8674 apache 15 0 343m 21m 4980 S 2.0 0.1 0:07.91 httpd 29085 apache 15 0 371m 49m 5224 S 2.0 0.3 2:16.86 httpd 29083 apache 15 0 370m 48m 5196 S 1.7 0.3 2:10.64 httpd 5575 apache 15 0 357m 36m 5228 S 1.3 0.2 0:53.78 httpd 29066 apache 15 0 379m 59m 5860 R 1.3 0.4 2:11.93 httpd 29078 apache 15 0 370m 48m 5188 S 1.3 0.3 2:14.52 httpd 29084 apache 15 0 370m 48m 5208 S 1.0 0.3 2:02.49 httpd 29089 apache 15 0 370m 48m 5188 S 1.0 0.3 2:27.61 httpd 29082 apache 15 0 390m 68m 5188 S 0.7 0.4 2:32.48 httpd 29984 apache 15 0 358m 36m 5228 S 0.7 0.2 2:08.32 httpd 3571 root 16 0 13400 1792 848 S 0.3 0.0 2:37.89 top 4419 mysql 15 0 668m 175m 7204 S 0.3 1.1 3:32.25 mysqld 28181 root 15 0 90460 3624 2680 S 0.3 0.0 0:17.60 sshd 29091 apache 15 0 390m 69m 5196 S 0.3 0.4 2:29.99 httpd 32476 root 15 0 12900 1320 848 R 0.3 0.0 0:06.46 top 1 root 15 0 10372 680 572 S 0.0 0.0 0:02.01 init 2 root RT -5 0 0 0 S 0.0 0.0 0:00.51 migration/0 3 root 34 19 0 0 0 S 0.0 0.0 0:00.07 ksoftirqd/0 4 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/0 5 root RT -5 0 0 0 S 0.0 0.0 0:00.12 migration/1 6 root 34 19 0 0 0 S 0.0 0.0 0:00.03 ksoftirqd/1 7 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/1 8 root RT -5 0 0 0 S 0.0 0.0 0:00.06 migration/2 9 root 34 19 0 0 0 S 0.0 0.0 0:00.03 ksoftirqd/2 10 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/2 11 root RT -5 0 0 0 S 0.0 0.0 0:00.06 migration/3 12 root 34 19 0 0 0 S 0.0 0.0 0:00.04 ksoftirqd/3 13 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/3 14 root RT -5 0 0 0 S 0.0 0.0 0:01.45 migration/4 15 root 34 19 0 0 0 S 0.0 0.0 0:00.01 ksoftirqd/4 16 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/4 17 root RT -5 0 0 0 S 0.0 0.0 0:00.22 migration/5 18 root 34 19 0 0 0 S 0.0 0.0 0:00.01 ksoftirqd/5 19 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/5 20 root RT -5 0 0 0 S 0.0 0.0 0:00.15 migration/6 21 root 34 19 0 0 0 S 0.0 0.0 0:00.02 ksoftirqd/6 22 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/6 23 root RT -5 0 0 0 S 0.0 0.0 0:00.15 migration/7 24 root 34 19 0 0 0 S 0.0 0.0 0:00.01 ksoftirqd/7 25 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/7 26 root RT -5 0 0 0 S 0.0 0.0 0:00.19 migration/8 27 root 34 19 0 0 0 S 0.0 0.0 0:00.04 ksoftirqd/8 28 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/8 29 root RT -5 0 0 0 S 0.0 0.0 0:00.34 migration/9 30 root 34 19 0 0 0 S 0.0 0.0 0:00.03 ksoftirqd/9 31 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/9 32 root RT -5 0 0 0 S 0.0 0.0 0:00.16 migration/10 33 root 34 19 0 0 0 S 0.0 0.0 0:00.04 ksoftirqd/10 34 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/10 35 root RT -5 0 0 0 S 0.0 0.0 0:00.12 migration/11 36 root 34 19 0 0 0 S 0.0 0.0 0:00.05 ksoftirqd/11 37 root RT -5 0 0 0 S 0.0 0.0 0:00.00 watchdog/11 38 root RT -5 0 0 0 S 0.0 0.0 0:00.35 migration/12

    Read the article

  • Server hung with "blocked for more than 120 seconds", diskless

    - by alterpub
    I have server which hung every 2-5 days. dmesg show following situation: kernel: [490894.231753] INFO: task munin-html:10187 blocked for more than 120 seconds. kernel: [490894.231799] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. kernel: [490894.231843] munin-html D 0000000000000000 0 10187 9796 0x00000000 kernel: [490894.231878] ffff88063174b968 0000000000000082 ffff88063174b8d8 0000000000015b80 kernel: [490894.231930] ffff88063174bfd8 0000000000015b80 ffff88063174bfd8 ffff88062fe644d0 kernel: [490894.231982] 0000000000015b80 0000000000015b80 ffff88063174bfd8 0000000000015b80 kernel: [490894.232033] Call Trace: kernel: [490894.232059] [<ffffffff8117fce0>] ? sync_buffer+0x0/0x50 kernel: [490894.232089] [<ffffffff815a1f13>] io_schedule+0x73/0xc0 kernel: [490894.232115] [<ffffffff8117fd25>] sync_buffer+0x45/0x50 kernel: [490894.232143] [<ffffffff815a258f>] __wait_on_bit+0x5f/0x90 kernel: [490894.232170] [<ffffffff8117fce0>] ? sync_buffer+0x0/0x50 kernel: [490894.232197] [<ffffffff815a2638>] out_of_line_wait_on_bit+0x78/0x90 kernel: [490894.232227] [<ffffffff81080250>] ? wake_bit_function+0x0/0x40 kernel: [490894.232255] [<ffffffff8117fcd6>] __wait_on_buffer+0x26/0x30 kernel: [490894.232288] [<ffffffffa00131be>] squashfs_read_data+0x1be/0x520 [squashfs] kernel: [490894.232320] [<ffffffff8114f0f1>] ? __mem_cgroup_try_charge+0x71/0x450 kernel: [490894.232350] [<ffffffffa0013963>] squashfs_cache_get+0x1c3/0x320 [squashfs] kernel: [490894.232381] [<ffffffffa00136eb>] ? squashfs_copy_data+0x10b/0x130 [squashfs] kernel: [490894.232426] [<ffffffff815a3dbe>] ? _raw_spin_lock+0xe/0x20 kernel: [490894.232454] [<ffffffffa0013b68>] ? squashfs_read_metadata+0x48/0xf0 [squashfs] kernel: [490894.232499] [<ffffffffa0013ae1>] squashfs_get_datablock+0x21/0x30 [squashfs] kernel: [490894.232544] [<ffffffffa0015026>] squashfs_readpage+0x436/0x4a0 [squashfs] kernel: [490894.232575] [<ffffffff8111a375>] ? __inc_zone_page_state+0x35/0x40 kernel: [490894.232606] [<ffffffff8110d072>] __do_page_cache_readahead+0x172/0x210 kernel: [490894.232636] [<ffffffff8110d131>] ra_submit+0x21/0x30 kernel: [490894.232662] [<ffffffff811045f3>] filemap_fault+0x3f3/0x450 kernel: [490894.232691] [<ffffffff812bd156>] ? prio_tree_insert+0x256/0x2b0 kernel: [490894.232726] [<ffffffffa009225d>] aufs_fault+0x11d/0x170 [aufs] kernel: [490894.232755] [<ffffffff8111f6d4>] __do_fault+0x54/0x560 kernel: [490894.232782] [<ffffffff81122f39>] handle_mm_fault+0x1b9/0x440 kernel: [490894.232811] [<ffffffff811286f5>] ? do_mmap_pgoff+0x335/0x380 kernel: [490894.232840] [<ffffffff815a7af5>] do_page_fault+0x125/0x350 kernel: [490894.232867] [<ffffffff815a4675>] page_fault+0x25/0x30 Os info cat /etc/issue.net Ubuntu 10.04.2 LTS uname -a Linux Shard1Host3 2.6.35-32-server #68~lucid1-Ubuntu SMP Wed Mar 28 18:33:00 UTC 2012 x86_64 GNU/Linux This system load via ltsp and hasn't harddrives also it has a lot of memory 24Gb. free -m total used free shared buffers cached Mem: 24152 17090 7061 0 50 494 -/+ buffers/cache: 16545 7607 Swap: 0 0 0 I put vmstat info here but I think it won't give results after reboot vmstat 1 30 procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu---- r b swpd free buff cache si so bi bo in cs us sy id wa 0 0 0 7231156 52196 506400 0 0 0 0 172 143 7 0 92 0 1 0 0 7231024 52196 506400 0 0 0 0 7859 16233 5 0 94 0 0 0 0 7231024 52196 506400 0 0 0 0 7870 16446 2 0 98 0 0 0 0 7230900 52196 506400 0 0 0 0 7308 15661 5 0 95 0 0 0 0 7231100 52196 506400 0 0 0 0 7960 16543 6 0 94 0 0 0 0 7231100 52196 506400 0 0 0 0 7542 16047 5 1 94 0 3 0 0 7231100 52196 506400 0 0 0 0 7709 16621 3 0 96 0 0 0 0 7231220 52196 506400 0 0 0 0 7857 16552 4 0 96 0 0 0 0 7231220 52196 506400 0 0 0 0 7192 15491 6 0 94 0 0 0 0 7231220 52196 506400 0 0 0 0 7423 15792 5 1 94 0 1 0 0 7231260 52196 506404 0 0 0 0 7686 16296 2 0 98 0 0 0 0 7231260 52196 506404 0 0 0 0 6976 15183 5 0 95 0 0 0 0 7231260 52196 506404 0 0 0 0 7303 15600 4 0 95 0 0 0 0 7231320 52196 506404 0 0 0 0 7967 16241 1 0 98 0 0 0 0 7231444 52196 506404 0 0 0 0 6948 15113 6 0 94 0 0 0 0 7231444 52196 506404 0 0 0 0 7931 16181 6 0 94 0 1 0 0 7231516 52196 506404 0 0 0 0 7715 15829 6 0 94 0 0 0 0 7231516 52196 506404 0 0 0 0 7771 16036 2 0 97 0 0 0 0 7231268 52196 506404 0 0 0 0 7782 16202 6 0 94 0 1 0 0 7231212 52196 506404 0 0 0 0 7457 15622 4 0 96 0 0 0 0 7231212 52196 506404 0 0 0 0 7573 16045 2 0 98 0 2 0 0 7231216 52196 506404 0 0 0 0 7689 16076 6 0 94 0 0 0 0 7231424 52196 506404 0 0 0 0 7429 15650 4 0 95 0 3 0 0 7231424 52196 506404 0 0 0 0 7534 16168 3 0 97 0 1 0 0 7230548 52196 506404 0 0 0 0 8559 15926 7 1 92 0 0 0 0 7230672 52196 506404 0 0 0 0 7720 15905 2 0 98 0 0 0 0 7230548 52196 506404 0 0 0 0 7677 16313 5 0 95 0 1 0 0 7230676 52196 506404 0 0 0 0 7209 15432 5 0 95 0 0 0 0 7230800 52196 506404 0 0 0 0 7522 15861 2 0 98 0 0 0 0 7230552 52196 506404 0 0 0 0 7760 16661 5 0 95 0 In the munin(monitoring system) graphs I see(before server hung): Disk(nbd0) IOs per device: read: 289m but avg by week 2.09m Disk(nbd0) throughput per device: read: 4.73k but avg by week 108.76 Disk(nbd0) utilization per device: 100% but avg by week 1.2% Eth0 traffic was low: in/out only 2Mbps Number of threads increased to 566 usually 392 Fork rate 1.08 but usually 2.82 VMStat(processes state) Increased to 17.77(from 0 as far as I could see in the graph) CPU usage iowait 880.46% when usually 6.67% It'll be great if somebody help me to understand what's up.

    Read the article

  • Bad disks in ancient server

    - by Joel Coel
    I have a 1998-era Netware 3.12 server that runs everything on our campus: general ledger, purchasing, payroll, student information, grades, you name it. The server has an Adaptec RAID controller with two volumes: RAID 1, 2 17GB scsi disks, Seagate ST318417W RAID 5, 3 4GB scsi disks, 2 Seagate ST34573W and 1 ST34572W. We are currently in the early stages of a project to replace this system, but you don't just jump into a new system like that and so I need to keep this server running until at least November 2011. This week we had not one but two hard drives fail. Thankfully they are from different volumes and we're able to keep running for the moment, but given the close nature of these failures I have serious doubts that I'll be able to avoid catastrophic failure from this server through the November target as is without restoring the RAID redundancy — it'll only take one more drive failure anywhere and I'm completely hosed. We are fortunate enough to have exact match "spares" lying around for both drives, but the spares are in unknown condition. I tried swapping just them in, but the RAID controller isn't smart enough to handle this and it renders the system unbootable. As for the RAID controller itself, there is utility I can get into during POST via a Ctrl-A shortcut, but I can't do much useful from there. To actually manage volumes I must first boot in to Netware, at which point I can use CI/O Array Management Software Version 2.0 to actually look at volume information. I suspect that the normal way to manage things is to boot from a special floppy with the controller software on it, but that floppy is long gone. Going through the options in the RAID software, I think the only supported way to replace a disk in an existing RAID volume is to physically add the disk, boot up and configure it as a "spare" for a volume, force the volume to use the spare to replace an existing down disk (and at this point I'm only guessing) so that the down disk becomes the spare, repair the volume, remove the spare from the volume, and then shut down and remove the disk. Then start all over for the other failed disk. All this amounts to a lot of downtime, assuming I can even make it work and that my spares are any good. As for finding reliable spares, I have no clue where to even begin looking to find a new 4GB scsi drive, or even which exact scsi system I'm looking for, as it's gone through a few different iterations over time. Another option is to migrate this to a virtual machine (hyper-v), but all previous attempts we've made in this area have failed to get very far. When this machine was installed I was just graduating from high school, and so it requires lower level knowledge of netware and dos than I ever developed, or if I did have since forgotten (I'm not exactly a dos neophyte, either). Part of my problem is this is a high-use server, and taking it down for a few days to figure things out isn't gonna fly very well. As for the question, I'm looking for anything that might be helpful in this situation: a recommendation on a place to find good spares from this era, personal experience repairing RAID volumes using a similar controller or building a hyper-v vm from an old netware server, a line on a floppy with better software for the RAID controller, recommendation on a good Novell consultant in Nebraska that would be able to put things right, a whole other option I haven't considered yet, etc. Update: For backups, we have good (recently verified via restore) backups of the data only -- nothing for the software that actually runs things. Update 2: Just a progress report that I currently have a working Netware 3.12 install in VMWare Virtual Server 2.0, thanks largely to the guide I found here: http://cerbulescubogdan.blogspot.com/2010/11/novell-netware-312-on-vmware.html The next steps are preparing empty netware volumes to match the additional volumes on my existing server, taking a dump of everything on the C:\ drive and netware volumes on my existing server, and figuring out from that information what modules need added to netware, installing my licenses (we do still have that disk, if it's any good), and moving data over. I have approval to bring the server down for a week after the first of the year (sadly not before), so, aside from creating empty volumes, the rest of the work will have to wait until then. Final Update (Jan 5, 2011): I was able to get spares working in both raid arrays without data loss this week. Both are now listed by the controller as "FAULT TOLLERANT" (yay!). I was also able to build on the progress from my last update and now have a functional "spare" server in VMWare Server 2.0. The spare can run and use our erp software, but I can't put it into production because I can't (yet) print from that box (and I have no idea why). Even so, this VM will do in a pinch if I have no other choice, and between it and the repaired RAID arrays I'm comfortable pushing on until I can junk the machine in November.

    Read the article

  • My D-Link's Ethernet bridge downlink just got 10-30x slower?

    - by Jay Levitt
    TL;DR: I unplugged my network to move my desk, and now downloading via my DIR-655's Ethernet LAN bridge is 10-30x slower than the Ethernet switch it's plugged into. Background My network is SMC cable modem <-> Cisco firewall <-> Netgear switch <-> D-Link WiFi† | | | | SMC8014 ASA-5505 GS608v2 gigE DIR-655 rev A3 gigE †The DIR-655 is used as an access point, not a router (although what D-Link calls an access point, I'd call a bridge). The "WAN" port is unused; the Netgear connects to the built-in 4-port Ethernet LAN switch, inside the built- in router/firewall. Endpoints: MacBook Pro 17" mid-2010 iPhone 4S Fedora 12 Linux server running reasonably fast dual-Athlon X2, VelociRaptors, etc. All cables are <10 feet, mostly CAT-5e, some CAT-6, all premade. All WiFi endpoints are within three feet of the D-Link. Yesterday I unplugged and rearranged stuff, and now connecting via the D-Link - even through the wired switch, right next to the incoming network cable - is 30x slower than connecting directly to the Netgear switch, on both my MacBook and iPhone. How I'm measuring "slower" I'm mostly using http://speedtest.net, which of course only really measures broadband speeds. I've also installed http://www.speedtest.net/mini.php on my local server, but can't test the iPhone with that. Results Speedtest.net, closest server over Comcast business-class: CONFIG | PING (ms) | DOWN (Mbps) | UP (Mbps) Mac <-> Ethernet <-> Netgear | 9 | 31.6 | 6.8 Mac <-> Ethernet <-> D-Link | 8 | 4.1 | 6.0 Mac <-> WiFi <-> D-Link | 9 | 1.4 | 2.9 iPhone <-> WiFi <-> D-Link | 67 | 0.4 | 1.6 Speedtest Mini on Linux PC: CONFIG | DOWN (Mbps) | UP (Mbps) Mac <-> Ethernet <-> NetGear | 97.2 | 76.9 Mac <-> Ethernet <-> D-Link | 8.2 | 24.2 Mac <-> WiFi <-> D-Link | 1.0 | 8.6 Slow typing in SSH: Mac <-> Ethernet <-> Netgear <-> Linux PC: smooth Mac <-> Ethernet <-> D-Link <-> Linux PC: choppy Note that D-Link upload speeds are normal on broadband, slower locally (but I'd believe that's a D-Link limitation), and always faster than the downloads! Since ssh is choppy just with slow typing, I don't believe it's a throttling-type problem either; that's not a lot of bandwidth. What I've tried Swapping all "good" and "bad" cables Re-plugging "bad" cable from D-Link to Netgear and watching it be the "good" cable pulling cables away from power lines Verify that the Mac auto-detects the D-Link as gigE Try to verify the link speed of the D-Link <- Netgear connection, but the firmware doesn't report that Verify that the D-Link sees no TX/RX errors or collisions Use different Ethernet ports on both Netgear and D-Link Reset the D-Link to factory settings Upgrade the D-Link firmware from 1.21 to 1.35NA, 2010/11/12, the latest Reboot everything at least once On the Mac, disable Wi-Fi during the Ethernet tests, and unplug Ethernet during the Wi-Fi tests Using iStumbler, verify that the D-Link isn't picking overloaded Wi-Fi channels (usually just 1-5 neighbors on my and adjacent channels, average for my apt building) Verify that the only client connected to the Wi-Fi was the iPhone Verify that nothing was being chatty on my network according to the WISH log Enable and disable all sorts of D-Link settings, including forcing WAN auto-detect to gigE So. I don't mind buying a new access point—I wouldn't mind having a dual-link network—but as a guy who's been networking since gated v4 was a drastic rewrite, and who often used physical sniffers in the days before Wireshark, I'm baffled. I hate being baffled. What could I possibly have changed that would result in this? How can I measure it? All I can think of is a static zap—thick carpet, socks, HVAC—but I didn't feel one, and does that really happen anymore? Can I test if it's Ethernet vs. TCP layer slowness? I'm not familiar with modern network utilities; it's hard to Google without hitting "Q: Why is my network slow? A: Is your microwave on?" If I don't get an answer here, will someone big and powerful help me migrate it to serverfault without getting screamed back here? In the words of Inigo Montoya, "I must know." Don't get all Dread Pirate Roberts on me.

    Read the article

  • Bridging LXC containers to host eth0 so they can have a public IP

    - by Vianney Stroebel
    UPDATE: I found the solution there: http://www.linuxfoundation.org/collaborate/workgroups/networking/bridge#No_traffic_gets_trough_.28except_ARP_and_STP.29 # cd /proc/sys/net/bridge # ls bridge-nf-call-arptables bridge-nf-call-iptables bridge-nf-call-ip6tables bridge-nf-filter-vlan-tagged # for f in bridge-nf-*; do echo 0 $f; done But I'd like to have expert opinions on this: is it safe to disable all bridge-nf-*? What are they here for? END OF UPDATE I need to bridge LXC containers to the physical interface (eth0) of my host, reading numerous tutorials, documents and blog posts on the subject. I need the containers to have their own public IP (which I've previously done KVM/libvirt). After two days of searching and trying, I still can't make it work with LXC containers. The host runs a freshly installed Ubuntu Server Quantal (12.10) with only libvirt (which I'm not using here) and lxc installed. I created the containers with : lxc-create -t ubuntu -n mycontainer So they also run Ubuntu 12.10. Content of /var/lib/lxc/mycontainer/config is: lxc.utsname = mycontainer lxc.mount = /var/lib/lxc/test/fstab lxc.rootfs = /var/lib/lxc/test/rootfs lxc.network.type = veth lxc.network.flags = up lxc.network.link = br0 lxc.network.name = eth0 lxc.network.veth.pair = vethmycontainer lxc.network.ipv4 = 179.43.46.233 lxc.network.hwaddr= 02:00:00:86:5b:11 lxc.devttydir = lxc lxc.tty = 4 lxc.pts = 1024 lxc.arch = amd64 lxc.cap.drop = sys_module mac_admin mac_override lxc.pivotdir = lxc_putold # uncomment the next line to run the container unconfined: #lxc.aa_profile = unconfined lxc.cgroup.devices.deny = a # Allow any mknod (but not using the node) lxc.cgroup.devices.allow = c *:* m lxc.cgroup.devices.allow = b *:* m # /dev/null and zero lxc.cgroup.devices.allow = c 1:3 rwm lxc.cgroup.devices.allow = c 1:5 rwm # consoles lxc.cgroup.devices.allow = c 5:1 rwm lxc.cgroup.devices.allow = c 5:0 rwm #lxc.cgroup.devices.allow = c 4:0 rwm #lxc.cgroup.devices.allow = c 4:1 rwm # /dev/{,u}random lxc.cgroup.devices.allow = c 1:9 rwm lxc.cgroup.devices.allow = c 1:8 rwm lxc.cgroup.devices.allow = c 136:* rwm lxc.cgroup.devices.allow = c 5:2 rwm # rtc lxc.cgroup.devices.allow = c 254:0 rwm #fuse lxc.cgroup.devices.allow = c 10:229 rwm #tun lxc.cgroup.devices.allow = c 10:200 rwm #full lxc.cgroup.devices.allow = c 1:7 rwm #hpet lxc.cgroup.devices.allow = c 10:228 rwm #kvm lxc.cgroup.devices.allow = c 10:232 rwm Then I changed my host /etc/network/interfaces to: auto lo iface lo inet loopback auto br0 iface br0 inet static bridge_ports eth0 bridge_fd 0 address 92.281.86.226 netmask 255.255.255.0 network 92.281.86.0 broadcast 92.281.86.255 gateway 92.281.86.254 dns-nameservers 213.186.33.99 dns-search ovh.net When I try command line configuration ("brctl addif", "ifconfig eth0", etc.) my remote host becomes inaccessible and I have to hard reboot it. I changed the content of /var/lib/lxc/mycontainer/rootfs/etc/network/interfaces to: auto lo iface lo inet loopback auto eth0 iface eth0 inet static address 179.43.46.233 netmask 255.255.255.255 broadcast 178.33.40.233 gateway 92.281.86.254 It takes several minutes for mycontainer to start (lxc-start -n mycontainer). I tried replacing gateway 92.281.86.254 by : post-up route add 92.281.86.254 dev eth0 post-up route add default gw 92.281.86.254 post-down route del 92.281.86.254 dev eth0 post-down route del default gw 92.281.86.254 My container then starts instantly. But whatever configuration I set in /var/lib/lxc/mycontainer/rootfs/etc/network/interfaces, I cannot ping from mycontainer to any IP (including the host's) : ubuntu@mycontainer:~$ ping 92.281.86.226 PING 92.281.86.226 (92.281.86.226) 56(84) bytes of data. ^C --- 92.281.86.226 ping statistics --- 6 packets transmitted, 0 received, 100% packet loss, time 5031ms And my host cannot ping the container: root@host:~# ping 179.43.46.233 PING 179.43.46.233 (179.43.46.233) 56(84) bytes of data. ^C --- 179.43.46.233 ping statistics --- 5 packets transmitted, 0 received, 100% packet loss, time 4000ms My container's ifconfig: ubuntu@mycontainer:~$ ifconfig eth0 Link encap:Ethernet HWaddr 02:00:00:86:5b:11 inet addr:179.43.46.233 Bcast:255.255.255.255 Mask:0.0.0.0 inet6 addr: fe80::ff:fe79:5a31/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:64 errors:0 dropped:6 overruns:0 frame:0 TX packets:54 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:4070 (4.0 KB) TX bytes:4168 (4.1 KB) lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:32 errors:0 dropped:0 overruns:0 frame:0 TX packets:32 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:2496 (2.4 KB) TX bytes:2496 (2.4 KB) My host's ifconfig: root@host:~# ifconfig br0 Link encap:Ethernet HWaddr 4c:72:b9:43:65:2b inet addr:92.281.86.226 Bcast:91.121.67.255 Mask:255.255.255.0 inet6 addr: fe80::4e72:b9ff:fe43:652b/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:1453 errors:0 dropped:18 overruns:0 frame:0 TX packets:1630 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:145125 (145.1 KB) TX bytes:299943 (299.9 KB) eth0 Link encap:Ethernet HWaddr 4c:72:b9:43:65:2b UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:3178 errors:0 dropped:0 overruns:0 frame:0 TX packets:1637 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:298263 (298.2 KB) TX bytes:309167 (309.1 KB) Interrupt:20 Memory:fe500000-fe520000 lo Link encap:Local Loopback inet addr:127.0.0.1 Mask:255.0.0.0 inet6 addr: ::1/128 Scope:Host UP LOOPBACK RUNNING MTU:16436 Metric:1 RX packets:6 errors:0 dropped:0 overruns:0 frame:0 TX packets:6 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:300 (300.0 B) TX bytes:300 (300.0 B) vethtest Link encap:Ethernet HWaddr fe:0d:7f:3e:70:88 inet6 addr: fe80::fc0d:7fff:fe3e:7088/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1 RX packets:54 errors:0 dropped:0 overruns:0 frame:0 TX packets:67 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:1000 RX bytes:4168 (4.1 KB) TX bytes:4250 (4.2 KB) virbr0 Link encap:Ethernet HWaddr de:49:c5:66:cf:84 inet addr:192.168.122.1 Bcast:192.168.122.255 Mask:255.255.255.0 UP BROADCAST MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0 TX packets:0 errors:0 dropped:0 overruns:0 carrier:0 collisions:0 txqueuelen:0 RX bytes:0 (0.0 B) TX bytes:0 (0.0 B) I have disabled lxcbr0 (USE_LXC_BRIDGE="false" in /etc/default/lxc). root@host:~# brctl show bridge name bridge id STP enabled interfaces br0 8000.4c72b943652b no eth0 vethtest I have configured the IP 179.43.46.233 to point to 02:00:00:86:5b:11 in my hosting provider (OVH) config panel. (The IPs in this post are not the real ones.) Thanks for reading this long question! :-) Vianney

    Read the article

  • OS Missing? Messed up the MBR on Win7 64-bit

    - by hom3lesshom3boy
    I have a Windows 7 machine with two hard drives: a 1TB C: drive and 500GB J:. I had Windows XP installed on C: and Windows 7 installed on J:. I installed Windows 7 after Windows XP from an installer .exe I (legally) bought and downloaded. It, and all of my other files, are sitting on my J: drive intact. While under my Windows 7 install, a few days ago I decided to use Priform's CCleaner and use its DriveWipe utility to wipe the C: drive. 1% into the process, I cancelled and attempted to use it again. It gives me an error saying it can't format the drive, so I poke around the Internet a bit, give up, and restart my computer. I first get an "OS is missing" error after the computer boots past the BIOS. I downloaded and put UBCD on a bootable USB to use another drivewiping tool to completely erase the C: drive, hoping it'll take the problem with it. No luck. I try to use TestDisk to make my J: my primary active drive, but no luck. I still get the "OS is missing" error. Or sometimes it'll hang at Verifying DMI Pool. Or sometimes I'll get the "NTLDR is missing" error. I get hold of Hiren's and put it on another bootable USB. I first I tried the Boot Windows 7 from Hard Drive option, and I get "Error 15: File Not Found". I tried the "Fix 'NTLDR is Missing'" option (I'm not quite sure why this is even showing up, since I'm trying to get into a HDD with Windows 7 installed. Probably messed up somewhere when I used TestDisk) and I get this list: I'll run through the error messages I get: 1st Try - Windows could not start because the following file is missing or corrupt: \system32\hal.dll 2nd Try - Windows could not start because the following file is missing or corrupt: \system32\ntoskrnl.exe 3rd Try - Windows could not start because of a computer disk hardware configuration problem. Could not read from the selected boot disk. Check boot path and disk hardware. 4th - 8th Try - Same as #3 9th Try - I/O Error accessing boot sector file multi(0)disk(0)fdisk(0)\BOOTSEC.DOS. And computer freezes. 10th Try - computer restarts Needless to say, not a single one of those works. I then tried to open up the Windows 7 exe I have sitting on my J: from the Mini-XP OS on Hiren's, but it won't run because I'm trying to run a 64-bit file from a 32-bit exe. At least, that's the problem according to these guys: http://social.technet.microsoft.com/...-b2f54e9c7d18/ I then borrowed a 64-bit Windows Home Premium CD from a friend to get to the recovery options. But I get the error message: This version of System Recovery Options is not compatible with the version of Windows you are trying to repair. Try using a recovery disc that is compatible with this version of Windows. I pressed Shift + F10 to get to the Command Prompt directly. These are the exact steps I took from there (paraphrased a little): X:\Sources>bootrec /Fixmbr The operation completed successfully. X:\Sources>bootrec /Fixboot The operation completed successfully. I restarted my computer, but it still didn't work. I unplugged the C: drive, then tried bootrec and Diskpart: X:\Sources> bootrec.exe X:\Sources> bootrec /RebuildBcd Total identified Windows installations: 1 [1] \\?\GLOBALROOT\Device\HarddiskVolume1\Windows Add installation to bootlist? Yes(Y)/No(N)/All(A):y The requested system device cannot be found. X:\Sources>DiskPart DISKPART> List Disk Disk # Status Size Free Dyn Gpt Disk 0_Online_465GB_0B_______* Disk 1 Online 1000MB 0B (this is Hiren's on a bootable usb) DISKPART> Select Disk 0 Disk 0 is now the selected disk. DISKPART> List Partition Partition # Type Size Offset Partition 1 System 465GB 31KB DISKPART> Select Partition 1 Partition 1 is now the selected partition DISKPART> Active The selected disk is not a fixed MBR disk. The ACTIVE command can only be used on fixed MBR disks. DISKPART> exit Leaving Diskpart... X:\Sources>bootrec /Fixmbr The operation completed successfully. X:\Sources>bootrec /Fixboot The operation completed successfully. Before I go any further, is there anything I'm overlooking/doing wrong? All I care about is making the J: and Windows 7 bootable again. SPECS: Windows 7 Professional 64-Bit GIGABYTE - Motherboard - Socket 775 - GA-P35-DS3R (rev. 2.1) Crucial Ballistix 2048MB PC6400 DDR2 800MHz (2x2GB) Intel Core 2 Duo E6700 Processor (2.6 (6GHZ) I think... not sure anymore C: HDD - SAMSUNG HD103UJ (1TB, not plugged in) J: HDD - WDC WD5000AKS-00V1A0 (500GB)

    Read the article

  • Nginx no static files after update

    - by SomeoneS
    First, i must say that i am not expert in server administration, my site was setup by hosting admins (that i cannot contact anymore). Few days ago, i updated Nginx to latest version (admin told me that it is safe to do). But after that, my site serves only html content, no CSS, images, JS. If i try to open some image i get message "Wellcome to Nginx" (same thin if i try to open static.mysitedomain.com). More details: Site has static. subdomain, but static files are in same directory as they used to be before setting up static files. I was googling for some solutions, i tried to change something in /etc/nginx/, but no luck. I feel that this is some minor configuration problem, any ideas? EDIT: Here is /etc/nginx/nginx.conf file content: user www-data; worker_processes 4; pid /var/run/nginx.pid; events { worker_connections 768; # multi_accept on; } http { ## # Basic Settings ## sendfile on; tcp_nopush on; tcp_nodelay on; keepalive_timeout 65; types_hash_max_size 2048; # server_tokens off; # server_names_hash_bucket_size 64; # server_name_in_redirect off; include /etc/nginx/mime.types; default_type application/octet-stream; ## # Logging Settings ## access_log /var/log/nginx/access.log; error_log /var/log/nginx/error.log; ## # Gzip Settings ## gzip on; gzip_disable "msie6"; # gzip_vary on; # gzip_proxied any; # gzip_comp_level 6; # gzip_buffers 16 8k; # gzip_http_version 1.1; # gzip_types text/plain text/css application/json application/x-javascript text/xml application/xml application/xml+rss text/javascript; ## # nginx-naxsi config ## # Uncomment it if you installed nginx-naxsi ## #include /etc/nginx/naxsi_core.rules; ## # nginx-passenger config ## # Uncomment it if you installed nginx-passenger ## #passenger_root /usr; #passenger_ruby /usr/bin/ruby; ## # Virtual Host Configs ## include /etc/nginx/conf.d/*.conf; include /etc/nginx/sites-enabled/*; } Here is /etc/nginx/sites-enabled/default file content: server { #listen 80; ## listen for ipv4; this line is default and implied #listen [::]:80 default ipv6only=on; ## listen for ipv6 root /usr/share/nginx/www; index index.html index.htm; # Make site accessible from http://localhost/ server_name localhost; location / { # First attempt to serve request as file, then # as directory, then fall back to index.html try_files $uri $uri/ /index.html; # Uncomment to enable naxsi on this location # include /etc/nginx/naxsi.rules } location /doc/ { alias /usr/share/doc/; autoindex on; allow 127.0.0.1; deny all; } # Only for nginx-naxsi : process denied requests #location /RequestDenied { # For example, return an error code #return 418; #} #error_page 404 /404.html; # redirect server error pages to the static page /50x.html # #error_page 500 502 503 504 /50x.html; #location = /50x.html { # root /usr/share/nginx/www; #} # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000 # #location ~ \.php$ { # fastcgi_split_path_info ^(.+\.php)(/.+)$; # # NOTE: You should have "cgi.fix_pathinfo = 0;" in php.ini # # # With php5-cgi alone: # fastcgi_pass 127.0.0.1:9000; # # With php5-fpm: # fastcgi_pass unix:/var/run/php5-fpm.sock; # fastcgi_index index.php; # include fastcgi_params; #} # deny access to .htaccess files, if Apache's document root # concurs with nginx's one # #location ~ /\.ht { # deny all; #} } # another virtual host using mix of IP-, name-, and port-based configuration # #server { # listen 8000; # listen somename:8080; # server_name somename alias another.alias; # root html; # index index.html index.htm; # # location / { # try_files $uri $uri/ /index.html; # } #} # HTTPS server # #server { # listen 443; # server_name localhost; # # root html; # index index.html index.htm; # # ssl on; # ssl_certificate cert.pem; # ssl_certificate_key cert.key; # # ssl_session_timeout 5m; # # ssl_protocols SSLv3 TLSv1; # ssl_ciphers ALL:!ADH:!EXPORT56:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv3:+EXP; # ssl_prefer_server_ciphers on; # # location / { # try_files $uri $uri/ /index.html; # } #}

    Read the article

  • Linux Kernel crash mutex_lock_slowpath "blocked for more than 120 seconds". What to do?

    - by Roddick
    I have out-of-the box Debian Lenny with non-custom kernel 2.6.26-2-amd64. Brand new server that is used to 5% of it's potential, CPU and Disk-wise. Meaning it probably not crashing because of overload. every few days it freezes with hundreds of these messages in console log: : [284847.828428] INFO: task apache2:12473 blocked for more than 120 seconds. : [284847.868468] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. : [284847.912759] apache2 D ffff8101bc6b7ab0 0 12473 14358 : [284847.912763] ffff810160d5bc50 0000000000000082 ffff8101c0002e40 0000000000000000 : [284847.912766] ffff8101a7c42950 ffff810327d92810 ffff8101a7c42bd8 0000000400000044 : [284847.912770] ffff8101c0002e40 00000000000612d0 0000000000000000 00000040000612d0 : [284847.912773] Call Trace: : [284847.912786] [<ffffffff80429b0d>] __mutex_lock_slowpath+0x64/0x9b : [284847.912790] [<ffffffff80429972>] mutex_lock+0xa/0xb : [284847.912794] [<ffffffff802a20b9>] do_lookup+0x82/0x1c1 : [284847.912800] [<ffffffff802a4271>] __link_path_walk+0x87a/0xd19 : [284847.912805] [<ffffffff80295844>] kmem_getpages+0x96/0x15f : [284847.912808] [<ffffffff80295fb7>] ____cache_alloc_node+0x6d/0x106 : [284847.912814] [<ffffffff802a4756>] path_walk+0x46/0x8b : [284847.912819] [<ffffffff802a4a82>] do_path_lookup+0x158/0x1cf : [284847.912822] [<ffffffff802a3879>] getname+0x140/0x1a7 : [284847.912827] [<ffffffff802a53f1>] __user_walk_fd+0x37/0x4c : [284847.912831] [<ffffffff8029e381>] vfs_lstat_fd+0x18/0x47 : [284847.912840] [<ffffffff8029e3c9>] sys_newlstat+0x19/0x31 : [284847.912848] [<ffffffff8020beda>] system_call_after_swapgs+0x8a/0x8f Almost all traces has __mutex_lock_slowpath as top-level. Only some has different trace: : [284847.737386] INFO: task apache2:12472 blocked for more than 120 seconds. : [284847.777551] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. : [284847.824881] apache2 D ffff8101bc6b7ab0 0 12472 14358 : [284847.824886] ffff8101b9cc1c50 0000000000000086 ffffffffa0131e0a 0000000000000002 : [284847.824889] ffff8102e7454300 ffff810324c6cad0 ffff8102e7454588 0000000000000000 : [284847.824893] 0000000000000001 0000000000000296 0000000000000003 ffff8101b9cc1c58 : [284847.824896] Call Trace: : [284847.828403] [<ffffffffa0131e0a>] :ext3:__ext3_journal_dirty_metadata+0x1e/0x46 : [284847.828412] [<ffffffff80429b0d>] __mutex_lock_slowpath+0x64/0x9b : [284847.828418] [<ffffffff80429972>] mutex_lock+0xa/0xb : [284847.828421] [<ffffffff802a20b9>] do_lookup+0x82/0x1c1 : [284847.828427] [<ffffffff802a4271>] __link_path_walk+0x87a/0xd19 : [284847.828428] [<ffffffff80271296>] find_lock_page+0x1f/0x8a : [284847.828428] [<ffffffff80273182>] filemap_fault+0x1c2/0x33c : [284847.828428] [<ffffffff802a4756>] path_walk+0x46/0x8b : [284847.828428] [<ffffffff802a4a82>] do_path_lookup+0x158/0x1cf : [284847.828428] [<ffffffff802a3879>] getname+0x140/0x1a7 : [284847.828428] [<ffffffff802a53f1>] __user_walk_fd+0x37/0x4c : [284847.828428] [<ffffffff8029e381>] vfs_lstat_fd+0x18/0x47 : [284847.828428] [<ffffffff8029e3c9>] sys_newlstat+0x19/0x31 : [284847.828428] [<ffffffff8020beda>] system_call_after_swapgs+0x8a/0x8f kernel: [1912668.466347] INFO: task apache2:17984 blocked for more than 120 seconds. [1912668.507035] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. : [1912668.555165] apache2 D ffff8101c5637ba0 0 17984 17282 : [1912668.596752] ffff810166a7dd30 0000000000000086 0000000000000000 ffff810166a7dcd8 : [1912668.643341] ffff8101c563c880 ffff81024505f000 0000000000000002 ffff810166a7dd68 : [1912668.699566] 0000000000000086 00000000000cb1a0 0000000000000000 ffff81017f344d60 : [1912668.744773] Call Trace: : [1912668.761754] [<ffffffff8022a3ed>] pick_next_task_fair+0x6e/0x7a : [1912668.829311] [<ffffffff802be0e2>] bio_alloc_bioset+0x89/0xd9 : [1912668.861930] [<ffffffff8024ac3a>] getnstimeofday+0x39/0x98 : [1912668.897005] [<ffffffff802710f6>] sync_page+0x0/0x41 : [1912668.927868] [<ffffffff80429487>] io_schedule+0x5c/0x9e : [1912668.960286] [<ffffffff80271132>] sync_page+0x3c/0x41 : [1912668.991756] [<ffffffff804295fa>] __wait_on_bit_lock+0x36/0x66 : [1912669.031757] [<ffffffff802710e3>] __lock_page+0x5e/0x64 : [1912669.064191] [<ffffffff802461d3>] wake_bit_function+0x0/0x23 : [1912669.100100] [<ffffffff80281bc5>] handle_mm_fault+0x5e4/0x8de : [1912669.134531] [<ffffffff802461a5>] autoremove_wake_function+0x0/0x2e : [1912669.174623] [<ffffffff802aa108>] fcntl_setlk+0x1cf/0x291 : [1912669.210623] [<ffffffff802461a5>] autoremove_wake_function+0x0/0x2e : [1912669.246923] [<ffffffff802a677f>] sys_fcntl+0x280/0x2f7 After googling for "mutex_lock_slowpath" I can only find the Kernel mailing list discussions that this issue was introduced in some commit. Wthout reference to verison. Discussions as recent as Jan 25, 2011. The Kernel I am using is form Debian Lenny, year ago. What should I do? Is this bug even fixed in kernel? if it's such obvious bug why it happens so rarely? Should I download latest kernel from kernel.org and upgrade? Should I use Debian backports to install new "Approved" kernel? Am I missing something? What to do?

    Read the article

  • Web browsing is fast, but downloads are slow

    - by Ricket
    I work for a company on my university's campus, helping with general IT problems and some web development. But lately there has been a problem that has me and my boss completely stumped. We, plus one contractor, make up the entire IT department, so I'm reaching out to you for help. All around the office, we have wall jacks. These collect in a closet down the hall and all plug into a switch. This switch, along with our individual server jacks, plugs into another switch, and that switch plugs into our firewall hardware. Then the firewall is connected out to our campus network. Our campus internet is, well, very fast. I don't know exactly the terms, tiers, etc., but we have thousands of students and downloads can run as fast as 10 MB/s at night; uploads are sometimes even faster. I think we're practically ISP level. In short, I have a lot of faith that it is not the campus side of things that is causing a problem, combined with other evidence I'll mention in a moment. So our symptoms: web browsing is fast. Web pages, images, etc. load instantly. No problems there. But then when I go to download something, the download starts fast but very quickly (a matter of seconds) drops to nearly 0. Often it will actually drop to 0 and time out. This happens with even very small files, 1 MB or less. It smells to me like a QoS sort of thing. I'm not entirely sure, and I wanted to get your opinions first. My boss is hesitant to touch our firewall, much less let me touch it, and it was set up and is managed by a consultant remotely. These problems don't seem tied to a time of the day. I've tried downloads after 5:00 and still the same thing happens. From my desk, I can turn on my wireless adapter and pick up the campus wireless access point. If I unplug ethernet and connect to it, downloads are fast. This adds to my suspicion that it's limited to our company network. Also, a number of weeks ago the consultant upgraded our firewall firmware. Suddenly everything was very fast. I tested with downloads from Sun and speedtest.net and things were blazing fast, as they should be with our campus internet! It was wonderful, and I figured the slow speeds were an old firmware bug. In a matter of days, things steadily declined until they were back to the old symptoms. Oh, and we have antivirus installed on every computer, and we keep it up to date. Though I suppose the possibility is still there that someone could have spyware which is bogging down our internet, in which case what is the easiest/best way to find this out? (maybe this should go in a separate question) Thank you for your patience in reading all of this. Do you have any ideas as to what I can try? Is this something that you've experienced before? What sort of tools or methods can I use to try and diagnose the problem? P.S. everything here is Windows. Windows Server 2003 and 2008 on our servers, and Windows XP on employees' machines. Update: We are submitting a ticket to the university to just take a look and see if they see anything unusual and/or can suggestion methods for us to try and pinpoint our problem. Hopefully they'll be helpful! I'll update this to let you know what goes on. Update again: We found a hub (yes, a HUB) right between our campus connection and our firewall. It had only those two ethernet cables plugged into it, nothing else. After removing the hub, our speeds have jumped up to several mbps. However in talking with the campus, we got them to run a gigabit line to our firewall in place of the 100mbps line. As of friday, we are at about 65 mbps up and down (according to speedtest.net at 8am)!! Go NC State!!

    Read the article

  • How to open a server port outside of an OpenVPN tunnel with a pf firewall on OSX (BSD)

    - by Timbo
    I have a Mac mini that I use as a media server running XBMC and serves media from my NAS to my stereo and TV (which has been color calibrated with a Spyder3Express, happy). The Mac runs OSX 10.8.2 and the internet connection is tunneled for general privacy over OpenVPN through Tunnelblick. I believe my anonymous VPN provider pushes "redirect_gateway" to OpenVPN/Tunnelblick because when on it effectively tunnels all non-LAN traffic in- and outbound. As an unwanted side effect that also opens the boxes server ports unprotected to the outside world and bypasses my firewall-router (Netgear SRX5308). I have run nmap from outside the LAN on the VPN IP and the server ports on the mini are clearly visible and connectable. The mini has the following ports open: ssh/22, ARD/5900 and 8080+9090 for the XBMC iOS client Constellation. I also have Synology NAS which apart from LAN file serving over AFP and WebDAV only serves up an OpenVPN/1194 and a PPTP/1732 server. When outside of the LAN I connect to this from my laptop over OpenVPN and over PPTP from my iPhone. I only want to connect through AFP/548 from the mini to the NAS. The border firewall (SRX5308) just works excellently, stable and with a very high throughput when streaming from various VOD services. My connection is a 100/10 with a close to theoretical max throughput. The ruleset is as follows Inbound: PPTP/1723 Allow always to 10.0.0.40 (NAS/VPN server) from a restricted IP range >corresponding to possible cell provider range OpenVPN/1194 Allow always to 10.0.0.40 (NAS/VPN server) from any Outbound: Default outbound policy: Allow Always OpenVPN/1194 TCP Allow always from 10.0.0.40 (NAS) to a.b.8.1-a.b.8.254 (VPN provider) OpenVPN/1194 UDP Allow always to 10.0.0.40 (NAS) to a.b.8.1-a.b.8.254 (VPN provider) Block always from NAS to any On the Mini I have disabled the OSX Application Level Firewall because it throws popups which don't remember my choices from one time to another and that's annoying on a media server. Instead I run Little Snitch which controls outgoing connections nicely on an application level. I have configured the excellent OSX builtin firewall pf (from BSD) as follows pf.conf (Apple App firewall tie-ins removed) (# replaced with % to avoid formatting errors) ### macro name for external interface. eth_if = "en0" vpn_if = "tap0" ### wifi_if = "en1" ### %usb_if = "en3" ext_if = $eth_if LAN="{10.0.0.0/24}" ### General housekeeping rules ### ### Drop all blocked packets silently set block-policy drop ### all incoming traffic on external interface is normalized and fragmented ### packets are reassembled. scrub in on $ext_if all fragment reassemble scrub in on $vpn_if all fragment reassemble scrub out all ### exercise antispoofing on the external interface, but add the local ### loopback interface as an exception, to prevent services utilizing the ### local loop from being blocked accidentally. ### set skip on lo0 antispoof for $ext_if inet antispoof for $vpn_if inet ### spoofing protection for all interfaces block in quick from urpf-failed ############################# block all ### Access to the mini server over ssh/22 and remote desktop/5900 from LAN/en0 only pass in on $eth_if proto tcp from $LAN to any port {22, 5900, 8080, 9090} ### Allow all udp and icmp also, necessary for Constellation. Could be tightened. pass on $eth_if proto {udp, icmp} from $LAN to any ### Allow AFP to 10.0.0.40 (NAS) pass out on $eth_if proto tcp from any to 10.0.0.40 port 548 ### Allow OpenVPN tunnel setup over unprotected link (en0) only to VPN provider IPs ### and port ranges pass on $eth_if proto tcp from any to a.b.8.0/24 port 1194:1201 ### OpenVPN Tunnel rules. All traffic allowed out, only in to ports 4100-4110 ### Outgoing pings ok pass in on $vpn_if proto {tcp, udp} from any to any port 4100:4110 pass out on $vpn_if proto {tcp, udp, icmp} from any to any So what are my goals and what does the above setup achieve? (until you tell me otherwise :) 1) Full LAN access to the above ports on the mini/media server (including through my own VPN server) 2) All internet traffic from the mini/media server is anonymized and tunneled over VPN 3) If OpenVPN/Tunnelblick on the mini drops the connection, nothing is leaked both because of pf and the router outgoing ruleset. It can't even do a DNS lookup through the router. So what do I have to hide with all this? Nothing much really, I just got carried away trying to stop port scans through the VPN tunnel :) In any case this setup works perfectly and it is very stable. The Problem at last! I want to run a minecraft server and I installed that on a separate user account on the mini server (user=mc) to keep things partitioned. I don't want this server accessible through the anonymized VPN tunnel because there are lots more port scans and hacking attempts through that than over my regular IP and I don't trust java in general. So I added the following pf rule on the mini: ### Allow Minecraft public through user mc pass in on $eth_if proto {tcp,udp} from any to any port 24983 user mc pass out on $eth_if proto {tcp, udp} from any to any user mc And these additions on the border firewall: Inbound: Allow always TCP/UDP from any to 10.0.0.40 (NAS) Outbound: Allow always TCP port 80 from 10.0.0.40 to any (needed for online account checkups) This works fine but only when the OpenVPN/Tunnelblick tunnel is down. When up no connection is possbile to the minecraft server from outside of LAN. inside LAN is always OK. Everything else functions as intended. I believe the redirect_gateway push is close to the root of the problem, but I want to keep that specific VPN provider because of the fantastic throughput, price and service. The Solution? How can I open up the minecraft server port outside of the tunnel so it's only available over en0 not the VPN tunnel? Should I a static route? But I don't know which IPs will be connecting...stumbles How secure would to estimate this setup to be and do you have other improvements to share? I've searched extensively in the last few days to no avail...If you've read this far I bet you know the answer :)

    Read the article

  • e2fsck extremly slow, although enough memory exists

    - by kaefert
    I've got this external USB-Disk: kaefert@blechmobil:~$ lsusb -s 2:3 Bus 002 Device 003: ID 0bc2:3320 Seagate RSS LLC As can be seen in this dmesg output, there are some problems that prevents that disk from beeing mounted: kaefert@blechmobil:~$ dmesg | grep sdb [ 114.474342] sd 5:0:0:0: [sdb] 732566645 4096-byte logical blocks: (3.00 TB/2.72 TiB) [ 114.475089] sd 5:0:0:0: [sdb] Write Protect is off [ 114.475092] sd 5:0:0:0: [sdb] Mode Sense: 43 00 00 00 [ 114.475959] sd 5:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA [ 114.477093] sd 5:0:0:0: [sdb] 732566645 4096-byte logical blocks: (3.00 TB/2.72 TiB) [ 114.501649] sdb: sdb1 [ 114.502717] sd 5:0:0:0: [sdb] 732566645 4096-byte logical blocks: (3.00 TB/2.72 TiB) [ 114.504354] sd 5:0:0:0: [sdb] Attached SCSI disk [ 116.804408] EXT4-fs (sdb1): ext4_check_descriptors: Checksum for group 3976 failed (47397!=61519) [ 116.804413] EXT4-fs (sdb1): group descriptors corrupted! So I went and fired up my favorite partition manager - gparted, and told it to verify and repair the partition sdb1. This made gparted call e2fsck (version 1.42.4 (12-Jun-2012)) e2fsck -f -y -v /dev/sdb1 Although gparted called e2fsck with the "-v" option, sadly it doesn't show me the output of my e2fsck process (bugreport https://bugzilla.gnome.org/show_bug.cgi?id=467925 ) I started this whole thing on Sunday (2012-11-04_2200) evening, so about 48 hours ago, this is what htop says about it now (2012-11-06-1900): PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command 3704 root 39 19 1560M 1166M 768 R 98.0 19.5 42h56:43 e2fsck -f -y -v /dev/sdb1 Now I found a few posts on the internet that discuss e2fsck running slow, for example: http://gparted-forum.surf4.info/viewtopic.php?id=13613 where they write that its a good idea to see if the disk is just that slow because maybe its damaged, and I think these outputs tell me that this is not the case in my case: kaefert@blechmobil:~$ sudo hdparm -tT /dev/sdb /dev/sdb: Timing cached reads: 3562 MB in 2.00 seconds = 1783.29 MB/sec Timing buffered disk reads: 82 MB in 3.01 seconds = 27.26 MB/sec kaefert@blechmobil:~$ sudo hdparm /dev/sdb /dev/sdb: multcount = 0 (off) readonly = 0 (off) readahead = 256 (on) geometry = 364801/255/63, sectors = 5860533160, start = 0 However, although I can read quickly from that disk, this disk speed doesn't seem to be used by e2fsck, considering tools like gkrellm or iotop or this: kaefert@blechmobil:~$ iostat -x Linux 3.2.0-2-amd64 (blechmobil) 2012-11-06 _x86_64_ (2 CPU) avg-cpu: %user %nice %system %iowait %steal %idle 14,24 47,81 14,63 0,95 0,00 22,37 Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util sda 0,59 8,29 2,42 5,14 43,17 160,17 53,75 0,30 39,80 8,72 54,42 3,95 2,99 sdb 137,54 5,48 9,23 0,20 587,07 22,73 129,35 0,07 7,70 7,51 16,18 2,17 2,04 Now I researched a little bit on how to find out what e2fsck is doing with all that processor time, and I found the tool strace, which gives me this: kaefert@blechmobil:~$ sudo strace -p3704 lseek(4, 41026998272, SEEK_SET) = 41026998272 write(4, "\212\354K[_\361\3nl\212\245\352\255jR\303\354\312Yv\334p\253r\217\265\3567\325\257\3766"..., 4096) = 4096 lseek(4, 48404766720, SEEK_SET) = 48404766720 read(4, "\7t\260\366\346\337\304\210\33\267j\35\377'\31f\372\252\ffU\317.y\211\360\36\240c\30`\34"..., 4096) = 4096 lseek(4, 41027002368, SEEK_SET) = 41027002368 write(4, "\232]7Ws\321\352\t\1@[+5\263\334\276{\343zZx\352\21\316`1\271[\202\350R`"..., 4096) = 4096 lseek(4, 48404770816, SEEK_SET) = 48404770816 read(4, "\17\362r\230\327\25\346//\210H\v\311\3237\323K\304\306\361a\223\311\324\272?\213\tq \370\24"..., 4096) = 4096 lseek(4, 41027006464, SEEK_SET) = 41027006464 write(4, "\367yy>x\216?=\324Z\305\351\376&\25\244\210\271\22\306}\276\237\370(\214\205G\262\360\257#"..., 4096) = 4096 lseek(4, 48404774912, SEEK_SET) = 48404774912 read(4, "\365\25\0\21|T\0\21}3t_\272\373\222k\r\177\303\1\201\261\221$\261B\232\3142\21U\316"..., 4096) = 4096 ^CProcess 3704 detached around 16 of these lines every second, so 4 read and 4 write operations every second, which I don't consider to be a lot.. And finally, my question: Will this process ever finish? If those numbers from fseek (48404774912) represent bytes, that would be something like 45 gigabytes, with this beeing a 3 terrabyte disk, which would give me 134 days to go, if the speed stays constant, and he scans the disk like this completly and only once. Do you have some advice for me? I have most of the data on that disk elsewhere, but I've put a lot of hours into sorting and merging it to this disk, so I would prefer to getting this disk up and running again, without formatting it anew. I don't think that the hardware is damaged since the disk is only a few months and since I can't see any I/O errors in the dmesg output. UPDATE: I just looked at the strace output again (2012-11-06_2300), now it looks like this: lseek(4, 1419860611072, SEEK_SET) = 1419860611072 read(4, "3#\f\2447\335\0\22A\355\374\276j\204'\207|\217V|\23\245[\7VP\251\242\276\207\317:"..., 4096) = 4096 lseek(4, 43018145792, SEEK_SET) = 43018145792 write(4, "]\206\231\342Y\204-2I\362\242\344\6R\205\361\324\177\265\317C\334V\324\260\334\275t=\10F."..., 4096) = 4096 lseek(4, 1419860615168, SEEK_SET) = 1419860615168 read(4, "\262\305\314Y\367\37x\326\245\226\226\320N\333$s\34\204\311\222\7\315\236\336\300TK\337\264\236\211n"..., 4096) = 4096 lseek(4, 43018149888, SEEK_SET) = 43018149888 write(4, "\271\224m\311\224\25!I\376\16;\377\0\223H\25Yd\201Y\342\r\203\271\24eG<\202{\373V"..., 4096) = 4096 lseek(4, 1419860619264, SEEK_SET) = 1419860619264 read(4, ";d\360\177\n\346\253\210\222|\250\352T\335M\33\260\320\261\7g\222P\344H?t\240\20\2548\310"..., 4096) = 4096 lseek(4, 43018153984, SEEK_SET) = 43018153984 write(4, "\360\252j\317\310\251G\227\335{\214`\341\267\31Y\202\360\v\374\307oq\3063\217Z\223\313\36D\211"..., 4096) = 4096 So this number of the lseeks before the reads, like 1419860619264 are already a lot bigger, standing for 1.29 terabytes if the numbers are bytes, so it doesn't seem to be a linear progress on a big scale, maybe there are only some areas that need work, that have big gaps in between them. (times are in CET)

    Read the article

  • Why would Linux VM in vSphere ESXi 5.5 show dramatically increased disk i/o latency?

    - by mhucka
    I'm stumped and I hope someone else will recognize the symptoms of this problem. Hardware: new Dell T110 II, dual-core Pentium G860 2.9 GHz, onboard SATA controller, one new 500 GB 7200 RPM cabled hard drive inside the box, other drives inside but not mounted yet. No RAID. Software: fresh CentOS 6.5 virtual machine under VMware ESXi 5.5.0 (build 174 + vSphere Client). 2.5 GB RAM allocated. The disk is how CentOS offered to set it up, namely as a volume inside an LVM Volume Group, except that I skipped having a separate /home and simply have / and /boot. CentOS is patched up, ESXi patched up, latest VMware tools installed in the VM. No users on the system, no services running, no files on the disk but the OS installation. I'm interacting with the VM via the VM virtual console in vSphere Client. Before going further, I wanted to check that I configured things more or less reasonably. I ran the following command as root in a shell on the VM: for i in 1 2 3 4 5 6 7 8 9 10; do dd if=/dev/zero of=/test.img bs=8k count=256k conv=fdatasync done I.e., just repeat the dd command 10 times, which results in printing the transfer rate each time. The results are disturbing. It starts off well: 262144+0 records in 262144+0 records out 2147483648 bytes (2.1 GB) copied, 20.451 s, 105 MB/s 262144+0 records in 262144+0 records out 2147483648 bytes (2.1 GB) copied, 20.4202 s, 105 MB/s ... but after 7-8 of these, it then prints 262144+0 records in 262144+0 records out 2147483648 bytes (2.1 GG) copied, 82.9779 s, 25.9 MB/s 262144+0 records in 262144+0 records out 2147483648 bytes (2.1 GB) copied, 84.0396 s, 25.6 MB/s 262144+0 records in 262144+0 records out 2147483648 bytes (2.1 GB) copied, 103.42 s, 20.8 MB/s If I wait a significant amount of time, say 30-45 minutes, and run it again, it again goes back to 105 MB/s, and after several rounds (sometimes a few, sometimes 10+), it drops to ~20-25 MB/s again. Plotting the disk latency in vSphere's interface, it shows periods of high disk latency hitting 1.2-1.5 seconds during the times that dd reports the low throughput. (And yes, things get pretty unresponsive while that's happening.) What could be causing this? I'm comfortable that it is not due to the disk failing, because I also had configured two other disks as an additional volume in the same system. At first I thought I did something wrong with that volume, but after commenting the volume out from /etc/fstab and rebooting, and trying the tests on / as shown above, it became clear that the problem is elsewhere. It is probably an ESXi configuration problem, but I'm not very experienced with ESXi. It's probably something stupid, but after trying to figure this out for many hours over multiple days, I can't find the problem, so I hope someone can point me in the right direction. (P.S.: yes, I know this hardware combo won't win any speed awards as a server, and I have reasons for using this low-end hardware and running a single VM, but I think that's besides the point for this question [unless it's actually a hardware problem].) ADDENDUM #1: Reading other answers such as this one made me try adding oflag=direct to dd. However, it makes no difference in the pattern of results: initially the numbers are higher for many rounds, then they drop to 20-25 MB/s. (The initial absolute numbers are in the 50 MB/s range.) ADDENDUM #2: Adding sync ; echo 3 > /proc/sys/vm/drop_caches into the loop does not make a difference at all. ADDENDUM #3: To take out further variables, I now run dd such that the file it creates is larger than the amount of RAM on the system. The new command is dd if=/dev/zero of=/test.img bs=16k count=256k conv=fdatasync oflag=direct. Initial throughput numbers with this version of the command are ~50 MB/s. They drop to 20-25 MB/s when things go south. ADDENDUM #4: Here is the output of iostat -d -m -x 1 running in another terminal window while performance is "good" and then again when it's "bad". (While this is going on, I'm running dd if=/dev/zero of=/test.img bs=16k count=256k conv=fdatasync oflag=direct.) First, when things are "good", it shows this: When things go "bad", iostat -d -m -x 1 shows this:

    Read the article

  • Should I go along with my choice of web hosting company or still search?

    - by Devner
    Hi all, I have been searching for a good website hosting company that can offer me all the services that I need for hosting my PHP & MySQL based website. Now this is a community based website and users will be able to upload pictures, etc. The hosting company that I have in mind, currently lets me do everything... let me use mail(), supports CRON jobs, etc. Of course they are charging about $6/month. Now the only problem with this company is that they have a limit of 50,000 files that can exist within the hosting account at any time. This kind of contradicts their frontpage ad of "UNLIMITED SPACE" on their website. Apart from this, I know of no other reason why I should not go with this hosting company. But my issue is that 50,000 file limit is what I cannot live with, once the users increase in significant number and the files they upload, exceed 50,000 in number. Now since this is a dynamic website and also includes sensitive issues like payments, etc. I am not sure if I should go ahead with this company as I am just starting out and then later switch over to a better hosting company which does not limit me with 50,000 files. If I need to switch over once I host with this company, I will need to take backups of all the files located in my account (jpg, zip, etc.), then upload them to the new host. I am not aware of any tools that can help me in this process. Can you please mention if you know any? I can go ahead with the other companies right now, but their cost is double/triple of the current price and they all sport less features than my current choice. If I pay more, then they are ready to accommodate my higher demands. Unfortunately, the company that I am willing to go with now, does NOT have any other higher/better plans that I can switch to. So that's the really really bad part. So my question(s): Since I am starting out with my website and since the scope of users initially is going to be less/small, should I go ahead with the current choice and then once the demand increases, switch over to a better provider? If yes, how can I transfer my database, especially the jpg files, etc. to the new provider? I don't even know the tools required to backup and restore to another host. (I don't like this idea but still..) Should I go ahead and pay more right now and go with better providers (without knowing if the website is going to do really that well) just for saving myself the trouble of having to take a backup of the 50,000 files and upload to a new host from an old host and just start paying double/triple the price without even knowing if I would receive back the returns as I expected? Backup and Restore in such a bulky numbers is something that I have never done before and hence I am stuck here trying to decide what to do. The price per month is also a considerable factor in my decision. All these web hosting companies say one common thing: It is customers responsibility to backup and restore data and they are not liable for any loss. So no matter what hosting company that I would like to go with, they ask me to take backup via FTP so that I can restore them whenever I want (& it seems to be safer to have the files locally with me). Some are providing tools for backup and some are not and I am not sure how much their backup tools can be trusted considering the disclaimers they have. I have never backed-up and restored 50,000 files from one web host to another, so please, all you experienced people out there, leave your comments and let me know your suggestions so that I can decide. I have spent 2 days fighting with myself trying to decide what to do and finally concluded that this is a double-edged sword and I can't arrive at a satisfactory final decision without involving others suggestions. I believe that someone must be out there who may have had such troublesome decision to make. So all your suggestions to help me make my decision are appreciated. Thank you all.

    Read the article

< Previous Page | 327 328 329 330 331 332 333 334 335 336 337 338  | Next Page >