Search Results

Search found 1753 results on 71 pages for 'consistent hashing'.

Page 59/71 | < Previous Page | 55 56 57 58 59 60 61 62 63 64 65 66 | Next Page >

Python Django sites on Apache+mod_wsgi with nginx proxy: highly fluctuating performance

- by Halfgaar

I have an Ubuntu 10.04 box running several dozen Python Django sites using mod_wsgi (embedded mode; the faster mode, if properly configured). Performance highly fluctuates. Sometimes fast, sometimes several seconds delay. The smokeping graphs are al over the place. Recently, I also added an nginx proxy for the static content, in the hopes it would cure the highly fluctuating performance. But, even though it reduced the number of requests Apache has to process significantly, it didn't help with the main problem. When clicking around on websites while running htop, it can be seen that sometimes requests are almost instant, whereas sometimes it causes Apache to consume 100% CPU for a few seconds. I really don't understand where this fluctuation comes from. I have configured the mpm_worker for Apache like this: StartServers 1 MinSpareThreads 50 MaxSpareThreads 50 ThreadLimit 64 ThreadsPerChild 50 MaxClients 50 ServerLimit 1 MaxRequestsPerChild 0 MaxMemFree 2048 1 server with 50 threads, max 50 clients. Munin and apache2ctl -t both show a consistent presence of workers; they are not destroyed and created all the time. Yet, it behaves as such. This tells me that once a sub interpreter is created, it should remain in memory, yet it seems sites have to reload all the time. I also have a nginx+gunicorn box, which performs quite well. I would really like to know why Apache is so random. This is a virtual host config: <VirtualHost *:81> ServerAdmin [email protected] ServerName example.com DocumentRoot /srv/http/site/bla Alias /static/ /srv/http/site/static Alias /media/ /srv/http/site/media WSGIScriptAlias / /srv/http/site/passenger_wsgi.py <Directory /> AllowOverride None </Directory> <Directory /srv/http/site> Options -Indexes FollowSymLinks MultiViews AllowOverride None Order allow,deny allow from all </Directory> Ubuntu 10.04 Apache 2.2.14 mod_wsgi 2.8 nginx 0.7.65 Edit: I've put some code in the settings.py file of a site that writes the date to a tmp file whenever it's loaded. I can now see that the site is not randomly reloaded all the time, so Apache must be keeping it in memory. So, that's good, except it doesn't bring me closer to an answer... Edit: I just found an error that might also be related to this: File "/usr/lib/python2.6/subprocess.py", line 633, in __init__ errread, errwrite) File "/usr/lib/python2.6/subprocess.py", line 1049, in _execute_child self.pid = os.fork() OSError: [Errno 12] Cannot allocate memory The server has 600 of 2000 MB free, which should be plenty. Is there a limit that is set on Apache or WSGI somewhere?

Read the article
HOSTS ignored when disconnected [closed]

- by Synetech

Problem I’m seeing a strange and extremely frustrating problem. Any system that is not connect to the Internet (Windows 7 shows the no Internet access icon because it cannot constantly ping Microsoft’s servers) cannot even access locally hosted servers. Hypothesis The problem appears to be that the HOSTS file is not being used to resolve DNS entries when there are no active NICs. Tests / Reproduction You can reproduce it as so: Disconnect a system from the Internet (make sure all wired and wireless connections are disconnected). If necessary, add an entry to the HOSTS file (e.g., 127.0.0.1 foobar or 127.0.0.1 foobar.com) Open a command-prompt Type ping foobar or ping foobar.com Observations The screenshots below show a clear and demonstrative example. In the first snap, a laptop is connected to a router wirelessly. The HOSTS file has only three entries and they resolve just fine. In the second snap, the wireless radio is turned off, so the entries in the HOSTS file are ignored. Moreover, notice that pinging localhost still works even without any active NICs (as does 127.0.0.1), but it is using the IPv6 address (must be hard-coded). You can see the same results in Windows XP with no IPv6 installed, so it has nothing to do with IPv6. I tried pining what should have resolved to 127.0.0.1 while the desktop system (with no wireless NICs) was connected via its Ethernet adapter, then again after pulling the cable from the router and waiting a couple of seconds, then again after plugging the cable back in. The same thing happens if instead of pulling out the cable, the NIC is disabled through software (the [Disable] button in the NIC’s Status dialog or via Device Manager). Conclusions It looks as though the HOSTS file is only being read and used if there is an active NIC, otherwise it is being ignored. This makes some sense in that if there are no active network adapters, then presumably there will not be any network activity, and thus no need to resolve host names via the HOSTS file. This assumption is specious however because it precludes locally hosted virtual servers. The HOSTS file should be used regardless of external DNS server connectivity, otherwise you cannot use simple/consistent/testing-production names for locally hosted servers when not connected to the Internet (for example web servers; help servers for Visual Studio, 3dsmax, etc.; and so on). Question Does anyone know how to force Windows to use the HOSTS file even if there are no active NICs? Appendix Figure 1: While the wireless NIC is connected to the router (the cable-modem is in standby, so no external Internet connectivity). Figure 2: With the wireless radio turned off (the Ethernet port is not unconnected in both cases). Figure 3: Same results in XP with no IPv6

Read the article
How do I troubleshoot root cause of a hung windows (2003) server?

- by GregW

I have a pair of Windows (2003 Server) servers both running MS SQL Server (2008 EE) that each hang every few months. This has been occurring intermittently :( for the past 15 months pretty much since we started using the servers. The symptoms are as-follows: I cannot remote desktop in to troubleshoot; when I attempt to, I get stuck on a blank black screen and am never offered a login prompt I can still ping the servers I can still open a SQL connection to the server, and, CURIOUSLY/BIZARRELY, when I do a "select getdate()", the time it returns appears to be stuck on the exact fraction of a second when (I presume) the server hung. Repeated attempts to do "select getdate()" keep getting that same date, suggesting that the clock is frozen. Filesharing attempts to connect to the hung server fail with the error message: "\ServerName is not accessible. You might not have permissions to use this network resource. Contact the administrator of this server to find out if you have access permissions. The server's clock is not synchronized with the primary domain controller's clock." This is consistent with a frozen clock. Post-reboot, if I investigate the Windows Event Viewer logs, I can see many security accesses (coming from me and others) that I recognize were login attempts during the "down" period, but all of them in the security log are associated with that same timestamp of when the server hung. This also suggests the clock is frozen. There is not a clear cause in the Application or System event logs. I have a local Admin account on the server and am in the process of getting a domain-credentialed Admin account for better remote admin access. HP is supposed to be supporting these machines and has some low-level ILO2 access but they seem incapable of finding the root cause. A reboot will "fix" the problem but I would like to get to the root cause and solve the issue. Has anyone ever seen something like this odd clock behavior?! (If it were just one server I'd perhaps say a bad hardware clock, but two?) Can anyone advise me on what I should try to troubleshoot this sort of situation to find the root cause (or what I should tell HP to try?)

Read the article
What other protocols must not be fire-walled for FTP to work?

- by Chris

my Netgear router randomly reset itself the other day loosing all of my config settings: DSL details, Firewall rules, the lot! So I set about restoring all of the details manually, but when it came to configuring the firewall I wanted improve the security by explicitly setting 'deny' rules for everything that I figured is 'non-essential', and (although not necessary) whilst I was at it I set explicit 'allow' for the 'essential' protocols. I'll admit now I didn't really know what I was doing and everything was just 'my best guess', but I enabled only DNS, HTTP, HTTPS, FTP, SFTP, TFTP with everything else blocked. This did not work for me as I could not access 99% of web sites (although strangely Google worked!), so I played around a bit more and found that (oddly) if I disabled just the explicit 'allow' rules then everything worked fine, for browsing anyway. Today I came to work on some web-sites via FTP and just could not get a consistent connection, it kept dropping out after a few files or being blocked by the server or simply not connecting. It would authenticate okay but then stop when retrieving the initial directory listing! e.g.: Status: Delaying connection for 1 second due to previously failed connection attempt... Status: Resolving address of ftp.domain.co.uk Status: Resolving address of ftp.domain.co.uk Status: Connecting to 123.123.123.123:21... Status: Connecting to 123.123.123.123:21... Status: Connection established, waiting for welcome message... Status: Connection established, waiting for welcome message... Response: 421 Too many connections (8) from this IP Error: Could not connect to server Status: Delaying connection for 5 seconds due to previously failed connection attempt... Response: 421 Too many connections (8) from this IP Error: Could not connect to server Status: Delaying connection for 5 seconds due to previously failed connection attempt... I've checked and re-checked the FTP settings (they worked before anyway), I have Googled the I.T. out of the various protocols that I have blocked in the fire-wall but none seem essential to FTP (other than FTP/SFTP etc. which I have passively enabled). I'm (clearly) no server engineer, or protocols / fire-wall expert so I was hoping that some one could maybe shed some light on why my FTP is failing. I've been wondering if I ought to be allowing BGP, BOOTP and/or IDENT (or any others)? What other protocols are required for FTP? Thanks in advance!

Read the article
Tomcat and IIS 7 both on different ip's and different ports

- by n00b

I have Tomcat and IIS 7 installed together on a Windows 2008 server. The machine has two IPs (134.133.1.1 and 134.133.2.2). I want Tomcat to handle 134.133.1.1, on port 80, and IIS to handle both 134.133.2.2, on port 80 AND 134.133.1.1, on port 443, but can't seem to get the last two together (I can get one or the other by themselves on IIS, along with the first IP address on Tomcat). I have configured Tomcat to successfully listen to ip 134.133.1.1, on port 80 with this configuration; <Connector port="80" protocol="HTTP/1.1" address="134.133.1.1" connectionTimeout="20000" redirectPort="8443" /> I also have a site configured in IIS bound to ip 134.133.1.1, on port 443 (SSL). When I turn on IIS, after Tomcat, I can reach both 134.133.1.1:80 (Tomcat) and 134.133.1.1:443 (IIS) successfully (as desired). The problem now comes when I want to introduce a new site via IIS, at the new ip address. In IIS I have setup a new site at IP 134.133.2.2, port 80. I can not start the site. The event log shows this error; Unable to bind to the underlying transport for [::]:80. The IP Listen-Only list may contain a reference to an interface which may not exist on this machine. The data field contains the error number. I think this is because IIS 7 tries to listen to port 80 on all IPs, and it cant because Tomcat is taking port 80 for 134.133.1.1. From reading, the resolution is to specify the IP address you want IIS to bind on port 80. The problem is, when I add 134.133.2.2 to the iplisten list, then I get a 404 when I try navigating to 134.133.1.1:443. I assume this is because IIS is no longer listening to ANY port on 134.133.1.1. How do I resolve this such that IIS will return both sites? EDIT: Per request my IIS binding for site A is 134.133.2.2 on port 80 (http) and 134.133.2.2 on port 443. For site B in IIS, the binding is 134.133.1.1 on port 443 (https). Note the IPs in this example are just for example purposes, but consistent with my setup.

Read the article
Looking for a new backup solution to replace dying tape drive

- by E3 Group

We're running Windows Server 2003 SBS and another machine with Server 2003 Standard on it. The SBS server is about 7 years old running pretty much 24/7 - a HP server of some description. We have an Ultrium 448 cycling LTO2 400GB tapes daily and incrementally backing up approximately 100gb worth of data (20gb C:\ and system state, 40gb exchange, 40gb database for some crap marketing software) on BackupExec 10D. As of 5 months ago, the backups have been consistently failing with IO errors, bad reads and some write errors. When I say consistent, I mean every time and we haven't had a proper backup for the entire 5 months - So if the server explodes tomorrow, 7 years worth of data will just cease to exist. I've only just recently rejoined the company and am looking at rectifying the more concerning problems, so the first thing I did was try a backup to an USB2.0 external drive. It was excruciatingly slow. In fact it was so slow it took 40 hours and it still wasn't finished. I ended up cancelling it and reconfiguring the selections again to reduce file size. This, however, isn't a permanent solution. I concluded that the IO error was either from a faulty tape drive (which has a tape stuck in there right now and not coming out) or from a dying SCSI controller. Neither of them are good news and both are extremely expensive to fix. I'm operating on an extremely low budget so have been looking at outsourcing the backups. A company in Sydney (where I'm located) offer incremental online backups via a NAS. It costs almost double a new tape drive but offers monthly repayments which will let us get through times when cash flow is minimal. It seems like a sweet deal but it is still a little bit pricey. So I'm looking for a cheaper, yet reliable solution. Maybe some in-house NAS or something offsite? The idea is to avoid using tapes. Are there any recommendations for rectifying my current situation? Or are tapes the only way to go? I'm concerned that the server will die one day in the near future and I must be able to restore it to another server with different hardware.

Read the article
How can I verify that my SSD is performing as it should?

- by Jon Skeet

EDIT: Okay, so I've no idea what caused the change, but after trying loads of different things to work out what was wrong, I've rerun the WEI (about the 4th time in total) and the score has jumped to a far more respectable 7.3. I'm going to leave well alone now :) I've got a brand new 256GB SSD (Crucial CT256M225) which should have stellar performance. However, on my (also brand new) Dell Studio 1557 with Windows 7 Professional 64 bit, it's only giving a performance index of 5.9. I realise the performance index should be taken with a bit of a pinch of salt, but I wonder whether something's wrong. Given this paragraph from this MSDN article on Windows 7, I'd expect to see a high 6.X or possible a 7.X figure: In Windows 7, there are new random read, random write and flush assessments. Better SSDs can score above 6.5 all the way to 7.9. To be included in that range, an SSD has to have outstanding random read rates and be resilient to flush and random write workloads. In the Beta timeframe of Windows 7, there was a capping of scores at 1.9, 2.9 or the like if a disk (SSD or HDD) didn’t perform adequately when confronted with our random write and flush assessments. Feedback on this was pretty consistent, with most feeling the level of capping to be excessive. As a result, we now simply restrict SSDs with performance issues from joining the newly added 6.0+ and 7.0+ ranges. SSDs that are not solid performers across all assessments effectively get scored in a manner similar to what they would have been in Windows Vista, gaining no Win7 boost for great random read performance. How can I diagnose any performance issues with either the disk or how Windows 7 is handling it? Are there any particularly good tools you'd recommend? One note of curiosity: I couldn't install the firmware update (to 1916) until I changed my BIOS handling of the drive to ATA mode; after installing the firmware I tried to boot the Windows installation DVD - but that only worked after turning it back to AHCI mode (which I've left it in). Installing Windows 7 took longer than I expected - it sat at the "Windows is loading files" prompt for a very long time. Likewise it was on "Expanding files (0%)" for a long time. Since installation it's been fine though - but I don't know whether it's really providing quite as beefy performance as it should. EDIT: My netbook with the 64GB equivalent drive has a performance index of 6.6...

Read the article
Is my Cisco switch port bad?

- by ewwhite

I've been chasing a packet-loss and network stability issue for a handful of end-users on an internal network for the past few days... These issues surfaced last week, however the location was struck by lightning six weeks ago. I was seeing 5-10% packet loss between a stack of four Cisco 2960's and several PC's and phones on the other side of a 77-meter run. The PC's were run inline with the phones over a trunked link (switchport configuration pastebin). We were seeing dropped calls and interruptions in client-server applications and Microsoft Exchange connectivity. I tried the usual troubleshooting steps remotely, having a local technician do the following during breaks in user and production activity: change cables between the wall jack and device. change patch cables between the patch panel and switch port(s). try different switch ports within the 2960 stack. change end-user devices with known-good equipment (new phones, different PC's). clear switch port interface counters and monitor incrementing errors closely. (Pastebin output of sh int) Pored over the device logs and Observium RRD graphs. No link up/down issues from the switch side. change power strips on the end-user side. test cable runs from the Cisco 2960 using test cable-diagnostics tdr int Gi4/0/9 (clean)* test cable runs with a Tripp-Lite cable tester. (clean) run diagnostics on the switch stack members. (clean) In the end, it took three changes of switch ports to find a stable solution. The only logical conclusion is that a few Cisco 2960 switch ports are bad or flaky... Not dead, but not consistent in behavior either. I'm not used to seeing individual ports die in this manner. What else can I test or check to determine if these devices are bad? Is it common for single ports to have problems, rather than a contiguous bank of ports? BTW - show cable-diagnostics tdr int Gi4/0/14 is very cool... Interface Speed Local pair Pair length Remote pair Pair status --------- ----- ---------- ------------------ ----------- -------------------- Gi4/0/14 1000M Pair A 79 +/- 0 meters Pair B Normal Pair B 75 +/- 0 meters Pair A Normal Pair C 77 +/- 0 meters Pair D Normal Pair D 79 +/- 0 meters Pair C Normal

Read the article
Ubuntu: Multiple NICs, one used only for Wake-On-LAN

- by jcwx86

This is similar to some other questions, but I have a specific need which is not covered in the other questions. I have an Ubuntu server (11.10) with two NICs. One is built into the motherboard and the other is a PCI express card. I want to have my server connected to the internet via my NAT router and also have it able to wake from suspend using a Magic Packet (henceforth referred to as Wake-On-LAN, WOL). I can't do this with just one of the NICs because each has an issue - the built-in NIC will crash the system if it is placed under heavy load (typically downloading data), whilst the PCI express NIC will crash the system if it is used for WOL. I have spent some time investigating these individual problems, to no avail. My plan is thus: use the built-in NIC solely for WOL, and use the PCI express card for all other network communication except WOL. Since I send the WOL Magic Packet to a specific MAC address, there is no danger of hitting the wrong NIC, but there is a danger of using the built-in NIC for general network access, overloading it and crashing the system. Both NICs are wired to the same LAN with address space 192.168.0.0/24. The built-in ethernet card is set to have interface name eth1 and the PCI express card is eth0 in Ubuntu's udev persistent rules (so they stay the same upon reboot). I have been trying to set this up with the /etc/network/interfaces file. Here is where I am currently: auto lo iface lo inet loopback auto eth0 iface eth0 inet static address 192.168.0.3 netmask 255.255.255.0 network 192.168.0.0 broadcast 192.168.0.255 gateway 192.168.0.1 auto eth1 iface eth1 inet static address 192.168.0.254 netmask 255.255.255.0 I think by not specifying a gateway for eth1, I prevent it being used for outgoing requests. I don't mind if it can be reached on 192.168.0.254 on the LAN, i.e. via SSH -- it's IP is irrelevant to WOL, which is based on MAC addresses -- I just don't want it to be used to access internet resources. My kernel routing table (from route -n) is Kernel IP routing table Destination Gateway Genmask Flags Metric Ref Use Iface 0.0.0.0 192.168.0.1 0.0.0.0 UG 100 0 0 eth0 169.254.0.0 0.0.0.0 255.255.0.0 U 1000 0 0 eth0 192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0 192.168.0.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1 My question is this: Is this sufficient for what I want to achieve? My research has thrown up the idea of using static routing to specify that eth1 should only be used for WOL on the local network, but I'm not sure this is necessary. I have been monitoring the activity of the interfaces using iptraf and it seems like eth0 takes the vast majority of the packets, though I am not sure that this will be consistent based on my configuration. Given that if I mess up the configuration, my system will likely crash, it is important to me to have this set up correctly!

Read the article
Why might one host be unable to access the Internet, when it can ping the router and when all other hosts can?

- by user1444233

I have a Draytek Vigor 2830n. It's kicking out a 192.168.3.0 LAN. It performs load-balancing across dual-WAN ports, although I've turned off the second WAN to simplify testing. There are many hosts on the LAN. All IPs are allocated through DHCP, most freely allocated from the pool, but one or two are bound to NIC MAC addresses. All hosts can access the Internet, save one. That host (192.168.3.100 or 'dot100' for short) gets allocated an IP address (and the right gateway address, DNS server addresses, subnet etc.) dot100 can ping itself. It can ping the gateway, and access the latter's web interface via port 80. It's responsive and loss-free (sustained ping over a couple of minutes reports no data loss). Yet, for some reason that evades me, dot100 can't ping an external IP address or domain name. I suspect it's never been able to, because it was getting some Internet access from a second adaptor (different subnet), but that's now been turned off, which exposed the problem. In dot100, I've tried: two operating systems (Windows 8 and Knoppix), to rule out anti-virus programs etc. two physical adaptors two cables, on each adaptor two IPs (e.g. .100 and .103 assigned by Mac and .26 from the pool) both dynamic and assigned (MAC-bound) DHCP-allocated IPs but none of this experiments yielded any variation in the result. dot100 is a crucial host. It's a file server for the network, so I need it to be reliably allocated a consistent IP. Can anyone offer a potential solution or a way forward with the analysis please? My guess My analysis so far leads me to believe it's a router issue. I've checked the web interface very carefully. There are no filters setup in Firewall - General Setup or Filter Setup. I suspect it's a corrupted internal routing table, but the web UI shows this as the Routing table: Key: C - connected, S - static, R - RIP, * - default, ~ - private * 0.0.0.0/ 0.0.0.0 via 62.XX.XX.X WAN1 * 62.XX.XX.X/ 255.255.255.255 via 62.XX.XX.X WAN1 S 82.YY.YYY.YYY/ 255.255.255.255 via 82.YY.YYY.YYY WAN1 C 192.168.1.0/ 255.255.255.0 directly connected WAN2 C~ 192.168.3.0/ 255.255.255.0 directly connected LAN2

Read the article
load-causing processes disappearing from "top" ps -o pcpu shows bogus numbers

- by Alec Matusis

I administer a large number of servers, and I have this problem only with Ubuntu 10.04 LTS: I run a server under normal load (say load average 3.0 on an 8-core server). The "top" command shows processes taking certain % of CPU that cause this load average: say PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 11008 mysql 20 0 25.9g 22g 5496 S 67 76.0 643539:38 mysqld ps -o pcpu,pid -p11008 %CPU PID 53.1 11008 , everything is consistent. The all of the sudden, the process causing the load average disappears from "top", but the process continues to run normally (albeit with a slight performance decrease), and the system load average becomes somewhat higher. The output of ps -o pcpu becomes bogus: # ps -o pcpu,pid -p11008 %CPU PID 317910278 1587 This happened to at least 5 different severs (different brand new IBM System X hardware), each running different software: one httpd 2.2, one mysqld 5.1, and one Twisted Python TCP servers. Each time the kernel was between 2.6.32-32-server and 2.6.32-40-server. I updated some machines to 2.6.32-41-server, and it has not happened on those yet, but the bug is rare (once every 60 days or so). This is from an affected machine: top - 10:39:06 up 73 days, 17:57, 3 users, load average: 6.62, 5.60, 5.34 Tasks: 207 total, 2 running, 205 sleeping, 0 stopped, 0 zombie Cpu(s): 11.4%us, 18.0%sy, 0.0%ni, 66.3%id, 4.3%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 74341464k total, 71985004k used, 2356460k free, 236456k buffers Swap: 3906552k total, 328k used, 3906224k free, 24838212k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 805 root 20 0 0 0 0 S 3 0.0 1493:09 fct0-worker 982 root 20 0 0 0 0 S 1 0.0 111:35.05 fioa-data-groom 914 root 20 0 0 0 0 S 0 0.0 884:42.71 fct1-worker 1068 root 20 0 19364 1496 1060 R 0 0.0 0:00.02 top Nothing causing high load is showing on top, but I have two highly loaded mysqld instances on it, that suddenly show crazy %CPU: #ps -o pcpu,pid,cmd -p1587 %CPU PID CMD 317713124 1587 /nail/encap/mysql-5.1.60/libexec/mysqld and #ps -o pcpu,pid,cmd -p1624 %CPU PID CMD 2802 1624 /nail/encap/mysql-5.1.60/libexec/mysqld Here are the numbers from # cat /proc/1587/stat 1587 (mysqld) S 1212 1088 1088 0 -1 4202752 14307313 0 162 0 85773299069 4611685932654088833 0 0 20 0 52 0 3549 27255418880 5483524 18446744073709551615 4194304 11111617 140733749236976 140733749235984 8858659 0 552967 4102 26345 18446744073709551615 0 0 17 5 0 0 0 0 0 the 14th and 15th numbers according to man proc are supposed to be utime %lu Amount of time that this process has been scheduled in user mode, measured in clock ticks (divide by sysconf(_SC_CLK_TCK). This includes guest time, guest_time (time spent running a virtual CPU, see below), so that applications that are not aware of the guest time field do not lose that time from their calculations. stime %lu Amount of time that this process has been scheduled in kernel mode, measured in clock ticks (divide by sysconf(_SC_CLK_TCK). On a normal server, these numbers are advancing, every time I check the /proc/PID/stat. On a buggy server, these numbers are stuck at a ridiculously high value like 4611685932654088833, and it's not changing. Has anyone encountered this bug?

Read the article
Google Apps For Business, SSO, AD FS 2.0 and AD

- by Dominique dutra

We are a small company with 22 people in the office. We had a lot of problems with e-mail in the past so I decided to change over to Google Apps for Business. It is the perfect solution for us, except for one thing: I need to be able to control the access to the mailboxes. Only users inside the office, authenticated to AD, or users authenticated to our VPN can connect to gmail. From what I've read it is possible using the SSO (Single Sign On) solution provided by Google - but i am having some trouble finding consistent information about it. First of all, our infrastructure: Windows Server 2008 R2 Active Directory, one domain only. Kerio Control for QoS and VPN. That's about it on our side. On Google Apps' side, I have one account, and 03 domains that my users use to log in. The main domain has most of the users, but the are a couple of people that login using one of the subdomains. I have a 03 domains because I run mail for 03 companies and wanted all to be in within the same control panel. Well, I found some guides on the internet but none of them cover the AD FS installation part. I've read somewhere that I needed to download AD FS 2.0 directly from Microsoft.com, because the one that came with Windows Server was a old version. I downloaded it (adfsSetup.exe) and tried to install but got an error, saying that I needed a Windows Server 2008 Sp2 for that program. My Windows Server 2008 is R2. I really need some help here, this is very importand, I dont want to have to pay $1000 for a SSO solution when i have an AD set up. Can someone please point me out to the right direction? Where can I find an AD FS 2.0 setup compatible with R2 would be a good start, or the one that came with r2 is already the 2.0 version. After the initial setup, there are some guides on the internet about the Google Apps part. It seems to be really easy. I also tried adding AD FS role, but there are a bunch of options wich I have no idea what means, and I coudn't find any guide covering that on the internet. I dont have a lot of experience with Windows Server, but I have a company wich is certificated and provide us with support. I can ask for their help in the later setup, but I dont think ADFS is a very common thing to deal with.

Read the article
Wireless traffic stops when downloading large files at high speed: packets lost (Linksys WRT120N router)

- by Torious

The problem Note: First I'd like to understand WHY this is happening. Ofcourse, a solution would be nice too. :) When downloading a large file over HTTP at high-speeds, my wireless traffic basically stops: I can't open webpages and the download itself pauses. It pauses pretty much immediately after starting it; sometimes at 800 KB, sometimes at a few MB. After some time, the download (and other traffic) resumes, but the problem keeps reoccurring during the same download. The problem does not occur when using a wired connection through the same router (Linskys WRT120N). Also note that the connection is not dropped when this happens. It's just that the traffic stops and I can't browse to web pages, etc. (SYN packets are sent but nothing is received, etc.) Inspection with Wireshark shows that the following happens: Server sends data packets which are acknowledged by client Server sends a packet, but SEQ indicates some packets were lost (6 packets in one occurrence). Server sends a few more packets and client acknowledges these using "selective acknowledgement" Server stops sending data for a while (since the lost packets were not acknowledged or the router stops forwarding them?) Eventually, server does a "retransmission" and traffic resumes as normal. This all seems normal behavior to me when packet loss occurs. It's the consistent packet loss throughout a large, high-speed download that puzzles me. What might cause this? My own idea is the following: My internet is pretty fast (100 mbps), so when starting a large-file download, the router buffers the incoming data (since wireless introduces some slight delay / lower speed, in part due to other networks), but the buffer overflows and the router drops packets to regulate traffic (and because it has no choice). But how could that happen? Doesn't the TCP window size limit the amount of data that can go unacknowledged? So how can the router's buffer overflow if there can only be like 64 KB waiting to be acknowledged? Note: I've disabled TCP window scaling and dynamic window size through netsh options, in an attempt to fix this, but it doesn't seem to matter. Also, Wireshark shows a pattern of the server sending 2 packets (of 1514 bytes) and the client sending an ACK, so does that rule out a possible buffer overflow? And a few more subsequent packets are received... I'm at a loss here. Thanks for any insights. Things that are (probably) NOT the cause / I have experimented with The browser Various TCP options in Windows 7 (netsh etc.) Router settings such as MTU, beacon interval, UPnP, ...

Read the article
SCCM SP2 - OOB Management Certificates Problems

- by Achinoam

I have a vPro client computer with AMT 4.0. It was importeed successfully via the Import OOB Computers wizard, and after sending a "Hello- packet" it became provisioned. (The SCCM GUI displays AMT Status: Provisioned). But when I try to perform power operations on this machine, they always fail with the following lines in the log: AMT Operation Worker: Wakes up to process instruction files 7/29/2009 10:59:29 AM 2176 (0x0880) AMT Operation Worker: Wait 20 seconds... 7/29/2009 10:59:29 AM 2176 (0x0880) Auto-worker Thread Pool: Work thread 3884 started 7/29/2009 10:59:29 AM 3884 (0x0F2C) session params : https:/ / amt4.domaindemo.com:16993 , 11001 7/29/2009 10:59:29 AM 3884 (0x0F2C) ERROR: Invoke(invoke) failed: 80020009argNum = 0 7/29/2009 10:59:31 AM 3884 (0x0F2C) Description: A security error occurred 7/29/2009 10:59:31 AM 3884 (0x0F2C) Error: Failed to Invoke CIM_BootConfigSetting::ChangeBootOrder_INPUT action. 7/29/2009 10:59:31 AM 3884 (0x0F2C) AMT Operation Worker: AMT machine amt4.domaindemo.com can't be waken up. Error code: 0x80072F8F 7/29/2009 10:59:31 AM 3884 (0x0F2C) Auto-worker Thread Pool: Warning, Failed to run task this time. Will retry(1) it 7/29/2009 10:59:31 AM 3884 (0x0F2C) After investigation, I've seen that the problem occurs already on the 2nd stage of the provisioning: Start 2nd stage provision on AMT device amt4.domaindemo.com. 8/2/2009 4:55:12 PM 2944 (0x0B80) session params : https: / / amt4.domaindemo.com:16993 , 11001 8/2/2009 4:55:12 PM 2944 (0x0B80) Delete existing ACLs... 8/2/2009 4:55:12 PM 2944 (0x0B80) ERROR: Invoke(invoke) failed: 80020009argNum = 0 8/2/2009 4:55:14 PM 2944 (0x0B80) Description: A security error occurred 8/2/2009 4:55:14 PM 2944 (0x0B80) Error: Cannot Enumerate User Acl Entries. 8/2/2009 4:55:14 PM 2944 (0x0B80) Error: CSMSAMTProvTask::StartProvision Fail to call AMTWSManUtilities::DeleteACLs 8/2/2009 4:55:14 PM 2944 (0x0B80) Error: Can not finish WSMAN call with target device. 1. Check if there is a winhttp proxy to block connection. 2. Service point is trying to establish connection with wireless IP address of AMT firmware but wireless management has NOT enabled yet. AMT firmware doesn't support provision through wireless connection. 3. For greater than 3.x AMT, there is a known issue in AMT firmware that WSMAN will fail with FQDN longer than 44 bytes. (MachineId = 17) 8/2/2009 4:55:14 PM 2944 (0x0B80) STATMSG: ID=7208 SEV=E LEV=M SOURCE="SMS Server" COMP="SMS_AMT_OPERATION_MANAGER" SYS=JE-DEV-MS0 SITE=JR1 PID=1756 TID=2944 GMTDATE=Sun Aug 02 14:55:14.281 2009 ISTR0="amt4.domaindemo.com" ISTR1="amt4.domaindemo.com" ISTR2="" ISTR3="" ISTR4="" ISTR5="" ISTR6="" ISTR7="" ISTR8="" ISTR9="" NUMATTRS=0 8/2/2009 4:55:14 PM 2944 (0x0B80) This error is consistent with all the other 2nd stage provisioning tasks. (Add ACLs, Enable Web UI, etc.) I've opened the certification authority, and I see that the certificates were issued to the SCCM Site server instead of the AMT client! What could be the reason for this failure? What is the problematic definition for the certificate? Thank you in advance!!!

Read the article
SCCM SP2 - OOB Management Certificates Problems

- by Achinoam

Hi experts, I have a vPro client computer with AMT 4.0. It was importeed successfully via the Import OOB Computers wizard, and after sending a "Hello- packet" it became provisioned. (The SCCM GUI displays AMT Status: Provisioned). But when I try to perform power operations on this machine, they always fail with the following lines in the log: AMT Operation Worker: Wakes up to process instruction files 7/29/2009 10:59:29 AM 2176 (0x0880) AMT Operation Worker: Wait 20 seconds... 7/29/2009 10:59:29 AM 2176 (0x0880) Auto-worker Thread Pool: Work thread 3884 started 7/29/2009 10:59:29 AM 3884 (0x0F2C) session params : https:/ / amt4.domaindemo.com:16993 , 11001 7/29/2009 10:59:29 AM 3884 (0x0F2C) ERROR: Invoke(invoke) failed: 80020009argNum = 0 7/29/2009 10:59:31 AM 3884 (0x0F2C) Description: A security error occurred 7/29/2009 10:59:31 AM 3884 (0x0F2C) Error: Failed to Invoke CIM_BootConfigSetting::ChangeBootOrder_INPUT action. 7/29/2009 10:59:31 AM 3884 (0x0F2C) AMT Operation Worker: AMT machine amt4.domaindemo.com can't be waken up. Error code: 0x80072F8F 7/29/2009 10:59:31 AM 3884 (0x0F2C) Auto-worker Thread Pool: Warning, Failed to run task this time. Will retry(1) it 7/29/2009 10:59:31 AM 3884 (0x0F2C) After investigation, I've seen that the problem occurs already on the 2nd stage of the provisioning: Start 2nd stage provision on AMT device amt4.domaindemo.com. 8/2/2009 4:55:12 PM 2944 (0x0B80) session params : https: / / amt4.domaindemo.com:16993 , 11001 8/2/2009 4:55:12 PM 2944 (0x0B80) Delete existing ACLs... 8/2/2009 4:55:12 PM 2944 (0x0B80) ERROR: Invoke(invoke) failed: 80020009argNum = 0 8/2/2009 4:55:14 PM 2944 (0x0B80) Description: A security error occurred 8/2/2009 4:55:14 PM 2944 (0x0B80) Error: Cannot Enumerate User Acl Entries. 8/2/2009 4:55:14 PM 2944 (0x0B80) Error: CSMSAMTProvTask::StartProvision Fail to call AMTWSManUtilities::DeleteACLs 8/2/2009 4:55:14 PM 2944 (0x0B80) Error: Can not finish WSMAN call with target device. 1. Check if there is a winhttp proxy to block connection. 2. Service point is trying to establish connection with wireless IP address of AMT firmware but wireless management has NOT enabled yet. AMT firmware doesn't support provision through wireless connection. 3. For greater than 3.x AMT, there is a known issue in AMT firmware that WSMAN will fail with FQDN longer than 44 bytes. (MachineId = 17) 8/2/2009 4:55:14 PM 2944 (0x0B80) STATMSG: ID=7208 SEV=E LEV=M SOURCE="SMS Server" COMP="SMS_AMT_OPERATION_MANAGER" SYS=JE-DEV-MS0 SITE=JR1 PID=1756 TID=2944 GMTDATE=Sun Aug 02 14:55:14.281 2009 ISTR0="amt4.domaindemo.com" ISTR1="amt4.domaindemo.com" ISTR2="" ISTR3="" ISTR4="" ISTR5="" ISTR6="" ISTR7="" ISTR8="" ISTR9="" NUMATTRS=0 8/2/2009 4:55:14 PM 2944 (0x0B80) This error is consistent with all the other 2nd stage provisioning tasks. (Add ACLs, Enable Web UI, etc.) I've opened the certification authority, and I see that the certificates were issued to the SCCM Site server instead of the AMT client! What could be the reason for this failure? What is the problematic definition for the certificate? Thank you in advance!!!

Read the article
How to setup ssh's umask for all type of connections

- by Unode

I've been searching for a way to setup OpenSSH's umask to 0027 in a consistent way across all connection types. By connection types I'm referring to: sftp scp ssh hostname ssh hostname program The difference between 3. and 4. is that the former starts a shell which usually reads the /etc/profile information while the latter doesn't. In addition by reading this post I've became aware of the -u option that is present in newer versions of OpenSSH. However this doesn't work. I must also add that /etc/profile now includes umask 0027. Going point by point: sftp - Setting -u 0027 in sshd_config as mentioned here, is not enough. If I don't set this parameter, sftp uses by default umask 0022. This means that if I have the two files: -rwxrwxrwx 1 user user 0 2011-01-29 02:04 execute -rw-rw-rw- 1 user user 0 2011-01-29 02:04 read-write When I use sftp to put them in the destination machine I actually get: -rwxr-xr-x 1 user user 0 2011-01-29 02:04 execute -rw-r--r-- 1 user user 0 2011-01-29 02:04 read-write However when I set -u 0027 on sshd_config of the destination machine I actually get: -rwxr--r-- 1 user user 0 2011-01-29 02:04 execute -rw-r--r-- 1 user user 0 2011-01-29 02:04 read-write which is not expected, since it should actually be: -rwxr-x--- 1 user user 0 2011-01-29 02:04 execute -rw-r----- 1 user user 0 2011-01-29 02:04 read-write Anyone understands why this happens? scp - Independently of what is setup for sftp, permissions are always umask 0022. I currently have no idea how to alter this. ssh hostname - no problem here since the shell reads /etc/profile by default which means umask 0027 in the current setup. ssh hostname program - same situation as scp. In sum, setting umask on sftp alters the result but not as it should, ssh hostname works as expected reading /etc/profile and both scp and ssh hostname program seem to have umask 0022 hardcoded somewhere. Any insight on any of the above points is welcome. EDIT: I would like to avoid patches that require manually compiling openssh. The system is running Ubuntu Server 10.04.01 (lucid) LTS with openssh packages from maverick. Answer: As indicated by poige, using pam_umask did the trick. The exact changes were: Lines added to /etc/pam.d/sshd: # Setting UMASK for all ssh based connections (ssh, sftp, scp) session optional pam_umask.so umask=0027 Also, in order to affect all login shells regardless of if they source /etc/profile or not, the same lines were also added to /etc/pam.d/login. EDIT: After some of the comments I retested this issue. At least in Ubuntu (where I tested) it seems that if the user has a different umask set in their shell's init files (.bashrc, .zshrc,...), the PAM umask is ignored and the user defined umask used instead. Changes in /etc/profile did't affect the outcome unless the user explicitly sources those changes in the init files. It is unclear at this point if this behavior happens in all distros.

Read the article
Rebasing a branch which is public

- by Dror

I'm failing to understand how to use git-rebase, and I consider the following example. Let's start a repository in ~/tmp/repo: $ git init Then add a file foo $ echo "hello world" > foo which is then added and committed: $ git add foo $ git commit -m "Added foo" Next, I started a remote repository. In ~/tmp/bare.git I ran $ git init --bare In order to link repo to bare.git I ran $ git remote add origin ../bare.git/ $ git push --set-upstream origin master Next, lets branch, add a file and set an upstream for the new branch b1: $ git checkout -b b1 $ echo "bar" > foo2 $ git add foo2 $ git commit -m "add foo2 in b1" $ git push --set-upstream origin b1 Now it is time to switch back to master and change something there: $ echo "change foo" > foo $ git commit -a -m "changed foo in master" $ git push At this point in master the file foo contain changed foo, while in b1 it is still hello world. Finally, I want to sync b1 with the progress made in master. $ git checkout b1 $ git fetch origin $ git rebase origin/master At this point git st returns: # On branch b1 # Your branch and 'origin/b1' have diverged, # and have 2 and 1 different commit each, respectively. # (use "git pull" to merge the remote branch into yours) # nothing to commit, working directory clean At this point the content of foo in the branch b1 is change foo as well. So what does this warning mean? I expected I should do a git push, git suggests to do git pull... According to this answer, this is more or less it, and in his comment @FrerichRaabe explicitly say that I don't need to do a pull. What's going on here? What is the danger, how should one proceed? How should the history be kept consistent? What is the interplay between the case described above and the following citation: Do not rebase commits that you have pushed to a public repository. taken from pro git book. I guess it is somehow related, and if not I would love to know why. What's the relation between the above scenario and the procedure I described in this post.

Read the article
What is good usage scenario for Rackspace Cloud Files CDN (powered by AKAMAI) [closed]

- by Andrew Smith

I have just setup my website as static page via Rackspace CDN / Akamai. www.example.co.uk is an alias for d9771e6f24423091aebc-345678991111238fabcdef6114258d0e1.r61.cf3.rackcdn.com. d9771e6f24423091aebc-345678991111238fabcdef6114258d0e1.r61.cf3.rackcdn.com is an alias for a61.rackcdn.com. a61.rackcdn.com is an alias for a61.rackcdn.com.mdc.edgesuite.net. a61.rackcdn.com.mdc.edgesuite.net is an alias for a63.dscg10.akamai.net. a63.dscg10.akamai.net has address 63.166.98.41 a63.dscg10.akamai.net has address 63.166.98.40 a63.dscg10.akamai.net has IPv6 address 2001:428:4c02::cda8:ecb9 a63.dscg10.akamai.net has IPv6 address 2001:428:4c02::cda8:ed09 The HTTP header: HTTP/1.0 200 OK Last-Modified: Fri, 19 Oct 2012 23:27:41 GMT ETag: fdf9e14b77def799e09e8ce815a521da X-Timestamp: 1350689261.23382 Content-Type: text/html X-Trans-Id: tx457979be3bd746c2b4e5403a1189cdbc Cache-Control: public, max-age=900 Expires: Sat, 27 Oct 2012 22:18:56 GMT Date: Sat, 27 Oct 2012 22:03:56 GMT Content-Length: 7124 Connection: keep-alive I am wondering, if it's really the fastest solution to power the website? By investigating it thru http://www.just-ping.com/ it seems, that from many places the ping is very high, and during quick investigation I found that they use GeoIP to resolve addresses based on WHOIS, which is not accurate and because of that from many places the ping is above 300ms (for example, if ISP is in balgladore and request is routed to bangladore even if it's 300ms, for period of 1 month), while by just using Amazon Web Services and Route 53 Anycast DNS servers and only 4 EC2 instances it seems that for example India is always below 100ms, while using Akamai it goes above 300ms in some cases, and this is because Route 53 is using BGP. By quickly checking the Akamai, it seems that they are not getting feedback from the traffic - the high ping stays constant even if I keep downloading large files and videos, which is opposite to what they say on their website. They state, that they optimize the performance by taking feedback from the requests, while it seems they just use GeoIP with per City resolution (which are mostly big cities). Because of this, AWS with Route 53 / Anycast DNS seems to be much more reliable, as well EdgeCast which is using BGP, but I dont know how much does it cost to deploy static website. Actually, I dont know if EdgeCast is not a lie, because from isolated places there are many errors - so their performance is at the cost of quality of delivery, because of BGP switching the routes during transfer of large files. So I was wondering, what is really Akamai good for, because they dont seem to pose any strength in any field in what I do understand now, except they offer some software based WAF on their website, but what I really care about is the core distribiution, so the question is? Is really Akamai good for Videos? For static websites? ??? I found so far AWS most usable with most consistent ping and stable transfers.

Read the article
understanding my site's DNS records

- by DaveM

firstly apologies for using the word 'pointage' this is the word my french domain registrar uses so I may have used to wrong term. OK I would like to better understand what is going on on my 'pointage' record on my domain registrars site. for my (currently empty) web site it reports the following details... Type : Host : Destination A : www.mydomain.org : 62.210.176.146 A : mail.mydomain.org : 84.246.225.176 Mx : .mydomain.org : mail.mydomain.org I think I understand the MX record, that simply relays anything onto the mail.mydomain.org location. However why are the destination for the www and mail domains different. Even more confusing (for me) is the fact that if I ping either of www.mydomain.org or mail.mydomain.org the ping returns a different IP address. This IP address is consistent with that of my server (ie 92.39.247.92). So what exactly is going on ? I'm sure I could find the information on the web,I've read a few thing on the debianhelp site regarding DNS records, and it seems to suggest that the record should be a reverse lookup, but certains isn't the reverse of my servers IP ? but I don't what I should be looking for, so links to docs and search terms for google will be happily accepteed (even though they go against the grain of SO answers to question). thanks in advance. David. ps. I should add that everything seems to work just fine, and I've just descovered this part of the management page of my registrar. Edit: Addition of DNS records and ping results. The DNS record for the site. From what I've read there should only realy be a single 'A' record, so has something gone wrong ? should I change it (remove the extras and then just point www.facilitee.org - .facilitee.org and mail.facilitee.org - .facilitee.org here is the DNS record A www.facilitee.org ? 92.39.247.92 A .facilitee.org ? 92.39.247.92 A mail.facilitee.org ? 92.39.247.92 A webmail.facilitee.org ? 92.39.247.92 MX .facilitee.org ? mail.facilitee.org ping results... ~$ ping www.facilitee.org PING www.facilitee.org (92.39.247.92) 56(84) bytes of data. 64 bytes from vps4576-cloud.dns26.com (92.39.247.92): ~$ ping mail.facilitee.org PING mail.facilitee.org (92.39.247.92) 56(84) bytes of data. 64 bytes from vps4576-cloud.dns26.com (92.39.247.92): So the DNS and the ping correspond, but the 'pointage' doesn't. ~ how can I get a report of the pointage records other than from my registrar ?

Read the article
High CPU usage - symptoms moving from server to server after bouncing

- by grt3kl

First off, I apologize if I didn't include enough information to properly troubleshoot this issue. This sort of thing isn't my specialty, so it is a learning process. If there's something I need to provide, please let me know and I'll be happy to do what I can. The images associated with my question are at the bottom of this post. We are dealing with a clustered environment of four WebLogic 9.2 Java application servers. The cluster utilizes a round-robin load algorithm. Other details include: Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_12-b04) BEA JRockit(R) (build R27.4.0-90_CR352234-91983-1.5.0_12-20071115-1605-linux-x86_64, compiled mode) Basically, I started looking at the servers' performance because our customers are seeing lots of lag at various times of the day. Our servers should easily handle the loads they are given, so it's not clear what's going on. Using HP Performance Manager, I generated some graphs that indicate that the CPU usage is completely out of whack. It seems that, at any given point, one or more of the servers has a CPU utilization of over 50%. I know this isn't particularly high, but I would say it is a red flag based on the CPU utilization of the other servers in the WebLogic cluster. Interesting things to note: The high CPU utilization was occurring only on server02 for several weeks. The server crashed (extremely rare; we are not sure if it's related to this) and upon starting it back up, the CPU utilization was normal on all 4 servers. We restarted all 4 managed servers and the application server (on server01) yesterday, on 2/28. As you can see, server03 and server04 picked up the behavior that was seen on server02 before. The CPU utilization is a Java process owned by the application user (appown). The number of transactions is consistent across all servers. It doesn't seem like any one server is actually handling more than another. If anyone has any ideas or can at least point me in the right direction, that would be great. Again, please let me know if there is any additional information I should post. Thanks!

Read the article
On ESXi, guest machines hang for significant intervals compared to real machines. How can I fix this?

- by Tarbox

This is ESXi version 5.0.0. We plan on upgrading to 5.5 eventually. I have four code profiles, two taken on a real, unvirtualized machine, two taken on a virtual machine. Ordering the list of subroutines by time spent in each one, the two real profiles are practically identical. The two virtual profiles are different from each other and from the real profiles: a subset of subroutines are taking a lot more time on the virtual machines, and the subset is different for each run. The two virtual profiles take a similar amount of time, which is 3 times the amount of time the real profiles take. This gross "how long does it take?" result is consistent after hundreds of tests across three different virtual machines on two different host machines -- the virtual machine is just slower. I've only the code profiling on the four, however. Here's the most guilty set of lines: This is the real machine: 8µs $text = '' unless defined $text; 1.48ms foreach ( split( "\n", $text ) ) { This is the first run on the virtual machine: 20.1ms $text = '' unless defined $text; 1.49ms foreach ( split( "\n", $text ) ) { This is the second run on the virtual machine: 6µs $text = '' unless defined $text; 21.9ms foreach ( split( "\n", $text ) ) { My WAG is that the VM is swapping out the thread and then swapping it back in, destroying some level of cache in the process, but these code profiles were taken when the vm in question was the only active vm on the host, so... what? What does that mean? The guest itself is under light load, this is a latency problem for my users rather than throughput. The host is also under a light load, if I knew what resources to assign where, I could do it without worrying about the cost. I've attempted to lock memory, reserve cpu, assign a restrictive affinity, and disable hyperthread sharing. They don't help, it still takes the VM 2-4x the amount of time to do the same thing as the real machine. The host the tests were run on is 6x2.50GHz, Intel Xeon E5-26400 w/ 16gigs of ram. The guest exhibits the same performance under a wide combination of settings. The real machine is 4x2.13GHz, Xeon E5506 w/ 2 gigs of ram. Thank you for all advice.

Read the article
Single domain name potentially resolving to multiple servers

- by Jace

first time here at Server Fault, and I apologize in advance that this domain stuff is not really my strength. Any and all suggestions are much appreciated. I am completely lost and incredibly tired! I've inherited an incredibly convoluted system from my predecessor, and I'm trying to find a way to solve it - or I need to be told that it just isn't possible. I've got an old site on ServerA (some kind of Linux distribution), with the domain SomeDomain.com There is a new site sitting on ServerB (Ubuntu), with the intention of having SomeDomain.com to serve it in the future (it is replacing the old site) ServerA also has a web app that is currently in use by other departments within the company (accessible at SomeDomain.com/web-app/) The goal: To have SomeDomain.com and all extensions of this domain name (sub-domains, URL's etc.) serve the new site on ServerB. BUT, the URL SomeDomain.com/web-app/ must serve the Web App on ServerA. The Catch: The ServerA is a shared server with a hosting company with VERY limiting restrictions in place - I cannot adjust DNS settings (apart from Name servers - but cannot set A records or anything, I have full access to ServerB to do as I wish). Therefore the web-app MUST be served from SomeDomain.com/web-app/ and not from a sub-domain or anything. These limitations make migrating the web-app from Server A to Server B rather undesirable, AND this web-app will be replaced in the near future, so it isn't worth the effort right now. Therefore, ultimately I will want 1 domain name to resolve to Server B's IP address most of the time, but in the event that the URL is SomeDomain.com/web-app/, it should resolve to Server A's IP. Note: The domain names don't, technically, have to resolve to one IP or another - but ultimately the URL's must stay consistent Some things I have tried: I've looked into mod_rewrite and .htaccess to try and achieve this effect, but it doesn't look like it's going to work for me - but I may have done it wrong (On Server B, I just checked if the request URI was /web-app/ and tried to serve the /web-app/ folder on Server A) I do have the ability to modify the name servers on both servers I am not able to make a sub domain on Server A that points back to Server A (I assume because the hosting company's servers use the URL to determine what site the serve). I figured this could be good as I'd could set an A record on Server B to point to the web app on Server A - but alas, Server A requires SomeDomain.com. If there is any more information I can give, please let me know. I need a nudge in the right direction, ideas or a solution.

Read the article
Weird nfs performance: 1 thread better than 8, 8 better than 2!

- by Joe

I'm trying to determine the cause of poor nfs performance between two Xen Virtual Machines (client & server) running on the same host. Specifically, the speed at which I can sequentially read a 1GB file on the client is much lower than what would be expected based on the measured network connection speed between the two VMs and the measured speed of reading the file directly on the server. The VMs are running Ubuntu 9.04 and the server is using the nfs-kernel-server package. According to various NFS tuning resources, changing the number of nfsd threads (in my case kernel threads) can affect performance. Usually this advice is framed in terms of increasing the number from the default of 8 on heavily-used servers. What I find in my current configuration: RPCNFSDCOUNT=8: (default): 13.5-30 seconds to cat a 1GB file on the client so 35-80MB/sec RPCNFSDCOUNT=16: 18s to cat the file 60MB/s RPCNFSDCOUNT=1: 8-9 seconds to cat the file (!!?!) 125MB/s RPCNFSDCOUNT=2: 87s to cat the file 12MB/s I should mention that the file I'm exporting is on a RevoDrive SSD mounted on the server using Xen's PCI-passthrough; on the server I can cat the file in under seconds ( 250MB/s). I am dropping caches on the client before each test. I don't really want to leave the server configured with just one thread as I'm guessing that won't work so well when there are multiple clients, but I might be misunderstanding how that works. I have repeated the tests a few times (changing the server config in between) and the results are fairly consistent. So my question is: why is the best performance with 1 thread? A few other things I have tried changing, to little or no effect: increasing the values of /proc/sys/net/ipv4/ipfrag_low_thresh and /proc/sys/net/ipv4/ipfrag_high_thresh to 512K, 1M from the default 192K,256K increasing the value of /proc/sys/net/core/rmem_default and /proc/sys/net/core/rmem_max to 1M from the default of 128K mounting with client options rsize=32768, wsize=32768 From the output of sar -d I understand that the actual read sizes going to the underlying device are rather small (<100 bytes) but this doesn't cause a problem when reading the file locally on the client. The RevoDrive actually exposes two "SATA" devices /dev/sda and /dev/sdb, then dmraid picks up a fakeRAID-0 striped across them which I have mounted to /mnt/ssd and then bind-mounted to /export/ssd. I've done local tests on my file using both locations and see the good performance mentioned above. If answers/comments ask for more details I will add them.

Read the article
How can I minimize the amount my router slows down my Internet connection speed?

- by Lord Torgamus

Background I'm working with what I assume is a pretty common Internet setup: a cable modem, a wireless router and a few Internet-connected devices. Lately, I've started being more demanding on my Internet connection, and noticed that using my router slows down my download speeds considerably. I just kind of dealt with it until Zune Marketplace on the Xbox 360 told me that a movie was going to take well over ten hours to download, and I just didn't want to wait that long. Good little scientist that I am, I tried to reduce the problem down to one variable. The test As a control, I turned off all the devices in the house that use wireless Internet, and unplugged all the wired devices except for the Xbox. I also power-cycled both the modem and the router. I then tried to download the movie again, and was told that it would still take over ten hours. Next, I unplugged the router, and connected the Xbox directly to the modem. The movie downloaded in just over one hour. As far as I can tell, this means that my ISP, other cable users near me, the remote servers, anything wireless-related and my machines' disk speeds can't be at fault. A similar experiment that replaced the Xbox with a wired laptop produced similar results. To me, this says "the router is responsible for things taking around ten times longer to download." My question I'd still prefer to use the router for a few reasons: it's a pain to connect and disconnect everything every time there's a big file to download direct connection to the modem isn't good for security only one machine can be connected directly to the modem at a time What can I do to have fast connection speeds while still using the router? I don't mind turning other machines off, as long as I don't have to mess with power and ethernet cables. EDIT : After asking this followup question and then this one, I installed dd-wrt on my router, and I seem to be getting higher and more consistent speeds. Perhaps more importantly, my memory use is fairly constant. I know this isn't an answer — which is why I'm not posting it as an answer — but it is how I resolved the situation, and hopefully it'll be helpful for someone.

Read the article
The blocking nature of aggregates

- by Rob Farley

I wrote a post recently about how query tuning isn’t just about how quickly the query runs – that if you have something (such as SSIS) that is consuming your data (and probably introducing a bottleneck), then it might be more important to have a query which focuses on getting the first bit of data out. You can read that post here. In particular, we looked at two operators that could be used to ensure that a query returns only Distinct rows. and The Sort operator pulls in all the data, sorts it (discarding duplicates), and then pushes out the remaining rows. The Hash Match operator performs a Hashing function on each row as it comes in, and then looks to see if it’s created a Hash it’s seen before. If not, it pushes the row out. The Sort method is quicker, but has to wait until it’s gathered all the data before it can do the sort, and therefore blocks the data flow. But that was my last post. This one’s a bit different. This post is going to look at how Aggregate functions work, which ties nicely into this month’s T-SQL Tuesday. I’ve frequently explained about the fact that DISTINCT and GROUP BY are essentially the same function, although DISTINCT is the poorer cousin because you have less control over it, and you can’t apply aggregate functions. Just like the operators used for Distinct, there are different flavours of Aggregate operators – coming in blocking and non-blocking varieties. The example I like to use to explain this is a pile of playing cards. If I’m handed a pile of cards and asked to count how many cards there are in each suit, it’s going to help if the cards are already ordered. Suppose I’m playing a game of Bridge, I can easily glance at my hand and count how many there are in each suit, because I keep the pile of cards in order. Moving from left to right, I could tell you I have four Hearts in my hand, even before I’ve got to the end. By telling you that I have four Hearts as soon as I know, I demonstrate the principle of a non-blocking operation. This is known as a Stream Aggregate operation. It requires input which is sorted by whichever columns the grouping is on, and it will release a row as soon as the group changes – when I encounter a Spade, I know I don’t have any more Hearts in my hand. Alternatively, if the pile of cards are not sorted, I won’t know how many Hearts I have until I’ve looked through all the cards. In fact, to count them, I basically need to put them into little piles, and when I’ve finished making all those piles, I can count how many there are in each. Because I don’t know any of the final numbers until I’ve seen all the cards, this is blocking. This performs the aggregate function using a Hash Match. Observant readers will remember this from my Distinct example. You might remember that my earlier Hash Match operation – used for Distinct Flow – wasn’t blocking. But this one is. They’re essentially doing a similar operation, applying a Hash function to some data and seeing if the set of values have been seen before, but before, it needs more information than the mere existence of a new set of values, it needs to consider how many of them there are. A lot is dependent here on whether the data coming out of the source is sorted or not, and this is largely determined by the indexes that are being used. If you look in the Properties of an Index Scan, you’ll be able to see whether the order of the data is required by the plan. A property called Ordered will demonstrate this. In this particular example, the second plan is significantly faster, but is dependent on having ordered data. In fact, if I force a Stream Aggregate on unordered data (which I’m doing by telling it to use a different index), a Sort operation is needed, which makes my plan a lot slower. This is all very straight-forward stuff, and information that most people are fully aware of. I’m sure you’ve all read my good friend Paul White (@sql_kiwi)’s post on how the Query Optimizer chooses which type of aggregate function to apply. But let’s take a look at SQL Server Integration Services. SSIS gives us a Aggregate transformation for use in Data Flow Tasks, but it’s described as Blocking. The definitive article on Performance Tuning SSIS uses Sort and Aggregate as examples of Blocking Transformations. I’ve just shown you that Aggregate operations used by the Query Optimizer are not always blocking, but that the SSIS Aggregate component is an example of a blocking transformation. But is it always the case? After all, there are plenty of SSIS Performance Tuning talks out there that describe the value of sorted data in Data Flow Tasks, describing the IsSorted property that can be set through the Advanced Editor of your Source component. And so I set about testing the Aggregate transformation in SSIS, to prove for sure whether providing Sorted data would let the Aggregate transform behave like a Stream Aggregate. (Of course, I knew the answer already, but it helps to be able to demonstrate these things). A query that will produce a million rows in order was in order. Let me rephrase. I used a query which produced the numbers from 1 to 1000000, in a single field, ordered. The IsSorted flag was set on the source output, with the only column as SortKey 1. Performing an Aggregate function over this (counting the number of rows per distinct number) should produce an additional column with 1 in it. If this were being done in T-SQL, the ordered data would allow a Stream Aggregate to be used. In fact, if the Query Optimizer saw that the field had a Unique Index on it, it would be able to skip the Aggregate function completely, and just insert the value 1. This is a shortcut I wouldn’t be expecting from SSIS, but certainly the Stream behaviour would be nice. Unfortunately, it’s not the case. As you can see from the screenshots above, the data is pouring into the Aggregate function, and not being released until all million rows have been seen. It’s not doing a Stream Aggregate at all. This is expected behaviour. (I put that in bold, because I want you to realise this.) An SSIS transformation is a piece of code that runs. It’s a physical operation. When you write T-SQL and ask for an aggregation to be done, it’s a logical operation. The physical operation is either a Stream Aggregate or a Hash Match. In SSIS, you’re telling the system that you want a generic Aggregation, that will have to work with whatever data is passed in. I’m not saying that it wouldn’t be possible to make a sometimes-blocking aggregation component in SSIS. A Custom Component could be created which could detect whether the SortKeys columns of the input matched the Grouping columns of the Aggregation, and either call the blocking code or the non-blocking code as appropriate. One day I’ll make one of those, and publish it on my blog. I’ve done it before with a Script Component, but as Script components are single-use, I was able to handle the data knowing everything about my data flow already. As per my previous post – there are a lot of aspects in which tuning SSIS and tuning execution plans use similar concepts. In both situations, it really helps to have a feel for what’s going on behind the scenes. Considering whether an operation is blocking or not is extremely relevant to performance, and that it’s not always obvious from the surface. In a future post, I’ll show the impact of blocking v non-blocking and synchronous v asynchronous components in SSIS, using some of LobsterPot’s Script Components and Custom Components as examples. When I get that sorted, I’ll make a Stream Aggregate component available for download.

Read the article

Search Results

Search found 1753 results on 71 pages for 'consistent hashing'.

Page 59/71 | < Previous Page | 55 56 57 58 59 60 61 62 63 64 65 66 | Next Page >

- by Halfgaar

- by Synetech

- by GregW

- by Chris

- by n00b

- by E3 Group

- by Jon Skeet

- by ewwhite

- by jcwx86

- by user1444233

- by Alec Matusis

- by Dominique dutra

- by Torious

- by Achinoam

- by Achinoam

- by Unode

- by Dror

- by Andrew Smith

- by DaveM

- by grt3kl

- by Tarbox

- by Jace

- by Joe

- by Lord Torgamus

- by Rob Farley

< Previous Page | 55 56 57 58 59 60 61 62 63 64 65 66 | Next Page >