Search Results

Search found 160 results on 7 pages for 'googlebot'.

Page 7/7 | < Previous Page | 3 4 5 6 7

Unknown Apache2 + PHP5 FastCGI 500 error .. caused by search engine bots?

- by rdjurovich

My Ubuntu server is configured with Apache 2.2.8 and PHP 5.2.4-2ubuntu5.18 in FastCGI mode. Everything works well, except I am seeing 500 errors that only seem to come from bots accessing the server.. for example (access.log): x.125.71.104 - - [16/Nov/2011:10:27:39 +1100] "GET / HTTP/1.1" 500 41377 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" x.40.103.239 - - [16/Nov/2011:11:05:56 +1100] "GET / HTTP/1.0" 500 14717 "-" "Mozilla/5.0 (compatible; mon.itor.us - free monitoring service; http://mon.itor.us)" x.249.67.114 - - [14/Nov/2011:20:57:17 +1100] "GET / HTTP/1.1" 500 101 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" x.55.39.85 - - [14/Nov/2011:19:31:06 +1100] "GET / HTTP/1.1" 500 7032 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._" It is my understanding that a 500 error will be thrown when the PHP process fails to respond to Apache, which could be caused by a fatal PHP error or if PHP runs out of processes.. so my assumption is that either the bots are hitting the server too hard, killing the PHP processes, or something in the request header from bots is causing a fatal error in my PHP script? If anyone can offer advice on this it would be greatly appreciated! Ryan

Read the article
HTTP responses curl and wget different results

- by Fab

To check HTTP response header for a set of urls I send with curl the following request headers foreach ( $urls as $url ) { // Setup headers - I used the same headers from Firefox version 2.0.0.6 $header[ ] = "Accept: text/xml,application/xml,application/xhtml+xml,"; $header[ ] = "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5"; $header[ ] = "Cache-Control: max-age=0"; $header[ ] = "Connection: keep-alive"; $header[ ] = "Keep-Alive: 300"; $header[ ] = "Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7"; $header[ ] = "Accept-Language: en-us,en;q=0.5"; $header[ ] = "Pragma: "; // browsers keep this blank. curl_setopt( $ch, CURLOPT_URL, $url ); curl_setopt( $ch, CURLOPT_USERAGENT, 'Googlebot/2.1 (+http://www.google.com/bot.html)'); curl_setopt( $ch, CURLOPT_HTTPHEADER, $header); curl_setopt( $ch, CURLOPT_REFERER, 'http://www.google.com'); curl_setopt( $ch, CURLOPT_HEADER, true ); curl_setopt( $ch, CURLOPT_NOBODY, true ); curl_setopt( $ch, CURLOPT_RETURNTRANSFER, true ); curl_setopt( $ch, CURLOPT_FOLLOWLOCATION, true ); curl_setopt( $ch, CURLOPT_HTTPAUTH, CURLAUTH_ANY ); curl_setopt( $ch, CURLOPT_TIMEOUT, 10 ); //timeout 10 seconds } Sometimes I receive 200 OK which is good other time 301, 302, 307 which I consider good as well, but other times I receive weird status as 406, 500, 504 which should identify an invalid url but when I open it on the browser they are fine for example the script returns http://www.awe.co.uk/ => HTTP/1.1 406 Not Acceptable and wget returns wget http://www.awe.co.uk/ --2011-06-23 15:26:26-- http://www.awe.co.uk/ Resolving www.awe.co.uk... 77.73.123.140 Connecting to www.awe.co.uk|77.73.123.140|:80... connected. HTTP request sent, awaiting response... 200 OK Does anyone know which request header I am missing or adding in excess?

Read the article
Why am I experiencing random connection timeouts? (CentOS)

- by Ryan

I have a CentOS server setup that currently hosts several websites (all relative of each other in some form or another). As of recently throughout the day at the most random times the website speed will lag to a crawl and eventually hit a connection timeout. When I say random times this typically happens anywhere between 10am and 1pm usually, however, this morning this happened to me at 8am. I do not have a lot of familiarity with server knowledge as far as what I am looking for in this situation. What are some possible causes of why my server is slowing the websites down to a complete crawl or timing out? Are there specific things I should be checking for when this happens? I have noticed using: tail /var/log/httpd/access_log That usually when this down time occurs there are lot of IP addresses related to BingBot, Googlebot, and sometimes various bots or spiders that I am unfamiliar with. Could this be related and if so how can I avoid this from causing my websites to lag out? Thanks in advance for any help or advice. The websites that are timing out are built with PHP and use a MySQL database to display information.

Read the article
Single page not appearing in Google Search

- by Dan

Description I have a static franchise website which has various sub pages each dedicated to an individual franchisee. For each franchisee the page, the only thing slightly similar between all of them are the page titles, they follow this structure: <title> Welcome to THE_COMPANY - PRODUCT_DESCRIPTION Services, THE_LOCATION </title> THE_COMPANY and PRODUCT_DESCRIPTION are the same across all franchisees, however THE_LOCATION changes depeding on where they are located in the UK. Each franchisee page has the following <meta /> tags: <meta name="DC.creator" content="user"/> <meta name="DC.format" content="text/html"/> <meta name="DC.language" content="en"/> <meta name="DC.date.modified" content="2014-01-23T11:22:31+00:00"/> <meta name="DC.date.created" content="2014-01-23T11:22:09+00:00"/> <meta name="DC.type" content="Page"/> <meta name="DC.distribution" content="Global"/> <meta name="robots" content="ALL"/> <meta name="distribution" content="Global"/> The main content on each franchisee page is completely different. The Problem There is one particular franchisee page, located in Area A.. Which will not appear in Google Search results at all. However every single other franchisee (if you Google Search for "THE_COMPANY, THE_LOCATION" is number 1). And if I do the same search on Bing, Yahoo or DuckDuckGo, the Area A franchisee is the first result on all of them. Has Google for some reason black listed one page on the site? What I Have Tried Ensuring the page is referenced in my sitemap.xml file 'Fetching as Google Bot' the link www.the_company.co.uk/areaa When that came back as OK I would submit to index Resubmitting the sitemap.xml file in Webmaster Tools Linking to the Area A page from another pages content For this I also waited about 3 weeks before checking again to give Google time to re-index Making a change to the page content and waiting another 2 / 3 weeks Removing the page completely and recreating it with an alternative URL The closest thing I have found to this issue is this StackOverflow question but this particular franchisee has existed for almost a year, it used to appear on Google searches however no longer does. I'm guessing the Panda update wasn't too happy with something on the page, but it hasn't effected anything else on the site and I am at a loss for things to try. I would greatly appreciate any information or thoughts as to what could have caused this Thanks. Update In line with Daniel Fukudas answer below, I have followed some of his steps but everything seems to check out alright: HTTP Headers HTTP/1.1 200 OK => Date => Tue, 25 Feb 2014 16:31:29 GMT Server => Zope/(2.12.16, python 2.6.6, linux2) ZServer/1.1 Content-Length => 40078 Expires => Sat, 01 Jan 2000 00:00:00 GMT Content-Type => text/html;charset=utf-8 Content-Language => en Vary => Accept-Encoding Connection => close Robots <meta /> tag: <meta name="robots" content="ALL"/> I have updated this <meta /> tag to read content="INDEX" instead now. robots.txt: User-agent: * Disallow: User-Agent: Googlebot Disallow: /*sendto_form$ Disallow: /*folder_factories$ Using site:THE_COMPANY.co.uk: Searching for 'AREA A site:THE_COMPANY.co.uk' does not return the page, but regardless of that searching just for site:THE_COMPANY.co.uk will not necessarily return every indexed page, or so I understand... Update It appears Google likes to drop pages every now and then from the index, despite my steps above, I left the site alone and the page appeared back in the SERPs by itself.

Read the article
Regexp that matches user-agents of end-user browsers but NOT crawlers with >90 % accuracy

- by knorv

I'm trying to construct a regexp that will evaluate to true for User-Agent:s of "browsers navigated by humans", but false for bots. Needless to say the matching will not be exact, but if it gets things right in say 90 % of cases that is more than good enough. My approach so far is to target the User-Agent string of the the five major desktop browsers (MSIE, Firefox, Chrome, Safari, Opera). Specifically I want the regexp NOT to match if the user-agent is a bot (Googlebot, msnbot, etc.). Currently I'm using the following regexp which appears to achieve the desired precision: ^(Mozilla.*(Gecko|KHTML|MSIE|Presto|Trident)|Opera).*$ I've observed small number of false negatives which are mostly mobile browsers. The exceptions all match: (BlackBerry|HTC|LG|MOT|Nokia|NOKIAN|PLAYSTATION|PSP|SAMSUNG|SonyEricsson) My question is: Given the desired accuracy level, how would you improve the regexp? Can you think of any major false positives or false negatives to the given regexp? Please note that the question is specifically about regexp-based User-Agent matching. There are a bunch of other approaches to solving this problem, but those are out of the scope of this question.

Read the article
CoInitialize fails in dll

- by Quandary

Question: I have the following program, which uses COM to use the Microsoft Speech API (SAPI) to take a text and output it as sound. Now it works fine as long as I have it in a .exe. When I load it as .dll, it fails. Why? I used dependencywalker, and saw the exe doesn't have MSVCR100D and ole32, so I loaded them like this: LoadLibraryA("MSVCR100D.DLL"); LoadLibraryA("ole32.dll"); but it didn't help... Any idea why ? #include <windows.h> #include <sapi.h> #include <cstdlib> int main(int argc, char* argv[]) { ISpVoice * pVoice = NULL; if (FAILED(::CoInitialize(NULL))) return FALSE; HRESULT hr = CoCreateInstance(CLSID_SpVoice, NULL, CLSCTX_ALL, IID_ISpVoice, (void **) &pVoice); if( SUCCEEDED( hr ) ) { hr = pVoice->Speak(L"Noobie was fragged by GSG9 Googlebot", 0, NULL); hr = pVoice->Speak(L"Test Test", 0, NULL); hr = pVoice->Speak(L"This sounds normal <pitch middle = '-10'/> but the pitch drops half way through", SPF_IS_XML, NULL ); pVoice->Release(); pVoice = NULL; } ::CoUninitialize(); return EXIT_SUCCESS; }

Read the article
Detecting 'stealth' web-crawlers

- by Jacco

What options are there to detect web-crawlers that do not want to be detected? (I know that listing detection techniques will allow the smart stealth-crawler programmer to make a better spider, but I do not think that we will ever be able to block smart stealth-crawlers anyway, only the ones that make mistakes.) I'm not talking about the nice crawlers such as googlebot and Yahoo! Slurp. I consider a bot nice if it: identifies itself as a bot in the user agent string reads robots.txt (and obeys it) I'm talking about the bad crawlers, hiding behind common user agents, using my bandwidth and never giving me anything in return. There are some trapdoors that can be constructed updated list (thanks Chris, gs): Adding a directory only listed (marked as disallow) in the robots.txt, Adding invisible links (possibly marked as rel="nofollow"?), style="display: none;" on link or parent container placed underneath another element with higher z-index detect who doesn't understand CaPiTaLiSaTioN, detect who tries to post replies but always fail the Captcha. detect GET requests to POST-only resources detect interval between requests detect order of pages requested detect who (consistently) requests https resources over http detect who does not request image file (this in combination with a list of user-agents of known image capable browsers works surprisingly nice) Some traps would be triggered by both 'good' and 'bad' bots. you could combine those with a whitelist: It trigger a trap It request robots.txt? It doest not trigger another trap because it obeyed robots.txt One other important thing here is: Please consider blind people using a screen readers: give people a way to contact you, or solve a (non-image) Captcha to continue browsing. What methods are there to automatically detect the web crawlers trying to mask themselves as normal human visitors. Update The question is not: How do I catch every crawler. The question is: How can I maximize the chance of detecting a crawler. Some spiders are really good, and actually parse and understand html, xhtml, css javascript, VB script etc... I have no illusions: I won't be able to beat them. You would however be surprised how stupid some crawlers are. With the best example of stupidity (in my opinion) being: cast all URLs to lower case before requesting them. And then there is a whole bunch of crawlers that are just 'not good enough' to avoid the various trapdoors.

Read the article
DNS with name.com and Amazon S3

- by aledalgrande

I have a website on a bucket in Amazon S3, and recently started to get emails from Google "Googlebot can't access your site". When I go to Webmaster Tools and I try to fetch in fact it doesn't work. Also people in locations different from mine sometimes reported they could not access the website. Now for curiosity I tried from my terminal: $ host xxx xxx is an alias for xxx.s3-website-us-west-1.amazonaws.com. xxx.s3-website-us-west-1.amazonaws.com is an alias for s3-website-us-west-1.amazonaws.com. s3-website-us-west-1.amazonaws.com has address yyy.yyy.yyy.yyy And when I try with dig: $ dig xxx ; <<>> DiG 9.8.3-P1 <<>> xxx ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 17860 ;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0 ;; QUESTION SECTION: ;xxx. IN A ;; ANSWER SECTION: xxx. 300 IN CNAME xxx.s3-website-us-west-1.amazonaws.com. xxx.s3-website-us-west-1.amazonaws.com. 60 IN CNAME s3-website-us-west-1.amazonaws.com. s3-website-us-west-1.amazonaws.com. 60 IN A yyy ;; Query time: 1514 msec ;; SERVER: 75.75.75.75#53(75.75.75.75) ;; WHEN: Fri Aug 22 12:32:13 2014 ;; MSG SIZE rcvd: 127 It seems OK to me. Why would Google tell me there is a DNS error? UPDATE: Google also cannot fetch robots.txt, but I can fetch it from my browser. UPDATE 2: I have a forwarding on the root to the www.* hostname: $ dig thenifty.me ; <<>> DiG 9.8.3-P1 <<>> thenifty.me ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49286 ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0 ;; QUESTION SECTION: ;thenifty.me. IN A ;; AUTHORITY SECTION: thenifty.me. 300 IN SOA ns1hwy.name.com. support.name.com. 1 10800 3600 604800 300 ;; Query time: 148 msec ;; SERVER: 75.75.75.75#53(75.75.75.75) ;; WHEN: Fri Aug 22 13:32:56 2014 ;; MSG SIZE rcvd: 88

Read the article
Week in Geek: 4chan Falls Victim to DDoS Attack Edition

- by Asian Angel

This week we learned how to tweak the low battery action on a Windows 7 laptop, access an eBook collection anywhere in the world, “extend iPad battery life, batch resize photos, & sync massive music collections”, went on a reign of destruction with Snow Crusher, and had fun decorating our desktops with abstract icon collections. Photo by pasukaru76. Random Geek Links We have included extra news article goodness to help you catch up on any developments that you may have missed during the holiday break this past week. Note: The three 27C3 articles listed here represent three different presentations at the 27th Chaos Communication Congress hacker conference. 4chan victim of DDoS as FBI investigates role in PayPal attack Users of 4chan may have gotten a taste of their own medicine after the site was knocked offline by a DDoS attack from an unknown origin early Thursday morning. Report: FBI seizes server in probe of WikiLeaks attacks The FBI has seized a server in Texas as part of its hunt for the groups behind the pro-WikiLeaks denial-of-service attacks launched in December against PayPal, Visa, MasterCard, and others. Mozilla exposes older user-account database Mozilla has disabled 44,000 older user accounts for its Firefox add-ons site after a security researcher found part of a database of the account information on a publicly available server. Data breach affects 4.9 million Honda customers Japanese automaker Honda has put some 2.2 million customers in the United States on a security breach alert after a database containing information on the owners and their cars was hacked. Chinese Trojan discovered in Android games An Android-based Trojan called “Geinimi” has been discovered in the wild and the Trojan is capable of sending personal information to remote servers and exhibits botnet-like behavior. 27C3 presentation claims many mobiles vulnerable to SMS attacks According to security experts, an ‘SMS of death’ threatens to disable many current Sony Ericsson, Samsung, Motorola, Micromax and LG mobiles. 27C3: GSM cell phones even easier to tap Security researchers have demonstrated how open source software on a number of revamped, entry-level cell phones can decrypt and record mobile phone calls in the GSM network. 27C3: danger lurks in PDF documents Security researcher Julia Wolf has pointed out numerous, previously hardly known, security problems in connection with Adobe’s PDF standard. Critical update for WordPress A critical update has been made available for WordPress in the form of version 3.0.4. The update fixes a security bug in WordPress’s KSES library. McAfee Labs Predicts Geolocation, Mobile Devices and Apple Will Top the List of Targets for Emerging Threats in 2011 The list comprises 2010’s most buzzed about platforms and services, including Google’s Android, Apple’s iPhone, foursquare, Google TV and the Mac OS X platform, which are all expected to become major targets for cybercriminals. McAfee Labs also predicts that politically motivated attacks will be on the rise. Windows Phone 7 piracy materializes with FreeMarketplace A proof-of-concept application, FreeMarketplace, that allows any Windows Phone 7 application to be downloaded and installed free of charge has been developed. Empty email accounts, and some bad buzz for Hotmail In the past few days, a number of Hotmail users have been complaining about a rather disconcerting issue: their Hotmail accounts, some up to 10 years old, appear completely empty. No emails, no folders, nothing, just what appears to be a new account. Reports: Nintendo warns of 3DS risk for kids Nintendo has reportedly issued a warning that the 3DS, its eagerly awaited glasses-free 3D portable gaming device, should not be used by children under 6 when the gadget is in 3D-viewing mode. Google eyes ‘cloaking’ as next antispam target Google plans to take a closer look at the practice of “cloaking,” or presenting one look to a Googlebot crawling one’s site while presenting another look to users. Facebook, Twitter stock trading drawing SEC eye? The high degree of investor interest in shares of hot Silicon Valley companies that aren’t yet publicly traded–like Facebook, Twitter, LinkedIn, and Zynga–may be leading to scrutiny from the U.S. Securities and Exchange Commission (SEC). Random TinyHacker Links Photo by jcraveiro. Exciting Software Set for Release in 2011 A few bloggers from great websites such as How-To Geek, Guiding Tech and 7 Tutorials took the time to sit down and talk about their software wishes for 2011. Take the time to read it and share… Wikileaks Infopr0n An infographic detailing the quest to plug WikiLeaks. The New York Times Guide to Mobile Apps A growing collection of all mobile app coverage by the New York Times as well as lists of favorite apps from Times writers. 7,000,000,000 (Video) A fascinating look at the world’s population via National Geographic Magazine. Super User Questions Check out the great answers to these hot questions from Super User. How to use a Personal computer as a Linux web server for development purposes? How to link processing power of old computers together? Free virtualization tool for testing suspicious files? Why do some actions not work with Remote Desktop? What is the simplest way to send a large batch of pictures to a distant friend or colleague? How-To Geek Weekly Article Recap Had a busy week and need to get caught up on your HTG reading? Then sit back and relax while enjoying these hot posts full of how-to roundup goodness. The 50 Best How-To Geek Windows Articles of 2010 The 20 Best How-To Geek Explainer Topics for 2010 The 20 Best How-To Geek Linux Articles of 2010 How to Search Just the Site You’re Viewing Using Google Search Ask the Readers: Backing Your Files Up – Local Storage versus the Cloud One Year Ago on How-To Geek Need more how-to geekiness for your weekend? Then look through this great batch of articles from one year ago that focus on dual-booting and O.S. installation goodness. Dual Boot Your Pre-Installed Windows 7 Computer with Vista Dual Boot Your Pre-Installed Windows 7 Computer with XP How To Setup a USB Flash Drive to Install Windows 7 Dual Boot Your Pre-Installed Windows 7 Computer with Ubuntu Easily Install Ubuntu Linux with Windows Using the Wubi Installer The Geek Note We hope that you and your families have had a terrific holiday break as everyone prepares to return to work and school this week. Remember to keep those great tips coming in to us at [email protected]! Photo by pjbeardsley. Latest Features How-To Geek ETC The 20 Best How-To Geek Linux Articles of 2010 The 50 Best How-To Geek Windows Articles of 2010 The 20 Best How-To Geek Explainer Topics for 2010 How to Disable Caps Lock Key in Windows 7 or Vista How to Use the Avira Rescue CD to Clean Your Infected PC The Complete List of iPad Tips, Tricks, and Tutorials Tune Pop Enhances Android Music Notifications Another Busy Night in Gotham City Wallpaper Classic Super Mario Brothers Theme for Chrome and Iron Experimental Firefox Builds Put Tabs on the Title Bar (Available for Download) Android Trojan Found in the Wild Chaos, Panic, and Disorder Wallpaper

Read the article
Linux server apache httpd processes take i/o wait to close to 100% and lock down server

- by user3682065

For about 5 days now, and seemingly out of the blue, my linux server has started locking up from time to time. The pattern is always the same as far as I can tell from top and iotop commands around the time it starts happening: One or more httpd processes (usually one) hang and start using up 100% of CPU power, the %wa goes close to 100% and in the iotop I see several httpd processes with 99.99% in the IO column. I'm also running an SVN server on this machine through apache and the one way that I've been consistently able to reproduce this is to do an SVN commit of new files or an SVN update from the repository on this server (I am the only one using this SVN repository). This will always reproduce this scenario successfully, but until very recently I had no problems at all checking in/out of SVN. But sometimes it just happens for no detectable reason at all it seems. So it seems like there is some issue with my Apache that leads it to have processes use up a lot of read/write upon certain triggers. I was wondering if anyone could help me uncover that issue. EDIT: OK now it's happening again: This is top: [root@server ~]# top top - 10:56:54 up 2:59, 5 users, load average: 171.46, 70.35, 27.01 Tasks: 328 total, 2 running, 326 sleeping, 0 stopped, 0 zombie Cpu(s): 1.9%us, 2.0%sy, 0.0%ni, 0.0%id, 96.1%wa, 0.0%hi, 0.0%si, 0.0%st Mem: 2021144k total, 1968192k used, 52952k free, 2500k buffers Swap: 4194288k total, 2938584k used, 1255704k free, 39008k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 10390 apache 20 0 2774m 936m 6200 D 2.0 47.4 1:52.27 httpd 2149 root 20 0 927m 13m 1040 S 0.7 0.7 1:50.46 namecoind 11 root 20 0 0 0 0 R 0.3 0.0 0:30.10 events/0 23 root 20 0 0 0 0 S 0.3 0.0 0:17.88 kblockd/1 2049 root 20 0 382m 4932 2880 D 0.3 0.2 0:03.67 httpd 2144 root 20 0 1702m 69m 1164 S 0.3 3.5 5:19.68 bitcoind 6325 root 20 0 15164 1100 656 R 0.3 0.1 0:11.09 top 10311 apache 20 0 387m 9496 7320 D 0.3 0.5 0:01.89 httpd 10313 apache 20 0 391m 10m 7364 D 0.3 0.5 0:02.40 httpd 10466 apache 20 0 399m 12m 7392 D 0.3 0.7 0:02.41 httpd 10599 apache 20 0 391m 9324 7340 D 0.3 0.5 0:00.15 httpd 10628 apache 20 0 384m 7620 4052 D 0.3 0.4 0:00.01 httpd 10633 apache 20 0 384m 7048 3504 D 0.3 0.3 0:00.01 httpd 10634 apache 20 0 384m 8012 4048 D 0.3 0.4 0:00.02 httpd 10638 apache 20 0 400m 22m 9.8m D 0.3 1.1 0:01.93 httpd 10640 apache 20 0 385m 8288 4028 D 0.3 0.4 0:00.03 httpd 10641 apache 20 0 401m 21m 6376 D 0.3 1.1 0:01.45 httpd 10759 apache 20 0 385m 8816 3480 D 0.3 0.4 0:01.45 httpd 10773 apache 20 0 384m 8044 3464 D 0.3 0.4 0:00.02 httpd This is an iotop snapshot: Total DISK READ: 5.93 M/s | Total DISK WRITE: 0.00 B/s TID PRIO USER DISK READ DISK WRITE SWAPIN IO> COMMAND 10732 be/4 apache 3.76 K/s 0.00 B/s 0.00 % 58.48 % httpd 876 be/3 root 0.00 B/s 52.68 K/s 0.00 % 52.98 % [jbd2/dm-1-8] 10906 be/4 root 124.17 K/s 0.00 B/s 0.00 % 23.03 % sh -c [ -x /usr/local/psa/admin/sbin/backupmng ] && /usr/local/psa/admin/sbin/backupmng >/dev/null 2>&1 2156 be/4 root 206.94 K/s 0.00 B/s 0.00 % 21.15 % bitcoind 10904 be/4 mysql 0.00 B/s 0.00 B/s 0.00 % 18.94 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock 10773 be/4 apache 7.53 K/s 0.00 B/s 0.00 % 14.77 % httpd 10641 be/4 apache 15.05 K/s 0.00 B/s 0.00 % 11.57 % httpd 10399 be/4 apache 1057.29 K/s 0.00 B/s 43.16 % 10.56 % httpd 10682 be/4 sw-cp-se 158.03 K/s 0.00 B/s 0.00 % 7.45 % sw-engine-cgi -c /usr/local/psa/admin/conf/php.ini -d auto_prepend_file=auth.php3 -u psaadm 10774 be/4 apache 3.76 K/s 0.00 B/s 0.00 % 6.53 % httpd 10624 be/4 apache 0.00 B/s 0.00 B/s 0.00 % 5.53 % httpd 10356 be/4 apache 899.26 K/s 0.00 B/s 35.52 % 4.01 % httpd 10795 be/4 apache 0.00 B/s 0.00 B/s 0.00 % 3.93 % httpd 10804 be/4 apache 7.53 K/s 0.00 B/s 0.00 % 3.08 % httpd 4379 be/4 root 2.89 M/s 0.00 B/s 99.99 % 0.00 % namecoind 10619 be/4 apache 462.80 K/s 0.00 B/s 7.80 % 0.00 % httpd 10636 be/4 apache 3.76 K/s 0.00 B/s 0.00 % 0.00 % httpd 10716 be/4 mysql 105.35 K/s 0.00 B/s 5.92 % 0.00 % mysqld --basedir=/usr --datadir=/var/lib/mysql --user=mysql --log-error=/var/log/mysqld.log --pid-file=/var/run/mysqld/mysqld.pid --socket=/var/lib/mysql/mysql.sock 1988 be/4 root 18.81 K/s 0.00 B/s 0.00 % 0.00 % spamd_full.sock I also ran lsof -p for pid 10390 which was way up top under the top command and this is the bottom line where I can sort of see what request this was and it says CLOSE_WAIT: httpd 10390 apache 34u IPv6 315879 0t0 TCP default-domain.com:https->crawl-66-249-65-91.googlebot.com:42907 (CLOSE_WAIT) I'm still not sure what exactly is causing this all to happen though? I killed that service but %wa and load average remain high, I also stopped mysqld and other services. It really only goes down once I stop httpd altogether, and even then I can't start it without finding remaining hanging httpd processes via "netstat -tulpn", killing those or doing "killall -9 httpd" and after waiting a while for it to cycle through all those then doing /etc/init.d/httpd start

Read the article

Developer IT

googlebot - Page 7 - Developer IT

Search Results

Search found 160 results on 7 pages for 'googlebot'.

Page 7/7 | < Previous Page | 3 4 5 6 7

Unknown Apache2 + PHP5 FastCGI 500 error .. caused by search engine bots?

- by rdjurovich

Read the article

HTTP responses curl and wget different results

- by Fab

Read the article

Why am I experiencing random connection timeouts? (CentOS)

- by Ryan

Read the article

Single page not appearing in Google Search

- by Dan

Read the article

Regexp that matches user-agents of end-user browsers but NOT crawlers with >90 % accuracy

- by knorv

Read the article

CoInitialize fails in dll

- by Quandary

Read the article

Detecting 'stealth' web-crawlers

- by Jacco

Read the article

DNS with name.com and Amazon S3

- by aledalgrande

Read the article

Week in Geek: 4chan Falls Victim to DDoS Attack Edition

- by Asian Angel

Read the article

Linux server apache httpd processes take i/o wait to close to 100% and lock down server

- by user3682065

Read the article

< Previous Page | 3 4 5 6 7