Search Results

Search found 247 results on 10 pages for 'bots'.

Page 4/10 | < Previous Page | 1 2 3 4 5 6 7 8 9 10  | Next Page >

  • Which token from a long User-Agent should I use in robots.txt?

    - by Gaia
    The definition of User-Agent states that several tokens can be included, as deemed necessary by the client. I want to block certain bots via robots.txt and I am confused as to which part of the User-Agent string to use, especially for more obscure bots. For example: Mozilla/5.0 (compatible; uMBot-LN/1.0; mailto: [email protected])" JS-Kit URL Resolver, http://js-kit.com/ Mozilla/5.0 (compatible; SEOkicks-Robot +http://www.seokicks.de/robot.html Do I use the second token? Can tokens contain spaces, or did the SEOkicks folks forget a semicolon after SEOkicks-Robot? I don't actually intend on making my question specific to a couple bots - I want to know the guideline: which part of UA do I place in robots.txt for these exotic bots with UA as long as a haiku? User-agent: uMBot-LN/1.0 Disallow: / PS: Thank you but I do not need to hear that undesirable bots are better blocked with mod_security. I already have commercial mod_sec rules in place.

    Read the article

  • Blocking 'good' bots in nginx with multiple conditions for certain off-limits URL's where humans can go

    - by Glenn Plas
    After 2 days of searching/trying/failing I decided to post this here, I haven't found any example of someone doing the same nor what I tried seems to be working OK. I'm trying to send a 403 to bots not respecting the robots.txt file (even after downloading it several times). Specifically Googlebot. It will support the following robots.txt definition. User-agent: * Disallow: /*/*/page/ The intent is to allow Google to browse whatever they can find on the site but return a 403 for the following type of request. Googlebot seems to keep on nesting these links eternally adding paging block after block: my_domain.com:80 - 66.x.67.x - - [25/Apr/2012:11:13:54 +0200] "GET /2011/06/ page/3/?/page/2//page/3//page/2//page/3//page/2//page/2//page/4//page/4//pag e/1/&wpmp_switcher=desktop HTTP/1.1" 403 135 "-" "Mozilla/5.0 (compatible; G ooglebot/2.1; +http://www.google.com/bot.html)" It's a wordpress site btw. I don't want those pages to show up, even though after the robots.txt info got through, they stopped for a while only to begin crawling again later. It just never stops .... I do want real people to see this. As you can see, google get a 403 but when I try this myself in a browser I get a 404 back. I want browsers to pass. root@my_domain:# nginx -V nginx version: nginx/1.2.0 I tried different approaches, using a map and plain old nono if's and they both act the same: (under http section) map $http_user_agent $is_bot { default 0; ~crawl|Googlebot|Slurp|spider|bingbot|tracker|click|parser|spider 1; } (under the server section) location ~ /(\d+)/(\d+)/page/ { if ($is_bot) { return 403; # Please respect the robots.txt file ! } } I recently had to polish up my Apache skills for a client where I did about the same thing like this : # Block real Engines , not respecting robots.txt but allowing correct calls to pass # Google RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0\ \(compatible;\ Googlebot/2\.[01];\ \+http://www\.google\.com/bot\.html\)$ [NC,OR] # Bing RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0\ \(compatible;\ bingbot/2\.[01];\ \+http://www\.bing\.com/bingbot\.htm\)$ [NC,OR] # msnbot RewriteCond %{HTTP_USER_AGENT} ^msnbot-media/1\.[01]\ \(\+http://search\.msn\.com/msnbot\.htm\)$ [NC,OR] # Slurp RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0\ \(compatible;\ Yahoo!\ Slurp;\ http://help\.yahoo\.com/help/us/ysearch/slurp\)$ [NC] # block all page searches, the rest may pass RewriteCond %{REQUEST_URI} ^(/[0-9]{4}/[0-9]{2}/page/) [OR] # or with the wpmp_switcher=mobile parameter set RewriteCond %{QUERY_STRING} wpmp_switcher=mobile # ISSUE 403 / SERVE ERRORDOCUMENT RewriteRule .* - [F,L] # End if match This does a bit more than I asked nginx to do but it's about the same principle, I'm having a hard time figuring this out for nginx. So my question would be, why would nginx serve my browser a 404 ? Why isn't it passing, The regex isn't matching for my UA: "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.30 Safari/536.5" There are tons of example to block based on UA alone, and that's easy. It also looks like the matchin location is final, e.g. it's not 'falling' through for regular user, I'm pretty certain that this has some correlation with the 404 I get in the browser. As a cherry on top of things, I also want google to disregard the parameter wpmp_switcher=mobile , wpmp_switcher=desktop is fine but I just don't want the same content being crawled multiple times. Even though I ended up adding wpmp_switcher=mobile via the google webmaster tools pages (requiring me to sign up ....). that also stopped for a while but today they are back spidering the mobile sections. So in short, I need to find a way for nginx to enforce the robots.txt definitions. Can someone shell out a few minutes of their lives and push me in the right direction please ? I really appreciate ANY response that makes me think harder ;-)

    Read the article

  • how do I block my rails app from being hit by bots?

    - by codeman73
    I'm not even sure I'm using the right terminology, whether this is actually bots or not. I didn't want to use the word 'spam' because it's not like I have comments or posts that are being created/spammed. It looks more like something is making the same repeated request to my domain, which is what made me think it was some kind of bot. I've opened my first rails app to the 'public', which is a really a small group of users, <50 currently. That was last Friday. I started having performance issues today, so I looked at the log and I see tons of these RoutingErrors ActionController::RoutingError (No route matches "/portalApp/APF/pages/business/util/whichServer.jsp" with {:method=>:get}): They are filling up the log and I'm assuming this is causing the slowdown. Note the .jsp on the end and this is a rails app, so I've got no urls remotely like this in my app. I mean, the /portalApp I don't even have, so I don't know where this is coming from. This is hosted at Dreamhost and I chatted with one of their support people, and he suggested a couple sites that detail using htaccess to block things. But that looks like you need to know the IP or domain that the requests are coming from, which I don't. How can I block this? How can I find the IP or domain from the request? Any other suggestions?

    Read the article

  • Do or can robots cause considerable performance issues?

    - by Anicho
    So the question in the title is exactly what I am trying to find out. My case is: At work we are in a discussion with team members who seem to think bots will cause us problems relating to performance when running on our services website. Out setup: Lets say I have site www.mysite.co.uk this is a shop window to our online services which sit on www.mysiteonline.co.uk. When people search in google for mysite they see mysiteonline.co.uk as well as mysite.co.uk. Cases against stopping bots crawling: We don't store gb's of data publicly available on the web Most friendly bots, if they were to cause issues would have done so already In our instance the bots can't crawl the site because it requires username & password Stopping bots with robot .txt causes an issue with seo (ref.1) If it was a malicious bot, it would ignore robot.txt or meta tags anyway Ref 1. If we were to block mysiteonline.co.uk from having robots crawl this will affect seo rankings and make it inconvenient for users who actively search for mysite to find mysiteonline. Which we can prove is the case for a good portion of our users.

    Read the article

  • UDK game Prisoners/Guards

    - by RR_1990
    For school I need to make a little game with UDK, the concept of the game is: The player is the headguard, he will have some other guard (bots) who will follow him. Between the other guards and the player are some prisoners who need to evade the other guards. It needs to look like this My idea was to let the guard bots follow the player at a certain distance and let the prisoners bots in the middle try to evade the guard bots. Now is the problem i'm new to Unreal Script and the school doesn't support me that well. Untill now I have only was able to make the guard bots follow me. I hope you guys can help me or make me something that will make this game work. Here is the class i'm using to let te bots follow me: class ChaseControllerAI extends AIController; var Pawn player; var float minimalDistance; var float speed; var float distanceToPlayer; var vector selfToPlayer; auto state Idle { function BeginState(Name PreviousStateName) { Super.BeginState(PreviousStateName); } event SeePlayer(Pawn p) { player = p; GotoState('Chase'); } Begin: player = none; self.Pawn.Velocity.x = 0.0; self.Pawn.Velocity.Y = 0.0; self.Pawn.Velocity.Z = 0.0; } state Chase { function BeginState(Name PreviousStateName) { Super.BeginState(PreviousStateName); } event PlayerOutOfReach() { `Log("ChaseControllerAI CHASE Player out of reach."); GotoState('Idle'); } // class ChaseController extends AIController; CONTINUED // State Chase (continued) event Tick(float deltaTime) { `Log("ChaseControllerAI in Event Tick."); selfToPlayer = self.player.Location - self.Pawn.Location; distanceToPlayer = Abs(VSize(selfToPlayer)); if (distanceToPlayer > minimalDistance) { PlayerOutOfReach(); } else { self.Pawn.Velocity = Normal(selfToPlayer) * speed; //self.Pawn.Acceleration = Normal(selfToPlayer) * speed; self.Pawn.SetRotation(rotator(selfToPlayer)); self.Pawn.Move(self.Pawn.Velocity*0.001); // or *deltaTime } } Begin: `Log("Current state Chase:Begin: " @GetStateName()@""); } defaultproperties { bAdjustFromWalls=true; bIsPlayer= true; minimalDistance = 1024; //org 1024 speed = 500; }

    Read the article

  • Protecting Apache with Fail2Ban

    - by NetStudent
    Having checked my Apache logs for the last two days I have noticed several attempts to access URLs such as /phpmyadmin, /phpldapadmin: 121.14.241.135 - - [09/Jun/2012:04:37:35 +0100] "GET /w00tw00t.at.blackhats.romanian.anti-sec:) HTTP/1.1" 404 415 "-" "ZmEu" 121.14.241.135 - - [09/Jun/2012:04:37:35 +0100] "GET /phpMyAdmin/scripts/setup.php HTTP/1.1" 404 405 "-" "ZmEu" 121.14.241.135 - - [09/Jun/2012:04:37:35 +0100] "GET /phpmyadmin/scripts/setup.php HTTP/1.1" 404 404 "-" "ZmEu" 121.14.241.135 - - [09/Jun/2012:04:37:36 +0100] "GET /pma/scripts/setup.php HTTP/1.1" 404 399 "-" "ZmEu" 121.14.241.135 - - [09/Jun/2012:04:37:36 +0100] "GET /myadmin/scripts/setup.php HTTP/1.1" 404 403 "-" "ZmEu" 121.14.241.135 - - [09/Jun/2012:04:37:37 +0100] "GET /MyAdmin/scripts/setup.php HTTP/1.1" 404 403 "-" "ZmEu" 66.249.72.235 - - [09/Jun/2012:07:11:06 +0100] "GET /robots.txt HTTP/1.1" 404 430 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 66.249.72.235 - - [09/Jun/2012:07:11:06 +0100] "GET / HTTP/1.1" 200 424 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 188.132.178.34 - - [09/Jun/2012:08:39:05 +0100] "HEAD /manager/html HTTP/1.0" 404 166 "-" "-" 95.108.150.235 - - [09/Jun/2012:09:42:09 +0100] "GET /robots.txt HTTP/1.1" 404 432 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 95.108.150.235 - - [09/Jun/2012:09:42:09 +0100] "GET /robots.txt HTTP/1.1" 404 432 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 95.108.150.235 - - [09/Jun/2012:09:42:10 +0100] "GET / HTTP/1.1" 200 424 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 95.108.150.235 - - [09/Jun/2012:09:42:10 +0100] "GET / HTTP/1.1" 200 424 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 95.108.150.235 - - [09/Jun/2012:09:42:11 +0100] "GET / HTTP/1.1" 200 424 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 95.108.150.235 - - [09/Jun/2012:09:42:11 +0100] "GET / HTTP/1.1" 200 424 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)" 194.128.132.2 - - [09/Jun/2012:16:04:41 +0100] "HEAD / HTTP/1.0" 200 260 "-" "-" 66.249.68.176 - - [09/Jun/2012:18:08:12 +0100] "GET /robots.txt HTTP/1.1" 404 430 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 66.249.68.176 - - [09/Jun/2012:18:08:13 +0100] "GET / HTTP/1.1" 200 424 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" 212.3.106.249 - - [09/Jun/2012:18:12:33 +0100] "GET / HTTP/1.1" 200 388 "-" "-" 212.3.106.249 - - [09/Jun/2012:18:12:34 +0100] "GET /phpldapadmin/ HTTP/1.1" 404 379 "-" "-" 212.3.106.249 - - [09/Jun/2012:18:12:34 +0100] "GET /phpldapadmin/htdocs/ HTTP/1.1" 404 386 "-" "-" 212.3.106.249 - - [09/Jun/2012:18:12:35 +0100] "GET /phpldap/ HTTP/1.1" 404 374 "-" "-" 212.3.106.249 - - [09/Jun/2012:18:12:36 +0100] "GET /phpldap/htdocs/ HTTP/1.1" 404 381 "-" "-" 212.3.106.249 - - [09/Jun/2012:18:12:36 +0100] "GET /admin/ HTTP/1.1" 404 372 "-" "-" 212.3.106.249 - - [09/Jun/2012:18:12:38 +0100] "GET /admin/ldap/ HTTP/1.1" 404 377 "-" "-" 212.3.106.249 - - [09/Jun/2012:18:12:38 +0100] "GET /admin/ldap/htdocs/ HTTP/1.1" 404 384 "-" "-" 212.3.106.249 - - [09/Jun/2012:18:12:38 +0100] "GET /admin/phpldap/ HTTP/1.1" 404 380 "-" "-" 212.3.106.249 - - [09/Jun/2012:18:12:39 +0100] "GET /admin/phpldap/htdocs/ HTTP/1.1" 404 387 "-" "-" 212.3.106.249 - - [09/Jun/2012:18:12:39 +0100] "GET /admin/phpldapadmin/htdocs/ HTTP/1.1" 404 392 "-" "-" 212.3.106.249 - - [09/Jun/2012:18:12:40 +0100] "GET /admin/phpldapadmin/ HTTP/1.1" 404 385 "-" "-" 212.3.106.249 - - [09/Jun/2012:18:12:40 +0100] "GET /openldap HTTP/1.1" 404 374 "-" "-" 212.3.106.249 - - [09/Jun/2012:18:12:41 +0100] "GET /openldap/htdocs HTTP/1.1" 404 381 "-" "-" 212.3.106.249 - - [09/Jun/2012:18:12:42 +0100] "GET /openldap/htdocs/ HTTP/1.1" 404 382 "-" "-" 212.3.106.249 - - [09/Jun/2012:18:12:44 +0100] "GET /ldap/ HTTP/1.1" 404 371 "-" "-" 212.3.106.249 - - [09/Jun/2012:18:12:44 +0100] "GET /ldap/htdocs/ HTTP/1.1" 404 378 "-" "-" 212.3.106.249 - - [09/Jun/2012:18:12:45 +0100] "GET /ldap/phpldapadmin/ HTTP/1.1" 404 384 "-" "-" 212.3.106.249 - - [09/Jun/2012:18:12:46 +0100] "GET /ldap/phpldapadmin/htdocs/ HTTP/1.1" 404 391 "-" "-" Is there any way I can use Fail2Ban or any other similar software to ban these IPs in situations when my server is being abused this way (by trying several "common" URLs)?

    Read the article

  • How do bots access directories on a server that are not DocumentRoot of public IP address? How do I stop them?

    - by tmsimont
    I have a local network set up with apache2 and "named" running on OpenSuse 13.1 Linux. I used the "named" service to use my computer as a domain server. I set up my router to point to ask my computer for domain lookups, so I have a chance to have it rewrite a bunch of domains on my network to its own local IP, 192.168.0.111 This works great. I use virtual host configuration to allow various domains and subdomains (re-routed to the same IP via named) to pull up different directories in my computer. For example: <VirtualHost *:80> ServerName 192.168.0.111 ServerAlias fmb.wa.net DocumentRoot /home/work/wa.net/fmb </VirtualHost> <VirtualHost *:80> ServerName 192.168.0.111 ServerAlias postrecord.wa.net DocumentRoot /home/work/wa.net/postrecord </VirtualHost> <VirtualHost *:80> ServerName 192.168.0.111 ServerAlias cvalley.wa.net DocumentRoot /home/work/wa.net/cvalley_local </VirtualHost> This makes it possible for me to hit cvalley.wa.net from any device in my network and get the site that lives in /home/work/wa.net/cvalley_local I decided to forward port 80 to this computer, so I could share a few development sites with coworkers. I can't control which site they see with the same named service, because they'd have to use my computer as their domain name server... So I added a line like this: <VirtualHost *:80> ServerName 192.168.0.111 ServerAlias MY.IP.XXX.XX DocumentRoot /home/work/wa.net/cvalley </VirtualHost> Where "MY.IP.XXX.XX" is my public IP address. This works as expected, when you hit my IP address from a public network you see the site that lives in /home/work/wa.net/cvalley. The point of confusion that I have is that there are public IP addresses in my logs in other sites. I would have expected it to be impossible to access other sites in my network, unless the public user somehow figured out what I'm calling my ServerAliases, and is mimicing my domain set up... How can public traffic be hitting my other local sites? How can I recreate this kind of access? Here are some examples of public IP's hitting my VirtualHost sites: 162.253.66.76 - - [15/Aug/2014:19:20:47 -0600] "GET /xmlrpc.php HTTP/1.0" 404 1004 "-" "-" 162.253.66.74 - - [16/Aug/2014:10:50:28 -0600] "GET / HTTP/1.0" 200 262 "-" "masscan/1.0 (https://github.com/robertdavidgraham/masscan)" 185.4.227.194 - - [16/Aug/2014:11:16:45 -0600] "GET http://24x7-allrequestsallowed.com/?PHPSESSID=1rysxtj500143WQMVT%5E_NAZ%5BQ HTTP/1.1" 200 262 "-" "-" 101.226.254.138 - - [16/Aug/2014:13:32:14 -0600] "HEAD / HTTP/1.0" 200 - "-" "-" 162.253.66.74 - - [16/Aug/2014:14:26:19 -0600] "GET / HTTP/1.0" 200 262 "-" "masscan/1.0 (https://github.com/robertdavidgraham/masscan)" 212.129.2.119 - - [16/Aug/2014:16:00:51 -0600] "HEAD / HTTP/1.0" 200 - "-" "-" 91.240.163.111 - - [16/Aug/2014:18:34:32 -0600] "GET / HTTP/1.0" 200 262 "-" "masscan/1.0 (https://github.com/robertdavidgraham/masscan)" 162.253.66.74 - - [16/Aug/2014:19:02:53 -0600] "GET / HTTP/1.0" 200 262 "-" "masscan/1.0 (https://github.com/robertdavidgraham/masscan)" 122.226.223.69 - - [17/Aug/2014:05:53:09 -0600] "GET http://www.k2proxy.com//hello.html HTTP/1.1" 404 1006 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; Trident/6.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E)" ::1 - - [17/Aug/2014:10:19:26 -0600] "OPTIONS * HTTP/1.0" 200 - "-" "Apache/2.4.6 (Linux/SUSE) OpenSSL/1.0.1e PHP/5.4.20 (internal dummy connection)" 162.209.65.196 - - [17/Aug/2014:15:31:53 -0600] "HEAD / HTTP/1.0" 200 - "-" "-" 111.206.199.163 - - [18/Aug/2014:11:12:56 -0600] "HEAD / HTTP/1.0" 200 - "-" "-" 37.187.180.168 - - [18/Aug/2014:15:40:00 -0600] "HEAD / HTTP/1.0" 200 - "-" "-" 62.210.38.226 - - [18/Aug/2014:18:35:16 -0600] "HEAD / HTTP/1.0" 200 - "-" "-" Is there anything that I can do to reliably deny public access by default, but allow it only in one VirtualHost?

    Read the article

  • Open Source AI Bot interfaces

    - by David Young
    What are some open source AI Bot interfaces? Similar to Pogamut 3 GameBots2004 for custom Unreal Tournament bots or Brood Wars API for Starcraft bots etc. If you could please post one AI bot interface per answer (make sure to provide a link) and give a brief summary as to the content of the blog posts. Please include what type of bot interface structure it is, client/server, server/server, etc e.g. BWAPI is client/server which emulates a real player

    Read the article

  • Spam bot constantly hitting our site 800-1,000 times a day. Causing loss in sales

    - by akaDanPaul
    For the past 5 months our site has been receiving hits from these 4 sites below; sheratonbd.com newsheraton.com newsheration.com newsheratonltd.com Typically the exact url they come from looks something like this; http://www.newsheraton.com/ClickEarnArea.aspx?loginsession_expiredlogin=85 The spam bot goes to our homepage and stays there for about 1 min and then exist. Luckily we have some pretty beefy servers so it hasn't even come close to overloading our servers yet. Last month I started blocking the IP address's of the spam bots but they seem to keep getting new ones everyday. So far I have blocked over 200 IP address's, below are a few of the ones I have blocked. They all come from Bangladesh. 58.97.238.214 58.97.149.132 180.234.109.108 180.149.31.221 117.18.231.5 117.18.231.12 Since this has been going on for the past 5 months our real site traffic has started to drop, and everyday our orders get lower and lower. Also since these spam bots simply go to our homepage and then leave our bounce rate in analytics has sky rocketed. My questions are; Is it possible that these spam bots are affecting our SEO? 60% of our orders come from natural search, and since this whole thing has started orders have slowly been dropping. What would be the reason someone would want to waste resources in doing this to our site? IP's aren't free and either are domain names, what would be the goal in doing this to us? We have google adwords but don't advertise on extended networks nor advertise in Bangladesh since we don't ship there so they are not making money on adsense. Has anyone experienced anything similar to this? What did you do and what was the final out come?

    Read the article

  • SEO effect of “You are leaving this site” page for outbound links?

    - by Timo Huovinen
    The problem I am working on an aggregation website that collects reviews about specific products from various websites. The site has many thousands of outbound links (with "nofollow" attributes) to the content source websites where the reviews were collected from. The site has far more outbound links than inbound links and I have read that this is bad for SEO. The question Would adding an intermediate «You are leaving this site» disclaimer/warning page like this hurt search engine rankings? And can you provide any links about this topic? p.s. The exit page would be a POST form instead of a script, that notifies the user that he/she is leaving this site and provides a button to continue to the other website. p.p.s This kind of idea is implemented on many forums, aggregation websites with the purpose of warning the user that he/she is leaving this site and to block search engine bots from following those links because search bots do not submit forms.

    Read the article

  • Google Developers SXSW LEGO Rumble

    Google Developers SXSW LEGO Rumble The Google Developers LEGO® MINDSTORMS® rumble returns to SXSW this year with even more epic proportions. After teams spend the day building LEGO race bots controlled by Android, the bots will compete in the ultimate showdown to determine the victors. We'll be broadcasting live the main event with multiple camera angles, slow-mo replay, interviews with the teams, and commentary from judges and attendees to give you an insider pass to all the action. You won't want to miss this showdown. More information can be found at: www.google.com From: GoogleDevelopers Views: 11238 182 ratings Time: 01:37:01 More in Entertainment

    Read the article

  • How do I block a user-agent from Apache

    - by rubo77
    How do I realize a UA string block by regular expression in the config files of my Apache webserver? For example: if I would like to block out all bots from Apache on my debian server, that have the regular expression /\b\w+[Bb]ot\b/ or /Spider/ in their user-agent. Those bots should not be able to see any page on my server and they should not appear neither in the accesslogs nor in the errorlogs. http://global-security.blogspot.de/2009/06/how-to-block-robots-before-they-hit.html supposes to uses mod_security for that, but isn't there a simple directive for http.conf?

    Read the article

  • First Person Shooter game agent development

    - by LangerHansIslands
    I would like to apply (program) the Artificial intelligence methods to create a intelligent game bots for a first person shooter game. Do you have any knowledge from where can I start to develop as a Linux user? Do you have a suggestion for an easy-to-start game for which I can develop bots easily, caring more about the result of my algorithms rather than spending a lot of time dealing with the game code? I've read some publications about the applied methods to Quake 3 (c) and Open Arena. But I couldn't find the source codes and manuals describing how to start coding( for compiling, developing ai and etc.). I appreciate your help.

    Read the article

  • Where can I find an exhaustive list of meta tags and what they do?

    - by leeand00
    It seems to me that there are a ton of <meta> tags for all sorts of different purposes out there... Though they all follow a similar format of <meta name="" content="" /> they seem to serve a vast variety of different purposes from controlling the crawling of search engine bots, providing search engine bots with descriptions of pages, to making sure a page display correctly on a mobile device. These tags fall into so many different categories I was wondering if anyone had a wiki or master list of possible meta tags and their content.

    Read the article

  • How does bing-bot( is that the right spider-name? ) and googlebot interpret 301 redirect?

    - by jbcurtin
    I've been looking for documentation on how the Microsoft and Google bots interpret 301 redirects. It seems that google-bot stores documents on a url based index system. But I haven't been able to figure out how bing works. Should I assume that they are still working towards coping everyone else and assume they use an algorithm close to google? Is it best to just forward a page to a new location via Javascript? I think this might be a blackhat trick, but how would I tell the bots that it's not? Is 301 redirect my best option and I just have to bit the bullet because said pages are no longer in existence? What other options do I have that I might not be aware of?

    Read the article

  • when will google revert back page rank after i cleared network unrechable error

    - by Jayapal Chandran
    For the past one month i was getting network unreachable error. I contacted my web hosting and they said that google bots were blocked if it were causing more traffic. And then they witelisted google bots. Now the errors did not appear but my ranking and search results went down to more than 6 pages or they did not appear at all. Now google is able to read my robots and sitemap. Just yesterday. when will search results and page rank gets to its previous positions? like it were before a month? Most links did not appear in google search result.

    Read the article

  • Does the user agent in any regular browser contain 'bot' or 'crawl'?

    - by Echo
    Does the user agent in any regular browser contain 'bot' or 'crawl'? I check the user agent on my site to see if it is coming from a bot or not. If it is, I can do some little optimizations since they don't login. (I don't change the content at all) After adding checks for 30-40+ bots, I'm getting tired of added them. So I was wondering if checking if it just contains 'bot' or 'crawl'. I know that wont get all bots, but it would get a lot of them. But if that could cause any false positives, then it would totally mess up the ability to add to cart, place an order, and login in.

    Read the article

  • Realtime Twitter Replies?

    - by ejunker
    I have created Twitter bots for many geographic locations. I want to allow users to @-reply to the Twitter bot with commands and then have the bot respond with the results. I would like to have the bot reply to the user as quickly as possible (realtime). Apparently, Twitter used to have an XMPP/Jabber interface that would provide this type of realtime feed of replies but it was shut down. As I see it my options are to use one of the following: REST API This would involve polling every X minutes for each bot. The problem with this is that it is not realtime and each Twitter account would have to be polled. Search API The search API does allow specifying a "-to" parameter in the search and replies to all bots could be aggregated in a search such as "-to bot1 OR -to bot2...". Though if you have hundreds of bots then the search string would get very long and probably exceed the maximum length of a GET request. Streaming API The streaming API looks very promising as it provides realtime results. The API allows you to specify a follow and track parameters. follow is not useful as the bot does not know who will be sending it commands. track allows you to specify keywords to track. This could possibly work by creating a daemon process that connects to the Streaming API and tracks all references to the bot's names. Once again since there are lots of bots to track the length and complexity of the query may be an issue. Another idea would be to track a special hashtag such as #botcommand and then a user could send a command using this syntax @bot1 weather #botcommand. Then by using the Streaming API to track all references to #botcommand would give you a realtime stream of all the commands. Further parsing could then be done to determine which bot to send the command to. Third-party service Are there any third-party companies that have access to the Twitter firehouse and offer realtime data? I haven't investigated these, but here are a few that I have found: Gnip Tweet.IM excla.im TwitterSpy - seems to use polling, not realtime I'm leaning towards using the Streaming API. Is there a better way to get near realtime @-replies for many (hundreds) of Twitter accounts?

    Read the article

  • Developing Bots__Where can I start?

    - by user947659
    i have a pet programming goal: to develop a bot for a game. Now, this won't be anything malicious, and I just want to do this to further my knowledge in programming. Can anyone help me out by pointing to where i can start learning how to develop a bot? the type of bots i want to make are video game bots (online multiplayer, first person shooters, and offline games(like solitaire and such)). Thanks!

    Read the article

  • Capturing Drupal7 DOM content before page load for comparison

    - by ehime
    We have an MU (Multisite) installation of Drupal7 here at work, and are trying to temporarily hold back the swarm of bots we receive until we get a chance to load our content. I wrote a quick and and dirty script to send 503 headers if we find a certain criteria in Xpath (This can ALSO be done as a strpos/preg_match if DOM is not formed). In order to get the ball rolling though I need to figure out how to either A) Hijack the Drupal7 bootstrap and pull all content through this filter below B) ob_flush content through the filter before content is loaded The issue that I am having is figuring out exactly where I can catch the content at? I thought that index.php in Drupal7 would be the suspect, but I'm a little confused as to where or how I should capture the contents. Here's the script, and hopefully someone can point me in the right direction. //error_reporting(-1); /* start query */ $dom = new DOMDocument; $dom->preserveWhiteSpace = false; $dom->Load($_SERVER['PHP_SELF']); $xpath = new DOMXPath($dom); //if this exists we aren't ready to be read by bots $query = $xpath->query(".//*[@id='block-views-about-this-site-block']/div/div/div"); //or $query = 'klat-badge'; //if this is a string not DOM /* end query */ if(strpos($query) !== false) { //require banlist require('botlist.php'); $str = strtolower('/'.implode('|', array_unique($list)).'/i'); if(preg_match($str, strtolower($_SERVER['HTTP_USER_AGENT']))) { //so tell bots we're broken header('HTTP/1.1 503 Service Temporarily Unavailable'); header('Status: 503 Service Temporarily Unavailable'); exit; } }

    Read the article

  • PHP Script causing high CPU

    - by user20996
    I have a website which is causing me a major headache. There is one PHP script which is using far to much CPU, this only seem to happen when bots hit the site, I dont want to block all the bots because we need them. I have the the process manager output: Pid Owner Priority CPU % Memory % Command 16943 (Trace) (Kill) specialone 0 99.4 1.0 /usr/bin/php /var/www/specialone/page.php I ran strace -p 16943 on the process but it comes up with nothing. We have 2GB of RAM and the php memory_limit is set to 128M which should be enough. The trouble I have is the PHP code is a Framework and the culprit page.php pulls in a lot of other PHP files so I cant debug the PHP. Is there any way of finding out what the script is doing when its using so much CPU which will help me solve it?

    Read the article

  • Google Analytics and Whos.amung.us in realtime visitors, why such an enormous discrepancy?

    - by jacouh
    Since years I use in a site both Google Analytics and Whos.amung.us, both Google analytics and whos.amung.us javascripts are inserted in the same pages in the tracked part of the site. In real-time visitors, why such an enormous discrepancy ? for example at the moment, Google analytics gives me 9 visitors, whos.amung.us indicates 59, a ratio of 6 times? Why whos.amung.us is 6 times optimistic than Google Analytics in terms of the realtime visitors? Google whos.amung.us My question is: whos.amung.us does not detect robots while Google does? GA ignores visitors from some countries, not whos.amung.us? Some robots/bots execute whos.amung.us javascript for tracking? While no robots/bots can execute the tracking javascript provided by Google Analytics? To facilitate your analysis, I copy JS code used below: Google analytics: <script type="text/javascript"> var _gaq = _gaq || []; _gaq.push(['_setAccount', 'MyGaAccountNo']); _gaq.push(['_trackPageview']); (function() { var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true; ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s); })(); </script> Whos.amung.us: <script>var _wau = _wau || []; _wau.push(["tab", "MyWAUAccountNo", "c6x", "right-upper"]);(function() { var s=document.createElement("script"); s.async=true; s.src="http://widgets.amung.us/tab.js";document.getElementsByTagName("head")[0].appendChild(s);})();</script> I've aleady signaled this to WAU staff some time ago, NR, I've not done this to Google as they don't handle this kind of feedback. Thank you for your explanations.

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10  | Next Page >