Search Results

Search found 160 results on 7 pages for 'googlebot'.

Page 4 of 7

  • How to make googlebot crawl a page? [closed]

    - by mamadum
    What is the best way of getting Googlebot to crawl a page? I've set up Google Analytics and registered the site with Google Webmaster Tools. I've done a great deal of SEO work on the website with keywords and titles, and I took care of the microdata. I submitted the site anonymously, and a couple of days ago I successfully fetched the site and submitted it for indexing, but still nothing. The last time Googlebot visited the site was almost a month ago, and the indexed content is now obsolete. Am I missing something, or is it just a slow process?
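    One gentle nudge beyond Webmaster Tools' "Fetch as Google" is to ping Google with the sitemap whenever content changes. A minimal sketch in Python, assuming a hypothetical sitemap location (the ping only asks Google to re-read the sitemap; it does not guarantee a crawl or faster indexing):

        import urllib.parse
        import urllib.request

        # Hypothetical sitemap URL -- substitute the site's real sitemap location.
        SITEMAP_URL = "http://www.example.com/sitemap.xml"

        # Google's sitemap ping endpoint: it requests a re-fetch of the sitemap only.
        ping = "http://www.google.com/ping?sitemap=" + urllib.parse.quote(SITEMAP_URL, safe="")

        with urllib.request.urlopen(ping) as response:
            print(response.status)  # 200 means the ping was received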

    Read the article

  • JS dynamic img change and SEO

    - by Gusepo
    I've built a web site using jQuery to make nice transitions between content. The code works this way: there are two images (body and footer), and when I click on a link, instead of going to another page, I fade out the two images and change their src attributes. When the new images are loaded I fade them back in. I'm using SWFAddress to let users go directly to internal content. Now I'd like to get my content indexed by Google and other search engines. All the text content is inside the images, so I've put the text in the ALT attributes. My question is: if I dynamically change the images' ALT attributes using JS, will spiders be able to read them properly? Consider that I'm using SWFAddress to create a sitemap. Thanks

    Read the article

  • 302 vs 301 redirect in this specific case

    - by Binder
    We have a website that displays information in a location-based manner, i.e. it detects the IP of the visiting user and redirects him/her to an appropriate landing page; e.g. a user coming from Egypt will be redirected to http://www.mysite.com/egypt/cairo and a user visiting from Dubai will be redirected to http://www.mysite.com/uae/dubai, and so on, as we cater to multiple locations in the Middle East. Now, we have been advised by our SEO consultant that we should put a 301 (permanent redirect) on http://www.mysite.com pointing to http://www.mysite.com/ksa/riyadh. I would like to know the negative implications this would have on Google indexing or otherwise, as I fundamentally disagree with this suggestion and believe that in a situation like this a 302 redirect would be more appropriate.
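    For reference, the only difference between the two suggestions is the status code the root URL returns; the geo-detection itself is unchanged. A minimal sketch of the mechanism as a Python WSGI app, where lookup_country() is a hypothetical stand-in for whatever GeoIP library the site actually uses:

        # Sketch: redirect the root URL to a country-specific landing page.
        LANDING_PAGES = {
            "EG": "http://www.mysite.com/egypt/cairo",
            "AE": "http://www.mysite.com/uae/dubai",
        }
        DEFAULT = "http://www.mysite.com/ksa/riyadh"

        def lookup_country(ip):
            # Hypothetical placeholder; a real implementation would query a GeoIP database.
            return "EG"

        def app(environ, start_response):
            country = lookup_country(environ.get("REMOTE_ADDR", ""))
            target = LANDING_PAGES.get(country, DEFAULT)
            # 302 Found marks the redirect as temporary, so search engines keep the root
            # URL itself in the index; a 301 would tell them the root has permanently
            # moved to the single target page.
            start_response("302 Found", [("Location", target)])
            return [b""]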

    Read the article

  • Anonymous users support vs Google bot

    - by Andy
    I have a User class in my web app that represents the user currently logged in. Every time a user visits a page, a User instance is populated based on authentication data supplied in cookies. A User instance is created even if an anonymous user arrives - and a corresponding new record is created in the User table in the database. This approach allows me to save some state info for the current user regardless of type. The problem with this approach, however, is the Google bot and other non-human web organisms crawling my pages. Every time a bot starts to walk around the site, thousands of useless records are created in the database, each of them used for only a single page. Question: what is the best trade-off? How do I support anonymous users and save their state without incurring too much overhead because of cookieless bots?
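    One common compromise is to make the database record lazy: create it only on the first write of real state, and never for requests whose user agent looks like a crawler. A rough framework-agnostic sketch in Python; the save_to_db() helper and the class shape are made up for illustration:

        BOT_SIGNATURES = ("googlebot", "bingbot", "slurp", "spider", "crawler")

        def is_bot(user_agent):
            ua = (user_agent or "").lower()
            return any(sig in ua for sig in BOT_SIGNATURES)

        class User:
            def __init__(self, user_agent):
                self.user_agent = user_agent
                self.db_id = None   # no database row yet
                self.state = {}

            def set_state(self, key, value):
                self.state[key] = value
                # Persist lazily: bots never trigger a write, and ordinary anonymous
                # visitors only cost a row once they have state worth keeping.
                if self.db_id is None and not is_bot(self.user_agent):
                    self.db_id = save_to_db(self.state)   # hypothetical persistence call

        def save_to_db(state):
            ...  # placeholder for the real INSERT; should return the new row's id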

    Read the article

  • How do bingbot (is that the right spider name?) and googlebot interpret 301 redirects?

    - by jbcurtin
    I've been looking for documentation on how the Microsoft and Google bots interpret 301 redirects. It seems that Googlebot stores documents in a URL-based index, but I haven't been able to figure out how Bing works. Should I assume that they are still working towards copying everyone else and use an algorithm close to Google's? Is it best to just forward a page to its new location via JavaScript? I think this might be seen as a blackhat trick, but how would I tell the bots that it's not? Is a 301 redirect my best option, and do I just have to bite the bullet because said pages no longer exist? What other options do I have that I might not be aware of?

    Read the article

  • Suspicious crawler activity

    - by ithkuil
    I'm noticing accesses like these in my logs:

        66.249.66.198 - - [01/Jul/2011:17:13:46 +0200] "GET /img/clip.incubus.torrent.phtml HTTP/1.1" 404 143 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
        66.249.66.198 - - [01/Jul/2011:17:13:48 +0200] "GET /img/clip.global.deejays.download.phtml HTTP/1.1" 404 143 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

    These files don't exist, and there is no file on my site with this content (I hope). Why is Googlebot trying out these links? Reverse DNS and whois state that 66.249.66.198 really is Googlebot.
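    For anyone repeating that check, the verification Google recommends is a reverse DNS lookup on the IP followed by a forward lookup on the resulting name, confirming the name ends in googlebot.com or google.com and resolves back to the same IP. A small sketch in Python:

        import socket

        def is_real_googlebot(ip):
            """Reverse-resolve the IP, check the domain, then forward-resolve to confirm."""
            try:
                host = socket.gethostbyaddr(ip)[0]   # e.g. crawl-66-249-66-198.googlebot.com
            except socket.herror:
                return False
            if not host.endswith((".googlebot.com", ".google.com")):
                return False
            try:
                return ip in socket.gethostbyname_ex(host)[2]   # forward lookup must match
            except socket.gaierror:
                return False

        print(is_real_googlebot("66.249.66.198"))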

    Read the article

  • 'Buy the app' landing page implementations

    - by benwad
    My site (using Django) has an app that I'm trying to push. I currently have a piece of middleware that redirects the user to a page advertising the app if they're accessing the site on an iPhone, and then sets a cookie so that the user isn't bugged by the message every time they visit. This works fine; however, checking the page with the mobile Googlebot checker shows that the Googlebot gets stuck in the redirect (since it doesn't store cookies) and therefore won't index the proper content. So I'm trying to think of an alternative implementation that won't hurt the site's Google ranking and won't have any other adverse effects. I've considered a couple of options. Option 1: redirect (the current solution), but don't redirect if the user agent matches Googlebot's UA string (a sketch of this follows below). This would be ideal; however, I'm not sure whether Google likes its bot being treated differently from other users, and I'm afraid the site's ranking may somehow be penalised if I go ahead with this. Option 2: use a JavaScript popup instead of a redirect. This would make sure the Googlebot finds the content it needs; however, I can see this approach causing compatibility issues with the myriad mobile devices/browsers out there, and it may affect page load time. How valid are these options, and is there a better way to implement this feature? I've tried researching this but surprisingly can't find any reputable-looking blog posts that explore the topic. EDIT: I posted this on SF because it seemed unsuitable for SO, but if there's another site that would be better for this issue then I'd be happy to move the question elsewhere.
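    A rough sketch of option 1 as old-style Django middleware, with the redirect skipped for common crawler user agents. The class name, promo URL and cookie name are made up for illustration, and whether Google would treat the user-agent check as cloaking is exactly the open question above:

        from django.http import HttpResponseRedirect

        CRAWLER_SIGNATURES = ("googlebot", "bingbot", "slurp")
        PROMO_URL = "/get-the-app/"        # hypothetical promo landing page
        COOKIE_NAME = "seen_app_promo"     # hypothetical cookie set by the promo page

        class AppPromoMiddleware(object):
            """Send iPhone visitors to the app promo once, but never redirect crawlers."""

            def process_request(self, request):
                ua = request.META.get("HTTP_USER_AGENT", "").lower()
                if any(sig in ua for sig in CRAWLER_SIGNATURES):
                    return None    # let bots straight through to the real content
                if "iphone" not in ua or request.COOKIES.get(COOKIE_NAME):
                    return None    # not an iPhone, or the promo has already been shown
                if request.path == PROMO_URL:
                    return None    # avoid a redirect loop on the promo page itself
                return HttpResponseRedirect(PROMO_URL)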

    Read the article

  • Is it possible to tell a search engine not to index a specific section of an HTML page? [closed]

    - by Justin
    Possible Duplicate: Preventing robots from crawling specific part of a page. I know you can use robots.txt to exclude entire pages or sections of your site, but is there a way to tell crawlers like the Googlebot to ignore specific sections of an HTML page? I found a blog post that discusses one method, but it appears to work only for the Google Search Appliance, not the Googlebot. Is there some method, at least for Google, to do this?

    Read the article

  • Protecting Apache with Fail2Ban

    - by NetStudent
    Having checked my Apache logs for the last two days, I have noticed several attempts to access URLs such as /phpmyadmin and /phpldapadmin:

        121.14.241.135 - - [09/Jun/2012:04:37:35 +0100] "GET /w00tw00t.at.blackhats.romanian.anti-sec:) HTTP/1.1" 404 415 "-" "ZmEu"
        121.14.241.135 - - [09/Jun/2012:04:37:35 +0100] "GET /phpMyAdmin/scripts/setup.php HTTP/1.1" 404 405 "-" "ZmEu"
        121.14.241.135 - - [09/Jun/2012:04:37:35 +0100] "GET /phpmyadmin/scripts/setup.php HTTP/1.1" 404 404 "-" "ZmEu"
        121.14.241.135 - - [09/Jun/2012:04:37:36 +0100] "GET /pma/scripts/setup.php HTTP/1.1" 404 399 "-" "ZmEu"
        121.14.241.135 - - [09/Jun/2012:04:37:36 +0100] "GET /myadmin/scripts/setup.php HTTP/1.1" 404 403 "-" "ZmEu"
        121.14.241.135 - - [09/Jun/2012:04:37:37 +0100] "GET /MyAdmin/scripts/setup.php HTTP/1.1" 404 403 "-" "ZmEu"
        66.249.72.235 - - [09/Jun/2012:07:11:06 +0100] "GET /robots.txt HTTP/1.1" 404 430 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
        66.249.72.235 - - [09/Jun/2012:07:11:06 +0100] "GET / HTTP/1.1" 200 424 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
        188.132.178.34 - - [09/Jun/2012:08:39:05 +0100] "HEAD /manager/html HTTP/1.0" 404 166 "-" "-"
        95.108.150.235 - - [09/Jun/2012:09:42:09 +0100] "GET /robots.txt HTTP/1.1" 404 432 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
        95.108.150.235 - - [09/Jun/2012:09:42:09 +0100] "GET /robots.txt HTTP/1.1" 404 432 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
        95.108.150.235 - - [09/Jun/2012:09:42:10 +0100] "GET / HTTP/1.1" 200 424 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
        95.108.150.235 - - [09/Jun/2012:09:42:10 +0100] "GET / HTTP/1.1" 200 424 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
        95.108.150.235 - - [09/Jun/2012:09:42:11 +0100] "GET / HTTP/1.1" 200 424 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
        95.108.150.235 - - [09/Jun/2012:09:42:11 +0100] "GET / HTTP/1.1" 200 424 "-" "Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"
        194.128.132.2 - - [09/Jun/2012:16:04:41 +0100] "HEAD / HTTP/1.0" 200 260 "-" "-"
        66.249.68.176 - - [09/Jun/2012:18:08:12 +0100] "GET /robots.txt HTTP/1.1" 404 430 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
        66.249.68.176 - - [09/Jun/2012:18:08:13 +0100] "GET / HTTP/1.1" 200 424 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
        212.3.106.249 - - [09/Jun/2012:18:12:33 +0100] "GET / HTTP/1.1" 200 388 "-" "-"
        212.3.106.249 - - [09/Jun/2012:18:12:34 +0100] "GET /phpldapadmin/ HTTP/1.1" 404 379 "-" "-"
        212.3.106.249 - - [09/Jun/2012:18:12:34 +0100] "GET /phpldapadmin/htdocs/ HTTP/1.1" 404 386 "-" "-"
        212.3.106.249 - - [09/Jun/2012:18:12:35 +0100] "GET /phpldap/ HTTP/1.1" 404 374 "-" "-"
        212.3.106.249 - - [09/Jun/2012:18:12:36 +0100] "GET /phpldap/htdocs/ HTTP/1.1" 404 381 "-" "-"
        212.3.106.249 - - [09/Jun/2012:18:12:36 +0100] "GET /admin/ HTTP/1.1" 404 372 "-" "-"
        212.3.106.249 - - [09/Jun/2012:18:12:38 +0100] "GET /admin/ldap/ HTTP/1.1" 404 377 "-" "-"
        212.3.106.249 - - [09/Jun/2012:18:12:38 +0100] "GET /admin/ldap/htdocs/ HTTP/1.1" 404 384 "-" "-"
        212.3.106.249 - - [09/Jun/2012:18:12:38 +0100] "GET /admin/phpldap/ HTTP/1.1" 404 380 "-" "-"
        212.3.106.249 - - [09/Jun/2012:18:12:39 +0100] "GET /admin/phpldap/htdocs/ HTTP/1.1" 404 387 "-" "-"
        212.3.106.249 - - [09/Jun/2012:18:12:39 +0100] "GET /admin/phpldapadmin/htdocs/ HTTP/1.1" 404 392 "-" "-"
        212.3.106.249 - - [09/Jun/2012:18:12:40 +0100] "GET /admin/phpldapadmin/ HTTP/1.1" 404 385 "-" "-"
        212.3.106.249 - - [09/Jun/2012:18:12:40 +0100] "GET /openldap HTTP/1.1" 404 374 "-" "-"
        212.3.106.249 - - [09/Jun/2012:18:12:41 +0100] "GET /openldap/htdocs HTTP/1.1" 404 381 "-" "-"
        212.3.106.249 - - [09/Jun/2012:18:12:42 +0100] "GET /openldap/htdocs/ HTTP/1.1" 404 382 "-" "-"
        212.3.106.249 - - [09/Jun/2012:18:12:44 +0100] "GET /ldap/ HTTP/1.1" 404 371 "-" "-"
        212.3.106.249 - - [09/Jun/2012:18:12:44 +0100] "GET /ldap/htdocs/ HTTP/1.1" 404 378 "-" "-"
        212.3.106.249 - - [09/Jun/2012:18:12:45 +0100] "GET /ldap/phpldapadmin/ HTTP/1.1" 404 384 "-" "-"
        212.3.106.249 - - [09/Jun/2012:18:12:46 +0100] "GET /ldap/phpldapadmin/htdocs/ HTTP/1.1" 404 391 "-" "-"

    Is there any way I can use Fail2Ban or any other similar software to ban these IPs when my server is being abused this way (by requests for several "common" URLs)?
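    For what it's worth, this is the sort of thing Fail2Ban is built for: a filter that matches the probe requests in the access log and a jail that bans the offending IPs. A sketch, assuming the Apache access log lives at /var/log/apache2/access.log and using a made-up filter name; the failregex should be tuned to the probes actually seen so that legitimate crawlers hitting the odd 404 don't get banned:

        # /etc/fail2ban/filter.d/apache-probes.conf   (hypothetical filter name)
        [Definition]
        failregex = ^<HOST> .* "(GET|POST|HEAD) /(phpMyAdmin|phpmyadmin|pma|myadmin|MyAdmin|phpldapadmin|w00tw00t).*" 404
        ignoreregex =

        # /etc/fail2ban/jail.local
        [apache-probes]
        enabled  = true
        port     = http,https
        filter   = apache-probes
        logpath  = /var/log/apache2/access.log
        maxretry = 3
        findtime = 600
        bantime  = 86400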

    Read the article

  • [Disallow: /index.php] seems to block /my-beautiful-sef-url-123

    - by Jaroslav Záruba
    I have a robots.txt that looks like this:

        User-agent: *
        Disallow: /system/
        Disallow: /admin/
        Disallow: /index.php

    The obvious goal is to prevent all the ugly URLs from being indexed, as they all begin with "/index.php". But for some reason all URLs like /my-beautiful-sef-url-123 are listed under Crawl errors in Google Webmaster Tools with "URL restricted by robots.txt". (When I test such a URL it yields Allowed for both Googlebot and Googlebot-Mobile.) Can anyone help, please?

    Read the article

  • Is hiding content with JavaScript or "text-indent: -9999px" bad for SEO?

    - by Samuel
    So apparently hiding content using "display: none" is bad for SEO and is seen by Googlebot as deceptive, according to a lot of the posts I've read online and questions on this site. But what if I hide keyword-rich text using JavaScript? A jQuery example:

        $(function() {
            $('#keywordRichTextContainer').hide();
        });

    Or using visibility: hidden:

        $(function() {
            $('#keywordRichTextContainer').css({
                visibility: 'hidden',
                position: 'absolute'
            });
        });

    Would any of these techniques cause my site to be penalized? If Googlebot can't read JavaScript, then if I'm hiding content through JS it shouldn't know, right? What about using "text-indent: -9999px"?

    Read the article

  • Webmaster Tools word count

    - by Henrik Erlandsson
    Is there a way to verify that Googlebot finds the headings and the content, for example by word count? I'm asking because I tried a program called Screaming Frog, which fails to even fetch the first h1 on a validated page - for about a third of all the pages(!) - which made me unsure of it. Even though the site looks hunky-dory in Webmaster Tools, I'd like to know what a Googlebot-like content crawler finds on my page and in what order. Any tips on such tools are appreciated. This is not about keyword count.
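    A rough way to approximate what a non-JavaScript crawler sees is to fetch the raw HTML yourself with a Googlebot-like user agent and list the headings, in document order, plus an approximate word count. A minimal sketch in Python, with the URL as a placeholder:

        import re
        import urllib.request
        from html.parser import HTMLParser

        class HeadingLister(HTMLParser):
            """Collect h1-h6 text in document order from raw HTML (no JavaScript executed)."""
            def __init__(self):
                super().__init__()
                self.current = None
                self.headings = []

            def handle_starttag(self, tag, attrs):
                if re.fullmatch(r"h[1-6]", tag):
                    self.current = tag
                    self.headings.append((tag, ""))

            def handle_endtag(self, tag):
                if self.current == tag:
                    self.current = None

            def handle_data(self, data):
                if self.current:
                    tag, text = self.headings[-1]
                    self.headings[-1] = (tag, (text + " " + data.strip()).strip())

        url = "http://www.example.com/"   # placeholder for the page to check
        req = urllib.request.Request(url, headers={
            "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"})
        html = urllib.request.urlopen(req).read().decode("utf-8", errors="replace")

        lister = HeadingLister()
        lister.feed(html)
        for tag, text in lister.headings:
            print(tag, text)
        print("approx. word count:", len(re.sub(r"<[^>]+>", " ", html).split()))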

    Read the article

  • Best way to redirect users back to the pretty URL who land on the _escaped_fragment_ one?

    - by Ryan
    I am working on an AJAX site and have successfully implemented Google's AJAX crawling recommendation by creating _escaped_fragment_ versions of each page for it to index. Thus each page has two URLs:

        pretty: example.com#!blog
        ugly: example.com?_escaped_fragment_=blog

    However, I have noticed in my analytics that some users are arriving on the site via the "ugly" URL, and I am looking for a clean way to redirect them to the pretty URL without impacting Google's ability to index the site. I have considered using a 301 redirect in the head, but I fear that Googlebot might try to follow it and end up in an endless loop. I have also considered using a JavaScript redirect that Googlebot wouldn't execute, but I fear that Google may interpret this as cloaking and penalize the website. Is there a good, clean, acceptable way to redirect real users away from the ugly URL if for some reason or another they end up arriving at the site that way?

    Read the article

  • Disable outbound links without letting others know

    - by tadoman
    Is there a way I can tell Google not to follow external links (pointing to other sites) without letting others know? I know you can disable outbound links by putting rel=nofollow on them or rules in robots.txt, but that's something others can see as well. I'm just wondering if there's a way to tell Google not to follow those links without letting others know - like a setting in Webmaster Tools or something similar. (There's definitely one way: I could set an exception in my server's conf file to check whether the user agent is "googlebot" and then serve a different version of robots.txt, so that when a different user checked it they would get a different robots.txt than the one served to Googlebot. However, I'm not too sure Google would be happy about this.) Thank you

    Read the article

  • Blocking 'good' bots in nginx with multiple conditions for certain off-limits URLs where humans can go

    - by Glenn Plas
    After two days of searching/trying/failing I decided to post this here; I haven't found any example of someone doing the same, and what I tried doesn't seem to be working. I'm trying to send a 403 to bots that don't respect the robots.txt file (even after downloading it several times) - specifically Googlebot. The site serves the following robots.txt definition:

        User-agent: *
        Disallow: /*/*/page/

    The intent is to allow Google to browse whatever they can find on the site but to return a 403 for the following type of request. Googlebot seems to keep on nesting these links eternally, adding paging block after block:

        my_domain.com:80 - 66.x.67.x - - [25/Apr/2012:11:13:54 +0200] "GET /2011/06/page/3/?/page/2//page/3//page/2//page/3//page/2//page/2//page/4//page/4//page/1/&wpmp_switcher=desktop HTTP/1.1" 403 135 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

    It's a WordPress site, by the way. I don't want those pages to show up; even though after the robots.txt info got through they stopped for a while, they began crawling again later. It just never stops. I do want real people to see these pages. As you can see, Google gets a 403, but when I try the same URL myself in a browser I get a 404 back. I want browsers to pass.

        root@my_domain:# nginx -V
        nginx version: nginx/1.2.0

    I tried different approaches, using a map and plain old (no-no) if's, and they both act the same. Under the http section:

        map $http_user_agent $is_bot {
            default 0;
            ~crawl|Googlebot|Slurp|spider|bingbot|tracker|click|parser|spider 1;
        }

    Under the server section:

        location ~ /(\d+)/(\d+)/page/ {
            if ($is_bot) {
                return 403; # Please respect the robots.txt file !
            }
        }

    I recently had to polish up my Apache skills for a client where I did about the same thing, like this:

        # Block real engines not respecting robots.txt but allow correct calls to pass
        # Google
        RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0\ \(compatible;\ Googlebot/2\.[01];\ \+http://www\.google\.com/bot\.html\)$ [NC,OR]
        # Bing
        RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0\ \(compatible;\ bingbot/2\.[01];\ \+http://www\.bing\.com/bingbot\.htm\)$ [NC,OR]
        # msnbot
        RewriteCond %{HTTP_USER_AGENT} ^msnbot-media/1\.[01]\ \(\+http://search\.msn\.com/msnbot\.htm\)$ [NC,OR]
        # Slurp
        RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0\ \(compatible;\ Yahoo!\ Slurp;\ http://help\.yahoo\.com/help/us/ysearch/slurp\)$ [NC]
        # block all page searches, the rest may pass
        RewriteCond %{REQUEST_URI} ^(/[0-9]{4}/[0-9]{2}/page/) [OR]
        # or with the wpmp_switcher=mobile parameter set
        RewriteCond %{QUERY_STRING} wpmp_switcher=mobile
        # ISSUE 403 / SERVE ERRORDOCUMENT
        RewriteRule .* - [F,L]
        # End if match

    This does a bit more than I asked nginx to do, but it's about the same principle, and I'm having a hard time figuring this out for nginx. So my question would be: why does nginx serve my browser a 404? Why isn't the request passing through? The regex isn't matching for my UA:

        "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.30 Safari/536.5"

    There are tons of examples for blocking based on UA alone, and that's easy. It also looks like the matching location is final, i.e. it's not 'falling through' for regular users; I'm pretty certain this has some correlation with the 404 I get in the browser. As a cherry on top, I also want Google to disregard the parameter wpmp_switcher=mobile (wpmp_switcher=desktop is fine); I just don't want the same content being crawled multiple times.
    Even though I ended up adding wpmp_switcher=mobile via the Google Webmaster Tools pages (requiring me to sign up ....), that also stopped for a while, but today they are back spidering the mobile sections. So in short, I need to find a way for nginx to enforce the robots.txt definitions. Can someone spare a few minutes of their lives and push me in the right direction, please? I really appreciate ANY response that makes me think harder ;-)

    Read the article

  • Google indexing and ranking a custom domain served by Google App Engine

    - by Hugues
    I have a website served at the URL "http://www.plugimmo.com", which is a custom domain served by Google App Engine at http://plugimmo.appspot.com. For a while I have been trying to optimise the Google indexing and ranking, with no success. The problem is that searching on Google for the keywords in the title of my home page does not retrieve my website at all, not even in the first 1,000 results. When checking Google's cached version (cache:www.plugimmo.com), it says the cached version is the one of 20-Aug-12 for "plugimmo.appspot.com". It looks like there are several issues:

    1 - The cached version is really old. I have made a lot of changes since 20-Aug-12, and I saw Googlebot crawling my site several times.
    2 - The cached version is for "plugimmo.appspot.com".
    3 - Looking at Google Webmaster Tools, I see that the number of pages indexed for www.plugimmo.com is 0, but that can't be the case given the number of changes I've made since then.

    My questions would therefore be the following:

    - Why is the cached version so old, although I saw Googlebot crawling the site many times since 20-Aug-12?
    - Is there a problem with indexing a custom domain served by Google App Engine?
    - Why is Google Webmaster Tools showing 0 pages indexed, although new pages have been crawled and no errors have been reported in the indexing?

    Also, the site has been developed with Google Web Toolkit, and I have followed the guidelines regarding crawling Ajax sites. The home page, when crawled by a robot, is redirected to http://www.plugimmo.com/HomeSnapshot.html. Thanks a lot for your help! Hugues

    Read the article

  • In Google webmaster tools, can a "soft 404" be triggered by the text on the page?

    - by Stephen Ostermiller
    I just ran across an error in Google Webmaster Tools that I have never seen before. I manage the website for my local community band (I play trombone). One of the pages on the site is a list of our upcoming performances. It is powered by a WordPress events plugin that uses a database of upcoming events entered through the administration interface. We just finished our summer and fall concerts, and our next performance will be our Christmas concert. I hadn't gotten around to adding that to the website yet, so there are no upcoming events shown on the page. In fact the text on the page says: "No upcoming events listed under Performance. Check out past events for this category or view the full calendar." Then in Google Webmaster Tools, this page is showing up as a "soft 404": the page is returning a 200 status, and Google is indicating that the 404 is "soft". I wouldn't have expected Googlebot to be sophisticated enough to parse that particular sentence. Is Googlebot able to detect that the text on the page indicates there is currently no content, and treat it as a 404 page because of that? If Google is treating this page as a soft 404 because of the text on the page, does that mean that, like regular 404 pages, it won't show up in search results?

    Read the article

  • Problem with Google Webmaster Tools sitemap

    - by Alex
    I have a WordPress MU 3.6.1 install with domain mapping (0.5.4.3), W3TC (0.9.3) and Google XML Sitemaps (4.0 BETA). I have 4 different sitemaps:

        sub-1.com/sitemap.xml
        sub-2.com/sitemap.xml
        sub-3.com/sitemap.xml
        sub-4.com/sitemap.xml

    In Google Webmaster Tools I got 59 errors & 14 warnings.

    Sitemap errors - Errors: "We encountered an error while trying to access your Sitemap. Please ensure your Sitemap follows our guidelines and can be accessed at the location you provided and then resubmit. General HTTP error: 404 not found. Sitemap: sub-2.com/sitemap-pt-post-2011-02.xml" etc. (but when I click on my sitemap links they work fine)

    Sitemap errors - Warnings: "URLs not accessible. When we tested a sample of the URLs from your Sitemap, we found that some URLs were not accessible to Googlebot due to an HTTP status error. All accessible URLs will still be submitted. Sitemap: sub-2.com/sitemap-misc.xml HTTP Error: 404 URL: /sitemap.html" (but when I click on my sitemap links they work fine)

    Sitemap errors - Index errors: "URLs not accessible. When we tested a sample of the URLs from your Sitemap, we found that some URLs were not accessible to Googlebot due to an HTTP status error. All accessible URLs will still be submitted. HTTP Error: 404 URL: /sitemap-pt-post-2010-09.xml" (but when I click on my sitemap links they work fine)

    Web pages: 3,276 submitted, 3,247 indexed.

    What do I have to put under Network Admin > Performance (W3TC) > Page Cache > Cache Preload > Sitemap URL? I have added "/sitemap.xml". My robots.txt: http://pastebin.com/3K2U0mQa My .htaccess: http://pastebin.com/efJJ6zwy How can I make it work right?

    Read the article

  • https:// search results appearing on Google for purely http:// site

    - by hydrurga
    I started weeding through my site's search results from Google today, using a site: search, to determine if there are any links that cause 404s and thus need redirecting. To my amazement I noticed numerous https:// results relating to various pages. My site doesn't have an SSL certificate, doesn't serve such pages, doesn't internally link to https:// pages, doesn't include any such files in its sitemap.xml and, for all of these, never has. I decided to do a Google search for https://<my site> and found one site that incorrectly refers to the root of my site with a https:// prefix - I will try to contact them to get them to correct this. I'm not sure, however, how Googlebot managed to index the non-root files as https://. I can't find any external links to them, and surely, without certification, Googlebot should have stalled at the first request? I've just added the following lines to the site's .htaccess (although the surfer still has to navigate through the browser's "This site is a security risk. Abandon hope all ye who enter here!" message(s) first to get there):

        RewriteEngine On
        RewriteCond %{HTTPS} on
        RewriteRule ^(.*)$ http://www.<my site>.org/$1 [R=301,L]

    replacing <my site> with my domain name. My big question is this, though: I would like to use the Google Webmaster Tools Remove URLs feature to remove the https:// pages from the index. Can I be guaranteed that this will only remove the https:// versions of each relevant page and not the valid http:// versions? My thanks to anyone who can help me out with this particular question and the issue in general.

    Read the article

  • Facebook Like JavaScript related to a "Time spent downloading a page" increase in GWT?

    - by donaldthe
    I installed the Facebook Like button (JavaScript version) on my website on December 15th. Take a look at this report from Google Webmaster Central [crawl stats chart: Googlebot activity in the last 90 days]. The crawl stats are from Googlebot, which as far as I know doesn't execute JavaScript. Could the Facebook Like JavaScript code (the XFBML version) be related to the large spike in "Time spent downloading a page"? (By the way, the huge spike in November was caused by a mistake where every image request was getting a 301.) I'm not sure what caused the spike to go down by half somewhere in December; it may have been related to a faulty setting in web.config. I'm at a loss as to what I can do about this, or even how to tell whether this is my problem or Googlebot's crawl problem. Here is the Facebook code I am using to create the Like button. It is right after the opening body tag:

        <div id="fb-root"></div>
        <script>
          window.fbAsyncInit = function() {
            FB.init({appId: 'xxxxx', status: true, cookie: true, xfbml: true});
          };
          (function() {
            var e = document.createElement('script');
            e.async = true;
            e.src = document.location.protocol + '//connect.facebook.net/en_US/all.js';
            document.getElementById('fb-root').appendChild(e);
          }());
        </script>

    And this creates the Like box:

        <fb:like show_faces="false"></fb:like>

    If the JavaScript can't be the problem, any ideas on where to start looking would be appreciated.

    Read the article

  • Can too many 301 redirects cause a DNS error?

    - by Graham
    For a site http://imageocd.com that I just set up, I initially spelled the category "automobiles" as "autimobiles"... I know, it's ridiculous. I then set up over 10,000 pages behind that category, e.g. http://imageocd.com/automobiles/hillman-minx-cabrio-pictures-and-wallpapers. So I set up over 10,000 301 URL redirects to correct the spelling of "automobiles". I just checked my Google Webmaster report and got an error saying: "http://www.imageocd.com/: Googlebot can't access your site (Sep 7, 2012). Over the last 24 hours, Googlebot encountered 2 errors while attempting to retrieve DNS information for your site. The overall error rate for DNS queries for your site is 66.7%." Could the overabundance of 301 redirects be causing this? I host 13 sites on this dedicated server and all of them are running fine. I also contacted GoDaddy and they said the server is running fine. Any ideas on what might be going on? Also, I have a "canonical" set up for every URL. Could this be part of the error? Thanks.

    Read the article

  • Google bot .net and AspxAutoDetectCookieSupport dilemma

    - by nLL
    I have a .NET mobile web site where I use session state, and due to the nature of mobile networks/phones (not all support session cookies) I had to use <sessionState cookieless="AutoDetect"/>. It works fine, but because each new session is redirected with "AspxAutoDetectCookieSupport=1" I have a feeling that Google won't like this. Here is a small sample from my server logs:

        supportForumReadTopic.aspx id=38 80 - 66.249.71.80 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 302
        supportForumReadTopic.aspx id=38&AspxAutoDetectCookieSupport=1 Mozilla/5.0+(compatible;+Googlebot/2.1;++http://www.google.com/bot.html) 200

    As you can see, each new hit from Google gets a 302 to itself. I have a genericmozilla5.browser file where I define the Google bot as a cookie-supporting browser in order to get .NET not to use cookieless URLs, but I'm not sure how this 302 would affect me. Has anyone had a similar experience? Any ideas? Suggestions? Thanks

    Read the article

  • How to insert scraped data into MySQL

    - by user1887288
    I am fetching data from other websites; can anyone tell me how to insert the fetched data into a MySQL database? Below is the code I am using to fetch the results:

        $urls = $_POST["urls"];
        require_once('simple_html_dom.php');
        $useragent = 'Googlebot/2.1 (http://www.googlebot.com/bot.html)';
        foreach ($urls as $url) {
            $curl = curl_init();
            curl_setopt($curl, CURLOPT_URL, $url);
            curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($curl, CURLOPT_CONNECTTIMEOUT, 20);
            curl_setopt($curl, CURLOPT_USERAGENT, $useragent);
            $str = curl_exec($curl);
            curl_close($curl);
            $html = str_get_html($str);
            foreach ($html->find('span.price') as $e)
                echo $e->innertext . '<br>';
        }

    Read the article
