Search Results

Search found 499 results on 20 pages for 'robots'.

Page 6/20 | < Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13 | Next Page >

How to disallow indexing but allow crawling?

- by John Doe

In the front page of my website, I have some previews to articles (with a small introduction to them) that link to the full articles. I want to disallow the front page to prevent duplicate content. But if I do this (in robots.txt), would it still be crawled? I mean, the full articles would be still reached by the crawler even though I disallowed the only page that links to them? I don't want the webcrawler not to access the page and enter the links in them, but I just don't want it to save the information (that will be repeated in the full articles).

Read the article
CDN virtual subdomain causes duplicated content

- by user3474818

I have created a subdomain and a CNAME record which points to the domain root. The subdomain www.static.example.com is actually a copy of the entire website www.example.com and it is supposed to act as an CDN and serve static content in order to improve speed. However, all of my content can be accessed via subdomain aswell, so Google has indexed it all and now I am dealing with duplicated content. How could I deny access to crawlers for the subdomain baring in mind that I do not have different subfolder for subdomain, so I can't create a separate robots.txt file?

Read the article
SEO: disallowing Google from indexing forms in iframes or not?

- by Marco Demaio

I usually place forms in iframes (i.e. order form, request assistance form, contact forms, ect.). Just the forms, I never place other contents or pages in iframes. From a SEO point of view, would you exclude forms from being indexed/crawled by Google or not? I mean my forms hardly ever contains keyword/keyphrases, moreover I obviously place empty title/meta description tags in pages shown in iframe to display forms, cause those titles are never displaied in browser title bar. So I'm wondering what's the point of letting Google index them? Moreover I think these form pages might suck out PR from all other pages that are more valuable for SEO. If your answer is "yes I would exclude them form indexing" would you simply use robots.txt to exclude them? Thanks!

Read the article
Googlebot can't access my site when crawling from rootdomain

- by PéCé

I can't explain why I get this message for my rootdomain result in Google : trocmalin.com/ A description for this result is not available because of this site's robots.txt – learn more. Here is my site specifics : vide-greniers.trocmalin.com is the site address www.trocmalin.com redirects (301) to vide-greniers.trocmalin.com trocmalin.com redirects (301) to vide-greniers.trocmalin.com too... User-agent: * Disallow: /orga/ User-agent: * Disallow: /sitemap-update Google results for vide-greniers.trocmalin.com are well rendered, as well as sub pages allowed for bots. But the result for my rootdomain (trocmalin.com) gives this message... Can you help me ?

Read the article
Will duplicate international (i18n) content hinder SEO rankings?

- by Rhys

Google clearly states that duplicate content within a single, or multiple, domains is not advised. This is understood, but I am not sure of any exceptions for sites with region-specific content that is often replicated across locales. For example, a site's /en-us/about page could be identical to /en-uk/about, whereas most likely /en-ja/about is unique. Are GYM smart enough to understand that the initial URL depth is a locale specifier? Is there any robots.txt or header, etc, trickery that I should include to outline the site's international structure?

Read the article
Search engine bots accessing strange URLs

- by casasoft

We have ELMAH enabled on our site and get errors whenever a Page Not Found error is triggered on the website. We have recently redesigned a new website and so we understand that search engine robots might have previously indexed pages which they try to access and result in a Page Not Found errors. For this reason, we have set up permanent redirects for such previously indexed pages to the respective new pages. The website in mention is www.chambercollege.com and for example, a previously indexed URL was www.chambercollege.com/special-offers.aspx. This page is no longer accessible so we have created the necessary permanent redirect to redirect to the respective page on www.chambercollege.com/en/content/special-offers-161/. Now we are starting to receive Page Not Found errors of search engine bots (e.g. MSN bot) trying to access the URL www.chambercollege.com/special-offers.aspx/images/shadow_right.jpg/. Any idea how could a search engine make up that strange URL and whether you have any suggestions of what to do best?

Read the article
Htaccess/robots.txt to allow search bots to explore main domain but not directory on other domain

- by gX

Ok, I understand the Title didn't make any sense so here I've tried to explain it in detail. I'm using a hosting that gives me space for my domain and lets me "add on" other domains on it. So lets say I have a domain A, and I add on a domain B. Basically my hosting gives me a public_html where I can put stuff that shows when someone visits website A. But, when I add the domain B, it lets me put the content of B, INSIDE of that public_html so that website B.com can also be visited by going to A.com/siteB... Thats all good, except that Google has started indexing B.com as well as A.com/siteB, I'm ok with it indexing B.com, but I somehow want to prevent it from indexing A.com/siteB so that when people search for B, it doesn't end up showing A.com/siteB. Any ideas? Let me know if the question is still unclear.

Read the article
Handling SEO for Infinite pages that cause external slow API calls

- by Noam

I have an 'infinite' amount of pages in my site which rely on an external API. Generating each page takes time (1 minute). Links in the site point to such pages, and when a users clicks them they are generated and he waits. Considering I cannot pre-create them all, I am trying to figure out the best SEO approach to handle these pages. Options: Create really simple pages for the web spiders and only real users will fetch the data and generate the page. A little bit 'afraid' google will see this as low quality content, which might also feel duplicated. Put them under a directory in my site (e.g. /non-generated/) and put a disallow in robots.txt. Problem here is I don't want users to have to deal with a different URL when wanting to share this page or make sense of it. Thought about maybe redirecting real users from this URL back to the regular hierarchy and that way 'fooling' google not to get to them. Again not sure he will like me for that. Letting him crawl these pages. Main problem is I can't control to rate of the API calls and also my site seems slower than it should from a spider's perspective (if he only crawled the generated pages, he'd think it's much faster). Which approach would you suggest?

Read the article
mod_rewrite and SEO friendliness

- by John Doe

My website has an atypical structure and I'm not sure if this could create problems in the long run, specially for SEO positioning purposes. I have a unique, large PHP script, and I use the Apache module mod_rewrite in the .htaccess file to create friendly URLs, for example: RewriteRule ^$ /index.php?section=Main RewriteRule ^createArticle$ /index.php?section=Main&view=CreateArticle RewriteRule ^configuration$ /index.php?section=Configuration RewriteRule ^article/([0-9]{1,10})$ /index.php?section=Article&view=Default&id=$1 RewriteRule ^deleteArticle/([0-9]{1,10})$ /index.php?section=Article&view=Delete&id=$1 RewriteRule ^reportArticle/([0-9]{1,10})$ /index.php?section=Article&view=Report&id=$1 RewriteRule ^logIn$ /index.php?section=Authentication ... So, www.example.com/index.php?section=Article&view=Default&id=105 would become www.example.com/article/105. The only real physical file is index.php, in which the parameters of the URL queried is processed and the corresponding result is outputted. My question is, do the crawling robots (e.g. Googlebot) recognize these links? Do they index the resulting HTML outputted by index.php with the specified parameters as if it was a actual HTML file? Also, would this become a problem when creating a Sitemap?

Read the article
What kind of redirect (301 or 302) for an email links tracker?

- by MaxiWheat

We are developing an email sending application ("à la" Mailchimp). Hyperlinks inserted by our users, in the emails they want to send, are replaced by a tracking URL on our application (https://ourdomain.com/trackingurl?blablabla) which then redirects the email reader to the original URL our users included in their emails. This allows us to record statistics about link clicks. Until now, we used 301 for those redirections, but we noticed that Google began indexing pages on our application which are in fact redirects to other domains. (The title and snippet in Google results are from the other domain, but the link in green is from our application). We took action by adding those urls to our robots.txt, but Google seems to take forever (months!) before removing them for its index and removing them by hand in Webmaster Tools would take a lot of time since there are lot. I would like to know which kind of HTTP redirect (301 or 302) is best suited for this kind of opreation ? Do you think switching to 302 redirects could improve this situation since we don't really want Google to index redirected links from our clients emails ?

Read the article
Does GoogleBot respect User-agent: *

- by rkulla

I blocked a page in robots.txt under User-agent: *, and tried to do a manual removal of that URL from Google's cache in the webmasters tools. Google said it wasn't being blocked in my robots.txt, so I then blocked it specifically under User-agent: GoogleBot and tried removing it again and this time it worked. Does that mean Google doesn't respect User-agent: * or what?

Read the article
[Disallow: /index.php] seems to block /my-beautiful-sef-url-123

- by Jaroslav Záruba

Hello I have robots.txt that looks like this: User-agent: * Disallow: /system/ Disallow: /admin/ Disallow: /index.php The obvious goal has been to prevent all the ugly URLs from being indexed, as they all begin with "/index.php". But for some reason all URLs like /my-beautiful-sef-url-123 are listed under Crawl errors in Google Webmaster Tools with "URL restricted by robots.txt". (When I test such URL it yields Allowed for both Googlebot and Googlebot-Mobile.) Can anyone help please?

Read the article
I'm getting a "Does not implement IController" error on images and robots.txt in MVC2

- by blesh

I'm getting a strange error on my webserver for seemingly every file but the .aspx files. Here is an example. Just replace '/robots.txt' with any .jpg name or .gif or whatever and you'll get the idea: The controller for path '/robots.txt' was not found or does not implement IController. I'm sure it's something to do with how I've setup routing but I'm not sure what exactly I need to do about it. Also, this is a mixed MVC and WebForms site, if that makes a difference.

Read the article
Unit testing Google wave robots in Java

- by Paul

Any tips or best practices for unit testing Google Wave robots written in Java? I'm expecting to deploy on AppEngine, if that helps. I'm a fan of TDD but new to both Wave Robots and AppEngine, so I'm hoping to use TDD to help me explore the design space.

Read the article
What bots are really worth letting onto a site?

- by blunders

Having written a number of bots, and seen the massive amounts of random bots that happen to crawl a site, I am wondering if the goal of the site allowing bots is for the potential for the bot to send real traffic back to the site if there is any reason to allow bots that are not known to be sending real traffic back, and how to spot these "good" bots; based on how they ID themselves, IPs they come from, behaviors, etc.

Read the article
Why deny access to website for msnbot/bingbot?

- by Quandary

I've seen quite a lot of tutorials that recommend you to ban user agents containing the strings libwww-perl and msnbot. I understand why one would ban libwww-perl, it's mainly if not only used for hacking and spamming. But why are there so many sites recommending to ban msnbot/bingbot? Since it's a search engine, even if only with a marginal market share, I would except one would want this bot to crawl one's sites. What is it that msnbot does that makes people ban it?

Read the article
How to get search engines to properly index an ajax driven search page

- by Redtopia

I have an ajax-driven search page that will allow users to search through a large collection of records. Each search result points to index.php?id=xyz (where xyz is the id of the record). The initial view does not have any records listed, and there is no interface that allows you to browse through all records. You can only conduct a search. How do I build the page so that spiders can crawl each record? Or is there another way (outside of this specific search page) that will allow me to point spiders to a list of all records. FYI, the collection is rather large, so dumping links to every record in a single request is not a workable solution. Outputting the records must be done in multiple requests. Each record can be viewed via a single page (eg "record.php?id=xyz"). I would like all the records indexed without anything indexed from the sitemap that shows where the records exist, for example: <a href="/result.php?id=record1">Record 1</a> <a href="/result.php?id=record2">Record 2</a> <a href="/result.php?id=record3">Record 3</a> <a href="/seo.php?page=2">next</a> Assuming this is the correct approach, I have these questions: How would the search engines find the crawl page? Is it possible to prevent the search engines from indexing the words "Record 1", etc. and "next"? Can I output only the links? Or maybe something like:

Read the article
Using rel=canonical and noindex in a 1-n partners enviroment

- by Telemako Mako

We sell a whole site (domain, etc) to partners that create content that is shown together at the main site. What we want to achieve is that the main site copy is the original, but the one that is indexed is the partners copy. We want to do it this way so the search results point to the partner sites but never to the main site while the main site gets all the credit for the links. We are trying setting the main site article with a noindex, follow and a link to the partner article, and in the partner article we have a rel=canonical pointing to the main site article. Are we correct or the noindex at the main site will break the canonical reference?

Read the article
Adsense click bot is click bombing my site

- by Graham

I have a site that get's roughly 7,000 - 10,000 page views per day right now. Starting around 1 AM on 7/1/12 I noticed the CTR was rising dramatically. These clicks would be credited then de-credited soon after. So, they were obviously fraudulent clicks. The next day I had about 200 clicks in account with about 100 of them being fraudulent. It's about 3 - 8 per hour evenly dispersed for each of the three ads 24 hours a day. This leads me to believe that it's some sort of Adsense click bot. Also, I removed the ads last evening then put them back up around 3AM and the invalid clicks started within 10 minutes. I signed up for statcounter.com to analyze the exit links on the Adsense. Then I conditionally blocked ads for the IP address of the person / bot I suspected doing this. But, I think that the bot has several proxies to choose from and can refresh IP addresses. I've notified Google through the invalid click form / email 4 times over the past two days in order to let them know I'm aware of the situation and am working on a solution. I've also temporally removed all ads on that site. How can I block a bot like this? Thank you.

Read the article
Receiving requests where absolute URL on page are morphed to relative URLs

- by Jacob

In our web pages, we have a hyperlink with an href to an absolute URL: https://some.other.host.com/blah.aspx?var1=val1&var2=val2 For some reason, in our logs, we see a lot of requests to URLs of this format: http://our.site.com/https:/some.other.host.com/blah.aspx?var1=val1&var2=val2 We don't have any JavaScript that would request that URL; it only appears inside of a hyperlink. Is there some sort of known bot, browser plugin, bug, etc. that could be responsible for these requests being made?

Read the article
Should I block bots from my site and why?

- by Frank E

My logs are full of bot visitors, often from Eastern Europe and China. The bots are identified as Ahrefs, Seznam, LSSRocketCrawler, Yandex, Sogou and so on. Should I block these bots from my site and why? Which ones have a legitimate purpose in increasing traffic to my site? Many of them are SEO. I have to say I see less traffic if anything since the bots have arrived in large numbers. It would not be too hard to block these since they all admit in their User Agent that they are bots.

Read the article
Uploading a non-finished website

- by Daniel

I have a pretty basic question. I developed a neet little website wich I'm ready to upload, but still needs a bit of work. The designer needs the html to do his work so the website needs to be uploaded. Besides that, I have to correct a couple details, do the friendly-urls, etc. What's the best way to set up the webpage in the definitive hosting with the defintive domain, blocking it to any unknown users and without affecting affecting SEO and those kind of things. If I were to just upload it, the non-definitive website might be crawled by a SE-bot. Thanks!

Read the article
Yandex frequently replaces page names with ampersands

- by Guy

The Yandex spider is a frequent visitor to one of the sites I manage. On ocassion it replaces the page name with two ampersands and a space. So if the page is: /mypage.aspx?param=value then it will try and crawl it as: /&& ?param=value Any idea why it is doing this? [EDIT] If I remember correctly the IP that this "mistake" is coming from is based in California and not Russia. I believe that they crawl US sites from a US based IP address. Not sure if that helps. More Info about request: IP: 199.21.99.82 City: Palo Alto State: California Country: United States ISP: Yandex Inc. User-Agent: Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)

Read the article
How to hide download file from bots? [closed]

- by CJ7

Possible Duplicate: How to restrict the download of all files in a folder? I want to make a private file available for download but not use username/password protection. I want to put the file into a directory called something like download. How can I ensure: the file does not become part of search engine results, and the file cannot be accessed by bots that might guess the directory name?

Read the article
Should I set NOINDEX header for my JS, CSS and image files?

- by Yoga

Are there any harms if my site send NOINDEX headers for all my static assets? For image files, I refer to those valueless, e.g. background images, button images, etc. Update: more background information I have this concern is since recent Google said they also execute JS and they might fetch content via Ajax. So, for example, if I send noindex for my jQuery script, so Google would not be able to use them to load Ajax, I suppose it is not good for my site's SEO, right?

Read the article

Search Results

Search found 499 results on 20 pages for 'robots'.

Page 6/20 | < Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13 | Next Page >

- by John Doe

- by user3474818

- by Marco Demaio

- by PéCé

- by Rhys

- by casasoft

- by gX

- by Noam

- by John Doe

- by MaxiWheat

- by rkulla

- by Jaroslav Záruba

- by blesh

- by Paul

- by blunders

- by Quandary

- by Redtopia

- by Telemako Mako

- by Graham

- by Jacob

- by Frank E

- by Daniel

- by Guy

- by CJ7

- by Yoga

< Previous Page | 2 3 4 5 6 7 8 9 10 11 12 13 | Next Page >