Restricting crawler activity to certain directories with robots.txt

Posted by neimad on Pro Webmasters
Published on 2011-11-05T13:29:00Z Indexed on 2011/11/19 10:16 UTC

Filed under: ruby-on-rails | robots.txt

I would like to use robots.txt to keep search engines from indexing some parts of my website. I want them to index only the root (/) and not crawl the URLs served by my controllers.

In my robots.txt, I have this:

User-Agent: *
Disallow: /compagnies/
Disallow: /floors/
Disallow: /spaces/
Disallow: /buildings/
Disallow: /users/
Disallow: /

I put this file in /mysite/public. I tested the file with a robots.txt validator and got no errors.

However, Google still returns pages from my site in its results. As a test, I added Disallow: /, but Google still indexed all pages.

Note that floors, spaces, buildings, etc. are not physical directories; they are paths handled by my controllers. Is this a bug? How can I work around it?
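
On the question of physical directories: robots.txt rules are matched as prefixes of the URL path, so it should not matter whether a path exists on disk. Below is a minimal sketch for checking the rules locally. It assumes Python's standard urllib.robotparser module and a placeholder host, example.com, neither of which comes from the original post. The rules are the ones listed above, minus the test line Disallow: /.

```python
from urllib.robotparser import RobotFileParser

# Rules from the post, without the temporary "Disallow: /" test line.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /compagnies/
Disallow: /floors/
Disallow: /spaces/
Disallow: /buildings/
Disallow: /users/
"""


def is_allowed(path, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under the given rules.

    The host is a placeholder (assumption); only the URL path is
    compared against the Disallow prefixes.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, "http://example.com" + path)


if __name__ == "__main__":
    for path in ("/", "/floors/1", "/floors", "/users/42"):
        print(path, is_allowed(path))
```

Under this parser, / is allowed and /floors/1 is blocked, whether or not a floors directory exists. A path such as /floors, with no trailing slash, does not start with /floors/ and so is not covered by that rule. This sketch only shows what the rules permit; it says nothing about pages a search engine has already indexed.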

© Pro Webmasters or respective owner
