Search Results

Search found 4984 results on 200 pages for 'robots txt'.

Page 3/200 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • How to resolve "Google can't find your site's robots.txt" error?

    - by Manivasagam
    I've recently found that "Google can't find your site's robots.txt" in crawl errors. When I tried Fetching as Google, I got result "SUCCESS", then I tried looking at crawl errors and it still shows "Google can't find your site's robots.txt". What can I do to resolve this issue? Before this issue arose, my site was indexed within a few mintues, but now I find that it took time to be indexed in Google's search. When I access http://mydomain.com/robots.txt, it shows the data below: User-agent: *Disallow: /wp-admin/ Disallow: /wp-includes/ I found Blocked URLs = 0, also no any other errors. Is there any other thing I need to change? Or what could be the solution for this? Any help would be appreciated.

    Read the article

  • Cross-submission robots.txt for multiple domains on single host

    - by sidd.darko
    We are running a site with multiple languages hosted in a single environment on IIS7. For example, oursite.com - english oursite.de - german oursite.es - spanish This is a single-host environment. All of these sites are in the same application space on the same physical machine. I need to do cross-submission of sitemaps via robots.txt. Looking at the sitemap.org guidelines for this suggest this is possible, but the example indicates different physical machines. Will the following entries in oursite.com/robots.txt work? http://www.oursite.com/sitemap-oursite-de.xml http://www.oursite.com/sitemap-oursite-es.xml

    Read the article

  • Amazon SES domain verification TXT DNS record

    - by Skittles
    I currently am trying to get my domain verified on Amazon's SES and running int a problem that google searches are not helping me get any closer to solving. According to SES, I have to create a TXT record in my DNS for the domain I'm trying to verify. Amazon gives you the following (value changed for security purposes); TYPE: TXT NAME: _amazonses.somedomain.com VALUE: M2sXTycXkgZXXuMuWI8TczngaPIDDMToPefzGhZ3uYA= I have tried numerous entries in my registrar's DNS manager, but SES still fails to find what it's looking for. I am not a DNS guru, so, I have tried to construct the TXT record from very sparse examples, at best, to try to get this right. My present TXT record is this; "v=DKIM1 s=_domainkey d=_amazonses.somedomain.com p=M2sXTycXkgZXXuMuWI8TczngaPIDDMToPefzGhZ3uYA=" Am I doing something incorrect? Thanks

    Read the article

  • Should I add a "nofollow" attribute to download links, or disallow the URLs in robots.txt?

    - by Laurent
    I have a download link very similar to Opera's one - it's just a script that sends the file. It doesn't have an extension and there's no obvious way to tell that it's actually a download link. So since I don't want robots to crawl this link, do I need to add it to robots.txt or maybe add a "nofollow" attribute to it? I see that on Opera's website they didn't do either of this, so perhaps it's not necessary?

    Read the article

  • Disallowed images in the robots.txt of my Joomla site can't be displayed when shared in Facebook

    - by opk
    I have noticed that since I have disallowed images using the robots.txt in my Joomla site, when sharing an article in Facebook, the image will not be displayed. Why is that? Is it indeed related? My robots.txt file: User-agent: * Disallow: /administrator/ Disallow: /cache/ Disallow: /cli/ Disallow: /components/ Disallow: /images/ Disallow: /includes/ Disallow: /installation/ Disallow: /language/ Disallow: /libraries/ Disallow: /logs/ Disallow: /media/ Disallow: /modules/ Disallow: /plugins/ Disallow: /templates/ Disallow: /tmp/

    Read the article

  • Google Sitemap and Robots.txt Issue

    - by Sarfaraz Soomro
    Hi, We have a sitemap at our site, http://www.gamezebo.com/sitemap.xml Some of the urls in the sitemap, are being reported in the webmaster central as being blocked by our robots.txt, see, gamezebo.com/robots.txt ! Although these urls are not Disallowed in Robots.txt. There are other such urls aswell, for example, gamezebo.com/gamelinks is present in our sitemap, but it's being reported as "URL restricted by robots.txt". Also I have this parse result in the Webmaster Central that says, "Line 21: Crawl-delay: 10 Rule ignored by Googlebot". What does it mean? I appreciate your help, Thanks.

    Read the article

  • How difficult is it to write our own Robots API, similar to G Wave Robots API ? Please read the deta

    - by user169650
    Consider the following entities : a) My own Wave-server b) My own Robots API c) Tomcat d) Google wave server/any other wave server Let us consider that a and d interact with one another via Google wave federation protocol. Now, I want to write my own Robots API in Java (similar to that of G Wave Robots API) using which I want to create Robots; which I want to host in entity c), which may in-turn connect to a) for listening to events and responding with operations. Let us consider that a) is already in place, i.e. implemented. Let us also consider that the Robot running on tomcat and entity a) are co-located, so that we do not need to use JSON-RPC for receiving events/sending operations; instead we can use Java interfaces. Now, my questions are : 1.How much of an effort is it to write my own Robots API to run on a tomcat container ? 2.What are the salient points to be taken care of ? Am I missing some important point here ? 3.How can I reuse some of the classes/packages/interfaces (e.g. com.google.wave.api.AbstractRobot, com.google.wave.api.event) with little/no changes at all ?

    Read the article

  • Robots.txt help

    - by Kyle R
    Google have just thrown up thousands of errors for duplicate content on my link tracker I am using. I want to make it so Google & any other search engines do not visit my pages on the link tracker. I want these pages to disallow these robots, my pages are: http://www.site.com/page1.html http://www.site.com/page2.html How would I write my robots.txt to make all robots not visit these links when they are in my page?

    Read the article

  • Is it safe to block redirected (but still linked) URLs with robots.txt?

    - by Edgar Quintero
    I have a website that has all URLs optimized and 301 redirected from nasty URLs to clean ones. However, everywhere throughout the site the unclean URLs are linked in menus, content, products, etc. Google currently has all clean URLs indexed, along with a few unclean URLs too. So the site still has linked everywhere the old URLs (ideally this wouldn't be the case but this is how it is ATM). I would like to block the unclean URLs with robots.txt. The question: if I block these unclean URLs with the robots.txt, when the entire website is linked with them (but they all redirect to the clean version), will this affect the indexing status at all?

    Read the article

  • Which token from a long User-Agent should I use in robots.txt?

    - by Gaia
    The definition of User-Agent states that several tokens can be included, as deemed necessary by the client. I want to block certain bots via robots.txt and I am confused as to which part of the User-Agent string to use, especially for more obscure bots. For example: Mozilla/5.0 (compatible; uMBot-LN/1.0; mailto: [email protected])" JS-Kit URL Resolver, http://js-kit.com/ Mozilla/5.0 (compatible; SEOkicks-Robot +http://www.seokicks.de/robot.html Do I use the second token? Can tokens contain spaces, or did the SEOkicks folks forget a semicolon after SEOkicks-Robot? I don't actually intend on making my question specific to a couple bots - I want to know the guideline: which part of UA do I place in robots.txt for these exotic bots with UA as long as a haiku? User-agent: uMBot-LN/1.0 Disallow: / PS: Thank you but I do not need to hear that undesirable bots are better blocked with mod_security. I already have commercial mod_sec rules in place.

    Read the article

  • Can I include a robots meta tag outside of the head in HTML snippets indeded to be SSIed?

    - by Dan
    I have a number of files in my site which are not intended for independent viewing, but rather to be AJAXed into content within the site. They obviously don't meet HTML standards (no body, head, etc.) as independent entities. I would like to prevent search engines from indexing these pages, but do not have access to /robots.txt (which would be much more ideal). My question is, could I include the following at the top of these partial HTML files and get the desired results? <meta name="robots" content="noindex, noarchive"> I guess there are two parts to this question. Will this cause any rendering issues in any browsers? Will search engines (at least Google & Bing) interpret this as intended?

    Read the article

  • Is it safe to Block These URLs with Robots.txt?

    - by Edgar Quintero
    I have a website that has all URLs optimized and 301 redirected from nasty URLs to clean ones. However, everywhere throughout the site the unclean URLs are linked in menus, content, products, etc. Google currently has all clean URLs indexed, along with a few unclean URLs too. So the site still has linked everywhere the old URLs (ideally this wouldn't be the case but this is how it is ATM). I would like to block the unclean URLs with robots.txt. The question: If I block these unclean URLs with the robots.txt, when the entire website is linked with them (but they all redirect to the clean version), will this affect the indexing status at all?

    Read the article

  • robots.txt file with more restrictive rules for certain user agents

    - by Carson63000
    Hi, I'm a bit vague on the precise syntax of robots.txt, but what I'm trying to achieve is: Tell all user agents not to crawl certain pages Tell certain user agents not to crawl anything (basically, some pages with enormous amounts of data should never be crawled; and some voracious but useless search engines, e.g. Cuil, should never crawl anything) If I do something like this: User-agent: * Disallow: /path/page1.aspx Disallow: /path/page2.aspx Disallow: /path/page3.aspx User-agent: twiceler Disallow: / ..will it flow through as expected, with all user agents matching the first rule and skipping page1, page2 and page3; and twiceler matching the second rule and skipping everything?

    Read the article

  • PHP robots.txt parsing

    - by omfgroflmao
    Is there an easiest way to do this? function parse_robots_txt($URL){ $parsed = parse_url($URL); $robots = file_get_contents('http://'.$parsed['host'].'/robots.txt',FILE_TEXT); $exploded = explode('user-agent:',strtolower($robots)); foreach($exploded as $user_agent){ $user_agent = trim($user_agent); if(substr($user_agent,0,1) == '*'){ $user_agent = str_replace('#','',preg_replace('/#.*\\n/i','',$user_agent)); $user_agent = str_replace('disallow:','',substr($user_agent,1)); $user_agent = preg_replace('/allow:/i', '+-+-+-+', $user_agent, 1); $user_agent = str_replace('allow:','',$user_agent); print_r(explode('+-+-+-+',$user_agent)); } } }

    Read the article

  • Robots Crawling Across Namespace?

    - by Codex73
    I migrated site from one domain to another. Also placed permanent redirection on old account. My stats logs are capturing this: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) /libro_metaboforte_chap5.php/members/members/file_chap6.php I placed this on robots which wasn't present at time of migration. Robots.txt Contents User-agent: * Allow: / Disallow: /members/ Disallow: /includes/ HTACCESS FILE CONTENTS DirectoryIndex index.php index.html Options +FollowSymlinks RewriteEngine On # Turn on the rewriting engine RewriteBase / RewriteCond %{REQUEST_FILENAME} !-d RewriteCond %{REQUEST_FILENAME} !-f RewriteCond %{REQUEST_URI} !^/store/?$ RewriteCond %{QUERY_STRING} !. RewriteRule ^.+/?$ index.php [QSA,L] RewriteCond %{QUERY_STRING} ^curlang=([a-z]*)$ RewriteRule ^.+/?$ index.php? [QSA,L] Will continue to log incoming bot captures. My htaccess does rewrite. I just added the robot file. The funny part is that is stepping in double directories... I don't know if the problem was not having the 'robots.txt' in place or the actual in place htaccess doing rewrites?

    Read the article

  • iphone read txt file from UIWebView

    - by Ni
    I can read the data in file.txt file located in local disk. NSString *filePath = [[NSBundle mainBundle] pathForResource:@"file" ofType:@"txt"]; NSString* Data = [NSString stringWithContentsOfFile:filePath encoding:NSUTF8StringEncoding error:&error ]; now, I upload the file.txt file into a website. how can i read the data from the txt file now from UIWebView? Please help!!

    Read the article

  • Rewrite for robots.txt and favicon.ico

    - by BHare
    I have setup some rules in which subdomains (my users) will default to where I have located the robots.txt, favicon.ico, and crossdomain.xml therefore if a user creates a site say testing.mywebsite.com and they don't make their own favicon.ico at testing.mywebsite.com/favicon.ico, then it will use the favicon.ico I have in /misc/favicon.ico This works perfect, but it doesn't work for the main website. If you attempt to go to mywebsite.com/favicon.ico it will check if "/" exists, in which it does. And then never redirects to /misc/favicon.ico How can I get it so both instances redirect to /misc/favicon.ico ? # Set all crossdomain (openpalace file) favorite icons and robots.txt doesnt exist on their # side, then redirect to site's just to have something to go on. RewriteCond %{REQUEST_URI} crossdomain.xml$ RewriteCond ^(.+)crossdomain.xml !-f RewriteRule ^(.*)$ /misc/crossdomain.xml [L] RewriteCond %{REQUEST_URI} favicon.ico$ RewriteCond ^(.+)favicon.ico !-f RewriteRule ^(.*)$ /misc/favicon.ico [L] RewriteCond %{REQUEST_URI} robots.txt$ RewriteCond ^(.+)robots.txt !-f RewriteRule ^(.*)$ /misc/robots.txt [L]

    Read the article

  • Should I disallow(robots.txt) archive/author pages with links already available on the front page? [on hold]

    - by WPRookie82
    I am working on a simple Wordpress blog where when an article is published, it appears on ALL these pages: Homepage - Headline(clickable) + 3-line summary Parent category page - Headline(clickable) + 3-line summary Child category page - Headline(clickable) + 3-line summary Author page - Headline(clickable) sitemap.xml I've been told that I should add all author pages to my robots.txt, under disallow, so as search engine bots do not spider /author/* since all links on these pages are available elsewhere. Is this a good approach or maybe rel=nofollow is better, or maybe I shouldn't worry about this at all?

    Read the article

  • how to read the txt file from the database(line by line)

    - by Ranjana
    i have stored the txt file to sql server database . i need to read the txt file line by line to get the content in it. my code : DataTable dtDeleteFolderFile = new DataTable(); dtDeleteFolderFile = objutility.GetData("GetTxtFileonFileName", new object[] { ddlSelectFile.SelectedItem.Text }).Tables[0]; foreach (DataRow dr in dtDeleteFolderFile.Rows) { name = dr["FileName"].ToString(); records = Convert.ToInt32(dr["NoOfRecords"].ToString()); bytes = (Byte[])dr["Data"]; } FileStream readfile = new FileStream(Server.MapPath("txtfiles/" + name), FileMode.Open); StreamReader streamreader = new StreamReader(readfile); string line = ""; line = streamreader.ReadLine(); but here i have used the FileStream to read from the Particular path. but i have saved the txt file in byt format into my Database. how to read the txt file using the byte[] value to get the txt file content, instead of using the Path value.

    Read the article

  • how to read the txt file from database(byte[] to filestream)

    - by Ranjana
    i have stored the txt file to sql server database . i need to read the txt file line by line to get the content in it. my code : DataTable dtDeleteFolderFile = new DataTable(); dtDeleteFolderFile = objutility.GetData("GetTxtFileonFileName", new object[] { ddlSelectFile.SelectedItem.Text }).Tables[0]; foreach (DataRow dr in dtDeleteFolderFile.Rows) { name = dr["FileName"].ToString(); records = Convert.ToInt32(dr["NoOfRecords"].ToString()); bytes = (Byte[])dr["Data"]; } FileStream readfile = new FileStream(Server.MapPath("txtfiles/" + name), FileMode.Open); StreamReader streamreader = new StreamReader(readfile); string line = ""; line = streamreader.ReadLine(); but here i have used the FileStream to read from the Particular path. but i have saved the txt file in byt format into my Database. how to read the txt file using the byte[] value to get the txt file content, instead of using the Path value.

    Read the article

  • Robots.txt syntax

    - by Sinan
    I not expert on robots.txt and i have the following in one of my clients robots.txt User-agent: * Disallow: Disallow: /backup/ Disallow: /stylesheets/ Disallow: /admin/ I am not sure about the second line. Is this line disallows all spiders?

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >