Search Results

Search found 4783 results on 192 pages for 'txt'.

Page 2/192 | < Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >

  • blocking bad bots with robots.txt in 2012 [closed]

    - by Rachel Sparks
    does it still work good? I have this: # Generated using http://solidshellsecurity.com services # Begin block Bad-Robots from robots.txt User-agent: asterias Disallow:/ User-agent: BackDoorBot/1.0 Disallow:/ User-agent: Black Hole Disallow:/ User-agent: BlowFish/1.0 Disallow:/ User-agent: BotALot Disallow:/ User-agent: BuiltBotTough Disallow:/ User-agent: Bullseye/1.0 Disallow:/ User-agent: BunnySlippers Disallow:/ User-agent: Cegbfeieh Disallow:/ User-agent: CheeseBot Disallow:/ User-agent: CherryPicker Disallow:/ User-agent: CherryPickerElite/1.0 Disallow:/ User-agent: CherryPickerSE/1.0 Disallow:/ User-agent: CopyRightCheck Disallow:/ User-agent: cosmos Disallow:/ User-agent: Crescent Disallow:/ User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0 Disallow:/ User-agent: DittoSpyder Disallow:/ User-agent: EmailCollector Disallow:/ User-agent: EmailSiphon Disallow:/ User-agent: EmailWolf Disallow:/ User-agent: EroCrawler Disallow:/ User-agent: ExtractorPro Disallow:/ User-agent: Foobot Disallow:/ User-agent: Harvest/1.5 Disallow:/ User-agent: hloader Disallow:/ User-agent: httplib Disallow:/ User-agent: humanlinks Disallow:/ User-agent: InfoNaviRobot Disallow:/ User-agent: JennyBot Disallow:/ User-agent: Kenjin Spider Disallow:/ User-agent: Keyword Density/0.9 Disallow:/ User-agent: LexiBot Disallow:/ User-agent: libWeb/clsHTTP Disallow:/ User-agent: LinkextractorPro Disallow:/ User-agent: LinkScan/8.1a Unix Disallow:/ User-agent: LinkWalker Disallow:/ User-agent: LNSpiderguy Disallow:/ User-agent: lwp-trivial Disallow:/ User-agent: lwp-trivial/1.34 Disallow:/ User-agent: Mata Hari Disallow:/ User-agent: Microsoft URL Control - 5.01.4511 Disallow:/ User-agent: Microsoft URL Control - 6.00.8169 Disallow:/ User-agent: MIIxpc Disallow:/ User-agent: MIIxpc/4.2 Disallow:/ User-agent: Mister PiX Disallow:/ User-agent: moget Disallow:/ User-agent: moget/2.1 Disallow:/ User-agent: mozilla/4 Disallow:/ User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95) Disallow:/ User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 95) Disallow:/ User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 98) Disallow:/ User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows NT) Disallow:/ User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows XP) Disallow:/ User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 2000) Disallow:/ User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows ME) Disallow:/ User-agent: mozilla/5 Disallow:/ User-agent: NetAnts Disallow:/ User-agent: NICErsPRO Disallow:/ User-agent: Offline Explorer Disallow:/ User-agent: Openfind Disallow:/ User-agent: Openfind data gathere Disallow:/ User-agent: ProPowerBot/2.14 Disallow:/ User-agent: ProWebWalker Disallow:/ User-agent: QueryN Metasearch Disallow:/ User-agent: RepoMonkey Disallow:/ User-agent: RepoMonkey Bait & Tackle/v1.01 Disallow:/ User-agent: RMA Disallow:/ User-agent: SiteSnagger Disallow:/ User-agent: SpankBot Disallow:/ User-agent: spanner Disallow:/ User-agent: suzuran Disallow:/ User-agent: Szukacz/1.4 Disallow:/ User-agent: Teleport Disallow:/ User-agent: TeleportPro Disallow:/ User-agent: Telesoft Disallow:/ User-agent: The Intraformant Disallow:/ User-agent: TheNomad Disallow:/ User-agent: TightTwatBot Disallow:/ User-agent: Titan Disallow:/ User-agent: toCrawl/UrlDispatcher Disallow:/ User-agent: True_Robot Disallow:/ User-agent: True_Robot/1.0 Disallow:/ User-agent: turingos Disallow:/ User-agent: URLy Warning Disallow:/ User-agent: VCI Disallow:/ User-agent: VCI WebViewer VCI WebViewer Win32 Disallow:/ User-agent: Web Image Collector Disallow:/ User-agent: WebAuto Disallow:/ User-agent: WebBandit Disallow:/ User-agent: WebBandit/3.50 Disallow:/ User-agent: WebCopier Disallow:/ User-agent: WebEnhancer Disallow:/ User-agent: WebmasterWorldForumBot Disallow:/ User-agent: WebSauger Disallow:/ User-agent: Website Quester Disallow:/ User-agent: Webster Pro Disallow:/ User-agent: WebStripper Disallow:/ User-agent: WebZip Disallow:/ User-agent: WebZip/4.0 Disallow:/ User-agent: Wget Disallow:/ User-agent: Wget/1.5.3 Disallow:/ User-agent: Wget/1.6 Disallow:/ User-agent: WWW-Collector-E Disallow:/ User-agent: Xenu's Disallow:/ User-agent: Xenu's Link Sleuth 1.1c Disallow:/ User-agent: Zeus Disallow:/ User-agent: Zeus 32297 Webster Pro V2.9 Win32 Disallow:/

    Read the article

  • Hiding elements based on last closed element jquery script

    - by Jared
    Hi my question is, how can I make this jquery script close all previously opened children when entering a new parent? At the moment it traverses thru all the tree structure fine, but switching from one parent to another does not close the previous children, but rather only the each individual parents elements as a user browses. Here is the jquery I'm using: <script type="text/javascript"> $(document).ready($(function(){ $('#nav>li>ul').hide(); $('.children').hide(); $('#nav>li').mousedown(function(){ // check that the menu is not currently animated if ($('#nav ul:animated').size() == 0) { // create a reference to the active element (this) // so we don't have to keep creating a jQuery object $heading = $(this); // create a reference to visible sibling elements // so we don't have to keep creating a jQuery object $expandedSiblings = $heading.siblings().find('ul:visible'); if ($expandedSiblings.size() > 0) { $expandedSiblings.slideUp(0, function(){ $heading.find('ul').slideDown(0); }); } else { $heading.find('ul').slideDown(0); } } }); $('#nav>li>ul>li').mousedown(function(){ // check that the menu is not currently animated if ($('#nav ul:animated').size() == 0) { // create a reference to the active element (this) // so we don't have to keep creating a jQuery object $heading2 = $(this); // create a reference to visible sibling elements // so we don't have to keep creating a jQuery object $expandedSiblings2 = $heading2.siblings().find('.children:visible'); if ($expandedSiblings2.size() > 0) { $expandedSiblings2.slideUp(0, function(){ $heading2.find('.children').slideDown(0); }); } else { $heading2.find('.children').slideDown(0); } } }); })); </script> and here is my html output <ul id="nav"> <li><a href="#">folder 4</a> <ul><li><a href="#">2001</a> <ul><li class="children"><a href="./directory//folder 4/2001/doc1.txt">doc1.txt</a></li> <li class="children"><a href="./directory//folder 4/2001/doc2.txt">doc2.txt</a></li> <li class="children"><a href="./directory//folder 4/2001/doc3.txt">doc3.txt</a></li> </ul> </li> <li><a href="#">2002</a> <ul><li class="children"><a href="./directory//folder 4/2002/doc1.txt">doc1.txt</a></li> <li class="children"><a href="./directory//folder 4/2002/doc2.txt">doc2.txt</a></li> <li class="children"><a href="./directory//folder 4/2002/doc3.txt">doc3.txt</a></li> <li class="children"><a href="./directory//folder 4/2002/doc4.txt">doc4.txt</a></li> </ul> </li> <li><a href="#">2003</a> <ul><li class="children"><a href="./directory//folder 4/2003/Copy of doc1.txt">Copy of doc1.txt</a></li> <li class="children"><a href="./directory//folder 4/2003/doc1.txt">doc1.txt</a></li> <li class="children"><a href="./directory//folder 4/2003/doc2.txt">doc2.txt</a></li> </ul> </li> <li><a href="#">2004</a> <ul><li class="children"><a href="./directory//folder 4/2004/doc1.txt">doc1.txt</a></li> <li class="children"><a href="./directory//folder 4/2004/doc2.txt">doc2.txt</a></li> <li class="children"><a href="./directory//folder 4/2004/doc3.txt">doc3.txt</a></li> <li class="children"><a href="./directory//folder 4/2004/doc4.txt">doc4.txt</a></li> </ul> </li> </ul> </li> <li><a href="#">folder1</a> <ul><li><a href="#">2001</a> <ul><li class="children"><a href="./directory//folder1/2001/doc1.txt">doc1.txt</a></li> <li class="children"><a href="./directory//folder1/2001/doc2.txt">doc2.txt</a></li> <li class="children"><a href="./directory//folder1/2001/doc3.txt">doc3.txt</a></li> </ul> </li> <li><a href="#">2002</a> <ul><li class="children"><a href="./directory//folder1/2002/doc1.txt">doc1.txt</a></li> <li class="children"><a href="./directory//folder1/2002/doc2.txt">doc2.txt</a></li> <li class="children"><a href="./directory//folder1/2002/doc3.txt">doc3.txt</a></li> <li class="children"><a href="./directory//folder1/2002/doc4.txt">doc4.txt</a></li> </ul> </li> <li><a href="#">2003</a> <ul><li class="children"><a href="./directory//folder1/2003/Copy of doc1.txt">Copy of doc1.txt</a></li> <li class="children"><a href="./directory//folder1/2003/doc1.txt">doc1.txt</a></li> <li class="children"><a href="./directory//folder1/2003/doc2.txt">doc2.txt</a></li> </ul> </li> <li><a href="#">2004</a> <ul><li class="children"><a href="./directory//folder1/2004/doc1.txt">doc1.txt</a></li> <li class="children"><a href="./directory//folder1/2004/doc2.txt">doc2.txt</a></li> <li class="children"><a href="./directory//folder1/2004/doc3.txt">doc3.txt</a></li> <li class="children"><a href="./directory//folder1/2004/doc4.txt">doc4.txt</a></li> </ul> </li> </ul> </li> <li><a href="#">folder2</a> <ul><li><a href="#">2001</a> <ul><li class="children"><a href="./directory//folder2/2001/doc1.txt">doc1.txt</a></li> <li class="children"><a href="./directory//folder2/2001/doc2.txt">doc2.txt</a></li> <li class="children"><a href="./directory//folder2/2001/doc3.txt">doc3.txt</a></li> </ul> </li> <li><a href="#">2002</a> <ul><li class="children"><a href="./directory//folder2/2002/doc1.txt">doc1.txt</a></li> <li class="children"><a href="./directory//folder2/2002/doc2.txt">doc2.txt</a></li> <li class="children"><a href="./directory//folder2/2002/doc3.txt">doc3.txt</a></li> <li class="children"><a href="./directory//folder2/2002/doc4.txt">doc4.txt</a></li> </ul> </li> <li><a href="#">2003</a> <ul><li class="children"><a href="./directory//folder2/2003/Copy of doc1.txt">Copy of doc1.txt</a></li> <li class="children"><a href="./directory//folder2/2003/doc1.txt">doc1.txt</a></li> <li class="children"><a href="./directory//folder2/2003/doc2.txt">doc2.txt</a></li> </ul> </li> <li><a href="#">2004</a> <ul><li class="children"><a href="./directory//folder2/2004/doc1.txt">doc1.txt</a></li> <li class="children"><a href="./directory//folder2/2004/doc2.txt">doc2.txt</a></li> <li class="children"><a href="./directory//folder2/2004/doc3.txt">doc3.txt</a></li> <li class="children"><a href="./directory//folder2/2004/doc4.txt">doc4.txt</a></li> </ul> </li> </ul> </li> <li><a href="#">folder3</a> <ul><li><a href="#">2001</a> <ul><li class="children"><a href="./directory//folder3/2001/doc1.txt">doc1.txt</a></li> <li class="children"><a href="./directory//folder3/2001/doc2.txt">doc2.txt</a></li> <li class="children"><a href="./directory//folder3/2001/doc3.txt">doc3.txt</a></li> </ul> </li> <li><a href="#">2002</a> <ul><li class="children"><a href="./directory//folder3/2002/doc1.txt">doc1.txt</a></li> <li class="children"><a href="./directory//folder3/2002/doc2.txt">doc2.txt</a></li> <li class="children"><a href="./directory//folder3/2002/doc3.txt">doc3.txt</a></li> <li class="children"><a href="./directory//folder3/2002/doc4.txt">doc4.txt</a></li> </ul> </li> <li><a href="#">2003</a> <ul><li class="children"><a href="./directory//folder3/2003/Copy of doc1.txt">Copy of doc1.txt</a></li> <li class="children"><a href="./directory//folder3/2003/doc1.txt">doc1.txt</a></li> <li class="children"><a href="./directory//folder3/2003/doc2.txt">doc2.txt</a></li> </ul> </li> <li><a href="#">2004</a> <ul><li class="children"><a href="./directory//folder3/2004/doc1.txt">doc1.txt</a></li> <li class="children"><a href="./directory//folder3/2004/doc2.txt">doc2.txt</a></li> <li class="children"><a href="./directory//folder3/2004/doc3.txt">doc3.txt</a></li> <li class="children"><a href="./directory//folder3/2004/doc4.txt">doc4.txt</a></li> </ul> </li> </ul> </li> </ul> I assume my problem is, jquery isn't closing the children between each new parent so I need to make a call, but I'm a bit lost on how to do that. I know the code is pretty messy, this project was done in a huge rush and a very tight timeframe. Appreciate your answers and any other constructive comments, cheers :)

    Read the article

  • How to sort the file names in bash in this circumstance?

    - by Nicolas
    I have run a program to generate some results with the different parameters(i.e. the R, C and RP). These results are saved in files named results.txt. Then, I should parse these experimental results to make an analysis. In the params_R_7_C_16_RP_0, the 7 is the value of the parameter R, the 16 is the value of the parameter C and the 0 is the value of the parameter RP. Now, I want to get these results.txt files in the current directory to parse, and sort the path with the parameter values of R,C and RP. I first use the following command to get the results.txt files that I want to parse: find ./ -name "results.txt" and the output is: ./params_R_11_C_9_RP_0/results.txt ./params_R_7_C_9_RP_0/results.txt ./params_R_7_C_4_RP_0/results.txt ./params_R_11_C_16_RP_0/results.txt ./params_R_9_C_4_RP_0/results.txt ./params_R_5_C_9_RP_0/results.txt ./params_R_9_C_25_RP_0/results.txt ./params_R_7_C_16_RP_0/results.txt ./params_R_5_C_25_RP_0/results.txt ./params_R_5_C_16_RP_0/results.txt ./params_R_11_C_4_RP_0/results.txt ./params_R_9_C_16_RP_0/results.txt ./params_R_7_C_25_RP_0/results.txt ./params_R_15_C_4_RP_0/results.txt ./params_R_5_C_4_RP_0/results.txt ./params_R_9_C_9_RP_0/results.txt and I change the command as follows: find ./ -name "results.txt" | sort and the output is: ./params_R_11_C_16_RP_0/results.txt ./params_R_11_C_25_RP_0/results.txt ./params_R_11_C_4_RP_0/results.txt ./params_R_11_C_9_RP_0/results.txt ./params_R_5_C_16_RP_0/results.txt ./params_R_5_C_25_RP_0/results.txt ./params_R_5_C_4_RP_0/results.txt ./params_R_5_C_9_RP_0/results.txt ./params_R_7_C_16_RP_0/results.txt ./params_R_7_C_25_RP_0/results.txt ./params_R_7_C_4_RP_0/results.txt ./params_R_7_C_9_RP_0/results.txt ./params_R_9_C_16_RP_0/results.txt ./params_R_9_C_25_RP_0/results.txt ./params_R_9_C_4_RP_0/results.txt ./params_R_9_C_9_RP_0/results.txt But I want it output as following: ./params_R_5_C_4_RP_0/results.txt ./params_R_5_C_9_RP_0/results.txt ./params_R_5_C_16_RP_0/results.txt ./params_R_5_C_25_RP_0/results.txt ./params_R_7_C_4_RP_0/results.txt ./params_R_7_C_9_RP_0/results.txt ./params_R_7_C_16_RP_0/results.txt ./params_R_7_C_25_RP_0/results.txt ./params_R_9_C_4_RP_0/results.txt ./params_R_9_C_9_RP_0/results.txt ./params_R_9_C_16_RP_0/results.txt ./params_R_9_C_25_RP_0/results.txt ... I should let it params_R_005_C_004_RP_0 when generating the results. But it would take much time to rerun the program to get the results. So I wonder if there is any way to use the bash command to achieve this objective.

    Read the article

  • How to Save data in txt file in MATLAB

    - by Jessy
    I have 3 txt files s1.txt, s2.txt, s3.txt.Each have the same format and number of data.I want to combine only the second column of each of the 3 files into one file. Before I combine the data, I sorted it according to the 1st column: UnSorted file: s1.txt s2.txt s3.txt 1 23 2 33 3 22 4 32 4 32 2 11 5 22 1 10 5 28 2 55 8 11 7 11 Sorted file: s1.txt s2.txt s3.txt 1 23 1 10 2 11 2 55 2 33 3 22 4 32 4 32 5 28 5 22 8 11 7 11 Here is the code I have so far: BaseFile ='s' n=3 fid=fopen('RT.txt','w'); for i=1:n %Open each file consecutively d(i)=fopen([BaseFile num2str(i)'.txt']); %read data from file A=textscan(d(i),'%f%f') a=A{1} b=A{2} ab=[a,b]; %sort the data according to the 1st column B=sortrows(ab,1); %delete the 1st column after being sorted B(:,1)=[] %write to a new file fprintf(fid,'%d\n',B'); %close (d(i)); end fclose(fid); How can I get the output in the new txt file in this format? 23 10 11 55 33 22 32 32 28 22 11 11 instead of this format? 23 55 32 22 10 33 32 11 11 22 28 11

    Read the article

  • How to set robots.txt globally in nginx for all virtual hosts

    - by anup
    I am trying to set robots.txt for all virtual hosts under nginx http server. I was able to do it in Apache by putting the following in main httpd.conf: <Location "/robots.txt"> SetHandler None </Location> Alias /robots.txt /var/www/html/robots.txt I tried doing something similar with nginx by adding the lines given below (a) within nginx.conf and (b) as include conf.d/robots.conf location ^~ /robots.txt { alias /var/www/html/robots.txt; } I have tried with '=' and even put it in one of the virtual host to test it. Nothing seemed to work. What am I missing here? Is there another way to achieve this?

    Read the article

  • Android - store .txt filenames in array

    - by JustMe
    Hello! I would like to store only .txt filenames from a specific dir (let's say: /sdcard/docs/) into an String array. Example: In /sdcard/docs/ there are 5 files: 'foo.txt', 'bar.jpg', 'foofoo.txt', 'loool.png' and 'foobar.txt'. I would like to get array with contents: "foo.txt", "foofoo.txt", "foobar.txt" How could I do it? Thanks =)

    Read the article

  • Googlebot cant access my site webmaster tools reply Unreachable robots.txt

    - by Ahmad Ahmadi
    When I try to fetch my site as a googlebot in webmaster tools it return Unreachable robots.txt, after investigate I understood google bot can see my server: tcpdump | grep google it return that google can access my server with IP 66.249.81.172 or 66.249.75.111. but there is not any think in access log or error log or other apache logs. cat access_log | grep google or cat error_log | grep 66.249.81.172 Other bot (bing,...) can access apache but google cant. there is not any problem in my robots.txt or its permissions because as you know robots.txt is not necessary so I delete it but again webmaster tools returned Unreachable robots.txt not 404 not found! information about server: Server OS : CentOS 6 Web Server : Apache 2.x Firewall : IPTables is stoped SELinux is Disabled There is not any think else for security on my server. how can I investigate the problem and is there any other command that can help me to find the problem.

    Read the article

  • Amazon SES domain verification TXT DNS record

    - by Skittles
    I currently am trying to get my domain verified on Amazon's SES and running int a problem that google searches are not helping me get any closer to solving. According to SES, I have to create a TXT record in my DNS for the domain I'm trying to verify. Amazon gives you the following (value changed for security purposes); TYPE: TXT NAME: _amazonses.somedomain.com VALUE: M2sXTycXkgZXXuMuWI8TczngaPIDDMToPefzGhZ3uYA= I have tried numerous entries in my registrar's DNS manager, but SES still fails to find what it's looking for. I am not a DNS guru, so, I have tried to construct the TXT record from very sparse examples, at best, to try to get this right. My present TXT record is this; "v=DKIM1 s=_domainkey d=_amazonses.somedomain.com p=M2sXTycXkgZXXuMuWI8TczngaPIDDMToPefzGhZ3uYA=" Am I doing something incorrect? Thanks

    Read the article

  • apache robots.txt with SSL

    - by user224013
    I have an .htaccess file with a rewrite rule to get a redirect of every HTTP request to HTTPS. But now I have a problem that my robots.txt is not recognized by some online checker. If I remove the redirect from the .htaccess file the robots.txt is recognized correctly. Maybe I should exclude that the robots.txt is redirects to an HTTPS connection? This is the part of .htaccess for redirecting to HTTPS RewriteCond %{SERVER_PORT} !^443$ RewriteRule (.*) https://%{HTTP_HOST}/$1 [L]

    Read the article

  • Is there a way to disallow only crawling in https in robots.txt?

    - by David Wilkins
    I just realized that Bingbot is crawling my company's website's pages over https. Bing already crawls the site over http, so this seems frivolous. Is there a way to specify Disallow: / for https only? According to Wikipedia, each protocol has its own robots.txt And according to Google's Robots.txt Specification, the robots.txt applies to http AND https I don't want to Disallow: / for Bing totally, just over https.

    Read the article

  • Disallow robots.txt from being accessed in a browser but still accessible by spiders?

    - by Michael Irigoyen
    We make use of the robots.txt file to prevent Google (and other search spiders) from crawling certain pages/directories in our domain. Some of these directories/files are secret, meaning they aren't linked (except perhaps on other pages encompassed by the robots.txt file). Some of these directories/files aren't secret, we just don't want them indexed. If somebody browses directly to www.mydomain.com/robots.txt, they can see the contents of the robots.txt file. From a security standpoint, this is not something we want publicly available to anybody. Any directories that contain secure information are set behind authentication, but we still don't want them to be discoverable unless the user specifically knows about them. Is there a way to provide a robots.txt file but to have it's presence masked by John Doe accessing it from his browser? Perhaps by using PHP to generate the document based on certain criteria? Perhaps something I'm not thinking of? We'd prefer a way to centrally do it (meaning a <meta> tag solution is less than ideal).

    Read the article

  • Is there a way to disallow crawling of only HTTPS in robots.txt?

    - by David Wilkins
    I just realized that Bingbot is crawling my company's website's pages over https. Bing already crawls the site over http, so this seems frivolous. Is there a way to specify Disallow: / for https only? According to Wikipedia, each protocol has its own robots.txt And according to Google's Robots.txt Specification, the robots.txt applies to http AND https I don't want to Disallow: / for Bing totally, just over https.

    Read the article

  • No description for any page on the website is available in Google despite robots.txt allowing crawling

    - by Abhijit
    I seem to have the weirdest issue with Search Engine Optimization, and I asked the IT folks at my university, I asked people on Joomla forums and I am trying to sort this issue out using Google Webmaster Tools for more than 2 months to little avail. I want to know if I have some blatantly wrong configuration somewhere that is causing search engines to be unable to index this site. I noticed a similar issue with another website I searched for online (ECEGSA - The University of British Columbia at gsa.ece.ubc.ca), making me believe this might be a concern that people might be looking an answer for. Here are the details: The website in question is: http://gsa.ece.umd.edu/. It runs using Joomla 2.5.x (latest). The site was up since around mid December of 2013, and I noticed right from the get go that the site was not being indexed correctly on Google. Specifically I see the following message when I search for the website on Google: A description for this result is not available because of this site's robots.txt – learn more. The thing is in December till around March I used the default Joomla robots.txt file which is: User-agent: * Disallow: /administrator/ Disallow: /cache/ Disallow: /cli/ Disallow: /components/ Disallow: /images/ Disallow: /includes/ Disallow: /installation/ Disallow: /language/ Disallow: /libraries/ Disallow: /logs/ Disallow: /media/ Disallow: /modules/ Disallow: /plugins/ Disallow: /templates/ Disallow: /tmp/ Nothing there should stop Google from searching my website. And even more confusingly, when I go to Google Webmaster tools, under "Blocked URLs" tab, when I try many of the links on the site, they are all shown up as "Allowed". I then tried adding a sitemap, putting it in the robots.txt file. That did not help. Same exact search result, same behavior in the "Blocked URLs" tab on the webmaster tools. Now additionally, the "sitemaps" tab says for several links an error saying "URL is robotted out". I tried those exact links in the "Blocked URLs" and they are allowed! I then tried deleting the robots.txt file. No use. Same exact problem. Here is an example screenshot from Google's Webmaster Tools: At this point I cannot give a rational explanation to why this is happening and neither can anyone in the IT department here. No one on Joomla forums can seem to understand what is going on. Based on what I explained, does it seem that I have somehow set a setting in the robots.txt or in .htaccess or somewhere else, incorrectly?

    Read the article

  • Why do Google search results include pages disallowed in robots.txt?

    - by Ilmari Karonen
    I have some pages on my site that I want to keep search engines away from, so I disallowed them in my robots.txt file like this: User-Agent: * Disallow: /email Yet I recently noticed that Google still sometimes returns links to those pages in their search results. Why does this happen, and how can I stop it? Background: Several years ago, I made a simple web site for a club a relative of mine was involved in. They wanted to have e-mail links on their pages, so, to try and keep those e-mail addresses from ending up on too many spam lists, instead of using direct mailto: links I made those links point to a simple redirector / address harvester trap script running on my own site. This script would return either a 301 redirect to the actual mailto: URL, or, if it detected a suspicious access pattern, a page containing lots of random fake e-mail addresses and links to more such pages. To keep legitimate search bots away from the trap, I set up the robots.txt rule shown above, disallowing the entire space of both legit redirector links and trap pages. Just recently, however, one of the people in the club searched Google for their own name and was quite surprised when one of the results on the first page was a link to the redirector script, with a title consisting of their e-mail address followed by my name. Of course, they immediately e-mailed me and wanted to know how to get their address out of Google's index. I was quite surprised too, since I had no idea that Google would index such URLs at all, seemingly in violation of my robots.txt rule. I did manage to submit a removal request to Google, and it seems to have worked, but I'd like to know why and how Google is circumventing my robots.txt like that and how to make sure that none of the disallowed pages will show up in their search results. Ps. I actually found out a possible explanation and solution, which I'll post below, while preparing this question, but I thought I'd ask it anyway in case someone else might have the same problem. Please do feel free to post your own answers. I'd also be interested in knowing if other search engines do this too, and whether the same solutions work for them also.

    Read the article

  • How to resolve "Google can't find your site's robots.txt" error?

    - by Manivasagam
    I've recently found that "Google can't find your site's robots.txt" in crawl errors. When I tried Fetching as Google, I got result "SUCCESS", then I tried looking at crawl errors and it still shows "Google can't find your site's robots.txt". What can I do to resolve this issue? Before this issue arose, my site was indexed within a few mintues, but now I find that it took time to be indexed in Google's search. When I access http://mydomain.com/robots.txt, it shows the data below: User-agent: *Disallow: /wp-admin/ Disallow: /wp-includes/ I found Blocked URLs = 0, also no any other errors. Is there any other thing I need to change? Or what could be the solution for this? Any help would be appreciated.

    Read the article

  • Cross-submission robots.txt for multiple domains on single host

    - by sidd.darko
    We are running a site with multiple languages hosted in a single environment on IIS7. For example, oursite.com - english oursite.de - german oursite.es - spanish This is a single-host environment. All of these sites are in the same application space on the same physical machine. I need to do cross-submission of sitemaps via robots.txt. Looking at the sitemap.org guidelines for this suggest this is possible, but the example indicates different physical machines. Will the following entries in oursite.com/robots.txt work? http://www.oursite.com/sitemap-oursite-de.xml http://www.oursite.com/sitemap-oursite-es.xml

    Read the article

  • Disallowed images in the robots.txt of my Joomla site can't be displayed when shared in Facebook

    - by opk
    I have noticed that since I have disallowed images using the robots.txt in my Joomla site, when sharing an article in Facebook, the image will not be displayed. Why is that? Is it indeed related? My robots.txt file: User-agent: * Disallow: /administrator/ Disallow: /cache/ Disallow: /cli/ Disallow: /components/ Disallow: /images/ Disallow: /includes/ Disallow: /installation/ Disallow: /language/ Disallow: /libraries/ Disallow: /logs/ Disallow: /media/ Disallow: /modules/ Disallow: /plugins/ Disallow: /templates/ Disallow: /tmp/

    Read the article

  • iphone read txt file from UIWebView

    - by Ni
    I can read the data in file.txt file located in local disk. NSString *filePath = [[NSBundle mainBundle] pathForResource:@"file" ofType:@"txt"]; NSString* Data = [NSString stringWithContentsOfFile:filePath encoding:NSUTF8StringEncoding error:&error ]; now, I upload the file.txt file into a website. how can i read the data from the txt file now from UIWebView? Please help!!

    Read the article

  • how to read the txt file from the database(line by line)

    - by Ranjana
    i have stored the txt file to sql server database . i need to read the txt file line by line to get the content in it. my code : DataTable dtDeleteFolderFile = new DataTable(); dtDeleteFolderFile = objutility.GetData("GetTxtFileonFileName", new object[] { ddlSelectFile.SelectedItem.Text }).Tables[0]; foreach (DataRow dr in dtDeleteFolderFile.Rows) { name = dr["FileName"].ToString(); records = Convert.ToInt32(dr["NoOfRecords"].ToString()); bytes = (Byte[])dr["Data"]; } FileStream readfile = new FileStream(Server.MapPath("txtfiles/" + name), FileMode.Open); StreamReader streamreader = new StreamReader(readfile); string line = ""; line = streamreader.ReadLine(); but here i have used the FileStream to read from the Particular path. but i have saved the txt file in byt format into my Database. how to read the txt file using the byte[] value to get the txt file content, instead of using the Path value.

    Read the article

  • how to read the txt file from database(byte[] to filestream)

    - by Ranjana
    i have stored the txt file to sql server database . i need to read the txt file line by line to get the content in it. my code : DataTable dtDeleteFolderFile = new DataTable(); dtDeleteFolderFile = objutility.GetData("GetTxtFileonFileName", new object[] { ddlSelectFile.SelectedItem.Text }).Tables[0]; foreach (DataRow dr in dtDeleteFolderFile.Rows) { name = dr["FileName"].ToString(); records = Convert.ToInt32(dr["NoOfRecords"].ToString()); bytes = (Byte[])dr["Data"]; } FileStream readfile = new FileStream(Server.MapPath("txtfiles/" + name), FileMode.Open); StreamReader streamreader = new StreamReader(readfile); string line = ""; line = streamreader.ReadLine(); but here i have used the FileStream to read from the Particular path. but i have saved the txt file in byt format into my Database. how to read the txt file using the byte[] value to get the txt file content, instead of using the Path value.

    Read the article

  • Strange robots.txt - how and why did it get there?

    - by Mick
    I recently created a very simple, pure HTML website which I have hosted with "hostmonster". Hostmonster had very good reviews on some comparison website and in general so far they appear to be perfectly good in every way... At least I thought so until just now... I have been making lots of edits to my site on an almost daily basis. My site now appears on the first page (7th on the list) for my most important keyphrase when doing a google search. But I did notice some problem with the snippet chosen by google. I asked a question on this site about snippets and got some great answers. I then made some modifications to my meta data and within 48hrs the google snippet for my search was perfect. The odd thing though was that looking at the "cached" version google had, it appeared that the cache was still very odl- like three weeks previous. This seemed very odd - how could it be that the google robots had read my new metadata without updating the cache? This puzzled me greatly. Just now it occurred to me that maybe I had some goofey setting in my robots.txt file. I didn't actually remember even making one - but I thought I'd have a look just in case. Much to my horror, I saw that there was a robots.txt and it contained the disturbing text below: sitemap: http://cdn.attracta.com/sitemap/728687.xml.gz Intuitively this looks like some kind of junk, spam trick, and I had indeed been getting some spam from "attracta". So my questions are: 1. Should I simply delete this robots.txt? 2. Was the file there all along - placed there because of some commercial tie-in between attracta and hostmonster. 3. Does the attracta robots file explain the lack of re-caching?

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10 11 12  | Next Page >