Search Results

Search found 4984 results on 200 pages for 'robots txt'.

Page 5 of 200

  • How to compare two TXT files before sending them to SQL

    - by adopilot
    I have to handle TXT .dat files coming from an embedded device. The problem is that the device always sends all of the captured data, but I only want to take the differences between two sends and do calculations on those. After the calculation I send the data to SQL using a bulk-insert function. I want to extract only the data that is new compared to the first file I got from the device. Let's say the first time the device sends data like this in a some.dat (ASCII) file:
      0000199991
      0000199321
      0000132913
      0000232318
      0000312898
    On the second call the device returns everything again (the previous and the newly captured records), something like this:
      0000199991
      0000199321
      0000132913
      0000232318
      0000312898
      9992129990
      8782999022
      2323423456
    This time I only want to calculate and pass through the data added after the first insert. I am building a WinForms app using C# and Visual Studio 2008.
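
    A minimal T-SQL sketch of one possible approach, assuming a hypothetical staging table TMP_Staging and target table CapturedData (both names, and the single RecordValue column, are made up for illustration): bulk insert the whole file into the staging table, then use EXCEPT so only rows that were not part of the previous send get processed.

      -- Load the full .dat file into a staging table first (path is illustrative).
      BULK INSERT TMP_Staging
      FROM 'C:\data\some.dat'
      WITH (ROWTERMINATOR = '\n');

      -- Keep only the rows that are not already in the target table.
      INSERT INTO CapturedData (RecordValue)
      SELECT RecordValue FROM TMP_Staging
      EXCEPT
      SELECT RecordValue FROM CapturedData;

      TRUNCATE TABLE TMP_Staging;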

    Read the article

  • Java BufferedReader behavior in CSV vs TXT file

    - by Gabriel
    I am trying to read a CSV file called csv_file.csv. The problem is that when I read lines with BufferedReader.readLine() it skips the first line, the one with the months. But when I rename the file to csv_file.txt it reads everything fine and doesn't skip the first line. Is there an undocumented "feature" of BufferedReader that I'm not aware of? Example of the file:
      Months, SEP2010, OCT2010, NOV2010
      col1, col2, col3, col4, col5
      aaa,,sdf,"12,456",bla bla bla, xsaffadfafda
      and so on, and so on, "10,00", xxx, xxx
    The code:
      FileInputStream stream = new FileInputStream(UploadSupport.TEMPORARY_FILES_PATH+fileName);
      BufferedReader br = new BufferedReader(new InputStreamReader(stream, "UTF-8"));
      String line = br.readLine();
      String months[] = line.split(",");
      while ((line=br.readLine())!=null) {
          /*parse other lines*/
      }
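
    One common culprit with UTF-8 CSV exports (an assumption here, since the raw bytes of the file aren't shown) is a byte-order mark at the start of the file: the first line is in fact read, but its first field begins with the invisible BOM character and then fails later comparisons, which looks as if the line had been skipped. A small sketch that strips a leading BOM from the first line read by the code above:

      // Drop-in replacement for the first readLine()/split() pair above:
      String header = br.readLine();
      if (header != null && !header.isEmpty() && header.charAt(0) == '\uFEFF') {
          header = header.substring(1);   // strip the UTF-8 BOM
      }
      String[] months = header.split(",");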

    Read the article

  • Retrieving content of a txt file from URL to Android

    - by eightx2
    I would like to be able to retrieve the contents of a .txt file from the internet and load it into an EditText. I tried using the code on this page: Reading Text File From Server on Android. It didn't work, as you might have guessed. I've read about this type of problem on numerous sites, but I can't get anything to work. Someone suggested AndroidHttpClient (http://developer.android.com/reference/android/net/http/AndroidHttpClient.html), but I simply can't find any examples of it. As I'm a newbie in Android programming, I would love it if someone could give me a small example.
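
    A minimal Java sketch of one way to do this with plain HttpURLConnection; the URL and the fetchText name are placeholders, and on Android this must run off the UI thread (for example in an AsyncTask or a background thread):

      import java.io.BufferedReader;
      import java.io.InputStreamReader;
      import java.net.HttpURLConnection;
      import java.net.URL;

      static String fetchText(String fileUrl) throws Exception {
          HttpURLConnection conn = (HttpURLConnection) new URL(fileUrl).openConnection();
          BufferedReader reader = new BufferedReader(
                  new InputStreamReader(conn.getInputStream(), "UTF-8"));
          StringBuilder text = new StringBuilder();
          String line;
          while ((line = reader.readLine()) != null) {
              text.append(line).append('\n');   // keep line breaks from the .txt file
          }
          reader.close();
          conn.disconnect();
          return text.toString();
      }

      // Back on the UI thread: editText.setText(result);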

    Read the article

  • Google indexed our pages a day ago and they showed in search, but today everything vanished

    - by ganesh
    We had a robots.txt that disallowed all robots while we were in development. We are live now. We changed robots.txt to match our requirements a day before, and submitted the pages using the index status feature of Google Webmaster Tools. After this we could see proper results in search, and Google image search was working as expected. Suddenly, today, all of this vanished from Google search. Now I can see the old result again, i.e. the under-construction message. I checked robots.txt in Google Webmaster Tools and it's OK - no crawling errors. Kindly let me know what exactly happened, and how I can report this issue to Google.

    Read the article

  • mod_evasive not working properly on Ubuntu 10.04

    - by Joe Hopfgartner
    I have an Ubuntu 10.04 server where I installed mod_evasive using apt-get install libapache2-mod-evasive. I have already tried several configurations; the result stays the same. The blocking does work, but randomly. I tried low limits with long blocking periods as well as short limits. The behaviour I expect is that I can request pages until either the page or the site limit is reached in the given interval. After that I expect to be blocked until I have not made another request for as long as the blocking period. However, the actual behaviour is that I can request pages and after a while I get random 403 blocks, which increase and decrease in percentage, but are very scattered. This is an output of siege, so you get an idea:
      HTTP/1.1 200 0.09 secs: 75 bytes ==> /robots.txt
      HTTP/1.1 403 0.08 secs: 242 bytes ==> /robots.txt
      HTTP/1.1 200 0.08 secs: 75 bytes ==> /robots.txt
      HTTP/1.1 403 0.08 secs: 242 bytes ==> /robots.txt
      HTTP/1.1 200 0.11 secs: 75 bytes ==> /robots.txt
      HTTP/1.1 403 0.08 secs: 242 bytes ==> /robots.txt
      HTTP/1.1 200 0.08 secs: 75 bytes ==> /robots.txt
      HTTP/1.1 403 0.09 secs: 242 bytes ==> /robots.txt
      HTTP/1.1 200 0.08 secs: 75 bytes ==> /robots.txt
      HTTP/1.1 200 0.09 secs: 75 bytes ==> /robots.txt
      HTTP/1.1 200 0.08 secs: 75 bytes ==> /robots.txt
      HTTP/1.1 200 0.09 secs: 75 bytes ==> /robots.txt
      HTTP/1.1 403 0.08 secs: 242 bytes ==> /robots.txt
      HTTP/1.1 200 0.08 secs: 75 bytes ==> /robots.txt
      HTTP/1.1 403 0.08 secs: 242 bytes ==> /robots.txt
      HTTP/1.1 200 0.10 secs: 75 bytes ==> /robots.txt
      HTTP/1.1 403 0.08 secs: 242 bytes ==> /robots.txt
      HTTP/1.1 200 0.08 secs: 75 bytes ==> /robots.txt
      HTTP/1.1 403 0.09 secs: 242 bytes ==> /robots.txt
      HTTP/1.1 200 0.10 secs: 75 bytes ==> /robots.txt
      HTTP/1.1 403 0.09 secs: 242 bytes ==> /robots.txt
      HTTP/1.1 200 0.09 secs: 75 bytes ==> /robots.txt
      HTTP/1.1 200 0.08 secs: 75 bytes ==> /robots.txt
      HTTP/1.1 200 0.09 secs: 75 bytes ==> /robots.txt
      HTTP/1.1 200 0.08 secs: 75 bytes ==> /robots.txt
      HTTP/1.1 200 0.10 secs: 75 bytes ==> /robots.txt
      HTTP/1.1 200 0.08 secs: 75 bytes ==> /robots.txt
    The exact limits in place during this test run were:
      DOSHashTableSize 3097
      DOSPageCount 10
      DOSSiteCount 100
      DOSPageInterval 10
      DOSSiteInterval 10
      DOSBlockingPeriod 120
      DOSLogDir /var/log/mod_evasive
      DOSEmailNotify ***@gmail.com
      DOSWhitelist 127.0.0.1
    So I would expect to be blocked for at least 120 seconds after being blocked once. Any ideas about this? I also tried adding my configuration in different places (vhost, server config, directory context) and with or without an IfModule directive; it doesn't change anything.

    Read the article

  • Blocking Just the Parent Domain via robots.txt

    - by Bryan Hadaway
    Let's say you have a parent domain, parent.com, and child subdomains under that parent domain: child1.com, child2.com, child3.com. Is there a way to use just the following within parent.com:
      User-agent: *
      Disallow: /
    considering each child has its own robots.txt stating:
      User-agent: *
      Allow: /
    Or is the parent robots.txt still going to have to make an exception for every single subdomain:
      User-agent: *
      Disallow: /
      Allow: /child1/
      Allow: /child2/
      Allow: /child3/
    Obviously this is important and tricky territory SEO-wise, so I'm looking to learn the definitive, safe, best-practice method here to sharpen my skills. Thanks, Bryan

    Read the article

  • Stored procedure for importing a txt file into a SQL Server DB

    - by Iulian
    I have to insert new records into a database every day from a text file (tab delimited). I'm trying to turn this into a stored procedure with a parameter for the file to read data from:
      CREATE PROCEDURE dbo.UpdateTable @FilePath
      BULK INSERT TMP_UPTable FROM @FilePath
      WITH (
          FIRSTROW = 2,
          MAXERRORS = 0,
          FIELDTERMINATOR = '\t',
          ROWTERMINATOR = '\n'
      )
      RETURN
    Then I would call this stored procedure from my C# code, specifying the file to insert. This is obviously not working, so how can I do it?
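
    BULK INSERT only accepts a literal file path, not a variable, so one common workaround is to build the statement as dynamic SQL. A sketch, keeping the TMP_UPTable name from the question (the parameter type and the quote-escaping are assumptions):

      CREATE PROCEDURE dbo.UpdateTable
          @FilePath NVARCHAR(260)
      AS
      BEGIN
          DECLARE @sql NVARCHAR(MAX);
          -- Build the BULK INSERT statement with the path embedded as a literal.
          SET @sql = N'BULK INSERT TMP_UPTable FROM ''' + REPLACE(@FilePath, '''', '''''') + N'''
                      WITH (FIRSTROW = 2, MAXERRORS = 0,
                            FIELDTERMINATOR = ''\t'', ROWTERMINATOR = ''\n'');';
          EXEC sp_executesql @sql;
      END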

    Read the article

  • How do I use an include statement in a TXT record?

    - by Aglystas
    We have a client that is using an email service that requires a TXT domain-key record that is over 127 characters long. I'm pretty sure BIND allows this; however, we run djbdns with tinydns and it looks as though it only supports TXT records up to 127 characters, and the rest is being truncated. I was thinking I could combine them with an include, but I'm not really sure how. I was thinking of setting the value to something like:
      v=DKIM1; k=rsa; p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQC2GWCNaDTuC3include:bdk2._domainkey.mail.cutlerymania.com
    My thought is: will this grab the actual value located at that domain (which only has one record, a TXT record) and append that information so the entire key record gets sent correctly?
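
    For what it's worth, include: is an SPF mechanism, not part of DKIM syntax, so resolvers would treat it as literal key data rather than following it. The usual way to publish a long DKIM key is to split the record data into multiple quoted strings, which resolvers concatenate back together. A BIND-style sketch (selector and key shortened; tinydns would need the equivalent expressed in its generic-record syntax):

      selector._domainkey.example.com. 3600 IN TXT ( "v=DKIM1; k=rsa; "
          "p=MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQC2GWCNaDTuC3...first half of key..."
          "...second half of key..." )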

    Read the article

  • What are the main differences between SRV records and TXT records?

    - by Chris Adams
    Hi there, I'm trying to consolidate the domain names for the servers I look after so they use just one panel instead of 3 or 4, and one thing stopping me is that the provider I originally wanted to move them to only lets me create the following kinds of records: A, MX, NS, CNAME, TXT. The first four I understand, but I'm not sure about the relationship (if any) between SRV records and TXT records. Can I use TXT records in place of SRV records? They both seem to be general text records that just point at a particular server without needing to specify a particular protocol, so it doesn't sound like a totally unreasonable assumption, but I'd rather check here before I break something. If I can only set the above records, does that mean I'm essentially unable to do any SRV-record redirecting? Thanks!
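
    For context, an SRV record has a fixed structured format (priority, weight, port, target) that clients such as SIP or XMPP software look up by service and protocol name, so a free-form TXT record is not a drop-in replacement. A sketch of what each looks like in a zone file (example values throughout):

      ; SRV: service, protocol, priority, weight, port, target host
      _sip._tcp.example.com.   3600 IN SRV 10 60 5060 sipserver.example.com.

      ; TXT: arbitrary text, interpreted only by whatever reads it (SPF, DKIM, verification tokens, ...)
      example.com.             3600 IN TXT "v=spf1 include:_spf.google.com ~all"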

    Read the article

  • How to format and where to put the SPF TXT record?

    - by YellowSquirrel
    EDIT: I think I more or less understand the syntax and, anyway, Google gives the needed syntax in the link below. My question is really where to put that stuff. Should I quote every field? The whole line? :)
    I've set up Google Apps for my domain: I've registered the domain with Google by adding the CNAME Google asked for, and I've apparently successfully set up the MX Google mail servers. So far I don't yet have a dedicated server; I just have a domain at a registrar. Now I want to activate SPF and I'm confused. In the following short webpage: http://www.google.com/support/a/bin/answer.py?answer=178723 it is written that I must add a TXT record containing:
      v=spf1 include:_spf.google.com ~all
    Where should I enter this? Should this go in the zone (?) file, like I did for the CNAME and the MX records? So far I have something like this:
      @ 10800 IN A 217.42.42.42
      @ 10800 IN MX 5 ASPMX3.GOOGLEMAIL.COM.
      @ 10800 IN MX 5 ASPMX2.GOOGLEMAIL.COM.
      @ 10800 IN MX 3 ALT2.ASPMX.L.GOOGLE.COM.
      @ 10800 IN MX 3 ALT1.ASPMX.L.GOOGLE.COM.
      @ 10800 IN MX 1 ASPMX.L.GOOGLE.COM.
      google8a70835987f31e34 10800 IN CNAME google.com.
    Does adding the SPF TXT record mean I should literally have something like this:
      @ 10800 IN A 217.42.42.42
      @ 10800 IN MX 5 ASPMX3.GOOGLEMAIL.COM.
      @ 10800 IN MX 5 ASPMX2.GOOGLEMAIL.COM.
      @ 3600 IN TXT "v=spf1 include:_spf.google.com ~all"
      @ 10800 IN MX 3 ALT2.ASPMX.L.GOOGLE.COM.
      @ 10800 IN MX 3 ALT1.ASPMX.L.GOOGLE.COM.
      @ 10800 IN MX 1 ASPMX.L.GOOGLE.COM.
      google8a70835987f31e34 10800 IN CNAME google.com.
    I made that TXT line up and put it right in the middle to show how confused I am. What I'd like to know is the exact syntax and where/how I should put this TXT record.
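
    For illustration, a zone-file sketch of how such a record is typically written: it goes at the zone apex (the @ name, just like the MX records), the position of the line within the file does not matter, and only the record data itself is quoted, never the whole line. The TTL here is just the value from the guess above:

      ; only the TXT record data is quoted
      @    3600   IN  TXT  "v=spf1 include:_spf.google.com ~all"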

    Read the article

  • Rewrite for robots.txt and favicon.ico [closed]

    - by BHare
    I have set up some rules so that subdomains (my users) default to where I keep the robots.txt, favicon.ico, and crossdomain.xml. So if a user creates a site, say testing.mywebsite.com, and doesn't make their own favicon.ico at testing.mywebsite.com/favicon.ico, it will use the favicon.ico I have in /misc/favicon.ico. This works perfectly, but it doesn't work for the main website. If you attempt to go to mywebsite.com/favicon.ico it checks whether "/" exists, which it does, and then never redirects to /misc/favicon.ico. How can I get both cases to redirect to /misc/favicon.ico?
      # If crossdomain.xml (openpalace file), favicon.ico or robots.txt doesn't exist on their
      # side, then redirect to the site's copy just to have something to go on.
      RewriteCond %{REQUEST_URI} crossdomain.xml$
      RewriteCond ^(.+)crossdomain.xml !-f
      RewriteRule ^(.*)$ /misc/crossdomain.xml [L]

      RewriteCond %{REQUEST_URI} favicon.ico$
      RewriteCond ^(.+)favicon.ico !-f
      RewriteRule ^(.*)$ /misc/favicon.ico [L]

      RewriteCond %{REQUEST_URI} robots.txt$
      RewriteCond ^(.+)robots.txt !-f
      RewriteRule ^(.*)$ /misc/robots.txt [L]
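
    As a side note, the second RewriteCond in each block tests the literal pattern string rather than the requested file, so it never actually checks the filesystem. A sketch of a variant that tests %{REQUEST_FILENAME} instead, which is one common way to get the same fallback on both the main site and the subdomains (untested against this particular vhost layout):

      # Serve the shared copy whenever the requested file does not exist on disk.
      RewriteCond %{REQUEST_FILENAME} !-f
      RewriteRule ^/?crossdomain\.xml$ /misc/crossdomain.xml [L]

      RewriteCond %{REQUEST_FILENAME} !-f
      RewriteRule ^/?favicon\.ico$ /misc/favicon.ico [L]

      RewriteCond %{REQUEST_FILENAME} !-f
      RewriteRule ^/?robots\.txt$ /misc/robots.txt [L]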

    Read the article

  • How to test robots.txt in googlebot to find out what is being indexed

    - by Amar Jarubula
    This question is a continuation of this answer: How to check if googlebot will index a given url? As suggested, I went to Webmaster Tools and tested the contents of my robots.txt file. However, this just tells me whether the content is valid or not. For my scenario I need to test whether disallowing certain patterns actually keeps those pages out of the index. For example, I have something like this in my robots.txt:
      disallow:/pattern*
    My understanding is that URLs matching this pattern should not be crawled, but how do I test that the pattern is enforced when the website is indexed?
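
    A small robots.txt sketch for reference: the usual spelling is Disallow with a space after the colon, grouped under a User-agent line, and for the major crawlers a prefix rule already matches everything that starts with it, so the trailing * is optional. Individual URLs can then be checked against the file with the robots.txt tester and "Fetch as Google" features in Webmaster Tools:

      User-agent: *
      # Blocks /pattern, /pattern123, /pattern/sub-page, etc.
      Disallow: /pattern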

    Read the article

  • Execute a random command from a .txt file?

    - by Alberto Burgos
    I have an Ubuntu server, and I'm trying to post a Twitter quote using the app "twidge". So I made a list of tweets in a .txt file, one tweet per line. I want to pick one tweet from that file and send it to Twitter via twidge (or whatever other method is possible). I can print a random line with shuf:
      shuf -n 1 /var/www/tweets.txt
    and it works: it gives me back one of the tweets. But it does not send it to Twitter, even if the line itself is a command, i.e.:
      twidge update "bla bla bla"
    It just prints to the screen and doesn't send anything to Twitter. I tried turning the .txt into a .sh, but that didn't work either... any ideas? By the way, I want to use it with crontab, something like this:
      15 * * * * shuf -n 1 /var/www/tweets.txt
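
    A sketch of one way to wire the two together with command substitution, so the randomly chosen line becomes the argument to twidge instead of just being printed (paths and schedule are the ones from the question; twidge needs to be on cron's PATH or referenced by its full path):

      # Pick one random line and pass it to twidge as the status text.
      twidge update "$(shuf -n 1 /var/www/tweets.txt)"

      # Same thing from crontab, at 15 minutes past every hour:
      15 * * * * twidge update "$(shuf -n 1 /var/www/tweets.txt)"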

    Read the article

  • Generating CMakeLists.txt

    - by vanna
    I have a bunch of C++ source files and headers. They may use external libraries such as Boost. I am interested in the process of building binaries for Windows and *nix. Makefiles (*nix) and .vcproj files (Windows) call compilers with certain specifications such as the order of compilation, compilation options and so on. CMakeLists.txt can be used by CMake to generate either makefiles or .vcproj files, and it provides very helpful commands such as recursive searches for files, automatic linking against known libraries, installers, and variables that can be used in source files. Is there any existing tool that would generate a CMakeLists.txt from specified options? Options could be: scan this folder and make a library out of it, then scan this other folder and make an executable, automatically link both with Boost, and produce a user-friendly installer along with generated INSTALL.txt and README.txt files. Something very powerful like that.
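
    For scale, a hand-written CMakeLists.txt covering the library-plus-executable part of that wish list is fairly short. A sketch under the assumption that the library sources live in src/lib and the executable sources in src/app (the names, layout, and targets are made up for illustration):

      cmake_minimum_required(VERSION 2.8)
      project(MyProject)

      find_package(Boost REQUIRED)
      include_directories(${Boost_INCLUDE_DIRS})

      # Library built from everything under src/lib
      file(GLOB_RECURSE LIB_SOURCES src/lib/*.cpp)
      add_library(mylib ${LIB_SOURCES})

      # Executable built from src/app, linked against the library and Boost
      file(GLOB_RECURSE APP_SOURCES src/app/*.cpp)
      add_executable(myapp ${APP_SOURCES})
      target_link_libraries(myapp mylib ${Boost_LIBRARIES})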

    Read the article

  • Prevent Azure subdomain indexation

    - by Leg10n
    Let me explain my situation. I have an Azure website (with an azurewebsites.net subdomain) and a custom domain.com, both built with ASP.NET MVC. Both are being indexed by Google, but I've noticed the custom domain is being penalized and doesn't show up in results; it only shows when I search for "site:domain.com". I want to remove and block the azurewebsites.net subdomain from Google. I've read the "possible" solutions: 1) Adding robots.txt won't work, because the subdomain and the domain serve the exact same content, so subdomain.azurewebsites.net/robots.txt leads to the same file as domain.com/robots.txt, removing the domain as well. 2) Adding the meta robots tag is the same situation as the previous point. 3) I'm using a CNAME record to point the domain to the subdomain, so I can't redirect to a subdirectory. Do you have any other ideas?
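
    One approach sometimes used here is to serve a different robots.txt depending on the Host header, so only requests arriving on the *.azurewebsites.net name get the blocking version. A web.config sketch assuming the IIS URL Rewrite module is available; robots-block.txt is a made-up file that would contain "User-agent: *" / "Disallow: /":

      <system.webServer>
        <rewrite>
          <rules>
            <rule name="Blocking robots.txt for the Azure host" stopProcessing="true">
              <match url="^robots\.txt$" />
              <conditions>
                <add input="{HTTP_HOST}" pattern="\.azurewebsites\.net$" />
              </conditions>
              <!-- robots-block.txt is a hypothetical file that disallows everything -->
              <action type="Rewrite" url="robots-block.txt" />
            </rule>
          </rules>
        </rewrite>
      </system.webServer>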

    Read the article

  • How to create robots.txt for a domain that contains international websites in subfolders?

    - by aaandre
    Hi, I am working on a site that has the following structure:
      site.com/us - US version
      site.com/uk - UK version
      site.com/jp - Japanese version
      etc.
    I would like to create a robots.txt that points the local search engines to a localized sitemap page and has them exclude everything else from the local listings. So, google.com (US) will index ONLY site.com/us and take site.com/us/sitemap.html into consideration. google.co.uk will index only site.com/uk and site.com/uk/sitemap.html. The same goes for the rest of the search engines, including Yahoo, Bing, etc. Any idea on how to achieve this? Thank you!

    Read the article

  • Ranking drop after using reverse proxy for blog subdirectory and robots.txt for old blog subdomain

    - by user40387
    We have a 3Dcart store and a WordPress blog hosted on a separate server. Originally, we had a CNAME set up to point the blog to http://blog.example.com/. However, in our attempt to boost link-based and traffic-based authority on the main site, we've opted to do a reverse proxy to http://www.example.com/blog/. It's been about two months since we finished the reverse proxy migration. It appears that everything is technically working as intended, including some robots and sitemap changes; the new URLs are even generating some traffic, as indicated on Google Analytics. While Google has been indexing the new URL locations, they're ranking very poorly, even for non-competitive, long-tail keywords. Meanwhile, the old subdomain URLs are still ranking mostly as well as they used to (even though they aren't showing meta titles and descriptions due to being blocked by robots.txt). Our working theory is that Google has an old index of the subdomain URLs and is considering the new URLs to be duplicate content, since it's being told not to crawl the subdomain and therefore can't see the rel canonicals we have in place. To resolve this, we've updated the subdomain's robots.txt to no longer block crawling and indexing. Theoretically, seeing the canonical tag on the subdomain pages will resolve any perceived duplicate-content issues. In the meantime, we were wondering if anyone has any other ideas. We are very concerned that we'll be losing valuable traffic, as we're entering our on-season at the moment.

    Read the article

  • <meta name="robots" content="noindex"> in "Fetch as Google"

    - by Rodrigo Azevedo
    I don't know why, but when I execute "Fetch as Google" it returns:
      HTTP/1.1 200 OK
      Cache-Control: private
      Content-Type: text/html
      Content-Encoding: gzip
      Vary: Accept-Encoding
      Server: Microsoft-IIS/7.5
      Set-Cookie: ASPSESSIONIDQACRADAQ=ECAINNFBMGNDEPAEBKBLOBOP; path=/
      X-Powered-By: ASP.NET
      Date: Wed, 26 Jun 2013 15:18:29 GMT
      Content-Length: 153

      <meta name="robots" content="noindex">
    That noindex tag doesn't exist in my page. Does anybody know what could be wrong?

    Read the article

  • Android OS Now Used To Drive Real Robots

    Robot Reviews: "For those wondering about the propriety of the name "Android" as a mobile device operating system, wonder no more because its real purpose has finally been revealed. It's really an operating system for robots."

    Read the article

  • How do I remove a LOT of indexed pages from Google?

    - by Thierry
    A few weeks ago we figured out that Google has indexed some information, in the form of individual PDF files, that we would rather keep confidential. Our assumption was that this was a problem with our robots.txt that we had overlooked. Even though we are not sure whether or not this is the case, we are certain that the robots.txt file is in a valid format and, according to Google's Webmaster Tools, is blocking the files. However, even after this adjustment was made weeks ago, Google still has the PDF files indexed, though it does say that further information cannot be provided because of the robots.txt file. As you can hopefully understand, this is unwanted behaviour due to the nature of the documents. I am aware that Google provides a request page for this purpose, but there are a lot of files. Is there an easier way to get Google to remove all of the files from its search engine? If not, is there anything else you could advise us to do besides manually requesting Google to remove every single page? Thanks in advance.
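
    Worth noting: a robots.txt Disallow only stops crawling; it does not remove URLs that are already indexed, and PDFs have no HTML page to carry a meta noindex tag. One commonly suggested alternative is to send a noindex directive in the HTTP response header instead, which requires the files to be crawlable (i.e. not blocked in robots.txt) so Google can see the header. A sketch for Apache with mod_headers enabled, assuming an Apache host, which the question does not actually state:

      # Ask search engines not to index any PDF served by this host.
      <FilesMatch "\.pdf$">
          Header set X-Robots-Tag "noindex, noarchive"
      </FilesMatch>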

    Read the article

  • Weird entry for robots.txt on a Naked Domain in Google Webmaster Tools

    - by Metalshark
    We own a .co.uk address and use an Internet hosting company that has made mistakes around DNS in the past. Our main site is hosted on www., and their reluctance to allow editing of AAAA records online means our naked domain does not resolve. Currently, when we attempt to reach the naked version there is no entry for the browser to go to, and it displays an unreachable page (nslookup just returns the name of the domain with no further entries such as an IP or canonical name). We recently added the relevant TXT records to verify both the www. version and the naked version of the domain in Google Webmaster Tools (in anticipation of the requests to our Internet host coming to fruition). Imagine our shock when double-checking Site configuration > Crawler access and finding an (admittedly failing) robots.txt with a dynamically generated HTML page (full of crude pop-up JavaScript) with references to 3 of our most prominent competitors. What could cause this to happen? As we are in the UK, I am assuming some DNS server is serving Google bad information. We are going to contact the Internet hosting company to fix our A and AAAA records once and for all, then check that they work in the US (using something like OpenDNS). Should we be doing more, though? For instance, informing Google (through Webmaster Tools) that we are now aware there is something wrong with our naked domain? UPDATE: We have fixed our A records (not AAAA) and that has resolved the issue. But if there are further actions we should take for effectively having had a parking page hosted on our active, visitor-heavy, SEO-rich domain that advertised our competitors to US visitors, what would they be?

    Read the article
