Search Results

Search found 437 results on 18 pages for 'bot'.

Page 9/18 | < Previous Page | 5 6 7 8 9 10 11 12 13 14 15 16  | Next Page >

  • How should a site respond to automated login attempts with phony usernames?

    - by qntmfred
    For the last couple of weeks I've been seeing a consistent stream of 15-30 invalid login attempts per hour on my site. Many of them are nonsensical usernames that nobody would ever register for real, and they often contain typical spam-related keywords. They all come from different IP addresses, so I can't just IP-block or throttle the requests. I'm not worried about unauthorized access to real accounts, since they aren't using real usernames. And if it were a member of my site trying to brute-force logins, they could easily scrape the valid usernames from the site, so I'm not worried about that kind of malicious behavior either. But what's the point of this type of activity? What would whichever bot operator is doing this have to gain by attempting all these logins?

    Read the article

  • Does Submit to Index on a page with new content update Content Keywords for the site?

    - by Dan Kanze
    Using Google Webmaster Tools, I'm trying to update the Content Keywords of my site, and I'm confused about the relationship between Submit to Index and Content Keywords. Does Fetch as Google -- Submit to Index on a previously indexed page containing new content expedite updating the Content Keywords crawled by the real Google bot? Does Submit to Index only submit new URLs, so that previously indexed URLs still point to the older cached version until Google crawls for new content on its own? Does Submit to Index have anything to do with Content Keywords or with crawling new content, whether on a previously indexed page or a never-indexed page?

    Read the article

  • Problem while installing (K)ubuntu 14.04

    - by Armin
    I'm trying to install Kubuntu 14.04 alongside Windows 7 x64, but there are problems in the Disk Setup section: the installer does not show my drives properly, as they really are. I left a drive empty to install Kubuntu on, but that drive is not shown at all, and all the drives are listed in an irregular way. This is how my drives really are: I want to install Kubuntu on my drive D, but this is how Disk Setup is showing my drives: and when I click on manual: I even shrank my drive D to assign space for root, home and swap, but they are not shown to be chosen. Where is the problem? How can I tell the installer to assign the drives that I want?

    Read the article

  • My First robots.txt

    - by Whitechapel
    I'm creating my first robots.txt and wanted to get a second opinion on it. Basically I have an FTP setup on my board for some special users to transfer files between each other, and I do NOT want that included in searches by the bots. I also want to point to my sitemap, which gets auto-generated by a PHP page; it links to xmlsitemap.php because that page generates the sitemap when called. My goal is to allow any search bot to crawl the forums to grab metadata. So here is what I have; what else should I include, and does anything need fixing?

        User-agent: *
        Disallow: /admin/
        Disallow: /ali/
        Disallow: /benny/
        Disallow: /cgi-bin/
        Disallow: /ders/
        Disallow: /empire/
        Disallow: /komodo_117/
        Disallow: /xanxan/
        Disallow: /zeroordie/
        Disallow: /tmp/
        Sitemap: http://www.vivalanation.com/forums/xmlsitemap.php

    Edit: I'm not sure how to handle all the users' folders under /public_html/, since the robots.txt will be going in /public_html.
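
    On the per-user folders: the original robots.txt standard has no wildcard support, so one Disallow line per folder, as above, is the portable approach (Googlebot understands * patterns, but only as an extension). As a sanity check before deploying, a quick sketch using Python's standard-library robotparser (rule subset and URLs taken from the question) can confirm the draft behaves as intended:

        import urllib.robotparser

        # Parse the draft rules directly and probe a few URLs against them.
        rp = urllib.robotparser.RobotFileParser()
        rp.parse([
            "User-agent: *",
            "Disallow: /admin/",
            "Disallow: /cgi-bin/",
            "Disallow: /tmp/",
        ])

        # Blocked folder: expect False; the forums themselves: expect True.
        print(rp.can_fetch("*", "http://www.vivalanation.com/admin/"))
        print(rp.can_fetch("*", "http://www.vivalanation.com/forums/index.php"))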

    Read the article

  • How frequently does Googlebot fetch sitemaps? Does it depend on PageRank?

    - by JITHIN JOSE
    How frequently does Google fetch sitemaps? I am working with a high-traffic website that normally gets 30 new posts per minute, but currently it provides a sitemap linking only to the newest 100 posts (about 3 minutes' worth). Is this method enough? Do bots fetch sitemaps every 3 minutes? Do I need to change the sitemaps to list all 5M posts (a sitemap index)? How would this change affect traffic and PageRank? Does the Google bot remove URLs that were previously listed in the sitemap but no longer are?
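
    For scale: the sitemap protocol caps a single file at 50,000 URLs, so listing 5M posts means a sitemap index file pointing at chunked sitemaps. A minimal sketch of generating one (domain and file names hypothetical):

        # Emit a sitemap index referencing 100 chunked sitemaps of 50k URLs each.
        chunks = 100

        lines = ['<?xml version="1.0" encoding="UTF-8"?>',
                 '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">']
        for n in range(1, chunks + 1):
            lines.append(f'  <sitemap><loc>http://example.com/sitemap-{n}.xml</loc></sitemap>')
        lines.append('</sitemapindex>')
        print("\n".join(lines))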

    Read the article

  • How are cross-platform/multiple-OS C++ projects planned in terms of code and tools?

    - by Nav
    I want to create a project in C++ that can work on Windows, Linux and embedded Linux. How are projects created when they have to work across many OSes? Are they first created on one OS, with the code then slowly modified to be ported to another OS? E.g., to me the Linux version of Firefox appears to be a Windows project and a separate Linux project with a different code base, since Firefox behaves a bit differently on Windows and Linux, although the source code download is, surprisingly, a single link. If Qt is used for the UI, Boost threads for threading, Buildbot for CI and NetBeans/Eclipse/Qt Creator as the IDE, would a person be able to minimise the amount of code rewriting required to get the project onto another OS? Is this the right way to do it, or are such projects meant to be created as two entirely separate projects for two separate OSes?

    Read the article

  • Blocking path scanning

    - by clinisbut
    I'm seeing a number of very suspicious requests in my access log:

        /i
        /im
        /imaa
        /imag
        /image
        /images
        /images/d
        /images/di
        /images/dis

    They build up toward a known resource (in the example above, /images/disrupt.jpg), and all come from the same IP. Requests vary from 1/sec to 10/sec and seem somewhat random. Obviously someone is trying to find something, and it seems they are using a script. How do I block this kind of behaviour? I thought of blocking the IP's requests, at least for a given time, keeping in mind that: the request intervals seem legitimate (at least I think so), and I don't want to end up blocking a search engine bot, which may hit 404 URLs too (and that's a different problem, I know). Do they always use the same IP?
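
    One approach, sketched below with assumed thresholds: rate-limit on 404s rather than on requests, since a path scanner generates misses at a rate no legitimate client does. Known search-engine bots can be spared by verifying them first (Google and Bing both document reverse-DNS plus forward-confirm verification) before ever blocking.

        import time
        from collections import defaultdict, deque

        WINDOW = 60        # seconds of history to keep per client (assumption)
        LIMIT = 8          # 404s allowed inside the window before blocking

        recent_404s = defaultdict(deque)   # ip -> timestamps of recent misses

        def should_block(ip, status, now=None):
            """Feed every (ip, status) pair in; True means start rejecting."""
            if status != 404:
                return False
            now = now if now is not None else time.time()
            hits = recent_404s[ip]
            hits.append(now)
            while hits and now - hits[0] > WINDOW:
                hits.popleft()
            return len(hits) > LIMIT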

    Read the article

  • Dartisans ep 14 - Dart Community Demos

    Dartisans ep 14 - Dart Community Demos The #dartlang community has been busy! You'll meet some members of the Dart community and see demos of their latest projects. Also, learn how an open-source contributor gained committer status for Dart! As always, ask and vote for questions for Dart engineers and community members. Meet +Kevin Moore, +Alexander Aprelev, and +John McCutchan as they show off their libraries and projects. You might just see WebGL, dart2js, and BOT in action. Ask questions here: developers.google.com Learn more about Dart at www.dartlang.org From: GoogleDevelopers

    Read the article

  • Programming languages excluded from copyright: the European Court places them, along with functionality, within a restrictive framework

    Programming languages excluded from copyright: the European Court places them, along with functionality, within a restrictive copyright framework. The functionality of a computer program, and programming languages in general, cannot be protected by copyright, according to the Advocate General of the European Court of Justice. Yves Bot has published his opinion on the case between SAS and World Programming, delimiting the scope of legal protection in the EU following a request for clarification from the British courts. He likens functionality to ideas, whose protection would amount to "offering the possib...

    Read the article

  • What failure can kill a long-running IRC client? [closed]

    - by Xeoncross
    I have an IRC bot that I built in PHP using sockets; it attempts to run forever and (if disconnected) reconnects again. I have it listening on several channels. Apparently it's fairly resilient, because it can run for several days before the process ends and CRON has to start it up again. However, given that the process does end, I'm assuming there are other conditions I'm not accounting for that are causing problems, and I have nothing in my error logs giving me a hint. In addition, sometimes the process will continue running, but I notice it's no longer present in any of the channels on the IRC server, which makes me think it violated some part of the protocol. I have logic set up to: reply to PINGs correctly; reconnect on disconnect (and rejoin channels); respond to private messages (so someone doesn't ban it); and prevent memory leaks. What other failure could be killing my long-running IRC client?
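
    One failure that list doesn't cover is a half-dead TCP connection: the server drops the client without a FIN ever arriving, so a blocking read waits forever while the bot has long since timed out of its channels. The bot here is PHP, but the idea translates; a minimal Python sketch of an activity watchdog (host, nick and timeout values are assumptions):

        import socket
        import time

        def run(host="irc.example.net", port=6667, nick="mybot"):
            sock = socket.create_connection((host, port))
            sock.settimeout(300)   # never block forever on a dead link
            sock.sendall(f"NICK {nick}\r\nUSER {nick} 0 * :{nick}\r\n".encode())
            last_seen = time.time()
            while True:
                try:
                    data = sock.recv(4096)
                except socket.timeout:
                    # Quiet for 5 minutes: probe the server ourselves.
                    sock.sendall(b"PING :watchdog\r\n")
                    if time.time() - last_seen > 900:
                        raise ConnectionError("ping timeout, reconnect")
                    continue
                if not data:
                    raise ConnectionError("server closed the connection")
                last_seen = time.time()
                for line in data.decode(errors="replace").splitlines():
                    if line.startswith("PING"):
                        sock.sendall(("PONG" + line[4:] + "\r\n").encode())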

    Read the article

  • Search engine bots accessing strange URLs

    - by casasoft
    We have ELMAH enabled on our site and get notified whenever a Page Not Found error is triggered on the website. We recently redesigned the website, so we understand that search engine robots might try to access previously indexed pages and trigger Page Not Found errors. For this reason, we have set up permanent redirects from such previously indexed pages to the respective new pages. The website in question is www.chambercollege.com, and, for example, a previously indexed URL was www.chambercollege.com/special-offers.aspx. This page is no longer accessible, so we created the necessary permanent redirect to the respective page at www.chambercollege.com/en/content/special-offers-161/. Now we are starting to receive Page Not Found errors from search engine bots (e.g. the MSN bot) trying to access the URL www.chambercollege.com/special-offers.aspx/images/shadow_right.jpg/. Any idea how a search engine could make up that strange URL, and any suggestions for what to do best?

    Read the article

  • How to get rid of crawling errors due to the URL Encoded Slashes (%2F) problem in Apache

    - by user14198
    The Google web crawler has indexed a whole set of URLs with encoded slashes (%2F) for our site. I assume it has picked up the pages from our XML sitemap file. The problem is that the live pages will actually result in a failure because of the URL-encoded-slashes problem in Apache. Some solutions are mentioned here. We are implementing a 301 redirect scheme for all the error pages, which should make the Google bot remove the pages from the crawling errors (no more crashing pages). Does implementing the 301s require the pages to be "live"? In that case we may be forced to implement solution 1 from the article; the problem is that solution 1 poses a security vulnerability.

    Read the article

  • Why isn't Google updating my site title in search results? [closed]

    - by SharkTheDark
    Possible duplicate: Google doesn't seem to update the description or title of my homepage. I had my domain for a few days before I uploaded the site to it, and it had one title; when I uploaded the content it should have received a new title, but through my misunderstanding of WordPress it had a blocking robots.txt and meta tags with noindex and nofollow. I removed all that about 7 days ago, and I see in reports that the Google bot is crawling my site, but my site title isn't updating; it still shows the old title from before the site was there... My robots.txt is now:

        User-agent: *
        Allow: /

    I have a clear title tag on every page. How long does it take to update? Do I need to check something else?

    Read the article

  • Issue with sitemap in GWT

    - by Anusha
    I have an e-commerce website, www.beyondtime.in, and I have been constantly monitoring the Google bot crawling the website as well as my webmaster account. Lately, I have found two issues that I have not been able to understand and hence want your help. 1) The Google bots have only been crawling www.beyondtime.in/telecom.php, when that URL is not even valid, so kindly help me understand what needs to be done to let Google crawl the other pages of the website as well. 2) The second question is about my Google Webmaster account, where I've submitted my sitemap with 227 URLs, but only 156 of them have been indexed. Also, none of the images on my website have been indexed by Google. So kindly help me with this as well. Thanks

    Read the article

  • Tumblr is visiting my blog?

    - by Hermes
    I created a blog on Tumblr a few days ago. Looking over the statistics, it seems that Tumblr itself is visiting my website using different browsers. What is this supposed to mean? Are these real visitors or is it a Tumblr bot? One example: Browser: Chrome 32.0; OS: Win8; Resolution: 1024x768; Location: New York, United States; IP Address: Tumblr (66.6.40.249); Referring URL: (no referring link). Other browsers used include: Chrome 20.0.1090.0, Firefox 21, Opera 12.14, Chrome 15.0.861.0, Chrome 32.0.1667.0, Internet Explorer 6, Internet Explorer 9, Opera 12, Opera 12.02. They all use the same screen resolution (1024x768) and have no referrer. The Flash version is not set, but they do support JavaScript. Unfortunately, I don't have the full user-agent string.

    Read the article

  • Google indexed my home page incorrectly: How can I fix it?

    - by louis_coetzee
    I finished my website and launched it. I think I had a problem with my robots.txt, so I changed it to look like this:

        03/08/2012
        # Allows all bots
        Sitemap: http://www.mysite.co.za/sitemap.xml
        User-agent: *
        Disallow: /dashboard/

    When I google my domain.co.za, I get this back: "Home - A description for this result is not available because of this site's robots.txt - learn more. You've visited this page 3 times. Last visit: 2012/08/15." Now that I have fixed this and added a 301 redirect from mysite.co.za to www.mysite.co.za, I would love it if the Google bot would come for a visit. Is there anything I can do to get this fixed?

    Read the article

  • Alternatives to using cookies?

    - by theclueless1
    What are the alternatives to using cookies/client-side storage for a PHP/MySQL-based site on Apache? Scenario/requirements: I want to try using some anti-bot code to prevent specific scrapers etc. from accessing the site; I would like to run this code before loading the rest of the site (before DB access etc.); I don't want to constantly run the same code on every page load after a visitor has passed the initial check; and I'd like to avoid the use of cookies/client-side storage if at all possible. The only solution I can currently think of is to write files to the server based on the visitor's IP/UA, or to write a list of them to a single file. Yet this has the limitation of multiple users coming through a proxy/the same connection, etc. So, any ideas/suggestions? Or am I simply overworking the issue?
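
    A server-side variant of the IP/UA idea, sketched in Python for brevity (the stack in question is PHP, and in practice the table would live in MySQL, APC or memcached rather than a dict): key an "already passed" record on a hash of IP plus user agent, with a TTL so users sharing a proxy are only grouped together briefly.

        import hashlib
        import time

        TTL = 3600                 # re-run the anti-bot check once per hour
        seen = {}                  # fingerprint -> time the check was passed

        def fingerprint(ip, user_agent):
            return hashlib.sha256(f"{ip}|{user_agent}".encode()).hexdigest()

        def needs_check(ip, user_agent):
            last = seen.get(fingerprint(ip, user_agent))
            return last is None or time.time() - last >= TTL

        def mark_passed(ip, user_agent):
            seen[fingerprint(ip, user_agent)] = time.time()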

    Read the article

  • Detecting 'stealth' web-crawlers

    - by Jacco
    What options are there to detect web crawlers that do not want to be detected? (I know that listing detection techniques will allow the smart stealth-crawler programmer to make a better spider, but I do not think that we will ever be able to block smart stealth-crawlers anyway, only the ones that make mistakes.) I'm not talking about the nice crawlers such as Googlebot and Yahoo! Slurp; I consider a bot nice if it identifies itself as a bot in the user-agent string and reads robots.txt (and obeys it). I'm talking about the bad crawlers, hiding behind common user agents, using my bandwidth and never giving me anything in return. There are some trapdoors that can be constructed (updated list, thanks Chris, gs):

      - adding a directory only listed (marked as disallow) in robots.txt;
      - adding invisible links (possibly marked as rel="nofollow"?), with style="display: none;" on the link or its parent container, or placed underneath another element with a higher z-index;
      - detecting who doesn't understand CaPiTaLiSaTioN;
      - detecting who tries to post replies but always fails the Captcha;
      - detecting GET requests to POST-only resources;
      - detecting the interval between requests;
      - detecting the order of pages requested;
      - detecting who (consistently) requests https resources over http;
      - detecting who does not request image files (this, in combination with a list of user agents of known image-capable browsers, works surprisingly well).

    Some traps would be triggered by both 'good' and 'bad' bots, so you could combine them with a whitelist: the client triggers a trap, but it requested robots.txt, and it doesn't trigger another trap because it obeyed robots.txt. One other important thing here: please consider blind people using screen readers; give people a way to contact you, or a (non-image) Captcha to solve to continue browsing. What methods are there to automatically detect the web crawlers trying to mask themselves as normal human visitors? Update: the question is not how to catch every crawler, but how to maximize the chance of detecting one. Some spiders are really good, and actually parse and understand HTML, XHTML, CSS, JavaScript, VBScript, etc.; I have no illusions that I'll be able to beat them. You would, however, be surprised how stupid some crawlers are, the best example of stupidity (in my opinion) being: casting all URLs to lower case before requesting them. And then there is a whole bunch of crawlers that are just 'not good enough' to avoid the various trapdoors.
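
    A minimal sketch of the first trapdoor combined with the whitelist idea (the trap path is hypothetical): the path is disallowed in robots.txt and linked only invisibly, so a hit on it from a client that never fetched robots.txt is a strong crawler signal.

        TRAP_PATH = "/private-listing/"   # Disallow'ed in robots.txt

        fetched_robots = set()   # IPs that requested robots.txt
        suspects = set()         # IPs that hit the trap without reading it

        def inspect(ip, path):
            """Feed every request through; True means treat as a crawler."""
            if path == "/robots.txt":
                fetched_robots.add(ip)
            elif path.startswith(TRAP_PATH) and ip not in fetched_robots:
                suspects.add(ip)
            return ip in suspects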

    Read the article

  • Testing an iPhone web app on Windows

    - by JoseMarmolejos
    When developing web apps for the iPhone on a Mac, you can test your app in either iPhoney or the Apple-supplied simulator; both of them are excellent for the task but are only available for Macs. So I have to ask: are there Windows alternatives to these iPhone simulators? So far I could only find this one.

    Read the article

  • How to strip color codes used by mIRC users?

    - by daniels
    I'm writing an IRC bot in Python using irclib, and I'm trying to log the messages on certain channels. The issue is that some mIRC users and some bots write using color codes. Any idea how I could strip those parts out and leave only the clean ASCII text of the message?
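
    For reference, mIRC's formatting is a handful of control characters: \x03 starts a colour (optionally followed by 1-2 foreground digits and an optional ",background"), \x02 is bold, \x1d italic, \x1f underline, \x16 reverse and \x0f reset. A regex that strips all of them:

        import re

        # \x03 plus optional "fg" or "fg,bg" colour digits, or any other
        # single mIRC formatting control character.
        MIRC_CODES = re.compile(r"\x03(?:\d{1,2}(?:,\d{1,2})?)?|[\x02\x0f\x16\x1d\x1f]")

        def strip_mirc(text):
            return MIRC_CODES.sub("", text)

        print(strip_mirc("\x034,1red on black\x03 plain \x02bold\x02"))
        # -> "red on black plain bold"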

    Read the article

  • How do Dijkstra's algorithm and A* compare?

    - by KingNestor
    I was looking at what the guys in the Mario AI Competition have been doing, and some of them have built some pretty neat Mario bots utilizing the A* (A-Star) pathing algorithm. (Video of Mario A* Bot in Action.) My question is: how does A* compare with Dijkstra? Looking over them, they seem similar. Why would someone use one over the other, especially in the context of pathing in games?
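
    They are more than similar: Dijkstra is exactly A* with a zero heuristic. A* visits fewer nodes whenever the heuristic is an admissible (never-overestimating) estimate of the remaining distance, e.g. straight-line distance on a game map, which is why it tends to win in games. A small sketch making the relationship explicit:

        import heapq

        def a_star(graph, start, goal, h=lambda n: 0):   # h=0 -> plain Dijkstra
            frontier = [(h(start), 0, start, [start])]   # (f, g, node, path)
            best_g = {start: 0}
            while frontier:
                f, g, node, path = heapq.heappop(frontier)
                if node == goal:
                    return g, path
                for nbr, cost in graph[node]:
                    ng = g + cost
                    if ng < best_g.get(nbr, float("inf")):
                        best_g[nbr] = ng
                        heapq.heappush(frontier, (ng + h(nbr), ng, nbr, path + [nbr]))
            return None

        graph = {"A": [("B", 1), ("C", 4)], "B": [("C", 1)], "C": []}
        print(a_star(graph, "A", "C"))                   # (2, ['A', 'B', 'C'])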

    Read the article

  • Live Messenger Programming - video

    - by NicoJuicy
    I have checked the possibilities of the MSN Live SDK, but I haven't come across a possibility to add video options to a "bot". How would I implement this? I want to stream videos directly to a user's MSN (multiple videos)... Would this be possible?

    Read the article

  • Tic-Tac-Toe AI: How to Make the Tree?

    - by cam
    I'm having a huge block trying to understand "trees" while making a Tic-Tac-Toe bot. I understand the concept, but I can't figure out how to implement them. Can someone show me an example of how a tree should be generated for such a case, or a good tutorial on generating trees? I guess the hard part is generating partial trees: I know how to implement generating a whole tree, but not parts of it.
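
    One thing that may unblock you: the game tree never has to exist as an explicit data structure. Each recursive call of minimax is a node, and its children are generated on demand from the legal moves; that on-demand generation is exactly the "partial tree". A minimal sketch for Tic-Tac-Toe:

        def winner(b):
            lines = [(0,1,2),(3,4,5),(6,7,8),(0,3,6),(1,4,7),(2,5,8),(0,4,8),(2,4,6)]
            for i, j, k in lines:
                if b[i] and b[i] == b[j] == b[k]:
                    return b[i]
            return None

        def minimax(board, player):
            w = winner(board)
            if w:
                return 1 if w == "X" else -1    # score from X's point of view
            moves = [i for i, c in enumerate(board) if not c]
            if not moves:
                return 0                        # board full: draw
            scores = []
            for m in moves:                     # children generated on demand
                board[m] = player
                scores.append(minimax(board, "O" if player == "X" else "X"))
                board[m] = None                 # undo: the "tree" is implicit
            return max(scores) if player == "X" else min(scores)

        print(minimax([None] * 9, "X"))         # 0: perfect play is a draw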

    Read the article

  • How can I transfer a file via XMPP using Python?

    - by Enchantner
    I'm using the xmpppy library for my Jabber remote-administration bot, but I can't find out how to send/receive a file and save it inside a specified directory. The documentation is poor and there aren't any examples, but I really want to make this work. Can anyone show some examples or some links about it? Or maybe I should use alternative XMPP bindings?

    Read the article

  • How to go about reading a web page lazily in Clojure

    - by Rayne
    A friend and I recently implemented link grabbing in my Clojure IRC bot. When it sees a link, it slurp*s the page and grabs the title from it. The problem is that it has to slurp* the ENTIRE page just to grab the title. How does one go about reading a page lazily, stopping at the first <title> tag?
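
    The underlying trick is independent of the language: read the response in chunks and stop as soon as the closing title tag shows up, instead of slurping the whole body. A sketch of the idea (shown in Python for concreteness, with a size cap as a safety assumption):

        import urllib.request

        def fetch_title(url, chunk_size=1024, max_bytes=65536):
            """Read incrementally; stop at the first </title> or at the cap."""
            buf = b""
            with urllib.request.urlopen(url) as resp:
                while len(buf) < max_bytes:
                    chunk = resp.read(chunk_size)
                    if not chunk:
                        break
                    buf += chunk
                    if b"</title>" in buf.lower():
                        break
            low = buf.lower()
            start, end = low.find(b"<title"), low.find(b"</title>")
            if start == -1 or end == -1:
                return None
            start = buf.find(b">", start) + 1   # skip past the opening tag
            return buf[start:end].decode("utf-8", "replace").strip()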

    Read the article
