Search Results

Search found 499 results on 20 pages for 'robots'.

Page 12/20 | < Previous Page | 8 9 10 11 12 13 14 15 16 17 18 19  | Next Page >

  • Lighttpd with FastCGI configuration running ViewVC - rewrite problems

    - by 0xC0000022L
    At the moment I am struggling with the configuration of lighttpd together with ViewVC. The configuration was ported from Apache 2.2.x, which is still running on the machine, serving the WebDAV/SVN stuff, being proxied through. Now, the problem I am having appears to be with the rewrite rules and I'm not really sure what I am missing here. Here's my configuration (slightly condensed to keep it concise): var.hgwebfcgi = "/var/www/vcs/bin/hgweb.fcgi" var.viewvcfcgi = "/var/www/vcs/bin/wsgi/viewvc.fcgi" var.viewvcstatic = "/var/www/vcs/templates/docroot" var.vcs_errorlog = "/var/log/lighttpd/error.log" var.vcs_accesslog = "/var/log/lighttpd/access.log" $HTTP["host"] =~ "domain.tld" { $SERVER["socket"] == ":443" { protocol = "https://" ssl.engine = "enable" ssl.pemfile = "/etc/lighttpd/ssl/..." ssl.ca-file = "/etc/lighttpd/ssl/..." ssl.use-sslv2 = "disable" setenv.add-environment = ( "HTTPS" => "on" ) url.rewrite-once += ("^/mercurial$" => "/mercurial/" ) url.rewrite-once += ("^/$" => "/viewvc.fcgi" ) alias.url += ( "/viewvc-static" => var.viewvcstatic ) alias.url += ( "/robots.txt" => var.robots ) alias.url += ( "/favicon.ico" => var.favicon ) alias.url += ( "/mercurial" => var.hgwebfcgi ) alias.url += ( "/viewvc.fcgi" => var.viewvcfcgi ) $HTTP["url"] =~ "^/mercurial" { fastcgi.server += ( ".fcgi" => ( ( "bin-path" => var.hgwebfcgi, "socket" => "/tmp/hgwebdir.sock", "min-procs" => 1, "max-procs" => 5 ) ) ) } else $HTTP["url"] =~ "^/viewvc\.fcgi" { fastcgi.server += ( ".fcgi" => ( ( "bin-path" => var.viewvcfcgi, "socket" => "/tmp/viewvc.sock", "min-procs" => 1, "max-procs" => 5 ) ) ) } expire.url = ( "/viewvc-static" => "access plus 60 days" ) server.errorlog = var.vcs_errorlog accesslog.filename = var.vcs_accesslog } } Now, when I access the domain.tld, I correctly see the index of the repositories. However, when I look at the links for each respective repository (or click them, for that matter), it's of the form https://domain.tld/viewvc.fcgi/reponame instead of the intended https://domain.tld/reponame. What do I have to change/add to achieve this? Do I have to "abuse" the index file mechanism somehow? Goal is to keep the /mercurial alias functional. So far I've tried sifting through the lighttpd book from Packt again, also through the lighttpd documentation, but found nothing that seemed to match the problem.

    Read the article

  • How to limit the number of concurrent CGI script invocations in Apache 2.2?

    - by hsivonen
    How can I limit the number of concurrent CGI invocations in Apache 2.2.x? More specifically, my problem is this: I have Apache hosting a Bugzilla instance and other stuff on one server. There's very little legitimate concurrent use of Bugzilla. However, it's trivial to mount a Denial of Service attack on the whole server by ignoring robots.txt and simply fetching a lot of bug pages that fork a process and hit a database.

    Read the article

  • Response code for Chinese spiders? [closed]

    - by pt2ph8
    My server is being "attacked" by Chinese spiders that don't respect the rules in my robots.txt. They are being very aggressive and using a lot of resources, so I'm going to set up some rules in nginx to block them by user agent. Question: which response code should I return, 403, 444 (empty response in nginx) or something else? I'm wondering how the spiders will react to different status codes. What's the best practice?

    Read the article

  • How google handle site traffic in google analytics

    - by Hamidreza
    I have a site with address www.exam.com and I have put Google analytics javascript scripts in it. I have made an app for my site, I want that everytime a user uses app, he visit the site in the application with built in browser which is inside the application ( I am using C# for application and .NET web browser ). User will address www.example.com/appvisit in the app and I just have put google analytics scripts in that page and nothing else. And I want to disallow this address /appvisit in my robots.txt file . I want to know that Is there any problem with doing this? will google crawl in the /appvisit directory ? Does google hate this work? and will google think this traffic is true and normal? thanks

    Read the article

  • NightHacking with James Gosling

    - by Yolande Poirier
    Java Evangelist Stephen Chin is back on the road for a new NightHacking Tour. He is meeting with James Gosling at Kona, Hawaii, the launch base of the Wave Glider. The Glider is an aquatic robot which communicates real-time data from the surface of the ocean. It runs on an ARM chip using Java SE Embedded.  "During this broadcast we will show some of the footage of his aquatic robots, talk through the technologies he is hacking on daily, and do Q&A with folks on the live chat" explains Stephen Chin.  Sign up for the live stream on Wednesday, October 23rd at:  8AM Hawaii Time 11AM PST 2PM EST 20:00 CET Follow @nighthackingtv for the next Nighthacking events

    Read the article

  • Nighthacking with James Gosling

    - by Yolande Poirier
    Java Evangelist Stephen Chin is back on the road for a new NightHacking Tour. He is meeting with James Gosling at Kona, Hawaii, the launch base of the Wave Glider. The Glider is an aquatic robot which communicates real-time data from the surface of the ocean. It runs on an ARM chip using Java SE Embedded.  "During this broadcast we will show some of the footage of his aquatic robots, talk through the technologies he is hacking on daily, and do Q&A with folks on the live chat" explains Stephen Chin.  Sign up for the live stream on Wednesday, October 23rd at:  8AM Hawaii Time 11AM PST 2PM EST 20:00 CET Follow @nighthackingtv for the next Nighthacking events

    Read the article

  • exclamation mark for sitemaps in webmastertools when resubmited

    - by Jayapal Chandran
    Hi, I have three sitemaps submitted to webmastertools. In that one has very few links and was accepted. It showed a green tick. The other two had around 150 links. They had been accepted in think yet webmastertools displays the exclamation mark. I think i saw this already but what confused was my hosting was blocking frequent bots recently by using a firewall and just now they added googles ip range in their witelist. and then my robots and sitemaps were read by webmastertools. But two sitemaps shows exclamation. I hope it is nothing to do with the above problem. What are all the reasons and where can i see the reason for that exclamation mark.? Here is the screen shot.

    Read the article

  • 32 Stunning Movie Tributes in LEGO

    - by Jason Fitzpatrick
    These impressive Sci-Fi LEGO tributes are an impressive combination of time, money, and a whole lot of LEGO bricks. Read on to see everything from Death Star hangers to adorable robots. Over at Dvice, a SyFy channel blog, they’ve rounded up 32 impressive movie tributes crafted entirely in LEGO bricks. The model seen above, for example, is composed of 30,000 bricks and is over six feet on a side. Planning on building your own? You’d better have $2,300 to blow on bricks and six months of spare time to invest. Hit up the link below for more LEGO tributes. 32 Fan-Built LEGO Tributes to Science Fiction [Dvice] 8 Deadly Commands You Should Never Run on Linux 14 Special Google Searches That Show Instant Answers How To Create a Customized Windows 7 Installation Disc With Integrated Updates

    Read the article

  • Thousands of 404 errors in Google Webmaster Tools

    - by atticae
    Because of a former error in our ASP.Net application, created by my predecessor and undiscovered for a long time, thousands of wrong URLs where created dynamically. The normal user did not notice it, but Google followed these links and crawled itself through these incorrect URLs, creating more and more wrong links. To make it clearer, consider the url example.com/folder should create the link example.com/folder/subfolder but was creating example.com/subfolder instead. Because of bad url rewriting, this was accepted and by default showed the index page for any unknown url, creating more and more links like this. example.com/subfolder/subfolder/.... The problem is resolved by now, but now I have thousands of 404 errors listed in the Google Webmaster Tools, which got discovered 1 or 2 years ago, and more keep coming up. Unfortunately the links do not follow a common pattern that I could deny for crawling in the robots.txt. Is there anything I can do to stop google from trying out those very old links and remove the already listed 404s from Webmaster Tools?

    Read the article

  • Would using AJAX only "Add to Cart" buttons be wise?

    - by Alex Erwin
    I want to AJAX enable all of my Add To Cart buttons because search engine bots are indexing these and not paying attention to my robots file or site map. I just don't want to loose potential customers. I have seen a number of top sites using heavily JavaScript support content, including Amazon, is it OK to follow the trend? The rest of my site progressively degrades, but I would really like to implement this because of the benefits to the customer (instant satisfaction), my infrastructure (constant page rebuilds), and allowing me to use SEO tools to optimize without the tool picking up thousands of "Add to Cart" widgets in my catalog. Thanks

    Read the article

  • How do I block a user-agent from Apache

    - by rubo77
    How do I realize a UA string block by regular expression in the config files of my Apache webserver? For example: if I would like to block out all bots from Apache on my debian server, that have the regular expression /\b\w+[Bb]ot\b/ or /Spider/ in their user-agent. Those bots should not be able to see any page on my server and they should not appear neither in the accesslogs nor in the errorlogs. http://global-security.blogspot.de/2009/06/how-to-block-robots-before-they-hit.html supposes to uses mod_security for that, but isn't there a simple directive for http.conf?

    Read the article

  • Recovering a website

    - by Jessica
    I found my website in the Wayback Machine a few months ago, but today I've tried again and now it tells me it can't find robots.txt. My old webhost stopped paying for their servers back in August without any notice. I was going to do a backup the day it happened. Is there a way just to find the text? I have the old IP, images, but nothing else. None of the big search engines have caches anymore, and I already looked in the cache of three of my Macs with nothing to be found.

    Read the article

  • Tears of Steel [Short Movie]

    - by Asian Angel
    In the future a young couple reach a parting of the ways because the young man can not handle the fact that she has a robotic arm. The bitterness of the break-up and bad treatment from her fellow humans lead to a dark future 40 years later where robots are relentlessly hunting and killing humans. Can the man who started her down this dark path redeem himself and save her or will it all end in ruin? TEARS OF STEEL – DOWNLOAD & WATCH [Original Blog Post & Download Links] Tears of Steel – Blender Foundation’s fourth short Open Movie [via I Love Ubuntu] HTG Explains: What is the Windows Page File and Should You Disable It? How To Get a Better Wireless Signal and Reduce Wireless Network Interference How To Troubleshoot Internet Connection Problems

    Read the article

  • Robot.txt can get all soft404s fixed?

    - by olo
    I got many soft404 in Google webmaster Tools, and those webpages aren't existing any more. thus I am unable to insert <meta name="robots" content="noindex, nofollow"> into my pages, and I've been searching a while but didn't get some valuable clues. There are about 100 URLs are soft 404, to redirect them all one by one is a bit silly as it would cost too much time for me. If i just add those links into robot.txt like below User-agent: * Disallow: /mysite.asp Disallow: /mysite-more.html if this way will fix all soft404s solidly? or if there is a way to change all soft404 to hard404? Please give me some suggestions. Many thanks

    Read the article

  • Do backlinks to blocked content add value?

    - by David Fisher
    We've been debating the following SEO question at our office: If you block bot access to a page either via robots.txt or on-page noindex metadata, does that negate the value of any backlinks to that page? We have a client who wants to block some event booking form pages from being indexed as each booking form page has a unique URL parameter and the pages are "clogging up" the Google index; however lots of websites link to those booking form pages and we wouldn't want to lose the value of those links. Any opinions welcomed.

    Read the article

  • Why are the tags on my site using wordpress being indexed instead of the page?

    - by Bernard
    I can't figure out why my tags are being indexed by google and not my actual posts. So in google, my posts are showing up as mysite.com/tags/post and I of course I want it to look like mysite.com/category/actualpost. Any ideas what could be wrong? My domain is 3 years old and I just started a new focus of an existing site. I can't figure this out! There is no duplicate content, I have a sitemap submitted to webmaster tools and robots.txt...I have everything I need. This is the first time something like this has happened to me. Let me know if anyone has any ideas.

    Read the article

  • How to recover a website's lost robot.txt?

    - by Jessica
    I found my website in the Wayback Machine a few months ago, but today I've tried again and now it tells me it can't find robots.txt. My old webhost stopped paying for their servers back in August without any notice. I was going to do a backup the day it happened. Is there a way just to find the text? I have the old IP, images, but nothing else. None of the big search engines have caches anymore, and I already looked in the cache of three of my Macs with nothing to be found.

    Read the article

  • Google I/O 2010 - Google Wave Media APIs

    Google I/O 2010 - Google Wave Media APIs Google I/O 2010 - Google Wave Media APIs: Attachments can surf too! Wave 201 Seth Covitz, Jimin Li, Phil Liao Google Wave is used by diverse groups to communicate and collaborate on projects from work to school to plain old having fun. To make users even more productive, we are providing capabilities that enable them to collaborate on and around any piece of third-party content (eg attachments). In this session, we will introduce the Wave Media APIs which enable robots and gadgets to create, access, and modify third-party content in Wave. For all I/O 2010 sessions, please go to code.google.com From: GoogleDevelopers Views: 5 0 ratings Time: 41:04 More in Science & Technology

    Read the article

  • when will google revert back page rank after i cleared network unrechable error

    - by Jayapal Chandran
    For the past one month i was getting network unreachable error. I contacted my web hosting and they said that google bots were blocked if it were causing more traffic. And then they witelisted google bots. Now the errors did not appear but my ranking and search results went down to more than 6 pages or they did not appear at all. Now google is able to read my robots and sitemap. Just yesterday. when will search results and page rank gets to its previous positions? like it were before a month? Most links did not appear in google search result.

    Read the article

  • Removing existing filtered pages from Google's index: noindex / 301 / canonical to non-filtered page?

    - by Noam
    I've decided to remove some of my site's pages from the Google index to focus more of the indexed pages on higher quality pages. The pages I'm going to remove are already in the index. These removed pages are filtered pages which will continue to exist, I just don't want them in the google index because they add little quality to the same page without any filter selected. I've added in webmaster tools specification of narrow for the parameters that set these filters, but it doesn't seem this changes anything in how he handles these pages. So I'm considering three options: Adding <meta name="robots" content="noindex" /> to the html header of these filtered pages 301 to the non-filtered page that contains the most similar information and will remain in the index Canonical tag. Which I'm not sure is exactly the mainstream use case, as these aren't really the same pages. Which should I use?

    Read the article

  • RewriteRule not working at server level?

    - by Alexis Wilke
    I wanted to forbid some robots from doing certain things to my websites and decided to add a RewriteRule for that purpose. The rule works when put in one of my <VirtualHost *:80> tag and looks like this: RewriteEngine On RewriteCond %{HTTP_USER_AGENT} libwww-perl RewriteCond %{REQUEST_METHOD} POST RewriteRule . - [F,L] However, I wanted to apply that to all my websites instead of just one of them. So with the newest version of Apache2 settings, I decided to put that code in the security.conf file. This file is defined under /etc/apache2/conf-available/... (and yes, I have a softlink from the /etc/apache2/conf-enabled/... directory.) However, if the definition is only in the conf-available/security.conf files, it somehow gets ignored. From the documentation, it says that these Rewrite* commands all work at server level! Any idea of what I would be missing?

    Read the article

  • Maker Faire 2012 Attendees build with Java Technology

    - by hinkmond
    Looks like Daniel Green, systems engineer from Oracle, and the panel of Java experts had a successful Java Technology booth at this year's Maker Faire 2012. See: Maker Faire 2012 adds Java Here's a quote: "We made a huge impact for Java and Oracle, creating positive perception, building brand awareness, and introducing fun and engaging ways for future technologists to learn Java programming," says Michelle Kovac, Oracle director, Java Marketing and Operations. Good stuff, considering all the future developers of exploding robots and fire-breathing dragon metal sculptures attend the Maker Faire. They can blow up stuff with Java technology just as effectively as other programming languages. Hinkmond

    Read the article

  • How did craigspro license Craigslist content? [closed]

    - by Joshua Frank
    There's an app called craigspro that provides a much better interface to Craigslist on mobile devices. They claim that the app is Officially Licensed by Craigslist, but I thought Craigslist never licensed their content, and the only thing I can find on the subject in the terms of use is this: Any copying, aggregation, display, distribution, performance or derivative use of craigslist or any content posted on craigslist whether done directly or through intermediaries (including but not limited to by means of spiders, robots, crawlers, scrapers, framing, iframes or RSS feeds) is prohibited. As a limited exception, general purpose Internet search engines and noncommercial public archives will be entitled to access craigslist without individual written agreements executed with CL that specifically authorize an exception to this prohibition if ... Does anyone know how do get a "written agreement" with Craigslist, and roughly what their terms would be? Do they charge a fee, or just check that you're not evil? I'll try next with Craigslist directly, but I'd like to get a sense of the landscape before stumbling in.

    Read the article

  • Does a "nofollow" attribute on a link prevent URL discovery by search engines?

    - by Stephen Ostermiller
    I know that nofollow prevents link juice from being passed across a link. But if search engine robots discover a link with a nofollow on it, will they add that link to their crawl queue? In other words, if I create a link to a brand new page and put a rel=nofollow attribute on that link, will it prevent search engine bots (particularly Googlebot) from crawling the page. (Assuming that this link remains the only link into that page.) I've read conflicting reports about this over the years and I'm looking for authoritative references about the current state of affairs. Official statements from Google or published results of independent testing would be ideal.

    Read the article

  • webmaster tools - Network Unreachable

    - by Jayapal Chandran
    Hi, webmaster tools for my site displays that robots.txt unreachable and for all links in sitemap it says network unreachable. sitemap.xml unreachable. These appear in crawl stats page. I discussed with the support team of my hosting and they said... Hi, I have verified apache logs, i cannot see any issues on your website/webserver/ Possible issues. There may the routing issue from the googles server to our server. When a google bots hits goes high the IP will be automatically blacklisted by our firewall to avoid server loads & downtimes. As we donot have access to their services, We cannot able to give details of their details/logs etc. The sitemaps link shows an exclamation mark which means the file was not reachable. What could be the problem and how to solve it?

    Read the article

< Previous Page | 8 9 10 11 12 13 14 15 16 17 18 19  | Next Page >