Search Results

Search found 499 results on 20 pages for 'robots'.

check what process was causing the problem of high cpu load

- by linuxk

I'm running an nginx WordPress server in KVM using 12.04 server x86. It ran very well for about 4 months until 2 hours ago, when I found that my website was down and not responding to ping. Virt-manager logged a high CPU load (please see the picture below) before the unexpected shutdown. I want to know what process caused the unexpected shutdown. The following log files make me think my server was attacked. Any suggestions and help would be appreciated.

kern.log and syslog showed the same output:

    Nov 11 03:54:11 www kernel: [1344541.156239] [UFW BLOCK] IN=eth0 OUT= MAC= SRC=0.0.0.0 DST=224.0.0.1 LEN=32 TOS=0x00 PREC=0xC0 TTL=1 ID=0 DF PROTO=2
    Nov 11 03:54:11 www kernel: [1344541.156315] [UFW BLOCK] IN=eth0 OUT= MAC= SRC=0101:080a:2334:c900:0100:0000:0000:0000 DST=ff02:0000:0000:0000:0000:0000:0000:0001 LEN=72 TC=0 HOPLIMIT=1 FLOWLBL=0 PROTO=ICMPv6 TYPE=130 CODE=0

/nginx/access.log showed:

    119.235.237.17 - - [11/Nov/2012:03:45:29 +0900] "GET /blog HTTP/1.1" 200 30493 "-" "Yeti/1.0 (NHN Corp.; http://help.naver.com/robots/)"
    my-server-ip - - [11/Nov/2012:11:05:30 +0900] "POST /wp-cron.php?doing_wp_cron=13 HTTP/1.0" 499 0 "-" "WordPress/3.4.2; http://mywebsite.com"
    (The server was turned on here.)
    119.235.237.16 - - [11/Nov/2012:11:05:30 +0900] "GET /blog HTTP/1.1" 200 32935 "-" "Yeti/1.0 (NHN Corp.; http://help.naver.com/robots/)"
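An access log cannot show which process used the CPU, but it can show whether request volume or error codes spiked before the outage. The following is a minimal Python sketch, assuming the log uses the standard "combined" format seen in the quoted lines; the function name `summarize` and the sample list are illustrative only. Status 499 is the code nginx logs when the client closes the connection before a response is sent.

```python
import re
from collections import Counter

# Pattern for the "combined" access log format used in the quoted lines.
LINE_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def summarize(lines):
    """Count requests per (client, user agent) pair and per status code."""
    clients = Counter()
    statuses = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines that are not in combined format
        clients[(match["host"], match["agent"])] += 1
        statuses[match["status"]] += 1
    return clients, statuses

# Two of the lines quoted in the question, used here as sample input.
sample = [
    '119.235.237.17 - - [11/Nov/2012:03:45:29 +0900] "GET /blog HTTP/1.1" '
    '200 30493 "-" "Yeti/1.0 (NHN Corp.; http://help.naver.com/robots/)"',
    'my-server-ip - - [11/Nov/2012:11:05:30 +0900] '
    '"POST /wp-cron.php?doing_wp_cron=13 HTTP/1.0" 499 0 "-" '
    '"WordPress/3.4.2; http://mywebsite.com"',
]
clients, statuses = summarize(sample)
```

Running `summarize` over the full log and looking at `clients.most_common(10)` would show whether a single crawler or address dominated the traffic before the shutdown.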

Google suddenly only indexes https and not http

- by spender

All of a sudden, searches for our site "radiotuna" return the result as an HTTPS link. https://www.google.com/?q=radiotuna#hl=en&safe=off&output=search&sclient=psy-ab&q=radiotuna&oq=radiotuna&gs_l=hp.12...0.0.0.3499.0.0.0.0.0.0.0.0..0.0.les%3B..0.0...1c.LnOvBvgDOBk&pbx=1&bav=on.2,or.r_gc.r_pw.r_qf.&fp=177c7ff705652ec3&biw=1366&bih=602

We only use https for the download of two specific files (these urls are resources used for the autoupdate functionality of an app we distribute). All other parts of the site should be served over http. We don't want any other traffic over https, nor do we want any of our site links to appear in search engines as https. I'd like to address this issue. It seems that the following solutions are available:

1. Hand out an https-specific robots.txt, as such:

    User-agent: *
    Disallow: /

2. And/or, at app level, 301 (permanent) redirect all requests (except the two above) to HTTP if they come in as HTTPS.

My concern with the robots method is that if (for some reason) Google decided not to index the http pages, disallowing the https pages might leave Google with nothing to index, with disastrous consequences for our ranking. This means I'm inclined to go with a 301 redirect. Any thoughts?
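The effect of the https-specific robots.txt quoted in the question can be checked with Python's standard `urllib.robotparser`. This is only a sketch: the host `www.example.com`, the helper `parse_robots`, and the permissive http file are assumptions made for illustration, not the site's real configuration.

```python
from urllib.robotparser import RobotFileParser

def parse_robots(text):
    """Build a parser from robots.txt content held in a string."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

# robots.txt served on https: block everything (as quoted in the question).
https_robots = parse_robots("User-agent: *\nDisallow: /\n")
# Hypothetical robots.txt served on http: an empty Disallow allows everything.
http_robots = parse_robots("User-agent: *\nDisallow:\n")

https_ok = https_robots.can_fetch("Googlebot", "https://www.example.com/any/page")
http_ok = http_robots.can_fetch("Googlebot", "http://www.example.com/any/page")
```

Here `https_ok` is `False` and `http_ok` is `True`, which is the split the first option relies on. It says nothing about how a search engine chooses between the two schemes, which is the risk raised in the question.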

Wget site mirror, links with rel="<content>" not followed

- by Pacifika

Whilst creating a site mirror using wget 1.12 on Ubuntu, links with a rel attribute set are not downloaded: <a href="link" rel="tag">text</a> Rel="tag" is a microformat (by adding rel="tag" to a hyperlink, a page indicates that the destination of that hyperlink is an author-designated "tag" (or keyword/subject) for the current page). My WordPress theme uses this for links to tags, so 99% of the site is ignored. Edit: it turns out all my permalinks use rel="bookmark" and are skipped as well. I'm using the following wget command (this ignores robots.txt and also follows nofollow links): wget -mkp -e robots=off http://site How do I make wget follow links with rel set?

SEO with duplicate content

- by user16831

I have a nature photography site with multiple types of photo galleries. Each photo and associated caption on my site appears in several galleries. For instance, a photo of a goldfinch that was taken on a trip to New Mexico in 2008 will appear in the "goldfinch.php" gallery, in the "finches.php" gallery, and in the "New_Mexico_2008.php" gallery. This duplication is useful for my site visitors (User A may want to see goldfinch photos, whereas User B wants to see photos from New Mexico), but I am concerned about the SEO implications. The typical suggestions for dealing with duplicate content, such as 301 redirects and canonical tags, probably won't work in this case, because the page content is substantially different (ranging from ~1% to ~90% duplication, depending on the specific example chosen). The obvious solution to me would be to edit robots.txt to allow search engines to crawl only one type of gallery. For instance, if they crawled only the galleries organized by species (e.g. goldfinch.php), all the photos on my site would be found exactly once. However, the Google content guidelines recommend against blocking crawler access to duplicate information. Should I go ahead and use robots.txt anyway? Or is there a better solution?

When load balancing, must all copies of static web page be exactly the same?

- by Gilles Blanchette

I am used to getting answers for everything on the web, but not this time... Yesterday I enabled Amazon's DNS weighting functionality to load balance 7 websites between two different IP addresses (split 50%-50%). Both servers run IIS 8.5, and the sites run well on both sides. Today I found that Google Webmaster Tools is reporting fetch errors for the robots.txt file, with close to 50% of access attempts failing. The robots.txt file is OK and accessible (even via Google's URL testing page) on both servers. Let's say the current version of the static web pages is on the first computer and the updated version of the same web pages is on the second computer. Can that be the problem? When load balancing, can static web pages be slightly different from one host server to the other? Thank you for your help.

Per-user vhost logging

- by kojiro

I have a working per-user virtual host configuration with Apache, but I would like each user to have access to the logs for his virtual hosts. Obviously the ErrorLog and CustomLog directives don't accept the wildcard syntax that VirtualDocumentRoot does, but is there a way to achieve logs in each user's directory?

    <VirtualHost *:80>
        ServerName *.example.com
        ServerAdmin [email protected]
        VirtualDocumentRoot /home/%2/projects/%1
        <Directory /home/*/projects/>
            Options FollowSymlinks Indexes
            IndexOptions FancyIndexing FoldersFirst
            AllowOverride All
            Order Allow,Deny
            Allow From All
            Satisfy Any
        </Directory>
        Alias /favicon.ico /var/www/default/favicon.ico
        Alias /robots.txt /var/www/default/robots.txt
        LogLevel warn
        # ErrorLog /home/%2/logs/%1.error.log
        # CustomLog /home/%2/logs/%1.access.log combined
    </VirtualHost>
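One possible approach, sketched here under stated assumptions, is to log every virtual host to a single piped program and let that program split the stream. Apache supports piped logs (a CustomLog target starting with "|") and the %V format specifier for the requested host name, so a log format that starts with %V gives the splitter the host name as the first field. The Python below is a hypothetical splitter, not an Apache feature: the names `log_path_for` and `split_log`, the script location, and the rule that a user must already have a logs directory are all illustrative choices.

```python
import os

def log_path_for(hostname, home_root="/home"):
    """Map 'project.user.example.com' to '/home/user/logs/project.access.log'.

    Mirrors VirtualDocumentRoot /home/%2/projects/%1, where %1 and %2 are
    the first and second dot-separated parts of the host name.
    """
    parts = hostname.lower().split(".")
    if len(parts) < 2:
        return None
    project, user = parts[0], parts[1]
    # Refuse names that could escape the intended directory.
    for name in (project, user):
        if not name or not name.replace("-", "").replace("_", "").isalnum():
            return None
    return os.path.join(home_root, user, "logs", project + ".access.log")

def split_log(stream, home_root="/home"):
    """Append each line (minus its leading host field) to a per-user file."""
    for line in stream:
        hostname, _, rest = line.partition(" ")
        path = log_path_for(hostname, home_root)
        if path is None or not os.path.isdir(os.path.dirname(path)):
            continue  # unknown host, or user without a logs directory
        with open(path, "a") as handle:
            handle.write(rest)

# Intended use (assumption): call split_log(sys.stdin) from a script that
# Apache starts through a piped CustomLog directive.
```

Piped log programs are started by the parent httpd process and normally run with its privileges, so the files they create may need their ownership adjusted before users can read them.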

Prevent bot from crawling certain areas of site.

- by Skoder

Hey, I don't know much about SEO and how web spiders work, so forgive my ignorance here. I'm creating a site (using ASP.NET-MVC) which has areas that display information retrieved from the database. The data is unique to the user, so there's no real server-side output caching going on. However, since the data can contain things the user may not wish to have displayed in search engine results, I'd like to prevent any spiders from accessing the search results page. Are there any special actions I should take to ensure that the search result directory isn't crawled? Also, would a spider even crawl a page that's dynamically generated, and would any actions preventing certain directories from being searched mess up my search engine rankings? Edit: I should add, I'm reading up on the robots.txt protocol, but it relies on co-operation from the web crawler. However, I'd also like to stop any data-mining users who will ignore the robots.txt file. I appreciate any help!

How should I handle pages that move to a new url with regards to search engines?

- by Anders Juul

Hi all, I have done some refactoring on an ASP.NET MVC application already deployed to a live web site. Part of the refactoring was moving functionality to a new controller, causing some urls to change. Shortly afterwards, the various search engine robots started hammering the old urls. What is the right way to handle this in general?

- Ignore it? In time the SEs should find out that they get nothing but 400 from the old urls.
- Block old urls with robots.txt?
- Continue to catch the old urls, then redirect to new ones? Users navigating the site would never get the redirection, as the urls are updated throughout the new version of the site. I see it as garbage code, unless it could be handled by some fancy routing?
- Other?

As always, all comments welcome... Thanks, Anders, Denmark
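The "catch the old urls and redirect" option does not have to be scattered through the controllers; it can live in one lookup table consulted before normal routing. The sketch below shows the idea in Python rather than ASP.NET MVC, and every path in `MOVED` is a made-up placeholder, since the question does not list the real urls.

```python
# Hypothetical old-to-new path table; real entries depend on the refactoring.
MOVED = {
    "/home/products": "/catalog/products",
    "/home/contact": "/about/contact",
}

def resolve(path):
    """Return (301, new_path) for a moved path, or None to route normally.

    A permanent redirect tells crawlers to replace the old url in their
    index, so the table can be retired once the old urls stop being hit.
    """
    key = path.rstrip("/").lower() or "/"
    target = MOVED.get(key)
    if target is not None:
        return 301, target
    return None
```

For example, `resolve("/Home/Products/")` yields `(301, "/catalog/products")`, while a path that is not in the table yields `None` and falls through to the application's usual routes.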

Codeigniter Routes for filename with extension

- by thehuby

I am using CodeIgniter and its routes system successfully with some lovely regexp; however, I have come unstuck on what should be an easy peasy thing in the system. I want to include a bunch of search engine related files (for Google webmaster etc.) plus the robots.txt file, all in a controller. So, I have created the controller and updated the routes file, and I don't seem to be able to get it working with these files. Here's a snip from my routes file:

    $route['robots\.txt|LiveSearchSiteAuth\.xml'] = 'search_controller/files';

Within the function I use the URI helper to figure out which content to show. Now I can't get this to match, which points to my regexp being wrong. I'm sure this is a really obvious one, but it's late and my caffeine tank is empty :)
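One thing worth checking in the route pattern is operator precedence: alternation binds more loosely than anchors. If the framework wraps the route key in ^ and $ before matching (an assumption here, not something stated in the question), the ungrouped pattern becomes "starts with robots.txt OR ends with LiveSearchSiteAuth.xml". The Python sketch below only demonstrates that regex behaviour; the candidate paths are invented for the example.

```python
import re

# The route key as written, with anchors added around the whole string.
ungrouped = re.compile(r"^robots\.txt|LiveSearchSiteAuth\.xml$")
# The same alternatives inside a group, so both anchors apply to each one.
grouped = re.compile(r"^(robots\.txt|LiveSearchSiteAuth\.xml)$")

candidates = [
    "robots.txt",
    "LiveSearchSiteAuth.xml",
    "robots.txt.bak",
    "old/LiveSearchSiteAuth.xml",
]
ungrouped_hits = [c for c in candidates if ungrouped.search(c)]
grouped_hits = [c for c in candidates if grouped.search(c)]
```

The ungrouped pattern accepts all four candidates, while the grouped one accepts only the two intended file names. Both forms do match robots.txt itself, so if the route never fires, the request may not be reaching the framework at all, for instance because the web server serves or excludes that file before the rewrite to index.php.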

django (under mod_wsgi) and php

- by Hellnar

Hello. On my Debian install, I run a Django site via apache2 and mod_wsgi. Now I want to add a WordPress install to it, and for that I need to install PHP bindings for Apache. I am curious which library is recommended for this, as well as how I should set up the apache2 config file. Here is my current apache2 000-default file:

    <VirtualHost *:80>
        Alias /media /home/myuser/myproject/statics
        Alias /favicon.ico /home/myuser/myproject/statics/pic/favicon.ico
        Alias /robots.txt /home/myuser/myproject/templates/robots.txt
        Alias /admin_media /usr/lib/python2.5/site-packages/Django-1.1.1-py2.5.egg/django/contrib/admin/media
        WSGIScriptAlias / /home/myuser/myproject/myproject_wsgi.py
        WSGIDaemonProcess myproject user=myuser group=myuser threads=25
        WSGIProcessGroup myproject
    </VirtualHost>

I want to add WordPress at www.mysite.com/blog

How to remove "index.php?" from HTACCESS [duplicate]

- by Francis Goris

This question already has an answer here: Reference: mod_rewrite, URL rewriting and “pretty links” explained

I have a url like this:

    www.site.com/index.php?/genero/aventura/av/

But I would like this to be my new url:

    site.com/genero/aventura/av/

I used the following code:

    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteCond %{HTTP_HOST} !^www.site.com/$ [NC]
    RewriteRule ^index.php\?/(.*)$ site.com/$1 [R=301,L]
    </IfModule>

but it only returns: site.com/index.php?/genero/aventura/av/

This is my latest and full version:

    RewriteEngine on
    #RewriteCond $1 !^(index\.php|ver_capitulo\.html|google3436eb8eea8b8d6e\.html|BingSiteAuth\.xml|portadas|public|mp3|css|favicon\.ico|js|plantilla|i|swf|plugins|player\.swf|robots\.txt)
    RewriteCond $1 !^(index\.php|public|css|js|i|feed|portadas|robots\.txt|BingSiteAuth\.xml|plugins|i|mp3|favicon\.ico|pluginslist\.xml|google3436eb8eea8b8d6e\.html)
    RewriteRule ^(.*)$ /index.php?/$1 [L]
    #DirectoryIndex index.php
    #RewriteCond %{THE_REQUEST} http://www.page.com/index\.php [NC]
    #RewriteRule ^(.*?)index\.php$ http://page.com/$1 [L,R=301,NC,NE]
    #DirectoryIndex index.php
    #RewriteEngine On

Thanks for reading.
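A likely reason the first rule never fires is that a RewriteRule pattern is tested against the url path only; the part after "?" is the query string and is visible only through RewriteCond %{QUERY_STRING}. The Python sketch below uses the standard `urllib.parse.urlsplit` to show how the example url divides, and the helper `clean_url` (a hypothetical name) shows the transformation the redirect is meant to perform. It illustrates the url structure and is not a replacement for the .htaccess rules.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://www.site.com/index.php?/genero/aventura/av/")
path = parts.path    # the part a RewriteRule pattern is matched against
query = parts.query  # only reachable via RewriteCond %{QUERY_STRING}

def clean_url(url):
    """Turn .../index.php?/a/b/ into the bare-host form .../a/b/."""
    pieces = urlsplit(url)
    if pieces.path == "/index.php" and pieces.query.startswith("/"):
        host = pieces.netloc
        if host.startswith("www."):
            host = host[len("www."):]  # drop the www prefix, as requested
        return f"{pieces.scheme}://{host}{pieces.query}"
    return url  # anything else is left untouched

redirect_target = clean_url("http://www.site.com/index.php?/genero/aventura/av/")
```

Here `path` is "/index.php" and `query` is "/genero/aventura/av/", so a pattern such as ^index.php\?/(.*)$ has no "?" to match against. In .htaccess terms this suggests capturing the value with a RewriteCond on the query string and referring to it as %1 in the substitution.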

AWStats: Visits from IP address vs Crawlers

- by user3651934

I use AWStats in cPanel to see the stats of my website. Under the Hosts section I see one IP address that has visited 150 pages. I am not sure that one person would have visited 150 pages using a browser. But if these 150 pages were visited by a software application, shouldn't it be listed under the Robots/Spiders section? So how do I determine whether I should block a certain IP address that has visited several hundred pages of my website? Thanks

Java Spotlight Episode 138: Paul Perrone on Life Saving Embedded Java

- by Roger Brinkley

Interview with Paul Perrone, founder and CEO of Perrone Robotics, on using Java Embedded to test autonomous vehicle operations for the Insurance Institute for Highway Safety that will save lives.

Show Notes

News
- JDK 8 is Feature Complete
- Java SE 7 Update 25 Released
- What should the JCP be doing?
- 2013 Duke's Choice Award Nominations
- Another Quick update to Code Signing Article on OTN

Events
- June 24, Austin JUG, Austin, TX
- June 25, Virtual Developer Day - Java, EMEA, 10AM CEST
- Jul 16-19, Uberconf, Denver, USA
- Jul 22-24, JavaOne Shanghai, China
- Jul 29-31, JVM Language Summit, Santa Clara
- Sep 11-12, JavaZone, Oslo, Norway
- Sep 19-20, Strange Loop, St. Louis
- Sep 22-26, JavaOne San Francisco 2013, USA

Feature Interview
Paul J. Perrone is founder/CEO of Perrone Robotics. Paul architected the Java-based general-purpose robotics and automation software platform known as “MAX”. Paul has overseen MAX’s application to rapidly field self-driving robotic cars, unmanned air vehicles, factory and road-side automation applications, and a wide range of advanced robots and automation applications. He fielded a self-driving autonomous robotic dune buggy in the historic 2005 Grand Challenge race across the Mojave desert and a self-driving autonomous car in the 2007 Urban Challenge through a city landscape. His work has been featured in numerous televised and print media including the Discovery Channel, a theatrical documentary, scientific journals, trade magazines, and international press. Since 2008, Paul has also been working as the chief software engineer, CTO, and roboticist automating rock star Neil Young’s LincVolt, a 1959 Lincoln Continental retro-fitted as a fully autonomous extended range electric vehicle.

Paul has been an engineer, author of books and articles on Java, frequent speaker on Java, and entrepreneur in the robotics and software space for over 20 years. He is a member of the Java Champions program, recipient of three Duke Awards including a Gold Duke and Lifetime Achievement Award, has showcased Java-based robots at five JavaOne keynotes, and is a frequent JavaOne speaker and show floor participant. He holds a B.S.E.E. from Rutgers University and an M.S.E.E. from the University of Virginia.

What’s Cool
- Shenandoah: A pauseless GC for OpenJDK

Managing 404 error pages with noindex and url rewrite

- by ZenMaster

Currently I use custom 404 error pages with the following meta tag on them: <meta content="noindex" name="robots"> My guess is that this way Google will remove deleted pages from the index faster. Has anyone experienced a case where it does? Also, is it better to have the url path rewritten to the actual error page, like the url pattern http://{mysite}/{404_error_page}, or is it best to keep the old deleted page's url when serving a 404 error?

The New Face of Autism Therapy

Popsci: "With one in 110 children diagnosed with autism, and therapists in short supply, researchers are developing humanoids to fill the gaps. But can robots help patients forge stronger bonds with people?"

Create Keyword Dense Content For Better SEO

This was touched on briefly in the introduction. It is important to do and to be aware of, but do not make it a massive part of your efforts to achieve better search engine ranking. It basically means optimizing your content to be more keyword dense, so that the search engine robots will pick up your site as being relevant for a certain search phrase or keyword.

Keyword Optimisation of Your Web Pages - Not at the Expense of the Visitor Experience

When writing copy for the pages of your website you need to get the balance right between a page targeting the search engine robots and hence a high ranking and one that is visitor friendly. Don't sacrifice a good visitor experience for a high ranking. To make sales you need both.

Google webmaster Index Status. Total Indexed=0

- by hammad

I previously changed my domain from www.visualstudiolearn.blogspot.com to www.visualstudiolearn.com. I had around 300 posts under the previous domain name, and most of them were showing up on Google. Now that I have changed my domain name, the index status shows total indexed as 0, and when I go to the advanced tab it says 304 (not selected) and 217 blocked by robots. I'm really depressed because of this situation. Could you please help out?

iPad2 - Yet Another Fundamental Defect in an Apple product

- by Kit Ong

First it was the antenna defect in the iPhone 4; now it has been reported that some iPad 2 units have display issues. Apple really needs to look at their manufacturing process. It doesn't help that workers are working like robots in the factory of their main supplier, Foxconn. More info on the reported display light bleeding: http://www.cultofmac.com/if-your-ipad-2-has-display-problems-do-not-return-it-heres-why/87197 How to check your iPad for dead pixels / light leak / bleed: http://www.theipadguide.com/content/ipad-dead-pixel-test-how/7171269

AJAX, DHTML, and SEO Search Engine Optimization

AJAX and DHTML can be used to give websites a rich user experience, but AJAX and DHTML do not work with SEO. Search crawler robots do not crawl and analyze the JavaScript.

Fetch as Google error 403

- by Bojan Vidanovic

For the past 2 weeks, Google has been unable to access my website. In Webmaster Tools I can't fetch any page (I always get error 403), and the website has completely disappeared from the Google search results. I can't figure out why it suddenly can't see the site anymore. I've checked .htaccess and there is nothing that blocks Google crawlers, and robots.txt is fine too. Meanwhile, the site is accessible normally for users. Has anyone had this problem? Please help!

Best method to do A B testing across to subdomains

- by Lior

I want to do an A/B test of an entire site for a new design and UX with only slight changes in content (a big brand site that has good Google rankings for many generic keywords). My idea for the implementation is to do a 302 redirect to the new version (placing it on a www1 subdomain) and to allow only user agents of known browsers to pass. The test version will have a disallow-all rule in its robots.txt. Will Google treat this favorably, or do I have to use Google Website Optimizer (which will give me tracking headaches)?

3D point cloud reconstruction in C++

- by techie_db

I've got a project which involves 3D reconstruction of point clouds from a 3D scanner. Being relatively new to the computer vision field, I'm in the dark. The objective of the project is to implement this 3D reconstruction in C/C++ without using Matlab, so that it can be further integrated with ROS (for robots). Can anyone guide me on this issue, so that I get enough of an idea of how to approach the problem?

Avg. Visit Duration 00:00:00 conclusion

- by user1592845

What can I conclude when I see in Google Analytics that, of 93 total visits from search on a given day, 70 have the value 00:00:00 for Avg. Visit Duration? Were those visits made by robots? How can they be regarded as visits if they don't spend any time on the website? Or is this a malfunction of the Google Analytics script, which leaves it unable to count the visit time?
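A zero duration does not by itself indicate a robot or a broken script. If the analytics tool measures a visit as the gap between its first and last recorded hit (the assumption behind this sketch, which matches how time-based metrics were commonly documented for Google Analytics), then any visit with a single pageview has nothing to subtract and is reported as 00:00:00. The Python below is an illustrative model with made-up function names and timestamps, not Google's actual code.

```python
from datetime import datetime

def visit_duration(hit_times):
    """Seconds between the first and last recorded hit of one visit."""
    if len(hit_times) < 2:
        return 0.0  # a single pageview leaves nothing to subtract from
    ordered = sorted(hit_times)
    return (ordered[-1] - ordered[0]).total_seconds()

def average_duration(visits):
    """Mean duration in seconds over a list of visits (lists of hit times)."""
    durations = [visit_duration(hits) for hits in visits]
    return sum(durations) / len(durations) if durations else 0.0

# Hypothetical data: one single-page visit and one two-page visit.
single_page = [datetime(2012, 11, 11, 10, 0, 0)]
two_pages = [datetime(2012, 11, 11, 10, 0, 0), datetime(2012, 11, 11, 10, 1, 30)]
average = average_duration([single_page, two_pages])
```

Under this model the single-page visit counts as 0 seconds even if the visitor read the page for several minutes, so a large share of 00:00:00 visits is what a high bounce rate from search would look like.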

Will multivariate (A/B) testing applied with 302 redirects to a subdomain affect my Google ranking?

- by Lior

I want to do an A/B test of an entire site for a new design and UX with only slight changes in content (a big brand site that has good Google rankings for many generic keywords). My idea for the implementation is to do a 302 redirect to the new version (placing it on a www1 subdomain) and to allow only user agents of known browsers to pass. The test version will have a disallow-all rule in its robots.txt. Will Google treat this favorably, or do I have to use Google Website Optimizer (which will give me tracking headaches)?
