Search Results

Search found 241 results on 10 pages for 'crawling'.

Page 5/10 | < Previous Page | 1 2 3 4 5 6 7 8 9 10  | Next Page >

  • How to Use SEO Services to Have a Successful Website

    Essentially, the optimization of the web pages in a site is required because search engines are software programs that follow a specific algorithm when crawling your website. Each website has numerous web pages, and it is practically impossible to crawl and index each and every one of them; no search engine can do that exhaustively.

    Read the article

  • How Many Web Pages Should Be Indexed?

    Search engines crawl websites around the clock for unique web pages and content. Google has always been at the top in indexing the deep links of any website: it indexed 26 million pages in 1998, and over the past 10 years it has indexed more than 1 trillion pages. That gives a fair idea of how big this cyber world is.

    Read the article

  • Force request to miss cache but still store the response

    - by Tom Marthenal
    I have a slow web app that I've placed Varnish in front of. All of the pages are static (they don't vary for different users), but they need to be updated every 5 minutes so they contain recent data. I have a simple script (wget --mirror) that crawls the entire website every 15 minutes. Each crawl takes about 5 minutes. The point of the crawl is to update every page in the Varnish cache so that a user never has to wait for the page to generate (since all pages have been generated recently thanks to the spider). The timeline looks like this:
        00:00:00 - Cache flushed
        00:00:00 - Spider starts crawling to update cache with new pages
        00:05:00 - Spider finishes crawling, all pages are updated until 1:15
    A request that comes in between 00:00:00 and 00:05:00 might hit a page that hasn't been updated yet and will be forced to wait a few seconds for a response. This isn't acceptable. What I'd like to do, perhaps using some VCL magic, is to always forward requests from the spider to the backend, but still store the response in the cache. This way, a user will never have to wait for a page to generate, since there is no 5-minute window in which parts of the cache are empty (except perhaps at server startup). How can I do this?
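
    One hedged sketch of the "VCL magic" in question, assuming the spider is wget (whose default User-Agent contains "Wget") and a Varnish version that supports req.hash_always_miss:

        sub vcl_recv {
            if (req.http.User-Agent ~ "Wget") {
                # force a cache miss for the spider; the fetched response
                # is still inserted into the cache for subsequent visitors
                set req.hash_always_miss = true;
            }
        }

    With something like this in place, every page the spider touches is refreshed in the cache while ordinary visitors keep getting cache hits.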

    Read the article

  • Google search engine

    - by kourosh
    I am working on a Google box, something like this: http://mytwentyfive.com/blog/wp-content/uploads/byme/Google%20Search%20Appliances.jpg. I am pointing the crawler to a folder containing HTML files. Before, the crawler was crawling the files and indexing them, but right now it finds the pattern (the folder) without following any of the HTML files within it. I have tried everything I know, but can't think of anything else. Can someone help? Thanks

    Read the article

  • Can I increase Windows 7 start menu vertical size to let more items fit in it?

    - by Ivan
    I hate putting shortcuts/files on the desktop, as well as crawling through the "All Programs" menu at all frequently (and I only pin some essential every-day applications to the task bar). So I put all the programs I occasionally use into the start menu itself (above the automatic recently-used-programs section). But even though I've switched it to use small icons, I run out of vertical space in it (only about 16 shortcuts fit there at maximum).

    Read the article

  • Innotop and Monit to kill thread using too much resources

    - by pocesar
    Instead of restarting the whole MySQL process, sometimes I just want to kill the offending thread rather than making everything go down. Usually the spike in CPU comes when a bot is crawling the first pages of pagination on my site (over 70,000 paginated results, 45 items per page). Is there a way I could do this automatically using monit and innotop? I couldn't find relevant information on Google, which is why I'm asking here. If these two tools aren't up to the task, which ones should I use?
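
    A sketch of one possible setup (the 30-second threshold, the script path, and credentials handling are assumptions; innotop itself is interactive and is not scripted here):

        #!/bin/sh
        # kill-long-selects.sh -- kill SELECT threads that have been running
        # longer than 30 seconds, which is what a crawler hammering deep
        # pagination tends to produce
        mysql -N -e "SELECT id FROM information_schema.processlist
                     WHERE command = 'Query' AND time > 30 AND info LIKE 'SELECT%'" |
        while read id; do
            mysql -e "KILL $id"
        done

    monit can then run it periodically with a "check program" entry, e.g.:

        check program long-selects with path "/usr/local/bin/kill-long-selects.sh"
            if status != 0 then alert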

    Read the article

  • I have a collection of dead consumer grade routers, should I buy a real one?

    - by Ex Networking Guy
    Am I crazy for considering purchasing a Cisco 2621 for the house? I am familiar enough with IOS to set up a simple gateway router; I don't really need the experience. At this point I'm a developer, so my days of crawling through COs and under desks are long past me. But I am really sick of crappy consumer-grade networking gear. Maybe I just have lousy luck and this stack of dead WRT54Gs is because I have lousy power, or whatever.

    Read the article

  • Thousands of 404 errors in Google Webmaster Tools

    - by atticae
    Because of a former error in our ASP.Net application, created by my predecessor and undiscovered for a long time, thousands of wrong URLs were created dynamically. Normal users did not notice it, but Google followed these links and crawled itself through these incorrect URLs, creating more and more wrong links. To make it clearer, consider this example: the URL example.com/folder should create the link example.com/folder/subfolder, but was creating example.com/subfolder instead. Because of bad URL rewriting, this was accepted and by default showed the index page for any unknown URL, creating more and more links like example.com/subfolder/subfolder/.... The problem is resolved by now, but I still have thousands of 404 errors listed in Google Webmaster Tools, discovered 1 or 2 years ago, and more keep coming up. Unfortunately the links do not follow a common pattern that I could disallow for crawling in robots.txt. Is there anything I can do to stop Google from trying out those very old links and to remove the already-listed 404s from Webmaster Tools?

    Read the article

  • Can preventing directory listings in WordPress upload folders cause Google ranking drops when they cause 403 errors in Webmaster Tools?

    - by Kelly
    I recently moved to a new host that blocks crawling of my uploads folders but (hopefully) still allows the files inside them to be crawled. I now see many 403 errors, one for each folder under the uploads folder, in my Webmaster Tools. For example, http://www.rewardcharts4kids.com/wp-content/uploads/2013/07/ shows a 403 error, while I can still access a file inside it such as http://www.rewardcharts4kids.com/wp-content/uploads/2013/07/lunch-box-notes.jpg; I just cannot access the folder it is in. My rankings went down after I moved to this host and I am wondering: could this be the reason, and is this how files/folders are supposed to be set up?
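
    For what it's worth, hosts typically produce exactly this behaviour by switching off directory listings, which is common and normally harmless; a sketch of the usual .htaccess directive (an assumption about what this particular host does):

        # folder URLs with no index file return 403; files remain accessible
        Options -Indexes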

    Read the article

  • Best way to prevent Google from indexing a directory [duplicate]

    - by Gkhan14
    This question already has an answer here: Stopping Google from indexing some web pages (5 answers). I've researched many methods of preventing Google and other search engines from crawling a specific directory. The two most popular ones I've seen are: adding it to the robots.txt file (Disallow: /directory/) or adding a meta tag to its pages (<meta name="robots" content="noindex, nofollow">). Which method would work best? I want this directory to remain "invisible" to search engines so it does not affect any of my site's ranking. In other words, I want this directory to be neutral/invisible and "just there," without affecting ranking at all.
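
    A third option worth knowing about, assuming an Apache host with mod_headers: drop an .htaccess file into the directory and send the noindex instruction as an HTTP header, which also covers non-HTML files:

        # .htaccess placed inside /directory/
        <IfModule mod_headers.c>
            Header set X-Robots-Tag "noindex, nofollow"
        </IfModule>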

    Read the article

  • Where can I find an exhaustive list of meta tags and what they do?

    - by leeand00
    It seems to me that there are a ton of <meta> tags out there for all sorts of different purposes. Though they all follow a similar format of <meta name="" content="" />, they seem to serve a vast variety of purposes, from controlling the crawling of search engine bots and providing those bots with descriptions of pages, to making sure a page displays correctly on a mobile device. These tags fall into so many different categories that I was wondering if anyone has a wiki or master list of possible meta tags and their content.
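
    For illustration, a few of the most common ones (far from an exhaustive list):

        <meta charset="utf-8">
        <meta name="description" content="Short summary shown in search result snippets">
        <meta name="robots" content="noindex, nofollow">
        <meta name="viewport" content="width=device-width, initial-scale=1">
        <meta http-equiv="refresh" content="30">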

    Read the article

  • Preventing indexing duplicate content by search engines

    - by umesh awasthi
    I am in the process of migrating my old domain (www.oldurl.com) to a new domain (www.newurl.com). Almost all of the content and URL structure, as well as the database, is the same except for a few URLs; the only real difference will be the domain name. I have made entries in Apache's .htaccess file to set up 301 redirects, and I have currently blocked all search engines from crawling my new domain via robots.txt. I am not sure how I will handle the duplicate content issue when I make the new domain go live. Should I block search engines from indexing/crawling my old domain? I am new to this field and not sure whether this is actually a duplicate content issue at all.
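
    For reference, the .htaccess side of such a move usually looks something like this sketch (assuming mod_rewrite and the placeholder domains above):

        RewriteEngine On
        RewriteCond %{HTTP_HOST} ^(www\.)?oldurl\.com$ [NC]
        RewriteRule ^(.*)$ http://www.newurl.com/$1 [R=301,L]

    With the 301s in place and the new domain allowed in robots.txt, search engines generally treat the new URLs as the canonical copies rather than as duplicates.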

    Read the article

  • 410 Responses when your CMS host doesn't support them?

    - by leeand00
    Sending a 410 response for a page that no longer exists should make Google stop crawling that page. The site I am working on was recently migrated, and very little of the content was migrated with it. I've already turned the existing content into 301 redirects (the content that is on both the old and the new site), but now I would like to flush the old content from Google's memory by placing 410 responses in its path when it returns to crawl those URLs and finds a 404 response. However, I asked our CMS host about it, and they said that our CMS does not support 410 responses. Is there some other way to produce a 410 response, like making a dead link 301-redirect to a page that returns a 410 in the form of a meta tag?
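
    If the host honours .htaccess, Apache can send the 410 itself with no CMS involvement; a sketch with hypothetical paths:

        # mod_alias form
        Redirect gone /old-page.html

        # mod_rewrite form -- the [G] flag returns 410 Gone
        RewriteEngine On
        RewriteRule ^old-section/ - [G,L]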

    Read the article

  • Does a "nofollow" attribute on a link prevent URL discovery by search engines?

    - by Stephen Ostermiller
    I know that nofollow prevents link juice from being passed across a link. But if search engine robots discover a link with nofollow on it, will they add that link to their crawl queue? In other words, if I create a link to a brand new page and put a rel=nofollow attribute on that link, will it prevent search engine bots (particularly Googlebot) from crawling the page? (Assume that this link remains the only link into that page.) I've read conflicting reports about this over the years and I'm looking for authoritative references about the current state of affairs. Official statements from Google or published results of independent testing would be ideal.

    Read the article

  • Good Literature for "Object oriented programming in C"

    - by Dipan Mehta
    This is not a debate question about whether or not C is a good candidate for object-oriented programming. Quite often C is the primary platform on which development happens. I have seen, and hopefully learnt from, crawling through many open source and commercial projects that while the language doesn't inherently stop you from writing "non-object" code, you can still think in an "object" way and reasonably write code that captures that design thinking. For those who have done this, the OO way is still the best way to write code, even when you are programming in C. While I have learnt most of it the hard way, is there any deep literature that can help educate relatively young developers on how to do OO programming in C?
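
    As a flavour of what such literature typically covers, a minimal sketch of the idiom in plain C ("virtual methods" via function pointers, "inheritance" via struct embedding; all names here are illustrative only):

        typedef struct Shape Shape;
        struct Shape {
            double (*area)(const Shape *self);   /* "virtual method" slot */
        };

        typedef struct {
            Shape base;        /* base struct embedded first => safe up-casts */
            double w, h;
        } Rect;

        static double rect_area(const Shape *self) {
            const Rect *r = (const Rect *)self;  /* valid: Shape is the first member */
            return r->w * r->h;
        }

        /* usage: Rect r = { { rect_area }, 3.0, 4.0 };  r.base.area(&r.base) */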

    Read the article

  • Google Webmaster verification failed

    - by KMC
    I have a site created with Ruby on Rails. I had verified it in Google Webmaster Tools some months ago, which was successful. One day Webmaster Tools started giving me re-verification failures. I tried again to verify my site using meta tags and HTML files, but I kept getting "Verification failed. The connection to your server timed out." Since then, Google has stopped crawling my site's content, though somehow it still crawls the PDF content on my site.

    Read the article

  • I need to go from Linux to VS2012 fast. Anybody have a guide?

    - by Mikhail
    I need to parallelize a library through the use of a graphics accelerator. I have had no trouble doing similar work on Linux, but I am struggling with Visual Studio 2012. I can't figure out how to do simple things like specifying linkage, libraries, and include files. I need to move quickly from understanding the Linux build system to the Windows build system. Does anybody have a guide or some advice on moving from Linux to Visual Studio development? I feel like I am crawling through a labyrinth of menus, with frequent dead ends saying that a feature has moved somewhere else. Also, this code must build with VS2012.

    Read the article

  • Google indexed pages a day ago and they appeared in search, but today everything vanished

    - by ganesh
    We had a robots.txt that disallowed all robots while we were in development. We are live now. We changed robots.txt according to our requirements a day ago and submitted the site for indexing using Google Webmaster Tools. After that we could see proper results in search, and Google Images search was also working as expected. Suddenly, today, all of this vanished from Google Search. Now I can see the old result again, i.e. the under-construction message. I checked robots.txt in Google Webmaster Tools and it's OK, with no crawling errors. Kindly let me know what exactly happened, and how I can report this issue to Google.
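
    For context, the switch described is usually just the difference between these two robots.txt files (the "live" version below is an assumption about what the new file should contain):

        # development: block everything
        User-agent: *
        Disallow: /

        # live: allow everything
        User-agent: *
        Disallow: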

    Read the article

  • Which token from a long User-Agent should I use in robots.txt?

    - by Gaia
    The definition of User-Agent states that several tokens can be included, as deemed necessary by the client. I want to block certain bots via robots.txt and I am confused as to which part of the User-Agent string to use, especially for the more obscure bots. For example:
        Mozilla/5.0 (compatible; uMBot-LN/1.0; mailto: [email protected])
        JS-Kit URL Resolver, http://js-kit.com/
        Mozilla/5.0 (compatible; SEOkicks-Robot +http://www.seokicks.de/robot.html
    Do I use the second token? Can tokens contain spaces, or did the SEOkicks folks forget a semicolon after SEOkicks-Robot? I don't actually intend to make my question specific to a couple of bots - I want to know the guideline: which part of the UA do I place in robots.txt for these exotic bots with a UA as long as a haiku?
        User-agent: uMBot-LN/1.0
        Disallow: /
    PS: Thank you, but I do not need to hear that undesirable bots are better blocked with mod_security. I already have commercial mod_sec rules in place.
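
    Most well-behaved robots match the User-agent line against their product token as a case-insensitive substring, so keying on the bare token without the version number is the safer guess (a sketch based on the strings above, not verified against these particular bots):

        User-agent: uMBot-LN
        Disallow: /

        User-agent: SEOkicks-Robot
        Disallow: /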

    Read the article

  • Doubt regarding a search engine/plugin (one present on the website itself)

    - by Ravi Gupta
    I am new to web development and am studying various types of websites as case studies. Right now my focus is on how search engines work for an eCommerce website. I know the basic functioning of a search engine, i.e. crawl web pages, index them, and then display results using those indexes. But I am a little confused in the case of an eCommerce website. Don't you think it would be better if, instead of crawling the web pages containing products, a search engine directly crawled the database and indexed the products stored in it? Then, when a user searches for any product, it would simply return the rows of the table matching the user's query. If this is not the case, can someone please explain how the usual method works on an eCommerce website?

    Read the article

  • Does Submit to Index on a page with new content update Content Keywords for the site?

    - by Dan Kanze
    Using Google Webmaster Tools, I'm trying to update the Content Keywords of my site, and I'm confused about the relationship between Submit to Index and Content Keywords.
    Does Fetch as Google -- Submit to Index on a previously indexed page containing new content expedite updating the Content Keywords crawled by the real Googlebot?
    Does Submit to Index only submit new URLs, so that previously indexed URLs still point to the older cached version until Google crawls them for new content on its own?
    Does Submit to Index have anything to do with Content Keywords or with crawling new content at all, whether the page was previously indexed or never indexed?

    Read the article

  • How to allow Google Images search to bypass hotlink protection?

    - by Marco Demaio
    I saw that Google Images seems to index my images only if hotlink protection is off. I use hotlink protection anyway, because I don't like the idea of people sucking up my bandwidth; I simply use this code to protect my sites from being hotlinked:
        RewriteEngine on
        RewriteCond %{HTTP_REFERER} !^$
        RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?mydomain\.com/.*$ [NC]
        RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?mydomain\.com$ [NC]
        RewriteRule .*\.(jpg|jpeg|png|gif)$ - [F,NC,L]
    But in order to allow Google Images search to bypass my hotlink protection (I want Google Images search to show my images), would it suffice to add lines like these:
        RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?google\.com/.*$ [NC]
        RewriteCond %{HTTP_REFERER} !^http(s)?://(www\.)?google\.com$ [NC]
    Because I'm wondering: does the crawler crawl just from google.com? And what about google.it, google.co.uk, etc.? FYI: I did not find info about this in Google's official guidelines. I suppose hotlink protection prevents Google Images from showing images in its results, because I did some tests and it seems hotlink protection does prevent my images from being shown in Google Images search.
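
    One way to cover every country-level Google domain at once is to exempt any referer whose host contains "google." instead of listing TLDs one by one; a deliberately permissive sketch of the idea (note that Googlebot-Image itself normally sends no Referer header, which the !^$ condition above already lets through):

        RewriteCond %{HTTP_REFERER} !google\. [NC]

    added alongside the existing conditions, so any request whose referer mentions a Google domain skips the [F] blocking rule.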

    Read the article

  • SSL Certificate

    - by outdoorcat
    I've received the email below from Google about my WordPress site and have no idea how to follow the instructions. Any help out there?
    Dear Webmaster, the host name of your site, https://www.example.com/, does not match any of the "Subject Names" in your SSL certificate, which were: *.wordpress.com and wordpress.com. This will cause many web browsers to block users from accessing your site, or to display a security warning message when your site is accessed. To correct this problem, please get a new SSL certificate from a Certificate Authority (CA) with a "Subject Name" or "Subject Alternative DNS Names" that matches your host name. Thanks, The Google Web-Crawling Team

    Read the article

< Previous Page | 1 2 3 4 5 6 7 8 9 10  | Next Page >