crawl errors - Developer IT

Redirect pages to fix crawl errors

- by sarah

Google is giving me a crawl error for pages that I have removed like www.mysite.com/mypage.html. I want to redirect this pages to the new page www.mysite.com/mysite/mypage. I tried to do that by using .htaccess but instead of fixing the problem, the crawl pages increased and a new crawl came www.mysite.com/www.mysite.com. This is my .htaccess file: <IfModule mod_rewrite.c> RewriteEngine On RewriteBase /sitename/ RewriteRule ^index\.php$ - [L] RewriteCond %{REQUEST_FILENAME} !-f RewriteCond %{REQUEST_FILENAME} !-d RewriteRule . /sitename/index.php [L] </IfModule> # END WordPress Should I add this after the rewrite rule or I should do something else? RewriteRule ^pagename\.html$ http://www.sitename.com/pagename [R=301]

Read the article

GWT: Generate more complete crawl error report

- by Mike

I'm a developer in charge of managing Webmasters and related issues (including correcting crawl errors) for dozens (hundreds, maybe?) of active sites and as part of my duties I create a report of every discrepancy, including all pages generating a 404 and all pages that link to those pages. Currently within Webmaster Tools I'm able to download a csv file of all pages with a 404 response, but I'm then having to manually click on every single one of those links and copy the "linked from" field to paste into my spreadsheet. This is extremely tedious and seems unnecessary; I would expect the ability to download all that data at once. I'm ultimately looking for the end result of one csv file that has every url with a 404, but also has every url that links to each one of them. Am I overlooking this functionality somewhere or does anyone have a good solution? Edit 1 (2/11/2013): Example of what the csv output looks like now: URL,Response Code,News Error,Detected,Category http://www.abcdef.com/123.php,404,,11/12/13,Not found http://www.abcdef.com/456.php,404,,11/12/13,Not found Which is great, but let's say 123.php has 5 pages that link to it. Now I have to duplicate that row in my spreadsheet 4 more times, then go into Webmasters, get all the url's that link to the page, and add that data to my spreadsheet. The output I would prefer: URL,Response Code,Linked From,News Error,Detected,Category http://www.abcdef.com/123.php,404,http://www.ghijkl.com/naughtypage1.php,,11/12/13,Not found http://www.abcdef.com/123.php,404,http://www.ghijkl.com/naughtypage2.php,,11/12/13,Not found http://www.abcdef.com/123.php,404,http://www.ghijkl.com/naughtypage3.php,,11/12/13,Not found http://www.abcdef.com/456.php,404,http://www.ghijkl.com/naughtypage1.php,,11/12/13,Not found http://www.abcdef.com/456.php,404,http://www.ghijkl.com/naughtypage2.php,,11/12/13,Not found http://www.abcdef.com/456.php,404,http://www.ghijkl.com/naughtypage3.php,,11/12/13,Not found Note the (hypothetical) addition of a "Linked From" column, as well as the fact there are only 2 unique URL's now (like before) but all of the "Linked To" pages are shown in one report. Edit 2 (2/12/2013): To clarify, my question is less about detecting and correcting 404's, but more about generating a report of what Google has listed as errors. Oftentimes, these errors aren't even valid anymore but I still need documentation to show that Google detected a problem and that problem is now fixed. Many of the "linked from" url's I find are actually outdated, cached resources. For example, I'll frequently see that the linked-from url is the sitemap, which is actually an old sitemap cached by Google that points to an old page. Neither the sitemap or old page exist, but they still appear in my crawl error reports because they are cached resources.

Read the article

Google Webmasters Tools strange 404 errors referred from same site

- by Out of Control

Starting about a month ago, I noticed a sudden increase in 404 errors in Webmasters Tools for one of my sites (over 1400 errors so far). All the errors are being referred from my own site to non existent pages. The 404 error URLs are all of the same format: URL: http://www.helloneighbour.com/save/1347208508000 The number on the end appears to be a timestamp followed by 3 zeros. The referring page, in this case is : Linked from http://www.helloneighbour.com/save/cmw-insurance-insurance-burnaby When I look at the source code of that page, or I use Webmaster tools to view the page as Google sees it, I can't find any link that comes close to what is above. I built the site, and I can't find any place that might be causing these false links either. The server logs (access and error) don't show Google or anyone else trying to access these links. I've marked all these pages as fixed, and waited a couple of weeks, only to find the errors come back again over the last few days. I'm wondering if anyone else has seen anything strange like this, or if someone might have a way for me to debug, replicate this error myself.

Read the article

Submitting a sitemap to take care of inherited Google crawler errors

- by leeand00

I have an awful lot of Google Crawler errors (1000 or so) after I inherited a site that the previous owner migrated without moving much of their content. Would generating a map of the current site and submitting it to Google help fix this? Is there any quicker, automated way to eliminate errors other than clicking each and every site error? Note: I have already tried automating this on my own.

Read the article

Errors in ~/.xsession-errors

- by Kuberan Naganathan

I'm getting errors in ~/.xession-errors. I'm running ubuntu 12.04 Many apps fail to run without mention of problems in the .xsession-errors file. I looked around and tried to resolve issues myself but failed so far. I have to say it's possible that the issue is related to me mounting /home on another partition. (I say possibly because stuff worked ok for a while.) Fortunately my .xsession-errors file is small enough to post here. Thanks in advance for the help: gnome-keyring-daemon: insufficient process capabilities, unsecure memory might get used gnome-keyring-daemon: insufficient process capabilities, unsecure memory might get used gnome-keyring-daemon: insufficient process capabilities, unsecure memory might get used gnome-keyring-daemon: insufficient process capabilities, unsecure memory might get used Backend : gconf Integration : true Profile : unity Adding plugins Initializing core options...done (gnome-settings-daemon:2547): color-plugin-WARNING **: failed to get edid: unable to get EDID for output (gnome-settings-daemon:2547): color-plugin-WARNING **: unable to get EDID for xrandr-default: unable to get EDID for output (gnome-settings-daemon:2547): color-plugin-WARNING **: failed to reset xrandr-default gamma tables: gamma size is zero Initializing composite options...done Initializing opengl options...done Initializing decor options...done ** Message: applet now removed from the notification area Initializing vpswitch options...done Initializing snap options...done Initializing mousepoll options...done Initializing resize options...done Initializing place options...done Initializing move options...done Initializing wall options...done Initializing grid options...done I/O warning : failed to load external entity "/home/kuberan/.compiz/session/10754cf696d335e98e13471376531156900000024960034" Initializing session options...done Initializing gnomecompat options...done Initializing animation options...done Initializing fade options...done Initializing unitymtgrabhandles options...done Initializing workarounds options...done Initializing scale options...done compiz (expo) - Warn: failed to bind image to texture Initializing expo options...done Initializing ezoom options...done ** Message: using fallback from indicator to GtkStatusIcon (compiz:2560): GConf-CRITICAL **: gconf_client_add_dir: assertion `gconf_valid_key (dirname, NULL)' failed Initializing unityshell options...done Setting Update "main_menu_key" Setting Update "run_key" Setting Update "icon_size" ** Message: moving back from GtkStatusIcon to indicator

Read the article

Why is nesting or piggybacking errors within errors bad in general?

- by dietbuddha

Why is nesting or piggybacking errors within errors bad in general? To me it seems bad intuitively, but I'm suspicious in that I cannot adequately articulate why it is bad. This may be because it is not in general bad and that it is only bad in specific instances. Why is it detrimental to design error/exception handling in such a way. The specific instance is that of a REST service. There is a desire by some to use http errors (specifically the 500 response) as a way to indicate any problem with specific instances of a resource. An example of an instance resource in this case would be: http://server/ticket/80 # instance http://server/ticket # not an instance So this is the behavior that is being proposed. If ticket 80 does not exist return a http response code of 500. Within the body of the error return the "real" error as an additional error code and description. If the ticket resource doesn't exist return a response code of 404.

Read the article

crawl websites out of java web application without using bin/nutch

- by Marcel

hi :) i am trying to using nutch (1.1) without bin/nutch from my (java) mojarra 2.0.2 webapp... i am searching at google for examples, but there are no examples how i can realize this :/ ... i get an exception and the job fails :/ (i think of cause something with hadoop)... here is my code: public void run() throws Exception { final String[] args = new String[] { String.format("%s%s%s%s", JSFUtils.getWebAppRoot(), "nutch", File.separator, DIRECTORY_URLS), "-dir", String.format("%s%s%s%s", JSFUtils.getWebAppRoot(), "nutch", File.separator, DIRECTORY_CRAWL), "-threads", this.preferences.get("threads"), "-depth", this.preferences.get("depth"), "-topN", this.preferences.get("topN"), "-solr", this.preferences.get("solr") }; Crawl.main(args); } and a part of the logging: 10/05/17 10:42:54 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId= 10/05/17 10:42:54 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same. 10/05/17 10:42:54 INFO mapred.FileInputFormat: Total input paths to process : 1 10/05/17 10:42:54 INFO mapred.JobClient: Running job: job_local_0001 10/05/17 10:42:54 INFO mapred.FileInputFormat: Total input paths to process : 1 10/05/17 10:42:55 INFO mapred.MapTask: numReduceTasks: 1 10/05/17 10:42:55 INFO mapred.MapTask: io.sort.mb = 100 java.io.IOException: Job failed! at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1232) at org.apache.nutch.crawl.Injector.inject(Injector.java:211) at org.apache.nutch.crawl.Crawl.main(Crawl.java:124) at lan.localhost.process.NutchCrawling.run(NutchCrawling.java:108) at lan.localhost.main.Index.indexing(Index.java:71) at lan.localhost.bean.FeedingBean.actionStart(FeedingBean.java:25) .... can someone help me or tell me how i can crawling from a java application? i have increased the Xms to 256m and Xmx to 768m, but nothing changed... best regards marcel

Read the article

"Error in the Site Data Web Service." when performing crawl

- by Janis Veinbergs

Installed SharePoint Services v3 (SP2, october 2009 cumulative updates, Language Pack), attached to a content database I had previously (all works). Installed Search server 2008 Express (with language pack) on top of WSS and crawl does not work. However it works for newly created web application + database. Was playing around with accounts, permissions to try get it working. Currently I have WSS_Crawler account with such permissions: Office Search Server runs with WSS_Crawler account Config database has read permissions for WSS_Crawler Content database has read permissions for WSS_Crawler WSS_Crawler is owner of search database. Added WSS_Crawler to SQL server browser user group and administrator Yes, i'v given more permissions than needed, but it doesn't even work with that and i don't know if its permission problem or what. Crawl log says there is Error in the Site Data Web Service., nothing more. There were known issues with a similar error: Error in the Site Data Web Service. (Value does not fall within the expected range.), but this is not the case as thats an old issue and i hope it has been included in SP2... Logs are from olders to newest (descending order). They don't appear to be very helpful. Crawl log http://serveris Crawled Local Office SharePoint Server sites 3/15/2010 9:39 AM sts3://serveris Crawled Local Office SharePoint Server sites 3/15/2010 9:39 AM sts3://serveris/contentdbid={55180cfa-9d2d-46e4... Crawled Local Office SharePoint Server sites 3/15/2010 9:39 AM http://serveris/test Error in the Site Data Web Service. Local Office SharePoint Server sites 3/15/2010 9:39 AM http://serveris Error in the Site Data Web Service. Local Office SharePoint Server sites 3/15/2010 9:39 AM EventLog No errors in EventLog, just some Information events that Office Server Search provides The search service started. Successfully stored the application configuration registry snapshot in the database. Context: Application 'SharedServices Component: da1288b2-4109-4219-8c0c-3a22802eb842 Catalog: Portal_Content. A master merge was started due to an external request. Component: da1288b2-4109-4219-8c0c-3a22802eb842 A master merge has completed for catalog Portal_Content. Component: da1288b2-4109-4219-8c0c-3a22802eb842 Catalog: AnchorProject. A master merge was started due to an external request. Component: da1288b2-4109-4219-8c0c-3a22802eb842 A master merge has completed for catalog AnchorProject. ULS Log Just some information, but no exceptions, unexpected errors 03/15/2010 09:03:28.28 mssearch.exe (0x1B2C) 0x0E8C Search Server Common GatherStatus 0 Monitorable Insert crawl 771 to inprogress queue hr 0x00000000 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:6591 03/15/2010 09:03:28.28 mssearch.exe (0x1B2C) 0x0E8C Search Server Common GatherStatus 0 Monitorable Request Start Crawl 1, project Portal_Content, crawl 771 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:2875 03/15/2010 09:03:28.28 mssearch.exe (0x1B2C) 0x0E8C Search Server Common GatherStatus 0 Monitorable Advise status change 1, project Portal_Content, crawl 771 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4853 03/15/2010 09:03:28.28 w3wp.exe (0x1D98) 0x0958 Search Server Common MS Search Administration 8wn6 Information A full crawl was started on 'Local Office SharePoint Server sites' by BALTICOVO\janis.veinbergs. 03/15/2010 09:03:28.43 mssdmn.exe (0x1750) 0x10F8 ULS Logging Unified Logging Service 8wsv High ULS Init Completed (mssdmn.exe, Microsoft.Office.Server.Native.dll) 03/15/2010 09:03:30.48 mssdmn.exe (0x1750) 0x09C0 Search Server Common MS Search Indexing 8z0v Medium Create CCache 03/15/2010 09:03:30.56 mssdmn.exe (0x1750) 0x09C0 Search Server Common MS Search Indexing 8z0z Medium Create CUserCatalogCache 03/15/2010 09:03:32.06 w3wp.exe (0x1D98) 0x0958 Search Server Common MS Search Administration 90ge Medium SQL: dbo.proc_MSS_PropagationGetQueryServers 03/15/2010 09:03:32.09 w3wp.exe (0x1D98) 0x0958 Search Server Common MS Search Administration 7phq High GetProtocolConfigHelper failed in GetNotesInterface(). 03/15/2010 09:03:34.26 mssearch.exe (0x1B2C) 0x16A4 Search Server Common GatherStatus 0 Monitorable Advise status change 12, project Portal_Content, crawl -1 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4853 03/15/2010 09:03:35.92 mssearch.exe (0x1B2C) 0x16A4 Search Server Common GatherStatus 0 Monitorable Advise status change 12, project Portal_Content, crawl -1 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4853 03/15/2010 09:03:37.32 mssearch.exe (0x1B2C) 0x16A4 Search Server Common GatherStatus 0 Monitorable Advise status change 12, project Portal_Content, crawl -1 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4853 03/15/2010 09:03:37.23 mssdmn.exe (0x1750) 0x1850 Search Server Common MS Search Indexing 8z14 Medium Test TRACE (NULL):(null), (NULL)(null), (CrLf): , end 03/15/2010 09:03:39.04 mssearch.exe (0x1B2C) 0x16A4 Search Server Common GatherStatus 0 Monitorable Advise status change 12, project Portal_Content, crawl -1 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4853 03/15/2010 09:03:40.98 mssdmn.exe (0x1750) 0x0B24 Search Server Common MS Search Indexing 7how Monitorable GetWebDefaultPage fail. error 2147755542, strWebUrl http://serveris 03/15/2010 09:03:41.87 mssdmn.exe (0x1750) 0x1260 Search Server Common PHSts 0 Monitorable CSTS3Accessor::GetSubWebListItemAccessURL GetAccessURL failed: Return error to caller, hr=80042616 - File:d:\office\source\search\search\gather\protocols\sts3\sts3acc.cxx Line:505 03/15/2010 09:03:41.87 mssdmn.exe (0x1750) 0x1260 Search Server Common PHSts 0 Monitorable CSTS3Accessor::Init: GetSubWebListItemAccessURL failed. Return error to caller, hr=80042616 - File:d:\office\source\search\search\gather\protocols\sts3\sts3acc.cxx Line:348 03/15/2010 09:03:41.87 mssdmn.exe (0x1750) 0x1260 Search Server Common PHSts 0 Monitorable CSTS3Accessor::Init fails, Url sts3://serveris/siteurl=test/siteid={390611b2-55f3-4a99-8600-778727177a28}/weburl=/webid={fb0e4bff-65d5-4ded-98d5-fd099456962b}, hr=80042616 - File:d:\office\source\search\search\gather\protocols\sts3\sts3handler.cxx Line:243 03/15/2010 09:03:41.87 mssdmn.exe (0x1750) 0x1260 Search Server Common PHSts 0 Monitorable CSTS3Handler::CreateAccessorExB: Return error to caller, hr=80042616 - File:d:\office\source\search\search\gather\protocols\sts3\sts3handler.cxx Line:261 03/15/2010 09:03:40.98 mssdmn.exe (0x1750) 0x1260 Search Server Common MS Search Indexing 7how Monitorable GetWebDefaultPage fail. error 2147755542, strWebUrl http://serveris/test 03/15/2010 09:03:41.90 mssdmn.exe (0x1750) 0x0B24 Search Server Common PHSts 0 Monitorable CSTS3Accessor::GetSubWebListItemAccessURL GetAccessURL failed: Return error to caller, hr=80042616 - File:d:\office\source\search\search\gather\protocols\sts3\sts3acc.cxx Line:505 03/15/2010 09:03:41.90 mssdmn.exe (0x1750) 0x0B24 Search Server Common PHSts 0 Monitorable CSTS3Accessor::Init: GetSubWebListItemAccessURL failed. Return error to caller, hr=80042616 - File:d:\office\source\search\search\gather\protocols\sts3\sts3acc.cxx Line:348 03/15/2010 09:03:41.90 mssdmn.exe (0x1750) 0x0B24 Search Server Common PHSts 0 Monitorable CSTS3Accessor::Init fails, Url sts3://serveris/siteurl=/siteid={505443fa-ef12-4f1e-a04b-d5450c939b78}/weburl=/webid={c5a4f8aa-9561-4527-9e1a-b3c23200f11c}, hr=80042616 - File:d:\office\source\search\search\gather\protocols\sts3\sts3handler.cxx Line:243 03/15/2010 09:03:41.90 mssdmn.exe (0x1750) 0x0B24 Search Server Common PHSts 0 Monitorable CSTS3Handler::CreateAccessorExB: Return error to caller, hr=80042616 - File:d:\office\source\search\search\gather\protocols\sts3\sts3handler.cxx Line:261 03/15/2010 09:03:43.26 mssearch.exe (0x1B2C) 0x0750 Search Server Common GatherStatus 0 Monitorable Advise status change 24, project Portal_Content, crawl 771 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4853 03/15/2010 09:03:43.26 mssearch.exe (0x1B2C) 0x1804 Search Server Common GatherStatus 0 Monitorable Remove crawl 771 from inprogress queue - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:6722 03/15/2010 09:03:43.26 mssearch.exe (0x1B2C) 0x0750 Search Server Common GatherStatus 0 Monitorable Advise status change 12, project Portal_Content, crawl -1 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4853 03/15/2010 09:03:44.65 mssearch.exe (0x1B2C) 0x1804 Search Server Common GatherStatus 0 Monitorable Insert crawl 772 to inprogress queue hr 0x00000000 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:6591 03/15/2010 09:03:44.65 mssearch.exe (0x1B2C) 0x1804 Search Server Common GatherStatus 0 Monitorable Request Start Crawl 0, project AnchorProject, crawl 772 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:2875 03/15/2010 09:03:44.65 mssearch.exe (0x1B2C) 0x1804 Search Server Common GatherStatus 0 Monitorable Advise status change 0, project AnchorProject, crawl 772 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4853 03/15/2010 09:03:44.65 mssearch.exe (0x1B2C) 0x1804 Search Server Common GatherStatus 0 Monitorable Unlock Queue, project Portal_Content - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:2922 03/15/2010 09:03:44.82 mssearch.exe (0x1B2C) 0x1DD0 Search Server Common GathererSql 0 Monitorable CGatherer::LoadTransactionsFromCrawlInternal Flush anchor, count 0 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4943 03/15/2010 09:03:44.95 mssearch.exe (0x1B2C) 0x0750 Search Server Common GatherStatus 0 Monitorable Advise status change 12, project AnchorProject, crawl -1 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4853 03/15/2010 09:03:46.51 mssearch.exe (0x1B2C) 0x0750 Search Server Common GatherStatus 0 Monitorable Advise status change 12, project AnchorProject, crawl -1 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4853 03/15/2010 09:03:46.39 mssearch.exe (0x1B2C) 0x1E4C Search Server Common GathererSql 0 Monitorable CGatherer::LoadTransactionsFromCrawlInternal Flush anchor, count 0 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4943 03/15/2010 09:03:49.01 mssearch.exe (0x1B2C) 0x1C6C Search Server Common GathererSql 0 Monitorable CGatherer::LoadTransactionsFromCrawlInternal Flush anchor, count 1 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4943 03/15/2010 09:03:49.87 mssearch.exe (0x1B2C) 0x155C Search Server Common GathererSql 0 Monitorable CGatherer::LoadTransactionsFromCrawlInternal Flush anchor, count 1 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4943 03/15/2010 09:03:49.29 mssearch.exe (0x1B2C) 0x155C Search Server Common GathererSql 0 Monitorable CGatherer::LoadTransactionsFromCrawlInternal Flush anchor, count 1 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4943 03/15/2010 09:03:49.53 mssearch.exe (0x1B2C) 0x155C Search Server Common GathererSql 0 Monitorable CGatherer::LoadTransactionsFromCrawlInternal Flush anchor, count 1 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4943 03/15/2010 09:03:49.67 mssearch.exe (0x1B2C) 0x155C Search Server Common GathererSql 0 Monitorable CGatherer::LoadTransactionsFromCrawlInternal Flush anchor, count 1 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4943 03/15/2010 09:03:49.82 mssearch.exe (0x1B2C) 0x155C Search Server Common GathererSql 0 Monitorable CGatherer::LoadTransactionsFromCrawlInternal Flush anchor, count 1 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4943 03/15/2010 09:03:49.84 mssearch.exe (0x1B2C) 0x155C Search Server Common GathererSql 0 Monitorable CGatherer::LoadTransactionsFromCrawlInternal Flush anchor, count 0 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4943 03/15/2010 09:03:49.89 mssearch.exe (0x1B2C) 0x155C Search Server Common GathererSql 0 Monitorable CGatherer::LoadTransactionsFromCrawlInternal Flush anchor, count 0 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4943 03/15/2010 09:03:49.90 mssearch.exe (0x1B2C) 0x0750 Search Server Common GatherStatus 0 Monitorable Advise status change 12, project AnchorProject, crawl -1 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4853 03/15/2010 09:03:51.42 mssearch.exe (0x1B2C) 0x1E4C Search Server Common GatherStatus 0 Monitorable Advise status change 4, project AnchorProject, crawl 772 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4853 03/15/2010 09:03:51.00 mssearch.exe (0x1B2C) 0x1E4C Search Server Common GathererSql 0 Monitorable CGatherer::LoadTransactionsFromCrawlInternal Flush anchor, count 0 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4943 03/15/2010 09:03:51.42 mssearch.exe (0x1B2C) 0x1CCC Search Server Common GatherStatus 0 Monitorable Remove crawl 772 from inprogress queue - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:6722 03/15/2010 09:03:52.96 mssearch.exe (0x1B2C) 0x1CCC Search Server Common GatherStatus 0 Monitorable Insert crawl 773 to inprogress queue hr 0x00000000 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:6591 03/15/2010 09:03:52.96 mssearch.exe (0x1B2C) 0x1CCC Search Server Common GatherStatus 0 Monitorable Request Start Crawl 0, project AnchorProject, crawl 773 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:2875 03/15/2010 09:03:55.29 mssearch.exe (0x1B2C) 0x1CCC Search Server Common GatherStatus 0 Monitorable Unlock Queue, project AnchorProject - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:2922 03/15/2010 09:03:55.29 mssearch.exe (0x1B2C) 0x1CCC Search Server Common GatherStatus 0 Monitorable Removed start crawl request from Queue 0, crawl 773 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:2942 03/15/2010 09:03:55.29 mssearch.exe (0x1B2C) 0x1CCC Search Server Common GatherStatus 0 Monitorable Request Start Crawl 0, project AnchorProject, crawl 773 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:2875 03/15/2010 09:03:55.29 mssearch.exe (0x1B2C) 0x1CCC Search Server Common GatherStatus 0 Monitorable Advise status change 0, project AnchorProject, crawl 773 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4853 03/15/2010 09:03:55.37 mssearch.exe (0x1B2C) 0x1CCC Search Server Common GathererSql 0 Monitorable CGatherer::LoadTransactionsFromCrawlInternal Flush anchor, count 0 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4943 03/15/2010 09:03:55.37 mssearch.exe (0x1B2C) 0x0750 Search Server Common GatherStatus 0 Monitorable Advise status change 12, project AnchorProject, crawl -1 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4853 03/15/2010 09:03:56.71 mssearch.exe (0x1B2C) 0x1E4C Search Server Common GathererSql 0 Monitorable CGatherer::LoadTransactionsFromCrawlInternal Flush anchor, count 0 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4943 03/15/2010 09:03:56.78 mssearch.exe (0x1B2C) 0x0750 Search Server Common GatherStatus 0 Monitorable Advise status change 12, project AnchorProject, crawl -1 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4853 03/15/2010 09:03:58.40 mssearch.exe (0x1B2C) 0x155C Search Server Common GathererSql 0 Monitorable CGatherer::LoadTransactionsFromCrawlInternal Flush anchor, count 0 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4943 03/15/2010 09:03:58.89 mssearch.exe (0x1B2C) 0x155C Search Server Common GatherStatus 0 Monitorable Advise status change 4, project AnchorProject, crawl 773 - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:4853 03/15/2010 09:03:58.89 mssearch.exe (0x1B2C) 0x1130 Search Server Common GatherStatus 0 Monitorable Remove crawl 773 from inprogress queue - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:6722 03/15/2010 09:03:58.89 mssearch.exe (0x1B2C) 0x1130 Search Server Common GatherStatus 0 Monitorable Unlock Queue, project AnchorProject - File:d:\office\source\search\search\gather\server\gatherobj.cxx Line:2922 What could be wrong here - any clues?

Read the article

Too many access denied errors showing in Google Webmaster Tools every day

- by user2255733

I get 18,000 access denied error showing in Google Webmaster Tools every day! So strange it shows for URL's with www and not no-www. Fetch as Google works perfectly for pages got that error. Google starts to downgrade my website - impressions have dropped from 35,000 to 18,000. I am using cloud flair CDN and .htaccess mod_rewrite. Any help will be extremely appreciated as I am really loosing control.

Read the article

Fake links cause crawl error in Google Webmaster Tools

- by Itai

Google reported Crawl Errors last week on my largest site though Webmaster Tools. Here is the message: Google detected a significant increase in the number of URLs that return a 404 (Page Not Found) error. Investigating these errors and fixing them where appropriate ensures that Google can successfully crawl your site's pages. The Crawl Errors list is now full of hundreds of fake links like these causing 16,519 errors so far: Note that my site does not even have a search.html and is not related to any of the terms shown in the above image. Inspecting sources for one of those links, I can see this is not simply an isolated source but a concerted effort: Each of the links has a few to a dozen sources all from different, seemingly unrelated sites. It is completely baffling as to why would someone to spending effort doing this. What are they hoping to achieve? Is this an attack? Most importantly: Does this have a negative effect on my side? Could it negatively impact my ranking? If so, what to do about it? The few linking pages I looked at are full of thousands of links to tons of sites and have no contact information and do not seem like the kind of people who would simply stop if asked nicely! According to Google Webmaster Tools, these errors have appeared in a span of 11 days. No crawl errors were being reported previously.

Read the article

Several "SideBySide" errors in the event viewer (Windows XP SP3)

- by Wesley

Hi all, For some reason, my netbook continuously comes up with SideBySide errors in the event viewer under Computer Management. Is there some way to get rid of these errors? My netbook occasionally BSODs and looking back into the event viewer after, at least 8 SideBySide errors were logged prior to BSOD. Please help! Thanks in advance. Netbook is a Samsung N120 with upgraded RAM to 2 GB. EDIT: So, SideBySide errors have been resolved and apparently the BSOD isn't related. But I remember specifically, that last time I had a BSOD, it was due to an "unknown hard error." What causes that?

Read the article

Make Visual Studio to show All Compile Errors!

- by BDotA

Visual studio does not show all the compile errors at once. for example one time it says I have two errors and when I fix them then 102 more compile errors are showing up and these new errors are not dependent on those two previous errors. How can we tell it to go through all the code and show all compile errors at once

Read the article

Understanding the maximum hit-rate supported by a web-server

- by SNag

I would like to crawl a publicly available site (and one that's legal to crawl) for a personal project. From a brief trial of the crawler, I gathered that my program hits the server with a new HTTPRequest 8 times in a second. At this rate, as per my estimate, to obtain the full set of data I need about 60 full days of crawling. While the site is legal to crawl, I understand it can still be unethical to crawl at a rate that causes inconvenience to the regular traffic on the site. What I'd like to understand here is -- how high is 8 hits per second to the server I'm crawling? Could I possibly do 4 times that (by running 4 instances of my crawler in parallel) to bring the total effort down to just 15 days instead of 60? How do you find the maximum hit-rate a web-server supports? What would be the theoretical (and ethical) upper-limit for the crawl-rate so as to not adversely affect the server's routine traffic?

Read the article

Page load speeds effect on crawl rate

- by Sam Pegler

We've noticed a big drop in the total pages crawled per day on our site, we have no control over the crawl rate in google webmaster tools so it's possible this has been changed by google. However it's a fairly large site and I wouldn't of thought that the crawl rate would've been decreased. What we have noticed though is a sizeable increase in page load times, in my mind this would be the cause. Can anyone else confirm if the crawl rate is directly correlated to page load time? Seems logical, longer page load time, less pages crawled. Any decent documentation on this would be appreciated, I don't normally have any input on SEO so this is new to me.

Read the article

How to crawl a webPage with dynamic content added by javascript

- by blunderboy

I guess there is a news that Google bots have the capability to understand our javascript code. It means this is possible to fully crawl a webpage which has lazy loading feature enabled. I am using Apache Nutch to crawl websites but I don't think it has the capability to fetch the URLs being injected in HTML page by javascript when the page is scrolled down. I see a lot of websites doing lazy loading for performance issue. So Can somebody please explain me how can i crawl the data which comes in HTML page on lazy load. (On scrolling the page down).

Read the article

404 crawl error for mailto:[email protected] in google webmaster

- by sam

im getting a 404 crawl error for mailto:[email protected], in google webmaster tools under the health crawl errors surly google should see that mailto: is related to an email not a webpage.. the html im using for the mailto on my page is <a href="mailto:mailto:[email protected]">[email protected]</a> Whats the best way to resolve this ? Is mailto still widely used or is there a newer alternative ?

Read the article

Factors of Crawl Allocation

Google, Yahoo!, and Bing all have heavy duty ways to crawl each and every website on the internet. Here are some factors that can influence your crawl allocation:

Read the article

IIS7 error config and remote errors

- by Kev

Certain IIS7/7.5 500.19 configuration errors only render on a browser running on the local server. This appears to happen regardless of whether I set <httpErrors errorMode="Detailed" existingResponse="PassThrough" /> in the system.webServer section of a site's web.config file (or even globally for that matter). For example, I had a developer who reported that he was just getting the generic IIS7 500 error page: This was happening even though he had the following configured in his web.config: <configuration> <system.webServer> <httpErrors errorMode="Detailed" existingResponse="PassThrough" /> </system.webServer> </configuration> If I browse to the site on the server itself I see (some sensitive info redacted): Would the reason for this be that if the web.config has errors it therefore can't be parsed. Because it can't be parsed the local <httpErrors> setting doesn't get read and thus causes IIS to revert to default settings (i.e. DetailedLocalOnly)? Update: @LazyOne - suggested setting the above config at the server level which I already tried. This resulted in just raw 500 errors:

Read the article

Microsoft Exchange Server not sending or receiving mails and there are no errors

- by Mike

Our Microsoft Exchange Server not sending or receiving mails and there are no errors showing.

Read the article

Google Webmasters tools crawl error

- by Shiro

I am looking in to Google Webmaster Tools - Crawl Error section. How should I handle for those URL due their system / application showed invalid URL. e.g http://www.example/images/products/s_=enlarge_16gb.jpg but, I dunno what happen to yahoo groups, it break the link into http://www.example/images/products/s_= enlarge_16gb.jpg and I only make the top part become hyperlink, which is http://www.example/images/products/s_= Because of the URL, Google show crawl error, I got few error because of this kind of result or because other people typo error. How do I prevent this. I am sure I don't have the right go and change other people post. What is the solution for this. Thanks!

Read the article

Google Webmasters tools crawl error caused by URL split into two lines

- by Shiro

I am looking in to Google Webmaster Tools - Crawl Error section. How should I handle for those URL due their system / application showed invalid URL. e.g http://www.example/images/products/s_=enlarge_16gb.jpg but, I dunno what happen to yahoo groups, it break the link into http://www.example/images/products/s_= enlarge_16gb.jpg and I only make the top part become hyperlink, which is http://www.example/images/products/s_= Because of the URL, Google show crawl error, I got few error because of this kind of result or because other people typo error. How do I prevent this. I am sure I don't have the right go and change other people post. What is the solution for this. Thanks!

Read the article

SQL Saturday 300 BBQ Crawl

- by Bill Graziano

SQL Saturday #300 is coming up right here in Kansas City on September 13th, 2014. This is our fifth SQL Saturday which means it's the fifth anniversary of our now infamous BBQ Crawl. We get together on Friday afternoon before the event and visit a few local joints. We've done nice places and we've done dives. We haven’t picked the venues yet but I promise you’ll be well fed! And if you’re thinking about the BBQ crawl you should think about submitting a session. Our call for speakers closes Tuesday, July 15th so you just have time! If you’re going to be at the event, contact me and I’ll get you added to the list.

Read the article

adding errors to Django form errors.all

- by hendrixski

How do I add errors to the top of a form after I cleaned the data? I have an object that needs to make a REST call to an external app (google maps) as a pre-save condition, and this can fail, which means I need my users to correct the data in the form. So I clean the data and then try to save and add to the form errors if the save doesn't work: if request.method == "POST": #clean form data try: profile.save() return HttpResponseRedirect(reverse("some_page", args=[some.args])) except ValueError: our_form.errors.__all__ = [u"error message goes here"] return render_to_response(template_name, {"ourform": our_form,}, context_instance=RequestContext(request)) This failed to return the error text in my unit-tests (which were looking for it in {{form.non_field_errors}}), and then when I run it through the debugger, the errors had not been added to the forms error dict when they reach the render_to_response line, nor anywhere else in the our_form tree. Why didn't this work? How am I supposed to add errors to the top of a form after it's been cleaned?

Read the article

Nutch - how to crawl by small patches?

- by Yurish

Hi everyone! I am stuck! Can`t get Nutch to crawl for me by small patches. I start it by bin/nutch crawl command with parameters -depth 7 and -topN 10000. And it never ends. Ends only when my HDD is empty. What i need to do: Start to crawl my seeds with possibility to go further on outlinks. Crawl 20000 pages, then index them. Crawl another 20000 pages, index them and merge with first index. Loop step 3 n times. Tried also with scripts found in wiki, but all scripts i found don't go further. If i run them again, they do everything from beginning. And in the end of script i have the same index i had, when started to crawl. But, i need to continue my crawl. Some help would be very usefull!

Read the article

Application Errors vs User Errors in PHP

- by CogitoErgoSum

So after much debate back and forth, I've come up with what I think may be a valid plan to handle Application/System errors vs User Errors (I.e. Validation Issues, Permission Issues etc). Application/System Errors will be handled using a custom error handler (via set_error_handler()). Depending on the severity of the error, the user may be redirected to a generic error page (I.e. Fatal Error), or the error may simply be silently logged (i.e E_WARNING). These errors are ones most likely caused by issues outside the users control (Missing file, Bad logic, etc). The second set of errors would be User Generated ones. These are the ones may not automatically trigger an error but would be considered one. In these cases i"ve decided to use the trigger_error() function and typically throw a waning or notice which would be logged silently by the error handler. After that it would be up to the developer to then redirect the user to another page or display some sort of more meaningful message to the user. This way an error of any type is always logged, but user errors still allow the developer freedom to handle it in their own ways. I.e. Redirect them back to their form with it fully repopulated and a message of what went wrong. Does anyone see anything wrong with this, or have a more intuitive way? My approach to error handling is typically everyone has their own ways but there must be a way instituted.

Search Results

Search found 12829 results on 514 pages for 'crawl errors'.

Page 1/514 | 1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >

- by sarah

- by Mike

- by Out of Control

- by leeand00

- by Kuberan Naganathan

- by dietbuddha

- by Marcel

- by Janis Veinbergs

- by user2255733

- by Itai

- by Wesley

- by BDotA

- by SNag

- by Sam Pegler

- by blunderboy

- by sam

- by Kev

- by Mike

- by Shiro

- by Shiro

- by Bill Graziano

- by hendrixski

- by Yurish

- by CogitoErgoSum

1 2 3 4 5 6 7 8 9 10 11 12 | Next Page >