Search Results

Search found 173 results on 7 pages for 'spider'.


Does schema.org improve SEO?

- by marko

http://schema.org This site provides a collection of schemas, i.e., HTML tags, that webmasters can use to mark up their pages in ways recognized by major search providers. Search engines including Bing, Google, Yahoo! and Yandex rely on this markup to improve the display of search results, making it easier for people to find the right web pages. It sounds wonderful, but does the search spider ignore the extra attributes and elements? Is it just too clever, so that it ignores them? Could such alterations even lower your visibility?

Read the article
Does SEO optimisation count on the responsive side of a site?

- by Rick Donohoe

I'm looking at making some SEO optimisation fixes, and at this point I'm sorting out the heading structure and keywords: H1s, H2s, etc. We have a site with a number of similar blocks, where one is always visible and one is hidden, depending on the screen size. This is our method of making a single site responsive. Firstly, how does this technique affect the SEO, and in general does the responsive side of a site matter at all to search engines? What I mean by this is: if the site has different content depending on screen size, which content would the search spider crawl?

Read the article
SEO tool is telling me title, description and keywords don't exist, but they do. Where is the problem?

- by DaveDev

I'm using the following tool to analyse how 'optimal' a site that I'm working on is for search engines: http://tools.seobook.com/general/spider-test/ I enter the URL for the site - http://ftmsuat.moneymate.com - into the search bar, and it returns a breakdown of the contents of the page. I'm a little confused by what I see though. According to the results, the page doesn't have a title, description or keywords. But if you check the source of the page, those elements are definitely there. So I'm wondering now, which is wrong? seobook.com or my page?

Read the article
Working on the search

The one thing I've always liked working on for this site is the full-text search engine and spider. Basically, it goes out and spiders all the major development blogs on the web, and then indexes them. The engine uses Lucene for its index. Lucene is another open source project and it works really fast. Currently the directory is indexed, and the RSS feeds are underway. We're talking about a lot of content, but once it's done you'll be able to pull podcasts and videos as they get posted to...

Read the article
Is the use of hashbang really a good idea? [on hold]

- by user32642

I've been working on a WordPress site lately that was designed with hashbangs (or shebangs) in the dynamically generated URLs. After doing some research, I noticed that Google showed some preference for their use in how it crawled a site. However, after I ran several sitemap generators and Screaming Frog SEO Spider, I realized that the only page being crawled was the index page. So now I am questioning the use of hashbangs. What do you think? Should I attempt to remove them? Or will it even matter? And does anyone know of an easy way to remove them? The site is www.modernvintage1005.com

Read the article
How does Google index our site when you're doing a Website Optimizer experiment?

- by user305175

I'm about to use Google's Website Optimizer to do A/B testing on the home page of my site. My question is: which of the alternative pages will Google's spider index? All of them? I couldn't find any info about this on Google or on the GWO pages.

Read the article
MSN like box for Ad rotation.

- by Muhammad Umar Siddique

Hi Everyone. I want to create a JavaScript based box much like the one found on MSN or AOL with the navigational buttons. Box content must be spider-able by search engines. On MSN you can find the box near top left corner. Note this box contains links and images. Any idea how to implement this ? Thanks.

Read the article
Executing Javascript without a browser?

- by Daniel

I am looking into JavaScript programming without a browser. I want to run scripts from the Linux or Mac OS X command line, much like we run any other scripting language (ruby, php, perl, python...):

    $ javascript my_javascript_code.js

I looked into SpiderMonkey (Mozilla) and V8 (Google), but both of these appear to be embedded. Is anyone using JavaScript as a scripting language to be executed from the command line? If anyone is curious why I am looking into this, I've been poking around node.js

Read the article
How can I find unused images and CSS styles in a website?

- by Jon Galloway

Is there a tool or methodology (other than trial and error) I can use to find unused image files? How about CSS declarations for ID's and Classes that don't even exist in the site? It seems like there might be a way to just spider the site, profile it, and see which images and styles are never loaded.
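
The "spider the site and see what is never loaded" idea amounts to a set difference between the image files that exist and the image paths the pages actually reference. Below is a minimal Python sketch of that comparison. It assumes the crawl has already collected the HTML and CSS text; the function names and regular expressions are illustrative, not taken from any particular tool, and images loaded only from JavaScript would not be detected.

```python
import re

# Illustrative patterns: <img src="..."> in HTML and url(...) in CSS.
IMG_SRC = re.compile(r'<img[^>]+src=["\']([^"\']+)["\']', re.IGNORECASE)
CSS_URL = re.compile(r'url\(\s*["\']?([^"\')]+)["\']?\s*\)', re.IGNORECASE)

def referenced_images(documents):
    """Collect every image path referenced by the crawled HTML/CSS text."""
    found = set()
    for text in documents:
        found.update(match.strip() for match in IMG_SRC.findall(text))
        found.update(match.strip() for match in CSS_URL.findall(text))
    return found

def unused_images(files_on_disk, documents):
    """Return image paths that exist on disk but are never referenced."""
    return sorted(set(files_on_disk) - referenced_images(documents))
```

Paths would need to be normalised to one form (for example, site-root-relative) before comparing, which this sketch leaves to the caller.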

Read the article
Data extract from website URL

- by user2522395

With the script below I am able to extract all the links of a particular website, but I need to know how to extract data from the pages behind those links, especially email addresses and phone numbers where they are present. Please help me modify the existing script to get that result, or provide a full sample script if you have one.

    'required at the top of the file
    Imports System.Collections
    Imports System.Net
    Imports System.Text.RegularExpressions

    Private Sub btnGo_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles btnGo.Click
        'url must be in this format: http://www.example.com/
        Dim aList As ArrayList = Spider("http://www.qatarliving.com", 1)
        For Each url As String In aList
            lstUrls.Items.Add(url)
        Next
    End Sub

    Private Function Spider(ByVal url As String, ByVal depth As Integer) As ArrayList
        'aReturn is used to hold the list of urls
        Dim aReturn As New ArrayList
        'aStart is used to hold the new urls to be checked
        Dim aStart As ArrayList = GrabUrls(url)
        'temp array to hold data being passed to new arrays
        Dim aTemp As ArrayList
        'aNew is used to hold new urls before being passed to aStart
        Dim aNew As New ArrayList
        'add the first batch of urls
        aReturn.AddRange(aStart)
        'if depth is 0 then only return 1 page
        If depth < 1 Then Return aReturn
        'loops through the levels of urls
        For i = 1 To depth
            'grabs the urls from each url in aStart
            For Each tUrl As String In aStart
                'grabs the urls and returns non-duplicates
                aTemp = GrabUrls(tUrl, aReturn, aNew)
                'add the urls to be checked to aNew
                aNew.AddRange(aTemp)
            Next
            'swap urls to aStart to be checked
            aStart = aNew
            'add the urls to the main list
            aReturn.AddRange(aNew)
            'clear the temp array
            aNew = New ArrayList
        Next
        Return aReturn
    End Function

    Private Overloads Function GrabUrls(ByVal url As String) As ArrayList
        'will hold the urls to be returned
        Dim aReturn As New ArrayList
        Try
            'regex string used: thanks google
            Dim strRegex As String = "<a.*?href=""(.*?)"".*?>(.*?)</a>"
            'i used a webclient to get the source
            'web requests might be faster
            Dim wc As New WebClient
            'put the source into a string
            Dim strSource As String = wc.DownloadString(url)
            Dim HrefRegex As New Regex(strRegex, RegexOptions.IgnoreCase Or RegexOptions.Compiled)
            'parse the urls from the source
            Dim HrefMatch As Match = HrefRegex.Match(strSource)
            'used later to get the base domain without subdirectories or pages
            Dim BaseUrl As New Uri(url)
            'while there are urls
            While HrefMatch.Success = True
                'loop through the matches
                Dim sUrl As String = HrefMatch.Groups(1).Value
                'if it's a page or sub directory with no base url (domain)
                If Not sUrl.Contains("http://") AndAlso Not sUrl.Contains("www") Then
                    'add the domain plus the page
                    Dim tURi As New Uri(BaseUrl, sUrl)
                    sUrl = tURi.ToString
                End If
                'if it's not already in the list then add it
                If Not aReturn.Contains(sUrl) Then aReturn.Add(sUrl)
                'go to the next url
                HrefMatch = HrefMatch.NextMatch
            End While
        Catch ex As Exception
            'catch ex here. I left it blank while debugging
        End Try
        Return aReturn
    End Function

    Private Overloads Function GrabUrls(ByVal url As String, ByRef aReturn As ArrayList, ByRef aNew As ArrayList) As ArrayList
        'overloads function to check duplicates in aNew and aReturn
        'temp url arraylist
        Dim tUrls As ArrayList = GrabUrls(url)
        'used to return the list
        Dim tReturn As New ArrayList
        'check each item to see if it exists, so not to grab the urls again
        For Each item As String In tUrls
            If Not aReturn.Contains(item) AndAlso Not aNew.Contains(item) Then
                tReturn.Add(item)
            End If
        Next
        Return tReturn
    End Function
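
One way to approach the email and phone extraction is to run two more regular expressions over each downloaded page source. The sketch below is in Python rather than VB.NET and is only an illustration: `extract_contacts`, `EMAIL_RE` and `PHONE_RE` are hypothetical names, the phone pattern is a deliberately loose assumption (phone formats vary by country), and stripping tags with a regex is crude. The same patterns should carry over to the .NET `Regex` class with little change, though that is untested here.

```python
import re

# Simple email pattern; good enough for harvesting, not full RFC validation.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Loose phone pattern (an assumption): optional "+", then at least eight
# digits that may be separated by spaces, dots, dashes or parentheses.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{6,}\d")

def extract_contacts(html):
    """Return (emails, phones) found in the visible text of one page."""
    # Crude tag stripping so that attribute values are not matched twice.
    text = re.sub(r"<[^>]+>", " ", html)
    emails = sorted(set(EMAIL_RE.findall(text)))
    # Normalise phone numbers by dropping everything except digits and "+".
    phones = sorted({re.sub(r"[^\d+]", "", p) for p in PHONE_RE.findall(text)})
    return emails, phones
```

In the script above, the equivalent call would be made on the page source string for every URL the spider returns, collecting the results per URL.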

Read the article
JSON.parse vs. eval()

- by Kevin Major

My Spider Sense warns me that using eval() to parse incoming JSON is a bad idea. I'm just wondering if JSON.parse() - which I assume is a part of JavaScript and not a browser-specific function - is more secure.

Read the article
Blocking 'good' bots in nginx with multiple conditions for certain off-limits URL's where humans can go

- by Glenn Plas

After 2 days of searching/trying/failing I decided to post this here. I haven't found any example of someone doing the same, and what I tried doesn't seem to be working OK. I'm trying to send a 403 to bots not respecting the robots.txt file (even after downloading it several times). Specifically Googlebot. It will support the following robots.txt definition.

    User-agent: *
    Disallow: /*/*/page/

The intent is to allow Google to browse whatever they can find on the site but return a 403 for the following type of request. Googlebot seems to keep on nesting these links eternally, adding paging block after block:

    my_domain.com:80 - 66.x.67.x - - [25/Apr/2012:11:13:54 +0200] "GET /2011/06/page/3/?/page/2//page/3//page/2//page/3//page/2//page/2//page/4//page/4//page/1/&wpmp_switcher=desktop HTTP/1.1" 403 135 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

It's a WordPress site btw. I don't want those pages to show up, even though after the robots.txt info got through, they stopped for a while only to begin crawling again later. It just never stops .... I do want real people to see this. As you can see, Google gets a 403, but when I try this myself in a browser I get a 404 back. I want browsers to pass.

    root@my_domain:# nginx -V
    nginx version: nginx/1.2.0

I tried different approaches, using a map and plain old nono if's, and they both act the same:

(under http section)

    map $http_user_agent $is_bot {
        default 0;
        ~crawl|Googlebot|Slurp|spider|bingbot|tracker|click|parser|spider 1;
    }

(under the server section)

    location ~ /(\d+)/(\d+)/page/ {
        if ($is_bot) {
            return 403; # Please respect the robots.txt file !
        }
    }

I recently had to polish up my Apache skills for a client where I did about the same thing, like this:

    # Block real Engines , not respecting robots.txt but allowing correct calls to pass
    # Google
    RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0\ \(compatible;\ Googlebot/2\.[01];\ \+http://www\.google\.com/bot\.html\)$ [NC,OR]
    # Bing
    RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0\ \(compatible;\ bingbot/2\.[01];\ \+http://www\.bing\.com/bingbot\.htm\)$ [NC,OR]
    # msnbot
    RewriteCond %{HTTP_USER_AGENT} ^msnbot-media/1\.[01]\ \(\+http://search\.msn\.com/msnbot\.htm\)$ [NC,OR]
    # Slurp
    RewriteCond %{HTTP_USER_AGENT} ^Mozilla/5\.0\ \(compatible;\ Yahoo!\ Slurp;\ http://help\.yahoo\.com/help/us/ysearch/slurp\)$ [NC]
    # block all page searches, the rest may pass
    RewriteCond %{REQUEST_URI} ^(/[0-9]{4}/[0-9]{2}/page/) [OR]
    # or with the wpmp_switcher=mobile parameter set
    RewriteCond %{QUERY_STRING} wpmp_switcher=mobile
    # ISSUE 403 / SERVE ERRORDOCUMENT
    RewriteRule .* - [F,L]
    # End if match

This does a bit more than I asked nginx to do, but it's about the same principle. I'm having a hard time figuring this out for nginx. So my question would be: why would nginx serve my browser a 404? Why isn't it passing? The regex isn't matching for my UA:

    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 (KHTML, like Gecko) Chrome/19.0.1084.30 Safari/536.5"

There are tons of examples to block based on UA alone, and that's easy. It also looks like the matching location is final, e.g. it's not 'falling' through for regular users. I'm pretty certain that this has some correlation with the 404 I get in the browser.

As a cherry on top of things, I also want Google to disregard the parameter wpmp_switcher=mobile. wpmp_switcher=desktop is fine, but I just don't want the same content being crawled multiple times. Even though I ended up adding wpmp_switcher=mobile via the Google Webmaster Tools pages (requiring me to sign up ....), that also stopped for a while but today they are back spidering the mobile sections.

So in short, I need to find a way for nginx to enforce the robots.txt definitions. Can someone shell out a few minutes of their lives and push me in the right direction please? I really appreciate ANY response that makes me think harder ;-)
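
The claim that the user-agent pattern does not match a normal browser can be checked outside nginx. The Python sketch below copies the two patterns from the config in the question and evaluates them against the two user agents quoted there. nginx uses PCRE and Python uses its own `re` engine, but for plain alternations and `\d+` like these the two behave the same. The helper name `would_get_403` is made up for illustration.

```python
import re

# Patterns copied from the map and location blocks in the question.
BOT_UA = re.compile(r"crawl|Googlebot|Slurp|spider|bingbot|tracker|click|parser|spider")
PAGED_PATH = re.compile(r"/(\d+)/(\d+)/page/")

def would_get_403(path, user_agent):
    """Mimic the intended rule: paged archive URL AND bot user agent."""
    return bool(PAGED_PATH.search(path)) and bool(BOT_UA.search(user_agent))

GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
CHROME = ("Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/536.5 "
          "(KHTML, like Gecko) Chrome/19.0.1084.30 Safari/536.5")

print(would_get_403("/2011/06/page/3/", GOOGLEBOT))  # bot on a paged URL
print(would_get_403("/2011/06/page/3/", CHROME))     # browser on the same URL
```

If the second call reports no match, as the question states, the browser's 404 is unlikely to come from the map itself. That would be consistent with the poster's suspicion that the regex location is handling the request on its own, though this sketch cannot confirm how nginx resolves the request after the `if` block.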

Read the article
AWS Load balancer connection reset

- by joshmmo

I have an ELB set up with two instances. The issue I have with it is that when I do not add www. to it, the ELB just hangs. This is some info I get when I spider with wget:

    Spider mode enabled. Check if remote file exists.
    --2013-06-20 13:40:54-- http://learning.example.com/
    Resolving learning.example.com... 54.xxx.x.x53, 50.xx.xxx.x71
    Connecting to learning.example.com|54.xxx.x.x53|:80... connected.
    HTTP request sent, awaiting response... No data received.
    Retrying.

When I add www. it works great. I have a GoDaddy SSL cert that I added to the listener section that covers 3 domains: www.learning.example.com, files.learning.example.com and learning.example.com. These are my listener settings:

    - HTTP 80 HTTPS 443 N/A N/A
    - SSL 443 SSL 443 Change canvasNew (Change)

My EC2 instances are running apache2 on Ubuntu 12.04. I will be happy to post my vhosts file if needed. However, when I ran the server with the domains pointing to just one EC2 instance things worked fine. How can I fix this issue for learning.example.com? Why does www work just fine? A second question would be: what is the difference between instance protocol and load balancer protocol?

EDIT: Here are the dig results for learning.example.com from yesterday. I changed the DNS entry to point to one instance to make sure it was the ELB. When I switch it back I will do it for www.learning.example.com

    ; <<>> DiG 9.9.1-P2 <<>> learning.example.com
    ;; global options: +cmd
    ;; Got answer:
    ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 20210
    ;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 0

    ;; QUESTION SECTION:
    ;learning.example.com. IN A

    ;; ANSWER SECTION:
    learning.example.com. 2559 IN CNAME canvas-22222222222.us-west-1.elb.amazonaws.com.
    canvas-22222222222.us-west-1.elb.amazonaws.com. 60 IN A 54.xxx.x.x53
    canvas-22222222222.us-west-1.elb.amazonaws.com. 60 IN A 50.xx.xxx.x71

    ;; Query time: 83 msec
    ;; SERVER: 10.x.xx.20#53(10.x.xx.20)
    ;; WHEN: Thu Jun 20 13:40:47 2013
    ;; MSG SIZE rcvd: 137

EDIT 2: Here is some more info that might be helpful.

    Port Configuration:
    80 (HTTP) forwarding to 443 (HTTPS)
        Backend Authentication: Disabled
        Stickiness: Disabled(edit)
    443 (SSL, Certificate: canvasNew) forwarding to 443 (SSL)
        Backend Authentication: Disabled

So I switched everything to one EC2 IP address to bypass the ELB to make sure things are working. It's running great. The www and the non-www URLs work perfectly fine. It's only when I switch things to the ELB that learning.example.com hangs and www.learning.example.com works. Hopefully you can get some ideas flowing.

Read the article
Video acceleration problem with Windows 7 games and PPTX files

- by Jordan 1GT

I have a Dell xps M1330 which originally ran Vista, but I upgraded to Windows 7. When I try to run a Win 7 game like spider solitaire I receive the following message: The game is running in software rendering mode. Hardware acceleration is either disabled or not supported by your video card driver which could slow down game performance. Make sure you have the latest video card driver installed and that hardware acceleration is turned on. I confirmed that hardware acceleration is turned on. When I go to Dell's site, I'm told there is no later video driver. When I run the game it runs very choppy. I have a .pptx file which is doing strange things in normal view and I suspect it may be related to the same video acceleration problem.

Read the article
Video problem with Windows 7 Games

- by Jordan 1GT

I have a Dell xps M1330 which originally ran Vista, but I upgraded to Windows 7. When I try to run a Win 7 game like spider solitaire I receive the following message: "The game is running in software rendering mode. Hardware acceleration is either disabled or not supported by your video card driver which could slow down game performance. Make sure you have the latest video card driver installed and that hardware acceleration is turned on." I confirmed that hardware acceleration is turned on. When I go to Dell's site, I'm told there is no later video driver. When I run the game it runs very choppy. I wouldn't care, but I loaded a .pptx file which is doing strange things in normal view and I suspect may be related to the same video problem. Any ideas?

Read the article
Mongo Client RedHat EL5 UTF8 Support

- by Michael Irey

    # mongo
    MongoDB shell version: 1.6.4
    Fri Mar 16 11:55:46 *** warning: spider monkey build without utf8 support. consider rebuilding with utf8 support
    connecting to: test

Mongo Server seems to handle the utf8 characters fine, as well as my php-mongo-client driver. But when I try to query a record that has a utf8 character from the mongo command line client I get:

    > db.Users.find({age:33});
    error:non ascii character detected
    Fri Mar 16 11:55:43 mongo got signal 11 (Segmentation fault), stack trace:
    Fri Mar 16 11:55:43 0x440b50 0x3664c302d0 0x3f47e7b6e0 0x3f47e83bbd 0x3f47e254f3 0x3f47e25660 0x3f47e256ee 0x3f47e25792 0x3f47e2876e 0x4b031d 0x443b72 0x445476 0x3664c1d994 0x43fd39
    mongo(_Z12quitAbruptlyi+0x3b0) [0x440b50]
    /lib64/libc.so.6 [0x3664c302d0]
    /usr/lib64/libjs.so.1 [0x3f47e7b6e0]
    /usr/lib64/libjs.so.1(js_CompileTokenStream+0x3d) [0x3f47e83bbd]
    /usr/lib64/libjs.so.1 [0x3f47e254f3]
    /usr/lib64/libjs.so.1(JS_CompileUCScriptForPrincipals+0x60) [0x3f47e25660]
    /usr/lib64/libjs.so.1(JS_EvaluateUCScriptForPrincipals+0x3e) [0x3f47e256ee]
    /usr/lib64/libjs.so.1(JS_EvaluateUCScript+0x22) [0x3f47e25792]
    /usr/lib64/libjs.so.1(JS_EvaluateScript+0x6e) [0x3f47e2876e]
    mongo(_ZN5mongo7SMScope4execERKSsS2_bbbi+0xed) [0x4b031d]
    mongo(_Z5_mainiPPc+0x14a2) [0x443b72]
    mongo(main+0x26) [0x445476]
    /lib64/libc.so.6(__libc_start_main+0xf4) [0x3664c1d994]
    mongo(__gxx_personality_v0+0x269) [0x43fd39]

Any ideas or suggestions would be welcome

Read the article
Download web server structure with empty files

- by golimar

I want to make a mirror of a Web server, but downloading the actual files will take too long. So I thought of having just the directory and file structure, and when I need the actual contents of the file, I can download just that file. I have tried wget --spider URL and in a short time it has created in my local disk the directory structure with no files. But I've checked all of wget's or curl's switches and there is nothing like what I need. Can this be done with wget, curl or any other tool?
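
If a list of the server's URLs can be obtained some other way (for example, collected from a crawl's log), the empty skeleton itself is easy to create locally. The Python sketch below shows only that second step and is an assumption-laden illustration: `create_skeleton` is a hypothetical helper, directory URLs are mapped to an `index.html` placeholder by choice, and it does not handle a URL that is both a file and a directory prefix, or paths containing `..`.

```python
import os
from urllib.parse import urlparse

def create_skeleton(urls, dest_root):
    """Create an empty local file for every URL, mirroring the URL paths."""
    created = []
    for url in urls:
        path = urlparse(url).path
        if not path or path.endswith("/"):
            path += "index.html"  # assumption: placeholder name for directories
        local = os.path.join(dest_root, *path.strip("/").split("/"))
        os.makedirs(os.path.dirname(local), exist_ok=True)
        open(local, "a").close()  # zero-byte file; real content fetched later
        created.append(local)
    return created
```

When the real contents of a file are needed, the original URL can be rebuilt from the relative path and downloaded into the same location.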

Read the article
SEO - Google and link cleaning / cloaking [closed]

- by Jens Törnell

Possible Duplicate: Does the Google spider render JavaScript?

This is an SEO-related question, not a code-related one.

Google's own link cleaning / cloaking

Go to http://www.google.com and search for something. Hover over the title and you will see a link to the page you want to go to. The URL you see when hovering is NOT the link you are clicking on. Instead of clicking, you can drag the title a little bit and then hover over it. Then you will see the real URL.

My own link cleaning / cloaking

Go to http://jsfiddle.net/NvmER/1/ and click the link, or look at the code below. You will be "redirected" to http://www.test.com. The real link is http://www.test.com/?event=23

Working code in case jsfiddle doesn't work

If you need to see how it works, I pasted the code below.

    <a class="direct" href="http://www.test.com/?event=23" data-redirect="http://www.test.com">Länk</a>

    $(document).ready(function() {
        $("a.direct").live("mousedown", function(e){
            var oldurl = $(this).attr('href');
            var newurl = $(this).attr('data-redirect');
            $(this).attr('href', newurl);
        });
    });

Question

Is this OK with Google? It's done with JavaScript. If you have an answer, link to a source or test to support it.

Read the article
Idea of an algorithm to detect a website's navigation structure?

- by Uwe Keim

Currently I am in the process of developing an importer of any existing, arbitrary (static) HTML website into the upcoming release of our CMS. While downloading the files is solved successfully, I'm pulling my hair out when it comes to detecting a site structure (pages and subpages) purely from the HTML files, without the user specifying additional hints. Basically I want to get a tree like:

    + Root page 1
        + Child page 1
        + Child page 2
            + Child child page1
        + Child page 3
    + Root page 2
        + Child page 4
    + Root page 3
    + ...

I.e. I want to be able to detect the menu structure from the links inside the pages. This does not have to be 100% accurate, but at least I want to achieve more than just a flat list. I thought of looking at multiple pages to find similar areas, identifying these as menu areas and parsing the links there, but I'm not that satisfied with this idea.

My question: Can you imagine any algorithm for detecting such a structure?

Update 1: What I'm looking for is not a web spider, but an algorithm to create a logical tree of the relationships between the pages, so that I can create pages and subpages inside my CMS when importing them.

Update 2: Following Robert's suggestion, I'll solve this by starting at the root page, and then simply parsing links as I go and treating every link inside a page as a child page. Probably I'll recurse not in a depth-first manner but rather in a breadth-first manner to get a more balanced navigation structure.
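
The breadth-first approach from Update 2 can be sketched in a few lines of Python. This is an illustration under assumptions, not the poster's implementation: the link graph is taken as an already-parsed dictionary (`links`, mapping each page to the pages it links to, in document order), and `build_nav_tree` is a hypothetical name. Each page is attached to the first page that reaches it, which in a breadth-first walk is its shallowest parent.

```python
from collections import deque

def build_nav_tree(root, links):
    """Return {page: [child pages]} by walking the link graph breadth-first.

    links: dict mapping each page to the list of pages it links to.
    A page becomes the child of whichever page reaches it first.
    """
    children = {root: []}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in children:  # already placed pages are skipped
                children[target] = []
                children[page].append(target)
                queue.append(target)
    return children
```

Because every page links back to the menu on most sites, skipping already-placed pages is what keeps the result a tree instead of a cyclic graph. The trade-off is that pages linked from the root's menu all end up as direct children of the root, even if they are logically subpages.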

Read the article
Handling SEO for Infinite pages that cause external slow API calls

- by Noam

I have an 'infinite' number of pages on my site which rely on an external API. Generating each page takes time (1 minute). Links in the site point to such pages, and when a user clicks them they are generated while the user waits. Considering I cannot pre-create them all, I am trying to figure out the best SEO approach to handle these pages. Options:

- Create really simple pages for the web spiders, so that only real users fetch the data and generate the page. I'm a little bit 'afraid' Google will see this as low-quality content, which might also look duplicated.
- Put them under a directory in my site (e.g. /non-generated/) and put a disallow in robots.txt. The problem here is that I don't want users to have to deal with a different URL when they want to share this page or make sense of it. I thought about maybe redirecting real users from this URL back to the regular hierarchy and in that way 'fooling' Google into not reaching them. Again, I'm not sure Google will like me for that.
- Let Google crawl these pages. The main problem is that I can't control the rate of the API calls, and my site also seems slower than it should from a spider's perspective (if it only crawled the already generated pages, it would think the site is much faster).

Which approach would you suggest?

Read the article
CodePlex Daily Summary for Tuesday, November 30, 2010

Popular Releases

- Sense/Net Enterprise Portal & ECMS: SenseNet 6.0.1 Community Edition: Sense/Net 6.0.1 Community Edition. This half year we have been working quite fiercely to bring you the long-awaited release of Sense/Net 6.0. Download this Community Edition to see what we have been up to. These months we have worked on getting the WebCMS capabilities of Sense/Net 6.0 up to par. New features include: New, powerful page and portlet editing experience. HTML and CSS cleanup, new, powerful site skinning system. Upgraded, lightning-fast indexing and query via Lucene. Limita...
- Minecraft GPS: Minecraft GPS 1.1.1: New Features: Compass! New style. Set opacity on main window to allow overlay of Minecraft. Open World in any folder. Fixes: Fixed style so listbox won't grow the window size. Fixed open file dialog issue on non-vista kernel machines.
- DotSpatial: DotSpatial 11-28-2001: This release introduces some exciting improvements. Support for big raster, both in display and changing the scheme. Faster raster scheme creation for all rasters. Caching of the "sample" values so once obtained the raster symbolizer dialog loads faster. Reprojection supported for raster and image classes. Affine transform fully supported for images and rasters, so skewed images are now possible. Projection uses better checks when loading unprojected layers. GDAL raster support f...
- Virtu: Virtu 0.9.0: Source Requirements: .NET Framework 4, Visual Studio 2010 or Visual Studio 2010 Express, Silverlight 4 Tools for Visual Studio 2010, Windows Phone 7 Developer Tools (which includes XNA Game Studio 4). Binaries Requirements: Silverlight 4, .NET Framework 4, XNA Framework 4.
- SuperWebSocket: SuperWebSocket(60438): It is the first release of SuperWebSocket. Because it is based on SuperSocket, most features of SuperSocket are supported in SuperWebSocket. The source code includes a LiveChat demo.
- MDownloader: MDownloader-0.15.25.7002: Fixed updater. Fixed FileServe. Fixed LetItBit.
- Notepad.NET: Notepad.NET 0.7 Preview 1: What's New? * Optimized Code Generation: Which means it will run significantly faster. * Preview of Syntax Highlighting: Only VB.NET highlighting is supported, C# and Ruby will come in Preview 2. * Improved Editing Updates (when the line number, etc updates) to be more graceful. * Recent Documents works! * Images can be inserted but they're extremely large. Known Bugs: * The Update Process hangs: This is a bug apparently spawning since 0.5. It will be fixed in Preview 2. Until then, perform a fr...
- Cropper: 1.9.4: Mostly fixes for issues with a few feature requests. Fixed Issues: 2730 & 3638 & 14467, 11044, 11447, 11448, 11449, 14665. Implemented Features: 6123, 11581.
- PFC: PFC for PB 11.5: This is just a migration from the 11.0 code. No changes have been made yet (and they are needed) for it to work properly with 11.5.
- PDF Rider: PDF Rider 0.5: This release does not add any new feature for pdf manipulation, but enables automatic updates checking, so it is recommended to install it in order to stay updated with next releases. Prerequisites: * Microsoft Windows Operating Systems (XP - Vista - 7) * Microsoft .NET Framework 3.5 runtime * A PDF rendering software (i.e. Adobe Reader) that can be opened inside Internet Explorer. Installation instructions: Choose one of the following methods: 1. Download and run the "pdfRider0...
- BCLExtensions: BCL Extensions v1.0: The files associated with v1.0 of the BCL Extensions library.
- XamlQuery/WPF - The Write Less, Do More, WPF Library: XamlQuery-WPF v1.2 (Runtime, Source): This is the first release of popular XamlQuery library for WPF. XamlQuery has already gained recognition among Silverlight developers.
- Math.NET Numerics: Beta 1: First beta of Math.NET Numerics. Only contains the managed linear algebra provider. Beta 2 will include the native linear algebra providers along with better documentation and examples.
- Microsoft All-In-One Code Framework: Visual Studio 2010 Code Samples 2010-11-25: Code samples for Visual Studio 2010
- Wii Backup Fusion: Wii Backup Fusion 0.8.5 Beta: - WBFS repair (default) options fixed - Transfer to image fixed - Settings ui widget names fixed - Some little bug fixes. You need to reset the settings! Delete WiiBaFu's config file or registry entries on windows: Linux: ~/.config/WiiBaFu/wiibafu.conf Windows: HKEY_CURRENT_USER\Software\WiiBaFu\wiibafu Mac OS X: ~/Library/Preferences/com.wiibafu.wiibafu.plist Caution: This is a BETA version! Errors, crashes and data loss not impossible! Use in test environments only, not on productive syste...
- Minemapper: Minemapper v0.1.3: Added process count and world size calculation progress to the status bar. Added View->'Status Bar' menu item to show/hide the status bar. Status bar is automatically shown when loading a world. Added a prompt, when loading a world, to use or clear cached images.
- Sexy Select: sexy select v0.4: Changes in v0.4: Added method : elements. This returns all the option elements that are currently added to the select list. Added method : selectOption. This method accepts two values, the element to be modified and the selected state. (true/false)
- Deep Zoom for WPF: First Release: This first release of the Deep Zoom control has the same source code, binaries and demos as the CodeProject article (http://www.codeproject.com/KB/WPF/DeepZoom.aspx).
- BlogEngine.NET: BlogEngine.NET 2.0 RC: This is a Release Candidate version for BlogEngine.NET 2.0. The most current, stable version of BlogEngine.NET is version 1.6. Find out more about the BlogEngine.NET 2.0 RC here. If you want to extend or modify BlogEngine.NET, you should download the source code. To get started, be sure to check out our installation documentation and the installation screencast. If you are upgrading from a previous version, please take a look at the Upgrading to BlogEngine.NET 2.0 instructions. As this ...
- NodeXL: Network Overview, Discovery and Exploration for Excel: NodeXL Excel Template, version 1.0.1.156: The NodeXL Excel template displays a network graph using edge and vertex lists stored in an Excel 2007 or Excel 2010 workbook. What's New: This release adds a feature for aggregating the overall metrics in a folder full of NodeXL workbooks, adds geographical coordinates to the Twitter import features, and fixes a memory-related bug. See the Complete NodeXL Release History for details. Please Note: There is a new option in the setup program to install for "Just Me" or "Everyone." Most people...

New Projects

- ActiveRecordTest: ActiveRecordTest is a sample project that is really a quick guide for start using Castle ActiveRecord within an ASP.NET web application.
- BacteriaManage: just test codeplex
- DS CMS: Diamond Shop - open source project. 1. ASP.NET MVC 3.0 2. Entity Framework 3. Jquery 4. Linq
- General Media Access WebService: This project is focused on building a general purpose media access webservice based on WCF.
- JavaEE server for XUNU: This is the web server for Choupie's site
- Learning management system: Learning management system to help teachers on their work.
- LogWriterReader using Named pipe: LogWriterReader using Named pipe
- NMix: NMix???EntLib，NHibernate，log4net??????????，????????????????，?????????、?????、????、????、?????????。
- Nosso Rico Dinheirinho: Financial control system like Microsoft Money, but via web.
- Post Template: Post Template (for now) is for craigslist posters looking to make their posts more visually appealing. Abstracting the styling and layout details of HTML and CSS, Post Template eliminates the need to know these languages when posting. Post Template is mostly written in C#.
- SharePoint Silverlight Clock: SharePoint Silverlight Clock
- Silverlight MVVM wizard using Caliburn Micro: This MVVM style Silverlight 4 wizard shows some Caliburn Micro features, as well as the use of MEF and MVVM style unit testing. The UI and code are based on the code accompanying the "Code Project" article "Creating an Internationalized Wizard in WPF" from dec. 2008.
- Spider Framework: A ruler-based spider framework developing with C#
- syx Open Source Project: syx Open Source Project
- TigerCat: TigerCat will support application development as infrastructure and RAD tools.
- TitleNetSolution: This my team Solution.!
- Uploadert: Uploadert
- Widget Suite for DotNetNuke: This project is intended to hold a suite of useful widgets to make your skinning easier, and raise the level of interactivity with DotNetNuke website visitors.
- ZenBridge for Picasa: ZenBridge for Picasa makes it easy for Zenfolio users to upload edited images directly to a chosen Zenfolio gallery. It's developed in C#.NET 4.

Read the article
Recovering a lost website with no backup?

- by Jeff Atwood

Unfortunately, our hosting provider experienced 100% data loss, so I've lost all content for two hosted blog websites:

- http://blog.stackoverflow.com
- http://www.codinghorror.com

(Yes, yes, I absolutely should have done complete offsite backups. Unfortunately, all my backups were on the server itself. So save the lecture; you're 100% absolutely right, but that doesn't help me at the moment. Let's stay focused on the question here!)

I am beginning the slow, painful process of recovering the website from web crawler caches. There are a few automated tools for recovering a website from internet web spider (Yahoo, Bing, Google, etc.) caches, like Warrick, but I had some bad results using this:

- My IP address was quickly banned from Google for using it
- I get lots of 500 and 503 errors and "waiting 5 minutes…"
- Ultimately, I can recover the text content faster by hand

I've had much better luck by using a list of all blog posts, clicking through to the Google cache and saving each individual file as HTML. While there are a lot of blog posts, there aren't that many, and I figure I deserve some self-flagellation for not having a better backup strategy. Anyway, the important thing is that I've had good luck getting the blog post text this way, and I am definitely able to get the text of the web pages out of the Internet caches. Based on what I've done so far, I am confident I can recover all the lost blog post text and comments.

However, the images that go with each blog post are proving…more difficult. Any general tips for recovering website pages from Internet caches, and in particular, places to recover archived images from website pages?

(And, again, please, no backup lectures. You're totally, completely, utterly right! But being right isn't solving my immediate problem… Unless you have a time machine…)
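One part of the image hunt that might be automated is working out which image URLs the recovered posts actually reference. The following is only a rough sketch, assuming the cached pages have already been saved as HTML text; the class name `ImageSrcCollector` and the function `image_urls` are hypothetical names made up for illustration, and it does not say anything about where the images themselves can be found.

```python
from html.parser import HTMLParser


class ImageSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag it encounters."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def image_urls(html_text):
    """Return the image URLs referenced by one saved HTML page, in order."""
    collector = ImageSrcCollector()
    collector.feed(html_text)
    collector.close()
    return collector.sources


# Example with a made-up fragment of a saved page:
saved_page = '<p><img src="/images/example.png"><img alt="no source"/></p>'
print(image_urls(saved_page))
```

Running this over every saved post would give a checklist of missing images to look for in whatever caches or archives are still reachable.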

Read the article
How should I deal with user agent parsing in logs?

- by Mr. Jefferson

My web app project includes logging functionality so we can see where visitors are coming from (referrer URL), what the popular user agents are, what pages are most popular, etc. The log is stored in SQL Server, and when I query the user agents I use a large (almost 100 lines) and growing CASE statement to separate the user agents using string matching (i.e. if the user agent contains the string "Firefox/9" then it's Firefox 9). Is there a better way to do this so I don't have to continually add to that CASE statement to deal with new browser releases?

Also, how should I deal with less common, weird/unknown user agents? I've seen the following in the logs and been unable to find good information online about what they are:

WordPress/3.3.1; http://www.facecolony.org
Mozilla/4.0 ( http://www.hairirons.org redips; <a href=http://hairirons.org/>chi hair iron</a>)

I'd guess they're bots/crawlers, but the sites they point to don't appear to reference web crawlers (or even be available sometimes). I've seen other user agents that aren't familiar to me, but I know they're bots because they include "bot" or "spider" or something similar in them.
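One possible alternative to a growing CASE statement is a small rule table that captures the version number with a pattern, so a new browser release does not need a new rule. The sketch below is only an illustration under assumptions: it runs outside SQL Server (for example in a reporting script), and the `RULES` table, its patterns, and the function `classify_user_agent` are hypothetical, not taken from the question or from any library.

```python
import re

# Hypothetical rule table: (pattern, label template). Order matters: the bot
# check runs first so crawlers that imitate a browser are not counted as one.
RULES = [
    (re.compile(r"bot|spider|crawl", re.IGNORECASE), "Bot/crawler"),
    (re.compile(r"Firefox/(\d+)"), "Firefox {0}"),
    (re.compile(r"Chrome/(\d+)"), "Chrome {0}"),
    (re.compile(r"MSIE (\d+)"), "Internet Explorer {0}"),
]


def classify_user_agent(user_agent):
    """Return a coarse label for a raw user agent string."""
    for pattern, template in RULES:
        match = pattern.search(user_agent)
        if match:
            # Captured groups (the major version) fill the template, if any.
            return template.format(*match.groups())
    # Anything unmatched is kept visible so it can be reviewed by hand.
    return "Unknown"


print(classify_user_agent("Mozilla/5.0 (Windows NT 6.1; rv:9.0) Gecko/20100101 Firefox/9.0"))
print(classify_user_agent("WordPress/3.3.1; http://www.facecolony.org"))
```

Strings that fall through to "Unknown", such as the WordPress one above, can then be counted separately and inspected, rather than being forced into a browser bucket.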

Read the article
Googlebot visit but no cache update - why?

- by Mick

I have made a new plain vanilla HTML website. I have been making regular modifications to it on an almost daily basis. The site is hosted by hostmonster and as part of their service they offer "awstats" to let you know assorted details of visitors to the site. One thing is puzzling me. According to awstats, a "robot/spider" calling itself "Googlebot" visited my site as recently as today (28th June 2011), but when I find my site on Google (e.g. by searching for "full reserve banking") the cache is dated only the 5th June. I always thought that a visit from the Google robot was synonymous with a cache update. Am I wrong? Or have I accidentally put something in the site telling Google that nothing has been updated? EDIT: It seems a moderator has removed the name of my website, so there is now no chance that anyone could check out if I had made some error on my site :-( ... but anyway, in answer to paulmorriss' question, here is what awstats was telling me:

Read the article