Search Results

Search found 173 results on 7 pages for 'spider'.

Page 6/7 | < Previous Page | 2 3 4 5 6 7 | Next Page >

Run a PHP-script from a PHP-script without blocking

- by Nicklas Ansman

I'm building a spider which will traverse various sites and data mining them. Since I need to get each page separately this could take a VERY long time (maybe 100 pages). I've already set the set_time_limit to be 2 minutes per page but it seems like apache will kill the script after 5 minutes no matter. This isn't usually a problem since this will run from cron or something similar which does not have this time limit. However I would also like the admins to be able to start a fetch manually via a HTTP-interface. It is not important that apache is kept alive for the full duration, I'm, going to use AJAX to trigger a fetch and check back once in a while with AJAX. My problem is how to start the fetch from within a PHP-script without the fetch being terminated when the script calling it dies. Maybe I could use system('script.php &') but I'm not sure it will do the trick. Any other ideas?

Read the article
Prevent bot from crawling certain areas of site.

- by Skoder

Hey, I don't know much about SEO and how web spiders work, so forgive my ignorance here. I'm creating a site (using ASP.NET-MVC) which has areas that displays information retrieved from the database. The data is unique to the user, so there's no real server-side output caching going on. However, since the data can contain things the user may not wish to have displayed from search engine results, I'd like to prevent any spiders from accessing the search results page. Are there any special actions I should take to ensure that the search result directory isn't crawled? Also, would a spider even crawl a page that's dynamically generated and would any actions preventing certain directories being search mess up my search engine rankings? edit: I should add, I'm reading up on robots.txt protocol, but it relies on co-operation from the web crawler. However, I'd also like to prevent any data-mining users who will ignore the robots.txt file. I appreciate any help!

Read the article
is a negative text-indent considered cloaking?

- by John Isaacks

I am using the negative-text-indent technique I learned to show a text-image to the user, while hiding the corresponding actual text. This way the user sees the fancy styled text while search engines can still index it. However I am started to think this sounds like cloaking since I am serving different content to the user vs the spider. However, I am not using this in a deceitful way. Plus it seems like this is a popular technique. So is it SEO-safe or is it cloaking? Thanks!

Read the article
How Do I Programmatically Check if a WordPress Plugin Is Already Activated?

- by Volomike

I know I can use activate_plugin() from inside a given, active plugin in WordPress, to activate another plugin. But what I want to know is how do I programmatically check if that plugin is already active? For instance, this snippet of code can be temporarily added to an existing plugin's initial file to activate a partner plugin: add_action('wp','activatePlugins'); function activatePlugins() { if( is_single() || is_page() || is_home() || is_archive() || is_category() || is_tag()) { @ activate_plugin('../mypartnerplugin/thepluginsmainfile.php'); } } Then, use a Linux command line tool to spider all your sites that have this code, and it will force a page view. That page view will cause the above code to fire and activate that other plugin. That's how to programmatically activate another plugin from a given plugin as far as I can tell. But the problem is that it gets activated over and over and over again. What would be great is if I had an if/then condition and some function I could call in WordPress to see if that plugin is already activated, and only activate it once if not active.

Read the article
Scraping paginated items from a website using scrapy

- by Mridang Agarwalla

I'm using scrapy to scrape items from a site. I'm not being able to implement this scraping pattern. The site I'm trying to scrape is a forum and I scrape the site once a day. Each page has a table containing posts. New posts are added to the top of the table and as more and more posts are posted to the site, the older posts go further into the pages due to pagination. This is a very simple scenario and we will assume that the order of the posts never change. I would like to scrape this site and scrape all the "new" records until the last scraped post from yesterday is encountered. I have configured my spider to paginate endlessly and when it encounters yesterday's last scraped post, it should stop. How can implement this? (My Scrapy installation works with my Django installation using django-dynamic-scraper )

Read the article
Repeating a List in Scala

- by Ralph

I am a Scala noob. I have decided to write a spider solitaire solver as a first exercise to learn the language and functional programming in general. I would like to generate a randomly shuffled deck of cards containing 1, 2, or 4 suits. Here is what I came up with: val numberOfSuits = 1 (List["clubs", "diamonds", "hearts", "spades"].take(numberOfSuits) * 4).take(4) which should return List["clubs", "clubs", "clubs", "clubs"] List["clubs", "diamonds", "clubs", "diamonds"] List["clubs", "diamonds", "hearts", "spades"] depending on the value of numberOfSuits, except there is no List "multiply" operation that I can find. Did I miss it? Is there a better way to generate the complete deck before shuffling? BTW, I plan on using an Enumeration for the suits, but it was easier to type my question with strings. I will take the List generated above and using a for comprehension, iterate over the suits and a similar List of card "ranks" to generate a complete deck.

Read the article
Asp.net cached objects staying in memory

- by GordonB

I have a asp.net web forms app that uses System.Web.Caching.Cache to cache xml data from a number of web services for 2 hours. webCacheObj.Remove(dataCacheKey) webCacheObj.Insert(dataCacheKey, dataToCache, Nothing, DateTime.Now.AddHours(2), Nothing) Every 90 minutes a Microsoft Search Server hits a particular (spider) page which calls the code to put the objects into the cache. The issue i have is that over a period of time, the memory usage of the application grows exponentially. Lets say that in a week, the memory usage of the application pool grows to over 1gb. I'm using IIS7 and no application pool recycling is currently enabled.

Read the article
WebClient.DownloadString() Not Producing Exact HTML

- by Ryan Fuentes

So here's the deal. I'm creating a spider bot for a website that scans all the product pages and records the product data. I'm using C# and the WebClient library to download the HTML string. The site I'm crawling must be specially made because the HTML that is received from WebClient.DownloadString() is different than the HTML that I get when I view the source of the HTML when visiting it on a browser. This seems intentional because the only info I can't get is the price. Does anyone know a workaround for this problem or can anyone explain what is happening? Thanks.

Read the article
How to reset Scrapy parameters? (always running under same parameters)

- by Jean Ventura

I've been running my Scrapy project with a couple of accounts (the project scrapes a especific site that requieres login credentials), but no matter the parameters I set, it always runs with the same ones (same credentials). I'm running under virtualenv. Is there a variable or setting I'm missing? Edit: It seems that this problem is Twisted related. Even when I run: scrapy crawl -a user='user' -a password='pass' -o items.json -t json SpiderName I still get an error saying: ERROR: twisted.internet.error.ReactorNotRestartable And all the information I get, is the last 'succesful' run of the spider.

Read the article
Are there Adaptive Replacement Cache patent-free alternatives?

- by aleccolocco

An open source high-performance project I'm working on needs to keep a cache of parsed/compiled files. A plain LRU or a plain LFU wouldn't fit. Plain LRU wouldn't work as there will be remote batch/spider processes hitting the service regularly. Plain LFU wouldn't work because content will age. ARC seems like the perfect solution but since IBM holds patents to it at least one open source project dropped it. Are there any (good enough) alternatives? EDIT: I'm not looking for exactly the same thing, just something that could handle those two situations. Perhaps some simple strategy with timestamps and sources. There have to be many programmers who faced this situation before. That's why the "good enough" bit.

Read the article
Catch access to undefined property in JavaScript

- by avri

The Spider-Monkey JavaScript engine implements the noSuchMethod callback function for JavaScript Objects. This function is called whenever JavaScript tries to execute an undefined method of an Object. I would like to set a callback function to an Object that will be called whenever an undefined property in the Object is accessed or assigned to. I haven't found a noSuchProperty function implemented for JavaScript Objects and I am curios if there is any workaround that will achieve the same result. Consider the following code: var a = {}; a.__defineGetter__("bla", function(){alert(1);return 2;}); alert(a.bla); It is equivalent to [alert(1);alert(2)] - even though a.bla is undefined. I would like to achieve the same result but to unknown properties (i.e. without knowing in advance that a."bla" will be the property accessed)

Read the article
how to scrawl file hosting website with scrapy in python?

- by Veryel Hua

Can anyone help me to figure out how to scrawl file hosting website like filefactory.com? I don't want to download all the file hosted but just to index all available files with scrapy. I have read the tutorial and docs with respect to spider class for scrapy. If I only give the website main page as the begining url I wouldn't not scrawl the whole site, because the scrawling depends on links but the begining page seems not point to any file pages. That's the problem I am thinking and any help would be appreciated!

Read the article
ASP.NET MVC - how to modify requested URLs?

- by Marek

The baidu spider seems to be adding ¤ to end of some crawled urls (it seems that it happens with urls containing single unicode character as the last character) The baidu-requested url looks like this: site.com/abc/ä¤ while site.com/abc/ä is the valid url and as linked from many places on my site. The internal problem is that a different route is matched for this kind of url and an unhandled exception occurs. I would not like to lose baidu because of too many 500 errors on the site. I would like to change the requested URL to a different URL by removing the added character before any ASP.NET MVC processing of the request starts. Can I write a request filter/http module or something similar in ASP.NET MVC to remove the trailing '¤' from the urls? I would not like to alter my routes to counter-hack this behavior.

Read the article
What's the difference between Path and Polygon control? + issues when designing custom drawing user

- by tomo

I'm starting to design a custom control with rather complex drawing. It will be a kind of chart (a kind of radar chart). It will be composed of a few axises with labels, line-regions (like spider-network) and filled shapes. The main question is what's the difference between using Path control and Polygon control? What's better to use here? The goal is to prepare control with minimal amount of c# and try to do as much as possible in xaml / binding. The next important requirement is that control should resize to parent containers' width - if possible without any long recalulations in c#.

Read the article
What’s new in RadChart for 2010 Q1 (Silverlight / WPF)

Greetings, RadChart fans! It is with great pleasure that I present this short highlight of our accomplishments for the Q1 release :). Weve worked very hard to make the best silverlight and WPF charting product even better. Here is some of what we did during the past few months. 1) Zooming&Scrolling and the new sampling engine: Without a doubt one of the most important things we did. This new feature allows you to bind your chart to a very large set of data with blazing performance. Dont take my word for it give it a try! 2) New Smart Label Positioning and Spider-like labels feature: This new feature really helps with very busy graphs. You can play with the different settings we offer in this example. 3) Sorting and Filtering. Much like our RadGridview control the chart now allows you to sort and filter your data out of the box with a single line of code! 4) Legend improvements Weve also been paying attention to those of you who wanted a much improved legend. It is now possible to customize the look and feel of legend items and legend position with a single click. 5) Custom palette brushes. You have told us that you want to easily customize all palette colors using a single clean API from both XAML and code behind. The new custom palette brushes API does exactly that. There are numerous other improvements as well, as much improved themes, performance optimizations and other features that we did. If you want to dig in further check the release notes and changes and backwards compatibility topics. Feel free to share the pains and gains of working with RadChart. Our team is always open to receiving constructive feedback and beer :-)Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

Read the article
What’s new in RadChart for 2010 Q1 (Silverlight / WPF)

Greetings, RadChart fans! It is with great pleasure that I present this short highlight of our accomplishments for the Q1 release :). Weve worked very hard to make the best silverlight and WPF charting product even better. Here is some of what we did during the past few months. 1) Zooming&Scrolling and the new sampling engine: Without a doubt one of the most important things we did. This new feature allows you to bind your chart to a very large set of data with blazing performance. Dont take my word for it give it a try! 2) New Smart Label Positioning and Spider-like labels feature: This new feature really helps with very busy graphs. You can play with the different settings we offer in this example. 3) Sorting and Filtering. Much like our RadGridview control the chart now allows you to sort and filter your data out of the box with a single line of code! 4) Legend improvements Weve also been paying attention to those of you who wanted a much improved legend. It is now possible to customize the look and feel of legend items and legend position with a single click. 5) Custom palette brushes. You have told us that you want to easily customize all palette colors using a single clean API from both XAML and code behind. The new custom palette brushes API does exactly that. There are numerous other improvements as well, as much improved themes, performance optimizations and other features that we did. If you want to dig in further check the release notes and changes and backwards compatibility topics. Feel free to share the pains and gains of working with RadChart. Our team is always open to receiving constructive feedback and beer :-)Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

Read the article
blocking bad bots with robots.txt in 2012 [closed]

- by Rachel Sparks

does it still work good? I have this: # Generated using http://solidshellsecurity.com services # Begin block Bad-Robots from robots.txt User-agent: asterias Disallow:/ User-agent: BackDoorBot/1.0 Disallow:/ User-agent: Black Hole Disallow:/ User-agent: BlowFish/1.0 Disallow:/ User-agent: BotALot Disallow:/ User-agent: BuiltBotTough Disallow:/ User-agent: Bullseye/1.0 Disallow:/ User-agent: BunnySlippers Disallow:/ User-agent: Cegbfeieh Disallow:/ User-agent: CheeseBot Disallow:/ User-agent: CherryPicker Disallow:/ User-agent: CherryPickerElite/1.0 Disallow:/ User-agent: CherryPickerSE/1.0 Disallow:/ User-agent: CopyRightCheck Disallow:/ User-agent: cosmos Disallow:/ User-agent: Crescent Disallow:/ User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0 Disallow:/ User-agent: DittoSpyder Disallow:/ User-agent: EmailCollector Disallow:/ User-agent: EmailSiphon Disallow:/ User-agent: EmailWolf Disallow:/ User-agent: EroCrawler Disallow:/ User-agent: ExtractorPro Disallow:/ User-agent: Foobot Disallow:/ User-agent: Harvest/1.5 Disallow:/ User-agent: hloader Disallow:/ User-agent: httplib Disallow:/ User-agent: humanlinks Disallow:/ User-agent: InfoNaviRobot Disallow:/ User-agent: JennyBot Disallow:/ User-agent: Kenjin Spider Disallow:/ User-agent: Keyword Density/0.9 Disallow:/ User-agent: LexiBot Disallow:/ User-agent: libWeb/clsHTTP Disallow:/ User-agent: LinkextractorPro Disallow:/ User-agent: LinkScan/8.1a Unix Disallow:/ User-agent: LinkWalker Disallow:/ User-agent: LNSpiderguy Disallow:/ User-agent: lwp-trivial Disallow:/ User-agent: lwp-trivial/1.34 Disallow:/ User-agent: Mata Hari Disallow:/ User-agent: Microsoft URL Control - 5.01.4511 Disallow:/ User-agent: Microsoft URL Control - 6.00.8169 Disallow:/ User-agent: MIIxpc Disallow:/ User-agent: MIIxpc/4.2 Disallow:/ User-agent: Mister PiX Disallow:/ User-agent: moget Disallow:/ User-agent: moget/2.1 Disallow:/ User-agent: mozilla/4 Disallow:/ User-agent: Mozilla/4.0 (compatible; BullsEye; Windows 95) Disallow:/ User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 95) Disallow:/ User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 98) Disallow:/ User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows NT) Disallow:/ User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows XP) Disallow:/ User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows 2000) Disallow:/ User-agent: Mozilla/4.0 (compatible; MSIE 4.0; Windows ME) Disallow:/ User-agent: mozilla/5 Disallow:/ User-agent: NetAnts Disallow:/ User-agent: NICErsPRO Disallow:/ User-agent: Offline Explorer Disallow:/ User-agent: Openfind Disallow:/ User-agent: Openfind data gathere Disallow:/ User-agent: ProPowerBot/2.14 Disallow:/ User-agent: ProWebWalker Disallow:/ User-agent: QueryN Metasearch Disallow:/ User-agent: RepoMonkey Disallow:/ User-agent: RepoMonkey Bait & Tackle/v1.01 Disallow:/ User-agent: RMA Disallow:/ User-agent: SiteSnagger Disallow:/ User-agent: SpankBot Disallow:/ User-agent: spanner Disallow:/ User-agent: suzuran Disallow:/ User-agent: Szukacz/1.4 Disallow:/ User-agent: Teleport Disallow:/ User-agent: TeleportPro Disallow:/ User-agent: Telesoft Disallow:/ User-agent: The Intraformant Disallow:/ User-agent: TheNomad Disallow:/ User-agent: TightTwatBot Disallow:/ User-agent: Titan Disallow:/ User-agent: toCrawl/UrlDispatcher Disallow:/ User-agent: True_Robot Disallow:/ User-agent: True_Robot/1.0 Disallow:/ User-agent: turingos Disallow:/ User-agent: URLy Warning Disallow:/ User-agent: VCI Disallow:/ User-agent: VCI WebViewer VCI WebViewer Win32 Disallow:/ User-agent: Web Image Collector Disallow:/ User-agent: WebAuto Disallow:/ User-agent: WebBandit Disallow:/ User-agent: WebBandit/3.50 Disallow:/ User-agent: WebCopier Disallow:/ User-agent: WebEnhancer Disallow:/ User-agent: WebmasterWorldForumBot Disallow:/ User-agent: WebSauger Disallow:/ User-agent: Website Quester Disallow:/ User-agent: Webster Pro Disallow:/ User-agent: WebStripper Disallow:/ User-agent: WebZip Disallow:/ User-agent: WebZip/4.0 Disallow:/ User-agent: Wget Disallow:/ User-agent: Wget/1.5.3 Disallow:/ User-agent: Wget/1.6 Disallow:/ User-agent: WWW-Collector-E Disallow:/ User-agent: Xenu's Disallow:/ User-agent: Xenu's Link Sleuth 1.1c Disallow:/ User-agent: Zeus Disallow:/ User-agent: Zeus 32297 Webster Pro V2.9 Win32 Disallow:/

Read the article
Tron: Legacy, 3D goggles, and embedded UA

- by Roger Hart

The 3D edition of Tron: Legacy opens with embedded user assistance. The film starts with an iconic white-on-black command-prompt message exhorting viewers to keep their 3D glasses on throughout. I can't quote it verbatim, and at the time of writing nor could anybody findable with 5 minutes of googling. But it was something like: "Although parts of the movie are 2D, it was shot in 3D, and glasses should be worn at all times. This is how it was intended to be viewed" Yeah - "intended". That part is verbatim. Wow. Now, I appreciate that even out of the small sub-set of readers who care a rat's ass for critical theory, few will be quite so gung-ho for the whole "death of the author" shtick as I tend to be. And yes, this is ergonomic rather than interpretive, but really - telling an audience how you expect them to watch a movie? That's up there with Big Steve's "you're holding it wrong" Even if it solves the problem, it's pretty arrogant. If anything, it's worse than RTFM. And if enough people are doing it wrong that you have to include the announcement, then maybe - just maybe - you've got a UX and/or design problem. Plus, current 3D glasses are like sitting in a darkened room, cosplaying the lovechild of Spider Jerusalem and Jarvis Cocker. Ok, so that observation was weirder than it was helpful; but seriously, nobody wants to wear the glasses if they don't have to. They ruin the visual experience of the non-3D sections, and personally, I find them pretty disruptive to the suspension of disbelief. This is an old, old, problem, and I'm carping on about it because Tron is enjoyable mass-market slush. It's easier for me to say "no, I can't just put some text on it. It's fundamentally broken, redesign it." in the middle of a small-ish, agile, software project than it would be for some beleaguered production assistant at the end of editing a $200 million movie. But lots of folks in software don't even get to do that. Way more people are going to see Tron, and be annoyed by this, than will ever read a technical communication blog. So hopefully, after two hours of being mildly annoyed, wanting to turn the brightness up, and slowly getting a headache, they'll realise something very, very important: you just can't document your way out of a shoddy UI.

Read the article
What’s new in RadChart for 2010 Q1 (Silverlight / WPF)

Greetings, RadChart fans! It is with great pleasure that I present this short highlight of our accomplishments for the Q1 release :). Weve worked very hard to make the best silverlight and WPF charting product even better. Here is some of what we did during the past few months. 1) Zooming&Scrolling and the new sampling engine: Without a doubt one of the most important things we did. This new feature allows you to bind your chart to a very large set of data with blazing performance. Dont take my word for it give it a try! 2) New Smart Label Positioning and Spider-like labels feature: This new feature really helps with very busy graphs. You can play with the different settings we offer in this example. 3) Sorting and Filtering. Much like our RadGridview control the chart now allows you to sort and filter your data out of the box with a single line of code! 4) Legend improvements Weve also been paying attention to those of you who wanted a much improved legend. It is now possible to customize the look and feel of legend items and legend position with a single click. 5) Custom palette brushes. You have told us that you want to easily customize all palette colors using a single clean API from both XAML and code behind. The new custom palette brushes API does exactly that. There are numerous other improvements as well, as much improved themes, performance optimizations and other features that we did. If you want to dig in further check the release notes and changes and backwards compatibility topics. Feel free to share the pains and gains of working with RadChart. Our team is always open to receiving constructive feedback and beer :-)Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.

Read the article
WebCenter Customer Spotlight: Marvel

- by me

Author: Peter Reiser - Social Business Evangelist, Oracle WebCenter Solution SummaryMarvel Entertainment, LLC (Marvel) is one of the world's most prominent character-based entertainment companies, built on a proven library of over 8,000 characters featured in a variety of media over seventy years. The customer wanted to optimize their brand licensing process, so Marvel worked with Oracle WebCenter partner Fishbowl Solutions and implemented a centralized Content Hub based on Oracle WebCenter Content. The 100% web based secure Intranet/Partner Extranet solution is now managing the entire life cycle of the brand licensing process. Marvel and their brand licensees have now complete visibility of brand license operations including the history of approval request and related content. Company OverviewMarvel Entertainment, LLC (Marvel) a wholly-owned subsidiary of The Walt Disney Company, is one of the world's most prominent character-based entertainment companies, built on a proven library of over 8,000 characters featured in a variety of media over seventy years. Marvel utilizes its character franchises in entertainment, licensing and publishing. Sample characters: - Spider-Man - Iron Man - Captain America - X-MEN - Thor - Avengers - And a host of others Business ChallengesMarvel wanted to optimize their brand licensing process for their characters and had following business requirements : Facilitating content worldwide Scalable and flexible infrastructure to manage multiple content types and huge file sizes Optimize the licensing process workflow trough automatic notifications, tracking reviews, issuing approvals, etc. Solution DeployedMarvel worked with Oracle WebCenter partner Fishbowl Solutions and implemented a centralized Content Hub based on Oracle WebCenter Content. The 100% web based secure Intranet/Partner Extranet solution is now managing the entire life cycle of the brand licensing process. The internal users can now manage all digital assets related to a character trough proper categorization of all items, workflow based review and approval of branding styles and a powerful search and retrieval service. The licensees of Marvel brands can now online develop and submit concepts and prototypes which are reviewed and approved using a collaborative process. Business ResultMarvel and their brand licensees have now complete visibility of brand license operations including the history of approval request and related content. The character brand related content is now in the right place, at the right time at the user's fingertips with highly improved quality. Additional Information Marvel Open World Presentation Oracle WebCenter Content

Read the article
ASP.NET Web Forms is bad, or what am I missing?

- by iveqy

Being a PHP guy myself I recently had to write a spider to an asp.net site. I was really surprised by the different approach to ajax and form-handling. For example, in the PHP sites I've worked with, a deletion of a database entry would be something like: GET delete.php?id=&confirm=yes and get a "success" back in some form (in the ajax case, probably a json reply). In this asp.net application you would instead post a form, including all inputs on the page, with a huge __VIEWSTATE and __EVENTVALIDATION. This would be more than 10 times as big as above. The reply would be the complete side again, with a footer containing some structured data for javascript to parse and display the result. Again, the whole page is sent, and then throwed away(?) since it's already displayed. Why not just send the footer with the data to parse (it's not json nor xml but a | separated list). I really can't see why you would design a system that way. Usually you've a fast client, and a somewhat fast server but a really slow connection. Why not keep the datatransfer to a minimum? Why those huge __VIEWSTATE and __EVENTVALIDATION? It seems that everything is done way to chatty and way to complicated. I really can't see the point and that usually means that I'm missing something. So please tell me, what are the reasons for this design and what benefits (and weaknesses) does it have? (Yes I know that __VIEWSTATE is used to tell what type of form-konfiguration should be sent back to the server. But WHY is this needed?) Please keep this discussion strictly technical and avoid flamewars. Update: Please excuse the somewhat rantish question. I tried to explain my view to be able to get a better answer. I am not saying that asp.net is bad, I am saying that I don't understand the meaning of those concepts. Usually that means that I've things to learn instead of the concepts beeing wrong. I appreciate the explanations about that "you don't have to do this way in asp.net", I'll read up on MVC and other .net technologies. However, there most be a reason for this site (the one I referred to) to be written the way it is. It's written by professionals for a big organisation with far more experience than what I've. Any explanation about their (possible) design choice would be welcome.

Read the article
Unknown Apache2 + PHP5 FastCGI 500 error .. caused by search engine bots?

- by rdjurovich

My Ubuntu server is configured with Apache 2.2.8 and PHP 5.2.4-2ubuntu5.18 in FastCGI mode. Everything works well, except I am seeing 500 errors that only seem to come from bots accessing the server.. for example (access.log): x.125.71.104 - - [16/Nov/2011:10:27:39 +1100] "GET / HTTP/1.1" 500 41377 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" x.40.103.239 - - [16/Nov/2011:11:05:56 +1100] "GET / HTTP/1.0" 500 14717 "-" "Mozilla/5.0 (compatible; mon.itor.us - free monitoring service; http://mon.itor.us)" x.249.67.114 - - [14/Nov/2011:20:57:17 +1100] "GET / HTTP/1.1" 500 101 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" x.55.39.85 - - [14/Nov/2011:19:31:06 +1100] "GET / HTTP/1.1" 500 7032 "-" "msnbot/2.0b (+http://search.msn.com/msnbot.htm)._" It is my understanding that a 500 error will be thrown when the PHP process fails to respond to Apache, which could be caused by a fatal PHP error or if PHP runs out of processes.. so my assumption is that either the bots are hitting the server too hard, killing the PHP processes, or something in the request header from bots is causing a fatal error in my PHP script? If anyone can offer advice on this it would be greatly appreciated! Ryan

Read the article
Software to automate website screenshot capture

- by Leniel Macaferi

Do you know any software that can automate the process of getting screenshots of every page of a website? It would act like a spider/crawler/robot. You name it... For example: I developed a website and now I'd like to get a screenshot of every page of the site. I of course could do it manually (a lot of work). For each module of the site (Student, Payment, etc) I have different pages (Create, Edit, Details, Delete, etc) forms. The thing I'm looking for is a software that can visit every link of the site and then capture the screen - a software that can automate the whole process. It would also be good if the software allowed the user to pass a list of URLs to capture screenshots allowing even more fine grained configuration. EDIT: I tried Selenium mentioned by Aaron in his answer but I managed to find an app that does exactly what I needed. It's called Paparazzi!. I wrote a blog post to showcase my attempt at Selenium and the findings regarding Paparazzi!'s batch capture functionality: Software to automate website screenshot capture

Read the article
illegitimate traffic from user agent Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10 (.NET CLR 3.5.30729)

- by user114293

Since the beginning of the year, I'm getting a lot of traffic with the user agent Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.10) Gecko/2009042316 Firefox/3.0.10 (.NET CLR 3.5.30729). My access logs show 40% - 60% from that user agent. That's strange because the user agent states a Firefox 3.0.10 browser (is anybody using that browser in 2012? Definitely not 40%-60% of visitors on a normal website). Also, the logs show that this user agent only requested the HTML document and no referenced assets like images, css, js files. I checked the IPs of those requests (with that UA). It's coming from all over the world. I recognized that those IPs sometimes have a mobile user agent. So my suspicion is a mobile app that is doing a lot of "spider requests" - but if that would be the case than other web sites should have the same problem. That's actually my question: Does anybody experience same/similar problems?

Read the article
Strange Domain name under the same IP Address

- by Mike Chip

There's something really weird happening in my server. But first things first: I wanted to have my website and chose the domain name "myowndomain.com", Now on my domain registrar I point "myowndomain.com" to the address of my recently setup VPS, let's say 50.50.50.50 So I installed everything I needed to run my website, and I started to notice strange queries coming from different IP Addresses. Like these [client 123.123.123.123] File does not exist: /var/www/html/api, referer: http://www.strangedomain.com/api/manyou/my.php [client 456.456.456.456] File does not exist: /var/www/html/api, referer: http://www.strangedomain.com/api/manyou/my.php or like this (Really a long line, I cut some things) GET /?s=vod-show-id-22-area-%E5%85%B6%E4%BB%96-language-%E9%9F%A9%E8%AF%AD.html HTTP/1.1" 301 295 "http://v.strangedomain.com/?s=vod-s ...[cut]... spider" That above is happening the most. The 'strangedomain.com' returns the same IP address of my VPS which my website is hosted on. The whois of such domain shows it's registered to a chinese. But the street name didn't look so right (like a huge single word), so I think all of that info might be fake, but still might be a chinese. I also noticed that all 'clients' trying to access the 'strangedomain.com' is coming from china. If I type in the browser 'strangedomain.com', I see my website. I'm worried, because my website is actually an e-commerce. I don't know if 'strangedomain.com' WAS a website on 50.50.50.50 in the not so far past, or if it's something else.

Read the article

Search Results

Search found 173 results on 7 pages for 'spider'.

Page 6/7 | < Previous Page | 2 3 4 5 6 7 | Next Page >

- by Nicklas Ansman

- by Skoder

- by John Isaacks

- by Volomike

- by Mridang Agarwalla

- by Ralph

- by GordonB

- by Ryan Fuentes

- by Jean Ventura

- by aleccolocco

- by avri

- by Veryel Hua

- by Marek

- by tomo

- by Rachel Sparks

- by Roger Hart

- by me

- by iveqy

- by rdjurovich

- by Leniel Macaferi

- by user114293

- by Mike Chip

< Previous Page | 2 3 4 5 6 7 | Next Page >