Search Results

Search found 68825 results on 2753 pages for 'problem'.

Page 760/2753 | < Previous Page | 756 757 758 759 760 761 762 763 764 765 766 767 | Next Page >

Can this Query can be corrected or different table structure needed? (question is clear, detailed, d

- by sandeepan

This is a bit lengthy but I have provided sufficient details and kept things very clear. Please see if you can help. (I will surely accept answer if it solves my problem) I am sure a person experienced with this can surely help or suggest me to decide the tables structure. About the system:- There are tutors who create classes A tags based search approach is being followed Tag relations are created/edited when new tutors registers/edits profile data and when tutors create classes (this makes tutors and classes searcheable).For simplicity, let us consider only tutor name and class name are the fields which are matched against search keywords. In this example, I am considering - tutor "Sandeepan Nath" has created a class called "first class" tutor "Bob Cratchit" has created a class called "new class" Desired search results- AND logic to be appied on the search keywords and match against class and tutor data(class name + tutor name), in other words, All those classes be shown such that all the search terms are present in the class name or its tutor name. Example to be clear - Searching "first class" returns class with id_wc = 1. Working Searching "Sandeepan class" should also return class with id_wc = 1. Not working in System 2. Problem with profile editing and searching To tell in one sentence, I am facing a conflict between the ease of profile edition (edition of tag relations when tutor profiles are edited) and the ease of search logic. In the beginning, we had one table structure and search was easy but tag edition logic was very clumsy and unmaintainable(Check System 1 in the section below) . So we created separate tag relations tables to make profile edition simpler but search has become difficult. Please dump the tables so that you can run the search query I have given below and see the results. System 1 (previous system - search easy - profile edition difficult):- Only one table called All_Tag_Relations table had the all the tag relations. The tags table below is common to both systems 1 and 2. CREATE TABLE IF NOT EXISTS `all_tag_relations` ( `id_tag_rel` int(10) NOT NULL AUTO_INCREMENT, `id_tag` int(10) unsigned NOT NULL DEFAULT '0', `id_tutor` int(10) DEFAULT NULL, `id_wc` int(10) unsigned DEFAULT NULL, PRIMARY KEY (`id_tag_rel`), KEY `All_Tag_Relations_FKIndex1` (`id_tag`), KEY `id_wc` (`id_wc`), KEY `id_tag` (`id_tag`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; INSERT INTO `all_tag_relations` (`id_tag_rel`, `id_tag`, `id_tutor`, `id_wc`) VALUES (1, 1, 1, NULL), (2, 2, 1, NULL), (3, 1, 1, 1), (4, 2, 1, 1), (5, 3, 1, 1), (6, 4, 1, 1), (7, 6, 2, NULL), (8, 7, 2, NULL), (9, 6, 2, 2), (10, 7, 2, 2), (11, 5, 2, 2), (12, 4, 2, 2); CREATE TABLE IF NOT EXISTS `tags` ( `id_tag` int(10) unsigned NOT NULL AUTO_INCREMENT, `tag` varchar(255) DEFAULT NULL, PRIMARY KEY (`id_tag`), UNIQUE KEY `tag` (`tag`), KEY `id_tag` (`id_tag`), KEY `tag_2` (`tag`), KEY `tag_3` (`tag`), KEY `tag_4` (`tag`), FULLTEXT KEY `tag_5` (`tag`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=8 ; INSERT INTO `tags` (`id_tag`, `tag`) VALUES (1, 'Sandeepan'), (2, 'Nath'), (3, 'first'), (4, 'class'), (5, 'new'), (6, 'Bob'), (7, 'Cratchit'); Please note that for every class, the tag rels of its tutor have to be duplicated. Example, for class with id_wc=1, the tag rel records with id_tag_rel = 3 and 4 are actually extras if you compare with the tag rel records with id_tag_rel = 1 and 2. System 2 (present system - profile edition easy, search difficult) Two separate tables Tutors_Tag_Relations and Webclasses_Tag_Relations have the corresponding tag relations data (Please dump into a separate database)- CREATE TABLE IF NOT EXISTS `tutors_tag_relations` ( `id_tag_rel` int(10) NOT NULL AUTO_INCREMENT, `id_tag` int(10) unsigned NOT NULL DEFAULT '0', `id_tutor` int(10) DEFAULT NULL, PRIMARY KEY (`id_tag_rel`), KEY `All_Tag_Relations_FKIndex1` (`id_tag`), KEY `id_tag` (`id_tag`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; INSERT INTO `tutors_tag_relations` (`id_tag_rel`, `id_tag`, `id_tutor`) VALUES (1, 1, 1), (2, 2, 1), (3, 6, 2), (4, 7, 2); CREATE TABLE IF NOT EXISTS `webclasses_tag_relations` ( `id_tag_rel` int(10) NOT NULL AUTO_INCREMENT, `id_tag` int(10) unsigned NOT NULL DEFAULT '0', `id_tutor` int(10) DEFAULT NULL, `id_wc` int(10) DEFAULT NULL, PRIMARY KEY (`id_tag_rel`), KEY `webclasses_Tag_Relations_FKIndex1` (`id_tag`), KEY `id_wc` (`id_wc`), KEY `id_tag` (`id_tag`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; INSERT INTO `webclasses_tag_relations` (`id_tag_rel`, `id_tag`, `id_tutor`, `id_wc`) VALUES (1, 3, 1, 1), (2, 4, 1, 1), (3, 5, 2, 2), (4, 4, 2, 2); CREATE TABLE IF NOT EXISTS `tags` ( `id_tag` int(10) unsigned NOT NULL AUTO_INCREMENT, `tag` varchar(255) DEFAULT NULL, PRIMARY KEY (`id_tag`), UNIQUE KEY `tag` (`tag`), KEY `id_tag` (`id_tag`), KEY `tag_2` (`tag`), KEY `tag_3` (`tag`), KEY `tag_4` (`tag`), FULLTEXT KEY `tag_5` (`tag`) ) ENGINE=MyISAM DEFAULT CHARSET=latin1 AUTO_INCREMENT=8 ; INSERT INTO `tags` (`id_tag`, `tag`) VALUES (1, 'Sandeepan'), (2, 'Nath'), (3, 'first'), (4, 'class'), (5, 'new'), (6, 'Bob'), (7, 'Cratchit'); CREATE TABLE IF NOT EXISTS `all_tag_relations` ( `id_tag_rel` int(10) NOT NULL AUTO_INCREMENT, `id_tag` int(10) unsigned NOT NULL DEFAULT '0', `id_tutor` int(10) DEFAULT NULL, `id_wc` int(10) unsigned DEFAULT NULL, PRIMARY KEY (`id_tag_rel`), KEY `All_Tag_Relations_FKIndex1` (`id_tag`), KEY `id_wc` (`id_wc`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1; insert into All_Tag_Relations select NULL,id_tag,id_tutor,NULL from Tutors_Tag_Relations; insert into All_Tag_Relations select NULL,id_tag,id_tutor,id_wc from Webclasses_Tag_Relations; Here you can see how easily tutor first name can be edited only in one place. But search has become really difficult, so on being advised to use a Temporary table, I am creating one at every search request, then dumping all the necessary data and then searching from it, I am creating this All_Tag_Relations table at search run time. Here I am just dumping all the data from the two tables Tutors_Tag_Relations and Webclasses_Tag_Relations. But, I am still not able to get classes if I search with tutor name This is the query which searches "first class". Running them on both the systems shows correct results (returns the class with id_wc = 1). SELECT wtagrels.id_wc,SUM(DISTINCT( wtagrels.id_tag =3)) AS key_1_total_matches, SUM(DISTINCT( wtagrels.id_tag =4)) AS key_2_total_matches FROM all_tag_relations AS wtagrels WHERE ( wtagrels.id_tag =3 OR wtagrels.id_tag =4 ) GROUP BY wtagrels.id_wc HAVING key_1_total_matches = 1 AND key_2_total_matches = 1 LIMIT 0, 20 But, searching for "Sandeepan class" works only with the 1st system Here is the query which searches "Sandeepan class" SELECT wtagrels.id_wc,SUM(DISTINCT( wtagrels.id_tag =1)) AS key_1_total_matches, SUM(DISTINCT( wtagrels.id_tag =4)) AS key_2_total_matches FROM all_tag_relations AS wtagrels WHERE ( wtagrels.id_tag =1 OR wtagrels.id_tag =4 ) GROUP BY wtagrels.id_wc HAVING key_1_total_matches = 1 AND key_2_total_matches = 1 LIMIT 0, 20 Can anybody alter this query and somehow do a proper join or something to get correct results. That solves my problem in a nice way. As you can figure out, the reason why it does not work in system 2 is that in system 1, for every class, one additional tag relation linking class and tutor name is present. e.g. for class first class, (records with id_tag_rel 3 and 4) which returns the class on searching with tutor name. So, you see the trade-off between the search and profile edition difficulty with the two systems. How do I overcome both. I have to reach a conclusion soon. So far my reasoning is it is definitely not good from a code maintainability point of view to follow the single tag rel table structure of system one, because in a real system while editing a field like "tutor qualifications", there can be as many records in tag rels table as there are words in qualification of a tutor (one word in a field = one tag relation). Now suppose a tutor has 100 classes. When he edits his qualification, all the tag rel rows corresponding to him are deleted and then as many copies are to be created (as per the new qualification data) as there are classes. This becomes particularly difficult if later more searcheable fields are added. The code cannot be robust. Is the best solution to follow system 2 (edition has to be in one table - no extra work for each and every class) and somehow re-create the all_tag_relations table like system 1 (from the tables tutor_tag_relations and webclasses_tag_relations), creating the extra tutor tag rels for each and every class by a tutor (which is currently missing in system 2's temporary all_tag_relations table). That would be a time consuming logic script. I doubt that table can be recreated without resorting to PHP sript (mysql alone cannot do that). But the problem is that running all this at search time will make search definitely slow. So, how do such systems work? How are such situations handled? I thought about we can run a cron which initiates that PHP script, say every 1 minute and replaces the existing all_tag_relations table as per new tag rels from tutor_tag_relations and webclasses_tag_relations (replaces means creates a new table, deletes the original and renames the new one as all_tag_relations, otherwise search won't work during that period- or is there any better way to that?). Anyway, the result would be that any changes by tutors will reflect in search in the next 1 minute and not immediately. An alternateve would be to initate that PHP script every time a tutor edits his profile. But here again, since many users may edit their profiles concurrently, will the creation of so many tables be a burden and can mysql make the server slow? Any help would be appreciated and working solution will be accepted as answer. Thanks, Sandeepan

Read the article
How Should I Generate Trade Statistics For CouchDB/Rails3 Application?

- by James

My Problem: I am trying to developing a web application for currency traders. The application allows traders to enter or upload information about their trades and I want to calculate a wide variety of statistics based on what the user entered. Now, normally I would use a relational database for this, but I have two requirements that don't fit well with a relational database so I am attempting to use couchdb. Those two problems are: 1) Primarily, I have a companion desktop application that users will be able to work with and replicate to the site using couchdb's awesome replication feature and 2) I would like to allow users to be able to define their own custom things to track about trades and generate results based off of what they enter. The schema less nature of couch seems perfect here, but it may end up being harder than it sounds. (I already know couch requires you to define views in advance and such so I was just planning on sticking all the custom attributes in an array and then emitting the array in the view and further processing from there.) What I Am Doing: Right now I am just emitting each trade in couch keyed by each user's system and querying with the key of the system to get an array of trades per system. Simple. I am not using a reduce function currently to calculate any stats because I couldn't figure out how to get everything I need without getting a reduce overflow error. Here is an example of rows that are getting emitted from couch: {"total_rows":134,"offset":0,"rows":[ {"id":"5b1dcd47221e160d8721feee4ccc64be", "key":["80e40ba2fa43589d57ec3f1d19db41e6","2010/05/14 04:32:37 +0000"], null, "doc":{ "_id":"5b1dcd47221e160d8721feee4ccc64be", "_rev":"1-bc9fe763e2637694df47d6f5efb58e5b", "couchrest-type":"Trade", "system":"80e40ba2fa43589d57ec3f1d19db41e6", "pair":"EUR/USD", "direction":"Buy", "entry":12600, "exit":12700, "stop_loss":12500, "profit_target":12700, "status":"Closed", "slug":"101332132375", "custom_tracking": [{"name":"signal", "value":"Pin Bar"}] "updated_at":"2010/05/14 04:32:37 +0000", "created_at":"2010/05/14 04:32:37 +0000", "result":100}} ]} In my rails 3 controller I am basically just populating an array of trades such as the one above and then extracting out the relevant data into smaller arrays that I can compute my statistics on. Here is my show action for the page that I want to display the stats and all the trades: def show @trades = Trade.by_system(:startkey => [@system.id], :endkey => [@system.id, Time.now ]) @trades.each do |trade| if trade.result > 0 @winning_trades << trade.result elsif trade.result < 0 @losing_trades << trade.result else @breakeven_trades << trade.result end if trade.direction == "Buy" @long_trades << trade.result else @short_trades << trade.result end if trade["custom_tracking"] != nil @custom_tracking << {"result" => trade.result, "variables" => trade["custom_tracking"]} end end end I am omitting some other stuff that is going on, but that is the gist of what I am doing. Then I am calculating stuff in the view layer to produce some results: <% winning_long_trades = @long_trades.reject {|trade| trade <= 0 } %> <% winning_short_trades = @short_trades.reject {|trade| trade <= 0 } %> <ul> <li>Total Trades: <%= @trades.count %></li> <li>Winners: <%= @winning_trades.size %></li> <li>Biggest Winner (Pips): <%= @winning_trades.max %></li> <li>Average Win(Pips): <%= @winning_trades.sum/@winning_trades.size %></li> <li>Losers: <%= @losing_trades.size %></li> <li>Biggest Loser (Pips): <%= @losing_trades.min %></li> <li>Average Loss(Pips): <%= @losing_trades.sum/@losing_trades.size %></li> <li>Breakeven Trades: <%= @breakeven_trades.size %></li> <li>Long Trades: <%= @long_trades.size %></li> <li>Winning Long Trades: <%= winning_long_trades.size %></li> <li>Short Trades: <%= @short_trades.size %></li> <li>Winning Short Trades: <%= winning_short_trades.size %></li> <li>Total Pips: <%= @winning_trades.sum + @losing_trades.sum %></li> <li>Win Rate (%): <%= @winning_trades.size/@trades.count.to_f * 100 %></li> </ul> This produces the following results, which aside from a few things is exactly what I want: Total Trades: 134 Winners: 70 Biggest Winner (Pips): 1488 Average Win(Pips): 440 Losers: 58 Biggest Loser (Pips): -516 Average Loss(Pips): -225 Breakeven Trades: 6 Long Trades: 125 Winning Long Trades: 67 Short Trades: 9 Winning Short Trades: 3 Total Pips: 17819 Win Rate (%): 52.23880597014925 What I Am Wondering- Finally The Actual Questions: I am starting to get really skeptical of how well this method will work when a user has 5,000 trades instead of just 134 like in this example. I anticipate most users will only have somewhere under 200 per year, but some users may have a couple thousand trades per year. Probably no more than 5,000 per year. It seems to work ok now, but the page load times are already getting a tad high for my tastes. (About 800ms to generate the page according to rails logs with about a 250ms of that spent in the view layer.) I will end up caching this page I am sure, but I still need the regenerate the page each time a trade is updated and I can't afford to have this be too slow. Sooo..... Is doing something similar here possible with a straight couchdb reduce function? I am assuming handing this off to couch would possibly help with larger data sets. I couldn't figure out how, but I suppose that doesn't mean it isn't possible. If possible, any hints will be helpful. Could I use a list function if a reduce was not available due to reduce constraints? Are couchdb list functions suitable for this type of calculations? Anyone have any idea of whether or not list functions perform well? Any hints what one would look like for the type of calculations I am trying to achieve? I thought about other options such as running the calculations at the time each trade was saved or nightly if I had to and saving the results to a statistics doc that I could then query so that all the processing was done ahead of time. I would like this to be the last resort because then I can't really filter out trades by time periods dynamically like I would really like to. (I want to have a slider that a user can slide to only show trades from that time period using the startkey and endkey in couchdb if I can.) If I should continue running the calculations inside the rails app at the time of the page view, what can I do to improve my current implementation. I am new to rails, couch and programming in general. I am sure that I could be doing something better here. Do I need to create an array for each stat or is there a better way to do that. I guess I just would really like some advice on how to tackle this problem. I want to keep the page generation time minimal since I anticipate these being some of the highest trafficked pages. My gut is that I will need to offload the statistics calculation to either couch or run the stats in advance of when they are called, but I am not sure. Lastly: Like I mentioned above, one of the primary reasons for using couch is to allow users to define their own things to track per trade. Getting the data into couch is no problem, but how would I be able to take the custom_tracking array and find how many winning trades for each named tracking attribute. If anyone can give me any hints to the possibility of doing this that would be great. Thanks a bunch. Would really appreciate any help. Willing to fork out some $$$ if someone wants to take on the problem for me. (Don't know if that is allowed on stack overflow or not.)

Read the article
What does it mean when ARP shows <incomplete> on eth1

- by Geoff Dalgas

We have been using HAProxy along with heartbeat from the Linux-HA project. We are using two linux instances to provide a failover. Each server has with their own public IP and a single IP which is shared between the two using a virtual interface (eth1:1) at IP: 69.59.196.211 The virtual interface (eth1:1) IP 69.59.196.211 is configured as the gateway for the windows servers behind them and we use ip_forwarding to route traffic. We are experiencing an occasional network outage on one of our windows servers behind our linux gateways. HAProxy will detect the server is offline which we can verify by remoting to the failed server and attempting to ping the gateway: Pinging 69.59.196.211 with 32 bytes of data: Reply from 69.59.196.220: Destination host unreachable. Running arp -a on this failed server shows that there is no entry for the gateway address (69.59.196.211): Interface: 69.59.196.220 --- 0xa Internet Address Physical Address Type 69.59.196.161 00-26-88-63-c7-80 dynamic 69.59.196.210 00-15-5d-0a-3e-0e dynamic 69.59.196.212 00-21-5e-4d-45-c9 dynamic 69.59.196.213 00-15-5d-00-b2-0d dynamic 69.59.196.215 00-21-5e-4d-61-1a dynamic 69.59.196.217 00-21-5e-4d-2c-e8 dynamic 69.59.196.219 00-21-5e-4d-38-e5 dynamic 69.59.196.221 00-15-5d-00-b2-0d dynamic 69.59.196.222 00-15-5d-0a-3e-09 dynamic 69.59.196.223 ff-ff-ff-ff-ff-ff static 224.0.0.22 01-00-5e-00-00-16 static 224.0.0.252 01-00-5e-00-00-fc static 225.0.0.1 01-00-5e-00-00-01 static On our linux gateway instances arp -a shows: peak-colo-196-220.peak.org (69.59.196.220) at <incomplete> on eth1 stackoverflow.com (69.59.196.212) at 00:21:5e:4d:45:c9 [ether] on eth1 peak-colo-196-215.peak.org (69.59.196.215) at 00:21:5e:4d:61:1a [ether] on eth1 peak-colo-196-219.peak.org (69.59.196.219) at 00:21:5e:4d:38:e5 [ether] on eth1 peak-colo-196-222.peak.org (69.59.196.222) at 00:15:5d:0a:3e:09 [ether] on eth1 peak-colo-196-209.peak.org (69.59.196.209) at 00:26:88:63:c7:80 [ether] on eth1 peak-colo-196-217.peak.org (69.59.196.217) at 00:21:5e:4d:2c:e8 [ether] on eth1 Why would arp occasionally set the entry for this failed server as <incomplete>? Should we be defining our arp entries statically? I've always left arp alone since it works 99% of the time, but in this one instance it appears to be failing. Are there any additional troubleshooting steps we can take help resolve this issue? THINGS WE HAVE TRIED I added a static arp entry for testing on one of the linux gateways which still didn't help. root@haproxy2:~# arp -a peak-colo-196-215.peak.org (69.59.196.215) at 00:21:5e:4d:61:1a [ether] on eth1 peak-colo-196-221.peak.org (69.59.196.221) at 00:15:5d:00:b2:0d [ether] on eth1 stackoverflow.com (69.59.196.212) at 00:21:5e:4d:45:c9 [ether] on eth1 peak-colo-196-219.peak.org (69.59.196.219) at 00:21:5e:4d:38:e5 [ether] on eth1 peak-colo-196-209.peak.org (69.59.196.209) at 00:26:88:63:c7:80 [ether] on eth1 peak-colo-196-217.peak.org (69.59.196.217) at 00:21:5e:4d:2c:e8 [ether] on eth1 peak-colo-196-220.peak.org (69.59.196.220) at 00:21:5e:4d:30:8d [ether] PERM on eth1 root@haproxy2:~# arp -i eth1 -s 69.59.196.220 00:21:5e:4d:30:8d root@haproxy2:~# ping 69.59.196.220 PING 69.59.196.220 (69.59.196.220) 56(84) bytes of data. --- 69.59.196.220 ping statistics --- 7 packets transmitted, 0 received, 100% packet loss, time 6006ms Rebooting the windows web server solves this issue temporarily with no other changes to the network but our experience shows this issue will come back. Swapping network cards and switches I noticed the link light on the port of the switch for the failed windows server was running at 100Mb instead of 1Gb on the failed interface. I moved the cable to several other open ports and the link indicated 100Mb for each port that I tried. I also swapped the cable with the same result. I tried changing the properties of the network card in windows and the server locked up and required a hard reset after clicking apply. This windows server has two physical network interfaces so I have swapped the cables and network settings on the two interfaces to see if the problem follows the interface. If the public interface goes down again we will know that it is not an issue with the network card. (We also tried another switch we have on hand, no change) Changing network hardware driver versions We've had the same problem with the latest Broadcom driver, as well as the built-in driver that ships in Windows Server 2008 R2. Replacing network cables As a last ditch effort we remembered another change that occurred was the replacement of all of the patch cords between our servers / switch. We had purchased two sets, one green of lengths 1ft - 3ft for the private interfaces and another set of red cables for the public interfaces. We swapped out all of the public interface patch cables with a different brand and ran our servers without issue for a full week ... aaaaaand then the problem recurred. Disable checksum offload, remove TProxy We also tried disabling TCP/IP checksum offload in the driver, no change. We're now pulling out TProxy and moving to a more traditional x-forwarded-for network arrangement without any fancy IP address rewriting. We'll see if that helps.

Read the article
"power limit notification" clobbering on 12G Dell servers with RHEL6

- by Andrew B

Server: Poweredge r620 OS: RHEL 6.4 Kernel: 2.6.32-358.18.1.el6.x86_64 I'm experiencing application alarms in my production environment. Critical CPU hungry processes are being starved of resources and causing a processing backlog. The problem is happening on all the 12th Generation Dell servers (r620s) in a recently deployed cluster. As near as I can tell, instances of this happening are matching up to peak CPU utilization, accompanied by massive amounts of "power limit notification" spam in dmesg. An excerpt of one of these events: Nov 7 10:15:15 someserver [.crit] CPU12: Core power limit notification (total events = 14) Nov 7 10:15:15 someserver [.crit] CPU0: Core power limit notification (total events = 14) Nov 7 10:15:15 someserver [.crit] CPU6: Core power limit notification (total events = 14) Nov 7 10:15:15 someserver [.crit] CPU14: Core power limit notification (total events = 14) Nov 7 10:15:15 someserver [.crit] CPU18: Core power limit notification (total events = 14) Nov 7 10:15:15 someserver [.crit] CPU2: Core power limit notification (total events = 14) Nov 7 10:15:15 someserver [.crit] CPU4: Core power limit notification (total events = 14) Nov 7 10:15:15 someserver [.crit] CPU16: Core power limit notification (total events = 14) Nov 7 10:15:15 someserver [.crit] CPU0: Package power limit notification (total events = 11) Nov 7 10:15:15 someserver [.crit] CPU6: Package power limit notification (total events = 13) Nov 7 10:15:15 someserver [.crit] CPU14: Package power limit notification (total events = 14) Nov 7 10:15:15 someserver [.crit] CPU18: Package power limit notification (total events = 14) Nov 7 10:15:15 someserver [.crit] CPU20: Core power limit notification (total events = 14) Nov 7 10:15:15 someserver [.crit] CPU8: Core power limit notification (total events = 14) Nov 7 10:15:15 someserver [.crit] CPU2: Package power limit notification (total events = 12) Nov 7 10:15:15 someserver [.crit] CPU10: Core power limit notification (total events = 14) Nov 7 10:15:15 someserver [.crit] CPU22: Core power limit notification (total events = 14) Nov 7 10:15:15 someserver [.crit] CPU4: Package power limit notification (total events = 14) Nov 7 10:15:15 someserver [.crit] CPU16: Package power limit notification (total events = 13) Nov 7 10:15:15 someserver [.crit] CPU20: Package power limit notification (total events = 14) Nov 7 10:15:15 someserver [.crit] CPU8: Package power limit notification (total events = 14) Nov 7 10:15:15 someserver [.crit] CPU10: Package power limit notification (total events = 14) Nov 7 10:15:15 someserver [.crit] CPU22: Package power limit notification (total events = 14) Nov 7 10:15:15 someserver [.crit] CPU15: Core power limit notification (total events = 369) Nov 7 10:15:15 someserver [.crit] CPU3: Core power limit notification (total events = 369) Nov 7 10:15:15 someserver [.crit] CPU1: Core power limit notification (total events = 369) Nov 7 10:15:15 someserver [.crit] CPU5: Core power limit notification (total events = 369) Nov 7 10:15:15 someserver [.crit] CPU17: Core power limit notification (total events = 369) Nov 7 10:15:15 someserver [.crit] CPU13: Core power limit notification (total events = 369) Nov 7 10:15:15 someserver [.crit] CPU15: Package power limit notification (total events = 375) Nov 7 10:15:15 someserver [.crit] CPU3: Package power limit notification (total events = 374) Nov 7 10:15:15 someserver [.crit] CPU1: Package power limit notification (total events = 376) Nov 7 10:15:15 someserver [.crit] CPU5: Package power limit notification (total events = 376) Nov 7 10:15:15 someserver [.crit] CPU7: Core power limit notification (total events = 369) Nov 7 10:15:15 someserver [.crit] CPU19: Core power limit notification (total events = 369) Nov 7 10:15:15 someserver [.crit] CPU17: Package power limit notification (total events = 377) Nov 7 10:15:15 someserver [.crit] CPU9: Core power limit notification (total events = 369) Nov 7 10:15:15 someserver [.crit] CPU21: Core power limit notification (total events = 369) Nov 7 10:15:15 someserver [.crit] CPU23: Core power limit notification (total events = 369) Nov 7 10:15:15 someserver [.crit] CPU11: Core power limit notification (total events = 369) Nov 7 10:15:15 someserver [.crit] CPU13: Package power limit notification (total events = 376) Nov 7 10:15:15 someserver [.crit] CPU7: Package power limit notification (total events = 375) Nov 7 10:15:15 someserver [.crit] CPU19: Package power limit notification (total events = 375) Nov 7 10:15:15 someserver [.crit] CPU9: Package power limit notification (total events = 374) Nov 7 10:15:15 someserver [.crit] CPU21: Package power limit notification (total events = 375) Nov 7 10:15:15 someserver [.crit] CPU23: Package power limit notification (total events = 374) A little Google Fu reveals that this is typically associated with the CPU running hot, or voltage regulation kicking in. I don't think that's what is happening though. Temperature sensors for all servers in the cluster are running fine, Power Cap Policy is disabled in the iDRAC, and my System Profile is set to "Performance" on all of these servers: # omreport chassis biossetup | grep -A10 'System Profile' System Profile Settings ------------------------------------------ System Profile : Performance CPU Power Management : Maximum Performance Memory Frequency : Maximum Performance Turbo Boost : Enabled C1E : Disabled C States : Disabled Monitor/Mwait : Enabled Memory Patrol Scrub : Standard Memory Refresh Rate : 1x Memory Operating Voltage : Auto Collaborative CPU Performance Control : Disabled A Dell mailing list post describes the symptoms almost perfectly. Dell suggested that the author try using the Performance profile, but that didn't help. He ended up applying some settings in Dell's guide for configuring a server for low latency environments and one of those settings (or a combination thereof) seems to have fixed the problem. Kernel.org bug #36182 notes that power-limit interrupt debugging was enabled by default, which is causing performance degradation in scenarios where CPU voltage regulation is kicking in. A RHN KB article (RHN login required) mentions a problem impacting PE r620 and r720 servers not running the Performance profile, and recommends an update to a kernel released two weeks ago. ...Except we are running the Performance profile... Everything I can find online is running me in circles here. What's the heck is going on?

Read the article
PHP pages working slow from time to time

- by user1038179

I have VPS with limit of 2GB of ram and 8 CPU cores. I have 5 sites on that VPS (one of them is just for testing, no visitors exept me). All 5 sites are image galleries, like wallpaper sites. Last week I noticed problem on one site (main domain, used for name servers, and also with most traffic, visitors). That site has two image galleries, one is old static html gallery made few years ago and another, main, is powered by ZENPhoto CMS. Also I have that same gallery CMS on another two sites on that same VPS (on one running site and on one just for testing site). On other two sites I have diferent PHP driven gallery. Problem is that after some time (it vary from 10 minutes to few hours after apache restart), loading of pages on main site becomes very slow, or I get 503 Service Temporarily Unavailable error. So pages becomes unavailable. But just that part with new CMS gallery, old part of site with static html pages are working fast and just fine. Also other two sites with same CMS gallery and other two with different PHP driven gallery are working fine and fast at the same time. I thought it must be something with CMS on that main site, because other sites are working nice. Then I tryed to open contact and guest book pages on that main site which are outside of that CMS but also PHP pages, and they do not load too, but that same contact php scipts are working on other sites at the same time. So, when site starts to hangs, ONLY PHP generated content is not working, like I said other static pages are working. And, ONLY on that one main site I have problems. Then I need to restart Apache, after restart everything is vorking nice and fast, for some time, than again, just PHP pages on main site are becomming slower. If I do not restart apache that slowness take some time (several minutes, hours, depending ot traffic) and during that time PHP diven content is loading very slow or unavailable on that site. After sime time, on moments everything start to work and is fast again for some time, and again. In hours with more traffic PHP content is loading slowly or it is unavailable, in hours with less traffic it is sometimes fast and sometimes little bit slower than usually. And ones again, only on that main site, and only PHP driven pages, static pages are working fast even in most traffic hours also other sites with even same CMS are working fast. Currently I have about 7000 unique visitors on that site but site worked nice even with 11500 visitors per day. And about 17000 in total visitors on VPS, all sites ( about 3 pages per unique visitor). When site start to slow down sometimes in apache status I can see something like this: mod_fcgid status: Total FastCGI processes: 37 Process: php5 (/usr/local/cpanel/cgi-sys/php5)Pid Active Idle Accesses State 11300 39 28 7 Working 11274 47 28 7 Working 11296 40 29 3 Working 11283 45 30 3 Working 11304 36 31 1 Working 11282 46 32 3 Working 11292 42 33 1 Working 11289 44 34 1 Working 11305 35 35 0 Working 11273 48 36 2 Working 11280 47 39 1 Working 10125 133 40 12 Exiting(communication error) 11294 41 41 1 Exiting(communication error) 11277 47 42 2 Exiting(communication error) 11291 43 43 1 Exiting(communication error) 10187 108 43 10 Exiting(communication error) 10209 95 44 7 Exiting(communication error) 10171 113 44 5 Exiting(communication error) 11275 47 47 1 Exiting(communication error) 10144 125 48 8 Exiting(communication error) 10086 149 48 20 Exiting(communication error) 10212 94 49 5 Exiting(communication error) 10158 118 49 5 Exiting(communication error) 10169 114 50 4 Exiting(communication error) 10105 141 50 16 Exiting(communication error) 10094 146 50 15 Exiting(communication error) 10115 139 51 17 Exiting(communication error) 10213 93 51 9 Exiting(communication error) 10197 103 51 7 Exiting(communication error) Process: php5 (/usr/local/cpanel/cgi-sys/php5)Pid Active Idle Accesses State 7983 1079 2 149 Ready 7979 1079 11 151 Ready Process: php5 (/usr/local/cpanel/cgi-sys/php5)Pid Active Idle Accesses State 7990 1066 0 57 Ready 8001 1031 64 35 Ready 7999 1032 94 29 Ready 8000 1031 91 36 Ready 8002 1029 34 52 Ready Process: php5 (/usr/local/cpanel/cgi-sys/php5)Pid Active Idle Accesses State 7991 1064 29 115 Ready When it is working nicly there is no lines with "Exiting(communication error)" Active and Idle are time active and time since last request, in seconds. Here are system info. Sysem info: Total processors: 8 Processor #1 Vendor GenuineIntel Name Intel(R) Xeon(R) CPU E5440 @ 2.83GHz Speed 88.320 MHz Cache 6144 KB All other seven are the same. System Information Linux vps.nnnnnnnnnnnnnnnnn.nnn 2.6.18-028stab099.3 #1 SMP Wed Mar 7 15:20:22 MSK 2012 x86_64 x86_64 x86_64 GNU/Linux Current Memory Usage total used free shared buffers cached Mem: 8388608 882164 7506444 0 0 0 -/+ buffers/cache: 882164 7506444 Swap: 0 0 0 Total: 8388608 882164 7506444 Current Disk Usage Filesystem Size Used Avail Use% Mounted on /dev/vzfs 100G 34G 67G 34% / none System Details: Running on: Apache/2.2.22 System info: (Unix) mod_ssl/2.2.22 OpenSSL/0.9.8e-fips-rhel5 DAV/2 mod_auth_passthrough/2.1 mod_bwlimited/1.4 FrontPage/5.0.2.2635 mod_fcgid/2.3.6 Powered by: PHP/5.3.10 Current Configuration Default PHP Version (.php files) 5 PHP 5 Handler fcgi PHP 4 Handler suphp Apache suEXEC on Apache Ruid2 off PHP 4 Handler suphp Apache suEXEC on Apache Configuration The following settings have been saved: fileetag: All keepalive: On keepalivetimeout: 3 maxclients: 150 maxkeepaliverequests: 10 maxrequestsperchild: 10000 maxspareservers: 10 minspareservers: 5 root_options: ExecCGI, FollowSymLinks, Includes, IncludesNOEXEC, Indexes, MultiViews, SymLinksIfOwnerMatch serverlimit: 256 serversignature: Off servertokens: Full sslciphersuite: ALL:!ADH:RC4+RSA:+HIGH:+MEDIUM:-LOW:-SSLv2:-EXP:!kEDH startservers: 5 timeout: 30 I hope, I explained my problem nicely. Any help would be nice.

Read the article
Windows 7 Machine Makes Router Drop -All- Wireless Connections

- by Hammer Bro.

Some background: My home network consists of my Desktop, a two-month old Windows 7 (x64) machine which is online most frequently (N-spec), as well as three other Windows XP laptops (all G) that only connect every now and then (one for work, one for Netflix, and the other for infrequent regular laptop uses). I used to have a Belkin F5D8236-4 wireless router, and everything worked great. A week ago, however, I found out that the Belkin absolutely in no way would establish a VPN connection, something that has become important for work. So I bought a Netgear WNR3500v2/U/L. The wireless was acting a little sketchy at first for just the Windows 7 machine, but I thought it had something to do with 802.11N and I was in a hurry so I just fished up an ethernet cable and disabled the computer's wireless. It has now become apparent, though, that whenever the Windows 7 machine is connected to the router, all wireless connections become unstable. I was using my work laptop for a solid six hours today with no trouble, having multiple SSH connections open over VPN and streaming internet radio in the background. Then, within two minutes of turning on this Windows 7 box, I had lost all connectivity over the wireless. And I was two feet away from the router. The same sort of thing happens on all of the other laptops -- Netflix can be playing stuff all weekend, but if I come up here and do things on this (W7) computer, the streaming will be dead within ten minutes. So here are my basic observations: If the Windows 7 machine is off, then all connections will have a Signal Strength of Very Good or Excellent and a Speed of 48-54 Mbps for an indefinite amount of time. Shortly after the Windows 7 machine is turned on, all wireless connections will experience a consistent decline in Speed down to 1.0 Mbps, eventually losing their connection entirely. These machines will continue to maintain 70% signal strength, as observed by themselves and router. Once dropped, a wireless connection will have difficulty reconnecting. And, if a connection manages to become established, it will quickly drop off again. The Windows 7 machine itself will continue to function just fine if it's using a wired connection, although it will experience these same issues over the wireless. All of the drivers and firmwares are up to date, and this happened both with the stock Netgear firmware as well as the (current) DD-WRT. What I've tried: Making sure each computer is being assigned a distinct IP. (They are.) Disabling UPnP and Stateful Packet Inspection on the router. Disabling Network Sharing, SSDP Discovery, TCP/IP NetBios Helper and Computer Browser services on the Windows 7 machine. Disabling QoS Packet Scheduler, IPv6, and Link Layer Topology Discovery options on my ethernet controller (leaving only Client for Microsoft Networks, File and Printer Sharing, and IPv4 enabled). What I think: It seems awfully similar to the problems discussed in detail at http://social.msdn.microsoft.com/Forums/en/wsk/thread/1064e397-9d9b-4ae2-bc8e-c8798e591915 (which was both the most relevant and concrete information I could dig up on the internet). I still think that something the Windows 7 IP stack (or just Operating System itself) is doing is giving the router fits. However, I could be wrong, because I have two key differences. One is that most instances of this problem are reported as the entire router dying or restarting, and mine still works just fine over the wired connection. The other is that it's a new router, tested with both the factory firmware and the (I assume) well-maintained DD-WRT project. Even if Windows 7 is still secretly sending IPv6 packets or the TCP Window Scaling implementation that I hear Vista caused some trouble with (even though I've tried my best to disable anything fancy), this router should support those functions. I don't want to get a new or a replacement router unless someone can convince me that this is a defective unit. But the problem seems too specific and predictable by my instincts to be a hardware hiccup. And I don't want to deal with the inevitable problems that always seem to take half a day to resolve when getting a new router, since I'm frantically working (including tomorrow) to complete a project by next week's deadline. Plus, I think in the worst case scenario, I could keep this router connected directly to the modem, disable its wireless entirely, and connect the old Belkin to it directly. That should allow me to still use VPN (although I'll have to plug my work laptop directly into that router), and then maintain wireless connections for all of the other computers. But that feels so wrong to me. Anyone have any ideas what the cause and possible solution could be? Clarifications: The Windows 7 machine is directly connected via an ethernet cable to the router for everything above. But while it is online, all other computers' wireless connections become unusable. It is not an issue of signal strength or interference -- no other devices within scanning range are using Channel 1, and the problem will affect computers that are literally feet away from the router with 95% signal strength.

Read the article
Load-balancing between a Procurve switch and a server

- by vlad

Hello I've been searching around the web for this problem i've been having. It's similar in a way to this question: How exactly & specifically does layer 3 LACP destination address hashing work? My setup is as follows: I have a central switch, a Procurve 2510G-24, image version Y.11.16. It's the center of a star topology, there are four switches connected to it via a single gigabit link. Those switches service the users. On the central switch, I have a server with two gigabit interfaces that I want to bond together in order to achieve higher throughput, and two other servers that have single gigabit connections to the switch. The topology looks as follows: sw1 sw2 sw3 sw4 | | | | --------------------- | sw0 | --------------------- || | | srv1 srv2 srv3 The servers were running FreeBSD 8.1. On srv1 I set up a lagg interface using the lacp protocol, and on the switch I set up a trunk for the two ports using lacp as well. The switch showed that the server was a lacp partner, I could ping the server from another computer, and the server could ping other computers. If I unplugged one of the cables, the connection would keep working, so everything looked fine. Until I tested throughput. There was only one link used between srv1 and sw0. All testing was conducted with iperf, and load distribution was checked with systat -ifstat. I was looking to test the load balancing for both receive and send operations, as I want this server to be a file server. There were therefore two scenarios: iperf -s on srv1 and iperf -c on the other servers iperf -s on the other servers and iperf -c on srv1 connected to all the other servers. Every time only one link was used. If one cable was unplugged, the connections would keep going. However, once the cable was plugged back in, the load was not distributed. Each and every server is able to fill the gigabit link. In one-to-one test scenarios, iperf was reporting around 940Mbps. The CPU usage was around 20%, which means that the servers could withstand a doubling of the throughput. srv1 is a dell poweredge sc1425 with onboard intel 82541GI nics (em driver on freebsd). After troubleshooting a previous problem with vlan tagging on top of a lagg interface, it turned out that the em could not support this. So I figured that maybe something else is wrong with the em drivers and / or lagg stack, so I started up backtrack 4r2 on this same server. So srv1 now uses linux kernel 2.6.35.8. I set up a bonding interface bond0. The kernel module was loaded with option mode=4 in order to get lacp. The switch was happy with the link, I could ping to and from the server. I could even put vlans on top of the bonding interface. However, only half the problem was solved: if I used srv1 as a client to the other servers, iperf was reporting around 940Mbps for each connection, and bwm-ng showed, of course, a nice distribution of the load between the two nics; if I run the iperf server on srv1 and tried to connect with the other servers, there was no load balancing. I thought that maybe I was out of luck and the hashes for the two mac addresses of the clients were the same, so I brought in two new servers and tested with the four of them at the same time, and still nothing changed. I tried disabling and reenabling one of the links, and all that happened was the traffic switched from one link to the other and back to the first again. I also tried setting the trunk to "plain trunk mode" on the switch, and experimented with other bonding modes (roundrobin, xor, alb, tlb) but I never saw any traffic distribution. One interesting thing, though: one of the four switches is a Cisco 2950, image version 12.1(22)EA7. It has 48 10/100 ports and 2 gigabit uplinks. I have a server (call it srv4) with a 4 channel trunk connected to it (4x100), FreeBSD 8.0 release. The switch is connected to sw0 via gigabit. If I set up an iperf server on one of the servers connected to sw0 and a client on srv4, ALL 4 links are used, and iperf reports around 330Mbps. systat -ifstat shows all four interfaces are used. The cisco port-channel uses src-mac to balance the load. The HP should use both the source and destination according to the manual, so it should work as well. Could this mean there is some bug in the HP firmware? Am I doing something wrong?

Read the article
Bind9 as a caching resolver fails with mismatch ID on localhost but not external IP

- by argibbs

I'm running Ubuntu 12.04 LTS on a machine on my private network. I have bind9 installed (v9.8.1-P1) via aptitude, so it appears to have put all the bits in the right places and the service starts automatically. I plan on adding some zones later, but first I'm just trying to get it working as a caching resolver. I installed bind, configured it, and starting using it. Initially I thought it was working ok, but then I found some sites weren't being resolved. I've pinned it down to being linked to the size of the result and bind failing-over to TCP mode. So: I'm trying to find out why bind is failing when I query for domain info and the result is 512 bytes (causing a truncation and retry on TCP). Specifically it fails with ID mismatches if I point dig at localhost, but works when I query the machine's own IP (192.168.0.2). This appears to be backwards to the problem that most people have when using bind (fails on external ip, works on localhost). If I do dig @localhost google.com (which has a response of <512 bytes) then it works; I get no warnings, and plenty of output. $ dig @localhost google.com ; <<>> DiG 9.8.1-P1 <<>> @localhost google.com [snip lots of output] ;; Query time: 39 msec ;; SERVER: 127.0.0.1#53(127.0.0.1) ;; WHEN: Thu Oct 17 23:08:34 2013 ;; MSG SIZE rcvd: 495 If I do dig @localhost play.google.com (which has a larger response) then I get back something like: $ dig @localhost play.google.com ;; Truncated, retrying in TCP mode. ;; ERROR: ID mismatch: expected ID 3696, got 27130 This seems to be standard, documented behaviour - when the UDP response is large (here 'large' == 512 bytes) it falls back to TCP. The ID mismatch is not expected though. If I do dig @192.168.0.2 play.google.com then I still get the warning about using TCP mode, but it otherwise works $ dig @192.168.0.2 play.google.com ;; Truncated, retrying in TCP mode. ; <<>> DiG 9.8.1-P1 <<>> @192.168.0.2 play.google.com [snip most of the output] ;; Query time: 5 msec ;; SERVER: 192.168.0.2#53(192.168.0.2) ;; WHEN: Thu Oct 17 23:05:55 2013 ;; MSG SIZE rcvd: 521 At the moment I've not set up any zones in my local instance, so it's just acting as a caching resolver. My options config is pretty much unchanged from standard, I've got the following set: options { directory "/var/cache/bind"; allow-query { 192.168/16; 127.0.0.1; }; forwarders { 8.8.8.8; 8.8.4.4; }; dnssec-validation auto; edns-udp-size 4096 ; allow-transfer { any; }; auth-nxdomain no; # conform to RFC1035 listen-on-v6 { any; }; }; And my /etc/resolv.conf is just nameserver 127.0.0.1 search .local The problem definitely seems linked to the failover to TCP mode: if I do dig +bufsize=4096 @localhost play.google.com then it works; no warning about failover to TCP, no ID mismatch, and a standard looking result. To be honest, if there was a way to force bind to use a much larger UDP buffer, that'd probably be good enough for me, but all I've been able to find mention of is max-udp-size 4096 and that doesn't change the behaviour in any way. I've also tried setting edns-udp-size 512 in case the problem is some weird EDNS issue with my router (which seems unlikely since the +bufsize=4096 flag works fine). I've also tried dig +trace @localhost play.google.com; this works. No truncation/TCP warning, and a full result. I've also tried changing the servers used in the forwarder (e.g. to OpenDNS), but that makes no difference. There's one last data point: if I repetitively do dig @localhost play.google.com I don't always get an ID mismatch, but sometimes a REFUSED error. I'm much more likely to get a REFUSED error if I dig the non-localhost IP (192.168.0.2) first: $ dig @localhost play.google.com ;; Truncated, retrying in TCP mode. ; <<>> DiG 9.8.1-P1 <<>> @localhost play.google.com ; (1 server found) ;; global options: +cmd ;; Got answer: ;; ->>HEADER<<- opcode: QUERY, status: REFUSED, id: 35104 ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0 ;; QUESTION SECTION: ;play.google.com. IN A ;; Query time: 4 msec ;; SERVER: 127.0.0.1#53(127.0.0.1) ;; WHEN: Thu Oct 17 23:20:13 2013 ;; MSG SIZE rcvd: 33 Any insights or things to try would be much appreciated.

Read the article
UnicodeEncodeError when uploading files in Django admin

- by Samuel Linde

Note: I asked this question on StackOverflow, but I realize this might be a more proper place to ask this kind of question. I'm trying to upload a file called 'Testaråäö.txt' via the Django admin app. I'm running Django 1.3.1 with Gunicorn 0.13.4 and Nginx 0.7.6.7 on a Debian 6 server. Database is PostgreSQL 8.4.9. Other Unicode data is saved to the database with no problem, so I guess the problem must be with the filesystem somehow. I've set http { charset utf-8; } in my nginx.conf. LC_ALL and LANG is set to 'sv_SE.UTF-8'. Running 'locale' verifies this. I even tried setting LC_ALL and LANG in my nginx init script just to make sure locale is set properly. Here's the traceback: Traceback (most recent call last): File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/core/handlers/base.py", line 111, in get_response response = callback(request, *callback_args, **callback_kwargs) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/contrib/admin/options.py", line 307, in wrapper return self.admin_site.admin_view(view)(*args, **kwargs) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/utils/decorators.py", line 93, in _wrapped_view response = view_func(request, *args, **kwargs) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/views/decorators/cache.py", line 79, in _wrapped_view_func response = view_func(request, *args, **kwargs) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/contrib/admin/sites.py", line 197, in inner return view(request, *args, **kwargs) File "/srv/django/letebo/app/cms/admin.py", line 81, in change_view return super(PageAdmin, self).change_view(request, obj_id) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/utils/decorators.py", line 28, in _wrapper return bound_func(*args, **kwargs) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/utils/decorators.py", line 93, in _wrapped_view response = view_func(request, *args, **kwargs) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/utils/decorators.py", line 24, in bound_func return func(self, *args2, **kwargs2) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/db/transaction.py", line 217, in inner res = func(*args, **kwargs) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/contrib/admin/options.py", line 985, in change_view self.save_formset(request, form, formset, change=True) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/contrib/admin/options.py", line 677, in save_formset formset.save() File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/forms/models.py", line 482, in save return self.save_existing_objects(commit) + self.save_new_objects(commit) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/forms/models.py", line 613, in save_new_objects self.new_objects.append(self.save_new(form, commit=commit)) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/forms/models.py", line 717, in save_new obj.save() File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/db/models/base.py", line 460, in save self.save_base(using=using, force_insert=force_insert, force_update=force_update) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/db/models/base.py", line 504, in save_base self.save_base(cls=parent, origin=org, using=using) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/db/models/base.py", line 543, in save_base for f in meta.local_fields if not isinstance(f, AutoField)] File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/db/models/fields/files.py", line 255, in pre_save file.save(file.name, file, save=False) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/db/models/fields/files.py", line 92, in save self.name = self.storage.save(name, content) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/core/files/storage.py", line 48, in save name = self.get_available_name(name) File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/core/files/storage.py", line 74, in get_available_name while self.exists(name): File "/srv/.virtualenvs/letebo/lib/python2.6/site-packages/django/core/files/storage.py", line 218, in exists return os.path.exists(self.path(name)) File "/srv/.virtualenvs/letebo/lib/python2.6/genericpath.py", line 18, in exists st = os.stat(path) UnicodeEncodeError: 'ascii' codec can't encode characters in position 52-54: ordinal not in range(128) I tried running Gunicorn with debugging turned on, and the file uploads without any problem at all. I suppose this must mean that the issue is with Nginx. Still beats me where to look, though. Here are the raw response headers from Gunicorn and Nginx, if it makes any sense: Gunicorn: HTTP/1.1 302 FOUND Server: gunicorn/0.13.4 Date: Thu, 09 Feb 2012 14:50:27 GMT Connection: close Transfer-Encoding: chunked Expires: Thu, 09 Feb 2012 14:50:27 GMT Vary: Cookie Last-Modified: Thu, 09 Feb 2012 14:50:27 GMT Location: http://my-server.se:8000/admin/cms/page/15/ Cache-Control: max-age=0 Content-Type: text/html; charset=utf-8 Set-Cookie: messages="yada yada yada"; Path=/ Nginx: HTTP/1.1 500 INTERNAL SERVER ERROR Server: nginx/0.7.67 Date: Thu, 09 Feb 2012 14:50:57 GMT Content-Type: text/html; charset=utf-8 Transfer-Encoding: chunked Connection: close Vary: Cookie 500 UPDATE: Both locale.getpreferredencoding() and sys.getfilesystemencoding() outputs 'UTF-8'. locale.getdefaultlocale() outputs ('sv_SE', 'UTF8'). This seem correct to me, so I'm still not sure why I keep getting these errors.

Read the article
Spotlight Infinite Indexing issue (external data drive)

- by Manca Weeks

This is an external drive, formerly a boot drive which is now in use only to access music files (sibelius, audio, midi, live, logic etc.) without transferring the data into a new boot system, partly because of the issue I am about to describe, but mostly because the majority of the data is mainly there for archival purposes. The user is a composer and prominent musician and needs to be able to rehash the data at will. I have tried several things - here is a list: - make complete filesystem clone with antonio diaz's ddrescue - run Disk Warrior on copy, repair whatever errors occurred - wipe out all ACLs on entire drive - set all permissions to the same value - wide open 777 - remove any system data (applications, system files, including hidden files to the best of my knowledge) by selecting only non-system/app data and using Carbon Copy Cloner to put only the data of interest onto a newly formatted drive - transfer data to newly formatted drive folder by folder, resetting the spotlight index in between adding each to observe for issues (interesting here is that no issues occurred except for in Documents folder - when I transferred only the Documents folder to a newly formatted drive on its own - no trouble. It appears almost as thought it may not be the content but the quantity or specific combination of data that results in problems) - use DataRescue to transfer the data to yet another newly formatted drive to expose any missed hidden files Between each of the above steps I stopped Spotlight (search for anything beginning with md in Activity Monitor - All Processes and quitting it), deleted the .Spotlight-V100 directory from the affected drive. Restart Splotlight indexing by adding drive to Spotlight privacy list and removing it. In each case the same issue occurs - Spotlight begins indexing normally (or so it seems), then the index estimated time increases, usually to 4 hours remaining. This is where it gets stuck and continues to predict 4 hours remaining but never finishes. Sometimes I can't eject the drive and have to quit the md.. processes from Activity Monitor to be able to eject the drive without Force Eject. Once I disconnect the drive after the 4 hours remaining situation - if I reattach it, Spotlight forever estimates remaining time and never gets going again. So there it is. It is apparently not a filesystem issue, not a permissions issue and not tied to any particular piece of hardware or protocol (used USB and FW drives). I have tried this on several machines (3 to be precise) and in 10.5.8 and 10.6.5. Simply disabling Spotlight on this volume is not an option because the owner has no clue where things are as the data on the volume dates back to music projects and compositions from 2003 and before. He needs to be able to query for results. Anyone got any ideas? ---update 2-6-11 Since I have not received any responses except the one below which appears to misunderstand my point, I am updating this post hoping to get more responses. I have used the terminal command sudo opensnoop -p PID where PID is the mdworker process ID to try and determine what Spotlight is doing and hopefully find the files it's having trouble with. Here's what happens: After indexing for a few hours, mdworker is gone. It no longer shows up in Activity Monitor under "All Processes" and the Terminal window with the opensnoop result stops moving. I then proceeded to execute the same command on mds to see what it was doing and here's what I get, repeatedly: 501 57 mds 21 / 501 57 mds 21 /Volumes/Sno Leppard 501 57 mds 21 /Volumes/Tiger 501 57 mds 21 /Volumes/Leppard 501 57 mds 21 /Volumes/Disk Warrior 501 57 mds 21 /Volumes/ONM Data These represent all the volumes currently mounted in the system. All except ONM Data, which is the one I am trying to index, are excluded from SPotlight indexing at the moment. The sequence above repeats over and over, with slight variation, sometimes skipping one of the volumes. Questions - what happened to mdworker? What is mds doing? I will let this run until tomorrow morning and throughout the day and monitor for any changes. Any input would be very much appreciated. Even if you're not sure what the ultimate answer is, please alert me to anything you think I may be missing. Hopefully at some point we will figure this out... Thanks, M __final edit__ I finally resolved the issue and here is how I did it. I used the terminal command "sudo opensnoop -p PID" where the PID is the process id of the processes I was monitoring. I was looking at all instances of mds and mdworker running in the system. After the third time through indexing the same data set (see info above), I contacted Apple and got to their highest level of support - they were flabbergasted as well. They advised me to install yet another default 10.6.6 system and try again. The same pattern repeated - mds and mdworker(s) would start indexing and eventually the spotlight icon would say 6 hours remaining and all mdworkers were gone, mds at 90% or so of CPU. But I did finally figure out that the first time mdworker stopped like that, the last file it touched was always in the same folder. I excluded that folder from spotlight search and the rest of the data set indexed within about 2 hours with no strange behavior or failures. I copied that folder to another machine and Spotlight barfed immediately. Exclude that folder and all is well again. I have no clue what is causing this behavior, still, but I did find a functional solution to the problem. Anyone with a similar problem - run opensnoop on all instances of mds and mdworker and wait patiently for wdworker to exit. Look at the last file it touched and exclude the enclosing folder from being indexed. I was able to repeat the issue and solution on 2 different installs and 2 different copies of the data set. Hope this helps. If we find an actual cause of the folder being such a problem (it is called MICHAEL BRECKER RECORD SOLOS and contains almost 1 GB of audio related files - performer, live, SD2 - things like that), I will edit again to let you all know. Thanks for ay attempts to help, M

Read the article
Windows 7 Machine Makes Router Drop -All- Wireless Connections [closed]

- by Hammer Bro.

Note: I accidentally originally posted this question over at SuperUser, and I still think the issue is caused by some low-level networking practice of Windows 7, but I think the expertise here would be more apt to figuring it out. Apologies for the cross-post. Some background: My home network consists of my Desktop, a two-month old Windows 7 (x64) machine which is online most frequently (N-spec), as well as three other Windows XP laptops (all G) that only connect every now and then (one for work, one for Netflix, and the other for infrequent regular laptop uses). I used to have a Belkin F5D8236-4 wireless router, and everything worked great. A week ago, however, I found out that the Belkin absolutely in no way would establish a VPN connection, something that has become important for work. So I bought a Netgear WNR3500v2/U/L. The wireless was acting a little sketchy at first for just the Windows 7 machine, but I thought it had something to do with 802.11N and I was in a hurry so I just fished up an ethernet cable and disabled the computer's wireless. It has now become apparent, though, that whenever the Windows 7 machine is connected to the router, all wireless connections become unstable. I was using my work laptop for a solid six hours today with no trouble, having multiple SSH connections open over VPN and streaming internet radio in the background. Then, within two minutes of turning on this Windows 7 box, I had lost all connectivity over the wireless. And I was two feet away from the router. The same sort of thing happens on all of the other laptops -- Netflix can be playing stuff all weekend, but if I come up here and do things on this (W7) computer, the streaming will be dead within ten minutes. So here are my basic observations: If the Windows 7 machine is off, then all connections will have a Signal Strength of Very Good or Excellent and a Speed of 48-54 Mbps for an indefinite amount of time. Shortly after the Windows 7 machine is turned on, all wireless connections will experience a consistent decline in Speed down to 1.0 Mbps, eventually losing their connection entirely. These machines will continue to maintain 70% signal strength, as observed by themselves and router. Once dropped, a wireless connection will have difficulty reconnecting. And, if a connection manages to become established, it will quickly drop off again. The Windows 7 machine itself will continue to function just fine if it's using a wired connection, although it will experience these same issues over the wireless. All of the drivers and firmwares are up to date, and this happened both with the stock Netgear firmware as well as the (current) DD-WRT. What I've tried: Making sure each computer is being assigned a distinct IP. (They are.) Disabling UPnP and Stateful Packet Inspection on the router. Disabling Network Sharing, SSDP Discovery, TCP/IP NetBios Helper and Computer Browser services on the Windows 7 machine. Disabling QoS Packet Scheduler, IPv6, and Link Layer Topology Discovery options on my ethernet controller (leaving only Client for Microsoft Networks, File and Printer Sharing, and IPv4 enabled). What I think: It seems awfully similar to the problems discussed in detail at http://social.msdn.microsoft.com/Forums/en/wsk/thread/1064e397-9d9b-4ae2-bc8e-c8798e591915 (which was both the most relevant and concrete information I could dig up on the internet). I still think that something the Windows 7 IP stack (or just Operating System itself) is doing is giving the router fits. However, I could be wrong, because I have two key differences. One is that most instances of this problem are reported as the entire router dying or restarting, and mine still works just fine over the wired connection. The other is that it's a new router, tested with both the factory firmware and the (I assume) well-maintained DD-WRT project. Even if Windows 7 is still secretly sending IPv6 packets or the TCP Window Scaling implementation that I hear Vista caused some trouble with (even though I've tried my best to disable anything fancy), this router should support those functions. I don't want to get a new or a replacement router unless someone can convince me that this is a defective unit. But the problem seems too specific and predictable by my instincts to be a hardware hiccup. And I don't want to deal with the inevitable problems that always seem to take half a day to resolve when getting a new router, since I'm frantically working (including tomorrow) to complete a project by next week's deadline. Plus, I think in the worst case scenario, I could keep this router connected directly to the modem, disable its wireless entirely, and connect the old Belkin to it directly. That should allow me to still use VPN (although I'll have to plug my work laptop directly into that router), and then maintain wireless connections for all of the other computers. But that feels so wrong to me. Anyone have any ideas what the cause and possible solution could be? Clarifications: The Windows 7 machine is directly connected via an ethernet cable to the router for everything above. But while it is online, all other computers' wireless connections become unusable. It is not an issue of signal strength or interference -- no other devices within scanning range are using Channel 1, and the problem will affect computers that are literally feet away from the router with 95% signal strength.

Read the article
vSphere ESX 5.5 hosts cannot connect to NFS Server

- by Gerald

Summary: My problem is I cannot use the QNAP NFS Server as an NFS datastore from my ESX hosts despite the hosts being able to ping it. I'm utilising a vDS with LACP uplinks for all my network traffic (including NFS) and a subnet for each vmkernel adapter. Setup: I'm evaluating vSphere and I've got two vSphere ESX 5.5 hosts (node1 and node2) and each one has 4x NICs. I've teamed them all up using LACP/802.3ad with my switch and then created a distributed switch between the two hosts with each host's LAG as the uplink. All my networking is going through the distributed switch, ideally, I want to take advantage of DRS and the redundancy. I have a domain controller VM ("Central") and vCenter VM ("vCenter") running on node1 (using node1's local datastore) with both hosts attached to the vCenter instance. Both hosts are in a vCenter datacenter and a cluster with HA and DRS currently disabled. I have a QNAP TS-669 Pro (Version 4.0.3) (TS-x69 series is on VMware Storage HCL) which I want to use as the NFS server for my NFS datastore, it has 2x NICs teamed together using 802.3ad with my switch. vmkernel.log: The error from the host's vmkernel.log is not very useful: NFS: 157: Command: (mount) Server: (10.1.2.100) IP: (10.1.2.100) Path: (/VM) Label (datastoreNAS) Options: (None) cpu9:67402)StorageApdHandler: 698: APD Handle 509bc29f-13556457 Created with lock[StorageApd0x411121] cpu10:67402)StorageApdHandler: 745: Freeing APD Handle [509bc29f-13556457] cpu10:67402)StorageApdHandler: 808: APD Handle freed! cpu10:67402)NFS: 168: NFS mount 10.1.2.100:/VM failed: Unable to connect to NFS server. Network Setup: Here is my distributed switch setup (JPG). Here are my networks. 10.1.1.0/24 VM Management (VLAN 11) 10.1.2.0/24 Storage Network (NFS, VLAN 12) 10.1.3.0/24 VM vMotion (VLAN 13) 10.1.4.0/24 VM Fault Tolerance (VLAN 14) 10.2.0.0/24 VM's Network (VLAN 20) vSphere addresses 10.1.1.1 node1 Management 10.1.1.2 node2 Management 10.1.2.1 node1 vmkernel (For NFS) 10.1.2.2 node2 vmkernel (For NFS) etc. Other addresses 10.1.2.100 QNAP TS-669 (NFS Server) 10.2.0.1 Domain Controller (VM on node1) 10.2.0.2 vCenter (VM on node1) I'm using a Cisco SRW2024P Layer-2 switch (Jumboframes enabled) with the following setup: LACP LAG1 for node1 (Ports 1 through 4) setup as VLAN trunk for VLANs 11-14,20 LACP LAG2 for my router (Ports 5 through 8) setup as VLAN trunk for VLANs 11-14,20 LACP LAG3 for node2 (Ports 9 through 12) setup as VLAN trunk for VLANs 11-14,20 LACP LAG4 for the QNAP (Ports 23 and 24) setup to accept untagged traffic into VLAN 12 Each subnet is routable to another, although, connections to the NFS server from vmk1 shouldn't need it. All other traffic (vSphere Web Client, RDP etc.) goes through this setup fine. I tested the QNAP NFS server beforehand using ESX host VMs atop of a VMware Workstation setup with a dedicated physical NIC and it had no problems. The ACL on the NFS Server share is permissive and allows all subnet ranges full access to the share. I can ping the QNAP from node1 vmk1, the adapter that should be used to NFS: ~ # vmkping -I vmk1 10.1.2.100 PING 10.1.2.100 (10.1.2.100): 56 data bytes 64 bytes from 10.1.2.100: icmp_seq=0 ttl=64 time=0.371 ms 64 bytes from 10.1.2.100: icmp_seq=1 ttl=64 time=0.161 ms 64 bytes from 10.1.2.100: icmp_seq=2 ttl=64 time=0.241 ms Netcat does not throw an error: ~ # nc -z 10.1.2.100 2049 Connection to 10.1.2.100 2049 port [tcp/nfs] succeeded! The routing table of node1: ~ # esxcfg-route -l VMkernel Routes: Network Netmask Gateway Interface 10.1.1.0 255.255.255.0 Local Subnet vmk0 10.1.2.0 255.255.255.0 Local Subnet vmk1 10.1.3.0 255.255.255.0 Local Subnet vmk2 10.1.4.0 255.255.255.0 Local Subnet vmk3 default 0.0.0.0 10.1.1.254 vmk0 VM Kernel NIC info ~ # esxcfg-vmknic -l Interface Port Group/DVPort IP Family IP Address Netmask Broadcast MAC Address MTU TSO MSS Enabled Type vmk0 133 IPv4 10.1.1.1 255.255.255.0 10.1.1.255 00:50:56:66:8e:5f 1500 65535 true STATIC vmk0 133 IPv6 fe80::250:56ff:fe66:8e5f 64 00:50:56:66:8e:5f 1500 65535 true STATIC, PREFERRED vmk1 164 IPv4 10.1.2.1 255.255.255.0 10.1.2.255 00:50:56:68:f5:1f 1500 65535 true STATIC vmk1 164 IPv6 fe80::250:56ff:fe68:f51f 64 00:50:56:68:f5:1f 1500 65535 true STATIC, PREFERRED vmk2 196 IPv4 10.1.3.1 255.255.255.0 10.1.3.255 00:50:56:66:18:95 1500 65535 true STATIC vmk2 196 IPv6 fe80::250:56ff:fe66:1895 64 00:50:56:66:18:95 1500 65535 true STATIC, PREFERRED vmk3 228 IPv4 10.1.4.1 255.255.255.0 10.1.4.255 00:50:56:72:e6:ca 1500 65535 true STATIC vmk3 228 IPv6 fe80::250:56ff:fe72:e6ca 64 00:50:56:72:e6:ca 1500 65535 true STATIC, PREFERRED Things I've tried/checked: I'm not using DNS names to connect to the NFS server. Checked MTU. Set to 9000 for vmk1, dvSwitch and Cisco switch and QNAP. Moved QNAP onto VLAN 11 (VM Management, vmk0) and gave it an appropriate address, still had same issue. Changed back afterwards of course. Tried initiating the connection of NAS datastore from vSphere Client (Connected to vCenter or directly to host), vSphere Web Client and the host's ESX Shell. All resulted in the same problem. Tried a path name of "VM", "/VM" and "/share/VM" despite not even having a connection to server. I plugged in a linux system (10.1.2.123) into a switch port configured for VLAN 12 and tried mounting the NFS share 10.1.2.100:/VM, it worked successfully and I had read-write access to it I tried disabling the firewall on the ESX host esxcli network firewall set --enabled false I'm out of ideas on what to try next. The things I'm doing differently from my VMware Workstation setup is the use of LACP with a physical switch and a virtual distributed switch between the two hosts. I'm guessing the vDS is probably the source of my troubles but I don't know how to fix this problem without eliminating it.

Read the article
Cross-platform distributed fault-tolerant (disconnected operation/local cache) filesystem

- by Adrian Frühwirth

We are facing a design "challenge" where we are required to set up a storage solution with the following properties: What we need HA a scalable storage backend offline/disconnected operation on the client to account for network outages cross-platform access client-side access from certainly Windows (probably XP upwards), possibly Linux backend integrates with AD/LDAP (permission management (user/group management, ...)) should work reasonably well over slow WAN-links Another problem is that we don't really know all possible use cases here, if people need to be able to have concurrent access to shared files or if they will only be accessing their own files, so a possible solution needs to account for concurrent access and how conflict management would look in this case from a user's point of view. This two years old blog posts sums up the impression that I have been getting during the last couple of days of research, that there are lots of current übercool projects implementing (non-Windows) clustered petabyte-capable blob-storage solutions but that there is none that supports disconnected operation nicely and natively, but I am hoping that we have missed an obvious solution. What we have tried OpenAFS We figured that we want a distributed network filesystem with a local cache and tested OpenAFS (which, as the only currently "stable" DFS supporting disconnected operation, seemed the way to go) for a week but there are several problems with it: it's a real pain to set up there are no official RHEL/CentOS packages the package of the current stable version 1.6.5.1 from elrepo randomly kernel panics on fresh installs, this is an absolute no-go Windows support (including the required Kerberos packages) is mystical. The current client for the 1.6 branch does not run on Windows 8, the current client for the 1.7 does but it just randomly crashes. After that experience we didn't even bother testing on XP and Windows 7. Suffice to say, we couldn't get it working and the whole setup has been so unstable and complicated to setup that it's just not an option for production. Samba + Unison Since OpenAFS was a complete disaster and no other DFS seems to support disconnected operation we went for a simpler idea that would sync files against a Samba server using Unison. This has the following advantages: Samba integrates with ADs; it's a pain but can be done. Samba solves the problem of remotely accessing the storage from Windows but introduces another SPOF and does not address the actual storage problem. We could probably stick any clustered FS underneath Samba, but that means we need a HA Samba setup on top of that to maintain HA which probably adds a lot of additional complexity. I vaguely remember trying to implement redundancy with Samba before and I could not silently failover between servers. Even when online, you are working with local files which will result in more conflicts than would be necessary if a local cache were only touched when disconnected It's not automatic. We cannot expect users to manually sync their files using the (functional, but not-so-pretty) GTK GUI on a regular basis. I attempted to semi-automate the process using the Windows task scheduler, but you cannot really do it in a satisfactory way. On top of that, the way Unison works makes syncing against Samba a costly operation, so I am afraid that it just doesn't scale very well or even at all. Samba + "Offline Files" After that we became a little desparate and gave Windows "offline files" a chance. We figured that having something that is inbuilt into the OS would reduce administrative efforts, helps blaming someone else when it's not working properly and should just work since people have been using this for years. Right? Wrong. We really wanted it to work, but it just doesn't. 30 minutes of copying files around and unplugging network cables/disabling network interfaces left us with (silent! there is only a tiny notification in Windows explorer in the statusbar, which doesn't even open Sync Center if you click on it!) undeletable files on the server (!) and conflicts that should not even be conflicts. In the end, we had one successful sync of a tiny text file, everything else just exploded horribly. Beyond that, there are other problems: Microsoft admits that "offline files" in Windows XP cannot cope with "large files" and therefore does not cache/sync them at all which would mean those files become unavailable if the connection drop In Windows 7 the feature is only available in the Professional/Ultimate/Enterprise editions. Summary Unless there is another fault-tolerant DFS that supports Windows natively I assume that stacking a HA Samba cluster on top of something like GlusterFS/Lustre/whatnot is the only option, but I hope that I am wrong here. How do other companies allow fault-tolerant network access to redundant storage in a heterogeneous environment with Windows?

Read the article
Remote Desktop Services Gateway Issue

- by AVandelay05

Alright fellow techies here's the rundown. I have installed Server 2008 r2 Remote Dekstop Services on a VM in my network. I installed the following RD role services: RD Session Host, Licensing, Connection Broker, Gateway, Web Access. When I set things up originally, the gateway server and RDWeb worked as it should locally. After getting things running locally (remoteserver.domainname.local) I wanted to test things externally. From the outside, I couldn't get things running (meaning I could connect to rdweb access externally, but when I tried to run an app I would get the message "can't connect/find computer"). Here's my setup for external access The VM has every RD Services role services installed on it, meaning it acts as gateway, rd web access, session host, licensing, the whole bit. I made a self-signed certificate on the gateway server (gateway.domainname.net is the cert name). Internally, I have a secondary forward-lookup zone called domainname.net with an A record gateway pointing to the local IP of the gateway server. On our public DNS (domainname.net) I have an A record gateway. This is to access the RDWeb externally. In IIS I have the following authentication settings RDWeb: All disabled except for anonymous authentication Rpc: All disabled except for basic and windows RpcWithCert: All disbled except for windows authentication I have the necessary web access config in our sonicwall tz210 (https and rdp, external ip pointing to local ip of rds server) RAP and CAP have the correct user and computer groups, authentication, and allowed devices After all of this, here's what happens accessing externally. I can login correctly to RDWeb Access (I've tried a bogus login, I can't login to it so that's working properly). I see the Apps for use. I click on an app, click connect, the credential window opens, I put in the correct user creds, it tries to connect to the gateway server, but then the cred window comes back in view. I tried to reach a limit of failed logins, but never reached one, haha. So from the same external client machine I try to connect to the gateway through a Remote Desktop connection. I put in the correct gateway settings in the RD window, try to connect and get the same results as I did in RDWeb access. I checked the event logs on the RD Services machine and saw the following event IDs around the time I tried to login externally: ID 6037 with the message "The program svchost.exe, with the assigned process ID 2168, could not authenticate locally by using the target name host/gateway.domainname.net. The target name used is not valid. A target name should refer to one of the local computer names, for example, the DNS host name. Try a different target name." ID 10 RADWebAccess "RD Web Access was unable to access gateway.domainname.net, which is the server that is specified as running the RemoteApp and Desktop Connection Management service. Ensure that the computer account of the RD Web Access server is a member of the TS Web Access Computers security group on gateway.domainname.net" ID 4625 "An account failed to log on. Subject: Security ID: NULL SID Account Name: - Account Domain: - Logon ID: 0x0 Logon Type: 3 Account For Which Logon Failed: Security ID: NULL SID Account Name: Administrator Account Domain: gateway.domainname.net Failure Information: Failure Reason: Unknown user name or bad password. Status: 0xc000006d Sub Status: 0xc000006a Process Information: Caller Process ID: 0x0 Caller Process Name: - Network Information: Workstation Name: USER-LAPTOP Source Network Address: External IP Source Port: 63125 Detailed Authentication Information: Logon Process: NtLmSsp Authentication Package: NTLM Transited Services: - Package Name (NTLM only): - Key Length: 0 This event is generated when a logon request fails. It is generated on the computer where access was attempted. The Subject fields indicate the account on the local system which requested the logon. This is most commonly a service such as the Server service, or a local process such as Winlogon.exe or Services.exe. The Logon Type field indicates the kind of logon that was requested. The most common types are 2 (interactive) and 3 (network). The Process Information fields indicate which account and process on the system requested the logon. The Network Information fields indicate where a remote logon request originated. Workstation name is not always available and may be left blank in some cases. The authentication information fields provide detailed information about this specific logon request. - Transited services indicate which intermediate services have participated in this logon request. - Package name indicates which sub-protocol was used among the NTLM protocols." I don't think the VM has a null SID. The SID of the VM and it's physical host have different SIDS. I can access the blank page for rpc externally using the external gateway name. It seems like authentication is a problem. Also, is it a problem that the external name of the gateway server doesn't match the local name? The external name (which the cert is based on) is gateway.domainname.net and the internal name is remoteserver.domainname.local. That's the only thing I can think of that would be the problem, but the external name has to be different from the local right? Internally, I ping gateway.domainname.net and it gives me the correct local IP of the server. Now, there isn't an actual computer name in AD, but I don't know how I would achieve that? I hope I've been clear....any help would be appreciated. I think I'm close to achieving this. :)

Read the article
VFS: file-max limit 1231582 reached

- by Rick Koshi

I'm running a Linux 2.6.36 kernel, and I'm seeing some random errors. Things like ls: error while loading shared libraries: libpthread.so.0: cannot open shared object file: Error 23 Yes, my system can't consistently run an 'ls' command. :( I note several errors in my dmesg output: # dmesg | tail [2808967.543203] EXT4-fs (sda3): re-mounted. Opts: (null) [2837776.220605] xv[14450] general protection ip:7f20c20c6ac6 sp:7fff3641b368 error:0 in libpng14.so.14.4.0[7f20c20a9000+29000] [4931344.685302] EXT4-fs (md16): re-mounted. Opts: (null) [4982666.631444] VFS: file-max limit 1231582 reached [4982666.764240] VFS: file-max limit 1231582 reached [4982767.360574] VFS: file-max limit 1231582 reached [4982901.904628] VFS: file-max limit 1231582 reached [4982964.930556] VFS: file-max limit 1231582 reached [4982966.352170] VFS: file-max limit 1231582 reached [4982966.649195] top[31095]: segfault at 14 ip 00007fd6ace42700 sp 00007fff20746530 error 6 in libproc-3.2.8.so[7fd6ace3b000+e000] Obviously, the file-max errors look suspicious, being clustered together and recent. # cat /proc/sys/fs/file-max 1231582 # cat /proc/sys/fs/file-nr 1231712 0 1231582 That also looks a bit odd to me, but the thing is, there's no way I have 1.2 million files open on this system. I'm the only one using it, and it's not visible to anyone outside the local network. # lsof | wc 16046 148253 1882901 # ps -ef | wc 574 6104 44260 I saw some documentation saying: file-max & file-nr: The kernel allocates file handles dynamically, but as yet it doesn't free them again. The value in file-max denotes the maximum number of file- handles that the Linux kernel will allocate. When you get lots of error messages about running out of file handles, you might want to increase this limit. Historically, the three values in file-nr denoted the number of allocated file handles, the number of allocated but unused file handles, and the maximum number of file handles. Linux 2.6 always reports 0 as the number of free file handles -- this is not an error, it just means that the number of allocated file handles exactly matches the number of used file handles. Attempts to allocate more file descriptors than file-max are reported with printk, look for "VFS: file-max limit reached". My first reading of this is that the kernel basically has a built-in file descriptor leak, but I find that very hard to believe. It would imply that any system in active use needs to be rebooted every so often to free up the file descriptors. As I said, I can't believe this would be true, since it's normal to me to have Linux systems stay up for months (even years) at a time. On the other hand, I also can't believe that my nearly-idle system is holding over a million files open. Does anyone have any ideas, either for fixes or further diagnosis? I could, of course, just reboot the system, but I don't want this to be a recurring problem every few weeks. As a stopgap measure, I've quit Firefox, which was accounting for almost 2000 lines of lsof output (!) even though I only had one window open, and now I can run 'ls' again, but I doubt that will fix the problem for long. (edit: Oops, spoke too soon. By the time I finished typing out this question, the symptom was/is back) Thanks in advance for any help. And another update: My system was basically unusable, so I decided I had no option but to reboot. But before I did, I carefully quit one process at a time, checking /proc/sys/fs/file-nr after each termination. I found that, predictably, the number of open files gradually went down as I closed things down. Unfortunately, it wasn't a large effect. Yes, I was able to clear up 5000-10000 open files, but there were still over 1.2 million left. I shut down just about everything. All interactive shells, except for the one ssh I left open to finish closing down, httpd, even nfs service. Basically everything in the process table that wasn't a kernel process, and there were still an appalling number of files apparently left open. After the reboot, I found that /proc/sys/fs/file-nr showed about 2000 files open, which is much more reasonable. Starting up 2 Xvnc sessions as usual, along with the dozen or so monitoring windows I like to keep open, brought the total up to about 4000 files. I can see nothing wrong with that, of course, but I've obviously failed to identify the root cause. I'm still looking for ideas, since I definitely expect it to happen again. And another update, the next day: I watched the system carefully, and discovered that /proc/sys/fs/file-nr showed a growth of about 900 open files per hour. I shut down the system's only NFS client for the night, and the growth stopped. Mind you, it didn't free up the resources, but it did at least stop consuming more. Is this a known bug with NFS? I'll be bringing the NFS client back online today, and I'll narrow it down further. If anyone is familiar with this behavior, feel free to jump in with "Yeah, NFS4 has this problem, go back to NFS3" or something like that.

Read the article
CISCO 2911 Router configuration

- by bala

Device cisco 2911 router configuration support is required please. I have exchange server 2010 configured and working without any errors the problem is in cisco router configuration when exchange server sends emails out the receives WAN IP not the public ip. I have configured RDNS lookups with our MX record IP addesses that match the FQDN but all our emails are rejected because it does not match with the public ip. Receiving mails problem is not an problem all mails are coming through. i am sure i am missing something on the router configuration that does not sends the public ip, can any one help me to solve this issue. Note; I've got 1 WAN IP & 8 Public IP from ISP . Find below the running configuration. Building configuration... Current configuration : 2734 bytes ! ! Last configuration change at 06:32:13 UTC Tue Apr 3 2012 ! NVRAM config last updated at 06:32:14 UTC Tue Apr 3 2012 ! NVRAM config last updated at 06:32:14 UTC Tue Apr 3 2012 version 15.1 service timestamps debug datetime msec service timestamps log datetime msec service password-encryption ! hostname BSBG-LL ! boot-start-marker boot-end-marker ! ! enable secret 5 $x$xHrxxxxx5ox0 enable password 7 xx23xx5FxxE1xx044 ! no aaa new-model ! no ipv6 cef ip source-route ip cef ! ! ! ! ! ip flow-cache timeout active 1 ip domain name yourdomain.com ip name-server 213.42.20.20 ip name-server 195.229.241.222 multilink bundle-name authenticated ! ! crypto pki token default removal timeout 0 ! ! license udi pid CISCO2911/K9 ! ! username bsbg ! ! ! ! ! ! interface Embedded-Service-Engine0/0 no ip address shutdown ! interface GigabitEthernet0/0 ip address 192.168.0.9 255.255.255.0 ip flow ingress ip nat inside ip virtual-reassembly in duplex auto speed 100 no cdp enable ! interface GigabitEthernet0/1 ip address 213.42.xx.x2 255.255.255.252 ip nat outside ip virtual-reassembly in duplex auto speed auto no cdp enable ! interface GigabitEthernet0/2 no ip address shutdown duplex auto speed auto ! ip forward-protocol nd ! no ip http server no ip http secure-server ! ip nat inside source list 120 interface GigabitEthernet0/1 overload ip nat inside source static tcp 192.168.0.4 25 94.56.89.100 25 extendable ip nat inside source static tcp 192.168.0.4 53 94.56.89.100 53 extendable ip nat inside source static udp 192.168.0.4 53 94.56.89.100 53 extendable ip nat inside source static tcp 192.168.0.4 110 94.56.89.100 110 extendable ip nat inside source static tcp 192.168.0.4 443 94.56.89.100 443 extendable ip nat inside source static tcp 192.168.0.4 587 94.56.89.100 587 extendable ip nat inside source static tcp 192.168.0.4 995 94.56.89.100 995 extendable ip nat inside source static tcp 192.168.0.4 3389 94.56.89.100 3389 extendable ip nat inside source static tcp 192.168.0.4 443 94.56.89.101 443 extendable ip nat inside source static tcp 192.168.0.12 80 94.56.89.102 80 extendable ip nat inside source static tcp 192.168.0.12 443 94.56.89.102 443 extendable ip nat inside source static tcp 192.168.0.12 3389 94.56.89.102 3389 extendable ip route 0.0.0.0 0.0.0.0 213.42.69.41 ! access-list 120 permit ip 192.168.0.0 0.0.0.255 any ! ! ! control-plane ! ! ! line con 0 exec-timeout 5 0 line aux 0 line 2 no activation-character no exec transport preferred none transport input all transport output pad telnet rlogin lapb-ta mop udptn v120 ssh stopbits 1 line vty 0 4 password 7 xx64xxD530D26086Dxx login transport input all ! scheduler allocate 20000 1000 end

Read the article
Deployment Workbench no longer available after PXE boot

- by Patrick

Our build process revolves around windows Deployment Workbench. Unfortunately this was setup by someone who is no longer with the company, and no-one has ever dared/needed to make any changes. The other day it stopped working. It turns out that one of our build guys started thinking about changing some stuff in it, clicked something and now it no longer works (He is saying now that he right clicked on the 'LAB' entry in 'Deployment points' and hit 'Update', which took some time to run through apparently). The job has fallen on me to resolve and frankly I'm not sure what I'm doing. I was wondering if someone with more experience than me can provide some pointers as to troubleshooting cos I'm feeling quite a lot in the dark here. On the server I have Deployment Workbench up and running (MMC snapin) version 3.0. There is a WDS service that appears to be running ok, as does the tFTPd service. Nothing specific to this in event logs. From the client side; PXE boot works and gets you to the Win PE launch, and it has the correct company logo as the background (proving to me that its loading win PE from the network). WPEINIT runs, and asks for domain credentials, here the team simply put User/Pass/Domain in the boxes and click ok. Normally the build would kick off. Instead they get an error message saying that the \NATBLU01\Distribution$ share isn't available. Checking \NATBLU01\Distribution$ shows that its there and accessible over the network. Security/permissions seem ok, even 'ANONYMOUS LOGON' has read access to that share so I don't see that being a problem. Digging the trace files from C:\MININT\SMSOSD\OSDLOGS\ after an attempt to run the build I can see an error saying much the same - <![LOG[Validating connection to \\NATBLU01\Distribution$]LOG]!><time="16:42:14.000+000" date="03-15-2012" component="LiteTouch" context="" type="1" thread="" file="LiteTouch"> <![LOG[FindFile: The file OSDConnectToUNC.exe could not be found in any standard locations.]LOG]!><time="16:42:14.000+000" date="03-15-2012" component="LiteTouch" context="" type="1" thread="" file="LiteTouch"> <![LOG[The network location cannot be reached. For information about network troubleshooting, see Windows Help.]LOG]!><time="16:42:24.000+000" date="03-15-2012" component="LiteTouch" context="" type="3" thread="" file="LiteTouch"> <![LOG[ERROR - Unable to map a network drive to \\NATBLU01\Distribution$.]LOG]!><time="16:42:24.000+000" date="03-15-2012" component="LiteTouch" context="" type="3" thread="" file="LiteTouch"> BDD.LOG shows much the same. Full copies of the .LOG files can both files be found here : BDD.LOG LITETOUCH.LOG I can get to a command prompt from the Win PE that boots from PXE, however there isn't any network stuff there. IPCONFIG returns nothing so none of the tests I would usually run resolve anything. I'm at a loss frankly. I did wonder if I could perhaps start a new build process but if the change to the DeploymentWorkbench has knocked it offline I don't think I'm going to be able to create a new deployment. Failing that; we do have a deployment point labeled type 'Media' which appears to be a DVD ISO image of one of the builds, but its dated 2008, is it possible to export the network build to .ISO and build from DVD? We are looking at new hardware to run this from anyway (for the impending Windows 7 roll out) so a temporary work round isn't going to be too much of a problem. All assistance is appreciated! EDIT : OK. Got it working again. Solution was close to Newmanth's idea. The problem was that our PE image didn't appear to be connecting the network. I had an older copy of the PE boot.WIM on a stick that I had been using for other purposes. I booted that and correctly got a network connection. Showed a correct internal IP and could ping out etc etc. However I was still getting the same errors in all the logs and in when wpeinit was running. What I did seperately was to update the PE image that DeploymentWorkbench was pushing out to display a different back ground. I wanted to prove that I was working in the correct place. Turns out that I wasn't. I went and looked at the other deployment stuff we had on this machine, Windows Deployment Services was installed and although all the install images are off line the boot image was online, so I uploaded the copy from my stick to that. Booted straight off. And fixed. Working. Yay! For anyone stumbling across this in the future you may find that although your deployment images are located in the DeploymentWorkbench, the Win PE boot image you are launching from is located in the associated Windows Deployment Services images.

Read the article
OS Missing? Messed up the MBR on Win7 64-bit

- by hom3lesshom3boy

I have a Windows 7 machine with two hard drives: a 1TB C: drive and 500GB J:. I had Windows XP installed on C: and Windows 7 installed on J:. I installed Windows 7 after Windows XP from an installer .exe I (legally) bought and downloaded. It, and all of my other files, are sitting on my J: drive intact. While under my Windows 7 install, a few days ago I decided to use Priform's CCleaner and use its DriveWipe utility to wipe the C: drive. 1% into the process, I cancelled and attempted to use it again. It gives me an error saying it can't format the drive, so I poke around the Internet a bit, give up, and restart my computer. I first get an "OS is missing" error after the computer boots past the BIOS. I downloaded and put UBCD on a bootable USB to use another drivewiping tool to completely erase the C: drive, hoping it'll take the problem with it. No luck. I try to use TestDisk to make my J: my primary active drive, but no luck. I still get the "OS is missing" error. Or sometimes it'll hang at Verifying DMI Pool. Or sometimes I'll get the "NTLDR is missing" error. I get hold of Hiren's and put it on another bootable USB. I first I tried the Boot Windows 7 from Hard Drive option, and I get "Error 15: File Not Found". I tried the "Fix 'NTLDR is Missing'" option (I'm not quite sure why this is even showing up, since I'm trying to get into a HDD with Windows 7 installed. Probably messed up somewhere when I used TestDisk) and I get this list: I'll run through the error messages I get: 1st Try - Windows could not start because the following file is missing or corrupt: \system32\hal.dll 2nd Try - Windows could not start because the following file is missing or corrupt: \system32\ntoskrnl.exe 3rd Try - Windows could not start because of a computer disk hardware configuration problem. Could not read from the selected boot disk. Check boot path and disk hardware. 4th - 8th Try - Same as #3 9th Try - I/O Error accessing boot sector file multi(0)disk(0)fdisk(0)\BOOTSEC.DOS. And computer freezes. 10th Try - computer restarts Needless to say, not a single one of those works. I then tried to open up the Windows 7 exe I have sitting on my J: from the Mini-XP OS on Hiren's, but it won't run because I'm trying to run a 64-bit file from a 32-bit exe. At least, that's the problem according to these guys: http://social.technet.microsoft.com/...-b2f54e9c7d18/ I then borrowed a 64-bit Windows Home Premium CD from a friend to get to the recovery options. But I get the error message: This version of System Recovery Options is not compatible with the version of Windows you are trying to repair. Try using a recovery disc that is compatible with this version of Windows. I pressed Shift + F10 to get to the Command Prompt directly. These are the exact steps I took from there (paraphrased a little): X:\Sources>bootrec /Fixmbr The operation completed successfully. X:\Sources>bootrec /Fixboot The operation completed successfully. I restarted my computer, but it still didn't work. I unplugged the C: drive, then tried bootrec and Diskpart: X:\Sources> bootrec.exe X:\Sources> bootrec /RebuildBcd Total identified Windows installations: 1 [1] \\?\GLOBALROOT\Device\HarddiskVolume1\Windows Add installation to bootlist? Yes(Y)/No(N)/All(A):y The requested system device cannot be found. X:\Sources>DiskPart DISKPART> List Disk Disk # Status Size Free Dyn Gpt Disk 0_Online_465GB_0B_______* Disk 1 Online 1000MB 0B (this is Hiren's on a bootable usb) DISKPART> Select Disk 0 Disk 0 is now the selected disk. DISKPART> List Partition Partition # Type Size Offset Partition 1 System 465GB 31KB DISKPART> Select Partition 1 Partition 1 is now the selected partition DISKPART> Active The selected disk is not a fixed MBR disk. The ACTIVE command can only be used on fixed MBR disks. DISKPART> exit Leaving Diskpart... X:\Sources>bootrec /Fixmbr The operation completed successfully. X:\Sources>bootrec /Fixboot The operation completed successfully. Before I go any further, is there anything I'm overlooking/doing wrong? All I care about is making the J: and Windows 7 bootable again. SPECS: Windows 7 Professional 64-Bit GIGABYTE - Motherboard - Socket 775 - GA-P35-DS3R (rev. 2.1) Crucial Ballistix 2048MB PC6400 DDR2 800MHz (2x2GB) Intel Core 2 Duo E6700 Processor (2.6 (6GHZ) I think... not sure anymore C: HDD - SAMSUNG HD103UJ (1TB, not plugged in) J: HDD - WDC WD5000AKS-00V1A0 (500GB)

Read the article
xen domUs crashes or unavailability

- by Rush

I've xen server with 8 domU. Server is Xeon E31270 with 16gb ram. I think it is enough for 8 machines. Sometimes domU's crashes and i can't figure out the reason. After crash i can connect to console and there is somthing like this: Oct 8 22:20:49 server kernel: [30892.320780] lowmem_reserve[]: 0 0 0 0 Oct 8 22:20:49 server kernel: [30892.320790] Node 0 DMA: 10*4kB 3*8kB 13*16kB 10*32kB 7*64kB 3*128kB 2*256kB 2*512kB 1*1024kB 2*2048kB 0*4096kB = 8080kB Oct 8 22:20:49 server kernel: [30892.320817] Node 0 DMA32: 648*4kB 2*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 5760kB Oct 8 22:20:49 server kernel: [30892.320842] 1491 total pagecache pages Oct 8 22:20:49 server kernel: [30892.320847] 0 pages in swap cache Oct 8 22:20:49 server kernel: [30892.320852] Swap cache stats: add 0, delete 0, find 0/0 Oct 8 22:20:49 server kernel: [30892.320858] Free swap = 0kB Oct 8 22:20:49 server kernel: [30892.320862] Total swap = 0kB Oct 8 22:20:49 server kernel: [30892.324024] 524288 pages RAM Oct 8 22:20:49 server kernel: [30892.324024] 11010 pages reserved Oct 8 22:20:49 server kernel: [30892.324024] 424467 pages shared Oct 8 22:20:49 server kernel: [30892.324024] 503538 pages non-shared Oct 8 22:20:49 server kernel: [30892.330308] apache2 invoked oom-killer: gfp_mask=0x200da, order=0, oom_adj=0 Oct 8 22:20:49 server kernel: [30892.330322] apache2 cpuset=/ mems_allowed=0 Oct 8 22:20:49 server kernel: [30892.330330] Pid: 23938, comm: apache2 Not tainted 2.6.32-5-xen-amd64 #1 Oct 8 22:20:49 server kernel: [30892.330337] Call Trace: Oct 8 22:20:49 server kernel: [30892.330349] [<ffffffff810b7180>] ? oom_kill_process+0x7f/0x23f Oct 8 22:20:49 server kernel: [30892.330358] [<ffffffff810b76a4>] ? __out_of_memory+0x12a/0x141 Oct 8 22:20:49 server kernel: [30892.330367] [<ffffffff810b77fb>] ? out_of_memory+0x140/0x172 Oct 8 22:20:49 server kernel: [30892.330376] [<ffffffff810bb59c>] ? __alloc_pages_nodemask+0x4e5/0x5f5 Oct 8 22:20:49 server kernel: [30892.330385] [<ffffffff810cc224>] ? do_wp_page+0x386/0x707 Oct 8 22:20:49 server kernel: [30892.330395] [<ffffffff8100c3a5>] ? __raw_callee_save_xen_pud_val+0x11/0x1e Oct 8 22:20:49 server kernel: [30892.330404] [<ffffffff8100c369>] ? __raw_callee_save_xen_pmd_val+0x11/0x1e Oct 8 22:20:49 server kernel: [30892.330412] [<ffffffff810cdfc7>] ? handle_mm_fault+0x7aa/0x80f Oct 8 22:20:49 server kernel: [30892.330422] [<ffffffff8130f906>] ? do_page_fault+0x2e0/0x2fc Oct 8 22:20:49 server kernel: [30892.330433] [<ffffffff8130d7a5>] ? page_fault+0x25/0x30 Oct 8 22:20:49 server kernel: [30892.330439] Mem-Info: Oct 8 22:20:49 server kernel: [30892.330443] Node 0 DMA per-cpu: Oct 8 22:20:49 server kernel: [30892.330450] CPU 0: hi: 0, btch: 1 usd: 0 Oct 8 22:20:49 server kernel: [30892.330463] CPU 1: hi: 0, btch: 1 usd: 0 Oct 8 22:20:49 server kernel: [30892.330466] Node 0 DMA32 per-cpu: Oct 8 22:20:49 server kernel: [30892.330469] CPU 0: hi: 186, btch: 31 usd: 0 Oct 8 22:20:49 server kernel: [30892.330472] CPU 1: hi: 186, btch: 31 usd: 60 Oct 8 22:20:49 server kernel: [30892.330476] active_anon:342076 inactive_anon:115398 isolated_anon:0 Oct 8 22:20:49 server kernel: [30892.330477] active_file:268 inactive_file:481 isolated_file:0 Oct 8 22:20:49 server kernel: [30892.330477] unevictable:1125 dirty:2 writeback:13 unstable:0 Oct 8 22:20:49 server kernel: [30892.330478] free:3410 slab_reclaimable:1718 slab_unreclaimable:6946 Oct 8 22:20:49 server kernel: [30892.330478] mapped:899 shmem:113 pagetables:35697 bounce:0 Oct 8 22:20:49 server kernel: [30892.330502] Node 0 DMA free:8036kB min:32kB low:40kB high:48kB active_anon:1144kB inactive_anon:1268kB active_file:8kB inactive_file:8kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:11792kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:224kB kernel_stack:16kB pagetables:1228kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no Oct 8 22:20:49 server kernel: [30892.330518] lowmem_reserve[]: 0 2004 2004 2004 Oct 8 22:20:49 server kernel: [30892.330523] Node 0 DMA32 free:5604kB min:5708kB low:7132kB high:8560kB active_anon:1367160kB inactive_anon:460324kB active_file:1064kB inactive_file:1916kB unevictable:4500kB isolated(anon):0kB isolated(file):0kB present:2052320kB mlocked:4500kB dirty:8kB writeback:52kB mapped:3600kB shmem:452kB slab_reclaimable:6872kB slab_unreclaimable:27560kB kernel_stack:3528kB pagetables:141560kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:992 all_unreclaimable? no Oct 8 22:20:49 server kernel: [30892.330539] lowmem_reserve[]: 0 0 0 0 Oct 8 22:20:49 server kernel: [30892.330544] Node 0 DMA: 1*4kB 2*8kB 13*16kB 10*32kB 7*64kB 3*128kB 2*256kB 2*512kB 1*1024kB 2*2048kB 0*4096kB = 8036kB Oct 8 22:20:49 server kernel: [30892.330579] Node 0 DMA32: 609*4kB 2*8kB 1*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 1*1024kB 1*2048kB 0*4096kB = 5604kB Oct 8 22:20:49 server kernel: [30892.330605] 1522 total pagecache pages Oct 8 22:20:49 server kernel: [30892.330610] 0 pages in swap cache Oct 8 22:20:49 server kernel: [30892.330615] Swap cache stats: add 0, delete 0, find 0/0 Oct 8 22:20:49 server kernel: [30892.330621] Free swap = 0kB Oct 8 22:20:49 server kernel: [30892.330625] Total swap = 0kB Oct 8 22:20:49 server kernel: [30892.333018] 524288 pages RAM Oct 8 22:20:49 server kernel: [30892.333018] 11010 pages reserved Oct 8 22:20:49 server kernel: [30892.333018] 424367 pages shared Oct 8 22:20:49 server kernel: [30892.333018] 503658 pages non-shared Seems like there isn't enough memory for this domU. But there is no any memory problems reported in munin monitoring: As you see system uses around 0.2G and 1G is available. So my question is: Is it xen specific problem, that real memory usage and memory usage that shows munin are different (I've never seen such problems oh real hardware machines)? Or maybe it is just monitoring problem, that can't catch moment when there is unusual high load and domU go down? And how I can to defeat this problem? it is really annoying to catch messages in e-mail that domU went down. Btw, such situation was when domU had 2G memory.

Read the article
Various problems with software raid1 array built with Samsung 840 Pro SSDs

- by Andy B

I am bringing to ServerFault a problem that is tormenting me for 6+ months. I have a CentOS 6 (64bit) server with an md software raid-1 array with 2 x Samsung 840 Pro SSDs (512GB). Problems: Serious write speed problems: root [~]# time dd if=arch.tar.gz of=test4 bs=2M oflag=sync 146+1 records in 146+1 records out 307191761 bytes (307 MB) copied, 23.6788 s, 13.0 MB/s real 0m23.680s user 0m0.000s sys 0m0.932s When doing the above (or any other larger copy) the load spikes to unbelievable values (even over 100) going up from ~ 1. When doing the above I've also noticed very weird iostat results: Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 0.00 1589.50 0.00 54.00 0.00 13148.00 243.48 0.60 11.17 0.46 2.50 sdb 0.00 1627.50 0.00 16.50 0.00 9524.00 577.21 144.25 1439.33 60.61 100.00 md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 md2 0.00 0.00 0.00 1602.00 0.00 12816.00 8.00 0.00 0.00 0.00 0.00 md0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 And it keeps it this way until it actually writes the file to the device (out from swap/cache/memory). The problem is that the second SSD in the array has svctm and await roughly 100 times larger than the second. For some reason the wear is different between the 2 members of the array root [~]# smartctl --attributes /dev/sda | grep -i wear 177 Wear_Leveling_Count 0x0013 094% 094 000 Pre-fail Always - 180 root [~]# smartctl --attributes /dev/sdb | grep -i wear 177 Wear_Leveling_Count 0x0013 070% 070 000 Pre-fail Always - 1005 The first SSD has a wear of 6% while the second SSD has a wear of 30%!! It's like the second SSD in the array works at least 5 times as hard as the first one as proven by the first iteration of iostat (the averages since reboot): Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util sda 10.44 51.06 790.39 125.41 8803.98 1633.11 11.40 0.33 0.37 0.06 5.64 sdb 9.53 58.35 322.37 118.11 4835.59 1633.11 14.69 0.33 0.76 0.29 12.97 md1 0.00 0.00 1.88 1.33 15.07 10.68 8.00 0.00 0.00 0.00 0.00 md2 0.00 0.00 1109.02 173.12 10881.59 1620.39 9.75 0.00 0.00 0.00 0.00 md0 0.00 0.00 0.41 0.01 3.10 0.02 7.42 0.00 0.00 0.00 0.00 What I've tried: I've updated the firmware to DXM05B0Q (following reports of dramatic improvements for 840Ps after this update). I have looked for "hard resetting link" in dmesg to check for cable/backplane issues but nothing. I have checked the alignment and I believe they are aligned correctly (1MB boundary, listing below) I have checked /proc/mdstat and the array is Optimal (second listing below). root [~]# fdisk -ul /dev/sda Disk /dev/sda: 512.1 GB, 512110190592 bytes 255 heads, 63 sectors/track, 62260 cylinders, total 1000215216 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x00026d59 Device Boot Start End Blocks Id System /dev/sda1 2048 4196351 2097152 fd Linux raid autodetect Partition 1 does not end on cylinder boundary. /dev/sda2 * 4196352 4605951 204800 fd Linux raid autodetect Partition 2 does not end on cylinder boundary. /dev/sda3 4605952 814106623 404750336 fd Linux raid autodetect root [~]# fdisk -ul /dev/sdb Disk /dev/sdb: 512.1 GB, 512110190592 bytes 255 heads, 63 sectors/track, 62260 cylinders, total 1000215216 sectors Units = sectors of 1 * 512 = 512 bytes Sector size (logical/physical): 512 bytes / 512 bytes I/O size (minimum/optimal): 512 bytes / 512 bytes Disk identifier: 0x0003dede Device Boot Start End Blocks Id System /dev/sdb1 2048 4196351 2097152 fd Linux raid autodetect Partition 1 does not end on cylinder boundary. /dev/sdb2 * 4196352 4605951 204800 fd Linux raid autodetect Partition 2 does not end on cylinder boundary. /dev/sdb3 4605952 814106623 404750336 fd Linux raid autodetect /proc/mdstat root # cat /proc/mdstat Personalities : [raid1] md0 : active raid1 sdb2[1] sda2[0] 204736 blocks super 1.0 [2/2] [UU] md2 : active raid1 sdb3[1] sda3[0] 404750144 blocks super 1.0 [2/2] [UU] md1 : active raid1 sdb1[1] sda1[0] 2096064 blocks super 1.1 [2/2] [UU] unused devices: Running a read test with hdparm root [~]# hdparm -t /dev/sda /dev/sda: Timing buffered disk reads: 664 MB in 3.00 seconds = 221.33 MB/sec root [~]# hdparm -t /dev/sdb /dev/sdb: Timing buffered disk reads: 288 MB in 3.01 seconds = 95.77 MB/sec But look what happens if I add --direct root [~]# hdparm --direct -t /dev/sda /dev/sda: Timing O_DIRECT disk reads: 788 MB in 3.01 seconds = 262.08 MB/sec root [~]# hdparm --direct -t /dev/sdb /dev/sdb: Timing O_DIRECT disk reads: 534 MB in 3.02 seconds = 176.90 MB/sec Both tests increase but /dev/sdb doubles while /dev/sda increases maybe 20%. I just don't know what to make of this. As suggested by Mr. Wagner I've done another read test with dd this time and it confirms the hdparm test: root [/home2]# dd if=/dev/sda of=/dev/null bs=1G count=10 10+0 records in 10+0 records out 10737418240 bytes (11 GB) copied, 38.0855 s, 282 MB/s root [/home2]# dd if=/dev/sdb of=/dev/null bs=1G count=10 10+0 records in 10+0 records out 10737418240 bytes (11 GB) copied, 115.24 s, 93.2 MB/s So sda is 3 times faster than sdb. Or maybe sdb is doing also something else besides what sda does. Is there some way to find out if sdb is doing more than what sda does? UPDATE Again, as suggested by Mr. Wagner, I have swapped the 2 SSDs. And as he thought it would happen, the problem moved from sdb to sda. So I guess I'll RMA one of the SSDs. I wonder if the cage might be problematic. What is wrong with this array? Please help!

Read the article
curl can't verify cert using capath, but can with cacert option

- by phylae

I am trying to use curl to connect to a site using HTTPS. But curl is failing to verify the SSL cert. $ curl --verbose --capath ./certs/ --head https://example.com/ * About to connect() to example.com port 443 (#0) * Trying 1.1.1.1... connected * Connected to example.com (1.1.1.1) port 443 (#0) * successfully set certificate verify locations: * CAfile: none CApath: ./certs/ * SSLv3, TLS handshake, Client hello (1): * SSLv3, TLS handshake, Server hello (2): * SSLv3, TLS handshake, CERT (11): * SSLv3, TLS alert, Server hello (2): * SSL certificate problem, verify that the CA cert is OK. Details: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed * Closing connection #0 curl: (60) SSL certificate problem, verify that the CA cert is OK. Details: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed More details here: http://curl.haxx.se/docs/sslcerts.html curl performs SSL certificate verification by default, using a "bundle" of Certificate Authority (CA) public keys (CA certs). If the default bundle file isn't adequate, you can specify an alternate file using the --cacert option. If this HTTPS server uses a certificate signed by a CA represented in the bundle, the certificate verification probably failed due to a problem with the certificate (it might be expired, or the name might not match the domain name in the URL). If you'd like to turn off curl's verification of the certificate, use the -k (or --insecure) option. I know about the -k option. But I do actually want to verify the cert. The certs directory has been properly hashed with c_rehash . and it contains: A Verisign intermediate cert Two self-signed certs The above site should be verified with the Verisign intermediate cert. When I use the --cacert option instead (and point directly to the Verisign cert) curl is able to verify the SSL cert. $ curl --verbose --cacert ./certs/verisign-intermediate-ca.crt --head https://example.com/ * About to connect() to example.com port 443 (#0) * Trying 1.1.1.1... connected * Connected to example.com (1.1.1.1) port 443 (#0) * successfully set certificate verify locations: * CAfile: ./certs/verisign-intermediate-ca.crt CApath: /etc/ssl/certs * SSLv3, TLS handshake, Client hello (1): * SSLv3, TLS handshake, Server hello (2): * SSLv3, TLS handshake, CERT (11): * SSLv3, TLS handshake, Server finished (14): * SSLv3, TLS handshake, Client key exchange (16): * SSLv3, TLS change cipher, Client hello (1): * SSLv3, TLS handshake, Finished (20): * SSLv3, TLS change cipher, Client hello (1): * SSLv3, TLS handshake, Finished (20): * SSL connection using RC4-SHA * Server certificate: * subject: C=US; ST=State; L=City; O=Company; OU=ou1; CN=example.com * start date: 2011-04-17 00:00:00 GMT * expire date: 2012-04-15 23:59:59 GMT * common name: example.com (matched) * issuer: C=US; O=VeriSign, Inc.; OU=VeriSign Trust Network; OU=Terms of use at https://www.verisign.com/rpa (c)10; CN=VeriSign Class 3 Secure Server CA - G3 * SSL certificate verify ok. > HEAD / HTTP/1.1 > User-Agent: curl/7.19.7 (x86_64-pc-linux-gnu) libcurl/7.19.7 OpenSSL/0.9.8k zlib/1.2.3.3 libidn/1.15 > Host: example.com > Accept: */* > < HTTP/1.1 404 Not Found HTTP/1.1 404 Not Found < Cache-Control: must-revalidate,no-cache,no-store Cache-Control: must-revalidate,no-cache,no-store < Content-Type: text/html;charset=ISO-8859-1 Content-Type: text/html;charset=ISO-8859-1 < Content-Length: 1267 Content-Length: 1267 < Server: Jetty(7.2.2.v20101205) Server: Jetty(7.2.2.v20101205) < * Connection #0 to host example.com left intact * Closing connection #0 * SSLv3, TLS alert, Client hello (1): In addition, if I try hitting one of the sites using a self signed cert and the --capath option, it also works. (Let me know if I should post an example of that.) This implies that curl is finding the cert directory, and it is properly hash. Finally, I am able to verify the SSL cert with openssl, using its -CApath option. $ openssl s_client -CApath ./certs/ -connect example.com:443 CONNECTED(00000003) depth=3 /C=US/O=VeriSign, Inc./OU=Class 3 Public Primary Certification Authority verify return:1 depth=2 /C=US/O=VeriSign, Inc./OU=VeriSign Trust Network/OU=(c) 2006 VeriSign, Inc. - For authorized use only/CN=VeriSign Class 3 Public Primary Certification Authority - G5 verify return:1 depth=1 /C=US/O=VeriSign, Inc./OU=VeriSign Trust Network/OU=Terms of use at https://www.verisign.com/rpa (c)10/CN=VeriSign Class 3 Secure Server CA - G3 verify return:1 depth=0 /C=US/ST=State/L=City/O=Company/OU=ou1/CN=example.com verify return:1 --- Certificate chain 0 s:/C=US/ST=State/L=City/O=Company/OU=ou1/CN=example.com i:/C=US/O=VeriSign, Inc./OU=VeriSign Trust Network/OU=Terms of use at https://www.verisign.com/rpa (c)10/CN=VeriSign Class 3 Secure Server CA - G3 --- Server certificate -----BEGIN CERTIFICATE----- <cert removed> -----END CERTIFICATE----- subject=/C=US/ST=State/L=City/O=Company/OU=ou1/CN=example.com issuer=/C=US/O=VeriSign, Inc./OU=VeriSign Trust Network/OU=Terms of use at https://www.verisign.com/rpa (c)10/CN=VeriSign Class 3 Secure Server CA - G3 --- No client certificate CA names sent --- SSL handshake has read 1563 bytes and written 435 bytes --- New, TLSv1/SSLv3, Cipher is RC4-SHA Server public key is 2048 bit Secure Renegotiation IS NOT supported Compression: NONE Expansion: NONE SSL-Session: Protocol : TLSv1 Cipher : RC4-SHA Session-ID: D65C4C6D52E183BF1E7543DA6D6A74EDD7D6E98EB7BD4D48450885188B127717 Session-ID-ctx: Master-Key: 253D4A3477FDED5FD1353D16C1F65CFCBFD78276B6DA1A078F19A51E9F79F7DAB4C7C98E5B8F308FC89C777519C887E2 Key-Arg : None Start Time: 1303258052 Timeout : 300 (sec) Verify return code: 0 (ok) --- QUIT DONE How can I get curl to verify this cert using the --capath option?

Read the article
SPF record doesn't work (not sure which DNS server to tweak)

- by Ion

Problem: Google (and perhaps others) marks our emails as SPF neutral. Let me give you some background about the setup: initially got a dedicated server (Hetzner) with Plesk installed to host a domain/web application, let's say: bigjaws.com. Plesk automatically creates a DNS zone for it with some records for the various services it provides out of the box, e.g. webmail.bigjaws.com as a CNAME to bigjaws.com to provide Horde/whatever, etc. Let me point out four relevant of these records (where XXX.XXX.XXX.158 is our dedicated IP): bigjaws.com. A XXX.XXX.XXX.158 mail.bigjaws.com. A XXX.XXX.XXX.158 bigjaws.com MX (10) mail.bigjaws.com. bigjaws.com. TXT v=spf1 +a +mx -all The above records are not(?) valid anymore though, because after using this dedicated server for a while, our site got bigger and bigger so we decided to move our operations over to AWS (EC2, RDS, ELB, etc), but we retained the mail functionality as is, i.e. emails from [email protected] are sent by connecting to our dedicated server where Plesk takes care of things. This was decided in order not to setup anything from scratch. Of course for all DNS-related things we now use Route53. In Route53 I have the following records: mail.schoox.com. A XXX.XXX.XXX.158 bigjaws.com. MX (10) mail.bigjaws.com bigjaws.com. SPF "v=spf1 +ip4:XXX.XXX.XXX.158 +mx ~all" From my understanding of SPF, the SPF status should have been passed: I designate that all email being sent by bigjaws.com from XXX.XXX.XXX.158 are valid/not spam (I added +mx there but I'm not sure if needed). When a mail server receives an email, doesn't it lookup the SPF record of the domain and checks against the IP it got the email from? Checking with spfquery: root@box:~# spfquery -ip XXX.XXX.XXX.158 -sender [email protected] -rcpt-to [email protected] StartError Context: Failed to query MAIL-FROM ErrorCode: (2) Could not find a valid SPF record Error: No DNS data for 'bigjaws.com'. EndError noneneutral Please see http://www.openspf.org/Why?id=employee1%40bigjaws.com&ip=XXX.XXX.XXX.158&receiver=spfquery : Reason: default spfquery: XXX.XXX.XXX.158 is neither permitted nor denied by domain of bigjaws.com Received-SPF: neutral (spfquery: XXX.XXX.XXX.158 is neither permitted nor denied by domain of bigjaws.com) client-ip=XXX.XXX.XXX.158; [email protected]; If I go to the address listed above (openspf.org) it tells me that the message should have been accepted(!): spfquery rejected a message that claimed an envelope sender address of [email protected]. spfquery received a message from static.158.XXX.XXX.XXX.clients.your-server.de (XXX.XXX.XXX.158) that claimed an envelope sender address of [email protected]. The domain bigjaws.com has authorized static.158.XXX.XXX.XXX.clients.your-server.de (XXX.XXX.XXX.158) to send mail on its behalf, so the message should have been accepted. It is impossible for us to say why it was rejected. What should I do? If the problem persists, contact the bigjaws.com postmaster. Also, here are some headers from an email sent by one of our [email protected] addresses to a gmail.com address (by the way, bigjaws.de listed in the "Received: from" field was the initial domain hosted on the dedicated server before adding the .com one -- both are still listed as separate subscriptions under Plesk). Delivered-To: [email protected] Received: by 10.14.177.70 with SMTP id c46csp289656eem; Wed, 23 Oct 2013 01:11:00 -0700 (PDT) X-Received: by 10.14.102.66 with SMTP id c42mr306186eeg.47.1382515860386; Wed, 23 Oct 2013 01:11:00 -0700 (PDT) Return-Path: <[email protected]> Received: from bigjaws.de (static.158.XXX.XXX.XXX.clients.your-server.de. [XXX.XXX.XXX.158]) by mx.google.com with ESMTPS id l4si19438578eew.161.2013.10.23.01.10.59 for <[email protected]> (version=TLSv1 cipher=RC4-SHA bits=128/128); Wed, 23 Oct 2013 01:10:59 -0700 (PDT) Received-SPF: neutral (google.com: XXX.XXX.XXX.158 is neither permitted nor denied by best guess record for domain of [email protected]) client-ip=XXX.XXX.XXX.158; Authentication-Results: mx.google.com; spf=neutral (google.com: XXX.XXX.XXX.158 is neither permitted nor denied by best guess record for domain of [email protected]) [email protected] DomainKey-Signature: a=rsa-sha1; q=dns; c=nofws; s=default; d=bigjaws.com; b=WwRAS0WKjp9lO17iMluYPXOHzqRcOueiQT4rPdvy3WFf0QzoXiy6rLfxU/Ra53jL1vlPbwlLNa5gjoJBi7ZwKfUcvs3s02hJI7b3ozl0fEgJtTPKoCfnwl4bLPbtXNFu; h=Received:Received:Message-ID:Date:From:User-Agent:MIME-Version:To:Subject:Content-Type:Content-Transfer-Encoding; Received: (qmail 22722 invoked from network); 23 Oct 2013 10:10:59 +0200 Received: from hostname.static.ISP.com (HELO ?192.168.1.60?) (YYY.YYY.ISP.IP) by static.158.XXX.XXX.XXX.clients.your-server.de. with ESMTPSA (DHE-RSA-AES256-SHA encrypted, authenticated); 23 Oct 2013 10:10:59 +0200 Message-ID: <[email protected]> Date: Wed, 23 Oct 2013 11:11:00 +0300 From: BigJaws Employee <[email protected]> User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.0.1 MIME-Version: 1.0 To: [email protected] Subject: test SPF Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit test SPF Any ideas why SPF is not working correctly? Also, are there any DNS settings that are not needed anymore and create a problem?

Read the article
Windows 7: Search indexing is stuck

- by Ricket

When I open Indexing Options, it says: 4,317 items indexed Indexing in progress. Search results might not be complete during this time. It's stuck at 4,317 though; no more items have been indexed. Worst of all, SearchIndexer.exe is taking up 100% CPU (well, 50%, but I have a dual core CPU; it's taking up all processing power it can). It is not causing hard drive activity though. I tried clicking "Troubleshoot search and indexing" at the bottom of the Indexing Options window, but it couldn't find any problem. I've also tried the repair registry key that several websites suggest; I change HKLM\SOFTWARE\Microsoft\Windows Search SetupCompletedSuccessfully to 0 and restarted the computer, and it apparently repaired because it flipped back to 1, but the same problem continues to occur. It's reducing the battery life of my laptop and making it really hot so that my fans are running all the time. I've had to disable the Windows Search service. How can I fix this? Do I need to just flat-out reformat my computer? Update: I've tried rebuilding a couple times. There's nothing unusual about the locations I have to index, and I don't have any downloads in progress or anything like that. I don't see any reason why it stopped, and I noticed it much too late to do a system restore. At this point, I'm hoping someone will offer up some secret answer that will fix the problem, thus the bounty. Another update: I tried starting the service again, just to let it try yet again. It seemed okay at first (Indexing Options showed it operating at reduced speed due to user activity, and the number of files was going up). A while later I checked, and the service had stopped. Event viewer revealed some errors like this: Log Name: Application Source: Application Error Date: 2/1/2010 7:34:23 PM Event ID: 1000 Task Category: (100) Level: Error Keywords: Classic User: N/A Computer: ricky-win7 Description: Faulting application name: SearchIndexer.exe, version: 7.0.7600.16385, time stamp: 0x4a5bcdd0 Faulting module name: NLSData0007.dll, version: 6.1.7600.16385, time stamp: 0x4a5bda88 Exception code: 0xc0000005 Fault offset: 0x002141ba Faulting process id: 0x13a0 Faulting application start time: 0x01caa39f2a70ec02 Faulting application path: C:\Windows\system32\SearchIndexer.exe Faulting module path: C:\Windows\System32\NLSData0007.dll Report Id: b4f7a7ae-0f92-11df-87fc-e5d65d8794c2 Event Xml: <Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event"> <System> <Provider Name="Application Error" /> <EventID Qualifiers="0">1000</EventID> <Level>2</Level> <Task>100</Task> <Keywords>0x80000000000000</Keywords> <TimeCreated SystemTime="2010-02-02T00:34:23.000000000Z" /> <EventRecordID>10689</EventRecordID> <Channel>Application</Channel> <Computer>ricky-win7</Computer> <Security /> </System> <EventData> <Data>SearchIndexer.exe</Data> <Data>7.0.7600.16385</Data> <Data>4a5bcdd0</Data> <Data>NLSData0007.dll</Data> <Data>6.1.7600.16385</Data> <Data>4a5bda88</Data> <Data>c0000005</Data> <Data>002141ba</Data> <Data>13a0</Data> <Data>01caa39f2a70ec02</Data> <Data>C:\Windows\system32\SearchIndexer.exe</Data> <Data>C:\Windows\System32\NLSData0007.dll</Data> <Data>b4f7a7ae-0f92-11df-87fc-e5d65d8794c2</Data> </EventData> </Event> If you are having the same error and arrived here from a Google search, please comment or add an answer detailing your progress on this, if any...

Read the article
Upgrading from TFS 2010 RC to TFS 2010 RTM done

- by Martin Hinshelwood

Today is the big day, with the Launch of Visual Studio 2010 already done in Asia, and rolling around the world towards us, we are getting ready for the RTM (Released). We have had TFS 2010 in Production for nearly 6 months and have had only minimal problems. Update 12th April 2010 – Added Scott Hanselman’s tweet about the MSDN download release time. SSW was the first company in the world outside of Microsoft to deploy Visual Studio 2010 Team Foundation Server to production, not once, but twice. I am hoping to make it 3 in a row, but with all the hype around the new version, and with it being a production release and not just a go-live, I think there will be a lot of competition. Developers: MSDN will be updated with #vs2010 downloads and details at 10am PST *today*! @shanselman - Scott Hanselman Same as before, we need to Uninstall 2010 RC and install 2010 RTM. The installer will take care of all the complexity of actually upgrading any schema changes. If you are upgrading from TFS 2008 to TFS2010 you can follow our Rules To Better TFS 2010 Migration and read my post on our successes. We run TFS 2010 in a Hyper-V virtual environment, so we have the advantage of running a snapshot as well as taking a DB backup. Done - Snapshot the hyper-v server Microsoft does not support taking a snapshot of a running server, for very good reason, and Brian Harry wrote a post after my last upgrade with the reason why you should never snapshot a running server. Done - Uninstall Visual Studio Team Explorer 2010 RC You will need to uninstall all of the Visual Studio 2010 RC client bits that you have on the server. Done - Uninstall TFS 2010 RC Done - Install TFS 2010 RTM Done - Configure TFS 2010 RTM Pick the Upgrade option and point it at your existing “tfs_Configuration” database to load all of the existing settings Done - Upgrade the SharePoint Extensions Upgrade Build Servers (Pending) Test the server The back out plan, and you should always have one, is to restore the snapshot. Upgrading to Team Foundation Server 2010 – Done The first thing you need to do is off the TFS server and then log into the Hyper-v server and create a snapshot. Figure: Make sure you turn the server off and delete all old snapshots before you take a new one I noticed that the snapshot that was taken before the Beta 2 to RC upgrade was still there. You should really delete old snapshots before you create a new one, but in this case the SysAdmin (who is currently tucked up in bed) asked me not to. I guess he is worried about a developer messing up his server Turn your server on and wait for it to boot in anticipation of all the nice shiny RTM’ness that is coming next. The upgrade procedure for TFS2010 is to uninstal the old version and install the new one. Figure: Remove Visual Studio 2010 Team Foundation Server RC from the system. Figure: Most of the heavy lifting is done by the Uninstaller, but make sure you have removed any of the client bits first. Specifically Visual Studio 2010 or Team Explorer 2010. Once the uninstall is complete, this took around 5 minutes for me, you can begin the install of the RTM. Running the 64 bit OS will allow the application to use more than 2GB RAM, which while not common may be of use in heavy load situations. Figure: It is always recommended to install the 64bit version of a server application where possible. I do not think it is likely, with SharePoint 2010 and Exchange 2010 and even Windows Server 2008 R2 being 64 bit only, I do not think there will be another release of a server app that is 32bit. You then need to choose what it is you want to install. This depends on how you are running TFS and on how many servers. In our case we run TFS and the Team Foundation Build Service (controller only) on out TFS server along with Analysis services and Reporting Services. But our SharePoint server lives elsewhere. Figure: This always confuses people, but in reality it makes sense. Don’t install what you do not need. Every extra you install has an impact of performance. If you are integrating with SharePoint you will need to run this install on every Front end server in your farm and don’t forget to upgrade your Build servers and proxy servers later. Figure: Selecting only Team Foundation Server (TFS) and Team Foundation Build Services (TFBS) It is worth noting that if you have a lot of builds kicking off, and hence a lot of get operations against your TFS server, you can use a proxy server to cache the source control on another server in between your TFS server and your build servers. Figure: Installing Microsoft .NET Framework 4 takes the most time. Figure: Now run Windows Update, and SSW Diagnostic to make sure all your bits and bobs are up to date. Note: SSW Diagnostic will check your Power Tools, Add-on’s, Check in Policies and other bits as well. Configure Team Foundation Server 2010 – Done Now you can configure the server. If you have no key you will need to pick “Install a Trial Licence”, but it is only £500, or free with a MSDN subscription. Anyway, if you pick Trial you get 90 days to get your key. Figure: You can pick trial and add your key later using the TFS Server Admin. Here is where the real choices happen. We are doing an Upgrade from a previous version, so I will pick Upgrade the same as all you folks that are using the RC or TFS 2008. Figure: The upgrade wizard takes your existing 2010 or 2008 databases and upgraded them to the release. Once you have entered your database server name you can click “List available databases” and it will show what it can upgrade. Figure: Select your database from the list and at this point, make sure you have a valid backup. At this point you have not made ANY changes to the databases. At this point the configuration wizard will load configuration from your existing database if you have one. If you are upgrading TFS 2008 refer to Rules To Better TFS 2010 Migration. Mostly during the wizard the default values will suffice, but depending on the configuration you want you can pick different options. Figure: Set the application tier account and Authentication method to use. We use NTLM to keep things simple as we host our TFS server externally for our remote developers. Figure: Setting your TFS server URL’s to be the remote URL’s allows the reports to be accessed without using VPN. Very handy for those remote developers. Figure: Detected the existing Warehouse no problem. Figure: Again we love green ticks. It gives us a warm fuzzy feeling. Figure: The username for connecting to Reporting services should be a domain account (if you are on a domain that is). Figure: Setup the SharePoint integration to connect to your external SharePoint server. You can take the option to connect later. You then need to run all of your readiness checks. These check can save your life! it will check all of the settings that you have entered as well as checking all the external services are configures and running properly. There are two reasons that TFS 2010 is so easy and painless to install where previous version were not. Microsoft changes the install to two steps, Install and configuration. The second reason is that they have pulled out all of the stops in making the install run all the checks necessary to make sure that once you start the install that it will complete. if you find any errors I recommend that you report them on http://connect.microsoft.com so everyone can benefit from your misery. Figure: Now we have everything setup the configuration wizard can do its work. Figure: Took a while on the “Web site” stage for some point, but zipped though after that. Figure: last wee bit. TFS Needs to do a little tinkering with the data to complete the upgrade. Figure: All upgraded. I am not worried about the yellow triangle as SharePoint was being a little silly Exception Message: TF254021: The account name or password that you specified is not valid. (type TfsAdminException) Exception Stack Trace: at Microsoft.TeamFoundation.Management.Controls.WizardCommon.AccountSelectionControl.TestLogon(String connectionString) at System.ComponentModel.BackgroundWorker.WorkerThreadStart(Object argument) [Info @16:10:16.307] Benign exception caught as part of verify: Exception Message: TF255329: The following site could not be accessed: http://projects.ssw.com.au/. The server that you specified did not return the expected response. Either you have not installed the Team Foundation Server Extensions for SharePoint Products on this server, or a firewall is blocking access to the specified site or the SharePoint Central Administration site. For more information, see the Microsoft Web site (http://go.microsoft.com/fwlink/?LinkId=161206). (type TeamFoundationServerException) Exception Stack Trace: at Microsoft.TeamFoundation.Client.SharePoint.WssUtilities.VerifyTeamFoundationSharePointExtensions(ICredentials credentials, Uri url) at Microsoft.TeamFoundation.Admin.VerifySharePointSitesUrl.Verify() Inner Exception Details: Exception Message: TF249064: The following Web service returned an response that is not valid: http://projects.ssw.com.au/_vti_bin/TeamFoundationIntegrationService.asmx. This Web service is used for the Team Foundation Server Extensions for SharePoint Products. Either the extensions are not installed, the request resulted in HTML being returned, or there is a problem with the URL. Verify that the following URL points to a valid SharePoint Web application and that the application is available: http://projects.ssw.com.au. If the URL is correct and the Web application is operating normally, verify that a firewall is not blocking access to the Web application. (type TeamFoundationServerInvalidResponseException) Exception Data Dictionary: ResponseStatusCode = InternalServerError I’ll look at SharePoint after, probably the SharePoint box just needs a restart or a kick If there is a problem with SharePoint it will come out in testing, But I will definatly be passing this on to Microsoft. Upgrading the SharePoint connector to TFS 2010 You will need to upgrade the Extensions for SharePoint Products and Technologies on all of your SharePoint farm front end servers. To do this uninstall the TFS 2010 RC from it in the same way as the server, and then install just the RTM Extensions. Figure: Only install the SharePoint Extensions on your SharePoint front end servers. TFS 2010 supports both SharePoint 2007 and SharePoint 2010. Figure: When you configure SharePoint it uploads all of the solutions and templates. Figure: Everything is uploaded Successfully. Figure: TFS even remembered the settings from the previous installation, fantastic. Upgrading the Team Foundation Build Servers to TFS 2010 Just like on the SharePoint servers you will need to upgrade the Build Server to the RTM. Just uninstall TFS 2010 RC and then install only the Team Foundation Build Services component. Unlike on the SharePoint server you will probably have some version of Visual Studio installed. You will need to remove this as well. (Coming Soon) Connecting Visual Studio 2010 / 2008 / 2005 and Eclipse to TFS2010 If you have developers still on Visual Studio 2005 or 2008 you will need do download the respective compatibility pack: Visual Studio Team System 2005 Service Pack 1 Forward Compatibility Update for Team Foundation Server 2010 Visual Studio Team System 2008 Service Pack 1 Forward Compatibility Update for Team Foundation Server 2010 If you are using Eclipse you can download the new Team Explorer Everywhere install for connecting to TFS. Get your developers to check that you have the latest version of your applications with SSW Diagnostic which will check for Service Packs and hot fixes to Visual Studio as well. Technorati Tags: TFS,TFS2010,TFS 2010,Upgrade

Read the article
MVVM: Binding radio buttons to a view model?

- by David Veeneman

I have been trying to bind a group of radio buttons to a view model using the IsChecked button. After reviewing other posts, it appears that the IsChecked property simply doesn't work. I have put together a short demo that reproduces the problem, which I have included below. Here is my question: Is there a straightforward and reliable way to bind radio buttons using MVVM? Thanks. Additional information: The IsChecked property doesn't work for two reasons: When a button is selected, the IsChecked properties of other buttons in the group don't get set to false. When a button is selected, its own IsChecked property does not get set after the first time the button is selected. I am guessing that the binding is getting trashed by WPF on the first click. Demo project: Here is the code and markup for a simple demo that reproduces the problem. Create a WPF project and replace the markup in Window1.xaml with the following: <Window x:Class="WpfApplication1.Window1" xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation" xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml" Title="Window1" Height="300" Width="300" Loaded="Window_Loaded"> <StackPanel> <RadioButton Content="Button A" IsChecked="{Binding Path=ButtonAIsChecked, Mode=TwoWay}" /> <RadioButton Content="Button B" IsChecked="{Binding Path=ButtonBIsChecked, Mode=TwoWay}" /> </StackPanel> </Window> Replace the code in Window1.xaml.cs with the following code (a hack), which sets the view model: using System.Windows; namespace WpfApplication1 { /// <summary> /// Interaction logic for Window1.xaml /// </summary> public partial class Window1 : Window { public Window1() { InitializeComponent(); } private void Window_Loaded(object sender, RoutedEventArgs e) { this.DataContext = new Window1ViewModel(); } } } Now add the following code to the project as Window1ViewModel.cs: using System.Windows; namespace WpfApplication1 { public class Window1ViewModel { private bool p_ButtonAIsChecked; /// <summary> /// Summary /// </summary> public bool ButtonAIsChecked { get { return p_ButtonAIsChecked; } set { p_ButtonAIsChecked = value; MessageBox.Show(string.Format("Button A is checked: {0}", value)); } } private bool p_ButtonBIsChecked; /// <summary> /// Summary /// </summary> public bool ButtonBIsChecked { get { return p_ButtonBIsChecked; } set { p_ButtonBIsChecked = value; MessageBox.Show(string.Format("Button B is checked: {0}", value)); } } } } To reproduce the problem, run the app and click Button A. A message box will appear, saying that Button A's IsChecked property has been set to true. Now select Button B. Another message box will appear, saying that Button B's IsChecked property has been set to true, but there is no message box indicating that Button A's IsChecked property has been set to false--the property hasn't been changed. Now click Button A again. The button will be selected in the window, but no message box will appear--the IsChecked property has not been changed. Finally, click on Button B again--same result. The IsChecked property is not updated at all for either button after the button is first clicked.

Read the article

Search Results

Search found 68825 results on 2753 pages for 'problem'.

Page 760/2753 | < Previous Page | 756 757 758 759 760 761 762 763 764 765 766 767 | Next Page >

- by sandeepan

- by James

- by Geoff Dalgas

- by Andrew B

- by user1038179

- by Hammer Bro.

- by vlad

- by argibbs

- by Samuel Linde

- by Manca Weeks

- by Hammer Bro.

- by Gerald

- by Adrian Frühwirth

- by AVandelay05

- by Rick Koshi

- by bala

- by Patrick

- by hom3lesshom3boy

- by Rush

- by Andy B

- by phylae

- by Ion

- by Ricket

- by Martin Hinshelwood

- by David Veeneman

< Previous Page | 756 757 758 759 760 761 762 763 764 765 766 767 | Next Page >