Search Results

Search found 5221 results on 209 pages for 'low latency'.

Page 207/209 | < Previous Page | 203 204 205 206 207 208 209 | Next Page >

.NET interview, code structure and the design

- by j_lewis

I have been given the below .NET question in an interview. I don’t know why I got low marks. Unfortunately I did not get a feedback. Question: The file hockey.csv contains the results from the Hockey Premier League. The columns ‘For’ and ‘Against’ contain the total number of goals scored for and against each team in that season (so Alabama scored 79 goals against opponents, and had 36 goals scored against them). Write a program to print the name of the team with the smallest difference in ‘for’ and ‘against’ goals. the structure of the hockey.csv looks like this (it is a valid csv file, but I just copied the values here to get an idea) Team - For - Against Alabama 79 36 Washinton 67 30 Indiana 87 45 Newcastle 74 52 Florida 53 37 New York 46 47 Sunderland 29 51 Lova 41 64 Nevada 33 63 Boston 30 64 Nevada 33 63 Boston 30 64 Solution: class Program { static void Main(string[] args) { string path = @"C:\Users\<valid csv path>"; var resultEvaluator = new ResultEvaluator(string.Format(@"{0}\{1}",path, "hockey.csv")); var team = resultEvaluator.GetTeamSmallestDifferenceForAgainst(); Console.WriteLine( string.Format("Smallest difference in ‘For’ and ‘Against’ goals > TEAM: {0}, GOALS DIF: {1}", team.Name, team.Difference )); Console.ReadLine(); } } public interface IResultEvaluator { Team GetTeamSmallestDifferenceForAgainst(); } public class ResultEvaluator : IResultEvaluator { private static DataTable leagueDataTable; private readonly string filePath; private readonly ICsvExtractor csvExtractor; public ResultEvaluator(string filePath){ this.filePath = filePath; csvExtractor = new CsvExtractor(); } private DataTable LeagueDataTable{ get { if (leagueDataTable == null) { leagueDataTable = csvExtractor.GetDataTable(filePath); } return leagueDataTable; } } public Team GetTeamSmallestDifferenceForAgainst() { var teams = GetTeams(); var lowestTeam = teams.OrderBy(p => p.Difference).First(); return lowestTeam; } private IEnumerable<Team> GetTeams() { IList<Team> list = new List<Team>(); foreach (DataRow row in LeagueDataTable.Rows) { var name = row["Team"].ToString(); var @for = int.Parse(row["For"].ToString()); var against = int.Parse(row["Against"].ToString()); var team = new Team(name, against, @for); list.Add(team); } return list; } } public interface ICsvExtractor { DataTable GetDataTable(string csvFilePath); } public class CsvExtractor : ICsvExtractor { public DataTable GetDataTable(string csvFilePath) { var lines = File.ReadAllLines(csvFilePath); string[] fields; fields = lines[0].Split(new[] { ',' }); int columns = fields.GetLength(0); var dt = new DataTable(); //always assume 1st row is the column name. for (int i = 0; i < columns; i++) { dt.Columns.Add(fields[i].ToLower(), typeof(string)); } DataRow row; for (int i = 1; i < lines.GetLength(0); i++) { fields = lines[i].Split(new char[] { ',' }); row = dt.NewRow(); for (int f = 0; f < columns; f++) row[f] = fields[f]; dt.Rows.Add(row); } return dt; } } public class Team { public Team(string name, int against, int @for) { Name = name; Against = against; For = @for; } public string Name { get; private set; } public int Against { get; private set; } public int For { get; private set; } public int Difference { get { return (For - Against); } } } Output: Smallest difference in for' andagainst' goals TEAM: Boston, GOALS DIF: -34 Can someone please review my code and see anything obviously wrong here? They were only interested in the structure/design of the code and whether the program produces the correct result (i.e lowest difference). Much appreciated. "P.S - Please correct me if the ".net-interview" tag is not the right tag to use"

Read the article
Threads to make video out of images

- by masood

updates: I think/ suspect the imageIO is not thread safe. shared by all threads. the read() call might use resources that are also shared. Thus it will give the performance of a single thread no matter how many threads used. ? if its correct . what is the solution (in practical code) Single request and response model at one time do not utilizes full network/internet bandwidth, thus resulting in low performance. (benchmark is of half speed utilization or even lower) This is to make a video out of an IP cam that gives a new image on each request. http://149.5.43.10:8001/snapshot.jpg It makes a delay of 3 - 8 seconds no matter what I do. Changed thread no. and thread time intervals, debugged the code by System.out.println statements to see if threads work. All seems normal. Any help? Please show some practical code. You may modify mine. This code works (javascript) with much smoother frame rate and max bandwidth usage. but the later code (java) dont. same 3 to 8 seconds gap. <!DOCTYPE html> <html> <head> <script type="text/javascript"> (function(){ var img="/*url*/"; var interval=50; var pointer=0; function showImg(image,idx) { if(idx<=pointer) return; document.body.replaceChild(image,document.getElementsByTagName("img")[0]); pointer=idx; preload(); } function preload() { var cache=null,idx=0;; for(var i=0;i<5;i++) { idx=Date.now()+interval*(i+1); cache=new Image(); cache.onload=(function(ele,idx){return function(){showImg(ele,idx);};})(cache,idx); cache.src=img+"?"+idx; } } window.onload=function(){ document.getElementsByTagName("img")[0].onload=preload; document.getElementsByTagName("img")[0].src="/*initial url*/"; }; })(); </script> </head> <body> <img /> </body> </html> and of java (with problem) : package camba; import java.applet.Applet; import java.awt.Button; import java.awt.Graphics; import java.awt.Image; import java.awt.Label; import java.awt.Panel; import java.awt.TextField; import java.awt.event.ActionEvent; import java.awt.event.ActionListener; import java.net.URL; import java.security.Timestamp; import java.util.Date; import java.util.concurrent.TimeUnit; import java.util.concurrent.atomic.AtomicBoolean; import javax.imageio.ImageIO; public class Camba extends Applet implements ActionListener{ Image img; TextField textField; Label label; Button start,stop; boolean terminate = false; long viewTime; public void init(){ label = new Label("please enter camera URL "); add(label); textField = new TextField(30); add(textField); start = new Button("Start"); add(start); start.addActionListener(this); stop = new Button("Stop"); add(stop); stop.addActionListener(this); } public void actionPerformed(ActionEvent e){ Button source = (Button)e.getSource(); if(source.getLabel() == "Start"){ for (int i = 0; i < 7; i++) { myThread(50*i); } System.out.println("start..."); } if(source.getLabel() == "Stop"){ terminate = true; System.out.println("stop..."); } } public void paint(Graphics g) { update(g); } public void update(Graphics g){ try{ viewTime = System.currentTimeMillis(); g.drawImage(img, 100, 100, this); } catch(Exception e) { e.printStackTrace(); } } public void myThread(final int sleepTime){ new Thread(new Runnable() { public void run() { while(!terminate){ try { TimeUnit.MILLISECONDS.sleep(sleepTime); } catch (InterruptedException ex) { ex.printStackTrace(); } long requestTime= 0; Image tempImage = null; try { URL pic = null; requestTime= System.currentTimeMillis(); pic = new URL(getDocumentBase(), textField.getText()); tempImage = ImageIO.read(pic); } catch(Exception e) { e.printStackTrace(); } if(requestTime >= /*last view time*/viewTime){ img = tempImage; Camba.this.repaint(); } } }}).start(); System.out.println("thread started..."); } }

Read the article
trying to make an accordion menu from a list - jquery indexhibit

- by orionrush

Hello - Im teaching my self javascript & jquery so this might be a bit of a low brow question or entirely too much code for anyone to wade through, but Im hoping for some feedback. I have looked around and haven't found a thread that looks like it will deals neatly with my question. Im using the cms indexhibit (cant create a new tag!) and trying to create an accordion style menu from the menu list it generates. I basically have the behaviour Im after, modifying an existing bit of work but there are quite a few foibles, which are no doubt a conflict between the .click and .toggle and a confused use if statements. I basically want to start from scratch and redo this so I can a) learn from my mistakes b) understand what's happening. Im having trouble now because I dont know where to go from here, or how to trouble shoot it. Can anyone give me a quick analysis how the the script in the head of the document work together? Also any insight into the nature of the conflicts Im seeing and what approach might take to remedy them? If you were going to start afresh what would be your approach? Here is a test to see it in action (warts and all): http://stillstatic.nfshost.com/ This script goes into the document head: <script type='text/javascript'> //im not entirely clear as to what this achieves path = 'path/to/script/'; $(document).ready(function() { setTimeout('move_up()', 1); expandingMenu(0); expandingMenu(1); expandingMenu(2); expandingMenu(3); expandingMenu(4); //etc }); </script> the generated list: <ul> <li class='section-title active_menu'>blogs</li> <li><a class="active" href='#' onclick="do_click();">3</a></li> </ul> <ul> //this menu section dose not have a label: class .section-title <li><a href='#' onclick="do_click();">1</a></li> <li><a href='#' onclick="do_click();">2</a></li> </ul> <ul> //this menu section is not the 'active menu' this is achieved by the jquery script <li class='section-title'>writing</li> <li><a href='#' onclick="do_click();">4</a></li> <li><a href='#' onclick="do_click();">5</a></li> </ul> The meat of in an external script: function expandingMenu(num) { var speed = 500; var menu_title = $("#menu ul").eq(num).children(":first"); // ie. first child be the title with the class .section-title unless the user turned it off var menu_items = $("#menu ul").eq(num).children().filter(function (index) { return index 0; }); // ie. any li NOT in position 0, below li.section-title if (menu_items.is(".active") == true) { menu_title.addClass("active_menu"); //Add a class to the active list so we can style it. } if (menu_title.is(".section-title") == true){ // this if prevents interference with users who turn off the section titling if ((menu_items.is(".active") == false) && (menu_items.is(":visible")) ) { menu_items.hide(0);// first we hide the inactive exhibits } $('li').click(function (){ if ( (menu_title.is(":visible") == true) ){ menu_items.hide(speed); } if ( (menu_items.is(":hidden") == true ) && (('')) ){// ?! without this second condition things break down. . . menu_items.show(speed); } }) menu_title.css({cursor:"pointer"}).toggle( // add click functions + pointer to menu_title function () { menu_items.show(speed);//Open it up }, function () { // this function could even be empty but without the if things get weird if (menu_items.is(".xx")) menu_items.hide(speed); //Take the menu item off of active duty! } ) } }

Read the article
How do I make my applet turn the user's input into an integer and compare it to the computer's random number?

- by Kitteran

I'm in beginning programming and I don't fully understand applets yet. However, (with some help from internet tutorials) I was able to create an applet that plays a game of guess with the user. The applet compiles fine, but when it runs, this error message appears: "Exception in thread "main" java.lang.NumberFormatException: For input string: "" at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48) at java.lang.Integer.parseInt(Integer.java:470) at java.lang.Integer.parseInt(Integer.java:499) at Guess.createUserInterface(Guess.java:101) at Guess.<init>(Guess.java:31) at Guess.main(Guess.java:129)" I've tried moving the "userguess = Integer.parseInt( t1.getText() );" on line 101 to multiple places, but I still get the same error. Can anyone tell me what I'm doing wrong? The Code: // Creates the game GUI. import javax.swing.*; import java.awt.*; import java.awt.event.*; public class Guess extends JFrame{ private JLabel userinputJLabel; private JLabel lowerboundsJLabel; private JLabel upperboundsJLabel; private JLabel computertalkJLabel; private JButton guessJButton; private JPanel guessJPanel; static int computernum; int userguess; static void declare() { computernum = (int) (100 * Math.random()) + 1; //random number picked (1-100) } // no-argument constructor public Guess() { createUserInterface(); } // create and position GUI components private void createUserInterface() { // get content pane and set its layout Container contentPane = getContentPane(); contentPane.setLayout( null ); contentPane.setBackground( Color.white ); // set up userinputJLabel userinputJLabel = new JLabel(); userinputJLabel.setText( "Enter Guess Here -->" ); userinputJLabel.setBounds( 0, 65, 120, 50 ); userinputJLabel.setHorizontalAlignment( JLabel.CENTER ); userinputJLabel.setBackground( Color.white ); userinputJLabel.setOpaque( true ); contentPane.add( userinputJLabel ); // set up lowerboundsJLabel lowerboundsJLabel = new JLabel(); lowerboundsJLabel.setText( "Lower Bounds Of Guess = 1" ); lowerboundsJLabel.setBounds( 0, 0, 170, 50 ); lowerboundsJLabel.setHorizontalAlignment( JLabel.CENTER ); lowerboundsJLabel.setBackground( Color.white ); lowerboundsJLabel.setOpaque( true ); contentPane.add( lowerboundsJLabel ); // set up upperboundsJLabel upperboundsJLabel = new JLabel(); upperboundsJLabel.setText( "Upper Bounds Of Guess = 100" ); upperboundsJLabel.setBounds( 250, 0, 170, 50 ); upperboundsJLabel.setHorizontalAlignment( JLabel.CENTER ); upperboundsJLabel.setBackground( Color.white ); upperboundsJLabel.setOpaque( true ); contentPane.add( upperboundsJLabel ); // set up computertalkJLabel computertalkJLabel = new JLabel(); computertalkJLabel.setText( "Computer Says:" ); computertalkJLabel.setBounds( 0, 130, 100, 50 ); //format (x, y, width, height) computertalkJLabel.setHorizontalAlignment( JLabel.CENTER ); computertalkJLabel.setBackground( Color.white ); computertalkJLabel.setOpaque( true ); contentPane.add( computertalkJLabel ); //Set up guess jbutton guessJButton = new JButton(); guessJButton.setText( "Enter" ); guessJButton.setBounds( 250, 78, 100, 30 ); contentPane.add( guessJButton ); guessJButton.addActionListener( new ActionListener() // anonymous inner class { // event handler called when Guess button is pressed public void actionPerformed( ActionEvent event ) { guessActionPerformed( event ); } } // end anonymous inner class ); // end call to addActionListener // set properties of application's window setTitle( "Guess Game" ); // set title bar text setSize( 500, 500 ); // set window size setVisible( true ); // display window //create text field TextField t1 = new TextField(); // Blank text field for user input t1.setBounds( 135, 78, 100, 30 ); contentPane.add( t1 ); userguess = Integer.parseInt( t1.getText() ); //create section for computertalk Label computertalkLabel = new Label(""); computertalkLabel.setBounds( 115, 130, 300, 50); contentPane.add( computertalkLabel ); } // Display computer reactions to user guess private void guessActionPerformed( ActionEvent event ) { if (userguess > computernum) //if statements (computer's reactions to user guess) computertalkJLabel.setText( "Computer Says: Too High" ); else if (userguess < computernum) computertalkJLabel.setText( "Computer Says: Too Low" ); else if (userguess == computernum) computertalkJLabel.setText( "Computer Says:You Win!" ); else computertalkJLabel.setText( "Computer Says: Error" ); } // end method oneJButtonActionPerformed // end method createUserInterface // main method public static void main( String args[] ) { Guess application = new Guess(); application.setDefaultCloseOperation (JFrame.EXIT_ON_CLOSE); } // end method main } // end class Phone

Read the article
bandwidth throttling C linux

- by bob moch

hi im currently creating a function to create a sleep time i can pause between packets for my port scanner im creating for personal/educational use for my home network. what im currently doing is opening /proc/net/dev and reading the 9th set of digits for the eth0 interface to find out the current packets being set and then reading it again and doing some math to figure out a delay to sleep between sending a packet to a port to identify it and fingerprint it. my problem is that no matter what throttle % i use it always seems to send the same rate of packets. i think its mainly my way of mathematically creating my sleep delay. edit:: dont mind the function declaration and the struct stuff all im doing is spawning this function in a thread and passing a pointer to a struct to the function, recreating the struct locally and then freeing the passed structs memory. void *bandwidthmonitor_cmd(void *param) { char cmdline[1024], *bytedata[19]; int i = 0, ii = 0; long long prevbytes = 0, currentbytes = 0, elapsedbytes = 0, byteusage = 0, maxthrottle = 0; command_struct bandwidth = *((command_struct *)param); free(param); //printf("speed: %d\n throttle: %d\n\n", UPLOAD_SPEED, bandwidth.throttle); maxthrottle = UPLOAD_SPEED * bandwidth.throttle / 100; //printf("max throttle:%lld\n", maxthrottle); FILE *f = fopen("/proc/net/dev", "r"); if(f != NULL) { while(1) { while(fgets(cmdline, sizeof(cmdline), f) != NULL) { cmdline[strlen(cmdline)] = '\0'; if(strncmp(cmdline, " eth0", 6) == 0) { bytedata[0] = strtok(cmdline, " "); while(bytedata[i] != NULL) { i++; bytedata[i] = strtok(NULL, " "); } bytedata[i + 1] = '\0'; currentbytes = atoi(bytedata[9]); } } i = 0; rewind(f); elapsedbytes = currentbytes - prevbytes; prevbytes = currentbytes; byteusage = 8 * (elapsedbytes / 1024); //printf("usage:%lld\n",byteusage); if(ii & 0x40) { SLEEP += (maxthrottle - byteusage) * -1.1;//-2.5; if(SLEEP < 0){ SLEEP = 0; } //printf("sleep:%d\n", SLEEP); } usleep(25000); ii++; } } return NULL; } SLEEP and UPLOAD_SPEED are global variables and UPLOAD_SPEED is in kb/s and generated via a speedtest function that gets the upload speed of my computer. this function is running inside a POSIX thread updating SLEEP which my threads doing the socket work grab to sleep by after every packet. as testing instead of only doing the ports i want to check i make it do all the ports over and over again so i can run dstat on a machine to check bandwidth and no matter what bandwidth.throttle is set to it always seems to generate the same amount of bandwidth to the dstat machine. the way i calculate how much i "should" throttle by is by finding the maximum throttle speed which is defined as maxthrottle = upload_speed * throttle / 100; for example if my upload speed was 1000kb/s and my throttle was 90 (90%) my max throttle would be 900kb/s from there it would find the current bytes sent from /proc/net/dev and then find my sleep time via incrementing or decrementing it via sleep += (maxthrottle - bytesysed) * -1.1; this should in theory increase or decrease the sleep time based on how many bytes used there are. the if(ii & 0x40) statement is just for some moderation control. it makes it so it only sets sleep to a new time every 30-40 iterations. final notes: the main problem is that the sleep timer does not seem to modify the speed of packets being set. or maybe its just my implementation because on a freshly restarted machine where /proc/net/dev has low numbers of bytes sent it seems to raise the sleep timer accordingly on my 60kb/s upload machine (ex if i set the throttle to 2 it will incline the sleep timer until network bandwidth out reaches the max bandwidth threshold, but when i try running it on a server which as been online forever it doesnt seem to work as nicely if at all. if anyone can suggest a new method of monitoring the network to adjust a sleep delay then let me know or if anyone sees a flaw in my code. thank you.

Read the article
nagios NRPE: Unable to read output

- by user555854

I currently set up a script to restart my http servers + php5 fpm but can't get it to work. I have googled and have found that mostly permissions are the problems of my error but can't figure it out. I start my script using /usr/lib/nagios/plugins/check_nrpe -H bart -c restart_http This is the output in my syslog on the node I want to restart Jun 27 06:29:35 bart nrpe[8926]: Connection from 192.168.133.17 port 25028 Jun 27 06:29:35 bart nrpe[8926]: Host address is in allowed_hosts Jun 27 06:29:35 bart nrpe[8926]: Handling the connection... Jun 27 06:29:35 bart nrpe[8926]: Host is asking for command 'restart_http' to be run... Jun 27 06:29:35 bart nrpe[8926]: Running command: /usr/bin/sudo /usr/lib/nagios/plugins/http-restart Jun 27 06:29:35 bart nrpe[8926]: Command completed with return code 1 and output: Jun 27 06:29:35 bart nrpe[8926]: Return Code: 1, Output: NRPE: Unable to read output Jun 27 06:29:35 bart nrpe[8926]: Connection from 192.168.133.17 closed. If I run the command myself it runs fine (but asks for a password) (nagios user) This are the script permission and the script contents. -rwxrwxrwx 1 nagios nagios 142 Jun 26 21:41 /usr/lib/nagios/plugins/http-restart #!/bin/bash echo "ok" /etc/init.d/nginx stop /etc/init.d/nginx start /etc/init.d/php5-fpm stop /etc/init.d/php5-fpm start echo "done" I also added this line to visudo nagios ALL=(ALL) NOPASSWD: /usr/lib/nagios/plugins/ My local nagios nrpe.cfg ############################################################################# # Sample NRPE Config File # Written by: Ethan Galstad ([email protected]) # # # NOTES: # This is a sample configuration file for the NRPE daemon. It needs to be # located on the remote host that is running the NRPE daemon, not the host # from which the check_nrpe client is being executed. ############################################################################# # LOG FACILITY # The syslog facility that should be used for logging purposes. log_facility=daemon # PID FILE # The name of the file in which the NRPE daemon should write it's process ID # number. The file is only written if the NRPE daemon is started by the root # user and is running in standalone mode. pid_file=/var/run/nagios/nrpe.pid # PORT NUMBER # Port number we should wait for connections on. # NOTE: This must be a non-priviledged port (i.e. > 1024). # NOTE: This option is ignored if NRPE is running under either inetd or xinetd server_port=5666 # SERVER ADDRESS # Address that nrpe should bind to in case there are more than one interface # and you do not want nrpe to bind on all interfaces. # NOTE: This option is ignored if NRPE is running under either inetd or xinetd #server_address=127.0.0.1 # NRPE USER # This determines the effective user that the NRPE daemon should run as. # You can either supply a username or a UID. # # NOTE: This option is ignored if NRPE is running under either inetd or xinetd nrpe_user=nagios # NRPE GROUP # This determines the effective group that the NRPE daemon should run as. # You can either supply a group name or a GID. # # NOTE: This option is ignored if NRPE is running under either inetd or xinetd nrpe_group=nagios # ALLOWED HOST ADDRESSES # This is an optional comma-delimited list of IP address or hostnames # that are allowed to talk to the NRPE daemon. # # Note: The daemon only does rudimentary checking of the client's IP # address. I would highly recommend adding entries in your /etc/hosts.allow # file to allow only the specified host to connect to the port # you are running this daemon on. # # NOTE: This option is ignored if NRPE is running under either inetd or xinetd allowed_hosts=127.0.0.1,192.168.133.17 # COMMAND ARGUMENT PROCESSING # This option determines whether or not the NRPE daemon will allow clients # to specify arguments to commands that are executed. This option only works # if the daemon was configured with the --enable-command-args configure script # option. # # *** ENABLING THIS OPTION IS A SECURITY RISK! *** # Read the SECURITY file for information on some of the security implications # of enabling this variable. # # Values: 0=do not allow arguments, 1=allow command arguments dont_blame_nrpe=0 # COMMAND PREFIX # This option allows you to prefix all commands with a user-defined string. # A space is automatically added between the specified prefix string and the # command line from the command definition. # # *** THIS EXAMPLE MAY POSE A POTENTIAL SECURITY RISK, SO USE WITH CAUTION! *** # Usage scenario: # Execute restricted commmands using sudo. For this to work, you need to add # the nagios user to your /etc/sudoers. An example entry for alllowing # execution of the plugins from might be: # # nagios ALL=(ALL) NOPASSWD: /usr/lib/nagios/plugins/ # # This lets the nagios user run all commands in that directory (and only them) # without asking for a password. If you do this, make sure you don't give # random users write access to that directory or its contents! command_prefix=/usr/bin/sudo # DEBUGGING OPTION # This option determines whether or not debugging messages are logged to the # syslog facility. # Values: 0=debugging off, 1=debugging on debug=1 # COMMAND TIMEOUT # This specifies the maximum number of seconds that the NRPE daemon will # allow plugins to finish executing before killing them off. command_timeout=60 # CONNECTION TIMEOUT # This specifies the maximum number of seconds that the NRPE daemon will # wait for a connection to be established before exiting. This is sometimes # seen where a network problem stops the SSL being established even though # all network sessions are connected. This causes the nrpe daemons to # accumulate, eating system resources. Do not set this too low. connection_timeout=300 # WEEK RANDOM SEED OPTION # This directive allows you to use SSL even if your system does not have # a /dev/random or /dev/urandom (on purpose or because the necessary patches # were not applied). The random number generator will be seeded from a file # which is either a file pointed to by the environment valiable $RANDFILE # or $HOME/.rnd. If neither exists, the pseudo random number generator will # be initialized and a warning will be issued. # Values: 0=only seed from /dev/[u]random, 1=also seed from weak randomness #allow_weak_random_seed=1 # INCLUDE CONFIG FILE # This directive allows you to include definitions from an external config file. #include=<somefile.cfg> # INCLUDE CONFIG DIRECTORY # This directive allows you to include definitions from config files (with a # .cfg extension) in one or more directories (with recursion). #include_dir=<somedirectory> #include_dir=<someotherdirectory> # COMMAND DEFINITIONS # Command definitions that this daemon will run. Definitions # are in the following format: # # command[<command_name>]=<command_line> # # When the daemon receives a request to return the results of <command_name> # it will execute the command specified by the <command_line> argument. # # Unlike Nagios, the command line cannot contain macros - it must be # typed exactly as it should be executed. # # Note: Any plugins that are used in the command lines must reside # on the machine that this daemon is running on! The examples below # assume that you have plugins installed in a /usr/local/nagios/libexec # directory. Also note that you will have to modify the definitions below # to match the argument format the plugins expect. Remember, these are # examples only! # The following examples use hardcoded command arguments... command[check_users]=/usr/lib/nagios/plugins/check_users -w 5 -c 10 command[check_load]=/usr/lib/nagios/plugins/check_load -w 15,10,5 -c 30,25,20 command[check_hda1]=/usr/lib/nagios/plugins/check_disk -w 20% -c 10% -p /dev/hda1 command[check_zombie_procs]=/usr/lib/nagios/plugins/check_procs -w 5 -c 10 -s Z command[check_total_procs]=/usr/lib/nagios/plugins/check_procs -w 150 -c 200 # The following examples allow user-supplied arguments and can # only be used if the NRPE daemon was compiled with support for # command arguments *AND* the dont_blame_nrpe directive in this # config file is set to '1'. This poses a potential security risk, so # make sure you read the SECURITY file before doing this. #command[check_users]=/usr/lib/nagios/plugins/check_users -w $ARG1$ -c $ARG2$ #command[check_load]=/usr/lib/nagios/plugins/check_load -w $ARG1$ -c $ARG2$ #command[check_disk]=/usr/lib/nagios/plugins/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$ #command[check_procs]=/usr/lib/nagios/plugins/check_procs -w $ARG1$ -c $ARG2$ -s $ARG3$ command[restart_http]=/usr/lib/nagios/plugins/http-restart # # local configuration: # if you'd prefer, you can instead place directives here include=/etc/nagios/nrpe_local.cfg # # you can place your config snipplets into nrpe.d/ include_dir=/etc/nagios/nrpe.d/ My Sudoers files # /etc/sudoers # # This file MUST be edited with the 'visudo' command as root. # # See the man page for details on how to write a sudoers file. # Defaults env_reset # Host alias specification # User alias specification # Cmnd alias specification # User privilege specification root ALL=(ALL) ALL nagios ALL=(ALL) NOPASSWD: /usr/lib/nagios/plugins/ # Allow members of group sudo to execute any command # (Note that later entries override this, so you might need to move # it further down) %sudo ALL=(ALL) ALL # #includedir /etc/sudoers.d Hopefully someone can help!

Read the article
Failing Sata HDD

- by DaveCol

I think my HDD is fried... Could someone confirm or help me restore it? I was using Hardware RAID 1 Configuration [2 x 160GB SATA HDD] on a CentOS 4 Installation. All of a sudden I started seeing bad sectors on the second HDD which stopped being mirrored. I have removed the RAID array and have tested with SMART which showed the following error: 187 Unknown_Attribute 0x003a 001 001 051 Old_age Always FAILING_NOW 4645 I have no clue what this means, or if I can recover from it. Could someone give me some ideas on how to fix this, or what HDD to get to replace this? Complete SMART report: Smartctl version 5.33 [i686-redhat-linux-gnu] Copyright (C) 2002-4 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF INFORMATION SECTION === Device Model: GB0160CAABV Serial Number: 6RX58NAA Firmware Version: HPG1 User Capacity: 160,041,885,696 bytes Device is: Not in smartctl database [for details use: -P showall] ATA Version is: 7 ATA Standard is: ATA/ATAPI-7 T13 1532D revision 4a Local Time is: Tue Oct 19 13:42:42 2010 COT SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED See vendor-specific Attribute list for marginal Attributes. General SMART Values: Offline data collection status: (0x82) Offline data collection activity was completed without error. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 433) seconds. Offline data collection capabilities: (0x5b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 54) minutes. SMART Attributes Data Structure revision number: 10 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000f 100 253 006 Pre-fail Always - 0 3 Spin_Up_Time 0x0002 097 097 000 Old_age Always - 0 4 Start_Stop_Count 0x0033 100 100 020 Pre-fail Always - 152 5 Reallocated_Sector_Ct 0x0033 095 095 036 Pre-fail Always - 214 7 Seek_Error_Rate 0x000f 078 060 030 Pre-fail Always - 73109713 9 Power_On_Hours 0x0032 083 083 000 Old_age Always - 15133 10 Spin_Retry_Count 0x0013 100 100 097 Pre-fail Always - 0 12 Power_Cycle_Count 0x0033 100 100 020 Pre-fail Always - 154 184 Unknown_Attribute 0x0032 038 038 000 Old_age Always - 62 187 Unknown_Attribute 0x003a 001 001 051 Old_age Always FAILING_NOW 4645 189 Unknown_Attribute 0x0022 100 100 000 Old_age Always - 0 190 Unknown_Attribute 0x001a 061 055 000 Old_age Always - 656408615 194 Temperature_Celsius 0x0000 039 045 000 Old_age Offline - 39 (Lifetime Min/Max 0/22) 195 Hardware_ECC_Recovered 0x0032 070 059 000 Old_age Always - 12605265 197 Current_Pending_Sector 0x0000 100 100 000 Old_age Offline - 1 198 Offline_Uncorrectable 0x0000 100 100 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x0000 200 200 000 Old_age Offline - 62 SMART Error Log Version: 1 ATA Error Count: 4645 (device log contains only the most recent five errors) CR = Command Register [HEX] FR = Features Register [HEX] SC = Sector Count Register [HEX] SN = Sector Number Register [HEX] CL = Cylinder Low Register [HEX] CH = Cylinder High Register [HEX] DH = Device/Head Register [HEX] DC = Device Command Register [HEX] ER = Error register [HEX] ST = Status register [HEX] Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days. Error 4645 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 02 7b 86 b1 ea 00 00:38:52.796 READ DMA ec 03 45 00 00 00 a0 00 00:38:52.796 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:52.794 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:49.991 IDENTIFY DEVICE c8 00 04 79 86 b1 ea 00 00:38:49.935 READ DMA Error 4644 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 04 79 86 b1 ea 00 00:38:41.517 READ DMA ec 03 45 00 00 00 a0 00 00:38:41.515 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:41.515 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:49.991 IDENTIFY DEVICE c8 00 06 77 86 b1 ea 00 00:38:49.935 READ DMA Error 4643 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 06 77 86 b1 ea 00 00:38:41.517 READ DMA ec 03 45 00 00 00 a0 00 00:38:41.515 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:41.515 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:41.513 IDENTIFY DEVICE c8 00 06 77 86 b1 ea 00 00:38:38.706 READ DMA Error 4642 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 06 77 86 b1 ea 00 00:38:41.517 READ DMA ec 03 45 00 00 00 a0 00 00:38:41.515 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:41.515 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:41.513 IDENTIFY DEVICE c8 00 06 77 86 b1 ea 00 00:38:38.706 READ DMA Error 4641 occurred at disk power-on lifetime: 15132 hours (630 days + 12 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 00 7b 86 b1 ea Error: UNC at LBA = 0x0ab1867b = 179406459 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 06 77 86 b1 ea 00 00:38:41.517 READ DMA ec 03 45 00 00 00 a0 00 00:38:41.515 IDENTIFY DEVICE ef 03 45 00 00 00 a0 00 00:38:41.515 SET FEATURES [Set transfer mode] ec 00 00 7b 86 b1 a0 00 00:38:41.513 IDENTIFY DEVICE c8 00 06 77 86 b1 ea 00 00:38:38.706 READ DMA SMART Self-test log structure revision number 1 Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error # 1 Short offline Completed without error 00% 15131 - # 2 Short offline Completed without error 00% 15131 - SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay.

Read the article
Hard Disk DRDY error: is it a crash

- by pranjal

I am using IBM Thinkpad, 1.7GHz, 512 RAM with Linux Mint 9 installed. I have two partitions in addition to root. One of the partitions became read-only yesterday, after which I rebooted my system. It is extremely slow along with DRDY Error : Is my Hard disk crashed ? Error Log while booting. Differences between boot sector and its backup. failed command : READ DMA BMDMA : stat 0X25 ata 1.00 : status : { DRDY ERR } ata 1.00 : status :{ UNC } Buffer I/O error on logical device, logical block 65467 smartctl output for the partition: mint mint # smartctl -a /dev/sda1 smartctl version 5.38 [i686-pc-linux-gnu] Copyright (C) 2002-8 Bruce Allen Home page is http://smartmontools.sourceforge.net/ === START OF INFORMATION SECTION === Device Model: TOSHIBA MK4026GAX RoHS Serial Number: X5LY1623T Firmware Version: PA107E User Capacity: 40,007,761,920 bytes Device is: Not in smartctl database [for details use: -P showall] ATA Version is: 6 ATA Standard is: Exact ATA specification draft version not indicated Local Time is: Thu Feb 17 06:48:25 2011 UTC SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x84) Offline data collection activity was suspended by an interrupting command from host. Auto Offline Data Collection: Enabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 153) seconds. Offline data collection capabilities: (0x1b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. No Conveyance Self-test supported. No Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. No General Purpose Logging support. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 30) minutes. SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x000b 100 100 050 Pre-fail Always - 0 2 Throughput_Performance 0x0005 100 100 050 Pre-fail Offline - 0 3 Spin_Up_Time 0x0027 100 100 001 Pre-fail Always - 310 4 Start_Stop_Count 0x0032 100 100 000 Old_age Always - 3968 5 Reallocated_Sector_Ct 0x0033 100 100 050 Pre-fail Always - 40 7 Seek_Error_Rate 0x000b 100 100 050 Pre-fail Always - 0 8 Seek_Time_Performance 0x0005 100 100 050 Pre-fail Offline - 0 9 Power_On_Hours 0x0032 082 082 000 Old_age Always - 7257 10 Spin_Retry_Count 0x0033 179 100 030 Pre-fail Always - 0 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 3484 192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always - 489 193 Load_Cycle_Count 0x0032 064 064 000 Old_age Always - 367150 194 Temperature_Celsius 0x0022 100 100 000 Old_age Always - 36 (Lifetime Min/Max 14/57) 196 Reallocated_Event_Count 0x0032 100 100 000 Old_age Always - 33 197 Current_Pending_Sector 0x0032 100 100 000 Old_age Always - 82 198 Offline_Uncorrectable 0x0030 100 100 000 Old_age Offline - 1 199 UDMA_CRC_Error_Count 0x0032 200 253 000 Old_age Always - 0 220 Disk_Shift 0x0002 100 100 000 Old_age Always - 101 222 Loaded_Hours 0x0032 085 085 000 Old_age Always - 6146 223 Load_Retry_Count 0x0032 100 100 000 Old_age Always - 0 224 Load_Friction 0x0022 100 100 000 Old_age Always - 0 226 Load-in_Time 0x0026 100 100 000 Old_age Always - 227 240 Head_Flying_Hours 0x0001 100 100 001 Pre-fail Offline - 0 SMART Error Log Version: 1 ATA Error Count: 2371 (device log contains only the most recent five errors) CR = Command Register [HEX] FR = Features Register [HEX] SC = Sector Count Register [HEX] SN = Sector Number Register [HEX] CL = Cylinder Low Register [HEX] CH = Cylinder High Register [HEX] DH = Device/Head Register [HEX] DC = Device Command Register [HEX] ER = Error register [HEX] ST = Status register [HEX] Powered_Up_Time is measured from power on, and printed as DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes, SS=sec, and sss=millisec. It "wraps" after 49.710 days. Error 2371 occurred at disk power-on lifetime: 7256 hours (302 days + 8 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 05 1a 1b 00 e0 Error: UNC 5 sectors at LBA = 0x00001b1a = 6938 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 05 1a 1b 00 e0 00 00:03:10.061 READ DMA f8 00 00 00 00 00 e0 00 00:03:10.061 READ NATIVE MAX ADDRESS ec 00 00 00 00 00 a0 02 00:03:10.053 IDENTIFY DEVICE ef 03 45 00 00 00 a0 02 00:03:10.053 SET FEATURES [Set transfer mode] f8 00 00 00 00 00 e0 00 00:03:10.053 READ NATIVE MAX ADDRESS Error 2370 occurred at disk power-on lifetime: 7256 hours (302 days + 8 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 05 1a 1b 00 e0 Error: UNC 5 sectors at LBA = 0x00001b1a = 6938 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 05 1a 1b 00 e0 00 00:03:03.328 READ DMA f8 00 00 00 00 00 e0 00 00:03:03.327 READ NATIVE MAX ADDRESS ec 00 00 00 00 00 a0 02 00:03:03.320 IDENTIFY DEVICE ef 03 45 00 00 00 a0 02 00:03:03.319 SET FEATURES [Set transfer mode] f8 00 00 00 00 00 e0 00 00:03:03.319 READ NATIVE MAX ADDRESS Error 2369 occurred at disk power-on lifetime: 7256 hours (302 days + 8 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 05 1a 1b 00 e0 Error: UNC 5 sectors at LBA = 0x00001b1a = 6938 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 05 1a 1b 00 e0 00 00:02:56.582 READ DMA f8 00 00 00 00 00 e0 00 00:02:56.582 READ NATIVE MAX ADDRESS ec 00 00 00 00 00 a0 02 00:02:56.574 IDENTIFY DEVICE ef 03 45 00 00 00 a0 02 00:02:56.574 SET FEATURES [Set transfer mode] f8 00 00 00 00 00 e0 00 00:02:56.574 READ NATIVE MAX ADDRESS Error 2368 occurred at disk power-on lifetime: 7256 hours (302 days + 8 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 05 1a 1b 00 e0 Error: UNC 5 sectors at LBA = 0x00001b1a = 6938 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 05 1a 1b 00 e0 00 00:02:49.809 READ DMA f8 00 00 00 00 00 e0 00 00:02:49.809 READ NATIVE MAX ADDRESS ec 00 00 00 00 00 a0 02 00:02:49.801 IDENTIFY DEVICE ef 03 45 00 00 00 a0 02 00:02:49.801 SET FEATURES [Set transfer mode] f8 00 00 00 00 00 e0 00 00:02:49.801 READ NATIVE MAX ADDRESS Error 2367 occurred at disk power-on lifetime: 7256 hours (302 days + 8 hours) When the command that caused the error occurred, the device was active or idle. After command completion occurred, registers were: ER ST SC SN CL CH DH -- -- -- -- -- -- -- 40 51 05 1a 1b 00 e0 Error: UNC 5 sectors at LBA = 0x00001b1a = 6938 Commands leading to the command that caused the error were: CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name -- -- -- -- -- -- -- -- ---------------- -------------------- c8 00 05 1a 1b 00 e0 00 00:02:43.056 READ DMA f8 00 00 00 00 00 e0 00 00:02:43.056 READ NATIVE MAX ADDRESS ec 00 00 00 00 00 a0 02 00:02:43.048 IDENTIFY DEVICE ef 03 45 00 00 00 a0 02 00:02:43.048 SET FEATURES [Set transfer mode] f8 00 00 00 00 00 e0 00 00:02:43.047 READ NATIVE MAX ADDRESS SMART Self-test log structure revision number 1 No self-tests have been logged. [To run self-tests, use: smartctl -t] Device does not support Selective Self Tests/Logging Do I need to get a new Hard Disk my PC ?

Read the article
Xenserver 5.5 U2 a bit unstable with an unstable W2003 VM

- by twistedbrain

In the last week I had to reboot the host system twice and the second one by means of the power button. The system is a Dell PE 6950 (4 Opteron dual core, 2,8 Ghz, 16 GB RAM, 900 GB disk in a RAID 10 array composed by 4 450GB 15000 rpm disks) with XenServer 5.5 U2. We're installing it and at now there are working in production a 2003 32 bit Windows server VM with 2 GB RAM, 3 vcpu and about 200 GB disk in 4 partition (12 GB boot, 20 GB program, 80 GB user data, 80 GB other data). The first time I was compressing many Windows 2003 folders (some tens GB by means of W2003 compressed folder option) from a Windows remote console and some hours before my colleague installed Backup Exec agent that was alredy installed and that required a reboot (that was pending). The console stopped responding, it was no more possible to connect by means of remote console or by means of the console of the XenCentre, it was still possible from the network to use the shared folder of the VM and the programs on it (2 db and a GIS program), but the print server didn't work any more and I couldn't give remote reboot from other domain controller hosts. I couldn't stop the virtual machine neither from the XenCentre, neither from the command line of the host also forcing the reboot. I had to reboot the host server. Yesterday it has been worst. I installed a template of another VM, CentOS 5.3 and then, put the DVD in the drive of the host. Then, before the install and after the boot I checked for defect the DVD and the W2003 VM began to respond slowly (I was connected by means of an administration remote console) the task manager showed only mid or low load, it is, only the first of the 3 vcpu was loaded (about 70%), while the other two were about at 20% and also the disk I/O was not so heavy. Then the users were not so happy because they couldn't any more use the MS word docs on the server. I immediately stopped the check of the DVD (to do that I had to force the stop of such Centos 5.3 VM), then some users could again use their docs, but other had still problems, so I decided to reboot the VM, but it doesn't stopped, neither from XenCenter, neither from command line, neither forcing the reboot. Then I tried to reboot of the host, but it didn't worked, neither from the XenCentre, neither from the host prompt (shutdown -r now as root: it told that it was shutting down, but then it didn't did that). So I had to power off by means of the power button of the server (before I tried some Magic SyS REQ, but I saw that Xen isn't compiled with this option enabled). What could you suggest about my problem, what can I look, search and see? In the W2003 VM logs there are no errors or warning to explain what happened. Some more exciting, amusing and inspiring words of poetry (the facts happened around 11 am): \# egrep -i 'err|warning' xensource.log [20100219 10:32:05.597|debug|culo|6301 unix-RPC|VBD.plug R:c81bcda701f6|xenops] watch: watching xenstore paths: [ /xapi/0/frontend/vbd/51712/hotplug; /local/domain/0/backend/vbd/0/51712/tapdisk-error ] with timeout 1200.000000 seconds [20100219 10:32:05.597|debug|culo|6301 unix-RPC|VBD.plug R:c81bcda701f6|xenops] watch: fired on /local/domain/0/backend/vbd/0/51712/tapdisk-error [20100219 10:32:14.314|debug|culo|6335 unix-RPC|VBD.unplug R:9258f54578d6|xenops] watch: watching xenstore paths: [ /local/domain/0/backend/vbd/0/51712/shutdown-done; /local/domain/0/error/device/vbd/51712/error ] with timeout 1200.000000 seconds [20100219 10:32:14.337|debug|culo|6335 unix-RPC|VBD.unplug R:9258f54578d6|xenops] xenstore-rm /local/domain/0/error/backend/vbd/0 [20100219 10:32:14.337|debug|culo|6335 unix-RPC|VBD.unplug R:9258f54578d6|xenops] xenstore-rm /local/domain/0/error/device/vbd/51712 [20100219 10:32:14.338|debug|culo|6335 unix-RPC|VBD.unplug R:9258f54578d6|xenops] watch: fired on /local/domain/0/error/device/vbd/51712/error [20100219 10:53:48.903|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|helpers] Ignoring exception: INTERNAL_ERROR: [ Xb.Noent ] while Vmops.destroy_domain: Destroying domid 14 guest session [20100219 10:53:52.048|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/0/error/backend/tap/14 [20100219 10:53:52.048|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/14/error/device/vbd/51744 [20100219 10:53:52.085|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/0/error/backend/tap/14 [20100219 10:53:52.086|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/14/error/device/vbd/51728 [20100219 10:53:52.122|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/0/error/backend/tap/14 [20100219 10:53:52.122|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/14/error/device/vbd/51712 [20100219 10:53:52.127|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/0/error/backend/vbd/14 [20100219 10:53:52.128|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/14/error/device/vbd/51760 [20100219 10:53:52.496|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] Device.Vif.hard_shutdown about to blow away backend and error paths [20100219 10:53:52.497|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/0/error/backend/vif/14 [20100219 10:53:52.497|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/14/error/device/vif/0 [20100219 10:53:53.385|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] Device.Vif.hard_shutdown about to blow away backend and error paths [20100219 10:53:53.386|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/0/error/backend/vif/14 [20100219 10:53:53.386|debug|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|xenops] xenstore-rm /local/domain/14/error/device/vif/1 [20100219 10:53:53.389| warn|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|hotplug] Warning, deleting 'vif' entry from /xapi/14/hotplug/vif/0 [20100219 10:53:53.391| warn|culo|6418|Async.VM.hard_shutdown R:88d2095678f7|hotplug] Warning, deleting 'vif' entry from /xapi/14/hotplug/vif/1 [20100219 11:20:49.766|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|helpers] Ignoring exception: INTERNAL_ERROR: [ Xb.Noent ] while Vmops.destroy_domain: Destroying domid 11 guest session [20100219 11:20:50.339|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/0/error/backend/tap/11 [20100219 11:20:50.339|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/11/error/device/vbd/832 [20100219 11:20:50.360|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/0/error/backend/tap/11 [20100219 11:20:50.360|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/11/error/device/vbd/768 [20100219 11:20:50.366|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/0/error/backend/vbd/11 [20100219 11:20:50.366|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/11/error/device/vbd/5696 [20100219 11:20:50.753|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] Device.Vif.hard_shutdown about to blow away backend and error paths [20100219 11:20:50.754|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/0/error/backend/vif/11 [20100219 11:20:50.754|debug|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|xenops] xenstore-rm /local/domain/11/error/device/vif/1 [20100219 11:20:50.757| warn|culo|6484|Async.VM.clean_shutdown R:78d3c3e28cb6|hotplug] Warning, deleting 'vif' entry from /xapi/11/hotplug/vif/1 [20100219 11:28:13.803|debug|culo|6610 inet-RPC|Connection to VM console R:e9f8b76e8975|console] error: INTERNAL_ERROR: [ Unix.Unix_error(63, "connect", "") ]

Read the article
solved: puppet master REST API returns 403 when running under passenger works when master runs from command line

- by Anadi Misra

I am using the standard auth.conf provided in puppet install for the puppet master which is running through passenger under Nginx. However for most of the catalog, files and certitifcate request I get a 403 response. ### Authenticated paths - these apply only when the client ### has a valid certificate and is thus authenticated # allow nodes to retrieve their own catalog path ~ ^/catalog/([^/]+)$ method find allow $1 # allow nodes to retrieve their own node definition path ~ ^/node/([^/]+)$ method find allow $1 # allow all nodes to access the certificates services path ~ ^/certificate_revocation_list/ca method find allow * # allow all nodes to store their reports path /report method save allow * # unconditionally allow access to all file services # which means in practice that fileserver.conf will # still be used path /file allow * ### Unauthenticated ACL, for clients for which the current master doesn't ### have a valid certificate; we allow authenticated users, too, because ### there isn't a great harm in letting that request through. # allow access to the master CA path /certificate/ca auth any method find allow * path /certificate/ auth any method find allow * path /certificate_request auth any method find, save allow * path /facts auth any method find, search allow * # this one is not stricly necessary, but it has the merit # of showing the default policy, which is deny everything else path / auth any Puppet master however does not seems to be following this as I get this error on client [amisr1@blramisr195602 ~]$ sudo puppet agent --no-daemonize --verbose --server bangvmpllda02.XXXXX.com [sudo] password for amisr1: Starting Puppet client version 3.0.1 Warning: Unable to fetch my node definition, but the agent run will continue: Warning: Error 403 on SERVER: Forbidden request: XX.XXX.XX.XX(XX.XXX.XX.XX) access to /certificate_revocation_list/ca [find] at :110 Info: Retrieving plugin Error: /File[/var/lib/puppet/lib]: Failed to generate additional resources using 'eval_generate: Error 403 on SERVER: Forbidden request: XX.XXX.XX.XX(XX.XXX.XX.XX) access to /file_metadata/plugins [search] at :110 Error: /File[/var/lib/puppet/lib]: Could not evaluate: Error 403 on SERVER: Forbidden request: XX.XXX.XX.XX(XX.XXX.XX.XX) access to /file_metadata/plugins [find] at :110 Could not retrieve file metadata for puppet://devops.XXXXX.com/plugins: Error 403 on SERVER: Forbidden request: XX.XXX.XX.XX(XX.XXX.XX.XX) access to /file_metadata/plugins [find] at :110 Error: Could not retrieve catalog from remote server: Error 403 on SERVER: Forbidden request: XX.XXX.XX.XX(XX.XXX.XX.XX) access to /catalog/blramisr195602.XXXXX.com [find] at :110 Using cached catalog Error: Could not retrieve catalog; skipping run Error: Could not send report: Error 403 on SERVER: Forbidden request: XX.XXX.XX.XX(XX.XXX.XX.XX) access to /report/blramisr195602.XXXXX.com [save] at :110 and the server logs show XX.XXX.XX.XX - - [10/Dec/2012:14:46:52 +0530] "GET /production/certificate_revocation_list/ca? HTTP/1.1" 403 102 "-" "Ruby" XX.XXX.XX.XX - - [10/Dec/2012:14:46:52 +0530] "GET /production/file_metadatas/plugins?links=manage&recurse=true&&ignore=---+%0A++-+%22.svn%22%0A++-+CVS%0A++-+%22.git%22&checksum_type=md5 HTTP/1.1" 403 95 "-" "Ruby" XX.XXX.XX.XX - - [10/Dec/2012:14:46:52 +0530] "GET /production/file_metadata/plugins? HTTP/1.1" 403 93 "-" "Ruby" XX.XXX.XX.XX - - [10/Dec/2012:14:46:53 +0530] "POST /production/catalog/blramisr195602.XXXXX.com HTTP/1.1" 403 106 "-" "Ruby" XX.XXX.XX.XX - - [10/Dec/2012:14:46:53 +0530] "PUT /production/report/blramisr195602.XXXXX.com HTTP/1.1" 403 105 "-" "Ruby" thefile server conf file is as follows (and goin by what they say on puppet site, It is better to regulate access in auth.conf for reaching file server and then allow file server to server all) [files] path /apps/puppet/files allow * [private] path /apps/puppet/private/%H allow * [modules] allow * I am using server and client version 3 Nginx has been compiled using the following options nginx version: nginx/1.3.9 built by gcc 4.4.6 20120305 (Red Hat 4.4.6-4) (GCC) TLS SNI support enabled configure arguments: --prefix=/apps/nginx --conf-path=/apps/nginx/nginx.conf --pid-path=/apps/nginx/run/nginx.pid --error-log-path=/apps/nginx/logs/error.log --http-log-path=/apps/nginx/logs/access.log --with-http_ssl_module --with-http_gzip_static_module --add-module=/usr/lib/ruby/gems/1.8/gems/passenger-3.0.18/ext/nginx --add-module=/apps/Downloads/nginx/nginx-auth-ldap-master/ and the standard nginx puppet master conf server { ssl on; listen 8140 ssl; server_name _; passenger_enabled on; passenger_set_cgi_param HTTP_X_CLIENT_DN $ssl_client_s_dn; passenger_set_cgi_param HTTP_X_CLIENT_VERIFY $ssl_client_verify; passenger_min_instances 5; access_log logs/puppet_access.log; error_log logs/puppet_error.log; root /apps/nginx/html/rack/public; ssl_certificate /var/lib/puppet/ssl/certs/bangvmpllda02.XXXXXX.com.pem; ssl_certificate_key /var/lib/puppet/ssl/private_keys/bangvmpllda02.XXXXXX.com.pem; ssl_crl /var/lib/puppet/ssl/ca/ca_crl.pem; ssl_client_certificate /var/lib/puppet/ssl/certs/ca.pem; ssl_ciphers SSLv2:-LOW:-EXPORT:RC4+RSA; ssl_prefer_server_ciphers on; ssl_verify_client optional; ssl_verify_depth 1; ssl_session_cache shared:SSL:128m; ssl_session_timeout 5m; } Puppet is picking up the correct settings from the files mentioned because config print command points to /etc/puppet [amisr1@bangvmpllDA02 puppet]$ sudo puppet config print | grep conf async_storeconfigs = false authconfig = /etc/puppet/namespaceauth.conf autosign = /etc/puppet/autosign.conf catalog_cache_terminus = store_configs confdir = /etc/puppet config = /etc/puppet/puppet.conf config_file_name = puppet.conf config_version = "" configprint = all configtimeout = 120 dblocation = /var/lib/puppet/state/clientconfigs.sqlite3 deviceconfig = /etc/puppet/device.conf fileserverconfig = /etc/puppet/fileserver.conf genconfig = false hiera_config = /etc/puppet/hiera.yaml localconfig = /var/lib/puppet/state/localconfig name = config rest_authconfig = /etc/puppet/auth.conf storeconfigs = true storeconfigs_backend = puppetdb tagmap = /etc/puppet/tagmail.conf thin_storeconfigs = false I checked the firewall rules on this VM; 80, 443, 8140, 3000 are allowed. Do I still have to tweak any specifics to auth.conf for getting this to work? Update I added verbose logging to the puppet master and restarted nginx; here's the additional info I see in logs Mon Dec 10 18:19:15 +0530 2012 Puppet (err): Could not resolve 10.209.47.31: no name for 10.209.47.31 Mon Dec 10 18:19:15 +0530 2012 access[/] (info): defaulting to no access for 10.209.47.31 Mon Dec 10 18:19:15 +0530 2012 Puppet (warning): Denying access: Forbidden request: 10.209.47.31(10.209.47.31) access to /file_metadata/plugins [find] at :111 Mon Dec 10 18:19:15 +0530 2012 Puppet (err): Forbidden request: 10.209.47.31(10.209.47.31) access to /file_metadata/plugins [find] at :111 10.209.47.31 - - [10/Dec/2012:18:19:15 +0530] "GET /production/file_metadata/plugins? HTTP/1.1" 403 93 "-" "Ruby" On the agent machine facter fqdn and hostname both return a fully qualified host name [amisr1@blramisr195602 ~]$ sudo facter fqdn blramisr195602.XXXXXXX.com I then updated the agent configuration to add dns_alt_names = 10.209.47.31 cleaned all certificates on master and agent and regenerated the certificates and signed them on master using the option --allow-dns-alt-names [amisr1@bangvmpllDA02 ~]$ sudo puppet cert sign blramisr195602.XXXXXX.com Error: CSR 'blramisr195602.XXXXXX.com' contains subject alternative names (DNS:10.209.47.31, DNS:blramisr195602.XXXXXX.com), which are disallowed. Use `puppet cert --allow-dns-alt-names sign blramisr195602.XXXXXX.com` to sign this request. [amisr1@bangvmpllDA02 ~]$ sudo puppet cert --allow-dns-alt-names sign blramisr195602.XXXXXX.com Signed certificate request for blramisr195602.XXXXXX.com Removing file Puppet::SSL::CertificateRequest blramisr195602.XXXXXX.com at '/var/lib/puppet/ssl/ca/requests/blramisr195602.XXXXXX.com.pem' however, that doesn't help either; I get same errors as before. Not sure why in the logs it shows comparing access rules by IP and not hostname. Is there any Nginx configuration to change this behavior?

Read the article
pasenger does not start puppet master under nginx

- by Anadi Misra

On the server [root@bangvmpllDA02 logs]# ruby -v ruby 1.8.7 (2011-06-30 patchlevel 352) [x86_64-linux] [root@bangvmpllDA02 logs]# puppet --version 3.0.1 and [root@bangvmpllDA02 logs]# service nginx configtest nginx: the configuration file /apps/nginx/nginx.conf syntax is ok nginx: configuration file /apps/nginx/nginx.conf test is successful [root@bangvmpllDA02 logs]# service nginx status nginx (pid 25923 25921 25920 25917 25908) is running... [root@bangvmpllDA02 logs]# however none of my agents are able to connect to the master, they all fail with errors like so [amisr1@blramisr195602 ~]$ puppet agent --test --verbose --server bangvmpllda02.XXX.com Info: Creating a new SSL certificate request for blramisr195602.XXX.com Info: Certificate Request fingerprint (SHA256): 26:EB:08:1F:82:32:E4:03:7A:64:8E:30:A3:99:93:26:E6:66:B9:B0:49:B6:08:F9:67:CA:1B:0C:00:B9:1D:41 Error: Could not request certificate: Error 405 on SERVER: <html> <head><title>405 Not Allowed</title></head> <body bgcolor="white"> <center><h1>405 Not Allowed</h1></center> <hr><center>nginx</center> </body> </html> Exiting; failed to retrieve certificate and waitforcert is disabled when I check logs on puppet master [root@bangvmpllDA02 logs]# tail puppet_access.log [05/Dec/2012:17:45:18 +0530] "GET /production/certificate/ca? HTTP/1.1" 404 162 "-" "Ruby" [05/Dec/2012:18:32:23 +0530] "PUT /production/certificate_request/sl63anadi.XXX.com HTTP/1.1" 405 166 "-" "-" [05/Dec/2012:18:33:33 +0530] "GET /production/certificate/sl63anadi.XXX.com? HTTP/1.1" 404 162 "-" "-" [05/Dec/2012:18:33:33 +0530] "GET /production/certificate_request/sl63anadi.XXX.com? HTTP/1.1" 404 162 "-" "-" [05/Dec/2012:18:33:33 +0530] "PUT /production/certificate_request/sl63anadi.XXX.com HTTP/1.1" 405 166 "-" "-" and the error logs show that nginx is not really able to process the request well 2012/12/05 18:33:33 [error] 25920#0: *23 open() "/etc/puppet/rack/public/production/certificate/sl63anadi.XXX.com" failed (2: No such file or directory), client: 10.209.47.26, server: , request: "GET /production/certificate/sl63anadi.XXX.com? HTTP/1.1", host: "bangvmpllda02.XXX.com:8140" 2012/12/05 18:33:33 [error] 25920#0: *24 open() "/etc/puppet/rack/public/production/certificate_request/sl63anadi.XXX.com" failed (2: No such file or directory), client: 10.209.47.26, server: , request: "GET /production/certificate_request/sl63anadi.XXX.com? HTTP/1.1", host: "bangvmpllda02.XXX.com:8140" 2012/12/05 18:47:56 [error] 25923#0: *27 open() "/etc/puppet/rack/public/production/certificate/ca" failed (2: No such file or directory), client: 10.209.47.31, server: , request: "GET /production/certificate/ca? HTTP/1.1", host: "bangvmpllda02.XXX.com:8140" 2012/12/05 18:47:56 [error] 25923#0: *28 open() "/etc/puppet/rack/public/production/certificate_request/blramisr195602.XXX.com" failed (2: No such file or directory), client: 10.209.47.31, server: , request: "GET /production/certificate_request/blramisr195602.XXX.com? HTTP/1.1", host: "bangvmpllda02.XXX.com:8140" Passenger does not show any application groups either [root@bangvmpllDA02 nginx]# passenger-status ----------- General information ----------- max = 15 count = 0 active = 0 inactive = 0 Waiting on global queue: 0 ----------- Application groups ----------- [root@bangvmpllDA02 nginx]# here's my nginx configuration [root@bangvmpllDA02 logs]# cat ../nginx.conf user puppet; worker_processes 4; #error_log logs/error.log; #error_log logs/error.log notice; error_log logs/error.log info; #pid logs/nginx.pid; events { use epoll; worker_connections 1024; } http { include mime.types; default_type application/octet-stream; log_format main '$remote_addr - $remote_user [$time_local] "$request" ' '$status $body_bytes_sent "$http_referer" ' '"$http_user_agent" "$http_x_forwarded_for"'; access_log logs/access.log main; sendfile on; #tcp_nopush on; server_tokens off; #keepalive_timeout 0; keepalive_timeout 120; gzip on; gzip_http_version 1.1; gzip_disable "msie6"; gzip_vary on; gzip_min_length 1100; gzip_buffers 64 8k; gzip_comp_level 3; gzip_proxied any; gzip_types text/plain text/css application/x-javascript text/xml application/xml; server { listen 80; server_name bangvmpllda02.XXXX.com; charset utf-8; #access_log logs/http.access.log main; location / { root html; index index.html index.htm index.php; } #error_page 404 /404.html; # redirect server error pages to the static page /50x.html # error_page 500 502 503 504 /50x.html; location = /50x.html { root html; } # proxy the PHP scripts to Apache listening on 127.0.0.1:80 # #location ~ \.php$ { # proxy_pass http://127.0.0.1; #} # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000 # location ~ \.php$ { root html; fastcgi_pass unix:/var/run/php-fpm/php-fpm.sock; fastcgi_index index.php; fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name; fastcgi_param SCRIPT_NAME $fastcgi_script_name; include fastcgi_params; } # deny access to .htaccess files, if Apache's document root # concurs with nginx's one # location ~ /\.ht { access_log off; log_not_found off; deny all; } location ~* \.(jpg|jpeg|gif|png|css|js|ico|xml)$ { access_log off; log_not_found off; expires 2d; } } # Passenger needed for puppet passenger_root /usr/lib/ruby/gems/1.8/gems/passenger-3.0.18; passenger_ruby /usr/bin/ruby; passenger_max_pool_size 15; server { ssl on; listen 8140 default ssl; server_name bangvmpllda02.XXXX.com; passenger_enabled on; passenger_set_cgi_param HTTP_X_CLIENT_DN $ssl_client_s_dn; passenger_set_cgi_param HTTP_X_CLIENT_VERIFY $ssl_client_verify; passenger_min_instances 5; access_log logs/puppet_access.log; error_log logs/puppet_error.log; root /etc/puppet/rack/public; ssl_certificate /var/lib/puppet/ssl/certs/bangvmpllda02.XXX.com.pem; ssl_certificate_key /var/lib/puppet/ssl/private_keys/bangvmpllda02.XXX.com.pem; ssl_crl /var/lib/puppet/ssl/ca/ca_crl.pem; ssl_client_certificate /var/lib/puppet/ssl/certs/ca.pem; ssl_ciphers SSLv2:-LOW:-EXPORT:RC4+RSA; ssl_prefer_server_ciphers on; ssl_verify_client optional; ssl_verify_depth 1; ssl_session_cache shared:SSL:128m; ssl_session_timeout 5m; } } and the puppet.conf [main] # The Puppet log directory. # The default value is '$vardir/log'. logdir = /var/log/puppet # Where Puppet PID files are kept. # The default value is '$vardir/run'. rundir = /var/run/puppet dns_alt_names = devops.XXXX.com,devops confdir = /etc/puppet vardir = /var/lib/puppet storeconfigs = true storeconfigs_backend = puppetdb thin_storeconfigs = false async_storeconfigs = false ssl_client_header = SSL_CLIENT_S_D ssl_client_verify_header = SSL_CLIENT_VERIFY # Where SSL certificates are kept. # The default value is '$confdir/ssl'. ssldir = $vardir/ssl any ideas where am I going wrong? I checkthe directory permissions; /usr/share/puppet, /etc/puppet and /var/lib/puppet (and files inside them) are owned by puppet user.

Read the article
solved: passenger(mod_rails) fails to start puppet master under nginx

- by Anadi Misra

On the server [root@bangvmpllDA02 logs]# ruby -v ruby 1.8.7 (2011-06-30 patchlevel 352) [x86_64-linux] [root@bangvmpllDA02 logs]# puppet --version 3.0.1 and [root@bangvmpllDA02 logs]# service nginx configtest nginx: the configuration file /apps/nginx/nginx.conf syntax is ok nginx: configuration file /apps/nginx/nginx.conf test is successful [root@bangvmpllDA02 logs]# service nginx status nginx (pid 25923 25921 25920 25917 25908) is running... [root@bangvmpllDA02 logs]# however none of my agents are able to connect to the master, they all fail with errors like so [amisr1@blramisr195602 ~]$ puppet agent --test --verbose --server bangvmpllda02.XXX.com Info: Creating a new SSL certificate request for blramisr195602.XXX.com Info: Certificate Request fingerprint (SHA256): 26:EB:08:1F:82:32:E4:03:7A:64:8E:30:A3:99:93:26:E6:66:B9:B0:49:B6:08:F9:67:CA:1B:0C:00:B9:1D:41 Error: Could not request certificate: Error 405 on SERVER: <html> <head><title>405 Not Allowed</title></head> <body bgcolor="white"> <center><h1>405 Not Allowed</h1></center> <hr><center>nginx</center> </body> </html> Exiting; failed to retrieve certificate and waitforcert is disabled when I check logs on puppet master [root@bangvmpllDA02 logs]# tail puppet_access.log [05/Dec/2012:17:45:18 +0530] "GET /production/certificate/ca? HTTP/1.1" 404 162 "-" "Ruby" [05/Dec/2012:18:32:23 +0530] "PUT /production/certificate_request/sl63anadi.XXX.com HTTP/1.1" 405 166 "-" "-" [05/Dec/2012:18:33:33 +0530] "GET /production/certificate/sl63anadi.XXX.com? HTTP/1.1" 404 162 "-" "-" [05/Dec/2012:18:33:33 +0530] "GET /production/certificate_request/sl63anadi.XXX.com? HTTP/1.1" 404 162 "-" "-" [05/Dec/2012:18:33:33 +0530] "PUT /production/certificate_request/sl63anadi.XXX.com HTTP/1.1" 405 166 "-" "-" and the error logs show that nginx is not really able to process the request well 2012/12/05 18:33:33 [error] 25920#0: *23 open() "/etc/puppet/rack/public/production/certificate/sl63anadi.XXX.com" failed (2: No such file or directory), client: 10.209.47.26, server: , request: "GET /production/certificate/sl63anadi.XXX.com? HTTP/1.1", host: "bangvmpllda02.XXX.com:8140" 2012/12/05 18:33:33 [error] 25920#0: *24 open() "/etc/puppet/rack/public/production/certificate_request/sl63anadi.XXX.com" failed (2: No such file or directory), client: 10.209.47.26, server: , request: "GET /production/certificate_request/sl63anadi.XXX.com? HTTP/1.1", host: "bangvmpllda02.XXX.com:8140" 2012/12/05 18:47:56 [error] 25923#0: *27 open() "/etc/puppet/rack/public/production/certificate/ca" failed (2: No such file or directory), client: 10.209.47.31, server: , request: "GET /production/certificate/ca? HTTP/1.1", host: "bangvmpllda02.XXX.com:8140" 2012/12/05 18:47:56 [error] 25923#0: *28 open() "/etc/puppet/rack/public/production/certificate_request/blramisr195602.XXX.com" failed (2: No such file or directory), client: 10.209.47.31, server: , request: "GET /production/certificate_request/blramisr195602.XXX.com? HTTP/1.1", host: "bangvmpllda02.XXX.com:8140" Passenger does not show any application groups either [root@bangvmpllDA02 nginx]# passenger-status ----------- General information ----------- max = 15 count = 0 active = 0 inactive = 0 Waiting on global queue: 0 ----------- Application groups ----------- [root@bangvmpllDA02 nginx]# here's my nginx configuration [root@bangvmpllDA02 logs]# cat ../nginx.conf user puppet; worker_processes 4; #error_log logs/error.log; #error_log logs/error.log notice; error_log logs/error.log info; #pid logs/nginx.pid; events { use epoll; worker_connections 1024; } http { include mime.types; default_type application/octet-stream; log_format main '$remote_addr - $remote_user [$time_local] "$request" ' '$status $body_bytes_sent "$http_referer" ' '"$http_user_agent" "$http_x_forwarded_for"'; access_log logs/access.log main; sendfile on; #tcp_nopush on; server_tokens off; #keepalive_timeout 0; keepalive_timeout 120; gzip on; gzip_http_version 1.1; gzip_disable "msie6"; gzip_vary on; gzip_min_length 1100; gzip_buffers 64 8k; gzip_comp_level 3; gzip_proxied any; gzip_types text/plain text/css application/x-javascript text/xml application/xml; server { listen 80; server_name bangvmpllda02.XXXX.com; charset utf-8; #access_log logs/http.access.log main; location / { root html; index index.html index.htm index.php; } #error_page 404 /404.html; # redirect server error pages to the static page /50x.html # error_page 500 502 503 504 /50x.html; location = /50x.html { root html; } # proxy the PHP scripts to Apache listening on 127.0.0.1:80 # #location ~ \.php$ { # proxy_pass http://127.0.0.1; #} # pass the PHP scripts to FastCGI server listening on 127.0.0.1:9000 # location ~ \.php$ { root html; fastcgi_pass unix:/var/run/php-fpm/php-fpm.sock; fastcgi_index index.php; fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name; fastcgi_param SCRIPT_NAME $fastcgi_script_name; include fastcgi_params; } # deny access to .htaccess files, if Apache's document root # concurs with nginx's one # location ~ /\.ht { access_log off; log_not_found off; deny all; } location ~* \.(jpg|jpeg|gif|png|css|js|ico|xml)$ { access_log off; log_not_found off; expires 2d; } } # Passenger needed for puppet passenger_root /usr/lib/ruby/gems/1.8/gems/passenger-3.0.18; passenger_ruby /usr/bin/ruby; passenger_max_pool_size 15; server { ssl on; listen 8140 default ssl; server_name bangvmpllda02.XXXX.com; passenger_enabled on; passenger_set_cgi_param HTTP_X_CLIENT_DN $ssl_client_s_dn; passenger_set_cgi_param HTTP_X_CLIENT_VERIFY $ssl_client_verify; passenger_min_instances 5; access_log logs/puppet_access.log; error_log logs/puppet_error.log; root /etc/puppet/rack/public; ssl_certificate /var/lib/puppet/ssl/certs/bangvmpllda02.XXX.com.pem; ssl_certificate_key /var/lib/puppet/ssl/private_keys/bangvmpllda02.XXX.com.pem; ssl_crl /var/lib/puppet/ssl/ca/ca_crl.pem; ssl_client_certificate /var/lib/puppet/ssl/certs/ca.pem; ssl_ciphers SSLv2:-LOW:-EXPORT:RC4+RSA; ssl_prefer_server_ciphers on; ssl_verify_client optional; ssl_verify_depth 1; ssl_session_cache shared:SSL:128m; ssl_session_timeout 5m; } } and the puppet.conf [main] # The Puppet log directory. # The default value is '$vardir/log'. logdir = /var/log/puppet # Where Puppet PID files are kept. # The default value is '$vardir/run'. rundir = /var/run/puppet dns_alt_names = devops.XXXX.com,devops confdir = /etc/puppet vardir = /var/lib/puppet storeconfigs = true storeconfigs_backend = puppetdb thin_storeconfigs = false async_storeconfigs = false ssl_client_header = SSL_CLIENT_S_D ssl_client_verify_header = SSL_CLIENT_VERIFY # Where SSL certificates are kept. # The default value is '$confdir/ssl'. ssldir = $vardir/ssl any ideas where am I going wrong? I checkthe directory permissions; /usr/share/puppet, /etc/puppet and /var/lib/puppet (and files inside them) are owned by puppet user. Solved The simple solution to my complicated problem was that I had placed the config.ru in wrong place moved it to /etc/puppet/rack , it was in /etc/puppet/rack/public Well!!! :-/

Read the article
Connect ps/2->usb keyboard to linux?

- by Daniel

I have a lovely ancient ergonomic keyboard (no name SK - 6000) connected via a DIN-ps/2 adapter to a ps/2-usb adapter to my docking station. After Grub it stops working. It takes either suspending and waking up or replugging it while Linux is running to get it to work. No extra kernel modules get loaded for this. When it works and I restart without power off, it will work immediately. Even when it does not work, it is visible (lsusb device number varies but output is identical whether working or not): $ lsusb -v -s 001:006 Bus 001 Device 006: ID 0a81:0205 Chesen Electronics Corp. PS/2 Keyboard+Mouse Adapter Device Descriptor: bLength 18 bDescriptorType 1 bcdUSB 1.10 bDeviceClass 0 (Defined at Interface level) bDeviceSubClass 0 bDeviceProtocol 0 bMaxPacketSize0 8 idVendor 0x0a81 Chesen Electronics Corp. idProduct 0x0205 PS/2 Keyboard+Mouse Adapter bcdDevice 0.10 iManufacturer 1 CHESEN iProduct 2 PS2 to USB Converter iSerial 0 bNumConfigurations 1 Configuration Descriptor: bLength 9 bDescriptorType 2 wTotalLength 59 bNumInterfaces 2 bConfigurationValue 1 iConfiguration 2 PS2 to USB Converter bmAttributes 0xa0 (Bus Powered) Remote Wakeup MaxPower 100mA Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 0 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 3 Human Interface Device bInterfaceSubClass 1 Boot Interface Subclass bInterfaceProtocol 1 Keyboard iInterface 0 HID Device Descriptor: bLength 9 bDescriptorType 33 bcdHID 1.10 bCountryCode 0 Not supported bNumDescriptors 1 bDescriptorType 34 Report wDescriptorLength 64 Report Descriptors: ** UNAVAILABLE ** Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x81 EP 1 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0008 1x 8 bytes bInterval 10 Interface Descriptor: bLength 9 bDescriptorType 4 bInterfaceNumber 1 bAlternateSetting 0 bNumEndpoints 1 bInterfaceClass 3 Human Interface Device bInterfaceSubClass 1 Boot Interface Subclass bInterfaceProtocol 2 Mouse iInterface 0 HID Device Descriptor: bLength 9 bDescriptorType 33 bcdHID 1.10 bCountryCode 0 Not supported bNumDescriptors 1 bDescriptorType 34 Report wDescriptorLength 148 Report Descriptors: ** UNAVAILABLE ** Endpoint Descriptor: bLength 7 bDescriptorType 5 bEndpointAddress 0x82 EP 2 IN bmAttributes 3 Transfer Type Interrupt Synch Type None Usage Type Data wMaxPacketSize 0x0008 1x 8 bytes bInterval 10 Device Status: 0x0000 (Bus Powered) $ ll -R /sys/bus/hid/drivers/ /sys/bus/hid/drivers/: total 0 drwxr-xr-x 2 root root 0 Jul 8 2012 generic-usb/ /sys/bus/hid/drivers/generic-usb: total 0 lrwxrwxrwx 1 root root 0 Jul 7 23:33 0003:046D:C03D.0003 -> ../../../../devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.2/1-1.2.2:1.0/0003:046D:C03D.0003/ lrwxrwxrwx 1 root root 0 Jul 7 23:33 0003:0A81:0205.0001 -> ../../../../devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001/ lrwxrwxrwx 1 root root 0 Jul 7 23:33 0003:0A81:0205.0002 -> ../../../../devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.1/0003:0A81:0205.0002/ --w------- 1 root root 4096 Jul 7 23:32 bind lrwxrwxrwx 1 root root 0 Jul 7 23:33 module -> ../../../../module/usbhid/ --w------- 1 root root 4096 Jul 7 23:32 new_id --w------- 1 root root 4096 Jul 8 2012 uevent --w------- 1 root root 4096 Jul 7 23:32 unbind When replugging, dmesg shows this (which except for the 1st line and different input numbers already came at boot time): [ 1583.295385] usb 1-1.2.1: new low-speed USB device number 6 using ehci_hcd [ 1583.446514] input: CHESEN PS2 to USB Converter as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/input/input17 [ 1583.446817] generic-usb 0003:0A81:0205.0001: input,hidraw0: USB HID v1.10 Keyboard [CHESEN PS2 to USB Converter] on usb-0000:00:1a.0-1.2.1/input0 [ 1583.454764] input: CHESEN PS2 to USB Converter as /devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.1/input/input18 [ 1583.455534] generic-usb 0003:0A81:0205.0002: input,hidraw1: USB HID v1.10 Mouse [CHESEN PS2 to USB Converter] on usb-0000:00:1a.0-1.2.1/input1 [ 1583.455578] usbcore: registered new interface driver usbhid [ 1583.455584] usbhid: USB HID core driver So I tried $ sudo udevadm test /sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001/hidraw/hidraw0 run_command: calling: test adm_test: version 175 This program is for debugging only, it does not run any program, specified by a RUN key. It may show incorrect results, because some values may be different, or not available at a simulation run. parse_file: reading '/lib/udev/rules.d/40-crda.rules' as rules file parse_file: reading '/lib/udev/rules.d/40-fuse.rules' as rules file ... parse_file: reading '/lib/udev/rules.d/40-usb-media-players.rules' as rules file parse_file: reading '/lib/udev/rules.d/40-usb_modeswitch.rules' as rules file ... parse_file: reading '/lib/udev/rules.d/42-qemu-usb.rules' as rules file ... parse_file: reading '/lib/udev/rules.d/69-cd-sensors.rules' as rules file add_rule: IMPORT found builtin 'usb_id', replacing /lib/udev/rules.d/69-cd-sensors.rules:76 ... parse_file: reading '/lib/udev/rules.d/77-mm-usb-device-blacklist.rules' as rules file ... parse_file: reading '/lib/udev/rules.d/85-usbmuxd.rules' as rules file ... parse_file: reading '/lib/udev/rules.d/95-upower-hid.rules' as rules file parse_file: reading '/lib/udev/rules.d/95-upower-wup.rules' as rules file parse_file: reading '/lib/udev/rules.d/97-bluetooth-hid2hci.rules' as rules file udev_rules_new: rules use 271500 bytes tokens (22625 * 12 bytes), 44331 bytes buffer udev_rules_new: temporary index used 76320 bytes (3816 * 20 bytes) udev_device_new_from_syspath: device 0x7f78a5e4d2d0 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001/hidraw/hidraw0' udev_device_new_from_syspath: device 0x7f78a5e5f820 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001/hidraw/hidraw0' udev_device_read_db: device 0x7f78a5e5f820 filled with db file data udev_device_new_from_syspath: device 0x7f78a5e60270 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001' udev_device_new_from_syspath: device 0x7f78a5e609c0 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0' udev_device_new_from_syspath: device 0x7f78a5e61160 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1' udev_device_new_from_syspath: device 0x7f78a5e61960 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2' udev_device_new_from_syspath: device 0x7f78a5e62150 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1/1-1' udev_device_new_from_syspath: device 0x7f78a5e62940 has devpath '/devices/pci0000:00/0000:00:1a.0/usb1' udev_device_new_from_syspath: device 0x7f78a5e630f0 has devpath '/devices/pci0000:00/0000:00:1a.0' udev_device_new_from_syspath: device 0x7f78a5e638a0 has devpath '/devices/pci0000:00' udev_event_execute_rules: no node name set, will use kernel supplied name 'hidraw0' udev_node_add: creating device node '/dev/hidraw0', devnum=251:0, mode=0600, uid=0, gid=0 udev_node_mknod: preserve file '/dev/hidraw0', because it has correct dev_t udev_node_mknod: preserve permissions /dev/hidraw0, 020600, uid=0, gid=0 node_symlink: preserve already existing symlink '/dev/char/251:0' to '../hidraw0' udev_device_update_db: created empty file '/run/udev/data/c251:0' for '/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001/hidraw/hidraw0' ACTION=add DEVNAME=/dev/hidraw0 DEVPATH=/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001/hidraw/hidraw0 MAJOR=251 MINOR=0 SUBSYSTEM=hidraw UDEV_LOG=6 USEC_INITIALIZED=969079051 The later lines sound like it's already there. And none of these awakes the keyboard: $ sudo udevadm trigger --verbose --sysname-match=usb* /sys/devices/pci0000:00/0000:00:1a.0/usb1 /sys/devices/pci0000:00/0000:00:1a.0/usbmon/usbmon1 /sys/devices/pci0000:00/0000:00:1d.0/usb2 /sys/devices/pci0000:00/0000:00:1d.0/usbmon/usbmon2 /sys/devices/virtual/usbmon/usbmon0 $ sudo udevadm trigger --verbose --sysname-match=hidraw0 /sys/devices/pci0000:00/0000:00:1a.0/usb1/1-1/1-1.2/1-1.2.1/1-1.2.1:1.0/0003:0A81:0205.0001/hidraw/hidraw0 $ sudo udevadm trigger I also tried this to no avail: # echo -n 0003:0A81:0205.0001 > /sys/bus/hid/drivers/generic-usb/bind ksh: echo: write to 1 failed [No such device] # echo -n 0003:0A81:0205.0001 > /sys/bus/hid/drivers/generic-usb/unbind # echo -n 0003:0A81:0205.0001 > /sys/bus/hid/drivers/generic-usb/bind # echo usb1 >/sys/bus/usb/drivers/usb/unbind # echo usb1 >/sys/bus/usb/drivers/usb/bind What else should I try to get the same result as replugging or suspending, by just issuing a command?

Read the article
Why are my hard drives failing?

- by WishCow

I have a small Ubuntu server running at home, with 2 HDDs. There are two software raids (raid1) on the disks, managed by mdadm, which I believe is irrelevant, but mentioning it anyway. Both of the HDDs are Western Digital, and have been used for around 2 years, when one of them started making clicking noises, and died. I figured that maybe it's natural after 2 years, so I bought a new one, and resynced the raid arrays. After about a month, the other drive also died. I didn't get suspicious, since both drives have been bought at the same time, it's not that surprising to see both of them near each other, so I bought another one. So far, 2 old drives failed, and 2 brand new in the system. After one month, one of the new drives died. This is when it started getting suspicious. Since the PC was put together from some really old parts (think AthlonXP), I figured that maybe the motherboard's SATA controller is the culprit. Of course you can't switch parts easily in an old PC like this, so I bought a whole system, new MB, new CPU, new RAM. Took the just failed drive back, since it was under warranty, and got it replaced. So it is up to 2 failed drives from the old ones, and 1 failed drive from the new ones. No problems, for 1 month. After that errors were creeping up again in /var/log/messages, and mdadm was reporting raid array failures. I started tearing my hair out. Everything is new in the system, it's up to the third brand new HDD, it's simply not possible that all of the new drives that I bought were faulty. Let's see what is still common... the cables. Okay, long shot, let's replace the SATA cables. Take HDD back, smile to the guy at the counter and say that I'm really unlucky. He replaces the HDD. I come home, one month passes and one of HDDs fails, again. I'm not joking. Two of the brand new HDDs have failed. Maybe it's a bug in the OS. Let's see what the manufacturer's testing tool says. Download testing tool, burn it to a CD, reboot, leave HDD testing overnight. Test says that the drive is faulty, and I should back up everything, if I still can. I don't know what's happening, but it does not look like a software problem, something is definitely thrashing the HDDs. I should mention now, that the whole system is in a shoebox. Since there are a load of "build your own ikea case" stuff, I thought there shouldn't be any problems throwing the thing in a box, and stuffing it away somewhere. The box is well ventilated, but I thought that just maybe the drives were overheating. There is no other possible answer to this. So I took the HDD back, and got it replaced (for the 3rd time), and bought HDD coolers. And just now, I have heard the sound of doom. click click whizzzzzzzzz. SSH into the box: You have new mail! mail r 1 DegradedArrayEvent on /dev/md0 ... dmesg output: [47128.000051] ata3: lost interrupt (Status 0x50) [47128.000097] end_request: I/O error, dev sda, sector 58588863 [47128.000134] md: super_written gets error=-5, uptodate=0 [48043.976054] ata3: lost interrupt (Status 0x50) [48043.976086] ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen [48043.976132] ata3.00: cmd c8/00:18:bf:40:52/00:00:00:00:00/e1 tag 0 dma 12288 in [48043.976135] res 40/00:00:00:4f:c2/00:00:00:00:00/00 Emask 0x4 (timeout) [48043.976208] ata3.00: status: { DRDY } [48043.976241] ata3: soft resetting link [48044.148446] ata3.00: configured for UDMA/133 [48044.148457] ata3.00: device reported invalid CHS sector 0 [48044.148477] ata3: EH complete Recap: No possibility of overheating 6 drives have failed, 4 of those have been brand new. I'm not sure now that the original two have been faulty, or suffered the same thing that the new ones. There is nothing common in the system, apart from the OS which is Ubuntu Karmic now (started with Jaunty). New MB, new CPU, new RAM, new SATA cables. No, the little holes on the HDD are not covered I'm crying. Really. I don't have the face to return to the store now, it's not possible for 4 drives to fail under 4 months. A few ideas that I have been thinking: Is it possible that I fuck up something when I partition and resync the drives? Can it be so bad that it physicaly wrecks the drive? (since the vendor supplied tool says that the drive is damaged) I do the partitoning with fdisk, and use the same block size for the raid1 partitions (I check the exact block sizes with fdisk -lu) Is it possible that the linux kernel or mdadm, or something is not compatible with this exact brand of HDDs, and thrashes them? Is it possible that it may be the shoebox? Try placing it somewhere else? It's under a shelf now, so humidity is not a problem either. Is it possible that a normal PC case will solve my problem (I'm going to shoot myself then)? I will get a picture tomorrow. Am I just simply cursed? Any help or speculation is greatly appreciated. Edit: The power strip is guarded against overvoltage. Edit2: I have moved inbetween these 4 months, so the possibility of the cause being "dirty" electricity in both places, is very low. Edit3: I have checked the voltages in the BIOS (couldn't borrow a multimeter), and they are all seem correct, the biggest discrepancy is in the 12V, because it's supplying 11.3. Should I be worried about that? Edit4: I put my desktop PC's PSU into the server. The BIOS reported much more accurate voltage readings, and also it has successfully rebuilt the raid1 array, which took some 3-4 hours, so I feel a little positive now. Will get a new PSU tomorrow to test with that. Also, attaching the picture about the box: (disregard the 3rd drive)

Read the article
Windows Server 2008 R2 network adapter stops working, requires hard reboot

- by Geoff Dalgas

TL;DR version: Turns out this was a Windows Server 2008 R2 kernel networking bug. After siccing Microsoft support on it, we (eventually) got an unpublished kernel hotfix from Microsoft to address it. If you, too, are experiencing mysterious low-level network driver failures requiring a reboot/bluescreen cycle, you might want that hotfix (or maybe Service Pack 1 whenever it is released, too.) We have been using HAProxy along with heartbeat from the Linux-HA project. We are using two linux instances to provide a failover. Each server has with their own public IP and a single IP which is shared between the two using a virtual interface (eth1:1) at IP: 69.59.196.211 The virtual interface (eth1:1) IP 69.59.196.211 is configured as the gateway for the windows servers behind them and we use ip_forwarding to route traffic. We are experiencing an occasional network outage on one of our windows servers behind our linux gateways. HAProxy will detect the server is offline which we can verify by remoting to the failed server and attempting to ping the gateway: Pinging 69.59.196.211 with 32 bytes of data: Reply from 69.59.196.220: Destination host unreachable. Running arp -a on this failed server shows that there is no entry for the gateway address (69.59.196.211): Interface: 69.59.196.220 --- 0xa Internet Address Physical Address Type 69.59.196.161 00-26-88-63-c7-80 dynamic 69.59.196.210 00-15-5d-0a-3e-0e dynamic 69.59.196.212 00-21-5e-4d-45-c9 dynamic 69.59.196.213 00-15-5d-00-b2-0d dynamic 69.59.196.215 00-21-5e-4d-61-1a dynamic 69.59.196.217 00-21-5e-4d-2c-e8 dynamic 69.59.196.219 00-21-5e-4d-38-e5 dynamic 69.59.196.221 00-15-5d-00-b2-0d dynamic 69.59.196.222 00-15-5d-0a-3e-09 dynamic 69.59.196.223 ff-ff-ff-ff-ff-ff static 224.0.0.22 01-00-5e-00-00-16 static 224.0.0.252 01-00-5e-00-00-fc static 225.0.0.1 01-00-5e-00-00-01 static On our linux gateway instances arp -a shows: peak-colo-196-220.peak.org (69.59.196.220) at <incomplete> on eth1 stackoverflow.com (69.59.196.212) at 00:21:5e:4d:45:c9 [ether] on eth1 peak-colo-196-215.peak.org (69.59.196.215) at 00:21:5e:4d:61:1a [ether] on eth1 peak-colo-196-219.peak.org (69.59.196.219) at 00:21:5e:4d:38:e5 [ether] on eth1 peak-colo-196-222.peak.org (69.59.196.222) at 00:15:5d:0a:3e:09 [ether] on eth1 peak-colo-196-209.peak.org (69.59.196.209) at 00:26:88:63:c7:80 [ether] on eth1 peak-colo-196-217.peak.org (69.59.196.217) at 00:21:5e:4d:2c:e8 [ether] on eth1 Why would arp occasionally set the entry for this failed server as <incomplete>? Should we be defining our arp entries statically? I've always left arp alone since it works 99% of the time, but in this one instance it appears to be failing. Are there any additional troubleshooting steps we can take help resolve this issue? THINGS WE HAVE TRIED I added a static arp entry for testing on one of the linux gateways which still didn't help. root@haproxy2:~# arp -a peak-colo-196-215.peak.org (69.59.196.215) at 00:21:5e:4d:61:1a [ether] on eth1 peak-colo-196-221.peak.org (69.59.196.221) at 00:15:5d:00:b2:0d [ether] on eth1 stackoverflow.com (69.59.196.212) at 00:21:5e:4d:45:c9 [ether] on eth1 peak-colo-196-219.peak.org (69.59.196.219) at 00:21:5e:4d:38:e5 [ether] on eth1 peak-colo-196-209.peak.org (69.59.196.209) at 00:26:88:63:c7:80 [ether] on eth1 peak-colo-196-217.peak.org (69.59.196.217) at 00:21:5e:4d:2c:e8 [ether] on eth1 peak-colo-196-220.peak.org (69.59.196.220) at 00:21:5e:4d:30:8d [ether] PERM on eth1 root@haproxy2:~# arp -i eth1 -s 69.59.196.220 00:21:5e:4d:30:8d root@haproxy2:~# ping 69.59.196.220 PING 69.59.196.220 (69.59.196.220) 56(84) bytes of data. --- 69.59.196.220 ping statistics --- 7 packets transmitted, 0 received, 100% packet loss, time 6006ms Rebooting the windows web server solves this issue temporarily with no other changes to the network but our experience shows this issue will come back. Swapping network cards and switches I noticed the link light on the port of the switch for the failed windows server was running at 100Mb instead of 1Gb on the failed interface. I moved the cable to several other open ports and the link indicated 100Mb for each port that I tried. I also swapped the cable with the same result. I tried changing the properties of the network card in windows and the server locked up and required a hard reset after clicking apply. This windows server has two physical network interfaces so I have swapped the cables and network settings on the two interfaces to see if the problem follows the interface. If the public interface goes down again we will know that it is not an issue with the network card. (We also tried another switch we have on hand, no change) Changing network hardware driver versions We've had the same problem with the latest Broadcom driver, as well as the built-in driver that ships in Windows Server 2008 R2. Replacing network cables As a last ditch effort we remembered another change that occurred was the replacement of all of the patch cords between our servers / switch. We had purchased two sets, one green of lengths 1ft - 3ft for the private interfaces and another set of red cables for the public interfaces. We swapped out all of the public interface patch cables with a different brand and ran our servers without issue for a full week ... aaaaaand then the problem recurred. Disable checksum offload, remove TProxy We also tried disabling TCP/IP checksum offload in the driver, no change. We're now pulling out TProxy and moving to a more traditional x-forwarded-for network arrangement without any fancy IP address rewriting. We'll see if that helps. Switch Virtualization providers On the off chance this was related to Hyper-V in some way (we do host Linux VMs on it), we switched to VMWare Server. No change. Switch host model We've reached the end of our troubleshooting rope and are now formally involving Microsoft support. They recommended changing the host model: http://en.wikipedia.org/wiki/Host_model http://technet.microsoft.com/en-us/magazine/2007.09.cableguy.aspx We did that, and.. we'll see.

Read the article
Linux server is only using 60% of memory, then swapping

- by Kamil Kisiel

I've got a Linux server that's running our bacula backup system. The machine is grinding like mad because it's going heavy in to swap. The problem is, it's only using 60% of its physical memory! Here's the output from free -m: free -m total used free shared buffers cached Mem: 3949 2356 1593 0 0 1 -/+ buffers/cache: 2354 1595 Swap: 7629 1804 5824 and some sample output from vmstat 1: procs -----------memory---------- ---swap-- -----io---- -system-- -----cpu------ r b swpd free buff cache si so bi bo in cs us sy id wa st 0 2 1843536 1634512 0 4188 54 13 2524 666 2 1 1 1 89 9 0 1 11 1845916 1640724 0 388 2700 4816 221880 4879 14409 170721 4 3 63 30 0 0 9 1846096 1643952 0 0 4956 756 174832 804 12357 159306 3 4 63 30 0 0 11 1846104 1643532 0 0 4916 540 174320 580 10609 139960 3 4 64 29 0 0 4 1846084 1640272 0 2336 4080 524 140408 548 9331 118287 3 4 63 30 0 0 8 1846104 1642096 0 1488 2940 432 102516 457 7023 82230 2 4 65 29 0 0 5 1846104 1642268 0 1276 3704 452 126520 452 9494 119612 3 5 65 27 0 3 12 1846104 1641528 0 328 6092 608 187776 636 8269 113059 4 3 64 29 0 2 2 1846084 1640960 0 724 5948 0 111480 0 7751 116370 4 4 63 29 0 0 4 1846100 1641484 0 404 4144 1476 125760 1500 10668 105358 2 3 71 25 0 0 13 1846104 1641932 0 0 5872 828 153808 840 10518 128447 3 4 70 22 0 0 8 1846096 1639172 0 3164 3556 556 74884 580 5082 65362 2 2 73 23 0 1 4 1846080 1638676 0 396 4512 28 50928 44 2672 38277 2 2 80 16 0 0 3 1846080 1628808 0 7132 2636 0 28004 8 1358 14090 0 1 78 20 0 0 2 1844728 1618552 0 11140 7680 0 12740 8 763 2245 0 0 82 18 0 0 2 1837764 1532056 0 101504 2952 0 95644 24 802 3817 0 1 87 12 0 0 11 1842092 1633324 0 4416 1748 10900 143144 11024 6279 134442 3 3 70 24 0 2 6 1846104 1642756 0 0 4768 468 78752 468 4672 60141 2 2 76 20 0 1 12 1846104 1640792 0 236 4752 440 140712 464 7614 99593 3 5 58 34 0 0 3 1846084 1630368 0 6316 5104 0 20336 0 1703 22424 1 1 72 26 0 2 17 1846104 1638332 0 3168 4080 1720 211960 1744 11977 155886 3 4 65 28 0 1 10 1846104 1640800 0 132 4488 556 126016 584 8016 106368 3 4 63 29 0 0 14 1846104 1639740 0 2248 3436 428 114188 452 7030 92418 3 3 59 35 0 1 6 1846096 1639504 0 1932 5500 436 141412 460 8261 112210 4 4 63 29 0 0 10 1846104 1640164 0 3052 4028 448 147684 472 7366 109554 4 4 61 30 0 0 10 1846100 1641040 0 2332 4952 632 147452 664 8767 118384 3 4 63 30 0 4 8 1846084 1641092 0 664 4948 276 152264 292 6448 98813 5 5 62 28 0 Furthermore, the output of top sorted by CPU time seems to support the theory that swap is what's bogging down the system: top - 09:05:32 up 37 days, 23:24, 1 user, load average: 9.75, 8.24, 7.12 Tasks: 173 total, 1 running, 172 sleeping, 0 stopped, 0 zombie Cpu(s): 1.6%us, 1.4%sy, 0.0%ni, 76.1%id, 20.6%wa, 0.1%hi, 0.2%si, 0.0%st Mem: 4044632k total, 2405628k used, 1639004k free, 0k buffers Swap: 7812492k total, 1851852k used, 5960640k free, 436k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ TIME COMMAND 4174 root 17 0 63156 176 56 S 8 0.0 2138:52 35,38 bacula-fd 4185 root 17 0 63352 284 104 S 6 0.0 1709:25 28,29 bacula-sd 240 root 15 0 0 0 0 D 3 0.0 831:55.19 831:55 kswapd0 2852 root 10 -5 0 0 0 S 1 0.0 126:35.59 126:35 xfsbufd 2849 root 10 -5 0 0 0 S 0 0.0 119:50.94 119:50 xfsbufd 1364 root 10 -5 0 0 0 S 0 0.0 117:05.39 117:05 xfsbufd 21 root 10 -5 0 0 0 S 1 0.0 48:03.44 48:03 events/3 6940 postgres 16 0 43596 8 8 S 0 0.0 46:50.35 46:50 postmaster 1342 root 10 -5 0 0 0 S 0 0.0 23:14.34 23:14 xfsdatad/4 5415 root 17 0 1770m 108 48 S 0 0.0 15:03.74 15:03 bacula-dir 23 root 10 -5 0 0 0 S 0 0.0 13:09.71 13:09 events/5 5604 root 17 0 1216m 500 200 S 0 0.0 12:38.20 12:38 java 5552 root 16 0 1194m 580 248 S 0 0.0 11:58.00 11:58 java Here's the same sorted by virtual memory image size: top - 09:08:32 up 37 days, 23:27, 1 user, load average: 8.43, 8.26, 7.32 Tasks: 173 total, 1 running, 172 sleeping, 0 stopped, 0 zombie Cpu(s): 3.6%us, 3.4%sy, 0.0%ni, 62.2%id, 30.2%wa, 0.2%hi, 0.3%si, 0.0%st Mem: 4044632k total, 2404212k used, 1640420k free, 0k buffers Swap: 7812492k total, 1852548k used, 5959944k free, 100k cached PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ TIME COMMAND 5415 root 17 0 1770m 56 44 S 0 0.0 15:03.78 15:03 bacula-dir 5604 root 17 0 1216m 492 200 S 0 0.0 12:38.30 12:38 java 5552 root 16 0 1194m 476 200 S 0 0.0 11:58.20 11:58 java 4598 root 16 0 117m 44 44 S 0 0.0 0:13.37 0:13 eventmond 9614 gdm 16 0 93188 0 0 S 0 0.0 0:00.30 0:00 gdmgreeter 5527 root 17 0 78716 0 0 S 0 0.0 0:00.30 0:00 gdm 4185 root 17 0 63352 284 104 S 20 0.0 1709:52 28,29 bacula-sd 4174 root 17 0 63156 208 88 S 24 0.0 2139:25 35,39 bacula-fd 10849 postgres 18 0 54740 216 108 D 0 0.0 0:31.40 0:31 postmaster 6661 postgres 17 0 49432 0 0 S 0 0.0 0:03.50 0:03 postmaster 5507 root 15 0 47980 0 0 S 0 0.0 0:00.00 0:00 gdm 6940 postgres 16 0 43596 16 16 S 0 0.0 46:51.39 46:51 postmaster 5304 postgres 16 0 40580 132 88 S 0 0.0 6:21.79 6:21 postmaster 5301 postgres 17 0 40448 24 24 S 0 0.0 0:32.17 0:32 postmaster 11280 root 16 0 40288 28 28 S 0 0.0 0:00.11 0:00 sshd 5534 root 17 0 37580 0 0 S 0 0.0 0:56.18 0:56 X 30870 root 30 15 31668 28 28 S 0 0.0 1:13.38 1:13 snmpd 5305 postgres 17 0 30628 16 16 S 0 0.0 0:11.60 0:11 postmaster 27403 postfix 17 0 30248 0 0 S 0 0.0 0:02.76 0:02 qmgr 10815 postfix 15 0 30208 16 16 S 0 0.0 0:00.02 0:00 pickup 5306 postgres 16 0 29760 20 20 S 0 0.0 0:52.89 0:52 postmaster 5302 postgres 17 0 29628 64 32 S 0 0.0 1:00.64 1:00 postmaster I've tried tuning the swappiness kernel parameter to both high and low values, but nothing appears to change the behavior here. I'm at a loss to figure out what's going on. How can I find out what's causing this? Update: The system is a fully 64-bit system, so there should be no question of memory limitations due to 32-bit issues. Update2: As I mentioned in the original question, I've already tried tuning swappiness to all sorts of values, including 0. The result is always the same, with approximately 1.6 GB of memory remaining unused. Update3: Added top output to the above info.

Read the article
WCF WS-Security and WSE Nonce Authentication

- by Rick Strahl

WCF makes it fairly easy to access WS-* Web Services, except when you run into a service format that it doesn't support. Even then WCF provides a huge amount of flexibility to make the service clients work, however finding the proper interfaces to make that happen is not easy to discover and for the most part undocumented unless you're lucky enough to run into a blog, forum or StackOverflow post on the matter. This is definitely true for the Password Nonce as part of the WS-Security/WSE protocol, which is not natively supported in WCF. Specifically I had a need to create a WCF message on the client that includes a WS-Security header that looks like this from their spec document:<soapenv:Header> <wsse:Security soapenv:mustUnderstand="1" xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd"> <wsse:UsernameToken wsu:Id="UsernameToken-8" xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd"> <wsse:Username>TeStUsErNaMe1</wsse:Username> <wsse:Password Type="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0#PasswordText" >TeStPaSsWoRd1</wsse:Password> <wsse:Nonce EncodingType="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-soap-message-security-1.0#Base64Binary" >f8nUe3YupTU5ISdCy3X9Gg==</wsse:Nonce> <wsu:Created>2011-05-04T19:01:40.981Z</wsu:Created> </wsse:UsernameToken> </wsse:Security> </soapenv:Header> Specifically, the Nonce and Created keys are what WCF doesn't create or have a built in formatting for. Why is there a nonce? My first thought here was WTF? The username and password are there in clear text, what does the Nonce accomplish? The Nonce and created keys are are part of WSE Security specification and are meant to allow the server to detect and prevent replay attacks. The hashed nonce should be unique per request which the server can store and check for before running another request thus ensuring that a request is not replayed with exactly the same values. Basic ServiceUtl Import - not much Luck The first thing I did when I imported this service with a service reference was to simply import it as a Service Reference. The Add Service Reference import automatically detects that WS-Security is required and appropariately adds the WS-Security to the basicHttpBinding in the config file:<?xml version="1.0" encoding="utf-8" ?> <configuration> <system.serviceModel> <bindings> <basicHttpBinding> <binding name="RealTimeOnlineSoapBinding"> <security mode="Transport" /> </binding> <binding name="RealTimeOnlineSoapBinding1" /> </basicHttpBinding> </bindings> <client> <endpoint address="https://notarealurl.com:443/services/RealTimeOnline" binding="basicHttpBinding" bindingConfiguration="RealTimeOnlineSoapBinding" contract="RealTimeOnline.RealTimeOnline" name="RealTimeOnline" /> </client> </system.serviceModel> </configuration> If if I run this as is using code like this:var client = new RealTimeOnlineClient(); client.ClientCredentials.UserName.UserName = "TheUsername"; client.ClientCredentials.UserName.Password = "ThePassword"; … I get nothing in terms of WS-Security headers. The request is sent, but the the binding expects transport level security to be applied, rather than message level security. To fix this so that a WS-Security message header is sent the security mode can be changed to: <security mode="TransportWithMessageCredential" /> Now if I re-run I at least get a WS-Security header which looks like this:<s:Envelope xmlns:s="http://schemas.xmlsoap.org/soap/envelope/" xmlns:u="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd"> <s:Header> <o:Security s:mustUnderstand="1" xmlns:o="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd"> <u:Timestamp u:Id="_0"> <u:Created>2012-11-24T02:55:18.011Z</u:Created> <u:Expires>2012-11-24T03:00:18.011Z</u:Expires> </u:Timestamp> <o:UsernameToken u:Id="uuid-18c215d4-1106-40a5-8dd1-c81fdddf19d3-1"> <o:Username>TheUserName</o:Username> <o:Password Type="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0#PasswordText" >ThePassword</o:Password> </o:UsernameToken> </o:Security> </s:Header> Closer! Now the WS-Security header is there along with a timestamp field (which might not be accepted by some WS-Security expecting services), but there's no Nonce or created timestamp as required by my original service. Using a CustomBinding instead My next try was to go with a CustomBinding instead of basicHttpBinding as it allows a bit more control over the protocol and transport configurations for the binding. Specifically I can explicitly specify the message protocol(s) used. Using configuration file settings here's what the config file looks like:<?xml version="1.0"?> <configuration> <system.serviceModel> <bindings> <customBinding> <binding name="CustomSoapBinding"> <security includeTimestamp="false" authenticationMode="UserNameOverTransport" defaultAlgorithmSuite="Basic256" requireDerivedKeys="false" messageSecurityVersion="WSSecurity10WSTrustFebruary2005WSSecureConversationFebruary2005WSSecurityPolicy11BasicSecurityProfile10"> </security> <textMessageEncoding messageVersion="Soap11"></textMessageEncoding> <httpsTransport maxReceivedMessageSize="2000000000"/> </binding> </customBinding> </bindings> <client> <endpoint address="https://notrealurl.com:443/services/RealTimeOnline" binding="customBinding" bindingConfiguration="CustomSoapBinding" contract="RealTimeOnline.RealTimeOnline" name="RealTimeOnline" /> </client> </system.serviceModel> <startup> <supportedRuntime version="v4.0" sku=".NETFramework,Version=v4.0"/> </startup> </configuration> This ends up creating a cleaner header that's missing the timestamp field which can cause some services problems. The WS-Security header output generated with the above looks like this:<s:Header> <o:Security s:mustUnderstand="1" xmlns:o="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd"> <o:UsernameToken u:Id="uuid-291622ca-4c11-460f-9886-ac1c78813b24-1"> <o:Username>TheUsername</o:Username> <o:Password Type="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0#PasswordText" >ThePassword</o:Password> </o:UsernameToken> </o:Security> </s:Header> This is closer as it includes only the username and password. The key here is the protocol for WS-Security:messageSecurityVersion="WSSecurity10WSTrustFebruary2005WSSecureConversationFebruary2005WSSecurityPolicy11BasicSecurityProfile10" which explicitly specifies the protocol version. There are several variants of this specification but none of them seem to support the nonce unfortunately. This protocol does allow for optional omission of the Nonce and created timestamp provided (which effectively makes those keys optional). With some services I tried that requested a Nonce just using this protocol actually worked where the default basicHttpBinding failed to connect, so this is a possible solution for access to some services. Unfortunately for my target service that was not an option. The nonce has to be there. Creating Custom ClientCredentials As it turns out WCF doesn't have support for the Digest Nonce as part of WS-Security, and so as far as I can tell there's no way to do it just with configuration settings. I did a bunch of research on this trying to find workarounds for this, and I did find a couple of entries on StackOverflow as well as on the MSDN forums. However, none of these are particularily clear and I ended up using bits and pieces of several of them to arrive at a working solution in the end. http://stackoverflow.com/questions/896901/wcf-adding-nonce-to-usernametoken http://social.msdn.microsoft.com/Forums/en-US/wcf/thread/4df3354f-0627-42d9-b5fb-6e880b60f8ee The latter forum message is the more useful of the two (the last message on the thread in particular) and it has most of the information required to make this work. But it took some experimentation for me to get this right so I'll recount the process here maybe a bit more comprehensively. In order for this to work a number of classes have to be overridden: ClientCredentials ClientCredentialsSecurityTokenManager WSSecurityTokenizer The idea is that we need to create a custom ClientCredential class to hold the custom properties so they can be set from the UI or via configuration settings. The TokenManager and Tokenizer are mainly required to allow the custom credentials class to flow through the WCF pipeline and eventually provide custom serialization. Here are the three classes required and their full implementations:public class CustomCredentials : ClientCredentials { public CustomCredentials() { } protected CustomCredentials(CustomCredentials cc) : base(cc) { } public override System.IdentityModel.Selectors.SecurityTokenManager CreateSecurityTokenManager() { return new CustomSecurityTokenManager(this); } protected override ClientCredentials CloneCore() { return new CustomCredentials(this); } } public class CustomSecurityTokenManager : ClientCredentialsSecurityTokenManager { public CustomSecurityTokenManager(CustomCredentials cred) : base(cred) { } public override System.IdentityModel.Selectors.SecurityTokenSerializer CreateSecurityTokenSerializer(System.IdentityModel.Selectors.SecurityTokenVersion version) { return new CustomTokenSerializer(System.ServiceModel.Security.SecurityVersion.WSSecurity11); } } public class CustomTokenSerializer : WSSecurityTokenSerializer { public CustomTokenSerializer(SecurityVersion sv) : base(sv) { } protected override void WriteTokenCore(System.Xml.XmlWriter writer, System.IdentityModel.Tokens.SecurityToken token) { UserNameSecurityToken userToken = token as UserNameSecurityToken; string tokennamespace = "o"; DateTime created = DateTime.Now; string createdStr = created.ToString("yyyy-MM-ddThh:mm:ss.fffZ"); // unique Nonce value - encode with SHA-1 for 'randomness' // in theory the nonce could just be the GUID by itself string phrase = Guid.NewGuid().ToString(); var nonce = GetSHA1String(phrase); // in this case password is plain text // for digest mode password needs to be encoded as: // PasswordAsDigest = Base64(SHA-1(Nonce + Created + Password)) // and profile needs to change to //string password = GetSHA1String(nonce + createdStr + userToken.Password); string password = userToken.Password; writer.WriteRaw(string.Format( "<{0}:UsernameToken u:Id=\"" + token.Id + "\" xmlns:u=\"http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd\">" + "<{0}:Username>" + userToken.UserName + "</{0}:Username>" + "<{0}:Password Type=\"http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0#PasswordText\">" + password + "</{0}:Password>" + "<{0}:Nonce EncodingType=\"http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-soap-message-security-1.0#Base64Binary\">" + nonce + "</{0}:Nonce>" + "<u:Created>" + createdStr + "</u:Created></{0}:UsernameToken>", tokennamespace)); } protected string GetSHA1String(string phrase) { SHA1CryptoServiceProvider sha1Hasher = new SHA1CryptoServiceProvider(); byte[] hashedDataBytes = sha1Hasher.ComputeHash(Encoding.UTF8.GetBytes(phrase)); return Convert.ToBase64String(hashedDataBytes); } } Realistically only the CustomTokenSerializer has any significant code in. The code there deals with actually serializing the custom credentials using low level XML semantics by writing output into an XML writer. I can't take credit for this code - most of the code comes from the MSDN forum post mentioned earlier - I made a few adjustments to simplify the nonce generation and also added some notes to allow for PasswordDigest generation. Per spec the nonce is nothing more than a unique value that's supposed to be 'random'. I'm thinking that this value can be any string that's unique and a GUID on its own probably would have sufficed. Comments on other posts that GUIDs can be potentially guessed are highly exaggerated to say the least IMHO. To satisfy even that aspect though I added the SHA1 encryption and binary decoding to give a more random value that would be impossible to 'guess'. The original example from the forum post used another level of encoding and decoding to string in between - but that really didn't accomplish anything but extra overhead. The header output generated from this looks like this:<s:Header> <o:Security s:mustUnderstand="1" xmlns:o="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd"> <o:UsernameToken u:Id="uuid-f43d8b0d-0ebb-482e-998d-f544401a3c91-1" xmlns:u="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd"> <o:Username>TheUsername</o:Username> <o:Password Type="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0#PasswordText">ThePassword</o:Password> <o:Nonce EncodingType="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-soap-message-security-1.0#Base64Binary" >PjVE24TC6HtdAnsf3U9c5WMsECY=</o:Nonce> <u:Created>2012-11-23T07:10:04.670Z</u:Created> </o:UsernameToken> </o:Security> </s:Header> which is exactly as it should be. Password Digest? In my case the password is passed in plain text over an SSL connection, so there's no digest required so I was done with the code above. Since I don't have a service handy that requires a password digest, I had no way of testing the code for the digest implementation, but here is how this is likely to work. If you need to pass a digest encoded password things are a little bit trickier. The password type namespace needs to change to: http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0#Digest and then the password value needs to be encoded. The format for password digest encoding is this: Base64(SHA-1(Nonce + Created + Password)) and it can be handled in the code above with this code (that's commented in the snippet above): string password = GetSHA1String(nonce + createdStr + userToken.Password); The entire WriteTokenCore method for digest code looks like this:protected override void WriteTokenCore(System.Xml.XmlWriter writer, System.IdentityModel.Tokens.SecurityToken token) { UserNameSecurityToken userToken = token as UserNameSecurityToken; string tokennamespace = "o"; DateTime created = DateTime.Now; string createdStr = created.ToString("yyyy-MM-ddThh:mm:ss.fffZ"); // unique Nonce value - encode with SHA-1 for 'randomness' // in theory the nonce could just be the GUID by itself string phrase = Guid.NewGuid().ToString(); var nonce = GetSHA1String(phrase); string password = GetSHA1String(nonce + createdStr + userToken.Password); writer.WriteRaw(string.Format( "<{0}:UsernameToken u:Id=\"" + token.Id + "\" xmlns:u=\"http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd\">" + "<{0}:Username>" + userToken.UserName + "</{0}:Username>" + "<{0}:Password Type=\"http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-username-token-profile-1.0#Digest\">" + password + "</{0}:Password>" + "<{0}:Nonce EncodingType=\"http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-soap-message-security-1.0#Base64Binary\">" + nonce + "</{0}:Nonce>" + "<u:Created>" + createdStr + "</u:Created></{0}:UsernameToken>", tokennamespace)); } I had no service to connect to to try out Digest auth - if you end up needing it and get it to work please drop a comment… How to use the custom Credentials The easiest way to use the custom credentials is to create the client in code. Here's a factory method I use to create an instance of my service client: public static RealTimeOnlineClient CreateRealTimeOnlineProxy(string url, string username, string password) { if (string.IsNullOrEmpty(url)) url = "https://notrealurl.com:443/cows/services/RealTimeOnline"; CustomBinding binding = new CustomBinding(); var security = TransportSecurityBindingElement.CreateUserNameOverTransportBindingElement(); security.IncludeTimestamp = false; security.DefaultAlgorithmSuite = SecurityAlgorithmSuite.Basic256; security.MessageSecurityVersion = MessageSecurityVersion.WSSecurity10WSTrustFebruary2005WSSecureConversationFebruary2005WSSecurityPolicy11BasicSecurityProfile10; var encoding = new TextMessageEncodingBindingElement(); encoding.MessageVersion = MessageVersion.Soap11; var transport = new HttpsTransportBindingElement(); transport.MaxReceivedMessageSize = 20000000; // 20 megs binding.Elements.Add(security); binding.Elements.Add(encoding); binding.Elements.Add(transport); RealTimeOnlineClient client = new RealTimeOnlineClient(binding, new EndpointAddress(url)); // to use full client credential with Nonce uncomment this code: // it looks like this might not be required - the service seems to work without it client.ChannelFactory.Endpoint.Behaviors.Remove<System.ServiceModel.Description.ClientCredentials>(); client.ChannelFactory.Endpoint.Behaviors.Add(new CustomCredentials()); client.ClientCredentials.UserName.UserName = username; client.ClientCredentials.UserName.Password = password; return client; } This returns a service client that's ready to call other service methods. The key item in this code is the ChannelFactory endpoint behavior modification that that first removes the original ClientCredentials and then adds the new one. The ClientCredentials property on the client is read only and this is the way it has to be added. Summary It's a bummer that WCF doesn't suport WSE Security authentication with nonce values out of the box. From reading the comments in posts/articles while I was trying to find a solution, I found that this feature was omitted by design as this protocol is considered unsecure. While I agree that plain text passwords are rarely a good idea even if they go over secured SSL connection as WSE Security does, there are unfortunately quite a few services (mosly Java services I suspect) that use this protocol. I've run into this twice now and trying to find a solution online I can see that this is not an isolated problem - many others seem to have struggled with this. It seems there are about a dozen questions about this on StackOverflow all with varying incomplete answers. Hopefully this post provides a little more coherent content in one place. Again I marvel at WCF and its breadth of support for protocol features it has in a single tool. And even when it can't handle something there are ways to get it working via extensibility. But at the same time I marvel at how freaking difficult it is to arrive at these solutions. I mean there's no way I could have ever figured this out on my own. It takes somebody working on the WCF team or at least being very, very intricately involved in the innards of WCF to figure out the interconnection of the various objects to do this from scratch. Luckily this is an older problem that has been discussed extensively online and I was able to cobble together a solution from the online content. I'm glad it worked out that way, but it feels dirty and incomplete in that there's a whole learning path that was omitted to get here… Man am I glad I'm not dealing with SOAP services much anymore. REST service security - even when using some sort of federation is a piece of cake by comparison :-) I'm sure once standards bodies gets involved we'll be right back in security standard hell…© Rick Strahl, West Wind Technologies, 2005-2012Posted in WCF Web Services Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article
Is there a Telecommunications Reference Architecture?

- by raul.goycoolea

@font-face { font-family: "Arial"; }@font-face { font-family: "Courier New"; }@font-face { font-family: "Wingdings"; }@font-face { font-family: "Cambria"; }p.MsoNormal, li.MsoNormal, div.MsoNormal { margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpFirst, li.MsoListParagraphCxSpFirst, div.MsoListParagraphCxSpFirst { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpMiddle, li.MsoListParagraphCxSpMiddle, div.MsoListParagraphCxSpMiddle { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpLast, li.MsoListParagraphCxSpLast, div.MsoListParagraphCxSpLast { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }div.Section1 { page: Section1; }ol { margin-bottom: 0cm; }ul { margin-bottom: 0cm; } Abstract Reference architecture provides needed architectural information that can be provided in advance to an enterprise to enable consistent architectural best practices. Enterprise Reference Architecture helps business owners to actualize their strategies, vision, objectives, and principles. It evaluates the IT systems, based on Reference Architecture goals, principles, and standards. It helps to reduce IT costs by increasing functionality, availability, scalability, etc. Telecom Reference Architecture provides customers with the flexibility to view bundled service bills online with the provision of multiple services. It provides real-time, flexible billing and charging systems, to handle complex promotions, discounts, and settlements with multiple parties. This paper attempts to describe the Reference Architecture for the Telecom Enterprises. It lays the foundation for a Telecom Reference Architecture by articulating the requirements, drivers, and pitfalls for telecom service providers. It describes generic reference architecture for telecom enterprises and moves on to explain how to achieve Enterprise Reference Architecture by using SOA. Introduction A Reference Architecture provides a methodology, set of practices, template, and standards based on a set of successful solutions implemented earlier. These solutions have been generalized and structured for the depiction of both a logical and a physical architecture, based on the harvesting of a set of patterns that describe observations in a number of successful implementations. It helps as a reference for the various architectures that an enterprise can implement to solve various problems. It can be used as the starting point or the point of comparisons for various departments/business entities of a company, or for the various companies for an enterprise. It provides multiple views for multiple stakeholders. Major artifacts of the Enterprise Reference Architecture are methodologies, standards, metadata, documents, design patterns, etc. Purpose of Reference Architecture In most cases, architects spend a lot of time researching, investigating, defining, and re-arguing architectural decisions. It is like reinventing the wheel as their peers in other organizations or even the same organization have already spent a lot of time and effort defining their own architectural practices. This prevents an organization from learning from its own experiences and applying that knowledge for increased effectiveness. Reference architecture provides missing architectural information that can be provided in advance to project team members to enable consistent architectural best practices. Enterprise Reference Architecture helps an enterprise to achieve the following at the abstract level: · Reference architecture is more of a communication channel to an enterprise · Helps the business owners to accommodate to their strategies, vision, objectives, and principles. · Evaluates the IT systems based on Reference Architecture Principles · Reduces IT spending through increasing functionality, availability, scalability, etc · A Real-time Integration Model helps to reduce the latency of the data updates Is used to define a single source of Information · Provides a clear view on how to manage information and security · Defines the policy around the data ownership, product boundaries, etc. · Helps with cost optimization across project and solution portfolios by eliminating unused or duplicate investments and assets · Has a shorter implementation time and cost Once the reference architecture is in place, the set of architectural principles, standards, reference models, and best practices ensure that the aligned investments have the greatest possible likelihood of success in both the near term and the long term (TCO). Common pitfalls for Telecom Service Providers Telecom Reference Architecture serves as the first step towards maturity for a telecom service provider. During the course of our assignments/experiences with telecom players, we have come across the following observations – Some of these indicate a lack of maturity of the telecom service provider: · In markets that are growing and not so mature, it has been observed that telcos have a significant amount of in-house or home-grown applications. In some of these markets, the growth has been so rapid that IT has been unable to cope with business demands. Telcos have shown a tendency to come up with workarounds in their IT applications so as to meet business needs. · Even for core functions like provisioning or mediation, some telcos have tried to manage with home-grown applications. · Most of the applications do not have the required scalability or maintainability to sustain growth in volumes or functionality. · Applications face interoperability issues with other applications in the operator's landscape. Integrating a new application or network element requires considerable effort on the part of the other applications. · Application boundaries are not clear, and functionality that is not in the initial scope of that application gets pushed onto it. This results in the development of the multiple, small applications without proper boundaries. · Usage of Legacy OSS/BSS systems, poor Integration across Multiple COTS Products and Internal Systems. Most of the Integrations are developed on ad-hoc basis and Point-to-Point Integration. · Redundancy of the business functions in different applications • Fragmented data across the different applications and no integrated view of the strategic data • Lot of performance Issues due to the usage of the complex integration across OSS and BSS systems However, this is where the maturity of the telecom industry as a whole can be of help. The collaborative efforts of telcos to overcome some of these problems have resulted in bodies like the TM Forum. They have come up with frameworks for business processes, data, applications, and technology for telecom service providers. These could be a good starting point for telcos to clean up their enterprise landscape. Industry Trends in Telecom Reference Architecture Telecom reference architectures are evolving rapidly because telcos are facing business and IT challenges. “The reality is that there probably is no killer application, no silver bullet that the telcos can latch onto to carry them into a 21st Century.... Instead, there are probably hundreds – perhaps thousands – of niche applications.... And the only way to find which of these works for you is to try out lots of them, ramp up the ones that work, and discontinue the ones that fail.” – Martin Creaner President & CTO TM Forum. The following trends have been observed in telecom reference architecture: · Transformation of business structures to align with customer requirements · Adoption of more Internet-like technical architectures. The Web 2.0 concept is increasingly being used. · Virtualization of the traditional operations support system (OSS) · Adoption of SOA to support development of IP-based services · Adoption of frameworks like Service Delivery Platforms (SDPs) and IP Multimedia Subsystem · (IMS) to enable seamless deployment of various services over fixed and mobile networks · Replacement of in-house, customized, and stove-piped OSS/BSS with standards-based COTS products · Compliance with industry standards and frameworks like eTOM, SID, and TAM to enable seamless integration with other standards-based products Drivers of Reference Architecture The drivers of the Reference Architecture are Reference Architecture Goals, Principles, and Enterprise Vision and Telecom Transformation. The details are depicted below diagram. @font-face { font-family: "Cambria"; }p.MsoNormal, li.MsoNormal, div.MsoNormal { margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoCaption, li.MsoCaption, div.MsoCaption { margin: 0cm 0cm 10pt; font-size: 9pt; font-family: "Times New Roman"; color: rgb(79, 129, 189); font-weight: bold; }div.Section1 { page: Section1; } Figure 1. Drivers for Reference Architecture @font-face { font-family: "Arial"; }@font-face { font-family: "Courier New"; }@font-face { font-family: "Wingdings"; }@font-face { font-family: "Cambria"; }p.MsoNormal, li.MsoNormal, div.MsoNormal { margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpFirst, li.MsoListParagraphCxSpFirst, div.MsoListParagraphCxSpFirst { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpMiddle, li.MsoListParagraphCxSpMiddle, div.MsoListParagraphCxSpMiddle { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpLast, li.MsoListParagraphCxSpLast, div.MsoListParagraphCxSpLast { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }div.Section1 { page: Section1; }ol { margin-bottom: 0cm; }ul { margin-bottom: 0cm; } Today’s telecom reference architectures should seamlessly integrate traditional legacy-based applications and transition to next-generation network technologies (e.g., IP multimedia subsystems). This has resulted in new requirements for flexible, real-time billing and OSS/BSS systems and implications on the service provider’s organizational requirements and structure. Telecom reference architectures are today expected to: · Integrate voice, messaging, email and other VAS over fixed and mobile networks, back end systems · Be able to provision multiple services and service bundles • Deliver converged voice, video and data services · Leverage the existing Network Infrastructure · Provide real-time, flexible billing and charging systems to handle complex promotions, discounts, and settlements with multiple parties. · Support charging of advanced data services such as VoIP, On-Demand, Services (e.g. Video), IMS/SIP Services, Mobile Money, Content Services and IPTV. · Help in faster deployment of new services • Serve as an effective platform for collaboration between network IT and business organizations · Harness the potential of converging technology, networks, devices and content to develop multimedia services and solutions of ever-increasing sophistication on a single Internet Protocol (IP) · Ensure better service delivery and zero revenue leakage through real-time balance and credit management · Lower operating costs to drive profitability Enterprise Reference Architecture The Enterprise Reference Architecture (RA) fills the gap between the concepts and vocabulary defined by the reference model and the implementation. Reference architecture provides detailed architectural information in a common format such that solutions can be repeatedly designed and deployed in a consistent, high-quality, supportable fashion. This paper attempts to describe the Reference Architecture for the Telecom Application Usage and how to achieve the Enterprise Level Reference Architecture using SOA. • Telecom Reference Architecture • Enterprise SOA based Reference Architecture Telecom Reference Architecture Tele Management Forum’s New Generation Operations Systems and Software (NGOSS) is an architectural framework for organizing, integrating, and implementing telecom systems. NGOSS is a component-based framework consisting of the following elements: · The enhanced Telecom Operations Map (eTOM) is a business process framework. · The Shared Information Data (SID) model provides a comprehensive information framework that may be specialized for the needs of a particular organization. · The Telecom Application Map (TAM) is an application framework to depict the functional footprint of applications, relative to the horizontal processes within eTOM. · The Technology Neutral Architecture (TNA) is an integrated framework. TNA is an architecture that is sustainable through technology changes. NGOSS Architecture Standards are: · Centralized data · Loosely coupled distributed systems · Application components/re-use · A technology-neutral system framework with technology specific implementations · Interoperability to service provider data/processes · Allows more re-use of business components across multiple business scenarios · Workflow automation The traditional operator systems architecture consists of four layers, · Business Support System (BSS) layer, with focus toward customers and business partners. Manages order, subscriber, pricing, rating, and billing information. · Operations Support System (OSS) layer, built around product, service, and resource inventories. · Networks layer – consists of Network elements and 3rd Party Systems. · Integration Layer – to maximize application communication and overall solution flexibility. Reference architecture for telecom enterprises is depicted below. @font-face { font-family: "Arial"; }@font-face { font-family: "Courier New"; }@font-face { font-family: "Wingdings"; }@font-face { font-family: "Cambria"; }p.MsoNormal, li.MsoNormal, div.MsoNormal { margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoCaption, li.MsoCaption, div.MsoCaption { margin: 0cm 0cm 10pt; font-size: 9pt; font-family: "Times New Roman"; color: rgb(79, 129, 189); font-weight: bold; }p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpFirst, li.MsoListParagraphCxSpFirst, div.MsoListParagraphCxSpFirst { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpMiddle, li.MsoListParagraphCxSpMiddle, div.MsoListParagraphCxSpMiddle { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpLast, li.MsoListParagraphCxSpLast, div.MsoListParagraphCxSpLast { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }div.Section1 { page: Section1; }ol { margin-bottom: 0cm; }ul { margin-bottom: 0cm; } Figure 2. Telecom Reference Architecture The major building blocks of any Telecom Service Provider architecture are as follows: 1. Customer Relationship Management CRM encompasses the end-to-end lifecycle of the customer: customer initiation/acquisition, sales, ordering, and service activation, customer care and support, proactive campaigns, cross sell/up sell, and retention/loyalty. CRM also includes the collection of customer information and its application to personalize, customize, and integrate delivery of service to a customer, as well as to identify opportunities for increasing the value of the customer to the enterprise. The key functionalities related to Customer Relationship Management are · Manage the end-to-end lifecycle of a customer request for products. · Create and manage customer profiles. · Manage all interactions with customers – inquiries, requests, and responses. · Provide updates to Billing and other south bound systems on customer/account related updates such as customer/ account creation, deletion, modification, request bills, final bill, duplicate bills, credit limits through Middleware. · Work with Order Management System, Product, and Service Management components within CRM. · Manage customer preferences – Involve all the touch points and channels to the customer, including contact center, retail stores, dealers, self service, and field service, as well as via any media (phone, face to face, web, mobile device, chat, email, SMS, mail, the customer's bill, etc.). · Support single interface for customer contact details, preferences, account details, offers, customer premise equipment, bill details, bill cycle details, and customer interactions. CRM applications interact with customers through customer touch points like portals, point-of-sale terminals, interactive voice response systems, etc. The requests by customers are sent via fulfillment/provisioning to billing system for ordering processing. 2. Billing and Revenue Management Billing and Revenue Management handles the collection of appropriate usage records and production of timely and accurate bills – for providing pre-bill usage information and billing to customers; for processing their payments; and for performing payment collections. In addition, it handles customer inquiries about bills, provides billing inquiry status, and is responsible for resolving billing problems to the customer's satisfaction in a timely manner. This process grouping also supports prepayment for services. The key functionalities provided by these applications are · To ensure that enterprise revenue is billed and invoices delivered appropriately to customers. · To manage customers’ billing accounts, process their payments, perform payment collections, and monitor the status of the account balance. · To ensure the timely and effective fulfillment of all customer bill inquiries and complaints. · Collect the usage records from mediation and ensure appropriate rating and discounting of all usage and pricing. · Support revenue sharing; split charging where usage is guided to an account different from the service consumer. · Support prepaid and post-paid rating. · Send notification on approach / exceeding the usage thresholds as enforced by the subscribed offer, and / or as setup by the customer. · Support prepaid, post paid, and hybrid (where some services are prepaid and the rest of the services post paid) customers and conversion from post paid to prepaid, and vice versa. · Support different billing function requirements like charge prorating, promotion, discount, adjustment, waiver, write-off, account receivable, GL Interface, late payment fee, credit control, dunning, account or service suspension, re-activation, expiry, termination, contract violation penalty, etc. · Initiate direct debit to collect payment against an invoice outstanding. · Send notification to Middleware on different events; for example, payment receipt, pre-suspension, threshold exceed, etc. Billing systems typically get usage data from mediation systems for rating and billing. They get provisioning requests from order management systems and inquiries from CRM systems. Convergent and real-time billing systems can directly get usage details from network elements. 3. Mediation Mediation systems transform/translate the Raw or Native Usage Data Records into a general format that is acceptable to billing for their rating purposes. The following lists the high-level roles and responsibilities executed by the Mediation system in the end-to-end solution. · Collect Usage Data Records from different data sources – like network elements, routers, servers – via different protocol and interfaces. · Process Usage Data Records – Mediation will process Usage Data Records as per the source format. · Validate Usage Data Records from each source. · Segregates Usage Data Records coming from each source to multiple, based on the segregation requirement of end Application. · Aggregates Usage Data Records based on the aggregation rule if any from different sources. · Consolidates multiple Usage Data Records from each source. · Delivers formatted Usage Data Records to different end application like Billing, Interconnect, Fraud Management, etc. · Generates audit trail for incoming Usage Data Records and keeps track of all the Usage Data Records at various stages of mediation process. · Checks duplicate Usage Data Records across files for a given time window. 4. Fulfillment This area is responsible for providing customers with their requested products in a timely and correct manner. It translates the customer's business or personal need into a solution that can be delivered using the specific products in the enterprise's portfolio. This process informs the customers of the status of their purchase order, and ensures completion on time, as well as ensuring a delighted customer. These processes are responsible for accepting and issuing orders. They deal with pre-order feasibility determination, credit authorization, order issuance, order status and tracking, customer update on customer order activities, and customer notification on order completion. Order management and provisioning applications fall into this category. The key functionalities provided by these applications are · Issuing new customer orders, modifying open customer orders, or canceling open customer orders; · Verifying whether specific non-standard offerings sought by customers are feasible and supportable; · Checking the credit worthiness of customers as part of the customer order process; · Testing the completed offering to ensure it is working correctly; · Updating of the Customer Inventory Database to reflect that the specific product offering has been allocated, modified, or cancelled; · Assigning and tracking customer provisioning activities; · Managing customer provisioning jeopardy conditions; and · Reporting progress on customer orders and other processes to customer. These applications typically get orders from CRM systems. They interact with network elements and billing systems for fulfillment of orders. 5. Enterprise Management This process area includes those processes that manage enterprise-wide activities and needs, or have application within the enterprise as a whole. They encompass all business management processes that · Are necessary to support the whole of the enterprise, including processes for financial management, legal management, regulatory management, process, cost, and quality management, etc.; · Are responsible for setting corporate policies, strategies, and directions, and for providing guidelines and targets for the whole of the business, including strategy development and planning for areas, such as Enterprise Architecture, that are integral to the direction and development of the business; · Occur throughout the enterprise, including processes for project management, performance assessments, cost assessments, etc. (i) Enterprise Risk Management: Enterprise Risk Management focuses on assuring that risks and threats to the enterprise value and/or reputation are identified, and appropriate controls are in place to minimize or eliminate the identified risks. The identified risks may be physical or logical/virtual. Successful risk management ensures that the enterprise can support its mission critical operations, processes, applications, and communications in the face of serious incidents such as security threats/violations and fraud attempts. Two key areas covered in Risk Management by telecom operators are: · Revenue Assurance: Revenue assurance system will be responsible for identifying revenue loss scenarios across components/systems, and will help in rectifying the problems. The following lists the high-level roles and responsibilities executed by the Revenue Assurance system in the end-to-end solution. o Identify all usage information dropped when networks are being upgraded. o Interconnect bill verification. o Identify where services are routinely provisioned but never billed. o Identify poor sales policies that are intensifying collections problems. o Find leakage where usage is sent to error bucket and never billed for. o Find leakage where field service, CRM, and network build-out are not optimized. · Fraud Management: Involves collecting data from different systems to identify abnormalities in traffic patterns, usage patterns, and subscription patterns to report suspicious activity that might suggest fraudulent usage of resources, resulting in revenue losses to the operator. The key roles and responsibilities of the system component are as follows: o Fraud management system will capture and monitor high usage (over a certain threshold) in terms of duration, value, and number of calls for each subscriber. The threshold for each subscriber is decided by the system and fixed automatically. o Fraud management will be able to detect the unauthorized access to services for certain subscribers. These subscribers may have been provided unauthorized services by employees. The component will raise the alert to the operator the very first time of such illegal calls or calls which are not billed. o The solution will be to have an alarm management system that will deliver alarms to the operator/provider whenever it detects a fraud, thus minimizing fraud by catching it the first time it occurs. o The Fraud Management system will be capable of interfacing with switches, mediation systems, and billing systems (ii) Knowledge Management This process focuses on knowledge management, technology research within the enterprise, and the evaluation of potential technology acquisitions. Key responsibilities of knowledge base management are to · Maintain knowledge base – Creation and updating of knowledge base on ongoing basis. · Search knowledge base – Search of knowledge base on keywords or category browse · Maintain metadata – Management of metadata on knowledge base to ensure effective management and search. · Run report generator. · Provide content – Add content to the knowledge base, e.g., user guides, operational manual, etc. (iii) Document Management It focuses on maintaining a repository of all electronic documents or images of paper documents relevant to the enterprise using a system. (iv) Data Management It manages data as a valuable resource for any enterprise. For telecom enterprises, the typical areas covered are Master Data Management, Data Warehousing, and Business Intelligence. It is also responsible for data governance, security, quality, and database management. Key responsibilities of Data Management are · Using ETL, extract the data from CRM, Billing, web content, ERP, campaign management, financial, network operations, asset management info, customer contact data, customer measures, benchmarks, process data, e.g., process inputs, outputs, and measures, into Enterprise Data Warehouse. · Management of data traceability with source, data related business rules/decisions, data quality, data cleansing data reconciliation, competitors data – storage for all the enterprise data (customer profiles, products, offers, revenues, etc.) · Get online update through night time replication or physical backup process at regular frequency. · Provide the data access to business intelligence and other systems for their analysis, report generation, and use. (v) Business Intelligence It uses the Enterprise Data to provide the various analysis and reports that contain prospects and analytics for customer retention, acquisition of new customers due to the offers, and SLAs. It will generate right and optimized plans – bolt-ons for the customers. The following lists the high-level roles and responsibilities executed by the Business Intelligence system at the Enterprise Level: · It will do Pattern analysis and reports problem. · It will do Data Analysis – Statistical analysis, data profiling, affinity analysis of data, customer segment wise usage patterns on offers, products, service and revenue generation against services and customer segments. · It will do Performance (business, system, and forecast) analysis, churn propensity, response time, and SLAs analysis. · It will support for online and offline analysis, and report drill down capability. · It will collect, store, and report various SLA data. · It will provide the necessary intelligence for marketing and working on campaigns, etc., with cost benefit analysis and predictions. It will advise on customer promotions with additional services based on loyalty and credit history of customer · It will Interface with Enterprise Data Management system for data to run reports and analysis tasks. It will interface with the campaign schedules, based on historical success evidence. (vi) Stakeholder and External Relations Management It manages the enterprise's relationship with stakeholders and outside entities. Stakeholders include shareholders, employee organizations, etc. Outside entities include regulators, local community, and unions. Some of the processes within this grouping are Shareholder Relations, External Affairs, Labor Relations, and Public Relations. (vii) Enterprise Resource Planning It is used to manage internal and external resources, including tangible assets, financial resources, materials, and human resources. Its purpose is to facilitate the flow of information between all business functions inside the boundaries of the enterprise and manage the connections to outside stakeholders. ERP systems consolidate all business operations into a uniform and enterprise wide system environment. The key roles and responsibilities for Enterprise System are given below: · It will handle responsibilities such as core accounting, financial, and management reporting. · It will interface with CRM for capturing customer account and details. · It will interface with billing to capture the billing revenue and other financial data. · It will be responsible for executing the dunning process. Billing will send the required feed to ERP for execution of dunning. · It will interface with the CRM and Billing through batch interfaces. Enterprise management systems are like horizontals in the enterprise and typically interact with all major telecom systems. E.g., an ERP system interacts with CRM, Fulfillment, and Billing systems for different kinds of data exchanges. 6. External Interfaces/Touch Points The typical external parties are customers, suppliers/partners, employees, shareholders, and other stakeholders. External interactions from/to a Service Provider to other parties can be achieved by a variety of mechanisms, including: · Exchange of emails or faxes · Call Centers · Web Portals · Business-to-Business (B2B) automated transactions These applications provide an Internet technology driven interface to external parties to undertake a variety of business functions directly for themselves. These can provide fully or partially automated service to external parties through various touch points. Typical characteristics of these touch points are · Pre-integrated self-service system, including stand-alone web framework or integration front end with a portal engine · Self services layer exposing atomic web services/APIs for reuse by multiple systems across the architectural environment · Portlets driven connectivity exposing data and services interoperability through a portal engine or web application These touch points mostly interact with the CRM systems for requests, inquiries, and responses. 7. Middleware The component will be primarily responsible for integrating the different systems components under a common platform. It should provide a Standards-Based Platform for building Service Oriented Architecture and Composite Applications. The following lists the high-level roles and responsibilities executed by the Middleware component in the end-to-end solution. · As an integration framework, covering to and fro interfaces · Provide a web service framework with service registry. · Support SOA framework with SOA service registry. · Each of the interfaces from / to Middleware to other components would handle data transformation, translation, and mapping of data points. · Receive data from the caller / activate and/or forward the data to the recipient system in XML format. · Use standard XML for data exchange. · Provide the response back to the service/call initiator. · Provide a tracking until the response completion. · Keep a store transitional data against each call/transaction. · Interface through Middleware to get any information that is possible and allowed from the existing systems to enterprise systems; e.g., customer profile and customer history, etc. · Provide the data in a common unified format to the SOA calls across systems, and follow the Enterprise Architecture directive. · Provide an audit trail for all transactions being handled by the component. 8. Network Elements The term Network Element means a facility or equipment used in the provision of a telecommunications service. Such terms also includes features, functions, and capabilities that are provided by means of such facility or equipment, including subscriber numbers, databases, signaling systems, and information sufficient for billing and collection or used in the transmission, routing, or other provision of a telecommunications service. Typical network elements in a GSM network are Home Location Register (HLR), Intelligent Network (IN), Mobile Switching Center (MSC), SMS Center (SMSC), and network elements for other value added services like Push-to-talk (PTT), Ring Back Tone (RBT), etc. Network elements are invoked when subscribers use their telecom devices for any kind of usage. These elements generate usage data and pass it on to downstream systems like mediation and billing system for rating and billing. They also integrate with provisioning systems for order/service fulfillment. 9. 3rd Party Applications 3rd Party systems are applications like content providers, payment gateways, point of sale terminals, and databases/applications maintained by the Government. Depending on applicability and the type of functionality provided by 3rd party applications, the integration with different telecom systems like CRM, provisioning, and billing will be done. 10. Service Delivery Platform A service delivery platform (SDP) provides the architecture for the rapid deployment, provisioning, execution, management, and billing of value added telecom services. SDPs are based on the concept of SOA and layered architecture. They support the delivery of voice, data services, and content in network and device-independent fashion. They allow application developers to aggregate network capabilities, services, and sources of content. SDPs typically contain layers for web services exposure, service application development, and network abstraction. SOA Reference Architecture SOA concept is based on the principle of developing reusable business service and building applications by composing those services, instead of building monolithic applications in silos. It’s about bridging the gap between business and IT through a set of business-aligned IT services, using a set of design principles, patterns, and techniques. In an SOA, resources are made available to participants in a value net, enterprise, line of business (typically spanning multiple applications within an enterprise or across multiple enterprises). It consists of a set of business-aligned IT services that collectively fulfill an organization’s business processes and goals. We can choreograph these services into composite applications and invoke them through standard protocols. SOA, apart from agility and reusability, enables: · The business to specify processes as orchestrations of reusable services · Technology agnostic business design, with technology hidden behind service interface · A contractual-like interaction between business and IT, based on service SLAs · Accountability and governance, better aligned to business services · Applications interconnections untangling by allowing access only through service interfaces, reducing the daunting side effects of change · Reduced pressure to replace legacy and extended lifetime for legacy applications, through encapsulation in services · A Cloud Computing paradigm, using web services technologies, that makes possible service outsourcing on an on-demand, utility-like, pay-per-usage basis The following section represents the Reference Architecture of logical view for the Telecom Solution. The new custom built application needs to align with this logical architecture in the long run to achieve EA benefits. Packaged implementation applications, such as ERP billing applications, need to expose their functions as service providers (as other applications consume) and interact with other applications as service consumers. COT applications need to expose services through wrappers such as adapters to utilize existing resources and at the same time achieve Enterprise Architecture goal and objectives. The following are the various layers for Enterprise level deployment of SOA. This diagram captures the abstract view of Enterprise SOA layers and important components of each layer. Layered architecture means decomposition of services such that most interactions occur between adjacent layers. However, there is no strict rule that top layers should not directly communicate with bottom layers. The diagram below represents the important logical pieces that would result from overall SOA transformation. @font-face { font-family: "Arial"; }@font-face { font-family: "Courier New"; }@font-face { font-family: "Wingdings"; }@font-face { font-family: "Cambria"; }p.MsoNormal, li.MsoNormal, div.MsoNormal { margin: 0cm 0cm 0.0001pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoCaption, li.MsoCaption, div.MsoCaption { margin: 0cm 0cm 10pt; font-size: 9pt; font-family: "Times New Roman"; color: rgb(79, 129, 189); font-weight: bold; }p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpFirst, li.MsoListParagraphCxSpFirst, div.MsoListParagraphCxSpFirst { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpMiddle, li.MsoListParagraphCxSpMiddle, div.MsoListParagraphCxSpMiddle { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }p.MsoListParagraphCxSpLast, li.MsoListParagraphCxSpLast, div.MsoListParagraphCxSpLast { margin: 0cm 0cm 0.0001pt 36pt; font-size: 12pt; font-family: "Times New Roman"; }div.Section1 { page: Section1; }ol { margin-bottom: 0cm; }ul { margin-bottom: 0cm; } Figure 3. Enterprise SOA Reference Architecture 1. Operational System Layer: This layer consists of all packaged applications like CRM, ERP, custom built applications, COTS based applications like Billing, Revenue Management, Fulfilment, and the Enterprise databases that are essential and contribute directly or indirectly to the Enterprise OSS/BSS Transformation. ERP holds the data of Asset Lifecycle Management, Supply Chain, and Advanced Procurement and Human Capital Management, etc. CRM holds the data related to Order, Sales, and Marketing, Customer Care, Partner Relationship Management, Loyalty, etc. Content Management handles Enterprise Search and Query. Billing application consists of the following components: · Collections Management, Customer Billing Management, Invoices, Real-Time Rating, Discounting, and Applying of Charges · Enterprise databases will hold both the application and service data, whether structured or unstructured. MDM - Master data majorly consists of Customer, Order, Product, and Service Data. 2. Enterprise Component Layer: This layer consists of the Application Services and Common Services that are responsible for realizing the functionality and maintaining the QoS of the exposed services. This layer uses container-based technologies such as application servers to implement the components, workload management, high availability, and load balancing. Application Services: This Service Layer enables application, technology, and database abstraction so that the complex accessing logic is hidden from the other service layers. This is a basic service layer, which exposes application functionalities and data as reusable services. The three types of the Application access services are: · Application Access Service: This Service Layer exposes application level functionalities as a reusable service between BSS to BSS and BSS to OSS integration. This layer is enabled using disparate technology such as Web Service, Integration Servers, and Adaptors, etc. · Data Access Service: This Service Layer exposes application data services as a reusable reference data service. This is done via direct interaction with application data. and provides the federated query. · Network Access Service: This Service Layer exposes provisioning layer as a reusable service from OSS to OSS integration. This integration service emphasizes the need for high performance, stateless process flows, and distributed design. Common Services encompasses management of structured, semi-structured, and unstructured data such as information services, portal services, interaction services, infrastructure services, and security services, etc. 3. Integration Layer: This consists of service infrastructure components like service bus, service gateway for partner integration, service registry, service repository, and BPEL processor. Service bus will carry the service invocation payloads/messages between consumers and providers. The other important functions expected from it are itinerary based routing, distributed caching of routing information, transformations, and all qualities of service for messaging-like reliability, scalability, and availability, etc. Service registry will hold all contracts (wsdl) of services, and it helps developers to locate or discover service during design time or runtime. • BPEL processor would be useful in orchestrating the services to compose a complex business scenario or process. • Workflow and business rules management are also required to support manual triggering of certain activities within business process. based on the rules setup and also the state machine information. Application, data, and service mediation layer typically forms the overall composite application development framework or SOA Framework. 4. Business Process Layer: These are typically the intermediate services layer and represent Shared Business Process Services. At Enterprise Level, these services are from Customer Management, Order Management, Billing, Finance, and Asset Management application domains. 5. Access Layer: This layer consists of portals for Enterprise and provides a single view of Enterprise information management and dashboard services. 6. Channel Layer: This consists of various devices; applications that form part of extended enterprise; browsers through which users access the applications. 7. Client Layer: This designates the different types of users accessing the enterprise applications. The type of user typically would be an important factor in determining the level of access to applications. 8. Vertical pieces like management, monitoring, security, and development cut across all horizontal layers Management and monitoring involves all aspects of SOA-like services, SLAs, and other QoS lifecycle processes for both applications and services surrounding SOA governance. 9. EA Governance, Reference Architecture, Roadmap, Principles, and Best Practices: EA Governance is important in terms of providing the overall direction to SOA implementation within the enterprise. This involves board-level involvement, in addition to business and IT executives. At a high level, this involves managing the SOA projects implementation, managing SOA infrastructure, and controlling the entire effort through all fine-tuned IT processes in accordance with COBIT (Control Objectives for Information Technology). Devising tools and techniques to promote reuse culture, and the SOA way of doing things needs competency centers to be established in addition to training the workforce to take up new roles that are suited to SOA journey. Conclusions Reference Architectures can serve as the basis for disparate architecture efforts throughout the organization, even if they use different tools and technologies. Reference architectures provide best practices and approaches in the independent way a vendor deals with technology and standards. Reference Architectures model the abstract architectural elements for an enterprise independent of the technologies, protocols, and products that are used to implement an SOA. Telecom enterprises today are facing significant business and technology challenges due to growing competition, a multitude of services, and convergence. Adopting architectural best practices could go a long way in meeting these challenges. The use of SOA-based architecture for communication to each of the external systems like Billing, CRM, etc., in OSS/BSS system has made the architecture very loosely coupled, with greater flexibility. Any change in the external systems would be absorbed at the Integration Layer without affecting the rest of the ecosystem. The use of a Business Process Management (BPM) tool makes the management and maintenance of the business processes easy, with better performance in terms of lead time, quality, and cost. Since the Architecture is based on standards, it will lower the cost of deploying and managing OSS/BSS applications over their lifecycles.

Read the article
West Wind WebSurge - an easy way to Load Test Web Applications

- by Rick Strahl

A few months ago on a project the subject of load testing came up. We were having some serious issues with a Web application that would start spewing SQL lock errors under somewhat heavy load. These sort of errors can be tough to catch, precisely because they only occur under load and not during typical development testing. To replicate this error more reliably we needed to put a load on the application and run it for a while before these SQL errors would flare up. It’s been a while since I’d looked at load testing tools, so I spent a bit of time looking at different tools and frankly didn’t really find anything that was a good fit. A lot of tools were either a pain to use, didn’t have the basic features I needed, or are extravagantly expensive. In the end I got frustrated enough to build an initially small custom load test solution that then morphed into a more generic library, then gained a console front end and eventually turned into a full blown Web load testing tool that is now called West Wind WebSurge. I got seriously frustrated looking for tools every time I needed some quick and dirty load testing for an application. If my aim is to just put an application under heavy enough load to find a scalability problem in code, or to simply try and push an application to its limits on the hardware it’s running I shouldn’t have to have to struggle to set up tests. It should be easy enough to get going in a few minutes, so that the testing can be set up quickly so that it can be done on a regular basis without a lot of hassle. And that was the goal when I started to build out my initial custom load tester into a more widely usable tool. If you’re in a hurry and you want to check it out, you can find more information and download links here: West Wind WebSurge Product Page Walk through Video Download link (zip) Install from Chocolatey Source on GitHub For a more detailed discussion of the why’s and how’s and some background continue reading. How did I get here? When I started out on this path, I wasn’t planning on building a tool like this myself – but I got frustrated enough looking at what’s out there to think that I can do better than what’s available for the most common simple load testing scenarios. When we ran into the SQL lock problems I mentioned, I started looking around what’s available for Web load testing solutions that would work for our whole team which consisted of a few developers and a couple of IT guys both of which needed to be able to run the tests. It had been a while since I looked at tools and I figured that by now there should be some good solutions out there, but as it turns out I didn’t really find anything that fit our relatively simple needs without costing an arm and a leg… I spent the better part of a day installing and trying various load testing tools and to be frank most of them were either terrible at what they do, incredibly unfriendly to use, used some terminology I couldn’t even parse, or were extremely expensive (and I mean in the ‘sell your liver’ range of expensive). Pick your poison. There are also a number of online solutions for load testing and they actually looked more promising, but those wouldn’t work well for our scenario as the application is running inside of a private VPN with no outside access into the VPN. Most of those online solutions also ended up being very pricey as well – presumably because of the bandwidth required to test over the open Web can be enormous. When I asked around on Twitter what people were using– I got mostly… crickets. Several people mentioned Visual Studio Load Test, and most other suggestions pointed to online solutions. I did get a bunch of responses though with people asking to let them know what I found – apparently I’m not alone when it comes to finding load testing tools that are effective and easy to use. As to Visual Studio, the higher end skus of Visual Studio and the test edition include a Web load testing tool, which is quite powerful, but there are a number of issues with that: First it’s tied to Visual Studio so it’s not very portable – you need a VS install. I also find the test setup and terminology used by the VS test runner extremely confusing. Heck, it’s complicated enough that there’s even a Pluralsight course on using the Visual Studio Web test from Steve Smith. And of course you need to have one of the high end Visual Studio Skus, and those are mucho Dinero ($$$) – just for the load testing that’s rarely an option. Some of the tools are ultra extensive and let you run analysis tools on the target serves which is useful, but in most cases – just plain overkill and only distracts from what I tend to be ultimately interested in: Reproducing problems that occur at high load, and finding the upper limits and ‘what if’ scenarios as load is ramped up increasingly against a site. Yes it’s useful to have Web app instrumentation, but often that’s not what you’re interested in. I still fondly remember early days of Web testing when Microsoft had the WAST (Web Application Stress Tool) tool, which was rather simple – and also somewhat limited – but easily allowed you to create stress tests very quickly. It had some serious limitations (mainly that it didn’t work with SSL), but the idea behind it was excellent: Create tests quickly and easily and provide a decent engine to run it locally with minimal setup. You could get set up and run tests within a few minutes. Unfortunately, that tool died a quiet death as so many of Microsoft’s tools that probably were built by an intern and then abandoned, even though there was a lot of potential and it was actually fairly widely used. Eventually the tools was no longer downloadable and now it simply doesn’t work anymore on higher end hardware. West Wind Web Surge – Making Load Testing Quick and Easy So I ended up creating West Wind WebSurge out of rebellious frustration… The goal of WebSurge is to make it drop dead simple to create load tests. It’s super easy to capture sessions either using the built in capture tool (big props to Eric Lawrence, Telerik and FiddlerCore which made that piece a snap), using the full version of Fiddler and exporting sessions, or by manually or programmatically creating text files based on plain HTTP headers to create requests. I’ve been using this tool for 4 months now on a regular basis on various projects as a reality check for performance and scalability and it’s worked extremely well for finding small performance issues. I also use it regularly as a simple URL tester, as it allows me to quickly enter a URL plus headers and content and test that URL and its results along with the ability to easily save one or more of those URLs. A few weeks back I made a walk through video that goes over most of the features of WebSurge in some detail: Note that the UI has slightly changed since then, so there are some UI improvements. Most notably the test results screen has been updated recently to a different layout and to provide more information about each URL in a session at a glance. The video and the main WebSurge site has a lot of info of basic operations. For the rest of this post I’ll talk about a few deeper aspects that may be of interest while also giving a glance at how WebSurge works. Session Capturing As you would expect, WebSurge works with Sessions of Urls that are played back under load. Here’s what the main Session View looks like: You can create session entries manually by individually adding URLs to test (on the Request tab on the right) and saving them, or you can capture output from Web Browsers, Windows Desktop applications that call services, your own applications using the built in Capture tool. With this tool you can capture anything HTTP -SSL requests and content from Web pages, AJAX calls, SOAP or REST services – again anything that uses Windows or .NET HTTP APIs. Behind the scenes the capture tool uses FiddlerCore so basically anything you can capture with Fiddler you can also capture with Web Surge Session capture tool. Alternately you can actually use Fiddler as well, and then export the captured Fiddler trace to a file, which can then be imported into WebSurge. This is a nice way to let somebody capture session without having to actually install WebSurge or for your customers to provide an exact playback scenario for a given set of URLs that cause a problem perhaps. Note that not all applications work with Fiddler’s proxy unless you configure a proxy. For example, .NET Web applications that make HTTP calls usually don’t show up in Fiddler by default. For those .NET applications you can explicitly override proxy settings to capture those requests to service calls. The capture tool also has handy optional filters that allow you to filter by domain, to help block out noise that you typically don’t want to include in your requests. For example, if your pages include links to CDNs, or Google Analytics or social links you typically don’t want to include those in your load test, so by capturing just from a specific domain you are guaranteed content from only that one domain. Additionally you can provide url filters in the configuration file – filters allow to provide filter strings that if contained in a url will cause requests to be ignored. Again this is useful if you don’t filter by domain but you want to filter out things like static image, css and script files etc. Often you’re not interested in the load characteristics of these static and usually cached resources as they just add noise to tests and often skew the overall url performance results. In my testing I tend to care only about my dynamic requests. SSL Captures require Fiddler Note, that in order to capture SSL requests you’ll have to install the Fiddler’s SSL certificate. The easiest way to do this is to install Fiddler and use its SSL configuration options to get the certificate into the local certificate store. There’s a document on the Telerik site that provides the exact steps to get SSL captures to work with Fiddler and therefore with WebSurge. Session Storage A group of URLs entered or captured make up a Session. Sessions can be saved and restored easily as they use a very simple text format that simply stored on disk. The format is slightly customized HTTP header traces separated by a separator line. The headers are standard HTTP headers except that the full URL instead of just the domain relative path is stored as part of the 1st HTTP header line for easier parsing. Because it’s just text and uses the same format that Fiddler uses for exports, it’s super easy to create Sessions by hand manually or under program control writing out to a simple text file. You can see what this format looks like in the Capture window figure above – the raw captured format is also what’s stored to disk and what WebSurge parses from. The only ‘custom’ part of these headers is that 1st line contains the full URL instead of the domain relative path and Host: header. The rest of each header are just plain standard HTTP headers with each individual URL isolated by a separator line. The format used here also uses what Fiddler produces for exports, so it’s easy to exchange or view data either in Fiddler or WebSurge. Urls can also be edited interactively so you can modify the headers easily as well: Again – it’s just plain HTTP headers so anything you can do with HTTP can be added here. Use it for single URL Testing Incidentally I’ve also found this form as an excellent way to test and replay individual URLs for simple non-load testing purposes. Because you can capture a single or many URLs and store them on disk, this also provides a nice HTTP playground where you can record URLs with their headers, and fire them one at a time or as a session and see results immediately. It’s actually an easy way for REST presentations and I find the simple UI flow actually easier than using Fiddler natively. Finally you can save one or more URLs as a session for later retrieval. I’m using this more and more for simple URL checks. Overriding Cookies and Domains Speaking of HTTP headers – you can also overwrite cookies used as part of the options. One thing that happens with modern Web applications is that you have session cookies in use for authorization. These cookies tend to expire at some point which would invalidate a test. Using the Options dialog you can actually override the cookie: which replaces the cookie for all requests with the cookie value specified here. You can capture a valid cookie from a manual HTTP request in your browser and then paste into the cookie field, to replace the existing Cookie with the new one that is now valid. Likewise you can easily replace the domain so if you captured urls on west-wind.com and now you want to test on localhost you can do that easily easily as well. You could even do something like capture on store.west-wind.com and then test on localhost/store which would also work. Running Load Tests Once you’ve created a Session you can specify the length of the test in seconds, and specify the number of simultaneous threads to run each session on. Sessions run through each of the URLs in the session sequentially by default. One option in the options list above is that you can also randomize the URLs so each thread runs requests in a different order. This avoids bunching up URLs initially when tests start as all threads run the same requests simultaneously which can sometimes skew the results of the first few minutes of a test. While sessions run some progress information is displayed: By default there’s a live view of requests displayed in a Console-like window. On the bottom of the window there’s a running total summary that displays where you’re at in the test, how many requests have been processed and what the requests per second count is currently for all requests. Note that for tests that run over a thousand requests a second it’s a good idea to turn off the console display. While the console display is nice to see that something is happening and also gives you slight idea what’s happening with actual requests, once a lot of requests are processed, this UI updating actually adds a lot of CPU overhead to the application which may cause the actual load generated to be reduced. If you are running a 1000 requests a second there’s not much to see anyway as requests roll by way too fast to see individual lines anyway. If you look on the options panel, there is a NoProgressEvents option that disables the console display. Note that the summary display is still updated approximately once a second so you can always tell that the test is still running. Test Results When the test is done you get a simple Results display: On the right you get an overall summary as well as breakdown by each URL in the session. Both success and failures are highlighted so it’s easy to see what’s breaking in your load test. The report can be printed or you can also open the HTML document in your default Web Browser for printing to PDF or saving the HTML document to disk. The list on the right shows you a partial list of the URLs that were fired so you can look in detail at the request and response data. The list can be filtered by success and failure requests. Each list is partial only (at the moment) and limited to a max of 1000 items in order to render reasonably quickly. Each item in the list can be clicked to see the full request and response data: This particularly useful for errors so you can quickly see and copy what request data was used and in the case of a GET request you can also just click the link to quickly jump to the page. For non-GET requests you can find the URL in the Session list, and use the context menu to Test the URL as configured including any HTTP content data to send. You get to see the full HTTP request and response as well as a link in the Request header to go visit the actual page. Not so useful for a POST as above, but definitely useful for GET requests. Finally you can also get a few charts. The most useful one is probably the Request per Second chart which can be accessed from the Charts menu or shortcut. Here’s what it looks like: Results can also be exported to JSON, XML and HTML. Keep in mind that these files can get very large rather quickly though, so exports can end up taking a while to complete. Command Line Interface WebSurge runs with a small core load engine and this engine is plugged into the front end application I’ve shown so far. There’s also a command line interface available to run WebSurge from the Windows command prompt. Using the command line you can run tests for either an individual URL (similar to AB.exe for example) or a full Session file. By default when it runs WebSurgeCli shows progress every second showing total request count, failures and the requests per second for the entire test. A silent option can turn off this progress display and display only the results. The command line interface can be useful for build integration which allows checking for failures perhaps or hitting a specific requests per second count etc. It’s also nice to use this as quick and dirty URL test facility similar to the way you’d use Apache Bench (ab.exe). Unlike ab.exe though, WebSurgeCli supports SSL and makes it much easier to create multi-URL tests using either manual editing or the WebSurge UI. Current Status Currently West Wind WebSurge is still in Beta status. I’m still adding small new features and tweaking the UI in an attempt to make it as easy and self-explanatory as possible to run. Documentation for the UI and specialty features is also still a work in progress. I plan on open-sourcing this product, but it won’t be free. There’s a free version available that provides a limited number of threads and request URLs to run. A relatively low cost license removes the thread and request limitations. Pricing info can be found on the Web site – there’s an introductory price which is $99 at the moment which I think is reasonable compared to most other for pay solutions out there that are exorbitant by comparison… The reason code is not available yet is – well, the UI portion of the app is a bit embarrassing in its current monolithic state. The UI started as a very simple interface originally that later got a lot more complex – yeah, that never happens, right? Unless there’s a lot of interest I don’t foresee re-writing the UI entirely (which would be ideal), but in the meantime at least some cleanup is required before I dare to publish it :-). The code will likely be released with version 1.0. I’m very interested in feedback. Do you think this could be useful to you and provide value over other tools you may or may not have used before? I hope so – it already has provided a ton of value for me and the work I do that made the development worthwhile at this point. You can leave a comment below, or for more extensive discussions you can post a message on the West Wind Message Board in the WebSurge section Microsoft MVPs and Insiders get a free License If you’re a Microsoft MVP or a Microsoft Insider you can get a full license for free. Send me a link to your current, official Microsoft profile and I’ll send you a not-for resale license. Send any messages to [email protected]. Resources For more info on WebSurge and to download it to try it out, use the following links. West Wind WebSurge Home Download West Wind WebSurge Getting Started with West Wind WebSurge Video© Rick Strahl, West Wind Technologies, 2005-2014Posted in ASP.NET Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article
Top things web developers should know about the Visual Studio 2013 release

- by Jon Galloway

ASP.NET and Web Tools for Visual Studio 2013 Release NotesASP.NET and Web Tools for Visual Studio 2013 Release NotesSummary for lazy readers: Visual Studio 2013 is now available for download on the Visual Studio site and on MSDN subscriber downloads) Visual Studio 2013 installs side by side with Visual Studio 2012 and supports round-tripping between Visual Studio versions, so you can try it out without committing to a switch Visual Studio 2013 ships with the new version of ASP.NET, which includes ASP.NET MVC 5, ASP.NET Web API 2, Razor 3, Entity Framework 6 and SignalR 2.0 The new releases ASP.NET focuses on One ASP.NET, so core features and web tools work the same across the platform (e.g. adding ASP.NET MVC controllers to a Web Forms application) New core features include new templates based on Bootstrap, a new scaffolding system, and a new identity system Visual Studio 2013 is an incredible editor for web files, including HTML, CSS, JavaScript, Markdown, LESS, Coffeescript, Handlebars, Angular, Ember, Knockdown, etc. Top links: Visual Studio 2013 content on the ASP.NET site are in the standard new releases area: http://www.asp.net/vnext ASP.NET and Web Tools for Visual Studio 2013 Release Notes Short intro videos on the new Visual Studio web editor features from Scott Hanselman and Mads Kristensen Announcing release of ASP.NET and Web Tools for Visual Studio 2013 post on the official .NET Web Development and Tools Blog Scott Guthrie's post: Announcing the Release of Visual Studio 2013 and Great Improvements to ASP.NET and Entity Framework Okay, for those of you who are still with me, let's dig in a bit. Quick web dev notes on downloading and installing Visual Studio 2013 I found Visual Studio 2013 to be a pretty fast install. According to Brian Harry's release post, installing over pre-release versions of Visual Studio is supported. I've installed the release version over pre-release versions, and it worked fine. If you're only going to be doing web development, you can speed up the install if you just select Web Developer tools. Of course, as a good Microsoft employee, I'll mention that you might also want to install some of those other features, like the Store apps for Windows 8 and the Windows Phone 8.0 SDK, but they do download and install a lot of other stuff (e.g. the Windows Phone SDK sets up Hyper-V and downloads several GB's of VM's). So if you're planning just to do web development for now, you can pick just the Web Developer Tools and install the other stuff later. If you've got a fast internet connection, I recommend using the web installer instead of downloading the ISO. The ISO includes all the features, whereas the web installer just downloads what you're installing. Visual Studio 2013 development settings and color theme When you start up Visual Studio, it'll prompt you to pick some defaults. These are totally up to you -whatever suits your development style - and you can change them later. As I said, these are completely up to you. I recommend either the Web Development or Web Development (Code Only) settings. The only real difference is that Code Only hides the toolbars, and you can switch between them using Tools / Import and Export Settings / Reset. Web Development settings Web Development (code only) settings Usually I've just gone with Web Development (code only) in the past because I just want to focus on the code, although the Standard toolbar does make it easier to switch default web browsers. More on that later. Color theme Sigh. Okay, everyone's got their favorite colors. I alternate between Light and Dark depending on my mood, and I personally like how the low contrast on the window chrome in those themes puts the emphasis on my code rather than the tabs and toolbars. I know some people got pretty worked up over that, though, and wanted the blue theme back. I personally don't like it - it reminds me of ancient versions of Visual Studio that I don't want to think about anymore. So here's the thing: if you install Visual Studio Ultimate, it defaults to Blue. The other versions default to Light. If you use Blue, I won't criticize you - out loud, that is. You can change themes really easily - either Tools / Options / Environment / General, or the smart way: ctrl+q for quick launch, then type Theme and hit enter. Signing in During the first run, you'll be prompted to sign in. You don't have to - you can click the "Not now, maybe later" link at the bottom of that dialog. I recommend signing in, though. It's not hooked in with licensing or tracking the kind of code you write to sell you components. It is doing good things, like syncing your Visual Studio settings between computers. More about that here. So, you don't have to, but I sure do. Overview of shiny new things in ASP.NET land There are a lot of good new things in ASP.NET. I'll list some of my favorite here, but you can read more on the ASP.NET site. One ASP.NET You've heard us talk about this for a while. The idea is that options are good, but choice can be a burden. When you start a new ASP.NET project, why should you have to make a tough decision - with long-term consequences - about how your application will work? If you want to use ASP.NET Web Forms, but have the option of adding in ASP.NET MVC later, why should that be hard? It's all ASP.NET, right? Ideally, you'd just decide that you want to use ASP.NET to build sites and services, and you could use the appropriate tools (the green blocks below) as you needed them. So, here it is. When you create a new ASP.NET application, you just create an ASP.NET application. Next, you can pick from some templates to get you started... but these are different. They're not "painful decision" templates, they're just some starting pieces. And, most importantly, you can mix and match. I can pick a "mostly" Web Forms template, but include MVC and Web API folders and core references. If you've tried to mix and match in the past, you're probably aware that it was possible, but not pleasant. ASP.NET MVC project files contained special project type GUIDs, so you'd only get controller scaffolding support in a Web Forms project if you manually edited the csproj file. Features in one stack didn't work in others. Project templates were painful choices. That's no longer the case. Hooray! I just did a demo in a presentation last week where I created a new Web Forms + MVC + Web API site, built a model, scaffolded MVC and Web API controllers with EF Code First, add data in the MVC view, viewed it in Web API, then added a GridView to the Web Forms Default.aspx page and bound it to the Model. In about 5 minutes. Sure, it's a simple example, but it's great to be able to share code and features across the whole ASP.NET family. Authentication In the past, authentication was built into the templates. So, for instance, there was an ASP.NET MVC 4 Intranet Project template which created a new ASP.NET MVC 4 application that was preconfigured for Windows Authentication. All of that authentication stuff was built into each template, so they varied between the stacks, and you couldn't reuse them. You didn't see a lot of changes to the authentication options, since they required big changes to a bunch of project templates. Now, the new project dialog includes a common authentication experience. When you hit the Change Authentication button, you get some common options that work the same way regardless of the template or reference settings you've made. These options work on all ASP.NET frameworks, and all hosting environments (IIS, IIS Express, or OWIN for self-host) The default is Individual User Accounts: This is the standard "create a local account, using username / password or OAuth" thing; however, it's all built on the new Identity system. More on that in a second. The one setting that has some configuration to it is Organizational Accounts, which lets you configure authentication using Active Directory, Windows Azure Active Directory, or Office 365. Identity There's a new identity system. We've taken the best parts of the previous ASP.NET Membership and Simple Identity systems, rolled in a lot of feedback and made big enhancements to support important developer concerns like unit testing and extensiblity. I've written long posts about ASP.NET identity, and I'll do it again. Soon. This is not that post. The short version is that I think we've finally got just the right Identity system. Some of my favorite features: There are simple, sensible defaults that work well - you can File / New / Run / Register / Login, and everything works. It supports standard username / password as well as external authentication (OAuth, etc.). It's easy to customize without having to re-implement an entire provider. It's built using pluggable pieces, rather than one large monolithic system. It's built using interfaces like IUser and IRole that allow for unit testing, dependency injection, etc. You can easily add user profile data (e.g. URL, twitter handle, birthday). You just add properties to your ApplicationUser model and they'll automatically be persisted. Complete control over how the identity data is persisted. By default, everything works with Entity Framework Code First, but it's built to support changes from small (modify the schema) to big (use another ORM, store your data in a document database or in the cloud or in XML or in the EXIF data of your desktop background or whatever). It's configured via OWIN. More on OWIN and Katana later, but the fact that it's built using OWIN means it's portable. You can find out more in the Authentication and Identity section of the ASP.NET site (and lots more content will be going up there soon). New Bootstrap based project templates The new project templates are built using Bootstrap 3. Bootstrap (formerly Twitter Bootstrap) is a front-end framework that brings a lot of nice benefits: It's responsive, so your projects will automatically scale to device width using CSS media queries. For example, menus are full size on a desktop browser, but on narrower screens you automatically get a mobile-friendly menu. The built-in Bootstrap styles make your standard page elements (headers, footers, buttons, form inputs, tables etc.) look nice and modern. Bootstrap is themeable, so you can reskin your whole site by dropping in a new Bootstrap theme. Since Bootstrap is pretty popular across the web development community, this gives you a large and rapidly growing variety of templates (free and paid) to choose from. Bootstrap also includes a lot of very useful things: components (like progress bars and badges), useful glyphicons, and some jQuery plugins for tooltips, dropdowns, carousels, etc.). Here's a look at how the responsive part works. When the page is full screen, the menu and header are optimized for a wide screen display: When I shrink the page down (this is all based on page width, not useragent sniffing) the menu turns into a nice mobile-friendly dropdown: For a quick example, I grabbed a new free theme off bootswatch.com. For simple themes, you just need to download the boostrap.css file and replace the /content/bootstrap.css file in your project. Now when I refresh the page, I've got a new theme: Scaffolding The big change in scaffolding is that it's one system that works across ASP.NET. You can create a new Empty Web project or Web Forms project and you'll get the Scaffold context menus. For release, we've got MVC 5 and Web API 2 controllers. We had a preview of Web Forms scaffolding in the preview releases, but they weren't fully baked for RTM. Look for them in a future update, expected pretty soon. This scaffolding system wasn't just changed to work across the ASP.NET frameworks, it's also built to enable future extensibility. That's not in this release, but should also hopefully be out soon. Project Readme page This is a small thing, but I really like it. When you create a new project, you get a Project_Readme.html page that's added to the root of your project and opens in the Visual Studio built-in browser. I love it. A long time ago, when you created a new project we just dumped it on you and left you scratching your head about what to do next. Not ideal. Then we started adding a bunch of Getting Started information to the new project templates. That told you what to do next, but you had to delete all of that stuff out of your website. It doesn't belong there. Not ideal. This is a simple HTML file that's not integrated into your project code at all. You can delete it if you want. But, it shows a lot of helpful links that are current for the project you just created. In the future, if we add new wacky project types, they can create readme docs with specific information on how to do appropriately wacky things. Side note: I really like that they used the internal browser in Visual Studio to show this content rather than popping open an HTML page in the default browser. I hate that. It's annoying. If you're doing that, I hope you'll stop. What if some unnamed person has 40 or 90 tabs saved in their browser session? When you pop open your "Thanks for installing my Visual Studio extension!" page, all eleventy billion tabs start up and I wish I'd never installed your thing. Be like these guys and pop stuff Visual Studio specific HTML docs in the Visual Studio browser. ASP.NET MVC 5 The biggest change with ASP.NET MVC 5 is that it's no longer a separate project type. It integrates well with the rest of ASP.NET. In addition to that and the other common features we've already looked at (Bootstrap templates, Identity, authentication), here's what's new for ASP.NET MVC. Attribute routing ASP.NET MVC now supports attribute routing, thanks to a contribution by Tim McCall, the author of http://attributerouting.net. With attribute routing you can specify your routes by annotating your actions and controllers. This supports some pretty complex, customized routing scenarios, and it allows you to keep your route information right with your controller actions if you'd like. Here's a controller that includes an action whose method name is Hiding, but I've used AttributeRouting to configure it to /spaghetti/with-nesting/where-is-waldo public class SampleController : Controller { [Route("spaghetti/with-nesting/where-is-waldo")] public string Hiding() { return "You found me!"; } } I enable that in my RouteConfig.cs, and I can use that in conjunction with my other MVC routes like this: public class RouteConfig { public static void RegisterRoutes(RouteCollection routes) { routes.IgnoreRoute("{resource}.axd/{*pathInfo}"); routes.MapMvcAttributeRoutes(); routes.MapRoute( name: "Default", url: "{controller}/{action}/{id}", defaults: new { controller = "Home", action = "Index", id = UrlParameter.Optional } ); } } You can read more about Attribute Routing in ASP.NET MVC 5 here. Filter enhancements There are two new additions to filters: Authentication Filters and Filter Overrides. Authentication filters are a new kind of filter in ASP.NET MVC that run prior to authorization filters in the ASP.NET MVC pipeline and allow you to specify authentication logic per-action, per-controller, or globally for all controllers. Authentication filters process credentials in the request and provide a corresponding principal. Authentication filters can also add authentication challenges in response to unauthorized requests. Override filters let you change which filters apply to a given action method or controller. Override filters specify a set of filter types that should not be run for a given scope (action or controller). This allows you to configure filters that apply globally but then exclude certain global filters from applying to specific actions or controllers. ASP.NET Web API 2 ASP.NET Web API 2 includes a lot of new features. Attribute Routing ASP.NET Web API supports the same attribute routing system that's in ASP.NET MVC 5. You can read more about the Attribute Routing features in Web API in this article. OAuth 2.0 ASP.NET Web API picks up OAuth 2.0 support, using security middleware running on OWIN (discussed below). This is great for features like authenticated Single Page Applications. OData Improvements ASP.NET Web API now has full OData support. That required adding in some of the most powerful operators: $select, $expand, $batch and $value. You can read more about OData operator support in this article by Mike Wasson. Lots more There's a huge list of other features, including CORS (cross-origin request sharing), IHttpActionResult, IHttpRequestContext, and more. I think the best overview is in the release notes. OWIN and Katana I've written about OWIN and Katana recently. I'm a big fan. OWIN is the Open Web Interfaces for .NET. It's a spec, like HTML or HTTP, so you can't install OWIN. The benefit of OWIN is that it's a community specification, so anyone who implements it can plug into the ASP.NET stack, either as middleware or as a host. Katana is the Microsoft implementation of OWIN. It leverages OWIN to wire up things like authentication, handlers, modules, IIS hosting, etc., so ASP.NET can host OWIN components and Katana components can run in someone else's OWIN implementation. Howard Dierking just wrote a cool article in MSDN magazine describing Katana in depth: Getting Started with the Katana Project. He had an interesting example showing an OWIN based pipeline which leveraged SignalR, ASP.NET Web API and NancyFx components in the same stack. If this kind of thing makes sense to you, that's great. If it doesn't, don't worry, but keep an eye on it. You're going to see some cool things happen as a result of ASP.NET becoming more and more pluggable. Visual Studio Web Tools Okay, this stuff's just crazy. Visual Studio has been adding some nice web dev features over the past few years, but they've really cranked it up for this release. Visual Studio is by far my favorite code editor for all web files: CSS, HTML, JavaScript, and lots of popular libraries. Stop thinking of Visual Studio as a big editor that you only use to write back-end code. Stop editing HTML and CSS in Notepad (or Sublime, Notepad++, etc.). Visual Studio starts up in under 2 seconds on a modern computer with an SSD. Misspelling HTML attributes or your CSS classes or jQuery or Angular syntax is stupid. It doesn't make you a better developer, it makes you a silly person who wastes time. Browser Link Browser Link is a real-time, two-way connection between Visual Studio and all connected browsers. It's only attached when you're running locally, in debug, but it applies to any and all connected browser, including emulators. You may have seen demos that showed the browsers refreshing based on changes in the editor, and I'll agree that's pretty cool. But it's really just the start. It's a two-way connection, and it's built for extensiblity. That means you can write extensions that push information from your running application (in IE, Chrome, a mobile emulator, etc.) back to Visual Studio. Mads and team have showed off some demonstrations where they enabled edit mode in the browser which updated the source HTML back on the browser. It's also possible to look at how the rendered HTML performs, check for compatibility issues, watch for unused CSS classes, the sky's the limit. New HTML editor The previous HTML editor had a lot of old code that didn't allow for improvements. The team rewrote the HTML editor to take advantage of the new(ish) extensibility features in Visual Studio, which then allowed them to add in all kinds of features - things like CSS Class and ID IntelliSense (so you type style="" and get a list of classes and ID's for your project), smart indent based on how your document is formatted, JavaScript reference auto-sync, etc. Here's a 3 minute tour from Mads Kristensen. The previous HTML editor had a lot of old code that didn't allow for improvements. The team rewrote the HTML editor to take advantage of the new(ish) extensibility features in Visual Studio, which then allowed them to add in all kinds of features - things like CSS Class and ID IntelliSense (so you type style="" and get a list of classes and ID's for your project), smart indent based on how your document is formatted, JavaScript reference auto-sync, etc. Lots more Visual Studio web dev features That's just a sampling - there's a ton of great features for JavaScript editing, CSS editing, publishing, and Page Inspector (which shows real-time rendering of your page inside Visual Studio). Here are some more short videos showing those features. Lots, lots more Okay, that's just a summary, and it's still quite a bit. Head on over to http://asp.net/vnext for more information, and download Visual Studio 2013 now to get started!

Read the article
IIS SSL Certificate Renewal Pain

- by Rick Strahl

I’m in the middle of my annual certificate renewal for the West Wind site and I can honestly say that I hate IIS’s certificate system. When it works it’s fine, but when it doesn’t man can it be a pain. Because I deal with public certificates on my site merely once a year, and you have to perform the certificate dance just the right way, I seem to run into some sort of trouble every year, thinking that Microsoft surely must have addressed the issues I ran into previously – HA! Not so. Don’t ever use the Renew Certificate Feature in IIS! The first rule that I should have never forgotten is that certificate renewals in IIS (7 is what I’m using but I think it’s no different in 7.5 and 8), simply don’t work if you’re submitting to get a public certificate from a certificate authority. I use DNSimple for my DNS domain management and SSL certificates because they provide ridiculously easy domain management and good prices for SSL certs – especially wildcard certificates, which is what I use on west-wind.com. Certificates in IIS can be found pegged to the machine root. If you go into the IIS Manager, go to the machine root the tree and then click on certificates and you then get various certificate options: Both of these options create a new Certificate request (CSR), which is just a text file. But if you’re silly enough like me to click on the Renew button on your old certificate, you’ll find that you end up generating a very long Certificate Request that looks nothing like the original certificate request and the format that’s used for this is not accepted by most certificate authorities. While I’m not sure exactly what the problem is, it simply looks like IIS is respecting none of your original certificate bit size choices and is generating a huge certificate request that is 3 times the size of a ‘normal’ certificate request. The end result is (and I’ve done this at least twice now) is that the certificate processor is likely to fail processing those renewals. Always create a new Certificate While it’s a little more work and you have to remember how to fill out the certificate request properly, this is the safe way to make sure your certificate generates properly. First comes the Distinguished Name Properties dialog: Ah yes you have to love the nomenclature of this stuff. Distinguished name, Common name – WTF is a common name? It doesn’t look common to me! Make sure this form gets filled out correctly. Common NameThis is the domain name of the Web site. In my case I’m creating a wildcard certificate so I’m using the * prefix. If you’re purchasing a certificate for a specific domain use www.west-wind.com or store.west-wind.com for example. Make sure this matches the EXACT domain you’re trying to use secure access on because that’s all the certificate is going to work on unless you get a wildcard certificate. Organization Is the name of your company or organization. Depending on the kind of certificate you purchase this name will show up on your certificate. Most low end SSL certificates (ie. those that cost under $100 for single domains) don’t list the organization, the higher signature certificates that also require extensive validation by the cert authority do. Regardless you should make sure this matches the right company/organization. Organizational Unit This can be anything. Not really sure what this is for, but traditionally I’ve always set this to Web because – well this is a Web thing after all right? I’ve never seen this used anywhere that I can tell other than to internally reference the cert. State and CountryPretty obvious. Should reflect the location of the business/organization/person or site. Next you have to configure the bit size used for the certificate: The default on this dialog is 1024, but I’ve found that most providers these days request a minimum bit length of 2048, as did my DNSimple provider. Again check with the provider when you submit to make sure. Bit length mismatches can cause problems if you use a size that isn’t supported by the provider. I had that happen last year when I submitted my CSR and it got rejected quite a bit later, when the certs usually are issued within an hour or less. When you’re done here, the certificate is saved to disk as a .txt file and it should look something like this (this is a 2048 bit length CSR):-----BEGIN NEW CERTIFICATE REQUEST----- MIIEVGCCAz0CAQAwdjELMAkGA1UEBhMCVVMxDzANBgNVBAgMBkhhd2FpaTENMAsG A1UEBwwEUGFpYTEfMB0GA1UECgwWV2VzdCBXaW5kIFRlY2hub2xvZ2llczEMMAoG B1UECwwDV2ViMRgwFgYDVQQDDA8qLndlc3Qtd2luZC5jb20wggEiMA0GCSqGSIb3 DQEBAQUAA4IBDwAwggEKAoIBAQDIPWOFMkMVRp2Ftj9w/cCVV4OYYhoZYtl+8lTk oqDwKca0xWHLgioX/9v0rZLS6a82MHqKEBxVXu+cuCmSE4AQtB/1YH9lS4tpc/be OZDvnTotP6l4MCEzzAfROcw4CiIg6X0RMSnl8IATAvv2V5LQM9TDdt9oDdMpX2IY +vVC9RZ7PMHBmR9kwI2i/lrKitzhQKaHgpmKcRlM6iqpALUiX28w5HJaDKK1MDHN 607tyFJLHijuJKx7PdTqZYf50KkC3NupfZ2avVycf18Q13jHWj59tvwEOczoVzRL l4LQivAqbhyiqMpWnrZunIOUZta5aGm+jo7O1knGWJjxuraTAgMBAAGgggGYMBoG CisGAQQBgjcNAgMxDBYKNi4yLjkyMDAuMjA0BgkrBgEEAYI3FRQxJzAlAgEFDAZS QVNYUFMMC1JBU1hQU1xSaWNrDAtJbmV0TWdyLmV4ZTByBgorBgEEAYI3DQICMWQw YgIBAR5aAE0AaQBjAHIAbwBzAG8AZgB0ACAAUgBTAEEAIABTAEMAaABhAG4AbgBl AGwAIABDAHIAeQBwAHQAbwBnAHIAYQBwAGgAaQBjACAAUAByAG8AdgBpAGQAZQBy AwEAMIHPBgkqhkiG9w0BCQ4xgcEwgb4wDgYDVR0PAQH/BAQDAgTwMBMGA1UdJQQM MAoGCCsGAQUFBwMBMHgGCSqGSIb3DQEJDwRrMGkwDgYIKoZIhvcNAwICAgCAMA4G CCqGSIb3DQMEAgIAgDALBglghkgBZQMEASowCwYJYIZIAWUDBAEtMAsGCWCGSAFl AwQBAjALBglghkgBZQMEAQUwBwYFKw4DAgcwCgYIKoZIhvcNAwcwHQYDVR0OBBYE FD/yOsTbXE+GVFCFMmldzQvyloz9MA0GCSqGSIb3DQEBBQUAA4IBAQCK6LlsCuIM 1AU0niB6QZ9v0FTsGFxP1dYvVUnJyY6VEKNiGFiQjZac7UCs0p58yScdXWEFOE8V OsjAYD3xYNc05+ckyD67UHRGEUAVB9RBvbKW23KeR/8kBmEzc8PemD52YOgExxAJ 57xWmAwEHAvbgYzQvhO8AOzH3TGvvHbg5UKM1pYgNmuwZq5DkL/IDoeIJwfk/wrI wghNTuxxIFgbH4YrgLgv4PRvrS/LaTCRBdboaCgzATMczaOb1nd/DVNR+3fCtMhM W0psTAjzRbmXF3nJyAQa7jF/52gkY0RfFX2lG5tJnG+XDsVNvKNvh9Qa5Tlmkm06 ILKCm9ciWCKk -----END NEW CERTIFICATE REQUEST----- You can take that certificate request and submit that to your certificate provider. Since this is base64 encoded you can typically just paste it into a text box on the submission page, or some providers will ask you to upload the CSR as a file. What does a Renewal look like? Note the length of the CSR will vary somewhat with key strength, but compare this to a renewal request that IIS generated from my existing site:-----BEGIN NEW CERTIFICATE REQUEST----- MIIPpwYFKoZIhvcNAQcCoIIPmDCCD5QCAQExCzAJBgUrDgMCGgUAMIIIqAYJKoZI hvcNAQcBoIIImQSCCJUwggiRMIIH+gIBADBdMSEwHwYDVQQLDBhEb21haW4gQ29u dHJvbCBWYWxpFGF0ZWQxHjAcBgNVBAsMFUVzc2VudGlhbFNTTCBXaWxkY2FyZDEY MBYGA1UEAwwPKi53ZXN0LXdpbmQuY29tMIGfMA0GCSqGSIb3DQEBAQUAA4GNADCB iQKBgQCK4OuIOR18Wb8tNMGRZiD1c9X57b332Lj7DhbckFqLs0ys8kVDHrTXSj+T Ye9nmAvfPpZmBtE5p9qRNN79rUYugAdl+qEtE4IJe1bRfxXzcKa1SXa8+TEs3zQa zYSmcR2dDuC8om1eAdeCtt0NnkvANgm1VLwGOor/UHMASaEhCQIDAQABoIIG8jAa BgorBgEEAYI3DQIDMQwWCjYuMi45MjAwLjIwNAYJKwYBBAGCNxUUMScwJQIBBQwG UkFTWFBTDAtSQVNYUFNcUmljawwLSW5ldE1nci5leGUwZgYKKwYBBAGCNw0CAjFY MFYCAQIeTgBNAGkAYwByAG8AcwBvAGYAdAAgAFMAdAByAG8AbgBnACAAQwByAHkA cAB0AG8AZwByAGEAcABoAGkAYwAgAFAAcgBvAHYAaQBkAGUAcgMBADCCAQAGCSqG SIb3DQEJDjGB8jCB7zAOBgNVHQ8BAf8EBAMCBaAwDAYDVR0TAQH/BAIwADA0BgNV HSUELTArBggrBgEFBQcDAQYIKwYBBQUHAwIGCisGAQQBgjcKAwMGCWCGSAGG+EIE ATBPBgNVHSAESDBGMDoGCysGAQQBsjEBAgIHMCswKQYIKwYBBQUHAgEWHWh0dHBz Oi8vc2VjdXJlLmNvbW9kby5jb20vQ1BTMAgGBmeBDAECATApBgNVHREEIjAggg8q Lndlc3Qtd2luZC5jb22CDXdlc3Qtd2luZC5jb20wHQYDVR0OBBYEFEVLAyO8gDiv lsfovKrx9mHPyrsiMIIFMAYJKwYBBAGCNw0BMYIFITCCBR0wggQFoAMCAQICEQDu 1E1T5Jvtkm5LOfSHabWlMA0GCSqGSIb3DQEBBQUAMHIxCzAJBgNVBAYTAkdCMRsw GQYDVQQIExJHcmVhdGVyIE1hbmNoZXN0ZXIxEDAOBgNVBAcTB1NhbGZvcmQxGjAY BgNVBAoTEUNPTU9ETyBDQSBMaW1pdGVkMRgwFgYDVQQDEw9Fc3NlbnRpYWxTU0wg Q0EwHhcNMTQwNTA3MDAwMDAwWhcNMTUwNjA2MjM1OTU5WjBdMSEwHwYDVQQLExhE b21haW4gQ29udHJvbCBWYWxpZGF0ZWQxHjAcBgNVBAsTFUVzc2VudGlhbFNTTCBX aWxkY2FyZDEYMBYGA1UEAxQPKi53ZXN0LXdpbmQuY29tMIIBIjANBgkqhkiG9w0B AQEFAAOCAQ8AMIIBCgKCAQEAiyKfL66XB51DlUfm6xXqJBcvMU2qorRHxC+WjEpB amvg8XoqNfCKzDAvLMbY4BLhbYCTagqtslnP3Gj4AKhXqRKU0n6iSbmS1gcWzCJM CHufZ5RDtuTuxhTdJxzP9YqZUfKV5abWQp/TK6V1ryaBJvdqM73q4tRjrQODtkiR PfZjxpybnBHFJS8jYAf8jcOjSDZcgN1d9Evc5MrEJCp/90cAkozyF/NMcFtD6Yj8 UM97z3MzDT2JPDoH3kAr3cCgpUNyQ2+wDNCnL9eWYFkOQi8FZMsZol7KlZ5NgNfO a7iZMVGbqDg6rkS//2uGe6tSQJTTs+mAZB+na+M8XT2UqwIDAQABo4IBwTCCAb0w HwYDVR0jBBgwFoAU2svqrVsIXcz//CZUzknlVcY49PgwHQYDVR0OBBYEFH0AmLiL RSEL9+sQD/n5O4N7/nnqMA4GA1UdDwEB/wQEAwIFoDAMBgNVHRMBAf8EAjAAMDQG A1UdJQQtMCsGCCsGAQUFBwMBBggrBgEFBQcDAgYKKwYBBAGCNwoDAwYJYIZIAYb4 QgQBME8GA1UdIARIMEYwOgYLKwYBBAGyMQECAgcwKzApBggrBgEFBQcCARYdaHR0 cHM6Ly9zZWN1cmUuY29tb2RvLmNvbS9DUFMwCAYGZ4EMAQIBMDsGA1UdHwQ0MDIw MKAuoCyGKmh0dHA6Ly9jcmwuY29tb2RvY2EuY29tL0Vzc2VudGlhbFNTTENBLmNy bDBuBggrBgEFBQcBAQRiMGAwOAYIKwYBBQUHMAKGLGh0dHA6Ly9jcnQuY29tb2Rv Y2EuY29tL0Vzc2VudGlhbFNTTENBXzIuY3J0MCQGCCsGAQUFBzABhhhodHRwOi8v b2NzcC5jb21vZG9jYS5jb20wKQYDVR0RBCIwIIIPKi53ZXN0LXdpbmQuY29tgg13 ZXN0LXdpbmQuY29tMA0GCSqGSIb3DQEBBQUAA4IBAQBqBfd6QHrxXsfgfKARG6np 8yszIPhHGPPmaE7xq7RpcZjY9H+8l6fe4jQbGFjbA5uHBklYI4m2snhPaW2p8iF8 YOkm2V2hEsSTnkf5/flw9mZtlCFEDFXSsBxBdNz8RYTthPMu1h09C0XuDB30sztg nR692FrxJN5/bXsk+MC9nEweTFW/t2HW+XZ8bhM7vsAS+pZionR4MyuQ0mYIt/lD csZVZ91KxTsIm8rNMkkYGFoSIXjQ0+0tCbxMF0i2qnpmNRpA6PU8l7lxxvPkplsk 9KB8QIPFrR5p/i/SUAd9vECWh5+/ktlcrfFP2PK7XcEwWizsvMrNqLyvQVNXSUPT MA0GCSqGSIb3DQEBBQUAA4GBABt/NitwMzc5t22p5+zy4HXbVYzLEjesLH8/v0ot uLQ3kkG8tIWNh5RplxIxtilXt09H4Oxpo3fKUN0yw+E6WsBfg0sAF8pHNBdOJi48 azrQbt4HvKktQkGpgYFjLsormjF44SRtToLHlYycDHBNvjaBClUwMCq8HnwY6vDq xikRoIIFITCCBR0wggQFoAMCAQICEQDu1E1T5Jvtkm5LOfSHabWlMA0GCSqGSIb3 DQEBBQUAMHIxCzAJBgNVBAYTAkdCMRswGQYDVQQIExJHcmVhdGVyIE1hbmNoZXN0 ZXIxEDAOBgNVBAcTB1NhbGZvcmQxGjAYBgNVBAoTEUNPTU9ETyBDQSBMaW1pdGVk MRgwFgYDVQQDEw9Fc3NlbnRpYWxTU0wgQ0EwHhcNMTQwNTA3MDAwMDAwWhcNMTUw NjA2MjM1OTU5WjBdMSEwHwYDVQQLExhEb21haW4gQ29udHJvbCBWYWxpZGF0ZWQx HjAcBgNVBAsTFUVzc2VudGlhbFNTTCBXaWxkY2FyZDEYMBYGA1UEAxQPKi53ZXN0 LXdpbmQuY29tMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIBCgKCAQEAiyKfL66X B51DlUfm6xXqJBcvMU2qorRHxC+WjEpBamvg8XoqNfCKzDAvLMbY4BLhbYCTagqt slnP3Gj4AKhXqRKU0n6iSbmS1gcWzCJMCHufZ5RDtuTuxhTdJxzP9YqZUfKV5abW Qp/TK6V1ryaBJvdqM73q4tRjrQODtkiRPfZjxpybnBHFJS8jYAf8jcOjSDZcgN1d 9Evc5MrEJCp/90cAkozyF/NMcFtD6Yj8UM97z3MzDT2JPDoH3kAr3cCgpUNyQ2+w DNCnL9eWYFkOQi8FZMsZol7KlZ5NgNfOa7iZMVGbqDg6rkS//2uGe6tSQJTTs+mA ZB+na+M8XT2UqwIDAQABo4IBwTCCAb0wHwYDVR0jBBgwFoAU2svqrVsIXcz//CZU zknlVcY49PgwHQYDVR0OBBYEFH0AmLiLRSEL9+sQD/n5O4N7/nnqMA4GA1UdDwEB /wQEAwIFoDAMBgNVHRMBAf8EAjAAMDQGA1UdJQQtMCsGCCsGAQUFBwMBBggrBgEF BQcDAgYKKwYBBAGCNwoDAwYJYIZIAYb4QgQBME8GA1UdIARIMEYwOgYLKwYBBAGy MQECAgcwKzApBggrBgEFBQcCARYdaHR0cHM6Ly9zZWN1cmUuY29tb2RvLmNvbS9D UFMwCAYGZ4EMAQIBMDsGA1UdHwQ0MDIwMKAuoCyGKmh0dHA6Ly9jcmwuY29tb2Rv Y2EuY29tL0Vzc2VudGlhbFNTTENBLmNybDBuBggrBgEFBQcBAQRiMGAwOAYIKwYB BQUHMAKGLGh0dHA6Ly9jcnQuY29tb2RvY2EuY29tL0Vzc2VudGlhbFNTTENBXzIu Y3J0MCQGCCsGAQUFBzABhhhodHRwOi8vb2NzcC5jb21vZG9jYS5jb20wKQYDVR0R BCIwIIIPKi53ZXN0LXdpbmQuY29tgg13ZXN0LXdpbmQuY29tMA0GCSqGSIb3DQEB BQUAA4IBAQBqBfd6QHrxXsfgfKARG6np8yszIPhHGPPmaE7xq7RpcZjY9H+8l6fe 4jQbGFjbA5uHBklYI4m2snhPaW2p8iF8YOkm2V2hEsSTnkf5/flw9mZtlCFEDFXS sBxBdNz8RYTthPMu1h09C0XuDB30sztgnR692FrxJN5/bXsk+MC9nEweTFW/t2HW +XZ8bhM7vsAS+pZionR4MyuQ0mYIt/lDcsZVZ91KxTsIm8rNMkkYGFoSIXjQ0+0t CbxMF0i2qnpmNRpA6PU8l7lxxvPkplsk9KB8QIPFrR5p/i/SUAd9vECWh5+/ktlc rfFP2PK7XcEwWizsvMrNqLyvQVNXSUPTMYIBrzCCAasCAQEwgYcwcjELMAkGA1UE BhMCR0IxGzAZBgNVBAgTEkdyZWF0ZXIgTWFuY2hlc3RlcjEQMA4GA1UEBxMHU2Fs Zm9yZDEaMBgGA1UEChMRQ09NT0RPIENBIExpbWl0ZWQxGDAWBgNVBAMTD0Vzc2Vu dGlhbFNTTCBDQQIRAO7UTVPkm+2Sbks59IdptaUwCQYFKw4DAhoFADANBgkqhkiG 9w0BAQEFAASCAQB8PNQ6bYnQpWfkHyxnDuvNKw3wrqF2p7JMZm+SuN2qp3R2LpCR mW2LrGtQIm9Iob/QOYH+8houYNVdvsATGPXX2T8gzn+anof4tOG0vCTK1Bp9bwf9 MkRP+1c8RW/vkYmUW4X5/C+y3CZpMH5dDTaXBIpXFzjX/fxNpH/rvLzGiaYYL3Cn OLO+aOADr9qq5yoqwpiYCSfYNNYKTUNNGfYIidQwYtbHXEYhSukB2oR89xD2sZZ4 bOqFjUPgTa5SsERLDDeg3omMKiIXVYGxlqBEq51Kge6IQt4qQV9P9VgInW7cWmKe dTqNHI9ri3ttewdEnT++TKGKKfTjX9SR8Waj -----END NEW CERTIFICATE REQUEST----- Clearly there’s something very different between this an my original request! And it didn’t work. IIS creates a custom CSR that is encoded in a format that no certificate authority I’ve ever used uses. If you want the gory details of what’s in there look at this ServerFault question (thanks to Mika in the comments). In the end it doesn’t matter though – no certificate authority knows what to do with this CSR. So create a new CSR and skip the renewal. Always! Use the same Server Keep in mind that on IIS at least you should always create your certificate on a single server and then when you receive the final certificate from your provider import it on that server. IIS tracks the CSR it created and requires it in order to import the final certificate properly. So if for some reason you try to install the certificate on another server, it won’t work. I’ve also run into trouble trying to install the same certificate twice – this time around I didn’t give my certificate the proper friendly name and IIS failed to allow me to assign the certificate to any of my Web sites. So I removed the certificate and tried to import again, only to find it failed the second time around. There are other ways to fix this, but in my case I had to have the certificate re-issued to work – not what you want to do. Regardless of what you do though, when you import make sure you do it right the first time by crossing all your t’s and dotting your i's– it’ll save you a lot of grief! You don’t actually have to use the server that the certificate gets installed on to generate the CSR and first install it, but it is generally a good idea to do so just so you can get the certificate installed into the right place right away. If you have access to the server where you need to install the certificate you might as well use it. But you can use another machine to generated the and install the certificate, then export the certificate and move it to another machine as needed. So you can use your Dev machine to create a certificate then export it and install it on a live server. More on installation and back up/export later. Installing the Certificate Once you’ve submitted a CSR request your provider will process the request and eventually issue you a new final certificate that contains another text file with the final key to import into your certificate store. IIS does this by combining the content in your certificate request with the original CSR. If all goes well your new certificate shows up in the certificate list and you’re ready to assign the certificate to your sites. Make sure you use a friendly name that matches domain name of your site. So use *.mysite.com or www.mysite.com or store.mysite.com to ensure IIS recognizes the certificate. I made the mistake of not naming my friendly name this way and found that IIS was unable to link my sites to my wildcard certificate. It needed to have the *. as part of the certificate otherwise the Hostname input field was blanked out. Changing the Friendly Name If you by accidentally used an invalid friendly name you can change it later in the Windows certificate store. Bring up a Run Box Type MMC File | Add/Remove Snap In Add Certificates | Computer Account | Local Computer Drill into Certificates | Personal | Certificates Find your Certificate | Right Click | Properties Edit the Friendly Name | Click OK Backing up your Certificate The first thing you should do once your certificate is successfully installed is to back it up! In case your server crashes or you otherwise lose your configuration this will ensure you have an easy way to recover and reinstall your certificate either on the same server or a different one. If you’re running a server farm or using a wildcard certificate you also need to get the certificate onto other machines and a PFX file import is the easiest way to do this. To back up your certificate select your certificate and choose Export from the context or sidebar menu: The Export Certificate option allows you to export a password protected binary file that you can import in a single step. You can copy the resulting binary PFX file to back up or copy to other machines to install on. Importing the certificate on another machine is as easy as pointing at the PFX file and specifying the password. IIS handles the rest. Assigning a new certificate to your Site Once you have the new certificate installed, all that’s left to do is assign it to your site. In IIS select your Web site and bring up the Site Bindings from the right sidebar. Add a new binding for https, bind it to port 443, specify your hostname and pick the certificate from the pick list. If you’re using a root site make sure to set up your certificate for www.yoursite.com and also for yoursite.com so that both work properly with SSL. Note that you need to explicitly configure each hostname for a certificate if you plan to use SSL. Luckily if you update your SSL certificate in the following year, IIS prompts you and asks whether you like to update all other sites that are using the existing cert to the newer cert. And you’re done. So what’s the Pain? So, all of this is old hat and it doesn’t look all that bad right? So what’s the pain here? Well if you follow the instructions and do everything right, then the process is about as straight forward as you would expect it to be. You create a cert request, you import it and assign it to your sites. That’s the basic steps and to be perfectly fair it works well – if nothing goes wrong. However, renewing tends to be the problem. The first unintuitive issue is that you simply shouldn’t renew but create a new CSR and generate your new certificate from that. Over the years I’ve fallen prey to the belief that Microsoft eventually will fix this so that the renewal creates the same type of CSR as the old cert, but apparently that will just never happen. Booo! The other problem I ran into is that I accidentally misnamed my imported certificate which in turn set off a chain of events that caused my originally issued certificate to become uninstallable. When I received my completed certificate I installed it and it installed just fine, but the friendly name was wrong. As a result IIS refused to assign the certificate to any of my host headered sites. That’s strike number one. Why the heck should the friendly name have any effect on the ability to attach the certificate??? Next I uninstalled the certificate because I figured that would be the easiest way to make sure I get it right. But I found that I could not reinstall my certificate. I kept getting these stop errors: "ASN1 bad tag value met" that would prevent the installation from completion. After searching around for this error and reading countless long messages on forums, I found that this error supposedly does not actually mean the install failed, but the list wouldn’t refresh. Commodo has this to say: Note: There is a known issue in IIS 7 giving the following error: "Cannot find the certificate request associated with this certificate file. A certificate request must be completed on the computer where it was created." You may also receive a message stating "ASN1 bad tag value met". If this is the same server that you generated the CSR on then, in most cases, the certificate is actually installed. Simply cancel the dialog and press "F5" to refresh the list of server certificates. If the new certificate is now in the list, you can continue with the next step. If it is not in the list, you will need to reissue your certificate using a new CSR (see our CSR creation instructions for IIS 7). After creating a new CSR, login to your Comodo account and click the 'replace' button for your certificate. Not sure if this issue is fixed in IIS 8 but that’s an insane bug to have crop up. As it turns out, in my case the refresh didn’t work and the certificate didn’t show up in the IIS list after the reinstall. In fact when looking at the certificate store I could see my certificate was installed in the right place, but the private key is missing which is most likely why IIS is not picking it up. It looks like IIS could not match the final cert to the original CSR generated. But again some sort of message to that affect might be helpful instead of ASN1 bad tag value met. Recovering the Private Key So it turns out my original problem was that I received the published key, but when I imported the private key was missing. There’s a relatively easy way to recover from this. If your certificate doesn’t show up in IIS check in the certificate store for the local machine (see steps above on how to bring this up). If you look at the certificate in Certificates/Personal/Certificates make sure you see the key as shown in the image below: if the key is missing it means that the certificate is missing the private key most likely. To fix a certificate you can do the following: Double click the certificate Go to the Details Tab Copy down the Serial number You can copy the serial number from the area blurred out above. The serial number will be in a format like ?00 a7 9b a1 a4 9d 91 63 57 d6 9f 26 b8 ee 79 b5 cb and you’ll need to strip out the spaces in order to use it in the next step. Next open up an Administrative command prompt and issue the following command: certutil -repairstore my 00a79ba1a49d916357d69f26b8ee79b5cb You should get a confirmation message that the repair worked. If you now go back to the certificate store you should now see the key icon show up on the certificate. Your certificate is fixed. Now go back into IIS Manager and refresh the list of certificates and if all goes well you should see all the certificates that showed in the cert store now: Remember – back up the key first then map to your site… Summary I deal with a lot of customers who run their own IIS servers, and I can’t tell you how often I hear about botched SSL installations. When I posted some of my issues on Twitter yesterday I got a hell storm of “me too” responses. I’m clearly not the only one, who’s run into this especially with renewals. I feel pretty comfortable with IIS configuration and I do a lot of it for support purposes, but the SSL configuration is one that never seems to go seamlessly. This blog post is meant as reminder to myself to read next time I do a renewal. So I can dot my i's and dash my t’s before I get caught in the mess I’m dealing with today. Hopefully some of you find this useful as well.© Rick Strahl, West Wind Technologies, 2005-2014Posted in IIS7 Security Tweet !function(d,s,id){var js,fjs=d.getElementsByTagName(s)[0];if(!d.getElementById(id)){js=d.createElement(s);js.id=id;js.src="//platform.twitter.com/widgets.js";fjs.parentNode.insertBefore(js,fjs);}}(document,"script","twitter-wjs"); (function() { var po = document.createElement('script'); po.type = 'text/javascript'; po.async = true; po.src = 'https://apis.google.com/js/plusone.js'; var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(po, s); })();

Read the article
Improving Partitioned Table Join Performance

- by Paul White

The query optimizer does not always choose an optimal strategy when joining partitioned tables. This post looks at an example, showing how a manual rewrite of the query can almost double performance, while reducing the memory grant to almost nothing. Test Data The two tables in this example use a common partitioning partition scheme. The partition function uses 41 equal-size partitions: CREATE PARTITION FUNCTION PFT (integer) AS RANGE RIGHT FOR VALUES ( 125000, 250000, 375000, 500000, 625000, 750000, 875000, 1000000, 1125000, 1250000, 1375000, 1500000, 1625000, 1750000, 1875000, 2000000, 2125000, 2250000, 2375000, 2500000, 2625000, 2750000, 2875000, 3000000, 3125000, 3250000, 3375000, 3500000, 3625000, 3750000, 3875000, 4000000, 4125000, 4250000, 4375000, 4500000, 4625000, 4750000, 4875000, 5000000 ); GO CREATE PARTITION SCHEME PST AS PARTITION PFT ALL TO ([PRIMARY]); There two tables are: CREATE TABLE dbo.T1 ( TID integer NOT NULL IDENTITY(0,1), Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T1 PRIMARY KEY CLUSTERED (TID) ON PST (TID) ); CREATE TABLE dbo.T2 ( TID integer NOT NULL, Column1 integer NOT NULL, Padding binary(100) NOT NULL DEFAULT 0x, CONSTRAINT PK_T2 PRIMARY KEY CLUSTERED (TID, Column1) ON PST (TID) ); The next script loads 5 million rows into T1 with a pseudo-random value between 1 and 5 for Column1. The table is partitioned on the IDENTITY column TID: INSERT dbo.T1 WITH (TABLOCKX) (Column1) SELECT (ABS(CHECKSUM(NEWID())) % 5) + 1 FROM dbo.Numbers AS N WHERE n BETWEEN 1 AND 5000000; In case you don’t already have an auxiliary table of numbers lying around, here’s a script to create one with 10 million rows: CREATE TABLE dbo.Numbers (n bigint PRIMARY KEY); WITH L0 AS(SELECT 1 AS c UNION ALL SELECT 1), L1 AS(SELECT 1 AS c FROM L0 AS A CROSS JOIN L0 AS B), L2 AS(SELECT 1 AS c FROM L1 AS A CROSS JOIN L1 AS B), L3 AS(SELECT 1 AS c FROM L2 AS A CROSS JOIN L2 AS B), L4 AS(SELECT 1 AS c FROM L3 AS A CROSS JOIN L3 AS B), L5 AS(SELECT 1 AS c FROM L4 AS A CROSS JOIN L4 AS B), Nums AS(SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n FROM L5) INSERT dbo.Numbers WITH (TABLOCKX) SELECT TOP (10000000) n FROM Nums ORDER BY n OPTION (MAXDOP 1); Table T1 contains data like this: Next we load data into table T2. The relationship between the two tables is that table 2 contains ‘n’ rows for each row in table 1, where ‘n’ is determined by the value in Column1 of table T1. There is nothing particularly special about the data or distribution, by the way. INSERT dbo.T2 WITH (TABLOCKX) (TID, Column1) SELECT T.TID, N.n FROM dbo.T1 AS T JOIN dbo.Numbers AS N ON N.n >= 1 AND N.n <= T.Column1; Table T2 ends up containing about 15 million rows: The primary key for table T2 is a combination of TID and Column1. The data is partitioned according to the value in column TID alone. Partition Distribution The following query shows the number of rows in each partition of table T1: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T1 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are 40 partitions containing 125,000 rows (40 * 125k = 5m rows). The rightmost partition remains empty. The next query shows the distribution for table 2: SELECT PartitionID = CA1.P, NumRows = COUNT_BIG(*) FROM dbo.T2 AS T CROSS APPLY (VALUES ($PARTITION.PFT(TID))) AS CA1 (P) GROUP BY CA1.P ORDER BY CA1.P; There are roughly 375,000 rows in each partition (the rightmost partition is also empty): Ok, that’s the test data done. Test Query and Execution Plan The task is to count the rows resulting from joining tables 1 and 2 on the TID column: SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; The optimizer chooses a plan using parallel hash join, and partial aggregation: The Plan Explorer plan tree view shows accurate cardinality estimates and an even distribution of rows across threads (click to enlarge the image): With a warm data cache, the STATISTICS IO output shows that no physical I/O was needed, and all 41 partitions were touched: Running the query without actual execution plan or STATISTICS IO information for maximum performance, the query returns in around 2600ms. Execution Plan Analysis The first step toward improving on the execution plan produced by the query optimizer is to understand how it works, at least in outline. The two parallel Clustered Index Scans use multiple threads to read rows from tables T1 and T2. Parallel scan uses a demand-based scheme where threads are given page(s) to scan from the table as needed. This arrangement has certain important advantages, but does result in an unpredictable distribution of rows amongst threads. The point is that multiple threads cooperate to scan the whole table, but it is impossible to predict which rows end up on which threads. For correct results from the parallel hash join, the execution plan has to ensure that rows from T1 and T2 that might join are processed on the same thread. For example, if a row from T1 with join key value ‘1234’ is placed in thread 5’s hash table, the execution plan must guarantee that any rows from T2 that also have join key value ‘1234’ probe thread 5’s hash table for matches. The way this guarantee is enforced in this parallel hash join plan is by repartitioning rows to threads after each parallel scan. The two repartitioning exchanges route rows to threads using a hash function over the hash join keys. The two repartitioning exchanges use the same hash function so rows from T1 and T2 with the same join key must end up on the same hash join thread. Expensive Exchanges This business of repartitioning rows between threads can be very expensive, especially if a large number of rows is involved. The execution plan selected by the optimizer moves 5 million rows through one repartitioning exchange and around 15 million across the other. As a first step toward removing these exchanges, consider the execution plan selected by the optimizer if we join just one partition from each table, disallowing parallelism: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = 1 AND $PARTITION.PFT(T2.TID) = 1 OPTION (MAXDOP 1); The optimizer has chosen a (one-to-many) merge join instead of a hash join. The single-partition query completes in around 100ms. If everything scaled linearly, we would expect that extending this strategy to all 40 populated partitions would result in an execution time around 4000ms. Using parallelism could reduce that further, perhaps to be competitive with the parallel hash join chosen by the optimizer. This raises a question. If the most efficient way to join one partition from each of the tables is to use a merge join, why does the optimizer not choose a merge join for the full query? Forcing a Merge Join Let’s force the optimizer to use a merge join on the test query using a hint: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN); This is the execution plan selected by the optimizer: This plan results in the same number of logical reads reported previously, but instead of 2600ms the query takes 5000ms. The natural explanation for this drop in performance is that the merge join plan is only using a single thread, whereas the parallel hash join plan could use multiple threads. Parallel Merge Join We can get a parallel merge join plan using the same query hint as before, and adding trace flag 8649: SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (MERGE JOIN, QUERYTRACEON 8649); The execution plan is: This looks promising. It uses a similar strategy to distribute work across threads as seen for the parallel hash join. In practice though, performance is disappointing. On a typical run, the parallel merge plan runs for around 8400ms; slower than the single-threaded merge join plan (5000ms) and much worse than the 2600ms for the parallel hash join. We seem to be going backwards! The logical reads for the parallel merge are still exactly the same as before, with no physical IOs. The cardinality estimates and thread distribution are also still very good (click to enlarge): A big clue to the reason for the poor performance is shown in the wait statistics (captured by Plan Explorer Pro): CXPACKET waits require careful interpretation, and are most often benign, but in this case excessive waiting occurs at the repartitioning exchanges. Unlike the parallel hash join, the repartitioning exchanges in this plan are order-preserving ‘merging’ exchanges (because merge join requires ordered inputs): Parallelism works best when threads can just grab any available unit of work and get on with processing it. Preserving order introduces inter-thread dependencies that can easily lead to significant waits occurring. In extreme cases, these dependencies can result in an intra-query deadlock, though the details of that will have to wait for another time to explore in detail. The potential for waits and deadlocks leads the query optimizer to cost parallel merge join relatively highly, especially as the degree of parallelism (DOP) increases. This high costing resulted in the optimizer choosing a serial merge join rather than parallel in this case. The test results certainly confirm its reasoning. Collocated Joins In SQL Server 2008 and later, the optimizer has another available strategy when joining tables that share a common partition scheme. This strategy is a collocated join, also known as as a per-partition join. It can be applied in both serial and parallel execution plans, though it is limited to 2-way joins in the current optimizer. Whether the optimizer chooses a collocated join or not depends on cost estimation. The primary benefits of a collocated join are that it eliminates an exchange and requires less memory, as we will see next. Costing and Plan Selection The query optimizer did consider a collocated join for our original query, but it was rejected on cost grounds. The parallel hash join with repartitioning exchanges appeared to be a cheaper option. There is no query hint to force a collocated join, so we have to mess with the costing framework to produce one for our test query. Pretending that IOs cost 50 times more than usual is enough to convince the optimizer to use collocated join with our test query: -- Pretend IOs are 50x cost temporarily DBCC SETIOWEIGHT(50); -- Co-located hash join SELECT COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID OPTION (RECOMPILE); -- Reset IO costing DBCC SETIOWEIGHT(1); Collocated Join Plan The estimated execution plan for the collocated join is: The Constant Scan contains one row for each partition of the shared partitioning scheme, from 1 to 41. The hash repartitioning exchanges seen previously are replaced by a single Distribute Streams exchange using Demand partitioning. Demand partitioning means that the next partition id is given to the next parallel thread that asks for one. My test machine has eight logical processors, and all are available for SQL Server to use. As a result, there are eight threads in the single parallel branch in this plan, each processing one partition from each table at a time. Once a thread finishes processing a partition, it grabs a new partition number from the Distribute Streams exchange…and so on until all partitions have been processed. It is important to understand that the parallel scans in this plan are different from the parallel hash join plan. Although the scans have the same parallelism icon, tables T1 and T2 are not being co-operatively scanned by multiple threads in the same way. Each thread reads a single partition of T1 and performs a hash match join with the same partition from table T2. The properties of the two Clustered Index Scans show a Seek Predicate (unusual for a scan!) limiting the rows to a single partition: The crucial point is that the join between T1 and T2 is on TID, and TID is the partitioning column for both tables. A thread that processes partition ‘n’ is guaranteed to see all rows that can possibly join on TID for that partition. In addition, no other thread will see rows from that partition, so this removes the need for repartitioning exchanges. CPU and Memory Efficiency Improvements The collocated join has removed two expensive repartitioning exchanges and added a single exchange processing 41 rows (one for each partition id). Remember, the parallel hash join plan exchanges had to process 5 million and 15 million rows. The amount of processor time spent on exchanges will be much lower in the collocated join plan. In addition, the collocated join plan has a maximum of 8 threads processing single partitions at any one time. The 41 partitions will all be processed eventually, but a new partition is not started until a thread asks for it. Threads can reuse hash table memory for the new partition. The parallel hash join plan also had 8 hash tables, but with all 5,000,000 build rows loaded at the same time. The collocated plan needs memory for only 8 * 125,000 = 1,000,000 rows at any one time. Collocated Hash Join Performance The collated join plan has disappointing performance in this case. The query runs for around 25,300ms despite the same IO statistics as usual. This is much the worst result so far, so what went wrong? It turns out that cardinality estimation for the single partition scans of table T1 is slightly low. The properties of the Clustered Index Scan of T1 (graphic immediately above) show the estimation was for 121,951 rows. This is a small shortfall compared with the 125,000 rows actually encountered, but it was enough to cause the hash join to spill to physical tempdb: A level 1 spill doesn’t sound too bad, until you realize that the spill to tempdb probably occurs for each of the 41 partitions. As a side note, the cardinality estimation error is a little surprising because the system tables accurately show there are 125,000 rows in every partition of T1. Unfortunately, the optimizer uses regular column and index statistics to derive cardinality estimates here rather than system table information (e.g. sys.partitions). Collocated Merge Join We will never know how well the collocated parallel hash join plan might have worked without the cardinality estimation error (and the resulting 41 spills to tempdb) but we do know: Merge join does not require a memory grant; and Merge join was the optimizer’s preferred join option for a single partition join Putting this all together, what we would really like to see is the same collocated join strategy, but using merge join instead of hash join. Unfortunately, the current query optimizer cannot produce a collocated merge join; it only knows how to do collocated hash join. So where does this leave us? CROSS APPLY sys.partitions We can try to write our own collocated join query. We can use sys.partitions to find the partition numbers, and CROSS APPLY to get a count per partition, with a final step to sum the partial counts. The following query implements this idea: SELECT row_count = SUM(Subtotals.cnt) FROM ( -- Partition numbers SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1 ) AS P CROSS APPLY ( -- Count per collocated join SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; The estimated plan is: The cardinality estimates aren’t all that good here, especially the estimate for the scan of the system table underlying the sys.partitions view. Nevertheless, the plan shape is heading toward where we would like to be. Each partition number from the system table results in a per-partition scan of T1 and T2, a one-to-many Merge Join, and a Stream Aggregate to compute the partial counts. The final Stream Aggregate just sums the partial counts. Execution time for this query is around 3,500ms, with the same IO statistics as always. This compares favourably with 5,000ms for the serial plan produced by the optimizer with the OPTION (MERGE JOIN) hint. This is another case of the sum of the parts being less than the whole – summing 41 partial counts from 41 single-partition merge joins is faster than a single merge join and count over all partitions. Even so, this single-threaded collocated merge join is not as quick as the original parallel hash join plan, which executed in 2,600ms. On the positive side, our collocated merge join uses only one logical processor and requires no memory grant. The parallel hash join plan used 16 threads and reserved 569 MB of memory: Using a Temporary Table Our collocated merge join plan should benefit from parallelism. The reason parallelism is not being used is that the query references a system table. We can work around that by writing the partition numbers to a temporary table (or table variable): SET STATISTICS IO ON; DECLARE @s datetime2 = SYSUTCDATETIME(); CREATE TABLE #P ( partition_number integer PRIMARY KEY); INSERT #P (partition_number) SELECT p.partition_number FROM sys.partitions AS p WHERE p.[object_id] = OBJECT_ID(N'T1', N'U') AND p.index_id = 1; SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals; DROP TABLE #P; SELECT DATEDIFF(Millisecond, @s, SYSUTCDATETIME()); SET STATISTICS IO OFF; Using the temporary table adds a few logical reads, but the overall execution time is still around 3500ms, indistinguishable from the same query without the temporary table. The problem is that the query optimizer still doesn’t choose a parallel plan for this query, though the removal of the system table reference means that it could if it chose to: In fact the optimizer did enter the parallel plan phase of query optimization (running search 1 for a second time): Unfortunately, the parallel plan found seemed to be more expensive than the serial plan. This is a crazy result, caused by the optimizer’s cost model not reducing operator CPU costs on the inner side of a nested loops join. Don’t get me started on that, we’ll be here all night. In this plan, everything expensive happens on the inner side of a nested loops join. Without a CPU cost reduction to compensate for the added cost of exchange operators, candidate parallel plans always look more expensive to the optimizer than the equivalent serial plan. Parallel Collocated Merge Join We can produce the desired parallel plan using trace flag 8649 again: SELECT row_count = SUM(Subtotals.cnt) FROM #P AS p CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: One difference between this plan and the collocated hash join plan is that a Repartition Streams exchange operator is used instead of Distribute Streams. The effect is similar, though not quite identical. The Repartition uses round-robin partitioning, meaning the next partition id is pushed to the next thread in sequence. The Distribute Streams exchange seen earlier used Demand partitioning, meaning the next partition id is pulled across the exchange by the next thread that is ready for more work. There are subtle performance implications for each partitioning option, but going into that would again take us too far off the main point of this post. Performance The important thing is the performance of this parallel collocated merge join – just 1350ms on a typical run. The list below shows all the alternatives from this post (all timings include creation, population, and deletion of the temporary table where appropriate) from quickest to slowest: Collocated parallel merge join: 1350ms Parallel hash join: 2600ms Collocated serial merge join: 3500ms Serial merge join: 5000ms Parallel merge join: 8400ms Collated parallel hash join: 25,300ms (hash spill per partition) The parallel collocated merge join requires no memory grant (aside from a paltry 1.2MB used for exchange buffers). This plan uses 16 threads at DOP 8; but 8 of those are (rather pointlessly) allocated to the parallel scan of the temporary table. These are minor concerns, but it turns out there is a way to address them if it bothers you. Parallel Collocated Merge Join with Demand Partitioning This final tweak replaces the temporary table with a hard-coded list of partition ids (dynamic SQL could be used to generate this query from sys.partitions): SELECT row_count = SUM(Subtotals.cnt) FROM ( VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10), (11),(12),(13),(14),(15),(16),(17),(18),(19),(20), (21),(22),(23),(24),(25),(26),(27),(28),(29),(30), (31),(32),(33),(34),(35),(36),(37),(38),(39),(40),(41) ) AS P (partition_number) CROSS APPLY ( SELECT cnt = COUNT_BIG(*) FROM dbo.T1 AS T1 JOIN dbo.T2 AS T2 ON T2.TID = T1.TID WHERE $PARTITION.PFT(T1.TID) = p.partition_number AND $PARTITION.PFT(T2.TID) = p.partition_number ) AS SubTotals OPTION (QUERYTRACEON 8649); The actual execution plan is: The parallel collocated hash join plan is reproduced below for comparison: The manual rewrite has another advantage that has not been mentioned so far: the partial counts (per partition) can be computed earlier than the partial counts (per thread) in the optimizer’s collocated join plan. The earlier aggregation is performed by the extra Stream Aggregate under the nested loops join. The performance of the parallel collocated merge join is unchanged at around 1350ms. Final Words It is a shame that the current query optimizer does not consider a collocated merge join (Connect item closed as Won’t Fix). The example used in this post showed an improvement in execution time from 2600ms to 1350ms using a modestly-sized data set and limited parallelism. In addition, the memory requirement for the query was almost completely eliminated – down from 569MB to 1.2MB. The problem with the parallel hash join selected by the optimizer is that it attempts to process the full data set all at once (albeit using eight threads). It requires a large memory grant to hold all 5 million rows from table T1 across the eight hash tables, and does not take advantage of the divide-and-conquer opportunity offered by the common partitioning. The great thing about the collocated join strategies is that each parallel thread works on a single partition from both tables, reading rows, performing the join, and computing a per-partition subtotal, before moving on to a new partition. From a thread’s point of view… If you have trouble visualizing what is happening from just looking at the parallel collocated merge join execution plan, let’s look at it again, but from the point of view of just one thread operating between the two Parallelism (exchange) operators. Our thread picks up a single partition id from the Distribute Streams exchange, and starts a merge join using ordered rows from partition 1 of table T1 and partition 1 of table T2. By definition, this is all happening on a single thread. As rows join, they are added to a (per-partition) count in the Stream Aggregate immediately above the Merge Join. Eventually, either T1 (partition 1) or T2 (partition 1) runs out of rows and the merge join stops. The per-partition count from the aggregate passes on through the Nested Loops join to another Stream Aggregate, which is maintaining a per-thread subtotal. Our same thread now picks up a new partition id from the exchange (say it gets id 9 this time). The count in the per-partition aggregate is reset to zero, and the processing of partition 9 of both tables proceeds just as it did for partition 1, and on the same thread. Each thread picks up a single partition id and processes all the data for that partition, completely independently from other threads working on other partitions. One thread might eventually process partitions (1, 9, 17, 25, 33, 41) while another is concurrently processing partitions (2, 10, 18, 26, 34) and so on for the other six threads at DOP 8. The point is that all 8 threads can execute independently and concurrently, continuing to process new partitions until the wider job (of which the thread has no knowledge!) is done. This divide-and-conquer technique can be much more efficient than simply splitting the entire workload across eight threads all at once. Related Reading Understanding and Using Parallelism in SQL Server Parallel Execution Plans Suck © 2013 Paul White – All Rights Reserved Twitter: @SQL_Kiwi

Read the article
Guidance: A Branching strategy for Scrum Teams

- by Martin Hinshelwood

Having a good branching strategy will save your bacon, or at least your code. Be careful when deviating from your branching strategy because if you do, you may be worse off than when you started! This is one possible branching strategy for Scrum teams and I will not be going in depth with Scrum but you can find out more about Scrum by reading the Scrum Guide and you can even assess your Scrum knowledge by having a go at the Scrum Open Assessment. You can also read SSW’s Rules to Better Scrum using TFS which have been developed during our own Scrum implementations. Acknowledgements Bill Heys – Bill offered some good feedback on this post and helped soften the language. Note: Bill is a VS ALM Ranger and co-wrote the Branching Guidance for TFS 2010 Willy-Peter Schaub – Willy-Peter is an ex Visual Studio ALM MVP turned blue badge and has been involved in most of the guidance including the Branching Guidance for TFS 2010 Chris Birmele – Chris wrote some of the early TFS Branching and Merging Guidance. Dr Paul Neumeyer, Ph.D Parallel Processes, ScrumMaster and SSW Solution Architect – Paul wanted to have feature branches coming from the release branch as well. We agreed that this is really a spin-off that needs own project, backlog, budget and Team. Scenario: A product is developed RTM 1.0 is released and gets great sales. Extra features are demanded but the new version will have double to price to pay to recover costs, work is approved by the guys with budget and a few sprints later RTM 2.0 is released. Sales a very low due to the pricing strategy. There are lots of clients on RTM 1.0 calling out for patches. As I keep getting Reverse Integration and Forward Integration mixed up and Bill keeps slapping my wrists I thought I should have a reminder: You still seemed to use reverse and/or forward integration in the wrong context. I would recommend reviewing your document at the end to ensure that it agrees with the common understanding of these terms merge (forward integration) from parent to child (same direction as the branch), and merge (reverse integration) from child to parent (the reverse direction of the branch). - one of my many slaps on the wrist from Bill Heys. As I mentioned previously we are using a single feature branching strategy in our current project. The single biggest mistake developers make is developing against the “Main” or “Trunk” line. This ultimately leads to messy code as things are added and never finished. Your only alternative is to NEVER check in unless your code is 100%, but this does not work in practice, even with a single developer. Your ADD will kick in and your half-finished code will be finished enough to pass the build and the tests. You do use builds don’t you? Sadly, this is a very common scenario and I have had people argue that branching merely adds complexity. Then again I have seen the other side of the universe ... branching structures from he... We should somehow convince everyone that there is a happy between no-branching and too-much-branching. - Willy-Peter Schaub, VS ALM Ranger, Microsoft A key benefit of branching for development is to isolate changes from the stable Main branch. Branching adds sanity more than it adds complexity. We do try to stress in our guidance that it is important to justify a branch, by doing a cost benefit analysis. The primary cost is the effort to do merges and resolve conflicts. A key benefit is that you have a stable code base in Main and accept changes into Main only after they pass quality gates, etc. - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft The second biggest mistake developers make is branching anything other than the WHOLE “Main” line. If you branch parts of your code and not others it gets out of sync and can make integration a nightmare. You should have your Source, Assets, Build scripts deployment scripts and dependencies inside the “Main” folder and branch the whole thing. Some departments within MSFT even go as far as to add the environments used to develop the product in there as well; although I would not recommend that unless you have a massive SQL cluster to house your source code. We tried the “add environment” back in South-Africa and while it was “phenomenal”, especially when having to switch between environments, the disk storage and processing requirements killed us. We opted for virtualization to skin this cat of keeping a ready-to-go environment handy. - Willy-Peter Schaub, VS ALM Ranger, Microsoft I think people often think that you should have separate branches for separate environments (e.g. Dev, Test, Integration Test, QA, etc.). I prefer to think of deploying to environments (such as from Main to QA) rather than branching for QA). - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft You can read about SSW’s Rules to better Source Control for some additional information on what Source Control to use and how to use it. There are also a number of branching Anti-Patterns that should be avoided at all costs: You know you are on the wrong track if you experience one or more of the following symptoms in your development environment: Merge Paranoia—avoiding merging at all cost, usually because of a fear of the consequences. Merge Mania—spending too much time merging software assets instead of developing them. Big Bang Merge—deferring branch merging to the end of the development effort and attempting to merge all branches simultaneously. Never-Ending Merge—continuous merging activity because there is always more to merge. Wrong-Way Merge—merging a software asset version with an earlier version. Branch Mania—creating many branches for no apparent reason. Cascading Branches—branching but never merging back to the main line. Mysterious Branches—branching for no apparent reason. Temporary Branches—branching for changing reasons, so the branch becomes a permanent temporary workspace. Volatile Branches—branching with unstable software assets shared by other branches or merged into another branch. Note Branches are volatile most of the time while they exist as independent branches. That is the point of having them. The difference is that you should not share or merge branches while they are in an unstable state. Development Freeze—stopping all development activities while branching, merging, and building new base lines. Berlin Wall—using branches to divide the development team members, instead of dividing the work they are performing. -Branching and Merging Primer by Chris Birmele - Developer Tools Technical Specialist at Microsoft Pty Ltd in Australia In fact, this can result in a merge exercise no-one wants to be involved in, merging hundreds of thousands of change sets and trying to get a consolidated build. Again, we need to find a happy medium. - Willy-Peter Schaub on Merge Paranoia Merge conflicts are generally the result of making changes to the same file in both the target and source branch. If you create merge conflicts, you will eventually need to resolve them. Often the resolution is manual. Merging more frequently allows you to resolve these conflicts close to when they happen, making the resolution clearer. Waiting weeks or months to resolve them, the Big Bang approach, means you are more likely to resolve conflicts incorrectly. - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft Figure: Main line, this is where your stable code lives and where any build has known entities, always passes and has a happy test that passes as well? Many development projects consist of, a single “Main” line of source and artifacts. This is good; at least there is source control . There are however a couple of issues that need to be considered. What happens if: you and your team are working on a new set of features and the customer wants a change to his current version? you are working on two features and the customer decides to abandon one of them? you have two teams working on different feature sets and their changes start interfering with each other? I just use labels instead of branches? That's a lot of “what if’s”, but there is a simple way of preventing this. Branching… In TFS, labels are not immutable. This does not mean they are not useful. But labels do not provide a very good development isolation mechanism. Branching allows separate code sets to evolve separately (e.g. Current with hotfixes, and vNext with new development). I don’t see how labels work here. - Bill Heys, VS ALM Ranger & TFS Branching Lead, Microsoft Figure: Creating a single feature branch means you can isolate the development work on that branch. Its standard practice for large projects with lots of developers to use Feature branching and you can check the Branching Guidance for the latest recommendations from the Visual Studio ALM Rangers for other methods. In the diagram above you can see my recommendation for branching when using Scrum development with TFS 2010. It consists of a single Sprint branch to contain all the changes for the current sprint. The main branch has the permissions changes so contributors to the project can only Branch and Merge with “Main”. This will prevent accidental check-ins or checkouts of the “Main” line that would contaminate the code. The developers continue to develop on sprint one until the completion of the sprint. Note: In the real world, starting a new Greenfield project, this process starts at Sprint 2 as at the start of Sprint 1 you would have artifacts in version control and no need for isolation. Figure: Once the sprint is complete the Sprint 1 code can then be merged back into the Main line. There are always good practices to follow, and one is to always do a Forward Integration from Main into Sprint 1 before you do a Reverse Integration from Sprint 1 back into Main. In this case it may seem superfluous, but this builds good muscle memory into your developer’s work ethic and means that no bad habits are learned that would interfere with additional Scrum Teams being added to the Product. The process of completing your sprint development: The Team completes their work according to their definition of done. Merge from “Main” into “Sprint1” (Forward Integration) Stabilize your code with any changes coming from other Scrum Teams working on the same product. If you have one Scrum Team this should be quick, but there may have been bug fixes in the Release branches. (we will talk about release branches later) Merge from “Sprint1” into “Main” to commit your changes. (Reverse Integration) Check-in Delete the Sprint1 branch Note: The Sprint 1 branch is no longer required as its useful life has been concluded. Check-in Done But you are not yet done with the Sprint. The goal in Scrum is to have a “potentially shippable product” at the end of every Sprint, and we do not have that yet, we only have finished code. Figure: With Sprint 1 merged you can create a Release branch and run your final packaging and testing In 99% of all projects I have been involved in or watched, a “shippable product” only happens towards the end of the overall lifecycle, especially when sprints are short. The in-between releases are great demonstration releases, but not shippable. Perhaps it comes from my 80’s brain washing that we only ship when we reach the agreed quality and business feature bar. - Willy-Peter Schaub, VS ALM Ranger, Microsoft Although you should have been testing and packaging your code all the way through your Sprint 1 development, preferably using an automated process, you still need to test and package with stable unchanging code. This is where you do what at SSW we call a “Test Please”. This is first an internal test of the product to make sure it meets the needs of the customer and you generally use a resource external to your Team. Then a “Test Please” is conducted with the Product Owner to make sure he is happy with the output. You can read about how to conduct a Test Please on our Rules to Successful Projects: Do you conduct an internal "test please" prior to releasing a version to a client? Figure: If you find a deviation from the expected result you fix it on the Release branch. If during your final testing or your “Test Please” you find there are issues or bugs then you should fix them on the release branch. If you can’t fix them within the time box of your Sprint, then you will need to create a Bug and put it onto the backlog for prioritization by the Product owner. Make sure you leave plenty of time between your merge from the development branch to find and fix any problems that are uncovered. This process is commonly called Stabilization and should always be conducted once you have completed all of your User Stories and integrated all of your branches. Even once you have stabilized and released, you should not delete the release branch as you would with the Sprint branch. It has a usefulness for servicing that may extend well beyond the limited life you expect of it. Note: Don't get forced by the business into adding features into a Release branch instead that indicates the unspoken requirement is that they are asking for a product spin-off. In this case you can create a new Team Project and branch from the required Release branch to create a new Main branch for that product. And you create a whole new backlog to work from. Figure: When the Team decides it is happy with the product you can create a RTM branch. Once you have fixed all the bugs you can, and added any you can’t to the Product Backlog, and you Team is happy with the result you can create a Release. This would consist of doing the final Build and Packaging it up ready for your Sprint Review meeting. You would then create a read-only branch that represents the code you “shipped”. This is really an Audit trail branch that is optional, but is good practice. You could use a Label, but Labels are not Auditable and if a dispute was raised by the customer you can produce a verifiable version of the source code for an independent party to check. Rare I know, but you do not want to be at the wrong end of a legal battle. Like the Release branch the RTM branch should never be deleted, or only deleted according to your companies legal policy, which in the UK is usually 7 years. Figure: If you have made any changes in the Release you will need to merge back up to Main in order to finalise the changes. Nothing is really ever done until it is in Main. The same rules apply when merging any fixes in the Release branch back into Main and you should do a reverse merge before a forward merge, again for the muscle memory more than necessity at this stage. Your Sprint is now nearly complete, and you can have a Sprint Review meeting knowing that you have made every effort and taken every precaution to protect your customer’s investment. Note: In order to really achieve protection for both you and your client you would add Automated Builds, Automated Tests, Automated Acceptance tests, Acceptance test tracking, Unit Tests, Load tests, Web test and all the other good engineering practices that help produce reliable software. Figure: After the Sprint Planning meeting the process begins again. Where the Sprint Review and Retrospective meetings mark the end of the Sprint, the Sprint Planning meeting marks the beginning. After you have completed your Sprint Planning and you know what you are trying to achieve in Sprint 2 you can create your new Branch to develop in. How do we handle a bug(s) in production that can’t wait? Although in Scrum the only work done should be on the backlog there should be a little buffer added to the Sprint Planning for contingencies. One of these contingencies is a bug in the current release that can’t wait for the Sprint to finish. But how do you handle that? Willy-Peter Schaub asked an excellent question on the release activities: In reality Sprint 2 starts when sprint 1 ends + weekend. Should we not cater for a possible parallelism between Sprint 2 and the release activities of sprint 1? It would introduce FI’s from main to sprint 2, I guess. Your “Figure: Merging print 2 back into Main.” covers, what I tend to believe to be reality in most cases. - Willy-Peter Schaub, VS ALM Ranger, Microsoft I agree, and if you have a single Scrum team then your resources are limited. The Scrum Team is responsible for packaging and release, so at least one run at stabilization, package and release should be included in the Sprint time box. If more are needed on the current production release during the Sprint 2 time box then resource needs to be pulled from Sprint 2. The Product Owner and the Team have four choices (in order of disruption/cost): Backlog: Add the bug to the backlog and fix it in the next Sprint Buffer Time: Use any buffer time included in the current Sprint to fix the bug quickly Make time: Remove a Story from the current Sprint that is of equal value to the time lost fixing the bug(s) and releasing. Note: The Team must agree that it can still meet the Sprint Goal. Cancel Sprint: Cancel the sprint and concentrate all resource on fixing the bug(s) Note: This can be a very costly if the current sprint has already had a lot of work completed as it will be lost. The choice will depend on the complexity and severity of the bug(s) and both the Product Owner and the Team need to agree. In this case we will go with option #2 or #3 as they are uncomplicated but severe bugs. Figure: Real world issue where a bug needs fixed in the current release. If the bug(s) is urgent enough then then your only option is to fix it in place. You can edit the release branch to find and fix the bug, hopefully creating a test so it can’t happen again. Follow the prior process and conduct an internal and customer “Test Please” before releasing. You can read about how to conduct a Test Please on our Rules to Successful Projects: Do you conduct an internal "test please" prior to releasing a version to a client? Figure: After you have fixed the bug you need to ship again. You then need to again create an RTM branch to hold the version of the code you released in escrow. Figure: Main is now out of sync with your Release. We now need to get these new changes back up into the Main branch. Do a reverse and then forward merge again to get the new code into Main. But what about the branch, are developers not working on Sprint 2? Does Sprint 2 now have changes that are not in Main and Main now have changes that are not in Sprint 2? Well, yes… and this is part of the hit you take doing branching. But would this scenario even have been possible without branching? Figure: Getting the changes in Main into Sprint 2 is very important. The Team now needs to do a Forward Integration merge into their Sprint and resolve any conflicts that occur. Maybe the bug has already been fixed in Sprint 2, maybe the bug no longer exists! This needs to be identified and resolved by the developers before they continue to get further out of Sync with Main. Note: Avoid the “Big bang merge” at all costs. Figure: Merging Sprint 2 back into Main, the Forward Integration, and R0 terminates. Sprint 2 now merges (Reverse Integration) back into Main following the procedures we have already established. Figure: The logical conclusion. This then allows the creation of the next release. By now you should be getting the big picture and hopefully you learned something useful from this post. I know I have enjoyed writing it as I find these exploratory posts coupled with real world experience really help harden my understanding. Branching is a tool; it is not a silver bullet. Don’t over use it, and avoid “Anti-Patterns” where possible. Although the diagram above looks complicated I hope showing you how it is formed simplifies it as much as possible. Technorati Tags: Branching,Scrum,VS ALM,TFS 2010,VS2010

Read the article
Toorcon14

- by danx

Toorcon 2012 Information Security Conference San Diego, CA, http://www.toorcon.org/ Dan Anderson, October 2012 It's almost Halloween, and we all know what that means—yes, of course, it's time for another Toorcon Conference! Toorcon is an annual conference for people interested in computer security. This includes the whole range of hackers, computer hobbyists, professionals, security consultants, press, law enforcement, prosecutors, FBI, etc. We're at Toorcon 14—see earlier blogs for some of the previous Toorcon's I've attended (back to 2003). This year's "con" was held at the Westin on Broadway in downtown San Diego, California. The following are not necessarily my views—I'm just the messenger—although I could have misquoted or misparaphrased the speakers. Also, I only reviewed some of the talks, below, which I attended and interested me. MalAndroid—the Crux of Android Infections, Aditya K. Sood Programming Weird Machines with ELF Metadata, Rebecca "bx" Shapiro Privacy at the Handset: New FCC Rules?, Valkyrie Hacking Measured Boot and UEFI, Dan Griffin You Can't Buy Security: Building the Open Source InfoSec Program, Boris Sverdlik What Journalists Want: The Investigative Reporters' Perspective on Hacking, Dave Maas & Jason Leopold Accessibility and Security, Anna Shubina Stop Patching, for Stronger PCI Compliance, Adam Brand McAfee Secure & Trustmarks — a Hacker's Best Friend, Jay James & Shane MacDougall MalAndroid—the Crux of Android Infections Aditya K. Sood, IOActive, Michigan State PhD candidate Aditya talked about Android smartphone malware. There's a lot of old Android software out there—over 50% Gingerbread (2.3.x)—and most have unpatched vulnerabilities. Of 9 Android vulnerabilities, 8 have known exploits (such as the old Gingerbread Global Object Table exploit). Android protection includes sandboxing, security scanner, app permissions, and screened Android app market. The Android permission checker has fine-grain resource control, policy enforcement. Android static analysis also includes a static analysis app checker (bouncer), and a vulnerablity checker. What security problems does Android have? User-centric security, which depends on the user to grant permission and make smart decisions. But users don't care or think about malware (the're not aware, not paranoid). All they want is functionality, extensibility, mobility Android had no "proper" encryption before Android 3.0 No built-in protection against social engineering and web tricks Alternative Android app markets are unsafe. Simply visiting some markets can infect Android Aditya classified Android Malware types as: Type A—Apps. These interact with the Android app framework. For example, a fake Netflix app. Or Android Gold Dream (game), which uploads user files stealthy manner to a remote location. Type K—Kernel. Exploits underlying Linux libraries or kernel Type H—Hybrid. These use multiple layers (app framework, libraries, kernel). These are most commonly used by Android botnets, which are popular with Chinese botnet authors What are the threats from Android malware? These incude leak info (contacts), banking fraud, corporate network attacks, malware advertising, malware "Hackivism" (the promotion of social causes. For example, promiting specific leaders of the Tunisian or Iranian revolutions. Android malware is frequently "masquerated". That is, repackaged inside a legit app with malware. To avoid detection, the hidden malware is not unwrapped until runtime. The malware payload can be hidden in, for example, PNG files. Less common are Android bootkits—there's not many around. What they do is hijack the Android init framework—alteering system programs and daemons, then deletes itself. For example, the DKF Bootkit (China). Android App Problems: no code signing! all self-signed native code execution permission sandbox — all or none alternate market places no robust Android malware detection at network level delayed patch process Programming Weird Machines with ELF Metadata Rebecca "bx" Shapiro, Dartmouth College, NH https://github.com/bx/elf-bf-tools @bxsays on twitter Definitions. "ELF" is an executable file format used in linking and loading executables (on UNIX/Linux-class machines). "Weird machine" uses undocumented computation sources (I think of them as unintended virtual machines). Some examples of "weird machines" are those that: return to weird location, does SQL injection, corrupts the heap. Bx then talked about using ELF metadata as (an uintended) "weird machine". Some ELF background: A compiler takes source code and generates a ELF object file (hello.o). A static linker makes an ELF executable from the object file. A runtime linker and loader takes ELF executable and loads and relocates it in memory. The ELF file has symbols to relocate functions and variables. ELF has two relocation tables—one at link time and another one at loading time: .rela.dyn (link time) and .dynsym (dynamic table). GOT: Global Offset Table of addresses for dynamically-linked functions. PLT: Procedure Linkage Tables—works with GOT. The memory layout of a process (not the ELF file) is, in order: program (+ heap), dynamic libraries, libc, ld.so, stack (which includes the dynamic table loaded into memory) For ELF, the "weird machine" is found and exploited in the loader. ELF can be crafted for executing viruses, by tricking runtime into executing interpreted "code" in the ELF symbol table. One can inject parasitic "code" without modifying the actual ELF code portions. Think of the ELF symbol table as an "assembly language" interpreter. It has these elements: instructions: Add, move, jump if not 0 (jnz) Think of symbol table entries as "registers" symbol table value is "contents" immediate values are constants direct values are addresses (e.g., 0xdeadbeef) move instruction: is a relocation table entry add instruction: relocation table "addend" entry jnz instruction: takes multiple relocation table entries The ELF weird machine exploits the loader by relocating relocation table entries. The loader will go on forever until told to stop. It stores state on stack at "end" and uses IFUNC table entries (containing function pointer address). The ELF weird machine, called "Brainfu*k" (BF) has: 8 instructions: pointer inc, dec, inc indirect, dec indirect, jump forward, jump backward, print. Three registers - 3 registers Bx showed example BF source code that implemented a Turing machine printing "hello, world". More interesting was the next demo, where bx modified ping. Ping runs suid as root, but quickly drops privilege. BF modified the loader to disable the library function call dropping privilege, so it remained as root. Then BF modified the ping -t argument to execute the -t filename as root. It's best to show what this modified ping does with an example: $ whoami bx $ ping localhost -t backdoor.sh # executes backdoor $ whoami root $ The modified code increased from 285948 bytes to 290209 bytes. A BF tool compiles "executable" by modifying the symbol table in an existing ELF executable. The tool modifies .dynsym and .rela.dyn table, but not code or data. Privacy at the Handset: New FCC Rules? "Valkyrie" (Christie Dudley, Santa Clara Law JD candidate) Valkyrie talked about mobile handset privacy. Some background: Senator Franken (also a comedian) became alarmed about CarrierIQ, where the carriers track their customers. Franken asked the FCC to find out what obligations carriers think they have to protect privacy. The carriers' response was that they are doing just fine with self-regulation—no worries! Carriers need to collect data, such as missed calls, to maintain network quality. But carriers also sell data for marketing. Verizon sells customer data and enables this with a narrow privacy policy (only 1 month to opt out, with difficulties). The data sold is not individually identifiable and is aggregated. But Verizon recommends, as an aggregation workaround to "recollate" data to other databases to identify customers indirectly. The FCC has regulated telephone privacy since 1934 and mobile network privacy since 2007. Also, the carriers say mobile phone privacy is a FTC responsibility (not FCC). FTC is trying to improve mobile app privacy, but FTC has no authority over carrier / customer relationships. As a side note, Apple iPhones are unique as carriers have extra control over iPhones they don't have with other smartphones. As a result iPhones may be more regulated. Who are the consumer advocates? Everyone knows EFF, but EPIC (Electrnic Privacy Info Center), although more obsecure, is more relevant. What to do? Carriers must be accountable. Opt-in and opt-out at any time. Carriers need incentive to grant users control for those who want it, by holding them liable and responsible for breeches on their clock. Location information should be added current CPNI privacy protection, and require "Pen/trap" judicial order to obtain (and would still be a lower standard than 4th Amendment). Politics are on a pro-privacy swing now, with many senators and the Whitehouse. There will probably be new regulation soon, and enforcement will be a problem, but consumers will still have some benefit. Hacking Measured Boot and UEFI Dan Griffin, JWSecure, Inc., Seattle, @JWSdan Dan talked about hacking measured UEFI boot. First some terms: UEFI is a boot technology that is replacing BIOS (has whitelisting and blacklisting). UEFI protects devices against rootkits. TPM - hardware security device to store hashs and hardware-protected keys "secure boot" can control at firmware level what boot images can boot "measured boot" OS feature that tracks hashes (from BIOS, boot loader, krnel, early drivers). "remote attestation" allows remote validation and control based on policy on a remote attestation server. Microsoft pushing TPM (Windows 8 required), but Google is not. Intel TianoCore is the only open source for UEFI. Dan has Measured Boot Tool at http://mbt.codeplex.com/ with a demo where you can also view TPM data. TPM support already on enterprise-class machines. UEFI Weaknesses. UEFI toolkits are evolving rapidly, but UEFI has weaknesses: assume user is an ally trust TPM implicitly, and attached to computer hibernate file is unprotected (disk encryption protects against this) protection migrating from hardware to firmware delays in patching and whitelist updates will UEFI really be adopted by the mainstream (smartphone hardware support, bank support, apathetic consumer support) You Can't Buy Security: Building the Open Source InfoSec Program Boris Sverdlik, ISDPodcast.com co-host Boris talked about problems typical with current security audits. "IT Security" is an oxymoron—IT exists to enable buiness, uptime, utilization, reporting, but don't care about security—IT has conflict of interest. There's no Magic Bullet ("blinky box"), no one-size-fits-all solution (e.g., Intrusion Detection Systems (IDSs)). Regulations don't make you secure. The cloud is not secure (because of shared data and admin access). Defense and pen testing is not sexy. Auditors are not solution (security not a checklist)—what's needed is experience and adaptability—need soft skills. Step 1: First thing is to Google and learn the company end-to-end before you start. Get to know the management team (not IT team), meet as many people as you can. Don't use arbitrary values such as CISSP scores. Quantitive risk assessment is a myth (e.g. AV*EF-SLE). Learn different Business Units, legal/regulatory obligations, learn the business and where the money is made, verify company is protected from script kiddies (easy), learn sensitive information (IP, internal use only), and start with low-hanging fruit (customer service reps and social engineering). Step 2: Policies. Keep policies short and relevant. Generic SANS "security" boilerplate policies don't make sense and are not followed. Focus on acceptable use, data usage, communications, physical security. Step 3: Implementation: keep it simple stupid. Open source, although useful, is not free (implementation cost). Access controls with authentication & authorization for local and remote access. MS Windows has it, otherwise use OpenLDAP, OpenIAM, etc. Application security Everyone tries to reinvent the wheel—use existing static analysis tools. Review high-risk apps and major revisions. Don't run different risk level apps on same system. Assume host/client compromised and use app-level security control. Network security VLAN != segregated because there's too many workarounds. Use explicit firwall rules, active and passive network monitoring (snort is free), disallow end user access to production environment, have a proxy instead of direct Internet access. Also, SSL certificates are not good two-factor auth and SSL does not mean "safe." Operational Controls Have change, patch, asset, & vulnerability management (OSSI is free). For change management, always review code before pushing to production For logging, have centralized security logging for business-critical systems, separate security logging from administrative/IT logging, and lock down log (as it has everything). Monitor with OSSIM (open source). Use intrusion detection, but not just to fulfill a checkbox: build rules from a whitelist perspective (snort). OSSEC has 95% of what you need. Vulnerability management is a QA function when done right: OpenVas and Seccubus are free. Security awareness The reality is users will always click everything. Build real awareness, not compliance driven checkbox, and have it integrated into the culture. Pen test by crowd sourcing—test with logging COSSP http://www.cossp.org/ - Comprehensive Open Source Security Project What Journalists Want: The Investigative Reporters' Perspective on Hacking Dave Maas, San Diego CityBeat Jason Leopold, Truthout.org The difference between hackers and investigative journalists: For hackers, the motivation varies, but method is same, technological specialties. For investigative journalists, it's about one thing—The Story, and they need broad info-gathering skills. J-School in 60 Seconds: Generic formula: Person or issue of pubic interest, new info, or angle. Generic criteria: proximity, prominence, timeliness, human interest, oddity, or consequence. Media awareness of hackers and trends: journalists becoming extremely aware of hackers with congressional debates (privacy, data breaches), demand for data-mining Journalists, use of coding and web development for Journalists, and Journalists busted for hacking (Murdock). Info gathering by investigative journalists include Public records laws. Federal Freedom of Information Act (FOIA) is good, but slow. California Public Records Act is a lot stronger. FOIA takes forever because of foot-dragging—it helps to be specific. Often need to sue (especially FBI). CPRA is faster, and requests can be vague. Dumps and leaks (a la Wikileaks) Journalists want: leads, protecting ourselves, our sources, and adapting tools for news gathering (Google hacking). Anonomity is important to whistleblowers. They want no digital footprint left behind (e.g., email, web log). They don't trust encryption, want to feel safe and secure. Whistleblower laws are very weak—there's no upside for whistleblowers—they have to be very passionate to do it. Accessibility and Security or: How I Learned to Stop Worrying and Love the Halting Problem Anna Shubina, Dartmouth College Anna talked about how accessibility and security are related. Accessibility of digital content (not real world accessibility). mostly refers to blind users and screenreaders, for our purpose. Accessibility is about parsing documents, as are many security issues. "Rich" executable content causes accessibility to fail, and often causes security to fail. For example MS Word has executable format—it's not a document exchange format—more dangerous than PDF or HTML. Accessibility is often the first and maybe only sanity check with parsing. They have no choice because someone may want to read what you write. Google, for example, is very particular about web browser you use and are bad at supporting other browsers. Uses JavaScript instead of links, often requiring mouseover to display content. PDF is a security nightmare. Executible format, embedded flash, JavaScript, etc. 15 million lines of code. Google Chrome doesn't handle PDF correctly, causing several security bugs. PDF has an accessibility checker and PDF tagging, to help with accessibility. But no PDF checker checks for incorrect tags, untagged content, or validates lists or tables. None check executable content at all. The "Halting Problem" is: can one decide whether a program will ever stop? The answer, in general, is no (Rice's theorem). The same holds true for accessibility checkers. Language-theoretic Security says complicated data formats are hard to parse and cannot be solved due to the Halting Problem. W3C Web Accessibility Guidelines: "Perceivable, Operable, Understandable, Robust" Not much help though, except for "Robust", but here's some gems: * all information should be parsable (paraphrasing) * if not parsable, cannot be converted to alternate formats * maximize compatibility in new document formats Executible webpages are bad for security and accessibility. They say it's for a better web experience. But is it necessary to stuff web pages with JavaScript for a better experience? A good example is The Drudge Report—it has hand-written HTML with no JavaScript, yet drives a lot of web traffic due to good content. A bad example is Google News—hidden scrollbars, guessing user input. Solutions: Accessibility and security problems come from same source Expose "better user experience" myth Keep your corner of Internet parsable Remember "Halting Problem"—recognize false solutions (checking and verifying tools) Stop Patching, for Stronger PCI Compliance Adam Brand, protiviti @adamrbrand, http://www.picfun.com/ Adam talked about PCI compliance for retail sales. Take an example: for PCI compliance, 50% of Brian's time (a IT guy), 960 hours/year was spent patching POSs in 850 restaurants. Often applying some patches make no sense (like fixing a browser vulnerability on a server). "Scanner worship" is overuse of vulnerability scanners—it gives a warm and fuzzy and it's simple (red or green results—fix reds). Scanners give a false sense of security. In reality, breeches from missing patches are uncommon—more common problems are: default passwords, cleartext authentication, misconfiguration (firewall ports open). Patching Myths: Myth 1: install within 30 days of patch release (but PCI Â§6.1 allows a "risk-based approach" instead). Myth 2: vendor decides what's critical (also PCI Â§6.1). But Â§6.2 requires user ranking of vulnerabilities instead. Myth 3: scan and rescan until it passes. But PCI Â§11.2.1b says this applies only to high-risk vulnerabilities. Adam says good recommendations come from NIST 800-40. Instead use sane patching and focus on what's really important. From NIST 800-40: Proactive: Use a proactive vulnerability management process: use change control, configuration management, monitor file integrity. Monitor: start with NVD and other vulnerability alerts, not scanner results. Evaluate: public-facing system? workstation? internal server? (risk rank) Decide:on action and timeline Test: pre-test patches (stability, functionality, rollback) for change control Install: notify, change control, tickets McAfee Secure & Trustmarks — a Hacker's Best Friend Jay James, Shane MacDougall, Tactical Intelligence Inc., Canada "McAfee Secure Trustmark" is a website seal marketed by McAfee. A website gets this badge if they pass their remote scanning. The problem is a removal of trustmarks act as flags that you're vulnerable. Easy to view status change by viewing McAfee list on website or on Google. "Secure TrustGuard" is similar to McAfee. Jay and Shane wrote Perl scripts to gather sites from McAfee and search engines. If their certification image changes to a 1x1 pixel image, then they are longer certified. Their scripts take deltas of scans to see what changed daily. The bottom line is change in TrustGuard status is a flag for hackers to attack your site. Entire idea of seals is silly—you're raising a flag saying if you're vulnerable.

Read the article
Node.js Adventure - Host Node.js on Windows Azure Worker Role

- by Shaun

In my previous post I demonstrated about how to develop and deploy a Node.js application on Windows Azure Web Site (a.k.a. WAWS). WAWS is a new feature in Windows Azure platform. Since it’s low-cost, and it provides IIS and IISNode components so that we can host our Node.js application though Git, FTP and WebMatrix without any configuration and component installation. But sometimes we need to use the Windows Azure Cloud Service (a.k.a. WACS) and host our Node.js on worker role. Below are some benefits of using worker role. - WAWS leverages IIS and IISNode to host Node.js application, which runs in x86 WOW mode. It reduces the performance comparing with x64 in some cases. - WACS worker role does not need IIS, hence there’s no restriction of IIS, such as 8000 concurrent requests limitation. - WACS provides more flexibility and controls to the developers. For example, we can RDP to the virtual machines of our worker role instances. - WACS provides the service configuration features which can be changed when the role is running. - WACS provides more scaling capability than WAWS. In WAWS we can have at most 3 reserved instances per web site while in WACS we can have up to 20 instances in a subscription. - Since when using WACS worker role we starts the node by ourselves in a process, we can control the input, output and error stream. We can also control the version of Node.js. Run Node.js in Worker Role Node.js can be started by just having its execution file. This means in Windows Azure, we can have a worker role with the “node.exe” and the Node.js source files, then start it in Run method of the worker role entry class. Let’s create a new windows azure project in Visual Studio and add a new worker role. Since we need our worker role execute the “node.exe” with our application code we need to add the “node.exe” into our project. Right click on the worker role project and add an existing item. By default the Node.js will be installed in the “Program Files\nodejs” folder so we can navigate there and add the “node.exe”. Then we need to create the entry code of Node.js. In WAWS the entry file must be named “server.js”, which is because it’s hosted by IIS and IISNode and IISNode only accept “server.js”. But here as we control everything we can choose any files as the entry code. For example, I created a new JavaScript file named “index.js” in project root. Since we created a C# Windows Azure project we cannot create a JavaScript file from the context menu “Add new item”. We have to create a text file, and then rename it to JavaScript extension. After we added these two files we should set their “Copy to Output Directory” property to “Copy Always”, or “Copy if Newer”. Otherwise they will not be involved in the package when deployed. Let’s paste a very simple Node.js code in the “index.js” as below. As you can see I created a web server listening at port 12345. 1: var http = require("http"); 2: var port = 12345; 3: 4: http.createServer(function (req, res) { 5: res.writeHead(200, { "Content-Type": "text/plain" }); 6: res.end("Hello World\n"); 7: }).listen(port); 8: 9: console.log("Server running at port %d", port); Then we need to start “node.exe” with this file when our worker role was started. This can be done in its Run method. I found the Node.js and entry JavaScript file name, and then create a new process to run it. Our worker role will wait for the process to be exited. If everything is OK once our web server was opened the process will be there listening for incoming requests, and should not be terminated. The code in worker role would be like this. 1: public override void Run() 2: { 3: // This is a sample worker implementation. Replace with your logic. 4: Trace.WriteLine("NodejsHost entry point called", "Information"); 5: 6: // retrieve the node.exe and entry node.js source code file name. 7: var node = Environment.ExpandEnvironmentVariables(@"%RoleRoot%\approot\node.exe"); 8: var js = "index.js"; 9: 10: // prepare the process starting of node.exe 11: var info = new ProcessStartInfo(node, js) 12: { 13: CreateNoWindow = false, 14: ErrorDialog = true, 15: WindowStyle = ProcessWindowStyle.Normal, 16: UseShellExecute = false, 17: WorkingDirectory = Environment.ExpandEnvironmentVariables(@"%RoleRoot%\approot") 18: }; 19: Trace.WriteLine(string.Format("{0} {1}", node, js), "Information"); 20: 21: // start the node.exe with entry code and wait for exit 22: var process = Process.Start(info); 23: process.WaitForExit(); 24: } Then we can run it locally. In the computer emulator UI the worker role started and it executed the Node.js, then Node.js windows appeared. Open the browser to verify the website hosted by our worker role. Next let’s deploy it to azure. But we need some additional steps. First, we need to create an input endpoint. By default there’s no endpoint defined in a worker role. So we will open the role property window in Visual Studio, create a new input TCP endpoint to the port we want our website to use. In this case I will use 80. Even though we created a web server we should add a TCP endpoint of the worker role, since Node.js always listen on TCP instead of HTTP. And then changed the “index.js”, let our web server listen on 80. 1: var http = require("http"); 2: var port = 80; 3: 4: http.createServer(function (req, res) { 5: res.writeHead(200, { "Content-Type": "text/plain" }); 6: res.end("Hello World\n"); 7: }).listen(port); 8: 9: console.log("Server running at port %d", port); Then publish it to Windows Azure. And then in browser we can see our Node.js website was running on WACS worker role. We may encounter an error if we tried to run our Node.js website on 80 port at local emulator. This is because the compute emulator registered 80 and map the 80 endpoint to 81. But our Node.js cannot detect this operation. So when it tried to listen on 80 it will failed since 80 have been used. Use NPM Modules When we are using WAWS to host Node.js, we can simply install modules we need, and then just publish or upload all files to WAWS. But if we are using WACS worker role, we have to do some extra steps to make the modules work. Assuming that we plan to use “express” in our application. Firstly of all we should download and install this module through NPM command. But after the install finished, they are just in the disk but not included in the worker role project. If we deploy the worker role right now the module will not be packaged and uploaded to azure. Hence we need to add them to the project. On solution explorer window click the “Show all files” button, select the “node_modules” folder and in the context menu select “Include In Project”. But that not enough. We also need to make all files in this module to “Copy always” or “Copy if newer”, so that they can be uploaded to azure with the “node.exe” and “index.js”. This is painful step since there might be many files in a module. So I created a small tool which can update a C# project file, make its all items as “Copy always”. The code is very simple. 1: static void Main(string[] args) 2: { 3: if (args.Length < 1) 4: { 5: Console.WriteLine("Usage: copyallalways [project file]"); 6: return; 7: } 8: 9: var proj = args[0]; 10: File.Copy(proj, string.Format("{0}.bak", proj)); 11: 12: var xml = new XmlDocument(); 13: xml.Load(proj); 14: var nsManager = new XmlNamespaceManager(xml.NameTable); 15: nsManager.AddNamespace("pf", "http://schemas.microsoft.com/developer/msbuild/2003"); 16: 17: // add the output setting to copy always 18: var contentNodes = xml.SelectNodes("//pf:Project/pf:ItemGroup/pf:Content", nsManager); 19: UpdateNodes(contentNodes, xml, nsManager); 20: var noneNodes = xml.SelectNodes("//pf:Project/pf:ItemGroup/pf:None", nsManager); 21: UpdateNodes(noneNodes, xml, nsManager); 22: xml.Save(proj); 23: 24: // remove the namespace attributes 25: var content = xml.InnerXml.Replace("<CopyToOutputDirectory xmlns=\"\">", "<CopyToOutputDirectory>"); 26: xml.LoadXml(content); 27: xml.Save(proj); 28: } 29: 30: static void UpdateNodes(XmlNodeList nodes, XmlDocument xml, XmlNamespaceManager nsManager) 31: { 32: foreach (XmlNode node in nodes) 33: { 34: var copyToOutputDirectoryNode = node.SelectSingleNode("pf:CopyToOutputDirectory", nsManager); 35: if (copyToOutputDirectoryNode == null) 36: { 37: var n = xml.CreateNode(XmlNodeType.Element, "CopyToOutputDirectory", null); 38: n.InnerText = "Always"; 39: node.AppendChild(n); 40: } 41: else 42: { 43: if (string.Compare(copyToOutputDirectoryNode.InnerText, "Always", true) != 0) 44: { 45: copyToOutputDirectoryNode.InnerText = "Always"; 46: } 47: } 48: } 49: } Please be careful when use this tool. I created only for demo so do not use it directly in a production environment. Unload the worker role project, execute this tool with the worker role project file name as the command line argument, it will set all items as “Copy always”. Then reload this worker role project. Now let’s change the “index.js” to use express. 1: var express = require("express"); 2: var app = express(); 3: 4: var port = 80; 5: 6: app.configure(function () { 7: }); 8: 9: app.get("/", function (req, res) { 10: res.send("Hello Node.js!"); 11: }); 12: 13: app.get("/User/:id", function (req, res) { 14: var id = req.params.id; 15: res.json({ 16: "id": id, 17: "name": "user " + id, 18: "company": "IGT" 19: }); 20: }); 21: 22: app.listen(port); Finally let’s publish it and have a look in browser. Use Windows Azure SQL Database We can use Windows Azure SQL Database (a.k.a. WACD) from Node.js as well on worker role hosting. Since we can control the version of Node.js, here we can use x64 version of “node-sqlserver” now. This is better than if we host Node.js on WAWS since it only support x86. Just install the “node-sqlserver” module from NPM, copy the “sqlserver.node” from “Build\Release” folder to “Lib” folder. Include them in worker role project and run my tool to make them to “Copy always”. Finally update the “index.js” to use WASD. 1: var express = require("express"); 2: var sql = require("node-sqlserver"); 3: 4: var connectionString = "Driver={SQL Server Native Client 10.0};Server=tcp:{SERVER NAME}.database.windows.net,1433;Database={DATABASE NAME};Uid={LOGIN}@{SERVER NAME};Pwd={PASSWORD};Encrypt=yes;Connection Timeout=30;"; 5: var port = 80; 6: 7: var app = express(); 8: 9: app.configure(function () { 10: app.use(express.bodyParser()); 11: }); 12: 13: app.get("/", function (req, res) { 14: sql.open(connectionString, function (err, conn) { 15: if (err) { 16: console.log(err); 17: res.send(500, "Cannot open connection."); 18: } 19: else { 20: conn.queryRaw("SELECT * FROM [Resource]", function (err, results) { 21: if (err) { 22: console.log(err); 23: res.send(500, "Cannot retrieve records."); 24: } 25: else { 26: res.json(results); 27: } 28: }); 29: } 30: }); 31: }); 32: 33: app.get("/text/:key/:culture", function (req, res) { 34: sql.open(connectionString, function (err, conn) { 35: if (err) { 36: console.log(err); 37: res.send(500, "Cannot open connection."); 38: } 39: else { 40: var key = req.params.key; 41: var culture = req.params.culture; 42: var command = "SELECT * FROM [Resource] WHERE [Key] = '" + key + "' AND [Culture] = '" + culture + "'"; 43: conn.queryRaw(command, function (err, results) { 44: if (err) { 45: console.log(err); 46: res.send(500, "Cannot retrieve records."); 47: } 48: else { 49: res.json(results); 50: } 51: }); 52: } 53: }); 54: }); 55: 56: app.get("/sproc/:key/:culture", function (req, res) { 57: sql.open(connectionString, function (err, conn) { 58: if (err) { 59: console.log(err); 60: res.send(500, "Cannot open connection."); 61: } 62: else { 63: var key = req.params.key; 64: var culture = req.params.culture; 65: var command = "EXEC GetItem '" + key + "', '" + culture + "'"; 66: conn.queryRaw(command, function (err, results) { 67: if (err) { 68: console.log(err); 69: res.send(500, "Cannot retrieve records."); 70: } 71: else { 72: res.json(results); 73: } 74: }); 75: } 76: }); 77: }); 78: 79: app.post("/new", function (req, res) { 80: var key = req.body.key; 81: var culture = req.body.culture; 82: var val = req.body.val; 83: 84: sql.open(connectionString, function (err, conn) { 85: if (err) { 86: console.log(err); 87: res.send(500, "Cannot open connection."); 88: } 89: else { 90: var command = "INSERT INTO [Resource] VALUES ('" + key + "', '" + culture + "', N'" + val + "')"; 91: conn.queryRaw(command, function (err, results) { 92: if (err) { 93: console.log(err); 94: res.send(500, "Cannot retrieve records."); 95: } 96: else { 97: res.send(200, "Inserted Successful"); 98: } 99: }); 100: } 101: }); 102: }); 103: 104: app.listen(port); Publish to azure and now we can see our Node.js is working with WASD through x64 version “node-sqlserver”. Summary In this post I demonstrated how to host our Node.js in Windows Azure Cloud Service worker role. By using worker role we can control the version of Node.js, as well as the entry code. And it’s possible to do some pre jobs before the Node.js application started. It also removed the IIS and IISNode limitation. I personally recommended to use worker role as our Node.js hosting. But there are some problem if you use the approach I mentioned here. The first one is, we need to set all JavaScript files and module files as “Copy always” or “Copy if newer” manually. The second one is, in this way we cannot retrieve the cloud service configuration information. For example, we defined the endpoint in worker role property but we also specified the listening port in Node.js hardcoded. It should be changed that our Node.js can retrieve the endpoint. But I can tell you it won’t be working here. In the next post I will describe another way to execute the “node.exe” and Node.js application, so that we can get the cloud service configuration in Node.js. I will also demonstrate how to use Windows Azure Storage from Node.js by using the Windows Azure Node.js SDK. Hope this helps, Shaun All documents and related graphics, codes are provided "AS IS" without warranty of any kind. Copyright © Shaun Ziyan Xu. This work is licensed under the Creative Commons License.

Read the article

Search Results

Search found 5221 results on 209 pages for 'low latency'.

Page 207/209 | < Previous Page | 203 204 205 206 207 208 209 | Next Page >

- by j_lewis

- by masood

- by orionrush

- by Kitteran

- by bob moch

- by user555854

- by DaveCol

- by pranjal

- by twistedbrain

- by Anadi Misra

- by Anadi Misra

- by Anadi Misra

- by Daniel

- by WishCow

- by Geoff Dalgas

- by Kamil Kisiel

- by Rick Strahl

- by raul.goycoolea

- by Rick Strahl

- by Jon Galloway

- by Rick Strahl

- by Paul White

- by Martin Hinshelwood

- by danx

- by Shaun

< Previous Page | 203 204 205 206 207 208 209 | Next Page >